- Hotspot had a disk problem on Monday. SCSI unit 8 [a 45GB disk in the
Andataco unit] was reporting many errors. This caused the load average
on the machine to skyrocket. Shut it down and cycled power on the disk
unit. This seemed to fix the problem for the time being.
- The new toner cartridges came in for the Lexmark Optra SC in 525.
Replaced the black toner and cleaned the printer.
- The replacement memory kit for Jet arrived and was installed on
Tuesday.
- Got a copy of version 1.3.1 of the Java Plugin for testing on Baldur.
At first it didn't work, because the system was missing some required
patches. Installed the patches on Wednesday and it seemed to work
after that.
- Got the Akamai discovery document from Jill McCarthy. Gathered
the information they will need to set us up with their service and
sent it back to them.
- Talked with Doug and Bob about the feasibility of using a DirecPC
satellite connection for telemetry. The consensus was that it would
certainly work if we can wheedle a static IP address out of them, and
it could probably still be made to work with DHCP.
- Fixed some ssh problems for Bruce on Terra10 and the new replacement
for Flint.
- Put the new pager CGI with groups on the Terra10 web page.
- Fixed the Xearthwormhub archiving to upcase everything after the
initial uppercase 'X'.
- Hotspot's disk problem came back on Sunday. The disk appears to
be failing. Called nStor and got RMA 26268 for it. Sent it back
on Tuesday.
- Called nStor to get a quote for a replacement disk for Hotspot. This
will get us back online sooner, and when the warranty replacement comes
in, we will have more disk space on the machine.
- Moved Genie to a rack shelf in the telemetry room.
- Got the replacement for the power supply that failed in the Jet AA RAID.
- Installed a new 18GB disk on Iron for use by Oracle.
- Started setting up the new Ultra-60 that will replace Terra10.
- Made an account on Agent86 for Mark Benthien at USC. He is going to
be doing work on the RELM web site with Ned. I had to set up a
'.login_conf' file in his home directory to set the umask for ftp to
allow group write access for the RELM group.
- Met with Phillip Vaziri from Caltech about the air conditioning
requirements in the new computer room slated for 525. The original
estimate was for 6.5 tons, which we decided was overkill for a 170
square foot room. We finally settled on a figure of 3 tons, which
should give us about a factor of 2 safety margin.
- 535 had a power failure on Friday afternoon. The building was
running on the temporary power line, and it appears that both air
conditioners cycled on at the same time, which caused a surge that
blew the 80-amp fuse in the temporary power feed.
- The Caltech electrical shop people replaced the 80-amp fuses in the
temporary power feed with 100-amp fuses. This should allow the fuses to
carry the startup current that the air conditioners require.
- There was a planned power outage in Reston on Sunday. After the
reboot, the Menlo Park Squid server had a problem, and we had to
restart the squid process. Also, the web server Ehzeast had a problem
with Apache, and required a restart.
- Installed Solaris on the new Ultra-60. The machine is going to be
called 'rift'. Began configuring rift as the new Trinet Intranet
web server. Moved it into the computer room and installed it in the
rack above Rtdev.
- Put a pair of spare 18GB disk drives on Hotspot so that we can get
the machine running again while we are waiting for the warranty
replacement for the failed drive to come back from nStor.
- CPU 0/1, which is the second CPU on board 0, failed. This caused
the machine to crash. After reseating the CPU and memory on board
0, the machine passed self-test and was able to reboot. The one good
thing to come out of this is that we found that the machine came back
up much faster than in previous crashes. This is because the wavepool
reconfiguration greatly reduced disk I/O activity on the RAID.
- Made a temporary web server at 66.35 for Lisa to use to test her
new layout for the Earthquake Hazards Program web site.
- Installed the new machine to replace Flint on Monday morning. Made
the switch at 13:20 on Monday afternoon.
- We received a new 45GB drive from nStor on Monday. Installed it
on Tuesday morning and set up the disk partitions the same as they
were before.
- Moved the main Big Brother server installation from Terra10 to
Rift. This required changing all the Big Brother clients to
send their reports to the new server.
- Got the station updates mail archive working on Rift on Tuesday.
There was a problem with a stray MX record for Rift's address. After
ITS removed the MX record, mail worked correctly.
- Built a new mailing list server on Tuesday. This will become the
main mailing list server, with the current Eqinfo server acting as
backup.
- The replacement for the failed 45GB drive came back from nStor on
Wednesday.
- Iron had a disk problem on Wednesday afternoon. It appeared to
clear up after rebooting the machine, and the drive passed low-level
tests.
- Installed xemacs on Flint for Lisa.
- There was a problem with Big Brother on Wednesday morning. This was
caused by the duty operator change script, which still thought that
Big Brother was living on Terra10. When it restarted Big Brother on
Terra10, it began sending out spurious problem messages. Fixed the
script to reflect the new setup.
- Made a copy of Alberto's Electronics Shop web pages on Rift. Also
made an account for him to use to maintain them.
- Set up one of the new Ultra-10s for Ellen to use. The machine name
is 'hotlava'.
- Removed the old 'Flint' from the rack. Gave the CPU box to Kimo
to keep for spare parts.
- Did some research to locate computer recyclers.
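
As a rough check on the air-conditioning figures for the new computer
room in 525 noted above, the arithmetic can be sketched as follows. This
is only a sketch: the conversion factors are the standard ones (one ton
of refrigeration is 12,000 BTU/h, and 1 BTU/h is about 0.293 W), the
function names are made up for illustration, and the heat load is just
what the 3-ton figure and the factor-of-2 margin imply, not a measured
value.

```python
# Rough sizing check for the 525 computer room air conditioning.
# The heat load here is inferred from the report's numbers, not measured.

BTU_PER_HR_PER_TON = 12_000       # definition of one ton of refrigeration
WATTS_PER_BTU_PER_HR = 0.29307    # 1 BTU/h is about 0.293 W


def tons_to_kw(tons):
    """Convert cooling capacity in tons of refrigeration to kilowatts."""
    return tons * BTU_PER_HR_PER_TON * WATTS_PER_BTU_PER_HR / 1000.0


def implied_load_tons(capacity_tons, safety_factor):
    """Heat load implied by a given capacity and safety factor."""
    return capacity_tons / safety_factor


original_estimate_tons = 6.5   # from the report
settled_tons = 3.0             # from the report
safety_factor = 2.0            # "about a factor of 2", from the report
room_sq_ft = 170               # from the report

load_tons = implied_load_tons(settled_tons, safety_factor)
load_kw = tons_to_kw(load_tons)

print(f"Implied heat load: {load_tons:.1f} tons ({load_kw:.1f} kW)")
print(f"Settled capacity:  {settled_tons:.1f} tons "
      f"({tons_to_kw(settled_tons):.1f} kW)")
print(f"Load density:      {load_kw * 1000 / room_sq_ft:.0f} W per sq ft")
print(f"Margin of the original estimate: "
      f"{original_estimate_tons / load_tons:.1f}x")
```

On these assumptions the original 6.5-ton estimate would have been more
than four times the implied load, which is consistent with calling it
overkill.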
***
- Replaced the old Sparc/143 web server for Trinet with a new AMD K6-2
machine running FreeBSD. This will make the server somewhat
faster and more reliable.
- Set up a new Ultra-60 to serve as the Trinet Intranet web server,
replacing the overloaded Ultra-5 it was running on previously.
- Filled out the Akamai Edge Suite Discovery Document and sent it
back to Akamai. This is the first step towards setting up our web
servers for their service. When the service is implemented, it will
give us the web service capacity to accommodate the traffic surges that
occur after large earthquakes.