- Got five disk drive bay cooler fans for the new file server. Installed
these on alternate disks in the upper part of the server disk bay. Also
added two exhaust fans at the rear of the case. All of this has the disks
running much cooler.
- Tried taking a 50GB disk out of the Andataco can and mounting it directly
inside Hotspot. The disk turned out to be too fat to fit in the slot.
- Got part numbers for a pair of 73GB disks for Hotspot. Ken ordered them
on Wednesday. They came in and were installed on Thursday.
- Set up a mirror of the USGS web pages on Fang.
- Put Ken Hudnut's disk in Fang. Had to rebuild the kernel with the NTFS
option. Then we mounted the original NTFS disk in the machine temporarily
in order to transfer the data. There is now a virtual server on Fang at
http://rincon.gps.caltech.edu to serve up Ken's data.
- Asked Charlene to set up DNS for shakemap.org.
- Did some diagnostics on the refrigerator and found a broken wire in the
defrost timer. Took it to Gary, and he figured out how to reattach it.
This seems to have fixed the refrigerator.
- Rebooted Bigone on Friday to clear an NFS mounting issue for the
jukebox.
- Got the replacements for the disks in the Spring RAID on Friday.
- Columbus Day holiday on Monday.
- Experimented with different NFS options for mounting the Pluton RAID
disks on Hotspot. Added 'vers=2' to vfstab.
- Changed Scree to 131.215.65.192.
- Put in the DNS change request for shakemap.org.
- Rewrote the remove_old_datafiles script on Hotspot, converting it from
shell to Perl. It is scheduled to run once an hour, but the old shell
script was taking over an hour to run because the wavepool on Hotspot
contained so many files. The Perl script is able to process the whole
wavepool in about 20 minutes.
- Helped Ken Ou with the disk cabling for the big file server he is
building.
- Set up an account for 'shake' on Fang.
- Added Fang to Big Brother monitoring.
- Added network and CPU usage monitoring on Fang.
- Made a 'timers' mailing list on Eqinfo to replace the old DIS$TIMERS
distribution list on Bigone.
- Looked into getting some long keyboard and monitor cables for when
Pluton moves into the telemetry room.
- Tested the new wavepool cleaning script on Spring.
- Meeting to begin discussions of a new RT system architecture.
- Installed one of the Exabyte Eliant drives in Bort.
- Added Vikki to the email lists.
- Rewrote move_datafiles in Perl as a test. This did not seem to
perform substantially better than the shell version of the script.
- Added Raven's cell phone to the earthquake paging email list.
- Investigated a reported problem where some stations had data gaps
near the top of the hour. This led to examining the actions of
datalog. Datalog closes out all its files and opens new ones at
the top of each hour. This spike in I/O activity was causing the
systems to have a petit mal seizure for about 60-80 seconds.
- Looked at disk options for Jet and Spring. Considered faster disks.
Also briefly considered a solid-state disk, but the $22,000 price tag
was out of the question.
- Investigated a problem with the DBD modules on Iron.
- Ned reported a problem with the mailing lists page on the RELM
web site. This turned out to be caused by a redirect that was added
to the server configuration file when Lisa rearranged the Pasadena
web site.
- Looked into the possible transfer of the shakemap.com domain.
- Got some faster disks for the temporary wavepools on Jet and Spring.
They are 18GB, 10000rpm, and claim a 5.2ms average seek time. Put four
of them on Jet. This lowered the I/O overhead and essentially
eliminated the system seizures.
- Since nearly all actions have unintended consequences, the faster
disks on Jet led to the machine experiencing tremendous load spikes
at the top of the hour. Before, the system would seize up, and it
took about 20-30 seconds to recover. Now that it has faster disks,
it doesn't seize up, but all 358 datalog processes try to run the
Perl script to make the wavepool links at the same time. This creates
a load spike of about 110-120.
- Patrick modified datalog to include a configuration parameter for an
offset time. Installed this on Hotspot and set the datalog processes
to close their files over a 10 minute interval. This spread the load
out enough to eliminate the problem.
- The 'change your password' option on the Rift web page was broken.
The problem was in how the 'cgi-bin' directory was configured in the
web server. There was an extraneous <Directory> section that overrode
the correct configuration.
- Made a web page account for Tom Heaton on Rift.
- Attended the CUBE meeting on Friday morning.
- Made an account for Phil Powers on Agent86, so he can maintain the
ANSS web site.
- M4.0 near Compton at 08:27 on Sunday.
- Added two more disks to the temporary wavepool on Jet.
- Set the hosts.allow file on Agent86 to allow Phil Powers to log in
from the office in Golden.
- Made a Rift web password for Jamie.
- Took the second password off the eqnews/bin directory. Security is
nice, but it's not much good when it's so secure that the Duty
Seismologist can't use it.
- Swapped the temporary wavepool disks on Spring and Hotspot. Put
6 new disks in the enclosure on Spring.
- The Catalyst 5000 in the USGS office died at 21:50 on Tuesday. This
took the entire USGS network off the air.
- M5.1 event near Anza at 23:56 on Tuesday.
- Moved Agent86 and Eqinfo1 across the street early Wednesday morning
so that the web pages and email could go out.
- Got ITS to replace the failed Catalyst. The network was back up by
about 13:00.
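
The datalog offset change noted above (closing files over a 10 minute
interval instead of all at the top of the hour) can be sketched roughly as
follows. This is only an illustration in Python, not Patrick's actual
change to datalog. The function names and the even-spacing rule are
assumptions. The 358 process count and the 10 minute window come from the
notes above.

```python
from collections import Counter
from typing import List


def stagger_offsets(process_count: int, window_seconds: int) -> List[int]:
    """Return one rollover offset (whole seconds past the hour) per process.

    Offsets are spread evenly across the window (an assumed rule, used
    here only for illustration).
    """
    if process_count <= 0 or window_seconds <= 0:
        raise ValueError("process_count and window_seconds must be positive")
    return [(i * window_seconds) // process_count for i in range(process_count)]


def peak_simultaneous(offsets: List[int]) -> int:
    """Return the largest number of processes sharing the same offset."""
    return max(Counter(offsets).values())


# 358 datalog processes spread over a 10 minute (600 second) window.
offsets = stagger_offsets(358, 600)
print("peak rollovers in any one second:", peak_simultaneous(offsets))
```

With no offsets, every process rolls its files in the same second. With
the spread above, no two processes share a second, which is the effect the
offset parameter was meant to achieve.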
***
- Reconfigured the disks on all the Trinet Real-Time systems. Installed
sets of six fast disks for initial data collection. The fast disks
relieved the system overload caused by high levels of disk activity.
This fixed the gaps seen in some data streams, which were due to data
packets lost during times of high system load.
- The main network switch in the USGS building suffered a catastrophic
failure. Worked with the Caltech network people to set up a workaround
and then to replace the failed hardware.
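
For reference, the kind of wavepool cleanup done by remove_old_datafiles
(mentioned earlier in these notes) might look something like the sketch
below. The real script is in Perl and its logic is not recorded here, so
this is a Python illustration under assumptions: the age cutoff, the use
of file modification time, and the function name are all hypothetical.

```python
import os
import time
from typing import Optional


def remove_old_files(root: str, max_age_seconds: float,
                     now: Optional[float] = None) -> int:
    """Delete files under root older than max_age_seconds; return the count.

    Walks the tree once inside a single process. lstat is used so that
    symbolic links are judged by their own timestamp, not their target's.
    """
    cutoff = (time.time() if now is None else now) - max_age_seconds
    removed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    os.remove(path)
                    removed += 1
            except FileNotFoundError:
                # Another process removed the file first; skip it.
                continue
    return removed
```

A call such as remove_old_files("/path/to/wavepool", 24 * 3600) would
remove anything more than a day old. Both the path and the one-day cutoff
are placeholders, not the values used on Hotspot.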