- There was a problem with the EHZ DNS servers on Saturday night. They
thought that the Pasadena web server was down, and were directing our
traffic to their mirror copy in Menlo Park. Wrote to Will Prescott about
this and requested that it be set up to never fail over to an alternate
server.
- Installed 256MB of new memory in Galena for Frank Vernon.
- First Trinet transition meeting to deal with Phil leaving.
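- The rsync process between eqweb-menlo and eqweb-east and north was not
deleting files properly. It turned out that the source directory had to be
specified as '/directory/', without a trailing '*' character. The shell
expands the '*' into a list of files, so rsync is asked to copy individual
files rather than the directory, and it only deletes extraneous files in
directories that it is synchronizing.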
- Fixed secure shell logins for the eqnews account on ehznorth and ehzeast.
- Tested SNMP queries and set up a daemon to receive SNMP traps from the new
S. Mudd UPS system.
- Moved the Ehz Pasadena Squid server across the street to the basement of
S. Mudd so that it can be on a 100Mb port there.
- Helped Menlo Park with setting up expirations for pages served by their
web servers. This is to fix improper caching of the Recent Earthquakes
pages by proxy servers around the Internet.
- Database transition meeting to deal with Sarala leaving.
- Attended a briefing on how comserv works and also one on the Berkeley
software used on Trinet.
- Added UPS monitoring to Big Brother. The Big Brother page now monitors
UPS general status, battery status, network connection, and load average.
- Made an account for Ned Field on Ehzsouth.
- Fixed the 'Fault' hack on the eqinfo mailing lists. The sed line had to
be on the same line as the 'qmail-send' command in the .qmail file for the
list.
- Helped Katrin set up Oracle backups.
- M4.1 Big Bear and M4.8 Truckee events on Saturday morning.
- Talked to Patrick about Trinet RT system issues.
- Analyzed the traffic patterns generated by the events on Saturday. Put
up some web pages about these events' effect on our web servers at
http://bort.gps.caltech.edu/spikes/02dec2000/
- Helped Karen with a problem on K2 at 22:30 on Monday.
- Moved Jet and its RAIDs to a new ventilated rack on Tuesday. This took
most of the day, and required assistance from Carl, Paul, and Mike.
Since then, the system has been running about 10 degrees cooler, so the
operation was a success. Also, the new rack is bolted to the floor for
security.
- The S. Mudd UPS reported a 4 second outage on Tuesday at 10:53. This was
reported via an SNMP trap.
- Got the DLT drive from Terra10 and put it on Iron for use by Trinet.
- Talked to Egill about the RT system RAID issues.
- Meeting about the new T1 numbering scheme. Phil presented his new scheme,
which was identical to the one that Busby and I came up with back in March.
This scheme did not work, because the routing software in the FRADs was not
capable of dealing with subnet masks that were not a multiple of 8 bits.
We redid the scheme to use 24 bit masks and got SOT renumbered on Friday.
- Renamed Goldfish back to Lamprey while reconfiguring it for the new
numbering scheme.
- Added Dave Johnson, Kate, Nick, and Karen Kahler to the list to receive
SNMP trap notifications for the UPS.
- The system disk on Bigone filled up over the weekend. In digging around,
I found a 300,000+ block errorlog file from 1996.
- Moved Willow to the Swordfish network so that it will push Shakemap
information to Edison over the microwave link.
- Set concurrency to 100 for qmail on Eqinfo. This should help speed up
delivery to the mailing lists in the future.
- Added two Media Relations people to the quake-alarm mailing list on
Ehzsouth.
- Attended the monthly Seismo Lab social. Saw Phil get his paperweight.
- Talked to John Alden <jalden@nstor.com> (888)-811-6743 about options for
adding disk space to the SX RAID-1 array.
- Talked to Katrin about the disk space that will be needed for the
database on each online system when we reconfigure the big RAID-5 array.
- Reston suffered a power failure from 08:45-09:50 PST on Wednesday. The
computers there stayed up on emergency power, but the routers were down.
They are looking at putting the routers on emergency power.
- Worked out a small revision in the numbering convention for the far-side
LANs in the new T1 numbering scheme. This revision was suggested by Mike
Watkins. The scheme is documented at
http://bort.gps.caltech.edu/stan/net/lamprey.html
- Arranged for Kimo to make a new account for Greg Anderson.
- Assisted in the testing of the new building UPS. It worked properly, and
sent pages and email about the 'power failure'.
- Hugo reported that the direct line to Associated Press was not working.
We looked, and the wire had come loose from the modem. Plugged it back in.
- The Lexmark color printers are finally fixed.
- Looked up information on load balancing DNS services provided by Akamai
and Speedera. Speedera would want something like $6000 a month to do DNS
for the earthquake hazards web site. This seems a bit steep.
- Gave Ken Ou the shopping list of parts for the new Trinet web server.
- Attended another of Phil's briefings about Trinet configuration files.
- Got a new Arcinfo license for Willow. Called ESRI and spoke with Kevin
Schumm at 888-377-4575. Our customer number is 1699.
- Investigated a weird problem with the Squid servers for the National web
site. For some unknown reason, some clients were using them to proxy other
sites on the Net. Testing showed that Squid was willing to serve as a
proxy for any client that asked, despite being configured to not allow this.
The fix was to downgrade Squid from 2.3 to 2.2, which behaves correctly.
- There was a problem with connections from Menlo Park to Reston. The
usgs.net routers were not acting correctly. Called Judy Konnert in Reston
and she relayed the problem to the router people there.
- Attended Phil's going-away luncheon.
- Renamed 'water' to 'wacke' at the request of the Timers.
- Sent in a revised list of Sun nodes to ITS for licensing.
- Ran checkbot against the National web site to check for bad links.
- Attended the GPS Holiday Party.
- Installed a new 9GB disk in Terra10. This was to fix problems caused by
lack of disk space. Reseating the disk cables also appears to have helped
the problems with excessive iowait time on the CPU.
- M4.4 Grapevine event at 17:04 on Saturday. Web traffic peaked at about
5.3 hits/sec, and the mailing list server sent out 1200+ messages in about
three minutes.
- Moved disk partitions around on Terra10 to take advantage of the new disk.
- Set up a new directory and virtual server for Lisa to use for testing the
revamped Pasadena web site.
- Assisted Hugo and Patrick in troubleshooting the AP data connection. The
fundamental problem was that Patrick's program had been configured to talk
to the AP modem through the Lantronix terminal server, but Phil had never
informed me of this, so the modem was still plugged in to the serial port
on Jet.
- Moved the quake-alarm mailing list from ehzsouth to eqinfo and converted
it to ezmlm. Majordomo had taken 19 minutes to send out 64 messages to
this list after the M3.1 Big Bear event on Tuesday. This turned out to
have been caused by a balky mail server at home.com, but the change to
qmail should prevent a recurrence of this problem in the future.
- Added Alberto's new pager to the eqpage mailing list.
- Added links in the station_updates directory. This is so the old,
broken links from the Trinet Watch page will still work. ISTI still needs
to fix these links.
- Got some sample hardware configurations for the new Timers' workstations.
- Changed passwords on all Trinet systems and held a Timers meeting to tell
the duty operators about the new passwords.
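
Note on the rsync entry above (source given as '/directory/' rather than
'/directory/*'): the following is a minimal Python sketch of why the trailing
'*' hides deleted files. It is an illustration only, not rsync itself. The
file names and the helper names (stale_names, demo) are made up for the
example. The idea is that the shell expands '*' to the files that exist in
the source right now, so a file removed from the source is never named on the
command line, whereas comparing the two directories as a whole does find it.

```python
import glob
import os
import tempfile


def stale_names(src_dir, dest_dir):
    """Names present in dest_dir but no longer in src_dir.

    This is what a directory-level mirror would consider for deletion.
    """
    return sorted(set(os.listdir(dest_dir)) - set(os.listdir(src_dir)))


def demo():
    """Build a throwaway source and destination and compare the two views."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "src")
        dest = os.path.join(root, "dest")
        os.mkdir(src)
        os.mkdir(dest)
        # Hypothetical file names: old.html was removed from the source
        # after an earlier copy, but is still on the destination.
        for name in ("index.html", "quakes.html"):
            open(os.path.join(src, name), "w").close()
        for name in ("index.html", "quakes.html", "old.html"):
            open(os.path.join(dest, name), "w").close()
        # What a command sees when the source is written as 'src/*':
        # only the files that exist in the source at this moment.
        expanded = sorted(
            os.path.basename(p) for p in glob.glob(os.path.join(src, "*"))
        )
        return expanded, stale_names(src, dest)


if __name__ == "__main__":
    expanded, stale = demo()
    print("names passed via 'src/*':", expanded)
    print("names a directory-level compare would delete:", stale)
```

The wildcard view never mentions old.html, so a file-by-file copy has no way
to know it should go. Only the directory-level comparison finds it.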
***
- Set up Big Brother to monitor the status of the new building UPS system
in the Seismo Lab. Also set up a receiver for SNMP traps sent by the UPS.
This will send pages for severe alerts on the UPS.
- Moved the primary Trinet online system to a new, better-ventilated rack.
This has helped the overheating problem that it was experiencing.
- Worked with Mike Watkins and Mandy Johnson to implement the new numbering
scheme for the new T1 and set up the first station on it.
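
Note on the T1 numbering scheme: the earlier entry says the FRADs could not
handle subnet masks that were not a multiple of 8 bits, and that the scheme
was redone with 24 bit masks. The following is a small Python sketch of that
constraint, using the standard ipaddress module. The 10.0.0.0 addresses and
the helper names are assumptions for the example only. They are not the real
T1 numbers, which are documented at the lamprey.html page listed above.

```python
import ipaddress


def on_octet_boundary(network):
    """True if the mask length is a multiple of 8 bits (/8, /16, /24, ...)."""
    return ipaddress.ip_network(network).prefixlen % 8 == 0


def as_24_bit_subnets(block):
    """Carve a larger block into subnets that all use a 24 bit mask."""
    net = ipaddress.ip_network(block)
    return [str(n) for n in net.subnets(new_prefix=24)]


if __name__ == "__main__":
    # A /27 is the kind of mask the FRAD routing software could not handle.
    for candidate in ("10.0.0.0/27", "10.0.0.0/24"):
        mask = ipaddress.ip_network(candidate).netmask
        print(candidate, mask, "ok" if on_octet_boundary(candidate) else "rejected")
    # Hypothetical block, one 24 bit subnet per far-side LAN.
    print(as_24_bit_subnets("10.0.0.0/22"))
```

A check like this could be run over a proposed numbering plan before any
station is renumbered, to catch masks that the FRADs would reject.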