- Took a load of computer junk to All-Tech Recyclers in Hawthorne. They
were very easy to deal with, and they took the whole load without
complaint. Their phone number is 310-978-2970, and their address is
14311 Cerise Ave, #108.
- The SMTP symbiont process on Bigone kept dying on Monday. The cause
turned out to be several pieces of spam that made the symbiont
stack dump. The system disk filled up with SMTP_SYMBIONT.DMP files.
Removed the offending messages ('Do YOU owe money to the IRS?') from
the queue and restarted.
- xntpd died on Granite. Disabled it in startup, and set up a crontab
entry to periodically run ntpdate to synch the clock.
- Set the SMTP service on Carizo, Ojai, Tejon, and Ridge to reject mail
from outside caltech.edu. This is to prevent a recurrence of the spam
that kills the UCX SMTP process.
- Larry from ITS came by and said they want to replace the new switch
that they just put in with a 2900, which would have 16 100Mb/s ports.
He asked for a list of ports that we would like to have upgraded from
10 to 100.
- At Egill and Patrick's request, I added delays in the Big Brother rules
so that Patrick will not be paged unless a problem has gone unresolved
for over 60 minutes.
- Had the DNS entry for Terra10 changed to return the IP address of Rift.
- Set up CGI execution on Willow in Bruce's testing area.
- Hotspot was reporting disk errors on Wednesday. The disk on
/wavepool/tmp/2 was logging errors. Tried swapping it for another disk,
but a SCSI incompatibility prevented this. Since the errors were
all at high block/cylinder numbers, repartitioned the disk to use only
the first 2047 of 4093 cylinders. Built a new file system and restarted
the system.
- Hotspot was reporting more disk errors on the disk at c2t8. Moved
everything to a spare and removed the bad disk.
- DREN cut off HTTP access to the USGS networks from 09:40 to 14:20 on
Wednesday, due to concern about the Code Red worm.
- Turned off Terra10 and removed it from the rack.
- Transplanted the 9GB disk from Terra10 into Granite.
- Took the 320MB of memory out of the former Flint and put it in Silver
to see if this would speed it up for running Jiggle.
- The disk on /wavepool/tmp/2 on Hotspot was showing a lot of errors.
We did not have a spare for it, so I partitioned the disk to only
use the first half of the cylinders, which avoided the bad section.
- One of the 45GB disks on Hotspot went bad on Friday. Replaced it
with a spare and got RMA 5390296 to send it back to Seagate.
- Meeting on Friday to discuss how to deal with Patrick's leaving.
- Meeting with Patrick and Mandy to learn how to update configurations
on Jet and Spring, as well as how to maintain paging.
- Trip to Menlo Park for the computer and network security meeting.
- Ehzsquidmenlo died at 03:15 on Tuesday. Bob brought it back from
Menlo Park after it became clear that the power supply was not the problem.
- Talked to Charlene about setting up MX records for eqinfo.wr.usgs.gov
so that both mail servers can be running simultaneously.
- Set up Eqinfo1 for testing.
- Helped Karen replace a failed disk drive in their GigaRAID HA.
- Turned off telnet and ftp on all the FreeBSD machines.
- Instituted Big Brother monitoring of machines to make sure that
disabled services, such as telnet, are not turned back on.
- There were two short campus-wide power failures on Thursday morning.
They only totaled about 15 seconds without power, but machines off the
UPS all crashed. They occurred at 07:39 and 07:52 PDT.
- Added Mandy to the list to receive pages about SNMP traps generated
by the S. Mudd UPS.
- Reinstalled FreeBSD on eqinfo1.
- Got Sqehzeast [the Northern California Squid server that lives in
Reston] for postmortem. It had been hacked on Monday.
- M5.5 near Portola at 13:29. Web traffic in Pasadena peaked at 58/sec.
Menlo Park peak was 520/sec, and Earthquake Hazards peaked at 70/sec.
- ITS took their equipment out of the basement of 525 for the duration of
the remodeling.
- Added a bullet on the main Pasadena page about the M5.5 event. Had to
hack all the scripts, since they were all assuming that any event ID
would be either 'ci' or 'nc', and this was an 'nn' event.
- Did the postmortem on Sqehzeast. It was apparently broken into with
the telnetd exploit, and a trojaned login program was installed. They
also put a few other programs on it to try to use it against other machines.
Details at: http://bort.gps.caltech.edu/stan/mail-archive/msg00031.html
- Got Eqinfo1 working.
- Got some corrections to the Akamai Discovery Document from Nancy Dickman.
- Found a typo in the search.html page on earthquake.usgs.gov. This was
an extraneous '2' in the field name on the form. This caused any query
entered to be ignored and treated as if it were blank.
- Made a script to put the duty operator and duty seismologist lists on
the Trinet Intranet web page.
- Found a problem with station CAB. The 'active' file was not being
closed out every hour, but instead was growing obscenely large. Reported
this to Mandy, who found a typo in the station configuration file.
- A machine on the 66 subnet was sending out Code Red probes. Bob looked
in the DHCP logs and figured out that it was Aris' machine. ITS blocked
it at the router.
- Found the problem with Ehzsquidmenlo. The memory DIMM in it had
gone suddenly, spectacularly bad. Replaced it with another DIMM and
the machine booted normally. Shipped it back to Menlo Park on Wednesday,
and it was back up by 12:06 on Thursday.
- M4.4 and M4.2 events offshore on Thursday. Web traffic reached 140/sec
in Menlo Park, 70 in Pasadena, and 26 on the Trinet server.
- Fixed Sqehzeast. Removed the trojaned login program and deleted the
other stuff that had been put on it. Shipped it back to Reston on
Thursday.
- After it was fixed, Aris put his machine back on the network on Thursday,
and it was immediately reinfected. ITS blocked it again.
- Added expires headers to the /mail-archive directory on Agent86.
- Testing Eqinfo1.
- Fixed a problem with the script that generates the 'Old Commentary'
index page on Flint.
- Made up a parts list for building the big file server for Egill.
- Added a feature to the Eqinthenews web page to allow for adding a
bullet linked to a page on another server.
- The disk at 1,4 on the Jet AA RAID failed on Tuesday. The controller
serial console was not responding, so nStor said we had to power-cycle
the RAID. Replaced the failed disk with a spare and got RMA 60868819 to
send it back to IBM for replacement.
- The disk for RMA 5390296 came back from Seagate on Tuesday.
- Took 'eqinfo.wr.usgs.gov' out of Jet and Spring's hosts files. This
name now resolves into a pair of MX records, which gives us a failover
capability. The hosts file entry was short-circuiting this.
- Changed the 'a2m' entries to 'ew2mcast' in the process list for Big
Brother on Magma.
- Eqinfo1 went live on Wednesday morning. Had to add 'eqinfo.wr.usgs.gov'
to /var/qmail/control/locals to get it to accept incoming mail properly.
- Restored some files on the MEM_00 disk that Raven had accidentally deleted.
- Renewed the 'anss.org' domain name.
- Made the 'listsync.pl' script on Eqinfo1 to check for subscribe/
unsubscribe activity on the mailing lists and synchronize the lists on
the backup server.
- Built GMT from source on FreeBSD as a proof-of-concept.
- Set up backups for Eqinfo1.
- Made a script to build an archive index of old EqInTheNews web pages.
Rewrote it in Perl and added the wrappings to put the standard navigation
bars on it.
- Gave the parts list for the big file server to Ken Ou.
- Got a new Exabyte tape drive to replace the one that died recently.
- Checked out a strange email that was generated by the sitemail.cgi
script on earthquake.usgs.gov. It appears that someone was probing it
to see if it was vulnerable to a known bug.
- Fixed the links on the Trinet documentation pages. Replaced 'terra10'
with 'rift' in the links.
- Found out that the plastic spiral binding on the nStor GigaRAID manual
glows in the dark. I have no idea why this should be so, but it was
an amusing discovery.
- Changed the EqInTheNews archive script to have it write the index file
for the 'eqinthenews' directory. This way, if a user goes to that
directory, by default they will see a listing of all past newsworthy
events.
- M3.4 near Idyllwild. Caused a minor web traffic spike.
- Discovered that qmail has a default limit of 120 on concurrency. Found
a patch to allow this to be higher. Recompiled qmail with the patch on
Eqinfo1 and set concurrency to 250. Details:
http://bort.gps.caltech.edu/stan/mail-archive/msg00034.html
- Installed tripwire on Rift.
- Put up an online signup for Dog Day at http://bort.gps.caltech.edu/dogday/
This is based on phpGiftlist v.0.1 (http://freshmeat.net/projects/phpgiftlist/)
and MySQL (http://www.mysql.com/).
- Users with T-900 two-way pagers reported not receiving the CUBE test
on Wednesday morning, and not receiving the M3.4 on Tuesday. Called
Bill Frank at Airtouch and found out that the email gateway we had
been using, 'airtouch.net', had been discontinued. The T-900 pagers
are now supposed to be accessed by sending mail to
'1112223333@airtouchpaging.com', and Verizon cell phones are supposed
to be addressed at '1112223333@msg.myvzw.com'.
- Set up automatic propagation of mailing list changes from Eqinfo1
back to Eqinfo. Eqinfo, the old server, is now configured as a
backup in case the primary server fails.
- T1 outage for several hours on Wednesday. Checked the power in the
basement of 525, and it appeared to be all right.
- Found 200,000 blocks of 'SHSYS.TMP' files in SYS$MANAGER on Bigone.
This was the cause of the 'file header full' errors, and also the
reason the system disk was so full.
- Got the case and some of the parts for the new big file server.
- Looked at a problem with the 'masterdb' script. It apparently fails
when run under Solaris 2.7.
- Moved Kate's computers to the new UPS outlet in her office.
- Moved Ellen's computer to the new UPS outlet in her office.
- Installed new HP printer drivers on Iron so that the HP printers
can be spooled off Iron instead of Bigone.
- Added the dgga library directory to the LD_LIBRARY_PATH for Apache
on Agent86. This is for the SCIGN map server.
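
A note on the event ID problem from the M5.5 Portola entry above, where
the scripts assumed every event ID started with 'ci' or 'nc'. One way to
avoid a repeat is to split off the network code and fall back gracefully
for codes the script has never seen. This is only a sketch, not the
actual fix that went into the scripts. The ID format (two-letter code
followed by an identifier), the function names, and the display names
are all assumptions made for illustration.

```python
import re

# Assumption: an event ID is a two-letter network code followed by an
# identifier, e.g. 'nn00012345'. The log does not state the exact format.
EVENT_ID_RE = re.compile(r"^([a-z]{2})(\w+)$")

# Hypothetical display names; only the codes 'ci', 'nc', and 'nn'
# come from the log itself.
NETWORK_NAMES = {
    "ci": "Southern California",
    "nc": "Northern California",
    "nn": "Nevada",
}


def split_event_id(event_id):
    """Return (network_code, identifier) for an event ID string."""
    match = EVENT_ID_RE.match(event_id.strip().lower())
    if match is None:
        raise ValueError("unrecognized event ID: %r" % event_id)
    return match.group(1), match.group(2)


def network_label(event_id):
    """Return a display label, falling back to the raw code if unknown."""
    code, _ = split_event_id(event_id)
    return NETWORK_NAMES.get(code, "network '%s'" % code)
```

With this approach an event from a network that is not in the table
still gets a usable label instead of breaking the page.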
***
- Attended the network security meeting in Menlo Park. This meeting
was held in response to security concerns voiced by the ISP providing
service to the USGS networks.
- Set up a new primary mailing list server and set up the original server
to serve as a hot backup with automatic fail-over.
- Did a post-mortem on one of Menlo Park's servers. It had been broken
into and used to launch attacks on other machines in Reston. Found the
hole that was used to break in, and also the programs that were installed
on the machine. Cleaned it up, fixed the hole, and sent it back to go
back into service.
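
For reference, the fail-over on the mailing list servers depends on how a
sending mailer walks a pair of MX records: lowest preference value first,
then the next one if that host cannot be reached. The sketch below shows
that ordering only. It does no DNS lookups, and the host names,
preference values, and function names are hypothetical, not the real
records for eqinfo.wr.usgs.gov.

```python
# Hypothetical MX data as (preference, host) pairs; lower preference
# values are tried first. These are not the real records.
MX_RECORDS = [
    (20, "eqinfo.example"),    # assumed backup server
    (10, "eqinfo1.example"),   # assumed primary server
]


def delivery_order(mx_records):
    """Return hosts sorted so the lowest preference value comes first.

    Ties are broken alphabetically here for simplicity; a real mailer
    may pick among equal-preference hosts differently.
    """
    return [host for _, host in sorted(mx_records)]


def pick_server(mx_records, is_up):
    """Return the first host in delivery order for which is_up(host) is true.

    is_up is a caller-supplied function standing in for a connection
    attempt. Returns None if no host is reachable.
    """
    for host in delivery_order(mx_records):
        if is_up(host):
            return host
    return None
```

For example, pick_server(MX_RECORDS, lambda host: host != "eqinfo1.example")
simulates the primary being down and returns the backup host.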