- There was a flood in the computer room on Monday. Shut down Carizo to
move the two UPSs onto a shelf in the Tustin rack.
- Fixed the ethernet usage graphs for the new Menlo Park squid server.
- Helped Stan Silverman fix the timezone on the Menlo Park squid. Had to
restart cron and squid to get them to see the new timezone.
- Deleted Sarala from paging. Added Jascha.
- Installed gcc, gdb, and some other packages on Granite per Paul Friberg's
request.
- Made the newly-redesigned office web site live at 13:30 on Thursday.
- Patrick moved the rest of the stations on Spring to the temporary
wavepool. CPU iowait time dropped from 25% to less than 10%, and the
machine is noticeably more responsive now.
- Set up an RRDtool script to monitor and graph the total traffic on the
three Earthquake Hazards Program web servers.
- Moved Eqinfo to sit next to the rack in the hallway nook in 535.
- Installed a new version of Jiggle on Scree.
- Fixed up the navbars for the new-style Simpson maps on Agent86.
- Spring had a problem on Sunday. The file-moving script encountered an
error and refused to run. The temporary wavepool disks filled up. Split
the file-moving procedure into four separate scripts, one for each
temporary disk. Fixed the error.
- Web Team conference call on Monday to discuss the fallout from the
Nisqually earthquake.
- Right-side power supply on the bottom chassis of the Spring AA RAID had a
fan fail. Got RMA number 25559 and had Paul ship the power supply back.
- Got copies of the Denver web farm server logs for February 28th. These
cover traffic to the NEIC web site on the day of the Nisqually earthquake.
- Analyzed the NEIC web logs and wrote a report about the failure:
http://bort.gps.caltech.edu/spikes/28feb2001/neic.html
- Added 'eqnews' to the usgswww group on Agent86.
- Changed the Big Brother qpage to use snpp.airtouch.com.
- Rebooted Lander to clear a problem with the window system.
- Went to Spring Internet World on Wednesday. Looked at web accelerator
caching appliances, and also inquired about caching services.
- Changed the TCP MSL on all the Squid servers from 30 to 3 seconds. This
was recommended by the Squid developers as a way to speed up reuse of TCP
ports when the system is under heavy load.
- Set up a directory for the Earthquake Hazards Program Internal web pages
on ehzmenlo.
- Talked to Linda Feather at Sun about our upgrade options for Iron and
Rtdev.
- Talked to Will Prescott about DNS issues and what to do when their second
Squid server comes on line.
- Spring had a problem on Thursday night. The fourth disk in the temporary
wavepool developed some bad blocks. Rebooted in single-user mode and ran a
format repair, which reported fixing 22 bad blocks. After this, the
disk passed fsck and the system booted normally.
- Rebooted Agent86 on Friday morning to enable soft updates.
- Helped Stan Silverman build and configure their second Squid server.
- Set up MRTG monitoring of the second Menlo Park Squid server.
- The problems with the temporary wavepool disks on Spring came back on
Monday. Replacing the disk stopped the errors.
- Did more research on possible solutions to the NEIC web server problems.
Cacheflow has a web accelerator that appears to have the capacity of about
two of our Squid servers. Retail for the appliance is $10k.
- Fixed a problem with the station updates email archive. If two messages
for the archive came in too close together, the second one would step on
the first one during processing. Assigned a timestamp to the temporary
files to avoid this problem.
- Helped Nick set up backups of the GREENx disks on the VMS systems.
- Stan Silverman did some voodoo on the Ehzsquidmenlo system. This machine
has crashed mysteriously several times over the last five months. He found
a small scrap of plastic caught between the CPU and its heat sink. I think
this may have been causing the CPU to overheat and the machine to crash.
- Reconfigured the disks on Rtdev to free up one of the four-disk arrays
for use as the temporary wavepool for Jet.
- Added 'noatime' to the mount options for the /cache filesystem on all the
Squid servers. This should make access a bit faster for these filesystems.
- Worked with Lisa to fix up the frames and navigation bars around the new
version of the Simpson Maps.
- Put the four-disk array on Jet and built the temporary wavepool and its
associated links.
- Enabled 'time_hack' on the Menlo Park Squid servers and recompiled Squid.
- The second Menlo Park Squid server went live on Friday. Set up SNMP
monitoring of it.
- Talked to Jill McCarthy in Reston and Linda Pratt in Golden about the
options for improving NEIC's web server performance. Linda is going to be
writing up their proposal for fixing the problem.
- Stan Silverman noticed that their new Squid servers seemed slow. This
turned out to be caused by a duplex conflict. The machines were set to
100Mb full-duplex, but the ports were set to half-duplex. Charlene changed
the ports to full-duplex, and the problem was solved.
- Set up procedures to merge the logs from the Menlo Park Squid servers and
do statistics reports on them.
- Upgraded Squid on the Pasadena Squid server to 2.4-stable1.
- Talked to Charlene in Menlo Park to be sure that Ehzsquidmenlo did not
have the same duplex problems as the Squids for the Menlo Park site.
- Got quotes for new CPU modules for Iron and Rtdev, as well as a quote for
a refurbished Ultra-60 to serve as a spare. Gave both quotes to Patrick.
- Got the first draft of the NEIC web server proposal. Emailed comments
back to Linda Pratt, and also spoke with her on the phone for a bit.
- The external disk on Genie is making noise. The noise appears to be
coming from the cooling fan.
- Attended a Compaq dog-and-pony show about their products for web servers.
They have web accelerator boxes available.
- Fixed a minor problem in the footer for the htdig results page on the
Pasadena web site.
- Figured out how to do redirects in Squid. This can allow a single server
to accelerate multiple web sites.
- Disabled snmpXdmi on all Solaris systems after a CERT advisory about this
was posted on Friday.
- Found documentation of a known bug in Internet Explorer which causes
problems with persistent connections. This is the reason people
occasionally get '403 - Forbidden' errors on Squid-accelerated web sites.
Their browser gets confused and sends inappropriate requests to our server.
- Sent ezmlm scripts and information to John Lahr in Golden.
- Started Solaris reinstall on Baldur.
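
The station updates archive fix noted above (timestamping the temporary
files so that two messages arriving close together no longer overwrite each
other) can be sketched roughly as follows. This is only an illustration, not
the actual script: the function name, the file name layout, and the use of
Python are all assumptions, and the process ID suffix is an extra guard that
the log entry does not mention.

```python
import os
import time


def unique_temp_name(prefix, now_ns=None, pid=None):
    """Build a temporary file name that differs for each message.

    The timestamp follows the fix described in the log entry. The process
    ID is an added assumption, in case two messages are handled within the
    same clock tick by different processes.
    """
    if now_ns is None:
        now_ns = time.time_ns()  # nanosecond-resolution wall clock
    if pid is None:
        pid = os.getpid()
    return f"{prefix}.{now_ns}.{pid}.tmp"
```

Each incoming message would then be written to its own file, for example
unique_temp_name("station-update"), so the second message no longer steps on
the first.
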
***
- Worked with Patrick to reconfigure the wavepool disks on Spring and Jet.
This has reduced stress on the RAID-5 arrays, and both systems are now fast
and responsive.
- Analyzed the NEIC web server logs to try to understand why they failed so
completely in the traffic surge after the Seattle earthquake on February
28th. Talked with the USGS web development group about possible solutions
for increasing NEIC's capacity.
- Assisted Menlo Park with setting up two Squid servers to increase their
web service capacity. The machines they bought are Athlon-based, and
should give their site the capacity to withstand about 1,000-1,200
hits/second, which is nearly an order of magnitude increase over their old
server setup.
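
For the log merging set up for the two Menlo Park Squid servers, a minimal
sketch of one possible approach is shown here. It assumes each access log is
already in time order and uses Squid's native log format, where the first
field on each line is a UNIX timestamp. The function names are hypothetical,
and the actual procedure may work differently.

```python
import heapq


def timestamp(line):
    """Return the leading UNIX timestamp of a log line as a float.

    Assumes the native Squid access.log layout, where the first
    whitespace-separated field is the time in seconds.
    """
    return float(line.split(None, 1)[0])


def merge_logs(*logs):
    """Merge several time-ordered logs into one time-ordered stream.

    Each argument is an iterable of log lines (an open file works).
    heapq.merge reads the inputs lazily, so large logs are not loaded
    into memory all at once.
    """
    return heapq.merge(*logs, key=timestamp)
```

The merged stream can be written to a single file and handed to whatever
statistics tool is already in use.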