- Continued working with the Diskeeper disk defragmenter. I changed the
system disk job to run weekly, along with the user disk job.
- Started testing the backup procedure with Riley. We had estimated that a
4GB DAT tape would hold about 4,000,000 blocks, but the actual capacity
turned out to be around 3,200,000 blocks. We are currently looking at
options such as different combinations of disks or writing two-volume
tapes.
- The disks on MALIBU logged some errors this week. They all occurred on
Monday morning. The errors appeared to be related to a combination of
activity on that system coupled with network activity at the time. The
errors have not recurred since then.
- Continued with the informal C++ programming class.
- Talked with Tony Coleman at KCET about setting up a modem line on Sue
Hough's PC so that she can use some software called First Class that they
sent her.
- Ran an ANALYZE/DISK_STRUCTURE/REPAIR on the AVALON$DKB0: disk. This
disk was showing about 30,000 blocks of lost files and various other file
system errors.
- Attended the weekly Timers' meeting.
- Continued testing the disk backup procedure. I worked out a schedule
that gets around the tape capacity limitations and still backs up every
disk at least every other week.
- Looked into OPCOM error messages that are generated whenever someone
prints multiple jobs on the LW_YEL laser printer. The configuration of
this printer is the same as all of the other printers, so it is not
immediately obvious what is happening, but it appears that no loss of data
is occurring.
- Assisted Bob with cleaning up some old directory structures on the disks
on CAJON. These are old files that have not been touched in several years.
We are archiving them to tape and deleting the files.
- Found an intermittent problem with the dialup modem connected to MOJAVE.
- Fastened the computer in my office down to the desk.
- Bob gave me a brief explanation of how to interpret the "beach balls"
that appear in seismologists' reports.
- Got information about ordering a set of updated FORTRAN manuals for VMS
6.1 on the VAX and Alpha machines. This is part of the project to update
CUSP to run on VMS 6.x.
- Continued with the informal C++ programming class.
- Attended the weekly Timers' meeting. Got some feedback on how the system
is running, and heard about the latest crank earthquake predictions that
the Timers have received.
- Continued testing the disk backup procedure. This is the first week that
we are actually running this in production. I found a couple of minor
problems with the facility for mailing status reports back after the
backups finish, but aside from these, it is working well.
- Discovered and fixed a problem with the DAT drive on MALIBU which caused
some problems with backups on Tuesday night.
- Found a problem with some 'ghost files' on the system disk on CAJON.
Talked to Bob about this, and I will be running
ANALYZE/DISK_STRUCTURE/REPAIR on this disk when they stop and restart the
CAJON on-line system at the end of next week.
- Assisted Doug in looking into problems with receiving EDIS messages on
BIGONE.
- Finished cleaning up the old directory trees that Bob found last week.
These have been backed up to tape and stored.
- Found out that the tape drives and two of the disks on BIGONE were not
fastened down to the rack. Got some velcro blocks and fixed this. Also,
the disk drives on the workstations in the Timers' room are not fastened
down. I will be talking to Egill about having this taken care of.
- Created a batch procedure to run on the first of every month to create
new system accounting and error log files. This will prevent these files
from growing excessively large, which will make analyzing errors faster
and save disk space.
- Attended the SCEC seminar on real-time seismology. This was suggested by
Bob and Doug as a way to learn more about the 'big picture' of what we are
doing here.
- Arranged to order an updated set of GKS manuals for the Alpha machine.
This is part of Bob and Allen Walter's project to get CUSP working on the
new architecture.
- Installed the DECset suite of programming tools on the Alpha to assist
with the CUSP porting effort.
- Investigated the Timers' complaints about MECCA being slow. It appears
that running Timit on this node causes the user's process to slip in and
out of an RWSCS state, which causes momentary hangs. This is a state where
the process is waiting for network communications with another computer,
and could be related to the load on the network, although it is not clear
why this node is more susceptible to this than the others.
- Attended the USGS staff meeting with the branch chief to hear about the
latest news about the ongoing adjustments in the USGS.
- The backup procedure is working in production now.
- Worked with the 9-track tape drive service guy to test the drives after
he had adjusted them.
- INDIO developed a problem with one of its disks on Sunday night. I did a
power-off reboot to fix it on Monday morning. INDIO also crashed on
Thursday morning, although this does not look like it was related to the
disk problem. The crash dump indicated that the system was idle at the
time when it crashed, so there was little to go on in diagnosing the cause.
- Did some monitoring of ethernet statistics recorded on the VAX
workstations. This is part of the investigation into why MECCA was slow.
I noticed that MECCA, LANDER, BBEAR, and INDIO all had non-zero counts in
the "Collision detect check failure" counter. Normally, this counter
should always be zero. Tracing the cables showed that they were all
connected to the same four-port ethernet transceiver. Chuck made a new tap
in the computer room, and we moved MECCA to a different transceiver. The
Timers report that it is running significantly faster, and it has not
logged any more collision detect check failures. This indicates that the
four-port transceiver is probably bad and should be replaced.
- Talked to Bob, Doug, and Chuck about our options for replacing the bad
transceiver. We will be deciding what to do next week.
- Did some system and user account tuning on the CARIZO Alpha workstation.
This is more of the CUSP porting effort.
- Talked to Egill about getting supplies to tie down the workstation disks
in the Timers' room.
- Assisted Steve Bryant with debugging a problem with paging.
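
Note on the backup schedule entries above: the sketch below shows one way
to group disks onto tapes so that no tape exceeds the measured capacity of
about 3,200,000 blocks. It is only an illustration, not the procedure that
is actually in production. The disk names and block counts in the usage
note are hypothetical, and the packing rule (first-fit decreasing) is an
assumed choice. Once the disks are grouped, the tapes can be spread over
the nights of a two-week cycle.

```python
# Measured DAT tape capacity from the log entry, in 512-byte disk blocks.
TAPE_CAPACITY_BLOCKS = 3_200_000


def pack_disks(disk_blocks, capacity=TAPE_CAPACITY_BLOCKS):
    """Group disks onto tapes using first-fit decreasing.

    disk_blocks maps a disk name to the number of blocks to back up.
    Returns a list of tapes, each a list of disk names.
    """
    tapes = []  # each entry: {"free": remaining blocks, "disks": [names]}
    # Place the largest disks first; this tends to waste less tape.
    ordered = sorted(disk_blocks.items(), key=lambda kv: kv[1], reverse=True)
    for name, used in ordered:
        if used > capacity:
            # A disk this large needs a multi-volume save set instead.
            raise ValueError(f"{name} does not fit on a single tape")
        for tape in tapes:
            if tape["free"] >= used:
                tape["disks"].append(name)
                tape["free"] -= used
                break
        else:
            # No existing tape has room, so start a new one.
            tapes.append({"free": capacity - used, "disks": [name]})
    return [tape["disks"] for tape in tapes]
```

For example, with hypothetical disks DKA0 (2,000,000 blocks), DKA1
(1,500,000), DKB0 (1,200,000), and DKB1 (900,000), pack_disks groups DKA0
with DKB0 on one tape and DKA1 with DKB1 on a second tape.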