Feb 28, 2001 Seattle Earthquake - NEIC Web Site Report
The M6.8 earthquake centered near Seattle, WA turned the screws on the NEIC web site. The traffic quickly
exceeded the capacity of the servers and rendered the site unavailable for some time after the event.
Figure 1: Two-hour traffic detail
The traffic on the web server started to increase within two minutes of the event, which is the
usual pattern for earthquakes that are felt by large numbers of people. This can be seen in
Figure 1, which shows the two hours around the event. Note that the traffic increased
at a constant rate for the first four minutes after the event, after which its growth slowed. The normal
traffic pattern for felt earthquakes in Southern California always shows traffic increasing to
a peak at 10 minutes after the event. Given the rate of increase over the first four minutes,
if we assume a peak at 10 minutes post-event, the peak would have been on the order
of 400 requests/sec. The National Earthquake Hazards Program web site experienced
its peak traffic at 25 minutes post-event. If we project the NEIC spike out to 25 minutes,
the peak traffic would have been about 1000 requests/sec. Either way, the peak was
beyond the 300 requests/sec that the Denver web servers could handle.
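The projection in the paragraph above is simple linear arithmetic, and the sketch below reproduces it under stated assumptions. The growth rate (about 40 requests/sec per minute) is not given in the report; it is back-calculated from the 400 requests/sec figure at 10 minutes, treating traffic as climbing from roughly zero at the event time. With that slope, projecting to 25 minutes gives the report's 1000 requests/sec. The constant and function names are illustrative only.

```python
# Minimal sketch of a linear peak projection.
# ASSUMPTION: traffic climbs linearly from roughly zero at the event time.
# The slope is back-calculated from the report's 400 req/s at 10 minutes;
# the report itself does not state it.

SERVER_CAPACITY_RPS = 300      # what the Denver servers could handle (from the report)
SLOPE_RPS_PER_MIN = 400 / 10   # back-calculated growth rate (assumption)


def projected_peak(slope_rps_per_min: float, minutes_to_peak: float) -> float:
    """Request rate reached if growth stays linear until the peak."""
    return slope_rps_per_min * minutes_to_peak


scenarios = {
    "Southern California pattern": 10,          # peak at 10 minutes
    "Earthquake Hazards Program pattern": 25,   # peak at 25 minutes
}

for name, minutes in scenarios.items():
    peak = projected_peak(SLOPE_RPS_PER_MIN, minutes)
    print(f"{name} ({minutes} min): {peak:.0f} req/s, "
          f"{peak / SERVER_CAPACITY_RPS:.1f}x server capacity")
```

Because baseline traffic before the event was not zero, a better slope estimate would come from fitting a line to the first four minutes of the Figure 1 data rather than from the rounded figures.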
Figure 2: Twelve-hour traffic detail
Figure 2 shows the twelve hours around the event. Of note is the peak at 17:15.
Something appears to have happened around this time that allowed the server farm
to recover from its problems. This was about the time that people in the
Pasadena office observed that the site had become available again.
This peak is about 80% of the traffic on the Earthquake Hazards
Program web server at that time. If we apply the same ratio to that server's peak,
it would translate into a peak of about 600 requests/sec for the NEIC server.
Figure 3: Individual server traffic
Figure 3 shows the same data as Figure 2, broken down into the traffic reported by each
of the three servers. Note the red line for server wb01. This
server apparently functions as the load balancer for the whole server farm,
as well as being one of the servers itself, and its traffic does not look
like that of the others. There is even a 'hole' in the log file indicating that
this machine serviced no traffic at all from about 14:50 until
16:40. This indicates that there was some sort of severe problem
on this machine from the time of the event until about 17:15.
Reports from the system administrators suggest that the machine
probably ran out of available network sockets. This can cause a
Unix system to go catatonic for long periods of time, and could
account for the strange behavior shown on the graph. The apparent
recovery at 17:15 probably came when the level of incoming traffic
dropped into the range that wb01 could handle. At that point it was able
to resume its role of directing traffic to the other two servers. The traffic
on the whole site then increased, and all the servers' traffic lines looked normal.
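A 'hole' like the one in the wb01 log can be found mechanically by sorting the log timestamps and flagging any silence longer than a chosen threshold. The sketch below assumes the access log uses the NCSA Common Log Format; the report does not say which format wb01 wrote. The function names, the five-minute threshold, and the sample lines are all hypothetical.

```python
from datetime import datetime, timedelta

# ASSUMPTION: access-log lines use the NCSA Common Log Format, whose
# timestamp looks like [28/Feb/2001:14:50:03 -0700]. The real wb01 log
# format is not described in the report.
CLF_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_clf_time(line: str) -> datetime:
    """Extract the bracketed timestamp from one Common Log Format line."""
    start = line.index("[") + 1
    end = line.index("]", start)
    return datetime.strptime(line[start:end], CLF_TIME_FORMAT)


def find_log_gaps(lines, min_gap: timedelta):
    """Return (last_before, first_after) pairs around silences longer than min_gap."""
    times = sorted(parse_clf_time(line) for line in lines)
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > min_gap]


# Hypothetical sample lines, not taken from the real wb01 log.
sample = [
    '10.0.0.1 - - [28/Feb/2001:14:49:58 -0700] "GET / HTTP/1.0" 200 5120',
    '10.0.0.2 - - [28/Feb/2001:14:50:03 -0700] "GET / HTTP/1.0" 200 5120',
    '10.0.0.3 - - [28/Feb/2001:16:40:11 -0700] "GET / HTTP/1.0" 200 5120',
    '10.0.0.4 - - [28/Feb/2001:16:40:15 -0700] "GET / HTTP/1.0" 200 5120',
]

for before, after in find_log_gaps(sample, timedelta(minutes=5)):
    print(f"no requests logged from {before:%H:%M:%S} to {after:%H:%M:%S}")
```

The threshold is a judgment call: on a server under heavy load even a one-minute silence is suspicious, while on a lightly used server the same gap may be normal.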
Ultimately, there is no sure way to know exactly how much traffic the NEIC
site received on February 28, but a reasonable guess would be that the
peak traffic was about 500-1000 requests/sec. That figure should be
seen as a bare minimum capacity requirement for this server.
Another problem for the NEIC site is that the bandwidth for
the Denver Federal Center was observed to be saturated during the
peak load periods. This indicates that adding server capacity will not
help unless the new capacity is located in another facility with more available
bandwidth, or unless the bandwidth for the Denver facility is increased.
To have reasonable assurance of surviving future traffic spikes, the
web server should really be built to handle at least 2000
requests/sec, and it should have at least 100 Mb/s of bandwidth
available.
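Request rate and bandwidth are linked through the average response size, which is why a saturated network link caps what additional servers could deliver. The sketch below makes that relationship explicit. The report does not give an average response size, so the 6,000-byte value is a hypothetical placeholder; the sketch also ignores TCP/IP and HTTP header overhead, so real requirements would be somewhat higher. The function names are illustrative.

```python
# Minimal sketch relating request rate, response size and link bandwidth.
# ASSUMPTION: AVG_RESPONSE_BYTES is a hypothetical placeholder; the real
# value would come from the server logs (total bytes / total requests).
# Protocol overhead (TCP/IP and HTTP headers) is ignored.

AVG_RESPONSE_BYTES = 6_000  # hypothetical average response size in bytes


def required_mbps(requests_per_sec: float, avg_response_bytes: float) -> float:
    """Bandwidth in megabits/sec needed to deliver the given request rate."""
    return requests_per_sec * avg_response_bytes * 8 / 1_000_000


def link_limit_rps(link_mbps: float, avg_response_bytes: float) -> float:
    """Highest request rate a link can carry before it saturates."""
    return link_mbps * 1_000_000 / (avg_response_bytes * 8)


def site_capacity_rps(server_rps: float, link_mbps: float,
                      avg_response_bytes: float) -> float:
    """The site can serve no more than its slower component allows."""
    return min(server_rps, link_limit_rps(link_mbps, avg_response_bytes))


print(f"{required_mbps(2000, AVG_RESPONSE_BYTES):.0f} Mb/s needed for 2000 req/s")
print(f"{link_limit_rps(100, AVG_RESPONSE_BYTES):.0f} req/s fit in 100 Mb/s")
print(f"site limit: {site_capacity_rps(2000, 100, AVG_RESPONSE_BYTES):.0f} req/s")
```

With this placeholder size, a 100 Mb/s link just covers 2000 requests/sec; if responses averaged larger, the link rather than the servers would once again be the bottleneck.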
Additional Information:
Analog statistics report for February 28, 2001.
- Stan Schwarz
- Honeywell Technical Services
- Southern California Seismic Network Contract
- Pasadena, California