Feb 28, 2001 Seattle Earthquake - NEIC Web Site Report

The M6.8 earthquake centered near Seattle, WA put the NEIC web site under severe load. The traffic quickly exceeded the capacity of the servers and rendered the site unavailable for some time after the event.

Two-hour traffic detail
Figure 1
The traffic on the web server started to increase within two minutes of the event, which is the usual pattern for earthquakes that are felt by large numbers of people. Figure 1 shows the two hours around the event. Note that the traffic increases at a constant rate for the first four minutes after the event, after which the increase slows. Because the servers were overloaded, the logged traffic after that point presumably understates the actual demand, so the peak has to be estimated by projecting the initial rate of increase. For felt earthquakes in Southern California, traffic normally increases to a peak about 10 minutes after the event. Given the rate of increase for the first four minutes, if we assume a peak at 10 minutes post-event, the peak would have been on the order of 400 requests/sec. The National Earthquake Hazards Program web site experienced its peak traffic at 25 minutes post-event. If we project the NEIC spike out to 25 minutes, the peak traffic would have been about 1000 requests/sec. Either way, the peak was beyond the 300 requests/sec that the Denver web servers could handle.
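The two projections above follow from a simple linear model. The sketch below is only an illustration of that arithmetic: the function name is hypothetical, and the ramp rate is derived from the report's own 400 requests/sec at 10 minutes rather than read from the logs.

```python
def projected_peak(ramp_rate_per_min: float, minutes_to_peak: float) -> float:
    """Project peak traffic (requests/sec), assuming traffic keeps growing
    linearly at the rate seen in the first minutes after the event."""
    return ramp_rate_per_min * minutes_to_peak


# Assumption: ramp rate implied by the report's figure of about
# 400 requests/sec at 10 minutes post-event.
RAMP_RATE = 400 / 10      # requests/sec gained per minute
DENVER_CAPACITY = 300     # requests/sec, as stated in the report

for minutes in (10, 25):
    peak = projected_peak(RAMP_RATE, minutes)
    print(f"peak at {minutes} min: {peak:.0f} requests/sec "
          f"({peak / DENVER_CAPACITY:.1f}x the Denver capacity)")
```

The linear assumption is the weak point of this estimate. Real demand may level off before the assumed peak time, so the results are best read as rough upper and lower bounds.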

Twelve-hour traffic detail
Figure 2
Figure 2 shows the twelve hours around the event. Of note is the peak at 17:15. The server farm appears to have recovered from its problems at about this time, which matches when people in the Pasadena office observed that the site had become available again. This peak is about 80% of the traffic on the Earthquake Hazards Program web server at that time. If we assume the same ratio held at the time of peak traffic, that would translate into a peak of about 600 requests/sec for the NEIC server.

Individual server traffic
Figure 3
Figure 3 shows the same line as Figure 2, along with the traffic reported by each of the three servers. Note the red line for server wb01. This server apparently functions as the load balancer for the whole server farm, as well as being one of the servers itself. Its traffic does not look like that of the others. There is even a 'hole' in the log file, indicating that this machine serviced no traffic at all from about 14:50 until 16:40. This points to some sort of severe problem on this machine from the time of the event until about 17:15. From the reports of the system administrators, it looks like the machine probably ran out of available network sockets. This can cause a Unix system to become unresponsive for long periods of time, and could account for the strange behavior shown on the graph. The apparent recovery at 17:15 was probably the point at which the level of incoming traffic dropped into the range that the wb01 server could handle. At this time, it was able to resume its role of directing traffic to the other two servers. The traffic on the whole site increased, and all the servers' traffic lines then looked normal.
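A 'hole' like the one in the wb01 log can be found mechanically by looking for long intervals between consecutive request timestamps. The following is a minimal sketch, assuming the timestamps have already been parsed out of the log file. The function name, the five-minute threshold, and the sample data are all made up for illustration and are not taken from the real logs.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, min_gap=timedelta(minutes=5)):
    """Return (start, end) pairs where consecutive requests are
    at least min_gap apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a >= min_gap]


# Synthetic timestamps for illustration only.
day = datetime(2001, 2, 28)
sample = [
    day.replace(hour=14, minute=48),
    day.replace(hour=14, minute=49),
    day.replace(hour=14, minute=50),
    day.replace(hour=16, minute=40),
    day.replace(hour=16, minute=41),
]

for start, end in find_gaps(sample):
    print(f"no requests logged from {start:%H:%M} to {end:%H:%M}")
```

On a busy server a much shorter threshold would be appropriate, since even a few seconds without a logged request is unusual during a traffic spike.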

Ultimately, there is no sure way to know exactly how much traffic the NEIC site received on February 28, but a reasonable guess would be that the peak traffic was about 500-1000 requests/sec. Capacity in that range should be seen as a bare minimum requirement for this server. Another problem for the NEIC site is that the bandwidth for the Denver Federal Center was observed to be saturated during the peak load periods. This indicates that adding server capacity will not help unless the new capacity is located in another facility with higher available bandwidth, or the bandwidth for the Denver facility is increased. In order to have a reasonable assurance of being able to survive future traffic spikes, the web server should really be built to deal with at least 2000 requests/sec, and it should have at least 100 Mb/s bandwidth available.
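The request rate and the bandwidth figure are linked by the average size of a response. As a rough check, the sketch below computes the largest average response size that fits the recommended numbers. It assumes 1 Mb/s means 1,000,000 bits/sec and ignores protocol overhead, and the function name is hypothetical.

```python
def max_avg_response_bytes(bandwidth_mbps: float, requests_per_sec: float) -> float:
    """Largest average response size (bytes) that a link can carry
    at the given request rate, ignoring protocol overhead."""
    bits_per_sec = bandwidth_mbps * 1_000_000
    return bits_per_sec / 8 / requests_per_sec


# Recommended targets from the report: 100 Mb/s and 2000 requests/sec.
budget = max_avg_response_bytes(100, 2000)
print(f"average response budget: {budget:.0f} bytes")
```

If the pages and images served during an event are larger on average than this budget, the link rather than the servers becomes the limit, which is the situation observed at the Denver Federal Center.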

Additional Information:

Analog statistics report for February 28, 2001.

Stan Schwarz
Honeywell Technical Services
Southern California Seismic Network Contract
Pasadena, California