The M4.9 earthquake centered near Yucaipa on the eastern end of the Los Angeles metropolitan area was widely felt over a large part of Southern California. This created a large surge of traffic on the earthquake program web servers. Here is a report on this surge.
|Figure 1||Figure 2|
|Figure 3||Figure 4|
Figure 1 shows the total traffic on all the Earthquake Hazards Program web sites for June 16th. The traffic surge is clearly visible on the graph, and this surge set a new record for our web sites. Note that the peak bandwidth usage was about 1.2 gigabits per second. Earthquakes felt by large numbers of people generally generate surges that peak between five and ten minutes after the origin time. The peak for this event occurred seven minutes after the origin time.
Figures 2, 3, and 4 show the same information for the Earthquake Program site at earthquake.usgs.gov, the Western Region site at quake.wr.usgs.gov, and the Pasadena sites at pasadena.wr.usgs.gov. All the graphs look similar, showing the nearly vertical rise to the peak, followed by an exponential decay.
For more detail about the day's web traffic, there are statistics reports covering the 24 hours for each site:
For the most part, the web sites worked well during this surge, with the exception of the Pasadena site. It became unresponsive about five minutes after the event, and both servers for the site were essentially catatonic. I forced a crash and reboot on both servers, which brought them back briefly, but the load average on both climbed rapidly to about 70, and the servers became catatonic again. After a second reboot, I edited the Apache configuration to lower the 'MaxClients' limit to 256. By this time, the traffic level had decayed somewhat, and the servers were able to run again.
The culprit in this case is the Community Internet Intensity Map (CIIM), also known as "Did you feel it?". Users can fill out an online questionnaire to report their experience in the earthquake. The form is submitted for processing by a Perl CGI script, which calculates their observed Mercalli Intensity. The processing of these questionnaires has caused trouble for us on a previous occasion. Figure 5 illustrates the problem we experienced this time. The first line on the graph shows the number of questionnaires submitted per minute. In testing, our servers were able to process 20 questionnaires per second, and with two servers we should have been able to process 40 per second. But this testing was done with no other load on the server. With all the other attendant activity on the server after an earthquake, the processing topped out at around 8-10 per second. The peak incoming rate was about 33 per second, which quickly overwhelmed the servers.
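To make the arithmetic concrete, here is a rough sketch of how quickly a backlog of unprocessed questionnaires builds when submissions arrive faster than they can be processed. The rates come from the figures above. The function name, the 9 per second midpoint, the reading of 8-10 per second as the combined throughput of both servers, and the assumption that the peak rate holds steady for a full minute are illustrative choices, not measurements.

```python
def backlog_over_time(arrival_per_sec, service_per_sec, seconds):
    """Return the number of waiting requests at the end of each second,
    assuming constant arrival and service rates (a deliberate simplification)."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # The queue grows by the surplus of arrivals over completions,
        # and can never drop below zero.
        backlog = max(0.0, backlog + arrival_per_sec - service_per_sec)
        history.append(backlog)
    return history


PEAK_ARRIVAL = 33.0      # questionnaires per second at the peak (from the report)
OBSERVED_SERVICE = 9.0   # assumption: midpoint of the observed 8-10 per second
TESTED_SERVICE = 40.0    # two unloaded servers at 20 per second each (from testing)

# Hypothetical scenario: the peak arrival rate is sustained for 60 seconds.
loaded = backlog_over_time(PEAK_ARRIVAL, OBSERVED_SERVICE, 60)
unloaded = backlog_over_time(PEAK_ARRIVAL, TESTED_SERVICE, 60)

print(f"backlog after 60 s at observed rate: {loaded[-1]:.0f}")
print(f"backlog after 60 s at tested rate:   {unloaded[-1]:.0f}")
```

Under these assumptions the queue grows by about 24 requests every second at the observed rate, while at the tested rate it never forms at all. Each waiting request also ties up a server process, which is consistent with the climbing load average described earlier.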
The Earthquake Hazards web sites are served through the Akamai EdgeSuite service, which is a caching service. The Akamai service acts as an amplifier on our web servers, but it only acts on static content. Any kind of dynamic content processing still has to be done by our servers. This was the origin of the problem this time.
The most obvious solution to this problem is to throw hardware at it. The Pasadena web servers are a pair of single-processor Athlon machines running FreeBSD. One dates from 2000 and the second from 2001, so they are quite old by computer standards. It has been proposed to replace them with a pair of dual-Xeon machines similar to the ones that run the Earthquake Hazards Program and Western Region web sites. These machines have several times the processing power of the Pasadena servers, and they would probably have survived this traffic surge easily. Another possibility is to split off the CIIM processing to a completely separate set of machines. This would have the advantage that if CIIM processing were to still overwhelm the machines, it would not act as a millstone around the neck of a whole web site. Ultimately, the solution to this problem will probably use both of these ideas.
Overall, the Earthquake Program web sites performed quite well after this event. The size of the traffic surge shows that we are succeeding in our public outreach, and that people are aware of our sites as authoritative sources of earthquake information.