M5.6 Alum Rock Event, October 30, 2007

Observations on web server backend performance

The Earthquake Hazards Program web sites all share a similar architecture. Backend servers serve content to the Akamai EdgeSuite caching servers, which in turn serve the pages to the public. In practical terms, this means that we never see about 90% of our web traffic: it is served by the Akamai servers, and only about 10% of requests come back to our servers.

During a post-earthquake web traffic surge, the shape of the curve is always the same. There is a nearly vertical rise up to peak traffic, which generally occurs five minutes after the event. Typically, a widely felt earthquake will cause traffic on the web site to increase by a factor of about 1,000. Following the peak, traffic falls off along an exponential decay curve. The traffic on the backend servers shows the same shape, just at a lower level. Here are some graphs of traffic on the backend servers after recent events.
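As a rough illustration of the surge shape described above, here is a minimal Python sketch. It is not fitted to any measured data. The five-minute time to peak and the factor of 1,000 come from the text; the baseline of 1 hit/sec, the linear rise, the 30-minute decay constant, and the function names are assumptions made only for this example.

```python
import math

def surge_traffic(t_min, baseline=1.0, peak_factor=1000.0,
                  t_peak=5.0, decay_min=30.0):
    """Modelled public hits/sec, t_min minutes after the event.

    baseline and decay_min are illustrative assumptions, not measurements.
    """
    peak = baseline * peak_factor
    if t_min <= 0:
        # Before the event: normal traffic.
        return baseline
    if t_min < t_peak:
        # Nearly vertical rise, approximated here as a straight line.
        return baseline + (peak - baseline) * (t_min / t_peak)
    # After the peak: exponential decay back toward the baseline.
    return baseline + (peak - baseline) * math.exp(-(t_min - t_peak) / decay_min)

def backend_traffic(t_min, backend_share=0.10, **kwargs):
    """Backend hits/sec: same shape as the public curve, at a lower level."""
    return backend_share * surge_traffic(t_min, **kwargs)

if __name__ == "__main__":
    for t in (0, 2.5, 5, 15, 35, 65):
        print(f"t={t:>4} min  public={surge_traffic(t):8.1f}  "
              f"backend={backend_traffic(t):7.1f}")
```

A healthy backend server should trace a scaled-down copy of the public curve, which is the property the graphs discussed below are checked against.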

First off, here is the backend traffic on quake.wr during the Oct 30 traffic surge. The backend traffic is directly proportional to the public traffic, and the peak is sharp. The quake.wr web site was observed to be responsive the whole time.

This is the backend traffic on earthquake.usgs.gov for the Sep 2 M4.7 event near Lake Elsinore. This event generated a modest traffic surge of about 1,900 hits/sec on the public side, which translated into a backend surge peaking at 250 hits/sec. Usually, the backend traffic is about 5-10% of the public traffic. Note that the peak is not sharp, which indicates that the server was struggling to keep up with demand.
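The ratio of backend to public traffic for this event can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function names are made up for the example, and the 5-10% bounds are the usual range quoted above.

```python
def backend_fraction(public_peak, backend_peak):
    """Fraction of public peak traffic that reached the backend server."""
    if public_peak <= 0:
        raise ValueError("public_peak must be positive")
    return backend_peak / public_peak

def within_usual_range(fraction, low=0.05, high=0.10):
    """True if the fraction falls in the usual 5-10% backend share."""
    return low <= fraction <= high

if __name__ == "__main__":
    # Sep 2 M4.7 Lake Elsinore: about 1,900 hits/sec public, 250 hits/sec backend.
    frac = backend_fraction(1900, 250)
    print(f"backend share: {frac:.1%}")
    print("within usual 5-10% range:", within_usual_range(frac))
```

For the Lake Elsinore figures, the share works out to roughly 13%, a little above the usual range.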

Here is the earthquake.usgs.gov backend traffic from the Aug 9 M4.6 Simi Valley event. The public traffic surge peaked at 985 hits/sec, which is not very high compared to past surges. Note that the peak is ragged rather than sharp, and is visibly clipped with a flattened top. This is a clear indication that the server could not keep up with demand. The top of the clipped peak is at about 133 hits/sec, which is fairly close to the 115 hits/sec at which the server was observed to saturate after the Alum Rock event.
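The difference between a sharp peak and a clipped one can also be tested numerically instead of by eye. The sketch below is one simple heuristic, not the method used to produce the graphs: it flags a series as clipped when several consecutive samples sit within a small tolerance of the maximum. The tolerance, the run length, and both sample series are invented for illustration and are not measured data.

```python
def is_clipped(samples, tolerance=0.05, min_run=3):
    """Return True if the peak looks flattened.

    A peak is treated as clipped when at least min_run consecutive
    samples lie within `tolerance` (as a fraction) of the maximum.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    threshold = max(samples) * (1.0 - tolerance)
    run = longest = 0
    for value in samples:
        if value >= threshold:
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest >= min_run

if __name__ == "__main__":
    # Synthetic hits/sec series, invented for this example.
    sharp = [5, 40, 250, 180, 130, 95, 70, 50]
    flat_top = [5, 60, 128, 133, 131, 133, 129, 90, 60]
    print("sharp peak clipped?", is_clipped(sharp))
    print("flat-topped peak clipped?", is_clipped(flat_top))
```

A check like this could run against the backend traffic logs after each event, so that a short saturation is noticed even when the site appears to recover quickly.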

These graphs indicate that the earthquake.usgs.gov backend servers failed after both the Simi Valley and Lake Elsinore events. These events were not particularly large traffic surges, so the server was only overwhelmed for a short time, and the failure was less obvious than the two hours of downtime after the Oct 30 Alum Rock earthquake.

Our servers have survived traffic surges larger than the Aug 9 and Sep 2 events in the past. This indicates that the real problem after the Alum Rock event is that the server has been severely handicapped by the switch to dynamic pages.

Stan Schwarz
Honeywell Technical Services
Southern California Seismic Network Contract
Pasadena, California
November 4, 2007