The Gmail Outage: How Reliable Is Reliable? (UPDATED)

Posted by: Stephen Wildstrom on February 24, 2009

Update, Thursday Feb. 26

I can’t claim credit for this, but Google has followed my suggestion and implemented a status dashboard for its various services. (Thanks to TechCrunch for the tip.)

———————

Today’s Gmail outage naturally raised questions about the wisdom of trusting mission-critical applications to the vagaries of cloud computing. But just how bad a blow to Gmail’s reliability was the outage, which Google puts at 2 1/2 hours but which user reports made seem somewhat longer? The answer: Not that bad, as long as it doesn’t happen often.

In the good old days, AT&T used to promise “five nines” of reliability. That meant you could expect your phone service to be up 99.999% of the time, a standard that allowed for just a bit more than 5 minutes of downtime a year.

But five nines is really, really expensive to deliver and led to a phone network that was, by most standards, massively over-engineered. Google's service level agreement for paid business Gmail promises "three nines," or 99.9% uptime. That actually leaves room for nearly 9 hours of outage a year, or three failures of the same magnitude as today's. Amazon makes a slightly higher promise for its Elastic Compute Cloud service, 99.95% uptime, or nearly 4 1/2 outage hours a year.
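
For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes each percentage is measured over a full 365-day year, which is my simplification; the actual agreements may define their measurement period differently. The function name and labels are made up for illustration.

```python
# Assumption: uptime is measured over a 365-day year (leap years ignored).
HOURS_PER_YEAR = 365 * 24  # 8,760 hours


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Hours of downtime per year permitted by a given uptime percentage."""
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


# The three uptime figures mentioned in the post.
for label, pct in [
    ("AT&T five nines", 99.999),
    ("Amazon EC2", 99.95),
    ("Gmail three nines", 99.9),
]:
    hours = allowed_downtime_hours(pct)
    print(f"{label}: {pct}% uptime allows {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) of downtime a year")
```

Under that assumption, three nines works out to 8.76 hours a year, 99.95% to about 4.4 hours, and five nines to a little over 5 minutes, which matches the rounded figures above.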

What's more disturbing than the Gmail outage is Google's lack of transparency about it. The most recent post on Google's official blog declares the problem over, apologizes for the inconvenience, and explains why some users had to prove to Google that they were human beings before being allowed to log in to their Gmail accounts. But it provides no explanation whatever of what went wrong or what had been done to fix it or prevent its recurrence.

Amazon, by contrast, maintains a Service Health Dashboard for its Amazon Web Services with both a report on the current status of each service and a 35-day history of any problems. (I can't tell you how good the reports are because the current time frame shows no incidents.) At a minimum, Google should maintain a similar site for the folks who have come to depend on its services.

Reader Comments

Bob

February 24, 2009 9:09 PM

Perhaps Google heard you? http://gmailblog.blogspot.com/2009/02/update-on-todays-gmail-outage.html now has an explanation of the outage.

Steve

February 25, 2009 3:16 PM

This post is gibberish. Those of us who work in most companies (small or large, tech or non-tech) deal with at least this much in terms of email outages, be it server or network-related. Stop the drama related to "cloud" (i.e., on demand, i.e., SaaS) computing. Stuff breaks, IT fixes it, life goes on.

Raymond

August 30, 2009 9:27 AM

Steve, you are full of &^%$. Do you understand where your data is or what it is being used for using SaaS?
