When it comes to things like serving customers, business leaders are fond of saying that nothing less than 100 percent will do. So it seems natural to take that same approach with your company’s website: if you can’t guarantee 100 percent uptime, come as close as possible, say 99.999 percent, or less than six minutes of unplanned downtime a year.
That may sound like a logical goal. But what does it take to guarantee near-perfect uptime? The only way is to have backups for everything that could conceivably go wrong. You’ll need backup servers, and/or servers with multiple disk drives in a redundant array of independent disks (RAID) configuration. You also need multiple instances of important applications and databases. Your connection to the Internet could fail, so it’s smart to have more than one provider, a practice known as “multihoming.” Redundant servers and software won’t be much use without electricity, so your equipment should be protected with an uninterruptible power supply (UPS) device, which can run your servers on battery power for a couple of hours, long enough to get a generator — which you also need — up and running.
Even all this may not prevent an outage in a hurricane or earthquake. So, to truly guarantee uptime, you should have a second set of servers with multiple disks and uninterruptible power in a different geographic area, ready to take over in case of need.
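The reasoning behind all this redundancy can be put into rough numbers. The sketch below is only an illustration, and it assumes that each copy fails independently and that failover is instant and flawless, which real systems rarely manage. The function name `combined_availability` is made up for this example.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components.

    Assumes failures are independent and failover is perfect, so the
    system is down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - single) ** copies


# Two servers that are each up 99 percent of the time
print(f"{combined_availability(0.99, 2):.4%}")
# Three such servers
print(f"{combined_availability(0.99, 3):.4%}")
```

Under those assumptions, two 99 percent servers together reach roughly 99.99 percent. A hurricane or earthquake breaks the independence assumption, because every machine in the same building fails together. That is why the second set of servers has to sit in a different geographic area.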
Large enterprises do all of this as a matter of course. Should smaller companies follow suit, putting as many of these protections in place as their budgets allow? No, according to David Heinemeier Hansson, partner in the software firm 37signals and creator of the popular Ruby on Rails software development framework. “Companies tend to emphasize uptime to the detriment of other things,” he says. “Unless you have a very large number of users, uptime doesn’t matter as much as other things, such as innovation.”
Most experts agree that 99 percent uptime, which works out to a total of 3.65 days of outage a year, is unacceptably bad. So it may make sense to seek better performance than that, but the closer you get to perfection, the more it will cost. “The expense of going from 99 percent to 99.59 percent can be astronomical,” Hansson says.
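To see how quickly the downtime budget shrinks with each added "nine," the arithmetic can be sketched in a few lines. This is a minimal illustration that assumes a 365-day year. The function name and the list of percentages are chosen for the example.

```python
# Convert an uptime percentage into the yearly downtime it allows.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(uptime_percent: float) -> float:
    """Return the minutes of downtime per year allowed at a given uptime."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_YEAR


for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(pct)
    days = minutes / (24 * 60)
    print(f"{pct}% uptime allows {minutes:.2f} minutes/year ({days:.3f} days)")
```

At 99 percent the budget is 5,256 minutes, or 3.65 days. At 99.999 percent it is about 5.26 minutes, which is the "less than six minutes" figure.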
It can have an unexpected impact on future costs as well, notes Dirk Morris, founder and CTO of Untangle, which provides open source gateway appliances. “A typical scenario for a small business is to have some kind of database-driven Web application for sales,” he says. “To avoid having that go down, they might put in a second instance of the same database. Now you have an extra layer of complexity in your system, and it’s much harder to change or add anything. You might have better uptime, but you’ve lost flexibility.” Because of this tradeoff, many companies wind up regretting the backups they’ve put in place, he says.
Outsourcing for uptime
Small businesses don’t necessarily have to settle for less than perfect uptime, argues Dan Ushman, co-founder and vice president of marketing at SingleHop, Inc. SingleHop provides managed hosting, and Ushman claims that for clients of companies like his, 100 percent uptime is indeed possible, because the service can provide the many layers of redundancy required. In fact, Ushman says most small businesses don’t spend enough on uptime. “The biggest mistake small businesses make is to go with shared hosting, which may just cost $20 a month,” he says. “Then you’re one of 500 accounts on one server, and any of the other accounts can cause the server to crash and cause downtime for everyone else.”
It’s certainly true that hosting services have more redundancy, more expertise, and better monitoring than most small businesses, allowing them to offer a higher percentage of uptime. But, Hansson notes, hosting providers don’t put much financial commitment behind their uptime guarantees; most offer only a partial reimbursement of their hosting fee for the time the service was down. “We were actually in this situation, and the payments you get back are not substantial,” Hansson says.
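A back-of-the-envelope calculation shows why such reimbursements tend to be small. The sketch below assumes a simple prorated credit, meaning the share of the monthly fee that matches the hours the service was down. Actual guarantee terms vary from provider to provider, and the fee and outage length here are hypothetical.

```python
def prorated_credit(monthly_fee: float, downtime_hours: float,
                    hours_in_month: float = 720.0) -> float:
    """Credit owed if the provider refunds only the time it was down.

    Assumes a plain prorated refund over a 30-day (720-hour) month;
    real hosting agreements may use different formulas or caps.
    """
    return monthly_fee * downtime_hours / hours_in_month


# Hypothetical example: a $500/month plan with a four-hour outage
credit = prorated_credit(500.0, 4.0)
print(f"Credit: ${credit:.2f}")
```

In this hypothetical case a four-hour outage earns back less than three dollars on a $500 bill, far less than an afternoon of lost sales is likely to cost.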
Doing downtime right
How much cost and complexity to take on to avoid possible downtime is a question with a different right answer from company to company. But in every case, there are ways to minimize the business effect of an outage. Unless you’re certain your site can never go down, it’s worth spending a little time and energy to prepare in case it does. Here are some tips that can help you get started:
1. Find an alternate way to communicate with customers. Even if customers can’t actually get to your site, they should get more than their browser’s error message when they try. This can be as simple as a webpage that apologizes for the outage and lets customers know you’re working to fix it. If your site is down for more than a brief time, Hansson recommends redirecting traffic to that page instead. (Needless to say, it should be hosted away from the servers you normally use.)
37signals takes this one step further with a blog that constantly reports on its site’s status. During an outage, 37signals staff post updates every 15 minutes or so, reporting on their progress getting the site back up.
2. Apologize and explain. Once the outage is resolved, give your customers a postmortem on what went wrong, and tell them what you did to fix the problem. Make sure to apologize for the inconvenience your downtime undoubtedly caused.
Whatever you do, don’t fudge the facts or use anything that sounds like double-speak, such as Amazon’s recent description of its outage as an “availability event.” “Admit your mistakes up front, and in human language. That’s all people really want,” Hansson says. “If you’re making it sound like you didn’t do anything wrong, if you can’t call an outage an outage, then you’re not trying hard enough.”
3. Treat your outage as an opportunity. “When people go to a hotel and everything goes smoothly, they’ll give it an OK rating,” Hansson says. “But if there’s an issue and the hotel fixes it, they’ll give it a better rating.”
There’s a lesson in this, he says: if you have an outage, but customers see that you fixed it as quickly as possible, were honest about the event, and apologized for the trouble, they may appreciate you more than if you never had the outage at all. “When we have a downtime issue we respond to it honestly, and we get positive feedback,” he says. “It’s a unique opportunity to bond with your customers. If you handle it right, they’ll come away liking you more, not less.”
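For the first tip, the apology page itself can be very small. What follows is a minimal sketch using only Python's standard library, not a description of how 37signals or anyone else does it. The page text, the `make_server` helper, and the 15-minute `Retry-After` value are all assumptions made for illustration. The page answers every request with HTTP status 503 (Service Unavailable), so browsers and search engines know the problem is temporary.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical apology page; replace the wording with your own.
OUTAGE_HTML = b"""<!doctype html>
<html>
  <head><title>We'll be right back</title></head>
  <body>
    <h1>Sorry, our site is down right now.</h1>
    <p>We're working to fix it and will post updates here.</p>
  </body>
</html>
"""


class OutageHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the same apology page."""

    def do_GET(self):
        self.send_response(503)  # Service Unavailable
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(OUTAGE_HTML)))
        # Suggest that clients try again in 15 minutes (value in seconds).
        self.send_header("Retry-After", "900")
        self.end_headers()
        self.wfile.write(OUTAGE_HTML)


def make_server(host: str = "", port: int = 8080) -> HTTPServer:
    """Create the outage-page server without starting it."""
    return HTTPServer((host, port), OutageHandler)


# To start serving, call: make_server().serve_forever()
```

As the tip says, this page should run on a machine away from your normal servers, with traffic pointed at it only while the main site is down.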