While we await Amazon’s autopsy on why their EC2 IaaS (Infrastructure as a Service) went down the toilet for 36 hours, there has been a lot of talk about making sure that users check their hoster’s SLA (Service Level Agreement) to see what uptime it guarantees. But that is missing the point. SLAs are basically an insurance policy that pays out if your site goes down, but in the same way that life insurance doesn’t bring the insured back to life, a hoster missing its SLA doesn’t bring your site back online. And like many insurance policies, the small print will always get you when you try to claim.
Meanwhile, let’s just check the maths again on what needs to happen if you want the magic “five nines” of uptime:
36 hours down a year = 99.59% uptime
53 minutes down a year = 99.99% uptime
5 minutes down a year = 99.999% uptime
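The arithmetic behind these figures is easy to check. Here is a minimal sketch in Python, assuming a 365-day year; the function names are illustrative and not taken from any library.

```python
# Assumption: a non-leap year of 365 days.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def uptime_percent(downtime_minutes: float) -> float:
    """Uptime percentage for a given number of minutes down per year."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


def allowed_downtime_minutes(uptime_pct: float) -> float:
    """Minutes of downtime per year permitted by a given uptime percentage."""
    return MINUTES_PER_YEAR * (1.0 - uptime_pct / 100.0)


if __name__ == "__main__":
    # A 36-hour outage, expressed in minutes.
    print(f"36 hours down: {uptime_percent(36 * 60):.2f}% uptime")
    # Downtime budgets for four and five nines.
    for target in (99.99, 99.999):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {budget:.2f} minutes down a year")
```

Run as a script, this reproduces the figures above to within rounding: a 36-hour outage works out to roughly 99.59% uptime, and the budgets for four and five nines come to roughly 53 and 5 minutes a year.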
No matter what Amazon does to learn from this outage, and no matter what SLA you negotiate with them, there is no way that EC2 is going to get to 99.999%. In fact, there is no way ANY one hosting solution will achieve 99.999%. The only way to get to 99.999% is to have (at least) two hosting solutions from different suppliers, be they cloud services or your own servers, and to be able to fail over between them automatically.
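To see why a second supplier changes the picture, here is a rough sketch of the combined uptime of several hosting solutions. It rests on two optimistic assumptions that are not from any SLA: the suppliers fail independently of each other, and failover is instant and never fails itself. The function name is illustrative.

```python
def combined_availability(availabilities):
    """Probability that at least one supplier is up.

    Assumes independent failures and instant, perfect failover,
    so the site is down only when every supplier is down at once.
    """
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down


if __name__ == "__main__":
    # Two suppliers that each manage 99.9% on their own.
    print(f"{combined_availability([0.999, 0.999]):.6%}")
    # Two suppliers that each had a 36-hour year (about 99.59%).
    print(f"{combined_availability([0.9959, 0.9959]):.6%}")
```

Under these assumptions, two suppliers at 99.9% each clear five nines comfortably, while two suppliers each having a 36-hour year land just short of it. In practice failover takes time and outages are not always independent, so real figures will be lower than this sketch suggests.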