Amazon outage – the view from the mainstream press

Tuesday, 3 May 2011

When a story that gets the IT world excited actually makes it into the mainstream press, then for once the IT world was right to get excited.

So when The Economist covered Amazon’s and Sony’s problems last week, it was proof that cloud computing (and its teething problems) had broken out of the IT world and into general business consciousness.

Interesting to note that The Economist’s main recommendation was that SaaS vendors should not rely on just one hosting supplier, as I prescribed in my last blog post.


More on Amazon outage – SLAs are not the point

Wednesday, 27 April 2011

While we await Amazon’s autopsy on why their EC2 PaaS (Platform as a Service) went down the toilet for 36 hours, there has been a lot of talk about making sure that users check their hoster’s SLA (Service Level Agreement) to see what uptime it guarantees. But that is missing the point. SLAs are basically an insurance policy that pays out if your site goes down, but in the same way that life insurance doesn’t bring the insured back to life, a payout when the hoster misses its SLA doesn’t bring your site back online. And like many insurance policies, the small print will always get you when you try to claim.

Meanwhile, let’s just check the maths again on what needs to happen if you want the magic “five nines” of uptime:

36 hours down a year = 99.59% uptime
53 minutes down a year = 99.99% uptime
5 minutes down a year = 99.999% uptime
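
For anyone who wants to check these figures, here is a minimal sketch of the arithmetic in Python. It assumes a 365-day year (8,760 hours), and the function names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def uptime_percent(downtime_hours: float) -> float:
    """Return yearly uptime as a percentage for a given number of hours down."""
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


def allowed_downtime_minutes(target_percent: float) -> float:
    """Return the minutes of downtime per year that an uptime target permits."""
    return (1.0 - target_percent / 100.0) * HOURS_PER_YEAR * 60


if __name__ == "__main__":
    # The 36-hour outage, expressed as a yearly uptime figure.
    print(f"36 hours down: {uptime_percent(36):.2f}% uptime")
    # The downtime budget for four nines and five nines.
    for target in (99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:.1f} minutes down a year")
```

Run as a script, this puts the 36-hour outage at 99.59% and gives budgets of roughly 53 and 5 minutes a year for the other two rows.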

No matter what Amazon does to learn from this outage, and no matter what SLA you negotiate with them, there is no way that EC2 is going to get to 99.999%. In fact, there is no way ANY one hosting solution will achieve 99.999%. The only way to get to 99.999% is to have (at least) two hosting solutions from different suppliers, be they PaaS or your own servers, and to be able to fail over between them automatically.
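
As a rough illustration of why a second supplier changes the maths, the sketch below works out combined uptime when the service stays up as long as at least one supplier is up. It rests on two loud assumptions that are not in the text above: outages at different suppliers are statistically independent, and failover is instant. The function name is hypothetical.

```python
from math import prod


def combined_uptime_percent(uptimes_percent):
    """Uptime when the service is up as long as at least one supplier is up.

    Assumes supplier outages are independent and failover is instant.
    Both are simplifications, so treat the result as an upper bound.
    """
    # Probability that every supplier is down at the same moment.
    all_down = prod(1.0 - u / 100.0 for u in uptimes_percent)
    return 100.0 * (1.0 - all_down)


if __name__ == "__main__":
    # One supplier, two suppliers at 99.59%, two suppliers at 99.9%.
    for uptimes in ([99.59], [99.59, 99.59], [99.9, 99.9]):
        print(uptimes, f"-> {combined_uptime_percent(uptimes):.4f}%")
```

Under those assumptions, two suppliers that each manage only 99.59% come out at about 99.998%, still just short of five nines, while two at 99.9% clear it. Real failover takes time and outages are not perfectly independent, so actual figures will be lower, which is why the "at least" matters.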

Amazon Cloud Outage – The lessons

Friday, 22 April 2011

Over the past couple of years, well-meaning people in the cloud industry have told me “You ought to host on a PaaS (Platform-as-a-Service) like Amazon or Google. Your customers would be reassured by having such a big name behind you, and it solves all the scalability issues.” And I’ve replied that, call us old-fashioned, but we like to know where our customers’ data is and we like to have control of the technical environment we’re running on, and you only get both if you own and maintain your own servers. There is also the mostly-completely-ignored issue of complying with UK & European law and not holding data on EC citizens outside of the EC.

This blog post is not about schadenfreude, rejoicing in Amazon EC2’s two-day outage that has taken a swathe of major cloud applications down, including some of our competitors. This is a plea (yet again!) for simplicity in IT design.

It is a truism that the more complex a system, the greater the chance that something will go wrong. The more firewalls, load balancers, routers and software layers between the customer’s browser and your application, the greater chance that something will fail, be it as simple as an engineer in the datacentre pulling out the wrong cable (as happened to us a few months ago).

The other reason we like hosting our own servers is that, if they go down, we have a team of our own people working flat out, focussed 100% on getting our system back and not 1,000 other systems at the same time. That is a much easier job, especially as we’ve made sure that we have as few layers as possible between our boxes and the outside world.

We also have a backup system on standby with real-time data sync so that if our main datacentre does go down, we can fail over in about 20 minutes.

So, cloud developers! Rack your own boxes and keep the IT simple. Maintaining servers is not that hard, and you’ll get much better scalability and efficiency by specifying your own software and hardware platform. And your customers won’t be left without an application that they have paid for.

Just make sure there are no Armenian old ladies near the building.

Fog Computing

Sunday, 1 February 2009

I was interested to read the controversy this week about Oracle offering their customers the ability to run their CRM On Demand product on their own servers (see Eric Krangle’s article in Silicon Valley Insider, and also Phil Wainewright’s blog). The comments have ranged across challenging Oracle’s claim that their product is really SaaS (if the product is installed on your own box, what’s the difference between this and conventional in-house software apart from the pricing model?), insinuating that one of Oracle’s motivations is so that the press will never notice any downtime (if one customer’s server crashes for a day only one customer is affected, whereas if a large shared platform goes down for an hour everybody screams and the press pick it up), and picking up on Europe’s stricter data protection laws (legally, if you want to store the personal details of European citizens outside of the EU you need individual permission from each and every one of them, not that many people seem to know or care about this).

As Cloud Computing becomes the must-have technology for this and the next decade, we’ll see lots of more traditional vendors claiming that their offering is Software-as-a-Service, all with their own definition of what Cloud Computing is about: pure play vendors with browser-based applications and no (or minimal) local software, shared tenancy and monthly pricing (NetSuite, Kashflow, Really Simple Systems); browser-based software offered on in-house or single servers (Oracle); local software with the data in-house or hosted (Microsoft); traditional software running on in-house servers but accessed through the likes of Remote Desktop Connection. Only when the fog around Cloud Computing clears and customers work out what they want and at what price will the terminology and offerings stabilise.