What Can IT Reasonably Expect in Cloud Reliability?

By Dan Sullivan August 17, 2012 12:50 AM

Piecing Together the Risks of Cloud Computing

Rather than expecting the unattainable, we are better off mitigating the risks of cloud computing.

You don’t need to be savvy with predictive analytics to know that an outage at Amazon, Microsoft or Twitter will generate a slew of articles, blog posts and tweets. Some of the coverage will focus on details of the outage (e.g., cause and duration), and other stories will read more like case studies of the impact on frustrated users.

Too few will address how the outage fits into the bigger picture of cloud reliability. A well-thought-out exception to the usual cloud-outage piece is Gene Marks’ article in Forbes, “Is the Cloud Really Ready for Business?”

A couple of Marks’ points stand out. He’s frustrated with stories about the cloud as a panacea.

“I could be more patient if this were a less mature technology or a technology that’s not constantly being jammed down my throat as the solution to all of my technology problems.”

He has a point: there is no shortage of positive reviews and predictions about the cloud. But there are also plenty of pieces on the cloud’s limitations, especially around cost and security.

The other point that struck me is more problematic.

“If The Cloud was ready for prime-time, services like Twitter, Amazon’s EC2, Salesforce.com, Google Talk and Microsoft Azure wouldn’t go down. Ever.”

That isn’t a realistic expectation.

Even if hardware engineers and software designers accounted for every possible point of failure in the components that make up the cloud, there would still be failures. Distributed systems like the cloud are complex systems with emergent properties that you can’t understand by reducing the system to its smallest components. Just because we understand the basic laws of mechanics and thermodynamics doesn’t mean we can accurately predict the weather every time we try.

A challenge with complex systems is predicting a future state of the system. It’s not a linear process. How a network responds to a peak in traffic depends on how a number of algorithms are designed, the state of every network device involved, and how other factors are changing the state of the network. Yes, designers can anticipate and plan for failures, but the way a device responds to a failure becomes another factor influencing the state of the system. The solution to one problem could actually exacerbate a problem in another part of the system.
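To make that last point concrete, here is a deliberately simple sketch of one such feedback loop: clients that retry failed requests. It is a toy model, not a description of any real outage. The function name `simulate`, its parameters and the numbers used below are all assumptions chosen for illustration.

```python
def simulate(base_load, capacity, max_retries, steps=50):
    """Toy discrete-time model of client retries (all values hypothetical).

    base_load   -- new requests arriving at every time step
    capacity    -- requests the service can complete per time step
    max_retries -- caps the retry backlog at base_load * max_retries
    Returns the offered load (new plus retried requests) at each step.
    """
    retries = 0.0
    history = []
    for _ in range(steps):
        offered = base_load + retries        # new traffic plus retried traffic
        served = min(offered, capacity)      # the service cannot exceed capacity
        failed = offered - served            # everything else fails...
        # ...and comes back as retries in the next step, up to the cap
        retries = min(failed, base_load * max_retries)
        history.append(offered)
    return history


if __name__ == "__main__":
    print("no retries:  ", simulate(120, 100, max_retries=0)[-1])
    print("with retries:", simulate(120, 100, max_retries=3)[-1])
```

In this model a service running 20 percent over capacity stays at that load when clients do not retry, but settles at several times its capacity once they do. The retry policy is a sensible answer to an occasional dropped request, yet under sustained overload it becomes part of the problem.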

It’s not likely we will reach the ideal state where complex systems never fail. Look at organisms like ourselves. We are the product of 3.8 billion years of evolution that started with simple organisms like bacteria that are far more complex than clouds. For example, we have evolved defensive systems to prevent failures caused by pathogens, but when those systems fail we suffer from the likes of autoimmune diseases. We can design systems to compensate for failure, but those systems are also subject to failure.
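A little arithmetic shows why a compensating system helps without ever getting us to "never." The sketch below assumes a primary component, a backup, and a failover mechanism that fail independently of one another. That independence is a simplifying assumption that real systems often violate, and the function name and figures are hypothetical.

```python
def availability_with_failover(primary, backup, failover_success):
    """Probability the service is up, given independent failures (an assumption).

    primary          -- probability the primary component is working
    backup           -- probability the backup component is working
    failover_success -- probability the switch-over itself works when needed
    """
    # Up if the primary works, or if it fails AND the failover succeeds
    # AND the backup is working.
    return primary + (1 - primary) * failover_success * backup


if __name__ == "__main__":
    print(availability_with_failover(0.99, 0.99, 1.0))
    print(availability_with_failover(0.99, 0.99, 0.9))
```

Under these assumptions, two 99 percent components with a flawless switch-over reach 99.99 percent. If the switch-over works only nine times out of ten, the figure drops to roughly 99.89 percent. Either way the result stays below 100 percent, because the mechanism that compensates for failure is itself subject to failure.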

Failure can occur at multiple levels in both technical and organic systems.  Rather than expect the unattainable, we’d be better off mitigating the risks with compensating measures outside the cloud.

Dan Sullivan is an author, systems architect, and consultant with over 20 years of IT experience with engagements in systems architecture, enterprise security, advanced analytics and business intelligence. He has worked in a broad range of industries, including financial services, manufacturing, pharmaceuticals, software development, government, retail, gas and oil production, power generation, life sciences, and education. Dan has written 16 books and numerous articles and white papers on topics ranging from data warehousing, cloud computing and advanced analytics to security management, collaboration, and text mining.

See here for all of Dan's Tom's IT Pro articles.

