When you use the cloud, you get what you pay for.
And if you’re expecting diligent support and quick resolution of technical problems, then – to state the obvious – choosing the cheapest support option might not be the best idea.
If you purchase virtual server capacity from a cloud provider, there’s a very good chance you’re a customer of the Amazon Elastic Compute Cloud (EC2). Amazon offers a “Premium Support” package at a price, but many customers opt to instead use the free support forums, with mixed results.
Amazon support forum administrators usually respond to posts within 10 or 12 hours, but problem resolution often takes days, according to a new analysis of the forums by a team of researchers from the University of Wisconsin-Madison and IBM Research.
Their paper – titled “A First Look at Problems in the Cloud” – examines 9,575 message threads posted in the Amazon support forums between August 2006 and December 2009, and will be presented at the Usenix technical conference in Boston later this month. (The paper refers to Amazon only as “a prominent IaaS provider,” but lead author Theophilus Benson confirmed to Network World that Amazon is the company that was analyzed.)
Connectivity, virtual image management and performance are common problems for Amazon EC2 users, and resolving those issues is often a headache, the researchers suggest.
Even though Amazon is highly experienced at operating its own data centers, customers may still face challenges operating their own virtual machines.
“In a typical commercial cloud, the cloud provider is mainly responsible for problems associated with its own infrastructure,” the authors write. “The provider monitors its physical resources such as servers, storage and network systems to provide reasonably stable resources up to the hypervisor level. While the cloud provider tries to ensure a highly available infrastructure, it typically does not provide guarantees on individual instance availability. Users should expect that the provided virtual resources may become unavailable at times, thus requiring users to restart their application instances on a new server.”
The researchers did not examine the effectiveness of Amazon’s Premium Support, which offers one-on-one problem resolution at two pricing levels, starting at $100 and $400 per month. Premium customers get unlimited support cases and help building and running applications on the Amazon cloud infrastructure.
While this is a key limitation of the study, the researchers said “we do believe that our preliminary evaluation sheds light on the most common problems faced by a typical user of IaaS clouds.”
Over the three-year period studied, 166 administrators participated in the threads, responding to 60% of problems in less than nine hours but sometimes taking nearly 20 hours to respond. A group of 10 administrators was responsible for solving most problems.
“We observe that problems seem to be resolved within 11 to 110 hours of the administrator’s first response,” the researchers write. “It is likely that much of the time after the first response from an administrator is spent in an iterative trial and error process as customers explore possible root causes.”
The number of problems rises significantly when Amazon releases new features. “For example, we observed that the increase in Virtual Infrastructure problems in the third quarter of 2008 coincided with the introduction of a new virtual storage service,” called Elastic Block Store, the authors write.
Gartner analyst Lydia Leong pointed out in her blog this week that Amazon cloud customers often have expectations that don’t match reality. Leong was not involved in the support forum research, but her experiences talking to clients help illuminate the issue.
In her blog, Leong wrote the following: “I recently talked to an enterprise client who has a group of developers who decided to go out, develop, and run their application on Amazon EC2. Great. It’s working well, it’s inexpensive, and they’re happy. So Central IT is figuring out what to do next.
I asked curiously, ‘Who is managing the servers?’
The client said, well, Amazon, of course!
Except Amazon doesn’t manage guest operating systems and applications.
It turns out that these developers believed in the magical cloud — an environment where everything was somehow mysteriously being taken care of by Amazon, so they had no need to do the usual maintenance tasks, including worrying about security — and had convinced IT operations of this, too.
Imagine running Windows. Installed as-is, and never updated since then. Without anti-virus, or any other security measures, other than Amazon’s default firewall (which luckily defaults to largely closed).
Plus, they also assumed that auto-scaling was going to make their app magically scale. It’s not designed to automatically scale horizontally. Somebody is going to be an unhappy camper.”
The authors of “A First Look at Problems in the Cloud” conclude that “to offer more effective support,” clouds should develop tools that debug new features, automate operator tasks, and “provide a vehicle to gather and transfer information between operator and user.”
According to Leong, the cautionary wisdom for IT shops is to “make sure you know what the cloud is and isn’t getting you.”
Follow Jon Brodkin on Twitter: www.twitter.com/jbrodkin