While there’s no question cloud computing represents a big change in infrastructure, treating it as an infrastructure change alone overlooks the fact that cloud computing consists of an agile infrastructure married to automated operations. If you install the former without implementing the latter, your revolution is only half complete. The second half of the revolution is about bringing automation to daily operations and ensuring that one’s cloud offers on-demand resource access, application scalability and elasticity, and a generalized resource pool available as needed.
Implementing a cloud environment means that resource consumers and resource providers must interact across a service interface, that is, an automated set of services that can be called with no need for human interaction: no phone calls, no request tickets, no meetings.
In other words, one needs to become a cloud service provider (CSP), with all that implies.
Looking at what the public cloud providers offer, and how they operate, is instructive and serves as a model for the CIO as CSP. What are the core competencies that need to be in place to operate as a CSP?
Well, first there are the basics:
Consumer self-service. The first element of the NIST cloud computing definition is that consumers of IT resources must be able to self-service, with no need to interact with another human as part of the resource request. To achieve this, some kind of web interface, typically with a service catalog of pre-packaged resources, is used. This definitely does not mean sending an email off to a help desk requesting that a virtual machine be created on the requester’s behalf.
Application abstraction from specific infrastructure. CSPs offer computing capability, not specific hardware resources. To put it another way, the virtual machine provided via self-service may migrate around the cloud infrastructure, with no implied promise that it will reside on specific hardware. In the recent book “Visible Ops, Private Cloud,” the authors referred to this virtual machine migration as “lift and shift.”
Infrastructure funding separate from applications. Many CIOs “play the game” of getting funding for necessary infrastructure spending by tying it to specific application initiatives. Being a CSP means having a generalized pool of resources that applications use but are not tied to; therefore, funding for the infrastructure must be handled separately from application initiatives. To a certain extent, this is a bookkeeping distinction. However, in organizations in which infrastructure funding is a low priority and tying it to applications is the only way to obtain it, one can foresee that cultural and organizational change will be necessary. Beyond this, one might observe that the overall level of infrastructure spending is likely to increase significantly. Even though every computing platform shift (e.g., mainframes to minicomputers) has led to predictions that overall IT spending will shrink, lower costs have in fact always led to vastly increased use and growth in overall IT spending. Cloud computing will be no different.
Beyond the basics, what does it mean to take on the mantle of being a CSP? The next set of implications is far more revolutionary and challenging for an IT organization, but getting them wrong will result in a failed initiative and a forced march to an external cloud provider.
Support for applications with highly variable load and resource use. Traditional transactional apps tend to be quite stable in resource consumption, but the new breed of applications has a much higher standard deviation of load. For example: You create a Facebook app. When people “Like” your page, the app offers them the opportunity to register for a free sample of your product, which triggers a user account creation as well as an order entry. A Twitter celebrity tweets about your offer and tens of thousands of “Likes” happen over the following 24 hours. Two days later, the attention dies down to several hundred “Likes” per day. Your cloud has to be able to host the application gracefully at both extremes of load, which means making sufficient resources available when the load is large and removing those resources when reduced demand makes them unnecessary.
Automated operations. Resource storms such as the one described above don’t conform to normal business hours; that celebrity might have tweeted while on the other side of the world. When load hits one’s infrastructure, it must be possible to assign resources to an application without anyone being present or doing any manual work. In other words, the cloud’s operation must be preconfigured so that resources can be dynamically attached and detached via automated rules. In a CSP, staff should design the system, and the system should manage individual resource requests. Requiring a human to intervene to change an application’s resource assignment or topology is an admission of failure.
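To make the idea of "automated rules" concrete, here is a minimal sketch in Python of a rule that decides, with no human in the loop, how many servers an application should have at a given load. Everything in it is hypothetical and chosen for illustration: the names ScalingRule, desired_servers, and scaling_action, the per-server capacity figure, and the limits. A real cloud would evaluate a rule like this continuously against monitoring data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingRule:
    """Hypothetical scaling rule designed once by staff, applied by the system."""
    requests_per_server: int  # assumed load one server can handle per minute
    min_servers: int          # floor kept running even when load is low
    max_servers: int          # ceiling set by policy


def desired_servers(rule: ScalingRule, requests_per_minute: int) -> int:
    """Return the server count the rule calls for at the observed load."""
    needed = math.ceil(requests_per_minute / rule.requests_per_server)
    # Clamp the result between the configured floor and ceiling.
    return max(rule.min_servers, min(rule.max_servers, needed))


def scaling_action(rule: ScalingRule, current_servers: int,
                   requests_per_minute: int) -> str:
    """Describe the change needed to reach the desired server count."""
    delta = desired_servers(rule, requests_per_minute) - current_servers
    if delta > 0:
        return f"attach {delta}"
    if delta < 0:
        return f"detach {-delta}"
    return "hold"


# Illustrative numbers only: a spike arrives, then attention dies down.
rule = ScalingRule(requests_per_server=100, min_servers=2, max_servers=50)
spike = scaling_action(rule, current_servers=2, requests_per_minute=3000)
quiet = scaling_action(rule, current_servers=30, requests_per_minute=150)
```

With these assumed numbers, the spike calls for attaching 28 servers and the quiet period for detaching 28, and neither decision needs a person to be awake.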
Capacity planning. This has been addressed in previous blog postings, but it’s important to reiterate. As a CSP, you have limited and very short-term visibility into resource demand (across that service interface). However, the service interface implies a promise that, upon request, resources will always be available. Of course, a policy can be set (as even Amazon does) that limits requests to a set number of servers; however, if an application takes off like the Facebook one described above, it must be possible to make an exception to the policy. More importantly, there must be sufficient capacity available to grow to the resource level required by the application load.
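The two checks described here, a per-application limit that can be raised by exception and actual spare capacity behind it, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular provider implements quotas: ResourcePool, its fields, and the application names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ResourcePool:
    """Hypothetical shared pool with a default per-application limit."""
    total_servers: int
    default_quota: int
    allocated: dict = field(default_factory=dict)        # app name -> servers held
    quota_overrides: dict = field(default_factory=dict)  # app name -> raised limit

    def free_servers(self) -> int:
        """Servers in the pool not currently assigned to any application."""
        return self.total_servers - sum(self.allocated.values())

    def quota_for(self, app: str) -> int:
        """Policy limit for an app, honoring any approved exception."""
        return self.quota_overrides.get(app, self.default_quota)

    def request(self, app: str, count: int) -> bool:
        """Grant the request only if policy allows it and capacity exists."""
        held = self.allocated.get(app, 0)
        if held + count > self.quota_for(app):
            return False  # blocked by policy; needs an exception
        if count > self.free_servers():
            return False  # policy allows it, but the pool is exhausted
        self.allocated[app] = held + count
        return True


# Illustrative numbers only.
pool = ResourcePool(total_servers=100, default_quota=20)
first = pool.request("promo", 20)        # within the default limit
blocked = pool.request("promo", 10)      # exceeds the default limit
pool.quota_overrides["promo"] = 80       # exception granted for the spike
after_exception = pool.request("promo", 10)
```

The second return path is the one that matters for capacity planning: an exception to the policy is worthless if the pool behind it has nothing left to give.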
High utilization. Many presentations that make the case for a private cloud cite as a primary reason that it can be run more cheaply than equivalent services can be obtained from a public CSP. This is quite controversial, with plenty of people weighing in strongly on both sides. However, in all of the presentations I’ve seen, the case for the economics of a private cloud is underpinned by an assumption that utilization will run at 70 percent or more. Because being a CSP is like any capital-intensive retail business (e.g., airlines), the economics can turn ugly if utilization falls short of the necessary load factor. In the past, responsibility for utilization of servers fell mostly on the application group: If they overprovisioned, it was unfortunate, but just a byproduct of poor forecasting. If you run your cloud with the implicit promise that it will be more cost-effective than the public alternative, achieving high utilization rates suddenly becomes paramount. Given the changing nature of application load stability described above, this task will become more challenging. Amazon uses clever inducements (e.g., spot instance pricing) to increase utilization during low-load periods; something similar may be necessary for the CIO CSP.
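The load-factor arithmetic behind this point is simple enough to sketch. The Python below assumes a deliberately simplified model in which a private cloud costs a fixed amount per provisioned server-hour whether or not the server is doing useful work. The function names and every price used with them are hypothetical placeholders, not real quotes from any provider.

```python
def effective_cost_per_used_hour(cost_per_provisioned_hour: float,
                                 utilization: float) -> float:
    """Cost of each server-hour actually used, given idle capacity is still paid for."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return cost_per_provisioned_hour / utilization


def break_even_utilization(cost_per_provisioned_hour: float,
                           public_price_per_hour: float) -> float:
    """Utilization at which the private cloud matches the public price."""
    if public_price_per_hour <= 0:
        raise ValueError("public price must be positive")
    return cost_per_provisioned_hour / public_price_per_hour


# Placeholder figures, chosen only so the break-even lands at 70 percent.
PRIVATE_COST = 0.07   # assumed cost per provisioned server-hour
PUBLIC_PRICE = 0.10   # assumed public price per server-hour

threshold = break_even_utilization(PRIVATE_COST, PUBLIC_PRICE)
cost_at_half_threshold = effective_cost_per_used_hour(PRIVATE_COST, 0.35)
```

Under these assumed figures the private cloud breaks even at 70 percent utilization. At 35 percent, each hour of useful work effectively costs 0.20, twice the public price, which is the "turn ugly" scenario in numbers.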
All of these elements come down to one thing: Operating as a CSP requires acting like a business, not a cost center. Only by recognizing that being a CSP means marrying infrastructure agility with operational capability will the vision of private cloud come to fruition. Every previous generation of technology upgrade has carried forward the same operational practices: smart people manually installing and configuring individual pieces of hardware. The phrase “racking and stacking” captures these practices perfectly. Cloud computing implies that a second upgrade, an operational one, is needed to accompany the technology upgrade.
Where are you in planning the operational upgrade that must be part of your cloud computing objectives?