How to build an enterprise private cloud that looks better than AWS

Clouds that enterprises build rarely stack up to AWS and its ilk in the public cloud. Follow these steps if you want a viable private cloud for your users.

Amazon Web Services, Google, Rackspace and Microsoft are all very different cloud providers with one thing in common: a relentless focus on automation to drive agility, efficiency and quality.

And then there's your enterprise private cloud.

Businesses should focus on building a model that uses cloud to exceed expectations for agility, innovation, quality and efficiency. Many enterprises look at these public cloud providers and conclude that they too should be in the cloud business, for their internal customers. New IT initiatives get created, budgets allocated and vendors selected. Nirvana reached, right? Wrong.

Very wrong. In 2009, I told a roomful of attendees at a cloud conference that most enterprise private cloud initiatives would fail. Time after time, I discover private cloud implementations at large enterprises that would make Amazon and Google engineers scoff.

What's wrong with most private clouds?

Stakeholders don't need a cloud; they need the ability to get a fully configured container for an application in just minutes. They need to pay only for what resources they use; they want a change from weeks-long approval processes driven by enterprise architecture, infosec and ITIL-based processes that were created in a different era.

Your private cloud just isn't AWS. It's too constrained and hard to use. It lacks application programming interfaces (APIs) and integration with management tools. It costs the company twice what AWS and Google would cost, sometimes more. There's no robust catalog of common services, and for all of that, it hasn't been certified for production by the abominable "NO" men who govern such things.

Let's look at the proverbial litmus test: a fully usable container provisioned in five minutes. The enterprise's private cloud management and infosec stack lack the automation to stand anything up in that timeframe. The virtual machine might be there in five minutes, but then a week or more goes by for manual infosec and management provisioning.
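The litmus test can be expressed as a check against a time budget. The sketch below is illustrative only: the step names and durations are hypothetical, not measurements from any real environment. It totals the elapsed time of each provisioning step and lists the steps that still depend on manual work.

```python
from dataclasses import dataclass

FIVE_MINUTES = 5 * 60  # litmus-test budget, in seconds


@dataclass
class Step:
    name: str        # hypothetical step label
    seconds: float   # elapsed time, including any wait for a human
    automated: bool  # False if the step needs manual work


def litmus_test(steps, budget=FIVE_MINUTES):
    """Return (passed, total_seconds, manual_step_names)."""
    total = sum(s.seconds for s in steps)
    manual = [s.name for s in steps if not s.automated]
    return total <= budget, total, manual


# Illustrative numbers only: the VM is quick, the manual follow-up is not.
ONE_WEEK = 7 * 24 * 3600
pipeline = [
    Step("create virtual machine", 240, True),
    Step("infosec provisioning", ONE_WEEK, False),
    Step("management tooling setup", 3600, False),
]

passed, total, manual = litmus_test(pipeline)
print(f"passed={passed}, total={total / 3600:.1f} hours, manual steps={manual}")
```

With these assumed figures the virtual machine alone fits the budget, but the end-to-end request does not, which is the gap the five-minute test is meant to expose.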

You were vendor-led like a lamb to the slaughterhouse, and your cloud build was butchered. You can build a viable private cloud neither with 15-year-old IT automation suites held together with baling wire and gum, nor with big-vendor converged infrastructure stacks on prepackaged, partially automated frameworks.

If you're still determined to build an internal cloud stack that rivals Amazon, here's what you need to do in five steps:

1. Spend more money -- a lot more
Amazon has spent billions to build its cloud. If you want the agility and efficiency of AWS, you can't get there with a $1 million investment in half-baked vendor products. At a minimum, you're going to need to budget $15 to $20 million to implement the base capability.

2. Plan for organizational change
Many of the roles on your IT staff will change or disappear if the cloud implementation is a success. If the same people are doing the same jobs at the end of this process, you've failed. The server-to-admin ratio at many enterprises is 25 or 50 to 1. At a large cloud provider, expect 1,000 or even 10,000 servers to 1 admin. Humans are only involved to physically pull and replace equipment.
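To see what those ratios imply for headcount, here is a small arithmetic sketch. The fleet size of 10,000 servers is an assumption chosen for illustration; the ratios are the ones quoted above.

```python
import math


def admins_needed(servers, servers_per_admin):
    """Admins required to cover a fleet at a given server-to-admin ratio."""
    return math.ceil(servers / servers_per_admin)


fleet = 10_000  # hypothetical fleet size
for ratio in (25, 50, 1_000, 10_000):
    print(f"{ratio:>6}:1 -> {admins_needed(fleet, ratio)} admins")
```

At this assumed fleet size, moving from 25:1 to 1,000:1 takes the admin count from 400 to 10, which is why the organizational change cannot be treated as an afterthought.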

Make sure the cloud program has enough executive-level support that it is isolated from the naysayers who worry about losing their jobs or importance.

The positions most impacted by private cloud implementation call for manual processes that the cloud stack will automate. IT staffers who recognize the risk to their jobs can have a very negative influence on cloud initiatives, so give them a vision for their future role or a generous incentive to stay and help phase themselves out.

3. Keep the IT ops team away at first
An IT ops-led approach to enterprise private cloud is almost destined to fail, according to Forrester analyst James Staten. I have witnessed many real-world cloud attempts that affirm it. Despite a well-intentioned attempt to get rid of "business as usual" thinking, most IT ops teams are far too indoctrinated with ITIL- and COBIT-driven processes and other standards to focus on making cloud computing valuable to users.

Start with a team of developers to be the cloud's users; have that team build the design from the top down. That means a robust catalog of developer-friendly services and APIs, easy integration with the tools that they need to use every day, and so on. Their development practices should change too; you don't build apps for AWS the same way you build them for your virtualized enterprise infrastructure.

You can and should involve IT ops to finalize the design and work alongside the developers on implementation. Developers still need to be in charge in this period to avoid a vendor-driven IT ops approach that invalidates earlier work.

4. Leave no automation stone unturned
Incomplete automation is often a source of failure. People love features and move on to the next thing before the current task is completed. But don't jump ahead: Leaving a few automation tasks to be finished later will result in a broken cloud.

Find every manual process, no matter how seemingly small or inconsequential, and automate it out of the mix -- you're upping your server-to-admin ratio. While you may never get to the 10,000:1 ratio seen at the top cloud providers, you're going to need to get staff doing the minimum required work -- moving hardware and wiring. From the moment a server powers on, automation should handle every action.

5. Test your cloud, and automate the testing
Clouds are complex systems. Automation can control that complexity, but it can also amplify failure. Testing your internal cloud does not mean just putting up workloads, getting your pilot users provisioned and declaring success.

You need to create the chaos of a fully functioning system at scale to see if something will trigger a flash crash in the enterprise cloud. Netflix uses tools like Chaos Monkey to kill servers and systems in AWS, testing their own resiliency. Perhaps a new simian army of tools for testing the underlying resiliency of cloud automation would be useful in the enterprise as well.
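As a rough idea of what such a test looks like, the sketch below simulates the pattern in plain Python. It does not use Chaos Monkey or any real cloud API; the `Pool` class, its `reconcile` method and the server names are hypothetical stand-ins for whatever automation your stack provides. Each round kills one server at random, lets the automation react, and checks that capacity came back.

```python
import random


class Pool:
    """Toy stand-in for a server pool managed by cloud automation."""

    def __init__(self, desired):
        self.desired = desired
        self.servers = {f"srv-{i}" for i in range(desired)}
        self._next_id = desired

    def kill_random(self, rng):
        """Chaos step: remove one server chosen at random."""
        victim = rng.choice(sorted(self.servers))
        self.servers.remove(victim)
        return victim

    def reconcile(self):
        """Automation step: add replacements until capacity is restored."""
        while len(self.servers) < self.desired:
            self.servers.add(f"srv-{self._next_id}")
            self._next_id += 1


def chaos_round(pool, rng):
    """Kill one server, let automation react, report whether capacity recovered."""
    victim = pool.kill_random(rng)
    pool.reconcile()
    return victim, len(pool.servers) == pool.desired


rng = random.Random(42)  # fixed seed so runs are repeatable
pool = Pool(desired=5)
results = [chaos_round(pool, rng) for _ in range(20)]
print(all(recovered for _, recovered in results))
```

In a real environment the interesting cases are the ones where `reconcile` is slow, partial or itself fails, so the recovery check should run against the actual automation rather than a simulation like this one.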

Think long and hard about what it will take to be successful before embarking on your AWS-killer private cloud. Few IT organizations have the resources, budget, skills or political will to be successful. That doesn't mean you shouldn't have some form of private cloud as part of your IT portfolio. But it does mean that you should be careful of your ambitions to build and operate it yourself.

About the author:
John Treadway is a senior vice president at Cloud Technology Partners and is based in Boston.

Twitter: @johntreadway
