How to effectively work with multiple cloud providers
Before working with multiple cloud providers, evaluate their services for compute, storage, security and more.
Businesses have multiple cloud providers from which to choose. Amazon Web Services is the 800-pound gorilla, but Microsoft Azure offers an increasingly competitive set of services. The Google Cloud Platform is attractive to those with big data and processing needs that can use Google's infrastructure. IBM and Rackspace offer alternatives to the big three.
Businesses may not want to be tied too closely to a single cloud provider. Specialized services available in one cloud aren't always available in another. In other cases, various departments within an organization may have developed services on different platforms, requiring centralized cloud management teams to support environments hosted by multiple providers.
There are strategies and techniques for working with multiple cloud providers in ways that take advantage of the benefits while limiting duplication of effort and other extra work. To sort through this, we'll examine the core services: compute and workload management, storage services, data management and security.
Next, we'll consider how the concept of infrastructure as code is a key enabler of multicloud management. We'll also look at container services as a common layer of abstraction that further eases the burden of managing applications across multiple clouds.
Compute and workload management services
Compute and workload management services include orchestration, cluster management and configuration tools.
To start, virtual machines (VMs) are the building blocks of any compute service. Infrastructure-as-a-service (IaaS) cloud vendors offer a variety of VMs, and it's important to understand their differences. The specifications of a given VM include an OS, processing power, memory and features for network optimization. Microsoft, Google and Amazon Web Services (AWS) also offer specialized clusters designed for big data and analytics workloads, supporting, for example, Hadoop and Apache Spark.
Container services are also becoming increasingly important. Containerization is well suited to deploying microservices and, in many cases, can be more efficient than running an individual VM for each application. Containers offer lightweight virtualization on a Linux foundation and can be used with standard tools, such as Docker. As for cluster management, Apache Mesos and Docker Swarm are tools worth considering. Mesos lends itself to job scheduling because it supports frameworks such as Marathon and Chronos. And Mesos supports the Docker Swarm API, so you can run Swarm in Mesos if need be.
Also falling under the umbrella of compute and management services is orchestration. Orchestration features allow system admins to define infrastructure as code and automate code deployment.
Orchestration is particularly important for organizations that require scaling or that run across multiple clouds. For multicloud use, third-party configuration and orchestration tools, such as Chef and Puppet, are a natural fit. Vendor-specific tools, such as AWS CloudFormation, may also be options.
When deciding between first- and third-party tools, keep in mind the tradeoffs. Third-party tools offer more cross-cloud flexibility and allow for moving workloads across clouds. On the other hand, a vendor's own tools will be purpose-built to work within the given vendor's cloud.
Moving workloads across multiple clouds is challenging if you need to coordinate them between clouds or run scripts designed to use one cloud's API on a competing cloud's platform. Storage, however, poses a different kind of problem. While all of the major cloud vendors offer object storage services, it is important to understand the more subtle differences between options. To optimize costs and performance, you may want to choose object storage based on the duration for which the data will be stored, durability requirements, latency in saving and retrieving data and proximity to compute resources.
For example, AWS offers Simple Storage Service (S3) for object storage and Elastic Block Store (EBS) for file system storage on VMs. S3 object storage comes in at a lower cost, but has higher latency and is not suitable for a file system. Google, on the other hand, offers Nearline storage, a low-cost object storage class aimed at infrequently accessed data. Be aware that Nearline storage has higher latency. If you are looking for archival storage, it may make sense to choose a single cloud provider to keep storage management to a minimum. If redundancy is important, though, you may want to consider archiving on multiple clouds.
Proximity to computing is another key consideration. In general, data should be as near to computation resources as possible. This will increase performance and, as a result, cut costs. Additionally, copying data out of a cloud will often incur egress charges, so it may make sense to keep compute jobs in the same cloud that generates the data.
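The egress trade-off described above comes down to simple arithmetic. The following is a minimal sketch, not a pricing tool: the `JobEstimate` fields, the function names and every figure in the example are hypothetical placeholders, so substitute your providers' published rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobEstimate:
    data_gb: float              # size of the dataset the job reads
    egress_per_gb: float        # assumed charge to copy data out of the source cloud
    compute_cost_local: float   # assumed cost of running the job where the data lives
    compute_cost_remote: float  # assumed cost of running the job in another cloud


def total_cost_local(job: JobEstimate) -> float:
    """Cost when compute runs in the same cloud as the data (no egress)."""
    return job.compute_cost_local


def total_cost_remote(job: JobEstimate) -> float:
    """Cost when the data is first copied to another cloud, then processed there."""
    return job.data_gb * job.egress_per_gb + job.compute_cost_remote


def cheaper_location(job: JobEstimate) -> str:
    """Return 'local' or 'remote', preferring 'local' on a tie."""
    return "local" if total_cost_local(job) <= total_cost_remote(job) else "remote"


# Hypothetical numbers: cheaper remote compute is outweighed by the egress charge.
job = JobEstimate(data_gb=2000, egress_per_gb=0.10,
                  compute_cost_local=150.0, compute_cost_remote=120.0)
print(cheaper_location(job), total_cost_remote(job))
```

With a small dataset the result can flip, which is why the estimate is worth rerunning per workload rather than deciding once for the whole organization.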
Data management entails using relational and non-relational -- also known as NoSQL -- database offerings for managing structured and semi-structured data. To manage data, there are two options: use a database as a service (DBaaS) or manage your own database. When deciding, you'll want to consider how you will store data, the physical location of the data, what level of latency this entails and the environment's durability. Additionally, be aware of the cost of moving the data if the vendor or storage method doesn't work out.
Using a DBaaS tightly couples database operations to a single vendor, although this isn't necessarily a bad thing. One option is AWS' DynamoDB, which is well suited for key-value and document data stores. It offers low latency and configurable consistency with virtually no database administration overhead. It is, however, a proprietary database that's unavailable from other vendors. Relying on proprietary database services such as DynamoDB can stretch operations staff thin if you use a different proprietary service in each cloud.
When using multiple cloud providers, your best option is most likely to manage your own databases. While this sounds like a costly chore, orchestration tools can help. These tools make managing your own data store more effective across clouds if you script configurations for databases. An advantage of using multiple cloud providers is that you can store backups across clouds, thus enabling a multivendor, cloud-based disaster recovery strategy.
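As a minimal sketch of the cross-cloud backup idea, the script below copies a database dump to several destinations and verifies each copy by checksum. Local directories stand in for buckets at two providers, and the file name, destination labels and helper functions are all hypothetical; in practice, each copy would be an upload through the relevant provider's SDK or CLI.

```python
from __future__ import annotations

import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(dump: Path, destinations: dict[str, Path]) -> dict[str, bool]:
    """Copy `dump` to every destination and report whether each copy verifies."""
    expected = sha256_of(dump)
    results = {}
    for name, target_dir in destinations.items():
        target_dir.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(dump, target_dir / dump.name))
        results[name] = sha256_of(copy) == expected
    return results


# Demo: temporary directories stand in for buckets at two different providers.
workdir = Path(tempfile.mkdtemp())
dump_file = workdir / "orders.dump"
dump_file.write_bytes(b"-- pretend this is a database dump --\n")
report = replicate_backup(dump_file, {
    "cloud_a": workdir / "cloud_a_bucket",
    "cloud_b": workdir / "cloud_b_bucket",
})
print(report)
```

Verifying the checksum at each destination matters more in a multivendor setup, because a backup that silently failed in the second cloud defeats the purpose of keeping it there.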
Regardless of which data management method is chosen, always keep data governance in mind. Where the data is stored can affect how it needs to be treated and secured. The U.S.-EU Safe Harbor framework is no longer valid, so consider using another regulation, such as HIPAA, as a guideline, and plan data storage strategies based on geography.
The abstraction and security questions
Vendor-specific tools can lock you out of some of cloud computing's flexibility. Still, there are ways to get around this issue and decouple your resources from vendor-specific platforms. Certain tactics will hide implementation details of specific cloud vendors. For example, you could use thin layers of abstraction for common functions, such as object storage.
This would entail using a cross-cloud API such as Apache Libcloud. Libcloud is an open source Python library that lets users interact with a variety of cloud providers through a single API. It supports AWS Elastic Compute Cloud (EC2) and S3, as well as Google Compute Engine and Rackspace. At the time of writing, supported Python versions include 2.5, 2.6, 2.7, PyPy and Python 3.
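A minimal sketch of such a thin layer built on Libcloud's storage API might look like the following. The Libcloud calls used are `get_driver`, the `Provider` constants and `list_containers`; the function names `make_storage_driver`, `container_names`, `inventory` and `example_usage`, along with the placeholder credentials, are illustrative and not part of Libcloud. It assumes the `apache-libcloud` package is installed and that the Google bucket is reached with interoperability keys.

```python
def make_storage_driver(provider, key, secret):
    """Build a Libcloud storage driver for the given Provider constant."""
    # Imported lazily so the helpers below load even without Libcloud installed.
    from libcloud.storage.providers import get_driver
    driver_cls = get_driver(provider)
    return driver_cls(key, secret)


def container_names(driver):
    """Return sorted container (bucket) names from any Libcloud storage driver."""
    return sorted(container.name for container in driver.list_containers())


def inventory(drivers):
    """Map a label for each cloud to the container names found there."""
    return {label: container_names(driver) for label, driver in drivers.items()}


def example_usage():
    """Not called automatically: needs Libcloud and real credentials."""
    from libcloud.storage.types import Provider
    drivers = {
        "aws": make_storage_driver(
            Provider.S3, "AWS_KEY_PLACEHOLDER", "AWS_SECRET_PLACEHOLDER"),
        "google": make_storage_driver(
            Provider.GOOGLE_STORAGE, "GCS_KEY_PLACEHOLDER", "GCS_SECRET_PLACEHOLDER"),
    }
    return inventory(drivers)
```

Because `container_names` and `inventory` depend only on the driver interface, the same code runs unchanged against each provider; only the driver construction differs.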
Use specialized services, such as AWS Lambda, sparingly. Consider implementing functions you would run in Lambda in Docker containers instead. Containers are commonly available across clouds.
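One way to keep that option open is to write the function as plain, provider-neutral code and wrap it in thin adapters. The sketch below is illustrative only: `resize_request` is a hypothetical piece of business logic, `lambda_handler` follows the `(event, context)` signature that AWS Lambda uses for Python handlers, and `ContainerHandler` exposes the same logic over HTTP using the standard library so it can run inside a Docker container on any cloud. The port number is an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def resize_request(payload: dict) -> dict:
    """Provider-neutral logic (hypothetical): halve the requested dimensions."""
    return {"width": payload["width"] // 2, "height": payload["height"] // 2}


def lambda_handler(event, context):
    """Thin adapter matching the AWS Lambda Python handler signature."""
    return resize_request(event)


class ContainerHandler(BaseHTTPRequestHandler):
    """Thin adapter exposing the same logic over HTTP for use in a container."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(resize_request(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Run the HTTP adapter; a container image could use this as its entry point."""
    HTTPServer(("0.0.0.0", port), ContainerHandler).serve_forever()
```

Keeping the adapters this thin means the choice between Lambda and a container becomes a deployment decision rather than a rewrite.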
If you plan to decouple from a single vendor, plan to run your own services. This is particularly true for databases. However, always take cost into consideration; a DBaaS may end up saving you money in the long run.
Security becomes more challenging when using multiple cloud providers because you have to implement the same controls separately in each cloud. Consolidate where you can: instead of implementing multiple directories, implement one directory and make it available via federation to multiple clouds. Use common infrastructure, such as LDAP or Active Directory, as much as possible. As for enforcing policies across clouds, third-party services and tools are indispensable.
For complex identity management tasks, consider third-party services such as Ping Identity. Ping Identity offers a number of tools that work across platforms, such as multifactor authentication and user identity management.
For security log management, a strong third-party option is Loggly, which uses open source protocols and so is compatible with multiple cloud providers. The tool reads and consolidates a number of text-based logs, including those from Ruby, Java, Python, PHP and MySQL, among others. Alert Logic offers a full suite of security tools with its Cloud Defender product. Cloud Defender collects security data, enables security data analytics and can perform threat analysis.
Managing infrastructure as code
When it comes to making effective use of multiple clouds, managing infrastructure as code is a good place to start. Software developers have created sophisticated techniques and tools for managing multiple versions of frequently changing code. The same tools and practices can be used to manage infrastructure when it is described in declarative specifications. Adopt practices that require every resource to be deployed to the cloud through scripts. If all goes well, system admins should not be manually adding and removing resources or changing configurations; this should be done through scripts that are deployed using either third-party or proprietary tools.
This mentality stretches beyond code deployment. Admins can take advantage of third-party services to make better use of multiple clouds. A cloud service brokerage, which serves as an intermediary between a cloud provider and the user, is one such service worth considering. Another option is cloud aggregation tools, which integrate multiple clouds to ease workload management and cost management.
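The core idea behind declarative tooling can be sketched in a few lines: compare the desired state kept in version control with the actual state reported by each cloud, and derive a plan of changes. This is a simplified illustration, not how any particular tool is implemented; the resource names, the `cloud` and `size` fields and the `plan_changes` function are all hypothetical.

```python
def plan_changes(desired: dict, actual: dict) -> dict:
    """Compare desired and actual resources; return what to create, update, delete."""
    create = {name: spec for name, spec in desired.items() if name not in actual}
    update = {name: spec for name, spec in desired.items()
              if name in actual and actual[name] != spec}
    delete = sorted(name for name in actual if name not in desired)
    return {"create": create, "update": update, "delete": delete}


# Desired state would live in version control; these resources are hypothetical.
desired_state = {
    "web-1": {"cloud": "aws", "size": "small"},
    "web-2": {"cloud": "gcp", "size": "small"},
    "db-1": {"cloud": "aws", "size": "large"},
}
# Actual state would come from each provider's API; it is hard-coded here.
actual_state = {
    "web-1": {"cloud": "aws", "size": "small"},
    "db-1": {"cloud": "aws", "size": "medium"},
    "old-batch": {"cloud": "azure", "size": "small"},
}

plan = plan_changes(desired_state, actual_state)
print(plan)
```

Because the plan is computed rather than typed by hand, it can be reviewed like any other code change before anything is applied, and an empty plan confirms that the clouds match the specification.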
There are no hard and fast rules for working with multiple cloud providers, but there are practices that will limit the duplication of work, difficult migrations and security vulnerabilities.