In today's cloud world, IT organizations are exploiting hybrid and multi-clouds across their business. From the perspective of containers, we know that application mobility, flexibility and efficiency are built-in from day one. But what about the mobility of container data? How can you build a data framework that optimizes public cloud and delivers real mobility for applications and their data?
Containers and block storage
The popularity of containers stems from Docker Inc. introducing us to the frameworks of container runtimes, application libraries and multinode (server) configurations. Docker was quickly subsumed by Kubernetes, the open source container platform that originated at Google.
Both platforms use block-based storage to provide persistence for container data. Initially, it was assumed that containers would be stateless, using application replication and redundancy to maintain access to persistent data. This proved impractical, and it's accepted that some form of persistent storage is needed, even for a short-lived container.
Block storage is fast and offers low latency to applications. In container deployments, a block device is formatted with a local file system and mapped into a container. Depending on usage requirements, the block device can survive the life of the container or be used only while the container is running.
If an application task in a container is restartable, then block storage doesn't have to offer anything more than scalable, fast storage for the container data. By restartable, we mean the application component can be instantiated with a freshly formatted empty block device.
However, as we step outside these simple boundaries, storage must offer more. If a container needs to restart due to hardware or software failure, for instance, then it may be practical to re-use the data on an existing block device rather than re-create it from another source. If part of an application must move to another physical location, then the container data might have to move, too.
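The choice between a fresh device and an existing one can be sketched as a small decision function. This is a minimal illustration only: the `Volume` type, its fields and `choose_volume` are hypothetical names, not part of Docker or Kubernetes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Hypothetical record describing a block device available to a container."""
    name: str
    formatted: bool  # True once a file system exists on the device
    has_data: bool   # True if an earlier container wrote data to it


def choose_volume(existing: Optional[Volume], restartable: bool, new_name: str) -> Volume:
    """Pick the block device for a container that is starting or restarting.

    A restartable task can begin with a freshly formatted, empty device.
    Anything else reuses the existing device when it still holds data.
    """
    if existing is not None and existing.has_data and not restartable:
        return existing
    # Assumption: formatting a new device is handled elsewhere; we only model the result.
    return Volume(name=new_name, formatted=True, has_data=False)
```

In this sketch, a database container that fails over would get its old device back, while a restartable batch worker would always start empty.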
Building a data plane framework
In order to achieve data and application mobility, we need to build a framework for the data plane. Data will outlive any individual container and will need to be mobile across multiple data centers and locations. This might mean moving between public clouds and on-premises locations. A good example is the requirement to replicate data to the public cloud to seed test/development environments.
Object storage is one of many ways to build a data framework. It's inherently mobile and accessible over WANs using HTTP(S). Object stores can replicate over distances and work easily across platforms. The main challenges for object storage are implementing good security and mapping data to an application hierarchy, for example, using buckets and folders that correspond to application names.
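One way to map data to an application hierarchy is to derive the bucket and object key from the application's own names. The naming scheme below (environment plus application for the bucket, component as the folder prefix) is an assumption for illustration, not a convention of any particular object store.

```python
def object_location(app: str, component: str, filename: str, env: str = "prod") -> tuple[str, str]:
    """Return a (bucket, key) pair for a piece of application data.

    Hypothetical scheme: one bucket per application and environment,
    with the component name acting as the top-level folder.
    """
    bucket = f"{env}-{app}".lower()
    key = "/".join([component.strip("/"), filename.strip("/")])
    return bucket, key


bucket, key = object_location("Billing", "invoices", "2024/01.json")
print(bucket, key)
```

Keeping the mapping in one function means the same layout can be reproduced in any cloud or on-premises object store that the application moves to.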
Adding block and file storage to a container framework is more complex. Block storage has traditionally had lower latency and greater throughput than file storage, but this is no longer necessarily the case. Using new media such as NVMe, startups are building high-performance, scalable file systems that work both on premises and in public clouds.
Another challenge for local block storage, such as AWS Elastic Block Store (EBS), is that these devices can only be connected to local containers or virtual instances. There's no direct way to replicate block storage within the public cloud to a different provider or on premises.
If data portability is essential, then the best option is to build an independent data plane that doesn't rely on the public cloud provider's native storage. Products exist that provide scale-out block and file storage, either across single or multiple geographic locations. This includes merging public and private clouds.
Some scale-out products can present a single view of data over multiple locations, while others use snapshots to replicate data. This means data is copied and processes are in place to ensure that the most recent copy can be tracked and managed. When making data available across a wide area, there will be latency issues.
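Tracking the most recent copy under snapshot replication can be sketched as follows. This assumes each snapshot of a volume carries a monotonically increasing sequence number; the `Snapshot` type and both helper functions are hypothetical and do not reflect any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    volume: str
    location: str  # for example "onprem" or "aws"
    sequence: int  # assumed to increase with every new snapshot of the volume


def latest_copies(snapshots: list[Snapshot]) -> dict[str, Snapshot]:
    """Return the newest snapshot held at each location."""
    latest: dict[str, Snapshot] = {}
    for snap in snapshots:
        current = latest.get(snap.location)
        if current is None or snap.sequence > current.sequence:
            latest[snap.location] = snap
    return latest


def stale_locations(snapshots: list[Snapshot]) -> list[str]:
    """List locations whose newest copy is behind the newest copy anywhere."""
    latest = latest_copies(snapshots)
    if not latest:
        return []
    newest = max(snap.sequence for snap in latest.values())
    return sorted(loc for loc, snap in latest.items() if snap.sequence < newest)
```

A scheduler could use `stale_locations` to decide where replication still has to catch up before an application is started at that site.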
The role of APIs and remaining challenges
Ecosystem developers have started to add APIs for storage. Kubernetes has the Container Storage Interface (CSI), which exposes a set of functions to create and attach volumes to containers. CSI ensures that storage is successfully mapped to containers within pods that can be deployed across multiple physical servers.
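In practice, a workload usually asks for CSI-backed storage through a PersistentVolumeClaim that names a storage class served by a CSI driver. The sketch below builds such a claim as a Python dictionary and prints it as JSON. The claim name, size and storage class name are placeholders; the storage class must already exist in the cluster.

```python
import json


def pvc_manifest(name: str, size_gi: int, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim manifest for a single-node read-write volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            # Assumption: a StorageClass with this name is backed by a CSI driver.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": f"{size_gi}Gi"}},
        },
    }


print(json.dumps(pvc_manifest("app-data", 10, "example-csi-class"), indent=2))
```

The printed JSON can be saved to a file and submitted to a cluster, since Kubernetes accepts manifests in JSON as well as YAML.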
Docker uses volume plug-ins to achieve the same functionality as CSI. Users can deploy volume drivers from storage hardware and software vendors that ensure data is mapped correctly from storage to container.
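With a volume plug-in installed, a volume is typically created by naming the driver on the `docker volume create` command. The helper below only assembles the argument list and does not run it. The driver name and the option shown are placeholders, since the options each driver accepts are vendor-specific.

```python
from typing import Optional


def docker_volume_create_args(name: str, driver: str, options: Optional[dict] = None) -> list[str]:
    """Assemble a `docker volume create` command line for a given volume driver."""
    args = ["docker", "volume", "create", "--driver", driver]
    # Driver options are passed as repeated --opt key=value pairs.
    for key, value in sorted((options or {}).items()):
        args += ["--opt", f"{key}={value}"]
    args.append(name)
    return args


# "example/driver" and "size" are hypothetical; check the plug-in's documentation.
print(" ".join(docker_volume_create_args("appdata", "example/driver", {"size": "10G"})))
```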
Challenges remain around storage for container data, particularly across multiple container deployments and for seamless data migration among platforms. AWS Fargate, for example, doesn't support external volumes other than those in Elastic Compute Cloud (EC2). Building a true multi-cloud container environment currently means using bespoke container builds rather than native cloud features.