Data streams are a common pattern in modern software architectures. They consist of high-volume data producers that constantly feed information into a target system. This target system must be able to ingest and process incoming data in real time, with minimal delay and at virtually any incoming volume.
AWS offers two avenues to ingest and process data streams -- Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK). While these services address the same tasks, they have different strengths and weaknesses.
Review this brief explainer of the main similarities and differences between Amazon MSK and Amazon Kinesis so you are better equipped to choose a data streaming service for your application.
Amazon MSK vs. Kinesis overview
Amazon Kinesis is a managed service for real-time data stream ingestion and processing. At one end of the stream, data producers write the data. At the other end, data consumer applications process the data that has been published.
Amazon Kinesis, which is serverless, also provides managed features such as Amazon Kinesis Analytics, which analyzes streaming data, and Amazon Kinesis Firehose, which delivers it to permanent storage. AWS also offers Amazon Kinesis Video Streams, which has features specific to video processing and analysis, including built-in integrations with AWS machine learning services such as Amazon SageMaker and Amazon Rekognition.
Alternatively, Amazon MSK is a managed version of the popular open source data streaming service Apache Kafka. Developers using Amazon MSK have to explicitly provision Kafka instances in Amazon EC2, even though the instances are then managed by Amazon MSK.
The amount of data that can be ingested or consumed in Amazon Kinesis is driven by the number of shards assigned to a stream. Capacity in Amazon MSK is directly driven by the number and size of Amazon EC2 instances deployed in a cluster.
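To make the shard-driven capacity model concrete, here is a minimal sizing sketch. It assumes the per-shard limits AWS documents for provisioned Kinesis streams (1 MB or 1,000 records per second for writes, 2 MB per second for shared reads); the function name and the sample workload figures are hypothetical, and actual limits should be checked against current AWS documentation.

```python
import math

# Assumed per-shard limits for a provisioned Kinesis data stream.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies every per-shard limit."""
    by_write_bytes = math.ceil(write_mb_per_sec / SHARD_WRITE_MB_PER_SEC)
    by_write_records = math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / SHARD_READ_MB_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    # Hypothetical workload: 5 MB/s in, 12,000 records/s in, 15 MB/s out.
    print(shards_needed(5, 12000, 15))  # prints 12
```

In this example the record rate, not the byte rate, is the binding constraint, which is why the stream needs 12 shards rather than 5 or 8.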
Amazon Kinesis offers a default data retention period of 24 hours, which can be extended up to 365 days. Amazon MSK offers virtually unlimited data retention, driven by the amount of storage provisioned in the cluster through the size of Amazon Elastic Block Store volumes mounted to Amazon EC2 instances and the total number of instances.
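Because retention in Amazon MSK is bounded by provisioned storage, a rough estimate of the required volume size can be worked out from the ingest rate and the desired retention window. The sketch below is back-of-the-envelope arithmetic, not an AWS sizing tool: the function names, the replication factor of 3 and the 20% headroom are all assumptions, and it treats 1 MB as 1/1024 GiB for simplicity.

```python
def msk_storage_gib(ingest_mb_per_sec, retention_hours,
                    replication_factor=3, headroom=0.2):
    """Estimate total cluster storage (GiB) needed to retain the data."""
    raw_mb = ingest_mb_per_sec * retention_hours * 3600
    # Each Kafka partition is stored once per replica (assumed factor).
    replicated_mb = raw_mb * replication_factor
    # Keep spare capacity so volumes do not fill up (assumed margin).
    return replicated_mb * (1 + headroom) / 1024


def per_broker_gib(total_gib, broker_count):
    """Spread the total evenly across brokers, one EBS volume each."""
    return total_gib / broker_count


if __name__ == "__main__":
    # Hypothetical workload: 2 MB/s retained for 7 days on 3 brokers.
    total = msk_storage_gib(2, 7 * 24)
    print(round(total), round(per_broker_gib(total, 3)))
```

Under these assumptions, seven days of a 2 MB/s stream needs roughly 4,250 GiB across the cluster, so longer retention means larger volumes or more brokers.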
Amazon MSK vs. Kinesis: Find the right fit for your app
Amazon MSK is a good option for applications where data producers and consumers already use Apache Kafka libraries and need to migrate them to the cloud with minimal friction or code updates. It could also be a good option for application owners who are concerned about vendor lock-in and want to build data streams that could be migrated more easily out of a particular cloud.
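As an illustration of why such a migration can be low friction, the sketch below retargets an existing Kafka client configuration at an MSK cluster by changing only the broker address. `bootstrap.servers` is a standard Kafka client property; the helper name and both hostnames are hypothetical placeholders, and a real migration may also need security settings (such as TLS or authentication) adjusted.

```python
def retarget_to_msk(client_config, msk_bootstrap_servers):
    """Return a copy of the client config that points at the MSK brokers."""
    updated = dict(client_config)  # leave the original config untouched
    updated["bootstrap.servers"] = msk_bootstrap_servers
    return updated


if __name__ == "__main__":
    # Hypothetical on-premises producer settings.
    on_prem = {
        "bootstrap.servers": "kafka-1.internal.example:9092",
        "acks": "all",
        "compression.type": "gzip",
    }
    # Hypothetical MSK broker address; real ones come from the cluster.
    on_msk = retarget_to_msk(on_prem, "b-1.msk-cluster.example:9092")
    changed = [k for k in on_prem if on_prem[k] != on_msk[k]]
    print(changed)  # prints ['bootstrap.servers']
```

Topic names, serializers and the producer and consumer code itself stay the same in this sketch, which is the point the paragraph above makes about existing Kafka libraries.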
Given its serverless nature, Amazon Kinesis is a more straightforward way to provision data streams in the cloud, but it results in greater vendor lock-in because it relies on AWS-specific concepts and features. Amazon Kinesis' integration with other Amazon cloud services saves significant development and maintenance time. Its built-in features, such as Video Streams, Analytics and Firehose, also simplify application development and maintenance compared to Amazon MSK.