E-Handbook: Big data containers gain wider appeal in system deployments Article 1 of 4

Containers give big data users new ways to set up systems

Running big data systems in containers has become a feasible option. Hadoop, Spark and other big data platforms can be deployed in clusters of Docker containers managed by orchestration frameworks like Kubernetes. Even mainstream databases are embracing big data containers. For example, SQL Server 2019 lets users mix Spark, Hadoop and relational databases in Kubernetes-based containers.

But the available technologies are still maturing, and they're being used primarily by early adopters willing to do some development work themselves. JD.com Inc. is a case in point. The Beijing-based online retailer has built a large Kubernetes-based container architecture that runs AI and big data analytics applications in various systems, including Spark, Flink, Storm and TensorFlow.

Applications can now be "automatically packaged into images and deployed onto containers in near real time," Haifeng Liu, JD.com's vice president of engineering and chief architect, said in a Q&A posted on the Cloud Native Computing Foundation's website in August 2018. But, he added, the company had to customize Kubernetes to fix performance issues -- a step that included adding new features, removing unnecessary ones and optimizing the technology's scheduler.

This handbook explores the use of big data containers and offers advice on how to deploy and manage them. First, we take a more in-depth look at potential applications and the hurdles that users face. Next, we detail a list of to-do items for containerizing big data systems. We close with a consultant's view of how big data, microservices and containers fit together.

Craig Stedman, Industry Editor

Container technologies promise more agility for big data apps

Along with the ability to provide greater agility and flexibility for big data applications, containers can play a role in IT strategy that drives real-time decision-making.

5 things to know about deploying big data systems in data containers

Planning for security and container APIs and watching out for infrastructure sprawl are among the issues to address before deploying big data systems in containers.

Microservices and big data start to get closer

Microservices are riding a wave of user interest, leading to changes in IT operations. ThoughtWorks expert Zhamak Dehghani discusses what that means for big data.
