
How and why to run machine learning workloads on Kubernetes

Running ML model development and deployment on Kubernetes decouples workloads, helping organizations optimize resources and cut costs.

Machine learning and AI have moved into the mainstream. Regardless of their job role, most business and IT professionals are now familiar with leading AI tools like ChatGPT.

As the buzz around AI grows, so do the engineering needs in ML and AI. In particular, managing machine learning workloads is top of mind for many organizations due to rising costs and complexity. Key considerations are related to how models are trained and deployed, including the scalability, efficiency and cost-effectiveness of those processes.

As ML use cases have become increasingly complex, training ML models has become more resource-intensive and less cost-effective. In fact, it's quite expensive -- and a key reason that GPUs have become so pricey and sought-after. Containerizing ML workloads can help solve these challenges.

Containerization can alleviate many of the challenges associated with ML model development and deployment, including scaling, automation and infrastructure sharing. Kubernetes, a popular tool for containerizing workloads, is a viable option for organizations looking to streamline their ML processes.

Kubernetes basics

Over the years, engineering priorities have shifted, but one consistent trend is the need to minimize applications' operational footprints. From mainframes to dedicated servers and later to virtualization, the trend has been toward smaller, more efficient units of computing.

After virtualization, containers emerged as a method for decoupling application stacks into the smallest possible entities while maintaining performance. Containers started with cgroups and namespaces in Linux, but gained more widespread popularity with Docker. The problem was that containers alone didn't scale well; if a container went down, it didn't automatically start back up.

Kubernetes, an open source platform for managing containerized workloads, came onto the scene to fix this issue. As an orchestration tool, Kubernetes not only helps developers build containerized applications, but also facilitates workload scaling, ensuring that containers are always active and properly managed.

In Kubernetes, containers run inside resources called pods, which house all the information needed to run the application. In addition to containers, Kubernetes has also become valuable for orchestrating other types of resources, such as virtual machines.
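
To make the pod concept concrete, here is a minimal sketch using the official Kubernetes Python client to define and launch a single-container pod. The pod name, image and namespace are placeholders, and the sketch assumes a local kubeconfig pointing at a working cluster.

    # Minimal sketch: defining a single-container pod with the official
    # Kubernetes Python client. Names, namespace and the image are placeholders.
    from kubernetes import client, config

    config.load_kube_config()  # reads the local kubeconfig

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="ml-demo-pod"),
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="trainer",
                    image="python:3.11-slim",  # placeholder image
                    command=["python", "-c", "print('training step goes here')"],
                )
            ],
        ),
    )

    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)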

Machine learning on Kubernetes

AI and ML systems' demands are a major driver of the recent surge in GPU costs, which has posed challenges for consumers and tech pros alike.

ML systems require vast amounts of system power, including CPU, memory and GPU resources. Without ample compute, the training process can be highly time-consuming, especially for larger models. Traditionally, this forced users to buy multiple servers to train models, as there was no way to efficiently share those resources.

That's where Kubernetes comes into play with its ability to orchestrate containers and decouple workloads. Within a Kubernetes cluster, multiple pods can run models simultaneously, using the same CPU, memory and GPU power for training.

This can assist with many ML practices, including automated deployment and scaling. A powerful Kubernetes cluster with GPUs attached to its worker nodes is still required, but the ability to share those resources increases production velocity and reduces costs.
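
For example, a training container can request a share of a node's GPU, CPU and memory so that the Kubernetes scheduler places it on suitable hardware. The snippet below is a sketch using the Python client; it assumes the cluster's GPU nodes expose the nvidia.com/gpu resource through the NVIDIA device plugin, and the image name and resource amounts are placeholders.

    # Sketch: a training container that requests one GPU plus CPU and memory.
    # Assumes GPU nodes expose "nvidia.com/gpu" via the NVIDIA device plugin.
    from kubernetes import client

    gpu_container = client.V1Container(
        name="gpu-trainer",
        image="my-registry/train:latest",  # placeholder training image
        resources=client.V1ResourceRequirements(
            requests={"cpu": "4", "memory": "16Gi", "nvidia.com/gpu": "1"},
            limits={"nvidia.com/gpu": "1"},
        ),
    )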

Examples of ML workloads that can be run on Kubernetes include the following:

  • Distributing model training tasks across multiple pods at the same time, as sketched after this list.
  • Automatically deploying models to production, with the ability to make updates and rollbacks as needed.
  • Optimizing model performance by concurrently running multiple hyperparameter tuning experiments.
  • Scaling workloads dynamically based on demand at inference time.
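
To illustrate the first item, the sketch below uses the Kubernetes Python client to create an indexed Job that fans a training run out across several pods in parallel. The image, namespace and shard count are placeholders; each pod can read its shard index from the JOB_COMPLETION_INDEX environment variable that Kubernetes injects for indexed Jobs.

    # Sketch: fanning training shards out across pods with an indexed Job.
    # Image, namespace and shard count are placeholders.
    from kubernetes import client, config

    config.load_kube_config()

    job = client.V1Job(
        metadata=client.V1ObjectMeta(name="distributed-training"),
        spec=client.V1JobSpec(
            completions=4,        # total training shards
            parallelism=4,        # run them all at once
            completion_mode="Indexed",
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[
                        client.V1Container(
                            name="trainer",
                            image="my-registry/train:latest",  # placeholder
                            command=["python", "train.py"],
                        )
                    ],
                )
            ),
        ),
    )

    client.BatchV1Api().create_namespaced_job(namespace="default", body=job)

Running shards as short-lived pods in a single Job is one way a team can share a cluster's GPUs instead of dedicating a server to each experiment.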

ML on Kubernetes pitfalls

Running ML workloads on Kubernetes is a stable and popular option. Even OpenAI, the creator of ChatGPT, runs its experiments on Kubernetes. However, organizations should be aware of two notable disadvantages:

  • Tool maturity. Software designed for running ML on Kubernetes, such as Kubeflow, is still relatively young. Because these tools are evolving, they might undergo changes over time, leading to instability and increased time spent keeping up with the latest developments.
  • Talent availability. Finding experts with the knowledge and experience to effectively run ML on Kubernetes can be expensive and time-consuming. The specialized combination of IT operations and AI skills is in demand and relatively rare, making hiring costly and challenging.

Tools for machine learning on Kubernetes

Kubernetes by itself isn't equipped to manage the full ML lifecycle; it needs purpose-built tools designed to run ML workloads on top of it. These tools integrate with Kubernetes, using its orchestration capabilities to handle the specialized requirements of ML tasks.

Just as Kubernetes uses the Container Runtime Interface to interact with the software that runs containers, it uses a flexible plugin model, such as device plugins for GPUs, to manage different types of resources. There are three primary ML tools in the Kubernetes ecosystem:

  • Kubeflow, an open source platform for running and experimenting with ML models on Kubernetes.
  • MLflow, an open source platform for tracking ML experiments, packaging models and serving them through a REST inference endpoint.
  • KubeRay, a tool built by the creators of Ray, an open source framework for scaling AI and Python-based applications. KubeRay adapts Ray's capabilities for Kubernetes environments.

Another option is to use TensorFlow on Kubernetes. However, TensorFlow isn't built specifically for Kubernetes, so it lacks the dedicated integration and optimization of Kubernetes-focused tools like Kubeflow.

For those looking to run ML workloads on Kubernetes, exploring Kubeflow first is often the best option. At the time of writing, Kubeflow is the most advanced and mature tool in terms of capabilities, ease of use, community support and overall functionality.
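
As a brief taste of Kubeflow, the sketch below uses the Kubeflow Pipelines (kfp) SDK to define a two-step pipeline and compile it to a spec that can be submitted to a Kubeflow Pipelines instance running on Kubernetes. It assumes the kfp v2 SDK is installed, and the step bodies, names and paths are placeholders.

    # Sketch: a two-step pipeline with the Kubeflow Pipelines (kfp) SDK.
    # Step bodies are placeholders; compiling produces a YAML spec that can
    # be submitted to a Kubeflow Pipelines instance on Kubernetes.
    from kfp import dsl, compiler

    @dsl.component
    def prepare_data() -> str:
        # Placeholder: fetch and preprocess training data here.
        return "/data/train.csv"

    @dsl.component
    def train_model(data_path: str):
        # Placeholder: load data_path and train the model here.
        print(f"training on {data_path}")

    @dsl.pipeline(name="demo-training-pipeline")
    def training_pipeline():
        data = prepare_data()
        train_model(data_path=data.output)

    compiler.Compiler().compile(training_pipeline, "pipeline.yaml")

Compiling produces a portable pipeline definition, so the same workflow can be rerun or scheduled without changing code.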

Michael Levan is a cloud enthusiast, DevOps pro and HashiCorp Ambassador. He speaks internationally, blogs, publishes books, creates online courses on various IT topics and makes real-world, project-focused content to coach engineers on how to create quality work.

Next Steps

Set up a machine learning pipeline in this Kubeflow tutorial
