Set up a machine learning pipeline in this Kubeflow tutorial

For teams running machine learning workflows with Kubernetes, using Kubeflow can lead to faster, smoother deployments. Get started with this installation guide.

By Chris Tozzi

Published: 19 Apr 2023

You don't have to use Kubernetes to power machine learning deployments. But if you do -- and there are many reasons why you might want to -- Kubeflow is the simplest and fastest way to get machine learning workloads up and running on Kubernetes.

Kubeflow is an open source tool that streamlines the deployment of machine learning workflows on top of Kubernetes. Kubeflow's main purpose is to simplify setting up environments for building, testing, training and operating machine learning models and applications for data science and MLOps teams.

It's possible to deploy machine learning tools such as TensorFlow and PyTorch on a Kubernetes cluster directly without using Kubeflow, but Kubeflow automates much of the process required to get these tools up and running. To decide whether it's the right choice for your machine learning projects, learn how Kubeflow works, when to use it and how to install it to deploy a machine learning pipeline.

The pros and cons of Kubernetes and Kubeflow for machine learning

Before deciding whether to use Kubeflow specifically, it's important to understand the pros and cons of running AI and machine learning workflows on Kubernetes in general.

Should you run machine learning models on Kubernetes?

As a platform for hosting machine learning workflows, Kubernetes offers several advantages.

The first is scalability. With Kubernetes, you can easily add or remove nodes from a cluster to modify the total resources available to that cluster. This is particularly beneficial for machine learning workloads, whose resource consumption requirements can fluctuate significantly. For example, you might want to scale your cluster up during model training, which usually requires a lot of resources, then scale back down to reduce infrastructure costs after training is done.
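As a rough sketch of the scale-down half of that cycle, the shell function below drains a worker node and then removes it from the cluster. The kc wrapper and the remove_node function name are illustrative and not part of Kubernetes, and what happens to the underlying machine afterward depends on your infrastructure.

```shell
#!/usr/bin/env bash
# kc is an illustrative wrapper so the kubectl binary can be swapped out;
# on K3s it could call "sudo k3s kubectl" instead.
kc() { kubectl "$@"; }

# remove_node NAME: evict workloads from a node, then delete it from the cluster.
remove_node() {
  local node="$1"
  # Drain marks the node unschedulable and evicts its pods.
  kc drain "$node" --ignore-daemonsets --delete-emptydir-data || return 1
  # Remove the node object once nothing is running on it.
  kc delete node "$node"
}
```

For example, running remove_node worker-1 after a training job would take a hypothetical node named worker-1 out of service.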

Figure: Machine learning project steps: identify a business problem, lay out the process and gather information from experts, choose and prepare data, choose and tune an algorithm, and retune based on results. Tools such as Kubeflow can speed up deployment of machine learning projects by standardizing and streamlining stages of the model development lifecycle.

Hosting machine learning workflows on Kubernetes also offers the advantage of giving containers access to bare-metal hardware. This is useful for accelerating workloads with GPUs or other specialized hardware that can be harder to reach from virtualized infrastructure. Although you could access bare-metal infrastructure without Kubernetes by running workloads in standalone containers, orchestrating containers with Kubernetes makes it easier to manage workloads at scale.

A major reason why you might not want to use Kubernetes to host machine learning workflows, however, is that it adds another layer of complexity to your software stack. For smaller workloads, a Kubernetes-based deployment might be overkill. In such situations, running workloads directly on VMs or bare-metal servers could make more sense.

When should you choose Kubeflow?

The chief advantage of using Kubeflow for machine learning is the tool's fast and simple deployment process. With just a few kubectl commands, you get a ready-to-use environment where you can start deploying machine learning workflows.

On the other hand, Kubeflow restricts you to the tools and frameworks it supports -- and might include some resources that you won't end up using. If you just need one or two specific machine learning tools, you might find it simpler to deploy them individually rather than with Kubeflow. But for anyone who needs a general-purpose machine learning environment on Kubernetes, it's hard to argue against using Kubeflow.

Kubeflow tutorial: Install and setup walkthrough

On most Kubernetes distributions, installing Kubeflow boils down to running just a few commands.

This tutorial demonstrates the process using K3s, a lightweight Kubernetes distribution that you can run on a laptop or PC, but you should be able to follow the same steps on any mainstream Kubernetes platform.

Step 1. Create a Kubernetes cluster

Start by creating a Kubernetes cluster if you don't already have one up and running.

To set up a cluster using K3s, first download and install K3s with the following command.

curl -sfL https://get.k3s.io | sh -

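Next, start a cluster. On Linux distributions that use systemd, the installer typically starts K3s as a service automatically; if the server isn't already running, start it with the command below.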

sudo k3s server &

To check that everything's running as expected, run the following command.

sudo k3s kubectl get node

The output should resemble the following.

NAME            STATUS   ROLES                  AGE    VERSION
chris-gazelle   Ready    control-plane,master   2m7s   v1.25.7+k3s1
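If you would rather script this check than read the output by eye, kubectl can block until every node reports a Ready condition. The following is a minimal sketch; the kc wrapper and the wait_for_nodes function name are illustrative, and the 120-second default timeout is an arbitrary assumption.

```shell
#!/usr/bin/env bash
# kc is an illustrative wrapper around the kubectl bundled with K3s.
kc() { sudo k3s kubectl "$@"; }

# wait_for_nodes [TIMEOUT]: block until all nodes are Ready or the timeout expires.
wait_for_nodes() {
  local timeout="${1:-120s}"
  kc wait --for=condition=Ready node --all --timeout="$timeout"
}
```

Calling wait_for_nodes 180s returns a nonzero exit status if any node is still not ready after three minutes, which makes it easy to stop a setup script before the Kubeflow install begins.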

Step 2. Install Kubeflow

With your cluster up and running, the next step is to install Kubeflow.

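Use the following commands to do this on a local machine using K3s. They install Kubeflow Pipelines, the Kubeflow component used to build and run pipelines, at the version set in PIPELINE_VERSION. The first command opens a root shell, which K3s typically requires for kubectl to read the cluster configuration.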

sudo -s
export PIPELINE_VERSION=1.8.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic-pns?ref=$PIPELINE_VERSION"

If you're installing Kubeflow on a nonlocal Kubernetes cluster, the commands below will work in most cases. Replace the placeholder in the first line with the Kubeflow Pipelines version you want to install.

export PIPELINE_VERSION=<kfp-version-between-0.2.0-and-0.3.0>
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/base/crds?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
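Both install sequences follow the same pattern: apply the cluster-wide resources, wait for the Application custom resource definition to be established, then apply the environment manifests. One way to avoid retyping the commands is to wrap that pattern in a function, as in the sketch below. The kc, kfp_ref and install_kfp names are illustrative and not part of Kubeflow; the manifest paths and version are passed in as arguments.

```shell
#!/usr/bin/env bash
# kc is an illustrative wrapper so the kubectl binary can be swapped out.
kc() { kubectl "$@"; }

# kfp_ref PATH VERSION: build a kustomize reference into the pipelines repository.
kfp_ref() {
  printf 'github.com/kubeflow/pipelines/manifests/kustomize/%s?ref=%s' "$1" "$2"
}

# install_kfp VERSION CRDS_PATH ENV_PATH: run the three install steps in order,
# stopping at the first one that fails.
install_kfp() {
  local version="$1" crds="$2" env="$3"
  kc apply -k "$(kfp_ref "$crds" "$version")" || return 1
  kc wait --for condition=established --timeout=60s crd/applications.app.k8s.io || return 1
  kc apply -k "$(kfp_ref "$env" "$version")"
}
```

With this in place, the local K3s install shown earlier corresponds to install_kfp 1.8.5 cluster-scoped-resources env/platform-agnostic-pns.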

Step 3. Verify that containers are running

Even after you install Kubeflow, it's not fully operational until all the containers that make it up are running. Verify the status of your containers with the following command.

kubectl get pods -n kubeflow

If the containers aren't running successfully after several minutes, check their logs to determine the cause.
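The two functions below sketch one way to automate the wait and the log check. The kc, wait_for_kubeflow and pod_debug names are illustrative, and the five-minute default timeout is an assumption. The wait also assumes every pod in the namespace is long-running; a pod that ran to completion never reports Ready and would cause a timeout.

```shell
#!/usr/bin/env bash
# kc is an illustrative wrapper so the kubectl binary can be swapped out.
kc() { kubectl "$@"; }

# wait_for_kubeflow [TIMEOUT]: block until all pods in the kubeflow namespace are Ready.
wait_for_kubeflow() {
  kc wait --for=condition=Ready pods --all -n kubeflow --timeout="${1:-300s}"
}

# pod_debug POD: print a pod's events and status, then its most recent log lines.
pod_debug() {
  local pod="$1"
  kc describe pod "$pod" -n kubeflow
  kc logs "$pod" -n kubeflow --all-containers --tail=50
}
```

If wait_for_kubeflow times out, pass the name of any pod that is not ready, as listed by kubectl get pods -n kubeflow, to pod_debug.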

Step 4. Start using Kubeflow

Kubeflow provides a web-based dashboard to create and deploy pipelines. To access that dashboard, first set up port forwarding by running the command below.

kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

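If you're running Kubeflow locally, you can access the dashboard by opening a web browser to the URL http://localhost:8080. If you installed Kubeflow to a remote machine, replace localhost with the IP address or server hostname where you're running Kubeflow. Note that kubectl port-forward listens only on localhost by default, so reaching the dashboard from another machine requires adding the --address flag, such as --address 0.0.0.0, to the port-forward command.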
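To confirm from the command line that the forwarded port is answering before you open a browser, a short curl check is enough. The sketch below assumes the port-forward command is still running in another terminal; the dashboard_url and check_dashboard names are illustrative.

```shell
#!/usr/bin/env bash
# dashboard_url [HOST] [PORT]: build the dashboard address, defaulting to localhost:8080.
dashboard_url() {
  printf 'http://%s:%s' "${1:-localhost}" "${2:-8080}"
}

# check_dashboard [HOST] [PORT]: report whether the dashboard answers HTTP requests.
check_dashboard() {
  local url
  url="$(dashboard_url "$@")"
  # -f fails on HTTP errors; -sS hides the progress meter but keeps error messages.
  curl -fsS -o /dev/null --max-time 5 "$url" && echo "dashboard reachable at $url"
}
```

Running check_dashboard with no arguments tests the local address. Pass a hostname and port to test a remote one.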
