How to Build Machine Learning Pipelines with ZenML

TL;DR

The Goal: Build scalable, modular, and reproducible machine learning pipelines that can transition seamlessly from local development to cloud production.
The Stack: ZenML, Python, and your choice of MLOps integrations (e.g., MLflow for tracking, local/cloud orchestrators).
The Outcome: A fully functional ML pipeline that manages data loading, preprocessing, model training, and artifact tracking without infrastructure spaghetti code.

Why Build This?

Moving machine learning models from Jupyter Notebooks into production has long been a major bottleneck. Data scientists write monolithic scripts that work perfectly on their local machines but break the moment they hit a staging or production server. This happens because code, data, and infrastructure are tightly coupled. When you need to swap your local compute for a Kubernetes cluster, or change your tracking from local JSON files to MLflow, you end up rewriting significant portions of your pipeline.

ZenML addresses this by abstracting the infrastructure layer away from your pipeline code. It introduces the concept of a "Stack": a decoupled configuration that defines where and how your code executes. Building with ZenML also pushes you toward modularity: your data loader is a discrete step, your preprocessor is another, and your training loop is a third. This architecture makes runs easier to reproduce and lets you swap infrastructure components with a few CLI commands, so your ML workflows are structured for production from day one.

The Architecture

ZenML operates on a client-server model that tracks metadata across your ML lifecycle.

At the core, you have Steps and Pipelines. A Step is a single unit of work (e.g., training a model), and a Pipeline is a Directed Acyclic Graph (DAG) that connects these steps.
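To make the DAG idea concrete, here is a minimal plain-Python sketch (not ZenML code) that orders three steps by their dependencies using the standard library's graphlib module, available from Python 3.9. The step names mirror the ones used later in this tutorial; the dependencies mapping is only an illustrative assumption about how such a graph could be represented.

```python
from graphlib import TopologicalSorter

# Map each step to the set of steps whose outputs it consumes.
dependencies = {
    "load_data": set(),
    "preprocess_data": {"load_data"},
    "train_model": {"preprocess_data"},
}


def execution_order(deps: dict) -> list:
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)  # ['load_data', 'preprocess_data', 'train_model']
```

Because the graph is acyclic, a valid run order always exists. If two steps depended on each other, TopologicalSorter would raise a CycleError instead of returning an order.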

When a pipeline runs, the execution is governed by the active Stack. A Stack is a collection of integrations. For example, a basic local stack might consist of a local orchestrator (to run the code) and a local artifact store (to save outputs). A production stack might use Vertex AI as the orchestrator, an S3 bucket as the artifact store, and MLflow as the experiment tracker.

The ZenML Server acts as the central nervous system. It stores the metadata of every run, ensuring you know exactly which version of data produced which model artifact. The code you write remains identical whether you are running the pipeline on your laptop or on an enterprise cloud cluster.
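The decoupling described here can be illustrated with a small toy model. The sketch below is plain Python, not ZenML's API: ArtifactStore, InMemoryStore, JsonFileStore, and run_pipeline are hypothetical names invented for this example. It shows the same step functions running unchanged against two different storage backends, which is the role a Stack plays in ZenML.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List, Protocol


class ArtifactStore(Protocol):
    """Anything that can persist a named step output."""

    def save(self, name: str, value: Any) -> None: ...


class InMemoryStore:
    """Keeps artifacts in a dict (stands in for a local stack)."""

    def __init__(self) -> None:
        self.artifacts: Dict[str, Any] = {}

    def save(self, name: str, value: Any) -> None:
        self.artifacts[name] = value


class JsonFileStore:
    """Writes each artifact to a JSON file (stands in for remote storage)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def save(self, name: str, value: Any) -> None:
        (self.root / f"{name}.json").write_text(json.dumps(value))


def run_pipeline(steps: List[Callable[[Any], Any]], store: ArtifactStore) -> Any:
    """Run steps in order, feeding each output to the next and saving it."""
    value: Any = None
    for step_fn in steps:
        value = step_fn(value)
        store.save(step_fn.__name__, value)
    return value


def load(_: Any) -> list:
    return [1, 2, 3]


def double(values: list) -> list:
    return [v * 2 for v in values]


# Same step code, two different "stacks".
memory_store = InMemoryStore()
run_pipeline([load, double], memory_store)

file_root = Path(tempfile.mkdtemp())
run_pipeline([load, double], JsonFileStore(file_root))
```

The step functions never mention where their outputs go. Only the object passed to run_pipeline changes, which is the property that lets infrastructure be swapped without editing pipeline code.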

Step-by-Step Implementation

Initializing the Environment and Server

First, you need to install ZenML and initialize a local project. The ZenML server provides a dashboard to visualize your DAGs and artifacts.

# Install ZenML with server capabilities
pip install "zenml[server]"

# Initialize ZenML in your project directory
zenml init

# Start the local dashboard and server
zenml login --local

The zenml init command creates a .zen directory that marks the project root and holds its local ZenML configuration. The local server provides immediate visual feedback for your runs.

Defining Pipeline Steps

In ZenML, you define individual tasks as standard Python functions decorated with @step. These steps are modular and reusable. When a step executes, ZenML automatically captures its inputs and outputs, saving them in the artifact store.

# Annotated is available from the typing module on Python 3.9+
from typing import Annotated

from zenml import step

@step
def load_data() -> Annotated[dict, "raw_data_dict"]:
    """Simulates loading raw data."""
    # In a real scenario, you would fetch from a database or storage bucket
    data = [[1.2, 3.4], [5.6, 7.8], [9.1, 2.3]]
    labels = [0, 1, 0]
    return {'features': data, 'labels': labels}

@step
def preprocess_data(data: dict) -> Annotated[dict, "processed_data"]:
    """Scales each feature by a constant factor (a stand-in for real preprocessing)."""
    # Mock preprocessing logic; replace with real normalization as needed
    features = [[x * 10 for x in row] for row in data['features']]
    return {'features': features, 'labels': data['labels']}

@step
def train_model(data: dict) -> None:
    """Mock training loop."""
    print(f"Training on {len(data['features'])} samples...")
    # Add actual training logic here (e.g., using scikit-learn or PyTorch)

Notice how type hinting and the Annotated type are used. ZenML uses these to enforce data contracts between steps and to label the artifacts stored in your backend.
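If you are curious how a name attached with Annotated can be recovered at runtime, the following standard-library sketch shows one way a framework could do it. It is not ZenML's internal implementation; describe_output is a hypothetical helper written for this example, and load_data is an undecorated copy of the step above.

```python
from typing import Annotated, get_args, get_origin, get_type_hints


def load_data() -> Annotated[dict, "raw_data_dict"]:
    return {"features": [[1.2, 3.4]], "labels": [0]}


def describe_output(func) -> tuple:
    """Return (python_type, artifact_name) for a function's return annotation."""
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    hint = get_type_hints(func, include_extras=True)["return"]
    if get_origin(hint) is Annotated:
        base_type, *metadata = get_args(hint)
        # Use the first string in the metadata as the artifact name.
        name = next((m for m in metadata if isinstance(m, str)), None)
        return base_type, name
    return hint, None


output_type, artifact_name = describe_output(load_data)
print(output_type, artifact_name)  # <class 'dict'> raw_data_dict
```

The base type (dict) is what a framework can check against the next step's input annotation, while the string gives the stored output a readable name.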

Orchestrating the Pipeline

Once your steps are defined, you wire them together using the @pipeline decorator. The pipeline definition is declarative; it simply states how the outputs of one step flow into the inputs of the next.

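# Assumes load_data, preprocess_data, and train_model from the previous
# section are defined in, or imported into, this file.
from zenml import pipeline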

@pipeline
def baseline_ml_pipeline():
    """Connects the steps into a Directed Acyclic Graph."""
    raw_data = load_data()
    processed_data = preprocess_data(raw_data)
    train_model(processed_data)

if __name__ == "__main__":
    # Execute the pipeline
    baseline_ml_pipeline()

When you run this script (python pipeline.py), ZenML executes the steps in dependency order, reuses cached outputs for steps whose code and inputs have not changed, writes artifacts to the active stack's artifact store, and records run metadata in the ZenML server.
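The caching behaviour is easier to reason about with a toy model. The sketch below is plain Python and only approximates the idea; ZenML's real cache key logic is more involved. The names cache_key, run_cached, and the code_version strings are hypothetical and exist only for this illustration.

```python
import hashlib
import json
from typing import Any, Callable, Dict

_cache: Dict[str, Any] = {}
call_count = {"preprocess": 0}


def cache_key(step_name: str, code_version: str, inputs: Any) -> str:
    """Hash the step name, its code version, and its JSON-serialised inputs."""
    payload = json.dumps(
        {"step": step_name, "code": code_version, "inputs": inputs},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(step_fn: Callable[[Any], Any], code_version: str, inputs: Any) -> Any:
    """Run step_fn only if no result is stored for this exact key."""
    key = cache_key(step_fn.__name__, code_version, inputs)
    if key not in _cache:
        _cache[key] = step_fn(inputs)
    return _cache[key]


def preprocess(rows: list) -> list:
    call_count["preprocess"] += 1
    return [[x * 10 for x in row] for row in rows]


rows = [[1.2, 3.4], [5.6, 7.8]]
first = run_cached(preprocess, "v1", rows)   # executes the step
second = run_cached(preprocess, "v1", rows)  # served from the cache
third = run_cached(preprocess, "v2", rows)   # code version changed, so it re-runs
print(call_count["preprocess"])  # 2
```

The step body runs twice rather than three times: the second call has the same name, code version, and inputs as the first, so its stored result is reused.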

Managing Stacks for Production

The true power of ZenML is in its Stack management. If you want to scale this pipeline to run on Kubernetes and track experiments with MLflow, you do not touch your Python code. You configure a new stack via the CLI.

# Install the integrations these stack components rely on
zenml integration install mlflow s3 kubernetes

# Register an MLflow experiment tracker
zenml experiment-tracker register mlflow_tracker --flavor=mlflow

# Register an S3 artifact store
zenml artifact-store register s3_artifact_store --flavor=s3 --path=s3://my-bucket/zenml

# Register a cloud orchestrator (e.g., Kubernetes)
zenml orchestrator register k8s_orchestrator --flavor=kubernetes

# Compose them into a stack and set it as active
zenml stack register prod_stack -o k8s_orchestrator -a s3_artifact_store -e mlflow_tracker
zenml stack set prod_stack

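With prod_stack active, the same python pipeline.py command builds a container image for your code, runs the steps as pods on Kubernetes, and stores artifacts in S3. Steps that are configured to use the experiment tracker can log metrics to MLflow. Note that remote orchestrators such as Kubernetes generally also need a container registry in the stack, so the image can be pushed somewhere the cluster can pull from. Check the official ZenML Stack documentation for deep dives into specific integrations.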

The Results & Takeaways

Absolute Reproducibility: Every run is versioned. You can trace which pipeline run and which version of the data produced a given model in the artifact store.
Decoupled Infrastructure: Your Python code is clean and focused entirely on ML logic, while the infrastructure is managed via declarative Stacks.
Seamless Scalability: Moving from local compute to cloud orchestrators requires zero code changes, drastically reducing the friction between data science and DevOps teams.

What you can build next: Integrate this pipeline with a model deployer like BentoML to automatically serve the model once the training step completes, or hook it into a CI/CD pipeline using GitHub Actions to trigger retraining runs on a schedule.

Frequently Asked Questions

Why use ZenML instead of raw Python scripts or Airflow?

Raw Python scripts lack built-in artifact tracking, infrastructure decoupling, and reproducibility. While Airflow is an excellent generic orchestrator, it is not purpose-built for ML; it lacks native concepts like Artifact Stores and Model Registries. ZenML provides a specialized, ML-first abstraction layer that can run on top of Airflow if needed.