Although Bodywork is focused on deploying machine learning projects, it is flexible enough to deploy almost any type of Python project. We're going to demonstrate this by using Bodywork to deploy a production-ready instance of MLflow (a Flask app), to Kubernetes, in only a few minutes.
MLflow is a popular open-source tool for managing various aspects of the machine learning lifecycle, such as tracking training metrics or versioning models. It can be used alongside Bodywork's machine learning deployment capabilities to make for a powerful open-source MLOps stack.
All of the files mentioned below can be found in the bodywork-mlflow repository on GitHub. You can use this repo, together with this guide, to deploy MLflow to your Kubernetes cluster. Alternatively, you can use this repo as a template for deploying your own Python project.
Once we get MLflow deployed, we'll demonstrate it in action by training a model, tracking metrics and storing artefacts in the model registry. We'll also discuss patterns for integrating MLflow with Bodywork machine learning pipelines and how you can add monitoring and alerting to the deployment, by integrating with Sentry.
If you already have access to a Kubernetes cluster, then install the Bodywork Python package using Pip and head straight down to 'Deploying to Kubernetes'. If you're new to Kubernetes, then don't worry - we've got you covered - read on.
Bodywork enables you to map executable Python modules to Kubernetes primitives - jobs and deployments. All you need to do is add a single configuration file, `bodywork.yaml`, to your project to describe how you want Bodywork to deploy your application - i.e., which executable Python scripts should be deployed as jobs (with a well-defined start and end), and which should be deployed as service-deployments (with no scheduled end).
Based on the contents of `bodywork.yaml`, Bodywork creates a deployment plan and configures Kubernetes to execute it, using pre-built Bodywork containers for running Python modules. Bodywork containers use Git to pull your project's codebase from your remote Git repository, removing the need to build and manage bespoke container images.
Each unit of deployment is referred to as a stage and runs using its own Bodywork container. You are free to specify as many stages as your project requires. Stages can be executed sequentially and/or in parallel - you have the flexibility to specify a deployment workflow (or DAG).
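To make the idea of a workflow concrete, here is a minimal sketch in Python. It assumes a workflow written as a string in which `>>` separates steps that run one after another and commas separate stages that run in parallel within a step. The stage names and the `parse_workflow` helper are hypothetical and are not part of Bodywork's API.

```python
from typing import List


def parse_workflow(dag: str) -> List[List[str]]:
    """Split a workflow string into sequential steps of parallel stages.

    Assumed syntax (for illustration): ">>" separates sequential steps
    and "," separates stages that run in parallel within one step.
    """
    steps = []
    for step in dag.split(">>"):
        stages = [name.strip() for name in step.split(",") if name.strip()]
        if not stages:
            raise ValueError(f"empty step in workflow: {dag!r}")
        steps.append(stages)
    return steps


# Hypothetical stage names, for illustration only.
for number, stages in enumerate(parse_workflow("prepare >> train_a, train_b >> serve"), 1):
    print(f"step {number}: {', '.join(stages)}")
```

A project with a single stage, such as the MLflow deployment in this guide, reduces to one step containing one stage.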
It is precisely this combination of jobs, service-deployments, workflows and using Git repos as a means of distributing your codebase into pre-built container images that makes Bodywork a powerful tool for quickly deploying machine learning projects. But it will also easily deploy something simpler, like MLflow.
The `bodywork.yaml` for our MLflow deployment is reproduced below. In what follows we will give a brief overview of the deployment it describes.
The name of the project, used to name Kubernetes resources created by the deployment.
The container image to use for running your Python executables - must be hosted from a public Docker Hub repository.
The deployment workflow. For this project, we have just one 'stage' to deploy, which we have named 'server'.
A deployment stage object that we have named 'server'. The key-value pairs that follow describe how 'server' will be deployed by Bodywork.
Path to the executable Python module within the project's Git repo, that you want Bodywork to run for this stage.
A list of Python package dependencies, required by the executable Python module. These will be installed using Pip from inside the Bodywork container, when it is created.
Encrypted credentials and other secrets can be mounted into Bodywork containers as environment variables. This set of key-value pairs defines the name of the encrypted value and the Kubernetes secret in which to find it.
CPU resource to request from Kubernetes.
Memory resource to request from Kubernetes.
Bodywork stages can be one of two types: batch or service. The 'server' stage is a service, so we need to supply it with service-specific configuration: the port to open on the container, the number of container replicas to create, whether to open a public HTTP ingress route to the service and how long to wait for the service to successfully start-up.
For a complete discussion of how to configure a Bodywork deployment, refer to the User Guide.
Bodywork is distributed as a Python package, that exposes a Command Line Interface (CLI) for configuring Kubernetes to deploy Python projects, directly from remote Git repositories (e.g. GitHub). Start by creating a new Python virtual environment and installing Bodywork,
If you have never worked with Kubernetes before, then please don't stop here. We have written a guide to Getting Started with Kubernetes for MLOps, that will explain the basic concepts and have you up-and-running with a single-node cluster on your machine, in under 10 minutes.
Should you want to deploy to a cloud-based cluster in the future, you need only to follow the same steps while pointing to your new cluster. This is one of the key advantages of Kubernetes - you can test locally with confidence that your production deployments will behave in the same way.
Before we deploy to our Kubernetes cluster, we'll configure and run the server locally. Start by installing MLflow,
And then cloning the GitHub repo containing the MLflow deployment project,
We have written a custom script for starting the MLflow server - mlflow_server.py. This is the executable Python module that we have configured the Bodywork container to run in the 'server' deployment stage described in `bodywork.yaml`.
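For orientation, a start-up script of this kind might look roughly like the sketch below. This is not the repository's actual `mlflow_server.py`; it assumes the script reads two environment variables, `MLFLOW_BACKEND_STORE_URI` and `MLFLOW_DEFAULT_ARTIFACT_ROOT`, and passes them to the `mlflow server` command. The helper names and the host and port values are illustrative.

```python
import os
import subprocess
from typing import List, Mapping


def build_server_command(env: Mapping[str, str]) -> List[str]:
    """Build the `mlflow server` command from environment-style settings."""
    try:
        backend_store_uri = env["MLFLOW_BACKEND_STORE_URI"]
        artifact_root = env["MLFLOW_DEFAULT_ARTIFACT_ROOT"]
    except KeyError as missing:
        raise RuntimeError(f"environment variable {missing} is not set") from None
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", artifact_root,
        "--host", "0.0.0.0",  # listen on all interfaces inside the container
        "--port", "5000",
    ]


def main() -> int:
    """Start the server; needs MLflow installed so `mlflow` is on the PATH."""
    return subprocess.call(build_server_command(os.environ))


# Assumed local settings: a SQLite file and a folder on the local file-system.
local_settings = {
    "MLFLOW_BACKEND_STORE_URI": "sqlite:///mlflow.db",
    "MLFLOW_DEFAULT_ARTIFACT_ROOT": "mlflow_artefacts",
}
print(" ".join(build_server_command(local_settings)))
```

Keeping the command construction separate from the call that launches it makes the configuration easy to inspect without starting a server; a real script would call `main()` at its entry point.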
The quickest way to get started is to use Python's in-built SQLite database as a back-end for MLflow to store things like model metrics, and to use the local file-system for storing artefacts, such as trained models. Our MLflow startup-script will look for this configuration in environment variables, so we will need to export these locally,
This will cause MLflow to create a `mlflow.db` database file, together with a folder called `mlflow_artefacts`. Now start the server,
To provide concurrent connections to multiple users from multiple service replicas, MLflow will need to use a database service that can support this scenario, as a backend store. Similarly, to make model artefacts available to anyone, it will need to use a common storage service. We are AWS users here at Bodywork HQ, so we have opted to use S3 for storing artefacts and an AWS managed Postgres database instance. Managed cloud services allow you to easily scale-out when required and also offer the convenience of automated backups, upgrades, etc. (albeit, at a price).
To test this setup locally, we need to install some more Python packages,
See the MLflow documentation for how to configure MLflow to use your chosen database and artefact store. To their credit, they provide a lot of options.
Now, set the MLflow environment variables for our chosen production setup.
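For a Postgres backend and an S3 artefact store, these values take the form of a SQLAlchemy-style database URI and an `s3://` location. The sketch below assembles both in Python and prints them as `export` lines. Every concrete value (user, password, host, database and bucket names) is a placeholder, and the two helper functions are hypothetical.

```python
import shlex
from urllib.parse import quote_plus


def postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Build a Postgres URI, escaping characters such as '@' or '/' in the credentials."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


def s3_uri(bucket: str, prefix: str = "") -> str:
    """Build an s3:// location, with an optional key prefix inside the bucket."""
    prefix = prefix.strip("/")
    return f"s3://{bucket}/{prefix}" if prefix else f"s3://{bucket}"


# Placeholder values only; substitute the details of your own services.
settings = {
    "MLFLOW_BACKEND_STORE_URI": postgres_uri(
        "mlflow_user", "p@ss/word", "db.example.com", 5432, "mlflow"
    ),
    "MLFLOW_DEFAULT_ARTIFACT_ROOT": s3_uri("example-bucket", "mlflow/artefacts"),
}
for name, value in settings.items():
    print(f"export {name}={shlex.quote(value)}")
```

Escaping the credentials matters because an unescaped `@` or `/` in a password would otherwise be read as part of the URI's structure.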
We also need to make sure that our local machine is configured to use the AWS CLI, or that the following environment variables have been set:
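Assuming the standard variables read by the AWS SDKs, namely `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION`, a short check like the following sketch can confirm they are present before the server is restarted. The `missing_aws_settings` helper is hypothetical.

```python
import os
from typing import List, Mapping

# Standard AWS SDK variable names. The region is not always strictly
# required for S3 access, but it is commonly set alongside the keys.
AWS_VARIABLES = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_settings(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of the AWS variables that are unset or empty."""
    return [name for name in AWS_VARIABLES if not env.get(name)]


missing = missing_aws_settings()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("AWS settings found in the environment.")
```

The check only inspects environment variables, so it will report them as missing if your credentials come from the AWS CLI's configuration files instead.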
Now re-start the server to test the connection.
As before, open a browser at `http://localhost:5000` to access the MLflow UI.
Start by configuring a new namespace for use with Bodywork,
Then check the `bodywork.yaml` for errors,
Create a Kubernetes secret that contains values for `MLFLOW_BACKEND_STORE_URI` and `MLFLOW_DEFAULT_ARTIFACT_ROOT`, to be mounted as environment variables into the containers running MLflow. If you don't have a database and/or cloud object storage available and just want to play with a toy deployment, then use the defaults shown below.
If you want to use cloud object storage, then create a secret to contain your cloud access credentials - e.g., for AWS we would use,
Now deploy MLflow, using the Bodywork deployment described in the bodywork-mlflow GitHub repo,
This will run the workflow-controller (locally) to orchestrate the deployment and will stream logs to stdout.
Wait until the deployment has finished and then create a local proxy into the cluster,
Now open a browser to the location of the MLflow service on your cluster,
If you want to run this notebook locally, then you'll need to install the following Python packages,
And then fire-up Jupyter Lab,
Kubernetes greatly simplifies networking between services in the cluster. From within the cluster, you will be able to access the tracking server using the following URL,
Any Bodywork stage can make use of the tracking server at this location, by setting the tracking URI (as we did in the demo notebook),
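Kubernetes gives each service a DNS name of the form `<service>.<namespace>.svc.cluster.local`. As a sketch, the helper below assembles a tracking URI from that pattern and hands it to `mlflow.set_tracking_uri`. The service name, namespace and port used here are placeholders to be replaced with the values from your own deployment, and both helper functions are hypothetical.

```python
def cluster_tracking_uri(service: str, namespace: str, port: int = 5000) -> str:
    """Build the cluster-internal URL of a Kubernetes service."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"


def use_cluster_tracking_server(service: str, namespace: str, port: int = 5000) -> str:
    """Point the MLflow client in this process at the in-cluster tracking server."""
    import mlflow  # imported lazily, so the URL helper works without MLflow installed

    uri = cluster_tracking_uri(service, namespace, port)
    mlflow.set_tracking_uri(uri)
    return uri


# Placeholder service and namespace names, for illustration only.
print(cluster_tracking_uri("mlflow-server", "mlflow"))
```

This address only resolves from inside the cluster, so code running on your own machine still needs the local proxy or a public ingress route.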
This enables us to revisit the Bodywork Quickstart Tutorial for serving a model. This project also provides predictions for the iris classification task, as used in the MLflow demo notebook. In this example deployment, we used the joblib package to load a model that was persisted as an artefact in the project's repo (not a best-practice, albeit pragmatic). We can now modify service.py to collect the latest 'production' model from MLflow. If we assume that this is the same 'production' version as the one trained in the MLflow demo notebook, then all we need to do is swap these units of code,
For these ones,
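As a rough sketch of what an MLflow-based replacement could look like (not the code from the tutorial repository), the function below asks the model registry for whichever version currently sits in the 'Production' stage. It assumes the model was logged with MLflow's scikit-learn flavour and registered under the hypothetical name `iris-classifier`, and that the tracking URI has already been set.

```python
def registry_model_uri(name: str, stage: str = "Production") -> str:
    """Build a model registry URI that resolves to the latest version in a stage."""
    return f"models:/{name}/{stage}"


def load_production_model(name: str):
    """Fetch the current 'Production' model from the MLflow model registry."""
    import mlflow.sklearn  # imported lazily; requires MLflow and scikit-learn

    return mlflow.sklearn.load_model(registry_model_uri(name))


# Hypothetical registered model name, for illustration only.
print(registry_model_uri("iris-classifier"))
```

Because the URI names a stage rather than a fixed version, the service picks up a newly promoted model the next time its start-up code runs.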
We can also take this one step further and create a Bodywork cronjob, that will re-deploy (and hence re-run the server start-up code), on a schedule. For example, issuing the following command,
will cause Bodywork to trigger a rolling re-deployment of the prediction web API every hour. Each time, it will load the most recent version of the model that has been pushed to 'production' - either manually, or as part of an automated re-training pipeline. This demonstrates how Bodywork can be used to implement continuous delivery for machine learning.
Here at Bodywork HQ, we are becoming fans of using Sentry for monitoring Python services and generating alerts. Sentry is free for individual users, but charges for business use (i.e. Teams). We have set up `mlflow_server.py` to configure Sentry, if it has been installed and configured for your deployment.
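An optional integration of this kind can be sketched as follows. This is an illustration rather than the code in `mlflow_server.py`: it assumes the client key is supplied in an environment variable named `SENTRY_DSN`, and it only enables Sentry when both that variable and the `sentry_sdk` package are available. The `configure_sentry` helper is hypothetical.

```python
import os
from typing import Mapping


def configure_sentry(env: Mapping[str, str] = os.environ) -> bool:
    """Initialise Sentry if a client key and the SDK are both available.

    Returns True when Sentry was initialised and False otherwise, so the
    calling script can carry on regardless of the outcome.
    """
    dsn = env.get("SENTRY_DSN")  # assumed variable name for the client key
    if not dsn:
        return False
    try:
        import sentry_sdk  # optional dependency
    except ImportError:
        return False
    sentry_sdk.init(dsn=dsn)
    return True


if configure_sentry():
    print("Sentry monitoring enabled.")
else:
    print("Sentry not configured; continuing without it.")
```

Treating both the package and the key as optional means the same script runs unchanged in deployments that do not use Sentry.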
To ensure that Sentry gets installed in the Bodywork containers running the MLflow service replicas, modify `bodywork.yaml` to include the Sentry Python client in the list of requirements.
Then, modify `bodywork.yaml` again to retrieve the Sentry client key from a Kubernetes secret called sentry-integration,
Create the Kubernetes secret containing your secret client key,
Finally, re-deploy the server,
which we trigger remotely this time, as there is no need to observe the logs as they are generated.
Congratulations - you now have a production-worthy deployment of MLflow!