Quick start guide
- Prerequisites
- Introduction
- MLflow UI server
- MLflow runner for training ML models
Prerequisites
- Create a project and add an SSH key
- Optionally download CLI tool
Introduction
In this deployment of MLflow we will set up one CUDO Compute virtual machine to serve the MLflow UI/web app and store the models and metrics from runs. We will then use a second CUDO Compute virtual machine to perform training; you can run as many of these as you like concurrently, and they only need to run for the duration of training. Optionally, you can run the web app on your local machine if you are able to configure your network so that you have a port publicly accessible.
MLflow UI server
Start a virtual machine on CUDO Compute; this can be CPU only, no GPU is required. Use the Ubuntu Minimal 20.04 image and pick a machine with 8 GB RAM or more. This virtual machine should remain running for the duration of your work.
Get the IP address of the virtual machine, replace tracking_ip in the commands below with that address, and then run the commands.
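The exact commands are not reproduced here, but the server setup can be sketched as follows. This assumes pip is available on the VM and that MLflow's SQLite backend and artifact directory match the paths mentioned below (~/mlruns.db and ~/mlruns); the placeholder IP is yours to fill in.

```shell
# Install MLflow (assumes Python 3 and pip are available on the VM)
sudo apt-get update && sudo apt-get install -y python3-pip
pip3 install mlflow

# Serve the MLflow UI on port 5000, binding to all interfaces so the
# training machine can reach it. Run metadata goes to ~/mlruns.db and
# artifacts (models, metrics) to ~/mlruns.
mlflow server \
  --host 0.0.0.0 \
  --port 5000 \
  --backend-store-uri sqlite:///$HOME/mlruns.db \
  --default-artifact-root $HOME/mlruns
```

Training machines will then point their MLFLOW_TRACKING_URI at http://tracking_ip:5000, where tracking_ip is the public IP address of this VM.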
MLflow stores run metadata in the ~/mlruns.db file and artifacts in the ~/mlruns directory.
MLflow UI server on a local machine
Make sure port 5000 of your local machine is publicly accessible.
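One quick way to confirm the port is reachable is to request the UI from outside your network; the IP below is a placeholder for your machine's public address.

```shell
# From a machine outside your network, check that the MLflow UI responds
# (replace <public-ip> with your local machine's public IP)
curl -I http://<public-ip>:5000
```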
MLflow runner for training ML models
Start another virtual machine on CUDO Compute; this can be a CPU-only or GPU machine. Use the Ubuntu 22.04 + NVIDIA drivers + Docker image.
The script below pulls a Docker container for MLflow; MLflow then pulls a GitHub repository and runs it. The repository is configured as an MLflow Project, so when MLflow runs it creates a conda environment, installs the necessary Python packages, and then runs the model training.
The training script logs its output to the MLFLOW_TRACKING_URI.
Get the IP address of the CUDO Compute virtual machine used for training and replace runner_ip with it.
Get the IP address of the CUDO Compute virtual machine used for the MLflow UI and replace tracking_ip with it.
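The runner step can be sketched as below. The container image name and the GitHub repository URL are assumptions (the guide's actual repository is not shown here), and the conda environment resolution depends on the image having conda available; treat this as a sketch rather than the exact script.

```shell
# runner_ip is the training VM's address (used to SSH in); tracking_ip
# is the MLflow UI server's address. Both are placeholders.
tracking_ip=<ui-vm-ip>

# SSH into the training VM (runner_ip), then pull the MLflow image and
# run an MLflow Project straight from GitHub. MLflow clones the repo,
# builds the conda environment declared in its MLproject file, and
# launches the training entry point, logging to MLFLOW_TRACKING_URI.
docker run --rm --gpus all \
  -e MLFLOW_TRACKING_URI=http://$tracking_ip:5000 \
  ghcr.io/mlflow/mlflow \
  mlflow run https://github.com/<your-user>/<your-project-repo>
```

Because the run only needs to live for the duration of training, you can launch several of these VMs concurrently, each logging to the same tracking server.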