# MLflow Integration and Deployment
This document provides a comprehensive guide to integrating MLflow for experiment tracking and model management. It covers connecting a client application to a remote MLflow server and deploying a robust tracking server on AWS.
## 1. Connecting a Client to MLflow
This section explains how a client application (e.g., a prediction API in a Docker container) securely connects to a remote MLflow server to fetch models and artifacts.
### 1.1. The Connection Mechanism
The client connection is a two-step process managed by environment variables.
1. **Initial Connection to the Tracking Server:**
   - The client application needs the address of the MLflow tracking server on the EC2 instance.
   - This is set using the `MLFLOW_TRACKING_URI` environment variable.
   - The MLflow client library automatically uses this variable to initiate a network request to the server.

2. **Fetching Artifacts from S3:**
   - After connecting, the tracking server tells the client where the model files are located (typically an S3 bucket path).
   - The client then needs permissions to access S3, which are provided via AWS credentials.
   - These are set using the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_DEFAULT_REGION` environment variables.
### 1.2. Automatic Credential Detection
You do not need to install the AWS CLI or run `aws configure` inside the Docker container. The AWS SDK for Python (boto3), used by MLflow, automatically finds and uses credentials from environment variables.

When boto3 detects `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in the container's environment, it authenticates with AWS services like S3 automatically.
### 1.3. Production Best Practice: Environment Files
The most secure and standard method for providing credentials to a Docker container is using an environment file (e.g., prediction_app.env).
> **Warning:** This file contains sensitive credentials and must never be committed to version control. Add it to your `.gitignore` file.
1. **Create the `.env` file:**

   ```bash
   # ----------------------------------
   # MLflow Production Configuration
   # ----------------------------------
   # The public IP or domain of your EC2 instance running the MLflow server.
   MLFLOW_TRACKING_URI="http://<YOUR_EC2_IP_ADDRESS>:5000"

   # ----------------------------------
   # AWS Credentials for MLflow Artifacts
   # ----------------------------------
   # Credentials for an IAM user with read-only access to your S3 artifact bucket.
   AWS_ACCESS_KEY_ID="<YOUR_AWS_ACCESS_KEY_ID>"
   AWS_SECRET_ACCESS_KEY="<YOUR_AWS_SECRET_ACCESS_KEY>"
   AWS_DEFAULT_REGION="<YOUR_S3_BUCKET_REGION>"
   ```
2. **Run the container with the `--env-file` flag:** This command injects the variables into the container at runtime.

   ```bash
   docker run --env-file ./src/prediction_server/prediction_app.env -p 9000:9000 your-image-name
   ```
## 2. Deploying an MLflow Tracking Server on AWS
This section provides a step-by-step guide to setting up a robust MLflow Tracking Server using EC2 (server), S3 (artifact storage), and RDS (backend database).
### 2.1. Step 1: Create an S3 Bucket for Artifacts
1. **Navigate to S3:** Log into your AWS account and go to the S3 service.
2. **Create Bucket:** Click "Create bucket" and configure the following:
   - **Bucket name:** Must be globally unique (e.g., `yourname-mlflow-artifacts-2025`).
   - **AWS Region:** Choose a region (e.g., `ap-south-1`). Important: launch all other services (EC2, RDS) in the same region.
   - **Block Public Access:** Keep "Block all public access" checked for security.
   - **Bucket Versioning:** Enable to protect against accidental data loss.
   - **Tags (Recommended):** Add a tag for cost tracking (e.g., `Key: Project`, `Value: mlflow-server`).
   - **Default encryption:** Keep the default (SSE-S3).
3. **Finalize:** Review your settings and click "Create bucket".
### 2.2. Step 2: Create a PostgreSQL Database with RDS
1. **Navigate to RDS:** In the AWS Console, go to the RDS service.
2. **Create Database:** Click "Create database" and follow the wizard:
   - **Creation method:** Select "Standard Create".
   - **Engine:** Choose "PostgreSQL".
   - **Templates:** Select the "Free tier" template.
   - **Settings:**
     - **DB instance identifier:** `mlflow-db`.
     - **Master username:** `mlflow_user`.
     - **Master password:** Create a strong password and store it securely.
   - **Connectivity:**
     - **Public access:** Select "No".
     - **VPC security group:** Choose "Create new" and name it `mlflow-db-security-group`.
   - **Additional configuration:**
     - **Initial database name:** Enter `mlflow_db`. This is crucial: if left blank, RDS creates no database inside the instance, and the MLflow server will have nothing to connect to.
3. **Finalize:** Review the settings and click "Create database".

> Note: Database creation can take 10-15 minutes.
### 2.3. Step 3: Launch an EC2 Virtual Server
1. **Navigate to EC2:** Go to the EC2 service in the AWS Console.
2. **Launch Instance:** Click "Launch instance" and configure:
   - **Name:** `mlflow-server`.
   - **AMI:** Select Ubuntu (Free tier eligible).
   - **Instance type:** Choose `t2.micro` (Free tier eligible).
   - **Key pair (login):**
     - Click "Create new key pair", name it `mlflow-key`, and keep the defaults.
     - The `.pem` file will download. Store this file securely.
   - **Network settings:**
     - Click "Edit".
     - Create a new security group (`mlflow-server-sg`) with these inbound rules:
       - **SSH:** `Type: SSH`, `Source: My IP` (for better security).
       - **HTTP:** `Type: HTTP`, `Source: Anywhere`.
       - **Custom TCP:** `Type: Custom TCP`, `Port: 5000`, `Source: Anywhere`.
3. **Launch:** Review the summary and click "Launch instance".
### 2.4. Step 4: Connecting the Components
#### 2.4.1. Connect EC2 and RDS Security Groups
Create a firewall rule to allow the EC2 instance to communicate with the RDS database.
1. Navigate to the RDS dashboard, select your `mlflow-db`, and go to the "Connectivity & security" tab.
2. Click on the active VPC security group (`mlflow-db-security-group`).
3. Go to the "Inbound rules" tab and click "Edit inbound rules".
4. Add a new rule:
   - **Type:** `PostgreSQL`.
   - **Source:** Select your EC2 security group (`mlflow-server-sg`).
5. Click "Save rules".
#### 2.4.2. Connect to Your EC2 Instance
1. Go to the EC2 dashboard, select your `mlflow-server`, and copy the "Public IPv4 address".
2. Open a terminal and make your key file private:

   ```bash
   chmod 400 /path/to/your/mlflow-key.pem
   ```

3. Connect via SSH:

   ```bash
   ssh -i /path/to/your/mlflow-key.pem ubuntu@<YOUR_PUBLIC_IP_ADDRESS>
   ```
#### 2.4.3. Create and Attach an IAM Role for S3 Access
Grant your EC2 instance permissions to access the S3 bucket.
1. **Create an IAM Policy:**
   - Go to IAM > Policies > "Create policy".
   - Use the visual editor:
     - **Service:** S3.
     - **Actions:** `ListBucket`, `GetObject`, `PutObject`, `DeleteObject`.
     - **Resources:** Specify the ARN for your bucket (`arn:aws:s3:::your-bucket-name`) and the objects within it (`arn:aws:s3:::your-bucket-name/*`).
   - Name the policy `MLflowS3AccessPolicy`.

2. **Create an IAM Role:**
   - Go to IAM > Roles > "Create role".
   - **Trusted entity:** AWS service.
   - **Use case:** EC2.
   - Attach the `MLflowS3AccessPolicy` you just created.
   - Name the role `MLflowEC2Role`.

3. **Attach the Role to EC2:**
   - In the EC2 dashboard, select your `mlflow-server`.
   - Go to "Actions" > "Security" > "Modify IAM role".
   - Select `MLflowEC2Role` and save.
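For reference, the visual-editor selections above correspond to a policy document like the one this sketch builds (the bucket name is a placeholder). Note the split: `ListBucket` targets the bucket ARN itself, while the object-level actions target its contents via the `/*` resource.

```python
import json

bucket = "your-bucket-name"  # placeholder; substitute your actual bucket

# s3:ListBucket applies to the bucket ARN; the object-level actions apply
# to the objects inside it (the /* resource).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": [f"arn:aws:s3:::{bucket}/*"],
        },
    ],
}
print(json.dumps(policy, indent=2))
```

You can paste the printed JSON into the IAM policy editor's JSON tab instead of clicking through the visual editor.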
#### 2.4.4. Ubuntu Server Setup Best Practices
1. **Update Your System:**

   ```bash
   sudo apt update && sudo apt upgrade -y
   ```

2. **Create a New User:**

   ```bash
   # Replace 'your_username' with a chosen name
   sudo adduser your_username
   sudo usermod -aG sudo your_username
   ```

   > Log out and log back in as the new, non-root user for daily work.

3. **Set Up a Basic Firewall:**

   ```bash
   sudo ufw allow OpenSSH
   sudo ufw allow 80/tcp
   sudo ufw allow 5000/tcp
   sudo ufw enable
   ```
### 2.5. Step 5: Install MLflow Software
1. **Install Tools:**

   ```bash
   sudo apt update
   sudo apt install python3-pip python3-venv -y
   ```

2. **Create a Virtual Environment and Install Packages:**

   ```bash
   python3 -m venv mlflow-env
   source mlflow-env/bin/activate
   pip install mlflow boto3 psycopg2-binary
   ```
### 2.6. Step 6: Launch the MLflow Server
This command connects all the components. You will need your RDS endpoint, RDS password, and S3 bucket name.

**SQLAlchemy connection string:** the required format is `postgresql://<user>:<password>@<host>:<port>/<database>`.

Execute the following, replacing all placeholders:

```bash
mlflow server \
    --backend-store-uri postgresql://mlflow_user:<YOUR_RDS_PASSWORD>@<YOUR_RDS_ENDPOINT>:5432/mlflow_db \
    --default-artifact-root s3://<your-s3-bucket-name>/ \
    --host 0.0.0.0 \
    --port 5000
```

**`--host 0.0.0.0` explained:** this tells the server to listen on all available network interfaces, making the UI accessible via the instance's public IP address.
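One common pitfall: if your RDS password contains characters such as `@`, `:`, or `/`, the raw connection string becomes ambiguous. URL-encode the password first; a small sketch with placeholder values (5432 is the default PostgreSQL port):

```python
from urllib.parse import quote_plus

# Placeholder values; substitute your real RDS username, password, and endpoint.
user = "mlflow_user"
password = "p@ss/word!"        # raw special characters would corrupt the URI
host = "<YOUR_RDS_ENDPOINT>"   # placeholder; keep your real endpoint here

# quote_plus percent-encodes characters that are not URI-safe.
uri = f"postgresql://{user}:{quote_plus(password)}@{host}:5432/mlflow_db"
print(uri)  # postgresql://mlflow_user:p%40ss%2Fword%21@<YOUR_RDS_ENDPOINT>:5432/mlflow_db
```

Paste the encoded result into the `--backend-store-uri` flag.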
To keep the server running after you disconnect, use a `screen` session:

```bash
# Start a new named session
screen -S mlflow

# Activate the environment and run the mlflow server command
source mlflow-env/bin/activate
mlflow server ...   # (paste the full command from above)

# Detach from the session by pressing Ctrl+A, then D.
```
### 2.7. Step 7: Local Machine Setup
#### 2.7.1. Connect Your Local Project
Configure your local machine to log experiments to the remote server.
1. **Set the Tracking URI:**

   ```bash
   # In your local terminal (macOS/Linux)
   export MLFLOW_TRACKING_URI="http://<YOUR_EC2_PUBLIC_IP>:5000"
   ```

2. **Configure AWS Credentials:** Grant your local machine S3 upload permissions.
#### 2.7.2. AWS CLI Configuration Guide
1. **Install the AWS CLI:**

   ```bash
   pip install awscli
   ```

2. **Run Configure:**

   ```bash
   aws configure
   ```

3. **Enter Your Credentials:**
   - **AWS Access Key ID:** Paste your key.
   - **AWS Secret Access Key:** Paste your secret key.
   - **Default region name:** Enter your S3 bucket's region (e.g., `ap-south-1`).
   - **Default output format:** Press Enter for `json`.

The CLI securely stores these credentials, and MLflow will automatically use them.
### 2.8. Step 8: Persistent Server Operation with systemd
To run the MLflow server as a background service that starts on boot, use systemd.
- SSH into your EC2 server.
- Create a `systemd` service file:

  ```bash
  sudo nano /etc/systemd/system/mlflow-server.service
  ```

- Paste the following configuration, replacing all placeholders:
  ```ini
  [Unit]
  Description=MLflow Tracking Server
  After=network.target

  [Service]
  User=<your_user>
  Restart=on-failure
  # Note: Use the absolute path to the mlflow executable in your venv
  ExecStart=/home/<your_user>/mlflow-env/bin/mlflow server \
      --backend-store-uri postgresql://mlflow_user:<YOUR_RDS_PASSWORD>@<YOUR_RDS_ENDPOINT>:5432/mlflow_db \
      --default-artifact-root s3://<your-s3-bucket-name>/ \
      --host 127.0.0.1 \
      --port 5000

  [Install]
  WantedBy=multi-user.target
  ```
> **Security Note:** `--host` is set to `127.0.0.1`, meaning the server only accepts connections from the machine itself. A reverse proxy like Nginx should be used to handle public traffic securely.
- **Enable and Start the Service:**

  ```bash
  # Reload systemd to recognize the new file
  sudo systemctl daemon-reload

  # Enable the service to start on boot
  sudo systemctl enable mlflow-server.service

  # Start the service now
  sudo systemctl start mlflow-server.service

  # Check its status
  sudo systemctl status mlflow-server.service
  ```
## 3. Using a Static IP with AWS Elastic IP
An EC2 instance's public IP changes on every stop/start cycle, which breaks the `MLFLOW_TRACKING_URI`. An Elastic IP (EIP) provides a permanent, static IP address to solve this problem.
### 3.1. EIP Intuition: A Permanent Address
An EIP is a static public IPv4 address you allocate to your AWS account. You can attach it to your EC2 instance, and it will persist across all stop/start cycles, ensuring permanent connectivity.
### 3.2. Step-by-Step EIP Implementation
1. **Allocate an Elastic IP:**
   - Go to the EC2 Dashboard > Elastic IPs.
   - Click "Allocate Elastic IP address" and confirm by clicking "Allocate".

2. **Associate the EIP with Your EC2 Instance:**
   - On the Elastic IPs screen, select the new IP.
   - Click "Actions" > "Associate Elastic IP address".
   - Choose "Instance" as the resource type and select your `mlflow-server` instance.
   - Click "Associate".

3. **Update the `MLFLOW_TRACKING_URI`:**
   - Replace the old dynamic IP in your `prediction_app.env` file and any other configurations with the new Elastic IP.
   - Example: `MLFLOW_TRACKING_URI="http://<Your_Elastic_IP>:5000"`
### 3.3. EIP Cost and Security Considerations
| Consideration | Detail | Best Practice/Tip |
|---|---|---|
| Cost | EIPs were historically free while associated with a running EC2 instance, with an hourly fee only for unassociated EIPs or EIPs on stopped instances. Since February 2024, AWS charges a small hourly fee for all public IPv4 addresses, including in-use EIPs. | The charge is minimal and usually worth the operational stability of a fixed address. |
| Security | The EIP is just an address. Your EC2 security group (`mlflow-server-sg`) must still allow inbound traffic on port 5000. | No change is needed if you configured the security group correctly in Step 2.3. |
| Advanced DNS | For maximum flexibility, use Route 53 to create a friendly domain name (e.g., `mlflow.yourproject.com`) that points to the EIP. | If you ever change the EIP, you only need to update the DNS record, not every client configuration. This decouples your clients from the specific IP address. |