Build a custom portal with embedded Amazon SageMaker AI MLflow Apps

As ML teams grow, embedding Amazon SageMaker AI MLflow Apps into a custom portal requires a scalable approach to access management. Distributing presigned URLs doesn’t scale for teams with dozens of data scientists, and granting individual AWS Management Console access adds operational overhead for administrators managing access controls. Teams that rely on SSO-integrated internal portals need their MLflow experiment tracking accessible alongside other internal applications through a single bookmarkable URL. With a custom portal, you reduce onboarding time for new team members, simplify access management, and give data scientists a consistent experience across your internal tools.

With this solution, you give your machine learning (ML) teams a persistent, bookmarkable URL to the full MLflow web UI without presigned URLs or AWS Management Console access. You can embed the MLflow experiment tracking UI directly into your organization’s SSO-integrated internal portal or custom dashboard, so users authenticate once and access experiment tracking alongside other internal tools. Your continuous integration and continuous delivery (CI/CD) pipelines and automation scripts can interact with MLflow REST APIs programmatically through the same proxy endpoint, with SigV4 authentication handled behind the scenes.

In this post, you learn how to build a custom portal with an embedded SageMaker AI MLflow Apps UI. You walk through the architecture pattern behind a React front end paired with a Flask reverse proxy that handles AWS Signature Version 4 (SigV4) authentication, deploy the complete stack through the AWS Cloud Development Kit (AWS CDK), validate the deployment, and review security considerations and cleanup procedures.

Solution overview

You deploy a custom React web application with the SageMaker AI MLflow Apps UI embedded in an iframe, backed by a Flask reverse proxy running on Amazon Elastic Compute Cloud (Amazon EC2). The architecture consists of four components that work together to give your team authenticated access to MLflow.

Application Load Balancer

The Application Load Balancer (ALB) serves as the single entry point for your users. It provides a stable, public-facing URL for the portal, supports custom domains and SSL/TLS termination, and integrates with your organization’s existing DNS, certificate, and SSO infrastructure. It routes traffic for both the React dashboard and MLflow API requests to the appropriate backend targets.

Note: This implementation uses the ALB with HTTP. For production environments, you should add HTTPS with an SSL/TLS certificate through AWS Certificate Manager (ACM).

React front end portal

The React front end gives your team a branded entry point to the MLflow experience. It provides a custom portal that embeds the MLflow tracking UI in an iframe and serves as an integration point for organizational branding and additional tools. The Flask proxy serves its static files from the /app path.

Flask reverse proxy service

The Flask reverse proxy sits between the front end and the MLflow backend, handling authentication so your users never manage AWS credentials directly. A Python-based Flask application handles:

Intercepting incoming requests, including UI paths and REST API calls.
Signing each request with AWS SigV4 using temporary credentials obtained by assuming a dedicated AWS Identity and Access Management (IAM) role.
Forwarding signed requests to the Amazon SageMaker AI MLflow Apps endpoint.
Rewriting absolute MLflow URLs in HTML responses to relative paths and stripping X-Frame-Options headers so the UI renders correctly inside an iframe.
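
The signing step in the list above can be sketched with only the Python standard library. This is a minimal illustration of the SigV4 algorithm under stated assumptions, not the proxy code from the sample repository: the function name sign_request, the example host and credentials, and the sagemaker-mlflow service name are all assumptions, and the sketch handles requests without a query string only. In practice, a library such as botocore would normally perform this signing.

```python
import datetime
import hashlib
import hmac
from urllib.parse import quote


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_request(method, host, path, body, access_key, secret_key,
                 session_token, region, service="sagemaker-mlflow", now=None):
    """Return SigV4 headers for one request (sketch: no query string support)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body).hexdigest()

    # Headers covered by the signature: lowercase names, sorted.
    headers = {
        "host": host,
        "x-amz-content-sha256": payload_hash,
        "x-amz-date": amz_date,
        "x-amz-security-token": session_token,  # present with temporary credentials
    }
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))

    # `path` is assumed to be the already URL-encoded path sent on the wire.
    canonical_request = "\n".join([
        method,
        quote(path, safe="/"),
        "",  # empty canonical query string
        canonical_headers,
        signed_headers,
        payload_hash,
    ])

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["Authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return headers


# Made-up host and credentials, used only to show the shape of the result.
example = sign_request(
    "GET", "mlflow.example.com", "/api/2.0/mlflow/experiments/search", b"",
    access_key="AKIDEXAMPLE", secret_key="not-a-real-secret",
    session_token="not-a-real-token", region="us-east-1",
    now=datetime.datetime(2024, 1, 1, tzinfo=datetime.timezone.utc),
)
print(example["Authorization"].split(", ")[1])
# SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token
```

Because the signature covers the host, timestamp, and payload hash, the proxy has to sign each forwarded request individually rather than reuse one Authorization header.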

Amazon SageMaker AI MLflow Apps

Amazon SageMaker AI fully manages MLflow Apps for you, so there are no servers to provision or patch. Amazon SageMaker AI MLflow Apps provides experiment tracking with runs, metrics, parameters, and artifacts, along with a model registry for model versioning and lifecycle management.

This architecture supports secure communication while maintaining compatibility with existing enterprise portals. The proxy service acts as a bridge, transforming standard HTTPS requests into authenticated AWS API calls.

Architecture and request workflow

The following diagram shows how the different components work together to give your team secure, browser-based access to Amazon SageMaker AI MLflow Apps.

Here’s what happens when a user navigates to the portal:

The user opens the ALB URL in their browser, either directly or through a link in your organization’s internal portal. The ALB routes the request to the Amazon EC2 instance running the Flask proxy.
The Flask proxy serves the React dashboard (from the /app path). The React app renders the page and loads the MLflow UI inside an iframe pointing to /mlflow-ui/.
From this point on, every request the iframe makes goes through the Flask proxy, whether it’s loading the MLflow UI pages or calling API endpoints like /api/2.0/mlflow/experiments/search. The proxy signs each request with AWS SigV4 using temporary credentials (obtained by assuming a dedicated IAM role) and forwards it to the serverless MLflow App endpoint.
When the MLflow App responds, the proxy does two things before passing the response back to the browser. It rewrites absolute MLflow URLs to relative paths so that navigation works correctly through the proxy. It also strips X-Frame-Options headers so that the browser allows the content to render inside the iframe.
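
The two response adjustments in the last step can be sketched as a pair of small helpers. This is an illustrative sketch rather than the proxy’s actual code: the function names and the example endpoint URL are assumptions, and the URL rewriting is a plain string replacement.

```python
def strip_frame_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop X-Frame-Options (matched case-insensitively) from response headers."""
    return {k: v for k, v in headers.items() if k.lower() != "x-frame-options"}


def rewrite_absolute_urls(html: str, mlflow_origin: str) -> str:
    """Turn absolute links to the MLflow endpoint into relative paths."""
    origin = mlflow_origin.rstrip("/")
    # Replace "https://host/path" first, then any bare "https://host".
    return html.replace(origin + "/", "/").replace(origin, "/")


# Hypothetical endpoint and markup, used only for illustration.
page = '<script src="https://mlflow.example.com/static/main.js"></script>'
print(rewrite_absolute_urls(page, "https://mlflow.example.com/"))
# <script src="/static/main.js"></script>
print(strip_frame_headers({"X-Frame-Options": "DENY", "Content-Type": "text/html"}))
# {'Content-Type': 'text/html'}
```

A real proxy may also need to account for a Content-Security-Policy frame-ancestors directive, which this sketch does not handle.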

Your users see the full MLflow tracking UI, including experiments, runs, metrics, and model registry, right in their browser, with AWS authentication handled behind the scenes.

Walkthrough

The following section walks you through how to deploy the solution.

Prerequisites

To follow along with this walkthrough, make sure you have the following prerequisites:

An AWS account.
AWS Command Line Interface (AWS CLI) v2.34.5 or later (required for the create-mlflow-app, list-mlflow-apps, and describe-mlflow-app commands).
Python 3.13 or later installed locally (used by the deployment script to parse JSON outputs).
AWS CDK v2 (aws-cdk-lib 2.243.0 or later) installed and bootstrapped in the target account and Region. For instructions, see Getting started with the AWS CDK.
Node.js 18.x or later installed locally for CDK deployment.
Python 3.13 installed on the Amazon EC2 instance (automated by the setup script).
Sufficient IAM permissions to create VPCs, Amazon EC2 instances, ALBs, Amazon SageMaker AI domains, MLflow Apps, and IAM roles.
An Ubuntu 24.04 LTS AMI available in the target AWS Region (automatically resolved using SSM Parameter Store).
Required knowledge:
- Basic understanding of AWS services and IAM permissions.
- Familiarity with Python and Flask applications.
- Understanding of MLflow concepts and operations.
Cost considerations:
- This solution creates AWS resources that may incur costs.
- Key cost-driving resources include:
  - Amazon EC2 instances.
  - Application Load Balancer.
  - Amazon SageMaker AI resources.
  - Amazon Simple Storage Service (Amazon S3) storage.

For information about AWS service pricing, see the AWS Pricing Calculator.

Deploy the solution

This section guides you through deploying the solution in your AWS account and validating it. The deployment uses a single deploy.sh script that orchestrates CDK stack deployment and serverless MLflow App creation.

Step 1: Clone the repository and deploy the infrastructure

Download the solution code and install dependencies:

# Clone the repository
git clone https://github.com/aws-samples/sample-sagemaker-mlflow-embedded-ui.git

# Navigate to the project directory and install dependencies
cd sample-sagemaker-mlflow-embedded-ui
npm install

Set your AWS account ID and Region as environment variables:
```
export CDK_DEFAULT_ACCOUNT=<account-id>
export CDK_DEFAULT_REGION=<region>
export AWS_DEFAULT_REGION=<region>
export AWS_REGION=<region>
```
Note: If you previously deployed to a different Region, delete the cached context file.
Bootstrap the AWS account and Region for AWS CDK. Skip this step if your account and Region are already bootstrapped.
Run the deploy.sh script to deploy the stacks and required resources in your AWS account.
Note the ALB DNS name and Amazon EC2 instance ID from the deployment output. You need these in the following steps.

Step 2: Set up the Flask proxy service on Amazon EC2

Connect to the Amazon EC2 instance using the instance ID from Step 1. Use AWS Systems Manager Session Manager to access the instance. For detailed instructions, see the Session Manager connection guide.
Install Python 3.13 and dependencies:
```
# Switch to the root user
sudo su -
cd /root

# Install Python and dependencies
chmod +x install_python13.sh
./install_python13.sh
```
Note: This script works on Ubuntu-based systems. For other Linux distributions, verify that Python 3.12+, pip3, and virtualenv are installed using your system’s package manager.

Install and start the MLflow proxy service:

chmod +x setup_mlflow_proxy_app.sh
./setup_mlflow_proxy_app.sh

Check the Flask MLflow proxy service status:
```
systemctl status mlflowproxy
```
If the service isn’t running, check the logs with the following command:
```
journalctl -u mlflowproxy
```

Step 3: Validate the deployment

This section demonstrates how to interact with MLflow REST APIs through the ALB. These examples use the unsecured HTTP protocol; for production environments, HTTPS is recommended. The following examples use the curl tool to make API requests, but you can also use a tool like Postman or an equivalent.

Open the ALB URL that you noted in Step 1 in your browser. You can also retrieve it from the AWS CloudFormation stack output:

aws cloudformation describe-stacks --stack-name sagemaker-infra-flaskapp --query 'Stacks[0].Outputs[?OutputKey==`ALBUrl`].OutputValue' --output text

Open the ALB URL in your browser at http://<ALB-DNS-name>/. You’re automatically redirected to /app, where the React dashboard displays the MLflow UI embedded in an iframe, as shown in the following figure.
Verify the health endpoint. It should return {"status": "healthy"}.

Test MLflow experiment tracking through the REST API.

Create an experiment. Use the MLflow REST API through the ALB to create a new experiment. Note the experiment ID from the response.

curl -X POST http://<ALB-DNS-name>/api/2.0/mlflow/experiments/create -H "Content-Type: application/json" -d '{"name": "my-first-experiment"}'

Create and log a run. Create a run under the experiment and log metrics and parameters.

curl -X POST http://<ALB-DNS-name>/api/2.0/mlflow/runs/create -H "Content-Type: application/json" -d '{"experiment_id": "<experiment-id>", "run_name": "training-run-1"}'

curl -X POST http://<ALB-DNS-name>/api/2.0/mlflow/runs/log-parameter -H "Content-Type: application/json" -d '{"run_id": "<run-id>", "key": "learning_rate", "value": "0.01"}'

curl -X POST http://<ALB-DNS-name>/api/2.0/mlflow/runs/log-metric -H "Content-Type: application/json" -d '{"run_id": "<run-id>", "key": "accuracy", "value": 0.95, "timestamp": 1700000000000, "step": 1}'
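
The same calls can be scripted, for example from a CI/CD job. The following sketch builds the requests with Python’s standard library. BASE_URL is a placeholder for your ALB URL, and the helper names mlflow_post and send are assumptions rather than part of the sample repository.

```python
import json
import urllib.request

BASE_URL = "http://alb.example.com"  # placeholder: replace with your ALB URL


def mlflow_post(endpoint: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for an MLflow REST API endpoint behind the proxy."""
    return urllib.request.Request(
        url=f"{BASE_URL}/api/2.0/mlflow/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read() or b"{}")


create_experiment = mlflow_post("experiments/create", {"name": "my-first-experiment"})
print(create_experiment.full_url)
# http://alb.example.com/api/2.0/mlflow/experiments/create

# Against a live deployment, the calls chain like this:
# experiment_id = send(create_experiment)["experiment_id"]
# run = send(mlflow_post("runs/create",
#                        {"experiment_id": experiment_id, "run_name": "training-run-1"}))
# run_id = run["run"]["info"]["run_id"]
# send(mlflow_post("runs/log-metric", {"run_id": run_id, "key": "accuracy",
#                                      "value": 0.95, "timestamp": 1700000000000, "step": 1}))
```

No AWS credentials appear in the script because the proxy signs each forwarded request.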

Verify the run in the React dashboard. Refresh the React dashboard in your browser at http://<ALB-DNS-name>/app. The MLflow UI now displays the experiment, runs, metrics, and parameters you created in the preceding steps, as shown in the following figure.

Clean up

To avoid ongoing charges and remove the resources created by this solution, follow these cleanup steps:

Run the cleanup script from the project root.
This script tears down the deployed resources in reverse dependency order. It starts by destroying the Flask app stack, then deletes the serverless MLflow App through the AWS CLI and waits for the deletion to finish. After that, it removes the MLflow resources, Amazon SageMaker AI domain, and networking stacks. The networking stack includes an AWS Lambda-backed custom resource that automatically cleans up Amazon SageMaker AI-created Amazon Elastic File System (Amazon EFS) file systems, orphaned network interfaces, and security groups before deleting the VPC.
Manual resource cleanup. The MLflow artifacts Amazon S3 bucket has a RETAIN removal policy and must be manually deleted if it’s no longer needed. For detailed instructions, see Deleting a general purpose bucket in the Amazon S3 User Guide.

CDK stack details

The solution deploys four CDK stacks, each responsible for a distinct layer of the architecture.

Networking stack

This stack creates the VPC and associated networking components, including public and private subnets, route tables, and security groups. It provides the network foundation that all other stacks depend on.

SageMaker AI domain stack

This stack sets up the Amazon SageMaker AI domain, which serves as the organizational container for SageMaker AI resources. The domain provides the identity and access context needed for the MLflow App.

SageMaker MLflow stack

This stack deploys the serverless MLflow App within the SageMaker AI domain. The app stores experiments, runs, metrics, and model registry data.

Flask application stack

This stack deploys the Flask reverse proxy service on an Amazon EC2 instance behind an ALB. It handles SigV4 authentication and serves the React front end portal.

Next steps

After deploying the portal, consider extending it for your own use cases.

When deploying this solution in a production environment, consider implementing these additional security measures:

Configure Amazon CloudWatch monitoring for the Flask-based proxy service to track application health, detect anomalies, and set up alerts for suspicious activities. For more information, see Monitor your instances using CloudWatch and Create a CloudWatch alarm based on anomaly detection.
Implement rate limiting for the Flask-based proxy service to protect against potential denial-of-service (DoS) attacks and control the number of requests from individual clients. You can use AWS WAF in conjunction with the Application Load Balancer to implement rate-based rules.
Enable HTTPS termination at the Application Load Balancer level to help secure communication between clients and your application. You can use ACM to provision and manage SSL/TLS certificates for your application. For instructions on configuring HTTPS listeners, see the Application Load Balancer HTTPS listeners documentation.

Conclusion

In this post, you learned how to build a React-based dashboard with the Amazon SageMaker AI MLflow Apps UI embedded in an iframe, backed by a Flask reverse proxy that handles SigV4 authentication. This solution helps ML infrastructure teams provide persistent, bookmarkable access to the full MLflow experiment tracking experience through a custom portal that integrates with existing organizational infrastructure.

With this approach, your team gets a persistent, bookmarkable URL for MLflow experiment tracking without presigned URLs, along with direct integration into existing SSO-protected internal portals. Users get the full MLflow UI experience, including run comparison, metric visualization, and model registry, while administrators benefit from reduced operational overhead by removing per-user console access. The entire solution is deployed as infrastructure as code with automated provisioning and cleanup. To get started, clone the sample repository and deploy the stack in your AWS account.