An MLflow HTTP API service for AI inference, designed to serve HuggingFace models through a robust, scalable web service for the Exasol RDBMS.
- Fast HTTP API - FastAPI-based service with automatic OpenAPI documentation
- HuggingFace Integration - Seamless integration with HuggingFace Transformers
- Model Hot-Swapping - Dynamic model loading and switching
- Concurrent Processing - Built-in request limiting and parallel processing
- MLflow Integration - Full MLflow model registry and tracking support
- JWT Authentication - Optional token-based authentication with model access control
- Docker Ready - Production-ready containerization
- Comprehensive Testing - Full test suite with coverage reporting
- Production Grade - Security scanning, linting, and CI/CD pipeline
# Install the package
pip install exasol-mlflow-server
# Or for development
git clone https://github.com/exasol/exasol-labs-mlflow-server.git
cd exasol-labs-mlflow-server
pip install -e .[dev]

# Start the service with default configuration
mlflow-server
# Or with custom configuration
python -m mlflow_service.server --config configs/models.yaml --api-port 50051

The service starts the following server:
- AI API Server: http://localhost:50051 (model inference API)
from mlflow_service.client import AIClient
import pandas as pd
# Connect to the service
client = AIClient(host="localhost", port=50051)
# Prepare input data
data = pd.DataFrame({"text": ["I love this product!", "This is terrible."]})
# Get predictions
predictions = client.predict("small", data)
print(predictions)
# Output: [{"label": "POSITIVE", "score": 0.95}, {"label": "NEGATIVE", "score": 0.89}]

# Check service status
curl http://localhost:50051/status
# Run inference on specific model
curl -X POST http://localhost:50051/model/small/infer \
-H "Content-Type: application/json" \
-d '{"text": ["I love this!", "Not great."]}'
# List available models
curl http://localhost:50051/models

The service supports optional JWT-based authentication with fine-grained model access control.
Set environment variables to enable authentication:
export MLFLOW_AUTH_ENABLED=true
export MLFLOW_JWT_SECRET_KEY="your-secret-key-change-this-in-production"
export MLFLOW_TOKEN_EXPIRE_MINUTES=1440  # 24 hours (optional)

Generate JWT access tokens with model permissions.
Request:
{
"subject": "user123",
"models": ["small", "medium"],
"admin": false,
"expire_minutes": 1440
}

Response:
{
"access_token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"token_type": "bearer",
"expires_in": 86400
}

Note: This endpoint requires admin privileges when authentication is enabled.
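For illustration, a request of this shape could be sent with curl; this sketch assumes the /auth/token route mentioned in the bootstrap tip further below and the default API port used throughout this README:

curl -X POST http://localhost:50051/auth/token \
  -H "Authorization: Bearer YOUR_ADMIN_JWT" \
  -H "Content-Type: application/json" \
  -d '{"subject": "user123", "models": ["small", "medium"], "admin": false, "expire_minutes": 1440}'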
Get current token information and permissions.
Response:
{
"subject": "user123",
"models": ["small", "medium"],
"admin": false,
"expires_at": 1640995200,
"issued_at": 1640908800
}

Include the JWT token in the Authorization header:
curl -H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"text": ["Hello world"]}' \
http://localhost:50051/model/small/infer

- Model Access: Tokens specify which models the user can access
- Admin Privileges: Admin tokens can access all models and manage authentication
- Wildcard Access: Use ["*"] in the models array for access to all models
- Change the default JWT secret key in production
- Use HTTPS in production environments
- Tokens expire automatically (default: 24 hours)
- Admin tokens should be carefully managed and rotated regularly
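For example, one way to generate a random secret before starting the service (any sufficiently long random string works; the command below is just one option):

# Generate a 256-bit random hex string and use it as the JWT secret
export MLFLOW_JWT_SECRET_KEY="$(openssl rand -hex 32)"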
Models are configured in configs/models.yaml:
# Example model configuration
small:
  hf_model_name: "cardiffnlp/twitter-roberta-base-sentiment-latest"
  mlflow_class: "HFSentimentModel"
  batch_size: 8
medium:
  hf_model_name: "nlptown/bert-base-multilingual-uncased-sentiment"
  mlflow_class: "HFSentimentModel"
  batch_size: 4

python -m mlflow_service.server --help
Options:
--config PATH Model configuration file (default: configs/models.yaml)
--api-port INT AI API port (default: 50051)
--max-parallel-requests Maximum concurrent requests (default: 2)
--memory-limit-mb INT Memory limit in MB (default: 0, unlimited)
--gpu-memory-fraction   GPU memory fraction (default: 0.0, auto-growth)

The service provides a comprehensive REST API with full OpenAPI documentation available at http://localhost:50051/docs when running.
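As an illustration, the CLI options listed above can be combined in a single invocation (the values below are arbitrary examples, not recommendations):

python -m mlflow_service.server \
  --config configs/models.yaml \
  --api-port 50051 \
  --max-parallel-requests 4 \
  --memory-limit-mb 4096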
Run AI model inference on a specific model.
Parameters:
model_tag - Model identifier (e.g., "small", "medium")
Request:
{
"text": ["I love this product!", "This is terrible."]
}

Response:
{
"predictions": [
{"label": "POSITIVE", "score": 0.95},
{"label": "NEGATIVE", "score": 0.89}
],
"model_used": "small"
}

Features:
- Automatic model loading if not currently active
- Thread-safe model switching
- Structured response format with confidence scores
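As a sketch, the same inference call can be made from any HTTP client, for example with the Python requests package (add an Authorization header when authentication is enabled):

import requests

# POST text to the inference endpoint of the "small" model
resp = requests.post(
    "http://localhost:50051/model/small/infer",
    json={"text": ["I love this product!", "This is terrible."]},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["predictions"])  # e.g. [{"label": "POSITIVE", "score": 0.95}, ...]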
Explicitly load or reload a specific model.
Parameters:
model_tag - Model identifier to load
Request: Empty body
Response:
{
"status": "model loaded",
"model_uri": "models:/small/1",
"tag": "small"
}

List available models with enriched configuration and registry details.
Response (example):
{
"default": "small",
"models": {"small": "models:/small/1", "medium": "models:/medium/1"},
"current": "small",
"details": {
"small": {
"tag": "small",
"model_uri": "models:/small/1",
"is_default": true,
"is_loaded": true,
"exists_in_registry": true,
"mlflow_class": "HFSentimentModel",
"hf_model_name": "cardiffnlp/twitter-roberta-base-sentiment-latest",
"batch_size": 8,
"registry_versions": [
{
"version": "1",
"stage": "Staging",
"status": "READY",
"run_id": "abc123",
"source": "runs:/abc123/small",
"last_updated_timestamp": 1700000000,
"size_bytes": 123456789
}
]
}
}
}

Get service status and performance metrics.
Response:
{
"max_parallel_requests": 2,
"active_requests": 1,
"waiting_requests": 0,
"total_requests": 42,
"current_model": "small",
"queue_available": 1
}

Add a new model to the service at runtime.
Parameters:
model_tag - Unique identifier for the new model
Request:
{
"model_uri": "models:/custom-model/1",
"hf_model_name": "distilbert-base-uncased-finetuned-sst-2-english",
"mlflow_class": "HFSentimentModel",
"batch_size": 1
}

Response:
{
"status": "success",
"message": "Model 'custom-model' added successfully",
"tag": "custom-model",
"model_uri": "models:/custom-model/1"
}

Remove a model from the service.
Parameters:
model_tag - Model identifier to remove
Response:
{
"status": "success",
"message": "Model 'custom-model' removed successfully",
"tag": "custom-model",
"model_uri": "models:/custom-model/1"
}

Note: Cannot remove the default model or currently loaded model.
Register an external model class at runtime.
Parameters:
class_name - Name to register the class under
Request:
{
"module_name": "examples.custom_models",
"class_name": "CustomSentimentModel"
}

Response:
{
"status": "success",
"message": "Successfully registered model class: CustomSentiment",
"class_name": "CustomSentiment"
}

Remove a model class from the service.
Parameters:
class_name - Name of the class to remove
Response:
{
"status": "success",
"message": "Successfully removed model class: CustomSentiment",
"class_name": "CustomSentiment"
}

Note: Built-in model classes cannot be removed.
List all registered model classes.
Response:
{
"model_classes": ["HFSentimentModel", "CustomSentiment"],
"details": {
"HFSentimentModel": "HFSentimentModel",
"CustomSentiment": "CustomSentimentModel"
}
}

- Swagger UI: http://localhost:50051/docs
- ReDoc: http://localhost:50051/redoc
- OpenAPI Spec: http://localhost:50051/openapi.json
# Clone the repository
git clone https://github.com/exasol/exasol-labs-mlflow-server.git
cd exasol-labs-mlflow-server
# Install development dependencies
pip install -e .[dev]
# Install pre-commit hooks
pre-commit install

# Run all tests
pytest
# Run with coverage
pytest --cov=mlflow_service --cov-report=html

The project uses several tools for code quality:
# Format code
make format
# Lint code
make lint
# Run all pre-commit checks
pre-commit run --all-files

mlflow_service/          # Main service implementation
    __init__.py
    server.py            # FastAPI server and MLflow integration
    client.py            # HTTP client for the API
    models.py            # MLflow model wrappers
    sql/                 # UDF SQL files
configs/                 # Configuration files
    models.yaml          # Model definitions
tests/                   # Test suite
.github/workflows/       # CI/CD pipelines
Dockerfile               # Container definition
pyproject.toml           # Project configuration
docker build -t mlflow-server .

# Run with default configuration
docker run -p 50051:50051 mlflow-server
# Run with custom configuration
docker run -p 50051:50051 \
-v $(pwd)/configs:/app/configs \
  mlflow-server --config configs/models.yaml

version: '3.8'
services:
  mlflow-server:
    image: mlflow-server
    ports:
      - "5000:5000"      # MLflow UI
      - "50051:50051"    # API Server
    volumes:
      - ./configs:/app/configs
      - ./mlruns:/app/mlruns
    environment:
      - MLFLOW_BACKEND_STORE_URI=sqlite:///mlflow.db

The MLflow server supports loading custom model classes externally in two ways:
Load external model classes when starting the server:
# Load a single external model class
python -m mlflow_service.server \
--external-models "examples.custom_models:CustomSentimentModel"
# Load multiple classes with custom names
python -m mlflow_service.server \
--external-models \
"examples.custom_models:CustomSentimentModel:CustomSentiment" \
"my_package.models:AdvancedClassifier:Advanced"The package ships a command-line client mlflow-client to help with operations, Exasol integration, and token management.
- Install: make install (or pip install -e .)
- Build wheel: make build → creates dist/*.whl
- The client and server both support loading a .env file via --env .env.
- Example variables (see .env.example):
  - Server/auth: MLFLOW_AUTH_ENABLED, MLFLOW_JWT_SECRET_KEY, MLFLOW_JWT_ALGORITHM, MLFLOW_TOKEN_EXPIRE_MINUTES, MLFLOW_HTTP_PORT, MLFLOW_CONFIG_PATH
  - Client/API: MLFLOW_API_HOST, MLFLOW_API_PORT
  - Exasol: EXA_DSN, EXA_USER, EXA_PASSWORD, EXA_SCHEMA, EXA_CONNECTION_NAME
  - Token: MLFLOW_ADMIN_TOKEN (only where needed; avoid committing!)
  - BucketFS (preferred single URL): EXA_BUCKETFS_URL=http://USER:BASE64_PASSWORD@HOST:PORT/buckets/BUCKET/PATH
    - Example: http://w:dw==@127.0.0.1:6583/buckets/default/mlflow (dw== is base64("w"))
    - Or use components: EXA_BUCKETFS_HOST, EXA_BUCKETFS_PORT, EXA_BUCKETFS_BUCKET, EXA_BUCKETFS_PATH, EXA_BUCKETFS_USER, EXA_BUCKETFS_PASSWORD
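A minimal .env sketch using the variables listed above (all values are placeholders; adjust them for your environment):

# Server / auth
MLFLOW_AUTH_ENABLED=true
MLFLOW_JWT_SECRET_KEY=change-this-in-production
MLFLOW_TOKEN_EXPIRE_MINUTES=1440
MLFLOW_CONFIG_PATH=configs/models.yaml

# Client / API
MLFLOW_API_HOST=localhost
MLFLOW_API_PORT=50051

# Exasol
EXA_DSN=127.0.0.1:8563
EXA_USER=sys
EXA_PASSWORD=exasol
EXA_SCHEMA=MLFLOW
EXA_CONNECTION_NAME=MLFLOW_ADMIN_TOKEN

# BucketFS (single-URL form)
EXA_BUCKETFS_URL=http://w:dw==@127.0.0.1:6583/buckets/default/mlflow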
Create tokens by calling the server (preferred) or by signing offline.
- Server-side (requires admin token when auth is enabled):
  mlflow-client create-token --subject admin --models '*' --admin --expire-minutes 43200 --env .env
  - If auth is enabled, provide an admin token: --token "$MLFLOW_ADMIN_TOKEN" or set MLFLOW_ADMIN_TOKEN in .env.
- Offline signing (no server call; needs server secret locally):
  mlflow-client create-token --offline --subject user1 --models small,large --expire-minutes 1440 --env .env
  - Reads MLFLOW_JWT_SECRET_KEY and MLFLOW_JWT_ALGORITHM from .env unless --secret/--algorithm are given.
  - Output is a JWT printed to stdout.
Bootstrap tip: If you don’t have an admin token yet, you can temporarily start the server with MLFLOW_AUTH_ENABLED=false to mint the first admin token via /auth/token, then restart with auth enabled.
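A sketch of that bootstrap flow, using only commands that appear elsewhere in this README (values are illustrative):

# 1. Start the server once with authentication disabled
MLFLOW_AUTH_ENABLED=false mlflow-server

# 2. Mint the first admin token against the running server and save it
mlflow-client create-token --subject admin --models '*' --admin --env .env > admin.token

# 3. Restart the server with authentication enabled
export MLFLOW_AUTH_ENABLED=true
mlflow-server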
Store a token securely in Exasol using CREATE CONNECTION:
mlflow-client store-admin-token --env .env
# Or pass explicit flags: --dsn --user --password --connection --token

UDFs will read the token from the connection specified by EXA_CONNECTION_NAME (default MLFLOW_ADMIN_TOKEN).
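For orientation, the stored connection is conceptually similar to the following Exasol SQL (a sketch only; the exact fields store-admin-token populates may differ, and the token is assumed here to live in the connection's password field):

-- Hypothetical equivalent of what store-admin-token creates
CREATE OR REPLACE CONNECTION MLFLOW_ADMIN_TOKEN
  TO ''
  USER 'mlflow'
  IDENTIFIED BY '<YOUR_JWT_TOKEN>';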
Create Python UDFs to call the MLflow service:
mlflow-client create-udfs --env .env
# Or pass: --dsn --user --password --schema --connection

This creates scripts in the target schema:
- MLFLOW_INFER_JSON(model_tag VARCHAR, text VARCHAR) RETURNS JSON
- MLFLOW_LOAD_MODEL(model_tag VARCHAR) RETURNS JSON
- MLFLOW_LIST_MODELS() RETURNS JSON
- MLFLOW_STATUS() RETURNS JSON
These UDFs call http://MLFLOW_API_HOST:MLFLOW_API_PORT and add Authorization: Bearer <token> if the token connection exists.
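Once created, the UDFs can be called from SQL like ordinary scalar functions; a sketch assuming the scripts live in a schema named MLFLOW and a hypothetical REVIEWS table:

-- Check the service and run a single inference from SQL
SELECT MLFLOW.MLFLOW_STATUS();
SELECT MLFLOW.MLFLOW_INFER_JSON('small', 'I love this product!');

-- Apply the model to a column of a (hypothetical) table
SELECT review_text, MLFLOW.MLFLOW_INFER_JSON('small', review_text)
FROM REVIEWS;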
Upload the client/server wheel to BucketFS for Exasol environments:
# Upload newest dist/*.whl using EXA_BUCKETFS_URL from .env
mlflow-client bucketfs-upload --env .env
# Or specify a file and components explicitly
mlflow-client bucketfs-upload --file dist/exasol_mlflow_server-0.1.0-py3-none-any.whl \
  --host 127.0.0.1 --port 6583 --bucket default --path mlflow --user w --password w

The programmatic client requires a model tag in calls, matching the API:
from mlflow_service.client import AIClient
import pandas as pd
client = AIClient() # reads MLFLOW_API_HOST/PORT if set
client.token = "<JWT>" # optional
client.load("small")
resp = client.predict("small", pd.DataFrame({"text": ["great", "bad"]}))
print(resp)

Set MLFLOW_API_HOST and MLFLOW_API_PORT in .env or pass host/port to AIClient.
Register model classes at runtime using the REST API:
# Register a new model class
curl -X POST http://localhost:50051/register-model-class \
-H "Content-Type: application/json" \
-d '{
"module_name": "examples.custom_models",
"class_name": "CustomSentimentModel",
"register_name": "CustomSentiment"
}'
# List all registered model classes
curl http://localhost:50051/model-classes

from mlflow_service.models import register_model_class, load_external_model_class
# Method 1: Register an already imported class
from examples.custom_models import CustomSentimentModel
register_model_class("CustomSentiment", CustomSentimentModel)
# Method 2: Load and register from module
load_external_model_class(
"examples.custom_models",
"CustomSentimentModel",
"CustomSentiment"
)

- Create a custom model class that inherits from HFModel:
from mlflow_service.models import HFModel
import pandas as pd
class MyCustomModel(HFModel):
    def _load_pipeline(self):
        """Load your HuggingFace pipeline."""
        self.pipeline = self._pipeline_fn(
            "text-classification",  # or your task
            model=self.hf_model_name,
            device=self.device,
            batch_size=self.batch_size,
        )

    def predict(self, context, model_input, params=None):
        """Implement your prediction logic."""
        texts = model_input["text"].astype(str).tolist()
        outputs = self.pipeline(texts, batch_size=self.batch_size)
        # Process outputs as needed
        results = []
        for output in outputs:
            results.append({
                "label": output["label"],
                "score": output["score"],
                # Add custom fields
                "custom_field": "custom_value",
            })
        return pd.DataFrame(results)

    def input_example(self):
        """Return example input for signature inference."""
        return pd.DataFrame({"text": ["example text"]})

- Save your model in a Python module (e.g., my_models.py)
- Load it using one of the methods above
- Configure it in your models.yaml:
my_custom_model:
hf_model_name: "your-model-name"
mlflow_class: "MyCustomModel" # Use the registered name
batch_size: 4See examples/custom_models.py for complete examples including:
- CustomSentimentModel: Enhanced sentiment analysis with preprocessing/postprocessing
- TextClassificationModel: Generic text classification for various tasks
All custom model classes must:
- Inherit from mlflow_service.models.HFModel
- Implement the abstract methods:
  - _load_pipeline(): Initialize your HuggingFace pipeline
  - predict(): Process input and return predictions
  - input_example(): Return example input DataFrame
- Follow the expected input/output format (DataFrame with "text" column)
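To make that contract concrete, here is a minimal sketch of the input a custom model receives and the shape of the result it is expected to return (column names follow the examples in this README):

import pandas as pd

# Input: a DataFrame with a "text" column
model_input = pd.DataFrame({"text": ["I love this product!", "This is terrible."]})

# Output: one row per input text, e.g. a label plus a confidence score
expected_output = pd.DataFrame([
    {"label": "POSITIVE", "score": 0.95},
    {"label": "NEGATIVE", "score": 0.89},
])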
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes
- Run tests and ensure they pass (pytest)
- Run code quality checks (pre-commit run --all-files)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with MLflow for model management
- Powered by FastAPI for HTTP service
- Integrated with HuggingFace Transformers for model inference
- Developed by Exasol Labs