# Best Practices for Creating Label Studio ML Backends
This document outlines guidelines for building new ML backend examples in the `label-studio-ml-backend` repository. Follow these steps when creating a new model under `label_studio_ml/examples/<model>`.
## 1. Folder Layout
Each example should contain the following files:
- **README.md** – overview of the model, instructions for running the backend, and description of the labeling configuration. Include quick-start commands and environment variables.
- **model.py** – implementation of `LabelStudioMLBase` with `predict()` and `fit()` methods. Keep functions short and well commented. Reuse helper methods when possible.
- **_wsgi.py** – minimal entry point exposing the `app` for gunicorn. Import the model and define `app` via `make_wsgi_app()`.
- **Dockerfile** – builds an image with only the dependencies required to run the model. Install packages from `requirements.txt`.
- **docker-compose.yml** – example service definition for running the backend locally. Expose `9090` by default.
- **requirements.txt** – pinned dependencies for the model. Optional files `requirements-base.txt` and `requirements-test.txt` may list shared and test deps.
- **tests/** – pytest suite. Provide at least one test that runs `fit()` on labeled tasks and verifies `predict()` returns expected results. Use small fixtures under `tests/` to avoid relying on network access.
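Putting the layout together, `model.py` might look like the following minimal sketch. To keep the snippet self-contained, a stand-in base class mirrors the constructor of `label_studio_ml.model.LabelStudioMLBase`; in a real example you would subclass the library's base class instead. `MyModel`, the `text` field, and the `label`/`text` tag names are illustrative assumptions, not part of any shipped example.

```python
import json
import os


class LabelStudioMLBase:
    """Stand-in for label_studio_ml.model.LabelStudioMLBase,
    included only so this sketch runs on its own."""
    def __init__(self, **kwargs):
        self.model_dir = os.getenv("MODEL_DIR", ".")


class MyModel(LabelStudioMLBase):
    def predict(self, tasks, context=None, **kwargs):
        """Return one prediction per task in Label Studio JSON format."""
        predictions = []
        for task in tasks:
            text = task.get("data", {}).get("text")
            if text is None:
                # Skip tasks missing the required input field.
                continue
            predictions.append({
                "model_version": "0.0.1",
                "score": 0.5,  # placeholder confidence
                "result": [{
                    "from_name": "label",  # control tag from the labeling config
                    "to_name": "text",     # object tag from the labeling config
                    "type": "choices",
                    "value": {"choices": ["Positive"]},
                }],
            })
        return predictions

    def fit(self, event, data, **kwargs):
        """Persist trained artifacts under MODEL_DIR with a stable name."""
        path = os.path.join(self.model_dir, "model.json")
        with open(path, "w") as f:
            json.dump({"trained": True}, f)
```

The `_wsgi.py` file would then import `MyModel` and expose it as `app` for gunicorn.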
## 2. Implementation Tips
- Use environment variables like `LABEL_STUDIO_HOST`, `LABEL_STUDIO_API_KEY`, and `MODEL_DIR` to make the backend configurable.
- Parse the labeling configuration with `self.label_interface` to get tag names, label values and data fields. This ensures the backend works with custom configs.
- Save trained artifacts inside `MODEL_DIR`. Use a stable file name such as `model.pkl` or `model.keras`.
- When training, gather all labeled tasks via the Label Studio SDK and convert each annotation to training samples. Keep network requests minimal and log useful information.
- When predicting, load data referenced in the task (e.g., download the CSV) and return results in Label Studio JSON format.
- Handle missing data gracefully and skip tasks without required inputs.
- Keep the code style consistent with `black` and `flake8` where applicable.
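The configuration-parsing tip can be illustrated with plain `xml.etree`. In a real backend you would rely on `self.label_interface` rather than parsing the XML yourself, so treat the helper below as a sketch; the `sentiment`/`text` tag names in the sample config are assumptions.

```python
import xml.etree.ElementTree as ET

# A sample labeling configuration, as a project might define it.
LABEL_CONFIG = """
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>
"""


def parse_config(xml_string):
    """Extract the control tag name, object tag name, and label values."""
    root = ET.fromstring(xml_string)
    choices = root.find("Choices")
    return {
        "from_name": choices.get("name"),
        "to_name": choices.get("toName"),
        "labels": [c.get("value") for c in choices.findall("Choice")],
    }
```

Deriving these names from the config, instead of hard-coding them, is what lets the backend work with custom labeling configurations.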
## 3. Documentation
- Reference the main repository README to help users understand how to install and run the ML backend.
- Include labeling configuration examples in the example README so users can quickly reproduce training and inference.
- Provide troubleshooting tips or links to Label Studio documentation such as [Writing your own ML backend](https://labelstud.io/guide/ml_create).
## 4. Testing
- Tests should be runnable with `pytest` directly from the repository root or inside the example’s Docker container.
- Mock Label Studio API interactions whenever possible to avoid requiring a running server during tests.
- Aim for good coverage of `fit()` and `predict()` logic to catch regressions.
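A mocked-client test might look like the sketch below. The `get_labeled_tasks` method and `gather_labeled_tasks` helper are hypothetical stand-ins for whatever SDK calls and training-data helpers your example actually uses; the point is that `unittest.mock.MagicMock` removes the need for a running Label Studio server.

```python
from unittest.mock import MagicMock


def gather_labeled_tasks(ls_client, project_id):
    """Collect (text, label) training pairs from labeled tasks.
    `ls_client` is assumed to expose get_labeled_tasks(); in a real
    suite this would wrap the Label Studio SDK."""
    samples = []
    for task in ls_client.get_labeled_tasks(project_id):
        for ann in task.get("annotations", []):
            for res in ann.get("result", []):
                choices = res.get("value", {}).get("choices")
                if choices:
                    samples.append((task["data"]["text"], choices[0]))
    return samples


def test_gather_labeled_tasks():
    # Mock the client so no Label Studio server is required.
    client = MagicMock()
    client.get_labeled_tasks.return_value = [{
        "data": {"text": "great product"},
        "annotations": [{"result": [{"value": {"choices": ["Positive"]}}]}],
    }]
    assert gather_labeled_tasks(client, project_id=1) == [
        ("great product", "Positive")
    ]
```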
## 5. Examples
- Use `label_studio_ml/examples/yolo/` as a reference implementation. It is well written and a good model to follow.
Following these conventions helps maintain consistency across examples and makes it easier for contributors and automation tools to understand each backend.