Commit 3a32230 (parent 71db4d2) — "rules": 1 file changed, +48 −0
---
description:
globs:
alwaysApply: true
---

# Best Practices for Creating Label Studio ML Backends

This document outlines guidelines for building new ML backend examples in the `label-studio-ml-backend` repository. Follow these steps when creating a new model under `label_studio_ml/examples/<model>`.

## 1. Folder Layout

Each example should contain the following files:

- **README.md** – overview of the model, instructions for running the backend, and a description of the labeling configuration. Include quick-start commands and environment variables.
- **model.py** – implementation of `LabelStudioMLBase` with `predict()` and `fit()` methods. Keep functions short and well commented. Reuse helper methods when possible.
- **_wsgi.py** – minimal entry point exposing the `app` for gunicorn. Import the model and define `app` via `make_wsgi_app()`.
- **Dockerfile** – builds an image with only the dependencies required to run the model. Install packages from `requirements.txt`.
- **docker-compose.yml** – example service definition for running the backend locally. Expose port `9090` by default.
- **requirements.txt** – pinned dependencies for the model. Optional files `requirements-base.txt` and `requirements-test.txt` may list shared and test dependencies.
- **tests/** – pytest suite. Provide at least one test that runs `fit()` on labeled tasks and verifies `predict()` returns expected results. Use small fixtures under `tests/` to avoid relying on network access.

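Putting the list above together, a new example's directory might look like this (the test file name is illustrative):

```
label_studio_ml/examples/<model>/
├── README.md
├── model.py
├── _wsgi.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── requirements-base.txt   # optional
├── requirements-test.txt   # optional
└── tests/
    └── test_model.py
```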
## 2. Implementation Tips

- Use environment variables such as `LABEL_STUDIO_HOST`, `LABEL_STUDIO_API_KEY`, and `MODEL_DIR` to make the backend configurable.
- Parse the labeling configuration with `self.label_interface` to get tag names, label values, and data fields. This ensures the backend works with custom configs.
- Save trained artifacts inside `MODEL_DIR`. Use a stable file name such as `model.pkl` or `model.keras`.
- When training, gather all labeled tasks via the Label Studio SDK and convert each annotation to training samples. Keep network requests minimal and log useful information.
- When predicting, load the data referenced in the task (e.g., download the CSV) and return results in Label Studio JSON format.
- Handle missing data gracefully and skip tasks without required inputs.
- Keep the code style consistent with `black` and `flake8` where applicable.

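To make the prediction tip concrete, here is a minimal sketch of building a result in Label Studio JSON format for a `Choices` classification tag. The tag names `sentiment` and `text`, the helper name, and the score values are illustrative assumptions, not part of this repository:

```python
# Hypothetical helper sketch: builds one prediction in Label Studio JSON format
# for a <Choices name="sentiment"> tag applied to a <Text name="text"> tag.
# Tag names and version string are assumptions for illustration.

def format_prediction(label: str, score: float, model_version: str = "0.0.1") -> dict:
    """Build a single prediction payload for predict() to return."""
    return {
        "model_version": model_version,
        "score": score,
        "result": [
            {
                "from_name": "sentiment",  # control tag name in the labeling config
                "to_name": "text",         # object tag the prediction applies to
                "type": "choices",
                "value": {"choices": [label]},
            }
        ],
    }

# predict() would return one such entry per input task:
predictions = [format_prediction("Positive", 0.92)]
```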
## 3. Documentation

- Reference the main repository README to help users understand how to install and run the ML backend.
- Include labeling configuration examples in the example README so users can quickly reproduce training and inference.
- Provide troubleshooting tips or links to Label Studio documentation such as [Writing your own ML backend](https://labelstud.io/guide/ml_create).

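For instance, an example README for a simple text classification backend might show a labeling configuration like this (tag names are illustrative, matching whatever `model.py` expects):

```xml
<View>
  <Text name="text" value="$text"/>
  <Choices name="sentiment" toName="text">
    <Choice value="Positive"/>
    <Choice value="Negative"/>
  </Choices>
</View>
```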
## 4. Testing

- Tests should be runnable with `pytest` directly from the repository root or inside the example’s Docker container.
- Mock Label Studio API interactions whenever possible to avoid requiring a running server during tests.
- Aim for good coverage of `fit()` and `predict()` logic to catch regressions.

## 5. Examples

- Use `label_studio_ml/examples/yolo/` as a reference implementation; it is well written and a good model to follow.

Following these conventions helps maintain consistency across examples and makes it easier for contributors and automation tools to understand each backend.
