Commit 452859e

Add time series segmentation example

1 parent 7d81714
File tree

9 files changed: +443 −0 lines changed


README.md

Lines changed: 1 addition & 0 deletions

@@ -62,6 +62,7 @@ Check the **Required parameters** column to see if you need to set any additiona
  | [sklearn_text_classifier](/label_studio_ml/examples/sklearn_text_classifier) | Text classification with [scikit-learn](https://scikit-learn.org/stable/) |||| None | Arbitrary |
  | [spacy](/label_studio_ml/examples/spacy) | NER by [SpaCy](https://spacy.io/) |||| None | Set [(see documentation)](https://spacy.io/usage/linguistic-features) |
  | [tesseract](/label_studio_ml/examples/tesseract) | Interactive OCR. [Details](https://github.com/tesseract-ocr/tesseract) |||| None | Set (characters) |
+ | [timeseries_segmenter](/label_studio_ml/examples/timeseries_segmenter) | Time series segmentation using scikit-learn |||| None | Set |
  | [watsonX](/label_studio_ml/examples/watsonx) | LLM inference with [WatsonX](https://www.ibm.com/products/watsonx-ai) and integration with [WatsonX.data](watsonx.data) |||| None | Arbitrary |
  | [yolo](/label_studio_ml/examples/yolo) | All YOLO tasks are supported: [YOLO](https://docs.ultralytics.com/tasks/) |||| None | Arbitrary |

Dockerfile

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.11

FROM python:${PYTHON_VERSION}-slim AS python-base
ARG TEST_ENV

WORKDIR /app

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PORT=${PORT:-9090} \
    PIP_CACHE_DIR=/.cache \
    WORKERS=1 \
    THREADS=8

# Update the base OS
RUN --mount=type=cache,target="/var/cache/apt",sharing=locked \
    --mount=type=cache,target="/var/lib/apt/lists",sharing=locked \
    set -eux; \
    apt-get update; \
    apt-get upgrade -y; \
    apt install --no-install-recommends -y \
        git; \
    apt-get autoremove -y

# install base requirements
COPY requirements-base.txt .
RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
    pip install -r requirements-base.txt

# install custom requirements
COPY requirements.txt .
RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
    pip install -r requirements.txt

# install test requirements if needed
COPY requirements-test.txt .
# build only when TEST_ENV="true"
RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
    if [ "$TEST_ENV" = "true" ]; then \
      pip install -r requirements-test.txt; \
    fi

COPY . .

EXPOSE 9090

CMD gunicorn --preload --bind :$PORT --workers $WORKERS --threads $THREADS --timeout 0 _wsgi:app
README.md (timeseries_segmenter example)

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# Time Series Segmenter for Label Studio

This example demonstrates a minimal ML backend that performs time series segmentation.
It trains a logistic regression model on labeled CSV data and predicts segments
for new tasks. The backend expects the labeling configuration to use
`<TimeSeries>` and `<TimeSeriesLabels>` tags.

## Before you begin

1. Install the [Label Studio ML backend](https://github.com/HumanSignal/label-studio-ml-backend?tab=readme-ov-file#quickstart).
2. Set `LABEL_STUDIO_HOST` and `LABEL_STUDIO_API_KEY` in `docker-compose.yml`
   so the backend can download labeled tasks for training.
## Quick start

```bash
# build and run
docker-compose up --build
```

Connect the model from the **Model** page in your project settings. The default
URL is `http://localhost:9090`.
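To confirm the container is reachable before connecting it, you can query the backend's health endpoint. This is a minimal sketch, assuming the standard `/health` route exposed by `label_studio_ml`-based servers and the default port from the Quick start above:

```python
import requests

# Hypothetical smoke test for the running backend; adjust the URL if you changed the port.
response = requests.get("http://localhost:9090/health", timeout=5)
response.raise_for_status()
print(response.json())  # expect a small JSON status payload if the server is up
```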
## Labeling configuration

Use a configuration similar to the following:

```xml
<View>
  <TimeSeriesLabels name="label" toName="ts">
    <Label value="Run"/>
    <Label value="Walk"/>
  </TimeSeriesLabels>
  <TimeSeries name="ts" valueType="url" value="$csv_url" timeColumn="time">
    <Channel column="sensorone" />
    <Channel column="sensortwo" />
  </TimeSeries>
</View>
```

The backend reads the time column and channels to build feature vectors for
training and prediction. Each CSV referenced by `csv_url` is expected to contain
at least the time column and the listed channels.
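As an illustration of the feature-building step described above, the sketch below loads one such CSV with pandas and assembles a per-row feature matrix. The file name `example.csv` and the column names are placeholders taken from the labeling configuration above, not fixed names used by the backend:

```python
import pandas as pd

# Sketch only: turn one task's CSV into per-row feature vectors.
# "time", "sensorone" and "sensortwo" mirror the labeling config above;
# the backend reads the actual column names from the <TimeSeries> tag.
df = pd.read_csv("example.csv")                        # CSV referenced by $csv_url
features = df[["sensorone", "sensortwo"]].to_numpy()   # one feature vector per row
timestamps = df["time"].to_numpy()                     # used to map rows back to segments
print(features.shape, timestamps[:5])
```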
## Training

Training starts automatically when annotations are created or updated. The model
collects all labeled segments, extracts the sensor values inside each segment, and
fits a logistic regression classifier. Model artifacts are stored in `MODEL_DIR`
(which defaults to the current directory).
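For a concrete picture of this step, here is a rough sketch of segment-based training with scikit-learn. It is not the backend's exact code: the `segments` list, its field names, and the column names are assumptions layered on top of the description above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed inputs for this sketch:
#   df       - the task's CSV loaded with pandas (see the previous sketch)
#   segments - labeled ranges taken from the annotations, e.g.
#              [{"start": 0.0, "end": 4.5, "label": "Run"}, ...] on the time column
X_parts, y = [], []
for seg in segments:
    mask = (df["time"] >= seg["start"]) & (df["time"] <= seg["end"])
    X_parts.append(df.loc[mask, ["sensorone", "sensortwo"]].to_numpy())  # rows inside the segment
    y.extend([seg["label"]] * int(mask.sum()))                           # one label per row

clf = LogisticRegression(max_iter=1000)
clf.fit(np.vstack(X_parts), y)  # per-row classifier reused at prediction time
# The real backend persists its artifacts under MODEL_DIR, e.g. with joblib.dump(...)
```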
## Prediction

For each task, the backend loads the CSV, applies the trained classifier to each
row, and groups consecutive predictions into labeled segments. Prediction scores
are averaged per segment and returned to Label Studio.
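The following sketch shows how per-row predictions might be collapsed into segments with averaged scores, continuing the hypothetical `clf` and `df` objects from the sketches above; the real backend turns such segments into Label Studio results for the `<TimeSeriesLabels>` control:

```python
# Sketch only: group consecutive rows with the same predicted label into one
# segment and average the winning class probability over that segment.
labels = clf.predict(df[["sensorone", "sensortwo"]])
scores = clf.predict_proba(df[["sensorone", "sensortwo"]]).max(axis=1)

segments = []
start = 0
for i in range(1, len(labels) + 1):
    if i == len(labels) or labels[i] != labels[start]:
        segments.append({
            "start": df["time"].iloc[start],
            "end": df["time"].iloc[i - 1],
            "label": str(labels[start]),
            "score": float(scores[start:i].mean()),  # averaged per segment
        })
        start = i

print(segments[:3])
```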
_wsgi.py

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
import os
import argparse
import json
import logging
import logging.config

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "standard": {
            "format": "[%(asctime)s] [%(levelname)s] [%(name)s::%(funcName)s::%(lineno)d] %(message)s"
        }
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": os.getenv('LOG_LEVEL'),
            "stream": "ext://sys.stdout",
            "formatter": "standard"
        }
    },
    "root": {
        "level": os.getenv('LOG_LEVEL'),
        "handlers": [
            "console"
        ],
        "propagate": True
    }
})

from label_studio_ml.api import init_app
from model import TimeSeriesSegmenter


_DEFAULT_CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'config.json')


def get_kwargs_from_config(config_path=_DEFAULT_CONFIG_PATH):
    if not os.path.exists(config_path):
        return dict()
    with open(config_path) as f:
        config = json.load(f)
    assert isinstance(config, dict)
    return config


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Label studio')
    parser.add_argument(
        '-p', '--port', dest='port', type=int, default=9090,
        help='Server port')
    parser.add_argument(
        '--host', dest='host', type=str, default='0.0.0.0',
        help='Server host')
    parser.add_argument(
        '--kwargs', '--with', dest='kwargs', metavar='KEY=VAL', nargs='+', type=lambda kv: kv.split('='),
        help='Additional LabelStudioMLBase model initialization kwargs')
    parser.add_argument(
        '-d', '--debug', dest='debug', action='store_true',
        help='Switch debug mode')
    parser.add_argument(
        '--log-level', dest='log_level', choices=['DEBUG', 'INFO', 'WARNING', 'ERROR'], default=None,
        help='Logging level')
    parser.add_argument(
        '--model-dir', dest='model_dir', default=os.path.dirname(__file__),
        help='Directory where models are stored (relative to the project directory)')
    parser.add_argument(
        '--check', dest='check', action='store_true',
        help='Validate model instance before launching server')
    parser.add_argument('--basic-auth-user',
                        default=os.environ.get('ML_SERVER_BASIC_AUTH_USER', None),
                        help='Basic auth user')

    parser.add_argument('--basic-auth-pass',
                        default=os.environ.get('ML_SERVER_BASIC_AUTH_PASS', None),
                        help='Basic auth pass')

    args = parser.parse_args()

    # setup logging level
    if args.log_level:
        logging.root.setLevel(args.log_level)

    def isfloat(value):
        try:
            float(value)
            return True
        except ValueError:
            return False

    def parse_kwargs():
        param = dict()
        for k, v in args.kwargs:
            if v.isdigit():
                param[k] = int(v)
            elif v == 'True' or v == 'true':
                param[k] = True
            elif v == 'False' or v == 'false':
                param[k] = False
            elif isfloat(v):
                param[k] = float(v)
            else:
                param[k] = v
        return param

    kwargs = get_kwargs_from_config()

    if args.kwargs:
        kwargs.update(parse_kwargs())

    if args.check:
        print('Check "' + TimeSeriesSegmenter.__name__ + '" instance creation..')
        model = TimeSeriesSegmenter(**kwargs)

    app = init_app(model_class=TimeSeriesSegmenter, basic_auth_user=args.basic_auth_user, basic_auth_pass=args.basic_auth_pass)

    app.run(host=args.host, port=args.port, debug=args.debug)

else:
    # for uWSGI use
    app = init_app(model_class=TimeSeriesSegmenter)
docker-compose.yml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
version: "3.8"

services:
  timeseries_segmenter:
    container_name: timeseries_segmenter
    image: heartexlabs/label-studio-ml-backend:timeseries-segmenter
    init: true
    build:
      context: .
      args:
        TEST_ENV: ${TEST_ENV}
    environment:
      # LABEL_STUDIO_HOST: This is the host URL for Label Studio, used for training.
      # It can be set via environment variable "LABEL_STUDIO_HOST".
      # If not set, it defaults to 'http://localhost:8080'.
      - LABEL_STUDIO_HOST=${LABEL_STUDIO_HOST:-http://localhost:8080}
      # LABEL_STUDIO_API_KEY: This is the API key for Label Studio, used for training.
      # It can be set via environment variable "LABEL_STUDIO_API_KEY".
      # There is no default value for this, so it must be set.
      - LABEL_STUDIO_API_KEY=${LABEL_STUDIO_API_KEY}
      # START_TRAINING_EACH_N_UPDATES: This is the number of updates after which training starts.
      # It is an integer value and can be set via environment variable "START_TRAINING_EACH_N_UPDATES".
      # If not set, it defaults to 10.
      - START_TRAINING_EACH_N_UPDATES=${START_TRAINING_EACH_N_UPDATES:-10}
      # specify these parameters if you want to use basic auth for the model server
      - BASIC_AUTH_USER=
      - BASIC_AUTH_PASS=
      # set the log level for the model server
      - LOG_LEVEL=DEBUG
      # any other parameters that you want to pass to the model server
      - ANY=PARAMETER
      # specify the number of workers and threads for the model server
      - WORKERS=1
      - THREADS=8
      # specify the model directory (likely you don't need to change this)
      - MODEL_DIR=/data/models
    ports:
      - "9090:9090"
    volumes:
      - "./data/server:/data"
