Implement proper model versioning for production systems:

- **Version Tracking**: Include the model version in predictions and logs
- **Backwards Compatibility**: Handle multiple model versions gracefully
- **Migration Strategies**: Provide clear upgrade paths for model updates
- **Rollback Support**: Maintain the ability to revert to previous model versions

Example versioning pattern:
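
A minimal sketch of such a pattern (all names below, such as `ModelRegistry`, are hypothetical, not part of any real API):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelRegistry:
    """Hypothetical registry that keeps every loaded model version for rollback."""
    versions: dict = field(default_factory=dict)  # version string -> model callable
    active: Optional[str] = None

    def register(self, version, model):
        self.versions[version] = model
        self.active = version  # newest registration becomes the active version

    def rollback(self, version):
        if version not in self.versions:
            raise KeyError(f"unknown model version: {version}")
        self.active = version

    def predict(self, features):
        # Version tracking: tag every prediction with the version that produced it.
        return {"model_version": self.active,
                "score": self.versions[self.active](features)}


registry = ModelRegistry()
registry.register("1.0.0", lambda x: 0.5)  # stand-ins for real models
registry.register("1.1.0", lambda x: 0.9)
pred = registry.predict([1, 2, 3])         # served by the active version, 1.1.0
registry.rollback("1.0.0")                 # rollback support: revert to 1.0.0
```

Keeping old versions loaded side by side is what makes backwards compatibility and instant rollback cheap; a real registry would also persist the mapping to disk.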
Implement robust data processing for different scenarios:

- **Type Safety**: Use proper type conversion and validation for different data types
- **Streaming Data**: Support large files that don't fit in memory using streaming approaches
- **Data Caching**: Cache preprocessed data when appropriate to improve performance
- **LabelStudioMLBackend::preload_task_data(path, task)**: Downloads all URLs from a task and stores them locally. It uses `get_local_path()` (`from label_studio_sdk._extensions.label_studio_tools.core.utils.io import get_local_path`) and requires `LABEL_STUDIO_API_KEY` and `LABEL_STUDIO_URL` to be set so that files can be downloaded through the Label Studio instance.
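
The download pattern behind `preload_task_data()` can be sketched as follows. This is a self-contained illustration, not the real implementation: `preload_urls` and the stand-in resolver are hypothetical, and in production the resolver would be the `get_local_path()` call above (with `LABEL_STUDIO_API_KEY` and `LABEL_STUDIO_URL` set):

```python
def preload_urls(task: dict, resolve) -> dict:
    """Map every http(s) URL in task['data'] to a local path via `resolve`.

    In the real backend, `resolve` would be get_local_path() from
    label_studio_sdk, which downloads the file through the Label Studio
    instance and caches it locally.
    """
    local = {}
    for key, value in task.get("data", {}).items():
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            local[key] = resolve(value)  # URL field -> local file path
        else:
            local[key] = value           # non-URL fields pass through unchanged
    return local


# Stand-in resolver: pretend every URL is already cached under /tmp (hypothetical).
fake_resolve = lambda url: "/tmp/cache/" + url.rsplit("/", 1)[-1]
task = {"data": {"csv": "https://example.com/files/run1.csv", "label": "Run"}}
paths = preload_urls(task, fake_resolve)
```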
label_studio_ml/examples/timeseries_segmenter/README.md

## Training

Training starts automatically when annotations are created or updated. The model uses a PyTorch-based LSTM neural network with proper temporal modeling to learn time series patterns.

### Training Process

The model follows these steps during training:
1. **Data Collection**: Fetches all labeled tasks from your Label Studio project
2. **Sample Generation**: Converts labeled time ranges into training samples:
   - **Background Class**: Unlabeled time periods are treated as "background" (class 0)
   - **Event Classes**: Your labeled segments (e.g., "Run", "Walk") become classes 1, 2, etc.
   - **Ground Truth Priority**: If multiple annotations exist for a task, ground truth annotations take precedence
3. **Model Training**: Fits a multi-layer LSTM network with:
   - Configurable sequence windows (default: 50 timesteps)
   - Dropout regularization for better generalization
   - Background class support for realistic time series modeling
4. **Model Persistence**: Saves trained model artifacts to `MODEL_DIR`
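
The sample-generation step above can be sketched as follows. This is a simplified, hypothetical illustration (the real backend works on CSV rows and annotation JSON), showing only how labeled ranges become per-row classes with background as class 0:

```python
def rows_to_classes(n_rows, segments, label_to_class):
    """Assign a class per row: 0 = background, labeled ranges get classes >= 1.

    `segments` is a list of (start_row, end_row_inclusive, label) tuples taken
    from the labeled time ranges of a task.
    """
    classes = [0] * n_rows  # every row starts as background (class 0)
    for start, end, label in segments:
        for i in range(start, end + 1):
            classes[i] = label_to_class[label]  # labeled range overrides background
    return classes


mapping = {"Run": 1, "Walk": 2}  # background is implicitly class 0
y = rows_to_classes(10, [(2, 4, "Run"), (7, 8, "Walk")], mapping)
# rows 0-1, 5-6 and 9 remain background
```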

### Training Configuration

You can customize training behavior with these environment variables:

- `START_TRAINING_EACH_N_UPDATES`: How often to retrain (default: 1, trains on every annotation)
- `TRAIN_EPOCHS`: Number of training epochs (default: 1000)
- `SEQUENCE_SIZE`: Sliding window size for temporal context (default: 50)
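
Presumably the backend reads these variables with the documented defaults along these lines (a sketch, not the actual implementation; `read_training_config` is a hypothetical helper):

```python
import os


def read_training_config(env=None):
    """Read training knobs from the environment, falling back to the
    documented defaults when a variable is unset."""
    if env is None:
        env = os.environ
    return {
        "start_training_each_n_updates": int(env.get("START_TRAINING_EACH_N_UPDATES", "1")),
        "train_epochs": int(env.get("TRAIN_EPOCHS", "1000")),
        "sequence_size": int(env.get("SEQUENCE_SIZE", "50")),
    }


defaults = read_training_config({})                      # no overrides -> defaults
tuned = read_training_config({"TRAIN_EPOCHS": "200"})    # override one knob
```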

When multiple annotations exist for the same task, the model prioritizes ground truth annotations:

- Non-ground truth annotations are processed first
- Ground truth annotations override previous labels and stop processing for that task
- This ensures the highest quality labels are used for training
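
The priority rule above can be sketched as (a hypothetical helper; annotation dicts carry Label Studio's `ground_truth` flag):

```python
def pick_training_annotation(annotations):
    """Return the annotation to train on: the first ground-truth annotation
    if one exists, otherwise the last non-ground-truth annotation seen."""
    chosen = None
    for ann in annotations:
        chosen = ann              # non-ground-truth annotations are processed in order...
        if ann.get("ground_truth"):
            break                 # ...but a ground-truth one wins and stops processing
    return chosen


anns = [
    {"id": 1, "ground_truth": False},
    {"id": 2, "ground_truth": True},
    {"id": 3, "ground_truth": False},  # never reached: processing stopped at id 2
]
best = pick_training_annotation(anns)
```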

## Prediction

The model processes new time series data by applying the trained LSTM classifier with sliding-window temporal context. Only meaningful event segments are returned to Label Studio; background periods are filtered out automatically.

### Prediction Process

For each task, the model performs these steps:
1. **Model Loading**: Loads the trained PyTorch model from disk
2. **Data Processing**: Reads the task CSV and creates feature vectors from sensor channels
3. **Temporal Prediction**: Applies the LSTM with sliding windows for temporal context:
   - Uses overlapping windows with 50% overlap for smoother predictions
   - Averages predictions across overlapping windows
   - Maintains temporal dependencies between timesteps
4. **Segment Extraction**: Groups consecutive predictions into meaningful segments:
   - **Background Filtering**: Automatically filters out background (unlabeled) periods
   - **Event Segmentation**: Only returns segments with actual event labels
   - **Score Calculation**: Averages prediction confidence per segment
5. **Result Formatting**: Returns segments in Label Studio JSON format
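
The segment-extraction step can be sketched with hypothetical names: per-row class predictions and their confidences go in, segments come out, and class 0 (background) is dropped:

```python
def extract_segments(classes, scores):
    """Merge consecutive rows with the same class into
    (start_row, end_row, class, mean_score) segments, skipping background."""
    segments = []
    start = 0
    for i in range(1, len(classes) + 1):
        if i == len(classes) or classes[i] != classes[start]:
            if classes[start] != 0:  # background periods are filtered out
                window = scores[start:i]
                segments.append((start, i - 1, classes[start],
                                 sum(window) / len(window)))  # averaged confidence
            start = i
    return segments


classes = [0, 1, 1, 1, 0, 2, 2, 0]
scores  = [0.9, 0.8, 0.6, 0.7, 0.9, 0.5, 0.7, 0.8]
segs = extract_segments(classes, scores)
# two event segments: rows 1-3 (class 1) and rows 5-6 (class 2)
```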

### Prediction Quality

The model provides several quality indicators:

- **Per-segment Confidence**: Average prediction probability for each returned segment
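
For context, a returned segment plausibly looks like this in Label Studio's time-series result format. The field layout follows the `timeserieslabels` result type; the `from_name`/`to_name` values (`label`, `ts`) are assumptions from a typical labeling config, and `segment_to_result` is a hypothetical helper:

```python
def segment_to_result(start, end, label, score):
    """Format one predicted segment as a Label Studio result entry.
    'label' and 'ts' are placeholder control/object names from a
    hypothetical labeling config."""
    return {
        "from_name": "label",
        "to_name": "ts",
        "type": "timeserieslabels",
        "value": {"start": start, "end": end, "timeserieslabels": [label]},
        "score": score,  # the per-segment confidence described above
    }


result = segment_to_result(12, 48, "Run", 0.91)
```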