
Commit 6dd7902

update readme and add video
1 parent 71237fa commit 6dd7902

2 files changed: +25 −177 lines

Binary file (24.3 MB) not shown.
Lines changed: 25 additions & 177 deletions
@@ -1,195 +1,43 @@
 <!--
 ---
-title: SAM2 with Images
+title: Deepgram Text To Speech
 type: guide
 tier: all
 order: 15
 hide_menu: true
 hide_frontmatter_title: true
-meta_title: Using SAM2 with Label Studio for Image Annotation
-categories:
-- Computer Vision
-- Image Annotation
-- Object Detection
-- Segment Anything Model
-image: "/tutorials/sam2-images.png"
+meta_title: Using Deepgram with Label Studio for Text to Speech
 ---
 -->

-# Using SAM2 with Label Studio for Image Annotation
+# Using Deepgram with Label Studio for Text to Speech Annotation

-Segment Anything 2, or SAM 2, is a model released by Meta in July 2024. An update to the original Segment Anything Model,
-SAM 2 provides even better object segmentation for both images and video. In this guide, we'll show you how to use
-SAM 2 for better image labeling with Label Studio.
+This backend uses the Deepgram API to take input text from the user, run text-to-speech, and return the output audio for annotation in Label Studio.

-Click on the image below to watch our ML Evangelist Micaela Kaplan explain how to link SAM 2 to your Label Studio project.
-You'll need to follow the instructions below to stand up an instance of SAM2 before you can link your model!
+**Important note:** you must refresh the page after submitting the text to see the audio appear.

-[![Connecting SAM2 Model to Label Studio for Image Annotation](https://img.youtube.com/vi/FTg8P8z4RgY/0.jpg)](https://www.youtube.com/watch?v=FTg8P8z4RgY)
+## Prerequisites
+1. [Deepgram API key](https://deepgram.com/) -- create an account and follow the instructions to get an API key with default permissions. Store this key as `DEEPGRAM_API_KEY` in `docker_compose.yml`.
+2. AWS storage -- make sure you configure the following parameters in `docker_compose.yml`:
+   - `AWS_ACCESS_KEY_ID` -- your AWS access key ID
+   - `AWS_SECRET_ACCESS_KEY` -- your AWS secret access key
+   - `AWS_SESSION_TOKEN` -- your AWS session token
+   - `AWS_DEFAULT_REGION` -- the region you want to use for S3
+   - `S3_BUCKET` -- the name of the bucket where you'd like to store the created audio files
+   - `S3_FOLDER` -- the name of the folder within the specified bucket where you'd like to store the audio files
+3. Label Studio -- make sure you set your `LABEL_STUDIO_URL` and `LABEL_STUDIO_API_KEY` in `docker_compose.yml`. As of 11/12/25, you must use the **legacy token**.

-## Before you begin
-
-Before you begin, you must install the [Label Studio ML backend](https://github.com/HumanSignal/label-studio-ml-backend?tab=readme-ov-file#quickstart).
-
-This tutorial uses the [`segment_anything_2_image` example](https://github.com/HumanSignal/label-studio-ml-backend/tree/master/label_studio_ml/examples/segment_anything_2_image).
-
-Note that as of 8/1/2024, SAM2 only runs on GPU.
-
-## Labeling configuration
-
-The current implementation of the Label Studio SAM2 ML backend works using Interactive mode. The user-guided inputs are:
-- `KeypointLabels`
-- `RectangleLabels`
-
-And then SAM2 outputs `BrushLabels` as a result.
-
-This means all three control tags should be represented in your labeling configuration:
-
-```xml
+## Labeling Config
+This is the base labeling config to use with this backend. Note that you may add additional annotations to the document after the audio without breaking anything!
+```xml
 <View>
-  <Style>
-    .main {
-      font-family: Arial, sans-serif;
-      background-color: #f5f5f5;
-      margin: 0;
-      padding: 20px;
-    }
-    .container {
-      display: flex;
-      justify-content: space-between;
-      margin-bottom: 20px;
-    }
-    .column {
-      flex: 1;
-      padding: 10px;
-      background-color: #fff;
-      border-radius: 5px;
-      box-shadow: 0 2px 5px rgba(0, 0, 0, 0.1);
-      text-align: center;
-    }
-    .column .title {
-      margin: 0;
-      color: #333;
-    }
-    .column .label {
-      margin-top: 10px;
-      padding: 10px;
-      background-color: #f9f9f9;
-      border-radius: 3px;
-    }
-    .image-container {
-      width: 100%;
-      height: 300px;
-      background-color: #ddd;
-      border-radius: 5px;
-    }
-  </Style>
-  <View className="main">
-    <View className="container">
-      <View className="column">
-        <View className="title">Choose Label</View>
-        <View className="label">
-          <BrushLabels name="tag" toName="image">
-
-
-          <Label value="defect" background="#FFA39E"/></BrushLabels>
-        </View>
-      </View>
-      <View className="column">
-        <View className="title">Use Keypoint</View>
-        <View className="label">
-          <KeyPointLabels name="tag2" toName="image" smart="true">
-
-
-          <Label value="defect" background="#250dd3"/></KeyPointLabels>
-        </View>
-      </View>
-      <View className="column">
-        <View className="title">Use Rectangle</View>
-        <View className="label">
-          <RectangleLabels name="tag3" toName="image" smart="true">
-
-
-          <Label value="defect" background="#FFC069"/></RectangleLabels>
-        </View>
-      </View>
-    </View>
-    <View className="image-container">
-      <Image name="image" value="$image" zoom="true" zoomControl="true"/>
-    </View>
+  <Header value="What would you like to TTS?"/>
+  <TextArea name="text" toName="audio" placeholder="What do you want to tts?" value="$text" rows="4" maxSubmissions="1"/>
+  <Audio name="audio" value="$audio" zoom="true" hotkey="ctrl+enter"/>
 </View>
-</View>
-```
-
-## Running from source
-
-1. To run the ML backend without Docker, you have to clone the repository and install all dependencies using pip:
-
-```bash
-git clone https://github.com/HumanSignal/label-studio-ml-backend.git
-cd label-studio-ml-backend
-pip install -e .
-cd label_studio_ml/examples/segment_anything_2_image
-pip install -r requirements.txt
-```
-
-2. Download the [`segment-anything-2` repo](https://github.com/facebookresearch/sam2) into the root directory. Install the SegmentAnything model and download checkpoints using [the official Meta documentation](https://github.com/facebookresearch/sam2?tab=readme-ov-file#installation).
-You should now have the following folder structure:
-
-
-| root directory
-|   | label-studio-ml-backend
-|   |   | label-studio-ml
-|   |   |   | examples
-|   |   |   |   | segment_anything_2_image
-|   | sam2
-|   |   | sam2
-|   |   | checkpoints
-
-
-3. Then you can start the ML backend on the default port `9090`:
-
-```bash
-cd ~/sam2
-label-studio-ml start ../label-studio-ml-backend/label_studio_ml/examples/segment_anything_2_image
-```
-
-Due to breaking changes from Meta ([see here](https://github.com/facebookresearch/sam2/blob/c2ec8e14a185632b0a5d8b161928ceb50197eddc/sam2/build_sam.py#L20)), it is crucial that you run this command from the `sam2` directory in your root directory.
-
-4. Connect the running ML backend server to Label Studio: go to your project `Settings -> Machine Learning -> Add Model` and specify `http://localhost:9090` as the URL. Read more in the official [Label Studio documentation](https://labelstud.io/guide/ml#Connect-the-model-to-Label-Studio).
-
-## Running with Docker
-
-1. Start the Machine Learning backend on `http://localhost:9090` with the prebuilt image:
-
-```bash
-docker-compose up
 ```
+## A Data Note
+Note that in order for this to work, you need to upload dummy data (i.e., empty text and audio fields) so that the tasks populate. You can use `dummy_data.json` as this data.

-2. Validate that the backend is running:
-
-```bash
-$ curl http://localhost:9090/
-{"status":"UP"}
-```
-
-3. Connect to the backend from Label Studio running on the same host: go to your project `Settings -> Machine Learning -> Add Model` and specify `http://localhost:9090` as the URL.
-
-
-## Configuration
-Parameters can be set in `docker-compose.yml` before running the container.
-
-
-The following common parameters are available:
-- `DEVICE` - specify the device for the model server (currently only `cuda` is supported, `cpu` is coming soon)
-- `MODEL_CONFIG` - SAM2 model configuration file (`sam2_hiera_l.yaml` by default)
-- `MODEL_CHECKPOINT` - SAM2 model checkpoint file (`sam2_hiera_large.pt` by default)
-- `BASIC_AUTH_USER` - specify the basic auth user for the model server
-- `BASIC_AUTH_PASS` - specify the basic auth password for the model server
-- `LOG_LEVEL` - set the log level for the model server
-- `WORKERS` - specify the number of workers for the model server
-- `THREADS` - specify the number of threads for the model server
-
-## Customization
-
-The ML backend can be customized by adding your own models and logic inside the `./segment_anything_2` directory.
+## Configuring the backend
+When you attach the model to Label Studio in your model settings, make sure to toggle ON interactive preannotations!
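For context, the backend flow the updated README describes (take input text, run Deepgram text-to-speech, return audio) can be sketched roughly as follows. This is a minimal sketch, not the code shipped in this commit: it assumes Deepgram's `v1/speak` REST endpoint and the `requests` library, and the function and model names are illustrative.

```python
import os

import requests  # assumed HTTP client; the real backend may use Deepgram's SDK instead


def text_to_speech(text: str) -> bytes:
    """Hypothetical sketch: send text to Deepgram TTS and return the audio bytes."""
    response = requests.post(
        "https://api.deepgram.com/v1/speak",
        params={"model": "aura-asteria-en"},  # illustrative voice choice
        headers={
            "Authorization": f"Token {os.environ['DEEPGRAM_API_KEY']}",
            "Content-Type": "application/json",
        },
        json={"text": text},
        timeout=30,
    )
    response.raise_for_status()
    return response.content  # raw audio payload
```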

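The AWS parameters in the prerequisites imply the generated audio is written to S3 and served to Label Studio from there. A minimal sketch of that step, assuming `boto3` and the environment variables named above (the object key scheme and the presigned URL are assumptions, not taken from this commit):

```python
import os
import uuid

import boto3  # credentials come from AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN


def upload_audio(audio: bytes) -> str:
    """Hypothetical sketch: store audio under S3_FOLDER in S3_BUCKET and return a fetchable URL."""
    s3 = boto3.client("s3", region_name=os.environ["AWS_DEFAULT_REGION"])
    key = f"{os.environ['S3_FOLDER']}/{uuid.uuid4()}.mp3"
    s3.put_object(
        Bucket=os.environ["S3_BUCKET"],
        Key=key,
        Body=audio,
        ContentType="audio/mpeg",
    )
    # A presigned URL lets Label Studio play the file without making the bucket public.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": os.environ["S3_BUCKET"], "Key": key},
        ExpiresIn=3600,
    )
```

Returning a URL rather than raw bytes matches the labeling config above, whose `<Audio>` tag loads whatever `$audio` points at.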
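Finally, `dummy_data.json` itself is not shown in this commit. Assuming it mirrors the `$text` and `$audio` variables in the labeling config, a minimal version with empty task fields might look like:

```json
[
  { "data": { "text": "", "audio": "" } }
]
```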