docs/source/guide/storage.md
Lines changed: 96 additions & 31 deletions
@@ -1183,10 +1183,12 @@ You can also create a storage connection using the Label Studio API.
1183
1183
1184
1184
### Azure Blob Storage with Service Principal authentication
1185
1185
1186
-
You can use Azure Service Principal authentication to securely connect Label Studio Enterprise to Azure Blob Storage without using storage account keys. Service Principal authentication provides enhanced security through Azure Active Directory (Azure AD) identity and access management, allowing for fine-grained permissions and audit capabilities.
1186
+
You can use Azure Service Principal authentication to securely connect Label Studio Enterprise to Azure Blob Storage without using storage account keys. Service Principal authentication provides enhanced security through Entra ID (formerly "Azure Active Directory") identity and access management, allowing for fine-grained permissions and audit capabilities.
1187
1187
1188
1188
Service Principal authentication is a secure method that uses Azure AD identity to authenticate applications. Unlike storage account keys that provide full access to the storage account, Service Principal authentication allows you to grant specific permissions and can be easily revoked or rotated.
1189
1189
1190
+
For more information, see [Microsoft - Application and service principal objects in Microsoft Entra ID](https://learn.microsoft.com/en-us/entra/identity-platform/app-objects-and-service-principals).
1191
+
1190
1192
#### Prerequisites
1191
1193
1192
1194
- Azure subscription and Storage Account
@@ -1195,42 +1197,105 @@ Service Principal authentication is a secure method that uses Azure AD identity
1195
1197
1196
1198
#### Set up a Service Principal in Azure
1197
1199
1198
-
1. Create an App Registration: Azure AD → App registrations → New registration → name it (e.g., "LabelStudio-ServicePrincipal").
1199
-
2. Capture IDs: from the app Overview, copy the Directory (tenant) ID and Application (client) ID.
1200
-
3. Create a Client Secret: Certificates & secrets → New client secret → copy the Value immediately.
1201
-
4. Grant Storage access: Storage Account → Access control (IAM) → Add role assignment → Storage Blob Data Contributor → assign to the App Registration.
1202
-
5. Create a container: Data storage → Containers → + Container → set Public access level = Private.
1200
+
1. **Add an App Registration:**
1201
+
1. From the Azure portal, search for or select **Entra ID**.
1202
+
2. Select **Add > App registration**.
1203
+
2. **Register the application:**
1204
+
1. Provide a name (e.g., "LabelStudio-ServicePrincipal").
1205
+
2. Select the account type appropriate for your organization.
1206
+
3. Leave the redirect URI blank.
1207
+
4. Click **Register**.
1208
+
3. **Copy required information:**
1209
+
1. From the Overview page, copy the following fields: <br/><br/>
1210
+
* **Directory (tenant) ID**
1211
+
* **Application (client) ID**
1212
+
4. **Create a client secret:**
1213
+
1. While still on the overview page for your new app, expand the **Manage** menu on the left. Select **Certificates & secrets**.
1214
+
2. Click **New client secret**.
1215
+
3. Provide a description and select an expiration date. Click **Add**.
1216
+
4. Copy the **Value** field. (You will only have one chance to copy this value and then it will be hidden.)
1217
+
5. **Grant Storage access:**
1218
+
1. Go to the storage account you created as part of the prerequisites.
1219
+
2. On the left, select **Access control (IAM)**.
1220
+
3. Select **Add role assignment**.
1221
+
4. Use the search field to locate **Storage Blob Data Contributor**. Click the role to highlight it.
1222
+
5. Select the **Members** tab above.
1223
+
6. With **User, group, or service principal** selected, click **Select members**.
1224
+
7. Use the search field provided to locate the name of the app you created earlier.
1225
+
8. Click **Select**.
1226
+
9. Click **Review + assign**.
1227
+
6. **Create a container:**
1228
+
1. While still on the page for your storage account, click **Data storage** on the left.
1229
+
2. Select **Containers**.
1230
+
3. You may already have a container with files, but if you do not, create a new one with private access.
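
To illustrate how the three values collected above (tenant ID, client ID, and client secret) fit together, here is a minimal Python sketch that builds an OAuth 2.0 client-credentials token request for Azure Storage. It only constructs the request and does not send it. The function name `build_token_request` and the placeholder IDs are hypothetical, and this is not necessarily how Label Studio authenticates internally; in practice an SDK such as `azure-identity` would normally handle this step.

```python
from typing import Tuple
from urllib.parse import urlencode

# Microsoft identity platform v2.0 token endpoint pattern and the Azure Storage scope.
TOKEN_URL_TEMPLATE = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
STORAGE_SCOPE = "https://storage.azure.com/.default"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> Tuple[str, bytes]:
    """Return the (url, form-encoded body) for a client-credentials token request.

    Hypothetical helper for illustration; it builds the request but sends nothing.
    """
    url = TOKEN_URL_TEMPLATE.format(tenant_id=tenant_id)
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,          # Application (client) ID
            "client_secret": client_secret,  # the secret's Value field
            "scope": STORAGE_SCOPE,
        }
    ).encode("ascii")
    return url, body


# Placeholder values for illustration only; never hard-code a real secret.
url, body = build_token_request(
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="11111111-1111-1111-1111-111111111111",
    client_secret="example-secret-value",
)
print(url)
```

The token returned by such a request carries only the permissions granted through the role assignment (here, **Storage Blob Data Contributor**), which is why the role must be assigned before the connection can read or write blobs.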
1203
1231
1204
1232
!!! warning
1205
-
If you plan to use pre-signed URLs, configure CORS on the Storage Account Blob service: methods GET/HEAD/OPTIONS; allowed origins = your Label Studio domain(s); headers = *; exposed headers = *; max age ≈ 3600.
1233
+
If you plan to use pre-signed URLs, configure CORS on the Storage Account Blob service. See below.
1234
+
1235
+
<br/>
1236
+
1237
+
{% details <b>Configure CORS for the Azure storage account</b> %}
1238
+
1239
+
If you plan to use pre-signed URLs, configure CORS on the Storage Account Blob service.
1240
+
1241
+
1. In the Azure portal, navigate to the page for the storage account.
1242
+
2. From the menu on the left, scroll down to **Settings > Resource sharing (CORS)**.
1243
+
3. Under **Blob service** add the following rule:
1244
+
1245
+
* **Allowed origins:** `https://app.humansignal.com` (or the domain you are using)
1246
+
* **Allowed methods:** `GET, HEAD, OPTIONS`
1247
+
* **Allowed headers:** `*`
1248
+
* **Exposed headers:** `*`
1249
+
* **Max age:** `3600`
1250
+
1251
+
4. Click **Save**.
1252
+
1253
+
{% enddetails %}
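
As a rough illustration of what the CORS rule above permits, the following Python sketch models the rule and checks a request's origin and method against it. This is a simplified model under stated assumptions: the names `CorsRule` and `is_request_allowed` are hypothetical, header checks are omitted, and Azure's actual evaluation has more detail than shown here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CorsRule:
    """Simplified stand-in for one Blob service CORS rule (hypothetical type)."""
    allowed_origins: Tuple[str, ...]
    allowed_methods: Tuple[str, ...]
    max_age_seconds: int


# Values taken from the rule described above.
RULE = CorsRule(
    allowed_origins=("https://app.humansignal.com",),
    allowed_methods=("GET", "HEAD", "OPTIONS"),
    max_age_seconds=3600,
)


def is_request_allowed(rule: CorsRule, origin: str, method: str) -> bool:
    """Return True if the origin and method both match the rule (headers ignored)."""
    origin_ok = "*" in rule.allowed_origins or origin in rule.allowed_origins
    return origin_ok and method.upper() in rule.allowed_methods


print(is_request_allowed(RULE, "https://app.humansignal.com", "GET"))
print(is_request_allowed(RULE, "https://app.humansignal.com", "PUT"))
```

Under this model, a browser on the Label Studio domain can read blobs through pre-signed URLs, while write methods and requests from other origins are rejected.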
1206
1254
1207
1255
#### Set up connection in the Label Studio UI
1208
1256
1209
-
In the Label Studio UI, do the following to set up the connection:
1257
+
From Label Studio, open your project and select **Settings > Cloud Storage** > **Add Source Storage**.
1210
1258
1211
-
1. Open Label Studio in your web browser.
1212
-
2. For a specific project, open **Settings > Cloud Storage**.
1213
-
3. Click **Add Source Storage**.
1214
-
4. In the dialog box that appears, select **Azure Blob Storage with Service Principal** as the storage type.
1215
-
5. In the **Storage Name** field, type a name for the storage to appear in the Label Studio UI.
1216
-
6. Specify the name of the Azure Storage Account in the **Storage Name** field.
1217
-
7. Specify the name of the Azure Blob container, and if relevant, the container prefix to specify an internal folder.
1218
-
8. Configure the Service Principal authentication:
1219
-
- In the **Tenant ID** field, specify the Directory (tenant) ID from your App Registration.
1220
-
- In the **Client ID** field, specify the Application (client) ID from your App Registration.
1221
-
- In the **Client Secret** field, specify the client secret value you created.
1222
-
9. Adjust the remaining optional parameters:
1223
-
- In the **File Filter Regex** field, specify a regular expression to filter bucket objects. Use `.*` to collect all objects.
1224
-
- In the **Import method** dropdown, choose how to import your data:
1225
-
- **Files** - Automatically creates a task for each storage object (e.g. JPG, MP3, TXT). Use this if your container contains BLOB storage files such as JPG, MP3, or similar file types.
1226
-
- **Tasks** - Treat each JSON, JSONL, or Parquet as a task definition (one or more tasks per file). Use this if you have multiple JSON files in the container with one task per JSON file.
1227
-
- In the **Use pre-signed URLs (On) / Proxy through Label Studio (Off)** toggle, choose how media is loaded:
1228
-
- **ON** (Pre-signed URLs) - All data bypasses the platform and user browsers directly read data from storage.
1229
-
- **OFF** (Proxy) - The platform proxies media using its own backend.
1230
-
- Set the **Expire pre-signed URLs (minutes)** counter to control how long pre-signed URLs remain valid.
1231
-
10. Click **Add Storage**.
1232
-
1233
-
After adding the storage, click **Sync** to collect tasks from the container, or make an API call to sync import storage.
1259
+
Select **Azure Blob Storage with Service Principal** and click **Next**.
1260
+
1261
+
##### Configure Connection
1262
+
1263
+
Complete the following fields and then click **Test connection**:
1264
+
1265
+
<div class="noheader rowheader">
1266
+
1267
+
|||
1268
+
| --- | --- |
1269
+
| Storage Title | Enter a name for the storage connection to appear in Label Studio. |
1270
+
| Storage Name | Enter the name of your Azure storage account. |
1271
+
| Container Name | Enter the name of a container within the Azure storage account. |
1272
+
| Tenant ID | Specify the **Directory (tenant) ID** from your App Registration. |
1273
+
| Client ID | Specify the **Application (client) ID** from your App Registration. |
1274
+
| Client Secret | Specify the **Value** of the client secret you copied earlier. |
1275
+
| **Use pre-signed URLs / Proxy through the platform** | Enable or disable pre-signed URLs. [See more.](#Pre-signed-URLs-vs-Storage-proxies) |
1276
+
| Expiration minutes | Adjust the counter for how many minutes the pre-signed URLs are valid. |
1277
+
1278
+
</div>
1279
+
1280
+
##### Import Settings & Preview
1281
+
1282
+
Complete the following fields and then click **Load preview** to ensure you are syncing the correct data:
1283
+
1284
+
<div class="noheader rowheader">
1285
+
1286
+
|||
1287
+
| --- | --- |
1288
+
| Bucket Prefix | Optionally, enter the folder name within the container that you would like to use. For example, `data-set-1` or `data-set-1/subfolder-2`. |
1289
+
| Import Method | Select whether you want to create a task for each file in your container or whether you would like to use a JSON/JSONL/Parquet file to define the data for each task. |
1290
+
| File Name Filter | Specify a regular expression to filter bucket objects. Use `.*` to collect all objects. |
1291
+
| Scan all sub-folders | Enable this option to perform a recursive scan across subfolders within your container. |
1292
+
1293
+
</div>
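
To preview which objects a given **Bucket Prefix** and **File Name Filter** combination might select, here is a small Python sketch. It assumes the regular expression is applied to the full object name (including the prefix) from the start of the string, as with `re.match`; that matching behavior, the helper name `select_objects`, and the sample object names are assumptions for illustration, so confirm the result with **Load preview**.

```python
import re
from typing import Iterable, List


def select_objects(names: Iterable[str], prefix: str, pattern: str) -> List[str]:
    """Keep names under the prefix whose full name matches the pattern.

    Hypothetical helper; assumes matching from the start of the name (re.match).
    """
    regex = re.compile(pattern)
    return [n for n in names if n.startswith(prefix) and regex.match(n)]


# Made-up object names for illustration.
objects = [
    "data-set-1/a.jpg",
    "data-set-1/b.json",
    "data-set-1/subfolder-2/c.jpg",
    "other/d.jpg",
]

all_in_prefix = select_objects(objects, "data-set-1/", r".*")
jpgs_only = select_objects(objects, "data-set-1/", r".*\.jpg$")
print(all_in_prefix)
print(jpgs_only)
```

The sketch does not model the **Scan all sub-folders** option; it treats every object under the prefix as a candidate.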
1294
+
1295
+
1296
+
##### Review & Confirm
1297
+
1298
+
If everything looks correct, click **Save & Sync** to sync immediately, or click **Save** to save your settings and sync later.
1234
1299
1235
1300
#### Create a target storage connection in the Label Studio UI