
Commit 63ef003

feat: BROS-192: Introduce Azure Principal storage to LSE (#8109)
Co-authored-by: makseq <[email protected]>
1 parent 144fd7c commit 63ef003

2 files changed (+108 -1 lines changed)

docs/source/guide/storage.md

Lines changed: 104 additions & 0 deletions
@@ -1177,6 +1177,110 @@ You can also create a storage connection using the Label Studio API.
- See [Create new import storage](/api#operation/api_storages_azure_create) then [sync the import storage](/api#operation/api_storages_azure_sync_create).
- See [Create export storage](/api#operation/api_storages_export_azure_create) and after annotating, [sync the export storage](/api#operation/api_storages_export_azure_sync_create).

<div class="enterprise-only">

### Azure Blob Storage with Service Principal authentication

You can use Azure Service Principal authentication to connect Label Studio Enterprise to Azure Blob Storage without storage account keys. A Service Principal is an application identity in Azure Active Directory (Azure AD). Unlike a storage account key, which grants full access to the whole storage account, a Service Principal can be granted only the permissions it needs, its credentials can be rotated or revoked independently, and its access is managed and audited through Azure AD identity and access management.

#### Prerequisites

- An Azure subscription and a Storage Account
- Permission to create App Registrations and to assign roles on the Storage Account
- A private container for your data (create one if needed)

#### Set up a Service Principal in Azure

1. Create an App Registration: **Azure AD → App registrations → New registration**, and give it a name (e.g., "LabelStudio-ServicePrincipal").
2. Copy the IDs: from the app's **Overview** page, copy the **Directory (tenant) ID** and the **Application (client) ID**.
3. Create a client secret: **Certificates & secrets → New client secret**. Copy the secret **Value** immediately, because Azure displays it only once.
4. Grant storage access: **Storage Account → Access control (IAM) → Add role assignment → Storage Blob Data Contributor**, and assign the role to the App Registration.
5. Create a container: **Data storage → Containers → + Container**, and set **Public access level** to **Private**.

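After the role assignment, you can confirm outside Label Studio that the Service Principal reaches the container. The following is a minimal sketch, assuming the `azure-identity` and `azure-storage-blob` packages are installed and the account is in the public Azure cloud (`*.blob.core.windows.net`). The helper names `account_url` and `list_some_blobs` are hypothetical, and all argument values are placeholders.

```python
"""Sketch: check that a Service Principal can list blobs in a container.

Assumptions: azure-identity and azure-storage-blob are installed, and the
storage account lives in the public Azure cloud. Helper names are hypothetical.
"""


def account_url(storage_account: str) -> str:
    """Build the Blob endpoint from a storage account *name* (not a URL)."""
    name = storage_account.strip()
    if name.startswith(("http://", "https://")):
        raise ValueError("Pass the storage account name, not a URL")
    return f"https://{name}.blob.core.windows.net"


def list_some_blobs(storage_account, container, tenant_id, client_id,
                    client_secret, prefix=None, limit=5):
    """Authenticate as the Service Principal and return up to `limit` blob names."""
    # Imported lazily so account_url() works without the Azure SDK installed.
    from azure.identity import ClientSecretCredential
    from azure.storage.blob import BlobServiceClient

    credential = ClientSecretCredential(tenant_id, client_id, client_secret)
    service = BlobServiceClient(account_url=account_url(storage_account),
                                credential=credential)
    container_client = service.get_container_client(container)
    names = []
    for blob in container_client.list_blobs(name_starts_with=prefix):
        names.append(blob.name)
        if len(names) >= limit:
            break
    return names
```

For example, `list_some_blobs("mystorageaccount", "my-container", tenant_id, client_id, client_secret)` returns up to five blob names. An authorization error right after step 4 may only mean that the role assignment has not propagated yet; Azure role assignments can take several minutes to take effect.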

!!! warning
    If you plan to use pre-signed URLs, configure CORS on the Storage Account's Blob service: allowed methods `GET`, `HEAD`, `OPTIONS`; allowed origins = your Label Studio domain(s); allowed headers = `*`; exposed headers = `*`; max age ≈ 3600 seconds.

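The CORS settings in the warning can also be applied programmatically with `azure-storage-blob` (`CorsRule` and `BlobServiceClient.set_service_properties`). This is a sketch under assumptions: `azure-identity` is installed, and `DefaultAzureCredential` resolves to an identity that is allowed to change the account's Blob service properties, which is an administrative action separate from the data roles above. The helper names and any origin you pass are hypothetical.

```python
"""Sketch: apply the recommended CORS rule for pre-signed URLs.

Assumptions: azure-storage-blob and azure-identity are installed, and the
credential may change Blob service properties. Helper names are hypothetical.
"""


def cors_rule_kwargs(origins, max_age_seconds=3600):
    """Collect the CORS values from the warning above for the given origins."""
    cleaned = [o.strip().rstrip("/") for o in origins if o.strip()]
    if not cleaned:
        raise ValueError("At least one Label Studio origin is required")
    return {
        "allowed_origins": cleaned,
        "allowed_methods": ["GET", "HEAD", "OPTIONS"],
        "allowed_headers": ["*"],
        "exposed_headers": ["*"],
        "max_age_in_seconds": max_age_seconds,
    }


def apply_cors(storage_account, origins):
    """Write a single CORS rule to the account's Blob service."""
    from azure.identity import DefaultAzureCredential
    from azure.storage.blob import BlobServiceClient, CorsRule

    kwargs = cors_rule_kwargs(origins)
    rule = CorsRule(kwargs.pop("allowed_origins"), kwargs.pop("allowed_methods"),
                    **kwargs)
    service = BlobServiceClient(
        account_url=f"https://{storage_account}.blob.core.windows.net",
        credential=DefaultAzureCredential(),
    )
    # Passing cors= replaces the CORS rules already defined on the Blob service.
    service.set_service_properties(cors=[rule])
```

Because `set_service_properties(cors=...)` replaces the existing rule list, merge in any rules you already rely on before running `apply_cors`.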

#### Set up import storage in the Label Studio UI

1. Open your project → **Settings > Cloud Storage** → **Add Source Storage** → select **Azure Blob Storage with Service Principal**.
2. Fill in the fields exactly as labeled in the UI (the labels match the backend schema):
    - **Integration Name**: display name for this connection.
    - **Storage Name**: Azure Storage Account name (not a URL).
    - **Container Name** and optional **Container Prefix**.
    - **Tenant ID**, **Client ID**, **Client Secret**: values from the App Registration.
    - Optional: **File Filter Regex** to include only matching objects.
    - Import mode: toggle **Treat every items object as an image/src file**:
        - ON = Files (create a task per blob)
        - OFF = Tasks (JSON/JSONL/Parquet task definitions)
    - **Use pre-signed URLs** (ON) or proxy (OFF), and **Expiration minutes**.
3. Click **Add Storage**, then **Sync** (or use the API) to load tasks.

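To automate these steps, you could call the Label Studio REST API instead of the UI. The sketch below is hypothetical: the endpoint path `/api/storages/azure_spi/`, its `/sync` suffix, and the JSON field names are assumptions modeled on the UI labels, so check them against your instance's API reference before use. It uses only the Python standard library and assumes legacy `Authorization: Token <key>` authentication.

```python
"""Hypothetical sketch: create and sync a Service Principal source storage.

ASSUMPTIONS (not confirmed by this page): the endpoint path, the /sync action,
and every JSON field name below. Verify them in your API reference.
"""
import json
import urllib.request

# Hypothetical endpoint path; replace it with the one from your API reference.
STORAGE_ENDPOINT = "/api/storages/azure_spi/"


def build_source_payload(project_id, title, account_name, container, tenant_id,
                         client_id, client_secret, prefix="", regex_filter=".*",
                         files_mode=True, presign=True, presign_ttl_minutes=15):
    """Map the UI fields to a JSON body (field names are assumptions)."""
    return {
        "project": project_id,
        "title": title,                  # Integration Name
        "account_name": account_name,    # Storage Name (account name, not URL)
        "container": container,          # Container Name
        "prefix": prefix,                # Container Prefix
        "tenant_id": tenant_id,
        "client_id": client_id,
        "client_secret": client_secret,
        "regex_filter": regex_filter,    # File Filter Regex
        "use_blob_urls": files_mode,     # ON = Files, OFF = Tasks
        "presign": presign,              # pre-signed URLs (True) or proxy (False)
        "presign_ttl": presign_ttl_minutes,  # example value, in minutes
    }


def _post(base_url, path, token, body=None):
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body or {}).encode("utf-8"),
        headers={"Authorization": f"Token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_and_sync(base_url, token, payload):
    """Create the storage, then trigger a sync (both endpoints hypothetical)."""
    storage = _post(base_url, STORAGE_ENDPOINT, token, payload)
    return _post(base_url, f"{STORAGE_ENDPOINT}{storage['id']}/sync", token)
```

Keeping the payload builder separate from the HTTP calls makes it easy to adjust the field names once you have confirmed the real schema.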

To assign the role from step 4 of the Service Principal setup in more detail, navigate to your Azure Storage Account:

- Go to **Access control (IAM)**
- Click **Add > Add role assignment**
- Select the **Storage Blob Data Contributor** role
- In the **Members** tab, select **User, group, or service principal**
- Search for and select your App Registration
- Click **Review + assign**

#### Set up connection in the Label Studio UI

In the Label Studio UI, do the following to set up the connection:

1. Open Label Studio in your web browser.
2. For a specific project, open **Settings > Cloud Storage**.
3. Click **Add Source Storage**.
4. In the dialog box that appears, select **Azure Blob Storage with Service Principal** as the storage type.
5. In the **Integration Name** field, type a name for the storage to appear in the Label Studio UI.
6. In the **Storage Name** field, specify the name of the Azure Storage Account.
7. In the **Container Name** field, specify the name of the Azure Blob container and, if relevant, use **Container Prefix** to target a folder inside the container.
8. Configure the Service Principal authentication:
    - In the **Tenant ID** field, specify the Directory (tenant) ID from your App Registration.
    - In the **Client ID** field, specify the Application (client) ID from your App Registration.
    - In the **Client Secret** field, specify the client secret value you created.
9. Adjust the remaining optional parameters:
    - In the **File Filter Regex** field, specify a regular expression to filter container objects. Use `.*` to collect all objects.
    - In the **Import method** dropdown, choose how to import your data:
        - **Files**: automatically creates a task for each storage object (e.g., JPG, MP3, TXT). Use this if your container holds media files such as JPG, MP3, or similar file types.
        - **Tasks**: treats each JSON, JSONL, or Parquet file as a task definition (one or more tasks per file). Use this if your container holds task definition files, for example multiple JSON files with one task per file.
    - In the **Use pre-signed URLs (On) / Proxy through Label Studio (Off)** toggle, choose how media is loaded:
        - **ON** (pre-signed URLs): media bypasses the platform, and users' browsers read data directly from storage.
        - **OFF** (proxy): the platform serves media through its own backend.
    - Set the **Expire pre-signed URLs (minutes)** counter to control how long pre-signed URLs remain valid.
10. Click **Add Storage**.

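To see how **File Filter Regex** and the import method interact, here is a small, self-contained Python sketch. It assumes the regex is matched with `re.match`, that is, anchored at the start of the full blob name including any prefix, which is why a leading `.*` is common. The Files-mode task shape (`{"data": {"image": ...}}`) is an illustrative assumption, the helper names are hypothetical, and Parquet parsing is omitted.

```python
"""Sketch: File Filter Regex plus the Files / Tasks import methods.

Assumptions: the filter is applied with re.match to the full blob name, and
the Files-mode task shape is illustrative only. Helper names are hypothetical.
"""
import json
import re


def select_blobs(blob_names, regex_filter=".*"):
    """Keep blob names matching the filter; an empty filter keeps everything."""
    pattern = re.compile(regex_filter) if regex_filter else None
    return [name for name in blob_names
            if pattern is None or pattern.match(name)]


def tasks_from_files(blob_names, data_key="image"):
    """Files mode: one task per storage object."""
    return [{"data": {data_key: name}} for name in blob_names]


def tasks_from_json(text):
    """Tasks mode: a JSON file holds one task (object) or several (array)."""
    parsed = json.loads(text)
    return parsed if isinstance(parsed, list) else [parsed]


def tasks_from_jsonl(text):
    """Tasks mode: one JSON task per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

With `names = ["images/a.jpg", "images/b.png", "images/notes.txt"]`, `select_blobs(names, r".*\.(jpg|png)$")` keeps the two image blobs, while `select_blobs(names, r"a\.jpg")` keeps none under the `re.match` assumption, because the pattern must match from the first character of the name.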

After adding the storage, click **Sync** to collect tasks from the container, or make an API call to sync the import storage.

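Because a Service Principal has no account key, pre-signed URLs for it can be issued as *user delegation* SAS tokens, signed with a key that Azure hands to the authenticated identity. The sketch below generates one read-only URL whose lifetime corresponds to **Expire pre-signed URLs (minutes)**. It is an assumption that Label Studio works this way internally; the helper names are hypothetical, and the Azure calls come from `azure-identity` and `azure-storage-blob`.

```python
"""Sketch: a read-only pre-signed (user delegation SAS) URL for one blob.

Assumption: this mirrors the idea behind "Use pre-signed URLs"; it is not
Label Studio's implementation. Helper names are hypothetical.
"""
from datetime import datetime, timedelta, timezone
from urllib.parse import quote


def sas_window(expire_minutes, now=None):
    """Return (start, expiry); start is backdated 5 minutes for clock skew."""
    if expire_minutes <= 0:
        raise ValueError("expire_minutes must be positive")
    now = now or datetime.now(timezone.utc)
    return now - timedelta(minutes=5), now + timedelta(minutes=expire_minutes)


def presigned_blob_url(storage_account, container, blob_name, tenant_id,
                       client_id, client_secret, expire_minutes=15):
    """Sign a read-only URL for one blob with a user delegation key."""
    from azure.identity import ClientSecretCredential
    from azure.storage.blob import (BlobSasPermissions, BlobServiceClient,
                                    generate_blob_sas)

    account_url = f"https://{storage_account}.blob.core.windows.net"
    service = BlobServiceClient(
        account_url=account_url,
        credential=ClientSecretCredential(tenant_id, client_id, client_secret),
    )
    start, expiry = sas_window(expire_minutes)
    # Needs the generateUserDelegationKey action (included in roles such as
    # Storage Blob Data Contributor).
    delegation_key = service.get_user_delegation_key(start, expiry)
    sas = generate_blob_sas(
        account_name=storage_account,
        container_name=container,
        blob_name=blob_name,
        user_delegation_key=delegation_key,
        permission=BlobSasPermissions(read=True),
        start=start,
        expiry=expiry,
    )
    return f"{account_url}/{container}/{quote(blob_name)}?{sas}"
```

A short expiry limits how long a leaked URL stays usable, at the cost of more frequent re-signing; the browser must also pass the CORS rules configured on the account.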
#### Create a target storage connection in the Label Studio UI

Repeat the steps from the previous section, but click **Add Target Storage** instead. Use the same fields:

- **Storage Name**, **Container Name/Prefix**, **Tenant ID**, **Client ID**, **Client Secret**.

After adding the storage, click **Sync** (or use the API) to push exports.

#### Required permissions

- Source: `Microsoft.Storage/storageAccounts/blobServices/containers/read`, `.../containers/blobs/read`
- Target: `.../containers/blobs/read`, `.../containers/blobs/write`, `.../containers/read`, `.../containers/blobs/delete` (optional)

These are all included in the built-in **Storage Blob Data Contributor** role.

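These permissions can be probed directly. The sketch reads from the container (source) and, optionally, writes and then deletes a throwaway blob (target). It assumes `container_client` is an `azure.storage.blob.ContainerClient` authenticated as the Service Principal; `probe_blob_name` and `check_permissions` are hypothetical helpers, and the probe naming scheme is an arbitrary choice.

```python
"""Sketch: probe source (read) and target (write/delete) permissions.

Assumptions: azure-storage-blob is installed and container_client is a
ContainerClient authenticated as the Service Principal. Names are hypothetical.
"""
import uuid


def probe_blob_name(prefix=""):
    """A unique, clearly labelled blob name for the write/delete check."""
    base = prefix.strip("/")
    name = f"labelstudio-permission-probe-{uuid.uuid4().hex}.txt"
    return f"{base}/{name}" if base else name


def check_permissions(container_client, prefix="", check_target=False):
    """Return a dict such as {"read": True, "write": True, "delete": True}."""
    from azure.core.exceptions import HttpResponseError

    results = {}
    try:
        # Listing blobs needs read access: required for source storage.
        next(iter(container_client.list_blobs(name_starts_with=prefix or None)),
             None)
        results["read"] = True
    except HttpResponseError:
        results["read"] = False
    if not check_target:
        return results
    name = probe_blob_name(prefix)
    try:
        # Write access: required for target storage.
        container_client.upload_blob(name, b"probe", overwrite=True)
        results["write"] = True
    except HttpResponseError:
        results["write"] = False
        return results
    try:
        # Delete access: optional for target storage; also removes the probe.
        container_client.delete_blob(name)
        results["delete"] = True
    except HttpResponseError:
        results["delete"] = False
    return results
```

A result such as `{"read": True, "write": False}` for a target storage points to a missing or not-yet-propagated role assignment.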

#### Validate and troubleshoot

Label Studio checks the connection when you add the storage. If the check fails, verify the following:

- **Tenant ID**, **Client ID**, and **Client Secret** values: no extra spaces, and the secret has not expired.
- Storage account and container names: check the spelling; Azure requires both to be lowercase.
- Role assignment: the App Registration has **Storage Blob Data Contributor** on the Storage Account.
- CORS is configured if you use pre-signed URLs. When testing, switch to proxy mode to rule out CORS issues.

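Several items in this checklist can be caught before the storage is saved. The sketch checks for stray whitespace, GUID-shaped tenant and client IDs, and Azure's naming rules for storage accounts (3-24 lowercase letters and digits) and containers (3-63 lowercase letters, digits, and single hyphens, starting and ending with a letter or digit). It does not contact Azure, so it cannot detect an expired secret or a missing role assignment; `find_problems` is a hypothetical helper.

```python
"""Sketch: offline checks for common Service Principal storage mistakes.

Does not contact Azure. find_problems is a hypothetical helper.
"""
import re
import uuid

_ACCOUNT_RE = re.compile(r"[a-z0-9]{3,24}")
# Starts and ends with a letter or digit, no consecutive hyphens, 3-63 chars.
_CONTAINER_RE = re.compile(r"[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]")


def _is_guid(value):
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def find_problems(tenant_id, client_id, client_secret, account, container):
    """Return human-readable problems; an empty list means no issues found."""
    problems = []
    for label, value in (("Tenant ID", tenant_id), ("Client ID", client_id),
                         ("Client Secret", client_secret)):
        if value != value.strip():
            problems.append(f"{label} has leading or trailing whitespace")
    for label, value in (("Tenant ID", tenant_id), ("Client ID", client_id)):
        if not _is_guid(value.strip()):
            problems.append(f"{label} is not a GUID")
    if not _ACCOUNT_RE.fullmatch(account):
        problems.append("Storage account name must be 3-24 lowercase letters or digits")
    if not _CONTAINER_RE.fullmatch(container):
        problems.append("Container name must be 3-63 lowercase letters, digits, or single hyphens")
    return problems
```
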
</div>

## Redis database

You can also store your tasks and annotations in a [Redis database](https://redis.io/). You must store the tasks and annotations in different databases. You might want to use a Redis database if you find that relying on a file-based cloud storage connection is slow for your datasets.

label_studio/io_storages/base_models.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -127,7 +127,10 @@ def info_set_in_progress(self):
 
     @property
     def time_in_progress(self):
-        return datetime.fromisoformat(self.meta['time_in_progress'])
+        if 'time_failure' not in self.meta:
+            return datetime.fromisoformat(self.meta['time_in_progress'])
+        else:
+            return datetime.fromisoformat(self.meta['time_failure'])
 
     def info_set_completed(self, last_sync_count, **kwargs):
         self.status = self.Status.COMPLETED
```
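The change to `base_models.py` makes `time_in_progress` return the `time_failure` timestamp whenever `meta` contains that key, and the `time_in_progress` timestamp otherwise. The stand-alone illustration below reproduces that property body; `SyncInfo` is a hypothetical minimal stand-in for the storage model, holding only the `meta` dict.

```python
"""Stand-alone illustration of the patched time_in_progress property.

SyncInfo is a hypothetical stand-in; only the property body comes from the diff.
"""
from datetime import datetime


class SyncInfo:
    def __init__(self, meta):
        self.meta = meta

    @property
    def time_in_progress(self):
        # After the patch, a recorded failure time takes precedence.
        if 'time_failure' not in self.meta:
            return datetime.fromisoformat(self.meta['time_in_progress'])
        else:
            return datetime.fromisoformat(self.meta['time_failure'])
```

Without a `time_failure` key the behavior is unchanged, so only storages whose `meta` records a failure are affected.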
