`docs/docs/ops/sources.md`: 66 additions, 0 deletions

@@ -148,6 +148,72 @@ The spec takes the following fields:
### Schema

The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:

* `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
* `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.
## AzureBlob

The `AzureBlob` source imports files from Azure Blob Storage.

### Setup for Azure Blob Storage

#### Get Started

If you are new to Azure Blob Storage, see the [quickstart](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-portal).
To get set up, take the following steps:

* Create a storage account in the [Azure Portal](https://portal.azure.com/).
* Create a container in the storage account.
* Upload your files to the container.
* Grant the user, identity, or service principal (depending on your authentication method, see below) access to the container. At minimum, the **Storage Blob Data Reader** role is needed. See [this doc](https://learn.microsoft.com/en-us/azure/storage/blobs/authorize-data-operations-portal) for reference.
#### Authentication

We use Azure’s **Default Credential** system (`DefaultAzureCredential`) for secure and flexible authentication.
This allows you to connect to Azure services without putting any secrets in the code or flow spec.
It tries a chain of authentication methods and uses the first one that works in your environment:

* On your local machine: uses your Azure CLI login (`az login`) or environment variables.

  ```sh
  az login
  # Optional: Set a default subscription if you have more than one
  az account set --subscription "<YOUR_SUBSCRIPTION_NAME_OR_ID>"
  ```
* In Azure (VM, App Service, AKS, etc.): uses the resource’s Managed Identity.
* In automated environments: supports Service Principals via environment variables:
  * `AZURE_CLIENT_ID`
  * `AZURE_TENANT_ID`
  * `AZURE_CLIENT_SECRET`

You can refer to [this doc](https://learn.microsoft.com/en-us/azure/developer/python/sdk/authentication/overview) for more details.
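
As a rough pre-flight check for the service principal route, the sketch below reports which of the three environment variables listed above are unset. It is only an illustration: the helper name `missing_service_principal_vars` is hypothetical, it calls no Azure SDK, and it does not reproduce how `DefaultAzureCredential` itself resolves credentials.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the list above.
SERVICE_PRINCIPAL_VARS = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_SECRET")


def missing_service_principal_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the service principal variables that are unset or empty.

    Hypothetical helper for illustration; it only inspects the environment.
    """
    env = os.environ if env is None else env
    return [name for name in SERVICE_PRINCIPAL_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_service_principal_vars()
    if missing:
        print("Not set:", ", ".join(missing))
    else:
        print("All service principal variables are set.")
```

A non-empty result is not necessarily a problem: when you rely on `az login` or a Managed Identity, all three variables can stay unset.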
### Spec

The spec takes the following fields:

* `account_name` (`str`): the name of the storage account.
* `container_name` (`str`): the name of the container.
* `prefix` (`str`, optional): if provided, only files whose path starts with this prefix will be imported.
* `binary` (`bool`, optional): whether to read files as binary (instead of text).
* `included_patterns` (`list[str]`, optional): a list of glob patterns to include files, e.g. `["*.txt", "docs/**/*.md"]`.
  If not specified, all files will be included.
* `excluded_patterns` (`list[str]`, optional): a list of glob patterns to exclude files, e.g. `["*.tmp", "**/*.log"]`.
  Any file or directory matching these patterns will be excluded, even if it matches `included_patterns`.
  If not specified, no files will be excluded.

:::info

`included_patterns` and `excluded_patterns` use Unix-style glob syntax. See [globset syntax](https://docs.rs/globset/latest/globset/index.html#syntax) for details.

:::
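
The way `prefix`, `included_patterns` and `excluded_patterns` combine can be sketched in a few lines of Python. This is an approximation, not the source's actual implementation: the function name `is_imported` is hypothetical, and it uses the standard library's `fnmatch` rather than globset, so some patterns behave differently (for example, globset lets a leading `**/` match zero directories, while `fnmatch` requires a `/`). It also assumes that patterns are matched against the full file path and that only files, not directories, are tested.

```python
from fnmatch import fnmatchcase
from typing import Optional, Sequence


def is_imported(
    filename: str,
    prefix: Optional[str] = None,
    included_patterns: Optional[Sequence[str]] = None,
    excluded_patterns: Optional[Sequence[str]] = None,
) -> bool:
    """Approximate the documented filtering rules for a single file path."""
    # `prefix`: only paths starting with the prefix are considered.
    if prefix is not None and not filename.startswith(prefix):
        return False
    # `excluded_patterns` wins even when `included_patterns` also matches.
    if excluded_patterns and any(fnmatchcase(filename, p) for p in excluded_patterns):
        return False
    # No `included_patterns` means every remaining file is included.
    if included_patterns is None:
        return True
    return any(fnmatchcase(filename, p) for p in included_patterns)


included = ["*.txt", "docs/**/*.md"]
excluded = ["*.tmp", "**/*.log"]
for name in ["notes.txt", "docs/guide/intro.md", "logs/app.log", "image.png"]:
    print(name, is_imported(name, included_patterns=included, excluded_patterns=excluded))
```

With the example patterns from the field list, `notes.txt` and `docs/guide/intro.md` pass the filter, while `logs/app.log` (excluded) and `image.png` (not included) do not.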
### Schema

The output is a [*KTable*](/docs/core/data_types#ktable) with the following sub fields:

* `filename` (*Str*, key): the filename of the file, including the path, relative to the root directory, e.g. `"dir1/file1.md"`.
* `content` (*Str* if `binary` is `False`, otherwise *Bytes*): the content of the file.
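
To make the row shape concrete, here is a small sketch that builds rows keyed by `filename`, with `content` as `str` or `bytes` depending on `binary`. The helper `to_row` and the sample data are hypothetical, and two details are assumptions not stated in the text above: that text mode decodes as UTF-8, and that `binary` defaults to `False`.

```python
from typing import Union


def to_row(filename: str, raw: bytes, binary: bool = False) -> dict[str, Union[str, bytes]]:
    """Build one row shaped like the schema above.

    Hypothetical helper: UTF-8 decoding in text mode is an assumption.
    """
    content: Union[str, bytes] = raw if binary else raw.decode("utf-8")
    return {"filename": filename, "content": content}


# Rows are keyed by `filename`, mirroring the KTable key.
blobs = {"dir1/file1.md": b"# Title\n", "dir1/logo.png": b"\x89PNG"}
text_rows = {name: to_row(name, data) for name, data in blobs.items() if name.endswith(".md")}
binary_rows = {name: to_row(name, data, binary=True) for name, data in blobs.items()}
```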