
Commit 317672f

Merge pull request #2624 from MicrosoftDocs/main638832709403706320sync_temp
For protected branch, push strategy should use PR and merge to target branch method to work around git push error
2 parents be39486 + 952f8aa commit 317672f

2 files changed: +98 -46 lines changed


data-explorer/kusto/includes/python-plugin-adx.md

Lines changed: 14 additions & 39 deletions
@@ -1,13 +1,13 @@
---
ms.topic: include
-ms.date: 09/17/2024
+ms.date: 05/19/2025
---

The Python plugin runs a user-defined function (UDF) using a Python script. The Python script gets tabular data as its input, and produces tabular output. The plugin's runtime is hosted in [sandboxes](../concepts/sandboxes.md), running on the cluster's nodes.

## Syntax

-*T* `|` `evaluate` [`hint.distribution` `=` (`single` | `per_node`)] [`hint.remote` `=` (`auto` | `local`)] `python(`*output_schema*`,` *script* [`,` *script_parameters*] [`,` *external_artifacts*][`,` *spill_to_disk*]`)`
+*T* `|` `evaluate` [`hint.distribution` `=` (`single` | `per_node`)] [`hint.remote` `=` (`auto` | `local`)] `python(`*output_schema*`,` *script* [`,` *script_parameters*] [`,` *external_artifacts*] [`,` *spill_to_disk*]`)`

[!INCLUDE [syntax-conventions-note](syntax-conventions-note.md)]

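As a minimal sketch of this syntax (the hints and the optional arguments are omitted), a script that extends the input with a computed column:

~~~kusto
range x from 1 to 3 step 1
| evaluate python(typeof(*, y:long), ```if 1:
    # 'df' is the reserved input DataFrame, 'result' the reserved output
    result = df
    result['y'] = df['x'] * 2
```)
~~~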
@@ -49,11 +49,11 @@ To see the list of packages for the different Python images, see [Python package
## Use ingestion from query and update policy

* Use the plugin in queries that are:
-    * Defined as part of an [update policy](../management/update-policy.md), whose source table is ingested to using *non-streaming* ingestion.
+    * Defined as part of an [update policy](../management/update-policy.md), whose source table is ingested by [queued ingestion](/azure/data-explorer/ingest-data-overview#continuous-data-ingestion).
    * Run as part of a command that [ingests from a query](../management/data-ingestion/ingest-from-query.md), such as `.set-or-append`.
* You can't use the plugin in a query that is defined as part of an update policy, whose source table is ingested using [streaming ingestion](/azure/data-explorer/ingest-data-streaming).

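For context, the `.set-or-append` pattern from the list above could look like the following minimal sketch; the `PyResults` target table and the doubling script are hypothetical:

~~~kusto
.set-or-append PyResults <|
    // creates PyResults if it doesn't exist; appends otherwise
    range x from 1 to 10 step 1
    | evaluate python(typeof(*, y:long), ```if 1:
        # 'result' is the reserved output DataFrame
        result = df
        result['y'] = df['x'] * 2
    ```)
~~~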
-## Examples
+## Example

~~~kusto
range x from 1 to 360 step 1
@@ -74,31 +74,6 @@ result["fx"] = g * np.sin(df["x"]/n*2*np.pi*f)

:::image type="content" source="../query/media/plugin/sine-demo.png" alt-text="Screenshot of sine demo showing query result." border="false":::

-~~~kusto
-print "This is an example for using 'external_artifacts'"
-| evaluate python(
-    typeof(File:string, Size:string), ```if 1:
-    import os
-    result = pd.DataFrame(columns=['File','Size'])
-    sizes = []
-    path = '.\\Temp'
-    files = os.listdir(path)
-    result['File']=files
-    for file in files:
-        sizes.append(os.path.getsize(path + '\\' + file))
-    result['Size'] = sizes
-    ```,
-    external_artifacts =
-        dynamic({"this_is_my_first_file":"https://kustoscriptsamples.blob.core.windows.net/samples/R/sample_script.r",
-                "this_is_a_script":"https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py"})
-)
-~~~
-
-| File | Size |
-|--------|------|
-| this_is_a_script | 120 |
-| this_is_my_first_file | 105 |
-
## Performance tips

* Reduce the plugin's input dataset to the minimum amount required (columns/rows).
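A minimal sketch of that tip, assuming a hypothetical `MyTable` with `Timestamp` and `x` columns, so that only the needed rows and column reach the sandbox:

~~~kusto
MyTable
| where Timestamp > ago(1d)   // row-level filter before the plugin
| project x                   // keep only the column the script needs
| evaluate python(typeof(*, y:long), ```if 1:
    result = df
    result['y'] = df['x'] * 2
```)
~~~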
@@ -116,9 +91,9 @@ print "This is an example for using 'external_artifacts'"
` ``` `
` python code`
` ``` `
-* Use the [`externaldata` operator](../query/externaldata-operator.md) to obtain the content of a script that you've stored in an external location, such as Azure Blob storage.
+* Use the [externaldata operator](../query/externaldata-operator.md) to obtain the content of a script that you've stored in an external location, such as Azure Blob storage.

-### Example
+### Example reading the Python script external data

```kusto
let script =
@@ -145,7 +120,7 @@ The URLs referenced by the external artifacts property must be:
> [!NOTE]
> When authenticating external artifacts using Managed Identities, the `SandboxArtifacts` usage must be defined on the cluster level [managed identity policy](../management/managed-identity-policy.md).

-The artifacts are made available for the script to consume from a local temporary directory, `.\Temp`. The names provided in the property bag are used as the local file names. See [Examples](#examples).
+The artifacts are made available for the script to be read from a local temporary directory, `.\Temp`. The names provided in the property bag are used as the local file names. See [Example](#example-using-external-artifacts).

For information regarding referencing external packages, see [Install packages for the Python plugin](#install-packages-for-the-python-plugin).

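For illustration, an inline script can open an artifact by its property-bag name under `.\Temp`; this minimal sketch reuses the sample-script URL from the removed example above, and the `Content` output column is illustrative:

~~~kusto
print "Read an external artifact"
| evaluate python(
    typeof(Content:string), ```if 1:
    import os
    # artifacts are downloaded to the sandbox's local temp directory, .\Temp,
    # under the names given in the property bag
    path = os.path.join('.', 'Temp', 'this_is_a_script')
    with open(path) as f:
        content = f.read()
    result = pd.DataFrame({'Content': [content]})
    ```,
    external_artifacts=dynamic({"this_is_a_script": "https://kustoscriptsamples.blob.core.windows.net/samples/python/sample_script.py"})
)
~~~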
@@ -188,24 +163,24 @@ download the package and its dependencies.
        pip wheel [-w download-dir] package-name.
        ```

-1. Create a ZIP file that contains the required package and its dependencies.
+1. Create a zip file containing the required package and its dependencies.

    * For private packages, zip the folder of the package and the folders of its dependencies.
    * For public packages, zip the files that were downloaded in the previous step.

    > [!NOTE]
    >
-    > * Make sure to download the package that is compatible to the Python engine and the platform of the sandbox runtime (currently 3.6.5 on Windows)
+    > * Make sure to download the package that is compatible to the Python engine and the platform of the sandbox runtime (currently 3.10.8 or 3.11.7 on Windows)
    > * Make sure to zip the `.whl` files themselves, and not their parent folder.
    > * You can skip `.whl` files for packages that already exist with the same version in the base sandbox image.

-1. Upload the zipped file to a blob in the artifacts location (from step 1).
+1. Upload the zip file to a blob in the artifacts location (from step 1 of the prerequisites).

1. Call the `python` plugin.
-    * Specify the `external_artifacts` parameter with a property bag of name and reference to the ZIP file (the blob's URL, including a SAS token).
-    * In your inline python code, import `Zipackage` from `sandbox_utils` and call its `install()` method with the name of the ZIP file.
+    * Specify the `external_artifacts` parameter with a property bag of local name and blob URL of the zip file (including a SAS token).
+    * In your inline python code, import `Zipackage` from `sandbox_utils` and call its `install()` method with the local name of the ZIP file.

-### Example
+### Example using external artifacts

Install the [Faker](https://pypi.org/project/Faker/) package that generates fake data.

@@ -221,7 +196,7 @@ range ID from 1 to 3 step 1
    for i in range(df.shape[0]):
        result.loc[i, "Name"] = fake.name()
    ```,
-    external_artifacts=bag_pack('faker.zip', 'https://artifacts.blob.core.windows.net/Faker.zip?*** REPLACE WITH YOUR SAS TOKEN ***'))
+    external_artifacts=bag_pack('faker.zip', 'https://artifacts.blob.core.windows.net/Faker.zip;impersonate'))
~~~

| ID | Name |

data-explorer/kusto/includes/python-plugin-fabric.md

Lines changed: 84 additions & 7 deletions
@@ -1,13 +1,14 @@
---
ms.topic: include
-ms.date: 08/11/2024
+ms.date: 05/19/2025
---

The Python plugin runs a user-defined function (UDF) using a Python script. The Python script gets tabular data as its input, and produces tabular output.

## Syntax

-*T* `|` `evaluate` [`hint.distribution` `=` (`single` | `per_node`)] [`hint.remote` `=` (`auto` | `local`)] `python(`*output_schema*`,` *script* [`,` *script_parameters*] [`,` *spill_to_disk*]`)`
+*T* `|` `evaluate` [`hint.distribution` `=` (`single` | `per_node`)] [`hint.remote` `=` (`auto` | `local`)] `python(`*output_schema*`,` *script* [`,` *script_parameters*] [`,` *external_artifacts*] [`,` *spill_to_disk*]`)`
+

[!INCLUDE [syntax-conventions-note](syntax-conventions-note.md)]

@@ -18,8 +19,9 @@ The Python plugin runs a user-defined function (UDF) using a Python script. The
|*output_schema*| `string` | :heavy_check_mark:|A `type` literal that defines the output schema of the tabular data, returned by the Python code. The format is: `typeof(`*ColumnName*`:` *ColumnType*[, ...]`)`. For example, `typeof(col1:string, col2:long)`. To extend the input schema, use the following syntax: `typeof(*, col1:string, col2:long)`.|
|*script*| `string` | :heavy_check_mark:|The valid Python script to execute. To generate multi-line strings, see [Usage tips](#usage-tips).|
|*script_parameters*| `dynamic` ||A property bag of name value pairs to be passed to the Python script as the reserved `kargs` dictionary. For more information, see [Reserved Python variables](#reserved-python-variables).|
-|*hint.distribution*| `string` ||A hint for the plugin's execution to be distributed across multiple cluster nodes. The default value is `single`. `single` means a single instance of the script will run over the entire query data. `per_node` means that if the query before the Python block is distributed, an instance of the script will run on each node, on the data that it contains.|
+|*hint.distribution*| `string` ||A hint for the plugin's execution to be distributed across multiple sandboxes. The default value is `single`. `single` means a single instance of the script will run over the entire query data in a single sandbox. `per_node` means that if the query before the Python block is distributed to partitions, each partition will run in its own sandbox in parallel.|
|*hint.remote*| `string` ||This hint is only relevant for cross cluster queries. The default value is `auto`. `auto` means the server decides automatically in which cluster the Python code is executed. Setting the value to `local` forces executing the Python code on the local cluster. Use it in case the Python plugin is disabled on the remote cluster.|
+|*external_artifacts*| `dynamic` ||A property bag of name and URL pairs for artifacts that are accessible from OneLake storage. See more in [Using external artifacts](#using-external-artifacts).|
|*spill_to_disk*| `bool` ||Specifies an alternative method for serializing the input table to the Python sandbox. For serializing big tables set it to `true` to speed up the serialization and significantly reduce the sandbox memory consumption. Default is `true`.|

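For illustration, a value passed through *script_parameters* reaches the script in the reserved `kargs` dictionary; in this minimal sketch the `gain` parameter is hypothetical:

~~~kusto
range x from 1 to 5 step 1
| evaluate python(typeof(*, y:long), ```if 1:
    # kargs holds the script_parameters property bag
    result = df
    result['y'] = df['x'] * kargs['gain']
```,
    bag_pack('gain', 3))
~~~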
## Reserved Python variables
@@ -46,11 +48,11 @@ To see the list of packages for the different Python images, see [Python package
## Use ingestion from query and update policy

* Use the plugin in queries that are:
-    * Defined as part of an [update policy](../management/update-policy.md), whose source table is ingested to using *non-streaming* ingestion.
+    * Defined as part of an [update policy](../management/update-policy.md), whose source table is ingested by [queued ingestion](/azure/data-explorer/ingest-data-overview#continuous-data-ingestion).
    * Run as part of a command that [ingests from a query](../management/data-ingestion/ingest-from-query.md), such as `.set-or-append`.
* You can't use the plugin in a query that is defined as part of an update policy, whose source table is ingested using [streaming ingestion](/azure/data-explorer/ingest-data-streaming).

-## Examples
+## Example

~~~kusto
range x from 1 to 360 step 1
@@ -88,9 +90,9 @@ result["fx"] = g * np.sin(df["x"]/n*2*np.pi*f)
` ``` `
` python code`
` ``` `
-* Use the [`externaldata` operator](../query/externaldata-operator.md) to obtain the content of a script that you've stored in an external location, such as Azure Blob storage.
+* Use the [externaldata operator](../query/externaldata-operator.md) to obtain the content of a script that you've stored in an external location, such as Azure Blob storage.

-### Example
+### Example reading the Python script external data

```kusto
let script =
@@ -105,6 +107,81 @@ result["fx"] = g * np.sin(df["x"]/n*2*np.pi*f)
| render linechart
```

+## Using External Artifacts
+
+External artifacts from OneLake storage can be made available for the script and used at runtime.
+
+The artifacts are made available for the script to be read from a local temporary directory, `.\Temp`. The names provided in the property bag are used as the local file names. See [Example](#example-using-external-artifacts).
+
+For information regarding referencing external packages, see [Install packages for the Python plugin](#install-packages-for-the-python-plugin).
+
+### Refreshing external artifact cache
+
+External artifact files utilized in queries are cached on your cluster. If you make updates to your files in cloud storage and require immediate synchronization with your cluster, you can use the [.clear cluster cache external-artifacts command](../management/clear-external-artifacts-cache-command.md). This command clears the cached files and ensures that subsequent queries run with the latest version of the artifacts.
+
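For reference, the linked command takes the artifact URIs to evict from the cache; a minimal sketch using the OneLake URL from the example below:

```kusto
.clear cluster cache external-artifacts ("https://msit-onelake.dfs.fabric.microsoft.com/MSIT_DEMO_WS/MSIT_DEMO_LH.Lakehouse/Files/Faker.zip")
```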
+## Install packages for the Python plugin
+
+Install packages as follows:
+
+### Prerequisite
+
+* Create a lakehouse to host the packages, preferably in the same workspace as your eventhouse.
+
+### Install packages
+
+1. For public packages in [PyPi](https://pypi.org/) or other channels,
+   download the package and its dependencies.
+
+    * From a cmd window in your local Windows Python environment, run:
+
+        ```python
+        pip wheel [-w download-dir] package-name.
+        ```
+
+1. Create a zip file containing the required package and its dependencies.
+
+    * For private packages, zip the folder of the package and the folders of its dependencies.
+    * For public packages, zip the files that were downloaded in the previous step.
+
+    > [!NOTE]
+    >
+    > * Make sure to download the package that is compatible to the Python engine and the platform of the sandbox runtime (currently 3.10.8 or 3.11.7 on Windows)
+    > * Make sure to zip the `.whl` files themselves, and not their parent folder.
+    > * You can skip `.whl` files for packages that already exist with the same version in the base sandbox image.
+
+1. Upload the zip file to the lakehouse.
+
+1. Copy the OneLake URL (from the zipped file's properties)
+
+1. Call the `python` plugin.
+    * Specify the `external_artifacts` parameter with a property bag of local name and OneLake URL of the zip file.
+    * In your inline python code, import `Zipackage` from `sandbox_utils` and call its `install()` method with the name of the ZIP file.
+
+### Example using external artifacts
+
+Install the [Faker](https://pypi.org/project/Faker/) package that generates fake data.
+
+~~~kusto
+range ID from 1 to 3 step 1
+| extend Name=''
+| evaluate python(typeof(*), ```if 1:
+    from sandbox_utils import Zipackage
+    Zipackage.install("Faker.zip")
+    from faker import Faker
+    fake = Faker()
+    result = df
+    for i in range(df.shape[0]):
+        result.loc[i, "Name"] = fake.name()
+    ```,
+    external_artifacts=bag_pack('faker.zip', 'https://msit-onelake.dfs.fabric.microsoft.com/MSIT_DEMO_WS/MSIT_DEMO_LH.Lakehouse/Files/Faker.zip;impersonate'))
+~~~
+
+| ID | Name |
+|----|-------|
+| 1| Gary Tapia |
+| 2| Emma Evans |
+| 3| Ashley Bowen |
+
## Related content

For more examples of UDF functions that use the Python plugin, see the [Functions library](../functions-library/functions-library.md).
