articles/synapse-analytics/spark/synapse-file-mount-api.md
To mount the container called `mycontainer`, `mssparkutils` first needs to check whether you have permission to access the container. Currently, Azure Synapse Analytics supports three authentication methods for the trigger mount operation: `linkedService`, `accountKey`, and `sastoken`.
### Mount by using a linked service (recommended)
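After the linked service is configured, the mount call can look like the following sketch. The storage account, container, and linked-service names (`storegen2`, `mycontainer`, `mygen2account`) are placeholders for this example; `mssparkutils` is preinstalled in Synapse Spark notebooks, so the call itself is shown commented out here.

```python
# Inside a Synapse Spark notebook, mssparkutils is available without an import.
# Account, container, and linked-service names below are placeholders.
source = "abfss://mycontainer@storegen2.dfs.core.windows.net"
mount_point = "/test"

# Trigger the mount; access is authorized through the linked service:
# mssparkutils.fs.mount(source, mount_point, {"linkedService": "mygen2account"})
```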
> We don't recommend that you mount a root folder, no matter which authentication method you use.
Mount parameters:
- fileCacheTimeout: Blobs are cached in the local temp folder for 120 seconds by default. During this time, blobfuse doesn't check whether the file is up to date. You can set this parameter to change the default timeout. When multiple clients modify files at the same time, to avoid inconsistencies between local and remote files, we recommend shortening the cache time, or even setting it to 0 so that the latest files are always fetched from the server.
- timeout: The mount operation times out after 120 seconds by default. You can set this parameter to change the default timeout. When there are many executors, or when the mount times out, we recommend increasing the value.
- scope: The scope parameter specifies the scope of the mount. The default value is "job". If the scope is set to "job", the mount is visible only to the current cluster. If the scope is set to "workspace", the mount is visible to all notebooks in the current workspace, and the mount point is created automatically if it doesn't exist. Pass the same parameter to the unmount API to unmount the mount point. Workspace-level mounts are supported only for linked service authentication.
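Putting these parameters together, a tuned mount call might look like the following sketch. The linked-service name and the specific values are assumptions for illustration, not recommendations.

```python
# Placeholder values; tune these to your workload.
mount_options = {
    "linkedService": "mygen2account",  # assumed linked-service name
    "fileCacheTimeout": 0,             # always fetch the latest blob from the server
    "timeout": 300,                    # extra headroom when there are many executors
    "scope": "workspace",              # visible to every notebook in the workspace
}

# In a Synapse notebook:
# mssparkutils.fs.mount(
#     "abfss://mycontainer@storegen2.dfs.core.windows.net",
#     "/test",
#     mount_options,
# )
```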
### Mount via shared access signature token or account key
The main purpose of the mount operation is to let customers access the data stored in a remote storage account by using a local file system API. You can also access the data by using the `mssparkutils fs` API with a mounted path as a parameter. The path format used here is a little different.

Assume that you mounted the Data Lake Storage Gen2 container `mycontainer` to `/test` by using the mount API. When you access the data through a local file system API:

- For Spark versions less than or equal to 3.3, the path format is `/synfs/{jobId}/test/{filename}`.
- For Spark versions greater than or equal to 3.4, the path format is `/synfs/notebook/{jobId}/test/{filename}`.
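The two layouts can be captured in a small helper, purely to illustrate the difference; in practice, `mssparkutils.fs.getMountPath()` is the reliable way to resolve the path.

```python
def mounted_path(job_id: str, mount_point: str, filename: str,
                 spark_ge_34: bool = True) -> str:
    """Local file system path for a file under a job-scoped mount point."""
    # Spark >= 3.4 inserts a /notebook segment before the job ID.
    prefix = f"/synfs/notebook/{job_id}" if spark_ge_34 else f"/synfs/{job_id}"
    return f"{prefix}{mount_point}/{filename}"
```

For example, `mounted_path("49", "/test", "data.csv")` yields `/synfs/notebook/49/test/data.csv`, while passing `spark_ge_34=False` yields `/synfs/49/test/data.csv`.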
We recommend using `mssparkutils.fs.getMountPath()` to get the accurate path:
```python
path = mssparkutils.fs.getMountPath("/test")
```
> [!NOTE]
> When you mount the storage with `workspace` scope, the mount point is created under the `/synfs/workspace` folder, and you need to use `mssparkutils.fs.getMountPath("/test", "workspace")` to get the accurate path.
When you want to access the data by using the `mssparkutils fs` API, the path format is like this: `synfs:/notebook/{jobId}/test/{filename}`. In this case, `synfs` is used as the scheme, rather than as part of the mounted path. You can also use the local file system scheme to access the data, for example `file:/synfs/notebook/{jobId}/test/{filename}`.
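For a Spark 3.4 or later session, the three equivalent path forms look like this. The job ID and file name are example values; in a notebook, get the real job ID from `mssparkutils.env.getJobId()`.

```python
job_id = "49"          # example value; use mssparkutils.env.getJobId() in a notebook
filename = "data.csv"  # assumed file name

local_path = f"/synfs/notebook/{job_id}/test/{filename}"  # local file system API
synfs_uri = f"synfs:/notebook/{job_id}/test/{filename}"   # mssparkutils fs API
file_uri = f"file:{local_path}"                           # file scheme over the local path
```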
The following three examples show how to access a file with a mount point path by using `mssparkutils fs`.
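A sketch of such calls follows; the file names are assumptions, and `ls`, `head`, and `cp` are standard `mssparkutils.fs` operations, shown commented out because they only run inside Synapse.

```python
job_id = "49"  # example value; use mssparkutils.env.getJobId() in a notebook
base = f"synfs:/notebook/{job_id}/test"

# mssparkutils.fs.ls(base)                                        # list files under the mount
# mssparkutils.fs.head(f"{base}/myFile.txt")                      # preview a file (name assumed)
# mssparkutils.fs.cp(f"{base}/myFile.txt", f"{base}/myFile2.txt") # copy within the mount
```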
The `mssparkutils fs help` function doesn't yet include a description of the mount and unmount operations.
The unmount mechanism isn't automatic. When the application run finishes, you need to explicitly call the unmount API in your code to unmount the mount point and release the disk space. Otherwise, the mount point still exists in the node after the application run finishes.
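`mssparkutils` has no built-in scope guard for this, but a small hypothetical wrapper makes the explicit unmount hard to forget; any object with `mount`/`unmount` methods works, which also keeps the pattern testable outside Synapse.

```python
from contextlib import contextmanager

@contextmanager
def mounted(fs, source, mount_point, options):
    # `fs` is mssparkutils.fs in a Synapse notebook.
    fs.mount(source, mount_point, options)
    try:
        yield mount_point
    finally:
        fs.unmount(mount_point)  # always release the node's disk space
```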
Mounting a Data Lake Storage Gen1 storage account isn't currently supported.
## Known issues
In Spark 3.4, mount points might be unavailable when multiple active sessions run in parallel in the same cluster. You can mount with `workspace` scope to avoid this issue.
## Next steps
[Get started with Azure Synapse Analytics](../get-started.md)