Commit 123586a

Merge pull request #113006 from SharonZhang1/xueran0801

AddNecessaryNotes

2 parents 2348df9 + 34bcd29

File tree

1 file changed: +28 −5 lines changed

articles/synapse-analytics/spark/synapse-file-mount-api.md

Lines changed: 28 additions & 5 deletions
@@ -1,13 +1,13 @@
 ---
 title: Introduction to file APIs in Azure Synapse Analytics
 description: This tutorial describes how to use the file mount and file unmount APIs in Azure Synapse Analytics, for both Azure Data Lake Storage Gen2 and Azure Blob Storage.
-author: ruixinxu
+author: JeneZhang
 services: synapse-analytics
 ms.service: synapse-analytics
 ms.topic: reference
 ms.subservice: spark
 ms.date: 07/27/2022
-ms.author: ruxu
+ms.author: jingzh
 ms.reviewer: wiassaf
 ms.custom: subject-rbac-steps
 ---
@@ -80,8 +80,22 @@ mssparkutils.fs.mount(
 > [!NOTE]
 > You might need to import `mssparkutils` if it's not available:
 > ```python
-> From notebookutils import mssparkutils
-> ```
+> from notebookutils import mssparkutils
+> ```
+> Mount parameters:
+> - `fileCacheTimeout`: Blobs are cached in the local temp folder for 120 seconds by default. During this time, blobfuse doesn't check whether the file is up to date. Set this parameter to change the default timeout. When multiple clients modify files at the same time, we recommend shortening the cache time, or even setting it to 0, so that the latest files are always fetched from the server and local and remote copies stay consistent.
+> - `timeout`: The mount operation times out after 120 seconds by default. Set this parameter to change the default timeout. When there are many executors or when the mount times out, we recommend increasing the value.
+> - `scope`: Specifies the scope of the mount. The default value is "job". If the scope is set to "job", the mount is visible only to the current cluster. If the scope is set to "workspace", the mount is visible to all notebooks in the current workspace, and the mount point is created automatically if it doesn't exist. Pass the same parameter to the unmount API to unmount the mount point. Workspace-level mounts are supported only with linked service authentication.
+>
+> You can use these parameters like this:
+> ```python
+> mssparkutils.fs.mount(
+>     "abfss://mycontainer@<accountname>.dfs.core.windows.net",
+>     "/test",
+>     {"linkedService": "mygen2account", "fileCacheTimeout": 120, "timeout": 120}
+> )
+> ```
+>
 > We don't recommend that you mount a root folder, no matter which authentication method you use.

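To make the `scope` bullet above concrete, here is a minimal sketch of a workspace-scoped mount and its matching unmount. The linked service name `mygen2account` and the option values are assumptions, and the `mssparkutils` calls (shown as comments) only run inside a Synapse Spark session:

```python
# Options for a workspace-visible mount; "mygen2account" is a hypothetical linked service.
mount_options = {
    "linkedService": "mygen2account",
    "scope": "workspace",   # visible to all notebooks in the current workspace
    "fileCacheTimeout": 0,  # always fetch the latest file from the server
    "timeout": 240,         # allow extra time when there are many executors
}

# These calls work only inside a Synapse Spark session:
# mssparkutils.fs.mount("abfss://mycontainer@<accountname>.dfs.core.windows.net", "/test", mount_options)
# mssparkutils.fs.unmount("/test", {"scope": "workspace"})  # same scope parameter to unmount
```

Note that the unmount call passes the same `scope` value, as the note requires for workspace-level mounts.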
@@ -149,14 +163,20 @@ f.close()
 ```
 --->
 
-## Access files under the mount point by using the mssparktuils fs API
+## Access files under the mount point by using the mssparkutils fs API
 
 The main purpose of the mount operation is to let customers access the data stored in a remote storage account by using a local file system API. You can also access the data by using the `mssparkutils fs` API with a mounted path as a parameter. The path format used here is a little different.
 
 Assume that you mounted the Data Lake Storage Gen2 container `mycontainer` to `/test` by using the mount API. When you access the data by using a local file system API, the path format is like this:
 
 `/synfs/{jobId}/test/(unknown)`
 
+We recommend using `mssparkutils.fs.getMountPath()` to get the accurate path:
+
+```python
+path = mssparkutils.fs.getMountPath("/test") # equals /synfs/{jobId}/test
+```
+
 When you want to access the data by using the `mssparkutils fs` API, the path format is like this:
 
 `synfs:/{jobId}/test/(unknown)`
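The two path formats in this hunk differ only in their prefix. A tiny pure-Python sketch of how each is built (the helper is illustrative only, not part of `mssparkutils`; the job ID `49` and file name `myFile.csv` are just the article's examples):

```python
def mounted_paths(job_id: str, mount_point: str, file_name: str):
    """Build both path forms for a file under a Synapse mount point.

    Illustrative helper only; it is not part of mssparkutils.
    """
    # Local file system API path: /synfs/{jobId}{mountPoint}/{fileName}
    local_path = f"/synfs/{job_id}{mount_point}/{file_name}"
    # mssparkutils fs / Spark API path: synfs:/{jobId}{mountPoint}/{fileName}
    synfs_path = f"synfs:/{job_id}{mount_point}/{file_name}"
    return local_path, synfs_path

local_path, synfs_path = mounted_paths("49", "/test", "myFile.csv")
# local_path == "/synfs/49/test/myFile.csv"
# synfs_path == "synfs:/49/test/myFile.csv"
```

The `synfs:` form matches the `spark.read.load("synfs:/49/test/myFile.csv", ...)` example later in the diff.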
@@ -201,6 +221,9 @@ df = spark.read.load("synfs:/49/test/myFile.csv", format='csv')
 df.show()
 ```
 
+> [!NOTE]
+> When you mount the storage by using a linked service, you should always explicitly set the Spark linked service configuration before using the synfs schema to access the data. For details, see [ADLS Gen2 storage with linked services](./apache-spark-secure-credentials-with-tokenlibrary.md#adls-gen2-storage-without-linked-services).
+
 ### Read a file from a mounted Blob Storage account
 
 If you mounted a Blob Storage account and want to access it by using `mssparkutils` or the Spark API, you need to explicitly configure the SAS token via Spark configuration before you try to mount the container by using the mount API:
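As a sketch of that SAS setup: Spark looks up the SAS token under a Hadoop configuration key built from the container and account names. The names below are placeholders, and the Spark/mssparkutils calls are commented out because they require a Synapse session:

```python
# Hypothetical names; in practice the SAS token would come from a secret store
# (for example, Azure Key Vault via mssparkutils.credentials).
blob_container = "myblobcontainer"
blob_account = "<accountname>"

# Hadoop config key under which Spark looks up the SAS token for this container:
sas_conf_key = f"fs.azure.sas.{blob_container}.{blob_account}.blob.core.windows.net"

# Inside a Synapse Spark session you would set the token, then mount:
# spark.conf.set(sas_conf_key, blob_sas_token)
# mssparkutils.fs.mount(
#     f"wasbs://{blob_container}@{blob_account}.blob.core.windows.net",
#     "/test",
#     {"sasToken": blob_sas_token},
# )
```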
