Commit dce7f9c

Added linters for direct filesystem access in Python and SQL code (#2519)
## Changes
Our linters currently only detect DBFS. We need to detect all DFSAs (Direct File System Access).

### Linked issues
Progresses #2350

### Functionality
None

### Tests
- [x] added unit tests

---------

Co-authored-by: Eric Vergnaud <[email protected]>
1 parent abda708 commit dce7f9c

17 files changed: +441 -383 lines changed

CONTRIBUTING.md

Lines changed: 2 additions & 4 deletions
@@ -286,12 +286,10 @@ $ python tests/integration/source_code/message_codes.py
 cannot-autofix-table-reference
 catalog-api-in-shared-clusters
 changed-result-format-in-uc
-dbfs-read-from-sql-query
-dbfs-usage
+direct-filesystem-access
+direct-filesystem-access-in-sql-query
 default-format-changed-in-dbr8
 dependency-not-found
-direct-filesystem-access
-implicit-dbfs-usage
 jvm-access-in-shared-clusters
 legacy-context-in-shared-clusters
 not-supported

README.md

Lines changed: 17 additions & 35 deletions
@@ -64,12 +64,10 @@ See [contributing instructions](CONTRIBUTING.md) to help improve this project.
 * [`cannot-autofix-table-reference`](#cannot-autofix-table-reference)
 * [`catalog-api-in-shared-clusters`](#catalog-api-in-shared-clusters)
 * [`changed-result-format-in-uc`](#changed-result-format-in-uc)
-* [`dbfs-read-from-sql-query`](#dbfs-read-from-sql-query)
-* [`dbfs-usage`](#dbfs-usage)
+* [`direct-filesystem-access`](#direct-filesystem-access)
+* [`direct-filesystem-access-in-sql-query`](#direct-filesystem-access-in-sql-query)
 * [`default-format-changed-in-dbr8`](#default-format-changed-in-dbr8)
 * [`dependency-not-found`](#dependency-not-found)
-* [`direct-filesystem-access`](#direct-filesystem-access)
-* [`implicit-dbfs-usage`](#implicit-dbfs-usage)
 * [`jvm-access-in-shared-clusters`](#jvm-access-in-shared-clusters)
 * [`legacy-context-in-shared-clusters`](#legacy-context-in-shared-clusters)
 * [`not-supported`](#not-supported)
@@ -766,24 +764,32 @@ you need to make sure that `do_stuff_with_table` can handle the new format.
 
 [[back to top](#databricks-labs-ucx)]
 
-#### `dbfs-read-from-sql-query`
+#### `direct-filesystem-access-in-sql-query`
 
-DBFS access is not allowed in Unity Catalog, so if you have code like this:
+Direct filesystem access is deprecated in Unity Catalog.
+DBFS is no longer supported, so if you have code like this:
 
 ```python
-df = spark.sql("SELECT * FROM parquet.`/mnt/foo/path/to/file`")
+df = spark.sql("SELECT * FROM parquet.`/mnt/foo/path/to/parquet.file`")
 ```
 
 you need to change it to use UC tables.
 
 [[back to top](#databricks-labs-ucx)]
 
-#### `dbfs-usage`
+#### `direct-filesystem-access`
 
-DBFS does not work in Unity Catalog, so if you have code like this:
+Direct filesystem access is deprecated in Unity Catalog.
+DBFS is no longer supported, so if you have code like this:
 
 ```python
-display(spark.read.csv('/mnt/things/e/f/g'))
+display(spark.read.csv('/mnt/things/data.csv'))
+```
+
+or this:
+
+```python
+display(spark.read.csv('s3://bucket/folder/data.csv'))
 ```
 
 You need to change it to use UC tables or UC volumes.
@@ -798,31 +804,7 @@ means an error in the user code.
 
 [[back to top](#databricks-labs-ucx)]
 
-#### `direct-filesystem-access`
-
-It's not allowed to access the filesystem directly in Unity Catalog, so if you have code like this:
-
-```python
-spark.read.csv("s3://bucket/path")
-```
-
-you need to change it to use UC tables or UC volumes.
-
-[[back to top](#databricks-labs-ucx)]
-
-#### `implicit-dbfs-usage`
-
-The use of DBFS is not allowed in Unity Catalog, so if you have code like this:
-
-```python
-display(spark.read.csv('/mnt/things/e/f/g'))
-```
-
-you need to change it to use UC tables or UC volumes.
-
-[[back to top](#databricks-labs-ucx)]
-
-#### `jvm-access-in-shared-clusters`
+### `jvm-access-in-shared-clusters`
 
 You cannot access Spark Driver JVM on Unity Catalog clusters in Shared Access mode. If you have code like this:
 

src/databricks/labs/ucx/source_code/linters/context.py

Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@
     PythonLinter,
     SqlLinter,
 )
-from databricks.labs.ucx.source_code.linters.dbfs import DbfsUsageSqlLinter, DBFSUsagePyLinter
+from databricks.labs.ucx.source_code.linters.directfs import DirectFsAccessPyLinter, DirectFsAccessSqlLinter
 from databricks.labs.ucx.source_code.linters.imports import DbutilsPyLinter
 
 from databricks.labs.ucx.source_code.linters.pyspark import SparkSqlPyLinter
@@ -40,12 +40,12 @@ def __init__(self, index: MigrationIndex | None = None, session_state: CurrentSe
             python_fixers.append(SparkSqlPyLinter(from_table, index, session_state))
 
         python_linters += [
-            DBFSUsagePyLinter(session_state),
+            DirectFsAccessPyLinter(session_state),
             DBRv8d0PyLinter(dbr_version=session_state.dbr_version),
             SparkConnectPyLinter(session_state),
             DbutilsPyLinter(session_state),
         ]
-        sql_linters.append(DbfsUsageSqlLinter())
+        sql_linters.append(DirectFsAccessSqlLinter())
 
         self._linters: dict[Language, list[SqlLinter] | list[PythonLinter]] = {
             Language.PYTHON: python_linters,

src/databricks/labs/ucx/source_code/linters/dbfs.py

Lines changed: 0 additions & 124 deletions
This file was deleted.
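
For readers who want a feel for what detecting direct filesystem access in both Python and SQL might involve, here is a minimal sketch. It is not the code from this commit: the function names, the prefix list, and the back-quote heuristic for SQL are all assumptions made for illustration, and the real `DirectFsAccessPyLinter` and `DirectFsAccessSqlLinter` may work quite differently.

```python
import ast
import re

# Hypothetical prefix list, chosen for illustration only.
# The linters added in this commit may recognise a different set.
DIRECT_FS_PREFIXES = ("/mnt/", "dbfs:/", "/dbfs/", "s3://", "s3a://", "abfss://", "gs://")


def is_direct_fs_path(value: str) -> bool:
    """Return True when the string starts with one of the assumed prefixes."""
    return value.startswith(DIRECT_FS_PREFIXES)


def find_direct_fs_access_in_python(source: str) -> list[tuple[int, str]]:
    """Collect (line number, path) for each string literal that looks like a direct path."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if is_direct_fs_path(node.value):
                findings.append((node.lineno, node.value))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(findings)


# Matches back-quoted identifiers such as `/mnt/foo/path` inside a SQL query.
_BACKQUOTED = re.compile(r"`([^`]+)`")


def find_direct_fs_access_in_sql(query: str) -> list[str]:
    """Collect back-quoted identifiers in a SQL query that look like direct paths."""
    return [name for name in _BACKQUOTED.findall(query) if is_direct_fs_path(name)]
```

Run against the README examples in this commit, the Python helper would flag both `'/mnt/things/data.csv'` and `'s3://bucket/folder/data.csv'`, and the SQL helper would flag the back-quoted path in ``parquet.`/mnt/foo/path/to/parquet.file` ``. A plain prefix match like this is deliberately naive: it ignores paths built at runtime and any scheme not in the list.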
