[SPARK-27990][SPARK-29903][PYTHON] Add recursiveFileLookup option to Python DataFrameReader
### What changes were proposed in this pull request?
As a follow-up to apache#24830, this PR adds the `recursiveFileLookup` option to the Python DataFrameReader API.
### Why are the changes needed?
This PR maintains Python feature parity with Scala.
### Does this PR introduce any user-facing change?
Yes.
Before this PR, you'd only be able to use this option as follows:
```python
spark.read.option("recursiveFileLookup", True).text("test-data").show()
```
With this PR, you can reference the option from within the format-specific method:
```python
spark.read.text("test-data", recursiveFileLookup=True).show()
```
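Under the hood, a format-specific method like this simply forwards its keyword arguments to the same option-setting path that `.option()` uses. The following is a simplified stand-in to illustrate that pattern, not PySpark's actual code; the `MiniReader` class and its string conversion are my own illustrative assumptions:

```python
class MiniReader:
    """Simplified stand-in for a DataFrameReader (illustration only)."""

    def __init__(self):
        self._options = {}

    def option(self, key, value):
        # Store booleans as lowercase strings, mirroring how typical
        # string-keyed option maps expect "true"/"false".
        self._options[key] = str(value).lower() if isinstance(value, bool) else str(value)
        return self

    def text(self, path, recursiveFileLookup=None):
        # The format-specific keyword argument is forwarded to option().
        if recursiveFileLookup is not None:
            self.option("recursiveFileLookup", recursiveFileLookup)
        return path, dict(self._options)


path, opts = MiniReader().text("test-data", recursiveFileLookup=True)
```

Both call styles end up populating the same options map, which is why the two snippets above are equivalent.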
This option now also shows up in the Python API docs.
### How was this patch tested?
I tested this manually by creating the following directories with dummy data:
```
test-data
├── 1.txt
└── nested
└── 2.txt
test-parquet
├── nested
│   ├── _SUCCESS
│   └── part-00000-...-.parquet
├── _SUCCESS
└── part-00000-...-.parquet
```
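The `test-data` half of this layout can be recreated with a short script (a sketch; the file contents are my own placeholder dummy data, since the PR does not specify them):

```python
import tempfile
from pathlib import Path


def make_test_data(root):
    """Create test-data/1.txt and test-data/nested/2.txt under root."""
    base = Path(root) / "test-data"
    (base / "nested").mkdir(parents=True, exist_ok=True)
    (base / "1.txt").write_text("dummy line one\n")
    (base / "nested" / "2.txt").write_text("dummy line two\n")
    return base


# Build the layout in a throwaway temp directory.
base = make_test_data(tempfile.mkdtemp())
```

With this in place, a non-recursive read would pick up only `1.txt`, while `recursiveFileLookup=True` also finds `nested/2.txt`.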
I then ran the following tests and confirmed the output looked good:
```python
spark.read.parquet("test-parquet", recursiveFileLookup=True).show()
spark.read.text("test-data", recursiveFileLookup=True).show()
spark.read.csv("test-data", recursiveFileLookup=True).show()
```
`python/pyspark/sql/tests/test_readwriter.py` seems pretty sparse. I'm happy to add my tests there, though it seems we have been deferring testing like this to the Scala side of things.
Closes apache#26718 from nchammas/SPARK-27990-recursiveFileLookup-python.
Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>