Pull request overview
This PR reorganizes and expands the Sail documentation around data sources (Delta Lake, Iceberg, Python data sources, JDBC), and updates a few capability markers and embedded examples used by the docs.
Changes:
- Added a new “Data Sources” guide section with dedicated pages for Delta Lake, Iceberg, Python data sources, and JDBC.
- Updated SQL feature support markers and function compatibility metadata to reflect newly supported functionality.
- Adjusted test assets used for documentation examples, and temporarily skipped JDBC integration tests.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| python/pysail/tests/spark/test_python_datasource_read.txt | Tweaks the embedded Python console example used by the Python data source docs. |
| python/pysail/tests/spark/datasource/test_jdbc.py | Skips JDBC testcontainers integration tests at module load time. |
| python/pysail/data/compatibility/functions/scalar/datetime.json | Marks try_make_timestamp* functions as supported. |
| docs/guide/sql/features.md | Updates documented support status for FROM <format>.<path>, TABLESAMPLE, and DESCRIBE TABLE. |
| docs/guide/sources/index.md | Introduces a new top-level “Data Sources” landing page and capability summary table. |
| docs/guide/sources/python/index.md | Adds a Python data sources guide page (includes examples from test transcripts). |
| docs/guide/sources/jdbc/index.md | Adds a JDBC data source guide page (installation, options, examples). |
| docs/guide/sources/delta/index.md | Adds a Delta Lake data source section index page. |
| docs/guide/sources/delta/index.data.ts | Adds VitePress content loader for Delta Lake subpages. |
| docs/guide/sources/delta/features.md | Documents Delta Lake supported features. |
| docs/guide/sources/delta/examples.md | Provides Delta Lake examples and updates includes/headers. |
| docs/guide/sources/iceberg/index.md | Adds an Iceberg data source section index page. |
| docs/guide/sources/iceberg/index.data.ts | Adds VitePress content loader for Iceberg subpages. |
| docs/guide/sources/iceberg/features.md | Documents Iceberg supported features. |
| docs/guide/sources/iceberg/examples.md | Provides Iceberg examples and updates includes/headers. |
| docs/guide/integrations/jdbc.md | Removes the old JDBC integration doc (replaced by the new data source guide). |
| docs/guide/formats/index.md | Removes the old “Data Formats” index (replaced by the new “Data Sources” index). |
Comments suppressed due to low confidence (3)
docs/guide/sources/python/index.md:10

- Typo: "panalties" should be "penalties".

> The Python data source allows you to extend the `SparkSession.read` and `DataFrame.write` APIs to support custom formats and external system integrations.
> It optionally supports Arrow for zero-copy data exchange between the Python process and the Sail execution engine. This gives you flexibility in data source implementations without incurring performance panalties.
docs/guide/sources/python/index.md:12

- Grammar: "a Python class that inherit" should be "a Python class that inherits" (subject/verb agreement).

> You can define a Python class that inherit from the `pyspark.sql.datasource.DataSource` abstract class, and register it to the Spark session to create a custom data source that can be used in the standard PySpark API. The `DataSource` class provides methods for defining the name and schema of the data source, as well as methods for creating readers and writers.
docs/guide/sources/iceberg/features.md:24

- Grammar: "The write operations currently follows" should be "The write operations currently follow".

> The write operations currently follows "copy-on-write" semantics.
> We plan to support delete files and deletion vectors, which would enable "merge-on-read" write operations in the future.
Spark 3.5.7 Test Report

Passed Tests Diff: (empty)
Failed Tests: (empty)
Spark 4.1.1 Test Report

Passed Tests Diff:

```diff
--- before.txt	2026-03-02 10:46:58.272878195 +0000
+++ after.txt	2026-03-02 10:46:58.606877428 +0000
@@ -930,0 +931 @@
+pyspark/sql/tests/connect/pandas/test_parity_pandas_udf_scalar.py::PandasUDFScalarParityTests::test_vectorized_udf_struct_complex
@@ -933,0 +935 @@
+pyspark/sql/tests/connect/pandas/test_parity_pandas_udf_scalar.py::PandasUDFScalarParityTests::test_vectorized_udf_timestamps
@@ -978 +979,0 @@
-pyspark/sql/tests/connect/test_connect_basic.py::SparkConnectBasicTests::test_namedargs_with_global_limit
@@ -1052,0 +1054 @@
+pyspark/sql/tests/connect/test_connect_creation.py::SparkConnectCreationTests::test_create_dataframe_from_pandas_with_ns_timestamp
```

Failed Tests: (truncated)
Ibis Test Report

Passed Tests Diff:

```diff
--- before.txt	2026-03-02 10:46:22.583265157 +0000
+++ after.txt	2026-03-02 10:46:22.849261886 +0000
@@ -457 +456,0 @@
-ibis/backends/tests/test_export.py::test_table_to_csv[pyspark]
@@ -1425,0 +1425 @@
+ibis/backends/tests/test_udf.py::test_vectorized_udf[pyspark-add_one_pandas]
```

Failed Tests: (empty)
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##             main    #1451      +/-   ##
==========================================
+ Coverage   73.49%   74.11%   +0.62%
==========================================
  Files         844      844
  Lines      104978   107835    +2857
==========================================
+ Hits        77150    79923    +2773
- Misses      27828    27912      +84
```
*This pull request uses carry forward flags.*