Commit 5d40e18
[Docs] Cherry-pick 4.0 docs changes to master branch (delta-io#4749)
#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [X] Other

## Description

To expedite the 4.0 release, we merged docs changes directly to that branch first; this PR cherry-picks them to master.

Note: there was one additional change needed for the 4.0 release to accommodate the 4.0 versioning (Scala 2.13) and the lack of the Flink/Standalone projects. That change is NOT included in this PR, as we plan to release a 3.x version in the future. Ref change: delta-io#4748

## How was this patch tested?

N/A (built and pushed on the 4.0 branch)

## Does this PR introduce _any_ user-facing changes?

No

---------

Co-authored-by: Thang Long Vu <[email protected]>
1 parent 430518e · commit 5d40e18

File tree

- docs/source/delta-drop-feature.md
- docs/source/delta-spark-connect.md
- docs/source/delta-spark.md
- docs/source/delta-storage.md
- docs/source/delta-type-widening.md
- docs/source/index.md
- docs/source/quick-start.md
- docs/source/releases.md
- docs/source/versioning.md

9 files changed: +110 −20 lines

docs/source/delta-drop-feature.md

Lines changed: 1 addition & 0 deletions

@@ -26,6 +26,7 @@ You can drop the following Delta table features:
 
 - `deletionVectors`. See [_](delta-deletion-vectors.md).
 - `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in <Delta> 3.2.0 and above.
+- `typeWidening`. See [_](delta-type-widening.md). Type widening is available in <Delta> 4.0.0 and above.
 - `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in <Delta> 3.1.0 and above.
 - `columnMapping`. See [_](delta-column-mapping.md). Drop support for column mapping is available in <Delta> 3.3.0 and above.
 - `vacuumProtocolCheck`. See [Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). Drop support for vacuum protocol check is available in <Delta> 3.3.0 and above.
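Dropping any of the features listed in this hunk uses the `ALTER TABLE ... DROP FEATURE` command. A minimal PySpark sketch, assuming an existing Delta-enabled `spark` session and a hypothetical table `my_table` that has type widening enabled:

```python
# Sketch: drop the typeWidening table feature from a Delta table.
# Assumes `spark` is a SparkSession configured with the Delta extensions
# and `my_table` is a hypothetical table that carries the feature.
spark.sql("ALTER TABLE my_table DROP FEATURE typeWidening")

# If historical table versions still reference the feature, Delta asks you
# to wait out the retention period and then truncate history:
spark.sql("ALTER TABLE my_table DROP FEATURE typeWidening TRUNCATE HISTORY")
```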

docs/source/delta-spark-connect.md

Lines changed: 85 additions & 0 deletions

@@ -0,0 +1,85 @@
+---
+description: Learn about Delta Connect - Spark Connect Support in Delta.
+---
+
+# Delta Connect (aka Spark Connect Support in Delta)
+
+.. note:: This feature is available in <Delta> 4.0.0 and above. Delta Connect is currently in preview and not recommended for production workloads.
+
+Delta Connect adds [Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) support to Delta Lake for Apache Spark. Spark Connect is a new initiative that adds a decoupled client-server infrastructure, allowing remote connectivity to Spark from anywhere. Delta Connect allows all Delta Lake operations to work in your application running as a client connected to the Spark server.
+
+## Motivation
+
+Delta Connect is expected to bring the same benefits as Spark Connect:
+
+1. Upgrading to more recent versions of Spark and <Delta> is now easier because the client interface is completely decoupled from the server.
+2. Simpler integration of Spark and <Delta> with developer tooling. IDEs no longer have to integrate with the full Spark and <Delta> implementation, and instead can integrate with a thin client.
+3. Support for languages other than Java/Scala and Python. Clients "merely" have to generate Protocol Buffers and therefore become simpler to implement.
+4. Spark and <Delta> will become more stable, as user code is no longer running in the same JVM as Spark's driver.
+5. Remote connectivity. Code can run anywhere now, as there is a gRPC layer between the user interface and the driver.
+
+## How to start the Spark Server with Delta
+
+1. Download `spark-4.0.0-bin-hadoop3.tgz` from [Spark 4.0.0](https://archive.apache.org/dist/spark/spark-4.0.0).
+
+2. Start the Spark Connect server with the <Delta> Connect plugins:
+
+```bash
+sbin/start-connect-server.sh \
+  --packages io.delta:delta-connect-server_2.13:4.0.0,com.google.protobuf:protobuf-java:3.25.1 \
+  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
+  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
+  --conf "spark.connect.extensions.relation.classes=org.apache.spark.sql.connect.delta.DeltaRelationPlugin" \
+  --conf "spark.connect.extensions.command.classes=org.apache.spark.sql.connect.delta.DeltaCommandPlugin"
+```
+
+## How to use the Python Spark Connect Client with Delta
+
+The <Delta> Connect Python client is included in the same PyPI package as <Delta> Spark.
+
+1. `pip install pyspark==4.0.0`.
+
+2. `pip install delta-spark==4.0.0`.
+
+3. The usage is the same as Spark Connect (e.g. `./bin/pyspark --remote "sc://localhost"`). We just need to pass in a remote `SparkSession` (instead of a local one) to the `DeltaTable` API.
+
+An example:
+
+```python
+from delta.tables import DeltaTable
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import *
+
+deltaTable = DeltaTable.forName(spark, "my_table")
+deltaTable.toDF().show()
+
+deltaTable.update(
+  condition = "id % 2 == 0",
+  set = {"id": "id + 100"}
+)
+```
+
+## How to use the Scala Spark Connect Client with Delta
+
+Make sure you are using Java 17!
+
+```bash
+./bin/spark-shell --remote "sc://localhost" --packages io.delta:delta-connect-client_2.13:4.0.0,com.google.protobuf:protobuf-java:3.25.1
+```
+
+An example:
+
+```scala
+import io.delta.tables.DeltaTable
+
+val deltaTable = DeltaTable.forName(spark, "my_table")
+deltaTable.toDF.show()
+
+deltaTable.updateExpr(
+  condition = "id % 2 == 0",
+  set = Map("id" -> "id + 100")
+)
+```
+
+.. include:: /shared/replacements.md
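The Python example in the new page assumes a `spark` session already exists. For readers trying it out, here is a minimal sketch of creating the remote session first, assuming the Connect server started above is listening on `localhost` at the default port:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Connect to the Spark Connect server started with the Delta plugins;
# "sc://localhost" assumes the default Connect port on the local machine.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# The remote session is passed to the DeltaTable API exactly like a local
# one; "my_table" is the placeholder table name from the example above.
deltaTable = DeltaTable.forName(spark, "my_table")
deltaTable.toDF().show()
```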

docs/source/delta-spark.md

Lines changed: 1 addition & 0 deletions

@@ -28,6 +28,7 @@ This is the documentation page for <Delta> Spark connector.
 delta-drop-feature
 delta-row-tracking
 delta-storage
+delta-spark-connect
 delta-type-widening
 delta-uniform
 delta-sharing

docs/source/delta-storage.md

Lines changed: 5 additions & 5 deletions

@@ -66,11 +66,11 @@ In this default mode, <Delta> supports concurrent reads from multiple clusters,
 
 This section explains how to quickly start reading and writing Delta tables on S3 using single-cluster mode. For a detailed explanation of the configuration, see [_](#setup-configuration-s3-multi-cluster).
 
-#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):
+#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 4.0.0 which is pre-built for Hadoop 3.4.0):
 
 ```bash
 bin/spark-shell \
-  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
+  --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0 \
   --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
   --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key>
 ```

@@ -91,7 +91,7 @@ For efficient listing of <Delta> metadata files on S3, set the configuration `de
 
 ```scala
 bin/spark-shell \
-  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
+  --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0 \
   --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
   --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
   --conf "spark.hadoop.delta.enableFastS3AListFrom=true

@@ -149,11 +149,11 @@ This mode supports concurrent writes to S3 from multiple clusters and has to be
 
 This section explains how to quickly start reading and writing Delta tables on S3 using multi-cluster mode.
 
-#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):
+#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 4.0.0 which is pre-built for Hadoop 3.4.0):
 
 ```bash
 bin/spark-shell \
-  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.3.0 \
+  --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-storage-s3-dynamodb:4.0.0 \
   --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
   --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
   --conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
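Once a shell from either mode is up, Delta tables on S3 are addressed with `s3a://` paths. A short sketch of a first write and read, assuming the shell launched with the commands above and a placeholder bucket name:

```python
# Sketch: write and read back a Delta table on S3. Assumes the shell was
# launched with hadoop-aws and the s3a credentials shown above;
# "my-bucket" is a placeholder bucket name.
spark.range(0, 5).write.format("delta").save("s3a://my-bucket/delta-table")

spark.read.format("delta").load("s3a://my-bucket/delta-table").show()
```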

docs/source/delta-type-widening.md

Lines changed: 1 addition & 1 deletion

@@ -104,7 +104,7 @@ The type widening feature can be removed from a Delta table using the `DROP FEAT
 
 .. note::
 
-  Tables that enabled type widening using <Delta> 3.2 require dropping feature `typeWidening-preview` instead.
+  Tables that enabled type widening using <Delta> 3.2 require dropping feature `typeWidening-preview` instead.
 
 See [_](delta-drop-feature.md) for more information on dropping Delta table features.
 
docs/source/index.md

Lines changed: 0 additions & 3 deletions

@@ -6,9 +6,6 @@ description: Learn how to use <Delta>.
 
 # Welcome to the <Delta> documentation
 
-.. note::
-  [Delta Lake 4.0 Preview](https://github.com/delta-io/delta/releases/tag/v4.0.0rc1) is released! See the 4.0 Preview documentation [here](https://docs.delta.io/4.0.0-preview/index.html).
-
 This is the documentation site for <Delta>.
 
 .. toctree::

docs/source/quick-start.md

Lines changed: 10 additions & 10 deletions

@@ -18,13 +18,13 @@ Follow these instructions to set up <Delta> with Spark. You can run the steps in
 
 #. Run as a project: Set up a Maven or SBT project (Scala or Java) with <Delta>, copy the code snippets into a source file, and run the project. Alternatively, you can use the [examples provided in the Github repository](https://github.com/delta-io/delta/tree/master/examples).
 
-.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `3.3.0`. See the [release compatibility matrix](releases.md) for details.
+.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `4.0.0`. See the [release compatibility matrix](releases.md) for details.
 
 ### Prerequisite: set up Java
 
 As mentioned in the official <AS> installation instructions [here](https://spark.apache.org/docs/latest/index.html#downloading), make sure you have a valid Java version installed (8, 11, or 17) and that Java is configured correctly on your system using either the system `PATH` or `JAVA_HOME` environmental variable.
 
-Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `3.3.0`.
+Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `4.0.0`.
 
 ### Set up interactive shell
 

@@ -35,7 +35,7 @@ To use <Delta> interactively within the Spark SQL, Scala, or Python shell, you n
 Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-sql` in the extracted directory.
 
 ```bash
-bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
+bin/spark-sql --packages io.delta:delta-spark_2.13:4.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```
 
 #### PySpark Shell

@@ -49,15 +49,15 @@ bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.exten
 #. Run PySpark with the <Delta> package and additional configurations:
 
 ```bash
-pyspark --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
+pyspark --packages io.delta:delta-spark_2.13:4.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```
 
 #### Spark Scala Shell
 
 Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-shell` in the extracted directory.
 
 ```bash
-bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
+bin/spark-shell --packages io.delta:delta-spark_2.13:4.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```
 
 ### Set up project

@@ -66,13 +66,13 @@ If you want to build a project using Delta Lake binaries from Maven Central Repo
 
 #### Maven
 
-You include <Delta> in your Maven project by adding it as a dependency in your POM file. <Delta> is compiled with Scala 2.12.
+You include <Delta> in your Maven project by adding it as a dependency in your POM file. <Delta> is compiled with Scala 2.13.
 
 ```xml
 <dependency>
   <groupId>io.delta</groupId>
-  <artifactId>delta-spark_2.12</artifactId>
-  <version>3.3.0</version>
+  <artifactId>delta-spark_2.13</artifactId>
+  <version>4.0.0</version>
 </dependency>
 ```
 

@@ -81,12 +81,12 @@ You include <Delta> in your Maven project by adding it as a dependency in your P
 You include <Delta> in your SBT project by adding the following line to your `build.sbt` file:
 
 ```scala
-libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0"
+libraryDependencies += "io.delta" %% "delta-spark" % "4.0.0"
 ```
 
 #### Python
 
-To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==3.3.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
+To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==4.0.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
 
 ```python
 import pyspark
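The final hunk's Python block is cut off at `import pyspark` by the diff context. For reference, the pattern the `configure_spark_with_delta_pip()` paragraph describes looks roughly like this (a sketch, not the exact continuation of the file; the app name is a placeholder):

```python
import pyspark
from delta import configure_spark_with_delta_pip

# Build a session with the Delta SQL extension and catalog, then let
# configure_spark_with_delta_pip() pull in the matching delta-spark JARs.
builder = (
    pyspark.sql.SparkSession.builder.appName("quickstart")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```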

docs/source/releases.md

Lines changed: 1 addition & 0 deletions

@@ -17,6 +17,7 @@ The following table lists <Delta> versions and their compatible <AS> versions.
 
 | <Delta> version | <AS> version |
 | --- | --- |
+| 4.0.x | 4.0.x |
 | 3.3.x | 3.5.x |
 | 3.2.x | 3.5.x |
 | 3.1.x | 3.5.x |

docs/source/versioning.md

Lines changed: 6 additions & 1 deletion

@@ -28,7 +28,10 @@ The following <Delta> features break forward compatibility. Features are enabled
 Clustering, [Delta Lake 3.1.0](https://github.com/delta-io/delta/releases/tag/v3.1.0),[_](/delta-clustering.md)
 Row Tracking, [Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-row-tracking.md)
 Type widening (Preview),[Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-type-widening.md)
+Type widening,[Delta Lake 4.0.0](https://github.com/delta-io/delta/releases/tag/v4.0.0),[_](/delta-type-widening.md)
 Identity columns, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#use-identity-columns)
+Variant Type, [Delta Lake 4.0.0](https://github.com/delta-io/delta/releases/tag/v4.0.0),[Variant Type](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#variant-data-type)
+Variant Shredding (Preview), [Delta Lake 4.0.0](https://github.com/delta-io/delta/releases/tag/v4.0.0),[Variant Shredding](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md)
 
 <a id="table-protocol"></a>
 

@@ -112,7 +115,9 @@ The following table shows minimum protocol versions required for <Delta> feature
 V2 Checkpoints,7,3,[V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec)
 Vacuum Protocol Check,7,3,[Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check)
 Row Tracking,7,3,[_](/delta-row-tracking.md)
-Type widening (Preview),7,3,[_](/delta-type-widening.md)
+Type widening,7,3,[_](/delta-type-widening.md)
+Variant Type,7,3,[Variant Type](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#variant-data-type)
+Variant Shredding (Preview),7,3,[Variant Shredding](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md)
 
 <a id="upgrade"></a>
 
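The protocol table in this hunk maps features to minimum reader/writer versions. To check where an existing table stands, `DESCRIBE DETAIL` exposes these fields; a minimal sketch, assuming a Delta-enabled `spark` session and a hypothetical table `my_table`:

```python
# Sketch: inspect the protocol versions of a Delta table.
# Recent Delta releases also expose a tableFeatures column in this output.
detail = spark.sql("DESCRIBE DETAIL my_table")
detail.select("minReaderVersion", "minWriterVersion").show()
```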