Commit 5d40e18
[Docs] Cherry-pick 4.0 docs changes to master branch (delta-io#4749)
#### Which Delta project/connector is this regarding?

- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [X] Other

## Description

To expedite the 4.0 release, we merged docs changes directly to that branch first; this PR cherry-picks them to master.

Note: there was one additional change needed for the 4.0 release to accommodate the 4.0 versioning (Scala 2.13) and the lack of the Flink/Standalone projects. That change is NOT included in this PR, as we plan to release a 3.x version in the future. Ref change: delta-io#4748

## How was this patch tested?

N/A (built and pushed on the 4.0 branch)

## Does this PR introduce _any_ user-facing changes?

No

---------

Co-authored-by: Thang Long Vu <[email protected]>
1 parent 430518e · commit 5d40e18

File tree

- docs/source/delta-drop-feature.md
- docs/source/delta-spark-connect.md
- docs/source/delta-spark.md
- docs/source/delta-storage.md
- docs/source/delta-type-widening.md
- docs/source/index.md
- docs/source/quick-start.md
- docs/source/releases.md
- docs/source/versioning.md

9 files changed: +110 −20 lines

docs/source/delta-drop-feature.md

Lines changed: 1 addition & 0 deletions

@@ -26,6 +26,7 @@ You can drop the following Delta table features:
 
 - `deletionVectors`. See [_](delta-deletion-vectors.md).
 - `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in <Delta> 3.2.0 and above.
+- `typeWidening`. See [_](delta-type-widening.md). Type widening is available in <Delta> 4.0.0 and above.
 - `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in <Delta> 3.1.0 and above.
 - `columnMapping`. See [_](delta-column-mapping.md). Drop support for column mapping is available in <Delta> 3.3.0 and above.
 - `vacuumProtocolCheck`. See [Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). Drop support for vacuum protocol check is available in <Delta> 3.3.0 and above.
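Dropping any of the features listed in this hunk uses the `ALTER TABLE ... DROP FEATURE` command. A minimal PySpark sketch, assuming an existing Delta-enabled `spark` session and a hypothetical table `my_table` that has type widening enabled:

```python
# Sketch: drop the typeWidening table feature from a Delta table.
# Assumes `spark` is a SparkSession configured with the Delta extensions
# and `my_table` is a hypothetical table that carries the feature.
spark.sql("ALTER TABLE my_table DROP FEATURE typeWidening")

# If historical table versions still reference the feature, Delta asks you
# to wait out the retention period and then truncate history:
spark.sql("ALTER TABLE my_table DROP FEATURE typeWidening TRUNCATE HISTORY")
```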

docs/source/delta-spark-connect.md

Lines changed: 85 additions & 0 deletions

@@ -0,0 +1,85 @@
+---
+description: Learn about Delta Connect - Spark Connect Support in Delta.
+---
+
+# Delta Connect (aka Spark Connect Support in Delta)
+
+.. note:: This feature is available in <Delta> 4.0.0 and above. Delta Connect is currently in preview and not recommended for production workloads.
+
+Delta Connect adds [Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) support to Delta Lake for Apache Spark. Spark Connect is a new initiative that adds a decoupled client-server infrastructure, allowing remote connectivity to Spark from anywhere. Delta Connect allows all Delta Lake operations to work in your application running as a client connected to the Spark server.
+
+## Motivation
+
+Delta Connect is expected to bring the same benefits as Spark Connect:
+
+1. Upgrading to more recent versions of Spark and <Delta> is now easier because the client interface is completely decoupled from the server.
+2. Simpler integration of Spark and <Delta> with developer tooling. IDEs no longer have to integrate with the full Spark and <Delta> implementation, and instead can integrate with a thin client.
+3. Support for languages other than Java/Scala and Python. Clients "merely" have to generate Protocol Buffers and therefore become simpler to implement.
+4. Spark and <Delta> will become more stable, as user code is no longer running in the same JVM as Spark's driver.
+5. Remote connectivity. Code can run anywhere now, as there is a gRPC layer between the user interface and the driver.
+
+## How to start the Spark Server with Delta
+
+1. Download `spark-4.0.0-bin-hadoop3.tgz` from [Spark 4.0.0](https://archive.apache.org/dist/spark/spark-4.0.0).
+
+2. Start the Spark Connect server with the <Delta> Connect plugins:
+
+```bash
+sbin/start-connect-server.sh \
+  --packages io.delta:delta-connect-server_2.13:4.0.0,com.google.protobuf:protobuf-java:3.25.1 \
+  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
+  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
+  --conf "spark.connect.extensions.relation.classes=org.apache.spark.sql.connect.delta.DeltaRelationPlugin" \
+  --conf "spark.connect.extensions.command.classes=org.apache.spark.sql.connect.delta.DeltaCommandPlugin"
+```
+
+## How to use the Python Spark Connect Client with Delta
+
+The <Delta> Connect Python client is included in the same PyPI package as <Delta> Spark.
+
+1. `pip install pyspark==4.0.0`.
+
+2. `pip install delta-spark==4.0.0`.
+
+3. The usage is the same as Spark Connect (e.g. `./bin/pyspark --remote "sc://localhost"`). We just need to pass in a remote `SparkSession` (instead of a local one) to the `DeltaTable` API.
+
+An example:
+
+```python
+from delta.tables import DeltaTable
+from pyspark.sql import SparkSession
+from pyspark.sql.functions import *
+
+deltaTable = DeltaTable.forName(spark, "my_table")
+deltaTable.toDF().show()
+
+deltaTable.update(
+  condition = "id % 2 == 0",
+  set = {"id": "id + 100"}
+)
+```
+
+## How to use the Scala Spark Connect Client with Delta
+
+Make sure you are using Java 17!
+
+```bash
+./bin/spark-shell --remote "sc://localhost" --packages io.delta:delta-connect-client_2.13:4.0.0,com.google.protobuf:protobuf-java:3.25.1
+```
+
+An example:
+
+```scala
+import io.delta.tables.DeltaTable
+
+val deltaTable = DeltaTable.forName(spark, "my_table")
+deltaTable.toDF.show()
+
+deltaTable.updateExpr(
+  condition = "id % 2 == 0",
+  set = Map("id" -> "id + 100")
+)
+```
+
+.. include:: /shared/replacements.md
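The Python example in the new page assumes a `spark` session already exists. For readers trying it out, here is a minimal sketch of creating the remote session first, assuming the Connect server started above is listening on `localhost` at the default port:

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

# Connect to the Spark Connect server started with the Delta plugins;
# "sc://localhost" assumes the default Connect port on the local machine.
spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# The remote session is passed to the DeltaTable API exactly like a local
# one; "my_table" is the placeholder table name from the example above.
deltaTable = DeltaTable.forName(spark, "my_table")
deltaTable.toDF().show()
```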

docs/source/delta-spark.md

Lines changed: 1 addition & 0 deletions

@@ -28,6 +28,7 @@ This is the documentation page for <Delta> Spark connector.
 delta-drop-feature
 delta-row-tracking
 delta-storage
+delta-spark-connect
 delta-type-widening
 delta-uniform
 delta-sharing

docs/source/delta-storage.md

Lines changed: 5 additions & 5 deletions

@@ -66,11 +66,11 @@ In this default mode, <Delta> supports concurrent reads from multiple clusters,
 
 This section explains how to quickly start reading and writing Delta tables on S3 using single-cluster mode. For a detailed explanation of the configuration, see [_](#setup-configuration-s3-multi-cluster).
 
-#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):
+#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 4.0.0 which is pre-built for Hadoop 3.4.0):
 
 ```bash
 bin/spark-shell \
-  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
+  --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0 \
   --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
   --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key>
 ```

@@ -91,7 +91,7 @@ For efficient listing of <Delta> metadata files on S3, set the configuration `de
 
 ```scala
 bin/spark-shell \
-  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4 \
+  --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0 \
   --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
   --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
   --conf "spark.hadoop.delta.enableFastS3AListFrom=true

@@ -149,11 +149,11 @@ This mode supports concurrent writes to S3 from multiple clusters and has to be
 
 This section explains how to quickly start reading and writing Delta tables on S3 using multi-cluster mode.
 
-#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):
+#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 4.0.0 which is pre-built for Hadoop 3.4.0):
 
 ```bash
 bin/spark-shell \
-  --packages io.delta:delta-spark_2.12:3.3.0,org.apache.hadoop:hadoop-aws:3.3.4,io.delta:delta-storage-s3-dynamodb:3.3.0 \
+  --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0,io.delta:delta-storage-s3-dynamodb:4.0.0 \
   --conf spark.hadoop.fs.s3a.access.key=<your-s3-access-key> \
   --conf spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key> \
   --conf spark.delta.logStore.s3a.impl=io.delta.storage.S3DynamoDBLogStore \
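Once a shell from either mode is up, Delta tables on S3 are addressed with `s3a://` paths. A short sketch of a first write and read, assuming the shell launched with the commands above and a placeholder bucket name:

```python
# Sketch: write and read back a Delta table on S3. Assumes the shell was
# launched with hadoop-aws and the s3a credentials shown above;
# "my-bucket" is a placeholder bucket name.
spark.range(0, 5).write.format("delta").save("s3a://my-bucket/delta-table")

spark.read.format("delta").load("s3a://my-bucket/delta-table").show()
```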

docs/source/delta-type-widening.md

Lines changed: 1 addition & 1 deletion

@@ -104,7 +104,7 @@ The type widening feature can be removed from a Delta table using the `DROP FEAT
 
 .. note::
 
-  Tables that enabled type widening using <Delta> 3.2 require dropping feature `typeWidening-preview` instead.
+  Tables that enabled type widening using <Delta> 3.2 require dropping feature `typeWidening-preview` instead.
 
 See [_](delta-drop-feature.md) for more information on dropping Delta table features.
 
docs/source/index.md

Lines changed: 0 additions & 3 deletions

@@ -6,9 +6,6 @@ description: Learn how to use <Delta>.
 
 # Welcome to the <Delta> documentation
 
-.. note::
-  [Delta Lake 4.0 Preview](https://github.com/delta-io/delta/releases/tag/v4.0.0rc1) is released! See the 4.0 Preview documentation [here](https://docs.delta.io/4.0.0-preview/index.html).
-
 This is the documentation site for <Delta>.
 
 .. toctree::

docs/source/quick-start.md

Lines changed: 10 additions & 10 deletions

@@ -18,13 +18,13 @@ Follow these instructions to set up <Delta> with Spark. You can run the steps in
 
 #. Run as a project: Set up a Maven or SBT project (Scala or Java) with <Delta>, copy the code snippets into a source file, and run the project. Alternatively, you can use the [examples provided in the Github repository](https://github.com/delta-io/delta/tree/master/examples).
 
-.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `3.3.0`. See the [release compatibility matrix](releases.md) for details.
+.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `4.0.0`. See the [release compatibility matrix](releases.md) for details.
 
 ### Prerequisite: set up Java
 
 As mentioned in the official <AS> installation instructions [here](https://spark.apache.org/docs/latest/index.html#downloading), make sure you have a valid Java version installed (8, 11, or 17) and that Java is configured correctly on your system using either the system `PATH` or `JAVA_HOME` environmental variable.
 
-Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `3.3.0`.
+Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `4.0.0`.
 
 ### Set up interactive shell
 

@@ -35,7 +35,7 @@ To use <Delta> interactively within the Spark SQL, Scala, or Python shell, you n
 Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-sql` in the extracted directory.
 
 ```bash
-bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
+bin/spark-sql --packages io.delta:delta-spark_2.13:4.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```
 
 #### PySpark Shell

@@ -49,15 +49,15 @@ bin/spark-sql --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.exten
 #. Run PySpark with the <Delta> package and additional configurations:
 
 ```bash
-pyspark --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
+pyspark --packages io.delta:delta-spark_2.13:4.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```
 
 #### Spark Scala Shell
 
 Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-shell` in the extracted directory.
 
 ```bash
-bin/spark-shell --packages io.delta:delta-spark_2.12:3.3.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
+bin/spark-shell --packages io.delta:delta-spark_2.13:4.0.0 --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
 ```
 
 ### Set up project

@@ -66,13 +66,13 @@ If you want to build a project using Delta Lake binaries from Maven Central Repo
 
 #### Maven
 
-You include <Delta> in your Maven project by adding it as a dependency in your POM file. <Delta> is compiled with Scala 2.12.
+You include <Delta> in your Maven project by adding it as a dependency in your POM file. <Delta> is compiled with Scala 2.13.
 
 ```xml
 <dependency>
   <groupId>io.delta</groupId>
-  <artifactId>delta-spark_2.12</artifactId>
-  <version>3.3.0</version>
+  <artifactId>delta-spark_2.13</artifactId>
+  <version>4.0.0</version>
 </dependency>
 ```
 

@@ -81,12 +81,12 @@ You include <Delta> in your Maven project by adding it as a dependency in your P
 You include <Delta> in your SBT project by adding the following line to your `build.sbt` file:
 
 ```scala
-libraryDependencies += "io.delta" %% "delta-spark" % "3.3.0"
+libraryDependencies += "io.delta" %% "delta-spark" % "4.0.0"
 ```
 
 #### Python
 
-To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==3.3.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
+To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==4.0.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
 
 ```python
 import pyspark
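The final hunk's Python block is cut off at `import pyspark` by the diff context. For reference, the pattern the `configure_spark_with_delta_pip()` paragraph describes looks roughly like this (a sketch, not the exact continuation of the file; the app name is a placeholder):

```python
import pyspark
from delta import configure_spark_with_delta_pip

# Build a session with the Delta SQL extension and catalog, then let
# configure_spark_with_delta_pip() pull in the matching delta-spark JARs.
builder = (
    pyspark.sql.SparkSession.builder.appName("quickstart")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```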

docs/source/releases.md

Lines changed: 1 addition & 0 deletions

@@ -17,6 +17,7 @@ The following table lists <Delta> versions and their compatible <AS> versions.
 
 | <Delta> version | <AS> version |
 | --- | --- |
+| 4.0.x | 4.0.x |
 | 3.3.x | 3.5.x |
 | 3.2.x | 3.5.x |
 | 3.1.x | 3.5.x |

docs/source/versioning.md

Lines changed: 6 additions & 1 deletion

@@ -28,7 +28,10 @@ The following <Delta> features break forward compatibility. Features are enabled
 Clustering, [Delta Lake 3.1.0](https://github.com/delta-io/delta/releases/tag/v3.1.0),[_](/delta-clustering.md)
 Row Tracking, [Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-row-tracking.md)
 Type widening (Preview),[Delta Lake 3.2.0](https://github.com/delta-io/delta/releases/tag/v3.2.0),[_](/delta-type-widening.md)
+Type widening,[Delta Lake 4.0.0](https://github.com/delta-io/delta/releases/tag/v4.0.0),[_](/delta-type-widening.md)
 Identity columns, [Delta Lake 3.3.0](https://github.com/delta-io/delta/releases/tag/v3.3.0),[_](/delta-batch.md#use-identity-columns)
+Variant Type, [Delta Lake 4.0.0](https://github.com/delta-io/delta/releases/tag/v4.0.0),[Variant Type](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#variant-data-type)
+Variant Shredding (Preview), [Delta Lake 4.0.0](https://github.com/delta-io/delta/releases/tag/v4.0.0),[Variant Shredding](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md)
 
 <a id="table-protocol"></a>
 

@@ -112,7 +115,9 @@ The following table shows minimum protocol versions required for <Delta> feature
 V2 Checkpoints,7,3,[V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec)
 Vacuum Protocol Check,7,3,[Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check)
 Row Tracking,7,3,[_](/delta-row-tracking.md)
-Type widening (Preview),7,3,[_](/delta-type-widening.md)
+Type widening,7,3,[_](/delta-type-widening.md)
+Variant Type,7,3,[Variant Type](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#variant-data-type)
+Variant Shredding (Preview),7,3,[Variant Shredding](https://github.com/delta-io/delta/blob/master/protocol_rfcs/variant-shredding.md)
 
 <a id="upgrade"></a>
 
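The protocol table in this hunk maps features to minimum reader/writer versions. To check where an existing table stands, `DESCRIBE DETAIL` exposes these fields; a minimal sketch, assuming a Delta-enabled `spark` session and a hypothetical table `my_table`:

```python
# Sketch: inspect the protocol versions of a Delta table.
# Recent Delta releases also expose a tableFeatures column in this output.
detail = spark.sql("DESCRIBE DETAIL my_table")
detail.select("minReaderVersion", "minWriterVersion").show()
```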