[Docs] Cherry-pick 4.0 docs changes to master branch (delta-io#4749)
<!--
Thanks for sending a pull request! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md
2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP]
Your PR title ...'.
3. Be sure to keep the PR description updated to reflect all changes.
4. Please write your PR title to summarize what this PR proposes.
5. If possible, provide a concise example to reproduce the issue for a
faster review.
6. If applicable, include the corresponding issue number in the PR title
and link it in the body.
-->
#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->
- [ ] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [X] Other (fill in here)
## Description
To expedite the 4.0 release, we merged docs changes directly to that
branch first; this PR cherry-picks them to master.
Note: one additional change was needed for the 4.0 release to
accommodate the 4.0 versioning (Scala 2.13) and the absence of the
Flink/Standalone projects. That change is NOT included in this PR because
we plan to release a 3.x version in the future. Ref change:
delta-io#4748
## How was this patch tested?
N/A (built and pushed on the 4.0 branch)
## Does this PR introduce _any_ user-facing changes?
No
---------
Co-authored-by: Thang Long Vu <[email protected]>
#### docs/source/delta-drop-feature.md (1 addition & 0 deletions)

```diff
@@ -26,6 +26,7 @@ You can drop the following Delta table features:
 - `deletionVectors`. See [_](delta-deletion-vectors.md).
 - `typeWidening-preview`. See [_](delta-type-widening.md). Type widening is available in preview in <Delta> 3.2.0 and above.
+- `typeWidening`. See [_](delta-type-widening.md). Type widening is available in <Delta> 4.0.0 and above.
 - `v2Checkpoint`. See [V2 Checkpoint Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#v2-spec). Drop support for V2 Checkpoints is available in <Delta> 3.1.0 and above.
 - `columnMapping`. See [_](delta-column-mapping.md). Drop support for column mapping is available in <Delta> 3.3.0 and above.
 - `vacuumProtocolCheck`. See [Vacuum Protocol Check Spec](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#vacuum-protocol-check). Drop support for vacuum protocol check is available in <Delta> 3.3.0 and above.
```
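For context, the table features listed in this diff are removed with Delta's `ALTER TABLE ... DROP FEATURE` SQL command; a minimal sketch (the table name is illustrative):

```sql
-- Drop a previously enabled table feature from a Delta table.
-- `my_table` is a placeholder; TRUNCATE HISTORY is optional and
-- removes the older history that still references the feature.
ALTER TABLE my_table DROP FEATURE typeWidening TRUNCATE HISTORY;
```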
#### New file: Delta Connect documentation page

```diff
+description: Learn about Delta Connect - Spark Connect Support in Delta.
+---
+
+# Delta Connect (aka Spark Connect Support in Delta)
+
+.. note:: This feature is available in <Delta> 4.0.0 and above. Please note, Delta Connect is currently in preview and not recommended for production workloads.
+
+Delta Connect adds [Spark Connect](https://spark.apache.org/docs/latest/spark-connect-overview.html) support to Delta Lake for Apache Spark. Spark Connect is a new initiative that adds a decoupled client-server infrastructure which allows remote connectivity to Spark from everywhere. Delta Connect allows all Delta Lake operations to work in your application running as a client connected to the Spark server.
+
+## Motivation
+
+Delta Connect is expected to bring the same benefits as Spark Connect:
+
+1. Upgrading to more recent versions of Spark and <Delta> is now easier because the client interface is completely decoupled from the server.
+2. Simpler integration of Spark and <Delta> with developer tooling. IDEs no longer have to integrate with the full Spark and <Delta> implementation, and instead can integrate with a thin client.
+3. Support for languages other than Java/Scala and Python. Clients "merely" have to generate Protocol Buffers and therefore become simpler to implement.
+4. Spark and <Delta> will become more stable, as user code no longer runs in the same JVM as Spark's driver.
+5. Remote connectivity. Code can run anywhere now, as there is a gRPC layer between the user interface and the driver.
+
+## How to start the Spark Server with Delta
+
+1. Download `spark-4.0.0-bin-hadoop3.tgz` from [Spark 4.0.0](https://archive.apache.org/dist/spark/spark-4.0.0).
+
+2. Start the Spark Connect server with the <Delta> Connect plugins:
```
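A hedged sketch of what that server-start command could look like; the package coordinate and plugin class names below are assumptions based on the Delta Connect preview, so check the released documentation for the exact artifacts for your version:

```shell
# Sketch only: start the Spark Connect server with Delta Connect plugins.
# The --packages coordinate and the plugin class names are assumptions,
# not confirmed by this PR.
sbin/start-connect-server.sh \
  --packages io.delta:delta-connect-server_2.13:4.0.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.connect.extensions.relation.classes=org.apache.spark.sql.connect.delta.DeltaRelationPlugin" \
  --conf "spark.connect.extensions.command.classes=org.apache.spark.sql.connect.delta.DeltaCommandPlugin"
```

Once the server is up, clients connect over gRPC (e.g. `sc://localhost:15002`) rather than running inside the driver JVM.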
#### docs/source/delta-storage.md (5 additions & 5 deletions)

```diff
@@ -66,11 +66,11 @@ In this default mode, <Delta> supports concurrent reads from multiple clusters,
 This section explains how to quickly start reading and writing Delta tables on S3 using single-cluster mode. For a detailed explanation of the configuration, see [_](#setup-configuration-s3-multi-cluster).
 
-#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 3.5.3 which is pre-built for Hadoop 3.3.4):
+#. Use the following command to launch a Spark shell with <Delta> and S3 support (assuming you use Spark 4.0.0 which is pre-built for Hadoop 3.4.0):
```
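For reference, a launch command of this shape is what the updated line refers to; the Maven coordinates below (Delta 4.0.0 on Scala 2.13, `hadoop-aws` 3.4.0) are assumptions inferred from the version bump, and the S3 credentials are placeholders:

```shell
# Sketch: launch spark-shell with Delta and S3 (s3a) support.
# Coordinates are assumed to match Spark 4.0.0 / Hadoop 3.4.0.
bin/spark-shell \
  --packages io.delta:delta-spark_2.13:4.0.0,org.apache.hadoop:hadoop-aws:3.4.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog" \
  --conf "spark.hadoop.fs.s3a.access.key=<your-s3-access-key>" \
  --conf "spark.hadoop.fs.s3a.secret.key=<your-s3-secret-key>"
```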
#### docs/source/index.md (0 additions & 3 deletions)

```diff
@@ -6,9 +6,6 @@ description: Learn how to use <Delta>.
 # Welcome to the <Delta> documentation
 
-.. note::
-
-  [Delta Lake 4.0 Preview](https://github.com/delta-io/delta/releases/tag/v4.0.0rc1) is released! See the 4.0 Preview documentation [here](https://docs.delta.io/4.0.0-preview/index.html).
```
#### docs/source/quick-start.md (10 additions & 10 deletions)

```diff
@@ -18,13 +18,13 @@ Follow these instructions to set up <Delta> with Spark. You can run the steps in
 #. Run as a project: Set up a Maven or SBT project (Scala or Java) with <Delta>, copy the code snippets into a source file, and run the project. Alternatively, you can use the [examples provided in the Github repository](https://github.com/delta-io/delta/tree/master/examples).
 
-.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `3.3.0`. See the [release compatibility matrix](releases.md) for details.
+.. important:: For all of the following instructions, make sure to install the correct version of Spark or PySpark that is compatible with <Delta> `4.0.0`. See the [release compatibility matrix](releases.md) for details.
 
 ### Prerequisite: set up Java
 
 As mentioned in the official <AS> installation instructions [here](https://spark.apache.org/docs/latest/index.html#downloading), make sure you have a valid Java version installed (8, 11, or 17) and that Java is configured correctly on your system using either the system `PATH` or `JAVA_HOME` environmental variable.
 
-Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `3.3.0`.
+Windows users should follow the instructions in this [blog](https://phoenixnap.com/kb/install-spark-on-windows-10), making sure to use the correct version of <AS> that is compatible with <Delta> `4.0.0`.
 
 ### Set up interactive shell
 
@@ -35,7 +35,7 @@ To use <Delta> interactively within the Spark SQL, Scala, or Python shell, you n
 Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-sql` in the extracted directory.
 Download the [compatible version](releases.md) of <AS> by following instructions from [Downloading Spark](https://spark.apache.org/downloads.html), either using `pip` or by downloading and extracting the archive and running `spark-shell` in the extracted directory.
 
-To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==3.3.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
+To set up a Python project (for example, for unit testing), you can install <Delta> using `pip install delta-spark==4.0.0` and then configure the SparkSession with the `configure_spark_with_delta_pip()` utility function in <Delta>.
```
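To make the `configure_spark_with_delta_pip()` step above concrete, here is a minimal sketch of the kind of configuration that enabling Delta on a SparkSession involves. `delta_builder_config` is a hypothetical illustrative helper, not part of `delta-spark`; the real entry point is `delta.configure_spark_with_delta_pip(builder)`, and the Maven coordinate below is an assumption (Delta 4.0.0 on Scala 2.13):

```python
# Illustrative sketch only: delta_builder_config is NOT a real delta-spark
# function. It shows the Spark config entries commonly set when enabling
# Delta Lake on a SparkSession builder.

def delta_builder_config(delta_version: str, scala_version: str = "2.13") -> dict:
    """Return Spark config entries that enable Delta Lake support."""
    return {
        # Maven coordinate pulled onto the classpath (assumed coordinate).
        "spark.jars.packages": f"io.delta:delta-spark_{scala_version}:{delta_version}",
        # Delta's SQL extension and catalog implementation.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }

conf = delta_builder_config("4.0.0")
for key, value in conf.items():
    print(f"{key}={value}")
```

In practice you would pass a `SparkSession.builder` to `configure_spark_with_delta_pip()` and call `.getOrCreate()` on the result instead of setting these keys by hand.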