Commit fea29f8

[SPARK-52196] Promote SparkSQLRepl code to Spark SQL REPL example
### What changes were proposed in this pull request?

This PR aims to refactor the existing `SparkSQLRepl` code into a `Spark SQL REPL` example, like the other examples.

### Why are the changes needed?

For consistency.

### Does this PR introduce _any_ user-facing change?

No behavior change in the `SparkConnect` library code.

### How was this patch tested?

Pass the CIs and manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #160 from dongjoon-hyun/SPARK-52196.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 9c63317 commit fea29f8

7 files changed, +227 -102 lines changed

Examples/spark-sql/Dockerfile

Lines changed: 46 additions & 0 deletions

```diff
@@ -0,0 +1,46 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM swift:6.1 AS builder
+
+WORKDIR /app
+
+COPY . .
+
+RUN swift build -c release
+
+FROM swift:6.1-slim
+
+ARG SPARK_UID=185
+
+LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
+LABEL org.opencontainers.image.licenses="Apache-2.0"
+LABEL org.opencontainers.image.ref.name="Apache Spark Connect for Swift"
+
+ENV SPARK_SWIFT_HOME=/opt/spark-swift
+ENV SPARK_SWIFT_APP=SparkConnectSwiftSQLRepl
+
+WORKDIR $SPARK_SWIFT_HOME
+
+RUN groupadd --system --gid=$SPARK_UID spark && \
+    useradd --system --home-dir $SPARK_SWIFT_HOME --uid=$SPARK_UID --gid=spark spark && \
+    chown -R spark:spark $SPARK_SWIFT_HOME
+
+COPY --from=builder --chown=spark:spark /app/.build/*-unknown-linux-gnu/release/$SPARK_SWIFT_APP .
+
+USER spark
+
+ENTRYPOINT ["/bin/sh", "-c", "$SPARK_SWIFT_HOME/$SPARK_SWIFT_APP"]
```

Examples/spark-sql/Package.swift

Lines changed: 37 additions & 0 deletions

```diff
@@ -0,0 +1,37 @@
+// swift-tools-version: 6.0
+//
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+//
+
+import PackageDescription
+
+let package = Package(
+  name: "SparkConnectSwiftSQLRepl",
+  platforms: [
+    .macOS(.v15)
+  ],
+  dependencies: [
+    .package(url: "https://github.com/apache/spark-connect-swift.git", branch: "main")
+  ],
+  targets: [
+    .executableTarget(
+      name: "SparkConnectSwiftSQLRepl",
+      dependencies: [.product(name: "SparkConnect", package: "spark-connect-swift")]
+    )
+  ]
+)
```

Examples/spark-sql/README.md

Lines changed: 118 additions & 0 deletions

````diff
@@ -0,0 +1,118 @@
+# A `Spark SQL REPL` Application with Apache Spark Connect Swift Client
+
+This is an example Swift application to show how to develop a Spark SQL REPL(Read-eval-print Loop) with Apache Spark Connect Swift Client library.
+
+## How to run
+
+Prepare `Spark Connect Server` via running Docker image.
+
+```
+docker run -it --rm -p 15002:15002 apache/spark:4.0.0-preview2 bash -c "/opt/spark/sbin/start-connect-server.sh --wait"
+```
+
+Build an application Docker image.
+
+```
+$ docker build -t apache/spark-connect-swift:spark-sql .
+$ docker images apache/spark-connect-swift:spark-sql
+REPOSITORY                   TAG         IMAGE ID       CREATED         SIZE
+apache/spark-connect-swift   spark-sql   265ddfec650d   7 seconds ago   390MB
+```
+
+Run `spark-sql` docker image.
+
+```
+$ docker run -it --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark-connect-swift:spark-sql
+Connected to Apache Spark 4.0.0-preview2 Server
+spark-sql (default)> SHOW DATABASES;
++---------+
+|namespace|
++---------+
+|default  |
++---------+
+
+Time taken: 30 ms
+spark-sql (default)> CREATE DATABASE db1;
+++
+||
+++
+++
+
+Time taken: 31 ms
+spark-sql (default)> USE db1;
+++
+||
+++
+++
+
+Time taken: 27 ms
+spark-sql (db1)> CREATE TABLE t1 AS SELECT * FROM RANGE(10);
+++
+||
+++
+++
+
+Time taken: 99 ms
+spark-sql (db1)> SELECT * FROM t1;
++---+
+| id|
++---+
+|  1|
+|  5|
+|  3|
+|  0|
+|  6|
+|  9|
+|  4|
+|  8|
+|  7|
+|  2|
++---+
+
+Time taken: 80 ms
+spark-sql (db1)> USE default;
+++
+||
+++
+++
+
+Time taken: 26 ms
+spark-sql (default)> DROP DATABASE db1 CASCADE;
+++
+||
+++
+++
+spark-sql (default)> exit;
+```
+
+Apache Spark 4 supports [SQL Pipe Syntax](https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc6-docs/_site/sql-pipe-syntax.html).
+
+```
+$ swift run
+...
+Build of product 'SparkSQLRepl' complete! (2.33s)
+Connected to Apache Spark 4.0.0 Server
+spark-sql (default)>
+FROM ORC.`/opt/spark/examples/src/main/resources/users.orc`
+|> AGGREGATE COUNT(*) cnt
+   GROUP BY name
+|> ORDER BY cnt DESC, name ASC
+;
++------+---+
+|  name|cnt|
++------+---+
+|Alyssa|  1|
+|   Ben|  1|
++------+---+
+
+Time taken: 159 ms
+```
+
+Run from source code.
+
+```
+$ swift run
+...
+Connected to Apache Spark 4.0.0.9-apple-SNAPSHOT Server
+spark-sql (default)>
+```
````
File renamed without changes.
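The transcripts above show two input behaviors worth noting: the SQL pipe-syntax query spans several lines and only runs once a lone `;` is entered, and `exit;` ends the session. The following standard-library-only Swift sketch shows one way to buffer input like that, assuming statements are simply split on `;` and that `exit` is matched case-insensitively. `StatementBuffer`, `trimWhitespace`, and `isExitCommand` are hypothetical names; this is not the REPL's actual source.

```swift
/// Hypothetical helper: removes leading and trailing whitespace
/// (including newlines) without depending on Foundation.
func trimWhitespace(_ text: Substring) -> String {
  var slice = text
  while let first = slice.first, first.isWhitespace { slice = slice.dropFirst() }
  while let last = slice.last, last.isWhitespace { slice = slice.dropLast() }
  return String(slice)
}

/// Hypothetical accumulator for REPL input. Lines are buffered until a `;`
/// arrives; each `;` completes one statement. Semicolons inside string
/// literals or comments are NOT handled by this sketch.
struct StatementBuffer {
  private var pending = ""

  /// True when part of a statement has been typed but not yet terminated.
  var hasPendingInput: Bool {
    !pending.allSatisfy { $0.isWhitespace }
  }

  /// Appends one input line and returns the statements it completes, in order.
  /// Empty statements (for example a stray `;`) are skipped.
  mutating func append(line: String) -> [String] {
    pending += line + "\n"
    var completed: [String] = []
    while let end = pending.firstIndex(of: ";") {
      let statement = trimWhitespace(pending[..<end])
      if !statement.isEmpty { completed.append(statement) }
      // Keep whatever follows the `;` for the next statement.
      pending = String(pending[pending.index(after: end)...])
    }
    return completed
  }
}

/// Hypothetical check mirroring the `exit;` command in the transcripts.
func isExitCommand(_ statement: String) -> Bool {
  statement.lowercased() == "exit"
}
```

A loop built on this could read lines with the standard library's `readLine()`, print the `spark-sql (<database>)>` prompt only while `hasPendingInput` is false, and hand each completed statement to the Spark Connect session. Splitting naively on `;` keeps the sketch short but would mis-split a semicolon inside a quoted string.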

Package.swift

Lines changed: 0 additions & 4 deletions

```diff
@@ -52,10 +52,6 @@ let package = Package(
         .process("Documentation.docc")
       ]
     ),
-    .executableTarget(
-      name: "SparkSQLRepl",
-      dependencies: ["SparkConnect"]
-    ),
     .testTarget(
       name: "SparkConnectTests",
       dependencies: ["SparkConnect"],
```

README.md

Lines changed: 2 additions & 98 deletions

````diff
@@ -114,103 +114,7 @@ SELECT * FROM t
 +----+
 ```
 
-You can find more complete examples including Web Server and Streaming applications in the `Examples` directory.
+You can find more complete examples including `Spark SQL REPL`, `Web Server` and `Streaming` applications in the [Examples](https://github.com/apache/spark-connect-swift/tree/main/Examples) directory.
 
-## How to use `Spark SQL REPL` via `Spark Connect for Swift`
+This library also supports `SPARK_REMOTE` environment variable to specify the [Spark Connect connection string](https://spark.apache.org/docs/latest/spark-connect-overview.html#set-sparkremote-environment-variable) in order to provide more options.
 
-This project also provides `Spark SQL REPL`. You can run it directly from this repository.
-
-```bash
-$ swift run
-...
-Build of product 'SparkSQLRepl' complete! (2.33s)
-Connected to Apache Spark 4.0.0 Server
-spark-sql (default)> SHOW DATABASES;
-+---------+
-|namespace|
-+---------+
-|  default|
-+---------+
-
-Time taken: 30 ms
-spark-sql (default)> CREATE DATABASE db1;
-++
-||
-++
-++
-
-Time taken: 31 ms
-spark-sql (default)> USE db1;
-++
-||
-++
-++
-
-Time taken: 27 ms
-spark-sql (db1)> CREATE TABLE t1 AS SELECT * FROM RANGE(10);
-++
-||
-++
-++
-
-Time taken: 99 ms
-spark-sql (db1)> SELECT * FROM t1;
-+---+
-| id|
-+---+
-|  1|
-|  5|
-|  3|
-|  0|
-|  6|
-|  9|
-|  4|
-|  8|
-|  7|
-|  2|
-+---+
-
-Time taken: 80 ms
-spark-sql (db1)> USE default;
-++
-||
-++
-++
-
-Time taken: 26 ms
-spark-sql (default)> DROP DATABASE db1 CASCADE;
-++
-||
-++
-++
-spark-sql (default)> exit;
-```
-
-Apache Spark 4 supports [SQL Pipe Syntax](https://dist.apache.org/repos/dist/dev/spark/v4.0.0-rc6-docs/_site/sql-pipe-syntax.html).
-
-```
-$ swift run
-...
-Build of product 'SparkSQLRepl' complete! (2.33s)
-Connected to Apache Spark 4.0.0 Server
-spark-sql (default)>
-FROM ORC.`/opt/spark/examples/src/main/resources/users.orc`
-|> AGGREGATE COUNT(*) cnt
-   GROUP BY name
-|> ORDER BY cnt DESC, name ASC
-;
-+------+---+
-|  name|cnt|
-+------+---+
-|Alyssa|  1|
-|   Ben|  1|
-+------+---+
-
-Time taken: 159 ms
-```
-
-You can use `SPARK_REMOTE` to specify the [Spark Connect connection string](https://spark.apache.org/docs/latest/spark-connect-overview.html#set-sparkremote-environment-variable) in order to provide more options.
-
-```bash
-SPARK_REMOTE=sc://localhost swift run
-```
````
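The new root README text points to the `SPARK_REMOTE` environment variable, and the Docker examples pass `SPARK_REMOTE=sc://host.docker.internal:15002`. As a rough illustration of what such a value carries, here is a standard-library-only Swift sketch that parses strings of the form `sc://host[:port][/;key=value;...]`, the general shape Spark Connect uses for connection strings, assuming port 15002 (the port the server container publishes above) when none is given. `RemoteEndpoint`, `RemoteParseError`, and `parseRemote` are hypothetical names; this is not the `SparkConnect` library's parser, and it ignores IPv6 literals and value escaping.

```swift
/// Hypothetical parsed form of a Spark Connect connection string.
struct RemoteEndpoint: Equatable {
  var host: String
  var port: Int
  var parameters: [String: String]
}

/// Hypothetical error cases for malformed connection strings.
enum RemoteParseError: Error, Equatable {
  case missingScheme
  case missingHost
  case invalidPort(String)
}

/// Parses strings shaped like `sc://host[:port][/;key=value;...]`.
/// Assumption: the port defaults to 15002 when omitted.
func parseRemote(_ remote: String, defaultPort: Int = 15002) throws -> RemoteEndpoint {
  let scheme = "sc://"
  guard remote.hasPrefix(scheme) else { throw RemoteParseError.missingScheme }
  let rest = remote.dropFirst(scheme.count)

  // Separate "host[:port]" from the optional "/;key=value;..." suffix.
  let authority: Substring
  let suffix: Substring
  if let slash = rest.firstIndex(of: "/") {
    authority = rest[..<slash]
    suffix = rest[rest.index(after: slash)...]
  } else {
    authority = rest
    suffix = ""
  }

  var host = authority
  var port = defaultPort
  if let colon = authority.firstIndex(of: ":") {
    host = authority[..<colon]
    let portText = authority[authority.index(after: colon)...]
    guard let parsed = Int(portText), (1...65535).contains(parsed) else {
      throw RemoteParseError.invalidPort(String(portText))
    }
    port = parsed
  }
  guard !host.isEmpty else { throw RemoteParseError.missingHost }

  // Parameters are ';'-separated "key=value" pairs; malformed pieces are ignored.
  var parameters: [String: String] = [:]
  for pair in suffix.split(separator: ";") {
    let parts = pair.split(separator: "=", maxSplits: 1)
    if parts.count == 2 { parameters[String(parts[0])] = String(parts[1]) }
  }
  return RemoteEndpoint(host: String(host), port: port, parameters: parameters)
}
```

In an application the string would come from the environment, for example via Foundation's `ProcessInfo.processInfo.environment["SPARK_REMOTE"]`, falling back to a local default when the variable is unset.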

Sources/SparkConnect/Documentation.docc/Examples.md

Lines changed: 24 additions & 0 deletions

````diff
@@ -37,6 +37,30 @@ docker run -it --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark
 swift run
 ```
 
+## Spark SQL REPL(Read-Eval-Print Loop) Example
+
+The Spark SQL REPL application example demonstrates interactive operations with ad-hoc Spark SQL queries with Apache Spark Connect, including:
+- Connecting to a Spark server
+- Receiving ad-hoc Spark SQL queries from users
+- Show the SQL results interactively
+
+### Key Features
+- Spark SQL execution for table operations
+- User interactions
+
+### How to Run
+
+Build and run the application:
+
+```bash
+# Using Docker
+docker build -t apache/spark-connect-swift:spark-sql .
+docker run -it --rm -e SPARK_REMOTE=sc://host.docker.internal:15002 apache/spark-connect-swift:spark-sql
+
+# From source code
+swift run
+```
+
 ## Pi Calculation Example
 
 The Pi calculation example shows how to use Spark Connect Swift for computational tasks by calculating an approximation of π (pi) using the Monte Carlo method.
````
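Each statement in the REPL transcripts is followed by a `Time taken: N ms` line. A minimal way to produce such a line in Swift, assuming a monotonic clock and truncation to whole milliseconds, is sketched below with `DispatchTime` from the Dispatch module (available on both macOS and Linux). `timed` and `formatTimeTaken` are hypothetical helpers, not the example's actual implementation; in a real REPL the closure would execute one SQL statement and print its result table.

```swift
import Dispatch

/// Formats an elapsed time the way the transcripts display it,
/// truncating to whole milliseconds.
func formatTimeTaken(nanoseconds: UInt64) -> String {
  "Time taken: \(nanoseconds / 1_000_000) ms"
}

/// Runs `work` and returns its result together with the formatted elapsed time.
/// `DispatchTime.now().uptimeNanoseconds` is monotonic, so `end >= start`.
func timed<T>(_ work: () throws -> T) rethrows -> (result: T, message: String) {
  let start = DispatchTime.now().uptimeNanoseconds
  let result = try work()
  let end = DispatchTime.now().uptimeNanoseconds
  return (result, formatTimeTaken(nanoseconds: end - start))
}
```

Keeping the formatting separate from the measurement makes the message easy to check without depending on how long a query actually runs.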
