Commit 21db3b2

[SPARK-52467] Add dfs-read-write and localstack examples
### What changes were proposed in this pull request?

This PR aims to add `dfs-read-write` and `localstack` examples.

### Why are the changes needed?

To provide the following examples.

1. How to add additional packages
```yaml
spark.jars.packages: "org.apache.hadoop:hadoop-aws:3.4.1"
spark.jars.ivy: "/tmp/.ivy2.5.2"
```

2. How to use S3
```yaml
spark.hadoop.fs.defaultFS: "..."
spark.hadoop.fs.s3a.endpoint: "..."
spark.hadoop.fs.s3a.path.style.access: "..."
spark.hadoop.fs.s3a.access.key: "..."
spark.hadoop.fs.s3a.secret.key: "..."
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually run.

```bash
$ kubectl apply -f examples/localstack.yml
$ kubectl apply -f examples/dfs-read-write.yaml
$ kubectl logs -f dfs-read-write-0-driver
...
Success! Local Word Count 18 and DFS Word Count 18 agree.
...
```

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#242 from dongjoon-hyun/SPARK-52467.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
1 parent 70891ed commit 21db3b2

2 files changed: +99 -0 lines changed

examples/dfs-read-write.yaml

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Since this requires a remote storage, prepare it via `localstack.yml`.
+#
+apiVersion: spark.apache.org/v1beta1
+kind: SparkApplication
+metadata:
+  name: dfs-read-write
+spec:
+  mainClass: "org.apache.spark.examples.DFSReadWriteTest"
+  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
+  driverArgs: [ "/opt/spark/RELEASE", "s3a://data/" ]
+  sparkConf:
+    spark.logConf: "true"
+    spark.jars.packages: "org.apache.hadoop:hadoop-aws:3.4.1"
+    spark.jars.ivy: "/tmp/.ivy2.5.2"
+    spark.driver.memory: "2g"
+    spark.dynamicAllocation.enabled: "true"
+    spark.dynamicAllocation.shuffleTracking.enabled: "true"
+    spark.dynamicAllocation.maxExecutors: "3"
+    spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
+    spark.kubernetes.container.image: "apache/spark:4.0.0-java21-scala"
+    spark.hadoop.fs.defaultFS: "s3a://data"
+    spark.hadoop.fs.s3a.endpoint: "http://localstack:4566"
+    spark.hadoop.fs.s3a.path.style.access: "true"
+    spark.hadoop.fs.s3a.access.key: "test"
+    spark.hadoop.fs.s3a.secret.key: "test"
+  applicationTolerations:
+    resourceRetainPolicy: OnFailure
+  runtimeVersions:
+    sparkVersion: "4.0.0"
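
The `driverArgs` above pass a local file (`/opt/spark/RELEASE`) and a storage location (`s3a://data/`), and the test log in the commit message reports that a local word count and a DFS word count agree. The following is a rough plain-Python sketch of that kind of round-trip check, not the Spark example itself. It assumes the check is a total count of whitespace-separated words, and it uses a temporary directory as a stand-in for `s3a://data/`. The names `word_count` and `round_trip_check` and the output file name are hypothetical.

```python
import tempfile
from pathlib import Path


def word_count(lines):
    """Total number of whitespace-separated, non-empty words."""
    return sum(len(line.split()) for line in lines)


def round_trip_check(local_file, storage_dir):
    """Count words in a local file, then again after a write/read round trip."""
    lines = Path(local_file).read_text().splitlines()
    local_count = word_count(lines)

    # Write the contents to the stand-in "remote" storage, then read them back.
    # The file name is hypothetical; the real job writes under s3a://data/.
    target = Path(storage_dir) / "dfs_read_write_test"
    target.write_text("\n".join(lines))
    stored_count = word_count(target.read_text().splitlines())

    return local_count, stored_count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "RELEASE"
        source.write_text("Spark example build\nfor a word count check\n")
        local, stored = round_trip_check(source, tmp)
        status = "agree" if local == stored else "disagree"
        print(f"Local Word Count {local} and DFS Word Count {stored} {status}.")
```

If the two counts differ, the data read back from storage does not match what was written, which is the failure this kind of example is meant to surface.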

examples/localstack.yml

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+apiVersion: v1
+kind: Pod
+metadata:
+  name: localstack
+  labels:
+    role: s3
+spec:
+  containers:
+  - name: localstack
+    image: localstack/localstack:4
+    resources:
+      limits:
+        cpu: "1"
+        memory: 1Gi
+      requests:
+        cpu: "1"
+        memory: 1Gi
+    ports:
+    - containerPort: 4566
+    lifecycle:
+      postStart:
+        exec:
+          command:
+          - /bin/sh
+          - -c
+          - >
+            awslocal s3 mb s3://data;
+            awslocal s3 cp /opt/code/localstack/Makefile s3://data/
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: localstack
+spec:
+  type: ClusterIP
+  ports:
+  - port: 4566
+    protocol: TCP
+    targetPort: 4566
+  selector:
+    role: s3
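
The two files are tied together by the Service name and port: `spark.hadoop.fs.s3a.endpoint: "http://localstack:4566"` in `dfs-read-write.yaml` points at the `localstack` Service on port 4566, and `s3a://data` is the bucket created by the `postStart` hook. The sketch below shows how the five S3A settings could be derived from those values. The helper `s3a_conf` is hypothetical and not part of this commit, and it assumes the Spark pods and the Service run in the same namespace so that the short Service name resolves.

```python
def s3a_conf(service, port, bucket, access_key, secret_key):
    """Build the S3A-related sparkConf entries for an in-cluster S3-compatible endpoint."""
    return {
        "spark.hadoop.fs.defaultFS": f"s3a://{bucket}",
        # Plain HTTP against the Service DNS name (assumes the same namespace).
        "spark.hadoop.fs.s3a.endpoint": f"http://{service}:{port}",
        # Path-style access puts the bucket in the URL path, not the hostname.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


if __name__ == "__main__":
    # Values taken from the two example files in this commit.
    for key, value in s3a_conf("localstack", 4566, "data", "test", "test").items():
        print(f'{key}: "{value}"')
```

Changing the Service name, port, or bucket in `localstack.yml` would require the matching change in the `sparkConf` section of `dfs-read-write.yaml`.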
