Commit a182864

[SPARK-52513] Add a streaming word count example with rolling event logs
### What changes were proposed in this pull request?

This PR aims to add a streaming word count example, `org.apache.spark.examples.streaming.HdfsWordCount`. In addition, this example shows the event log rolling feature, which is enabled by default in Apache Spark 4.0.0.

```yaml
spark.eventLog.enabled: "true"
spark.eventLog.dir: "s3a://spark-events/"
spark.eventLog.rolling.maxFileSize: "10m"
```

```
$ aws s3 --profile localstack ls s3://spark-events/ --recursive
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/appstatus_stream-word-count-0.inprogress
2025-06-17 09:52:02    1957278 eventlog_v2_stream-word-count-0/events_1_stream-word-count-0.zstd
```

### Why are the changes needed?

To show a streaming example with event logs.

### Does this PR introduce _any_ user-facing change?

No behavior change because this is an example.

### How was this patch tested?

Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#251 from dongjoon-hyun/SPARK-52513.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
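To check how many rolled event files an application has produced so far, the listing above can be summarized with a short script. The following is a minimal sketch, not part of this commit: the function name `summarize_event_logs` and the regular expression are illustrative assumptions based only on the file names visible in the listing (`eventlog_v2_<appId>/events_<index>_<appId>.zstd`).

```python
import re
from collections import defaultdict

# Sample lines copied from the listing in the commit message.
LISTING = """\
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/appstatus_stream-word-count-0.inprogress
2025-06-17 09:52:02    1957278 eventlog_v2_stream-word-count-0/events_1_stream-word-count-0.zstd
"""

# Assumed key layout, inferred from the listing above:
# eventlog_v2_<appId>/events_<index>_<appId>[.<codec>]
EVENT_FILE = re.compile(r"^eventlog_v2_(?P<app>[^/]+)/events_(?P<index>\d+)_")


def summarize_event_logs(listing: str) -> dict:
    """Map each application to its event files as sorted (index, size_bytes) pairs."""
    apps = defaultdict(list)
    for line in listing.splitlines():
        parts = line.split(None, 3)  # date, time, size, key
        if len(parts) != 4:
            continue
        _, _, size, key = parts
        match = EVENT_FILE.match(key)
        if match:  # skips the directory marker and the appstatus file
            apps[match["app"]].append((int(match["index"]), int(size)))
    return {app: sorted(files) for app, files in apps.items()}


if __name__ == "__main__":
    for app, files in summarize_event_logs(LISTING).items():
        total = sum(size for _, size in files)
        print(f"{app}: {len(files)} event file(s), {total} bytes")
```

In practice the `LISTING` string would be replaced by the captured output of the `aws s3 ls` command shown above.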
1 parent 5f9b23f commit a182864

2 files changed: +41 -0 lines changed

examples/localstack.yml

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ spec:
   - -c
   - >
     awslocal s3 mb s3://spark-events;
+    awslocal s3 mb s3://ingest;
     awslocal s3 mb s3://data;
     awslocal s3 cp /opt/code/localstack/Makefile s3://data/
 ---

examples/stream-word-count.yaml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+apiVersion: spark.apache.org/v1beta1
+kind: SparkApplication
+metadata:
+  name: stream-word-count
+spec:
+  mainClass: "org.apache.spark.examples.streaming.HdfsWordCount"
+  jars: "local:///opt/spark/examples/jars/spark-examples.jar"
+  driverArgs: [ "s3a://ingest" ]
+  sparkConf:
+    spark.jars.packages: "org.apache.hadoop:hadoop-aws:3.4.1"
+    spark.jars.ivy: "/tmp/.ivy2.5.2"
+    spark.dynamicAllocation.enabled: "true"
+    spark.dynamicAllocation.shuffleTracking.enabled: "true"
+    spark.dynamicAllocation.maxExecutors: "3"
+    spark.kubernetes.authenticate.driver.serviceAccountName: "spark"
+    spark.kubernetes.container.image: "apache/spark:4.0.0-java21-scala"
+    spark.log.level: "WARN"
+    spark.eventLog.enabled: "true"
+    spark.eventLog.dir: "s3a://spark-events/"
+    spark.eventLog.rolling.maxFileSize: "10m"
+    spark.hadoop.fs.s3a.endpoint: "http://localstack:4566"
+    spark.hadoop.fs.s3a.path.style.access: "true"
+    spark.hadoop.fs.s3a.access.key: "test"
+    spark.hadoop.fs.s3a.secret.key: "test"
+  runtimeVersions:
+    sparkVersion: "4.0.0"
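
For orientation, the application above runs `HdfsWordCount` against the `s3a://ingest` bucket, counting words in text files that arrive there. The per-batch computation can be approximated in plain Python as below. This is a hedged sketch of the idea only, not the Spark example's code: `count_words` is a hypothetical name, and it splits on any run of whitespace, which may differ from the tokenization in the Scala example.

```python
from collections import Counter
from typing import Iterable


def count_words(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated tokens across the lines of one batch."""
    counts: Counter = Counter()
    for line in lines:
        # str.split() with no argument drops empty tokens from repeated spaces.
        counts.update(line.split())
    return counts


if __name__ == "__main__":
    # Hypothetical batch: lines from files newly copied into the ingest bucket.
    batch = ["spark streaming word count", "word count with rolling event logs"]
    for word, n in sorted(count_words(batch).items()):
        print(word, n)
```

Each streaming batch applies the same counting to whatever new lines arrived since the previous batch, so a long-running job of this kind keeps emitting events, which is what makes it a convenient driver for the rolling event log settings in `sparkConf`.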
