
Commit a5a1915

Document the property customLogDirectory
1 parent a18ae50 commit a5a1915

2 files changed: +87 -5 lines changed

docs/modules/spark-k8s/pages/index.adoc

Lines changed: 2 additions & 2 deletions

@@ -37,8 +37,8 @@ The SparkApplication resource is the main point of interaction with the operator
 An exhaustive list of options is given in the {crd}[SparkApplication CRD reference {external-link-icon}^].
 
 The xref:usage-guide/history-server.adoc[SparkHistoryServer] has a single `node` role.
-It is used to deploy a https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact[Spark history server] that displays application logs from S3 buckets.
-Of course, your applications need to write their logs to the same buckets.
+It is used to deploy a https://spark.apache.org/docs/latest/monitoring.html#viewing-after-the-fact[Spark history server] that displays application logs.
+Of course, your applications need to write their logs to this location.
 
 === Kubernetes resources

docs/modules/spark-k8s/pages/usage-guide/history-server.adoc

Lines changed: 85 additions & 3 deletions

@@ -17,9 +17,7 @@ For more details on how the Stackable Data Platform manages S3 resources see the
 include::example$example-history-server.yaml[]
 ----
 
-<1> The location of the event logs.
-Must be an S3 bucket.
-Future implementations might add support for other shared filesystems such as HDFS.
+<1> The location of the event logs; see <<log-dir-variants>> for other options.
 <2> Directory within the S3 bucket where the log files are located.
 This directory is required and must exist before setting up the history server.
 <3> The S3 bucket definition, here provided in-line.
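For callout <3>, the bucket can be defined in-line directly in the resource. The following is a minimal sketch of such a definition, assuming the in-line `S3Bucket`/`S3Connection` structure used elsewhere in the Stackable documentation; the names `spark-logs`, `test-minio` and `spark-s3-credentials` are hypothetical placeholders:

[source,yaml]
----
logFileDirectory:
  s3:
    prefix: eventlogs/
    bucket:
      inline:                    # in-line bucket definition (sketch)
        bucketName: spark-logs   # placeholder bucket name
        connection:
          inline:
            host: test-minio     # placeholder S3 endpoint
            port: 9000
            credentials:
              secretClass: spark-s3-credentials  # placeholder SecretClass
----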
@@ -56,7 +54,91 @@ include::example$example-history-app.yaml[]
 <5> Bucket to store logs. This must match the bucket used by the history server.
 <6> Credentials used to write event logs. These can, of course, differ from the credentials used to process data.
 
+[#log-dir-variants]
+== Supported file systems for storing log events
+
+=== S3
+
+As already shown in the example above, the event logs can be stored in an S3 bucket:
+
+[source,yaml]
+----
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkHistoryServer
+spec:
+  logFileDirectory:
+    s3:
+      prefix: eventlogs/
+      bucket:
+        ...
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkApplication
+spec:
+  logFileDirectory:
+    s3:
+      prefix: eventlogs/
+      bucket:
+        ...
+----
+
+=== Custom log directory
+
+If there is no dedicated structure for the desired file system, the location can nevertheless be set with the property `customLogDirectory`.
+Additional configuration overrides may be necessary in this case.
+
+For instance, to store the Spark event logs in HDFS, the following configuration could be used:
+
+[source,yaml]
+----
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkHistoryServer
+spec:
+  logFileDirectory:
+    customLogDirectory: hdfs://simple-hdfs/eventlogs/ # <1>
+  nodes:
+    envOverrides:
+      HADOOP_CONF_DIR: /stackable/hdfs-config # <2>
+    podOverrides:
+      spec:
+        containers:
+          - name: spark-history
+            volumeMounts:
+              - name: hdfs-config
+                mountPath: /stackable/hdfs-config
+        volumes:
+          - name: hdfs-config
+            configMap:
+              name: hdfs # <3>
+---
+apiVersion: spark.stackable.tech/v1alpha1
+kind: SparkApplication
+spec:
+  logFileDirectory:
+    customLogDirectory: hdfs://simple-hdfs/eventlogs/ # <4>
+  sparkConf:
+    spark.driver.extraClassPath: /stackable/hdfs-config # <5>
+  driver:
+    config:
+      volumeMounts:
+        - name: hdfs-config
+          mountPath: /stackable/hdfs-config
+  volumes:
+    - name: hdfs-config
+      configMap:
+        name: hdfs
+----
+
+<1> A custom log directory that is used for the Spark option `spark.history.fs.logDirectory`.
+The required dependencies must be on the class path; this is the case for HDFS.
+<2> The Spark History Server looks for the Hadoop configuration in the directory defined by the environment variable `HADOOP_CONF_DIR`.
+<3> The ConfigMap containing the Hadoop configuration files `core-site.xml` and `hdfs-site.xml`.
+<4> A custom log directory that is used for the Spark option `spark.eventLog.dir`.
+Additionally, the Spark option `spark.eventLog.enabled` is set to `true`.
+<5> The Spark driver looks for the Hadoop configuration on the class path.
 
 == History Web UI
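Taken together, the callouts in the diff above translate into a handful of plain Spark options. The following sketch lists the effective settings implied by the HDFS example; the exact rendering is performed by the operator, so this mapping is an assumption for illustration:

[source,yaml]
----
# Effective Spark options implied by the example (sketch, not operator output):
spark.history.fs.logDirectory: hdfs://simple-hdfs/eventlogs/  # history server, from customLogDirectory, callout <1>
spark.eventLog.enabled: "true"                                # set automatically for the application, callout <4>
spark.eventLog.dir: hdfs://simple-hdfs/eventlogs/             # application, from customLogDirectory, callout <4>
spark.driver.extraClassPath: /stackable/hdfs-config           # set explicitly via sparkConf, callout <5>
----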
