Commit ae81469

Update helm charts to reflect HDFS experiment learnings (#8)
* Update helm charts
* Clean up README.md
* Update README
* Address review comments
1 parent 8dd3d95 commit ae81469

10 files changed: +176 -42 lines changed

charts/README.md

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+ Helm charts for launching HDFS in a K8s cluster. They should be launched in
+ the following order.
+
+ 1. `hdfs-resolv-conf`: Creates a config map containing resolv.conf used by
+ the HDFS daemons. See `hdfs-resolv-conf/README.md` for how to launch.
+ 2. `hdfs-namenode-k8s`: Launches the hdfs namenode. See
+ `hdfs-namenode-k8s/README.md` for how to launch.
+ 3. `hdfs-datanode-k8s`: Launches the hdfs datanode daemons. See
+ `hdfs-datanode-k8s/README.md` for how to launch.
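
As a rough illustration of the launch order this README describes, the Python sketch below assembles the three `helm install` commands in sequence. The release names and the `clusterDnsIP` parameter come from the per-chart READMEs in this commit; the helper name `launch_commands` is hypothetical, and the script only builds command strings. It does not run Helm.

```python
# Chart launch order taken from charts/README.md in this commit.
# Each entry is (release name, chart directory).
LAUNCH_ORDER = [
    ("my-hdfs-resolv-conf", "hdfs-resolv-conf"),
    ("my-hdfs-namenode", "hdfs-namenode-k8s"),
    ("my-hdfs-datanode", "hdfs-datanode-k8s"),
]


def launch_commands(cluster_dns_ip="MY-KUBE-DNS-IP"):
    """Return the helm install commands in launch order (hypothetical helper)."""
    commands = []
    for release, chart in LAUNCH_ORDER:
        command = f"helm install -n {release}"
        if chart == "hdfs-resolv-conf":
            # After this commit, only the resolv.conf chart takes the kube-dns IP.
            command += f" --set clusterDnsIP={cluster_dns_ip}"
        commands.append(f"{command} {chart}")
    return commands


if __name__ == "__main__":
    for line in launch_commands():
        print(line)
```

The `-n` flag is used as in the READMEs, where it names the release.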

charts/hdfs-datanode-k8s/README.md

Lines changed: 6 additions & 16 deletions
@@ -6,34 +6,24 @@ HDFS `datanodes` running inside a kubernetes cluster. See the other chart for
  Requires Kubernetes version 1.5 and beyond, because `namenode` is using
  `StatefulSet`, which is available only in version 1.5 and later.

- Make sure `namenode` is fully launched using the other chart. `Datanodes` rely
+ Make sure the `hdfs-resolv-conf` chart is launched. Also ensure `namenode` is
+ fully launched using the corresponding chart. `Datanodes` rely
  on DNS to resolve the hostname of the namenode when they start up.

  ### Usage

- 1. Find the service IP of your `kube-dns` of your k8s cluster.
- Try the following command and find the IP value in the output.
- It will be supplied below as the `clusterDnsIP` parameter.
-
- ```
- $ kubectl get svc --all-namespaces | grep kube-dns
- ```
-
- 2. Optionally, find the domain name of your k8s cluster that become part of
+ 1. Optionally, find the domain name of your k8s cluster that becomes part of
  pod and service host names. Default is `cluster.local`. See `values.yaml`
  for additional parameters to change. You can add them below in `--set`,
  as comma-separated entries.

- 3. Launch this helm chart, `hdfs-datanode-k8s`, while specifying
- the kube-dns name server IP and other parameters. (You can add multiple
- of them below in --set as comma-separated entries)
+ 2. Launch this helm chart, `hdfs-datanode-k8s`.

  ```
- $ helm install -n my-hdfs-datanode \
- --set clusterDnsIP=YOUR-KUBE-DNS-IP hdfs-datanode-k8s
+ $ helm install -n my-hdfs-datanode hdfs-datanode-k8s
  ```

- 5. Confirm the daemons are launched.
+ 3. Confirm the daemons are launched.

  ```
  $ kubectl get pods | grep hdfs-datanode-

charts/hdfs-datanode-k8s/templates/datanode-daemonset.yaml

Lines changed: 25 additions & 16 deletions
@@ -12,16 +12,7 @@
  # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  # See the License for the specific language governing permissions and
  # limitations under the License.
- apiVersion: v1
- kind: ConfigMap
- metadata:
- name: resolv-conf-datanode
- data:
- resolv.conf: |
- search kube-system.svc.{{ .Values.clusterDomain }} svc.{{ .Values.clusterDomain }} {{ .Values.clusterDomain }}
- nameserver {{ .Values.clusterDnsIP }}
- options ndots:5
- ---
+
  # Deleting a daemonset may need some trick. See
  # https://github.com/kubernetes/kubernetes/issues/33245#issuecomment-261250489
  apiVersion: extensions/v1beta1
@@ -40,10 +31,24 @@ spec:
  - name: datanode
  image: uhopper/hadoop-datanode:2.7.2
  env:
+ # The below uses two loops to make sure the last item does not have comma. It uses index 0
+ # for the last item since that is the only special index that helm template gives us.
+ - name: HDFS_CONF_dfs_datanode_data_dir
+ value: |-
+ {{- range $index, $path := .Values.dataNodeHostPath }}
+ {{- if ne $index 0 }}
+ /hadoop/dfs/data/{{ $index }},
+ {{- end }}
+ {{- end }}
+ {{- range $index, $path := .Values.dataNodeHostPath }}
+ {{- if eq $index 0 }}
+ /hadoop/dfs/data/{{ $index }}
+ {{- end }}
+ {{- end }}
  # This works only with /etc/resolv.conf mounted from the config map.
  # K8s version 1.6 will fix this, per https://github.com/kubernetes/kubernetes/pull/29378.
  - name: CORE_CONF_fs_defaultFS
- value: hdfs://hdfs-namenode-0.hdfs-namenode.kube-system.svc.{{ .Values.clusterDomain }}:8020
+ value: hdfs://hdfs-namenode-0.hdfs-namenode.default.svc.{{ .Values.clusterDomain }}:8020
  livenessProbe:
  initialDelaySeconds: 30
  httpGet:
@@ -53,20 +58,24 @@ spec:
  securityContext:
  privileged: true
  volumeMounts:
- - name: hdfs-data
- mountPath: /hadoop/dfs/data
+ {{- range $index, $path := .Values.dataNodeHostPath }}
+ - name: hdfs-data-{{ $index }}
+ mountPath: /hadoop/dfs/data/{{ $index }}
+ {{- end }}
  # Use subPath below to mount only a single file.
  # See https://github.com/dshulyak/kubernetes.github.io/commit/d58ba7b075bb4848349a2c920caaa08ff3773d70
  - name: resolv-conf-volume
  mountPath: /etc/resolv.conf
  subPath: resolv.conf
  restartPolicy: Always
  volumes:
- - name: hdfs-data
+ {{- range $index, $path := .Values.dataNodeHostPath }}
+ - name: hdfs-data-{{ $index }}
  hostPath:
- path: {{ .Values.dataNodeHostPath }}
+ path: {{ $path }}
+ {{- end }}
  - configMap:
- name: resolv-conf-datanode
+ name: hdfs-resolv-conf
  items:
  - key: resolv.conf
  path: resolv.conf
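
To make the two-loop trick in the `HDFS_CONF_dfs_datanode_data_dir` value easier to follow, here is a small Python sketch that mirrors only the ordering and comma placement of the template. The function name `render_data_dirs` is hypothetical, and joining entries with newlines is an approximation: the exact whitespace of real Helm output depends on the template's trim markers and is not modeled here.

```python
def render_data_dirs(host_paths):
    """Mimic the ordering of the two template loops (hypothetical helper)."""
    lines = []
    # First loop: every index except 0, each followed by a comma.
    for index, _path in enumerate(host_paths):
        if index != 0:
            lines.append(f"/hadoop/dfs/data/{index},")
    # Second loop: index 0 only, emitted last with no trailing comma.
    for index, _path in enumerate(host_paths):
        if index == 0:
            lines.append(f"/hadoop/dfs/data/{index}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_data_dirs(["/hdfs-data", "/hdfs-data1", "/hdfs-data2"]))
```

With the default single-entry list the value is just `/hadoop/dfs/data/0`. With more entries, index 0 is listed last so that the final item carries no comma.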

charts/hdfs-datanode-k8s/values.yaml

Lines changed: 6 additions & 8 deletions
@@ -15,15 +15,13 @@

  # Default values for template variables.

- # Set this to the IP of your kube-dns name server that resolv POD host names.
- # This is used by datanode daemons when they connect to namenode using
- # the host name.
- clusterDnsIP: 10.96.0.10
-
  # Set this to the domain name of your cluster that become part of POD and service
  # host names.
  clusterDomain: cluster.local

- # Path of the local disk directory on cluster nodes that will contain the datanode
- # blocks. This will be mounted to the namenode as a k8s HostPath volume.
- dataNodeHostPath: /hdfs-data
+ # A list of the local disk directories on cluster nodes that will contain the datanode
+ # blocks. These paths will be mounted to the datanode as K8s HostPath volumes.
+ # In a command line, the list should be enclosed in '{' and '}'.
+ # e.g. --set "dataNodeHostPath={/hdfs-data,/hdfs-data1}"
+ dataNodeHostPath:
+ - /hdfs-data
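
Since `dataNodeHostPath` is now a list, each entry becomes one hostPath volume and one mount in the datanode DaemonSet template. The Python sketch below shows that mapping for a given list, for example the one produced by `--set "dataNodeHostPath={/hdfs-data,/hdfs-data1}"`. The helper name `volume_plan` and the dictionary keys are illustrative assumptions; the volume names and mount paths follow the patterns in the template.

```python
def volume_plan(host_paths):
    """Pair each host path with the volume name and mount path derived from its index."""
    return [
        {
            "name": f"hdfs-data-{index}",              # volume name in the DaemonSet
            "hostPath": path,                          # directory on the cluster node
            "mountPath": f"/hadoop/dfs/data/{index}",  # directory inside the datanode pod
        }
        for index, path in enumerate(host_paths)
    ]


if __name__ == "__main__":
    for volume in volume_plan(["/hdfs-data", "/hdfs-data1"]):
        print(volume)
```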

charts/hdfs-namenode-k8s/README.md

Lines changed: 3 additions & 1 deletion
@@ -6,6 +6,8 @@ HDFS `namenode` running inside a kubernetes cluster. See the other chart for
  Requires Kubernetes version 1.5 and beyond, because `namenode` is using
  `StatefulSet`, which is available only in version 1.5 and later.

+ Make sure the `hdfs-resolv-conf` chart is launched.
+
  ### Usage

  1. Attach a label to one of your k8s cluster host that will run the `namenode`
@@ -20,7 +22,7 @@ HDFS `namenode` running inside a kubernetes cluster. See the other chart for
  2. Launch this helm chart, `hdfs-namenode-k8s`.

  ```
- $ helm install -n my-hdfs-namenode hdfs-k8s
+ $ helm install -n my-hdfs-namenode hdfs-namenode-k8s
  ```

  3. Confirm the daemon is launched.

charts/hdfs-namenode-k8s/templates/namenode-statefulset.yaml

Lines changed: 12 additions & 1 deletion
@@ -35,7 +35,7 @@ metadata:
  spec:
  serviceName: "hdfs-namenode"
  # Create a size-1 set. The namenode DNS name will be
- # hdfs-namenode-0.hdfs-namenode.kube-system.svc.YOUR-CLUSTER-DOMAIN
+ # hdfs-namenode-0.hdfs-namenode.default.svc.YOUR-CLUSTER-DOMAIN
  replicas: 1
  template:
  metadata:
@@ -58,6 +58,11 @@ spec:
  volumeMounts:
  - name: hdfs-name
  mountPath: /hadoop/dfs/name
+ # Use subPath below to mount only a single file.
+ # See https://github.com/dshulyak/kubernetes.github.io/commit/d58ba7b075bb4848349a2c920caaa08ff3773d70
+ - name: resolv-conf-volume
+ mountPath: /etc/resolv.conf
+ subPath: resolv.conf
  # Pin the pod to a node. You can label your node like below:
  # $ kubectl label nodes YOUR-NODE hdfs-namenode-selector=hdfs-namenode-0
  nodeSelector:
@@ -67,3 +72,9 @@ spec:
  - name: hdfs-name
  hostPath:
  path: {{ .Values.nameNodeHostPath }}
+ - configMap:
+ name: hdfs-resolv-conf
+ items:
+ - key: resolv.conf
+ path: resolv.conf
+ name: resolv-conf-volume
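
The namenode DNS name in the comment above follows the StatefulSet pattern of pod name, service name, namespace, `svc`, and cluster domain. As a minimal sketch, the Python helper below composes the `fs.defaultFS` URL that the datanode template sets in `CORE_CONF_fs_defaultFS`. The function name `namenode_default_fs` and its parameters are hypothetical; the pod name, service name, `default` namespace, and port 8020 are the values that appear in this commit.

```python
def namenode_default_fs(cluster_domain="cluster.local", namespace="default", port=8020):
    """Build the HDFS URL of the single namenode pod (hypothetical helper)."""
    pod_name = "hdfs-namenode-0"    # first and only pod of the size-1 StatefulSet
    service_name = "hdfs-namenode"  # serviceName of the StatefulSet
    host = f"{pod_name}.{service_name}.{namespace}.svc.{cluster_domain}"
    return f"hdfs://{host}:{port}"


if __name__ == "__main__":
    print(namenode_default_fs())
```

Passing `namespace="kube-system"` reproduces the name used before this commit.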

charts/hdfs-resolv-conf/Chart.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+ # Licensed to the Apache Software Foundation (ASF) under one or more
+ # contributor license agreements. See the NOTICE file distributed with
+ # this work for additional information regarding copyright ownership.
+ # The ASF licenses this file to You under the Apache License, Version 2.0
+ # (the "License"); you may not use this file except in compliance with
+ # the License. You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ name: hdfs-resolv-conf
+ version: 0.1
+ description: ConfigMap entry containing resolv.conf used by HDFS daemons.

charts/hdfs-resolv-conf/README.md

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+ ConfigMap entry storing the resolv.conf file that goes inside HDFS `namenode`
+ and `datanode` pods.
+
+ ### Usage
+
+ 1. Find the service IP of your `kube-dns` of your k8s cluster.
+ Try the following command and find the IP value in the output.
+ It will be supplied below as the `clusterDnsIP` parameter.
+
+ ```
+ $ kubectl get svc --all-namespaces | grep kube-dns
+ ```
+
+ 2. Find the domain name of your cluster that is part of
+ cluster node host names. e.g. MYCOMPANY.COM in kube-n1.MYCOMPANY.COM.
+ Default is "". This will be supplied below as
+ the `hostNetworkDomains` parameter. You can find these from the `search`
+ line in the following `kubectl run` output. `hostNetworkDomains` comes
+ after the pod and service domain name such as `cluster.local`.
+
+ ```
+ $ kubectl run -i -t --rm busybox --image=busybox --restart=Never \
+ --command -- cat /etc/resolv.conf
+ ...
+ search default.svc.cluster.local svc.cluster.local cluster.local MYCOMPANY.COM
+ ...
+ ```
+
+ See `values.yaml`
+ for additional parameters to change.
+
+ 3. Launch this helm chart, `hdfs-resolv-conf`, while specifying
+ the kube-dns name server IP and other parameters. (You can add multiple
+ of them below in --set as comma-separated entries)
+
+ ```
+ $ helm install -n my-hdfs-resolv-conf \
+ --set clusterDnsIP=MY-KUBE-DNS-IP,hostNetworkDomains=MYCOMPANY.COM \
+ hdfs-resolv-conf
+ ```
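
Step 2 of this README asks the reader to pick the host network domains out of the `search` line by eye. The Python sketch below does the same thing mechanically, assuming the cluster entries are exactly those that equal the cluster domain or end with it. The function name `host_network_domains` is hypothetical, and the input is the text printed by the `kubectl run` command shown in the README.

```python
def host_network_domains(resolv_conf_text, cluster_domain="cluster.local"):
    """Return the search domains that are not pod or service domains, space-separated."""
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            # Keep only entries that are neither the cluster domain nor a subdomain of it.
            extra = [
                domain
                for domain in fields[1:]
                if domain != cluster_domain and not domain.endswith("." + cluster_domain)
            ]
            return " ".join(extra)
    return ""  # no search line found


if __name__ == "__main__":
    sample = "search default.svc.cluster.local svc.cluster.local cluster.local MYCOMPANY.COM\n"
    print(host_network_domains(sample))
```

The result can be passed as the `hostNetworkDomains` value; an empty string matches the chart default.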
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+ # Licensed to the Apache Software Foundation (ASF) under one or more
+ # contributor license agreements. See the NOTICE file distributed with
+ # this work for additional information regarding copyright ownership.
+ # The ASF licenses this file to You under the Apache License, Version 2.0
+ # (the "License"); you may not use this file except in compliance with
+ # the License. You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ # Required for namenode and datanode to access kube-dns. This is required especially for datanode to
+ # access namenode using the statefulset name.
+ # K8s version 1.6 will fix this, per https://github.com/kubernetes/kubernetes/pull/29378.
+ apiVersion: v1
+ kind: ConfigMap
+ metadata:
+ name: hdfs-resolv-conf
+ data:
+ resolv.conf: |
+ search default.svc.{{ .Values.clusterDomain }} svc.{{ .Values.clusterDomain }} {{ .Values.clusterDomain }} {{ .Values.hostNetworkDomains }}
+ nameserver {{ .Values.clusterDnsIP }}
+ options ndots:5
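
For readers who want to see what this ConfigMap template produces, the Python sketch below renders the same three `resolv.conf` lines from the chart's values. The function name `render_resolv_conf` is hypothetical, and the defaults mirror the chart's `values.yaml`. One deliberate difference: the sketch drops the trailing space that the template would leave on the `search` line when `hostNetworkDomains` is empty.

```python
def render_resolv_conf(cluster_dns_ip="10.96.0.10",
                       cluster_domain="cluster.local",
                       host_network_domains=""):
    """Render resolv.conf text equivalent to the ConfigMap template (hypothetical helper)."""
    search_domains = [
        f"default.svc.{cluster_domain}",
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    # hostNetworkDomains may hold several space-separated names, or nothing.
    search_domains.extend(host_network_domains.split())
    lines = [
        "search " + " ".join(search_domains),
        f"nameserver {cluster_dns_ip}",
        "options ndots:5",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_resolv_conf(host_network_domains="MYCOMPANY.COM"), end="")
```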

charts/hdfs-resolv-conf/values.yaml

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+ # Licensed to the Apache Software Foundation (ASF) under one or more
+ # contributor license agreements. See the NOTICE file distributed with
+ # this work for additional information regarding copyright ownership.
+ # The ASF licenses this file to You under the Apache License, Version 2.0
+ # (the "License"); you may not use this file except in compliance with
+ # the License. You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ # Default values for template variables.
+
+ # Set this to the IP of your kube-dns name server that resolv POD host names.
+ # This is used by datanode daemons when they connect to namenode using
+ # the host name.
+ clusterDnsIP: 10.96.0.10
+
+ # Set this to the domain name of your host network that becomes part of cluster node
+ # host names. e.g. MYCOMPANY.COM in kube-n1.MYCOMPANY.COM.
+ # If there are multiple names, then put them inside the quote separted by
+ # spaces.
+ hostNetworkDomains: ""
+
+ # Set this to the domain name of your cluster that becomes part of POD and service
+ # host names.
+ clusterDomain: cluster.local
