
Commit b40fc37

kimoonkim authored and foxish committed
Add k8s helm charts that run HDFS daemons in Kubernetes (#2)
* Add initial version of hdfs-k8s helm chart
* Add license
* Switch namenode to StatefulSet
* Use two charts for sequencing. hostNetwork in namenode
* Add clarifying comments
* Drop misleading master schedule annotation
* Fix README.md to drop misleading kube-dns default IP value
1 parent 92e4d0d commit b40fc37

8 files changed (+332, -0 lines)

charts/hdfs-datanode-k8s/Chart.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: hdfs-datanode-k8s
version: 0.1
description: Hadoop Distributed File System (HDFS) hosted by Kubernetes.

charts/hdfs-datanode-k8s/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
HDFS `datanodes` running inside a Kubernetes cluster. See the other chart for
`namenode`.

### Prerequisite

Requires Kubernetes version 1.5 or later, because `namenode` uses
`StatefulSet`, which is available only in version 1.5 and later.

Make sure `namenode` is fully launched using the other chart before installing
this one. `Datanodes` rely on DNS to resolve the hostname of the namenode when
they start up.
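Because the datanodes resolve the namenode's hostname through DNS at startup, it can help to confirm that the name resolves before running the datanode `helm install`. The sketch below is a hypothetical helper (`wait_for_dns` is not part of either chart) and assumes it runs on a host or pod whose resolver points at the cluster's kube-dns.

```python
import socket
import time


def wait_for_dns(hostname: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll the system resolver until `hostname` resolves or `timeout` seconds pass.

    Returns True as soon as getaddrinfo() succeeds, False once the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            socket.getaddrinfo(hostname, None)
            return True
        except socket.gaierror:
            # Not resolvable yet, e.g. the namenode pod has not started.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_for_dns("hdfs-namenode-0.hdfs-namenode.kube-system.svc.cluster.local")` would block until the namenode's DNS record appears, assuming the default `cluster.local` domain.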

### Usage

1. Find the service IP of the `kube-dns` service in your k8s cluster.
   Run the following command and find the IP value in the output.
   It will be supplied below as the `clusterDnsIP` parameter.

   ```
   $ kubectl get svc --all-namespaces | grep kube-dns
   ```

2. Optionally, find the domain name of your k8s cluster that becomes part of
   pod and service host names. The default is `cluster.local`. See `values.yaml`
   for additional parameters you can change. Pass them to `--set` below as
   comma-separated entries.

3. Launch this helm chart, `hdfs-datanode-k8s`, specifying the kube-dns name
   server IP and any other parameters.

   ```
   $ helm install -n my-hdfs-datanode --namespace kube-system \
       --set clusterDnsIP=YOUR-KUBE-DNS-IP hdfs-datanode-k8s
   ```

4. Confirm the daemons are launched.

   ```
   $ kubectl get pods --all-namespaces | grep hdfs-datanode-
   kube-system   hdfs-datanode-ajdcz   1/1   Running   0   7m
   kube-system   hdfs-datanode-f1w24   1/1   Running   0   7m
   ```
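Steps 1 to 3 can be scripted. The sketch below assumes you capture the unfiltered output of `kubectl get svc --all-namespaces` (header row included, i.e. without the `grep`), so columns can be located by name; both helper names are hypothetical. The parser looks up the `CLUSTER-IP` column by header, so it tolerates kubectl versions that add or omit a `TYPE` column.

```python
from typing import Dict, Optional


def find_kube_dns_ip(kubectl_svc_output: str) -> Optional[str]:
    """Return the CLUSTER-IP of the kube-dns service, or None if not found."""
    lines = [line for line in kubectl_svc_output.splitlines() if line.strip()]
    if not lines:
        return None
    header = lines[0].split()
    if "NAME" not in header or "CLUSTER-IP" not in header:
        return None
    name_col = header.index("NAME")
    ip_col = header.index("CLUSTER-IP")
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > max(name_col, ip_col) and fields[name_col] == "kube-dns":
            return fields[ip_col]
    return None


def helm_set_arg(values: Dict[str, str]) -> str:
    """Join chart overrides into the comma-separated form passed to --set."""
    return ",".join(f"{key}={value}" for key, value in values.items())
```

The resulting string, for instance `helm_set_arg({"clusterDnsIP": ip, "clusterDomain": "cluster.local"})`, is what goes after `--set` in step 3.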

`Datanode` daemons run on every cluster node. They also mount k8s `hostPath`
local disk volumes.

`Datanodes` use `hostNetwork` so that they register with the `namenode` using
their physical IPs.

Note that they run in the `kube-system` namespace.

### Credits

This chart uses public Hadoop docker images hosted by
[uhopper](https://hub.docker.com/u/uhopper/).
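The datanode template configures Hadoop through environment variables such as `CORE_CONF_fs_defaultFS` rather than XML files. The bde2020 Hadoop images translate prefixed variables into `core-site.xml` properties, and we assume the uhopper images follow the same convention: after the prefix, `___` becomes `-`, `__` becomes `_`, and `_` becomes `.`. This is a sketch of that assumed mapping (the helper is hypothetical); check the image's entrypoint before relying on it.

```python
def env_to_hadoop_property(env_name: str, prefix: str = "CORE_CONF_") -> str:
    """Convert e.g. CORE_CONF_fs_defaultFS into the property name fs.defaultFS.

    Assumed rules: '___' -> '-', '__' -> '_', '_' -> '.'.
    """
    if not env_name.startswith(prefix):
        raise ValueError(f"{env_name!r} does not start with {prefix!r}")
    name = env_name[len(prefix):]
    # Replace the longer sequences first, parking '__' on a placeholder
    # so the single-underscore rule does not turn it into '..'.
    name = name.replace("___", "-").replace("__", "\0").replace("_", ".")
    return name.replace("\0", "_")
```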
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v1
kind: ConfigMap
metadata:
  name: resolv-conf-datanode
  namespace: kube-system
data:
  resolv.conf: |
    search kube-system.svc.{{ .Values.clusterDomain }} svc.{{ .Values.clusterDomain }} {{ .Values.clusterDomain }}
    nameserver {{ .Values.clusterDnsIP }}
    options ndots:5
---
# Deleting a DaemonSet may require a workaround. See
# https://github.com/kubernetes/kubernetes/issues/33245#issuecomment-261250489
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: hdfs-datanode
  namespace: kube-system
spec:
  template:
    metadata:
      labels:
        name: hdfs-datanode
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: datanode
          image: uhopper/hadoop-datanode:2.7.2
          env:
            # This works only with /etc/resolv.conf mounted from the config map.
            # K8s version 1.6 will fix this, per https://github.com/kubernetes/kubernetes/pull/29378.
            - name: CORE_CONF_fs_defaultFS
              value: hdfs://hdfs-namenode-0.hdfs-namenode.kube-system.svc.{{ .Values.clusterDomain }}:8020
          livenessProbe:
            initialDelaySeconds: 30
            httpGet:
              host: 127.0.0.1
              path: /
              port: 50075
          securityContext:
            privileged: true
          volumeMounts:
            - name: hdfs-data
              mountPath: /hadoop/dfs/data
            # Use subPath below to mount only a single file.
            # See https://github.com/dshulyak/kubernetes.github.io/commit/d58ba7b075bb4848349a2c920caaa08ff3773d70
            - name: resolv-conf-volume
              mountPath: /etc/resolv.conf
              subPath: resolv.conf
      restartPolicy: Always
      volumes:
        - name: hdfs-data
          hostPath:
            path: {{ .Values.dataNodeHostPath }}
        - configMap:
            name: resolv-conf-datanode
            items:
              - key: resolv.conf
                path: resolv.conf
          name: resolv-conf-volume
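The ConfigMap above replaces the datanode container's `/etc/resolv.conf`, and the comment on `CORE_CONF_fs_defaultFS` notes that the namenode URI works only with that file mounted. To show how the `search` list and `ndots:5` interact, here is a simplified sketch: `render_resolv_conf` mirrors what the template renders, and `candidate_queries` approximates the lookup order of a glibc-style stub resolver. Both are hypothetical illustrations; real resolvers also handle retries and other options not modeled here.

```python
from typing import List


def render_resolv_conf(cluster_domain: str, cluster_dns_ip: str) -> str:
    """Mirror the resolv.conf body rendered by the ConfigMap template."""
    search = [f"kube-system.svc.{cluster_domain}", f"svc.{cluster_domain}", cluster_domain]
    return (
        f"search {' '.join(search)}\n"
        f"nameserver {cluster_dns_ip}\n"
        "options ndots:5\n"
    )


def candidate_queries(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Approximate the order in which a stub resolver tries names.

    A trailing dot marks an absolute name, tried as-is only. A name with at
    least `ndots` dots is tried as-is first, then with each search suffix;
    otherwise the search suffixes are tried first.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]
```

The full namenode name `hdfs-namenode-0.hdfs-namenode.kube-system.svc.cluster.local` contains exactly five dots, so under `ndots:5` it is queried as-is before any search suffix is appended, while a short form such as `hdfs-namenode-0.hdfs-namenode` is first expanded with `kube-system.svc.cluster.local`.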

charts/hdfs-datanode-k8s/values.yaml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Default values for template variables.

# Set this to the IP of your kube-dns name server that resolves pod host names.
# This is used by datanode daemons when they connect to the namenode using
# its host name.
clusterDnsIP: 10.96.0.10

# Set this to the domain name of your cluster that becomes part of pod and service
# host names.
clusterDomain: cluster.local

# Path of the local disk directory on cluster nodes that will contain the datanode
# blocks. This will be mounted into the datanode pods as a k8s HostPath volume.
dataNodeHostPath: /hdfs-data
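The README passes overrides to `helm install --set` as comma-separated `key=value` entries, which take precedence over these defaults. A deliberately simplified sketch of that layering for this chart's flat keys follows; the helper is hypothetical, and Helm's real `--set` parser also handles nested keys such as `a.b=c`, lists, escaped commas, and type conversion, none of which is modeled here.

```python
from typing import Dict

# Defaults copied from charts/hdfs-datanode-k8s/values.yaml.
DATANODE_DEFAULTS: Dict[str, str] = {
    "clusterDnsIP": "10.96.0.10",
    "clusterDomain": "cluster.local",
    "dataNodeHostPath": "/hdfs-data",
}


def apply_set_overrides(defaults: Dict[str, str], set_arg: str) -> Dict[str, str]:
    """Apply a flat `key=value,key=value` string over the chart defaults."""
    merged = dict(defaults)  # never mutate the defaults themselves
    for entry in filter(None, set_arg.split(",")):
        key, sep, value = entry.partition("=")
        if not sep:
            raise ValueError(f"malformed --set entry: {entry!r}")
        merged[key.strip()] = value.strip()
    return merged
```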

charts/hdfs-namenode-k8s/Chart.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
name: hdfs-namenode-k8s
version: 0.1
description: Hadoop Distributed File System (HDFS) hosted by Kubernetes.

charts/hdfs-namenode-k8s/README.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
HDFS `namenode` running inside a Kubernetes cluster. See the other chart for
`datanodes`.

### Prerequisite

Requires Kubernetes version 1.5 or later, because `namenode` uses
`StatefulSet`, which is available only in version 1.5 and later.

### Usage

1. Attach a label to one of your k8s cluster hosts that will run the `namenode`
   daemon. (This is required because `namenode` currently mounts a local disk
   `hostPath` volume. We plan to switch to a persistent volume in the future,
   at which point this step can be skipped.)

   ```
   $ kubectl label nodes YOUR-HOST hdfs-namenode-selector=hdfs-namenode-0
   ```

2. Launch this helm chart, `hdfs-namenode-k8s`.

   ```
   $ helm install -n my-hdfs-namenode --namespace kube-system hdfs-namenode-k8s
   ```

3. Confirm the daemon is launched.

   ```
   $ kubectl get pods --all-namespaces | grep hdfs-namenode
   kube-system   hdfs-namenode-0   1/1   Running   0   7m
   ```
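Step 1 works because the StatefulSet's `nodeSelector` lets the namenode pod be scheduled only on a node whose labels include every key/value pair in the selector; extra labels on the node do not matter. The sketch below models just that matching rule with hypothetical helpers and made-up node names; it ignores taints, resource requests, and every other scheduler check.

```python
from typing import Dict, List

# The selector used by the hdfs-namenode StatefulSet.
NAMENODE_SELECTOR: Dict[str, str] = {"hdfs-namenode-selector": "hdfs-namenode-0"}


def node_matches_selector(node_labels: Dict[str, str], node_selector: Dict[str, str]) -> bool:
    """True if the node carries every key/value pair of the selector."""
    return all(node_labels.get(key) == value for key, value in node_selector.items())


def eligible_nodes(nodes: Dict[str, Dict[str, str]], selector: Dict[str, str]) -> List[str]:
    """Names of nodes (sorted) whose labels satisfy the selector."""
    return sorted(name for name, labels in nodes.items() if node_matches_selector(labels, selector))
```

If no node carries the label, the pod stays `Pending`. Label exactly one node, since the namenode's `hostPath` data lives on that host.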

There is only one `namenode` instance, i.e. High Availability (HA) is not
supported at the moment. The `namenode` instance is pinned to a cluster host
using a node label, as shown in the usage above. The `namenode` mounts a local
disk directory using a k8s `hostPath` volume.

The `namenode` uses `hostNetwork` so that it can see the physical IPs of
datanodes without an overlay network such as weave-net masking them.

Note that it runs in the `kube-system` namespace.

### Credits

This chart uses public Hadoop docker images hosted by
[uhopper](https://hub.docker.com/u/uhopper/).
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A headless service to create DNS records.
apiVersion: v1
kind: Service
metadata:
  name: hdfs-namenode
  namespace: kube-system
  labels:
    app: hdfs-namenode
spec:
  ports:
    - port: 8020
      name: fs
  clusterIP: None
  selector:
    app: hdfs-namenode
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: hdfs-namenode
  namespace: kube-system
spec:
  serviceName: "hdfs-namenode"
  # Create a size-1 set. The namenode DNS name will be
  # hdfs-namenode-0.hdfs-namenode.kube-system.svc.YOUR-CLUSTER-DOMAIN
  replicas: 1
  template:
    metadata:
      labels:
        app: hdfs-namenode
    spec:
      # Use hostNetwork so datanodes connect to namenode without going through an overlay network
      # like weave. Otherwise, namenode fails to see the physical IP addresses of datanodes.
      hostNetwork: true
      hostPID: true
      terminationGracePeriodSeconds: 0
      containers:
        - name: hdfs-namenode
          image: uhopper/hadoop-namenode:2.7.2
          env:
            - name: CLUSTER_NAME
              value: hdfs-k8s
          ports:
            - containerPort: 8020
              name: fs
          volumeMounts:
            - name: hdfs-name
              mountPath: /hadoop/dfs/name
      # Pin the pod to a node. You can label your node like below:
      #   $ kubectl label nodes YOUR-NODE hdfs-namenode-selector=hdfs-namenode-0
      nodeSelector:
        hdfs-namenode-selector: hdfs-namenode-0
      restartPolicy: Always
      volumes:
        - name: hdfs-name
          hostPath:
            path: {{ .Values.nameNodeHostPath }}
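The headless service (`clusterIP: None`) together with `serviceName: "hdfs-namenode"` gives each StatefulSet pod a stable DNS name of the form spelled out in the comment above. A small sketch, with hypothetical helper names, that builds that name and the `fs.defaultFS` URI which the datanode chart passes via `CORE_CONF_fs_defaultFS`:

```python
def statefulset_pod_fqdn(statefulset: str, ordinal: int, service: str,
                         namespace: str, cluster_domain: str = "cluster.local") -> str:
    """DNS name of a StatefulSet pod behind its headless governing service:
    <statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>"""
    return f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"


def namenode_default_fs(cluster_domain: str = "cluster.local", port: int = 8020) -> str:
    """fs.defaultFS URI for the single namenode pod created by this chart."""
    host = statefulset_pod_fqdn("hdfs-namenode", 0, "hdfs-namenode", "kube-system", cluster_domain)
    return f"hdfs://{host}:{port}"
```

With `replicas: 1` there is only ordinal `0`, so the datanodes can hard-code `hdfs-namenode-0` and vary only the cluster domain.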

charts/hdfs-namenode-k8s/values.yaml

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Default values for template variables.

# Path of the local disk directory on a cluster node that will contain the namenode
# fsimage and edit logs. This will be mounted into the namenode pod as a k8s HostPath
# volume.
nameNodeHostPath: /hdfs-name
