
Commit 5d1010f

Supports CI with minikube on travis (#34)
* Add travis yaml
* Add comments
* Add test setup script
* Fix a typo
* Fix minor bugs
* Fix a bug
* Add test run script
* Add helm
* Try to run zookeeper
* build nsenter
* Use sudo for moving nsenter
* Suppress a warning
* Zookeeper runs ok
* Add journal nodes
* Fix a typo
* Journal nodes run ok
* Minor clean up
* Minor clean up
* Journal nodes run ok
* Run namenodes
* Check namenode
* Launch all helms
* Fix a bug
* Avoid kube-dns crash
* Downgrade minikube to v0.23.0
* Use kubernetes 1.9.x
* Add comments
* All pods run ok
* Clean up
* Clean up
* Copy files to HDFS for tests
* Clean up
* Clean up
* Clean up
* Fix zookeeper service name
* Made pod check more robust
* Clean up
* Clean up
* Retry client commands
* Clean up
* Data node use the right dir
* Show travis disks
* Fix a typo
* Clean up
* Separate out cleanup.sh
* Clean up
* Minor clean up
* Update documentation
* Fix required version number
* Clean up links
* Clean up links
* Fix indentation
* Add link
* Fix typo
* Address review comments
1 parent fd66f1d commit 5d1010f

14 files changed: +522 −83 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions

@@ -0,0 +1,2 @@
+tests/bin
+tests/tmp

.travis.yml

Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+sudo: required
+
+env:
+  - USE_MINIKUBE_DRIVER_NONE=true USE_SUDO_MINIKUBE=true
+
+before_script:
+  - tests/setup.sh
+
+script:
+  - tests/run.sh
+
+after_script:
+  - tests/cleanup.sh
+  - tests/teardown.sh
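For orientation, here is a hedged sketch of what a minikube-based setup step on a Travis worker could look like. It is not the actual `tests/setup.sh` from this commit; the download URLs and version pins (minikube v0.23.0 and Kubernetes 1.9.x, both mentioned in the commit message) are assumptions.

```
#!/usr/bin/env bash
# Hypothetical sketch of a Travis setup step -- NOT the actual tests/setup.sh.
set -euo pipefail

MINIKUBE_VERSION=v0.23.0   # commit message mentions downgrading minikube to v0.23.0
K8S_VERSION=v1.9.0         # commit message mentions Kubernetes 1.9.x

# Install minikube and kubectl onto the Travis worker.
curl -Lo minikube https://storage.googleapis.com/minikube/releases/${MINIKUBE_VERSION}/minikube-linux-amd64
chmod +x minikube && sudo mv minikube /usr/local/bin/
curl -Lo kubectl https://storage.googleapis.com/kubernetes-release/release/${K8S_VERSION}/bin/linux/amd64/kubectl
chmod +x kubectl && sudo mv kubectl /usr/local/bin/

# The "none" driver runs Kubernetes directly on the host, hence sudo.
if [ "${USE_MINIKUBE_DRIVER_NONE:-}" = "true" ]; then
  sudo -E minikube start --vm-driver=none --kubernetes-version="${K8S_VERSION}"
else
  minikube start --kubernetes-version="${K8S_VERSION}"
fi

# Wait until the node registers as Ready before running the charts.
until kubectl get nodes | grep -q ' Ready'; do sleep 5; done
```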

README.md

Lines changed: 12 additions & 2 deletions

@@ -1,2 +1,12 @@
-# kubernetes-HDFS
-Repository holding configuration files for running an HDFS cluster in Kubernetes
+---
+layout: global
+title: HDFS on Kubernetes
+---
+# HDFS on Kubernetes
+Repository holding helm charts for running Hadoop Distributed File System (HDFS)
+on Kubernetes.
+
+See [charts/README.md](charts/README.md) for how to run the charts.
+
+See [tests/README.md](tests/README.md) for how to run integration tests for
+HDFS on Kubernetes.

charts/README.md

Lines changed: 21 additions & 8 deletions

@@ -1,16 +1,29 @@
-### Prerequisite
+---
+layout: global
+title: HDFS charts
+---
 
-Requires Kubernetes 1.6 as the `namenode` and `datanodes` are using `ClusterFirstWithHostNet`, which was introduced in Kubernetes 1.6
+# HDFS charts
+Helm charts for launching HDFS daemons in a K8s cluster. Note that the HDFS
+charts are currently being heavily revised and are subject to change.
 
-### Usage
+# Prerequisite
 
-Helm charts for launching HDFS daemons in a K8s cluster.
-The daemons should be launched in the following order.
+Requires Kubernetes 1.6+ as the `namenode` and `datanodes` are using
+`ClusterFirstWithHostNet`, which was introduced in Kubernetes 1.6.
+
+# Usage
+
+The HDFS daemons should be launched in the following order.
 
  1. hdfs namenode daemons. For the High Availability (HA)
-    setup, follow instructions in `hdfs-namenode-k8s/README.md`. Or if you do
-    not want the HA setup, follow `hdfs-simple-namenode-k8s/README.md` instead.
- 2. hdfs datanode daemons. See `hdfs-datanode-k8s/README.md`
+    setup, follow instructions in
+    [hdfs-namenode-k8s/README.md](hdfs-namenode-k8s/README.md).
+    Or if you do not want the HA setup, follow
+    [hdfs-simple-namenode-k8s/README.md](hdfs-simple-namenode-k8s/README.md)
+    instead.
+ 2. hdfs datanode daemons. See
+    [hdfs-datanode-k8s/README.md](hdfs-datanode-k8s/README.md)
    for how to launch.
 
 Kerberos is supported. See the `kerberosEnabled` option in the namenode and
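For orientation, the launch order above boils down to a sequence like the following. The journal node and namenode commands are taken from the chart READMEs in this commit; the `my-hdfs-datanode` release name is illustrative only.

```
# 1. Zookeeper quorum (needed by the HA namenode setup).
$ kubectl create -f \
    https://raw.githubusercontent.com/kubernetes/contrib/master/statefulsets/zookeeper/zookeeper.yaml

# 2. Journal node quorum.
$ helm install -n my-hdfs-journalnode hdfs-journalnode

# 3. Namenodes, then datanodes.
$ helm install -n my-hdfs-namenode hdfs-namenode-k8s
$ helm install -n my-hdfs-datanode hdfs-datanode-k8s
```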

charts/hdfs-datanode-k8s/README.md

Lines changed: 5 additions & 1 deletion

@@ -1,3 +1,7 @@
+---
+layout: global
+title: HDFS datanodes
+---
 HDFS `datanodes` running inside a kubernetes cluster. See the other chart for
 `namenode`.
 
@@ -50,7 +54,7 @@ physical IPs.
 
 Note they run under the `default` namespace.
 
-###Credits
+### Credits
 
 This chart is using public Hadoop docker images hosted by
 [uhopper](https://hub.docker.com/u/uhopper/).
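Since the datanode pods run in the `default` namespace, a quick sanity check (mirroring the namenode check later in this commit; pod name suffixes depend on your cluster nodes) might be:

```
$ kubectl get pods -n default | grep hdfs-datanode
```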

charts/hdfs-namenode-k8s/README.md

Lines changed: 103 additions & 71 deletions

@@ -1,117 +1,149 @@
+---
+layout: global
+title: HDFS namenodes
+---
 HDFS `namenodes` in HA setup running inside a Kubernetes cluster.
 See the other chart for `datanodes`.
 
 ### Usage
 
-  1. Launch zookeeper and journal node quorum. Zookeeper is needed to decide
-  which namenode instance is active. Journal node quorum is needed to
+  1. Launch a zookeeper quorum. Zookeeper is needed to decide
+  which namenode instance is active.
+  You would need to provide persistent volumes for zookeeper.
+  If your quorum is size 3 (default), you need 3 volumes.
+
+  You can run Zookeeper in two different ways. Here, you can use
+  `kubectl create` using a single StatefulSet yaml file.
+
+  ```
+  $ kubectl create -f \
+      https://raw.githubusercontent.com/kubernetes/contrib/master/statefulsets/zookeeper/zookeeper.yaml
+  ```
+
+  Alternatively, you can use a helm chart.
+
+  ```
+  $ helm install zookeeper \
+      --name my-zk \
+      --version 0.6.3 \
+      --repo https://kubernetes-charts-incubator.storage.googleapis.com/
+  ```
+
+  2. Launch a journal node quorum. The journal node quorum is needed to
   synchronize metadata updates from the active namenode to the standby
-  namenode. You would need to provide persistent volumes for zookeeper and
-  journal node quorums. If each quorum is size 3, you need 6 volumes in
-  total.
+  namenode. You would need to provide persistent volumes for journal node
+  quorums. If your quorum is size 3 (default), you need 3 volumes.
 
-  ```
-  $ kubectl create -f \
-      https://raw.githubusercontent.com/kubernetes/contrib/master/statefulsets/zookeeper/zookeeper.yaml
-  $ helm install -n my-hdfs-journalnode hdfs-namenode-journalnode
-  ```
+  ```
+  $ helm install -n my-hdfs-journalnode hdfs-journalnode
+  ```
 
-  2. (Skip this if you do not plan to enable Kerberos)
+  3. (Skip this if you do not plan to enable Kerberos)
   Prepare Kerberos setup, following the steps below.
 
   - Create a config map containing your Kerberos config file. This will be
     mounted onto the namenode and datanode pods.
 
-  ```
-  $ kubectl create configmap kerberos-config --from-file=/etc/krb5.conf
-  ```
+    ```
+    $ kubectl create configmap kerberos-config --from-file=/etc/krb5.conf
+    ```
 
   - Generate per-host principal accounts and password keytab files for the namenode
     and datanode daemons. This is typically done in your Kerberos KDC host. For example,
     suppose the namenode will run on the k8s cluster node kube-n1.mycompany.com,
     and your datanodes will run on kube-n1.mycompany.com and kube-n2.mycompany.com.
     And your Kerberos realm is MYCOMPANY.COM, then
 
-  ```
-  $ kadmin.local -q "addprinc -randkey hdfs/kube-n1.mycompany.com@MYCOMPANY.COM"
-  $ kadmin.local -q "addprinc -randkey http/kube-n1.mycompany.com@MYCOMPANY.COM"
-  $ mkdir hdfs-keytabs
-  $ kadmin.local -q "ktadd -norandkey \
-      -k hdfs-keytabs/kube-n1.mycompany.com.keytab \
-      hdfs/kube-n1.mycompany.com@MYCOMPANY.COM \
-      http/kube-n1.mycompany.com@MYCOMPANY.COM"
-
-  $ kadmin.local -q "addprinc -randkey hdfs/kube-n2.mycompany.com@MYCOMPANY.COM"
-  $ kadmin.local -q "addprinc -randkey http/kube-n2.mycompany.com@MYCOMPANY.COM"
-  $ kadmin.local -q "ktadd -norandkey \
-      -k hdfs-keytabs/kube-n2.mycompany.com.keytab \
-      hdfs/kube-n2.mycompany.com@MYCOMPANY.COM \
-      http/kube-n2.mycompany.com@MYCOMPANY.COM"
-  $ kadmin.local -q "ktadd -norandkey \
-      -k hdfs-keytabs/kube-n2.mycompany.com.keytab \
-      hdfs/kube-n2.mycompany.com@MYCOMPANY.COM \
-      http/kube-n2.mycompany.com@MYCOMPANY.COM"
-  ```
+    ```
+    $ kadmin.local -q "addprinc -randkey hdfs/kube-n1.mycompany.com@MYCOMPANY.COM"
+    $ kadmin.local -q "addprinc -randkey http/kube-n1.mycompany.com@MYCOMPANY.COM"
+    $ mkdir hdfs-keytabs
+    $ kadmin.local -q "ktadd -norandkey \
+        -k hdfs-keytabs/kube-n1.mycompany.com.keytab \
+        hdfs/kube-n1.mycompany.com@MYCOMPANY.COM \
+        http/kube-n1.mycompany.com@MYCOMPANY.COM"
+
+    $ kadmin.local -q "addprinc -randkey hdfs/kube-n2.mycompany.com@MYCOMPANY.COM"
+    $ kadmin.local -q "addprinc -randkey http/kube-n2.mycompany.com@MYCOMPANY.COM"
+    $ kadmin.local -q "ktadd -norandkey \
+        -k hdfs-keytabs/kube-n2.mycompany.com.keytab \
+        hdfs/kube-n2.mycompany.com@MYCOMPANY.COM \
+        http/kube-n2.mycompany.com@MYCOMPANY.COM"
+    $ kadmin.local -q "ktadd -norandkey \
+        -k hdfs-keytabs/kube-n2.mycompany.com.keytab \
+        hdfs/kube-n2.mycompany.com@MYCOMPANY.COM \
+        http/kube-n2.mycompany.com@MYCOMPANY.COM"
+    ```
 
   - Create a k8s secret containing all the keytab files. This will be mounted
     onto the namenode and datanode pods. (You may want to restrict access to
     this secret using k8s
    [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/),
    to minimize exposure of the keytab files.)
-  ```
-  $ kubectl create secret generic hdfs-kerberos-keytabs \
-      --from-file=kube-n1.mycompany.com.keytab \
-      --from-file=kube-n2.mycompany.com.keytab
-  ```
+
+    ```
+    $ kubectl create secret generic hdfs-kerberos-keytabs \
+        --from-file=kube-n1.mycompany.com.keytab \
+        --from-file=kube-n2.mycompany.com.keytab
+    ```
 
   Optionally, attach a label to some of your k8s cluster hosts that will
   run the `namenode` daemons. This can allow your HDFS client outside
   the Kubernetes cluster to expect stable IP addresses. When used by
   those outside clients, Kerberos expects the namenode addresses to be
   stable.
-  ```
-  $ kubectl label nodes YOUR-HOST-1 hdfs-namenode-selector=hdfs-namenode
-  $ kubectl label nodes YOUR-HOST-2 hdfs-namenode-selector=hdfs-namenode
-  ```
 
-  3. Now it's time to launch namenodes using the helm chart, `hdfs-namenode-k8s`.
+  ```
+  $ kubectl label nodes YOUR-HOST-1 hdfs-namenode-selector=hdfs-namenode
+  $ kubectl label nodes YOUR-HOST-2 hdfs-namenode-selector=hdfs-namenode
+  ```
+
+  4. Now it's time to launch namenodes using the helm chart, `hdfs-namenode-k8s`.
   But, you need to first provide two persistent volumes for storing
   metadata. Each volume should have at least 100 GB. (Can be overridden by
   the metadataVolumeSize helm option).
 
   With the volumes provided, you can launch the namenode HA with:
 
-  ```
-  $ helm install -n my-hdfs-namenode hdfs-namenode-k8s
-  ```
-
-  If enabling Kerberos, specify necessary options. For instance,
-  ```
-  $ helm install -n my-hdfs-namenode \
-      --set kerberosEnabled=true,kerberosRealm=MYCOMPANY.COM hdfs-namenode-k8s
-  ```
-  The two variables above are required. For other variables, see values.yaml.
-
-  If also using namenode labels for Kerberos, add
-  the namenodePinningEnabled option:
-  ```
-  $ helm install -n my-hdfs-namenode \
-      --set kerberosEnabled=true,kerberosRealm=MYCOMPANY.COM,namenodePinningEnabled=true \
-      hdfs-namenode-k8s
-  ```
-
-  4. Confirm the daemons are launched.
-
-  ```
-  $ kubectl get pods | grep hdfs-namenode
-  hdfs-namenode-0 1/1 Running 0 7m
-  hdfs-namenode-1 1/1 Running 0 7m
-  ```
+  ```
+  $ helm install -n my-hdfs-namenode hdfs-namenode-k8s
+  ```
+
+  If you launched Zookeeper using the helm chart in step (2), the command
+  line will be slightly different:
+
+  ```
+  $ helm install -n my-hdfs-namenode hdfs-namenode-k8s \
+      --set zookeeperQuorum=my-zk-zookeeper-0.my-zk-zookeeper-headless.default.svc.cluster.local:2181,my-zk-zookeeper-1.my-zk-zookeeper-headless.default.svc.cluster.local:2181,my-zk-zookeeper-2.my-zk-zookeeper-headless.default.svc.cluster.local:2181
+  ```
+
+  If enabling Kerberos, specify necessary options. For instance,
+  ```
+  $ helm install -n my-hdfs-namenode \
+      --set kerberosEnabled=true,kerberosRealm=MYCOMPANY.COM hdfs-namenode-k8s
+  ```
+  The two variables above are required. For other variables, see values.yaml.
+
+  If also using namenode labels for Kerberos, add
+  the namenodePinningEnabled option:
+  ```
+  $ helm install -n my-hdfs-namenode \
+      --set kerberosEnabled=true,kerberosRealm=MYCOMPANY.COM,namenodePinningEnabled=true \
+      hdfs-namenode-k8s
+  ```
+
+  5. Confirm the daemons are launched.
+
+  ```
+  $ kubectl get pods | grep hdfs-namenode
+  hdfs-namenode-0 1/1 Running 0 7m
+  hdfs-namenode-1 1/1 Running 0 7m
+  ```
 
 `namenode` is using `hostNetwork` so it can see physical IPs of datanodes
 without an overlay network such as weave-net masking them.
 
-###Credits
+### Credits
 
 This chart is using public Hadoop docker images hosted by
 [uhopper](https://hub.docker.com/u/uhopper/).
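A further check that can be useful after step 5 is asking HDFS itself which namenode won the election. This is a hedged sketch: the HA service IDs `nn0` and `nn1` are assumptions, so substitute whatever IDs your hdfs-site.xml defines.

```
# Query the HA state of each configured namenode; each prints "active" or "standby".
$ kubectl exec hdfs-namenode-0 -- hdfs haadmin -getServiceState nn0
$ kubectl exec hdfs-namenode-0 -- hdfs haadmin -getServiceState nn1
```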

charts/hdfs-namenode-k8s/templates/namenode-statefulset.yaml

Lines changed: 7 additions & 0 deletions

@@ -105,11 +105,18 @@ spec:
       labels:
         app: hdfs-namenode
     spec:
+      {{- if .Values.hostNetworkEnabled }}
       # Use hostNetwork so datanodes connect to namenode without going through an overlay network
       # like weave. Otherwise, namenode fails to see physical IP address of datanodes.
+      # Disabling this will break data locality as namenode will see pod virtual IPs and fails to
+      # equate them with cluster node physical IPs associated with data nodes.
+      # We currently disable this only for CI on minikube.
       hostNetwork: true
       hostPID: true
       dnsPolicy: ClusterFirstWithHostNet
+      {{- else }}
+      dnsPolicy: ClusterFirst
+      {{- end }}
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
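With this conditional in place, hostNetwork can be turned off at install time, e.g. for the minikube CI this commit sets up (release name as used elsewhere in this commit):

```
$ helm install -n my-hdfs-namenode hdfs-namenode-k8s \
    --set hostNetworkEnabled=false
```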

charts/hdfs-namenode-k8s/values.yaml

Lines changed: 6 additions & 1 deletion

@@ -35,14 +35,19 @@ metadataVolumeSize: 100Gi
 # Separated by the comma character.
 zookeeperQuorum: zk-0.zk-svc.default.svc.cluster.local:2181,zk-1.zk-svc.default.svc.cluster.local:2181,zk-2.zk-svc.default.svc.cluster.local:2181
 
-
 # Journal nodes quorum to use for sharing editlogs from an active namenode to
 # a standby namenode. Separated by the semicolon character.
 journalQuorum: hdfs-journalnode-0.hdfs-journalnode.default.svc.cluster.local:8485;hdfs-journalnode-1.hdfs-journalnode.default.svc.cluster.local:8485;hdfs-journalnode-2.hdfs-journalnode.default.svc.cluster.local:8485
 
 # Whether or not to enable pinning of namenode pods to labeled k8s cluster nodes.
 namenodePinningEnabled: false
 
+# Whether or not to use hostNetwork in namenode pods. Disabling this will break
+# data locality as namenode will see pod virtual IPs and fails to equate them with
+# cluster node physical IPs associated with data nodes.
+# We currently disable this only for CI on minikube.
+hostNetworkEnabled: true
+
 # Custom hadoop config keys passed through env variables to hadoop uhopper images.
 # See https://hub.docker.com/r/uhopper/hadoop/ to get more details
 # Please note that these are not hadoop env variables, but docker env variables that
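As a usage sketch, the same override can also be kept in a values file instead of a `--set` flag; `ci-values.yaml` is a hypothetical file name:

```
$ cat > ci-values.yaml <<'EOF'
# Hypothetical override for CI on minikube: trade data locality for a driver-none-friendly setup.
hostNetworkEnabled: false
EOF
$ helm install -n my-hdfs-namenode -f ci-values.yaml hdfs-namenode-k8s
```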
