---
layout: global
title: HDFS namenodes
---
HDFS `namenodes` in HA setup running inside a Kubernetes cluster.

See the other chart for `datanodes`.

### Usage

  1. Launch a zookeeper quorum. Zookeeper is needed to decide which namenode
     instance is active. You would need to provide persistent volumes for
     zookeeper. If your quorum is size 3 (the default), you need 3 volumes
     (one way to provision them is sketched at the end of this step).

     You can run Zookeeper in two different ways. Here, you can use
     `kubectl create` with a single StatefulSet yaml file.

     ```
     $ kubectl create -f \
       https://raw.githubusercontent.com/kubernetes/contrib/master/statefulsets/zookeeper/zookeeper.yaml
     ```

     Alternatively, you can use a helm chart.

     ```
     $ helm install zookeeper \
       --name my-zk \
       --version 0.6.3 \
       --repo https://kubernetes-charts-incubator.storage.googleapis.com/
     ```
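
     This step and the later steps assume pre-provisioned persistent volumes
     (for zookeeper here, for the journal nodes in step (2), and for namenode
     metadata in step (4)) unless your cluster supports dynamic volume
     provisioning. As a rough sketch for a single-node test cluster, you
     could statically create `hostPath` volumes like the one below, repeating
     it once per required volume. The volume name, size, and path are
     illustrative only; the capacity must be at least what the corresponding
     chart's volume claims request. For production, use real disks or a
     dynamic provisioner instead.

     ```
     $ cat > test-pv-0.yaml <<EOF
     apiVersion: v1
     kind: PersistentVolume
     metadata:
       name: test-pv-0
     spec:
       capacity:
         storage: 100Gi
       accessModes:
         - ReadWriteOnce
       hostPath:
         path: /mnt/disks/test-pv-0
     EOF
     $ kubectl create -f test-pv-0.yaml
     ```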

  2. Launch a journal node quorum. The journal node quorum is needed to
     synchronize metadata updates from the active namenode to the standby
     namenode. You would need to provide persistent volumes for the journal
     node quorum. If your quorum is size 3 (the default), you need 3 volumes.

     ```
     $ helm install -n my-hdfs-journalnode hdfs-journalnode
     ```
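
     You can check that the journal node pods came up before moving on:

     ```
     $ kubectl get pods | grep journalnode
     ```

     You should see three pods (for a quorum of size 3) in the `Running`
     state. The exact pod names depend on how the chart names its
     StatefulSet.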

  3. (Skip this if you do not plan to enable Kerberos)
     Prepare Kerberos setup, following the steps below.

     - Create a config map containing your Kerberos config file. This will be
       mounted onto the namenode and datanode pods.

       ```
       $ kubectl create configmap kerberos-config --from-file=/etc/krb5.conf
       ```

     - Generate per-host principal accounts and password keytab files for the
       namenode and datanode daemons. This is typically done on your Kerberos
       KDC host. For example, suppose the namenode will run on the k8s
       cluster node kube-n1.mycompany.com, your datanodes will run on
       kube-n1.mycompany.com and kube-n2.mycompany.com, and your Kerberos
       realm is MYCOMPANY.COM. Then:

       ```
       $ kadmin.local -q "addprinc -randkey hdfs/kube-n1.mycompany.com@MYCOMPANY.COM"
       $ kadmin.local -q "addprinc -randkey http/kube-n1.mycompany.com@MYCOMPANY.COM"
       $ mkdir hdfs-keytabs
       $ kadmin.local -q "ktadd -norandkey \
           -k hdfs-keytabs/kube-n1.mycompany.com.keytab \
           hdfs/kube-n1.mycompany.com@MYCOMPANY.COM \
           http/kube-n1.mycompany.com@MYCOMPANY.COM"

       $ kadmin.local -q "addprinc -randkey hdfs/kube-n2.mycompany.com@MYCOMPANY.COM"
       $ kadmin.local -q "addprinc -randkey http/kube-n2.mycompany.com@MYCOMPANY.COM"
       $ kadmin.local -q "ktadd -norandkey \
           -k hdfs-keytabs/kube-n2.mycompany.com.keytab \
           hdfs/kube-n2.mycompany.com@MYCOMPANY.COM \
           http/kube-n2.mycompany.com@MYCOMPANY.COM"
       ```
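
       Optionally, you can sanity-check each keytab before distributing it.
       This is a minimal check using the MIT Kerberos `klist` tool; it should
       list both the hdfs and http principals for the corresponding host.

       ```
       $ klist -kt hdfs-keytabs/kube-n1.mycompany.com.keytab
       $ klist -kt hdfs-keytabs/kube-n2.mycompany.com.keytab
       ```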

     - Create a k8s secret containing all the keytab files. This will be
       mounted onto the namenode and datanode pods. (You may want to restrict
       access to this secret using k8s
       [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)
       to minimize exposure of the keytab files; see the sketch below.)

       ```
       $ kubectl create secret generic hdfs-kerberos-keytabs \
           --from-file=kube-n1.mycompany.com.keytab \
           --from-file=kube-n2.mycompany.com.keytab
       ```
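
       Below is a rough sketch of what such an RBAC restriction could look
       like: a namespaced Role that allows reading only this secret, bound to
       a hypothetical `hdfs` service account that the HDFS pods would run as.
       The role, binding, and service account names are illustrative, not
       something the charts create for you; adapt them to your cluster's
       RBAC policy.

       ```
       $ kubectl create -f - <<EOF
       apiVersion: rbac.authorization.k8s.io/v1
       kind: Role
       metadata:
         name: hdfs-keytab-reader
         namespace: default
       rules:
       - apiGroups: [""]
         resources: ["secrets"]
         resourceNames: ["hdfs-kerberos-keytabs"]
         verbs: ["get"]
       ---
       apiVersion: rbac.authorization.k8s.io/v1
       kind: RoleBinding
       metadata:
         name: hdfs-keytab-reader
         namespace: default
       subjects:
       - kind: ServiceAccount
         name: hdfs
         namespace: default
       roleRef:
         apiGroup: rbac.authorization.k8s.io
         kind: Role
         name: hdfs-keytab-reader
       EOF
       ```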

     Optionally, attach a label to some of your k8s cluster hosts that will
     run the `namenode` daemons. This allows HDFS clients outside the
     Kubernetes cluster to rely on stable namenode IP addresses, which
     Kerberos requires when those outside clients talk to the namenodes.

     ```
     $ kubectl label nodes YOUR-HOST-1 hdfs-namenode-selector=hdfs-namenode
     $ kubectl label nodes YOUR-HOST-2 hdfs-namenode-selector=hdfs-namenode
     ```
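
     You can list the nodes carrying the label to confirm it was applied:

     ```
     $ kubectl get nodes -l hdfs-namenode-selector=hdfs-namenode
     ```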

  4. Now it's time to launch namenodes using the helm chart,
     `hdfs-namenode-k8s`. But you need to first provide two persistent
     volumes for storing metadata. Each volume should have at least 100 GB.
     (This can be overridden by the `metadataVolumeSize` helm option.)

     With the volumes provided, you can launch the namenode HA with:

     ```
     $ helm install -n my-hdfs-namenode hdfs-namenode-k8s
     ```

     If you launched Zookeeper using the helm chart in step (1), the command
     line will be slightly different:

     ```
     $ helm install -n my-hdfs-namenode hdfs-namenode-k8s \
         --set zookeeperQuorum=my-zk-zookeeper-0.my-zk-zookeeper-headless.default.svc.cluster.local:2181,my-zk-zookeeper-1.my-zk-zookeeper-headless.default.svc.cluster.local:2181,my-zk-zookeeper-2.my-zk-zookeeper-headless.default.svc.cluster.local:2181
     ```
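
     The addresses in `zookeeperQuorum` are built from the Zookeeper pod and
     headless service names, which come from the helm release name (`my-zk`
     in step (1)). If you are unsure what they are in your cluster, you can
     list them and adjust the value accordingly:

     ```
     $ kubectl get pods,svc | grep my-zk
     ```

     Also note that helm's `--set` treats unescaped commas as separators
     between values, so depending on your helm version you may need to escape
     the commas in the quorum string as `\,` or pass the value through a
     values file instead.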

     If enabling Kerberos, specify the necessary options. For instance,

     ```
     $ helm install -n my-hdfs-namenode \
         --set kerberosEnabled=true,kerberosRealm=MYCOMPANY.COM hdfs-namenode-k8s
     ```

     The two options above are required. For other options, see `values.yaml`.

     If you are also using namenode labels for Kerberos, add the
     `namenodePinningEnabled` option:

     ```
     $ helm install -n my-hdfs-namenode \
         --set kerberosEnabled=true,kerberosRealm=MYCOMPANY.COM,namenodePinningEnabled=true \
         hdfs-namenode-k8s
     ```

  5. Confirm the daemons are launched.

     ```
     $ kubectl get pods | grep hdfs-namenode
     hdfs-namenode-0   1/1   Running   0   7m
     hdfs-namenode-1   1/1   Running   0   7m
     ```
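
     You can also ask HDFS itself which namenode is currently active. The
     namenode service IDs below (`nn0` and `nn1`) are only a guess at what
     the chart configures in `hdfs-site.xml`; check the chart's generated
     config for the actual IDs, and note this assumes the `hdfs` CLI is
     available inside the namenode container.

     ```
     $ kubectl exec hdfs-namenode-0 -- hdfs haadmin -getServiceState nn0
     $ kubectl exec hdfs-namenode-0 -- hdfs haadmin -getServiceState nn1
     ```

     One of the two should report `active` and the other `standby`.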

The `namenode` pods use `hostNetwork` so they can see physical IPs of
datanodes without an overlay network such as weave-net masking them.
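
One quick way to verify this is to compare each namenode pod's IP with the IP
of the node it runs on; with `hostNetwork` they should be identical.

```
$ kubectl get pods -o wide | grep hdfs-namenode
$ kubectl get nodes -o wide
```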

### Credits

This chart uses public Hadoop docker images hosted by
[uhopper](https://hub.docker.com/u/uhopper/).