Commit c53677e
hdfs chart
1 parent d89a6be

30 files changed: +1001 -0 lines

charts/hdfs/.helmignore

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*~
# Various IDEs
.project
.idea/
*.tmproj
charts/hdfs/Chart.yaml

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
apiVersion: v1
appVersion: 2.7.7
description: The Apache Hadoop software library is a framework that allows for the
  distributed processing of large data sets across clusters of computers using simple
  programming models.
home: https://hadoop.apache.org/
icon: http://hadoop.apache.org/images/hadoop-logo.jpg
maintainers:
- email: cgiraldo@gradiant.org
  name: cgiraldo
name: hdfs
sources:
- https://github.com/apache/hadoop
version: 0.1.0

charts/hdfs/README.md

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
# Hadoop Chart

**This is the README from the original Hadoop Helm chart (https://github.com/helm/charts/tree/master/stable/hadoop).**
**This version removes the YARN manager and provides advanced Hadoop configuration through environment variables.**

[Hadoop](https://hadoop.apache.org/) is a framework for running large-scale distributed applications.

This chart is primarily intended for YARN and MapReduce job execution, where HDFS is used only as a means to transport small artifacts within the framework, not as a distributed filesystem. Data should be read from cloud-based datastores such as Google Cloud Storage, S3, or Swift.
## Chart Details

## Installing the Chart

To install the chart with the release name `hadoop`, utilizing 50% of the available node resources:

```
$ helm install --name hadoop $(stable/hadoop/tools/calc_resources.sh 50) stable/hadoop
```

> Note that you need at least 2GB of free memory per NodeManager pod; if your cluster isn't large enough, not all pods will be scheduled.

The optional [`calc_resources.sh`](./tools/calc_resources.sh) script is a convenience helper that sets `yarn.numNodes` and `yarn.nodeManager.resources` to utilize all nodes in the Kubernetes cluster at a given percentage of their resources. For example, with a 3-node `n1-standard-4` GKE cluster and an argument of `50`, it would create 3 NodeManager pods claiming 2 cores and 7.5Gi of memory each.
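The calculation can be sketched roughly as follows. This is a hypothetical simplification, not the actual `calc_resources.sh`; the node count and per-node allocatable figures are hard-coded to mirror the 3-node `n1-standard-4` example above.

```shell
# Hypothetical sketch of the calculation calc_resources.sh performs:
# take the cluster's node count and per-node capacity, and emit --set
# flags that claim a given percentage of each node's resources.
NODES=3; CORES=4; MEM_MB=15360; PERCENT=50

CPU=$(( CORES * PERCENT / 100 ))    # 50% of 4 cores -> 2
MEM=$(( MEM_MB * PERCENT / 100 ))   # 50% of 15360Mi -> 7680Mi (7.5Gi)

echo "--set yarn.numNodes=${NODES}"
echo "--set yarn.nodeManager.resources.limits.cpu=${CPU}"
echo "--set yarn.nodeManager.resources.limits.memory=${MEM}Mi"
```

The real script queries the cluster for node counts and allocatable resources instead of hard-coding them.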
### Persistence

To install the chart with persistent volumes:

```
$ helm install --name hadoop $(stable/hadoop/tools/calc_resources.sh 50) \
  --set persistence.nameNode.enabled=true \
  --set persistence.nameNode.storageClass=standard \
  --set persistence.dataNode.enabled=true \
  --set persistence.dataNode.storageClass=standard \
  stable/hadoop
```

> Change the value of `storageClass` to match your volume driver. `standard` works for Google Container Engine clusters.
## Configuration

The following table lists the configurable parameters of the Hadoop chart and their default values.

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| `image.repository` | Hadoop image ([source](https://github.com/Comcast/kube-yarn/tree/master/image)) | `danisla/hadoop` |
| `image.tag` | Hadoop image tag | `2.9.0` |
| `image.pullPolicy` | Pull policy for the images | `IfNotPresent` |
| `hadoopVersion` | Version of the Hadoop libraries being used | `2.9.0` |
| `antiAffinity` | Pod anti-affinity, `hard` or `soft` | `hard` |
| `hdfs.nameNode.pdbMinAvailable` | PDB for the HDFS NameNode | `1` |
| `hdfs.nameNode.resources` | Resources for the HDFS NameNode | `requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m` |
| `hdfs.dataNode.replicas` | Number of HDFS DataNode replicas | `1` |
| `hdfs.dataNode.pdbMinAvailable` | PDB for the HDFS DataNode | `1` |
| `hdfs.dataNode.resources` | Resources for the HDFS DataNode | `requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m` |
| `yarn.resourceManager.pdbMinAvailable` | PDB for the YARN ResourceManager | `1` |
| `yarn.resourceManager.resources` | Resources for the YARN ResourceManager | `requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m` |
| `yarn.nodeManager.pdbMinAvailable` | PDB for the YARN NodeManager | `1` |
| `yarn.nodeManager.replicas` | Number of YARN NodeManager replicas | `2` |
| `yarn.nodeManager.parallelCreate` | Create all NodeManager StatefulSet pods in parallel (K8s 1.7+) | `false` |
| `yarn.nodeManager.resources` | Resource limits and requests for YARN NodeManager pods | `requests:memory=2048Mi,cpu=1000m,limits:memory=2048Mi,cpu=1000m` |
| `persistence.nameNode.enabled` | Enable/disable persistent volume | `false` |
| `persistence.nameNode.storageClass` | Name of the StorageClass to use, per your volume provider | `-` |
| `persistence.nameNode.accessMode` | Access mode for the volume | `ReadWriteOnce` |
| `persistence.nameNode.size` | Size of the volume | `50Gi` |
| `persistence.dataNode.enabled` | Enable/disable persistent volume | `false` |
| `persistence.dataNode.storageClass` | Name of the StorageClass to use, per your volume provider | `-` |
| `persistence.dataNode.accessMode` | Access mode for the volume | `ReadWriteOnce` |
| `persistence.dataNode.size` | Size of the volume | `200Gi` |
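Any parameter in the table can be overridden at install time with `--set`; for instance (a CLI fragment with illustrative values, assuming a working cluster):

```shell
# Example: override DataNode replica count and persistence defaults.
# Values are illustrative, not recommendations.
helm install --name hadoop \
  --set hdfs.dataNode.replicas=3 \
  --set persistence.dataNode.enabled=true \
  --set persistence.dataNode.size=500Gi \
  stable/hadoop
```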
## Related charts

The [Zeppelin Notebook](https://github.com/kubernetes/charts/tree/master/stable/zeppelin) chart can reuse this chart's Hadoop configuration and run on the YARN executor:

```
helm install --set hadoop.useConfigMap=true stable/zeppelin
```

# References

- This is a variation of the Hadoop Helm chart from the stable Helm repo (https://github.com/helm/charts/tree/master/stable/hadoop).
- Original K8s Hadoop adaptation this chart was derived from: https://github.com/Comcast/kube-yarn
Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
#!/bin/bash

: ${HADOOP_PREFIX:=/usr/local/hadoop}

. $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh

# Directory to find config artifacts
CONFIG_DIR="/tmp/hadoop-config"

# Copy config files from volume mount
for f in slaves core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  if [[ -e ${CONFIG_DIR}/$f ]]; then
    cp ${CONFIG_DIR}/$f $HADOOP_PREFIX/etc/hadoop/$f
  else
    echo "ERROR: Could not find $f in $CONFIG_DIR"
    exit 1
  fi
done

# Installing libraries, if any (resource URLs added comma-separated to the ACP env variable)
cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do echo == $cp; curl -LO $cp ; done; cd -

if [[ $2 == "namenode" ]]; then
  if [ ! -d "/dfs/name" ]; then
    mkdir -p /dfs/name
    $HADOOP_PREFIX/bin/hdfs namenode -format -force -nonInteractive
  fi
  $HADOOP_PREFIX/sbin/hadoop-daemon.sh start namenode
fi

if [[ $2 == "datanode" ]]; then
  if [ ! -d "/dfs/data" ]; then
    mkdir -p /dfs/data
  fi
  # Wait up to 30 seconds for the namenode
  (while [[ $count -lt 15 && -z `curl -sf http://{{ include "hdfs.fullname" . }}-namenode:50070` ]]; do ((count=count+1)) ; echo "Waiting for {{ include "hdfs.fullname" . }}-namenode" ; sleep 2; done && [[ $count -lt 15 ]])
  [[ $? -ne 0 ]] && echo "Timeout waiting for hdfs namenode, exiting." && exit 1

  $HADOOP_PREFIX/sbin/hadoop-daemon.sh start datanode
fi

if [[ $1 == "-d" ]]; then
  until find ${HADOOP_PREFIX}/logs -mmin -1 | egrep -q '.*'; echo "`date`: Waiting for logs..." ; do sleep 2 ; done
  tail -F ${HADOOP_PREFIX}/logs/* &
  while true; do sleep 1000; done
fi

if [[ $1 == "-bash" ]]; then
  /bin/bash
fi
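The library-install hook in the script above relies on the bash pattern substitution `${ACP//,/ }` to turn a comma-separated list of resource URLs into whitespace-separated loop items. A quick standalone sketch of just that splitting (the URLs are placeholders, not real artifacts, and no download is attempted here):

```shell
# Demonstrates the ${ACP//,/ } expansion used by the bootstrap script:
# every comma in ACP is replaced with a space, so the unquoted expansion
# yields one word per URL for the for-loop.
ACP="https://example.com/lib1.jar,https://example.com/lib2.jar"
for cp in ${ACP//,/ }; do
  echo "== $cp"
done
```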
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://{{ include "hdfs.fullname" . }}-namenode:{{ .Values.nameNode.port }}/</value></property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
{{- range $key, $value := index .Values.conf "coreSite" }}
  <property><name>{{ $key }}</name><value>{{ $value }}</value></property>
{{- end }}
</configuration>
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property><name>dfs.datanode.use.datanode.hostname</name><value>false</value></property>
  <property><name>dfs.client.use.datanode.hostname</name><value>false</value></property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///dfs/data</value>
    <description>DataNode directory</description>
  </property>

  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///dfs/name</value>
    <description>NameNode directory for namespace and transaction logs storage.</description>
  </property>

  <property>
    <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
    <value>false</value>
  </property>

  <!-- Bind to all interfaces -->
  <property>
    <name>dfs.namenode.rpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-bind-host</name>
    <value>0.0.0.0</value>
  </property>
  <!-- /Bind to all interfaces -->
{{- range $key, $value := index .Values.conf "hdfsSite" }}
  <property><name>{{ $key }}</name><value>{{ $value }}</value></property>
{{- end }}
</configuration>
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
hadoop httpfs secret
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
{{- range $key, $value := index .Values.conf "httpfsSite" }}
  <property><name>{{ $key }}</name><value>{{ $value }}</value></property>
{{- end }}
</configuration>
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
</configuration>
