Commit ae40360

hdfs add README.md

1 parent c53677e commit ae40360

File tree

2 files changed: +68 −46 lines changed

.github/workflows/release.yml

Lines changed: 6 additions & 0 deletions

```diff
@@ -22,6 +22,12 @@ jobs:
         with:
           version: 'v2.16.3'
+      - name: Install Helm repositories
+        run: |
+          helm repo add gradiant https://gradiant.github.io/charts/
+          helm repo add stable https://kubernetes-charts.storage.googleapis.com
+          helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com
+
       - name: Run chart-releaser
         uses: helm/chart-releaser-action@v1.0.0-alpha.2
         env:
```
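These `helm repo add` lines are presumably there so that the release job can resolve chart dependencies before packaging. A hypothetical `requirements.yaml` (Helm v2 dependency format; the `zookeeper` entry is illustrative, not taken from this repository) that would rely on the repos added above:

```yaml
# Hypothetical requirements.yaml for a chart in this repo.
# Dependencies hosted in the "incubator" repo can only be fetched by
# `helm dependency build` if the repo was added first, as the workflow does.
dependencies:
  - name: zookeeper
    version: "2.1.0"
    repository: https://kubernetes-charts-incubator.storage.googleapis.com
```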

charts/hdfs/README.md

Lines changed: 62 additions & 46 deletions

````diff
@@ -1,38 +1,36 @@
 
-# Hadoop Chart
+# HDFS Chart
 
-** This is the readme from the original hadoop helm chart (https://github.com/helm/charts/tree/master/stable/hadoop) **
+** This chart is a modification of the original (https://github.com/helm/charts/tree/master/stable/hadoop) **
 ** This version removes yarn manager and provides advanced hadoop configuration through env variables **
 
-[Hadoop](https://hadoop.apache.org/) is a framework for running large scale distributed applications.
-
-This chart is primarily intended to be used for YARN and MapReduce job execution where HDFS is just used as a means to transport small artifacts within the framework and not for a distributed filesystem. Data should be read from cloud-based datastores such as Google Cloud Storage, S3 or Swift.
-
-## Chart Details
+[Hadoop HDFS](https://hadoop.apache.org/) is a distributed file system designed to run on commodity hardware.
 
 ## Installing the Chart
 
-To install the chart with the release name `hadoop` that utilizes 50% of the available node resources:
+Add the gradiant helm repo:
 
 ```
-$ helm install --name hadoop $(stable/hadoop/tools/calc_resources.sh 50) stable/hadoop
+helm repo add gradiant https://gradiant.github.io/charts
 ```
 
-> Note that you need at least 2GB of free memory per NodeManager pod; if your cluster isn't large enough, not all pods will be scheduled.
+To install the chart with the release name `hdfs`:
 
-The optional [`calc_resources.sh`](./tools/calc_resources.sh) script is used as a convenience helper to set `yarn.numNodes` and `yarn.nodeManager.resources` appropriately to utilize all nodes in the Kubernetes cluster and a given percentage of their resources. For example, with a 3-node `n1-standard-4` GKE cluster and an argument of `50`, this would create 3 NodeManager pods claiming 2 cores and 7.5Gi of memory.
+```
+$ helm install --name hdfs gradiant/hdfs
+```
 
 ### Persistence
 
 To install the chart with persistent volumes:
 
 ```
-$ helm install --name hadoop $(stable/hadoop/tools/calc_resources.sh 50) \
+$ helm install --name hdfs \
   --set persistence.nameNode.enabled=true \
   --set persistence.nameNode.storageClass=standard \
   --set persistence.dataNode.enabled=true \
   --set persistence.dataNode.storageClass=standard \
-  stable/hadoop
+  gradiant/hdfs
 ```
 
 > Change the value of `storageClass` to match your volume driver. `standard` works for Google Container Engine clusters.
````
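The same persistence settings can be kept in a values file instead of repeated `--set` flags. A sketch, using the chart's `persistence` keys (the filename `values-persistence.yaml` is just an example):

```yaml
# values-persistence.yaml -- equivalent to the --set flags above.
# "standard" is only an example storageClass; match it to your volume driver.
persistence:
  nameNode:
    enabled: true
    storageClass: standard
  dataNode:
    enabled: true
    storageClass: standard
```

It would then be passed with `helm install --name hdfs -f values-persistence.yaml gradiant/hdfs` (Helm v2 syntax, matching the README's use of `--name`).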
````diff
@@ -41,40 +39,58 @@
 The following table lists the configurable parameters of the Hadoop chart and their default values.
 
-| Parameter | Description | Default |
-| --------- | ----------- | ------- |
-| `image.repository` | Hadoop image ([source](https://github.com/Comcast/kube-yarn/tree/master/image)) | `danisla/hadoop` |
-| `image.tag` | Hadoop image tag | `2.9.0` |
-| `image.pullPolicy` | Pull policy for the images | `IfNotPresent` |
-| `hadoopVersion` | Version of hadoop libraries being used | `2.9.0` |
-| `antiAffinity` | Pod antiaffinity, `hard` or `soft` | `hard` |
-| `hdfs.nameNode.pdbMinAvailable` | PDB for HDFS NameNode | `1` |
-| `hdfs.nameNode.resources` | Resources for the HDFS NameNode | `requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m` |
-| `hdfs.dataNode.replicas` | Number of HDFS DataNode replicas | `1` |
-| `hdfs.dataNode.pdbMinAvailable` | PDB for HDFS DataNode | `1` |
-| `hdfs.dataNode.resources` | Resources for the HDFS DataNode | `requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m` |
-| `yarn.resourceManager.pdbMinAvailable` | PDB for the YARN ResourceManager | `1` |
-| `yarn.resourceManager.resources` | Resources for the YARN ResourceManager | `requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m` |
-| `yarn.nodeManager.pdbMinAvailable` | PDB for the YARN NodeManager | `1` |
-| `yarn.nodeManager.replicas` | Number of YARN NodeManager replicas | `2` |
-| `yarn.nodeManager.parallelCreate` | Create all NodeManager statefulset pods in parallel (K8s 1.7+) | `false` |
-| `yarn.nodeManager.resources` | Resource limits and requests for YARN NodeManager pods | `requests:memory=2048Mi,cpu=1000m,limits:memory=2048Mi,cpu=1000m` |
-| `persistence.nameNode.enabled` | Enable/disable persistent volume | `false` |
-| `persistence.nameNode.storageClass` | Name of the StorageClass to use per your volume provider | `-` |
-| `persistence.nameNode.accessMode` | Access mode for the volume | `ReadWriteOnce` |
-| `persistence.nameNode.size` | Size of the volume | `50Gi` |
-| `persistence.dataNode.enabled` | Enable/disable persistent volume | `false` |
-| `persistence.dataNode.storageClass` | Name of the StorageClass to use per your volume provider | `-` |
-| `persistence.dataNode.accessMode` | Access mode for the volume | `ReadWriteOnce` |
-| `persistence.dataNode.size` | Size of the volume | `200Gi` |
-
-## Related charts
-
-The [Zeppelin Notebook](https://github.com/kubernetes/charts/tree/master/stable/zeppelin) chart can use the hadoop config for the hadoop cluster and use the YARN executor:
+## Chart Values
+
+| Key | Type | Default | Description |
+|-----|------|---------|-------------|
+| antiAffinity | string | `"soft"` | |
+| conf.coreSite | string | `nil` | |
+| conf.hdfsSite."dfs.replication" | int | `3` | |
+| dataNode.pdbMinAvailable | int | `1` | |
+| dataNode.replicas | int | `1` | |
+| dataNode.resources.limits.cpu | string | `"1000m"` | |
+| dataNode.resources.limits.memory | string | `"2048Mi"` | |
+| dataNode.resources.requests.cpu | string | `"10m"` | |
+| dataNode.resources.requests.memory | string | `"256Mi"` | |
+| hadoopVersion | string | `"2.7.7"` | |
+| httpfs.adminPort | int | `14001` | |
+| httpfs.port | int | `14000` | |
+| image.pullPolicy | string | `"IfNotPresent"` | |
+| image.repository | string | `"gradiant/hadoop-base"` | |
+| image.tag | string | `"2.7.7"` | |
+| ingress.dataNode.annotations | object | `{}` | |
+| ingress.dataNode.enabled | bool | `false` | |
+| ingress.dataNode.hosts[0] | string | `"hdfs-datanode.local"` | |
+| ingress.dataNode.labels | object | `{}` | |
+| ingress.dataNode.path | string | `"/"` | |
+| ingress.httpfs.annotations | object | `{}` | |
+| ingress.httpfs.enabled | bool | `false` | |
+| ingress.httpfs.hosts[0] | string | `"httpfs.local"` | |
+| ingress.httpfs.labels | object | `{}` | |
+| ingress.httpfs.path | string | `"/"` | |
+| ingress.nameNode.annotations | object | `{}` | |
+| ingress.nameNode.enabled | bool | `false` | |
+| ingress.nameNode.hosts[0] | string | `"hdfs-namenode.local"` | |
+| ingress.nameNode.labels | object | `{}` | |
+| ingress.nameNode.path | string | `"/"` | |
+| nameNode.pdbMinAvailable | int | `1` | |
+| nameNode.port | int | `8020` | |
+| nameNode.resources.limits.cpu | string | `"1000m"` | |
+| nameNode.resources.limits.memory | string | `"2048Mi"` | |
+| nameNode.resources.requests.cpu | string | `"10m"` | |
+| nameNode.resources.requests.memory | string | `"256Mi"` | |
+| persistence.dataNode.accessMode | string | `"ReadWriteOnce"` | |
+| persistence.dataNode.enabled | bool | `false` | |
+| persistence.dataNode.size | string | `"200Gi"` | |
+| persistence.dataNode.storageClass | string | `nil` | |
+| persistence.nameNode.accessMode | string | `"ReadWriteOnce"` | |
+| persistence.nameNode.enabled | bool | `false` | |
+| persistence.nameNode.size | string | `"50Gi"` | |
+| persistence.nameNode.storageClass | string | `nil` | |
+| prometheus.exporter.enabled | bool | `true` | |
+| prometheus.exporter.image | string | `"marcelmay/hadoop-hdfs-fsimage-exporter:1.2"` | |
+| prometheus.exporter.port | int | `5556` | |
 
-```
-helm install --set hadoop.useConfigMap=true stable/zeppelin
-```
 
 # References
 
````
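For reference, in-cluster clients reach the NameNode on `nameNode.port` (default `8020` in the table above). A minimal sketch assembling the connection URI, assuming a hypothetical Service named `<release>-namenode` (the real name is chart-defined; check `kubectl get svc`):

```python
# Sketch: build the in-cluster HDFS URI for a release named "hdfs".
# The "<release>-namenode" service name is an assumption, not taken from
# the chart; nameNode.port=8020 is the chart default from the values table.
release = "hdfs"
namenode_port = 8020  # chart default: nameNode.port

hdfs_uri = f"hdfs://{release}-namenode:{namenode_port}"
print(hdfs_uri)  # hdfs://hdfs-namenode:8020
```

The same host/port pair is what would go into a client's `fs.defaultFS` setting in `core-site.xml`.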
