charts/hdfs/README.md
62 additions & 46 deletions
@@ -1,38 +1,36 @@
 
-# Hadoop Chart
+# HDFS Chart
 
-** This is the readme from the original hadoop helm chart (https://github.com/helm/charts/tree/master/stable/hadoop) **
+** This chart is a modification of the original (https://github.com/helm/charts/tree/master/stable/hadoop) **
 ** This version removes yarn manager and provides advanced hadoop configuration through env variables **
 
-[Hadoop](https://hadoop.apache.org/) is a framework for running large scale distributed applications.
-
-This chart is primarily intended to be used for YARN and MapReduce job execution where HDFS is just used as a means to transport small artifacts within the framework and not for a distributed filesystem. Data should be read from cloud based datastores such as Google Cloud Storage, S3 or Swift.
-
-## Chart Details
+[Hadoop HDFS](https://hadoop.apache.org/) is a distributed file system designed to run on commodity hardware.
 
 ## Installing the Chart
 
-To install the chart with the release name `hadoop` that utilizes 50% of the available node resources:
-> Note that you need at least 2GB of free memory per NodeManager pod; if your cluster isn't large enough, not all pods will be scheduled.
+To install the chart with the release name `hdfs`:
-
-The optional [`calc_resources.sh`](./tools/calc_resources.sh) script is used as a convenience helper to set the `yarn.numNodes` and `yarn.nodeManager.resources` appropriately to utilize all nodes in the Kubernetes cluster and a given percentage of their resources. For example, with a 3 node `n1-standard-4` GKE cluster and an argument of `50`, this would create 3 NodeManager pods claiming 2 cores and 7.5Gi of memory.
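The `helm install` invocation itself is not visible in this extract; a minimal sketch of what it would look like, assuming the chart lives in a local checkout under `./charts/hdfs` (the chart path is an assumption based on the file location, not confirmed by the source):

```shell
# Install the chart from a local checkout under the release name "hdfs".
# The path ./charts/hdfs is illustrative, not taken from the source.
helm install hdfs ./charts/hdfs

# Inspect the release after installation.
helm status hdfs
```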
 |`image.pullPolicy`| Pull policy for the images |`IfNotPresent`|
-|`hadoopVersion`| Version of hadoop libraries being used |`2.9.0`|
-|`antiAffinity`| Pod antiaffinity, `hard` or `soft` |`hard`|
-|`hdfs.nameNode.pdbMinAvailable`| PDB for HDFS NameNode |`1`|
-|`hdfs.nameNode.resources`| Resources for the HDFS NameNode |`requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m`|
-|`hdfs.dataNode.replicas`| Number of HDFS DataNode replicas |`1`|
-|`hdfs.dataNode.pdbMinAvailable`| PDB for HDFS DataNode |`1`|
-|`hdfs.dataNode.resources`| Resources for the HDFS DataNode |`requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m`|
-|`yarn.resourceManager.pdbMinAvailable`| PDB for the YARN ResourceManager |`1`|
-|`yarn.resourceManager.resources`| Resources for the YARN ResourceManager |`requests:memory=256Mi,cpu=10m,limits:memory=2048Mi,cpu=1000m`|
-|`yarn.nodeManager.pdbMinAvailable`| PDB for the YARN NodeManager |`1`|
-|`yarn.nodeManager.replicas`| Number of YARN NodeManager replicas |`2`|
-|`yarn.nodeManager.parallelCreate`| Create all NodeManager statefulset pods in parallel (K8s 1.7+) |`false`|
-|`yarn.nodeManager.resources`| Resource limits and requests for YARN NodeManager pods |`requests:memory=2048Mi,cpu=1000m,limits:memory=2048Mi,cpu=1000m`|
 |`persistence.dataNode.storageClass`| Name of the StorageClass to use per your volume provider |`-`|
-|`persistence.dataNode.accessMode`| Access mode for the volume |`ReadWriteOnce`|
-|`persistence.dataNode.size`| Size of the volume |`200Gi`|
-
-## Related charts
-
-The [Zeppelin Notebook](https://github.com/kubernetes/charts/tree/master/stable/zeppelin) chart can use the hadoop config for the hadoop cluster and use the YARN executor:
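Parameters like those in the table above are overridden the usual Helm way, via a values file or `--set` flags. A hedged sketch of a values override, using only parameter names that appear in the table as context lines (the chosen values are illustrative, not defaults confirmed by the source):

```yaml
# custom-values.yaml -- illustrative overrides; the dotted parameter names
# from the table map to nested keys, per standard Helm convention.
image:
  pullPolicy: Always
persistence:
  dataNode:
    storageClass: standard
```

This would be applied with something like `helm install hdfs ./charts/hdfs -f custom-values.yaml` (release name and chart path assumed, as above).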