Commit 4c59743

committed
freshness145
1 parent 5bde17f commit 4c59743


1 file changed

+44
-46
lines changed

articles/hdinsight/hdinsight-hadoop-giraph-install-linux.md

Lines changed: 44 additions & 46 deletions
Original file line number | Diff line number | Diff line change
@@ -6,14 +6,14 @@ ms.author: hrasheed
66
ms.reviewer: jasonh
77
ms.service: hdinsight
88
ms.topic: conceptual
9-
ms.date: 04/22/2019
9+
ms.date: 12/26/2019
1010
---
1111

1212
# Install Apache Giraph on HDInsight Hadoop clusters, and use Giraph to process large-scale graphs
1313

1414
Learn how to install Apache Giraph on an HDInsight cluster. The script action feature of HDInsight allows you to customize your cluster by running a bash script. Scripts can be used to customize clusters during and after cluster creation.
1515

16-
## <a name="whatis"></a>What is Giraph
16+
## What is Giraph
1717

1818
[Apache Giraph](https://giraph.apache.org/) allows you to perform graph processing by using Hadoop, and can be used with Azure HDInsight. Graphs model relationships between objects. For example, the connections between routers on a large network like the Internet, or relationships between people on social networks. Graph processing allows you to reason about the relationships between objects in a graph, such as:
1919

@@ -28,20 +28,17 @@ Learn how to install Apache Giraph on an HDInsight cluster. The script action fe
2828
>
2929
> Custom components, such as Giraph, receive commercially reasonable support to help you to further troubleshoot the issue. Microsoft Support may be able to resolve the issue. If not, you must consult open source communities where deep expertise for that technology is found. For example, there are many community sites that can be used, like: [MSDN forum for HDInsight](https://social.msdn.microsoft.com/Forums/azure/en-US/home?forum=hdinsight), [https://stackoverflow.com](https://stackoverflow.com). Also, Apache projects have project sites on [https://apache.org](https://apache.org), for example: [Hadoop](https://hadoop.apache.org/).
3030
31-
3231
## What the script does
3332

3433
This script performs the following actions:
3534

36-
* Installs Giraph to `/usr/hdp/current/giraph`
37-
38-
* Copies the `giraph-examples.jar` file to default storage (WASB) for your cluster: `/example/jars/giraph-examples.jar`
35+
* Installs Giraph to `/usr/hdp/current/giraph`.
3936

40-
## <a name="install"></a>Install Giraph using Script Actions
37+
* Copies the `giraph-examples.jar` file to default storage (WASB) for your cluster: `/example/jars/giraph-examples.jar`.
4138

42-
A sample script to install Giraph on an HDInsight cluster is available at the following location:
39+
## Install Giraph using Script Actions
4340

44-
https://hdiconfigactions.blob.core.windows.net/linuxgiraphconfigactionv01/giraph-installer-v01.sh
41+
A sample script to install Giraph on an HDInsight cluster is available at `https://hdiconfigactions.blob.core.windows.net/linuxgiraphconfigactionv01/giraph-installer-v01.sh`.
4542

4643
This section provides instructions on how to use the sample script while creating the cluster by using the Azure portal.
4744

@@ -54,38 +51,34 @@ This section provides instructions on how to use the sample script while creatin
5451
>
5552
> You can also apply script actions to already running clusters. For more information, see [Customize HDInsight clusters with Script Actions](hdinsight-hadoop-customize-cluster-linux.md).
5653
57-
1. Start creating a cluster by using the steps in [Create Linux-based HDInsight clusters](hdinsight-hadoop-create-linux-clusters-portal.md), but do not complete creation.
58-
59-
2. In the **Optional Configuration** section, select **Script Actions**, and provide the following information:
60-
61-
* **NAME**: Enter a friendly name for the script action.
62-
63-
* **SCRIPT URI**: https://hdiconfigactions.blob.core.windows.net/linuxgiraphconfigactionv01/giraph-installer-v01.sh
54+
1. Start creating a cluster by using the steps in [Create Linux-based HDInsight clusters](hdinsight-hadoop-create-linux-clusters-portal.md), but don't complete creation. You'll need to use the **classic create experience** and **Custom(size, settings, apps)**.
6455

65-
* **HEAD**: Check this entry.
56+
1. In the **Cluster size** section, ensure **Number of Worker nodes** is at least 2, for this example.
6657

67-
* **WORKER**: Leave this entry unchecked.
58+
1. In the **Script actions** section, provide the following information:
6859

69-
* **ZOOKEEPER**: Leave this entry unchecked.
60+
|Property |Value |
61+
|---|---|
62+
|Script type|- Custom|
63+
|Name|Install Giraph|
64+
|Bash script URI|`https://hdiconfigactions.blob.core.windows.net/linuxgiraphconfigactionv01/giraph-installer-v01.sh`|
65+
|Node type(s)|Head|
66+
|Parameters|Leave blank|
7067

71-
* **PARAMETERS**: Leave this field blank.
68+
For more information, see [Use a script action during cluster creation](./hdinsight-hadoop-customize-cluster-linux.md#use-a-script-action-during-cluster-creation).
7269

73-
3. At the bottom of the **Script Actions**, use the **Select** button to save the configuration. Finally, use the **Select** button at the bottom of the **Optional Configuration** section to save the optional configuration information.
70+
1. Continue creating the cluster as described in [Create Linux-based HDInsight clusters](hdinsight-hadoop-create-linux-clusters-portal.md).
7471

75-
4. Continue creating the cluster as described in [Create Linux-based HDInsight clusters](hdinsight-hadoop-create-linux-clusters-portal.md).
76-
77-
## <a name="usegiraph"></a>How do I use Giraph in HDInsight?
72+
## How do I use Giraph in HDInsight?
7873

7974
Once the cluster has been created, use the following steps to run the SimpleShortestPathsComputation example included with Giraph. This example uses the basic [Pregel](https://people.apache.org/~edwardyoon/documents/pregel.pdf) implementation for finding the shortest path between objects in a graph.
8075

81-
1. Connect to the HDInsight cluster using SSH:
76+
1. Use [ssh command](./hdinsight-hadoop-linux-use-ssh-unix.md) to connect to your cluster. Edit the command below by replacing CLUSTERNAME with the name of your cluster, and then enter the command:
8277

83-
```bash
84-
ssh USERNAME@CLUSTERNAME-ssh.azurehdinsight.net
78+
```cmd
79+
ssh sshuser@CLUSTERNAME-ssh.azurehdinsight.net
8580
```
8681
87-
For information, see [Use SSH with HDInsight](hdinsight-hadoop-linux-use-ssh-unix.md).
88-
8982
2. Use the following command to create a file named **tiny_graph.txt**:
9083
9184
```bash
@@ -122,42 +115,47 @@ Once the cluster has been created, use the following steps to run the SimpleShor
122115
yarn jar /usr/hdp/current/giraph/giraph-examples.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -ca mapred.job.tracker=headnodehost:9010 -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /example/data/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /example/output/shortestpaths -w 2
123116
```
124117
118+
> [!IMPORTANT]
119+
> The value passed to `-w` must be less than or equal to the actual number of worker nodes.
120+
125121
The parameters used with this command are described in the following table:
126122
127123
| Parameter | What it does |
128124
| --- | --- |
129-
| `jar` |The jar file containing the examples. |
130-
| `org.apache.giraph.GiraphRunner` |The class used to start the examples. |
131-
| `org.apache.giraph.examples.SimpleShortestPathsCoputation` |The example that is used. In this example, it computes the shortest path between ID 1 and all other IDs in the graph. |
132-
| `-ca mapred.job.tracker` |The headnode for the cluster. |
133-
| `-vif` |The input format to use for the input data. |
134-
| `-vip` |The input data file. |
135-
| `-vof` |The output format. In this example, ID and value as plain text. |
136-
| `-op` |The output location. |
137-
| `-w 2` |The number of workers to use. In this example, 2. |
125+
| jar |The jar file containing the examples. |
126+
| org.apache.giraph.GiraphRunner |The class used to start the examples. |
127+
| org.apache.giraph.examples.SimpleShortestPathsComputation |The example that is used. In this example, it computes the shortest path between ID 1 and all other IDs in the graph. |
128+
| -ca mapred.job.tracker |The headnode for the cluster. |
129+
| -vif |The input format to use for the input data. |
130+
| -vip |The input data file. |
131+
| -vof |The output format. In this example, ID and value as plain text. |
132+
| -op |The output location. |
133+
| -w 2 |The number of workers to use. In this example, 2. |
138134
139135
For more information on these, and other parameters used with Giraph samples, see the [Giraph quickstart](https://giraph.apache.org/quick_start.html).
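The constraint on `-w` can be checked before submitting the job. The following is a minimal sketch, not part of Giraph or HDInsight: `check_workers` is a hypothetical helper, and `worker_nodes` is an assumed value that you would set to your cluster's actual worker node count.

```bash
# Hypothetical helper (not a Giraph or HDInsight command): succeeds only when
# the requested worker count ($1) is less than or equal to the number of
# worker nodes available in the cluster ($2).
check_workers() {
  [ "$1" -le "$2" ]
}

requested_workers=2   # the value you plan to pass to -w
worker_nodes=2        # assumption: replace with your cluster's worker node count

if check_workers "$requested_workers" "$worker_nodes"; then
  echo "OK: -w $requested_workers fits on $worker_nodes worker nodes"
else
  echo "Error: -w $requested_workers exceeds $worker_nodes worker nodes" >&2
fi
```

If the check passes, run the `yarn jar` command shown earlier with the same value for `-w`.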
140136
141-
6. Once the job has finished, the results are stored in the **/example/out/shortestpaths** directory. The output file names begin with **part-m-** and end with a number indicating the first, second, etc. file. Use the following command to view the output:
137+
6. Once the job has finished, the results are stored in the **/example/output/shortestpaths** directory. The output file names begin with **part-m-** and end with a number indicating the first, second, and so on, file. Use the following command to view the output:
142138
143139
```bash
144140
hdfs dfs -text /example/output/shortestpaths/*
145141
```
146142
147143
The output appears similar to the following text:
148144
149-
0 1.0
150-
4 5.0
151-
2 2.0
152-
1 0.0
153-
3 1.0
145+
```output
146+
0 1.0
147+
4 5.0
148+
2 2.0
149+
1 0.0
150+
3 1.0
151+
```
154152
155153
The SimpleShortestPathsComputation example is hard-coded to start with object ID 1 and find the shortest path to other objects. The output is in the format of `destination_id` and `distance`. The `distance` is the value (or weight) of the edges traveled between object ID 1 and the target ID.
156154
157-
Visualizing this data, you can verify the results by traveling the shortest paths between ID 1 and all other objects. The shortest path between ID 1 and ID 4 is 5. This value is the total distance between <span style="color:orange">ID 1 and 3</span>, and then <span style="color:red">ID 3 and 4</span>.
155+
Visualizing this data, you can verify the results by traveling the shortest paths between ID 1 and all other objects. The shortest path between ID 1 and ID 4 is 5. This value is the total distance between ID 1 and 3, and then ID 3 and 4.
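To check individual results without reading the raw listing, the `destination_id` and `distance` pairs can be sorted or filtered with standard tools. The following sketch uses the sample output shown above. The names `sample_output`, `sorted_by_id`, and `distance_to` are hypothetical and introduced only for illustration.

```bash
# Sample output copied from the example above: one "destination_id distance"
# pair per line.
sample_output='0 1.0
4 5.0
2 2.0
1 0.0
3 1.0'

# Hypothetical helper: sort the pairs numerically by destination ID.
sorted_by_id() {
  sort -n -k1,1
}

# Hypothetical helper: print the distance recorded for one destination ID.
distance_to() {
  awk -v id="$1" '$1 == id { print $2 }'
}

printf '%s\n' "$sample_output" | sorted_by_id
printf '%s\n' "$sample_output" | distance_to 4
```

On a cluster, the same helpers can read the job output directly, for example `hdfs dfs -text /example/output/shortestpaths/* | distance_to 4`. For the sample data, this prints `5.0`, matching the shortest path between ID 1 and ID 4 described above.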
158156
159157
![Drawing of objects as circles with shortest paths drawn between](./media/hdinsight-hadoop-giraph-install-linux/hdinsight-giraph-graph-out.png)
160158
161159
## Next steps
162160
163-
* [Install and use Hue on HDInsight clusters](hdinsight-hadoop-hue-linux.md).
161+
[Install and use Hue on HDInsight clusters](hdinsight-hadoop-hue-linux.md).
