articles/hdinsight/hbase/apache-troubleshoot-hbase.md
Lines changed: 9 additions & 85 deletions
@@ -96,89 +96,6 @@ It can take up to five minutes for the HBase Master service to stabilize and fin
96
96
97
97
When the SYSTEM.CATALOG table is back to normal, the connectivity issue to Phoenix should be automatically resolved.
98
98
99
-
100
-
## What causes a master server to fail to start?
101
-
102
-
### Error
103
-
104
-
An atomic renaming failure occurs.
105
-
106
-
### Detailed description
107
-
108
-
During the startup process, HMaster completes many initialization steps. These include moving data from the scratch (.tmp) folder to the data folder. HMaster also looks at the write-ahead logs (WALs) folder to see if there are any unresponsive region servers, and so on.
109
-
110
-
During startup, HMaster does a basic `list` command on these folders. If at any time, HMaster sees an unexpected file in any of these folders, it throws an exception and doesn't start.
111
-
112
-
### Probable cause
113
-
114
-
In the region server logs, try to identify the timeline of the file creation, and then see if there was a process crash around the time the file was created. (Contact HBase support to assist you in doing this.) This helps us provide more robust mechanisms, so that you can avoid hitting this bug, and ensure graceful process shutdowns.
115
-
116
-
### Resolution steps
117
-
118
-
Check the call stack and try to determine which folder might be causing the problem (for instance, it might be the WALs folder or the .tmp folder). Then, in Cloud Explorer or by using HDFS commands, try to locate the problem file. Usually, this is a \*-renamePending.json file. (The \*-renamePending.json file is a journal file that's used to implement the atomic rename operation in the WASB driver. Due to bugs in this implementation, these files can be left over after process crashes, and so on.) Force-delete this file either in Cloud Explorer or by using HDFS commands.
119
-
120
-
Sometimes, there might also be a temporary file named something like *$$$.$$$* at this location. You have to use HDFS `ls` command to see this file; you cannot see the file in Cloud Explorer. To delete this file, use the HDFS command `hdfs dfs -rm /\<path>\/\$\$\$.\$\$\$`.
121
-
122
-
After you've run these commands, HMaster should start immediately.
123
-
124
-
### Error
125
-
126
-
No server address is listed in *hbase: meta* for region xxx.
127
-
128
-
### Detailed description
129
-
130
-
You might see a message on your Linux cluster that indicates that the *hbase: meta* table is not online. Running `hbck` might report that "hbase: meta table replicaId 0 is not found on any region." The problem might be that HMaster could not initialize after you restarted HBase. In the HMaster logs, you might see the message: "No server address listed in hbase: meta for region hbase: backup \<region name\>".
131
-
132
-
### Resolution steps
133
-
134
-
1. In the HBase shell, enter the following commands (change actual values as applicable):
[Unable to process the HBase table](https://stackoverflow.com/questions/4794092/unable-to-access-hbase-table)
157
-
158
-
159
-
### Error
160
-
161
-
HMaster times out with a fatal exception similar to "java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned."
162
-
163
-
### Detailed description
164
-
165
-
You might experience this issue if you have many tables and regions that have not been flushed when you restart your HMaster services. Restart might fail, and you'll see the preceding error message.
166
-
167
-
### Probable cause
168
-
169
-
This is a known issue with the HMaster service. General cluster startup tasks can take a long time. HMaster shuts down because the namespace table isn’t yet assigned. This occurs only in scenarios in which large amount of unflushed data exists, and a timeout of five minutes is not sufficient.
170
-
171
-
### Resolution steps
172
-
173
-
1. In the Apache Ambari UI, go to **HBase** > **Configs**. In the custom hbase-site.xml file, add the following setting:
2. Restart the required services (HMaster, and possibly other HBase services).
180
-
181
-
182
99
## What causes a restart failure on a region server?
183
100
184
101
### Issue
@@ -256,5 +173,12 @@ Here's what's happening behind the scenes:
256
173
sudo su - hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh start regionserver"
257
174
```
258
175
259
-
### See also
260
-
[Troubleshoot by using Azure HDInsight](../../hdinsight/hdinsight-troubleshoot-guide.md)
176
+
## Next steps
177
+
178
+
If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
179
+
180
+
* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
181
+
182
+
* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.
183
+
184
+
* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).
articles/hdinsight/hbase/hbase-troubleshoot-start-fails.md
Lines changed: 24 additions & 20 deletions
@@ -5,7 +5,7 @@ ms.service: hdinsight
5
5
ms.topic: troubleshooting
6
6
author: hrasheed-msft
7
7
ms.author: hrasheed
8
-
ms.date: 08/06/2019
8
+
ms.date: 08/14/2019
9
9
---
10
10
11
11
# Apache HBase Master (HMaster) fails to start in Azure HDInsight
@@ -20,42 +20,46 @@ Unexpected files identified during startup process.
20
20
21
21
### Cause
22
22
23
-
During the startup process, HMaster performs many initialization steps, including moving data from scratch (.tmp) folder to data folder. HMaster also looks at WALs (Write Ahead Logs) folder to see if there are any dead region servers. During all these situations, it does a basic `list` command on these folders. If at any time it sees an unexpected file in any of these folders, it will throw an exception and hence not start.
23
+
During the startup process, HMaster performs many initialization steps, including moving data from scratch (.tmp) folder to data folder. HMaster also looks at the write-ahead logs (WAL) folder to see if there are any unresponsive region servers.
24
+
25
+
HMaster does a basic list command on the WAL folders. If at any time, HMaster sees an unexpected file in any of these folders, it throws an exception and doesn't start.
24
26
25
27
### Resolution
26
28
27
-
In such a situation, check the call stack to see which folder might be causing problem (for instance is it WALs folder or .tmp folder). Then via Cloud Explorer or via hdfs commands to locate the problem file. The problem file is usually a `*-renamePending.json` file (a journal file used to implement Atomic Rename operation in WASB driver). Due to bugs in this implementation, such files can be left over in cases of process crash. Force delete this file via Cloud Explorer. In addition, there might be a temporary file of the nature $ in this location. The file cannot be seen via cloud explorer and only via hdfs `ls` command. You can use hdfs command `hdfs dfs -rm //\$\$\$.\$\$\$` to delete this file.
29
+
Check the call stack and try to determine which folder might be causing the problem (for instance, it might be the WAL folder or the .tmp folder). Then, in Cloud Explorer or by using HDFS commands, try to locate the problem file. Usually, this is a `*-renamePending.json` file. (The `*-renamePending.json` file is a journal file that's used to implement the atomic rename operation in the WASB driver. Due to bugs in this implementation, these files can be left over after process crashes, and so on.) Force-delete this file either in Cloud Explorer or by using HDFS commands.
30
+
31
+
Sometimes, there might also be a temporary file named something like `$$$.$$$` at this location. You have to use HDFS `ls` command to see this file; you cannot see the file in Cloud Explorer. To delete this file, use the HDFS command `hdfs dfs -rm /\<path>\/\$\$\$.\$\$\$`.
28
32
29
-
Once the problem file has been removed, HMaster should start up immediately.
33
+
After you've run these commands, HMaster should start immediately.
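As a rough illustration of the resolution above, the following shell sketch filters an `hdfs dfs -ls` listing down to the two kinds of leftover files the text mentions. The helper name `find_leftovers` is hypothetical and not part of HDFS or HBase, and `<path>` remains a placeholder for whichever folder the call stack points to. It assumes the usual `hdfs dfs -ls` layout, in which the path is the last field on each line, and paths without spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an HDFS or HBase command): reads an `hdfs dfs -ls`
# listing on stdin and prints only the leftover candidates described above,
# that is, *-renamePending.json journal files and the literal $$$.$$$ temp file.
find_leftovers() {
  # Assumes the path is the last whitespace-separated field of each line.
  awk '{ print $NF }' | grep -E -- '(-renamePending\.json|/\$\$\$\.\$\$\$)$'
}

# Intended use on a cluster node (replace <path> with the suspect folder):
#   hdfs dfs -ls <path> | find_leftovers
# Review the printed paths, then delete each one explicitly, for example:
#   hdfs dfs -rm '<path>/$$$.$$$'
```

The sketch only lists candidates and leaves deletion as a manual step, so nothing is removed before you have checked the paths against the call stack.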
30
34
31
35
---
32
36
33
37
## Scenario: No server address listed
34
38
35
39
### Issue
36
40
37
-
HMaster log shows an error message similar to "No server address listed in hbase: meta for region xxx."
41
+
You might see a message that indicates that the `hbase: meta` table is not online. Running `hbck` might report that `hbase: meta table replicaId 0 is not found on any region.` In the HMaster logs, you might see the message: `No server address listed in hbase: meta for region hbase: backup <region name>`.
38
42
39
43
### Cause
40
44
41
45
HMaster could not initialize after restarting HBase.
42
46
43
47
### Resolution
44
48
45
-
1.Execute the following commands on HBase shell (change actual values as applicable):
49
+
1.In the HBase shell, enter the following commands (change actual values as applicable):
@@ -65,29 +69,29 @@ HMaster could not initialize after restarting HBase.
65
69
66
70
### Issue
67
71
68
-
HMaster times out with fatal exception like `java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned`.
72
+
HMaster times out with fatal exception similar to: `java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned`.
69
73
70
74
### Cause
71
75
72
-
The time-out is a known defect with HMaster. General cluster startup tasks can take a long time. HMaster shuts down if the namespace table isn’t yet assigned. The lengthy startup tasks happen where large amount of unflushed data exists and a timeout of five minutes is not sufficient.
76
+
You might experience this issue if you have many tables and regions that have not been flushed when you restart your HMaster services. The time-out is a known defect with HMaster. General cluster startup tasks can take a long time. HMaster shuts down if the namespace table isn’t yet assigned. The lengthy startup tasks happen where large amount of unflushed data exists and a timeout of five minutes is not sufficient.
73
77
74
78
### Resolution
75
79
76
-
1. Access Ambari UI, go to HBase -> Configs, in custom `hbase-site.xml` add the following setting:
80
+
1. From the Apache Ambari UI, go to **HBase** > **Configs**. In the custom `hbase-site.xml` file, add the following setting:
1. Restart required services (Mainly HMaster and possibly other HBase services).
86
+
1. Restart the required services (HMaster, and possibly other HBase services).
83
87
84
88
---
85
89
86
-
## Scenario: Frequent regionserver restarts
90
+
## Scenario: Frequent region server restarts
87
91
88
92
### Issue
89
93
90
-
Nodes reboot periodically. From the regionserver logs you may see entries similar to:
94
+
Nodes reboot periodically. From the region server logs you may see entries similar to:
91
95
92
96
```
93
97
2017-05-09 17:45:07,683 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 31000ms
@@ -97,15 +101,15 @@ Nodes reboot periodically. From the regionserver logs you may see entries simila
97
101
98
102
### Cause
99
103
100
-
Long regionserver JVM GC pause. The pause will cause regionserver to be unresponsive and not able to send heart beat to HMaster within the zk session timeout 40s. HMaster will believe regionserver is dead and will abort the regionserver and restart.
104
+
Long `regionserver` JVM GC pause. The pause will cause `regionserver` to be unresponsive and not able to send heart beat to HMaster within the zk session timeout 40s. HMaster will believe `regionserver` is dead and will abort the `regionserver` and restart.
101
105
102
106
### Resolution
103
107
104
-
Change the zookeeper session timeout, not only hbase-site setting `zookeeper.session.timeout` but also zookeeper zoo.cfg setting `maxSessionTimeout` need to be changed.
108
+
Change the Zookeeper session timeout, not only `hbase-site` setting `zookeeper.session.timeout` but also Zookeeper `zoo.cfg` setting `maxSessionTimeout` need to be changed.
105
109
106
110
1. Access Ambari UI, go to **HBase -> Configs -> Settings**, in Timeouts section, change the value of Zookeeper Session Timeout.
107
111
108
-
1. Access Ambari UI, go to **Zookeeper -> Configs -> Custom** zoo.cfg, add/change the following setting. Make sure the value is the same as hbase `zookeeper.session.timeout`.
112
+
1. Access Ambari UI, go to **Zookeeper -> Configs -> Custom** `zoo.cfg`, add/change the following setting. Make sure the value is the same as HBase `zookeeper.session.timeout`.
articles/hdinsight/hdinsight-troubleshoot-guide.md
Lines changed: 3 additions & 3 deletions
@@ -4,16 +4,16 @@ description: Troubleshoot Apache Hadoop workloads by using Azure HDInsight. Step
4
4
author: hrasheed-msft
5
5
ms.author: hrasheed
6
6
ms.service: hdinsight
7
-
ms.topic: conceptual
8
-
ms.date: 05/29/2019
7
+
ms.topic: troubleshooting
8
+
ms.date: 08/14/2019
9
9
---
10
10
11
11
12
12
# Troubleshoot by using Azure HDInsight
13
13
14
14
| Apache workload | Top questions |
15
15
|---|---|
16
-
|<br>[Troubleshoot Apache HBase](hbase/apache-troubleshoot-hbase.md)|<br>[How do I run hbck command reports with multiple unassigned regions?](hbase/apache-troubleshoot-hbase.md#how-do-i-run-hbck-command-reports-with-multiple-unassigned-regions)<br><br>[How do I fix timeout issues when using hbck commands for region assignments?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-timeout-issues-with-hbck-commands-for-region-assignments)<br><br>[How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-jdbc-or-sqlline-connectivity-issues-with-apache-phoenix)<br><br>[What causes a master server to fail to start?](hbase/apache-troubleshoot-hbase.md#what-causes-a-master-server-to-fail-to-start)<br><br>[What causes a restart failure on a region server?](hbase/apache-troubleshoot-hbase.md#what-causes-a-restart-failure-on-a-region-server)|
16
+
|<br>[Troubleshoot Apache HBase](hbase/apache-troubleshoot-hbase.md)|<br>[How do I run hbck command reports with multiple unassigned regions?](hbase/apache-troubleshoot-hbase.md#how-do-i-run-hbck-command-reports-with-multiple-unassigned-regions)<br><br>[How do I fix timeout issues when using hbck commands for region assignments?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-timeout-issues-with-hbck-commands-for-region-assignments)<br><br>[How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-jdbc-or-sqlline-connectivity-issues-with-apache-phoenix)<br><br>[What causes a master server to fail to start?](hbase/hbase-troubleshoot-start-fails.md)<br><br>[What causes a restart failure on a region server?](hbase/apache-troubleshoot-hbase.md#what-causes-a-restart-failure-on-a-region-server)|
17
17
|<br>[Troubleshoot Apache Hadoop HDFS](hdinsight-troubleshoot-hdfs.md)|<br>[How do I access a local HDFS from inside a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-access-local-hdfs-from-inside-a-cluster)<br><br>[Local HDFS stuck in safe mode on Azure HDInsight cluster](hadoop/hdinsight-hdfs-troubleshoot-safe-mode.md)|
18
18
|<br>[Troubleshoot Apache Hive](hdinsight-troubleshoot-hive.md)|<br>[How do I export a Hive metastore and import it on another cluster?](hdinsight-troubleshoot-hive.md#how-do-i-export-a-hive-metastore-and-import-it-on-another-cluster)<br><br>[How do I locate Apache Hive logs on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-locate-hive-logs-on-a-cluster)<br><br>[How do I launch the Apache Hive shell with specific configurations on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-launch-the-hive-shell-with-specific-configurations-on-a-cluster)<br><br>[How do I analyze Apache Tez DAG data on a cluster-critical path?](hdinsight-troubleshoot-hive.md#how-do-i-analyze-tez-dag-data-on-a-cluster-critical-path)<br><br>[How do I download Apache Tez DAG data from a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-download-tez-dag-data-from-a-cluster)|
19
19
|<br>[Troubleshoot Apache Spark](hdinsight-troubleshoot-SPARK.md)|<br>[How do I configure an Apache Spark application by using Apache Ambari on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-ambari-on-clusters)<br><br>[How do I configure an Apache Spark application by using a Jupyter notebook on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-a-jupyter-notebook-on-clusters)<br><br>[How do I configure an Apache Spark application by using Apache Livy on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-livy-on-clusters)<br><br>[How do I configure an Apache Spark application by using spark-submit on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-spark-submit-on-clusters)<br><br>[How do I configure an Apache Spark application by using IntelliJ?](spark/apache-spark-intellij-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using Eclipse?](spark/apache-spark-eclipse-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using VSCode?](hdinsight-for-vscode.md)<br><br>[What causes an Apache Spark application OutOfMemoryError exception?](spark/apache-troubleshoot-spark.md#what-causes-an-apache-spark-application-outofmemoryerror-exception)|