Commit 6163c2f

Merge pull request #85432 from dagiro/ts_hbase12
ts_hbase12
2 parents 6e8f769 + c59cb1b commit 6163c2f

File tree

3 files changed

+36
-108
lines changed

articles/hdinsight/hbase/apache-troubleshoot-hbase.md

Lines changed: 9 additions & 85 deletions
Original file line numberDiff line numberDiff line change
@@ -96,89 +96,6 @@ It can take up to five minutes for the HBase Master service to stabilize and fin
9696

9797
When the SYSTEM.CATALOG table is back to normal, the connectivity issue to Phoenix should be automatically resolved.
9898

99-
100-
## What causes a master server to fail to start?
101-
102-
### Error
103-
104-
An atomic renaming failure occurs.
105-
106-
### Detailed description
107-
108-
During the startup process, HMaster completes many initialization steps. These include moving data from the scratch (.tmp) folder to the data folder. HMaster also looks at the write-ahead logs (WALs) folder to see if there are any unresponsive region servers, and so on.
109-
110-
During startup, HMaster does a basic `list` command on these folders. If at any time, HMaster sees an unexpected file in any of these folders, it throws an exception and doesn't start.
111-
112-
### Probable cause
113-
114-
In the region server logs, try to identify the timeline of the file creation, and then see if there was a process crash around the time the file was created. (Contact HBase support to assist you in doing this.) This helps us provide more robust mechanisms, so that you can avoid hitting this bug, and ensure graceful process shutdowns.
115-
116-
### Resolution steps
117-
118-
Check the call stack and try to determine which folder might be causing the problem (for instance, it might be the WALs folder or the .tmp folder). Then, in Cloud Explorer or by using HDFS commands, try to locate the problem file. Usually, this is a \*-renamePending.json file. (The \*-renamePending.json file is a journal file that's used to implement the atomic rename operation in the WASB driver. Due to bugs in this implementation, these files can be left over after process crashes, and so on.) Force-delete this file either in Cloud Explorer or by using HDFS commands.
119-
120-
Sometimes, there might also be a temporary file named something like *$$$.$$$* at this location. You have to use HDFS `ls` command to see this file; you cannot see the file in Cloud Explorer. To delete this file, use the HDFS command `hdfs dfs -rm /\<path>\/\$\$\$.\$\$\$`.
121-
122-
After you've run these commands, HMaster should start immediately.
123-
124-
### Error
125-
126-
No server address is listed in *hbase: meta* for region xxx.
127-
128-
### Detailed description
129-
130-
You might see a message on your Linux cluster that indicates that the *hbase: meta* table is not online. Running `hbck` might report that "hbase: meta table replicaId 0 is not found on any region." The problem might be that HMaster could not initialize after you restarted HBase. In the HMaster logs, you might see the message: "No server address listed in hbase: meta for region hbase: backup \<region name\>".
131-
132-
### Resolution steps
133-
134-
1. In the HBase shell, enter the following commands (change actual values as applicable):
135-
136-
```apache
137-
> scan 'hbase:meta'
138-
```
139-
140-
```apache
141-
> delete 'hbase:meta','hbase:backup <region name>','<column name>'
142-
```
143-
144-
2. Delete the *hbase: namespace* entry. This entry might be the same error that's being reported when the *hbase: namespace* table is scanned.
145-
146-
3. To bring up HBase in a running state, in the Ambari UI, restart the Active HMaster service.
147-
148-
4. In the HBase shell, to bring up all offline tables, run the following command:
149-
150-
```apache
151-
hbase hbck -ignorePreCheckPermission -fixAssignments
152-
```
153-
154-
### Additional reading
155-
156-
[Unable to process the HBase table](https://stackoverflow.com/questions/4794092/unable-to-access-hbase-table)
157-
158-
159-
### Error
160-
161-
HMaster times out with a fatal exception similar to "java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned."
162-
163-
### Detailed description
164-
165-
You might experience this issue if you have many tables and regions that have not been flushed when you restart your HMaster services. Restart might fail, and you'll see the preceding error message.
166-
167-
### Probable cause
168-
169-
This is a known issue with the HMaster service. General cluster startup tasks can take a long time. HMaster shuts down because the namespace table isn’t yet assigned. This occurs only in scenarios in which large amount of unflushed data exists, and a timeout of five minutes is not sufficient.
170-
171-
### Resolution steps
172-
173-
1. In the Apache Ambari UI, go to **HBase** > **Configs**. In the custom hbase-site.xml file, add the following setting:
174-
175-
```apache
176-
Key: hbase.master.namespace.init.timeout Value: 2400000
177-
```
178-
179-
2. Restart the required services (HMaster, and possibly other HBase services).
180-
181-
18299
## What causes a restart failure on a region server?
183100

184101
### Issue
@@ -256,5 +173,12 @@ Here's what's happening behind the scenes:
256173
sudo su - hbase -c "/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh start regionserver"
257174
```
258175

259-
### See also
260-
[Troubleshoot by using Azure HDInsight](../../hdinsight/hdinsight-troubleshoot-guide.md)
176+
## Next steps
177+
178+
If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:
179+
180+
* Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).
181+
182+
* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.
183+
184+
* If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](https://docs.microsoft.com/azure/azure-supportability/how-to-create-azure-support-request). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).

articles/hdinsight/hbase/hbase-troubleshoot-start-fails.md

Lines changed: 24 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ ms.service: hdinsight
55
ms.topic: troubleshooting
66
author: hrasheed-msft
77
ms.author: hrasheed
8-
ms.date: 08/06/2019
8+
ms.date: 08/14/2019
99
---
1010

1111
# Apache HBase Master (HMaster) fails to start in Azure HDInsight
@@ -20,42 +20,46 @@ Unexpected files identified during startup process.
2020

2121
### Cause
2222

23-
During the startup process, HMaster performs many initialization steps, including moving data from scratch (.tmp) folder to data folder. HMaster also looks at WALs (Write Ahead Logs) folder to see if there are any dead region servers. During all these situations, it does a basic `list` command on these folders. If at any time it sees an unexpected file in any of these folders, it will throw an exception and hence not start.
23+
During the startup process, HMaster performs many initialization steps, including moving data from the scratch (.tmp) folder to the data folder. HMaster also looks at the write-ahead logs (WALs) folder to see if there are any unresponsive region servers.
24+
25+
HMaster runs a basic `list` command on these folders. If at any time HMaster sees an unexpected file in any of them, it throws an exception and doesn't start.
2426

2527
### Resolution
2628

27-
In such a situation, check the call stack to see which folder might be causing problem (for instance is it WALs folder or .tmp folder). Then via Cloud Explorer or via hdfs commands to locate the problem file. The problem file is usually a `*-renamePending.json` file (a journal file used to implement Atomic Rename operation in WASB driver). Due to bugs in this implementation, such files can be left over in cases of process crash. Force delete this file via Cloud Explorer. In addition, there might be a temporary file of the nature $ in this location. The file cannot be seen via cloud explorer and only via hdfs `ls` command. You can use hdfs command `hdfs dfs -rm //\$\$\$.\$\$\$` to delete this file.
29+
Check the call stack and try to determine which folder might be causing the problem (for instance, it might be the WAL folder or the .tmp folder). Then, in Cloud Explorer or by using HDFS commands, try to locate the problem file. Usually, this is a `*-renamePending.json` file. (The `*-renamePending.json` file is a journal file that's used to implement the atomic rename operation in the WASB driver. Due to bugs in this implementation, these files can be left over after process crashes, and so on.) Force-delete this file either in Cloud Explorer or by using HDFS commands.
30+
31+
Sometimes, there might also be a temporary file named something like `$$$.$$$` at this location. You have to use the HDFS `ls` command to see this file; you cannot see it in Cloud Explorer. To delete this file, use the HDFS command `hdfs dfs -rm /<path>/\$\$\$.\$\$\$`.
2832

29-
Once the problem file has been removed, HMaster should start up immediately.
33+
After you've run these commands, HMaster should start immediately.
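
The search for leftover journal files described above can be sketched as a small filter over a recursive HDFS listing. This is a minimal sketch, not part of the original article: the function name `find_rename_pending` is hypothetical, and the `/hbase/WALs` and `/hbase/.tmp` paths in the usage comment are assumptions that you should replace with the folder identified from the call stack. It also assumes paths contain no spaces.

```shell
# Print the paths of leftover rename journal files found in a listing.
# Reads `hdfs dfs -ls -R` output on stdin; the path is the last field of each line.
find_rename_pending() {
  awk '$NF ~ /-renamePending\.json$/ { print $NF }'
}

# Example usage (the folder paths are assumptions; adjust them for your cluster):
#   hdfs dfs -ls -R /hbase/WALs /hbase/.tmp | find_rename_pending
```

Review the printed paths before deleting anything, and then remove each confirmed file with `hdfs dfs -rm <path>`.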
3034

3135
---
3236

3337
## Scenario: No server address listed
3438

3539
### Issue
3640

37-
HMaster log shows an error message similar to "No server address listed in hbase: meta for region xxx."
41+
You might see a message that indicates that the `hbase:meta` table is not online. Running `hbck` might report that `hbase:meta table replicaId 0 is not found on any region.` In the HMaster logs, you might see the message: `No server address listed in hbase:meta for region hbase:backup <region name>`.
3842

3943
### Cause
4044

4145
HMaster could not initialize after restarting HBase.
4246

4347
### Resolution
4448

45-
1. Execute the following commands on HBase shell (change actual values as applicable):
49+
1. In the HBase shell, enter the following commands (change actual values as applicable):
4650

47-
```
51+
```hbase
4852
scan 'hbase:meta'
49-
delete 'hbase:meta','hbase:backup <region name>','<column name>'
53+
delete 'hbase:meta','hbase:backup <region name>','<column name>'
5054
```
5155
52-
1. Delete the entry of hbase: namespace as the same error may be reported while scan hbase: namespace table.
56+
1. Delete the `hbase:namespace` entry as well, because the same error might be reported when the `hbase:namespace` table is scanned.
5357
5458
1. Restart the active HMaster from Ambari UI to bring up HBase in running state.
5559
56-
1. Run the following command on HBase shell to bring up all offline tables:
60+
1. In the HBase shell, to bring up all offline tables, run the following command:
5761
58-
```
62+
```hbase
5963
hbase hbck -ignorePreCheckPermission -fixAssignments
6064
```
6165
@@ -65,29 +69,29 @@ HMaster could not initialize after restarting HBase.
6569
6670
### Issue
6771
68-
HMaster times out with fatal exception like `java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned`.
72+
HMaster times out with a fatal exception similar to: `java.io.IOException: Timedout 300000ms waiting for namespace table to be assigned`.
6973
7074
### Cause
7175
72-
The time-out is a known defect with HMaster. General cluster startup tasks can take a long time. HMaster shuts down if the namespace table isn’t yet assigned. The lengthy startup tasks happen where large amount of unflushed data exists and a timeout of five minutes is not sufficient.
76+
You might experience this issue if you have many tables and regions that have not been flushed when you restart your HMaster services. The time-out is a known defect with HMaster. General cluster startup tasks can take a long time, and HMaster shuts down if the namespace table isn't yet assigned. The lengthy startup tasks happen when a large amount of unflushed data exists and the default timeout of five minutes is not sufficient.
7377
7478
### Resolution
7579
76-
1. Access Ambari UI, go to HBase -> Configs, in custom `hbase-site.xml` add the following setting:
80+
1. From the Apache Ambari UI, go to **HBase** > **Configs**. In the custom `hbase-site.xml` file, add the following setting:
7781
7882
```
7983
Key: hbase.master.namespace.init.timeout Value: 2400000
8084
```
8185
82-
1. Restart required services (Mainly HMaster and possibly other HBase services).
86+
1. Restart the required services (HMaster, and possibly other HBase services).
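
The timeout values above are given in milliseconds. As a quick sanity check, the following sketch converts them to minutes; the helper name `ms_to_minutes` is hypothetical and is not part of HBase or Ambari.

```shell
# Convert a timeout in milliseconds to whole minutes (integer division).
ms_to_minutes() {
  echo $(( $1 / 60000 ))
}

ms_to_minutes 300000    # timeout from the error message: prints 5
ms_to_minutes 2400000   # suggested hbase.master.namespace.init.timeout: prints 40
```

The suggested setting therefore raises the namespace initialization timeout from 5 minutes to 40 minutes.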
8387
8488
---
8589
86-
## Scenario: Frequent regionserver restarts
90+
## Scenario: Frequent region server restarts
8791
8892
### Issue
8993
90-
Nodes reboot periodically. From the regionserver logs you may see entries similar to:
94+
Nodes reboot periodically. From the region server logs you may see entries similar to:
9195
9296
```
9397
2017-05-09 17:45:07,683 WARN [JvmPauseMonitor] util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 31000ms
@@ -97,15 +101,15 @@ Nodes reboot periodically. From the regionserver logs you may see entries simila
97101
98102
### Cause
99103
100-
Long regionserver JVM GC pause. The pause will cause regionserver to be unresponsive and not able to send heart beat to HMaster within the zk session timeout 40s. HMaster will believe regionserver is dead and will abort the regionserver and restart.
104+
A long region server JVM garbage collection (GC) pause. The pause causes the region server to become unresponsive and unable to send a heartbeat to HMaster within the ZooKeeper session timeout of 40 seconds. HMaster then treats the region server as dead and aborts it, and the region server restarts.
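
To see how close logged pauses come to the session timeout, you can filter the pause durations out of a region server log. This is a minimal sketch: the function name `long_jvm_pauses` is hypothetical, and it assumes the `JvmPauseMonitor` line format shown in the sample log entry above.

```shell
# Print every JVM pause (in ms) that is at least the given threshold.
# Usage: long_jvm_pauses <threshold_ms> < <region server log file>
long_jvm_pauses() {
  awk -v t="$1" '
    /JvmPauseMonitor/ && match($0, /approximately [0-9]+ms/) {
      # Skip the 14 characters of "approximately " and drop the trailing "ms".
      ms = substr($0, RSTART + 14, RLENGTH - 16)
      if (ms + 0 >= t + 0) print ms
    }'
}
```

For example, `long_jvm_pauses 30000 < <region server log file>` lists pauses of 30 seconds or longer. Choose a threshold somewhat below the configured session timeout to catch pauses that are approaching it.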
101105
102106
### Resolution
103107
104-
Change the zookeeper session timeout, not only hbase-site setting `zookeeper.session.timeout` but also zookeeper zoo.cfg setting `maxSessionTimeout` need to be changed.
108+
Change the ZooKeeper session timeout. Both the `hbase-site` setting `zookeeper.session.timeout` and the ZooKeeper `zoo.cfg` setting `maxSessionTimeout` need to be changed.
105109
106110
1. Access Ambari UI, go to **HBase -> Configs -> Settings**, in Timeouts section, change the value of Zookeeper Session Timeout.
107111
108-
1. Access Ambari UI, go to **Zookeeper -> Configs -> Custom** zoo.cfg, add/change the following setting. Make sure the value is the same as hbase `zookeeper.session.timeout`.
112+
1. Access Ambari UI, go to **Zookeeper -> Configs -> Custom** `zoo.cfg`, add/change the following setting. Make sure the value is the same as HBase `zookeeper.session.timeout`.
109113
110114
```
111115
Key: maxSessionTimeout Value: 120000

articles/hdinsight/hdinsight-troubleshoot-guide.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -4,16 +4,16 @@ description: Troubleshoot Apache Hadoop workloads by using Azure HDInsight. Step
44
author: hrasheed-msft
55
ms.author: hrasheed
66
ms.service: hdinsight
7-
ms.topic: conceptual
8-
ms.date: 05/29/2019
7+
ms.topic: troubleshooting
8+
ms.date: 08/14/2019
99
---
1010

1111

1212
# Troubleshoot by using Azure HDInsight
1313

1414
| Apache workload | Top questions |
1515
|---|---|
16-
|![HBase](./media/hdinsight-troubleshoot-guide/HBASE.png)<br>[Troubleshoot Apache HBase](hbase/apache-troubleshoot-hbase.md)|<br>[How do I run hbck command reports with multiple unassigned regions?](hbase/apache-troubleshoot-hbase.md#how-do-i-run-hbck-command-reports-with-multiple-unassigned-regions)<br><br>[How do I fix timeout issues when using hbck commands for region assignments?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-timeout-issues-with-hbck-commands-for-region-assignments)<br><br>[How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-jdbc-or-sqlline-connectivity-issues-with-apache-phoenix)<br><br>[What causes a master server to fail to start?](hbase/apache-troubleshoot-hbase.md#what-causes-a-master-server-to-fail-to-start)<br><br>[What causes a restart failure on a region server?](hbase/apache-troubleshoot-hbase.md#what-causes-a-restart-failure-on-a-region-server)|
16+
|![HBase](./media/hdinsight-troubleshoot-guide/HBASE.png)<br>[Troubleshoot Apache HBase](hbase/apache-troubleshoot-hbase.md)|<br>[How do I run hbck command reports with multiple unassigned regions?](hbase/apache-troubleshoot-hbase.md#how-do-i-run-hbck-command-reports-with-multiple-unassigned-regions)<br><br>[How do I fix timeout issues when using hbck commands for region assignments?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-timeout-issues-with-hbck-commands-for-region-assignments)<br><br>[How do I fix JDBC or SQLLine connectivity issues with Apache Phoenix?](hbase/apache-troubleshoot-hbase.md#how-do-i-fix-jdbc-or-sqlline-connectivity-issues-with-apache-phoenix)<br><br>[What causes a master server to fail to start?](hbase/hbase-troubleshoot-start-fails.md)<br><br>[What causes a restart failure on a region server?](hbase/apache-troubleshoot-hbase.md#what-causes-a-restart-failure-on-a-region-server)|
1717
|![HDFS](./media/hdinsight-troubleshoot-guide/HDFS.png)<br>[Troubleshoot Apache Hadoop HDFS](hdinsight-troubleshoot-hdfs.md)|<br>[How do I access a local HDFS from inside a cluster?](hdinsight-troubleshoot-hdfs.md#how-do-i-access-local-hdfs-from-inside-a-cluster)<br><br>[Local HDFS stuck in safe mode on Azure HDInsight cluster](hadoop/hdinsight-hdfs-troubleshoot-safe-mode.md)|
1818
|![Hive](./media/hdinsight-troubleshoot-guide/HIVE.png)<br>[Troubleshoot Apache Hive](hdinsight-troubleshoot-hive.md)|<br>[How do I export a Hive metastore and import it on another cluster?](hdinsight-troubleshoot-hive.md#how-do-i-export-a-hive-metastore-and-import-it-on-another-cluster)<br><br>[How do I locate Apache Hive logs on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-locate-hive-logs-on-a-cluster)<br><br>[How do I launch the Apache Hive shell with specific configurations on a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-launch-the-hive-shell-with-specific-configurations-on-a-cluster)<br><br>[How do I analyze Apache Tez DAG data on a cluster-critical path?](hdinsight-troubleshoot-hive.md#how-do-i-analyze-tez-dag-data-on-a-cluster-critical-path)<br><br>[How do I download Apache Tez DAG data from a cluster?](hdinsight-troubleshoot-hive.md#how-do-i-download-tez-dag-data-from-a-cluster)|
1919
|![Spark](./media/hdinsight-troubleshoot-guide/SPARK.png)<br>[Troubleshoot Apache Spark](hdinsight-troubleshoot-SPARK.md)|<br>[How do I configure an Apache Spark application by using Apache Ambari on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-ambari-on-clusters)<br><br>[How do I configure an Apache Spark application by using a Jupyter notebook on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-a-jupyter-notebook-on-clusters)<br><br>[How do I configure an Apache Spark application by using Apache Livy on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-apache-livy-on-clusters)<br><br>[How do I configure an Apache Spark application by using spark-submit on clusters?](spark/apache-troubleshoot-spark.md#how-do-i-configure-an-apache-spark-application-by-using-spark-submit-on-clusters)<br><br>[How do I configure an Apache Spark application by using IntelliJ?](spark/apache-spark-intellij-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using Eclipse?](spark/apache-spark-eclipse-tool-plugin.md)<br><br>[How do I configure an Apache Spark application by using VSCode?](hdinsight-for-vscode.md)<br><br>[What causes an Apache Spark application OutOfMemoryError exception?](spark/apache-troubleshoot-spark.md#what-causes-an-apache-spark-application-outofmemoryerror-exception)|
