You can get the host names through the Ambari UI or the Ambari REST API.

## Get the host names from Ambari Web UI
You can use the Ambari Web UI to look up the host names you need when you SSH to a node. The Ambari Web UI hosts view is available on your HDInsight cluster at `https://CLUSTERNAME.azurehdinsight.net/#/main/hosts`, where `CLUSTERNAME` is the name of your cluster.

When building automation scripts, you can use the Ambari REST API to get the host names before you connect to the hosts. The numbers in the host names aren't guaranteed to be sequential, and HDInsight may change the host name format to align with VMs in a release refresh. Don't take a dependency on any naming convention that exists today.
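
When you script this, the Ambari REST API `hosts` resource returns the host names as JSON. The following Python is a minimal sketch that uses only the standard library. It assumes the Ambari cluster name matches `CLUSTERNAME` and that you authenticate with the cluster login over HTTP basic authentication; the helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ambari is reachable at https://<cluster>.azurehdinsight.net and the
# Ambari cluster name is the same as the HDInsight cluster name.
AMBARI_HOSTS_URL = "https://{cluster}.azurehdinsight.net/api/v1/clusters/{cluster}/hosts"


def build_hosts_request(cluster_name: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request for the Ambari hosts resource using HTTP basic auth."""
    url = AMBARI_HOSTS_URL.format(cluster=cluster_name)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def parse_host_names(payload: str) -> list[str]:
    """Extract host names from an Ambari hosts JSON response."""
    data = json.loads(payload)
    # Each entry looks like {"href": ..., "Hosts": {"cluster_name": ..., "host_name": ...}}
    return [item["Hosts"]["host_name"] for item in data.get("items", [])]


def get_host_names(cluster_name: str, user: str, password: str) -> list[str]:
    """Fetch the current host names instead of deriving them from a naming pattern."""
    request = build_hosts_request(cluster_name, user, password)
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_host_names(response.read().decode("utf-8"))
```

For example, `get_host_names("mycluster", "admin", password)` returns the names in whatever order Ambari reports them. Treat each name as an opaque string rather than parsing numbers out of it.
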

You can also select the resource group name to open the resource group page, and then select **Delete resource group**. Deleting the resource group deletes both the HDInsight cluster and the default storage account.

:::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-2.png" alt-text="Screenshot showing check size of store file command.":::

1. The output likely contains more results: one for each region ID that is part of the table, and zero or more for the StoreFiles present under each region name for the selected ColumnFamily. To count the overall number of rows in the result output above, run the following command.
1. For each region where data was modified, an additional StoreFile is created compared to the previous result output. This StoreFile contains the current content of the MemStore for that region.

:::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-3.png" alt-text="Screenshot showing memory store for the region.":::
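
If you save the output of a recursive listing such as `hdfs dfs -ls -R` over the table directory to text, you can tally StoreFiles per region instead of counting lines by hand. This Python is a sketch under two assumptions: each line follows the standard `hdfs dfs -ls` column layout (permissions, replication, owner, group, size, date, time, path), and paths follow the HBase layout `<root>/data/<namespace>/<table>/<region>/<column family>/<file>`. The function name `storefiles_per_region` is hypothetical.

```python
from collections import defaultdict


def storefiles_per_region(listing: str, table: str, column_family: str) -> dict[str, tuple[int, int]]:
    """Return {region: (storefile_count, total_bytes)} for one table and column family."""
    stats = defaultdict(lambda: [0, 0])
    for line in listing.splitlines():
        # Split into at most 8 fields so the path stays intact.
        fields = line.split(None, 7)
        # Skip "Found N items" headers, blank lines, and directory entries.
        if len(fields) != 8 or fields[0].startswith("d"):
            continue
        size, path = int(fields[4]), fields[7]
        parts = path.rstrip("/").split("/")
        # Keep only files sitting directly under <table>/<region>/<column family>/.
        if len(parts) < 4 or parts[-4] != table or parts[-2] != column_family:
            continue
        region = parts[-3]
        stats[region][0] += 1
        stats[region][1] += size
    return {region: (count, total) for region, (count, total) in stats.items()}
```

Summing the counts, for example `sum(c for c, _ in result.values())`, gives the overall number of StoreFiles, which you can compare before and after the flush.
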
### Check the number and size of StoreFiles per table per region after major compaction
1. Notice that an extra StoreFile is created per region per ColumnFamily in addition to the previous ones, and after a few moments only the most recently created StoreFile is kept per region per column family.

:::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-4.png" alt-text="Screenshot showing store file as column family.":::

For the example region above, once the extra moments elapse, only a single StoreFile remains, and the storage it occupies is smaller because major compaction occurred. At this point, any expired data that wasn't deleted earlier (by a previous major compaction) is deleted by the current major compaction.

:::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-5.png" alt-text="Screenshot showing expired data not deleted.":::
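
To check the effect of the major compaction on each region, you could record a per-region snapshot of (StoreFile count, total bytes) for one column family before and after the compaction and compare them. The following Python is a minimal sketch; `compaction_report` is a hypothetical helper, and it doesn't read HDFS itself.

```python
def compaction_report(before: dict[str, tuple[int, int]],
                      after: dict[str, tuple[int, int]]) -> dict[str, dict]:
    """Compare per-region (storefile_count, total_bytes) snapshots for one column family."""
    report = {}
    for region in sorted(set(before) | set(after)):
        files_before, bytes_before = before.get(region, (0, 0))
        files_after, bytes_after = after.get(region, (0, 0))
        report[region] = {
            "files_before": files_before,
            "files_after": files_after,
            # Positive when the compaction freed space, for example by dropping expired cells.
            "bytes_saved": bytes_before - bytes_after,
            # After a completed major compaction, one StoreFile per region per column family is expected.
            "single_storefile": files_after == 1,
        }
    return report
```

A region whose `bytes_saved` stays at zero after compaction may simply have held no expired data.
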
> [!NOTE]
> For this troubleshooting exercise, we triggered the major compaction manually. In practice, however, doing that manually for many tables can be time consuming. By default, major compaction is disabled on HDInsight clusters, mainly because the performance of table operations is impacted while a major compaction is in progress. However, you can enable major compaction by setting the `hbase.hregion.majorcompaction` property to an interval in milliseconds, or you can use a cron job or another external system to schedule compaction at a convenient time with a lower workload.
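
The note above involves two small calculations: converting a compaction interval to the millisecond value that `hbase.hregion.majorcompaction` expects, and preparing commands for a scheduled job. The following Python is a sketch only. The helper names are hypothetical, the seven-day interval is an illustrative choice rather than an HDInsight recommendation, and the generated script relies on the HBase shell `major_compact` command, which queues a major compaction request for a table.

```python
from datetime import timedelta


def majorcompaction_period_ms(days: float = 0, hours: float = 0) -> int:
    """Convert an interval to milliseconds for hbase.hregion.majorcompaction.

    In HBase, a value of 0 disables time-based (automatic) major compactions.
    """
    return round(timedelta(days=days, hours=hours) / timedelta(milliseconds=1))


def major_compact_script(tables: list[str]) -> str:
    """Build HBase shell input that requests a major compaction for each table."""
    lines = [f"major_compact '{table}'" for table in tables]
    lines.append("exit")  # leave the shell after the requests are submitted
    return "\n".join(lines) + "\n"
```

For example, `majorcompaction_period_ms(days=7)` gives the value for a weekly period. A scheduler could write `major_compact_script(["mytable"])` to a file and pass that file to `hbase shell` during a low-traffic window.
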

For the Synapse PySpark installation error: its dependency is no longer maintained by the other team, so it won't be maintained anymore. If you're trying to use Synapse PySpark interactive, use [Azure Synapse Analytics](https://ms.web.azuresynapse.net/) instead. This is a long-term change.