
Commit cb13116

Merge pull request #269037 from JasonWHowell/hdinsight-images
Updating image format on /hdinsight/ folder
2 parents: f391bf3 + c9e94f9

10 files changed (+53, -53 lines)
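This PR's change is mechanical: each Markdown image link of the form `![alt](path)` under the /hdinsight/ folder is rewritten to the `:::image:::` syntax used by the docs build. As an editorial aside, here is a minimal sketch of how such a conversion could be scripted; it is an assumption for illustration only, not the tooling used to produce this commit.

```
# Illustrative sketch only (not the tooling behind this PR): rewrite Markdown
# image links to the :::image::: syntax across the /hdinsight/ folder.
# Assumes GNU sed; review the resulting diff before committing.
find articles/hdinsight -name '*.md' -print0 | xargs -0 sed -i -E \
  's/!\[([^]]*)\]\(([^)]*)\)/:::image type="content" border="true" source="\2" alt-text="\1":::/g'
```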

articles/hdinsight/find-host-name.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ You can get the host names through Ambari UI or Ambari REST API.
## Get the host names from Ambari Web UI

You can use Ambari Web UI to get the host names when you SSH to the node. The Ambari Web UI hosts view is available on your HDInsight cluster at `https://CLUSTERNAME.azurehdinsight.net/#/main/hosts`, where `CLUSTERNAME` is the name of your cluster.

- ![Get-Host-Names-In-Ambari-UI](.\media\find-host-name\find-host-name-in-ambari-ui.png)
+ :::image type="content" border="true" source=".\media\find-host-name\find-host-name-in-ambari-ui.png" alt-text="Get-Host-Names-In-Ambari-UI":::

## Get the host names from Ambari REST API

When building automation scripts, you can use the Ambari REST API to get the host names before you make connections to hosts. The numbers in the host name are not guaranteed in sequence and HDInsight may change the host name format to align with VMs with release refresh. Don’t take the dependency on any certain naming convention that exists today.
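The context line above mentions calling the Ambari REST API from automation scripts to list host names. As a hedged aside (the admin credentials and the `grep` filter are assumptions, not part of this change), such a call could look like this:

```
# Minimal sketch: list cluster host names through the Ambari REST API.
# Replace CLUSTERNAME and PASSWORD with your own values.
curl -sS -u admin:PASSWORD \
  "https://CLUSTERNAME.azurehdinsight.net/api/v1/clusters/CLUSTERNAME/hosts" \
  | grep '"host_name"'
```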

articles/hdinsight/hbase/quickstart-resource-manager-template.md

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ After you complete the quickstart, you may want to delete the cluster. With HDIn
From the Azure portal, navigate to your cluster, and select **Delete**.

- ![Delete Resource Manager template HBase](./media/quickstart-resource-manager-template/azure-portal-delete-hbase.png)
+ :::image type="content" border="true" source="./media/quickstart-resource-manager-template/azure-portal-delete-hbase.png" alt-text="Delete Resource Manager template HBase":::

You can also select the resource group name to open the resource group page, and then select **Delete resource group**. By deleting the resource group, you delete both the HDInsight cluster, and the default storage account.

articles/hdinsight/hbase/troubleshoot-data-retention-issues-expired-data.md

Lines changed: 14 additions & 14 deletions
@@ -10,7 +10,7 @@ ms.date: 09/14/2023
In HBase cluster, you may decide that you would like to remove data after it ages either to free some storage and save on costs as the older data is no longer needed, either to comply with regulations. When that is needed, you need to set TTL in a table at the ColumnFamily level to expire and automatically delete older data. While TTL can be set as well at cell level, setting it at ColumnFamily level is usually a more convenient option because the ease of administration and because a cell TTL (expressed in ms) can't extend the effective lifetime of a cell beyond a ColumnFamily level TTL setting (expressed in seconds), so only required shorter retention times at cell level could benefit from setting cell level TTL.

- Despite setting TTL, you may notice sometimes that you don't obtain the desired effect, i.e. some data hasn't expired and/or storage size hasn't decreased.
+ Despite setting TTL, you may notice sometimes that you don't obtain the desired effect, that is, some data hasn't expired and/or storage size hasn't decreased.

## Prerequisites
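The paragraph above recommends setting TTL at the ColumnFamily level. Purely as an illustrative sketch (the column family name `cf1` is a placeholder, not taken from the article), that is done from the HBase shell with `alter`:

```
# Illustrative only: expire data in column family 'cf1' of 'table_name'
# after 50 seconds by setting a ColumnFamily-level TTL.
alter 'table_name', {NAME => 'cf1', TTL => 50}
```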

@@ -28,13 +28,13 @@ Follow the steps and given commands, open two ssh connections to HBase cluster:
Follow the steps given to understand where is the issue. Start by checking if the behavior occurs for a specific table or for all the tables. If you're unsure whether the issue impacts all the tables or a specific table, just consider as example a specific table name for the start.

- 1. Check first that TTL has been configured for ColumnFamily for the target tables. Run following command in the ssh session where you launched HBase shell and observe example and output below. One column family has TTL set to 50 seconds, the other ColumnFamily has no value configured for TTL, thus it appears as "FOREVER" (data in this column family isn't configured to expire).
+ 1. Check first that TTL has been configured for ColumnFamily for the target tables. Run the following command in the ssh session where you launched HBase shell and observe the output. One column family has TTL set to 50 seconds, the other ColumnFamily has no value configured for TTL, thus it appears as "FOREVER" (data in this column family isn't configured to expire).

```
describe 'table_name'
```

- ![Screenshot showing describe table name command.](media/troubleshoot-data-retention-issues-expired-data/image-1.png)
+ :::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-1.png" alt-text="Screenshot showing describe table name command.":::

1. If not configured, default TTL is set to 'FOREVER.' There are two possibilities why data is not expired as expected and removed from query result.

@@ -55,7 +55,7 @@ Follow the steps given to understand where is the issue. Start by checking if th
```
hdfs dfs -ls -R /hbase/data/default/table_name/ | grep "column_family_name"
```

- ![Screenshot showing check size of store file command.](media/troubleshoot-data-retention-issues-expired-data/image-2.png)
+ :::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-2.png" alt-text="Screenshot showing check size of store file command.":::

1. Likely, there are more results shown in the output, one result for each region ID that is part of the table and between 0 and more results for StoreFiles present under each region name, for the selected ColumnFamily. To count the overall number of rows in the result output above, run the following command.
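The hunk is cut before the counting command that the step refers to, so that command is not reproduced here. One plausible form, shown only as an assumption, pipes the same listing through `wc -l`:

```
# Assumed sketch (the article's actual command falls outside this hunk):
# count the region/StoreFile entries returned for the chosen column family.
hdfs dfs -ls -R /hbase/data/default/table_name/ | grep "column_family_name" | wc -l
```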

@@ -79,21 +79,21 @@ Follow the steps given to understand where is the issue. Start by checking if th
1. An additional store file is created compared to previous result output for each region where data is modified, the StoreFile includes current content of MemStore for that region.

- ![Screenshot showing memory store for the region.](media/troubleshoot-data-retention-issues-expired-data/image-3.png)
+ :::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-3.png" alt-text="Screenshot showing memory store for the region.":::

### Check the number and size of StoreFiles per table per region after major compaction

- 1. At this point, the data from MemStore has been written to StoreFile, in storage, but expired data may still exist in one or more of the current StoreFiles. Although minor compactions can help delete some of the expired entries, it is not guaranteed that it removes all of them as minor compaction. It will not select all the StoreFiles for compaction, while major compaction will select all the StoreFiles for compaction in that region.
+ 1. At this point, the data from MemStore has been written to StoreFile, in storage, but expired data may still exist in one or more of the current StoreFiles. Although minor compactions can help delete some of the expired entries, it is not guaranteed that it removes all of them as minor compaction. It does not select all the StoreFiles for compaction, while major compaction does select all the StoreFiles for compaction in that region.

- Also, there's another situation when minor compaction may not remove cells with TTL expired. There's a property named MIN_VERSIONS and it defaults to 0 only (see in the above output from describe 'table_name' the property MIN_VERSIONS=>'0'). If this property is set to 0, the minor compaction will remove the cells with TTL expired. If this value is greater than 0, minor compaction may not remove the cells with TTL expired even if it touches the corresponding file as part of compaction. This property configures the min number of versions of a cell to keep, even if those versions have TTL expired.
+ Also, there's another situation when minor compaction may not remove cells with TTL expired. There's a property named MIN_VERSIONS and it defaults to 0 only (see in the above output from describe 'table_name' the property MIN_VERSIONS=>'0'). If this property is set to 0, the minor compaction removes the cells with TTL expired. If this value is greater than 0, minor compaction may not remove the cells with TTL expired even if it touches the corresponding file as part of compaction. This property configures the min number of versions of a cell to keep, even if those versions have TTL expired.

- 1. To make sure expired data is also deleted from storage, we need to run a major compaction operation. The major compaction operation, when completed, will leave behind a single StoreFile per region. In HBase shell, run the command to execute a major compaction operation on the table:
+ 1. To make sure expired data is also deleted from storage, we need to run a major compaction operation. When the major compaction operation completes, it leaves behind a single StoreFile per region. In HBase shell, run the command to execute a major compaction operation on the table:

```
major_compact 'table_name'
```

- 1. Depending on the table size, major compaction operation can take some time. Use following command in HBase shell to monitor progress. If the compaction is still running when you execute the following command, you get the output as "MAJOR", but if the compaction is completed, you get the as output "NONE."
+ 1. Depending on the table size, the major compaction operation can take some time. Use the following command in HBase shell to monitor progress. If the compaction is still running when you execute the following command, you get the output as "MAJOR", but if the compaction is completed, you get the as output "NONE."

```
compaction_state 'table_name'
```
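As an aside to the MIN_VERSIONS discussion above: the current value is visible in the `describe 'table_name'` output, and it can be changed from the HBase shell with `alter`. The column family name below is a placeholder, not taken from the article.

```
# Illustrative only: keep MIN_VERSIONS at 0 for column family 'cf1' so that
# compactions are allowed to drop TTL-expired cells.
alter 'table_name', {NAME => 'cf1', MIN_VERSIONS => 0}
```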
@@ -105,16 +105,16 @@ Follow the steps given to understand where is the issue. Start by checking if th
```
hdfs dfs -ls -R /hbase/data/default/table_name/ | grep "column_family_name"
```

- 1. You will notice that an extra StoreFile has been created in addition to previous ones per region per ColumnFamily and after several moments only the last created StoreFile is kept per region per column family.
+ 1. Notice that an extra StoreFile has been created in addition to previous ones per region per ColumnFamily and after several moments only the last created StoreFile is kept per region per column family.

- ![Screenshot showing store file as column family.](media/troubleshoot-data-retention-issues-expired-data/image-4.png)
+ :::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-4.png" alt-text="Screenshot showing store file as column family.":::

- For the example region above, once the extra moments elapse, we can notice that one single StoreFile remained and the size occupied by this file on the storage is reduced as major compaction occurred and at this point any expired data that has not been deleted before(by another major compaction), will be deleted after running current major compaction operation.
+ For the example region above, once the extra moments elapse, we can notice that one single StoreFile remains. Also, the size occupied by this file on the storage is reduced due to the major compaction. At this point, any expired data that has not been deleted before (by another major compaction), is soon deleted after the current major compaction operation runs.

- ![Screenshot showing expired data not deleted.](media/troubleshoot-data-retention-issues-expired-data/image-5.png)
+ :::image type="content" border="true" source="media/troubleshoot-data-retention-issues-expired-data/image-5.png" alt-text="Screenshot showing expired data not deleted.":::

> [!NOTE]
- > For this troubleshooting exercise we triggered the major compaction manually. But in practice, doing that manually for many tables might be time consuming. By default, major compaction is disabled on HDInsight cluster. The main reason for keeping major compaction disabled by default is because the performance of the table operations is impacted when a major compaction is in progress. However, you can enable major compaction by configuring the value for the property hbase.hregion.majorcompaction in ms or can use a cron tab job or another external system to schedule compaction at a time convenient for you, with lower workload.
+ > For this troubleshooting exercise, you triggered the major compaction manually. But in practice, doing that manually for many tables is time consuming. By default, major compaction is disabled on HDInsight cluster. The main reason for keeping major compaction disabled by default is due to performance of the table operations is impacted when a major compaction is in progress. However, you can enable major compaction by configuring the value for the property `hbase.hregion.majorcompaction` in ms or can use a cron tab job or another external system to schedule compaction at a time convenient for you, with lower workload.

## Next steps
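The note above suggests scheduling major compaction externally, for example with a cron job, rather than enabling `hbase.hregion.majorcompaction`. A minimal sketch of such a job, assuming a crontab entry on a cluster headnode and a single table, could look like this (it is not part of the article):

```
# Assumed sketch: trigger a major compaction of 'table_name' every day at 02:00,
# piping the command into the HBase shell from a headnode crontab entry.
0 2 * * * echo "major_compact 'table_name'" | hbase shell >> /tmp/major_compact.log 2>&1
```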

articles/hdinsight/hdinsight-config-for-vscode.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ For general information about working with settings in VS Code, refer to [User a
3. Search **Set Configuration**.
4. Expand **Extensions** in the left directory, and select **HDInsight configuration**.

- ![hdi config image](./media/HDInsight-config-for-vscode/HDInsight-config-for-vscode.png)
+ :::image type="content" border="true" source="./media/HDInsight-config-for-vscode/HDInsight-config-for-vscode.png" alt-text="hdi config image":::

## General settings

articles/hdinsight/hdinsight-for-vscode.md

Lines changed: 1 addition & 1 deletion
@@ -496,7 +496,7 @@ From the menu bar, go to **View** > **Command Palette**, and then enter **Azure:
For Synapse PySpark installation error, since its dependency will not be maintained anymore by other team, it will not be maintained anymore. If you trying to use Synapse Pyspark interactive, please use [Azure Synapse Analytics](https://ms.web.azuresynapse.net/) instead. And it's a long term change.

- ![synapse pyspark installation error](./media/hdinsight-for-vscode/known-issue.png)
+ :::image type="content" border="true" source="./media/hdinsight-for-vscode/known-issue.png" alt-text="synapse pyspark installation error":::

## Next steps
