Commit 99fcc68

Merge pull request #251458 from v-akarnase/patch-10
Update troubleshoot-data-retention-issues-expired-data.md
2 parents 6cf196f + 828b6e1 commit 99fcc68

1 file changed

+16
-16
lines changed

articles/hdinsight/hbase/troubleshoot-data-retention-issues-expired-data.md

Lines changed: 16 additions & 16 deletions
@@ -3,42 +3,42 @@ title: Troubleshoot data retention (TTL) issues with expired data not being dele
 description: Troubleshoot various data-retention (TTL) issues with expired data not being deleted from storage on Azure HDInsight
 ms.service: hdinsight
 ms.topic: troubleshooting
-ms.date: 05/06/2022
+ms.date: 09/14/2023
 ---

 # Troubleshoot data retention (TTL) issues with expired data not being deleted from storage on Azure HDInsight

-In HBase cluster, you may decide that you would like to remove data after it ages either to free some storage and save on costs as the older data is no longer needed, either to comply with regulations. When that is needed, you'll usually set TTL in a table at the ColumnFamily level to expire and automatically delete older data. While TTL can be set as well at cell level, setting it at ColumnFamily level is usually a more convenient option because the ease of administration and because a cell TTL (expressed in ms) can't extend the effective lifetime of a cell beyond a ColumnFamily level TTL setting (expressed in seconds), so only required shorter retention times at cell level could benefit from setting cell level TTL.
+In HBase cluster, you may decide that you would like to remove data after it ages either to free some storage and save on costs as the older data is no longer needed, either to comply with regulations. When that is needed, you need to set TTL in a table at the ColumnFamily level to expire and automatically delete older data. While TTL can be set as well at cell level, setting it at ColumnFamily level is usually a more convenient option because the ease of administration and because a cell TTL (expressed in ms) can't extend the effective lifetime of a cell beyond a ColumnFamily level TTL setting (expressed in seconds), so only required shorter retention times at cell level could benefit from setting cell level TTL.
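The unit difference described in this paragraph (seconds at ColumnFamily level, milliseconds at cell level) is easy to get wrong, so a small arithmetic sketch may help. The function names below are hypothetical helpers, not part of HBase, and the "smaller value wins" rule simply restates the paragraph's point that a cell TTL can't extend the ColumnFamily TTL.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of HBase): convert a retention period in days
# to the unit that each TTL level expects.

# ColumnFamily-level TTL is expressed in seconds.
cf_ttl_seconds() { echo $(( $1 * 24 * 60 * 60 )); }

# Cell-level TTL is expressed in milliseconds.
cell_ttl_millis() { echo $(( $1 * 24 * 60 * 60 * 1000 )); }

# Effective lifetime in seconds: a cell TTL can shorten, but never extend,
# the ColumnFamily TTL, so the smaller of the two values applies.
effective_ttl_seconds() {
  local cf_seconds=$1 cell_millis=$2
  local cell_seconds=$(( cell_millis / 1000 ))
  if (( cell_seconds < cf_seconds )); then
    echo "$cell_seconds"
  else
    echo "$cf_seconds"
  fi
}
```

For example, `cf_ttl_seconds 30` gives the ColumnFamily TTL for a 30-day retention, and `effective_ttl_seconds 50 10000` shows that a 10,000 ms cell TTL shortens a 50-second ColumnFamily TTL to 10 seconds.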

 Despite setting TTL, you may notice sometimes that you don't obtain the desired effect, i.e. some data hasn't expired and/or storage size hasn't decreased.

 ## Prerequisites

-To prepare to follow the steps and commands below, open two ssh connections to HBase cluster:
+Follow the steps and given commands, open two ssh connections to HBase cluster:

-* In one of the ssh sessions keep the default bash shell.
+* In one of, the ssh sessions keep the default bash shell.

-* In the second ssh session launch HBase shell by running the command below.
+* In the second ssh session launch HBase shell by running, the following command.

 ```
 hbase shell
 ```

 ### Check if desired TTL is configured and if expired data is removed from query result

-Follow the steps below to understand where is the issue. Start by checking if the behavior occurs for a specific table or for all the tables. If you're unsure whether the issue impacts all the tables or a specific table, just consider as example a specific table name for the start.
+Follow the steps given to understand where is the issue. Start by checking if the behavior occurs for a specific table or for all the tables. If you're unsure whether the issue impacts all the tables or a specific table, just consider as example a specific table name for the start.

-1. Check first that TTL has been configured for ColumnFamily for the target tables. Run the command below in the ssh session where you launched HBase shell and observe example and output below. One column family has TTL set to 50 seconds, the other ColumnFamily has no value configured for TTL, thus it appears as "FOREVER" (data in this column family isn't configured to expire).
+1. Check first that TTL has been configured for ColumnFamily for the target tables. Run following command in the ssh session where you launched HBase shell and observe example and output below. One column family has TTL set to 50 seconds, the other ColumnFamily has no value configured for TTL, thus it appears as "FOREVER" (data in this column family isn't configured to expire).

 ```
 describe 'table_name'
 ```

 ![Screenshot showing describe table name command.](media/troubleshoot-data-retention-issues-expired-data/image-1.png)

-1. If not configured, default TTL is set to 'FOREVER'. There are two possibilities why data is not expired as expected and removed from query result.
+1. If not configured, default TTL is set to 'FOREVER.' There are two possibilities why data is not expired as expected and removed from query result.

-1. If TTL has any other value then 'FOREVER', observe the value for column family and note down the value in seconds(pay special attention to value correlated with the unit measure as cell TTL is in ms, but column family TTL is in seconds) to confirm if it is the expected one. If the observed value isn't correct, fix that first.
+1. If TTL has any other value, then 'FOREVER', observe the value for column family and note down the value in seconds(pay special attention to value correlated with the unit measure as cell TTL is in ms, but column family TTL is in seconds) to confirm if it is the expected one. If the observed value isn't correct, fix that first.
 1. If TTL value is 'FOREVER' for all column families, configure TTL as first step and afterwards monitor if data is expired as expected.
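When the TTL is missing or wrong, one way to correct it is the HBase shell `alter` command. The sketch below only builds the statement text so it can be reviewed before use. `ttl_alter_statement` is a hypothetical helper, and `table_name` and `column_family_name` are the same placeholders used in the article's commands.

```shell
# Hypothetical helper: print the HBase shell statement that sets a
# ColumnFamily-level TTL. The TTL value is in seconds.
ttl_alter_statement() {
  local table=$1 family=$2 ttl_seconds=$3
  printf "alter '%s', {NAME => '%s', TTL => %s}\n" "$table" "$family" "$ttl_seconds"
}

# Prints: alter 'table_name', {NAME => 'column_family_name', TTL => 50}
ttl_alter_statement table_name column_family_name 50
```

Paste the printed statement into the ssh session running the HBase shell, then run `describe 'table_name'` again to confirm the new value.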

 1. If you establish that TTL is configured and has the correct value for the ColumnFamily, next step is to confirm that the expired data no longer shows up when doing table scans. When data expires, it should be removed and not show up in the scan table results. Run the below command in HBase shell to check.
@@ -49,15 +49,15 @@ Follow the steps below to understand where is the issue. Start by checking if th

 ### Check the number and size of StoreFiles per table per region to observe if any changes are visible after the compaction operation

-1. Before moving to next step, from ssh session with bash shell, run the following command to check the current number of StoreFiles and size for each StoreFile currently showing up for the ColumnFamily for which the TTL has been configured. Note first the table and ColumnFamily for which you'll be doing the check, then run the following command in ssh session (bash).
+1. Before moving to next step, from ssh session with bash shell, run the following command to check the current number of StoreFiles and size for each StoreFile currently showing up for the ColumnFamily for which the TTL has been configured. Note first the table and ColumnFamily for which you are doing the check, then run the following command in ssh session (bash).

 ```
 hdfs dfs -ls -R /hbase/data/default/table_name/ | grep "column_family_name"
 ```

 ![Screenshot showing check size of store file command.](media/troubleshoot-data-retention-issues-expired-data/image-2.png)

-1. Likely, there will be more results shown in the output, one result for each region ID that is part of the table and between 0 and more results for StoreFiles present under each region name, for the selected ColumnFamily. To count the overall number of rows in the result output above, run the following command.
+1. Likely, there are more results shown in the output, one result for each region ID that is part of the table and between 0 and more results for StoreFiles present under each region name, for the selected ColumnFamily. To count the overall number of rows in the result output above, run the following command.

 ```
 hdfs dfs -ls -R /hbase/data/default/table_name/ | grep "column_family_name" | wc -l
@@ -77,13 +77,13 @@ Follow the steps below to understand where is the issue. Start by checking if th
 hdfs dfs -ls -R /hbase/data/default/table_name/ | grep "column_family_name"
 ```

-1. An additional store file is created compared to previous result output for each region where data is modified, the StoreFile will include current content of MemStore for that region.
+1. An additional store file is created compared to previous result output for each region where data is modified, the StoreFile includes current content of MemStore for that region.

 ![Screenshot showing memory store for the region.](media/troubleshoot-data-retention-issues-expired-data/image-3.png)
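To compare listings taken at different points (before and after a flush or a compaction), it can help to reduce each listing to a file count and a total size. The sketch below is a hypothetical helper, not an HDInsight tool. It assumes the usual `hdfs dfs -ls -R` column order (permissions, replication, owner, group, size, date, time, path) and counts only regular files, so the region and ColumnFamily directory rows that `grep | wc -l` includes are left out.

```shell
# Hypothetical helper: read `hdfs dfs -ls -R` output on stdin and print
# "<file count> <total bytes>" for regular files under the given ColumnFamily.
# Assumed columns: permissions, replication, owner, group, size, date, time, path.
storefile_summary() {
  local family=$1
  awk -v fam="/$family/" '
    # Rows whose permissions start with "-" are files; "d" rows are directories.
    $1 ~ /^-/ && index($8, fam) { n++; bytes += $5 }
    END { printf "%d %.0f\n", n, bytes }
  '
}

# Example use against the cluster (not run here):
#   hdfs dfs -ls -R /hbase/data/default/table_name/ | storefile_summary column_family_name
```

Recording this pair before and after each step makes it easier to see whether a flush added files or a compaction reduced the total size.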

 ### Check the number and size of StoreFiles per table per region after major compaction

-1. At this point, the data from MemStore has been written to StoreFile, in storage, but expired data may still exist in one or more of the current StoreFiles. Although minor compactions can help delete some of the expired entries, it isn't guaranteed that it will remove all of them as minor compaction will usually not select all the StoreFiles for compaction, while major compaction will select all the StoreFiles for compaction in that region.
+1. At this point, the data from MemStore has been written to StoreFile, in storage, but expired data may still exist in one or more of the current StoreFiles. Although minor compactions can help delete some of the expired entries, it is not guaranteed that it removes all of them as minor compaction. It will not select all the StoreFiles for compaction, while major compaction will select all the StoreFiles for compaction in that region.

 Also, there's another situation when minor compaction may not remove cells with TTL expired. There's a property named MIN_VERSIONS and it defaults to 0 only (see in the above output from describe 'table_name' the property MIN_VERSIONS=>'0'). If this property is set to 0, the minor compaction will remove the cells with TTL expired. If this value is greater than 0, minor compaction may not remove the cells with TTL expired even if it touches the corresponding file as part of compaction. This property configures the min number of versions of a cell to keep, even if those versions have TTL expired.

@@ -93,13 +93,13 @@ Follow the steps below to understand where is the issue. Start by checking if th
 major_compact 'table_name'
 ```

-1. Depending on the table size, major compaction operation can take some time. Use the command below in HBase shell to monitor progress. If the compaction is still running when you execute the command below, you'll see the output "MAJOR", but if the compaction is completed, you will see the output "NONE".
+1. Depending on the table size, major compaction operation can take some time. Use following command in HBase shell to monitor progress. If the compaction is still running when you execute the following command, you get the output as "MAJOR", but if the compaction is completed, you get the as output "NONE."

 ```
 compaction_state 'table_name'
 ```
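Rather than rerunning `compaction_state` by hand, the check can be scripted from the bash session. The sketch below is a hypothetical helper that inspects HBase shell output for a running-compaction state. "MAJOR" and "NONE" come from the step above. "MINOR" and "MAJOR_AND_MINOR" are assumed to be the other states HBase can report. The polling loop in the comment also assumes the cluster's HBase shell supports the non-interactive `-n` flag.

```shell
# Hypothetical helper: read HBase shell output on stdin and succeed (exit 0)
# if it reports a compaction that is still running.
# Assumed states: NONE, MINOR, MAJOR, MAJOR_AND_MINOR.
compaction_running() {
  grep -Eqw 'MAJOR|MINOR|MAJOR_AND_MINOR'
}

# Example polling loop (not run here): check every 30 seconds until no
# compaction is reported for the table.
#   while echo "compaction_state 'table_name'" | hbase shell -n 2>/dev/null | compaction_running; do
#     sleep 30
#   done
```

When the loop ends, switch straight to the StoreFile listing so the result is captured soon after the compaction finishes.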

-1. When the compaction status appears as "NONE" in hbase shell, if you switch quickly to bash and run command
+1. When the compaction status appears as "NONE" in hbase shell, if you switch quickly to bash and run command.

 ```
 hdfs dfs -ls -R /hbase/data/default/table_name/ | grep "column_family_name"
@@ -122,6 +122,6 @@ If you didn't see your problem or are unable to solve your issue, visit one of t

 * Get answers from Azure experts through [Azure Community Support](https://azure.microsoft.com/support/community/).

-* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.
+* Connect with [@AzureSupport](https://twitter.com/azuresupport) - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: `answers`, `support`, and `experts`.

 * If you need more help, you can submit a support request from the [Azure portal](https://portal.azure.com/?#blade/Microsoft_Azure_Support/HelpAndSupportBlade/). Select **Support** from the menu bar or open the **Help + support** hub. For more detailed information, review [How to create an Azure support request](../../azure-portal/supportability/how-to-create-azure-support-request.md). Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the [Azure Support Plans](https://azure.microsoft.com/support/plans/).
