Commit 877c951

Updates after sign-off
1 parent ace0914 commit 877c951

1 file changed (+5, -5 lines changed)

articles/storage/common/tape-migration-guide.md

Lines changed: 5 additions & 5 deletions
@@ -17,7 +17,7 @@ This article focuses on tape migrations. It aims to simplify, provide guidance,

 Tape stores a large portion of worlds data, and remains one of the dominant types of storage media. Tape media exists for decades, and is still heavily used with hundreds of exabytes of new tapes shipped every year.

-Tapes are a great medium for storing cold data. They're fast in sequential reading, but stages requiring mechanical movements (like loading, and unloading of tapes, tape seeks, etc.) are slower. That makes tapes unusable for traditional, random based access, and is the main reason that even today data stored on tapes is rarely used. In addition, tapes are a magnetic medium that require special handling. They're sensitive to environment, particularly temperature, and humidity. If kept within their operating environmental range, they can achieve high durability, and good restore success rate. However, when kept in unfriendly environment, deterioration happens often, and renders the tape unreadable.
+Tapes are a great medium for storing cold data. They're fast in sequential reading, but stages requiring mechanical movements (like loading, and unloading of tapes, tape seeks, etc.) are slower. That makes tapes unusable for traditional, random based access, and is the main reason that even today data stored on tapes is rarely used. In addition, tape is a magnetic medium that requires special handling. They're sensitive to environment, particularly temperature, and humidity. If kept within their operating environmental range, they can achieve high durability, and good restore success rate. However, when kept in unfriendly environment, deterioration happens often, and renders the tape unreadable.

 Large portions of tapes store dark data (data that is created, and stored, but not used for any purpose). Dark data brings no value to the data owner. With the increase in AI capability, and accessibility, the trend is changing. Customers are looking into how dark data can help them to increase efficiency, open new revenue streams, or increase their competitive advantage. To take advantage of dark data, many organizations are considering migrating the data from tapes to cloud storage. Cloud storage provides an easy way to analyze the data, extract business value (with services like AI, Machine Learning, Azure Search, etc.), or reduce cost by taking advantage of archival storage for long-term retention.

@@ -81,7 +81,7 @@ Second option is easier, and more commonly used. Tape migration partners have fa
 Several partners can perform tape migrations to Azure. The full list of partners can be found on [offline media import](https://azure.microsoft.com/products/databox/offline-media-import/).

 Here's a simple flowchart to ease the selection process.
-![Chart showing tape migration selection process](./media/tape-migration-guide/tape-migration-chart.png)
+![Chart showing tape migration selection process.](./media/tape-migration-guide/tape-migration-chart.png)

 ### Data format

@@ -92,12 +92,12 @@ Data format has a large impact on migration design, and is the critical consider
 |Virtual tapes | - Easier, and faster migration <br> - Can recreate identical tape media as the original <br> - No need to have access to the original software to write the data | - Requires maintaining virtual tape inventory <br> - Data stored in application dependent format, requires original software to restore the data <br> - Data not accessible by Azure services (AI / ML) without restore |
 | Native files | - Files accessible by any application, and service (AI / ML) <br> - Possible to monetize the data <br> - No need to have access to original software for restores | - More complex migration <br> - Requires access to original software to write the data |

-Main criteria for deciding the format is how do we plan to use the migrated data. If data is migrated only for long-term retention, then virtual tapes are a great choice. In any other case, storing data in native format is a preferred option. It allows simple usage of data in the future, and opens up many possibilities with data analysis.
+The main criterion for deciding the format is the way we plan to use the data. If data is migrated only for long-term retention, then virtual tapes are a great choice. In any other case, storing data in native format is a preferred option. It allows simple usage of data in the future, and opens up many possibilities with data analysis.

 ## Migration process

 Once we made decisions on migration execution, and preferred data format, we can start with the migration. Migration goes through several phases.
-![Picture showing tape migration phases](./media/tape-migration-guide/tape-migration-steps.png)
+![Picture showing tape migration phases.](./media/tape-migration-guide/tape-migration-steps.png)

 ### Information phase

@@ -143,7 +143,7 @@ After we gathered basic information, we can prepare for the migration. Preparati
 ### Migration phase

 Once the migration design is final, we start the migration process. Before ramping up to full migration pace, we always perform a test with a smaller sample. Goal for the test is to make sure that end-to-end process works. It allows us to make tweaks, and improve the process. Once the test is successful, and we're happy with the results, we execute the migration. Migration phase is slightly different if we're using native files vs. virtual tapes. In both cases, it's a repetitive process that circles through all the tapes, and reads their entire content. This flowchart shows the migration phase when migrating to native files.
-![Flowchart that shows migration phase](./media/tape-migration-guide/tape-migration-phase.png)
+![Flowchart that shows details of a migration phase.](./media/tape-migration-guide/tape-migration-phase.png)

 #### Data validation
 For each file we migrate, we need to perform data validation to make sure that data wasn't corrupted during the migration process. Data validation is done by comparing hash values before the migration, and after the migration. There are many types of hashing algorithms that can be used. A common approach is to use MD5 since Azure Storage contains a pre-defined metadata field Content-MD5 that can be filled during the migration. This approach allows checking the same MD5 value when we access the data to validate the data is not changed, or corrupted. In ideal situation, source data already contains hash values that can be easily compared to hash values post-migration. If hashes don't exist, they must be calculated before the file is migrated. If hashes match, file is marked as migrated. If not, file is discarded, and migrated again. Sometimes the data is corrupted on the source tapes. Having the original hash values helps with catching those rare cases. If they happen, we can read the data from secondary copy if it exists. Data validation process is a critical component for a migration design. Process for handling failed validation must be defined. Migration phase is also constantly monitored to make sure we can react to unpredictable situation, and adapt to it. Regular reporting to main stakeholders is important to keep the migration on track.
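
The hash comparison described in the data validation paragraph could be sketched roughly as follows. This is a minimal illustration, not part of the guide or of any Azure SDK: the function names `content_md5` and `validation_status` and the status strings are hypothetical. It assumes the MD5 value is stored base64-encoded, as is the convention for the Content-MD5 property, and that the post-migration hash has already been obtained by some other means.

```python
import base64
import hashlib


def content_md5(path, chunk_size=4 * 1024 * 1024):
    """Return the base64-encoded MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are never held in memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def validation_status(source_md5, migrated_md5):
    """Hypothetical helper: 'migrated' if the hashes match, otherwise 'retry'."""
    return "migrated" if source_md5 == migrated_md5 else "retry"
```

A file whose status comes back as `retry` would be discarded and migrated again, matching the process described above. How the post-migration hash is retrieved depends on the tooling in use and is left out of this sketch.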
