Skip to content

Commit 7055239

Browse files
authored
Merge pull request #103971 from normesta/normesta-reg-updates-4
Updating as per customer request
2 parents 69ed22b + c83e66c commit 7055239

File tree

1 file changed

+15
-11
lines changed

1 file changed

+15
-11
lines changed

articles/storage/blobs/data-lake-storage-namespace.md

Lines changed: 15 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -1,40 +1,44 @@
11
---
22
title: Azure Data Lake Storage Gen2 Hierarchical Namespace
3-
description: Describes the concept of the hierarchical namespace for Azure Data Lake Storage Gen2
3+
description: Describes the concept of a hierarchical namespace for Azure Data Lake Storage Gen2
44
author: normesta
55
ms.service: storage
66
ms.topic: conceptual
7-
ms.date: 12/06/2018
7+
ms.date: 02/11/2020
88
ms.author: normesta
99
ms.reviewer: jamesbak
1010
ms.subservice: data-lake-storage-gen2
1111
---
1212

1313
# Azure Data Lake Storage Gen2 hierarchical namespace
1414

15-
A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a **hierarchical namespace**. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized. With the hierarchical namespace enabled, a storage account becomes capable of providing the scalability and cost-effectiveness of object storage, with file system semantics that are familiar to analytics engines and frameworks.
15+
A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a **hierarchical namespace**. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized. With a hierarchical namespace enabled, a storage account becomes capable of providing the scalability and cost-effectiveness of object storage, with file system semantics that are familiar to analytics engines and frameworks.
1616

17-
## The benefits of the hierarchical namespace
17+
## The benefits of a hierarchical namespace
1818

1919
The following benefits are associated with file systems that implement a hierarchical namespace over blob data:
2020

21-
- **Atomic directory manipulation:** Object stores approximate a directory hierarchy by adopting a convention of embedding slashes (/) in the object name to denote path segments. While this convention works for organizing objects, the convention provides no assistance for actions like moving, renaming or deleting directories. Without real directories, applications must process potentially millions of individual blobs to achieve directory-level tasks. By contrast, the hierarchical namespace processes these tasks by updating a single entry (the parent directory).
21+
- **Atomic directory manipulation:** Object stores approximate a directory hierarchy by adopting a convention of embedding slashes (/) in the object name to denote path segments. While this convention works for organizing objects, the convention provides no assistance for actions like moving, renaming or deleting directories. Without real directories, applications must process potentially millions of individual blobs to achieve directory-level tasks. By contrast, a hierarchical namespace processes these tasks by updating a single entry (the parent directory).
2222

23-
This dramatic optimization is especially significant for many big data analytics frameworks. Tools like Hive, Spark, etc. often write output to temporary locations and then rename the location at the conclusion of the job. Without the hierarchical namespace, this rename can often take longer than the analytics process itself. Lower job latency equals lower total cost of ownership (TCO) for analytics workloads.
23+
This dramatic optimization is especially significant for many big data analytics frameworks. Tools like Hive, Spark, etc. often write output to temporary locations and then rename the location at the conclusion of the job. Without a hierarchical namespace, this rename can often take longer than the analytics process itself. Lower job latency equals lower total cost of ownership (TCO) for analytics workloads.
2424

2525
- **Familiar Interface Style:** File systems are well understood by developers and users alike. There is no need to learn a new storage paradigm when you move to the cloud as the file system interface exposed by Data Lake Storage Gen2 is the same paradigm used by computers, large and small.
2626

2727
One of the reasons that object stores haven't historically supported a hierarchical namespace is that a hierarchical namespace limits scale. However, the Data Lake Storage Gen2 hierarchical namespace scales linearly and does not degrade either the data capacity or performance.
2828

29-
## When to enable the hierarchical namespace
29+
## Deciding whether to enable a hierarchical namespace
3030

31-
We recommend that you turn on the hierarchical namespace for storage workloads that are designed for file systems that manipulate directories. This includes all workloads that are primarily for analytics processing. Datasets that require a high degree of organization will also benefit by enabling the hierarchical namespace.
31+
After you've enabled a hierarchical namespace on your account, you can't revert it back to a flat namespace. Therefore, consider whether it makes sense to enable a hierarchical namespace based on the nature of your object store workloads.
3232

33-
The reasons for enabling the hierarchical namespace are determined by a TCO analysis. Generally speaking, improvements in workload latency due to storage acceleration will require compute resources for less time. Latency for many workloads may be improved due to atomic directory manipulation that is enabled by the hierarchical namespace. In many workloads, the compute resource represents > 85% of the total cost and so even a modest reduction in workload latency equates to a significant amount of TCO savings. Even in cases where enabling the hierarchical namespace increases storage costs, the TCO is still lowered due to reduced compute costs.
33+
Some workloads might not gain any benefit by enabling a hierarchical namespace. Examples include backups, image storage, and other applications where object organization is stored separately from the objects themselves (for example: in a separate database).
3434

35-
## When to disable the hierarchical namespace
35+
Also, while support for Blob storage features and the Azure service ecosystem continues to grow, there are still some features and Azure services that are not yet supported in accounts that have a hierarchical namespace. See [Known Issues](data-lake-storage-known-issues.md).
3636

37-
Some object store workloads might not gain any benefit by enabling the hierarchical namespace. Examples include backups, image storage, and other applications where object organization is stored separately from the objects themselves (for example: in a separate database).
37+
In general, we recommend that you turn on a hierarchical namespace for storage workloads that are designed for file systems that manipulate directories. This includes all workloads that are primarily for analytics processing. Datasets that require a high degree of organization will also benefit by enabling a hierarchical namespace.
38+
39+
The reasons for enabling a hierarchical namespace are determined by a TCO analysis. Generally speaking, improvements in workload latency due to storage acceleration will require compute resources for less time. Latency for many workloads may be improved due to atomic directory manipulation that is enabled by a hierarchical namespace. In many workloads, the compute resource represents > 85% of the total cost and so even a modest reduction in workload latency equates to a significant amount of TCO savings. Even in cases where enabling a hierarchical namespace increases storage costs, the TCO is still lowered due to reduced compute costs.
40+
41+
To analyze differences in data storage prices, transaction prices, and storage capacity reservation pricing between accounts that have a flat hierarchical namespace versus a hierarchical namespace, see [Azure Data Lake Storage Gen2 pricing](https://azure.microsoft.com/pricing/details/storage/data-lake/).
3842

3943
## Next steps
4044

0 commit comments

Comments
 (0)