articles/storage/blobs/data-lake-storage-best-practices.md
## Ingest, process, and analyze
There are many different sources of data, and many different ways in which that data can be ingested into a Data Lake Storage Gen2 enabled account.
You can also ingest large sets of data from HDInsight and Hadoop clusters, or smaller sets of *ad hoc* data for prototyping applications.
Streamed data is generated by various sources such as applications, devices, and sensors. You can use tools to capture and process the data on an event-by-event basis in real time, and then write the events in batches into your account.
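As a sketch of that batching pattern, the following example groups incoming events into newline-delimited batches before each write. The `BATCH_SIZE` value and the event shape are illustrative assumptions, not part of any Azure tool:

```python
import json

# Illustrative: group incoming events so each write to the account
# appends one batch instead of one event at a time.
BATCH_SIZE = 3  # events per write; tune for your throughput

def batch_events(events, batch_size=BATCH_SIZE):
    """Yield newline-delimited JSON blobs, one per batch of events."""
    batch = []
    for event in events:
        batch.append(json.dumps(event))
        if len(batch) == batch_size:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:  # flush the final partial batch
        yield "\n".join(batch) + "\n"

events = [{"device": "sensor-1", "reading": i} for i in range(7)]
blobs = list(batch_events(events))
print(len(blobs))  # 7 events in batches of 3 -> 3 writes
```

Each yielded blob is ready to be appended to a file in your account by whichever ingestion tool you choose from the table below.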
Web server logs contain information such as the history of page requests. Consider writing custom scripts or applications to upload web server logs so you'll have the flexibility to include your data uploading component as part of your larger big data application.
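For example, a custom upload step might partition logs by date and hand each file to AzCopy. The account name, file system, and directory layout below are assumptions for illustration; the script only builds the `azcopy copy` command so you can slot it into a larger pipeline:

```python
import datetime
import subprocess  # used if you enable the run() call below

# Hypothetical names for illustration only.
ACCOUNT = "mystorageaccount"
FILESYSTEM = "raw"

def target_url(log_file: str, day: datetime.date) -> str:
    """Build a date-partitioned Data Lake Storage Gen2 destination URL."""
    return (
        f"https://{ACCOUNT}.dfs.core.windows.net/{FILESYSTEM}"
        f"/weblogs/{day:%Y/%m/%d}/{log_file}"
    )

def upload(log_file: str, day: datetime.date) -> list[str]:
    """Return the AzCopy command that copies one log file into the lake."""
    cmd = ["azcopy", "copy", log_file, target_url(log_file, day)]
    # subprocess.run(cmd, check=True)  # uncomment to actually upload
    return cmd

cmd = upload("access.log", datetime.date(2024, 1, 15))
print(cmd[-1])
```

Keeping the upload logic in your own script, as the guidance above suggests, lets you change the partitioning scheme or swap the transfer tool without touching the rest of your big data application.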
Once the data is available in Data Lake Storage Gen2, you can run analysis on that data, create visualizations, and even download data to your local machine or to other repositories such as an Azure SQL database or SQL Server instance.
The following table recommends tools that you can use to ingest, analyze, visualize, and download data. Use the links in the table to find guidance about how to configure and use each tool.
| Purpose | Recommended tools |
|---|---|
| Ingest ad hoc data| Azure portal, [Azure PowerShell](data-lake-storage-directory-file-acl-powershell.md), [Azure CLI](data-lake-storage-directory-file-acl-cli.md), [REST](/rest/api/storageservices/data-lake-storage-gen2), [Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/), [Apache DistCp](data-lake-storage-use-distcp.md), [AzCopy](../common/storage-use-azcopy-v10.md)|
| Ingest streaming data |[HDInsight Storm](../../hdinsight/storm/apache-storm-write-data-lake-store.md), [Azure Stream Analytics](../../stream-analytics/stream-analytics-quick-create-portal.md)|
| Ingest relational data |[Azure Data Factory](../../data-factory/connector-azure-data-lake-store.md)|
| Ingest web server logs |[Azure PowerShell](data-lake-storage-directory-file-acl-powershell.md), [Azure CLI](data-lake-storage-directory-file-acl-cli.md), [REST](/rest/api/storageservices/data-lake-storage-gen2), Azure SDKs ([.NET](data-lake-storage-directory-file-acl-dotnet.md), [Java](data-lake-storage-directory-file-acl-java.md), [Python](data-lake-storage-directory-file-acl-python.md), and [Node.js](data-lake-storage-directory-file-acl-javascript.md)), [Azure Data Factory](../../data-factory/connector-azure-data-lake-store.md)|
| Ingest from HDInsight clusters |[Azure Data Factory](../../data-factory/connector-azure-data-lake-store.md), [Apache DistCp](data-lake-storage-use-distcp.md), [AzCopy](../common/storage-use-azcopy-v10.md)|
| Ingest from Hadoop clusters |[Azure Data Factory](../../data-factory/connector-azure-data-lake-store.md), [Apache DistCp](data-lake-storage-use-distcp.md), [WANdisco LiveData Migrator for Azure](migrate-gen2-wandisco-live-data-platform.md), [Azure Data Box](data-lake-storage-migrate-on-premises-hdfs-cluster.md)|
| Ingest large data sets (several terabytes) |[Azure ExpressRoute](../../expressroute/expressroute-introduction.md)|
| Process & analyze data |[Azure Synapse Analytics](../../synapse-analytics/get-started-analyze-storage.md), [Azure HDInsight](../../hdinsight/hdinsight-hadoop-use-data-lake-storage-gen2.md), [Databricks](/azure/databricks/scenarios/databricks-extract-load-sql-data-warehouse)|
| Visualize data |[Power BI](/power-query/connectors/datalakestorage), [Azure Data Lake Storage query acceleration](data-lake-storage-query-acceleration.md)|
| Download data | Azure portal, [PowerShell](data-lake-storage-directory-file-acl-powershell.md), [Azure CLI](data-lake-storage-directory-file-acl-cli.md), [REST](/rest/api/storageservices/data-lake-storage-gen2), Azure SDKs ([.NET](data-lake-storage-directory-file-acl-dotnet.md), [Java](data-lake-storage-directory-file-acl-java.md), [Python](data-lake-storage-directory-file-acl-python.md), and [Node.js](data-lake-storage-directory-file-acl-javascript.md)), [Azure Storage Explorer](data-lake-storage-explorer.md), [AzCopy](../common/storage-use-azcopy-v10.md#transfer-data), [Azure Data Factory](../../data-factory/copy-activity-overview.md), [Apache DistCp](./data-lake-storage-use-distcp.md)|
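As an illustration of the download row, the following sketch uses the Azure SDK for Python (`azure-storage-file-datalake`) to read a file back from the account. The account URL, credential, and file layout are assumptions; `split_lake_path` is a hypothetical helper added here for illustration:

```python
def split_lake_path(path: str) -> tuple[str, str]:
    """Split 'filesystem/dir/file' into (filesystem, 'dir/file')."""
    filesystem, _, rest = path.partition("/")
    return filesystem, rest

def download(account_url: str, credential, path: str) -> bytes:
    # Deferred import so the helper above works without the SDK installed:
    # pip install azure-storage-file-datalake
    from azure.storage.filedatalake import DataLakeServiceClient

    filesystem, file_path = split_lake_path(path)
    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    file_client = service.get_file_system_client(filesystem).get_file_client(file_path)
    return file_client.download_file().readall()

fs, rest = split_lake_path("raw/weblogs/2024/01/15/access.log")
print(fs, rest)
```

For bulk transfers, AzCopy or Azure Data Factory (both linked in the table) will typically outperform a per-file SDK download like this one.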