articles/storage/blobs/data-lake-storage-use-databricks-spark.md
+5 -7 (5 additions & 7 deletions)
Original file line number
Diff line number
Diff line change
@@ -6,7 +6,7 @@ author: normesta
6
6
7
7
ms.service: azure-data-lake-storage
8
8
ms.topic: tutorial
9
-
ms.date: 11/18/2024
9
+
ms.date: 01/13/2025
10
10
ms.author: normesta
11
11
ms.reviewer: dineshm
12
12
ms.custom: py-fresh-zinc
@@ -39,13 +39,11 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
39
39
40
40
See [Tutorial: Connect to Azure Data Lake Storage](/azure/databricks/getting-started/connect-to-azure-storage) (Steps 1 through 3). After completing these steps, make sure to paste the tenant ID, app ID, and client secret values into a text file. You use them later in this tutorial.
41
41
42
-
## Create an Azure Databricks workspace, cluster, and notebook
42
+
## Create an Azure Databricks workspace and notebook
43
43
44
44
1. Create an Azure Databricks workspace. See [Create an Azure Databricks workspace](/azure/databricks/getting-started/#--create-an-azure-databricks-workspace).
45
45
46
-
2. Create a cluster. See [Create a cluster](/azure/databricks/getting-started/quick-start#step-1-create-a-cluster).
47
-
48
-
3. Create a notebook. See [Create a notebook](/azure/databricks/notebooks/notebooks-manage#--create-a-notebook). Choose Python as the default language of the notebook.
46
+
2. Create a notebook. See [Create a notebook](/azure/databricks/notebooks/quick-start#create-notebook). Choose Python as the default language of the notebook.
49
47
50
48
Keep your notebook open. You use it in the following sections.
51
49
@@ -109,7 +107,7 @@ In this section, you mount your Azure Data Lake Storage cloud object storage to
109
107
110
108
1. In the notebook you created previously, select the **Connect** button in the upper right corner of the [notebook toolbar](/azure/databricks/notebooks/notebook-ui#--notebook-toolbar-icons-and-buttons). This button opens the compute selector. (If you've already connected your notebook to a cluster, the name of that cluster is shown in the button text rather than **Connect**.)
111
109
112
-
1. In the cluster dropdown menu, select the cluster you previously created.
110
+
1. In the cluster dropdown menu, select any cluster you've previously created.
113
111
114
112
1. Notice that the text in the cluster selector changes to *starting*. Wait for the cluster to finish starting and for the name of the cluster to appear in the button before continuing.
115
113
@@ -293,7 +291,7 @@ In this tutorial, you:
293
291
294
292
- Created Azure resources, including an Azure Data Lake Storage storage account and Azure AD service principal, and assigned permissions to access the storage account.
295
293
296
-
- Created an Azure Databricks workspace, notebook, and compute cluster.
294
+
- Created an Azure Databricks workspace and notebook.
297
295
298
296
- Used AzCopy to upload unstructured *.csv* flight data to the Azure Data Lake Storage storage account.
articles/storage/blobs/storage-blob-calculate-container-statistics-databricks.md
+17 -18 (17 additions & 18 deletions)
Original file line number
Diff line number
Diff line change
@@ -4,7 +4,7 @@ description: Description goes here
4
4
author: normesta
5
5
ms.service: azure-blob-storage
6
6
ms.topic: tutorial
7
-
ms.date: 02/08/2023
7
+
ms.date: 01/13/2025
8
8
ms.author: normesta
9
9
---
10
10
@@ -16,7 +16,7 @@ In this tutorial, you learn how to:
16
16
17
17
> [!div class="checklist"]
18
18
> * Generate an inventory report
19
-
> * Create an Azure Databricks workspace, cluster, and notebook
19
+
> * Create an Azure Databricks workspace and notebook
20
20
> * Read the blob inventory file
21
21
> * Get the number and total size of blobs, snapshots, and versions
22
22
> * Get the number of blobs by blob type and content type
@@ -50,13 +50,13 @@ You might have to wait up to 24 hours after enabling inventory reports for your
50
50
51
51
## Configure Azure Databricks
52
52
53
-
In this section, you create an Azure Databricks workspace, cluster, and notebook. Later in this tutorial, you paste code snippets into notebook cells, and then run them to gather container statistics.
53
+
In this section, you create an Azure Databricks workspace and notebook. Later in this tutorial, you paste code snippets into notebook cells, and then run them to gather container statistics.
54
54
55
55
1. Create an Azure Databricks workspace. See [Create an Azure Databricks workspace](/azure/databricks/getting-started/#--create-an-azure-databricks-workspace).
56
56
57
-
2. Create a cluster. See [Create a cluster](/azure/databricks/getting-started/quick-start#step-1-create-a-cluster).
57
+
2. Create a new notebook. See [Create a notebook](/azure/databricks/getting-started/quick-start#create-notebook).
58
58
59
-
3. Create a notebook and choose Python as the default language of the notebook. See [Create a notebook](/azure/databricks/notebooks/notebooks-manage#--create-a-notebook).
59
+
3. Choose Python as the default language of the notebook.
60
60
61
61
## Read the blob inventory file
62
62
@@ -65,11 +65,11 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
65
65
```python
66
66
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
@@ -92,7 +92,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
92
92
93
93
- If your account has a hierarchical namespace, set the `hierarchical_namespace_enabled` variable to `True`.
94
94
95
-
3. Press the SHIFT+ENTER keys to run the code in this block.
95
+
3. Press the Run button to run the code in this cell.
96
96
97
97
## Get blob count and size
98
98
@@ -103,7 +103,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
103
103
print("Number of bytes occupied by blobs in the container:", df.agg({'Content-Length': 'sum'}).first()['sum(Content-Length)'])
104
104
```
105
105
106
-
2. Press SHIFT + ENTER to run the cell.
106
+
2. Press the run button to run the cell.
107
107
108
108
The notebook displays the number of blobs in a container and the number of bytes occupied by blobs in the container.
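As a rough illustration of what this cell computes, the following standard-library sketch counts rows and sums the `Content-Length` column over a few made-up inventory rows. The sample data and the `summarize` helper are hypothetical and only stand in for the Spark DataFrame `df` used in the tutorial; the `Content-Length` column name is the only detail taken from the tutorial's code.

```python
import csv
import io

# Hypothetical inventory rows; a real blob inventory file has more columns.
SAMPLE_INVENTORY = """Name,Content-Length,BlobType
a.txt,100,BlockBlob
b.bin,2048,PageBlob
c.log,52,AppendBlob
"""


def summarize(inventory_csv):
    """Return (row count, sum of Content-Length) for an inventory CSV string."""
    rows = list(csv.DictReader(io.StringIO(inventory_csv)))
    total_bytes = sum(int(row["Content-Length"]) for row in rows)
    return len(rows), total_bytes


blob_count, blob_bytes = summarize(SAMPLE_INVENTORY)
print("Number of blobs in the container:", blob_count)
print("Number of bytes occupied by blobs in the container:", blob_bytes)
```

In the notebook itself the same two numbers come from the DataFrame aggregation shown in the cell; the sketch only makes the arithmetic explicit.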
109
109
@@ -122,7 +122,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
122
122
print("Number of bytes occupied by snapshots in the container:", dfT.agg({'Content-Length': 'sum'}).first()['sum(Content-Length)'])
123
123
```
124
124
125
-
2. Press SHIFT + ENTER to run the cell.
125
+
2. Press the run button to run the cell.
126
126
127
127
The notebook displays the number of snapshots and total number of bytes occupied by blob snapshots.
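The snapshot statistics amount to a filter followed by the same count and sum. The sketch below assumes that a row is a snapshot when its `Snapshot` field is non-empty; the filter the tutorial actually applies to build `dfT` is not visible in this diff, so treat that condition, the sample rows, and the `snapshot_stats` helper as assumptions for illustration only.

```python
# Hypothetical rows: an empty Snapshot value is assumed to mean "base blob".
SAMPLE_ROWS = [
    {"Name": "a.txt", "Content-Length": 100, "Snapshot": ""},
    {"Name": "a.txt", "Content-Length": 90, "Snapshot": "2024-01-01T00:00:00.0000000Z"},
    {"Name": "b.txt", "Content-Length": 10, "Snapshot": "2024-02-01T00:00:00.0000000Z"},
]


def snapshot_stats(rows):
    """Return (snapshot count, total snapshot bytes) under the assumption above."""
    snapshots = [row for row in rows if row["Snapshot"]]
    total_bytes = sum(row["Content-Length"] for row in snapshots)
    return len(snapshots), total_bytes


snapshot_count, snapshot_bytes = snapshot_stats(SAMPLE_ROWS)
print("Number of snapshots in the container:", snapshot_count)
print("Number of bytes occupied by snapshots in the container:", snapshot_bytes)
```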
128
128
@@ -141,14 +141,13 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
141
141
print("Number of bytes occupied by versions in the container:", dfT.agg({'Content-Length': 'sum'}).first()['sum(Content-Length)'])
142
142
```
143
143
144
-
2. Press SHIFT + ENTER to run the cell.
144
+
2. Press SHIFT + ENTER to run the cell.
145
145
146
146
The notebook displays the number of blob versions and total number of bytes occupied by blob versions.
147
147
148
148
> [!div class="mx-imgBorder"]
149
149
> 
150
150
151
-
152
151
## Get blob count by blob type
153
152
154
153
1. In a new cell, paste the following code:
@@ -157,7 +156,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
157
156
display(df.groupBy('BlobType').count().withColumnRenamed("count", "Total number of blobs in the container by BlobType"))
158
157
```
159
158
160
-
2. Press SHIFT + ENTER to run the cell.
159
+
2. Press SHIFT + ENTER to run the cell.
161
160
162
161
The notebook displays the number of blobs for each blob type.
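The `groupBy('BlobType').count()` call in this cell is a grouped tally. A minimal plain-Python sketch of the same idea, using `collections.Counter` over a hypothetical list of `BlobType` values (the sample list and the `count_by` helper are not part of the tutorial), looks like this:

```python
from collections import Counter

# Hypothetical BlobType values, one per inventory row.
SAMPLE_BLOB_TYPES = ["BlockBlob", "BlockBlob", "PageBlob", "AppendBlob", "BlockBlob"]


def count_by(values):
    """Tally how many times each distinct value occurs."""
    return dict(Counter(values))


for blob_type, total in count_by(SAMPLE_BLOB_TYPES).items():
    print(blob_type, total)
```

The same tally applied to `Content-Type` values would mirror the content-type cell later in the tutorial.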
163
162
@@ -172,7 +171,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
172
171
display(df.groupBy('Content-Type').count().withColumnRenamed("count", "Total number of blobs in the container by Content-Type"))
173
172
```
174
173
175
-
2. Press SHIFT + ENTER to run the cell.
174
+
2. Press SHIFT + ENTER to run the cell.
176
175
177
176
The notebook displays the number of blobs associated with each content type.
178
177
@@ -181,7 +180,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
181
180
182
181
## Terminate the cluster
183
182
184
-
To avoid unnecessary billing, make sure to terminate the cluster. See [Terminate a cluster](/azure/databricks/clusters/clusters-manage#--terminate-a-cluster).
183
+
To avoid unnecessary billing, terminate your compute resource. See [terminate a compute](/azure/databricks/clusters/clusters-manage#cluster-terminate).