Commit 3ae2bb5

Merge pull request #292965 from whhender/january-freshness-2025
January freshness 2025 - part 1
2 parents 75dc47a + 58d6b29 commit 3ae2bb5

3 files changed: +43 -49 lines changed

articles/storage/blobs/data-lake-storage-use-databricks-spark.md

Lines changed: 5 additions & 7 deletions
@@ -6,7 +6,7 @@ author: normesta
66

77
ms.service: azure-data-lake-storage
88
ms.topic: tutorial
9-
ms.date: 11/18/2024
9+
ms.date: 01/13/2025
1010
ms.author: normesta
1111
ms.reviewer: dineshm
1212
ms.custom: py-fresh-zinc
@@ -39,13 +39,11 @@ If you don't have an Azure subscription, create a [free account](https://azure.m
3939

4040
See [Tutorial: Connect to Azure Data Lake Storage](/azure/databricks/getting-started/connect-to-azure-storage) (Steps 1 through 3). After completing these steps, make sure to paste the tenant ID, app ID, and client secret values into a text file. You use them later in this tutorial.
4141

42-
## Create an Azure Databricks workspace, cluster, and notebook
42+
## Create an Azure Databricks workspace and notebook
4343

4444
1. Create an Azure Databricks workspace. See [Create an Azure Databricks workspace](/azure/databricks/getting-started/#--create-an-azure-databricks-workspace).
4545

46-
2. Create a cluster. See [Create a cluster](/azure/databricks/getting-started/quick-start#step-1-create-a-cluster).
47-
48-
3. Create a notebook. See [Create a notebook](/azure/databricks/notebooks/notebooks-manage#--create-a-notebook). Choose Python as the default language of the notebook.
46+
2. Create a notebook. See [Create a notebook](/azure/databricks/notebooks/quick-start#create-notebook). Choose Python as the default language of the notebook.
4947

5048
Keep your notebook open. You use it in the following sections.
5149

@@ -109,7 +107,7 @@ In this section, you mount your Azure Data Lake Storage cloud object storage to
109107

110108
1. In the notebook you created previously, select the **Connect** button in the upper right corner of the [notebook toolbar](/azure/databricks/notebooks/notebook-ui#--notebook-toolbar-icons-and-buttons). This button opens the compute selector. (If you've already connected your notebook to a cluster, the name of that cluster is shown in the button text rather than **Connect**).
111109

112-
1. In the cluster dropdown menu, select the cluster you previously created.
110+
1. In the cluster dropdown menu, select any cluster you've previously created.
113111

114112
1. Notice that the text in the cluster selector changes to *starting*. Wait for the cluster to finish starting and for the name of the cluster to appear in the button before continuing.
115113

@@ -293,7 +291,7 @@ In this tutorial, you:
293291

294292
- Created Azure resources, including an Azure Data Lake Storage storage account and Azure AD service principal, and assigned permissions to access the storage account.
295293

296-
- Created an Azure Databricks workspace, notebook, and compute cluster.
294+
- Created an Azure Databricks workspace and notebook.
297295

298296
- Used AzCopy to upload unstructured *.csv* flight data to the Azure Data Lake Storage storage account.
299297

articles/storage/blobs/storage-blob-calculate-container-statistics-databricks.md

Lines changed: 17 additions & 18 deletions
@@ -4,7 +4,7 @@ description: Description goes here
44
author: normesta
55
ms.service: azure-blob-storage
66
ms.topic: tutorial
7-
ms.date: 02/08/2023
7+
ms.date: 01/13/2025
88
ms.author: normesta
99
---
1010

@@ -16,7 +16,7 @@ In this tutorial, you learn how to:
1616

1717
> [!div class="checklist"]
1818
> * Generate an inventory report
19-
> * Create an Azure Databricks workspace, cluster, and notebook
19+
> * Create an Azure Databricks workspace and notebook
2020
> * Read the blob inventory file
2121
> * Get the number and total size of blobs, snapshots, and versions
2222
> * Get the number of blobs by blob type and content type
@@ -50,13 +50,13 @@ You might have to wait up to 24 hours after enabling inventory reports for your
5050

5151
## Configure Azure Databricks
5252

53-
In this section, you create an Azure Databricks workspace, cluster, and notebook. Later in this tutorial, you paste code snippets into notebook cells, and then run them to gather container statistics.
53+
In this section, you create an Azure Databricks workspace and notebook. Later in this tutorial, you paste code snippets into notebook cells, and then run them to gather container statistics.
5454

5555
1. Create an Azure Databricks workspace. See [Create an Azure Databricks workspace](/azure/databricks/getting-started/#--create-an-azure-databricks-workspace).
5656

57-
2. Create a cluster. See [Create a cluster](/azure/databricks/getting-started/quick-start#step-1-create-a-cluster).
57+
2. Create a new notebook. See [Create a notebook](/azure/databricks/getting-started/quick-start#create-notebook).
5858

59-
3. Create a notebook and choose Python as the default language of the notebook. See [Create a notebook](/azure/databricks/notebooks/notebooks-manage#--create-a-notebook).
59+
3. Choose Python as the default language of the notebook.
6060

6161
## Read the blob inventory file
6262

@@ -65,11 +65,11 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
6565
```python
6666
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
6767
import pyspark.sql.functions as F
68-
storage_account_name = "<storage-account-name>"
69-
storage_account_key = "<storage-account-key>"
70-
container = "<container-name>"
71-
blob_inventory_file = "<blob-inventory-file-name>"
72-
hierarchial_namespace_enabled = False
68+
storage_account_name = "<storage-account-name>"
69+
storage_account_key = "<storage-account-key>"
70+
container = "<container-name>"
71+
blob_inventory_file = "<blob-inventory-file-name>"
72+
hierarchial_namespace_enabled = False
7373

7474
if hierarchial_namespace_enabled == False:
7575
spark.conf.set("fs.azure.account.key.{0}.blob.core.windows.net".format(storage_account_name), storage_account_key)
@@ -92,7 +92,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
9292

9393
- If your account has a hierarchical namespace, set the `hierarchical_namespace_enabled` variable to `True`.
9494

95-
3. Press the SHIFT + ENTER keys to run the code in this block.
95+
3. Press the Run button to run the code in this cell.
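Reviewer note: the notebook cell in this hunk picks the storage endpoint from the namespace flag. The code spells the variable `hierarchial_namespace_enabled`, while the bullet above spells it `hierarchical_namespace_enabled`, so a reader who follows the prose literally would set a variable the cell never reads. The sketch below shows the selection logic in plain Python. The helper name `inventory_location`, its return shape, and the sample account values are hypothetical; the `dfs`/`abfss` branch is not visible in this excerpt and is assumed from the standard ABFS and WASB URI schemes.

```python
def inventory_location(account, container, blob_name, hierarchical_namespace_enabled=False):
    """Return (spark_conf_key, file_url) for a blob inventory file.

    Hypothetical helper for illustration; not part of the tutorial.
    """
    # Accounts with a hierarchical namespace use the dfs endpoint and the
    # abfss scheme; other accounts use the blob endpoint and the wasbs scheme.
    if hierarchical_namespace_enabled:
        endpoint, scheme = "dfs", "abfss"
    else:
        endpoint, scheme = "blob", "wasbs"

    host = "{0}.{1}.core.windows.net".format(account, endpoint)
    conf_key = "fs.azure.account.key.{0}".format(host)
    file_url = "{0}://{1}@{2}/{3}".format(scheme, container, host, blob_name)
    return conf_key, file_url


# Placeholder values, chosen only for the example.
conf_key, file_url = inventory_location("mystorageacct", "inventory", "report.csv")
print(conf_key)
print(file_url)
```

In a notebook, the first value would be the key passed to `spark.conf.set` together with the account key, and the second the path passed to the reader.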
9696

9797
## Get blob count and size
9898

@@ -103,7 +103,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
103103
print("Number of bytes occupied by blobs in the container:", df.agg({'Content-Length': 'sum'}).first()['sum(Content-Length)'])
104104
```
105105

106-
2. Press SHIFT + ENTER to run the cell.
106+
2. Press the run button to run the cell.
107107

108108
The notebook displays the number of blobs in a container and the number of bytes occupied by blobs in the container.
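Reviewer note: for readers without a running compute resource, the two figures this cell reports can be checked against a plain-Python equivalent. This is only a sketch of what the count and the `sum` over `Content-Length` compute; it does not use Spark, and the sample rows are made up.

```python
# Made-up inventory rows; a real report has many more columns.
rows = [
    {"Name": "a.csv", "Content-Length": 120},
    {"Name": "b.csv", "Content-Length": 300},
    {"Name": "c.json", "Content-Length": 80},
]

# Number of blobs is the row count; bytes occupied is the column sum.
blob_count = len(rows)
total_bytes = sum(row["Content-Length"] for row in rows)

print("Number of blobs in the container:", blob_count)
print("Number of bytes occupied by blobs in the container:", total_bytes)
```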
109109

@@ -122,7 +122,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
122122
print("Number of bytes occupied by snapshots in the container:", dfT.agg({'Content-Length': 'sum'}).first()['sum(Content-Length)'])
123123
```
124124

125-
2. Press SHIFT + ENTER to run the cell.
125+
2. Press the run button to run the cell.
126126

127127
The notebook displays the number of snapshots and total number of bytes occupied by blob snapshots.
128128

@@ -141,14 +141,13 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
141141
print("Number of bytes occupied by versions in the container:", dfT.agg({'Content-Length': 'sum'}).first()['sum(Content-Length)'])
142142
```
143143

144-
2. Press SHIFT + ENTER to run the cell.
144+
2. Press SHIFT + ENTER to run the cell.
145145

146146
The notebook displays the number of blob versions and total number of bytes occupied by blob versions.
147147

148148
> [!div class="mx-imgBorder"]
149149
> ![Screenshot of results that appear when you run the cell showing the number of versions and the total combined size of versions.](./media/storage-blob-calculate-container-statistics-databricks/number-of-versions.png)
150150
151-
152151
## Get blob count by blob type
153152

154153
1. In a new cell, paste the following code:
@@ -157,7 +156,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
157156
display(df.groupBy('BlobType').count().withColumnRenamed("count", "Total number of blobs in the container by BlobType"))
158157
```
159158

160-
2. Press SHIFT + ENTER to run the cell.
159+
2. Press SHIFT + ENTER to run the cell.
161160

162161
The notebook displays the number of blob types by type.
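Reviewer note: the `groupBy('BlobType').count()` call in this hunk produces one row per blob type with its blob count. A minimal plain-Python sketch of the same grouping, using made-up rows and no Spark, might look like this:

```python
from collections import Counter

# Made-up inventory rows for illustration.
rows = [
    {"Name": "a.csv", "BlobType": "BlockBlob"},
    {"Name": "b.vhd", "BlobType": "PageBlob"},
    {"Name": "c.log", "BlobType": "AppendBlob"},
    {"Name": "d.csv", "BlobType": "BlockBlob"},
]

# Count how many rows share each BlobType value.
counts = Counter(row["BlobType"] for row in rows)

for blob_type, number_of_blobs in sorted(counts.items()):
    print(blob_type, number_of_blobs)
```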
163162

@@ -172,7 +171,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
172171
display(df.groupBy('Content-Type').count().withColumnRenamed("count", "Total number of blobs in the container by Content-Type"))
173172
```
174173

175-
2. Press SHIFT + ENTER to run the cell.
174+
2. Press SHIFT + ENTER to run the cell.
176175

177176
The notebook displays the number of blobs associated with each content type.
178177

@@ -181,7 +180,7 @@ In this section, you create an Azure Databricks workspace, cluster, and notebook
181180
182181
## Terminate the cluster
183182

184-
To avoid unnecessary billing, make sure to terminate the cluster. See [Terminate a cluster](/azure/databricks/clusters/clusters-manage#--terminate-a-cluster).
183+
To avoid unnecessary billing, terminate your compute resource. See [terminate a compute](/azure/databricks/clusters/clusters-manage#cluster-terminate).
185184

186185
## Next steps
187186
