
Commit b1267b0

[DJM-844] Docs on how to configure Warehouse ID in DJM Databricks page (#30849)
* paragraph on configure warehouse for cost vis
* vale
* Update with full permission commands
* vale
* Remove duplicated items
* SQL Warehouse suggestions 2x
* Duplicate guidelines in Advanced>>Permissions
* fix glitch
* PR feedback
* Apply suggestions from code review

  Co-authored-by: domalessi <[email protected]>

* Apply trade-off to PR suggestions
* Update content/en/data_jobs/databricks.md

  Co-authored-by: domalessi <[email protected]>

---------

Co-authored-by: domalessi <[email protected]>
1 parent 44fbc59 commit b1267b0


content/en/data_jobs/databricks.md

Lines changed: 44 additions & 1 deletion
@@ -44,11 +44,26 @@ Follow these steps to enable Data Jobs Monitoring for Databricks.
1. On the **Configure** tab, click **Add Databricks Workspace**.
1. Enter a workspace name, your Databricks workspace URL, account ID, and the client ID and secret you generated.
{{< img src="data_jobs/databricks/configure-workspace-form-m2m.png" alt="In the Datadog-Databricks integration tile, a Databricks workspace is displayed. This workspace has a name, URL, account ID, client ID, and client secret." style="width:100%;" >}}
1. To gain visibility into your Databricks costs in Data Jobs Monitoring or [Cloud Cost Management][18], provide the ID of a [Databricks SQL Warehouse][19] that Datadog can use to query your [system tables][20].
- The service principal must have access to the SQL Warehouse. On the Warehouse configuration page, go to **Permissions** (top right) and grant it `CAN USE` permission.
- Grant the service principal read access to the Unity Catalog [system tables][20] by running the following commands (an example query showing what this access enables follows this list):
```sql
GRANT USE CATALOG ON CATALOG system TO <service_principal>;
GRANT SELECT ON CATALOG system TO <service_principal>;
GRANT USE SCHEMA ON CATALOG system TO <service_principal>;
```
The user granting these must have `MANAGE` privilege on `CATALOG system`.
- The SQL Warehouse must be Pro or Serverless. Classic Warehouses are **NOT** supported. A 2XS warehouse is recommended, with Auto Stop set to 5-10 minutes to reduce cost.
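With the grants above in place, the warehouse you provide can read Databricks billing data on Datadog's behalf. The following is an illustrative sketch only, not the query Datadog actually runs; it assumes the `system.billing.usage` system table is enabled in your account and shows the kind of read-only query that `SELECT` access on the `system` catalog permits:

```sql
-- Illustrative only: summarize the last 7 days of billable usage by SKU.
-- Assumes system.billing.usage is enabled; Datadog's actual queries may differ.
SELECT usage_date,
       sku_name,
       SUM(usage_quantity) AS total_usage
FROM system.billing.usage
WHERE usage_date >= date_sub(current_date(), 7)
GROUP BY usage_date, sku_name
ORDER BY usage_date, sku_name;
```

If a query like this succeeds when run as the service principal on the warehouse you provided, the permissions above are in place.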
1. In the **Select products to set up integration** section, ensure that Data Jobs Monitoring is **Enabled**.
1. In the **Datadog Agent Setup** section, choose either
- [Managed by Datadog (recommended)](?tab=datadogmanagedglobalinitscriptrecommended#install-the-datadog-agent): Datadog installs and manages the Agent with a global init script in the workspace.
- [Manually](?tab=manuallyinstallaglobalinitscript#install-the-datadog-agent): Follow the [instructions below](?tab=manuallyinstallaglobalinitscript#install-the-datadog-agent) to install and manage the init script for installing the Agent globally or on specific Databricks clusters.

[18]: https://docs.datadoghq.com/cloud_cost_management/
[19]: https://docs.databricks.com/aws/en/compute/sql-warehouse/
[20]: https://docs.databricks.com/aws/en/admin/system-tables/
{{% /tab %}}

{{% tab "Use a Personal Access Token (Legacy)" %}}
@@ -67,6 +82,17 @@ Follow these steps to enable Data Jobs Monitoring for Databricks.
1. On the **Configure** tab, click **Add Databricks Workspace**.
1. Enter a workspace name, your Databricks workspace URL, and the Databricks token you generated.
{{< img src="data_jobs/databricks/configure-workspace-form.png" alt="In the Datadog-Databricks integration tile, a Databricks workspace is displayed. This workspace has a name, URL, and API token." style="width:100%;" >}}
1. To gain visibility into your Databricks costs in Data Jobs Monitoring or [Cloud Cost Management][18], provide the ID of a [Databricks SQL Warehouse][19] that Datadog can use to query your [system tables][20].
- The token's principal must have access to the SQL Warehouse. Give it `CAN USE` permission from **Permissions** at the top right of the Warehouse configuration page.
- Grant the token's principal read access to the Unity Catalog [system tables][20] by running the following commands:
```sql
GRANT USE CATALOG ON CATALOG system TO <token_principal>;
GRANT SELECT ON CATALOG system TO <token_principal>;
GRANT USE SCHEMA ON CATALOG system TO <token_principal>;
```
The user granting these must have `MANAGE` privilege on `CATALOG system`.
- The SQL Warehouse must be Pro or Serverless. Classic Warehouses are **NOT** supported. A 2XS size warehouse is recommended, with Auto Stop configured for 5-10 minutes to minimize cost.
1. In the **Select products to set up integration** section, make sure the Data Jobs Monitoring product is **Enabled**.
1. In the **Datadog Agent Setup** section, choose either
- [Managed by Datadog (recommended)](?tab=datadogmanagedglobalinitscriptrecommended#install-the-datadog-agent): Datadog installs and manages the Agent with a global init script in the workspace.
@@ -76,14 +102,18 @@ Follow these steps to enable Data Jobs Monitoring for Databricks.
[10]: https://docs.databricks.com/en/admin/users-groups/service-principals.html#manage-personal-access-tokens-for-a-service-principal
[11]: https://docs.databricks.com/en/admin/users-groups/service-principals.html#what-is-a-service-principal
[17]: https://docs.databricks.com/aws/en/security/auth/entitlements#entitlements-overview
[18]: https://docs.datadoghq.com/cloud_cost_management
[19]: https://docs.databricks.com/aws/en/compute/sql-warehouse/
[20]: https://docs.databricks.com/aws/en/admin/system-tables/

{{% /tab %}}

{{< /tabs >}}

### Install the Datadog Agent

The Datadog Agent must be installed on Databricks clusters to monitor Databricks jobs that run on all-purpose or job clusters. This step is not required to monitor jobs on [serverless compute][4].

{{< tabs >}}
{{% tab "Datadog managed global init script (Recommended)" %}}
@@ -129,6 +159,7 @@ Optionally, you can add tags to your Databricks cluster and Spark performance me
[1]: /getting_started/tagging/
[2]: https://docs.databricks.com/api/workspace/clusters/edit#spark_env_vars
[3]: /agent/logs/advanced_log_collection/?tab=environmentvariable#global-processing-rules
[4]: https://docs.databricks.com/aws/en/compute/serverless/

{{% /tab %}}

@@ -296,6 +327,18 @@ If you need more granular control, grant these minimal permissions to the follow
| Query | [CAN VIEW][23]
| SQL warehouse | [CAN MONITOR][24]

Additionally, for Datadog to access your Databricks cost data in Data Jobs Monitoring or [Cloud Cost Management][18], the user or service principal used to query [system tables][20] must have the following permissions:
- `CAN USE` permission on the SQL Warehouse.
- Read access to the [system tables][20] within Unity Catalog. This can be granted with:
```sql
GRANT USE CATALOG ON CATALOG system TO <service_principal>;
GRANT SELECT ON CATALOG system TO <service_principal>;
GRANT USE SCHEMA ON CATALOG system TO <service_principal>;
```
The user granting these must have `MANAGE` privilege on `CATALOG system`.
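To confirm the setup, you can optionally list what has been granted on the `system` catalog. This is a minimal verification sketch, assuming you run it from a SQL editor attached to the configured warehouse; replace `<service_principal>` with the principal Datadog uses:

```sql
-- Optional check: list all grants on the system catalog.
SHOW GRANTS ON CATALOG system;

-- Or scope the check to the configured principal (placeholder shown).
SHOW GRANTS <service_principal> ON CATALOG system;
```

The output should include `USE CATALOG`, `USE SCHEMA`, and `SELECT` for the principal you configured.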
[18]: https://docs.datadoghq.com/cloud_cost_management
[20]: https://docs.databricks.com/aws/en/admin/system-tables/

### Tag spans at runtime
