
Commit d5a4f7c

Merge pull request #297299 from v-lanjunli/seventhub

new doc how-to-use-certsp-emit-log-to-eventhub

2 parents 2e7ddf6 + 5320cc5

10 files changed (+157 −19 lines)

articles/synapse-analytics/spark/apache-spark-azure-log-analytics.md

Lines changed: 6 additions & 5 deletions
@@ -57,11 +57,12 @@ spark.synapse.diagnostic.emitter.LA.secret: <LOG_ANALYTICS_WORKSPACE_KEY>
 #### Option 2: Configure with Azure Key Vault

 > [!NOTE]
-> You need to grant read secret permission to the users who will submit Apache Spark applications. For more information, see [Provide access to Key Vault keys, certificates, and secrets with an Azure role-based access control](/azure/key-vault/general/rbac-guide). When you enable this feature in a Synapse pipeline, you need to use **Option 3**. This is necessary to obtain the secret from Azure Key Vault with workspace managed identity.
+> You need to grant read secret permission to the users who submit Apache Spark applications. For more information, see [Provide access to Key Vault keys, certificates, and secrets with an Azure role-based access control](/azure/key-vault/general/rbac-guide). When you enable this feature in a Synapse pipeline, you need to use **Option 3**. This is necessary to obtain the secret from Azure Key Vault with workspace managed identity.

 To configure Azure Key Vault to store the workspace key, follow these steps:

 1. Create and go to your key vault in the Azure portal.
+1. Grant the right permission to the users or workspace managed identities.
 1. On the settings page for the key vault, select **Secrets**.
 1. Select **Generate/Import**.
 1. On the **Create a secret** screen, choose the following values:
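
Read together with the configuration table at the end of this file's diff, the Key Vault option amounts to Spark settings along these lines. This is a minimal sketch, assuming an emitter named `LA` (the name used in the hunk header above); the vault and secret names are placeholders:

```
spark.synapse.diagnostic.emitters: LA
spark.synapse.diagnostic.emitter.LA.type: "AzureLogAnalytics"
spark.synapse.diagnostic.emitter.LA.workspaceId: <LOG_ANALYTICS_WORKSPACE_ID>
spark.synapse.diagnostic.emitter.LA.secret.keyVault: <AZURE_KEY_VAULT_NAME>
spark.synapse.diagnostic.emitter.LA.secret.keyVault.secretName: <AZURE_KEY_VAULT_SECRET_NAME>
```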
@@ -145,7 +146,7 @@ You can create an Apache Spark Configuration to your workspace, and when you cre
 1. Select **New** button to create a new Apache Spark configuration.
 1. **New Apache Spark configuration** page will be opened after you select **New** button.

-   :::image type="content" source="./media/apache-spark-azure-log-analytics/create-spark-configuration.png" alt-text="Screenshot that create spark configuration.":::
+   :::image type="content" source="./media/apache-spark-azure-log-analytics/create-spark-configuration.png" alt-text="Screenshot that creates Spark configuration.":::

 1. For **Name**, you can enter your preferred and valid name.
 1. For **Description**, you can input some description in it.
@@ -276,7 +277,7 @@ You can follow below steps to create a managed private endpoint connection to Az
 1. Navigate to your AMPLS in Azure portal again, on the **Private Endpoint connections** page, select the connection provisioned and **Approve**.

 > [!NOTE]
-> - The AMPLS object has a number of limits you should consider when planning your Private Link setup. See [AMPLS limits](/azure/azure-monitor/logs/private-link-security) for a deeper review of these limits.
+> - The AMPLS object has many limits you should consider when planning your Private Link setup. See [AMPLS limits](/azure/azure-monitor/logs/private-link-security) for a deeper review of these limits.
 > - Check if you have [right permission](../security/synapse-workspace-access-control-overview.md) to create managed private endpoint.

 ## Available configurations
@@ -287,10 +288,10 @@ You can follow below steps to create a managed private endpoint connection to Az
 | `spark.synapse.diagnostic.emitter.<destination>.type` | Required. Built-in destination type. To enable Azure Log Analytics destination, AzureLogAnalytics needs to be included in this field.|
 | `spark.synapse.diagnostic.emitter.<destination>.categories` | Optional. The comma-separated selected log categories. Available values include `DriverLog`, `ExecutorLog`, `EventLog`, `Metrics`. If not set, the default value is **all** categories. |
 | `spark.synapse.diagnostic.emitter.<destination>.workspaceId` | Required. To enable Azure Log Analytics destination, workspaceId needs to be included in this field. |
-| `spark.synapse.diagnostic.emitter.<destination>.secret` | Optional. The secret (Log Aanalytics key) content. To find this, in the Azure portal, go to Azure Log Analytics workspace > Agents > Primary key. |
+| `spark.synapse.diagnostic.emitter.<destination>.secret` | Optional. The secret (Log Analytics key) content. To find this, in the Azure portal, go to Azure Log Analytics workspace > Agents > Primary key. |
 | `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault` | Required if `.secret` is not specified. The [Azure Key vault](/azure/key-vault/general/overview) name where the secret (AccessKey or SAS) is stored. |
 | `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault.secretName` | Required if `.secret.keyVault` is specified. The Azure Key vault secret name where the secret is stored. |
-| `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault.linkedService` | Optional. The Azure Key vault linked service name. When enabled in Synapse pipeline, this is necessary to obtain the secret from AKV. (Please make sure MSI has read permission on the AKV). |
+| `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault.linkedService` | Optional. The Azure Key vault linked service name. When enabled in Synapse pipeline, this is necessary to obtain the secret from Azure Key vault. (Make sure the MSI has read access to the Azure Key vault). |
 | `spark.synapse.diagnostic.emitter.<destination>.filter.eventName.match` | Optional. The comma-separated Log4j logger names, you can specify which logs to collect. For example `SparkListenerApplicationStart,SparkListenerApplicationEnd` |
 | `spark.synapse.diagnostic.emitter.<destination>.filter.loggerName.match` | Optional. The comma-separated log4j logger names, you can specify which logs to collect. For example: `org.apache.spark.SparkContext,org.example.Logger` |
 | `spark.synapse.diagnostic.emitter.<destination>.filter.metricName.match` | Optional. The comma-separated spark metric name suffixes, you can specify which metrics to collect. For example:`jvm.heap.used` |
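
As a usage note for the three `filter.*` rows, a hypothetical emitter that collects only selected events, loggers, and metrics could combine them like this sketch; the `LA` name is illustrative and the values are taken from the examples in the table:

```
spark.synapse.diagnostic.emitter.LA.filter.eventName.match: "SparkListenerApplicationStart,SparkListenerApplicationEnd"
spark.synapse.diagnostic.emitter.LA.filter.loggerName.match: "org.apache.spark.SparkContext,org.example.Logger"
spark.synapse.diagnostic.emitter.LA.filter.metricName.match: "jvm.heap.used"
```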

articles/synapse-analytics/spark/azure-synapse-diagnostic-emitters-azure-eventhub.md

Lines changed: 9 additions & 13 deletions
@@ -1,6 +1,6 @@
 ---
 title: Collect your Apache Spark applications logs and metrics using Azure Event Hubs
-description: In this tutorial, you learn how to use the Synapse Apache Spark diagnostic emitter extension to emit Apache Spark applications logs, event logs and metrics to your Azure Event Hubs.
+description: In this tutorial, you learn how to use the Synapse Apache Spark diagnostic emitter extension to emit Apache Spark applications' logs, event logs, and metrics to your Azure Event Hubs.
 author: hrasheed-msft
 ms.author: jejiang
@@ -14,7 +14,7 @@ ms.date: 08/31/2021

 The Synapse Apache Spark diagnostic emitter extension is a library that enables the Apache Spark application to emit the logs, event logs, and metrics to one or more destinations, including Azure Log Analytics, Azure Storage, and Azure Event Hubs.

-In this tutorial, you learn how to use the Synapse Apache Spark diagnostic emitter extension to emit Apache Spark applications logs, event logs, and metrics to your Azure Event Hubs.
+In this tutorial, you learn how to use the Synapse Apache Spark diagnostic emitter extension to emit Apache Spark applications' logs, event logs, and metrics to your Azure Event Hubs.

 ## Collect logs and metrics to Azure Event Hubs

@@ -35,7 +35,7 @@ spark.synapse.diagnostic.emitter.MyDestination1.secret <connection-string>
 ```

 Fill in the following parameters in the configuration file: `<connection-string>`.
-For more description of the parameters, you can refer to [Azure Event Hubs configurations](#available-configurations).
+For more descriptions of the parameters, you can refer to [Azure Event Hubs configurations](#available-configurations).

 ### Step 3: Upload the Apache Spark configuration file to Apache Spark pool

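For illustration, a filled-in `<connection-string>` follows the pattern documented in the configuration table below; the namespace, key name, and event hub name here are hypothetical:

```
Endpoint=sb://contoso-ns.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=<KeyValue>;EntityPath=contoso-hub
```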
@@ -51,21 +51,21 @@ For more description of the parameters, you can refer to [Azure Event Hubs confi
 | `spark.synapse.diagnostic.emitter.<destination>.type` | Required. Built-in destination type. To enable Azure Event Hubs destination, the value should be `AzureEventHub`. |
 | `spark.synapse.diagnostic.emitter.<destination>.categories` | Optional. The comma-separated selected log categories. Available values include `DriverLog`, `ExecutorLog`, `EventLog`, `Metrics`. If not set, the default value is **all** categories. |
 | `spark.synapse.diagnostic.emitter.<destination>.secret` | Optional. The Azure Event Hubs instance connection string. This field should match this pattern `Endpoint=sb://<FQDN>/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<KeyValue>;EntityPath=<PathName>` |
-| `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault` | Required if `.secret` is not specified. The [Azure Key vault](/azure/key-vault/general/overview) name where the secret (connection string) is stored. |
+| `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault` | Required if `.secret` isn't specified. The [Azure Key vault](/azure/key-vault/general/overview) (AKV) name where the secret (connection string) is stored. |
 | `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault.secretName` | Required if `.secret.keyVault` is specified. The Azure Key vault secret name where the secret (connection string) is stored. |
-| `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault.linkedService` | Optional. The Azure Key vault linked service name. When enabled in Synapse pipeline, this is necessary to obtain the secret from AKV. (Please make sure MSI has read permission on the AKV). |
+| `spark.synapse.diagnostic.emitter.<destination>.secret.keyVault.linkedService` | Optional. The Azure Key vault linked service name. When enabled in Synapse pipeline, this is required to obtain the secret from AKV. (Make sure managed service identity (MSI) has read permission on the AKV). |
 | `spark.synapse.diagnostic.emitter.<destination>.filter.eventName.match` | Optional. The comma-separated spark event names, you can specify which events to collect. For example: `SparkListenerApplicationStart,SparkListenerApplicationEnd` |
-| `spark.synapse.diagnostic.emitter.<destination>.filter.loggerName.match` | Optional. The comma-separated log4j logger names, you can specify which logs to collect. For example: `org.apache.spark.SparkContext,org.example.Logger` |
+| `spark.synapse.diagnostic.emitter.<destination>.filter.loggerName.match` | Optional. The comma-separated Log4j logger names, you can specify which logs to collect. For example: `org.apache.spark.SparkContext,org.example.Logger` |
 | `spark.synapse.diagnostic.emitter.<destination>.filter.metricName.match` | Optional. The comma-separated spark metric name suffixes, you can specify which metrics to collect. For example: `jvm.heap.used` |


 > [!NOTE]
 >
-> The Azure Eventhub instance connection string should always contains the `EntityPath`, which is the name of the Azure Event Hubs instance.
+> The Azure Event Hubs instance connection string should always contain `EntityPath`, which is the name of the Azure Event Hubs instance.

 ## Log data sample

-Here is a sample log record in JSON format:
+Here's a sample log record in JSON format:

 ```json
 {
@@ -91,8 +91,4 @@ Here is a sample log record in JSON format:

 ## Synapse workspace with data exfiltration protection enabled

-Azure Synapse Analytics workspaces support enabling data exfiltration protection for workspaces. With exfiltration protection, the logs and metrics cannot be sent out to the destination endpoints directly. You can create corresponding [managed private endpoints](../../synapse-analytics/security/synapse-workspace-managed-private-endpoints.md) for different destination endpoints or [create IP firewall rules](../../synapse-analytics/security/synapse-workspace-ip-firewall.md) in this scenario.
-
-
-
-
+Azure Synapse Analytics workspaces support enabling data exfiltration protection for workspaces. With exfiltration protection, the logs and metrics can't be sent out to the destination endpoints directly. You can create corresponding [managed private endpoints](../../synapse-analytics/security/synapse-workspace-managed-private-endpoints.md) for different destination endpoints or [create IP firewall rules](../../synapse-analytics/security/synapse-workspace-ip-firewall.md) in this scenario.
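
Tying the Key Vault rows above together, a sketch of a configuration that avoids an inline connection string might look like this; the emitter name `MyDestination1` comes from the step 2 snippet of this article, and the vault and secret names are placeholders:

```
spark.synapse.diagnostic.emitters: MyDestination1
spark.synapse.diagnostic.emitter.MyDestination1.type: "AzureEventHub"
spark.synapse.diagnostic.emitter.MyDestination1.secret.keyVault: <AZURE_KEY_VAULT_NAME>
spark.synapse.diagnostic.emitter.MyDestination1.secret.keyVault.secretName: <AZURE_KEY_VAULT_SECRET_NAME>
```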

articles/synapse-analytics/spark/how-to-use-certificate-with-service-principalp-emit-log-event-hubs.md

Lines changed: 139 additions & 0 deletions

@@ -0,0 +1,139 @@
---
title: How to use a certificate and service principal to emit logs to Azure Event Hubs
description: Learn how to set up Azure services, focusing on integrating Azure Synapse with Azure Event Hubs and Key Vault.
author: jejiang
ms.author: jejiang
ms.reviewer: whhender
ms.topic: tutorial
ms.date: 03/24/2025
---

# How to use a certificate and service principal to emit logs to Azure Event Hubs

The Apache Spark diagnostic emitter extension is a library that allows Spark applications to send logs, event logs, and metrics to destinations like Azure Event Hubs, Azure Log Analytics, and Azure Storage.

In this tutorial, you learn how to create the required Azure resources and configure a Spark application with a certificate and service principal to emit logs, event logs, and metrics to Azure Event Hubs using the Apache Spark diagnostic emitter extension.

## Prerequisites

- An Azure subscription. You can also [create a free account](https://azure.microsoft.com/free/) before you get started.
- [Synapse Analytics workspace](/azure/synapse-analytics/get-started-create-workspace).
- [Azure Event Hubs](/azure/event-hubs/event-hubs-about).
- [Azure Key Vault](/azure/key-vault/general/overview).
- [App Registration](https://ms.portal.azure.com/#view/Microsoft_AAD_RegisteredApps/ApplicationsListBlade).

> [!NOTE]
> To complete this tutorial's steps, you need access to a resource group for which you're assigned the Owner role.

## Step 1. Register an application

1. Sign in to the [Azure portal](https://portal.azure.com/) and go to [App registrations](/entra/identity-platform/quickstart-register-app#register-an-application).
2. Create a new app registration for your Synapse workspace.

   :::image type="content" source="media\how-to-use-certificate-with-service-principalp-emit-log-event-hubs\create-a-new-app-registration.png" alt-text="Screenshot showing creating a new app registration.":::

## Step 2. Generate a certificate in Key Vault

1. Navigate to your key vault.
2. Expand the **Objects** section, and select **Certificates**.
3. Select **Generate/Import**.

   :::image type="content" source="media\how-to-use-certificate-with-service-principalp-emit-log-event-hubs\generate-a-new-certificate.png" alt-text="Screenshot showing generating a new certificate for the app.":::

## Step 3. Trust the certificate in the application

1. Go to the app created in Step 1 -> **Manage** -> **Manifest**.
2. Append the certificate details to the manifest file to establish trust.

```
"trustedCertificateSubjects": [
    {
        "authorityId": "00000000-0000-0000-0000-000000000001",
        "subjectName": "Your-Subject-of-Certificate",
        "revokedCertificateIdentifiers": []
    }
]
```

:::image type="content" source="media\how-to-use-certificate-with-service-principalp-emit-log-event-hubs\trust-the-certificate.png" alt-text="Screenshot showing trusting the certificate in the application.":::

## Step 4. Assign Azure Event Hubs Data Sender Role

1. In Azure Event Hubs, navigate to **Access control (IAM)**.
2. Assign the **Azure Event Hubs Data Sender** role to the application (service principal).

   :::image type="content" source="media\how-to-use-certificate-with-service-principalp-emit-log-event-hubs\assign-azure-event-hubs-data-sender-role.png" alt-text="Screenshot showing assigning the Azure Event Hubs Data Sender role.":::

## Step 5. Create a linked service in Synapse

1. In the Synapse Analytics workspace, go to **Manage** -> **Linked services**.
2. Create a new **linked service** in Synapse to connect to **Key Vault**.

   :::image type="content" source="media\how-to-use-certificate-with-service-principalp-emit-log-event-hubs\create-a-linked-service-in-synapse.png" alt-text="Screenshot showing creating a linked service in Synapse.":::

## Step 6. Assign reader role to linked service in Key Vault

1. Get the workspace managed identity ID from the linked service. The **managed identity name** and **object ID** for the linked service are under **Edit linked service**.

   :::image type="content" source="media\how-to-use-certificate-with-service-principalp-emit-log-event-hubs\managed-identity-name-and-object-id.png" alt-text="Screenshot showing that the managed identity name and object ID are in edit linked service.":::

2. In **Key Vault**, assign the linked service a **Reader** role.

## Step 7. Configure with a linked service

Gather the following values and add them to the Apache Spark configuration.

- **<EMITTER_NAME>**: The name for the emitter.
- **<CERTIFICATE_NAME>**: The certificate name that you generated in the key vault.
- **<LINKED_SERVICE_NAME>**: The Azure Key vault linked service name.
- **<EVENT_HUB_HOST_NAME>**: The Azure Event Hubs host name. You can find it in Azure Event Hubs Namespace -> Overview -> Host name.
- **<SERVICE_PRINCIPAL_TENANT_ID>**: The service principal tenant ID. You can find it in App registrations -> your app name -> Overview -> Directory (tenant) ID.
- **<SERVICE_PRINCIPAL_CLIENT_ID>**: The service principal client ID. You can find it in App registrations -> your app name -> Overview -> Application (client) ID.
- **<EVENT_HUB_ENTITY_PATH>**: The Azure Event Hubs entity path, that is, the name of the Event Hubs instance within the namespace.

```
"spark.synapse.diagnostic.emitters": "<EMITTER_NAME>",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.type": "AzureEventHub",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.categories": "DriverLog,ExecutorLog,EventLog,Metrics",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.certificate.keyVault.certificateName": "<CERTIFICATE_NAME>",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.certificate.keyVault.linkedService": "<LINKED_SERVICE_NAME>",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.hostName": "<EVENT_HUB_HOST_NAME>",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.tenantId": "<SERVICE_PRINCIPAL_TENANT_ID>",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.clientId": "<SERVICE_PRINCIPAL_CLIENT_ID>",
"spark.synapse.diagnostic.emitter.<EMITTER_NAME>.entityPath": "<EVENT_HUB_ENTITY_PATH>"
```
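
To make the placeholders concrete, a filled-in configuration might look like the following sketch; every value here (emitter name, certificate, linked service, host name, IDs, and entity path) is a hypothetical example, not a value from this commit:

```
"spark.synapse.diagnostic.emitters": "MyEventHub",
"spark.synapse.diagnostic.emitter.MyEventHub.type": "AzureEventHub",
"spark.synapse.diagnostic.emitter.MyEventHub.categories": "DriverLog,ExecutorLog,EventLog,Metrics",
"spark.synapse.diagnostic.emitter.MyEventHub.certificate.keyVault.certificateName": "emitter-cert",
"spark.synapse.diagnostic.emitter.MyEventHub.certificate.keyVault.linkedService": "KeyVaultLinkedService",
"spark.synapse.diagnostic.emitter.MyEventHub.hostName": "contoso-ns.servicebus.windows.net",
"spark.synapse.diagnostic.emitter.MyEventHub.tenantId": "00000000-0000-0000-0000-000000000000",
"spark.synapse.diagnostic.emitter.MyEventHub.clientId": "11111111-1111-1111-1111-111111111111",
"spark.synapse.diagnostic.emitter.MyEventHub.entityPath": "contoso-hub"
```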

## Step 8. Submit an Apache Spark application and view the logs and metrics

You can use the Apache Log4j library to write custom logs.

Example for Scala:

```scala
%%spark
val logger = org.apache.log4j.LogManager.getLogger("com.contoso.LoggerExample")
logger.info("info message")
logger.warn("warn message")
logger.error("error message")
// Log an exception
try {
  1 / 0
} catch {
  case e: Exception => logger.warn("Exception", e)
}
// Run a job to produce task-level metrics
val data = sc.parallelize(Seq(1, 2, 3, 4)).toDF().count()
```

Example for PySpark:

```python
%%pyspark
logger = sc._jvm.org.apache.log4j.LogManager.getLogger("com.contoso.PythonLoggerExample")
logger.info("info message")
logger.warn("warn message")
logger.error("error message")
```
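
After submitting the application, one way to verify that records are arriving is to read from the event hub with the Azure SDK for Python. This is a minimal sketch, assuming the `azure-eventhub` package is installed; the connection string and hub name are placeholders:

```python
from azure.eventhub import EventHubConsumerClient

def on_event(partition_context, event):
    # Each event body is a JSON log, event log, or metric record from the emitter.
    print(event.body_as_str())

client = EventHubConsumerClient.from_connection_string(
    "<EVENT_HUB_CONNECTION_STRING>",  # placeholder
    consumer_group="$Default",
    eventhub_name="<EVENT_HUB_NAME>",  # placeholder
)
with client:
    # "-1" reads from the beginning of the stream; blocks until interrupted.
    client.receive(on_event=on_event, starting_position="-1")
```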

The six remaining changed files are the new screenshots referenced by the article above, added under the articles/synapse-analytics/spark/media/how-to-use-certificate-with-service-principalp-emit-log-event-hubs/ folder.

articles/synapse-analytics/toc.yml

Lines changed: 3 additions & 1 deletion
@@ -816,10 +816,12 @@ items:
       href: ./spark/azure-synapse-diagnostic-emitters-azure-storage.md
     - name: Collect Apache Spark applications logs and metrics with Azure Event Hubs
       href: ./spark/azure-synapse-diagnostic-emitters-azure-eventhub.md
+    - name: Collect Apache Spark applications logs and metrics by certificate and service principal
+      href: ./spark/how-to-use-certificate-with-service-principalp-emit-log-event-hubs.md
     - name: Manage Apache Spark configuration
       href: ./spark/apache-spark-azure-create-spark-configuration.md
     - name: Apache Spark Advisor
-      href: ./monitoring/apache-spark-advisor.md
+      href: ./monitoring/apache-spark-advisor.md
 - name: Data sources
   items:
     - name: Azure Cosmos DB Spark 3
