Commit 83c8316

Merge pull request #101080 from orspod/2020-Jan-spark
2020 jan spark
2 parents d2555cf + 7b5e43b commit 83c8316

5 files changed (+26, -28 lines)

Binary files changed (contents not shown): 93.7 KB, 10.9 KB, -5.07 KB

articles/data-explorer/spark-connector.md

Lines changed: 26 additions & 28 deletions
@@ -6,7 +6,7 @@ ms.author: orspodek
 ms.reviewer: michazag
 ms.service: data-explorer
 ms.topic: conceptual
-ms.date: 4/29/2019
+ms.date: 1/14/2020
 ---
 
 # Azure Data Explorer Connector for Apache Spark (Preview)
@@ -30,7 +30,7 @@ and sink operations such as write, read and writeStream.
 * Install Azure Data Explorer connector library, and libraries listed in [dependencies](https://github.com/Azure/azure-kusto-spark#dependencies) including the following [Kusto Java SDK](/azure/kusto/api/java/kusto-java-client-library) libraries:
 * [Kusto Data Client](https://mvnrepository.com/artifact/com.microsoft.azure.kusto/kusto-data)
 * [Kusto Ingest Client](https://mvnrepository.com/artifact/com.microsoft.azure.kusto/kusto-ingest)
-* Pre-built libraries for [Spark 2.4, Scala 2.11](https://github.com/Azure/azure-kusto-spark/releases)
+* Pre-built libraries for [Spark 2.4, Scala 2.11](https://github.com/Azure/azure-kusto-spark/releases) and [Maven repo](https://mvnrepository.com/artifact/com.microsoft.azure.kusto/spark-kusto-connector)
 
 ## How to build the Spark connector
 
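The updated prerequisite now points to the connector's Maven repository. As a sketch, a Scala project built with sbt could pull in the same artifact with a single line; the group and artifact IDs are read from that Maven link, while `<version>` is a placeholder for the latest release built for Spark 2.4 and Scala 2.11 (this change does not pin a version).

```scala
// build.sbt (sketch): coordinates taken from the spark-kusto-connector Maven page.
// "<version>" is a placeholder, not a real release number.
libraryDependencies += "com.microsoft.azure.kusto" % "spark-kusto-connector" % "<version>"
```

The single `%` is deliberate: the artifact name carries no Scala-version suffix, so sbt should not append one. Databricks expects Maven coordinates in the form `groupId:artifactId:version`, so the same artifact would be entered there as `com.microsoft.azure.kusto:spark-kusto-connector:<version>`.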
@@ -79,21 +79,14 @@ For more information, see [connector usage](https://github.com/Azure/azure-kusto
 > [!NOTE]
 > It is recommended to use the latest Azure Data Explorer Spark connector release when performing the following steps:
 
-1. Set the following Spark cluster settings, based on Azure Databricks cluster using Spark 2.4 and Scala 2.11:
+1. Set the following Spark cluster settings, based on Azure Databricks cluster using Spark 2.4.4 and Scala 2.11:
 
 ![Databricks cluster settings](media/spark-connector/databricks-cluster.png)
-
-1. Import the Azure Data Explorer connector library:
+
+1. Install the latest spark-kusto-connector library from Maven:
 
 ![Import Azure Data Explorer library](media/spark-connector/db-create-library.png)
 
-1. Add additional dependencies (not necessary if used from maven) :
-
-![Add dependencies](media/spark-connector/db-dependencies.png)
-
-> [!TIP]
-> The correct java release version for each Spark release is found [here](https://github.com/Azure/azure-kusto-spark#dependencies).
-
 1. Verify that all required libraries are installed:
 
 ![Verify libraries installed](media/spark-connector/db-libraries-view.png)
@@ -112,16 +105,16 @@ Most simple and common authentication method. This method is recommended for Azu
 |**KUSTO_AAD_AUTHORITY_ID** | Azure AD authentication authority. Azure AD Directory (tenant) ID. |
 |**KUSTO_AAD_CLIENT_PASSWORD** | Azure AD application key for the client. |
 
-### Azure Data Explorer Privileges
+### Azure Data Explorer privileges
 
-The following privileges must be granted on an Azure Data Explorer Cluster:
+The following privileges must be granted on an Azure Data Explorer cluster:
 
 * For reading (data source), Azure AD application must have *viewer* privileges on the target database, or *admin* privileges on the target table.
 * For writing (data sink), Azure AD application must have *ingestor* privileges on the target database. It must also have *user* privileges on the target database to create new tables. If the target table already exists, *admin* privileges on the target table can be configured.
 
 For more information on Azure Data Explorer principal roles, see [role-based authorization](/azure/kusto/management/access-control/role-based-authorization). For managing security roles, see [security roles management](/azure/kusto/management/security-roles).
 
-## Spark sink: Writing to Azure Data Explorer
+## Spark sink: writing to Azure Data Explorer
 
 1. Set up sink parameters:
 
@@ -141,19 +134,19 @@ For more information on Azure Data Explorer principal roles, see [role-based aut
 
 ```scala
 import com.microsoft.kusto.spark.datasink.KustoSinkOptions
-val conf = Map(
-  KustoSinkOptions.KUSTO_CLUSTER -> cluster,
-  KustoSinkOptions.KUSTO_TABLE -> table,
-  KustoSinkOptions.KUSTO_DATABASE -> database,
-  KustoSinkOptions.KUSTO_AAD_CLIENT_ID -> appId,
-  KustoSinkOptions.KUSTO_AAD_CLIENT_PASSWORD -> appKey,
-  KustoSinkOptions.KUSTO_AAD_AUTHORITY_ID -> authorityId)
-
+import org.apache.spark.sql.{SaveMode, SparkSession}
+
 df.write
   .format("com.microsoft.kusto.spark.datasource")
-  .options(conf)
-  .save()
-
+  .option(KustoSinkOptions.KUSTO_CLUSTER, cluster)
+  .option(KustoSinkOptions.KUSTO_DATABASE, database)
+  .option(KustoSinkOptions.KUSTO_TABLE, "Demo3_spark")
+  .option(KustoSinkOptions.KUSTO_AAD_CLIENT_ID, appId)
+  .option(KustoSinkOptions.KUSTO_AAD_CLIENT_PASSWORD, appKey)
+  .option(KustoSinkOptions.KUSTO_AAD_AUTHORITY_ID, authorityId)
+  .option(KustoSinkOptions.KUSTO_TABLE_CREATE_OPTIONS, "CreateIfNotExist")
+  .mode(SaveMode.Append)
+  .save()
 ```
 
 Or use the simplified syntax:
@@ -186,10 +179,9 @@ For more information on Azure Data Explorer principal roles, see [role-based aut
   .option(KustoSinkOptions.KUSTO_WRITE_ENABLE_ASYNC, "true") // Optional, better for streaming, harder to handle errors
   .trigger(Trigger.ProcessingTime(TimeUnit.SECONDS.toMillis(10))) // Sync this with the ingestionBatching policy of the database
   .start()
-
 ```
 
-## Spark source: Reading from Azure Data Explorer
+## Spark source: reading from Azure Data Explorer
 
 1. When reading small amounts of data, define the data query:
 
@@ -249,3 +241,9 @@ For more information on Azure Data Explorer principal roles, see [role-based aut
 
 display(dfFiltered)
 ```
+
+## Next steps
+
+* Learn more about the [Azure Data Explorer Spark Connector](https://github.com/Azure/azure-kusto-spark/tree/master/docs)
+* [Sample code](https://github.com/Azure/azure-kusto-spark/tree/master/samples/src/main)
+
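The rewritten sink example replaces the `conf` map and `.options(conf)` with one `.option` call per setting, and adds `KUSTO_TABLE_CREATE_OPTIONS` and `SaveMode.Append`. Spark's `DataFrameWriter.options` still accepts a `Map[String, String]`, so the map style remains a valid way to express the updated write. The sketch below assumes a Databricks Scala notebook (or the Scala REPL) with a connector release attached that still exposes the `KustoSinkOptions` constants used in the diff; `kustoSinkConf` and `writeToKusto` are hypothetical helper names, not part of the connector.

```scala
import com.microsoft.kusto.spark.datasink.KustoSinkOptions
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper (not part of the connector): collects the sink settings
// from the updated example into one Map so they can be reused across writes.
def kustoSinkConf(cluster: String, database: String, table: String,
                  appId: String, appKey: String, authorityId: String): Map[String, String] =
  Map(
    KustoSinkOptions.KUSTO_CLUSTER -> cluster,
    KustoSinkOptions.KUSTO_DATABASE -> database,
    KustoSinkOptions.KUSTO_TABLE -> table,
    KustoSinkOptions.KUSTO_AAD_CLIENT_ID -> appId,
    KustoSinkOptions.KUSTO_AAD_CLIENT_PASSWORD -> appKey,
    KustoSinkOptions.KUSTO_AAD_AUTHORITY_ID -> authorityId,
    // Create the target table if it does not exist yet
    KustoSinkOptions.KUSTO_TABLE_CREATE_OPTIONS -> "CreateIfNotExist")

// Hypothetical helper: the same batch write, passing the Map through
// DataFrameWriter.options and appending to any data already in the table.
def writeToKusto(df: DataFrame, conf: Map[String, String]): Unit =
  df.write
    .format("com.microsoft.kusto.spark.datasource")
    .options(conf)
    .mode(SaveMode.Append)
    .save()
```

For example, `writeToKusto(df, kustoSinkConf(cluster, database, "Demo3_spark", appId, appKey, authorityId))` would issue the same write as the updated example. Keeping the settings in a map makes it easy to share one authenticated configuration between several tables by changing only the table name.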