Commit 2cf5675

Merge pull request #276555 from sreekzz/hdi-msi-support
New Topic msi-support-to-access-azure-services
2 parents fb58eb1 + 34f9d04 commit 2cf5675

File tree

2 files changed

+366
-0
lines changed

articles/hdinsight/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -83,6 +83,8 @@ items:
8383
href: ./hdinsight-high-availability-case-study.md
8484
- name: Understand managed identities
8585
href: ./hdinsight-managed-identities.md
86+
- name: MSI Support to access Azure services
87+
href: ./msi-support-to-access-azure-services.md
8688
- name: Compare storage options
8789
href: hdinsight-hadoop-compare-storage-options.md
8890
items:
Lines changed: 364 additions & 0 deletions
@@ -0,0 +1,364 @@
1+
---
2+
title: MSI Support to Access Azure services
3+
description: Learn how to provide MSI Support to Access Azure services.
4+
ms.service: hdinsight
5+
ms.topic: how-to
6+
ms.custom: hdinsightactive
7+
ms.date: 07/09/2024
8+
---
9+
10+
# MSI Support to access Azure services
11+
12+
Currently, in an Azure HDInsight non-ESP cluster, user jobs access Azure resources such as SQL Database, Cosmos DB, Event Hubs (EH), Key Vault (KV), and Kusto by using either a username and password or an MSI certificate key. This approach isn't in line with Microsoft security guidelines.
13+
14+
This article explains the HDInsight interface and code details to fetch OAuth tokens in a non-ESP cluster.
15+
16+
## Prerequisites
17+
18+
* This feature is available in the latest HDInsight 5.1, 5.0, and 4.0 versions. Make sure that you created or re-created your cluster on one of these versions.
19+
* The HDInsight cluster must use Azure Data Lake Storage Gen2 (ADLS Gen2) as its primary storage, which enables MSI-based access to that storage. The same MSI is used for all resource access from jobs. Ensure that this MSI has the IAM permissions required to access the Azure resources.
20+
* The IMDS endpoint doesn't work on HDInsight worker nodes. Access tokens can be fetched only by using this HDInsight utility.
21+
22+
Two Java client implementations are provided to fetch the access token.
23+
24+
* Option 1: HDInsight utility and API usage to fetch access token.
25+
* Option 2: HDInsight utility and TokenCredential implementation to fetch access token.
26+
27+
> [!NOTE]
28+
> By default, the scope is `.default`. A mechanism in the utility API to pass a user-supplied scope argument will be provided in the future.
29+
30+
## How to download the utility jar from Maven Central
31+
32+
Follow these steps to download client JARs from Maven Central.
33+
34+
The steps below resolve the JAR directly from Maven Central in a Maven build.
35+
36+
1. Add Maven Central as one of your repositories to resolve Maven dependencies. Skip this step if it's already added.
37+
38+
Add the following code snippet to the `repositories` section of your pom.xml file:
39+
40+
```
41+
<repository>
42+
<id>central</id>
43+
<url>https://repo.maven.apache.org/maven2/</url>
44+
<releases>
45+
<enabled>true</enabled>
46+
</releases>
47+
<snapshots>
48+
<enabled>true</enabled>
49+
</snapshots>
50+
</repository>
51+
```
52+
53+
1. Add the following `dependency` entry for the HDInsight OAuth client utility library to your pom.xml:
54+
55+
```
56+
<dependency>
57+
<groupId>com.microsoft.azure.hdinsight</groupId>
58+
<artifactId>hdi-oauth-token-utils</artifactId>
59+
<version>1.0.0</version>
60+
</dependency>
61+
```
62+
63+
> [!IMPORTANT]
64+
>
65+
> Make sure the following items are in the class path.
66+
> - Hadoop's `core-site.xml`
67+
> - All the client jars from this cluster location `/usr/hdp/<hdi-version>/hadoop/client/*`
68+
> - `azure-core-1.49.0.jar`, `okhttp3-4.9.3`, and their transitive dependency JARs.
69+
70+
### Structure of access token
71+
72+
The access token has the following structure.
73+
74+
```
75+
package com.azure.core.credential;
76+
import java.time.OffsetDateTime;
77+
78+
/** Represents an immutable access token with a token string and an expiration time
79+
* in date format. By default, the token expires after 24 hours.
80+
*/
81+
public class AccessToken {
82+
83+
public String getToken();
84+
85+
public OffsetDateTime getExpiresAt();
86+
}
87+
```
88+
89+
90+
## Option 1 - HDInsight utility and API usage to fetch access token
91+
92+
HDInsight provides a convenience Java utility class that fetches an MSI access token for a given target resource URI, which can be Event Hubs, Key Vault, Kusto, SQL Database, Cosmos DB, and so on.
93+
94+
### How to use the API
95+
96+
To fetch the token, you can invoke the API in your job application code.
97+
98+
```
99+
import com.microsoft.azure.hdinsight.oauthtoken.utils.HdiIdentityTokenServiceUtils;
100+
import com.azure.core.credential.AccessToken;
101+
102+
// uri can be EH, Kusto etc.
103+
// By default, the scope is ".default".
104+
// We will provide a mechanism to take user supplied scope, in future.
105+
String msiResourceUri = "https://vault.azure.net/";
106+
HdiIdentityTokenServiceUtils tokenUtils = new HdiIdentityTokenServiceUtils();
107+
AccessToken token = tokenUtils.getAccessToken(msiResourceUri);
108+
```
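Once the token string is available, one common way to use it is as an OAuth 2.0 bearer token on an HTTP request. The following is a minimal sketch under stated assumptions: the class name `BearerRequestSketch`, the method `withBearerToken`, and the URL are hypothetical and not part of the HDInsight utility, and the example only builds the request with the JDK HTTP client (Java 11 or later) without sending it.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper, not part of the HDInsight utility. */
final class BearerRequestSketch {

    /** Builds a GET request that carries the token in the standard Authorization header. */
    static HttpRequest withBearerToken(String url, String tokenValue) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Bearer " + tokenValue)
                .GET()
                .build();
    }
}
```

In a job, `tokenValue` would presumably come from `token.getToken()` in the snippet above. When an Azure SDK client is available for the target service, the `TokenCredential` approach in Option 2 avoids handling the header manually.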
109+
110+
## Option 2 - HDInsight Utility, TokenCredential implementation to fetch access token
111+
112+
The utility provides the `HdiIdentityTokenCredential` Java class, which implements the standard `com.azure.core.credential.TokenCredential` interface.
113+
114+
> [!NOTE]
115+
> The HdiIdentityTokenCredential class can be used with various Azure SDK client libraries to authenticate requests and access Azure services without manual access token management.
116+
117+
### Examples
118+
119+
The following HDInsight OAuth utility examples can be used in job applications to fetch access tokens for a given target resource URI:
120+
121+
**If the client is a Key Vault**
122+
123+
For Azure Key Vault, the SecretClient instance uses a TokenCredential to authenticate and fetch the access token:
124+
125+
```
126+
import com.azure.core.credential.TokenCredential;
127+
import com.azure.security.keyvault.secrets.SecretClient;
128+
import com.azure.security.keyvault.secrets.SecretClientBuilder;
import com.azure.security.keyvault.secrets.models.KeyVaultSecret;
129+
import com.microsoft.azure.hdinsight.oauthtoken.credential.HdiIdentityTokenCredential;
130+
131+
// Replace <resource-uri> with your Key Vault URI.
132+
TokenCredential hdiTokenCredential = new HdiIdentityTokenCredential("<resource-uri>");
133+
134+
// Create a SecretClient to call the service.
135+
SecretClient secretClient = new SecretClientBuilder()
136+
.vaultUrl("<your-key-vault-url>") // Replace with your Key Vault URL.
137+
.credential(hdiTokenCredential) // Add HDI identity token credential.
138+
.buildClient();
139+
140+
// Retrieve a secret from the Key Vault.
141+
// Replace with your secret name.
142+
KeyVaultSecret secret = secretClient.getSecret("<your-secret-name>");
143+
```
144+
145+
**If the client is an Event Hub**
146+
147+
The Azure Event Hubs client doesn't fetch an access token directly. It uses a TokenCredential to authenticate, and this credential handles fetching the access token.
148+
149+
```
150+
import com.azure.messaging.eventhubs.EventHubClientBuilder;
151+
import com.azure.messaging.eventhubs.EventHubProducerClient;
152+
import com.azure.core.credential.TokenCredential;
153+
import com.microsoft.azure.hdinsight.oauthtoken.credential.HdiIdentityTokenCredential;
154+
HdiIdentityTokenCredential hdiTokenCredential = new HdiIdentityTokenCredential("https://eventhubs.azure.net");
155+
// Create a producer client
156+
EventHubProducerClient producer = new EventHubClientBuilder()
157+
.credential("<fully-qualified-namespace>", "<event-hub-name>", hdiTokenCredential)
158+
.buildProducerClient();
159+
160+
// Use the producer client ....
161+
```
162+
163+
164+
**If the client is an Azure SQL Database**
165+
166+
The following Azure SQL Database example doesn't fetch an access token directly.
167+
168+
Connect using an access token callback: the following example demonstrates implementing and setting the `accessToken` callback.
169+
170+
```
171+
package com.microsoft.azure.hdinsight.oauthtoken;
172+
173+
import com.azure.core.credential.AccessToken;
174+
import com.microsoft.azure.hdinsight.oauthtoken.utils.HdiIdentityTokenServiceUtils;
175+
import com.microsoft.sqlserver.jdbc.SQLServerAccessTokenCallback;
176+
import com.microsoft.sqlserver.jdbc.SqlAuthenticationToken;
177+
178+
public class HdiSQLAccessTokenCallback implements SQLServerAccessTokenCallback {
179+
180+
@Override
181+
public SqlAuthenticationToken getAccessToken(String spn, String stsurl) {
182+
try {
183+
HdiIdentityTokenServiceUtils provider = new HdiIdentityTokenServiceUtils();
184+
AccessToken token = provider.getAccessToken("https://database.windows.net/");
185+
// getExpiresAt() returns an OffsetDateTime, so convert it to epoch milliseconds.
return new SqlAuthenticationToken(token.getToken(), token.getExpiresAt().toInstant().toEpochMilli());
186+
} catch (Exception e) {
187+
// handle exception...
188+
return null;
189+
}
190+
}
191+
}
192+
193+
194+
195+
package com.microsoft.azure.hdinsight.oauthtoken;
196+
197+
import java.sql.DriverManager;
198+
199+
public class HdiTokenClassBasedConnectionWithDriver {
200+
201+
public static void main(String[] args) throws Exception {
202+
203+
// Below is the sample code to use hdi sql callback.
204+
// Replace <dbserver> with your server name and <dbname> with your database name.
205+
String connectionUrl = "jdbc:sqlserver://<dbserver>.database.windows.net;"
206+
+ "database=<dbname>;"
207+
+ "accessTokenCallbackClass=com.microsoft.azure.hdinsight.oauthtoken.HdiSQLAccessTokenCallback;"
208+
+ "encrypt=true;"
209+
+ "trustServerCertificate=false;"
210+
+ "loginTimeout=30;";
211+
212+
DriverManager.getConnection(connectionUrl);
213+
214+
}
215+
216+
}
217+
218+
package com.microsoft.azure.hdinsight.oauthtoken;
219+
220+
import com.microsoft.azure.hdinsight.oauthtoken.HdiSQLAccessTokenCallback;
221+
import com.microsoft.sqlserver.jdbc.SQLServerDataSource;
222+
import java.sql.Connection;
223+
224+
public class HdiTokenClassBasedConnectionWithDS {
225+
226+
public static void main(String[] args) throws Exception {
227+
228+
HdiSQLAccessTokenCallback callback = new HdiSQLAccessTokenCallback();
229+
230+
SQLServerDataSource ds = new SQLServerDataSource();
231+
ds.setServerName("<db-server>"); // Replace <db-server> with your server name.
232+
ds.setDatabaseName("<dbname>"); // Replace <dbname> with your database name.
233+
ds.setAccessTokenCallback(callback);
234+
235+
ds.getConnection();
236+
}
237+
}
238+
```
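The access token structure shown earlier exposes the expiry as an `OffsetDateTime`, which has no `getTime()` method, so a callback that needs a numeric expiry has to convert it. The conversion can be sketched in isolation as follows. The class name `ExpiryConversionSketch` is hypothetical, and the sketch assumes the consumer expects the expiry as milliseconds since the Unix epoch.

```java
import java.time.OffsetDateTime;

/** Hypothetical helper illustrating the expiry conversion only. */
final class ExpiryConversionSketch {

    /** Converts an offset-aware expiry timestamp to milliseconds since the Unix epoch. */
    static long toEpochMillis(OffsetDateTime expiresAt) {
        // toInstant() normalizes the offset, so the result is independent of time zone.
        return expiresAt.toInstant().toEpochMilli();
    }
}
```

Because the conversion goes through `Instant`, two timestamps that denote the same moment with different offsets produce the same value.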
239+
240+
241+
242+
**If the client is Kusto**
243+
244+
The following Kusto example doesn't fetch an access token directly.
245+
246+
Connect using a token provider callback:
247+
248+
The following example demonstrates an access token callback provider:
249+
250+
```
251+
public void createConnection () {
252+
253+
final String clusterUrl = "https://xyz.eastus.kusto.windows.net";
254+
255+
ConnectionStringBuilder conStrBuilder = ConnectionStringBuilder.createWithAadTokenProviderAuthentication(clusterUrl, new Callable<String>() {
256+
257+
public String call() throws Exception {
258+
259+
// Call HDI util class with scope. This returns the AT and from that get token string and return.
260+
// AccessToken contains expiry time and user can cache the token once acquired and call for a new one
261+
// if it is about to expire (Say, <= 30mins for expiry).
262+
HdiIdentityTokenServiceUtils hdiUtil = new HdiIdentityTokenServiceUtils();
263+
264+
AccessToken token = hdiUtil.getAccessToken(clusterUrl);
265+
266+
return token.getToken();
267+
268+
}
269+
270+
});
271+
}
272+
```
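The comment in the callback above suggests caching the token and requesting a new one only when it's close to expiry. A minimal sketch of that idea follows, using only JDK types. The names `CachedToken` and `RefreshingTokenCache` are hypothetical and not part of the HDInsight utility; `CachedToken` merely mirrors the two accessors of `AccessToken` so that the example stays self-contained, and the refresh margin is an assumption chosen by the caller.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.util.function.Supplier;

/** Hypothetical holder mirroring AccessToken's token string and expiry time. */
final class CachedToken {
    final String value;
    final OffsetDateTime expiresAt;

    CachedToken(String value, OffsetDateTime expiresAt) {
        this.value = value;
        this.expiresAt = expiresAt;
    }
}

/** Hypothetical cache that reuses a token until it is within refreshMargin of expiry. */
final class RefreshingTokenCache {
    private final Supplier<CachedToken> fetcher; // would wrap the HDInsight utility call
    private final Duration refreshMargin;        // for example, Duration.ofMinutes(30)
    private CachedToken current;

    RefreshingTokenCache(Supplier<CachedToken> fetcher, Duration refreshMargin) {
        this.fetcher = fetcher;
        this.refreshMargin = refreshMargin;
    }

    /** Returns the cached token string, fetching a new token if none is cached or expiry is near. */
    synchronized String getToken(OffsetDateTime now) {
        // Refresh when the remaining lifetime is less than or equal to the margin.
        if (current == null || !now.plus(refreshMargin).isBefore(current.expiresAt)) {
            current = fetcher.get();
        }
        return current.value;
    }
}
```

Inside `call()`, the fetcher could wrap `hdiUtil.getAccessToken(clusterUrl)` and copy `getToken()` and `getExpiresAt()` into the holder, with `OffsetDateTime.now()` passed as the current time. Passing the time explicitly keeps the cache easy to test.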
273+
274+
**Connect using pre-fetched Access Token:**
275+
276+
Fetch the access token explicitly and pass it as an option.
277+
278+
```
279+
String targetResourceUri = "https://<my-kusto-cluster>";
280+
HdiIdentityTokenServiceUtils tokenUtils = new HdiIdentityTokenServiceUtils();
281+
AccessToken token = tokenUtils.getAccessToken(targetResourceUri);
282+
283+
df.write
284+
.format("com.microsoft.kusto.spark.datasource")
285+
.option(KustoSinkOptions.KUSTO_CLUSTER, "MyCluster")
286+
.option(KustoSinkOptions.KUSTO_DATABASE, "MyDatabase")
287+
.option(KustoSinkOptions.KUSTO_TABLE, "MyTable")
288+
.option(KustoSinkOptions.KUSTO_ACCESS_TOKEN, token.getToken())
289+
.option(KustoOptions., "MyTable")
290+
.mode(SaveMode.Append)
291+
.save()
292+
```
293+
294+
> [!NOTE]
295+
> HdiIdentityTokenCredential class can be used in combination with various Azure SDK client libraries to authenticate requests and access Azure services without the need to manage access tokens manually.
296+
297+
### Troubleshooting
298+
299+
After the **HdiIdentityTokenCredential** utility is integrated into a Spark job, the following exception occurs when the token is accessed at runtime (during job execution).
300+
301+
```
302+
User class threw exception: java.lang.NoSuchFieldError: Companion
303+
at okhttp3.internal.Util.<clinit>(Util.kt:70)
304+
at okhttp3.internal.concurrent.TaskRunner.<clinit>(TaskRunner.kt:309)
305+
at okhttp3.ConnectionPool.<init>(ConnectionPool.kt:41)
306+
at okhttp3.ConnectionPool.<init>(ConnectionPool.kt:47)
307+
at okhttp3.OkHttpClient$Builder.<init>(OkHttpClient.kt:471)
308+
at com.microsoft.azure.hdinsight.oauthtoken.utils.HdiIdentityTokenServiceUtils.getAccessToken(HdiIdentityTokenServiceUtils.java:142)
309+
at com.microsoft.azure.hdinsight.oauthtoken.credential.HdiIdentityTokenCredential.getTokenSync(HdiIdentityTokenCredential.java:83)
310+
```
311+
**Answer:**
312+
313+
The following is the Maven dependency tree of the `hdi-oauth-util` library. Make sure that these JARs are available at runtime (in the job container).
314+
315+
```
316+
[INFO] +- com.azure:azure-core:jar:1.49.0:compile
317+
[INFO] | +- com.azure:azure-json:jar:1.1.0:compile
318+
[INFO] | +- com.azure:azure-xml:jar:1.0.0:compile
319+
[INFO] | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.13.5:compile
320+
[INFO] | +- com.fasterxml.jackson.core:jackson-core:jar:2.13.5:compile
321+
[INFO] | +- com.fasterxml.jackson.datatype:jackson-datatype-jsr310:jar:2.13.5:compile
322+
[INFO] | \- io.projectreactor:reactor-core:jar:3.4.36:compile
323+
[INFO] | \- org.reactivestreams:reactive-streams:jar:1.0.4:compile
324+
[INFO] \- com.squareup.okhttp3:okhttp:jar:4.9.3:compile
325+
[INFO] +- com.squareup.okio:okio:jar:2.8.0:compile
326+
[INFO] | \- org.jetbrains.kotlin:kotlin-stdlib-common:jar:1.4.0:compile
327+
[INFO] \- org.jetbrains.kotlin:kotlin-stdlib:jar:1.4.10:compile
328+
```
329+
330+
When you build the Spark uber JAR, make sure that these JARs are shaded and included in the uber JAR. You can refer to the following plugin configuration.
331+
332+
```xml
333+
<plugin>
334+
<groupId>org.apache.maven.plugins</groupId>
335+
<artifactId>maven-shade-plugin</artifactId>
336+
<version>${maven.plugin.shade.version}</version>
337+
<configuration>
338+
<createDependencyReducedPom>false</createDependencyReducedPom>
339+
<relocations>
340+
<relocation>
341+
<pattern>okio</pattern>
342+
<shadedPattern>com.shaded.okio</shadedPattern>
343+
</relocation>
344+
<relocation>
345+
<pattern>okhttp</pattern>
346+
<shadedPattern>com.shaded.okhttp</shadedPattern>
347+
</relocation>
348+
<relocation>
349+
<pattern>okhttp3</pattern>
350+
<shadedPattern>com.shaded.okhttp3</shadedPattern>
351+
</relocation>
352+
<relocation>
353+
<pattern>kotlin</pattern>
354+
<shadedPattern>com.shaded.kotlin</shadedPattern>
355+
</relocation>
356+
<relocation>
357+
<pattern>com.fasterxml.jackson</pattern>
358+
<shadedPattern>com.shaded.com.fasterxml.jackson</shadedPattern>
359+
</relocation>
360+
<relocation>
361+
<pattern>com.azure</pattern>
362+
<shadedPattern>com.shaded.com.azure</shadedPattern>
363+
</relocation>
</relocations>
</configuration>
</plugin>
364+
```
