Commit 8f6fd6b

Merge pull request #103547 from yanancai/master
Add doc for HDInsight jar management best practice
2 parents 249f77f + 4453ee2 commit 8f6fd6b

2 files changed (81 additions, 0 deletions)

articles/hdinsight/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -346,6 +346,8 @@
 href: ./spark/apache-azure-spark-history-server.md
 - name: Enable caching with IO Cache
 href: ./spark/apache-spark-improve-performance-iocache.md
+- name: Manage Jar dependencies
+href: ./spark/manage-jar-dependency.md
 - name: Use notebooks with Apache Spark
 items:
 - name: Use a local Jupyter notebook
articles/hdinsight/spark/manage-jar-dependency.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
---
title: Manage JAR dependencies - Azure HDInsight
description: This article discusses best practices for managing Java Archive (JAR) dependencies for HDInsight applications.
author: hrasheed-msft
ms.author: hrasheed
ms.reviewer: jasonh
ms.custom: hdinsightactive
ms.service: hdinsight
ms.topic: conceptual
ms.date: 02/05/2020
---

# JAR dependency management best practices

Components installed on HDInsight clusters have dependencies on third-party libraries. Usually, these built-in components reference a specific version of common modules such as Guava. When you submit an application with its own dependencies, it can introduce a conflict between different versions of the same module. If the version that your application references is loaded into the classpath first, built-in components may throw exceptions because of version incompatibility. However, if built-in components inject their dependencies into the classpath first, your application may throw errors such as `NoSuchMethodError`.

To avoid version conflicts, consider shading your application's dependencies.

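Before you shade anything, it can help to confirm which JAR a conflicting class was actually loaded from. The following Scala sketch relies only on standard JDK reflection (`Class.getProtectionDomain`, `ProtectionDomain.getCodeSource`, and `CodeSource.getLocation`). The `ClassOrigin` object and its `locationOf` helper are hypothetical names used for illustration; in a job on the cluster, you would pass the class you suspect, for example a Guava class, instead of the placeholders shown here.

```scala
object ClassOrigin {
  /** Returns the JAR file or directory that `cls` was loaded from, if the JVM exposes it.
    * Classes loaded by the bootstrap class loader (for example java.lang.String)
    * have no code source, so the result is None for them.
    */
  def locationOf(cls: Class[_]): Option[String] =
    Option(cls.getProtectionDomain)
      .flatMap(pd => Option(pd.getCodeSource))
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)

  def main(args: Array[String]): Unit = {
    // Placeholders: on a cluster, replace these with the classes you suspect conflict,
    // for example classOf[com.google.common.base.Preconditions] (hypothetical usage).
    println(s"java.lang.String -> ${locationOf(classOf[String])}")
    println(s"ClassOrigin      -> ${locationOf(getClass)}")
  }
}
```

A location that points to a cluster library directory rather than to your application JAR suggests that the built-in component's version was loaded first.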
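
## What does package shading mean?
Shading provides a way to include and rename dependencies. It relocates the classes and rewrites the affected bytecode and resources to create a private copy of your dependencies. Because the relocated classes have different fully qualified names, your application's copy of a library no longer collides with the version that built-in components load.
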
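Conceptually, a relocation rule renames the package prefix of every fully qualified class name that matches it. The following Scala sketch is a simplified, hypothetical model of that renaming only (`RelocationModel` and `relocate` are illustrative names); a real shading tool also rewrites every bytecode reference and resource path that points at the renamed classes, and its exact matching rules may differ from this sketch.

```scala
object RelocationModel {
  /** Simplified model of a prefix relocation rule: a class name inside the
    * `pattern` package gets that prefix replaced by `shadedPattern`.
    * All other names are returned unchanged.
    */
  def relocate(className: String, pattern: String, shadedPattern: String): String =
    if (className == pattern || className.startsWith(pattern + "."))
      shadedPattern + className.substring(pattern.length)
    else
      className

  def main(args: Array[String]): Unit = {
    // Hypothetical rule: move Guava's package to a private name.
    val (pattern, shadedPattern) = ("com.google.common", "com.google.shaded.common")
    Seq("com.google.common.base.Preconditions", "org.apache.spark.SparkContext")
      .foreach(name => println(s"$name -> ${relocate(name, pattern, shadedPattern)}"))
  }
}
```

Because built-in components keep referring to `com.google.common.*` while the rewritten application refers to `com.google.shaded.common.*`, the JVM treats them as unrelated classes, so both versions can be loaded side by side.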

## How to shade a package?

### Use uber-jar
An uber-jar is a single JAR file that contains both the application and its dependencies. By default, the dependencies in an uber-jar are not shaded. In some cases, this can introduce a version conflict if other components or applications reference a different version of those libraries. To avoid this, you can build an uber-jar with some (or all) of its dependencies shaded.

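To confirm whether the dependencies in an uber-jar were actually shaded, you can list its entries. The following Scala sketch uses `java.util.jar.JarFile` and assumes Scala 2.13 or later (for `scala.jdk.CollectionConverters`). `JarInspector` is a hypothetical name, and the package paths in `main` assume a rule that relocates Guava's `com.google.common` package to `com.google.shaded.common`.

```scala
import java.io.File
import java.util.jar.JarFile
import scala.jdk.CollectionConverters._

object JarInspector {
  /** Lists the .class entries in `jar` whose path starts with `prefix`
    * (a path such as "com/google/common/", using '/' separators).
    */
  def classEntriesUnder(jar: File, prefix: String): List[String] = {
    val jarFile = new JarFile(jar)
    try {
      jarFile.entries().asScala
        .map(_.getName)
        .filter(name => name.startsWith(prefix) && name.endsWith(".class"))
        .toList // force the lazy iterator before the JAR is closed
        .sorted
    } finally jarFile.close()
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical usage: pass the path of your uber-jar as the first argument.
    val jar = new File(args(0))
    val unshaded = classEntriesUnder(jar, "com/google/common/")
    val shaded = classEntriesUnder(jar, "com/google/shaded/common/")
    println(s"Unshaded Guava classes: ${unshaded.size}")
    println(s"Shaded Guava classes:   ${shaded.size}")
  }
}
```

If shading worked, the unshaded count should be zero and the shaded count non-zero. The JDK's `jar tf` command gives a similar listing from the command line.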

### Shade package using Maven
Maven can build applications written in both Java and Scala. The maven-shade-plugin can help you create a shaded uber-jar.

The example below shows a `pom.xml` file that has been updated to shade a package by using maven-shade-plugin. The `<relocation>…</relocation>` section moves Guava's classes from package `com.google.common` into package `com.google.shaded.common` by moving the corresponding JAR file entries and rewriting the affected bytecode. (Guava's Maven group ID is `com.google.guava`, but its classes live in the `com.google.common` package, so that is the pattern to relocate.)

After changing `pom.xml`, run `mvn package` to build the shaded uber-jar.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <relocations>
                            <relocation>
                                <!-- Guava's classes are in com.google.common (com.google.guava is only its group ID) -->
                                <pattern>com.google.common</pattern>
                                <shadedPattern>com.google.shaded.common</shadedPattern>
                            </relocation>
                        </relocations>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
```
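
After deploying the shaded uber-jar, you can also check at runtime which class names resolve. The following Scala sketch uses `Class.forName` with its `initialize` parameter set to `false` so that static initializers don't run. `ShadeCheck` and `isLoadable` are hypothetical names, and the Guava class names in `main` assume a rule that relocates `com.google.common` to `com.google.shaded.common`.

```scala
object ShadeCheck {
  /** Returns true if a class with this fully qualified name can be found
    * by `loader`, without initializing the class.
    */
  def isLoadable(
      className: String,
      loader: ClassLoader = Thread.currentThread().getContextClassLoader
  ): Boolean =
    try {
      Class.forName(className, false, loader) // false = don't run static initializers
      true
    } catch {
      case _: ClassNotFoundException | _: LinkageError => false
    }

  def main(args: Array[String]): Unit = {
    // Illustrative names that assume the relocation rule described above the block.
    Seq("com.google.shaded.common.base.Preconditions", "com.google.common.base.Preconditions")
      .foreach(name => println(s"$name loadable: ${isLoadable(name)}"))
  }
}
```

On an HDInsight cluster, the unshaded name may still be loadable because built-in components bring their own Guava; the relocated name should resolve only from your application JAR.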
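
### Shade package using SBT
SBT is also a build tool for Scala and Java. SBT doesn't have a dedicated shade plugin like maven-shade-plugin; instead, the sbt-assembly plugin, which you add in `project/plugins.sbt`, supports shade rules that you configure in the `build.sbt` file.

For example, to shade Guava, whose classes live in the `com.google.common` package, add the following setting to the `build.sbt` file:

```scala
// Requires the sbt-assembly plugin (sbt 1.x slash syntax; older builds use `assemblyShadeRules in assembly`).
// "**" matches the rest of the class name; "@1" inserts what the first wildcard matched.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("com.google.common.**" -> "com.google.shaded.common.@1").inAll
)
```

Then run `sbt clean` and `sbt assembly` to build the shaded JAR file.
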
## Next steps

* [Use HDInsight IntelliJ Tools](https://docs.microsoft.com/azure/hdinsight/hadoop/hdinsight-tools-for-intellij-with-hortonworks-sandbox)

* [Create a Scala Maven application for Spark in IntelliJ](https://docs.microsoft.com/azure/hdinsight/spark/apache-spark-create-standalone-application)
