Commit f6a35c5
Add default scala template to databricks cli (#3906)
## Changes

This adds a new default Scala DABs template, as a follow-up to databricks/bundle-examples#119.

## Why

This provides an off-the-shelf template for customers to start using the Scala Databricks Connect client to develop Scala jobs on Databricks.

## Tests

Manually tested the user flow, both interactively and as a job workload, for standard and serverless compute.
1 parent cc9703f commit f6a35c5

File tree

31 files changed: +819 −0 lines changed

NEXT_CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,5 +9,6 @@
 ### Dependency updates

 ### Bundles
+* Add `default-scala` template for Scala projects with SBT build configuration and example code ([#3906](https://github.com/databricks/cli/pull/3906))

 ### API Changes

acceptance/bundle/help/bundle-init/output.txt

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ TEMPLATE_PATH optionally specifies which template to use. It can be one of the following:
 - default-python: The default Python template for Notebooks and Lakeflow
 - default-sql: The default SQL template for .sql files that run with Databricks SQL
 - default-minimal: The minimal template, for advanced users
+- default-scala: The default Scala template for JAR jobs
 - dbt-sql: The dbt SQL template (databricks.com/blog/delivering-cost-effective-data-real-time-dbt-and-databricks)
 - mlops-stacks: The Databricks MLOps Stacks template (github.com/databricks/mlops-stacks)
 - pydabs: A variant of the 'default-python' template that defines resources in Python instead of YAML
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
  "project_name": "my_default_scala",
  "compute_type": "serverless",
  "artifacts_dest_path": "/Volumes/test-folder",
  "default_catalog": "main",
  "personal_schemas": "yes, use a schema based on the current user name during development"
}

acceptance/bundle/templates/default-scala/out.test.toml

Lines changed: 5 additions & 0 deletions
Some generated files are not rendered by default.
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@

>>> [CLI] bundle init default-scala --config-file ./input.json --output-dir output

Welcome to the default-scala template for Databricks Asset Bundles!

A workspace was selected based on your current profile. For information about how to change this, see https://docs.databricks.com/dev-tools/cli/profiles.html.
workspace_host: [DATABRICKS_URL]
✨ Successfully initialized template
Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
# my_default_scala

The 'my_default_scala' project was generated by using the default-scala template.

## Getting started

1. Install the Databricks CLI from https://docs.databricks.com/dev-tools/cli/install.html. The version must be v0.241.0 or later.

2. Authenticate to your Databricks workspace (if you have not done so already):
   ```
   $ databricks configure
   ```

3. To deploy a development copy of this project, type:
   ```
   $ databricks bundle deploy --target dev
   ```
   (Note that "dev" is the default target, so the `--target` parameter
   is optional here.)

   This deploys everything that's defined for this project.
   For example, the default template would deploy a job called
   `[dev yourname] my_default_scala_job` to your workspace.
   You can find that job by opening your workspace and clicking on **Workflows**.

4. Similarly, to deploy a production copy, type:
   ```
   $ databricks bundle deploy --target prod
   ```

5. To run a job, use the "run" command:
   ```
   $ databricks bundle run
   ```

6. Optionally, install developer tools such as the Databricks extension for Visual Studio Code from
   https://docs.databricks.com/dev-tools/vscode-ext.html.

7. For documentation on the Databricks Asset Bundles format used
   for this project, and for CI/CD configuration, see
   https://docs.databricks.com/dev-tools/bundles/index.html.

## Local development loop

### Prerequisites

Install the following tools:

- [sbt](https://www.scala-sbt.org/) v1.10.2 or later
- Java 17

### Running via sbt

1. In the terminal, navigate to the project's root directory. This is the directory where the `build.sbt` file is located.
2. Execute the project's default `Main` class by running `sbt run`.

### IntelliJ setup

Install the latest [IntelliJ IDEA](https://www.jetbrains.com/idea/) IDE; both the Community and Professional Editions work.
Install the [Scala plugin](https://plugins.jetbrains.com/plugin/1347-scala) from the JetBrains marketplace.

1. Import the current directory (the one containing `build.sbt`) into IntelliJ.
2. Choose the correct Java version: in IntelliJ, go to File -> Project Structure -> SDKs.
   Then go to Run -> Edit Configurations and set the version to Java 17 from the dropdown.
3. You should now be able to run the code directly in the IDE via the ▶️ option.

#### JVM settings

If you see the following error message,

```
Failed to initialize MemoryUtil. You must start Java with --add-opens=java.base/java.nio=ALL-UNNAMED
```

add the following to your JVM settings: `--add-opens=java.base/java.nio=org.apache.arrow.memory.core,ALL-UNNAMED`.

See the IntelliJ instructions on how to [configure VM settings for a specific run
configuration](https://www.jetbrains.com/help/idea/run-debug-configuration-java-application.html#more_options)
or [configure them everywhere in your IDE](https://www.jetbrains.com/help/ide-services/configure-settings-via-profiles.html).

### Unit tests

The project comes with a sample set of unit tests in `NycTaxiSpec.scala`, using the ScalaTest
framework.

Run the tests either directly in the IntelliJ IDE by clicking ▶️ on the tests, or via sbt
by running `sbt test`.

## Customizations

### Job configuration

This project uses serverless compute. No cluster setup is required.
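For context, here is a minimal sketch of the kind of `Main` class the README's `sbt run` step assumes, built on the Databricks Connect Scala API. The package name `com.examples`, the object name `Main`, and the `samples.nyctaxi.trips` table are illustrative assumptions; the template's actual example code may differ.

```scala
package com.examples

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    // Connect using the current Databricks profile or environment variables.
    val spark: SparkSession = DatabricksSession.builder().getOrCreate()

    // Illustrative query against a sample dataset (hypothetical choice of table).
    val trips = spark.read.table("samples.nyctaxi.trips")
    trips.select("tpep_pickup_datetime", "fare_amount").limit(5).show()
  }
}
```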
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
// This file is used to build the sbt project with Databricks Connect.
// It also includes the instructions on how to create the jar uploaded via databricks bundle.
scalaVersion := "2.13.16"

name := "my_default_scala"
organization := "com.examples"
version := "0.1"

libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.+"
libraryDependencies += "org.slf4j" % "slf4j-simple" % "2.0.16"

libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.19" % Test

assembly / assemblyOption ~= { _.withIncludeScala(false) }
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  cp filter { _.data.getName.matches("scala-.*") } // remove Scala libraries from the uber jar
}

assemblyMergeStrategy := {
  case _ => MergeStrategy.preferProject
}

// To run with new JVM options, a fork is required; otherwise the run reuses the sbt process's options.
fork := true
javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"

// Ensure logs are written to System.out by default and not System.err.
javaOptions += "-Dorg.slf4j.simpleLogger.logFile=System.out"
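The build declares ScalaTest 3.2.19 in the Test scope, which `sbt test` picks up. As a rough sketch, a spec in the style the template's `NycTaxiSpec.scala` suggests might look like the following; the class name is taken from the README, but the assertion itself is a hypothetical stand-in, not the shipped test.

```scala
package com.examples

import org.scalatest.funsuite.AnyFunSuite

// Hypothetical spec: the template's real NycTaxiSpec.scala may test different behavior.
class NycTaxiSpec extends AnyFunSuite {
  test("fare amounts are non-negative") {
    val fares = Seq(12.5, 3.0, 0.0)
    assert(fares.forall(_ >= 0.0))
  }
}
```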
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
# This is a Databricks asset bundle definition for my_default_scala.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: my_default_scala
  uuid: [UUID]

include:
  - resources/*.yml

variables:
  catalog:
    description: The catalog to use
  schema:
    description: The schema to use

workspace:
  host: [DATABRICKS_URL]
  artifact_path: /Volumes/test-folder/${bundle.name}/${bundle.target}/${workspace.current_user.short_name}

artifacts:
  default:
    type: jar
    build: sbt package && sbt assembly
    path: .
    files:
      - source: ./target/scala-2.13/my_default_scala-assembly-0.1.jar

targets:
  dev:
    # The default target uses 'mode: development' to create a development copy.
    # - Deployed resources get prefixed with '[dev my_user_name]'
    # - Any job schedules and triggers are paused by default.
    # See also https://docs.databricks.com/dev-tools/bundles/deployment-modes.html.
    mode: development
    default: true
    workspace:
      host: [DATABRICKS_URL]
    variables:
      catalog: main
      schema: ${workspace.current_user.short_name}

  prod:
    mode: production
    workspace:
      host: [DATABRICKS_URL]
      # We explicitly deploy to /Workspace/Users/[USERNAME] to make sure we only have a single copy.
      root_path: /Workspace/Users/[USERNAME]/.bundle/${bundle.name}/${bundle.target}
    permissions:
      - user_name: [USERNAME]
        level: CAN_MANAGE
    variables:
      catalog: main
      schema: default
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
.databricks/
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
// The project folder is used to store sbt-specific project files.
// This file is used to define the plugins that are used in the sbt project.
// In particular, this includes the assembly plugin to generate an uber jar.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
