## First Principles
We must use the [Databricks SDK for Python](https://databricks-sdk-py.readthedocs.io/) in this project: it is the toolkit through which all interactions with the Databricks platform go.
If something doesn't naturally belong to the `WorkspaceClient`, it needs to go through a "mixin" process before it can be used with the SDK.
Imagine the `WorkspaceClient` as the main control center, and the "mixin" process as a way to adapt other things to work with it.
You can find an example of this approach in `StatementExecutionExt`, which shows how to extend the `WorkspaceClient` with functionality that is not (yet) part of the SDK. It can help you understand how mixins work in practice.
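
As a rough sketch of the idea (the class and method below are illustrative, not part of the SDK), a mixin wraps a `WorkspaceClient` and layers convenience methods on top of the raw API:

```python
from databricks.sdk import WorkspaceClient


class ClustersExt:
    """Illustrative mixin: adds a convenience helper on top of the SDK client."""

    def __init__(self, ws: WorkspaceClient):
        self._ws = ws

    def cluster_names(self) -> list[str]:
        # Delegate to the underlying SDK client, then post-process the result.
        return [c.cluster_name for c in self._ws.clusters.list()]
```
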
## Code Organization
When you're writing code, make sure to divide it into two main parts: **Components for API Interaction** and **Components for Business Logic**.
Components for API interaction should only deal with talking to external systems through APIs. They are usually covered by integration tests, and keeping them thin makes them simpler to mock.
Business logic components handle the actual logic of your application, like calculations, data processing, and decision-making.

_Keep API components simple._ In the components responsible for API interactions, try to keep things as straightforward as possible.
Don't overload them with lots of complex logic; instead, focus on making API calls and handling the data from those calls.

_Inject Business Logic._ If you need to use business logic in your API-calling components, don't build it directly there.
Instead, inject (or pass in) the business logic components into your API components. This way, you can keep your API components
clean and flexible, while the business logic remains separate and reusable.
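
Here is a minimal sketch of that separation (all class names are illustrative): the business-logic class is pure Python, and the API-calling class receives it through its constructor:

```python
from databricks.sdk import WorkspaceClient


class SparkVersionPolicy:
    """Business logic: pure and trivially unit-testable."""

    def is_outdated(self, spark_version: str) -> bool:
        return spark_version.startswith("12.")


class ClusterCrawler:
    """API-calling component: fetches data and delegates all decisions."""

    def __init__(self, ws: WorkspaceClient, policy: SparkVersionPolicy):
        self._ws = ws
        self._policy = policy

    def outdated_cluster_ids(self) -> list[str]:
        return [
            c.cluster_id
            for c in self._ws.clusters.list()
            if self._policy.is_outdated(c.spark_version)
        ]
```
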
_Test your Business Logic._ It's essential to thoroughly test your business logic to ensure it works correctly. When writing
unit tests, avoid making actual API calls - unit tests are executed for every pull request and **_take seconds to complete_**.
For calling any external services, including Databricks Connect, Databricks Platform, or even Apache Spark, unit tests have
to use "mocks" or fake versions of the APIs to simulate their behavior. This makes it easier to test your code and catch any
issues without relying on external systems. Focus on testing the edge cases of the logic, especially the scenarios where
things may fail. See [this example](https://github.com/databricks/databricks-sdk-py/pull/295) as a reference of an extensive
unit test coverage suite and the clear difference between _unit tests_ and _integration tests_.
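
Continuing the illustrative sketch from the previous section, the crawler can be unit-tested with a mocked `WorkspaceClient`, so no network calls are made:

```python
from unittest.mock import create_autospec

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import ClusterDetails


def test_outdated_clusters_are_flagged():
    ws = create_autospec(WorkspaceClient)  # fake client: no real API calls
    ws.clusters.list.return_value = [
        ClusterDetails(cluster_id="a", spark_version="12.2.x-scala2.12"),
        ClusterDetails(cluster_id="b", spark_version="13.3.x-scala2.12"),
    ]
    crawler = ClusterCrawler(ws, SparkVersionPolicy())
    assert crawler.outdated_cluster_ids() == ["a"]
```
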
## Integration Testing Infrastructure
All new code additions must be accompanied by integration tests. Integration tests help us validate that various parts of
our application work correctly when they interact with each other or external systems. This practice ensures that our
software _**functions as a cohesive whole**_. Integration tests run every night and take approximately 15 minutes
for the whole test suite to complete.

For integration tests, we encourage using predefined test infrastructure provided through environment variables.
These fixtures are set up in advance to simulate specific scenarios, making it easier to test different use cases. These
predefined fixtures enhance test consistency and reliability and point to the real infrastructure used by integration
testing. See [Unified Authentication Documentation](https://databricks-sdk-py.readthedocs.io/en/latest/authentication.html)
for the latest reference of environment variables related to authentication.

- `CLOUD_ENV`: This environment variable specifies the cloud environment where Databricks is hosted. The values typically indicate the cloud provider being used, such as "aws" for Amazon Web Services and "azure" for Microsoft Azure.
- `DATABRICKS_ACCOUNT_ID`: This variable stores the unique identifier for your Databricks account.
- `DATABRICKS_HOST`: This variable contains the URL of your Databricks workspace. It is the web address you use to access your Databricks environment and typically looks like "https://dbc-....cloud.databricks.com".
- `TEST_DEFAULT_CLUSTER_ID`: This variable holds the identifier for the default cluster used in testing. The value resembles a unique cluster ID, like "0824-163015-tdtagl1h".
- `TEST_DEFAULT_WAREHOUSE_DATASOURCE_ID`: This environment variable stores the identifier for the default warehouse data source used in testing. The value is a unique identifier for the data source, such as "3c0fef12-ff6c-...".
- `TEST_DEFAULT_WAREHOUSE_ID`: This variable contains the identifier for the default warehouse used in testing. The value resembles a unique warehouse ID, like "49134b80d2...".
- `TEST_INSTANCE_POOL_ID`: This environment variable stores the identifier for the instance pool used in testing. You must use existing instance pools as much as possible to reduce cluster startup time and cost. The value is a unique instance pool ID, like "0824-113319-...".
- `TEST_LEGACY_TABLE_ACL_CLUSTER_ID`: This variable holds the identifier for the cluster used in testing legacy table access control. The value is a unique cluster ID, like "0824-161440-...".
- `TEST_USER_ISOLATION_CLUSTER_ID`: This environment variable contains the identifier for the cluster used in testing user isolation. The value is a unique cluster ID, like "0825-164947-...".
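
As a small sketch of how a test consumes this infrastructure (the test itself is illustrative): the `WorkspaceClient` picks up `DATABRICKS_HOST` and the other authentication variables automatically, while test-specific variables are read explicitly:

```python
import os

import pytest

from databricks.sdk import WorkspaceClient


def test_default_cluster_exists():
    cluster_id = os.environ.get("TEST_DEFAULT_CLUSTER_ID")
    if not cluster_id:
        pytest.skip("TEST_DEFAULT_CLUSTER_ID is not set")
    ws = WorkspaceClient()  # unified authentication reads DATABRICKS_HOST etc.
    assert ws.clusters.get(cluster_id).cluster_id == cluster_id
```
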
We encourage you to leverage the extensive set of [pytest fixtures](https://docs.pytest.org/en/latest/explanation/fixtures.html#about-fixtures).
These fixtures follow a consistent naming pattern, starting with "make_". These are functions that can be called multiple
times to _create and clean up objects as needed_ for your tests. Reusing these fixtures helps maintain clean and consistent
test setups across the codebase. In cases where your tests require unique fixture setups, it's crucial to keep the wall
clock time of fixture initialization under one second. Fast fixture initialization ensures that tests run quickly, reducing
development cycle times and allowing for faster feedback during development.

For example, here is how a test can combine several of these `make_*` fixtures (the secret-scope fixtures shown are representative of the pattern):

```python
from databricks.sdk.service.workspace import AclPermission


def test_secret_scope_acl(make_secret_scope, make_secret_scope_acl, make_group):
    # Each fixture creates its object and cleans it up after the test finishes.
    scope_name = make_secret_scope()
    make_secret_scope_acl(
        scope=scope_name,
        principal=make_group().display_name,
        permission=AclPermission.WRITE,
    )
```
Each integration test _must_ be debuggable within the free [IntelliJ IDEA (Community Edition)](https://www.jetbrains.com/idea/download) with the [Python plugin (Community Edition)](https://plugins.jetbrains.com/plugin/7322-python-community-edition): if a test works within IntelliJ CE, it will also work in PyCharm. Debugging capabilities are essential for troubleshooting and diagnosing issues during development. Ensure that your test setup allows for easy debugging by following best practices.

Currently, VSCode is not supported, as it does not offer interactive debugging of a single integration test. However, it's possible that this limitation may be addressed in the future.

By adhering to these guidelines, we ensure that our integration tests are robust, efficient, and easily maintainable. This, in turn, contributes to the overall reliability and quality of our software.
## Local Setup
This section provides a step-by-step guide to set up and start working on the project. These steps will help you set up
your project environment and dependencies for efficient development.

To begin, you'll need to install [Hatch](https://github.com/pypa/hatch). You can do this with the following command:
```shell
pip install hatch
```
Next, create a virtual environment for your project using Hatch:
```shell
hatch env create
```
To install development dependencies, including testing and database connection packages, use the following command:
```shell
hatch run pip install -e '.[test,dbconnect]'
```
To ensure your integrated development environment (IDE) uses the newly created virtual environment, you can retrieve the Python path with this command:
```shell
hatch run python -c "import sys; print(sys.executable)"
```
Configure your IDE to use this Python path so that you work within the virtual environment when developing the project.

Before every commit, format the code, as we want our codebase to look consistent:
```shell
make fmt
```
Before every commit, run the automated bug detector (`make lint`) and unit tests (`make test`) to ensure that the automated pull request checks pass before your code is reviewed by others:
```shell
make lint test
```
## First contribution
Here are the typical steps to submit your first contribution:
1. `git checkout -b FEATURENAME`
2. .. do the work
3. `make fmt`
4. `make lint`
5. .. fix if any
6. `make test`
7. .. fix if any
8. `git commit -a`
9. `git push origin FEATURENAME`
10. go to GitHub UI and create PR. Alternatively, `gh pr create` (if you have [GitHub CLI](https://cli.github.com/) installed).