Commit 780052f

feat: enhance pre commit hook (#66)
* feat: enhance pre-commit hook with validation and detailed setup instructions
  - Updated the pre-commit hook description for clarity.
  - Added validation for the configuration file in the executor hook to ensure it exists and is not empty.
  - Expanded README and documentation to include detailed setup instructions for the pre-commit hook, including configuration options and best practices.
* feat: enhance pre-commit hook with improved config handling and logging
  - Added functionality to load configuration from both file and API, with detailed error handling.
  - Introduced logging statements to provide feedback during the execution of the pre-commit hook.
  - Improved handling of changed files and insights generation process, ensuring better visibility into operations.
* Fix manifest object handling in executor hook logging
* Enhance logging in executor hook to include config ID when using API
* Update logging in executor hook to clarify config ID output
* Fix executor hook to only fail when actual issues are found
* Fix pre-commit hook: load manifest/catalog from files instead of generating partial ones
* Update pre-commit hook configuration to version v0.0.27 and enhance documentation
  - Updated the pre-commit hook version to v0.0.27 in configuration files.
  - Added new optional arguments for manifest and catalog file paths in the README and documentation.
  - Clarified requirements for DBT artifacts in the documentation to improve user guidance.
* Refactor executor hook for improved configuration handling and logging
  - Introduced functions to load configuration from both file and API, enhancing error handling and user feedback.
  - Added new command-line arguments for better flexibility in specifying paths and configurations.
  - Streamlined the process of loading manifest and catalog files, with improved logging for insights generation.
  - Enhanced the handling of changed files for selective model testing, ensuring clarity in output and error messages.
* Update argument extraction in executor hook to include an additional return value
  - Modified the `extract_arguments` function to return an extra string, enhancing the argument handling capabilities.
  - This change supports future extensions and improves flexibility in argument parsing.
1 parent ba6e383 commit 780052f
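
The commit message says the executor hook now validates that the configuration file exists and is not empty. The hook's source is not part of the diff shown on this page, so the following is only a minimal sketch of what such a check might look like. The function name `validate_config_file` is hypothetical and is not taken from the project.

```python
from pathlib import Path


def validate_config_file(config_path: str) -> Path:
    """Return the path if it names an existing, non-empty file.

    Hypothetical helper: the real hook's validation may differ.
    """
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"Config file not found: {path}")
    # A file containing only whitespace is treated as empty.
    if not path.read_text(encoding="utf-8").strip():
        raise ValueError(f"Config file is empty: {path}")
    return path
```

Raising distinct exception types lets the caller print a different message for a missing file than for an empty one before aborting the commit.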

4 files changed

+478
-74
lines changed

.pre-commit-hooks.yaml

Lines changed: 10 additions & 1 deletion
@@ -1,7 +1,16 @@
 - id: datapilot_run_dbt_checks
   name: datapilot run dbt checks
-  description: datapilot run dbt checks
+  description: Run DataPilot dbt project health checks on changed files
   entry: datapilot_run_dbt_checks
   language: python
   types_or: [yaml, sql]
   require_serial: true
+  # Optional arguments that can be passed to the hook:
+  # --config-path: Path to configuration file
+  # --token: API token for authentication
+  # --instance-name: Tenant/instance name
+  # --backend-url: Backend URL (defaults to https://api.myaltimate.com)
+  # --config-name: Name of config to use from API
+  # --manifest-path: Path to DBT manifest file (defaults to ./target/manifest.json)
+  # --catalog-path: Path to DBT catalog file (defaults to ./target/catalog.json)
+  # --base-path: Base path of the dbt project (defaults to current directory)
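
The comment block above lists the hook's optional arguments and their defaults. As an illustration only, here is how a Python entry point could accept the same flags with the standard library's `argparse`. The project's actual parser is not shown in this diff and may use a different library; `build_parser` and `resolve_artifact_paths` are hypothetical names, and the default artifact locations follow the `{base_path}/target/...` convention described in `docs/hooks.rst`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="datapilot_run_dbt_checks")
    parser.add_argument("--config-path")
    parser.add_argument("--token")
    parser.add_argument("--instance-name")
    parser.add_argument("--backend-url", default="https://api.myaltimate.com")
    parser.add_argument("--config-name")
    parser.add_argument("--base-path", default=".")
    parser.add_argument("--manifest-path")
    parser.add_argument("--catalog-path")
    # pre-commit appends the staged file names as positional arguments.
    parser.add_argument("filenames", nargs="*")
    return parser


def resolve_artifact_paths(args: argparse.Namespace) -> tuple[Path, Path]:
    """Fall back to <base-path>/target/... when a path is not given."""
    target = Path(args.base_path) / "target"
    manifest = Path(args.manifest_path) if args.manifest_path else target / "manifest.json"
    catalog = Path(args.catalog_path) if args.catalog_path else target / "catalog.json"
    return manifest, catalog
```

For example, `build_parser().parse_args(["--base-path", "proj", "models/a.sql"])` leaves `manifest_path` unset, so `resolve_artifact_paths` would return `proj/target/manifest.json` and `proj/target/catalog.json`.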

README.md

Lines changed: 34 additions & 0 deletions
@@ -42,6 +42,40 @@ The [--config-path] is an optional argument. You can provide a yaml file with ov
 
 Note: The dbt docs generate requires an active database connection and may take a long time for projects with large number of models.
 
+### Pre-commit Hook Integration
+
+DataPilot provides a pre-commit hook that automatically runs health checks on changed files before each commit. This ensures code quality and catches issues early in the development process.
+
+#### Quick Setup
+
+1. Install pre-commit:
+```bash
+pip install pre-commit
+```
+
+2. Add to your `.pre-commit-config.yaml`:
+```yaml
+repos:
+  - repo: https://github.com/AltimateAI/datapilot-cli
+    rev: v0.0.27  # Always use a specific version tag
+    hooks:
+      - id: datapilot_run_dbt_checks
+        args: [
+          "--config-path", "./datapilot-config.yaml",
+          "--token", "${DATAPILOT_TOKEN}",
+          "--instance-name", "${DATAPILOT_INSTANCE}",
+          "--manifest-path", "./target/manifest.json",
+          "--catalog-path", "./target/catalog.json"
+        ]
+```
+
+3. Install the hook:
+```bash
+pre-commit install
+```
+
+For detailed setup instructions, see the [Pre-commit Hook Setup Guide](docs/pre-commit-setup.md).
+
 ### Checks
 
 The following checks are available:
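
The README example passes `${DATAPILOT_TOKEN}` and `${DATAPILOT_INSTANCE}` in `args`. pre-commit hands `args` to the hook without running them through a shell, so placeholders like these only work if the hook expands them itself. Whether DataPilot does so is not visible in this excerpt. The sketch below shows one way a Python hook could do it with the standard library's `os.path.expandvars`; `expand_env_placeholders` is a hypothetical name.

```python
import os


def expand_env_placeholders(argv: list[str]) -> list[str]:
    """Expand $VAR and ${VAR} in each argument (hypothetical helper).

    os.path.expandvars leaves references to unset variables unchanged,
    so a leftover "${" is treated as a missing variable.
    """
    expanded = [os.path.expandvars(arg) for arg in argv]
    unresolved = [arg for arg in expanded if "${" in arg]
    if unresolved:
        raise ValueError(f"Unset environment variable(s) in: {unresolved}")
    return expanded
```

Failing early on an unresolved placeholder avoids sending the literal string `${DATAPILOT_TOKEN}` to the API as if it were a token.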

docs/hooks.rst

Lines changed: 150 additions & 16 deletions
@@ -11,37 +11,171 @@ To use the DataPilot pre-commit hook, follow these steps:
 
 1. Install the `pre-commit` package if you haven't already:
 
-```
-pip install pre-commit
-```
+.. code-block:: shell
+
+    pip install pre-commit
 
 2. Add the following configuration to your .pre-commit-config.yaml file in the root of your repository:
 
-```
+.. code-block:: yaml
+
     repos:
-    - repo: https://github.com/AltimateAI/datapilot-cli
-      rev: <revision>
-      hooks:
-      - id: datapilot_run_dbt_checks
-        args: ["--config-path", "path/to/your/config/file"]
-```
+      - repo: https://github.com/AltimateAI/datapilot-cli
+        rev: v0.0.27  # Use a specific version tag, not 'main'
+        hooks:
+          - id: datapilot_run_dbt_checks
+            args: [
+              "--config-path", "./datapilot-config.yaml",
+              "--token", "${DATAPILOT_TOKEN}",
+              "--instance-name", "${DATAPILOT_INSTANCE}"
+            ]
+
+Configuration Options
+---------------------
+
+The DataPilot pre-commit hook supports several configuration options:
+
+**Required Configuration:**
+
+- ``rev``: Always use a specific version tag (e.g., ``v0.0.27``) instead of ``main`` for production stability
+
+**Optional Arguments:**
+
+- ``--config-path``: Path to your DataPilot configuration file
+- ``--token``: Your API token for authentication (can use environment variables)
+- ``--instance-name``: Your tenant/instance name (can use environment variables)
+- ``--backend-url``: Backend URL (defaults to https://api.myaltimate.com)
+- ``--config-name``: Name of config to use from API
+- ``--base-path``: Base path of the dbt project (defaults to current directory)
+- ``--manifest-path``: Path to the DBT manifest file (defaults to {base_path}/target/manifest.json)
+- ``--catalog-path``: Path to the DBT catalog file (defaults to {base_path}/target/catalog.json)
 
-Replace <revision> with the desired revision of the DataPilot repository and "path/to/your/config/file" with the path to your configuration file.
+**Environment Variables:**
+
+You can use environment variables for sensitive information:
+
+.. code-block:: yaml
+
+    repos:
+      - repo: https://github.com/AltimateAI/datapilot-cli
+        rev: v0.0.27
+        hooks:
+          - id: datapilot_run_dbt_checks
+            args: [
+              "--config-path", "./datapilot-config.yaml",
+              "--token", "${DATAPILOT_TOKEN}",
+              "--instance-name", "${DATAPILOT_INSTANCE}",
+              "--manifest-path", "./target/manifest.json",
+              "--catalog-path", "./target/catalog.json"
+            ]
+
+**Configuration File Example:**
+
+Create a ``datapilot-config.yaml`` file in your project root:
+
+.. code-block:: yaml
+
+    # DataPilot Configuration
+    disabled_insights:
+      - "hard_coded_references"
+      - "duplicate_sources"
+
+    # Custom settings for your project
+    project_settings:
+      max_fanout: 10
+      require_tests: true
 
 3. Install the pre-commit hook:
 
-```
-pre-commit install
-```
+.. code-block:: shell
+
+    pre-commit install
 
 Usage
 -----
 
-Once the hook is installed, it will run automatically before each commit. If any issues are detected, the commit will be aborted, and you will be prompted to fix the issues before retrying the commit.
+Once the hook is installed, it will run automatically before each commit. The hook will:
+
+1. **Validate Configuration**: Check that your config file exists and is valid
+2. **Authenticate**: Use your provided token and instance name to authenticate
+3. **Load DBT Artifacts**: Load manifest and catalog files for analysis
+4. **Analyze Changes**: Only analyze files that have changed in the commit
+5. **Report Issues**: Display any issues found and prevent the commit if problems are detected
+
+**Required DBT Artifacts:**
+
+The pre-commit hook requires DBT manifest and catalog files to function properly:
 
+- **Manifest File**: Generated by running `dbt compile` or `dbt run`. Default location: `./target/manifest.json`
+- **Catalog File**: Generated by running `dbt docs generate`. Default location: `./target/catalog.json`
 
-If you want to manually run all pre-commit hooks on a repository, run `pre-commit run --all-files`. To run individual hooks use `pre-commit run <hook_id>`.
+**Note**: The catalog file is optional but recommended for comprehensive analysis. If not available, the hook will continue without catalog information.
 
+**Manual Execution:**
+
+To manually run all pre-commit hooks on a repository:
+
+.. code-block:: shell
+
+    pre-commit run --all-files
+
+To run individual hooks:
+
+.. code-block:: shell
+
+    pre-commit run datapilot_run_dbt_checks
+
+**Troubleshooting:**
+
+- **Authentication Issues**: Ensure your token and instance name are correctly set
+- **Empty Config Files**: The hook will fail if your config file is empty or invalid
+- **Missing Manifest File**: Ensure you have run `dbt compile` or `dbt run` to generate the manifest.json file
+- **Missing Catalog File**: Run `dbt docs generate` to create the catalog.json file (optional but recommended)
+- **No Changes**: If no relevant files have changed, the hook will skip execution
+- **Network Issues**: Ensure you have access to the DataPilot API
+
+Best Practices
+-------------
+
+1. **Use Version Tags**: Always specify a version tag in the ``rev`` field, never use ``main``
+2. **Environment Variables**: Use environment variables for sensitive information like tokens
+3. **Configuration Files**: Create a dedicated config file for your project settings
+4. **Regular Updates**: Update to new versions when they become available
+5. **Team Coordination**: Ensure all team members use the same configuration
+
+Example Complete Setup
+---------------------
+
+Here's a complete example of a ``.pre-commit-config.yaml`` file:
+
+.. code-block:: yaml
+
+    # .pre-commit-config.yaml
+    exclude: '^(\.tox|ci/templates|\.bumpversion\.cfg)(/|$)'
+
+    repos:
+      - repo: https://github.com/astral-sh/ruff-pre-commit
+        rev: v0.1.14
+        hooks:
+          - id: ruff
+            args: [--fix, --exit-non-zero-on-fix, --show-fixes]
+
+      - repo: https://github.com/psf/black
+        rev: 23.12.1
+        hooks:
+          - id: black
+
+      - repo: https://github.com/AltimateAI/datapilot-cli
+        rev: v0.0.27
+        hooks:
+          - id: datapilot_run_dbt_checks
+            args: [
+              "--config-path", "./datapilot-config.yaml",
+              "--token", "${DATAPILOT_TOKEN}",
+              "--instance-name", "${DATAPILOT_INSTANCE}",
+              "--manifest-path", "./target/manifest.json",
+              "--catalog-path", "./target/catalog.json"
+            ]
 
 Feedback and Contributions
 --------------------------
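
The documentation added in this diff describes a required manifest file, an optional catalog file, and (per the commit message) a hook that fails only when actual issues are found. The sketch below shows one way that behaviour could be expressed in Python. It is an assumption-laden illustration, not the executor hook's code: `load_json_artifact` and `exit_code_for` are hypothetical names, and the real hook may parse the artifacts into richer objects than plain dictionaries.

```python
import json
from pathlib import Path
from typing import Optional


def load_json_artifact(path: str, required: bool) -> Optional[dict]:
    """Load a dbt artifact from disk (hypothetical helper).

    A missing required artifact (the manifest) is an error; a missing
    optional artifact (the catalog) yields None so the run can continue.
    """
    artifact = Path(path)
    if not artifact.is_file():
        if required:
            raise FileNotFoundError(
                f"{artifact} not found; run `dbt compile` or `dbt run` first"
            )
        return None
    with artifact.open(encoding="utf-8") as handle:
        return json.load(handle)


def exit_code_for(issues: list) -> int:
    """Return 1 (abort the commit) only when issues were reported."""
    return 1 if issues else 0
```

A caller would load the manifest with `required=True` and the catalog with `required=False`, then pass the collected issues to `exit_code_for` and hand the result to `sys.exit`, since pre-commit treats a non-zero exit status as a failed hook.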
