Commit 0a0ca37

feat: enhance pre-commit hook with validation and detailed setup instructions
- Updated the pre-commit hook description for clarity.
- Added validation for the configuration file in the executor hook to ensure it exists and is not empty.
- Expanded README and documentation to include detailed setup instructions for the pre-commit hook, including configuration options and best practices.
1 parent ba6e383 commit 0a0ca37

4 files changed (+271, -53 lines)

.pre-commit-hooks.yaml

Lines changed: 7 additions & 1 deletion
```diff
@@ -1,7 +1,13 @@
 - id: datapilot_run_dbt_checks
   name: datapilot run dbt checks
-  description: datapilot run dbt checks
+  description: Run DataPilot dbt project health checks on changed files
   entry: datapilot_run_dbt_checks
   language: python
   types_or: [yaml, sql]
   require_serial: true
+  # Optional arguments that can be passed to the hook:
+  # --config-path: Path to configuration file
+  # --token: API token for authentication
+  # --instance-name: Tenant/instance name
+  # --backend-url: Backend URL (defaults to https://api.myaltimate.com)
+  # --config-name: Name of config to use from API
```
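
With this hook definition, pre-commit runs `entry` with any `args` from the user's config, followed by the staged filenames whose file-type tags match `types_or`. `require_serial: true` stops pre-commit from running the hook in parallel processes. The sketch below approximates which files are forwarded and what the resulting command line looks like. pre-commit actually assigns tags with the `identify` library, so mapping tags to extensions here is an assumption. `hook_argv` is a hypothetical helper and is not part of pre-commit or DataPilot.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

# Assumption: approximate pre-commit's `identify` file-type tags by extension.
TYPE_EXTENSIONS = {
    "yaml": {".yaml", ".yml"},
    "sql": {".sql"},
}


def matches_types_or(path: str, types_or: Sequence[str]) -> bool:
    """True if the file's extension maps to at least one requested tag."""
    suffix = PurePosixPath(path).suffix.lower()
    return any(suffix in TYPE_EXTENSIONS.get(tag, set()) for tag in types_or)


def hook_argv(
    entry: str, args: Sequence[str], staged: Sequence[str], types_or: Sequence[str]
) -> List[str]:
    """Hypothetical helper: approximate command line the hook receives."""
    files = [path for path in staged if matches_types_or(path, types_or)]
    # Hook args come first; matching filenames are appended after them.
    return [entry, *args, *files]


if __name__ == "__main__":
    staged = ["models/orders.sql", "models/schema.yml", "README.md", "scripts/load.py"]
    print(hook_argv("datapilot_run_dbt_checks", ["--config-path", "./datapilot-config.yaml"], staged, ["yaml", "sql"]))
```

For the staged list in the example, only `models/orders.sql` and `models/schema.yml` follow the two `--config-path` arguments. The Markdown and Python files are filtered out before the hook runs. pre-commit may also split a very long file list across several invocations, which this sketch ignores.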

README.md

Lines changed: 32 additions & 0 deletions
````diff
@@ -42,6 +42,38 @@ The [--config-path] is an optional argument. You can provide a yaml file with ov
 
 Note: The dbt docs generate requires an active database connection and may take a long time for projects with large number of models.
 
+### Pre-commit Hook Integration
+
+DataPilot provides a pre-commit hook that automatically runs health checks on changed files before each commit. This ensures code quality and catches issues early in the development process.
+
+#### Quick Setup
+
+1. Install pre-commit:
+   ```bash
+   pip install pre-commit
+   ```
+
+2. Add to your `.pre-commit-config.yaml`:
+   ```yaml
+   repos:
+     - repo: https://github.com/AltimateAI/datapilot-cli
+       rev: v0.0.23 # Always use a specific version tag
+       hooks:
+         - id: datapilot_run_dbt_checks
+           args: [
+             "--config-path", "./datapilot-config.yaml",
+             "--token", "${DATAPILOT_TOKEN}",
+             "--instance-name", "${DATAPILOT_INSTANCE}"
+           ]
+   ```
+
+3. Install the hook:
+   ```bash
+   pre-commit install
+   ```
+
+For detailed setup instructions, see the [Pre-commit Hook Setup Guide](docs/pre-commit-setup.md).
+
 ### Checks
 
 The following checks are available:
````
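
The README's advice to pin `rev` to a version tag can be checked mechanically, for example in CI. This is a minimal sketch. It assumes `.pre-commit-config.yaml` has already been parsed into a dict (for example with a YAML library, not shown). `find_unpinned_repos` and the tag and SHA patterns it accepts are illustrative assumptions, not a pre-commit or DataPilot feature.

```python
import re
from typing import Any, Dict, List

# Assumption: "v1.2.3"-style tags and full 40-character commit SHAs count as pinned.
TAG_RE = re.compile(r"^v?\d+(\.\d+)*$")
SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_repos(config: Dict[str, Any]) -> List[str]:
    """Return repo URLs whose `rev` is neither a version tag nor a full SHA."""
    unpinned = []
    for repo in config.get("repos", []):
        url = repo.get("repo", "")
        if url in ("local", "meta"):
            continue  # local and meta repos have no rev to pin
        rev = str(repo.get("rev", ""))
        if not (TAG_RE.match(rev) or SHA_RE.match(rev)):
            unpinned.append(url)
    return unpinned


if __name__ == "__main__":
    config = {
        "repos": [
            {"repo": "https://github.com/AltimateAI/datapilot-cli", "rev": "v0.0.23"},
            {"repo": "https://github.com/psf/black", "rev": "main"},
        ]
    }
    print(find_unpinned_repos(config))  # ['https://github.com/psf/black']
```

To move pinned revisions forward to newer tags, pre-commit's built-in `pre-commit autoupdate` command keeps the pins in place rather than switching to a branch name.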

docs/hooks.rst

Lines changed: 132 additions & 16 deletions
````diff
@@ -11,37 +11,153 @@ To use the DataPilot pre-commit hook, follow these steps:
 
 1. Install the `pre-commit` package if you haven't already:
 
-```
-pip install pre-commit
-```
+.. code-block:: shell
+
+    pip install pre-commit
 
 2. Add the following configuration to your .pre-commit-config.yaml file in the root of your repository:
 
-```
+.. code-block:: yaml
+
     repos:
-    - repo: https://github.com/AltimateAI/datapilot-cli
-    rev: <revision>
-    hooks:
-    - id: datapilot_run_dbt_checks
-    args: ["--config-path", "path/to/your/config/file"]
-```
+      - repo: https://github.com/AltimateAI/datapilot-cli
+        rev: v0.0.23 # Use a specific version tag, not 'main'
+        hooks:
+          - id: datapilot_run_dbt_checks
+            args: [
+              "--config-path", "./datapilot-config.yaml",
+              "--token", "${DATAPILOT_TOKEN}",
+              "--instance-name", "${DATAPILOT_INSTANCE}"
+            ]
+
+Configuration Options
+---------------------
+
+The DataPilot pre-commit hook supports several configuration options:
+
+**Required Configuration:**
+
+- ``rev``: Always use a specific version tag (e.g., ``v0.0.23``) instead of ``main`` for production stability
+
+**Optional Arguments:**
+
+- ``--config-path``: Path to your DataPilot configuration file
+- ``--token``: Your API token for authentication (can use environment variables)
+- ``--instance-name``: Your tenant/instance name (can use environment variables)
+- ``--backend-url``: Backend URL (defaults to https://api.myaltimate.com)
+- ``--config-name``: Name of config to use from API
+- ``--base-path``: Base path of the dbt project (defaults to current directory)
+
+**Environment Variables:**
+
+You can use environment variables for sensitive information:
+
+.. code-block:: yaml
+
+    repos:
+      - repo: https://github.com/AltimateAI/datapilot-cli
+        rev: v0.0.23
+        hooks:
+          - id: datapilot_run_dbt_checks
+            args: [
+              "--config-path", "./datapilot-config.yaml",
+              "--token", "${DATAPILOT_TOKEN}",
+              "--instance-name", "${DATAPILOT_INSTANCE}"
+            ]
+
+**Configuration File Example:**
+
+Create a ``datapilot-config.yaml`` file in your project root:
+
+.. code-block:: yaml
+
+    # DataPilot Configuration
+    disabled_insights:
+      - "hard_coded_references"
+      - "duplicate_sources"
 
-Replace <revision> with the desired revision of the DataPilot repository and "path/to/your/config/file" with the path to your configuration file.
+    # Custom settings for your project
+    project_settings:
+      max_fanout: 10
+      require_tests: true
 
 3. Install the pre-commit hook:
 
-```
-pre-commit install
-```
+.. code-block:: shell
+
+    pre-commit install
 
 Usage
 -----
 
-Once the hook is installed, it will run automatically before each commit. If any issues are detected, the commit will be aborted, and you will be prompted to fix the issues before retrying the commit.
+Once the hook is installed, it will run automatically before each commit. The hook will:
+
+1. **Validate Configuration**: Check that your config file exists and is valid
+2. **Authenticate**: Use your provided token and instance name to authenticate
+3. **Analyze Changes**: Only analyze files that have changed in the commit
+4. **Report Issues**: Display any issues found and prevent the commit if problems are detected
+
+**Manual Execution:**
+
+To manually run all pre-commit hooks on a repository:
+
+.. code-block:: shell
+
+    pre-commit run --all-files
+
+To run individual hooks:
+
+.. code-block:: shell
 
+    pre-commit run datapilot_run_dbt_checks
 
-If you want to manually run all pre-commit hooks on a repository, run `pre-commit run --all-files`. To run individual hooks use `pre-commit run <hook_id>`.
+**Troubleshooting:**
 
+- **Authentication Issues**: Ensure your token and instance name are correctly set
+- **Empty Config Files**: The hook will fail if your config file is empty or invalid
+- **No Changes**: If no relevant files have changed, the hook will skip execution
+- **Network Issues**: Ensure you have access to the DataPilot API
+
+Best Practices
+-------------
+
+1. **Use Version Tags**: Always specify a version tag in the ``rev`` field, never use ``main``
+2. **Environment Variables**: Use environment variables for sensitive information like tokens
+3. **Configuration Files**: Create a dedicated config file for your project settings
+4. **Regular Updates**: Update to new versions when they become available
+5. **Team Coordination**: Ensure all team members use the same configuration
+
+Example Complete Setup
+---------------------
+
+Here's a complete example of a ``.pre-commit-config.yaml`` file:
+
+.. code-block:: yaml
+
+    # .pre-commit-config.yaml
+    exclude: '^(\.tox|ci/templates|\.bumpversion\.cfg)(/|$)'
+
+    repos:
+      - repo: https://github.com/astral-sh/ruff-pre-commit
+        rev: v0.1.14
+        hooks:
+          - id: ruff
+            args: [--fix, --exit-non-zero-on-fix, --show-fixes]
+
+      - repo: https://github.com/psf/black
+        rev: 23.12.1
+        hooks:
+          - id: black
+
+      - repo: https://github.com/AltimateAI/datapilot-cli
+        rev: v0.0.23
+        hooks:
+          - id: datapilot_run_dbt_checks
+            args: [
+              "--config-path", "./datapilot-config.yaml",
+              "--token", "${DATAPILOT_TOKEN}",
+              "--instance-name", "${DATAPILOT_INSTANCE}"
+            ]
 
 Feedback and Contributions
 --------------------------
````
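
pre-commit passes `args` to the hook as literal strings and does not run them through a shell. A `"${DATAPILOT_TOKEN}"` entry therefore reaches the script unexpanded unless something resolves it. One option is for the hook to expand placeholders itself. The sketch below does that with the standard library's `string.Template`. `resolve_arg` is a hypothetical helper, not part of the committed hook. Treating an unresolved placeholder as "not provided" is also an assumption.

```python
import os
import re
from string import Template
from typing import Mapping, Optional

# Matches a placeholder that survived substitution, e.g. "${DATAPILOT_TOKEN}".
_LEFTOVER = re.compile(r"\$\{?[A-Za-z_][A-Za-z0-9_]*\}?")


def resolve_arg(value: Optional[str], env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Expand $VAR / ${VAR} in a hook argument using `env` (hypothetical helper).

    Returns None when the value is missing or still holds an unresolved
    placeholder, so the caller can treat it like an omitted flag.
    """
    if not value:
        return None
    expanded = Template(value).safe_substitute(env)
    if _LEFTOVER.search(expanded):
        return None
    return expanded or None


if __name__ == "__main__":
    env = {"DATAPILOT_TOKEN": "abc123"}
    print(resolve_arg("${DATAPILOT_TOKEN}", env))     # abc123
    print(resolve_arg("${DATAPILOT_INSTANCE}", env))  # None (not set)
```

The hook could apply this to the values it reads with `getattr(args[0], "token", None)` and `getattr(args[0], "instance_name", None)`. An unset variable would then trigger the existing "No API token provided" warning instead of the literal placeholder being sent as a credential.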

src/datapilot/core/platforms/dbt/hooks/executor_hook.py

Lines changed: 100 additions & 36 deletions
```diff
@@ -1,5 +1,7 @@
 import argparse
+import sys
 import time
+from pathlib import Path
 from typing import Optional
 from typing import Sequence
 
```

```diff
@@ -13,6 +15,23 @@
 from datapilot.utils.utils import generate_partial_manifest_catalog
 
 
+def validate_config_file(config_path: str) -> bool:
+    """Validate that the config file exists and is not empty."""
+    if not Path(config_path).exists():
+        print(f"Error: Config file '{config_path}' does not exist.", file=sys.stderr)
+        return False
+
+    try:
+        config = load_config(config_path)
+        if not config:
+            print(f"Error: Config file '{config_path}' is empty or invalid.", file=sys.stderr)
+            return False
+        return True
+    except Exception as e:
+        print(f"Error: Failed to load config file '{config_path}': {e}", file=sys.stderr)
+        return False
+
+
 def main(argv: Optional[Sequence[str]] = None):
     start_time = time.time()
     parser = argparse.ArgumentParser()
```
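
As committed, `main` calls `load_config` twice on the success path: once inside `validate_config_file` and again right after it returns `True`. A variant that returns the parsed config, or `None`, avoids the second read. In this self-contained sketch, DataPilot's `load_config` is swapped for a `loader` parameter. The example uses a hypothetical JSON-only `json_loader` so that it runs with the standard library alone. Both names are illustrative.

```python
import json
import sys
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, Optional


def load_validated_config(
    config_path: str, loader: Callable[[str], Any]
) -> Optional[Dict[str, Any]]:
    """Return the parsed config, or None (with a message on stderr) if unusable."""
    if not Path(config_path).exists():
        print(f"Error: Config file '{config_path}' does not exist.", file=sys.stderr)
        return None
    try:
        config = loader(config_path)
    except Exception as e:  # mirrors the hook's broad catch around loading
        print(f"Error: Failed to load config file '{config_path}': {e}", file=sys.stderr)
        return None
    if not config:
        print(f"Error: Config file '{config_path}' is empty or invalid.", file=sys.stderr)
        return None
    return config


def json_loader(path: str) -> Any:
    """Hypothetical stand-in for DataPilot's load_config (JSON only)."""
    text = Path(path).read_text()
    return json.loads(text) if text.strip() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        good = Path(tmp, "good.json")
        good.write_text('{"disabled_insights": ["duplicate_sources"]}')
        empty = Path(tmp, "empty.json")
        empty.write_text("")
        print(load_validated_config(str(good), json_loader))   # {'disabled_insights': ['duplicate_sources']}
        print(load_validated_config(str(empty), json_loader))  # None, plus an error on stderr
```

With this shape, the call site in `main` could become `config = load_validated_config(config_path, load_config)`, followed by the existing `sys.exit(1)` when the result is `None`.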

```diff
@@ -28,58 +47,103 @@ def main(argv: Optional[Sequence[str]] = None):
         help="Base path of the dbt project",
     )
 
+    parser.add_argument(
+        "--token",
+        help="Your API token for authentication.",
+    )
+
+    parser.add_argument(
+        "--instance-name",
+        help="Your tenant ID.",
+    )
+
+    parser.add_argument("--backend-url", help="Altimate's Backend URL", default="https://api.myaltimate.com")
+
+    parser.add_argument(
+        "--config-name",
+        help="Name of the DBT config to use from the API",
+    )
+
     args = parser.parse_known_args(argv)
-    # print(f"args: {args}", file=sys.__stdout__)
+
+    # Validate config file if provided
     config = {}
     if hasattr(args[0], "config_path") and args[0].config_path:
-        # print(f"Using config file: {args[0].config_path[0]}")
-        config = load_config(args[0].config_path[0])
+        config_path = args[0].config_path[0]
+        if not validate_config_file(config_path):
+            print("Pre-commit hook failed: Invalid config file.", file=sys.stderr)
+            sys.exit(1)
+        config = load_config(config_path)
 
     base_path = "./"
     if hasattr(args[0], "base_path") and args[0].base_path:
         base_path = args[0].base_path[0]
 
+    # Get authentication parameters
+    token = getattr(args[0], "token", None)
+    instance_name = getattr(args[0], "instance_name", None)
+    backend_url = getattr(args[0], "backend_url", "https://api.myaltimate.com")
+
+    # Validate authentication parameters
+    if not token:
+        print("Warning: No API token provided. Using default configuration.", file=sys.stderr)
+        print("To specify a token, use: --token 'your-token'", file=sys.stderr)
+
+    if not instance_name:
+        print("Warning: No instance name provided. Using default configuration.", file=sys.stderr)
+        print("To specify an instance, use: --instance-name 'your-instance'", file=sys.stderr)
+
     changed_files = args[1]
-    # print(f"Changed files: {changed_files}")
 
     if not changed_files:
-        # print("No changed files detected - test. Exiting...")
+        print("No changed files detected. Skipping datapilot checks.", file=sys.stderr)
         return
 
-    # print(f"Changed files: {changed_files}", file=sys.__stdout__)
-    selected_models, manifest, catalog = generate_partial_manifest_catalog(changed_files, base_path=base_path)
-    # print("se1ected models", selected_models, file=sys.__stdout__)
-    insight_generator = DBTInsightGenerator(
-        manifest=manifest,
-        catalog=catalog,
-        config=config,
-        selected_model_ids=selected_models,
-    )
-    reports = insight_generator.run()
-    if reports:
-        model_report = generate_model_insights_table(reports[MODEL])
-        if len(model_report) > 0:
-            print("--" * 50)
-            print("Model Insights")
-            print("--" * 50)
-            for model_id, report in model_report.items():
-                print(f"Model: {model_id}")
-                print(f"File path: {report['path']}")
-                print(tabulate_data(report["table"], headers="keys"))
-                print("\n")
-
-        project_report = generate_project_insights_table(reports[PROJECT])
-        if len(project_report) > 0:
-            print("--" * 50)
-            print("Project Insights")
-            print("--" * 50)
-            print(tabulate_data(project_report, headers="keys"))
-
-        exit(1)
+    try:
+        selected_models, manifest, catalog = generate_partial_manifest_catalog(changed_files, base_path=base_path)
+
+        insight_generator = DBTInsightGenerator(
+            manifest=manifest,
+            catalog=catalog,
+            config=config,
+            selected_model_ids=selected_models,
+            token=token,
+            instance_name=instance_name,
+            backend_url=backend_url,
+        )
+
+        reports = insight_generator.run()
+
+        if reports:
+            model_report = generate_model_insights_table(reports[MODEL])
+            if len(model_report) > 0:
+                print("--" * 50, file=sys.stderr)
+                print("Model Insights", file=sys.stderr)
+                print("--" * 50, file=sys.stderr)
+                for model_id, report in model_report.items():
+                    print(f"Model: {model_id}", file=sys.stderr)
+                    print(f"File path: {report['path']}", file=sys.stderr)
+                    print(tabulate_data(report["table"], headers="keys"), file=sys.stderr)
+                    print("\n", file=sys.stderr)
+
+            project_report = generate_project_insights_table(reports[PROJECT])
+            if len(project_report) > 0:
+                print("--" * 50, file=sys.stderr)
+                print("Project Insights", file=sys.stderr)
+                print("--" * 50, file=sys.stderr)
+                print(tabulate_data(project_report, headers="keys"), file=sys.stderr)
+
+            print("\nPre-commit hook failed: DataPilot found issues that need to be addressed.", file=sys.stderr)
+            sys.exit(1)
+
+    except Exception as e:
+        print(f"Error running DataPilot checks: {e}", file=sys.stderr)
+        print("Pre-commit hook failed due to an error.", file=sys.stderr)
+        sys.exit(1)
 
     end_time = time.time()
     total_time = end_time - start_time
-    print(f"Total time taken: {round(total_time, 2)} seconds")
+    print(f"DataPilot checks completed successfully in {round(total_time, 2)} seconds", file=sys.stderr)
 
 
 if __name__ == "__main__":
```
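
The hook relies on `argparse.ArgumentParser.parse_known_args`, which returns a `(namespace, extras)` pair. Recognized flags land in the namespace. Anything unrecognized, including the filenames pre-commit appends, lands in `extras`, which is why the hook sets `changed_files = args[1]`. The sketch below re-declares only the four flags added in this commit. The existing `--config-path` and `--base-path` definitions are not visible in the diff, so they are omitted. `build_parser`, `split_args` and the filenames are illustrative.

```python
import argparse
from typing import List, Sequence, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Re-declare the flags added in this commit (config/base-path flags omitted)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--token", help="Your API token for authentication.")
    parser.add_argument("--instance-name", help="Your tenant ID.")
    parser.add_argument("--backend-url", help="Altimate's Backend URL", default="https://api.myaltimate.com")
    parser.add_argument("--config-name", help="Name of the DBT config to use from the API")
    return parser


def split_args(argv: Sequence[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Return (known options, leftovers); pre-commit's filenames end up in leftovers."""
    return build_parser().parse_known_args(list(argv))


if __name__ == "__main__":
    ns, files = split_args(["--token", "abc123", "models/orders.sql", "models/schema.yml"])
    print(ns.token, ns.instance_name, ns.backend_url)  # abc123 None https://api.myaltimate.com
    print(files)  # ['models/orders.sql', 'models/schema.yml']
```

Because unknown options also go to `extras`, a mistyped flag such as `--tokn` is not rejected. It ends up in the list the hook treats as changed files.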
