Merged
11 changes: 10 additions & 1 deletion .pre-commit-hooks.yaml
@@ -1,7 +1,16 @@
- id: datapilot_run_dbt_checks
name: datapilot run dbt checks
description: datapilot run dbt checks
description: Run DataPilot dbt project health checks on changed files
entry: datapilot_run_dbt_checks
language: python
types_or: [yaml, sql]
require_serial: true
# Optional arguments that can be passed to the hook:
# --config-path: Path to configuration file
# --token: API token for authentication
# --instance-name: Tenant/instance name
# --backend-url: Backend URL (defaults to https://api.myaltimate.com)
# --config-name: Name of config to use from API
# --manifest-path: Path to DBT manifest file (defaults to ./target/manifest.json)
# --catalog-path: Path to DBT catalog file (defaults to ./target/catalog.json)
# --base-path: Base path of the dbt project (defaults to current directory)
34 changes: 34 additions & 0 deletions README.md
@@ -42,6 +42,40 @@ The [--config-path] is an optional argument. You can provide a yaml file with ov

Note: `dbt docs generate` requires an active database connection and may take a long time for projects with a large number of models.

### Pre-commit Hook Integration

DataPilot provides a pre-commit hook that automatically runs health checks on changed files before each commit, so issues surface before they reach the repository.

#### Quick Setup

1. Install pre-commit:
```bash
pip install pre-commit
```

2. Add to your `.pre-commit-config.yaml`:
```yaml
repos:
- repo: https://github.com/AltimateAI/datapilot-cli
rev: v0.0.27 # Always use a specific version tag
hooks:
- id: datapilot_run_dbt_checks
args: [
"--config-path", "./datapilot-config.yaml",
"--token", "${DATAPILOT_TOKEN}",
"--instance-name", "${DATAPILOT_INSTANCE}",
"--manifest-path", "./target/manifest.json",
"--catalog-path", "./target/catalog.json"
]
```

3. Install the hook:
```bash
pre-commit install
```

For detailed setup instructions, see the [Pre-commit Hook Setup Guide](docs/pre-commit-setup.md).

### Checks

The following checks are available:
166 changes: 150 additions & 16 deletions docs/hooks.rst
@@ -11,37 +11,171 @@ To use the DataPilot pre-commit hook, follow these steps:

1. Install the ``pre-commit`` package if you haven't already:

```
pip install pre-commit
```
.. code-block:: shell

pip install pre-commit

2. Add the following configuration to your .pre-commit-config.yaml file in the root of your repository:

```
.. code-block:: yaml

repos:
- repo: https://github.com/AltimateAI/datapilot-cli
rev: <revision>
hooks:
- id: datapilot_run_dbt_checks
args: ["--config-path", "path/to/your/config/file"]
```
- repo: https://github.com/AltimateAI/datapilot-cli
rev: v0.0.27 # Use a specific version tag, not 'main'
hooks:
- id: datapilot_run_dbt_checks
args: [
"--config-path", "./datapilot-config.yaml",
"--token", "${DATAPILOT_TOKEN}",
"--instance-name", "${DATAPILOT_INSTANCE}"
]

Configuration Options
---------------------

The DataPilot pre-commit hook supports several configuration options:

**Required Configuration:**

- ``rev``: Always use a specific version tag (e.g., ``v0.0.27``) instead of ``main`` for production stability

**Optional Arguments:**

- ``--config-path``: Path to your DataPilot configuration file
- ``--token``: Your API token for authentication (can use environment variables)
- ``--instance-name``: Your tenant/instance name (can use environment variables)
- ``--backend-url``: Backend URL (defaults to https://api.myaltimate.com)
- ``--config-name``: Name of config to use from API
- ``--base-path``: Base path of the dbt project (defaults to current directory)
- ``--manifest-path``: Path to the dbt manifest file (defaults to ``{base_path}/target/manifest.json``)
- ``--catalog-path``: Path to the dbt catalog file (defaults to ``{base_path}/target/catalog.json``)
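
The fallback behaviour described for ``--manifest-path`` and ``--catalog-path`` can be sketched roughly as follows. This is an illustration only, not DataPilot's actual code: the function name ``resolve_artifact_paths`` and its signature are hypothetical, and it assumes that an explicitly passed path always wins over the ``{base_path}/target/`` default.

```python
from pathlib import Path
from typing import Optional, Tuple


def resolve_artifact_paths(
    base_path: str = ".",
    manifest_path: Optional[str] = None,
    catalog_path: Optional[str] = None,
) -> Tuple[Path, Path]:
    """Return (manifest, catalog) paths, defaulting to <base_path>/target/.

    Hypothetical helper: mirrors the documented defaults, nothing more.
    """
    target_dir = Path(base_path) / "target"
    # An explicit argument takes precedence over the derived default.
    manifest = Path(manifest_path) if manifest_path else target_dir / "manifest.json"
    catalog = Path(catalog_path) if catalog_path else target_dir / "catalog.json"
    return manifest, catalog
```

With this reading, setting only ``--base-path`` is enough when the artifacts live in the project's standard ``target`` directory.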

Replace <revision> with the desired revision of the DataPilot repository and "path/to/your/config/file" with the path to your configuration file.
**Environment Variables:**

You can use environment variables for sensitive information:

.. code-block:: yaml

repos:
- repo: https://github.com/AltimateAI/datapilot-cli
rev: v0.0.27
hooks:
- id: datapilot_run_dbt_checks
args: [
"--config-path", "./datapilot-config.yaml",
"--token", "${DATAPILOT_TOKEN}",
"--instance-name", "${DATAPILOT_INSTANCE}",
"--manifest-path", "./target/manifest.json",
"--catalog-path", "./target/catalog.json"
]
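
One caveat worth checking: pre-commit is generally understood to pass ``args`` to the hook as literal strings without shell expansion, so a value such as ``${DATAPILOT_TOKEN}`` only resolves if the hook itself expands it. Whether the DataPilot hook does so is not shown here. The sketch below illustrates what such expansion could look like; ``expand_env_args`` and ``unresolved_placeholders`` are hypothetical names, not part of DataPilot.

```python
import os
from typing import List


def expand_env_args(args: List[str]) -> List[str]:
    """Expand $VAR and ${VAR} placeholders using the current environment.

    os.path.expandvars leaves references to unset variables unchanged.
    """
    return [os.path.expandvars(arg) for arg in args]


def unresolved_placeholders(args: List[str]) -> List[str]:
    """Return the arguments that still contain a ${...} placeholder."""
    return [arg for arg in args if "${" in arg]
```

Checking for leftover placeholders after expansion gives a clearer failure than an authentication error caused by a token that was never substituted.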

**Configuration File Example:**

Create a ``datapilot-config.yaml`` file in your project root:

.. code-block:: yaml

# DataPilot Configuration
disabled_insights:
- "hard_coded_references"
- "duplicate_sources"

# Custom settings for your project
project_settings:
max_fanout: 10
require_tests: true

3. Install the pre-commit hook:

```
pre-commit install
```
.. code-block:: shell

pre-commit install

Usage
-----

Once the hook is installed, it will run automatically before each commit. If any issues are detected, the commit will be aborted, and you will be prompted to fix the issues before retrying the commit.
Once the hook is installed, it will run automatically before each commit. The hook will:

1. **Validate Configuration**: Check that your config file exists and is valid
2. **Authenticate**: Use your provided token and instance name to authenticate
3. **Load dbt Artifacts**: Load manifest and catalog files for analysis
4. **Analyze Changes**: Only analyze files that have changed in the commit
5. **Report Issues**: Display any issues found and prevent the commit if problems are detected
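
The "Analyze Changes" step can be pictured as a filter over the file names that pre-commit hands to the hook. The sketch below is an assumption-laden illustration: ``select_relevant_files`` is a hypothetical name, and matching by file suffix only approximates the ``types_or: [yaml, sql]`` setting, which pre-commit evaluates with its own file-type detection.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed suffixes standing in for the hook's `types_or: [yaml, sql]` filter.
RELEVANT_SUFFIXES = {".sql", ".yml", ".yaml"}


def select_relevant_files(filenames: Iterable[str]) -> List[str]:
    """Keep only SQL and YAML files, preserving the input order."""
    return [
        name for name in filenames
        if Path(name).suffix.lower() in RELEVANT_SUFFIXES
    ]
```

An empty result corresponds to the case where no relevant files have changed and there is nothing for the hook to analyze.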

**Required dbt Artifacts:**

The pre-commit hook needs the dbt manifest file and can also use the catalog file:

- **Manifest File**: Generated by running ``dbt compile`` or ``dbt run``. Default location: ``./target/manifest.json``
- **Catalog File**: Generated by running ``dbt docs generate``. Default location: ``./target/catalog.json``

If you want to manually run all pre-commit hooks on a repository, run `pre-commit run --all-files`. To run individual hooks use `pre-commit run <hook_id>`.
**Note**: The catalog file is optional but recommended for comprehensive analysis. If not available, the hook will continue without catalog information.
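
The difference between the required manifest and the optional catalog can be sketched like this. It is a minimal illustration, assuming both artifacts are plain JSON files; ``load_artifacts`` is a hypothetical helper and not the loader DataPilot uses.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional, Tuple


def load_artifacts(
    manifest_path: str, catalog_path: str
) -> Tuple[Dict[str, Any], Optional[Dict[str, Any]]]:
    """Load the manifest (required) and the catalog (optional) as dicts."""
    manifest_file = Path(manifest_path)
    if not manifest_file.is_file():
        # The manifest is mandatory, so a missing file is an error.
        raise FileNotFoundError(
            f"Manifest not found at {manifest_file}; run `dbt compile` first."
        )
    manifest = json.loads(manifest_file.read_text(encoding="utf-8"))

    catalog_file = Path(catalog_path)
    # The catalog is optional: continue with None when it is absent.
    catalog = (
        json.loads(catalog_file.read_text(encoding="utf-8"))
        if catalog_file.is_file()
        else None
    )
    return manifest, catalog
```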

**Manual Execution:**

To manually run all pre-commit hooks on a repository:

.. code-block:: shell

pre-commit run --all-files

To run individual hooks:

.. code-block:: shell

pre-commit run datapilot_run_dbt_checks

**Troubleshooting:**

- **Authentication Issues**: Ensure your token and instance name are correctly set
- **Empty Config Files**: The hook will fail if your config file is empty or invalid
- **Missing Manifest File**: Ensure you have run ``dbt compile`` or ``dbt run`` to generate the ``manifest.json`` file
- **Missing Catalog File**: Run ``dbt docs generate`` to create the ``catalog.json`` file (optional but recommended)
- **No Changes**: If no relevant files have changed, the hook will skip execution
- **Network Issues**: Ensure you have access to the DataPilot API

Best Practices
--------------

1. **Use Version Tags**: Always specify a version tag in the ``rev`` field, never use ``main``
2. **Environment Variables**: Use environment variables for sensitive information like tokens
3. **Configuration Files**: Create a dedicated config file for your project settings
4. **Regular Updates**: Update to new versions when they become available
5. **Team Coordination**: Ensure all team members use the same configuration

Example Complete Setup
----------------------

Here's a complete example of a ``.pre-commit-config.yaml`` file:

.. code-block:: yaml

# .pre-commit-config.yaml
exclude: '^(\.tox|ci/templates|\.bumpversion\.cfg)(/|$)'

repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.1.14
hooks:
- id: ruff
args: [--fix, --exit-non-zero-on-fix, --show-fixes]

- repo: https://github.com/psf/black
rev: 23.12.1
hooks:
- id: black

- repo: https://github.com/AltimateAI/datapilot-cli
rev: v0.0.27
hooks:
- id: datapilot_run_dbt_checks
args: [
"--config-path", "./datapilot-config.yaml",
"--token", "${DATAPILOT_TOKEN}",
"--instance-name", "${DATAPILOT_INSTANCE}",
"--manifest-path", "./target/manifest.json",
"--catalog-path", "./target/catalog.json"
]

Feedback and Contributions
--------------------------