Commit 83928a1

feat: DRC-1868
- Refactor CI/CD Documentation
- Create a section for GitHub, move ci/cd setup docs there
- Create a section for GitLab. Create docs for GitLab.
- Create getting-started.md to cover choices

Signed-off-by: Jared Scott <[email protected]>
1 parent 99cbd26 commit 83928a1

9 files changed

+1109
-11
lines changed

docs/7-cicd/getting-started.md

Lines changed: 92 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,92 @@
1+
---
2+
title: Getting Started
3+
---
4+
5+
# Getting Started
6+
7+
Automate data validation in your development workflow. Catch data issues before they reach production with continuous integration and delivery built specifically for dbt projects.
8+
9+
## What you'll achieve
10+
11+
Set up automated workflows that:
12+
13+
- **Maintain current baselines** - Auto-update comparison baselines on every merge to main
14+
- **Validate every PR/MR** - Run data validation checks automatically when changes are proposed
15+
- **Prevent regressions** - Catch data quality issues before they reach production
16+
- **Save team time** - Eliminate manual validation steps for every change
17+
18+
## Understanding CI vs CD
19+
20+
Recce uses both continuous integration and continuous delivery to automate data validation:
21+
22+
**Continuous Integration (CI)**
23+
24+
- **When**: Runs on every PR/MR update
25+
- **Purpose**: Validates proposed changes against baseline
26+
- **Benefit**: Catches issues before merge, with results in your PR/MR
27+
28+
**Continuous Delivery (CD)**
29+
30+
- **When**: Runs after merge to main branch
31+
- **Purpose**: Updates your baseline Recce session with latest production state
32+
- **Benefit**: Ensures future comparisons use current baseline
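
The CI/CD split above can be sketched as a small shell helper. This is only an illustration, assuming GitHub Actions, where the default environment variables `GITHUB_EVENT_NAME` and `GITHUB_REF` describe the trigger, and assuming `main` is the production branch. The function name `workflow_kind` is hypothetical and is not part of Recce.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a trigger to the kind of Recce workflow to run.
# Usage: workflow_kind EVENT REF  -> prints "ci", "cd", or "skip"
workflow_kind() {
  local event="$1" ref="$2"
  case "${event}" in
    pull_request)
      # PR opened or updated: validate the change against the baseline.
      echo "ci"
      ;;
    push)
      # Only a push (merge) to main should refresh the baseline.
      if [ "${ref}" = "refs/heads/main" ]; then
        echo "cd"
      else
        echo "skip"
      fi
      ;;
    *)
      echo "skip"
      ;;
  esac
}
```

Inside a GitHub Actions step this could be called as `workflow_kind "${GITHUB_EVENT_NAME}" "${GITHUB_REF}"`. In practice the `on:` block of each workflow file usually makes this decision, so the helper is mainly a way to read the two triggers side by side.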
33+
34+
## Choose your platform
35+
36+
Recce integrates with both GitHub Actions and GitLab CI/CD.
37+
38+
Select your Git platform to get started:
39+
40+
### GitHub
41+
If your dbt project uses GitHub:
42+
43+
1. [Setup CI](./github/setup-ci.md) - Auto-validate changes in every PR
44+
2. [Setup CD](./github/setup-cd.md) - Auto-update baseline on merge to main
45+
3. [Open Source Setup](./github/scenario-ci.md) - Alternative approach for open source projects
46+
47+
### GitLab
48+
If your dbt project uses GitLab:
49+
50+
1. [Setup CI](./gitlab/setup-ci.md) - Auto-validate changes in every MR
51+
2. [Setup CD](./gitlab/setup-cd.md) - Auto-update baseline on merge to main
52+
3. [GitLab Personal Access Token Guide](./gitlab/gitlab-pat-guide.md) - Required for GitLab integration
53+
54+
!!! note
55+
    CI/CD automation requires the Recce Cloud Team plan. A free trial is available.
56+
57+
## Prerequisites
58+
59+
Before setting up, ensure you have:
60+
61+
- **Recce Cloud account** with Team plan or free trial
62+
- **Repository connected** to Recce Cloud ([setup guide](../2-getting-started/start-free-with-cloud.md#git-integration))
63+
- **dbt artifacts** (`manifest.json` and `catalog.json`) from your project
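
As a quick way to confirm the last prerequisite, the sketch below checks that both artifact files exist and are non-empty. It assumes the artifacts live in dbt's default `target/` directory. `check_dbt_artifacts` is a hypothetical helper name, not a dbt or Recce command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that the dbt artifacts Recce needs are present.
# Usage: check_dbt_artifacts [DIR]   (DIR defaults to "target")
check_dbt_artifacts() {
  local dir="${1:-target}"
  local missing=0
  local name
  for name in manifest.json catalog.json; do
    # -s is true only when the file exists and is not empty.
    if [ ! -s "${dir}/${name}" ]; then
      echo "missing or empty: ${dir}/${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

A typical use would be `dbt docs generate && check_dbt_artifacts target`, so that a pipeline stops early when an artifact was not produced.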
64+
65+
## Architecture overview
66+
67+
Both CI and CD workflows follow the same pattern:
68+
69+
1. **Trigger event** (merge to main, or PR/MR opened/updated)
70+
2. **Generate dbt artifacts** (`dbt docs generate` or external source)
71+
3. **Upload to Recce Cloud** (automatic via workflow action)
72+
4. **Validation results** appear in Recce dashboard and PR/MR
73+
74+
<figure markdown>
75+
![Recce CI/CD architecture](../assets/images/7-cicd/ci-cd.png){: .shadow}
76+
<figcaption>Automated validation workflow for pull requests</figcaption>
77+
</figure>
78+
79+
## Next steps
80+
81+
1. Choose your platform (GitHub or GitLab)
82+
2. Start with CD setup to establish baseline updates
83+
3. Add CI setup to enable PR/MR validation
84+
4. Review [best practices](./best-practices-prep-env.md) for environment preparation
85+
86+
## Related workflows
87+
88+
After setting up CI/CD automation, explore these workflow guides:
89+
90+
- [Development workflow](./scenario-dev.md) - Validate changes during development
91+
- [PR/MR review workflow](./scenario-pr-review.md) - Collaborate on validation results
92+
- [Preset checks](./preset-checks.md) - Configure automatic validation checks

docs/7-cicd/github/scenario-ci.md

Lines changed: 211 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,211 @@
1+
---
2+
title: Setup CI in Open Source
3+
---
4+
5+
# Recce CI integration with GitHub Actions
6+
7+
Recce provides the `recce run` command for CI/CD pipelines. You can integrate Recce with GitHub Actions (or other CI tools) to compare data models between two environments whenever a pull request is opened or updated. The image below shows the basic architecture.
8+
9+
![ci/cd architecture](/assets/images/7-cicd/ci-cd.png){: .shadow}
10+
11+
The following guide demonstrates how to configure Recce in GitHub Actions.
12+
13+
## Prerequisites
14+
15+
Before integrating Recce with GitHub Actions, you will need to configure the following items:
16+
17+
- Set up **two environments** in your data warehouse, for example one for the base branch and one for pull requests.
18+
19+
- Provide a **credentials profile** for both environments so that Recce can access your data warehouse. You can supply the credentials either directly in `profiles.yml` or through environment variables.
20+
21+
- Set up the **data warehouse credentials** in your [GitHub repository secrets](https://docs.github.com/en/actions/reference/encrypted-secrets).
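
If the credentials are passed as environment variables, a workflow step can fail fast when one of them is missing. The following is a minimal sketch under that assumption. `require_env` is a hypothetical helper, and the variable names in the usage note are placeholders for whatever your `profiles.yml` expects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail when any named environment variable is unset or empty.
# Usage: require_env NAME [NAME...]
require_env() {
  local name missing=0
  for name in "$@"; do
    # printenv prints nothing for a variable that is unset or empty.
    if [ -z "$(printenv "${name}")" ]; then
      echo "environment variable ${name} is not set" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

For example, `require_env DBT_USER DBT_PASSWORD || exit 1` placed before the first `dbt` command would stop the job with a readable message instead of a warehouse connection error. Only the variable names are printed, never their values.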
22+
23+
## Set up Recce with GitHub Actions
24+
25+
We suggest setting up two GitHub Actions workflows in your GitHub repository: one for the base environment and another for the PR environment.
26+
27+
- **Base environment workflow**: Triggered on every merge to the `main` branch. This ensures that base artifacts are readily available when a PR is opened.
28+
29+
- **PR environment workflow**: Triggered on every push to the pull request branch. This workflow compares the base models with the current PR environment.
30+
31+
### Base Workflow (Main Branch)
32+
33+
This workflow will perform the following actions:
34+
35+
1. Run dbt on the base environment
36+
2. Upload the generated dbt artifacts to [GitHub workflow artifacts](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts) for later use
37+
38+
```yaml
name: Recce CI Base Branch

on:
  workflow_dispatch:
  push:
    branches:
      - main

concurrency:
  group: recce-ci-base
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.10.x"

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run DBT
        run: |
          dbt deps
          dbt seed --target ${{ env.DBT_BASE_TARGET }}
          dbt run --target ${{ env.DBT_BASE_TARGET }}
          dbt docs generate --target ${{ env.DBT_BASE_TARGET }}
        env:
          DBT_BASE_TARGET: "prod"

      - name: Upload DBT Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: target
          path: target/
```
82+
83+
!!! note
84+
85+
    Save the file above as `.github/workflows/dbt_base.yml`. The PR workflow in the next section refers to this path, so if you save the file elsewhere, update the `WORKFLOW_BASE` value in that workflow to match.
86+
87+
### PR Workflow (Pull Request Branch)
88+
89+
This workflow will perform the following actions:
90+
91+
1. Run dbt on the PR environment.
92+
2. Download the previously generated base artifacts from the base workflow.
93+
3. Use Recce to compare the PR environment with the downloaded base artifacts.
94+
<!-- 4. Use Recce to generate the summary of the current changes and post it as a comment on the pull request. Please refer to the [Recce Summary](./recce-summary.md) for more information. -->
95+
96+
````yaml
name: Recce CI PR Branch

on:
  pull_request:
    branches: [main]

jobs:
  check-pull-request:
    name: Check pull request by Recce CI
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Merge Base Branch into PR
        uses: DataRecce/PR-Update@v1
        with:
          baseBranch: ${{ github.event.pull_request.base.ref }}
          autoMerge: false
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10.x"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install recce
      - name: Prepare dbt Base environment
        run: |
          gh repo set-default ${{ github.repository }}
          base_branch=${{ github.base_ref }}
          run_id=$(gh run list --workflow ${WORKFLOW_BASE} --branch ${base_branch} --status success --limit 1 --json databaseId --jq '.[0].databaseId')
          echo "Download artifacts from run $run_id"
          gh run download ${run_id} -n target -D target-base
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          WORKFLOW_BASE: ".github/workflows/dbt_base.yml"
      - name: Prepare dbt Current environment
        run: |
          git checkout ${{ github.event.pull_request.head.sha }}
          dbt deps
          dbt seed --target ${{ env.DBT_CURRENT_TARGET}}
          dbt run --target ${{ env.DBT_CURRENT_TARGET}}
          dbt docs generate --target ${{ env.DBT_CURRENT_TARGET}}
        env:
          DBT_CURRENT_TARGET: "dev"

      - name: Run Recce CI
        run: |
          recce run --github-pull-request-url ${{ github.event.pull_request.html_url }}

      - name: Upload DBT Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: target
          path: target/

      - name: Upload Recce State File
        uses: actions/upload-artifact@v4
        id: recce-artifact-uploader
        with:
          name: recce-state-file
          path: recce_state.json
````
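
The "Prepare dbt Base environment" step assumes that a successful base run already exists. If none does, `run_id` will not hold a usable ID and the download fails with a less obvious error. A small guard along the following lines could make that case explicit. `is_run_id` is a hypothetical helper, and the commented usage repeats the commands from the workflow above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the argument looks like a numeric run ID.
is_run_id() {
  case "$1" in
    '' | *[!0-9]*) return 1 ;;  # empty, "null", or anything non-numeric
    *) return 0 ;;
  esac
}

# Possible use inside the workflow step (shown as comments, not executed here):
#   run_id=$(gh run list --workflow "${WORKFLOW_BASE}" --branch "${base_branch}" \
#     --status success --limit 1 --json databaseId --jq '.[0].databaseId')
#   is_run_id "${run_id}" || { echo "No successful base run found" >&2; exit 1; }
#   gh run download "${run_id}" -n target -D target-base
```

Triggering the base workflow once through `workflow_dispatch` before opening the first PR is one way to make sure a base artifact exists.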
162+
<!--
163+
- name: Prepare Recce Summary
164+
id: recce-summary
165+
run: |
166+
recce summary recce_state.json > recce_summary.md
167+
cat recce_summary.md >> $GITHUB_STEP_SUMMARY
168+
echo '${{ env.NEXT_STEP_MESSAGE }}' >> recce_summary.md
169+
170+
# Handle the case when the recce summary is too long to be displayed in the GitHub PR comment
171+
if [[ `wc -c recce_summary.md | awk '{print $1}'` -ge '65535' ]]; then
172+
echo '# Recce Summary
173+
The recce summary is too long to be displayed in the GitHub PR comment.
174+
Please check the summary detail in the [Job Summary](${{github.server_url}}/${{github.repository}}/actions/runs/${{github.run_id}}) page.
175+
${{ env.NEXT_STEP_MESSAGE }}' > recce_summary.md
176+
fi
177+
178+
env:
179+
NEXT_STEP_MESSAGE: |
180+
## Next Steps
181+
If you want to check more detail information about the recce result, please download the [artifact](${{ steps.recce-artifact-uploader.outputs.artifact-url }}) file and open it by [Recce](https://pypi.org/project/recce/) CLI.
182+
183+
### How to check the recce result
184+
```bash
185+
# Unzip the downloaded artifact file
186+
tar -xf recce-state-file.zip
187+
188+
# Launch the recce server based on the state file
189+
recce server --review recce_state.json
190+
191+
# Open the recce server http://localhost:8000 by your browser
192+
```
193+
194+
- name: Comment on pull request
195+
uses: thollander/actions-comment-pull-request@v2
196+
with:
197+
filePath: recce_summary.md
198+
comment_tag: recce
199+
-->
200+
201+
202+
## Review the Recce State File
203+
204+
Review the downloaded Recce [state file](../../8-technical-concepts/state-file.md) with the following command:
205+
206+
```bash
207+
recce server --review recce_state.json
208+
```
209+
210+
In `--review` mode, the Recce server shows the comparison results for data models between the base and current environments, including the row counts of modified models.
211+
<!-- and the results of any Recce [Preset Checks](./preset-checks.md). -->
