Commit d335b09

Merge pull request #49 from DataRecce/feature/drc-1868-support-cicd-doc-for-non-github
feat: DRC-1868
2 parents 99cbd26 + 1af1b35 commit d335b09

12 files changed

+1120
-20
lines changed

docs/2-getting-started/start-free-with-cloud.md

Lines changed: 7 additions & 5 deletions
@@ -172,11 +172,13 @@ Set up CI/CD to automatically upload metadata and run validation checks on every
172172

173173
See the CI/CD sections for complete setup guides:
174174

175-
- [Setup CD](/7-cicd/setup-cd/)
176-
- [Setup CI](/7-cicd/setup-ci/)
177-
178-
- GitHub integration configured
179-
- Team plan subscription or free trial
175+
- [Getting Started with CI/CD](../7-cicd/ci-cd-getting-started.md)
176+
- GitHub CI/CD
177+
- [Setup CI for GitHub](../7-cicd/github/setup-ci.md)
178+
- [Setup CD for GitHub](../7-cicd/github/setup-cd.md)
179+
- GitLab CI/CD
180+
- [Setup CI for GitLab](../7-cicd/gitlab/setup-ci.md)
181+
- [Setup CD for GitLab](../7-cicd/gitlab/setup-cd.md)
180182

181183
### Automation Benefits
182184

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
1+
---
2+
title: CI/CD Getting Started
3+
---
4+
5+
# CI/CD Getting Started
6+
7+
Automate data validation in your development workflow. Catch data issues before they reach production with continuous integration and delivery built specifically for dbt projects.
8+
9+
## What you'll achieve
10+
11+
Set up automated workflows that:
12+
13+
- **Maintain current baselines** - Auto-update comparison baselines on every merge to main
14+
- **Validate every PR/MR** - Run data validation checks automatically when changes are proposed
15+
- **Prevent regressions** - Catch data quality issues before they reach production
16+
- **Save team time** - Eliminate manual validation steps for every change
17+
18+
!!!note
19+
    CI/CD automation requires a Recce Cloud Team plan. A free trial is available.
20+
21+
## Understanding CI vs CD
22+
23+
Recce uses both continuous integration and continuous delivery to automate data validation:
24+
25+
**Continuous Integration (CI)**
26+
27+
- **When**: Runs on every PR/MR update
28+
- **Purpose**: Validates proposed changes against baseline
29+
- **Benefit**: Catches issues before merge, with results in your PR/MR
30+
31+
**Continuous Delivery (CD)**
32+
33+
- **When**: Runs after merge to main branch
34+
- **Purpose**: Updates your baseline Recce session with latest production state
35+
- **Benefit**: Ensures future comparisons use current baseline
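
The split between the two stages can be sketched as a small shell helper. This is an illustration only, not part of Recce: the function name `recce_stage` is hypothetical, and it assumes the default branch is named `main`. It takes the event name and branch reported by the CI system (for example `GITHUB_EVENT_NAME` and `GITHUB_REF_NAME` on GitHub Actions, or `CI_PIPELINE_SOURCE` and `CI_COMMIT_BRANCH` on GitLab).

```bash
# Hypothetical helper: map a pipeline trigger to the stage described above.
# $1 = event name, $2 = branch name. Assumes the default branch is "main".
recce_stage() {
  event="$1"
  branch="$2"
  case "$event" in
    pull_request|merge_request_event)
      echo "ci"    # PR/MR opened or updated: validate against the baseline
      ;;
    push)
      if [ "$branch" = "main" ]; then
        echo "cd"  # merge to main: refresh the baseline
      else
        echo "none"
      fi
      ;;
    *)
      echo "none"
      ;;
  esac
}
```

For example, `recce_stage push main` selects the CD stage, while `recce_stage pull_request my-feature` selects CI.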
36+
37+
## Choose your platform
38+
39+
Recce integrates with both GitHub Actions and GitLab CI/CD.
40+
41+
Select your Git platform to get started:
42+
43+
### GitHub
44+
If your dbt project uses GitHub:
45+
46+
1. [Setup CI](./github/setup-ci.md) - Auto-validate changes in every PR
47+
2. [Setup CD](./github/setup-cd.md) - Auto-update baseline on merge to main
48+
49+
### GitLab
50+
If your dbt project uses GitLab:
51+
52+
1. [Setup CI](./gitlab/setup-ci.md) - Auto-validate changes in every MR
53+
2. [Setup CD](./gitlab/setup-cd.md) - Auto-update baseline on merge to main
54+
3. [GitLab Personal Access Token Guide](./gitlab/gitlab-pat-guide.md) - Required for GitLab integration
55+
56+
## Prerequisites
57+
58+
Before setting up, ensure you have:
59+
60+
- **Recce Cloud account** with Team plan or free trial
61+
- **Repository connected** to Recce Cloud ([setup guide](../2-getting-started/start-free-with-cloud.md#git-integration))
62+
- **dbt artifacts** (`manifest.json` and `catalog.json`) from your project
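
Before wiring up a pipeline, it can help to confirm that both artifacts are actually present. The following is a minimal sketch, not a Recce command: the function name `check_dbt_artifacts` is hypothetical, and it assumes the artifacts live in dbt's default `target/` directory unless another path is passed.

```bash
# Hypothetical pre-flight check: succeed only if both dbt artifacts exist
# and are non-empty in the given directory (defaults to ./target).
check_dbt_artifacts() {
  dir="${1:-target}"
  for f in manifest.json catalog.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      return 1
    fi
  done
  echo "dbt artifacts found in $dir"
}
```

Running `dbt docs generate` first and then `check_dbt_artifacts` gives an early, readable failure if either file was not produced.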
63+
64+
## Architecture overview
65+
66+
Both CI and CD workflows follow the same pattern:
67+
68+
1. **Trigger event** (merge to main, or PR/MR opened/updated)
69+
2. **Generate dbt artifacts** (`dbt docs generate` or external source)
70+
3. **Upload to Recce Cloud** (automatic via workflow action)
71+
4. **Validation results** appear in Recce dashboard and PR/MR
72+
73+
<figure markdown>
74+
![Recce CI/CD architecture](../assets/images/7-cicd/ci-cd.png){: .shadow}
75+
<figcaption>Automated validation workflow for pull requests</figcaption>
76+
</figure>
77+
78+
## Next steps
79+
80+
1. Choose your platform (GitHub or GitLab)
81+
2. Start with CD setup to establish baseline updates
82+
3. Add CI setup to enable PR/MR validation
83+
4. Review [best practices](./best-practices-prep-env.md) for environment preparation
84+
85+
## Related workflows
86+
87+
After setting up CI/CD automation, explore these workflow guides:
88+
89+
- [Development workflow](./scenario-dev.md) - Validate changes during development
90+
- [PR/MR review workflow](./scenario-pr-review.md) - Collaborate on validation results
91+
- [Preset checks](./preset-checks.md) - Configure automatic validation checks

docs/7-cicd/github/scenario-ci.md

Lines changed: 211 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,211 @@
1+
---
2+
title: Setup CI in Open Source
3+
---
4+
5+
# Recce CI integration with GitHub Actions
6+
7+
Recce provides the `recce run` command for CI/CD pipelines. You can integrate Recce with GitHub Actions (or other CI tools) to compare data models between two environments whenever a pull request is opened. The image below shows the basic architecture.
8+
9+
![ci/cd architecture](/assets/images/7-cicd/ci-cd.png){: .shadow}
10+
11+
The following guide demonstrates how to configure Recce in GitHub Actions.
12+
13+
## Prerequisites
14+
15+
Before integrating Recce with GitHub Actions, you will need to configure the following items:
16+
17+
- Set up **two environments** in your data warehouse. For example, one for base and another for pull request.
18+
19+
- Provide a **credentials profile** for both environments so that Recce can access your data warehouse. You can store the credentials in your `profiles.yml` file or supply them through environment variables.
20+
21+
- Set up the **data warehouse credentials** in your [GitHub repository secrets](https://docs.github.com/en/actions/reference/encrypted-secrets).
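
One way to combine these prerequisites is a `profiles.yml` that defines both targets and reads credentials through dbt's `env_var()` function. The sketch below is illustrative only: the helper name `write_ci_profiles`, the profile name `my_project`, the `postgres` adapter, the database name, and the `DBT_HOST`, `DBT_USER`, and `DBT_PASSWORD` variable names are all assumptions to replace with your own values. The target names `prod` and `dev` match the workflows in this guide.

```bash
# Hypothetical helper: write a profiles.yml with a base (prod) and a PR (dev)
# target into the given directory. Credentials are resolved at run time through
# dbt's env_var() function, so no secret is stored in the file.
write_ci_profiles() {
  dir="${1:-.}"
  cat > "$dir/profiles.yml" <<'EOF'
my_project:            # assumption: must match `profile:` in dbt_project.yml
  target: dev
  outputs:
    prod:
      type: postgres   # assumption: replace with your warehouse adapter
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: prod
    dev:
      type: postgres
      host: "{{ env_var('DBT_HOST') }}"
      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5432
      dbname: analytics
      schema: dev
EOF
}
```

In a workflow, the environment variables would typically be mapped from repository secrets in the step's `env:` block, so the same file works for both the base and PR runs.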
22+
23+
## Set up Recce with GitHub Actions
24+
25+
We suggest setting up two GitHub Actions workflows in your GitHub repository: one for the base environment and another for the PR environment.
26+
27+
- **Base environment workflow**: Triggered on every merge to the `main` branch. This ensures that base artifacts are readily available when a PR is opened.
28+
29+
- **PR environment workflow**: Triggered on every push to a pull request branch. This workflow compares the base models with the current PR environment.
30+
31+
### Base Workflow (Main Branch)
32+
33+
This workflow will perform the following actions:
34+
35+
1. Run dbt on the base environment
36+
2. Upload the generated dbt artifacts to [GitHub workflow artifacts](https://docs.github.com/en/actions/using-workflows/storing-workflow-data-as-artifacts) for later use
37+
38+
```yaml
name: Recce CI Base Branch

on:
  workflow_dispatch:
  push:
    branches:
      - main

concurrency:
  group: recce-ci-base
  cancel-in-progress: true

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.10.x"

      - name: Install dependencies
        run: |
          pip install -r requirements.txt

      - name: Run DBT
        run: |
          dbt deps
          dbt seed --target ${{ env.DBT_BASE_TARGET }}
          dbt run --target ${{ env.DBT_BASE_TARGET }}
          dbt docs generate --target ${{ env.DBT_BASE_TARGET }}
        env:
          DBT_BASE_TARGET: "prod"

      - name: Upload DBT Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: target
          path: target/
```
82+
83+
!!! note
84+
85+
    Save the above file as `.github/workflows/dbt_base.yml`. The PR workflow in the next step refers to this path, so if you save it elsewhere, update the path there as well.
86+
87+
### PR Workflow (Pull Request Branch)
88+
89+
This workflow will perform the following actions:
90+
91+
1. Run dbt on the PR environment.
92+
2. Download the base artifacts previously generated by the base workflow.
93+
3. Use Recce to compare the PR environment with the downloaded base artifacts.
94+
<!-- 4. Use Recce to generate the summary of the current changes and post it as a comment on the pull request. Please refer to the [Recce Summary](./recce-summary.md) for more information. -->
95+
96+
````yaml
name: Recce CI PR Branch

on:
  pull_request:
    branches: [main]

jobs:
  check-pull-request:
    name: Check pull request by Recce CI
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          fetch-depth: 0
      - name: Merge Base Branch into PR
        uses: DataRecce/PR-Update@v1
        with:
          baseBranch: ${{ github.event.pull_request.base.ref }}
          autoMerge: false
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.10.x"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install recce
      - name: Prepare dbt Base environment
        run: |
          gh repo set-default ${{ github.repository }}
          base_branch=${{ github.base_ref }}
          run_id=$(gh run list --workflow ${WORKFLOW_BASE} --branch ${base_branch} --status success --limit 1 --json databaseId --jq '.[0].databaseId')
          echo "Download artifacts from run $run_id"
          gh run download ${run_id} -n target -D target-base
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          WORKFLOW_BASE: ".github/workflows/dbt_base.yml"
      - name: Prepare dbt Current environment
        run: |
          git checkout ${{ github.event.pull_request.head.sha }}
          dbt deps
          dbt seed --target ${{ env.DBT_CURRENT_TARGET }}
          dbt run --target ${{ env.DBT_CURRENT_TARGET }}
          dbt docs generate --target ${{ env.DBT_CURRENT_TARGET }}
        env:
          DBT_CURRENT_TARGET: "dev"

      - name: Run Recce CI
        run: |
          recce run --github-pull-request-url ${{ github.event.pull_request.html_url }}

      - name: Upload DBT Artifacts
        uses: actions/upload-artifact@v4
        with:
          name: target
          path: target/

      - name: Upload Recce State File
        uses: actions/upload-artifact@v4
        id: recce-artifact-uploader
        with:
          name: recce-state-file
          path: recce_state.json
````
162+
<!--
163+
- name: Prepare Recce Summary
164+
id: recce-summary
165+
run: |
166+
recce summary recce_state.json > recce_summary.md
167+
cat recce_summary.md >> $GITHUB_STEP_SUMMARY
168+
echo '${{ env.NEXT_STEP_MESSAGE }}' >> recce_summary.md
169+
170+
# Handle the case when the recce summary is too long to be displayed in the GitHub PR comment
171+
if [[ `wc -c recce_summary.md | awk '{print $1}'` -ge '65535' ]]; then
172+
echo '# Recce Summary
173+
The recce summary is too long to be displayed in the GitHub PR comment.
174+
Please check the summary detail in the [Job Summary](${{github.server_url}}/${{github.repository}}/actions/runs/${{github.run_id}}) page.
175+
${{ env.NEXT_STEP_MESSAGE }}' > recce_summary.md
176+
fi
177+
178+
env:
179+
NEXT_STEP_MESSAGE: |
180+
## Next Steps
181+
For more detailed information about the Recce result, download the [artifact](${{ steps.recce-artifact-uploader.outputs.artifact-url }}) file and open it with the [Recce](https://pypi.org/project/recce/) CLI.
182+
183+
### How to check the recce result
184+
```bash
185+
# Unzip the downloaded artifact file
186+
unzip recce-state-file.zip
187+
188+
# Launch the recce server based on the state file
189+
recce server --review recce_state.json
190+
191+
# Open the recce server at http://localhost:8000 in your browser
192+
```
193+
194+
- name: Comment on pull request
195+
uses: thollander/actions-comment-pull-request@v2
196+
with:
197+
filePath: recce_summary.md
198+
comment_tag: recce
199+
-->
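
The "Prepare dbt Base environment" step assumes that a successful base workflow run already exists. If the base workflow has never succeeded, the lookup yields no usable run ID and the download fails with a less obvious error. A minimal sketch of a guard follows; the helper name `require_run_id` is hypothetical, and it treats both an empty value and the literal `null` as "no run found" because the exact output in that case is an assumption here.

```bash
# Hypothetical guard: print the run id only if it looks like a real one.
# Rejects an empty value, the literal "null", and anything non-numeric.
require_run_id() {
  run_id="$1"
  case "$run_id" in
    ""|null)
      echo "No successful base workflow run found; run the base workflow first." >&2
      return 1
      ;;
    *[!0-9]*)
      echo "Unexpected run id: $run_id" >&2
      return 1
      ;;
  esac
  echo "$run_id"
}
```

In the workflow step, this could be called as `run_id=$(require_run_id "$run_id") || exit 1` just before `gh run download`.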
200+
201+
202+
## Review the Recce State File
203+
204+
After downloading the Recce [state file](../../8-technical-concepts/state-file.md) artifact from the workflow run, review it with the following command:
205+
206+
```bash
207+
recce server --review recce_state.json
208+
```
209+
210+
In `--review` mode, the Recce server shows the comparison results for data models between the base and current environments, including the row counts of modified models.
211+
<!-- and the results of any Recce [Preset Checks](./preset-checks.md). -->
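
If the state file is missing or empty, for example because the artifact was not extracted into the current directory, a small wrapper can report that before the server is started. This is a sketch under assumptions: the function name `review_state` and the `RECCE_BIN` override variable are hypothetical conveniences, not Recce features. Only the `recce server --review` invocation comes from this guide.

```bash
# Hypothetical wrapper: refuse to start the review server when the state
# file is missing or empty. RECCE_BIN is an assumption of this sketch that
# lets you substitute the executable; it defaults to `recce`.
review_state() {
  state="${1:-recce_state.json}"
  if [ ! -s "$state" ]; then
    echo "state file not found or empty: $state" >&2
    return 1
  fi
  "${RECCE_BIN:-recce}" server --review "$state"
}
```

Calling `review_state` with no arguments looks for `recce_state.json` in the current directory, matching the command above.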
