Commit 898dbab (1 parent: 8c3ef6f)

**Add quality engineering framework for PostgresAI products**

Introduce a comprehensive quality framework covering DBLab SE/EE and the platform UI with three layers: automated quality gates, AI-augmented review and testing, and human judgment for architecture and safety.

The framework includes:

- Quality engineering guide with PostgreSQL-specific standards
- PR review checklist with critical safety checks
- Release readiness checklist for all release artifacts
- AI system prompts for automated PR review and test generation
- CI quality gate definitions (coverage, race detection, complexity, vulnerability scanning, performance benchmarks)
- Weekly quality metrics tracking template
- Local quality check scripts for pre-push verification

https://claude.ai/code/session_01BMygsj1Bb967LXm7guNEBS

File tree: 9 files changed (+1163, -0 lines)
# Quality Engineering Guide

## PostgresAI / Database Lab Engine

This document defines the quality engineering standards, processes, and workflows for all PostgresAI products, including Database Lab Engine (SE/EE) and platform UI components.

---

## Table of Contents

- [Core Philosophy](#core-philosophy)
- [Quality Layers](#quality-layers)
  - [Layer 1: Automated Foundation](#layer-1-automated-foundation)
  - [Layer 2: AI-Augmented Quality](#layer-2-ai-augmented-quality)
  - [Layer 3: Human Quality Decisions](#layer-3-human-quality-decisions)
- [Development Workflow](#development-workflow)
- [Weekly Quality Rhythm](#weekly-quality-rhythm)
- [PostgreSQL-Specific Quality Standards](#postgresql-specific-quality-standards)
- [Quality Metrics](#quality-metrics)
- [Trust-Critical Failure Modes](#trust-critical-failure-modes)

---
## Core Philosophy

**Quality as Code** -- quality engineering is embedded into the development workflow itself, with AI amplifying every contributor. There is no separate QA team; quality ownership stays with engineers.

Three layers:

1. **Automated quality gates** catch 80%+ of issues before any human sees them
2. **AI-assisted review and testing** handles exploratory and edge-case work
3. **Human judgment** is reserved for architecture decisions, customer-facing scenarios, and risk assessment

---
## Quality Layers

### Layer 1: Automated Foundation

All automated checks run as CI/CD pipeline stages. Every PR must pass all gates before merge.

#### 1.1 Static Analysis

| Check | Tool | Scope | When |
|-------|------|-------|------|
| Go linting | golangci-lint | `engine/` | every PR |
| Go formatting | gofmt/goimports | `engine/` | every PR |
| TypeScript linting | eslint | `ui/` | every PR |
| Style linting | stylelint | `ui/` | every PR |
| Spell checking | cspell | `ui/` | every PR |
| Secret scanning | gitleaks | repo-wide | pre-commit + CI |
| Security scanning | CodeQL | repo-wide | scheduled |
#### 1.2 Test Suites

| Suite | Command | When | Coverage |
|-------|---------|------|----------|
| Go unit tests | `make test` | every PR | all packages |
| Go integration tests | `make test-ci-integration` | MR pipeline | Docker-dependent |
| Bash integration tests | `engine/test/*.sh` | MR (PG 17-18), main (PG 9.6-18) | end-to-end flows |
| UI unit tests | `pnpm test` | every PR | React components |
| UI e2e tests | `pnpm cy:run` | every PR | Cypress flows |
| API contract tests | Newman/Postman | every PR | API endpoints |
#### 1.3 PostgreSQL Version Matrix

Full matrix on main branch; reduced set on feature branches to optimize pipeline time.

| Version | Feature Branch | Main Branch |
|---------|----------------|-------------|
| 9.6 | - | yes |
| 10-16 | - | yes |
| 17 | yes | yes |
| 18 | yes | yes |
#### 1.4 Build Verification

- All binaries (server, CLI, CI checker, RDS refresh) must build on every PR
- Docker images must build successfully
- Cross-platform CLI builds (darwin/linux/freebsd/windows, amd64/arm64) verified on main
#### 1.5 Performance Regression Detection

Automated benchmarks run on merge to main, with statistical comparison against baseline:

- Thin clone creation time
- Snapshot creation/restoration time
- API response latencies (p50, p95, p99)
- Memory usage under concurrent clone load

Track per-commit results in `quality/metrics/benchmarks/`.
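The baseline comparison can start as a simple tolerance check. The sketch below is a hypothetical helper (names and numbers are illustrative, not from the DLE codebase) that flags any metric exceeding its recorded baseline by more than the configured threshold:

```go
package main

import "fmt"

// regressed reports whether a benchmark result exceeds the recorded
// baseline by more than the allowed tolerance (e.g. 0.10 for a 10% gate).
// Hypothetical helper; the real pipeline would feed it per-commit results
// stored under quality/metrics/benchmarks/.
func regressed(baseline, current, tolerance float64) bool {
	return current > baseline*(1+tolerance)
}

func main() {
	// Illustrative numbers: clone creation in milliseconds.
	baselineMs, currentMs := 52_000.0, 58_500.0
	fmt.Println(regressed(baselineMs, currentMs, 0.10)) // prints "true"
}
```

A more rigorous version would compare distributions (e.g. via benchstat-style significance tests) rather than single numbers, but the gate logic stays the same.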
### Layer 2: AI-Augmented Quality

#### 2.1 AI-Assisted PR Review

Use the system prompt in `quality/prompts/pr-review-system-prompt.md` for automated review. The reviewer checks:

- PostgreSQL-specific correctness (connection handling, transaction safety, lock awareness)
- Error handling completeness (every DB operation must handle errors)
- Resource lifecycle (connections opened must be closed, advisory locks released)
- SQL safety (no raw concatenation, parameterized queries only)
- Concurrency safety (proper mutex usage, no data races)
- Configuration validation (bounds checking, sensible defaults)
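The concurrency-safety check is easiest to illustrate with the lock discipline it expects. A minimal sketch (the `cloneRegistry` type is hypothetical, not DLE's actual internals): every access to shared state goes through the mutex, so `go test -race` stays quiet even under concurrent use.

```go
package main

import (
	"fmt"
	"sync"
)

// cloneRegistry is a hypothetical in-memory index of active clones.
// All reads and writes of the shared map are guarded by mu.
type cloneRegistry struct {
	mu     sync.Mutex
	clones map[string]struct{}
}

func newCloneRegistry() *cloneRegistry {
	return &cloneRegistry{clones: make(map[string]struct{})}
}

func (r *cloneRegistry) Add(id string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.clones[id] = struct{}{}
}

func (r *cloneRegistry) Count() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.clones)
}

func main() {
	reg := newCloneRegistry()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			reg.Add(fmt.Sprintf("clone-%d", n))
		}(i)
	}
	wg.Wait()
	fmt.Println(reg.Count()) // prints "100", with no race under -race
}
```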
#### 2.2 AI-Assisted Test Generation

Use the prompt in `quality/prompts/test-generation-prompt.md` when implementing new features. The AI generates test cases covering:

- Normal operation path
- Empty/nil/zero-value inputs
- Boundary conditions
- Concurrent access scenarios
- PostgreSQL version-specific behavior
- Extension compatibility edge cases

The developer reviews, adjusts, and owns the generated tests.
#### 2.3 Spec-to-Test Pipeline

For features with written specs:

1. Write spec in markdown (feature description, expected behavior, edge cases)
2. Feed spec to AI to generate acceptance test skeletons
3. Developer reviews, fills in implementation-specific details
4. Tests become the executable spec -- spec and tests stay in sync
#### 2.4 Automated Issue Triage

When a bug is reported:

1. AI classifies severity (critical/high/medium/low)
2. Identifies likely affected components from stack traces and logs
3. Searches for related past issues
4. Drafts an initial investigation path
5. A human picks up with context already assembled
### Layer 3: Human Quality Decisions

Reserve human attention for:

- **Architecture reviews** for features touching data safety (clone creation, snapshot management, WAL interaction)
- **Customer scenario testing** before releases -- walk through key workflows manually:
  - "Clone a 500GB database in under 60 seconds"
  - "Run SAMO analysis on a production-like workload"
  - "Recover from a failed snapshot mid-operation"
- **Risk classification** for autonomous features -- every action that modifies PostgreSQL configuration or data needs a human-defined risk level and corresponding safety gate
- **Security review** for any code handling authentication, authorization, or direct SQL execution

---
## Development Workflow

### For Every Feature

```
1. Spec written
   -> reviewed by at least one other engineer
   -> fed to AI for gap analysis ("what failure modes aren't addressed?")

2. Implementation + tests
   -> developer writes code
   -> AI generates test scaffolding from spec
   -> developer refines tests
   -> target: 80%+ code coverage for new code

3. PR opened
   -> CI runs fast suite (unit tests, lint, build)
   -> AI runs PR review (see prompts/pr-review-system-prompt.md)
   -> human reviewer focuses on design and PostgreSQL correctness

4. Merge to main
   -> nightly full matrix runs
   -> performance benchmarks compared to baseline

5. Release candidate
   -> AI produces release readiness report (see checklists/)
   -> human does scenario walkthrough
   -> decision made
```
### PR Review Standards

Every PR must have:

- [ ] All CI checks passing (tests, lint, build)
- [ ] AI review completed with no unresolved critical findings
- [ ] At least one human approval
- [ ] New/modified code has corresponding tests
- [ ] No regression in test coverage
- [ ] Breaking API changes documented

See `quality/checklists/pr-review-checklist.md` for the full checklist.
### Commit Standards

- Present tense, imperative mood ("add feature", not "added feature")
- First line under 72 characters
- Detailed description in body when warranted
- All commits signed
- Reference related issues

---
## Weekly Quality Rhythm

### Monday

- Review test failures from weekend/nightly runs
- Triage any new issues reported over the weekend
- Review quality metrics dashboard for trends

### Wednesday (mid-week check)

- Review open PRs for stale reviews
- Check for flaky test patterns in recent CI runs
- Address any performance regression alerts

### Friday

- Quality retrospective: what slipped through this week?
  - Does a new test need to be added?
  - Does a CI check need tightening?
- Update quality metrics tracking

---
## PostgreSQL-Specific Quality Standards

These standards are non-negotiable given that DBLab interacts with production PostgreSQL instances.

### SQL Safety

- **No raw SQL concatenation.** All user-provided values must use parameterized queries.
- Every SQL query the product generates must be tested against `EXPLAIN ANALYZE` output for:
  - No sequential scans on large tables
  - No unexpected lock escalation
  - Appropriate index usage
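In Go's `database/sql`, the first rule means user input travels only as bound parameters, never concatenated into the query string. A sketch (the query, table name, and helper are illustrative, not DLE's actual code); the `placeholders` helper extends the same discipline to variable-length `IN`-lists:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"strings"
)

// fetchCloneStatus shows the required pattern: cloneID is passed as a
// bound parameter ($1), so it can never alter the SQL itself.
// (Compiles against database/sql; needs a live DB and driver to run.)
func fetchCloneStatus(ctx context.Context, db *sql.DB, cloneID string) (string, error) {
	var status string
	err := db.QueryRowContext(ctx,
		"SELECT status FROM clones WHERE id = $1", cloneID).Scan(&status)
	return status, err
}

// placeholders builds "$1, $2, ..." so even variable-length value sets
// stay parameterized instead of being concatenated.
func placeholders(n int) string {
	ps := make([]string, n)
	for i := range ps {
		ps[i] = fmt.Sprintf("$%d", i+1)
	}
	return strings.Join(ps, ", ")
}

func main() {
	fmt.Println(placeholders(3)) // prints "$1, $2, $3"
	_ = fetchCloneStatus         // not called here: requires a database
}
```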
### Connection Management

- Every database connection must have a timeout configured
- Connections must be returned to the pool after use (defer pattern)
- Graceful degradation under connection exhaustion
- Connection pool sizing must be configurable and documented
### Extension Compatibility

Maintain a first-class extension compatibility matrix in CI:

| Extension | Priority | Tested Versions |
|-----------|----------|-----------------|
| pg_stat_statements | critical | all PG versions |
| pg_stat_kcache | high | PG 14+ |
| auto_explain | high | all PG versions |
| PostGIS | medium | PG 14+ |
| pg_partman | medium | PG 14+ |
| pgvector | medium | PG 14+ |
### WAL and Replication Safety

Any feature touching WAL or replication requires specific tests for:

- Replica lag behavior
- Failover scenarios
- WAL segment cleanup
- Archive command compatibility
### Transaction Safety

- Document the expected transaction isolation level for every DB operation
- Test behavior under concurrent access
- Verify there are no long-running transactions that could cause table bloat
### Destructive Testing

Maintain a destructive testing harness that simulates:

- Disk full during clone/snapshot operations
- OOM conditions
- Network partition between DLE and PostgreSQL
- Kill/restart mid-clone
- Corrupt ZFS snapshot recovery

---
## Quality Metrics

Track these metrics continuously. See `quality/metrics/` for tracking templates.

### Code Quality

| Metric | Target | Measurement |
|--------|--------|-------------|
| Unit test coverage | >= 80% | `go test -cover` |
| Lint violations | 0 on main | golangci-lint |
| Cyclomatic complexity | < 30 per function | golangci-lint |

### Pipeline Health

| Metric | Target | Measurement |
|--------|--------|-------------|
| CI pass rate | >= 95% | pipeline analytics |
| Flaky test rate | < 2% | test result tracking |
| Pipeline duration (fast) | < 10 min | pipeline analytics |
| Pipeline duration (full) | < 45 min | pipeline analytics |

### Defect Tracking

| Metric | Target | Measurement |
|--------|--------|-------------|
| Mean time to detection | < 24 hours | issue timestamps |
| Escaped defects per release | < 3 | post-release tracking |
| Critical bug fix time | < 4 hours | issue resolution time |

### Performance

| Metric | Target | Measurement |
|--------|--------|-------------|
| Clone creation (100GB) | < 60s | benchmark suite |
| API response (p95) | < 200ms | benchmark suite |
| Snapshot creation | no regression > 5% | benchmark comparison |

---
## Trust-Critical Failure Modes

These are the top failure modes that would break customer trust. Each must have dedicated automated coverage with explicit test cases.

### 1. Data Loss During Clone

- **Risk**: thin clone corruption, snapshot staleness, ZFS pool failure
- **Coverage**: destructive test harness, snapshot integrity checks, clone verification tests
- **Gate**: block release if any clone integrity test fails

### 2. Incorrect Diagnostic Recommendation (SAMO)

- **Risk**: wrong index suggestion, incorrect bloat detection, false positive on lock contention
- **Coverage**: known-answer test suite against reference databases, cross-version validation
- **Gate**: all diagnostic outputs validated against expert-reviewed baselines
### 3. Silent Monitoring Failure

- **Risk**: metrics stop collecting without alerting, stale data presented as current
- **Coverage**: heartbeat tests for all monitoring components, staleness detection
- **Gate**: alerting on any monitoring gap > 5 minutes
### 4. Security Exposure

- **Risk**: authentication bypass, SQL injection, credential leakage in logs
- **Coverage**: CodeQL scanning, secret detection (gitleaks), parameterized query enforcement
- **Gate**: zero critical/high security findings

### 5. Performance Regression

- **Risk**: clone creation slowdown, API latency increase, memory leak
- **Coverage**: automated benchmark suite with statistical comparison
- **Gate**: no regression exceeding 10% from baseline on any key metric

---
## Getting Started Checklist

For the first month, prioritize:

- [ ] Set up AI-assisted PR review with PostgreSQL-specific system prompt
- [ ] Ensure all 5 trust-critical failure modes have dedicated test coverage
- [ ] Instrument quality metrics (test coverage, CI pass rate, benchmark trends)
- [ ] Run first weekly quality retrospective
- [ ] Validate extension compatibility matrix in CI
- [ ] Create destructive testing harness (start with disk-full and kill-mid-clone)
- [ ] Document performance baselines for clone creation and API latency

---

*This is a living document. Update it as quality standards evolve and new failure modes are discovered.*
