Commit 8a83138

docs: expand reasons for team workflow on red pipelines
1 parent 1ea7eb8 commit 8a83138

5 files changed

+201
-378
lines changed

content/MinimumCD/continuous-integration/_index.md

Lines changed: 2 additions & 2 deletions
@@ -17,7 +17,7 @@ CI is not just about tooling—it's fundamentally about team workflow and workin
1717
2. Work integrates to trunk at a minimum daily (each developer, every day)
1818
3. Work has automated testing before merge to trunk
1919
4. Work is tested with other work automatically on merge
20-
5. All feature work stops when the build is red
20+
5. [All feature work stops when the build is red](./all-feature-work-stops-when-the-build-is-red/)
2121
6. New work does not break delivered work
2222

2323
## Why This Matters
@@ -50,7 +50,7 @@ While CI depends on tooling, the team workflow and working agreement are more im
5050
4. **Trunk-based workflow**: All work begins from trunk and integrates to trunk at least daily
5151
5. **Stop-the-line**: If CI detects an error, the team stops feature work and collaborates to fix the build immediately
5252

53-
The stop-the-line practice is critical for maintaining an always-releasable trunk. For detailed guidance on implementing this discipline, see [Stop-the-Line Culture](./stop-the-line/).
53+
The stop-the-line practice is critical for maintaining an always-releasable trunk. For detailed guidance on implementing this discipline, see [All Feature Work Stops When the Build Is Red](./all-feature-work-stops-when-the-build-is-red/).
5454

5555
## Example Implementations
5656

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
1+
---
2+
title: All Feature Work Stops When the Build Is Red
3+
description: Why continuous delivery requires stopping all feature work when the build breaks—not just blocking merges. Learn the team mindset, practices, and working agreements that make this discipline effective.
4+
weight: 4
5+
type: docs
6+
---
7+
8+
When the trunk build breaks, the entire team stops feature work and collaborates to fix it immediately. This practice, borrowed from lean manufacturing's [Andon Cord](https://itrevolution.com/articles/kata/), prevents defects from propagating and maintains an always-releasable trunk.
9+
10+
Every team member shifts focus to:
11+
1. Understanding what broke
12+
2. Fixing the broken build
13+
3. Learning why it happened
14+
4. Preventing similar failures
15+
16+
No new feature work begins until the build is green again.
17+
18+
## Why ALL Work Stops, Not Just Merges
19+
20+
A common objection is: "Why stop all feature work? Just block merging until the pipeline is green."
21+
22+
This misses the point. Continuous Delivery is not just technology and workflow—it is a **mindset**. Part of that mindset is that individuals on the team do not have individual priorities. The **team** has priorities.
23+
24+
### Work Closer to Production Is Always More Valuable
25+
26+
Work that is closer to production is always more valuable than work that is further away. A broken pipeline is halting the most important work: getting tested, integrated changes to users. It is also blocking any hotfix the team may need to deploy.
27+
28+
When the build is red, fixing it is the team's highest priority. Not your feature. Not your story. The pipeline.
29+
30+
### "Just Block Merges" Creates a False Sense of Progress
31+
32+
If developers continue writing feature code while the build is broken:
33+
34+
- They are building on a foundation they cannot verify
35+
- Their work is accumulating integration risk with every passing minute
36+
- They are individually productive but the **team** is not delivering
37+
- The broken build becomes someone else's problem instead of everyone's priority
38+
- The incentive to fix the build urgently is removed—it can wait until someone wants to merge
39+
40+
This is the difference between individual activity and team effectiveness. A team where everyone is typing but nothing is shipping is not productive.
41+
42+
### This Is a Team Organization Problem
43+
44+
If the team is not organized to enable everyone to swarm on a broken build, that is a fundamental dysfunction. CD requires teams that:
45+
46+
- **Share ownership** of the pipeline and the codebase
47+
- **Prioritize collectively** rather than protecting individual work streams
48+
- **Can all contribute** to diagnosing and fixing build failures
49+
- **Treat the pipeline as the team's most critical asset**
50+
51+
A team that says "I'll keep working on my feature while someone else fixes the build" has not adopted the CD mindset. They are a group of individuals sharing a codebase, not a team practicing Continuous Delivery.
52+
53+
## What This Looks Like in Practice
54+
55+
### When the Team Stops
56+
57+
```
58+
09:15 - Build fails on trunk
59+
09:16 - Automated notification to team chat
60+
09:17 - Team acknowledges
61+
09:18 - Feature work pauses
62+
09:20 - Quick huddle: what broke?
63+
09:25 - Two devs pair on fix
64+
09:40 - Fix committed
65+
09:45 - Build green
66+
09:46 - Team resumes feature work
67+
09:50 - Quick retro: why did it break?
68+
```
69+
70+
**Total impact**: 30 minutes of paused feature work
71+
**Team learned**: Missing test case for edge condition
72+
**Outcome**: Better tests, faster next time
73+
74+
### When the Team Doesn't Stop
75+
76+
```
77+
09:15 - Build fails on trunk
78+
09:30 - Someone notices
79+
10:00 - "We'll look at it later"
80+
11:00 - Another commit on a red build
81+
12:00 - Third failure, harder to diagnose
82+
14:00 - "This is too complex, we need help"
83+
16:00 - Multiple devs debugging
84+
17:30 - Finally fixed
85+
```
86+
87+
**Total impact**: 8+ hours of broken trunk, multiple devs blocked
88+
**Team learned**: Nothing systematic
89+
**Outcome**: Same failures likely to recur
90+
91+
When developers continue working on a broken build, new work may depend on broken code, multiple changes pile up and make diagnosis harder, and the broken state becomes the new baseline. Stopping immediately contains the problem.
92+
93+
## When the Fix Takes Too Long
94+
95+
If the fix will take more than 15 minutes, prefer reverting:
96+
97+
**Option 1: Revert immediately**
98+
- Roll back the commit that broke the build
99+
- Get trunk green
100+
- Fix properly offline
101+
- Re-integrate with the fix
102+
103+
**Option 2: Forward fix with a time limit**
104+
- Set a timer (15 minutes)
105+
- Work on forward fix
106+
- If the timer expires: revert
107+
- Fix offline and re-integrate
108+
109+
When unsure, bias toward reverting. The goal is a green trunk, not a heroic fix.
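
The time-boxed forward fix can be expressed as a small decision rule. The following is a minimal sketch, not a prescribed implementation: the function name `next_action`, its parameters, and the returned labels are hypothetical, and only the 15-minute limit comes from the guidance above.

```python
from datetime import datetime, timedelta

# Time box taken from the guidance above; everything else is illustrative.
FIX_TIME_LIMIT = timedelta(minutes=15)


def next_action(timer_started_at, now, fix_ready, limit=FIX_TIME_LIMIT):
    """Decide what the team does next while trunk is red.

    Returns one of the hypothetical labels:
    'revert', 'merge_fix', or 'keep_fixing'.
    """
    elapsed = now - timer_started_at
    if elapsed >= limit:
        # Timer expired: stop the forward fix and get trunk green by reverting.
        return "revert"
    if fix_ready:
        # A fix is ready inside the time box: integrate it.
        return "merge_fix"
    return "keep_fixing"


started = datetime(2024, 1, 1, 9, 25)
print(next_action(started, datetime(2024, 1, 1, 9, 32), fix_ready=False))  # keep_fixing
print(next_action(started, datetime(2024, 1, 1, 9, 41), fix_ready=False))  # revert
```

Checking the timer before checking whether a fix is ready encodes the revert bias: once the time box is spent, the answer is always to revert and fix offline.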
110+
111+
## Team Working Agreements
112+
113+
Effective stop-the-line requires clear agreements:
114+
115+
### Fast Build Feedback
116+
117+
**Agreement**: "Our builds complete in < 10 minutes"
118+
119+
Developers can't respond to failures they don't know about, and a slow build delays the moment they find out. If builds are slow, parallelize test execution, move slow tests post-merge, or invest in faster infrastructure.
120+
121+
### Visible Build Status
122+
123+
**Agreement**: "Build status is visible to the entire team at all times"
124+
125+
You can't stop for failures you don't see. Use build radiators, chat notifications, and desktop alerts. See [Pipeline Visibility](../pipeline-visibility/) for detailed guidance.
126+
127+
### Team Owns the Fix
128+
129+
**Agreement**: "When the build breaks, the team owns the fix"
130+
131+
**Not**: "Whoever broke it fixes it"
132+
**Instead**: "The team fixes it together"
133+
134+
Individual blame prevents collaboration. The person who triggered the failure may not have the expertise or context to fix it quickly. Rally the team.
135+
136+
### Fixed Means Green
137+
138+
**Agreement**: "Fixed means green build on trunk, not just a fix committed"
139+
140+
Fixed means the root cause is identified, the fix is implemented, tests are passing on trunk, and there is a plan to prevent recurrence.
141+
142+
### No Bypassing
143+
144+
**Agreement**: "We will not bypass CI to deploy during red builds"
145+
146+
Not for critical hotfixes (fix the build first, or revert). Not for small changes (small doesn't mean safe). Not for "known failures" (then they should be fixed or removed). Not for executive pressure (protect the team).
147+
148+
## Common Objections
149+
150+
**"We can't afford to stop feature work"**
151+
152+
You can't afford not to. Every hour the build stays broken compounds future integration issues, blocks other developers, erodes deployment confidence, and increases fix complexity. Stopping is cheaper.
153+
154+
**"Stopping kills our velocity"**
155+
156+
Short term, stopping might feel slow. Long term, stopping accelerates delivery. Broken builds that persist block developers, create integration debt, and compound failures. Stopping maintains velocity by preventing these compounding costs.
157+
158+
**"We stop all the time"**
159+
160+
If builds break frequently, the problem isn't stopping—it's insufficient testing before merge. Improve pre-merge testing, require local test runs, and fix flaky tests. Stopping reveals the problem. Better testing solves it.
161+
162+
**"It's a known flaky test"**
163+
164+
Then do not leave it in the build as it is. Fix the flaky test immediately, remove it from trunk builds, or quarantine it for investigation. Non-deterministic tests are broken tests. See [Deterministic Tests](/minimumcd/deterministic/) for guidance.
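
One way to decide which tests to quarantine is to look for tests that both passed and failed against the same commit. This is a rough sketch under assumptions: the `(commit, test_name, passed)` record shape and the function name `find_flaky_tests` are hypothetical, and your CI system will expose its results in its own format.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests with mixed results on the same commit.

    `runs` is an iterable of (commit_sha, test_name, passed) tuples,
    a hypothetical shape chosen for illustration.
    """
    outcomes = defaultdict(set)
    for commit, test, passed in runs:
        outcomes[(commit, test)].add(passed)
    # Both True and False seen for the same code: the test is non-deterministic.
    return sorted({test for (_, test), seen in outcomes.items() if len(seen) == 2})


runs = [
    ("abc123", "test_login", True),
    ("abc123", "test_login", False),  # same commit, different result
    ("abc123", "test_cart", True),
    ("def456", "test_cart", False),   # different commit: a real change, not flakiness
]
print(find_flaky_tests(runs))  # ['test_login']
```

A test that fails only after the code changes is doing its job. A test that changes its answer on identical code is the one to fix or quarantine.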
165+
166+
**"Management doesn't support stopping"**
167+
168+
Educate stakeholders on the economics: show time saved by early fixes, demonstrate deployment confidence, track defect reduction, and measure cycle time improvement. If leadership demands features over quality, you're not empowered to do CI.
169+
170+
## The Cultural Shift
171+
172+
This practice represents a fundamental change:
173+
174+
**From**: "Individual productivity" **To**: "Team effectiveness"
175+
176+
**From**: "Ship features at all costs" **To**: "Maintain quality while shipping features"
177+
178+
**From**: "Move fast and break things" **To**: "Move fast by not breaking things"
179+
180+
This shift is uncomfortable but essential for sustainable high performance.
181+
182+
## Metrics
183+
184+
- **Time to fix**: Time from build failure to green build. Target < 15 minutes median, < 1 hour average.
185+
- **Stop rate**: Percentage of build failures that trigger full stop. Target 100%.
186+
- **Failure frequency**: Build failures per week. Should decrease over time.
187+
188+
Track patterns in why builds break (flaky tests, missing pre-merge tests, environment differences, integration issues) to identify systemic improvement opportunities.
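
The time-to-fix and stop-rate metrics can be computed from a simple log of build failures. The sketch below assumes a hypothetical `BuildFailure` record and `summarize` function. The two sample entries mirror the example timelines earlier on this page and are illustrative, not measured data.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median


@dataclass
class BuildFailure:
    """Hypothetical record of one red build on trunk."""
    failed_at: datetime
    green_at: datetime
    team_stopped: bool  # did the failure trigger a full stop?


def summarize(failures):
    """Compute time-to-fix (minutes) and stop rate (percent)."""
    if not failures:
        raise ValueError("no build failures to summarize")
    minutes = [(f.green_at - f.failed_at).total_seconds() / 60 for f in failures]
    stopped = sum(1 for f in failures if f.team_stopped)
    return {
        "median_minutes_to_fix": median(minutes),
        "mean_minutes_to_fix": mean(minutes),
        "stop_rate_percent": 100 * stopped / len(failures),
    }


day = datetime(2024, 1, 1)
failures = [
    # 09:15 -> 09:45, team stopped
    BuildFailure(day.replace(hour=9, minute=15), day.replace(hour=9, minute=45), True),
    # 09:15 -> 17:30, team did not stop
    BuildFailure(day.replace(hour=9, minute=15), day.replace(hour=17, minute=30), False),
]
print(summarize(failures))
```

Reporting both median and mean is deliberate: a single long outage moves the mean far more than the median, which is why the targets above differ for the two.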
189+
190+
## Additional Resources
191+
192+
- [Continuous Integration](https://martinfowler.com/articles/continuousIntegration.html) - Martin Fowler
193+
- [The Andon Cord](https://itrevolution.com/articles/kata/) - Lean Manufacturing principle
194+
- [Pipeline Visibility](../pipeline-visibility/)
195+
- [Deterministic Tests](/minimumcd/deterministic/)
