| 1 | +--- |
| 2 | +title: All Feature Work Stops When the Build Is Red |
| 3 | +description: Why continuous delivery requires stopping all feature work when the build breaks—not just blocking merges. Learn the team mindset, practices, and working agreements that make this discipline effective. |
| 4 | +weight: 4 |
| 5 | +type: docs |
| 6 | +--- |
| 7 | + |
| 8 | +When the trunk build breaks, the entire team stops feature work and collaborates to fix it immediately. This practice, borrowed from lean manufacturing's [Andon Cord](https://itrevolution.com/articles/kata/), prevents defects from propagating and maintains an always-releasable trunk. |
| 9 | + |
| 10 | +Every team member shifts focus to: |
| 11 | +1. Understanding what broke |
| 12 | +2. Fixing the broken build |
| 13 | +3. Learning why it happened |
| 14 | +4. Preventing similar failures |
| 15 | + |
| 16 | +No new feature work begins until the build is green again. |
| 17 | + |
| 18 | +## Why ALL Work Stops, Not Just Merges |
| 19 | + |
| 20 | +A common objection is: "Why stop all feature work? Just block merging until the pipeline is green." |
| 21 | + |
| 22 | +This misses the point. Continuous Delivery is not just technology and workflow—it is a **mindset**. Part of that mindset is that individuals on the team do not have individual priorities. The **team** has priorities. |
| 23 | + |
| 24 | +### Work Closer to Production Is Always More Valuable |
| 25 | + |
| 26 | +Work that is closer to production is always more valuable than work that is further away. A broken pipeline is halting the most important work: getting tested, integrated changes to users. It is also blocking any hotfix the team may need to deploy. |
| 27 | + |
| 28 | +When the build is red, fixing it is the team's highest priority. Not your feature. Not your story. The pipeline. |
| 29 | + |
| 30 | +### "Just Block Merges" Creates a False Sense of Progress |
| 31 | + |
| 32 | +If developers continue writing feature code while the build is broken: |
| 33 | + |
| 34 | +- They are building on a foundation they cannot verify |
| 35 | +- Their work is accumulating integration risk with every passing minute |
| 36 | +- They are individually productive, but the **team** is not delivering |
| 37 | +- The broken build becomes someone else's problem instead of everyone's priority |
| 38 | +- The incentive to fix the build urgently is removed—it can wait until someone wants to merge |
| 39 | + |
| 40 | +This is the difference between individual activity and team effectiveness. A team where everyone is typing but nothing is shipping is not productive. |
| 41 | + |
| 42 | +### This Is a Team Organization Problem |
| 43 | + |
| 44 | +If the team is not organized to enable everyone to swarm on a broken build, that is a fundamental dysfunction. CD requires teams that: |
| 45 | + |
| 46 | +- **Share ownership** of the pipeline and the codebase |
| 47 | +- **Prioritize collectively** rather than protecting individual work streams |
| 48 | +- **Can all contribute** to diagnosing and fixing build failures |
| 49 | +- **Treat the pipeline as the team's most critical asset** |
| 50 | + |
| 51 | +A team that says "I'll keep working on my feature while someone else fixes the build" has not adopted the CD mindset. They are a group of individuals sharing a codebase, not a team practicing Continuous Delivery. |
| 52 | + |
| 53 | +## What This Looks Like in Practice |
| 54 | + |
| 55 | +### When the Team Stops |
| 56 | + |
| 57 | +``` |
| 58 | +09:15 - Build fails on trunk |
| 59 | +09:16 - Automated notification to team chat |
| 60 | +09:17 - Team acknowledges |
| 61 | +09:18 - Feature work pauses |
| 62 | +09:20 - Quick huddle: what broke? |
| 63 | +09:25 - Two devs pair on fix |
| 64 | +09:40 - Fix committed |
| 65 | +09:45 - Build green |
| 66 | +09:46 - Team resumes feature work |
| 67 | +09:50 - Quick retro: why did it break? |
| 68 | +``` |
| 69 | + |
| 70 | +**Total impact**: 30 minutes of paused feature work |
| 71 | +**Team learned**: Missing test case for edge condition |
| 72 | +**Outcome**: Better tests, faster next time |
| 73 | + |
| 74 | +### When the Team Doesn't Stop |
| 75 | + |
| 76 | +``` |
| 77 | +09:15 - Build fails on trunk |
| 78 | +09:30 - Someone notices |
| 79 | +10:00 - "We'll look at it later" |
| 80 | +11:00 - Another commit on a red build |
| 81 | +12:00 - Third failure, harder to diagnose |
| 82 | +14:00 - "This is too complex, we need help" |
| 83 | +16:00 - Multiple devs debugging |
| 84 | +17:30 - Finally fixed |
| 85 | +``` |
| 86 | + |
| 87 | +**Total impact**: 8+ hours of broken trunk, multiple devs blocked |
| 88 | +**Team learned**: Nothing systematic |
| 89 | +**Outcome**: Same failures likely to recur |
| 90 | + |
| 91 | +When developers continue working on a broken build, new work may depend on broken code, multiple changes pile up making diagnosis harder, and the broken state becomes the new baseline. Stopping immediately contains the problem. |
| 92 | + |
| 93 | +## When the Fix Takes Too Long |
| 94 | + |
| 95 | +Cap the time trunk stays red. If the fix is likely to take more than 15 minutes, revert first; otherwise, attempt a time-boxed forward fix: |
| 96 | + |
| 97 | +**Option 1: Revert immediately** |
| 98 | +- Roll back the commit that broke the build |
| 99 | +- Get trunk green |
| 100 | +- Fix properly offline |
| 101 | +- Re-integrate with the fix |
| 102 | + |
| 103 | +**Option 2: Forward fix with a time limit** |
| 104 | +- Set a timer (15 minutes) |
| 105 | +- Work on forward fix |
| 106 | +- If the timer expires: revert |
| 107 | +- Fix offline and re-integrate |
| 108 | + |
| 109 | +When unsure, bias toward reverting. The goal is a green trunk, not a heroic fix. |
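The two options can be read as one small decision rule. The sketch below is only an illustration of that rule, not tooling this guide prescribes: the function name `choose_action`, its parameters, and the returned strings are hypothetical, and the 15-minute limit is taken from the text above.

```python
from datetime import timedelta
from typing import Optional

# The 15-minute limit comes from the guidance above; adjust to your team's agreement.
FIX_TIME_LIMIT = timedelta(minutes=15)


def choose_action(
    estimated_fix: Optional[timedelta],
    time_spent: timedelta = timedelta(0),
) -> str:
    """Return "revert" or "forward-fix" for a red trunk (hypothetical helper).

    estimated_fix: the team's estimate for a forward fix, or None if unsure.
    time_spent: how long the team has already worked on the forward fix.
    """
    if estimated_fix is None:
        # Unsure how long the fix will take: bias toward reverting.
        return "revert"
    if estimated_fix > FIX_TIME_LIMIT:
        # Option 1: the fix is too large to attempt while trunk is red.
        return "revert"
    if time_spent >= FIX_TIME_LIMIT:
        # Option 2: the timer expired, so stop and revert.
        return "revert"
    return "forward-fix"
```

For example, `choose_action(timedelta(minutes=10))` returns `"forward-fix"`, while the same estimate with `time_spent=timedelta(minutes=15)` returns `"revert"`.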
| 110 | + |
| 111 | +## Team Working Agreements |
| 112 | + |
| 113 | +Effective stop-the-line requires clear agreements: |
| 114 | + |
| 115 | +### Fast Build Feedback |
| 116 | + |
| 117 | +**Agreement**: "Our builds complete in < 10 minutes" |
| 118 | + |
| 119 | +Developers can't respond to a failure until the build reports it. If builds are slow, parallelize test execution, move slow tests to a post-merge stage, or invest in faster infrastructure. |
| 120 | + |
| 121 | +### Visible Build Status |
| 122 | + |
| 123 | +**Agreement**: "Build status is visible to the entire team at all times" |
| 124 | + |
| 125 | +You can't stop for failures you don't see. Use build radiators, chat notifications, and desktop alerts. See [Pipeline Visibility](../pipeline-visibility/) for detailed guidance. |
| 126 | + |
| 127 | +### Team Owns the Fix |
| 128 | + |
| 129 | +**Agreement**: "When the build breaks, the team owns the fix" |
| 130 | + |
| 131 | +**Not**: "Whoever broke it fixes it" |
| 132 | +**Instead**: "The team fixes it together" |
| 133 | + |
| 134 | +Individual blame prevents collaboration. The person who triggered the failure may not have the expertise or context to fix it quickly. Rally the team. |
| 135 | + |
| 136 | +### Fixed Means Green |
| 137 | + |
| 138 | +**Agreement**: "Fixed means green build on trunk, not just a fix committed" |
| 139 | + |
| 140 | +Fixed includes: root cause identified, fix implemented, tests passing on trunk, and a plan to prevent recurrence. |
| 141 | + |
| 142 | +### No Bypassing |
| 143 | + |
| 144 | +**Agreement**: "We will not bypass CI to deploy during red builds" |
| 145 | + |
| 146 | +Not for critical hotfixes (fix the build first, or revert). Not for small changes (small doesn't mean safe). Not for "known failures" (then they should be fixed or removed). Not for executive pressure (protect the team). |
| 147 | + |
| 148 | +## Common Objections |
| 149 | + |
| 150 | +**"We can't afford to stop feature work"** |
| 151 | + |
| 152 | +You can't afford not to. Every hour the build stays broken compounds future integration issues, blocks other developers, erodes deployment confidence, and increases fix complexity. Stopping is cheaper. |
| 153 | + |
| 154 | +**"Stopping kills our velocity"** |
| 155 | + |
| 156 | +Short term, stopping might feel slow. Long term, stopping accelerates delivery. Broken builds that persist block developers, create integration debt, and compound failures. Stopping maintains velocity by preventing these compounding costs. |
| 157 | + |
| 158 | +**"We stop all the time"** |
| 159 | + |
| 160 | +If builds break frequently, the problem isn't stopping—it's insufficient testing before merge. Improve pre-merge testing, require local test runs, and fix flaky tests. Stopping reveals the problem. Better testing solves it. |
| 161 | + |
| 162 | +**"It's a known flaky test"** |
| 163 | + |
| 164 | +Then remove it from the build. Either fix the flaky test immediately, remove it from trunk builds, or quarantine it for investigation. Non-deterministic tests are broken tests. See [Deterministic Tests](/minimumcd/deterministic/) for guidance. |
| 165 | + |
| 166 | +**"Management doesn't support stopping"** |
| 167 | + |
| 168 | +Educate stakeholders on the economics: show time saved by early fixes, demonstrate deployment confidence, track defect reduction, and measure cycle time improvement. If leadership demands features over quality, you're not empowered to do CI. |
| 169 | + |
| 170 | +## The Cultural Shift |
| 171 | + |
| 172 | +This practice represents a fundamental change: |
| 173 | + |
| 174 | +**From**: "Individual productivity" **To**: "Team effectiveness" |
| 175 | + |
| 176 | +**From**: "Ship features at all costs" **To**: "Maintain quality while shipping features" |
| 177 | + |
| 178 | +**From**: "Move fast and break things" **To**: "Move fast by not breaking things" |
| 179 | + |
| 180 | +This shift is uncomfortable but essential for sustainable high performance. |
| 181 | + |
| 182 | +## Metrics |
| 183 | + |
| 184 | +- **Time to fix**: Time from build failure to green build. Target < 15 minutes median, < 1 hour average. |
| 185 | +- **Stop rate**: Percentage of build failures that trigger a full stop. Target 100%. |
| 186 | +- **Failure frequency**: Build failures per week. Should decrease over time. |
| 187 | + |
| 188 | +Track patterns in why builds break (flaky tests, missing pre-merge tests, environment differences, integration issues) to identify systemic improvement opportunities. |
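As a rough illustration of how these metrics could be computed from a log of build failures, here is a minimal sketch. The `BuildFailure` record, its field names, and the `summarize` function are assumptions made for this example, not part of any CI tool's API. The sample data reuses the two timelines from earlier in this page, with an arbitrary date.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import Dict, List


@dataclass
class BuildFailure:
    """One trunk build failure (hypothetical record shape)."""
    failed_at: datetime   # when the build went red
    green_at: datetime    # when trunk was green again
    team_stopped: bool    # did the failure trigger a full stop?


def summarize(failures: List[BuildFailure]) -> Dict[str, float]:
    """Compute time-to-fix, stop rate, and failure count for one period."""
    if not failures:
        raise ValueError("need at least one failure to summarize")
    minutes = [
        (f.green_at - f.failed_at).total_seconds() / 60 for f in failures
    ]
    stopped = sum(1 for f in failures if f.team_stopped)
    return {
        "median_fix_minutes": median(minutes),   # target: < 15
        "mean_fix_minutes": mean(minutes),       # target: < 60
        "stop_rate_percent": 100 * stopped / len(failures),  # target: 100
        "failure_count": len(failures),          # should fall over time
    }


# The two timelines from "What This Looks Like in Practice"; the date is arbitrary.
example = [
    BuildFailure(datetime(2024, 1, 8, 9, 15), datetime(2024, 1, 8, 9, 45), True),
    BuildFailure(datetime(2024, 1, 9, 9, 15), datetime(2024, 1, 9, 17, 30), False),
]
report = summarize(example)
```

Reporting the median alongside the mean matters because a single long outage, like the second timeline, can dominate the average while the typical fix stays short.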
| 189 | + |
| 190 | +## Additional Resources |
| 191 | + |
| 192 | +- [Continuous Integration](https://martinfowler.com/articles/continuousIntegration.html) - Martin Fowler |
| 193 | +- [The Andon Cord](https://itrevolution.com/articles/kata/) - Lean Manufacturing principle |
| 194 | +- [Pipeline Visibility](../pipeline-visibility/) |
| 195 | +- [Deterministic Tests](/minimumcd/deterministic/) |