Commit 1083490

Addition of a new feature toggling practice (#349)
Introduction of a new practice page around Feature Toggling. Aims to provide some context on feature toggles and how best to manage them.

Co-authored-by: Dan Stefaniuk
1 parent 05fd86d commit 1083490

1 file changed: +131 -0 lines changed

practices/feature-toggling.md

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
# Feature Toggling

- [Feature Toggling](#feature-toggling)
  - [Context](#context)
  - [The intent](#the-intent)
  - [Key takeaway](#key-takeaway)
  - [Background](#background)
  - [What is feature toggling?](#what-is-feature-toggling)
  - [Why use feature toggling?](#why-use-feature-toggling)
  - [Types of toggles](#types-of-toggles)
  - [Managing toggles](#managing-toggles)
  - [Toggling strategy](#toggling-strategy)
  - [Toggle lifecycle](#toggle-lifecycle)
    - [Best practice lifecycle](#best-practice-lifecycle)
  - [Testing toggled features](#testing-toggled-features)
  - [Designing for failure](#designing-for-failure)
  - [Further reading](#further-reading)

## Context

- These notes are part of our broader [engineering principles](../principles.md).
- Feature toggling contributes to safer delivery, reduced deployment risk, and enhanced responsiveness to change.

## The intent

We use feature toggling as a key enabling practice to support our move towards daily code integration and deployment, including multiple deployments per day. Feature toggling allows us to separate deployment from release, so that incomplete features can be merged and deployed without impacting end users. It enables incremental development by allowing small, frequent commits to `main`, and supports risk-managed rollouts by enabling functionality selectively. Importantly, it also provides a mechanism for rapid rollback without reverting code. This approach is critical to mitigating the risks associated with frequent deployments, and is foundational to achieving a safe and sustainable continuous delivery model.

## Key takeaway

- Feature toggling enables functionality to be turned on or off without deploying new code.
- It separates deployment from release, allowing code to be safely deployed without activating a feature.
- Toggles should be explicitly managed with clear naming, documented intent, and timely removal.
- Toggle abuse (too many, long-lived, or undocumented flags) leads to tech debt and complex logic.

## Background

Feature toggling, also known as feature flagging, is a technique for modifying system behaviour without changing code by checking a condition (usually externalised) at runtime. It is often used to control feature rollouts, manage risk, and test changes in production.

This is particularly powerful in continuous delivery environments where small, frequent changes are the norm. It supports practices like canary releases, A/B testing, and operational kill switches.

For a detailed and widely referenced introduction to this practice, see Pete Hodgson's article on [Feature Toggles](https://martinfowler.com/articles/feature-toggles.html), published on Martin Fowler's site.

While some areas are looking to adopt a more enterprise-grade offering with Flagsmith, it's important to recognise that more minimal feature toggle approaches may be appropriate for smaller or simpler systems. The [Thoughtworks Technology Radar](https://www.thoughtworks.com/radar) notes that many teams over-engineer feature flagging by immediately adopting complex platforms, when a simpler approach (e.g., environment variables or static config) would suffice. However, irrespective of how the toggle is implemented, the **governance, traceability, and lifecycle management processes should be consistent**.

## What is feature toggling?

Feature toggling works by introducing conditional logic into the application code. This logic evaluates a configuration value or remote toggle to determine whether to execute a new or existing code path.

Toggles can be defined statically (e.g., environment variable or config file) or dynamically (e.g., via an external feature flag service). Dynamic toggles can be changed without restarting or redeploying the application.
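
As a minimal sketch of the conditional logic described above, the following Python reads a static toggle from an environment variable and picks between the existing and the new code path. The `FEATURE_<NAME>` naming convention, the `is_enabled` helper, and the `checkout_total` function are assumptions made for illustration, not part of any particular library.

```python
import os

# Values treated as "on"; anything else (or an unset variable) counts as "off".
TRUTHY = {"1", "true", "yes", "on"}


def is_enabled(name: str, default: bool = False) -> bool:
    """Read a static toggle from the environment (hypothetical FEATURE_<NAME> convention)."""
    raw = os.environ.get(f"FEATURE_{name.upper()}")
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def checkout_total(prices: list[float]) -> float:
    """Choose between the existing and the new code path at runtime."""
    total = sum(prices)
    if is_enabled("new_discount"):
        # New code path: deployed, but only active when the toggle is on.
        return round(total * 0.9, 2)
    # Existing code path.
    return round(total, 2)
```

Because this toggle is static, changing it typically means restarting the process with a different environment. A dynamic toggle would replace the body of `is_enabled` with a lookup against a flag service.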

## Why use feature toggling?

- **Decouple deployment from release**: Code can be deployed behind a toggle and activated later.
- **Enable safe rollouts**: Enable features for specific users or teams to validate functionality before full rollout.
- **Support operational control**: Temporarily disable a feature causing issues without rollback.
- **Enable experimentation**: Run A/B tests to determine user impact.
- **Configure environment-specific behaviour**: Activate features in dev or test environments only.

## Types of toggles

Pete Hodgson's [Feature Toggles](https://martinfowler.com/articles/feature-toggles.html) article on martinfowler.com groups toggles into the following categories:

- **Release toggles**: Allow incomplete features to be merged and deployed.
- **Experiment toggles**: Support A/B or multivariate testing.
- **Ops toggles**: Provide operational control for performance or reliability.
- **Permission toggles**: Enable features based on user roles or attributes.
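
To make one of these categories concrete, here is a rough sketch of a permission toggle that enables a feature based on user roles. The `User` type, the `PERMISSION_TOGGLES` mapping, and the `bulk_export` toggle name are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    roles: frozenset[str] = frozenset()


# Hypothetical mapping from toggle name to the roles allowed to see the feature.
PERMISSION_TOGGLES: dict[str, frozenset[str]] = {
    "bulk_export": frozenset({"admin", "analyst"}),
}


def is_permitted(toggle: str, user: User) -> bool:
    """Return True when the user holds at least one role the toggle allows."""
    allowed_roles = PERMISSION_TOGGLES.get(toggle, frozenset())
    # Unknown toggles have no allowed roles, so they resolve to "off".
    return bool(allowed_roles & user.roles)
```

Unlike a release toggle, a toggle of this kind is evaluated per request and per user, which is one reason it tends to live longer than the other categories.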

## Managing toggles

Poorly managed toggles can lead to complexity, bugs, and technical debt. Best practices include:

- Give toggles meaningful, consistent names.
- Store toggle state in a centralised and observable system.
- Document the purpose and expected lifetime of each toggle.
- Remove stale toggles once their purpose is fulfilled.
- Avoid nesting toggles or creating toggle spaghetti.
- Ensure toggles are discoverable, testable, and auditable.
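
One possible way to keep purpose and expected lifetime visible is a small registry that records metadata for each toggle and reports the ones past their removal date. This is a sketch under assumed names: `ToggleRecord`, `REGISTRY`, `stale_toggles`, the owners, and the dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ToggleRecord:
    name: str
    owner: str
    purpose: str
    remove_by: Optional[date]  # None marks a deliberately permanent toggle


# Hypothetical registry entries; a real one might live in config or a flag service.
REGISTRY = [
    ToggleRecord("release_new_checkout", "payments-team",
                 "Hide the new checkout flow until it is complete", date(2025, 3, 31)),
    ToggleRecord("permission_bulk_export", "data-team",
                 "Restrict bulk export to analysts", None),
]


def stale_toggles(registry: list[ToggleRecord], today: date) -> list[str]:
    """Return the names of toggles whose planned removal date has passed."""
    return [
        record.name
        for record in registry
        if record.remove_by is not None and record.remove_by < today
    ]
```

A check like `stale_toggles(REGISTRY, date.today())` could run in a pipeline and fail the build, or raise a ticket, when a toggle outlives its documented lifetime.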

## Toggling strategy

Choose a feature flagging approach appropriate for the scale and complexity of your system:

- **Simple applications**: Environment variables or configuration files.
- **Moderate scale and beyond**: Consider a dedicated platform such as [Flagsmith](https://www.flagsmith.com/), which supports targeting, analytics, and team workflows.

Feature toggles should be queryable from all components that need access to their values. Depending on your architecture, this may require synchronisation, caching, or SDK integration.
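
Where caching is the chosen approach, a component might keep a short-lived local copy of the toggle values rather than calling the toggle store on every check. The sketch below assumes a hypothetical `fetch_all` callable that returns every toggle as a dictionary; `CachedToggles` is an illustrative name, not a Flagsmith API.

```python
import time
from typing import Callable, Optional


class CachedToggles:
    """Cache toggle values locally and refresh them after a time-to-live."""

    def __init__(
        self,
        fetch_all: Callable[[], dict[str, bool]],  # hypothetical loader for all toggles
        ttl_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_all = fetch_all
        self._ttl_s = ttl_s
        self._clock = clock
        self._values: Optional[dict[str, bool]] = None
        self._fetched_at = 0.0

    def is_enabled(self, name: str, default: bool = False) -> bool:
        now = self._clock()
        # Refresh on first use, or once the cached copy is older than the TTL.
        if self._values is None or now - self._fetched_at >= self._ttl_s:
            self._values = dict(self._fetch_all())
            self._fetched_at = now
        return self._values.get(name, default)
```

The TTL is a trade-off: a longer value reduces load on the toggle store, while a shorter one means a changed toggle takes effect sooner across components.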

## Toggle lifecycle

Toggles are intended to be short-lived unless explicitly designed to be permanent (e.g. permission toggles).

### Best practice lifecycle

1. **Introduce** the toggle with a clear purpose and target outcome.
2. **Implement** the feature behind the toggle.
3. **Test** the feature in both on/off states.
4. **Roll out** gradually (e.g., canary users, targeted groups).
5. **Monitor** the impact of the feature.
6. **Remove** the toggle once the feature is stable and fully rolled out.

Document toggles in your architecture or delivery tooling to ensure visibility and traceability.
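
The gradual rollout step can be sketched as a percentage rollout, where each user is hashed into a stable bucket so the same user always gets the same answer and raising the percentage only ever adds users. The function names and the `toggle:user_id` hashing scheme are assumptions for illustration; flag platforms implement their own variants of this idea.

```python
import hashlib


def rollout_bucket(toggle: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given toggle."""
    digest = hashlib.sha256(f"{toggle}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(toggle: str, user_id: str, percentage: int) -> bool:
    """Enable the toggle for roughly `percentage` percent of users."""
    # Including the toggle name in the hash gives each toggle its own bucketing,
    # so the same users are not always first in line for every rollout.
    return rollout_bucket(toggle, user_id) < percentage
```

Setting the percentage to 0 acts as "off" for everyone and 100 as "on" for everyone, so the same check covers the whole path from introduction to full rollout.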

## Testing toggled features

Features behind toggles should be tested in both their enabled and disabled states. This ensures correctness regardless of the toggle value.

- Write tests that explicitly set the toggle on and off.
- Use test frameworks that allow injecting or mocking toggle values.
- Consider test coverage for the toggle transitions (e.g., changing at runtime).
- Ensure integration and end-to-end tests include scenarios where toggles are disabled.

This is particularly important for toggles that persist for more than one release cycle.
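
A small sketch of what testing both states might look like with Python's built-in `unittest`, assuming toggle values are injected as a plain dictionary. The `greeting` function and the `friendly_greeting` toggle are hypothetical.

```python
import unittest


def greeting(name: str, toggles: dict[str, bool]) -> str:
    """Return the new greeting when the toggle is on, the existing one otherwise."""
    if toggles.get("friendly_greeting", False):
        return f"Hi {name}, welcome back!"
    return f"Hello {name}."


class GreetingToggleTests(unittest.TestCase):
    def test_toggle_off_keeps_existing_behaviour(self) -> None:
        self.assertEqual(greeting("Sam", {"friendly_greeting": False}), "Hello Sam.")

    def test_toggle_on_uses_new_behaviour(self) -> None:
        self.assertEqual(greeting("Sam", {"friendly_greeting": True}), "Hi Sam, welcome back!")

    def test_missing_toggle_falls_back_to_off(self) -> None:
        # An unresolved toggle should behave like the disabled state.
        self.assertEqual(greeting("Sam", {}), "Hello Sam.")
```

Passing the toggle values in as an argument keeps the tests independent of any environment or flag service, which is one way to satisfy the injection point above. The tests can be run with `python -m unittest`.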

## Designing for failure

Feature toggles should never become a point of failure. Design your system so that it behaves predictably even if the toggle service is unavailable or fails to return a value.

Best practices:

- **Default values**: Every toggle should have a known and safe default (either on or off) hardcoded in the consuming service.
- **Fail-safe logic**: Ensure that remote flag checks have timeouts and fallback paths.
- **Graceful degradation**: Systems should still function, possibly with reduced capability, if a toggle cannot be resolved.
- **Resilient integration**: Ensure that SDKs or services used for toggling are resilient and do not block application startup or core functionality.
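
The default-value and fail-safe points can be illustrated with a wrapper that gives a remote lookup a time limit and falls back to a hardcoded default on any timeout or error. This is a sketch only: `SAFE_DEFAULTS`, `resolve_toggle`, and the `fetch` callable are hypothetical stand-ins for whatever client a toggle service provides.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical safe defaults, hardcoded in the consuming service.
SAFE_DEFAULTS: dict[str, bool] = {"new_search": False}

# Small shared pool so the caller waits at most `timeout_s` for a lookup.
_executor = ThreadPoolExecutor(max_workers=2)


def resolve_toggle(name: str, fetch: Callable[[str], bool], timeout_s: float = 0.2) -> bool:
    """Resolve a toggle remotely, falling back to a safe default on failure."""
    default = SAFE_DEFAULTS.get(name, False)
    try:
        future = _executor.submit(fetch, name)
        return bool(future.result(timeout=timeout_s))
    except Exception:
        # Covers timeouts as well as errors raised by the remote lookup.
        # A timed-out lookup keeps running in its worker thread; only the
        # caller stops waiting for it.
        return default
```

Here the caller never sees an exception from the toggle lookup, so an outage of the toggle service degrades to default behaviour rather than failing the request.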

## Further reading

- [Feature Toggles (aka Feature Flags) by Pete Hodgson, martinfowler.com](https://martinfowler.com/articles/feature-toggles.html)
- [Unleash Strategies and Best Practices](https://docs.getunleash.io/topics/feature-flags/feature-flag-best-practices)
- [Flagsmith Docs](https://docs.flagsmith.com/)
- [Feature Flag Best Practices](https://launchdarkly.com/blog/best-practices-for-coding-with-feature-flags/)
- [Thoughtworks Tech Radar](https://www.thoughtworks.com/radar/techniques/minimum-feature-toggle-solution)
