Commit 6426401

Browse files
authored
Add marketing skills (#947)
* chore: add marketing skills
* ci: only format check the src folder

1 parent aad48c3 commit 6426401

119 files changed: +23193 −1 lines changed

---
name: ab-test-setup
description: When the user wants to plan, design, or implement an A/B test or experiment. Also use when the user mentions "A/B test," "split test," "experiment," "test this change," "variant copy," "multivariate test," or "hypothesis." For tracking implementation, see analytics-tracking.
metadata:
  version: 1.1.0
---

# A/B Test Setup

You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.

## Initial Assessment

**Check for product marketing context first:**
If `.agents/product-marketing-context.md` exists (or `.claude/product-marketing-context.md` in older setups), read it before asking questions. Use that context and only ask for information not already covered or specific to this task.

Before designing a test, understand:

1. **Test Context** - What are you trying to improve? What change are you considering?
2. **Current State** - Baseline conversion rate? Current traffic volume?
3. **Constraints** - Technical complexity? Timeline? Tools available?

---

## Core Principles

### 1. Start with a Hypothesis
- Not just "let's see what happens"
- Specific prediction of outcome
- Based on reasoning or data

### 2. Test One Thing
- Single variable per test
- Otherwise you don't know what worked

### 3. Statistical Rigor
- Pre-determine sample size
- Don't peek and stop early
- Commit to the methodology

### 4. Measure What Matters
- Primary metric tied to business value
- Secondary metrics for context
- Guardrail metrics to prevent harm

---

## Hypothesis Framework

### Structure

```
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
```

### Example

**Weak**: "Changing the button color might increase clicks."

**Strong**: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using a contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."

---

## Test Types

| Type | Description | Traffic Needed |
|------|-------------|----------------|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |

---

## Sample Size

### Quick Reference

| Baseline | 10% Lift | 20% Lift | 50% Lift |
|----------|----------|----------|----------|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |

**Calculators:**
- [Evan Miller's](https://www.evanmiller.org/ab-testing/sample-size.html)
- [Optimizely's](https://www.optimizely.com/sample-size-calculator/)

**For detailed sample size tables and duration calculations**: See [references/sample-size-guide.md](references/sample-size-guide.md)
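
The quick-reference numbers above come from the standard two-proportion power calculation. A minimal standard-library sketch of that calculation (two-sided test, 80% power); results can differ slightly from the table and from online calculators depending on rounding and one- vs. two-sided assumptions, and the traffic figure in the duration estimate is made up for illustration:

```python
from statistics import NormalDist

def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate visitors per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# 5% baseline, detecting a 20% relative lift (5.0% -> 6.0%)
n = sample_size_per_variant(0.05, 0.20)
print(f"{n} visitors per variant")
# Rough duration for an A/B test at a hypothetical 2,000 eligible visitors/day
print(f"~{2 * n / 2000:.0f} days at 2,000 visitors/day")
```

Note how sharply the requirement drops as the detectable lift grows: halving the minimum detectable effect roughly quadruples the sample you need.
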
---

## Metrics Selection

### Primary Metric
- Single metric that matters most
- Directly tied to hypothesis
- What you'll use to call the test

### Secondary Metrics
- Support primary metric interpretation
- Explain why/how the change worked

### Guardrail Metrics
- Things that shouldn't get worse
- Stop test if significantly negative

### Example: Pricing Page Test
- **Primary**: Plan selection rate
- **Secondary**: Time on page, plan distribution
- **Guardrail**: Support tickets, refund rate

---

## Designing Variants

### What to Vary

| Category | Examples |
|----------|----------|
| Headlines/Copy | Message angle, value prop, specificity, tone |
| Visual Design | Layout, color, images, hierarchy |
| CTA | Button copy, size, placement, number |
| Content | Information included, order, amount, social proof |

### Best Practices
- Single, meaningful change
- Bold enough to make a difference
- True to the hypothesis

---

## Traffic Allocation

| Approach | Split | When to Use |
|----------|-------|-------------|
| Standard | 50/50 | Default for A/B |
| Conservative | 90/10, 80/20 | Limit risk of bad variant |
| Ramping | Start small, increase | Technical risk mitigation |

**Considerations:**
- Consistency: Users see same variant on return
- Balanced exposure across time of day/week
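
Returning-visitor consistency is usually achieved with deterministic bucketing: hash a stable user ID together with the experiment name, so the same user always lands in the same variant without storing any state. A minimal sketch (the experiment name and IDs here are hypothetical, not from any specific tool):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, control_weight: float = 0.5) -> str:
    """Deterministically bucket a user: same inputs always yield the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform value in [0, 1]
    return "control" if bucket < control_weight else "variant"

# Same user, same experiment -> same variant on every return visit
assert assign_variant("user-42", "pricing-cta") == assign_variant("user-42", "pricing-cta")
# Conservative 90/10 split: pass control_weight=0.9
print(assign_variant("user-42", "pricing-cta", control_weight=0.9))
```

Hashing on `experiment:user_id` (rather than the user ID alone) keeps assignments independent across experiments, so one test's split doesn't correlate with another's.
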
---

## Implementation

### Client-Side
- JavaScript modifies page after load
- Quick to implement, can cause flicker
- Tools: PostHog, Optimizely, VWO

### Server-Side
- Variant determined before render
- No flicker, requires dev work
- Tools: PostHog, LaunchDarkly, Split

---

## Running the Test

### Pre-Launch Checklist
- [ ] Hypothesis documented
- [ ] Primary metric defined
- [ ] Sample size calculated
- [ ] Variants implemented correctly
- [ ] Tracking verified
- [ ] QA completed on all variants

### During the Test

**DO:**
- Monitor for technical issues
- Check segment quality
- Document external factors

**DON'T:**
- Peek at results and stop early
- Make changes to variants
- Add traffic from new sources

### The Peeking Problem
Checking results repeatedly before reaching the planned sample size, and stopping as soon as significance appears, inflates the false-positive rate and leads to wrong decisions. Pre-commit to the sample size and trust the process.
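
The inflation is easy to demonstrate by simulation: run many A/A tests (both arms identical, so any "significant" result is a false positive) and peek at a z-test after every batch of visitors. A standard-library sketch; the traffic numbers and peek interval are illustrative:

```python
import random
from statistics import NormalDist

random.seed(0)
Z_CRIT = NormalDist().inv_cdf(0.975)  # ~1.96 for a two-sided 95% test

def peeking_finds_false_positive(n_max=5000, peek_every=100, p=0.05):
    """Run one A/A test (no real difference) and peek after every batch."""
    a_conv = b_conv = 0
    for i in range(1, n_max + 1):
        a_conv += random.random() < p
        b_conv += random.random() < p
        if i % peek_every == 0:
            pooled = (a_conv + b_conv) / (2 * i)
            se = (2 * pooled * (1 - pooled) / i) ** 0.5
            if se > 0 and abs(a_conv - b_conv) / i / se > Z_CRIT:
                return True  # "significant" purely by chance -- stopped early
    return False

runs = 500
false_positives = sum(peeking_finds_false_positive() for _ in range(runs))
print(f"A/A tests wrongly called significant: {false_positives / runs:.0%}")
```

With a single look at the full sample the false-positive rate would sit near the nominal 5%; peeking after every batch typically inflates it several-fold.
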
---

## Analyzing Results

### Statistical Significance
- 95% confidence = p-value < 0.05
- Means: if there were truly no difference, a result this extreme would occur less than 5% of the time
- A decision threshold, not a guarantee
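
The significance check above is a two-proportion z-test. A minimal standard-library sketch; the conversion counts are made up for illustration:

```python
from statistics import NormalDist

def ab_significance(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (relative_lift, p_value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                     # pooled rate
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5    # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (p_b - p_a) / p_a, p_value

# Hypothetical counts: control 500/10,000 (5.0%), variant 580/10,000 (5.8%)
lift, p = ab_significance(500, 10_000, 580, 10_000)
print(f"relative lift {lift:+.1%}, p-value {p:.3f}")
```

Report the confidence interval alongside the p-value where possible: a barely significant result with a wide interval is much weaker evidence than the single number suggests.
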

### Analysis Checklist

1. **Reach sample size?** If not, result is preliminary
2. **Statistically significant?** Check confidence intervals
3. **Effect size meaningful?** Compare to MDE, project impact
4. **Secondary metrics consistent?** Support the primary?
5. **Guardrail concerns?** Anything get worse?
6. **Segment differences?** Mobile vs. desktop? New vs. returning?

### Interpreting Results

| Result | Conclusion |
|--------|------------|
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or bolder test |
| Mixed signals | Dig deeper, maybe segment |

---

## Documentation

Document every test with:
- Hypothesis
- Variants (with screenshots)
- Results (sample, metrics, significance)
- Decision and learnings

**For templates**: See [references/test-templates.md](references/test-templates.md)

---

## Common Mistakes

### Test Design
- Testing too small a change (undetectable)
- Testing too many things (can't isolate)
- No clear hypothesis

### Execution
- Stopping early
- Changing things mid-test
- Not checking implementation

### Analysis
- Ignoring confidence intervals
- Cherry-picking segments
- Over-interpreting inconclusive results

---

## Task-Specific Questions

1. What's your current conversion rate?
2. How much traffic does this page get?
3. What change are you considering and why?
4. What's the smallest improvement worth detecting?
5. What tools do you have for testing?
6. Have you tested this area before?

---

## Related Skills

- **page-cro**: For generating test ideas based on CRO principles
- **analytics-tracking**: For setting up test measurement
- **copywriting**: For creating variant copy
