
Complete deployment_testing.md module - Implementation plan #568

@SkafteNicki

Description

Overview

This issue tracks the completion of the s7_deployment/deployment_testing.md module, which was started but never finished. The module currently covers three deployment testing strategies (A/B Testing, Canary Deployment, Shadow Deployment) but has significant incomplete content and placeholder text.

Related PR: #268 (branch: deployment-testing)


Current State Analysis

✅ What's Complete

  • Strong introduction explaining deployment testing importance for ML projects
  • Three main section structures: A/B Testing, Canary Deployment, Shadow Deployment
  • Visual diagrams referenced for all three strategies
  • Knowledge check table at the end
  • Basic exercise scaffolding

❌ What's Incomplete/Missing

  1. A/B Testing Section (Lines 23-105)

    • Missing explanatory text after diagram (line 47 is just [text])
    • Incomplete exercise description (line 62 ends mid-sentence: "The second")
    • Exercise 1 shows geolocation code without context or instructions
    • No statistical analysis guidance despite having a table of tests (lines 50-56)
  2. Canary Deployment Section (Lines 107-126)

    • No conceptual explanation (only diagram)
    • Minimal exercise (just external link + one command mention)
  3. Shadow Deployment Section (Lines 128-169)

    • No conceptual explanation (only diagram)
    • Exercise code needs improvement (currently just random routing, not true shadowing)
    • Missing analysis/comparison steps
  4. General Issues

    • No learning objectives section
    • No prerequisites section
    • Knowledge check table has duplicate column header (line 175)
    • No ML-specific examples throughout
    • Code examples lack error handling, logging, type hints

Implementation Requirements

Based on discussion, the completed module should have:

  • Content depth: Brief (2-3 paragraphs per concept)
  • Exercise level: Intermediate (guidance with room for problem-solving)
  • Examples: Simple ML models (MNIST-based)
  • Statistics: Moderate (hypothesis testing + Python examples)
  • Additional sections: Comparison, best practices, advanced topics

Implementation Plan

Phase 1: Introduction & Structure (Lines 1-22)

Estimated addition: ~30 lines

  • Add Learning Objectives box after line 11

    • Understand three deployment testing strategies
    • Implement A/B testing with statistical validation
    • Deploy canary releases with gradual rollout
    • Set up shadow deployments for risk-free testing
  • Add Prerequisites section

    • Link to: APIs module, Cloud Deployment module, Testing APIs module
    • Required: Deployed FastAPI app, GCP account, basic statistics knowledge
  • Add Quick Comparison Table before line 19

    • Columns: Strategy, Use Case, Risk Level, Complexity, User Impact
    • Help students choose strategy at a glance

Phase 2: A/B Testing Section (Lines 23-105)

Estimated addition: ~80 lines

  • Add conceptual introduction (2-3 paragraphs) after line 27

    • Explain A/B testing for ML deployments
    • Example: Testing MNIST models with different preprocessing
    • When to use: Model version comparison, feature changes
  • Fix line 47 placeholder

    • Replace [text](link) with actual content
    • Explain: Sample size determination, confidence intervals, stopping criteria
    • Link to statistical calculator and Python implementation
  • Add statistical guidance after table (line 56)

    • Brief explanation of each test in the table
    • Python example using scipy.stats
    • Sample size calculator example
  • Complete Exercise Section

    • Fix line 62: Complete the sentence ("The second will test model performance differences")
    • Exercise 1: Geo-based A/B Testing
      • Add context: Route users by geography to test regional variants
      • Setup instructions: Install geoip2, download GeoLite2 database
      • Deploy 2 Cloud Run services with different model versions
      • Add monitoring/logging code to track variant assignment
    • Exercise 2 (NEW): Statistical Analysis
      • Provide sample data (Model A: 92% accuracy, Model B: 94% accuracy)
      • Calculate statistical significance using provided test table
      • Python code example with t-test
    • Exercise 3 (NEW): Simple Traffic Split A/B Test
      • Deploy two MNIST models (baseline vs. with data augmentation)
      • Use Cloud Run traffic splitting (50/50)
      • Collect prediction logs
      • Analyze which model performs better
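The statistical analysis in Exercise 2 could be sketched as follows. This is a minimal illustration, assuming 1000 predictions per variant (the sample size is an assumption; the 92%/94% accuracies come from the exercise description). It encodes each prediction as a 1/0 correctness outcome and runs a two-sample t-test with `scipy.stats`, matching the "t-test" example the plan calls for:

```python
import numpy as np
from scipy import stats

# Hypothetical sample: 1000 predictions per variant, 1 = correct, 0 = wrong.
# Accuracies match the exercise's example (Model A: 92%, Model B: 94%).
n = 1000
model_a = np.array([1] * 920 + [0] * 80)
model_b = np.array([1] * 940 + [0] * 60)

# Two-sample t-test on the binary outcomes (equivalent to comparing accuracies).
t_stat, p_value = stats.ttest_ind(model_a, model_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}, significant at 5%: {p_value < 0.05}")
```

A useful discussion point for students: with a 2-percentage-point difference, significance depends heavily on sample size, which motivates the sample-size-calculator part of the exercise.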

Phase 3: Canary Deployment Section (Lines 107-126)

Estimated addition: ~60 lines

  • Add conceptual explanation after line 107 (2-3 paragraphs)

    • What is canary deployment (origin: canary in coal mine)
    • How it works: Gradual rollout (5%→25%→50%→100%)
    • ML use case: Rolling out retrained model safely
  • Add monitoring guidance

    • Metrics to track: accuracy, latency, error rate, user engagement
    • When to rollback vs. proceed
    • Brief mention of automation possibilities
  • Expand exercises

    • Exercise 1: Keep GCP guide link, add context
      • What they'll build, expected time, prerequisites checklist
    • Exercise 2 (EXPAND from lines 122-126): Step-by-step Canary
      • Deploy MNIST model v1 (baseline)
      • Deploy MNIST model v2 (improved architecture, e.g., added dropout)
      • Use gcloud run services update-traffic with progressive percentages
      • Monitor logs in Cloud Logging
      • Provide script template to automate traffic increases
    • Exercise 3 (NEW): Rollback Scenario
      • Simulate issue (v2 has higher error rate on edge cases)
      • Practice immediate rollback
      • Document decision criteria and learnings
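The "when to rollback vs. proceed" guidance and the traffic-increase script template could be sketched together as below. The function name `should_proceed`, the thresholds, and the service/revision names (`mnist-api`, `mnist-api-v1/v2`) are all illustrative placeholders, not prescribed values; the `gcloud run services update-traffic --to-revisions` command itself is the real mechanism the exercise uses:

```python
def should_proceed(
    canary_error_rate: float,
    baseline_error_rate: float,
    canary_p95_latency_ms: float,
    baseline_p95_latency_ms: float,
    max_error_increase: float = 0.01,
    max_latency_factor: float = 1.25,
) -> bool:
    """Illustrative gate: advance the canary only while error rate and
    latency stay within tolerances relative to the baseline revision."""
    if canary_error_rate > baseline_error_rate + max_error_increase:
        return False  # rollback: error rate regression
    if canary_p95_latency_ms > baseline_p95_latency_ms * max_latency_factor:
        return False  # rollback: latency regression
    return True

# Progressive rollout steps from the plan: 5% -> 25% -> 50% -> 100%.
for pct in (5, 25, 50, 100):
    if should_proceed(0.021, 0.020, 180.0, 170.0):  # metrics from monitoring
        print(f"gcloud run services update-traffic mnist-api "
              f"--to-revisions=mnist-api-v2={pct}")
    else:
        print("gcloud run services update-traffic mnist-api "
              "--to-revisions=mnist-api-v1=100")  # immediate rollback
        break
```

In the exercise, the hard-coded metrics would instead be queried from Cloud Logging/Monitoring between steps, which also sets up the rollback scenario in Exercise 3.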

Phase 4: Shadow Deployment Section (Lines 128-169)

Estimated addition: ~70 lines

  • Add conceptual explanation after line 128 (2-3 paragraphs)

    • What is shadow deployment
    • Zero user risk - perfect for ML model validation
    • Example: Test new MNIST model architecture alongside production
  • Add implementation patterns

    • Load balancer approach (current exercise approach)
    • Service mesh (mention Istio but note complexity)
    • Application-level duplication
    • When to use each pattern
  • Fix & expand Exercise 1

    • Step 1 (IMPROVE lines 144-166): Fix load balancer code
      • Current code does random routing (not true shadowing!)
      • Replace with proper shadow implementation:
        • Send request to both primary and shadow
        • Only return primary response to user
        • Log shadow response for comparison
      • Use httpx for async requests
      • Add structured logging for comparison
    • Step 2 (IMPROVE lines 168-169): Better deployment
      • Deploy to Cloud Run (better fit than Cloud Functions)
      • Provide requirements.txt
      • Full deployment commands with gcloud run deploy
    • Step 3 (NEW): Deploy Two Model Versions
      • Primary: Stable MNIST model
      • Shadow: Experimental model (e.g., CNN vs. ResNet architecture)
      • Configure load balancer to duplicate traffic
    • Step 4 (NEW): Analysis Exercise
      • Query logs from both versions using Cloud Logging
      • Compare predictions on identical inputs
      • Analyze latency differences
      • Make promotion decision: deploy shadow to production or iterate
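The "proper shadow implementation" described in Step 1 boils down to one routing function: send the request to both services, return only the primary response, and log the shadow response. A minimal sketch of that fan-out pattern is below; the callables here are stand-in coroutines, whereas in the exercise they would wrap `httpx.AsyncClient` requests to the two Cloud Run URLs:

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("shadow")

async def shadow_route(
    primary: Callable[[Any], Awaitable[Any]],
    shadow: Callable[[Any], Awaitable[Any]],
    payload: Any,
) -> Any:
    """Send the request to both models, return only the primary response,
    and log the shadow response for offline comparison."""
    primary_task = asyncio.create_task(primary(payload))
    shadow_task = asyncio.create_task(shadow(payload))

    primary_result = await primary_task
    try:
        shadow_result = await shadow_task
        logger.info("primary=%s shadow=%s match=%s",
                    primary_result, shadow_result,
                    primary_result == shadow_result)
    except Exception:
        # A shadow failure must never affect the user-facing response.
        logger.exception("shadow call failed")
    return primary_result

# Stand-ins for the two Cloud Run services (httpx calls in the real exercise).
async def stable_model(x): return {"digit": 7}
async def experimental_model(x): return {"digit": 7}

print(asyncio.run(shadow_route(stable_model, experimental_model, "img")))  # {'digit': 7}
```

The key property (and the fix for the current random-routing code) is that the user always receives the primary result, while the structured log lines feed the Step 4 comparison in Cloud Logging.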

Phase 5: Comparison Section (NEW)

Estimated addition: ~30 lines

Add new section after Shadow Deployment, before Knowledge Check:

  • "When to Use Which Strategy" subsection

    • Decision flowchart (text-based is fine)
    • Criteria: risk tolerance, rollback requirements, testing goals, traffic volume
  • "Combining Strategies" subsection

    • Example workflow: Shadow first → Canary → A/B test
    • Progressive de-risking approach for ML deployments

Phase 6: Best Practices (NEW)

Estimated addition: ~40 lines

Add new section after Comparison:

  • General deployment testing best practices

    • Always monitor key metrics during tests
    • Define success criteria upfront
    • Have rollback plan ready and tested
    • Document all decisions and results
  • ML-specific considerations

    • Monitor for model drift during deployment
    • Watch for feature distribution shifts
    • Consider latency vs. accuracy tradeoffs
    • Calculate A/B test duration for statistical power
    • Version control for models and data preprocessing

Phase 7: Knowledge Check (Lines 171-188)

Estimated addition: ~20 lines

  • Fix table at line 175

    • Remove duplicate "Releasing to users based on conditions" column
    • Replace with "Negative user impact" or "Complexity"
  • Add scenario-based questions after table

    • Q1: "You need to test a new model with zero user risk. Which strategy?"
    • Q2: "You want to statistically compare two models. Which strategy?"
    • Q3: "You want to gradually roll out a model with easy rollback. Which strategy?"
    • Provide solutions with reasoning

Phase 8: Advanced Topics (NEW - Optional)

Estimated addition: ~40 lines

Add new optional section before final "This ends..." line:

  • Blue-Green Deployment

    • Brief explanation (2 paragraphs)
    • Difference from canary (instant switch vs. gradual)
    • Quick GCP implementation note
  • Feature Flags

    • How feature flags enable gradual rollout
    • Tools: LaunchDarkly, Optimizely, or custom
    • Simple implementation example
  • Multi-Armed Bandits (OPTIONAL)

    • Dynamic A/B testing that automatically optimizes
    • When to consider (high traffic scenarios)
    • Brief algorithm overview
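For the bandit overview, one of the simplest algorithms to present is epsilon-greedy: route most traffic to the best-observed variant and explore a random one with probability epsilon. The class below is an illustrative sketch (class and method names are invented for this example), not a production router:

```python
import random

class EpsilonGreedyRouter:
    """Illustrative epsilon-greedy bandit over model variants."""

    def __init__(self, variants: list[str], epsilon: float = 0.1) -> None:
        self.epsilon = epsilon
        self.counts = {v: 0 for v in variants}
        self.successes = {v: 0 for v in variants}

    def _rate(self, v: str) -> float:
        # Optimistic 1.0 for unseen variants so each gets tried at least once.
        return self.successes[v] / self.counts[v] if self.counts[v] else 1.0

    def choose(self) -> str:
        if random.random() < self.epsilon:
            return random.choice(list(self.counts))  # explore
        return max(self.counts, key=self._rate)      # exploit best observed

    def record(self, variant: str, success: bool) -> None:
        self.counts[variant] += 1
        self.successes[variant] += int(success)

# Simulated traffic: model_b has a slightly higher true success rate.
random.seed(0)
router = EpsilonGreedyRouter(["model_a", "model_b"])
for _ in range(1000):
    v = router.choose()
    router.record(v, random.random() < (0.92 if v == "model_a" else 0.94))
print(router.counts)  # traffic concentrates on the better-performing variant
```

This also motivates the "high traffic scenarios" caveat: with few requests, the observed success rates are too noisy for the exploit step to be reliable.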

Phase 9: Code Quality & Polish

Estimated addition: Throughout all phases

  • Improve all code examples

    • Add comprehensive error handling
    • Add structured logging statements
    • Add type hints for all functions
    • Add docstrings
    • Include requirements.txt snippets for each exercise
  • Add callout boxes (using mkdocs admonitions)

    • !!! tip for common pitfalls
    • !!! note for GCP-specific features
    • !!! warning for statistical errors to avoid
  • Ensure consistent formatting

    • All exercises follow same structure
    • Consistent code style (Black formatted)
    • Proper markdown formatting

Estimated Impact

| Section               | Current Lines | Estimated Addition | Total Lines |
|-----------------------|---------------|--------------------|-------------|
| Introduction          | 22            | +30                | ~52         |
| A/B Testing           | 82            | +80                | ~162        |
| Canary                | 19            | +60                | ~79         |
| Shadow                | 41            | +70                | ~111        |
| Comparison (NEW)      | 0             | +30                | ~30         |
| Best Practices (NEW)  | 0             | +40                | ~40         |
| Knowledge Check       | 18            | +20                | ~38         |
| Advanced Topics (NEW) | 0             | +40                | ~40         |
| **TOTAL**             | 190           | +370               | ~560        |

Implementation Order

  1. Phase 1-2 (Introduction & A/B Testing) - Most incomplete, foundation
  2. Phase 3-4 (Canary & Shadow) - Build on A/B concepts
  3. Phase 5 (Comparison) - Ties strategies together
  4. Phase 6-7 (Best Practices & Knowledge Check) - Practical application
  5. Phase 8 (Advanced Topics) - Optional enrichment
  6. Phase 9 (Polish) - Final touches throughout

Success Criteria

The module will be considered complete when:

  • ✅ All three strategies have clear 2-3 paragraph conceptual explanations
  • ✅ Each strategy has 2-3 complete, working exercises with full instructions
  • ✅ Exercises use simple ML models (MNIST-based) appropriate for course
  • ✅ A/B testing includes moderate statistical coverage with Python examples
  • ✅ All placeholder/incomplete content is filled or removed
  • ✅ Comparison, best practices, and advanced topics sections exist
  • ✅ All code examples include error handling, logging, and type hints
  • ✅ Knowledge check table is fixed and includes scenario questions
  • ✅ Module follows same style/format as other course modules

Related Files

  • Main file: s7_deployment/deployment_testing.md
  • Related modules:
    • s7_deployment/testing_apis.md (prerequisite)
    • s7_deployment/apis.md (prerequisite)
    • s7_deployment/cloud_deployment.md (prerequisite)
  • Figures needed: All referenced diagrams should exist in ../figures/

Notes

  • This is a substantial completion effort (~370 lines of new content)
  • Each phase can be implemented independently in separate PRs if preferred
  • All exercises should be tested on actual GCP to ensure they work
  • Consider creating sample MNIST models for exercises in a separate directory
  • Statistical examples should use commonly available Python packages (scipy, numpy)
