Goal: Execute the schema-optimization workflow end-to-end and understand how all components work together.
Time: 20 minutes
Prerequisites:
- Read GUIDE-00-START-HERE.md
- Basic familiarity with bash and JSON

You'll learn:
- How to invoke the orchestrator
- How phases chain together
- How validation gates work
- How verification scripts run
- How to inspect session outputs
Create a mock BigQuery schema export:
cd workspace/lab/schema-optimization/
# Create test input directory
mkdir -p test-data/
# Create sample schema file
cat > test-data/users.json <<'EOF'
{
"table": "users",
"schema": [
{"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
{"name": "email", "type": "STRING", "mode": "REQUIRED"},
{"name": "created_at", "type": "TIMESTAMP", "mode": "REQUIRED"},
{"name": "legacy_id", "type": "STRING", "mode": "NULLABLE"},
{"name": "old_email", "type": "STRING", "mode": "NULLABLE"},
{"name": "deprecated_field", "type": "STRING", "mode": "NULLABLE"},
{"name": "notes", "type": "STRING", "mode": "NULLABLE"}
]
}
EOF
cat > test-data/orders.json <<'EOF'
{
"table": "orders",
"schema": [
{"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
{"name": "user_id", "type": "INTEGER", "mode": "REQUIRED"},
{"name": "total", "type": "FLOAT", "mode": "REQUIRED"},
{"name": "created_at", "type": "TIMESTAMP", "mode": "REQUIRED"},
{"name": "temp_flag", "type": "BOOLEAN", "mode": "NULLABLE"},
{"name": "internal_notes", "type": "STRING", "mode": "NULLABLE"}
]
}
EOF
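Before moving on, you can sanity-check the mock schemas with jq. A sketch, shown against an inline sample; in the lab you would run the same filter over the files in test-data/:

```shell
# Count NULLABLE fields in a schema export (inline sample shown here).
sample='{"table":"users","schema":[
  {"name":"id","type":"INTEGER","mode":"REQUIRED"},
  {"name":"notes","type":"STRING","mode":"NULLABLE"}]}'
printf '%s' "$sample" | jq -r \
  '"\(.table): \([.schema[] | select(.mode == "NULLABLE")] | length) nullable field(s)"'
```

Against the two files above, the same filter reports 4 nullable fields for users and 2 for orders.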
echo "✅ Test data created in test-data/"

Before running the full workflow, verify the script works:
# Run script
./scripts/analyze_field_utilization.sh test-data/ test-output/
# Check output
jq . test-output/field_utilization_report.json
# Verify structure
jq '.results.field_usage_breakdown.unused_fields' test-output/field_utilization_report.json

Expected:
- Script completes without errors
- JSON output is valid
- Lists unused fields (legacy_id, old_email, deprecated_field, temp_flag)
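To make that last check executable rather than eyeballed, here is a hedged sketch. The jq path mirrors the one used in the verification step above; it runs against an inline sample report so the logic is clear on its own:

```shell
# Assert the four known-unused fields appear in the report.
# Shown against an inline sample; in the lab, point $report at
# test-output/field_utilization_report.json instead.
report=$(mktemp)
printf '%s' '{"results":{"field_usage_breakdown":{"unused_fields":["legacy_id","old_email","deprecated_field","temp_flag"]}}}' > "$report"
for f in legacy_id old_email deprecated_field temp_flag; do
  if jq -e --arg f "$f" \
      '.results.field_usage_breakdown.unused_fields | index($f) != null' \
      "$report" >/dev/null; then
    echo "✅ $f flagged as unused"
  else
    echo "❌ $f not flagged"
  fi
done
rm -f "$report"
```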
Since we're in a learning lab, we'll simulate the orchestrator by running phases manually:
Create input file for Phase 1:
cat > phase1-input.json <<EOF
{
"skill_dir": "$(pwd)",
"session_dir": "$(pwd)/reports/runs/manual-test-$(date +%Y-%m-%d_%H%M%S)",
"reference_path": "$(pwd)/references/01-phase-1.md",
"input_folder": "$(pwd)/test-data",
"extraction_type": "bigquery_json"
}
EOF
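Before handing this input to an agent, a quick pre-flight check can catch key-name typos early. A sketch; the required-key list is taken from the file just created:

```shell
# Verify phase1-input.json carries every required key.
req='["skill_dir","session_dir","reference_path","input_folder","extraction_type"]'
if jq -e --argjson req "$req" '($req - keys) == []' phase1-input.json >/dev/null 2>&1; then
  echo "✅ phase1-input.json has all required keys"
else
  echo "❌ phase1-input.json is missing required keys (or is not valid JSON)"
fi
```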
# Create session directory
SESSION_DIR=$(jq -r '.session_dir' phase1-input.json)
mkdir -p "$SESSION_DIR"
echo "Session directory: $SESSION_DIR"

Now you would invoke the Phase 1 agent with this input.
For this exercise, let's manually create what Phase 1 would produce:
cat > "$SESSION_DIR/01-initial-schema-analysis.md" <<'EOF'
# Phase 1: Initial Schema Analysis
**Session:** manual-test-2025-01-15_143022
**Generated:** 2025-01-15 14:30:22 CST
**Input:** test-data/
**Extraction Type:** bigquery_json
---
## Executive Summary
- Analyzed 2 tables with 13 total fields
- Identified standard naming patterns (id, created_at)
- Found 4 fields with legacy/deprecated naming
- 46% of fields are nullable
---
## Schema Overview
### Tables
- Total tables: 2
- Table size range: 6-7 fields per table
- Average fields per table: 6.5
### Fields
- Total fields across all tables: 13
- NULLABLE fields: 6 (46%)
- REQUIRED fields: 7 (54%)
### Data Types
| Type      | Count | Percentage |
| --------- | ----- | ---------- |
| STRING    | 6     | 46%        |
| INTEGER   | 3     | 23%        |
| FLOAT     | 1     | 8%         |
| TIMESTAMP | 2     | 15%        |
| BOOLEAN   | 1     | 8%         |
---
## Key Findings (Machine-Readable)
```json
{
"total_tables": 2,
"total_fields": 13,
"nullable_pct": 46.2,
"issues_count": 4
}
```
---
*Generated by Phase 1 Agent*
EOF
cat > "$SESSION_DIR/phase1-output.json" <<EOF
{
"status": "complete",
"report_path": "$SESSION_DIR/01-initial-schema-analysis.md",
"schema_summary": {
"total_tables": 2,
"total_fields": 13,
"key_findings": [
"2 tables analyzed",
"13 total fields",
"4 legacy/deprecated fields detected"
]
}
}
EOF
echo "✅ Phase 1 simulated"

# Check JSON is valid
jq . "$SESSION_DIR/phase1-output.json"
# Check report exists
ls -lh "$SESSION_DIR/01-initial-schema-analysis.md"
# Verify required keys
jq '.status, .report_path, .schema_summary' "$SESSION_DIR/phase1-output.json"

Expected: All checks pass ✅
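The three checks above are exactly what the orchestrator's validation gate does, and they can be sketched as one reusable function (illustrative, not the orchestrator's actual code):

```shell
# Gate a phase's output: JSON parses, required key present, report file exists.
validate_phase_output() {
  local out_json="$1"       # path to the phase's JSON output
  local required_key="$2"   # key that must be present, e.g. schema_summary
  jq -e . "$out_json" >/dev/null 2>&1 \
    || { echo "❌ invalid JSON: $out_json"; return 1; }
  jq -e ".${required_key}" "$out_json" >/dev/null 2>&1 \
    || { echo "❌ missing key: $required_key"; return 1; }
  local report
  report=$(jq -r '.report_path' "$out_json")
  [ -f "$report" ] \
    || { echo "❌ report file missing: $report"; return 1; }
  echo "✅ gate passed: $out_json"
}
```

Called as `validate_phase_output "$SESSION_DIR/phase1-output.json" schema_summary`, it prints the first failing check, or a pass message if all three hold.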
Now let's simulate Phase 4, which runs the actual script:
# Phase 4 input includes script path
cat > phase4-input.json <<EOF
{
"skill_dir": "$(pwd)",
"session_dir": "$SESSION_DIR",
"reference_path": "$(pwd)/references/04-verify-with-script.md",
"phase2_report_path": "$SESSION_DIR/02-field-utilization-analysis.md",
"phase3_report_path": "$SESSION_DIR/03-impact-assessment.md",
"input_folder": "$(pwd)/test-data",
"script_path": "$(pwd)/scripts/analyze_field_utilization.sh",
"output_folder_path": "$SESSION_DIR/script-output"
}
EOF
# Run the verification script
SCRIPT_PATH=$(jq -r '.script_path' phase4-input.json)
INPUT_FOLDER=$(jq -r '.input_folder' phase4-input.json)
OUTPUT_FOLDER=$(jq -r '.output_folder_path' phase4-input.json)
mkdir -p "$OUTPUT_FOLDER"
echo "Running verification script..."
"$SCRIPT_PATH" "$INPUT_FOLDER" "$OUTPUT_FOLDER"
echo "✅ Script completed"
# Inspect script output
jq . "$OUTPUT_FOLDER/field_utilization_report.json"

Expected:
- Script runs successfully
- Finds unused fields: legacy_id, old_email, deprecated_field, temp_flag
- JSON output is valid
Look at what was created:
tree "$SESSION_DIR"
# Should show:
# reports/runs/manual-test-2025-01-15_143022/
# ├── 01-initial-schema-analysis.md
# ├── phase1-output.json
# └── script-output/
# └── field_utilization_report.json

Read the reports:
# Phase 1 report
cat "$SESSION_DIR/01-initial-schema-analysis.md"
# Script results
jq '.results.field_usage_breakdown' "$SESSION_DIR/script-output/field_utilization_report.json"

Review what just happened:
- Orchestrator creates session directory → reports/runs/<timestamp>/
- Phase 1 runs → Writes 01-initial-schema-analysis.md, returns JSON
- Orchestrator validates Phase 1 → Checks JSON, verifies file exists
- Phase 4 runs → Executes analyze_field_utilization.sh, writes verification report
- Orchestrator validates Phase 4 → Checks JSON, verifies script output
- All artifacts preserved → Can inspect session directory later
Key insight: Every phase produces evidence (files) + structured output (JSON).
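Put together, the orchestrator's control flow amounts to a short loop. A minimal sketch, where `run_phase` is a stub standing in for the real agent invocation:

```shell
# Minimal orchestrator loop sketch. run_phase is a stub; the real
# orchestrator would launch the phase agent and capture its JSON output.
run_phase() {
  printf '{"status":"complete","phase":%s}\n' "$1"
}

for phase in 1 4; do
  output=$(run_phase "$phase")
  # Validation gate: structured output must parse and report success.
  if echo "$output" | jq -e '.status == "complete"' >/dev/null; then
    echo "Phase $phase: gate passed"
  else
    echo "Phase $phase: gate failed, stopping"; break
  fi
done
```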
Try breaking things to see validation in action:
# Simulate Phase 1 returning JSON but not writing file
rm "$SESSION_DIR/01-initial-schema-analysis.md"
# Try to validate
if [ -f "$(jq -r '.report_path' "$SESSION_DIR/phase1-output.json")" ]; then
echo "✅ Validation passed"
else
echo "❌ Validation failed: Report file missing"
fi

# Simulate Phase 1 returning malformed JSON
echo "This is not JSON" > "$SESSION_DIR/phase1-bad-output.txt"
# Try to parse
if jq . "$SESSION_DIR/phase1-bad-output.txt" 2>/dev/null; then
echo "✅ Valid JSON"
else
echo "❌ Invalid JSON"
fi

# Simulate Phase 1 missing schema_summary
cat > "$SESSION_DIR/phase1-incomplete.json" <<EOF
{
"status": "complete",
"report_path": "$SESSION_DIR/01-initial-schema-analysis.md"
}
EOF
# Try to extract required key
if jq -e '.schema_summary' "$SESSION_DIR/phase1-incomplete.json" >/dev/null 2>&1; then
echo "✅ Has schema_summary"
else
echo "❌ Missing schema_summary"
fi

# Remove test artifacts
rm -rf test-data/ test-output/ phase*-input.json
# Keep session directory for inspection
echo "Session preserved at: $SESSION_DIR"

What you learned:
✅ How to prepare test data
✅ How to run verification script standalone
✅ How phases produce reports + JSON
✅ How orchestrator validates outputs
✅ How session directories preserve evidence
✅ How to debug validation failures
Next:
- Exercise 2: Modify the verification script to add new checks
- Exercise 3: Add a Phase 6 to the workflow
- Exercise 4: Build your own 3-phase workflow from scratch

Completed Exercise 1: Run the Reference Workflow