
# Exercise 1: Run the Reference Workflow

**Goal:** Execute the schema-optimization workflow end to end and understand how all of its components work together.

**Time:** 20 minutes

**Prerequisites:**

- Read GUIDE-00-START-HERE.md
- Basic familiarity with bash and JSON

## What You'll Learn

- How to invoke the orchestrator
- How phases chain together
- How validation gates work
- How verification scripts run
- How to inspect session outputs

## Step 1: Prepare Test Data

Create a mock BigQuery schema export:

```bash
cd workspace/lab/schema-optimization/

# Create test input directory
mkdir -p test-data/

# Create sample schema files
cat > test-data/users.json <<'EOF'
{
  "table": "users",
  "schema": [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "email", "type": "STRING", "mode": "REQUIRED"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "legacy_id", "type": "STRING", "mode": "NULLABLE"},
    {"name": "old_email", "type": "STRING", "mode": "NULLABLE"},
    {"name": "deprecated_field", "type": "STRING", "mode": "NULLABLE"},
    {"name": "notes", "type": "STRING", "mode": "NULLABLE"}
  ]
}
EOF

cat > test-data/orders.json <<'EOF'
{
  "table": "orders",
  "schema": [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "user_id", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "total", "type": "FLOAT", "mode": "REQUIRED"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "temp_flag", "type": "BOOLEAN", "mode": "NULLABLE"},
    {"name": "internal_notes", "type": "STRING", "mode": "NULLABLE"}
  ]
}
EOF

echo "✅ Test data created in test-data/"
```
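
Before moving on, it is worth confirming that each file parses as JSON. A hedged sketch: it assumes only the files above and a `python3` on PATH (used here so the check works even where jq is not installed; `jq .` does the same job):

```bash
# Sanity check: every schema export in test-data/ must parse as JSON.
mkdir -p test-data/
# For a self-contained demo, guarantee at least one file exists.
[ -e test-data/users.json ] || printf '{"table":"users","schema":[]}\n' > test-data/users.json

for f in test-data/*.json; do
  if python3 -m json.tool "$f" >/dev/null 2>&1; then
    echo "OK: $f"
  else
    echo "INVALID: $f" >&2
  fi
done
```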

## Step 2: Test Verification Script Standalone

Before running the full workflow, verify that the script works on its own:

```bash
# Run script
./scripts/analyze_field_utilization.sh test-data/ test-output/

# Check output
jq . test-output/field_utilization_report.json

# Verify structure
jq '.results.field_usage_breakdown.unused_fields' test-output/field_utilization_report.json
```

**Expected:**

- Script completes without errors
- JSON output is valid
- Lists unused fields (legacy_id, old_email, deprecated_field, temp_flag)
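
The expected list can be asserted mechanically. This sketch is hedged: it only greps the report for the four field names, and it fabricates a mock report when a real one is not supplied, so the snippet runs standalone. In a real run, point `REPORT` at test-output/field_utilization_report.json.

```bash
# Assert that the four known-unused fields appear in the report.
# REPORT defaults to a mock file so this sketch is self-contained.
REPORT="${REPORT:-mock_report.json}"
[ -e "$REPORT" ] || cat > "$REPORT" <<'EOF'
{"results": {"field_usage_breakdown": {"unused_fields":
  ["legacy_id", "old_email", "deprecated_field", "temp_flag"]}}}
EOF

status=0
for field in legacy_id old_email deprecated_field temp_flag; do
  if grep -q "\"$field\"" "$REPORT"; then
    echo "found: $field"
  else
    echo "MISSING: $field" >&2
    status=1
  fi
done
[ "$status" -eq 0 ] && echo "all expected unused fields present"
```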

## Step 3: Invoke Orchestrator (Manual Simulation)

Since we're in a learning lab, we'll simulate the orchestrator by running the phases manually.

### Phase 1: Initial Schema Analysis

Create the input file for Phase 1:

```bash
cat > phase1-input.json <<EOF
{
  "skill_dir": "$(pwd)",
  "session_dir": "$(pwd)/reports/runs/manual-test-$(date +%Y-%m-%d_%H%M%S)",
  "reference_path": "$(pwd)/references/01-phase-1.md",
  "input_folder": "$(pwd)/test-data",
  "extraction_type": "bigquery_json"
}
EOF

# Create session directory
SESSION_DIR=$(jq -r '.session_dir' phase1-input.json)
mkdir -p "$SESSION_DIR"

echo "Session directory: $SESSION_DIR"
```

In a real run, you would now invoke the Phase 1 agent with this input.

For this exercise, let's manually create what Phase 1 would produce:

````bash
cat > "$SESSION_DIR/01-initial-schema-analysis.md" <<'EOF'
# Phase 1: Initial Schema Analysis

**Session:** manual-test-2025-01-15_143022
**Generated:** 2025-01-15 14:30:22 CST
**Input:** test-data/
**Extraction Type:** bigquery_json

---

## Executive Summary

- Analyzed 2 tables with 13 total fields
- Identified standard naming patterns (id, created_at)
- Found 4 fields with legacy/deprecated naming
- 61% of fields are nullable

---

## Schema Overview

### Tables
- Total tables: 2
- Table size range: 6-7 fields per table
- Average fields per table: 6.5

### Fields
- Total fields across all tables: 13
- NULLABLE fields: 8 (61%)
- REQUIRED fields: 5 (39%)

### Data Types
| Type      | Count | Percentage |
| --------- | ----- | ---------- |
| STRING    | 7     | 54%        |
| INTEGER   | 3     | 23%        |
| FLOAT     | 1     | 8%         |
| TIMESTAMP | 2     | 15%        |

---

## Key Findings (Machine-Readable)
```json
{
  "total_tables": 2,
  "total_fields": 13,
  "nullable_pct": 61.5,
  "issues_count": 4
}
```
---

*Generated by Phase 1 Agent*
EOF
````

Phase 1 would return this JSON:

```bash
cat > "$SESSION_DIR/phase1-output.json" <<EOF
{
  "status": "complete",
  "report_path": "$SESSION_DIR/01-initial-schema-analysis.md",
  "schema_summary": {
    "total_tables": 2,
    "total_fields": 13,
    "key_findings": [
      "2 tables analyzed",
      "13 total fields",
      "4 legacy/deprecated fields detected"
    ]
  }
}
EOF

echo "✅ Phase 1 simulated"
```

### Validate Phase 1 Output

```bash
# Check JSON is valid
jq . "$SESSION_DIR/phase1-output.json"

# Check report exists
ls -lh "$SESSION_DIR/01-initial-schema-analysis.md"

# Verify required keys
jq '.status, .report_path, .schema_summary' "$SESSION_DIR/phase1-output.json"

```

**Expected:** All checks pass ✅
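
These three checks are exactly what an orchestrator's validation gate does, so they can be bundled into one function. A sketch under stated assumptions: the function name is illustrative, the required-key list comes from the caller, and python3 stands in for jq so the sketch has no extra dependency:

```bash
# Illustrative validation gate: the JSON parses, the report it points
# at exists, and every required top-level key is present.
validate_phase_output() {
  local out_json="$1"; shift
  python3 -m json.tool "$out_json" >/dev/null 2>&1 \
    || { echo "FAIL: invalid JSON: $out_json"; return 1; }
  local report
  report=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1])).get("report_path",""))' "$out_json")
  [ -n "$report" ] && [ -f "$report" ] \
    || { echo "FAIL: report missing: $report"; return 1; }
  local key
  for key in "$@"; do
    python3 -c 'import json,sys; sys.exit(0 if sys.argv[2] in json.load(open(sys.argv[1])) else 1)' \
      "$out_json" "$key" || { echo "FAIL: missing key: $key"; return 1; }
  done
  echo "PASS: $out_json"
}

# Demo against a minimal, hypothetical output file:
printf '{"status":"complete","report_path":"demo-report.md","schema_summary":{}}\n' > demo-output.json
touch demo-report.md
validate_phase_output demo-output.json status report_path schema_summary
# prints: PASS: demo-output.json
```

In the real workflow the call would look like `validate_phase_output "$SESSION_DIR/phase1-output.json" status report_path schema_summary`.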

### Phase 4: Verification (The Critical Phase)

Now let's simulate Phase 4, which runs the actual script:

```bash
# Phase 4 input includes script path
cat > phase4-input.json <<EOF
{
  "skill_dir": "$(pwd)",
  "session_dir": "$SESSION_DIR",
  "reference_path": "$(pwd)/references/04-verify-with-script.md",
  "phase2_report_path": "$SESSION_DIR/02-field-utilization-analysis.md",
  "phase3_report_path": "$SESSION_DIR/03-impact-assessment.md",
  "input_folder": "$(pwd)/test-data",
  "script_path": "$(pwd)/scripts/analyze_field_utilization.sh",
  "output_folder_path": "$SESSION_DIR/script-output"
}
EOF

# Run the verification script
SCRIPT_PATH=$(jq -r '.script_path' phase4-input.json)
INPUT_FOLDER=$(jq -r '.input_folder' phase4-input.json)
OUTPUT_FOLDER=$(jq -r '.output_folder_path' phase4-input.json)

mkdir -p "$OUTPUT_FOLDER"

echo "Running verification script..."
"$SCRIPT_PATH" "$INPUT_FOLDER" "$OUTPUT_FOLDER"

echo "✅ Script completed"

# Inspect script output
jq . "$OUTPUT_FOLDER/field_utilization_report.json"
```

**Expected:**

- Script runs successfully
- Finds unused fields: legacy_id, old_email, deprecated_field, temp_flag
- JSON output is valid

## Step 4: Inspect Session Directory

Look at what was created:

```bash
tree "$SESSION_DIR"

# Should show:
# reports/runs/manual-test-2025-01-15_143022/
# ├── 01-initial-schema-analysis.md
# ├── phase1-output.json
# └── script-output/
#     └── field_utilization_report.json
```

Read the reports:

```bash
# Phase 1 report
cat "$SESSION_DIR/01-initial-schema-analysis.md"

# Script results
jq '.results.field_usage_breakdown' "$SESSION_DIR/script-output/field_utilization_report.json"
```

## Step 5: Understand the Workflow Pattern

Review what just happened:

1. Orchestrator creates session directory → `reports/runs/<timestamp>/`
2. Phase 1 runs → writes `01-initial-schema-analysis.md`, returns JSON
3. Orchestrator validates Phase 1 → checks JSON, verifies the file exists
4. Phase 4 runs → executes `analyze_field_utilization.sh`, writes verification report
5. Orchestrator validates Phase 4 → checks JSON, verifies script output
6. All artifacts preserved → the session directory can be inspected later

**Key insight:** Every phase produces evidence (files) plus structured output (JSON).
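
The pattern above can be compressed into a minimal orchestrator loop. This is a sketch, not the reference implementation: `run_phase` is a stub standing in for a real agent, the JSON shape is hypothetical, and python3 replaces jq so the sketch runs anywhere.

```bash
# Minimal orchestrator sketch: run each phase, validate, stop on failure.
SESSION_DIR="reports/runs/sketch-$(date +%Y-%m-%d_%H%M%S)"
mkdir -p "$SESSION_DIR"

# Stub agent: writes a report plus a JSON output pointing at it.
run_phase() {
  echo "# Phase $1 report" > "$SESSION_DIR/phase$1-report.md"
  printf '{"status":"complete","report_path":"%s"}\n' \
    "$SESSION_DIR/phase$1-report.md" > "$SESSION_DIR/phase$1-output.json"
}

for phase in 1 2 3; do
  run_phase "$phase"
  out="$SESSION_DIR/phase$phase-output.json"
  # Gate 1: structured output must be valid JSON.
  python3 -m json.tool "$out" >/dev/null || { echo "phase $phase: bad JSON"; break; }
  # Gate 2: the evidence file the JSON points at must exist.
  report=$(python3 -c 'import json,sys; print(json.load(open(sys.argv[1]))["report_path"])' "$out")
  [ -f "$report" ] || { echo "phase $phase: report missing"; break; }
  echo "phase $phase validated"
done
```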


## Step 6: Experiment

Try breaking things to see validation in action.

### Experiment A: Missing Report File

```bash
# Simulate Phase 1 returning JSON but not writing file
rm "$SESSION_DIR/01-initial-schema-analysis.md"

# Try to validate
if [ -f "$(jq -r '.report_path' "$SESSION_DIR/phase1-output.json")" ]; then
  echo "✅ Validation passed"
else
  echo "❌ Validation failed: Report file missing"
fi
```

### Experiment B: Invalid JSON

```bash
# Simulate Phase 1 returning malformed JSON
echo "This is not JSON" > "$SESSION_DIR/phase1-bad-output.txt"

# Try to parse (suppress jq's output; we only care about the exit code)
if jq . "$SESSION_DIR/phase1-bad-output.txt" >/dev/null 2>&1; then
  echo "✅ Valid JSON"
else
  echo "❌ Invalid JSON"
fi
```

### Experiment C: Missing Required Keys

```bash
# Simulate Phase 1 missing schema_summary
cat > "$SESSION_DIR/phase1-incomplete.json" <<EOF
{
  "status": "complete",
  "report_path": "$SESSION_DIR/01-initial-schema-analysis.md"
}
EOF

# Try to extract required key
if jq -e '.schema_summary' "$SESSION_DIR/phase1-incomplete.json" >/dev/null 2>&1; then
  echo "✅ Has schema_summary"
else
  echo "❌ Missing schema_summary"
fi
```
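
Experiment C tests one key at a time; an orchestrator would sweep all of them and report every miss at once. A sketch (the key list and file name are illustrative, and python3 replaces the `jq -e` test so the snippet is self-contained):

```bash
# Check all required top-level keys in one pass, collecting misses.
printf '{"status":"complete","report_path":"x.md"}\n' > incomplete.json  # demo input

missing=""
for key in status report_path schema_summary; do
  python3 -c 'import json,sys; sys.exit(0 if sys.argv[2] in json.load(open(sys.argv[1])) else 1)' \
    incomplete.json "$key" || missing="$missing $key"
done

if [ -n "$missing" ]; then
  echo "❌ missing keys:$missing"
else
  echo "✅ all required keys present"
fi
# → ❌ missing keys: schema_summary
```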

## Step 7: Cleanup

```bash
# Remove test artifacts
rm -rf test-data/ test-output/ phase*-input.json

# Keep session directory for inspection
echo "Session preserved at: $SESSION_DIR"
```

## What You Learned

✅ How to prepare test data
✅ How to run verification script standalone
✅ How phases produce reports + JSON
✅ How orchestrator validates outputs
✅ How session directories preserve evidence
✅ How to debug validation failures


## Next Steps

- Exercise 2: Modify the verification script to add new checks
- Exercise 3: Add a Phase 6 to the workflow
- Exercise 4: Build your own 3-phase workflow from scratch


Completed Exercise 1: Run the Reference Workflow