|
| 1 | +--- |
| 2 | +alwaysApply: false |
| 3 | +description: Troubleshooting guide for Docker builds and GitHub Actions workflows |
| 4 | +--- |
| 5 | + |
| 6 | +# Troubleshooting Docker Builds and GitHub Actions Workflows |
| 7 | + |
| 8 | +## GitHub Actions Workflow Issues |
| 9 | + |
| 10 | +### Symptom: Reusable Workflow Build Job Doesn't Run |
| 11 | + |
| 12 | +**Problem:** Only the `prepare` job completes, but the `build` job never starts. |
| 13 | + |
| 14 | +**Common Causes:** |
| 15 | +1. Missing `secrets: inherit` in workflow call |
| 16 | +2. Incorrect boolean conversion for inputs |
| 17 | +3. Workflow syntax errors |
| 18 | + |
| 19 | +**Solution:** |
| 20 | +```yaml |
| 21 | +build: |
| 22 | + needs: prepare |
| 23 | + uses: ./.github/workflows/docker-build.yml |
| 24 | + permissions: |
| 25 | + contents: read |
| 26 | + packages: write |
| 27 | + secrets: inherit # ← Must have this! |
| 28 | + with: |
| 29 | + force_rebuild: ${{ github.event.inputs.force_rebuild == 'true' }} # ← Not || false |
| 30 | +``` |
| 31 | + |
| 32 | +**Verification:** |
| 33 | +```bash |
| 34 | +# Check if build job ran |
| 35 | +gh api repos/{owner}/{repo}/actions/runs/<run-id>/jobs | \ |
| 36 | + jq -r '.jobs[] | "\(.name): \(.conclusion)"' |
| 37 | + |
| 38 | +# Should show both prepare AND build jobs |
| 39 | +``` |
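The check above can be scripted so that a missing `build` job fails loudly. The following is a minimal sketch rather than part of the workflow: `check_jobs` is a hypothetical helper that reads the `name: conclusion` lines produced by the `jq` filter above. It assumes that jobs from a reusable workflow are reported as `build / <inner job>`, which may differ in your repository.

```bash
# Hypothetical helper: reads "name: conclusion" lines on stdin and reports
# every required job (given as arguments) that is missing or not successful.
# The function body is a subshell, so "exit" only leaves the function.
check_jobs() (
  input=$(cat)
  status=0
  for job in "$@"; do
    # Accept "build: success" as well as "build / inner-job: success"
    # (assumed naming for jobs of a reusable workflow).
    if ! printf '%s\n' "$input" | grep -Eq "^${job}( /[^:]*)?: success$"; then
      echo "missing or unsuccessful: ${job}"
      status=1
    fi
  done
  exit "$status"
)

# Usage (not executed here):
#   gh api repos/{owner}/{repo}/actions/runs/<run-id>/jobs |
#     jq -r '.jobs[] | "\(.name): \(.conclusion)"' |
#     check_jobs prepare build
```

A non-zero exit status means at least one of the listed jobs never ran or did not succeed.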
| 40 | + |
| 41 | +### Symptom: Boolean Input Not Working |
| 42 | + |
| 43 | +**Problem:** `force_rebuild=true` passed but cache still used. |
| 44 | + |
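| 45 | +**Cause:** Values in `github.event.inputs` are always strings, not booleans. The non-empty string `'false'` is truthy, so `'false' || false` evaluates to `'false'` rather than the boolean `false`. |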
| 46 | + |
| 47 | +**Wrong:** |
| 48 | +```yaml |
| 49 | +force_rebuild: ${{ github.event.inputs.force_rebuild || false }} |
| 50 | +``` |
| 51 | + |
| 52 | +**Correct:** |
| 53 | +```yaml |
| 54 | +force_rebuild: ${{ github.event.inputs.force_rebuild == 'true' }} |
| 55 | +``` |
| 56 | + |
| 57 | +## Docker Build Issues |
| 58 | + |
| 59 | +### Symptom: Packages Missing Despite Dockerfile Having Install Command |
| 60 | + |
| 61 | +**Problem:** Image builds successfully but installed packages are missing. |
| 62 | + |
| 63 | +**Common Causes:** |
| 64 | +1. Docker layer caching - using old cached layer |
| 65 | +2. Multi-platform build with QEMU emulation failure (ARM64) |
| 66 | +3. Installation command exits silently without error |
| 67 | + |
| 68 | +**Diagnosis:** |
| 69 | +```bash |
| 70 | +# Check build logs for layer status |
| 71 | +gh run view <run-id> --log | grep -E "CACHED|DONE" |
| 72 | + |
| 73 | +# Look for suspiciously fast completion (0.1s = cached or failed) |
| 74 | +gh run view <run-id> --log | grep -A 2 "pip install" |
| 75 | + |
| 76 | +# Verify packages in published image |
| 77 | +docker run --rm <image> pip list | grep <package> |
| 78 | +``` |
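Scanning a long log by eye for cached or suspiciously fast steps is error-prone. The sketch below filters BuildKit-style lines such as `#17 DONE 0.1s` and `#16 CACHED` from a log on stdin. `fast_steps` and its default threshold of one second are assumptions made for illustration, and the patterns only match the plain log format shown in this guide.

```bash
# Hypothetical helper: prints BuildKit steps that were cached or finished
# faster than a threshold in seconds (default 1.0). Reads a log on stdin.
fast_steps() (
  threshold=${1:-1.0}
  awk -v t="$threshold" '
    # Log lines may carry a job/step/timestamp prefix, so match anywhere.
    match($0, /#[0-9]+ CACHED/) {
      print substr($0, RSTART, RLENGTH)
      next
    }
    match($0, /#[0-9]+ DONE [0-9.]+s/) {
      hit = substr($0, RSTART, RLENGTH)
      split(hit, field, " ")
      secs = field[3]
      sub(/s$/, "", secs)          # "0.1s" -> "0.1"
      if (secs + 0 < t + 0) print hit
    }'
)

# Usage (not executed here):
#   gh run view <run-id> --log | fast_steps 1.0
```

A step listed here is not necessarily broken. It is a candidate to compare against how long the command normally takes.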
| 79 | + |
| 80 | +**Solution 1: Force Rebuild** |
| 81 | +```bash |
| 82 | +# Bypass all caching |
| 83 | +gh workflow run "🐳 Workflow Name" --ref master \ |
| 84 | + -f version=v1.0.0 \ |
| 85 | + -f force_rebuild=true |
| 86 | +``` |
| 87 | + |
| 88 | +**Solution 2: Check Platform-Specific Builds** |
| 89 | +```bash |
| 90 | +# Test amd64 |
| 91 | +docker run --platform=linux/amd64 --rm <image> <command> |
| 92 | + |
| 93 | +# Test arm64 |
| 94 | +docker run --platform=linux/arm64 --rm <image> <command> |
| 95 | +``` |
| 96 | + |
| 97 | +### Symptom: ARM64 Build Completes in 0.1s (QEMU Emulation Failure) |
| 98 | + |
| 99 | +**Problem:** ARM64 build shows `DONE 0.1s` for steps that should take minutes. |
| 100 | + |
| 101 | +**Example Log:** |
| 102 | +``` |
| 103 | +#17 [linux/arm64 5/7] RUN bash -c '. ${IDF_PATH}/export.sh && pip install ...' |
| 104 | +#17 0.057 qemu-aarch64 version 10.0.4 |
| 105 | +#17 DONE 0.1s ← Should take 1-2 minutes! |
| 106 | +``` |
| 107 | + |
| 108 | +**Cause:** QEMU emulation fails when executing shell initialization (sourcing `export.sh`), so the step exits almost immediately without installing anything. |
| 109 | + |
| 110 | +**Solutions:** |
| 111 | +1. **Use direct venv paths for installation:** |
| 112 | + ```dockerfile |
| 113 | + # Instead of: |
| 114 | + RUN bash -c '. ${IDF_PATH}/export.sh && pip install packages' |
| 115 | + |
| 116 | + # Use: |
| 117 | + RUN /opt/esp/python_env/idf5.4_py3.12_env/bin/pip install packages |
| 118 | + ``` |
| 119 | + |
| 120 | +2. **For verification, use sourced environment (amd64 only):** |
| 121 | + ```dockerfile |
| 122 | + RUN bash -c '. ${IDF_PATH}/export.sh && pytest --version' |
| 123 | + ``` |
| 124 | + |
| 125 | +3. **Build amd64 only if ARM64 fails:** |
| 126 | + ```yaml |
| 127 | + # In docker-build.yml |
| 128 | + platforms: linux/amd64 # Instead of linux/amd64,linux/arm64 |
| 129 | + ``` |
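Solution 1 pins the IDF and Python versions in the path (`idf5.4_py3.12_env`), which breaks when either version changes. One possible way to soften this is to locate the venv's `pip` by pattern instead. The sketch below assumes the venvs live directly under `/opt/esp/python_env`, as in the path above. `find_venv_pip` is a hypothetical helper, and whether a plain shell loop runs reliably under QEMU in this setup has not been verified here.

```bash
# Hypothetical helper: prints the first executable pip found in a venv
# directly under the given root (default: /opt/esp/python_env).
# The function body is a subshell, so "exit" only leaves the function.
find_venv_pip() (
  root=${1:-/opt/esp/python_env}
  for candidate in "$root"/*/bin/pip; do
    if [ -x "$candidate" ]; then
      printf '%s\n' "$candidate"
      exit 0
    fi
  done
  echo "no venv pip found under $root" >&2
  exit 1
)

# Usage inside a build script (not executed here):
#   "$(find_venv_pip)" install packages
```

If several venvs exist under the root, the first match in glob order wins, so a pinned path remains the safer choice when more than one environment is installed.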
| 130 | + |
| 131 | +### Symptom: Verification Step Fails - "No module named pytest" |
| 132 | + |
| 133 | +**Problem:** `python3 -m pytest` fails with module not found. |
| 134 | + |
| 135 | +**Common Causes:** |
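| 136 | +1. `python3` resolves to the system interpreter instead of the venv interpreter |
| 137 | +2. Packages are installed in the venv, but verification runs against the system Python |
| 138 | +3. The environment is not activated (`export.sh` not sourced) in the verification step |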
| 139 | + |
| 140 | +**Wrong Approaches:** |
| 141 | +```dockerfile |
| 142 | +# Uses the system Python, which lacks the packages installed in the venv |
| 143 | +RUN python3 -m pytest --version |
| 144 | + |
| 145 | +# Hardcoded venv path (brittle: breaks when the IDF or Python version changes) |
| 146 | +RUN /opt/esp/python_env/idf5.4_py3.12_env/bin/python3 -m pytest --version |
| 147 | +``` |
| 148 | + |
| 149 | +**Correct Approach:** |
| 150 | +```dockerfile |
| 151 | +# Activate environment first, then use simple commands |
| 152 | +RUN bash -c '. ${IDF_PATH}/export.sh && pytest --version' |
| 153 | +``` |
| 154 | + |
| 155 | +## Testing Published Images |
| 156 | + |
| 157 | +### Quick Verification Commands |
| 158 | + |
| 159 | +```bash |
| 160 | +# Check if package is installed |
| 161 | +docker run --rm <image> pip list | grep <package> |
| 162 | + |
| 163 | +# Check if command works |
| 164 | +docker run --rm <image> <command> --version |
| 165 | + |
| 166 | +# Test Python import |
| 167 | +docker run --rm <image> python3 -c 'import <module>; print(<module>.__version__)' |
| 168 | + |
| 169 | +# Interactive testing |
| 170 | +docker run -it --rm <image> |
| 171 | +``` |
| 172 | + |
| 173 | +### Platform-Specific Testing |
| 174 | + |
| 175 | +```bash |
| 176 | +# Force specific platform |
| 177 | +docker pull --platform=linux/amd64 <image> |
| 178 | +docker run --platform=linux/amd64 --rm <image> <command> |
| 179 | + |
| 180 | +# Check current platform |
| 181 | +docker run --rm <image> uname -m |
| 182 | +docker inspect <image> --format='{{.Architecture}}' |
| 183 | +``` |
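The `uname -m` output and the value passed to `--platform` use different names for the same architecture. The sketch below maps one to the other. `machine_to_platform` is a hypothetical helper and covers only the two platforms this guide builds.

```bash
# Hypothetical helper: maps a machine name (as printed by `uname -m`)
# to the matching Docker --platform value.
machine_to_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *) echo "unknown machine type: $1" >&2; return 1 ;;
  esac
}

# Usage (not executed here):
#   machine_to_platform "$(docker run --rm <image> uname -m)"
```

Comparing the result with the platform you asked for confirms that the container ran the architecture you intended to test.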
| 184 | + |
| 185 | +## Force Rebuild Pattern |
| 186 | + |
| 187 | +When a stale Docker cache is suspected, force a rebuild and verify the result: |
| 188 | + |
| 189 | +```bash |
| 190 | +# 1. Trigger force rebuild |
| 191 | +gh workflow run "🐳 Workflow Name" --ref master \ |
| 192 | + -f version=v1.0.0 \ |
| 193 | + -f force_rebuild=true |
| 194 | + |
| 195 | +# 2. Wait for workflow to start |
| 196 | +sleep 15 |
| 197 | + |
| 198 | +# 3. Get run ID |
| 199 | +RUN_ID=$(gh run list --workflow="🐳 Workflow Name" --limit 1 \ |
| 200 | + --json databaseId -q '.[0].databaseId') |
| 201 | + |
| 202 | +# 4. Monitor progress |
| 203 | +gh run watch "$RUN_ID" |
| 204 | + |
| 205 | +# 5. Check whether it succeeded |
| 206 | +gh run view "$RUN_ID" --json conclusion -q '.conclusion' |
| 207 | + |
| 208 | +# 6. Pull and test new image |
| 209 | +docker pull <image>:latest |
| 210 | +docker run --rm <image>:latest <test-command> |
| 211 | +``` |
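The fixed `sleep 15` in step 2 can be too short on a busy runner queue and is wasted time otherwise. A polling loop is one alternative. The sketch below is a generic retry helper. `wait_for_output` is a hypothetical name, and treating `null` as "not ready yet" is an assumption about what the query prints when no run exists.

```bash
# Hypothetical helper: runs a command up to <attempts> times, <delay> seconds
# apart, until it prints non-empty output; then prints that output.
# The function body is a subshell, so "exit" only leaves the function.
wait_for_output() (
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    out=$("$@" 2>/dev/null) || out=""
    if [ -n "$out" ] && [ "$out" != "null" ]; then
      printf '%s\n' "$out"
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)

# Usage (not executed here), replacing steps 2 and 3:
#   RUN_ID=$(wait_for_output 10 3 gh run list --workflow="🐳 Workflow Name" \
#     --limit 1 --json databaseId -q '.[0].databaseId')
```

Like the original step 3, this returns the most recent run of the workflow, which can be an older run if the new one has not been registered yet.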
| 212 | + |
| 213 | +## Common Pitfalls |
| 214 | + |
| 215 | +1. **Auto-builds use the cache** - Builds triggered automatically by commits reuse cached layers by default |
| 216 | +2. **Multiple layers can be cached** - A force rebuild is needed to guarantee a fresh build |
| 217 | +3. **GHCR tags may not be updated** - Compare image digests to confirm that the tag points to a new image |
| 218 | +4. **Platform mismatch** - An ARM64 Mac pulls the arm64 image by default; pass `--platform=linux/amd64` to test the amd64 image |
| 219 | +5. **Dev branch doesn't push** - Only the master branch publishes to GHCR |
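The digest check from pitfall 3 can be sketched as follows. `digest_of` and `same_image` are hypothetical helpers that compare repo digest strings of the form `name@sha256:...`, such as those printed by `docker inspect --format='{{index .RepoDigests 0}}'`. The image names in the usage comment are placeholders.

```bash
# Hypothetical helper: extracts the "sha256:..." part from a repo digest
# string such as "ghcr.io/owner/image@sha256:abc123".
digest_of() {
  case "$1" in
    *@sha256:*) printf '%s\n' "${1##*@}" ;;
    *) echo "no digest in: $1" >&2; return 1 ;;
  esac
}

# Succeeds when both strings carry the same digest; exits with 2 when
# either string has no digest. The body is a subshell, so "exit" only
# leaves the function.
same_image() (
  a=$(digest_of "$1") || exit 2
  b=$(digest_of "$2") || exit 2
  [ "$a" = "$b" ]
)

# Usage (not executed here):
#   before=$(docker inspect --format='{{index .RepoDigests 0}}' <image>:latest)
#   docker pull <image>:latest
#   after=$(docker inspect --format='{{index .RepoDigests 0}}' <image>:latest)
#   same_image "$before" "$after" && echo "tag still points at the old image"
```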
| 220 | + |
| 221 | +## Debugging Checklist |
| 222 | + |
| 223 | +When packages are missing from the published image: |
| 224 | + |
| 225 | +- [ ] Check that the Dockerfile lists the package in its `pip install` command |
| 226 | +- [ ] Verify workflow ran successfully (both prepare AND build jobs) |
| 227 | +- [ ] Check build logs for "CACHED" vs actual execution |
| 228 | +- [ ] Look for 0.1s completion times (indicates cache or QEMU failure) |
| 229 | +- [ ] Test both platforms for a multi-platform build |
| 230 | +- [ ] Trigger force rebuild to bypass cache |
| 231 | +- [ ] Pull fresh image and test with exact commands |
| 232 | +- [ ] Verify image digest changed after rebuild |