Skip to content

Commit 748137a

Browse files
committed
Add comprehensive troubleshooting guide for Docker and GitHub Actions
- Create troubleshooting-docker-workflows.mdc rule - Document reusable workflow issues (secrets, permissions, boolean conversion) - Document Docker cache and QEMU emulation debugging - Add platform-specific testing commands - Include force rebuild pattern and verification checklist - Capture all debugging patterns from pytest-embedded fix session
1 parent 061436a commit 748137a

File tree

1 file changed

+232
-0
lines changed

1 file changed

+232
-0
lines changed
Lines changed: 232 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,232 @@
1+
---
2+
alwaysApply: false
3+
description: Troubleshooting guide for Docker builds and GitHub Actions workflows
4+
---
5+
6+
# Troubleshooting Docker Builds and GitHub Actions Workflows
7+
8+
## GitHub Actions Workflow Issues
9+
10+
### Symptom: Reusable Workflow Build Job Doesn't Run
11+
12+
**Problem:** Only the `prepare` job completes, but the `build` job never starts.
13+
14+
**Common Causes:**
15+
1. Missing `secrets: inherit` in workflow call
16+
2. Incorrect boolean conversion for inputs
17+
3. Workflow syntax errors
18+
19+
**Solution:**
20+
```yaml
21+
build:
22+
needs: prepare
23+
uses: ./.github/workflows/docker-build.yml
24+
permissions:
25+
contents: read
26+
packages: write
27+
secrets: inherit # ← Must have this!
28+
with:
29+
force_rebuild: ${{ github.event.inputs.force_rebuild == 'true' }} # ← Not || false
30+
```
31+
32+
**Verification:**
33+
```bash
34+
# Check if build job ran
35+
gh api /repos/:owner/:repo/actions/runs/<run-id>/jobs | \
36+
jq -r '.jobs[] | "\(.name): \(.conclusion)"'
37+
38+
# Should show both prepare AND build jobs
39+
```
40+
41+
### Symptom: Boolean Input Not Working
42+
43+
**Problem:** `force_rebuild=true` passed but cache still used.
44+
45+
**Cause:** workflow_dispatch inputs are strings, not booleans.
46+
47+
**Wrong:**
48+
```yaml
49+
force_rebuild: ${{ github.event.inputs.force_rebuild || false }}
50+
```
51+
52+
**Correct:**
53+
```yaml
54+
force_rebuild: ${{ github.event.inputs.force_rebuild == 'true' }}
55+
```
56+
57+
## Docker Build Issues
58+
59+
### Symptom: Packages Missing Despite Dockerfile Having Install Command
60+
61+
**Problem:** Image builds successfully but installed packages are missing.
62+
63+
**Common Causes:**
64+
1. Docker layer caching - using old cached layer
65+
2. Multi-platform build with QEMU emulation failure (ARM64)
66+
3. Installation command exits silently without error
67+
68+
**Diagnosis:**
69+
```bash
70+
# Check build logs for layer status
71+
gh run view <run-id> --log | grep "CACHED\|DONE"
72+
73+
# Look for suspiciously fast completion (0.1s = cached or failed)
74+
gh run view <run-id> --log | grep -A 2 "pip install"
75+
76+
# Verify packages in published image
77+
docker run --rm <image> pip list | grep <package>
78+
```
79+
80+
**Solution 1: Force Rebuild**
81+
```bash
82+
# Bypass all caching
83+
gh workflow run "🐳 Workflow Name" --ref master \
84+
-f version=v1.0.0 \
85+
-f force_rebuild=true
86+
```
87+
88+
**Solution 2: Check Platform-Specific Builds**
89+
```bash
90+
# Test amd64
91+
docker run --platform=linux/amd64 --rm <image> <command>
92+
93+
# Test arm64
94+
docker run --platform=linux/arm64 --rm <image> <command>
95+
```
96+
97+
### Symptom: ARM64 Build Completes in 0.1s (QEMU Emulation Failure)
98+
99+
**Problem:** ARM64 build shows `DONE 0.1s` for steps that should take minutes.
100+
101+
**Example Log:**
102+
```
103+
#17 [linux/arm64 5/7] RUN bash -c '. ${IDF_PATH}/export.sh && pip install ...'
104+
#17 0.057 qemu-aarch64 version 10.0.4
105+
#17 DONE 0.1s ← Should take 1-2 minutes!
106+
```
107+
108+
**Cause:** QEMU emulation fails when executing shell initialization (source, export.sh).
109+
110+
**Solutions:**
111+
1. **Use direct venv paths for installation:**
112+
```dockerfile
113+
# Instead of:
114+
RUN bash -c '. ${IDF_PATH}/export.sh && pip install packages'
115+
116+
# Use:
117+
RUN /opt/esp/python_env/idf5.4_py3.12_env/bin/pip install packages
118+
```
119+
120+
2. **For verification, use sourced environment (amd64 only):**
121+
```dockerfile
122+
RUN bash -c '. ${IDF_PATH}/export.sh && pytest --version'
123+
```
124+
125+
3. **Build amd64 only if ARM64 fails:**
126+
```yaml
127+
# In docker-build.yml
128+
platforms: linux/amd64 # Instead of linux/amd64,linux/arm64
129+
```
130+
131+
### Symptom: Verification Step Fails - "No module named pytest"
132+
133+
**Problem:** `python3 -m pytest` fails with module not found.
134+
135+
**Common Causes:**
136+
1. System python3 vs venv python3
137+
2. Packages installed in venv but verification uses system python
138+
3. Environment not activated
139+
140+
**Wrong Approaches:**
141+
```dockerfile
142+
# Uses system Python (doesn't have packages)
143+
RUN python3 -m pytest --version
144+
145+
# Hardcoded paths (not maintainable)
146+
RUN /opt/esp/python_env/idf5.4_py3.12_env/bin/python3 -m pytest --version
147+
```
148+
149+
**Correct Approach:**
150+
```dockerfile
151+
# Activate environment first, then use simple commands
152+
RUN bash -c '. ${IDF_PATH}/export.sh && pytest --version'
153+
```
154+
155+
## Testing Published Images
156+
157+
### Quick Verification Commands
158+
159+
```bash
160+
# Check if package is installed
161+
docker run --rm <image> pip list | grep <package>
162+
163+
# Check if command works
164+
docker run --rm <image> <command> --version
165+
166+
# Test Python import
167+
docker run --rm <image> python3 -c 'import <module>; print(<module>.__version__)'
168+
169+
# Interactive testing
170+
docker run -it --rm <image>
171+
```
172+
173+
### Platform-Specific Testing
174+
175+
```bash
176+
# Force specific platform
177+
docker pull --platform=linux/amd64 <image>
178+
docker run --platform=linux/amd64 --rm <image> <command>
179+
180+
# Check current platform
181+
docker run --rm <image> uname -m
182+
docker inspect <image> --format='{{.Architecture}}'
183+
```
184+
185+
## Force Rebuild Pattern
186+
187+
When Docker cache causes issues:
188+
189+
```bash
190+
# 1. Trigger force rebuild
191+
gh workflow run "🐳 Workflow Name" --ref master \
192+
-f version=v1.0.0 \
193+
-f force_rebuild=true
194+
195+
# 2. Wait for workflow to start
196+
sleep 15
197+
198+
# 3. Get run ID
199+
RUN_ID=$(gh run list --workflow="🐳 Workflow Name" --limit 1 \
200+
--json databaseId -q '.[0].databaseId')
201+
202+
# 4. Monitor progress
203+
gh run watch $RUN_ID
204+
205+
# 5. Check if succeeded
206+
gh run view $RUN_ID --json conclusion -q '.conclusion'
207+
208+
# 6. Pull and test new image
209+
docker pull <image>:latest
210+
docker run --rm <image>:latest <test-command>
211+
```
212+
213+
## Common Pitfalls
214+
215+
1. **Auto-build uses cache** - Automatic builds from commits use cache by default
216+
2. **Multiple layers can be cached** - Force rebuild is needed to ensure fresh build
217+
3. **GHCR tags aren't updated** - Check digest to verify new image is different
218+
4. **Platform mismatch** - ARM64 Mac pulls arm64 by default, may need `--platform=linux/amd64`
219+
5. **Dev branch doesn't push** - Only master branch publishes to GHCR
220+
221+
## Debugging Checklist
222+
223+
When packages are missing from published image:
224+
225+
- [ ] Check if Dockerfile has the package in pip install
226+
- [ ] Verify workflow ran successfully (both prepare AND build jobs)
227+
- [ ] Check build logs for "CACHED" vs actual execution
228+
- [ ] Look for 0.1s completion times (indicates cache or QEMU failure)
229+
- [ ] Test both platforms if multi-platform build
230+
- [ ] Trigger force rebuild to bypass cache
231+
- [ ] Pull fresh image and test with exact commands
232+
- [ ] Verify image digest changed after rebuild

0 commit comments

Comments
 (0)