- Behavior impact: only the scheduler’s discovery cadence changes; application dispatch still respects `--interval`, in-flight guards, fairness (LRU/fail-first, cooldown, per-repo-cap), and concurrency caps.
- Recommended: if startup delay is undesirable, run with `--warmup-cache=false`.
### Upgrade notes (no really, you MUST read this)
* **Attention**: By default, `argocd-image-updater` now uses the K8s API to retrieve applications, instead of the Argo CD API. Also, it is now recommended to install in the same namespace as Argo CD is running in (`argocd` by default). For existing installations, which are running in a dedicated namespace.
@@ -29,6 +41,164 @@ handling on your side.
* refactor: make argocd-image-updater-config volume mapping optional (#145)
## 2025-09-18 - Release v100.0.5a
### Fixes
- fix(git): Prevent panic in batched writer when `GetCreds` is nil or write-back method is not Git
  - Only enqueue batched writes when `wbc.Method == git`
  - Guard in `repoWriter.commitBatch` for missing `GetCreds` (skip with log)
### Tests
- test(git): Strengthen batched writer test to set `Method: WriteBackGit` and provide `GetCreds` stub, so missing-GetCreds would fail tests
### Notes
- No flags or defaults changed; safe upgrade from v100.0.4a
## 2025-09-18 - Release v100.0.4a
### Changes
- test(git): Add unit test verifying batched writer flushes per-branch (monorepo safety)
- fix(git): Guard `getWriteBackBranch` against nil Application source
- docs: Clarify `--max-concurrency=0` (auto) in README quick reference
### Notes
- All existing tests pass. No changes to defaults or flags.
## 2025-09-18 - Release v100.0.3a
### Highlights
- Continuous mode: per-app scheduling with independent timers (no full-cycle waits)
- Auto concurrency: `--max-concurrency=0` computes workers from CPUs/apps
- Robust registry auth and I/O: singleflight + retries with backoff on `/jwt/auth`, tag and manifest operations
- Git retries (env-overridable); Batched writer (disable via `GIT_BATCH_DISABLE=true`)
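
The singleflight idea named in the highlights above can be illustrated with a small, standard-library-only sketch. This is not the project's code: the `group` type, the `demo` helper, the registry host, and the token value are all hypothetical, and a real implementation would more likely use an existing singleflight package. The point is only that concurrent callers asking for the same key share one upstream fetch.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// call tracks one in-flight fetch and how many duplicate callers joined it.
type call struct {
	done chan struct{}
	val  string
	err  error
	dups int
}

// group coalesces concurrent calls that share a key into one execution.
type group struct {
	mu sync.Mutex
	m  map[string]*call
}

// do runs fn once per key at a time; callers that arrive while it is in
// flight wait for that result instead of running fn again.
func (g *group) do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		c.dups++
		g.mu.Unlock()
		<-c.done // wait for the leader's result
		return c.val, c.err
	}
	c := &call{done: make(chan struct{})}
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	close(c.done) // publish the result to all waiters
	return c.val, c.err
}

// waiters reports how many duplicate callers are currently waiting on key.
func (g *group) waiters(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.m[key]; ok {
		return c.dups
	}
	return 0
}

// demo fires n concurrent token requests and returns how often fn ran.
func demo(n int) (fetches int, tokens []string) {
	const key = "registry.example.com" // hypothetical registry host
	var g group
	var wg sync.WaitGroup
	tokens = make([]string, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			tokens[i], _ = g.do(key, func() (string, error) {
				fetches++ // only the single leader executes this
				// Hold the call open until every other caller has joined,
				// purely to make this demo deterministic.
				for g.waiters(key) < n-1 {
					time.Sleep(time.Millisecond)
				}
				return "token-abc", nil
			})
		}(i)
	}
	wg.Wait()
	return fetches, tokens
}

func main() {
	fetches, tokens := demo(8)
	fmt.Println(fetches, len(tokens), tokens[0]) // prints "1 8 token-abc"
}
```

Eight callers receive the same token while the fetch function runs once, which is the effect that matters for an auth endpoint under bursty load.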
### Docs
- docs(install): Performance flags and defaults (continuous mode, auto concurrency, JWT retry envs)
- docs(metrics): Expanded metrics section
### Tests
- test: Unit tests for transport caching, metrics wrappers, continuous scheduler basics, and end-to-end build
### Known issues
- Under very high concurrency and bursty load, upstream registry/SNAT limits may still cause intermittent timeouts. The new caps, retries, and singleflight significantly reduce impact; tune per‑registry limits and consider HTTP/2 where available.
## 2025-09-17 - Release v99.9.9 - 66de072
### New features
* feat: Reuse HTTP transports for registries with keep-alives and timeouts
* feat: Initialize registry refresh-token map to enable token reuse
* feat: Add Makefile `DOCKER` variable to support `podman`
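
As a rough illustration of the transport reuse described in the first item above, the sketch below caches one `http.Transport` per registry and TLS mode. The type names, the registry host, and every numeric limit are assumptions chosen for the example, not values taken from this release; only the `net/http` fields themselves are real.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"sync"
	"time"
)

// transportKey identifies one cached transport: registry endpoint plus TLS mode.
type transportKey struct {
	registry string
	insecure bool
}

// transportCache hands out one shared *http.Transport per key.
type transportCache struct {
	mu sync.Mutex
	m  map[transportKey]*http.Transport
}

func newTransportCache() *transportCache {
	return &transportCache{m: make(map[transportKey]*http.Transport)}
}

// get returns the cached transport for the key, creating it on first use.
func (c *transportCache) get(registry string, insecure bool) *http.Transport {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := transportKey{registry: registry, insecure: insecure}
	if t, ok := c.m[k]; ok {
		return t
	}
	t := &http.Transport{
		// Illustrative timeouts and pool sizes (assumed, not the project's defaults).
		DialContext:           (&net.Dialer{Timeout: 10 * time.Second, KeepAlive: 30 * time.Second}).DialContext,
		TLSClientConfig:       &tls.Config{InsecureSkipVerify: insecure},
		TLSHandshakeTimeout:   10 * time.Second,
		ResponseHeaderTimeout: 30 * time.Second,
		MaxIdleConns:          100,
		MaxIdleConnsPerHost:   16,
		MaxConnsPerHost:       32,
		IdleConnTimeout:       90 * time.Second,
	}
	c.m[k] = t
	return t
}

// clear drops all cached transports and closes their idle connections.
func (c *transportCache) clear() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, t := range c.m {
		t.CloseIdleConnections()
		delete(c.m, k)
	}
}

func main() {
	cache := newTransportCache()
	a := cache.get("registry.example.com", false)
	b := cache.get("registry.example.com", false)
	c := cache.get("registry.example.com", true)
	fmt.Println(a == b, a == c) // prints "true false"
}
```

Sharing a transport lets requests reuse kept-alive connections instead of dialing each time, and keying on TLS mode avoids handing an insecure transport to a caller that expects certificate verification.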
### Improvements
* perf: Cache transports per registry+TLS mode; add sensible connection/timeouts
* resiliency: Retry/backoff for registry tag listing
* resiliency: Retry/backoff for git fetch/shallow-fetch/push during write-back
### Tests/Docs
* test: Add unit tests for transport caching and token map init
* docs: Requirements/notes updates
### Upgrade notes
* None
### Bug fixes
* None
### Bugs
* Under very high concurrency (300–500) after 2–3 hours, nodes may hit ephemeral port exhaustion causing registry dials to fail:
  - This typically manifests across all registries simultaneously under heavy outbound connection churn.
  - The root cause is excessive parallel dials combined with short‑lived connections (TIME_WAIT buildup), not a specific registry outage.
  - Mitigations available in v100.0.0a: larger keep‑alive pools, a lower `MaxConnsPerHost`, and the ability to close idle connections on cache clear. Operational mitigations: reduce updater concurrency and/or per‑registry limits (e.g., 500→250; 50 rps→20–30 rps) while investigating.

Details:

- Old ports are “released” only after TIME_WAIT (2MSL) expires. With HTTP/1.1 and big bursts, you open more concurrent outbound sockets than the ephemeral range can recycle within that window, so you hit “cannot assign requested address” even though old sockets eventually close.
- Why it still happens under 250/100 RPS:
  - Each new dial consumes a unique local ephemeral port to the same dst tuple. TIME_WAIT lasts ~60–120s (kernel dependent). Bursty concurrency plus a short interval means you outpace reuse.
  - Go HTTP/1.1 doesn’t pipeline; reuse works only if there’s an idle kept‑alive socket. If many goroutines need sockets at once, you dial anyway.
  - Often compounded by SNAT limits at the node (Kubernetes egress): the per‑dst NAT port cap can exhaust even faster.
- How to confirm quickly:
  - Check TIME_WAIT to the registry IP:port (`ss` prints the state as `TIME-WAIT`): `ss -antp | grep :5000 | grep TIME-WAIT | wc -l`
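
To put rough numbers on the port budget described above, the following back-of-the-envelope sketch divides the ephemeral port range by the TIME_WAIT duration. The range 32768–60999 is the usual Linux default and is an assumption about the node; the function name is hypothetical. The result is a ceiling for one source IP talking to one destination IP:port when every request opens a fresh connection that the client then closes.

```go
package main

import "fmt"

// maxDialRate returns the sustained new-connections-per-second ceiling for a
// single (source IP, destination IP, destination port) tuple: each closed
// connection parks its local port in TIME_WAIT, so at most one port range's
// worth of dials fits into one TIME_WAIT window.
func maxDialRate(portLow, portHigh, timeWaitSeconds int) float64 {
	ports := portHigh - portLow + 1
	return float64(ports) / float64(timeWaitSeconds)
}

func main() {
	// Assumed values: default Linux ephemeral range, TIME_WAIT of 60s and 120s.
	fmt.Printf("%.0f dials/s at 60s TIME_WAIT\n", maxDialRate(32768, 60999, 60))   // prints "471 dials/s at 60s TIME_WAIT"
	fmt.Printf("%.0f dials/s at 120s TIME_WAIT\n", maxDialRate(32768, 60999, 120)) // prints "235 dials/s at 120s TIME_WAIT"
}
```

These figures are averages over a full TIME_WAIT window. Bursts can exhaust the range much sooner, and a per-destination SNAT cap on the node can be far smaller than the local port range, which is consistent with failures appearing even at reduced request rates.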