Commit aa327d5

Add apt-retry wrapper for transient apt mirror failures (#882)
### What

Add a bash wrapper script that retries apt commands with decorrelated jitter backoff (up to 5 attempts). Apply it to all `apt-get update` and `apt-get install` commands in the Dockerfile builder stages and the dependencies script.

### Why

@sisuresh raised today that he'd seen intermittent failures relating to apt installs. Docker builds intermittently fail due to transient apt mirror issues. Analysis of recent failed runs found the following apt-related failures:

| Run | Stage | Error Type | Details |
|-----|-------|------------|---------|
| [20799747898](https://github.com/stellar/quickstart/actions/runs/20799747898) | `apt-get install` | 404 Not Found | `libglib2.0-0t64_2.80.0-6ubuntu3.5_arm64.deb` |
| [20793261714](https://github.com/stellar/quickstart/actions/runs/20793261714) | `apt-get install` | 404 Not Found | `libxslt1.1_1.1.39-0exp1ubuntu0.24.04.3_amd64.deb` |
| [19283455787](https://github.com/stellar/quickstart/actions/runs/19283455787) | `apt-get update` | Size mismatch | `Packages.gz - Mirror sync in progress?` |
| [19276873082](https://github.com/stellar/quickstart/actions/runs/19276873082) | `apt-get update` | Size mismatch | `Packages.gz - Mirror sync in progress?` |

There are two failure patterns:

1. **404 during install** - The package index is stale. `apt-get update` ran and cached version info, but by the time install runs, the mirror has newer versions and old .deb files are gone.
2. **Size mismatch during update** - The mirror is actively syncing while `apt-get update` is running.

Both are transient and benefit from retries. The 404 errors during install require re-running `apt-get update` before retrying install, which is why this PR wraps the entire `apt-get update && apt-get install` sequence together rather than wrapping each command separately.

Close #881
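The reason for retrying `apt-get update && apt-get install` as one unit can be illustrated with a small simulation. This is a sketch and not part of the commit: `fake_update`, `fake_install`, `retry`, and the version counters are hypothetical stand-ins, no real apt command is run, and the retry loop has no backoff.

```bash
#!/usr/bin/env bash
# Hypothetical simulation of a stale package index; no real apt is involved.

reset_state() {
  mirror_version=1     # version the mirror currently serves
  index_version=0      # version recorded in the local package index
  mirror_bumps_left=1  # the mirror publishes one new version mid-build
}

# Stand-in for `apt-get update`: refresh the local index from the mirror.
fake_update() { index_version=$mirror_version; }

# Stand-in for `apt-get install`: fails (like a 404) when the index is stale.
fake_install() {
  if [ "$mirror_bumps_left" -gt 0 ]; then
    mirror_bumps_left=$((mirror_bumps_left - 1))
    mirror_version=$((mirror_version + 1))  # the old .deb is gone
  fi
  [ "$index_version" -eq "$mirror_version" ]
}

# Simplified retry loop (assumption: no backoff, unlike apt-retry).
retry() {
  local attempts=$1 i
  shift
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then return 0; fi
  done
  return 1
}

update_and_install() { fake_update && fake_install; }

# Case 1: update once, then retry only the install step.
reset_state
fake_update
if retry 3 fake_install; then separate_result=ok; else separate_result=failed; fi

# Case 2: retry the update and install steps together.
reset_state
if retry 3 update_and_install; then combined_result=ok; else combined_result=failed; fi

echo "install retried alone:  $separate_result"
echo "update+install retried: $combined_result"
```

In this simulation the install-only retry keeps failing because the index is never refreshed, while the combined retry succeeds on its second attempt once `fake_update` has picked up the new version.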
1 parent d08c68f commit aa327d5

4 files changed, +56 -11 lines changed


.github/workflows/internal-build.yml

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ on:
       cache_id:
         description: "A value insert into cache keys to namespace cache usage, or invalidate it by incrementing"
         type: "string"
-        default: 18
+        default: 19
       cache_prefix:
         description: "A prefix added to all cache keys generated by this workflow"
         type: "string"

Dockerfile

Lines changed: 10 additions & 5 deletions
@@ -40,12 +40,13 @@ COPY --from=stellar-xdr-builder /usr/local/cargo/bin/stellar-xdr /stellar-xdr
 FROM ubuntu:24.04 AS stellar-core-builder
 
 ENV DEBIAN_FRONTEND=noninteractive
-RUN apt-get update && \
+COPY apt-retry /usr/local/bin/
+RUN apt-retry sh -c 'apt-get update && \
     apt-get -y install iproute2 procps lsb-release \
     git build-essential pkg-config autoconf automake libtool \
     bison flex sed perl libpq-dev parallel \
     clang-20 libc++abi-20-dev libc++-20-dev \
-    postgresql curl jq
+    postgresql curl jq'
 
 ARG CORE_REPO
 ARG CORE_REF
@@ -96,7 +97,8 @@ ENV RUSTUP_HOME=/rust/.rust
 ENV PATH="/usr/local/go/bin:$CARGO_HOME/bin:${PATH}"
 ENV DEBIAN_FRONTEND=noninteractive
 
-RUN apt-get update && apt-get install -y build-essential jq && apt-get clean
+COPY apt-retry /usr/local/bin/
+RUN apt-retry sh -c 'apt-get update && apt-get install -y build-essential jq' && apt-get clean
 RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain $RUST_TOOLCHAIN_VERSION
 
 RUN make build-stellar-rpc
@@ -110,7 +112,8 @@ COPY --from=stellar-rpc-builder /go/src/github.com/stellar/stellar-rpc/stellar-r
 FROM golang:1.24-trixie AS stellar-horizon-builder
 
 ENV DEBIAN_FRONTEND=noninteractive
-RUN apt-get update && apt-get -y install jq
+COPY apt-retry /usr/local/bin/
+RUN apt-retry sh -c 'apt-get update && apt-get -y install jq'
 
 ARG HORIZON_REPO
 ARG HORIZON_REF
@@ -134,7 +137,8 @@ COPY --from=stellar-horizon-builder /stellar-horizon /stellar-horizon
 FROM golang:1.24-trixie AS stellar-friendbot-builder
 
 ENV DEBIAN_FRONTEND=noninteractive
-RUN apt-get update && apt-get -y install jq
+COPY apt-retry /usr/local/bin/
+RUN apt-retry sh -c 'apt-get update && apt-get -y install jq'
 
 ARG FRIENDBOT_REPO
 ARG FRIENDBOT_REF
@@ -206,6 +210,7 @@ EXPOSE 8100
 EXPOSE 11625
 EXPOSE 11626
 
+COPY apt-retry /usr/local/bin/
 ADD dependencies /
 RUN /dependencies
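The `sh -c '...'` form used in these `RUN` lines passes the whole compound command to the wrapper as a single argument, so the inner shell re-evaluates the `&&` chain on every attempt. The following sketch shows that mechanism in isolation. It is an illustration under stated assumptions: `retry` is a hypothetical, simplified loop without backoff (not the `apt-retry` script itself), and `counter_file` is a made-up temp file used to make the inner command fail twice before succeeding.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins; nothing here touches apt.

counter_file=$(mktemp)
echo 0 > "$counter_file"
export counter_file  # visible to the inner `sh -c` process

# Simplified retry loop (assumption: up to 5 attempts, no backoff).
retry() {
  local attempt
  for attempt in 1 2 3 4 5; do
    if "$@"; then return 0; fi
  done
  return 1
}

# The quoted string is one argument; the inner sh runs the full
# chain again on each attempt, just like update && install.
retry sh -c '
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  echo "attempt $n" && [ "$n" -ge 3 ]
'
status=$?

attempts=$(cat "$counter_file")
rm -f "$counter_file"
echo "exit status $status after $attempts attempts"
```

By contrast, an unquoted `retry cmd1 && cmd2` would be split by the outer shell, so only `cmd1` would be retried and `cmd2` would run once afterwards.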

apt-retry

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+set -e
+
+# Retry wrapper for apt commands to handle transient mirror failures.
+# Uses decorrelated jitter: delay = rand(base, 3 * prev_delay), capped at max_delay
+# Usage: apt-retry apt-get update && apt-retry apt-get install -y <packages>
+
+RED='\033[0;31m'
+NC='\033[0m' # No Color
+
+max_attempts=5
+base_delay=2
+max_delay=20
+
+# Decorrelated jitter: random value between base_delay and 3 * previous delay, capped
+next_delay() {
+  local prev=$1
+  local range=$((3 * prev - base_delay + 1))
+  local next=$((base_delay + RANDOM % range))
+  if [ $next -gt $max_delay ]; then
+    next=$max_delay
+  fi
+  echo $next
+}
+
+delay=$base_delay
+attempt=1
+
+while [ $attempt -le $max_attempts ]; do
+  if "$@"; then
+    exit 0
+  fi
+  if [ $attempt -lt $max_attempts ]; then
+    echo -e "${RED}apt command failed (attempt $attempt/$max_attempts), retrying in ${delay}s...${NC}"
+    sleep $delay
+  fi
+  attempt=$((attempt + 1))
+  delay=$(next_delay $delay)
+done
+
+echo -e "${RED}apt command failed after $max_attempts attempts${NC}"
+exit 1
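To get a feel for the backoff schedule, the sketch below reuses the `next_delay` logic from the script with the same `base_delay` and `max_delay` values and prints one possible sequence of sleeps. It is an illustration only: the `schedule` array and the loop around it are assumptions added here, and the printed values differ from run to run because they depend on `$RANDOM`.

```bash
#!/usr/bin/env bash
# Illustrative only: prints one possible backoff schedule without sleeping.

base_delay=2
max_delay=20

# Same formula as in apt-retry: a random value between base_delay and
# 3 * previous delay, capped at max_delay.
next_delay() {
  local prev=$1
  local range=$((3 * prev - base_delay + 1))
  local next=$((base_delay + RANDOM % range))
  if [ "$next" -gt "$max_delay" ]; then
    next=$max_delay
  fi
  echo "$next"
}

# With 5 attempts there are at most 4 sleeps; the first is always base_delay.
delay=$base_delay
schedule=()
for sleep_index in 1 2 3 4; do
  schedule+=("$delay")
  delay=$(next_delay "$delay")
done

echo "sleeps between attempts (seconds): ${schedule[*]}"
```

With these settings the four sleeps are bounded above by 2, 6, 18, and 20 seconds, so a run that fails all five attempts waits between 8 and 46 seconds in total.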

dependencies

Lines changed: 3 additions & 5 deletions
@@ -5,19 +5,17 @@ set -e
 export DEBIAN_FRONTEND=noninteractive
 
 # Add PostgreSQL APT repository for PostgreSQL 14
-apt-get update
-apt-get install -y curl ca-certificates lsb-release
+apt-retry sh -c 'apt-get update && apt-get install -y curl ca-certificates lsb-release'
 install -d /usr/share/postgresql-common/pgdg
 curl -o /usr/share/postgresql-common/pgdg/apt.postgresql.org.asc --fail https://www.postgresql.org/media/keys/ACCC4CF8.asc
 . /etc/os-release
 sh -c "echo 'deb [signed-by=/usr/share/postgresql-common/pgdg/apt.postgresql.org.asc] https://apt.postgresql.org/pub/repos/apt ${VERSION_CODENAME}-pgdg main' > /etc/apt/sources.list.d/pgdg.list"
 
-apt-get update
-apt-get install -y curl apt-transport-https \
+apt-retry sh -c 'apt-get update && apt-get install -y curl apt-transport-https \
     postgresql-client-14 postgresql-14 postgresql-contrib \
     sudo supervisor psmisc \
     nginx rsync jq golang-github-pelletier-go-toml netcat-openbsd \
-    libunwind8 sqlite3 libc++abi1-20 libc++1-20
+    libunwind8 sqlite3 libc++abi1-20 libc++1-20'
 apt-get clean
 rm -rf /var/lib/apt/lists/*
