Commit 2854448

feat(docker): automatic host setup for ScyllaDB with optional XFS support (#4475)
## Summary

- Added automatic host system configuration for ScyllaDB via a setup container
- Fixed critical AIO (Asynchronous I/O) limits that were causing ScyllaDB to fail
- XFS filesystem is now completely optional - provides performance benefits but not required
- All system parameters are dynamically calculated based on available resources

## Key Issue Resolved

ScyllaDB was failing with `io_setup: Resource temporarily unavailable` due to insufficient AIO slots. The real issue was not XFS but the host's AIO limits being too low (65536 vs required 88208+).

## Changes

### 🚀 Major Features

- **New `scylla-setup` container** that runs before ScyllaDB to configure host system:
  - Uses Ubuntu 24.04 with bash for reliability
  - Runs in privileged mode to modify host kernel parameters
  - Dynamically calculates all values based on CPU cores and memory
  - Always completes successfully (warnings for failed settings)
- **Dynamic parameter calculation**:
  - AIO limits: `cores × shards × 65536 + 50% buffer`
  - Network settings: Scaled with CPU cores (somaxconn, syn_backlog)
  - Memory settings: Based on available RAM (vm.max_map_count)
  - File descriptors: 200K per core (minimum 1M)
  - Network buffers: 1% of RAM (16MB-128MB range)
- **XFS is now optional**:
  - Removed `--skip-xfs-check` flag (unnecessary for optional feature)
  - XFS path validation provides information only, never blocks deployment
  - Script continues with Docker volumes if XFS not configured
  - Clear messaging that XFS is optional, not required

### 🔧 System Optimizations

The setup container configures:

- ✅ AIO max-nr (critical for ScyllaDB)
- ✅ Network settings (somaxconn, TCP buffers, SYN cookies)
- ✅ Memory settings (swappiness, dirty ratios, max_map_count)
- ✅ Transparent Huge Pages disabled
- ✅ CPU governor to performance mode
- ✅ NUMA balancing disabled (ScyllaDB manages it)
- ✅ File descriptor limits

### 📚 Documentation

- Updated help text to show XFS as optional
- Added clear instructions for XFS setup using LABEL/UUID
- Removed confusing `--skip-xfs-check` option
- Log output improvements to stderr for clean command substitution

## Test Plan

- [x] Verify scylla-setup container runs and configures host
- [x] Test ScyllaDB starts successfully without XFS
- [x] Confirm AIO limits are properly set
- [x] Validate dynamic calculation based on system resources
- [x] Test with XFS path shows informational messages only
- [x] Ensure deployment never blocks on XFS issues
- [x] Verify backward compatibility

## Context

The deployment was failing because:

1. ScyllaDB requires high AIO limits (async I/O operations)
2. Default Linux systems have low AIO limits (65536)
3. ScyllaDB with `--developer-mode 0` enforces production requirements

The solution automatically configures the host system for optimal ScyllaDB performance while making XFS completely optional.

🤖 Generated with [Claude Code](https://claude.ai/code)
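
To make the AIO sizing rule concrete, here is a minimal sketch of the calculation for a few core counts. It assumes one shard per core and the 1048576 floor that the setup script in this commit applies; `calc_aio_required` is a hypothetical helper name used only for illustration and does not appear in the commit.

```shell
#!/bin/bash
# Sketch of the sizing rule: cores x shards x 65536, plus a 50% buffer,
# never below 1048576. calc_aio_required is a hypothetical helper name.
calc_aio_required() {
  local cores=$1
  local shards_per_core=${2:-1} # assumption: one shard per core by default
  local per_shard=65536
  local floor=1048576
  # Integer arithmetic: multiplying by 3 and dividing by 2 adds the 50% buffer
  local required=$((cores * shards_per_core * per_shard * 3 / 2))
  if [ "$required" -lt "$floor" ]; then
    required=$floor
  fi
  echo "$required"
}

for cores in 4 16 64; do
  echo "${cores} cores -> fs.aio-max-nr $(calc_aio_required "$cores")"
done
```

With these inputs the floor dominates on small hosts: 4 cores yields 1048576, while 16 and 64 cores yield 1572864 and 6291456.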
1 parent 0a29274 commit 2854448

3 files changed: +568 -12 lines changed

docker/compose-scylla-setup.sh

Lines changed: 327 additions & 0 deletions
@@ -0,0 +1,327 @@
#!/bin/bash
# ScyllaDB Host Setup Script
# This script runs in a privileged container to configure host system for ScyllaDB
# Values are calculated dynamically based on system resources per ScyllaDB recommendations
# With --persist flag, also creates /etc/sysctl.d/99-scylladb.conf for permanent settings

# Don't exit on errors - we want to configure what we can
set +e

# Check for --persist flag
PERSIST_MODE=false
if [ "$1" = "--persist" ]; then
  PERSIST_MODE=true
  echo "=== Running in PERSIST mode - will create /etc/sysctl.d/99-scylladb.conf ==="
fi

echo "=== ScyllaDB Host System Setup ==="
if [ "$PERSIST_MODE" = true ]; then
  echo "Mode: Persistent (creating sysctl.d configuration)"
else
  echo "Mode: Temporary (runtime only)"
fi
echo ""

# Check that the host's /proc is mounted; every setting below is written through it
if [ ! -d /host/proc ]; then
  echo "ERROR: Host /proc not mounted. Please ensure volumes are configured:"
  echo " volumes:"
  echo " - /proc:/host/proc"
  echo " - /sys:/host/sys"
  if [ "$PERSIST_MODE" = true ]; then
    echo " - /etc/sysctl.d:/host/sysctl.d"
  fi
  exit 1
fi

# Check sysctl.d mount if in persist mode
if [ "$PERSIST_MODE" = true ] && [ ! -d /host/sysctl.d ]; then
  echo "ERROR: Host /etc/sysctl.d not mounted. Please ensure volume is configured:"
  echo " volumes:"
  echo " - /etc/sysctl.d:/host/sysctl.d"
  exit 1
fi

# Test if we have write access to critical parameters
# (writes the current value back, so nothing changes if the write succeeds)
CAN_WRITE=true
if [ -f /host/proc/sys/fs/aio-max-nr ]; then
  CURRENT_AIO=$(cat /host/proc/sys/fs/aio-max-nr 2>/dev/null)
  if ! echo "$CURRENT_AIO" >/host/proc/sys/fs/aio-max-nr 2>/dev/null; then
    echo "WARNING: Cannot write to /host/proc/sys - some settings may not be applied"
    echo "Container may need privileged mode for full configuration"
    CAN_WRITE=false
  fi
else
  echo "WARNING: Cannot access /host/proc/sys/fs/aio-max-nr"
  CAN_WRITE=false
fi

# Get system information
echo "=== System Information ==="
CPU_CORES=$(nproc)
MEMORY_KB=$(grep MemTotal /host/proc/meminfo | awk '{print $2}')
MEMORY_GB=$((MEMORY_KB / 1024 / 1024))
echo "CPU Cores: ${CPU_CORES}"
echo "Total Memory: ${MEMORY_GB} GB"
echo ""

# Calculate dynamic values based on system resources
# Based on ScyllaDB documentation and best practices

# AIO: ScyllaDB recommends 65536 * number of shards
# ScyllaDB typically uses 1 shard per core, but we'll add buffer
SHARDS_PER_CORE=1
ESTIMATED_SHARDS=$((CPU_CORES * SHARDS_PER_CORE))
AIO_PER_SHARD=65536
# Add 50% buffer for safety
AIO_REQUIRED=$((ESTIMATED_SHARDS * AIO_PER_SHARD * 3 / 2))
# Ensure minimum of 1048576 as recommended
if [ "$AIO_REQUIRED" -lt 1048576 ]; then
  AIO_REQUIRED=1048576
fi

# Network settings based on cores
# ScyllaDB handles high connection counts
SOMAXCONN=$((CPU_CORES * 1024))
if [ "$SOMAXCONN" -lt 4096 ]; then
  SOMAXCONN=4096
elif [ "$SOMAXCONN" -gt 65535 ]; then
  SOMAXCONN=65535
fi

TCP_MAX_SYN_BACKLOG=$((CPU_CORES * 512))
if [ "$TCP_MAX_SYN_BACKLOG" -lt 4096 ]; then
  TCP_MAX_SYN_BACKLOG=4096
elif [ "$TCP_MAX_SYN_BACKLOG" -gt 65535 ]; then
  TCP_MAX_SYN_BACKLOG=65535
fi

# Memory map count: ScyllaDB recommends high values
# Base calculation: 65530 per GB of RAM
VM_MAX_MAP_COUNT=$((MEMORY_GB * 65530))
if [ "$VM_MAX_MAP_COUNT" -lt 1048575 ]; then
  # ScyllaDB minimum recommendation
  VM_MAX_MAP_COUNT=1048575
fi

# Network buffer sizes: 1% of RAM but capped
NET_MEM_BYTES=$((MEMORY_KB * 1024 / 100))
if [ "$NET_MEM_BYTES" -gt 134217728 ]; then
  # Cap at 128MB
  NET_MEM_BYTES=134217728
elif [ "$NET_MEM_BYTES" -lt 16777216 ]; then
  # Minimum 16MB for good performance
  NET_MEM_BYTES=16777216
fi

# File descriptors: ScyllaDB needs many
FD_LIMIT=$((CPU_CORES * 200000))
if [ "$FD_LIMIT" -lt 1000000 ]; then
  FD_LIMIT=1000000
fi

# Function to safely set sysctl values
# Only raises a value: a current value at or above the target is left unchanged
set_sysctl() {
  local param=$1
  local value=$2
  local description=$3
  # fs.aio-max-nr -> /host/proc/sys/fs/aio-max-nr
  local path="/host/proc/sys/${param//.//}"
  local current_value

  current_value=$(cat "$path" 2>/dev/null || echo "0")
  # Treat an empty or non-numeric reading as 0 so the comparison below cannot error out
  case "$current_value" in
    '' | *[!0-9]*) current_value=0 ;;
  esac

  if [ "$current_value" -lt "$value" ]; then
    echo "Setting ${param} from ${current_value} to ${value} (${description})"
    if echo "$value" >"$path" 2>/dev/null; then
      echo "${param} = ${value}"
    else
      echo "⚠ Could not set ${param} (may need host reboot)"
    fi
  else
    echo "${param} = ${current_value} (already sufficient, recommended: ${value})"
  fi
}

echo "=== Configuring Host System Parameters ==="
echo ""

# 1. AIO - Most critical for ScyllaDB
echo "1. Asynchronous I/O (AIO) Configuration:"
echo " Calculated: ${AIO_REQUIRED} (${CPU_CORES} cores × ${SHARDS_PER_CORE} shard × ${AIO_PER_SHARD} + buffer)"
set_sysctl "fs.aio-max-nr" "${AIO_REQUIRED}" "async I/O operations"
echo ""

# 2. Network settings
echo "2. Network Settings (based on ${CPU_CORES} cores):"
set_sysctl "net.core.somaxconn" "${SOMAXCONN}" "socket listen backlog"
set_sysctl "net.ipv4.tcp_max_syn_backlog" "${TCP_MAX_SYN_BACKLOG}" "TCP SYN queue"

# Disable SYN cookies (ScyllaDB recommendation)
echo "0" >/host/proc/sys/net/ipv4/tcp_syncookies 2>/dev/null &&
  echo "✓ Disabled TCP SYN cookies (for performance)" ||
  echo "⚠ Could not disable TCP SYN cookies"

# Network buffers
set_sysctl "net.core.rmem_max" "${NET_MEM_BYTES}" "receive buffer max"
set_sysctl "net.core.wmem_max" "${NET_MEM_BYTES}" "send buffer max"
set_sysctl "net.core.netdev_max_backlog" "10000" "network device backlog"

# TCP memory and buffers (space-separated values need special handling)
echo "Setting TCP buffer sizes..."
echo "4096 87380 ${NET_MEM_BYTES}" >/host/proc/sys/net/ipv4/tcp_rmem 2>/dev/null ||
  echo "⚠ Could not set TCP receive buffers"
echo "4096 65536 ${NET_MEM_BYTES}" >/host/proc/sys/net/ipv4/tcp_wmem 2>/dev/null ||
  echo "⚠ Could not set TCP send buffers"

# TCP tuning for low latency
set_sysctl "net.ipv4.tcp_timestamps" "1" "TCP timestamps"
set_sysctl "net.ipv4.tcp_sack" "1" "TCP selective ack"
set_sysctl "net.ipv4.tcp_window_scaling" "1" "TCP window scaling"
echo ""

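# 3. Memory settings
echo "3. Memory Settings (based on ${MEMORY_GB} GB RAM):"
set_sysctl "vm.max_map_count" "${VM_MAX_MAP_COUNT}" "memory map areas"
# The next three are upper bounds, so write them directly: set_sysctl only raises
# values and would leave a higher current value (e.g. swappiness 60) in place
echo "0" >/host/proc/sys/vm/swappiness 2>/dev/null &&
  echo "✓ vm.swappiness = 0 (disable swap usage)" ||
  echo "⚠ Could not set vm.swappiness"
echo "5" >/host/proc/sys/vm/dirty_ratio 2>/dev/null &&
  echo "✓ vm.dirty_ratio = 5 (dirty page ratio)" ||
  echo "⚠ Could not set vm.dirty_ratio"
echo "2" >/host/proc/sys/vm/dirty_background_ratio 2>/dev/null &&
  echo "✓ vm.dirty_background_ratio = 2 (background dirty ratio)" ||
  echo "⚠ Could not set vm.dirty_background_ratio"
echo ""

# 4. File descriptors
echo "4. File Descriptor Limits:"
set_sysctl "fs.file-max" "${FD_LIMIT}" "max file descriptors"
set_sysctl "fs.nr_open" "${FD_LIMIT}" "max open files per process"
echo ""
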
# 5. Transparent Huge Pages (THP) - ScyllaDB requires this disabled
echo "5. Transparent Huge Pages (THP):"
# The active mode is the bracketed entry, e.g. "always madvise [never]"
THP_STATUS=$(grep -o '\[.*\]' /host/sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null | tr -d '[]')
# The pipeline's exit status is that of tr, so fall back explicitly when nothing was read
THP_STATUS=${THP_STATUS:-unknown}
if [ "$THP_STATUS" != "never" ]; then
  echo "Warning: THP is '${THP_STATUS}', ScyllaDB requires 'never'"
  if echo never >/host/sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null; then
    echo "✓ THP disabled"
  else
    echo "⚠ Could not disable THP (may need kernel boot parameter)"
  fi

  # Also disable defrag
  echo never >/host/sys/kernel/mm/transparent_hugepage/defrag 2>/dev/null || true
else
  echo "✓ THP is already disabled"
fi
echo ""

# 6. CPU frequency scaling (for consistent performance)
echo "6. CPU Performance Settings:"
if [ -d /host/sys/devices/system/cpu/cpu0/cpufreq ]; then
  # Track failures so success is only reported when every core accepted the write
  GOVERNOR_FAILED=false
  for gov in /host/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo "performance" >"$gov" 2>/dev/null || GOVERNOR_FAILED=true
  done
  if [ "$GOVERNOR_FAILED" = false ]; then
    echo "✓ Set CPU governor to performance mode"
  else
    echo "⚠ Could not set CPU governor on all cores"
  fi
else
  echo "! CPU frequency scaling not available"
fi
echo ""

# 7. NUMA settings if available
echo "7. NUMA Settings:"
if [ -f /host/proc/sys/kernel/numa_balancing ]; then
  echo "0" >/host/proc/sys/kernel/numa_balancing 2>/dev/null &&
    echo "✓ Disabled NUMA balancing (ScyllaDB manages NUMA)" ||
    echo "⚠ Could not disable NUMA balancing"
else
  echo "! NUMA not available on this system"
fi
echo ""

# Create persistent sysctl configuration if requested
if [ "$PERSIST_MODE" = true ]; then
  echo ""
  echo "=== Creating Persistent Configuration ==="

  SYSCTL_CONFIG="/host/sysctl.d/99-scylladb.conf"

  # Check if config already exists
  if [ -f "$SYSCTL_CONFIG" ]; then
    echo "Found existing $SYSCTL_CONFIG"
    echo "Backing up to ${SYSCTL_CONFIG}.bak"
    cp "$SYSCTL_CONFIG" "${SYSCTL_CONFIG}.bak"
  fi

  # Test the write itself: a file left over from an earlier run must not count as success
  if cat >"$SYSCTL_CONFIG" <<EOF; then
# ScyllaDB Performance Tuning
# Generated on $(date)
# System: ${CPU_CORES} cores, ${MEMORY_GB} GB RAM
# Apply with: sysctl -p /etc/sysctl.d/99-scylladb.conf

# Asynchronous I/O
fs.aio-max-nr = ${AIO_REQUIRED}

# Network settings
net.core.somaxconn = ${SOMAXCONN}
net.ipv4.tcp_max_syn_backlog = ${TCP_MAX_SYN_BACKLOG}
net.ipv4.tcp_syncookies = 0
net.core.rmem_max = ${NET_MEM_BYTES}
net.core.wmem_max = ${NET_MEM_BYTES}
net.core.netdev_max_backlog = 10000
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1

# Memory settings
vm.max_map_count = ${VM_MAX_MAP_COUNT}
vm.swappiness = 0
vm.dirty_ratio = 5
vm.dirty_background_ratio = 2

# File descriptors
fs.file-max = ${FD_LIMIT}
fs.nr_open = ${FD_LIMIT}

# NUMA (if available)
kernel.numa_balancing = 0

# TCP buffer sizes (min, default, max in bytes; sysctl accepts space-separated values)
net.ipv4.tcp_rmem = 4096 87380 ${NET_MEM_BYTES}
net.ipv4.tcp_wmem = 4096 65536 ${NET_MEM_BYTES}
EOF
    echo "✓ Created $SYSCTL_CONFIG successfully"
    echo ""
    echo "To apply on host system, run:"
    echo " sudo sysctl --system"
    echo ""
    echo "Note: Some settings require additional steps:"
    echo " - Transparent Huge Pages: Add 'transparent_hugepage=never' to kernel boot parameters"
    echo " - CPU governor: Set via cpupower or similar tool"
  else
    echo "⚠ Failed to create $SYSCTL_CONFIG"
  fi
fi

# Create a flag file to indicate setup is complete
touch /tmp/scylla-setup-complete

echo "=== Configuration Summary ==="
echo "System Resources:"
echo " - CPU Cores: ${CPU_CORES}"
echo " - Memory: ${MEMORY_GB} GB"
echo ""
# These are the calculated targets; the warnings above show any that could not be written
echo "Target Settings:"
echo " - AIO max-nr: ${AIO_REQUIRED}"
echo " - Socket backlog: ${SOMAXCONN}"
echo " - TCP SYN backlog: ${TCP_MAX_SYN_BACKLOG}"
echo " - VM max map count: ${VM_MAX_MAP_COUNT}"
echo " - Network buffers: $((NET_MEM_BYTES / 1024 / 1024)) MB"
echo " - File descriptors: ${FD_LIMIT}"
echo ""

echo "=== ScyllaDB Host Setup Complete ==="
echo ""
if [ "$CAN_WRITE" = true ]; then
  echo "All parameters have been configured based on system resources."
  echo "ScyllaDB containers can now start with optimal performance settings."
else
  echo "Some parameters could not be written; review the warnings above."
fi
echo ""

exit 0
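
Because the script always exits 0, a separate check can confirm that the critical values actually took effect. The following is a minimal sketch, not part of the commit: `sysctl_path`, `check_min`, and the `PROC_ROOT` variable are hypothetical names, and the thresholds are the floors used in the script above (real targets scale with cores and memory).

```shell
#!/bin/bash
# Hypothetical post-run check: compare live kernel values against minimum targets.
# PROC_ROOT can be overridden (e.g. /host/proc/sys inside the setup container).
PROC_ROOT=${PROC_ROOT:-/proc/sys}

# Map a sysctl name to its file, e.g. fs.aio-max-nr -> $PROC_ROOT/fs/aio-max-nr
sysctl_path() {
  echo "${PROC_ROOT}/${1//.//}"
}

# Returns 0 if the value meets the minimum, 1 if it is too low, 2 if unreadable
check_min() {
  local param=$1 want=$2 have
  have=$(cat "$(sysctl_path "$param")" 2>/dev/null)
  case "$have" in
    '' | *[!0-9]*)
      echo "SKIP ${param} (unreadable)"
      return 2
      ;;
  esac
  if [ "$have" -ge "$want" ]; then
    echo "OK   ${param} = ${have}"
  else
    echo "LOW  ${param} = ${have} (want >= ${want})"
    return 1
  fi
}

# Floors taken from the script above
failures=0
for spec in "fs.aio-max-nr 1048576" "vm.max_map_count 1048575" "net.core.somaxconn 4096"; do
  set -- $spec
  check_min "$1" "$2" || failures=$((failures + 1))
done
echo "${failures} parameter(s) need attention"
```

Keeping the check separate from the setup script preserves the "always completes" behavior described in the commit message while still giving a non-zero signal to anything that wants one.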
