Commit 3586387

Add health monitoring script and enhance NGINX configuration for upstream checks
1 parent b751825 commit 3586387

4 files changed: +46 -8 lines changed

Dockerfile

Lines changed: 4 additions & 2 deletions
@@ -5,9 +5,11 @@ LABEL description="NGINX caching proxy for Owlery"

 ARG NGINX_CONF=nginx.conf.template
 COPY $NGINX_CONF /etc/nginx/nginx.conf.template
+COPY health-monitor.sh /usr/local/bin/health-monitor.sh

-RUN mkdir -p /var/cache/nginx && chown -R nginx:nginx /var/cache/nginx
+RUN mkdir -p /var/cache/nginx && chown -R nginx:nginx /var/cache/nginx && \
+chmod +x /usr/local/bin/health-monitor.sh

 EXPOSE 80

-CMD ["/bin/sh", "-c", "UPSTREAM_SERVER=${UPSTREAM_SERVER:-owl.virtualflybrain.org:80} CACHE_MAX_SIZE=${CACHE_MAX_SIZE:-20g} envsubst '${UPSTREAM_SERVER} ${CACHE_MAX_SIZE}' < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && nginx -g 'daemon off;'"]
+CMD ["/bin/sh", "-c", "UPSTREAM_SERVER=${UPSTREAM_SERVER:-owl.virtualflybrain.org:80} CACHE_MAX_SIZE=${CACHE_MAX_SIZE:-20g} DNS_RESOLVER=${DNS_RESOLVER:-8.8.8.8 1.1.1.1} envsubst '${UPSTREAM_SERVER} ${CACHE_MAX_SIZE} ${DNS_RESOLVER}' < /etc/nginx/nginx.conf.template > /etc/nginx/nginx.conf && /usr/local/bin/health-monitor.sh & nginx -g 'daemon off;'"]

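Review note on the new CMD: in POSIX shell, `A && B & C` groups as `{ A && B; } & C`, so the `envsubst` step and the monitor are backgrounded together and `nginx` may start before the rendered config exists. The sketch below demonstrates the ordering with stand-in functions; `render_config`, `monitor`, and `start_nginx` are hypothetical names used only for illustration, not part of the commit.

```shell
#!/bin/sh
# Stand-ins for the real commands (hypothetical, for illustration only).
render_config() { sleep 1; echo "config rendered"; }   # plays the role of envsubst
monitor()       { :; }                                  # the real script loops forever
start_nginx()   { echo "nginx started"; }

# Same shape as the committed CMD: `&` backgrounds the whole `&&` list,
# so start_nginx runs without waiting for render_config.
racy=$( { render_config && monitor & start_nginx; wait; } )

# One possible alternative: render first, then background only the monitor.
ordered=$( { render_config && { monitor & start_nginx; }; wait; } )

printf 'racy:\n%s\n\nordered:\n%s\n' "$racy" "$ordered"
```

In the first variant "nginx started" is printed before "config rendered"; in the second the order is reversed. Note also that the `VAR=value` prefixes in the CMD apply only to `envsubst`, so the defaults are not visible to `health-monitor.sh` unless the variables are already set in the container environment.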

README.md

Lines changed: 14 additions & 5 deletions
@@ -30,22 +30,29 @@ services:
 - "80:80"
 environment:
 - UPSTREAM_SERVER=owl:8080 # For production with owl service
-- CACHE_MAX_SIZE=1000g # 1TB cache size for high-traffic deployments
+- CACHE_MAX_SIZE=1t # 1TB cache size for high-traffic deployments
+- DNS_RESOLVER=169.254.169.250 # Rancher internal DNS (check /etc/resolv.conf)
 ```

 ### Health Check

 ```bash
 curl http://localhost/health
-# Returns: OK
+# Returns: upstream response or "UPSTREAM_UNAVAILABLE" (503) if upstream is down
+# Includes X-Upstream-Status header showing actual upstream response code
 ```

+The health endpoint now proxies to the upstream server to verify connectivity. If the upstream is unavailable, it returns 503 with "UPSTREAM_UNAVAILABLE".
+
+**Health Monitoring**: A background process logs warnings every 5 minutes if the upstream server becomes unreachable, but the container continues running to serve cached content.
+
 ## Configuration

 ### Environment Variables

 - `UPSTREAM_SERVER`: Backend server URL (default: `owl.virtualflybrain.org:80`)
-- `CACHE_MAX_SIZE`: Maximum cache size on disk (default: `20g`, accepts NGINX size units: `k`/`K` kilobytes, `m`/`M` megabytes, `g`/`G` gigabytes)
+- `CACHE_MAX_SIZE`: Maximum cache size on disk (default: `20g`, accepts NGINX size units like `1t` for 1TB)
+- `DNS_RESOLVER`: DNS resolver servers (default: `8.8.8.8 1.1.1.1`, space-separated list). Check `cat /etc/resolv.conf` in your container to find the correct value for your environment.

 ### Cache Headers

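Review note on `CACHE_MAX_SIZE=1t`: the NGINX documentation lists only `k`, `m`, and `g` suffixes for sizes and offsets, so I would not assume a `t` suffix is accepted by `max_size`; running `nginx -t` against the rendered config would confirm. If it is rejected, one option is to normalize the value before `envsubst` runs. The helper below is a minimal sketch under that assumption; `normalize_cache_size` is a hypothetical name, not part of the commit.

```shell
#!/bin/sh
# normalize_cache_size VALUE (hypothetical helper): rewrites a terabyte
# suffix into gigabytes (1t -> 1024g) and passes other values through.
normalize_cache_size() {
    case $1 in
        *[tT]) echo "$(( ${1%[tT]} * 1024 ))g" ;;
        *)     echo "$1" ;;
    esac
}

normalize_cache_size 1t     # prints 1024g
normalize_cache_size 20g    # prints 20g
```

The conversion uses a factor of 1024 because NGINX size suffixes are binary multiples; the previous README value of `1000g` was slightly smaller than that.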

@@ -69,7 +76,8 @@ The proxy adds helpful headers to responses:
 - **Base image**: nginx:1.26-alpine
 - **Cache storage**: `/var/cache/nginx/owlery` with 1:2 directory levels
 - **Cache zone**: 100MB in-memory metadata zone
-- **Max cache size**: 20GB on disk (configurable via `CACHE_MAX_SIZE` environment variable, supports k/K, m/M, g/G units)
+- **Max cache size**: 20GB on disk (configurable via `CACHE_MAX_SIZE` environment variable)
+- **Health monitoring**: Background process checks upstream connectivity every 5 minutes and logs warnings

 ### Caching Behavior

@@ -83,9 +91,10 @@ The proxy adds helpful headers to responses:
 ### Networking

 - **Listen port**: 80
+- **DNS resolver**: Configurable via `DNS_RESOLVER` (default: Google Public DNS with 30s TTL for fast upstream IP updates). Check `cat /etc/resolv.conf` in your container for the correct value.
 - **Host-agnostic**: Ignores Host header for routing
 - **Connection pooling**: 16 keep-alive connections to backend
-- **Timeouts**: 90s connect/read/send
+- **Timeouts**: 90s connect/read/send, 3s for health checks

 ## Build and Deployment

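Review note on `DNS_RESOLVER`: the README asks operators to read `/etc/resolv.conf` by hand. A small helper could produce the space-separated list that the `resolver` directive expects. This is a sketch only; `resolvers_from` is a hypothetical name, and the sample addresses are made-up examples.

```shell
#!/bin/sh
# resolvers_from FILE (hypothetical helper): prints the nameserver entries
# of a resolv.conf-style file as one space-separated line.
resolvers_from() {
    awk '$1 == "nameserver" { printf "%s%s", sep, $2; sep = " " } END { print "" }' "$1"
}

# Demo against a sample file rather than the real /etc/resolv.conf.
sample=$(mktemp)
cat > "$sample" <<'EOF'
search rancher.internal
nameserver 169.254.169.250
nameserver 10.0.0.2
EOF
resolvers_from "$sample"    # prints: 169.254.169.250 10.0.0.2
rm -f "$sample"
```

In the container this could feed the default, for example `DNS_RESOLVER=${DNS_RESOLVER:-$(resolvers_from /etc/resolv.conf)}`. As far as I know, a hostname in an `upstream` block is resolved when the configuration is loaded, so it is worth verifying that `valid=30s` actually yields the "fast upstream IP updates" described above.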

health-monitor.sh

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
#!/bin/sh
2+
3+
# Health monitoring script for upstream server
4+
# Logs warnings but doesn't exit container
5+
6+
UPSTREAM_HOST=$(echo $UPSTREAM_SERVER | cut -d: -f1)
7+
UPSTREAM_PORT=$(echo $UPSTREAM_SERVER | cut -d: -f2)
8+
9+
echo "Monitoring upstream server: $UPSTREAM_HOST:$UPSTREAM_PORT"
10+
11+
while true; do
12+
if nc -z -w3 $UPSTREAM_HOST $UPSTREAM_PORT 2>/dev/null; then
13+
echo "$(date): Upstream server is healthy"
14+
else
15+
echo "$(date): WARNING - Upstream server $UPSTREAM_HOST:$UPSTREAM_PORT is unreachable"
16+
fi
17+
18+
sleep 300 # Check every 5 minutes
19+
done
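
Review note on host/port parsing: `cut -d: -f2` prints the whole input when it contains no `:`, so `UPSTREAM_SERVER=owl` would yield a port of `owl`, and an unset variable yields empty values. The sketch below shows one way to parse with parameter expansion and explicit fallbacks. `parse_upstream` is a hypothetical helper; the default host is copied from the Dockerfile CMD, and falling back to port 80 is my assumption.

```shell
#!/bin/sh
# parse_upstream SERVER (hypothetical helper): prints "host port".
parse_upstream() {
    server=${1:-owl.virtualflybrain.org:80}   # default taken from the Dockerfile CMD
    host=${server%%:*}                        # text before the first colon
    case $server in
        *:*) port=${server##*:} ;;            # text after the last colon
        *)   port=80 ;;                       # assumption: plain HTTP when no port is given
    esac
    printf '%s %s\n' "$host" "$port"
}

parse_upstream "owl:8080"   # prints: owl 8080
parse_upstream "owl"        # prints: owl 80
parse_upstream ""           # prints: owl.virtualflybrain.org 80
```

The two fields can then be read with `set -- $(parse_upstream "$UPSTREAM_SERVER")` and passed to `nc` as `"$1"` and `"$2"`.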

nginx.conf.template

Lines changed: 9 additions & 1 deletion
@@ -5,6 +5,9 @@ events {
 }

 http {
+resolver ${DNS_RESOLVER} valid=30s;
+resolver_timeout 5s;
+
 upstream owlery {
 server ${UPSTREAM_SERVER};
 keepalive 16;
@@ -38,8 +41,13 @@ http {

 location /health {
 access_log off;
-return 200 "OK\n";
+proxy_pass http://owlery/;
+proxy_connect_timeout 3s;
+proxy_read_timeout 3s;
+proxy_intercept_errors on;
+error_page 500 502 503 504 =503 "UPSTREAM_UNAVAILABLE\n";
 add_header Content-Type text/plain;
+add_header X-Upstream-Status $upstream_status;
 }

 location / {
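
Review note on consuming `/health`: because the endpoint now proxies to the upstream root, its status code is whatever the upstream returns for `/`, or 503 when the upstream fails. A caller could key on the status code alone, as sketched below. I would not rely on the body or the `X-Upstream-Status` header here: NGINX documents the last `error_page` argument as a URI or named location rather than a literal body, and `add_header` without `always` is applied only to a fixed set of success and redirect codes. `classify_health` and `probe` are hypothetical helper names.

```shell
#!/bin/sh
# classify_health CODE (hypothetical helper): maps an HTTP status code
# returned by /health to a short verdict.
classify_health() {
    case $1 in
        2??|3??|4??) echo "upstream-reachable" ;;    # upstream answered, even if with 404
        503)         echo "upstream-unavailable" ;;
        000|"")      echo "proxy-unreachable" ;;     # curl reports 000 when no response arrived
        *)           echo "unexpected-$1" ;;
    esac
}

# probe URL (hypothetical helper): classifies a live endpoint; requires curl.
probe() {
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1")
    classify_health "$code"
}

classify_health 200   # prints upstream-reachable
classify_health 503   # prints upstream-unavailable
classify_health 000   # prints proxy-unreachable
```

Against a running container this would be invoked as `probe http://localhost/health`.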
