| 1 | +## Troubleshooting |
| 2 | +### Debug Image |
| 3 | + |
| 4 | +GGBridge provides two types of images for different use cases: |
| 5 | + |
| 6 | +- **Production image**: `ghcr.io/gitguardian/ggbridge:latest` - Minimal, secure image without shell access |
| 7 | +- **Debug image**: `ghcr.io/gitguardian/ggbridge:latest-shell` - Includes debugging tools |
| 8 | + |
| 9 | +**Available debug tools**: `bash`, `curl`, `net-tools`, `bind-tools`, `openssl`, `dig`, `nslookup` |
| 10 | + |
| 11 | +**How to switch to the debug image**: |
| 12 | +- **Docker Compose**: Update the image tag in `docker-compose.yaml` |
| 13 | +- **Helm**: Update the image tag in `values.yaml` |
| 14 | + |
| 15 | +### Connectivity Tests |
| 16 | + |
| 17 | +#### 1. Client-Side Healthcheck |
| 18 | + |
| 19 | +Verify basic connectivity from the client to the server: |
| 20 | + |
| 21 | +```bash |
| 22 | +kubectl exec -it $ggbridge_pod -- bash -c "curl http://127.0.0.1:9081/healthz" |
| 23 | +``` |
| 24 | + |
| 25 | +Expected output: |
| 26 | +```console |
| 27 | +OK |
| 28 | +``` |
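For scripted checks, the comparison against the expected `OK` body can be wrapped in a small helper. This is a minimal sketch: `is_healthy` is a hypothetical name, not something shipped with GGBridge, and it only inspects the text passed to it.

```bash
# Hypothetical helper (not part of GGBridge): succeeds only when the
# health endpoint body is exactly "OK".
is_healthy() {
  local body
  # Strip carriage returns, which a TTY-attached `kubectl exec -t` can add.
  body="$(printf '%s' "$1" | tr -d '\r')"
  [ "$body" = "OK" ]
}
```

A possible call, assuming the debug image so that `curl` is available in the pod: `is_healthy "$(kubectl exec "$ggbridge_pod" -- curl -s http://127.0.0.1:9081/healthz)" && echo "client healthy"`.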
| 29 | + |
| 30 | +#### 2. SOCKS Proxy Test (Server-Side) |
| 31 | +Test SOCKS proxy connectivity and DNS resolution. |
| 32 | + |
| 33 | +> [!IMPORTANT] |
| 34 | +> To run this test on the **client side**, first enable the `socks` tunnel in your `values.yaml` by adding these lines: |
| 35 | +> ```yaml |
| 36 | +> client: |
| 37 | +> tunnels: |
| 38 | +> socks: |
| 39 | +> enabled: true |
| 40 | +> ``` |
| 41 | +> Then upgrade your deployment: |
| 42 | +> ```bash |
| 43 | +> helm -n ggbridge upgrade -i ggbridge oci://ghcr.io/gitguardian/ggbridge/helm/ggbridge -f values.yaml |
| 44 | +> ``` |
| 45 | +> Finally, test the connection. By default, the service name is `ggbridge-proxy` (unlike on the server side). Only endpoints on the allowed list can be reached; for testing, you can use `https://api.gitguardian.com`. |
| 46 | +> ```bash |
| 47 | +> kubectl run debug -it --rm \ |
| 48 | +> --restart=Never \ |
| 49 | +> -n ggbridge \ |
| 50 | +> --image=nicolaka/netshoot:latest \ |
| 51 | +> -- zsh -c "curl -sILk --proxy socks5h://ggbridge-proxy.ggbridge.svc.cluster.local:1080 https://api.gitguardian.com" |
| 52 | +> ``` |
| 53 | +Quick test (HTTP status code only): |
| 54 | +
|
| 55 | +```bash |
| 56 | +curl -sLk \ |
| 57 | + -o /dev/null \ |
| 58 | + -w "%{http_code}" \ |
| 59 | + --connect-timeout 60 \ |
| 60 | + --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}" |
| 61 | +``` |
| 62 | +
|
| 63 | +Verbose test (with headers): |
| 64 | +
|
| 65 | +```bash |
| 66 | +curl -sILk --connect-timeout 60 \ |
| 67 | + --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}" |
| 68 | +``` |
| 69 | +
|
| 70 | +Real-world example: |
| 71 | +
|
| 72 | +```bash |
| 73 | +# Replace $uid with your actual bridge UID |
| 74 | +kubectl run debug -it --rm \ |
| 75 | + --restart=Never \ |
| 76 | + -n ggbridge \ |
| 77 | + --image=nicolaka/netshoot:latest \ |
| 78 | + -- zsh -c "curl -sILk --proxy socks5h://$uid.ggbridge.svc.cluster.local https://vcs.example.local" |
| 79 | +``` |
| 80 | +
|
| 81 | +Expected responses: |
| 82 | +- `200`: Success |
| 83 | +- `301/302`: Redirect |
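When the quick test runs from a script, the printed status code can be mapped to the outcomes listed above. The sketch below is illustrative only: `classify_status` is a hypothetical helper, and treating `000` as "no response" relies on curl printing `000` for `%{http_code}` when no HTTP response was received.

```bash
# Hypothetical helper: map the status printed by `curl -w "%{http_code}"`
# to the outcomes listed above.
classify_status() {
  case "$1" in
    200)     echo "success" ;;
    301|302) echo "redirect" ;;
    000)     echo "no-response" ;;  # curl prints 000 when no HTTP response arrived
    *)       echo "unexpected" ;;
  esac
}
```

For example, `classify_status "$(curl -sLk -o /dev/null -w "%{http_code}" --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}")"` prints a single word that is easy to branch on.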
| 84 | +
|
| 85 | +> [!NOTE] |
| 86 | +> The `socks5h` scheme tells the client to let the SOCKS proxy resolve hostnames (remote DNS lookup) instead of resolving them locally.
| 87 | +
|
| 88 | +#### 3. Git Repository Test (Server-Side) |
| 89 | +Test Git operations through the SOCKS proxy: |
| 90 | +
|
| 91 | +```bash |
| 92 | +git -c http.proxy="socks5h://${PROXY_HOST}" \ |
| 93 | + -c http.sslVerify=false \ |
| 94 | + -c http.timeout=30 \ |
| 95 | + ls-remote --heads "${REPO_URL_WITH_AUTH}" |
| 96 | +``` |
| 97 | +
|
| 98 | +Example with authentication: |
| 99 | +```bash |
| 100 | +git -c http.proxy="socks5h://$uid-proxy-socks:1080" \ |
| 101 | + -c http.sslVerify=false \ |
| 102 | + -c http.timeout=30 \ |
| 103 | + ls-remote --heads "https://admin:[email protected]/group1/myrepo.git" |
| 104 | +``` |
| 105 | +Expected output: List of Git branches and their commit hashes |
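To turn that output into a plain list of branch names, the `ls-remote --heads` lines (each of the form `<commit-hash><TAB>refs/heads/<branch>`) can be filtered. A minimal sketch, with `list_branches` as a hypothetical helper name:

```bash
# Hypothetical helper: read `git ls-remote --heads` output on stdin and
# print only the branch names.
list_branches() {
  # Each input line is "<commit-hash>\trefs/heads/<branch>".
  awk -F'\t' '{ sub(/^refs\/heads\//, "", $2); print $2 }'
}
```

Piping the `git ... ls-remote --heads` command above into `list_branches` gives one branch per line; an empty result with a zero exit code suggests the repository was reached but has no branches.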
| 106 | +
|
| 107 | +> [!TIP] |
| 108 | +> For a recurring check, consider using the `CronJob` probes available [here](../tests/).
| 109 | +
|
| 110 | +### Log Analysis |
| 111 | +
|
| 112 | +#### Client/Server Health Logs
| 113 | +Check nginx sidecar logs for connectivity issues: |
| 114 | +
|
| 115 | +```bash |
| 116 | +# Check specific pod logs |
| 117 | +kubectl logs -l tenant=$uid,index=$index -c nginx -n ggbridge |
| 118 | +
|
| 119 | +# Check all pods for a tenant |
| 120 | +kubectl logs -l tenant=$uid -c nginx -n ggbridge --tail=50 |
| 121 | +``` |
| 122 | +Healthy connection log example: |
| 123 | +```console |
| 124 | +health 127.0.0.1 [30/Sep/2025:12:04:38 +0000] 127.0.0.1 "GET /healthz HTTP/1.1" 200 3 "-" "Go-http-client/1.1" |
| 125 | +``` |
| 126 | +If no such lines appear, the other tunnel endpoint is not reaching this one (no connectivity).
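As a rough way to quantify this, successful health checks can be counted in a log excerpt. The sketch assumes the log line format shown in the example above; `count_health_ok` is a hypothetical helper.

```bash
# Hypothetical helper: count successful /healthz requests in nginx logs
# read from stdin (format assumed to match the example above).
count_health_ok() {
  # grep -c exits non-zero when the count is 0; keep the function's status clean.
  grep -c '"GET /healthz HTTP/[0-9.]*" 200 ' || true
}
```

For instance, `kubectl logs -l tenant=$uid -c nginx -n ggbridge --tail=50 | count_health_ok` returning `0` points to missing connectivity from the other endpoint.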
| 127 | +
|
| 128 | +#### Server-Side Proxy Logs |
| 129 | +
|
| 130 | +Monitor traffic through the SOCKS proxy: |
| 131 | +```bash |
| 132 | +kubectl logs -l app.kubernetes.io/component=proxy,tenant=$uid -c nginx -n ggbridge --tail=100 |
| 133 | +``` |
| 134 | +
|
| 135 | +Port meanings: |
| 136 | +- `8081`: Health checks |
| 137 | +- `1080`: SOCKS proxy traffic |
| 138 | +- `443`: HTTPS/TLS traffic |
| 139 | +- `80`: HTTP traffic |
| 140 | +
|
| 141 | +Log format explanation: |
| 142 | +
|
| 143 | +| Position | Value | Nginx variable | Description | Unit | |
| 144 | +| --- | --- | --- | --- | --- | |
| 145 | +| 1 | `127.0.0.1` | `$remote_addr` | Local client (health check) | IP |
| 146 | +| 2 | `[24/Sep/2025:09:46:28 +0000]` | `[$time_local]` | Connection timestamp | Date | |
| 147 | +| 3 | `TCP` | `$protocol` | Transport protocol | Protocol | |
| 148 | +| 4 | `200` | `$status` | Status code | Code | |
| 149 | +| 5 | `150` | `$bytes_sent` | Bytes sent by nginx → client | Bytes | |
| 150 | +| 6 | `102` | `$bytes_received` | Bytes received by nginx ← client | Bytes | |
| 151 | +| 7 | `0.077` | `$session_time` | Session duration | Seconds | |
| 152 | +| 8 | `"172.20.167.124:8081"` | `"$upstream_addr"` | Healthcheck backend server | IP:Port | |
| 153 | +| 9 | `"102"` | `"$upstream_bytes_sent"` | Data sent nginx → backend | Bytes | |
| 154 | +| 10 | `"150"` | `"$upstream_bytes_received"` | Data received nginx ← backend | Bytes | |
| 155 | +| 11 | `"0.000"` | `"$upstream_connect_time"` | Connection time | Seconds | |
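The table can be applied mechanically to a log line. The sketch below assumes the fields are separated by single spaces in the order listed above (so the bracketed timestamp occupies two whitespace-separated fields); the source does not show a full raw line, so treat that layout and the `parse_stream_log` name as assumptions.

```bash
# Hypothetical helper: summarize proxy log lines read from stdin.
# Assumed layout: addr [date tz] protocol status sent received session "upstream" ...
parse_stream_log() {
  awk '{
    gsub(/"/, "", $9)   # strip the quotes around the upstream address
    printf "status=%s sent=%s received=%s session=%ss upstream=%s\n", $5, $6, $7, $8, $9
  }'
}
```

Applied to a line built from the table's example values, this prints `status=200 sent=150 received=102 session=0.077s upstream=172.20.167.124:8081`.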
| 156 | +
|
| 157 | +## Client Monitoring/Alerting Guidelines |
| 158 | +### Overview |
| 159 | +
|
| 160 | +> [!NOTE] |
| 161 | +> This guide provides generic recommendations for monitoring GGBridge client health and stability. These guidelines are platform-agnostic and can be adapted to your existing monitoring infrastructure. |
| 162 | +
|
| 163 | +#### Replica Count
| 164 | +Ensure that all 3 GGBridge client deployments are properly deployed, each with 1 replica: |
| 165 | +
|
| 166 | +```console |
| 167 | +$ kubectl get deployments -n ggbridge |
| 168 | +NAME READY UP-TO-DATE AVAILABLE AGE |
| 169 | +ggbridge-client-0 1/1 1 1 25h |
| 170 | +ggbridge-client-1 1/1 1 1 25h |
| 171 | +ggbridge-client-2 1/1 1 1 25h |
| 172 | +``` |
| 173 | +
|
| 174 | +**What to monitor**: |
| 175 | +- All deployments should show 1/1 in the READY column |
| 176 | +
|
| 177 | +**Alert condition**: |
| 178 | +- Any deployment showing 0/1 or missing deployments |
| 179 | +
|
| 180 | +**Prometheus query example**: |
| 181 | +``` |
| 182 | +kube_deployment_status_replicas_ready{namespace="ggbridge", deployment=~"ggbridge-client-.*"} |
| 183 | +``` |
| 184 | +Count the deployments with the correct status (should be 3):
| 185 | +``` |
| 186 | +sum( |
| 187 | + (kube_deployment_status_replicas_ready{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and |
| 188 | + (kube_deployment_spec_replicas{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and |
| 189 | + (kube_deployment_status_replicas_available{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) |
| 190 | +) |
| 191 | +``` |
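Where Prometheus is not available, the same alert condition can be approximated from the `kubectl get deployments` output. This is a sketch under the assumptions stated in this section (three client deployments, one replica each); `check_client_deployments` is a hypothetical helper.

```bash
# Hypothetical helper: read `kubectl get deployments -n ggbridge` output on
# stdin; fail unless exactly 3 ggbridge-client deployments are 1/1 ready.
check_client_deployments() {
  awk '
    NR > 1 && $1 ~ /^ggbridge-client-/ {
      total++
      if ($2 == "1/1") { ready++ } else { print "not ready: " $1 }
    }
    END { if (total != 3 || ready != 3) exit 1 }
  '
}
```

Usage could look like `kubectl get deployments -n ggbridge | check_client_deployments || echo "ALERT"`, which also fires when a deployment is missing entirely.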
| 192 | +
|
| 193 | +#### Pod Status and Readiness |
| 194 | +Check that all pods are running and ready to accept connections: |
| 195 | +
|
| 196 | +```console |
| 197 | +$ kubectl get pods -n ggbridge |
| 198 | +NAME READY STATUS RESTARTS AGE |
| 199 | +ggbridge-client-0-76687c7f6f-h6zrj 2/2 Running 0 25h |
| 200 | +ggbridge-client-1-89abc123de-xyz45 2/2 Running 0 25h |
| 201 | +ggbridge-client-2-12def456gh-abc78 2/2 Running 0 25h |
| 202 | +``` |
| 203 | + |
| 204 | +**What to monitor**: |
| 205 | +- All pods should show 2/2 in the READY column (ggbridge + nginx containers) |
| 206 | +- STATUS should be Running |
| 207 | +- Monitor restart count - frequent restarts indicate issues |
| 208 | + |
| 209 | +**Alert conditions**: |
| 210 | +- Pod showing 1/2 ready (connection issues with server) |
| 211 | +- Pod in `CrashLoopBackOff`, `Error`, or `Pending` status |
| 212 | +- High restart count (>5 restarts in 1 hour) |
| 213 | + |
| 214 | +**Prometheus query example**: |
| 215 | +``` |
| 216 | +kube_pod_status_ready{condition="true", namespace="ggbridge", pod=~"ggbridge-client-.*"} |
| 217 | +``` |
| 218 | + |
| 219 | +#### Container Logs Analysis |
| 220 | +Monitor logs from the `ggbridge` container for connection issues: |
| 221 | + |
| 222 | +**Key error patterns to watch for**: |
| 223 | + |
| 224 | +WebSocket handshake failures (server connectivity issues): |
| 225 | + |
| 226 | +```console |
| 227 | +2025-09-30T15:35:11.627155Z ERROR tunnel{id="01999b43-6b64-7a61-bab6-6ff55b03aade" remote="127.0.0.1:8081"}: wstunnel::tunnel::client::client: failed to do websocket handshake with the server wss://jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com:443 |
| 228 | +``` |
| 229 | + |
| 230 | +**What to monitor**: |
| 231 | +- Frequency of ERROR log entries |
| 232 | +- Specific error patterns indicating connectivity issues |
| 233 | +- Connection establishment success/failure rates |
| 234 | + |
| 235 | +**Loki query example**: |
| 236 | +``` |
| 237 | +{k8s_namespace_name="ggbridge", k8s_pod_name=~"ggbridge-client-.*"} |= "ERROR" |
| 238 | +``` |
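Without a log aggregation stack, handshake failures can be tallied directly from container logs. The sketch below matches the error text shown in the example above; `handshake_failure_targets` is a hypothetical helper, and the grouping by server URL is only one possible way to summarize.

```bash
# Hypothetical helper: read ggbridge container logs on stdin and print
# "<server-url> <failure-count>" for each server with handshake failures.
handshake_failure_targets() {
  grep 'failed to do websocket handshake' \
    | grep -o 'wss://[^[:space:]]*' \
    | sort | uniq -c \
    | awk '{ print $2, $1 }'
}
```

For example, `kubectl logs deploy/ggbridge-client-0 -c ggbridge -n ggbridge --since=1h | handshake_failure_targets` gives a per-server failure count for the last hour; no output means no handshake failures were logged.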
| 239 | + |
| 240 | +#### Resource Usage |
| 241 | +Monitor pod resource consumption: |
| 242 | +```console |
| 243 | +$ kubectl top pods -n ggbridge |
| 244 | +NAME CPU(cores) MEMORY(bytes) |
| 245 | +ggbridge-client-0-76687c7f6f-h6zrj 8m 7Mi |
| 246 | +ggbridge-client-1-bd75768f4-cr59l 10m 8Mi |
| 247 | +ggbridge-client-2-689f9d7c5-bz9k5 9m 7Mi |
| 248 | +``` |
| 249 | +**What to monitor**: |
| 250 | +- CPU usage |
| 251 | +- Memory usage |
| 252 | +- Sudden spikes in resource usage |
| 253 | + |
| 254 | +**Prometheus query example**: |
| 255 | +``` |
| 256 | +# CPU (millicores) |
| 257 | +rate(container_cpu_usage_seconds_total{namespace="ggbridge", pod=~"ggbridge-client-.*", container!="POD", container!=""}[5m]) * 1000 |
| 258 | +
|
| 259 | +# Memory (MiB)
| 260 | +container_memory_working_set_bytes{namespace="ggbridge", pod=~"ggbridge-client-.*", container!="POD", container!=""} / 1024 / 1024 |
| 261 | +``` |
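For a quick manual check without Prometheus, the `kubectl top pods` output can be filtered against a memory threshold. This sketch assumes memory is reported in `Mi`, as in the sample output above; the threshold value and the `pods_over_memory` name are hypothetical.

```bash
# Hypothetical helper: read `kubectl top pods` output on stdin and print
# pods whose memory exceeds the given threshold (in Mi).
pods_over_memory() {
  local limit_mi="$1"
  awk -v limit="$limit_mi" 'NR > 1 {
    mem = $3
    sub(/Mi$/, "", mem)              # assumes values such as "7Mi"
    if (mem + 0 > limit) print $1, $3
  }'
}
```

For example, `kubectl top pods -n ggbridge | pods_over_memory 64` lists any client pod using more than 64Mi; pick a limit that reflects your own baseline.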
| 262 | + |
| 263 | +### Getting Support |
| 264 | +For technical support, please contact [[email protected]](mailto:[email protected]) with: |
| 265 | +1. Environment details: Kubernetes version, GGBridge version |
| 266 | +2. Error logs: Include relevant nginx and application logs |
| 267 | +3. Configuration: Sanitized `values.yaml` or `docker-compose.yaml` |
| 268 | +4. Test results: Output from the connectivity tests above |
| 269 | +5. Network setup: Information about firewalls, proxies, DNS configuration |