Commit 85c2318

ixxeL2097 and Frederic Spiers authored
docs(global): improve documentation about troubleshooting, logs, monitoring (#65)
Co-authored-by: Frederic Spiers <[email protected]>
1 parent ffcd995 commit 85c2318

4 files changed (+442, -1 lines changed)


README.md

Lines changed: 14 additions & 1 deletion
````diff
@@ -47,7 +47,7 @@ Deploy the ggbridge client via Docker Compose by performing the following action
 - Create `docker-compose.yml` file
 
 > [!IMPORTANT]
-> GGBridge is designed by default to work as HA, so it needs `3` client deployments to work properly. Ensure `replicas: 3` in your `docker-compose.yml` file.
+> GGBridge is designed by default to work as HA, so it needs `3` client deployments to work properly. Ensure `replicas: 3` in your `docker-compose.yml` file. Lowering the replica count may result in an unstable bridge.
 >
 > ```yaml
 > services:
@@ -179,6 +179,15 @@ helm -n ggbridge upgrade --install --create-namespace \
 -f values.yaml
 ```
 
+> [!TIP]
+> To upgrade an existing installation with new parameters, update the relevant keys in your `values.yaml` file, then run the following command:
+> ```bash
+> helm -n ggbridge upgrade -i \
+>   ggbridge oci://ghcr.io/gitguardian/ggbridge/helm/ggbridge \
+>   -f values.yaml
+> ```
+> We recommend storing the `values.yaml` file somewhere safe, such as a Git repository.
+
 ## Examples
 
 Here, you will find various usage examples of ggbridge, each example provides a step-by-step guide on how to configure and use ggbridge to establish a secure, authenticated connection between your self-hosted services and the GitGuardian platform.
@@ -187,3 +196,7 @@ Here, you will find various usage examples of ggbridge, each example provides a
 | --------------------------------------------- | --------------------------------------------- |
 | [2-way-tunneling](./examples/2-way-tunneling) | Enable client-to-server tunnels |
 | [ggscout](./examples/ggscout) | Connect ggscout with the GitGuardian platform |
+
+## Troubleshooting
+
+For troubleshooting guidance, please refer to the related [documentation](./docs/troubleshoot.md).
````

docs/troubleshoot.md

Lines changed: 269 additions & 0 deletions
## Troubleshooting

### Debug image

GGBridge provides two types of images for different use cases:

- **Production image**: `ghcr.io/gitguardian/ggbridge:latest` - Minimal, secure image without shell access
- **Debug image**: `ghcr.io/gitguardian/ggbridge:latest-shell` - Includes debugging tools

**Available debug tools**: `bash`, `curl`, `net-tools`, `bind-tools`, `openssl`, `dig`, `nslookup`

**How to switch to the debug image**:
- **Docker Compose**: Update the image tag in `docker-compose.yml`
- **Helm**: Update the image tag in `values.yaml`
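
After switching, you may want to confirm which image variant each pod is actually running. The sketch below is a minimal, unofficial helper. It assumes the `ggbridge` namespace. `list_ggbridge_images` and `is_debug_image` are hypothetical names. The second function simply treats any tag ending in `-shell` as the debug variant, following the tag naming above.

```bash
# Hypothetical helpers (not part of GGBridge): check which image variant is deployed.

# Prints "<pod> <image> [<image>...]" for every pod in the ggbridge namespace.
list_ggbridge_images() {
  kubectl get pods -n ggbridge \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.containers[*].image}{"\n"}{end}'
}

# Returns 0 when the image reference uses a "-shell" tag (debug variant), 1 otherwise.
is_debug_image() {
  case "$1" in
    *:*-shell) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires cluster access):
#   list_ggbridge_images
#   is_debug_image ghcr.io/gitguardian/ggbridge:latest-shell && echo "debug image"
```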

### Connectivity Tests

#### 1. Client-Side Healthcheck

Verify basic connectivity from the client to the server. This requires the debug image, since the command uses `bash` and `curl`:

```bash
kubectl exec -it -n ggbridge $ggbridge_pod -- bash -c "curl http://127.0.0.1:9081/healthz"
```

Expected output:
```console
OK
```
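
To repeat this healthcheck on every client pod, you can wrap the command above in a loop. This is a sketch under several assumptions: the debug image is in use (so `curl` is available), the namespace is `ggbridge`, and client pod names start with `ggbridge-client-`. The function names are hypothetical.

```bash
# Hypothetical helpers: run the client-side healthcheck on each ggbridge client pod.

# Returns 0 when the healthz response body read from stdin is exactly "OK".
healthz_body_ok() {
  local body
  body="$(cat)"
  [ "$body" = "OK" ]
}

# Checks one pod and prints "<pod> OK" or "<pod> FAILED".
check_client_pod() {
  local pod="$1"
  if kubectl exec -n ggbridge "$pod" -- curl -s --max-time 10 http://127.0.0.1:9081/healthz \
      | healthz_body_ok; then
    echo "$pod OK"
  else
    echo "$pod FAILED"
    return 1
  fi
}

# Checks every pod whose name starts with "ggbridge-client-"; returns 1 if any check fails.
check_all_client_pods() {
  local pod rc=0
  for pod in $(kubectl get pods -n ggbridge -o name | sed 's#^pod/##' | grep '^ggbridge-client-'); do
    check_client_pod "$pod" || rc=1
  done
  return "$rc"
}
```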

#### 2. SOCKS Proxy Test (Server-Side)
Test SOCKS proxy connectivity and DNS resolution.

> [!IMPORTANT]
> To run this test on the **client side**, first enable the `socks` tunnel in your `values.yaml` by adding these lines:
> ```yaml
> client:
>   tunnels:
>     socks:
>       enabled: true
> ```
> Then upgrade your deployment:
> ```bash
> helm -n ggbridge upgrade -i ggbridge oci://ghcr.io/gitguardian/ggbridge/helm/ggbridge -f values.yaml
> ```
> Finally, test the connection. By default, the client-side service name is `ggbridge-proxy`, which differs from the server side. Only endpoints in the allowed list can be reached; for testing, you can use `https://api.gitguardian.com`.
> ```bash
> kubectl run debug -it --rm \
>   --restart=Never \
>   -n ggbridge \
>   --image=nicolaka/netshoot:latest \
>   -- zsh -c "curl -sILk --proxy socks5h://ggbridge-proxy.ggbridge.svc.cluster.local:1080 https://api.gitguardian.com"
> ```

In the commands below, set `PROXY_HOST`, `PROXY_PORT` (`1080` for SOCKS traffic) and `VCS_URL` to match your environment.

Quick test (HTTP status code only):

```bash
curl -sLk \
  -o /dev/null \
  -w "%{http_code}" \
  --connect-timeout 60 \
  --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}"
```

Verbose test (with headers):

```bash
curl -sILk --connect-timeout 60 \
  --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}"
```

Real-world example:

```bash
# Replace $uid with your actual bridge UID
kubectl run debug -it --rm \
  --restart=Never \
  -n ggbridge \
  --image=nicolaka/netshoot:latest \
  -- zsh -c "curl -sILk --proxy socks5h://$uid.ggbridge.svc.cluster.local https://vcs.example.local"
```

Expected responses:
- `200`: Success
- `301/302`: Redirect

> [!NOTE]
> The `socks5h` scheme performs a remote DNS lookup: hostnames are resolved by the SOCKS proxy rather than on the machine running `curl`.
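
For scripted checks, you can map the status code printed by the quick test to the expected responses listed above. This is a minimal sketch. It assumes `PROXY_HOST`, `PROXY_PORT` and `VCS_URL` are set as in the commands above, and the function names are hypothetical. `curl` prints `000` for `%{http_code}` when no HTTP response was received, for example when the proxy is unreachable.

```bash
# Hypothetical helpers: interpret the result of the quick SOCKS test.

# Maps an HTTP status code to a short verdict.
classify_http_code() {
  case "$1" in
    200) echo "success" ;;
    301|302) echo "redirect" ;;
    000) echo "no-response" ;;   # curl received no HTTP response at all
    *) echo "unexpected" ;;
  esac
}

# Runs the quick test through the SOCKS proxy and prints "<code> <verdict>".
socks_quick_test() {
  local code
  code="$(curl -sLk -o /dev/null -w '%{http_code}' --connect-timeout 60 \
    --proxy "socks5h://${PROXY_HOST}:${PROXY_PORT}" "${VCS_URL}" || true)"
  echo "${code} $(classify_http_code "$code")"
}
```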

#### 3. Git Repository Test (Server-Side)
Test Git operations through the SOCKS proxy:

```bash
git -c http.proxy="socks5h://${PROXY_HOST}:${PROXY_PORT}" \
  -c http.sslVerify=false \
  -c http.timeout=30 \
  ls-remote --heads "${REPO_URL_WITH_AUTH}"
```

Example with authentication:
```bash
git -c http.proxy="socks5h://$uid-proxy-socks:1080" \
  -c http.sslVerify=false \
  -c http.timeout=30 \
  ls-remote --heads "https://admin:[email protected]/group1/myrepo.git"
```
Expected output: a list of Git branches and their commit hashes.
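
To turn the Git test into a pass/fail check, you can count the branch lines returned by `git ls-remote --heads`. Each line is a commit hash, a tab, then `refs/heads/<branch>`. This is a sketch under some assumptions. `PROXY_HOST`, `PROXY_PORT` and `REPO_URL_WITH_AUTH` must be set as above. GNU coreutils `timeout` must be available to bound the whole call. The function names are hypothetical.

```bash
# Hypothetical helpers: pass/fail wrapper around the Git repository test.

# Counts "refs/heads/..." lines in `git ls-remote --heads` output read from stdin.
count_remote_heads() {
  awk -F '\t' '$2 ~ /^refs\/heads\// { n++ } END { print n + 0 }'
}

# Lists remote branches through the SOCKS proxy; returns 0 if at least one is found.
# GIT_TERMINAL_PROMPT=0 makes git fail instead of prompting for credentials.
git_socks_check() {
  local heads
  heads="$(GIT_TERMINAL_PROMPT=0 timeout 60 git \
      -c http.proxy="socks5h://${PROXY_HOST}:${PROXY_PORT}" \
      -c http.sslVerify=false \
      ls-remote --heads "${REPO_URL_WITH_AUTH}" | count_remote_heads)"
  echo "branches found: ${heads}"
  [ "$heads" -gt 0 ]
}
```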

> [!TIP]
> For a permanent check, consider using the `CronJob` probes available in the [tests](../tests/) directory.

### Log Analysis

#### Client/Server Health Logs
Check the nginx sidecar logs for connectivity issues:

```bash
# Check the logs of a specific pod
kubectl logs -l tenant=$uid,index=$index -c nginx -n ggbridge

# Check all pods for a tenant
kubectl logs -l tenant=$uid -c nginx -n ggbridge --tail=50
```
Healthy connection log example:
```console
health 127.0.0.1 [30/Sep/2025:12:04:38 +0000] 127.0.0.1 "GET /healthz HTTP/1.1" 200 3 "-" "Go-http-client/1.1"
```
If no health-check entries appear, the other tunnel endpoint has no connectivity to this one.
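
One quick way to apply this rule is to count successful `/healthz` requests over a recent window. The helper below is a hypothetical sketch that matches the healthy log line shown above. The `--since` window in the usage line is an arbitrary example value.

```bash
# Hypothetical helper: count successful health checks in nginx sidecar logs.

# Prints the number of log lines recording a "GET /healthz" answered with status 200.
count_healthz_ok() {
  grep -c '"GET /healthz HTTP/[0-9.]*" 200 ' || true
}

# Usage (0 means the other tunnel endpoint is not reaching this one):
#   kubectl logs -l tenant=$uid,index=$index -c nginx -n ggbridge --since=5m | count_healthz_ok
```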

#### Server-Side Proxy Logs

Monitor traffic through the SOCKS proxy:
```bash
kubectl logs -l app.kubernetes.io/component=proxy,tenant=$uid -c nginx -n ggbridge --tail=100
```

Port meanings:
- `8081`: Health checks
- `1080`: SOCKS proxy traffic
- `443`: HTTPS/TLS traffic
- `80`: HTTP traffic

Log format explanation:

| Position | Value | Nginx variable | Description | Unit |
| --- | --- | --- | --- | --- |
| 1 | `127.0.0.1` | `$remote_addr` | Client address (here, the local health check) | IP |
| 2 | `[24/Sep/2025:09:46:28 +0000]` | `[$time_local]` | Connection timestamp | Date |
| 3 | `TCP` | `$protocol` | Transport protocol | Protocol |
| 4 | `200` | `$status` | Session status code | Code |
| 5 | `150` | `$bytes_sent` | Bytes sent by nginx → client | Bytes |
| 6 | `102` | `$bytes_received` | Bytes received by nginx ← client | Bytes |
| 7 | `0.077` | `$session_time` | Session duration | Seconds |
| 8 | `"172.20.167.124:8081"` | `"$upstream_addr"` | Health-check backend address | IP:Port |
| 9 | `"102"` | `"$upstream_bytes_sent"` | Bytes sent by nginx → backend | Bytes |
| 10 | `"150"` | `"$upstream_bytes_received"` | Bytes received by nginx ← backend | Bytes |
| 11 | `"0.000"` | `"$upstream_connect_time"` | Time to connect to the backend | Seconds |
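
When scanning many proxy log lines, it can help to extract only the status, session duration and upstream fields. This awk sketch assumes each line follows the field order in the table above with no extra prefix. Note that `$time_local` contains a space, so it spans two whitespace-separated fields. The helper name is hypothetical. In the nginx stream module, a `$status` of `502` means the upstream could not be selected or reached.

```bash
# Hypothetical helper: summarize nginx stream (proxy) log lines read from stdin.
# Assumed field layout (whitespace-split, quotes removed):
#   $1 remote_addr, $2-$3 [time_local], $4 protocol, $5 status, $6 bytes_sent,
#   $7 bytes_received, $8 session_time, $9 upstream_addr, $10 upstream_bytes_sent,
#   $11 upstream_bytes_received, $12 upstream_connect_time
summarize_proxy_log() {
  awk '{
    gsub(/"/, "")   # strip the quotes around the upstream fields (re-splits $0)
    printf "status=%s session=%ss upstream=%s connect=%ss\n", $5, $8, $9, $12
  }'
}

# Usage: show only sessions that did not end with status 200.
#   kubectl logs -l app.kubernetes.io/component=proxy,tenant=$uid -c nginx -n ggbridge --tail=100 \
#     | summarize_proxy_log | grep -v '^status=200 '
```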

## Client Monitoring/Alerting Guidelines
### Overview

> [!NOTE]
> This guide provides generic recommendations for monitoring GGBridge client health and stability. These guidelines are platform-agnostic and can be adapted to your existing monitoring infrastructure.

#### Replica Count
Ensure that all 3 GGBridge client deployments are deployed, each with 1 replica:

```console
$ kubectl get deployments -n ggbridge
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
ggbridge-client-0   1/1     1            1           25h
ggbridge-client-1   1/1     1            1           25h
ggbridge-client-2   1/1     1            1           25h
```

**What to monitor**:
- All deployments should show 1/1 in the READY column

**Alert condition**:
- Any deployment showing 0/1, or fewer than 3 client deployments present

**Prometheus query example**:
```
kube_deployment_status_replicas_ready{namespace="ggbridge", deployment=~"ggbridge-client-.*"}
```
Count the deployments with the correct status (should be 3):
```
sum(
  (kube_deployment_status_replicas_ready{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and
  (kube_deployment_spec_replicas{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1) and
  (kube_deployment_status_replicas_available{namespace="ggbridge", deployment=~"ggbridge-client-.*"} == 1)
)
```
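
Without Prometheus, you can script the same replica check against `kubectl` output. This is a minimal sketch. It assumes the default column layout shown above and deployment names starting with `ggbridge-client-`. The function name is hypothetical.

```bash
# Hypothetical helper: reads `kubectl get deployments -n ggbridge` output on stdin
# and prints "<ready>/<found>": client deployments at 1/1 versus client deployments found.
client_deployments_ready() {
  awk '$1 ~ /^ggbridge-client-/ { found++; if ($2 == "1/1") ready++ }
       END { printf "%d/%d\n", ready, found }'
}

# Usage: anything other than "3/3" matches the alert condition above.
#   [ "$(kubectl get deployments -n ggbridge | client_deployments_ready)" = "3/3" ] || echo "ALERT"
```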

#### Pod Status and Readiness
Check that all pods are running and ready to accept connections:

```console
$ kubectl get pods -n ggbridge
NAME                                 READY   STATUS    RESTARTS   AGE
ggbridge-client-0-76687c7f6f-h6zrj   2/2     Running   0          25h
ggbridge-client-1-89abc123de-xyz45   2/2     Running   0          25h
ggbridge-client-2-12def456gh-abc78   2/2     Running   0          25h
```

**What to monitor**:
- All pods should show 2/2 in the READY column (ggbridge + nginx containers)
- STATUS should be Running
- Monitor restart count: frequent restarts indicate issues

**Alert conditions**:
- Pod showing 1/2 ready (connection issues with server)
- Pod in `CrashLoopBackOff`, `Error`, or `Pending` status
- High restart count (>5 restarts in 1 hour)

**Prometheus query example**:
```
kube_pod_status_ready{condition="true", namespace="ggbridge", pod=~"ggbridge-client-.*"}
```
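
You can also approximate these alert conditions from a `kubectl get pods` snapshot. This sketch is hypothetical and simplified. It flags client pods that are not `2/2`, not `Running`, or whose total restart count exceeds a threshold (default `5`). It does not compute restarts per hour, which requires history such as the Prometheus metrics.

```bash
# Hypothetical helper: reads `kubectl get pods -n ggbridge` output on stdin and prints
# "<name> <ready> <status> <restarts>" for client pods matching an alert condition.
# Optional $1: maximum total restart count (default 5).
unhealthy_client_pods() {
  awk -v max_restarts="${1:-5}" '
    $1 ~ /^ggbridge-client-/ &&
    ($2 != "2/2" || $3 != "Running" || $4 + 0 > max_restarts + 0) {
      print $1, $2, $3, $4
    }'
}

# Usage (empty output means no pod matches an alert condition):
#   kubectl get pods -n ggbridge | unhealthy_client_pods
```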

#### Container Logs Analysis
Monitor logs from the `ggbridge` container for connection issues.

**Key error patterns to watch for**:

WebSocket handshake failures (server connectivity issues):

```console
2025-09-30T15:35:11.627155Z ERROR tunnel{id="01999b43-6b64-7a61-bab6-6ff55b03aade" remote="127.0.0.1:8081"}: wstunnel::tunnel::client::client: failed to do websocket handshake with the server wss://jpynh30wscp60zs4lbdf4m4p8qe9idgu.ggbridge.gitguardian.com:443
```

**What to monitor**:
- Frequency of ERROR log entries
- Specific error patterns indicating connectivity issues
- Connection establishment success/failure rates

**Loki query example** (adjust the label names to match your log collector):
```
{k8s_namespace_name="ggbridge", k8s_pod_name=~"ggbridge-client-.*"} |= "ERROR"
```
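
To see which server endpoint the handshake failures target, and how often, you can extract the `wss://` URL from matching lines. This is a hypothetical sketch that assumes the error wording shown above. The deployment name, the `-c ggbridge` container name and the one-hour window in the usage line are example values.

```bash
# Hypothetical helper: reads ggbridge container logs on stdin and prints
# "<count> <wss-url>" for each server URL that had WebSocket handshake failures.
handshake_failures() {
  grep 'failed to do websocket handshake' \
    | grep -o 'wss://[^ ]*' \
    | sort | uniq -c \
    | awk '{ print $1, $2 }'
}

# Usage:
#   kubectl logs -n ggbridge deploy/ggbridge-client-0 -c ggbridge --since=1h | handshake_failures
```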

#### Resource Usage
Monitor pod resource consumption (`kubectl top` requires metrics-server):
```console
$ kubectl top pods -n ggbridge
NAME                                 CPU(cores)   MEMORY(bytes)
ggbridge-client-0-76687c7f6f-h6zrj   8m           7Mi
ggbridge-client-1-bd75768f4-cr59l    10m          8Mi
ggbridge-client-2-689f9d7c5-bz9k5    9m           7Mi
```
**What to monitor**:
- CPU usage
- Memory usage
- Sudden spikes in resource usage

**Prometheus query example**:
```
# CPU (millicores)
rate(container_cpu_usage_seconds_total{namespace="ggbridge", pod=~"ggbridge-client-.*", container!="POD", container!=""}[5m]) * 1000

# Memory (MiB)
container_memory_working_set_bytes{namespace="ggbridge", pod=~"ggbridge-client-.*", container!="POD", container!=""} / 1024 / 1024
```
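
For a quick threshold check without Prometheus, you can filter `kubectl top` output with awk. The thresholds are parameters you choose, not GGBridge recommendations, and the values in the usage line are only examples. The sketch assumes CPU is reported in millicores (`m`) or whole cores, and memory in `Mi` or `Gi`. The function name is hypothetical.

```bash
# Hypothetical helper: reads `kubectl top pods -n ggbridge` output on stdin and prints
# pods whose CPU (millicores) or memory (MiB) exceeds the given thresholds.
# $1: CPU threshold in millicores, $2: memory threshold in MiB.
pods_over_threshold() {
  awk -v cpu_max="$1" -v mem_max="$2" '
    NR > 1 {
      cpu = $2
      if (cpu ~ /m$/) { sub(/m$/, "", cpu) } else { cpu = cpu * 1000 }   # cores -> millicores
      mem = $3
      if (mem ~ /Gi$/) { sub(/Gi$/, "", mem); mem = mem * 1024 } else { sub(/Mi$/, "", mem) }
      if (cpu + 0 > cpu_max + 0 || mem + 0 > mem_max + 0) print $1, $2, $3
    }'
}

# Usage:
#   kubectl top pods -n ggbridge | pods_over_threshold 200 128
```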

### Getting Support
For technical support, please contact [[email protected]](mailto:[email protected]) with:
1. Environment details: Kubernetes version, GGBridge version
2. Error logs: include relevant nginx and application logs
3. Configuration: sanitized `values.yaml` or `docker-compose.yml`
4. Test results: output from the connectivity tests above
5. Network setup: information about firewalls, proxies and DNS configuration
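
To gather the environment details and logs in one pass, a script along these lines can help. This is a hypothetical sketch that assumes `kubectl` access to the `ggbridge` namespace. It collects only cluster-side information: `kubectl version` reports the Kubernetes version, and `get deployments -o wide` lists the images and therefore the GGBridge version tag. Attach your sanitized `values.yaml` or `docker-compose.yml` separately, and review the collected logs for hostnames or other sensitive data before sending them.

```bash
# Hypothetical helper: collect basic GGBridge diagnostics into a directory.
# $1: output directory (default: ./ggbridge-diagnostics). Prints the directory path.
collect_ggbridge_diagnostics() {
  local out="${1:-ggbridge-diagnostics}" pod
  mkdir -p "$out"
  kubectl version > "$out/kubectl-version.txt" 2>&1
  kubectl get deployments,pods -n ggbridge -o wide > "$out/workloads.txt" 2>&1
  # One file per pod, with recent logs from all of its containers (ggbridge and nginx).
  for pod in $(kubectl get pods -n ggbridge -o name); do
    kubectl logs -n ggbridge "$pod" --all-containers --tail=500 > "$out/${pod#pod/}.log" 2>&1
  done
  echo "$out"
}

# Usage:
#   collect_ggbridge_diagnostics && tar -czf ggbridge-diagnostics.tar.gz ggbridge-diagnostics
```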
