Commit 3e87f62

chore: Implement better logging on network issues (#3359)
* add logs
* adjust doc
* add error details
* run acc tests when config folder changes
* use NewTransport
* improve doc
* add config acc tests
* don't log headers
* make tests less verbose
* remove header example from doc
* remove headers from tests
* rename NewNetworkLoggingTransport to NewTransportWithNetworkLogging
* remove name
* undo realm changes
* log Digest challenge requests
* note about logging.NewTransport
* adjust doc
* fix doc examples
* enable network logging only when TF loglevel is debug or higher
* fix TestAccNetworkLogging
* TestNetworkLoggingTransport_Disabled
1 parent d1c5195 commit 3e87f62

6 files changed, +406 -11 lines changed

.github/workflows/acceptance-tests-runner.yml

Lines changed: 2 additions & 0 deletions
@@ -296,6 +296,7 @@ jobs:
   cluster_outage_simulation:
     - 'internal/service/clusteroutagesimulation/*.go'
   config:
+    - 'internal/config/*.go'
     - 'internal/service/alertconfiguration/*.go'
     - 'internal/service/apikey/*.go'
     - 'internal/service/atlasuser/*.go'
@@ -649,6 +650,7 @@ jobs:
   AWS_S3_BUCKET: ${{ secrets.aws_s3_bucket_federation }}
   MONGODB_ATLAS_LAST_VERSION: ${{ needs.get-provider-version.outputs.provider_version }}
   ACCTEST_PACKAGES: |
+    ./internal/config
     ./internal/service/alertconfiguration
     ./internal/service/atlasuser
     ./internal/service/cloudprovideraccess

contributing/README.md

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ Thanks for your interest in contributing to MongoDB Atlas Terraform Provider, th
 - [Documentation](documentation.md)
 - [Changelog process](changelog-process.md)
 - [Atlas SDK](atlas-sdk.md)
+- [Enhanced Network Logging](network-logging.md)

contributing/network-logging.md

Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
+# Enhanced Network Logging for MongoDB Atlas Terraform Provider
+
+## Overview
+
+The MongoDB Atlas Terraform provider now includes enhanced network logging capabilities to provide better visibility into HTTP requests and responses when communicating with the MongoDB Atlas API. This feature helps diagnose API connectivity issues, timeouts, and status code errors.
+
+## Features
+
+### 1. Detailed Request/Response Timing
+- Logs the start time of each HTTP request
+- Measures and reports the total duration of each request
+
+### 2. Comprehensive Error Context
+The logging transport analyzes network errors and provides specific context for common issues:
+
+- **Timeout errors**: Indicates potential API server overload or network connectivity issues
+- **Connection refused**: Suggests API server may be down or unreachable
+- **DNS resolution failures**: Points to DNS configuration or network connectivity problems
+- **TLS certificate errors**: Highlights certificate validity or trust chain issues
+- **Request deadline exceeded**: Shows when requests exceed configured timeouts
+- **Connection reset**: Indicates unexpected server connection closures
+
+### 3. HTTP Status Code Analysis
+- Categorizes responses as Success (2xx), Redirection (3xx), Client Error (4xx), or Server Error (5xx)
+- Logs additional details for non-2xx responses
+- Includes the `Content-Type` response header for debugging
+- **Special handling for 401 Unauthorized**: Recognizes digest authentication challenges and logs them as expected behavior rather than errors
+
+### 4. Response Header Logging
+For error responses, the transport logs the `Content-Type` response header.
+
+## Log Examples
+
+### Successful Request With Authentication Challenge
+```
+[DEBUG] Network Request Start: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters (started at 2025-01-15T10:30:00.123Z)
+[DEBUG] Network Request Complete: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Status: 401 (Client Error) - Duration: 120ms
+[DEBUG] Digest Authentication Challenge: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Status: 401 - Expected first request in digest authentication flow
+[DEBUG] Network Request Start: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters (started at 2025-01-15T10:30:00.245Z)
+[DEBUG] Network Request Complete: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Status: 200 (Success) - Duration: 180ms
+```
+
+### HTTP Error Response
+```
+[DEBUG] Network Request Start: POST https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters (started at 2025-01-15T10:30:00.123Z)
+[DEBUG] Network Request Complete: POST https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Status: 400 (Client Error) - Duration: 180ms
+[WARN] HTTP Error Response: POST https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Status: 400 Bad Request - Duration: 180ms - Content-Type: application/json
+```
+
+### Network Error
+```
+[DEBUG] Network Request Start: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters (started at 2025-01-15T10:30:00.123Z)
+[ERROR] Network Request Failed: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Duration: 30s - Error: context deadline exceeded
+[ERROR] Request Deadline Exceeded: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Duration: 30s - Request took longer than configured timeout
+```
+
+### Connection Refused Error
+```
+[DEBUG] Network Request Start: POST https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters (started at 2025-01-15T10:30:00.123Z)
+[ERROR] Network Request Failed: POST https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Duration: 5s - Error: dial tcp 192.168.1.1:443: connect: connection refused
+[ERROR] Connection Refused: POST https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Duration: 5s - API server may be down or unreachable
+```
+
+### DNS Resolution Error
+```
+[DEBUG] Network Request Start: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters (started at 2025-01-15T10:30:00.123Z)
+[ERROR] Network Request Failed: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Duration: 2s - Error: dial tcp: lookup cloud.mongodb.com: no such host
+[ERROR] DNS Resolution Failed: GET https://cloud.mongodb.com/api/atlas/v2/groups/123/clusters - Duration: 2s - Check DNS configuration and network connectivity
+```
+
+## Implementation Details
+
+### Transport Chain
+The enhanced logging is implemented as a custom HTTP transport that wraps the existing transport chain:
+
+```mermaid
+flowchart LR
+A[baseTransport] --> B[NetworkLoggingTransport]
+B --> C[digestTransport]
+C --> D[tfLoggingTransport]
+```
+
+This ensures:
+1. **Network-level logging happens before digest authentication** - captures the initial 401 digest challenge requests
+2. **All HTTP operations are captured** - including the authentication flow
+3. **Terraform logging happens after digest authentication** - prevents logging of sensitive authentication details
+4. **Existing functionality is preserved** - maintains compatibility with all existing features
+5. **Consistent logging across all MongoDB Atlas API clients** - same transport chain for all SDK versions
+
+## Troubleshooting Common Issues
+
+### Digest Authentication Flow
+MongoDB Atlas uses digest authentication, which requires a two-step process:
+1. **Initial request (401 response)**: The first request to any endpoint will return a 401 status with a digest challenge
+2. **Authenticated request**: The client automatically retries with proper digest credentials
+
+When you see a 401 status in the logs with the message "Expected first request in digest authentication flow", this is normal behavior and not an error. The digest authentication library will automatically handle the challenge and retry the request with proper credentials.
+
+### Timeout Errors
+For "Request Deadline Exceeded" errors:
+- Check if the configured timeout is appropriate for your use case
+- Verify network stability
+- Consider if Atlas API is experiencing high load
+
+### Connection Issues
+For "Connection Refused" or "DNS Resolution Failed" errors:
+- Verify network connectivity
+- Check DNS configuration
+- Ensure firewall rules allow HTTPS traffic to MongoDB Atlas
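
The transport chain in network-logging.md can be read in two directions: the diagram follows construction order, while a request travels from the outermost wrapper inward. The following is a minimal sketch of that call order. It uses hypothetical stand-ins (`roundTripperFunc`, `named`, `callOrder`) rather than the provider's real transports, and it assumes nothing about their internals beyond the wrapping order shown in `NewClient`.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// roundTripperFunc adapts a plain function to http.RoundTripper.
// Hypothetical helper for this sketch only.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(req *http.Request) (*http.Response, error) {
	return f(req)
}

// named wraps next and records its name each time a request passes through.
func named(name string, next http.RoundTripper, trace *[]string) http.RoundTripper {
	return roundTripperFunc(func(req *http.Request) (*http.Response, error) {
		*trace = append(*trace, name)
		return next.RoundTrip(req)
	})
}

// callOrder builds the chain in the same order as the diagram and returns
// the order in which the layers see a single request.
func callOrder() []string {
	var trace []string
	// The base layer answers locally, so no network access is needed.
	base := roundTripperFunc(func(req *http.Request) (*http.Response, error) {
		trace = append(trace, "base")
		return &http.Response{StatusCode: http.StatusOK, Body: http.NoBody, Request: req}, nil
	})
	network := named("networkLogging", base, &trace)
	digest := named("digest", network, &trace)
	tf := named("tfLogging", digest, &trace)

	req, err := http.NewRequest(http.MethodGet, "https://example.invalid/", nil)
	if err != nil {
		panic(err)
	}
	resp, err := tf.RoundTrip(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return trace
}

func main() {
	fmt.Println(strings.Join(callOrder(), " -> "))
}
```

Running the sketch prints `tfLogging -> digest -> networkLogging -> base`. A real digest transport would also resend the request after a 401 challenge, which these stand-ins do not model.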

internal/config/client.go

Lines changed: 7 additions & 11 deletions
@@ -96,18 +96,14 @@ type UAMetadata struct {
 	Value string
 }
 
-// NewClient func...
 func (c *Config) NewClient(ctx context.Context) (any, error) {
-	// setup a transport to handle digest
-	transport := digest.NewTransportWithHTTPTransport(cast.ToString(c.PublicKey), cast.ToString(c.PrivateKey), baseTransport)
-
-	// initialize the client
-	client, err := transport.Client()
-	if err != nil {
-		return nil, err
-	}
-
-	client.Transport = logging.NewTransport("MongoDB Atlas", transport)
+	// Network Logging transport is before Digest transport so it can log the first Digest requests with 401 Unauthorized.
+	// Terraform logging transport is after Digest transport so the Unauthorized request bodies are not logged.
+	networkLoggingTransport := NewTransportWithNetworkLogging(baseTransport, logging.IsDebugOrHigher())
+	digestTransport := digest.NewTransportWithHTTPRoundTripper(cast.ToString(c.PublicKey), cast.ToString(c.PrivateKey), networkLoggingTransport)
+	// Don't change logging.NewTransport to NewSubsystemLoggingHTTPTransport until all resources are in TPF.
+	tfLoggingTransport := logging.NewTransport("Atlas", digestTransport)
+	client := &http.Client{Transport: tfLoggingTransport}
 
 	optsAtlas := []matlasClient.ClientOpt{matlasClient.SetUserAgent(userAgent(c))}
 	if c.BaseURL != "" {
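
The comments in `NewClient` explain why network logging sits below the digest transport. The sketch below illustrates that reasoning under stated assumptions: `retryOn401` is a hypothetical stand-in for the digest transport (it is not the real digest library and performs no actual authentication), and `statusRecorder` stands in for a logging layer. A recorder placed below the retrying layer sees both the 401 challenge and the retried request, while a recorder above it sees only the final response.

```go
package main

import (
	"fmt"
	"net/http"
)

// roundTripperFunc adapts a plain function to http.RoundTripper.
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(req *http.Request) (*http.Response, error) {
	return f(req)
}

// statusRecorder records the status code of every response it sees.
type statusRecorder struct {
	next http.RoundTripper
	seen []int
}

func (r *statusRecorder) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := r.next.RoundTrip(req)
	if err == nil {
		r.seen = append(r.seen, resp.StatusCode)
	}
	return resp, err
}

// retryOn401 resends a request once, with a placeholder Authorization
// header, when the first attempt is answered with 401.
type retryOn401 struct{ next http.RoundTripper }

func (d *retryOn401) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := d.next.RoundTrip(req)
	if err != nil || resp.StatusCode != http.StatusUnauthorized {
		return resp, err
	}
	resp.Body.Close()
	retry := req.Clone(req.Context())
	retry.Header.Set("Authorization", "Digest placeholder")
	return d.next.RoundTrip(retry)
}

// observe sends one request through the chain and returns the status codes
// seen below and above the retrying layer.
func observe() (below, above []int) {
	// Fake server: challenge with 401 unless an Authorization header is present.
	base := roundTripperFunc(func(req *http.Request) (*http.Response, error) {
		status := http.StatusUnauthorized
		if req.Header.Get("Authorization") != "" {
			status = http.StatusOK
		}
		return &http.Response{StatusCode: status, Body: http.NoBody, Header: http.Header{}, Request: req}, nil
	})
	inner := &statusRecorder{next: base}
	outer := &statusRecorder{next: &retryOn401{next: inner}}

	req, err := http.NewRequest(http.MethodGet, "https://example.invalid/", nil)
	if err != nil {
		panic(err)
	}
	resp, err := outer.RoundTrip(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return inner.seen, outer.seen
}

func main() {
	below, above := observe()
	fmt.Println("below digest:", below)
	fmt.Println("above digest:", above)
}
```

The recorder below the retrying layer collects `[401 200]` and the one above collects `[200]`, which matches the placement chosen in the diff: the network logger can report the challenge, and the outer Terraform logger never sees the unauthorized exchange.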

internal/config/transport.go

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
+package config
+
+import (
+	"log"
+	"net/http"
+	"strings"
+	"time"
+)
+
+// NetworkLoggingTransport wraps an http.RoundTripper to provide enhanced logging
+// for network operations, including timing, status codes, and error details.
+type NetworkLoggingTransport struct {
+	Transport http.RoundTripper
+	Enabled   bool
+}
+
+// NewTransportWithNetworkLogging creates a new NetworkLoggingTransport that wraps
+// the provided transport with enhanced network logging capabilities.
+func NewTransportWithNetworkLogging(transport http.RoundTripper, enabled bool) *NetworkLoggingTransport {
+	if transport == nil {
+		transport = http.DefaultTransport
+	}
+	return &NetworkLoggingTransport{
+		Transport: transport,
+		Enabled:   enabled,
+	}
+}
+
+// RoundTrip implements the http.RoundTripper interface and adds enhanced logging
+// around the HTTP request/response cycle.
+func (t *NetworkLoggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
+	if !t.Enabled {
+		return t.Transport.RoundTrip(req)
+	}
+
+	startTime := time.Now()
+	log.Printf("[DEBUG] Network Request Start: %s %s (started at %s)",
+		req.Method, req.URL.String(), startTime.Format(time.RFC3339Nano))
+
+	resp, err := t.Transport.RoundTrip(req)
+	duration := time.Since(startTime)
+	if err != nil {
+		log.Printf("[ERROR] Network Request Failed: %s %s - Duration: %v - Error: %v",
+			req.Method, req.URL.String(), duration, err)
+
+		t.logNetworkErrorContext(err, req, duration)
+		return resp, err
+	}
+	statusCode := resp.StatusCode
+	statusClass := GetStatusClass(statusCode)
+
+	log.Printf("[DEBUG] Network Request Complete: %s %s - Status: %d (%s) - Duration: %v",
+		req.Method, req.URL.String(), statusCode, statusClass, duration)
+
+	if statusCode == http.StatusUnauthorized {
+		log.Printf("[DEBUG] Digest Authentication Challenge: %s %s - Status: 401 - Expected first request in digest authentication flow",
+			req.Method, req.URL.String())
+	} else if statusCode >= 300 {
+		log.Printf("[WARN] HTTP Error Response: %s %s - Status: %d %s - Duration: %v - Content-Type: %s",
+			req.Method, req.URL.String(), statusCode, http.StatusText(statusCode),
+			duration, resp.Header.Get("Content-Type"))
+	}
+	return resp, nil
+}
+
+// logNetworkErrorContext provides additional context for common network errors
+func (t *NetworkLoggingTransport) logNetworkErrorContext(err error, req *http.Request, duration time.Duration) {
+	errStr := err.Error()
+	switch {
+	case strings.Contains(errStr, "timeout"):
+		log.Printf("[ERROR] Network Timeout: %s %s - Duration: %v - This may indicate API server overload or network connectivity issues",
+			req.Method, req.URL.String(), duration)
+	case strings.Contains(errStr, "connection refused"):
+		log.Printf("[ERROR] Connection Refused: %s %s - Duration: %v - API server may be down or unreachable",
+			req.Method, req.URL.String(), duration)
+	case strings.Contains(errStr, "no such host"):
+		log.Printf("[ERROR] DNS Resolution Failed: %s %s - Duration: %v - Check DNS configuration and network connectivity",
+			req.Method, req.URL.String(), duration)
+	case strings.Contains(errStr, "certificate"):
+		log.Printf("[ERROR] TLS Certificate Error: %s %s - Duration: %v - Check certificate validity and trust chain",
+			req.Method, req.URL.String(), duration)
+	case strings.Contains(errStr, "context deadline exceeded"):
+		log.Printf("[ERROR] Request Deadline Exceeded: %s %s - Duration: %v - Request took longer than configured timeout",
+			req.Method, req.URL.String(), duration)
+	case strings.Contains(errStr, "connection reset"):
+		log.Printf("[ERROR] Connection Reset: %s %s - Duration: %v - Server closed connection unexpectedly",
+			req.Method, req.URL.String(), duration)
+	default:
+		log.Printf("[ERROR] Network Error: %s %s - Duration: %v - Error details: %v",
+			req.Method, req.URL.String(), duration, err)
+	}
+}
+
+// GetStatusClass returns a human-readable status class for the HTTP status code
+func GetStatusClass(statusCode int) string {
+	switch {
+	case statusCode >= 200 && statusCode < 300:
+		return "Success"
+	case statusCode >= 300 && statusCode < 400:
+		return "Redirection"
+	case statusCode >= 400 && statusCode < 500:
+		return "Client Error"
+	case statusCode >= 500 && statusCode < 600:
+		return "Server Error"
+	default:
+		return "Unknown"
+	}
+}
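
`logNetworkErrorContext` picks its message by substring match, so the order of the `case` clauses decides which label an error receives. As a rough illustration, the sketch below extracts the same checks, in the same order, into a hypothetical pure function `classify` (not part of the commit) and applies it to the error strings from the log examples in network-logging.md, plus one illustrative `i/o timeout` string.

```go
package main

import (
	"fmt"
	"strings"
)

// classify mirrors the substring checks in logNetworkErrorContext, in the
// same order, and returns the label that prefixes the log line.
// Hypothetical helper for illustration; the commit logs directly instead.
func classify(errStr string) string {
	switch {
	case strings.Contains(errStr, "timeout"):
		return "Network Timeout"
	case strings.Contains(errStr, "connection refused"):
		return "Connection Refused"
	case strings.Contains(errStr, "no such host"):
		return "DNS Resolution Failed"
	case strings.Contains(errStr, "certificate"):
		return "TLS Certificate Error"
	case strings.Contains(errStr, "context deadline exceeded"):
		return "Request Deadline Exceeded"
	case strings.Contains(errStr, "connection reset"):
		return "Connection Reset"
	default:
		return "Network Error"
	}
}

func main() {
	samples := []string{
		"context deadline exceeded",
		"dial tcp 192.168.1.1:443: connect: connection refused",
		"dial tcp: lookup cloud.mongodb.com: no such host",
		"read tcp: i/o timeout", // illustrative string, not taken from the docs
	}
	for _, s := range samples {
		fmt.Printf("%-26s <- %s\n", classify(s), s)
	}
}
```

Because `strings.Contains` is case-sensitive and the first matching case wins, an error whose text contains both "timeout" and "certificate" would be labeled a timeout. Keeping the checks in a pure function like this is one way to make that ordering easy to test without capturing log output.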
