Commit 90cdbc9

Add logging configuration
1 parent 7d50007 commit 90cdbc9

9 files changed

+71
-50
lines changed

README.md

Lines changed: 15 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -43,26 +43,26 @@ CAD consists of:
4343

4444
1) [PagerDuty Webhooks](https://support.pagerduty.com/docs/webhooks) are used to trigger Configuration-Anomaly-Detection when a [PagerDuty incident](https://support.pagerduty.com/docs/incidents) is created
4545
2) The webhook routes to a [Tekton EventListener](https://tekton.dev/docs/triggers/eventlisteners/)
46-
3) Received webhooks are filtered by a [Tekton Interceptor](https://tekton.dev/docs/triggers/interceptors/) that uses the payload to evaluate whether the alert has an implemented handler function in `cadctl` or not. If there is no handler implemented, the alert is directly forwarded to a human SRE.
46+
3) Received webhooks are filtered by a [Tekton Interceptor](https://tekton.dev/docs/triggers/interceptors/) that uses the payload to evaluate whether the alert has an implemented handler function in `cadctl` or not, and validates the webhook against the `X-PagerDuty-Signature` header. If there is no handler implemented, the alert is directly forwarded to a human SRE.
4747
4) If `cadctl` implements a handler for the received payload/alert, a [Tekton PipelineRun](https://tekton.dev/docs/pipelines/pipelineruns/) is started.
48-
5) The pipeline runs `cadctl` which determines the handler function by itself based on the payload.
48+
5) The pipeline runs `cadctl` which determines the handler function by itself based on the payload.
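
Step 3 mentions validating the webhook against the `X-PagerDuty-Signature` header. The sketch below shows what such a check typically looks like. It assumes the header carries one or more comma-separated `v1=<hex>` entries, each an HMAC-SHA256 of the raw request body keyed with the shared secret; that is my understanding of PagerDuty's V3 webhook scheme and should be confirmed against the PagerDuty docs. `verifySignature` and `sign` are hypothetical helpers for illustration, not CAD's actual implementation.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// verifySignature reports whether any "v1=<hex>" entry in header matches the
// HMAC-SHA256 of body computed with secret.
// Hypothetical helper; the header format is an assumption (see lead-in).
func verifySignature(header string, body []byte, secret string) bool {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	expected := mac.Sum(nil)

	for _, part := range strings.Split(header, ",") {
		part = strings.TrimSpace(part)
		if !strings.HasPrefix(part, "v1=") {
			continue
		}
		sig, err := hex.DecodeString(strings.TrimPrefix(part, "v1="))
		if err != nil {
			continue // not valid hex, try the next entry
		}
		// hmac.Equal compares in constant time.
		if hmac.Equal(sig, expected) {
			return true
		}
	}
	return false
}

// sign builds a header value for body; used here only to drive the demo.
func sign(body []byte, secret string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write(body)
	return "v1=" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	body := []byte(`{"event":{"event_type":"incident.triggered"}}`)
	header := sign(body, "example-secret")

	fmt.Println(verifySignature(header, body, "example-secret")) // true
	fmt.Println(verifySignature(header, body, "wrong-secret"))   // false
}
```

Comparing with `hmac.Equal` rather than `==` avoids leaking timing information about how many leading bytes matched.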
4949

5050
![CAD Overview](./images/cad_overview/cad_architecture_dark.png#gh-dark-mode-only)
5151
![CAD Overview](./images/cad_overview/cad_architecture_light.png#gh-light-mode-only)
5252

53-
## Contributing
53+
## Contributing
5454

5555
### Building
5656

57-
For build targets, see `make help`.
57+
For build targets, see `make help`.
5858

5959
### Adding a new investigation
6060

6161
CAD investigations are triggered by PagerDuty webhooks. Currently, CAD supports the following two formats of webhooks:
62-
- WebhookV3
62+
- WebhookV3
6363
- EventOrchestrationWebhook
6464

65-
The required investigation is identified by CAD based on the incident and its payload.
65+
The required investigation is identified by CAD based on the incident and its payload.
6666
As PagerDuty itself does not provide finer granularity for webhooks than service-based, CAD filters out the alerts it should investigate. For more information, please refer to https://support.pagerduty.com/docs/webhooks.
6767

6868
To add a new alert investigation:
@@ -75,7 +75,7 @@ To add a new alert investigation:
7575
- an existing cluster
7676
- an existing PagerDuty incident for the cluster and alert type that is being tested
7777

78-
To quickly create an incident for a cluster_id, you can run `./test/generate_incident.sh <alertname> <clusterid>`.
78+
To quickly create an incident for a cluster_id, you can run `./test/generate_incident.sh <alertname> <clusterid>`.
7979
Example usage: `./test/generate_incident.sh ClusterHasGoneMissing 2b94brrrrrrrrrrrrrrrrrrhkaj`.
8080

8181
### Running cadctl for an incident ID
@@ -90,6 +90,11 @@ Example usage:`./test/generate_incident.sh ClusterHasGoneMissing 2b94brrrrrrrrrr
9090
./bin/cadctl investigate --payload-path payload
9191
```
9292

93+
### Logging levels
94+
95+
CAD allows for different logging levels (debug, info, warn, error, fatal, panic). The log level is determined through a hierarchy, where the CLI flag `log-level`
96+
is checked first, and if not set, the optional environment variable `LOG_LEVEL` is used. If neither is set, the log level defaults to `info`.
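
The precedence described above (flag, then `LOG_LEVEL`, then `info`) can be sketched as a small pure function. `resolveLogLevel` and its parameters are hypothetical names chosen for illustration; the commit itself spreads this logic across `cadctl` and the `logging` package.

```go
package main

import (
	"fmt"
	"os"
)

const defaultLogLevel = "info"

// resolveLogLevel applies the documented precedence:
// explicit CLI flag, then the LOG_LEVEL environment variable, then "info".
// flagSet reports whether the user passed the flag on the command line.
// Hypothetical helper, not part of CAD.
func resolveLogLevel(flagSet bool, flagValue string, lookupEnv func(string) (string, bool)) string {
	if flagSet {
		return flagValue
	}
	if v, ok := lookupEnv("LOG_LEVEL"); ok {
		return v
	}
	return defaultLogLevel
}

func main() {
	// Flag passed explicitly: it wins regardless of the environment.
	fmt.Println(resolveLogLevel(true, "debug", os.LookupEnv))

	// No flag: fall back to LOG_LEVEL if present, otherwise "info".
	fmt.Println(resolveLogLevel(false, "", os.LookupEnv))
}
```

Passing the environment lookup as a parameter keeps the function easy to test without mutating the process environment.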
97+
9398
## Documentation
9499

95100
### Investigations
@@ -101,7 +106,7 @@ Investigation specific documentation can be found in the according investigation
101106
### Integrations
102107

103108
* [AWS](https://github.com/aws/aws-sdk-go) -- Logging into the cluster, retrieving instance info and AWS CloudTrail events.
104-
* [PagerDuty](https://github.com/PagerDuty/go-pagerduty) -- Retrieving alert info, escalating or silencing incidents, and adding notes.
109+
* [PagerDuty](https://github.com/PagerDuty/go-pagerduty) -- Retrieving alert info, escalating or silencing incidents, and adding notes.
105110
* [OCM](https://github.com/openshift-online/ocm-sdk-go) -- Retrieving cluster info, sending service logs, and managing (post, delete) limited support reasons.
106111
* [osd-network-verifier](https://github.com/openshift/osd-network-verifier) -- Tool to verify the pre-configured networking components for ROSA and OSD CCS clusters.
107112

@@ -159,3 +164,5 @@ Grafana dashboard configmaps are stored in the [Dashboards](./dashboards/) direc
159164
- `CAD_EXPERIMENTAL_ENABLED`: enables experimental investigations when set to `true`, see mapping.go
160165

161166
For Red Hat employees, these environment variables can be found in the SRE-P vault.
167+
168+
- `LOG_LEVEL`: refers to the CAD log level, if not set, the default is `info`. See

cadctl/cmd/investigate/investigate.go

Lines changed: 16 additions & 6 deletions
@@ -43,21 +43,26 @@ var InvestigateCmd = &cobra.Command{
4343
}
4444

4545
var (
46-
logLevelString = "info"
47-
payloadPath = "./payload.json"
46+
logLevelFlag = ""
47+
payloadPath = "./payload.json"
4848
)
4949

5050
func init() {
5151
InvestigateCmd.Flags().StringVarP(&payloadPath, "payload-path", "p", payloadPath, "the path to the payload")
52-
InvestigateCmd.Flags().StringVarP(&logLevelString, "log-level", "l", logLevelString, "the log level [debug,info,warn,error,fatal], default = info")
52+
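// bind to logLevelFlag: registering a flag writes its default ("") into the bound variable,
// which would otherwise overwrite the LOG_LEVEL-derived logging.LogLevelString
InvestigateCmd.Flags().StringVarP(&logLevelFlag, "log-level", "l", "", "the log level [debug,info,warn,error,fatal], default = info")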
5353

5454
err := InvestigateCmd.MarkFlagRequired("payload-path")
5555
if err != nil {
5656
logging.Warn("Could not mark flag 'payload-path' as required")
5757
}
5858
}
5959

60-
func run(_ *cobra.Command, _ []string) error {
60+
func run(cmd *cobra.Command, _ []string) error {
61+
// early init of logger for logs before clusterID is known
62+
if cmd.Flags().Changed("log-level") {
63+
flagValue, _ := cmd.Flags().GetString("log-level")
64+
logging.RawLogger = logging.InitLogger(flagValue, "")
65+
}
6166
payload, err := os.ReadFile(payloadPath)
6267
if err != nil {
6368
return fmt.Errorf("failed to read webhook payload: %w", err)
@@ -106,8 +111,13 @@ func run(_ *cobra.Command, _ []string) error {
106111
// For installing clusters, externalID can be empty.
107112
internalClusterID := cluster.ID()
108113

109-
// initialize logger for the internal-cluster-id context
110-
logging.RawLogger = logging.InitLogger(logLevelString, internalClusterID)
114+
// re-initialize logger for the internal-cluster-id context
115+
// if log-level flag is set, take priority over env + default
116+
if cmd.Flags().Changed("log-level") {
117+
logging.RawLogger = logging.InitLogger(logLevelFlag, internalClusterID)
118+
} else {
119+
logging.RawLogger = logging.InitLogger(logging.LogLevelString, internalClusterID)
120+
}
111121

112122
requiresInvestigation, err := clusterRequiresInvestigation(cluster, pdClient, ocmClient)
113123
if err != nil || !requiresInvestigation {
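
When a flag and an environment-derived value feed the same setting, it matters which variable the flag is bound to. The sketch below uses the standard library `flag` package as a stand-in for cobra/pflag. As far as I know, pflag's `StringVarP` behaves the same way at registration and cobra's `Flags().Changed` plays the role of `Visit` here, but treat that mapping as an assumption. `parseLevel` and `valueAfterRegistration` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
)

// valueAfterRegistration shows that registering a flag writes the flag's
// default into the bound variable immediately, before any parsing happens.
func valueAfterRegistration(initial string) string {
	v := initial
	fs := flag.NewFlagSet("clobber", flag.ContinueOnError)
	fs.StringVar(&v, "log-level", "", "the log level")
	return v
}

// parseLevel returns the flag value if the flag was passed explicitly,
// otherwise the fallback (for example a value read from LOG_LEVEL).
func parseLevel(args []string, fallback string) (string, error) {
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	// Bind to a dedicated variable so the fallback is never overwritten.
	var flagLevel string
	fs.StringVar(&flagLevel, "log-level", "", "the log level")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	// Visit only walks flags that were actually set on the command line.
	changed := false
	fs.Visit(func(f *flag.Flag) {
		if f.Name == "log-level" {
			changed = true
		}
	})
	if changed {
		return flagLevel, nil
	}
	return fallback, nil
}

func main() {
	fmt.Printf("after registration: %q\n", valueAfterRegistration("warn")) // ""

	level, _ := parseLevel([]string{"-log-level", "debug"}, "warn")
	fmt.Println(level) // debug

	level, _ = parseLevel(nil, "warn")
	fmt.Println(level) // warn
}
```

Keeping the flag in its own variable means the environment-derived value survives registration and is only bypassed when the user actually passes the flag.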

deploy/interceptor.yaml

Lines changed: 2 additions & 0 deletions
@@ -25,6 +25,8 @@ spec:
2525
env:
2626
- name: CAD_EXPERIMENTAL_ENABLED
2727
value: ${CAD_EXPERIMENTAL_ENABLED}
28+
- name: LOG_LEVEL
29+
value: ${LOG_LEVEL}
2830
resources:
2931
limits:
3032
cpu: "100m"

deploy/task-cad-checks.yaml

Lines changed: 3 additions & 1 deletion
@@ -32,6 +32,8 @@ spec:
3232
value: aggregation-pushgateway:9091
3333
- name: CAD_EXPERIMENTAL_ENABLED
3434
value: ${CAD_EXPERIMENTAL_ENABLED}
35+
- name: LOG_LEVEL
36+
value: ${LOG_LEVEL}
3537
# envFrom should pull all the secret information as envvars, so key names should be uppercase
3638
envFrom:
3739
- secretRef:
@@ -48,4 +50,4 @@ spec:
4850
memory: 64Mi
4951
limits:
5052
cpu: 100m
51-
memory: 256Mi
53+
memory: 256Mi

hack/update-template/main.go

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ var saasTemplateFile = Template{
3232
{Name: "REGISTRY_IMG", Value: "quay.io/app-sre/configuration-anomaly-detection"},
3333
{Name: "NAMESPACE_NAME", Value: "configuration-anomaly-detection"},
3434
{Name: "CAD_EXPERIMENTAL_ENABLED", Value: "FALSE"},
35+
{Name: "LOG_LEVEL", Value: "info"},
3536
},
3637
}
3738

interceptor/main.go

Lines changed: 3 additions & 17 deletions
@@ -3,7 +3,6 @@ package main
33
import (
44
"context"
55
"fmt"
6-
"log"
76
"net"
87
"net/http"
98
"os"
@@ -12,8 +11,7 @@ import (
1211
"time"
1312

1413
"github.com/openshift/configuration-anomaly-detection/interceptor/pkg/interceptor"
15-
"go.uber.org/zap"
16-
"knative.dev/pkg/logging"
14+
"github.com/openshift/configuration-anomaly-detection/pkg/logging"
1715
"knative.dev/pkg/signals"
1816
)
1917

@@ -24,25 +22,13 @@ const (
2422
idleTimeout = 60 * time.Second
2523
)
2624

27-
var logger = &zap.SugaredLogger{}
25+
var logger = logging.InitLogger(logging.LogLevelString, "")
2826

2927
func main() {
3028
// set up signals so we handle the first shutdown signal gracefully
3129
ctx := signals.NewContext()
3230

33-
zap, err := zap.NewProduction()
34-
if err != nil {
35-
log.Fatalf("failed to initialize logger: %s", err)
36-
}
37-
logger = zap.Sugar()
38-
ctx = logging.WithLogger(ctx, logger)
39-
defer func() {
40-
if err := logger.Sync(); err != nil {
41-
log.Fatalf("failed to sync the logger: %s", err)
42-
}
43-
}()
44-
45-
service := interceptor.PagerDutyInterceptor{Logger: logger}
31+
service := interceptor.PagerDutyInterceptor{}
4632
mux := http.NewServeMux()
4733
mux.Handle("/", service)
4834
mux.HandleFunc("/ready", readinessHandler)

interceptor/pkg/interceptor/pdinterceptor.go

Lines changed: 14 additions & 17 deletions
@@ -13,37 +13,34 @@ import (
1313

1414
"github.com/PagerDuty/go-pagerduty/webhookv3"
1515
investigations "github.com/openshift/configuration-anomaly-detection/pkg/investigations"
16+
"github.com/openshift/configuration-anomaly-detection/pkg/logging"
1617
"github.com/openshift/configuration-anomaly-detection/pkg/pagerduty"
1718
triggersv1 "github.com/tektoncd/triggers/pkg/apis/triggers/v1beta1"
1819
"github.com/tektoncd/triggers/pkg/interceptors"
1920
"google.golang.org/grpc/codes"
20-
21-
"go.uber.org/zap"
2221
)
2322

2423
// ErrInvalidContentType is returned when the content-type is not a JSON body.
2524
var ErrInvalidContentType = errors.New("form parameter encoding not supported, please change the hook to send JSON payloads")
2625

27-
type PagerDutyInterceptor struct {
28-
Logger *zap.SugaredLogger
29-
}
26+
type PagerDutyInterceptor struct{}
3027

3128
func (pdi PagerDutyInterceptor) ServeHTTP(w http.ResponseWriter, r *http.Request) {
3229
b, err := pdi.executeInterceptor(r)
3330
if err != nil {
3431
var e Error
3532
if errors.As(err, &e) {
36-
pdi.Logger.Infof("HTTP %d - %s", e.Status(), e)
33+
logging.Infof("HTTP %d - %s", e.Status(), e)
3734
http.Error(w, e.Error(), e.Status())
3835
} else {
39-
pdi.Logger.Errorf("Non Status Error: %s", err)
36+
logging.Errorf("Non Status Error: %s", err)
4037
http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
4138
}
4239
}
4340

4441
w.Header().Add("Content-Type", "application/json")
4542
if _, err := w.Write(b); err != nil {
46-
pdi.Logger.Errorf("failed to write response: %s", err)
43+
logging.Errorf("failed to write response: %s", err)
4744
}
4845
}
4946

@@ -114,10 +111,8 @@ func (pdi *PagerDutyInterceptor) executeInterceptor(r *http.Request) ([]byte, er
114111
var ireq triggersv1.InterceptorRequest
115112

116113
// logging request
117-
pdi.Logger.Info("Wrapped Request header: %v", r.Header)
118-
pdi.Logger.Info("Wrapped Request body: ", body.String())
119-
pdi.Logger.Info("Unwrapped Request header: %v", extractedRequest.Header)
120-
pdi.Logger.Info("Unwrapped Request body: ", originalReq.Body)
114+
logging.Debugf("Unwrapped Request header: %v", extractedRequest.Header)
115+
logging.Debug("Unwrapped Request body: ", originalReq.Body)
121116

122117
token, _ := os.LookupEnv("PD_SIGNATURE")
123118

@@ -126,13 +121,15 @@ func (pdi *PagerDutyInterceptor) executeInterceptor(r *http.Request) ([]byte, er
126121
return nil, badRequest(fmt.Errorf("failed to verify signature: %w", err))
127122
}
128123

124+
logging.Info("Signature verified successfully")
125+
129126
if err := json.Unmarshal(body.Bytes(), &ireq); err != nil {
130127
return nil, badRequest(fmt.Errorf("failed to parse body as InterceptorRequest: %w", err))
131128
}
132-
pdi.Logger.Debugf("Interceptor request body is: %s", ireq.Body)
129+
logging.Debugf("Interceptor request body is: %s", ireq.Body)
133130

134131
iresp := pdi.Process(ctx, &ireq)
135-
pdi.Logger.Debugf("Interceptor response is: %+v", iresp)
132+
logging.Debugf("Interceptor response is: %+v", iresp)
136133
respBytes, err := json.Marshal(iresp)
137134
if err != nil {
138135
return nil, internal(err)
@@ -151,18 +148,18 @@ func (pdi *PagerDutyInterceptor) Process(ctx context.Context, r *triggersv1.Inte
151148
// If the alert is not in the whitelist, return `Continue: false` as interceptor response
152149
// and escalate the alert to SRE
153150
if investigation == nil {
154-
pdi.Logger.Infof("Incident %s is not mapped to an investigation, escalating incident and returning InterceptorResponse `Continue: false`.", pdClient.GetIncidentID())
151+
logging.Infof("Incident %s is not mapped to an investigation, escalating incident and returning InterceptorResponse `Continue: false`.", pdClient.GetIncidentID())
155152
err = pdClient.EscalateIncidentWithNote("🤖 No automation implemented for this alert; escalated to SRE. 🤖")
156153
if err != nil {
157-
pdi.Logger.Errorf("failed to escalate incident '%s': %w", pdClient.GetIncidentID(), err)
154+
logging.Errorf("failed to escalate incident '%s': %v", pdClient.GetIncidentID(), err)
158155
}
159156

160157
return &triggersv1.InterceptorResponse{
161158
Continue: false,
162159
}
163160
}
164161

165-
pdi.Logger.Infof("Incident %s is mapped to investigation '%s', returning InterceptorResponse `Continue: true`.", pdClient.GetIncidentID(), investigation.Name)
162+
logging.Infof("Incident %s is mapped to investigation '%s', returning InterceptorResponse `Continue: true`.", pdClient.GetIncidentID(), investigation.Name)
166163
return &triggersv1.InterceptorResponse{
167164
Continue: true,
168165
}

openshift/template.yaml

Lines changed: 6 additions & 0 deletions
@@ -11,6 +11,8 @@ parameters:
1111
value: configuration-anomaly-detection
1212
- name: CAD_EXPERIMENTAL_ENABLED
1313
value: "FALSE"
14+
- name: LOG_LEVEL
15+
value: info
1416
objects:
1517
- apiVersion: apps/v1
1618
kind: Deployment
@@ -35,6 +37,8 @@ objects:
3537
env:
3638
- name: CAD_EXPERIMENTAL_ENABLED
3739
value: ${CAD_EXPERIMENTAL_ENABLED}
40+
- name: LOG_LEVEL
41+
value: ${LOG_LEVEL}
3842
envFrom:
3943
- secretRef:
4044
name: cad-pd-token
@@ -318,6 +322,8 @@ objects:
318322
value: aggregation-pushgateway:9091
319323
- name: CAD_EXPERIMENTAL_ENABLED
320324
value: ${CAD_EXPERIMENTAL_ENABLED}
325+
- name: LOG_LEVEL
326+
value: ${LOG_LEVEL}
321327
envFrom:
322328
- secretRef:
323329
name: cad-aws-credentials

pkg/logging/logging.go

Lines changed: 11 additions & 1 deletion
@@ -10,8 +10,10 @@ import (
1010
"go.uber.org/zap/zapcore"
1111
)
1212

13+
var LogLevelString = getLogLevel()
14+
1315
// RawLogger is the raw global logger object used for calls wrapped by the logging package
14-
var RawLogger = InitLogger("info", "")
16+
var RawLogger = InitLogger(LogLevelString, "")
1517

1618
// InitLogger initializes a cluster-id specific child logger
1719
func InitLogger(logLevelString string, clusterID string) *zap.SugaredLogger {
@@ -92,3 +94,11 @@ func Errorf(template string, args ...interface{}) {
9294
func Fatalf(template string, args ...interface{}) {
9395
RawLogger.Fatalf(template, args...)
9496
}
97+
98+
// getLogLevel returns the log level from the environment variable LOG_LEVEL
99+
func getLogLevel() string {
100+
if envLogLevel, exists := os.LookupEnv("LOG_LEVEL"); exists {
101+
return envLogLevel
102+
}
103+
return "info"
104+
}
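
`getLogLevel` returns whatever `LOG_LEVEL` contains, so a typo such as `LOG_LEVEL=verbose` is passed straight through. How `InitLogger` treats an unrecognized string is not visible in this diff. One option, sketched below under that caveat, is to normalize the value against the levels listed in the README before use. `normalizeLevel` and `validLevels` are hypothetical names, not part of the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// validLevels mirrors the levels listed in the README.
var validLevels = map[string]bool{
	"debug": true, "info": true, "warn": true,
	"error": true, "fatal": true, "panic": true,
}

// normalizeLevel trims and lower-cases raw. It returns the cleaned level and
// true if it is recognized, or "info" and false otherwise.
// Hypothetical helper, not part of CAD.
func normalizeLevel(raw string) (string, bool) {
	level := strings.ToLower(strings.TrimSpace(raw))
	if validLevels[level] {
		return level, true
	}
	return "info", false
}

func main() {
	for _, raw := range []string{"DEBUG", " warn ", "verbose", ""} {
		level, ok := normalizeLevel(raw)
		fmt.Printf("%q -> %s (recognized: %t)\n", raw, level, ok)
	}
}
```

Returning the boolean lets the caller log a warning about the fallback instead of silently changing the level.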
