-
Notifications
You must be signed in to change notification settings - Fork 140
CBG-4714: collect goroutine stack traces on signal #7870
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds goroutine stack trace collection capabilities to Sync Gateway, triggered both by SIGUSR1 signal and via a REST API endpoint. The implementation includes automatic rotation of stack trace files (keeping the 10 most recent), integration with sgcollect for diagnostics, and refactoring of profile rotation logic into shared utility functions.
Key changes:
- Introduces
/_debug/stacktraceREST endpoint and SIGUSR1 signal handler for on-demand stack trace collection - Refactors profile rotation logic into reusable
base.RotateProfilesIfNeeded()function - Updates sgcollect.py to collect stack trace files via the new endpoint
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
rest/server_context.go |
Implements signal handler registration and stack trace logging with file rotation |
rest/api.go |
Adds REST endpoint handler for stack trace collection |
rest/routing.go |
Registers the new /_debug/stacktrace endpoint |
rest/stats_context.go |
Refactors memory profile collection to use shared rotation utility |
base/util.go |
Adds utility functions for stack trace capture, profile rotation, and file creation |
tools/sgcollect.py |
Adds stack trace collection task and includes it in the collection workflow |
rest/adminapitest/admin_api_test.go |
Adds test coverage for the stack trace endpoint |
rest/server_context_test.go |
Adds test coverage for stack trace file collection and rotation |
tools-tests/sgcollect_info_test.py |
Adds test coverage for sgcollect stack trace file collection |
Redocly previews |
rest/stack_trace_handler_uinx.go
Outdated
| // stack trace signal received | ||
| currentTime := time.Now() | ||
| timestamp := currentTime.Format(time.RFC3339) | ||
| sc.logStackTraces(ctx, timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be good to log that we are logging the stack trace to stderr, and also log the stack stack trace with the traditional InfofCtx logging so that it gets picked up if someone isn't grabbing stderr output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have logging to indicate that a stack trace is being collected but feel logging at stderr and Info level and writing to the file is over kill. I have left out logging to info level given we write this stuff in a file for sgcollect to collect up anyway.
| name="Collect stack trace via http client", | ||
| auth_headers=auth_headers, | ||
| url=stack_trace_url, | ||
| log_file="sg_stack_trace.log", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| log_file="sg_stack_trace.log", | |
| log_file="goroutines.log", |
I don't care what we call this but sg_stack_trace.log will mean that this output overwrites the output from the signal handler, so it needs to be a different name.
Basically this is just a plain text representation of pprof_goroutines.pb.gz so I think that would be a reasonable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will alter the name for all the files to make it goroutine specific but this doesn't overwrite the signal files, the signal files will have timestamp in the file name, sgcollect http call will not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, that's fine then.
| @@ -0,0 +1,60 @@ | |||
| //go:build !windows | |||
| // +build !windows | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is actually unnecessary and will be flagged by govet in go 1.25, but not sure, we can leave this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use go 1.25 locally and my linter didn't pick that up so should be okay fro now (I think)
CBG-4714
Pre-review checklist
fmt.Print,log.Print, ...)base.UD(docID),base.MD(dbName))docs/apiDependencies (if applicable)
Integration Tests
GSI=true,xattrs=truehttps://jenkins.sgwdev.com/job/SyncGatewayIntegration/159/