feat(ci): Add automated performance tracing to CI pipeline #12

Merged
neil-pozetroninc merged 11 commits into main from feat/add-performance-tracing on Oct 24, 2025

Summary

Integrates dotnet-trace into the CI workflow to automatically generate performance flamegraphs for continuous performance monitoring.

Changes

  • New performance-trace job in .github/workflows/dotnet.yml

    • Builds and runs HelloWorldBot as the test application
    • Collects a 30-second performance trace using dotnet-trace
    • Converts the trace to Speedscope format for visualization
    • Uploads the flamegraph as a workflow artifact (7-day retention)
  • Updated all-tests-passed job

    • Added performance-trace as a required dependency
    • Ensures CI won't pass without successful performance tracing
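
The workflow file itself is not reproduced in this description. As a rough sketch only, the new job and the updated gate could be wired together as follows; the SDK version, the `build-and-test` job name and the gate logic are assumptions for illustration, not the actual contents of `.github/workflows/dotnet.yml`.

```yaml
jobs:
  performance-trace:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 8.0.x   # assumed SDK version
      - name: Install dotnet-trace
        run: |
          dotnet tool install --global dotnet-trace
          # Make global tools callable from later steps
          echo "$HOME/.dotnet/tools" >> "$GITHUB_PATH"
      # ...restore, build, start HelloWorldBot, collect, convert, upload...

  all-tests-passed:
    # Run even if a dependency failed, so this job can fail explicitly
    if: always()
    needs: [build-and-test, performance-trace]   # build-and-test is hypothetical
    runs-on: ubuntu-latest
    steps:
      - name: Fail unless every required job succeeded
        if: contains(needs.*.result, 'failure') || contains(needs.*.result, 'cancelled') || contains(needs.*.result, 'skipped')
        run: exit 1
      - run: echo "All required jobs passed"
```

Using `if: always()` plus a `needs.*.result` check is one common pattern for gate jobs: without it, a failed dependency makes the gate job skipped rather than failed, and GitHub can treat a skipped required check as passing.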

Benefits

  • Continuous Performance Monitoring: Every PR/push generates a performance profile
  • Regression Detection: Compare flamegraphs across commits to identify performance regressions
  • Startup Analysis: Captures application initialization and warmup performance
  • Zero Configuration: Automatic artifact uploads, viewable at speedscope.app

Test Plan

  • Worktree setup and baseline tests passing (849 tests)
  • YAML syntax validation successful
  • All pre-commit hooks passed
    • Kubeconform validation
    • Local link verification (395 valid links)
    • GitHub link verification (27 verified)
    • Markdown code verification
  • Verify CI job runs successfully
  • Download and view flamegraph artifact

Viewing Flamegraphs

After this PR merges, flamegraphs will be available on every CI run:

  1. Navigate to Actions → .NET CI → select a workflow run
  2. Scroll to the Artifacts section
  3. Download the flamegraph artifact and extract the .zip archive
  4. Visit https://speedscope.app
  5. Drag and drop trace.speedscope.json onto the page to visualize it

Implementation Notes

  • Uses HelloWorldBot as the test application (a simple, representative bot)
  • A 30-second trace duration balances trace detail against CI run time
  • Speedscope format was chosen because it can be viewed in any browser without extra tooling
  • 7-day artifact retention aligns with debugging needs
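
A minimal sketch of the conversion and upload steps, assuming an earlier step wrote `trace.nettrace` to the working directory. `dotnet-trace convert` with `--format Speedscope` derives the output name from the input, which should yield the `trace.speedscope.json` file referenced above. The artifact name `flamegraph` matches the viewing instructions; `if-no-files-found: error` is an added assumption so that a missing trace fails the job instead of only warning.

```yaml
- name: Convert trace to Speedscope format
  # Expected to write trace.speedscope.json next to trace.nettrace
  run: dotnet-trace convert trace.nettrace --format Speedscope

- name: Upload flamegraph
  uses: actions/upload-artifact@v4
  with:
    name: flamegraph
    path: trace.speedscope.json
    retention-days: 7
    # Assumption: fail rather than warn when no trace was produced
    if-no-files-found: error
```

Setting `retention-days` on the step overrides the repository default for this artifact only.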

Commits

Fixes a critical configuration mismatch: the documentation instructed users
to set PROBOTSHARP_GITHUB_WEBHOOKSECRET, but the code expects PROBOTSHARP_WEBHOOK_SECRET,
causing webhook validation failures.

Changes:
- Updated all .env.example files to use PROBOTSHARP_WEBHOOK_SECRET
- Fixed docker-compose.yml environment variable mapping
- Updated documentation in Operations.md and README files
- Standardized the configuration pattern across the entire codebase

This is a documentation-only change; the code already supports the correct
configuration through its fallback chain.
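
The docker-compose.yml change itself is not shown here. A minimal sketch of a corrected mapping, assuming a hypothetical `bot` service and image name, passes the variable through under the name the code reads:

```yaml
services:
  bot:                        # hypothetical service name
    image: probotsharp-bot    # hypothetical image name
    environment:
      # Name the code expects (not PROBOTSHARP_GITHUB_WEBHOOKSECRET);
      # the value is interpolated from the host environment or an .env file
      PROBOTSHARP_WEBHOOK_SECRET: ${PROBOTSHARP_WEBHOOK_SECRET}
```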

Integrates dotnet-trace into the CI workflow to automatically generate
performance flamegraphs using HelloWorldBot as the test application.
This enables continuous performance monitoring and helps identify
regressions early in the development cycle.

The performance-trace job:
- Builds and runs HelloWorldBot
- Collects a 30-second performance trace using dotnet-trace
- Converts the trace to Speedscope format for visualization
- Uploads the flamegraph as a workflow artifact (7-day retention)
- Added as required check in all-tests-passed job

The performance-trace job was failing because HelloWorldBot is not
included in ProbotSharp.sln. The workflow restored the solution but
then tried to build HelloWorldBot with --no-restore, causing:

  error NETSDK1004: Assets file 'obj/project.assets.json' not found

This adds an explicit restore step for HelloWorldBot before building,
ensuring all dependencies are available.
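
A sketch of the fix, assuming a hypothetical project path for HelloWorldBot (the real path is not given here). Because the project is outside ProbotSharp.sln, restoring the solution never creates its `obj/project.assets.json`, so it needs its own restore before any `--no-restore` build.

```yaml
- name: Restore solution
  run: dotnet restore ProbotSharp.sln

# HelloWorldBot is not in ProbotSharp.sln, so the solution restore does not
# produce its obj/project.assets.json (NETSDK1004 on a --no-restore build)
- name: Restore HelloWorldBot
  run: dotnet restore examples/HelloWorldBot/HelloWorldBot.csproj   # hypothetical path
```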

The performance trace was failing because HelloWorldBot requires GitHub
App credentials to start. Without valid config, the app crashes before
dotnet-trace can attach (exit code 3).

Changes:
- Create dummy GitHub App private key file
- Set environment variables (AppId, WebhookSecret, PrivateKeyPath)
- Add process health check before attempting trace collection
- Improve error messages for debugging

This allows the app to start successfully for performance profiling
without needing real GitHub App credentials.
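
A hedged sketch of the dummy-credential setup. Only `PROBOTSHARP_WEBHOOK_SECRET` is a variable name confirmed elsewhere in this PR; the App ID and key-path variable names, the key location and all values are hypothetical placeholders. Appending to `$GITHUB_ENV` makes the variables visible to every later step in the job, including the one that starts the app.

```yaml
- name: Configure dummy GitHub App credentials
  run: |
    # Throwaway RSA key: enough for the app to start, useless for real API calls.
    # If the key loader only accepts PKCS#1 PEM, add -traditional (OpenSSL 3).
    openssl genrsa -out "$RUNNER_TEMP/dummy-app-key.pem" 2048
    {
      echo "PROBOTSHARP_GITHUB_APPID=123456"                                   # hypothetical name
      echo "PROBOTSHARP_WEBHOOK_SECRET=dummy-webhook-secret"
      echo "PROBOTSHARP_GITHUB_PRIVATEKEYPATH=$RUNNER_TEMP/dummy-app-key.pem"  # hypothetical name
    } >> "$GITHUB_ENV"
```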

Add output redirection and detailed logging to diagnose why dotnet-trace
collection is failing with exit code 3 even though the process is running.

Changes:
- Redirect app stdout/stderr to /tmp/app.log
- Show app logs in process health check
- Show app logs and process list on trace collection failure
- Print PID at startup for debugging

This will help identify if the issue is:
- Process crashing during trace collection
- Permissions/capabilities issue
- .NET diagnostics not available in Release mode
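
The logging changes could look roughly like the steps below. The project path is hypothetical, the 10-second grace period is an assumption, and `kill -0` only checks that the PID still exists. Note that `$!` is the PID of `dotnet run`, consistent with the parent/child PIDs reported in the diagnostic logs further down.

```yaml
- name: Start HelloWorldBot in the background
  env:
    BOT_PROJECT: examples/HelloWorldBot/HelloWorldBot.csproj   # hypothetical path
  run: |
    # Send stdout/stderr to a log file so later steps can show it
    nohup dotnet run --project "$BOT_PROJECT" --configuration Release --no-build \
      > /tmp/app.log 2>&1 &
    # $! is the `dotnet run` parent; HelloWorldBot itself runs as its child
    echo "APP_PID=$!" >> "$GITHUB_ENV"
    echo "Started dotnet run with PID $!"

- name: Check that the app is still running
  run: |
    sleep 10   # assumed start-up grace period
    if ! kill -0 "$APP_PID" 2>/dev/null; then
      echo "App exited before tracing started; log follows:"
      cat /tmp/app.log
      exit 1
    fi
    tail -n 50 /tmp/app.log

# ...trace collection step goes here...

- name: Show app log and processes on failure
  if: failure()
  run: |
    cat /tmp/app.log || true
    ps aux | grep -i dotnet || true
```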

Switch from Release to Debug build configuration to ensure .NET
diagnostics and EventPipe are fully available for dotnet-trace.

Release builds may have optimizations that interfere with runtime
profiling capabilities.

Add --providers Microsoft-Windows-DotNETRuntime to explicitly specify
which runtime events to trace. This may resolve attachment issues.

Found via web search: On Linux/macOS, dotnet-trace requires the target
application and dotnet-trace to share the same TMPDIR environment variable.
Otherwise, the command will time out.

Set TMPDIR=/tmp for both:
- The background HelloWorldBot process
- The dotnet-trace collect step

This should resolve the exit code 3 failure that occurred because
dotnet-trace couldn't establish a connection to the process.

Source: https://learn.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-trace
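
Setting the variable once at job level is a simple way to guarantee that both processes agree, since every step inherits job-level `env`. A minimal sketch (the runner label is assumed):

```yaml
jobs:
  performance-trace:
    runs-on: ubuntu-latest
    env:
      # Inherited by every step, so the background app and dotnet-trace
      # resolve the same temp directory for diagnostic sockets
      TMPDIR: /tmp
```

Compared with setting TMPDIR separately on the start step and the collect step, this leaves no way for the two values to drift apart.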

Add  command to see if the process is visible
List diagnostic sockets in /tmp to verify EventPipe is available
Set DOTNET_DiagnosticPorts explicitly

This will help identify if the issue is:
- Process not creating diagnostic sockets
- dotnet-trace unable to discover the process
- Permission/visibility issues between processes
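
The exact command elided above is not recoverable from this description. As an illustration only, the same checks can be made with `dotnet-trace ps`, which lists the .NET processes dotnet-trace can attach to, plus a listing of the `dotnet-diagnostic-*` sockets (the naming visible in the logs of the next commit). `DOTNET_DiagnosticPorts` is left out of this sketch because its value is not given here.

```yaml
- name: Diagnose dotnet-trace visibility
  run: |
    # .NET processes that dotnet-trace can discover
    dotnet-trace ps
    # On Linux each .NET process listens on
    # $TMPDIR/dotnet-diagnostic-<pid>-<key>-socket
    ls -l "${TMPDIR:-/tmp}"/dotnet-diagnostic-* || echo "No diagnostic sockets found"
```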

The issue was the --pid flag, which doesn't exist. dotnet-trace expects:
- --process-id (not --pid)
- OR --name (preferred for dotnet run, which spawns a child process)

Diagnostic logs showed:
✅ Process running (PID 2915 parent, 2966 child HelloWorldBot)
✅ dotnet-trace can see the process
✅ Diagnostic socket exists: /tmp/dotnet-diagnostic-2966-8585-socket
❌ But the command failed with: 'Must specify either --process-id, --name, --diagnostic-port, or --dsrouter'

Using --name HelloWorldBot to let dotnet-trace find the correct child process automatically.
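
For reference, a hedged sketch of the collection step after this fix. `--name`, `--duration` (in `dd:hh:mm:ss` form) and `--output` are documented `dotnet-trace collect` options; with neither `--providers` nor `--profile`, dotnet-trace falls back to a default configuration that includes CPU sampling, which is what a flamegraph needs. Whether the merged workflow kept the `--providers` flag from the earlier commit is not stated here.

```yaml
- name: Collect a 30-second trace
  run: |
    # `dotnet run` starts HelloWorldBot as a child process, so target it by
    # name; --process-id <child PID> would also work (there is no --pid)
    dotnet-trace collect --name HelloWorldBot \
      --duration 00:00:00:30 \
      --output trace.nettrace
```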

Now that tracing works with the --name flag, switch from the Debug to the Release
build for more realistic performance profiling. Release builds have
optimizations enabled and represent actual production performance.
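
Because the app is started with `dotnet run --no-build`, the run step must use the same configuration as the build step, or it will look for output that was never produced. One way to keep them in sync, sketched with a hypothetical job-level `BUILD_CONFIGURATION` variable and project path:

```yaml
# Job-level settings shared by all steps (both names are hypothetical)
env:
  BUILD_CONFIGURATION: Release
  BOT_PROJECT: examples/HelloWorldBot/HelloWorldBot.csproj

steps:
  - name: Build HelloWorldBot
    run: dotnet build "$BOT_PROJECT" --configuration "$BUILD_CONFIGURATION" --no-restore
  # The start step should then use:
  #   dotnet run --project "$BOT_PROJECT" --configuration "$BUILD_CONFIGURATION" --no-build
```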

neil-pozetroninc merged commit d129fb8 into main on Oct 24, 2025
6 checks passed