17 changes: 10 additions & 7 deletions AGENTS.md
@@ -4,12 +4,12 @@

Octometrics is a Go CLI that profiles GitHub Actions workflows. Read `design.md` for architecture diagrams and key design decisions. The main commands are:

| Command | Purpose |
|---------|---------|
| `monitor` | Collects system metrics (CPU, memory, disk, I/O) during a GHA job, writes JSONL |
| `gather` | Fetches workflow/job/step data from the GitHub REST & GraphQL APIs, stores as JSON |
| `observe` | Renders gathered data as interactive HTML (Mermaid Gantt charts, Plotly metric charts) |
| `report` | Analyzes monitor JSONL and posts Mermaid-based summaries to GHA step summaries and PR comments |
| Command | Purpose |
| --------- | ---------------------------------------------------------------------------------------------- |
| `monitor` | Collects system metrics (CPU, memory, disk, I/O) during a GHA job, writes JSONL |
| `gather` | Fetches workflow/job/step data from the GitHub REST & GraphQL APIs, stores as JSON |
| `observe` | Renders gathered data as interactive HTML (Mermaid Gantt charts, Plotly metric charts) |
| `report` | Analyzes monitor JSONL and posts Mermaid-based summaries to GHA step summaries and PR comments |

Key packages: `cmd/` (Cobra CLI), `monitor/` (system metrics), `gather/` (GitHub API), `observe/` (HTML visualization), `report/` (in-action reporting), `internal/config/` (Viper config), `logging/` (zerolog setup).

@@ -29,6 +29,7 @@ Analyze the outputs and fix issues you introduced. **Do not change a test unless
- Tests use `github.com/stretchr/testify` (`require` for fatal checks, `assert` for non-fatal).
- Use the `internal/testhelpers.Setup(t)` helper to create a temp directory and logger for tests. It auto-cleans on success and preserves on failure.
- Test data goes in `<package>/testdata/` directories.
- You can run `pre-commit` using the `.pre-commit-config.yaml` file for extensive checks.

## Coding Conventions

@@ -49,11 +50,13 @@ Provide a **Risk Rating** at the top of the review summary:
- **LOW:** Documentation, styling, minor bug fixes in non-critical paths, or boilerplate.

### 2. Targeted Review Areas
Identify specific code blocks that require **scrupulous human review**. Focus on:
Identify and call out specific code blocks that require **scrupulous human review**. Focus on:
- Complex conditional logic or concurrency-prone areas.
- Potential breaking changes in internal or external APIs.
- Logic that lacks sufficient unit test coverage within the PR.

If you find any, list them and give a brief description of why they deserve extra attention.

### 3. Reviewer Recommendations
Analyze the git history (recent editors) to suggest the most qualified reviewers.
- Prioritize individuals who have made significant recent contributions to the specific files modified.
25 changes: 17 additions & 8 deletions README.md
@@ -2,24 +2,33 @@

A simple CLI tool to visualize and profile your GitHub Actions workflows. See all the processes that run as part of a PR, workflow, or job in a simple, interactive chart. It can also run [directly in your GitHub Actions flow](https://github.com/kalverra/octometrics-action), useful for debugging changes and performance issues.

![Example PR run](example.png)
![Example PR run](./pr-example.png)

## Run

Before running, make sure to provide GitHub API token, either through the `GITHUB_TOKEN` env var, or the `-t` flag.

```sh
# Install
go install github.com/kalverra/octometrics@latest

# Show help menu
go run . -h
```
octometrics -h

## Monitor
# To see all workflows run on all commits that are part of this PR (including merge queue runs): https://github.com/kalverra/octometrics/pull/33
octometrics gather -o kalverra -r octometrics -p 33

This will launch a background process to monitor stats like CPU and memory usage. This can be run on GHA runners so that when you later `gather` and `observe` the data, you will also have detailed profiling info.
# To see all workflows run on a specific commit: https://github.com/kalverra/octometrics/pull/33/changes/94ad3f7e2f45852a99791326847ea12c94b964dc
octometrics gather -o kalverra -r octometrics -c 94ad3f7e2f45852a99791326847ea12c94b964dc

```sh
go run . monitor
# To see a specific workflow run: https://github.com/kalverra/octometrics/actions/runs/22918636165
octometrics gather -o kalverra -r octometrics -w 22918636165

# Use '-u' to force update local data if it already exists
octometrics gather -o kalverra -r octometrics -p 33 -u
```

### GitHub Action
## GitHub Action

Run `monitor` directly in your GitHub action and it will post performance data as a comment and summary to the action run. [See the octometrics-action](https://github.com/kalverra/octometrics-action).

20 changes: 20 additions & 0 deletions cmd/gather.go
@@ -19,6 +19,26 @@ var (
var gatherCmd = &cobra.Command{
Use: "gather",
Short: "Gather metrics from GitHub",
Long: `Gather metrics from GitHub.

Read workflow runtime data from GitHub to display in the browser.

It can be used to gather data for a specific workflow run, pull request, or commit.
In-progress workflows are supported and will be displayed with an active status indicator.
`,
Example: `
# To see all workflows run on all commits that are part of this PR (including merge queue runs): https://github.com/kalverra/octometrics/pull/33
octometrics gather -o kalverra -r octometrics -p 33

# To see all workflows run on a specific commit: https://github.com/kalverra/octometrics/pull/33/changes/94ad3f7e2f45852a99791326847ea12c94b964dc
octometrics gather -o kalverra -r octometrics -c 94ad3f7e2f45852a99791326847ea12c94b964dc

# To see a specific workflow run: https://github.com/kalverra/octometrics/actions/runs/22918636165
octometrics gather -o kalverra -r octometrics -w 22918636165

# Use '-u' to force update local data if it already exists
octometrics gather -o kalverra -r octometrics -p 33 -u
`,
PreRunE: func(_ *cobra.Command, _ []string) error {
if err := cfg.ValidateGather(); err != nil {
return err
11 changes: 10 additions & 1 deletion cmd/monitor.go
@@ -24,7 +24,16 @@ var (
var monitorCmd = &cobra.Command{
Use: "monitor",
Short: "Monitor system resources",
Long: "Monitor system resources and save the data to a file for later analysis.",
Long: `Monitor system resources for later analysis.

This command will monitor system resources like CPU, memory, disk, and I/O during a GHA job.

It will write the data to a file for later analysis. Primarily used in the octometrics-action to monitor system resources during a GHA job.`,
Example: `
octometrics monitor # Monitor system resources until interrupted
octometrics monitor --duration=1h # Monitor system resources for 1 hour
octometrics monitor --interval=5s # Monitor system resources every 5 seconds
`,
RunE: func(_ *cobra.Command, _ []string) error {
var (
ctx context.Context
4 changes: 4 additions & 0 deletions cmd/observe.go
@@ -13,6 +13,10 @@ import (
var observeCmd = &cobra.Command{
Use: "observe",
Short: "Observe metrics from GitHub",
Long: `Observe metrics from GitHub.

Display the gathered Workflow/Job/Step data in your browser.`,
Example: `octometrics observe # Display all of your gathered Workflow/Job/Step data in your browser`,
PreRunE: func(_ *cobra.Command, _ []string) error {
var err error
githubClient, err = gather.NewGitHubClient(logger, cfg.GitHubToken, nil)
Binary file removed example.png
Binary file not shown.
43 changes: 23 additions & 20 deletions gather/commit.go
@@ -290,26 +290,25 @@ func setWorkflowRunsForCommit(
)

for _, checkRun := range checkRuns {
if checkRun.GetStatus() == "completed" {
match := workflowRunIDRe.FindStringSubmatch(checkRun.GetHTMLURL())
if len(match) == 0 {
log.Warn().
Str("owner", owner).
Str("repo", repo).
Str("SHA", commitData.GetSHA()).
Str("check_run", checkRun.GetName()).
Str("URL", checkRun.GetHTMLURL()).
Msg("Failed to parse workflow run ID from check run URL")
continue
}
workflowRunID, err := strconv.ParseInt(match[1], 10, 64)
if err != nil {
return fmt.Errorf("failed to parse workflow run ID from check run URL: %w", err)
}
workflowRunIDsSet[workflowRunID] = struct{}{}
} else {
log.Warn().Str("Check Run", checkRun.GetName()).Msg("Check run is not yet completed, skipping")
if checkRun.GetStatus() != "completed" {
log.Warn().Str("check_run", checkRun.GetName()).Msg("Check run is not yet completed")
}
match := workflowRunIDRe.FindStringSubmatch(checkRun.GetHTMLURL())
if len(match) == 0 {
log.Warn().
Str("owner", owner).
Str("repo", repo).
Str("SHA", commitData.GetSHA()).
Str("check_run", checkRun.GetName()).
Str("URL", checkRun.GetHTMLURL()).
Msg("Failed to parse workflow run ID from check run URL")
continue
}
workflowRunID, err := strconv.ParseInt(match[1], 10, 64)
if err != nil {
return fmt.Errorf("failed to parse workflow run ID from check run URL: %w", err)
}
workflowRunIDsSet[workflowRunID] = struct{}{}
}

// Pass commit data down to the workflow run
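Review note: with this hunk, every check run goes through run ID extraction whether or not it has completed, and duplicate IDs collapse in the set. The sketch below isolates that step so it can be read on its own. `runIDPattern` is an assumed stand-in for `workflowRunIDRe` (its definition is not part of this diff), and `parseRunID` and `collectRunIDs` are hypothetical helper names, not functions from the repository.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// runIDPattern is an ASSUMED pattern standing in for workflowRunIDRe,
// which is defined outside the lines shown in this diff.
var runIDPattern = regexp.MustCompile(`/actions/runs/(\d+)`)

// parseRunID (hypothetical helper) extracts a workflow run ID from a check
// run's HTML URL. ok is false when the URL contains no run ID.
func parseRunID(htmlURL string) (id int64, ok bool, err error) {
	match := runIDPattern.FindStringSubmatch(htmlURL)
	if len(match) == 0 {
		return 0, false, nil
	}
	id, err = strconv.ParseInt(match[1], 10, 64)
	if err != nil {
		return 0, false, fmt.Errorf("failed to parse workflow run ID from check run URL: %w", err)
	}
	return id, true, nil
}

// collectRunIDs (hypothetical helper) mirrors the loop in the hunk: URLs
// without a run ID are skipped, parse failures abort, and IDs are
// deduplicated in a set.
func collectRunIDs(htmlURLs []string) (map[int64]struct{}, error) {
	ids := make(map[int64]struct{})
	for _, u := range htmlURLs {
		id, ok, err := parseRunID(u)
		if err != nil {
			return nil, err
		}
		if !ok {
			continue
		}
		ids[id] = struct{}{}
	}
	return ids, nil
}

func main() {
	// The URL shapes here are illustrative assumptions.
	ids, err := collectRunIDs([]string{
		"https://github.com/kalverra/octometrics/actions/runs/22918636165/job/1",
		"https://github.com/kalverra/octometrics/actions/runs/22918636165/job/2",
		"https://example.com/not-a-workflow-url",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("unique run IDs:", len(ids))
}
```

Two jobs from the same run share one run ID, so the set ends up with a single entry, and the unrelated URL is skipped rather than treated as an error.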
@@ -323,7 +322,11 @@ func setWorkflowRunsForCommit(
}
commitData.comparisonMutex.Lock()
defer commitData.comparisonMutex.Unlock()
commitData.Conclusion = establishPRChecksConclusion(commitData.Conclusion, workflowRun.GetConclusion())
conclusion := workflowRun.GetConclusion()
if conclusion == "" {
conclusion = workflowRun.GetStatus()
}
commitData.Conclusion = establishPRChecksConclusion(commitData.Conclusion, conclusion)
commitData.Cost += workflowRun.GetCost()
if workflowRun.GetRunStartedAt().Before(commitData.StartActionsTime) ||
commitData.StartActionsTime.IsZero() {
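Review note: the fallback added in this hunk substitutes the run's status when the conclusion is empty, which is what an in-progress run reports. A minimal sketch of that rule follows. `effectiveConclusion` is a hypothetical name used only for illustration, and how `establishPRChecksConclusion` treats status values such as `in_progress` is not shown in this diff.

```go
package main

import "fmt"

// effectiveConclusion (hypothetical helper) returns the conclusion when it is
// set and otherwise falls back to the status, matching the hunk above.
func effectiveConclusion(conclusion, status string) string {
	if conclusion == "" {
		return status
	}
	return conclusion
}

func main() {
	// A completed run keeps its conclusion.
	fmt.Println(effectiveConclusion("success", "completed")) // prints "success"
	// An in-progress run has no conclusion yet, so its status is used.
	fmt.Println(effectiveConclusion("", "in_progress")) // prints "in_progress"
}
```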
79 changes: 47 additions & 32 deletions gather/workflow_run.go
@@ -125,7 +125,9 @@ func (w *WorkflowRunData) GetUsage() *github.WorkflowRunUsage {
return w.Usage
}

// WorkflowRun gathers all metrics for a completed workflow run
// WorkflowRun gathers all metrics for a workflow run.
// In-progress runs are supported: billing and monitoring data are skipped,
// and the result is not cached locally so fresh data is always fetched.
func WorkflowRun(
log zerolog.Logger,
client *GitHubClient,
@@ -207,8 +209,11 @@ func WorkflowRun(
if workflowRun == nil {
return nil, "", fmt.Errorf("workflow run '%d' not found on GitHub", workflowRunID)
}
if workflowRun.GetStatus() != "completed" {
return nil, "", fmt.Errorf("workflow run '%d' is still in progress", workflowRunID)
completed := workflowRun.GetStatus() == "completed"
if !completed {
log.Warn().
Str("status", workflowRun.GetStatus()).
Msg("Workflow run is not yet completed; billing and monitoring data will be unavailable")
}

workflowRunData.WorkflowRun = workflowRun
@@ -220,24 +225,26 @@ func WorkflowRun(
analyses []*monitor.Analysis
)

eg.Go(func() error {
var analysisErr error
analyses, analysisErr = monitoringData(log, client, owner, repo, workflowRunID, targetDir)
return analysisErr
})
if completed {
eg.Go(func() error {
var analysisErr error
analyses, analysisErr = monitoringData(log, client, owner, repo, workflowRunID, targetDir)
return analysisErr
})

eg.Go(func() error {
var billingErr error
workflowBillingData, billingErr = billingData(client, owner, repo, workflowRunID)
return billingErr
})
}

eg.Go(func() error {
var jobsErr error
workflowRunJobs, jobsErr = jobsData(client, owner, repo, workflowRunID)
return jobsErr
})

eg.Go(func() error {
var billingErr error
workflowBillingData, billingErr = billingData(client, owner, repo, workflowRunID)
return billingErr
})

if err := eg.Wait(); err != nil {
return nil, "", fmt.Errorf(
"failed to collect job, billing, and/or monitoring data for workflow run '%d': %w",
@@ -247,19 +254,26 @@ func WorkflowRun(
}
workflowRunData.Usage = workflowBillingData

// Calculate job cost data and add to workflow run data
for _, job := range workflowRunJobs {
// Calculate completed at for the workflow. GitHub API only gives "UpdatedAt" for workflows
// which can be misleading.
if workflowRunData.RunCompletedAt.IsZero() {
workflowRunData.RunCompletedAt = job.GetCompletedAt().Time
} else if job.GetCompletedAt().After(workflowRunData.RunCompletedAt) {
workflowRunData.RunCompletedAt = job.GetCompletedAt().Time
completedAt := job.GetCompletedAt().Time
if !completedAt.IsZero() {
if workflowRunData.RunCompletedAt.IsZero() {
workflowRunData.RunCompletedAt = completedAt
} else if completedAt.After(workflowRunData.RunCompletedAt) {
workflowRunData.RunCompletedAt = completedAt
}
}

runner, cost, err := calculateJobRunBilling(job.GetID(), workflowBillingData)
if err != nil {
return nil, "", fmt.Errorf("failed to calculate cost for job '%d': %w", job.GetID(), err)
var (
runner string
cost int64
)
if completed {
var billingErr error
runner, cost, billingErr = calculateJobRunBilling(job.GetID(), workflowBillingData)
if billingErr != nil {
return nil, "", fmt.Errorf("failed to calculate cost for job '%d': %w", job.GetID(), billingErr)
}
}
workflowRunData.Cost += cost
workflowRunData.Jobs = append(workflowRunData.Jobs, &JobData{
@@ -269,16 +283,17 @@ func WorkflowRun(
})
}

// Match monitoring data to jobs
nextAnalysisLoop:
for _, analysis := range analyses {
for _, job := range workflowRunData.Jobs {
if analysis.JobName == job.GetName() {
job.Analysis = analysis
continue nextAnalysisLoop
if completed {
nextAnalysisLoop:
for _, analysis := range analyses {
for _, job := range workflowRunData.Jobs {
if analysis.JobName == job.GetName() {
job.Analysis = analysis
continue nextAnalysisLoop
}
}
log.Warn().Str("monitoring_data_job_name", analysis.JobName).Msg("Found monitoring data for job but found no job name matches")
}
log.Warn().Str("monitoring_data_job_name", analysis.JobName).Msg("Found monitoring data for job but found no job name matches")
}

data, err := json.Marshal(workflowRunData)