GitHub Actions Failure Analysis

When your GitHub Actions workflow fails, this action automatically analyzes logs, correlates failures with code changes, and generates actionable root cause analysis reports.

Features

Semantic Log Processing: Uses cordon's transformer-based anomaly detection to extract relevant failure information from massive logs
LLM-Powered Analysis: Leverages DSPy and your choice of LLM (OpenAI, Anthropic, Gemini, Ollama) for intelligent failure analysis
PR Context-Aware: Automatically correlates code changes with failures to determine if PR changes caused the issue
Flexible Triggering: Run in the same workflow or via workflow_run events
Secret Detection: Automatically redacts secrets from all outputs
Professional Reports: Generates structured, actionable root cause analyses with evidence and recommendations

Quick Start

Same Workflow (Recommended)

Add as a final job that runs on failure:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm test

  analyze:
    needs: test
    if: failure()
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: calebevans/gha-failure-analysis@v1
        with:
          llm-provider: openai
          llm-model: gpt-4o
          llm-api-key: ${{ secrets.OPENAI_API_KEY }}
          post-pr-comment: true

Separate Workflow

Create a separate workflow that triggers on completion:

name: Failure Analysis

on:
  workflow_run:
    workflows: ["CI"]
    types: [completed]

jobs:
  analyze:
    if: ${{ github.event.workflow_run.conclusion == 'failure' }}
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: calebevans/gha-failure-analysis@v1
        with:
          run-id: ${{ github.event.workflow_run.id }}
          llm-provider: anthropic
          llm-model: claude-3-5-sonnet-20241022
          llm-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Note: The github-token parameter is optional and defaults to ${{ github.token }}. Only specify it if you need to use a custom token with different permissions.

How It Works

When a workflow fails, the action:

Fetches workflow run metadata, failed jobs, and logs
Preprocesses logs using cordon to extract semantically relevant sections
Analyzes failures (steps and tests) using LLMs to identify root causes
Correlates PR changes with failures to assess impact
Synthesizes findings into a concise root cause analysis report
Outputs results as job summaries, PR comments, and JSON artifacts

PR Context Analysis

What It Does

When analyzing PR-triggered workflow failures, the action automatically:

Fetches PR changes: Retrieves all files changed with their diffs
Correlates with failures: Uses LLM analysis to determine if code changes likely caused each failure
Assesses impact: Provides clear assessment of whether PR changes are responsible
Identifies culprits: Points to specific files and lines that may have caused issues

This helps you quickly understand if failures are due to your changes or unrelated infrastructure issues.

Example Output

When PR changes cause failures, you'll see:

## 🔍 PR Impact Assessment

🔴 **Impact Likelihood:** High

The test failures are directly related to code changes in this PR:
- Changes to `src/auth/login.py` introduced a validation error
- Modified authentication logic conflicts with test expectations in `tests/test_auth.py`

### 💡 Relevant Code Changes

**src/auth/login.py** (+12 -3)
- Modified password validation logic at lines 45-52
- Added new timeout parameter affecting authentication flow

Configuration

PR context analysis is enabled by default for PR-triggered runs. To customize:

- uses: calebevans/gha-failure-analysis@v1
  with:
    llm-provider: openai
    llm-model: gpt-4o
    llm-api-key: ${{ secrets.OPENAI_API_KEY }}
    analyze-pr-context: true  # Default: true
    pr-context-token-budget: 20  # % of context for PR diffs (default: 20%)

Important: The action analyzes the specific commit that triggered the workflow, not the current PR state. This ensures accurate analysis even if the PR has been updated since the failure.

Configuration

Required Inputs

Input	Description	Example
`llm-provider`	LLM provider	`openai`, `anthropic`, `gemini`, `ollama`
`llm-model`	Model name	`gpt-4o`, `claude-3-5-sonnet-20241022`
`llm-api-key`	LLM API key	`${{ secrets.OPENAI_API_KEY }}`

Optional Inputs

Input	Default	Description
`github-token`	`${{ github.token }}`	GitHub token for API access
`run-id`	Current run	Workflow run ID to analyze
`pr-number`	Auto-detect	Manual PR number override (for testing)
`llm-base-url`	Provider default	Custom LLM API base URL
`post-pr-comment`	`false`	Post analysis as PR comment
`analyze-pr-context`	`true`	Analyze failures in context of PR changes
`pr-context-token-budget`	`20`	% of context to allocate for PR diffs (0-50)
`ignored-jobs`	None	Comma-separated job name patterns to ignore
`ignored-steps`	None	Comma-separated step name patterns to ignore
`artifact-patterns`	None	Comma-separated glob patterns for artifacts

Cordon Options (Log Preprocessing)

Input	Default	Description
`cordon-backend`	`sentence-transformers`	Embedding backend (`remote` or `sentence-transformers`)
`cordon-model-name`	`all-MiniLM-L6-v2`	Embedding model name
`cordon-api-key`	None	API key for remote embeddings
`cordon-device`	`cpu`	Device for local embeddings (`cpu`/`cuda`/`mps`)
`cordon-batch-size`	`32`	Batch size for embeddings

Outputs

Output	Description
`summary`	Brief failure summary
`category`	Failure category (infrastructure/test/build/configuration/timeout/unknown)
`report-path`	Path to full JSON report

LLM Providers

The action supports any LLM provider compatible with DSPy/LiteLLM:

OpenAI

llm-provider: openai
llm-model: gpt-4o
llm-api-key: ${{ secrets.OPENAI_API_KEY }}

Anthropic

llm-provider: anthropic
llm-model: claude-3-5-sonnet-20241022
llm-api-key: ${{ secrets.ANTHROPIC_API_KEY }}

Google Gemini

llm-provider: gemini
llm-model: gemini-2.5-flash
llm-api-key: ${{ secrets.GEMINI_API_KEY }}

Ollama (Local)

llm-provider: ollama
llm-model: llama3.1:70b
# No API key needed for local Ollama

Custom Endpoint

llm-provider: openai
llm-model: custom-model
llm-api-key: ${{ secrets.CUSTOM_API_KEY }}
llm-base-url: https://custom-llm-gateway.example.com

Cordon Configuration

Cordon preprocesses logs to extract semantically relevant sections. Configure it to use remote or local embeddings.

Remote Embeddings (Recommended)

Fast and no model downloads required. Uses your LLM provider's embedding API:

cordon-backend: remote
cordon-model-name: openai/text-embedding-3-small
cordon-api-key: ${{ secrets.OPENAI_API_KEY }}

Supported providers:

OpenAI: openai/text-embedding-3-small, openai/text-embedding-3-large
Google Gemini: gemini/gemini-embedding-001, gemini/text-embedding-004
Cohere: cohere/embed-english-v3.0, cohere/embed-multilingual-v3.0
Voyage: voyage/voyage-2, voyage/voyage-code-2

Example with Gemini:

cordon-backend: remote
cordon-model-name: gemini/gemini-embedding-001
cordon-api-key: ${{ secrets.GEMINI_API_KEY }}

Local Embeddings

For local embedding generation (slower, requires model download):

cordon-backend: sentence-transformers
cordon-model-name: all-MiniLM-L6-v2
cordon-device: cpu  # or cuda/mps for GPU

GPU Acceleration: If your runner has a GPU, use it for 5-15x faster preprocessing:

cordon-device: cuda  # NVIDIA GPUs
cordon-device: mps   # Apple Silicon

Example Output

The action generates structured analyses like:

# 🔍 Workflow Failure Analysis

| | |
|---|---|
| **Workflow** | `CI` |
| **Run ID** | [#21234567890](https://github.com/owner/repo/actions/runs/21234567890) |
| **Pull Request** | [#123](https://github.com/owner/repo/pull/123) |
| **Category** | 🧪 Test |

---

## 🎯 Root Cause

Test suite failed due to timeout waiting for database connection in integration tests.

## 🔬 Technical Details

### Immediate Cause
Integration test `test_user_authentication` timed out after 30 seconds while attempting
to establish a database connection.

### Contributing Factors
Database initialization script took longer than expected, causing connection pool exhaustion.

## 🔍 PR Impact Assessment

🔴 **Impact Likelihood:** High

The changes to `db/init.sql` increased startup time, directly causing the timeout.

### 💡 Relevant Code Changes

**db/init.sql** (+45 -12)
```diff
+ -- Added complex indexes that slow initialization
+ CREATE INDEX CONCURRENTLY idx_users_email ON users(email);
```

## 📊 Evidence

### ❌ test / Run integration tests

**Category:** test

**Root Cause:** Connection timeout after 30 seconds

<details>
<summary>📋 <b>View Detailed Evidence</b></summary>

```
ERROR: Timeout waiting for database connection
Connection pool exhausted: 0/10 connections available
Database startup took 45.2s (expected: <10s)
```
</details>

Security

The action automatically detects and redacts secrets from all outputs using detect-secrets. This prevents accidental exposure in:

Job summaries
PR comments
JSON reports
Console logs

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
.github		.github
src/gha_failure_analysis		src/gha_failure_analysis
tests		tests
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
action.yml		action.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GitHub Actions Failure Analysis

Features

Quick Start

Same Workflow (Recommended)

Separate Workflow

How It Works

PR Context Analysis

What It Does

Example Output

Configuration

Configuration

Required Inputs

Optional Inputs

Cordon Options (Log Preprocessing)

Outputs

LLM Providers

OpenAI

Anthropic

Google Gemini

Ollama (Local)

Custom Endpoint

Cordon Configuration

Remote Embeddings (Recommended)

Local Embeddings

Example Output

Security

About

Uh oh!

Releases 10

Packages

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

License

calebevans/gha-failure-analysis

Folders and files

Latest commit

History

Repository files navigation

GitHub Actions Failure Analysis

Features

Quick Start

Same Workflow (Recommended)

Separate Workflow

How It Works

PR Context Analysis

What It Does

Example Output

Configuration

Configuration

Required Inputs

Optional Inputs

Cordon Options (Log Preprocessing)

Outputs

LLM Providers

OpenAI

Anthropic

Google Gemini

Ollama (Local)

Custom Endpoint

Cordon Configuration

Remote Embeddings (Recommended)

Local Embeddings

Example Output

Security

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 10

Packages 0

Uh oh!

Uh oh!

Contributors 2

Uh oh!

Languages

Packages