
LLM-Powered Test Analysis — Complete Guide

This guide explains how the LISA MCP server uses Azure OpenAI to automatically analyze test failures, determine root causes, and generate rich reports.


Table of Contents

  1. Overview — how it works
  2. Prerequisites
  3. The 4 analysis tools
  4. Step-by-step: analyze a test run
  5. Understanding the output
  6. Reading the HTML report
  7. Root cause categories
  8. Failure severity levels
  9. Log collection — how context is gathered
  10. End-to-end pipeline: run_and_analyze
  11. API cost guidance
  12. CI/CD integration

1. Overview — how it works

LISA test run output
 │
 ▼
┌─────────────────────────────────────┐
│ Step 1 — Parse results │
│ parse_results() → TestRunSummary │
│ JUnit XML or console output │
└──────────────┬──────────────────────┘
 │
 ▼
┌─────────────────────────────────────┐
│ Step 2 — Collect log context │
│ collect_run_logs() │
│ Tail-reads per-test .log files │
│ Extracts lines near ERROR/FAIL │
│ Hard cap: 256 KB per file │
│ 8,000 chars per test │
└──────────────┬──────────────────────┘
 │
 ▼
┌─────────────────────────────────────┐
│ Step 3 — Per-failure analysis │
│ For each failed test: │
│ analyze_failure() → Azure OpenAI API │
│ Uses tool_use for structured JSON │
│ Returns FailureAnalysis: │
│ • root_cause_category │
│ • root_cause_description │
│ • recommended_fix │
│ • severity (critical/high/…) │
│ • relevant_log_lines │
│ • confidence (0.0–1.0) │
└──────────────┬──────────────────────┘
 │
 ▼
┌─────────────────────────────────────┐
│ Step 4 — Run-level summary │
│ analyze_run() → Azure OpenAI API │
│ Sends compact digest (not logs) │
│ Returns RunAnalysisSummary: │
│ • overall_health │
│ • health_score │
│ • failure_patterns │
│ • top_priorities │
│ • recommendations │
│ • executive_summary │
└──────────────┬──────────────────────┘
 │
 ▼
┌─────────────────────────────────────┐
│ Step 5 — Report generation │
│ generate_html_report() │
│ generate_markdown_report() │
│ Self-contained HTML (no CDN) │
│ GitHub-flavored Markdown │
└─────────────────────────────────────┘

Token usage per run:

  • Per failure: ~1,000–2,000 input tokens + ~500 output tokens
  • Run summary: ~800 input tokens + ~800 output tokens
  • 10 failures ≈ 15,000–25,000 tokens total ($0.05–0.15 at current pricing)
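The per-run figures above can be turned into a back-of-envelope estimator. This is a sketch only: the per-call token counts come from the list above, while the per-million-token prices are placeholders in the ballpark of gpt-4o list pricing — check current Azure OpenAI pricing for your deployed model.

```python
def estimate_run_cost(num_failures: int,
                      input_price_per_mtok: float = 2.50,
                      output_price_per_mtok: float = 10.00) -> float:
    """Rough cost estimate in dollars, using the per-call figures above.

    Prices are per million tokens and are placeholders -- verify against
    current Azure OpenAI pricing for your model deployment.
    """
    # Per failure: ~1,500 input + ~500 output tokens (midpoint of the range),
    # plus ~800 input + ~800 output tokens for the run-level summary call.
    input_tokens = num_failures * 1_500 + 800
    output_tokens = num_failures * 500 + 800
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000
```

For 10 failures this lands inside the $0.05–0.15 range quoted above.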

2. Prerequisites

Get an Azure OpenAI API key

  1. Go to https://portal.azure.com
  2. Create an account or sign in
  3. Open your Azure OpenAI resource and navigate to Keys and Endpoint
  4. Copy one of the keys (KEY 1 or KEY 2)

The key is passed directly to the tools — it is never stored on disk by the MCP server.

LISA installation (for running tests)

If you only want to analyze existing results, no LISA installation is needed. For end-to-end run_and_analyze, LISA must be installed:

cd ~/lisa
pip install -e .
lisa --version

3. The 4 analysis tools

| Tool | Input | Output | Use case |
| --- | --- | --- | --- |
| analyze_test_run_with_llm | Results + optional run dir | JSON with all analyses | Analyze results you already have |
| analyze_failure_root_cause | Single test name + error | JSON with FailureAnalysis | Deep-dive on one specific failure |
| generate_analysis_report | Results + output dir | HTML + Markdown + JSON | Full report to share with team |
| run_and_analyze | Runbook + lisa_path | All of the above | One-shot: run tests and get report |

4. Step-by-step: analyze a test run

Option A — You have a JUnit XML file

I have LISA test results at ~/lisa/lisa_results.xml and LISA logs at ~/lisa/runtime/latest.
My Azure OpenAI API key is YOUR_AZURE_OPENAI_API_KEY
Analyze the failures and generate a report at ~/reports/

The AI calls generate_analysis_report:

generate_analysis_report(
 results_source="~/lisa/lisa_results.xml",
 api_key="YOUR_AZURE_OPENAI_API_KEY",
 output_dir="~/reports/",
 run_dir="~/lisa/runtime/latest",
 report_base_name="my_run",
)

Returns:

{
 "html_path": "/home/user/reports/my_run.html",
 "markdown_path": "/home/user/reports/my_run.md",
 "report": { ... }
}

Open my_run.html in a browser for the full visual report.


Option B — You have console output

Here is my LISA run output. Analyze the failures with my API key YOUR_AZURE_OPENAI_API_KEY:

[PASS] Provisioning.smoke_test (12.3s)
[FAIL] StorageVerification.nvme_io_test (120.5s)
[FAIL] NetworkTest.verify_sriov (45.0s)
[PASS] CoreTest.verify_cpu_count (2.8s)

The AI calls analyze_test_run_with_llm:

analyze_test_run_with_llm(
 results_source="[PASS] Provisioning...\n[FAIL] StorageVerification...",
 api_key="YOUR_AZURE_OPENAI_API_KEY",
)

Option C — Investigate one specific failure

The test StorageVerification.nvme_io_test failed with:
"Expected exit code 0 but got 1"
The log is at ~/lisa/logs/StorageVerification/nvme_io_test/console.log
Analyze with API key YOUR_AZURE_OPENAI_API_KEY

The AI calls analyze_failure_root_cause:

analyze_failure_root_cause(
 test_name="StorageVerification.nvme_io_test",
 failure_message="Expected exit code 0 but got 1",
 api_key="YOUR_AZURE_OPENAI_API_KEY",
 log_file_path="~/lisa/logs/StorageVerification/nvme_io_test/console.log",
)

Option D — Run tests and analyze in one shot

Run the runbook at ~/runbooks/ubuntu22_t1.yml using LISA at ~/lisa.
Pass subscription_id=xxxx and admin_private_key_file=~/.ssh/lisa_key.
After running, analyze all failures with Azure OpenAI API key YOUR_AZURE_OPENAI_API_KEY
and save the report to ~/reports/ubuntu22_t1/

The AI calls run_and_analyze — fully automated end-to-end.


5. Understanding the output

FailureAnalysis JSON

{
 "test_name": "StorageVerification.nvme_io_test",
 "root_cause_category": "disk_io_error",
 "root_cause_description": "The NVMe device /dev/nvme0 was not found during test
 execution. The PCIe enumeration log shows a timeout at boot, suggesting the
 Azure VM SKU (Standard_D4s_v3) does not provide NVMe storage.",
 "recommended_fix": "Switch to an Lsv3 or Lv3 series VM for NVMe tests.
 Verify with: az vm list-skus --location westus3 | grep nvme
 Alternatively, check if the VM has NVMe: ls /dev/nvme*",
 "severity": "critical",
 "relevant_log_lines": [
 "[ 2.345] pcieport: PCIe Bus Error: severity=Corrected",
 "ERROR: NVMe device not found at /dev/nvme0",
 "FAILED: assert_that('/dev/nvme0').exists() -> False"
 ],
 "confidence": 0.88
}
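The fields above can be mirrored as a small dataclass for downstream tooling. This is an illustrative shape derived from the JSON sample — not the server's actual model class, whose name and internals may differ.

```python
from dataclasses import dataclass, field

@dataclass
class FailureAnalysis:
    """Illustrative mirror of the FailureAnalysis JSON shown above."""
    test_name: str
    root_cause_category: str           # one of the categories in section 7
    root_cause_description: str
    recommended_fix: str
    severity: str                      # critical / high / medium / low
    relevant_log_lines: list[str] = field(default_factory=list)
    confidence: float = 0.0            # 0.0 (pure guess) .. 1.0 (certain)

def failure_analysis_from_json(data: dict) -> FailureAnalysis:
    """Build the dataclass from a parsed JSON object."""
    return FailureAnalysis(**data)
```

With a shape like this, report generators and CI gates can share one typed view of each analysis instead of re-reading raw dicts.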

RunAnalysisSummary JSON

{
 "overall_health": "critical",
 "health_score": 0.72,
 "failure_patterns": [
 "NVMe storage not enumerated on Standard_D4s_v3",
 "SR-IOV VF interface timeout on accelerated networking"
 ],
 "top_priorities": [
 "Switch VM SKU to Lsv3 series for all NVMe tests",
 "Investigate accelerated networking VF timeout",
 "Review kernel version for SR-IOV fixes"
 ],
 "environment_issues": [
 "Standard_D4s_v3 does not support NVMe — use Lsv3/Lv3"
 ],
 "recommendations": [
 "Update runbook to use Standard_L8s_v3 for storage tests",
 "Pin kernel to 5.15.0-1040-azure or later",
 "Add retry logic to SR-IOV VF detection (30s retry, 3 attempts)"
 ],
 "executive_summary": "This T1 test run achieved a 72% pass rate with 5 failures,
 2 of which are critical blockers. The NVMe storage failures are caused by the
 wrong VM SKU being used — a simple runbook fix will resolve 8 tests. The SR-IOV
 network failure appears related to a known driver timing issue in kernel 5.14.
 Recommend updating the runbook VM configuration and upgrading the kernel before
 the next run."
}

6. Reading the HTML report

The HTML report (lisa_analysis.html) has these sections:

Header

  • Overall health badge: HEALTHY / DEGRADED / CRITICAL / UNKNOWN
  • Health score percentage (weighted by severity)
  • Run directory and generation timestamp
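The score is described above as "weighted by severity". One plausible weighting is sketched below — the specific weights are assumptions for illustration, not the server's actual values.

```python
# Hypothetical severity weights -- the real server may weight differently.
SEVERITY_WEIGHT = {"critical": 1.0, "high": 0.7, "medium": 0.4, "low": 0.1}

def health_score(total_tests: int, failure_severities: list[str]) -> float:
    """1.0 = fully healthy; each failure subtracts its weight / total_tests."""
    if total_tests == 0:
        return 0.0
    penalty = sum(SEVERITY_WEIGHT.get(sev, 0.4) for sev in failure_severities)
    return max(0.0, 1.0 - penalty / total_tests)
```

Under this scheme a run with mostly low-severity failures scores much higher than one with the same count of critical failures, which matches how the badge and percentage behave in the report.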

Metrics grid

Total tests · Passed · Failed · Skipped · Duration

Pass rate bar

Visual progress bar showing pass rate.

Executive Summary

3-5 sentence non-technical summary for stakeholders. Safe to paste into a Jira ticket or email.

Failure Analysis cards

One card per failed test, sorted by severity (critical first):

  • Badge: severity level (CRITICAL / HIGH / MEDIUM / LOW)
  • Category badge: root cause category
  • Confidence: how certain the model is (0–100%)
  • Root Cause: technical explanation
  • Recommended Fix: specific actionable step
  • Log Lines: most relevant log lines in a code block

Top Priorities

Numbered list of the most important issues to fix first.

Recommendations

Specific team actions (updating runbooks, fixing code, changing configs).

Failure Patterns

Tag cloud of recurring themes across multiple failures.

Environment / Infrastructure Issues

Problems that are not test code bugs — VM SKU, quota, network config.


7. Root cause categories

| Category | When used |
| --- | --- |
| kernel_panic | Kernel crash, oops, BUG() call |
| network_timeout | SSH timeout, ping failure, SR-IOV VF timeout |
| disk_io_error | NVMe not found, I/O error, filesystem unmount |
| permission_denied | chmod issues, sudo failures, SELinux denials |
| package_not_found | apt/dnf install failure, binary not in PATH |
| service_crash | systemd service failed, process died unexpectedly |
| assertion_failure | Test assertion failed (expected != actual) |
| timeout | Test exceeded timeout, hung process |
| environment_setup | cloud-init failure, wrong VM SKU, missing feature |
| flaky_test | Race condition, intermittent timing issue |
| infrastructure | Azure quota, VM provisioning failure, network glitch |
| unknown | The model couldn't determine the root cause from available data |
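When no API key is available, a rough local approximation of these categories is possible with a keyword heuristic. The mapping below is purely illustrative — the server uses the LLM, not a lookup table, and the patterns here are deliberately coarse.

```python
import re

# Illustrative keyword heuristic -- NOT the server's actual classification
# logic, which is done by the LLM. Patterns are intentionally rough.
CATEGORY_PATTERNS = [
    ("kernel_panic",      r"kernel panic|Oops|BUG\("),
    ("network_timeout",   r"ssh.*timeout|ping.*fail|VF timeout"),
    ("disk_io_error",     r"nvme|I/O error|read-only file system"),
    ("permission_denied", r"permission denied|sudo|selinux"),
    ("package_not_found", r"apt-get|dnf install|command not found"),
    ("assertion_failure", r"assert|expected .* but got"),
    ("timeout",           r"timed? ?out"),
]

def guess_category(message: str) -> str:
    """Return the first matching category, or 'unknown' if nothing matches."""
    for category, pattern in CATEGORY_PATTERNS:
        if re.search(pattern, message, re.IGNORECASE):
            return category
    return "unknown"
```

A heuristic like this can triage failures offline, but it has no access to log context, so treat its output as a hint rather than a root cause.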

8. Failure severity levels

| Severity | Meaning | Action |
| --- | --- | --- |
| critical | Blocks release; data loss or system instability risk | Fix before merge/publish |
| high | Major feature completely broken | Fix before next release |
| medium | Partial functionality impacted | Fix within sprint |
| low | Minor issue, informational | Fix when convenient |

Severity determines card color in the HTML report and sort order.


9. Log collection — how context is gathered

When run_dir is provided, the log collector:

  1. Walks standard LISA output directories:
   • <run_dir>/logs/<SuiteName>/<test_method>/
   • <run_dir>/runtime/<timestamp>/<test_name>/
  2. Tail-reads each .log file (last 256 KB only — never loads the whole file)
  3. Scans for lines matching these error patterns:

ERROR, FAILED, EXCEPTION, Traceback, AssertionError,
CRITICAL, PANIC, exit code [non-zero], returncode, timeout,
permission denied, connection refused, no such file

  4. Extracts 15 lines of context before and after each error signal
  5. Caps output at 8,000 characters per test (≈ 150 lines)
  6. Sends the context snippet to the AI alongside the failure message
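The tail-read and context-extraction steps above can be sketched as follows. The constants come from the text; the function names and the exact regex are illustrative assumptions, not the collector's real implementation.

```python
import re
from pathlib import Path

TAIL_BYTES = 256 * 1024       # only the last 256 KB of each log is read
CONTEXT_LINES = 15            # lines kept before/after each error signal
MAX_CHARS_PER_TEST = 8_000    # overall cap per test

# Assumed approximation of the error patterns listed above.
ERROR_RE = re.compile(
    r"ERROR|FAILED|EXCEPTION|Traceback|AssertionError|CRITICAL|PANIC"
    r"|returncode|timeout|permission denied|connection refused|no such file",
    re.IGNORECASE,
)

def extract_error_context(text: str) -> str:
    """Keep a window of CONTEXT_LINES around each error line, capped."""
    lines = text.splitlines()
    keep: set[int] = set()
    for i, line in enumerate(lines):
        if ERROR_RE.search(line):
            keep.update(range(max(0, i - CONTEXT_LINES),
                              min(len(lines), i + CONTEXT_LINES + 1)))
    return "\n".join(lines[i] for i in sorted(keep))[:MAX_CHARS_PER_TEST]

def collect_log_context(log_path: Path) -> str:
    """Tail-read the last TAIL_BYTES of the file, then extract context."""
    with log_path.open("rb") as f:
        f.seek(max(0, log_path.stat().st_size - TAIL_BYTES))
        return extract_error_context(f.read().decode("utf-8", errors="replace"))
```

Seeking before reading is what keeps memory bounded on multi-gigabyte serial logs; everything before the final 256 KB is never loaded.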

What to do if logs aren't found

If run_dir contains no log files:

  • Analysis still works using just the failure message and stack trace
  • Pass run_dir=None explicitly to skip log collection
  • The AI will note low confidence when logs are insufficient

10. End-to-end pipeline: run_and_analyze

run_and_analyze ties everything together:

1. run_tests(lisa_path, runbook_path, variables)
 │
 ▼
2. Find results: look for lisa_results.xml in
 - <lisa_path>/runtime/<latest>/
 - <lisa_path>/runs/<latest>/
 - Current working directory
 Fallback: use stdout as results_source
 │
 ▼
3. generate_analysis_report(results_source, api_key, output_dir, run_dir)
 │
 ▼
4. Return: {
 run_result: { success, returncode, command },
 summary_line: "CRITICAL | 13/18 passed (72.2%) | 5 failed",
 html_path: "/path/to/lisa_analysis.html",
 markdown_path: "/path/to/lisa_analysis.md",
 report: { ... full AnalysisReport ... }
 }
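The results-discovery order in step 2 could be sketched like this. The helper name is hypothetical, and the real tool may pick "latest" differently; here the newest match by modification time wins.

```python
from pathlib import Path

def find_results_file(lisa_path: str, stdout: str) -> str:
    """Illustrative sketch of step 2's discovery order (helper name assumed)."""
    for base in (Path(lisa_path) / "runtime", Path(lisa_path) / "runs"):
        if not base.is_dir():
            continue
        # Prefer the most recently modified match ("<latest>" in the diagram).
        hits = sorted(base.rglob("lisa_results.xml"),
                      key=lambda p: p.stat().st_mtime, reverse=True)
        if hits:
            return str(hits[0])
    local = Path.cwd() / "lisa_results.xml"
    if local.is_file():
        return str(local)
    return stdout  # fallback: parse the console output instead
```

The stdout fallback is why run_and_analyze still produces a report when no XML file is written: the console `[PASS]`/`[FAIL]` lines become the results_source.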

Example conversation

Run the runbook ~/runbooks/rhel9_t1.yml with:
 - LISA at ~/lisa
 - subscription_id: xxxx
 - admin_private_key_file: ~/.ssh/lisa_key
 - Azure OpenAI API key: YOUR_AZURE_OPENAI_API_KEY
Save reports to ~/reports/rhel9_t1/

The AI calls run_and_analyze and returns:

✅ Run complete.
Health: DEGRADED | 15/18 passed (83.3%) | 3 failed

Reports:
 HTML: ~/reports/rhel9_t1/lisa_analysis.html
 Markdown: ~/reports/rhel9_t1/lisa_analysis.md

Top failures:
 1. [CRITICAL] CoreTest.verify_kdump — kdump service not found on RHEL 9.2
 2. [HIGH] NetworkTest.verify_sriov — SR-IOV VF timeout
 3. [MEDIUM] StorageTest.verify_swap — swap not enabled by default

Executive summary:
 83% of T1 tests passed with 3 failures. The kdump failure is critical for
 kernel crash collection and should be fixed before image publication...

11. API cost guidance

Approximate costs (February 2026 pricing for gpt-4o):

| Scenario | Failures | Approx tokens | Approx cost |
| --- | --- | --- | --- |
| T0 smoke run | 1–3 | ~5,000 | ~$0.01 |
| T1 daily CI | 3–10 | ~15,000–30,000 | ~$0.05–0.10 |
| T2 weekly regression | 5–20 | ~25,000–60,000 | ~$0.08–0.20 |
| T4 full certification | 20+ | ~80,000+ | cap at 20 failures |

Controlling cost

Use max_failures_to_analyze to cap the number of LLM calls:

analyze_test_run_with_llm(
 results_source="lisa_results.xml",
 api_key="YOUR_AZURE_OPENAI_API_KEY",
 max_failures_to_analyze=5, # only analyze the first 5 failures
)

The tool analyzes failures in the order they appear in the results file. Sort by severity in post-processing if needed.
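The post-processing sort mentioned above is a one-liner. This sketch assumes the per-failure analyses are plain dicts with the `severity` field from section 5.

```python
# Rank order for the severity levels defined in section 8.
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def sort_by_severity(analyses: list[dict]) -> list[dict]:
    """Worst failures first; unknown or missing severities sort last."""
    return sorted(analyses,
                  key=lambda a: SEVERITY_ORDER.get(a.get("severity"), 4))
```

This does not reduce API cost — the cap on LLM calls still comes from max_failures_to_analyze — it only reorders the results you already paid for.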


12. CI/CD integration

GitHub Actions — analyze after every run

- name: Run LISA T1 tests
  run: |
    lisa -r runbook.yml \
      -v subscription_id=${{ secrets.AZURE_SUB_ID }} \
      -v admin_private_key_file=/tmp/lisa_key

- name: Analyze failures with the AI
  if: always()  # run even if tests failed
  run: |
    python3 -c "
    import json
    from lisa_mcp.server import generate_analysis_report

    result = generate_analysis_report(
        results_source='./lisa_results.xml',
        api_key='${{ secrets.AZURE_OPENAI_API_KEY }}',
        output_dir='./reports/',
        run_dir='.',
    )
    data = json.loads(result)
    report = data['report']
    health = report['summary']['overall_health'].upper()
    print(f'Health: {health}')
    print(f'Summary: {report[\"summary\"][\"executive_summary\"]}')
    "

- name: Upload analysis report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: lisa-analysis-${{ github.run_id }}
    path: reports/

Post analysis in a GitHub comment (on PR)

import json
import subprocess

# Get the analysis
result = json.loads(generate_analysis_report(...))
report = result["report"]
summary = report["summary"]

# Build comment body
comment = f"""## LISA Test Analysis — {summary['overall_health'].upper()}

**Health score:** {int(summary['health_score'] * 100)}%
**Passed:** {report['passed']}/{report['total']}

### Executive Summary
{summary['executive_summary']}

### Top Priorities
""" + "\n".join(f"- {p}" for p in summary['top_priorities'][:3])

# Post via gh CLI
subprocess.run([
 "gh", "pr", "comment", "--body", comment
])