Skip to content

Phase 4: Diagnostics & Failure Explanation#59

Draft
Copilot wants to merge 4 commits intomainfrom
copilot/diagnostics-and-failure-explanation
Draft

Phase 4: Diagnostics & Failure Explanation#59
Copilot wants to merge 4 commits intomainfrom
copilot/diagnostics-and-failure-explanation

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 9, 2026

Adds AI-powered workflow failure diagnostics with automatic state capture and root cause analysis.

Core Module

Diagnostic Capture

  • DiagnosticSnapshot model captures incidents, variables, properties, and metadata at failure time
  • WorkflowDiagnosticsService performs snapshot capture and heuristic pattern matching (timeout, connection, null reference)
  • GetWorkflowDiagnosticsSnapshotTool exposes diagnostic data to AI via function calling

AI Context Enhancement

  • System prompt instructs AI on root cause analysis methodology
  • Automatic diagnostic snapshot injection when incidents detected
  • Comprehensive context includes execution state, exception details, and preliminary analysis

Studio Module

DiagnosticsSummaryPanel Component

  • Displays root cause, incident timeline, and suggested actions
  • Visual hierarchy: error alert → root cause → incidents list → recommendations

ChatPanel Enhancement

  • "Explain Failure" quick action auto-generates diagnostic query when instance attached
  • Surfaces diagnostic-specific context indicators

Implementation Notes

Diagnostic snapshots are captured on-demand when chat context includes failed instances. The service serializes complex exception objects safely and falls back gracefully if data unavailable.

Example AI interaction:

User: [Attaches failed instance] [Clicks "Explain Failure"]

AI receives:
- System: "Identify root cause from timeline, recognize patterns (timeout/connection/null), 
  provide clear explanation with specific activity IDs, suggest actionable fixes"
- Context: DiagnosticSnapshot with incidents, variables, execution state
- Preliminary analysis: "Timeout error in HttpActivity (ID: abc123)"

AI responds:
"Root cause: API timeout after 30s in HttpActivity abc123.
 Recommendations: Increase timeout to 60s, verify endpoint health, add retry logic..."

Screenshot

Copilot with diagnostic features

The chat panel now includes the "Explain Failure" button when a workflow instance is attached.

Original prompt

This section details on the original issue you should resolve

<issue_title>Phase 4: Diagnostics & Failure Explanation</issue_title>
<issue_description>Diagnostics integration and AI-powered troubleshooting for workflows:

  • Explain workflow failures, show root cause, and suggest next actions (using Copilot context and LLM reasoning).
  • Build instrumentation to capture workflow state at failure (snapshots) for AI analysis.
  • Integrate results and diagnostic suggestions into Studio overlay or panel.</issue_description>

Comments on the Issue (you are @copilot in this section)


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI and others added 3 commits February 9, 2026 07:55
Co-authored-by: sfmskywalker <938393+sfmskywalker@users.noreply.github.com>
Co-authored-by: sfmskywalker <938393+sfmskywalker@users.noreply.github.com>
Copilot AI changed the title [WIP] Add diagnostics integration and troubleshooting for workflows Phase 4: Diagnostics & Failure Explanation Feb 9, 2026
Copilot AI requested a review from sfmskywalker February 9, 2026 08:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Phase 4: Diagnostics & Failure Explanation

2 participants