Add Azure Resource Health & Issue Diagnosis prompt

ewega · ewega · commit 765aed9d825e · 2025-07-07T09:17:49.000Z
diff --git a/README.md b/README.md
@@ -52,6 +52,7 @@ Ready-to-use prompt templates for specific development scenarios and tasks, defi
 | ----- | ----------- | ------- |
 | [ASP.NET Minimal API with OpenAPI](prompts/aspnet-minimal-api-openapi.prompt.md) | Create ASP.NET Minimal API endpoints with proper OpenAPI documentation | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faspnet-minimal-api-openapi.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faspnet-minimal-api-openapi.prompt.md) |
 | [Azure Cost Optimize](prompts/az-cost-optimize.prompt.md) | Analyze Azure resources used in the app (IaC files and/or resources in a target rg) and optimize costs - creating GitHub issues for identified optimizations. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faz-cost-optimize.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faz-cost-optimize.prompt.md) |
+| [Azure Resource Health & Issue Diagnosis](prompts/azure-resource-health-diagnose.prompt.md) | Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fazure-resource-health-diagnose.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fazure-resource-health-diagnose.prompt.md) |
 | [Comment Code Generate A Tutorial](prompts/comment-code-generate-a-tutorial.prompt.md) | Transform this Python script into a polished, beginner-friendly project by refactoring the code, adding clear instructional comments, and generating a complete markdown tutorial. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcomment-code-generate-a-tutorial.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcomment-code-generate-a-tutorial.prompt.md) |
 | [C# Async Programming Best Practices](prompts/csharp-async.prompt.md) | Get best practices for C# async programming | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-async.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-async.prompt.md) |
 | [C# Documentation Best Practices](prompts/csharp-docs.prompt.md) | Ensure that C# types are documented with XML comments and follow best practices for documentation. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-docs.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-docs.prompt.md) |
@@ -115,4 +116,4 @@ This project may contain trademarks or logos for projects, products, or services
 trademarks or logos is subject to and must follow
 [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
 Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos are subject to those third-party's policies.
+Any use of third-party trademarks or logos are subject to those third-party's policies.
diff --git a/prompts/azure-resource-health-diagnose.prompt.md b/prompts/azure-resource-health-diagnose.prompt.md
@@ -0,0 +1,290 @@
+---
+mode: 'agent'
+description: 'Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems.'
+---
+
+# Azure Resource Health & Issue Diagnosis
+
+This workflow analyzes a specific Azure resource to assess its health status, diagnose potential issues using logs and telemetry data, and develop a comprehensive remediation plan for any problems discovered.
+
+## Prerequisites
+- Azure MCP server configured and authenticated
+- Target Azure resource identified (name and optionally resource group/subscription)
+- Resource must be deployed and running to generate logs/telemetry
+- Prefer Azure MCP tools (`azmcp-*`) over direct Azure CLI when available
+
+## Workflow Steps
+
+### Step 1: Get Azure Best Practices
+**Action**: Retrieve diagnostic and troubleshooting best practices
+**Tools**: Azure MCP best practices tool
+**Process**:
+1. **Load Best Practices**:
+   - Execute Azure best practices tool to get diagnostic guidelines
+   - Focus on health monitoring, log analysis, and issue resolution patterns
+   - Use these practices to inform diagnostic approach and remediation recommendations
+
+### Step 2: Resource Discovery & Identification
+**Action**: Locate and identify the target Azure resource
+**Tools**: Azure MCP tools + Azure CLI fallback
+**Process**:
+1. **Resource Lookup**:
+   - If only resource name provided: Search across subscriptions using `azmcp-subscription-list`
+   - Use `az resource list --name <resource-name>` to find matching resources
+   - If multiple matches found, prompt user to specify subscription/resource group
+   - Gather detailed resource information:
+     - Resource type and current status
+     - Location, tags, and configuration
+     - Associated services and dependencies
+
+2. **Resource Type Detection**:
+   - Identify resource type to determine appropriate diagnostic approach:
+     - **Web Apps/Function Apps**: Application logs, performance metrics, dependency tracking
+     - **Virtual Machines**: System logs, performance counters, boot diagnostics
+     - **Cosmos DB**: Request metrics, throttling, partition statistics
+     - **Storage Accounts**: Access logs, performance metrics, availability
+     - **SQL Database**: Query performance, connection logs, resource utilization
+     - **Application Insights**: Application telemetry, exceptions, dependencies
+     - **Key Vault**: Access logs, certificate status, secret usage
+     - **Service Bus**: Message metrics, dead letter queues, throughput
+
+### Step 3: Health Status Assessment
+**Action**: Evaluate current resource health and availability
+**Tools**: Azure MCP monitoring tools + Azure CLI
+**Process**:
+1. **Basic Health Check**:
+   - Check resource provisioning state and operational status
+   - Verify service availability and responsiveness
+   - Review recent deployment or configuration changes
+   - Assess current resource utilization (CPU, memory, storage, etc.)
+
+2. **Service-Specific Health Indicators**:
+   - **Web Apps**: HTTP response codes, response times, uptime
+   - **Databases**: Connection success rate, query performance, deadlocks
+   - **Storage**: Availability percentage, request success rate, latency
+   - **VMs**: Boot diagnostics, guest OS metrics, network connectivity
+   - **Functions**: Execution success rate, duration, error frequency
+
+### Step 4: Log & Telemetry Analysis
+**Action**: Analyze logs and telemetry to identify issues and patterns
+**Tools**: Azure MCP monitoring tools for Log Analytics queries
+**Process**:
+1. **Find Monitoring Sources**:
+   - Use `azmcp-monitor-workspace-list` to identify Log Analytics workspaces
+   - Locate Application Insights instances associated with the resource
+   - Identify relevant log tables using `azmcp-monitor-table-list`
+
+2. **Execute Diagnostic Queries**:
+   Use `azmcp-monitor-log-query` with targeted KQL queries based on resource type:
+
+   **General Error Analysis**:
+   ```kql
+   // Recent errors and exceptions
+   union isfuzzy=true 
+       AzureDiagnostics,
+       AppServiceHTTPLogs,
+       AppServiceAppLogs,
+       AzureActivity
+   | where TimeGenerated > ago(24h)
+   | where Level == "Error" or ResultType != "Success"
+   | summarize ErrorCount=count() by Resource, ResultType, bin(TimeGenerated, 1h)
+   | order by TimeGenerated desc
+   ```
+
+   **Performance Analysis**:
+   ```kql
+   // Performance degradation patterns
+   Perf
+   | where TimeGenerated > ago(7d)
+   | where ObjectName == "Processor" and CounterName == "% Processor Time"
+   | summarize avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
+   | where avg_CounterValue > 80
+   ```
+
+   **Application-Specific Queries**:
+   ```kql
+   // Application Insights - Failed requests
+   requests
+   | where timestamp > ago(24h)
+   | where success == false
+   | summarize FailureCount=count() by resultCode, bin(timestamp, 1h)
+   | order by timestamp desc
+   
+   // Database - Connection failures
+   AzureDiagnostics
+   | where ResourceProvider == "MICROSOFT.SQL"
+   | where Category == "SQLSecurityAuditEvents"
+   | where action_name_s == "CONNECTION_FAILED"
+   | summarize ConnectionFailures=count() by bin(TimeGenerated, 1h)
+   ```
+
+3. **Pattern Recognition**:
+   - Identify recurring error patterns or anomalies
+   - Correlate errors with deployment times or configuration changes
+   - Analyze performance trends and degradation patterns
+   - Look for dependency failures or external service issues
+
+### Step 5: Issue Classification & Root Cause Analysis
+**Action**: Categorize identified issues and determine root causes
+**Process**:
+1. **Issue Classification**:
+   - **Critical**: Service unavailable, data loss, security breaches
+   - **High**: Performance degradation, intermittent failures, high error rates
+   - **Medium**: Warnings, suboptimal configuration, minor performance issues
+   - **Low**: Informational alerts, optimization opportunities
+
+2. **Root Cause Analysis**:
+   - **Configuration Issues**: Incorrect settings, missing dependencies
+   - **Resource Constraints**: CPU/memory/disk limitations, throttling
+   - **Network Issues**: Connectivity problems, DNS resolution, firewall rules
+   - **Application Issues**: Code bugs, memory leaks, inefficient queries
+   - **External Dependencies**: Third-party service failures, API limits
+   - **Security Issues**: Authentication failures, certificate expiration
+
+3. **Impact Assessment**:
+   - Determine business impact and affected users/systems
+   - Evaluate data integrity and security implications
+   - Assess recovery time objectives and priorities
+
+### Step 6: Generate Remediation Plan
+**Action**: Create a comprehensive plan to address identified issues
+**Process**:
+1. **Immediate Actions** (Critical issues):
+   - Emergency fixes to restore service availability
+   - Temporary workarounds to mitigate impact
+   - Escalation procedures for complex issues
+
+2. **Short-term Fixes** (High/Medium issues):
+   - Configuration adjustments and resource scaling
+   - Application updates and patches
+   - Monitoring and alerting improvements
+
+3. **Long-term Improvements** (All issues):
+   - Architectural changes for better resilience
+   - Preventive measures and monitoring enhancements
+   - Documentation and process improvements
+
+4. **Implementation Steps**:
+   - Prioritized action items with specific Azure CLI commands
+   - Testing and validation procedures
+   - Rollback plans for each change
+   - Monitoring to verify issue resolution
+
+### Step 7: User Confirmation & Report Generation
+**Action**: Present findings and get approval for remediation actions
+**Process**:
+1. **Display Health Assessment Summary**:
+   ```
+   🏥 Azure Resource Health Assessment
+   
+   📊 Resource Overview:
+   • Resource: [Name] ([Type])
+   • Status: [Healthy/Warning/Critical]
+   • Location: [Region]
+   • Last Analyzed: [Timestamp]
+   
+   🚨 Issues Identified:
+   • Critical: X issues requiring immediate attention
+   • High: Y issues affecting performance/reliability  
+   • Medium: Z issues for optimization
+   • Low: N informational items
+   
+   🔍 Top Issues:
+   1. [Issue Type]: [Description] - Impact: [High/Medium/Low]
+   2. [Issue Type]: [Description] - Impact: [High/Medium/Low]
+   3. [Issue Type]: [Description] - Impact: [High/Medium/Low]
+   
+   🛠️ Remediation Plan:
+   • Immediate Actions: X items
+   • Short-term Fixes: Y items  
+   • Long-term Improvements: Z items
+   • Estimated Resolution Time: [Timeline]
+   
+   ❓ Proceed with detailed remediation plan? (y/n)
+   ```
+
+2. **Generate Detailed Report**:
+   ```markdown
+   # Azure Resource Health Report: [Resource Name]
+   
+   **Generated**: [Timestamp]  
+   **Resource**: [Full Resource ID]  
+   **Overall Health**: [Status with color indicator]
+   
+   ## 🔍 Executive Summary
+   [Brief overview of health status and key findings]
+   
+   ## 📊 Health Metrics
+   - **Availability**: X% over last 24h
+   - **Performance**: [Average response time/throughput]
+   - **Error Rate**: X% over last 24h
+   - **Resource Utilization**: [CPU/Memory/Storage percentages]
+   
+   ## 🚨 Issues Identified
+   
+   ### Critical Issues
+   - **[Issue 1]**: [Description]
+     - **Root Cause**: [Analysis]
+     - **Impact**: [Business impact]
+     - **Immediate Action**: [Required steps]
+   
+   ### High Priority Issues  
+   - **[Issue 2]**: [Description]
+     - **Root Cause**: [Analysis]
+     - **Impact**: [Performance/reliability impact]
+     - **Recommended Fix**: [Solution steps]
+   
+   ## 🛠️ Remediation Plan
+   
+   ### Phase 1: Immediate Actions (0-2 hours)
+   ```bash
+   # Critical fixes to restore service
+   [Azure CLI commands with explanations]
+   ```
+   
+   ### Phase 2: Short-term Fixes (2-24 hours)
+   ```bash
+   # Performance and reliability improvements
+   [Azure CLI commands with explanations]
+   ```
+   
+   ### Phase 3: Long-term Improvements (1-4 weeks)
+   ```bash
+   # Architectural and preventive measures
+   [Azure CLI commands and configuration changes]
+   ```
+   
+   ## 📈 Monitoring Recommendations
+   - **Alerts to Configure**: [List of recommended alerts]
+   - **Dashboards to Create**: [Monitoring dashboard suggestions]
+   - **Regular Health Checks**: [Recommended frequency and scope]
+   
+   ## ✅ Validation Steps
+   - [ ] Verify issue resolution through logs
+   - [ ] Confirm performance improvements
+   - [ ] Test application functionality
+   - [ ] Update monitoring and alerting
+   - [ ] Document lessons learned
+   
+   ## 📝 Prevention Measures
+   - [Recommendations to prevent similar issues]
+   - [Process improvements]
+   - [Monitoring enhancements]
+   ```
+
+## Error Handling
+- **Resource Not Found**: Provide guidance on resource name/location specification
+- **Authentication Issues**: Guide user through Azure authentication setup
+- **Insufficient Permissions**: List required RBAC roles for resource access
+- **No Logs Available**: Suggest enabling diagnostic settings and waiting for data
+- **Query Timeouts**: Break down analysis into smaller time windows
+- **Service-Specific Issues**: Provide generic health assessment with limitations noted
+
+## Success Criteria
+- ✅ Resource health status accurately assessed
+- ✅ All significant issues identified and categorized
+- ✅ Root cause analysis completed for major problems
+- ✅ Actionable remediation plan with specific steps provided
+- ✅ Monitoring and prevention recommendations included
+- ✅ Clear prioritization of issues by business impact
+- ✅ Implementation steps include validation and rollback procedures