
Commit 765aed9

Add Azure Resource Health & Issue Diagnosis prompt
1 parent 1ca04dc commit 765aed9

2 files changed

+292
-1
lines changed


README.md

Lines changed: 2 additions & 1 deletion
@@ -52,6 +52,7 @@ Ready-to-use prompt templates for specific development scenarios and tasks, defi
 | ----- | ----------- | ------- |
 | [ASP.NET Minimal API with OpenAPI](prompts/aspnet-minimal-api-openapi.prompt.md) | Create ASP.NET Minimal API endpoints with proper OpenAPI documentation | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faspnet-minimal-api-openapi.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faspnet-minimal-api-openapi.prompt.md) |
 | [Azure Cost Optimize](prompts/az-cost-optimize.prompt.md) | Analyze Azure resources used in the app (IaC files and/or resources in a target rg) and optimize costs - creating GitHub issues for identified optimizations. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faz-cost-optimize.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Faz-cost-optimize.prompt.md) |
+| [Azure Resource Health & Issue Diagnosis](prompts/azure-resource-health-diagnose.prompt.md) | Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fazure-resource-health-diagnose.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fazure-resource-health-diagnose.prompt.md) |
 | [Comment Code Generate A Tutorial](prompts/comment-code-generate-a-tutorial.prompt.md) | Transform this Python script into a polished, beginner-friendly project by refactoring the code, adding clear instructional comments, and generating a complete markdown tutorial. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcomment-code-generate-a-tutorial.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcomment-code-generate-a-tutorial.prompt.md) |
 | [C# Async Programming Best Practices](prompts/csharp-async.prompt.md) | Get best practices for C# async programming | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-async.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-async.prompt.md) |
 | [C# Documentation Best Practices](prompts/csharp-docs.prompt.md) | Ensure that C# types are documented with XML comments and follow best practices for documentation. | [![Install in VS Code](https://img.shields.io/badge/VS_Code-Install-0098FF?style=flat-square&logo=visualstudiocode&logoColor=white)](https://vscode.dev/redirect?url=vscode%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-docs.prompt.md) [![Install in VS Code](https://img.shields.io/badge/VS_Code_Insiders-Install-24bfa5?style=flat-square&logo=visualstudiocode&logoColor=white)](https://insiders.vscode.dev/redirect?url=vscode-insiders%3Achat-prompt%2Finstall%3Furl%3Dhttps%3A%2F%2Fraw.githubusercontent.com%2Fgithub%2Fawesome-copilot%2Fmain%2Fprompts%2Fcsharp-docs.prompt.md) |
@@ -115,4 +116,4 @@ This project may contain trademarks or logos for projects, products, or services
 trademarks or logos is subject to and must follow
 [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
 Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
-Any use of third-party trademarks or logos are subject to those third-party's policies.
+Any use of third-party trademarks or logos are subject to those third-party's policies.
Lines changed: 290 additions & 0 deletions
@@ -0,0 +1,290 @@
---
mode: 'agent'
description: 'Analyze Azure resource health, diagnose issues from logs and telemetry, and create a remediation plan for identified problems.'
---

# Azure Resource Health & Issue Diagnosis

This workflow analyzes a specific Azure resource to assess its health, diagnose issues using logs and telemetry, and develop a remediation plan for any problems it finds.

## Prerequisites
- Azure MCP server configured and authenticated
- Target Azure resource identified (name and, optionally, resource group/subscription)
- Resource must be deployed and running to generate logs/telemetry; resource logs reach a Log Analytics workspace only when diagnostic settings (or Application Insights) are configured
- Prefer Azure MCP tools (`azmcp-*`) over direct Azure CLI when available
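
Before falling back to the Azure CLI in the steps below, it can help to confirm the CLI side of these prerequisites. A minimal sketch, assuming Bash and an installed Azure CLI; `check_azure_login` and `use_subscription` are hypothetical helper names, not part of the Azure MCP server:

```bash
#!/usr/bin/env bash
# Pre-flight check for the Azure CLI fallback path (sketch).
# Assumes the Azure CLI is installed; the MCP server is checked separately.

# Succeeds if the CLI has an active login and prints the current subscription.
check_azure_login() {
  local sub
  if ! sub=$(az account show --query "{name:name, id:id}" -o tsv 2>/dev/null); then
    echo "Not logged in. Run 'az login' (or 'az login --use-device-code')." >&2
    return 1
  fi
  echo "Using subscription: ${sub}"
}

# Switches the active subscription only when the user supplied one explicitly.
use_subscription() {
  local subscription_id="$1"
  [[ -z "$subscription_id" ]] && return 0
  az account set --subscription "$subscription_id"
}
```

Calling `check_azure_login || exit 1` at the top of a fallback script stops early with a clear message instead of failing on the first `az` query.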

## Workflow Steps

### Step 1: Get Azure Best Practices
**Action**: Retrieve diagnostic and troubleshooting best practices
**Tools**: Azure MCP best practices tool
**Process**:
1. **Load Best Practices**:
   - Run the Azure best practices tool to get diagnostic guidelines
   - Focus on health monitoring, log analysis, and issue resolution patterns
   - Use these practices to inform the diagnostic approach and remediation recommendations

### Step 2: Resource Discovery & Identification
**Action**: Locate and identify the target Azure resource
**Tools**: Azure MCP tools + Azure CLI fallback
**Process**:
1. **Resource Lookup**:
   - If only a resource name is provided, list the accessible subscriptions with `azmcp-subscription-list` and search each of them
   - Use `az resource list --name <resource-name> --subscription <subscription-id>` to find matching resources (without `--subscription`, the CLI searches only the current default subscription)
   - If multiple matches are found, ask the user to specify the subscription/resource group
   - Gather detailed resource information:
     - Resource type and current status
     - Location, tags, and configuration
     - Associated services and dependencies

2. **Resource Type Detection**:
   - Identify the resource type to determine the appropriate diagnostic approach:
     - **Web Apps/Function Apps**: Application logs, performance metrics, dependency tracking
     - **Virtual Machines**: System logs, performance counters, boot diagnostics
     - **Cosmos DB**: Request metrics, throttling, partition statistics
     - **Storage Accounts**: Access logs, performance metrics, availability
     - **SQL Database**: Query performance, connection logs, resource utilization
     - **Application Insights**: Application telemetry, exceptions, dependencies
     - **Key Vault**: Access logs, certificate status, secret usage
     - **Service Bus**: Message metrics, dead-letter queues, throughput
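
The lookup and type detection above can be sketched with the Azure CLI as a fallback. This assumes Bash and an authenticated CLI session; `find_resource` and `diagnostic_focus` are hypothetical helper names, and the type-to-focus mapping covers only the types listed above (Function Apps share the `Microsoft.Web/sites` type with Web Apps).

```bash
#!/usr/bin/env bash
# Locate a resource by name across all accessible subscriptions (sketch).
# Prints one "subscriptionId<TAB>resourceId<TAB>type" line per match.
find_resource() {
  local name="$1" sub
  for sub in $(az account list --query "[].id" -o tsv); do
    az resource list --name "$name" --subscription "$sub" \
      --query "[].[id, type]" -o tsv |
      while IFS=$'\t' read -r id type; do
        printf '%s\t%s\t%s\n' "$sub" "$id" "$type"
      done
  done
}

# Map a resource type to the diagnostic focus listed in Step 2.
# Types not listed fall back to "generic".
diagnostic_focus() {
  local type
  type=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')  # ARM types are case-insensitive
  case "$type" in
    microsoft.web/sites)                   echo "app-logs,performance-metrics,dependencies" ;;
    microsoft.compute/virtualmachines)     echo "system-logs,perf-counters,boot-diagnostics" ;;
    microsoft.documentdb/databaseaccounts) echo "request-metrics,throttling,partitions" ;;
    microsoft.storage/storageaccounts)     echo "access-logs,performance-metrics,availability" ;;
    microsoft.sql/servers/databases)       echo "query-performance,connections,utilization" ;;
    microsoft.insights/components)         echo "telemetry,exceptions,dependencies" ;;
    microsoft.keyvault/vaults)             echo "access-logs,certificates,secret-usage" ;;
    microsoft.servicebus/namespaces)       echo "message-metrics,dead-letter,throughput" ;;
    *)                                     echo "generic" ;;
  esac
}
```

More than one output line from `find_resource` is the "multiple matches" case, where the user should be asked to pick a subscription or resource group.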

### Step 3: Health Status Assessment
**Action**: Evaluate current resource health and availability
**Tools**: Azure MCP monitoring tools + Azure CLI
**Process**:
1. **Basic Health Check**:
   - Check resource provisioning state and operational status
   - Verify service availability and responsiveness
   - Review recent deployment or configuration changes
   - Assess current resource utilization (CPU, memory, storage, etc.)

2. **Service-Specific Health Indicators**:
   - **Web Apps**: HTTP response codes, response times, uptime
   - **Databases**: Connection success rate, query performance, deadlocks
   - **Storage**: Availability percentage, request success rate, latency
   - **VMs**: Boot diagnostics, guest OS metrics, network connectivity
   - **Functions**: Execution success rate, duration, error frequency
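
A CLI sketch of the basic health check, assuming Bash, a full ARM resource ID, and read access to the resource. The Resource Health call goes through `az rest` against the `Microsoft.ResourceHealth/availabilityStatuses/current` endpoint; the `2020-05-01` API version is an assumption (newer versions may exist), and not every resource type reports Resource Health. All function names are hypothetical.

```bash
#!/usr/bin/env bash
# Step 3 basic health check via the Azure CLI (sketch). $1 is a full ARM resource ID.

# Resource Health availability state: Available, Degraded, Unavailable or Unknown.
availability_state() {
  az rest --method get \
    --url "$1/providers/Microsoft.ResourceHealth/availabilityStatuses/current?api-version=2020-05-01" \
    --query "properties.availabilityState" -o tsv
}

# Provisioning state from the resource's own properties (e.g. Succeeded, Failed).
provisioning_state() {
  az resource show --ids "$1" --query "properties.provisioningState" -o tsv
}

# Failed control-plane operations (deployments, config changes) in the last 24 hours.
recent_failed_operations() {
  az monitor activity-log list --resource-id "$1" --offset 24h \
    --query "[?status.value=='Failed'].[eventTimestamp, operationName.value]" -o tsv
}

# Hourly average of one platform metric over the last 24 hours.
# Metric names differ by type; list them with `az monitor metrics list-definitions --resource <id>`.
metric_hourly_avg() {
  az monitor metrics list --resource "$1" --metric "$2" \
    --interval 1h --offset 24h --aggregation Average \
    --query "value[0].timeseries[0].data[].[timeStamp, average]" -o tsv
}
```

For a VM, for example, `metric_hourly_avg "$RESOURCE_ID" "Percentage CPU"` covers the utilization check; other types need their own metric names.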

### Step 4: Log & Telemetry Analysis
**Action**: Analyze logs and telemetry to identify issues and patterns
**Tools**: Azure MCP monitoring tools for Log Analytics queries
**Process**:
1. **Find Monitoring Sources**:
   - Use `azmcp-monitor-workspace-list` to identify Log Analytics workspaces
   - Locate Application Insights instances associated with the resource
   - Identify relevant log tables using `azmcp-monitor-table-list`

2. **Execute Diagnostic Queries**:
   Use `azmcp-monitor-log-query` with targeted KQL queries based on resource type:

   **General Error Analysis**:
   ```kql
   // Recent errors and failures (last 24h). isfuzzy=true skips tables missing from the
   // workspace; column_ifexists() guards columns that only some of these tables define.
   union isfuzzy=true
       AzureDiagnostics,
       AppServiceHTTPLogs,
       AppServiceAppLogs,
       AzureActivity
   | where TimeGenerated > ago(24h)
   | extend Level = tostring(column_ifexists("Level", "")),
       ResultType = tostring(column_ifexists("ResultType", "")),
       HttpStatus = toint(column_ifexists("ScStatus", 0))
   | where Level in~ ("Error", "Critical")
       or (isnotempty(ResultType) and ResultType !in~ ("Success", "Succeeded"))
       or HttpStatus >= 500
   | summarize ErrorCount = count() by _ResourceId, Type, ResultType, bin(TimeGenerated, 1h)
   | order by TimeGenerated desc
   ```

   **Performance Analysis**:
   ```kql
   // Performance degradation patterns: hours where average total CPU exceeded 80%
   // (Perf is populated only when an agent collects performance counters)
   Perf
   | where TimeGenerated > ago(7d)
   | where ObjectName == "Processor" and CounterName == "% Processor Time"
   | where InstanceName == "_Total"
   | summarize AvgCpu = avg(CounterValue) by Computer, bin(TimeGenerated, 1h)
   | where AvgCpu > 80
   | order by TimeGenerated desc
   ```

   **Application-Specific Queries**:
   ```kql
   // Application Insights - Failed requests (workspace-based AppRequests table;
   // classic resources use requests with timestamp/success/resultCode)
   AppRequests
   | where TimeGenerated > ago(24h)
   | where Success == false
   | summarize FailureCount = count() by ResultCode, bin(TimeGenerated, 1h)
   | order by TimeGenerated desc

   // Database - Failed logins recorded by Azure SQL auditing
   AzureDiagnostics
   | where TimeGenerated > ago(24h)
   | where ResourceProvider == "MICROSOFT.SQL"
   | where Category == "SQLSecurityAuditEvents"
   | where action_name_s == "DATABASE AUTHENTICATION FAILED"
   | summarize ConnectionFailures = count() by Resource, bin(TimeGenerated, 1h)
   ```

3. **Pattern Recognition**:
   - Identify recurring error patterns or anomalies
   - Correlate errors with deployment times or configuration changes
   - Analyze performance trends and degradation patterns
   - Look for dependency failures or external service issues
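
When `azmcp-monitor-log-query` is unavailable, the same queries can be run with `az monitor log-analytics query`. A minimal sketch, assuming Bash; `--workspace` takes the workspace's customer ID (a GUID), which `list_workspaces` prints in its second column. The change-event query is one illustrative way to line failures up against deployments or configuration changes using the `AzureActivity` table. All helper names are hypothetical.

```bash
#!/usr/bin/env bash
# CLI fallback for the Step 4 KQL queries (sketch).

# Lists Log Analytics workspaces in the current subscription: name, customer ID, resource group.
list_workspaces() {
  az monitor log-analytics workspace list \
    --query "[].[name, customerId, resourceGroup]" -o tsv
}

# Runs a KQL query over an ISO 8601 timespan (default: last day).
run_kql() {
  local workspace_guid="$1" query="$2" timespan="${3:-P1D}"
  az monitor log-analytics query --workspace "$workspace_guid" \
    --analytics-query "$query" --timespan "$timespan" -o table
}

# Builds a query for successful write/delete operations on one resource, i.e.
# candidate change events to compare with the hourly error buckets.
change_events_query() {
  printf '%s\n' \
    'AzureActivity' \
    '| where TimeGenerated > ago(24h)' \
    "| where _ResourceId =~ '$1'" \
    '| where OperationNameValue endswith "/WRITE" or OperationNameValue endswith "/DELETE"' \
    '| where ActivityStatusValue == "Success"' \
    '| project TimeGenerated, OperationNameValue, Caller'
}
```

For example, `run_kql "$WORKSPACE_GUID" "$(change_events_query "$RESOURCE_ID")"` lists recent changes; a failure spike starting shortly after one of them is a strong root-cause candidate.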

### Step 5: Issue Classification & Root Cause Analysis
**Action**: Categorize identified issues and determine root causes
**Process**:
1. **Issue Classification**:
   - **Critical**: Service unavailable, data loss, security breaches
   - **High**: Performance degradation, intermittent failures, high error rates
   - **Medium**: Warnings, suboptimal configuration, minor performance issues
   - **Low**: Informational alerts, optimization opportunities

2. **Root Cause Analysis**:
   - **Configuration Issues**: Incorrect settings, missing dependencies
   - **Resource Constraints**: CPU/memory/disk limitations, throttling
   - **Network Issues**: Connectivity problems, DNS resolution, firewall rules
   - **Application Issues**: Code bugs, memory leaks, inefficient queries
   - **External Dependencies**: Third-party service failures, API limits
   - **Security Issues**: Authentication failures, certificate expiration

3. **Impact Assessment**:
   - Determine business impact and affected users/systems
   - Evaluate data integrity and security implications
   - Assess recovery time objectives and priorities
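
To keep classification consistent across runs, the severity buckets above can be written as a small decision function. This is an illustrative sketch: the inputs and the 5% error-rate threshold are assumptions rather than Azure-defined values, and data-loss or security findings still need to be classified by judgment.

```bash
#!/usr/bin/env bash
# Severity buckets from Step 5 as a decision function (sketch; thresholds are assumptions).
#   $1 availability state (Available|Degraded|Unavailable|Unknown)
#   $2 error rate in whole percent over the analysis window
#   $3 number of warning-level findings
classify_severity() {
  local state="$1" error_pct="$2" warnings="$3"
  if [[ "$state" == "Unavailable" ]]; then
    echo "Critical"   # service unavailable
  elif [[ "$state" == "Degraded" ]] || (( error_pct >= 5 )); then
    echo "High"       # degradation or a high error rate
  elif (( error_pct > 0 || warnings > 0 )); then
    echo "Medium"     # warnings or minor issues
  else
    echo "Low"        # informational / optimization only
  fi
}
```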

### Step 6: Generate Remediation Plan
**Action**: Create a comprehensive plan to address identified issues
**Process**:
1. **Immediate Actions** (Critical issues):
   - Emergency fixes to restore service availability
   - Temporary workarounds to mitigate impact
   - Escalation procedures for complex issues

2. **Short-term Fixes** (High/Medium issues):
   - Configuration adjustments and resource scaling
   - Application updates and patches
   - Monitoring and alerting improvements

3. **Long-term Improvements** (All issues):
   - Architectural changes for better resilience
   - Preventive measures and monitoring enhancements
   - Documentation and process improvements

4. **Implementation Steps**:
   - Prioritized action items with specific Azure CLI commands
   - Testing and validation procedures
   - Rollback plans for each change
   - Monitoring to verify issue resolution
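
Each implementation step pairs a change with validation and a rollback. A sketch of that pattern, using an App Service plan scale-up as the example change; it assumes Bash, and the plan, resource group, target SKU, and validation command are placeholders supplied by the caller (`scale_plan_with_rollback` is a hypothetical name).

```bash
#!/usr/bin/env bash
# Change + validate + rollback for one remediation item (sketch).
# $4 is any command or function that exits 0 when the resource is healthy again.
scale_plan_with_rollback() {
  local plan="$1" rg="$2" target_sku="$3" validate_fn="$4" previous_sku
  # Record the current SKU first so the change can be reverted.
  previous_sku=$(az appservice plan show --name "$plan" --resource-group "$rg" \
    --query "sku.name" -o tsv) || return 1
  echo "Scaling $plan: $previous_sku -> $target_sku"
  az appservice plan update --name "$plan" --resource-group "$rg" --sku "$target_sku" -o none || return 1
  if "$validate_fn"; then
    echo "Validation passed; keeping $target_sku"
  else
    echo "Validation failed; rolling back to $previous_sku" >&2
    az appservice plan update --name "$plan" --resource-group "$rg" --sku "$previous_sku" -o none
    return 1
  fi
}
```

Usage might look like `scale_plan_with_rollback my-plan my-rg P1V3 check_error_rate`, where `check_error_rate` is a hypothetical function that re-runs an error query and exits non-zero if failures persist.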

### Step 7: User Confirmation & Report Generation
**Action**: Present findings and get approval for remediation actions
**Process**:
1. **Display Health Assessment Summary**:
   ```
   🏥 Azure Resource Health Assessment

   📊 Resource Overview:
   • Resource: [Name] ([Type])
   • Status: [Healthy/Warning/Critical]
   • Location: [Region]
   • Last Analyzed: [Timestamp]

   🚨 Issues Identified:
   • Critical: X issues requiring immediate attention
   • High: Y issues affecting performance/reliability
   • Medium: Z issues for optimization
   • Low: N informational items

   🔍 Top Issues:
   1. [Issue Type]: [Description] - Impact: [High/Medium/Low]
   2. [Issue Type]: [Description] - Impact: [High/Medium/Low]
   3. [Issue Type]: [Description] - Impact: [High/Medium/Low]

   🛠️ Remediation Plan:
   • Immediate Actions: X items
   • Short-term Fixes: Y items
   • Long-term Improvements: Z items
   • Estimated Resolution Time: [Timeline]

   ❓ Proceed with detailed remediation plan? (y/n)
   ```

2. **Generate Detailed Report**:
   ````markdown
   # Azure Resource Health Report: [Resource Name]

   **Generated**: [Timestamp]
   **Resource**: [Full Resource ID]
   **Overall Health**: [Status with color indicator]

   ## 🔍 Executive Summary
   [Brief overview of health status and key findings]

   ## 📊 Health Metrics
   - **Availability**: X% over last 24h
   - **Performance**: [Average response time/throughput]
   - **Error Rate**: X% over last 24h
   - **Resource Utilization**: [CPU/Memory/Storage percentages]

   ## 🚨 Issues Identified

   ### Critical Issues
   - **[Issue 1]**: [Description]
     - **Root Cause**: [Analysis]
     - **Impact**: [Business impact]
     - **Immediate Action**: [Required steps]

   ### High Priority Issues
   - **[Issue 2]**: [Description]
     - **Root Cause**: [Analysis]
     - **Impact**: [Performance/reliability impact]
     - **Recommended Fix**: [Solution steps]

   ## 🛠️ Remediation Plan

   ### Phase 1: Immediate Actions (0-2 hours)
   ```bash
   # Critical fixes to restore service
   [Azure CLI commands with explanations]
   ```

   ### Phase 2: Short-term Fixes (2-24 hours)
   ```bash
   # Performance and reliability improvements
   [Azure CLI commands with explanations]
   ```

   ### Phase 3: Long-term Improvements (1-4 weeks)
   ```bash
   # Architectural and preventive measures
   [Azure CLI commands and configuration changes]
   ```

   ## 📈 Monitoring Recommendations
   - **Alerts to Configure**: [List of recommended alerts]
   - **Dashboards to Create**: [Monitoring dashboard suggestions]
   - **Regular Health Checks**: [Recommended frequency and scope]

   ## ✅ Validation Steps
   - [ ] Verify issue resolution through logs
   - [ ] Confirm performance improvements
   - [ ] Test application functionality
   - [ ] Update monitoring and alerting
   - [ ] Document lessons learned

   ## 📝 Prevention Measures
   - [Recommendations to prevent similar issues]
   - [Process improvements]
   - [Monitoring enhancements]
   ````

## Error Handling
- **Resource Not Found**: Provide guidance on specifying the resource name, resource group, and subscription
- **Authentication Issues**: Guide the user through Azure authentication setup
- **Insufficient Permissions**: List the RBAC roles required to read the resource, its metrics, and its logs
- **No Logs Available**: Suggest enabling diagnostic settings and waiting for data to arrive
- **Query Timeouts**: Break the analysis into smaller time windows
- **Service-Specific Issues**: Provide a generic health assessment and note its limitations
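
Several of these cases map to specific CLI checks. A sketch, assuming Bash, a user (not service principal) login for `az ad signed-in-user show`, and placeholder resource and workspace resource IDs; the diagnostic setting name `health-diagnostics` is hypothetical, and the `allLogs` category group is not supported by every resource type.

```bash
#!/usr/bin/env bash
# CLI checks behind the error-handling cases (sketch).

# Insufficient Permissions: show the signed-in user's role assignments that apply to $1.
show_my_roles() {
  local me
  me=$(az ad signed-in-user show --query id -o tsv) || return 1
  az role assignment list --assignee "$me" --scope "$1" --include-inherited -o table
}

# No Logs Available: check whether the resource $1 has any diagnostic settings...
list_diagnostic_settings() {
  az monitor diagnostic-settings list --resource "$1" -o table
}

# ...and, if it has none, route all logs and metrics from $1 to workspace $2.
enable_diagnostics() {
  az monitor diagnostic-settings create --name "health-diagnostics" \
    --resource "$1" --workspace "$2" \
    --logs '[{"categoryGroup":"allLogs","enabled":true}]' \
    --metrics '[{"category":"AllMetrics","enabled":true}]'
}
```

Newly enabled diagnostic settings only capture data from that point on, so the analysis has to wait for fresh logs before it can proceed.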

## Success Criteria
- ✅ Resource health status accurately assessed
- ✅ All significant issues identified and categorized
- ✅ Root cause analysis completed for major problems
- ✅ Actionable remediation plan with specific steps provided
- ✅ Monitoring and prevention recommendations included
- ✅ Clear prioritization of issues by business impact
- ✅ Implementation steps include validation and rollback procedures
