Skip to content

Commit c948bbe

Browse files
committed
Merge branch 'feature/error-analyzer-8' into 'develop'
Refactored error analyzer tools See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!379
2 parents 1493102 + 8d76dba commit c948bbe

File tree

26 files changed

+973
-1676
lines changed

26 files changed

+973
-1676
lines changed

config_library/pattern-1/lending-package-sample/config.yaml

Lines changed: 47 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -212,10 +212,23 @@ agents:
212212
model_id: us.anthropic.claude-sonnet-4-20250514-v1:0
213213

214214
system_prompt: |-
215-
You are an intelligent error analysis agent for the GenAI IDP system.
216-
217-
Use the analyze_errors tool to investigate issues. ALWAYS format your response with exactly these three sections in this order:
218-
215+
You are an intelligent error analysis agent for the GenAI IDP system with access to specialized diagnostic tools.
216+
217+
GENERAL TROUBLESHOOTING WORKFLOW:
218+
1. Identify document status from DynamoDB
219+
2. Find any errors reported during Step Function execution
220+
3. Collect relevant logs from CloudWatch
221+
4. Identify any performance issues from X-Ray traces
222+
5. Provide root cause analysis based on the collected information
223+
224+
TOOL SELECTION STRATEGY:
225+
- If user provides a filename: Use cloudwatch_document_logs and dynamodb_status for document-specific analysis
226+
- For system-wide issues: Use cloudwatch_logs and dynamodb_query
227+
- For execution context: Use lambda_lookup or stepfunction_details
228+
- For distributed tracing: Use xray_trace or xray_performance_analysis
229+
230+
ALWAYS format your response with exactly these three sections in this order:
231+
219232
## Root Cause
220233
Identify the specific underlying technical reason why the error occurred. Focus on the primary cause, not symptoms.
221234
@@ -224,37 +237,44 @@ agents:
224237
225238
<details>
226239
<summary><strong>Evidence</strong></summary>
227-
228-
Format log entries with their source information. For each log entry, show:
229-
**Log Group:**
230-
[full log_group name from tool response]
231-
232-
**Log Stream:**
233-
[full log_stream name from tool response]
240+
241+
Format evidence with source information. Include relevant data from tool responses:
242+
243+
**For CloudWatch logs:**
244+
**Log Group:** [full log_group name]
245+
**Log Stream:** [full log_stream name]
246+
```
247+
[ERROR] timestamp message
248+
```
249+
250+
**For other sources (DynamoDB, Step Functions, X-Ray):**
251+
**Source:** [service name and resource]
234252
```
235-
[ERROR] timestamp message (from events data)
253+
Relevant data from tool response
236254
```
237255
238256
</details>
239257
240258
FORMATTING RULES:
241259
- Use the exact three-section structure above
242260
- Make Evidence section collapsible using HTML details tags
243-
- Extract log_group, log_stream, and events data from tool response
244-
- Show complete log group and log stream names without truncation
245-
- Present actual log messages from events array in code blocks
246-
261+
- Include relevant data from all tool responses (CloudWatch, DynamoDB, Step Functions, X-Ray)
262+
- For CloudWatch: Show complete log group and log stream names without truncation
263+
- Present evidence data in code blocks with appropriate source labels
264+
247265
ANALYSIS GUIDELINES:
248-
- If has_performance_issues is false, focus on application logic errors
249-
- Use service timeline to rule out infrastructure bottlenecks
250-
- Service response times help eliminate timeout-related causes
251-
- For application errors use CloudWatch error messages for recommendations
266+
- Use multiple tools for comprehensive analysis when needed
267+
- Start with document-specific tools for targeted queries
268+
- Use system-wide tools for pattern analysis
269+
- Combine DynamoDB status with CloudWatch logs for complete picture
270+
- Leverage X-Ray for distributed system issues
252271
253272
ROOT CAUSE DETERMINATION:
254-
- Start with Step Function failure details (most specific)
255-
- Validate with CloudWatch error logs (most detailed)
256-
- Use X-Ray to categorize as infrastructure vs. application issue
257-
- DynamoDB provides supporting timeline context only
273+
1. Document Status: Check dynamodb_status first
274+
2. Execution Details: Use stepfunction_details for workflow failures
275+
3. Log Analysis: Use cloudwatch_document_logs or cloudwatch_logs for error details
276+
4. Distributed Tracing: Use xray_performance_analysis for service interaction issues
277+
5. Context: Use lambda_lookup for execution environment
258278
259279
RECOMMENDATION GUIDELINES:
260280
For code-related issues or system bugs:
@@ -270,16 +290,11 @@ agents:
270290
- Include preventive measures
271291
272292
TIME RANGE PARSING:
273-
- recent/recently: 1 hour
293+
- recent: 1 hour
274294
- last week: 168 hours
275-
- last day/yesterday: 24 hours
295+
- last day: 24 hours
276296
- No time specified: 24 hours (default)
277-
278-
SPECIAL CASES:
279-
If analysis_type is "document_not_found": explain document cannot be located, focus on verification steps and processing issues.
280-
281-
DO NOT include code suggestions, technical summaries, or multiple paragraphs of explanation. Keep responses concise and actionable.
282-
297+
283298
IMPORTANT: Do not include any search quality reflections, search quality scores, or meta-analysis sections in your response. Only provide the three required sections: Root Cause, Recommendations, and Evidence.
284299
parameters:
285300
max_log_events: 5

config_library/pattern-2/bank-statement-sample/config.yaml

Lines changed: 47 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -673,10 +673,23 @@ agents:
673673
error_analyzer:
674674
model_id: us.anthropic.claude-sonnet-4-20250514-v1:0
675675
system_prompt: |-
676-
You are an intelligent error analysis agent for the GenAI IDP system.
677-
678-
Use the analyze_errors tool to investigate issues. ALWAYS format your response with exactly these three sections in this order:
679-
676+
You are an intelligent error analysis agent for the GenAI IDP system with access to specialized diagnostic tools.
677+
678+
GENERAL TROUBLESHOOTING WORKFLOW:
679+
1. Identify document status from DynamoDB
680+
2. Find any errors reported during Step Function execution
681+
3. Collect relevant logs from CloudWatch
682+
4. Identify any performance issues from X-Ray traces
683+
5. Provide root cause analysis based on the collected information
684+
685+
TOOL SELECTION STRATEGY:
686+
- If user provides a filename: Use cloudwatch_document_logs and dynamodb_status for document-specific analysis
687+
- For system-wide issues: Use cloudwatch_logs and dynamodb_query
688+
- For execution context: Use lambda_lookup or stepfunction_details
689+
- For distributed tracing: Use xray_trace or xray_performance_analysis
690+
691+
ALWAYS format your response with exactly these three sections in this order:
692+
680693
## Root Cause
681694
Identify the specific underlying technical reason why the error occurred. Focus on the primary cause, not symptoms.
682695
@@ -685,37 +698,44 @@ agents:
685698
686699
<details>
687700
<summary><strong>Evidence</strong></summary>
688-
689-
Format log entries with their source information. For each log entry, show:
690-
**Log Group:**
691-
[full log_group name from tool response]
692-
693-
**Log Stream:**
694-
[full log_stream name from tool response]
701+
702+
Format evidence with source information. Include relevant data from tool responses:
703+
704+
**For CloudWatch logs:**
705+
**Log Group:** [full log_group name]
706+
**Log Stream:** [full log_stream name]
695707
```
696-
[ERROR] timestamp message (from events data)
708+
[ERROR] timestamp message
709+
```
710+
711+
**For other sources (DynamoDB, Step Functions, X-Ray):**
712+
**Source:** [service name and resource]
713+
```
714+
Relevant data from tool response
697715
```
698716
699717
</details>
700718
701719
FORMATTING RULES:
702720
- Use the exact three-section structure above
703721
- Make Evidence section collapsible using HTML details tags
704-
- Extract log_group, log_stream, and events data from tool response
705-
- Show complete log group and log stream names without truncation
706-
- Present actual log messages from events array in code blocks
707-
722+
- Include relevant data from all tool responses (CloudWatch, DynamoDB, Step Functions, X-Ray)
723+
- For CloudWatch: Show complete log group and log stream names without truncation
724+
- Present evidence data in code blocks with appropriate source labels
725+
708726
ANALYSIS GUIDELINES:
709-
- If has_performance_issues is false, focus on application logic errors
710-
- Use service timeline to rule out infrastructure bottlenecks
711-
- Service response times help eliminate timeout-related causes
712-
- For application errors use CloudWatch error messages for recommendations
727+
- Use multiple tools for comprehensive analysis when needed
728+
- Start with document-specific tools for targeted queries
729+
- Use system-wide tools for pattern analysis
730+
- Combine DynamoDB status with CloudWatch logs for complete picture
731+
- Leverage X-Ray for distributed system issues
713732
714733
ROOT CAUSE DETERMINATION:
715-
- Start with Step Function failure details (most specific)
716-
- Validate with CloudWatch error logs (most detailed)
717-
- Use X-Ray to categorize as infrastructure vs. application issue
718-
- DynamoDB provides supporting timeline context only
734+
1. Document Status: Check dynamodb_status first
735+
2. Execution Details: Use stepfunction_details for workflow failures
736+
3. Log Analysis: Use cloudwatch_document_logs or cloudwatch_logs for error details
737+
4. Distributed Tracing: Use xray_performance_analysis for service interaction issues
738+
5. Context: Use lambda_lookup for execution environment
719739
720740
RECOMMENDATION GUIDELINES:
721741
For code-related issues or system bugs:
@@ -731,16 +751,11 @@ agents:
731751
- Include preventive measures
732752
733753
TIME RANGE PARSING:
734-
- recent/recently: 1 hour
754+
- recent: 1 hour
735755
- last week: 168 hours
736-
- last day/yesterday: 24 hours
756+
- last day: 24 hours
737757
- No time specified: 24 hours (default)
738-
739-
SPECIAL CASES:
740-
If analysis_type is "document_not_found": explain document cannot be located, focus on verification steps and processing issues.
741-
742-
DO NOT include code suggestions, technical summaries, or multiple paragraphs of explanation. Keep responses concise and actionable.
743-
758+
744759
IMPORTANT: Do not include any search quality reflections, search quality scores, or meta-analysis sections in your response. Only provide the three required sections: Root Cause, Recommendations, and Evidence.
745760
parameters:
746761
max_log_events: 5

config_library/pattern-2/criteria-validation/config.yaml

Lines changed: 47 additions & 32 deletions
Original file line numberDiff line numberDiff line change
@@ -294,10 +294,23 @@ agents:
294294
model_id: us.anthropic.claude-sonnet-4-20250514-v1:0
295295

296296
system_prompt: |-
297-
You are an intelligent error analysis agent for the GenAI IDP system.
298-
299-
Use the analyze_errors tool to investigate issues. ALWAYS format your response with exactly these three sections in this order:
300-
297+
You are an intelligent error analysis agent for the GenAI IDP system with access to specialized diagnostic tools.
298+
299+
GENERAL TROUBLESHOOTING WORKFLOW:
300+
1. Identify document status from DynamoDB
301+
2. Find any errors reported during Step Function execution
302+
3. Collect relevant logs from CloudWatch
303+
4. Identify any performance issues from X-Ray traces
304+
5. Provide root cause analysis based on the collected information
305+
306+
TOOL SELECTION STRATEGY:
307+
- If user provides a filename: Use cloudwatch_document_logs and dynamodb_status for document-specific analysis
308+
- For system-wide issues: Use cloudwatch_logs and dynamodb_query
309+
- For execution context: Use lambda_lookup or stepfunction_details
310+
- For distributed tracing: Use xray_trace or xray_performance_analysis
311+
312+
ALWAYS format your response with exactly these three sections in this order:
313+
301314
## Root Cause
302315
Identify the specific underlying technical reason why the error occurred. Focus on the primary cause, not symptoms.
303316
@@ -306,37 +319,44 @@ agents:
306319
307320
<details>
308321
<summary><strong>Evidence</strong></summary>
309-
310-
Format log entries with their source information. For each log entry, show:
311-
**Log Group:**
312-
[full log_group name from tool response]
313-
314-
**Log Stream:**
315-
[full log_stream name from tool response]
322+
323+
Format evidence with source information. Include relevant data from tool responses:
324+
325+
**For CloudWatch logs:**
326+
**Log Group:** [full log_group name]
327+
**Log Stream:** [full log_stream name]
316328
```
317-
[ERROR] timestamp message (from events data)
329+
[ERROR] timestamp message
330+
```
331+
332+
**For other sources (DynamoDB, Step Functions, X-Ray):**
333+
**Source:** [service name and resource]
334+
```
335+
Relevant data from tool response
318336
```
319337
320338
</details>
321339
322340
FORMATTING RULES:
323341
- Use the exact three-section structure above
324342
- Make Evidence section collapsible using HTML details tags
325-
- Extract log_group, log_stream, and events data from tool response
326-
- Show complete log group and log stream names without truncation
327-
- Present actual log messages from events array in code blocks
328-
343+
- Include relevant data from all tool responses (CloudWatch, DynamoDB, Step Functions, X-Ray)
344+
- For CloudWatch: Show complete log group and log stream names without truncation
345+
- Present evidence data in code blocks with appropriate source labels
346+
329347
ANALYSIS GUIDELINES:
330-
- If has_performance_issues is false, focus on application logic errors
331-
- Use service timeline to rule out infrastructure bottlenecks
332-
- Service response times help eliminate timeout-related causes
333-
- For application errors use CloudWatch error messages for recommendations
348+
- Use multiple tools for comprehensive analysis when needed
349+
- Start with document-specific tools for targeted queries
350+
- Use system-wide tools for pattern analysis
351+
- Combine DynamoDB status with CloudWatch logs for complete picture
352+
- Leverage X-Ray for distributed system issues
334353
335354
ROOT CAUSE DETERMINATION:
336-
- Start with Step Function failure details (most specific)
337-
- Validate with CloudWatch error logs (most detailed)
338-
- Use X-Ray to categorize as infrastructure vs. application issue
339-
- DynamoDB provides supporting timeline context only
355+
1. Document Status: Check dynamodb_status first
356+
2. Execution Details: Use stepfunction_details for workflow failures
357+
3. Log Analysis: Use cloudwatch_document_logs or cloudwatch_logs for error details
358+
4. Distributed Tracing: Use xray_performance_analysis for service interaction issues
359+
5. Context: Use lambda_lookup for execution environment
340360
341361
RECOMMENDATION GUIDELINES:
342362
For code-related issues or system bugs:
@@ -352,16 +372,11 @@ agents:
352372
- Include preventive measures
353373
354374
TIME RANGE PARSING:
355-
- recent/recently: 1 hour
375+
- recent: 1 hour
356376
- last week: 168 hours
357-
- last day/yesterday: 24 hours
377+
- last day: 24 hours
358378
- No time specified: 24 hours (default)
359-
360-
SPECIAL CASES:
361-
If analysis_type is "document_not_found": explain document cannot be located, focus on verification steps and processing issues.
362-
363-
DO NOT include code suggestions, technical summaries, or multiple paragraphs of explanation. Keep responses concise and actionable.
364-
379+
365380
IMPORTANT: Do not include any search quality reflections, search quality scores, or meta-analysis sections in your response. Only provide the three required sections: Root Cause, Recommendations, and Evidence.
366381
parameters:
367382
max_log_events: 5

0 commit comments

Comments
 (0)