Commit 0e2ed3a

Merge pull request #65 from dgoodwin/analyze-e2e-intervals
Expand /prow-job:analyze-test-failure skill to process intervals
2 parents cef9201 + 39229b1 commit 0e2ed3a

1 file changed, +31 -1 lines changed

plugins/prow-job/skills/prow-job-analyze-test-failure/SKILL.md

Lines changed: 31 additions & 1 deletion
@@ -18,7 +18,9 @@ Identical with "Prow Job Analyze Resource" skill.
 ## Input Format

 The user will provide:
+
 1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
+
 - Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/6731/pull-ci-openshift-hypershift-main-e2e-aws/1962527613477982208`
 - URL may or may not have trailing slash

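The input rules in this hunk (the URL contains `test-platform-results/` and may or may not end with a slash) can be illustrated with a small Python sketch. The actual parsing steps live in the "Prow Job Analyze Resource" skill, which this diff does not show, so the function name `parse_prow_url` and the reading of `{bucket-path}` as everything after `test-platform-results/` and `{build_id}` as the final numeric path segment are assumptions drawn from the example URL, not the skill's confirmed logic.

```python
from urllib.parse import urlparse

# Marker named in the Input Format section of the skill.
MARKER = "test-platform-results/"


def parse_prow_url(url: str) -> tuple[str, str]:
    """Return (bucket_path, build_id) for a gcsweb Prow job URL.

    Hypothetical helper: assumes the bucket path is everything after the
    marker and the build ID is the last, all-digit path segment.
    """
    path = urlparse(url).path
    idx = path.find(MARKER)
    if idx == -1:
        raise ValueError(f"URL does not contain {MARKER!r}: {url}")

    # Strip slashes so a trailing slash on the URL makes no difference.
    bucket_path = path[idx + len(MARKER):].strip("/")
    if not bucket_path:
        raise ValueError(f"URL has no path after {MARKER!r}: {url}")

    build_id = bucket_path.rsplit("/", 1)[-1]
    if not build_id.isdigit():
        raise ValueError(f"Last path segment is not a numeric build ID: {build_id!r}")
    return bucket_path, build_id
```

Stripping slashes before splitting means the same result is returned whether or not the user pastes the URL with a trailing slash.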
@@ -37,6 +39,7 @@ Use the "Parse and Validate URL" steps from "Prow Job Analyze Resource" skill
 ### Step 2: Create Working Directory

 1. **Check for existing artifacts first**
+
 - Check if `.work/prow-job-analyze-test-failure/{build_id}/logs/` directory exists and has content
 - If it exists with content:
 - Use AskUserQuestion tool to ask:
@@ -70,16 +73,41 @@ Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resour
 ### Step 4: Analyze Test Failure

 1. **Download build-log.txt**
+
 ```bash
 gcloud storage cp gs://test-platform-results/{bucket-path}/build-log.txt .work/prow-job-analyze-test-failure/{build_id}/logs/build-log.txt --no-user-output-enabled
 ```

 2. **Parse and validate**
+
 - Read `.work/prow-job-analyze-resource/{build_id}/logs/build-log.txt`
 - Search for the Test name
 - Gather stack trace related to the test

-3. **Determine root cause**
+3. **Examine intervals files for cluster activity during E2E failures**
+
+- Search recursively for E2E timeline artifacts (known as "interval files") within the bucket-path:
+```bash
+gcloud storage ls 'gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*json'
+```
+- The files can be nested at unpredictable levels below the bucket-path
+- There could be as many as two matching files
+- Download all matching interval files (use the full paths from the search results):
+```bash
+gcloud storage cp gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*.json .work/prow-job-analyze-test-failure/{build_id}/logs/ --no-user-output-enabled
+```
+- If the wildcard copy doesn't work, copy each file individually using the full paths from the search results
+- **Scan interval files for test failure timing:**
+- Look for intervals where `source = "E2ETest"` and `message.annotations.status = "Failed"`
+- Note the `from` and `to` timestamps on this interval - this indicates when the test was running
+- **Scan interval files for related cluster events:**
+- Look for intervals that overlap the timeframe when the failed test was running
+- Filter for intervals with:
+- `level = "Error"` or `level = "Warning"`
+- `source = "OperatorState"`
+- These events may indicate cluster issues that caused or contributed to the test failure
+
+4. **Determine root cause**
 - Determine a possible root cause for the test failure
 - Analyze stack traces
 - Analyze related code in the code repository
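The interval scan added in step 3 can be sketched in Python roughly as follows. The field names (`source`, `level`, `from`, `to`, `message.annotations.status`) come from the skill text. Everything else is an assumption made for illustration: that intervals sit under a top-level `items` key, that timestamps are RFC 3339 strings with at most microsecond precision, and that the two filter bullets combine with OR. The helper names are hypothetical and not part of the skill.

```python
import json
from datetime import datetime
from pathlib import Path


def parse_time(value: str) -> datetime:
    # Assumption: RFC 3339 timestamps; "Z" is rewritten so that
    # datetime.fromisoformat also accepts it on Python < 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def load_intervals(path: str) -> list:
    data = json.loads(Path(path).read_text())
    # Assumption: intervals live under "items"; a bare list is accepted too.
    return data.get("items", []) if isinstance(data, dict) else data


def failed_tests(intervals: list) -> list:
    """Intervals with source "E2ETest" and annotation status "Failed"."""
    result = []
    for item in intervals:
        annotations = (item.get("message") or {}).get("annotations") or {}
        if item.get("source") == "E2ETest" and annotations.get("status") == "Failed":
            result.append(item)
    return result


def overlaps(a: dict, b: dict) -> bool:
    """True when the [from, to] ranges of two intervals intersect."""
    if not all(i.get("from") and i.get("to") for i in (a, b)):
        return False  # skip intervals without both timestamps
    return (parse_time(a["from"]) <= parse_time(b["to"])
            and parse_time(b["from"]) <= parse_time(a["to"]))


def related_events(intervals: list, failed: dict) -> list:
    """Error/Warning or OperatorState intervals overlapping the failed test."""
    return [
        item for item in intervals
        if item is not failed
        and (item.get("level") in ("Error", "Warning")
             or item.get("source") == "OperatorState")
        and overlaps(item, failed)
    ]
```

A typical use would load each downloaded interval file with `load_intervals`, call `failed_tests` to find the window in which the test ran, and then pass each failed interval to `related_events` to list candidate cluster issues for the root-cause step.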
@@ -91,6 +119,7 @@ Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resour
 ### Step 5: Present Results to User

 1. **Display summary**
+
 ```text
 Test Failure Analysis Complete

@@ -104,6 +133,7 @@ Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resour

 Artifacts downloaded to: .work/prow-job-analyze-test-failure/{build_id}/logs/
 ```
+
 ## Error Handling

 Handle errors in the same way as "Error handling" in "Prow Job Analyze Resource" skill
