@@ -18,7 +18,9 @@ Identical with "Prow Job Analyze Resource" skill.
 ## Input Format
 
 The user will provide:
+
 1. **Prow job URL** - gcsweb URL containing `test-platform-results/`
+
    - Example: `https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_hypershift/6731/pull-ci-openshift-hypershift-main-e2e-aws/1962527613477982208`
    - URL may or may not have trailing slash
 
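The input rules above (the URL must contain `test-platform-results/` and may carry a trailing slash) can be illustrated with a minimal Python sketch. The helper name `parse_prow_url` is hypothetical, and treating the last path segment as the build ID is an assumption based on the example URL; the authoritative steps live in the "Prow Job Analyze Resource" skill.

```python
from urllib.parse import urlparse

MARKER = "test-platform-results/"  # substring the skill requires in the URL


def parse_prow_url(url: str) -> tuple[str, str]:
    """Return (bucket_path, build_id) for a gcsweb Prow job URL.

    Hypothetical helper: the name and return shape are illustrative only.
    """
    path = urlparse(url.strip()).path
    _, found, rest = path.partition(MARKER)
    if not found:
        raise ValueError(f"URL does not contain {MARKER!r}")
    bucket_path = rest.strip("/")  # tolerate an optional trailing slash
    if not bucket_path:
        raise ValueError("URL has no path after the bucket name")
    # Assumption: the build ID is the last path segment, as in the example URL.
    build_id = bucket_path.rsplit("/", 1)[-1]
    return bucket_path, build_id
```

Under these assumptions, the returned `bucket_path` is the value substituted for `{bucket-path}` in the `gcloud storage` commands, and `build_id` names the working directory.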
@@ -37,6 +39,7 @@ Use the "Parse and Validate URL" steps from "Prow Job Analyze Resource" skill
 ### Step 2: Create Working Directory
 
 1. **Check for existing artifacts first**
+
    - Check if `.work/prow-job-analyze-test-failure/{build_id}/logs/` directory exists and has content
    - If it exists with content:
      - Use AskUserQuestion tool to ask:
@@ -70,16 +73,41 @@ Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resour
 ### Step 4: Analyze Test Failure
 
 1. **Download build-log.txt**
+
    ```bash
    gcloud storage cp gs://test-platform-results/{bucket-path}/build-log.txt .work/prow-job-analyze-test-failure/{build_id}/logs/build-log.txt --no-user-output-enabled
    ```
 
 2. **Parse and validate**
+
    - Read `.work/prow-job-analyze-test-failure/{build_id}/logs/build-log.txt`
    - Search for the test name
    - Gather the stack trace related to the test
 
-3. **Determine root cause**
+3. **Examine interval files for cluster activity during E2E failures**
+
+   - Search recursively for E2E timeline artifacts (known as "interval files") within the bucket-path:
+     ```bash
+     gcloud storage ls 'gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*json'
+     ```
+   - The files can be nested at unpredictable levels below the bucket-path
+   - There could be as many as two matching files
+   - Download all matching interval files (use the full paths from the search results):
+     ```bash
+     gcloud storage cp 'gs://test-platform-results/{bucket-path}/**/e2e-timelines_spyglass_*.json' .work/prow-job-analyze-test-failure/{build_id}/logs/ --no-user-output-enabled
+     ```
+   - If the wildcard copy doesn't work, copy each file individually using the full paths from the search results
+   - **Scan interval files for test failure timing:**
+     - Look for intervals where `source = "E2ETest"` and `message.annotations.status = "Failed"`
+     - Note the `from` and `to` timestamps on this interval - this indicates when the test was running
+   - **Scan interval files for related cluster events:**
+     - Look for intervals that overlap the timeframe when the failed test was running
+     - Filter for intervals with:
+       - `level = "Error"` or `level = "Warning"`
+       - `source = "OperatorState"`
+     - These events may indicate cluster issues that caused or contributed to the test failure
+
+4. **Determine root cause**
    - Determine a possible root cause for the test failure
    - Analyze stack traces
    - Analyze related code in the code repository
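The interval scan described in item 3 of Step 4 can be sketched in Python as follows. This is a rough illustration, not the skill's implementation. It assumes each interval file is a JSON object with an `items` list, that `from` and `to` are RFC 3339 timestamps, and that the two filter bullets are alternatives (Error/Warning level OR `OperatorState` source). All function names are hypothetical.

```python
import json
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Assumption: timestamps look like "2025-01-01T10:00:00Z" (RFC 3339).
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def load_intervals(path: str) -> list:
    # Assumption: the interval list sits under a top-level "items" key.
    with open(path, encoding="utf-8") as fh:
        return json.load(fh).get("items", [])


def failed_tests(intervals: list) -> list:
    """Intervals with source "E2ETest" and message.annotations.status "Failed"."""
    result = []
    for iv in intervals:
        annotations = (iv.get("message") or {}).get("annotations") or {}
        if iv.get("source") == "E2ETest" and annotations.get("status") == "Failed":
            result.append(iv)
    return result


def overlaps(a: dict, b: dict) -> bool:
    # Two closed time ranges overlap when each starts before the other ends.
    return (parse_ts(a["from"]) <= parse_ts(b["to"])
            and parse_ts(b["from"]) <= parse_ts(a["to"]))


def related_events(intervals: list, failed: dict) -> list:
    """Error/Warning or OperatorState intervals that overlap the failed test."""
    result = []
    for iv in intervals:
        if iv is failed or not (iv.get("from") and iv.get("to")):
            continue  # skip the test itself and intervals without both timestamps
        interesting = (iv.get("level") in ("Error", "Warning")
                       or iv.get("source") == "OperatorState")
        if interesting and overlaps(iv, failed):
            result.append(iv)
    return result
```

A caller would run `load_intervals` on each downloaded file, pass the result to `failed_tests`, and then call `related_events` once per failed interval to collect candidate cluster events for the root-cause step.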
@@ -91,6 +119,7 @@ Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resour
 ### Step 5: Present Results to User
 
 1. **Display summary**
+
    ```text
    Test Failure Analysis Complete
 
@@ -104,6 +133,7 @@ Use the "Download and Validate prowjob.json" steps from "Prow Job Analyze Resour
 
    Artifacts downloaded to: .work/prow-job-analyze-test-failure/{build_id}/logs/
    ```
+
 ## Error Handling
 
 Handle errors in the same way as "Error handling" in "Prow Job Analyze Resource" skill