14
14
- [Design Details](#design-details)
15
15
- [Automated Merging of Prow Autobump PRs](#automated-merging-of-prow-autobump-prs)
16
16
- [Roll Back Process](#roll-back-process)
17
+ - [Graduation Criteria](#graduation-criteria)
18
+ - [Alpha -> Beta Graduation](#alpha---beta-graduation)
19
+ - [Beta -> GA Graduation](#beta---ga-graduation)
20
+ - [Announcement](#announcement)
17
21
- [Implementation History](#implementation-history)
18
22
- [Alternatives](#alternatives)
19
23
- [A new tool merges autobump PRs](#a-new-tool-merges-autobump-prs)
@@ -82,7 +86,7 @@ Shouldn’t see any change, prow breakage should be discovered by prow monitorin
82
86
- What’s Not Changed
83
87
- React to prow alerts and take actions.
84
88
- What’s Changed
85
- - No more manual inspecting prow healthiness.
89
+ - Decouple prow log inspection from prow bumps.
86
90
- No more manual lgtm/approve/retest autobump PRs.
87
91
- No more manual Slack posting.
88
92
@@ -94,7 +98,7 @@ Change how prow is released.
94
98
95
99
## Proposal
96
100
97
- Prow autobump PRs are automatically merged every hour, only on working hours of working days.
101
+ Prow autobump PRs are automatically merged every 3 hours, only during working hours on working days.
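
The merge cadence proposed above can be illustrated with a minimal sketch. It only computes the merge slots for a given day. The 09:00 to 17:00 window, the Monday to Friday definition of a working day, and the function name `merge_slots` are assumptions made for illustration. The proposal itself only specifies "every 3 hours" during working hours on working days, and does not name a time zone.

```python
from datetime import date, datetime, time, timedelta

MERGE_INTERVAL_HOURS = 3   # from the proposal
WORK_START_HOUR = 9        # assumption: not specified in the proposal
WORK_END_HOUR = 17         # assumption: not specified in the proposal


def merge_slots(day: date) -> list[datetime]:
    """Return the times on `day` at which an autobump PR would be merged.

    Hypothetical helper: weekends yield no slots, and slots are spaced
    MERGE_INTERVAL_HOURS apart inside the assumed working-hours window.
    """
    if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return []
    slots = []
    current = datetime.combine(day, time(hour=WORK_START_HOUR))
    end = datetime.combine(day, time(hour=WORK_END_HOUR))
    while current < end:
        slots.append(current)
        current += timedelta(hours=MERGE_INTERVAL_HOURS)
    return slots


if __name__ == "__main__":
    for slot in merge_slots(date(2021, 3, 1)):  # a Monday
        print(slot.isoformat())
```

With these assumed hours, a working day has three merge slots, while a Saturday or Sunday has none.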
98
102
99
103
### Notes/Constraints/Caveats (Optional)
100
104
@@ -114,36 +118,55 @@ One possible way of dealing with breaking changes, is:
114
118
115
119
This approach uses tide's auto-merge feature, so there is no need to worry about repo requirements such as needing more than one approver.
116
120
117
- ```
118
- <<[UNRESOLVED (spiffxp) ]>>
119
- Suggestion: how to keep slack reports on each automated bump.
120
- <<[/UNRESOLVED]>>
121
- ```
122
-
123
121
#### Roll Back Process
124
122
125
123
When prow stops functioning after a bump, the prow oncall should:
126
124
- Stop auto-deploying by commenting `/hold` on the latest autobump PR.
127
125
- Manually create a rollback PR that rolls back to a known good version.
128
- - Manually apply the changes from rollback PR.
126
+ - Prow is not very actively developed at the moment, so there are normally not many
127
+ changes between bumps, and it should be easy to identify the culprit.
128
+ - As a general rule of thumb, we can assume the last bump was good.
129
+ - Manually apply the changes from the rollback PR by running [`prow/deploy.sh`](https://github.com/kubernetes/test-infra/blob/master/prow/deploy.sh)
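
The rule of thumb for choosing a rollback version can be sketched as follows. This is an illustration only: it reads "last bump" as the bump immediately before the one currently deployed. The function name `rollback_target` and the version tags in the usage example are made up and are not part of prow or its tooling.

```python
def rollback_target(bump_history: list[str], deployed: str) -> str:
    """Pick the version to roll back to, assuming the previous bump was good.

    `bump_history` is a hypothetical list of deployed versions ordered from
    oldest to newest; `deployed` is the version that is currently broken.
    """
    index = bump_history.index(deployed)  # raises ValueError if unknown
    if index == 0:
        raise ValueError("no earlier bump to roll back to")
    return bump_history[index - 1]


if __name__ == "__main__":
    # Made-up version tags, for illustration only.
    history = ["v20210301-aaa1111", "v20210302-bbb2222", "v20210303-ccc3333"]
    print(rollback_target(history, "v20210303-ccc3333"))
```

If the previous bump turns out to be bad as well, the same lookup can be repeated with that version as `deployed`.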
130
+
131
+ ### Graduation Criteria
132
+
133
+ #### Alpha -> Beta Graduation
134
+
135
+ - Low-frequency continuous deployment bumps prow as expected
136
+ - Known prow failures are caught by alerts before any non-oncall human notices them
137
+
138
+ #### Beta -> GA Graduation
129
139
130
- ```
131
- <<[UNRESOLVED]>>
132
- Which version to roll back. This is generally not a problem due to low release volume of prow. @alvaroaleman suggested 6 hours intervals.
133
- <<[/UNRESOLVED]>>
134
- ```
140
+ - High-frequency continuous deployment bumps prow as expected
141
+ - Testgrid displays prow plank version
142
+
143
+ #### Announcement
144
+
145
+ Before the Alpha phase is enabled, this change will be announced:
146
+ - On the #prow and #testing-ops channels on Slack
147
+ - Via email to the entire [email protected] group
135
148
136
149
## Implementation History
137
150
138
151
139
152
## Alternatives
140
153
141
-
142
154
#### A new tool merges autobump PRs
143
- This method is independent of tide, which makes sure it works on every prow instance.
155
+
156
+ Instead of letting tide merge the PR, an alternative is to create a dedicated
157
+ continuous deploy job that takes full control:
158
+ - Merge autobump PR on a fixed schedule
144
159
145
160
##### Pros:
146
- Not relying on tide, works really well with prow instances that don't have tide.
161
+ - This method is independent of tide, which means it works on every prow instance.
147
162
148
163
##### Cons:
149
- Probably have significantly divergent code paths for finding and approving PRs on Gerrit vs PRs on GitHub.
164
+ - The tool is pretty similar to tide, which means there will be a lot of duplicated
165
+ logic with tide.
166
+
167
+ The biggest pro of this approach is that it works better with prow instances
168
+ that don't have tide support yet, for example prow that works with gerrit.
169
+ However, there are two reasons for not going down this path:
170
+ - The current design is targeting k8s prow, which does have tide.
171
+ - Tide will eventually come to gerrit, and it can be decided later which
172
+ should be done first: tide for gerrit, or continuous deployment of prow with gerrit.