@@ -185,7 +185,8 @@ overage.

- **What happens if we reenable the feature if it was previously rolled back?**

- It should continue to work as expected.
+ New objects with expanded DNS configuration will be accepted by the apiserver
+ and new Pods with expanded configuration will be created by the kubelet.

- **Are there any tests for feature enablement/disablement?**

@@ -195,84 +196,95 @@ We will add unit tests.

- **How can a rollout fail? Can it impact already running workloads?**

- N/A
+ If a kubelet starts with an invalid `resolvConf`, new workloads will fail DNS
+ lookups.

- **What specific metrics should inform a rollback?**

- N/A
+ If new workloads start to fail DNS lookups because of a corrupted resolv.conf
+ or because the resolver libraries are too old, that is an indication to roll
+ back the enablement.

- **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

- N/A
+ We will test the upgrade and rollback paths.

- **Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?**

- N/A
+ No

### Monitoring Requirements

- **How can an operator determine if the feature is in use by workloads?**

- N/A
+ There is no metric that indicates enablement. The operator has to check
+ whether any objects or DNS resolver configuration files contain expanded
+ configuration to determine if the feature is in use.
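
Since no metric exists, the check described above has to be done by inspecting search lists directly. The following is a minimal sketch of such a check, not part of the proposal. The legacy limits of 6 search paths and 256 characters are assumptions about the pre-expansion validation, the character count is assumed to include the separating spaces, and both function names are hypothetical.

```python
# Assumed pre-expansion limits; the diff itself only states the new 2048 limit.
LEGACY_MAX_SEARCH_PATHS = 6
LEGACY_MAX_SEARCH_CHARS = 256


def search_domains_from_resolv_conf(text):
    """Return the domains of the last `search` line in resolv.conf content.

    Assumption: when several `search` lines exist, the last one wins.
    """
    domains = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        if fields[0] == "search":
            domains = fields[1:]
    return domains


def uses_expanded_dns_config(searches):
    """Return True if the search list only fits within the expanded limits."""
    # Assumption: the total length counts the spaces between entries.
    total_chars = len(" ".join(searches))
    return (
        len(searches) > LEGACY_MAX_SEARCH_PATHS
        or total_chars > LEGACY_MAX_SEARCH_CHARS
    )
```

The same `uses_expanded_dns_config` helper can be applied to the `searches` list of a Pod's DNS configuration or to the output of `search_domains_from_resolv_conf` for a file read from a node or container.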

- **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
- [ ] Metrics
  - Metric name:
  - [Optional] Aggregation method:
  - Components exposing the metric:
- - [ ] Other (treat as last resort)
-   - Details:
-
- N/A
+ - [x] Other (treat as last resort)
+   - Success of DNS lookups

- **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

- N/A
+ DNS lookups should not fail more often than before the feature was enabled.
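
One way to observe the "success of DNS lookups" indicator is a small probe run from inside a workload, comparing the result before and after enablement. This is only a sketch under assumptions: the function name and the idea of a fixed hostname list are hypothetical, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket


def lookup_success_ratio(hostnames, resolve=socket.getaddrinfo):
    """Return the fraction of hostnames that resolve successfully.

    `resolve` defaults to the system resolver and can be replaced in tests.
    """
    if not hostnames:
        raise ValueError("hostnames must not be empty")
    successes = 0
    for name in hostnames:
        try:
            resolve(name, None)
            successes += 1
        except OSError:
            # socket.gaierror is a subclass of OSError.
            pass
    return successes / len(hostnames)
```

A ratio that drops after the feature is enabled, for the same list of names, would suggest the SLO above is not being met.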

- **Are there any missing metrics that would be useful to have to improve observability of this feature?**

- N/A
+ TBD

### Dependencies

- **Does this feature depend on any specific services running in the cluster?**

- N/A
+ No

### Scalability

- **Will enabling / using this feature result in any new API calls?**

- N/A
+ No

- **Will enabling / using this feature result in introducing new API types?**

- N/A
+ No

- **Will enabling / using this feature result in any new calls to the cloud provider?**

- N/A
+ No

- **Will enabling / using this feature result in increasing size or count of the existing API objects?**

- N/A
+ The total length of `PodSpec.DNSConfig.Searches` can increase to 2048 characters.

- **Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?**

- N/A
+ DNS lookup time may increase, but the increase will be negligible.

- **Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?**

- N/A
+ No

### Troubleshooting

- **How does this feature react if the API server and/or etcd is unavailable?**

+ N/A
+
- **What are other known failure modes?**

+ N/A
+
- **What steps should be taken if SLOs are not being met to determine the problem?**

+ If DNS lookups fail, check the error messages. Then validate the kubelet's
+ `resolvConf` in case it is corrupted, or upgrade the DNS resolver libraries
+ if they are too old.
+
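
The validation step above could start with a rough syntax check of the file that `resolvConf` points to. The sketch below is a heuristic under assumptions, not the kubelet's own validation: the function name is hypothetical, and the directive list is the set commonly documented for resolv.conf.

```python
import ipaddress

# Directives commonly documented for resolv.conf; treated here as an assumption.
KNOWN_DIRECTIVES = {"nameserver", "search", "domain", "options", "sortlist"}


def find_resolv_conf_problems(text):
    """Return a list of human-readable problems found in resolv.conf content."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        fields = line.split()
        directive, args = fields[0], fields[1:]
        if directive not in KNOWN_DIRECTIVES:
            problems.append(f"line {number}: unknown directive {directive!r}")
        elif not args:
            problems.append(f"line {number}: {directive} has no value")
        elif directive == "nameserver":
            try:
                ipaddress.ip_address(args[0])
            except ValueError:
                problems.append(
                    f"line {number}: invalid nameserver address {args[0]!r}"
                )
    return problems
```

An empty result does not prove the file is usable by every resolver library; it only rules out the most obvious corruption before looking at library versions.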
## Implementation History
- 2021-03-26: [ Initial