
Commit 17d8ed0

Fix typos
1 parent becd62c commit 17d8ed0

File tree

1 file changed (+14, -14 lines)
  • keps/sig-apps/3329-retriable-and-non-retriable-failures


keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md

Lines changed: 14 additions & 14 deletions
@@ -230,7 +230,7 @@ thousands of nodes requires usage of pod restart policies in order
 to account for infrastructure failures.
 
 Currently, kubernetes Job API offers a way to account for infrastructure
-failures by setting `.backoffLimit > 0`. However, this mechanism intructs the
+failures by setting `.backoffLimit > 0`. However, this mechanism instructs the
 job controller to restart all failed pods - regardless of the root cause
 of the failures. Thus, in some scenarios this leads to unnecessary
 restarts of many pods, resulting in a waste of time and computational
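For orientation, the `.backoffLimit` mechanism that the corrected sentence refers to is configured directly on the Job spec. A minimal sketch (the Job name, image, and command are illustrative assumptions, not taken from the KEP):

```yaml
# Minimal Job that relies only on .spec.backoffLimit: every pod failure,
# whatever its root cause, counts towards the limit and triggers a retry.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job                        # hypothetical name
spec:
  backoffLimit: 6                          # retry failed pods up to 6 times
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: example.com/training:latest # hypothetical image
        command: ["python", "main.py"]     # hypothetical workload
```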
@@ -354,7 +354,7 @@ As a machine learning researcher, I run jobs comprising thousands
 of long-running pods on a cluster comprising thousands of nodes. The jobs often
 run at night or over weekend without any human monitoring. In order to account
 for random infrastructure failures we define `.backoffLimit: 6` for the job.
-However, a signifficant portion of the failures happen due to bugs in code.
+However, a significant portion of the failures happen due to bugs in code.
 Moreover, the failures may happen late during the program execution time. In
 such case, restarting such a pod results in wasting a lot of computational time.
 
@@ -729,7 +729,7 @@ in different messages for pods.
 - Reproduction: Run kube-controller-manager with disabled taint-manager (with the
   flag `--enable-taint-manager=false`). Then, run a job with a long-running pod and
   disconnect the node
-- Comments: handled by node lifcycle controller in: `controller/nodelifecycle/node_lifecycle_controller.go`.
+- Comments: handled by node lifecycle controller in: `controller/nodelifecycle/node_lifecycle_controller.go`.
   However, the pod phase remains `Running`.
 - Pod status:
   - status: Unknown
@@ -762,7 +762,7 @@ In Alpha, there is no support for Pod conditions for failures or disruptions ini
 
 For Beta we introduce handling of Pod failures initiated by Kubelet by adding
 the pod disruption condition (introduced in Alpha) in case of disruptions
-initiated by kubetlet (see [Design details](#design-details)).
+initiated by Kubelet (see [Design details](#design-details)).
 
 Kubelet can also evict a pod in some scenarios which are not covered with
 adding a pod failure condition:
@@ -863,7 +863,7 @@ dies) between appending a pod condition and deleting the pod.
 In particular, scheduler can possibly decide to preempt
 a different pod the next time (or none). This would leave a pod with a
 condition that it was preempted, when it actually wasn't. This in turn
-could lead to inproper handling of the pod by the job controller.
+could lead to improper handling of the pod by the job controller.
 
 As a solution we implement a worker, part of the disruption
 controller, which clears the pod condition added if `DeletionTimestamp` is
@@ -1218,7 +1218,7 @@ the pod failure does not match any of the specified rules, then default
 handling of failed pods applies.
 
 If we limit this feature to use `onExitCodes` only when `restartPolicy=Never`
-(see: [limitting this feature](#limitting-this-feature)), then the rules using
+(see: [limiting this feature](#limitting-this-feature)), then the rules using
 `onExitCodes` are evaluated only against the exit codes in the `state` field
 (under `terminated.exitCode`) of `pod.status.containerStatuses` and
 `pod.status.initContainerStatuses`. We may also need to check for the exit codes
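To make the `onExitCodes` discussion above concrete, a Job could mark a known "bug in code" exit code as non-retriable while leaving other failures to the usual `backoffLimit` handling. This is a sketch following the `podFailurePolicy` API proposed in the KEP; the container name and exit-code value are assumptions for illustration:

```yaml
# Sketch: fail the whole Job when the main container exits with code 42
# instead of retrying it; rules with onExitCodes are only evaluated for
# pods with restartPolicy: Never, as discussed above.
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-exit-code-policy          # hypothetical name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: FailJob
      onExitCodes:
        containerName: main                # restricts the rule to one container
        operator: In
        values: [42]                       # hypothetical non-retriable exit code
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: example.com/training:latest # hypothetical image
```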
@@ -1279,9 +1279,9 @@ the following scenarios will be covered with unit tests:
 - handling of a pod failure, in accordance with the specified `spec.podFailurePolicy`,
   when the failure is associated with
   - a failed container with non-zero exit code,
-  - a dedicated Pod condition indicating termmination originated by a kubernetes component
+  - a dedicated Pod condition indicating termination originated by a kubernetes component
 - adding of the `DisruptionTarget` by Kubelet in case of:
-  - eviciton due to graceful node shutdown
+  - eviction due to graceful node shutdown
   - eviction due to node pressure
 <!--
 Additionally, for Alpha try to enumerate the core package you will be touching
@@ -1313,7 +1313,7 @@ The following scenarios will be covered with integration tests:
 - pod failure is caused by a failed container with a non-zero exit code
 
 More integration tests might be added to ensure good code coverage based on the
-actual implemention.
+actual implementation.
 
 <!--
 This question should be filled when targeting a release.
@@ -1453,7 +1453,7 @@ N/A
 An upgrade to a version which supports this feature should not require any
 additional configuration changes. In order to use this feature after an upgrade
 users will need to configure their Jobs by specifying `spec.podFailurePolicy`. The
-only noticeable difference in behaviour, without specifying `spec.podFailurePolicy`,
+only noticeable difference in behavior, without specifying `spec.podFailurePolicy`,
 is that Pods terminated by kubernetes components will have an additional
 condition appended to `status.conditions`.
 
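To illustrate the corrected sentence: even without `spec.podFailurePolicy`, a Pod terminated by a kubernetes component would carry an extra condition roughly like the sketch below; the `reason` and `message` values are assumptions, only the condition type comes from the KEP:

```yaml
# Sketch of the additional entry appended to a terminated pod's status.conditions.
status:
  conditions:
  - type: DisruptionTarget
    status: "True"
    reason: DeletionByTaintManager                   # hypothetical reason string
    message: "deletion initiated by taint manager"   # hypothetical message
```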
@@ -1668,7 +1668,7 @@ Manual test performed to simulate the upgrade->downgrade->upgrade scenario:
 - Scenario 2:
   - Create a job with a long running containers and `backoffLimit=0`.
   - Verify that the job continues after the node in uncordoned
-1. Disable the feature gates. Verify that the above scenarios result in default behaviour:
+1. Disable the feature gates. Verify that the above scenarios result in default behavior:
   - In scenario 1: the job restarts pods failed with exit code `42`
   - In scenario 2: the job is failed due to exceeding the `backoffLimit` as the failed pod failed during the draining
 1. Re-enable the feature gates
@@ -1967,7 +1967,7 @@ technics apply):
   is an increase of the Job controller processing time.
 - Inspect the Job controller's `job_pods_finished_total` metric for the
   to check if the numbers of pod failures handled by specific actions (counted
-  by the `failure_policy_action` label) agree with the expetations.
+  by the `failure_policy_action` label) agree with the expectations.
   For example, if a user configures job failure policy with `Ignore` action for
   the `DisruptionTarget` condition, then a node drain is expected to increase
   the metric for `failure_policy_action=Ignore`.
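For example, the setup this paragraph alludes to (an `Ignore` action keyed on the `DisruptionTarget` condition) could look roughly like the following; the rule shape follows the `podFailurePolicy` API described in the KEP, the rest is illustrative:

```yaml
# Sketch: pod failures caused by disruptions (e.g. a node drain) are ignored,
# i.e. they neither count towards backoffLimit nor fail the Job; each match
# should increment failure_policy_action=Ignore in job_pods_finished_total.
spec:
  podFailurePolicy:
    rules:
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
        status: "True"
```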
@@ -1977,7 +1977,7 @@ technics apply):
 
 - 2022-06-23: Initial KEP merged
 - 2022-07-12: Preparatory PR "Refactor gc_controller to do not use the deletePod stub" merged
-- 2022-07-14: Preparatory PR "efactor taint_manager to do not use getPod and getNode stubs" merged
+- 2022-07-14: Preparatory PR "Refactor taint_manager to do not use getPod and getNode stubs" merged
 - 2022-07-20: Preparatory PR "Add integration test for podgc" merged
 - 2022-07-28: KEP updates merged
 - 2022-08-01: Additional KEP updates merged
@@ -1986,7 +1986,7 @@ technics apply):
 - 2022-08-04: PR "Support handling of pod failures with respect to the configured rules" merged
 - 2022-09-09: Bugfix PR for test "Fix the TestRoundTripTypes by adding default to the fuzzer" merged
 - 2022-09-26: Prepared PR for KEP Beta update. Summary of the changes:
-  - propsal to extend kubelet to add the following pod conditions when evicting a pod (see [Design details](#design-details)):
+  - proposal to extend kubelet to add the following pod conditions when evicting a pod (see [Design details](#design-details)):
     - DisruptionTarget for evictions due graceful node shutdown, admission errors, node pressure or Pod admission errors
     - ResourceExhausted for evictions due to OOM killer and exceeding Pod's ephemeral-storage limits
   - extended the review of pod eviction scenarios by kubelet-initiated pod evictions:
