This repository was archived by the owner on Jul 10, 2025. It is now read-only.

Commit cc718ec

Fix PR comments.

1 parent d49e799 · commit cc718ec

File tree

1 file changed: +33 −35 lines


rfcs/20190829-tfx-container-component-execution.md

Lines changed: 33 additions & 35 deletions
````diff
@@ -12,7 +12,7 @@ This RFC proposes an orchestrator agnostic way to reliably execute a user’s
 container in the TFX pipeline. The proposal can support:
 
 * Running an arbitrary container in either a local Docker environment or a remote
-  k8s cluster.
+  Kubernetes cluster.
 * Passing data into the container
 * Passing output data from the container
 * Capturing logs from the container
````
````diff
@@ -21,9 +21,9 @@ container in the TFX pipeline. The proposal can support:
 
 ## Motivation
 
-Currently, in a TFX pipeline, there is no way to execute a generic container as
-one of its steps. Without this feature, users cannot bring their own containers
-into the pipeline. This blocks following use cases:
+Currently, the execution of a generic container as a step in a TFX pipeline is
+not supported. Without this feature, users cannot bring their own containers
+into the pipeline. This blocks the following use cases:
 
 * User already has a docker image and wants to run the image as one of the
   steps in a TFX pipeline.
````
````diff
@@ -42,7 +42,7 @@ The execution may occurs in local Docker container or in a remote Kubernetes clu
 
 Today, KFP’s ContainerOp leverages
 [Argo container template API](https://github.com/argoproj/argo/blob/master/pkg/apis/workflow/v1alpha1/workflow_types.go)
-to launch user’s container in a k8s pod. Argo, as the orchestrator, controls when
+to launch user’s container in a Kubernetes pod. Argo, as the orchestrator, controls when
 to launch the POD and it uses a sidecar container to report output files back
 and wait for user’s container to complete. We are not proposing to use Argo API
 because of the following reasons:
````
````diff
@@ -55,9 +55,9 @@ because of the following reasons:
 * Argo doesn’t provide an easy way to recover from user’s transient errors,
   which is critical in production workload.
 
-#### Airflow k8s pod operator
+#### Airflow Kubernetes pod operator
 
-Airflow supports launching a k8s pod by an
+Airflow supports launching a Kubernetes pod by an
 [operator](https://github.com/apache/airflow/blob/master/airflow/contrib/operators/kubernetes_pod_operator.py).
 This approach is closer to what we are proposing in the document. However, we
 cannot directly use the operator because:
````
````diff
@@ -76,12 +76,11 @@ cannot directly use the operator because:
 
 ### TLDR
 
-We propose to solve the above problems by the following design.
+We propose to solve the above problems with the following design:
 
-* Define container as an executor spec.
-* Launch container by component launcher in either local docker or k8s pod.
-* Use platform config to specify platform specific settings like k8s pod
-  config.
+* Define a container as an executor spec.
+* Launch a container via a component launcher in either a local docker or Kubernetes pod.
+* Use a platform config to specify a platform-specific settings config.
 
 The proposed solution has the following parts:
 
````
````diff
@@ -92,9 +91,9 @@ The proposed solution has the following parts:
 * `DockerComponentLauncher` which launches `ExecutorContainerSpec` in
   a Docker environment.
 * `KubernetesPodComponentLauncher` which launches `ExecutorContainerSpec`
-  in a k8s environment.
+  in a Kubernetes environment.
 * Extensible `PlatformConfig` framework.
-  * `KubernetesPodPlatformConfig` to support k8s pod spec as a config.
+  * `KubernetesPodPlatformConfig` to support Kubernetes pod spec as a config.
   * `DockerPlatformConfig` to support docker run configs.
 
 ### Architecture
````
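For readers skimming the diff, the `ExecutorContainerSpec` named in the hunk above can be pictured roughly as below. This is a minimal sketch, not the actual TFX class: the field names `image`, `command`, and `args` are assumptions based on typical container specs, and the image name is hypothetical.

```python
# Illustrative sketch only -- not the real TFX class. The field names
# (image, command, args) are assumptions based on typical container specs.
class ExecutorContainerSpec:
  """Describes the container image and entrypoint of a component executor."""

  def __init__(self, image, command=None, args=None):
    self.image = image            # container image to run
    self.command = command or []  # entrypoint override
    self.args = args or []        # arguments passed to the entrypoint

# Hypothetical usage with an invented image name.
spec = ExecutorContainerSpec(
    image='gcr.io/my-project/my-trainer:latest',
    command=['python', '-m', 'trainer.task'])
```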
````diff
@@ -105,7 +104,7 @@ Architecture that allows local container execution.
 
 Architecture that allows Kubernetes container execution.
 
-![TFX k8s container execution](20190829-tfx-container-component-execution/tfx-k8s-container-execution.png)
+![TFX Kubernetes container execution](20190829-tfx-container-component-execution/tfx-Kubernetes-container-execution.png)
 
 Class diagram that allows container execution
 
````
````diff
@@ -114,8 +113,7 @@ Class diagram that allows container execution
 ### Python DSL experience
 
 In order to use container base component in TFX DSL, user needs follow these
-steps. Step 1 and Step 2 follow the DSL extension proposed by the other RFC
-(https://github.com/tensorflow/community/pull/146).
+steps. Step 1 and Step 2 follow the DSL extension proposed by [TFX Generic Container-based Component](https://github.com/tensorflow/community/pull/146).
 
 #### Step 1: Define the container based component by `ExecutorContainerSpec`
 
````
````diff
@@ -169,7 +167,7 @@ _ = BeamRunner(platform_configs={
 }).run(create_pipeline())
 ```
 
-#### Step 3(b): Set k8s platform config via runner’s config
+#### Step 3(b): Set Kubernetes platform config via runner’s config
 
 ```python
 _ = KubeflowDagRunner(platform_configs={
````
````diff
@@ -199,7 +197,7 @@ different target platforms. For example:
   process.
 * `DockerComponentLauncher` can launch a container executor in a Docker
   environment.
-* `KubernetesPodComponentLauncher` can launch a container executor in a k8s
+* `KubernetesPodComponentLauncher` can launch a container executor in a Kubernetes
   environment.
 * A Dataflow launcher can launch a beam executor in Dataflow service.
 
````
````diff
@@ -274,7 +272,7 @@ class KubernetesPodComponentLauncher(BaseComponentLauncher):
       input_dict: Dict[Text, List[types.Artifact]],
       output_dict: Dict[Text, List[types.Artifact]],
       exec_properties: Dict[Text, Any]) -> None:
-    # k8s pod launcher implementation
+    # Kubernetes pod launcher implementation
 
 ```
````
````diff
@@ -467,7 +465,7 @@ definitions:
 ```
 
 The output.json file is optional, but if the user’s container writes to the file. It
-overrides the default handling of the k8s pod launcher. The output fields are:
+overrides the default handling of the Kubernetes pod launcher. The output fields are:
 
 * error_status: tells the executor whether it should retry or fail
 * outputs and exec_properties: used to override the execution and
````
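A hedged sketch of how a launcher could consume the optional output.json described above. Only the three top-level field names (`error_status`, `outputs`, `exec_properties`) come from the RFC text; the function name and the handling shown are illustrative.

```python
import json

def parse_output_json(raw):
  """Split an output.json document into its three override sections.

  Illustrative only: the three key names follow the RFC; the shape of the
  values is not specified there and is left opaque here.
  """
  doc = json.loads(raw)
  error_status = doc.get('error_status')            # drives retry-vs-fail
  outputs = doc.get('outputs', {})                  # overrides output artifacts
  exec_properties = doc.get('exec_properties', {})  # overrides exec properties
  return error_status, outputs, exec_properties
```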
````diff
@@ -478,10 +476,10 @@ MLMD from executor.
 
 ### Auth context resolution
 
-The k8s pod launcher internally uses the k8s Python client. The auth context resolution
+The Kubernetes pod launcher internally uses the Kubernetes Python client. The auth context resolution
 logic is as follows:
 
 1. If the current env is in a cluster, use `load_incluster_config` to load k8s
    context.
-1. If not, use default k8s active context to connect to remote cluster.
+1. If not, use default Kubernetes active context to connect to remote cluster.
 
````
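The two-step auth resolution above can be sketched as follows. The real launcher would call `load_incluster_config()` / `load_kube_config()` from the official Kubernetes Python client; here those calls are represented by the returned name, and detecting "in a cluster" via the service-account token path is our assumption, not necessarily what TFX does.

```python
import os

# Default in-cluster service-account token path used by Kubernetes pods.
TOKEN_PATH = '/var/run/secrets/kubernetes.io/serviceaccount/token'

def resolve_auth_loader(token_path=TOKEN_PATH):
  """Return which Kubernetes client config loader the launcher would use."""
  if os.path.exists(token_path):
    return 'load_incluster_config'  # step 1: running inside a pod
  return 'load_kube_config'         # step 2: default active context
```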
````diff
@@ -488,26 +486,26 @@
 ### Pod launcher resiliency
 
 In this design section, we focused more on the launcher resiliency under
 `KubeflowDAGRunner`. In `AirflowDAGRunner`, the launcher code is running in the
-same process of Airflow orchestrator which we rely on Airflow to ensure its
-resiliency. `BeamDAGRunner`, however, is considered mainly for local testing
+same process of Airflow orchestrator, and we rely on Airflow to ensure the
+resiliency of the process. `BeamDAGRunner`, however, is considered mainly for local testing
 purpose and we won't add support for it to be resilient.
 
 In `KubeflowDAGRunner`, a pipeline step will create two pods in order to execute
 user’s container:
 
-* A launcher pod which contains the driver, k8s pod launcher, and publisher code.
+* A launcher pod which contains the driver, Kubernetes pod launcher, and publisher code.
 * A user pod with user’s container.
 
-A pod in k8s is not resilient by itself. We will use Argo’s retry feature to make
+A pod in Kubernetes is not resilient by itself. We will use Argo’s retry feature to make
 the launcher pod partially resilient. The details are as follows:
 
 * Each Argo launcher step will be configured with a default retry count.
 * Argo will retry the step in case of failure, no matter what type of error.
 * The launcher container will create a tmp workdir in `pipeline_root`.
 * It will keep intermediate results (for example, the ID of the created pod) in the tmp workdir.
-* The k8s pod launcher will be implemented in a way that it will resume the
+* The Kubernetes pod launcher will be implemented in a way that it will resume the
   operation based on the intermediate results in the tmp workdir.
 * The launcher will also record a permanent failure data in the tmp workdir so
   it won’t resume the operation in case of non-retriable failures.
 
````
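The resume-from-workdir idea in the bullets above might look like this minimal sketch (not TFX code): the launcher checkpoints the created pod ID into the tmp workdir, so a retried Argo step resumes the existing pod instead of creating a second one. The state-file name `launcher_state.json` is invented for illustration.

```python
import json
import os

def launch_or_resume(workdir, create_pod):
  """Create a pod once; on retry, resume the pod recorded in the workdir."""
  state_file = os.path.join(workdir, 'launcher_state.json')
  if os.path.exists(state_file):
    with open(state_file) as f:
      return json.load(f)['pod_id']   # resume previously created pod
  pod_id = create_pod()
  with open(state_file, 'w') as f:
    json.dump({'pod_id': pod_id}, f)  # checkpoint before waiting on the pod
  return pod_id
```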
````diff
@@ -514,13 +512,13 @@
 ### Default retry strategy
 
 K8s pod launcher supports exponential backoff retry. This strategy applies to
-all runners which can support k8s pod launcher. Docker launchers are not in the
+all runners which can support Kubernetes pod launcher. Docker launchers are not in the
 scope of the design as it is mainly for local development use case.
 
 The retry only happens if the error is retriable. An error is retriable only
 when:
 
-* It’s a transient error code from k8s pod API.
-* The output.json file from artifact store indicates it’s a retriable error.
-* The pod get deleted (For example: GKE preemptible pod feature).
+* It’s a transient error code from Kubernetes pod API.
+* Or, the output.json file from artifact store indicates it’s a retriable error.
+* Or, the pod get deleted (For example: GKE preemptible pod feature).
 
````
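The exponential-backoff behavior described above can be sketched as below. The retry count and base delay are illustrative values, not numbers from the design, and `is_retriable` stands in for the three retriable conditions listed in this hunk.

```python
import time

def run_with_retry(operation, is_retriable, max_retries=3, base_delay=1.0):
  """Run `operation`, retrying retriable failures with exponential backoff."""
  for attempt in range(max_retries + 1):
    try:
      return operation()
    except Exception as e:
      if attempt == max_retries or not is_retriable(e):
        raise                                    # permanent failure: give up
      time.sleep(base_delay * (2 ** attempt))    # back off: 1s, 2s, 4s, ...
```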
````diff
@@ -527,6 +525,6 @@
 ### Log streaming
 
-The container launcher streams the log from user’s docker container or k8s pod through the
+The container launcher streams the log from user’s docker container or Kubernetes pod through the
 API. It will start a thread which constantly pulls new logs and outputs them to
 local stdout.
 
````
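The log-streaming thread described above might be sketched like this; `log_iterator` stands in for the follow-mode log APIs of the Docker and Kubernetes clients, and the sink defaults to stdout via `print`.

```python
import threading

def stream_logs(log_iterator, sink=print):
  """Drain log lines to `sink` on a background daemon thread."""
  def pump():
    for line in log_iterator:
      sink(line)  # forward each log line to local stdout (or a custom sink)
  thread = threading.Thread(target=pump, daemon=True)
  thread.start()
  return thread  # caller can join() to wait for the stream to finish
```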
````diff
@@ -541,13 +539,13 @@ How the container launcher handles cancellation request varies by orchestrators:
   to work. We will use the same process to propagate cancellation requests to
   user’s container.
 
-In order to allow the user to specify the cancellation command line entrypoint, the k8s
+In order to allow the user to specify the cancellation command line entrypoint, the Kubernetes
 pod launcher will support an optional parameter called `cancellation_command`
 from `ExecutorContainerSpec`.
 
 ## Open discussions
 
 * In the Argo runner, each step requires 2 pods with total 3 containers (launcher
   main container + launcher argo wait container + user main container) to run.
-  Although each launcher container requires minimal k8s resources,
+  Although each launcher container requires minimal Kubernetes resources,
   resource usage is still a concern.
````
