1- The readme describes how to create and delete eks cluster and kfp services.
1+ The readme describes how to create and delete an EKS cluster and KFP services.
22
33#### Creating EKS cluster
44
5+ export CLUSTER_NAME="torchx-dev"
6+ export EKS_VERSION="1.21"
7+ envsubst < torchx-dev-eks-template.yml > torchx-dev-eks.yml
58 eksctl create cluster -f torchx-dev-eks.yml
69
10+ See https://docs.aws.amazon.com/eks/latest/userguide/platform-versions.html for the latest EKS version
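
`envsubst` silently replaces an unset variable with an empty string, so a missing export only shows up later when `eksctl` rejects the rendered file. A minimal sketch of a guard that could run before the `envsubst` step is shown below. The helper name `require_vars` is hypothetical and not part of this repository.

```shell
# Hypothetical helper (not part of this repo): fail early when a variable
# that the EKS template expects is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name, written for POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (not executed here):
#   require_vars CLUSTER_NAME EKS_VERSION &&
#     envsubst < torchx-dev-eks-template.yml > torchx-dev-eks.yml
```

The function only checks that the names are non-empty. It does not validate that `EKS_VERSION` is a version EKS currently supports.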
11+
712#### Creating KFP
813
9- kfctl apply -V -f torchx-dev-kfp.yml
14+ Source doc: https://www.kubeflow.org/docs/components/pipelines/installation/standalone-deployment/#deploying-kubeflow-pipelines
15+
16+ export PIPELINE_VERSION=1.8.1
17+ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
18+ kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
19+ kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
20+
21+ See https://github.com/kubeflow/pipelines/releases for the latest KFP version
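
The three KFP commands above depend on each other: the CRD has to be established before the `env/dev` overlay is applied. As a rough sketch, they could be wrapped in one function that stops at the first failure. The function name `install_kfp` and the `KUBECTL` override variable are assumptions made for this example. Setting `KUBECTL=echo` prints the commands instead of running them.

```shell
# Hypothetical wrapper around the three KFP install commands above.
# KUBECTL is an assumed override: KUBECTL=echo previews the commands.
install_kfp() {
  version="$1"
  kubectl_bin="${KUBECTL:-kubectl}"
  base="github.com/kubeflow/pipelines/manifests/kustomize"

  # Each step runs only if the previous one succeeded.
  "$kubectl_bin" apply -k "$base/cluster-scoped-resources?ref=$version" &&
    "$kubectl_bin" wait --for condition=established --timeout=60s crd/applications.app.k8s.io &&
    "$kubectl_bin" apply -k "$base/env/dev?ref=$version"
}

# Usage (not executed here):
#   install_kfp "$PIPELINE_VERSION"
```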
1022
11- #### Applying kfp role binding
23+ #### Applying KFP role binding
1224
25+ kubectl create namespace torchx-dev
1326 kubectl apply -f kfp_volcano_role_binding.yaml
1427
1528#### Creating torchserve
@@ -22,16 +35,6 @@ The readme describes how to create and delete eks cluster and kfp services.
2235
2336 Install `vcctl`
2437
25-
26- #### Installing kfp from source code
27-
28- Source doc: https://www.kubeflow.org/docs/components/pipelines/installation/standalone-deployment/
29-
30- kubectl apply -k manifests/kustomize/cluster-scoped-resources
31-
32- kubectl apply -k manifests/kustomize/env/dev
33-
34-
3538#### Starting etcd service
3639
3740 kubectl apply -f etcd.yaml
@@ -44,21 +47,20 @@ The readme describes how to create and delete eks cluster and kfp services.
4447
4548 eksctl delete cluster -f torchx-dev-eks.yml
4649
47- This command most likely will fail. EKS user cloudformation to create many resources, that
48- are hard to remove. If the command fails there needs to be done manual cleanup:
50+ This command will most likely fail. EKS uses CloudFormation to create many resources that
51+ are hard to remove. If the command fails, manual cleanup is needed:
4952* Clean up the associated VPC. Go to AWS Console -> VPC -> Press `Delete`. This will
5053point you to the ENI and NAT that need to be deleted manually.
51- * Clean up the cloudformation temalte . Go to AWS Console -> CNF -> delete corresponding templates.
54+ * Clean up the CloudFormation stacks. Go to AWS Console -> CloudFormation -> delete the corresponding stacks.
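
The CloudFormation part of the manual cleanup can also be inspected from the command line. The sketch below assumes the AWS CLI is installed and configured, and that the stacks follow eksctl's usual `eksctl-<cluster>-...` naming. The function name `list_leftover_stacks` and the `AWS_CLI` override are hypothetical.

```shell
# Hypothetical helper: list stacks for a cluster that are stuck in DELETE_FAILED.
# Assumes eksctl's "eksctl-<cluster>-..." stack naming; AWS_CLI=echo previews the call.
list_leftover_stacks() {
  cluster="$1"
  aws_bin="${AWS_CLI:-aws}"
  "$aws_bin" cloudformation list-stacks \
    --stack-status-filter DELETE_FAILED \
    --query "StackSummaries[?starts_with(StackName, 'eksctl-${cluster}-')].StackName" \
    --output text
}

# Usage (not executed here):
#   list_leftover_stacks torchx-dev
```

Each name it returns can then be passed to `aws cloudformation delete-stack --stack-name <name>` once the blocking ENI and NAT resources are gone. Stacks left in other states are not covered by this filter.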
5255
5356### Gotchas:
5457
55- * The directory where ` torchx-dev-kfp.yml ` is located should be the same name
56- as eks cluster
58+ * The directory where `torchx-dev-kfp.yml` is located should have the same name as the EKS cluster
5759
58- * The node groups in eks cluster HAVE to be spread more than a single AZ, otherwise there
60+ * The node groups in the EKS cluster HAVE to be spread across more than a single AZ; otherwise there
5961 will be problems with ` istio-ingress `
6062
61- * KFP troubleshooting: https://www.kubeflow.org/docs/distributions/aws/troubleshooting-aws/
63+ * KFP troubleshooting: https://www.kubeflow.org/docs/components/pipelines/troubleshooting/
6264
6365* Enable Kubernetes nodes to access AWS account resources: https://stackoverflow.com/a/64617080/1446208
6466