Commit 69c1d69

Add my todo list.
Author: Mike O'Brien
1 parent 04a60ee commit 69c1d69

1 file changed: +20 -0 lines changed

todo.md

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+* Templatize number of GPUs, size of GPU, etc
+* Add LLM model for deployment
+* Templatize the NeMo services
+* Harden Terraform network defaults: replace 0.0.0.0/0 in allowed_cidr and bastion_ssh_cidr with safe defaults and add validation/docs
+* Fix AWS subnet data count: set data.aws_subnet.existing_subnets count to length(var.subnet_ids) to avoid index errors
+* Prefix EBS CSI IAM role with name_prefix in iam_oidc to avoid cross-environment name collisions
+* Make ALB/Target Group creation optional in loadbalancer module and auto-detect or document pre-existing resource requirement
+* Harmonize provider and Kubernetes version pins across root and modules; define single source-of-truth for versions
+* Parameterize EC2 snapshot IDs and avoid pre-provisioning large GPU instances by default; gate with variables
+* Parameterize storage class names in kubernetes module (gp3/standard) and add fallback handling
+* Tighten EKS node security group: restrict kubelet port 10250 ingress from 0.0.0.0/0 to cluster SG/cidr
+* Clarify Azure/AKS support in docs vs code; add AKS module or adjust claims
+* Configure initial models/NIMs to deploy: add config surface (.env or YAML) and defaults for embedder/reranker
+* Document and support "existing cluster" install path (no Terraform): clear steps using --local/VIA_BASTION=false
+* Evaluate and, if needed, separate Langflow from GPU/NIM workloads (namespace/nodepool or separate cluster) and document
+* Document GKE private endpoint guidance; consider enabling private endpoint + authorized networks for production
+* Add preflight to detect missing ALB/TGs and emit actionable errors with names expected by loadbalancer module
+* Migrate from NGINX Ingress Controller to alternative (NGINX Ingress being retired March 2026 per https://www.kubernetes.dev/blog/2025/11/12/ingress-nginx-retirement/). Recommended: Gateway API (modern replacement) or Traefik/Istio Ingress controllers. Update Helm charts in modules/gke, scripts (deploy_ingress.sh, deploy_nginx_*.sh, cli.sh), and all ingress resources with nginx-specific annotations across GKE/EKS/RKE2 platforms
+
+
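
The network-hardening items in the list above (replacing 0.0.0.0/0 in allowed_cidr and bastion_ssh_cidr, and restricting ingress on kubelet port 10250) could be backed by a small preflight check. The following is a minimal sketch in Python, not code from this repository: the function name `find_open_cidrs` and the shape of the `settings` mapping are hypothetical, and it assumes each setting holds a list of CIDR strings.

```python
import ipaddress


def find_open_cidrs(settings):
    """Return (name, cidr) pairs whose CIDR is invalid or matches every address.

    `settings` is a hypothetical mapping of variable name -> list of CIDR strings.
    """
    problems = []
    for name, cidrs in settings.items():
        for cidr in cidrs:
            try:
                # strict=False accepts host bits, e.g. "10.0.0.1/24".
                network = ipaddress.ip_network(cidr, strict=False)
            except ValueError:
                # Unparseable values are reported rather than silently skipped.
                problems.append((name, cidr))
                continue
            # A prefix length of 0 covers the whole address space
            # (0.0.0.0/0 for IPv4, ::/0 for IPv6).
            if network.prefixlen == 0:
                problems.append((name, cidr))
    return problems


if __name__ == "__main__":
    # Illustrative values only; the real defaults live in the Terraform variables.
    example = {
        "allowed_cidr": ["0.0.0.0/0"],
        "bastion_ssh_cidr": ["203.0.113.0/24"],
    }
    for name, cidr in find_open_cidrs(example):
        print(f"{name}: {cidr} is invalid or open to the world")
```

A check like this would likely complement, rather than replace, validation on the Terraform variables themselves, since it can also run against values supplied through other config surfaces.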
