@@ -9,7 +9,7 @@ Trainer](https://www.kubeflow.org/docs/components/training/),
99[ vLLM] ( https://docs.vllm.ai/en/latest/ ) . It complements them with several
1010open-source components born in IBM Research including
1111[ AutoPilot] ( https://github.com/IBM/autopilot ) ,
12- [ AppWrappers ] ( https://project-codeflare.github.io/appwrapper/ ) , and
12+ [ AppWrapper ] ( https://project-codeflare.github.io/appwrapper/ ) , and
1313[ Sakkara] ( https://github.com/atantawi/4986-kep-sakkara ) . MLBatch manages teams,
1414queues, quotas, and resource allocation. It monitors key cluster components,
1515detecting faults and to a degree automating fault recovery.
@@ -27,7 +27,7 @@ Kubernetes cluster and run a few example workloads.
2727- We deploy [ Prometheus] ( https://prometheus.io ) and [ Grafana
2828dashboards] ( https://grafana.com/grafana/dashboards/ ) to monitor the health of
2929the cluster and GPU utilization. _ This is optional._
30- - We demonstrate the queueing , quota management, and fault recovery capabilities
30+ - We demonstrate the queuing , quota management, and fault recovery capabilities
3131 of MLBatch using synthetic workloads.
3232- We run example workloads using vLLM, PyTorch, and Ray.
3333
0 commit comments