
Commit be22ecf

[Doc] Update README (#3695)
Signed-off-by: kaihsun <[email protected]>
1 parent 2285202 commit be22ecf

1 file changed: 44 additions, 61 deletions

README.md

@@ -14,16 +14,17 @@ KubeRay is a powerful, open-source Kubernetes operator that simplifies the deplo
 
 * **RayService**: RayService is made up of two parts: a RayCluster and a Ray Serve deployment graph. RayService offers zero-downtime upgrades for RayCluster and high availability.
 
-**Kubectl Plugin**: Starting from KubeRay v1.3.0, you can use the `kubectl ray` plugin to simplify
-common workflows when deploying Ray on Kubernetes. If you aren’t familiar with Kubernetes, this
-plugin simplifies running Ray on Kubernetes.
+**KubeRay ecosystem**: Some optional components.
 
-**Community-managed components (optional)**: Some components are maintained by the KubeRay community.
+* **Kubectl Plugin** (Beta): Starting from KubeRay v1.3.0, you can use the `kubectl ray` plugin to simplify
+common workflows when deploying Ray on Kubernetes. If you aren’t familiar with Kubernetes, this
+plugin simplifies running Ray on Kubernetes. See [kubectl-plugin](https://docs.ray.io/en/latest/cluster/kubernetes/user-guides/kubectl-plugin.html#kubectl-plugin) for more details.
 
-* **KubeRay APIServer**: It provides a layer of simplified configuration for KubeRay resources. The KubeRay API server is used internally
+* **KubeRay APIServer** (Alpha): It provides a layer of simplified configuration for KubeRay resources. The KubeRay API server is used internally
 by some organizations to back user interfaces for KubeRay resource management.
 
-* **KubeRay Python client**: This Python client library provides APIs to handle RayCluster from your Python application.
+* **KubeRay Dashboard** (Experimental): Starting from KubeRay v1.4.0, we have introduced a new dashboard that enables users to view and manage KubeRay resources.
+While it is not yet production-ready, we welcome your feedback.
 
 ## Documentation
 
@@ -38,67 +39,47 @@ The KubeRay repository only contains documentation related to the development an
 
 ## Examples
 
-* [Ray Train XGBoostTrainer on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/ml-example.html#kuberay-ml-example) (CPU-only)
-* [Train PyTorch ResNet model with GPUs on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/gpu-training-example.html#kuberay-gpu-training-example)
-* [Serve a MobileNet image classifier on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/mobilenet-rayservice.html#kuberay-mobilenet-rayservice-example) (CPU-only)
-* [Serve a StableDiffusion text-to-image model on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/stable-diffusion-rayservice.html#kuberay-stable-diffusion-rayservice-example)
-* [Serve a text summarizer on Kubernetes](https://docs.ray.io/en/master/cluster/kubernetes/examples/text-summarizer-rayservice.html#kuberay-text-summarizer-rayservice-example)
-* [RayJob Batch Inference Example](https://docs.ray.io/en/master/cluster/kubernetes/examples/rayjob-batch-inference-example.html#kuberay-batch-inference-example)
+KubeRay examples are hosted on the [Ray documentation](https://docs.ray.io/en/latest/cluster/kubernetes/examples.html).
+Examples span a wide range of use cases, including training, LLM online inference, batch inference, and more.
 
 ## Kubernetes Ecosystem
 
-* [Ingress: AWS Application Load Balancer, GKE Ingress, Nginx](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/ingress.html#kuberay-ingress)
-* [Using Prometheus and Grafana](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/prometheus-grafana.html#kuberay-prometheus-grafana)
-* [Profiling with py-spy](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/pyspy.html#kuberay-pyspy-integration)
-* [KubeRay integration with Volcano](https://docs.ray.io/en/master/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-volcano)
-* [MCAD: A Kubernetes Solution for Queuing and Gang Dispatching jobs on Single or Multi-Cluster environment](https://github.com/ray-project/kuberay/blob/master/docs/guidance/kuberay-with-MCAD.md)
+KubeRay integrates with the Kubernetes ecosystem, including observability tools (e.g., Prometheus, Grafana, py-spy), queuing systems (e.g., Volcano, Apache YuniKorn, Kueue), ingress controllers (e.g., Nginx), and more.
+See [KubeRay Ecosystem](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem.html) for more details.
 
-## External Blog Posts
+## Blog Posts
 
-1. [Evolving Niantic AR Mapping Infrastructures with Ray](https://nianticlabs.com/news/ray) Niantic (September 6, 2023)
-2. [Building a Modern Machine Learning Platform with Ray at Samsara](https://www.samsara.com/blog/building-a-modern-machine-learning-platform-with-ray) Samsara (August 29, 2023)
-3. [Using Ray on Kubernetes with KubeRay at Google Cloud](https://cloud.google.com/blog/products/containers-kubernetes/use-ray-on-kubernetes-with-kuberay) Google (August 15, 2023)
-4. [How DoorDash Built an Ensemble Learning Model for Time Series Forecasting with KubeRay](https://doordash.engineering/2023/06/20/how-doordash-built-an-ensemble-learning-model-for-time-series-forecasting/) Doordash (June 20, 2023)
-5. [AI/ML Models Batch Training at Scale with Open Data Hub](https://cloud.redhat.com/blog/ai/ml-models-batch-training-at-scale-with-open-data-hub) Red Hat (May 15, 2023)
-6. [A cloud-native, open-source stack for accelerating foundation model innovation](https://research.ibm.com/blog/openshift-foundation-model-stack) IBM (May 9, 2023)
-7. [Distributed Machine Learning at Instacart](https://tech.instacart.com/distributed-machine-learning-at-instacart-4b11d7569423) Instacart (March 17, 2023)
-8. [Unleashing ML Innovation at Spotify with Ray](https://engineering.atspotify.com/2023/02/unleashing-ml-innovation-at-spotify-with-ray/) Spotify (February 1, 2023)
-9. [Best Practices For Ray Cluster On ACK](https://www.alibabacloud.com/blog/best-practices-for-ray-clusters---ray-on-ack_600925) Alibaba Cloud (Mar 12, 2024)
+* [Scaling Ray to 10K Models and Beyond](https://medium.com/workday-engineering/scaling-ray-to-10k-models-and-beyond-92799b4c9fc3) Workday
+* [How Klaviyo built a robust model serving platform with Ray Serve](https://klaviyo.tech/how-klaviyo-built-a-robust-model-serving-platform-with-ray-serve-c02ec65788b3) Klaviyo
+* [Evolving Niantic AR Mapping Infrastructures with Ray](https://nianticlabs.com/news/ray) Niantic
+* [Building a Modern Machine Learning Platform with Ray at Samsara](https://www.samsara.com/blog/building-a-modern-machine-learning-platform-with-ray) Samsara
+* [Using Ray on Kubernetes with KubeRay at Google Cloud](https://cloud.google.com/blog/products/containers-kubernetes/use-ray-on-kubernetes-with-kuberay) Google
+* [How DoorDash Built an Ensemble Learning Model for Time Series Forecasting with KubeRay](https://doordash.engineering/2023/06/20/how-doordash-built-an-ensemble-learning-model-for-time-series-forecasting/) Doordash
+* [AI/ML Models Batch Training at Scale with Open Data Hub](https://cloud.redhat.com/blog/ai/ml-models-batch-training-at-scale-with-open-data-hub) Red Hat
+* [Distributed Machine Learning at Instacart](https://tech.instacart.com/distributed-machine-learning-at-instacart-4b11d7569423) Instacart
+* [Unleashing ML Innovation at Spotify with Ray](https://engineering.atspotify.com/2023/02/unleashing-ml-innovation-at-spotify-with-ray/) Spotify
+* [Best Practices For Ray Cluster On ACK](https://www.alibabacloud.com/blog/best-practices-for-ray-clusters---ray-on-ack_600925) Alibaba Cloud
 
 ## Talks
 
-1. [Supercharge Your AI Platform with KubeRay](https://youtu.be/DgfJR6wR4BQ?si=QuK3j7VEkteSwglA) Anyscale + Google (November 8, 2023)
-2. [Sailing Ray Workloads with KubeRay and Kueue in Kubernetes](https://www.youtube.com/watch?v=Q-sQLDMeJ8M) Volcano + DaoCloud (October 17, 2023)
-3. [Serving Large Language Models with KubeRay on TPUs](https://raysummit.anyscale.com/agenda/sessions/135) Google (September 19, 2023)
-4. [KubeRay: A Ray Cluster Management Solution on Kubernetes](https://raysummit.anyscale.com/agenda/sessions/184) Anyscale (September 18, 2023)
-5. [The Different Shades of using KubeRay with Kubernetes](https://raysummit.anyscale.com/agenda/sessions/140) Microsoft (September 18, 2023)
-6. [On-Demand Ray Clusters in ML Workflows via KubeRay & Sematic](https://raysummit.anyscale.com/agenda/sessions/164) Sematic (September 18, 2023)
-7. [KubeRay - A Kubernetes Ray Clustering Solution](https://www.youtube.com/watch?v=tMEwSAeC1jo) Microsoft (February 8, 2023)
-8. [KubeRay x Flyte Integration](https://www.youtube.com/watch?v=RmGynLp5u4Q) Flyte (August 24, 2022)
-9. [Operationalizing Ray Serve on Kubernetes](https://youtu.be/NekkpRrcAWg?si=bpX7z64AuZiM_iUv) Anyscale (August 24, 2022)
-
-## Helm Charts
-
-KubeRay Helm charts are hosted on the [ray-project/kuberay-helm](https://github.com/ray-project/kuberay-helm) repository.
-Please read [kuberay-operator](helm-chart/kuberay-operator/README.md) to deploy the operator and [ray-cluster](helm-chart/ray-cluster/README.md) to deploy a configurable Ray cluster.
-To deploy the optional KubeRay API Server, see [kuberay-apiserver](helm-chart/kuberay-apiserver/README.md).
-
-```sh
-# Add the Helm repo
-helm repo add kuberay https://ray-project.github.io/kuberay-helm/
-helm repo update
-
-# Confirm the repo exists
-helm search repo kuberay --devel
-
-# Install both CRDs and KubeRay operator v1.1.0.
-helm install kuberay-operator kuberay/kuberay-operator --version 1.1.0
-
-# Check the KubeRay operator Pod in `default` namespace
-kubectl get pods
-# NAME READY STATUS RESTARTS AGE
-# kuberay-operator-6fcbb94f64-mbfnr 1/1 Running 0 17s
-```
+* [Advanced Model Serving Techniques with Ray on Kubernetes | KubeCon 2024 NA](https://youtu.be/mASxYpfWUNU?si=iCuXakrP7ORAg37z) Anyscale + Google
+* [Building Scalable AI Infrastructure with Kuberay and Kubernetes | Ray Summit 2024](https://youtu.be/bbKpBTGf_AU?si=BkdCL7FGOde71t_P) Anyscale + Google
+* [Ray at Scale: Apple's Approach to Elastic GPU Management | Ray Summit 2024](https://youtu.be/ZCRZQVt-r3g?si=1Gxkpy8CNVVDDBP0) Apple
+* [Scaling Ray Train to 10K Kubernetes Nodes on GKE | Ray Summit 2024](https://youtu.be/9S5WznGnIpE?si=O6Rqpor9QmAvdv6u) Google
+* [KubeSecRay: Fortifying Multi-Tenant Ray Clusters on Kubernetes | Ray Summit 2024](https://youtu.be/Y-kLmZ3nklQ?si=N9FIc5Nk_rWwKBRp) Microsoft
+* [Scaling LLM Inference: AWS Inferentia Meets Ray Serve on EKS | Ray Summit 2024](https://youtu.be/6rNfYlm6s1k?si=WZeXZXrMDtRbbVKO) AWS
+* [How Roblox Scaled Machine Learning by Leveraging Ray for Efficient Batch Inference | Ray Summit 2024](https://youtu.be/BN1CVDZjQRE?si=9pN9A3bReSL26Pc-) Roblox
+* [Airbnb's LLM Evolution: Fine-Tuning with Ray | Ray Summit 2024](https://youtu.be/jYQ9ry8uXY0?si=3P56QNo8Qwovv4Vf) Airbnb
+* [Ray @ eBay: Pioneering a Next-Gen AI Platform | Ray Summit 2024](https://youtu.be/5KuTdRq9Zto?si=8m485B1411ixfdlx) eBay
+* [Spotify Harnesses Ray for Next-Gen AI Infrastructure | Ray Summit 2024](https://youtu.be/4kw3EYBz1Gs?si=PswsNR88xe6Mxuas) Spotify
+* [Spotify's Approach to Distributed LLM Training with Ray on GKE | Ray Summit 2024](https://youtu.be/2l1lVBdmNIQ?si=PwCeZD1-XajPNLam) Spotify
+* [Reddit's ML Evolution: Scaling with Ray and KubeRay | Ray Summit 2024](https://youtu.be/XwrGk0SM6ls?si=xNMQo548lOonKLiK) Reddit
+* [IBM's Approach to Building a Cloud-Native AI Platform | Ray Summit 2024](https://youtu.be/Q27JFtLE6b4?si=QQhVMZyBRelkLC13) IBM
+* [Exploring Hinge's ML Platform Evolution with Ray | Ray Summit 2024](https://youtu.be/_nsTcYtfnXU?si=dKNasWOxiTRJgyvj) Hinge
+* [How Rubrik Unlocked AI at Scale with Ray Serve | Ray Summit 2024](https://youtu.be/Md5vww4ardo?si=leiuvNkDy2fKeK8r) Rubrik
+* [Supercharge Your AI Platform with KubeRay | KubeCon 2023 NA](https://youtu.be/DgfJR6wR4BQ?si=QuK3j7VEkteSwglA) Anyscale + Google
+* [Sailing Ray Workloads with KubeRay and Kueue in Kubernetes](https://www.youtube.com/watch?v=Q-sQLDMeJ8M) Volcano + DaoCloud
+* [Serving Large Language Models with KubeRay on TPUs](https://raysummit.anyscale.com/agenda/sessions/135) Google
 
 ## Development
 
@@ -108,9 +89,11 @@ Please read our [CONTRIBUTING](CONTRIBUTING.md) guide before making a pull reque
 
 Join [Ray's Slack workspace](https://docs.google.com/forms/d/e/1FAIpQLSfAcoiLCHOguOm8e7Jnn-JJdZaCxPGjgVCvFijHB5PLaQLeig/viewform), and search the following public channels:
 
-* `#kuberay-questions` (KubeRay users): This channel aims to help KubeRay users with their questions. The messages will be closely monitored by the Ray and KubeRay maintainers.
+* `#kuberay-questions`: This channel aims to help KubeRay users with their questions. The messages will be closely monitored by the Ray and KubeRay maintainers.
+
+KubeRay contributors are welcome to join the bi-weekly KubeRay community meetings.
 
-* `#kuberay-discuss` (KubeRay contributors): This channel is for contributors to discuss what to do next with KubeRay (e.g. issues, pull requests, feature requests, design docs, KubeRay ecosystem integrations). All KubeRay maintainers and core contributors are in the channel.
+* Add the [Ray/KubeRay Google calendar](https://calendar.google.com/calendar/u/1?cid=Y19iZWIwYTUxZDQyZTczMTFmZWFmYTY5YjZiOTY1NjAxMTQ3ZTEzOTAxZWE0ZGU5YzA1NjFlZWQ5OTljY2FiOWM4QGdyb3VwLmNhbGVuZGFyLmdvb2dsZS5jb20) to your calendar.
 
 ## Security
 