Merged
158 changes: 158 additions & 0 deletions demo-notebooks/guided-demos/5_rayjob_lifecycled_cluster.ipynb
@@ -0,0 +1,158 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "9259e514",
"metadata": {},
"source": [
"# Submitting a RayJob which lifecycles its own RayCluster\n",
"\n",
"In this notebook, we will go through the basics of using the SDK to:\n",
" * Define a RayCluster configuration\n",
" * Use this configuration alongside a RayJob definition\n",
" * Submit the RayJob and let the KubeRay operator manage the lifecycle of the RayCluster for the RayJob"
]
},
{
"cell_type": "markdown",
"id": "18136ea7",
"metadata": {},
"source": [
"## Defining and Submitting the RayJob"
]
},
{
"cell_type": "markdown",
"id": "a1c2545d",
"metadata": {},
"source": [
"First, we'll need to import the relevant CodeFlare SDK packages. You can do this by executing the cell below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "51e18292",
"metadata": {},
"outputs": [],
"source": [
"from codeflare_sdk import RayJob, ManagedClusterConfig, TokenAuthentication"
]
},
{
"cell_type": "markdown",
"id": "649c5911",
"metadata": {},
"source": [
"Execute the cell below to authenticate the notebook via OpenShift.\n",
"\n",
"**TODO: Add guide to authenticate locally.**"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc364888",
"metadata": {},
"outputs": [],
"source": [
"auth = TokenAuthentication(\n",
" token=\"XXXXX\",\n",
" server=\"XXXXX\",\n",
" skip_tls=False\n",
")\n",
"auth.login()"
]
},
{
"cell_type": "markdown",
"id": "5581eca9",
"metadata": {},
"source": [
"Next, we'll need to define the ManagedClusterConfig. KubeRay will use this to spin up a short-lived RayCluster that will only exist for as long as the job runs."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3094c60a",
"metadata": {},
"outputs": [],
"source": [
"cluster_config = ManagedClusterConfig(\n",
" num_workers=2,\n",
" worker_cpu_requests=1,\n",
" worker_cpu_limits=1,\n",
" worker_memory_requests=4,\n",
" worker_memory_limits=4,\n",
" head_accelerators={'nvidia.com/gpu': 0},\n",
" worker_accelerators={'nvidia.com/gpu': 0},\n",
")"
]
},
{
"cell_type": "markdown",
"id": "02a2b32b",
"metadata": {},
"source": [
"Lastly, we can pass the ManagedClusterConfig into the RayJob and submit it. You do not need to worry about tearing down the cluster when the job has completed; that is handled for you!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e905ccea",
"metadata": {},
"outputs": [],
"source": [
"job = RayJob(\n",
" job_name=\"demo-rayjob\",\n",
" entrypoint=\"python -c 'print(\\\"Hello from RayJob!\\\")'\",\n",
" cluster_config=cluster_config,\n",
" namespace=\"your-namespace\"\n",
")\n",
"\n",
"job.submit()"
]
},
{
"cell_type": "markdown",
"id": "f3612de2",
"metadata": {},
"source": [
"We can check the status of our job by executing the cell below. If it's not running immediately, run the cell a few more times until you see that it's in a 'running' state."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "96d92f93",
"metadata": {},
"outputs": [],
"source": [
"job.status()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.11"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
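
The last markdown cell asks the reader to re-run `job.status()` by hand until the job reports a 'running' state. As a possible follow-up, here is a minimal sketch of a polling helper that could do the re-running instead. `wait_for_state` is a hypothetical name and is not part of the CodeFlare SDK. The sketch assumes only that the status call returns something whose string form contains the state name; the actual return type of `job.status()` is not shown in this diff, so that assumption should be checked before relying on it.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], object],
    targets: Iterable[str] = ("running",),
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
) -> str:
    """Poll get_state() until its lowercased string form contains a target.

    Hypothetical helper: get_state is any zero-argument callable,
    for example job.status from the notebook (assumed, not verified).
    """
    wanted = tuple(t.lower() for t in targets)
    deadline = time.monotonic() + timeout_s
    while True:
        # Normalise whatever the callable returns (string, enum, tuple, ...)
        state = str(get_state()).lower()
        if any(t in state for t in wanted):
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"last observed state: {state!r}")
        time.sleep(interval_s)


# In the notebook this might be used as (assumption, see above):
# wait_for_state(job.status)
```

The match is a plain substring check, so a state whose text merely contains a target word would also match; an exact comparison may be preferable once the real return type is known.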