# Quick Start with Minikube

This guide provides instructions for deploying the Agentic RAG system on Minikube for local testing.

## Prerequisites

1. [Minikube](https://minikube.sigs.k8s.io/docs/start/) installed
2. [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/) installed
3. Docker or another container runtime installed
4. An NVIDIA GPU with appropriate drivers installed
5. [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) installed
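
Before proceeding, you can sanity-check that the required tools are on your `PATH` (a minimal sketch; `nvidia-smi` ships with the NVIDIA drivers):

```bash
# Report which prerequisite tools are installed (prints "MISSING" rather than failing)
for tool in minikube kubectl docker nvidia-smi; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $tool"
  else
    echo "MISSING: $tool"
  fi
done
```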

## Step 1: Start Minikube with GPU Support

Start Minikube with sufficient resources and GPU support:

```bash
# For Linux (Docker driver; GPU passthrough requires minikube v1.32+ and the NVIDIA Container Toolkit)
minikube start --cpus 4 --memory 16384 --disk-size 50g --driver=docker --gpus all

# For Windows (Docker driver via Docker Desktop; GPU support depends on WSL 2 GPU passthrough)
minikube start --cpus 4 --memory 16384 --disk-size 50g --driver=docker --gpus all

# For macOS (Note: NVIDIA GPU passthrough is not available on macOS)
minikube start --cpus 4 --memory 16384 --disk-size 50g --driver=hyperkit
```

Verify that Minikube is running:

```bash
minikube status
```

## Step 2: Install NVIDIA Device Plugin

Install the NVIDIA device plugin to enable GPU support in Kubernetes:

```bash
# Apply the NVIDIA device plugin
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
```

Verify that the GPU is available in the cluster:

```bash
kubectl get nodes "-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
```

## Step 3: Clone the Repository

Clone the repository containing the Kubernetes manifests:

```bash
git clone https://github.com/devrel/devrel-labs.git
cd devrel-labs/agentic_rag/k8s
```

## Step 4: Deploy the Application

The deployment includes both Hugging Face models and Ollama for inference. The Hugging Face token is optional but recommended for using Mistral models.
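
The options below place the token in a ConfigMap, which stores it in plain text. For anything beyond throwaway local testing, a Kubernetes Secret is the more conventional home for a credential. A hypothetical manifest (the `hf-token` name is illustrative, and the deployment would need to be updated to read it) might look like:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
  namespace: agentic-rag
type: Opaque
stringData:
  # stringData lets you supply the value without base64-encoding it yourself
  HUGGING_FACE_HUB_TOKEN: "your-huggingface-token"
```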

### Option 1: Deploy without a Hugging Face token (Ollama models only)

```bash
# Create a namespace
kubectl create namespace agentic-rag

# Create an empty ConfigMap
cat <<EOF | kubectl apply -n agentic-rag -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: agentic-rag-config
data:
  config.yaml: |
    # No Hugging Face token provided
    # You can still use Ollama models
EOF

# Apply the manifests
kubectl apply -n agentic-rag -f local-deployment/deployment.yaml
kubectl apply -n agentic-rag -f local-deployment/service.yaml
```

### Option 2: Deploy with a Hugging Face token (both Mistral and Ollama models)

```bash
# Create a namespace
kubectl create namespace agentic-rag

# Create a ConfigMap with your Hugging Face token
cat <<EOF | kubectl apply -n agentic-rag -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: agentic-rag-config
data:
  config.yaml: |
    HUGGING_FACE_HUB_TOKEN: "your-huggingface-token"
EOF

# Apply the manifests
kubectl apply -n agentic-rag -f local-deployment/deployment.yaml
kubectl apply -n agentic-rag -f local-deployment/service.yaml
```

### Option 3: Using the deployment script

```bash
# Make the script executable
chmod +x deploy.sh

# Deploy with a Hugging Face token
./deploy.sh --hf-token "your-huggingface-token" --namespace agentic-rag

# Or deploy without a Hugging Face token
./deploy.sh --namespace agentic-rag
```

## Step 5: Monitor the Deployment

Check the status of your pods:

```bash
kubectl get pods -n agentic-rag
```
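
Instead of polling `kubectl get pods`, you can block until the pod reports ready with `kubectl wait`. The `app=agentic-rag` label is an assumption about the manifests; adjust it to match the labels in your deployment:

```bash
# Wait up to 10 minutes for the pod to become Ready (model downloads can be slow)
kubectl wait --for=condition=ready pod \
  -l app=agentic-rag -n agentic-rag --timeout=600s
```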

View the logs:

```bash
kubectl logs -f deployment/agentic-rag -n agentic-rag
```

## Step 6: Access the Application

For Minikube, you need to use port-forwarding to access the application:

```bash
kubectl port-forward -n agentic-rag service/agentic-rag 8080:80
```

Then access the application in your browser at `http://localhost:8080`.
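
While the port-forward is running, a quick `curl` confirms the service answers (the root path here is an assumption; substitute the application's actual endpoint if it differs):

```bash
# Expect an HTTP status line; "connection refused" means the forward or pod is down
curl -sS -o /dev/null -w "HTTP %{http_code}\n" http://localhost:8080/
```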

Alternatively, you can use Minikube's service command:

```bash
minikube service agentic-rag -n agentic-rag
```

## Troubleshooting

### Insufficient Resources

If pods are stuck in the `Pending` state due to insufficient resources, recreate the Minikube cluster with more resources (CPU and memory settings cannot be changed on an existing cluster, so it must be deleted first):

```bash
minikube delete
minikube start --cpus 6 --memory 16384 --disk-size 50g --driver=docker --gpus all
```

### GPU-Related Issues

If you encounter GPU-related issues:

1. **Check GPU availability in Minikube**:
   ```bash
   minikube ssh -- nvidia-smi
   ```

2. **Verify the NVIDIA device plugin is running**:
   ```bash
   kubectl get pods -n kube-system | grep nvidia-device-plugin
   ```

3. **Check if the GPU is available to Kubernetes**:
   ```bash
   kubectl describe nodes | grep nvidia.com/gpu
   ```

### Slow Model Download

The first time you deploy, the models will be downloaded, which can take some time. You can check the progress in the logs:

```bash
kubectl logs -f deployment/agentic-rag -n agentic-rag
```

### Service Not Accessible

If you can't access the service, make sure port-forwarding is running, or try using the Minikube service command.
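
A few standard checks can narrow down where the break is (the resource names follow the manifests used above):

```bash
# Is the Service there, and does it have endpoints (i.e. a matching, Ready pod)?
kubectl get service agentic-rag -n agentic-rag
kubectl get endpoints agentic-rag -n agentic-rag

# If ENDPOINTS is <none>, compare the Service selector with the pod labels
kubectl describe service agentic-rag -n agentic-rag
kubectl get pods -n agentic-rag --show-labels
```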

## Cleanup

To remove all resources:

```bash
kubectl delete namespace agentic-rag
```

To stop Minikube:

```bash
minikube stop
```

To delete the Minikube cluster:

```bash
minikube delete
```