
Add private OCI OKE cookbook for Llama Nemotron Nano 8B #117

Open
fede-kamel wants to merge 4 commits into NVIDIA-NeMo:main from fede-kamel:fk/oci-phoenix-private-nemotron

Conversation

@fede-kamel

fede-kamel commented Mar 16, 2026

Summary

  • add a Nemotron-specific OCI cookbook for nvidia/Llama-3.1-Nemotron-Nano-8B-v1
  • document a validated private-only OKE deployment in us-phoenix-1 with no public control-plane endpoint, no public worker IPs, and no public inference endpoint
  • add a checked-in Terraform wrapper for the private Phoenix OKE infrastructure using Oracle's official OKE module plus OCI Bastion service
  • include a known-good vLLM values file for a single VM.GPU.A10.1 node
  • surface the new OCI cookbook in the root README and cookbook index

Validation

Validated against a live Phoenix OKE deployment of nvidia/Llama-3.1-Nemotron-Nano-8B-v1 using a private cluster plus OCI Bastion/tunnel access:

  • terraform plan
  • terraform apply
  • private OKE cluster active
  • OCI Bastion service active
  • CPU node pool active
  • GPU node pool active in PHX-AD-2
  • /health
  • /v1/models
  • chat completion
  • tool calling
  • streaming
  • async concurrent requests

Notes

  • this contribution is intentionally Nemotron-specific and OCI-specific
  • the deployment guidance is private-only and does not use public IPs for the Kubernetes API or inference endpoint
  • the OCI path is documented as a reproducible option comparable to common AWS GPU/Kubernetes deployment patterns, without claiming that AWS Terraform already exists in this repo
  • the Bastion resource is the OCI Bastion service, not a public bastion VM

Signed-off-by: Federico Kamelhar <federico.kamelhar@oracle.com>
@fede-kamel
Author

@chrisalexiuk-nvidia @anushapant @shashank3959 — Friendly follow-up! This PR adds an OCI OKE deployment cookbook for Llama Nemotron Nano 8B. Would love to get a review when you get a chance. Thanks!

@fede-kamel
Author

fede-kamel commented Mar 27, 2026

Hey team 👋 — just checking in on this one. Happy to address any feedback or make adjustments to scope if that helps move things along. Let me know if there's anything I can do on my end! @chrisalexiuk-nvidia @anushapant @shashank3959

@fede-kamel
Author

✅ Cross-validated with NeMo Agent Toolkit OCI integration

This OKE deployment is now serving as the live inference backend for NVIDIA/NeMo-Agent-Toolkit#1804, which adds first-class OCI Generative AI support to the Agent Toolkit.

The full Agent Toolkit OCI test suite — 11/11 tests pass — was validated against the nvidia/Llama-3.1-Nemotron-Nano-8B-v1 endpoint running on this exact private OKE infrastructure in us-phoenix-1. Both PRs together deliver a complete story: reproducible OCI deployment (this PR) powering a production-ready LLM provider and LangChain integration (Toolkit PR).

@fede-kamel
Author

@chrisalexiuk-nvidia @anushapant @shashank3959 — Quick update — we just used this exact OKE deployment to validate the full OCI integration for NeMo Agent Toolkit! 🚀

The nvidia/Llama-3.1-Nemotron-Nano-8B-v1 model running on the private Phoenix cluster documented in this cookbook passed 11/11 tests as the live inference backend for NVIDIA/NeMo-Agent-Toolkit#1804 — covering the OCI provider, LangChain wrapper, and an end-to-end agent workflow.

Really exciting to see both pieces come together: this PR provides the reproducible OCI deployment, and the Toolkit PR builds on top of it with a first-class integration. Two repos, one Nemotron story on OCI. Looking forward to your review!
