
Conversation

@dkennetzoracle

What does this PR do?

Adds OCI GenAI PaaS models for the OpenAI chat completion endpoints.

Test Plan

In an OCI tenancy with access to GenAI PaaS, perform the following steps:

  1. Ensure you have IAM policies in place to use the service (check the docs included in this PR).
  2. For local development, set up the OCI CLI and configure it with your region, tenancy, and auth (see the OCI CLI configuration docs).
  3. Once configured, go through the llama-stack setup and run llama-stack (which uses config-based auth), for example:
OCI_AUTH_TYPE=config_file OCI_CLI_PROFILE=CHICAGO OCI_REGION=us-chicago-1 OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a llama stack run oci
  4. Once the server is running, hit the models endpoint to list models:
curl http://localhost:8321/v1/models | jq
...
{
  "identifier": "meta.llama-4-scout-17b-16e-instruct",
  "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
  "provider_id": "oci",
  "type": "model",
  "metadata": {
    "display_name": "meta.llama-4-scout-17b-16e-instruct",
    "capabilities": [
      "CHAT"
    ],
    "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
  },
  "model_type": "llm"
},
   ...
  5. Use the "display_name" field to reference the model in a /chat/completions request (a Python equivalent is sketched after this list):
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta.llama-4-scout-17b-16e-instruct",
    "stream": true,
    "temperature": 0.9,
    "messages": [
      {
        "role": "system",
        "content": "You are a funny comedian. You can be crass."
      },
      {
        "role": "user",
        "content": "Tell me a funny joke about programming."
      }
    ]
  }'

# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta.llama-4-scout-17b-16e-instruct",
    "stream": false,
    "temperature": 0.9,
    "messages": [
      {
        "role": "system",
        "content": "You are a funny comedian. You can be crass."
      },
      {
        "role": "user",
        "content": "Tell me a funny joke about programming."
      }
    ]
  }'
  6. Try out other models from the /models endpoint.
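
For reference, here is a minimal Python sketch equivalent to the curl calls in step 5, using the openai client against the local server from step 3. The base URL, placeholder API key, and model name come from this test plan; adjust them for your setup.

# Sketch: same streaming request as the curl example, via the openai client.
from openai import OpenAI

# llama-stack serves an OpenAI-compatible API under /v1. A local server does
# not check the key, but the client requires one to be set.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="none")

response = client.chat.completions.create(
    model="meta.llama-4-scout-17b-16e-instruct",  # "display_name" from /v1/models
    temperature=0.9,
    stream=True,
    messages=[
        {"role": "system", "content": "You are a funny comedian. You can be crass."},
        {"role": "user", "content": "Tell me a funny joke about programming."},
    ],
)

for chunk in response:
    # Guard against empty choices and None deltas in streamed chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)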

@dkennetzoracle changed the title from "Oci inference provider" to "feat: add oci genai service as chat inference provider" on Oct 21, 2025
@ashwinb
Contributor

ashwinb commented Oct 21, 2025

@github-actions run precommit

@github-actions
Contributor

⏳ Running pre-commit hooks on PR #3876...

🤖 Applied by @github-actions bot via pre-commit workflow
@github-actions
Contributor

✅ Pre-commit hooks completed successfully!

🔧 Changes have been committed and pushed to the PR branch.

@dkennetzoracle
Author

Removing docs additions at the request of @raghotham

@ashwinb
Contributor

ashwinb commented Oct 22, 2025

cc @mattf for a review since this touches the inference system

@dkennetzoracle
Author

Any updates here?



@json_schema_type
class OCIConfig(BaseModel):
Collaborator

Suggested change
class OCIConfig(BaseModel):
class OCIConfig(RemoteInferenceProviderConfig):
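
For context, a rough sketch of what the suggested subclassing might look like. The import paths and field names are assumptions (the fields mirror the env vars in the test plan), not the PR's actual code:

# Sketch only: import paths and fields are assumed, not taken from the PR.
from pydantic import Field

from llama_stack.providers.utils.inference.model_registry import (
    RemoteInferenceProviderConfig,  # assumed location; verify in llama_stack
)
from llama_stack.schema_utils import json_schema_type  # assumed location


@json_schema_type
class OCIConfig(RemoteInferenceProviderConfig):
    # Field names mirror OCI_AUTH_TYPE / OCI_REGION / OCI_COMPARTMENT_OCID
    # from the test plan; the real config schema may differ.
    auth_type: str = Field(default="config_file")
    region: str = Field(default="us-chicago-1")
    compartment_ocid: str = Field(default="")

Subclassing the shared config base keeps provider-wide options (such as model allow-lists) consistent across remote providers, which appears to be the intent of the suggestion.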

Comment on lines +303 to +307
# log_probs=params.get("log_probs", 0),
# tool_choice=params.get("tool_choice", {}), # Unsupported
# tools=params.get("tools", {}), # Unsupported
# web_search_options=params.get("web_search_options", {}), # Unsupported
# stop=params.get("stop", []),
Collaborator
Why comment out? Are all of them unsupported?
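
One possible answer, if some of these options truly are unsupported, is to reject them explicitly rather than commenting them out, so callers get an error instead of silently dropped parameters. A hypothetical sketch (names are illustrative, not from the PR):

# Hypothetical: fail loudly on OpenAI params the OCI backend cannot honor.
_UNSUPPORTED_PARAMS = ("tool_choice", "tools", "web_search_options")

def reject_unsupported(params: dict) -> None:
    for name in _UNSUPPORTED_PARAMS:
        if params.get(name):
            raise ValueError(f"remote::oci does not support '{name}'")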

)
return chat_details

async def chat_completion(
Collaborator
We have an OpenAIMixin class that exposes a lot of knobs; can you look at it and see if you can use it instead of writing "custom" code for the completion requests?
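
For illustration, a hedged sketch of what leaning on OpenAIMixin might look like, if OCI exposes an OpenAI-compatible endpoint. The hook names (get_api_key, get_base_url) and the import path are assumptions about OpenAIMixin's interface; verify against the actual class before using:

# Sketch only: assumes OpenAIMixin drives an OpenAI client from two hooks.
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin


class OCIInferenceAdapter(OpenAIMixin):
    def __init__(self, config):
        self.config = config  # hypothetical OCIConfig from the suggestion above

    def get_api_key(self) -> str:
        # OCI normally signs requests rather than using static keys, so a
        # real adapter may need an auth shim here.
        return "none"

    def get_base_url(self) -> str:
        # Hypothetical region-scoped OpenAI-compatible endpoint.
        return self.config.base_url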

@mattf (Collaborator) left a comment

@dkennetzoracle -

  1. does oci provide an openai compatible endpoint?
  2. please include output of the inference tests against the remote::oci provider

@ashwinb
Copy link
Contributor

ashwinb commented Oct 27, 2025

I think it would be much preferable if we could work against an OpenAI-compatible endpoint. Otherwise, at the very least, we need a set of recorded tests against the provider. But before recordings, let's make sure the tests at least pass "live". Here's a command to run (roughly):

pytest -sv tests/integration/inference/  \
   --stack-config <your_distro> \
   --text-model <oci/...>   \
   --embedding-model sentence-transformers/nomic-ai/... \
   --inference-mode live

@dkennetzoracle
Copy link
Author

@leseb @ashwinb @mattf thanks for reviewing. If it would be strongly preferable for me to use an OpenAI-compatible endpoint, I can make those changes. I'll refactor and re-request when this is done.

Sorry also, I started the PR a few weeks ago before a conference, and when I got back, inference providers had changed significantly, although it seems for the better. I'll align with the changes and re-request!
