
Deploying an LLM as a service in an {ocp-name} AI cluster

The code suggestions that {mta-dl-full} generates depend on the large language model (LLM) that you use. Therefore, you might want to use an LLM that caters to your specific requirements.

{mta-dl-plugin} integrates with LLMs that are deployed as a scalable service on {ocp-name} AI clusters. These deployments provide you with granular control over resources such as compute, cluster nodes, and auto-scaling graphics processing units (GPUs) while enabling you to use LLMs to resolve code issues at a large scale.

An example workflow for configuring an LLM service on {ocp-name} AI broadly consists of the following steps:

  • Installing and configuring the following infrastructure resources:

    • Install a Red Hat {ocp-name} cluster and the {ocp-name} AI Operator

    • Configure a GPU machine set

    • (Optional) Configure an autoscaler custom resource (CR) and a machine autoscaler CR

  • Configuring the {ocp-name} AI platform:

    • Configure a data science project

    • Configure a serving runtime

    • Configure an accelerator profile

  • Deploying the LLM through {ocp-name} AI

    • Upload your model to an S3-compatible bucket

    • Add a data connection

    • Deploy the LLM in your {ocp-name} AI data science project

    • Export the SSL certificate, the OPENAI_API_BASE URL, and other environment variables required to access the LLM

  • Preparing the LLM for analysis

    • Configure an OpenAI API key

    • Update the OpenAI API key and the base URL in the provider-settings.yaml file
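The export step above can be sketched as follows. All of the values, and the certificate variable name, are hypothetical placeholders; substitute the SSL certificate path, the inference route URL, and the token from your own {ocp-name} AI deployment:

```shell
# Hypothetical values -- replace them with the details of your own
# {ocp-name} AI deployment before running an analysis.
export SSL_CERT_FILE="$HOME/certs/openshift-ai-cert.pem"   # exported SSL certificate
export OPENAI_API_BASE="https://llm-route.example.com/v1"  # inference route of the deployed LLM
export OPENAI_API_KEY="changeme-token"                     # token for the serving runtime
```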

See Configuring LLM provider settings to configure the base URL and the LLM API key in the {mta-dl-plugin} Visual Studio Code extension.
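After the environment variables are exported, one way to confirm that the LLM service is reachable is to query its models endpoint. This is a sketch under two assumptions: the check_llm helper name is invented here, and your serving runtime is assumed to expose an OpenAI-compatible REST API:

```shell
# Sketch of a connectivity check against the deployed LLM service.
# Assumes OPENAI_API_BASE, OPENAI_API_KEY, and SSL_CERT_FILE are already
# exported, and that the serving runtime exposes an OpenAI-compatible API.
check_llm() {
  curl -sf \
    --cacert "$SSL_CERT_FILE" \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    "$OPENAI_API_BASE/models"
}
```

If the call succeeds, the server returns a JSON list of the models it serves; a certificate or authorization error at this point usually means that the exported SSL certificate or API key does not match the deployment.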