
epam/ai-dial-adapter-vertexai

DIAL VertexAI Adapter



Overview

LLM Adapters unify the APIs of the respective LLMs to align with the Unified Protocol of DIAL Core. Each Adapter operates within a dedicated container. Multi-modality support enables non-textual communication such as image-to-text, text-to-image, file transfers, and more.

The project implements the AI DIAL API for language models and embedding models from Google Vertex AI.


Supported models

Chat completion models

The following models support the POST $SERVER_ORIGIN/openai/deployments/$MODEL_ID/chat/completions endpoint along with optional support for the feature endpoints:

  • POST $SERVER_ORIGIN/openai/deployments/$MODEL_ID/tokenize
  • POST $SERVER_ORIGIN/openai/deployments/$MODEL_ID/truncate_prompt
  • POST $SERVER_ORIGIN/openai/deployments/$MODEL_ID/configuration
| Model | Model ID | Modality |
|-------|----------|----------|
| Gemini 3.1 Pro | gemini-3.1-pro-preview | (text/pdf/image/audio/video)-to-text |
| Gemini 3.1 Flash Lite | gemini-3.1-flash-lite-preview | (text/pdf/image/audio/video)-to-text |
| Gemini 3.1 Flash Image | gemini-3.1-flash-image-preview | (text/pdf/image/audio/video)-to-text |
| Gemini 3 Pro | gemini-3-pro[-preview] | (text/pdf/image/audio/video)-to-text |
| Gemini 3 Flash | gemini-3-flash-preview | (text/pdf/image/audio/video)-to-text |
| Gemini 3 Pro Image | gemini-3-pro-image-preview | (text/image)-to-(text/image) |
| Gemini 2.5 Flash | gemini-2.5-flash | (text/pdf/image/audio/video)-to-text |
| Gemini 2.5 Flash Image | gemini-2.5-flash-image | (text/image)-to-(text/image) |
| Gemini 2.5 Pro | gemini-2.5-pro | (text/pdf/image/audio/video)-to-text |
| Gemini 2.0 Flash Lite | gemini-2.0-flash-lite-001 | (text/pdf/image/audio/video)-to-text |
| Gemini 2.0 Flash | gemini-2.0-flash-exp | (text/pdf/image/audio/video)-to-(text/image) |
| Gemini 2.0 Flash | gemini-2.0-flash-001 | (text/pdf/image/audio/video)-to-text |
| Claude 4.6 Opus | claude-opus-4-6 | (pdf/text/image)-to-text |
| Claude 4.6 Sonnet | claude-sonnet-4-6 | (pdf/text/image)-to-text |
| Claude 4.5 Opus | claude-opus-4-5@20251101 | (pdf/text/image)-to-text |
| Claude 4.5 Sonnet | claude-sonnet-4-5@20250929 | (pdf/text/image)-to-text |
| Claude 4.5 Haiku | claude-haiku-4-5@20251001 | (pdf/text/image)-to-text |
| Claude 4.1 Opus | claude-opus-4-1@20250805 | (pdf/text/image)-to-text |
| Claude 4 Opus | claude-opus-4@20250514 | (pdf/text/image)-to-text |
| Claude 4 Sonnet | claude-sonnet-4@20250514 | (pdf/text/image)-to-text |
| Claude 3.7 Sonnet | claude-3-7-sonnet@20250219 | (pdf/text/image)-to-text |
| Claude 3 Opus | claude-3-opus@20240229 | (text/image)-to-text |
| Claude 3.5 Sonnet v2 | claude-3-5-sonnet-v2@20241022 | (pdf/text/image)-to-text |
| Claude 3.5 Sonnet | claude-3-5-sonnet@20240620 | (pdf/text/image)-to-text |
| Claude 3.5 Haiku | claude-3-5-haiku@20241022 | (pdf/text)-to-text |
| Claude 3 Haiku | claude-3-haiku@20240307 | (text/image)-to-text |
| Imagen 4 | imagen-4.0-(generate-preview-06-06\|fast-generate-preview-06-06\|ultra-generate-preview-06-06\|generate-001\|fast-generate-001\|ultra-generate-001) | text-to-image |
| Imagen 3 | imagen-3.0-(generate-001\|generate-002\|fast-generate-001) | text-to-image |
| Imagen 2 | imagegeneration@005 | text-to-image |
| Veo 3.1 Fast Generate | veo-3.1-fast-generate-(001\|preview) | text-to-video |
| Veo 3.1 Generate | veo-3.1-generate-(001\|preview) | text-to-video |
| Veo 3.0 Fast Generate | veo-3.0-fast-generate-(001\|preview) | text-to-video |
| Veo 3.0 Generate | veo-3.0-generate-(001\|preview) | text-to-video |

Models that support /truncate_prompt also support the max_prompt_tokens chat completion request parameter.
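For instance, a request asking the adapter to truncate the prompt down to a 2000-token budget might look like this (a sketch; the message content is illustrative):

```json
{
  "messages": [{"role": "user", "content": "Summarize this conversation."}],
  "max_prompt_tokens": 2000
}
```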

Image editing in Gemini 2.5 Flash Image

Gemini 2.5 Flash Image and Gemini 3 Pro Image models support both image generation and image editing. This enables the following use case:

user: generate an image of a cat sitting on a sofa
assistant: <image attachment #1>
user: replace the cat with a dog
assistant: <image attachment #2>

This scenario works out of the box with API integrations.

However, it won't work for interactions via DIAL Chat, since DIAL Chat removes all attachments from assistant messages by default. To change this default behavior, set the assistantAttachmentsInRequestSupported flag to true in the DIAL Core configuration for the Gemini deployment in question:

{
  "models": {
    "dial-gemini-deployment-id": {
      "type": "chat",
      "endpoint": "${VERTEXAI_ADAPTER_ORIGIN}/openai/deployments/gemini-2.5-flash-image/chat/completions",
      "features": {
        "assistantAttachmentsInRequestSupported": true
      }
    }
  }
}

Configurable models

Certain models support configuration via the /configuration endpoint. A GET request to this endpoint returns the schema of the model configuration in JSON Schema format. Such models expect the custom_fields.configuration field of the chat/completions request to contain a JSON value that conforms to the schema. The custom_fields.configuration field is optional if and only if every field in the schema is optional too.
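The optionality rule can be illustrated with a small check: the configuration may be omitted exactly when the schema declares no required fields. The schema below is hypothetical, merely shaped like what a /configuration response might contain:

```python
def configuration_is_optional(schema: dict) -> bool:
    # custom_fields.configuration may be omitted iff the schema
    # marks no top-level field as required
    return not schema.get("required")

# Hypothetical schema resembling a /configuration response for an
# image model (field names are illustrative, not the real schema):
example_schema = {
    "type": "object",
    "properties": {
        "negative_prompt": {"type": "string"},
        "aspect_ratio": {"type": "string"},
    },
}
```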

Imagen models

The Imagen models support configuration of parameters specific to image generation, such as negative prompt, aspect ratio, and watermarking. See the complete list of configurable parameters in the Imagen API documentation.

{
  "messages": [{"role": "user", "content": "forest meadow"}],
  "custom_fields": {
    "configuration": {
      "add_watermark": false,
      "negative_prompt": "trees",
      "aspect_ratio": "16:9"
    }
  }
}
Veo models

The Veo models support configuration of parameters specific to video generation, such as aspect ratio, compression quality, and duration in seconds. See the complete list of configurable parameters in the Veo API documentation.

{
  "messages": [{"role": "user", "content": "forest meadow"}],
  "custom_fields": {
    "configuration": {
      "aspect_ratio": "16:9",
      "compression_quality": "optimized",
      "duration_seconds": 4
    }
  }
}
Gemini 2.5, Gemini 3 models

The Gemini 2.5 and Gemini 3 series models support configuration of the thinking parameters:

{
  "custom_fields": {
    "configuration": {
      "thinking": {
        "include_thoughts": true,
        "thinking_budget": 2048
      }
    }
  }
}

Thought summaries are printed to a dedicated Thinking stage when include_thoughts is set to true.

The token budget for thinking can be made unlimited by setting thinking_budget to -1.

Gemini 3 series models also support the thinking_level parameter, which can be set via the reasoning_effort field from the OpenAI API:

{
  "messages": [{"role": "user", "content": "Explain quantum computing in simple terms."}],
  "reasoning_effort": "none|minimum|low|medium|high"
}

Note

You cannot use both the reasoning_effort and thinking_budget parameters in the same request.

Gemini 2.5 Flash Image model

The Gemini 2.5 Flash Image model supports configuration of parameters controlling image generation:

{
  "custom_fields": {
    "configuration": {
      "image_config": {
        "aspect_ratio": "21:9",
        "image_size": "4K"
      }
    }
  }
}

Consult the documentation for the possible values of these parameters and their defaults.

Claude models

The Claude models accept a configuration flag that enables document citations in the generated output. The flag is false by default.

{
  "custom_fields": {
    "configuration": {
      "enable_citations": true
    }
  }
}

Not every Claude model supports citations. Refer to the official documentation before utilizing any flags.

Besides that, Claude models support beta flags. The full list of flags can be found in the Anthropic SDK.

The most notable beta flags are:

| Configuration | Comment | Scope |
|---------------|---------|-------|
| {"betas": ["token-efficient-tools-2025-02-19"]} | Token-efficient tool use | Claude 3.7 Sonnet |
| {"betas": ["output-128k-2025-02-19"]} | Extended output length | Claude 3.7 Sonnet |

Not every model supports all flags. Refer to the official documentation before utilizing any flags.

Google Search grounding

Gemini models support Grounding with Google Search. It's enabled by the google_search static tool:

{
  "messages": [
    {
      "role": "user",
      "content": "What are the latest GenAI news?"
    }
  ],
  "tools": [
    {
      "type": "static_function",
      "static_function": {
        "name": "google_search"
      }
    }
  ]
}

The response will include DIAL attachments with citations from relevant URLs fetched by Google Search.

Refer to the official documentation for the list of models supporting Google Search grounding.

Code Interpreter tool

Gemini models support the Code Interpreter tool. It's enabled by the code_execution static tool:

{
  "messages": [
    {
      "role": "user",
      "content": "What is the sum of the first 50 prime numbers? Generate and run code for the calculation, and make sure you get all 50."
    }
  ],
  "tools": [
    {
      "type": "static_function",
      "static_function": {
        "name": "code_execution"
      }
    }
  ]
}

The response will include a stage titled Code execution containing the code generated by the model and the result of its execution.


Embedding models

The following models support the POST $SERVER_ORIGIN/openai/deployments/$MODEL_ID/embeddings endpoint:

| Model | Model ID | Language support | Modality |
|-------|----------|------------------|----------|
| Gemini Embedding 2 | gemini-embedding-2-preview | English | (text/image/video/audio/pdf)-to-embedding |
| Gemini Embeddings | gemini-embedding-001 | Multilingual | text-to-embedding |
| Multimodal embeddings | multimodalembedding@001 | English | (text/image)-to-embedding |
| Embeddings for Text | text-embedding-(004\|005) | English | text-to-embedding |
| Embeddings for Text Multilingual | text-multilingual-embedding-002 | Multilingual | text-to-embedding |
| Gecko Embeddings for Text V1 | textembedding-gecko@001 | English | text-to-embedding |
| Gecko Embeddings for Text V3 | textembedding-gecko@003 | English | text-to-embedding |
| Gecko Embeddings for Text Multilingual | textembedding-gecko-multilingual@001 | Multilingual | text-to-embedding |

Gemini Embedding 2

The Gemini Embedding 2 model provides embeddings for text, images, video, audio and PDFs. Instruction prompts are not permitted for this family, so keep custom_fields.instruction unset.

The following example requests demonstrate how to express different modalities when calling POST /openai/deployments/gemini-embedding-2-preview/embeddings:

Single text (resulting in one embedding vector)
{
  "input": "Describe how solar panels generate electricity."
}
Two text strings (two vectors)
{
  "input": [
    "Explain transformers in one paragraph.",
    "Write a limerick about embeddings."
  ]
}

Multi-modal inputs are expressed as DIAL attachments placed inside the custom_input array. Each attachment object must supply a MIME type and either url (pointing to DIAL Storage, a public URL, or base64-encoded data as a data URL) or data (containing base64-encoded data).

Image attachment (one vector)
{
  "input": [],
  "custom_input": [
    {
      "type": "image/jpeg",
      "url": "https://example.com/media/robot.jpg"
    }
  ]
}
Video attachment (one vector)
{
  "input": [],
  "custom_input": [
    {
      "type": "video/mp4",
      "url": "https://example.com/media/product-demo.mp4"
    }
  ]
}
Audio attachment (one vector)
{
  "input": [],
  "custom_input": [
    {
      "type": "audio/mpeg",
      "url": "https://example.com/media/podcast-intro.mp3"
    }
  ]
}
PDF attachment (one vector)
{
  "input": [],
  "custom_input": [
    {
      "type": "application/pdf",
      "url": "https://example.com/media/security-whitepaper.pdf"
    }
  ]
}
Image and audio attachments separately (two vectors)
{
  "input": [],
  "custom_input": [
    {
      "type": "image/png",
      "url": "https://example.com/media/dog.png"
    },
    {
      "type": "audio/mpeg",
      "url": "https://example.com/media/dog-bark.mp3"
    }
  ]
}
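Attachments may also carry the payload inline via the data field instead of url; the base64 string below is a placeholder, not real image bytes:

```json
{
  "input": [],
  "custom_input": [
    {
      "type": "image/png",
      "data": "<base64-encoded PNG bytes>"
    }
  ]
}
```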

Multiple multi-modal components can be combined into a single composite input for the embedding model:

Audio and PDF attachments together (one vector)
{
  "input": [],
  "custom_input": [
    [
      {
        "type": "audio/mpeg",
        "url": "https://example.com/media/meeting-recording.mp3"
      },
      {
        "type": "application/pdf",
        "url": "https://example.com/media/meeting-summary.pdf"
      }
    ]
  ]
}

The first text string in a multi-component input is interpreted as a title. A title can only be used when the task type is RETRIEVAL_DOCUMENT:

Text with a title (one vector)
{
  "input": [],
  "custom_input": [
    [
      "Incident response playbook",
      "Step 1: Notify on-call.",
      "Step 2: Capture relevant logs."
    ]
  ],
  "custom_fields": {
    "type": "RETRIEVAL_DOCUMENT"
  }
}
Image with a title (one vector)
{
  "input": [],
  "custom_input": [
    [
      "Device schematic",
      {
        "type": "image/png",
        "url": "https://example.com/media/schematic.png"
      }
    ]
  ],
  "custom_fields": {
    "type": "RETRIEVAL_DOCUMENT"
  }
}

The model supports a configurable dimensions parameter, which defaults to 3072.

Text (one vector of the specified length)
{
  "input": "Summarize reinforcement learning in two sentences.",
  "dimensions": 768
}

The model supports various task types:

Text with task type equal SEMANTIC_SIMILARITY (one vector)
{
  "input": "List practical uses for cosine similarity.",
  "custom_fields": {
    "type": "SEMANTIC_SIMILARITY"
  }
}

The model returns normalized embedding vectors.
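Normalized means the vectors have an L2 norm of approximately 1, so a dot product of two embeddings is directly their cosine similarity. A small sketch of the check, using a hypothetical 3-dimensional vector in place of a real embedding:

```python
import math

def is_normalized(vec, tol=1e-6):
    # A normalized embedding has an L2 norm of ~1.0
    return abs(math.sqrt(sum(x * x for x in vec)) - 1.0) < tol

# Hypothetical 3-dimensional vector standing in for a real embedding:
embedding = [0.6, 0.8, 0.0]
```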


Environment variables

Copy .env.example to .env and customize it for your environment:

| Variable | Default | Description |
|----------|---------|-------------|
| GOOGLE_APPLICATION_CREDENTIALS | | Filepath to JSON with credentials |
| DEFAULT_REGION | | Default region for Vertex AI (e.g. "us-central1") |
| GCP_PROJECT_ID | | GCP project ID |
| LOG_LEVEL | INFO | Log level. Use DEBUG for dev purposes and INFO in prod |
| AIDIAL_LOG_LEVEL | WARNING | AI DIAL SDK log level |
| WEB_CONCURRENCY | 1 | Number of workers for the server |
| DIAL_URL | | URL of the core DIAL server. Optional. Used to access images stored in the DIAL File storage |
| COMPATIBILITY_MAPPING | {} | Deprecated in favour of the compatibility configuration in the DIAL Core config. A JSON dictionary that maps VertexAI deployments that aren't supported by the Adapter to VertexAI deployments that are supported by the Adapter (see the Supported models section). Find more details in the compatibility mode section. |
| CLAUDE_DEFAULT_MAX_TOKENS | 1536 | The default value of the max_tokens chat completion parameter if it is not provided in the request. ⚠️ Using the variable is discouraged. Consider configuring the default in the DIAL Core Config instead, as demonstrated in the example below. |
| GOOGLE_GENAI_MAX_RETRY_ATTEMPTS | 0 | How many times to retry Google GenAI chat model requests when the provider returns a retriable error |
| ANTHROPIC_MAX_RETRY_ATTEMPTS | 0 | How many times to retry Anthropic chat model requests when the provider returns a retriable error |
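A minimal .env for local development might look like this (all values are placeholders for your own project):

```env
GOOGLE_APPLICATION_CREDENTIALS=/path/to/credentials.json
DEFAULT_REGION=us-central1
GCP_PROJECT_ID=my-gcp-project
LOG_LEVEL=DEBUG
```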

Default max_tokens for Claude models

Unlike Gemini models, Claude models require the max_tokens parameter in the chat completion request.

We recommend configuring max_tokens default value on a per-model basis in the DIAL Core Config, for example:

{
    "models": {
        "dial-claude-deployment-id": {
            "type": "chat",
            "description": "...",
            "endpoint": "...",
            "defaults": {
                "max_tokens": 2048
            }
        }
    }
}

If the default is missing in the DIAL Core Config, it will be taken from the CLAUDE_DEFAULT_MAX_TOKENS environment variable. However, we strongly recommend not relying on this variable and instead configuring the defaults in the DIAL Core Config. Such a per-model configuration is operationally cleaner, since all the information relevant to tokens (like pricing and token limits) is kept in the same place.

The default value set in the DIAL Core Config takes precedence over the one configured in the adapter.

Make sure the default doesn't exceed Claude's maximum output tokens; otherwise, you will receive an error like max_tokens: 10000 > 8192, which is the maximum allowed number of output tokens for claude-3....

Compatibility mode

The Adapter supports a predefined list of VertexAI deployments, listed in the Supported models section. These models can be accessed via the /openai/deployments/$MODEL_ID/(chat/completions|embeddings) endpoints. The Adapter won't recognize any other deployment name, and such requests will result in a 404 error.

Now, suppose VertexAI has just released a new version of a model, e.g. gemini-2.0-flash-006 that is a better version of an older gemini-2.0-flash-001 model.

Immediately after the release, the former model is unsupported by the Adapter, but the latter is supported. Therefore, the request to openai/deployments/gemini-2.0-flash-006/chat/completions will result in 404 error.

It will take some time for the Adapter to catch up with VertexAI - support the v6 model and publish the release with the fix.

What to do in the meantime? Presumably, the v6 model is backward compatible with v1, so we may try to run v6 in compatibility mode: convince the Adapter to process a v6 request as if it were a v1 request, with the only difference that the final upstream request to GCP VertexAI goes to v6 and not v1.

There are two ways to enable compatibility mode in the adapter.

Compatibility configuration in DIAL Core config

Since: 0.32.0

It's possible to define a compatible model on a per-upstream basis in the DIAL Core configuration.

E.g. the following configuration enables the gemini-2.0-flash-006 model (a hypothetical model that isn't supported by the Adapter natively) via the gemini-2.0-flash-001 model (which is supported by the Adapter natively):

{
  "models": {
    "dial-deployment-id-for-gemini-2": {
      "type": "chat",
      "endpoint": "${ADAPTER_ORIGIN}/deployments/gemini-2.0-flash-006/chat/completions",
      "upstreams": [
        {
          "extraData": {
            "compatible_model_id": "gemini-2.0-flash-001"
          }
        }
      ]
    }
  }
}

The given configuration enables the adapter to handle requests to the gemini-2.0-flash-006 deployment. The requests will be processed by the same pipeline as gemini-2.0-flash-001, but the call to GCP VertexAI will target the gemini-2.0-flash-006 deployment name.

Naturally, this will only work if the APIs of v1 and v6 deployments are compatible:

  1. The requests utilizing the modalities supported by both v1 and v6 will work just fine.
  2. However, the requests with modalities that are supported by v6 and aren't supported by v1, won't be processed correctly. You will have to wait until the adapter supports the v6 deployment natively.

When a version of the adapter supporting the v6 model is released, you may migrate to it and safely remove the compatible_model_id from the DIAL Core config.

Note that setting compatible_model_id=imagen-4.0-generate-001 would be ineffectual, since the APIs and capabilities of the two models are drastically different.

Important

If the DIAL deployment has many upstreams, the compatible_model_id field should be set in all of the upstreams.

Compatibility configuration in Adapter

The COMPATIBILITY_MAPPING env variable enables compatibility mode at the adapter level. It holds a mapping from unsupported deployment IDs to supported deployment IDs.

E.g. the following mapping enables gemini-2.0-flash-006 via gemini-2.0-flash-001:

COMPATIBILITY_MAPPING={"gemini-2.0-flash-006": "gemini-2.0-flash-001"}
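The variable holds plain JSON, so the resolution logic amounts to a dictionary lookup. A hedged sketch of how an incoming deployment ID could be resolved through it (the resolve helper is illustrative, not the adapter's actual code):

```python
import json
import os

# The mapping from the environment variable, as in the example above:
os.environ["COMPATIBILITY_MAPPING"] = (
    '{"gemini-2.0-flash-006": "gemini-2.0-flash-001"}'
)
mapping = json.loads(os.environ.get("COMPATIBILITY_MAPPING", "{}"))

def resolve(deployment_id: str) -> str:
    # Mapped ids are processed as their supported counterparts;
    # unmapped ids pass through unchanged
    return mapping.get(deployment_id, deployment_id)
```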

Important

Model compatibility configuration using the COMPATIBILITY_MAPPING environment variable has been deprecated since 0.32.0 in favor of configuration in DIAL Core. While still supported for now, its use is discouraged and it may be removed in a future release.

Load balancing

If you use the DIAL Core load balancing mechanism, you can provide the extraData upstream setting with the region and project to use for a particular upstream:

{
  "upstreams": [
    {
      "extraData": {
        "project": "project1",
        "region": "us-central1"
      }
    },
    {
      "extraData": {
        "project": "project1",
        "region": "us-east5"
      }
    },
    {
      "extraData": {
        "project": "project2"
      }
    },
    {
      "key": "api-key"
    }
  ]
}

The fields in the extra data override the corresponding environment variables:

| extraData field | Env variable |
|-----------------|--------------|
| region | DEFAULT_REGION |
| project | GCP_PROJECT_ID |
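The precedence rule can be sketched in a few lines: per-upstream extraData wins, and the environment variables serve as fallbacks (the resolve_location helper is illustrative, not the adapter's actual code):

```python
import os

# Global defaults normally come from the environment:
os.environ["DEFAULT_REGION"] = "us-central1"
os.environ["GCP_PROJECT_ID"] = "project0"

def resolve_location(extra_data: dict) -> tuple[str, str]:
    # Per-upstream extraData takes precedence over the env defaults
    region = extra_data.get("region") or os.environ.get("DEFAULT_REGION", "")
    project = extra_data.get("project") or os.environ.get("GCP_PROJECT_ID", "")
    return region, project
```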

Note

The region and project configuration is only supported for Gemini>=2 and Anthropic models.

Global endpoint

Use the global region to enable the global endpoint:

{
  "upstreams": [
    {
      "extraData": {
        "region": "global"
      }
    }
  ]
}

Note

The global endpoint is supported only for certain models and has a few other limitations.

Prompt caching

Implicit caching

Gemini 2.5 models support implicit context caching.

Any request over a certain number of tokens will be automatically cached. The token threshold triggering caching is 1024 for Gemini 2.5 Flash and 4096 for Gemini 2.5 Pro.

Set the autoCachingSupported flag in the DIAL Core config for the deployment of interest to enable this feature:

{
  "models": {
    "my-dial-gemini-deployment": {
      "type": "chat",
      "displayName": "Gemini 2.5 Flash",
      "endpoint": "${VERTEXAI_ADAPTER_ORIGIN}/openai/deployments/gemini-2.5-flash/chat/completions",
      "upstreams": [
        {
          "extraData": {
            "region": "us-central1"
          }
        },
        {
          "extraData": {
            "region": "us-east5"
          }
        }
      ],
      "features": {
        "autoCachingSupported": true
      }
    }
  }
}

On a cache hit, the usage.prompt_tokens_details.cached_tokens field reports the number of cached prompt tokens.

Authentication

GCP Vertex AI

Access to GCP Vertex AI is authenticated via Application Default Credentials (ADC) with region and project configured either:

  1. globally via DEFAULT_REGION and GCP_PROJECT_ID environment vars, or
  2. on a per upstream basis via upstreams.extraData fields in DIAL Core Config.

Anthropic API / Google AI Platform

Gemini>=2 and Anthropic deployments can be accessed via an API key. The API keys should be configured per upstream in the DIAL Core config:

{
  "models": {
    "gemini-dial-deployment-id": {
      "endpoint": "${ADAPTER_ORIGIN}/deployments/gemini-2.0-flash-lite-001/chat/completions",
      "upstreams": [
        {
          "key": "gemini-api-key"
        }
      ]
    },
    "claude-dial-deployment-id": {
      "endpoint": "${ADAPTER_ORIGIN}/deployments/claude-3-5-sonnet-20241022/chat/completions",
      "upstreams": [
        {
          "key": "anthropic-api-key",
          "extraData": {
            "compatible_model_id": "claude-3-5-sonnet-v2@20241022"
          }
        }
      ]
    }
  }
}

Keep in mind that the same Anthropic models have different identifiers in the Anthropic API and GCP Vertex AI.

E.g. claude-3-5-sonnet-v2@20241022 in GCP Vertex AI corresponds to claude-3-5-sonnet-20241022 in the Anthropic API.

The VertexAI adapter uses model names from GCP Vertex AI. Therefore, in order to use an Anthropic API model name, you need to specify the corresponding GCP Vertex AI name in the compatible_model_id field. Otherwise, the adapter returns a 404 error.

Anthropic Foundry

The adapter supports access to Claude models from the Azure Foundry service.

{
  "models": {
    "claude-dial-deployment-id": {
      "endpoint": "${VERTEXAI_ADAPTER_ORIGIN}/openai/deployments/claude-sonnet-4-520250929/chat/completions",
      "upstreams": [
        {
          "endpoint": "https://${AZURE_FOUNDRY_RESOURCE_NAME1}.services.ai.azure.com/anthropic/v1/messages",
          "key": "optional-azure-foundry-api-key1"
        },
        {
          "endpoint": "https://${AZURE_FOUNDRY_RESOURCE_NAME2}.services.ai.azure.com/anthropic/v1/messages",
          "key": "optional-azure-foundry-api-key2"
        }
      ]
    }
  }
}

The DefaultAzureCredential is used to authenticate requests to Azure when an API key is not provided in the upstream configuration.

Since the model names in Azure Foundry differ from those in GCP Vertex AI, you need to map them onto supported deployment names using the compatibility mapping:

COMPATIBILITY_MAPPING={"claude-sonnet-4-520250929":"claude-sonnet-4-5@20250929"}

Development

Development Environment

This project requires Python ≥3.11 and Poetry ≥2.1.1 for dependency management.

Setup

  1. Install Poetry. See the official installation guide.

  2. (Optional) Specify custom Python or Poetry executables in .env.dev. This is useful if multiple versions are installed. By default, python and poetry are used.

    POETRY_PYTHON=path-to-python-exe
    POETRY=path-to-poetry-exe
  3. Create and activate the virtual environment:

    make init_env
    source .venv/bin/activate
  4. Install project dependencies (including linting, formatting, and test tools):

    make install

IDE configuration

The recommended IDE is VS Code. Open the project in VS Code and install the recommended extensions. VS Code is configured to use the Ruff formatter.

Alternatively, you can use PyCharm, which has built-in Ruff support.

Make on Windows

As of now, Windows distributions do not include the make tool. To run make commands, the tool can be installed using the following command (since Windows 10):

winget install GnuWin32.Make

For convenience, the tool folder can be added to the PATH environment variable as C:\Program Files (x86)\GnuWin32\bin. The command definitions inside Makefile should be cross-platform to keep the development environment setup simple.

Run

Run the development server locally:

make serve

Run the server from a Docker container:

make docker_serve

Lint

Don't forget to run linting before committing:

make lint

To auto-fix formatting issues run:

make format

Test

To run the unit tests locally:

make test

To run the unit tests from the Docker container:

make docker_test

To run the integration tests locally:

make integration_tests

Clean

To remove the virtual environment and build artifacts:

make clean
