Conversation

@lramos15 lramos15 commented Jan 9, 2026

Copilot AI review requested due to automatic review settings January 9, 2026 15:43
@lramos15 lramos15 enabled auto-merge January 9, 2026 15:43
@lramos15 lramos15 self-assigned this Jan 9, 2026
@lramos15 lramos15 requested a review from bhavyaus January 9, 2026 15:43
@vs-code-engineering vs-code-engineering bot added this to the January 2026 milestone Jan 9, 2026
Copilot AI left a comment

Pull request overview

This PR implements extension-contributed tokenization by creating a new ExtensionContributedChatTokenizer class that leverages the VS Code Language Model API for token counting. This replaces the previous approach where extension-contributed endpoints relied on the generic ITokenizerProvider service.

Key Changes

  • Adds ExtensionContributedChatTokenizer that delegates token counting to the VS Code language model API
  • Removes ITokenizerProvider dependency from ExtensionContributedChatEndpoint
  • Adds comprehensive unit tests for the new tokenizer implementation

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

  • src/platform/endpoint/vscode-node/extChatTokenizer.ts: New tokenizer implementation that uses VS Code's LanguageModelChat.countTokens API for token counting, with special handling for different content part types
  • src/platform/endpoint/vscode-node/test/extChatTokenizer.spec.ts: Comprehensive test suite covering string tokens, content parts, messages, and tools, with mock implementations
  • src/platform/endpoint/vscode-node/extChatEndpoint.ts: Updated to instantiate ExtensionContributedChatTokenizer directly, removing the ITokenizerProvider dependency injection
Comments suppressed due to low confidence (2)

src/platform/endpoint/vscode-node/test/extChatTokenizer.spec.ts:125

  • The countMessageTokens method returns 0 when convertToApiChatMessage returns an empty array, but this case is not tested. Since convertToApiChatMessage can skip certain content parts (like non-base64 images), it's possible for valid input messages to result in empty API messages. Consider adding a test case that verifies this behavior, such as a message with only non-base64 image URLs that get skipped during conversion.
	describe('countMessageTokens', () => {
		it('should count tokens for a user message', async () => {
			const message: Raw.ChatMessage = {
				role: Raw.ChatRole.User,
				content: [{ type: Raw.ChatCompletionContentPartKind.Text, text: 'Hello there' }]
			};
			const result = await tokenizer.countMessageTokens(message);
			// BaseTokensPerMessage (3) + message content tokens
			expect(result).toBeGreaterThanOrEqual(3);
		});

		it('should count tokens for an assistant message', async () => {
			const message: Raw.ChatMessage = {
				role: Raw.ChatRole.Assistant,
				content: [{ type: Raw.ChatCompletionContentPartKind.Text, text: 'I can help with that' }]
			};
			const result = await tokenizer.countMessageTokens(message);
			expect(result).toBeGreaterThanOrEqual(3);
		});

		it('should count tokens for a system message', async () => {
			const message: Raw.ChatMessage = {
				role: Raw.ChatRole.System,
				content: [{ type: Raw.ChatCompletionContentPartKind.Text, text: 'You are a helpful assistant' }]
			};
			const result = await tokenizer.countMessageTokens(message);
			expect(result).toBeGreaterThanOrEqual(3);
		});
	});

src/platform/endpoint/vscode-node/extChatTokenizer.ts:114

  • The _countObjectTokens method doesn't handle arrays correctly. When it encounters an array (which is typeof 'object'), it will cast it to Record<string, unknown> and use Object.entries, which will iterate over array indices as keys. This could lead to incorrect token counting for tool parameters that contain arrays. Consider adding explicit handling for arrays, such as checking Array.isArray(value) before the typeof check and iterating over array elements appropriately.
			} else if (typeof value === 'object') {
				numTokens += await this._countObjectTokens(value as Record<string, unknown>);
			}

	private async _countObjectTokens(obj: Record<string, unknown>): Promise<number> {
		let numTokens = 0;
		for (const [key, value] of Object.entries(obj)) {
			if (!value) {

Copilot AI Jan 9, 2026


The check if (!value) on line 105 will skip over falsy values including 0, false, empty strings, etc. While this may be intentional for null/undefined, it could incorrectly skip legitimate values. For example, a tool parameter with a default value of 0 or false would not contribute to the token count. Consider using a more explicit check like if (value === null || value === undefined) to only skip truly absent values.

This issue also appears in the following locations of the same file:

  • line 112
Suggested change
if (!value) {
if (value === null || value === undefined) {
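Taken together, the two suggestions above (explicit array handling and skipping only truly absent values) amount to one recursive counter. The sketch below is illustrative rather than the PR's code: countObjectTokens and countValueTokens stand in for the private _countObjectTokens, and a simple whitespace word count stands in for the language model's countTokens.

```typescript
// Illustrative stand-in for the language model's countTokens
// (approximated here by a whitespace word count).
async function countTokens(text: string): Promise<number> {
	return text.split(/\s+/).filter(Boolean).length;
}

// Count tokens for any JSON-like value: arrays are walked element by
// element (rather than falling into Object.entries over indices),
// objects are recursed into, and primitives such as 0 and false are
// stringified rather than skipped by a falsy check.
async function countValueTokens(value: unknown): Promise<number> {
	if (value === null || value === undefined) {
		return 0;
	}
	if (typeof value === 'string') {
		return countTokens(value);
	}
	if (Array.isArray(value)) {
		let numTokens = 0;
		for (const element of value) {
			numTokens += await countValueTokens(element);
		}
		return numTokens;
	}
	if (typeof value === 'object') {
		return countObjectTokens(value as Record<string, unknown>);
	}
	return countTokens(String(value)); // numbers, booleans
}

async function countObjectTokens(obj: Record<string, unknown>): Promise<number> {
	let numTokens = 0;
	for (const [key, value] of Object.entries(obj)) {
		if (value === null || value === undefined) {
			continue; // skip only truly absent values, not 0/false/''
		}
		numTokens += await countTokens(key) + await countValueTokens(value);
	}
	return numTokens;
}
```

With this shape, a schema like { required: ['field1', 'field2'], maxLength: 100, deprecated: false } contributes tokens for every key and value, while null entries are ignored.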

Comment on lines +154 to +193
	describe('countToolTokens', () => {
		it('should count tokens for a single tool', async () => {
			const tools = [{
				name: 'get_weather',
				description: 'Get the current weather',
				inputSchema: {
					type: 'object',
					properties: {
						location: { type: 'string' }
					}
				}
			}];
			const result = await tokenizer.countToolTokens(tools);
			// baseToolTokens (16) + baseTokensPerTool (8) + object tokens * 1.1
			expect(result).toBeGreaterThan(24);
		});

		it('should count tokens for multiple tools', async () => {
			const tools = [
				{
					name: 'get_weather',
					description: 'Get weather info',
					inputSchema: { type: 'object' }
				},
				{
					name: 'search',
					description: 'Search the web',
					inputSchema: { type: 'object' }
				}
			];
			const result = await tokenizer.countToolTokens(tools);
			// baseToolTokens (16) + 2 * baseTokensPerTool (8) + object tokens
			expect(result).toBeGreaterThan(32);
		});

		it('should return 0 for empty tools array', async () => {
			const result = await tokenizer.countToolTokens([]);
			expect(result).toBe(0);
		});
	});

Copilot AI Jan 9, 2026


Test coverage is missing for tool schemas with non-string/non-object values (numbers, booleans, null). These values should be considered when counting tokens, but the current implementation may skip them due to the falsy check or lack of handling. Consider adding tests for tool schemas with:

  1. Numeric values (e.g., maxLength: 100)
  2. Boolean values (e.g., required: true)
  3. Null values
  4. Arrays (e.g., required: ['field1', 'field2'])
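The array case in point 4 is worth pinning down with a test, because Object.entries treats an array as an object keyed by indices. A Record<string, unknown> cast would therefore feed the index strings to the token counter as if they were schema keys:

```typescript
// Object.entries on an array yields index/value pairs, so naive
// Record<string, unknown> handling would see keys '0' and '1'
// instead of iterating over the bare elements.
const required = ['field1', 'field2'];
const entries = Object.entries(required);
console.log(entries); // → [['0', 'field1'], ['1', 'field2']]
```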

Comment on lines +42 to +49
		if (text.imageUrl.url.startsWith('data:image/')) {
			try {
				return calculateImageTokenCost(text.imageUrl.url, text.imageUrl.detail);
			} catch {
				return this._textTokenLength(text.imageUrl.url);
			}
		}
		return this._textTokenLength(text.imageUrl.url);

Copilot AI Jan 9, 2026


There's no null safety check for text.imageUrl before accessing text.imageUrl.url. If the imageUrl property is undefined or null, this will throw a runtime error. Consider adding a check like if (text.imageUrl?.url.startsWith('data:image/')) or handling the case where imageUrl might be missing.

Suggested change
		if (text.imageUrl.url.startsWith('data:image/')) {
			try {
				return calculateImageTokenCost(text.imageUrl.url, text.imageUrl.detail);
			} catch {
				return this._textTokenLength(text.imageUrl.url);
			}
		}
		return this._textTokenLength(text.imageUrl.url);
		if (!text.imageUrl || !text.imageUrl.url) {
			return 0;
		}
		{
			const imageUrl = text.imageUrl;
			if (imageUrl.url.startsWith('data:image/')) {
				try {
					return calculateImageTokenCost(imageUrl.url, imageUrl.detail);
				} catch {
					return this._textTokenLength(imageUrl.url);
				}
			}
			return this._textTokenLength(imageUrl.url);
		}

Comment on lines +88 to +94
		it('should return 0 for cache breakpoint content parts', async () => {
			const cacheBreakpoint: Raw.ChatCompletionContentPart = {
				type: Raw.ChatCompletionContentPartKind.CacheBreakpoint
			};
			const result = await tokenizer.tokenLength(cacheBreakpoint);
			expect(result).toBe(0);
		});

Copilot AI Jan 9, 2026


The test suite is missing test coverage for the Image content part type. The implementation in extChatTokenizer.ts handles Image content parts with special logic for base64 data URLs and calls calculateImageTokenCost. Consider adding tests for:

  1. Image parts with base64 data URLs (data:image/...)
  2. Image parts with regular URLs
  3. Image parts where calculateImageTokenCost throws an exception

This issue also appears in the following locations of the same file:

  • line 97
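As a rough illustration of the three branches those tests would exercise, here is a self-contained sketch. calculateImageTokenCost and textTokenLength are simplified stand-ins (a flat 85-token cost and a chars/4 heuristic, both assumptions), not the real helpers from extChatTokenizer.ts.

```typescript
// Stand-in for the real image cost helper: rejects anything that is
// not a data URL and otherwise returns a flat cost (an assumption).
function calculateImageTokenCost(url: string): number {
	if (!url.startsWith('data:image/')) throw new Error('not a data URL');
	return 85;
}

// Stand-in for the tokenizer's text fallback: a rough
// four-characters-per-token heuristic.
function textTokenLength(text: string): number {
	return Math.ceil(text.length / 4);
}

// Mirrors the branching under review: base64 data URLs go through the
// image cost calculation, regular URLs (or a thrown error) fall back
// to counting the URL as text, and a missing URL counts as zero.
function imagePartTokens(url: string | undefined): number {
	if (!url) return 0;
	if (url.startsWith('data:image/')) {
		try {
			return calculateImageTokenCost(url);
		} catch {
			return textTokenLength(url);
		}
	}
	return textTokenLength(url);
}
```

Each suggested test case maps onto one of these paths: a data URL hits the image cost, a regular https URL hits the text fallback, and an exception inside the try block also lands in the fallback.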

		// Convert to VS Code message format and use the language model's countTokens
		const apiMessages = convertToApiChatMessage([message]);
		if (apiMessages.length === 0) {
			return 0;

Copilot AI Jan 9, 2026


When convertToApiChatMessage returns an empty array (e.g., for messages with only non-base64 images), this method returns 0. However, this behavior may not accurately represent the token cost of such messages, as the empty result is due to content filtering during conversion rather than the message truly being empty. Consider whether returning BaseTokensPerMessage (3) would be more appropriate in this case, or adding a comment to document this edge case behavior.

Suggested change
		return 0;
		// Edge case: convertToApiChatMessage can return an empty array when the message
		// only contains filtered content (e.g., non-base64 images). In that case, we still
		// account for the base per-message token overhead.
		return BaseTokensPerMessage;

@lramos15 lramos15 added this pull request to the merge queue Jan 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 12, 2026
@lramos15 lramos15 added this pull request to the merge queue Jan 13, 2026
Merged via the queue into main with commit abf24aa Jan 13, 2026
24 checks passed
@lramos15 lramos15 deleted the lramos15/legislative-bat branch January 13, 2026 16:58


Development

Successfully merging this pull request may close these issues.

Issue: LanguageModelChatProvider.provideTokenCount is never invoked
