
Conversation

@Guikingone (Contributor) commented Sep 3, 2025

Q               A
Bug fix?        no
New feature?    yes
Docs?           yes
Issues          Somehow related to #337
License         MIT

Hi 👋🏻

This PR aims to introduce a caching layer for the Ollama platform (as OpenAI, Anthropic and others already have).

@carsonbot added the labels Feature (New feature), Platform (Issues & PRs about the AI Platform component) and Status: Needs Review on Sep 3, 2025
@Guikingone force-pushed the ollama/prompt_caching branch from 51fa81c to 1fa590a on September 3, 2025 12:04
@OskarStark changed the title from "[Platform] Add Ollama prompt cache" to "[Platform][Ollama] Add prompt cache" on Sep 3, 2025
@Guikingone force-pushed the ollama/prompt_caching branch 2 times, most recently from bf5a1fe to 5ef4417 on September 3, 2025 12:18
Comment on lines 29 to 39
$result = $agent->call($messages, [
    'prompt_cache_key' => 'chat',
]);

echo $result->getContent().\PHP_EOL;

$secondResult = $agent->call($messages, [
    'prompt_cache_key' => 'chat',
]);

echo $secondResult->getContent().\PHP_EOL;
Contributor:
How can we ensure that it really uses the cache and doesn't just return the exact same answer twice?
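For illustration, one way to check would be to assert on the metadata this PR attaches to results, assuming the cached and prompt_cache_key keys shown in the diff further down:

// Sketch: a cache hit should be observable through the result metadata,
// not only through identical content.
$secondResult = $agent->call($messages, [
    'prompt_cache_key' => 'chat',
]);

if (true === $secondResult->getMetadata()->get('cached')) {
    echo 'Served from cache for key: '.$secondResult->getMetadata()->get('prompt_cache_key').\PHP_EOL;
}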

@Guikingone force-pushed the ollama/prompt_caching branch from 5ef4417 to cc5f431 on September 5, 2025 11:57
->arrayNode('ollama')
    ->children()
        ->scalarNode('host_url')->defaultValue('http://127.0.0.1:11434')->end()
        ->scalarNode('cache')->end()
Contributor:

We might end up with the same cache config repeated again and again in every platform.

Should we introduce a cache config key at a higher level?

Contributor (Author):

That's a good question. IMHO we should introduce a "root" key for it and allow overriding it per platform. @OskarStark @chr-hertel, any ideas?
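For illustration, a minimal sketch of what a root-level cache key with a per-platform override could look like in the bundle configuration (node layout and names are assumptions, not the final design):

->children()
    // Hypothetical root-level default, shared by all platforms.
    ->scalarNode('cache')->info('Default cache pool service id')->end()
    ->arrayNode('platform')
        ->children()
            ->arrayNode('ollama')
                ->children()
                    ->scalarNode('host_url')->defaultValue('http://127.0.0.1:11434')->end()
                    // Per-platform override of the root-level default.
                    ->scalarNode('cache')->end()
                ->end()
            ->end()
        ->end()
    ->end()
->end()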

@Guikingone force-pushed the ollama/prompt_caching branch 3 times, most recently from 8e15d3d to 46bca63 on September 5, 2025 15:31
@Guikingone force-pushed the ollama/prompt_caching branch 4 times, most recently from 914eddf to 8112ab2 on September 15, 2025 08:57
Comment on lines +82 to +85
$metadata->add('cached', true);
$metadata->add('prompt_cache_key', $options['prompt_cache_key']);
$metadata->add('cached_prompt_count', $data['prompt_eval_count']);
$metadata->add('cached_completion_count', $data['eval_count']);
Contributor:

Wouldn't it make sense to group this data into a DTO, like the existing TokenUsage, and add that DTO to the metadata? Or perhaps even reuse said DTO?

Contributor (Author):

Not convinced about using an object here; we're only storing integers, so I don't see the benefit, to be honest 🤔

@OskarStark @chr-hertel Any thoughts?

Member:

I agree, it would be great to have an object like CacheUsage, similar to TokenUsage.
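For illustration, a minimal sketch of such a value object, mirroring the metadata keys from the diff above (class and property names are assumptions):

final readonly class CacheUsage
{
    public function __construct(
        public string $promptCacheKey,
        public int $cachedPromptCount,
        public int $cachedCompletionCount,
    ) {
    }
}

// One structured metadata entry instead of four scalar ones:
$metadata->add('cache_usage', new CacheUsage(
    $options['prompt_cache_key'],
    $data['prompt_eval_count'],
    $data['eval_count'],
));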

Comment on lines +201 to +230
$firstCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);

$result = $firstCall->getResult();

$this->assertSame('Hello world', $result->getContent());
$this->assertSame(10, $result->getMetadata()->get('cached_prompt_count'));
$this->assertSame(10, $result->getMetadata()->get('cached_completion_count'));

$secondCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);

$secondResult = $secondCall->getResult();

$this->assertSame('Hello world', $secondResult->getContent());
Contributor:

Suggested change (run both invocations first, then assert on both results):

$firstCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);

$secondCall = $platform->invoke(new Ollama(Ollama::LLAMA_3_2), [
    'messages' => [
        [
            'role' => 'user',
            'content' => 'Say hello world',
        ],
    ],
    'model' => 'llama3.2',
], [
    'prompt_cache_key' => 'foo',
]);

$firstResult = $firstCall->getResult();
$secondResult = $secondCall->getResult();

$this->assertSame('Hello world', $firstResult->getContent());
$this->assertSame(10, $firstResult->getMetadata()->get('cached_prompt_count'));
$this->assertSame(10, $firstResult->getMetadata()->get('cached_completion_count'));
$this->assertSame('Hello world', $secondResult->getContent());

@Guikingone force-pushed the ollama/prompt_caching branch 3 times, most recently from de85ef7 to e038555 on September 23, 2025 11:56
@chr-hertel (Member):

Let's zoom out a bit here, for two reasons:

  1. doesn't Ollama cache already?
  2. if we want to have it in user land, why only Ollama?

@Guikingone (Author):

> doesn't Ollama cache already?

Ollama does "context caching" and/or K/V caching: it stores the X latest messages for the model's context window (or pending tokens to speed up TTFT). It's not a cache that returns the generated response if the request already exists.

> if we want to have it in user land, why only Ollama?

Well, because it's the one I use the most and the easiest to implement first, but we can integrate it for every platform if that's the question; we just need to use the API contract, which both Anthropic and OpenAI already support natively 🤔

If the question is "could we implement it at the platform layer for every platform without relying on API calls?", that's not a big deal to be honest, and we could easily integrate it 🙂

@chr-hertel (Member):

What do you think about having it as a decorator, CachedPlatform or similar?

@Guikingone (Author):

I like the idea of CachedPlatform; it looks and sounds like HttpCache. I'll rewrite it 👍🏻
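For illustration, a rough sketch of what such a decorator could look like, assuming a PlatformInterface with an invoke() method, a PSR-6 cache pool, and a cacheable result; all names are illustrative, not the final API:

use Psr\Cache\CacheItemPoolInterface;

final class CachedPlatform implements PlatformInterface
{
    public function __construct(
        private PlatformInterface $platform,
        private CacheItemPoolInterface $cache,
    ) {
    }

    public function invoke(Model $model, array|string $input, array $options = []): ResultPromise
    {
        // No explicit key, nothing to cache against: delegate directly.
        if (!isset($options['prompt_cache_key'])) {
            return $this->platform->invoke($model, $input, $options);
        }

        $item = $this->cache->getItem($options['prompt_cache_key']);

        if ($item->isHit()) {
            return $item->get();
        }

        $result = $this->platform->invoke($model, $input, $options);

        $item->set($result);
        $this->cache->save($item);

        return $result;
    }
}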

@Guikingone force-pushed the ollama/prompt_caching branch from e038555 to 194f7a4 on September 29, 2025 15:51
if ('ollama' === $type) {
    $arguments = [
        $platform['host_url'],
        new Reference('http_client', ContainerInterface::NULL_ON_INVALID_REFERENCE),
Contributor:

Suggested change:

new Reference('http_client', ContainerInterface::NULL_ON_INVALID_REFERENCE),
new Reference('ai.platform.model_catalog.ollama'),

if (\array_key_exists('cache', $platform)) {
    $arguments[] = new Reference($platform['cache'], ContainerInterface::NULL_ON_INVALID_REFERENCE);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$arguments is not used
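A possible fix, assuming the collected arguments are meant to feed the Ollama platform's service definition ($definition here is illustrative; setArguments() is the standard Symfony DI call):

$arguments = [
    $platform['host_url'],
    new Reference('http_client', ContainerInterface::NULL_ON_INVALID_REFERENCE),
];

if (\array_key_exists('cache', $platform)) {
    $arguments[] = new Reference($platform['cache'], ContainerInterface::NULL_ON_INVALID_REFERENCE);
}

// Actually hand the collected arguments to the service definition.
$definition->setArguments($arguments);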

Member:

Changes here would also belong in CachedPlatform, so every bridge can benefit from this decorator.
