Commit 0491aa3

feature #468 [AI Bundle][Demo] Make vectorizers configurable (OskarStark)
This PR was squashed before being merged into the main branch.

Discussion
----------

[AI Bundle][Demo] Make vectorizers configurable

| Q             | A
| ------------- | ---
| Bug fix?      | no
| New feature?  | yes
| Docs?         | no
| Issues        | Refs #465
| License       | MIT

Add support for configuring vectorizers via ai.yaml configuration, allowing reuse across multiple indexers and centralized vectorizer management.

Commits
-------

7acf871 [AI Bundle][Demo] Make vectorizers configurable

2 parents 8ac8528 + 7acf871 commit 0491aa3

5 files changed, +199 -19 lines changed

demo/config/packages/ai.yaml

Lines changed: 6 additions & 2 deletions
@@ -53,11 +53,15 @@ ai:
         chroma_db:
             symfonycon:
                 collection: 'symfony_blog'
-    indexer:
-        default:
+    vectorizer:
+        openai_embeddings:
             model:
                 class: 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings'
                 name: !php/const Symfony\AI\Platform\Bridge\OpenAi\Embeddings::TEXT_ADA_002
+    indexer:
+        default:
+            vectorizer: 'ai.vectorizer.openai_embeddings'
+            store: 'ai.store.chroma_db.symfonycon'
 
 services:
     _defaults:

src/ai-bundle/config/options.php

Lines changed: 17 additions & 5 deletions
@@ -16,6 +16,7 @@
 use Probots\Pinecone\Client as PineconeClient;
 use Symfony\AI\Platform\Bridge\OpenAi\PlatformFactory;
 use Symfony\AI\Platform\PlatformInterface;
+use Symfony\AI\Store\Document\VectorizerInterface;
 use Symfony\AI\Store\StoreInterface;
 
 return static function (DefinitionConfigurator $configurator): void {
@@ -371,14 +372,10 @@
                     ->end()
                 ->end()
             ->end()
-            ->arrayNode('indexer')
+            ->arrayNode('vectorizer')
                 ->useAttributeAsKey('name')
                 ->arrayPrototype()
                     ->children()
-                        ->scalarNode('store')
-                            ->info('Service name of store')
-                            ->defaultValue(StoreInterface::class)
-                        ->end()
                         ->scalarNode('platform')
                             ->info('Service name of platform')
                             ->defaultValue(PlatformInterface::class)
@@ -395,6 +392,21 @@
                     ->end()
                 ->end()
             ->end()
+            ->arrayNode('indexer')
+                ->useAttributeAsKey('name')
+                ->arrayPrototype()
+                    ->children()
+                        ->scalarNode('vectorizer')
+                            ->info('Service name of vectorizer')
+                            ->defaultValue(VectorizerInterface::class)
+                        ->end()
+                        ->scalarNode('store')
+                            ->info('Service name of store')
+                            ->defaultValue(StoreInterface::class)
+                        ->end()
+                    ->end()
+                ->end()
+            ->end()
         ->end()
     ;
 };
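
The `indexer` node above declares a default for both of its keys, so an indexer entry that omits `vectorizer` or `store` falls back to the corresponding interface name as its service ID. The following is a minimal plain-PHP sketch of that fallback, assuming simple array merging in place of the Symfony Config component. `applyIndexerDefaults()` is a hypothetical helper for illustration and is not part of the bundle.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper that mimics the defaults declared on the `indexer` node.
 * The bundle itself relies on the Symfony Config tree builder instead.
 *
 * @param array<string, string> $indexer
 *
 * @return array<string, string>
 */
function applyIndexerDefaults(array $indexer): array
{
    // Class names are written as strings so the sketch runs without the library.
    $defaults = [
        'vectorizer' => 'Symfony\AI\Store\Document\VectorizerInterface',
        'store' => 'Symfony\AI\Store\StoreInterface',
    ];

    // Explicitly configured keys win over the defaults.
    return array_merge($defaults, $indexer);
}

$explicit = applyIndexerDefaults([
    'vectorizer' => 'ai.vectorizer.openai_embeddings',
    'store' => 'ai.store.chroma_db.symfonycon',
]);
$implicit = applyIndexerDefaults([]);

echo $explicit['vectorizer'], PHP_EOL;
echo $implicit['vectorizer'], PHP_EOL;
```

With the interface name as the default, an application that has exactly one vectorizer service aliased to `VectorizerInterface` could leave the key out, while the demo configuration names the service explicitly.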

src/ai-bundle/doc/index.rst

Lines changed: 79 additions & 4 deletions
@@ -113,13 +113,28 @@ Configuration
             memory:
                 ollama:
                     strategy: 'manhattan'
-        indexer:
-            default:
-                # platform: 'ai.platform.mistral'
-                # store: 'ai.store.chroma_db.default'
+        vectorizer:
+            # Reusable vectorizer configurations
+            openai_embeddings:
+                platform: 'ai.platform.openai'
+                model:
+                    class: 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings'
+                    name: !php/const Symfony\AI\Platform\Bridge\OpenAi\Embeddings::TEXT_EMBEDDING_3_SMALL
+                    options:
+                        dimensions: 512
+            mistral_embeddings:
+                platform: 'ai.platform.mistral'
                 model:
                     class: 'Symfony\AI\Platform\Bridge\Mistral\Embeddings'
                     name: !php/const Symfony\AI\Platform\Bridge\Mistral\Embeddings::MISTRAL_EMBED
+        indexer:
+            default:
+                vectorizer: 'ai.vectorizer.openai_embeddings'
+                store: 'ai.store.chroma_db.default'
+
+            research:
+                vectorizer: 'ai.vectorizer.mistral_embeddings'
+                store: 'ai.store.memory.research'
 
 Usage
 -----
@@ -319,6 +334,66 @@ To disable token usage tracking for an agent, set the ``track_token_usage`` opti
                     class: 'Symfony\AI\Platform\Bridge\OpenAi\Gpt'
                     name: !php/const Symfony\AI\Platform\Bridge\OpenAi\Gpt::GPT_4O_MINI
 
+Vectorizers
+-----------
+
+Vectorizers are components that convert text documents into vector embeddings for storage and retrieval.
+They can be configured once and reused across multiple indexers, providing better maintainability and consistency.
+
+**Configuring Vectorizers**
+
+Vectorizers are defined in the ``vectorizer`` section of your configuration:
+
+.. code-block:: yaml
+
+    ai:
+        vectorizer:
+            openai_small:
+                platform: 'ai.platform.openai'
+                model:
+                    class: 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings'
+                    name: !php/const Symfony\AI\Platform\Bridge\OpenAi\Embeddings::TEXT_EMBEDDING_3_SMALL
+                    options:
+                        dimensions: 512
+
+            openai_large:
+                platform: 'ai.platform.openai'
+                model:
+                    class: 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings'
+                    name: !php/const Symfony\AI\Platform\Bridge\OpenAi\Embeddings::TEXT_EMBEDDING_3_LARGE
+
+            mistral_embed:
+                platform: 'ai.platform.mistral'
+                model:
+                    class: 'Symfony\AI\Platform\Bridge\Mistral\Embeddings'
+                    name: !php/const Symfony\AI\Platform\Bridge\Mistral\Embeddings::MISTRAL_EMBED
+
+**Using Vectorizers in Indexers**
+
+Once configured, vectorizers can be referenced by name in indexer configurations:
+
+.. code-block:: yaml
+
+    ai:
+        indexer:
+            documents:
+                vectorizer: 'ai.vectorizer.openai_small'
+                store: 'ai.store.chroma_db.documents'
+
+            research:
+                vectorizer: 'ai.vectorizer.openai_large'
+                store: 'ai.store.chroma_db.research'
+
+            knowledge_base:
+                vectorizer: 'ai.vectorizer.mistral_embed'
+                store: 'ai.store.memory.kb'
+
+**Benefits of Configured Vectorizers**
+
+* **Reusability**: Define once, use in multiple indexers
+* **Consistency**: Ensure all indexers using the same vectorizer have identical embedding configuration
+* **Maintainability**: Change vectorizer settings in one place
+
 Profiler
 --------
 

src/ai-bundle/src/AiBundle.php

Lines changed: 16 additions & 5 deletions
@@ -148,6 +148,10 @@ public function loadExtension(array $config, ContainerConfigurator $container, C
             $builder->removeDefinition('ai.command.drop_store');
         }
 
+        foreach ($config['vectorizer'] ?? [] as $vectorizerName => $vectorizer) {
+            $this->processVectorizerConfig($vectorizerName, $vectorizer, $builder);
+        }
+
         foreach ($config['indexer'] as $indexerName => $indexer) {
             $this->processIndexerConfig($indexerName, $indexer, $builder);
         }
@@ -1031,7 +1035,7 @@ private function processStoreConfig(string $type, array $stores, ContainerBuilde
     /**
      * @param array<string, mixed> $config
      */
-    private function processIndexerConfig(int|string $name, array $config, ContainerBuilder $container): void
+    private function processVectorizerConfig(string $name, array $config, ContainerBuilder $container): void
     {
         ['class' => $modelClass, 'name' => $modelName, 'options' => $options] = $config['model'];
 
@@ -1048,16 +1052,23 @@ private function processIndexerConfig(int|string $name, array $config, Container
         }
 
         $modelDefinition->addTag('ai.model.embeddings_model');
-        $container->setDefinition('ai.indexer.'.$name.'.model', $modelDefinition);
+        $container->setDefinition('ai.vectorizer.'.$name.'.model', $modelDefinition);
 
         $vectorizerDefinition = new Definition(Vectorizer::class, [
             new Reference($config['platform']),
-            new Reference('ai.indexer.'.$name.'.model'),
+            new Reference('ai.vectorizer.'.$name.'.model'),
         ]);
-        $container->setDefinition('ai.indexer.'.$name.'.vectorizer', $vectorizerDefinition);
+        $vectorizerDefinition->addTag('ai.vectorizer', ['name' => $name]);
+        $container->setDefinition('ai.vectorizer.'.$name, $vectorizerDefinition);
+    }
 
+    /**
+     * @param array<string, mixed> $config
+     */
+    private function processIndexerConfig(int|string $name, array $config, ContainerBuilder $container): void
+    {
         $definition = new Definition(Indexer::class, [
-            new Reference('ai.indexer.'.$name.'.vectorizer'),
+            new Reference($config['vectorizer']),
             new Reference($config['store']),
             new Reference('logger', ContainerInterface::IGNORE_ON_INVALID_REFERENCE),
         ]);
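
The effect of splitting the two methods can be pictured as a service graph in which several indexers point at one vectorizer ID. The plain-PHP sketch below uses an array in place of `ContainerBuilder`. `registerVectorizer()` and `registerIndexer()` are hypothetical stand-ins for `processVectorizerConfig()` and `processIndexerConfig()`, the class names are short labels only, and the part of the real method that the diff does not show (model name and options handling) is not reproduced.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for processVectorizerConfig(): records the two
 * service IDs that the commit registers per vectorizer.
 *
 * @param array<string, array<string, mixed>> $services
 * @param array<string, mixed>                $config
 *
 * @return array<string, array<string, mixed>>
 */
function registerVectorizer(array $services, string $name, array $config): array
{
    $modelId = 'ai.vectorizer.'.$name.'.model';

    $services[$modelId] = ['class' => $config['model']['class']];
    $services['ai.vectorizer.'.$name] = [
        'class' => 'Vectorizer',
        'arguments' => [$config['platform'], $modelId],
    ];

    return $services;
}

/**
 * Hypothetical stand-in for processIndexerConfig(): the indexer only
 * references an existing vectorizer ID and creates no model of its own.
 *
 * @param array<string, array<string, mixed>> $services
 * @param array<string, string>               $config
 *
 * @return array<string, array<string, mixed>>
 */
function registerIndexer(array $services, string $name, array $config): array
{
    $services['ai.indexer.'.$name] = [
        'class' => 'Indexer',
        'arguments' => [$config['vectorizer'], $config['store']],
    ];

    return $services;
}

$services = registerVectorizer([], 'openai_embeddings', [
    'platform' => 'ai.platform.openai',
    'model' => ['class' => 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings'],
]);
$services = registerIndexer($services, 'default', [
    'vectorizer' => 'ai.vectorizer.openai_embeddings',
    'store' => 'ai.store.chroma_db.default',
]);
$services = registerIndexer($services, 'research', [
    'vectorizer' => 'ai.vectorizer.openai_embeddings',
    'store' => 'ai.store.memory.research',
]);

// Lists one model, one vectorizer, and two indexers.
echo implode(PHP_EOL, array_keys($services)), PHP_EOL;
```

Before this commit each indexer would have carried its own `ai.indexer.<name>.model` and `ai.indexer.<name>.vectorizer` definitions; in the sketch, as in the diff, those IDs no longer exist.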

src/ai-bundle/tests/DependencyInjection/AiBundleTest.php

Lines changed: 81 additions & 3 deletions
@@ -18,6 +18,8 @@
 use PHPUnit\Framework\Attributes\UsesClass;
 use PHPUnit\Framework\TestCase;
 use Symfony\AI\AiBundle\AiBundle;
+use Symfony\AI\Platform\Bridge\OpenAi\Embeddings;
+use Symfony\AI\Store\Document\Vectorizer;
 use Symfony\Component\Config\Definition\Exception\InvalidConfigurationException;
 use Symfony\Component\DependencyInjection\ContainerBuilder;
 use Symfony\Component\DependencyInjection\Definition;
@@ -591,6 +593,77 @@ public function testOpenAiPlatformWithInvalidRegion()
         ]);
     }
 
+    public function testVectorizerConfiguration()
+    {
+        $container = $this->buildContainer([
+            'ai' => [
+                'vectorizer' => [
+                    'my_vectorizer' => [
+                        'platform' => 'my_platform_service_id',
+                        'model' => [
+                            'class' => 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings',
+                            'name' => 'text-embedding-3-small',
+                            'options' => ['dimension' => 512],
+                        ],
+                    ],
+                ],
+            ],
+        ]);
+
+        $this->assertTrue($container->hasDefinition('ai.vectorizer.my_vectorizer'));
+        $this->assertTrue($container->hasDefinition('ai.vectorizer.my_vectorizer.model'));
+
+        $vectorizerDefinition = $container->getDefinition('ai.vectorizer.my_vectorizer');
+        $this->assertSame(Vectorizer::class, $vectorizerDefinition->getClass());
+        $this->assertTrue($vectorizerDefinition->hasTag('ai.vectorizer'));
+
+        $modelDefinition = $container->getDefinition('ai.vectorizer.my_vectorizer.model');
+        $this->assertSame(Embeddings::class, $modelDefinition->getClass());
+        $this->assertTrue($modelDefinition->hasTag('ai.model.embeddings_model'));
+    }
+
+    public function testIndexerWithConfiguredVectorizer()
+    {
+        $container = $this->buildContainer([
+            'ai' => [
+                'store' => [
+                    'memory' => [
+                        'my_store' => [],
+                    ],
+                ],
+                'vectorizer' => [
+                    'my_vectorizer' => [
+                        'platform' => 'my_platform_service_id',
+                        'model' => [
+                            'class' => 'Symfony\AI\Platform\Bridge\OpenAi\Embeddings',
+                            'name' => 'text-embedding-3-small',
+                        ],
+                    ],
+                ],
+                'indexer' => [
+                    'my_indexer' => [
+                        'vectorizer' => 'ai.vectorizer.my_vectorizer',
+                        'store' => 'ai.store.memory.my_store',
+                    ],
+                ],
+            ],
+        ]);
+
+        $this->assertTrue($container->hasDefinition('ai.indexer.my_indexer'));
+        $this->assertTrue($container->hasDefinition('ai.vectorizer.my_vectorizer'));
+
+        $indexerDefinition = $container->getDefinition('ai.indexer.my_indexer');
+        $arguments = $indexerDefinition->getArguments();
+
+        // First argument should be a reference to the vectorizer
+        $this->assertInstanceOf(Reference::class, $arguments[0]);
+        $this->assertSame('ai.vectorizer.my_vectorizer', (string) $arguments[0]);
+
+        // Should not create model-specific vectorizer when using configured one
+        $this->assertFalse($container->hasDefinition('ai.indexer.my_indexer.vectorizer'));
+        $this->assertFalse($container->hasDefinition('ai.indexer.my_indexer.model'));
+    }
+
     private function buildContainer(array $configuration): ContainerBuilder
     {
         $container = new ContainerBuilder();
@@ -838,9 +911,8 @@ private function getFullConfig(): array
                         ],
                     ],
                 ],
-                'indexer' => [
-                    'my_text_indexer' => [
-                        'store' => 'my_azure_search_store_service_id',
+                'vectorizer' => [
+                    'test_vectorizer' => [
                         'platform' => 'mistral_platform_service_id',
                         'model' => [
                             'class' => 'Symfony\AI\Platform\Bridge\Mistral\Embeddings',
@@ -849,6 +921,12 @@ private function getFullConfig(): array
                         ],
                     ],
                 ],
+                'indexer' => [
+                    'my_text_indexer' => [
+                        'vectorizer' => 'ai.vectorizer.test_vectorizer',
+                        'store' => 'my_azure_search_store_service_id',
+                    ],
+                ],
             ],
         ];
     }
