
Commit 9a0e994

PHPORM-381 Add class metadata for vector search indexes (#2820)
* PHPORM-381 Add class metadata for vector search indexes
* Test SchemaManager with vector search
* Document the #[VectorSearchIndex] attribute
* Copilot review
1 parent 0642c8b commit 9a0e994

15 files changed, with 449 additions and 30 deletions.

docs/en/reference/attributes-reference.rst

Lines changed: 71 additions & 0 deletions
```diff
@@ -1167,6 +1167,10 @@ Optional arguments:
 This attribute is used to specify :ref:`search indexes <search_indexes>` for
 `MongoDB Atlas Search <https://www.mongodb.com/docs/atlas/atlas-search/>`__.
 
+.. note::
+
+   For vector search indexes, see :ref:`vector_search_index` below.
+
 The arguments correspond to arguments for
 `MongoDB\Collection::createSearchIndex() <https://www.mongodb.com/docs/php-library/current/reference/method/MongoDBCollection-createSearchIndex/>`__.
 Excluding ``name``, arguments are used to create the
```
```diff
@@ -1397,6 +1401,73 @@ for the related collection.
         // rest of the class code...
     }
 
+.. _vector_search_index:
+
+#[VectorSearchIndex]
+--------------------
+
+The ``#[VectorSearchIndex]`` attribute defines a vector search index on a
+document class. Vector search indexes enable efficient similarity search on
+vector fields, such as machine learning embeddings.
+
+Arguments:
+
+- ``name`` (optional): The name of the vector search index. Defaults to ``default``.
+- ``fields`` (required): A list of field definitions. Each field definition is an associative array describing a vector or filter field, with the following keys:
+
+  - ``type``: ``'vector'`` for vector fields or ``'filter'`` for filter fields.
+  - ``path``: The name of the document field to index.
+  - ``numDimensions``: (vector fields only) The number of dimensions in the vector.
+  - ``similarity``: (vector fields only) The vector similarity function: ``'euclidean'``, ``'cosine'``, or ``'dotProduct'``. Prefer the ``Doctrine\ODM\MongoDB\Mapping\ClassMetadata::VECTOR_SIMILARITY_*`` constants over string literals.
+  - ``quantization``: (vector fields only, optional) The quantization method: ``'none'``, ``'scalar'``, or ``'binary'`` (see the ``ClassMetadata::VECTOR_QUANTIZATION_*`` constants).
+  - ``hnswOptions``: (vector fields only, optional) Options for the HNSW algorithm, given as an array with the keys ``maxEdges`` and ``numEdgeCandidates``.
+
+  Filter fields accept only ``type`` (set to ``'filter'``) and ``path``. Each index must contain at least one vector field; otherwise a ``MappingException`` is thrown.
+
+
+Example:
+
+.. code-block:: php
+
+    <?php
+    use Doctrine\ODM\MongoDB\Mapping\Annotations\Document;
+    use Doctrine\ODM\MongoDB\Mapping\Annotations\Field;
+    use Doctrine\ODM\MongoDB\Mapping\Annotations\Id;
+    use Doctrine\ODM\MongoDB\Mapping\Annotations\VectorSearchIndex;
+    use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;
+    use Doctrine\ODM\MongoDB\Types\Type;
+
+    #[Document(collection: 'vector_embeddings')]
+    #[VectorSearchIndex(
+        fields: [
+            [
+                'type' => 'vector',
+                'path' => 'plotEmbeddingVoyage3Large',
+                'numDimensions' => 2048,
+                'similarity' => ClassMetadata::VECTOR_SIMILARITY_DOT_PRODUCT,
+                'quantization' => ClassMetadata::VECTOR_QUANTIZATION_SCALAR,
+            ],
+            [
+                'type' => 'filter',
+                'path' => 'category',
+            ],
+        ],
+    )]
+    class VectorEmbedding
+    {
+        #[Id]
+        public ?string $id = null;
+
+        /** @var list<float> */
+        #[Field(type: Type::COLLECTION)]
+        public array $plotEmbeddingVoyage3Large = [];
+
+        #[Field]
+        public string $category;
+    }
+
+For more details, see the MongoDB documentation on `Atlas Vector Search <https://www.mongodb.com/docs/atlas/atlas-vector-search/>`__.
+
 #[Version]
 ----------
 
```

doctrine-mongo-mapping.xsd

Lines changed: 30 additions & 0 deletions
```diff
@@ -157,6 +157,7 @@
       <xs:element name="also-load-methods" type="odm:also-load-methods" minOccurs="0" />
       <xs:element name="indexes" type="odm:indexes" minOccurs="0" />
       <xs:element name="search-indexes" type="odm:search-indexes" minOccurs="0" />
+      <xs:element name="vector-search-indexes" type="odm:vector-search-indexes" minOccurs="0" />
       <xs:element name="shard-key" type="odm:shard-key" minOccurs="0" />
       <xs:element name="read-preference" type="odm:read-preference" minOccurs="0" />
       <xs:element name="schema-validation" type="odm:schema-validation" minOccurs="0" />
@@ -640,6 +641,35 @@
     </xs:restriction>
   </xs:simpleType>
 
+  <!-- https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/#atlas-vector-search-index-fields -->
+  <xs:complexType name="vector-search-indexes">
+    <xs:choice maxOccurs="unbounded">
+      <xs:element name="vector-search-index" type="odm:vector-search-index" maxOccurs="unbounded" />
+    </xs:choice>
+  </xs:complexType>
+
+  <xs:complexType name="vector-search-index">
+    <xs:choice maxOccurs="unbounded">
+      <xs:element name="vector-field" type="odm:vector-search-vector-field" />
+      <xs:element name="filter-field" type="odm:vector-search-filter-field" minOccurs="0" maxOccurs="unbounded" />
+    </xs:choice>
+
+    <xs:attribute name="name" type="xs:string" />
+  </xs:complexType>
+
+  <xs:complexType name="vector-search-vector-field">
+    <xs:attribute name="path" type="xs:string" use="required" />
+    <xs:attribute name="numDimensions" type="xs:int" use="required" />
+    <xs:attribute name="similarity" type="xs:string" use="required" />
+    <xs:attribute name="quantization" type="xs:string" />
+    <xs:attribute name="hnswMaxEdges" type="xs:int" />
+    <xs:attribute name="hnswNumEdgeCandidates" type="xs:int" />
+  </xs:complexType>
+
+  <xs:complexType name="vector-search-filter-field">
+    <xs:attribute name="path" type="xs:string" use="required" />
+  </xs:complexType>
+
   <xs:complexType name="shard-key">
     <xs:choice minOccurs="0" maxOccurs="unbounded">
       <xs:element name="key" type="odm:shard-key-key" maxOccurs="unbounded" />
```

lib/Doctrine/ODM/MongoDB/Mapping/Annotations/SearchIndex.php

Lines changed: 2 additions & 0 deletions
```diff
@@ -11,6 +11,8 @@
 /**
  * Defines a search index on a class.
  *
+ * @see https://www.mongodb.com/docs/atlas/atlas-search/index-definitions/
+ *
  * @Annotation
  * @NamedArgumentConstructor
  * @phpstan-import-type SearchIndexStoredSource from ClassMetadata
```

lib/Doctrine/ODM/MongoDB/Mapping/Annotations/VectorSearchIndex.php

Lines changed: 29 additions & 0 deletions
```diff
@@ -0,0 +1,29 @@
+<?php
+
+declare(strict_types=1);
+
+namespace Doctrine\ODM\MongoDB\Mapping\Annotations;
+
+use Attribute;
+use Doctrine\Common\Annotations\Annotation\NamedArgumentConstructor;
+use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;
+
+/**
+ * Defines a vector search index on a class.
+ *
+ * @see https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-type/
+ *
+ * @Annotation
+ * @NamedArgumentConstructor
+ * @phpstan-import-type VectorSearchIndexField from ClassMetadata
+ */
+#[Attribute(Attribute::TARGET_CLASS | Attribute::IS_REPEATABLE)]
+class VectorSearchIndex implements Annotation
+{
+    /** @param list<VectorSearchIndexField> $fields */
+    public function __construct(
+        public array $fields,
+        public ?string $name = null,
+    ) {
+    }
+}
```

lib/Doctrine/ODM/MongoDB/Mapping/ClassMetadata.php

Lines changed: 32 additions & 3 deletions
```diff
@@ -35,6 +35,7 @@
 use ReflectionNamedType;
 use ReflectionProperty;
 
+use function array_column;
 use function array_filter;
 use function array_key_exists;
 use function array_keys;
@@ -262,6 +263,17 @@
  *     name: string,
  *     definition: SearchIndexDefinition
  * }
+ * @phpstan-type VectorSearchIndexField array{
+ *     type: 'vector'|'filter',
+ *     path: string,
+ *     numDimensions?: int,
+ *     similarity?: self::VECTOR_SIMILARITY_*,
+ *     quantization?: self::VECTOR_QUANTIZATION_*,
+ *     hnswOptions?: array{maxEdges?: int, numEdgeCandidates?: int}
+ * }
+ * @phpstan-type VectorSearchIndexDefinition array{
+ *     fields: list<VectorSearchIndexField>
+ * }
  * @phpstan-type ShardKeys array<string, mixed>
  * @phpstan-type ShardOptions array<string, mixed>
  * @phpstan-type ShardKey array{
@@ -459,6 +471,13 @@
      */
     public const DEFAULT_SEARCH_INDEX_NAME = 'default';
 
+    public const VECTOR_SIMILARITY_EUCLIDEAN = 'euclidean';
+    public const VECTOR_SIMILARITY_COSINE = 'cosine';
+    public const VECTOR_SIMILARITY_DOT_PRODUCT = 'dotProduct';
+    public const VECTOR_QUANTIZATION_NONE = 'none';
+    public const VECTOR_QUANTIZATION_SCALAR = 'scalar';
+    public const VECTOR_QUANTIZATION_BINARY = 'binary';
+
     private const ALLOWED_GRIDFS_FIELDS = ['_id', 'chunkSize', 'filename', 'length', 'metadata', 'uploadDate'];
 
     /**
@@ -1243,19 +1262,29 @@ public function hasIndexes(): bool
     /**
      * Add a search index for this Document.
      *
-     * @phpstan-param SearchIndexDefinition $definition
+     * @phpstan-param SearchIndexDefinition|VectorSearchIndexDefinition $definition
+     * @phpstan-param 'search'|'vectorSearch' $type
      */
-    public function addSearchIndex(array $definition, ?string $name = null): void
+    public function addSearchIndex(array $definition, ?string $name = null, string $type = 'search'): void
     {
         $name ??= self::DEFAULT_SEARCH_INDEX_NAME;
 
-        if (empty($definition['mappings']['dynamic']) && empty($definition['mappings']['fields'])) {
+        if ($type !== 'search' && $type !== 'vectorSearch') {
+            throw new InvalidArgumentException(sprintf('Search index type must be either "search" or "vectorSearch", "%s" given.', $type));
+        }
+
+        if ($type === 'search' && empty($definition['mappings']['dynamic']) && empty($definition['mappings']['fields'])) {
             throw MappingException::emptySearchIndexDefinition($this->name, $name);
         }
 
+        if ($type === 'vectorSearch' && ! in_array('vector', array_column($definition['fields'] ?? [], 'type'), true)) {
+            throw MappingException::emptyVectorSearchIndexDefinition($this->name, $name);
+        }
+
         $this->searchIndexes[] = [
             'definition' => $definition,
             'name' => $name,
+            'type' => $type,
         ];
     }
 
```
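The new `vectorSearch` branch in `addSearchIndex()` accepts a definition only if at least one entry in `fields` has the type `'vector'`. A minimal standalone sketch of that check, assuming plain arrays shaped like `VectorSearchIndexDefinition`; the helper name `hasVectorField()` is hypothetical and not part of ODM:

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of ODM) that mirrors the vectorSearch check
 * in ClassMetadata::addSearchIndex(): the definition is accepted only if at
 * least one entry in 'fields' has the type 'vector'.
 *
 * @param array<string, mixed> $definition
 */
function hasVectorField(array $definition): bool
{
    // Collect the 'type' value of every field entry; entries without a
    // 'type' key are skipped by array_column().
    $types = array_column($definition['fields'] ?? [], 'type');

    // Strict comparison, as in the committed code.
    return in_array('vector', $types, true);
}

$valid = [
    'fields' => [
        ['type' => 'vector', 'path' => 'embedding', 'numDimensions' => 3, 'similarity' => 'cosine'],
        ['type' => 'filter', 'path' => 'category'],
    ],
];
$filterOnly = ['fields' => [['type' => 'filter', 'path' => 'category']]];

var_dump(hasVectorField($valid));      // bool(true)
var_dump(hasVectorField($filterOnly)); // bool(false)
var_dump(hasVectorField([]));          // bool(false)
```

A definition containing only filter fields fails the check; that is the case `MappingException::emptyVectorSearchIndexDefinition()` reports.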

lib/Doctrine/ODM/MongoDB/Mapping/Driver/AttributeDriver.php

Lines changed: 16 additions & 3 deletions
```diff
@@ -8,7 +8,6 @@
 use Doctrine\ODM\MongoDB\Events;
 use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;
 use Doctrine\ODM\MongoDB\Mapping\Annotations\AbstractIndex;
-use Doctrine\ODM\MongoDB\Mapping\Annotations\SearchIndex;
 use Doctrine\ODM\MongoDB\Mapping\Annotations\ShardKey;
 use Doctrine\ODM\MongoDB\Mapping\Annotations\TimeSeries;
 use Doctrine\ODM\MongoDB\Mapping\ClassMetadata;
@@ -108,6 +107,10 @@ public function loadMetadataForClass($className, PersistenceClassMetadata $metad
                 $this->addSearchIndex($metadata, $attribute);
             }
 
+            if ($attribute instanceof ODM\VectorSearchIndex) {
+                $this->addVectorSearchIndex($metadata, $attribute);
+            }
+
             if ($attribute instanceof ODM\Indexes) {
                 trigger_deprecation(
                     'doctrine/mongodb-odm',
@@ -370,7 +373,7 @@ private function addIndex(ClassMetadata $class, AbstractIndex $index, array $key
     }
 
     /** @param ClassMetadata<object> $class */
-    private function addSearchIndex(ClassMetadata $class, SearchIndex $index): void
+    private function addSearchIndex(ClassMetadata $class, ODM\SearchIndex $index): void
     {
         $definition = [];
 
@@ -386,7 +389,17 @@ private function addSearchIndex(ClassMetadata $class, SearchIndex $index): void
             }
         }
 
-        $class->addSearchIndex($definition, $index->name ?? null);
+        $class->addSearchIndex($definition, $index->name ?? null, 'search');
+    }
+
+    /** @param ClassMetadata<object> $class */
+    private function addVectorSearchIndex(ClassMetadata $class, ODM\VectorSearchIndex $index): void
+    {
+        $definition = [
+            'fields' => $index->fields,
+        ];
+
+        $class->addSearchIndex($definition, $index->name ?? null, 'vectorSearch');
     }
 
     /**
```
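Because `VectorSearchIndex` is declared with `Attribute::IS_REPEATABLE`, a class can carry several vector search indexes, and the attribute driver turns each one into its own `addSearchIndex()` call with the type `'vectorSearch'`. The sketch below reproduces that flow with plain PHP 8 reflection; `VectorSearchIndexStub`, `Movie`, and `collectVectorIndexes()` are hypothetical stand-ins so the example runs without Doctrine installed, and the `'default'` fallback copies what `ClassMetadata::addSearchIndex()` does with a null name.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for ODM\VectorSearchIndex: same constructor shape and
// the same TARGET_CLASS | IS_REPEATABLE flags, but no Doctrine dependency.
#[Attribute(Attribute::TARGET_CLASS | Attribute::IS_REPEATABLE)]
final class VectorSearchIndexStub
{
    /** @param list<array<string, mixed>> $fields */
    public function __construct(
        public array $fields,
        public ?string $name = null,
    ) {
    }
}

// Hypothetical document class carrying two vector search indexes.
#[VectorSearchIndexStub(fields: [
    ['type' => 'vector', 'path' => 'plotEmbedding', 'numDimensions' => 3, 'similarity' => 'cosine'],
])]
#[VectorSearchIndexStub(
    name: 'byCategory',
    fields: [
        ['type' => 'vector', 'path' => 'titleEmbedding', 'numDimensions' => 3, 'similarity' => 'euclidean'],
        ['type' => 'filter', 'path' => 'category'],
    ],
)]
final class Movie
{
}

/**
 * Builds the entries the attribute driver would hand to
 * ClassMetadata::addSearchIndex(), one per attribute instance.
 *
 * @param class-string $className
 *
 * @return list<array{name: string, type: string, definition: array{fields: list<array<string, mixed>>}}>
 */
function collectVectorIndexes(string $className): array
{
    $indexes = [];
    foreach ((new ReflectionClass($className))->getAttributes(VectorSearchIndexStub::class) as $attribute) {
        $index = $attribute->newInstance();
        $indexes[] = [
            // addSearchIndex() substitutes DEFAULT_SEARCH_INDEX_NAME ('default') for null.
            'name' => $index->name ?? 'default',
            'type' => 'vectorSearch',
            'definition' => ['fields' => $index->fields],
        ];
    }

    return $indexes;
}

print_r(collectVectorIndexes(Movie::class));
```

As far as the code in this commit shows, nothing deduplicates index names, so two unnamed vector search indexes would both be registered as `default`; giving each index an explicit `name` avoids that.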

lib/Doctrine/ODM/MongoDB/Mapping/Driver/XmlDriver.php

Lines changed: 45 additions & 0 deletions
```diff
@@ -210,6 +210,12 @@ public function loadMetadataForClass($className, \Doctrine\Persistence\Mapping\C
             }
         }
 
+        if (isset($xmlRoot->{'vector-search-indexes'})) {
+            foreach ($xmlRoot->{'vector-search-indexes'}->{'vector-search-index'} as $searchIndex) {
+                $this->addVectorSearchIndex($metadata, $searchIndex);
+            }
+        }
+
         if (isset($xmlRoot->{'shard-key'})) {
             $this->setShardKey($metadata, $xmlRoot->{'shard-key'}[0]);
         }
@@ -748,6 +754,45 @@ private function getSearchIndexFieldDefinition(SimpleXMLElement $field): array
         return $fieldDefinition;
     }
 
+    /** @param ClassMetadata<object> $class */
+    private function addVectorSearchIndex(ClassMetadata $class, SimpleXMLElement $searchIndex): void
+    {
+        $definition = ['fields' => []];
+
+        foreach ($searchIndex->{'vector-field'} as $vectorField) {
+            $field = [
+                'type' => 'vector',
+                'path' => (string) $vectorField['path'],
+                'numDimensions' => (int) $vectorField['numDimensions'],
+                'similarity' => (string) $vectorField['similarity'],
+            ];
+            if (isset($vectorField['quantization'])) {
+                $field['quantization'] = (string) $vectorField['quantization'];
+            }
+
+            if (isset($vectorField['hnswMaxEdges'])) {
+                $field['hnswOptions']['maxEdges'] = (int) $vectorField['hnswMaxEdges'];
+            }
+
+            if (isset($vectorField['hnswNumEdgeCandidates'])) {
+                $field['hnswOptions']['numEdgeCandidates'] = (int) $vectorField['hnswNumEdgeCandidates'];
+            }
+
+            $definition['fields'][] = $field;
+        }
+
+        foreach ($searchIndex->{'filter-field'} as $filterField) {
+            $definition['fields'][] = [
+                'type' => 'filter',
+                'path' => (string) $filterField['path'],
+            ];
+        }
+
+        $name = isset($searchIndex['name']) ? (string) $searchIndex['name'] : null;
+
+        $class->addSearchIndex($definition, $name, 'vectorSearch');
+    }
+
     /** @return array<string, array<string, mixed>|scalar|null> */
     private function getPartialFilterExpression(SimpleXMLElement $fields): array
     {
```
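To see what the XML driver produces without bootstrapping ODM, the conversion loop can be lifted into a standalone function. This is a sketch that copies the logic of `XmlDriver::addVectorSearchIndex()` but returns the definition array instead of calling `ClassMetadata::addSearchIndex()`; the function name `vectorSearchDefinitionFromXml()`, the document class name, and the sample mapping are hypothetical, and the XML namespace declaration is left out for brevity.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone copy of the XmlDriver conversion: flat XML
 * attributes become the field arrays stored in the index definition.
 *
 * @return array{fields: list<array<string, mixed>>}
 */
function vectorSearchDefinitionFromXml(SimpleXMLElement $searchIndex): array
{
    $definition = ['fields' => []];

    foreach ($searchIndex->{'vector-field'} as $vectorField) {
        $field = [
            'type' => 'vector',
            'path' => (string) $vectorField['path'],
            'numDimensions' => (int) $vectorField['numDimensions'],
            'similarity' => (string) $vectorField['similarity'],
        ];
        if (isset($vectorField['quantization'])) {
            $field['quantization'] = (string) $vectorField['quantization'];
        }
        // The flat hnsw* XML attributes are folded into a nested hnswOptions array.
        if (isset($vectorField['hnswMaxEdges'])) {
            $field['hnswOptions']['maxEdges'] = (int) $vectorField['hnswMaxEdges'];
        }
        if (isset($vectorField['hnswNumEdgeCandidates'])) {
            $field['hnswOptions']['numEdgeCandidates'] = (int) $vectorField['hnswNumEdgeCandidates'];
        }
        $definition['fields'][] = $field;
    }

    foreach ($searchIndex->{'filter-field'} as $filterField) {
        $definition['fields'][] = ['type' => 'filter', 'path' => (string) $filterField['path']];
    }

    return $definition;
}

$mapping = new SimpleXMLElement(<<<'XML'
<document name="App\Document\VectorEmbedding" collection="vector_embeddings">
    <vector-search-indexes>
        <vector-search-index name="embeddings">
            <filter-field path="category" />
            <vector-field path="plotEmbeddingVoyage3Large" numDimensions="2048"
                          similarity="dotProduct" quantization="scalar" hnswMaxEdges="32" />
        </vector-search-index>
    </vector-search-indexes>
</document>
XML);

// Same traversal as the loadMetadataForClass() hunk above.
foreach ($mapping->{'vector-search-indexes'}->{'vector-search-index'} as $searchIndex) {
    $name = isset($searchIndex['name']) ? (string) $searchIndex['name'] : null;
    print_r([$name => vectorSearchDefinitionFromXml($searchIndex)]);
}
```

Because the driver loops over `vector-field` elements before `filter-field` elements, vector fields always come first in the resulting `fields` list, whatever their order in the XML file.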

lib/Doctrine/ODM/MongoDB/Mapping/MappingException.php

Lines changed: 5 additions & 0 deletions
```diff
@@ -297,6 +297,11 @@ public static function emptySearchIndexDefinition(string $className, string $ind
         return new self(sprintf('%s search index "%s" must be dynamic or specify a field mapping', $className, $indexName));
     }
 
+    public static function emptyVectorSearchIndexDefinition(string $className, string $indexName): self
+    {
+        return new self(sprintf('%s vector search index "%s" must have a vector field', $className, $indexName));
+    }
+
     public static function timeSeriesFieldNotFound(string $className, string $fieldName, string $field): self
     {
         return new self(sprintf(
```

phpstan-baseline.neon

Lines changed: 12 additions & 0 deletions
```diff
@@ -1866,6 +1866,18 @@ parameters:
             count: 1
             path: tests/Doctrine/ODM/MongoDB/Tests/Mapping/ClassMetadataLoadEventTest.php
 
+        -
+            message: '#^Method Doctrine\\ODM\\MongoDB\\Tests\\Mapping\\ClassMetadataTest\:\:testEmptyVectorSearchIndexDefinition\(\) has parameter \$definition with no value type specified in iterable type array\.$#'
+            identifier: missingType.iterableValue
+            count: 1
+            path: tests/Doctrine/ODM/MongoDB/Tests/Mapping/ClassMetadataTest.php
+
+        -
+            message: '#^Method Doctrine\\ODM\\MongoDB\\Tests\\Mapping\\ClassMetadataTest\:\:testSearchIndexDefinition\(\) has parameter \$definition with no value type specified in iterable type array\.$#'
+            identifier: missingType.iterableValue
+            count: 1
+            path: tests/Doctrine/ODM/MongoDB/Tests/Mapping/ClassMetadataTest.php
+
         -
             message: '#^Property DoctrineGlobal_User\:\:\$email is unused\.$#'
             identifier: property.unused
```
