Commit 4786900

Merge pull request #272101 from wmwxwa/patch-14
Update "open-source vector database"
2 parents 9e15f97 + 0e87def commit 4786900

1 file changed

+16
-15
lines changed

articles/cosmos-db/mongodb/vcore/vector-search-ai.md

Lines changed: 16 additions & 15 deletions
@@ -21,9 +21,9 @@ Another advantage of open-source vector databases is the strong community suppor
2121

2222
Some individuals opt for open-source vector databases because they are "free," meaning there's no cost to acquire or use the software. An alternative is using the free tiers offered by managed vector database services. These managed services provide not only cost-free access up to a certain usage limit but also simplify the operational burden by handling maintenance, updates, and scalability. Therefore, by using the free tier of managed vector database services, users can achieve cost savings while reducing management overhead. This approach allows users to focus more on their core activities rather than on database administration.
2323

24-
## Working mechanism of open-source vector databases
24+
## Working mechanism of vector databases
2525

26-
Open-source vector databases are designed to store and manage vector embeddings, which are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, and tens of thousands of dimensions might be used to represent sophisticated data. A vector's position in this space represents its characteristics. Words, phrases, or entire documents, and images, audio, and other types of data can all be vectorized. These vector embeddings are used in similarity search, multi-modal search, recommendations engines, large languages models (LLMs), etc.
26+
Vector databases are designed to store and manage vector embeddings, which are mathematical representations of data in a high-dimensional space. In this space, each dimension corresponds to a feature of the data, and tens of thousands of dimensions might be used to represent sophisticated data. A vector's position in this space represents its characteristics. Words, phrases, entire documents, images, audio, and other types of data can all be vectorized. These vector embeddings are used in similarity search, multi-modal search, recommendation engines, large language models (LLMs), and more.
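
As a rough illustration of similarity search over embeddings, the following minimal sketch ranks stored vectors by cosine similarity to a query vector. The three-dimensional vectors, the `embeddings` dictionary, and the `nearest` helper are hypothetical and chosen only for readability; real embeddings come from a model and have far more dimensions, and a vector database would use an index rather than this brute-force scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, embeddings, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]

# Hypothetical toy embeddings (assumption: real ones are model-generated).
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
}

print(nearest([0.9, 0.1, 0.0], embeddings, k=2))
```

Vectors that point in similar directions score close to 1, so items with related characteristics surface together regardless of the original data type.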
2727

2828
These databases' architecture typically includes a storage engine and an indexing mechanism. The storage engine optimizes the storage of vector data for efficient retrieval and manipulation, while the indexing mechanism organizes the data for fast searching and retrieval operations.
2929

@@ -40,6 +40,14 @@ Vector databases are used in numerous domains and situations across analytical a
4040
- Implement persistent memory for AI agents
4141
- Enable retrieval-augmented generation (RAG)
4242

43+
### Integrated vector database vs pure vector database
44+
45+
There are two common types of vector database implementations: a pure vector database, and an integrated vector database in a NoSQL or relational database.
46+
47+
A pure vector database is designed to efficiently store and manage vector embeddings, along with a small amount of metadata; it is separate from the data source from which the embeddings are derived.
48+
49+
A vector database that is integrated in a highly performant NoSQL or relational database provides additional capabilities. The integrated vector database in a NoSQL or relational database can store, index, and query embeddings alongside the corresponding original data. This approach eliminates the extra cost of replicating data in a separate pure vector database. Moreover, keeping the vector embeddings and original data together better facilitates multi-modal data operations, and enables greater data consistency, scale, and performance.
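
The difference can be sketched with a small in-memory example. This is only an illustration under stated assumptions, not how any particular service is implemented: the `products` documents, their fields, and the `search` helper are hypothetical. The point is that when each embedding sits in the same record as its original data, one query can filter on ordinary fields and rank by vector similarity without a round trip to a second system.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical documents: original data and embedding stored together.
products = [
    {"_id": 1, "name": "trail shoe", "in_stock": True, "embedding": [0.9, 0.1]},
    {"_id": 2, "name": "road shoe", "in_stock": False, "embedding": [0.8, 0.2]},
    {"_id": 3, "name": "rain jacket", "in_stock": True, "embedding": [0.1, 0.9]},
]

def search(query_vec, docs, k=1, **filters):
    """Filter on regular fields, then rank the survivors by similarity."""
    candidates = [d for d in docs if all(d.get(f) == v for f, v in filters.items())]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["embedding"]), reverse=True)
    return [d["name"] for d in ranked[:k]]

print(search([0.85, 0.15], products, k=2, in_stock=True))
```

With a pure vector database, the `in_stock` check would typically require copying that field into the vector store as metadata or joining against the source database after the vector query.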
50+
4351
## Selecting the best open-source vector database
4452

4553
Choosing the best open-source vector database requires considering several factors. Performance and scalability of the database are crucial, as they impact whether the database can handle your specific workload requirements. Databases with efficient indexing and querying capabilities usually offer optimal performance. Another factor is the community support and documentation available for the database. A robust community and ample documentation can provide valuable assistance. Here are some popular open-source vector databases:
@@ -49,33 +57,26 @@ Choosing the best open-source vector database requires considering several facto
4957
- Qdrant
5058
- Weaviate
5159

52-
>[!NOTE]
53-
>The most popular option may not be the best option for you. To find the best fit for your needs, you should compare different options based on features, supported data types, compatibility with existing tools and frameworks you use. Ease of installation, configuration, and maintenance should also be considered to ensure smooth integration into your workflow.
60+
However, the most popular option may not be the best option for you. Thus, you should compare different options based on features, supported data types, and compatibility with the existing tools and frameworks you use. You should also keep in mind the challenges of open-source vector databases (below).
5461

5562
## Challenges of open-source vector databases
5663

57-
Open-source vector databases pose challenges that are typical of open-source software and ones that are specific to many vector databases.
64+
Most open-source vector databases, including the ones listed above, are pure vector databases. In other words, they are designed to store and manage vector embeddings only, along with a small amount of metadata. Since they are independent of the data source from which the embeddings are derived, using them requires sending your data between service integrations, which adds extra cost, complexity, and bottlenecks for your production workloads.
5865

59-
### Challenges with open source:
66+
They also pose the challenges that are typical of open-source databases:
6067

6168
- Setup: Users need in-depth knowledge to install, configure, and operate, especially for complex deployments. Optimizing resources and configuration while scaling up operation requires close monitoring and adjustments.
6269
- Maintenance: Users must manage their own updates, patches, and maintenance. Thus, ML expertise wouldn't suffice; users must also have extensive experience in database administration.
6370
- Support: Official support can be limited compared to managed services, relying more on community assistance.
6471

6572
Therefore, while free initially, open-source vector databases incur significant costs when scaling up. Expanding operations necessitates more hardware, skilled IT staff, and advanced infrastructure management, leading to higher expenses in hardware, personnel, and operational costs. Scaling open-source vector databases can be financially demanding despite the lack of licensing fees.
6673

67-
### Challenges that are specific to vector databases:
68-
69-
Most open-source vector databases nowadays are pure vector databases. In other words, they are designed to store and manage vector embeddings, along with a small amount of metadata; it is separate from the data source from which the embeddings are derived. Thus, using pure vector databases requires sending your data between service integrations, which adds extra cost, complexity, and bottlenecks for your production workloads.
70-
7174
## Addressing the challenges of open-source vector databases
7275

73-
A fully managed database service helps developers avoid the hassles from setting up, maintaining, and relying on community assistance for an open-source vector database; moreover, some managed vector database services offer a life-time free tier.
74-
75-
The extra cost and complexity of pure vector databases can be avoided by using a vector database that is integrated in a highly performant NoSQL or relational database, which stores, indexes, and queries embeddings alongside the corresponding original data. This approach eliminates the extra cost of replicating data in a separate pure vector database. Moreover, keeping the vector embeddings and original data together better facilitates multi-modal data operations, and enables greater data consistency, scale, and performance.
76+
A fully managed vector database that is integrated in a highly performant NoSQL or relational database avoids the extra cost and complexity of open-source vector databases. Such a database stores, indexes, and queries embeddings alongside the corresponding original data. This approach eliminates the extra cost of replicating data in a separate pure vector database. Moreover, keeping the vector embeddings and original data together better facilitates multi-modal data operations, and enables greater data consistency, scale, and performance. Meanwhile, the fully managed service spares developers the hassle of setting up and maintaining an open-source vector database and of relying on community assistance for it. In addition, some managed vector database services offer a lifetime free tier.
7677

77-
An example is the Integrated Vector Database in Azure Cosmos DB for MongoDB. It allows developers to enjoy the same financial benefit associated with open-source vector databases, while the service provider handles maintenance, updates, and scalability. When it’s time to scale up operations, upgrading is quick and easy while keeping a low [total cost of ownership (TCO)](introduction.md#low-total-cost-of-ownership-tco).
78+
An example is the Integrated Vector Database in Azure Cosmos DB for MongoDB. It allows developers to enjoy the same financial benefit associated with open-source vector databases, while the service provider handles maintenance, updates, and scalability. When it’s time to scale up operations, upgrading is quick and easy while keeping a low [total cost of ownership (TCO)](introduction.md#low-total-cost-of-ownership-tco). This service can also be used to conveniently [scale MongoDB](../reimagined.md) applications that are already in production.
7879

7980
## Next steps
8081
> [!div class="nextstepaction"]
81-
> [Create a lifetime free-tier vCore cluster for Azure Cosmos DB for MongoDB](free-tier.md)
82+
> [Use lifetime free tier of Integrated Vector Database in Azure Cosmos DB for MongoDB](free-tier.md)
