CouchbaseVectorSearchDemo/README.md
+44 -34 (44 additions & 34 deletions)
@@ -4,47 +4,48 @@ A quickstart example demonstrating vector search with Couchbase and Microsoft Se

 ## Introduction

-This demo showcases the **[Semantic Kernel Couchbase connector](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel)** - a .NET library that bridges Microsoft's Semantic Kernel framework with Couchbase's vector search capabilities. The connector provides a seamless integration that allows developers to build AI-powered applications using familiar Semantic Kernel abstractions while leveraging Couchbase's vector indexing for high-performance semantic search.
+This demo showcases the **Semantic Kernel Couchbase connector** - a .NET library that bridges Microsoft's Semantic Kernel framework with Couchbase's vector search capabilities. The connector provides a seamless integration that allows developers to build AI-powered applications using familiar Semantic Kernel abstractions while leveraging Couchbase's vector indexing for high-performance semantic search.

 The connector supports three index types:
 - **Hyperscale Vector Index** - for pure vector search at scale ← *Used in this demo*
 - **Composite Vector Index** - for vector search with heavy scalar filtering
-- **FTS** (Full-Text Search) - for hybrid text + semantic search
+- **Search Vector Index** (using Search service) - for hybrid text + semantic search

 This makes the connector ideal for RAG (Retrieval-Augmented Generation) applications, semantic search engines, hybrid search, and recommendation systems.

 ## Prerequisites

-### 1. Couchbase Server Setup
+### Couchbase Server Setup
 - **Couchbase Server 8.0+**
 - Local installation or Couchbase Cloud/Capella
 - Bucket with proper read/write permissions
 - Query service enabled for SQL++ operations

-### 2. OpenAI API Access
+### OpenAI API Access
 - **OpenAI API Key** - Get one from: https://platform.openai.com/api-keys
 - Used for generating text embeddings with `text-embedding-3-small` model
 - Ensure you have sufficient API quota for embedding generation
 cd couchbase-semantic-kernel-quickstart/CouchbaseVectorSearchDemo
 ```

-### 2. Install Dependencies
+### Install Dependencies
 ```bash
 dotnet restore
 ```

-### 3. Configuration Setup
+### Configuration Setup

 Update `appsettings.Development.json` with your credentials:

@@ -65,17 +66,7 @@ Update `appsettings.Development.json` with your credentials:
 }
 ```

-### 4. Prepare Couchbase
-
-Ensure you have the bucket, scope, and collection ready in Couchbase:
-- **Bucket**: `demo`
-- **Scope**: `semantic-kernel`
-- **Collection**: `glossary`
-
-### 5. Run the Application
-```bash
-dotnet run
-```
+> **Note**: The `BucketName`, `ScopeName`, and `CollectionName` values can be changed to match your Couchbase setup, but you'll need to update the corresponding code references in the demo application accordingly.

 ## Understanding the Data Model

@@ -107,7 +98,14 @@ internal sealed class Glossary

 ## Step-by-Step Tutorial

-### Step 1: Data Ingestion and Embedding Generation
+### Prepare Couchbase
+
+Ensure you have the bucket, scope, and collection ready in Couchbase:
+- **Bucket**: `demo`
+- **Scope**: `semantic-kernel`
+- **Collection**: `glossary`
+
+### Data Ingestion and Embedding Generation

 This step demonstrates how the connector works with Semantic Kernel's vector store abstractions:

@@ -118,13 +116,12 @@ var collection = vectorStore.GetCollection<string, Glossary>(
     "glossary",
     new CouchbaseQueryCollectionOptions
     {
-        IndexName = "hyperscale_glossary_index", // Hyperscale index name
         SimilarityMetric = "cosine"
     }
 );
 ```

-The `CouchbaseQueryCollectionOptions` works with both Hyperscale and Composite indexes - simply specify the appropriate index name. For FTS indexes, use `CouchbaseSearchCollection` with `CouchbaseSearchCollectionOptions` instead.
+The `CouchbaseQueryCollectionOptions` works with both Hyperscale and Composite indexes. For Search Vector indexes, use `CouchbaseSearchCollection` with `CouchbaseSearchCollectionOptions` instead.

 **Automatic Embedding Generation** - The connector integrates with Semantic Kernel's `IEmbeddingGenerator` interface to automatically generate embeddings from text. When you provide an embedding generator (in this case, OpenAI's `text-embedding-3-small`), the text is automatically converted to vectors:
 This creates 6 sample glossary entries with technical terms, generates embeddings for each definition, and stores them in Couchbase with the following structure:

 **Document ID:** `"1"` (from Key field)
-
 **Document Content:**
 ```json
 {
@@ -155,9 +151,13 @@ This creates 6 sample glossary entries with technical terms, generates embedding
 }
 ```

-### Step 2: Hyperscale Index Creation
+### Hyperscale Index Creation

-This demo uses a **Hyperscale Vector Index** - optimized for pure vector searches without heavy scalar filtering. After documents are inserted, the demo creates the Hyperscale index:
+While the application works without creating indexes manually, you can optionally create a vector index for better performance.
+
+This demo uses a **Hyperscale Vector Index** - optimized for pure vector searches without heavy scalar filtering.
+
+After documents are inserted, the demo creates the Hyperscale index:

 ```sql
 CREATE VECTOR INDEX `hyperscale_glossary_index`
@@ -177,9 +177,9 @@ USING GSI WITH {
 - **Include Fields**: Non-vector fields for faster retrieval
 - **Quantization**: `IVF,SQ8` (Inverted File with 8-bit scalar quantization)

-> **Note**: Composite vector indexes can be created similarly by adding scalar fields to the index definition. Use Composite indexes when your queries frequently filter on scalar values before vector comparison. For this demo, we use Hyperscale since we're demonstrating pure semantic search capabilities.
+> **Note**: [Composite vector indexes](https://docs.couchbase.com/server/current/vector-index/composite-vector-index.html) can be created similarly by adding scalar fields to the index definition. Use Composite indexes when your queries frequently filter on scalar values before vector comparison. For this demo, we use Hyperscale since we are demonstrating pure semantic search capabilities.

-### Step 3: Vector Search Operations
+### Vector Search Operations

 The demo performs two types of searches using the connector's `SearchAsync()` method with the Hyperscale index:

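Regarding the `IVF,SQ8` quantization setting listed in the hunk above: the following is a minimal sketch of what 8-bit scalar quantization means in general, not Couchbase's actual implementation. The helper names `Quantize` and `Dequantize` are hypothetical and not part of the connector, and the sketch assumes a single min/max range taken from one vector, whereas a real index derives its ranges from the indexed data.

```csharp
using System;
using System.Linq;

// Hypothetical helper: map each float onto one of 256 levels between min and max.
static byte[] Quantize(float[] values, float min, float max)
{
    if (max <= min)
        throw new ArgumentException("max must be greater than min.");

    float scale = (max - min) / 255f;
    return values.Select(x => (byte)Math.Round((x - min) / scale)).ToArray();
}

// Hypothetical helper: recover an approximation of the original floats.
static float[] Dequantize(byte[] quantized, float min, float max)
{
    float scale = (max - min) / 255f;
    return quantized.Select(q => min + q * scale).ToArray();
}

// Illustrative 4-dimensional "embedding"; real embeddings have far more dimensions.
float[] embedding = { -0.42f, 0.07f, 0.31f, 0.88f };
float low = embedding.Min();
float high = embedding.Max();

byte[] codes = Quantize(embedding, low, high);       // 1 byte per dimension instead of 4
float[] restored = Dequantize(codes, low, high);     // close to, but not equal to, the input

Console.WriteLine(string.Join(", ", codes));
Console.WriteLine(string.Join(", ", restored));
```

Each dimension is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step per value. That trade of memory for a small loss of precision is the general idea behind the `SQ8` part of the setting.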
@@ -205,6 +205,8 @@ ORDER BY _distance ASC
 LIMIT 1
 ```

+> **Note**: The distance metric (`'cosine'` in this example) comes from the `SimilarityMetric` property configured when creating the collection:
+
 **Expected Result**: Finds "API" entry with high similarity

 #### Filtered Vector Search
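On the `'cosine'` metric and the `ORDER BY _distance ASC` clause in the hunk above: the sketch below shows how a cosine distance is commonly computed, assuming the usual definition of one minus cosine similarity. The `CosineDistance` function is a hypothetical illustration and is not taken from the connector or from Couchbase.

```csharp
using System;

// Hypothetical helper: cosine distance = 1 - cosine similarity.
// 0 means the vectors point the same way; larger values mean less similar.
static double CosineDistance(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }

    if (normA == 0 || normB == 0)
        throw new ArgumentException("Cosine distance is undefined for zero vectors.");

    return 1.0 - dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

float[] query = { 1f, 0f };
float[] same = { 2f, 0f };        // same direction, different length
float[] orthogonal = { 0f, 3f };  // unrelated direction

Console.WriteLine(CosineDistance(query, same));        // 0
Console.WriteLine(CosineDistance(query, orthogonal));  // 1
```

Because a smaller distance means a closer match, sorting ascending and taking the first row returns the most similar document, which is consistent with the `LIMIT 1` query shown in the diff.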
@@ -248,7 +250,7 @@ Couchbase offers three types of vector indexes optimized for different use cases
 - Designed to scale to billions of vectors with low memory footprint
 - Optimized for high-performance concurrent operations
 - **Creation**: Using Search Service index configuration with vector field support

+
 All three index types work with the same Semantic Kernel abstractions (`SearchAsync()`, `UpsertAsync()`, etc.). The main difference is which collection class you instantiate and the underlying query engine.

 **Choosing the Right Type**:
 - Start with **Hyperscale** for pure vector searches and large datasets
 - Use **Composite** when scalar filters eliminate large portions of data before vector comparison
-- Use **FTS** when you need hybrid search combining full-text and semantic search
+- Use **Search Vector Index** when you need hybrid search combining full-text and semantic search

 For more details, see the [Couchbase Vector Index Documentation](https://docs.couchbase.com/server/current/vector-index/use-vector-indexes.html).

+
 ### Index Configuration (Couchbase 8.0+)

 The `description` parameter in the index definition controls vector storage optimization through centroids and quantization:
@@ -298,6 +302,12 @@ For detailed configuration options, see the [Quantization & Centroid Settings](h

 ## Running the Demo

+### Build and Execute
+```bash
+dotnet build
+dotnet run
+```
+
 ### Expected Output
 ```
 Couchbase Hyperscale Vector Search Demo
@@ -308,7 +318,7 @@ Data ingestion completed

 Step 2: Creating Hyperscale vector index manually...
 Executing Hyperscale index creation query...
-Hyperscale vector index 'hyperscale_glossary_index' already exists.
+Hyperscale vector index 'hyperscale_glossary_index' created successfully!

 Step 3: Performing vector search...
 Found: API
@@ -341,7 +351,7 @@ The Couchbase Semantic Kernel connector provides a seamless integration between
 **Vector Store Classes:**
 - **`CouchbaseVectorStore`** - Main entry point for vector store operations
 - **`CouchbaseQueryCollection`** - Collection class for Hyperscale and Composite indexes (SQL++)
-- **`CouchbaseSearchCollection`** - Collection class for FTS indexes (Search API)
+- **`CouchbaseSearchCollection`** - Collection class for Search Vector indexes (Search, formerly known as Full Text service)

 **Common Methods (all index types):**
 - **`GetCollection<TKey, TRecord>()`** - Returns a typed collection for CRUD operations
@@ -351,7 +361,7 @@ The Couchbase Semantic Kernel connector provides a seamless integration between

 **Configuration Options:**
 - **`CouchbaseQueryCollectionOptions`** - For Hyperscale and Composite indexes
-- **`CouchbaseSearchCollectionOptions`** - For FTS indexes
+- **`CouchbaseSearchCollectionOptions`** - For Search Vector indexes