Commit 0c25999

Merge pull request #48 from rryam/feature/nl-contextual-embedding
Add NaturalLanguage framework support via VecturaNLKit
2 parents 81aaf7c + 16383e6 commit 0c25999

6 files changed: +834 -2 lines changed

Package.swift

Lines changed: 18 additions & 0 deletions
@@ -21,6 +21,10 @@ let package = Package(
       name: "VecturaMLXKit",
       targets: ["VecturaMLXKit"]
     ),
+    .library(
+      name: "VecturaNLKit",
+      targets: ["VecturaNLKit"]
+    ),
     .executable(
       name: "vectura-cli",
       targets: ["VecturaCLI"]
@@ -53,6 +57,12 @@ let package = Package(
         .product(name: "MLXEmbedders", package: "mlx-swift-lm"),
       ]
     ),
+    .target(
+      name: "VecturaNLKit",
+      dependencies: [
+        "VecturaKit"
+      ]
+    ),
     .executableTarget(
       name: "VecturaCLI",
       dependencies: [
@@ -76,6 +86,10 @@ let package = Package(
       name: "TestMLXExamples",
       dependencies: ["VecturaMLXKit"]
     ),
+    .executableTarget(
+      name: "TestNLExamples",
+      dependencies: ["VecturaNLKit"]
+    ),
     .testTarget(
       name: "VecturaKitTests",
       dependencies: ["VecturaKit"]
@@ -84,6 +98,10 @@ let package = Package(
       name: "VecturaMLXKitTests",
       dependencies: ["VecturaMLXKit"]
     ),
+    .testTarget(
+      name: "VecturaNLKitTests",
+      dependencies: ["VecturaNLKit"]
+    ),
     .testTarget(
       name: "PerformanceTests",
       dependencies: ["VecturaKit"],
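
With the `VecturaNLKit` library product declared in the manifest, a consuming package could depend on it roughly as follows. This is a minimal sketch, not taken from the commit: the package name `MyApp`, the repository URL, the `branch: "main"` requirement, and the `VecturaKit` product name are assumptions for illustration. The platform minimums mirror the README's stated requirements.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
  name: "MyApp", // hypothetical consumer package
  platforms: [.macOS(.v14), .iOS(.v17)],
  dependencies: [
    // Assumed URL and branch; a real project would pin a tagged release.
    .package(url: "https://github.com/rryam/VecturaKit.git", branch: "main")
  ],
  targets: [
    .executableTarget(
      name: "MyApp",
      dependencies: [
        // "VecturaKit" as a product name is an assumption; "VecturaNLKit" is declared in the diff.
        .product(name: "VecturaKit", package: "VecturaKit"),
        .product(name: "VecturaNLKit", package: "VecturaKit")
      ]
    )
  ]
)
```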

README.md

Lines changed: 126 additions & 2 deletions
@@ -1,10 +1,10 @@
 # VecturaKit
 
-VecturaKit is a Swift-based vector database designed for on-device apps through local vector storage and retrieval.
+VecturaKit is a Swift-based vector database designed for on-device apps through local vector storage and retrieval.
 
 Inspired by [Dripfarm's SVDB](https://github.com/Dripfarm/SVDB), **VecturaKit** uses `MLTensor` and [`swift-embeddings`](https://github.com/jkrukowski/swift-embeddings) for generating and managing embeddings. It features **Model2Vec** support with the 32M parameter model as default for fast static embeddings.
 
-The framework offers `VecturaKit` as the core vector database with pluggable embedding providers. Use `SwiftEmbedder` for `swift-embeddings` integration or `MLXEmbedder` for Apple's MLX framework acceleration.
+The framework offers `VecturaKit` as the core vector database with pluggable embedding providers. Use `SwiftEmbedder` for `swift-embeddings` integration, `MLXEmbedder` for Apple's MLX framework acceleration, or `NLContextualEmbedder` for Apple's NaturalLanguage framework with zero external dependencies.
 
 It also includes CLI tools (`vectura-cli` and `vectura-mlx-cli`) for easily trying out the package.
 
@@ -55,6 +55,12 @@ Explore the following books to understand more about AI and iOS development:
   - [Add Documents](#add-documents-1)
   - [Search Documents](#search-documents-1)
   - [Document Management](#document-management-1)
+- [NaturalLanguage Integration](#naturallanguage-integration)
+  - [Import NaturalLanguage Support](#import-naturallanguage-support)
+  - [Initialize Database with NLContextualEmbedding](#initialize-database-with-nlcontextualembedding)
+  - [Add Documents](#nl-add-documents)
+  - [Search Documents](#nl-search-documents)
+  - [Document Management](#nl-document-management)
 - [Command Line Interface](#command-line-interface)
   - [Swift CLI Tool (`vectura-cli`)](#swift-cli-tool-vectura-cli)
   - [MLX CLI Tool (`vectura-mlx-cli`)](#mlx-cli-tool-vectura-mlx-cli)
@@ -76,6 +82,7 @@ Explore the following books to understand more about AI and iOS development:
 - **Custom Storage Provider:** Implements custom storage backends (SQLite, Core Data, cloud storage) by conforming to the `VecturaStorage` protocol.
 - **Memory Management Strategies:** Choose between automatic, full-memory, or indexed modes to optimize performance for datasets ranging from thousands to millions of documents. [Learn more](Docs/INDEXED_STORAGE_GUIDE.md)
 - **MLX Support:** Uses Apple's MLX framework for accelerated embedding generation through `MLXEmbedder`.
+- **NaturalLanguage Support:** Uses Apple's NaturalLanguage framework for contextual embeddings with zero external dependencies through `NLContextualEmbedder`.
 - **CLI Tools:** Includes `vectura-cli` (Swift embeddings) and `vectura-mlx-cli` (MLX embeddings) for database management and testing.
 
 ## Supported Platforms
@@ -106,6 +113,8 @@ VecturaKit uses the following Swift packages:
 - [swift-argument-parser](https://github.com/apple/swift-argument-parser): Used for creating the command-line interface.
 - [mlx-swift-examples](https://github.com/ml-explore/mlx-swift-examples): Provides MLX-based embeddings and vector search capabilities, specifically for `VecturaMLXKit`.
 
+**Note:** `VecturaNLKit` has no external dependencies beyond Apple's native NaturalLanguage framework.
+
 ## Quick Start
 
 Get up and running with VecturaKit in minutes. Here is an example of adding and searching documents:
@@ -480,6 +489,121 @@ Reset database:
 try await vectorDB.reset()
 ```
 
+## NaturalLanguage Integration
+
+VecturaKit supports Apple's NaturalLanguage framework through the `NLContextualEmbedder` for contextual embeddings with zero external dependencies.
+
+### Import NaturalLanguage Support
+
+```swift
+import VecturaKit
+import VecturaNLKit
+```
+
+### Initialize Database with NLContextualEmbedding
+
+```swift
+let config = try VecturaConfig(
+  name: "my-nl-vector-db",
+  dimension: nil // Auto-detect dimension from NL embedder
+)
+
+// Create NLContextualEmbedder
+let embedder = try await NLContextualEmbedder(
+  language: .english
+)
+let vectorDB = try await VecturaKit(config: config, embedder: embedder)
+```
+
+**Available Options:**
+
+```swift
+// Initialize with specific language
+let embedder = try await NLContextualEmbedder(
+  language: .spanish
+)
+
+// Get model information
+let modelInfo = await embedder.modelInfo
+print("Language: \(modelInfo.language)")
+if let dimension = modelInfo.dimension {
+  print("Dimension: \(dimension)")
+} else {
+  print("Dimension: Not yet determined")
+}
+```
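
The `modelInfo.dimension` check in the README snippet implies that the dimension may be unknown until an embedding has been produced. The pattern can be sketched as below, under stated assumptions: `LazyDimensionEmbedder` is a hypothetical type that is not part of VecturaKit, and an injected closure stands in for a real model. The commit does not show how `NLContextualEmbedder` implements this internally.

```swift
import Foundation

/// Hypothetical stand-in for an embedder whose dimension is only known
/// after the first embedding has been computed.
struct LazyDimensionEmbedder {
  /// Stays nil until `embed(text:)` has run at least once.
  private(set) var dimension: Int?
  private let embedImpl: (String) -> [Float]

  init(embed: @escaping (String) -> [Float]) {
    self.embedImpl = embed
  }

  /// Embeds the text and records the vector length on first use.
  mutating func embed(text: String) -> [Float] {
    let vector = embedImpl(text)
    if dimension == nil {
      dimension = vector.count
    }
    return vector
  }
}
```

A caller that needs the dimension up front can embed a short probe string once and read `dimension` afterwards.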
+
+### <a name="nl-add-documents"></a>Add Documents
+
+```swift
+let texts = [
+  "Natural language understanding is fascinating",
+  "Swift makes iOS development enjoyable",
+  "Machine learning on device preserves privacy"
+]
+let documentIds = try await vectorDB.addDocuments(texts: texts)
+```
+
+### <a name="nl-search-documents"></a>Search Documents
+
+```swift
+let results = try await vectorDB.search(
+  query: "iOS programming",
+  numResults: 5, // Optional
+  threshold: 0.7 // Optional
+)
+
+for result in results {
+  print("Document ID: \(result.id)")
+  print("Text: \(result.text)")
+  print("Similarity Score: \(result.score)")
+  print("Created At: \(result.createdAt)")
+}
+```
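
To make the roles of `numResults` and `threshold` concrete, here is a minimal sketch of threshold-filtered ranking. It assumes the similarity score is cosine similarity, which the diff does not state. `cosineSimilarity` and `topMatches` are hypothetical helpers written for illustration, not VecturaKit APIs.

```swift
import Foundation

/// Cosine similarity of two vectors; returns 0 for empty, mismatched, or zero-norm input.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
  guard a.count == b.count, !a.isEmpty else { return 0 }
  var dot: Float = 0
  var normA: Float = 0
  var normB: Float = 0
  for (x, y) in zip(a, b) {
    dot += x * y
    normA += x * x
    normB += y * y
  }
  let denominator = normA.squareRoot() * normB.squareRoot()
  return denominator > 0 ? dot / denominator : 0
}

/// Hypothetical helper: scores every candidate, drops those below `threshold`,
/// sorts by descending score, and keeps at most `numResults` entries.
func topMatches(
  query: [Float],
  candidates: [[Float]],
  numResults: Int,
  threshold: Float
) -> [(index: Int, score: Float)] {
  let scored = candidates.enumerated()
    .map { (index: $0.offset, score: cosineSimilarity(query, $0.element)) }
    .filter { $0.score >= threshold }
    .sorted { $0.score > $1.score }
  return Array(scored.prefix(max(0, numResults)))
}
```

Under this assumption, a threshold of `0.7` keeps only candidates within roughly 45 degrees of the query vector.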
+
+### <a name="nl-document-management"></a>Document Management
+
+Update document:
+
+```swift
+try await vectorDB.updateDocument(
+  id: documentId,
+  newText: "Updated text"
+)
+```
+
+Delete documents:
+
+```swift
+try await vectorDB.deleteDocuments(ids: [documentId1, documentId2])
+```
+
+Reset database:
+
+```swift
+try await vectorDB.reset()
+```
+
+**Key Features:**
+
+- **Zero External Dependencies:** Uses only Apple's native NaturalLanguage framework
+- **Contextual Embeddings:** Considers surrounding context for more accurate semantic understanding
+- **Privacy-First:** All processing happens on-device
+- **Language Support:** Supports multiple languages (English, Spanish, French, German, Italian, Portuguese, and more)
+- **Auto-Detection:** Automatically detects embedding dimensions
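
A contextual model produces one vector per token, so some reduction step is needed to get a single fixed-size vector per document. The commit excerpt does not show which reduction `NLContextualEmbedder` uses. The sketch below assumes mean pooling, a common choice, and `meanPool` is a hypothetical helper rather than part of VecturaNLKit.

```swift
import Foundation

/// Averages per-token vectors into one fixed-size vector (mean pooling).
/// Returns nil when there are no tokens or the token dimensions disagree.
func meanPool(_ tokenVectors: [[Double]]) -> [Float]? {
  guard let dimension = tokenVectors.first?.count, dimension > 0 else {
    return nil
  }
  var sums = [Double](repeating: 0, count: dimension)
  for vector in tokenVectors {
    guard vector.count == dimension else { return nil }
    for (i, value) in vector.enumerated() {
      sums[i] += value
    }
  }
  let count = Double(tokenVectors.count)
  return sums.map { Float($0 / count) }
}
```

Because the pooled vector has the same length as each token vector, its `count` is also the value a dimension auto-detection step would observe.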
+
+**Performance Characteristics:**
+
+- **Speed:** Moderate (slower than Model2Vec, comparable to MLX)
+- **Accuracy:** High contextual understanding for supported languages
+- **Memory:** Efficient on-device processing
+- **Use Cases:** Ideal for apps requiring semantic search without external dependencies
+
+**Platform Requirements:**
+
+- iOS 17.0+ / macOS 14.0+ / tvOS 17.0+ / visionOS 1.0+ / watchOS 10.0+
+- NaturalLanguage framework (included with OS)
+
 ## Command Line Interface
 
 VecturaKit includes command-line tools for database management with different embedding backends.
Lines changed: 176 additions & 0 deletions
@@ -0,0 +1,176 @@
+// Test script for VecturaNLKit README examples
+import Foundation
+import VecturaKit
+import VecturaNLKit
+
+enum ExampleError: Error, CustomStringConvertible {
+  case noDocumentsToUpdate
+
+  var description: String {
+    switch self {
+    case .noDocumentsToUpdate:
+      return "No documents available to update"
+    }
+  }
+}
+
+@main
+struct TestNLExamples {
+  static func main() async throws {
+    try await initializeEmbedder()
+    let embedder = try await NLContextualEmbedder(language: .english)
+    let vectorDB = try await initializeDatabase(embedder: embedder)
+    let documentIds = try await addDocuments(to: vectorDB)
+    try await searchDocuments(in: vectorDB)
+    try await testContextualUnderstanding(in: vectorDB)
+    try await manageDocuments(in: vectorDB, documentIds: documentIds)
+    try await testEmbeddings(embedder: embedder)
+    try await resetDatabase(vectorDB)
+
+    debugPrint("\n✅ All VecturaNLKit examples completed successfully!")
+  }
+
+  private static func initializeEmbedder() async throws {
+    debugPrint("1. Initialize NLContextualEmbedder")
+
+    let embedder = try await NLContextualEmbedder(language: .english)
+    let dimension = try await embedder.dimension
+    debugPrint("NLContextualEmbedder initialized successfully")
+    debugPrint("Embedding dimension: \(dimension)")
+
+    let modelInfo = await embedder.modelInfo
+    debugPrint("Model language: \(modelInfo.language.rawValue)")
+    if let dimension = modelInfo.dimension {
+      debugPrint("Model dimension: \(dimension)")
+    }
+  }
+
+  private static func initializeDatabase(embedder: NLContextualEmbedder) async throws -> VecturaKit {
+    debugPrint("\n2. Initialize Database")
+
+    let config = try VecturaConfig(name: "test-nl-vector-db")
+    let vectorDB = try await VecturaKit(
+      config: config,
+      embedder: embedder
+    )
+    debugPrint("NL Database initialized successfully")
+    debugPrint("Document count: \(try await vectorDB.documentCount)")
+
+    return vectorDB
+  }
+
+  private static func addDocuments(to vectorDB: VecturaKit) async throws -> [UUID] {
+    debugPrint("\n3. Add Documents")
+
+    let texts = [
+      "Natural language understanding is fascinating",
+      "Swift makes iOS development enjoyable",
+      "Machine learning on device preserves privacy",
+      "Vector databases enable semantic search"
+    ]
+    let documentIds = try await vectorDB.addDocuments(texts: texts)
+    debugPrint("Documents added with IDs: \(documentIds)")
+    debugPrint("Total document count: \(try await vectorDB.documentCount)")
+
+    return documentIds
+  }
+
+  private static func searchDocuments(in vectorDB: VecturaKit) async throws {
+    debugPrint("\n4. Search Documents")
+
+    let results = try await vectorDB.search(
+      query: "iOS programming",
+      numResults: 5, // Optional
+      threshold: 0.7 // Optional
+    )
+
+    debugPrint("Search found \(results.count) results:")
+    for result in results {
+      debugPrint("ID: \(result.id)")
+      debugPrint("Text: \(result.text)")
+      debugPrint("Score: \(result.score)")
+      debugPrint("Created At: \(result.createdAt)")
+      debugPrint("---")
+    }
+  }
+
+  private static func testContextualUnderstanding(in vectorDB: VecturaKit) async throws {
+    debugPrint("\n5. Test Contextual Understanding")
+
+    let semanticResults = try await vectorDB.search(
+      query: "building apps for Apple platforms",
+      numResults: 3,
+      threshold: 0.6
+    )
+
+    debugPrint("Semantic search found \(semanticResults.count) results:")
+    for result in semanticResults {
+      debugPrint("Text: \(result.text)")
+      debugPrint("Score: \(result.score)")
+      debugPrint("---")
+    }
+  }
+
+  private static func manageDocuments(in vectorDB: VecturaKit, documentIds: [UUID]) async throws {
+    debugPrint("\n6. Document Management")
+
+    guard let documentToUpdate = documentIds.first else {
+      throw ExampleError.noDocumentsToUpdate
+    }
+
+    debugPrint("Updating document...")
+    try await vectorDB.updateDocument(
+      id: documentToUpdate,
+      newText: "Apple's frameworks enable powerful on-device AI"
+    )
+    debugPrint("Document updated")
+
+    // Verify update by searching
+    let updatedResults = try await vectorDB.search(
+      query: "on-device AI",
+      threshold: 0.6
+    )
+    debugPrint("Verification: Found \(updatedResults.count) documents related to 'on-device AI'")
+    if let first = updatedResults.first {
+      debugPrint("Top result: \(first.text)")
+    }
+
+    debugPrint("\nDeleting documents...")
+    let idsToDelete = documentIds.count >= 2
+      ? [documentToUpdate, documentIds[1]]
+      : [documentToUpdate]
+    try await vectorDB.deleteDocuments(ids: idsToDelete)
+    debugPrint("Documents deleted")
+    debugPrint("Document count after deletion: \(try await vectorDB.documentCount)")
+  }
+
+  private static func testEmbeddings(embedder: NLContextualEmbedder) async throws {
+    debugPrint("\n7. Test Single Embedding")
+
+    let singleText = "Testing NLContextualEmbedding"
+    let singleEmbedding = try await embedder.embed(text: singleText)
+    debugPrint("Generated embedding for: '\(singleText)'")
+    debugPrint("Embedding length: \(singleEmbedding.count)")
+    debugPrint("First 5 values: \(Array(singleEmbedding.prefix(5)))")
+
+    debugPrint("\n8. Test Batch Embedding")
+
+    let batchTexts = [
+      "First test",
+      "Second test",
+      "Third test"
+    ]
+    let batchEmbeddings = try await embedder.embed(texts: batchTexts)
+    debugPrint("Generated \(batchEmbeddings.count) embeddings")
+    for (index, embedding) in batchEmbeddings.enumerated() {
+      debugPrint("Embedding \(index + 1): length = \(embedding.count)")
+    }
+  }
+
+  private static func resetDatabase(_ vectorDB: VecturaKit) async throws {
+    debugPrint("\nResetting database...")
+    try await vectorDB.reset()
+    debugPrint("Database reset")
+    debugPrint("Document count after reset: \(try await vectorDB.documentCount)")
+  }
+}
