271 changes: 259 additions & 12 deletions modus/search.mdx
@@ -4,17 +4,264 @@
"og:title": "Search - Modus"
---

The Modus Collections API provides a robust way to store, retrieve, and search
through data using both natural language and vector-based search methods. By
leveraging embeddings, developers can enable semantic and similarity-based
searches, improving the relevance of search results within their applications.

Vector search is a powerful technique that transforms data (like text, images,
or audio) into numerical representations called embeddings. These embeddings
capture the semantic meaning of the content in a multi-dimensional space and
position similar items closer together. When performing a search, the query is
also converted into an embedding, and the system finds items whose embeddings
are closest to the query embedding. This approach offers significant benefits
over traditional keyword-based search, including improved relevance by capturing
context and semantics, enhanced precision by understanding user intent, and the
ability to handle complex queries with higher accuracy. Vector search is
particularly effective for applications like semantic search, recommendation
systems, and retrieval augmented generation (RAG), optimizing both efficiency
and accuracy in finding and retrieving data based on meaningful similarity
rather than exact matches.
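To make the idea concrete, here is a minimal brute-force sketch in plain TypeScript. It assumes the embeddings already exist: the three-dimensional vectors and the `nearest` helper are hypothetical, for illustration only. A vector index such as HNSW avoids comparing the query against every stored item, but the notion of "closest" is the same.

```ts
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimensionality");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Item {
  text: string;
  embedding: number[];
}

// Brute-force search: score every item against the query and keep the top K.
function nearest(query: number[], items: Item[], topK: number): Item[] {
  return items
    .map((item) => ({ item, score: cosineSimilarity(query, item.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.item);
}

// Hypothetical toy embeddings; real models produce hundreds of dimensions.
const catalog: Item[] = [
  { text: "luxury sports car in red", embedding: [0.9, 0.1, 0.0] },
  { text: "high-speed car with sleek design", embedding: [0.8, 0.3, 0.1] },
  { text: "wooden garden bench", embedding: [0.0, 0.2, 0.9] },
];
// Stands in for the embedding of "sleek red sports car".
const queryEmbedding = [0.85, 0.2, 0.05];

console.log(nearest(queryEmbedding, catalog, 2).map((item) => item.text));
```

The two car descriptions rank above the garden bench because their vectors point in nearly the same direction as the query vector.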

For example, with natural language similarity, if you search for a product
description like "sleek red sports car", the search method returns similar
product descriptions such as "luxury sports car in red" or "high-speed car with
sleek design".

## Options for implementing natural language search with Modus

Options for adding natural language search to your Modus app include:

1. [**Using Dgraph's vector search feature**](#natural-language-search-with-dgraph-and-modus)
   Take advantage of Dgraph's vector search, backed by a Hierarchical Navigable
   Small World (HNSW) index, for highly scalable natural language search. This
   approach also supports more sophisticated retrieval patterns that combine
   vector search with graph traversals, such as GraphRAG.
2. [**Using Modus's built-in Collection data structure**](#natural-language-search-with-modus-collections)
   Modus Collections can automatically generate embeddings for new data and
   work well for medium-sized datasets.

## Natural language search with Dgraph and Modus

The steps to implement natural language search with Dgraph include defining the
Dgraph connection in your Modus app manifest, selecting and configuring an
embedding model, declaring a vector index in the Dgraph DQL schema, and using
the `similar_to` DQL function to search for similar text in vector space.

The example in this section stores, retrieves, updates, and deletes product
information while enabling semantic similarity searches using text embeddings
generated by a machine learning model.

### Declare Dgraph connection and Hypermode embedding model

First, update the Modus app manifest file `modus.json` to define the connection
to your Dgraph instance and the embedding model that converts text into vectors:

```json
{
  "$schema": "https://schema.hypermode.com/modus.json",
  "endpoints": {
    "default": {
      "type": "graphql",
      "path": "/graphql",
      "auth": "bearer-token"
    }
  },
  "models": {
    "minilm": {
      "sourceModel": "sentence-transformers/all-MiniLM-L6-v2",
      "connection": "hypermode",
      "provider": "hugging-face"
    }
  },
  "connections": {
    "dgraph-grpc": {
      "type": "dgraph",
      "grpcTarget": "localhost:9080"
    }
  }
}
```

<Note>
To use Hypermode-hosted models in the local Modus development environment, use
the `hyp` CLI to connect your local environment with your Hypermode account. See
the [Using Hypermode-hosted models](run-locally#using-hypermode-hosted-models)
docs page for more information.
</Note>

### Data modeling

Define your data model using classes with decorators for automatic
serialization and deserialization. The `@json` decorator enables JSON
serialization, `@alias` maps property names to Dgraph predicate names such as
`Product.title`, and `@omitnull` leaves a field out of the JSON when its value
is null:

```ts
@json
export class Product {
  @alias("Product.id")
  id!: string

  @alias("Product.title")
  title: string = ""

  @alias("Product.description")
  description: string = ""

  @alias("Product.category")
  @omitnull()
  category: Category | null = null
}

@json
export class Category {
  @alias("Category.name")
  name: string = ""
}
```
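For readers new to these decorators, the following plain TypeScript sketch approximates the JSON shape a `Product` serializes to: each field is renamed to its Dgraph predicate, and a null category is dropped. It is not the JSON library the decorators come from; the `toDgraphJson` helper and the interfaces are hypothetical stand-ins for illustration.

```ts
// Plain TypeScript stand-ins for the decorated classes above.
interface Category {
  name: string;
}

interface Product {
  id: string;
  title: string;
  description: string;
  category: Category | null;
}

// Mimics @alias (rename fields to "Type.field" predicates) and
// @omitnull (drop the category key when it is null).
function toDgraphJson(product: Product): string {
  const out: Record<string, unknown> = {
    "Product.id": product.id,
    "Product.title": product.title,
    "Product.description": product.description,
  };
  if (product.category !== null) {
    out["Product.category"] = { "Category.name": product.category.name };
  }
  return JSON.stringify(out);
}

console.log(
  toDgraphJson({ id: "p1", title: "Roadster", description: "sleek red sports car", category: null }),
);
```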

### Embedding integration

Create an embedding function that uses a transformer model (here, the `minilm`
model declared in the manifest) to convert product descriptions and search
queries into vectors:

```ts
import { models } from "@hypermode/modus-sdk-as"
import { EmbeddingsModel } from "@hypermode/modus-sdk-as/models/experimental/embeddings"

// Name of the embedding model declared in the "models" section of modus.json
const EMBEDDING_MODEL = "minilm"

// Convert a batch of texts into embedding vectors, one vector per input text
export function embedText(content: string[]): f32[][] {
  const model = models.getModel<EmbeddingsModel>(EMBEDDING_MODEL)
  const input = model.createInput(content)
  const output = model.invoke(input)
  return output.predictions
}
```

### Dgraph `similar_to` query function

Create utility functions to interact with Dgraph, such as functions that inject
UIDs into JSON payloads, retrieve entities by property, delete node predicates,
and perform similarity searches. The similarity search helper below builds a DQL
query around `similar_to`, and the `searchProducts` Modus function uses it to
find the products closest to a text query:

```ts
import { dgraph } from "@hypermode/modus-sdk-as"

// Name of the Dgraph connection declared in modus.json
const DGRAPH_CONNECTION = "dgraph-grpc"

// Wrapper for query responses of the form { "list": [...] }
@json
class ListOf<T> {
  list: T[] = []
}

/**
 * Return up to topK nodes whose vector predicate is closest to the embedding
 */
export function searchBySimilarity<T>(
  connection: string,
  embedding: f32[],
  predicate: string,
  body: string,
  topK: i32,
): T[] {
  // similar_to uses the HNSW index on `predicate` to find the topK nearest nodes.
  // For unit-length vectors, 1 - (squared distance / 2) is the cosine similarity.
  const query = new dgraph.Query(`
    query search($vector: float32vector) {
      var(func: similar_to(${predicate}, ${topK}, $vector)) {
        vemb as ${predicate}
        dist as math((vemb - $vector) dot (vemb - $vector))
        score as math(1 - (dist / 2.0))
      }

      list(func: uid(score), orderdesc: val(score)) @filter(gt(val(score), 0.25)) {
        ${body}
      }
    }`).withVariable("$vector", embedding)

  const response = dgraph.executeQuery(connection, query)
  return JSON.parse<ListOf<T>>(response.Json).list
}

/**
 * Search products by similarity to a given text
 */
export function searchProducts(search: string): Product[] {
  const embedding = embedText([search])[0]
  const topK = 3
  const body = `
    Product.id
    Product.description
    Product.title
    Product.category {
      Category.name
    }
  `
  return searchBySimilarity<Product>(
    DGRAPH_CONNECTION,
    embedding,
    "Product.embedding",
    body,
    topK,
  )
}
```
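The `score` expression in the query relies on an identity for unit-length vectors: the squared Euclidean distance between two unit vectors equals 2 minus twice their cosine similarity, so `1 - dist / 2` recovers the cosine similarity. The plain TypeScript sketch below checks this and mirrors the `0.25` cutoff from the `@filter`. It normalizes the vectors explicitly, because whether a given model returns unit-length embeddings is an assumption here; the helper names are hypothetical.

```ts
// Scale a vector to unit length (L2 normalization).
function normalize(v: number[]): number[] {
  const length = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return v.map((x) => x / length);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Same arithmetic as the DQL query:
//   dist  = (v - q) dot (v - q)   squared Euclidean distance
//   score = 1 - dist / 2          cosine similarity when |v| = |q| = 1
function score(v: number[], q: number[]): number {
  const diff = v.map((x, i) => x - q[i]);
  const dist = dot(diff, diff);
  return 1 - dist / 2;
}

const q = normalize([3, 4]);
const v = normalize([4, 3]);

// Both vectors have unit length, so their dot product is the cosine similarity.
const cosine = dot(q, v);
console.log(score(v, q).toFixed(4), cosine.toFixed(4));

// Mirror @filter(gt(val(score), 0.25)): keep only sufficiently similar vectors.
const candidates = [normalize([4, 3]), normalize([4, -3]), normalize([3, 4.5])];
const kept = candidates.filter((c) => score(c, q) > 0.25);
console.log(kept.length);
```

Without normalization the identity no longer holds, and the score can drift outside the usual cosine range.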

### Define Modus mutation functions

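Implement a Modus function that adds a new product or updates an existing one.
Before writing to Dgraph, it generates an embedding from the product description
and adds the vector to the mutation payload:

```ts
/**
 * Add a new product or update an existing one, including its embedding
 */
export function upsertProduct(product: Product): Map<string, string> | null {
  // buildProductMutationJson and addEmbeddingToJson are helper utilities
  // that are not shown on this page
  let payload = buildProductMutationJson(DGRAPH_CONNECTION, product)

  // Embed the description so the product is findable by similarity search
  const embedding = embedText([product.description])[0]
  payload = addEmbeddingToJson(payload, "Product.embedding", embedding)

  const mutation = new dgraph.Mutation(payload)
  const response = dgraph.executeMutations(DGRAPH_CONNECTION, mutation)

  // Map of blank-node names to the UIDs Dgraph assigned, if any
  return response.Uids
}
```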
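The `addEmbeddingToJson` helper isn't shown on this page. As a rough illustration only, a plain TypeScript version could parse the payload, attach the vector under the given predicate, and serialize the result again. This sketch assumes the payload is a single JSON object and that Dgraph accepts a `float32vector` value in a JSON mutation as a string such as `"[0.5,0.25]"`; check the Dgraph documentation for the exact format. The name `addEmbeddingToJsonSketch` is hypothetical.

```ts
// Hypothetical stand-in for an addEmbeddingToJson helper.
// Assumes the payload is one JSON object and that Dgraph reads a
// float32vector from a string such as "[0.5,0.25]".
function addEmbeddingToJsonSketch(
  payload: string,
  predicate: string,
  embedding: number[],
): string {
  const node = JSON.parse(payload) as Record<string, unknown>;
  // JSON.stringify on a number array produces the "[x,y,...]" string form.
  node[predicate] = JSON.stringify(embedding);
  return JSON.stringify(node);
}

const payload = JSON.stringify({ uid: "_:product", "Product.title": "Roadster" });
console.log(addEmbeddingToJsonSketch(payload, "Product.embedding", [0.5, 0.25]));
```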

### Define Dgraph schema

Dgraph works without a predefined schema, but to use its vector search
capability you must declare a schema that creates an index on the
`Product.embedding` predicate.

To enable vector indexing, add the `@index(hnsw)` directive to the predicate that
stores the embedding value, in this case `Product.embedding`. The schema also
declares the types and indexes of the other predicates.

```rdf
<Category.name>: string @index(hash) .
<Product.category>: uid @reverse .
<Product.description>: string .
<Product.id>: string @index(hash) .
<Product.embedding>: float32vector @index(hnsw) .
```

To apply this schema, save it to a file (for example, `dqlschema.txt`) and send
a POST request to the `/alter` endpoint of your Dgraph instance:

```bash
curl -X POST localhost:8080/alter --silent --data-binary '@dqlschema.txt'
```

Alternatively, apply the schema from the Schema tab of the Ratel interface.

### Query Modus endpoint

```graphql
TODO: GraphQL query
```

### Resources

- [Video on YouTube](https://www.youtube.com/watch?v=Z2fB-nBf4Wo)
- [Source code for the `dgraph-101` recipe](https://github.com/hypermodeinc/modus-recipes/tree/main/dgraph-101)

## Natural language search with Modus Collections

The Modus Collections API provides a robust way to store, retrieve, and search
through data using both natural language and vector-based search methods.

### Understanding key components

**Collections**: a collection is structured storage that organizes and stores
textual data and associated metadata. Collections enable sophisticated search,
@@ -34,7 +281,7 @@
configuration, when you add or update items.
</Note>

### Initializing your collection

Before implementing search, ensure you have
[defined a collection in the app manifest](./app-manifest#collections). In this
@@ -79,12 +326,12 @@

</CodeGroup>

### Configure your search method

The search capability relies on a search method and an embedding function. To
configure your search method, first create an embedding function and then
declare the search method.

#### Create an embedding function

An embedding function is any API function that transforms text into vectors that
represent their meaning in a high-dimensional space.
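For illustration, the sketch below shows the general shape such a function takes: it receives a batch of texts and returns one fixed-length vector per text. The feature-hashing scheme and the `toyEmbed` name are hypothetical and capture word overlap rather than meaning; a real app would call an embedding model, as the `embedText` function earlier on this page does.

```ts
const DIMENSIONS = 8;

// Map a word to one of DIMENSIONS buckets using a 32-bit FNV-1a string hash.
function bucket(word: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < word.length; i++) {
    hash ^= word.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash % DIMENSIONS;
}

// Toy "embedding": count hashed words per bucket, then scale to unit length.
function toyEmbed(texts: string[]): number[][] {
  return texts.map((text) => {
    const vector = new Array<number>(DIMENSIONS).fill(0);
    const words = text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
    for (const word of words) {
      vector[bucket(word)] += 1;
    }
    const length = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
    // An empty text has no words, so it stays the zero vector.
    return length === 0 ? vector : vector.map((x) => x / length);
  });
}

const vectors = toyEmbed(["sleek red sports car", "red sports car", ""]);
console.log(vectors.length, vectors[0].length);
```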
@@ -254,7 +501,7 @@
</Tab>
</Tabs>

#### Declare the search method

With an embedding function in place, declare a search method in the
[collection properties](/modus/app-manifest#collections).
@@ -272,7 +519,7 @@

```

### Implement semantic similarity search

With the products stored, you can now search the collection by semantic
similarity. The `search` API computes an embedding for the provided text,
@@ -306,7 +553,7 @@

</CodeGroup>

#### Search result format

The search response is a `CollectionSearchResult` containing the following fields:

@@ -333,7 +580,7 @@
}
```

### Search for similar items

To find items similar to a given item, use the `searchByVector`
API. Retrieve the vector associated with the given item by its key, then perform
@@ -386,7 +633,7 @@

</CodeGroup>

### Develop locally with Collections

While Collections expose a key-value interface for working with data, a
PostgreSQL database instance persists the data. When using Collections in a
1 change: 1 addition & 0 deletions styles/Google/Acronyms.yml
@@ -41,6 +41,7 @@ exceptions:
- PDF
- PHP
- POST
- RAG
- RAM
- REPL
- RSA