hypermodeinc · johnymontana · Mar 11, 2025
diff --git a/modus/search.mdx b/modus/search.mdx
@@ -4,17 +4,264 @@
 "og:title": "Search - Modus"
 ---
 
-The Modus Collections API provides a robust way to store, retrieve, and search
-through data using both natural language and vector-based search methods. By
-leveraging embeddings, developers can enable semantic and similarity-based
+By leveraging embeddings, developers can enable semantic and similarity-based
 searches, improving the relevance of search results within their applications.
 
+Vector search works on embeddings: numerical representations of data such as
+text, images, or audio. An embedding model maps each item to a point in a
+multi-dimensional space so that semantically similar items sit close together.
+At search time, the query is converted into an embedding with the same model,
+and the system returns the items whose embeddings are closest to it. Compared
+with traditional keyword-based search, this approach matches on meaning and
+context rather than exact terms, so it can return relevant results even when
+the wording differs. Vector search is particularly effective for semantic
+search, recommendation systems, and retrieval augmented generation (RAG).
+
 For example, with natural language similarity, if you search for a product
 description like 'sleek red sports car', the search method returns similar
 product descriptions such as "luxury sports car in red" or 'high-speed car with
 sleek design'.
 
-## Understanding key components
+## Options for implementing natural language search with Modus
+
+Options for adding natural language search to your Modus app include:
+
+1. [**Using Dgraph's vector search feature**](#natural-language-search-with-dgraph-and-modus):
+   Take advantage of Dgraph's HNSW index-backed vector search for highly
+   scalable natural language search. This approach also supports more
+   sophisticated retrieval that combines vector search with graph traversals,
+   such as GraphRAG patterns.
+2. [**Using Modus's built-in Collection data structure**](#natural-language-search-with-modus-collections):
+   Modus Collections can automatically generate embeddings for new data and
+   are a good fit for medium-sized datasets.
+
+## Natural language search with Dgraph and Modus
+
+The steps to implement natural language search with Dgraph include defining the
+Dgraph connection in your Modus app manifest, selecting and configuring an
+embedding model, declaring a vector index in the Dgraph DQL schema, and using
+the `similar_to` DQL function to search for similar text in vector space.
+
+The example app in this section stores, retrieves, updates, and deletes product
+information, and enables semantic similarity searches over that data using text
+embeddings generated by an embedding model.
+
+### Declare Dgraph connection and Hypermode embedding model
+
+First, update the Modus app manifest file `modus.json` to define the connection
+to your Dgraph instance and the embedding model that will be used to generate
+embeddings:
+
+```json
+{
+  "$schema": "https://schema.hypermode.com/modus.json",
+  "endpoints": {
+    "default": {
+      "type": "graphql",
+      "path": "/graphql",
+      "auth": "bearer-token"
+    }
+  },
+  "models": {
+    "minilm": {
+      "sourceModel": "sentence-transformers/all-MiniLM-L6-v2",
+      "connection": "hypermode",
+      "provider": "hugging-face"
+    }
+  },
+  "connections": {
+    "dgraph-grpc": {
+      "type": "dgraph",
+      "grpcTarget": "localhost:9080"
+    }
+  }
+}
+```
+
+<Note>
+  In order to use Hypermode hosted models in the local Modus development
+  environment you'll need to use the `hyp` CLI to connect your local environment
+  with your Hypermode account. See the [Using Hypermode-hosted
+  models](run-locally#using-hypermode-hosted-models) docs page for more
+  information.
+</Note>
+
+### Data modeling
+
+Define your data model using classes with decorators for automatic
+serialization and deserialization. The `@json` decorator enables JSON
+serialization, while `@alias` maps each property to its Dgraph predicate name:
+
+```ts
+@json
+export class Product {
+  @alias("Product.id")
+  id!: string
+
+  @alias("Product.title")
+  title: string = ""
+
+  @alias("Product.description")
+  description: string = ""
+
+  @alias("Product.category")
+  @omitnull()
+  category: Category | null = null
+}
+
+@json
+export class Category {
+  @alias("Category.name")
+  name: string = ""
+}
+```
+
+### Embedding integration
+
+Create an embedding function that uses a transformer model (`minilm`, as
+declared in the manifest) to convert product descriptions and search queries
+into vectors:
+
+```ts
+import { models } from "@hypermode/modus-sdk-as"
+import { EmbeddingsModel } from "@hypermode/modus-sdk-as/models/experimental/embeddings"
+
+// Must match the model name declared in modus.json
+const EMBEDDING_MODEL = "minilm"
+
+// Convert each input string into an embedding vector
+export function embedText(content: string[]): f32[][] {
+  const model = models.getModel<EmbeddingsModel>(EMBEDDING_MODEL)
+  const input = model.createInput(content)
+  const output = model.invoke(input)
+  return output.predictions
+}
+```
+
+### Dgraph `similar_to` query function
+
+Create utility functions to interact with Dgraph, including functions to inject
+UIDs into JSON payloads, retrieve entities by properties, delete node
+predicates, and perform similarity searches. The following code shows the
+similarity search helper and a Modus function that embeds the search text and
+calls it:
+
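+```ts
+import { dgraph } from "@hypermode/modus-sdk-as"
+import { JSON } from "json-as"
+
+// Name of the Dgraph connection declared in modus.json
+const DGRAPH_CONNECTION = "dgraph-grpc"
+
+// Wrapper matching the `list` block returned by the query below
+// (assumed shape; define it wherever your Dgraph helpers live)
+@json
+class ListOf<T> {
+  list: T[] = []
+}
+
+export function searchBySimilarity<T>(
+  connection: string,
+  embedding: f32[],
+  predicate: string,
+  body: string,
+  topK: i32,
+): T[] {
+  // Fetch the topK nearest neighbors, then score them: `dist` is the squared
+  // Euclidean distance between the stored vector and the query vector
+  const query = new dgraph.Query(`
+    query search($vector: float32vector) {
+        var(func: similar_to(${predicate},${topK},$vector)) {
+            vemb as ${predicate}
+            dist as math((vemb - $vector) dot (vemb - $vector))
+            score as math(1 - (dist / 2.0))
+        }
+
+        list(func: uid(score), orderdesc: val(score)) @filter(gt(val(score), 0.25)) {
+            ${body}
+        }
+    }`).withVariable("$vector", embedding)
+
+  const response = dgraph.executeQuery(connection, query)
+  console.log(response.Json)
+  return JSON.parse<ListOf<T>>(response.Json).list
+}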
+
+/**
+ * Search products by similarity to a given text
+ */
+export function searchProducts(search: string): Product[] {
+  const embedding = embedText([search])[0]
+  const topK = 3
+  const body = `
+    Product.id
+    Product.description
+    Product.title
+    Product.category {
+      Category.name
+    }
+  `
+  return searchBySimilarity<Product>(
+    DGRAPH_CONNECTION,
+    embedding,
+    "Product.embedding",
+    body,
+    topK,
+  )
+}
+```
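+
+The `score` expression in the query can be read as cosine similarity, under the
+assumption (not stated on this page) that the embedding model returns
+unit-length vectors. A short derivation, writing `v` for the stored embedding
+and `q` for the query embedding:
+
+```latex
+\begin{aligned}
+\text{dist}  &= (v - q) \cdot (v - q)
+              = \lVert v \rVert^2 + \lVert q \rVert^2 - 2\, v \cdot q \\
+             &= 2 - 2\, v \cdot q
+              \qquad \text{(assuming } \lVert v \rVert = \lVert q \rVert = 1\text{)} \\
+\text{score} &= 1 - \frac{\text{dist}}{2} = v \cdot q = \cos\theta
+\end{aligned}
+```
+
+Under that assumption, the `@filter` threshold of 0.25 keeps only results whose
+cosine similarity to the query is above 0.25. If the vectors aren't normalized,
+`score` is no longer bounded to the range -1 to 1 and the threshold would need
+to be revisited.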
+
+### Define Modus mutation functions
+
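+Implement a function that adds or updates a product. It generates an embedding
+from the product description and stores it on the `Product.embedding` predicate
+along with the rest of the product data: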
+
+```ts
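+/**
+ * Add a product to the database, or update it if it already exists
+ */
+export function upsertProduct(product: Product): Map<string, string> | null {
+  // buildProductMutationJson and addEmbeddingToJson are helper functions
+  // that aren't shown on this page
+  let payload = buildProductMutationJson(DGRAPH_CONNECTION, product)
+
+  // Embed the description and store the vector on Product.embedding
+  const embedding = embedText([product.description])[0]
+  payload = addEmbeddingToJson(payload, "Product.embedding", embedding)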
+
+  const mutation = new dgraph.Mutation(payload)
+  const response = dgraph.executeMutations(DGRAPH_CONNECTION, mutation)
+
+  return response.Uids
+}
+```
+
+### Define Dgraph schema
+
+Dgraph can be used without a schema, but vector search requires one: the
+`Product.embedding` predicate must have a vector index.
+
+To enable vector indexing, add the `@index(hnsw)` directive to the predicate
+that stores the embedding, in this case `Product.embedding`. The schema also
+declares the types and indexes of the other predicates.
+
+```rdf
+<Category.name>: string @index(hash) .
+<Product.category>: uid @reverse .
+<Product.description>: string .
+<Product.id>: string @index(hash) .
+<Product.embedding>: float32vector @index(hnsw) .
+```
+
+To apply the schema, send a POST request to the `/alter` endpoint of your
+Dgraph instance. This example assumes the schema is saved as `dqlschema.txt`:
+
+```bash
+curl -X POST localhost:8080/alter --silent --data-binary '@dqlschema.txt'
+```
+
+Alternatively, use the Schema tab of the Ratel interface to apply the schema.
+
+### Query Modus endpoint
+
+```graphql
+TODO: GraphQL query
+```
+
+### Resources
+
+- Video: https://www.youtube.com/watch?v=Z2fB-nBf4Wo
+- Code: https://github.com/hypermodeinc/modus-recipes/tree/main/dgraph-101
+
+## Natural language search with Modus Collections
+
+The Modus Collections API provides a robust way to store, retrieve, and search
+through data using both natural language and vector-based search methods.
+
+### Understanding key components
 
 **Collections**: a collection is a structured storage that organizes and stores
 textual data and associated metadata. Collections enable sophisticated search,
@@ -34,7 +281,7 @@
   configuration, when you add or update items.
 </Note>
 
-## Initializing your collection
+### Initializing your collection
 
 Before implementing search, ensure you have
 [defined a collection in the app manifest](./app-manifest#collections). In this
@@ -79,12 +326,12 @@
 
 </CodeGroup>
 
-## Configure your search method
+### Configure your search method
 
 The search capability relies on a search method and embedding function. To
 configure your search method.
 
-### Create an embedding function
+#### Create an embedding function
 
 An embedding function is any API function that transforms text into vectors that
 represent their meaning in a high-dimensional space.
@@ -254,7 +501,7 @@
   </Tab>
 </Tabs>
 
-### Declare the search method
+#### Declare the search method
 
 With an embedding function in place, declare a search method in the
 [collection properties](/modus/app-manifest#collections).
@@ -272,7 +519,7 @@
 
 ```
 
-## Implement semantic similarity search
+### Implement semantic similarity search
 
 With the products stored, you can now search the collection by semantic
 similarity. The search] API computes an embedding for the provided text,
@@ -306,7 +553,7 @@
 
 </CodeGroup>
 
-### Search result format
+#### Search result format
 
 The search response is a CollectionSearchResult containing the following fields:
 
@@ -333,7 +580,7 @@
 }
 ```
 
-## Search for similar Items
+### Search for similar items
 
 When you need to search similar items to a given item, use the `searchByVector`
 API. Retrieve the vector associated with the given item by its key, then perform
@@ -386,7 +633,7 @@
 
 </CodeGroup>
 
-## Develop locally with Collections
+### Develop locally with Collections
 
 While Collections expose a key-value interface for working with data, a
 PostgreSQL database instance persists the data. When using Collections in a

diff --git a/styles/Google/Acronyms.yml b/styles/Google/Acronyms.yml
@@ -41,6 +41,7 @@ exceptions:
   - PDF
   - PHP
   - POST
+  - RAG
   - RAM
   - REPL
   - RSA