Add vector embeddings for search queries by jsdanielh · Pull Request #20 · nimiq/crypto-map

jsdanielh · 2025-10-07T03:14:47Z

Add vector embedding for search queries.

This closes #11.

nuxthub-admin · 2025-10-07T03:22:24Z

✅ Deployed crypto-map-next

Deployed crypto-map-next 6133dc7 to preview

🔗	crypto-map-next-preview.je-cf9.workers.dev
📌	`57bec7b0-crypto-map-next-preview.je-cf9.workers.dev`

onmax · 2025-10-07T12:29:07Z

database/schema.ts

+export type Location = typeof locations.$inferSelect
 export type LocationCategory = typeof locationCategories.$inferSelect


This was my mistake. These kinds of types are defined in ~/shared/types; defining them here doesn't seem to give us anything.

onmax · 2025-10-07T12:29:47Z

database/startup.sh

 /docker-entrypoint-initdb.d/03-seed.sh || true

+# Run the vector embeddings generation
+echo "Running vector embeddings generation on startup..."


This morning the whole docker/supabase setup was completely refactored. I think this might create a merge issue. I will fix it in this PR :)

onmax

Would you like to have a call?

onmax · 2025-10-07T12:31:31Z

server/api/search.get.ts

+    if (error instanceof Error && error.message.includes('OpenAI API key')) {
+      consola.warn('OpenAI API key not configured, falling back to keyword-only search')
+    }
+    else if (error instanceof Error && error.message.includes('OpenAI authentication')) {
+      consola.warn('OpenAI authentication failed, falling back to keyword-only search')
+    }
+    else if (error instanceof Error && error.message.includes('vector')) {
+      consola.warn('Vector database operations failed, falling back to keyword-only search')


I would remove this completely. Just use

`throw createError(error)`

`createError` is a function from h3.

If we need more detail, we can always add it later, but let's keep the code clean, I would say.

onmax · 2025-10-07T12:34:06Z

server/utils/config-validation.ts

+  private static validateVectorSearchConfig(
+    config: any,
+    errors: string[],
+    warnings: string[],
+  ): void {
+    if (typeof config.keywordLimit !== 'number' || config.keywordLimit < 1 || config.keywordLimit > 100) {
+      errors.push('VECTOR_SEARCH_KEYWORD_LIMIT must be a number between 1 and 100')
+    }
+
+    if (typeof config.vectorLimit !== 'number' || config.vectorLimit < 1 || config.vectorLimit > 100) {
+      errors.push('VECTOR_SEARCH_VECTOR_LIMIT must be a number between 1 and 100')
+    }
+
+    if (typeof config.hybridThreshold !== 'number' || config.hybridThreshold < 1 || config.hybridThreshold > 50) {
+      errors.push('VECTOR_SEARCH_HYBRID_THRESHOLD must be a number between 1 and 50')
+    }
+
+    if (typeof config.similarityThreshold !== 'number' || config.similarityThreshold < 0 || config.similarityThreshold > 1) {
+      errors.push('VECTOR_SEARCH_SIMILARITY_THRESHOLD must be a number between 0 and 1')
+    }
+
+    // Logical validation
+    if (config.hybridThreshold > config.keywordLimit) {
+      warnings.push('VECTOR_SEARCH_HYBRID_THRESHOLD is greater than VECTOR_SEARCH_KEYWORD_LIMIT. Vector search may never be triggered.')
+    }
+
+    if (config.similarityThreshold > 0.9) {
+      warnings.push('VECTOR_SEARCH_SIMILARITY_THRESHOLD is very high (>0.9). This may result in very few vector search results.')
+    }
+
+    if (config.similarityThreshold < 0.3) {
+      warnings.push('VECTOR_SEARCH_SIMILARITY_THRESHOLD is very low (<0.3). This may result in many irrelevant vector search results.')
+    }
+  }


This kind of check adds complexity. We can always extend safeRuntimeConfig with these checks later.

onmax · 2025-10-07T12:35:38Z

server/plugins/config-validation.ts

+/**
+ * Nitro plugin to validate configuration during server startup
+ */
+export default defineNitroPlugin(async (_nitroApp) => {


Since validateVectorSearchConfig is unnecessary, this whole plugin is also unnecessary, IMO.

onmax · 2025-10-07T12:36:05Z

server/utils/config-validation.ts

+  /**
+   * Validate all required environment variables and configuration
+   */
+  static validateConfiguration(): void {


This whole file is not needed.

onmax · 2025-10-07T12:37:32Z

server/utils/embedding.ts

+/**
+ * Service for generating embeddings using OpenAI API
+ */
+export class EmbeddingService {


I think this is way too much code.

We can use something like this: https://ai-sdk.dev/docs/ai-sdk-core/embeddings#embedding-a-single-value

Then set up OPENAI_API_KEY in .env, and that should be more than enough.

Config such as

maxRetries: 3,
baseDelay: 1000, // 1 second
maxDelay: 30000, // 30 seconds
batchSize: 100, // OpenAI batch limit

is not necessary in my opinion. The AI SDK provides that for us.

Also, the "batch" optimization is not necessary.

I would suggest running the embedding manually for the categories and putting the result in database/seed. The script can be run with tsx and the AI SDK. We should only need to run it once.

In the future, if we update the categories, we run it again.
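A minimal sketch of the one-off seed step suggested above. It assumes the category embeddings have already been fetched once (for example with the AI SDK's `embedMany`) and that categories live in a table with an `id` and a pgvector `embedding` column. The `categories` table name, the column names and the `toSeedSql` helper are hypothetical; only the pgvector text format (`'[1,2,3]'`) is standard.

```typescript
// Hypothetical shape: one embedding per category, produced once by a
// tsx script (e.g. with the AI SDK's embedMany) and written to database/seed.
interface CategoryEmbedding {
  id: string
  embedding: number[]
}

// pgvector accepts vectors in the text form '[v1,v2,...]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.length === 0)
    throw new Error('Embedding must not be empty')
  if (!embedding.every(Number.isFinite))
    throw new Error('Embedding contains non-finite values')
  return `[${embedding.join(',')}]`
}

// Escapes single quotes for use inside a SQL string literal.
function sqlString(value: string): string {
  return `'${value.replace(/'/g, "''")}'`
}

// Builds one UPDATE per category; re-running the seed just rewrites the same values.
// "categories" and "embedding" are assumed names, not confirmed by the PR.
function toSeedSql(rows: CategoryEmbedding[]): string {
  return rows
    .map(({ id, embedding }) =>
      `UPDATE categories SET embedding = ${sqlString(toVectorLiteral(embedding))}::vector WHERE id = ${sqlString(id)};`)
    .join('\n')
}

const sql = toSeedSql([
  { id: 'restaurant', embedding: [0.1, -0.2, 0.3] },
  { id: "children's_store", embedding: [1, 0, 0] },
])
console.log(sql)
```

Generating plain SQL keeps the database container free of Node at startup: the embeddings are computed once on a developer machine, and the seed only replays static statements.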

- Remove Location type from schema (use shared/types)
- Simplify error handling in search endpoint
- Remove config validation (use safeRuntimeConfig)
- Remove location embeddings (only category embeddings)
- Revert to named catalog structure in workspace
- Remove vector search dependencies and config

- Add full-text search on location name and address
- Add category similarity search using embeddings
- Cache query embeddings in NuxtHub KV (30 day TTL)
- Combine text and category results with deduplication
- Use AI SDK for embedding generation
- Support user-selected category filters
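The "combine text and category results with deduplication" step could look roughly like this. It is a sketch under assumptions: both searches return rows keyed by a location `uuid` (the PR mentions UUID-based location fetching), and full-text matches are ranked ahead of category matches. `mergeResults`, the row shape and the default limit are hypothetical.

```typescript
// Hypothetical row shape; the real endpoint returns richer location objects.
interface LocationResult {
  uuid: string
  name: string
}

// Merges full-text and category-similarity results, keeping the first
// occurrence of each uuid so text matches win over category matches.
function mergeResults(
  textResults: LocationResult[],
  categoryResults: LocationResult[],
  limit = 20,
): LocationResult[] {
  const seen = new Set<string>()
  const merged: LocationResult[] = []
  for (const row of [...textResults, ...categoryResults]) {
    if (seen.has(row.uuid))
      continue
    seen.add(row.uuid)
    merged.push(row)
    if (merged.length >= limit)
      break
  }
  return merged
}

const merged = mergeResults(
  [{ uuid: 'a', name: 'Corner Café' }, { uuid: 'b', name: 'Beach Bar' }],
  [{ uuid: 'b', name: 'Beach Bar' }, { uuid: 'c', name: 'Pizza Place' }],
)
console.log(merged.map(r => r.uuid)) // → [ 'a', 'b', 'c' ]
```

Using a `Set` keeps the merge linear in the number of rows while preserving the input order, so ranking stays the responsibility of each individual query.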

- Add /api/search/autocomplete for fast full-text search
- Add /api/search/embed for background embedding precompute
- Remove TTL from embedding cache (permanent storage)
- Min query length: 2 characters
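The minimum query length and the permanent embedding cache fit together naturally: normalizing a query before using it as a KV key lets "Pizza " and "pizza" share one cached embedding. A minimal sketch, assuming a hypothetical `embedding:` key prefix and lowercase/whitespace normalization; the PR does not say how its cache keys are actually built.

```typescript
const MIN_QUERY_LENGTH = 2 // from the PR: minimum query length is 2 characters

// Trims, lowercases and collapses inner whitespace. Returns null when the
// query is too short to be worth searching or embedding.
function normalizeQuery(raw: string): string | null {
  const normalized = raw.trim().toLowerCase().replace(/\s+/g, ' ')
  return normalized.length >= MIN_QUERY_LENGTH ? normalized : null
}

// Hypothetical KV key scheme: one permanent entry per normalized query.
function embeddingCacheKey(raw: string): string | null {
  const normalized = normalizeQuery(raw)
  return normalized === null ? null : `embedding:${normalized}`
}

console.log(embeddingCacheKey('  Pizza   Place ')) // → embedding:pizza place
console.log(embeddingCacheKey('a')) // → null
```

Because entries never expire, normalizing first also keeps the number of stored embeddings from growing with trivial spelling variants of the same query.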

- Add form-based autocomplete with search icon
- Show user's query as first result
- Display matching locations below with map pin icons
- Precompute embeddings while typing (background)
- Support UUID-based location fetching
- Extract locationSelect helper to reduce duplication
- Use semantic search when submitting user's query
- Use direct UUID fetch when selecting location

- Format template to one line per element
- Use @submit.prevent for form submission
- Remove icons from autocomplete locations
- Add PostgreSQL ts_headline for match highlighting
- Style <mark> tags as bold (no background)
- Move search.get.ts to search/index.get.ts
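For reference, a `ts_headline` call that wraps matches in `<mark>` tags could be built as follows. This is a sketch: the `locations` table's `uuid`, `name` and `address` columns, the `simple` text-search configuration and the `highlightQuery` helper are assumptions. The search text is passed as a bind parameter (`$1`) rather than interpolated into the SQL.

```typescript
// Hypothetical parameterized SQL for autocomplete with match highlighting.
// ts_headline(config, document, tsquery, options) wraps matching terms
// in the StartSel/StopSel strings given in the options argument.
interface ParameterizedQuery {
  text: string
  values: unknown[]
}

function highlightQuery(search: string, limit = 10): ParameterizedQuery {
  const text = `
    SELECT uuid,
           ts_headline('simple', name, plainto_tsquery('simple', $1),
                       'StartSel=<mark>, StopSel=</mark>') AS name_highlighted
    FROM locations
    WHERE to_tsvector('simple', name || ' ' || coalesce(address, ''))
          @@ plainto_tsquery('simple', $1)
    LIMIT $2`
  return { text, values: [search, limit] }
}

const q = highlightQuery('pizza')
console.log(q.values) // → [ 'pizza', 10 ]
```

Since the highlighted name is rendered as HTML, the `<mark>` styling (bold, no background) lives in CSS rather than in the query.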


jsdanielh added 2 commits October 6, 2025 21:15

Add database schema and migration for vector support

82a5a51

Create embedding generation utilities and OpenAI integration

0430c58

jsdanielh force-pushed the jsdanielh/main branch from 8fdec3e to 1847282 October 7, 2025 03:15

nuxthub-admin bot temporarily deployed to preview October 7, 2025 03:22 Inactive

nuxthub-admin bot temporarily deployed to preview October 7, 2025 03:26 Inactive

jsdanielh added 2 commits October 6, 2025 22:03

Implement hybrid search functionality in the search endpoint

9fed820

Use a bash script for generating embeddings since node is not available in the database container image.

Add environment configuration and validation

1a12c54

jsdanielh force-pushed the jsdanielh/main branch from 1847282 to 1a12c54 October 7, 2025 04:05

nuxthub-admin bot temporarily deployed to preview October 7, 2025 04:06 Inactive

onmax reviewed Oct 7, 2025


onmax requested changes Oct 7, 2025


Remove Docker-based embedding generation setup

2938dc4

nuxthub-admin bot temporarily deployed to preview October 8, 2025 05:53 Inactive

onmax added 8 commits October 8, 2025 08:03

Revert schema changes

da4a85e

Add autocomplete and embedding precompute endpoints

f1f08d2


fix: update lockfile and dependency versions

51be678

docs: improve comments to explain why, add data flow diagram

ec2f132

nuxthub-admin bot temporarily deployed to preview October 8, 2025 06:40 Inactive

onmax added 2 commits October 8, 2025 08:42

fix: skip runtime config validation during build

ecfaef2

fix: use DATABASE_URL and fix import paths

cffb313

nuxthub-admin bot temporarily deployed to preview October 8, 2025 06:46 Inactive

chore: remove old supabase folder

f6d5eac

nuxthub-admin bot temporarily deployed to preview October 8, 2025 06:48 Inactive

fix: allow empty strings in runtime config validation

f224020

nuxthub-admin bot temporarily deployed to preview October 8, 2025 06:51 Inactive

refactor: split index.vue logic into composables

20ffbc8

nuxthub-admin bot temporarily deployed to preview October 8, 2025 06:54 Inactive

onmax added 2 commits October 8, 2025 08:59

refactor: create locations/[uuid] endpoint, move opening hours to utils

18b2727

feat: add i18n for opening hours messages

d2a908b

nuxthub-admin bot temporarily deployed to preview October 8, 2025 07:05 Inactive

refactor: include messageKey in opening hours status

58a2234

nuxthub-admin bot temporarily deployed to preview October 8, 2025 07:06 Inactive

refactor: move location fetch to component, add location enrichment utils

38446e6

nuxthub-admin bot temporarily deployed to preview October 8, 2025 07:11 Inactive

onmax added 2 commits October 8, 2025 09:15

refactor: simplify location endpoint with single query

a77d5ec

refactor: simplify createError calls, use validated params, add i18n translations

ca04266

nuxthub-admin bot temporarily deployed to preview October 8, 2025 07:21 Inactive

nuxthub-admin bot temporarily deployed to preview October 8, 2025 07:23 Inactive

refactor: optimize queries and simplify abstractions

a6bdf44

nuxthub-admin bot temporarily deployed to preview October 8, 2025 07:38 Inactive

refactor: simplify search, remove created_at

0332553

nuxthub-admin bot temporarily deployed to preview October 8, 2025 08:18 Inactive

docs: update docs and simplify filters to booleans

6133dc7

nuxthub-admin bot temporarily deployed to preview October 8, 2025 08:42 Inactive

onmax merged commit 3c2fecd into main Oct 8, 2025
1 check passed

jsdanielh deleted the jsdanielh/main branch October 8, 2025 22:35

onmax merged 27 commits into main from jsdanielh/main

2 participants
