Skip to content

Add PostgreSQL Full-Text Search for Notes #321

@InfinityBowman

Description

@InfinityBowman

Problem

Note search currently uses ILIKE (case-insensitive pattern matching) on both the title and text columns:

// packages/api/src/routes/notes.ts
conditions.push(
  or(
    ilike(note.title, `%${query.search}%`),
    ilike(note.text, `%${query.search}%`)
  )
);

This has several problems:

  1. Full table scan -- ILIKE '%term%' cannot use a B-tree index. Every search scans every row.
  2. No ranking -- results aren't ordered by relevance. A note that mentions the search term once in the middle of a paragraph ranks the same as one with the term in the title.
  3. No stemming -- searching for "mapping" won't find notes containing "mapped" or "maps".
  4. HTML in text -- the text column contains HTML from the rich text editor. ILIKE matches on HTML tags and attributes, producing false positives (e.g., searching for "class" matches <div class="...">).
  5. Scales poorly -- as the note count grows, search latency increases linearly.

Goal

Replace ILIKE search with PostgreSQL's built-in full-text search using tsvector and tsquery. This provides:

  • Indexed search -- uses a GIN index, sublinear lookup time
  • Relevance ranking -- ts_rank scores results by how well they match
  • Stemming -- "mapping", "mapped", "maps" all match "map"
  • Stop word removal -- common words like "the", "is", "at" are ignored
  • HTML stripping -- can strip HTML tags before indexing

Implementation

1. Add a custom tsvector type and generated column

Drizzle supports this natively via customType and generatedAlwaysAs(). No raw SQL migration needed.

In packages/api/src/db/schema.ts:

import { SQL, sql } from 'drizzle-orm';
import { customType, index, pgTable, text, /* ...existing imports */ } from 'drizzle-orm/pg-core';

// Custom tsvector type for full-text search
const tsvector = customType<{ data: string }>({
  dataType() {
    return 'tsvector';
  },
});

Then add the generated column and GIN index to the note table:

export const note = pgTable('note', {
  // ...existing columns...

  // Full-text search vector (auto-computed from title + text)
  searchVector: tsvector('search_vector')
    .notNull()
    .generatedAlwaysAs(
      (): SQL =>
        sql`setweight(to_tsvector('english', coalesce(${note.title}, '')), 'A')
            ||
            setweight(to_tsvector('english', coalesce(
              regexp_replace(${note.text}, '<[^>]*>', ' ', 'g'), ''
            )), 'B')`,
    ),
}, (table) => [
  // ...existing indexes...
  index('note_search_vector_idx').using('gin', table.searchVector),
]);

What this does:

  • setweight(..., 'A') gives title matches higher priority than body matches
  • regexp_replace(text, '<[^>]*>', ' ', 'g') strips HTML tags before indexing
  • generatedAlwaysAs() means the column auto-updates whenever title or text change -- no triggers or manual maintenance
  • The GIN index makes lookups fast

2. Generate and apply the migration

# Generate the migration
pnpm --filter @lrda/api db:generate

# Review the generated SQL in packages/api/drizzle/ -- should contain
# ALTER TABLE with the generated column and CREATE INDEX with GIN

# Apply to local database
pnpm --filter @lrda/api db:push

3. Update the search query in the notes route

In packages/api/src/routes/notes.ts, replace the ILIKE condition with a full-text search query:

import { sql } from 'drizzle-orm';

if (query.search) {
  const searchTerms = toSearchQuery(query.search);
  if (searchTerms) {
    conditions.push(
      sql`${note.searchVector} @@ to_tsquery('english', ${searchTerms})`
    );
  }
}

4. Add relevance-based ordering

When searching, order by relevance rank:

if (query.search) {
  const searchTerms = toSearchQuery(query.search);
  if (searchTerms) {
    orderBy = sql`ts_rank(${note.searchVector}, to_tsquery('english', ${searchTerms})) DESC`;
  }
}

5. Handle empty/invalid search input

Add a utility to safely convert user input to a tsquery string:

function toSearchQuery(input: string): string | null {
  const cleaned = input.trim().replace(/[^\w\s]/g, ''); // remove special chars
  if (!cleaned) return null;
  return cleaned
    .split(/\s+/)
    .filter(Boolean)
    .map(term => `${term}:*`) // prefix matching: "relig" matches "religion"
    .join(' & ');              // AND logic: all terms must match
}

Context

Why not an external search engine (Elasticsearch, etc.)?

For the current scale, PostgreSQL full-text search is more than sufficient and adds zero infrastructure complexity. It's built into the database the app already uses. If the project eventually needs faceted search, fuzzy matching, or sub-millisecond search over millions of documents, a dedicated search engine can be added later.

What about the existing ILIKE behavior?

The full-text search is strictly better for natural language queries. However, if exact substring matching is ever needed (e.g., searching for a specific ID or code snippet), a fallback to ILIKE could be added behind a flag. For this app's use case (searching research notes by topic), full-text search is the right fit.

Generated column vs trigger

A generatedAlwaysAs() / GENERATED ALWAYS AS ... STORED column is simpler than a trigger. It automatically stays in sync when title or text are updated. The tradeoff is slightly more storage per row, but for text search vectors this is negligible.

How to Test

  1. Apply the migration: pnpm --filter @lrda/api db:push
  2. Verify the index exists:
    SELECT indexname FROM pg_indexes WHERE tablename = 'note' AND indexname = 'note_search_vector_idx';
  3. Create a few test notes with known content
  4. Search via the API:
    curl "http://localhost:3002/api/notes?search=religion&published=true"
  5. Verify results are ranked (notes with "religion" in the title appear first)
  6. Verify stemming works: search for "mapping" and find notes containing "mapped"
  7. Verify HTML tags are not matched: search for "div" and confirm it doesn't match every note
  8. Run EXPLAIN ANALYZE to confirm index usage:
    EXPLAIN ANALYZE SELECT * FROM note WHERE search_vector @@ to_tsquery('english', 'religion:*');

Acceptance Criteria

  • searchVector generated column added to the note table using Drizzle's generatedAlwaysAs()
  • GIN index created on searchVector
  • Notes route uses @@ full-text search instead of ILIKE
  • Search results are ranked by relevance (ts_rank)
  • Prefix matching works (partial words match)
  • HTML tags are stripped from the indexed text
  • Invalid/empty search input is handled gracefully
  • EXPLAIN ANALYZE shows index scan (not sequential scan)
  • Existing tests pass

Files to Modify

  • packages/api/src/db/schema.ts -- add tsvector custom type, generated column, and GIN index
  • packages/api/src/routes/notes.ts -- replace ILIKE with full-text search query + relevance ordering

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions