-
Notifications
You must be signed in to change notification settings - Fork 9
Add PostgreSQL Full-Text Search for Notes #321
Description
Problem
Note search currently uses ILIKE (case-insensitive pattern matching) on both the title and text columns:
// packages/api/src/routes/notes.ts
conditions.push(
or(
ilike(note.title, `%${query.search}%`),
ilike(note.text, `%${query.search}%`)
)
);This has several problems:
- Full table scan --
ILIKE '%term%'cannot use a B-tree index. Every search scans every row. - No ranking -- results aren't ordered by relevance. A note that mentions the search term once in the middle of a paragraph ranks the same as one with the term in the title.
- No stemming -- searching for "mapping" won't find notes containing "mapped" or "maps".
- HTML in text -- the
textcolumn contains HTML from the rich text editor.ILIKEmatches on HTML tags and attributes, producing false positives (e.g., searching for "class" matches<div class="...">). - Scales poorly -- as the note count grows, search latency increases linearly.
Goal
Replace ILIKE search with PostgreSQL's built-in full-text search using tsvector and tsquery. This provides:
- Indexed search -- uses a GIN index, sublinear lookup time
- Relevance ranking --
ts_rankscores results by how well they match - Stemming -- "mapping", "mapped", "maps" all match "map"
- Stop word removal -- common words like "the", "is", "at" are ignored
- HTML stripping -- can strip HTML tags before indexing
Implementation
1. Add a custom tsvector type and generated column
Drizzle supports this natively via customType and generatedAlwaysAs(). No raw SQL migration needed.
In packages/api/src/db/schema.ts:
import { SQL, sql } from 'drizzle-orm';
import { customType, index, pgTable, text, /* ...existing imports */ } from 'drizzle-orm/pg-core';
// Custom tsvector type for full-text search
const tsvector = customType<{ data: string }>({
dataType() {
return 'tsvector';
},
});Then add the generated column and GIN index to the note table:
export const note = pgTable('note', {
// ...existing columns...
// Full-text search vector (auto-computed from title + text)
searchVector: tsvector('search_vector')
.notNull()
.generatedAlwaysAs(
(): SQL =>
sql`setweight(to_tsvector('english', coalesce(${note.title}, '')), 'A')
||
setweight(to_tsvector('english', coalesce(
regexp_replace(${note.text}, '<[^>]*>', ' ', 'g'), ''
)), 'B')`,
),
}, (table) => [
// ...existing indexes...
index('note_search_vector_idx').using('gin', table.searchVector),
]);What this does:
setweight(..., 'A')gives title matches higher priority than body matchesregexp_replace(text, '<[^>]*>', ' ', 'g')strips HTML tags before indexinggeneratedAlwaysAs()means the column auto-updates whenever title or text change -- no triggers or manual maintenance- The GIN index makes lookups fast
2. Generate and apply the migration
# Generate the migration
pnpm --filter @lrda/api db:generate
# Review the generated SQL in packages/api/drizzle/ -- should contain
# ALTER TABLE with the generated column and CREATE INDEX with GIN
# Apply to local database
pnpm --filter @lrda/api db:push3. Update the search query in the notes route
In packages/api/src/routes/notes.ts, replace the ILIKE condition with a full-text search query:
import { sql } from 'drizzle-orm';
if (query.search) {
const searchTerms = toSearchQuery(query.search);
if (searchTerms) {
conditions.push(
sql`${note.searchVector} @@ to_tsquery('english', ${searchTerms})`
);
}
}4. Add relevance-based ordering
When searching, order by relevance rank:
if (query.search) {
const searchTerms = toSearchQuery(query.search);
if (searchTerms) {
orderBy = sql`ts_rank(${note.searchVector}, to_tsquery('english', ${searchTerms})) DESC`;
}
}5. Handle empty/invalid search input
Add a utility to safely convert user input to a tsquery string:
function toSearchQuery(input: string): string | null {
const cleaned = input.trim().replace(/[^\w\s]/g, ''); // remove special chars
if (!cleaned) return null;
return cleaned
.split(/\s+/)
.filter(Boolean)
.map(term => `${term}:*`) // prefix matching: "relig" matches "religion"
.join(' & '); // AND logic: all terms must match
}Context
Why not an external search engine (Elasticsearch, etc.)?
For the current scale, PostgreSQL full-text search is more than sufficient and adds zero infrastructure complexity. It's built into the database the app already uses. If the project eventually needs faceted search, fuzzy matching, or sub-millisecond search over millions of documents, a dedicated search engine can be added later.
What about the existing ILIKE behavior?
The full-text search is strictly better for natural language queries. However, if exact substring matching is ever needed (e.g., searching for a specific ID or code snippet), a fallback to ILIKE could be added behind a flag. For this app's use case (searching research notes by topic), full-text search is the right fit.
Generated column vs trigger
A generatedAlwaysAs() / GENERATED ALWAYS AS ... STORED column is simpler than a trigger. It automatically stays in sync when title or text are updated. The tradeoff is slightly more storage per row, but for text search vectors this is negligible.
How to Test
- Apply the migration:
pnpm --filter @lrda/api db:push - Verify the index exists:
SELECT indexname FROM pg_indexes WHERE tablename = 'note' AND indexname = 'note_search_vector_idx';
- Create a few test notes with known content
- Search via the API:
curl "http://localhost:3002/api/notes?search=religion&published=true" - Verify results are ranked (notes with "religion" in the title appear first)
- Verify stemming works: search for "mapping" and find notes containing "mapped"
- Verify HTML tags are not matched: search for "div" and confirm it doesn't match every note
- Run
EXPLAIN ANALYZEto confirm index usage:EXPLAIN ANALYZE SELECT * FROM note WHERE search_vector @@ to_tsquery('english', 'religion:*');
Acceptance Criteria
-
searchVectorgenerated column added to the note table using Drizzle'sgeneratedAlwaysAs() - GIN index created on
searchVector - Notes route uses
@@full-text search instead ofILIKE - Search results are ranked by relevance (
ts_rank) - Prefix matching works (partial words match)
- HTML tags are stripped from the indexed text
- Invalid/empty search input is handled gracefully
-
EXPLAIN ANALYZEshows index scan (not sequential scan) - Existing tests pass
Files to Modify
packages/api/src/db/schema.ts-- addtsvectorcustom type, generated column, and GIN indexpackages/api/src/routes/notes.ts-- replaceILIKEwith full-text search query + relevance ordering