89 changes: 65 additions & 24 deletions docs/en/sql-reference/20-sql-functions/10-search-functions/index.md
@@ -2,50 +2,91 @@
title: Full-Text Search Functions
---

This section provides reference information for the full-text search functions in Databend. These functions enable powerful text search capabilities similar to those found in dedicated search engines.
Databend's full-text search functions provide search-engine-style filtering for semi-structured `VARIANT` data and plain-text columns that are covered by an inverted index. They suit AI-generated metadata stored alongside your assets, such as perception results from autonomous-driving video frames.

:::info
Databend's full-text search functions are inspired by [Elasticsearch Full-Text Search Functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html).
Databend's search functions are inspired by [Elasticsearch Full-Text Search Functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html).
:::

Include an inverted index in the table definition for the columns you plan to search:

```sql
CREATE OR REPLACE TABLE frames (
id INT,
meta VARIANT,
INVERTED INDEX idx_meta (meta)
);
```
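
To try the queries on this page, the table needs some rows. The following is a minimal sketch with made-up sample data: the JSON keys mirror the field paths used in the `QUERY` examples, but the values, and the use of `PARSE_JSON` to load text into the `VARIANT` column, are assumptions for illustration rather than part of the original page.

```sql
-- Hypothetical sample rows; the keys follow the paths queried in the examples.
INSERT INTO frames VALUES
  (1, PARSE_JSON('{
        "frame":      {"timestamp": "2025-01-01T08:00:00Z"},
        "detections": {"label": "pedestrian"},
        "signals":    {"traffic_light": "red"},
        "vehicle":    {"lane": "center", "speed_kmh": 8},
        "tags":       ["stop", "urban"]
      }')),
  (2, PARSE_JSON('{
        "frame":      {"timestamp": "2025-01-01T08:00:05Z"},
        "detections": {"label": "bike"},
        "signals":    {"traffic_light": "green"},
        "vehicle":    {"lane": "left", "speed_kmh": 42},
        "tags":       ["highway"]
      }'));
```

Because the inverted index is declared in the table definition, the examples assume these rows are searchable without a separate index-creation step.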

## Search Functions

| Function | Description | Example |
|----------|-------------|--------|
| [MATCH](match) | Searches for documents containing specified keywords in selected columns | `MATCH('title, body', 'technology')` |
| [QUERY](query) | Searches for documents satisfying a specified query expression with advanced syntax | `QUERY('title:technology AND society')` |
| [SCORE](score) | Returns the relevance score of search results when used with MATCH or QUERY | `SELECT title, SCORE() FROM articles WHERE MATCH('title', 'technology')` |
|----------|-------------|---------|
| [MATCH](match) | Returns rows whose listed columns contain the given keywords; results can be ranked with `SCORE()`. | `MATCH('summary, tags', 'traffic light red')` |
| [QUERY](query) | Evaluates a Lucene-style query expression, including nested `VARIANT` fields. | `QUERY('meta.signals.traffic_light:red')` |
| [SCORE](score) | Returns the relevance score for the current row when used with `MATCH` or `QUERY`. | `SELECT summary, SCORE() FROM frame_notes WHERE MATCH('summary, tags', 'traffic light red')` |

## Usage Examples
## Query Syntax Examples

### Basic Text Search
### Example: Single Keyword

```sql
-- Search for documents with 'technology' in title or body columns
SELECT * FROM articles
WHERE MATCH('title, body', 'technology');
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.detections.label:pedestrian')
LIMIT 100;
```

### Advanced Query Expressions
### Example: Boolean AND

```sql
-- Search for documents with 'technology' in title and 'impact' in body
SELECT * FROM articles
WHERE QUERY('title:technology AND body:impact');
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center')
LIMIT 100;
```

### Relevance Scoring
### Example: Boolean OR

```sql
-- Search with relevance scoring and sorting by relevance
SELECT title, body, SCORE()
FROM articles
WHERE MATCH('title^2, body', 'technology')
ORDER BY SCORE() DESC;
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike')
LIMIT 100;
```

Before using these functions, you need to create an inverted index on the columns you want to search:
### Example: IN List

```sql
CREATE INVERTED INDEX idx ON articles(title, body);
```

```sql
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.tags:IN [stop urban]')
LIMIT 100;
```

### Example: Inclusive Range

```sql
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]')
LIMIT 100;
```

### Example: Exclusive Range

```sql
SELECT id, meta['frame']['timestamp'] AS ts
FROM frames
WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}')
LIMIT 100;
```

### Example: Boosted Fields

```sql
SELECT id, meta['frame']['timestamp'] AS ts, SCORE()
FROM frames
WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0')
LIMIT 100;
```
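
The boosted example selects `SCORE()` but does not order by it, so the boosts do not affect which rows the `LIMIT` keeps. A sketch of ranking by relevance, reusing the same query expression and assuming `ORDER BY SCORE() DESC` behaves with `QUERY` as it does in the `MATCH` relevance-scoring example from the earlier revision:

```sql
-- Highest-scoring frames first; the boost on meta.tags weighs more heavily.
SELECT id, meta['frame']['timestamp'] AS ts, SCORE()
FROM frames
WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0')
ORDER BY SCORE() DESC
LIMIT 10;
```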
127 changes: 53 additions & 74 deletions docs/en/sql-reference/20-sql-functions/10-search-functions/match.md
@@ -5,7 +5,7 @@ import FunctionDescription from '@site/src/components/FunctionDescription';

<FunctionDescription description="Introduced or updated: v1.2.619"/>

Searches for documents containing specified keywords. Please note that the MATCH function can only be used in a WHERE clause.
`MATCH` searches for rows that contain the supplied keywords within the listed columns. The function can only appear in a `WHERE` clause.

:::info
Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-match).
@@ -14,83 +14,62 @@ Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.ela
## Syntax

```sql
MATCH( '<columns>', '<keywords>'[, '<options>'] )
MATCH('<columns>', '<keywords>'[, '<options>'])
```

| Parameter | Description |
|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `<columns>` | A comma-separated list of column names in the table to search for the specified keywords, with optional weighting using the syntax (^), which allows assigning different weights to each column, influencing the importance of each column in the search. |
| `<keywords>` | The keywords to match against the specified columns in the table. This parameter can also be used for suffix matching, where the search term followed by an asterisk (*) can match any number of characters or words. |
| `<options>` | A set of configuration options, separated by semicolons `;`, that customize the search behavior. See the table below for details. |
- `<columns>`: A comma-separated list of columns to search. Append `^<boost>` to weight a column higher than the others.
- `<keywords>`: The terms to search for. Append `*` to a term for suffix matching, where the asterisk stands for any number of trailing characters; for example, `rust*`.
- `<options>`: An optional semicolon-separated list of `key=value` pairs that fine-tune the search.

| Option | Description | Example | Explanation |
|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| fuzziness | Allows matching terms within a specified Levenshtein distance. `fuzziness` can be set to 1 or 2. | SELECT id, score(), content FROM t WHERE match(content, 'box', 'fuzziness=1'); | When matching the query term "box", `fuzziness=1` allows matching terms like "fox", since "box" and "fox" have a Levenshtein distance of 1. |
| operator | Specifies how multiple query terms are combined. Can be set to OR (default) or AND. OR returns results containing any of the query terms, while AND returns results containing all query terms. | SELECT id, score(), content FROM t WHERE match(content, 'action works', 'fuzziness=1;operator=AND'); | With `operator=AND`, the query requires both "action" and "works" to be present in the results. Due to `fuzziness=1`, it matches terms like "Actions" and "words", so "Actions speak louder than words" is returned. |
| lenient | Controls whether errors are reported when the query text is invalid. Defaults to `false`. If set to `true`, no error is reported, and an empty result set is returned if the query text is invalid. | SELECT id, score(), content FROM t WHERE match(content, '()', 'lenient=true'); | If the query text `()` is invalid, setting `lenient=true` prevents an error from being thrown and returns an empty result set instead. |
## Options

| Option | Values | Description | Example |
|--------|--------|-------------|---------|
| `fuzziness` | `1` or `2` | Matches keywords within the specified Levenshtein distance. | `MATCH('summary, tags', 'pedestrain', 'fuzziness=1')` matches rows that contain the correctly spelled `pedestrian`. |
| `operator` | `OR` (default) or `AND` | Controls how multiple keywords are combined when no boolean operator is specified. | `MATCH('summary, tags', 'traffic light red', 'operator=AND')` requires all three keywords to be present. |
| `lenient` | `true` or `false` (default) | When `true`, suppresses errors caused by invalid query text and returns an empty result set instead. | `MATCH('summary, tags', '()', 'lenient=true')` returns no rows instead of an error. |
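
Several options can be passed at once by separating them with semicolons. A minimal sketch, assuming the `frame_notes` table created in the Examples section of this page: it tolerates one edit per keyword and requires every keyword to match. The keyword pair is an arbitrary illustration, and no result is asserted because it depends on how the index tokenizes the text.

```sql
-- 'pedestrain' is a deliberate misspelling; fuzziness=1 allows one edit per term,
-- and operator=AND requires both terms to match in the same row.
SELECT id, summary, SCORE()
FROM frame_notes
WHERE MATCH('summary^2, tags', 'pedestrain crosswalk', 'fuzziness=1;operator=AND')
ORDER BY SCORE() DESC;
```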

## Examples

In many AI pipelines you may capture structured metadata in a `VARIANT` column while also materializing human-readable summaries for search. The following example stores dashcam frame summaries and tags that were extracted from the JSON payload.

### Example: Build Searchable Summaries

```sql
CREATE OR REPLACE TABLE frame_notes (
id INT,
camera STRING,
summary STRING,
tags STRING,
INVERTED INDEX idx_notes (summary, tags)
);

INSERT INTO frame_notes VALUES
(1, 'dashcam_front',
'Green light at Market & 5th with pedestrian entering the crosswalk',
'downtown commute green-light pedestrian'),
(2, 'dashcam_front',
'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead',
'stop urban red-light cyclist'),
(3, 'dashcam_front',
'School zone caution sign in SOMA with pedestrian waiting near crosswalk',
'school-zone caution pedestrian');
```

### Example: Boolean AND

```sql
SELECT id, summary
FROM frame_notes
WHERE MATCH('summary, tags', 'traffic light red', 'operator=AND');
-- Returns id 2
```
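
For contrast, here is a sketch of the same search without the `operator` option. Per the options table, the keywords are then combined with `OR`, so a row qualifies if it contains any of the three terms. Ordering by `SCORE()` is added here to surface rows that match more terms first; the exact rows and scores are not asserted.

```sql
-- Default operator (OR): any of 'traffic', 'light', or 'red' is enough to match.
SELECT id, summary, SCORE()
FROM frame_notes
WHERE MATCH('summary, tags', 'traffic light red')
ORDER BY SCORE() DESC;
```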

### Example: Fuzzy Matching

```sql
CREATE TABLE test(title STRING, body STRING);

CREATE INVERTED INDEX idx ON test(title, body);

INSERT INTO test VALUES
('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'),
('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'),
('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'),
('The Art of Communication', 'Effective communication is crucial in everyday life.'),
('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.');

-- Retrieve documents where the 'title' column matches 'art power'
SELECT * FROM test WHERE MATCH('title', 'art power');

┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │
├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤
│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │
│ The Art of Communication │ Effective communication is crucial in everyday life. │
└────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents where the 'title' column contains values that start with 'The' followed by any characters
SELECT * FROM test WHERE MATCH('title', 'The*')

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │
│ Nullable(String) │ Nullable(String) │
├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │
│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │
│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │
│ The Art of Communication │ Effective communication is crucial in everyday life. │
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents where either the 'title' or 'body' column matches 'knowledge technology'
SELECT *, score() FROM test WHERE MATCH('title, body', 'knowledge technology');

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │ score() │
├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.1550591 │
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 2.6830134 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents where either the 'title' or 'body' column matches 'knowledge technology', with weighted importance on both columns
SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'knowledge technology');

┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ title │ body │ score() │
├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤
│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │
│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 7.8053584 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

-- Retrieve documents where the 'body' column contains both "knowledge" and "imagination" (allowing for minor typos).
SELECT * FROM test WHERE MATCH('body', 'knowledg imaginatio', 'fuzziness = 1; operator = AND');

-[ RECORD 1 ]-----------------------------------
title: The Importance of Reading
body: Reading is a crucial skill that opens up a world of knowledge and imagination.
```

```sql
SELECT id, summary
FROM frame_notes
WHERE MATCH('summary^2, tags', 'pedestrain', 'fuzziness=1');
-- Returns ids 1 and 3
```