diff --git a/docs/en/sql-reference/20-sql-functions/10-search-functions/index.md b/docs/en/sql-reference/20-sql-functions/10-search-functions/index.md
index 876dbeeb5b..f3f64002e7 100644
--- a/docs/en/sql-reference/20-sql-functions/10-search-functions/index.md
+++ b/docs/en/sql-reference/20-sql-functions/10-search-functions/index.md
@@ -2,50 +2,91 @@
 title: Full-Text Search Functions
 ---

-This section provides reference information for the full-text search functions in Databend. These functions enable powerful text search capabilities similar to those found in dedicated search engines.
+Databend's full-text search functions deliver search-engine-style filtering for semi-structured `VARIANT` data and plain text columns that are indexed with an inverted index. They are ideal for AI-generated metadata (such as perception results from autonomous-driving video frames) stored alongside your assets.

 :::info
-Databend's full-text search functions are inspired by [Elasticsearch Full-Text Search Functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html).
+Databend's search functions are inspired by [Elasticsearch Full-Text Search Functions](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html).
 :::

+Include an inverted index in the table definition for the columns you plan to search:
+
+```sql
+CREATE OR REPLACE TABLE frames (
+  id INT,
+  meta VARIANT,
+  INVERTED INDEX idx_meta (meta)
+);
+```
+
 ## Search Functions

 | Function | Description | Example |
-|----------|-------------|--------|
-| [MATCH](match) | Searches for documents containing specified keywords in selected columns | `MATCH('title, body', 'technology')` |
-| [QUERY](query) | Searches for documents satisfying a specified query expression with advanced syntax | `QUERY('title:technology AND society')` |
-| [SCORE](score) | Returns the relevance score of search results when used with MATCH or QUERY | `SELECT title, SCORE() FROM articles WHERE MATCH('title', 'technology')` |
+|----------|-------------|---------|
+| [MATCH](match) | Performs a relevance-ranked search across the listed columns. | `MATCH('summary, tags', 'traffic light red')` |
+| [QUERY](query) | Evaluates a Lucene-style query expression, including nested `VARIANT` fields. | `QUERY('meta.signals.traffic_light:red')` |
+| [SCORE](score) | Returns the relevance score for the current row when used with `MATCH` or `QUERY`. | `SELECT summary, SCORE() FROM frame_notes WHERE MATCH('summary, tags', 'traffic light red')` |

-## Usage Examples
+## Query Syntax Examples

-### Basic Text Search
+### Example: Single Keyword

 ```sql
--- Search for documents with 'technology' in title or body columns
-SELECT * FROM articles
-WHERE MATCH('title, body', 'technology');
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.detections.label:pedestrian')
+LIMIT 100;
 ```

-### Advanced Query Expressions
+### Example: Boolean AND

 ```sql
--- Search for documents with 'technology' in title and 'impact' in body
-SELECT * FROM articles
-WHERE QUERY('title:technology AND body:impact');
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center')
+LIMIT 100;
 ```

-### Relevance Scoring
+### Example: Boolean OR

 ```sql
--- Search with relevance scoring and sorting by relevance
-SELECT title, body, SCORE()
-FROM articles
-WHERE MATCH('title^2, body', 'technology')
-ORDER BY SCORE() DESC;
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike')
+LIMIT 100;
 ```

-Before using these functions, you need to create an inverted index on the columns you want to search:
+### Example: IN List

 ```sql
-CREATE INVERTED INDEX idx ON articles(title, body);
-```
\ No newline at end of file
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.tags:IN [stop urban]')
+LIMIT 100;
+```
+
+### Example: Inclusive Range
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]')
+LIMIT 100;
+```
+
+### Example: Exclusive Range
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}')
+LIMIT 100;
+```
+
+### Example: Boosted Fields
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts, SCORE()
+FROM frames
+WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0')
+LIMIT 100;
+```
diff --git a/docs/en/sql-reference/20-sql-functions/10-search-functions/match.md b/docs/en/sql-reference/20-sql-functions/10-search-functions/match.md
index cd12c377dc..63a9ac9bd2 100644
--- a/docs/en/sql-reference/20-sql-functions/10-search-functions/match.md
+++ b/docs/en/sql-reference/20-sql-functions/10-search-functions/match.md
@@ -5,7 +5,7 @@ import FunctionDescription from '@site/src/components/FunctionDescription';

-Searches for documents containing specified keywords. Please note that the MATCH function can only be used in a WHERE clause.
+`MATCH` searches for rows that contain the supplied keywords within the listed columns. The function can only appear in a `WHERE` clause.

 :::info
 Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-match).
@@ -14,83 +14,62 @@ Databend's MATCH function is inspired by Elasticsearch's [MATCH](https://www.ela
 ## Syntax

 ```sql
-MATCH( '<columns>', '<keywords>'[, '<options>'] )
+MATCH('<columns>', '<keywords>'[, '<options>'])
 ```

-| Parameter | Description |
-|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `<columns>` | A comma-separated list of column names in the table to search for the specified keywords, with optional weighting using the syntax (^), which allows assigning different weights to each column, influencing the importance of each column in the search. |
-| `<keywords>` | The keywords to match against the specified columns in the table. This parameter can also be used for suffix matching, where the search term followed by an asterisk (*) can match any number of characters or words. |
-| `<options>` | A set of configuration options, separated by semicolons `;`, that customize the search behavior. See the table below for details. |
+- `<columns>`: A comma-separated list of columns to search. Append `^` to weight a column higher than the others.
+- `<keywords>`: The terms to search for. Append `*` for suffix matching (the term followed by any characters), for example `rust*`.
+- `<options>`: An optional semicolon-separated list of `key=value` pairs fine-tuning the search.

-| Option | Description | Example | Explanation |
-|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| fuzziness | Allows matching terms within a specified Levenshtein distance. `fuzziness` can be set to 1 or 2. | SELECT id, score(), content FROM t WHERE match(content, 'box', 'fuzziness=1'); | When matching the query term "box", `fuzziness=1` allows matching terms like "fox", since "box" and "fox" have a Levenshtein distance of 1. |
-| operator | Specifies how multiple query terms are combined. Can be set to OR (default) or AND. OR returns results containing any of the query terms, while AND returns results containing all query terms. | SELECT id, score(), content FROM t WHERE match(content, 'action works', 'fuzziness=1;operator=AND'); | With `operator=AND`, the query requires both "action" and "works" to be present in the results. Due to `fuzziness=1`, it matches terms like "Actions" and "words", so "Actions speak louder than words" is returned. |
-| lenient | Controls whether errors are reported when the query text is invalid. Defaults to `false`. If set to `true`, no error is reported, and an empty result set is returned if the query text is invalid. | SELECT id, score(), content FROM t WHERE match(content, '()', 'lenient=true'); | If the query text `()` is invalid, setting `lenient=true` prevents an error from being thrown and returns an empty result set instead. |
+## Options
+
+| Option | Values | Description | Example |
+|--------|--------|-------------|---------|
+| `fuzziness` | `1` or `2` | Matches keywords within the specified Levenshtein distance. | `MATCH('summary, tags', 'pedestrain', 'fuzziness=1')` matches rows that contain the correctly spelled `pedestrian`. |
+| `operator` | `OR` (default) or `AND` | Controls how multiple keywords are combined when no boolean operator is specified. | `MATCH('summary, tags', 'traffic light red', 'operator=AND')` requires all three words. |
+| `lenient` | `true` or `false` | When `true`, suppresses parsing errors and returns an empty result set. | `MATCH('summary, tags', '()', 'lenient=true')` returns no rows instead of an error. |

 ## Examples

+In many AI pipelines you may capture structured metadata in a `VARIANT` column while also materializing human-readable summaries for search. The following example stores dashcam frame summaries and tags that were extracted from the JSON payload.
+
+### Example: Build Searchable Summaries
+
+```sql
+CREATE OR REPLACE TABLE frame_notes (
+  id INT,
+  camera STRING,
+  summary STRING,
+  tags STRING,
+  INVERTED INDEX idx_notes (summary, tags)
+);
+
+INSERT INTO frame_notes VALUES
+  (1, 'dashcam_front',
+   'Green light at Market & 5th with pedestrian entering the crosswalk',
+   'downtown commute green-light pedestrian'),
+  (2, 'dashcam_front',
+   'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead',
+   'stop urban red-light cyclist'),
+  (3, 'dashcam_front',
+   'School zone caution sign in SOMA with pedestrian waiting near crosswalk',
+   'school-zone caution pedestrian');
+```
+
+### Example: Boolean AND
+
+```sql
+SELECT id, summary
+FROM frame_notes
+WHERE MATCH('summary, tags', 'traffic light red', 'operator=AND');
+-- Returns id 2
+```
+
+### Example: Fuzzy Matching
+
 ```sql
-CREATE TABLE test(title STRING, body STRING);
-
-CREATE INVERTED INDEX idx ON test(title, body);
-
-INSERT INTO test VALUES
-('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'),
-('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'),
-('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'),
-('The Art of Communication', 'Effective communication is crucial in everyday life.'),
-('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.');
-
--- Retrieve documents where the 'title' column matches 'art power'
-SELECT * FROM test WHERE MATCH('title', 'art power');
-
-┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
-│ title │ body │
-├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤
-│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success.
│ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'title' column contains values that start with 'The' followed by any characters -SELECT * FROM test WHERE MATCH('title', 'The*') - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -│ Nullable(String) │ Nullable(String) │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where either the 'title' or 'body' column matches 'knowledge technology' -SELECT *, score() FROM test WHERE MATCH('title, body', 'knowledge technology'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.1550591 │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. 
│ 2.6830134 │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where either the 'title' or 'body' column matches 'knowledge technology', with weighted importance on both columns -SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'knowledge technology'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ 7.8053584 │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'body' column contains both "knowledge" and "imagination" (allowing for minor typos). -SELECT * FROM test WHERE MATCH('body', 'knowledg imaginatio', 'fuzziness = 1; operator = AND'); - --[ RECORD 1 ]----------------------------------- -title: The Importance of Reading - body: Reading is a crucial skill that opens up a world of knowledge and imagination. 
-```
\ No newline at end of file
+SELECT id, summary
+FROM frame_notes
+WHERE MATCH('summary^2, tags', 'pedestrain', 'fuzziness=1');
+-- Returns ids 1 and 3
+```
diff --git a/docs/en/sql-reference/20-sql-functions/10-search-functions/query.md b/docs/en/sql-reference/20-sql-functions/10-search-functions/query.md
index 68075461a3..3b9c4ddf1d 100644
--- a/docs/en/sql-reference/20-sql-functions/10-search-functions/query.md
+++ b/docs/en/sql-reference/20-sql-functions/10-search-functions/query.md
@@ -3,9 +3,9 @@ title: QUERY
 ---
 import FunctionDescription from '@site/src/components/FunctionDescription';

-
+

-Searches for documents satisfying a specified query expression. Please note that the QUERY function can only be used in a WHERE clause.
+`QUERY` filters rows by matching a Lucene-style query expression against columns that have an inverted index. Use dot notation to navigate nested fields inside `VARIANT` columns. The function is valid only in a `WHERE` clause.

 :::info
 Databend's QUERY function is inspired by Elasticsearch's [QUERY](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-query).
@@ -14,118 +14,169 @@ Databend's QUERY function is inspired by Elasticsearch's [QUERY](https://www.ela
 ## Syntax

 ```sql
-QUERY( '<query_expr>'[, '<options>'] )
+QUERY('<query_expr>'[, '<options>'])
 ```

-The query expression supports the following syntaxes. Please note that `<keyword>` can also be used for suffix matching, where the search term followed by an asterisk (*) can match any number of characters or words.
+`<options>` is an optional semicolon-separated list of `key=value` pairs that adjusts how the search works.
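
As a rough illustration of the option string, the sketch below passes two options in a single semicolon-separated argument. It assumes a `frames` table with an inverted index on its `meta` column and a `meta.scene.weather` field; the particular combination of options is only illustrative, not a recommendation.

```sql
-- Sketch: two key=value options joined by a semicolon in one string.
-- fuzziness=1 accepts terms within a Levenshtein distance of 1;
-- operator=AND requires every term in the expression to match.
SELECT id
FROM frames
WHERE QUERY('meta.scene.weather:rain fog', 'fuzziness=1;operator=AND');
```

Keeping both settings in one string keeps the call to two arguments regardless of how many options are set.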

-| Syntax | Description | Examples |
-|---------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|
-| `<column>:<keyword>` | Matches documents where the specified column contains the specified keyword. | `QUERY('title:power')` |
-| `<column>:IN [<keyword1>, <keyword2>...]` | Matches documents where the specified column contains any of the keywords listed within the square brackets. | `QUERY('title:IN [power, art]')`|
-| `<column>:<keyword1> AND / OR <keyword2>` | Matches documents where the specified column contains both or either of the specified keywords. In queries with both AND and OR, AND operations are prioritized over OR, meaning that 'a AND b OR c' is read as '(a AND b) OR c'. | `QUERY('title:power AND art')` |
-| `<column>:+<keyword1> -<keyword2>` | Matches documents where the specified positive keyword exists in the specified column and excludes documents where the specified negative keyword exists. | `QUERY('title:+the -reading')` |
-| `<column>:"<phrase>"` | Matches documents where the specified column contains the exact specified phrase. | `QUERY('title:"Benefits of Exercise"')` |
-| `<column1>:<keyword1>^<boost1> <column2>:<keyword2>^<boost2>` | Matches documents where the specified keyword exists in the specified columns with the specified boosts to increase their relevance in the search. This syntax allows setting different weights for multiple columns to influence the search relevance. | `QUERY('title:art^5 body:reading^1.2')` |
+## Building Query Expressions
+| Expression | Purpose | Example |
+|------------|---------|---------|
+| `column:keyword` | Matches rows where `column` contains the keyword. Append `*` for suffix matching. | `QUERY('meta.detections.label:pedestrian')` |
+| `column:"exact phrase"` | Matches rows that contain the exact phrase. | `QUERY('meta.scene.summary:"vehicle stopped at red traffic light"')` |
+| `column:+required -excluded` | Requires or excludes terms in the same column. | `QUERY('meta.tags:+commute -cyclist')` |
+| `column:term1 AND term2` / `column:term1 OR term2` | Combines multiple terms with boolean operators. `AND` has higher precedence than `OR`. | `QUERY('meta.signals.traffic_light:red AND meta.vehicle.lane:center')` |
+| `column:IN [value1 value2 ...]` | Matches any value from the list. | `QUERY('meta.tags:IN [stop urban]')` |
+| `column:[min TO max]` | Performs inclusive range search. Use `*` to leave one side open. | `QUERY('meta.vehicle.speed_kmh:[0 TO 10]')` |
+| `column:{min TO max}` | Performs exclusive range search that omits the boundary values. | `QUERY('meta.vehicle.speed_kmh:{0 TO 10}')` |
+| `column:term^boost` | Increases the weight of matches in a specific column. | `QUERY('meta.signals.traffic_light:red^1.0 meta.tags:urban^2.0')` |

-| Option | Description | Example | Explanation |
-|-----------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| fuzziness | Allows matching terms within a specified Levenshtein distance. `fuzziness` can be set to 1 or 2. | SELECT id, score(), content FROM t WHERE query('content:box', 'fuzziness=1'); | When matching the query term "box", `fuzziness=1` allows matching terms like "fox", since "box" and "fox" have a Levenshtein distance of 1. |
-| operator | Specifies how multiple query terms are combined. Can be set to OR (default) or AND. OR returns results containing any of the query terms, while AND returns results containing all query terms. | SELECT id, score(), content FROM t WHERE query('content:action works', 'fuzziness=1;operator=AND'); | With `operator=AND`, the query requires both "action" and "works" to be present in the results. Due to `fuzziness=1`, it matches terms like "Actions" and "words", so "Actions speak louder than words" is returned. |
-| lenient | Controls whether errors are reported when the query text is invalid. Defaults to `false`. If set to `true`, no error is reported, and an empty result set is returned if the query text is invalid. | SELECT id, score(), content FROM t WHERE query('content:()', 'lenient=true'); | If the query text `()` is invalid, setting `lenient=true` prevents an error from being thrown and returns an empty result set instead. |
+### Nested `VARIANT` Fields
+
+Use dot notation to address inner fields inside a `VARIANT` column. Databend evaluates the path across objects and arrays.
+
+| Pattern | Description | Example |
+|---------|-------------|---------|
+| `variant_col.field:value` | Matches an inner field. | `QUERY('meta.signals.traffic_light:red')` |
+| `variant_col.field:IN [ ... ]` | Matches any value inside arrays. | `QUERY('meta.detections.label:IN [pedestrian cyclist]')` |
+| `variant_col.field:[min TO max]` | Applies range search to numeric inner fields. | `QUERY('meta.vehicle.speed_kmh:[0 TO 10]')` |
+
+## Options
+
+| Option | Values | Description | Example |
+|--------|--------|-------------|---------|
+| `fuzziness` | `1` or `2` | Matches terms within the specified Levenshtein distance. | `SELECT id FROM frames WHERE QUERY('meta.detections.label:pedestrain', 'fuzziness=1');` |
+| `operator` | `OR` (default) or `AND` | Controls how multiple terms are combined when no explicit boolean operator is supplied. | `SELECT id FROM frames WHERE QUERY('meta.scene.weather:rain fog', 'operator=AND');` |
+| `lenient` | `true` or `false` | Suppresses parsing errors and returns an empty result set when `true`. | `SELECT id FROM frames WHERE QUERY('meta.detections.label:()', 'lenient=true');` |

 ## Examples

+### Set Up a Smart-Driving Dataset
+
+```sql
+CREATE OR REPLACE TABLE frames (
+  id INT,
+  meta VARIANT,
+  INVERTED INDEX idx_meta (meta)
+);
+
+INSERT INTO frames VALUES
+  (1, '{
+    "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:05Z","location":{"city":"San Francisco","intersection":"Market & 5th","gps":[37.7825,-122.4072]}},
+    "vehicle":{"speed_kmh":48,"acceleration":0.8,"lane":"center"},
+    "signals":{"traffic_light":"green","distance_m":55,"speed_limit_kmh":50},
+    "detections":[
+      {"label":"car","confidence":0.96,"distance_m":15,"relative_speed_kmh":2},
+      {"label":"pedestrian","confidence":0.88,"distance_m":12,"intent":"crossing"}
+    ],
+    "scene":{"weather":"clear","time_of_day":"day","visibility":"good"},
+    "tags":["downtown","commute","green-light"],
+    "model":"perception-net-v5"
+  }'),
+  (2, '{
+    "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:06Z","location":{"city":"San Francisco","intersection":"Mission & 6th","gps":[37.7829,-122.4079]}},
+    "vehicle":{"speed_kmh":9,"acceleration":-1.1,"lane":"center"},
+    "signals":{"traffic_light":"red","distance_m":18,"speed_limit_kmh":40},
+    "detections":[
+      {"label":"traffic_light","state":"red","confidence":0.99,"distance_m":18},
+      {"label":"bike","confidence":0.82,"distance_m":9,"relative_speed_kmh":3}
+    ],
+    "scene":{"weather":"clear","time_of_day":"day","visibility":"good"},
+    "tags":["stop","cyclist","urban"],
+    "model":"perception-net-v5"
+  }'),
+  (3, '{
+    "frame":{"source":"dashcam_front","timestamp":"2025-10-21T08:32:07Z","location":{"city":"San Francisco","intersection":"SOMA School Zone","gps":[37.7808,-122.4016]}},
+    "vehicle":{"speed_kmh":28,"acceleration":0.2,"lane":"right"},
+    "signals":{"traffic_light":"yellow","distance_m":32,"speed_limit_kmh":25},
+    "detections":[
+      {"label":"traffic_sign","text":"SCHOOL","confidence":0.91,"distance_m":25},
+      {"label":"pedestrian","confidence":0.76,"distance_m":8,"intent":"waiting"}
+    ],
+    "scene":{"weather":"overcast","time_of_day":"day","visibility":"moderate"},
+    "tags":["school-zone","caution"],
+    "model":"perception-net-v5"
+  }');
+```
+
+### Example: Boolean AND
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.signals.traffic_light:red AND meta.vehicle.speed_kmh:[0 TO 10]');
+-- Returns id 2
+```
+
+### Example: Boolean OR
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.signals.traffic_light:red OR meta.detections.label:bike');
+-- Returns id 2
+```
+
+### Example: IN List Matching
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.tags:IN [stop urban]');
+-- Returns id 2
+```
+
+### Example: Inclusive Range
+
 ```sql
-CREATE TABLE test(title STRING, body STRING);
-
-CREATE INVERTED INDEX idx ON test(title, body);
-
-INSERT INTO test VALUES
-('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'),
-('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'),
-('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'),
-('The Art of Communication', 'Effective communication is crucial in everyday life.'),
-('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.');
-
--- Retrieve documents where the 'title' column contains the keyword 'power'
-SELECT * FROM test WHERE QUERY('title:power');
-
-┌────────────────────────────────────────────────────────────────────────────────────────────────────┐
-│ title │ body │
-├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤
-│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'title' column contains values that start with 'The' followed by any characters -SELECT * FROM test WHERE QUERY('title:The*'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. │ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'title' column contains either the keyword 'power' or 'art' -SELECT * FROM test WHERE QUERY('title:power OR art'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. 
│ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - -SELECT * FROM test WHERE QUERY('title:IN [power, art]') - -┌────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -│ Nullable(String) │ Nullable(String) │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -└────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'title' column contains the positive keyword 'the' but not 'reading' -SELECT * FROM test WHERE QUERY('title:+the -reading'); - -┌──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├─────────────────────────────────────┼────────────────────────────────────────────────────────────────────────┤ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ -│ The Power of Perseverance │ Perseverance is the key to overcoming obstacles and achieving success. │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ -│ The Impact of Technology on Society │ Technology has revolutionized our society in countless ways. 
│ -└──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'title' column contains the exact phrase 'Benefits of Exercise' -SELECT * FROM test WHERE QUERY('title:"Benefits of Exercise"'); - -┌───────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ -├──────────────────────────┼────────────────────────────────────────────────────────────┤ -│ The Benefits of Exercise │ Exercise is essential for maintaining a healthy lifestyle. │ -└───────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'title' column contains the keyword 'art' with a boost of 5 and the 'body' column contains the keyword 'reading' with a boost of 1.2 -SELECT *, score() FROM test WHERE QUERY('title:art^5 body:reading^1.2'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ 7.1992116 │ -└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'body' column contains both "knowledge" and "imagination" (allowing for minor typos). -SELECT * FROM test WHERE QUERY('body:knowledg OR imaginatio', 'fuzziness = 1; operator = AND'); - --[ RECORD 1 ]----------------------------------- -title: The Importance of Reading - body: Reading is a crucial skill that opens up a world of knowledge and imagination. 
-```
\ No newline at end of file
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.vehicle.speed_kmh:[0 TO 10]');
+-- Returns id 2
+```
+
+### Example: Exclusive Range
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.vehicle.speed_kmh:{0 TO 10}');
+-- Returns id 2
+```
+
+### Example: Boost Across Fields
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts, SCORE()
+FROM frames
+WHERE QUERY('meta.signals.traffic_light:red^1.0 AND meta.tags:urban^2.0');
+-- Returns id 2; SCORE() reflects the boosted terms
+```
+
+### Example: Detect High-Confidence Pedestrians
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.detections.label:IN [pedestrian cyclist] AND meta.detections.confidence:[0.8 TO *]');
+-- Returns ids 1 and 3
+```
+
+### Example: Filter by Phrase
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.scene.summary:"vehicle stopped at red traffic light"');
+-- Returns id 2 only if its JSON includes a matching scene.summary; the sample rows above omit that field
+```
+
+### Example: School-Zone Filter
+
+```sql
+SELECT id, meta['frame']['timestamp'] AS ts
+FROM frames
+WHERE QUERY('meta.detections.text:SCHOOL AND meta.scene.time_of_day:day');
+-- Returns id 3
+```
diff --git a/docs/en/sql-reference/20-sql-functions/10-search-functions/score.md b/docs/en/sql-reference/20-sql-functions/10-search-functions/score.md
index bf7128a091..5d67e8e816 100644
--- a/docs/en/sql-reference/20-sql-functions/10-search-functions/score.md
+++ b/docs/en/sql-reference/20-sql-functions/10-search-functions/score.md
@@ -5,7 +5,7 @@ import FunctionDescription from '@site/src/components/FunctionDescription';

-Returns the relevance of the query string. The higher the score, the more relevant the data. Please note that SCORE function can only be used with the [QUERY](query.md) or [MATCH](match.md) function.
+`SCORE()` returns the relevance score assigned to the current row by the inverted index search. Use it in a query whose `WHERE` clause contains [MATCH](match) or [QUERY](query).

 :::info
 Databend's SCORE function is inspired by Elasticsearch's [SCORE](https://www.elastic.co/guide/en/elasticsearch/reference/current/sql-functions-search.html#sql-functions-search-score).
@@ -19,35 +19,44 @@ SCORE()

 ## Examples

+### Example: Prepare Text Notes for MATCH
+
+```sql
+CREATE OR REPLACE TABLE frame_notes (
+  id INT,
+  camera STRING,
+  summary STRING,
+  tags STRING,
+  INVERTED INDEX idx_notes (summary, tags)
+);
+
+INSERT INTO frame_notes VALUES
+  (1, 'dashcam_front',
+   'Green light at Market & 5th with pedestrian entering the crosswalk',
+   'downtown commute green-light pedestrian'),
+  (2, 'dashcam_front',
+   'Vehicle stopped at Mission & 6th red traffic light with cyclist ahead',
+   'stop urban red-light cyclist'),
+  (3, 'dashcam_front',
+   'School zone caution sign in SOMA with pedestrian waiting near crosswalk',
+   'school-zone caution pedestrian');
+```
+
+### Example: Score MATCH Results
+
+```sql
+SELECT summary, SCORE()
+FROM frame_notes
+WHERE MATCH('summary^2, tags', 'traffic light red', 'operator=AND')
+ORDER BY SCORE() DESC;
+```
+
+### Example: Score QUERY Results
+
+Reusing the `frames` table from the [QUERY](query) examples:
+
 ```sql
-CREATE TABLE test(title STRING, body STRING);
-
-CREATE INVERTED INDEX idx ON test(title, body);
-
-INSERT INTO test VALUES
-('The Importance of Reading', 'Reading is a crucial skill that opens up a world of knowledge and imagination.'),
-('The Benefits of Exercise', 'Exercise is essential for maintaining a healthy lifestyle.'),
-('The Power of Perseverance', 'Perseverance is the key to overcoming obstacles and achieving success.'),
-('The Art of Communication', 'Effective communication is crucial in everyday life.'),
-('The Impact of Technology on Society', 'Technology has revolutionized our society in countless ways.');
-
--- Retrieve documents where the 'title' column contains the keyword 'art' with a boost of 5 and the 'body' column
contains the keyword 'reading' with a boost of 1.2, along with their relevance scores -SELECT *, score() FROM test WHERE QUERY('title:art^5 body:reading^1.2'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 1.3860708 │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ 7.1992116 │ -└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ - --- Retrieve documents where the 'title' column contains the keyword 'reading' with a boost of 5 and the 'body' column contains the keyword 'everyday' with a boost of 1.2, along with their relevance scores -SELECT *, score() FROM test WHERE MATCH('title^5, body^1.2', 'reading everyday'); - -┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -│ title │ body │ score() │ -├───────────────────────────┼────────────────────────────────────────────────────────────────────────────────┼───────────┤ -│ The Importance of Reading │ Reading is a crucial skill that opens up a world of knowledge and imagination. │ 8.585282 │ -│ The Art of Communication │ Effective communication is crucial in everyday life. │ 1.8575745 │ -└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +SELECT id, SCORE() +FROM frames +WHERE QUERY('meta.detections.label:pedestrian^3 AND meta.scene.time_of_day:day'); ```