diff --git a/content/commands/ft.create.md b/content/commands/ft.create.md index f7a4480cd6..228217dc7a 100644 --- a/content/commands/ft.create.md +++ b/content/commands/ft.create.md @@ -222,7 +222,7 @@ after the SCHEMA keyword, declares which fields to index: - `TEXT` - Allows full-text search queries against the value in this attribute. - - `TAG` - Allows exact-match queries, such as categories or primary keys, against the value in this attribute. For more information, see [Tag Fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}). + - `TAG` - Allows exact-match queries, such as categories or primary keys, against the value in this attribute. For more information, see [Tag Fields]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}). - `NUMERIC` - Allows numeric range queries against the value in this attribute. See [query syntax docs]({{< relref "/develop/ai/search-and-query/query/" >}}) for details on how to use numeric ranges. diff --git a/content/commands/ft.tagvals.md b/content/commands/ft.tagvals.md index 1a58aa11ce..b0e56533c2 100644 --- a/content/commands/ft.tagvals.md +++ b/content/commands/ft.tagvals.md @@ -62,7 +62,7 @@ Use FT.TAGVALS if your tag indexes things like cities, categories, and so on. ## Limitations -FT.TAGVALS provides no paging or sorting, and the tags are not alphabetically sorted. FT.TAGVALS only operates on [tag fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}). +FT.TAGVALS provides no paging or sorting, and the tags are not alphabetically sorted. FT.TAGVALS only operates on [tag fields]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}). The returned strings are lowercase with whitespaces removed, but otherwise unchanged. ## Return @@ -87,5 +87,5 @@ FT.TAGVALS returns an array reply of all distinct tags in the tag index. 
## Related topics -- [Tag fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) +- [Tag fields]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) - [RediSearch]({{< relref "/develop/ai/search-and-query/" >}}) diff --git a/content/develop/ai/search-and-query/advanced-concepts/escaping.md b/content/develop/ai/search-and-query/advanced-concepts/escaping.md deleted file mode 100644 index a0e4d28311..0000000000 --- a/content/develop/ai/search-and-query/advanced-concepts/escaping.md +++ /dev/null @@ -1,81 +0,0 @@ ---- -aliases: -- /develop/interact/search-and-query/advanced-concepts/escaping -categories: -- docs -- develop -- stack -- oss -- rs -- rc -- oss -- kubernetes -- clients -description: Controlling text tokenization and escaping -linkTitle: Tokenization -title: Tokenization -weight: 4 ---- - -Full-text search works by comparing words, URLs, numbers, and other elements of the query -against the text in the searchable fields of each document. However, -it would be very inefficient to compare the entire text of the query against the -entire text of each field over and over again, so the search system doesn't do this. -Instead, it splits the document text into short, significant sections -called *tokens* during the indexing process and stores the tokens as part of the document's -index data. - -During a search, the query system also tokenizes the -query text and then simply compares the tokens from the query against the tokens stored -for each document. Finding a match like this is much more efficient than pattern-matching on -the whole text and also lets you use -[stemming]({{< relref "/develop/ai/search-and-query/advanced-concepts/stemming" >}}) and -[stop words]({{< relref "/develop/ai/search-and-query/advanced-concepts/stopwords" >}}) -to improve the search even further. See this article about -[Tokenization](https://queryunderstanding.com/tokenization-c8cdd6aef7ff) -for a general introduction to the concepts. 
- -Redis uses a very simple tokenizer for documents and a slightly more sophisticated tokenizer for queries. Both allow a degree of control over string escaping and tokenization. - -The sections below describe the rules for tokenizing text fields and queries. -Note that -[Tag fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) -are essentially text fields but they use a simpler form of tokenization, as described -separately in the -[Tokenization rules for tag fields](#tokenization-rules-for-tag-fields) section. - -## Tokenization rules for text fields - -1. All punctuation marks and whitespace (besides underscores) separate the document and queries into tokens. For example, any character of `,.<>{}[]"':;!@#$%^&*()-+=~` will break the text into terms, so the text `foo-bar.baz...bag` will be tokenized into `[foo, bar, baz, bag]` - -2. Escaping separators in both queries and documents is done by prepending a backslash to any separator. For example, the text `hello\-world hello-world` will be tokenized as `[hello-world, hello, world]`. In most languages you will need an extra backslash to signify an actual backslash when formatting the document or query, so the actual text entered into redis-cli will be `hello\\-world`. - -3. Underscores (`_`) are not used as separators in either document or query, so the text `hello_world` will remain as is after tokenization. - -4. Repeating spaces or punctuation marks are stripped. - -5. Latin characters are converted to lowercase. - -6. A backslash before the first digit will tokenize it as a term. This will translate the `-` sign as NOT, which otherwise would make the number negative. Add a backslash before `.` if you are searching for a float. For example, `-20 -> {-20} vs -\20 -> {NOT{20}}`. 
- -## Tokenization rules for tag fields - -[Tag fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) interpret -a text field as a list of *tags* delimited by a -[separator]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags#creating-a-tag-field" >}}) -character (which is a comma "," by -default). The tokenizer simply splits the text wherever it finds the separator and so most -punctuation marks and whitespace are valid characters within each tag token. The only -changes that the tokenizer makes to the tags are: - -- Trimming whitespace at the start and end of the tag. Other whitespace in the tag text is left intact. -- Converting Latin alphabet characters to lowercase. You can override this by adding the - [`CASESENSITIVE`]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options#tag-fields" >}}) option in the indexing schema for the tag field. - -This means that when you define a tag field, you don't need to escape any characters, except -in the unusual case where you want leading or trailing spaces to be part of the tag text. -However, you do need to escape certain characters in a *query* against a tag field. See the -[Query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax#tag-filters" >}}) and -[Exact match]({{< relref "/develop/ai/search-and-query/query/exact-match" >}}) pages for more information about escaping -and how to use [DIALECT 2]({{< relref "/develop/ai/search-and-query/advanced-concepts/dialects#dialect-2" >}}), which is required for -exact match queries involving tags. 
diff --git a/content/develop/ai/search-and-query/advanced-concepts/geo.md b/content/develop/ai/search-and-query/advanced-concepts/geo.md deleted file mode 100644 index 4a31410017..0000000000 --- a/content/develop/ai/search-and-query/advanced-concepts/geo.md +++ /dev/null @@ -1,205 +0,0 @@ ---- -aliases: -- /develop/interact/search-and-query/advanced-concepts/geo -- /develop/ai/search-and-query/indexing/geo -categories: -- docs -- develop -- stack -- oss -- rs -- rc -- oss -- kubernetes -- clients -description: Learn how to use geospatial fields and perform geospatial queries in Redis -linkTitle: Geospatial -math: true -title: Geospatial -weight: 14 ---- - -Redis Query Engine supports geospatial data. This feature -lets you store geographical locations and geometric shapes -in the fields of JSON objects. - -{{< note >}}Take care not to confuse the geospatial indexing -features in Redis Query Engine with the -[Geospatial data type]({{< relref "/develop/data-types/geospatial" >}}) -that Redis also supports. Although there are some similarities between -these two features, the data type is intended for simpler use -cases and doesn't have the range of format options and queries -available in Redis Query Engine. -{{< /note >}} - -You can index these fields and use queries to find the objects -by their location or the relationship of their shape to other shapes. -For example, if you add the locations of a set of shops, you can -find all the shops within 5km of a user's position or determine -which ones are within the boundary of a particular town. - -Redis uses coordinate points to represent geospatial locations. -You can store individual points but you can also -use a set of points to define a polygon shape (the shape of a -town, for example). You can query several types of interactions -between points and shapes, such as whether a point lies within -a shape or whether two shapes overlap. 
- -Redis can interpret coordinates either as geographical longitude -and latitude or as Cartesian coordinates on a flat plane. -Geographical coordinates are ideal for large real-world locations -and areas (such as towns and countries). Cartesian coordinates -are more suitable for smaller areas (such as rooms in a building) -or for games, simulations, and other artificial scenarios. - -## Storing geospatial data - -Redis supports two different -[schema types]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) -for geospatial data: - -- [`GEO`](#geo): This uses a simple format where individual geospatial - points are specified as numeric longitude-latitude pairs. - -- [`GEOSHAPE`](#geoshape): [Redis Open Source]({{< relref "/operate/oss_and_stack" >}}) also - supports `GEOSHAPE` indexing in v7.2 and later. - This uses a subset of the - [Well-Known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) - format to specify both points and polygons using either geographical - coordinates or Cartesian coordinates. A - `GEOSHAPE` field supports more advanced queries than `GEO`, - such as checking if one shape overlaps or contains another. - -The sections below describe these schema types in more detail. - -## `GEO` - -A `GEO` index lets you represent geospatial data either as -a string containing a longitude-latitude pair (for example, -"-104.991531, 39.742043") or as a JSON array of these -strings. Note that the longitude value comes first in the -string. 
- -For example, you could index the `location` fields of the -the [JSON]({{< relref "/develop/data-types/json" >}}) objects -shown below as `GEO`: - -```json -{ - "description": "Navy Blue Slippers", - "price": 45.99, - "city": "Denver", - "location": "-104.991531, 39.742043" -} - -{ - "description": "Bright Red Boots", - "price": 185.75, - "city": "Various", - "location": [ - "-104.991531, 39.742043", - "-105.0618814,40.5150098" - ] -} -``` - -`GEO` fields allow only basic point and radius queries. -For example, the query below finds products within a 100 mile radius of Colorado Springs -(Longitude=-104.800644, Latitude=38.846127). - -```bash -FT.SEARCH productidx '@location:[-104.800644 38.846127 100 mi]' -``` - -See [Geospatial queries]({{< relref "/develop/ai/search-and-query/query/geo-spatial" >}}) -for more information about the available query options and see -[Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geoindex" >}}) -for examples of indexing `GEO` fields. - -## `GEOSHAPE` - -Fields indexed as `GEOSHAPE` support the `POINT` and `POLYGON` primitives from the -[Well-Known Text](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) -representation of geometry. The `POINT` primitive defines a single point -in a similar way to a `GEO` field. -The `geom` field of the example JSON object shown below specifies a point -(in Cartesian coordinates, using the standard x,y order): - -```json -{ - "name": "Purple Point", - "geom": "POINT (2 2)" -} -``` - -The `POLYGON` primitive can approximate the outline of any shape using a -sequence of points. Specify the coordinates of the corners in the order they -occur around the shape (either clockwise or counter-clockwise) and ensure the -shape is "closed" by making the final coordinate exactly the same as the first. - -Note that `POLYGON` requires double parentheses around the coordinate list. 
-This is because you can specify additional shapes as a comma-separated list -that define "holes" within the enclosing polygon. The holes must have the opposite -winding order to the outer polygon (so, if the outer polygon uses a clockwise winding -order, the holes must use counter-clockwise). -The `geom` field of the example JSON object shown below specifies a -square using Cartesian coordinates in a clockwise winding order: - -```json -{ - "name": "Green Square", - "geom": "POLYGON ((1 1, 1 3, 3 3, 3 1, 1 1))" -} -``` - -The following examples define one `POINT` and three `POLYGON` primitives, -which are shown in the image below: - -``` -POINT (2 2) -POLYGON ((1 1, 1 3, 3 3, 3 1, 1 1)) -POLYGON ((2 2.5, 2 3.5, 3.5 3.5, 3.5 2.5, 2 2.5)) -POLYGON ((3.5 1, 3.75 2, 4 1, 3.5 1)) -``` - -{{< image filename="/images/dev/rqe/geoshapes.jpg" >}} - -You can run various types of queries against a geospatial index. For -example, the query below returns one primitive that lies within the boundary -of the green square (from the example above) but omits the square itself: - -```bash -> FT.SEARCH geomidx "(-@name:(Green Square) @geom:[WITHIN $qshape])" PARAMS 2 qshape "POLYGON ((1 1, 1 3, 3 3, 3 1, 1 1))" RETURN 1 name DIALECT 2 - -1) (integer) 1 -2) "shape:4" -3) 1) "name" - 2) "[\"Purple Point\"]" -``` - -There are four query operations that you can use with `GEOSHAPE` fields: - -- `WITHIN`: Find points or shapes that lie entirely within an - enclosing shape that you specify in the query. -- `CONTAINS`: Find shapes that completely contain the specified point - or shape. -- `INTERSECTS`: Find shapes whose boundary overlaps another specified - shape. -- `DISJOINT`: Find shapes whose boundary does not overlap another specified - shape. 
- -See -[Geospatial queries]({{< relref "/develop/ai/search-and-query/query/geo-spatial" >}}) -for more information about these query types and see -[Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geoindex" >}}) -for examples of indexing `GEOSHAPE` fields. - -## Limitations of geographical coordinates - -Planet Earth is actually shaped more like an -[ellipsoid](https://en.wikipedia.org/wiki/Earth_ellipsoid) than a perfect sphere. -The spherical coordinate system used by Redis Query Engine is a close -approximation to the shape of the Earth but not exact. For most practical -uses of geospatial queries, the approximation works very well, but you -shouldn't rely on it if you need very precise location data (for example, to track -the GPS locations of boats in an emergency response system). diff --git a/content/develop/ai/search-and-query/advanced-concepts/query_syntax.md b/content/develop/ai/search-and-query/advanced-concepts/query_syntax.md index 17189011f7..e69beccfd0 100644 --- a/content/develop/ai/search-and-query/advanced-concepts/query_syntax.md +++ b/content/develop/ai/search-and-query/advanced-concepts/query_syntax.md @@ -64,7 +64,7 @@ You can use simple syntax for complex queries using these rules: * Georadius matches on geo fields with the syntax `@field:[{lon} {lat} {radius} {m|km|mi|ft}]`. * As of 2.6, range queries on vector fields with the syntax `@field:[VECTOR_RANGE {radius} $query_vec]`, where `query_vec` is given as a query parameter. * As of v2.4, k-nearest neighbors (KNN) queries on vector fields with or without pre-filtering with the syntax `{filter_query}=>[KNN {num} @field $query_vec]`. -* Tag field filters with the syntax `@field:{tag | tag | ...}`. See the full documentation on [tags]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}). +* Tag field filters with the syntax `@field:{tag | tag | ...}`. See the full documentation on [tags]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}). 
* Optional terms or clauses: `foo ~bar` means bar is optional but documents containing `bar` will rank higher. * Fuzzy matching on terms: `%hello%` means all terms with Levenshtein distance of 1 from it. Use multiple pairs of '%' brackets, up to three deep, to increase the Levenshtein distance. * An expression in a query can be wrapped in parentheses to disambiguate, for example, `(hello|hella) (world|werld)`. @@ -128,8 +128,8 @@ If a field in the schema is defined as NUMERIC, it is possible to use the FILTER ## Tag filters As of v0.91, you can use a special field type called a -[_tag field_]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}), with simpler -[tokenization]({{< relref "/develop/ai/search-and-query/advanced-concepts/escaping#tokenization-rules-for-tag-fields" >}}) +[_tag field_]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}), with simpler +[tokenization]({{< relref "/develop/ai/search-and-query/indexing/tokenization#tag-field-tokenization" >}}) and encoding in the index. You can't access the values in these fields using a general fieldless search. Instead, you use special syntax: ``` diff --git a/content/develop/ai/search-and-query/advanced-concepts/tags.md b/content/develop/ai/search-and-query/advanced-concepts/tags.md deleted file mode 100644 index 17e7cedc22..0000000000 --- a/content/develop/ai/search-and-query/advanced-concepts/tags.md +++ /dev/null @@ -1,169 +0,0 @@ ---- -aliases: -- /develop/interact/search-and-query/advanced-concepts/tags -categories: -- docs -- develop -- stack -- oss -- rs -- rc -- oss -- kubernetes -- clients -description: Details about tag fields -linkTitle: Tags -title: Tags -weight: 6 ---- - -Tag fields are similar to full-text fields but they interpret the text as a simple -list of *tags* delimited by a -[separator](#creating-a-tag-field) character (which is a comma "," by default). 
-This limitation means that tag fields can use simpler -[tokenization]({{< relref "/develop/ai/search-and-query/advanced-concepts/escaping" >}}) -and encoding in the index, which is more efficient than full-text indexing. - -The values in tag fields cannot be accessed by general field-less search and can be used only with a special syntax. - -The main differences between tag and full-text fields are: - -1. [Tokenization]({{< relref "/develop/ai/search-and-query/advanced-concepts/escaping#tokenization-rules-for-tag-fields" >}}) - is very simple for tags. - -1. Stemming is not performed on tag indexes. - -1. Tags cannot be found from a general full-text search. If a document has a field called "tags" - with the values "foo" and "bar", searching for foo or bar without a special tag modifier (see below) will not return this document. - -1. The index is much simpler and more compressed: frequencies or offset vectors of field flags - are not stored. The index contains only document IDs encoded as deltas. This means that an entry in - a tag index is usually one or two bytes long. This makes them very memory-efficient and fast. - -1. You can create up to 1024 tag fields per index. - -## Creating a tag field - -Tag fields can be added to the schema with the following syntax: - -``` -FT.CREATE ... SCHEMA ... {field_name} TAG [SEPARATOR {sep}] [CASESENSITIVE] -``` - -For hashes, SEPARATOR can be any printable ASCII character; the default is a comma (`,`). For JSON, there is no default separator; you must declare one explicitly if needed. - -For example: - -``` -JSON.SET key:1 $ '{"colors": "red, orange, yellow"}' -FT.CREATE idx on JSON PREFIX 1 key: SCHEMA $.colors AS colors TAG SEPARATOR "," - -> FT.SEARCH idx '@colors:{orange}' -1) "1" -2) "key:1" -3) 1) "$" - 2) "{\"colors\":\"red, orange, yellow\"}" -``` - -CASESENSITIVE can be specified to keep the original case. 
- -## Querying tag fields - -As mentioned above, just searching for a tag without any modifiers will not retrieve documents -containing it. - -The syntax for matching tags in a query is as follows (the curly braces are part of the syntax): - - ``` - @:{ | | ...} - ``` - -For example, this query finds documents with either the tag `hello world` or `foo bar`: - -``` - FT.SEARCH idx "@tags:{ hello world | foo bar }" -``` - -Tag clauses can be combined into any sub-clause, used as negative expressions, optional expressions, etc. For example, given the following index: - -``` -FT.CREATE idx ON HASH PREFIX 1 test: SCHEMA title TEXT price NUMERIC tags TAG SEPARATOR ";" -``` - -You can combine a full-text search on the title field, a numerical range on price, and match either the `foo bar` or `hello world` tag like this: - -``` -FT.SEARCH idx "@title:hello @price:[0 100] @tags:{ foo bar | hello world } -``` - -Tags support prefix matching with the regular `*` character: - -``` -FT.SEARCH idx "@tags:{ hell* }" -FT.SEARCH idx "@tags:{ hello\\ w* }" - -``` - -## Multiple tags in a single filter - -Notice that including multiple tags in the same clause creates a union of all documents that contain any of the included tags. To create an intersection of documents containing all of the given tags, you should repeat the tag filter several times. 
- -For example, imagine an index of travelers, with a tag field for the cities each traveler has visited: - -``` -FT.CREATE myIndex ON HASH PREFIX 1 traveler: SCHEMA name TEXT cities TAG - -HSET traveler:1 name "John Doe" cities "New York, Barcelona, San Francisco" -``` - -For this index, the following query will return all the people who visited at least one of the following cities: - -``` -FT.SEARCH myIndex "@cities:{ New York | Los Angeles | Barcelona }" -``` - -But the next query will return all people who have visited all three cities: - -``` -FT.SEARCH myIndex "@cities:{ New York } @cities:{Los Angeles} @cities:{ Barcelona }" -``` - -## Including punctuation and spaces in tags - -A tag field can contain any punctuation characters except for the field separator. -You can use punctuation without escaping when you *define* a tag field, -but you typically need to escape certain characters when you *query* the field -because the query syntax itself uses the same characters. -(See [Query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax#tag-filters" >}}) -for the full set of characters that require escaping.) - -For example, given the following index: - -``` -FT.CREATE punctuation ON HASH PREFIX 1 test: SCHEMA tags TAG -``` - -You can add tags that contain punctuation like this: - -``` -HSET test:1 tags "Andrew's Top 5,Justin's Top 5" -``` - -However, when you query for those tags, you must escape the punctuation characters -with a backslash (`\`). So, querying for the tag `Andrew's Top 5` in -[`redis-cli`]({{< relref "/develop/tools/cli" >}}) looks like this: - -``` -FT.SEARCH punctuation "@tags:{ Andrew\\'s Top 5 }" -``` - -(Note that you need the double backslash here because the terminal app itself -uses the backslash as an escape character. -Programming languages commonly use this convention also.) 
- -You can include spaces in a tag filter without escaping *unless* you are -using a version of RediSearch earlier than v2.4 or you are using -[query dialect 1]({{< relref "/develop/ai/search-and-query/advanced-concepts/dialects#dialect-1" >}}). -See -[Query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax#tag-filters" >}}) -for a full explanation. diff --git a/content/develop/ai/search-and-query/indexing/_index.md b/content/develop/ai/search-and-query/indexing/_index.md index 1162e9a796..117d7e7733 100644 --- a/content/develop/ai/search-and-query/indexing/_index.md +++ b/content/develop/ai/search-and-query/indexing/_index.md @@ -12,593 +12,138 @@ categories: - oss - kubernetes - clients -description: How to index and search JSON documents +description: How to create search indexes for Redis data structures linkTitle: Indexing title: Indexing weight: 3 --- -In addition to indexing Redis hashes, Redis Open Source can also index JSON documents. +You can create search indexes for your Redis data to enable fast, flexible queries across your stored information. Redis supports indexing for both Hash and JSON data structures, each with its own advantages and use cases. -## Create index with JSON schema +## Choose your indexing approach + +### Hash indexing + +Hash indexing provides a straightforward approach where field names in your schema map directly to hash field names. This makes it ideal for: + +- **Structured data** with consistent field patterns +- **Simple schemas** without nested objects +- **High performance** requirements with minimal overhead +- **Existing applications** already using Redis Hashes + +Learn more: [Hash indexing]({{< relref "/develop/ai/search-and-query/indexing/hash-indexing" >}}) + +### JSON indexing + +JSON indexing uses JSONPath expressions to specify which parts of your documents to index. 
This approach works well for: + +- **Complex, nested data** structures +- **Flexible schemas** that may evolve over time +- **Rich data types** including arrays and objects +- **Applications** requiring document-style storage + +Learn more: [JSON indexing]({{< relref "/develop/ai/search-and-query/indexing/json-indexing" >}}) + +## Core concepts + +### Schema definition + +Every search index requires a schema that defines: + +- **Field types**: TEXT, TAG, NUMERIC, VECTOR, GEO, GEOSHAPE +- **Field options**: SORTABLE, NOINDEX, WEIGHT +- **Index configuration**: prefixes, filters, language settings + +### Field types + +Redis supports several field types for different data and query patterns: + +- **TEXT**: Full-text search with stemming and tokenization +- **TAG**: Exact match searches with high performance +- **NUMERIC**: Range queries and sorting +- **VECTOR**: Similarity search and machine learning +- **GEO/GEOSHAPE**: Geospatial queries and location-based search + +Learn more: [Field and type options]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) + +### Indexing process + +When you create an index: + +1. **Existing data** is indexed asynchronously in the background +2. **New data** is indexed synchronously as you add it +3. **Modified data** is automatically reindexed +4. 
**Deleted data** is automatically removed from the index + +## Advanced topics + +### JSON arrays + +JSON documents often contain arrays that require special indexing considerations: + +- **Array elements as tags** for exact matching +- **Array elements as text** for full-text search +- **Numeric arrays** for range queries +- **Vector arrays** for similarity search + +Learn more: [JSON arrays]({{< relref "/develop/ai/search-and-query/indexing/json-arrays" >}}) + +### Tags and exact matching + +Tag fields provide efficient exact matching capabilities: + +- **High performance** with compressed indexes +- **Exact match semantics** without tokenization +- **Multiple values** with separator characters +- **Case sensitivity** options + +Learn more: [Tags]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) + +### Geospatial indexing + +Store and query geographical data: + +- **Point locations** with GEO fields +- **Complex shapes** with GEOSHAPE fields +- **Proximity queries** within radius or bounds +- **Spatial relationships** like contains, intersects + +Learn more: [Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geospatial" >}}) + +### Text processing + +Control how text is tokenized and processed: + +- **Tokenization rules** for different field types +- **Character escaping** in queries and documents +- **Language-specific** processing options + +Learn more: [Tokenization]({{< relref "/develop/ai/search-and-query/indexing/tokenization" >}}) + +### Search techniques + +Advanced query and result processing: + +- **Field projection** to return specific attributes +- **Result highlighting** for search terms +- **Aggregation queries** for analytics and faceting + +Learn more: [Search techniques]({{< relref "/develop/ai/search-and-query/indexing/search-techniques" >}}) + +## Getting started + +1. **Choose your data structure**: Hash for simple structured data, JSON for complex nested data +2. 
**Design your schema**: Define fields, types, and options based on your query requirements +3. **Create your index**: Use `FT.CREATE` with appropriate configuration +4. **Add your data**: Use standard Redis commands (HSET, JSON.SET) to populate your index +5. **Query your data**: Use `FT.SEARCH` and `FT.AGGREGATE` to find and analyze your information + +## Next steps + +- Start with [Hash indexing]({{< relref "/develop/ai/search-and-query/indexing/hash-indexing" >}}) for simple use cases +- Explore [JSON indexing]({{< relref "/develop/ai/search-and-query/indexing/json-indexing" >}}) for complex data +- Learn about [field types]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) and their capabilities +- See [query examples]({{< relref "/develop/ai/search-and-query/query" >}}) for search patterns -When you create an index with the [`FT.CREATE`]({{< relref "commands/ft.create/" >}}) command, include the `ON JSON` keyword to index any existing and future JSON documents stored in the database. -To define the `SCHEMA`, you can provide [JSONPath]({{< relref "/develop/data-types/json/path" >}}) expressions. -The result of each JSONPath expression is indexed and associated with a logical name called an `attribute` (previously known as a `field`). -You can use these attributes in queries. -{{% alert title="Note" color="info" %}} -Note: `attribute` is optional for [`FT.CREATE`]({{< relref "commands/ft.create/" >}}). 
-{{% /alert %}} - -Use the following syntax to create a JSON index: - -```sql -FT.CREATE {index_name} ON JSON SCHEMA {json_path} AS {attribute} {type} -``` - -For example, this command creates an index that indexes the name, description, price, and image vector embedding of each JSON document that represents an inventory item: - -```sql -127.0.0.1:6379> FT.CREATE itemIdx ON JSON PREFIX 1 item: SCHEMA $.name AS name TEXT $.description as description TEXT $.price AS price NUMERIC $.embedding AS embedding VECTOR FLAT 6 DIM 4 DISTANCE_METRIC L2 TYPE FLOAT32 -``` - -See [Index limitations](#index-limitations) for more details about JSON index `SCHEMA` restrictions. - -## Add JSON documents - -After you create an index, Redis automatically indexes any existing, modified, or newly created JSON documents stored in the database. For existing documents, indexing runs asynchronously in the background, so it can take some time before the document is available. Modified and newly created documents are indexed synchronously, so the document will be available by the time the add or modify command finishes. - -You can use any JSON write command, such as [`JSON.SET`]({{< relref "commands/json.set/" >}}) and [`JSON.ARRAPPEND`]({{< relref "commands/json.arrappend/" >}}), to create or modify JSON documents. - -The following examples use these JSON documents to represent individual inventory items. 
- -Item 1 JSON document: - -```json -{ - "name": "Noise-cancelling Bluetooth headphones", - "description": "Wireless Bluetooth headphones with noise-cancelling technology", - "connection": { - "wireless": true, - "type": "Bluetooth" - }, - "price": 99.98, - "stock": 25, - "colors": [ - "black", - "silver" - ], - "embedding": [0.87, -0.15, 0.55, 0.03] -} -``` - -Item 2 JSON document: - -```json -{ - "name": "Wireless earbuds", - "description": "Wireless Bluetooth in-ear headphones", - "connection": { - "wireless": true, - "type": "Bluetooth" - }, - "price": 64.99, - "stock": 17, - "colors": [ - "black", - "white" - ], - "embedding": [-0.7, -0.51, 0.88, 0.14] -} -``` - -Use [`JSON.SET`]({{< relref "commands/json.set/" >}}) to store these documents in the database: - -```sql -127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","connection":{"wireless":true,"type":"Bluetooth"},"price":99.98,"stock":25,"colors":["black","silver"],"embedding":[0.87,-0.15,0.55,0.03]}' -"OK" -127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":64.99,"stock":17,"colors":["black","white"],"embedding":[-0.7,-0.51,0.88,0.14]}' -"OK" -``` - -Because indexing is synchronous in this case, the documents will be available on the index as soon as the [`JSON.SET`]({{< relref "commands/json.set/" >}}) command returns. -Any subsequent queries that match the indexed content will return the document. - -## Search the index - -To search the index for JSON documents, use the [`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) command. -You can search any attribute defined in the `SCHEMA`. 
- -For example, use this query to search for items with the word "earbuds" in the name: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '@name:(earbuds)' -1) "1" -2) "item:2" -3) 1) "$" - 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"connection\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" -``` - -This query searches for all items that include "bluetooth" and "headphones" in the description: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '@description:(bluetooth headphones)' -1) "2" -2) "item:1" -3) 1) "$" - 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"], \"embedding\":[0.87,-0.15,0.55,0.03]}" -4) "item:2" -5) 1) "$" - 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"connection\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" -``` - -Now search for Bluetooth headphones with a price less than 70: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '@description:(bluetooth headphones) @price:[0 70]' -1) "1" -2) "item:2" -3) 1) "$" - 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"connection\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" -``` - -And lastly, search for the Bluetooth headphones that are most similar to an image whose embedding is [1.0, 1.0, 1.0, 1.0]: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '@description:(bluetooth headphones)=>[KNN 2 @embedding $blob]' PARAMS 2 blob \x01\x01\x01\x01 DIALECT 2 -1) "2" -2) 
"item:1" -3) 1) "__embedding_score" - 2) "1.08280003071" - 1) "$" - 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"embedding\":[0.87,-0.15,0.55,0.03]}" -2) "item:2" -3) 1) "__embedding_score" - 2) "1.54409992695" - 3) "$" - 4) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"connection\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" -``` - -For more information about search queries, see [Search query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax" >}}). - -{{% alert title="Note" color="info" %}} -[`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) queries require `attribute` modifiers. Don't use JSONPath expressions in queries because the query parser doesn't fully support them. -{{% /alert %}} - -## Index JSON arrays as TAG - -The preferred method for indexing a JSON field with multivalued terms is using JSON arrays. Each value of the array is indexed, and those values must be scalars. If you want to index string or boolean values as TAGs within a JSON array, use the [JSONPath]({{< relref "/develop/data-types/json/path" >}}) wildcard operator. 
- -To index an item's list of available colors, specify the JSONPath `$.colors.*` in the `SCHEMA` definition during index creation: - -```sql -127.0.0.1:6379> FT.CREATE itemIdx2 ON JSON PREFIX 1 item: SCHEMA $.colors.* AS colors TAG $.name AS name TEXT $.description as description TEXT -``` - -Now you can search for silver headphones: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx2 "@colors:{silver} (@name:(headphones)|@description:(headphones))" -1) "1" -2) "item:1" -3) 1) "$" - 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"]}" -``` - -## Index JSON arrays as TEXT -Starting with RediSearch v2.6.0, full text search can be done on an array of strings or on a JSONPath leading to multiple strings. - -If you want to index multiple string values as TEXT, use either a JSONPath leading to a single array of strings, or a JSONPath leading to multiple string values, using JSONPath operators such as wildcard, filter, union, array slice, and/or recursive descent. 
- -To index an item's list of available colors, specify the JSONPath `$.colors` in the `SCHEMA` definition during index creation: - -```sql -127.0.0.1:6379> FT.CREATE itemIdx3 ON JSON PREFIX 1 item: SCHEMA $.colors AS colors TEXT $.name AS name TEXT $.description as description TEXT -``` - -```sql -127.0.0.1:6379> JSON.SET item:3 $ '{"name":"True Wireless earbuds","description":"True Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":74.99,"stock":20,"colors":["red","light blue"]}' -"OK" -``` - -Now you can do full text search for light colored headphones: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx3 '@colors:(white|light) (@name|description:(headphones))' RETURN 1 $.colors -1) (integer) 2 -2) "item:2" -3) 1) "$.colors" - 2) "[\"black\",\"white\"]" -4) "item:3" -5) 1) "$.colors" - 2) "[\"red\",\"light blue\"]" -``` - -### Limitations -- When a JSONPath may lead to multiple values and not only to a single array, e.g., when a JSONPath contains wildcards, etc., specifying `SLOP` or `INORDER` in [`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) will return an error, since the order of the values matching the JSONPath is not well defined, leading to potentially inconsistent results. - - For example, using a JSONPath such as `$..b[*]` on a JSON value such as - ```json - { - "a": [ - {"b": ["first first", "first second"]}, - {"c": - {"b": ["second first", "second second"]}}, - {"b": ["third first", "third second"]} - ] - } - ``` - may match values in various orderings, depending on the specific implementation of the JSONPath library being used. - - Since `SLOP` and `INORDER` consider relative ordering among the indexed values, and results may change in future releases, an error will be returned. 
- -- When JSONPath leads to multiple values: - - String values are indexed - - `null` values are skipped - - Any other value type will cause an indexing failure - -- `SORTBY` only sorts by the first value -- No `HIGHLIGHT` and `SUMMARIZE` support -- `RETURN` of a Schema attribute, whose JSONPath leads to multiple values, returns only the first value (as a JSON String) -- If a JSONPath is specified by the `RETURN`, instead of a Schema attribute, all values are returned (as a JSON String) - -### Handling phrases in different array slots: - -When indexing, a predefined delta is used to increase positional offsets between array slots for multiple text values. This delta controls the level of separation between phrases in different array slots (related to the `SLOP` parameter of [`FT.SEARCH`]({{< relref "commands/ft.search/" >}})). -This predefined value is set by the configuration parameter `MULTI_TEXT_SLOP` (at module load-time). The default value is 100. - -## Index JSON arrays as NUMERIC - -Starting with RediSearch v2.6.1, search can be done on an array of numerical values or on a JSONPath leading to multiple numerical values. - -If you want to index multiple numerical values as NUMERIC, use either a JSONPath leading to a single array of numbers, or a JSONPath leading to multiple numbers, using JSONPath operators such as wildcard, filter, union, array slice, and/or recursive descent. 
- -For example, add to the item's list the available `max_level` of volume (in decibels): - -```sql -127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","connection":{"wireless":true,"type":"Bluetooth"},"price":99.98,"stock":25,"colors":["black","silver"], "max_level":[60, 70, 80, 90, 100]}' -OK - -127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":64.99,"stock":17,"colors":["black","white"], "max_level":[80, 100, 120]}' -OK - -127.0.0.1:6379> JSON.SET item:3 $ '{"name":"True Wireless earbuds","description":"True Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":74.99,"stock":20,"colors":["red","light blue"], "max_level":[90, 100, 110, 120]}' -OK -``` - -To index the `max_level` array, specify the JSONPath `$.max_level` in the `SCHEMA` definition during index creation: - -```sql -127.0.0.1:6379> FT.CREATE itemIdx4 ON JSON PREFIX 1 item: SCHEMA $.max_level AS dB NUMERIC -OK -``` - -You can now search for headphones with specific max volume levels, for example, between 70 and 80 (inclusive), returning items with at least one value in their `max_level` array, which is in the requested range: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx4 '@dB:[70 80]' -1) (integer) 2 -2) "item:1" -3) 1) "$" - 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"max_level\":[60,70,80,90,100]}" -4) "item:2" -5) 1) "$" - 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear 
headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"max_level\":[80,100,120]}" -``` - -You can also search for items with all values in a specific range. For example, all values are in the range [90, 120] (inclusive): - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx4 '-@dB:[-inf (90] -@dB:[(120 +inf]' -1) (integer) 1 -2) "item:3" -3) 1) "$" - 2) "{\"name\":\"True Wireless earbuds\",\"description\":\"True Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":74.99,\"stock\":20,\"colors\":[\"red\",\"light blue\"],\"max_level\":[90,100,110,120]}" -``` - -### Limitations - -When JSONPath leads to multiple numerical values: - - Numerical values are indexed - - `null` values are skipped - - Any other value type will cause an indexing failure - -## Index JSON arrays as GEO and GEOSHAPE - -You can use `GEO` and `GEOSHAPE` fields to store geospatial data, -such as geographical locations and geometric shapes. See -[Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geoindex" >}}) -to learn how to use these schema types and see the -[Geospatial]({{< relref "/develop/ai/search-and-query/advanced-concepts/geo" >}}) -reference page for an introduction to their format and usage. - -## Index JSON arrays as VECTOR - -Starting with RediSearch 2.6.0, you can index a JSONPath leading to an array of numeric values as a VECTOR type in the index schema. - -For example, assume that your JSON items include an array of vector embeddings, where each vector represents an image of a product. 
To index these vectors, specify the JSONPath `$.embedding` in the schema definition during index creation: - -```sql -127.0.0.1:6379> FT.CREATE itemIdx5 ON JSON PREFIX 1 item: SCHEMA $.embedding AS embedding VECTOR FLAT 6 DIM 4 DISTANCE_METRIC L2 TYPE FLOAT32 -OK -127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","price":99.98,"stock":25,"colors":["black","silver"],"embedding":[0.87,-0.15,0.55,0.03]}' -OK -127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","price":64.99,"stock":17,"colors":["black","white"],"embedding":[-0.7,-0.51,0.88,0.14]}' -OK -``` - -Now you can search for the two headphones that are most similar to the image embedding by using vector search KNN query. (Note that the vector queries are supported as of dialect 2.) For example: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx5 '*=>[KNN 2 @embedding $blob AS dist]' SORTBY dist PARAMS 2 blob \x01\x01\x01\x01 DIALECT 2 -1) (integer) 2 -2) "item:1" -3) 1) "dist" - 2) "1.08280003071" - 3) "$" - 4) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"embedding\":[0.87,-0.15,0.55,0.03]}" -4) "item:2" -5) 1) "dist" - 2) "1.54409992695" - 3) "$" - 4) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" -``` - -If you want to index multiple numeric arrays as VECTOR, use a [JSONPath]({{< relref "/develop/data-types/json/path" >}}) leading to multiple numeric arrays using JSONPath operators such as wildcard, filter, union, array slice, and/or recursive descent. 
- -For example, assume that your JSON items include an array of vector embeddings, where each vector represents a different image of the same product. To index these vectors, specify the JSONPath `$.embeddings[*]` in the schema definition during index creation: - -```sql -127.0.0.1:6379> FT.CREATE itemIdx5 ON JSON PREFIX 1 item: SCHEMA $.embeddings[*] AS embeddings VECTOR FLAT 6 DIM 4 DISTANCE_METRIC L2 TYPE FLOAT32 -OK -127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","price":99.98,"stock":25,"colors":["black","silver"],"embeddings":[[0.87,-0.15,0.55,0.03]]}' -OK -127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","price":64.99,"stock":17,"colors":["black","white"],"embeddings":[[-0.7,-0.51,0.88,0.14],[-0.8,-0.15,0.33,-0.01]]}' -OK -``` - -{{% alert title="Important note" color="info" %}} -Unlike the case with the NUMERIC type, setting a static path such as `$.embedding` in the schema for the VECTOR type does not allow you to index multiple vectors stored under that field. Hence, if you set `$.embedding` as the path to the index schema, specifying an array of vectors in the `embedding` field in your JSON will cause an indexing failure. -{{% /alert %}} - -Now you can search for the two headphones that are most similar to an image embedding by using vector search KNN query. (Note that the vector queries are supported as of dialect 2.) The distance between a document to the query vector is defined as the minimum distance between the query vector to a vector that matches the JSONPath specified in the schema. 
For example: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx5 '*=>[KNN 2 @embeddings $blob AS dist]' SORTBY dist PARAMS 2 blob \x01\x01\x01\x01 DIALECT 2 -1) (integer) 2 -2) "item:2" -3) 1) "dist" - 2) "0.771500051022" - 3) "$" - 4) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embeddings\":[[-0.7,-0.51,0.88,0.14],[-0.8,-0.15,0.33,-0.01]]}" -4) "item:1" -5) 1) "dist" - 2) "1.08280003071" - 3) "$" - 4) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"embeddings\":[[0.87,-0.15,0.55,0.03]]}" -``` -Note that `0.771500051022` is the L2 distance between the query vector and `[-0.8,-0.15,0.33,-0.01]`, which is the second element in the embedding array, and it is lower than the L2 distance between the query vector and `[-0.7,-0.51,0.88,0.14]`, which is the first element in the embedding array. - -For more information on vector similarity syntax, see [Vector fields]({{< relref "/develop/ai/search-and-query/vectors" >}}). - -## Index JSON objects - -You cannot index JSON objects. FT.CREATE will return an error if the JSONPath expression returns an object. - -To index the contents of a JSON object, you need to index the individual elements within the object in separate attributes. 
- -For example, to index the `connection` JSON object, define the `$.connection.wireless` and `$.connection.type` fields as separate attributes when you create the index: - -```sql -127.0.0.1:6379> FT.CREATE itemIdx3 ON JSON SCHEMA $.connection.wireless AS wireless TAG $.connection.type AS connectionType TEXT -"OK" -``` - -After you create the new index, you can search for items with the wireless TAG set to `true`: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx3 '@wireless:{true}' -1) "2" -2) "item:2" -3) 1) "$" - 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"connection\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"]}" -4) "item:1" -5) 1) "$" - 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"]}" -``` - -You can also search for items with a Bluetooth connection type: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx3 '@connectionType:(bluetooth)' -1) "2" -2) "item:1" -3) 1) "$" - 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"]}" -4) "item:2" -5) 1) "$" - 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"]}" -``` - -## Field projection - -[`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) returns the entire JSON document by default. If you want to limit the returned search results to specific attributes, you can use field projection. 
- -### Return specific attributes - -When you run a search query, you can use the `RETURN` keyword to specify which attributes you want to include in the search results. You also need to specify the number of fields to return. - -For example, this query only returns the `name` and `price` of each set of headphones: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '@description:(headphones)' RETURN 2 name price -1) "2" -2) "item:1" -3) 1) "name" - 2) "Noise-cancelling Bluetooth headphones" - 3) "price" - 4) "99.98" -4) "item:2" -5) 1) "name" - 2) "Wireless earbuds" - 3) "price" - 4) "64.99" -``` - -### Project with JSONPath - -You can use [JSONPath]({{< relref "/develop/data-types/json/path" >}}) expressions in a `RETURN` statement to extract any part of the JSON document, even fields that were not defined in the index `SCHEMA`. - -For example, the following query uses the JSONPath expression `$.stock` to return each item's stock in addition to the name and price attributes. - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '@description:(headphones)' RETURN 3 name price $.stock -1) "2" -2) "item:1" -3) 1) "name" - 2) "Noise-cancelling Bluetooth headphones" - 3) "price" - 4) "99.98" - 5) "$.stock" - 6) "25" -4) "item:2" -5) 1) "name" - 2) "Wireless earbuds" - 3) "price" - 4) "64.99" - 5) "$.stock" - 6) "17" -``` - -Note that the returned property name is the JSONPath expression itself: `"$.stock"`. - -You can use the `AS` option to specify an alias for the returned property: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '@description:(headphones)' RETURN 5 name price $.stock AS stock -1) "2" -2) "item:1" -3) 1) "name" - 2) "Noise-cancelling Bluetooth headphones" - 3) "price" - 4) "99.98" - 5) "stock" - 6) "25" -4) "item:2" -5) 1) "name" - 2) "Wireless earbuds" - 3) "price" - 4) "64.99" - 5) "stock" - 6) "17" -``` - -This query returns the field as the alias `"stock"` instead of the JSONPath expression `"$.stock"`. 
- -### Highlight search terms - -You can [highlight]({{< relref "/develop/ai/search-and-query/advanced-concepts/highlight" >}}) relevant search terms in any indexed `TEXT` attribute. - -For [`FT.SEARCH`]({{< relref "commands/ft.search/" >}}), you have to explicitly set which attributes you want highlighted after the `RETURN` and `HIGHLIGHT` parameters. - -Use the optional `TAGS` keyword to specify the strings that will surround (or highlight) the matching search terms. - -For example, highlight the word "bluetooth" with bold HTML tags in item names and descriptions: - -```sql -127.0.0.1:6379> FT.SEARCH itemIdx '(@name:(bluetooth))|(@description:(bluetooth))' RETURN 3 name description price HIGHLIGHT FIELDS 2 name description TAGS '' '' -1) "2" -2) "item:1" -3) 1) "name" - 2) "Noise-cancelling Bluetooth headphones" - 3) "description" - 4) "Wireless Bluetooth headphones with noise-cancelling technology" - 5) "price" - 6) "99.98" -4) "item:2" -5) 1) "name" - 2) "Wireless earbuds" - 3) "description" - 4) "Wireless Bluetooth in-ear headphones" - 5) "price" - 6) "64.99" -``` - -## Aggregate with JSONPath - -You can use [aggregation]({{< relref "/develop/ai/search-and-query/advanced-concepts/aggregations" >}}) to generate statistics or build facet queries. - -The `LOAD` option accepts [JSONPath]({{< relref "/develop/data-types/json/path" >}}) expressions. You can use any value in the pipeline, even if the value is not indexed. 
- -This example uses aggregation to calculate a 10% price discount for each item and sorts the items from least expensive to most expensive: - -```sql -127.0.0.1:6379> FT.AGGREGATE itemIdx '*' LOAD 4 name $.price AS originalPrice APPLY '@originalPrice - (@originalPrice * 0.10)' AS salePrice SORTBY 2 @salePrice ASC -1) "2" -2) 1) "name" - 2) "Wireless earbuds" - 3) "originalPrice" - 4) "64.99" - 5) "salePrice" - 6) "58.491" -3) 1) "name" - 2) "Noise-cancelling Bluetooth headphones" - 3) "originalPrice" - 4) "99.98" - 5) "salePrice" - 6) "89.982" -``` - -{{% alert title="Note" color="info" %}} -[`FT.AGGREGATE`]({{< relref "commands/ft.aggregate/" >}}) queries require `attribute` modifiers. Don't use JSONPath expressions in queries, except with the `LOAD` option, because the query parser doesn't fully support them. -{{% /alert %}} - -## Index missing or empty values -As of v2.10, you can search for missing properties, that is, properties that do not exist in a given document, using the `INDEXMISSING` option to `FT.CREATE` in conjunction with the `ismissing` query function with `FT.SEARCH`. You can also search for existing properties with no value (i.e., empty) using the `INDEXEMPTY` option with `FT.CREATE`. Both query types require DIALECT 2. 
Examples below: - -``` -JSON.SET key:1 $ '{"propA": "foo"}' -JSON.SET key:2 $ '{"propA": "bar", "propB":"abc"}' -FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.propA AS propA TAG $.propB AS propB TAG INDEXMISSING - -> FT.SEARCH idx 'ismissing(@propB)' DIALECT 2 -1) "1" -2) "key:1" -3) 1) "$" - 2) "{\"propA\":\"foo\"}" -``` - -``` -JSON.SET key:1 $ '{"propA": "foo", "propB":""}' -JSON.SET key:2 $ '{"propA": "bar", "propB":"abc"}' -FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.propA AS propA TAG $.propB AS propB TAG INDEXEMPTY - -> FT.SEARCH idx '@propB:{""}' DIALECT 2 -1) "1" -2) "key:1" -3) 1) "$" - 2) "{\"propA\":\"foo\",\"propB\":\"\"}" -``` - -## Index limitations - -### Schema mapping - -During index creation, you need to map the JSON elements to `SCHEMA` fields as follows: - -- Strings as `TEXT`, `TAG`, or `GEO`. -- Numbers as `NUMERIC`. -- Booleans as `TAG`. -- JSON array - - Array of strings as `TAG` or `TEXT`. - - Array of numbers as `NUMERIC` or `VECTOR`. - - Array of geo coordinates as `GEO`. - - `null` values in such arrays are ignored. -- You cannot index JSON objects. Index the individual elements as separate attributes instead. -- `null` values are ignored. - -### Sortable tags - -If you create an index for JSON documents with a JSONPath leading to an array or to multiple values, only the first value is considered by the sort. diff --git a/content/develop/ai/search-and-query/indexing/field-and-type-options.md b/content/develop/ai/search-and-query/indexing/field-and-type-options.md index cb2033f2be..84c60b9535 100644 --- a/content/develop/ai/search-and-query/indexing/field-and-type-options.md +++ b/content/develop/ai/search-and-query/indexing/field-and-type-options.md @@ -15,7 +15,7 @@ categories: description: Available field types and options. 
linkTitle: Field and type options title: Field and type options -weight: 2 +weight: 5 --- @@ -71,7 +71,7 @@ Geo fields are used to store geographical coordinates such as longitude and lati Redis Query Engine also supports [geoshape fields](#geoshape-fields) for more advanced geospatial queries. See the -[Geospatial]({{< relref "/develop/ai/search-and-query/advanced-concepts/geo" >}}) +[Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geospatial" >}}) reference page for an introduction to the format and usage of both schema types. You can add geo fields to the schema in [`FT.CREATE`]({{< relref "commands/ft.create/" >}}) using this syntax: @@ -106,7 +106,7 @@ such as finding all office locations in a specified region or finding all rooms in a building that fall within range of a wi-fi router. See the -[Geospatial]({{< relref "/develop/ai/search-and-query/advanced-concepts/geo" >}}) +[Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geospatial" >}}) reference page for an introduction to the format and usage of both the geoshape and geo schema types. @@ -197,7 +197,7 @@ You can search for documents with specific tags using the `@:{} FT.SEARCH idx "@tags:{blue}" ``` -For more information about tag fields, see [Tag Fields]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}). +For more information about tag fields, see [Tag Fields]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}). 
## Text fields diff --git a/content/develop/ai/search-and-query/indexing/geoindex.md b/content/develop/ai/search-and-query/indexing/geoindex.md deleted file mode 100644 index 3b24434467..0000000000 --- a/content/develop/ai/search-and-query/indexing/geoindex.md +++ /dev/null @@ -1,112 +0,0 @@ ---- -aliases: -- /develop/interact/search-and-query/indexing/geoindex -categories: -- docs -- develop -- stack -- oss -- rs -- rc -- oss -- kubernetes -- clients -description: Options for indexing geospatial data -linkTitle: Geospatial -title: Geospatial indexing -weight: 3 ---- - -Redis supports two different -[schema types]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) -for geospatial data: - -- [`GEO`](#geo): This uses a simple format where individual geospatial - points are specified as numeric longitude-latitude pairs. -- [`GEOSHAPE`](#geoshape): This uses a subset of the - [Well-Known Text (WKT)](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) - format to specify both points and polygons using either geographical - coordinates or Cartesian coordinates. - -The sections below explain how to index these schema types. See the -[Geospatial]({{< relref "/develop/ai/search-and-query/advanced-concepts/geo" >}}) -reference page for a full description of both types. 
- -## `GEO` - -The following command creates a `GEO` index for JSON objects that contain -the geospatial data in a field called `location`: - -{{< clients-example geoindex create_geo_idx >}} -> FT.CREATE productidx ON JSON PREFIX 1 product: SCHEMA $.location AS location GEO -OK -{{< /clients-example >}} - -If you now add JSON objects with the `product:` prefix and a `location` field, -they will be added to the index automatically: - -{{< clients-example geoindex add_geo_json >}} -> JSON.SET product:46885 $ '{"description": "Navy Blue Slippers","price": 45.99,"city": "Denver","location": "-104.991531, 39.742043"}' -OK -> JSON.SET product:46886 $ '{"description": "Bright Green Socks","price": 25.50,"city": "Fort Collins","location": "-105.0618814,40.5150098"}' -OK -{{< /clients-example >}} - -The query below finds products within a 100 mile radius of Colorado Springs -(Longitude=-104.800644, Latitude=38.846127). This returns only the location in -Denver, but a radius of 200 miles would also include the location in Fort Collins: - -{{< clients-example geoindex geo_query >}} -> FT.SEARCH productidx '@location:[-104.800644 38.846127 100 mi]' -1) "1" -2) "product:46885" -3) 1) "$" - 2) "{\"description\":\"Navy Blue Slippers\",\"price\":45.99,\"city\":\"Denver\",\"location\":\"-104.991531, 39.742043\"}" -{{< /clients-example >}} - -See [Geospatial queries]({{< relref "/develop/ai/search-and-query/query/geo-spatial" >}}) -for more information about the available options. - -## `GEOSHAPE` - -The following command creates an index for JSON objects that include -geospatial data in a field called `geom`. The `FLAT` option at the end -of the field definition specifies Cartesian coordinates instead of -the default spherical geographical coordinates. Use `SPHERICAL` in -place of `FLAT` to choose the coordinate space explicitly. 
- -{{< clients-example geoindex create_gshape_idx >}} -> FT.CREATE geomidx ON JSON PREFIX 1 shape: SCHEMA $.name AS name TEXT $.geom AS geom GEOSHAPE FLAT -OK -{{< /clients-example >}} - -Use the `shape:` prefix for the JSON objects to add them to the index: - -{{< clients-example geoindex add_gshape_json >}} -> JSON.SET shape:1 $ '{"name": "Green Square", "geom": "POLYGON ((1 1, 1 3, 3 3, 3 1, 1 1))"}' -OK -> JSON.SET shape:2 $ '{"name": "Red Rectangle", "geom": "POLYGON ((2 2.5, 2 3.5, 3.5 3.5, 3.5 2.5, 2 2.5))"}' -OK -> JSON.SET shape:3 $ '{"name": "Blue Triangle", "geom": "POLYGON ((3.5 1, 3.75 2, 4 1, 3.5 1))"}' -OK -> JSON.SET shape:4 $ '{"name": "Purple Point", "geom": "POINT (2 2)"}' -OK -{{< /clients-example >}} - -You can now run various geospatial queries against the index. For -example, the query below returns any shapes within the boundary -of the green square but omits the green square itself: - -{{< clients-example geoindex gshape_query >}} -> FT.SEARCH geomidx "(-@name:(Green Square) @geom:[WITHIN $qshape])" PARAMS 2 qshape "POLYGON ((1 1, 1 3, 3 3, 3 1, 1 1))" RETURN 1 name DIALECT 2 - -1) (integer) 1 -2) "shape:4" -3) 1) "name" - 2) "[\"Purple Point\"]" -{{< /clients-example >}} - -You can also run queries to find whether shapes in the index completely contain -or overlap each other. See -[Geospatial queries]({{< relref "/develop/ai/search-and-query/query/geo-spatial" >}}) -for more information. 
diff --git a/content/develop/ai/search-and-query/indexing/geospatial.md b/content/develop/ai/search-and-query/indexing/geospatial.md new file mode 100644 index 0000000000..4f3e1c77fc --- /dev/null +++ b/content/develop/ai/search-and-query/indexing/geospatial.md @@ -0,0 +1,298 @@ +--- +aliases: +- /develop/interact/search-and-query/indexing/geoindex +- /develop/interact/search-and-query/advanced-concepts/geo +- /develop/ai/search-and-query/indexing/geo +- /develop/ai/search-and-query/indexing/geospatial +categories: +- docs +- develop +- stack +- oss +- rs +- rc +- oss +- kubernetes +- clients +description: How to index and query geospatial data including points and shapes +linkTitle: Geospatial indexing +title: Geospatial indexing +weight: 35 +math: true +--- + +You can store and query geographical locations and geometric shapes using Redis geospatial indexing. This feature enables location-based searches, proximity queries, and spatial relationship analysis. + +{{< note >}}Don't confuse geospatial indexing in Redis Query Engine with the [Geospatial data type]({{< relref "/develop/data-types/geospatial" >}}) that Redis also supports. 
The data type is intended for simpler use cases, while geospatial indexing provides more advanced format options and query capabilities.{{< /note >}} + +## Use cases + +Geospatial indexing enables powerful location-based applications: + +- **Store locator**: Find shops within 5km of a user's position +- **Delivery zones**: Determine if an address falls within delivery boundaries +- **Real estate**: Search properties within specific neighborhoods or districts +- **Gaming**: Track player positions and detect collisions in virtual worlds +- **IoT tracking**: Monitor device locations and geofenced areas + +## Coordinate systems + +Redis supports two coordinate systems: + +- **Geographical coordinates**: Longitude and latitude for real-world locations (towns, countries) +- **Cartesian coordinates**: X,Y coordinates on a flat plane for smaller areas (building floors, game maps) + +## Field types + +Redis provides two geospatial field types with different capabilities: + +| Feature | GEO | GEOSHAPE | +|---------|-----|----------| +| **Format** | Longitude-latitude pairs | Well-Known Text (WKT) | +| **Shapes** | Points only | Points and polygons | +| **Queries** | Radius searches | Spatial relationships | +| **Complexity** | Simple | Advanced | +| **Use case** | Basic location queries | Complex spatial analysis | + +## GEO fields + +GEO fields store simple point locations using longitude-latitude pairs. 
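Because a GEO value is a plain string with longitude first, swapped coordinates are easy to miss. The following is a minimal sketch of a guard against that mistake. The helper name `geo_value` is hypothetical (it is not part of any Redis client), and the bounds are the general geographic ranges, which is an assumption rather than a documented Redis limit.

```python
def geo_value(lon: float, lat: float) -> str:
    """Format a longitude/latitude pair as a "lon, lat" string for a GEO field.

    Hypothetical helper: the range checks use generic geographic bounds
    (an assumption), not limits taken from the Redis documentation.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Longitude comes first, then latitude.
    return f"{lon}, {lat}"


if __name__ == "__main__":
    print(geo_value(-104.991531, 39.742043))
```

A swapped pair such as `geo_value(39.742043, -104.991531)` fails the latitude check here, but swapped values that both fall within range still pass, so the check is only a partial safeguard.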
+ +### Format + +Store coordinates as strings with longitude first: + +```json +{ + "name": "Coffee Shop", + "location": "-104.991531, 39.742043" +} +``` + +Or as JSON arrays for multiple locations: + +```json +{ + "name": "Chain Store", + "locations": [ + "-104.991531, 39.742043", + "-105.0618814, 40.5150098" + ] +} +``` + +### Create GEO index + +```sql +FT.CREATE stores ON JSON PREFIX 1 store: SCHEMA + $.name AS name TEXT + $.location AS location GEO +``` + +### Add GEO data + +```sql +JSON.SET store:1 $ '{ + "name": "Downtown Coffee", + "location": "-104.991531, 39.742043", + "city": "Denver" +}' + +JSON.SET store:2 $ '{ + "name": "Mountain View Cafe", + "location": "-105.0618814, 40.5150098", + "city": "Boulder" +}' +``` + +### Query GEO fields + +Find locations within a radius: + +```sql +# Find stores within 50 miles of coordinates +FT.SEARCH stores '@location:[-104.800644 38.846127 50 mi]' + +# Find stores within 10 kilometers +FT.SEARCH stores '@location:[-104.800644 38.846127 10 km]' +``` + +Supported distance units: +- `m` - meters +- `km` - kilometers +- `mi` - miles +- `ft` - feet + +## GEOSHAPE fields + +GEOSHAPE fields support both points and polygons using Well-Known Text (WKT) format. 
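As a rough illustration, the WKT strings stored in a GEOSHAPE field can be assembled from coordinate pairs. This is a minimal sketch: `wkt_point` and `wkt_polygon` are hypothetical helper names, not functions from a Redis client. It assumes only the standard WKT layout, that is, space-separated X Y pairs, a polygon ring wrapped in double parentheses, and a ring whose first and last coordinates are identical.

```python
from typing import Iterable, Tuple

Coord = Tuple[float, float]


def wkt_point(x: float, y: float) -> str:
    """Return a WKT POINT string, for example "POINT (2 2)"."""
    return f"POINT ({x} {y})"


def wkt_polygon(ring: Iterable[Coord]) -> str:
    """Return a WKT POLYGON string for a single outer ring.

    The ring is closed automatically if the caller did not repeat
    the first coordinate at the end.
    """
    coords = list(ring)
    if len(set(coords)) < 3:
        raise ValueError("a polygon needs at least three distinct vertices")
    if coords[0] != coords[-1]:
        coords.append(coords[0])  # close the ring
    body = ", ".join(f"{x} {y}" for x, y in coords)
    return f"POLYGON (({body}))"


if __name__ == "__main__":
    print(wkt_point(2, 2))
    print(wkt_polygon([(1, 1), (1, 3), (3, 3), (3, 1)]))
```

Closing the ring inside the helper is a design choice for convenience. A stricter version could reject unclosed input instead.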
+ +### Supported shapes + +**POINT**: Single coordinate location +``` +POINT (2 2) +POINT (-104.991531 39.742043) +``` + +**POLYGON**: Closed shape defined by coordinate sequence +``` +POLYGON ((1 1, 1 3, 3 3, 3 1, 1 1)) +``` + +**Important**: +- Polygons require double parentheses +- First and last coordinates must be identical (closed shape) +- Use clockwise or counter-clockwise winding order consistently + +### Create GEOSHAPE index + +```sql +FT.CREATE zones ON JSON PREFIX 1 zone: SCHEMA + $.name AS name TEXT + $.boundary AS boundary GEOSHAPE +``` + +### Add GEOSHAPE data + +```sql +# Point location +JSON.SET zone:1 $ '{ + "name": "City Center", + "boundary": "POINT (-104.991531 39.742043)" +}' + +# Polygon area +JSON.SET zone:2 $ '{ + "name": "Downtown District", + "boundary": "POLYGON ((-105.01 39.74, -105.01 39.76, -104.99 39.76, -104.99 39.74, -105.01 39.74))" +}' +``` + +### Query GEOSHAPE fields + +GEOSHAPE supports four spatial relationship queries: + +**WITHIN**: Find shapes completely inside another shape +```sql +FT.SEARCH zones '@boundary:[WITHIN $area]' + PARAMS 2 area "POLYGON ((-105.02 39.73, -105.02 39.77, -104.98 39.77, -104.98 39.73, -105.02 39.73))" + DIALECT 2 +``` + +**CONTAINS**: Find shapes that completely contain another shape +```sql +FT.SEARCH zones '@boundary:[CONTAINS $point]' + PARAMS 2 point "POINT (-105.00 39.75)" + DIALECT 2 +``` + +**INTERSECTS**: Find shapes that overlap another shape +```sql +FT.SEARCH zones '@boundary:[INTERSECTS $area]' + PARAMS 2 area "POLYGON ((-105.00 39.745, -105.00 39.755, -104.995 39.755, -104.995 39.745, -105.00 39.745))" + DIALECT 2 +``` + +**DISJOINT**: Find shapes that don't overlap another shape +```sql +FT.SEARCH zones '@boundary:[DISJOINT $area]' + PARAMS 2 area "POLYGON ((-105.00 39.745, -105.00 39.755, -104.995 39.755, -104.995 39.745, -105.00 39.745))" + DIALECT 2 +``` + +## Practical examples + +### Store locator with delivery zones + +```sql +# Create index for stores with delivery areas 
+FT.CREATE delivery ON JSON PREFIX 1 business: SCHEMA + $.name AS name TEXT + $.location AS location GEO + $.delivery_zone AS delivery_zone GEOSHAPE + +# Add store with circular delivery area (approximated as polygon) +JSON.SET business:1 $ '{ + "name": "Pizza Palace", + "location": "-104.991531, 39.742043", + "delivery_zone": "POLYGON ((-105.01 39.72, -105.01 39.76, -104.97 39.76, -104.97 39.72, -105.01 39.72))" +}' + +# Find stores that deliver to a specific address +FT.SEARCH delivery '@delivery_zone:[CONTAINS $address]' + PARAMS 2 address "POINT (-105.00 39.75)" + DIALECT 2 +``` + +### Gaming world with regions + +```sql +# Create index for game regions +FT.CREATE gameworld ON JSON PREFIX 1 region: SCHEMA + $.name AS name TEXT + $.area AS area GEOSHAPE + $.type AS type TAG + +# Add different region types +JSON.SET region:1 $ '{ + "name": "Safe Zone", + "type": "safe", + "area": "POLYGON ((10 10, 10 20, 20 20, 20 10, 10 10))" +}' + +JSON.SET region:2 $ '{ + "name": "Battle Arena", + "type": "pvp", + "area": "POLYGON ((25 25, 25 35, 35 35, 35 25, 25 25))" +}' + +# Find what region a player is in +FT.SEARCH gameworld '@area:[CONTAINS $player]' + PARAMS 2 player "POINT (15 15)" + DIALECT 2 +``` + +## Limitations and considerations + +### Geographical coordinate limitations + +Planet Earth is shaped more like an [ellipsoid](https://en.wikipedia.org/wiki/Earth_ellipsoid) than a perfect sphere. Redis uses a spherical coordinate system that closely approximates Earth's shape but isn't exact. 
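To make the spherical approximation concrete, here is a minimal sketch of the haversine great-circle distance on a sphere. It assumes a mean Earth radius of 6371.0088 km. It illustrates the kind of spherical model described above and is not claimed to be the formula Redis uses internally.

```python
import math

# Mean Earth radius in kilometers (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in km between two points on a perfect sphere.

    Arguments are in degrees, longitude first, matching the GEO field order.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Distance between the two example store coordinates used on this page.
    print(haversine_km(-104.991531, 39.742043, -105.0618814, 40.5150098))
```

An ellipsoidal model would give a slightly different result for the same pair of points, which is the source of the inexactness noted above.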
+ +**For most applications**: The approximation works very well +**For high precision needs**: Don't rely on geospatial indexing for critical applications requiring exact GPS positioning (emergency response, surveying) + +### Performance considerations + +- **GEO queries**: Very fast for radius searches +- **GEOSHAPE queries**: More complex, especially for large polygons +- **Index size**: Geospatial indexes are generally compact +- **Query complexity**: INTERSECTS and CONTAINS are more expensive than WITHIN + +### Best practices + +1. **Choose the right field type**: + - Use GEO for simple point-radius queries + - Use GEOSHAPE for complex spatial relationships + +2. **Optimize polygon complexity**: + - Keep polygons simple when possible + - Avoid highly detailed boundaries for better performance + +3. **Coordinate order**: + - GEO: Always longitude first, then latitude + - GEOSHAPE: Follow WKT standard (X Y or longitude latitude) + +4. **Test with real data**: + - Verify coordinate systems match your application needs + - Test query performance with realistic data volumes + +5. 
**Use appropriate units**: + - Choose distance units that match your application scale + - Consider user expectations (miles vs kilometers) + +## Next steps + +- Learn about [field and type options]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) for other field types +- Explore [geospatial queries]({{< relref "/develop/ai/search-and-query/query/geo-spatial" >}}) for advanced search patterns +- See [query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax" >}}) for combining geospatial with other queries diff --git a/content/develop/ai/search-and-query/indexing/schema-definition.md b/content/develop/ai/search-and-query/indexing/hash-indexing.md similarity index 66% rename from content/develop/ai/search-and-query/indexing/schema-definition.md rename to content/develop/ai/search-and-query/indexing/hash-indexing.md index c2b3baf2a0..5898badb1b 100644 --- a/content/develop/ai/search-and-query/indexing/schema-definition.md +++ b/content/develop/ai/search-and-query/indexing/hash-indexing.md @@ -2,6 +2,7 @@ aliases: - /develop/interact/search-and-query/indexing/schema-definition - /develop/interact/search-and-query/basic-constructs/schema-definition +- /develop/interact/search-and-query/indexing/hash-indexing categories: - docs - develop @@ -12,15 +13,15 @@ categories: - oss - kubernetes - clients -description: 'How to define the schema of an index. - - ' -linkTitle: Schema definition -title: Schema definition -weight: 1 +description: 'How to create and configure indexes for Redis Hash documents.' +linkTitle: Hash indexing +title: Hash indexing +weight: 20 --- -An index structure is defined by a schema. The schema specifies the fields, their types, whether they should be indexed or stored, and other additional configuration options. By properly configuring the schema, you can optimize search performance and control the storage requirements of your index. 
+You can create search indexes for Redis Hash documents to enable fast, flexible queries across your data. Hash indexing provides a straightforward approach where field names in your schema map directly to hash field names, making it ideal for structured data with consistent field patterns. + +An index structure is defined by a schema that specifies the fields, their types, whether they should be indexed or stored, and other configuration options. By properly configuring your schema, you can optimize search performance and control the storage requirements of your index. ``` FT.CREATE idx @@ -41,7 +42,7 @@ You can learn more about the available field types and options on the [`FT.CREAT ## More schema definition examples -##### Index tags with a separator +### Index tags with a separator Index books that have a `categories` attribute, where each category is separated by a `;` character. @@ -54,7 +55,7 @@ SCHEMA categories TAG SEPARATOR ";" ``` -##### Index a single field in multiple ways +### Index a single field in multiple ways Index the `sku` attribute from a hash as both a `TAG` and as `TEXT`: @@ -67,7 +68,7 @@ SCHEMA sku AS sku_tag TAG SORTABLE ``` -##### Index documents with multiple prefixes +### Index documents with multiple prefixes Index two different hashes, one containing author data and one containing book data: ``` @@ -82,7 +83,7 @@ SCHEMA In this example, keys for author data use the key pattern `author:details:`, while keys for book data use the pattern `book:details:`. -##### Only index documents if a field specifies a certain value using `FILTER` +### Only index documents if a field specifies a certain value using `FILTER` Index authors whose names start with G: @@ -106,17 +107,8 @@ SCHEMA title TEXT ``` -##### Index a JSON document using a JSONPath expression - -Index a JSON document that has `title` and `categories` fields. The `title` field is indexed as `TEXT` and the `categories` field is indexed as `TAG`. 
- -``` -FT.CREATE idx - ON JSON -SCHEMA - $.title AS title TEXT - $.categories AS categories TAG -``` +## Next steps +You can learn more about the available field types and options on the [Field and type options]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) page. -You can learn more about the available field types and options on the [`FT.CREATE`]({{< relref "commands/ft.create/" >}}) page. \ No newline at end of file +For JSON document indexing, see [JSON indexing]({{< relref "/develop/ai/search-and-query/indexing/json-indexing" >}}). \ No newline at end of file diff --git a/content/develop/ai/search-and-query/indexing/json-arrays.md b/content/develop/ai/search-and-query/indexing/json-arrays.md new file mode 100644 index 0000000000..6d4c2dc41e --- /dev/null +++ b/content/develop/ai/search-and-query/indexing/json-arrays.md @@ -0,0 +1,255 @@ +--- +aliases: +- /develop/interact/search-and-query/indexing/json-arrays +- /develop/ai/search-and-query/indexing/json-arrays +categories: +- docs +- develop +- stack +- oss +- rs +- rc +- oss +- kubernetes +- clients +description: How to index and search JSON arrays with different field types +linkTitle: JSON array indexing +title: JSON array indexing +weight: 30 +--- + +Redis supports indexing JSON arrays with various field types, allowing you to search across multiple values within a single document field. This page covers the different approaches and considerations for indexing JSON arrays effectively. + +## Index JSON arrays as TAG + +The preferred method for indexing a JSON field with multivalued terms is using JSON arrays. Each value of the array is indexed, and those values must be scalars. If you want to index string or boolean values as TAGs within a JSON array, use the [JSONPath]({{< relref "/develop/data-types/json/path" >}}) wildcard operator. 
+ +To index an item's list of available colors, specify the JSONPath `$.colors.*` in the `SCHEMA` definition during index creation: + +```sql +127.0.0.1:6379> FT.CREATE itemIdx2 ON JSON PREFIX 1 item: SCHEMA $.colors.* AS colors TAG $.name AS name TEXT $.description as description TEXT +``` + +Now you can search for silver headphones: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx2 "@colors:{silver} (@name:(headphones)|@description:(headphones))" +1) "1" +2) "item:1" +3) 1) "$" + 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"]}" +``` + +## Index JSON arrays as TEXT + +Starting with RediSearch v2.6.0, full text search can be done on an array of strings or on a JSONPath leading to multiple strings. + +If you want to index multiple string values as TEXT, use either a JSONPath leading to a single array of strings, or a JSONPath leading to multiple string values, using JSONPath operators such as wildcard, filter, union, array slice, and/or recursive descent. 
+ +To index an item's list of available colors, specify the JSONPath `$.colors` in the `SCHEMA` definition during index creation: + +```sql +127.0.0.1:6379> FT.CREATE itemIdx3 ON JSON PREFIX 1 item: SCHEMA $.colors AS colors TEXT $.name AS name TEXT $.description as description TEXT +``` + +```sql +127.0.0.1:6379> JSON.SET item:3 $ '{"name":"True Wireless earbuds","description":"True Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":74.99,"stock":20,"colors":["red","light blue"]}' +"OK" +``` + +Now you can do full text search for light colored headphones: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx3 '@colors:(white|light) (@name|description:(headphones))' RETURN 1 $.colors +1) (integer) 2 +2) "item:2" +3) 1) "$.colors" + 2) "[\"black\",\"white\"]" +4) "item:3" +5) 1) "$.colors" + 2) "[\"red\",\"light blue\"]" +``` + +### Limitations + +- When a JSONPath may lead to multiple values and not only to a single array, e.g., when a JSONPath contains wildcards, etc., specifying `SLOP` or `INORDER` in [`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) will return an error, since the order of the values matching the JSONPath is not well defined, leading to potentially inconsistent results. + + For example, using a JSONPath such as `$..b[*]` on a JSON value such as + ```json + { + "a": [ + {"b": ["first first", "first second"]}, + {"c": + {"b": ["second first", "second second"]}}, + {"b": ["third first", "third second"]} + ] + } + ``` + may match values in various orderings, depending on the specific implementation of the JSONPath library being used. + + Since `SLOP` and `INORDER` consider relative ordering among the indexed values, and results may change in future releases, an error will be returned. 
+ +- When JSONPath leads to multiple values: + - String values are indexed + - `null` values are skipped + - Any other value type will cause an indexing failure + +- `SORTBY` only sorts by the first value +- No `HIGHLIGHT` and `SUMMARIZE` support +- `RETURN` of a Schema attribute, whose JSONPath leads to multiple values, returns only the first value (as a JSON String) +- If a JSONPath is specified by the `RETURN`, instead of a Schema attribute, all values are returned (as a JSON String) + +### Handling phrases in different array slots + +When indexing, a predefined delta is used to increase positional offsets between array slots for multiple text values. This delta controls the level of separation between phrases in different array slots (related to the `SLOP` parameter of [`FT.SEARCH`]({{< relref "commands/ft.search/" >}})). +This predefined value is set by the configuration parameter `MULTI_TEXT_SLOP` (at module load-time). The default value is 100. + +## Index JSON arrays as NUMERIC + +Starting with RediSearch v2.6.1, search can be done on an array of numerical values or on a JSONPath leading to multiple numerical values. + +If you want to index multiple numerical values as NUMERIC, use either a JSONPath leading to a single array of numbers, or a JSONPath leading to multiple numbers, using JSONPath operators such as wildcard, filter, union, array slice, and/or recursive descent. 
+ +For example, add to the item's list the available `max_level` of volume (in decibels): + +```sql +127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","connection":{"wireless":true,"type":"Bluetooth"},"price":99.98,"stock":25,"colors":["black","silver"], "max_level":[60, 70, 80, 90, 100]}' +OK + +127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":64.99,"stock":17,"colors":["black","white"], "max_level":[80, 100, 120]}' +OK + +127.0.0.1:6379> JSON.SET item:3 $ '{"name":"True Wireless earbuds","description":"True Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":74.99,"stock":20,"colors":["red","light blue"], "max_level":[90, 100, 110, 120]}' +OK +``` + +To index the `max_level` array, specify the JSONPath `$.max_level` in the `SCHEMA` definition during index creation: + +```sql +127.0.0.1:6379> FT.CREATE itemIdx4 ON JSON PREFIX 1 item: SCHEMA $.max_level AS dB NUMERIC +OK +``` + +You can now search for headphones with specific max volume levels, for example, between 70 and 80 (inclusive), returning items with at least one value in their `max_level` array, which is in the requested range: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx4 '@dB:[70 80]' +1) (integer) 2 +2) "item:1" +3) 1) "$" + 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"max_level\":[60,70,80,90,100]}" +4) "item:2" +5) 1) "$" + 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear 
headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"max_level\":[80,100,120]}" +``` + +You can also search for items with all values in a specific range. For example, all values are in the range [90, 120] (inclusive): + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx4 '-@dB:[-inf (90] -@dB:[(120 +inf]' +1) (integer) 1 +2) "item:3" +3) 1) "$" + 2) "{\"name\":\"True Wireless earbuds\",\"description\":\"True Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":74.99,\"stock\":20,\"colors\":[\"red\",\"light blue\"],\"max_level\":[90,100,110,120]}" +``` + +### Limitations + +When JSONPath leads to multiple numerical values: + - Numerical values are indexed + - `null` values are skipped + - Any other value type will cause an indexing failure + +## Index JSON arrays as VECTOR + +Starting with RediSearch 2.6.0, you can index a JSONPath leading to an array of numeric values as a VECTOR type in the index schema. + +For example, assume that your JSON items include an array of vector embeddings, where each vector represents an image of a product. 
To index these vectors, specify the JSONPath `$.embedding` in the schema definition during index creation: + +```sql +127.0.0.1:6379> FT.CREATE itemIdx5 ON JSON PREFIX 1 item: SCHEMA $.embedding AS embedding VECTOR FLAT 6 DIM 4 DISTANCE_METRIC L2 TYPE FLOAT32 +OK +127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","price":99.98,"stock":25,"colors":["black","silver"],"embedding":[0.87,-0.15,0.55,0.03]}' +OK +127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","price":64.99,"stock":17,"colors":["black","white"],"embedding":[-0.7,-0.51,0.88,0.14]}' +OK +``` + +Now you can search for the two headphones that are most similar to the image embedding by using vector search KNN query. (Note that the vector queries are supported as of dialect 2.) For example: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx5 '*=>[KNN 2 @embedding $blob AS dist]' SORTBY dist PARAMS 2 blob \x01\x01\x01\x01 DIALECT 2 +1) (integer) 2 +2) "item:1" +3) 1) "dist" + 2) "1.08280003071" + 3) "$" + 4) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"embedding\":[0.87,-0.15,0.55,0.03]}" +4) "item:2" +5) 1) "dist" + 2) "1.54409992695" + 3) "$" + 4) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" +``` + +If you want to index multiple numeric arrays as VECTOR, use a [JSONPath]({{< relref "/develop/data-types/json/path" >}}) leading to multiple numeric arrays using JSONPath operators such as wildcard, filter, union, array slice, and/or recursive descent. 
+ +For example, assume that your JSON items include an array of vector embeddings, where each vector represents a different image of the same product. To index these vectors, specify the JSONPath `$.embeddings[*]` in the schema definition during index creation: + +```sql +127.0.0.1:6379> FT.CREATE itemIdx5 ON JSON PREFIX 1 item: SCHEMA $.embeddings[*] AS embeddings VECTOR FLAT 6 DIM 4 DISTANCE_METRIC L2 TYPE FLOAT32 +OK +127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","price":99.98,"stock":25,"colors":["black","silver"],"embeddings":[[0.87,-0.15,0.55,0.03]]}' +OK +127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","price":64.99,"stock":17,"colors":["black","white"],"embeddings":[[-0.7,-0.51,0.88,0.14],[-0.8,-0.15,0.33,-0.01]]}' +OK +``` + +{{% alert title="Important note" color="info" %}} +Unlike the NUMERIC type, the VECTOR type does not let you index multiple vectors stored under a field when the schema uses a static path such as `$.embedding`. If you set `$.embedding` as the path in the index schema, an array of vectors in the `embedding` field of your JSON causes an indexing failure. +{{% /alert %}} + +Now you can search for the two headphones that are most similar to an image embedding by using a vector search KNN query. (Note that vector queries are supported as of dialect 2.) The distance between a document and the query vector is defined as the minimum distance between the query vector and any vector that matches the JSONPath specified in the schema. 
For example: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx5 '*=>[KNN 2 @embeddings $blob AS dist]' SORTBY dist PARAMS 2 blob \x01\x01\x01\x01 DIALECT 2 +1) (integer) 2 +2) "item:2" +3) 1) "dist" + 2) "0.771500051022" + 3) "$" + 4) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embeddings\":[[-0.7,-0.51,0.88,0.14],[-0.8,-0.15,0.33,-0.01]]}" +4) "item:1" +5) 1) "dist" + 2) "1.08280003071" + 3) "$" + 4) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"embeddings\":[[0.87,-0.15,0.55,0.03]]}" +``` +Note that `0.771500051022` is the L2 distance between the query vector and `[-0.8,-0.15,0.33,-0.01]`, which is the second element in the embedding array, and it is lower than the L2 distance between the query vector and `[-0.7,-0.51,0.88,0.14]`, which is the first element in the embedding array. + +For more information on vector similarity syntax, see [Vector fields]({{< relref "/develop/ai/search-and-query/vectors" >}}). + +## Index JSON arrays as GEO and GEOSHAPE + +You can use `GEO` and `GEOSHAPE` fields to store geospatial data, such as geographical locations and geometric shapes. See [Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geospatial" >}}) to learn how to use these schema types and their format and usage. + +## Index limitations + +### Schema mapping + +During index creation, you need to map the JSON elements to `SCHEMA` fields as follows: + +- Strings as `TEXT`, `TAG`, or `GEO`. +- Numbers as `NUMERIC`. +- Booleans as `TAG`. +- JSON array + - Array of strings as `TAG` or `TEXT`. + - Array of numbers as `NUMERIC` or `VECTOR`. + - Array of geo coordinates as `GEO`. + - `null` values in such arrays are ignored. +- You cannot index JSON objects. 
Index the individual elements as separate attributes instead. +- `null` values are ignored. + +### Sortable tags + +If you create an index for JSON documents with a JSONPath leading to an array or to multiple values, only the first value is considered by the sort. diff --git a/content/develop/ai/search-and-query/indexing/json-indexing.md b/content/develop/ai/search-and-query/indexing/json-indexing.md new file mode 100644 index 0000000000..49473001ce --- /dev/null +++ b/content/develop/ai/search-and-query/indexing/json-indexing.md @@ -0,0 +1,218 @@ +--- +aliases: +- /develop/interact/search-and-query/indexing/json-indexing +- /develop/ai/search-and-query/indexing/json-indexing +categories: +- docs +- develop +- stack +- oss +- rs +- rc +- oss +- kubernetes +- clients +description: How to create and configure indexes for Redis JSON documents +linkTitle: JSON indexing +title: JSON indexing +weight: 25 +--- + +You can create search indexes for Redis JSON documents to enable fast, flexible queries across your structured data. JSON indexing uses JSONPath expressions to specify which parts of your documents to index, making it ideal for complex, nested data structures. + +## Create an index with JSON schema + +When you create an index with the [`FT.CREATE`]({{< relref "commands/ft.create/" >}}) command, include the `ON JSON` keyword to index any existing and future JSON documents stored in the database. + +To define the `SCHEMA`, you provide [JSONPath]({{< relref "/develop/data-types/json/path" >}}) expressions. The result of each JSONPath expression is indexed and associated with a logical name called an `attribute` (previously known as a `field`). You can use these attributes in queries. + +{{% alert title="Note" color="info" %}} +Note: `attribute` is optional for [`FT.CREATE`]({{< relref "commands/ft.create/" >}}). 
+{{% /alert %}} + +Use the following syntax to create a JSON index: + +```sql +FT.CREATE {index_name} ON JSON SCHEMA {json_path} AS {attribute} {type} +``` + +For example, this command creates an index that indexes the name, description, price, and image vector embedding of each JSON document that represents an inventory item: + +```sql +127.0.0.1:6379> FT.CREATE itemIdx ON JSON PREFIX 1 item: SCHEMA $.name AS name TEXT $.description as description TEXT $.price AS price NUMERIC $.embedding AS embedding VECTOR FLAT 6 DIM 4 DISTANCE_METRIC L2 TYPE FLOAT32 +``` + +## Add JSON documents + +After you create an index, Redis automatically indexes any existing, modified, or newly created JSON documents stored in the database. For existing documents, indexing runs asynchronously in the background, so it can take some time before the document is available. Modified and newly created documents are indexed synchronously, so the document will be available by the time the add or modify command finishes. + +You can use any JSON write command, such as [`JSON.SET`]({{< relref "commands/json.set/" >}}) and [`JSON.ARRAPPEND`]({{< relref "commands/json.arrappend/" >}}), to create or modify JSON documents. + +The following examples use these JSON documents to represent individual inventory items. 
+ +Item 1 JSON document: + +```json +{ + "name": "Noise-cancelling Bluetooth headphones", + "description": "Wireless Bluetooth headphones with noise-cancelling technology", + "connection": { + "wireless": true, + "type": "Bluetooth" + }, + "price": 99.98, + "stock": 25, + "colors": [ + "black", + "silver" + ], + "embedding": [0.87, -0.15, 0.55, 0.03] +} +``` + +Item 2 JSON document: + +```json +{ + "name": "Wireless earbuds", + "description": "Wireless Bluetooth in-ear headphones", + "connection": { + "wireless": true, + "type": "Bluetooth" + }, + "price": 64.99, + "stock": 17, + "colors": [ + "black", + "white" + ], + "embedding": [-0.7, -0.51, 0.88, 0.14] +} +``` + +Use [`JSON.SET`]({{< relref "commands/json.set/" >}}) to store these documents in the database: + +```sql +127.0.0.1:6379> JSON.SET item:1 $ '{"name":"Noise-cancelling Bluetooth headphones","description":"Wireless Bluetooth headphones with noise-cancelling technology","connection":{"wireless":true,"type":"Bluetooth"},"price":99.98,"stock":25,"colors":["black","silver"],"embedding":[0.87,-0.15,0.55,0.03]}' +"OK" +127.0.0.1:6379> JSON.SET item:2 $ '{"name":"Wireless earbuds","description":"Wireless Bluetooth in-ear headphones","connection":{"wireless":true,"type":"Bluetooth"},"price":64.99,"stock":17,"colors":["black","white"],"embedding":[-0.7,-0.51,0.88,0.14]}' +"OK" +``` + +Because indexing is synchronous in this case, the documents will be available on the index as soon as the [`JSON.SET`]({{< relref "commands/json.set/" >}}) command returns. Any subsequent queries that match the indexed content will return the document. + +## Search the index + +To search the index for JSON documents, use the [`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) command. You can search any attribute defined in the `SCHEMA`. 
+ +For example, use this query to search for items with the word "earbuds" in the name: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx '@name:(earbuds)' +1) "1" +2) "item:2" +3) 1) "$" + 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" +``` + +This query searches for all items that include "bluetooth" and "headphones" in the description: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx '@description:(bluetooth headphones)' +1) "2" +2) "item:1" +3) 1) "$" + 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"],\"embedding\":[0.87,-0.15,0.55,0.03]}" +4) "item:2" +5) 1) "$" + 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" +``` + +Now search for Bluetooth headphones with a price of 70 or less: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx '@description:(bluetooth headphones) @price:[0 70]' +1) "1" +2) "item:2" +3) 1) "$" + 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"],\"embedding\":[-0.7,-0.51,0.88,0.14]}" +``` + +For more information about search queries, see [Search query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax" >}}). + +{{% alert title="Note" color="info" %}} +[`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) queries require `attribute` modifiers. 
Don't use JSONPath expressions in queries because the query parser doesn't fully support them. +{{% /alert %}} + +## Index JSON objects + +You cannot index JSON objects directly. `FT.CREATE` returns an error if the JSONPath expression returns an object. + +To index the contents of a JSON object, you need to index the individual elements within the object as separate attributes. + +For example, to index the `connection` JSON object, define the `$.connection.wireless` and `$.connection.type` fields as separate attributes when you create the index: + +```sql +127.0.0.1:6379> FT.CREATE itemIdx3 ON JSON SCHEMA $.connection.wireless AS wireless TAG $.connection.type AS connectionType TEXT +"OK" +``` + +After you create the new index, you can search for items with the wireless TAG set to `true`: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx3 '@wireless:{true}' +1) "2" +2) "item:2" +3) 1) "$" + 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"]}" +4) "item:1" +5) 1) "$" + 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"]}" +``` + +You can also search for items with a Bluetooth connection type: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx3 '@connectionType:(bluetooth)' +1) "2" +2) "item:1" +3) 1) "$" + 2) "{\"name\":\"Noise-cancelling Bluetooth headphones\",\"description\":\"Wireless Bluetooth headphones with noise-cancelling technology\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":99.98,\"stock\":25,\"colors\":[\"black\",\"silver\"]}" +4) "item:2" +5) 1) "$" + 2) "{\"name\":\"Wireless earbuds\",\"description\":\"Wireless Bluetooth in-ear
headphones\",\"connection\":{\"wireless\":true,\"type\":\"Bluetooth\"},\"price\":64.99,\"stock\":17,\"colors\":[\"black\",\"white\"]}" +``` + +## Index missing or empty values + +As of v2.10, you can search for missing properties, that is, properties that do not exist in a given document, using the `INDEXMISSING` option to `FT.CREATE` in conjunction with the `ismissing` query function with `FT.SEARCH`. You can also search for existing properties with no value (i.e., empty) using the `INDEXEMPTY` option with `FT.CREATE`. Both query types require DIALECT 2. Examples below: + +``` +JSON.SET key:1 $ '{"propA": "foo"}' +JSON.SET key:2 $ '{"propA": "bar", "propB":"abc"}' +FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.propA AS propA TAG $.propB AS propB TAG INDEXMISSING + +> FT.SEARCH idx 'ismissing(@propB)' DIALECT 2 +1) "1" +2) "key:1" +3) 1) "$" + 2) "{\"propA\":\"foo\"}" +``` + +``` +JSON.SET key:1 $ '{"propA": "foo", "propB":""}' +JSON.SET key:2 $ '{"propA": "bar", "propB":"abc"}' +FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.propA AS propA TAG $.propB AS propB TAG INDEXEMPTY + +> FT.SEARCH idx '@propB:{""}' DIALECT 2 +1) "1" +2) "key:1" +3) 1) "$" + 2) "{\"propA\":\"foo\",\"propB\":\"\"}" +``` + +## Next steps + +- For advanced JSON array indexing techniques, see [JSON arrays]({{< relref "/develop/ai/search-and-query/indexing/json-arrays" >}}) +- For field projection and search techniques, see [Search techniques]({{< relref "/develop/ai/search-and-query/indexing/search-techniques" >}}) +- Learn about available field types on the [Field and type options]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) page diff --git a/content/develop/ai/search-and-query/indexing/search-techniques.md b/content/develop/ai/search-and-query/indexing/search-techniques.md new file mode 100644 index 0000000000..fec6f2a271 --- /dev/null +++ b/content/develop/ai/search-and-query/indexing/search-techniques.md @@ -0,0 +1,160 @@ +--- +aliases: +- 
/develop/interact/search-and-query/indexing/search-techniques +- /develop/ai/search-and-query/indexing/search-techniques +categories: +- docs +- develop +- stack +- oss +- rs +- rc +- oss +- kubernetes +- clients +description: Advanced search techniques including field projection, highlighting, and aggregation +linkTitle: Search techniques +title: Search techniques +weight: 45 +--- + +This page covers advanced search techniques you can use with indexed JSON documents, including field projection, highlighting search terms, and aggregation queries. + +## Field projection + +[`FT.SEARCH`]({{< relref "commands/ft.search/" >}}) returns the entire JSON document by default. If you want to limit the returned search results to specific attributes, you can use field projection. + +### Return specific attributes + +When you run a search query, you can use the `RETURN` keyword to specify which attributes you want to include in the search results. You also need to specify the number of fields to return. + +For example, this query only returns the `name` and `price` of each set of headphones: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx '@description:(headphones)' RETURN 2 name price +1) "2" +2) "item:1" +3) 1) "name" + 2) "Noise-cancelling Bluetooth headphones" + 3) "price" + 4) "99.98" +4) "item:2" +5) 1) "name" + 2) "Wireless earbuds" + 3) "price" + 4) "64.99" +``` + +### Project with JSONPath + +You can use [JSONPath]({{< relref "/develop/data-types/json/path" >}}) expressions in a `RETURN` statement to extract any part of the JSON document, even fields that were not defined in the index `SCHEMA`. + +For example, the following query uses the JSONPath expression `$.stock` to return each item's stock in addition to the name and price attributes. 
+ +```sql +127.0.0.1:6379> FT.SEARCH itemIdx '@description:(headphones)' RETURN 3 name price $.stock +1) "2" +2) "item:1" +3) 1) "name" + 2) "Noise-cancelling Bluetooth headphones" + 3) "price" + 4) "99.98" + 5) "$.stock" + 6) "25" +4) "item:2" +5) 1) "name" + 2) "Wireless earbuds" + 3) "price" + 4) "64.99" + 5) "$.stock" + 6) "17" +``` + +Note that the returned property name is the JSONPath expression itself: `"$.stock"`. + +You can use the `AS` option to specify an alias for the returned property: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx '@description:(headphones)' RETURN 5 name price $.stock AS stock +1) "2" +2) "item:1" +3) 1) "name" + 2) "Noise-cancelling Bluetooth headphones" + 3) "price" + 4) "99.98" + 5) "stock" + 6) "25" +4) "item:2" +5) 1) "name" + 2) "Wireless earbuds" + 3) "price" + 4) "64.99" + 5) "stock" + 6) "17" +``` + +This query returns the field as the alias `"stock"` instead of the JSONPath expression `"$.stock"`. + +## Highlight search terms + +You can [highlight]({{< relref "/develop/ai/search-and-query/advanced-concepts/highlight" >}}) relevant search terms in any indexed `TEXT` attribute. + +For [`FT.SEARCH`]({{< relref "commands/ft.search/" >}}), you have to explicitly set which attributes you want highlighted after the `RETURN` and `HIGHLIGHT` parameters. + +Use the optional `TAGS` keyword to specify the strings that will surround (or highlight) the matching search terms. 
+ +For example, highlight the word "bluetooth" with bold HTML tags in item names and descriptions: + +```sql +127.0.0.1:6379> FT.SEARCH itemIdx '(@name:(bluetooth))|(@description:(bluetooth))' RETURN 3 name description price HIGHLIGHT FIELDS 2 name description TAGS '<b>' '</b>' +1) "2" +2) "item:1" +3) 1) "name" + 2) "Noise-cancelling <b>Bluetooth</b> headphones" + 3) "description" + 4) "Wireless <b>Bluetooth</b> headphones with noise-cancelling technology" + 5) "price" + 6) "99.98" +4) "item:2" +5) 1) "name" + 2) "Wireless earbuds" + 3) "description" + 4) "Wireless <b>Bluetooth</b> in-ear headphones" + 5) "price" + 6) "64.99" +``` + +## Aggregate with JSONPath + +You can use [aggregation]({{< relref "/develop/ai/search-and-query/advanced-concepts/aggregations" >}}) to generate statistics or build facet queries. + +The `LOAD` option accepts [JSONPath]({{< relref "/develop/data-types/json/path" >}}) expressions. You can use any value in the pipeline, even if the value is not indexed. + +This example uses aggregation to calculate a 10% price discount for each item and sorts the items from least expensive to most expensive: + +```sql +127.0.0.1:6379> FT.AGGREGATE itemIdx '*' LOAD 4 name $.price AS originalPrice APPLY '@originalPrice - (@originalPrice * 0.10)' AS salePrice SORTBY 2 @salePrice ASC +1) "2" +2) 1) "name" + 2) "Wireless earbuds" + 3) "originalPrice" + 4) "64.99" + 5) "salePrice" + 6) "58.491" +3) 1) "name" + 2) "Noise-cancelling Bluetooth headphones" + 3) "originalPrice" + 4) "99.98" + 5) "salePrice" + 6) "89.982" +``` + +{{% alert title="Note" color="info" %}} +[`FT.AGGREGATE`]({{< relref "commands/ft.aggregate/" >}}) queries require `attribute` modifiers. Don't use JSONPath expressions in queries, except with the `LOAD` option, because the query parser doesn't fully support them.
+{{% /alert %}} + +## Next steps + +- Learn more about [aggregation syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/aggregations" >}}) +- Explore [highlighting options]({{< relref "/develop/ai/search-and-query/advanced-concepts/highlight" >}}) +- See [query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax" >}}) for advanced search patterns diff --git a/content/develop/ai/search-and-query/indexing/tags.md b/content/develop/ai/search-and-query/indexing/tags.md new file mode 100644 index 0000000000..d5bb39f13f --- /dev/null +++ b/content/develop/ai/search-and-query/indexing/tags.md @@ -0,0 +1,334 @@ +--- +aliases: +- /develop/interact/search-and-query/advanced-concepts/tags +- /develop/ai/search-and-query/advanced-concepts/tags +- /develop/interact/search-and-query/indexing/tags +categories: +- docs +- develop +- stack +- oss +- rs +- rc +- oss +- kubernetes +- clients +description: How to use tag fields for exact match searches and high-performance filtering +linkTitle: Tag fields +title: Tag fields +weight: 15 +--- + +Tag fields provide exact match search capabilities with high performance and memory efficiency. Use tag fields when you need to filter documents by specific values without the complexity of full-text search tokenization. + +## When to use tag fields + +Tag fields excel in scenarios requiring exact matching: + +- **Product categories**: Electronics, Clothing, Books +- **User roles**: Admin, Editor, Viewer +- **Status values**: Active, Pending, Completed +- **Geographic regions**: US, EU, APAC +- **Content types**: Video, Image, Document + +## Key advantages + +Tag fields offer several benefits over TEXT fields: + +1. **Exact match semantics** - Find documents with precise values +2. **High performance** - Compressed indexes with minimal memory usage +3. **Simple tokenization** - No stemming or complex text processing +4. **Multiple values** - Support comma-separated lists in a single field +5. 
**Case control** - Optional case-sensitive matching + +## Tag fields vs TEXT fields + +| Feature | Tag Fields | TEXT Fields | +|---------|------------|-------------| +| **Search type** | Exact match | Full-text search | +| **Tokenization** | Simple delimiter splitting | Complex word tokenization | +| **Stemming** | None | Language-specific stemming | +| **Memory usage** | Very low (1-2 bytes per entry) | Higher (frequencies, positions) | +| **Performance** | Fastest | Slower for exact matches | +| **Use case** | Categories, filters, IDs | Content search, descriptions | + +Tag fields interpret text as a simple list of *tags* delimited by a [separator](#creating-a-tag-field) character (comma "," by default). This approach enables simpler [tokenization]({{< relref "/develop/ai/search-and-query/indexing/tokenization" >}}) and encoding, making tag indexes much more efficient than full-text indexes. + +**Important**: You can only access tag field values using special tag query syntax - they don't appear in general field-less searches. + +## Technical details + +### Index structure +- **Compressed storage**: Only document IDs encoded as deltas (1-2 bytes per entry) +- **No frequencies**: Unlike TEXT fields, tag indexes don't store term frequencies +- **No positions**: No offset vectors or field flags stored +- **Limit**: You can create up to 1024 tag fields per index + +### Tokenization differences +- **Simple splitting**: Text is split only at separator characters +- **No stemming**: Words are indexed exactly as written +- **Case handling**: Optional case-sensitive or case-insensitive matching +- **No stop words**: All tag values are indexed regardless of content + +## Create a tag field + +Add tag fields to your schema using this syntax: + +``` +FT.CREATE ... SCHEMA ... {field_name} TAG [SEPARATOR {sep}] [CASESENSITIVE] +``` + +### Separator options + +- **Hash documents**: Default separator is comma (`,`). 
You can use any printable ASCII character +- **JSON documents**: No default separator - you must specify one explicitly if needed +- **Custom separators**: Use semicolon (`;`), pipe (`|`), or other characters as needed + +### Case sensitivity + +- **Default**: Case-insensitive matching (`red` matches `Red`, `RED`) +- **CASESENSITIVE**: Preserves original case for exact matching + +### Examples + +**Basic tag field with JSON:** +```sql +JSON.SET key:1 $ '{"colors": "red, orange, yellow"}' +FT.CREATE idx ON JSON PREFIX 1 key: SCHEMA $.colors AS colors TAG SEPARATOR "," + +> FT.SEARCH idx '@colors:{orange}' +1) "1" +2) "key:1" +3) 1) "$" + 2) "{\"colors\":\"red, orange, yellow\"}" +``` + +**Case-sensitive tags with Hash:** +```sql +HSET product:1 categories "Electronics,Gaming,PC" +FT.CREATE products ON HASH PREFIX 1 product: SCHEMA categories TAG CASESENSITIVE + +> FT.SEARCH products '@categories:{PC}' +1) "1" +2) "product:1" +``` + +**Custom separator:** +```sql +HSET book:1 genres "Fiction;Mystery;Thriller" +FT.CREATE books ON HASH PREFIX 1 book: SCHEMA genres TAG SEPARATOR ";" +``` + +## Query tag fields + +**Important**: Tag fields require special query syntax - you cannot find tag values with general field-less searches. 
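
As a rough illustration of that syntax, the following Python sketch assembles a tag filter string from a field name and one or more tag values. The helper names `escape_tag` and `tag_query` are hypothetical and not part of any Redis client library. The sketch escapes ASCII punctuation (except the underscore) and spaces with a backslash, on the assumption that escaping more than strictly necessary is harmless to the query parser. Verify the result against your own Redis version and dialect.

```python
import string

# Characters to prefix with a backslash: ASCII punctuation (except the
# underscore) plus the space. Assumption: over-escaping is harmless.
_ESCAPE = set(string.punctuation.replace("_", "")) | {" "}


def escape_tag(value: str) -> str:
    """Backslash-escape punctuation and spaces in a single tag value."""
    return "".join("\\" + ch if ch in _ESCAPE else ch for ch in value)


def tag_query(field: str, *values: str) -> str:
    """Build '@field:{a | b | ...}', which matches ANY of the given tags."""
    if not values:
        raise ValueError("at least one tag value is required")
    return "@" + field + ":{" + " | ".join(escape_tag(v) for v in values) + "}"


print(tag_query("category", "Electronics", "Gaming"))
# prints: @category:{Electronics | Gaming}
```

The string that `tag_query` returns is meant to be passed as the query argument to `FT.SEARCH`. Joining several values inside one pair of braces gives OR semantics. To require several tags at once, concatenate separate clauses, for example `tag_query("cities", "New York") + " " + tag_query("cities", "Barcelona")`.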
+ +### Basic tag query syntax + +Use curly braces to specify tag values (the braces are part of the syntax): + +``` +@<field_name>:{ <tag> | <tag> | ...} +``` + +### Single tag match + +Find documents with a specific tag: + +```sql +FT.SEARCH idx "@category:{Electronics}" +FT.SEARCH idx "@status:{Active}" +``` + +### Multiple tag match (OR) + +Find documents with any of the specified tags: + +```sql +FT.SEARCH idx "@tags:{ hello world | foo bar }" +FT.SEARCH idx "@category:{ Electronics | Gaming | Software }" +``` + +### Combining with other queries + +Tag queries work seamlessly with other field types: + +```sql +FT.CREATE idx ON HASH PREFIX 1 product: SCHEMA + title TEXT + price NUMERIC + category TAG SEPARATOR ";" + +# Combine text search, numeric range, and tag filter +FT.SEARCH idx "@title:laptop @price:[500 1500] @category:{ Electronics | Gaming }" +``` + +### Prefix matching + +Use the `*` wildcard for prefix matching: + +```sql +FT.SEARCH idx "@tags:{ tech* }" # Matches: technology, technical, tech +FT.SEARCH idx "@tags:{ hello\\ w* }" # Matches: "hello world", "hello web" +``` + +### Negative matching + +Exclude documents with specific tags: + +```sql +FT.SEARCH idx "-@category:{Discontinued}" +FT.SEARCH idx "@title:phone -@category:{Refurbished}" +``` + +## Advanced tag queries + +### OR vs AND logic + +**Single clause (OR logic)**: Find documents with ANY of the specified tags +```sql +@cities:{ New York | Los Angeles | Barcelona } +# Returns: Documents with New York OR Los Angeles OR Barcelona +``` + +**Multiple clauses (AND logic)**: Find documents with ALL of the specified tags +```sql +@cities:{ New York } @cities:{ Los Angeles } @cities:{ Barcelona } +# Returns: Documents with New York AND Los Angeles AND Barcelona +``` + +### Practical example + +Consider a travel database: + +```sql +FT.CREATE travelers ON HASH PREFIX 1 traveler: SCHEMA + name TEXT + cities TAG + +HSET traveler:1 name "John Doe" cities "New York, Barcelona, San Francisco" +HSET traveler:2 name "Jane
Smith" cities "New York, Los Angeles, Tokyo" +``` + +**Find travelers who visited any of these cities:** +```sql +FT.SEARCH travelers "@cities:{ New York | Los Angeles | Barcelona }" +# Returns: Both John and Jane +``` + +**Find travelers who visited all of these cities:** +```sql +FT.SEARCH travelers "@cities:{ New York } @cities:{ Barcelona }" +# Returns: Only John (has both New York and Barcelona) +``` + +## Handle special characters + +Tag fields can contain any punctuation except the field separator, but you need to escape certain characters in queries. + +### Defining tags with special characters + +You can store tags with punctuation without escaping: + +```sql +FT.CREATE products ON HASH PREFIX 1 test: SCHEMA tags TAG + +HSET test:1 tags "Andrew's Top 5,Justin's Top 5,5-Star Rating" +HSET test:2 tags "Best Buy,Top-Rated,Editor's Choice" +``` + +### Querying tags with special characters + +**Escape punctuation in queries** using backslash (`\`): + +```sql +# Query for "Andrew's Top 5" +FT.SEARCH products "@tags:{ Andrew\\'s Top 5 }" + +# Query for "5-Star Rating" +FT.SEARCH products "@tags:{ 5\\-Star Rating }" + +# Query for "Editor's Choice" +FT.SEARCH products "@tags:{ Editor\\'s Choice }" +``` + +### Characters that need escaping + +In tag queries, escape these characters: +- Single quotes: `'` → `\\'` +- Hyphens: `-` → `\\-` +- Parentheses: `()` → `\\(\\)` +- Brackets: `[]{}` → `\\[\\]\\{\\}` +- Pipes: `|` → `\\|` + +### Spaces in tags + +**Modern Redis** (v2.4+): Spaces don't need escaping in tag queries +```sql +FT.SEARCH products "@tags:{ Top Rated Product }" +``` + +**Older versions** or **dialect 1**: Escape spaces +```sql +FT.SEARCH products "@tags:{ Top\\ Rated\\ Product }" +``` + +### Best practices + +1. **Use simple separators**: Stick to comma (`,`) or semicolon (`;`) +2. **Avoid complex punctuation**: Keep tag values simple when possible +3. **Test your queries**: Verify escaping works with your specific characters +4. 
**Use consistent casing**: Decide on case sensitivity early in your design + +See [Query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax#tag-filters" >}}) for complete escaping rules. + +## Common use cases + +### E-commerce filtering +```sql +# Product categories and attributes +FT.CREATE products ON HASH PREFIX 1 product: SCHEMA + name TEXT + category TAG + brand TAG + features TAG SEPARATOR ";" + +HSET product:1 name "Gaming Laptop" category "Electronics" brand "ASUS" features "RGB;16GB RAM;SSD" + +# Find gaming products with specific features +FT.SEARCH products "@category:{Electronics} @features:{RGB} @features:{SSD}" +``` + +### User management +```sql +# User roles and permissions +FT.CREATE users ON HASH PREFIX 1 user: SCHEMA + name TEXT + roles TAG SEPARATOR "," + departments TAG SEPARATOR "," + +HSET user:1 name "John Admin" roles "admin,editor" departments "IT,Security" + +# Find users with admin access in IT +FT.SEARCH users "@roles:{admin} @departments:{IT}" +``` + +### Content classification +```sql +# Document tagging system +FT.CREATE docs ON JSON PREFIX 1 doc: SCHEMA + $.title AS title TEXT + $.tags AS tags TAG SEPARATOR "," + $.status AS status TAG + +JSON.SET doc:1 $ '{"title":"API Guide","tags":"technical,guide,api","status":"published"}' + +# Find published technical documents +FT.SEARCH docs "@status:{published} @tags:{technical}" +``` + +## Next steps + +- Learn about [tokenization rules]({{< relref "/develop/ai/search-and-query/indexing/tokenization" >}}) for tag fields +- Explore [field and type options]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) for other field types +- See [query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax" >}}) for advanced query patterns diff --git a/content/develop/ai/search-and-query/indexing/tokenization.md b/content/develop/ai/search-and-query/indexing/tokenization.md new file mode 100644 index 
0000000000..478497ddd6 --- /dev/null +++ b/content/develop/ai/search-and-query/indexing/tokenization.md @@ -0,0 +1,238 @@ +--- +aliases: +- /develop/interact/search-and-query/advanced-concepts/escaping +- /develop/ai/search-and-query/advanced-concepts/escaping +- /develop/interact/search-and-query/indexing/tokenization +categories: +- docs +- develop +- stack +- oss +- rs +- rc +- oss +- kubernetes +- clients +description: How Redis processes and tokenizes text for indexing and search +linkTitle: Tokenization +title: Tokenization +weight: 10 +--- + +Tokenization is the process of breaking text into smaller, searchable units called *tokens*. Understanding how Redis tokenizes your data helps you create more effective indexes and write better search queries. + +## How tokenization works + +When you index documents, Redis splits text into tokens and stores them efficiently. During searches, Redis tokenizes your query and matches tokens rather than comparing entire text strings. + +**Benefits of tokenization**: +- **Performance**: Token matching is much faster than full-text comparison +- **Flexibility**: Enables features like stemming, stop words, and fuzzy matching +- **Efficiency**: Reduces storage requirements and memory usage + +## Field type differences + +Redis uses different tokenization approaches based on field type: + +| Field Type | Tokenization | Use Case | +|------------|--------------|----------| +| **TEXT** | Complex word-based splitting | Full-text search, content analysis | +| **TAG** | Simple delimiter splitting | Exact matching, categories, filters | + +## TEXT field tokenization + +TEXT fields use sophisticated tokenization for full-text search capabilities. + +### Basic rules + +1. **Punctuation separates tokens**: Most punctuation marks and whitespace create token boundaries +2. **Underscores preserved**: Underscores (`_`) are NOT treated as separators +3. **Case normalization**: Latin characters converted to lowercase +4. 
**Whitespace handling**: Multiple spaces or punctuation marks are stripped +5. **Escaping supported**: Use backslash (`\`) to escape separator characters + +### Separator characters + +These characters split text into separate tokens: +``` +, . < > { } [ ] " ' : ; ! @ # $ % ^ & * ( ) - + = ~ ` | \ / ? +``` + +**Example**: +``` +"hello-world.test" → ["hello", "world", "test"] +"user@example.com" → ["user", "example", "com"] +"price:$99.99" → ["price", "99", "99"] +``` + +### Escaping separators + +Use backslash (`\`) to include separator characters in tokens: + +``` +"hello\-world" → ["hello-world"] +"user\@domain" → ["user@domain"] +"file\.txt" → ["file.txt"] +``` + +**Note**: In most programming languages, you need double backslashes: +```python +query = "hello\\-world" # Produces "hello\-world" +``` + +### Underscores preserved + +Underscores remain part of tokens: +``` +"hello_world" → ["hello_world"] +"user_id_123" → ["user_id_123"] +``` + +### Number handling + +Numbers require special attention: + +``` +"-20" → ["-20"] # Negative number +"-\20" → ["NOT", "20"] # Escaped: NOT operator + number +"3.14" → ["3", "14"] # Decimal split by dot +"3\.14" → ["3.14"] # Escaped: preserved as single token +``` + +Redis uses different tokenization approaches for different field types. [Tag fields]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) use simpler tokenization focused on exact matching, while TEXT fields support complex full-text search features. + +## TAG field tokenization + +[Tag fields]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) use much simpler tokenization designed for exact matching. + +### How TAG tokenization works + +1. **Split at separators**: Text is divided only at the specified separator character (default: comma) +2. **Preserve content**: Most punctuation and whitespace within tags is preserved +3. **Trim whitespace**: Leading and trailing spaces are removed from each tag +4. 
**Case handling**: Converts to lowercase unless `CASESENSITIVE` is specified + +### TAG tokenization examples + +**Default comma separator**: +``` +"red, blue, green" → ["red", "blue", "green"] +"Electronics,Gaming,PC" → ["electronics", "gaming", "pc"] +``` + +**Custom separator**: +``` +"Fiction;Mystery;Thriller" → ["fiction", "mystery", "thriller"] +"admin|editor|viewer" → ["admin", "editor", "viewer"] +``` + +**Preserved punctuation**: +``` +"Andrew's Top 5,Best Buy,5-Star" → ["andrew's top 5", "best buy", "5-star"] +``` + +### When to escape in TAG queries + +**During indexing**: No escaping needed when storing tag values +**During querying**: Escape special characters in tag queries + +```sql +# Store tags (no escaping needed) +HSET product:1 tags "Andrew's Top 5,Best-Seller" + +# Query tags (escaping required) +FT.SEARCH products "@tags:{ Andrew\\'s Top 5 }" +FT.SEARCH products "@tags:{ Best\\-Seller }" +``` + +## Practical examples + +### TEXT field examples + +**Product descriptions**: +```sql +# Document content +"High-quality noise-cancelling headphones with Bluetooth 5.0" + +# Tokenized as +["high", "quality", "noise", "cancelling", "headphones", "with", "bluetooth", "5", "0"] + +# Search queries that match +"@description:(noise cancelling)" # Matches: noise AND cancelling +"@description:(bluetooth)" # Matches: bluetooth +"@description:(high quality)" # Matches: high AND quality +``` + +**Email addresses and URLs**: +```sql +# Document content +"Contact: support@example.com or visit https://example.com/help" + +# Tokenized as +["contact", "support", "example", "com", "or", "visit", "https", "example", "com", "help"] + +# To search for complete email, escape the @ symbol +"@content:(support\\@example.com)" +``` + +### TAG field examples + +**Product categories**: +```sql +# Store categories +HSET product:1 categories "Electronics,Audio,Headphones" + +# Create index +FT.CREATE products ON HASH PREFIX 1 product: SCHEMA categories TAG + +# Query exact 
categories +FT.SEARCH products "@categories:{Electronics}" +FT.SEARCH products "@categories:{Audio}" +``` + +**User roles with special characters**: +```sql +# Store roles with punctuation +HSET user:1 roles "Admin,Content-Editor,API-User" + +# Query with escaping +FT.SEARCH users "@roles:{Content\\-Editor}" +FT.SEARCH users "@roles:{API\\-User}" +``` + +## Best practices + +### Choose the right field type + +- **Use TEXT for**: Content search, descriptions, articles, comments +- **Use TAG for**: Categories, status values, user roles, product types + +### Optimize for your queries + +**TEXT fields**: +- Consider how users will search your content +- Plan for partial word matches and stemming +- Test with realistic search terms + +**TAG fields**: +- Keep tag values simple and consistent +- Avoid complex punctuation when possible +- Use meaningful separator characters + +### Handle special characters + +**In TEXT fields**: +- Escape separators when you need them in search terms +- Remember that punctuation splits tokens +- Use underscores for compound identifiers + +**In TAG fields**: +- Store tags without escaping +- Escape punctuation in queries +- Test your tag queries with real data + +## Next steps + +- Learn about [tags]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) for exact matching +- Explore [field and type options]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) for other field types +- See [query syntax]({{< relref "/develop/ai/search-and-query/advanced-concepts/query_syntax" >}}) for advanced search patterns diff --git a/content/develop/ai/search-and-query/indexing/vector-indexing.md b/content/develop/ai/search-and-query/indexing/vector-indexing.md new file mode 100644 index 0000000000..a59add4d4e --- /dev/null +++ b/content/develop/ai/search-and-query/indexing/vector-indexing.md @@ -0,0 +1,50 @@ +--- +aliases: +- /develop/interact/search-and-query/indexing/vector-indexing +- 
/develop/ai/search-and-query/indexing/vector-indexing +categories: +- docs +- develop +- stack +- oss +- rs +- rc +- oss +- kubernetes +- clients +description: How to index and search vector embeddings for similarity search and machine learning applications +linkTitle: Vector indexing +title: Vector indexing +weight: 40 +--- + +You can index vector embeddings in Redis to enable similarity search, recommendation systems, and machine learning applications. Vector indexing allows you to find documents with similar content, perform semantic search, and build AI-powered features. + +## What are vector embeddings? + +Vector embeddings are numerical representations of data (text, images, audio) that capture semantic meaning in high-dimensional space. Similar items have similar vector representations, enabling you to find related content through mathematical distance calculations. + +**Common use cases:** +- **Semantic search**: Find documents with similar meaning, not just matching keywords +- **Recommendation engines**: Suggest products, content, or users based on similarity +- **Image search**: Find visually similar images or objects +- **Anomaly detection**: Identify outliers in data patterns +- **Chatbots and AI**: Enable context-aware responses and retrieval-augmented generation (RAG) + +## Comprehensive documentation + +For detailed information about vector search capabilities, algorithms, parameters, and advanced use cases, see the comprehensive [Vector search documentation]({{< relref "/develop/ai/search-and-query/vectors" >}}). 
+ +The vectors page covers: +- **Detailed algorithm comparisons** and parameter tuning +- **Advanced query techniques** and filtering options +- **Performance optimization** strategies +- **Client library examples** in multiple languages +- **Production deployment** best practices +- **Troubleshooting** and monitoring guidance + +## Next steps + +- Learn about [field and type options]({{< relref "/develop/ai/search-and-query/indexing/field-and-type-options" >}}) for vector configuration +- Explore [JSON indexing]({{< relref "/develop/ai/search-and-query/indexing/json-indexing" >}}) for storing embeddings with metadata +- See [search techniques]({{< relref "/develop/ai/search-and-query/indexing/search-techniques" >}}) for combining vector and traditional search diff --git a/content/develop/clients/dotnet/vecsearch.md b/content/develop/clients/dotnet/vecsearch.md index 55b241c9c7..df106e74c1 100644 --- a/content/develop/clients/dotnet/vecsearch.md +++ b/content/develop/clients/dotnet/vecsearch.md @@ -245,7 +245,7 @@ try { db.FT().DropIndex("vector_idx");} catch {} Next, create the index. The schema in the example below includes three fields: the text content to index, a -[tag]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) +[tag]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) field to represent the "genre" of the text, and the embedding vector generated from the original text content. The `embedding` field specifies [HNSW]({{< relref "/develop/ai/search-and-query/vectors#hnsw-index" >}}) diff --git a/content/develop/clients/go/vecsearch.md b/content/develop/clients/go/vecsearch.md index 3c2b1dfb81..836c401885 100644 --- a/content/develop/clients/go/vecsearch.md +++ b/content/develop/clients/go/vecsearch.md @@ -140,7 +140,7 @@ rdb.FTDropIndexWithArgs(ctx, Next, create the index. 
The schema in the example below specifies hash objects for storage and includes three fields: the text content to index, a -[tag]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) +[tag]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) field to represent the "genre" of the text, and the embedding vector generated from the original text content. The `embedding` field specifies [HNSW]({{< relref "/develop/ai/search-and-query/vectors#hnsw-index" >}}) diff --git a/content/develop/clients/jedis/vecsearch.md b/content/develop/clients/jedis/vecsearch.md index 95f7a46208..a75b1ab275 100644 --- a/content/develop/clients/jedis/vecsearch.md +++ b/content/develop/clients/jedis/vecsearch.md @@ -150,7 +150,7 @@ try {jedis.ftDropIndex("vector_idx");} catch (JedisDataException j){} Next, we create the index. The schema in the example below includes three fields: the text content to index, a -[tag]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) +[tag]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) field to represent the "genre" of the text, and the embedding vector generated from the original text content. 
The `embedding` field specifies [HNSW]({{< relref "/develop/ai/search-and-query/vectors#hnsw-index" >}}) diff --git a/content/develop/clients/nodejs/vecsearch.md b/content/develop/clients/nodejs/vecsearch.md index 6973858573..c1859f1885 100644 --- a/content/develop/clients/nodejs/vecsearch.md +++ b/content/develop/clients/nodejs/vecsearch.md @@ -106,7 +106,7 @@ try { Next, create the index with the following schema: - `content`: Text field for the content to index -- `genre`: [Tag]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) +- `genre`: [Tag]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) field representing the text's genre - `embedding`: [Vector]({{< relref "/develop/ai/search-and-query/vectors" >}}) field with: diff --git a/content/develop/clients/patterns/indexes/index.md b/content/develop/clients/patterns/indexes/index.md index dfab5476b2..c6602e741e 100644 --- a/content/develop/clients/patterns/indexes/index.md +++ b/content/develop/clients/patterns/indexes/index.md @@ -46,7 +46,7 @@ Once hash or JSON keys have been indexed using the [`FT.CREATE`]({{< relref "com For more information on creating hash and JSON indexes, see the following pages. -* [Hash indexes]({{< relref "/develop/ai/search-and-query/indexing/schema-definition" >}}) +* [Hash indexes]({{< relref "/develop/ai/search-and-query/indexing/hash-indexing" >}}) * [JSON indexes]({{< relref "/develop/ai/search-and-query/indexing" >}}) ## Simple numerical indexes with sorted sets @@ -182,7 +182,7 @@ index. When you create a new time series using the [`TS.CREATE`]({{< relref "commands/ts.create" >}}) command, you can associate one or more `LABELS` with it. Each label is a name-value pair, where the both name and value are text. Labels serve as a secondary index that allows you to execute queries on groups of time series keys using various time series commands. 
-See the [time series quickstart guide]({{< relref "/develop/data-types/timeseries/quickstart#labels" >}}) for an example of creating a time series with a label. +See the [time series documentation]({{< relref "/develop/data-types/timeseries" >}}) for examples of creating time series with labels. The [`TS.MGET`]({{< relref "commands/ts.mget" >}}), [`TS.MRANGE`]({{< relref "commands/ts.mrange" >}}), and [`TS.MREVRANGE`]({{< relref "commands/ts.mrevrange" >}}) commands operate on multiple time series based on specified labels or using a label-related filter expression. The [`TS.QUERYINDEX`]({{< relref "commands/ts.queryindex" >}}) command returns all time series keys matching a given label-related filter expression. diff --git a/content/develop/clients/php/vecsearch.md b/content/develop/clients/php/vecsearch.md index 57d027510e..045feb8a09 100644 --- a/content/develop/clients/php/vecsearch.md +++ b/content/develop/clients/php/vecsearch.md @@ -109,7 +109,7 @@ try { Next, create the index. The schema in the example below includes three fields: the text content to index, a -[tag]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) +[tag]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) field to represent the "genre" of the text, and the embedding vector generated from the original text content. The `embedding` field specifies [HNSW]({{< relref "/develop/ai/search-and-query/vectors#hnsw-index" >}}) diff --git a/content/develop/clients/redis-py/vecsearch.md b/content/develop/clients/redis-py/vecsearch.md index 23758077e4..b579b10887 100644 --- a/content/develop/clients/redis-py/vecsearch.md +++ b/content/develop/clients/redis-py/vecsearch.md @@ -101,7 +101,7 @@ except redis.exceptions.ResponseError: Next, create the index. 
The schema in the example below specifies hash objects for storage and includes three fields: the text content to index, a -[tag]({{< relref "/develop/ai/search-and-query/advanced-concepts/tags" >}}) +[tag]({{< relref "/develop/ai/search-and-query/indexing/tags" >}}) field to represent the "genre" of the text, and the embedding vector generated from the original text content. The `embedding` field specifies [HNSW]({{< relref "/develop/ai/search-and-query/vectors#hnsw-index" >}}) diff --git a/content/develop/data-types/geospatial.md b/content/develop/data-types/geospatial.md index e4144075da..5711c47257 100644 --- a/content/develop/data-types/geospatial.md +++ b/content/develop/data-types/geospatial.md @@ -21,7 +21,7 @@ Redis geospatial indexes let you store coordinates and search for them. This data structure is useful for finding nearby points within a given radius or bounding box. {{< note >}}Take care not to confuse the Geospatial data type with the -[Geospatial]({{< relref "/develop/ai/search-and-query/advanced-concepts/geo" >}}) +[Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geospatial" >}}) features in [Redis Query Engine]({{< relref "/develop/ai/search-and-query" >}}). 
Although there are some similarities between these two features, the data type is intended for simpler use cases and doesn't have the range of format options and queries diff --git a/content/develop/whats-new/_index.md b/content/develop/whats-new/_index.md index 84bd2cf07b..e5770e2607 100644 --- a/content/develop/whats-new/_index.md +++ b/content/develop/whats-new/_index.md @@ -71,8 +71,8 @@ weight: 10 - [Index lifecycle]({{< relref "/develop/ai/search-and-query/best-practices/index-mgmt-best-practices.md" >}}) - New/updated topics: - [Autocomplete]({{< relref "/develop/ai/search-and-query/advanced-concepts/autocomplete.md" >}}) - - [Escaping & tokenization]({{< relref "/develop/ai/search-and-query/advanced-concepts/escaping.md" >}}) - - [Geo indexing]({{< relref "/develop/ai/search-and-query/indexing/geoindex.md" >}}) + - [Tokenization]({{< relref "/develop/ai/search-and-query/indexing/tokenization.md" >}}) + - [Geospatial indexing]({{< relref "/develop/ai/search-and-query/indexing/geospatial.md" >}}) - [Sorting, scoring, stemming]({{< relref "/develop/ai/search-and-query/advanced-concepts/sorting.md" >}}) --- diff --git a/content/operate/oss_and_stack/stack-with-enterprise/release-notes/redisearch/redisearch-2.0-release-notes.md b/content/operate/oss_and_stack/stack-with-enterprise/release-notes/redisearch/redisearch-2.0-release-notes.md index d817c35a65..4f13cc68b0 100644 --- a/content/operate/oss_and_stack/stack-with-enterprise/release-notes/redisearch/redisearch-2.0-release-notes.md +++ b/content/operate/oss_and_stack/stack-with-enterprise/release-notes/redisearch/redisearch-2.0-release-notes.md @@ -150,7 +150,7 @@ Details: - #[1696](https://github.com/redisearch/redisearch/issues/1696) The maximum number of results produced by `FT.AGGREGATE` is now configurable: `MAXAGGREGATERESULTS`. 
- #[1708](https://github.com/redisearch/redisearch/issues/1708) [Stemming]({{}}) updated with support of new languages: Basque, Catalan, Greek, Indonesian, Irish, Lithuanian, Nepali. - Minor bugfixes: - - #[1668](https://github.com/redisearch/redisearch/issues/1668) Fixes support of stop words in [tag fields]({{}}). Solves also the following related issues: #[166](https://github.com/redisearch/redisearch/issues/166), #[984](https://github.com/redisearch/redisearch/issues/984), #[1237](https://github.com/redisearch/redisearch/issues/1237), #[1294](https://github.com/redisearch/redisearch/issues/1294). + - #[1668](https://github.com/redisearch/redisearch/issues/1668) Fixes support of stop words in [tag fields]({{}}). Solves also the following related issues: #[166](https://github.com/redisearch/redisearch/issues/166), #[984](https://github.com/redisearch/redisearch/issues/984), #[1237](https://github.com/redisearch/redisearch/issues/1237), #[1294](https://github.com/redisearch/redisearch/issues/1294). - #[1689](https://github.com/redisearch/redisearch/issues/1689) Consistency fix and performance improvement when using `FT.SUGGET` with [RSCoordinator](https://github.com/RediSearch/RSCoordinator). - #[1774](https://github.com/redisearch/redisearch/issues/1774) `MINPREFIX` and `MAXFILTEREXPANSION` configuration options can be changed at runtime. - #[1745](https://github.com/redisearch/redisearch/issues/1745) Enforce 0 value for `REDUCER COUNT`. diff --git a/content/operate/oss_and_stack/stack-with-enterprise/search/scalable-query-best-practices.md b/content/operate/oss_and_stack/stack-with-enterprise/search/scalable-query-best-practices.md index c50aeb9a1e..18b9119549 100644 --- a/content/operate/oss_and_stack/stack-with-enterprise/search/scalable-query-best-practices.md +++ b/content/operate/oss_and_stack/stack-with-enterprise/search/scalable-query-best-practices.md @@ -18,7 +18,7 @@ weight: 25 - [Full-text]({{}}) - - [Tag]({{}}) + - [Tag]({{}}) - [Vector]({{}})