Merged
11 changes: 11 additions & 0 deletions content/develop/interact/search-and-query/best-practices/_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
---
categories:
- docs
- develop
- stack
- oss
description: Redis Query Engine best practices
linkTitle: Best practices
title: Best practices
weight: 8
---
@@ -0,0 +1,150 @@
---
Title: Best practices for Redis Query Engine performance
alwaysopen: false
categories:
- docs
- develop
- stack
- oss
- kubernetes
- clients
linkTitle: RQE performance
weight: 1
---

{{< note >}}
If you're using Redis Software or Redis Cloud, see the [best practices for scalable Redis Query Engine]({{< relref "/operate/oss_and_stack/stack-with-enterprise/search/scalable-query-best-practices" >}}) page.
{{< /note >}}

## Checklist
Below are some basic steps to ensure good performance of the Redis Query Engine (RQE).

* Create a Redis data model with your query patterns in mind.
* Ensure the Redis architecture has been sized for the expected load using the [sizing calculator](https://redis.io/redisearch-sizing-calculator/).
* Provision Redis nodes with sufficient resources (RAM, CPU, network) to support the expected maximum load.
* Review [`FT.INFO`]({{< baseurl >}}/commands/ft.info) and [`FT.PROFILE`]({{< baseurl >}}/commands/ft.profile) outputs for anomalies and/or errors.
* Conduct load testing in a test environment with real-world queries and a load generated by either [memtier_benchmark](https://github.com/redislabs/memtier_benchmark) or a custom load application.

## Indexing considerations

### General
- Favor [`TAG`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#tag-fields" >}}) over [`NUMERIC`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#numeric-fields" >}}) for use cases that only require matching.
- Favor [`TAG`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#tag-fields" >}}) over [`TEXT`]({{< relref "/develop/interact/search-and-query/basic-constructs/field-and-type-options#text-fields" >}}) for use cases that don’t require full-text capabilities (pure match).

### Non-threaded search
- Put only those fields used in your queries in the index.
- Only make fields [`SORTABLE`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting" >}}) if they are used in [`SORTBY`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting#specifying-sortby" >}}) queries.
- Use [`DIALECT 4`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-4" >}}).

### Threaded (query performance factor or QPF) search
- Put both query fields and any projected fields (`RETURN` or `LOAD`) in the index.
- Set all fields to `SORTABLE`.
- Set `TAG` fields to [`UNF`]({{< relref "/develop/interact/search-and-query/advanced-concepts/sorting#normalization-unf-option" >}}).
- Optional: Set `TEXT` fields to `NOSTEM` if the use case will support it.
- Use [`DIALECT 4`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-4" >}}).
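
The threaded-search guidance above can be checked mechanically before an index is created. The following Python sketch is illustrative only: `FieldSpec`, `check_threaded_guidelines`, and `to_schema_args` are hypothetical helpers, not part of any Redis client library. It flags fields that are not `SORTABLE` and `TAG` fields without `UNF`, and flattens a field list into the tokens that follow `SCHEMA` in `FT.CREATE`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical description of one schema field (not a Redis client type)."""
    path: str                      # JSON path, for example "$.id"
    alias: str                     # name used in queries, for example "id"
    type: str                      # "TAG", "TEXT", "NUMERIC", or "GEO"
    options: Tuple[str, ...] = ()  # for example ("SORTABLE", "UNF")


def check_threaded_guidelines(fields: List[FieldSpec]) -> List[str]:
    """Return one warning per rule that a field misses."""
    warnings = []
    for f in fields:
        opts = {o.upper() for o in f.options}
        if "SORTABLE" not in opts:
            warnings.append(f"{f.alias}: not SORTABLE")
        if f.type.upper() == "TAG" and "UNF" not in opts:
            warnings.append(f"{f.alias}: TAG field without UNF")
    return warnings


def to_schema_args(fields: List[FieldSpec]) -> List[str]:
    """Flatten fields into the tokens that follow SCHEMA in FT.CREATE."""
    args: List[str] = []
    for f in fields:
        args += [f.path, "AS", f.alias, f.type.upper(), *f.options]
    return args


# Example field list; two of the three fields miss a guideline on purpose.
fields = [
    FieldSpec("$.id", "id", "TAG", ("SORTABLE", "UNF")),
    FieldSpec("$.ver", "ver", "TAG", ("SORTABLE",)),
    FieldSpec("$.firstName", "name", "TEXT", ("NOSTEM",)),
]
for warning in check_threaded_guidelines(fields):
    print(warning)
```

A check like this could run in a deployment script before `FT.CREATE` is issued, so that schema drift is caught before it affects threaded query performance.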

## Query optimization

- Avoid returning large result sets. Page through results with `LIMIT`, or use a cursor (`WITHCURSOR` with `FT.AGGREGATE`).
- Avoid wildcard searches.
- Avoid projecting all fields (e.g., `LOAD *`). Project only those fields that are part of the index schema.
- If queries are long-running, enable threading (query performance factor) to reduce contention for the main Redis thread.
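
As one way to keep result sets small, the sketch below builds the argument lists for a cursor-based `FT.AGGREGATE` call and the follow-up `FT.CURSOR READ` calls. The helper names are hypothetical, and the code only builds argument lists. Sending them (for example, with redis-py's `execute_command`) is left to the caller and assumes a reachable Redis server with the index already created.

```python
from typing import List, Sequence


def aggregate_with_cursor(index: str, query: str,
                          load_fields: Sequence[str],
                          read_size: int) -> List[str]:
    """Build FT.AGGREGATE arguments that project named fields and open a cursor."""
    if not load_fields or "*" in load_fields:
        raise ValueError("name the fields to project instead of using LOAD *")
    if read_size < 1:
        raise ValueError("read_size must be at least 1")
    return ["FT.AGGREGATE", index, query,
            "LOAD", str(len(load_fields)), *load_fields,
            "WITHCURSOR", "COUNT", str(read_size)]


def cursor_read(index: str, cursor_id: int, read_size: int) -> List[str]:
    """Build FT.CURSOR READ arguments for the next batch of results."""
    return ["FT.CURSOR", "READ", index, str(cursor_id), "COUNT", str(read_size)]


first_call = aggregate_with_cursor(
    "jsonidx:profiles", "@t:[1299 1299]", ["id", "name"], read_size=100)
# 12345 stands in for the cursor ID that the server returns with each batch.
next_call = cursor_read("jsonidx:profiles", 12345, read_size=100)
print(first_call)
print(next_call)
```

The caller would repeat `FT.CURSOR READ` until the server returns a cursor ID of 0, which signals that the result set is exhausted.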

## Validate performance (`FT.PROFILE`)

You can analyze [`FT.PROFILE`]({{< baseurl >}}/commands/ft.profile) output to gain insights about query execution.
The following informational items are available for analysis:

- Total execution time
- Execution time per shard
- Coordination time (for multi-sharded environments)
- Breakdown of the query into fundamental components, such as `UNION` and `INTERSECT`
- Warnings, such as `TIMEOUT`
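
To profile a query, the original `FT.SEARCH` or `FT.AGGREGATE` arguments are placed after the `QUERY` keyword of `FT.PROFILE`. The Python sketch below shows that rewrite. The `profile_args` helper is hypothetical and only builds an argument list; it does not contact a server or parse the profile output.

```python
from typing import List, Sequence


def profile_args(command_args: Sequence[str], limited: bool = False) -> List[str]:
    """Rewrite FT.SEARCH / FT.AGGREGATE arguments as an FT.PROFILE call."""
    name, index, *rest = command_args
    kinds = {"FT.SEARCH": "SEARCH", "FT.AGGREGATE": "AGGREGATE"}
    kind = kinds.get(name.upper())
    if kind is None:
        raise ValueError(f"cannot profile {name}")
    args = ["FT.PROFILE", index, kind]
    if limited:
        # LIMITED asks for a shorter profile without per-iterator child details.
        args.append("LIMITED")
    return args + ["QUERY", *rest]


query = ["FT.AGGREGATE", "jsonidx:profiles", "@t:[1299 1299]", "LIMIT", "0", "10"]
print(profile_args(query))
```

Profiling the exact arguments that the application sends, rather than a hand-typed approximation, keeps the timings representative of production queries.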

## Anti-patterns

When designing and querying indexes in RQE, certain practices can hinder performance, scalability, and maintainability. Below are some common anti-patterns to avoid:

- **Large documents**: storing excessively large documents in Redis makes data retrieval slower and increases memory usage. Break data into smaller, focused records whenever possible.
- **Deeply-nested fields**: retrieving or indexing deeply-nested JSON fields is computationally expensive. Use a flatter schema for better performance.
- **Large result sets**: fetching unnecessarily large result sets puts a strain on memory and network resources. Limit results to only what is needed.
- **Wildcarding**: using wildcard patterns indiscriminately in queries can lead to large and inefficient scans, especially if the index size is significant.
- **Large projections**: including excessive fields in query results increases memory overhead and slows down query execution. Limit projections to essential fields.

The following examples depict an anti-pattern index schema and query, followed by corrected versions designed for scalability with RQE.

### Anti-pattern index schema

The following schema introduces challenges for scalability and performance:

```sh
FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles:
SCHEMA $.tags.* as t NUMERIC SORTABLE
$.firstName as name TEXT
$.location as loc GEO
```

Issues:

- Minimal schema definition: the schema is sparse and lacks fields like `lastName`, `id`, and `ver` that might be frequently queried. This results in additional operations to fetch these fields separately, reducing efficiency.
- Missing `SORTABLE` flag for text fields: sorting on fields that are not marked `SORTABLE` requires extra processing at query time, which is slow.
- Wildcard indexing: `$.tags.*` creates a broad index that can lead to excessive memory usage and reduced query performance.

### Anti-pattern query

The following query is inefficient and not optimized for vertical scaling:

```sh
FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]' LOAD * LIMIT 0 10
```
Issues:

- Wildcard projection (`LOAD *`): retrieving all fields in the result set is inefficient and increases memory usage, especially if the documents are large.
- Unnecessary fields: fields that aren't required for the current operation are still fetched, slowing down execution.
- Lack of advanced query syntax: without specifying a query dialect or leveraging features like tagging, the query may perform unnecessary computations.

### Improved index schema

Here’s an optimized schema that adheres to best practices for vertical scaling:

```sh
FT.CREATE jsonidx:profiles ON JSON PREFIX 1 profiles:
SCHEMA $.tags.* as t NUMERIC SORTABLE
$.firstName as name TEXT NOSTEM SORTABLE
$.lastName as lastname TEXT NOSTEM SORTABLE
$.location as loc GEO SORTABLE
$.id as id TAG SORTABLE UNF
$.ver as ver TAG SORTABLE UNF
```

Improvements:

- `NOSTEM` for text fields: prevents stemming on fields like `firstName` and `lastName` to allow for exact matches (e.g., "Smith" stays "Smith").
- Expanded schema: adds commonly queried fields like `lastName`, `id`, and `ver`, making queries more efficient by reducing the need for post-query data retrieval.
- `TAG` fields: `id` and `ver` are defined as `TAG` fields to support fast filtering with exact matches.
- `SORTABLE` for all relevant fields: ensures that sorting operations are efficient without requiring full-text scanning.

You might be wondering why `$.tags.* as t NUMERIC SORTABLE` is acceptable in the improved schema when it was flagged as an issue in the anti-pattern schema.
The inclusion of `$.tags.*` is acceptable when:

- It has a clear purpose: it is actively used in queries, such as filtering on numeric ranges or matching specific values.
- Other fields in the schema complement it: these fields reduce over-reliance on `$.tags.*` for all query operations, distributing the load more evenly.
- Projections and limits are managed carefully: queries that use `$.tags.*` should avoid loading unnecessary fields or returning excessively large result sets.

### Improved query

The following query is better suited for vertical scaling:

```sh
FT.AGGREGATE jsonidx:profiles '@t:[1299 1299]'
LOAD 6 id t name lastname loc ver
LIMIT 0 10
DIALECT 3
```

Improvements:

- Targeted projection: the `LOAD` clause specifies only essential fields (`id`, `t`, `name`, `lastname`, `loc`, and `ver`), reducing memory and network overhead.
- Limited results: the `LIMIT` clause ensures the query retrieves only the first 10 results, avoiding large result sets.
- [`DIALECT 3`]({{< relref "/develop/interact/search-and-query/advanced-concepts/dialects#dialect-3" >}}): sets the query dialect explicitly instead of relying on the server default, which gives access to newer RQE syntax and features.
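
The `LOAD` count must match the number of field names that follow it, which is easy to get wrong when the projection changes. The minimal Python sketch below derives the count from the field list and refuses `LOAD *`. The `build_aggregate` helper is hypothetical, not a Redis client API.

```python
import shlex
from typing import List, Sequence


def build_aggregate(index: str, query: str, load_fields: Sequence[str],
                    offset: int, count: int, dialect: int) -> List[str]:
    """Build FT.AGGREGATE arguments with an explicit projection and limit."""
    if not load_fields or "*" in load_fields:
        raise ValueError("name the fields to project instead of using LOAD *")
    if offset < 0 or count < 1:
        raise ValueError("offset must be >= 0 and count must be >= 1")
    return ["FT.AGGREGATE", index, query,
            "LOAD", str(len(load_fields)), *load_fields,
            "LIMIT", str(offset), str(count),
            "DIALECT", str(dialect)]


args = build_aggregate(
    "jsonidx:profiles", "@t:[1299 1299]",
    ["id", "t", "name", "lastname", "loc", "ver"],
    offset=0, count=10, dialect=3)
# shlex.join quotes the query string so the line can be pasted into a shell.
print(shlex.join(args))
```

Building the command from a list also avoids quoting problems with query strings that contain spaces or brackets.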