diff --git a/content/develop/ai/index.md b/content/develop/ai/index.md index f251cde016..3f87dcb206 100644 --- a/content/develop/ai/index.md +++ b/content/develop/ai/index.md @@ -14,7 +14,7 @@ Redis stores and indexes vector embeddings that semantically represent unstructu | Vector | RAG | RedisVL | | :-- | :-- | :-- | -| {{AI Redis icon.}}[Redis vector database quick start guide]({{< relref "/develop/get-started/vector-database" >}}) |{{AI Redis icon.}} [Retrieval-Augmented Generation quick start guide]({{< relref "/develop/get-started/rag" >}}) | {{AI Redis icon.}}[Redis vector Python client library documentation]({{< relref "/integrate/redisvl/overview/" >}}) | +| {{AI Redis icon.}}[Redis vector database quick start guide]({{< relref "/develop/get-started/vector-database" >}}) |{{AI Redis icon.}} [Retrieval-Augmented Generation quick start guide]({{< relref "/develop/get-started/rag" >}}) | {{AI Redis icon.}}[Redis vector Python client library documentation]({{< relref "/integrate/redisvl/" >}}) | #### Overview diff --git a/content/integrate/redisvl/_index.md b/content/integrate/redisvl/_index.md index b6c65f9e7e..2e3e1a4ba2 100644 --- a/content/integrate/redisvl/_index.md +++ b/content/integrate/redisvl/_index.md @@ -1,4 +1,20 @@ --- +aliases: +- /integrate/redisvl/api +- /integrate/redisvl/api/cache +- /integrate/redisvl/api/filter +- /integrate/redisvl/api/query +- /integrate/redisvl/api/schema +- /integrate/redisvl/api/searchindex +- /integrate/redisvl/api/vectorizer +- /integrate/redisvl/overview +- /integrate/redisvl/overview/cli +- /integrate/redisvl/user-guide +- /integrate/redisvl/user-guide/get-started +- /integrate/redisvl/user-guide/json-v-hashes +- /integrate/redisvl/user-guide/query-filter +- /integrate/redisvl/user-guide/semantic-caching +- /integrate/redisvl/user-guide/vectorizers categories: - docs - integrate @@ -8,7 +24,7 @@ categories: - rc - oss - clients -description: This is the Redis vector library. 
+description: This is the Redis vector library (RedisVL). group: library hidden: false linkTitle: RedisVL @@ -19,4 +35,12 @@ title: RedisVL type: integration weight: 1 --- +RedisVL is a powerful, dedicated Python client library for Redis that enables seamless integration and management of high-dimensional vector data. +Built to support machine learning and artificial intelligence workflows, RedisVL simplifies the process of storing, searching, and analyzing vector embeddings, which are commonly used for tasks like recommendation systems, semantic search, and anomaly detection. +Key features of RedisVL include: + +- Vector Similarity Search: Efficiently find nearest neighbors in high-dimensional spaces using algorithms like HNSW (Hierarchical Navigable Small World). +- Integration with AI Frameworks: RedisVL works seamlessly with popular frameworks such as TensorFlow, PyTorch, and Hugging Face, making it easy to deploy AI models. +- Scalable and Fast: Leveraging Redis's in-memory architecture, RedisVL provides low-latency access to vector data, even at scale. + +By bridging the gap between data storage and AI model deployment, RedisVL empowers developers to build intelligent, real-time applications with minimal infrastructure complexity.
diff --git a/content/integrate/redisvl/api/_index.md b/content/integrate/redisvl/api/_index.md deleted file mode 100644 index 64f0649887..0000000000 --- a/content/integrate/redisvl/api/_index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -description: The RedisVL API guide -linkTitle: API Guide -title: API Guide -type: integration -weight: 3 ---- diff --git a/content/integrate/redisvl/api/cache.md b/content/integrate/redisvl/api/cache.md deleted file mode 100644 index 12a659572e..0000000000 --- a/content/integrate/redisvl/api/cache.md +++ /dev/null @@ -1,206 +0,0 @@ ---- -description: The LLMCache APIs -linkTitle: LLMCache -title: LLMCache -type: integration -weight: 7 ---- - -## SemanticCache - - - -### *class* SemanticCache(name='llmcache', prefix=None, distance_threshold=0.1, ttl=None, vectorizer=HFTextVectorizer(model='sentence-transformers/all-mpnet-base-v2', dims=768, client=SentenceTransformer( (0): Transformer({'max_seq_length': 384, 'do_lower_case': False}) with Transformer model: MPNetModel (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False}) (2): Normalize() )), redis_client=None, redis_url='redis://localhost:6379', connection_args={}, \*\*kwargs) - -Bases: `BaseLLMCache` - -Semantic Cache for Large Language Models. - -Semantic Cache for Large Language Models. - -* **Parameters:** - * **name** (*str* *,* *optional*) – The name of the semantic cache search index. - Defaults to “llmcache”. - * **prefix** (*Optional* *[**str* *]* *,* *optional*) – The prefix for Redis keys - associated with the semantic cache search index. Defaults to - None, and the index name will be used as the key prefix. - * **distance_threshold** (*float* *,* *optional*) – Semantic threshold for the - cache. Defaults to 0.1. 
- * **ttl** (*Optional* *[**int* *]* *,* *optional*) – The time-to-live for records cached - in Redis. Defaults to None. - * **vectorizer** (*BaseVectorizer* *,* *optional*) – The vectorizer for the cache. - Defaults to HFTextVectorizer. - * **redis_client** (*Redis* *,* *optional*) – A redis client connection instance. - Defaults to None. - * **redis_url** (*str* *,* *optional*) – The redis url. Defaults to - “redis://localhost:6379”. - * **connection_args** (*Dict* *[**str* *,* *Any* *]* *,* *optional*) – The connection arguments - for the redis client. Defaults to None. -* **Raises:** - * **TypeError** – If an invalid vectorizer is provided. - * **TypeError** – If the TTL value is not an int. - * **ValueError** – If the threshold is not between 0 and 1. - * **ValueError** – If the index name is not provided - -### check(prompt=None, vector=None, num_results=1, return_fields=None) - -Checks the semantic cache for results similar to the specified prompt -or vector. - -This method searches the cache using vector similarity with -either a raw text prompt (converted to a vector) or a provided vector as -input. It checks for semantically similar prompts and fetches the cached -LLM responses. - -* **Parameters:** - * **prompt** (*Optional* *[**str* *]* *,* *optional*) – The text prompt to search for in - the cache. - * **vector** (*Optional* *[**List* *[**float* *]* *]* *,* *optional*) – The vector representation - of the prompt to search for in the cache. - * **num_results** (*int* *,* *optional*) – The number of cached results to return. - Defaults to 1. - * **return_fields** (*Optional* *[**List* *[**str* *]* *]* *,* *optional*) – The fields to include - in each returned result. If None, defaults to all available - fields in the cached entry. -* **Returns:** - A list of dicts containing the requested - : return fields for each similar cached response. 
-* **Return type:** - List[Dict[str, Any]] -* **Raises:** - * **ValueError** – If neither a prompt nor a vector is specified. - * **TypeError** – If return_fields is not a list when provided. - -```python -response = cache.check( - prompt="What is the capital city of France?" -) -``` - -### clear() - -Clear the cache of all keys while preserving the index. - -* **Return type:** - None - -### delete() - -Clear the semantic cache of all keys and remove the underlying search -index. - -* **Return type:** - None - -### deserialize(metadata) - -Deserialize the input from a string. - -* **Parameters:** - **metadata** (*str*) – -* **Return type:** - *Dict*[str, *Any*] - -### hash_input(prompt) - -Hashes the input using SHA256. - -* **Parameters:** - **prompt** (*str*) – - -### serialize(metadata) - -Serialize the input into a string. - -* **Parameters:** - **metadata** (*Dict* *[**str* *,* *Any* *]*) – -* **Return type:** - str - -### set_threshold(distance_threshold) - -Sets the semantic distance threshold for the cache. - -* **Parameters:** - **distance_threshold** (*float*) – The semantic distance threshold for - the cache. -* **Raises:** - **ValueError** – If the threshold is not between 0 and 1. -* **Return type:** - None - -### set_ttl(ttl=None) - -Set the default TTL, in seconds, for entries in the cache. - -* **Parameters:** - **ttl** (*Optional* *[**int* *]* *,* *optional*) – The optional time-to-live expiration - for the cache, in seconds. -* **Raises:** - **ValueError** – If the time-to-live value is not an integer. - -### set_vectorizer(vectorizer) - -Sets the vectorizer for the LLM cache. - -Must be a valid subclass of BaseVectorizer and have equivalent -dimensions to the vector field defined in the schema. - -* **Parameters:** - **vectorizer** (*BaseVectorizer*) – The RedisVL vectorizer to use for - vectorizing cache entries. -* **Raises:** - * **TypeError** – If the vectorizer is not a valid type. - * **ValueError** – If the vector dimensions are mismatched. 
-* **Return type:** - None - -### store(prompt, response, vector=None, metadata=None) - -Stores the specified key-value pair in the cache along with metadata. - -* **Parameters:** - * **prompt** (*str*) – The user prompt to cache. - * **response** (*str*) – The LLM response to cache. - * **vector** (*Optional* *[**List* *[**float* *]* *]* *,* *optional*) – The prompt vector to - cache. Defaults to None, and the prompt vector is generated on - demand. - * **metadata** (*Optional* *[**dict* *]* *,* *optional*) – The optional metadata to cache - alongside the prompt and response. Defaults to None. -* **Returns:** - The Redis key for the entries added to the semantic cache. -* **Return type:** - str -* **Raises:** - * **ValueError** – If neither prompt nor vector is specified. - * **TypeError** – If provided metadata is not a dictionary. - -```python -key = cache.store( - prompt="What is the capital city of France?", - response="Paris", - metadata={"city": "Paris", "country": "France"} -) -``` - -### *property* distance_threshold *: float* - -The semantic distance threshold for the cache. - -* **Returns:** - The semantic distance threshold. -* **Return type:** - float - -### *property* index *: [SearchIndex](searchindex.md#redisvl.index.SearchIndex)* - -The underlying SearchIndex for the cache. - -* **Returns:** - The search index. -* **Return type:** - [SearchIndex](searchindex.md#redisvl.index.SearchIndex) - -### *property* ttl *: int | None* - -The default TTL, in seconds, for entries in the cache. 
diff --git a/content/integrate/redisvl/api/filter.md b/content/integrate/redisvl/api/filter.md deleted file mode 100644 index 0620ee4dbf..0000000000 --- a/content/integrate/redisvl/api/filter.md +++ /dev/null @@ -1,352 +0,0 @@ ---- -description: The filter APIs -linkTitle: Filter -title: Filter -type: integration -weight: 5 ---- - - - -## FilterExpression - -### *class* FilterExpression(\_filter=None, operator=None, left=None, right=None) - -A FilterExpression is a logical combination of filters in RedisVL. - -FilterExpressions can be combined using the & and | operators to create -complex expressions that evaluate to the Redis Query language. - -This presents an interface by which users can create complex queries -without having to know the Redis Query language. - -```python -from redisvl.query.filter import Tag, Num - -brand_is_nike = Tag("brand") == "nike" -price_is_under_100 = Num("price") < 100 -f = brand_is_nike & price_is_under_100 - -print(str(f)) - ->>> (@brand:{nike} @price:[-inf (100)]) -``` - -This can be combined with the VectorQuery class to create a query: - -```python -from redisvl.query import VectorQuery - -v = VectorQuery( - vector=[0.1, 0.1, 0.5, ...], - vector_field_name="product_embedding", - return_fields=["product_id", "brand", "price"], - filter_expression=f, -) -``` - -{{< note >}} -Filter expressions are typically not called directly. Instead they are -built by combining filter statements using the & and | operators. -{{< /note >}} - -* **Parameters:** - * **\_filter** (*str* *|* *None*) – - * **operator** (*FilterOperator* *|* *None*) – - * **left** ([*FilterExpression*](#redisvl.query.filter.FilterExpression) *|* *None*) – - * **right** ([*FilterExpression*](#redisvl.query.filter.FilterExpression) *|* *None*) – - -## Tag - -### *class* Tag(field) - -A Tag filter can be applied to Tag fields. - -* **Parameters:** - **field** (*str*) – - -### \_\_eq_\_(other) - -Create a Tag equality filter expression. 
- -* **Parameters:** - **other** (*Union* *[**List* *[**str* *]* *,* *str* *]*) – The tag(s) to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Tag - -f = Tag("brand") == "nike" -``` - -### \_\_ne_\_(other) - -Create a Tag inequality filter expression. - -* **Parameters:** - **other** (*Union* *[**List* *[**str* *]* *,* *str* *]*) – The tag(s) to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Tag -f = Tag("brand") != "nike" -``` - -### \_\_str_\_() - -Return the Redis Query string for the Tag filter - -* **Return type:** - str - -## Text - -### *class* Text(field) - -A Text is a FilterField representing a text field in a Redis index. - -* **Parameters:** - **field** (*str*) – - -### \_\_eq_\_(other) - -Create a Text equality filter expression. These expressions yield -filters that enforce an exact match on the supplied term(s). - -* **Parameters:** - **other** (*str*) – The text value to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Text - -f = Text("job") == "engineer" -``` - -### \_\_mod_\_(other) - -Create a Text “LIKE” filter expression. A flexible expression that -yields filters that can use a variety of additional operators like -wildcards (\*), fuzzy matches (%%), or combinatorics (|) of the supplied -term(s). - -* **Parameters:** - **other** (*str*) – The text value to filter on. 
-* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Text - -f = Text("job") % "engine*" # suffix wild card match -f = Text("job") % "%%engine%%" # fuzzy match w/ Levenshtein Distance -f = Text("job") % "engineer|doctor" # contains either term in field -f = Text("job") % "engineer doctor" # contains both terms in field -``` - -### \_\_ne_\_(other) - -Create a Text inequality filter expression. These expressions yield -negated filters on exact matches on the supplied term(s). Opposite of an -equality filter expression. - -* **Parameters:** - **other** (*str*) – The text value to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Text - -f = Text("job") != "engineer" -``` - -### \_\_str_\_() - -Return the Redis Query string for the Text filter - -* **Return type:** - str - -## Num - -### *class* Num(field) - -A Num is a FilterField representing a numeric field in a Redis index. - -* **Parameters:** - **field** (*str*) – - -### \_\_eq_\_(other) - -Create a Numeric equality filter expression. - -* **Parameters:** - **other** (*int*) – The value to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Num -f = Num("zipcode") == 90210 -``` - -### \_\_ge_\_(other) - -Create a Numeric greater than or equal to filter expression. - -* **Parameters:** - **other** (*int*) – The value to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Num - -f = Num("age") >= 18 -``` - -### \_\_gt_\_(other) - -Create a Numeric greater than filter expression. - -* **Parameters:** - **other** (*int*) – The value to filter on. 
-* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Num - -f = Num("age") > 18 -``` - -### \_\_le_\_(other) - -Create a Numeric less than or equal to filter expression. - -* **Parameters:** - **other** (*int*) – The value to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Num - -f = Num("age") <= 18 -``` - -### \_\_lt_\_(other) - -Create a Numeric less than filter expression. - -* **Parameters:** - **other** (*int*) – The value to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Num - -f = Num("age") < 18 -``` - -### \_\_ne_\_(other) - -Create a Numeric inequality filter expression. - -* **Parameters:** - **other** (*int*) – The value to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Num - -f = Num("zipcode") != 90210 -``` - -### \_\_str_\_() - -Return the Redis Query string for the Numeric filter - -* **Return type:** - str - -## Geo - -### *class* Geo(field) - -A Geo is a FilterField representing a geographic (lat/lon) field in a -Redis index. - -* **Parameters:** - **field** (*str*) – - -### \_\_eq_\_(other) - -Create a geographic filter within a specified GeoRadius. - -* **Parameters:** - **other** ([*GeoRadius*](#redisvl.query.filter.GeoRadius)) – The geographic spec to filter on. -* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Geo, GeoRadius - -f = Geo("location") == GeoRadius(-122.4194, 37.7749, 1, unit="m") -``` - -### \_\_ne_\_(other) - -Create a geographic filter outside of a specified GeoRadius. - -* **Parameters:** - **other** ([*GeoRadius*](#redisvl.query.filter.GeoRadius)) – The geographic spec to filter on. 
-* **Return type:** - [*FilterExpression*](#redisvl.query.filter.FilterExpression) - -```python -from redisvl.query.filter import Geo, GeoRadius - -f = Geo("location") != GeoRadius(-122.4194, 37.7749, 1, unit="m") -``` - -### \_\_str_\_() - -Return the Redis Query string for the Geo filter - -* **Return type:** - str - -## GeoRadius - -### *class* GeoRadius(longitude, latitude, radius=1, unit='km') - -A GeoRadius is a GeoSpec representing a geographic radius. - -Create a GeoRadius specification (GeoSpec) - -* **Parameters:** - * **longitude** (*float*) – The longitude of the center of the radius. - * **latitude** (*float*) – The latitude of the center of the radius. - * **radius** (*int* *,* *optional*) – The radius of the circle. Defaults to 1. - * **unit** (*str* *,* *optional*) – The unit of the radius. Defaults to “km”. -* **Raises:** - **ValueError** – If the unit is not one of “m”, “km”, “mi”, or “ft”. - -### \_\_init_\_(longitude, latitude, radius=1, unit='km') - -Create a GeoRadius specification (GeoSpec) - -* **Parameters:** - * **longitude** (*float*) – The longitude of the center of the radius. - * **latitude** (*float*) – The latitude of the center of the radius. - * **radius** (*int* *,* *optional*) – The radius of the circle. Defaults to 1. - * **unit** (*str* *,* *optional*) – The unit of the radius. Defaults to “km”. -* **Raises:** - **ValueError** – If the unit is not one of “m”, “km”, “mi”, or “ft”. 
diff --git a/content/integrate/redisvl/api/query.md b/content/integrate/redisvl/api/query.md deleted file mode 100644 index 64dfecef72..0000000000 --- a/content/integrate/redisvl/api/query.md +++ /dev/null @@ -1,320 +0,0 @@ ---- -description: The query APIs -linkTitle: Query -title: Query -type: integration -weight: 3 ---- - -## VectorQuery - - - -### *class* VectorQuery(vector, vector_field_name, return_fields=None, filter_expression=None, dtype='float32', num_results=10, return_score=True, dialect=2) - -A query for running a vector search along with an optional filter -expression. - -* **Parameters:** - * **vector** (*List* *[**float* *]*) – The vector to perform the vector search with. - * **vector_field_name** (*str*) – The name of the vector field to search - against in the database. - * **return_fields** (*List* *[**str* *]*) – The declared fields to return with search - results. - * **filter_expression** ([*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression) *,* *optional*) – A filter to apply - along with the vector search. Defaults to None. - * **dtype** (*str* *,* *optional*) – The dtype of the vector. Defaults to - “float32”. - * **num_results** (*int* *,* *optional*) – The top k results to return from the - vector search. Defaults to 10. - * **return_score** (*bool* *,* *optional*) – Whether to return the vector - distance. Defaults to True. - * **dialect** (*int* *,* *optional*) – The RediSearch query dialect. - Defaults to 2. -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -{{< note >}} -Learn more about vector queries in Redis: [{{< relref "/develop/interact/search-and-query/" >}}search/vectors/#knn-search]({{< relref "/develop/interact/search-and-query/" >}}search/vectors/#knn-search) -{{< /note >}} - -### get_filter() - -Get the filter expression for the query. - -* **Returns:** - The filter for the query. 
-* **Return type:** - [FilterExpression](filter.md#redisvl.query.filter.FilterExpression) - -### set_filter(filter_expression=None) - -Set the filter expression for the query. - -* **Parameters:** - **filter_expression** (*Optional* *[*[*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression) *]* *,* *optional*) – The filter - to apply to the query. -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -### set_paging(first, limit) - -Set the paging parameters for the query to limit the number of -results. - -* **Parameters:** - * **first** (*int*) – The zero-indexed offset for which to fetch query results - * **limit** (*int*) – The max number of results to include including the offset -* **Raises:** - **TypeError** – If first or limit are NOT integers. - -### *property* params *: Dict[str, Any]* - -Return the parameters for the query. - -* **Returns:** - The parameters for the query. -* **Return type:** - Dict[str, Any] - -### *property* query *: Query* - -Return a Redis-Py Query object representing the query. - -* **Returns:** - The Redis-Py query object. -* **Return type:** - redis.commands.search.query.Query - -## RangeQuery - -### *class* RangeQuery(vector, vector_field_name, return_fields=None, filter_expression=None, dtype='float32', distance_threshold=0.2, num_results=10, return_score=True, dialect=2) - -A query for running a filtered vector search based on semantic -distance threshold. - -* **Parameters:** - * **vector** (*List* *[**float* *]*) – The vector to perform the range query with. - * **vector_field_name** (*str*) – The name of the vector field to search - against in the database. - * **return_fields** (*List* *[**str* *]*) – The declared fields to return with search - results. - * **filter_expression** ([*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression) *,* *optional*) – A filter to apply - along with the range query. Defaults to None. 
- * **dtype** (*str* *,* *optional*) – The dtype of the vector. Defaults to - “float32”. - * **distance_threshold** (*float* *,* *optional*) – The threshold for vector distance. - A smaller threshold indicates a stricter semantic search. - Defaults to 0.2. - * **num_results** (*int*) – The MAX number of results to return. - Defaults to 10. - * **return_score** (*bool* *,* *optional*) – Whether to return the vector - distance. Defaults to True. - * **dialect** (*int* *,* *optional*) – The RediSearch query dialect. - Defaults to 2. -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -{{< note >}} -Learn more about vector range queries: [{{< relref "/develop/interact/search-and-query/" >}}search/vectors/#range-query]({{< relref "/develop/interact/search-and-query/" >}}search/vectors/#range-query) -{{< /note >}} - -### get_filter() - -Get the filter expression for the query. - -* **Returns:** - The filter for the query. -* **Return type:** - [FilterExpression](filter.md#redisvl.query.filter.FilterExpression) - -### set_distance_threshold(distance_threshold) - -Set the distance threshold for the query. - -* **Parameters:** - **distance_threshold** (*float*) – vector distance - -### set_filter(filter_expression=None) - -Set the filter expression for the query. - -* **Parameters:** - **filter_expression** (*Optional* *[*[*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression) *]* *,* *optional*) – The filter - to apply to the query. -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -### set_paging(first, limit) - -Set the paging parameters for the query to limit the number of -results. - -* **Parameters:** - * **first** (*int*) – The zero-indexed offset for which to fetch query results - * **limit** (*int*) – The max number of results to include including the offset -* **Raises:** - **TypeError** – If first or limit are NOT integers. 
- -### *property* distance_threshold *: float* - -Return the distance threshold for the query. - -* **Returns:** - The distance threshold for the query. -* **Return type:** - float - -### *property* params *: Dict[str, Any]* - -Return the parameters for the query. - -* **Returns:** - The parameters for the query. -* **Return type:** - Dict[str, Any] - -### *property* query *: Query* - -Return a Redis-Py Query object representing the query. - -* **Returns:** - The Redis-Py query object. -* **Return type:** - redis.commands.search.query.Query - -## FilterQuery - -### *class* FilterQuery(filter_expression, return_fields=None, num_results=10, dialect=2, params=None) - -A query for running a filtered search with a filter expression. - -* **Parameters:** - * **filter_expression** ([*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression)) – The filter expression to - query for. - * **return_fields** (*Optional* *[**List* *[**str* *]* *]* *,* *optional*) – The fields to return. - * **num_results** (*Optional* *[**int* *]* *,* *optional*) – The number of results to - return. Defaults to 10. - * **params** (*Optional* *[**Dict* *[**str* *,* *Any* *]* *]* *,* *optional*) – The parameters for the - query. Defaults to None. - * **dialect** (*int*) – -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -```python -from redisvl.query import FilterQuery -from redisvl.query.filter import Tag - -t = Tag("brand") == "Nike" -q = FilterQuery(return_fields=["brand", "price"], filter_expression=t) -``` - -### get_filter() - -Get the filter expression for the query. - -* **Returns:** - The filter for the query. -* **Return type:** - [FilterExpression](filter.md#redisvl.query.filter.FilterExpression) - -### set_filter(filter_expression=None) - -Set the filter expression for the query. 
- -* **Parameters:** - **filter_expression** (*Optional* *[*[*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression) *]* *,* *optional*) – The filter - to apply to the query. -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -### set_paging(first, limit) - -Set the paging parameters for the query to limit the number of -results. - -* **Parameters:** - * **first** (*int*) – The zero-indexed offset for which to fetch query results - * **limit** (*int*) – The max number of results to include including the offset -* **Raises:** - **TypeError** – If first or limit are NOT integers. - -### *property* query *: Query* - -Return a Redis-Py Query object representing the query. - -* **Returns:** - The Redis-Py query object. -* **Return type:** - redis.commands.search.query.Query - -## CountQuery - -### *class* CountQuery(filter_expression, dialect=2, params=None) - -A query for a simple count operation provided some filter expression. - -* **Parameters:** - * **filter_expression** ([*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression)) – The filter expression to query for. - * **params** (*Optional* *[**Dict* *[**str* *,* *Any* *]* *]* *,* *optional*) – The parameters for the query. Defaults to None. - * **dialect** (*int*) – -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -```python -from redisvl.query import CountQuery -from redisvl.query.filter import Tag - -t = Tag("brand") == "Nike" -query = CountQuery(filter_expression=t) - -count = index.query(query) -``` - -### get_filter() - -Get the filter expression for the query. - -* **Returns:** - The filter for the query. -* **Return type:** - [FilterExpression](filter.md#redisvl.query.filter.FilterExpression) - -### set_filter(filter_expression=None) - -Set the filter expression for the query. 
- -* **Parameters:** - **filter_expression** (*Optional* *[*[*FilterExpression*](filter.md#redisvl.query.filter.FilterExpression) *]* *,* *optional*) – The filter - to apply to the query. -* **Raises:** - **TypeError** – If filter_expression is not of type redisvl.query.FilterExpression - -### set_paging(first, limit) - -Set the paging parameters for the query to limit the number of -results. - -* **Parameters:** - * **first** (*int*) – The zero-indexed offset for which to fetch query results - * **limit** (*int*) – The max number of results to include including the offset -* **Raises:** - **TypeError** – If first or limit are NOT integers. - -### *property* params *: Dict[str, Any]* - -The parameters for the query. - -* **Returns:** - The parameters for the query. -* **Return type:** - Dict[str, Any] - -### *property* query *: Query* - -The loaded Redis-Py query. - -* **Returns:** - The Redis-Py query object. -* **Return type:** - redis.commands.search.query.Query diff --git a/content/integrate/redisvl/api/schema.md b/content/integrate/redisvl/api/schema.md deleted file mode 100644 index efd113392c..0000000000 --- a/content/integrate/redisvl/api/schema.md +++ /dev/null @@ -1,337 +0,0 @@ ---- -description: The schema APIs -linkTitle: Schema -title: Schema -type: integration -weight: 2 ---- - -Schema in RedisVL provides a structured format to define index settings and -field configurations using the following three components: - -| Component | Description | -|-------------|------------------------------------------------------------------------------------| -| version | The version of the schema spec. Current supported version is 0.1.0. | -| index | Index specific settings like name, key prefix, key separator, and storage type. | -| fields | Subset of fields within your data to include in the index and any custom settings. 
| - -## IndexSchema - - - -### *class* IndexSchema(\*, index, fields={}, version='0.1.0') - -A schema definition for a search index in Redis, used in RedisVL for -configuring index settings and organizing vector and metadata fields. - -The class offers methods to create an index schema from a YAML file or a -Python dictionary, supporting flexible schema definitions and easy -integration into various workflows. - -An example schema.yaml file might look like this: - -```yaml -version: '0.1.0' - -index: - name: user-index - prefix: user - key_separator: ":" - storage_type: json - -fields: - - name: user - type: tag - - name: credit_score - type: tag - - name: embedding - type: vector - attrs: - algorithm: flat - dims: 3 - distance_metric: cosine - datatype: float32 -``` - -Loading the schema for RedisVL from yaml is as simple as: - -```python -from redisvl.schema import IndexSchema - -schema = IndexSchema.from_yaml("schema.yaml") -``` - -Loading the schema for RedisVL from dict is as simple as: - -```python -from redisvl.schema import IndexSchema - -schema = IndexSchema.from_dict({ - "index": { - "name": "user-index", - "prefix": "user", - "key_separator": ":", - "storage_type": "json", - }, - "fields": [ - {"name": "user", "type": "tag"}, - {"name": "credit_score", "type": "tag"}, - { - "name": "embedding", - "type": "vector", - "attrs": { - "algorithm": "flat", - "dims": 3, - "distance_metric": "cosine", - "datatype": "float32" - } - } - ] -}) -``` - -{{< note >}} -The fields attribute in the schema must contain unique field names to ensure -correct and unambiguous field references. -{{< /note >}} - -Create a new model by parsing and validating input data from keyword arguments. - -Raises ValidationError if the input data cannot be parsed to form a valid model. 
-
-* **Parameters:**
-  * **index** (*IndexInfo*) –
-  * **fields** (*Dict* *[**str* *,* *BaseField* *]*) –
-  * **version** (*str*) –
-
-### add_field(field_inputs)
-
-Adds a single field to the index schema based on the specified field
-type and attributes.
-
-This method allows for the addition of individual fields to the schema,
-providing flexibility in defining the structure of the index.
-
-* **Parameters:**
-  **field_inputs** (*Dict* *[**str* *,* *Any* *]*) – A field to add.
-* **Raises:**
-  **ValueError** – If the field name or type is not provided or if the name
-  already exists within the schema.
-
-```python
-# Add a tag field
-schema.add_field({"name": "user", "type": "tag"})
-
-# Add a vector field
-schema.add_field({
-    "name": "user-embedding",
-    "type": "vector",
-    "attrs": {
-        "dims": 1024,
-        "algorithm": "flat",
-        "datatype": "float32"
-    }
-})
-```
-
-### add_fields(fields)
-
-Extends the schema with additional fields.
-
-This method allows dynamically adding new fields to the index schema. It
-processes a list of field definitions.
-
-* **Parameters:**
-  **fields** (*List* *[**Dict* *[**str* *,* *Any* *]* *]*) – A list of fields to add.
-* **Raises:**
-  **ValueError** – If a field with the same name already exists in the
-  schema.
-
-```python
-schema.add_fields([
-    {"name": "user", "type": "tag"},
-    {"name": "bio", "type": "text"},
-    {
-        "name": "user-embedding",
-        "type": "vector",
-        "attrs": {
-            "dims": 1024,
-            "algorithm": "flat",
-            "datatype": "float32"
-        }
-    }
-])
-```
-
-### *classmethod* from_dict(data)
-
-Create an IndexSchema from a dictionary.
-
-* **Parameters:**
-  **data** (*Dict* *[**str* *,* *Any* *]*) – The index schema data.
-* **Returns:**
-  The index schema.
-* **Return type:** - [IndexSchema](#redisvl.schema.IndexSchema) - -```python -from redisvl.schema import IndexSchema - -schema = IndexSchema.from_dict({ - "index": { - "name": "docs-index", - "prefix": "docs", - "storage_type": "hash", - }, - "fields": [ - { - "name": "doc-id", - "type": "tag" - }, - { - "name": "doc-embedding", - "type": "vector", - "attrs": { - "algorithm": "flat", - "dims": 1536 - } - } - ] -}) -``` - -### *classmethod* from_yaml(file_path) - -Create an IndexSchema from a YAML file. - -* **Parameters:** - **file_path** (*str*) – The path to the YAML file. -* **Returns:** - The index schema. -* **Return type:** - [IndexSchema](#redisvl.schema.IndexSchema) - -```python -from redisvl.schema import IndexSchema -schema = IndexSchema.from_yaml("schema.yaml") -``` - -### remove_field(field_name) - -Removes a field from the schema based on the specified name. - -This method is useful for dynamically altering the schema by removing -existing fields. - -* **Parameters:** - **field_name** (*str*) – The name of the field to be removed. - -### to_dict() - -Convert the index schema to a dictionary. - -* **Returns:** - The index schema as a dictionary. -* **Return type:** - Dict[str, Any] - -### to_yaml(file_path, overwrite=True) - -Write the index schema to a YAML file. - -* **Parameters:** - * **file_path** (*str*) – The path to the YAML file. - * **overwrite** (*bool*) – Whether to overwrite the file if it already exists. -* **Raises:** - **FileExistsError** – If the file already exists and overwrite is False. -* **Return type:** - None - -### *property* field_names *: List[str]* - -A list of field names associated with the index schema. - -* **Returns:** - A list of field names from the schema. -* **Return type:** - List[str] - -### fields *: Dict[str, BaseField]* - -Fields associated with the search index and their properties - -### index *: IndexInfo* - -Details of the basic index configurations. 
- -### version *: str* - -Version of the underlying index schema. - -## Defining Fields - -Fields in the schema can be defined in YAML format or as a Python dictionary, specifying a name, type, an optional path, and attributes for customization. - -**YAML Example**: - -```yaml -- name: title - type: text - path: $.document.title - attrs: - weight: 1.0 - no_stem: false - withsuffixtrie: true -``` - -**Python Dictionary Example**: - -```python -{ - "name": "location", - "type": "geo", - "attrs": { - "sortable": true - } -} -``` - -## Supported Field Types and Attributes - -Each field type supports specific attributes that customize its behavior. Below are the field types and their available attributes: - -**Text Field Attributes**: - -- weight: Importance of the field in result calculation. -- no_stem: Disables stemming during indexing. -- withsuffixtrie: Optimizes queries by maintaining a suffix trie. -- phonetic_matcher: Enables phonetic matching. -- sortable: Allows sorting on this field. - -**Tag Field Attributes**: - -- separator: Character for splitting text into individual tags. -- case_sensitive: Case sensitivity in tag matching. -- withsuffixtrie: Suffix trie optimization for queries. -- sortable: Enables sorting based on the tag field. - -**Numeric and Geo Field Attributes**: - -- Both numeric and geo fields support the sortable attribute, enabling sorting on these fields. - -**Common Vector Field Attributes**: - -- dims: Dimensionality of the vector. -- algorithm: Indexing algorithm (flat or hnsw). -- datatype: Float datatype of the vector (float32 or float64). -- distance_metric: Metric for measuring query relevance (COSINE, L2, IP). - -**HNSW Vector Field Specific Attributes**: - -- m: Max outgoing edges per node in each layer. -- ef_construction: Max edge candidates during build time. -- ef_runtime: Max top candidates during search. -- epsilon: Range search boundary factor. 
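The attribute lists above can be combined into a complete vector field definition. The following is a minimal sketch in plain Python (the field name `doc-embedding` and values such as `dims: 768` are illustrative, not recommendations), with a small sanity check mirroring the constraints listed above:

```python
# Hypothetical HNSW vector field definition combining the attributes above.
hnsw_vector_field = {
    "name": "doc-embedding",
    "type": "vector",
    "attrs": {
        "dims": 768,                  # dimensionality of the embedding model
        "algorithm": "hnsw",          # use the HNSW graph index
        "datatype": "float32",        # float datatype of the vector
        "distance_metric": "cosine",  # metric for query relevance
        "m": 16,                      # max outgoing edges per node
        "ef_construction": 200,       # max edge candidates at build time
        "ef_runtime": 10,             # max top candidates during search
        "epsilon": 0.01,              # range search boundary factor
    },
}

def validate_vector_field(field):
    """Minimal sanity check against the supported attribute values above."""
    attrs = field["attrs"]
    assert field["type"] == "vector"
    assert attrs["algorithm"] in ("flat", "hnsw")
    assert attrs["datatype"] in ("float32", "float64")
    assert attrs["distance_metric"].lower() in ("cosine", "l2", "ip")
    return True

validate_vector_field(hnsw_vector_field)
```

A dict like this can then be passed to `schema.add_field(...)` as shown earlier.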
- -Note: -: See fully documented Redis-supported fields and options here: [FT.CREATE]({{< baseurl >}}/commands/ft.create). diff --git a/content/integrate/redisvl/api/searchindex.md b/content/integrate/redisvl/api/searchindex.md deleted file mode 100644 index 762d73428d..0000000000 --- a/content/integrate/redisvl/api/searchindex.md +++ /dev/null @@ -1,772 +0,0 @@ ---- -description: The search index APIs -linkTitle: Search index -title: Search index -type: integration -weight: 3 ---- - -| Class | Description | -|------------------|----------------------------------------------------------------------------------------------| -| SearchIndex | Primary class to write, read, and search across data structures in Redis. | -| AsyncSearchIndex | Async version of the SearchIndex to write, read, and search across data structures in Redis. | - -## SearchIndex - - - -### *class* SearchIndex(schema, redis_client=None, redis_url=None, connection_args={}, \*\*kwargs) - -A search index class for interacting with Redis as a vector database. - -The SearchIndex is instantiated with a reference to a Redis database and an -IndexSchema (YAML path or dictionary object) that describes the various -settings and field configurations. - -```python -from redisvl.index import SearchIndex - -# initialize the index object with schema from file -index = SearchIndex.from_yaml("schemas/schema.yaml") -index.connect(redis_url="redis://localhost:6379") - -# create the index -index.create(overwrite=True) - -# data is an iterable of dictionaries -index.load(data) - -# delete index and data -index.delete(drop=True) -``` - -Initialize the RedisVL search index with a schema, Redis client -(or URL string with other connection args), connection_args, and other -kwargs. - -* **Parameters:** - * **schema** ([*IndexSchema*](schema.md#redisvl.schema.IndexSchema)) – Index schema object. - * **redis_client** (*Union* *[**redis.Redis* *,* *aredis.Redis* *]* *,* *optional*) – An - instantiated redis client. 
- * **redis_url** (*str* *,* *optional*) – The URL of the Redis server to - connect to. - * **connection_args** (*Dict* *[**str* *,* *Any* *]* *,* *optional*) – Redis client connection - args. - -### connect(redis_url=None, \*\*kwargs) - -Connect to a Redis instance using the provided redis_url, falling -back to the REDIS_URL environment variable (if available). - -Note: Additional keyword arguments (\*\*kwargs) can be used to provide -extra options specific to the Redis connection. - -* **Parameters:** - **redis_url** (*Optional* *[**str* *]* *,* *optional*) – The URL of the Redis server to - connect to. If not provided, the method defaults to using the - REDIS_URL environment variable. -* **Raises:** - * **redis.exceptions.ConnectionError** – If the connection to the Redis - server fails. - * **ValueError** – If the Redis URL is not provided nor accessible - through the REDIS_URL environment variable. - -```python -index.connect(redis_url="redis://localhost:6379") -``` - -### create(overwrite=False, drop=False) - -Create an index in Redis with the current schema and properties. - -* **Parameters:** - * **overwrite** (*bool* *,* *optional*) – Whether to overwrite the index if it - already exists. Defaults to False. - * **drop** (*bool* *,* *optional*) – Whether to drop all keys associated with the - index in the case of overwriting. Defaults to False. -* **Raises:** - * **RuntimeError** – If the index already exists and ‘overwrite’ is False. - * **ValueError** – If no fields are defined for the index. -* **Return type:** - None - -```python -# create an index in Redis; only if one does not exist with given name -index.create() - -# overwrite an index in Redis without dropping associated data -index.create(overwrite=True) - -# overwrite an index in Redis; drop associated data (clean slate) -index.create(overwrite=True, drop=True) -``` - -### delete(drop=True) - -Delete the search index while optionally dropping all keys associated -with the index. 
- -* **Parameters:** - **drop** (*bool* *,* *optional*) – Delete the key / documents pairs in the - index. Defaults to True. -* **Raises:** - **redis.exceptions.ResponseError** – If the index does not exist. - -### disconnect() - -Disconnect from the Redis database. - -### exists() - -Check if the index exists in Redis. - -* **Returns:** - True if the index exists, False otherwise. -* **Return type:** - bool - -### fetch(id) - -Fetch an object from Redis by id. - -The id is typically either a unique identifier, -or derived from some domain-specific metadata combination -(like a document id or chunk id). - -* **Parameters:** - **id** (*str*) – The specified unique identifier for a particular - document indexed in Redis. -* **Returns:** - The fetched object. -* **Return type:** - Dict[str, Any] - -### *classmethod* from_dict(schema_dict, \*\*kwargs) - -Create a SearchIndex from a dictionary. - -* **Parameters:** - * **schema_dict** (*Dict* *[**str* *,* *Any* *]*) – A dictionary containing the schema. - * **connection_args** (*Dict* *[**str* *,* *Any* *]* *,* *optional*) – Redis client connection - args. -* **Returns:** - A RedisVL SearchIndex object. -* **Return type:** - [SearchIndex](#redisvl.index.SearchIndex) - -```python -from redisvl.index import SearchIndex - -index = SearchIndex.from_dict({ - "index": { - "name": "my-index", - "prefix": "rvl", - "storage_type": "hash", - }, - "fields": [ - {"name": "doc-id", "type": "tag"} - ] -}) -``` - -### *classmethod* from_yaml(schema_path, \*\*kwargs) - -Create a SearchIndex from a YAML schema file. - -* **Parameters:** - **schema_path** (*str*) – Path to the YAML schema file. -* **Returns:** - A RedisVL SearchIndex object. -* **Return type:** - [SearchIndex](#redisvl.index.SearchIndex) - -```python -from redisvl.index import SearchIndex - -index = SearchIndex.from_yaml("schemas/schema.yaml") -``` - -### info() - -Get information about the index. 
- -* **Returns:** - A dictionary containing the information about the index. -* **Return type:** - dict - -### key(id) - -Construct a redis key as a combination of an index key prefix (optional) -and specified id. - -The id is typically either a unique identifier, or -derived from some domain-specific metadata combination (like a document -id or chunk id). - -* **Parameters:** - **id** (*str*) – The specified unique identifier for a particular - document indexed in Redis. -* **Returns:** - The full Redis key including key prefix and value as a string. -* **Return type:** - str - -### listall() - -List all search indices in Redis database. - -* **Returns:** - The list of indices in the database. -* **Return type:** - List[str] - -### load(data, id_field=None, keys=None, ttl=None, preprocess=None, batch_size=None) - -Load objects to the Redis database. Returns the list of keys loaded -to Redis. - -RedisVL automatically handles constructing the object keys, batching, -optional preprocessing steps, and setting optional expiration -(TTL policies) on keys. - -* **Parameters:** - * **data** (*Iterable* *[**Any* *]*) – An iterable of objects to store. - * **id_field** (*Optional* *[**str* *]* *,* *optional*) – Specified field used as the id - portion of the redis key (after the prefix) for each - object. Defaults to None. - * **keys** (*Optional* *[**Iterable* *[**str* *]* *]* *,* *optional*) – Optional iterable of keys. - Must match the length of objects if provided. Defaults to None. - * **ttl** (*Optional* *[**int* *]* *,* *optional*) – Time-to-live in seconds for each key. - Defaults to None. - * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – A function to preprocess - objects before storage. Defaults to None. - * **batch_size** (*Optional* *[**int* *]* *,* *optional*) – Number of objects to write in - a single Redis pipeline execution. Defaults to class’s - default batch size. -* **Returns:** - List of keys loaded to Redis. 
-* **Return type:**
-  List[str]
-* **Raises:**
-  **ValueError** – If the length of provided keys does not match the length
-  of objects.
-
-```python
-data = [{"test": "foo"}, {"test": "bar"}]
-
-# simple case
-keys = index.load(data)
-
-# set 360 second ttl policy on data
-keys = index.load(data, ttl=360)
-
-# load data with predefined keys
-keys = index.load(data, keys=["rvl:foo", "rvl:bar"])
-
-# load data with preprocessing step
-def add_field(d):
-    d["new_field"] = 123
-    return d
-keys = index.load(data, preprocess=add_field)
-```
-
-### paginate(query, page_size=30)
-
-Execute a given query against the index and return results in
-paginated batches.
-
-This method accepts a RedisVL query instance, enabling pagination of
-results which allows for subsequent processing over each batch with a
-generator.
-
-* **Parameters:**
-  * **query** (*BaseQuery*) – The search query to be executed.
-  * **page_size** (*int* *,* *optional*) – The number of results to return in each
-    batch. Defaults to 30.
-* **Yields:**
-  A generator yielding batches of search results.
-* **Raises:**
-  * **TypeError** – If the page_size argument is not of type int.
-  * **ValueError** – If the page_size argument is less than or equal to zero.
-* **Return type:**
-  *Generator*
-
-```python
-# Iterate over paginated search results in batches of 10
-for result_batch in index.paginate(query, page_size=10):
-    # Process each batch of results
-    pass
-```
-
-{{< note >}}
-The page_size parameter controls the number of items each result
-batch contains. Adjust this value based on performance
-considerations and the expected volume of search results.
-{{< /note >}}
-
-### query(query)
-
-Execute a query on the index.
-
-This method takes a BaseQuery object directly, runs the search, and
-handles post-processing of the search.
-
-* **Parameters:**
-  **query** (*BaseQuery*) – The query to run.
-* **Returns:**
-  A list of search results.
-* **Return type:** - List[Result] - -```python -from redisvl.query import VectorQuery - -query = VectorQuery( - vector=[0.16, -0.34, 0.98, 0.23], - vector_field_name="embedding", - num_results=3 -) - -results = index.query(query) -``` - -### search(\*args, \*\*kwargs) - -Perform a search against the index. - -Wrapper around redis.search.Search that adds the index name -to the search query and passes along the rest of the arguments -to the redis-py ft.search() method. - -* **Returns:** - Raw Redis search results. -* **Return type:** - Result - -### set_client(client) - -Manually set the Redis client to use with the search index. - -This method configures the search index to use a specific Redis or -Async Redis client. It is useful for cases where an external, -custom-configured client is preferred instead of creating a new one. - -* **Parameters:** - **client** (*redis.Redis*) – A Redis or Async Redis - client instance to be used for the connection. -* **Raises:** - **TypeError** – If the provided client is not valid. - -```python -import redis -from redisvl.index import SearchIndex - -client = redis.Redis.from_url("redis://localhost:6379") -index = SearchIndex.from_yaml("schemas/schema.yaml") -index.set_client(client) -``` - -### *property* client *: Redis | Redis | None* - -The underlying redis-py client object. - -### *property* key_separator *: str* - -The optional separator between a defined prefix and key value in -forming a Redis key. - -### *property* name *: str* - -The name of the Redis search index. - -### *property* prefix *: str* - -The optional key prefix that comes before a unique key value in -forming a Redis key. - -### *property* storage_type *: StorageType* - -The underlying storage type for the search index; either -hash or json. 
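The name, prefix, and key_separator properties above determine how the key() method composes full Redis keys. A minimal sketch of that composition, assuming the prefix `rvl` and separator `:` used in the earlier examples:

```python
def make_key(id: str, prefix: str = "rvl", separator: str = ":") -> str:
    """Sketch of how SearchIndex.key() forms a Redis key:
    an optional key prefix, the separator, then the unique id."""
    return f"{prefix}{separator}{id}" if prefix else id

print(make_key("doc-1"))             # rvl:doc-1
print(make_key("doc-1", prefix=""))  # doc-1 (no prefix configured)
```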
- -## AsyncSearchIndex - - - -### *class* AsyncSearchIndex(schema, redis_client=None, redis_url=None, connection_args={}, \*\*kwargs) - -A search index class for interacting with Redis as a vector database in -async-mode. - -The AsyncSearchIndex is instantiated with a reference to a Redis database -and an IndexSchema (YAML path or dictionary object) that describes the -various settings and field configurations. - -```python -from redisvl.index import AsyncSearchIndex - -# initialize the index object with schema from file -index = AsyncSearchIndex.from_yaml("schemas/schema.yaml") -index.connect(redis_url="redis://localhost:6379") - -# create the index -await index.create(overwrite=True) - -# data is an iterable of dictionaries -await index.load(data) - -# delete index and data -await index.delete(drop=True) -``` - -Initialize the RedisVL search index with a schema, Redis client -(or URL string with other connection args), connection_args, and other -kwargs. - -* **Parameters:** - * **schema** ([*IndexSchema*](schema.md#redisvl.schema.IndexSchema)) – Index schema object. - * **redis_client** (*Union* *[**redis.Redis* *,* *aredis.Redis* *]* *,* *optional*) – An - instantiated redis client. - * **redis_url** (*str* *,* *optional*) – The URL of the Redis server to - connect to. - * **connection_args** (*Dict* *[**str* *,* *Any* *]* *,* *optional*) – Redis client connection - args. - -### connect(redis_url=None, \*\*kwargs) - -Connect to a Redis instance using the provided redis_url, falling -back to the REDIS_URL environment variable (if available). - -Note: Additional keyword arguments (\*\*kwargs) can be used to provide -extra options specific to the Redis connection. - -* **Parameters:** - **redis_url** (*Optional* *[**str* *]* *,* *optional*) – The URL of the Redis server to - connect to. If not provided, the method defaults to using the - REDIS_URL environment variable. 
-* **Raises:**
-  * **redis.exceptions.ConnectionError** – If the connection to the Redis
-    server fails.
-  * **ValueError** – If the Redis URL is not provided nor accessible
-    through the REDIS_URL environment variable.
-
-```python
-index.connect(redis_url="redis://localhost:6379")
-```
-
-### *async* create(overwrite=False, drop=False)
-
-Asynchronously create an index in Redis with the current schema
-and properties.
-
-* **Parameters:**
-  * **overwrite** (*bool* *,* *optional*) – Whether to overwrite the index if it
-    already exists. Defaults to False.
-  * **drop** (*bool* *,* *optional*) – Whether to drop all keys associated with the
-    index in the case of overwriting. Defaults to False.
-* **Raises:**
-  * **RuntimeError** – If the index already exists and ‘overwrite’ is False.
-  * **ValueError** – If no fields are defined for the index.
-* **Return type:**
-  None
-
-```python
-# create an index in Redis; only if one does not exist with given name
-await index.create()
-
-# overwrite an index in Redis without dropping associated data
-await index.create(overwrite=True)
-
-# overwrite an index in Redis; drop associated data (clean slate)
-await index.create(overwrite=True, drop=True)
-```
-
-### *async* delete(drop=True)
-
-Delete the search index.
-
-* **Parameters:**
-  **drop** (*bool* *,* *optional*) – Delete the documents in the index.
-  Defaults to True.
-* **Raises:**
-  **redis.exceptions.ResponseError** – If the index does not exist.
-
-### disconnect()
-
-Disconnect from the Redis database.
-
-### *async* exists()
-
-Check if the index exists in Redis.
-
-* **Returns:**
-  True if the index exists, False otherwise.
-* **Return type:**
-  bool
-
-### *async* fetch(id)
-
-Asynchronously fetch an object from Redis by id. The id is typically
-either a unique identifier, or derived from some domain-specific
-metadata combination (like a document id or chunk id).
-
-* **Parameters:**
-  **id** (*str*) – The specified unique identifier for a particular
-  document indexed in Redis.
-* **Returns:**
-  The fetched object.
-* **Return type:**
-  Dict[str, Any]
-
-### *classmethod* from_dict(schema_dict, \*\*kwargs)
-
-Create an AsyncSearchIndex from a dictionary.
-
-* **Parameters:**
-  * **schema_dict** (*Dict* *[**str* *,* *Any* *]*) – A dictionary containing the schema.
-  * **connection_args** (*Dict* *[**str* *,* *Any* *]* *,* *optional*) – Redis client connection
-    args.
-* **Returns:**
-  A RedisVL AsyncSearchIndex object.
-* **Return type:**
-  [AsyncSearchIndex](#redisvl.index.AsyncSearchIndex)
-
-```python
-from redisvl.index import AsyncSearchIndex
-
-index = AsyncSearchIndex.from_dict({
-    "index": {
-        "name": "my-index",
-        "prefix": "rvl",
-        "storage_type": "hash",
-    },
-    "fields": [
-        {"name": "doc-id", "type": "tag"}
-    ]
-})
-```
-
-### *classmethod* from_yaml(schema_path, \*\*kwargs)
-
-Create an AsyncSearchIndex from a YAML schema file.
-
-* **Parameters:**
-  **schema_path** (*str*) – Path to the YAML schema file.
-* **Returns:**
-  A RedisVL AsyncSearchIndex object.
-* **Return type:**
-  [AsyncSearchIndex](#redisvl.index.AsyncSearchIndex)
-
-```python
-from redisvl.index import AsyncSearchIndex
-
-index = AsyncSearchIndex.from_yaml("schemas/schema.yaml")
-```
-
-### *async* info()
-
-Get information about the index.
-
-* **Returns:**
-  A dictionary containing the information about the index.
-* **Return type:**
-  dict
-
-### key(id)
-
-Construct a redis key as a combination of an index key prefix (optional)
-and specified id.
-
-The id is typically either a unique identifier, or
-derived from some domain-specific metadata combination (like a document
-id or chunk id).
-
-* **Parameters:**
-  **id** (*str*) – The specified unique identifier for a particular
-  document indexed in Redis.
-* **Returns:**
-  The full Redis key including key prefix and value as a string.
-* **Return type:**
-  str
-
-### *async* listall()
-
-List all search indices in the Redis database.
- -* **Returns:** - The list of indices in the database. -* **Return type:** - List[str] - -### *async* load(data, id_field=None, keys=None, ttl=None, preprocess=None, concurrency=None) - -Asynchronously load objects to Redis with concurrency control. -Returns the list of keys loaded to Redis. - -RedisVL automatically handles constructing the object keys, batching, -optional preprocessing steps, and setting optional expiration -(TTL policies) on keys. - -* **Parameters:** - * **data** (*Iterable* *[**Any* *]*) – An iterable of objects to store. - * **id_field** (*Optional* *[**str* *]* *,* *optional*) – Specified field used as the id - portion of the redis key (after the prefix) for each - object. Defaults to None. - * **keys** (*Optional* *[**Iterable* *[**str* *]* *]* *,* *optional*) – Optional iterable of keys. - Must match the length of objects if provided. Defaults to None. - * **ttl** (*Optional* *[**int* *]* *,* *optional*) – Time-to-live in seconds for each key. - Defaults to None. - * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – An async function to - preprocess objects before storage. Defaults to None. - * **concurrency** (*Optional* *[**int* *]* *,* *optional*) – The maximum number of - concurrent write operations. Defaults to class’s default - concurrency level. -* **Returns:** - List of keys loaded to Redis. -* **Return type:** - List[str] -* **Raises:** - **ValueError** – If the length of provided keys does not match the - length of objects. 
-
-```python
-data = [{"test": "foo"}, {"test": "bar"}]
-
-# simple case
-keys = await index.load(data)
-
-# set 360 second ttl policy on data
-keys = await index.load(data, ttl=360)
-
-# load data with predefined keys
-keys = await index.load(data, keys=["rvl:foo", "rvl:bar"])
-
-# load data with preprocessing step
-async def add_field(d):
-    d["new_field"] = 123
-    return d
-keys = await index.load(data, preprocess=add_field)
-```
-
-### *async* paginate(query, page_size=30)
-
-Execute a given query against the index and return results in
-paginated batches.
-
-This method accepts a RedisVL query instance, enabling async pagination
-of results which allows for subsequent processing over each batch with a
-generator.
-
-* **Parameters:**
-  * **query** (*BaseQuery*) – The search query to be executed.
-  * **page_size** (*int* *,* *optional*) – The number of results to return in each
-    batch. Defaults to 30.
-* **Yields:**
-  An async generator yielding batches of search results.
-* **Raises:**
-  * **TypeError** – If the page_size argument is not of type int.
-  * **ValueError** – If the page_size argument is less than or equal to zero.
-* **Return type:**
-  *AsyncGenerator*
-
-```python
-# Iterate over paginated search results in batches of 10
-async for result_batch in index.paginate(query, page_size=10):
-    # Process each batch of results
-    pass
-```
-
-{{< note >}}
-The page_size parameter controls the number of items each result
-batch contains. Adjust this value based on performance
-considerations and the expected volume of search results.
-{{< /note >}}
-
-### *async* query(query)
-
-Asynchronously execute a query on the index.
-
-This method takes a BaseQuery object directly, runs the search, and
-handles post-processing of the search.
-
-* **Parameters:**
-  **query** (*BaseQuery*) – The query to run.
-* **Returns:**
-  A list of search results.
-* **Return type:** - List[Result] - -```python -from redisvl.query import VectorQuery - -query = VectorQuery( - vector=[0.16, -0.34, 0.98, 0.23], - vector_field_name="embedding", - num_results=3 -) - -results = await index.query(query) -``` - -### *async* search(\*args, \*\*kwargs) - -Perform a search on this index. - -Wrapper around redis.search.Search that adds the index name -to the search query and passes along the rest of the arguments -to the redis-py ft.search() method. - -* **Returns:** - Raw Redis search results. -* **Return type:** - Result - -### set_client(client) - -Manually set the Redis client to use with the search index. - -This method configures the search index to use a specific -Async Redis client. It is useful for cases where an external, -custom-configured client is preferred instead of creating a new one. - -* **Parameters:** - **client** (*aredis.Redis*) – An Async Redis - client instance to be used for the connection. -* **Raises:** - **TypeError** – If the provided client is not valid. - -```python -import redis.asyncio as aredis -from redisvl.index import AsyncSearchIndex - -# async Redis client and index -client = aredis.Redis.from_url("redis://localhost:6379") -index = AsyncSearchIndex.from_yaml("schemas/schema.yaml") -index.set_client(client) -``` - -### *property* client *: Redis | Redis | None* - -The underlying redis-py client object. - -### *property* key_separator *: str* - -The optional separator between a defined prefix and key value in -forming a Redis key. - -### *property* name *: str* - -The name of the Redis search index. - -### *property* prefix *: str* - -The optional key prefix that comes before a unique key value in -forming a Redis key. - -### *property* storage_type *: StorageType* - -The underlying storage type for the search index; either -hash or json. 
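AsyncSearchIndex.load() caps in-flight writes via its concurrency argument. A rough illustration of that pattern with a plain asyncio.Semaphore (load_with_concurrency and fake_write are stand-ins for demonstration, not part of the RedisVL API):

```python
import asyncio

async def load_with_concurrency(items, write, concurrency=5):
    """Sketch of concurrency control: a semaphore caps how many
    write operations may be in flight at once."""
    sem = asyncio.Semaphore(concurrency)

    async def write_one(item):
        async with sem:
            return await write(item)

    # gather preserves input order in its results
    return await asyncio.gather(*(write_one(i) for i in items))

async def fake_write(item):
    # stand-in for an actual Redis write; returns the key it "stored"
    await asyncio.sleep(0)
    return f"rvl:{item['id']}"

keys = asyncio.run(load_with_concurrency(
    [{"id": "foo"}, {"id": "bar"}], fake_write, concurrency=2))
print(keys)  # ['rvl:foo', 'rvl:bar']
```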
diff --git a/content/integrate/redisvl/api/vectorizer.md b/content/integrate/redisvl/api/vectorizer.md deleted file mode 100644 index fcee58cac6..0000000000 --- a/content/integrate/redisvl/api/vectorizer.md +++ /dev/null @@ -1,415 +0,0 @@ ---- -description: The vectorizer APIs -linkTitle: Vectorizers -title: Vectorizers -type: integration -weight: 6 ---- - -## HFTextVectorizer - - - -### *class* HFTextVectorizer(model='sentence-transformers/all-mpnet-base-v2', \*, dims, client=None) - -Bases: `BaseVectorizer` - -The HFTextVectorizer class is designed to leverage the power of Hugging -Face’s Sentence Transformers for generating text embeddings. This vectorizer -is particularly useful in scenarios where advanced natural language -processing and understanding are required, and ideal for running on your own -hardware (for free). - -Utilizing this vectorizer involves specifying a pre-trained model from -Hugging Face’s vast collection of Sentence Transformers. These models are -trained on a variety of datasets and tasks, ensuring versatility and -robust performance across different text embedding needs. Additionally, -make sure the sentence-transformers library is installed with -pip install sentence-transformers==2.2.2. - -```python -# Embedding a single text -vectorizer = HFTextVectorizer(model="sentence-transformers/all-mpnet-base-v2") -embedding = vectorizer.embed("Hello, world!") - -# Embedding a batch of texts -embeddings = vectorizer.embed_many(["Hello, world!", "How are you?"], batch_size=2) -``` - -Initialize the Hugging Face text vectorizer. - -* **Parameters:** - * **model** (*str*) – The pre-trained model from Hugging Face’s Sentence - Transformers to be used for embedding. Defaults to - ‘sentence-transformers/all-mpnet-base-v2’. - * **dims** (*int*) – - * **client** (*Any*) – -* **Raises:** - * **ImportError** – If the sentence-transformers library is not installed. - * **ValueError** – If there is an error setting the embedding model dimensions. 
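The embed_many() method below accepts a batch_size so the underlying model is called on manageable chunks rather than the whole input list at once. The splitting step it performs can be sketched as:

```python
def batched(texts, batch_size):
    """Sketch of the batching embed_many() applies before each model call:
    yield consecutive slices of at most batch_size texts."""
    for i in range(0, len(texts), batch_size):
        yield texts[i:i + batch_size]

chunks = list(batched(["a", "b", "c"], batch_size=2))
print(chunks)  # [['a', 'b'], ['c']]
```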
-
-### embed(text, preprocess=None, as_buffer=False, \*\*kwargs)
-
-Embed a chunk of text using the Hugging Face sentence transformer.
-
-* **Parameters:**
-  * **text** (*str*) – Chunk of text to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing
-    callable to perform before vectorization. Defaults to None.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  Embedding.
-* **Return type:**
-  List[float]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-### embed_many(texts, preprocess=None, batch_size=1000, as_buffer=False, \*\*kwargs)
-
-Embed many chunks of texts using the Hugging Face
-sentence transformer.
-
-* **Parameters:**
-  * **texts** (*List* *[**str* *]*) – List of text chunks to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing
-    callable to perform before vectorization. Defaults to None.
-  * **batch_size** (*int* *,* *optional*) – Batch size of texts to use when creating
-    embeddings. Defaults to 1000.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  List of embeddings.
-* **Return type:**
-  List[List[float]]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-## OpenAITextVectorizer
-
-
-
-### *class* OpenAITextVectorizer(model='text-embedding-ada-002', api_config=None)
-
-Bases: `BaseVectorizer`
-
-The OpenAITextVectorizer class utilizes OpenAI’s API to generate
-embeddings for text data.
-
-This vectorizer is designed to interact with OpenAI’s embeddings API,
-requiring an API key for authentication. The key can be provided directly
-in the api_config dictionary or through the OPENAI_API_KEY environment
-variable.
Users must obtain an API key from OpenAI’s website
-([https://platform.openai.com/](https://platform.openai.com/)). Additionally, the openai python client must be
-installed with pip install openai>=1.13.0.
-
-The vectorizer supports both synchronous and asynchronous operations,
-allowing for batch processing of texts and flexibility in handling
-preprocessing tasks.
-
-```python
-# Synchronous embedding of a single text
-vectorizer = OpenAITextVectorizer(
-    model="text-embedding-ada-002",
-    api_config={"api_key": "your_api_key"} # OR set OPENAI_API_KEY in your env
-)
-embedding = vectorizer.embed("Hello, world!")
-
-# Asynchronous batch embedding of multiple texts
-embeddings = await vectorizer.aembed_many(
-    ["Hello, world!", "How are you?"],
-    batch_size=2
-)
-```
-
-Initialize the OpenAI vectorizer.
-
-* **Parameters:**
-  * **model** (*str*) – Model to use for embedding. Defaults to
-    ‘text-embedding-ada-002’.
-  * **api_config** (*Optional* *[**Dict* *]* *,* *optional*) – Dictionary containing the
-    API key. Defaults to None.
-* **Raises:**
-  * **ImportError** – If the openai library is not installed.
-  * **ValueError** – If the OpenAI API key is not provided.
-
-### *async* aembed(text, preprocess=None, as_buffer=False, \*\*kwargs)
-
-Asynchronously embed a chunk of text using the OpenAI API.
-
-* **Parameters:**
-  * **text** (*str*) – Chunk of text to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing callable to
-    perform before vectorization. Defaults to None.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  Embedding.
-* **Return type:**
-  List[float]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-### *async* aembed_many(texts, preprocess=None, batch_size=1000, as_buffer=False, \*\*kwargs)
-
-Asynchronously embed many chunks of text using the OpenAI API.
-
-* **Parameters:**
-  * **texts** (*List* *[**str* *]*) – List of text chunks to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing callable to
-    perform before vectorization. Defaults to None.
-  * **batch_size** (*int* *,* *optional*) – Batch size of texts to use when creating
-    embeddings. Defaults to 1000.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  List of embeddings.
-* **Return type:**
-  List[List[float]]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-### embed(text, preprocess=None, as_buffer=False, \*\*kwargs)
-
-Embed a chunk of text using the OpenAI API.
-
-* **Parameters:**
-  * **text** (*str*) – Chunk of text to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing callable to
-    perform before vectorization. Defaults to None.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  Embedding.
-* **Return type:**
-  List[float]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-### embed_many(texts, preprocess=None, batch_size=10, as_buffer=False, \*\*kwargs)
-
-Embed many chunks of text using the OpenAI API.
-
-* **Parameters:**
-  * **texts** (*List* *[**str* *]*) – List of text chunks to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing
-    callable to perform before vectorization. Defaults to None.
-  * **batch_size** (*int* *,* *optional*) – Batch size of texts to use when creating
-    embeddings. Defaults to 10.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  List of embeddings.
-* **Return type:**
-  List[List[float]]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-## VertexAITextVectorizer
-
-
-
-### *class* VertexAITextVectorizer(model='textembedding-gecko', api_config=None)
-
-Bases: `BaseVectorizer`
-
-The VertexAITextVectorizer uses Google’s VertexAI Palm 2 embedding model
-API to create text embeddings.
-
-This vectorizer is tailored for use in
-environments where integration with Google Cloud Platform (GCP) services is
-a key requirement.
-
-Utilizing this vectorizer requires an active GCP project and location
-(region), along with appropriate application credentials. These can be
-provided through the api_config dictionary or by setting the corresponding
-environment variables. Additionally, the vertexai python client must be
-installed with pip install google-cloud-aiplatform>=1.26.
-
-```python
-# Synchronous embedding of a single text
-vectorizer = VertexAITextVectorizer(
-    model="textembedding-gecko",
-    api_config={
-        "project_id": "your_gcp_project_id", # OR set GCP_PROJECT_ID
-        "location": "your_gcp_location", # OR set GCP_LOCATION
-        "google_application_credentials": "path_to_your_creds"
-        # OR set GOOGLE_APPLICATION_CREDENTIALS
-    })
-embedding = vectorizer.embed("Hello, world!")
-
-# Synchronous batch embedding of multiple texts
-embeddings = vectorizer.embed_many(
-    ["Hello, world!", "Goodbye, world!"],
-    batch_size=2
-)
-```
-
-Initialize the VertexAI vectorizer.
-
-* **Parameters:**
-  * **model** (*str*) – Model to use for embedding. Defaults to
-    ‘textembedding-gecko’.
-  * **api_config** (*Optional* *[**Dict* *]* *,* *optional*) – Dictionary containing the
-    GCP project ID, location, and application credentials. Defaults to None.
-* **Raises:**
-  * **ImportError** – If the google-cloud-aiplatform library is not installed.
-  * **ValueError** – If the required GCP configuration is not provided.
-
-### embed(text, preprocess=None, as_buffer=False, \*\*kwargs)
-
-Embed a chunk of text using the VertexAI API.
-
-* **Parameters:**
-  * **text** (*str*) – Chunk of text to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing callable to
-    perform before vectorization. Defaults to None.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  Embedding.
-* **Return type:**
-  List[float]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-### embed_many(texts, preprocess=None, batch_size=10, as_buffer=False, \*\*kwargs)
-
-Embed many chunks of text using the VertexAI API.
-
-* **Parameters:**
-  * **texts** (*List* *[**str* *]*) – List of text chunks to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing callable to
-    perform before vectorization. Defaults to None.
-  * **batch_size** (*int* *,* *optional*) – Batch size of texts to use when creating
-    embeddings. Defaults to 10.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-* **Returns:**
-  List of embeddings.
-* **Return type:**
-  List[List[float]]
-* **Raises:**
-  **TypeError** – If the wrong input type is passed in for the text.
-
-## CohereTextVectorizer
-
-
-
-### *class* CohereTextVectorizer(model='embed-english-v3.0', api_config=None)
-
-Bases: `BaseVectorizer`
-
-The CohereTextVectorizer class utilizes Cohere’s API to generate
-embeddings for text data.
-
-This vectorizer is designed to interact with Cohere’s /embed API,
-requiring an API key for authentication. The key can be provided
-directly in the api_config dictionary or through the COHERE_API_KEY
-environment variable. Users must obtain an API key from Cohere’s website
-([https://dashboard.cohere.com/](https://dashboard.cohere.com/)). Additionally, the cohere python
-client must be installed with pip install cohere.
-
-The vectorizer supports only synchronous operations, allowing for batch
-processing of texts and flexibility in handling preprocessing tasks.
-
-```python
-from redisvl.utils.vectorize import CohereTextVectorizer
-
-vectorizer = CohereTextVectorizer(
-    model="embed-english-v3.0",
-    api_config={"api_key": "your-cohere-api-key"} # OR set COHERE_API_KEY in your env
-)
-query_embedding = vectorizer.embed(
-    text="your input query text here",
-    input_type="search_query"
-)
-doc_embeddings = vectorizer.embed_many(
-    texts=["your document text", "more document text"],
-    input_type="search_document"
-)
-```
-
-Initialize the Cohere vectorizer.
-
-Visit [https://cohere.ai/embed](https://cohere.ai/embed) to learn about embeddings.
-
-* **Parameters:**
-  * **model** (*str*) – Model to use for embedding. Defaults to ‘embed-english-v3.0’.
-  * **api_config** (*Optional* *[**Dict* *]* *,* *optional*) – Dictionary containing the API key.
-    Defaults to None.
-* **Raises:**
-  * **ImportError** – If the cohere library is not installed.
-  * **ValueError** – If the API key is not provided.
-
-### embed(text, preprocess=None, as_buffer=False, \*\*kwargs)
-
-Embed a chunk of text using the Cohere Embeddings API.
-
-Must provide the embedding input_type as a kwarg to this method
-that specifies the type of input you’re giving to the model.
-
-Supported input types:
-
-- `search_document`: Used for embeddings stored in a vector database for search use-cases.
-- `search_query`: Used for embeddings of search queries run against a vector DB to find relevant documents.
-- `classification`: Used for embeddings passed through a text classifier.
-- `clustering`: Used for embeddings run through a clustering algorithm.
-
-When hydrating your Redis DB, the documents you want to search over
-should be embedded with input_type="search_document", and when you are
-querying the database, you should set input_type="search_query".
-If you want to use the embeddings for a classification or clustering
-task downstream, you should set input_type="classification" or
-input_type="clustering".
-
-* **Parameters:**
-  * **text** (*str*) – Chunk of text to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing callable to
-    perform before vectorization. Defaults to None.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-  * **input_type** (*str*) – Specifies the type of input passed to the model.
-    Required for embedding models v3 and higher.
-* **Returns:**
-  Embedding.
-* **Return type:**
-  List[float]
-* **Raises:**
-  **TypeError** – If an invalid input_type is provided.
-
-### embed_many(texts, preprocess=None, batch_size=10, as_buffer=False, \*\*kwargs)
-
-Embed many chunks of text using the Cohere Embeddings API.
-
-Must provide the embedding input_type as a kwarg to this method
-that specifies the type of input you’re giving to the model.
-
-Supported input types:
-
-- `search_document`: Used for embeddings stored in a vector database for search use-cases.
-- `search_query`: Used for embeddings of search queries run against a vector DB to find relevant documents.
-- `classification`: Used for embeddings passed through a text classifier.
-- `clustering`: Used for embeddings run through a clustering algorithm.
-
-When hydrating your Redis DB, the documents you want to search over
-should be embedded with input_type="search_document", and when you are
-querying the database, you should set input_type="search_query".
-If you want to use the embeddings for a classification or clustering
-task downstream, you should set input_type="classification" or
-input_type="clustering".
-
-* **Parameters:**
-  * **texts** (*List* *[**str* *]*) – List of text chunks to embed.
-  * **preprocess** (*Optional* *[**Callable* *]* *,* *optional*) – Optional preprocessing callable to
-    perform before vectorization.
Defaults to None.
-  * **batch_size** (*int* *,* *optional*) – Batch size of texts to use when creating
-    embeddings. Defaults to 10.
-  * **as_buffer** (*bool* *,* *optional*) – Whether to convert the raw embedding
-    to a byte string. Defaults to False.
-  * **input_type** (*str*) – Specifies the type of input passed to the model.
-    Required for embedding models v3 and higher.
-* **Returns:**
-  List of embeddings.
-* **Return type:**
-  List[List[float]]
-* **Raises:**
-  **TypeError** – If an invalid input_type is provided.
diff --git a/content/integrate/redisvl/docs.md b/content/integrate/redisvl/docs.md
new file mode 100644
index 0000000000..b2fe8f5280
--- /dev/null
+++ b/content/integrate/redisvl/docs.md
@@ -0,0 +1,9 @@
+---
+description: RedisVL documentation
+linkTitle: Documentation
+title: Documentation
+type: integration
+weight: 3
+---
+
+Full documentation for RedisVL, including a [getting started guide](https://docs.redisvl.com/en/stable/user_guide/getting_started_01.html), is available [here](https://docs.redisvl.com/en/stable/).
\ No newline at end of file
diff --git a/content/integrate/redisvl/install.md b/content/integrate/redisvl/install.md
new file mode 100644
index 0000000000..6c66c51858
--- /dev/null
+++ b/content/integrate/redisvl/install.md
@@ -0,0 +1,24 @@
+---
+aliases:
+- /integrate/redisvl/overview/install
+description: Install RedisVL
+linkTitle: Install
+title: Install
+type: integration
+weight: 2
+---
+## Installation
+
+Install the `redisvl` package into your Python (>=3.8) environment using the `pip` command:
+
+```shell
+pip install redisvl
+```
+
+Then make sure you have a Redis instance with the Redis Query Engine features enabled, either on Redis Cloud or locally in Docker with Redis Stack:
+
+```shell
+docker run -d --name redis -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
+```
+
+After running the previous command, the Redis Insight GUI will be available at http://localhost:8001.
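The Python version requirement above can also be verified programmatically before installing; a minimal sketch:

```python
import sys

# redisvl requires Python 3.8 or newer (see the pip instructions above).
MIN_VERSION = (3, 8)

if sys.version_info[:2] < MIN_VERSION:
    raise SystemExit(
        f"redisvl needs Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+, "
        f"found {'.'.join(map(str, sys.version_info[:3]))}"
    )

print("Python version OK:", ".".join(map(str, sys.version_info[:3])))
```

Failing fast like this in a setup script avoids a more confusing error later in the installation.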
diff --git a/content/integrate/redisvl/overview/_index.md b/content/integrate/redisvl/overview/_index.md deleted file mode 100644 index 998659f5f4..0000000000 --- a/content/integrate/redisvl/overview/_index.md +++ /dev/null @@ -1,19 +0,0 @@ ---- -description: An overview of RedisVL -linkTitle: Overview -title: RedisVL Overview -type: integration -weight: 1 ---- - -RedisVL provides the following features: - -- **Index management** - Design search schema and indexes with ease from YAML, with Python, or from the CLI. - -- **Advanced vector search** - Perform powerful vector searches with complex filtering support. - -- **Embedding creation** - Use OpenAI or any of the other supported vectorizers to create embeddings. - -- **CLI** - Interact with RedisVL using a command line interface (CLI) for ease of use. - -- **Semantic caching** - Use RedisVL to cache large language model (LLM) results, increasing queries per second (QPS) and decreasing cost. diff --git a/content/integrate/redisvl/overview/cli.md b/content/integrate/redisvl/overview/cli.md deleted file mode 100644 index 18cf01e0c7..0000000000 --- a/content/integrate/redisvl/overview/cli.md +++ /dev/null @@ -1,146 +0,0 @@ ---- -description: How to use RedisVL's CLI -title: RedisVL CLI -type: integration ---- -RedisVL is a Python library with a dedicated CLI to help load and create vector search indexes within Redis. - -This document will walk through how to use the RedisVL CLI (`rvl`). - -{{< note >}} -This document is a converted form of [this Jupyter notebook](https://github.com/RedisVentures/redisvl/blob/main/docs/user_guide/cli.ipynb). -{{< /note >}} - -Before beginning, be sure of the following: - -1. You have installed RedisVL and have that environment activated. -1. You have a running Redis instance with the Redis Query Engine capability. 
-
-```bash
-# First, see if the rvl tool is installed
-$ rvl version
-11:13:52 [RedisVL] INFO RedisVL version 0.1.2
-```
-
-## Index
-
-The `rvl index` command can be used for a number of tasks related to creating and managing indexes. Whether you are working in Bash or another shell, this CLI tool can still be useful for managing and inspecting your indexes.
-
-First, create an index from a YAML schema that looks like the following:
-
-```yaml
-version: '0.1.0'
-
-index:
-  name: vectorizers
-  prefix: doc
-  storage_type: hash
-
-fields:
-  - name: sentence
-    type: text
-  - name: embedding
-    type: vector
-    attrs:
-      dims: 768
-      algorithm: flat
-      distance_metric: cosine
-```
-
-```bash
-# Create an index from a yaml schema
-$ rvl index create -s schema.yaml
-11:13:54 [RedisVL] INFO Index created successfully
-```
-
-```bash
-# List the indexes that are available
-$ rvl index listall
-11:13:56 [RedisVL] INFO Indices:
-11:13:56 [RedisVL] INFO 1. vectorizers
-```
-
-```bash
-# inspect the index fields
-$ rvl index info -i vectorizers
-
-Index Information:
-╭──────────────┬────────────────┬────────────┬─────────────────┬────────────╮
-│ Index Name   │ Storage Type   │ Prefixes   │ Index Options   │ Indexing   │
-├──────────────┼────────────────┼────────────┼─────────────────┼────────────┤
-│ vectorizers  │ HASH           │ ['doc']    │ []              │ 0          │
-╰──────────────┴────────────────┴────────────┴─────────────────┴────────────╯
-Index Fields:
-╭───────────┬─────────────┬────────┬────────────────┬────────────────╮
-│ Name      │ Attribute   │ Type   │ Field Option   │ Option Value   │
-├───────────┼─────────────┼────────┼────────────────┼────────────────┤
-│ sentence  │ sentence    │ TEXT   │ WEIGHT         │ 1              │
-│ embedding │ embedding   │ VECTOR │                │                │
-╰───────────┴─────────────┴────────┴────────────────┴────────────────╯
-```
-
-```bash
-# delete an index without deleting the data within it
-$ rvl index delete -i vectorizers
-11:13:59 [RedisVL] INFO Index deleted successfully
-```
-
-```bash
-# view the index
-$ rvl index listall
-11:14:00 [RedisVL] INFO Indices:
-```
-
-## Stats
-
-The `rvl stats` command will return some basic information about an index. This is useful for checking the status of an index, or for getting information about the index to use in other commands.
-
-
-```bash
-# create a new index with the same schema
-$ rvl index create -s schema.yaml
-11:14:02 [RedisVL] INFO Index created successfully
-```
-
-```bash
-# list the indexes that are available
-$ rvl index listall
-11:14:03 [RedisVL] INFO Indices:
-11:14:03 [RedisVL] INFO 1. vectorizers
-```
-
-```bash
-# see all the stats for the index
-$ rvl stats -i vectorizers
-
-Statistics:
-╭─────────────────────────────┬────────────╮
-│ Stat Key                    │ Value      │
-├─────────────────────────────┼────────────┤
-│ num_docs                    │ 0          │
-│ num_terms                   │ 0          │
-│ max_doc_id                  │ 0          │
-│ num_records                 │ 0          │
-│ percent_indexed             │ 1          │
-│ hash_indexing_failures      │ 0          │
-│ number_of_uses              │ 1          │
-│ bytes_per_record_avg        │ nan        │
-│ doc_table_size_mb           │ 0          │
-│ inverted_sz_mb              │ 0          │
-│ key_table_size_mb           │ 0          │
-│ offset_bits_per_record_avg  │ nan        │
-│ offset_vectors_sz_mb        │ 0          │
-│ offsets_per_term_avg        │ nan        │
-│ records_per_doc_avg         │ nan        │
-│ sortable_values_size_mb     │ 0          │
-│ total_indexing_time         │ 0          │
-│ total_inverted_index_blocks │ 0          │
-│ vector_index_sz_mb          │ 0.00818634 │
-╰─────────────────────────────┴────────────╯
-```
-
-```bash
-$ rvl index destroy -i vectorizers
-
-09:00:27 [RedisVL] INFO Index deleted successfully
-```
\ No newline at end of file
diff --git a/content/integrate/redisvl/overview/install.md b/content/integrate/redisvl/overview/install.md
deleted file mode 100644
index 00afefb7bd..0000000000
--- a/content/integrate/redisvl/overview/install.md
+++ /dev/null
@@ -1,75 +0,0 @@
----
-description: How to install RedisVL
-title: Install RedisVL
-type: integration
----
-There are a few ways to install RedisVL. The easiest way is to use Python's `pip` command.
-
-## Install RedisVL with pip
-
-Install `redisvl` into your Python (>=3.8) environment using `pip`:
-
-```bash
-$ pip install -U redisvl
-```
-
-RedisVL comes with a few dependencies that are automatically installed. However, a few dependencies
-are optional and can be installed separately if needed:
-
-```bash
-$ pip install redisvl[all]  # install vectorizer dependencies
-$ pip install redisvl[dev]  # install dev dependencies
-```
-
-If you use Zsh, remember to escape the brackets:
-
-```bash
-$ pip install redisvl\[all\]
-```
-
-This library supports the use of [hiredis](https://redis.com/lp/hiredis/), so you can also install RedisVL by running:
-
-```bash
-pip install redisvl[hiredis]
-```
-
-## Install RedisVL from source
-
-To install RedisVL from source, clone the repository and install the package using `pip`:
-
-```bash
-$ git clone https://github.com/redisventures/redisvl.git && cd redisvl
-$ pip install .
-
-# or for an editable installation (for developers of RedisVL)
-$ pip install -e .
-```
-
-## Install Redis
-
-RedisVL requires a distribution of Redis that supports the [Redis Query Engine](https://redis.com/modules/redis-search/) capability, of which there are three:
-
-1. [Redis Cloud](https://redis.com/try-free), a fully managed cloud offering that you can try for free.
-2. [Redis Stack]({{< relref "/operate/oss_and_stack/install/install-stack/docker" >}}), a local docker image for testing and development.
-3. [Redis Enterprise](https://redis.com/redis-enterprise/), a commercial self-hosted offering.
-
-### Redis Cloud
-
-Redis Cloud is the easiest way to get started with RedisVL. You can sign up for a free account [here](https://redis.com/try-free). Make sure to have the **Search and Query** capability enabled when creating your database.
-
-### Redis Stack (local development)
-
-For local development and testing, Redis Stack can be used. We recommend running Redis
-in a docker container.
To do so, run the following command: - -```bash -docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest -``` - -This will also start the [Redis Insight application](https://redis.com/redis-enterprise/redis-insight/) at `http://localhost:8001`. - -### Redis Enterprise (self-hosted) - -Redis Enterprise is a commercial offering that can be self-hosted. You can download the latest version [here](https://redis.io/downloads/#software). - -If you are considering a self-hosted Redis Enterprise deployment on Kubernetes, there is the [Redis Enterprise Operator](https://docs.redis.com/latest/kubernetes/) for Kubernetes. This will allow you to easily deploy and manage a Redis Enterprise cluster on Kubernetes. \ No newline at end of file diff --git a/content/integrate/redisvl/user-guide/_index.md b/content/integrate/redisvl/user-guide/_index.md deleted file mode 100644 index 68e905311d..0000000000 --- a/content/integrate/redisvl/user-guide/_index.md +++ /dev/null @@ -1,8 +0,0 @@ ---- -description: The RedisVL user guide -linkTitle: User guide -title: RedisVL user guide -type: integration -weight: 2 ---- - diff --git a/content/integrate/redisvl/user-guide/get-started.md b/content/integrate/redisvl/user-guide/get-started.md deleted file mode 100644 index cd8a5cbbb9..0000000000 --- a/content/integrate/redisvl/user-guide/get-started.md +++ /dev/null @@ -1,375 +0,0 @@ ---- -description: Get started with RedisVL -linkTitle: Get started -title: Get started -type: integration -weight: 2 ---- -RedisVL is a versatile Python library with an integrated CLI, which is designed to enhance AI applications implemented using Redis. This guide will walk you through the following steps: - -1. Define an `IndexSchema`. -2. Prepare a sample dataset. -3. Create a `SearchIndex` object. -4. Test `rvl` CLI functionality. -5. Load the sample data. -6. Build `VectorQuery` objects and execute searches. -7. Update a `SearchIndex` object. 
-
-{{< note >}}
-This document is a converted form of [this Jupyter notebook](https://github.com/RedisVentures/redisvl/blob/main/docs/user_guide/getting_started_01.ipynb).
-{{< /note >}}
-
-Before beginning, be sure of the following:
-
-1. You have installed RedisVL and have that environment activated.
-1. You have a running Redis instance with the Redis Query Engine capability.
-
-## Define an `IndexSchema`
-
-The `IndexSchema` maintains crucial index configuration and field definitions to
-enable search with Redis. For ease of use, the schema can be constructed from a
-Python dictionary or a YAML file.
-
-### Example schema creation
-
-Consider a dataset with user information, including `job`, `age`, `credit_score`,
-and a three-dimensional `user_embedding` vector.
-
-You must decide on a Redis index name and key prefix to use for this
-dataset. Below are example schema definitions in both YAML and Python `dict` formats.
-
-**YAML definition:**
-
-```yaml
-version: '0.1.0'
-
-index:
-  name: user_simple
-  prefix: user_simple_docs
-
-fields:
-  - name: user
-    type: tag
-  - name: credit_score
-    type: tag
-  - name: job
-    type: text
-  - name: age
-    type: numeric
-  - name: user_embedding
-    type: vector
-    attrs:
-      algorithm: flat
-      dims: 3
-      distance_metric: cosine
-      datatype: float32
-```
-
-Store this information in a local file, such as `schema.yaml`, for use with RedisVL.
-
-**Python dictionary:**
-
-```python
-schema = {
-    "index": {
-        "name": "user_simple",
-        "prefix": "user_simple_docs",
-    },
-    "fields": [
-        {"name": "user", "type": "tag"},
-        {"name": "credit_score", "type": "tag"},
-        {"name": "job", "type": "text"},
-        {"name": "age", "type": "numeric"},
-        {
-            "name": "user_embedding",
-            "type": "vector",
-            "attrs": {
-                "dims": 3,
-                "distance_metric": "cosine",
-                "algorithm": "flat",
-                "datatype": "float32"
-            }
-        }
-    ]
-}
-```
-
-## Sample dataset preparation
-
-Below, create a mock dataset with `user`, `job`, `age`, `credit_score`, and
-`user_embedding` fields.
The `user_embedding` vectors are synthetic examples -for demonstration purposes. - -For more information on creating real-world embeddings, refer to this -[article](https://mlops.community/vector-similarity-search-from-basics-to-production/). - -```python -import numpy as np - -data = [ - { - 'user': 'john', - 'age': 1, - 'job': 'engineer', - 'credit_score': 'high', - 'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes() - }, - { - 'user': 'mary', - 'age': 2, - 'job': 'doctor', - 'credit_score': 'low', - 'user_embedding': np.array([0.1, 0.1, 0.5], dtype=np.float32).tobytes() - }, - { - 'user': 'joe', - 'age': 3, - 'job': 'dentist', - 'credit_score': 'medium', - 'user_embedding': np.array([0.9, 0.9, 0.1], dtype=np.float32).tobytes() - } -] -``` - -As seen above, the sample `user_embedding` vectors are converted into bytes using the `NumPy` Python package. - -## Create a `SearchIndex` - -With the schema and sample dataset ready, create a `SearchIndex`: - -```python -from redisvl.index import SearchIndex - -index = SearchIndex.from_dict(schema) -# or use .from_yaml('schema_file.yaml') -``` - -Now we also need to create a Redis connection. There are a few ways to do this: - -- Create and manage your own client connection (recommended). -- Provide a simple Redis URL and let RedisVL connect on your behalf. 
- -### Bring your own Redis connection instance - -This is ideal in scenarios where you have custom settings on the connection instance or if your application will share a connection pool: - -```python -from redis import Redis - -client = Redis.from_url("redis://localhost:6379") - -index.set_client(client) -# optionally provide an async Redis client object to enable async index operations -``` - -### Let the index manage the connection instance - -This is ideal for simple cases: - -```python -index.connect("redis://localhost:6379") -# optionally use an async client by passing use_async=True -``` - -### Create the underlying index - -Now that there's a connection to Redis, run the create command. - -```python -index.create(overwrite=True) -``` - -Note: at this point, the index has no associated data. Data loading follows. - -## Inspect with the `rvl` command - -Use the `rvl` CLI command to inspect the newly-created index and its fields: - -```python -$ rvl index listall -18:25:34 [RedisVL] INFO Indices: -18:25:34 [RedisVL] INFO 1. 
user_simple -``` - -```python -$ rvl index info -i user_simple - -╭──────────────┬────────────────┬──────────────────────┬─────────────────┬────────────╮ -│ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │ -├──────────────┼────────────────┼──────────────────────┼─────────────────┼────────────┤ -│ user_simple │ HASH │ ['user_simple_docs'] │ [] │ 0 │ -╰──────────────┴────────────────┴──────────────────────┴─────────────────┴────────────╯ -Index Fields: -╭────────────────┬────────────────┬─────────┬────────────────┬────────────────╮ -│ Name │ Attribute │ Type │ Field Option │ Option Value │ -├────────────────┼────────────────┼─────────┼────────────────┼────────────────┤ -│ user │ user │ TAG │ SEPARATOR │ , │ -│ credit_score │ credit_score │ TAG │ SEPARATOR │ , │ -│ job │ job │ TEXT │ WEIGHT │ 1 │ -│ age │ age │ NUMERIC │ │ │ -│ user_embedding │ user_embedding │ VECTOR │ │ │ -╰────────────────┴────────────────┴─────────┴────────────────┴────────────────╯ -``` - -## Load data to `SearchIndex` - -Load the sample dataset to Redis: - -```python -keys = index.load(data) - -print(keys) - -['user:31d4f3c73f1a4c26b41cf0e2b8e0248a', - 'user:c9ff740437064b919245e49ef585484d', - 'user:6db5f2e09f08438785b73d8048d5350b'] -``` - -By default, `load` will create a unique Redis key as a combination of the index key `prefix` and a UUID. You can also customize the key by providing direct keys or pointing to a specified `id_field` on load. - -### Update the index with new data - -Update data using the `load` method: - -```python -# Add more data -new_data = [{ - 'user': 'tyler', - 'age': 9, - 'job': 'engineer', - 'credit_score': 'high', - 'user_embedding': np.array([0.1, 0.3, 0.5], dtype=np.float32).tobytes() -}] -keys = index.load(new_data) - -print(keys) - -['user_simple_docs:ea6e8f2f93d5447c950ccb6843627761'] -``` - -## Create `VectorQuery` objects - -Next, create a vector query object for your newly-populated index. 
This example will use a simple vector to demonstrate how vector search works. Vectors in production will likely be much larger than three floating point numbers and often require machine learning models (e.g., [Huggingface](https://huggingface.co/models) sentence transformers) or an embeddings API (e.g., Cohere and OpenAI). RedisVL provides a set of vectorizers to assist with vector creation. - -```python -from redisvl.query import VectorQuery -from jupyterutils import result_print - -query = VectorQuery( - vector=[0.1, 0.1, 0.5], - vector_field_name="user_embedding", - return_fields=["user", "age", "job", "credit_score", "vector_distance"], - num_results=3 -) -``` - -### Executing queries - -With your `VectorQuery` object defined, you can execute the query over the `SearchIndex` using the `query` method. - -```python -results = index.query(query) -result_print(results) -``` - -
vector_distanceuseragejobcredit_score
0john1engineerhigh
0mary2doctorlow
0.0566299557686tyler9engineerhigh
- -## Using an asynchronous Redis client - -The `AsyncSearchIndex` class, along with an asynchronous Redis Python client, provides for asynchronous queries, index creation, and data loading. This is the -recommended way for working with `redisvl` in production settings. - -```python -from redisvl.index import AsyncSearchIndex -from redis.asyncio import Redis - -client = Redis.from_url("redis://localhost:6379") - -index = AsyncSearchIndex.from_dict(schema) -index.set_client(client) - -# execute the vector query async -results = await index.aquery(query) -result_print(results) -``` - -
vector_distanceuseragejobcredit_score
0john1engineerhigh
0mary2doctorlow
0.0566299557686tyler9engineerhigh
- -## Update a schema - -In some scenarios, it makes sense to update the index schema. With Redis and RedisVL, this is easy because Redis can keep the underlying data in place while you update to the index configuration. - -Imagine you want to re-index this data in the following ways: - -- Use a `Tag` type for the `job` field instead of `Text`. -- Use an `HNSW` vector index for the `user_embedding` field instead of a `flat` vector index. - -```python -# Modify this schema to have what we want - -index.schema.remove_field("job") -index.schema.remove_field("user_embedding") -index.schema.add_fields([ - {"name": "job", "type": "tag"}, - { - "name": "user_embedding", - "type": "vector", - "attrs": { - "dims": 3, - "distance_metric": "cosine", - "algorithm": "flat", - "datatype": "float32" - } - } -]) - -# Run the index update but keep underlying data in place -await index.create(overwrite=True, drop=False) - -# Execute the vector query -results = await index.aquery(query) -result_print(results) -``` - -
| vector_distance | user | age | job | credit_score |
| :-- | :-- | :-- | :-- | :-- |
| 0 | john | 1 | engineer | high |
| 0 | mary | 2 | doctor | low |
| 0.0566299557686 | tyler | 9 | engineer | high |
- -## Check index stats - -Use `rvl` to check the statistics for the index: - -```python -$ rvl stats -i user_simple - -Statistics: -╭─────────────────────────────┬─────────────╮ -│ Stat Key │ Value │ -├─────────────────────────────┼─────────────┤ -│ num_docs │ 4 │ -│ num_terms │ 0 │ -│ max_doc_id │ 4 │ -│ num_records │ 20 │ -│ percent_indexed │ 1 │ -│ hash_indexing_failures │ 0 │ -│ number_of_uses │ 2 │ -│ bytes_per_record_avg │ 1 │ -│ doc_table_size_mb │ 0.00044632 │ -│ inverted_sz_mb │ 1.90735e-05 │ -│ key_table_size_mb │ 0.000165939 │ -│ offset_bits_per_record_avg │ nan │ -│ offset_vectors_sz_mb │ 0 │ -│ offsets_per_term_avg │ 0 │ -│ records_per_doc_avg │ 5 │ -│ sortable_values_size_mb │ 0 │ -│ total_indexing_time │ 0.246 │ -│ total_inverted_index_blocks │ 11 │ -│ vector_index_sz_mb │ 0.0201416 │ -╰─────────────────────────────┴─────────────╯ - -``` - -## Cleanup - -```python -# clean up the index -await index.adelete() -``` diff --git a/content/integrate/redisvl/user-guide/json-v-hashes.md b/content/integrate/redisvl/user-guide/json-v-hashes.md deleted file mode 100644 index 2b162e4859..0000000000 --- a/content/integrate/redisvl/user-guide/json-v-hashes.md +++ /dev/null @@ -1,306 +0,0 @@ ---- -description: Storing JSON and hashes with RedisVL -linkTitle: JSON vs. hash storage -title: JSON vs. hash storage -type: integration -weight: 6 ---- - -Out of the box, Redis provides a [variety of data structures](https://redis.com/redis-enterprise/data-structures/) that can be used for your domain specific applications and use cases. -In this document, you will learn how to use RedisVL with both [hash]({{< relref "/develop/data-types/hashes" >}}) and [JSON]({{< relref "/develop/data-types/json/" >}}) data. - -{{< note >}} -This document is a converted form of [this Jupyter notebook](https://github.com/RedisVentures/redisvl/blob/main/docs/user_guide/hash_vs_json_05.ipynb). -{{< /note >}} - -Before beginning, be sure of the following: - -1. 
You have installed RedisVL and have that environment activated. -1. You have a running Redis instance with the Redis Query Engine capability. - -```python -# import necessary modules -import pickle - -from redisvl.redis.utils import buffer_to_array -from jupyterutils import result_print, table_print -from redisvl.index import SearchIndex - -# load in the example data and printing utils -data = pickle.load(open("hybrid_example_data.pkl", "rb")) -``` - -```python -table_print(data) -``` - -
| user | age | job | credit_score | office_location | user_embedding |
| :-- | :-- | :-- | :-- | :-- | :-- |
| john | 18 | engineer | high | -122.4194,37.7749 | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| derrick | 14 | doctor | low | -122.4194,37.7749 | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| nancy | 94 | doctor | high | -122.4194,37.7749 | `b'333?\xcd\xcc\xcc=\x00\x00\x00?'` |
| tyler | 100 | engineer | high | -122.0839,37.3861 | `b'\xcd\xcc\xcc=\xcd\xcc\xcc>\x00\x00\x00?'` |
| tim | 12 | dermatologist | high | -122.0839,37.3861 | `b'\xcd\xcc\xcc>\xcd\xcc\xcc>\x00\x00\x00?'` |
| taimur | 15 | CEO | low | -122.0839,37.3861 | `b'\x9a\x99\x19?\xcd\xcc\xcc=\x00\x00\x00?'` |
| joe | 35 | dentist | medium | -122.0839,37.3861 | `b'fff?fff?\xcd\xcc\xcc='` |
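The `user_embedding` column above holds three FLOAT32 values packed as a little-endian byte string. As an illustration, you can decode one with the standard library (RedisVL's `buffer_to_array` helper, imported earlier in this guide, does the equivalent):

```python
import struct

raw = b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'  # john's embedding from the table

# '<3f' = three little-endian 32-bit floats
vector = list(struct.unpack('<3f', raw))
print(vector)  # approximately [0.1, 0.1, 0.5] (FLOAT32 rounding applies)
```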
- - -## Hash or JSON - how to choose? - -Both storage options offer a variety of features and tradeoffs. Below, you will work through a dummy dataset to learn when and how to use both data types. - -### Working with hashes - -Hashes in Redis are simple collections of field-value pairs. Think of it like a mutable, single-level dictionary that contains multiple "rows": - -```python -{ - "model": "Deimos", - "brand": "Ergonom", - "type": "Enduro bikes", - "price": 4972, -} -``` - -Hashes are best suited for use cases with the following characteristics: - -- Performance (speed) and storage space (memory consumption) are top concerns. -- Data can be easily normalized and modeled as a single-level dictionary. - -> Hashes are typically the default recommendation. - -```python -# define the hash index schema -hash_schema = { - "index": { - "name": "user-hash", - "prefix": "user-hash-docs", - "storage_type": "hash", # default setting -- HASH - }, - "fields": [ - {"name": "user", "type": "tag"}, - {"name": "credit_score", "type": "tag"}, - {"name": "job", "type": "text"}, - {"name": "age", "type": "numeric"}, - {"name": "office_location", "type": "geo"}, - { - "name": "user_embedding", - "type": "vector", - "attrs": { - "dims": 3, - "distance_metric": "cosine", - "algorithm": "flat", - "datatype": "float32" - } - } - ], -} -``` - -```python -# construct a search index from the hash schema -hindex = SearchIndex.from_dict(hash_schema) - -# connect to local redis instance -hindex.connect("redis://localhost:6379") - -# create the index (no data yet) -hindex.create(overwrite=True) -``` - -```python -# show the underlying storage type -hindex.storage_type - - -``` - -#### Vectors as byte strings - -One nuance when working with hashes in Redis is that all vectorized data must be passed as a byte string (for efficient storage, indexing, and processing). 
An example of this can be seen below: - - -```python -# show a single entry from the data that will be loaded -data[0] - - {'user': 'john', - 'age': 18, - 'job': 'engineer', - 'credit_score': 'high', - 'office_location': '-122.4194,37.7749', - 'user_embedding': b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'} -``` - -```python -# load hash data -keys = hindex.load(data) -``` - -```python -$ rvl stats -i user-hash - - Statistics: - ╭─────────────────────────────┬─────────────╮ - │ Stat Key │ Value │ - ├─────────────────────────────┼─────────────┤ - │ num_docs │ 7 │ - │ num_terms │ 6 │ - │ max_doc_id │ 7 │ - │ num_records │ 44 │ - │ percent_indexed │ 1 │ - │ hash_indexing_failures │ 0 │ - │ number_of_uses │ 1 │ - │ bytes_per_record_avg │ 3.40909 │ - │ doc_table_size_mb │ 0.000767708 │ - │ inverted_sz_mb │ 0.000143051 │ - │ key_table_size_mb │ 0.000248909 │ - │ offset_bits_per_record_avg │ 8 │ - │ offset_vectors_sz_mb │ 8.58307e-06 │ - │ offsets_per_term_avg │ 0.204545 │ - │ records_per_doc_avg │ 6.28571 │ - │ sortable_values_size_mb │ 0 │ - │ total_indexing_time │ 0.587 │ - │ total_inverted_index_blocks │ 18 │ - │ vector_index_sz_mb │ 0.0202332 │ - ╰─────────────────────────────┴─────────────╯ -``` - -#### Performing queries - -Once the index is created and data is loaded into the right format, you can run queries against the index: - -```python -from redisvl.query import VectorQuery -from redisvl.query.filter import Tag, Text, Num - -t = (Tag("credit_score") == "high") & (Text("job") % "enginee*") & (Num("age") > 17) - -v = VectorQuery([0.1, 0.1, 0.5], - "user_embedding", - return_fields=["user", "credit_score", "age", "job", "office_location"], - filter_expression=t) - - -results = hindex.query(v) -result_print(results) - -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
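The reported distance for `tyler` can be reproduced by hand: unpack his FLOAT32 byte string and compute one minus the cosine similarity against the query vector. This is an illustrative check, not RedisVL code:

```python
import math
import struct

query = [0.1, 0.1, 0.5]
# tyler's embedding from the sample data: approximately [0.1, 0.4, 0.5]
tyler = list(struct.unpack('<3f', b'\xcd\xcc\xcc=\xcd\xcc\xcc>\x00\x00\x00?'))

dot = sum(x * y for x, y in zip(query, tyler))
distance = 1.0 - dot / (math.sqrt(sum(x * x for x in query)) *
                        math.sqrt(sum(x * x for x in tyler)))
print(round(distance, 4))  # 0.1091, matching the vector_distance column
```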
```python
# clean up
hindex.delete()
```

### Working with JSON

Redis also supports native **JSON** objects. These can be multi-level (nested) objects, with full [JSONPath]({{< relref "/develop/data-types/json/path/" >}}) support for retrieving and updating sub-elements:

```python
{
    "name": "bike",
    "metadata": {
        "model": "Deimos",
        "brand": "Ergonom",
        "type": "Enduro bikes",
        "price": 4972,
    }
}
```

JSON is best suited for use cases with the following characteristics:

- Ease of use and data model flexibility are top concerns.
- Application data is already native JSON.
- Replacing another document storage/database solution.

#### Full JSON Path support

Because Redis enables full JSONPath support, when you create an index schema, each element must be indexed and selected by a `name` and a `path` that points to where the data is located within the JSON objects.

{{< note >}}
By default, RedisVL assumes the path is `$.{name}` if not provided in the JSON fields schema.
{{< /note >}}

```python
# define the json index schema
json_schema = {
    "index": {
        "name": "user-json",
        "prefix": "user-json-docs",
        "storage_type": "json",  # JSON storage type
    },
    "fields": [
        {"name": "user", "type": "tag"},
        {"name": "credit_score", "type": "tag"},
        {"name": "job", "type": "text"},
        {"name": "age", "type": "numeric"},
        {"name": "office_location", "type": "geo"},
        {
            "name": "user_embedding",
            "type": "vector",
            "attrs": {
                "dims": 3,
                "distance_metric": "cosine",
                "algorithm": "flat",
                "datatype": "float32"
            }
        }
    ],
}
```

```python
# construct a search index from the JSON schema
jindex = SearchIndex.from_dict(json_schema)

# connect to a local redis instance
jindex.connect("redis://localhost:6379")

# create the index (no data yet)
jindex.create(overwrite=True)
```

```python
# list the indices in the database
$ rvl index listall

20:23:08 [RedisVL] INFO   Indices:
20:23:08 [RedisVL] INFO   1. user-json
```

#### Vectors as float arrays

Vectorized data stored in JSON must be stored as a pure array (for example, a Python list) of floats. Modify your sample data to account for this below:

```python
import numpy as np

json_data = data.copy()

for d in json_data:
    d['user_embedding'] = buffer_to_array(d['user_embedding'], dtype=np.float32)
```

```python
# inspect a single JSON record
json_data[0]

{'user': 'john',
 'age': 18,
 'job': 'engineer',
 'credit_score': 'high',
 'office_location': '-122.4194,37.7749',
 'user_embedding': [0.10000000149011612, 0.10000000149011612, 0.5]}
```

```python
keys = jindex.load(json_data)
```

```python
# we can now run the exact same query as above
result_print(jindex.query(v))
```
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
- -## Cleanup - -```python -jindex.delete() -``` diff --git a/content/integrate/redisvl/user-guide/query-filter.md b/content/integrate/redisvl/user-guide/query-filter.md deleted file mode 100644 index 271f48a5ce..0000000000 --- a/content/integrate/redisvl/user-guide/query-filter.md +++ /dev/null @@ -1,540 +0,0 @@ ---- -description: Query and filter with RedisVL -linkTitle: Query and filter -title: Query and filter -type: integration -weight: 3 ---- -In this document, you will explore more complex queries that can be performed with RedisVL. - -{{< note >}} -This document is a converted form of [this Jupyter notebook](https://github.com/RedisVentures/redisvl/blob/main/docs/user_guide/hybrid_queries_02.ipynb). -{{< /note >}} - -Before beginning, be sure of the following: - -1. You have installed RedisVL and have that environment activated. -1. You have a running Redis instance with the Redis Query Engine capability. - -The sample binary data is in [this file on GitHub](https://github.com/RedisVentures/redisvl/blob/main/docs/user_guide/hybrid_example_data.pkl). - -```python -import pickle -from jupyterutils import table_print, result_print - -# load in the example data and printing utils -data = pickle.load(open("hybrid_example_data.pkl", "rb")) -table_print(data) -``` - -
| user | age | job | credit_score | office_location | user_embedding |
| :-- | :-- | :-- | :-- | :-- | :-- |
| john | 18 | engineer | high | -122.4194,37.7749 | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| derrick | 14 | doctor | low | -122.4194,37.7749 | `b'\xcd\xcc\xcc=\xcd\xcc\xcc=\x00\x00\x00?'` |
| nancy | 94 | doctor | high | -122.4194,37.7749 | `b'333?\xcd\xcc\xcc=\x00\x00\x00?'` |
| tyler | 100 | engineer | high | -122.0839,37.3861 | `b'\xcd\xcc\xcc=\xcd\xcc\xcc>\x00\x00\x00?'` |
| tim | 12 | dermatologist | high | -122.0839,37.3861 | `b'\xcd\xcc\xcc>\xcd\xcc\xcc>\x00\x00\x00?'` |
| taimur | 15 | CEO | low | -122.0839,37.3861 | `b'\x9a\x99\x19?\xcd\xcc\xcc=\x00\x00\x00?'` |
| joe | 35 | dentist | medium | -122.0839,37.3861 | `b'fff?fff?\xcd\xcc\xcc='` |
- -```python -schema = { - "index": { - "name": "user_queries", - "prefix": "user_queries_docs", - "storage_type": "hash", # default setting -- HASH - }, - "fields": [ - {"name": "user", "type": "tag"}, - {"name": "credit_score", "type": "tag"}, - {"name": "job", "type": "text"}, - {"name": "age", "type": "numeric"}, - {"name": "office_location", "type": "geo"}, - { - "name": "user_embedding", - "type": "vector", - "attrs": { - "dims": 3, - "distance_metric": "cosine", - "algorithm": "flat", - "datatype": "float32" - } - - } - ], -} -``` - -```python -from redisvl.index import SearchIndex - -# construct a search index from the schema -index = SearchIndex.from_dict(schema) - -# connect to local redis instance -index.connect("redis://localhost:6379") - -# create the index (no data yet) -index.create(overwrite=True) -``` - -```python -# inspect the newly-created index -$ rvl index listall - -18:26:34 [RedisVL] INFO Indices: -18:26:34 [RedisVL] INFO 1. user_queries -``` - -```python -keys = index.load(data) -``` - -## Hybrid queries - -Hybrid queries are queries that combine multiple types of filters. For example, you may want to search for a user that is a certain age, has a certain job, and is within a certain distance of a location. This is a hybrid query that combines numeric, tag, and geographic filters. - -### Tag filters - -Tag filters are filters that are applied to tag fields. These are fields that are not tokenized and are used to store a single categorical value. - -```python -from redisvl.query import VectorQuery -from redisvl.query.filter import Tag - -t = Tag("credit_score") == "high" - -v = VectorQuery([0.1, 0.1, 0.5], - "user_embedding", - return_fields=["user", "credit_score", "age", "job", "office_location"], - filter_expression=t) - -results = index.query(v) -result_print(results) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
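Under the hood, a tag equality filter compiles down to RediSearch tag syntax: `@field:{value}`, with `|` separating accepted alternatives. The sketch below mimics the emitted string — the single-value form is confirmed by the `str(t)` inspection later in this guide, and the helper name is purely illustrative:

```python
def tag_filter(field, values):
    # Multiple accepted values are OR-ed inside the braces.
    if isinstance(values, str):
        values = [values]
    return "@%s:{%s}" % (field, "|".join(values))

print(tag_filter("credit_score", "high"))              # @credit_score:{high}
print(tag_filter("credit_score", ["high", "medium"]))  # @credit_score:{high|medium}
```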
- -```python -# negation -t = Tag("credit_score") != "high" - -v.set_filter(t) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -```python -# use multiple tags as a list -t = Tag("credit_score") == ["high", "medium"] - -v.set_filter(t) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -```python -# use multiple tags as a set (to enforce uniqueness) -t = Tag("credit_score") == set(["high", "high", "medium"]) - -v.set_filter(t) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -What about scenarios where you might want to dynamically generate a list of tags? RedisVL allows you to do this gracefully without having to check for the empty case. The empty case is when you attempt to run a tag filter on a field with no defined values to match. For example: - -`Tag("credit_score") == []` - -An empty filter like the one above will yield a `*` Redis query filter that implies the base case: no filter. - -```python -# gracefully fallback to "*" filter if empty case -empty_case = Tag("credit_score") == [] - -v.set_filter(empty_case) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -### Numeric filters - -Numeric filters are filters that are applied to numeric fields and can be used to isolate a range of values for a given field. - -```python -from redisvl.query.filter import Num - -numeric_filter = Num("age") > 15 - -v.set_filter(numeric_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -```python -# exact match query -numeric_filter = Num("age") == 14 - -v.set_filter(numeric_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
- -```python -# negation -numeric_filter = Num("age") != 14 - -v.set_filter(numeric_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
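Numeric filters compile to RediSearch range syntax, with `+inf` and `-inf` for open ends. The helper below is only an illustration of the emitted strings; the exact forms `@age:[18 +inf]` and `@age:[-inf 100]` reappear in the query-inspection section at the end of this guide:

```python
def num_range(field, low="-inf", high="+inf"):
    # Inclusive numeric range, e.g. @age:[18 +inf] for age >= 18.
    return "@%s:[%s %s]" % (field, low, high)

print(num_range("age", low=18))    # @age:[18 +inf]
print(num_range("age", high=100))  # @age:[-inf 100]
```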
- -### Text filters - -Text filters are filters that are applied to text fields. These filters are applied to the entire text field. For example, if you have a text field that contains the text "The quick brown fox jumps over the lazy dog", a text filter of "quick" will match this text field. - -```python -from redisvl.query.filter import Text - -# exact match filter -- document must contain the exact word doctor -text_filter = Text("job") == "doctor" - -v.set_filter(text_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
- -```python -# negation -- document must not contain the exact word doctor -negate_text_filter = Text("job") != "doctor" - -v.set_filter(negate_text_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -```python -# wildcard match filter -wildcard_filter = Text("job") % "doct*" - -v.set_filter(wildcard_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
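The `%` operator with a trailing `*` behaves like shell-style wildcard matching on the field's terms. As a rough, pure-Python illustration of which job values `"doct*"` selects (the real matching happens inside Redis):

```python
from fnmatch import fnmatch

jobs = ["engineer", "doctor", "dermatologist", "CEO", "dentist"]

# Shell-style globbing: "doct*" matches "doctor" but not "dentist"
matches = [job for job in jobs if fnmatch(job, "doct*")]
print(matches)  # ['doctor']
```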
- -```python -# fuzzy match filter -fuzzy_match = Text("job") % "%%engine%%" - -v.set_filter(fuzzy_match) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
- -```python -# conditional -- match documents with job field containing engineer OR doctor -conditional = Text("job") % "engineer|doctor" - -v.set_filter(conditional) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
- -```python -# gracefully fallback to "*" filter if empty case -empty_case = Text("job") % "" - -v.set_filter(empty_case) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -### Geographic filters - -Geographic filters are filters that are applied to geographic fields. These filters are used to find results that are within a certain distance of a given point. The distance is specified in kilometers, miles, meters, or feet. A radius can also be specified to find results within a certain radius of a given point. - -```python -from redisvl.query.filter import Geo, GeoRadius - -# within 10 km of San Francisco office -geo_filter = Geo("office_location") == GeoRadius(-122.4194, 37.7749, 10, "km") - -v.set_filter(geo_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
- -```python -# within 100 km Radius of San Francisco office -geo_filter = Geo("office_location") == GeoRadius(-122.4194, 37.7749, 100, "km") - -v.set_filter(geo_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -```python -# not within 10 km Radius of San Francisco office -geo_filter = Geo("office_location") != GeoRadius(-122.4194, 37.7749, 10, "km") - -v.set_filter(geo_filter) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
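To see why the `-122.0839,37.3861` records fall outside the 10 km radius but inside the 100 km one, a rough great-circle (haversine) check is enough. This is an illustrative calculation, not part of RedisVL:

```python
import math

def haversine_km(lon1, lat1, lon2, lat2):
    # Great-circle distance between two (lon, lat) points in kilometers.
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = (math.sin(dlat / 2) ** 2 +
         math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(a))

sf_office = (-122.4194, 37.7749)
south_bay_office = (-122.0839, 37.3861)

d = haversine_km(*sf_office, *south_bay_office)
print(round(d))  # about 52 km: outside a 10 km radius, inside a 100 km one
```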
## Combining filters

In this example, you will combine a numeric filter with a tag filter, searching for users who have a high credit score and are between the ages of 18 and 100.

### Intersection ("and")

The intersection of two filters is performed using the `&` operator.

```python
t = Tag("credit_score") == "high"
low = Num("age") >= 18
high = Num("age") <= 100

combined = t & low & high

v = VectorQuery([0.1, 0.1, 0.5],
                "user_embedding",
                return_fields=["user", "credit_score", "age", "job", "office_location"],
                filter_expression=combined)

result_print(index.query(v))
```
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
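The combined expression compiles down to nested RediSearch clauses. The toy helper below (an illustration only, not RedisVL internals) reproduces the filter string that the `str(v)` inspection reveals at the end of this guide:

```python
def intersect(a, b):
    # AND in RediSearch syntax: two clauses side by side, parenthesized.
    return "(%s %s)" % (a, b)

t = "@credit_score:{high}"
low = "@age:[18 +inf]"
high = "@age:[-inf 100]"

combined = intersect(intersect(t, low), high)
print(combined)  # ((@credit_score:{high} @age:[18 +inf]) @age:[-inf 100])
```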
- -### Union ("or") - -The union of two queries is the set of all results that are returned by either of the two queries. The union of two queries is performed using the `|` operator. - -```python -low = Num("age") < 18 -high = Num("age") > 93 - -combined = low | high - -v.set_filter(combined) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
### Dynamic combination

There are often situations where you may or may not want to use a filter in a given query. As shown above, filters accept the `None` type and revert to a wildcard filter that returns all results.

The same goes for filter combinations, which enable rapid reuse of filters in requests with different parameters, as shown below. This removes the need for a number of "if-then" conditionals to test for the empty case.

```python
def make_filter(age=None, credit=None, job=None):
    flexible_filter = (
        (Num("age") > age) &
        (Tag("credit_score") == credit) &
        (Text("job") % job)
    )
    return flexible_filter
```
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
- -```python -# just age and credit_score -combined = make_filter(age=18, credit="high") -v.set_filter(combined) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
- -```python -# just age -combined = make_filter(age=18) -v.set_filter(combined) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
- -```python -# no filters -combined = make_filter() -v.set_filter(combined) -result_print(index.query(v)) -``` - -
| vector_distance | user | credit_score | age | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer | -122.4194,37.7749 |
| 0 | derrick | low | 14 | doctor | -122.4194,37.7749 |
| 0.109129190445 | tyler | high | 100 | engineer | -122.0839,37.3861 |
| 0.158808946609 | tim | high | 12 | dermatologist | -122.0839,37.3861 |
| 0.217882037163 | taimur | low | 15 | CEO | -122.0839,37.3861 |
| 0.266666650772 | nancy | high | 94 | doctor | -122.4194,37.7749 |
| 0.653301358223 | joe | medium | 35 | dentist | -122.0839,37.3861 |
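Conceptually, each omitted parameter contributes a wildcard (`*`) sub-filter, and AND-ing anything with a wildcard leaves the other side unchanged — so a fully empty combination collapses back to `*`. A toy model of that collapse (not RedisVL's implementation):

```python
def combine_and(*filters):
    # Drop wildcard sub-filters; an empty combination collapses back to "*".
    active = [f for f in filters if f != "*"]
    if not active:
        return "*"
    return "(" + " ".join(active) + ")"

age = "@age:[18 +inf]"  # supplied parameter
credit = "*"            # parameter omitted -> wildcard

print(combine_and(age, credit))  # (@age:[18 +inf])
print(combine_and("*", "*"))     # *
```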
## Filter queries

In some cases, you may not want to run a vector query, but just use a `FilterExpression`, similar to a SQL query. The `FilterQuery` class enables this functionality. It is similar to the `VectorQuery` class, but it solely takes a `FilterExpression`.

```python
from redisvl.query import FilterQuery

has_low_credit = Tag("credit_score") == "low"

filter_query = FilterQuery(
    return_fields=["user", "credit_score", "age", "job", "location"],
    filter_expression=has_low_credit
)

results = index.query(filter_query)

result_print(results)
```
| user | credit_score | age | job |
| :-- | :-- | :-- | :-- |
| derrick | low | 14 | doctor |
| taimur | low | 15 | CEO |
## Count queries

In some cases, you may need to use a `FilterExpression` to execute a `CountQuery`, which simply returns the number of entities that match the filter. It is similar to the `FilterQuery` class but does not return the values of the underlying data.

```python
from redisvl.query import CountQuery

has_low_credit = Tag("credit_score") == "low"

filter_query = CountQuery(filter_expression=has_low_credit)

count = index.query(filter_query)

print(f"{count} records match the filter expression {str(has_low_credit)} for the given index.")

2 records match the filter expression @credit_score:{low} for the given index.
```

## Range queries

Range queries are useful for performing a vector search where only the results within a vector `distance_threshold` are returned. This enables you to find all records in the dataset that are similar to a query vector, where "similar" is defined by a quantitative value.

```python
from redisvl.query import RangeQuery

range_query = RangeQuery(
    vector=[0.1, 0.1, 0.5],
    vector_field_name="user_embedding",
    return_fields=["user", "credit_score", "age", "job", "location"],
    distance_threshold=0.2
)

# same as the vector query or filter query
results = index.query(range_query)

result_print(results)
```
| vector_distance | user | credit_score | age | job |
| :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer |
| 0 | derrick | low | 14 | doctor |
| 0.109129190445 | tyler | high | 100 | engineer |
| 0.158808946609 | tim | high | 12 | dermatologist |
You can also change the distance threshold of a query object between uses. Here, you will set the `distance_threshold` to 0.1, which means the query will return only matches within a vector distance of 0.1 from the query vector. This is a small distance, so expect fewer matches than before.

```python
range_query.set_distance_threshold(0.1)

result_print(index.query(range_query))
```
| vector_distance | user | credit_score | age | job |
| :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer |
| 0 | derrick | low | 14 | doctor |
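The effect of the threshold can be reproduced from the `vector_distance` values reported in earlier queries: a threshold of 0.2 keeps four records, while 0.1 keeps only the two exact matches. An illustrative check:

```python
# vector_distance values for the seven records, as reported in earlier queries
distances = [0.0, 0.0, 0.109129190445, 0.158808946609,
             0.217882037163, 0.266666650772, 0.653301358223]

def within(threshold):
    # Keep only records inside the distance threshold.
    return [d for d in distances if d <= threshold]

print(len(within(0.2)))  # 4 records, as in the first range query
print(len(within(0.1)))  # 2 records, after lowering the threshold
```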
- -Range queries can also be used with filters like any other query type. The following limits the results to only those records with a `job` of `engineer` while also being within the vector range (i.e., distance). - -```python -is_engineer = Text("job") == "engineer" - -range_query.set_filter(is_engineer) - -result_print(index.query(range_query)) -``` - -
| vector_distance | user | credit_score | age | job |
| :-- | :-- | :-- | :-- | :-- |
| 0 | john | high | 18 | engineer |
- -## Other Redis queries - -There may be cases where RedisVL does not cover the explicit functionality required by the query, either because of new releases that haven't been implemented in the client, or because of a very specific use case. In these cases, it is possible to use the `SearchIndex.search` method to execute queries with a redis-py `Query` object or through a raw Redis string. - -### redis-py - -```python -# Manipulate the redis-py Query object -redis_py_query = v.query - -# choose to sort by age instead of vector distance -redis_py_query.sort_by("age", asc=False) - -# run the query with the ``SearchIndex.search`` method -result = index.search(redis_py_query, v.params) -result_print(result) -``` - -
| vector_distance | age | user | credit_score | job | office_location |
| :-- | :-- | :-- | :-- | :-- | :-- |
| 0.109129190445 | 100 | tyler | high | engineer | -122.0839,37.3861 |
| 0.266666650772 | 94 | nancy | high | doctor | -122.4194,37.7749 |
| 0.653301358223 | 35 | joe | medium | dentist | -122.0839,37.3861 |
| 0 | 18 | john | high | engineer | -122.4194,37.7749 |
| 0.217882037163 | 15 | taimur | low | CEO | -122.0839,37.3861 |
| 0 | 14 | derrick | low | doctor | -122.4194,37.7749 |
| 0.158808946609 | 12 | tim | high | dermatologist | -122.0839,37.3861 |
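The `sort_by("age", asc=False)` call replaces distance ordering with a server-side sort on the `age` field. The equivalent post-hoc sort in plain Python (shown for illustration only; Redis performs this on the server) yields the same ordering as the table above:

```python
results = [
    {"user": "john", "age": 18}, {"user": "derrick", "age": 14},
    {"user": "nancy", "age": 94}, {"user": "tyler", "age": 100},
    {"user": "tim", "age": 12}, {"user": "taimur", "age": 15},
    {"user": "joe", "age": 35},
]

# Sort by age, descending -- the same order as the table above
by_age = sorted(results, key=lambda r: r["age"], reverse=True)
print([r["user"] for r in by_age])
# ['tyler', 'nancy', 'joe', 'john', 'taimur', 'derrick', 'tim']
```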
### Raw Redis query string

One case might be where you want a search that only filters on a tag field and doesn't need other functionality. Conversely, you may require a query that is more complex than what is currently supported by RedisVL. In these cases, you can use the `SearchIndex.search` method with a raw Redis query string.

```python
t = Tag("credit_score") == "high"

str(t)

'@credit_score:{high}'
```

```python
results = index.search(str(t))
for r in results.docs:
    print(r.__dict__)

{'id': 'user_queries_docs:0e511391dcf346639669bdba70a189c0', 'payload': None, 'user': 'john', 'age': '18', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '==\x00\x00\x00?'}
{'id': 'user_queries_docs:d204e8e5df90467dbff5b2fb6f800a78', 'payload': None, 'user': 'nancy', 'age': '94', 'job': 'doctor', 'credit_score': 'high', 'office_location': '-122.4194,37.7749', 'user_embedding': '333?=\x00\x00\x00?'}
{'id': 'user_queries_docs:7cf3d6b1a4044966b4f0c5d3725a5e03', 'payload': None, 'user': 'tyler', 'age': '100', 'job': 'engineer', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '=>\x00\x00\x00?'}
{'id': 'user_queries_docs:f6581edaaeaf432a85c1d1df8fdf5edc', 'payload': None, 'user': 'tim', 'age': '12', 'job': 'dermatologist', 'credit_score': 'high', 'office_location': '-122.0839,37.3861', 'user_embedding': '>>\x00\x00\x00?'}
```

## Inspecting queries

In this example, you will learn how to inspect the query that RedisVL generates. This can be useful for debugging or for understanding how the query is executed.

Consider an example query that combines a numeric filter with a tag filter. This will search for users who are between the ages of 18 and 100 and have a high credit score, sorting the results by closest vector distance to the query vector.
- -```python -t = Tag("credit_score") == "high" -low = Num("age") >= 18 -high = Num("age") <= 100 - -combined = t & low & high - -v.set_filter(combined) - -# Using the str() method, you can see what Redis Query this will emit. -str(v) - -'((@credit_score:{high} @age:[18 +inf]) @age:[-inf 100])=>[KNN 10 @user_embedding $vector AS vector_distance] RETURN 6 user credit_score age job office_location vector_distance SORTBY vector_distance ASC DIALECT 2 LIMIT 0 10' -``` - -```python -# Cleanup -index.delete() -``` \ No newline at end of file diff --git a/content/integrate/redisvl/user-guide/semantic-caching.md b/content/integrate/redisvl/user-guide/semantic-caching.md deleted file mode 100644 index c5234aaa35..0000000000 --- a/content/integrate/redisvl/user-guide/semantic-caching.md +++ /dev/null @@ -1,291 +0,0 @@ ---- -description: Semantic caching with RedisVL -linkTitle: Semantic caching -title: Semantic caching -type: integration -weight: 5 ---- - -{{< note >}} -This document is a converted form of [this Jupyter notebook](https://github.com/RedisVentures/redisvl/blob/main/docs/user_guide/llmcache_03.ipynb). -{{< /note >}} - -Before beginning, be sure of the following: - -1. You have installed RedisVL and have that environment activated. -1. You have a running Redis instance with the Redis Query Engine capability. - -## Semantic caching for LLMs - -RedisVL provides a `SemanticCache` interface that uses Redis's built-in caching capabilities and vector search to store responses from previously-answered questions. This reduces the number of requests and tokens sent to LLM services, decreasing costs and enhancing application throughput by reducing the time taken to generate responses. - -This document will teach you how to use Redis as a semantic cache for your applications. - -Begin by importing [OpenAI](https://platform.openai.com) so you can use their API for responding to user prompts. You will also create a simple `ask_openai` helper method to assist. 
- -```python -import os -import getpass -import time - -from openai import OpenAI - -import numpy as np - -os.environ["TOKENIZERS_PARALLELISM"] = "False" - -api_key = os.getenv("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ") - -client = OpenAI(api_key=api_key) - -def ask_openai(question: str) -> str: - response = client.completions.create( - model="gpt-3.5-turbo-instruct", - prompt=question, - max_tokens=200 - ) - return response.choices[0].text.strip() -``` - -```python -# Test -print(ask_openai("What is the capital of France?")) -``` - - The capital of France is Paris. - -## Initializing `SemanticCache` - -Upon initialization, `SemanticCache` will automatically create an index within Redis for the semantic cache content. - -```python -from redisvl.extensions.llmcache import SemanticCache - -llmcache = SemanticCache( - name="llmcache", # underlying search index name - prefix="llmcache", # redis key prefix for hash entries - redis_url="redis://localhost:6379", # redis connection url string - distance_threshold=0.1 # semantic cache distance threshold -) -``` - -```bash -# look at the index specification created for the semantic cache lookup -$ rvl index info -i llmcache - - Index Information: - ╭──────────────┬────────────────┬──────────────┬─────────────────┬────────────╮ - │ Index Name │ Storage Type │ Prefixes │ Index Options │ Indexing │ - ├──────────────┼────────────────┼──────────────┼─────────────────┼────────────┤ - │ llmcache │ HASH │ ['llmcache'] │ [] │ 0 │ - ╰──────────────┴────────────────┴──────────────┴─────────────────┴────────────╯ - Index Fields: - ╭───────────────┬───────────────┬────────┬────────────────┬────────────────╮ - │ Name │ Attribute │ Type │ Field Option │ Option Value │ - ├───────────────┼───────────────┼────────┼────────────────┼────────────────┤ - │ prompt │ prompt │ TEXT │ WEIGHT │ 1 │ - │ response │ response │ TEXT │ WEIGHT │ 1 │ - │ prompt_vector │ prompt_vector │ VECTOR │ │ │ -
╰───────────────┴───────────────┴────────┴────────────────┴────────────────╯ -``` - -## Basic cache usage - -```python -question = "What is the capital of France?" -``` - -```python -# Check the semantic cache -- should be empty -if response := llmcache.check(prompt=question): - print(response) -else: - print("Empty cache") - - Empty cache -``` - -Your initial cache check should be empty since you have yet to store anything in the cache. Below, store the `question`, the -proper `response`, and any arbitrary `metadata` (as a Python dictionary object) in the cache. - -```python -# Cache the question, answer, and arbitrary metadata -llmcache.store( - prompt=question, - response="Paris", - metadata={"city": "Paris", "country": "france"} -) -``` - -```python -# Check the cache again -if response := llmcache.check(prompt=question, return_fields=["prompt", "response", "metadata"]): - print(response) -else: - print("Empty cache") - -[{'id': 'llmcache:115049a298532be2f181edb03f766770c0db84c22aff39003fec340deaec7545', 'vector_distance': '9.53674316406e-07', 'prompt': 'What is the capital of France?', 'response': 'Paris', 'metadata': {'city': 'Paris', 'country': 'france'}}] -``` - -```python -# Check for a semantically similar result -question = "What actually is the capital of France?" -llmcache.check(prompt=question)[0]['response'] - - 'Paris' -``` - -## Customize the distance threshold - -For most use cases, the correct semantic similarity threshold is not a fixed quantity. Depending on the choice of embedding model, -the properties of the input query, and the business use case, the threshold might need to change. 
- -Fortunately, you can seamlessly adjust the threshold at any point, as shown below: - -```python -# Widen the semantic distance threshold -llmcache.set_threshold(0.3) -``` - -```python -# Really try to trick it by asking an indirect question; -# it still slips just under our new threshold -question = "What is the capital city of the country in Europe that also has a city named Nice?" -llmcache.check(prompt=question)[0]['response'] - - 'Paris' -``` - -```python -# Invalidate the cache completely by clearing it out -llmcache.clear() - -# should be empty now -llmcache.check(prompt=question) - - [] -``` - -## Use TTL - -Redis uses optional time-to-live (TTL) policies to expire individual keys at points in time in the future. -This allows you to focus on your data flow and business logic without worrying about complex cleanup tasks. - -A TTL policy set on the `SemanticCache` allows you to temporarily hold onto cache entries. Set the TTL policy to 5 seconds. - -```python -llmcache.set_ttl(5) # 5 seconds -``` - -```python -llmcache.store("This is a TTL test", "This is a TTL test response") - -time.sleep(5) -``` - -```python -# confirm that the cache has cleared by now on its own -result = llmcache.check("This is a TTL test") - -print(result) - -[] -``` - -```python -# Reset the TTL to null (long-lived data) -llmcache.set_ttl() -``` - -## Simple performance testing - -Next, you will measure the speedup obtained by using `SemanticCache`. You will use the `time` module to measure the time taken to generate responses with and without `SemanticCache`. - -```python -def answer_question(question: str) -> str: - """Helper function to answer a simple question using OpenAI with a wrapper - check for the answer in the semantic cache first. - - Args: - question (str): User input question. - - Returns: - str: Response. 
- """ - results = llmcache.check(prompt=question) - if results: - return results[0]["response"] - else: - answer = ask_openai(question) - return answer -``` - -```python -start = time.time() -# asking a question -- openai response time -question = "What was the name of the first US President?" -answer = answer_question(question) -end = time.time() - -print(f"Without caching, a call to openAI to answer this simple question took {end-start} seconds.") - -Without caching, a call to openAI to answer this simple question took 0.5017588138580322 seconds. -``` - -```python -llmcache.store(prompt=question, response="George Washington") -``` - -```python -# Calculate the avg latency for caching over LLM usage -times = [] - -for _ in range(10): - cached_start = time.time() - cached_answer = answer_question(question) - cached_end = time.time() - times.append(cached_end-cached_start) - -avg_time_with_cache = np.mean(times) -print(f"Avg time taken with LLM cache enabled: {avg_time_with_cache}") -print(f"Percentage of time saved: {round(((end - start) - avg_time_with_cache) / (end - start) * 100, 2)}%") - -Avg time taken with LLM cache enabled: 0.2560166358947754 -Percentage of time saved: 82.47% -``` - -```bash -# check the stats of the index -$ rvl stats -i llmcache - - Statistics: - ╭─────────────────────────────┬─────────────╮ - │ Stat Key │ Value │ - ├─────────────────────────────┼─────────────┤ - │ num_docs │ 1 │ - │ num_terms │ 19 │ - │ max_doc_id │ 3 │ - │ num_records │ 23 │ - │ percent_indexed │ 1 │ - │ hash_indexing_failures │ 0 │ - │ number_of_uses │ 19 │ - │ bytes_per_record_avg │ 5.30435 │ - │ doc_table_size_mb │ 0.000134468 │ - │ inverted_sz_mb │ 0.000116348 │ - │ key_table_size_mb │ 2.76566e-05 │ - │ offset_bits_per_record_avg │ 8 │ - │ offset_vectors_sz_mb │ 2.09808e-05 │ - │ offsets_per_term_avg │ 0.956522 │ - │ records_per_doc_avg │ 23 │ - │ sortable_values_size_mb │ 0 │ - │ total_indexing_time │ 1.211 │ - │ total_inverted_index_blocks │ 19 │ - │ 
vector_index_sz_mb │ 3.0161 │ - ╰─────────────────────────────┴─────────────╯ -``` - -```python -# Clear the cache AND delete the underlying index -llmcache.delete() -``` diff --git a/content/integrate/redisvl/user-guide/vectorizers.md b/content/integrate/redisvl/user-guide/vectorizers.md deleted file mode 100644 index 829a08b8e1..0000000000 --- a/content/integrate/redisvl/user-guide/vectorizers.md +++ /dev/null @@ -1,327 +0,0 @@ ---- -description: Supported vectorizers -linkTitle: Vectorizers -title: Vectorizers -type: integration -weight: 4 ---- -In this document, you will learn how to use RedisVL to create embeddings using the built-in text embedding vectorizers. RedisVL supports: - -1. OpenAI -1. HuggingFace -1. Vertex AI -1. Cohere - -{{< note >}} -This document is a converted form of [this Jupyter notebook](https://github.com/RedisVentures/redisvl/blob/main/docs/user_guide/vectorizers_04.ipynb). -{{< /note >}} - -Before beginning, be sure of the following: - -1. You have installed RedisVL and have that environment activated. -1. You have a running Redis instance with the Redis Query Engine capability. - -```python -# import necessary modules -import os -``` - -## Create text embeddings - -This example will show how to create embeddings from three simple sentences with a number of different text vectorizers in RedisVL. - -- "That is a happy dog" -- "That is a happy person" -- "Today is a sunny day" - -### OpenAI - -The `OpenAITextVectorizer` makes it easy to use RedisVL with the embedding models from OpenAI. For this, you will need to install `openai`. 
- -```bash -pip install openai -``` - -```python -import getpass - -# set up the API key -api_key = os.environ.get("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ") -``` - -```python -from redisvl.utils.vectorize import OpenAITextVectorizer - -# create a vectorizer -oai = OpenAITextVectorizer( - model="text-embedding-ada-002", - api_config={"api_key": api_key}, -) - -test = oai.embed("This is a test sentence.") -print("Vector dimensions: ", len(test)) -test[:10] - -Vector dimensions: 1536 - -[-0.001025049015879631, - -0.0030993607360869646, - 0.0024536605924367905, - -0.004484387580305338, - -0.010331203229725361, - 0.012700922787189484, - -0.005368996877223253, - -0.0029411641880869865, - -0.0070833307690918446, - -0.03386051580309868] -``` - -```python -# Create many embeddings at once -sentences = [ - "That is a happy dog", - "That is a happy person", - "Today is a sunny day" -] - -embeddings = oai.embed_many(sentences) -embeddings[0][:10] - -[-0.01747742109000683, - -5.228330701356754e-05, - 0.0013870716793462634, - -0.025637786835432053, - -0.01985435001552105, - 0.016117358580231667, - -0.0037306349258869886, - 0.0008945261361077428, - 0.006577865686267614, - -0.025091219693422318] -``` - -```python -# OpenAI also supports asynchronous requests, which you can use to speed up the vectorization process. -embeddings = await oai.aembed_many(sentences) -print("Number of Embeddings:", len(embeddings)) - -Number of Embeddings: 3 -``` - -### Huggingface - -[Huggingface](https://huggingface.co/models) is a popular natural language processing (NLP) platform that has a number of pre-trained models you can use off the shelf. RedisVL supports using Huggingface "Sentence Transformers" to create embeddings from text. To use Huggingface, you will need to install the `sentence-transformers` library. 
- -```bash -pip install sentence-transformers -``` - -```python -os.environ["TOKENIZERS_PARALLELISM"] = "false" -from redisvl.utils.vectorize import HFTextVectorizer - -# create a vectorizer -# choose your model from the huggingface website -hf = HFTextVectorizer(model="sentence-transformers/all-mpnet-base-v2") - -# embed a sentence -test = hf.embed("This is a test sentence.") -test[:10] - -[0.00037810884532518685, - -0.05080341175198555, - -0.03514723479747772, - -0.02325104922056198, - -0.044158220291137695, - 0.020487844944000244, - 0.0014617963461205363, - 0.031261757016181946, - 0.05605152249336243, - 0.018815357238054276] -``` - -```python -# You can also create many embeddings at once -embeddings = hf.embed_many(sentences, as_buffer=True) -``` - -### VertexAI - -[VertexAI](https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-text-embeddings) is GCP's fully-featured AI platform, which includes a number of pre-trained LLMs. RedisVL supports using VertexAI to create embeddings from these models. To use VertexAI, you will first need to install the `google-cloud-aiplatform` library. - -```bash -pip install "google-cloud-aiplatform>=1.26" -``` - -Then you need to gain access to a [Google Cloud Project](https://cloud.google.com/gcp?hl=en) and provide [access to credentials](https://cloud.google.com/docs/authentication/application-default-credentials). This is accomplished by setting the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of a JSON key file downloaded from your service account on GCP. - -Finally, you need to find your [project ID](https://support.google.com/googleapi/answer/7014113?hl=en) and [geographic region for VertexAI](https://cloud.google.com/vertex-ai/docs/general/locations). 
- -Make sure the following env vars are set: - -```bash -GOOGLE_APPLICATION_CREDENTIALS= -GCP_PROJECT_ID= -GCP_LOCATION= -``` - -```python -from redisvl.utils.vectorize import VertexAITextVectorizer - - -# create a vectorizer -vtx = VertexAITextVectorizer(api_config={ - "project_id": os.environ.get("GCP_PROJECT_ID") or getpass.getpass("Enter your GCP Project ID: "), - "location": os.environ.get("GCP_LOCATION") or getpass.getpass("Enter your GCP Location: "), - "google_application_credentials": os.environ.get("GOOGLE_APPLICATION_CREDENTIALS") or getpass.getpass("Enter your Google App Credentials path: ") -}) - -# embed a sentence -test = vtx.embed("This is a test sentence.") -test[:10] - -[0.04373306408524513, - -0.05040992051362991, - -0.011946038343012333, - -0.043528858572244644, - 0.021510830149054527, - 0.028604144230484962, - 0.014770914800465107, - -0.01610461436212063, - -0.0036560404114425182, - 0.013746795244514942] -``` - -### Cohere - -[Cohere](https://dashboard.cohere.ai/) allows you to implement language AI in your product. The `CohereTextVectorizer` makes it simple to use RedisVL with the embedding models at Cohere. For this, you will need to install `cohere`. - -```bash -pip install cohere -``` - -```python -import getpass -# set up the API Key -api_key = os.environ.get("COHERE_API_KEY") or getpass.getpass("Enter your Cohere API key: ") -``` - -Special attention needs to be paid to the `input_type` parameter for each `embed` call. For example, for embedding -queries, you should set `input_type='search_query'`. For embedding documents, set `input_type='search_document'`. See -more information [here](https://docs.cohere.com/reference/embed). 
- -```python -from redisvl.utils.vectorize import CohereTextVectorizer - -# create a vectorizer -co = CohereTextVectorizer( - model="embed-english-v3.0", - api_config={"api_key": api_key}, -) - -# embed a search query -test = co.embed("This is a test sentence.", input_type='search_query') -print("Vector dimensions: ", len(test)) -print(test[:10]) - -# embed a document -test = co.embed("This is a test sentence.", input_type='search_document') -print("Vector dimensions: ", len(test)) -print(test[:10]) - -Vector dimensions: 1024 -[-0.010856628, -0.019683838, -0.0062179565, 0.003545761, -0.047943115, 0.0009365082, -0.005924225, 0.016174316, -0.03289795, 0.049194336] -Vector dimensions: 1024 -[-0.009712219, -0.016036987, 2.8073788e-05, -0.022491455, -0.041259766, 0.002281189, -0.033294678, -0.00057029724, -0.026260376, 0.0579834] -``` - -Learn more about using RedisVL and Cohere together through [this dedicated user guide](https://docs.cohere.com/docs/redis-and-cohere). - -## Search with provider embeddings - -Now that you've created your embeddings, you can use them for similarity search. You will load the same three sentences from above into an index and then query them with a new sentence. - -First, create the schema for your index. 
- -Here's what the schema for the example looks like in YAML for the HuggingFace vectorizer: - -```yaml -version: '0.1.0' - -index: - name: vectorizers - prefix: doc - storage_type: hash - -fields: - - name: text - type: text - - name: embedding - type: vector - attrs: - dims: 768 - algorithm: flat - distance_metric: cosine -``` - -```python -from redisvl.index import SearchIndex - -# construct a search index from the schema -index = SearchIndex.from_yaml("./schema.yaml") - -# connect to local redis instance -index.connect("redis://localhost:6379") - -# create the index (no data yet) -index.create(overwrite=True) -``` - -```python -# use the CLI to see the created index -!rvl index listall - -22:02:27 [RedisVL] INFO Indices: -22:02:27 [RedisVL] INFO 1. vectorizers -``` - -```python -# load expects an iterable of dictionaries where -# the vector is stored as a bytes buffer - -data = [{"text": t, - "embedding": v} - for t, v in zip(sentences, embeddings)] - -index.load(data) - - ['doc:17c401b679ce43cb82f3ab2280ad02f2', - 'doc:3fc0502bec434b17a3f06e20824b2e59', - 'doc:199f17b0e5d24dcaa1fd4fb41558150c'] -``` - -```python -from redisvl.query import VectorQuery - -# use the HuggingFace vectorizer again to create a query embedding -query_embedding = hf.embed("That is a happy cat") - -query = VectorQuery( - vector=query_embedding, - vector_field_name="embedding", - return_fields=["text"], - num_results=3 -) - -results = index.query(query) -for doc in results: - print(doc["text"], doc["vector_distance"]) - -That is a happy dog 0.160862326622 -That is a happy person 0.273598492146 -Today is a sunny day 0.744559407234 -``` - -```python -# cleanup -index.delete() -```
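One detail worth calling out from the schema above: the vector field's `dims` must match the output dimensionality of whichever vectorizer you choose (768 for `sentence-transformers/all-mpnet-base-v2` above, 1536 for OpenAI's `text-embedding-ada-002`). A minimal sketch of one way to keep the two in sync is to derive the schema from a sample embedding. The `make_schema` helper below is illustrative only, not part of RedisVL:

```python
# Sketch: build an index schema dict from a sample embedding so that
# the vector field's `dims` always matches the vectorizer's output.
# make_schema is a hypothetical helper for illustration, not a RedisVL API.

def make_schema(sample_embedding, name="vectorizers", prefix="doc"):
    return {
        "index": {"name": name, "prefix": prefix, "storage_type": "hash"},
        "fields": [
            {"name": "text", "type": "text"},
            {
                "name": "embedding",
                "type": "vector",
                "attrs": {
                    "dims": len(sample_embedding),  # inferred, never hard-coded
                    "algorithm": "flat",
                    "distance_metric": "cosine",
                },
            },
        ],
    }

# A 768-dimensional embedding, as produced by all-mpnet-base-v2 above
hf_like_embedding = [0.0] * 768
schema = make_schema(hf_like_embedding)
print(schema["fields"][1]["attrs"]["dims"])  # 768
```

A dictionary like this could then be handed to RedisVL in place of the YAML file (for example, via `SearchIndex.from_dict`); switching to a vectorizer with a different output size then updates `dims` automatically instead of silently mismatching the index definition.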