\",\n",
+ ")\n",
+ "```"
]
},
{
"cell_type": "code",
- "execution_count": 69,
+ "execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "[{'id': 'movie:01JQC7NC1KPM2X2EJAQ8DFP1W7',\n",
- " 'vector_distance': '0.643690168858',\n",
+ "[{'vector_distance': '0.645975470543',\n",
" 'title': 'The Incredibles',\n",
- " 'description': \"A family of undercover superheroes, while trying to live the quiet suburban life, are forced into action to save the world. Bob Parr (Mr. Incredible) and his wife Helen (Elastigirl) were among the world's greatest crime fighters, but now they must assume civilian identities and retreat to the suburbs to live a 'normal' life with their three children. However, the family's desire to help the world pulls them back into action when they face a new and dangerous enemy.\"},\n",
- " {'id': 'movie:01JQC7NC1JXCDCBZY83G914DZD',\n",
- " 'vector_distance': '0.66843944788',\n",
+ " 'vector_similarity': '0.677012264729',\n",
+ " 'text_score': '10.5386477145',\n",
+ " 'hybrid_score': '3.63550289966'},\n",
+ " {'vector_distance': '0.797545194626',\n",
+ " 'title': 'Skyfall',\n",
+ " 'vector_similarity': '0.601227402687',\n",
+ " 'text_score': '4.73920856087',\n",
+ " 'hybrid_score': '1.84262175014'},\n",
+ " {'vector_distance': '0.608649492264',\n",
" 'title': 'Explosive Pursuit',\n",
- " 'description': 'A daring cop chases a notorious criminal across the city in a high-stakes game of cat and mouse.'},\n",
- " {'id': 'movie:01JQC7NC1KEEH4ZSA5R1PQAPZM',\n",
- " 'vector_distance': '0.698122441769',\n",
- " 'title': 'Mad Max: Fury Road',\n",
- " 'description': \"In a post-apocalyptic wasteland, Max teams up with Furiosa to escape a tyrant's clutches and find freedom.\"}]"
+ " 'vector_similarity': '0.695675253868',\n",
+ " 'text_score': '3.93239518818',\n",
+ " 'hybrid_score': '1.66669123416'}]"
]
},
- "execution_count": 69,
+ "execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "query = make_vector_query(user_query, num_results=3)\n",
+ "from redisvl.query import HybridQuery\n",
+ "\n",
+ "vector = model.embed(user_query, as_buffer=True, dtype=\"float32\")\n",
+ "\n",
+ "query = HybridQuery(\n",
+ " text=user_query,\n",
+ " text_field_name=\"description\",\n",
+ " vector=vector,\n",
+ " vector_field_name=\"description_vector\",\n",
+ " return_fields=[\"title\"],\n",
+ ")\n",
+ "\n",
+ "results = index.query(query)\n",
"\n",
- "# Check standard vector search results\n",
- "index.query(query)"
+ "results[:3]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Next, we add a full-text search predicate using RedisVL helpers and our user-query tokenizer:"
+ "That's all it takes to run a hybrid full-text and vector query with RedisVL.\n",
+ "The `HybridQuery` class exposes several more options. Let's look at them.\n",
+ "\n",
+ "First, let's look at just the text query part that is being run:"
]
},
{
"cell_type": "code",
- "execution_count": 70,
+ "execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "'(~@description:(action | adventure | movie | great | fighting | scenes | crime | busting | superheroes | magic))=>[KNN 3 @description_vector $vector AS vector_distance]'"
+ "'(~@description:(action | adventure | movie | great | fighting | scenes | dangerous | criminal | crime | busting | superheroes | magic))=>[KNN 10 @description_vector $vector AS vector_distance]'"
]
},
- "execution_count": 70,
+ "execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "base_full_text_query = str(Text(\"description\") % tokenize_query(user_query))\n",
- "\n",
- "# Add the optional flag, \"~\", so that this doesn't also act as a strict text filter\n",
- "full_text_query = f\"(~{base_full_text_query})\"\n",
- "\n",
- "\n",
- "# Add full-text predicate to the vector query \n",
- "query.set_filter(full_text_query)\n",
- "query.query_string()"
+ "query._build_query_string()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "**The query string above combines both full-text search and a vector search.** This will be passed to the aggregation API to combine using a simple weighted sum of scores before a final sort and truncation.\n",
- "\n",
- "Note: for the following query to work `redis-py >= 5.2.0`"
+ "### Choosing your stopwords for better queries\n",
+ "The user query string has been tokenized, and stopwords such as 'and', 'for', 'with', and 'but' have been removed; otherwise the query would match on irrelevant words.\n",
+ "RedisVL uses [NLTK](https://www.nltk.org/index.html) English stopwords as the default. You can change this with the `stopwords` argument.\n",
+ "You can specify a language, such as 'german', 'arabic', or 'greek', provide your own list of stopwords, or set it to `None` to not remove any."
]
},
{
"cell_type": "code",
- "execution_count": 92,
+ "execution_count": 14,
"metadata": {},
"outputs": [
{
- "data": {
- "text/plain": [
- "[{'vector_distance': '0.643690168858',\n",
- " '__score': '5.82636454242',\n",
- " 'title': 'The Incredibles',\n",
- " 'description': \"A family of undercover superheroes, while trying to live the quiet suburban life, are forced into action to save the world. Bob Parr (Mr. Incredible) and his wife Helen (Elastigirl) were among the world's greatest crime fighters, but now they must assume civilian identities and retreat to the suburbs to live a 'normal' life with their three children. However, the family's desire to help the world pulls them back into action when they face a new and dangerous enemy.\",\n",
- " 'cosine_similarity': '0.678154915571',\n",
- " 'bm25_score': '5.82636454242',\n",
- " 'hybrid_score': '2.22261780363'},\n",
- " {'vector_distance': '0.66843944788',\n",
- " '__score': '0',\n",
- " 'title': 'Explosive Pursuit',\n",
- " 'description': 'A daring cop chases a notorious criminal across the city in a high-stakes game of cat and mouse.',\n",
- " 'cosine_similarity': '0.66578027606',\n",
- " 'bm25_score': '0',\n",
- " 'hybrid_score': '0.466046193242'},\n",
- " {'vector_distance': '0.698122441769',\n",
- " '__score': '0',\n",
- " 'title': 'Mad Max: Fury Road',\n",
- " 'description': \"In a post-apocalyptic wasteland, Max teams up with Furiosa to escape a tyrant's clutches and find freedom.\",\n",
- " 'cosine_similarity': '0.650938779116',\n",
- " 'bm25_score': '0',\n",
- " 'hybrid_score': '0.455657145381'}]"
- ]
- },
- "execution_count": 92,
- "metadata": {},
- "output_type": "execute_result"
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(~@description:(film | d\\'action | d\\'aventure | superbes | scènes | combat | enquêtes | criminelles | super\\-héros | magie))=>[KNN 10 @description_vector $vector AS vector_distance]\n",
+ "(~@description:(action | adventure | movie | great | fighting | scenes | against | dangerous | criminal | crime | busting | superheroes | magic))=>[KNN 10 @description_vector $vector AS vector_distance]\n",
+ "(~@description:(action | adventure | movie | with | great | fighting | scenes | against | a | dangerous | criminal | crime | busting | superheroes | and | magic))=>[KNN 10 @description_vector $vector AS vector_distance]\n"
+ ]
}
],
"source": [
- "from typing import Any, Dict, List\n",
- "from redis.commands.search.aggregation import AggregateRequest, Desc\n",
- "\n",
- "# Build the aggregation request\n",
- "req = (\n",
- " AggregateRequest(query.query_string())\n",
- " .scorer(\"BM25STD\")\n",
- " .add_scores()\n",
- " .apply(cosine_similarity=\"(2 - @vector_distance)/2\", bm25_score=\"@__score\")\n",
- " .apply(hybrid_score=f\"0.3*@bm25_score + 0.7*@cosine_similarity\")\n",
- " .load(\"title\", \"description\", \"cosine_similarity\", \"bm25_score\", \"hybrid_score\")\n",
- " .sort_by(Desc(\"@hybrid_score\"), max=3)\n",
- " .dialect(2)\n",
+ "# translate our user query to French and use nltk french stopwords\n",
+ "french_query_text = \"Film d'action et d'aventure avec de superbes scènes de combat, des enquêtes criminelles, des super-héros et de la magie\"\n",
+ "\n",
+ "french_film_query = HybridQuery(\n",
+ " text=french_query_text,\n",
+ " text_field_name=\"description\",\n",
+ " vector=model.embed(french_query_text, as_buffer=True, dtype=\"float32\"),\n",
+ " vector_field_name=\"description_vector\",\n",
+ " stopwords=\"french\",\n",
")\n",
"\n",
- "# Run the query\n",
- "res = index.aggregate(req, query_params={'vector': query._vector})\n",
+ "print(french_film_query._build_query_string())\n",
"\n",
- "# Perform output parsing\n",
- "[make_dict(row) for row in convert_bytes(res.rows)]\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "Notes on aggregate query syntax \n",
- "- `.scorer`: specifies the scoring function to use BM25 in this case\n",
- " - [see docs](https://redis.io/docs/latest/develop/interact/search-and-query/advanced-concepts/scoring/) for all available scorers\n",
- "- `.add_scores`: adds the scores to the result\n",
- "- `.apply`: algebraic operations that can be customized for your use case\n",
- "- `.load`: specifies fields to return - all in this case.\n",
- "- `.sort_by`: sort the output based on the hybrid score and yield top 5 results\n",
- "- `.dialect`: specifies the query dialect to use."
+ "# specify your own stopwords\n",
+ "custom_stopwords = set([\n",
+ " \"a\", \"is\", \"the\", \"an\", \"and\", \"are\", \"as\", \"at\", \"be\", \"but\", \"by\", \"for\",\n",
+ " \"if\", \"in\", \"into\", \"it\", \"no\", \"not\", \"of\", \"on\", \"or\", \"such\", \"that\", \"their\",\n",
+ " \"then\", \"there\", \"these\", \"they\", \"this\", \"to\", \"was\", \"will\", \"with\"\n",
+ "])\n",
+ "\n",
+ "stopwords_query = HybridQuery(\n",
+ " text=user_query,\n",
+ " text_field_name=\"description\",\n",
+ " vector=vector,\n",
+ " vector_field_name=\"description_vector\",\n",
+ " stopwords=custom_stopwords,\n",
+ ")\n",
+ "\n",
+ "print(stopwords_query._build_query_string())\n",
+ "\n",
+ "# don't use any stopwords\n",
+ "no_stopwords_query = HybridQuery(\n",
+ " text=user_query,\n",
+ " text_field_name=\"description\",\n",
+ " vector=vector,\n",
+ " vector_field_name=\"description_vector\",\n",
+ " stopwords=None,\n",
+ ")\n",
+ "\n",
+ "print(no_stopwords_query._build_query_string())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now we will define a function to do the entire operation start to finish for simplicity."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 90,
- "metadata": {},
- "outputs": [],
- "source": [
- "def linear_combo(user_query: str, alpha: float, num_results: int = 3) -> List[Dict[str, Any]]:\n",
- " # Add the optional flag, \"~\", so that this doesn't also act as a strict text filter\n",
- " text = f\"(~{Text('description') % tokenize_query(user_query)})\"\n",
+ "### Choosing your text scoring function and weights\n",
+ "There are different ways to score how well a document's text matches a query. Redis supports several scorers, including `BM25`, `TFIDF`, and `DISMAX`. The default is `BM25STD`, and you can change it with the `text_scorer` parameter. Just as changing your embedding model can change your vector similarity scores, changing your text scorer can change your text scores.\n",
"\n",
- " # Build vector query\n",
- " query = make_vector_query(user_query, num_results=num_results, filters=text)\n",
- " \n",
- " # Build aggregation\n",
- " req = (\n",
- " AggregateRequest(query.query_string())\n",
- " .scorer(\"BM25STD\")\n",
- " .add_scores()\n",
- " .apply(cosine_similarity=\"(2 - @vector_distance)/2\", bm25_score=\"@__score\")\n",
- " .apply(hybrid_score=f\"{1-alpha}*@bm25_score + {alpha}*@cosine_similarity\")\n",
- " .sort_by(Desc(\"@hybrid_score\"), max=num_results)\n",
- " .load(\"title\", \"description\", \"cosine_similarity\", \"bm25_score\", \"hybrid_score\")\n",
- " .dialect(2)\n",
- " )\n",
+ "Because a hybrid query computes a weighted average of text similarity and vector similarity, you can also control the relative balance of the two scores with the `alpha` parameter.\n",
+ "\n",
+ "Documents are ranked by the hybrid score, which is computed as:\n",
"\n",
- " # Run the query\n",
- " res = index.aggregate(req, query_params={'vector': query._vector})\n",
+ "```python\n",
+ "hybrid_score = (1 - alpha) * text_score + alpha * vector_similarity\n",
+ "```\n",
"\n",
- " # Perform output parsing\n",
- " if res:\n",
- " movies = [make_dict(row) for row in convert_bytes(res.rows)]\n",
- " return [(movie[\"title\"], movie[\"hybrid_score\"]) for movie in movies]"
+ "Try changing the `text_scorer` and `alpha` parameters in the query below to see how results may change.\n"
]
},
{
"cell_type": "code",
- "execution_count": 91,
+ "execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "[('The Incredibles', '2.22261780363'),\n",
- " ('Explosive Pursuit', '0.466046193242'),\n",
- " ('Mad Max: Fury Road', '0.455657145381'),\n",
- " ('The Dark Knight', '0.452280691266'),\n",
- " ('Despicable Me', '0.448826736212'),\n",
- " ('Inception', '0.434456560016')]"
+ "[{'vector_distance': '0.645975470543',\n",
+ " 'title': 'The Incredibles',\n",
+ " 'description': \"A family of undercover superheroes, while trying to live the quiet suburban life, are forced into action to save the world. Bob Parr (Mr. Incredible) and his wife Helen (Elastigirl) were among the world's greatest crime fighters, but now they must assume civilian identities and retreat to the suburbs to live a 'normal' life with their three children. However, the family's desire to help the world pulls them back into action when they face a new and dangerous enemy.\",\n",
+ " 'vector_similarity': '0.677012264729',\n",
+ " 'text_score': '6',\n",
+ " 'hybrid_score': '4.66925306618'},\n",
+ " {'vector_distance': '0.653376281261',\n",
+ " 'title': 'The Dark Knight',\n",
+ " 'description': 'Batman faces off against the Joker, a criminal mastermind who threatens to plunge Gotham into chaos.',\n",
+ " 'vector_similarity': '0.673311859369',\n",
+ " 'text_score': '4',\n",
+ " 'hybrid_score': '3.16832796484'},\n",
+ " {'vector_distance': '0.608649492264',\n",
+ " 'title': 'Explosive Pursuit',\n",
+ " 'description': 'A daring cop chases a notorious criminal across the city in a high-stakes game of cat and mouse.',\n",
+ " 'vector_similarity': '0.695675253868',\n",
+ " 'text_score': '3',\n",
+ " 'hybrid_score': '2.42391881347'}]"
]
},
- "execution_count": 91,
+ "execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "# Test it out\n",
+ "tfidf_query = HybridQuery(\n",
+ " text=user_query,\n",
+ " text_field_name=\"description\",\n",
+ " vector=vector,\n",
+ " vector_field_name=\"description_vector\",\n",
+ " text_scorer=\"TFIDF\", # can be one of [TFIDF, TFIDF.DOCNORM, BM25, DISMAX, DOCSCORE, BM25STD]\n",
+ " stopwords=None,\n",
+ " alpha=0.25, # weight the vector score lower\n",
+ " return_fields=[\"title\", \"description\"],\n",
+ ")\n",
+ "\n",
+ "results = index.query(tfidf_query)\n",
"\n",
- "# 70% of the hybrid search score based on cosine similarity\n",
- "linear_combo(user_query, alpha=0.7, num_results=6)"
+ "results[:3]"
]
},
{
@@ -697,7 +588,7 @@
},
{
"cell_type": "code",
- "execution_count": 74,
+ "execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
@@ -725,7 +616,7 @@
},
{
"cell_type": "code",
- "execution_count": 75,
+ "execution_count": 15,
"metadata": {},
"outputs": [
{
@@ -741,7 +632,7 @@
" (8, 0.015384615384615385)]"
]
},
- "execution_count": 75,
+ "execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
@@ -751,12 +642,57 @@
"fuse_rankings_rrf([1, 2, 3], [2, 4, 6, 7, 8], [5, 6, 1, 2])"
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We'll need a few helper functions to construct the individual text and vector queries:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Function to create a vector query using RedisVL helpers for ease of use\n",
+ "from redisvl.query import VectorQuery, TextQuery\n",
+ "\n",
+ "\n",
+ "def make_vector_query(user_query: str, num_results: int, filters = None) -> VectorQuery:\n",
+ " \"\"\"Generate a Redis vector query given user query string.\"\"\"\n",
+ " vector = model.embed(user_query, as_buffer=True, dtype=\"float32\")\n",
+ " query = VectorQuery(\n",
+ " vector=vector,\n",
+ " vector_field_name=\"description_vector\",\n",
+ " num_results=num_results,\n",
+ " return_fields=[\"title\", \"description\"]\n",
+ " )\n",
+ " if filters:\n",
+ " query.set_filter(filters)\n",
+ " return query\n",
+ "\n",
+ "\n",
+ "def make_ft_query(text_field: str, user_query: str, num_results: int) -> TextQuery:\n",
+ " \"\"\"Generate a Redis full-text query given a user query string.\"\"\"\n",
+ " return TextQuery(\n",
+ " text=user_query,\n",
+ " text_field_name=text_field,\n",
+ " text_scorer=\"BM25\",\n",
+ " num_results=num_results,\n",
+ " return_fields=[\"title\", \"description\"],\n",
+ " )"
+ ]
+ },
{
"cell_type": "code",
- "execution_count": 76,
+ "execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
+ "from typing import List, Dict, Any\n",
+ "\n",
+ "\n",
"def weighted_rrf(\n",
" user_query: str,\n",
" alpha: float = 0.5,\n",
@@ -784,21 +720,21 @@
},
{
"cell_type": "code",
- "execution_count": 77,
+ "execution_count": 23,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "[('The Incredibles', 0.01639344262295082),\n",
- " ('Explosive Pursuit', 0.01575682382133995),\n",
- " ('Mad Max: Fury Road', 0.015079365079365078),\n",
- " ('Fast & Furious 9', 0.014925373134328358),\n",
- " ('Finding Nemo', 0.01488095238095238),\n",
- " ('The Dark Knight', 0.014854753521126762)]"
+ "[('Explosive Pursuit', 0.01639344262295082),\n",
+ " ('The Dark Knight', 0.015873015873015872),\n",
+ " ('Despicable Me', 0.015625),\n",
+ " ('The Incredibles', 0.015417457305502846),\n",
+ " ('Skyfall', 0.0152073732718894),\n",
+ " ('Finding Nemo', 0.014242424242424244)]"
]
},
- "execution_count": 77,
+ "execution_count": 23,
"metadata": {},
"output_type": "execute_result"
}
@@ -817,21 +753,21 @@
},
{
"cell_type": "code",
- "execution_count": 78,
+ "execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "[('The Incredibles', 0.01639344262295082),\n",
- " ('Explosive Pursuit', 0.015905707196029777),\n",
- " ('Mad Max: Fury Road', 0.015396825396825395),\n",
- " ('The Dark Knight', 0.015162852112676057),\n",
- " ('Fast & Furious 9', 0.014925373134328356),\n",
- " ('Inception', 0.014715649647156496)]"
+ "[('Explosive Pursuit', 0.01639344262295082),\n",
+ " ('The Dark Knight', 0.015873015873015872),\n",
+ " ('The Incredibles', 0.015702087286527514),\n",
+ " ('Despicable Me', 0.015625),\n",
+ " ('Skyfall', 0.014838709677419354),\n",
+ " ('Finding Nemo', 0.01387878787878788)]"
]
},
- "execution_count": 78,
+ "execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
@@ -851,7 +787,7 @@
},
{
"cell_type": "code",
- "execution_count": 79,
+ "execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
@@ -900,21 +836,21 @@
},
{
"cell_type": "code",
- "execution_count": 80,
+ "execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
- "[('The Incredibles', -0.4526837468147278),\n",
- " ('The Dark Knight', -7.41187858581543),\n",
- " ('Explosive Pursuit', -8.751346588134766),\n",
- " ('Mad Max: Fury Road', -7.049142837524414),\n",
- " ('Aladdin', -9.638406753540039),\n",
- " ('Despicable Me', -9.797615051269531)]"
+ "[('The Incredibles', -4.1636810302734375),\n",
+ " ('Explosive Pursuit', 0.8551048636436462),\n",
+ " ('The Dark Knight', -4.403156280517578),\n",
+ " ('Skyfall', -7.830077171325684),\n",
+ " ('Mad Max: Fury Road', -7.7119951248168945),\n",
+ " ('Despicable Me', -8.742403030395508)]"
]
},
- "execution_count": 80,
+ "execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
@@ -942,7 +878,7 @@
},
{
"cell_type": "code",
- "execution_count": 81,
+ "execution_count": 27,
"metadata": {},
"outputs": [],
"source": [
@@ -967,7 +903,37 @@
},
{
"cell_type": "code",
- "execution_count": 82,
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def hybrid_query(text, alpha, num_results) -> List[Dict[str, Any]]:\n",
+ "\n",
+ " query = HybridQuery(\n",
+ " text,\n",
+ " text_field_name=\"description\",\n",
+ " vector=model.embed(text, as_buffer=True, dtype=\"float32\"),\n",
+ " vector_field_name=\"description_vector\",\n",
+ " text_scorer=\"BM25\",\n",
+ " stopwords=\"english\",\n",
+ " alpha=alpha, num_results=num_results, # pass the limit through; it was previously unused\n",
+ " return_fields=[\"title\", \"hybrid_score\"],\n",
+ " )\n",
+ "\n",
+ " results = index.query(query)\n",
+ "\n",
+ " return [\n",
+ " (\n",
+ " movie[\"title\"],\n",
+ " movie[\"hybrid_score\"]\n",
+ " )\n",
+ " for movie in results\n",
+ " ]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
"metadata": {},
"outputs": [],
"source": [
@@ -985,7 +951,7 @@
},
{
"cell_type": "code",
- "execution_count": 83,
+ "execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
@@ -993,12 +959,12 @@
"for i, user_query in enumerate(movie_user_queries):\n",
" rankings.at[i, \"hf-cross-encoder\"] = rerank(user_query, num_results=4)\n",
" rankings.at[i, \"rrf\"] = weighted_rrf(user_query, alpha=0.7, num_results=4)\n",
- " rankings.at[i, \"linear-combo-bm25-cosine\"] = linear_combo(user_query, alpha=0.7, num_results=4)"
+ " rankings.at[i, \"linear-combo-bm25-cosine\"] = hybrid_query(user_query, alpha=0.7, num_results=4)"
]
},
{
"cell_type": "code",
- "execution_count": 84,
+ "execution_count": 31,
"metadata": {},
"outputs": [
{
@@ -1032,37 +998,37 @@
" <tr>\n",
" <th>0</th>\n",
" <td>I'm in the mood for a high-rated action movie ...</td>\n",
- " <td>[(Explosive Pursuit, -11.244140625), (Mad Max:...</td>\n",
- " <td>[(The Incredibles, 0.016029143897996357), (Mad...</td>\n",
- " <td>[(The Incredibles, 1.09860771359), (Despicable...</td>\n",
+ " <td>[(Mad Max: Fury Road, -11.244140625), (Toy Sto...</td>\n",
+ " <td>[(The Incredibles, 0.016029143897996357), (Toy...</td>\n",
+ " <td>[(The Incredibles, 0.55239223002), (Toy Story,...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>What's a funny animated film about unlikely fr...</td>\n",
" <td>[(Despicable Me, -10.441909790039062), (The In...</td>\n",
- " <td>[(Black Widow, 0.015625), (The Incredibles, 0....</td>\n",
- " <td>[(The Incredibles, 0.454752063751), (Despicabl...</td>\n",
+ " <td>[(Monsters, Inc., 0.015524093392945852), (Mada...</td>\n",
+ " <td>[(The Incredibles, 0.45475204289), (Despicable...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Any movies featuring superheroes or extraordin...</td>\n",
- " <td>[(The Incredibles, -3.6648082733154297), (The ...</td>\n",
- " <td>[(The Incredibles, 0.01639344262295082), (Mad ...</td>\n",
- " <td>[(The Incredibles, 1.05887192239), (The Avenge...</td>\n",
+ " <td>[(The Incredibles, -3.6648073196411133), (The ...</td>\n",
+ " <td>[(The Incredibles, 0.01639344262295082), (The ...</td>\n",
+ " <td>[(The Incredibles, 0.603234915587), (The Aveng...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>I want to watch a thrilling movie with spies o...</td>\n",
- " <td>[(The Incredibles, -10.843631744384766), (Expl...</td>\n",
- " <td>[(Skyfall, 0.01631411951348493), (Explosive Pu...</td>\n",
+ " <td>[(Inception, -10.843631744384766), (The Incred...</td>\n",
+ " <td>[(Inception, 0.015524093392945852), (Skyfall, ...</td>\n",
" <td>[(Skyfall, 0.443840536475), (Despicable Me, 0....</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Are there any comedies set in unusual location...</td>\n",
- " <td>[(The Incredibles, -11.45376968383789), (Explo...</td>\n",
- " <td>[(Madagascar, 0.015272878190495952), (Explosiv...</td>\n",
- " <td>[(Madagascar, 0.442132198811), (Despicable Me,...</td>\n",
+ " <td>[(The Incredibles, -11.45376968383789), (Findi...</td>\n",
+ " <td>[(Finding Nemo, 0.015524093392945852), (Madaga...</td>\n",
+ " <td>[(Madagascar, 0.442132219672), (Despicable Me,...</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
@@ -1077,28 +1043,28 @@
"4 Are there any comedies set in unusual location... \n",
"\n",
" hf-cross-encoder \\\n",
- "0 [(Explosive Pursuit, -11.244140625), (Mad Max:... \n",
+ "0 [(Mad Max: Fury Road, -11.244140625), (Toy Sto... \n",
"1 [(Despicable Me, -10.441909790039062), (The In... \n",
- "2 [(The Incredibles, -3.6648082733154297), (The ... \n",
- "3 [(The Incredibles, -10.843631744384766), (Expl... \n",
- "4 [(The Incredibles, -11.45376968383789), (Explo... \n",
+ "2 [(The Incredibles, -3.6648073196411133), (The ... \n",
+ "3 [(Inception, -10.843631744384766), (The Incred... \n",
+ "4 [(The Incredibles, -11.45376968383789), (Findi... \n",
"\n",
" rrf \\\n",
- "0 [(The Incredibles, 0.016029143897996357), (Mad... \n",
- "1 [(Black Widow, 0.015625), (The Incredibles, 0.... \n",
- "2 [(The Incredibles, 0.01639344262295082), (Mad ... \n",
- "3 [(Skyfall, 0.01631411951348493), (Explosive Pu... \n",
- "4 [(Madagascar, 0.015272878190495952), (Explosiv... \n",
+ "0 [(The Incredibles, 0.016029143897996357), (Toy... \n",
+ "1 [(Monsters, Inc., 0.015524093392945852), (Mada... \n",
+ "2 [(The Incredibles, 0.01639344262295082), (The ... \n",
+ "3 [(Inception, 0.015524093392945852), (Skyfall, ... \n",
+ "4 [(Finding Nemo, 0.015524093392945852), (Madaga... \n",
"\n",
" linear-combo-bm25-cosine \n",
- "0 [(The Incredibles, 1.09860771359), (Despicable... \n",
- "1 [(The Incredibles, 0.454752063751), (Despicabl... \n",
- "2 [(The Incredibles, 1.05887192239), (The Avenge... \n",
+ "0 [(The Incredibles, 0.55239223002), (Toy Story,... \n",
+ "1 [(The Incredibles, 0.45475204289), (Despicable... \n",
+ "2 [(The Incredibles, 0.603234915587), (The Aveng... \n",
"3 [(Skyfall, 0.443840536475), (Despicable Me, 0.... \n",
- "4 [(Madagascar, 0.442132198811), (Despicable Me,... "
+ "4 [(Madagascar, 0.442132219672), (Despicable Me,... "
]
},
- "execution_count": 84,
+ "execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
@@ -1109,20 +1075,20 @@
},
{
"cell_type": "code",
- "execution_count": 85,
+ "execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array(['Show me movies set in dystopian or post-apocalyptic worlds',\n",
- " list([('Mad Max: Fury Road', -3.4906280040740967), ('Despicable Me', -11.051526069641113), ('The Incredibles', -11.315656661987305), ('Black Widow', -10.880638122558594)]),\n",
- " list([('Mad Max: Fury Road', 0.01602086438152012), ('Skyfall', 0.015607940446650124), ('The Incredibles', 0.015237691001697792), ('Black Widow', 0.01513526119402985)]),\n",
- " list([('Mad Max: Fury Road', '0.452238592505'), ('The Incredibles', '0.445061504841'), ('Madagascar', '0.419015598297'), ('Despicable Me', '0.416218388081')])],\n",
+ " list([('Mad Max: Fury Road', -3.4906270503997803), ('Despicable Me', -11.051526069641113), ('The Incredibles', -11.315656661987305), ('Finding Nemo', -10.880638122558594)]),\n",
+ " list([('The Incredibles', 0.01620835536753041), ('Finding Nemo', 0.013813068651778329), ('Mad Max: Fury Road', 0.011475409836065573), ('Madagascar', 0.01111111111111111)]),\n",
+ " list([('Mad Max: Fury Road', '0.452238571644'), ('The Incredibles', '0.445061463118'), ('Madagascar', '0.41901564002'), ('Despicable Me', '0.416218408942'), ('Skyfall', '0.411504244804'), ('The Avengers', '0.41121032536'), ('Black Widow', '0.410578364134'), ('The Lego Movie', '0.408463662863'), ('Monsters, Inc.', '0.392220926285'), ('Shrek', '0.390464815497')])],\n",
" dtype=object)"
]
},
- "execution_count": 85,
+ "execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
@@ -1142,11 +1108,16 @@
"- How to implement hybrid search queries using the Redis aggregation API\n",
"- How to perform client-side fusion and reranking techniques"
]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
}
],
"metadata": {
"kernelspec": {
- "display_name": "Python 3",
+ "display_name": "redis-ai-res",
"language": "python",
"name": "python3"
},
@@ -1160,7 +1131,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.11.9"
+ "version": "3.13.2"
}
},
"nbformat": 4,