
Commit da0e41b

Update to use exclude rules, match queries
1 parent d5dff87 commit da0e41b

1 file changed

notebooks/enterprise-search/app-search-engine-exporter.ipynb

Lines changed: 76 additions & 63 deletions
@@ -66,7 +66,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": null,
    "metadata": {},
    "outputs": [],
    "source": [
@@ -95,12 +95,12 @@
     "\n",
     "You can find your App Search endpoint and your search private key from the `Credentials` menu inside your App Search instance in Kibana.\n",
     "\n",
-    "Also note here, we define our `ENGINE_NAME`. For this examplem we are using the `national-parks-demo` sample engine that is available within App Search."
+    "Also note here, we define our `ENGINE_NAME`. For this example, we are using the `national-parks-demo` sample engine that is available within App Search."
     ]
    },
    {
     "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": null,
     "metadata": {},
     "outputs": [],
     "source": [
@@ -129,7 +129,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": null,
    "metadata": {
     "id": "kpV8K5jHvRK6"
    },
@@ -173,9 +173,9 @@
     "\n",
     "Next, we will export any curations that may be in our App Search engine.\n",
     "\n",
-    "To export App Search curations we will use Elasticsearch [query rules](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-using-query-rules.html).\n",
-    "At the moment of writing this notebook Elasticsearch query rules only allow for pinning results unlike App Search curations that also allow excluding results.\n",
-    "For this reason we will only export pinned results. The code below will create the necessary `query_rules` to achieve this. Note that there is a default soft limit of 100 curations for `query_rules` that can be configured up to a hard limit of 1,000."
+    "To export App Search curations we will use Elasticsearch [query rules](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-using-query-rules.html). The code below will create the necessary `query_rules` to achieve this. Note that there is a default soft limit of 100 curations for `query_rules` that can be configured up to a hard limit of 1,000.\n",
+    "\n",
+    "NOTE: This example outputs query rules requiring `exact` matches, which are case-sensitive. If you need typo tolerance, consider using `fuzzy`. If you need different case values, consider adding multiple values to your criteria."
     ]
    },
    {
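The NOTE above mentions `fuzzy` criteria and multi-value criteria as workarounds for case sensitivity. A minimal illustrative sketch of such a rule (not from the notebook; the rule id, query values and document id are made up) might look like:

# Hypothetical rule illustrating the NOTE above: a "fuzzy" criteria type gives
# typo tolerance, and listing several values covers different casings of a query.
example_rule = {
    "rule_id": "yosemite-curation-pinned",      # made-up id
    "type": "pinned",
    "criteria": [
        {
            "type": "fuzzy",                     # typo-tolerant matching
            "metadata": "user_query",
            "values": ["yosemite", "Yosemite"],  # multiple case variants
        }
    ],
    "actions": {"ids": ["park_yosemite"]},       # made-up document id
}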
@@ -187,24 +187,56 @@
     "query_rules = []\n",
     "\n",
     "for curation in app_search.list_curations(engine_name=ENGINE_NAME).body[\"results\"]:\n",
-    "    query_rules.append(\n",
-    "        {\n",
-    "            \"rule_id\": curation[\"id\"],\n",
-    "            \"type\": \"pinned\",\n",
-    "            \"criteria\": [\n",
-    "                {\n",
-    "                    \"type\": \"exact\",\n",
-    "                    \"metadata\": \"user_query\",\n",
-    "                    \"values\": curation[\"queries\"],\n",
-    "                }\n",
-    "            ],\n",
-    "            \"actions\": {\"ids\": curation[\"promoted\"]},\n",
-    "        }\n",
-    "    )\n",
+    "    if (curation[\"promoted\"]):\n",
+    "        query_rules.append(\n",
+    "            {\n",
+    "                \"rule_id\": curation[\"id\"] + \"-pinned\",\n",
+    "                \"type\": \"pinned\",\n",
+    "                \"criteria\": [\n",
+    "                    {\n",
+    "                        \"type\": \"exact\",\n",
+    "                        \"metadata\": \"user_query\",\n",
+    "                        \"values\": curation[\"queries\"],\n",
+    "                    }\n",
+    "                ],\n",
+    "                \"actions\": {\"ids\": curation[\"promoted\"]},\n",
+    "            }\n",
+    "        )\n",
+    "    if(curation[\"hidden\"]):\n",
+    "        query_rules.append(\n",
+    "            {\n",
+    "                \"rule_id\": curation[\"id\"] + \"-exclude\",\n",
+    "                \"type\": \"exclude\",\n",
+    "                \"criteria\": [\n",
+    "                    {\n",
+    "                        \"type\": \"exact\",\n",
+    "                        \"metadata\": \"user_query\",\n",
+    "                        \"values\": curation[\"queries\"],\n",
+    "                    }\n",
+    "                ],\n",
+    "                \"actions\": {\"ids\": curation[\"hidden\"]},\n",
+    "            }\n",
+    "        )\n",
     "\n",
     "elasticsearch.query_rules.put_ruleset(ruleset_id=ENGINE_NAME, rules=query_rules)"
     ]
    },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's take a quick look at the query rules we've migrated. We'll do this via the `GET _query_rules/ENGINE_NAME` endpoint. Note that curations with both pinned and hidden documents will be represented as two rules in the ruleset."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(json.dumps(elasticsearch.query_rules.get_ruleset(ruleset_id=ENGINE_NAME).body, indent=2))"
+   ]
+  },
    {
     "cell_type": "markdown",
     "metadata": {
@@ -215,7 +247,15 @@
     "\n",
     "We recommend reindexing your App Search engine data into a new Elasticsearch index instead of reusing the existing one. This allows you to update the index mapping to take advantage of modern features like semantic search and the newly created Elasticsearch synonym set.\n",
     "\n",
-    "App Search has the following data types: `text`, `number`, `date` and `geolocation`. Each of these types is mapped to Elasticsearch field types.\n",
+    "App Search has the following data types:\n",
+    "\n",
+    "- `text`\n",
+    "- `number`\n",
+    "- `date`\n",
+    "- `geolocation`\n",
+    " \n",
+    "Each of these types is mapped to Elasticsearch field types.\n",
+    "\n",
     "We can take a closer look at how App Search field types are mapped to Elasticsearch fields, by using the [`GET mapping API`](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-mapping.html).\n",
     "For App Search engines, the associated Elasticsearch index name is `.ent-search-engine-documents-[ENGINE_NAME]`, e.g. `.ent-search-engine-documents-national-parks-demo` for the App Search sample engine `national-parks-demo`.\n",
     "One thing to notice is how App Search uses [multi-fields](https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html) in Elasticsearch that allow for quickly changing the field type in App Search without requiring reindexing by creating subfields for each type of supported field:\n",
@@ -578,38 +618,11 @@
    "source": [
     "# Add semantic text fields for semantic search (optional)\n",
     "\n",
-    "One of the advantages of exporting our index directly to Elasticsearch is that we can easily perform semantic search with ELSER. To do this, we'll need to add an inference endpoint using ELSER, and a `semantic_text` field to our index to use it.\n",
+    "One of the advantages of exporting our index directly to Elasticsearch is that we can easily perform semantic search with ELSER. To do this, we'll need to add a `semantic_text` field to our index to use it. We will set up a `semantic_text` field using our default ELSER endpoint.\n",
     "\n",
     "Note that to use this feature, your cluster must have at least one ML node set up with enough resources allocated to it.\n",
     "\n",
-    "If you have not already, be sure that your ELSER v2 model is [setup and deployed](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html).\n",
-    "\n",
-    "Let's first start by creating our inference endpoint using the [Create inference API]](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# delete our inference endpoint if it is already created\n",
-    "if elasticsearch.inference.get(inference_id=\"elser_inference_endpoint\"):\n",
-    "    elasticsearch.inference.delete(inference_id=\"elser_inference_endpoint\")\n",
-    "\n",
-    "# and create our endpoint using the ELSER v2 model\n",
-    "elasticsearch.inference.put(\n",
-    "    inference_id=\"elser_inference_endpoint\",\n",
-    "    inference_config={\n",
-    "        \"service\": \"elasticsearch\",\n",
-    "        \"service_settings\": {\n",
-    "            \"model_id\": \".elser_model_2_linux-x86_64\",\n",
-    "            \"num_allocations\": 1,\n",
-    "            \"num_threads\": 1,\n",
-    "        },\n",
-    "    },\n",
-    "    task_type=\"sparse_embedding\",\n",
-    ")"
+    "If you do not have an ELSER endpoint running, it will be automatically downloaded, deployed and started for you when you use `semantic_text`. This means the first few commands may take a while as the model loads."
     ]
    },
    {
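As background for the change above, the minimal mapping for a `semantic_text` field that relies on the default ELSER endpoint is just the type. This is an illustrative sketch; the index and field names are made up.

# Minimal sketch: a semantic_text field with no explicit inference_id falls
# back to the cluster's default ELSER endpoint, deployed on first use.
elasticsearch.indices.create(
    index="semantic-text-demo",  # made-up index name
    mappings={
        "properties": {
            "description_semantic": {"type": "semantic_text"},
        }
    },
)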
@@ -637,8 +650,7 @@
     "for field_name in SEMANTIC_TEXT_FIELDS:\n",
     "    semantic_field_name = field_name + \"_semantic\"\n",
     "    mapping[semantic_field_name] = {\n",
-    "        \"type\": \"semantic_text\",\n",
-    "        \"inference_id\": \"elser_inference_endpoint\",\n",
+    "        \"type\": \"semantic_text\"\n",
     "    }\n",
     "\n",
     "# and for our text fields, add a \"copy_to\" directive to copy the text to the semantic_text field\n",
@@ -778,7 +790,7 @@
     "\n",
     "For the results, we sort on our score descending as the primary sort, with the document id as the secondary.\n",
     "\n",
-    "We apply highlights to our results, request a return size of the top 10 hits, and for each hit, return the result fields."
+    "We apply highlights to returned text search descriptions, request a return size of the top 10 hits, and for each hit, return the result fields."
     ]
    },
    {
@@ -826,7 +838,7 @@
     "        \"order\": \"score\",\n",
     "        \"encoder\": \"html\",\n",
     "        \"require_field_match\": False,\n",
-    "        \"fields\": {},\n",
+    "        \"fields\": { \"description\" : { \"pre_tags\" : [\"<em>\"], \"post_tags\" : [\"</em>\"] } },\n",
     "    },\n",
     "    \"size\": 10,\n",
     "    \"_source\": result_fields,\n",
@@ -849,7 +861,7 @@
    "outputs": [],
    "source": [
     "results = elasticsearch.search(\n",
-    "    index=SOURCE_INDEX,\n",
+    "    index=DEST_INDEX,\n",
     "    query=app_search_query_payload[\"query\"],\n",
     "    highlight=app_search_query_payload[\"highlight\"],\n",
     "    source=app_search_query_payload[\"_source\"],\n",
@@ -866,7 +878,9 @@
     "### How to do semantic search using ELSER with semantic text fields\n",
     "\n",
     "If you [enabled and reindexed your data with ELSER](#add-sparse_vector-fields-for-semantic-search-optional), we can now use this to do semantic search.\n",
-    "For each `semantic_text` field type, we can define a [semantic query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-semantic-query.html) to easily perform a semantic search on these fields.\n"
+    "For each `semantic_text` field type, we can define a [match query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html) to easily perform a semantic search on these fields.\n",
+    "\n",
+    "NOTE: For Elasticsearch versions prior to 8.18, use a [semantic query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-semantic-query.html) instead to perform semantic search on these fields.\n"
     ]
    },
    {
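For readers on older clusters, the two query shapes mentioned in that NOTE differ only slightly. The comparison below is editorial; it assumes the notebook's `QUERY_STRING` variable and uses an illustrative field name.

# Editorial comparison of the two equivalent forms on a semantic_text field.
match_form = {"match": {"description_semantic": QUERY_STRING}}  # 8.18 and later
semantic_form = {
    "semantic": {"field": "description_semantic", "query": QUERY_STRING}  # earlier 8.x
}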
@@ -883,9 +897,8 @@
     "    semantic_field_name = field_name + \"_semantic\"\n",
     "    semantic_text_queries.append(\n",
     "        {\n",
-    "            \"semantic\": {\n",
-    "                \"field\": semantic_field_name,\n",
-    "                \"query\": QUERY_STRING,\n",
+    "            \"match\": {\n",
+    "                semantic_field_name: QUERY_STRING\n",
     "            }\n",
     "        }\n",
     "    )\n",
@@ -926,7 +939,7 @@
     "    \"should\": [\n",
     "        // multi_match query with best_fields from App Search generated query\n",
     "        // multi_match query with cross_fields from App Search generated query\n",
-    "        // text_expansion queries for sparse_vector fields\n",
+    "        // match queries for semantic_text fields\n",
     "    ]\n",
     "  }\n",
     "} \n",
@@ -960,7 +973,7 @@
    "outputs": [],
    "source": [
     "results = elasticsearch.search(\n",
-    "    index=SOURCE_INDEX,\n",
+    "    index=DEST_INDEX,\n",
     "    query=payload[\"query\"],\n",
     "    highlight=payload[\"highlight\"],\n",
     "    source=payload[\"_source\"],\n",
@@ -969,7 +982,7 @@
     "    min_score=1,\n",
     ")\n",
     "\n",
-    "print(f\"Text expansion query results:\\n{json.dumps(results.body, indent=2)}\\n\")"
+    "print(f\"Semantic query results:\\n{json.dumps(results.body, indent=2)}\\n\")"
     ]
    }
   ],
