|
40 | 40 | "metadata": {}, |
41 | 41 | "source": [ |
42 | 42 | "We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:\n", |
43 | | - "- Your **Elasticsearch Cloud ID**\n", |
| 43 | + "- Your **Elasticsearch Endpoint URL**\n", |
| 44 | + "- Your **Elasticsearch Endpoint Port number**\n", |
44 | 45 | "- An **API key**\n", |
45 | 46 | "\n", |
46 | | - "To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n", |
47 | | - "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key in a safe place once it is created it will be displayed only upon creation." |
| 47 | + "You can find your Endpoint URL and port number by visiting your Elasticsearch Overview page in Kibana.\n", |
| 48 | + "\n", |
| 49 | + "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key in a safe place, as it will be displayed only once upon creation." |
48 | 50 | ] |
49 | 51 | }, |
50 | 52 | { |
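For reference, the three values have roughly the following shape (illustrative placeholders only, not real credentials):

```python
# Illustrative placeholders -- substitute the values from your own deployment.
ELASTIC_ENDPOINT = "https://my-deployment.es.us-central1.gcp.cloud.es.io"  # endpoint URL from the Overview page
ELASTIC_PORT = "443"  # port shown alongside the endpoint (commonly 443 or 9243)
API_KEY = "base64-encoded-key-from-stack-management"  # the encoded key copied when it was created
```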
|
54 | 56 | "metadata": {}, |
55 | 57 | "outputs": [], |
56 | 58 | "source": [ |
57 | | - "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", |
| 59 | + "ELASTIC_ENDPOINT = getpass(\"Elastic Endpoint: \")\n", |
| 60 | + "ELASTIC_PORT = getpass(\"Port\")\n", |
58 | 61 | "API_KEY = getpass(\"Elastic Api Key: \")\n", |
59 | 62 | "\n", |
60 | 63 | "es_client = Elasticsearch(\n", |
61 | | - " cloud_id=ELASTIC_CLOUD_ID,\n", |
| 64 | + " \":\".join([ELASTIC_ENDPOINT, ELASTIC_PORT]),\n", |
62 | 65 | " api_key=API_KEY,\n", |
63 | 66 | ")\n", |
64 | 67 | "\n", |
|
83 | 86 | "source": [ |
84 | 87 | "### Step 1: Acquire Basic Configurations\n", |
85 | 88 | "\n", |
86 | | - "The first order of business is to establish what Crawlers you have and their basic configuration details.\n", |
| 89 | + "First, we need to establish what Crawlers you have and their basic configuration details.\n", |
87 | 90 | "This migration notebook will attempt to pull configurations for every distinct Crawler you have in your Elasticsearch instance." |
88 | 91 | ] |
89 | 92 | }, |
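Conceptually, this step boils down to a search against the internal index that stores Crawler configurations. The sketch below is illustrative only; the index pattern is an assumption, and the notebook's own cell targets the correct internal indices for you.

```python
# Illustrative sketch -- the index pattern below is an assumption, not the notebook's exact query.
resp = es_client.search(index=".ent-search-actastic-crawler2*", size=100)
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_source"].get("index_name"))
```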
|
142 | 145 | "id": "2804d02b-870d-4173-9c5f-6d5eb434d49b", |
143 | 146 | "metadata": {}, |
144 | 147 | "source": [ |
145 | | - "**Before continuing, please verify in the output above that the correct number of Crawlers was found!**\n", |
| 148 | + "**Before continuing, please verify in the output above that the correct number of Crawlers was found.**\n", |
146 | 149 | "\n", |
147 | 150 | "Now that we have some basic data about your Crawlers, let's use this information to get more configuration values!" |
148 | 151 | ] |
|
154 | 157 | "source": [ |
155 | 158 | "### Step 2: URLs, Sitemaps, and Crawl Rules\n", |
156 | 159 | "\n", |
157 | | - "In this cell, we will need to query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawling rules." |
| 160 | + "In the next cell, we will need to query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawling rules." |
158 | 161 | ] |
159 | 162 | }, |
160 | 163 | { |
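The data collected here is grouped per Crawler and per domain. As a rough illustration (the exact field names in the notebook's in-memory structure may differ), the result looks something like:

```python
# Assumed shape, for illustration only.
example_crawler = {
    "output_index": "search-my-crawler",
    "domains": [
        {
            "url": "https://www.example.com",
            "seed_urls": ["https://www.example.com/blog"],
            "sitemap_urls": ["https://www.example.com/sitemap.xml"],
            "crawl_rules": [
                {"policy": "deny", "type": "begins", "pattern": "/archive"},
            ],
        }
    ],
}
```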
|
244 | 247 | "source": [ |
245 | 248 | "### Step 3: Extracting the Extraction Rules\n", |
246 | 249 | "\n", |
247 | | - "In the following cell, we will be acquiring any extraction rules you may have set in your Elastic Crawlers." |
| 250 | + "In the next cell, we will find any extraction rules you set for your Elastic Crawlers." |
248 | 251 | ] |
249 | 252 | }, |
250 | 253 | { |
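Extraction rules carry over to Open Crawler as extraction rulesets. For orientation, a single converted rule could look roughly like the following; the field names follow Open Crawler's documented YAML format, but treat them as illustrative:

```python
# Illustrative only: roughly how one extraction rule might look once converted.
example_ruleset = {
    "url_filters": [{"type": "begins", "pattern": "/blog"}],
    "rules": [
        {
            "action": "extract",
            "field_name": "author",
            "selector": ".author-name",
            "join_as": "string",
            "source": "html",
        }
    ],
}
```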
|
324 | 327 | "source": [ |
325 | 328 | "### Step 4: Schedules\n", |
326 | 329 | "\n", |
327 | | - "In the upcoming cell, we will be gathering any specific time schedules your Crawlers have set. Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored." |
| 330 | + "In the next cell, we will gather any specific time schedules your Crawlers have set. Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored." |
328 | 331 | ] |
329 | 332 | }, |
330 | 333 | { |
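In other words, only cron-style (specific time) schedules are carried over. A simplified version of that filtering logic, with assumed field names, might look like:

```python
# Simplified sketch with assumed field names: keep cron expressions, drop interval schedules.
def convert_schedule(schedule):
    if schedule is None or schedule.get("type") == "interval":
        return None  # interval schedules are not supported by Open Crawler and are ignored
    return schedule.get("cron_expression")
```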
|
390 | 393 | "source": [ |
391 | 394 | "### Step 5: Creating the Open Crawler YAML configuration files\n", |
392 | 395 | "\n", |
393 | | - "In this final step, we will be creating the actual YAML files you need to get up and running with Open Crawler!\n", |
| 396 | + "In this final step, we will create the actual YAML files you need to get up and running with Open Crawler!\n", |
394 | 397 | "\n", |
395 | | - "The upcoming cell performs some final transformations to the in-memory data structure that is keeping track of your configurations." |
| 398 | + "The next cell performs some final transformations to the in-memory data structure that is keeping track of your configurations." |
396 | 399 | ] |
397 | 400 | }, |
398 | 401 | { |
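Once the in-memory structure is final, producing the files is essentially one YAML dump per Crawler. A minimal sketch, assuming a hypothetical dict of configurations keyed by Crawler name and using PyYAML, would be:

```python
import yaml  # PyYAML

# Minimal sketch: `crawler_configs` is a hypothetical {crawler_name: config_dict} mapping.
crawler_configs = {"my-crawler": {"domains": [{"url": "https://www.example.com"}]}}
for name, config in crawler_configs.items():
    with open(f"{name}-config.yml", "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False, sort_keys=False)
```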
|
423 | 426 | "source": [ |
424 | 427 | "#### **Wait! Before we continue onto creating our YAML files, we're going to need your input on a few things.**\n", |
425 | 428 | "\n", |
426 | | - "In the following cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_:\n", |
| 429 | + "In the next cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_. This instance can be Elastic Cloud Hosted, Serverless, or a local instance.\n", |
| 430 | + "\n", |
427 | 431 | "- The Elasticsearch endpoint URL\n", |
428 | 432 | "- The port number of your Elasticsearch endpoint _(Optional, will default to 443 if left blank)_\n", |
429 | 433 | "- An API key" |
|
436 | 440 | "metadata": {}, |
437 | 441 | "outputs": [], |
438 | 442 | "source": [ |
439 | | - "ENDPOINT = input(\"Elasticsearch endpoint URL: \")\n", |
440 | | - "PORT = input(\"[OPTIONAL] Elasticsearch endpoint port number: \")\n", |
| 443 | + "ENDPOINT = getpass(\"Elasticsearch endpoint URL: \")\n", |
| 444 | + "PORT = getpass(\"[OPTIONAL] Elasticsearch endpoint port number: \")\n", |
441 | 445 | "OUTPUT_API_KEY = getpass(\"Elasticsearch API key: \")\n", |
442 | 446 | "\n", |
443 | 447 | "# set the above values in each Crawler's configuration\n", |
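For context, these values end up in the Elasticsearch output section of each generated Open Crawler configuration. An illustrative (not exact) rendering of that section, using the variables from the cell above:

```python
# Illustrative only: roughly how the collected values map into an Open Crawler config.
elasticsearch_output = {
    "output_sink": "elasticsearch",
    "elasticsearch": {
        "host": ENDPOINT,
        "port": int(PORT) if PORT else 443,  # defaults to 443 when left blank
        "api_key": OUTPUT_API_KEY,
    },
}
```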
|
523 | 527 | ] |
524 | 528 | }, |
525 | 529 | { |
526 | | - "cell_type": "code", |
527 | | - "execution_count": null, |
528 | | - "id": "7aaee4e8-c388-4b22-a8ad-a657550d92c7", |
| 530 | + "cell_type": "markdown", |
| 531 | + "id": "dd4d18de-7b3b-4ebe-831b-c96bc55d6eb9", |
529 | 532 | "metadata": {}, |
530 | | - "outputs": [], |
531 | | - "source": [] |
| 533 | + "source": [ |
| 534 | + "### Next Steps\n", |
| 535 | + "\n", |
| 536 | + "Now that the YAML files have been generated, you can visit the Open Crawler GitHub repository to learn more about how to deploy Open Crawler: https://github.com/elastic/crawler#quickstart\n", |
| 537 | + "\n", |
| 538 | + "If you find any problems with this Notebook, please feel free to create an issue in the elasticsearch-labs repository: https://github.com/elastic/elasticsearch-labs/issues" |
| 539 | + ] |
532 | 540 | } |
533 | 541 | ], |
534 | 542 | "metadata": { |
|