Commit 266c297

Copyedit fixes + initial connection to ES instance now requires endpoint URL and port, not Cloud ID
1 parent 90f5eed

1 file changed (+28 -20 lines)

notebooks/enterprise-search/elastic-crawler-to-open-crawler-migration.ipynb

@@ -40,11 +40,13 @@
  "metadata": {},
  "source": [
   "We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:\n",
-  "- Your **Elasticsearch Cloud ID**\n",
+  "- Your **Elasticsearch Endpoint URL**\n",
+  "- Your **Elasticsearch Endpoint Port number**\n",
   "- An **API key**\n",
   "\n",
-  "To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n",
-  "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key in a safe place once it is created it will be displayed only upon creation."
+  "You can find your Endpoint URL and port number by visiting your Elasticsearch Overview page in Kibana.\n",
+  "\n",
+  "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key in a safe place, as it will be displayed only once upon creation."
  ]
 },
 {
@@ -54,11 +56,12 @@
  "metadata": {},
  "outputs": [],
  "source": [
-  "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n",
+  "ELASTIC_ENDPOINT = getpass(\"Elastic Endpoint: \")\n",
+  "ELASTIC_PORT = getpass(\"Port\")\n",
   "API_KEY = getpass(\"Elastic Api Key: \")\n",
   "\n",
   "es_client = Elasticsearch(\n",
-  "    cloud_id=ELASTIC_CLOUD_ID,\n",
+  "    \":\".join([ELASTIC_ENDPOINT, ELASTIC_PORT]),\n",
   "    api_key=API_KEY,\n",
   ")\n",
   "\n",
@@ -83,7 +86,7 @@
  "source": [
   "### Step 1: Acquire Basic Configurations\n",
   "\n",
-  "The first order of business is to establish what Crawlers you have and their basic configuration details.\n",
+  "First, we need to establish what Crawlers you have and their basic configuration details.\n",
   "This migration notebook will attempt to pull configurations for every distinct Crawler you have in your Elasticsearch instance."
  ]
 },
@@ -142,7 +145,7 @@
  "id": "2804d02b-870d-4173-9c5f-6d5eb434d49b",
  "metadata": {},
  "source": [
-  "**Before continuing, please verify in the output above that the correct number of Crawlers was found!**\n",
+  "**Before continuing, please verify in the output above that the correct number of Crawlers was found.**\n",
   "\n",
   "Now that we have some basic data about your Crawlers, let's use this information to get more configuration values!"
  ]
@@ -154,7 +157,7 @@
  "source": [
   "### Step 2: URLs, Sitemaps, and Crawl Rules\n",
   "\n",
-  "In this cell, we will need to query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawling rules."
+  "In the next cell, we will need to query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawling rules."
  ]
 },
 {
@@ -244,7 +247,7 @@
  "source": [
   "### Step 3: Extracting the Extraction Rules\n",
   "\n",
-  "In the following cell, we will be acquiring any extraction rules you may have set in your Elastic Crawlers."
+  "In the next cell, we will find any extraction rules you set for your Elastic Crawlers."
  ]
 },
 {
@@ -324,7 +327,7 @@
  "source": [
   "### Step 4: Schedules\n",
   "\n",
-  "In the upcoming cell, we will be gathering any specific time schedules your Crawlers have set. Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored."
+  "In the next cell, we will gather any specific time schedules your Crawlers have set. Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored."
  ]
 },
 {
@@ -390,9 +393,9 @@
  "source": [
   "### Step 5: Creating the Open Crawler YAML configuration files\n",
   "\n",
-  "In this final step, we will be creating the actual YAML files you need to get up and running with Open Crawler!\n",
+  "In this final step, we will create the actual YAML files you need to get up and running with Open Crawler!\n",
   "\n",
-  "The upcoming cell performs some final transformations to the in-memory data structure that is keeping track of your configurations."
+  "The next cell performs some final transformations to the in-memory data structure that is keeping track of your configurations."
  ]
 },
 {
@@ -423,7 +426,8 @@
  "source": [
   "#### **Wait! Before we continue onto creating our YAML files, we're going to need your input on a few things.**\n",
   "\n",
-  "In the following cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_:\n",
+  "In the next cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_. This instance can be Elastic Cloud Hosted, Serverless, or a local instance.\n",
+  "\n",
   "- The Elasticsearch endpoint URL\n",
   "- The port number of your Elasticsearch endpoint _(Optional, will default to 443 if left blank)_\n",
   "- An API key"
@@ -436,8 +440,8 @@
  "metadata": {},
  "outputs": [],
  "source": [
-  "ENDPOINT = input(\"Elasticsearch endpoint URL: \")\n",
-  "PORT = input(\"[OPTIONAL] Elasticsearch endpoint port number: \")\n",
+  "ENDPOINT = getpass(\"Elasticsearch endpoint URL: \")\n",
+  "PORT = getpass(\"[OPTIONAL] Elasticsearch endpoint port number: \")\n",
   "OUTPUT_API_KEY = getpass(\"Elasticsearch API key: \")\n",
   "\n",
   "# set the above values in each Crawler's configuration\n",
@@ -523,12 +527,16 @@
   ]
  },
  {
-  "cell_type": "code",
-  "execution_count": null,
-  "id": "7aaee4e8-c388-4b22-a8ad-a657550d92c7",
+  "cell_type": "markdown",
+  "id": "dd4d18de-7b3b-4ebe-831b-c96bc55d6eb9",
   "metadata": {},
-  "outputs": [],
-  "source": []
+  "source": [
+   "### Next Steps\n",
+   "\n",
+   "Now that the YAML files have been generated, you can visit the Open Crawler GitHub repository to learn more about how to deploy Open Crawler: https://github.com/elastic/crawler#quickstart\n",
+   "\n",
+   "If you find any problems with this Notebook, please feel free to create an issue in the elasticsearch-labs repository: https://github.com/elastic/elasticsearch-labs/issues"
+  ]
  },
 ],
 "metadata": {
