|
40 | 40 | "metadata": {}, |
41 | 41 | "source": [ |
42 | 42 | "We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:\n", |
43 | | - "- Your **Elasticsearch Cloud ID**\n", |
| 43 | + "- Your **Elasticsearch Endpoint URL**\n", |
| 44 | + "- Your **Elasticsearch Endpoint Port number**\n", |
44 | 45 | "- An **API key**\n", |
45 | 46 | "\n", |
46 | | - "To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n", |
47 | | - "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key in a safe place once it is created it will be displayed only upon creation." |
| 47 | + "You can find your Endpoint URL and port number by visiting your Elasticsearch Overview page in Kibana.\n", |
| 48 | + "\n", |
| 49 | + "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key in a safe place, as it will be displayed only once upon creation." |
48 | 50 | ] |
49 | 51 | }, |
50 | 52 | { |
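For reference, the three values have roughly the following shape (illustrative placeholders only, not real credentials):

```python
# Illustrative placeholders -- substitute the values from your own deployment.
ELASTIC_ENDPOINT = "https://my-deployment.es.us-central1.gcp.cloud.es.io"  # endpoint URL from the Overview page
ELASTIC_PORT = "443"  # port shown alongside the endpoint (commonly 443 or 9243)
API_KEY = "base64-encoded-key-from-stack-management"  # the encoded key copied when it was created
```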
|
54 | 56 | "metadata": {}, |
55 | 57 | "outputs": [], |
56 | 58 | "source": [ |
57 | | - "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", |
| 59 | + "ELASTIC_ENDPOINT = getpass(\"Elastic Endpoint: \")\n", |
| 60 | + "ELASTIC_PORT = getpass(\"Port\")\n", |
58 | 61 | "API_KEY = getpass(\"Elastic Api Key: \")\n", |
59 | 62 | "\n", |
60 | 63 | "es_client = Elasticsearch(\n", |
61 | | - " cloud_id=ELASTIC_CLOUD_ID,\n", |
| 64 | + " \":\".join([ELASTIC_ENDPOINT, ELASTIC_PORT]),\n", |
62 | 65 | " api_key=API_KEY,\n", |
63 | 66 | ")\n", |
64 | 67 | "\n", |
|
83 | 86 | "source": [ |
84 | 87 | "### Step 1: Acquire Basic Configurations\n", |
85 | 88 | "\n", |
86 | | - "The first order of business is to establish what Crawlers you have and their basic configuration details.\n", |
| 89 | + "First, we need to establish what Crawlers you have and their basic configuration details.\n", |
87 | 90 | "This migration notebook will attempt to pull configurations for every distinct Crawler you have in your Elasticsearch instance." |
88 | 91 | ] |
89 | 92 | }, |
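Conceptually, this step boils down to a search against the internal index that stores Crawler configurations. The sketch below is illustrative only; the index pattern is an assumption, and the notebook's own cell targets the correct internal indices for you.

```python
# Illustrative sketch -- the index pattern below is an assumption, not the notebook's exact query.
resp = es_client.search(index=".ent-search-actastic-crawler2*", size=100)
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_source"].get("index_name"))
```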
|
142 | 145 | "id": "2804d02b-870d-4173-9c5f-6d5eb434d49b", |
143 | 146 | "metadata": {}, |
144 | 147 | "source": [ |
145 | | - "**Before continuing, please verify in the output above that the correct number of Crawlers was found!**\n", |
| 148 | + "**Before continuing, please verify in the output above that the correct number of Crawlers was found.**\n", |
146 | 149 | "\n", |
147 | 150 | "Now that we have some basic data about your Crawlers, let's use this information to get more configuration values!" |
148 | 151 | ] |
|
154 | 157 | "source": [ |
155 | 158 | "### Step 2: URLs, Sitemaps, and Crawl Rules\n", |
156 | 159 | "\n", |
157 | | - "In this cell, we will need to query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawling rules." |
| 160 | + "In the next cell, we will need to query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawling rules." |
158 | 161 | ] |
159 | 162 | }, |
160 | 163 | { |
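The data collected here is grouped per Crawler and per domain. As a rough illustration (the exact field names in the notebook's in-memory structure may differ), the result looks something like:

```python
# Assumed shape, for illustration only.
example_crawler = {
    "output_index": "search-my-crawler",
    "domains": [
        {
            "url": "https://www.example.com",
            "seed_urls": ["https://www.example.com/blog"],
            "sitemap_urls": ["https://www.example.com/sitemap.xml"],
            "crawl_rules": [
                {"policy": "deny", "type": "begins", "pattern": "/archive"},
            ],
        }
    ],
}
```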
|
244 | 247 | "source": [ |
245 | 248 | "### Step 3: Extracting the Extraction Rules\n", |
246 | 249 | "\n", |
247 | | - "In the following cell, we will be acquiring any extraction rules you may have set in your Elastic Crawlers." |
| 250 | + "In the next cell, we will find any extraction rules you set for your Elastic Crawlers." |
248 | 251 | ] |
249 | 252 | }, |
250 | 253 | { |
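Extraction rules carry over to Open Crawler as extraction rulesets. For orientation, a single converted rule could look roughly like the following; the field names follow Open Crawler's documented YAML format, but treat them as illustrative:

```python
# Illustrative only: roughly how one extraction rule might look once converted.
example_ruleset = {
    "url_filters": [{"type": "begins", "pattern": "/blog"}],
    "rules": [
        {
            "action": "extract",
            "field_name": "author",
            "selector": ".author-name",
            "join_as": "string",
            "source": "html",
        }
    ],
}
```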
|
324 | 327 | "source": [ |
325 | 328 | "### Step 4: Schedules\n", |
326 | 329 | "\n", |
327 | | - "In the upcoming cell, we will be gathering any specific time schedules your Crawlers have set. Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored." |
| 330 | + "In the next cell, we will gather any specific time schedules your Crawlers have set. Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored." |
328 | 331 | ] |
329 | 332 | }, |
330 | 333 | { |
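In other words, only cron-style (specific time) schedules are carried over. A simplified version of that filtering logic, with assumed field names, might look like:

```python
# Simplified sketch with assumed field names: keep cron expressions, drop interval schedules.
def convert_schedule(schedule):
    if schedule is None or schedule.get("type") == "interval":
        return None  # interval schedules are not supported by Open Crawler and are ignored
    return schedule.get("cron_expression")
```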
|
390 | 393 | "source": [ |
391 | 394 | "### Step 5: Creating the Open Crawler YAML configuration files\n", |
392 | 395 | "\n", |
393 | | - "In this final step, we will be creating the actual YAML files you need to get up and running with Open Crawler!\n", |
| 396 | + "In this final step, we will create the actual YAML files you need to get up and running with Open Crawler!\n", |
394 | 397 | "\n", |
395 | | - "The upcoming cell performs some final transformations to the in-memory data structure that is keeping track of your configurations." |
| 398 | + "The next cell performs some final transformations to the in-memory data structure that is keeping track of your configurations." |
396 | 399 | ] |
397 | 400 | }, |
398 | 401 | { |
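Once the in-memory structure is final, producing the files is essentially one YAML dump per Crawler. A minimal sketch, assuming a hypothetical dict of configurations keyed by Crawler name and using PyYAML, would be:

```python
import yaml  # PyYAML

# Minimal sketch: `crawler_configs` is a hypothetical {crawler_name: config_dict} mapping.
crawler_configs = {"my-crawler": {"domains": [{"url": "https://www.example.com"}]}}
for name, config in crawler_configs.items():
    with open(f"{name}-config.yml", "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False, sort_keys=False)
```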
|
423 | 426 | "source": [ |
424 | 427 | "#### **Wait! Before we continue onto creating our YAML files, we're going to need your input on a few things.**\n", |
425 | 428 | "\n", |
426 | | - "In the following cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_:\n", |
| 429 | + "In the next cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_. This instance can be Elastic Cloud Hosted, Serverless, or a local instance.\n", |
| 430 | + "\n", |
427 | 431 | "- The Elasticsearch endpoint URL\n", |
428 | 432 | "- The port number of your Elasticsearch endpoint _(Optional, will default to 443 if left blank)_\n", |
429 | 433 | "- An API key" |
|
436 | 440 | "metadata": {}, |
437 | 441 | "outputs": [], |
438 | 442 | "source": [ |
439 | | - "ENDPOINT = input(\"Elasticsearch endpoint URL: \")\n", |
440 | | - "PORT = input(\"[OPTIONAL] Elasticsearch endpoint port number: \")\n", |
| 443 | + "ENDPOINT = getpass(\"Elasticsearch endpoint URL: \")\n", |
| 444 | + "PORT = getpass(\"[OPTIONAL] Elasticsearch endpoint port number: \")\n", |
441 | 445 | "OUTPUT_API_KEY = getpass(\"Elasticsearch API key: \")\n", |
442 | 446 | "\n", |
443 | 447 | "# set the above values in each Crawler's configuration\n", |
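For context, these values end up in the Elasticsearch output section of each generated Open Crawler configuration. An illustrative (not exact) rendering of that section, using the variables from the cell above:

```python
# Illustrative only: roughly how the collected values map into an Open Crawler config.
elasticsearch_output = {
    "output_sink": "elasticsearch",
    "elasticsearch": {
        "host": ENDPOINT,
        "port": int(PORT) if PORT else 443,  # defaults to 443 when left blank
        "api_key": OUTPUT_API_KEY,
    },
}
```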
|
523 | 527 | ] |
524 | 528 | }, |
525 | 529 | { |
526 | | - "cell_type": "code", |
527 | | - "execution_count": null, |
528 | | - "id": "7aaee4e8-c388-4b22-a8ad-a657550d92c7", |
| 530 | + "cell_type": "markdown", |
| 531 | + "id": "dd4d18de-7b3b-4ebe-831b-c96bc55d6eb9", |
529 | 532 | "metadata": {}, |
530 | | - "outputs": [], |
531 | | - "source": [] |
| 533 | + "source": [ |
| 534 | + "### Next Steps\n", |
| 535 | + "\n", |
| 536 | + "Now that the YAML files have been generated, you can visit the Open Crawler GitHub repository to learn more about how to deploy Open Crawler: https://github.com/elastic/crawler#quickstart\n", |
| 537 | + "\n", |
| 538 | + "If you find any problems with this Notebook, please feel free to create an issue in the elasticsearch-labs repository: https://github.com/elastic/elasticsearch-labs/issues" |
| 539 | + ] |
532 | 540 | } |
533 | 541 | ], |
534 | 542 | "metadata": { |
|