diff --git a/.gitignore b/.gitignore index 2b67b8b9..72be2849 100644 --- a/.gitignore +++ b/.gitignore @@ -73,3 +73,6 @@ script/licenses/**/_downloaded_*-LICENSE.txt bin/container-structure-test .artifacts .buildkite/publish/container-structure-test.yaml + +# Migration +/migration/.ipynb_checkpoints/* diff --git a/migration/crawler_migration.ipynb b/migration/crawler_migration.ipynb new file mode 100644 index 00000000..82eaf99a --- /dev/null +++ b/migration/crawler_migration.ipynb @@ -0,0 +1,742 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "89b4646f-6a71-44e0-97b9-846319bf0162", + "metadata": {}, + "source": [ + "## Hello, future Elastic Open Crawler user!\n", + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)]()\n", + "\n", + "This notebook is designed to help you migrate your Elastic Crawler configurations to Open Crawler-friendly YAML!\n", + "\n", + "We recommend running each cell individually and in order, as each cell depends on previous cells having been run.\n", + "\n", + "_If you are running this notebook inside Google Colab, or have not installed elasticsearch in your local environment yet, please run the following cell to make sure the Python `elasticsearch` client is installed._\n", + "\n", + "### Setup\n", + "First, let's start by making sure `elasticsearch` and other required dependencies are installed and imported by running the following cell:" + ] + }, + { + "cell_type": "code", + "execution_count": 675, + "id": "da411d2f-9aff-46af-845a-5fe9be19ea3c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Requirement already satisfied: elasticsearch in /Users/mattnowzari/repos/python/mn_venv/lib/python3.12/site-packages (8.17.1)\n", + "Requirement already satisfied: elastic-transport<9,>=8.15.1 in /Users/mattnowzari/repos/python/mn_venv/lib/python3.12/site-packages (from elasticsearch) (8.17.0)\n", + "Requirement already satisfied: urllib3<3,>=1.26.2 in /Users/mattnowzari/repos/python/mn_venv/lib/python3.12/site-packages (from elastic-transport<9,>=8.15.1->elasticsearch) (2.3.0)\n", + "Requirement already satisfied: certifi in /Users/mattnowzari/repos/python/mn_venv/lib/python3.12/site-packages (from elastic-transport<9,>=8.15.1->elasticsearch) (2024.12.14)\n" + ] + } + ], + "source": [ + "!pip install elasticsearch\n", + "\n", + "from getpass import getpass\n", + "from elasticsearch import Elasticsearch\n", + "\n", + "import os\n", + "import json\n", + "import yaml\n", + "import pprint\n" + ] + }, + { + "cell_type": "markdown", + "id": "f4131f88-9895-4c0e-8b0a-6ec7b3b45653", + "metadata": {}, + "source": [ + "We are going to need a few things from your Elasticsearch deployment before we can migrate your configurations:\n", + "- Your **Elasticsearch Cloud ID**\n", + "- An **API key**\n", + "\n", + "To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.\n", + "You can create a new API key from the Stack Management -> API keys menu in Kibana. Be sure to copy or write down your key and keep it in a safe place once it is created; it will only be displayed upon creation.\n",
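+ "\n", + "The cells below read your existing Crawler configurations from system indices such as `.ent-search-actastic-crawler2_configurations_v2` and `.elastic-connectors-v1`, so make sure the API key you provide has read access to those indices."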
+ ] + }, + { + "cell_type": "code", + "execution_count": 753, + "id": "08e6e3d2-62d3-4890-a6be-41fe0a931ef6", + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "Elastic Cloud ID: ········\n", + "Elastic Api Key: ········\n" + ] + }, + { + "data": { + "text/plain": [ + "'You Know, for Search'" + ] + }, + "execution_count": 753, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ELASTIC_CLOUD_ID = getpass(\"Elastic Cloud ID: \")\n", + "API_KEY = getpass(\"Elastic Api Key: \")\n", + "\n", + "es_client = Elasticsearch(\n", + " cloud_id=ELASTIC_CLOUD_ID,\n", + " api_key=API_KEY,\n", + ")\n", + "\n", + "# ping ES to make sure we have positive connection\n", + "es_client.info()['tagline']" + ] + }, + { + "cell_type": "markdown", + "id": "85f99942-58ae-437d-a72b-70b8d1f4432c", + "metadata": {}, + "source": [ + "Hopefully you received our tagline 'You Know, for Search'. If so, we are connected and ready to go!\n", + "\n", + "If not, please double-check your Cloud ID and API key that you provided above. " + ] + }, + { + "cell_type": "markdown", + "id": "a55236e7-19dc-4f4c-92b9-d10848dd6af9", + "metadata": {}, + "source": [ + "### Step 1: Acquire Basic Configurations\n", + "\n", + "The first order of business is to establish what Crawlers you have and their basic configuration details.\n", + "This migration notebook will attempt to pull configurations for every distinct Crawler you have in your Elasticsearch instance." + ] + }, + { + "cell_type": "code", + "execution_count": 816, + "id": "0a698b05-e939-42a5-aa31-51b1b1883e6f", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1. search-search-crawler-fully-loaded-8.18\n", + "2. search-daggerfall-unity-website-crawler-8.18\n", + "3. search-migration-crawler\n", + "4. search-basic\n", + " search-basic uses an interval schedule, which is not supported in Open Crawler!\n" + ] + } + ], + "source": [ + " # in-memory data structure that maintains current state of the configs we've pulled\n", + "inflight_configuration_data = {}\n", + "\n", + "crawler_configurations = es_client.search(\n", + " index=\".ent-search-actastic-crawler2_configurations_v2\",\n", + ")\n", + "\n", + "crawler_counter = 1\n", + "for configuration in crawler_configurations[\"hits\"][\"hits\"]:\n", + " source = configuration[\"_source\"]\n", + "\n", + " # extract values\n", + " crawler_oid = source[\"id\"]\n", + " output_index = source[\"index_name\"]\n", + "\n", + " print (f\"{crawler_counter}. 
{output_index}\")\n", + " crawler_counter += 1\n", + "\n", + " crawl_schedule = [] # either no schedule or a specific schedule - determined in Step 4\n", + " if source[\"use_connector_schedule\"] == False and source[\"crawl_schedule\"]: # an interval schedule is being used\n", + " print (f\" {output_index} uses an interval schedule, which is not supported in Open Crawler!\")\n", + "\n", + " # populate a temporary hashmap\n", + " temp_conf_map = {\n", + " \"output_index\": output_index,\n", + " \"schedule\": crawl_schedule\n", + " }\n", + " # pre-populate some necessary fields in preparation for upcoming steps\n", + " temp_conf_map[\"domains_temp\"] = {}\n", + " temp_conf_map[\"output_sink\"] = \"elasticsearch\"\n", + " temp_conf_map[\"full_html_extraction_enabled\"] = False\n", + " temp_conf_map[\"elasticsearch\"] = {\n", + " \"host\": \"\",\n", + " \"port\": \"\",\n", + " \"api_key\": \"\",\n", + " # \"username\": \"\",\n", + " # \"password\": \"\",\n", + " }\n", + " # populate the in-memory data structure\n", + " inflight_configuration_data[crawler_oid] = temp_conf_map" + ] + }, + { + "cell_type": "markdown", + "id": "2804d02b-870d-4173-9c5f-6d5eb434d49b", + "metadata": {}, + "source": [ + "**Before continuing, please verify in the output above that the correct number of Crawlers was found!**\n", + "\n", + "Now that we have some basic data about your Crawlers, let's use this information to get more configuration values!" + ] + }, + { + "cell_type": "markdown", + "id": "2b9e2da7-853c-40bd-9ee1-02c4d92b3b43", + "metadata": {}, + "source": [ + "### Step 2: URLs, Sitemaps, and Crawl Rules\n", + "\n", + "In this cell, we will need to query Elasticsearch for information about each Crawler's domain URLs, seed URLs, sitemaps, and crawling rules." + ] + }, + { + "cell_type": "code", + "execution_count": 817, + "id": "e1c64c3d-c8d7-4236-9ed9-c9b1cb5e7972", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.) Crawler ID 67b74f16204956a3ce9fd0a4\n", + " Domain https://www.speedhunters.com found!\n", + " Seed URls found: ['https://www.speedhunters.com/2025/01/the-mystery-of-the-hks-zero-r/', 'https://www.speedhunters.com/2025/02/daniel-arsham-eroded-porsche-911/', 'https://www.speedhunters.com/2025/02/5-plus-7-equals-v12-a-custom-bmw-super-saloon/']\n", + " Sitemap URLs found: ['https://www.speedhunters.com/post_tag-sitemap2.xml']\n", + "2.) Crawler ID 67b74f84204956efce9fd0b7\n", + " Domain https://www.dfworkshop.net found!\n", + " Seed URls found: ['https://www.dfworkshop.net/']\n", + " Crawl rules found: [{'policy': 'allow', 'type': 'begins', 'pattern': '/word'}, {'policy': 'deny', 'type': 'contains', 'pattern': 'DOS'}]\n", + " Domain https://www.speedhunters.com found!\n", + " Seed URls found: ['https://www.speedhunters.com/']\n", + " Crawl rules found: [{'policy': 'deny', 'type': 'begins', 'pattern': '/BMW'}]\n", + "3.) Crawler ID 67b7509b2049567f859fd0d4\n", + " Domain https://justinjackson.ca found!\n", + " Seed URls found: ['https://justinjackson.ca/']\n", + " Domain https://matt-nowzari.myportfolio.com found!\n", + " Seed URls found: ['https://matt-nowzari.myportfolio.com/']\n", + " Crawl rules found: [{'policy': 'deny', 'type': 'begins', 'pattern': '/The'}]\n", + "4.) 
Crawler ID 67b75aeb20495617d59fd0ea\n", + " Domain https://www.elastic.co found!\n", + " Seed URLs found: ['https://www.elastic.co/']\n" + ] + } + ], + "source": [ + "crawler_ids_to_query = inflight_configuration_data.keys()\n", + "\n", + "crawler_counter = 1\n", + "for crawler_oid in crawler_ids_to_query:\n", + " # query ES to get the crawler's domain configurations\n", + " crawler_domains = es_client.search(\n", + " index=\".ent-search-actastic-crawler2_domains\",\n", + " query={\"match\": {\"configuration_oid\": crawler_oid}},\n", + " _source=[\"name\",\n", + " \"configuration_oid\",\n", + " \"id\",\n", + " \"sitemaps\",\n", + " \"crawl_rules\",\n", + " \"seed_urls\",\n", + " \"auth\"]\n", + " )\n", + " print (f\"{crawler_counter}.) Crawler ID {crawler_oid}\")\n", + " crawler_counter += 1\n", + " \n", + " # for each domain the Crawler has, grab its config values\n", + " # and update the in-memory data structure\n", + " for domain_info in crawler_domains[\"hits\"][\"hits\"]:\n", + " source = domain_info[\"_source\"]\n", + "\n", + " # extract values\n", + " domain_oid = str(source[\"id\"])\n", + " domain_url = source[\"name\"]\n", + " seed_urls = source[\"seed_urls\"]\n", + " sitemap_urls = source[\"sitemaps\"]\n", + " crawl_rules = source[\"crawl_rules\"]\n", + "\n", + " print (f\" Domain {domain_url} found!\")\n", + " \n", + " # transform seed, sitemap, and crawl rules into arrays\n", + " seed_urls_list = []\n", + " for seed_obj in seed_urls:\n", + " seed_urls_list.append(seed_obj[\"url\"])\n", + "\n", + " sitemap_urls_list = []\n", + " for sitemap_obj in sitemap_urls:\n", + " sitemap_urls_list.append(sitemap_obj[\"url\"])\n", + "\n", + " crawl_rules_list = []\n", + " for crawl_rules_obj in crawl_rules:\n", + " crawl_rules_list.append({\n", + " \"policy\": crawl_rules_obj[\"policy\"],\n", + " \"type\": crawl_rules_obj[\"rule\"],\n", + " \"pattern\": crawl_rules_obj[\"pattern\"]\n", + " })\n", + "\n", + " # populate a temporary hashmap\n", + " temp_domain_conf = {\"url\": domain_url}\n", + " if seed_urls_list:\n", + " temp_domain_conf[\"seed_urls\"] = seed_urls_list\n", + " print (f\" Seed URLs found: {seed_urls_list}\")\n", + " if sitemap_urls_list:\n", + " temp_domain_conf[\"sitemap_urls\"] = sitemap_urls_list\n", + " print (f\" Sitemap URLs found: {sitemap_urls_list}\")\n", + " if crawl_rules_list:\n", + " temp_domain_conf[\"crawl_rules\"] = crawl_rules_list\n", + " print (f\" Crawl rules found: {crawl_rules_list}\")\n", + " \n", + " # populate the in-memory data structure\n", + " inflight_configuration_data[crawler_oid][\"domains_temp\"][domain_oid] = temp_domain_conf" + ] + }, + { + "cell_type": "markdown", + "id": "575c00ac-7c84-465e-83d7-aa51f8e5310d", + "metadata": {}, + "source": [ + "### Step 3: Extracting the Extraction Rules\n", + "\n", + "In the following cell, we will gather any extraction rules you may have set in your Elastic Crawlers.\n",
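+ "\n", + "For reference, each Elastic Crawler extraction rule is mapped into Open Crawler's `extraction_rulesets` YAML format. A migrated ruleset will end up looking roughly like the sketch below (the field values shown here are purely illustrative):\n", + "\n", + "```yaml\n", + "extraction_rulesets:\n", + "- url_filters:\n", + "  - type: begins\n", + "    pattern: /some-path/*\n", + "  rules:\n", + "  - action: set\n", + "    field_name: my_field\n", + "    selector: /html/body/a/@title\n", + "    join_as: string\n", + "    value: some_value\n", + "    source: html\n", + "```"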
+ ] + }, + { + "cell_type": "code", + "execution_count": 818, + "id": "61a7df7a-72ad-4330-a30c-da319befd55c", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4 total extraction rules found!\n" + ] + } + ], + "source": [ + "extraction_rules = es_client.search(\n", + " index=\".ent-search-actastic-crawler2_extraction_rules\",\n", + " _source=[\"configuration_oid\", \"domain_oid\", \"rules\", \"url_filters\"]\n", + ")\n", + "\n", + "extr_count = 0\n", + "for exr_rule in extraction_rules[\"hits\"][\"hits\"]:\n", + " source = exr_rule[\"_source\"]\n", + "\n", + " config_oid = source[\"configuration_oid\"]\n", + " domain_oid = source[\"domain_oid\"]\n", + " \n", + " all_rules = source[\"rules\"]\n", + " all_url_filters = source[\"url_filters\"]\n", + "\n", + " # extract url filters\n", + " url_filters = []\n", + " if all_url_filters:\n", + " url_filters = [{\n", + " \"type\": all_url_filters[0][\"filter\"],\n", + " \"pattern\": all_url_filters[0][\"pattern\"],\n", + " }]\n", + "\n", + " # extract rulesets\n", + " action_translation_map = {\n", + " \"fixed\": \"set\",\n", + " \"extracted\": \"extract\",\n", + " }\n", + " \n", + " ruleset = {}\n", + " if all_rules:\n", + " ruleset = [{\n", + " \"action\": action_translation_map[all_rules[0][\"content_from\"][\"value_type\"]],\n", + " \"field_name\": all_rules[0][\"field_name\"],\n", + " \"selector\": all_rules[0][\"selector\"],\n", + " \"join_as\": all_rules[0][\"multiple_objects_handling\"],\n", + " \"value\": all_rules[0][\"content_from\"][\"value\"],\n", + " \"source\": all_rules[0][\"source_type\"],\n", + " }]\n", + "\n", + " # populate the in-memory data structure\n", + " temp_extraction_rulesets = [{\n", + " \"url_filters\": url_filters,\n", + " \"rules\": ruleset,\n", + " }]\n", + " extr_count += 1\n", + " inflight_configuration_data[config_oid][\"domains_temp\"][domain_oid][\"extraction_rulesets\"] = temp_extraction_rulesets\n", + "\n", + "print (f\"{extr_count} total extraction rules found!\")" + ] + }, + { + "cell_type": "markdown", + "id": "538fb054-1399-4b88-bd1e-fef116491421", + "metadata": {}, + "source": [ + "### Step 4: Schedules\n", + "\n", + "In the upcoming cell, we will be gathering any specific time schedules your Crawlers have set. Please note that _interval time schedules_ are not supported by Open Crawler and will be ignored." + ] + }, + { + "cell_type": "code", + "execution_count": 819, + "id": "d880e081-f960-41c7-921e-26896f248eab", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.) 
Crawler search-daggerfall-unity-website-crawler-8.18 has the schedule '0 30 8 * * ?'\n" + ] + } + ], + "source": [ + "crawler_counter = 1\n", + "for crawler_oid, crawler_config in inflight_configuration_data.items():\n", + " output_index = crawler_config[\"output_index\"]\n", + " \n", + " existing_schedule_value = crawler_config[\"schedule\"]\n", + "\n", + " if not existing_schedule_value:\n", + " # query ES to get this Crawler's specific time schedule\n", + " schedules_result = es_client.search(\n", + " index=\".elastic-connectors-v1\",\n", + " query={\"match\": {\"index_name\": output_index}},\n", + " _source=[\"index_name\", \"scheduling\"]\n", + " )\n", + " # update schedule field with cron expression if specific time scheduling is enabled\n", + " if schedules_result[\"hits\"][\"hits\"][0][\"_source\"][\"scheduling\"][\"full\"][\"enabled\"]:\n", + " crawler_config[\"schedule\"] = schedules_result[\"hits\"][\"hits\"][0][\"_source\"][\"scheduling\"][\"full\"][\"interval\"]\n", + " print (f\"{crawler_counter}.) Crawler {output_index} has the schedule '{crawler_config['schedule']}'\")\n", + " crawler_counter += 1" + ] + }, + { + "cell_type": "markdown", + "id": "b1586df2-283d-435f-9b08-ba9fad3a7e0a", + "metadata": {}, + "source": [ + "### Step 5: Creating the Open Crawler YAML configuration files\n", + "\n", + "In this final step, we will be creating the actual YAML files you need to get up and running with Open Crawler!\n", + "\n", + "The upcoming cell performs some final transformations to the in-memory data structure that is keeping track of your configurations." + ] + }, + { + "cell_type": "code", + "execution_count": 820, + "id": "dd70f102-33ee-4106-8861-0aa0f9a223a1", + "metadata": {}, + "outputs": [], + "source": [ + "# Final transform of the in-memory data structure to a form we can dump to YAML\n", + "# for each crawler, collect all of its domain configurations into a list\n", + "for crawler_config in inflight_configuration_data.values():\n", + " all_crawler_domains = []\n", + " \n", + " for domain_config in crawler_config[\"domains_temp\"].values():\n", + " all_crawler_domains.append(domain_config)\n", + " # create a new key called \"domains\" that points to a list of domain configs only - no domain_oid values as keys\n", + " crawler_config[\"domains\"] = all_crawler_domains\n", + " # delete the temporary domain key\n", + " del crawler_config[\"domains_temp\"]" + ] + }, + { + "cell_type": "markdown", + "id": "e611a486-e12f-4951-ab95-ca54241a7a06", + "metadata": {}, + "source": [ + "#### **Wait! 
Before we continue on to creating our YAML files, we're going to need your input on a few things.**\n", + "\n", + "In the next cell, please enter the following details about the _Elasticsearch instance you will be using with Open Crawler_:\n", + "- The Elasticsearch endpoint URL\n", + "- The port number of your Elasticsearch endpoint\n", + "- An API key" + ] + }, + { + "cell_type": "code", + "execution_count": 826, + "id": "213880cc-cbf3-40d9-8c7d-6fcf6428c16b", + "metadata": {}, + "outputs": [ + { + "name": "stdin", + "output_type": "stream", + "text": [ + "Elasticsearch endpoint URL: https://4911ebad5ed44d149fe8ddad4a4b3751.us-west2.gcp.elastic-cloud.com\n", + "The Elasticsearch endpoint's port number: 443\n", + "Elasticsearch API key: ········\n" + ] + }, + { + "data": { + "text/plain": [ + "'You Know, for Search'" + ] + }, + "execution_count": 826, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ENDPOINT = input(\"Elasticsearch endpoint URL: \")\n", + "PORT = input(\"The Elasticsearch endpoint's port number: \")\n", + "OUTPUT_API_KEY = getpass(\"Elasticsearch API key: \")\n", + "\n", + "# set the above values in each Crawler's configuration\n", + "for crawler_config in inflight_configuration_data.values():\n", + " crawler_config[\"elasticsearch\"][\"host\"] = ENDPOINT\n", + " crawler_config[\"elasticsearch\"][\"port\"] = int(PORT)\n", + " crawler_config[\"elasticsearch\"][\"api_key\"] = OUTPUT_API_KEY\n", + "\n", + "# ping the destination ES instance to make sure we have a positive connection\n", + "es_client = Elasticsearch(\n", + " \":\".join([ENDPOINT, PORT]),\n", + " api_key=OUTPUT_API_KEY,\n", + ")\n", + "\n", + "es_client.info()['tagline']" + ] + }, + { + "cell_type": "markdown", + "id": "67dfc7c6-429e-42f0-ab08-2c84d72945cb", + "metadata": {}, + "source": [ + "#### **This is the final step! You have two options here:**\n", + "\n", + "- The \"Write to YAML\" cell will create one YAML file for each Crawler you have.\n", + "- The \"Print to output\" cell will print each Crawler's configuration YAML in the Notebook, so you can copy-paste them into your Open Crawler YAML files manually.\n", + "\n", + "Feel free to run both! You can run Option 2 first to see the output before running Option 1 to save the configs into YAML files.\n",
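+ "\n", + "Once the YAML files are ready, you can point an Open Crawler instance at them to start crawling, typically with something along the lines of `bin/crawler crawl config/my-crawler-config.yml` (the file path here is just an example); check the Open Crawler documentation for the exact command for your setup."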
+ ] + }, + { + "cell_type": "markdown", + "id": "7ca5ad33-364c-4d13-88fc-db19052363d5", + "metadata": {}, + "source": [ + "#### Option 1: Write to YAML file" + ] + }, + { + "cell_type": "code", + "execution_count": 827, + "id": "6adc53db-d781-4b72-a5f3-441364f354b8", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + " Wrote search-search-crawler-fully-loaded-8.18-config.yml to /Users/mattnowzari/repos/search_and_transform/crawler/migration/search-search-crawler-fully-loaded-8.18-config.yml\n", + " Wrote search-daggerfall-unity-website-crawler-8.18-config.yml to /Users/mattnowzari/repos/search_and_transform/crawler/migration/search-daggerfall-unity-website-crawler-8.18-config.yml\n", + " Wrote search-migration-crawler-config.yml to /Users/mattnowzari/repos/search_and_transform/crawler/migration/search-migration-crawler-config.yml\n", + " Wrote search-basic-config.yml to /Users/mattnowzari/repos/search_and_transform/crawler/migration/search-basic-config.yml\n" + ] + } + ], + "source": [ + "# Dump each Crawler's configuration into its own YAML file\n", + "for crawler_config in inflight_configuration_data.values():\n", + " base_dir = os.getcwd()\n", + " file_name = f\"{crawler_config['output_index']}-config.yml\" # autogen a custom filename\n", + " output_path = os.path.join(base_dir, file_name)\n", + "\n", + " if os.path.exists(base_dir):\n", + " with open(output_path, 'w') as file:\n", + " yaml.safe_dump(\n", + " crawler_config,\n", + " file,\n", + " sort_keys=False\n", + " )\n", + " print (f\" Wrote {file_name} to {output_path}\")" + ] + }, + { + "cell_type": "markdown", + "id": "35c56a2b-4acd-47f5-90e3-9dd39fa4383f", + "metadata": {}, + "source": [ + "#### Option 2: Print to output" + ] + }, + { + "cell_type": "code", + "execution_count": 828, + "id": "525aabb8-0537-4ba6-8109-109490dddafe", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "YAML config => search-search-crawler-fully-loaded-8.18-config.yml\n", + "--------\n", + "output_index: search-search-crawler-fully-loaded-8.18\n", + "schedule: []\n", + "output_sink: elasticsearch\n", + "full_html_extraction_enabled: false\n", + "elasticsearch:\n", + " host: https://4911ebad5ed44d149fe8ddad4a4b3751.us-west2.gcp.elastic-cloud.com\n", + " port: 443\n", + " api_key: d1RBMktaVUJRdEdzS0U4d05BSWI6ZDlGaE9PbWdrVER3VEZFVlBPWkxVQQ==\n", + "domains:\n", + "- url: https://www.speedhunters.com\n", + " seed_urls:\n", + " - https://www.speedhunters.com/2025/01/the-mystery-of-the-hks-zero-r/\n", + " - https://www.speedhunters.com/2025/02/daniel-arsham-eroded-porsche-911/\n", + " - https://www.speedhunters.com/2025/02/5-plus-7-equals-v12-a-custom-bmw-super-saloon/\n", + " sitemap_urls:\n", + " - https://www.speedhunters.com/post_tag-sitemap2.xml\n", + "\n", + "--------------------------------------------------------------------------------\n", + "YAML config => search-daggerfall-unity-website-crawler-8.18-config.yml\n", + "--------\n", + "output_index: search-daggerfall-unity-website-crawler-8.18\n", + "schedule: 0 30 8 * * ?\n", + "output_sink: elasticsearch\n", + "full_html_extraction_enabled: false\n", + "elasticsearch:\n", + " host: https://4911ebad5ed44d149fe8ddad4a4b3751.us-west2.gcp.elastic-cloud.com\n", + " port: 443\n", + " api_key: d1RBMktaVUJRdEdzS0U4d05BSWI6ZDlGaE9PbWdrVER3VEZFVlBPWkxVQQ==\n", + "domains:\n", + "- url: https://www.dfworkshop.net\n", + " seed_urls:\n", + " - https://www.dfworkshop.net/\n", + " crawl_rules:\n", + " - policy: 
allow\n", + " type: begins\n", + " pattern: /word\n", + " - policy: deny\n", + " type: contains\n", + " pattern: DOS\n", + " extraction_rulesets:\n", + " - url_filters:\n", + " - type: begins\n", + " pattern: /elderscrolls/*\n", + " rules:\n", + " - action: set\n", + " field_name: elder_field\n", + " selector: /elderscrolls/*\n", + " join_as: string\n", + " value: ping\n", + " source: url\n", + "- url: https://www.speedhunters.com\n", + " seed_urls:\n", + " - https://www.speedhunters.com/\n", + " crawl_rules:\n", + " - policy: deny\n", + " type: begins\n", + " pattern: /BMW\n", + "\n", + "--------------------------------------------------------------------------------\n", + "YAML config => search-migration-crawler-config.yml\n", + "--------\n", + "output_index: search-migration-crawler\n", + "schedule: []\n", + "output_sink: elasticsearch\n", + "full_html_extraction_enabled: false\n", + "elasticsearch:\n", + " host: https://4911ebad5ed44d149fe8ddad4a4b3751.us-west2.gcp.elastic-cloud.com\n", + " port: 443\n", + " api_key: d1RBMktaVUJRdEdzS0U4d05BSWI6ZDlGaE9PbWdrVER3VEZFVlBPWkxVQQ==\n", + "domains:\n", + "- url: https://justinjackson.ca\n", + " seed_urls:\n", + " - https://justinjackson.ca/\n", + "- url: https://matt-nowzari.myportfolio.com\n", + " seed_urls:\n", + " - https://matt-nowzari.myportfolio.com/\n", + " crawl_rules:\n", + " - policy: deny\n", + " type: begins\n", + " pattern: /The\n", + " extraction_rulesets:\n", + " - url_filters: []\n", + " rules:\n", + " - action: set\n", + " field_name: test_field\n", + " selector: /html/body/a/@title\n", + " join_as: string\n", + " value: some_rando_value\n", + " source: html\n", + "\n", + "--------------------------------------------------------------------------------\n", + "YAML config => search-basic-config.yml\n", + "--------\n", + "output_index: search-basic\n", + "schedule: []\n", + "output_sink: elasticsearch\n", + "full_html_extraction_enabled: false\n", + "elasticsearch:\n", + " host: https://4911ebad5ed44d149fe8ddad4a4b3751.us-west2.gcp.elastic-cloud.com\n", + " port: 443\n", + " api_key: d1RBMktaVUJRdEdzS0U4d05BSWI6ZDlGaE9PbWdrVER3VEZFVlBPWkxVQQ==\n", + "domains:\n", + "- url: https://www.elastic.co\n", + " seed_urls:\n", + " - https://www.elastic.co/\n", + "\n", + "--------------------------------------------------------------------------------\n" + ] + } + ], + "source": [ + "for crawler_config in inflight_configuration_data.values():\n", + " yaml_out = yaml.safe_dump(\n", + " crawler_config,\n", + " sort_keys=False\n", + " )\n", + " \n", + " print (f\"YAML config => {crawler_config['output_index']}-config.yml\\n--------\")\n", + " print (yaml_out)\n", + " print (\"--------------------------------------------------------------------------------\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "55888204-f823-48cd-bca4-a7663e0fe56a", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.8" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +}