docs: Add tutorials section for the Neptune bulk load (#2267)

LeonLuttenberger · web-flow · commit 149221690493 · 2023-05-10T09:38:52.000-07:00
diff --git a/tutorials/033 - Amazon Neptune.ipynb b/tutorials/033 - Amazon Neptune.ipynb
@@ -1,6 +1,7 @@
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "b0ee9a28",
    "metadata": {},
@@ -9,6 +10,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "3a2a7b51",
    "metadata": {},
@@ -17,6 +19,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "42724a76",
    "metadata": {},
@@ -39,6 +42,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "collapsed": false
@@ -68,6 +72,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "1e9499ea",
    "metadata": {},
@@ -86,6 +91,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "6f13f0cb",
    "metadata": {},
@@ -110,6 +116,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "a7666d80",
    "metadata": {},
@@ -133,6 +140,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "367791b9",
    "metadata": {},
@@ -153,6 +161,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "f91b967c",
    "metadata": {},
@@ -202,6 +211,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "fd5fc8a2",
    "metadata": {},
@@ -238,6 +248,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "efe6eaaf",
    "metadata": {},
@@ -267,6 +278,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "bff6a1fc",
    "metadata": {},
@@ -297,6 +309,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "beca9dab",
    "metadata": {},
@@ -335,6 +348,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "b7a45c6a",
    "metadata": {},
@@ -365,6 +379,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "8370b377",
    "metadata": {},
@@ -394,6 +409,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "9324bff7",
    "metadata": {},
@@ -413,6 +429,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "21738d39",
    "metadata": {},
@@ -432,6 +449,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "1bded05b",
    "metadata": {},
@@ -450,6 +468,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "cd49d635",
    "metadata": {},
@@ -489,6 +508,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "783a599e",
    "metadata": {},
@@ -526,6 +546,93 @@
     "df = wr.neptune.execute_opencypher(client, query)\n",
     "display(df)"
    ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "19a2ae67",
+   "metadata": {},
+   "source": [
+    "## Bulk Load"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "86d1bca1",
+   "metadata": {},
+   "source": [
+    "Data can be written to Neptune with the Neptune Bulk Loader, which loads files staged in S3.\n",
+    "The Bulk Loader is fast and optimized for large datasets.\n",
+    "\n",
+    "For details on the IAM permissions needed to set this up, see the [Neptune Bulk Loader documentation](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f3aa82f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.DataFrame([_create_dummy_edge() for _ in range(1000)])\n",
+    "\n",
+    "wr.neptune.bulk_load(\n",
+    "    client=client,\n",
+    "    df=df,\n",
+    "    path=\"s3://my-bucket/stage-files/\",\n",
+    "    iam_role=\"arn:aws:iam::XXX:role/XXX\",\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "e00bc8a5",
+   "metadata": {},
+   "source": [
+    "Alternatively, if the data is already on S3 in CSV format, you can use the `wr.neptune.bulk_load_from_files` function.\n",
+    "This is also useful when the data is written to S3 as a byproduct of an Amazon Athena query, as the example below shows."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a5263211",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = \"\"\"\n",
+    "SELECT\n",
+    "    <col_id> AS \"~id\"\n",
+    "  , <label_id> AS \"~label\"\n",
+    "  , *\n",
+    "FROM <database>.<table>\n",
+    "\"\"\"\n",
+    "\n",
+    "wr.athena.start_query_execution(\n",
+    "    sql=sql,\n",
+    "    s3_output=\"s3://my-bucket/stage-files-athena/\",\n",
+    "    wait=True,\n",
+    ")\n",
+    "\n",
+    "wr.neptune.bulk_load_from_files(\n",
+    "    client=client,\n",
+    "    path=\"s3://my-bucket/stage-files-athena/\",\n",
+    "    iam_role=\"arn:aws:iam::XXX:role/XXX\",\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "58ee6866",
+   "metadata": {},
+   "source": [
+    "Both the `bulk_load` and `bulk_load_from_files` functions are suitable for loading data at scale.\n",
+    "The latter simply invokes the Neptune Bulk Loader on existing data in S3.\n",
+    "The former, however, first writes the DataFrame to S3 as CSV. With `ray` and `modin` installed, this write can also be distributed across multiple workers in a Ray cluster."
+   ]
   }
  ],
  "metadata": {