Commit 1492216

docs: Add tutorials section for the Neptune bulk load (#2267)
1 parent 42a70ea commit 1492216

1 file changed

+107
-0
lines changed

tutorials/033 - Amazon Neptune.ipynb

Lines changed: 107 additions & 0 deletions
@@ -1,6 +1,7 @@
 {
  "cells": [
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "b0ee9a28",
    "metadata": {},
@@ -9,6 +10,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "3a2a7b51",
    "metadata": {},
@@ -17,6 +19,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "42724a76",
    "metadata": {},
@@ -39,6 +42,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {
     "collapsed": false
@@ -68,6 +72,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "1e9499ea",
    "metadata": {},
@@ -86,6 +91,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "6f13f0cb",
    "metadata": {},
@@ -110,6 +116,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "a7666d80",
    "metadata": {},
@@ -133,6 +140,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "367791b9",
    "metadata": {},
@@ -153,6 +161,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "f91b967c",
    "metadata": {},
@@ -202,6 +211,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "fd5fc8a2",
    "metadata": {},
@@ -238,6 +248,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "efe6eaaf",
    "metadata": {},
@@ -267,6 +278,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "bff6a1fc",
    "metadata": {},
@@ -297,6 +309,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "beca9dab",
    "metadata": {},
@@ -335,6 +348,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "b7a45c6a",
    "metadata": {},
@@ -365,6 +379,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "8370b377",
    "metadata": {},
@@ -394,6 +409,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "9324bff7",
    "metadata": {},
@@ -413,6 +429,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "21738d39",
    "metadata": {},
@@ -432,6 +449,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "1bded05b",
    "metadata": {},
@@ -450,6 +468,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "cd49d635",
    "metadata": {},
@@ -489,6 +508,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "783a599e",
    "metadata": {},
@@ -526,6 +546,93 @@
     "df = wr.neptune.execute_opencypher(client, query)\n",
     "display(df)"
    ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "19a2ae67",
+   "metadata": {},
+   "source": [
+    "## Bulk Load"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "86d1bca1",
+   "metadata": {},
+   "source": [
+    "Data can also be written to Neptune with the Neptune Bulk Loader, which reads it from files staged in S3.\n",
+    "The Bulk Loader is fast and optimized for large datasets.\n",
+    "\n",
+    "For details on the IAM permissions needed to set this up, see the [Neptune Bulk Loader documentation](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "3f3aa82f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.DataFrame([_create_dummy_edge() for _ in range(1000)])\n",
+    "\n",
+    "wr.neptune.bulk_load(\n",
+    "    client=client,\n",
+    "    df=df,\n",
+    "    path=\"s3://my-bucket/stage-files/\",\n",
+    "    iam_role=\"arn:aws:iam::XXX:role/XXX\",\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "e00bc8a5",
+   "metadata": {},
+   "source": [
+    "Alternatively, if the data is already on S3 in CSV format, you can use the `wr.neptune.bulk_load_from_files` function.\n",
+    "This is also useful when the data is written to S3 as a byproduct of an Amazon Athena query, as in the example below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "a5263211",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sql = \"\"\"\n",
+    "SELECT\n",
+    "    <col_id> AS \"~id\"\n",
+    "    , <label_id> AS \"~label\"\n",
+    "    , *\n",
+    "FROM <database>.<table>\n",
+    "\"\"\"\n",
+    "\n",
+    "wr.athena.start_query_execution(\n",
+    "    sql=sql,\n",
+    "    s3_output=\"s3://my-bucket/stage-files-athena/\",\n",
+    "    wait=True,\n",
+    ")\n",
+    "\n",
+    "wr.neptune.bulk_load_from_files(\n",
+    "    client=client,\n",
+    "    path=\"s3://my-bucket/stage-files-athena/\",\n",
+    "    iam_role=\"arn:aws:iam::XXX:role/XXX\",\n",
+    ")"
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "58ee6866",
+   "metadata": {},
+   "source": [
+    "Both `bulk_load` and `bulk_load_from_files` are suitable for use at scale.\n",
+    "The latter simply invokes the Neptune Bulk Loader on data that already exists in S3.\n",
+    "The former first writes the DataFrame to S3 as CSV; with `ray` and `modin` installed, that write can also be distributed across multiple workers in a Ray cluster."
+   ]
   }
  ],
  "metadata": {
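
For orientation, the bulk-load cells above rely on edge rows shaped for Neptune's Gremlin CSV load format, in which edge files carry the system columns `~id`, `~from`, `~to`, and `~label`. The sketch below is a standard-library illustration of that layout, assuming the loader is used with the Gremlin CSV format. It is not the notebook's own `_create_dummy_edge`, which this diff does not show. The helpers `create_dummy_edge` and `edges_to_csv` and the `weight:Int` property column are hypothetical names chosen for the example.

```python
import csv
import io
import random
import uuid


def create_dummy_edge() -> dict:
    """Build one edge row keyed by Gremlin CSV load-format headers (hypothetical helper)."""
    return {
        "~id": str(uuid.uuid4()),
        "~from": str(uuid.uuid4()),
        "~to": str(uuid.uuid4()),
        "~label": "Test",
        "weight:Int": random.randint(0, 1000),  # assumed example property column
    }


def edges_to_csv(rows: list) -> str:
    """Serialize edge rows to CSV text with a single header line (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


edges = [create_dummy_edge() for _ in range(3)]
csv_text = edges_to_csv(edges)
print(csv_text.splitlines()[0])  # ~id,~from,~to,~label,weight:Int
```

A pandas DataFrame built from the same dictionaries, for example `pd.DataFrame(edges)`, would have the column layout that the notebook passes to `wr.neptune.bulk_load`.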
