|
1 | 1 | { |
2 | 2 | "cells": [ |
3 | 3 | { |
| 4 | + "attachments": {}, |
4 | 5 | "cell_type": "markdown", |
5 | 6 | "id": "b0ee9a28", |
6 | 7 | "metadata": {}, |
|
9 | 10 | ] |
10 | 11 | }, |
11 | 12 | { |
| 13 | + "attachments": {}, |
12 | 14 | "cell_type": "markdown", |
13 | 15 | "id": "3a2a7b51", |
14 | 16 | "metadata": {}, |
|
17 | 19 | ] |
18 | 20 | }, |
19 | 21 | { |
| 22 | + "attachments": {}, |
20 | 23 | "cell_type": "markdown", |
21 | 24 | "id": "42724a76", |
22 | 25 | "metadata": {}, |
|
39 | 42 | ] |
40 | 43 | }, |
41 | 44 | { |
| 45 | + "attachments": {}, |
42 | 46 | "cell_type": "markdown", |
43 | 47 | "metadata": { |
44 | 48 | "collapsed": false |
|
68 | 72 | ] |
69 | 73 | }, |
70 | 74 | { |
| 75 | + "attachments": {}, |
71 | 76 | "cell_type": "markdown", |
72 | 77 | "id": "1e9499ea", |
73 | 78 | "metadata": {}, |
|
86 | 91 | ] |
87 | 92 | }, |
88 | 93 | { |
| 94 | + "attachments": {}, |
89 | 95 | "cell_type": "markdown", |
90 | 96 | "id": "6f13f0cb", |
91 | 97 | "metadata": {}, |
|
110 | 116 | ] |
111 | 117 | }, |
112 | 118 | { |
| 119 | + "attachments": {}, |
113 | 120 | "cell_type": "markdown", |
114 | 121 | "id": "a7666d80", |
115 | 122 | "metadata": {}, |
|
133 | 140 | ] |
134 | 141 | }, |
135 | 142 | { |
| 143 | + "attachments": {}, |
136 | 144 | "cell_type": "markdown", |
137 | 145 | "id": "367791b9", |
138 | 146 | "metadata": {}, |
|
153 | 161 | ] |
154 | 162 | }, |
155 | 163 | { |
| 164 | + "attachments": {}, |
156 | 165 | "cell_type": "markdown", |
157 | 166 | "id": "f91b967c", |
158 | 167 | "metadata": {}, |
|
202 | 211 | ] |
203 | 212 | }, |
204 | 213 | { |
| 214 | + "attachments": {}, |
205 | 215 | "cell_type": "markdown", |
206 | 216 | "id": "fd5fc8a2", |
207 | 217 | "metadata": {}, |
|
238 | 248 | ] |
239 | 249 | }, |
240 | 250 | { |
| 251 | + "attachments": {}, |
241 | 252 | "cell_type": "markdown", |
242 | 253 | "id": "efe6eaaf", |
243 | 254 | "metadata": {}, |
|
267 | 278 | ] |
268 | 279 | }, |
269 | 280 | { |
| 281 | + "attachments": {}, |
270 | 282 | "cell_type": "markdown", |
271 | 283 | "id": "bff6a1fc", |
272 | 284 | "metadata": {}, |
|
297 | 309 | ] |
298 | 310 | }, |
299 | 311 | { |
| 312 | + "attachments": {}, |
300 | 313 | "cell_type": "markdown", |
301 | 314 | "id": "beca9dab", |
302 | 315 | "metadata": {}, |
|
335 | 348 | ] |
336 | 349 | }, |
337 | 350 | { |
| 351 | + "attachments": {}, |
338 | 352 | "cell_type": "markdown", |
339 | 353 | "id": "b7a45c6a", |
340 | 354 | "metadata": {}, |
|
365 | 379 | ] |
366 | 380 | }, |
367 | 381 | { |
| 382 | + "attachments": {}, |
368 | 383 | "cell_type": "markdown", |
369 | 384 | "id": "8370b377", |
370 | 385 | "metadata": {}, |
|
394 | 409 | ] |
395 | 410 | }, |
396 | 411 | { |
| 412 | + "attachments": {}, |
397 | 413 | "cell_type": "markdown", |
398 | 414 | "id": "9324bff7", |
399 | 415 | "metadata": {}, |
|
413 | 429 | ] |
414 | 430 | }, |
415 | 431 | { |
| 432 | + "attachments": {}, |
416 | 433 | "cell_type": "markdown", |
417 | 434 | "id": "21738d39", |
418 | 435 | "metadata": {}, |
|
432 | 449 | ] |
433 | 450 | }, |
434 | 451 | { |
| 452 | + "attachments": {}, |
435 | 453 | "cell_type": "markdown", |
436 | 454 | "id": "1bded05b", |
437 | 455 | "metadata": {}, |
|
450 | 468 | ] |
451 | 469 | }, |
452 | 470 | { |
| 471 | + "attachments": {}, |
453 | 472 | "cell_type": "markdown", |
454 | 473 | "id": "cd49d635", |
455 | 474 | "metadata": {}, |
|
489 | 508 | ] |
490 | 509 | }, |
491 | 510 | { |
| 511 | + "attachments": {}, |
492 | 512 | "cell_type": "markdown", |
493 | 513 | "id": "783a599e", |
494 | 514 | "metadata": {}, |
|
526 | 546 | "df = wr.neptune.execute_opencypher(client, query)\n", |
527 | 547 | "display(df)" |
528 | 548 | ] |
| 549 | + }, |
| 550 | + { |
| 551 | + "attachments": {}, |
| 552 | + "cell_type": "markdown", |
| 553 | + "id": "19a2ae67", |
| 554 | + "metadata": {}, |
| 555 | + "source": [ |
| 556 | + "## Bulk Load" |
| 557 | + ] |
| 558 | + }, |
| 559 | + { |
| 560 | + "attachments": {}, |
| 561 | + "cell_type": "markdown", |
| 562 | + "id": "86d1bca1", |
| 563 | + "metadata": {}, |
| 564 | + "source": [ |
| 565 | + "Data can also be written with the Neptune Bulk Loader, which ingests files staged in S3.\n", |
| 566 | + "The Bulk Loader is designed for fast ingestion of large datasets.\n", |
| 567 | + "\n", |
| 568 | + "For details on the IAM permissions it requires, see the [Neptune Bulk Loader documentation](https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html)." |
| 569 | + ] |
| 570 | + }, |
| 571 | + { |
| 572 | + "cell_type": "code", |
| 573 | + "execution_count": null, |
| 574 | + "id": "3f3aa82f", |
| 575 | + "metadata": {}, |
| 576 | + "outputs": [], |
| 577 | + "source": [ |
| 578 | + "df = pd.DataFrame([_create_dummy_edge() for _ in range(1000)])\n", |
| 579 | + "\n", |
| 580 | + "wr.neptune.bulk_load(\n", |
| 581 | + " client=client,\n", |
| 582 | + " df=df,\n", |
| 583 | + " path=\"s3://my-bucket/stage-files/\",\n", |
| 584 | + " iam_role=\"arn:aws:iam::XXX:role/XXX\",\n", |
| 585 | + ")" |
| 586 | + ] |
| 587 | + }, |
| 588 | + { |
| 589 | + "attachments": {}, |
| 590 | + "cell_type": "markdown", |
| 591 | + "id": "e00bc8a5", |
| 592 | + "metadata": {}, |
| 593 | + "source": [ |
| 594 | + "Alternatively, if the data is already in S3 in CSV format, you can use the `neptune.bulk_load_from_files` function.\n", |
| 595 | + "This is also useful when the data is written to S3 as a byproduct of an Amazon Athena query, as the example below shows." |
| 596 | + ] |
| 597 | + }, |
| 598 | + { |
| 599 | + "cell_type": "code", |
| 600 | + "execution_count": null, |
| 601 | + "id": "a5263211", |
| 602 | + "metadata": {}, |
| 603 | + "outputs": [], |
| 604 | + "source": [ |
| 605 | + "sql = \"\"\"\n", |
| 606 | + "SELECT\n", |
| 607 | + " <col_id> AS \"~id\"\n", |
| 608 | + " , <label_id> AS \"~label\"\n", |
| 609 | + " , *\n", |
| 610 | + "FROM <database>.<table>\n", |
| 611 | + "\"\"\"\n", |
| 612 | + "\n", |
| 613 | + "wr.athena.start_query_execution(\n", |
| 614 | + " sql=sql,\n", |
| 615 | + " s3_output=\"s3://my-bucket/stage-files-athena/\",\n", |
| 616 | + " wait=True,\n", |
| 617 | + ")\n", |
| 618 | + "\n", |
| 619 | + "wr.neptune.bulk_load_from_files(\n", |
| 620 | + " client=client,\n", |
| 621 | + " path=\"s3://my-bucket/stage-files-athena/\",\n", |
| 622 | + " iam_role=\"arn:aws:iam::XXX:role/XXX\",\n", |
| 623 | + ")" |
| 624 | + ] |
| 625 | + }, |
| 626 | + { |
| 627 | + "attachments": {}, |
| 628 | + "cell_type": "markdown", |
| 629 | + "id": "58ee6866", |
| 630 | + "metadata": {}, |
| 631 | + "source": [ |
| 632 | + "Both `bulk_load` and `bulk_load_from_files` are suitable for large datasets.\n", |
| 633 | + "`bulk_load_from_files` simply invokes the Neptune Bulk Loader on data that already exists in S3.\n", |
| 634 | + "`bulk_load`, by contrast, first writes the DataFrame to S3 as CSV. With `ray` and `modin` installed, this write can be distributed across multiple workers in a Ray cluster." |
| 635 | + ] |
529 | 636 | } |
530 | 637 | ], |
531 | 638 | "metadata": { |
|
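The `bulk_load` path stages the DataFrame in S3 as CSV before the loader runs. As a rough illustration of what such a staged file contains, the sketch below serializes edge records into Neptune's Gremlin CSV load format, where the system columns `~id`, `~from`, `~to`, and `~label` identify each edge and any other column is a property whose header may carry a `:Type` suffix. It uses only the standard library and is not awswrangler's actual serializer; the function name `to_gremlin_edge_csv`, the sample records, and the `weight:Double` property are hypothetical.

```python
import csv
import io

# Required system columns for an edge file in the Gremlin CSV load format.
SYSTEM_COLUMNS = ["~id", "~from", "~to", "~label"]


def to_gremlin_edge_csv(edges, property_types):
    """Serialize edge dicts to Gremlin-format CSV text (hypothetical helper).

    `property_types` maps a property name to its Neptune type name,
    e.g. {"weight": "Double"}, and becomes a header such as "weight:Double".
    """
    header = SYSTEM_COLUMNS + [f"{name}:{typ}" for name, typ in property_types.items()]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    for edge in edges:
        # Reject rows that lack any of the system columns.
        missing = [col for col in SYSTEM_COLUMNS if col not in edge]
        if missing:
            raise ValueError(f"edge is missing required columns: {missing}")
        # A property absent from a record is written as an empty field.
        row = [edge[col] for col in SYSTEM_COLUMNS]
        row += [edge.get(name, "") for name in property_types]
        writer.writerow(row)
    return buf.getvalue()


# Illustrative records only; the notebook's own `_create_dummy_edge()` is not shown here.
edges = [
    {"~id": "e1", "~from": "v1", "~to": "v2", "~label": "knows", "weight": 0.5},
    {"~id": "e2", "~from": "v2", "~to": "v3", "~label": "knows", "weight": 1.0},
]
print(to_gremlin_edge_csv(edges, {"weight": "Double"}))
```

Validating the system columns up front surfaces a malformed row locally rather than as a failed load job after the file has already been uploaded.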
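`bulk_load_from_files` starts the Neptune Bulk Loader on files that already exist in S3. The loader itself is started with an HTTP POST to the cluster's `/loader` endpoint (port 8182) carrying a JSON body. The sketch below only builds such a body, using parameter names from the Neptune loader API (`source`, `format`, `iamRoleArn`, `region`, `failOnError`); it does not send the request and is not awswrangler's implementation. The helper name `build_loader_request` and the `us-east-1` region are assumptions.

```python
import json


def build_loader_request(source, iam_role_arn, region, fmt="csv", fail_on_error=True):
    """Build a JSON body for Neptune's /loader endpoint (hypothetical helper)."""
    if not source.startswith("s3://"):
        raise ValueError("source must be an s3:// URI")
    body = {
        "source": source,          # S3 prefix or object containing the load files
        "format": fmt,             # "csv" is the Gremlin CSV load format
        "iamRoleArn": iam_role_arn,
        "region": region,          # region of the S3 bucket
        # The loader API takes this flag as the string "TRUE" or "FALSE".
        "failOnError": "TRUE" if fail_on_error else "FALSE",
    }
    return json.dumps(body)


payload = build_loader_request(
    source="s3://my-bucket/stage-files-athena/",
    iam_role_arn="arn:aws:iam::XXX:role/XXX",
    region="us-east-1",  # hypothetical region
)
print(payload)
```

Keeping the body construction separate from the HTTP call makes it easy to inspect or log exactly what would be submitted to the loader.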