Commit 812eaa0

DOC-345 | Arangograph Data Loader (#328)
* wip
* wip 2
* remove {{< description >}} shortcode
* add import step, file validation and errors
* add dataloader example
* copy files to all version folders
* few adjustments
1 parent f35e990 commit 812eaa0

25 files changed

+1095
-0
lines changed
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
1+
---
2+
title: Load your data into ArangoGraph
3+
menuTitle: Data Loader
4+
weight: 22
5+
description: >-
6+
Load your data into ArangoGraph and transform it into richly-connected graph
7+
structures, without needing to write any code or deploy any infrastructure
8+
archetype: chapter
9+
---
10+
11+
The ArangoGraph Data Loader allows you to transform existing data from CSV files
12+
into data that can be analyzed by the ArangoGraph platform.
13+
14+
You provide your data in CSV format, a common format used for exports of data
15+
from various systems. Then, using a no-code editor, you can model the schema of
16+
this data and the relationships between them. This allows you to ingest your
17+
existing datasets into your ArangoGraph database, without the need for any
18+
development effort.
19+
20+
You can get started in a few easy steps.
21+
22+
{{< tabs groupid="data-loader-steps" >}}
23+
24+
{{< tab name="1. Create database" >}}
25+
Choose an existing database or create a new one and enter a name for your new graph.
26+
{{< /tab >}}
27+
28+
{{< tab name="2. Add files" >}}
29+
Drag and drop your data files in CSV format.
30+
{{< /tab >}}
31+
32+
{{< tab name="3. Design your graph" >}}
33+
Model your graph schema by adding nodes and connecting them via edges.
34+
{{< /tab >}}
35+
36+
{{< tab name="4. Import data" >}}
37+
Once you are ready, save and start the import. The resulting graph is an
38+
[EnterpriseGraph](../../graphs/enterprisegraphs/_index.md) with its
39+
corresponding collections, available in your ArangoDB web interface.
40+
{{< /tab >}}
41+
42+
{{< /tabs >}}
43+
44+
Follow this [working example](../data-loader/example.md) to see how easy it is
45+
to transform existing data into a graph.
46+
47+
## How to access the Data Loader
48+
49+
1. If you do not have a deployment yet, [create a deployment](../deployments/_index.md#how-to-create-a-new-deployment) first.
50+
2. Open the deployment you want to load data into.
51+
3. In the **Load Data** section, click the **Load your data** button.
52+
53+
![ArangoGraph Data Loader Overview](../../../images/arangograph-data-loader-overview.png)
54+
55+
## Other options to import data into ArangoGraph
56+
57+
To import data from various files into collections **without creating a graph**,
58+
get the ArangoDB client tools for your operating system from the
59+
[download page](https://arangodb.com/download-major/).
60+
61+
- To import data to ArangoGraph from an existing ArangoDB instance, see
62+
[arangodump](../../components/tools/arangodump/) and
63+
[arangorestore](../../components/tools/arangorestore/).
64+
- To import pre-existing data in JSON, CSV, or TSV format, see
65+
[arangoimport](../../components/tools/arangoimport/).
66+
- To transfer data from an existing on-premises ArangoDB instance to your
67+
ArangoGraph cluster, see the [cloud migration tool](../migrate-to-the-cloud.md).
Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
---
2+
title: Add files into Data Loader
3+
menuTitle: Add files
4+
weight: 5
5+
description: >-
6+
Provide your set of files in CSV format containing the data to be imported
7+
archetype: default
8+
---
9+
10+
The Data Loader allows you to upload your data files in CSV format into
11+
ArangoGraph and then use these data sources to design a graph using the
12+
built-in graph designer.
13+
14+
## Upload your files
15+
16+
You can upload your CSV files in the following ways:
17+
18+
- Drag and drop your files in the designated area.
19+
- Click the **Browse files** button and select the files you want to add.
20+
21+
![ArangoGraph Data Loader Upload Files](../../../images/arangograph-data-loader-upload-files.png)
22+
23+
You have the option to either upload several files collectively as a batch or
24+
add them individually. You can also add more files later on.
25+
After a file has been uploaded, you can expand it to preview both the header and
26+
the first row of data within the file.
27+
28+
If you upload CSV files without fields, they are not available for
29+
manipulation.
30+
31+
Once the files are uploaded, you can start [designing your graph](../data-loader/design-graph.md).
32+
33+
### File formatting limitations
34+
35+
Ensure that the files you upload are correctly formatted. Otherwise, errors may
36+
occur, the upload may fail, or the data may not be correctly mapped.
37+
38+
The following restrictions and limitations apply:
39+
40+
- The only supported file format is CSV. If you submit an invalid file format,
41+
the upload of that specific file will be prevented.
42+
- It is required that all CSV files have a header row. If you upload a file
43+
without a header, the first row of data is treated as the header. To avoid
44+
losing the first row of the data, make sure to include headers in your files.
45+
- The CSV file should have unique header names. It is not possible to have two
46+
columns with the same name within the same file.
47+
48+
For more details, see the [File validation](../data-loader/import.md#file-validation) section.
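
Before uploading, you can check the header row of a file locally. The following is a minimal sketch, not part of the Data Loader itself: the function name `check_header` and the sample column names `Name` and `City` are made up for illustration. It only looks for empty and duplicate header names. It cannot tell whether the first row really is a header, so a file without one still needs a manual check.

```python
import csv
import io
from collections import Counter


def check_header(csv_text: str) -> list[str]:
    """Return a list of problems found in the header row of CSV text.

    Hypothetical helper for a local pre-check; it mirrors the documented
    rules (header row required, header names unique) but is not the
    validation the Data Loader runs.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if not header:
        return ["file is empty or has no header row"]

    problems = []
    names = [name.strip() for name in header]
    if any(name == "" for name in names):
        problems.append("header contains an empty column name")

    # Report every name that occurs more than once, in alphabetical order.
    counts = Counter(names)
    for name in sorted(n for n, count in counts.items() if n and count > 1):
        problems.append(f"duplicate header name: {name}")
    return problems
```

For example, `check_header("id,name,id\n1,a,2\n")` reports the duplicate `id` column, while a file with distinct header names yields an empty list.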
49+
50+
### Upload limits
51+
52+
There is a cumulative file upload limit of 1 GB. This means that the
53+
combined size of all files you upload should not exceed 1 GB. If the total size
54+
of the uploaded files surpasses this limit, the upload may not be successful.
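
To see whether a set of files stays within the cumulative limit before you upload them, you can add up their sizes locally. This is a rough sketch under one stated assumption: the limit is taken as 10**9 bytes, because the documentation does not say whether the limit is decimal or binary. The names `UPLOAD_LIMIT_BYTES`, `total_size`, and `fits_upload_limit` are illustrative only.

```python
import os

# Assumption: the documented 1 GB limit is read as 10**9 bytes. The source
# does not specify decimal (GB) versus binary (GiB), so adjust if needed.
UPLOAD_LIMIT_BYTES = 10**9


def total_size(paths):
    """Return the combined size in bytes of the files at the given paths."""
    return sum(os.path.getsize(path) for path in paths)


def fits_upload_limit(sizes_in_bytes, limit=UPLOAD_LIMIT_BYTES):
    """Return True if the combined size does not exceed the limit."""
    return sum(sizes_in_bytes) <= limit
```

A typical call would be `fits_upload_limit([total_size(["airports.csv", "flights.csv"])])`, run in the folder that contains the files.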
55+
56+
## Delete files
57+
58+
You can remove uploaded files by clicking the **Delete file** button in the
59+
**Your files** panel. Please keep in mind that in order to delete a file,
60+
you must first remove all of its graph associations.
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
1+
---
2+
title: Design your graph
3+
menuTitle: Design graph
4+
weight: 10
5+
description: >-
6+
Design your graph database schema using the integrated graph modeler in the Data Loader
7+
archetype: default
8+
---
9+
10+
Based on the data you have uploaded, you can start designing your graph.
11+
The graph designer allows you to create a schema using nodes and edges.
12+
Once this is done, you can save and start the import. The resulting
13+
[EnterpriseGraph](../../graphs/enterprisegraphs/_index.md) and the
14+
corresponding collections are created in your ArangoDB database instance.
15+
16+
## How to add a node
17+
18+
Nodes are the main objects in your data model and include the attributes of the
19+
objects.
20+
21+
1. To create a new node, click the **Add node** button.
22+
2. In the graph designer, click on the newly created node to view the **Node details**.
23+
3. In the **Node details** panel, fill in the following fields:
24+
- For **Node label**, enter a name you want to use for the node.
25+
- For **File**, select a file from the list to associate it with the node.
26+
- For **Primary Identifier**, select a field from the list. This is used to
27+
reference the nodes when you define relations with edges.
28+
- For **File Headers**, select one or more attributes from the list.
29+
30+
![ArangoGraph Data Loader Add Node](../../../images/arangograph-data-loader-add-node.png)
31+
32+
## How to connect nodes
33+
34+
Nodes can be connected by edges to express and categorize the relations between
35+
them. A relation always has a direction, going from one node to another. You can
36+
define this direction in the graph designer by dragging your cursor from a
37+
particular node to another.
38+
39+
To connect two nodes, you can use the **Connect node(s)** button. Click on any
40+
node to self-reference it or drag to connect it to another node. Alternatively,
41+
when you select a node, a plus sign will appear, allowing you to directly add a
42+
new node with an edge.
43+
44+
The edge needs to be associated with a file and must have a label. Note that a
45+
node and an edge cannot have the same label.
46+
47+
To add details to an edge, follow these steps:
48+
49+
1. Click on an edge in the graph designer.
50+
2. In the **Edit Edge** panel, fill in the following fields:
51+
- For **Edge label**, enter a name you want to use for the edge.
52+
- For **Relation file**, select a file from the list to associate it with the edge.
53+
- To define how the relation points from one node to another, select the
54+
corresponding relation file header for both the origin file (`_from`) and the
55+
destination file (`_to`).
56+
- For **File Headers**, select one or more attributes from the list.
57+
58+
![ArangoGraph Data Loader Edit Edge](../../../images/arangograph-data-loader-edit-edge.png)
59+
60+
## How to delete elements
61+
62+
To remove a node or an edge, simply select it in the graph designer and click the
63+
**Delete** icon.
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
1+
---
2+
title: Data Loader Example
3+
menuTitle: Example
4+
weight: 20
5+
description: >-
6+
Follow this complete working example to see how easy it is to transform existing
7+
data into a graph and get insights from the connected entities
8+
archetype: default
9+
---
10+
11+
To transform your data into a graph, you need to have CSV files with entities
12+
representing the nodes and a corresponding CSV file representing the edges.
13+
14+
This example uses a sample data set of two files, `airports.csv` and `flights.csv`,
15+
to create a graph showing flights arriving at and departing from various cities.
16+
17+
The `airports.csv` contains rows of airport entries, which are the future nodes
18+
in your graph. The `flights.csv` contains rows of flight entries, which are the
19+
future edges connecting the nodes.
20+
21+
The whole process can be broken down into these steps:
22+
23+
1. **Database and graph setup**: Choose an existing database or
24+
create a new one and enter a name for your new graph.
25+
2. **Add files**: Upload the CSV files to the Data Loader web interface. You can
26+
simply drag and drop them or upload them through the file browser window.
27+
3. **Design graph**: Design your graph schema by adding nodes and edges and map
28+
data from the uploaded files to them. This allows creating the corresponding
29+
documents and collections for your graph.
30+
4. **Import data**: Import the data and start using your newly created
31+
[EnterpriseGraph](../../graphs/enterprisegraphs/_index.md) and its
32+
corresponding collections.
33+
34+
## Step 1: Create database and choose graph name
35+
36+
Start by creating a new database and adding a name for your graph.
37+
38+
![Data Loader Example Step 1](../../../images/arangograph-data-loader-example-choose-names.png)
39+
40+
## Step 2: Add files
41+
42+
Upload your CSV files to the Data Loader web interface. You can drag and drop
43+
them or upload them via a file browser window.
44+
45+
![Data Loader Example Step 2](../../../images/arangograph-data-loader-example-add-files.png)
46+
47+
See also [Add files into Data Loader](../data-loader/add-files.md).
48+
49+
## Step 3: Design graph schema
50+
51+
Once the files are added, you can start designing the graph schema. This example
52+
uses a simple graph consisting of:
53+
- Two nodes (`origin_airport` and `destination_airport`)
54+
- One directed edge going from the origin airport to the destination one
55+
representing a flight
56+
57+
Click **Add node** to create the nodes and connect them with edges.
58+
59+
Next, for each of the nodes and edges, you need to create a mapping to the
60+
corresponding file and headers.
61+
62+
For nodes, the **Node label** is going to be a node collection name and the
63+
**Primary identifier** will be used to populate the `_key` attribute of documents.
64+
You can also select any additional headers to be included as document attributes.
65+
66+
In this example, two node collections have been created (`origin_airport` and
67+
`destination_airport`) and the `AirportID` header is used to create the `_key`
68+
attribute for documents in both node collections. The header preview makes it
69+
easy to select the headers you want to use.
70+
71+
![Data Loader Example Step 3 Nodes](../../../images/arangograph-data-loader-example-map-nodes.png)
72+
73+
For edges, the **Edge label** is going to be an edge collection name. Then, you
74+
need to specify how edges will connect nodes. You can do this by selecting the
75+
*from* and *to* nodes to give a direction to the edge.
76+
In this example, the `source airport` header has been selected as a source and
77+
the `destination airport` header as a target for the edge.
78+
79+
![Data Loader Example Step 3 Edges](../../../images/arangograph-data-loader-example-map-edges.png)
80+
81+
Note that the values of source and target for the edge correspond to the
82+
**Primary identifier** (`_key` attribute) of the nodes. In this case, it is the
83+
airport code (e.g. GKA) used as the `_key` in the node documents and in the source
84+
and destination headers to configure the edges.
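
Because every source and target value has to match a primary identifier in the node file, it can help to look for edges that point to unknown nodes before you import. The sketch below is an illustration only, and the function name `find_dangling_edges` is hypothetical. It uses the header names from this example (`AirportID`, `source airport`, `destination airport`), and it assumes that no field contains a line break, so that row numbers equal line numbers.

```python
import csv
import io


def find_dangling_edges(nodes_csv, edges_csv, key_header, from_header, to_header):
    """Return (line, header, value) for edge values with no matching node key.

    Hypothetical local check, not the Data Loader's own validation.
    Line numbers count the header row as line 1.
    """
    keys = {row[key_header] for row in csv.DictReader(io.StringIO(nodes_csv))}

    dangling = []
    edge_rows = csv.DictReader(io.StringIO(edges_csv))
    for line_no, row in enumerate(edge_rows, start=2):
        for header in (from_header, to_header):
            if row[header] not in keys:
                dangling.append((line_no, header, row[header]))
    return dangling
```

With a node file that contains the keys `GKA` and `MAG`, an edge row going from `GKA` to `XXX` is reported as dangling on its `destination airport` value.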
85+
86+
See also [Design your graph in the Data Loader](../data-loader/design-graph.md).
87+
88+
## Step 4: Import and see resulting graph
89+
90+
After all the mapping is done, all you need to do is click
91+
**Save and start import**. The report provides an overview of the files
92+
processed and the documents created, as well as a link to your new graph.
93+
See also [Start import](../data-loader/import.md).
94+
95+
![Data Loader Example Step 4 See your new graph](../../../images/arangograph-data-loader-example-data-import.png)
96+
97+
Finally, click **See your new graph** to open the ArangoDB web interface and
98+
explore your new collections and graph.
99+
100+
![Data Loader Example Step 4 Resulting graph](../../../images/arangograph-data-loader-example-resulting-graph.png)
101+
102+
Happy graphing!
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
---
2+
title: Start the import
3+
menuTitle: Start import
4+
weight: 15
5+
description: >-
6+
Once the data files are provided and the graph is designed, you can start the import
7+
archetype: default
8+
---
9+
10+
Before starting the actual import, make sure that:
11+
- You have selected a database for import or created a new one;
12+
- You have provided a valid name for your graph;
13+
- You have created at least one node;
14+
- You have created at least one edge;
15+
- You have uploaded at least one file;
16+
- Every file is related to at least one node or edge;
17+
- Every node and edge is linked to a file;
18+
- Every node and edge has a unique label;
19+
- Every node has a primary identifier selected;
20+
- Every edge has an origin and destination file header selected.
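
The checklist above can be read as a set of rules over the files, nodes, and edges of your design. The following sketch models most of them to make the rules concrete. Everything in it is hypothetical: the `Node` and `Edge` classes, their field names, and the function `unmet_requirements` are not part of any ArangoGraph API, and the database and graph name checks are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    label: str
    file: Optional[str] = None
    primary_identifier: Optional[str] = None


@dataclass
class Edge:
    label: str
    file: Optional[str] = None
    from_header: Optional[str] = None
    to_header: Optional[str] = None


def unmet_requirements(files, nodes, edges):
    """Return a list of checklist items that the design does not meet yet."""
    problems = []
    if not files:
        problems.append("no file uploaded")
    if not nodes:
        problems.append("no node created")
    if not edges:
        problems.append("no edge created")

    elements = [*nodes, *edges]
    used_files = {element.file for element in elements if element.file}
    for name in files:
        if name not in used_files:
            problems.append(f"file not used by any node or edge: {name}")

    # Labels must be unique across nodes and edges together.
    seen_labels = set()
    for element in elements:
        if not element.file:
            problems.append(f"no file linked: {element.label}")
        if element.label in seen_labels:
            problems.append(f"duplicate label: {element.label}")
        seen_labels.add(element.label)

    for node in nodes:
        if not node.primary_identifier:
            problems.append(f"no primary identifier: {node.label}")
    for edge in edges:
        if not (edge.from_header and edge.to_header):
            problems.append(f"origin or destination header missing: {edge.label}")
    return problems
```

An empty result means that all modeled checklist items are satisfied. Any returned entry names the item that still needs attention.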
21+
22+
To continue with the import, click the **Save and start import** button. The data
23+
importer provides an overview showing results with the collections that have been
24+
created with the data provided in the files.
25+
26+
To access your newly created graph in the ArangoDB web interface, click the
27+
**See your new graph** button.
28+
29+
## File validation
30+
31+
Once the import has started, the files that you have provided are validated.
32+
If the validation process detects parsing errors in any of the files, the import
33+
is temporarily paused and the validation errors are shown. You can get a full
34+
report by clicking the **See full report** button.
35+
36+
At this point, you can:
37+
- Continue with the import without addressing the errors. The CSV files will still
38+
be included in the import. However, the invalid rows are skipped and
39+
excluded from the import.
40+
- Revisit the problematic file(s), resolve the issues, and then re-upload the
41+
file(s).
42+
43+
{{< tip >}}
44+
To ensure the integrity of your data, it is recommended to address all the errors
45+
detected during the validation process.
46+
{{< /tip >}}
47+
48+
### Validation errors and their meanings
49+
50+
#### Invalid Quotation Mark
51+
52+
This error indicates issues with quotation marks in the CSV data.
53+
It can occur due to improper use of quotes.
54+
55+
#### Missing Quotation Marks
56+
57+
This error occurs when quotation marks are missing or improperly placed in the
58+
CSV data, potentially affecting data enclosure.
59+
60+
#### Insufficient Data Fields
61+
62+
This error occurs when a CSV row has fewer fields than expected. It may indicate
63+
missing or improperly formatted data.
64+
65+
#### Excessive Data Fields
66+
67+
This error occurs when a CSV row has more fields than expected, possibly due to
68+
extra data or formatting issues.
69+
70+
#### Unidentifiable Field Separator
71+
72+
This error suggests that the parser could not identify the field separator
73+
character in the CSV data.
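
To catch rows with too few or too many fields before you upload, you can compare each row against the header locally. This is a minimal sketch that assumes the expected field count is the number of header columns and that fields are comma-separated. The function name `field_count_errors` and its result labels are made up and do not reproduce the messages of the Data Loader.

```python
import csv
import io


def field_count_errors(csv_text):
    """Return (line_number, kind) for rows whose field count differs from the header.

    kind is "insufficient" or "excessive". Blank lines are ignored.
    Hypothetical local check; it only approximates the documented errors.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []

    expected = len(header)
    errors = []
    for row in reader:
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        if len(row) < expected:
            errors.append((reader.line_num, "insufficient"))
        elif len(row) > expected:
            errors.append((reader.line_num, "excessive"))
    return errors
```

For a file with three header columns, a row with two fields is reported as `insufficient` and a row with four fields as `excessive`, each with the line on which the row ends.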
