Commit 9422f52

Merge pull request #179 from HarperDB/dataloader
initial data loader documentation
2 parents 527a84a + 95e59ff commit 9422f52

4 files changed, +188 -1 lines changed

docs/SUMMARY.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -13,6 +13,7 @@
1313
* [Caching](developers/applications/caching.md)
1414
* [Defining Schemas](developers/applications/defining-schemas.md)
1515
* [Defining Roles](developers/applications/defining-roles.md)
16+
* [Data Loader](developers/applications/data-loader.md)
1617
* [Debugging Applications](developers/applications/debugging.md)
1718
* [Define Fastify Routes](developers/applications/define-routes.md)
1819
* [Web Applications](developers/applications/web-applications.md)

docs/developers/applications/data-loader.md

Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
1+
# Data Loader
2+
3+
The Data Loader is a built-in component that provides a reliable mechanism for loading data from JSON or YAML files into Harper tables as part of component deployment. This feature is particularly useful for ensuring specific records exist in your database when deploying components, such as seed data, configuration records, or initial application data.
4+
5+
## Configuration
6+
7+
To use the Data Loader, first specify your data files in the `config.yaml` in your component directory:
8+
9+
```yaml
dataLoader:
  files: 'data/*.json'
```
13+
14+
The Data Loader is an [Extension](../components/reference.md#extensions) and supports the standard `files` configuration option.
15+
16+
## Data File Format
17+
18+
Data files can be structured as either JSON or YAML files containing the records you want to load. Each data file must specify records for a single table - if you need to load data into multiple tables, create separate data files for each table.
19+
20+
### Basic Example
21+
22+
Create a data file in your component's data directory (one table per file):
23+
24+
```json
{
  "database": "myapp",
  "table": "users",
  "records": [
    {
      "id": 1,
      "username": "admin",
      "email": "[email protected]",
      "role": "administrator"
    },
    {
      "id": 2,
      "username": "user1",
      "email": "[email protected]",
      "role": "standard"
    }
  ]
}
```
44+
45+
### Multiple Tables
46+
47+
To load data into multiple tables, create separate data files for each table:
48+
49+
**users.json:**
50+
```json
{
  "database": "myapp",
  "table": "users",
  "records": [
    {
      "id": 1,
      "username": "admin",
      "email": "[email protected]"
    }
  ]
}
```
63+
64+
**settings.yaml:**
65+
```yaml
database: myapp
table: settings
records:
  - id: 1
    setting_name: app_name
    setting_value: My Application
  - id: 2
    setting_name: version
    setting_value: "1.0.0"
```
76+
77+
## File Organization
78+
79+
You can organize your data files in various ways:
80+
81+
### Single File Pattern
82+
```yaml
dataLoader:
  files: 'data/seed-data.json'
```
86+
87+
### Multiple Files Pattern
88+
```yaml
dataLoader:
  files:
    - 'data/users.json'
    - 'data/settings.yaml'
    - 'data/initial-products.json'
```
95+
96+
### Glob Pattern
97+
```yaml
dataLoader:
  files: 'data/**/*.{json,yaml,yml}'
```
101+
102+
## Loading Behavior
103+
104+
When Harper starts up with a component that includes the Data Loader:
105+
106+
1. The Data Loader reads all specified data files (JSON or YAML)
107+
2. For each file, it validates that a single table is specified
108+
3. Records are inserted or updated based on timestamp comparison:
109+
   - New records are inserted if they don't exist
110+
   - Existing records are updated only if the data file's modification time is newer than the record's updated time
111+
   - This ensures data files can be safely reloaded without overwriting newer changes
112+
4. Records are matched by primary key, so a record that already exists is never duplicated; it is updated only when the data file is newer
113+
114+
Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas.md) component for better control and type safety.
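The timestamp rule described above can be modelled in a few lines. This is an illustrative sketch of the documented behavior only, not Harper's implementation: `decide_action`, `load_records`, the in-memory `store` dictionary, the `updated_time` field, and the `id` primary key are all hypothetical names chosen for the example.

```python
def decide_action(existing_updated_time, file_modified_time):
    """Apply the documented rule to a single record."""
    if existing_updated_time is None:
        return "insert"  # no record with this primary key yet
    if file_modified_time > existing_updated_time:
        return "update"  # the data file is newer than the stored record
    return "skip"  # the stored record is newer, so leave it alone


def load_records(store, records, file_modified_time, now, primary_key="id"):
    """Load records into an in-memory dict that stands in for a table."""
    actions = {}
    for record in records:
        key = record[primary_key]
        existing = store.get(key)
        existing_time = existing["updated_time"] if existing else None
        action = decide_action(existing_time, file_modified_time)
        if action != "skip":
            store[key] = {"data": record, "updated_time": now}
        actions[key] = action
    return actions


# Record 1 was edited at time 200; the data file was last modified at time 100.
table = {1: {"data": {"id": 1, "username": "edited"}, "updated_time": 200}}
seed = [{"id": 1, "username": "admin"}, {"id": 2, "username": "user1"}]
print(load_records(table, seed, file_modified_time=100, now=250))
```

In this model, the edit made to record 1 after the file was written survives the reload, while the missing record 2 is inserted. Once the file's modification time passes both records' updated times, both are overwritten on the next load.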
115+
116+
## Best Practices
117+
118+
1. **Define Schemas First**: While the Data Loader can infer schemas, it's strongly recommended to define your table schemas and relations explicitly using the [graphqlSchema](../applications/defining-schemas.md) component before loading data. This ensures proper data types, constraints, and relationships between tables.
119+
120+
2. **One Table Per File**: Remember that each data file can only load records into a single table. Organize your files accordingly.
121+
122+
3. **Idempotency**: Design your data files to be idempotent - they should be safe to load multiple times without creating duplicate or conflicting data.
123+
124+
4. **Version Control**: Include your data files in version control to ensure consistency across deployments.
125+
126+
5. **Environment-Specific Data**: Consider using different data files for different environments (development, staging, production).
127+
128+
6. **Data Validation**: Ensure your data files are valid JSON or YAML and match your table schemas before deployment.
129+
130+
7. **Sensitive Data**: Avoid including sensitive data like passwords or API keys directly in data files. Use environment variables or secure configuration management instead.
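One way to support the idempotency and validation practices above is to check, before deployment, that no primary key appears twice for the same table across your data files. The sketch below is a hypothetical pre-deployment check, not a Harper feature; it assumes the files are already parsed into dictionaries and that the primary key attribute is `id`, as in the examples in this document.

```python
from collections import Counter


def duplicate_ids(data_files, primary_key="id"):
    """Return (database, table, key) triples that occur more than once."""
    counts = Counter()
    for doc in data_files:
        for record in doc["records"]:
            counts[(doc["database"], doc["table"], record[primary_key])] += 1
    # Sorting assumes the keys of a given table share one comparable type.
    return sorted(triple for triple, seen in counts.items() if seen > 1)


# Two parsed data files that both define user 1.
users_a = {"database": "myapp", "table": "users",
           "records": [{"id": 1, "username": "admin"}]}
users_b = {"database": "myapp", "table": "users",
           "records": [{"id": 1, "username": "root"},
                       {"id": 2, "username": "user1"}]}

print(duplicate_ids([users_a, users_b]))
```

A non-empty result points at records whose final value would depend on which file is applied, which is worth resolving before the files are deployed.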
131+
132+
## Example Component Structure
133+
134+
```
my-component/
├── config.yaml
├── data/
│   ├── users.json
│   ├── roles.json
│   └── settings.json
├── schemas.graphql
└── roles.yaml
```
144+
145+
With this structure, your `config.yaml` might look like:
146+
147+
```yaml
# Load environment variables first
loadEnv:
  files: '.env'

# Define schemas
graphqlSchema:
  files: 'schemas.graphql'

# Define roles
roles:
  files: 'roles.yaml'

# Load initial data
dataLoader:
  files: 'data/*.json'

# Enable REST endpoints
rest: true
```
167+
168+
## Related Documentation
169+
170+
- [Built-In Components](../components/built-in.md)
171+
- [Extensions](../components/reference.md#extensions)
172+
- [Bulk Operations](../operations-api/bulk-operations.md) - For loading data via the Operations API

docs/developers/components/built-in.md

Lines changed: 14 additions & 0 deletions
@@ -3,6 +3,7 @@
33
Harper provides extended features using built-in components. They do **not** need to be installed with a package manager, and simply must be specified in a config to run. These are used throughout many Harper docs, guides, and examples. Unlike external components which have their own semantic versions, built-in components follow Harper's semantic version.
44

55
- [Built-In Components](#built-in-components)
6+
- [dataLoader](#dataloader)
67
- [fastifyRoutes](#fastifyroutes)
78
- [graphql](#graphql)
89
- [graphqlSchema](#graphqlschema)
@@ -16,6 +17,19 @@ Harper provides extended features using built-in components. They do **not** nee
1617

1718
<!-- ## clustering -->
1819

20+
## dataLoader
21+
22+
Load data from JSON or YAML files into Harper tables as part of component deployment.
23+
24+
This component is an [Extension](./reference.md#extensions) and can be configured with the `files` configuration option.
25+
26+
Complete documentation for this feature is available here: [Data Loader](../applications/data-loader.md)
27+
28+
```yaml
dataLoader:
  files: 'data/*.json'
```
32+
1933
## fastifyRoutes
2034
2135
Specify custom endpoints using [Fastify](https://fastify.dev/).

docs/technical-details/release-notes/4.tucker/4.6.0.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ The new logger is now based on the Node.js Console API, with improved the format
1717
An important change is that logging to standard out/error will _not_ include the timestamp.
1818

1919
### Data Loader
20-
4.6 includes a new data loader that can be used to load data into HarperDB as part of a component. The data loader can be used to load data from JSON file and can be deployed and distributed with a component to provide a reliable mechanism for ensuring specific records are loaded into Harper.
20+
4.6 includes a new [data loader](../../../../developers/applications/data-loader.md) that can be used to load data into HarperDB as part of a component. The data loader can be used to load data from JSON file and can be deployed and distributed with a component to provide a reliable mechanism for ensuring specific records are loaded into Harper.
2121

2222
### Resource API Upgrades
2323
4.6 includes an upgraded form of the Resource API that can be selected with significant improvements in ease of use.
