Commit d8ed85a

Data Loader update
1 parent 80dc52c commit d8ed85a

2 files changed

+36
-8
lines changed

docs/developers/applications/data-loader.md

Lines changed: 18 additions & 4 deletions
@@ -114,11 +114,25 @@ When Harper starts up with a component that includes the Data Loader:
 
 1. The Data Loader reads all specified data files (JSON or YAML)
 1. For each file, it validates that a single table is specified
-1. Records are inserted or updated based on timestamp comparison:
+1. Records are inserted or updated based on content hash comparison:
 - New records are inserted if they don't exist
-- Existing records are updated only if the data file's modification time is newer than the record's updated time
-- This ensures data files can be safely reloaded without overwriting newer changes
-1. If records with the same primary key already exist, updates occur only when the file is newer
+- Existing records are updated only if the data file content has changed
+- User modifications made via Operations API or other methods are preserved - those records won't be overwritten
+- Users can add extra fields to data-loader records without blocking future updates to the original fields
+1. The Data Loader uses SHA-256 content hashing stored in a system table (`hdb_dataloader_hash`) to track which records it has loaded and detect changes
+
+### Change Detection
+
+The Data Loader intelligently handles various scenarios:
+
+- **New records**: Inserted with their content hash stored
+- **Unchanged records**: Skipped (no database writes)
+- **Changed data file**: Records are updated using `patch` to preserve any extra fields users may have added
+- **User-created records**: Records created outside the Data Loader (via Operations API, REST, etc.) are never overwritten
+- **User-modified records**: Records modified after being loaded are preserved and not overwritten
+- **User-added fields**: Extra fields added to data-loader records are preserved during updates
+
+This approach ensures data files can be safely reloaded across deployments and node scaling without losing user modifications.
 
 Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas) component for better control and type safety.
 

versioned_docs/version-4.6/developers/applications/data-loader.md

Lines changed: 18 additions & 4 deletions
@@ -114,11 +114,25 @@ When Harper starts up with a component that includes the Data Loader:
 
 1. The Data Loader reads all specified data files (JSON or YAML)
 1. For each file, it validates that a single table is specified
-1. Records are inserted or updated based on timestamp comparison:
+1. Records are inserted or updated based on content hash comparison:
 - New records are inserted if they don't exist
-- Existing records are updated only if the data file's modification time is newer than the record's updated time
-- This ensures data files can be safely reloaded without overwriting newer changes
-1. If records with the same primary key already exist, updates occur only when the file is newer
+- Existing records are updated only if the data file content has changed
+- User modifications made via Operations API or other methods are preserved - those records won't be overwritten
+- Users can add extra fields to data-loader records without blocking future updates to the original fields
+1. The Data Loader uses SHA-256 content hashing stored in a system table (`hdb_dataloader_hash`) to track which records it has loaded and detect changes
+
+### Change Detection
+
+The Data Loader intelligently handles various scenarios:
+
+- **New records**: Inserted with their content hash stored
+- **Unchanged records**: Skipped (no database writes)
+- **Changed data file**: Records are updated using `patch` to preserve any extra fields users may have added
+- **User-created records**: Records created outside the Data Loader (via Operations API, REST, etc.) are never overwritten
+- **User-modified records**: Records modified after being loaded are preserved and not overwritten
+- **User-added fields**: Extra fields added to data-loader records are preserved during updates
+
+This approach ensures data files can be safely reloaded across deployments and node scaling without losing user modifications.
 
 Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas) component for better control and type safety.
 
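
The change-detection rules described in this commit can be illustrated with a minimal Python sketch. It is not Harper's implementation, which this commit does not show. The function names `content_hash` and `load_record`, the plain dicts standing in for the target table and the `hdb_dataloader_hash` system table, the canonical-JSON hashing scheme, and the `fields` list stored next to each hash are all assumptions made for illustration.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of the record.

    The canonical-JSON scheme is an assumption; the docs only say SHA-256.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def load_record(table: dict, hashes: dict, key: str, file_record: dict) -> str:
    """Apply one data-file record and return the action taken.

    `table` stands in for the target table and `hashes` for the
    hdb_dataloader_hash system table (both hypothetical in-memory dicts).
    """
    new_hash = content_hash(file_record)
    existing = table.get(key)

    # New record: insert it and remember the hash of what was loaded.
    if existing is None:
        table[key] = dict(file_record)
        hashes[key] = {"hash": new_hash, "fields": sorted(file_record)}
        return "inserted"

    entry = hashes.get(key)

    # No stored hash: the record was created outside the loader.
    if entry is None:
        return "skipped: user-created"

    # Data file unchanged since the last load: no database write.
    if entry["hash"] == new_hash:
        return "skipped: unchanged"

    # Compare only the fields the loader originally wrote, so that
    # user-added extra fields do not count as a modification.
    loaded_view = {f: existing[f] for f in entry["fields"] if f in existing}
    if content_hash(loaded_view) != entry["hash"]:
        return "skipped: user-modified"

    # Patch-style merge: overwrite loader fields, keep extra fields.
    existing.update(file_record)
    hashes[key] = {"hash": new_hash, "fields": sorted(file_record)}
    return "updated"
```

Under these assumptions, calling `load_record` twice with the same file record inserts on the first call and skips on the second. A record that gained only an extra field still receives later file changes, while a record whose loaded fields were edited is left alone.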
