docs/developers/applications/data-loader.md
18 additions & 4 deletions
@@ -114,11 +114,25 @@ When Harper starts up with a component that includes the Data Loader:
 
 1. The Data Loader reads all specified data files (JSON or YAML)
 1. For each file, it validates that a single table is specified
-1. Records are inserted or updated based on timestamp comparison:
+1. Records are inserted or updated based on content hash comparison:
 - New records are inserted if they don't exist
-- Existing records are updated only if the data file's modification time is newer than the record's updated time
-- This ensures data files can be safely reloaded without overwriting newer changes
-1. If records with the same primary key already exist, updates occur only when the file is newer
+- Existing records are updated only if the data file content has changed
+- User modifications made via Operations API or other methods are preserved - those records won't be overwritten
+- Users can add extra fields to data-loader records without blocking future updates to the original fields
+1. The Data Loader uses SHA-256 content hashing stored in a system table (`hdb_dataloader_hash`) to track which records it has loaded and detect changes
+
+### Change Detection
+
+The Data Loader intelligently handles various scenarios:
+
+- **New records**: Inserted with their content hash stored
+- **Unchanged records**: Skipped (no database writes)
+- **Changed data file**: Records are updated using `patch` to preserve any extra fields users may have added
+- **User-created records**: Records created outside the Data Loader (via Operations API, REST, etc.) are never overwritten
+- **User-modified records**: Records modified after being loaded are preserved and not overwritten
+- **User-added fields**: Extra fields added to data-loader records are preserved during updates
+
+This approach ensures data files can be safely reloaded across deployments and node scaling without losing user modifications.
 
 Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas) component for better control and type safety.
versioned_docs/version-4.6/developers/applications/data-loader.md
18 additions & 4 deletions
@@ -114,11 +114,25 @@ When Harper starts up with a component that includes the Data Loader:
 
 1. The Data Loader reads all specified data files (JSON or YAML)
 1. For each file, it validates that a single table is specified
-1. Records are inserted or updated based on timestamp comparison:
+1. Records are inserted or updated based on content hash comparison:
 - New records are inserted if they don't exist
-- Existing records are updated only if the data file's modification time is newer than the record's updated time
-- This ensures data files can be safely reloaded without overwriting newer changes
-1. If records with the same primary key already exist, updates occur only when the file is newer
+- Existing records are updated only if the data file content has changed
+- User modifications made via Operations API or other methods are preserved - those records won't be overwritten
+- Users can add extra fields to data-loader records without blocking future updates to the original fields
+1. The Data Loader uses SHA-256 content hashing stored in a system table (`hdb_dataloader_hash`) to track which records it has loaded and detect changes
+
+### Change Detection
+
+The Data Loader intelligently handles various scenarios:
+
+- **New records**: Inserted with their content hash stored
+- **Unchanged records**: Skipped (no database writes)
+- **Changed data file**: Records are updated using `patch` to preserve any extra fields users may have added
+- **User-created records**: Records created outside the Data Loader (via Operations API, REST, etc.) are never overwritten
+- **User-modified records**: Records modified after being loaded are preserved and not overwritten
+- **User-added fields**: Extra fields added to data-loader records are preserved during updates
+
+This approach ensures data files can be safely reloaded across deployments and node scaling without losing user modifications.
 
 Note: While the Data Loader can create tables automatically by inferring the schema from the provided records, it's recommended to define your table schemas explicitly using the [graphqlSchema](../applications/defining-schemas) component for better control and type safety.
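
The change-detection rules described in this diff can be illustrated with a small sketch. This is not Harper's implementation: it is a minimal Python model under stated assumptions. The names `content_hash` and `apply_record` are hypothetical, plain dictionaries stand in for the table and for `hdb_dataloader_hash`, and the sketch assumes the tracking entry also records which field names were loaded so that user-added fields can be told apart from user edits to the original fields.

```python
import hashlib
import json


def content_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys) of a record."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def apply_record(table: dict, tracking: dict, key: str, file_record: dict) -> str:
    """Apply one data-file record and return the action taken.

    `table` maps primary key -> stored record (stand-in for the real table).
    `tracking` maps primary key -> {"hash", "fields"} (stand-in for the hash
    table; storing "fields" is an assumption of this sketch).
    """
    new_hash = content_hash(file_record)
    existing = table.get(key)
    tracked = tracking.get(key)

    if existing is None:
        # New record: insert it and remember the hash of what was loaded.
        table[key] = dict(file_record)
        tracking[key] = {"hash": new_hash, "fields": sorted(file_record)}
        return "inserted"

    if tracked is None:
        # Record exists but was never loaded by this loader: user-created.
        return "preserved"

    if tracked["hash"] == new_hash:
        # Data file content is unchanged: no database write.
        return "skipped"

    # Compare only the originally loaded fields, ignoring user-added ones.
    loaded_view = {f: existing[f] for f in tracked["fields"] if f in existing}
    if content_hash(loaded_view) != tracked["hash"]:
        # A loaded field was edited after loading: keep the user's version.
        return "preserved"

    # Patch-style merge: overwrite file fields, keep extra user fields.
    existing.update(file_record)
    tracking[key] = {"hash": new_hash, "fields": sorted(file_record)}
    return "patched"
```

Calling `apply_record` twice with the same record inserts it once and then skips it. Adding an extra field to the stored record and then changing the file record results in a patch that keeps the extra field, while editing a loaded field by hand causes later file changes to be ignored for that record.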