Commit accdb09

feat: make track deletes work across execution (#5517)
The track deletes feature currently assumes the entire dataset is fetched and saved within a single execution, which means it isn't compatible with checkpoints or with fetching/saving across multiple executions.

To solve this problem, this PR introduces a way to explicitly define the start and the end of the track deletes interval. The starting point is stored in a special checkpoint that survives across executions, instead of assuming the start is always the beginning of the current execution.

cc @bastienbeurier to agree on the naming `trackDeletesStart`/`trackDeletesEnd`

ex:

```
exec: async (nango) => {
    await nango.trackDeletesStart('MyModel');
    ...
    await nango.batchSave([...], 'MyModel');
    ...
    const deleted = await nango.trackDeletesEnd('MyModel');
}
```

<!-- Summary by @propel-code-bot -->

---

The runner stores a per-model delete-window checkpoint keyed off the sync checkpoint and, when the window closes, uses it to invoke deletion of outdated records before clearing the checkpoint.

<details>
<summary><strong>Key Changes</strong></summary>

• Added `trackDeletesStart`/`trackDeletesEnd` to `NangoSyncBase` and `NangoSyncRunner`, with per-model checkpoint keys and `deleteOutdatedRecords` using stored `syncJobId`
• Refactored `Checkpointing` in `packages/runner/lib/sdk/checkpointing.ts` to manage per-key state and accept `key` arguments for checkpoint operations
• Expanded CLI parser/compiler validations to track `trackDeletes` calls per model and enforce ordering around `batchSave`
• Updated docs and examples to use `trackDeletesStart`/`trackDeletesEnd` and mark `deleteRecordsFromPreviousExecutions` as deprecated
• Reset behavior in `packages/shared/lib/clients/orchestrator.ts` now hard-deletes all checkpoints with a key prefix

</details>

<details>
<summary><strong>Possible Issues</strong></summary>

• A run that calls `trackDeletesStart` but exits before `trackDeletesEnd` will leave the delete-window checkpoint in place for future runs
• The per-key `stateByKey` map in `Checkpointing` is not pruned, which could grow in long-lived processes with many dynamic keys
• Full reset now returns an error if `hardDeleteCheckpoints` fails, which could block user-initiated resets

</details>

---

*This summary was automatically generated by @propel-code-bot*
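
To make the cross-execution behaviour concrete, here is a minimal in-memory sketch of the delete window described above. It is not the runner's code: `DeleteWindowModel`, its fields, and the numeric job IDs are hypothetical, and it assumes that calling `trackDeletesStart` while a window is already open keeps the originally stored starting point.

```typescript
// Hypothetical in-memory model of the delete window; not Nango's implementation.
type StoredRecord = { lastSeenJobId: number; deleted: boolean };

class DeleteWindowModel {
    // Record id -> state, for a single model.
    private records = new Map<string, StoredRecord>();
    // Stands in for the special checkpoint that survives across executions.
    private windowStartJobId: number | null = null;

    trackDeletesStart(jobId: number): void {
        // Assumption: an already-open window keeps its original starting point.
        if (this.windowStartJobId === null) {
            this.windowStartJobId = jobId;
        }
    }

    batchSave(ids: string[], jobId: number): void {
        for (const id of ids) {
            this.records.set(id, { lastSeenJobId: jobId, deleted: false });
        }
    }

    trackDeletesEnd(): { deletedKeys: string[] } {
        const start = this.windowStartJobId;
        if (start === null) {
            throw new Error('trackDeletesEnd called without trackDeletesStart');
        }
        const deletedKeys: string[] = [];
        for (const [id, record] of this.records) {
            // Not saved since the window opened: soft-delete it.
            if (!record.deleted && record.lastSeenJobId < start) {
                record.deleted = true;
                deletedKeys.push(id);
            }
        }
        this.windowStartJobId = null; // clear the stored window start
        return { deletedKeys };
    }
}

const model = new DeleteWindowModel();
model.batchSave(['a', 'b', 'c'], 1); // an earlier, complete sync

model.trackDeletesStart(2); // execution 2 opens the window
model.batchSave(['a'], 2); // ...and stops before finishing

model.trackDeletesStart(3); // execution 3 resumes; the window start stays at 2
model.batchSave(['b'], 3);
const result = model.trackDeletesEnd(); // result.deletedKeys is ['c']
```

Because the window start is kept from execution 2, record `a` is not treated as outdated even though it was not saved again in execution 3.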
1 parent ca4b279 commit accdb09

File tree

17 files changed, +396 -93 lines changed

docs/implementation-guides/use-cases/syncs/checkpoints.mdx

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -152,7 +152,7 @@ After the initial run completes, subsequent runs are fast and incremental — on
152152
Not every sync can be incremental. You need to fetch the full dataset when:
153153

154154
- **The external API doesn't support filtering by modification date** — Some APIs have no way to ask "give me everything that changed since X." In this case, you must fetch all records on every run.
155-
- **You need automated delete detection** — To detect deleted records automatically using [`deleteRecordsFromPreviousExecutions()`](/reference/functions#detect-deletions-automatically), Nango must compare the full current dataset against the previous one. This requires a full sync. (If the external API exposes deleted records, you can use [`batchDelete()`](/reference/functions#delete-records) with checkpoints instead — see [deletion detection guide](/implementation-guides/use-cases/syncs/deletion-detection).)
155+
- **You need automated delete detection** — To detect deleted records automatically using [`trackDeletesStart`/`trackDeletesEnd`](/reference/functions#detect-deletions-automatically), Nango compares the records that existed before `trackDeletesStart` against what was saved between `trackDeletesStart` and `trackDeletesEnd`. This requires fetching the complete dataset between the two calls. (If the external API exposes deleted records, you can use [`batchDelete()`](/reference/functions#delete-records) instead — see [deletion detection guide](/implementation-guides/use-cases/syncs/deletion-detection).)
156156

157157
For small datasets (e.g., a list of Slack users for an organization with fewer than 100 employees), a full sync on every run is perfectly fine:
158158

@@ -173,7 +173,7 @@ export default createSync({
173173
As datasets grow, full syncs become unscalable — taking longer to run, triggering rate limits, and consuming more compute and memory. If your dataset has more than a few thousand records and the API supports filtering by date or cursor, use checkpoints.
174174

175175
<Warning>
176-
Checkpoints are currently incompatible with `deleteRecordsFromPreviousExecutions()` because that function requires comparing the full dataset between consecutive runs. Support for checkpoints with full syncs is coming soon.
176+
`deleteRecordsFromPreviousExecutions()` is incompatible with checkpoints because it requires comparing the full dataset between consecutive runs. Use `trackDeletesStart`/`trackDeletesEnd` instead, which explicitly bounds the deletion detection window and supports checkpointed syncs.
177177
</Warning>
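
As an illustration of the checkpointed pattern this warning points to, the sketch below resumes from a saved cursor and only closes the delete window once the last page has been saved. It rests on several assumptions: `CheckpointedSyncLike` and `FetchPage` are hypothetical stand-ins, the `batchSave` return type is assumed to match `batchUpdate`, calling `trackDeletesStart` on a resumed execution is assumed to keep the stored window start, and an empty cursor is used here as an ad hoc "start from the first page" marker.

```typescript
// Hypothetical stand-in for the subset of the sync SDK used below.
interface CheckpointedSyncLike {
    getCheckpoint(): Promise<{ cursor: string } | null>;
    saveCheckpoint(checkpoint: { cursor: string }): Promise<void>;
    trackDeletesStart(model: string): Promise<void>;
    batchSave(records: { id: string }[], model: string): Promise<boolean | null>;
    trackDeletesEnd(model: string): Promise<{ deletedKeys: string[] }>;
}

// Hypothetical page fetcher: an empty cursor means "start from the first page".
type FetchPage = (cursor: string) => Promise<{ records: { id: string }[]; next: string | null }>;

async function syncTicketsWithCheckpoints(nango: CheckpointedSyncLike, fetchPage: FetchPage): Promise<string[]> {
    // Assumed to be a no-op when a previous execution already opened the window.
    await nango.trackDeletesStart('Ticket');

    const checkpoint = await nango.getCheckpoint();
    let cursor: string | null = checkpoint ? checkpoint.cursor : '';

    while (cursor !== null) {
        const page = await fetchPage(cursor);
        await nango.batchSave(page.records, 'Ticket');
        if (page.next !== null) {
            // Persist progress so a later execution can resume from here.
            await nango.saveCheckpoint({ cursor: page.next });
        }
        cursor = page.next;
    }

    // Every page has been saved, possibly across several executions.
    const { deletedKeys } = await nango.trackDeletesEnd('Ticket');
    // Reset progress so the next window starts from the first page again.
    await nango.saveCheckpoint({ cursor: '' });
    return deletedKeys;
}
```

If an execution stops partway through the loop, `trackDeletesEnd` is never reached, so no diff is computed on a partial dataset.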
178178

179179
# Avoiding memory overuse

docs/implementation-guides/use-cases/syncs/deletion-detection.mdx

Lines changed: 25 additions & 27 deletions
@@ -84,17 +84,13 @@ export default createSync({
8484

8585
Syncs that fetch all records on every run can automatically detect deletions.
8686

87-
Nango can therefore detect removals by computing the diff between two consecutive result sets. Enable this behaviour by calling the `deleteRecordsFromPreviousExecutions` function. ([full reference](/reference/functions#detect-deletions-automatically)).
88-
89-
<Note>
90-
`deleteRecordsFromPreviousExecutions` does not work with checkpoint-based syncs because fetching the data incrementally prevents performing a diff and automatically detecting deletions.
91-
</Note>
87+
Nango detects removals by computing the diff between what existed before `trackDeletesStart` and what was saved between `trackDeletesStart` and `trackDeletesEnd`. ([full reference](/reference/functions#detect-deletions-automatically)).
9288

9389
### Example sync with deletion detection
9490

9591
```ts
96-
import { createSync } from 'nango';
97-
import * as z from 'zod';
92+
import { createSync } from 'nango';
93+
import * as z from 'zod';
9894

9995
const TicketSchema = z.object({
10096
id: z.string(),
@@ -103,55 +99,57 @@ const TicketSchema = z.object({
10399
});
104100

105101
export default createSync({
106-
description: 'Fetch all help-desk tickets',
107-
frequency: 'every day',
108-
endpoints: [{ method: 'GET', path: '/tickets', group: 'Tickets' }],
102+
description: 'Fetch all help-desk tickets',
103+
frequency: 'every day',
104+
endpoints: [{ method: 'GET', path: '/tickets', group: 'Tickets' }],
109105
models: { Ticket: TicketSchema },
110106

111107
exec: async (nango) => {
108+
// Mark the start of deletion tracking
109+
await nango.trackDeletesStart('Ticket');
110+
112111
const tickets = await nango.paginate<{ id: string; subject: string; status: string }>({
113-
endpoint: '/tickets',
114-
paginate: { type: 'cursor', cursorPathInResponse: 'next', cursorNameInRequest: 'cursor', responsePath: 'tickets' }
112+
endpoint: '/tickets',
113+
paginate: { type: 'cursor', cursorPathInResponse: 'next', cursorNameInRequest: 'cursor', responsePath: 'tickets' }
115114
});
116115

117116
for await (const page of tickets) {
118-
await nango.batchSave(page, 'Ticket');
117+
await nango.batchSave(page, 'Ticket');
119118
}
120119

121-
// Delete records that don't exist anymore
122-
await nango.deleteRecordsFromPreviousExecutions('Ticket');
120+
// Detect and mark deleted records
121+
await nango.trackDeletesEnd('Ticket');
123122
}
124123
});
125124
```
126125

127126
### How the algorithm works
128127

129-
1. During the execution, Nango stores the list of record IDs.
130-
2. When calling `deleteRecordsFromPreviousExecutions`, Nango compares the new list with the old one.
131-
3. Any records missing in the new list are marked as deleted (soft delete). They remain accessible from the Nango cache, but with `record._metadata.deleted === true`.
128+
1. When `trackDeletesStart` is called, Nango marks the beginning of the deletion tracking window for the model.
129+
2. Records saved with `batchSave` between `trackDeletesStart` and `trackDeletesEnd` are tracked.
130+
3. When `trackDeletesEnd` is called, Nango compares what existed before `trackDeletesStart` with what was saved in the window.
131+
4. Any records missing from the new dataset are marked as deleted (soft delete). They remain accessible from the Nango cache, but with `record._metadata.deleted === true`.
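
The comparison in steps 3 and 4 amounts to a set difference over record IDs. A minimal sketch, assuming IDs are strings; `computeDeletedKeys` is a hypothetical helper for illustration, not a Nango function:

```typescript
// Hypothetical helper: IDs that existed before the window opened but were
// not saved again inside the window are the ones to soft-delete.
function computeDeletedKeys(existingBeforeStart: string[], savedInWindow: string[]): string[] {
    const saved = new Set(savedInWindow);
    return existingBeforeStart.filter((id) => !saved.has(id));
}

// 'b' existed before the window but was not saved in it; 'd' is simply new.
const deletedKeys = computeDeletedKeys(['a', 'b', 'c'], ['a', 'c', 'd']); // ['b']
```

The sketch also shows why an incomplete fetch is dangerous: every ID missing from `savedInWindow` is reported, whether it was deleted upstream or just never fetched.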
132132

133133
<Warning>
134-
**Be careful with exception handling when using** `deleteRecordsFromPreviousExecutions`
134+
**Be careful with exception handling when using** `trackDeletesStart`/`trackDeletesEnd`
135135

136136
Nango only performs deletion detection (the “diff”) if a sync run completes successfully without any uncaught exceptions.
137137

138-
If you’re using `deleteRecordsFromPreviousExecutions`, exception handling becomes critical:
139-
- If your sync doesn’t fetch the full dataset, but still call `deleteRecordsFromPreviousExecutions` (e.g. you catch and swallow an exception), Nango will attempt the diff on an incomplete dataset.
138+
Exception handling is critical:
139+
- If your sync doesn’t fetch the full dataset between the two calls but still reaches `trackDeletesEnd` (e.g. you catch and swallow an exception), Nango will attempt the diff on an incomplete dataset.
140140
- This leads to false positives, where valid records are mistakenly considered deleted.
141141

142142
**What You Should Do**
143143

144-
If a failure prevents full data retrieval, make sure the sync run fails and `deleteRecordsFromPreviousExecutions` is not being called:
144+
If a failure prevents full data retrieval, make sure the sync run fails and `trackDeletesEnd` is never called:
145145
- Let exceptions bubble up and interrupt the run.
146146
- If you’re using `try/catch`, re-throw exceptions that indicate incomplete data.
147147
</Warning>
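
The guidance above can be sketched as follows. This is an illustration under stated assumptions rather than the documented pattern: `TicketSyncLike` and `FetchTicketPage` are hypothetical stand-ins, and the `batchSave` return type is assumed to match `batchUpdate`. The point is that a fetch failure is re-thrown, so control never reaches `trackDeletesEnd`.

```typescript
// Hypothetical stand-in for the subset of the sync SDK used below.
interface TicketSyncLike {
    trackDeletesStart(model: string): Promise<void>;
    batchSave(records: { id: string }[], model: string): Promise<boolean | null>;
    trackDeletesEnd(model: string): Promise<{ deletedKeys: string[] }>;
}

// Hypothetical fetcher: resolves to a page of records, or null after the last page.
type FetchTicketPage = (page: number) => Promise<{ id: string }[] | null>;

async function syncTicketsOrFail(nango: TicketSyncLike, fetchPage: FetchTicketPage): Promise<string[]> {
    await nango.trackDeletesStart('Ticket');

    try {
        for (let page = 0; ; page++) {
            const records = await fetchPage(page);
            if (records === null) break;
            await nango.batchSave(records, 'Ticket');
        }
    } catch (err) {
        // Log or clean up here if needed, but always re-throw: swallowing the
        // error would let trackDeletesEnd run against an incomplete dataset.
        throw err;
    }

    // Only reached when every page was fetched and saved.
    const { deletedKeys } = await nango.trackDeletesEnd('Ticket');
    return deletedKeys;
}
```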
148148

149149
<Tip>
150-
**How to use** `deleteRecordsFromPreviousExecutions` **safely**
151-
152-
If some records are incorrectly marked as deleted due to calling `deleteRecordsFromPreviousExecutions` improperly, you can trigger a full resync (via the UI or API) to restore the correct data state.
150+
**How to use** `trackDeletesStart`/`trackDeletesEnd` **safely**
153151

154-
However, because `deleteRecordsFromPreviousExecutions` relies on logic in your sync functions, mistakes there can lead to false deletions.
152+
If some records are incorrectly marked as deleted, you can trigger a full resync (via the UI or API) to restore the correct data state.
155153

156154
We strongly recommend not performing irreversible destructive actions (like hard-deleting records in your system) based solely on deletions reported by Nango. A full resync should always be able to recover from issues.
157155
</Tip>
@@ -160,7 +158,7 @@ We strongly recommend not performing irreversible destructive actions (like hard
160158

161159
| Symptom | Likely cause |
162160
| -------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
163-
| Records that still exist in the source API are shown as `deleted` in Nango | sync didn't save all records (silent failures) before calling `deleteRecordsFromPreviousExecutions` |
161+
| Records that still exist in the source API are shown as `deleted` in Nango | sync didn't save all records (silent failures) between `trackDeletesStart` and `trackDeletesEnd` |
164162
| You never see deleted records | Check if deletion detection is implemented for the sync. |
165163

166164
<Tip>

docs/reference/api/sync/prune-records.mdx

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ This is useful for compliance requirements, ensuring Nango doesn't hold onto you
1515
Pruning is not the same as marking a record as deleted from the external API.
1616
This endpoint prunes data from Nango’s cache only. It does not delete anything on the external API, and it is not the same as [detecting a deletion from the source](/implementation-guides/use-cases/syncs/implement-a-sync#deletion-detection).
1717

18-
If you need to tell your customers that a record was deleted on the external API while keeping its last-known payload in cache, use `batchDelete` or `deleteRecordsFromPreviousExecutions` in your sync functions instead.
18+
If you need to tell your customers that a record was deleted on the external API while keeping its last-known payload in cache, use `batchDelete` or `trackDeletesStart`/`trackDeletesEnd` in your sync functions instead.
1919
</Tip>
2020

2121
## Cursor behavior

docs/reference/functions.mdx

Lines changed: 21 additions & 11 deletions
@@ -27,6 +27,9 @@ description: "Full reference of the SDK available in Nango Functions."
2727
},
2828

2929
exec: async (nango) => {
30+
// Mark the start of deletion tracking
31+
await nango.trackDeletesStart('GithubIssueDemo');
32+
3033
// Fetch issues from GitHub.
3134
const res = await nango.get({
3235
endpoint: '/repos/NangoHQ/interactive-demo/issues?labels=demo&sort=created&direction=asc'
@@ -40,8 +43,8 @@ description: "Full reference of the SDK available in Nango Functions."
4043
// Persist issues to the Nango cache.
4144
await nango.batchSave(issues, 'GithubIssueDemo');
4245

43-
// Delete records that don't exist anymore
44-
await nango.deleteRecordsFromPreviousExecutions('GithubIssueDemo')
46+
// Detect and mark deleted records
47+
await nango.trackDeletesEnd('GithubIssueDemo');
4548
},
4649
});
4750
```
@@ -172,7 +175,7 @@ Read more about [integration functions](/guides/primitives/functions) to underst
172175
</ResponseField>
173176

174177
<ResponseField name="trackDeletes" type="boolean" deprecated>
175-
[DEPRECATED] Instead use `await nango.deleteRecordsFromPreviousExecutions('modelName')` inside the sync `exec` function.
178+
[DEPRECATED] Instead use `await nango.trackDeletesStart('modelName')` and `await nango.trackDeletesEnd('modelName')` inside the sync `exec` function.
176179

177180
When `trackDeletes` is set to `true`, Nango automatically detects deleted records **during full syncs only** and marks them as deleted in each record’s metadata (soft delete). These records remain stored in the cache.
178181

@@ -883,23 +886,27 @@ await nango.batchDelete(githubIssuesToDelete, 'GitHubIssue');
883886

884887
### Detect deletions automatically
885888

886-
Automatically detects and marks records as deleted by comparing the current sync execution with the previous one.
889+
Automatically detects and marks records as deleted by comparing what existed before `trackDeletesStart` with what was saved between `trackDeletesStart` and `trackDeletesEnd`.
887890

888891
<Tip>
889892
This does not remove cached payloads.
890893

891-
`nango.deleteRecordsFromPreviousExecutions()` is used to mark records as deleted in Nango because they were not returned by the external API. Nango may still keep the last-known payload so your customer can react to the deletion event.
894+
`nango.trackDeletesStart()`/`nango.trackDeletesEnd()` are used to mark records as deleted in Nango because they were not returned by the external API. Nango may still keep the last-known payload so your customer can react to the deletion event.
892895

893896
If you want to permanently remove data from Nango storage for cost or compliance reasons, use [record pruning instead](/reference/api/sync/prune-records).
894897
</Tip>
895898

896899
```js
897-
await nango.deleteRecordsFromPreviousExecutions('ModelName');
900+
await nango.trackDeletesStart('ModelName');
901+
902+
// ... fetch and save all records ...
903+
904+
await nango.trackDeletesEnd('ModelName');
898905
```
899906

900-
This function should be called at the end of your sync execution, after all records have been saved with `batchSave`. Nango will compare the records saved in the current execution with those from the previous execution and mark any missing records as deleted.
907+
Call `trackDeletesStart` at the beginning of your sync execution, before fetching any data. Call `trackDeletesEnd` after all records have been saved with `batchSave`. Nango will compare the records that existed before `trackDeletesStart` with those saved in the window and mark any missing records as deleted.
901908

902-
**Parameters**
909+
**Parameters** (both functions)
903910

904911
<Expandable>
905912
<ResponseField name="modelType" type="string" required>
@@ -909,11 +916,14 @@ This function should be called at the end of your sync execution, after all reco
909916

910917
**Important considerations:**
911918

912-
- Only use within syncs that fetch the complete dataset
913-
- Call this function only after successfully saving all records
914-
- If your sync fails or doesn't fetch the complete dataset, avoid calling this function as it may cause false deletions
919+
- Only use within syncs that fetch the complete dataset between `trackDeletesStart` and `trackDeletesEnd`
920+
- If your sync fails or doesn't fetch the complete dataset, avoid calling `trackDeletesEnd` as it may cause false deletions
915921
- Records are soft deleted (marked with `_metadata.deleted = true`) and remain in the cache
916922

923+
<Note>
924+
`deleteRecordsFromPreviousExecutions` is deprecated. Use `trackDeletesStart`/`trackDeletesEnd` instead.
925+
</Note>
926+
917927
For more details on deletion detection strategies, see the [detecting deletes guide](/implementation-guides/use-cases/syncs/deletion-detection.mdx).
918928

919929
### Update records

packages/cli/example/github/syncs/fetchIssues.ts

Lines changed: 3 additions & 1 deletion
@@ -32,6 +32,8 @@ const sync = createSync({
3232

3333
// Sync execution
3434
exec: async (nango) => {
35+
await nango.trackDeletesStart('GithubIssue');
36+
3537
const repos = await getAllRepositories(nango);
3638

3739
for (const repo of repos) {
@@ -65,7 +67,7 @@ const sync = createSync({
6567
}
6668
}
6769

68-
await nango.deleteRecordsFromPreviousExecutions('GithubIssue');
70+
await nango.trackDeletesEnd('GithubIssue');
6971
},
7072

7173
// Webhook handler

packages/cli/lib/services/__snapshots__/model.service.unit.test.ts.snap

Lines changed: 5 additions & 0 deletions
@@ -474,7 +474,12 @@ export declare class NangoSync<TCheckpoint = Checkpoint> extends NangoAction {
474474
batchUpdate<T extends object>(results: T[], model: string): Promise<boolean | null>;
475475
getMetadata<T = Metadata>(): Promise<T>;
476476
setMergingStrategy(merging: { strategy: 'ignore_if_modified_after' | 'override' }, model: string): Promise<void>;
477+
/**
478+
* @deprecated please use trackDeletesStart and trackDeletesEnd
479+
*/
477480
deleteRecordsFromPreviousExecutions(model: string): Promise<{ deletedKeys: string[] }>;
481+
trackDeletesStart(model: string): Promise<void>;
482+
trackDeletesEnd(model: string): Promise<{ deletedKeys: string[] }>;
478483
getRecordsByIds<K = string | number, T = any>(ids: K[], model: string): Promise<Map<K, T>>;
479484
getCheckpoint(): Promise<TCheckpoint | null>;
480485
saveCheckpoint(checkpoint: TCheckpoint): Promise<void>;
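
Per the declarations above, `trackDeletesStart` resolves to nothing while `trackDeletesEnd` resolves to `{ deletedKeys: string[] }`. A small usage sketch of consuming that result; `TrackDeletesEndLike` and `closeDeleteWindow` are hypothetical names introduced only for this example:

```typescript
// Hypothetical stand-in exposing only the method used below.
interface TrackDeletesEndLike {
    trackDeletesEnd(model: string): Promise<{ deletedKeys: string[] }>;
}

// Closes the delete window for `model` and returns a short, log-friendly summary.
async function closeDeleteWindow(nango: TrackDeletesEndLike, model: string): Promise<string> {
    const { deletedKeys } = await nango.trackDeletesEnd(model);
    return deletedKeys.length === 0
        ? `${model}: no deletions detected`
        : `${model}: ${deletedKeys.length} record(s) marked as deleted (${deletedKeys.join(', ')})`;
}
```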
