I work at DBT and have been improving an ETL pipeline we have for GOV.UK content, based on parameters the department needs. I'd like to configure it so it ingests and overwrites only the data that has changed, rather than ingesting everything over and over again.
My plan is:
- Use the search API and the `updated_at` field to return results changed in the last few days
- Use the content API to fetch the content, recursing through related pages to pick up collection children etc., again filtering on `updated_at` for new content (rough sketch below)
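
For reference, this is roughly what I have in mind; a minimal sketch, and the organisation slug and the assumption that the search API's `filter_public_timestamp` lines up with `updated_at` in the Content API are mine, not confirmed:

```python
import requests

SEARCH_URL = "https://www.gov.uk/api/search.json"
CONTENT_URL = "https://www.gov.uk/api/content"

def recently_changed(since_date, organisation="department-for-business-and-trade"):
    """Page through search results for documents changed since `since_date`."""
    start = 0
    while True:
        resp = requests.get(SEARCH_URL, params={
            "filter_organisations": organisation,
            # Assumption: filtering on public_timestamp approximates "changed since",
            # i.e. that it tracks the updated_at field we care about.
            "filter_public_timestamp": f"from:{since_date}",
            "fields": "link,public_timestamp",
            "count": 100,
            "start": start,
        })
        resp.raise_for_status()
        results = resp.json()["results"]
        if not results:
            break
        yield from results
        start += len(results)

def fetch_content(base_path):
    """Fetch the full content item for a search result's link."""
    resp = requests.get(f"{CONTENT_URL}{base_path}")
    resp.raise_for_status()
    return resp.json()
```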
From the other side of the API, is that a good plan?
- Is `updated_at` reliably updated? Is it safe to base a pipeline on?
- Do I actually need to recurse through the children once this is incremental? It's in there because we found that filtering on our department in the search API missed lots of documents our department published in related pages.
- On testing I'll sometimes get a `JSONDecodeError` for very new items, which makes me think I'm picking up drafts. Is there a field I'm missing so I can ignore these until they're ready? (Current workaround sketched below.)
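
To show what I mean by the `JSONDecodeError`: the failing requests come back with a non-JSON body, so for now I skip them with something like this (a rough sketch, guarding on status and content type rather than a proper "is it published yet" check):

```python
import json
import requests

def fetch_content_safe(base_path):
    """Fetch a content item, returning None for items that aren't ready yet."""
    resp = requests.get(f"https://www.gov.uk/api/content{base_path}")
    # Very new items sometimes come back as an error page rather than JSON,
    # which is what triggers the JSONDecodeError downstream.
    if resp.status_code != 200:
        return None
    if "application/json" not in resp.headers.get("content-type", ""):
        return None
    try:
        return json.loads(resp.text)
    except json.JSONDecodeError:
        return None
```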