How not to ingest a file more than once? #355
Unanswered
alberto-lanfranco-storebrand
asked this question in
Q&A
Replies: 1 comment 3 replies
Yes, need to document this better.

Based on file timestamp:

```yaml
source: azure
target: my_db

defaults:
  update_key: _sling_loaded_at # <-- tells sling to use the file timestamp for comparison
  object: my_schema.{stream_file_name}

streams:
  "path/to/my/folder/*.csv":

env:
  SLING_LOADED_AT_COLUMN: unix
```

Based on a column in the file (not what you're asking: this will scan all files again, but stream only the rows after the latest `max(update_key)`):

```yaml
source: local
target: postgres

defaults:
  mode: incremental
  update_key: create_dt
  primary_key: id
  object: public.incremental_csv
  target_options:
    adjust_column_type: true

streams:
  cmd/sling/tests/files/test1.csv:
  cmd/sling/tests/files/test1.upsert.csv:
```
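For context, a minimal sketch of how such a replication config might be invoked with the Sling CLI, assuming it is installed and the YAML above has been saved to a file (the file name `replication.yaml` here is illustrative, not from the thread):

```shell
# Save one of the YAML configs above as replication.yaml (illustrative name),
# then run the replication with the Sling CLI:
sling run -r replication.yaml

# With update_key: _sling_loaded_at, subsequent runs compare each file's
# timestamp against the last loaded value, so only files newer than the
# previous execution are ingested.
```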
Hello,

I'm currently implementing Sling in my architecture. I'm ingesting from an Azure blob storage with a lot of big CSV files, but on each iteration I only want to ingest the files that are new since the last execution.

What is the best practice for implementing this in Sling? I have a feeling it might involve the `_SLING_STREAM_URL` column and `update_key` in the replication file, but I don't know exactly how to make it work.