How are checkpointing and stream_state handled in abstract_source.py? #31071
Unanswered
shawnh310
asked this question in
Connector Questions
Replies: 0 comments
I have a scenario with the following steps:
1. In my custom source connector written in Python, a subclass of HttpStream with incremental sync mode sends out a state message for checkpointing at the end of processing each slice, in my overridden read_records method (so that even if there's no data in the given slice, we still advance the state).
2. In abstract_source.py, after each slice is processed, the CDK also tries to send a checkpoint message (airbyte/airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py, line 276 in 4ebac2e).
When there's no data in the current slice, the line above yields a stale stream_state, because the stream_state variable is not updated when my custom source connector yields its own checkpoint state message.
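To make the scenario concrete, here is a minimal, self-contained sketch of the behavior as I understand it. It does not import airbyte_cdk: ToyStream, toy_read_incremental, the tuple-based messages, and the sample dates are all hypothetical stand-ins, and the loop is only my simplified model of what abstract_source.py does, not the actual CDK code.

```python
from typing import Any, Dict, Iterator, List, Tuple

# Toy message: ("RECORD", payload) or ("STATE", payload).
# These are NOT the Airbyte protocol classes, just an illustration.
Message = Tuple[str, Dict[str, Any]]


class ToyStream:
    """Hypothetical stand-in for an incremental stream that checkpoints itself."""

    cursor_field = "updated_at"

    def __init__(self, data_by_slice: Dict[str, List[Dict[str, Any]]]) -> None:
        # Maps a slice end date to the records found in that slice.
        self.data_by_slice = data_by_slice

    def stream_slices(self) -> List[Dict[str, str]]:
        return [{"end": end} for end in self.data_by_slice]

    def read_records(self, stream_slice: Dict[str, str]) -> Iterator[Message]:
        for record in self.data_by_slice[stream_slice["end"]]:
            yield ("RECORD", record)
        # Connector-side checkpoint: advance to the slice end even if it was empty.
        yield ("STATE", {self.cursor_field: stream_slice["end"]})


def toy_read_incremental(stream: ToyStream) -> Iterator[Message]:
    """Simplified model of the per-slice loop (an assumption, not the real code)."""
    cursor = stream.cursor_field
    stream_state: Dict[str, Any] = {}
    for stream_slice in stream.stream_slices():
        for kind, payload in stream.read_records(stream_slice):
            if kind == "RECORD":
                # The local stream_state is only updated from records.
                latest = max(payload[cursor], stream_state.get(cursor, ""))
                stream_state = {cursor: latest}
            # A STATE message from the connector passes through untouched,
            # so the local stream_state never learns about it.
            yield (kind, payload)
        # Framework-side checkpoint at the end of the slice.
        yield ("STATE", dict(stream_state))


stream = ToyStream({
    "2023-01-02": [{"updated_at": "2023-01-01"}],  # slice with one record
    "2023-01-03": [],                              # empty slice
})
states = [payload for kind, payload in toy_read_incremental(stream) if kind == "STATE"]
for state in states:
    print(state)
```

In this toy model the last state message emitted for the empty slice comes from the framework loop and still carries the cursor of the last record seen, so it overrides the newer checkpoint that the connector emitted just before it.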
I can work around this by subclassing AbstractSource and changing its behavior (not sending the checkpoint message at the end of processing each slice). But it seems like this use case should be handled properly by AbstractSource itself. I might be misunderstanding something. Thanks in advance!