feat: implemented idempotency #25
Merged
Description
This PR implements idempotency in the database load function, resolving Issue #12.
It replaces the previous `df.to_sql(..., if_exists="append")` call, which created duplicate entries whenever the pipeline was run more than once.

The new implementation:
- Uses a `CREATE TABLE IF NOT EXISTS` command to define the table schema and, critically, sets `employee_id` as the `PRIMARY KEY`.
- Uses an `INSERT OR IGNORE` query with `cursor.executemany()`. This command instructs the database to skip any row whose `employee_id` already exists, so data is not duplicated on subsequent runs.

Semver Changes
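For reviewers, the load approach outlined in the description (a table keyed on `employee_id` plus `INSERT OR IGNORE` via `executemany()`) can be sketched roughly as below. This is a minimal illustration, not the code from this PR: it assumes SQLite through Python's standard `sqlite3` module, and the function name `load_employees`, the table name `employees`, and the `name` and `department` columns are hypothetical.

```python
import sqlite3


def load_employees(conn, rows):
    """Insert rows, skipping any whose employee_id already exists.

    Hypothetical helper: table and column names other than
    employee_id are assumptions made for illustration.
    """
    cur = conn.cursor()
    # Create the table only on the first run; the primary key is what
    # lets the database detect a repeated employee_id.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS employees (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT,
            department  TEXT
        )
        """
    )
    # OR IGNORE skips a row that would violate the primary key
    # instead of raising an IntegrityError.
    cur.executemany(
        "INSERT OR IGNORE INTO employees (employee_id, name, department) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [(1, "Ada", "Engineering"), (2, "Lin", "Operations")]
    load_employees(conn, batch)
    load_employees(conn, batch)  # second run inserts nothing new
    print(conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0])
```

One consequence of this design worth keeping in mind: `INSERT OR IGNORE` also skips a row whose `employee_id` exists but whose other columns have changed, so re-running the load does not update existing records.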
Issues
Closes #12
Checklist