@Satvik-Singh192
Contributor

Description

This PR makes the database load function idempotent, resolving Issue #12.

It replaces the previous df.to_sql(..., if_exists="append") call, which could create duplicate entries whenever the pipeline was run more than once.

The new implementation:

  1. Adds a CREATE TABLE IF NOT EXISTS statement that defines the table schema and, critically, declares employee_id as the PRIMARY KEY.
  2. Uses a bulk INSERT OR IGNORE query with cursor.executemany(). This statement tells the database to skip any row whose employee_id already exists, so subsequent runs do not duplicate data.
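
For reviewers who want to see the pattern in isolation, here is a minimal sketch of the two steps above using Python's built-in sqlite3 module. It is not the code from this PR: the function name load_employees, the table name employees, and the name and department columns are hypothetical, chosen only for illustration. Only employee_id as the primary key comes from the description.

```python
import sqlite3
from typing import Iterable, Tuple

# (employee_id, name, department); every column except employee_id is hypothetical.
Row = Tuple[int, str, str]


def load_employees(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert rows, skipping any whose employee_id already exists.

    Returns the number of rows actually inserted by this call.
    """
    cur = conn.cursor()

    # Step 1: create the table only if it is missing; the PRIMARY KEY on
    # employee_id is what lets the database detect repeated rows.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS employees (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT,
            department  TEXT
        )
        """
    )

    # Step 2: bulk insert; rows that would violate the primary key are skipped
    # instead of raising an error.
    before = conn.total_changes
    cur.executemany(
        "INSERT OR IGNORE INTO employees (employee_id, name, department) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()

    # total_changes counts only rows that were really written, so the
    # difference is the number of new rows.
    return conn.total_changes - before
```

Calling this function twice with the same rows inserts them once and reports zero new rows on the second call. Note that INSERT OR IGNORE leaves existing rows untouched, so a changed record with an already-loaded employee_id is skipped and not updated.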

Semver Changes

  • Patch (bug fix, no new features)
  • Minor (new features, no breaking changes)
  • Major (breaking changes)

Issues

Closes #12

Checklist

@Dheerajyadav1 Dheerajyadav1 merged commit 6e5d072 into OPCODE-Open-Spring-Fest:main Oct 23, 2025
5 of 6 checks passed

Development

Successfully merging this pull request may close these issues.

Implement Idempotency in Database Load
