Commit ae242dd

Merge pull request #38 from pushpam345/featduplicatePushpam
feat: remove duplicates
2 parents bbadb72 + 68f1d95 commit ae242dd

1 file changed: +6 -3 lines changed
app/etl/transform.py

Lines changed: 6 additions & 3 deletions
@@ -22,11 +22,14 @@ def transform(df: pd.DataFrame) -> pd.DataFrame:
 
     # Handle duplicates
     initial_rows = len(df_transformed)
-    # TODO (Find & Fix): Duplicates are not removed
+    # Removing duplicates
+    df_transformed=df_transformed.drop_duplicates()
+
     duplicates_removed = initial_rows - len(df_transformed)
     if duplicates_removed > 0:
-        # TODO (Find & Fix): Should log how many duplicates were removed
-        pass
+        # Number of duplicates removed
+        print(f"✅ Removed {duplicates_removed} duplicate rows.")
+
 
     # Handle null values in numeric columns
     numeric_columns = df_transformed.select_dtypes(include=['number']).columns
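
The removed TODO asked for the duplicate count to be logged, and the commit reports it with print. As a minimal sketch of the same step in isolation, the snippet below uses the standard logging module instead. The helper name remove_duplicates, the logger setup, and the sample data are assumptions made for illustration and are not part of app/etl/transform.py.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


def remove_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop fully duplicated rows and log the count."""
    initial_rows = len(df)
    # drop_duplicates returns a new DataFrame; the input is left unchanged
    deduped = df.drop_duplicates()
    duplicates_removed = initial_rows - len(deduped)
    if duplicates_removed > 0:
        # Assumption: logging replaces the print call used in the commit
        logger.info("Removed %d duplicate rows.", duplicates_removed)
    return deduped


# Illustrative data only: the first two rows are identical
sample = pd.DataFrame({"id": [1, 1, 2], "value": [10.0, 10.0, 20.0]})
result = remove_duplicates(sample)
```

With no arguments, drop_duplicates compares all columns and keeps the first occurrence of each row, which matches the call in the diff. Whether the project would prefer logging over print is a judgment call for the maintainers.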
