Data-Engineer-Task

Overview

This document explains the data engineering task, which was implemented in Python with Pandas in a Jupyter Notebook. Because account-specific issues prevented AWS Glue and AWS Lambda from working, the data transformation was done with Pandas instead.

Initial Work in AWS

Data Loading to AWS S3

I uploaded the dataset to an S3 bucket with the appropriate IAM permissions.

I then set up a relational database on AWS by configuring an Amazon RDS instance running MySQL.
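The README does not show how data was written to the RDS MySQL instance. As an assumption, the sketch below uses pandas `DataFrame.to_sql`. Against MySQL this call would take a SQLAlchemy engine, but here an in-memory SQLite database stands in so the example runs without AWS. The table name `orders` and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical transformed data ready to be stored in a relational table.
df = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_name": ["Alice", "Bob", "Unknown"],
        "amount": [120.5, 99.0, 99.0],
    }
)

# Stand-in for the RDS MySQL instance: an in-memory SQLite database.
# For MySQL, to_sql would instead take a SQLAlchemy engine, e.g.
# create_engine("mysql+pymysql://user:password@<rds-endpoint>:3306/<db>")
# (endpoint, credentials and database name are placeholders).
conn = sqlite3.connect(":memory:")

# Write the DataFrame to a table, replacing it if it already exists.
df.to_sql("orders", conn, if_exists="replace", index=False)

# Read the table back to confirm the load.
loaded = pd.read_sql_query("SELECT * FROM orders ORDER BY order_id", conn)
print(loaded)
conn.close()
```

Using `if_exists="replace"` keeps reruns of the notebook idempotent; `"append"` would be the choice for incremental loads.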

With storage and the database in place, the next step was to transform the data using AWS Glue or AWS Lambda. However, issues with my account prevented both Glue and Lambda from working, so I could not perform any transformations in AWS.

I therefore switched to Pandas, which can also be used for data transformation, although it is best suited to relatively small datasets that fit in memory.

Challenges Faced in AWS

AWS Glue and Lambda were not functional due to account issues.

Workaround: Performed data transformation locally using Pandas.

Task Breakdown

1. Data Transformation using Pandas in Jupyter Notebook

The raw data was loaded into a Pandas DataFrame.
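A minimal sketch of the loading step. pandas can read `s3://` paths directly when the optional `s3fs` package is installed, but this example reads a small inline CSV instead so it runs without AWS. The sample rows and column names are hypothetical, not the actual dataset.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the raw dataset; the real file,
# its location and its columns are not given in this README.
RAW_CSV = """Order ID, Customer Name ,Order Date,Amount
1,Alice,2024-01-05,120.50
2,Bob,2024-01-06,
2,Bob,2024-01-06,
3,,2024-01-07,80
"""

# Load the raw CSV text into a DataFrame.
raw_df = pd.read_csv(io.StringIO(RAW_CSV))

# Inspect size and inferred types before cleaning.
print(raw_df.shape)
print(raw_df.dtypes)
```

The sample deliberately includes messy headers, blanks and a repeated row so the cleaning steps have something to fix.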

Applied data cleaning techniques:

Handling missing values.

Standardizing column names.

Removing duplicate records.
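The three cleaning steps above might look roughly like this in Pandas. The column names, the fill policy (a placeholder for text, the median for numbers) and the sample data are assumptions for illustration. Column names are standardized first because the later steps refer to them.

```python
import pandas as pd

# Hypothetical raw data: messy column names, missing values, one duplicate row.
raw_df = pd.DataFrame(
    {
        " Order ID": [1, 2, 2, 3],
        "Customer Name ": ["Alice", "Bob", "Bob", None],
        "Amount": [120.5, 99.0, 99.0, None],
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning steps to a copy of the input."""
    out = df.copy()
    # Standardize column names: trim, lower-case, whitespace -> underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )
    # Handle missing values: placeholder for text, median for numbers
    # (an illustrative policy, not necessarily the one used in the notebook).
    out["customer_name"] = out["customer_name"].fillna("Unknown")
    out["amount"] = out["amount"].fillna(out["amount"].median())
    # Remove exact duplicate records and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)
    return out


clean_df = clean(raw_df)
print(clean_df)
```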

Applied transformation logic:

Data type conversion.

Feature engineering.

Normalization and formatting.
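A sketch of the transformation steps, again on hypothetical data: type conversion with `astype`, `pd.to_numeric` and `pd.to_datetime`, two engineered features (`total` and `order_month`), min-max normalization, and currency formatting. The specific features and scaling method are illustrative choices, not necessarily those in the notebook.

```python
import pandas as pd

# Hypothetical cleaned data; values arrive as strings, as they often do from CSV.
df = pd.DataFrame(
    {
        "order_id": ["1", "2", "3"],
        "order_date": ["2024-01-05", "2024-01-06", "2024-02-07"],
        "quantity": ["2", "1", "4"],
        "unit_price": ["10.00", "25.50", "5.25"],
    }
)

# Data type conversion.
df["order_id"] = df["order_id"].astype("int64")
df["order_date"] = pd.to_datetime(df["order_date"], format="%Y-%m-%d")
df["quantity"] = pd.to_numeric(df["quantity"])
df["unit_price"] = pd.to_numeric(df["unit_price"])

# Feature engineering: derive new columns from existing ones.
df["total"] = df["quantity"] * df["unit_price"]
df["order_month"] = df["order_date"].dt.to_period("M").astype(str)

# Normalization: min-max scale 'total' into [0, 1] (assumes max > min).
t_min, t_max = df["total"].min(), df["total"].max()
df["total_scaled"] = (df["total"] - t_min) / (t_max - t_min)

# Formatting: round and render as a currency string for display.
df["total_display"] = df["total"].round(2).map("${:,.2f}".format)

print(df)
```

Min-max scaling is used here because it keeps values in a fixed range; z-score standardization would be the usual alternative when outliers matter.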

Visualized the transformed data with Pandas to make it easier to understand.

Repository files

Data Engineer Task.ipynb

README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data-Engineer-Task

About

Uh oh!

Releases

Packages

Languages

Repository: girivelan1507/Data-Engineer-Task

Folders and files

Latest commit

History

Repository files navigation

Data-Engineer-Task

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages