Python CSV to SQL Converter

Automating Data Cleaning and SQL Script Generation Using Python and Pandas

Context: I embarked on a MySQL project as part of IBM's SQL for Data Science course on edX. While working on the project, I encountered a discrepancy between the provided picture and the exercise requirements. The picture only contained data for three employees, but the exercise required data for ten employees.

Problem: To resolve the discrepancy, I needed to modify my SQL script to load all the necessary employee data. However, the CSV file provided in a previous lab had formatting issues when loaded into Google Sheets. Manually cleaning the data was not an ideal solution.

Solution: I turned to Python for data cleaning and transformation. Using the pandas library, I read the CSV file and cleaned it by dropping unnecessary columns and fixing formatting inconsistencies. With the data cleaned, I could generate the SQL scripts needed to load it accurately into the database tables.
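The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the sample rows, the column names (EMP_ID, F_NAME, L_NAME, SALARY, EXTRA), and the helper load_and_clean() are all hypothetical, and it assumes the raw file has no header row and a trailing empty column.

```python
import io

import pandas as pd

# Hypothetical raw data: no header row, a trailing empty column,
# and stray whitespace around some names.
RAW_CSV = """E1001, John ,Thomas,100000,
E1002, Alice,James ,80000,
E1003,Steve,Wells,50000,
"""

# Hypothetical column names; the real file's schema may differ.
COLUMNS = ["EMP_ID", "F_NAME", "L_NAME", "SALARY", "EXTRA"]
TEXT_COLUMNS = ["EMP_ID", "F_NAME", "L_NAME"]


def load_and_clean(source):
    """Read a header-less CSV, drop the unused column, and trim whitespace."""
    # header=None tells pandas the first row is data; names= supplies the labels.
    df = pd.read_csv(source, header=None, names=COLUMNS)
    # Remove the column that only exists because of the trailing comma.
    df = df.drop(columns=["EXTRA"])
    # Strip leading and trailing whitespace from the text columns.
    for col in TEXT_COLUMNS:
        df[col] = df[col].str.strip()
    return df


employees = load_and_clean(io.StringIO(RAW_CSV))
print(employees)
```

Passing a file path instead of the io.StringIO object would read from disk in the same way.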

Struggles and Learning Points:

During my exploration of database design, SQL queries, and Python programming, I encountered challenges and gained valuable insights. Here's a summary of what I learned:

Python and pandas: I learned how to read CSV files with pd.read_csv() and how to configure header handling. I also learned to iterate over the rows of a pandas DataFrame with the iterrows() method.
SQL: I learned why foreign key constraints matter for establishing relationships between tables, and I came to understand indexes as a means of optimizing query performance. I also learned how to add columns to existing tables with the ALTER TABLE statement and how to insert data with the INSERT INTO statement.
str.join(): I learned to use Python's string join() method to concatenate values efficiently. For example, I used ', '.join(values) to build the comma-separated list of values in an SQL statement.
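The combination of iterrows() and ', '.join(values) mentioned above might look something like the sketch below. The helpers sql_literal() and build_insert_statements(), the table name EMPLOYEES, and the sample rows are hypothetical, chosen only for illustration. Building SQL by string concatenation is reasonable for generating a one-off load script from trusted data, but it is not a substitute for parameterized queries.

```python
import numbers

import pandas as pd


def sql_literal(value):
    """Render one cell as a SQL literal (illustrative only)."""
    if pd.isna(value):
        return "NULL"
    if isinstance(value, numbers.Number):
        return str(value)
    # Quote text and double any embedded single quotes.
    return "'" + str(value).replace("'", "''") + "'"


def build_insert_statements(df, table_name):
    """Return one INSERT INTO statement per DataFrame row."""
    columns = ", ".join(df.columns)
    statements = []
    # iterrows() yields (index, row) pairs; each row is a pandas Series.
    for _, row in df.iterrows():
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(
            f"INSERT INTO {table_name} ({columns}) VALUES ({values});"
        )
    return statements


# Hypothetical sample data standing in for the cleaned CSV.
employees = pd.DataFrame(
    {
        "EMP_ID": ["E1001", "E1002"],
        "L_NAME": ["Thomas", "O'Brien"],
        "SALARY": [100000, 80000],
    }
)

for statement in build_insert_statements(employees, "EMPLOYEES"):
    print(statement)
```

Writing the returned list to a .sql file, one statement per line, would produce a script that can be run against the database.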

Overall, working with Python and pandas improved my data manipulation skills, and working with SQL taught me about database design principles, query optimization, and managing table structures. I plan to keep building on these concepts by consulting the relevant documentation as I apply them to my own use cases.
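The SQL points summarized above (foreign keys, indexes, and ALTER TABLE) can be tried out quickly from Python. The sketch below uses the standard-library sqlite3 module as a stand-in, since the project itself targets MySQL; the table and column names are hypothetical, and MySQL syntax and behavior differ in places.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE DEPARTMENTS (
        DEPT_ID  INTEGER PRIMARY KEY,
        DEP_NAME TEXT NOT NULL
    );
    CREATE TABLE EMPLOYEES (
        EMP_ID TEXT PRIMARY KEY,
        L_NAME TEXT NOT NULL,
        DEP_ID INTEGER,
        FOREIGN KEY (DEP_ID) REFERENCES DEPARTMENTS (DEPT_ID)
    );
    -- An index on the join column can speed up lookups by department.
    CREATE INDEX idx_employees_dep_id ON EMPLOYEES (DEP_ID);
    -- Add a column to a table that already exists.
    ALTER TABLE EMPLOYEES ADD COLUMN SALARY INTEGER;
    """
)

conn.execute("INSERT INTO DEPARTMENTS VALUES (2, 'Engineering')")
conn.execute("INSERT INTO EMPLOYEES VALUES ('E1001', 'Thomas', 2, 100000)")

# Department 99 does not exist, so the foreign key should reject this row.
try:
    conn.execute("INSERT INTO EMPLOYEES VALUES ('E1002', 'James', 99, 80000)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print("orphan row rejected:", fk_rejected)
```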

