Data Wrangling: From Messy Data to Meaningful Insight - delivered to my Space Science students

steviecurran/wrangling-lecture

Data Wrangling: From Messy Data to Meaningful Insight

Topics covered include:

  • Reading and structuring data from multiple formats (CSV, Excel, fixed-width, text)
  • Identifying and fixing common data quality issues (missing values, incorrect types, inconsistent units)
  • Filtering, grouping, merging, and reshaping datasets using pandas
  • Practical strategies for dealing with categorical vs numerical data
  • Cleaning and analysing real-world datasets (including time series)
  • Visualising results and checking that conclusions actually make sense
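As a rough illustration of the first few topics, here is a minimal pandas sketch. The table contents, the column names (`station`, `temp`, `unit`, `region`) and the placeholder string `unknown` are all invented for this example and are not taken from the lecture data.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a messy file (invented values).
raw = io.StringIO(
    "station,temp,unit\n"
    "A,20.5,C\n"
    "B,68.0,F\n"
    "C,unknown,C\n"
)
df = pd.read_csv(raw)

# Incorrect type: the "unknown" entry forces the column to be read as text.
# Coerce to numbers; anything unparseable becomes NaN (a missing value).
df["temp"] = pd.to_numeric(df["temp"], errors="coerce")

# Inconsistent units: convert Fahrenheit rows to Celsius.
is_f = df["unit"] == "F"
df.loc[is_f, "temp"] = (df.loc[is_f, "temp"] - 32) * 5 / 9
df.loc[is_f, "unit"] = "C"

# Merging: attach a second (also invented) table keyed on the station name.
sites = pd.DataFrame(
    {"station": ["A", "B", "C"], "region": ["north", "south", "north"]}
)
merged = df.merge(sites, on="station", how="left")

# Grouping: mean temperature per region (NaN values are skipped by default).
mean_by_region = merged.groupby("region")["temp"].mean()
print(merged)
print(mean_by_region)
```

Coercing with `errors="coerce"` is one choice among several; it keeps the row but marks the value as missing, so the gap stays visible in later steps rather than silently disappearing.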

Given in conjunction with two notebooks:

  • Ex1.ipynb, which uses data we worked with in 2nd year for the concepts of mean, variance, the central limit theorem and A/B testing

    Here it is used to introduce dataframes and how these can be combined

  • Ex2.ipynb, which uses data on the number of fires in the Amazon rainforest, to demonstrate some data cleaning and visualisation:

    • Renaming parameters (column names), including the use of dictionaries
    • Identifying problems with the data
    • Stripping strings
    • Missing values
    • Grouping data
    • Visualisation
    • Time Series
    • Significance and correlation

    The last of these shows that the number of fires has been steadily increasing over the range of the data (1998 - 2018)
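The cleaning steps listed for Ex2.ipynb might look something like the sketch below. This is not the notebook's code: the rows, the original column names (`ano`, `estado`, `numero`) and the choice to drop missing rows are assumptions made purely for illustration.

```python
import pandas as pd

# Invented rows; names and values are assumptions, not the Ex2 dataset.
df = pd.DataFrame(
    {
        "ano": [1998, 1998, 1999, 1999, 2000, 2000],
        "estado": [" Acre", "Amazonas ", "Acre", " Amazonas", "Acre ", "Amazonas"],
        "numero": [10.0, 20.0, 15.0, None, 25.0, 30.0],
    }
)

# Renaming columns with a dictionary of old -> new names.
df = df.rename(columns={"ano": "year", "estado": "state", "numero": "fires"})

# Stripping stray whitespace so " Acre" and "Acre " count as the same state.
df["state"] = df["state"].str.strip()

# Missing values: count them, then drop the affected rows.
# Dropping is only one option; filling or interpolating may suit other data.
n_missing = int(df["fires"].isna().sum())
df = df.dropna(subset=["fires"])

# Grouping: total fires per year.
per_year = df.groupby("year")["fires"].sum().reset_index()

# Correlation: Pearson coefficient between year and yearly total.
r = per_year["year"].corr(per_year["fires"])
print(per_year)
print(f"missing rows dropped: {n_missing}, Pearson r = {r:.2f}")
```

With only a handful of invented rows the coefficient means little; on the real data a significance test would be needed before reading a trend into it.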

Question_bank.pdf contains two practical exercises, with the solutions in quakes.ipynb and spectrum.py
