Python is a powerful programming language that can be used for a variety of tasks, including analyzing data from a CSV file. We'll go over how to use Python to import data and run an analysis on it. We'll be using the <a href="https://pandas.pydata.org/" target="_blank">pandas</a> library, a popular data analysis tool for Python.
<a href="https://www.amazon.com/gp/bestsellers/books" target="_blank">Amazon Best Sellers</a> are updated every hour. The actual list is made of 100 books, but the data we're working with features just the top 50 books. 📖
## The Dataset
In this tutorial, we will work with a CSV (comma-separated values) file that features some fun data about the top 50 best selling books on Amazon from 2009 to 2019 (provided by <a href="https://www.kaggle.com/datasets/sootersaalu/amazon-top-50-bestselling-books-2009-2019?resource=download" target="_blank">Kaggle</a>).
**Note**: If you don't have a Kaggle account, you can also download it <a href="https://github.com/codedex-io/projects/blob/main/projects/analyze-spreadsheet-data-with-pandas-chatgpt/amazon-best-sellers-analysis/bestsellers.csv" target="_blank">here</a> in our GitHub:
<img src="https://raw.githubusercontent.com/codedex-io/projects/main/projects/analyze-spreadsheet-data-with-pandas-chatgpt/file_download_btn_github.png" alt="File Download Button on GitHub" />
The **.csv** file contains 550 books. Here are the seven columns:

- **Name**: the title of the book
- **Author**: the author of the book
- **User Rating**: the book's Amazon user rating (out of 5)
- **Reviews**: the number of written user reviews
- **Price**: the price of the book (in US dollars)
- **Year**: the year the book appeared on the best sellers list
- **Genre**: whether the book is fiction or non-fiction
## Step 2: Import pandas and Load the Spreadsheet
Next, we need to import the pandas library and load the data into our Python program.
Download the **bestsellers.csv** file and add it to **amazon-best-sellers-analysis**, the same folder that contains your **main.py** file.
To read CSV files, we'll use the `.read_csv()` function provided by pandas. Then we will save this data to a new `df` variable:
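A minimal sketch of that step, assuming **bestsellers.csv** sits next to **main.py** as described above:

```py
import pandas as pd

# Read the spreadsheet into a DataFrame
df = pd.read_csv('bestsellers.csv')
```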
Using methods from our `df` DataFrame object, we can get a glimpse of which authors have the most books on the Amazon Best Sellers list.
This can be done by selecting the `'Author'` column data and using the `value_counts()` method. We can assign this to an `author_counts` variable:
```py
author_counts = df['Author'].value_counts()
```

Congratulations! We've made it to the end of the tutorial! 🎊
We were able to harness the power of Python libraries like pandas to analyze data from a CSV file. Specifically, we did the following:
- Imported book data about the top 50 books on Amazon from 2009 to 2019.
- Explored and cleaned the data with DataFrame methods.
- Exported the modified data to a new CSV file.
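That last step, as a rough sketch (the output file name here is just an assumption):

```py
# Export the cleaned DataFrame to a new CSV file, dropping the index column
df.to_csv('bestsellers_clean.csv', index=False)
```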
View the full source for this project <a href="https://github.com/codedex-io/projects/blob/main/projects/analyze-spreadsheet-data-with-pandas-chatgpt/amazon-best-sellers-analysis/main.py" target="_blank">here</a>.
Also, check out further resources to learn more about data analysis with Python.
---

No matter where you are on your journey to mastering data science, it's always helpful to practice the basics of finding, cleaning, and analyzing real-world datasets. Back in 2020, COVID-19 sent many of us into quarantine, and while its long-term impact is still relatively unknown, we can reference a handful of public datasets to begin to scratch the surface.
In this project tutorial, we'll be analyzing a dataset gathered from the 2022 [U.S. Census](https://data.census.gov/) covering geographic relocation roughly two years after quarantine.
<RoundedImage
link="https://i.imgur.com/QSycenX.gif"
description="U.S. Census Data Analysis"
/>
We will begin to test our assumptions and answer some basic questions about various demographic groups using SciPy, NumPy, Pandas, and some basic working knowledge of statistics.
The questions include:
- Is there a difference in mobility patterns between those that moved within their home state versus across state lines, in New York and California in particular?
- And do trends vary amongst citizenship status?
- Is there a difference in those same patterns amongst educational status between the Northeast (New Jersey, Pennsylvania, Rhode Island, Vermont, etc.) and the South (Georgia, Maryland, Virginia, etc.)?
- What about marital status across conservative divisions like the South Atlantic (Washington D.C., Georgia, Florida, North Carolina, etc.) and the Mountain States (Colorado, Wyoming, Nevada, Arizona, etc.)? Do we notice a difference in geographic mobility there as well?
As you can see below, the original data provided by [census.gov](https://data.census.gov/) doesn't arrive in an analysis-ready format:
<RoundedImage
link="https://i.imgur.com/uvbRfkQ.png"
description="U.S. Census Data Analysis"
/>
<RoundedImage
link="https://i.imgur.com/nxdFv8j.png"
description="U.S. Census Data Analysis"
/>
When this happens, it's helpful to have some basic data preparation skills. While this isn't typically a requirement for using the SciPy package or conducting basic statistical analysis, you can look at each step we took to clean and structure the data by referencing the source code [here](https://colab.research.google.com/drive/1ujk1u0TWqlNolFwv9-rUNMjaghZuLLZK).
## About the Clean Datasets
The source code cranks out multiple categories of the same data, including information on the total population in 2022:
- those that moved within the same county and/or state
- those that moved between states
For the categories listed, each dataset contains the following columns:
<RoundedImage
link="https://i.imgur.com/dzkXTSC.gif"
description="U.S. Census Data Analysis"
/>
### Geographical Data
- **Geography ID**: a unique identifier used to reference specific geographic areas
- **Census Tract**: a small, relatively permanent subdivision of a county
- **State**: the state in which the Census Tract is located
- **County**: the county within the state in which the Census Tract resides
- **Region**: the broader geographic area in which the state or county is located, typically referring to one of four major regions: Northeast, Midwest, South, or West
When conducting an exploratory analysis, we first want to make sure that our data is suitable for the statistical tests we plan to run.
Generally speaking, most data science models abide by what we call parametric assumptions, which refer to normal distribution of a fixed set of parameters. In our particular case, those parameters include, but are not limited to, the columns we listed above. The three parametric assumptions are independence, normality, and homogeneity of variances.
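As a quick illustration (not part of the original analysis), SciPy can test two of these assumptions directly on simulated data:

```python
from scipy import stats
import numpy as np

# Simulated samples purely for illustration
rng = np.random.default_rng(42)
sample_a = rng.normal(loc=10, scale=2, size=100)
sample_b = rng.normal(loc=10, scale=2, size=100)

# Shapiro-Wilk tests the normality assumption for a single sample
print(stats.shapiro(sample_a))

# Levene's test checks homogeneity of variances across groups
print(stats.levene(sample_a, sample_b))
```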
Additionally, traditional **A/B testing** typically utilizes one of two methods: either a **chi-squared test** (which looks for dependence between two categorical variables) or a **t-test** (which looks for a statistically significant difference between the averages of two groups) to validate what we refer to as the null hypothesis (the assumption that there is no relationship or difference between two patterns of behavior).
For this tutorial, we'll be running t-tests.
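Here's the general shape of a t-test in SciPy, using made-up numbers rather than the census data:

```python
from scipy import stats

# Two hypothetical groups of observations
group_a = [12.1, 11.8, 12.5, 12.0, 11.9]
group_b = [12.8, 13.1, 12.9, 13.4, 12.7]

# Independent two-sample t-test
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print("t-statistic:", t_stat)
print("p-value:", p_value)  # a small p-value suggests a significant difference in means
```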
## Getting Started
To get started, you'll need the following [datasets](https://drive.google.com/drive/folders/1xO33dvJV_RySl77y2W-7lxIvBW7PUoEg?usp=sharing) and a copy of [this Google Colab notebook](https://colab.research.google.com/drive/1GWiNXPVuRTORqEBNFV7zpTGZD_yeprNt?usp=sharing).
Feel free to manually upload the CSVs to the notebook if you don't already see them:
<RoundedImage
link="https://i.imgur.com/Iz1PLIY.png"
description="U.S. Census Data Analysis"
/>
First, we'll import the necessary packages.
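The Colab notebook has the authoritative list, but at a minimum that means something like:

```python
import pandas as pd
import numpy as np
from scipy import stats
```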
```python
# `v` is the path to one of the census CSVs (defined earlier in the notebook)
variant = pd.read_csv(v)
# variant.head()
```
## Let's Explore
Let's begin by manually creating an empty dataframe (table) based on each level of detail (County, State, Division, and Region) listed by the U.S. Census.
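As a sketch, that could look like the following (the table names are assumptions that mirror the four levels):

```python
# One empty table per level of detail
county = pd.DataFrame()
state = pd.DataFrame()
division = pd.DataFrame()
region = pd.DataFrame()
```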
```python
state["Relocated Between States"] = variant.groupby("State")["Total Population"].sum()
state.head()
```
Comparing California residents to those from New York only, **is there a significant difference in mobility between those that relocated within the same** area (in this case, state) **versus those that moved across state lines?**
We'll use the `.loc[]` method to search for the two states and extract the summed values that we calculated in the exercise above.
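A sketch of that lookup, assuming the grouped table is indexed by state name as in the snippet above:

```python
# Pull each state's total for a side-by-side comparison
ca = state.loc["California", "Relocated Between States"]
ny = state.loc["New York", "Relocated Between States"]
print(ca, ny)
```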
The p-value is much higher in this instance, suggesting that we can be only 62% certain that there was a difference in mobility amongst immigrants between the two states.
Now what about when comparing U.S. citizens only?
```python
cny3 = pd.DataFrame()
cny3["Total U.S. Citizens (Native)"] = d.groupby("State")["Total US Citizens (Native)"].sum()
```

So what have we learned? We've learned that:
- No, there does not appear to be a difference in those same patterns amongst educational status between the Northeast (New Jersey, Pennsylvania, Rhode Island, Vermont, etc.) and the South (Georgia, Maryland, Virginia, D.C., etc.).
- No, there also does not appear to be a difference across marital status for conservative divisions like the South Atlantic (Washington D.C., Georgia, Florida, North Carolina, etc.) and the Mountain States (Colorado, Wyoming, Nevada, Arizona, etc.) either.
Why does this matter? It matters because it demonstrates that there's actually a sound and scientific method for answering these questions when they come up. Feel free to try your hand at doing the same the next time you run into an interesting dataset! Or, consider ways you can examine how mobility influences local economies, or even how it impacts the environment.