Commit f63934d

Update analyze-us-census-data-with-scipy.mdx

1 parent 3a61d44 commit f63934d
File tree: 1 file changed, +6 −6 lines changed

projects/analyze-us-census-data-with-scipy/analyze-us-census-data-with-scipy.mdx

Lines changed: 6 additions & 6 deletions
@@ -128,7 +128,7 @@ When conducting an exploratory analysis, we first want to make sure that our dat

 Generally speaking, most data science models abide by what we call parametric assumptions, which refer to normal distribution of a fixed set of parameters. In our particular case, those parameters include, but are not limited to, the columns we listed above. The three parametric assumptions are independence, normality, and homogeneity of variances.

-Additionally, traditional A/B testing typically utilizes one of two methods: either a chi-squared (which looks for dependence between two categorical variables) or a t-test (which looks for a statistically significant difference between the averages of two groups) to validate what we refer to as the null hypothesis (which is the assumption that there is no relationship or comparison between two patterns of behavior).
+Additionally, traditional **A/B testing** typically utilizes one of two methods: either a **chi-squared** (which looks for dependence between two categorical variables) or a **t-test** (which looks for a statistically significant difference between the averages of two groups) to validate what we refer to as the null hypothesis (which is the assumption that there is no relationship or comparison between two patterns of behavior).

 For this tutorial, we'll be running t-tests.

@@ -163,8 +163,8 @@ v = ("/content/moved_between_states.csv")
 control = pd.read_csv(c)
 variant = pd.read_csv(v)

-#control.head()
-#variant.head()
+# control.head()
+# variant.head()
 ```

@@ -266,7 +266,7 @@ region["High School Graduate (or its Equivalency)"] = control.groupby("Region")[
 region["Bachelor's Degree"] = control.groupby("Region")["Bachelor's Degree"].sum()

 nem = region.loc[region.index.isin(["Northeast", "South"])]
-#nem
+# nem
 ```
 ```python
 t_stat, p_value = stats.ttest_ind(nem["High School Graduate (or its Equivalency)"], nem["Bachelor's Degree"])
@@ -284,7 +284,7 @@ division["Never Married"] = control.groupby("Division")["Never Married"].sum()
 division["Married"] = control.groupby("Division")["Married"].sum()

 sam = division.loc[division.index.isin(["South Atlantic", "Mountain"])]
-#sam
+# sam
 ```
 ```python
 t_stat, p_value = stats.ttest_ind(sam["Never Married"], sam["Married"])
@@ -299,7 +299,7 @@ Now answer the same exact question at the county level using two counties that y
 county["Never Married"] = control.groupby("County")["Never Married"].sum()
 county["Married"] = control.groupby("County")["Married"].sum()

-#home = county.loc[county.index.isin(["Your Home county", "Home County 2"])]
+# home = county.loc[county.index.isin(["Your Home county", "Home County 2"])]
 ```

 ## Conclusion
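
The `stats.ttest_ind` calls touched by this diff can be exercised on their own; here is a minimal, self-contained sketch of the same independent two-sample t-test pattern the tutorial uses. The column names mirror the tutorial, but the numbers are made up for illustration and are not real Census figures:

```python
import pandas as pd
from scipy import stats

# Illustrative counts for two groups (not real Census data)
df = pd.DataFrame({
    "High School Graduate (or its Equivalency)": [5200, 4800, 5100, 4900],
    "Bachelor's Degree": [3100, 2900, 3300, 2700],
})

# Independent two-sample t-test, the same call pattern used in the tutorial
t_stat, p_value = stats.ttest_ind(
    df["High School Graduate (or its Equivalency)"],
    df["Bachelor's Degree"],
)

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

A p-value below the conventional 0.05 threshold would reject the null hypothesis that the two group means are equal; with only a handful of rows per group (as in the two-region comparisons above), the test has very little power, so results should be read cautiously.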
