Commit 76d5ac6

replaced Height and Weight
1 parent f904ef1 commit 76d5ac6

10 files changed: 225 additions & 340 deletions

.idea/.gitignore

Lines changed: 8 additions & 0 deletions
Some generated files are not rendered by default.

.idea/Data-Science-For-Beginners.iml

Lines changed: 11 additions & 0 deletions

.idea/inspectionProfiles/Project_Default.xml

Lines changed: 6 additions & 0 deletions

.idea/jpa-buddy.xml

Lines changed: 6 additions & 0 deletions

.idea/misc.xml

Lines changed: 7 additions & 0 deletions

.idea/modules.xml

Lines changed: 8 additions & 0 deletions

.idea/vcs.xml

Lines changed: 6 additions & 0 deletions

1-Introduction/04-stats-and-probability/assignment.ipynb

Lines changed: 20 additions & 20 deletions
@@ -3,9 +3,9 @@
 {
 "cell_type": "markdown",
 "source": [
-"## Introduction to Probability and Statistics\r\n",
-"## Assignment\r\n",
-"\r\n",
+"## Introduction to Probability and Statistics\n",
+"## Assignment\n",
+"\n",
 "In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)."
 ],
 "metadata": {}
@@ -14,10 +14,10 @@
 "cell_type": "code",
 "execution_count": 13,
 "source": [
-"import pandas as pd\r\n",
-"import numpy as np\r\n",
-"\r\n",
-"df = pd.read_csv(\"../../data/diabetes.tsv\",sep='\\t')\r\n",
+"import pandas as pd\n",
+"import numpy as np\n",
+"\n",
+"df = pd.read_csv(\"../../data/diabetes.tsv\",sep='\\t')\n",
 "df.head()"
 ],
 "outputs": [
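The notebook loads the data with `pd.read_csv(..., sep='\t')` because the file is tab-separated rather than comma-separated. A minimal sketch of the same call on a tiny in-memory sample; the column subset matches names the notebook's code uses (`AGE`, `SEX`, `BMI`, `Y`), but the values are hypothetical and `io.StringIO` stands in for the real `diabetes.tsv` path.

```python
import io

import pandas as pd

# Hypothetical two-row sample in a tab-separated layout; the values are made up
sample_tsv = "AGE\tSEX\tBMI\tY\n50\t1\t25.0\t100\n60\t2\t30.0\t150\n"

# sep="\t" makes pandas split fields on tabs instead of the default comma
df = pd.read_csv(io.StringIO(sample_tsv), sep="\t")
print(df.head())
```

Without `sep="\t"`, pandas would read each whole line as a single column.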
@@ -149,16 +149,16 @@
 {
 "cell_type": "markdown",
 "source": [
-"\r\n",
-"In this dataset, columns as the following:\r\n",
-"* Age and sex are self-explanatory\r\n",
-"* BMI is body mass index\r\n",
-"* BP is average blood pressure\r\n",
-"* S1 through S6 are different blood measurements\r\n",
-"* Y is the qualitative measure of disease progression over one year\r\n",
-"\r\n",
-"Let's study this dataset using methods of probability and statistics.\r\n",
-"\r\n",
+"\n",
+"In this dataset, columns as the following:\n",
+"* Age and sex are self-explanatory\n",
+"* BMI is body mass index\n",
+"* BP is average blood pressure\n",
+"* S1 through S6 are different blood measurements\n",
+"* Y is the qualitative measure of disease progression over one year\n",
+"\n",
+"Let's study this dataset using methods of probability and statistics.\n",
+"\n",
 "### Task 1: Compute mean values and variance for all values"
 ],
 "metadata": {}
@@ -201,8 +201,8 @@
 {
 "cell_type": "markdown",
 "source": [
-"### Task 4: Test the correlation between different variables and disease progression (Y)\r\n",
-"\r\n",
+"### Task 4: Test the correlation between different variables and disease progression (Y)\n",
+"\n",
 "> **Hint** Correlation matrix would give you the most useful information on which values are dependent."
 ],
 "metadata": {}
@@ -249,4 +249,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}
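Apart from the final brace, every visible change in this notebook replaces Windows line endings (`\r\n`) with Unix ones (`\n`) inside the cells' `source` strings. The commit does not say which tool produced this. A minimal sketch of an equivalent transformation on a notebook already parsed into a dict (as `json.load` would return it); the notebook contents and the helper name `normalize_source_newlines` are hypothetical.

```python
import json


def normalize_source_newlines(nb: dict) -> dict:
    """Replace Windows CRLF line endings with LF in every cell's source."""
    for cell in nb.get("cells", []):
        source = cell.get("source", [])
        # nbformat stores source either as one string or as a list of lines
        if isinstance(source, str):
            cell["source"] = source.replace("\r\n", "\n")
        else:
            cell["source"] = [line.replace("\r\n", "\n") for line in source]
    return nb


# Hypothetical notebook dict shaped like the cells in the diff above
nb = {
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["## Assignment\r\n", "\r\n", "Task text"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": "import pandas as pd\r\nimport numpy as np"},
    ],
    "nbformat": 4,
    "nbformat_minor": 2,
}

normalize_source_newlines(nb)
print(json.dumps(nb["cells"][0]["source"]))
```

Handling both the list and the plain-string form keeps the helper safe for notebooks written by different tools.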

1-Introduction/04-stats-and-probability/notebook.ipynb

Lines changed: 115 additions & 282 deletions
Large diffs are not rendered by default.

1-Introduction/04-stats-and-probability/solution/assignment.ipynb

Lines changed: 38 additions & 38 deletions
@@ -3,9 +3,9 @@
 {
 "cell_type": "markdown",
 "source": [
-"## Introduction to Probability and Statistics\r\n",
-"## Assignment\r\n",
-"\r\n",
+"## Introduction to Probability and Statistics\n",
+"## Assignment\n",
+"\n",
 "In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)."
 ],
 "metadata": {}
@@ -14,11 +14,11 @@
 "cell_type": "code",
 "execution_count": 13,
 "source": [
-"import pandas as pd\r\n",
-"import numpy as np\r\n",
-"import matplotlib.pyplot as plt\r\n",
-"\r\n",
-"df = pd.read_csv(\"../../../data/diabetes.tsv\",sep='\\t')\r\n",
+"import pandas as pd\n",
+"import numpy as np\n",
+"import matplotlib.pyplot as plt\n",
+"\n",
+"df = pd.read_csv(\"../../../data/diabetes.tsv\",sep='\\t')\n",
 "df.head()"
 ],
 "outputs": [
@@ -150,16 +150,16 @@
 {
 "cell_type": "markdown",
 "source": [
-"\r\n",
-"In this dataset, columns as the following:\r\n",
-"* Age and sex are self-explanatory\r\n",
-"* BMI is body mass index\r\n",
-"* BP is average blood pressure\r\n",
-"* S1 through S6 are different blood measurements\r\n",
-"* Y is the qualitative measure of disease progression over one year\r\n",
-"\r\n",
-"Let's study this dataset using methods of probability and statistics.\r\n",
-"\r\n",
+"\n",
+"In this dataset, columns as the following:\n",
+"* Age and sex are self-explanatory\n",
+"* BMI is body mass index\n",
+"* BP is average blood pressure\n",
+"* S1 through S6 are different blood measurements\n",
+"* Y is the qualitative measure of disease progression over one year\n",
+"\n",
+"Let's study this dataset using methods of probability and statistics.\n",
+"\n",
 "### Task 1: Compute mean values and variance for all values"
 ],
 "metadata": {}
@@ -355,7 +355,7 @@
 "cell_type": "code",
 "execution_count": 8,
 "source": [
-"# Another way\r\n",
+"# Another way\n",
 "pd.DataFrame([df.mean(),df.var()],index=['Mean','Variance']).head()"
 ],
 "outputs": [
@@ -447,7 +447,7 @@
 "cell_type": "code",
 "execution_count": 9,
 "source": [
-"# Or, more simply, for the mean (variance can be done similarly)\r\n",
+"# Or, more simply, for the mean (variance can be done similarly)\n",
 "df.mean()"
 ],
 "outputs": [
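The solution computes Task 1 with `df.mean()` and `df.var()`, or stacks both into one table with `pd.DataFrame([...], index=['Mean','Variance'])`. A sketch of an equivalent single call using `DataFrame.agg`, checked against the notebook's form; the two columns and their values are hypothetical. Both `df.var()` and the `"var"` aggregation return the sample variance (`ddof=1`) by default, whereas `np.var` defaults to `ddof=0`.

```python
import pandas as pd

# Hypothetical values for two of the diabetes columns
df = pd.DataFrame({"BMI": [20.0, 25.0, 30.0], "Y": [100, 150, 200]})

# One call computes both statistics per column; rows are renamed to match
# the index=['Mean','Variance'] used in the notebook's "Another way" cell
summary = df.agg(["mean", "var"]).rename(index={"mean": "Mean", "var": "Variance"})

# The same table built the notebook's way, for comparison
notebook_way = pd.DataFrame([df.mean(), df.var()], index=["Mean", "Variance"])
print(summary)
```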
@@ -486,8 +486,8 @@
 "cell_type": "code",
 "execution_count": 17,
 "source": [
-"for col in ['BMI','BP','Y']:\r\n",
-" df.boxplot(column=col,by='SEX')\r\n",
+"for col in ['BMI','BP','Y']:\n",
+" df.boxplot(column=col,by='SEX')\n",
 "plt.show()"
 ],
 "outputs": [
@@ -538,8 +538,8 @@
 "cell_type": "code",
 "execution_count": 19,
 "source": [
-"for col in ['AGE','SEX','BMI','Y']:\r\n",
-" df[col].hist()\r\n",
+"for col in ['AGE','SEX','BMI','Y']:\n",
+" df[col].hist()\n",
 " plt.show()"
 ],
 "outputs": [
@@ -593,18 +593,18 @@
 {
 "cell_type": "markdown",
 "source": [
-"Conclusions:\r\n",
-"* Age - normal\r\n",
-"* Sex - uniform\r\n",
+"Conclusions:\n",
+"* Age - normal\n",
+"* Sex - uniform\n",
 "* BMI, Y - hard to tell"
 ],
 "metadata": {}
 },
 {
 "cell_type": "markdown",
 "source": [
-"### Task 4: Test the correlation between different variables and disease progression (Y)\r\n",
-"\r\n",
+"### Task 4: Test the correlation between different variables and disease progression (Y)\n",
+"\n",
 "> **Hint** Correlation matrix would give you the most useful information on which values are dependent."
 ],
 "metadata": {}
@@ -847,7 +847,7 @@
 {
 "cell_type": "markdown",
 "source": [
-"Conclusion:\r\n",
+"Conclusion:\n",
 "* The strongest correlation of Y is BMI and S5 (blood sugar). This sounds reasonable."
 ],
 "metadata": {}
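Task 4's hint points to the correlation matrix, and the conclusion ranks variables by how strongly they correlate with `Y`. A sketch of pulling that ranking out of `df.corr()` (Pearson by default); the data is hypothetical and covers only three columns, so it says nothing about which variables win on the real dataset.

```python
import pandas as pd

# Hypothetical values for three of the notebook's columns
df = pd.DataFrame({
    "BMI": [22.0, 25.0, 28.0, 31.0, 34.0],
    "BP": [90.0, 80.0, 95.0, 85.0, 100.0],
    "Y": [80, 110, 150, 170, 210],
})

# Column Y of the correlation matrix, without Y's self-correlation,
# sorted by absolute strength so strong negative correlations also rank high
corr_with_y = df.corr()["Y"].drop("Y").sort_values(key=abs, ascending=False)
print(corr_with_y)
```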
@@ -856,10 +856,10 @@
 "cell_type": "code",
 "execution_count": 26,
 "source": [
-"fig, ax = plt.subplots(1,3,figsize=(10,5))\r\n",
-"for i,n in enumerate(['BMI','S5','BP']):\r\n",
-" ax[i].scatter(df['Y'],df[n])\r\n",
-" ax[i].set_title(n)\r\n",
+"fig, ax = plt.subplots(1,3,figsize=(10,5))\n",
+"for i,n in enumerate(['BMI','S5','BP']):\n",
+" ax[i].scatter(df['Y'],df[n])\n",
+" ax[i].set_title(n)\n",
 "plt.show()"
 ],
 "outputs": [
@@ -888,9 +888,9 @@
 "cell_type": "code",
 "execution_count": 27,
 "source": [
-"from scipy.stats import ttest_ind\r\n",
-"\r\n",
-"tval, pval = ttest_ind(df.loc[df['SEX']==1,['Y']], df.loc[df['SEX']==2,['Y']],equal_var=False)\r\n",
+"from scipy.stats import ttest_ind\n",
+"\n",
+"tval, pval = ttest_ind(df.loc[df['SEX']==1,['Y']], df.loc[df['SEX']==2,['Y']],equal_var=False)\n",
 "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")"
 ],
 "outputs": [
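The t-test cell calls `ttest_ind(..., equal_var=False)`, which selects Welch's t-test (no assumption of equal variances in the two groups). Because it passes `['Y']`, each argument is a one-column DataFrame, so the results come back as length-1 arrays and the print uses `tval[0]`. A sketch of the Welch statistic computed by hand with pandas, without SciPy; the helper name `welch_t` and the data are hypothetical.

```python
import math

import pandas as pd


def welch_t(a: pd.Series, b: pd.Series) -> tuple[float, float]:
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean, from sample variances (ddof=1)
    sa, sb = a.var() / na, b.var() / nb
    t = (a.mean() - b.mean()) / math.sqrt(sa + sb)
    dof = (sa + sb) ** 2 / (sa ** 2 / (na - 1) + sb ** 2 / (nb - 1))
    return float(t), float(dof)


# Hypothetical values; SEX is coded 1/2 as in the notebook's filters
df = pd.DataFrame({
    "SEX": [1, 1, 1, 2, 2, 2],
    "Y": [100, 120, 140, 110, 150, 190],
})

# Selecting the "Y" Series (not the one-column frame ['Y']) yields plain scalars
t, dof = welch_t(df.loc[df["SEX"] == 1, "Y"], df.loc[df["SEX"] == 2, "Y"])
print(f"T-value = {t:.2f}, dof = {dof:.2f}")
```

A two-sided p-value would then be `2 * scipy.stats.t.sf(abs(t), dof)`, the t-distribution tail that Welch's test uses.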
@@ -942,4 +942,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 2
-}
+}