|
4 | 4 | "metadata": { |
5 | 5 | "colab": { |
6 | 6 | "provenance": [], |
7 | | - "collapsed_sections": [], |
8 | | - "toc_visible": true, |
9 | | - "authorship_tag": "ABX9TyOxj4lg4u8IZYP0RgSz5Ion", |
10 | | - "include_colab_link": true |
| 7 | + "authorship_tag": "ABX9TyNk3lwH1I5UBvtUCfCy6G+K" |
11 | 8 | }, |
12 | 9 | "kernelspec": { |
13 | 10 | "name": "python3", |
14 | 11 | "display_name": "Python 3" |
15 | 12 | } |
16 | 13 | }, |
17 | 14 | "cells": [ |
18 | | - { |
19 | | - "cell_type": "markdown", |
20 | | - "metadata": { |
21 | | - "id": "view-in-github", |
22 | | - "colab_type": "text" |
23 | | - }, |
24 | | - "source": [ |
25 | | - "<a href=\"https://colab.research.google.com/github/ebatty/MathToolsforNeuroscience/blob/jupyterbook/Week10/Week10Tutorial2.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" |
26 | | - ] |
27 | | - }, |
28 | 15 | { |
29 | 16 | "cell_type": "markdown", |
30 | 17 | "metadata": { |
|
63 | 50 | "import matplotlib as mpl\n", |
64 | 51 | "from sklearn.linear_model import LinearRegression, Lasso, Ridge\n" |
65 | 52 | ], |
66 | | - "execution_count": null, |
| 53 | + "execution_count": 1, |
67 | 54 | "outputs": [] |
68 | 55 | }, |
69 | 56 | { |
|
140 | 127 | " axes[1].plot(reg_validation_r2s, '-ob')\n", |
141 | 128 | " axes[1].set(title='Validation data', ylabel='R2', xlabel='alpha', xticks = np.arange(0, len(alphas)), xticklabels= alphas);" |
142 | 129 | ], |
143 | | - "execution_count": null, |
| 130 | + "execution_count": 2, |
144 | 131 | "outputs": [] |
145 | 132 | }, |
146 | 133 | { |
|
168 | 155 | " with open(fname, \"wb\") as fid:\n", |
169 | 156 | " fid.write(r.content)\n" |
170 | 157 | ], |
171 | | - "execution_count": null, |
| 158 | + "execution_count": 3, |
172 | 159 | "outputs": [] |
173 | 160 | }, |
174 | 161 | { |
|
204 | 191 | "run_test = dat['run'][2250:2500, :]\n", |
205 | 192 | "\n" |
206 | 193 | ], |
207 | | - "execution_count": null, |
| 194 | + "execution_count": 4, |
208 | 195 | "outputs": [] |
209 | 196 | }, |
210 | 197 | { |
|
221 | 208 | { |
222 | 209 | "cell_type": "code", |
223 | 210 | "metadata": { |
224 | | - "id": "0dr0odDsd_Z8" |
| 211 | + "id": "0dr0odDsd_Z8", |
| 212 | + "outputId": "9f8abf66-c4e4-42d4-a0aa-5816dd8cdb69", |
| 213 | + "colab": { |
| 214 | + "base_uri": "https://localhost:8080/" |
| 215 | + } |
225 | 216 | }, |
226 | 217 | "source": [ |
227 | 218 | "print(resps_train.shape)\n", |
|
233 | 224 | "print(resps_test.shape)\n", |
234 | 225 | "print(run_test.shape)" |
235 | 226 | ], |
236 | | - "execution_count": null, |
237 | | - "outputs": [] |
| 227 | + "execution_count": 5, |
| 228 | + "outputs": [ |
| 229 | + { |
| 230 | + "output_type": "stream", |
| 231 | + "name": "stdout", |
| 232 | + "text": [ |
| 233 | + "(2000, 11983)\n", |
| 234 | + "(2000, 1)\n", |
| 235 | + "(250, 11983)\n", |
| 236 | + "(250, 1)\n", |
| 237 | + "(250, 11983)\n", |
| 238 | + "(250, 1)\n" |
| 239 | + ] |
| 240 | + } |
| 241 | + ] |
238 | 242 | }, |
239 | 243 | { |
240 | 244 | "cell_type": "markdown", |
|
251 | 255 | "id": "6YtJ0FWFlbzg" |
252 | 256 | }, |
253 | 257 | "source": [ |
254 | | - "First, we'll examine a standard linear regression model and examine our prediction quality on held out data. We fit this model using sklearn. \n", |
| 258 | + "First, we'll examine a standard linear regression model and examine our prediction quality on held out data. We fit this model using sklearn.\n", |
255 | 259 | "\n" |
256 | 260 | ] |
257 | 261 | }, |
258 | 262 | { |
259 | 263 | "cell_type": "code", |
260 | 264 | "source": [ |
261 | 265 | "from sklearn.linear_model import LinearRegression\n", |
| 266 | + "from sklearn.preprocessing import StandardScaler\n", |
| 267 | + "from sklearn.pipeline import make_pipeline\n", |
262 | 268 | "\n", |
263 | | - "no_reg_model = LinearRegression(normalize=True)\n", |
| 269 | + "no_reg_model = make_pipeline(StandardScaler(with_mean=False), LinearRegression())\n", |
264 | 270 | "\n", |
265 | 271 | "no_reg_model.fit(resps_train, run_train)\n", |
266 | 272 | "\n", |
|
269 | 275 | "R2 = no_reg_model.score(resps_train, run_train)" |
270 | 276 | ], |
271 | 277 | "metadata": { |
272 | | - "id": "89RmUIjlYOSS" |
| 278 | + "id": "89RmUIjlYOSS", |
| 279 | + "outputId": "e9ec40c5-3235-40f1-e75f-721f700c5946", |
| 280 | + "colab": { |
| 281 | + "base_uri": "https://localhost:8080/", |
| 282 | + "height": 246 |
| 283 | + } |
273 | 284 | }, |
274 | | - "execution_count": null, |
275 | | - "outputs": [] |
| 285 | + "execution_count": 1, |
| 286 | + "outputs": [ |
| 287 | + { |
| 288 | + "output_type": "error", |
| 289 | + "ename": "NameError", |
| 290 | + "evalue": "ignored", |
| 291 | + "traceback": [ |
| 292 | + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", |
| 293 | + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", |
| 294 | + "\u001b[0;32m<ipython-input-1-75e9c709344c>\u001b[0m in \u001b[0;36m<cell line: 7>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mno_reg_model\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmake_pipeline\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mStandardScaler\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mwith_mean\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mLinearRegression\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0mno_reg_model\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresps_train\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mrun_train\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0my_pred\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mno_reg_model\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpredict\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresps_train\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", |
| 295 | + "\u001b[0;31mNameError\u001b[0m: name 'resps_train' is not defined" |
| 296 | + ] |
| 297 | + } |
| 298 | + ] |
276 | 299 | }, |
277 | 300 | { |
278 | 301 | "cell_type": "markdown", |
279 | 302 | "source": [ |
280 | 303 | "There's four steps in the code above:\n", |
281 | 304 | "\n", |
282 | | - "- We *initialized* the model. \n", |
283 | | - "We have `normalize = True` so the model normalizes the input data (the neural activities) before fitting.\n", |
| 305 | + "- We *initialized* the model.\n", |
| 306 | + "We have a standard scalar as the first step in our pipeline so the model standardizes the input data (the neural activities) before fitting.\n", |
284 | 307 | "\n", |
285 | 308 | "- We *fit* the model by passing it the training data\n", |
286 | 309 | "- We *predicted* y (running speed) given resps_train.\n", |
|
351 | 374 | }, |
352 | 375 | "source": [ |
353 | 376 | "\n", |
354 | | - "We can see that we are predicting the training data very accurately (high R2/overlapping traces) but are not predicting the validation data as well. Seems like we're overfitting! We can try some regularization to fix this. \n", |
| 377 | + "We can see that we are predicting the training data very accurately (high R2/overlapping traces) but are not predicting the validation data as well. Seems like we're overfitting! We can try some regularization to fix this.\n", |
355 | 378 | "\n", |
356 | 379 | "(We are still fitting the validation data fairly well so we can decode running speed quite well from the neural activity!)" |
357 | 380 | ] |
|
360 | 383 | "cell_type": "markdown", |
361 | 384 | "source": [ |
362 | 385 | " An even better choice would have been to not have separate validation data. Instaed, we could use cross-validation to fit and evaluate on the training + validation data so we use it all. However, this takes too long for the purposes of this tutorial.\n", |
363 | | - " \n", |
| 386 | + "\n", |
364 | 387 | "Just so you know though, we can do cross-validation easily easily in one line of code using the magic of scikit-learn. See the cell below for evaluating the model using 5 folds cross validation" |
365 | 388 | ], |
366 | 389 | "metadata": { |
|
371 | 394 | "cell_type": "code", |
372 | 395 | "source": [ |
373 | 396 | "from sklearn.model_selection import cross_validate\n", |
374 | | - "scores = cross_validate(LinearRegression(normalize=True), resps_train, run_train,\n", |
375 | | - " cv = 5, return_train_score = True, return_estimator=True) \n", |
| 397 | + "scores = cross_validate(make_pipeline(StandardScaler(with_mean=False), LinearRegression()), resps_train, run_train,\n", |
| 398 | + " cv = 5, return_train_score = True, return_estimator=True)\n", |
376 | 399 | "\n", |
377 | 400 | "print(f\"Cross-validated training score is {scores['train_score'].mean()}\")\n", |
378 | 401 | "print(f\"Cross-validated validation score is {scores['test_score'].mean()}\")" |
|
397 | 420 | { |
398 | 421 | "cell_type": "code", |
399 | 422 | "source": [ |
400 | | - "plot_weights({\"No regularization\": no_reg_model})" |
| 423 | + "plot_weights({\"No regularization\": no_reg_model[1]})" |
401 | 424 | ], |
402 | 425 | "metadata": { |
403 | 426 | "id": "pLudOSrgbsw1" |
|
431 | 454 | "In sklearn, we fit L2 regularized linear regression models as:\n", |
432 | 455 | "\n", |
433 | 456 | "```\n", |
434 | | - "model = Ridge(alpha = 1, normalize = True)\n", |
| 457 | + "model = make_pipeline(StandardScaler(with_mean=False), Ridge(alpha = 1))\n", |
435 | 458 | "model.fit(resps_train, run_train)\n", |
436 | 459 | "```\n", |
437 | 460 | "\n", |
438 | | - "Again, the `alpha` input tells the weighting of the regularization term in the loss function. " |
| 461 | + "Again, the `alpha` input tells the weighting of the regularization term in the loss function." |
439 | 462 | ] |
440 | 463 | }, |
441 | 464 | { |
|
463 | 486 | "ridge_train_r2s = np.zeros((len(l2_alphas)))\n", |
464 | 487 | "ridge_validation_r2s = np.zeros(len(l2_alphas))\n", |
465 | 488 | "\n", |
466 | | - "# Your code here to loop over different values of alpha, fit a Ridge regression model on the training data, \n", |
| 489 | + "# Your code here to loop over different values of alpha, fit a Ridge regression model on the training data,\n", |
467 | 490 | " # store the resulting R2 score on the training and validation data in the correct entry of `ridge_train_r2s` and\n", |
468 | 491 | " # `ridge_validation_r2s`\n", |
469 | 492 | " # Save the fit model for a given alpha in the dictionary `ridge_models` - the key should be the value of alpha\n", |
|
484 | 507 | "id": "9Zlx82yv6xQJ" |
485 | 508 | }, |
486 | 509 | "source": [ |
487 | | - "We can see that by increasing regularization, we decrease performance on training data. Here, we increase performance (for some values of regularization) on validation data more. For high values of regularization, our model does very badly on both train and validation data - we have to pick the regularization amount carefully! " |
| 510 | + "We can see that by increasing regularization, we decrease performance on training data. Here, we increase performance (for some values of regularization) on validation data more. For high values of regularization, our model does very badly on both train and validation data - we have to pick the regularization amount carefully!" |
488 | 511 | ] |
489 | 512 | }, |
490 | 513 | { |
|
512 | 535 | "@widgets.interact\n", |
513 | 536 | "def plot_observed(alpha = l2_alphas):\n", |
514 | 537 | " models = {\n", |
515 | | - " \"No regularization\": no_reg_model,\n", |
516 | | - " \"Regularization\": ridge_models[alpha]\n", |
| 538 | + " \"No regularization\": no_reg_model[1],\n", |
| 539 | + " \"Regularization\": ridge_models[alpha][1]\n", |
517 | 540 | " }\n", |
518 | | - " plot_weights(models)" |
| 541 | + " plot_weights(models)\n", |
| 542 | + " plt.show()" |
519 | 543 | ], |
520 | 544 | "execution_count": null, |
521 | 545 | "outputs": [] |
|
544 | 568 | "In sklearn, we fit L1 regularized linear regression models as:\n", |
545 | 569 | "\n", |
546 | 570 | "```\n", |
547 | | - "model = Lasso(alpha = 1, normalize=True)\n", |
| 571 | + "model = make_pipeline(StandardScaler(with_mean=False), Lasso(alpha))\n", |
548 | 572 | "model.fit(resps_train, run_train)\n", |
549 | 573 | "```\n", |
550 | 574 | "\n", |
|
572 | 596 | "lasso_train_r2s = np.zeros((len(l1_alphas)))\n", |
573 | 597 | "lasso_validation_r2s = np.zeros(len(l1_alphas))\n", |
574 | 598 | "\n", |
575 | | - "# Your code here to loop over different values of alpha, fit a Lasso regression model on the training data, \n", |
| 599 | + "# Your code here to loop over different values of alpha, fit a Lasso regression model on the training data,\n", |
576 | 600 | " # store the resulting R2 score on the training and validation data in the correct entry of `lasso_train_r2s` and\n", |
577 | 601 | " # `lasso_validation_r2s`\n", |
578 | 602 | " # Save the fit model for a given alpha in the dictionary `lasso_models` - the key should be the value of alpha\n", |
|
593 | 617 | "id": "ujEewQWG4kel" |
594 | 618 | }, |
595 | 619 | "source": [ |
596 | | - "We can see that with increasing values of alpha, the performance on the training data decreases. On validation data, we get slightly better performance for certain values of alpha (0.001) as compared to the model with no regularization. For high values of regularization, our model does very badly on both train and validation data - we have to pick the regularization amount carefully! " |
| 620 | + "We can see that with increasing values of alpha, the performance on the training data decreases. On validation data, we get slightly better performance for certain values of alpha (0.001) as compared to the model with no regularization. For high values of regularization, our model does very badly on both train and validation data - we have to pick the regularization amount carefully!" |
597 | 621 | ] |
598 | 622 | }, |
599 | 623 | { |
|
620 | 644 | "@widgets.interact\n", |
621 | 645 | "def plot_observed(alpha = l1_alphas):\n", |
622 | 646 | " models = {\n", |
623 | | - " \"No regularization\": no_reg_model,\n", |
624 | | - " \"Regularization\": lasso_models[alpha]\n", |
| 647 | + " \"No regularization\": no_reg_model[1],\n", |
| 648 | + " \"Regularization\": lasso_models[alpha][1]\n", |
625 | 649 | " }\n", |
626 | | - " plot_weights(models)" |
| 650 | + " plot_weights(models)\n", |
| 651 | + " plt.show()" |
627 | 652 | ], |
628 | 653 | "execution_count": null, |
629 | 654 | "outputs": [] |
|
643 | 668 | "# Section 4: Choosing regularizers\n", |
644 | 669 | "\n", |
645 | 670 | "\n", |
646 | | - "When should you use $L_1$ vs. $L_2$ regularization? Both penalties shrink parameters, and both will help reduce overfitting. However, the models they lead to are different. \n", |
| 671 | + "When should you use $L_1$ vs. $L_2$ regularization? Both penalties shrink parameters, and both will help reduce overfitting. However, the models they lead to are different.\n", |
647 | 672 | "\n", |
648 | 673 | "In particular, the $L_1$ penalty encourages *sparse* solutions in which most parameters are 0. Let's unpack the notion of sparsity.\n", |
649 | 674 | "\n", |
650 | 675 | "A \"dense\" vector has mostly nonzero elements:\n", |
651 | 676 | "$\\begin{bmatrix}\n", |
652 | | - " 0.1 \\\\ -0.6\\\\-9.1\\\\0.07 \n", |
653 | | - "\\end{bmatrix}$. \n", |
| 677 | + " 0.1 \\\\ -0.6\\\\-9.1\\\\0.07\n", |
| 678 | + "\\end{bmatrix}$.\n", |
654 | 679 | "A \"sparse\" vector has mostly zero elements:\n", |
655 | 680 | "$\\begin{bmatrix}\n", |
656 | 681 | " 0 \\\\ -0.7\\\\ 0\\\\0\n", |
|
702 | 727 | "\n", |
703 | 728 | "When is it OK to assume that the parameter vector is sparse? Whenever it is true that most features don't affect the outcome. One use-case might be decoding low-level visual features from whole-brain fMRI: we may expect only voxels in V1 and thalamus should be used in the prediction.\n", |
704 | 729 | "\n", |
705 | | - "**WARNING**: be careful when interpreting $\\theta$. Never interpret the nonzero coefficients as *evidence* that only those voxels/neurons/features carry information about the outcome. This is a product of our regularization scheme, and thus *our prior assumption that the solution is sparse*. Other regularization types or models may find very distributed relationships across the brain. Never use a model as evidence for a phenomena when that phenomena is encoded in the assumptions of the model. \n", |
| 730 | + "**WARNING**: be careful when interpreting $\\theta$. Never interpret the nonzero coefficients as *evidence* that only those voxels/neurons/features carry information about the outcome. This is a product of our regularization scheme, and thus *our prior assumption that the solution is sparse*. Other regularization types or models may find very distributed relationships across the brain. Never use a model as evidence for a phenomena when that phenomena is encoded in the assumptions of the model.\n", |
706 | 731 | "\n", |
707 | 732 | "\n" |
708 | 733 | ], |
|
0 commit comments