|
27 | 27 | "id": "a6fe247f", |
28 | 28 | "metadata": {}, |
29 | 29 | "source": [ |
30 | | - "# Binary logistic regression model with hidden layers and a sigmoid output layer\n", |
| 30 | + "# PyTorch Model for Binary Regression Using Hidden Layers and Sigmoid Output Layer\n", |
31 | 31 | "\n", |
32 | | - "- we use **PyTorch** to train the model and to make predictions\n", |
33 | 32 | "- we use scikit-learn for data synthesis and split\n", |
| 33 | + "- we use PyTorch for model / training / prediction\n", |
34 | 34 | "- we use scikit-learn for statistical measures\n", |
35 | | - "- see [binary_logistic_regression_tf_with_hidden_layers.ipynb](binary_logistic_regression_tf_with_hidden_layers.ipynb) for a TensorFlow implementation of the same problem" |
| 35 | + "- see [binary_logistic_regression_tf_with_hidden_layers.ipynb](binary_logistic_regression_tf_with_hidden_layers.ipynb) for a TensorFlow implementation of a similar problem" |
36 | 36 | ] |
37 | 37 | }, |
38 | 38 | { |
|
42 | 42 | "metadata": {}, |
43 | 43 | "outputs": [], |
44 | 44 | "source": [ |
45 | | - "import numpy as np\n", |
46 | 45 | "import matplotlib.pyplot as plt\n", |
| 46 | + "import numpy as np\n", |
47 | 47 | "import sklearn\n", |
48 | 48 | "import torch\n", |
49 | 49 | "\n", |
|
64 | 64 | "id": "696e1590", |
65 | 65 | "metadata": {}, |
66 | 66 | "source": [ |
67 | | - "## Synthesis of Data" |
| 67 | + "## Data" |
68 | 68 | ] |
69 | 69 | }, |
70 | 70 | { |
|
96 | 96 | ")\n", |
97 | 97 | "X_train, X_test, Y_train, Y_test = train_test_split(\n", |
98 | 98 | " X, Y, train_size=train_size, random_state=None\n", |
99 | | - ")" |
| 99 | + ")\n", |
| 100 | + "M_train = X_train.shape[0]\n", |
| 101 | + "M_test = X_test.shape[0]\n", |
| 102 | + "print(\"M_train\", M_train)\n", |
| 103 | + "print(\"X train dim\", X_train.shape, \"Y train dim\", Y_train.shape)\n", |
| 104 | + "print(\"M_test\", M_test)\n", |
| 105 | + "print(\"X test dim\", X_test.shape, \"Y test dim\", Y_test.shape)" |
100 | 106 | ] |
101 | 107 | }, |
102 | 108 | { |
|
123 | 129 | " X[Y == 0, 1],\n", |
124 | 130 | " \"o\", color='dodgerblue', ms=1)\n", |
125 | 131 | "plt.axis(\"square\")\n", |
126 | | - "plt.title(\"data 1 and 0 \" + str(X.shape))\n", |
| 132 | + "plt.title(\"data '1' red and '0' blue\" + str(X.shape))\n", |
127 | 133 | "plt.xlabel(\"feature 1\")\n", |
128 | 134 | "plt.ylabel(\"feature 2\")\n", |
129 | 135 | "plt.axis([-10, 10, -10, 10])\n", |
|
134 | 140 | " X[Y == 1, 1],\n", |
135 | 141 | " \"o\", color='orangered', ms=1)\n", |
136 | 142 | "plt.axis(\"square\")\n", |
137 | | - "plt.title(\"data 1 \" + str(X.shape))\n", |
| 143 | + "plt.title(\"data '1' red \" + str(X.shape))\n", |
138 | 144 | "plt.xlabel(\"feature 1\")\n", |
139 | 145 | "plt.ylabel(\"feature 2\")\n", |
140 | 146 | "plt.axis([-10, 10, -10, 10])\n", |
|
145 | 151 | " X[Y == 0, 1],\n", |
146 | 152 | " \"o\", color='dodgerblue', ms=1)\n", |
147 | 153 | "plt.axis(\"square\")\n", |
148 | | - "plt.title(\"data 0 \" + str(X.shape))\n", |
| 154 | + "plt.title(\"data '0' blue \" + str(X.shape))\n", |
149 | 155 | "plt.xlabel(\"feature 1\")\n", |
150 | 156 | "plt.ylabel(\"feature 2\")\n", |
151 | 157 | "plt.axis([-10, 10, -10, 10])\n", |
|
160 | 166 | "## Learning Parameters" |
161 | 167 | ] |
162 | 168 | }, |
| 169 | + { |
| 170 | + "cell_type": "markdown", |
| 171 | + "id": "2be2665e", |
| 172 | + "metadata": {}, |
| 173 | + "source": [ |
| 174 | + "- set up hyper parameters\n", |
| 175 | + "- in practice we do hyper parameter tuning, see the upcoming exercises" |
| 176 | + ] |
| 177 | + }, |
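| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "id": "1a2b3c4d", |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# a minimal sketch of a candidate grid for manual tuning; the grid\n", |
| | + "# values are assumptions, not tuned results: real tuning re-trains\n", |
| | + "# and validates the model for every combination\n", |
| | + "import itertools\n", |
| | + "\n", |
| | + "candidates = {'batch_size': [2**3, 2**5],\n", |
| | + "              'learning_rate': [0.1, 0.01]}\n", |
| | + "for bs, lr in itertools.product(candidates['batch_size'],\n", |
| | + "                                candidates['learning_rate']):\n", |
| | + "    print('candidate: batch_size =', bs, 'learning_rate =', lr)" |
| | + ] |
| | + }, |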
163 | 178 | { |
164 | 179 | "cell_type": "code", |
165 | 180 | "execution_count": null, |
|
168 | 183 | "outputs": [], |
169 | 184 | "source": [ |
170 | 185 | "batch_size = 2**3\n", |
171 | | - "num_epochs = 5\n", |
172 | | - "learning_rate = 0.01" |
| 186 | + "num_epochs = 2**2\n", |
| 187 | + "learning_rate = 0.1" |
173 | 188 | ] |
174 | 189 | }, |
175 | 190 | { |
|
187 | 202 | "metadata": {}, |
188 | 203 | "outputs": [], |
189 | 204 | "source": [ |
190 | | - "Y_train = Y_train[:, np.newaxis]\n", |
191 | | - "Y_test = Y_test[:, np.newaxis]\n", |
| 205 | + "Y_train = Y_train[:, None]\n", |
| 206 | + "Y_test = Y_test[:, None]\n", |
| 207 | + "\n", |
| 208 | + "print(\"X train dim\", X_train.shape, \"Y train dim\", Y_train.shape)\n", |
| 209 | + "print(\"X test dim\", X_test.shape, \"Y test dim\", Y_test.shape)\n", |
192 | 210 | "\n", |
193 | 211 | "data_train = TensorDataset(torch.FloatTensor(X_train),\n", |
194 | 212 | " torch.FloatTensor(Y_train))\n", |
|
279 | 297 | "outputs": [], |
280 | 298 | "source": [ |
281 | 299 | "print('mps available? ', torch.backends.mps.is_available())\n", |
282 | | - "# device = torch.device('mps')\n", |
| 300 | + "# device = torch.device('mps') # Apple's Metal Performance Shaders (MPS)\n", |
283 | 301 | "device = torch.device('cpu')\n", |
284 | 302 | "model = model.to(device)\n", |
| 303 | + "\n", |
| 304 | + "# serious projects need hardware-specific compile and load\n", |
| 305 | + "# model = torch.compile(model)\n", |
| 306 | + "# we go for uncompiled to be hardware agnostic\n", |
| 307 | + "\n", |
285 | 308 | "print(next(model.parameters()).device)\n", |
286 | 309 | "# check some model weights\n", |
287 | 310 | "model.linear1.weight, model.linear1.bias" |
|
350 | 373 | "outputs": [], |
351 | 374 | "source": [ |
352 | 375 | "with torch.no_grad():\n", |
| 376 | + "\n", |
353 | 377 | " er = empirical_risk(model.forward(\n", |
354 | 378 | " torch.tensor(X_train, dtype=torch.float32).to(device)),\n", |
355 | 379 | " torch.tensor(Y_train, dtype=torch.float32).to(device))\n", |
356 | 380 | " print('final train loss', er)\n", |
| 381 | + "\n", |
357 | 382 | " er = empirical_risk(model.forward(\n", |
358 | 383 | " torch.tensor(X_test, dtype=torch.float32).to(device)),\n", |
359 | 384 | " torch.tensor(Y_test, dtype=torch.float32).to(device))\n", |
|
399 | 424 | "metadata": {}, |
400 | 425 | "outputs": [], |
401 | 426 | "source": [ |
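| | + "# sklearn convention: rows = true labels, columns = predicted labels\n", |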
402 | | - "print('train')\n", |
| 427 | + "print('train data')\n", |
403 | 428 | "print(confusion_matrix(\n", |
404 | 429 | " y_true=Y_train,\n", |
405 | 430 | " y_pred=Y_pred_train,\n", |
|
408 | 433 | " y_true=Y_train,\n", |
409 | 434 | " y_pred=Y_pred_train,\n", |
410 | 435 | " normalize='all')*100)\n", |
411 | | - "print('\\n test')\n", |
| 436 | + "print('\\n test data')\n", |
412 | 437 | "print(confusion_matrix(\n", |
413 | 438 | " y_true=Y_test,\n", |
414 | 439 | " y_pred=Y_pred_test,\n", |
|
434 | 459 | "metadata": {}, |
435 | 460 | "outputs": [], |
436 | 461 | "source": [ |
437 | | - "p, r, f, s = precision_recall_fscore_support(y_true=Y_train,\n", |
438 | | - " y_pred=Y_pred_train)\n", |
439 | | - "print('train', p, r, f, s)\n", |
440 | | - "p, r, f, s = precision_recall_fscore_support(y_true=Y_test,\n", |
441 | | - " y_pred=Y_pred_test)\n", |
442 | | - "print('test', p, r, f, s)" |
| 462 | + "p, r, f, s = precision_recall_fscore_support(\n", |
| 463 | + " y_true=Y_train, y_pred=Y_pred_train)\n", |
| 464 | + "print('train data', p, r, f, s)\n", |
| 465 | + "p, r, f, s = precision_recall_fscore_support(\n", |
| 466 | + " y_true=Y_test, y_pred=Y_pred_test)\n", |
| 467 | + "print('test data', p, r, f, s)" |
443 | 468 | ] |
444 | 469 | }, |
445 | 470 | { |
|
459 | 484 | "metadata": {}, |
460 | 485 | "outputs": [], |
461 | 486 | "source": [ |
462 | | - "a = accuracy_score(y_true=Y_train,\n", |
463 | | - " y_pred=Y_pred_train)\n", |
464 | | - "ba = balanced_accuracy_score(y_true=Y_train,\n", |
465 | | - " y_pred=Y_pred_train)\n", |
466 | | - "print('train:', a, ba)\n", |
467 | | - "a = accuracy_score(y_true=Y_test,\n", |
468 | | - " y_pred=Y_pred_test)\n", |
469 | | - "ba = balanced_accuracy_score(y_true=Y_test,\n", |
470 | | - " y_pred=Y_pred_test)\n", |
471 | | - "print('test', a, ba)" |
| 487 | + "a = accuracy_score(\n", |
| 488 | + " y_true=Y_train, y_pred=Y_pred_train)\n", |
| 489 | + "ba = balanced_accuracy_score(\n", |
| 490 | + " y_true=Y_train, y_pred=Y_pred_train)\n", |
| 491 | + "print('train data:', a, ba)\n", |
| 492 | + "a = accuracy_score(\n", |
| 493 | + " y_true=Y_test, y_pred=Y_pred_test)\n", |
| 494 | + "ba = balanced_accuracy_score(\n", |
| 495 | + " y_true=Y_test, y_pred=Y_pred_test)\n", |
| 496 | + "print('test data', a, ba)" |
472 | 497 | ] |
473 | 498 | }, |
474 | 499 | { |
|
486 | 511 | "metadata": {}, |
487 | 512 | "outputs": [], |
488 | 513 | "source": [ |
489 | | - "levels = [0.0, 0.05, 0.1, 0.37, 0.5, 0.63, 0.9, 0.95, 1]\n", |
| 514 | + "if N == 2: # 2D plot of data and classification (curved) line\n", |
| 515 | + " levels = [0.0, 0.05, 0.1, 0.37, 0.5, 0.63, 0.9, 0.95, 1]\n", |
490 | 516 | "\n", |
491 | | - "f1, f2 = np.arange(-10, 10, 0.1), np.arange(-10, 10, 0.1)\n", |
492 | | - "xv, yv = np.meshgrid(f1, f2)\n", |
493 | | - "xv_tmp = np.reshape(xv, (xv.shape[0]*xv.shape[1], 1))\n", |
494 | | - "yv_tmp = np.reshape(yv, (yv.shape[0]*yv.shape[1], 1))\n", |
495 | | - "X_tmp = torch.tensor(np.hstack([xv_tmp, yv_tmp]),\n", |
496 | | - " dtype=torch.float32).to(device)\n", |
497 | | - "with torch.no_grad():\n", |
498 | | - " Y_tmp = model.predict_class(X_tmp).cpu().detach().numpy()\n", |
499 | | - "tmp = np.reshape(Y_tmp, (xv.shape[0], -1))\n", |
500 | | - "# hard decision boundary:\n", |
501 | | - "# tmp = (tmp>=0.5) * 1\n", |
502 | | - "\n", |
503 | | - "plt.figure(figsize=(10, 10))\n", |
504 | | - "plt.subplot(2, 2, 1)\n", |
505 | | - "plt.plot(X_train[Y_train[:, 0] == 1, 0],\n", |
506 | | - " X_train[Y_train[:, 0] == 1, 1],\n", |
507 | | - " \"o\", color='orangered', ms=1)\n", |
508 | | - "plt.contourf(f1, f2, tmp, levels=levels, cmap=\"RdBu_r\")\n", |
509 | | - "plt.axis(\"equal\")\n", |
510 | | - "plt.colorbar()\n", |
511 | | - "plt.title(\"training \" + str(X_train.shape))\n", |
512 | | - "plt.xlabel(\"feature 1\")\n", |
513 | | - "plt.ylabel(\"feature 2\")\n", |
| 517 | + " f1, f2 = np.arange(-10, 10, 0.1), np.arange(-10, 10, 0.1)\n", |
| 518 | + " xv, yv = np.meshgrid(f1, f2)\n", |
| 519 | + " xv_tmp = np.reshape(xv, (xv.shape[0]*xv.shape[1], 1))\n", |
| 520 | + " yv_tmp = np.reshape(yv, (yv.shape[0]*yv.shape[1], 1))\n", |
| 521 | + " X_tmp = torch.tensor(np.hstack([xv_tmp, yv_tmp]),\n", |
| 522 | + " dtype=torch.float32).to(device)\n", |
| 523 | + " with torch.no_grad():\n", |
| 524 | + " ygrid = model.forward(X_tmp).cpu().detach().numpy()\n", |
| 525 | + " # probability 0...1\n", |
514 | 526 | "\n", |
515 | | - "plt.subplot(2, 2, 2)\n", |
516 | | - "plt.plot(X_train[Y_train[:, 0] == 0, 0],\n", |
517 | | - " X_train[Y_train[:, 0] == 0, 1],\n", |
518 | | - " \"o\", color='dodgerblue', ms=1)\n", |
519 | | - "plt.contourf(f1, f2, tmp, levels=levels, cmap=\"RdBu_r\")\n", |
520 | | - "plt.axis(\"equal\")\n", |
521 | | - "plt.colorbar()\n", |
522 | | - "plt.title(\"training \" + str(X_train.shape))\n", |
523 | | - "plt.xlabel(\"feature 1\")\n", |
524 | | - "plt.ylabel(\"feature 2\")\n", |
| 527 | + " # hard decision boundary:\n", |
| 528 | + " # ygrid = (ygrid >= 0.5) * 1 # binary classes {0,1}\n", |
525 | 529 | "\n", |
526 | | - "plt.subplot(2, 2, 3)\n", |
527 | | - "plt.plot(X_test[Y_test[:, 0] == 1, 0],\n", |
528 | | - " X_test[Y_test[:, 0] == 1, 1],\n", |
529 | | - " \"o\", color='orangered', ms=1)\n", |
530 | | - "plt.contourf(f1, f2, tmp, levels=levels, cmap=\"RdBu_r\")\n", |
531 | | - "plt.axis(\"equal\")\n", |
532 | | - "plt.colorbar()\n", |
533 | | - "plt.title(\"test \" + str(X_test.shape))\n", |
534 | | - "plt.xlabel(\"feature 1\")\n", |
535 | | - "plt.ylabel(\"feature 2\")\n", |
| 530 | + " # reshape to plane\n", |
| 531 | + " ygrid = np.reshape(ygrid, (xv.shape[0], -1))\n", |
536 | 532 | "\n", |
537 | | - "plt.subplot(2, 2, 4)\n", |
538 | | - "plt.plot(X_test[Y_test[:, 0] == 0, 0],\n", |
539 | | - " X_test[Y_test[:, 0] == 0, 1],\n", |
540 | | - " \"o\", color='dodgerblue', ms=1)\n", |
541 | | - "plt.contourf(f1, f2, tmp, levels=levels, cmap=\"RdBu_r\")\n", |
542 | | - "plt.axis(\"equal\")\n", |
543 | | - "plt.colorbar()\n", |
544 | | - "plt.title(\"test \" + str(X_test.shape))\n", |
545 | | - "plt.xlabel(\"feature 1\")\n", |
546 | | - "plt.ylabel(\"feature 2\")" |
| 533 | + " plt.figure(figsize=(12, 5))\n", |
| 534 | + " plt.subplot(1, 2, 1) # left plot for training data set\n", |
| 535 | + " plt.plot(X_train[Y_train[:, 0] == 0, 0],\n", |
| 536 | + " X_train[Y_train[:, 0] == 0, 1],\n", |
| 537 | + " \"o\", color='dodgerblue', ms=1)\n", |
| 538 | + " plt.plot(X_train[Y_train[:, 0] == 1, 0],\n", |
| 539 | + " X_train[Y_train[:, 0] == 1, 1],\n", |
| 540 | + " \"o\", color='orangered', ms=1)\n", |
| 541 | + " plt.contourf(f1, f2, ygrid, cmap=\"RdBu_r\", levels=levels)\n", |
| 542 | + " plt.colorbar()\n", |
| 543 | + " plt.axis(\"square\")\n", |
| 544 | + " plt.xlim(-6, 6)\n", |
| 545 | + " plt.ylim(-6, 6)\n", |
| 546 | + " plt.title(\"training: \" + str(X_train.shape))\n", |
| 547 | + " plt.xlabel(\"feature 1\")\n", |
| 548 | + " plt.ylabel(\"feature 2\")\n", |
| 549 | + "\n", |
| 550 | + " plt.subplot(1, 2, 2) # right plot for test data set\n", |
| 551 | + " plt.plot(X_test[Y_test[:, 0] == 0, 0],\n", |
| 552 | + " X_test[Y_test[:, 0] == 0, 1],\n", |
| 553 | + " \"o\", color='dodgerblue', ms=1)\n", |
| 554 | + " plt.plot(X_test[Y_test[:, 0] == 1, 0],\n", |
| 555 | + " X_test[Y_test[:, 0] == 1, 1],\n", |
| 556 | + " \"o\", color='orangered', ms=1)\n", |
| 557 | + " plt.contourf(f1, f2, ygrid, cmap=\"RdBu_r\", levels=levels)\n", |
| 558 | + " plt.colorbar()\n", |
| 559 | + " plt.axis(\"square\")\n", |
| 560 | + " plt.xlim(-6, 6)\n", |
| 561 | + " plt.ylim(-6, 6)\n", |
| 562 | + " plt.title(\"test: \" + str(X_test.shape))\n", |
| 563 | + " plt.xlabel(\"feature 1\")\n", |
| 564 | + " plt.ylabel(\"feature 2\")" |
| 565 | + ] |
| 566 | + }, |
| 567 | + { |
| 568 | + "cell_type": "markdown", |
| 569 | + "id": "43218100", |
| 570 | + "metadata": {}, |
| 571 | + "source": [ |
| 572 | + "## Nice to think about\n", |
| 573 | + "\n", |
| 574 | + "- Instead of using `activation='tanh'` in the dense layers, we could experience that `activation='relu'` yields a more piece-wise linear classification boundary line.\n", |
| 575 | + "- Please have a look at the classification boundary line and guess how many coefficients a polynomial or spline curve would need to create such a curve. A optimum model should exhibit about same parameter number to create this classification curve. Certainly, thousands of model parameters for this data example is too much. We see: mathematical thinking and and creating expectations for the given problem, rather than just playing around with models (and by that wasting energy), is a valuable, mandatory human skill.\n", |
| 576 | + "- For more than two features `N` we cannot conveniently plot the data sets and boundary lines anymore. Hence, instead of having visual contact to the data and classification, we heavily rely on the performances measures. So, we make sure that we fully understand the given numbers.\n", |
| 577 | + "- How to choose the best number of features? How to find the best model? When is a model nicely trained? These are the important questions for real applications. We soon learn about hyper parameter tuning, regularization." |
547 | 578 | ] |
548 | 579 | }, |
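| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "id": "9e8d7c6b", |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# quick sketch: count the trainable parameters of the model above, to\n", |
| | + "# compare with the few coefficients a polynomial boundary would need\n", |
| | + "# (assumes the trained `model` from above is still in scope)\n", |
| | + "num_params = sum(p.numel() for p in model.parameters()\n", |
| | + "                 if p.requires_grad)\n", |
| | + "print('number of trainable parameters:', num_params)" |
| | + ] |
| | + }, |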
549 | 580 | { |