Commit 52db713

Adding initial minndae tuning nb (#36)
1 parent 7ecdbe5 commit 52db713

1 file changed

+228
-0
lines changed

notebooks/tuning/minndae.ipynb

Lines changed: 228 additions & 0 deletions
@@ -0,0 +1,228 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "8089e733-b121-4419-a641-9457f10ca989",
6+
"metadata": {},
7+
"source": [
8+
"# Hyperparameter Tuning: MinNDAE\n",
9+
"In this *Jupyter Notebook* the goal is to find the *optimal hyperparameters* for the `MinNDAE` model, using the Keras `MNIST` dataset as the baseline dataset."
10+
]
11+
},
12+
{
13+
"cell_type": "markdown",
14+
"id": "4beb03a4-6443-4952-9ab7-d713414a6679",
15+
"metadata": {},
16+
"source": [
17+
"## Setup\n",
18+
"First, install the necessary packages (only needed when running on Google Colab)."
19+
]
20+
},
21+
{
22+
"cell_type": "code",
23+
"execution_count": null,
24+
"id": "988ef4c5-55b7-45d8-ba1d-886901ad4d7c",
25+
"metadata": {},
26+
"outputs": [],
27+
"source": [
28+
"# check for colab\n",
29+
"if \"google.colab\" in str(get_ipython()):\n",
30+
" # install colab dependencies\n",
31+
" !pip install git+https://github.com/DiogenesAnalytics/autoencoder"
32+
]
33+
},
34+
{
35+
"cell_type": "markdown",
36+
"id": "1b30698e-0e05-4687-a5dc-5b237c9617c8",
37+
"metadata": {},
38+
"source": [
39+
"## Get MNIST Data\n",
40+
"We will use `keras.datasets` to get the `MNIST` dataset, and then do some *normalizing* and *reshaping* to prepare it for use in training the *autoencoder*."
41+
]
42+
},
43+
{
44+
"cell_type": "code",
45+
"execution_count": null,
46+
"id": "b377de50-992c-4bff-bd1b-e0ce5330ec77",
47+
"metadata": {},
48+
"outputs": [],
49+
"source": [
50+
"# get necessary libs for data/preprocessing\n",
51+
"import tensorflow as tf\n",
52+
"from keras.datasets import mnist\n",
53+
"\n",
54+
"# load the data\n",
55+
"(x_train, _), (x_test, _) = mnist.load_data()\n",
56+
"\n",
57+
"# preprocess the data (normalize)\n",
58+
"x_train = x_train.astype(\"float32\") / 255.\n",
59+
"x_test = x_test.astype(\"float32\") / 255.\n",
60+
"\n",
61+
"# add grayscale dimension\n",
62+
"x_train = tf.expand_dims(x_train, axis=-1)\n",
63+
"x_test = tf.expand_dims(x_test, axis=-1)\n",
64+
"\n",
65+
"# convert to tf datasets\n",
66+
"train_ds = tf.data.Dataset.from_tensor_slices((x_train, x_train))\n",
67+
"test_ds = tf.data.Dataset.from_tensor_slices((x_test, x_test))\n",
68+
"\n",
69+
"# set a few params\n",
70+
"BATCH_SIZE = 64\n",
71+
"SHUFFLE_BUFFER_SIZE = 100\n",
72+
"\n",
73+
"# update with batch/buffer size\n",
74+
"train_ds = train_ds.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)\n",
75+
"test_ds = test_ds.batch(BATCH_SIZE)"
76+
]
77+
},
78+
{
79+
"cell_type": "markdown",
80+
"id": "13bf50dc-9add-4b1b-9507-032b6c37d0d5",
81+
"metadata": {},
82+
"source": [
83+
"## Building Hypermodel\n",
84+
"Here we define the *function* that the tuner will call to build the *hypermodel* around the `MinNDAE` class."
85+
]
86+
},
87+
{
88+
"cell_type": "code",
89+
"execution_count": null,
90+
"id": "edd24794-4b11-4922-8f9e-7f41439c2221",
91+
"metadata": {},
92+
"outputs": [],
93+
"source": [
94+
"from autoencoder.model.minimal import MinNDParams, MinNDAE\n",
95+
"from autoencoder.training import build_encode_dim_loss_function\n",
96+
"\n",
97+
"# set regularization factor\n",
98+
"REG_FACTOR = 1.0 / (28.0 * 28.0)\n",
99+
"\n",
100+
"# define the autoencoder model\n",
101+
"def build_autoencoder(hp):\n",
102+
" # get encoding dimension\n",
103+
" encode_dim = hp.Int(\"encode_dim\", min_value=1, max_value=(28 * 28), step=1)\n",
104+
" \n",
105+
" # get layer configs\n",
106+
" config = MinNDParams(\n",
107+
" l0={\"input_shape\": (28, 28, 1)},\n",
108+
" l2={\"units\": encode_dim},\n",
109+
" l3={\"units\": 28 * 28 * 1},\n",
110+
" l4={\"target_shape\": (28, 28, 1)},\n",
111+
" )\n",
112+
"\n",
113+
" # create model\n",
114+
" autoencoder = MinNDAE(config)\n",
115+
" \n",
116+
" # get custom loss func\n",
117+
" loss_function = build_encode_dim_loss_function(encode_dim, regularization_factor=REG_FACTOR)\n",
118+
" \n",
119+
" # select loss function\n",
120+
" autoencoder.compile(optimizer=\"adam\", loss=loss_function)\n",
121+
"\n",
122+
" # now return keras model\n",
123+
" return autoencoder.model"
124+
]
125+
},
126+
{
127+
"cell_type": "markdown",
128+
"id": "3a99fb3e-0ac3-4ed9-98b3-92d135fa085d",
129+
"metadata": {},
130+
"source": [
131+
"## Hyperparameter Search\n",
132+
"Now we can begin the *hyperparameter search algorithm*."
133+
]
134+
},
135+
{
136+
"cell_type": "code",
137+
"execution_count": null,
138+
"id": "fb32211a-c914-43b9-a4fd-e76541eae0f1",
139+
"metadata": {
140+
"scrolled": true
141+
},
142+
"outputs": [],
143+
"source": [
144+
"# get hyperparam tools\n",
145+
"from keras.callbacks import EarlyStopping\n",
146+
"from keras_tuner import GridSearch\n",
147+
"\n",
148+
"# setup tuner\n",
149+
"tuner = GridSearch(\n",
150+
" build_autoencoder,\n",
151+
" objective=\"val_loss\",\n",
152+
" max_trials=50,\n",
153+
" directory=\"autoencoder_tuning/minndae\",\n",
154+
" project_name=f\"grid_search_encode_dim_{REG_FACTOR}_reg\",\n",
155+
" seed=42,\n",
156+
")\n",
157+
"\n",
158+
"# create early stopping callback\n",
159+
"stop_early = EarlyStopping(monitor=\"val_loss\", patience=2)\n",
160+
"\n",
161+
"# print a summary of the hyperparameter search space\n",
162+
"tuner.search_space_summary()\n",
163+
"\n",
164+
"# run the hyperparameter search\n",
165+
"tuner.search(train_ds, epochs=10, validation_data=test_ds, callbacks=[stop_early])"
166+
]
167+
},
168+
{
169+
"cell_type": "code",
170+
"execution_count": null,
171+
"id": "5d25acd5-58e7-48a9-a4ac-a0e9c8e82b02",
172+
"metadata": {},
173+
"outputs": [],
174+
"source": [
175+
"# get hyperparams of best model\n",
176+
"best_hp = tuner.oracle.get_best_trials(num_trials=1)[0].hyperparameters.values\n",
177+
"print(\"Best Hyperparameters:\", best_hp)"
178+
]
179+
},
180+
{
181+
"cell_type": "code",
182+
"execution_count": null,
183+
"id": "9629e788-536f-4944-9a98-1bf1d1b3549c",
184+
"metadata": {},
185+
"outputs": [],
186+
"source": [
187+
"# get plotting libs\n",
188+
"import matplotlib.pyplot as plt\n",
189+
"\n",
190+
"# extract score/encode_dims from each trial\n",
191+
"scores, encoding_dims = zip(\n",
192+
" *((trial.score, trial.hyperparameters[\"encode_dim\"]) for trial in tuner.oracle.trials.values())\n",
193+
")\n",
194+
"\n",
195+
"# plot a scatter chart of score vs. encoding dimension\n",
196+
"plt.scatter(encoding_dims, scores)\n",
197+
"plt.title(f\"Performance vs Encoding Dimension:\\n{MinNDAE.__name__} / MNIST / {REG_FACTOR:0.4f} Regularization\")\n",
198+
"plt.axvline(x=best_hp[\"encode_dim\"], color=\"r\", linestyle=\"dashed\", linewidth=2, label=\"optimal_encode_dim\")\n",
199+
"plt.axvline(x=32, color=\"y\", linestyle=\"dashed\", linewidth=2, label=\"keras_default\")\n",
200+
"plt.xlabel(\"Encoding Dimension\")\n",
201+
"plt.ylabel(\"Loss Metric\")\n",
202+
"plt.legend()\n",
203+
"plt.show()"
204+
]
205+
}
206+
],
207+
"metadata": {
208+
"kernelspec": {
209+
"display_name": "Python 3 (ipykernel)",
210+
"language": "python",
211+
"name": "python3"
212+
},
213+
"language_info": {
214+
"codemirror_mode": {
215+
"name": "ipython",
216+
"version": 3
217+
},
218+
"file_extension": ".py",
219+
"mimetype": "text/x-python",
220+
"name": "python",
221+
"nbconvert_exporter": "python",
222+
"pygments_lexer": "ipython3",
223+
"version": "3.11.7"
224+
}
225+
},
226+
"nbformat": 4,
227+
"nbformat_minor": 5
228+
}
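
Note on the search space: the notebook defines `encode_dim` over 1 to 28 * 28 with `step=1` but sets `max_trials=50`, so the cap is well below the number of grid points. The sketch below is plain Python, not `keras_tuner`; it only illustrates the arithmetic. It assumes the grid is walked in ascending order, which is an assumption about the tuner's behavior and not something the notebook states. The helper names `grid_values` and `coarse_step` are hypothetical.

```python
from itertools import islice

# Search space as declared in the notebook: encode_dim = 1 .. 28*28, step 1.
MIN_DIM, MAX_DIM, STEP = 1, 28 * 28, 1
MAX_TRIALS = 50  # same cap as the notebook's tuner


def grid_values(min_value, max_value, step):
    """Return every grid point in ascending order (assumed visiting order)."""
    return range(min_value, max_value + 1, step)


def coarse_step(min_value, max_value, max_trials):
    """Smallest integer step whose grid has at most max_trials points."""
    span = max_value - min_value
    return span // max_trials + 1


full_grid = list(grid_values(MIN_DIM, MAX_DIM, STEP))
visited = list(islice(full_grid, MAX_TRIALS))  # what the cap would allow

print(f"grid points: {len(full_grid)}")
print(f"visited under the cap: {len(visited)} ({visited[0]}..{visited[-1]})")

# A coarser step that would let the capped search span the whole range.
step = coarse_step(MIN_DIM, MAX_DIM, MAX_TRIALS)
coarse_grid = list(grid_values(MIN_DIM, MAX_DIM, step))
print(f"step={step} gives {len(coarse_grid)} points up to {coarse_grid[-1]}")
```

If the ascending-order assumption holds, the plot at the end of the notebook would only show encoding dimensions near the bottom of the range, and the `keras_default` line at 32 would sit inside that window. Raising `max_trials` or passing a larger `step` to `hp.Int` are two ways to cover more of the range.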
