Commit c48af52

fix mnist sample
1 parent b1f097a commit c48af52

File tree

2 files changed

+284
-4
lines changed


docs/HYPERTUNING.md

Lines changed: 280 additions & 0 deletions
@@ -0,0 +1,280 @@
# Hyperparameter Tuning

The `ann_hypertune` module provides automated hyperparameter search to find
optimal network configurations. It supports **grid search** (exhaustive),
**random search** (sampling-based), and **Bayesian optimization** (model-guided) strategies.

## Features

- **Grid Search** - exhaustively tries all combinations of hyperparameters
- **Random Search** - randomly samples from the hyperparameter space
- **Bayesian Optimization** - guided search using a Gaussian Process surrogate
- **Topology Patterns** - automatic layer size generation (pyramid, funnel, etc.)
- **Per-Layer Activations** - a different activation function for each layer
- **Data Splitting** - automatic train/validation holdout with optional shuffling
- **Custom Scoring** - user-defined callback for the optimization metric
- **Progress Reporting** - callback for monitoring search progress
- **Reproducibility** - seed support for reproducible random searches

## Tunable Hyperparameters

| Parameter | Description |
|-----------|-------------|
| Learning rate | Continuous range with linear or log-scale spacing |
| Batch size | Discrete set of values to try |
| Optimizer | SGD, Momentum, Adam, RMSProp, AdaGrad |
| Hidden layers | Number of hidden layers (1-5) |
| Layer size | Base size for topology generation |
| Topology pattern | CONSTANT, PYRAMID, FUNNEL, INVERSE |
| Activation | Sigmoid, ReLU, LeakyReLU, Tanh (optionally per layer) |
## Functions

| Function | Description |
|----------|-------------|
| hypertune_space_init | initialize search space with defaults |
| hypertune_options_init | initialize search options |
| hypertune_result_init | initialize a result structure |
| hypertune_split_data | split data into train/validation sets |
| hypertune_free_split | free split tensors |
| hypertune_grid_search | perform exhaustive grid search |
| hypertune_random_search | perform random search |
| hypertune_bayesian_search | perform Bayesian optimization search |
| hypertune_create_network | create network from result config |
| hypertune_count_grid_trials | calculate total grid combinations |
| hypertune_print_result | print a single result |
| hypertune_print_summary | print top N results |
| hypertune_score_accuracy | default scoring function (accuracy) |
| hypertune_generate_topology | generate layer sizes from pattern |
| hypertune_topology_name | get string name for topology pattern |
| gp_init | initialize Gaussian Process state |
| gp_add_observation | add observation to GP |
| gp_predict | predict mean and variance at a point |
| gp_expected_improvement | compute expected improvement |
| bayesian_options_init | initialize Bayesian optimization options |
## Basic Example

```c
#include "ann_hypertune.h"

// Load your data
PTensor inputs = /* your input data */;
PTensor outputs = /* your output data */;

// Split into train/validation (80/20)
DataSplit split;
hypertune_split_data(inputs, outputs, 0.8f, 1, 0, &split);

// Configure search space
HyperparamSpace space;
hypertune_space_init(&space);

// Customize the search space
space.learning_rate_min = 0.001f;
space.learning_rate_max = 0.1f;
space.learning_rate_steps = 3;
space.learning_rate_log_scale = 1;  // log-uniform sampling

space.batch_sizes[0] = 32;
space.batch_sizes[1] = 64;
space.batch_size_count = 2;

space.optimizers[0] = OPT_ADAM;
space.optimizers[1] = OPT_SGD;
space.optimizer_count = 2;

space.hidden_layer_counts[0] = 1;
space.hidden_layer_counts[1] = 2;
space.hidden_layer_count_options = 2;

space.hidden_layer_sizes[0] = 64;
space.hidden_layer_sizes[1] = 128;
space.hidden_layer_size_count = 2;

space.hidden_activations[0] = ACTIVATION_RELU;
space.hidden_activation_count = 1;

space.epoch_limit = 500;

// Configure options
HypertuneOptions options;
hypertune_options_init(&options);
options.verbosity = 1;  // show progress

// Run grid search
HypertuneResult results[100];
HypertuneResult best;
int trials = hypertune_grid_search(
    &space,
    input_size,          // number of input features
    output_size,         // number of output classes
    ACTIVATION_SOFTMAX,  // output activation
    LOSS_CATEGORICAL_CROSS_ENTROPY,
    &split,
    &options,
    results, 100,
    &best
);

printf("Completed %d trials\n", trials);
hypertune_print_result(&best);

// Create final network with best configuration
PNetwork net = hypertune_create_network(
    &best,
    input_size,
    output_size,
    ACTIVATION_SOFTMAX,
    LOSS_CATEGORICAL_CROSS_ENTROPY
);

// Train on full dataset, evaluate, etc.

// Cleanup
hypertune_free_split(&split);
ann_free_network(net);
```

## Random Search

For larger search spaces, random search is often more efficient than grid search:

```c
// Run random search with 50 trials
int trials = hypertune_random_search(
    &space,
    50,                  // number of random trials
    input_size,
    output_size,
    ACTIVATION_SOFTMAX,
    LOSS_CATEGORICAL_CROSS_ENTROPY,
    &split,
    &options,
    results, 100,
    &best
);

// Print top 5 results
hypertune_print_summary(results, trials, 5);
```

## Bayesian Optimization

For more sample-efficient hyperparameter search, Bayesian optimization uses a
Gaussian Process surrogate model to guide exploration of the search space:

```c
#include "ann_hypertune.h"

// Configure search space
HyperparamSpace space;
hypertune_space_init(&space);
space.learning_rate_min = 0.001f;
space.learning_rate_max = 0.1f;
space.batch_sizes[0] = 16;
space.batch_sizes[1] = 32;
space.batch_sizes[2] = 64;
space.batch_size_count = 3;
// ... other fixed hyperparameters

// Configure search options
HypertuneOptions tune_opts;
hypertune_options_init(&tune_opts);

// Configure Bayesian optimization
BayesianOptions bo_opts;
bayesian_options_init(&bo_opts);
bo_opts.n_initial = 10;     // Random samples to initialize GP
bo_opts.n_iterations = 20;  // BO iterations after initialization
bo_opts.n_candidates = 100; // Candidates to evaluate per iteration

// Run Bayesian optimization
HypertuneResult results[50], best;
int trials = hypertune_bayesian_search(
    &space, input_size, output_size,
    ACTIVATION_SOFTMAX, LOSS_CROSS_ENTROPY,
    &split, &tune_opts, &bo_opts,
    results, 50, &best
);
```
**How it works:**
1. **Initial phase**: randomly samples `n_initial` configurations
2. **BO phase**: uses the Gaussian Process to predict performance, then selects
   points with the highest Expected Improvement (EI)
3. **Optimizes**: learning rate (log-scale) and batch size

**When to use Bayesian optimization:**
- Expensive evaluations (long training times)
- Smooth objective function
- 2-5 hyperparameters to tune
## Custom Scoring Function

By default, hypertuning optimizes for accuracy. You can provide a custom
scoring function:

```c
// Custom scorer: optimize for F1 score, minimize loss, etc.
real my_custom_scorer(PNetwork net, PTensor val_in, PTensor val_out, void *data) {
    // Your scoring logic here.
    // Return higher values for better configurations.
    real accuracy = ann_evaluate_accuracy(net, val_in, val_out);
    return accuracy;  // or any custom metric
}

// Use custom scorer
options.score_func = my_custom_scorer;
options.user_data = NULL;  // optional context data
```

## Topology Patterns

The hypertuning module supports automatic generation of layer sizes based on
topology patterns. This helps explore different network architectures:

| Pattern | Description | Example (3 layers, base=64) |
|---------|-------------|-----------------------------|
| CONSTANT | All layers same size | 64 → 64 → 64 |
| PYRAMID | Decreasing sizes toward output | 64 → 32 → 16 |
| INVERSE | Increasing sizes toward output | 16 → 32 → 64 |
| FUNNEL | Expand then contract | 32 → 64 → 32 |
| CUSTOM | Use explicit sizes | user-defined |

```c
// Configure multiple topology patterns
space.topology_patterns[0] = TOPOLOGY_CONSTANT;
space.topology_patterns[1] = TOPOLOGY_PYRAMID;
space.topology_patterns[2] = TOPOLOGY_INVERSE;
space.topology_pattern_count = 3;

// Generate sizes programmatically
int sizes[3];
hypertune_generate_topology(TOPOLOGY_PYRAMID, 64, 3, sizes);
// sizes = {64, 32, 16}
```

## Per-Layer Activations

Enable searching different activations for each hidden layer:

```c
space.hidden_activations[0] = ACTIVATION_RELU;
space.hidden_activations[1] = ACTIVATION_SIGMOID;
space.hidden_activations[2] = ACTIVATION_TANH;
space.hidden_activation_count = 3;
space.search_per_layer_activation = 1;  // enable per-layer search
```

When `search_per_layer_activation` is enabled, random search will assign
different activations to each layer independently.

## Search Strategy Comparison

| Strategy | Best For | Pros | Cons |
|----------|----------|------|------|
| **Grid** | Small search spaces | Exhaustive, reproducible | Exponential cost |
| **Random** | Large spaces, many params | Efficient, parallelizable | May miss optimum |
| **Bayesian** | Expensive evaluations | Sample-efficient | Overhead, sequential |

**Rules of thumb:**
- Grid search: ≤100 total combinations
- Random search: 10-100 trials typically sufficient
- Bayesian: when each trial takes minutes/hours

mnist.c

Lines changed: 4 additions & 4 deletions
```diff
@@ -32,7 +32,7 @@
 # include <cblas.h>
 #endif
 
-#define EPSILON 1e-5
+#define CONVERGENCE_EPSILON 0.01
 
 static int threads = -1;
 static int batch_size = 32;
@@ -263,9 +263,9 @@ int main(int argc, char *argv[])
 tensor_mul_scalar(x_test, (real)(1.0 / 255.0));
 
 // set some hyper-parameters
-pnet->epochLimit = epoch_count;
-pnet->convergence_epsilon = (real)EPSILON;
-pnet->batchSize = batch_size;
+ann_set_epoch_limit(pnet, epoch_count);
+ann_set_convergence(pnet, (real)0.1);
+ann_set_batch_size(pnet, batch_size);
 
 // train the network
 ann_train_network(pnet, x_train, y_train, x_train->rows);
```
