|
# Hyperparameter Tuning

The `ann_hypertune` module provides automated hyperparameter search to find
optimal network configurations. It supports **grid search** (exhaustive),
**random search** (sampling-based), and **Bayesian optimization** (model-guided) strategies.

## Features

- **Grid Search** - exhaustively tries every combination of hyperparameters
- **Random Search** - randomly samples from the hyperparameter space
- **Bayesian Optimization** - model-guided search using a Gaussian Process surrogate
- **Topology Patterns** - automatic layer size generation (pyramid, funnel, etc.)
- **Per-Layer Activations** - a different activation function for each hidden layer
- **Data Splitting** - automatic train/validation holdout with optional shuffling
- **Custom Scoring** - user-defined callback for the optimization metric
- **Progress Reporting** - callback for monitoring search progress
- **Reproducibility** - seed support for reproducible random searches

## Tunable Hyperparameters

| Parameter | Description |
|-----------|-------------|
| Learning rate | Continuous range with linear or log-scale spacing (see the sketch below) |
| Batch size | Discrete set of values to try |
| Optimizer | SGD, Momentum, Adam, RMSProp, AdaGrad |
| Hidden layers | Number of hidden layers (1-5) |
| Layer size | Base size for topology generation |
| Topology pattern | CONSTANT, PYRAMID, FUNNEL, INVERSE |
| Activation | Sigmoid, ReLU, LeakyReLU, Tanh (optionally per layer) |
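
With log-scale spacing, candidate learning rates are spread geometrically between
the bounds rather than linearly. A minimal sketch of that interpolation (an
assumption about the spacing math, not the module's verbatim code):

```c
#include <math.h>

// Candidate learning rates under log-scale spacing: geometric
// interpolation between lr_min and lr_max. With lr_min = 0.001,
// lr_max = 0.1, and 3 steps this yields 0.001, 0.01, 0.1.
// (Illustrative assumption about how the module spaces values.)
float log_scale_lr(float lr_min, float lr_max, int steps, int i)
{
    if (steps <= 1) return lr_min;
    float t = (float)i / (float)(steps - 1);
    return lr_min * powf(lr_max / lr_min, t);
}
```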

## Functions

| Function | Description |
|----------|-------------|
| hypertune_space_init | initialize search space with defaults |
| hypertune_options_init | initialize search options |
| hypertune_result_init | initialize a result structure |
| hypertune_split_data | split data into train/validation sets |
| hypertune_free_split | free split tensors |
| hypertune_grid_search | perform exhaustive grid search |
| hypertune_random_search | perform random search |
| hypertune_bayesian_search | perform Bayesian optimization search |
| hypertune_create_network | create network from result config |
| hypertune_count_grid_trials | calculate total grid combinations |
| hypertune_print_result | print a single result |
| hypertune_print_summary | print top N results |
| hypertune_score_accuracy | default scoring function (accuracy) |
| hypertune_generate_topology | generate layer sizes from pattern |
| hypertune_topology_name | get string name for topology pattern |
| gp_init | initialize Gaussian Process state |
| gp_add_observation | add observation to GP |
| gp_predict | predict mean and variance at a point |
| gp_expected_improvement | compute expected improvement |
| bayesian_options_init | initialize Bayesian optimization options |

## Basic Example

```c
#include "ann_hypertune.h"

// Load your data
PTensor inputs = /* your input data */;
PTensor outputs = /* your output data */;
int input_size = /* number of input features */;
int output_size = /* number of output classes */;

// Split into train/validation (80/20)
DataSplit split;
hypertune_split_data(inputs, outputs, 0.8f, 1, 0, &split);

// Configure search space
HyperparamSpace space;
hypertune_space_init(&space);

// Customize the search space
space.learning_rate_min = 0.001f;
space.learning_rate_max = 0.1f;
space.learning_rate_steps = 3;
space.learning_rate_log_scale = 1; // log-scale spacing

space.batch_sizes[0] = 32;
space.batch_sizes[1] = 64;
space.batch_size_count = 2;

space.optimizers[0] = OPT_ADAM;
space.optimizers[1] = OPT_SGD;
space.optimizer_count = 2;

space.hidden_layer_counts[0] = 1;
space.hidden_layer_counts[1] = 2;
space.hidden_layer_count_options = 2;

space.hidden_layer_sizes[0] = 64;
space.hidden_layer_sizes[1] = 128;
space.hidden_layer_size_count = 2;

space.hidden_activations[0] = ACTIVATION_RELU;
space.hidden_activation_count = 1;

space.epoch_limit = 500;

// Configure options
HypertuneOptions options;
hypertune_options_init(&options);
options.verbosity = 1; // show progress

// Run grid search
HypertuneResult results[100];
HypertuneResult best;
int trials = hypertune_grid_search(
    &space,
    input_size,         // number of input features
    output_size,        // number of output classes
    ACTIVATION_SOFTMAX, // output activation
    LOSS_CATEGORICAL_CROSS_ENTROPY,
    &split,
    &options,
    results, 100,
    &best
);

printf("Completed %d trials\n", trials);
hypertune_print_result(&best);

// Create final network with best configuration
PNetwork net = hypertune_create_network(
    &best,
    input_size,
    output_size,
    ACTIVATION_SOFTMAX,
    LOSS_CATEGORICAL_CROSS_ENTROPY
);

// Train on full dataset, evaluate, etc.

// Cleanup
hypertune_free_split(&split);
ann_free_network(net);
```

## Random Search

For larger search spaces, random search is often more efficient than grid search:

```c
// Run random search with 50 trials
int trials = hypertune_random_search(
    &space,
    50, // number of random trials
    input_size,
    output_size,
    ACTIVATION_SOFTMAX,
    LOSS_CATEGORICAL_CROSS_ENTROPY,
    &split,
    &options,
    results, 100,
    &best
);

// Print top 5 results
hypertune_print_summary(results, trials, 5);
```
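
Random search draws configurations stochastically; the module's seed support
(see Features) makes runs repeatable. A minimal sketch, assuming the seed is
exposed as an options field named `seed` (the exact member name is an
assumption, not confirmed here):

```c
// Field name hypothetical: the module documents seed support for
// reproducible random searches, but the exact option may differ.
options.seed = 1234;
```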

## Bayesian Optimization

For more efficient hyperparameter search, Bayesian optimization uses a Gaussian
Process surrogate model to guide exploration of the search space:

```c
#include "ann_hypertune.h"

// Configure search space
HyperparamSpace space;
hypertune_space_init(&space);
space.learning_rate_min = 0.001f;
space.learning_rate_max = 0.1f;
space.batch_sizes[0] = 16;
space.batch_sizes[1] = 32;
space.batch_sizes[2] = 64;
space.batch_size_count = 3;
// ... other fixed hyperparameters

// Data split and search options, prepared as in the basic example
DataSplit split;     // filled by hypertune_split_data
HypertuneOptions tune_opts;
hypertune_options_init(&tune_opts);

// Configure Bayesian optimization
BayesianOptions bo_opts;
bayesian_options_init(&bo_opts);
bo_opts.n_initial = 10;     // Random samples to initialize GP
bo_opts.n_iterations = 20;  // BO iterations after initialization
bo_opts.n_candidates = 100; // Candidates to evaluate per iteration

// Run Bayesian optimization
HypertuneResult results[50], best;
int trials = hypertune_bayesian_search(
    &space, input_size, output_size,
    ACTIVATION_SOFTMAX, LOSS_CATEGORICAL_CROSS_ENTROPY,
    &split, &tune_opts, &bo_opts,
    results, 50, &best
);
```

**How it works:**
1. **Initial phase**: Randomly samples `n_initial` configurations
2. **BO phase**: Uses the Gaussian Process to predict performance and selects the
   candidate with the highest Expected Improvement (EI; see the formula below)
3. **Optimizes**: Learning rate (log-scale) and batch size
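
Expected Improvement has a standard closed form. For a candidate $x$ with GP
posterior mean $\mu(x)$ and standard deviation $\sigma(x)$, and best observed
score $f^*$ (maximization), it is:

```math
z = \frac{\mu(x) - f^*}{\sigma(x)}, \qquad
\mathrm{EI}(x) = \bigl(\mu(x) - f^*\bigr)\,\Phi(z) + \sigma(x)\,\varphi(z)
```

where $\Phi$ and $\varphi$ are the standard normal CDF and PDF. This is the
textbook definition; `gp_expected_improvement` presumably computes an
equivalent quantity.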

**When to use Bayesian optimization:**
- Expensive evaluations (long training times)
- Smooth objective function
- 2-5 hyperparameters to tune

## Custom Scoring Function

By default, hypertuning optimizes for accuracy. You can provide a custom
scoring function:

```c
// Custom scorer: optimize for F1 score, minimize loss, etc.
real my_custom_scorer(PNetwork net, PTensor val_in, PTensor val_out, void *data) {
    // Your scoring logic here.
    // Return higher values for better configurations.
    real accuracy = ann_evaluate_accuracy(net, val_in, val_out);
    return accuracy; // or any custom metric
}

// Use custom scorer
options.score_func = my_custom_scorer;
options.user_data = NULL; // optional context data
```
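
The `user_data` pointer is presumably handed back to the scorer as its `data`
argument, so arbitrary context can be threaded through the search. A small
sketch (the `ScoreTracker` struct is hypothetical, introduced here only for
illustration):

```c
// Hypothetical context threaded through options.user_data
typedef struct {
    real best_seen; // running best validation accuracy across trials
} ScoreTracker;

real tracking_scorer(PNetwork net, PTensor val_in, PTensor val_out, void *data) {
    ScoreTracker *t = (ScoreTracker *)data; // recover the context
    real acc = ann_evaluate_accuracy(net, val_in, val_out);
    if (acc > t->best_seen)
        t->best_seen = acc; // remember the best score seen so far
    return acc;
}

ScoreTracker tracker = { 0.0f };
options.score_func = tracking_scorer;
options.user_data = &tracker;
```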

## Topology Patterns

The hypertuning module supports automatic generation of layer sizes based on
topology patterns. This helps explore different network architectures:

| Pattern | Description | Example (3 layers, base=64) |
|---------|-------------|-----------------------------|
| CONSTANT | All layers same size | 64 → 64 → 64 |
| PYRAMID | Decreasing sizes toward output | 64 → 32 → 16 |
| INVERSE | Increasing sizes toward output | 16 → 32 → 64 |
| FUNNEL | Expand then contract | 32 → 64 → 32 |
| CUSTOM | Use explicit sizes | user-defined |

```c
// Configure multiple topology patterns
space.topology_patterns[0] = TOPOLOGY_CONSTANT;
space.topology_patterns[1] = TOPOLOGY_PYRAMID;
space.topology_patterns[2] = TOPOLOGY_INVERSE;
space.topology_pattern_count = 3;

// Generate sizes programmatically
int sizes[3];
hypertune_generate_topology(TOPOLOGY_PYRAMID, 64, 3, sizes);
// sizes = {64, 32, 16}
```
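
The other patterns work the same way; the expected outputs below are taken
from the table's base=64, 3-layer examples:

```c
int sizes[3];

hypertune_generate_topology(TOPOLOGY_FUNNEL, 64, 3, sizes);
// sizes = {32, 64, 32} (expand then contract)

hypertune_generate_topology(TOPOLOGY_INVERSE, 64, 3, sizes);
// sizes = {16, 32, 64} (increasing toward output)
```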

## Per-Layer Activations

Enable searching different activations for each hidden layer:

```c
space.hidden_activations[0] = ACTIVATION_RELU;
space.hidden_activations[1] = ACTIVATION_SIGMOID;
space.hidden_activations[2] = ACTIVATION_TANH;
space.hidden_activation_count = 3;
space.search_per_layer_activation = 1; // enable per-layer search
```

When `search_per_layer_activation` is enabled, random search assigns an
activation to each hidden layer independently. For example, with the 3
candidate activations above and 2 hidden layers, there are 3² = 9 possible
assignments per architecture.

## Search Strategy Comparison

| Strategy | Best For | Pros | Cons |
|----------|----------|------|------|
| **Grid** | Small search spaces | Exhaustive, reproducible | Cost grows exponentially with parameters |
| **Random** | Large spaces, many params | Efficient, parallelizable | May miss the optimum |
| **Bayesian** | Expensive evaluations | Sample-efficient | Modeling overhead, inherently sequential |

**Rules of thumb:**
- Grid search: ≤100 total combinations (see the sketch below)
- Random search: 10-100 trials is typically sufficient
- Bayesian optimization: when each trial takes minutes or hours
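
As a sketch of applying the first two rules, `hypertune_count_grid_trials`
(described above as calculating total grid combinations; its exact signature
is assumed here) can gate the choice between grid and random search, reusing
the variables from the basic example:

```c
// Prefer exhaustive grid search for small spaces, random search otherwise.
// Assumes hypertune_count_grid_trials(&space) returns the total number of
// grid combinations, per its description in the Functions table.
int total = hypertune_count_grid_trials(&space);
int trials;
if (total <= 100) {
    trials = hypertune_grid_search(
        &space, input_size, output_size,
        ACTIVATION_SOFTMAX, LOSS_CATEGORICAL_CROSS_ENTROPY,
        &split, &options, results, 100, &best);
} else {
    trials = hypertune_random_search(
        &space, 50, input_size, output_size,
        ACTIVATION_SOFTMAX, LOSS_CATEGORICAL_CROSS_ENTROPY,
        &split, &options, results, 100, &best);
}
```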