Understanding the composition of test and validation sets. #1160
-
Hello.
So, from my understanding, this should mean that your train set consists of the 1150 normal images (1). Your test set contains the 1150 abnormal images plus the additional 250 normal images from the normal test folder, 1400 in total (1). This is then additionally split for validation, resulting in 700 images for test (2). I'm not sure how you obtained the numbers above, but they seem a bit off. I also think the documentation needs some updating, because this splitting really is a bit unclear.
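To make that arithmetic concrete, here is a minimal Python sketch of the splitting logic described above (illustrative only, not Anomalib's internal code; the image counts come from the folder sizes discussed in this thread):

```python
# Illustrative split arithmetic, not Anomalib's actual implementation.
normal_dir_images = 1150      # IO: normal images left after setting 250 aside
abnormal_dir_images = 1150    # NIO: abnormal images left after setting 250 aside
normal_test_dir_images = 250  # IO_Test: held-out normal images

# test_split_mode: from_dir -> all normal_dir images are used for training;
# the test set is built from abnormal_dir plus normal_test_dir.
train_set = normal_dir_images                            # 1150
test_set = abnormal_dir_images + normal_test_dir_images  # 1400

# val_split_mode: from_test with val_split_ratio: 0.5 -> half of the test
# set is moved into the validation set.
val_set = int(test_set * 0.5)        # 700
test_set = test_set - val_set        # 700

print(train_set, test_set, val_set)  # 1150 700 700
```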
-
Thanks for the explanation. Yeah, I figured this should be explained better in the documentation, among other things. A problem I also encountered is that I think the code uses some of the abnormal images for testing that it was trained on. I've reached an F1Score of 91.2% during training, but as soon as I run my model with testing.py on a separate testing dataset, the score drops to a 72% F1Score. I don't think these scores should be this far apart, since the F1Score during training is already evaluated on the test set. Am I missing something, or is this a potential error?
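One way to rule that out is to check whether any image file appears both in the folders seen during training and in the folders used for the separate testing.py run. A generic sketch (not part of Anomalib; the paths follow the folder names in the config below and will need adjusting, in particular the abnormal test folder, which is not named in this thread):

```python
from pathlib import Path

def image_names(folder: str) -> set:
    """Collect image file names (not full paths) under a folder."""
    exts = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}
    return {p.name for p in Path(folder).rglob("*") if p.suffix.lower() in exts}

root = "./datasets/part1/Dataset_v2"
# Folders seen during the training run (per the config in this thread):
training_run = image_names(f"{root}/IO") | image_names(f"{root}/NIO")
# Folders used for the separate testing.py run; add your abnormal test folder:
testing_run = image_names(f"{root}/IO_Test")

overlap = training_run & testing_run
print(f"{len(overlap)} file name(s) appear in both runs")
```

Note that this comparison assumes file names are unique across folders; compare full relative paths if they are not.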
-
Hello fellow Anomalib users,
can someone explain to me how the test set and the validation set are put together during training and testing?
In my experience the behaviour is quite strange, and I can't yet make sense of it.
My custom dataset contains 2800 pictures: 1400 good and 1400 bad.
I separated 250 pictures from each class to serve as the test set,
so the model should only use the remaining 1150 pictures per class for training.
During training of the PaDiM classification model, it uses 463 pictures for the first epoch.
I can't wrap my head around how the 463 pictures arise (a possible batch-count reading is sketched after the config below).
Similarly, during testing I set the path to the test set I made earlier,
but here it uses only 125 pictures instead of the 250 it should.
I would be very thankful if someone could shed a little light on this.
The config I used is below:
```yaml
dataset:
  name: part1
  format: folder
  path: ./datasets/part1/Dataset_v2
  normal_dir: IO # name of the folder containing normal images.
  abnormal_dir: NIO # name of the folder containing abnormal images. Change this to the test folder when testing.
  normal_test_dir: IO_Test # name of the folder containing normal test images.
  task: classification # classification or segmentation
  mask: null # optional
  normalization: imagenet
  extensions: null
  image_size: 256
  train_batch_size: 4
  test_batch_size: 4
  num_workers: 8
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out for testing (usage depends on test_split_mode)
  val_split_mode: from_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  create_validation_set: true
  transform_config:
    train: null
    val: null
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: padim
  backbone: resnet18
  pre_trained: true
  layers:
    - layer1
    - layer2
    - layer3
  normalization_method: min_max # options: [none, min_max, cdf]

metrics:
  image:
    - AUROC
    - F1Score
  pixel: null
  threshold:
    method: adaptive # options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: null # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 42
  path: ./results/padim_v3

logging:
  logger: [] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: false # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: torch, onnx, openvino

trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 5
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: gpu # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle
```
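A possible reading of the puzzling counts above (an assumption on my part, not something confirmed in this thread): the PyTorch Lightning progress bar counts batches, not individual images. With the batch sizes from this config, the numbers line up exactly:

```python
import math

batch_size = 4  # train_batch_size and test_batch_size from the config above

train_batches = math.ceil(1150 / batch_size)  # 288 batches of training images
val_batches = math.ceil(700 / batch_size)     # 175 batches of validation images
print(train_batches + val_batches)            # 463, the "463 pictures" in epoch 1

test_batches = math.ceil(500 / batch_size)    # 250 normal + 250 abnormal test images
print(test_batches)                           # 125, the "125 pictures" during testing
```

If that is what is happening, both runs would actually be seeing the expected number of images.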