PatchCore Training #1276

TorAP · 2023-04-10T20:52:55Z

TorAP
Apr 10, 2023

Describe the bug

I get a key-error when doing HPO using WandB.

Dataset

MVTec

Model

PatchCore

Steps to reproduce the behavior

Install anomalib
python tools/hpo/sweep.py --model patchcore --model_config src/anomalib/models/patchcore/config.yaml --sweep_config tools/hpo/configs/wandb.yaml

Expected behavior

I would expect the sweep to start..

Screenshots

No response

Pip/GitHub

pip

What version/branch did you use?

No response

Configuration YAML

---------
sweep.yml
---------
observation_budget: 10
method: bayes
metric:
  name: pixel_AUROC
  goal: maximize
parameters:
  learning_rate:
    min: 1e-5
    max: 1e-1
  optimizer:
    values: ["adam","sgd"]
  dataset:
    category: capsule
  model:
    backbone:
      values: [resnet18, wide_resnet50_2]


--------
PatchCore config
--------
dataset:
  name: mvtec
  format: mvtec
  path: ./datasets/MVTec
  task: segmentation
  category: bottle
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 1
  image_size: 256 # dimensions to which images are resized (mandatory)
  center_crop: 224 # dimensions to which images are center-cropped after resizing (optional)
  normalization: imagenet # data distribution to which the images will be normalized: [none, imagenet]
  transform_config:
    train: null
    eval: null
  test_split_mode: from_dir # options: [from_dir, synthetic]
  test_split_ratio: 0.2 # fraction of train images held out testing (usage depends on test_split_mode)
  val_split_mode: same_as_test # options: [same_as_test, from_test, synthetic]
  val_split_ratio: 0.5 # fraction of train/test images held out for validation (usage depends on val_split_mode)
  tiling:
    apply: false
    tile_size: null
    stride: null
    remove_border_count: 0
    use_random_tiling: False
    random_tile_count: 16

model:
  name: patchcore
  backbone: wide_resnet50_2
  pre_trained: False
  layers:
    - layer3
  coreset_sampling_ratio: 0.1
  num_neighbors: 9
  normalization_method: min_max # options: [null, min_max, cdf]

metrics:
  image:
    - F1Score
    - AUROC
  pixel:
    - F1Score
    - AUROC
  threshold:
    method: adaptive #options: [adaptive, manual]
    manual_image: null
    manual_pixel: null

visualization:
  show_images: False # show images on the screen
  save_images: True # save images to the file system
  log_images: True # log images to the available loggers (if any)
  image_save_path: patchcore/mvtec/bottle/run/images # path to which images will be saved
  mode: full # options: ["full", "simple"]

project:
  seed: 0
  path: ./results

logging:
  logger: [wandb] # options: [comet, tensorboard, wandb, csv] or combinations.
  log_graph: True  # Logs the model graph to respective logger.

optimization:
  export_mode: null # options: onnx, openvino

# PL Trainer Args. Don't add extra parameter here.
trainer:
  enable_checkpointing: true
  default_root_dir: null
  gradient_clip_val: 0
  gradient_clip_algorithm: norm
  num_nodes: 1
  devices: 1
  enable_progress_bar: true
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1 # Don't validate before extracting features.
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 50
  min_epochs: null
  max_steps: -1
  min_steps: null
  max_time: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  limit_predict_batches: 1.0
  val_check_interval: 1.0 # Don't validate before extracting features.
  log_every_n_steps: 50
  accelerator: auto # <"cpu", "gpu", "tpu", "ipu", "hpu", "auto">
  strategy: null
  sync_batchnorm: false
  precision: 32
  enable_model_summary: true
  num_sanity_val_steps: 0
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_n_epochs: 0
  auto_lr_find: false
  replace_sampler_ddp: true
  detect_anomaly: false
  auto_scale_batch_size: false
  plugins: null
  move_metrics_to_cpu: false
  multiple_trainloader_mode: max_size_cycle

Logs

/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (None)/charset_normalizer (3.1.0) doesn't match a supported version!
  warnings.warn(
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/anomalib/config/config.py:238: UserWarning: The seed value is now fixed to 0. Up to v0.3.7, the seed was not fixed when the seed value was set to 0. If you want to use the random seed, please select `None` for the seed value (`null` in the YAML file) or remove the `seed` key from the YAML file.
  warn(
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/anomalib/config/config.py:275: UserWarning: config.project.unique_dir is set to False. This does not ensure that your results will be written in an empty directory and you may overwrite files.
  warn(
[rank: 0] Global seed set to 0
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (None)/charset_normalizer (3.1.0) doesn't match a supported version!
  warnings.warn(
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/anomalib/config/config.py:238: UserWarning: The seed value is now fixed to 0. Up to v0.3.7, the seed was not fixed when the seed value was set to 0. If you want to use the random seed, please select `None` for the seed value (`null` in the YAML file) or remove the `seed` key from the YAML file.
  warn(
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/anomalib/config/config.py:275: UserWarning: config.project.unique_dir is set to False. This does not ensure that your results will be written in an empty directory and you may overwrite files.
  warn(
[rank: 1] Global seed set to 0
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (None)/charset_normalizer (3.1.0) doesn't match a supported version!
  warnings.warn(
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/requests/__init__.py:109: RequestsDependencyWarning: urllib3 (1.26.14) or chardet (None)/charset_normalizer (3.1.0) doesn't match a supported version!
  warnings.warn(
wandb: Agent Starting Run: tfvdtelj with config:
wandb: 	learning_rate: 0.05701225120472204
wandb: 	model.backbone: wide_resnet50_2
wandb: 	optimizer: adam
wandb: Agent Starting Run: 6edtodhp with config:
wandb: 	learning_rate: 0.0268687292737814
wandb: 	model.backbone: resnet18
wandb: 	optimizer: sgd
Run tfvdtelj errored: AttributeError("'function' object has no attribute 'keys'")
wandb: ERROR Run tfvdtelj errored: AttributeError("'function' object has no attribute 'keys'")
wandb: Currently logged in as: tor-arnth. Use `wandb login --relogin` to force relogin
wandb: WARNING Ignored wandb.init() arg project when running a sweep.
wandb: wandb version 0.14.2 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.14.0
wandb: Run data is saved locally in ./wandb/run-20230410_200952-6edtodhp
wandb: Run `wandb offline` to turn off syncing.
wandb: Agent Starting Run: b65xplws with config:
wandb: 	learning_rate: 0.030693667871486703
wandb: 	model.backbone: resnet18
wandb: 	optimizer: adam
Run b65xplws errored: AttributeError("'function' object has no attribute 'keys'")
wandb: ERROR Run b65xplws errored: AttributeError("'function' object has no attribute 'keys'")
wandb: Resuming run sweepy-sweep-1
wandb: ⭐️ View project at https://wandb.ai/tor-arnth/patchcore_mvtec
wandb: 🧹 View sweep at https://wandb.ai/tor-arnth/patchcore_mvtec/sweeps/l2e38kmv
wandb: 🚀 View run at https://wandb.ai/tor-arnth/patchcore_mvtec/runs/6edtodhp
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `PrecisionRecallCurve` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
FeatureExtractor is deprecated. Use TimmFeatureExtractor instead. Both FeatureExtractor and TimmFeatureExtractor will be removed in a future release.
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(limit_train_batches=1.0)` was configured so 100% of the batches per epoch will be used..
`Trainer(limit_val_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_test_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(limit_predict_batches=1.0)` was configured so 100% of the batches will be used..
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA A30') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
wandb: Sweep Agent: Waiting for job.
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/torchmetrics/utilities/prints.py:36: UserWarning: Metric `ROC` will save all targets and predictions in buffer. For large datasets this may lead to large memory footprint.
  warnings.warn(*args, **kwargs)
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py:183: UserWarning: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer
  rank_zero_warn(

  | Name            | Type                     | Params
-------------------------------------------------------------
0 | image_threshold | AnomalyScoreThreshold    | 0
1 | pixel_threshold | AnomalyScoreThreshold    | 0
2 | model           | PatchcoreModel           | 2.8 M
3 | image_metrics   | AnomalibMetricCollection | 0
4 | pixel_metrics   | AnomalibMetricCollection | 0
-------------------------------------------------------------
2.8 M     Trainable params
0         Non-trainable params
2.8 M     Total params
11.131    Total estimated model params size (MB)
wandb: WARNING Config item 'learning_rate' was locked by 'sweep' (ignored update).
wandb: WARNING Config item 'optimizer' was locked by 'sweep' (ignored update).
SLURM auto-requeueing enabled. Setting signal handlers.
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, train_dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 96 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py:1609: PossibleUserWarning: The number of training batches (7) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
  rank_zero_warn(
/home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 96 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Create sweep with ID: l2e38kmv
Sweep URL: https://wandb.ai/tor-arnth/patchcore_mvtec/sweeps/l2e38kmv
wandb: Job received.
wandb: Agent Starting Run: w5ckatl4 with config:
wandb: 	learning_rate: 0.05802934016250325
wandb: 	model.backbone: resnet18
wandb: 	optimizer: sgd
Create sweep with ID: ya55wjys
Sweep URL: https://wandb.ai/tor-arnth/patchcore_mvtec/sweeps/ya55wjys
Run w5ckatl4 errored: AttributeError("'function' object has no attribute 'keys'")
wandb: ERROR Run w5ckatl4 errored: AttributeError("'function' object has no attribute 'keys'")
Detected 3 failed runs in the first 60 seconds, killing sweep.
wandb: ERROR Detected 3 failed runs in the first 60 seconds, killing sweep.
wandb: To disable this check set WANDB_AGENT_DISABLE_FLAPPING=true
Epoch 0:  60%|██████    | 6/10 [00:08<00:05,  1.42s/it, loss=nan, /home/toap/.conda/envs/anomaliv_env2/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py:138: UserWarning: `training_step` returned `None`. If this was on purpose, ignore this warning...
  self.warning_cache.warn("`training_step` returned `None`. If this was on purpose, ignore this warning...")
Epoch 0: 100%|██████████| 10/10 [00:15<00:00,  1.54s/it, loss=nan, v_num=odhp, pEpoch 0: 100%|██████████| 10/10 [00:15<00:00,  1.54s/it, loss=nan, v_num=odhp, pEpoch 0:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 1:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 1:  10%|█         | 1/10 [00:01<00:13,  1.47s/it, loss=nan, v_num=odhp, piEpoch 1:  10%|█         | 1/10 [00:01<00:13,  1.48s/it, loss=nan, v_num=odhp, piEpoch 1:  20%|██        | 2/10 [00:02<00:08,  1.09s/it, loss=nan, v_num=odhp, piEpoch 1:  20%|██        | 2/10 [00:02<00:08,  1.09s/it, loss=nan, v_num=odhp, piEpoch 1:  30%|███       | 3/10 [00:02<00:06,  1.03it/s, loss=nan, v_num=odhp, piEpoch 1:  30%|███       | 3/10 [00:02<00:06,  1.02it/s, loss=nan, v_num=odhp, piEpoch 1:  40%|████      | 4/10 [00:03<00:05,  1.09it/s, loss=nan, v_num=odhp, piEpoch 1:  40%|████      | 4/10 [00:03<00:05,  1.09it/s, loss=nan, v_num=odhp, piEpoch 1:  50%|█████     | 5/10 [00:04<00:04,  1.14it/s, loss=nan, v_num=odhp, piEpoch 1:  50%|█████     | 5/10 [00:04<00:04,  1.13it/s, loss=nan, v_num=odhp, piEpoch 1:  60%|██████    | 6/10 [00:05<00:03,  1.17it/s, loss=nan, v_num=odhp, piEpoch 1:  60%|██████    | 6/10 [00:05<00:03,  1.17it/s, loss=nan, v_num=odhp, piEpoch 1:  70%|███████   | 7/10 [00:05<00:02,  1.26it/s, loss=nan, v_num=odhp, piEpoch 1:  70%|███████   | 7/10 [00:05<00:02,  1.26it/s, loss=nan, v_num=odhp, piEpoch 1:  80%|████████  | 8/10 [00:13<00:03,  1.70s/it, loss=nan, v_num=odhp, piEpoch 1:  90%|█████████ | 9/10 [00:14<00:01,  1.60s/it, loss=nan, v_num=odhp, piEpoch 1: 100%|██████████| 10/10 [00:14<00:00,  1.48s/it, loss=nan, v_num=odhp, pEpoch 1: 100%|██████████| 10/10 [00:15<00:00,  1.56s/it, loss=nan, v_num=odhp, pEpoch 1: 100%|██████████| 10/10 [00:15<00:00,  1.56s/it, loss=nan, v_num=odhp, pEpoch 1:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 2:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 2:  10%|█         | 1/10 [00:01<00:11,  1.32s/it, loss=nan, v_num=odhp, piEpoch 2:  10%|█         | 1/10 [00:01<00:12,  1.34s/it, loss=nan, v_num=odhp, piEpoch 2:  20%|██        | 2/10 [00:02<00:08,  1.02s/it, loss=nan, v_num=odhp, piEpoch 2:  20%|██        | 2/10 [00:02<00:08,  1.02s/it, loss=nan, v_num=odhp, piEpoch 2:  30%|███       | 3/10 [00:02<00:06,  1.08it/s, loss=nan, v_num=odhp, piEpoch 2:  30%|███       | 3/10 [00:02<00:06,  1.08it/s, loss=nan, v_num=odhp, piEpoch 2:  40%|████      | 4/10 [00:03<00:05,  1.13it/s, loss=nan, v_num=odhp, piEpoch 2:  40%|████      | 4/10 [00:03<00:05,  1.13it/s, loss=nan, v_num=odhp, piEpoch 2:  50%|█████     | 5/10 [00:04<00:04,  1.17it/s, loss=nan, v_num=odhp, piEpoch 2:  50%|█████     | 5/10 [00:04<00:04,  1.17it/s, loss=nan, v_num=odhp, piEpoch 2:  60%|██████    | 6/10 [00:05<00:03,  1.19it/s, loss=nan, v_num=odhp, piEpoch 2:  60%|██████    | 6/10 [00:05<00:03,  1.19it/s, loss=nan, v_num=odhp, piEpoch 2:  70%|███████   | 7/10 [00:05<00:02,  1.29it/s, loss=nan, v_num=odhp, piEpoch 2:  70%|███████   | 7/10 [00:05<00:02,  1.29it/s, loss=nan, v_num=odhp, piEpoch 2:  80%|████████  | 8/10 [00:20<00:05,  2.56s/it, loss=nan, v_num=odhp, piEpoch 2:  90%|█████████ | 9/10 [00:21<00:02,  2.36s/it, loss=nan, v_num=odhp, piEpoch 2: 100%|██████████| 10/10 [00:21<00:00,  2.17s/it, loss=nan, v_num=odhp, pEpoch 2: 100%|██████████| 10/10 [00:22<00:00,  2.26s/it, loss=nan, v_num=odhp, pEpoch 2: 100%|██████████| 10/10 [00:22<00:00,  2.27s/it, loss=nan, v_num=odhp, pEpoch 2:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 3:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 3:  10%|█         | 1/10 [00:01<00:11,  1.33s/it, loss=nan, v_num=odhp, piEpoch 3:  10%|█         | 1/10 [00:01<00:12,  1.34s/it, loss=nan, v_num=odhp, piEpoch 3:  20%|██        | 2/10 [00:02<00:08,  1.02s/it, loss=nan, v_num=odhp, piEpoch 3:  20%|██        | 2/10 [00:02<00:08,  1.02s/it, loss=nan, v_num=odhp, piEpoch 3:  30%|███       | 3/10 [00:02<00:06,  1.08it/s, loss=nan, v_num=odhp, piEpoch 3:  30%|███       | 3/10 [00:02<00:06,  1.08it/s, loss=nan, v_num=odhp, piEpoch 3:  40%|████      | 4/10 [00:03<00:05,  1.14it/s, loss=nan, v_num=odhp, piEpoch 3:  40%|████      | 4/10 [00:03<00:05,  1.14it/s, loss=nan, v_num=odhp, piEpoch 3:  50%|█████     | 5/10 [00:04<00:04,  1.18it/s, loss=nan, v_num=odhp, piEpoch 3:  50%|█████     | 5/10 [00:04<00:04,  1.18it/s, loss=nan, v_num=odhp, piEpoch 3:  60%|██████    | 6/10 [00:04<00:03,  1.20it/s, loss=nan, v_num=odhp, piEpoch 3:  60%|██████    | 6/10 [00:05<00:03,  1.20it/s, loss=nan, v_num=odhp, piEpoch 3:  70%|███████   | 7/10 [00:05<00:02,  1.30it/s, loss=nan, v_num=odhp, piEpoch 3:  70%|███████   | 7/10 [00:05<00:02,  1.30it/s, loss=nan, v_num=odhp, piEpoch 3:  80%|████████  | 8/10 [00:29<00:07,  3.69s/it, loss=nan, v_num=odhp, piEpoch 3:  90%|█████████ | 9/10 [00:30<00:03,  3.38s/it, loss=nan, v_num=odhp, piEpoch 3: 100%|██████████| 10/10 [00:30<00:00,  3.09s/it, loss=nan, v_num=odhp, pEpoch 3: 100%|██████████| 10/10 [00:31<00:00,  3.16s/it, loss=nan, v_num=odhp, pEpoch 3: 100%|██████████| 10/10 [00:31<00:00,  3.16s/it, loss=nan, v_num=odhp, pEpoch 3:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 4:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 4:  10%|█         | 1/10 [00:01<00:12,  1.34s/it, loss=nan, v_num=odhp, piEpoch 4:  10%|█         | 1/10 [00:01<00:12,  1.36s/it, loss=nan, v_num=odhp, piEpoch 4:  20%|██        | 2/10 [00:02<00:08,  1.03s/it, loss=nan, v_num=odhp, piEpoch 4:  20%|██        | 2/10 [00:02<00:08,  1.03s/it, loss=nan, v_num=odhp, piEpoch 4:  30%|███       | 3/10 [00:02<00:06,  1.08it/s, loss=nan, v_num=odhp, piEpoch 4:  30%|███       | 3/10 [00:02<00:06,  1.07it/s, loss=nan, v_num=odhp, piEpoch 4:  40%|████      | 4/10 [00:03<00:05,  1.14it/s, loss=nan, v_num=odhp, piEpoch 4:  40%|████      | 4/10 [00:03<00:05,  1.13it/s, loss=nan, v_num=odhp, piEpoch 4:  50%|█████     | 5/10 [00:04<00:04,  1.17it/s, loss=nan, v_num=odhp, piEpoch 4:  50%|█████     | 5/10 [00:04<00:04,  1.17it/s, loss=nan, v_num=odhp, piEpoch 4:  60%|██████    | 6/10 [00:05<00:03,  1.20it/s, loss=nan, v_num=odhp, piEpoch 4:  60%|██████    | 6/10 [00:05<00:03,  1.20it/s, loss=nan, v_num=odhp, piEpoch 4:  70%|███████   | 7/10 [00:05<00:02,  1.30it/s, loss=nan, v_num=odhp, piEpoch 4:  70%|███████   | 7/10 [00:05<00:02,  1.29it/s, loss=nan, v_num=odhp, piEpoch 4:  80%|████████  | 8/10 [00:42<00:10,  5.29s/it, loss=nan, v_num=odhp, piEpoch 4:  90%|█████████ | 9/10 [00:43<00:04,  4.79s/it, loss=nan, v_num=odhp, piEpoch 4: 100%|██████████| 10/10 [00:43<00:00,  4.35s/it, loss=nan, v_num=odhp, pEpoch 4: 100%|██████████| 10/10 [00:44<00:00,  4.43s/it, loss=nan, v_num=odhp, pEpoch 4: 100%|██████████| 10/10 [00:44<00:00,  4.43s/it, loss=nan, v_num=odhp, pEpoch 4:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 5:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 5:  10%|█         | 1/10 [00:01<00:12,  1.35s/it, loss=nan, v_num=odhp, piEpoch 5:  10%|█         | 1/10 [00:01<00:12,  1.36s/it, loss=nan, v_num=odhp, piEpoch 5:  20%|██        | 2/10 [00:02<00:08,  1.02s/it, loss=nan, v_num=odhp, piEpoch 5:  20%|██        | 2/10 [00:02<00:08,  1.02s/it, loss=nan, v_num=odhp, piEpoch 5:  30%|███       | 3/10 [00:02<00:06,  1.08it/s, loss=nan, v_num=odhp, piEpoch 5:  30%|███       | 3/10 [00:02<00:06,  1.08it/s, loss=nan, v_num=odhp, piEpoch 5:  40%|████      | 4/10 [00:03<00:05,  1.13it/s, loss=nan, v_num=odhp, piEpoch 5:  40%|████      | 4/10 [00:03<00:05,  1.13it/s, loss=nan, v_num=odhp, piEpoch 5:  50%|█████     | 5/10 [00:04<00:04,  1.17it/s, loss=nan, v_num=odhp, piEpoch 5:  50%|█████     | 5/10 [00:04<00:04,  1.17it/s, loss=nan, v_num=odhp, piEpoch 5:  60%|██████    | 6/10 [00:05<00:03,  1.20it/s, loss=nan, v_num=odhp, piEpoch 5:  60%|██████    | 6/10 [00:05<00:03,  1.19it/s, loss=nan, v_num=odhp, piEpoch 5:  70%|███████   | 7/10 [00:05<00:02,  1.29it/s, loss=nan, v_num=odhp, piEpoch 5:  70%|███████   | 7/10 [00:05<00:02,  1.29it/s, loss=nan, v_num=odhp, piEpoch 5:  80%|████████  | 8/10 [00:58<00:14,  7.26s/it, loss=nan, v_num=odhp, piEpoch 5:  90%|█████████ | 9/10 [00:58<00:06,  6.54s/it, loss=nan, v_num=odhp, piEpoch 5: 100%|██████████| 10/10 [00:59<00:00,  5.93s/it, loss=nan, v_num=odhp, pEpoch 5: 100%|██████████| 10/10 [01:00<00:00,  6.01s/it, loss=nan, v_num=odhp, pEpoch 5: 100%|██████████| 10/10 [01:00<00:00,  6.01s/it, loss=nan, v_num=odhp, pEpoch 5:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 6:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 6:  10%|█         | 1/10 [00:01<00:14,  1.58s/it, loss=nan, v_num=odhp, piEpoch 6:  10%|█         | 1/10 [00:01<00:14,  1.58s/it, loss=nan, v_num=odhp, piEpoch 6:  20%|██        | 2/10 [00:02<00:09,  1.14s/it, loss=nan, v_num=odhp, piEpoch 6:  20%|██        | 2/10 [00:02<00:09,  1.14s/it, loss=nan, v_num=odhp, piEpoch 6:  30%|███       | 3/10 [00:03<00:07,  1.01s/it, loss=nan, v_num=odhp, piEpoch 6:  30%|███       | 3/10 [00:03<00:07,  1.01s/it, loss=nan, v_num=odhp, piEpoch 6:  40%|████      | 4/10 [00:03<00:05,  1.06it/s, loss=nan, v_num=odhp, piEpoch 6:  40%|████      | 4/10 [00:03<00:05,  1.06it/s, loss=nan, v_num=odhp, piEpoch 6:  50%|█████     | 5/10 [00:04<00:04,  1.11it/s, loss=nan, v_num=odhp, piEpoch 6:  50%|█████     | 5/10 [00:04<00:04,  1.11it/s, loss=nan, v_num=odhp, piEpoch 6:  60%|██████    | 6/10 [00:05<00:03,  1.14it/s, loss=nan, v_num=odhp, piEpoch 6:  60%|██████    | 6/10 [00:05<00:03,  1.14it/s, loss=nan, v_num=odhp, piEpoch 6:  70%|███████   | 7/10 [00:05<00:02,  1.24it/s, loss=nan, v_num=odhp, piEpoch 6:  70%|███████   | 7/10 [00:05<00:02,  1.24it/s, loss=nan, v_num=odhp, piEpoch 6:  80%|████████  | 8/10 [01:16<00:19,  9.56s/it, loss=nan, v_num=odhp, piEpoch 6:  90%|█████████ | 9/10 [01:17<00:08,  8.58s/it, loss=nan, v_num=odhp, piEpoch 6: 100%|██████████| 10/10 [01:17<00:00,  7.77s/it, loss=nan, v_num=odhp, pEpoch 6: 100%|██████████| 10/10 [01:18<00:00,  7.85s/it, loss=nan, v_num=odhp, pEpoch 6: 100%|██████████| 10/10 [01:18<00:00,  7.85s/it, loss=nan, v_num=odhp, pEpoch 6:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 7:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 7:  10%|█         | 1/10 [00:01<00:12,  1.40s/it, loss=nan, v_num=odhp, piEpoch 7:  10%|█         | 1/10 [00:01<00:12,  1.41s/it, loss=nan, v_num=odhp, piEpoch 7:  20%|██        | 2/10 [00:02<00:08,  1.06s/it, loss=nan, v_num=odhp, piEpoch 7:  20%|██        | 2/10 [00:02<00:08,  1.06s/it, loss=nan, v_num=odhp, piEpoch 7:  30%|███       | 3/10 [00:02<00:06,  1.05it/s, loss=nan, v_num=odhp, piEpoch 7:  30%|███       | 3/10 [00:02<00:06,  1.05it/s, loss=nan, v_num=odhp, piEpoch 7:  40%|████      | 4/10 [00:03<00:05,  1.11it/s, loss=nan, v_num=odhp, piEpoch 7:  40%|████      | 4/10 [00:03<00:05,  1.11it/s, loss=nan, v_num=odhp, piEpoch 7:  50%|█████     | 5/10 [00:04<00:04,  1.15it/s, loss=nan, v_num=odhp, piEpoch 7:  50%|█████     | 5/10 [00:04<00:04,  1.15it/s, loss=nan, v_num=odhp, piEpoch 7:  60%|██████    | 6/10 [00:05<00:03,  1.18it/s, loss=nan, v_num=odhp, piEpoch 7:  60%|██████    | 6/10 [00:05<00:03,  1.17it/s, loss=nan, v_num=odhp, piEpoch 7:  70%|███████   | 7/10 [00:05<00:02,  1.27it/s, loss=nan, v_num=odhp, piEpoch 7:  70%|███████   | 7/10 [00:05<00:02,  1.27it/s, loss=nan, v_num=odhp, piEpoch 7:  80%|████████  | 8/10 [01:36<00:24, 12.12s/it, loss=nan, v_num=odhp, piEpoch 7:  90%|█████████ | 9/10 [01:37<00:10, 10.89s/it, loss=nan, v_num=odhp, piEpoch 7: 100%|██████████| 10/10 [01:38<00:00,  9.84s/it, loss=nan, v_num=odhp, pEpoch 7: 100%|██████████| 10/10 [01:39<00:00,  9.92s/it, loss=nan, v_num=odhp, pEpoch 7: 100%|██████████| 10/10 [01:39<00:00,  9.92s/it, loss=nan, v_num=odhp, pEpoch 7:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 8:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 8:  10%|█         | 1/10 [00:01<00:12,  1.36s/it, loss=nan, v_num=odhp, piEpoch 8:  10%|█         | 1/10 [00:01<00:12,  1.36s/it, loss=nan, v_num=odhp, piEpoch 8:  20%|██        | 2/10 [00:02<00:08,  1.04s/it, loss=nan, v_num=odhp, piEpoch 8:  20%|██        | 2/10 [00:02<00:08,  1.04s/it, loss=nan, v_num=odhp, piEpoch 8:  30%|███       | 3/10 [00:03<00:07,  1.02s/it, loss=nan, v_num=odhp, piEpoch 8:  30%|███       | 3/10 [00:03<00:07,  1.02s/it, loss=nan, v_num=odhp, piEpoch 8:  40%|████      | 4/10 [00:03<00:05,  1.05it/s, loss=nan, v_num=odhp, piEpoch 8:  40%|████      | 4/10 [00:03<00:05,  1.05it/s, loss=nan, v_num=odhp, piEpoch 8:  50%|█████     | 5/10 [00:04<00:04,  1.04it/s, loss=nan, v_num=odhp, piEpoch 8:  50%|█████     | 5/10 [00:04<00:04,  1.04it/s, loss=nan, v_num=odhp, piEpoch 8:  60%|██████    | 6/10 [00:05<00:03,  1.08it/s, loss=nan, v_num=odhp, piEpoch 8:  60%|██████    | 6/10 [00:05<00:03,  1.08it/s, loss=nan, v_num=odhp, piEpoch 8:  70%|███████   | 7/10 [00:05<00:02,  1.17it/s, loss=nan, v_num=odhp, piEpoch 8:  70%|███████   | 7/10 [00:05<00:02,  1.17it/s, loss=nan, v_num=odhp, piEpoch 8:  80%|████████  | 8/10 [02:00<00:30, 15.07s/it, loss=nan, v_num=odhp, piEpoch 8:  90%|█████████ | 9/10 [02:01<00:13, 13.50s/it, loss=nan, v_num=odhp, piEpoch 8: 100%|██████████| 10/10 [02:01<00:00, 12.19s/it, loss=nan, v_num=odhp, pEpoch 8: 100%|██████████| 10/10 [02:02<00:00, 12.27s/it, loss=nan, v_num=odhp, pEpoch 8: 100%|██████████| 10/10 [02:02<00:00, 12.27s/it, loss=nan, v_num=odhp, pEpoch 8:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 9:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 9:  10%|█         | 1/10 [00:01<00:12,  1.36s/it, loss=nan, v_num=odhp, piEpoch 9:  10%|█         | 1/10 [00:01<00:12,  1.38s/it, loss=nan, v_num=odhp, piEpoch 9:  20%|██        | 2/10 [00:02<00:09,  1.15s/it, loss=nan, v_num=odhp, piEpoch 9:  20%|██        | 2/10 [00:02<00:09,  1.15s/it, loss=nan, v_num=odhp, piEpoch 9:  30%|███       | 3/10 [00:03<00:07,  1.02s/it, loss=nan, v_num=odhp, piEpoch 9:  30%|███       | 3/10 [00:03<00:07,  1.02s/it, loss=nan, v_num=odhp, piEpoch 9:  40%|████      | 4/10 [00:04<00:06,  1.01s/it, loss=nan, v_num=odhp, piEpoch 9:  40%|████      | 4/10 [00:04<00:06,  1.01s/it, loss=nan, v_num=odhp, piEpoch 9:  50%|█████     | 5/10 [00:04<00:04,  1.04it/s, loss=nan, v_num=odhp, piEpoch 9:  50%|█████     | 5/10 [00:04<00:04,  1.04it/s, loss=nan, v_num=odhp, piEpoch 9:  60%|██████    | 6/10 [00:05<00:03,  1.08it/s, loss=nan, v_num=odhp, piEpoch 9:  60%|██████    | 6/10 [00:05<00:03,  1.07it/s, loss=nan, v_num=odhp, piEpoch 9:  70%|███████   | 7/10 [00:06<00:02,  1.12it/s, loss=nan, v_num=odhp, piEpoch 9:  70%|███████   | 7/10 [00:06<00:02,  1.12it/s, loss=nan, v_num=odhp, piEpoch 9:  80%|████████  | 8/10 [02:28<00:37, 18.58s/it, loss=nan, v_num=odhp, piEpoch 9:  90%|█████████ | 9/10 [02:29<00:16, 16.61s/it, loss=nan, v_num=odhp, piEpoch 9: 100%|██████████| 10/10 [02:30<00:00, 15.01s/it, loss=nan, v_num=odhp, pEpoch 9: 100%|██████████| 10/10 [02:30<00:00, 15.09s/it, loss=nan, v_num=odhp, pEpoch 9: 100%|██████████| 10/10 [02:30<00:00, 15.09s/it, loss=nan, v_num=odhp, pEpoch 9:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1ScEpoch 10:   0%|          | 0/10 [00:00<?, ?it/s, loss=nan, v_num=odhp, pixel_F1SEpoch 10:  10%|█         | 1/10 [00:01<00:12,  1.38s/it, loss=nan, v_num=odhp, pEpoch 10:  10%|█         | 1/10 [00:01<00:12,  1.40s/it, loss=nan, v_num=odhp, pEpoch 10:  20%|██        | 2/10 [00:02<00:08,  1.06s/it, loss=nan, v_num=odhp, pEpoch 10:  20%|██        | 2/10 [00:02<00:08,  1.06s/it, loss=nan, v_num=odhp, pEpoch 10:  30%|███       | 3/10 [00:02<00:06,  1.05it/s, loss=nan, v_num=odhp, pEpoch 10:  30%|███       | 3/10 [00:02<00:06,  1.05it/s, loss=nan, v_num=odhp, pEpoch 10:  40%|████      | 4/10 [00:03<00:05,  1.10it/s, loss=nan, v_num=odhp, pEpoch 10:  40%|████      | 4/10 [00:03<00:05,  1.10it/s, loss=nan, v_num=odhp, pEpoch 10:  50%|█████     | 5/10 [00:04<00:04,  1.14it/s, loss=nan, v_num=odhp, pEpoch 10:  50%|█████     | 5/10 [00:04<00:04,  1.14it/s, loss=nan, v_num=odhp, pEpoch 10:  60%|██████    | 6/10 [00:05<00:03,  1.17it/s, loss=nan, v_num=odhp, pEpoch 10:  60%|██████    | 6/10 [00:05<00:03,  1.17it/s, loss=nan, v_num=odhp, pEpoch 10:  70%|███████   | 7/10 [00:05<00:02,  1.26it/s, loss=nan, v_num=odhp, pEpoch 10:  70%|███████   | 7/10 [00:05<00:02,  1.26it/s, loss=nan, v_num=odhp, pixel_F1Score=0.414, pixel_AUROC=0.798]

Code of Conduct

I agree to follow this project's Code of Conduct

Answered by alexriedel1

Apr 13, 2023

Thank you for the input Alex do you mind elaborating a bit? If I don't use a pre-trained model and want to train the CNN responsible for extracting mid-level-features from the 'backbone', learning rate/optimizers wont have an effect because of what?

Which hyperparameters can I tweak to expereiment? Is it these: input_size: tuple[int, int], layers: list[str], backbone: str = "wide_resnet50_2", pre_trained: bool = True, num_neighbors: int = 9, from the documentation (https://openvinotoolkit.github.io/anomalib/_modules/anomalib/models/patchcore/torch_model.html#PatchcoreModel.forward)

Appologies for my ignorance.

You will always use pre-trained models because patchcore only extracts featur…

View full answer

TorAP · 2023-04-11T11:24:45Z

TorAP
Apr 11, 2023
Author

Please tell me if you need more information

0 replies

alexriedel1 · 2023-04-12T10:18:29Z

alexriedel1
Apr 12, 2023

Besides the error you are getting please be aware that changing the learning rate or optimizer wont' have any impact on patchcores performance

0 replies

TorAP · 2023-04-13T07:31:14Z

TorAP
Apr 13, 2023
Author

Thank you for the input Alex do you mind elaborating a bit? If I don't use a pre-trained model and want to train the CNN responsible for extracting mid-level-features from the 'backbone', learning rate/optimizers wont have an effect because of what?

Which hyperparameters can I tweak to expereiment? Is it these:
input_size: tuple[int, int], layers: list[str], backbone: str = "wide_resnet50_2", pre_trained: bool = True, num_neighbors: int = 9, from the documentation (https://openvinotoolkit.github.io/anomalib/_modules/anomalib/models/patchcore/torch_model.html#PatchcoreModel.forward)

Appologies for my ignorance.

0 replies

alexriedel1 · 2023-04-13T07:51:36Z

alexriedel1
Apr 13, 2023

Thank you for the input Alex do you mind elaborating a bit? If I don't use a pre-trained model and want to train the CNN responsible for extracting mid-level-features from the 'backbone', learning rate/optimizers wont have an effect because of what?

Which hyperparameters can I tweak to expereiment? Is it these: input_size: tuple[int, int], layers: list[str], backbone: str = "wide_resnet50_2", pre_trained: bool = True, num_neighbors: int = 9, from the documentation (https://openvinotoolkit.github.io/anomalib/_modules/anomalib/models/patchcore/torch_model.html#PatchcoreModel.forward)

Appologies for my ignorance.

You will always use pre-trained models because patchcore only extracts features and stores them without any weight updates. You can train a new model for feature extraction but this isnt the point of patchcore and using the anomalib training code wont do that.

Changing the backbone model and the layers might have an impact on the performance, input image size might might have as well.

Maybe reading the original Paper will make things clearer to you

0 replies

TorAP · 2023-04-13T09:14:21Z

TorAP
Apr 13, 2023
Author

I've read the paper, but what confuses me is what an "epoch" entails if we're not training a neural network?

0 replies

alexriedel1 · 2023-04-13T09:17:04Z

alexriedel1
Apr 13, 2023

I've read the paper, but what confuses me is what an "epoch" entails if we're not training a neural network?

An epoch means going through all training images

0 replies

TorAP · 2023-04-13T09:56:07Z

TorAP
Apr 13, 2023
Author

But why would the performance then change per epoch?

0 replies

alexriedel1 · 2023-04-13T11:13:13Z

alexriedel1
Apr 13, 2023

But why would the performance then change per epoch?

It doesn't or are you seeing something else?

0 replies

TorAP · 2023-04-13T13:03:31Z

TorAP
Apr 13, 2023
Author

But why would the performance then change per epoch?

It doesn't or are you seeing something else?

The performance increases until epoch 4 and then declines until epoch 7 where it is static until epoch 50. Which is somewhat the same I see for the 20-25 runs that I have done.

Another question, what is the different between a pre-trained model and a non-pretrained model then?
As this is a prameters I can set for the model:
model: name: patchcore backbone: wide_resnet50_2 pre_trained: true layers: - layer2 - layer3 coreset_sampling_ratio: 0.1 num_neighbors: 9 normalization_method: min_max # options: [null, min_max, cdf]

If we in fact dont have a Neural Network? Does it simple mean that the "memory-bank" is constructed and I dont do the Corset subsampling step to create the memory-bank?

0 replies

alexriedel1 · 2023-04-13T17:37:04Z

alexriedel1
Apr 13, 2023

You have a neural network which is e.g. wide_resnet50_2 trained on imagenet. You extract image features from the layers you specify and store them in a memory-bank coreset

0 replies

TorAP · 2023-04-13T20:12:54Z

TorAP
Apr 13, 2023
Author

Can you comment on this "The performance increases until epoch 4 and then declines until epoch 7 where it is static until epoch 50. Which is somewhat the same I see for the 20-25 runs that I have done."

Is it because the train-part and the test-part run parallel?

0 replies

alexriedel1 · 2023-04-14T07:25:16Z

alexriedel1
Apr 14, 2023

Can you comment on this "The performance increases until epoch 4 and then declines until epoch 7 where it is static until epoch 50. Which is somewhat the same I see for the 20-25 runs that I have done."

Is it because the train-part and the test-part run parallel?

I dont know exactly why this happens. On what scale is the accuracy improving across the epochs? I guess using pre_trained: False will not get you close to the original results?

0 replies

TorAP · 2023-04-14T09:10:52Z

TorAP
Apr 14, 2023
Author

Is it a 0.01 variation..

0 replies

alexriedel1 · 2023-04-14T09:20:31Z

alexriedel1
Apr 14, 2023

Is it a 0.01 variation..

ok this might be due to some randomness that is used somewhere in the process.

0 replies

TorAP · 2023-04-15T10:12:01Z

TorAP
Apr 15, 2023
Author

Can you comment on this "The performance increases until epoch 4 and then declines until epoch 7 where it is static until epoch 50. Which is somewhat the same I see for the 20-25 runs that I have done."
Is it because the train-part and the test-part run parallel?

I dont know exactly why this happens. On what scale is the accuracy improving across the epochs? I guess using pre_trained: False will not get you close to the original results?

What does "pre-trained" mean in this context? What exactly is pre-trained?

0 replies

alexriedel1 · 2023-04-15T10:44:53Z

alexriedel1
Apr 15, 2023

Can you comment on this "The performance increases until epoch 4 and then declines until epoch 7 where it is static until epoch 50. Which is somewhat the same I see for the 20-25 runs that I have done."
Is it because the train-part and the test-part run parallel?

I dont know exactly why this happens. On what scale is the accuracy improving across the epochs? I guess using pre_trained: False will not get you close to the original results?

What does "pre-trained" mean in this context? What exactly is pre-trained?

Trained on imagenet. Please try to read and understand the patchcore Paper, read the padim and some referenced papers as well..

0 replies

samet-akcay · 2023-08-15T14:38:43Z

samet-akcay
Aug 15, 2023
Maintainer

with the latest anomalib version I cannot reproduce this error. I am able to run the sweep using your config file. However, as @alexriedel1 mentioned there is no need for running this sweep for patchcore.

I'm moving this to Q&A since the discussion here is important and could be useful to other users.

0 replies

PatchCore Training #1276

Uh oh!

Uh oh!

Describe the bug

Dataset

Model

Steps to reproduce the behavior

Expected behavior

Screenshots

Pip/GitHub

What version/branch did you use?

Configuration YAML

Logs

Code of Conduct

Replies: 17 comments

Uh oh!

TorAP Apr 11, 2023 Author

Uh oh!

Uh oh!

Uh oh!

TorAP Apr 13, 2023 Author

Uh oh!

Uh oh!

TorAP Apr 13, 2023 Author

Uh oh!

Uh oh!

TorAP Apr 13, 2023 Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

TorAP Apr 13, 2023 Author

Uh oh!

Uh oh!

Uh oh!

TorAP Apr 13, 2023 Author

Uh oh!

Uh oh!

TorAP Apr 14, 2023 Author

Uh oh!

Uh oh!

TorAP Apr 15, 2023 Author

Uh oh!

Uh oh!

samet-akcay Aug 15, 2023 Maintainer

TorAP
Apr 11, 2023
Author

TorAP
Apr 13, 2023
Author

TorAP
Apr 13, 2023
Author

TorAP
Apr 13, 2023
Author

TorAP
Apr 13, 2023
Author

TorAP
Apr 13, 2023
Author

TorAP
Apr 14, 2023
Author

TorAP
Apr 15, 2023
Author

samet-akcay
Aug 15, 2023
Maintainer