Why is the loss value high in Anomalib's FastFlow, DRAEM, and other models that need to be trained for many epochs? #1026
Unanswered
laogonggong847 asked this question in Q&A
Replies: 2 comments 13 replies
-
You should not care about the absolute loss value but only about your image AUROC and F1, which are both 1.0, meaning that each of your test images was predicted correctly.
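For context: in the 0.x-style anomalib YAML configs, the image-level AUROC and F1 referred to here are usually configured in a metrics section roughly like the sketch below. Key names vary between anomalib versions, so treat this as an assumption rather than the poster's actual config.

```yaml
# Illustrative sketch only -- metric key names differ across anomalib versions.
metrics:
  image:        # image-level scores: the values that matter for this question
    - F1Score
    - AUROC
  pixel:        # pixel-level scores: only meaningful when ground-truth masks exist
    - F1Score
    - AUROC
```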
-
I agree with @alexriedel1. Your performance scores 100%; I guess it cannot get any better than that. Instead of looking at the loss value, you could perhaps observe how it decreases over time during training?
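One way to watch the loss over time is to enable a logger in the config. A minimal sketch, assuming a 0.x-style anomalib config with a logging section; the option names and the output directory may differ in your version.

```yaml
# Illustrative sketch only -- logger options vary per anomalib version.
logging:
  logger: [tensorboard]   # write training curves, including the training loss, to TensorBoard
  log_graph: false
# Then inspect the curves afterwards, e.g.:
#   tensorboard --logdir results/
```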
-
Describe the bug
The same error occurs when using Anomalib's FastFlow, DRAEM, and other models that need to be trained for many epochs.
Dataset
Other (please specify in the text field below)
Model
FastFlow
Steps to reproduce the behavior
OS information
Expected behavior
Many thanks to the authors for open-sourcing this great library, Anomalib. I think it is a milestone in the defect detection field; it is great work, and congratulations to them on the result.
I have successfully trained on my own dataset using PaDiM, PatchCore, etc., and have achieved good results.
But unfortunately, I'm getting a lot of the same errors when training on my own data with FastFlow and DRAEM, models that need to iterate through many updates.
Below I will use FastFlow's logs to illustrate.
I made changes to FastFlow's config.yaml, but only to the dataset section, as follows:
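The actual dataset section from the original post is not reproduced here. As a rough illustration only, a custom folder dataset section in the 0.x-style configs tends to look like the sketch below; every path, directory name, and key is a placeholder or assumption, not the poster's real configuration.

```yaml
# Illustrative sketch only -- paths and key names are placeholders.
dataset:
  name: my_dataset
  format: folder
  path: ./datasets/my_dataset
  normal_dir: good          # defect-free images
  abnormal_dir: defect      # defective images
  task: classification      # image-level task when no masks are available
  image_size: 256
  train_batch_size: 32
  test_batch_size: 32
  num_workers: 8
```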
Then I happily picked up my coffee and prepared to wait for the 500 epochs I had set for training. But it runs for a while and then reports an error directly:
So I followed the instructions, found the corresponding place in the YAML file, and made the change (changed pixel_AUROC to image_AUROC in the metric setting).
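The setting being changed is presumably the early-stopping monitor inside the model section. A minimal sketch of what that block typically looks like in the 0.x FastFlow config, with the described change applied; the patience value shown is the usual default in those configs and is an assumption here. Note that a small patience also makes training stop after only a few epochs once the monitored metric plateaus.

```yaml
# Illustrative sketch only -- based on the 0.x-style FastFlow config.
model:
  name: fastflow
  early_stopping:
    patience: 3
    metric: image_AUROC   # changed from pixel_AUROC, as described above
    mode: max
```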
But FastFlow only trained for three epochs before stopping, and its loss is still very high (whereas after the same modification, DRAEM trained for more than 40 epochs and its loss dropped to about 0.17).
I think it is extremely unreasonable that its loss is still around 7.64e+04 at the end of training.
Screenshots
No response
Pip/GitHub
pip
What version/branch did you use?
No response
Configuration YAML
.
Logs
Code of Conduct