Description
Dear micro-sam developers,
I'm Ziqiang Huang, an image analyst at the Imaging Centre at EMBL Heidelberg. I'm trying to use micro-sam to segment single-channel fluorescent images of a particular cell type that has a microglia-like morphology and is quite irregular in shape and size.
I created a set of training data: 54 images in total, all 650 by 650 pixel 2D images (originally 16-bit, converted to 8-bit). I created the corresponding masks with the micro-sam 2D annotator.
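In case it matters, this is roughly how I converted the 16-bit images to 8-bit (a minimal sketch; the exact normalization I used may have differed slightly):

```python
import numpy as np

def to_uint8(image):
    """Min-max normalize a 16-bit image and rescale it to the 8-bit range."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    image = (image - lo) / max(hi - lo, 1e-6)
    return (image * 255).astype(np.uint8)
```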
I'm using the finetune_hela.py script from the finetuning example to train my own model. However, even though I tried to fix a few places myself, it still doesn't work. Am I right that some places in the code require 8-bit images, and that some places need all the input data to have the same size?
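To rule out the data itself, I run a quick consistency check over my image/mask pairs before training (a sketch; it assumes the masks use the same file names as the images, and the folder paths are placeholders for my local data):

```python
import os
import imageio.v3 as imageio
import numpy as np

image_dir = "path/to/my/8bit/images"      # placeholder
segmentation_dir = "path/to/my/masks"     # placeholder

for name in sorted(os.listdir(image_dir)):
    image = imageio.imread(os.path.join(image_dir, name))
    mask = imageio.imread(os.path.join(segmentation_dir, name))
    # Every image should be 8-bit and 650 x 650, and the mask should match the
    # image shape and contain at least one object (ids > 0 on background 0).
    assert image.dtype == np.uint8, f"{name}: dtype is {image.dtype}"
    assert image.shape == (650, 650), f"{name}: shape is {image.shape}"
    assert mask.shape == image.shape, f"{name}: mask shape is {mask.shape}"
    assert mask.max() > 0, f"{name}: mask contains no objects"
```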
I changed the following in finetune_hela.py (a consolidated sketch of these edits is shown after the list):
lines 34-35: set image_dir and segmentation_dir to my input data folder and mask folder;
lines 54 and 56: roi = np.s_[:30, :, :] and roi = np.s_[30:, :, :]
line 83: patch_shape = (1, 650, 650) # is this necessary?
line 84: n_objects_per_batch = 1 # the number of objects varies per input image, but the majority have 1; I tried both keeping all objects and removing all but 1, and neither worked
line 124: model_type = "vit_b_lm"
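Putting those edits together (everything else is unchanged from the example script; I renamed the two roi assignments here only so the sketch can show both at once, and the folder paths are placeholders):

```python
import numpy as np

# lines 34-35: point the script at my own data
image_dir = "path/to/my/8bit/images"
segmentation_dir = "path/to/my/masks"

# lines 54 and 56: first 30 images for training, the remaining 24 for validation
roi_train = np.s_[:30, :, :]
roi_val = np.s_[30:, :, :]

# line 83: a single 2D patch covering the whole 650 x 650 image
patch_shape = (1, 650, 650)

# line 84: sample one object per batch
n_objects_per_batch = 1

# line 124: start from the light microscopy generalist model
model_type = "vit_b_lm"
```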
I set up the environment correctly following the instructions and can use the CUDA device successfully.
When running the script, training initially starts fine, but the reporting during training looks wrong: "current metric" is always 'nan' and 'best metric' is always 'inf'.
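To check whether the nan could come from the data the loaders produce, I also looked at one batch from each loader (a sketch; train_loader and val_loader stand for the loaders the script builds, and I'm assuming they yield (image, label) pairs as torch tensors):

```python
import torch

def check_loader(name, loader):
    # Pull one batch and report shapes, dtypes, NaNs and label ids,
    # to see whether the nan metric could come from the data itself.
    x, y = next(iter(loader))
    print(name, tuple(x.shape), tuple(y.shape), x.dtype, y.dtype)
    print("  image contains NaN:", bool(torch.isnan(x.float()).any()))
    print("  label ids in batch:", torch.unique(y).tolist()[:10])

# Usage (sketch):
# check_loader("train", train_loader)
# check_loader("val", val_loader)
```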
I pasted more detailed output from the console here, for your reference:
(sam) C:\Users\...\micro-sam\examples\finetuning>python -m finetune_hela
Verifying labels in 'train' dataloader: 36%|██████████████████████████████████████▉ | 18/50 [00:24<00:43, 1.36s/it]
Verifying labels in 'val' dataloader: 72%|███████████████████████████████████████████████████████████████████████████████▏ | 36/50 [00:25<00:09, 1.44it/s]
Start fitting for 1800 iterations / 100 epochs
with 18 iterations per epoch
Training with mixed precision
Epoch 11: average [s/it]: 39.558808, current metric: nan, best metric: inf: 12%|███████▋ | 216/1800 [4:52:37<16:46:31, 38.13s/it]Stopping training because there has been no improvement for 10 epochs
Finished training after 11 epochs / 216 iterations.
The best epoch is number 0.
Epoch 11: average [s/it]: 39.558808, current metric: nan, best metric: inf: 12%|███████▋ | 216/1800 [5:02:29<36:58:18, 84.03s/it]
Training took 18249.626536607742 seconds (= 05:304:10 hours)
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "C:\Users\...\micro-sam\examples\finetuning\finetune_hela.py", line 141, in <module>
main()
File "C:\Users\...\micro-sam\examples\finetuning\finetune_hela.py", line 137, in main
export_model(checkpoint_name, model_type)
File "C:\Users\...\micro-sam\examples\finetuning\finetune_hela.py", line 113, in export_model
export_custom_sam_model(
File "C:\Users\...\micro-sam\micro_sam\util.py", line 508, in export_custom_sam_model
_, state = get_sam_model(
^^^^^^^^^^^^^^
File "C:\Users\...\micro-sam\micro_sam\util.py", line 376, in get_sam_model
raise ValueError(f"Checkpoint at {checkpoint_path} could not be found.")
ValueError: Checkpoint at checkpoints\sam_silvia3\best.pt could not be found.
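After training finished, I also checked whether any checkpoint had been written at all (a quick sketch, assuming the same checkpoint_name "sam_silvia3" that I set in my copy of the script):

```python
from pathlib import Path

# List any checkpoint files that training wrote for this run.
checkpoint_dir = Path("checkpoints") / "sam_silvia3"
print(sorted(p.name for p in checkpoint_dir.glob("*.pt")))
```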
I attached an image and mask pair for your reference: