Why does mobilenetv4_conv_small model training fail to converge? #2651
jiangxiangchuan asked this question in Q&A · Unanswered
Replies: 1 comment
Really can't say. Worth pointing out that the appropriate hparams and performance for any given task are tied to the dataset, so it's not possible to provide much useful insight without that. I'd first see if it works better with a less fussy model like resnet18/34, or with more standard hparams (these were unusual hparams compared to most, though they worked surprisingly well for ImageNet pretraining).
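A minimal sanity check along those lines, using timm's `train.py` with more conventional hyperparameters — the values here are illustrative defaults, not a tuned recommendation, and the dataset path is a placeholder:

```shell
# Sanity check: swap in resnet18 with conventional hyperparameters.
# Flags are standard timm train.py options; path/to/dataset is a placeholder.
python train.py path/to/dataset \
  --model resnet18 \
  --num-classes 4 \
  --batch-size 128 \
  --opt adamw \
  --lr 1e-3 \
  --weight-decay 0.01 \
  --sched cosine \
  --epochs 100 \
  --warmup-epochs 5 \
  --smoothing 0.1
```

If resnet18 trains fine under these settings but mobilenetv4_conv_small still diverges, the hparams are the more likely culprit than the data pipeline.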
I changed the head of the mobilenetv4_conv_small model to four heads and used my own dataset class. The training set has 4800 images and the val set has 1200 images. Training failed to converge; the training log is below:

My customized model class is defined as below:
```python
# Multi-output classification model
class MultiOutputMobileNet(nn.Module):
    def __init__(self, backbone, num_outputs, num_classes_per_output, pretrained=True):
        super().__init__()
        # 1. Load the backbone correctly (keeping the original model's full
        #    feature-extraction + pooling logic)
        self.backbone = create_model(
            "mobilenetv4_conv_small",
            pretrained=pretrained,
            num_classes=0,  # key: num_classes=0 -> model returns pooled features (no classifier head)
        )
        # 2. Get the backbone output width (timm attribute, officially provided)
        self.backbone_out_features = self.backbone.head_hidden_size  # mobilenetv4_conv_small: num_features=768 (not 576!)
        # 3. One linear head per output (completed here for runnability; the
        #    original snippet ended above)
        self.heads = nn.ModuleList(
            nn.Linear(self.backbone_out_features, n) for n in num_classes_per_output
        )

    def forward(self, x):
        feats = self.backbone(x)  # (B, backbone_out_features)
        return [head(feats) for head in self.heads]
```
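For reference, a multi-head setup like this is usually trained by summing one cross-entropy loss per head. A minimal sketch with plain tensors — the batch size and per-head class counts here are made up for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical: four heads with different class counts, batch of 8.
num_classes_per_output = (3, 5, 2, 4)
logits = [torch.randn(8, c) for c in num_classes_per_output]      # one logit tensor per head
targets = [torch.randint(0, c, (8,)) for c in num_classes_per_output]

# Total loss is the sum (or mean) of the per-head cross-entropies.
loss = sum(F.cross_entropy(lg, tg) for lg, tg in zip(logits, targets))
loss_value = float(loss)
```

With labels for all heads available every step, summing the per-head losses is the simplest aggregation; if one head dominates, per-head weights can be introduced.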
Data loading works fine. The parameter configuration is as below:
```yaml
aa: rand-m8-inc1-mstd1.0
amp: true
amp_dtype: float16
amp_impl: native
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_pos_weight: null
bce_sum: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 10
class_map: ''
clip_grad: null
clip_mode: norm
color_jitter: 0.4
color_jitter_prob: null
cooldown_epochs: 0
crop_pct: null
cutmix: 0.0
cutmix_minmax: null
data:
data_dir: F:\引线数据\引线颜色检测样本\20260115
dataset: ''
dataset_download: false
decay_epochs: 90
decay_milestones:
decay_rate: 0.1
device: cuda
device_modules: null
dist_bn: reduce
drop: 0.25
drop_block: null
drop_connect: null
drop_path: null
epoch_repeats: 0.0
epochs: 2400
eval_metric: top1
experiment: ''
fast_norm: false
fuser: ''
gaussian_blur_prob: 0.05
gp: null
grad_accum_steps: 1
grad_checkpointing: false
grayscale_prob: 0.1
head_init_bias: null
head_init_scale: null
hflip: 0.5
img_size: null
in_chans: null
initial_checkpoint: ''
input_img_mode: null
input_key: null
input_size:
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.002
lr_base_scale: ''
lr_base_size: 4096
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 0.0
mixup: 0.0
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mobilenetv4_conv_small
model_ema: true
model_ema_decay: 0.99995
model_ema_force_cpu: false
model_ema_warmup: true
model_kwargs: {}
momentum: 0.9
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
num_classes: 4
opt: adamw
opt_betas:
opt_eps: null
opt_kwargs: {}
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
pretrained_path: null
ratio:
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
sched: cosine
sched_on_updates: true
seed: 42
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
synchronize_step: false
target_key: null
torchcompile: null
torchscript: false
train_crop_mode: null
train_interpolation: random
train_num_samples: null
train_split: train.csv
tta: 0
use_multi_epochs_loader: false
val_num_samples: null
val_split: val.csv
validation_batch_size: null
vflip: 0.0
warmup_epochs: 5
warmup_lr: 0.0
warmup_prefix: true
weight_decay: 0.06
workers: 4
worker_seeding: all
```
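One thing worth double-checking in this config: `lr` is null, so (if I recall timm's `train.py` behavior correctly) the effective learning rate is derived from `lr_base`, `lr_base_size`, and the global batch size, with square-root scaling chosen automatically for adaptive optimizers such as adamw. A sketch of that resolution under those assumptions:

```python
import math

def resolve_lr(lr_base, lr_base_size, global_batch_size, opt, lr_base_scale=""):
    """Approximate timm train.py lr resolution when --lr is not set (assumption)."""
    if not lr_base_scale:
        # timm picks sqrt scaling for adaptive optimizers (names containing 'ada' or 'lamb')
        lr_base_scale = "sqrt" if any(s in opt.lower() for s in ("ada", "lamb")) else "linear"
    ratio = global_batch_size / lr_base_size
    if lr_base_scale == "sqrt":
        ratio = math.sqrt(ratio)
    return lr_base * ratio

# With the posted config: batch_size=128, grad_accum_steps=1, opt=adamw
lr = resolve_lr(lr_base=0.002, lr_base_size=4096, global_batch_size=128, opt="adamw")
```

Under that reading the run trains at roughly `0.002 * sqrt(128/4096) ≈ 3.5e-4`, which is worth confirming against the resolved value printed in the training log.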