Why does mobilenetv4_conv_small model training fail to converge? #2651
jiangxiangchuan asked this question in Q&A · Unanswered
Replies: 1 comment
Really can't say. Worth pointing out that the appropriate hparams and performance for any given task are tied to the dataset, so it's not possible to provide much useful insight without that. I'd first see if it works better with a less fussy model like resnet18/34, or with more standard hparams (these were unusual hparams compared to most, though they worked surprisingly well for ImageNet pretraining).
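A minimal sanity check along those lines, using timm's `train.py` with more conventional hyperparameters — the values here are illustrative defaults, not a tuned recommendation, and the dataset path is a placeholder:

```shell
# Sanity check: swap in resnet18 with conventional hyperparameters.
# Flags are standard timm train.py options; path/to/dataset is a placeholder.
python train.py path/to/dataset \
  --model resnet18 \
  --num-classes 4 \
  --batch-size 128 \
  --opt adamw \
  --lr 1e-3 \
  --weight-decay 0.01 \
  --sched cosine \
  --epochs 100 \
  --warmup-epochs 5 \
  --smoothing 0.1
```

If resnet18 trains fine under these settings but mobilenetv4_conv_small still diverges, the hparams are the more likely culprit than the data pipeline.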
I changed the head of the mobilenetv4_conv_small model to four heads and used my own dataset class. The training set has 4800 images and the val set has 1200 images. Training failed to converge; the training log is below:

My customized model class is defined as below:
```python
# Multi-output classification model
class MultiOutputMobileNet(nn.Module):
    def __init__(self, backbone, num_outputs, num_classes_per_output, pretrained=True):
        super().__init__()
        # 1. Load the backbone correctly (keeping the original model's full
        #    feature-extraction + pooling logic)
        self.backbone = create_model(
            "mobilenetv4_conv_small",
            pretrained=pretrained,
            num_classes=0,  # key: num_classes=0 -> model returns pooled features (no classifier head)
        )
        # 2. Get the backbone output width (timm attribute, officially provided)
        self.backbone_out_features = self.backbone.head_hidden_size  # mobilenetv4_conv_small: num_features=768 (not 576!)
        # 3. One linear head per output (completed here for runnability; the
        #    original snippet ended above)
        self.heads = nn.ModuleList(
            nn.Linear(self.backbone_out_features, n) for n in num_classes_per_output
        )

    def forward(self, x):
        feats = self.backbone(x)  # (B, backbone_out_features)
        return [head(feats) for head in self.heads]
```
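For reference, a multi-head setup like this is usually trained by summing one cross-entropy loss per head. A minimal sketch with plain tensors — the batch size and per-head class counts here are made up for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical: four heads with different class counts, batch of 8.
num_classes_per_output = (3, 5, 2, 4)
logits = [torch.randn(8, c) for c in num_classes_per_output]      # one logit tensor per head
targets = [torch.randint(0, c, (8,)) for c in num_classes_per_output]

# Total loss is the sum (or mean) of the per-head cross-entropies.
loss = sum(F.cross_entropy(lg, tg) for lg, tg in zip(logits, targets))
loss_value = float(loss)
```

With labels for all heads available every step, summing the per-head losses is the simplest aggregation; if one head dominates, per-head weights can be introduced.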
Data loading works fine. The parameter configuration is as below:
```yaml
aa: rand-m8-inc1-mstd1.0
amp: true
amp_dtype: float16
amp_impl: native
aug_repeats: 0
aug_splits: 0
batch_size: 128
bce_loss: false
bce_pos_weight: null
bce_sum: false
bce_target_thresh: null
bn_eps: null
bn_momentum: null
channels_last: true
checkpoint_hist: 10
class_map: ''
clip_grad: null
clip_mode: norm
color_jitter: 0.4
color_jitter_prob: null
cooldown_epochs: 0
crop_pct: null
cutmix: 0.0
cutmix_minmax: null
data:
data_dir: F:\引线数据\引线颜色检测样本\20260115
dataset: ''
dataset_download: false
decay_epochs: 90
decay_milestones:
decay_rate: 0.1
device: cuda
device_modules: null
dist_bn: reduce
drop: 0.25
drop_block: null
drop_connect: null
drop_path: null
epoch_repeats: 0.0
epochs: 2400
eval_metric: top1
experiment: ''
fast_norm: false
fuser: ''
gaussian_blur_prob: 0.05
gp: null
grad_accum_steps: 1
grad_checkpointing: false
grayscale_prob: 0.1
head_init_bias: null
head_init_scale: null
hflip: 0.5
img_size: null
in_chans: null
initial_checkpoint: ''
input_img_mode: null
input_key: null
input_size:
interpolation: ''
jsd_loss: false
layer_decay: null
local_rank: 0
log_interval: 50
log_wandb: false
lr: null
lr_base: 0.002
lr_base_scale: ''
lr_base_size: 4096
lr_cycle_decay: 0.5
lr_cycle_limit: 1
lr_cycle_mul: 1.0
lr_k_decay: 1.0
lr_noise: null
lr_noise_pct: 0.67
lr_noise_std: 1.0
mean: null
min_lr: 0.0
mixup: 0.0
mixup_mode: batch
mixup_off_epoch: 0
mixup_prob: 1.0
mixup_switch_prob: 0.5
model: mobilenetv4_conv_small
model_ema: true
model_ema_decay: 0.99995
model_ema_force_cpu: false
model_ema_warmup: true
model_kwargs: {}
momentum: 0.9
no_aug: false
no_ddp_bb: false
no_prefetcher: false
no_resume_opt: false
num_classes: 4
opt: adamw
opt_betas:
opt_eps: null
opt_kwargs: {}
output: ''
patience_epochs: 10
pin_mem: false
pretrained: false
pretrained_path: null
ratio:
recount: 1
recovery_interval: 0
remode: pixel
reprob: 0.25
resplit: false
resume: ''
save_images: false
scale:
sched: cosine
sched_on_updates: true
seed: 42
smoothing: 0.1
split_bn: false
start_epoch: null
std: null
sync_bn: false
synchronize_step: false
target_key: null
torchcompile: null
torchscript: false
train_crop_mode: null
train_interpolation: random
train_num_samples: null
train_split: train.csv
tta: 0
use_multi_epochs_loader: false
val_num_samples: null
val_split: val.csv
validation_batch_size: null
vflip: 0.0
warmup_epochs: 5
warmup_lr: 0.0
warmup_prefix: true
weight_decay: 0.06
workers: 4
worker_seeding: all
```
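One thing worth double-checking in this config: `lr` is null, so (if I recall timm's `train.py` behavior correctly) the effective learning rate is derived from `lr_base`, `lr_base_size`, and the global batch size, with square-root scaling chosen automatically for adaptive optimizers such as adamw. A sketch of that resolution under those assumptions:

```python
import math

def resolve_lr(lr_base, lr_base_size, global_batch_size, opt, lr_base_scale=""):
    """Approximate timm train.py lr resolution when --lr is not set (assumption)."""
    if not lr_base_scale:
        # timm picks sqrt scaling for adaptive optimizers (names containing 'ada' or 'lamb')
        lr_base_scale = "sqrt" if any(s in opt.lower() for s in ("ada", "lamb")) else "linear"
    ratio = global_batch_size / lr_base_size
    if lr_base_scale == "sqrt":
        ratio = math.sqrt(ratio)
    return lr_base * ratio

# With the posted config: batch_size=128, grad_accum_steps=1, opt=adamw
lr = resolve_lr(lr_base=0.002, lr_base_size=4096, global_batch_size=128, opt="adamw")
```

Under that reading the run trains at roughly `0.002 * sqrt(128/4096) ≈ 3.5e-4`, which is worth confirming against the resolved value printed in the training log.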