Description
I was trying to replicate the evaluation results of the VAR paper (https://github.com/FoundationVision/VAR). I saved the 50k generated images to a .npz ordered by class (the first 50 images have label 0, the next 50 have label 1, etc.). This results in an extremely low Inception Score:
$python evaluator.py VIRTUAL_imagenet256_labeled.npz <my_samples>.npz
...
Computing evaluations...
Inception Score: 62.07027053833008
FID: 3.374905178956624
sFID: 7.929707907039187
Precision: 0.85034
Recall: 0.4955
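For context, here is a minimal sketch (not the evaluator's exact code) of the split-based Inception Score that compute_inception_score appears to implement: the softmax predictions are chunked into groups of split_size, exp(KL(p(y|x) || p(y))) is averaged within each chunk, and the per-chunk scores are averaged. With class-sorted samples, each chunk covers only a small fraction of the 1000 classes, so its marginal p(y) is far from the full marginal and the score drops. The toy numbers below are made up just to show the effect:

```python
import numpy as np

def chunked_inception_score(preds, split_size):
    # Sketch of the split-based IS: mean over chunks of exp(KL(p(y|x) || p(y)_chunk)).
    scores = []
    for i in range(0, len(preds), split_size):
        part = preds[i : i + split_size]
        kl = part * (np.log(part) - np.log(part.mean(axis=0, keepdims=True)))
        scores.append(np.exp(kl.sum(axis=1).mean()))
    return float(np.mean(scores))

# Toy setup mirroring the issue at small scale: near-one-hot predictions over
# 100 classes, 50 samples per class, saved in class order, scored in chunks of 500.
n_classes, per_class, split = 100, 50, 500
labels = np.repeat(np.arange(n_classes), per_class)
preds = np.full((n_classes * per_class, n_classes), 1e-4)
preds[np.arange(len(labels)), labels] = 1.0
preds /= preds.sum(axis=1, keepdims=True)

print(chunked_inception_score(preds, split))   # ~10: each chunk sees only 10 of the 100 classes
print(chunked_inception_score(np.random.default_rng(0).permutation(preds), split))  # close to 100
```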
However, changing split_size in evaluator.py gave me numbers much closer to what I expected:
print("Inception Score:", evaluator.compute_inception_score(sample_acts[0]) ->
print("Inception Score:", evaluator.compute_inception_score(sample_acts[0], split_size = sample_acts[0].shape[0]))
gives this:
$python evaluator.py VIRTUAL_imagenet256_labeled.npz <my_samples>.npz
...
Computing evaluations...
Inception Score: 298.4532165527344
FID: 3.3748805195803584
sFID: 7.929708379309773
Precision: 0.85028
Recall: 0.4958
This is a non-trivial difference (62 vs. 298!). I think it was missed in the original evaluations because generated samples were added in random order, which masks the issue. Ideally, the code should be order-agnostic (I prefer adding samples in class order). I suggest either the one-liner above as a quick fix, or a deeper fix: split_size looks like it is there for tractability, so a proper solution should keep that benefit while removing the order dependence.
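To illustrate what the deeper fix could look like, here is a sketch (the function name is mine, not the repository's) of an order-agnostic variant: deterministically shuffle the predictions before chunking, which keeps the chunked computation but makes the result independent of the order in which samples were saved. `preds` here stands for the per-sample softmax outputs that compute_inception_score derives internally from sample_acts[0], which is an assumption about its internals.

```python
import numpy as np

def order_agnostic_inception_score(preds, split_size=5000, seed=0):
    # Shuffle with a fixed seed so class-sorted sample files score the same as
    # randomly ordered ones, while still computing the score chunk by chunk.
    preds = preds[np.random.default_rng(seed).permutation(len(preds))]
    scores = []
    for i in range(0, len(preds), split_size):
        part = preds[i : i + split_size]
        kl = part * (np.log(part) - np.log(part.mean(axis=0, keepdims=True)))
        scores.append(np.exp(kl.sum(axis=1).mean()))
    return float(np.mean(scores))
```

Equivalently, a single `np.random.default_rng(0).shuffle(preds)` before the chunk loop inside compute_inception_score would achieve the same thing without changing its signature.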