Description
I was trying to replicate the evaluation results of the VAR paper (https://github.com/FoundationVision/VAR). I saved the 50k generated images to a .npz ordered by class (the first 50 images have label 0, the next 50 have label 1, etc.). This results in an extremely low Inception Score:
$python evaluator.py VIRTUAL_imagenet256_labeled.npz <my_samples>.npz
...
Computing evaluations...
Inception Score: 62.07027053833008
FID: 3.374905178956624
sFID: 7.929707907039187
Precision: 0.85034
Recall: 0.4955
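For context, here is a minimal sketch (not the evaluator's exact code) of the split-based Inception Score that compute_inception_score appears to implement: the softmax predictions are chunked into groups of split_size, exp(KL(p(y|x) || p(y))) is averaged within each chunk, and the per-chunk scores are averaged. With class-sorted samples, each chunk covers only a small fraction of the 1000 classes, so its marginal p(y) is far from the full marginal and the score drops. The toy numbers below are made up just to show the effect:

```python
import numpy as np

def chunked_inception_score(preds, split_size):
    # Sketch of the split-based IS: mean over chunks of exp(KL(p(y|x) || p(y)_chunk)).
    scores = []
    for i in range(0, len(preds), split_size):
        part = preds[i : i + split_size]
        kl = part * (np.log(part) - np.log(part.mean(axis=0, keepdims=True)))
        scores.append(np.exp(kl.sum(axis=1).mean()))
    return float(np.mean(scores))

# Toy setup mirroring the issue at small scale: near-one-hot predictions over
# 100 classes, 50 samples per class, saved in class order, scored in chunks of 500.
n_classes, per_class, split = 100, 50, 500
labels = np.repeat(np.arange(n_classes), per_class)
preds = np.full((n_classes * per_class, n_classes), 1e-4)
preds[np.arange(len(labels)), labels] = 1.0
preds /= preds.sum(axis=1, keepdims=True)

print(chunked_inception_score(preds, split))   # ~10: each chunk sees only 10 of the 100 classes
print(chunked_inception_score(np.random.default_rng(0).permutation(preds), split))  # close to 100
```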
However, changing split_size in evaluator.py gave me numbers much closer to what I expected:
print("Inception Score:", evaluator.compute_inception_score(sample_acts[0]) ->
print("Inception Score:", evaluator.compute_inception_score(sample_acts[0], split_size = sample_acts[0].shape[0]))
gives this:
$python evaluator.py VIRTUAL_imagenet256_labeled.npz <my_samples>.npz
...
Computing evaluations...
Inception Score: 298.4532165527344
FID: 3.3748805195803584
sFID: 7.929708379309773
Precision: 0.85028
Recall: 0.4958
This is a non-trivial difference (62 vs. 298!). I think it was missed in the original evaluations because generated samples were added in random order, which masks the issue. Ideally, the code should be order-agnostic (I prefer adding samples in class order). I suggest either the one-liner above as a quick fix, or a deeper fix: split_size looks like it is there for tractability, so a proper solution should keep that benefit while removing the order dependence.
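To illustrate what the deeper fix could look like, here is a sketch (the function name is mine, not the repository's) of an order-agnostic variant: deterministically shuffle the predictions before chunking, which keeps the chunked computation but makes the result independent of the order in which samples were saved. `preds` here stands for the per-sample softmax outputs that compute_inception_score derives internally from sample_acts[0], which is an assumption about its internals.

```python
import numpy as np

def order_agnostic_inception_score(preds, split_size=5000, seed=0):
    # Shuffle with a fixed seed so class-sorted sample files score the same as
    # randomly ordered ones, while still computing the score chunk by chunk.
    preds = preds[np.random.default_rng(seed).permutation(len(preds))]
    scores = []
    for i in range(0, len(preds), split_size):
        part = preds[i : i + split_size]
        kl = part * (np.log(part) - np.log(part.mean(axis=0, keepdims=True)))
        scores.append(np.exp(kl.sum(axis=1).mean()))
    return float(np.mean(scores))
```

Equivalently, a single `np.random.default_rng(0).shuffle(preds)` before the chunk loop inside compute_inception_score would achieve the same thing without changing its signature.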