
Inception Score is dependent on split_size #165

@CameronBraunstein


I was trying to replicate the results of the VAR paper (https://github.com/FoundationVision/VAR), and I saved my 50k samples to a .npz ordered by class (the first 50 images have label 0, the next 50 have label 1, etc.). This results in an extremely low Inception Score:

$python evaluator.py VIRTUAL_imagenet256_labeled.npz <my_samples>.npz
...
Computing evaluations...
Inception Score: 62.07027053833008
FID: 3.374905178956624
sFID: 7.929707907039187
Precision: 0.85034
Recall: 0.4955

However, changing the split_size in evaluator.py gives numbers much closer to what I expected:

print("Inception Score:", evaluator.compute_inception_score(sample_acts[0]) ->
print("Inception Score:", evaluator.compute_inception_score(sample_acts[0], split_size = sample_acts[0].shape[0]))

gives this:

$python evaluator.py VIRTUAL_imagenet256_labeled.npz <my_samples>.npz
...
Computing evaluations...
Inception Score: 298.4532165527344
FID: 3.3748805195803584
sFID: 7.929708379309773
Precision: 0.85028
Recall: 0.4958
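
For intuition on why a class-sorted file deflates the split-based score, here is a small self-contained sketch (the inception_score helper, the class counts, and split_size=1000 are illustrative choices of mine, not the evaluator.py implementation): within each split, the marginal p(y) collapses onto the few classes present in that split, so the KL(p(y|x) || p(y)) term, and hence the score, shrinks.

```python
import numpy as np

def inception_score(preds, split_size):
    """IS averaged over consecutive splits of softmax predictions (N, num_classes)."""
    scores = []
    for i in range(0, preds.shape[0], split_size):
        part = preds[i:i + split_size]
        # KL(p(y|x) || p(y)) per sample, with p(y) estimated within this split only
        kl = part * (np.log(part) - np.log(np.mean(part, axis=0, keepdims=True)))
        scores.append(np.exp(np.mean(np.sum(kl, axis=1))))
    return float(np.mean(scores))

num_classes, per_class = 100, 50
labels = np.repeat(np.arange(num_classes), per_class)   # class-sorted, like the .npz above
preds = np.full((labels.size, num_classes), 1e-4)
preds[np.arange(labels.size), labels] = 1.0
preds /= preds.sum(axis=1, keepdims=True)                # confident, near one-hot softmax outputs

rng = np.random.default_rng(0)
print(inception_score(preds, split_size=1000))                   # low: each split sees ~20 classes
print(inception_score(rng.permutation(preds), split_size=1000))  # high: each split sees all classes
```

With the class-sorted array, each split's score is roughly bounded by the number of classes that split happens to contain; shuffling the rows restores the full class marginal and the score recovers.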

This is a non-trivial difference (62 vs. 298!). I think it was missed in the original evals because generated samples happened to be added in random order. Ideally, the code should be order-agnostic (I prefer adding samples in class order). The one-liner above works as a quick fix, but split_size looks like it is there for tractability, so a deeper fix is probably needed to keep that benefit while making the score independent of sample order; a sketch of one option is below.
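
One possible order-agnostic variant is to shuffle the rows once before scoring, so each split's class mix matches the overall distribution while keeping split_size for memory tractability. This is only a call-site sketch, assuming sample_acts[0] is the activation array already passed to compute_inception_score and that the function treats rows independently up to the split; it is not a change to evaluator.py itself.

```python
import numpy as np

# Hypothetical call-site change: permute sample activations once so the result
# no longer depends on the order in which samples were saved to the .npz.
acts = sample_acts[0]
acts = acts[np.random.default_rng(0).permutation(acts.shape[0])]
print("Inception Score:", evaluator.compute_inception_score(acts))
```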
