
Conversation

@hlky (Contributor) commented on Feb 12, 2025

What does this PR do?

A full benchmarking suite is planned; this is a quick draft that prioritizes benchmarking the Autoencoder.

python benchmarks/benchmark_autoencoderkl.py --pretrained_model_name_or_path "stable-diffusion-v1-5/stable-diffusion-v1-5" --dtype float16 --subfolder vae
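
A minimal sketch of the kind of timing helper such a script relies on; the name benchmark_fn appears later in the review, but this event-based implementation is an assumption, not the PR's exact code:

import torch

def benchmark_fn(fn, *args, warmup=2, iters=5):
    # Warmup runs absorb one-time costs (cudnn autotuning, lazy init).
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    # elapsed_time returns milliseconds; report seconds per call.
    return start.elapsed_time(end) / (1000 * iters)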

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul (Member) left a comment


Thanks! Left some comments. LMK if they make sense.

        self.model_class_name = str(self.model.__class__.__name__)
        self.pretrained_model_name_or_path = pretrained_model_name_or_path

    @torch.no_grad

@sayakpaul (Member) commented:

Suggested change:
-    @torch.no_grad
+    @torch.no_grad()

As per the docs: https://pytorch.org/docs/stable/generated/torch.no_grad.html
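
For reference, a minimal example of the called decorator form the docs describe (the function and tensor here are illustrative only, not part of the PR):

import torch

@torch.no_grad()
def double(x):
    return x * 2

x = torch.ones(3, requires_grad=True)
y = double(x)
print(y.requires_grad)  # False: no autograd graph is recorded inside the call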

    return filepath


class AutoencoderKLBenchmark(BaseBenchmarkTestCase):

@sayakpaul (Member) commented on Feb 12, 2025:

Should we let users define dummy_inputs() per model class here? Then we could let them implement their own function to be benchmarked (a hypothetical subclass sketch follows the snippet below).

So, BaseBenchmarkTestCase could then have a method benchmark():

def benchmark(self, tensor, **kwargs):
    time = benchmark_fn(self.run_decode, self.model, tensor)
    memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated())  # should this be allocated?
    benchmark_info = BenchmarkInfo(time=time, memory=memory)

    csv_dict = generate_csv_dict_model(
        model_cls=self.model_class_name, ckpt=self.pretrained_model_name_or_path, benchmark_info=benchmark_info, **kwargs,
    )
    print(f"{self.model_class_name} decode - shape: {list(tensor.shape)}, time: {time}, memory: {memory}")
    return csv_dict
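
For illustration, a hypothetical per-model dummy_inputs() along these lines (the subclass body is an assumption; 4 latent channels with 8x downscaling is the SD1.5 VAE layout, so 64x64 latents decode to 512x512 images):

import torch

from diffusers import AutoencoderKL


class AutoencoderKLBenchmark(BaseBenchmarkTestCase):
    model_class = AutoencoderKL

    def dummy_inputs(self, batch_size=1, height=64, width=64):
        # Random latents shaped like the VAE's expected decode input.
        return torch.randn(batch_size, 4, height, width, dtype=self.dtype, device="cuda")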

def __init__(self, pretrained_model_name_or_path, dtype, **kwargs):
    super().__init__()
    self.dtype = getattr(torch, dtype)
    model = self.model_class.from_pretrained(pretrained_model_name_or_path, torch_dtype=self.dtype, **kwargs).eval()

@sayakpaul (Member) commented:

Likewise, we could move all of these reusable components to BaseBenchmarkTestCase (perhaps renamed to ModelBaseBenchmarkTestCase) and let users specify pretrained_model_name_or_path, torch_dtype, subfolder, etc.
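
A rough sketch of that refactor, assuming subclasses only set model_class (the name ModelBaseBenchmarkTestCase follows the rename suggestion; everything else here is hypothetical):

import torch


class ModelBaseBenchmarkTestCase:
    model_class = None  # set by each subclass, e.g. AutoencoderKL

    def __init__(self, pretrained_model_name_or_path, dtype="float16", **kwargs):
        self.dtype = getattr(torch, dtype)
        self.pretrained_model_name_or_path = pretrained_model_name_or_path
        self.model = self.model_class.from_pretrained(
            pretrained_model_name_or_path, torch_dtype=self.dtype, **kwargs
        ).eval().to("cuda")
        self.model_class_name = self.model.__class__.__name__

    def dummy_inputs(self):
        # Each model subclass defines the inputs it is benchmarked with.
        raise NotImplementedError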

        print(f"{self.model_class_name} decode - shape: {list(tensor.shape)}, time: {time}, memory: {memory}")
        return csv_dict

    def test_decode(self):

@sayakpaul (Member) commented:

Not needed for the first iteration, but I would also consider including model.compile().
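
A minimal sketch of how a compiled variant could slot in; benchmark_fn and run_decode are the draft's own helpers, while the warmup step and the use of torch.compile (nn.Module.compile() would compile in place instead) are assumptions:

import torch

def benchmark_compiled(self, tensor):
    compiled = torch.compile(self.model)
    with torch.no_grad():
        compiled.decode(tensor)  # warmup run absorbs one-time compilation cost
    return benchmark_fn(self.run_decode, compiled, tensor)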

@sayakpaul (Member) commented:

On a 4090 without tiling (time in seconds, memory in GB):

AutoencoderKL decode - shape: [1, 4, 32, 32], time: 0.007, memory: 0.461
AutoencoderKL decode - shape: [1, 4, 64, 64], time: 0.031, memory: 1.318
AutoencoderKL decode - shape: [1, 4, 128, 128], time: 0.149, memory: 4.707
AutoencoderKL decode - shape: [1, 4, 256, 256], time: 0.687, memory: 18.221

With tiling:

AutoencoderKL decode - shape: [1, 4, 32, 32], time: 0.007, memory: 0.461
AutoencoderKL decode - shape: [1, 4, 64, 64], time: 0.031, memory: 1.318
AutoencoderKL decode - shape: [1, 4, 128, 128], time: 0.218, memory: 1.322
AutoencoderKL decode - shape: [1, 4, 256, 256], time: 1.032, memory: 1.324
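
Tiling keeps peak decode memory essentially flat (about 1.3 GB) at the two larger shapes, at the cost of roughly 1.5x slower decodes.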

@hlky closed this on Apr 15, 2025.
@hlky deleted the benchmark-autoencoder branch on Apr 15, 2025 at 12:28.