Benchmark Autoencoder #10780
Conversation
sayakpaul left a comment
Thanks! Left some comments. LMK if they make sense.
benchmarks/base_classes.py (Outdated)
```python
        self.model_class_name = str(self.model.__class__.__name__)
        self.pretrained_model_name_or_path = pretrained_model_name_or_path

    @torch.no_grad
```
```diff
- @torch.no_grad
+ @torch.no_grad()
```
As per the docs: https://pytorch.org/docs/stable/generated/torch.no_grad.html
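For reference, a minimal sketch of the documented decorator form; the function name and shapes here are illustrative, not from the PR:

```python
import torch

@torch.no_grad()  # note the parentheses: instantiate the context manager
def run_decode(model, latents):
    # gradients are disabled for everything inside this call
    return model.decode(latents)
```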
```python
        return filepath


class AutoencoderKLBenchmark(BaseBenchmarkTestCase):
```
Should we let users define dummy_inputs() per model class here? Then we could let them implement their own function to be benchmarked.
So, BaseBenchmarkTestCase could then have a benchmark() method:
```python
def benchmark(...):
    time = benchmark_fn(self.run_decode, self.model, tensor)
    memory = bytes_to_giga_bytes(torch.cuda.max_memory_allocated())  # should this be allocated?
    benchmark_info = BenchmarkInfo(time=time, memory=memory)
    csv_dict = generate_csv_dict_model(
        model_cls=self.model_class_name,
        ckpt=self.pretrained_model_name_or_path,
        benchmark_info=benchmark_info,
        **kwargs,
    )
    print(f"{self.model_class_name} decode - shape: {list(tensor.shape)}, time: {time}, memory: {memory}")
    return csv_dict
```
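To illustrate, a subclass would then only need to provide its inputs and the callable being timed. A rough sketch, where the latent shape and device placement are assumptions, not part of the PR:

```python
import torch

from diffusers import AutoencoderKL


class AutoencoderKLBenchmark(BaseBenchmarkTestCase):
    model_class = AutoencoderKL

    def dummy_inputs(self):
        # hypothetical shape: SD 1.5 VAE latents for a 512x512 image
        return torch.randn(1, 4, 64, 64, dtype=self.dtype, device="cuda")

    @torch.no_grad()
    def run_decode(self, model, tensor):
        # the model-specific operation being benchmarked
        return model.decode(tensor)
```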
benchmarks/base_classes.py (Outdated)
```python
    def __init__(self, pretrained_model_name_or_path, dtype, **kwargs):
        super().__init__()
        self.dtype = getattr(torch, dtype)
        model = self.model_class.from_pretrained(pretrained_model_name_or_path, torch_dtype=self.dtype, **kwargs).eval()
```
Likewise, we could move all of these reusable components into BaseBenchmarkTestCase (perhaps renamed to ModelBaseBenchmarkTestCase) and let users specify pretrained_model_name_or_path, torch_dtype, subfolder, etc.
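A rough sketch of how those pieces could live in the base class; the subfolder default and the cuda placement are assumptions:

```python
import torch


class ModelBaseBenchmarkTestCase:
    model_class = None  # set by each subclass, e.g. AutoencoderKL

    def __init__(self, pretrained_model_name_or_path, dtype, subfolder=None, **kwargs):
        self.dtype = getattr(torch, dtype)
        self.pretrained_model_name_or_path = pretrained_model_name_or_path
        model = self.model_class.from_pretrained(
            pretrained_model_name_or_path, subfolder=subfolder, torch_dtype=self.dtype, **kwargs
        ).eval()
        self.model = model.to("cuda")
        self.model_class_name = str(self.model.__class__.__name__)
```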
| print(f"{self.model_class_name} decode - shape: {list(tensor.shape)}, time: {time}, memory: {memory}") | ||
| return csv_dict | ||
|
|
||
| def test_decode(self): |
Not needed for the first iteration but I would consider also including model.compile().
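If that is added later, one way to keep one-time compilation cost out of the measurement is to compile and warm up before timing. A minimal sketch, assuming the run_decode/benchmark_fn helpers discussed above:

```python
import torch

# compile once, outside the timed region, so compilation cost is
# not attributed to the steady-state measurement
compiled_model = torch.compile(model)

# warmup call triggers the actual compilation
_ = run_decode(compiled_model, tensor)
torch.cuda.synchronize()

compiled_time = benchmark_fn(run_decode, compiled_model, tensor)
```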
On a 4090, results were reported without tiling and with tiling (benchmark tables not preserved in this export).
What does this PR do?
A benchmarking suite is planned; this is a quick draft to prioritize benchmarking the Autoencoder.

```bash
python benchmarks/benchmark_autoencoderkl.py --pretrained_model_name_or_path "stable-diffusion-v1-5/stable-diffusion-v1-5" --dtype float16 --subfolder vae
```

Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.