Aspect Ratio Bucketing
hyppyhyppo edited this page Apr 19, 2025
Example settings:
- Training Resolution: 1024 pixels
- Batch size: 4
- Pixel Limit (aka Budget): 1,048,576 pixels (1024 × 1024 = 1,048,576)
Handling aspect ratios (ARs) properly leads to better image generations. OneTrainer uses Aspect Ratio Bucketing, which works as follows:
- The program defines a set of buckets relative to the training resolution, using the aspect ratios in all_possible_input_aspects.
- It reads the width and height of every image in your dataset.
- For each image, it finds the bucket with the closest aspect ratio.
- Scaling: If the image is too large, it is scaled down proportionally to fit within the pixel budget.
- Cropping: If scaling alone cannot make the image fit the bucket exactly, one dimension (width or height) is cropped evenly from both sides. The amount cropped is in practice limited by the number of buckets and the training resolution, because each bucket's resolution is derived from the training resolution. If the crop jitter augmentation is enabled, the required crop is instead distributed randomly across one or more dimensions.
In summary, the aim is to make the smallest possible adjustment to each image.
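The steps above can be sketched in Python as a rough illustration. This is not OneTrainer's code: `fit_to_bucket` is a hypothetical helper, the bucket resolutions passed in are example values, "closest" is assumed to mean the smallest absolute difference between aspect ratios, and the image is assumed to be scaled so it just covers the bucket before the excess is cropped (so this sketch also scales small images up, which may differ from OneTrainer's behavior).

```python
import random
from typing import Optional


def fit_to_bucket(width: int, height: int,
                  buckets: list[tuple[int, int]],
                  crop_jitter: bool = False,
                  rng: Optional[random.Random] = None):
    """Hypothetical helper: map an image size to a bucket, a scaled size and a crop box.

    `buckets` holds bucket resolutions as (width, height) in pixels.
    Returns (bucket, (scaled_w, scaled_h), (left, top, right, bottom)).
    """
    aspect = width / height
    # Pick the bucket whose aspect ratio is closest (assumed metric: absolute difference).
    bucket = min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))
    bucket_w, bucket_h = bucket

    # Scale proportionally so the image just covers the bucket in both dimensions.
    scale = max(bucket_w / width, bucket_h / height)
    scaled_w = max(bucket_w, round(width * scale))
    scaled_h = max(bucket_h, round(height * scale))

    # Excess to crop; the dimension that set the scale factor has no excess.
    excess_w = scaled_w - bucket_w
    excess_h = scaled_h - bucket_h
    if crop_jitter:
        # Crop jitter: place the crop window at a random offset.
        rng = rng or random.Random()
        left = rng.randint(0, excess_w)
        top = rng.randint(0, excess_h)
    else:
        # Center crop: split the excess evenly between both sides.
        left = excess_w // 2
        top = excess_h // 2
    return bucket, (scaled_w, scaled_h), (left, top, left + bucket_w, top + bucket_h)


print(fit_to_bucket(1920, 1080, [(1024, 1024), (1344, 768), (768, 1344)]))
```

For a 1920 × 1080 image and these example buckets, the sketch picks the 1344 × 768 bucket, scales the image to 1365 × 768 and center-crops 21 pixels of width (10 on the left, 11 on the right).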
Bucket aspect ratios, written as (width, height):
- (4.0, 1.0),
- (3.5, 1.0),
- (3.0, 1.0),
- (2.5, 1.0),
- (2.0, 1.0),
- (1.75, 1.0), closest match for Common Widescreen (16:9 ≈ 1.78)
- (1.5, 1.0),
- (1.25, 1.0),
- (1.0, 1.0), Square
- (1.0, 1.25),
- (1.0, 1.5), Common Portrait (2:3)
- (1.0, 1.75),
- (1.0, 2.0),
- (1.0, 2.5),
- (1.0, 3.0),
- (1.0, 3.5),
- (1.0, 4.0)
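One way to turn these aspect ratios into pixel resolutions is to keep each bucket's area close to the budget R² (R = training resolution) while matching its aspect ratio a = width / height, which gives width = R·√a and height = R/√a. The sketch below assumes this formula and rounds both sides to the nearest multiple of 64; the rounding step is an assumption, so OneTrainer's actual bucket sizes may differ slightly. `bucket_resolutions` is a hypothetical helper.

```python
import math

# Bucket aspect ratios from the list above, as (width, height).
ASPECTS = [
    (4.0, 1.0), (3.5, 1.0), (3.0, 1.0), (2.5, 1.0), (2.0, 1.0),
    (1.75, 1.0), (1.5, 1.0), (1.25, 1.0), (1.0, 1.0),
    (1.0, 1.25), (1.0, 1.5), (1.0, 1.75), (1.0, 2.0),
    (1.0, 2.5), (1.0, 3.0), (1.0, 3.5), (1.0, 4.0),
]


def bucket_resolutions(resolution: int, quantization: int = 64) -> dict:
    """Hypothetical helper: map each (w, h) aspect to a pixel size near resolution**2 pixels."""
    result = {}
    for aspect_w, aspect_h in ASPECTS:
        a = aspect_w / aspect_h
        # Solve width * height = resolution**2 with width / height = a.
        width = resolution * math.sqrt(a)
        height = resolution / math.sqrt(a)
        # Round to the nearest multiple of `quantization` (assumed step size).
        result[(aspect_w, aspect_h)] = (
            round(width / quantization) * quantization,
            round(height / quantization) * quantization,
        )
    return result


for aspect, (w, h) in bucket_resolutions(1024).items():
    print(f"{aspect}: {w} x {h} = {w * h:,} pixels")
```

With a training resolution of 1024, the 1.75:1 bucket comes out as 1344 × 768. Because of the rounding, some buckets land a few percent above or below the 1,048,576-pixel budget.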
Common 16:9 resolutions:
- 1280 x 720
- 1366 x 768
- 1600 x 900
- 1920 x 1080
Let's use 1920 × 1080 as an example:
- Determine the AR by dividing the width by the height: 1920 / 1080 ≈ 1.78.
- Among the available buckets, the closest match is 1.75:1, but it is not an exact match.
- The image (1920 × 1080 = 2,073,600 pixels) is also over the pixel budget, so it is scaled down proportionally to 1365 × 768 (1,048,320 pixels, just under the 1,048,576 budget).
- To match the 1.75:1 bucket, the width must be reduced to 1.75 × 768 = 1344 pixels, so 1365 - 1344 = 21 pixels are cropped from the width.
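The arithmetic of this example can be reproduced in a few lines of Python. It follows the steps as described here (scale proportionally to the pixel budget, then crop the width to 1.75 × the scaled height); rounding to whole pixels with round() is an assumption, and this is an illustration rather than OneTrainer's code.

```python
import math

# Values from this page's example settings and worked example.
BUDGET = 1024 * 1024          # pixel budget: 1,048,576
BUCKET_ASPECT = 1.75          # closest bucket for a 16:9 image
width, height = 1920, 1080

aspect = width / height                        # about 1.78
scale = math.sqrt(BUDGET / (width * height))   # proportional factor that fits the budget
scaled_w = round(width * scale)                # round to whole pixels
scaled_h = round(height * scale)
target_w = round(BUCKET_ASPECT * scaled_h)     # width required by the 1.75:1 bucket
crop_w = scaled_w - target_w                   # pixels removed from the width

print(f"aspect={aspect:.2f} scaled={scaled_w}x{scaled_h} "
      f"pixels={scaled_w * scaled_h:,} crop={crop_w}")
# prints: aspect=1.78 scaled=1365x768 pixels=1,048,320 crop=21
```

For this particular image, scaling to the budget lands exactly on a height of 768, so only the width needs cropping.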
Key points:
- An aspect ratio bucket is a target resolution with a given aspect ratio, sized to fit the pixel budget.
- Each image is scaled and/or cropped to match the closest possible bucket.
- During training, a batch can only contain images from the same bucket. This is why some images may be dropped when the batch size is greater than 1 and the dataset contains images with different aspect ratios: images that cannot fill a complete batch within their bucket may be left out.
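The batching constraint can be illustrated with a small sketch: images are grouped by bucket, each group is split into full batches, and leftovers that cannot fill a batch are dropped. Dropping the remainder is an assumption made to illustrate the behavior described above; `make_batches` and the file names are hypothetical.

```python
import random
from collections import defaultdict


def make_batches(images, batch_size, seed=0):
    """Hypothetical helper: group (name, bucket) pairs into single-bucket batches.

    Returns (batches, dropped): every batch holds images from one bucket only,
    and images that cannot fill a complete batch are returned as dropped.
    """
    by_bucket = defaultdict(list)
    for name, bucket in images:
        by_bucket[bucket].append(name)

    rng = random.Random(seed)
    batches, dropped = [], []
    for bucket, names in by_bucket.items():
        rng.shuffle(names)
        full = len(names) - len(names) % batch_size  # images that fit in full batches
        for start in range(0, full, batch_size):
            batches.append((bucket, names[start:start + batch_size]))
        dropped.extend(names[full:])
    rng.shuffle(batches)  # shuffle batch order so buckets are interleaved
    return batches, dropped


# Example data: 6 widescreen + 3 square images, batch size 4.
dataset = [(f"wide_{i}.png", (1344, 768)) for i in range(6)] + \
          [(f"square_{i}.png", (1024, 1024)) for i in range(3)]
batches, dropped = make_batches(dataset, batch_size=4)
print(len(batches), "batch(es);", len(dropped), "image(s) dropped")
```

Nine images could fill two batches of 4 if buckets could be mixed, but here only one full widescreen batch is formed and 5 images are left out; with a batch size of 1 nothing is dropped.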