@seongbae15
First of all, thank you for your amazing research. I'm very interested in your work, and while running the inference script, I encountered a couple of errors related to the input image's data type and dimensions. I investigated the issues and made some fixes, which I’m sharing through this pull request.

This pull request fixes two runtime errors encountered during inference in the TEMU-VTOFF project.

Error 1: AttributeError — 'Image' object has no attribute 'shape'

Cause: vton_image was a PIL.Image object, which does not have a .shape attribute.

Error 2: RuntimeError — Sizes of tensors must match

Cause: Mismatch in tensor dimensions when concatenating vton_model_input, mask, and masked_vton_latents using torch.cat.
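A minimal sketch of this class of error, not the project's actual code: the latent shapes below and the use of `dim=1` (the channel axis) are assumptions chosen for illustration, and only the variable names come from the description above. `torch.cat` requires every dimension except the concatenation axis to match.

```python
import torch

# Hypothetical latent shapes (batch, channels, height, width); the real values
# depend on the model and are not stated in this pull request.
vton_model_input = torch.zeros(1, 16, 64, 48)
mask = torch.zeros(1, 1, 64, 48)
masked_vton_latents = torch.zeros(1, 16, 64, 48)

# Concatenating along the channel axis (dim=1, an assumption) works when the
# batch, height, and width dimensions agree.
combined = torch.cat([vton_model_input, mask, masked_vton_latents], dim=1)
print(combined.shape)  # torch.Size([1, 33, 64, 48])

# A mask with a different spatial size triggers a RuntimeError of the kind
# reported here, because the non-concatenated dimensions no longer match.
bad_mask = torch.zeros(1, 1, 32, 24)
try:
    torch.cat([vton_model_input, bad_mask, masked_vton_latents], dim=1)
    mismatch_error = None
except RuntimeError as err:
    mismatch_error = str(err)
print(mismatch_error)
```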

Fix

  • The image is now converted to a PyTorch tensor and batched using:
    image = transforms.ToTensor()(image).unsqueeze(0)
  • Ensured consistent dimensions by properly processing the image input as shown above. This resolved the size mismatch in concatenated tensors.

Changes Made

Added an image preprocessing line that converts the PIL.Image to a 4D tensor.

Test

Verified inference runs without errors on Colab.

Output image was generated successfully with no runtime exceptions.

…dimension mismatch

- Fixed AttributeError caused by attempting to access '.shape' on a PIL.Image object.
- Converted PIL image to tensor and added batch dimension using:
  'image = transforms.ToTensor()(image).unsqueeze(0)'
- Fixed RuntimeError from mismatched tensor sizes during torch.cat by ensuring input tensors have consistent dimensions.