Home

Welcome to the lean2ComfyUI wiki!

Computing tensor sizes so the input image matches the model

🧾 1. From the code in wanvideo/modules/encoder.py

self.patch_embed = PatchEmbed(
    img_size=(832, 480),
    patch_size=16,
    in_chans=3,
    embed_dim=768,
)

With patch size = 16
and a frame 832 wide by 480 high:
832 / 16 = 52 tokens across the width
480 / 16 = 30 tokens along the height

52 x 30 = 1560 tokens!
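
The arithmetic above can be sketched in Python. The helper name `tokens_per_frame` is hypothetical and not part of wanvideo. It assumes only that both dimensions are exact multiples of the patch size.

```python
def tokens_per_frame(width: int, height: int, patch_size: int = 16) -> int:
    """Count the patches (tokens) in one frame.

    Hypothetical helper for illustration only, not a wanvideo function.
    """
    if width % patch_size or height % patch_size:
        raise ValueError("width and height must be multiples of patch_size")
    # One token per patch: columns times rows.
    return (width // patch_size) * (height // patch_size)


print(tokens_per_frame(832, 480))  # 52 * 30 = 1560
```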

So when working with image to video, keep the token count in mind: 1560 tokens per frame.

When resizing an image, always aim for (W / 16) * (H / 16) = 1560.

Suppose we have a 1024 x 1024 px image.

Simple math node

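To get W, use the formula floor(1560 / floor(height / 16)) * 16
For height = 1024 this gives floor(1560 / 64) * 16 = 24 * 16 = 384, so the image becomes W384 x H1024. That is 24 x 64 = 1536 tokens per frame, the largest count at this height that stays within 1560.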
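
As a quick check of the formula outside ComfyUI, here is a minimal Python sketch. The function name `width_for_token_budget` and its parameters are hypothetical, chosen for illustration. Because of the floor, the result can land slightly under the 1560 budget.

```python
def width_for_token_budget(height: int, token_budget: int = 1560,
                           patch_size: int = 16) -> int:
    """Largest width (a multiple of patch_size) that keeps tokens <= token_budget.

    Hypothetical helper mirroring floor(1560 / floor(height / 16)) * 16.
    """
    rows = height // patch_size   # patches along the height
    cols = token_budget // rows   # most patch columns that fit in the budget
    return cols * patch_size


height = 1024
width = width_for_token_budget(height)
tokens = (width // 16) * (height // 16)
print(width, height, tokens)  # 384 1024 1536
```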

📦 What `[B, W, H, C]` means in the latent shape

Index	Name	Value	Meaning
1	`B` = Batch size	1	Number of images in this batch (a single image, one frame at a time)
2	`W` = Width	52	Number of patches across the width (W = 832 / 16)
3	`H` = Height	30	Number of patches along the height (H = 480 / 16)
4	`C` = Channel/Dim	768	Length of the embedding vector (dimensions per token)
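
The table can be reproduced from the `PatchEmbed` arguments with a small sketch. The function `patch_grid_shape` is a hypothetical helper. It follows this page's `[B, W, H, C]` ordering, which may differ from the axis order used in the actual model code.

```python
def patch_grid_shape(width: int, height: int, patch_size: int = 16,
                     embed_dim: int = 768, batch: int = 1) -> tuple:
    """Return (B, W, H, C) using this page's axis order.

    Hypothetical helper; the real tensor layout in the model may differ.
    """
    return (batch, width // patch_size, height // patch_size, embed_dim)


shape = patch_grid_shape(832, 480)
b, w, h, c = shape
print(shape)   # (1, 52, 30, 768)
print(w * h)   # 1560 tokens per frame
```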

🧪 What is embedding dim = 768?

Standard ViT/Transformer values:
ViT Base: 768
ViT Large: 1024 (1152 in some variants)

In summary, [1, 52, 30, 768] = 1 image, split into 52 x 30 patches, each patch carrying a 768-dimensional vector
→ Each patch becomes a token
→ Each token becomes a vector
→ Each vector is rotated with RoPE
→ The result is fed into the Transformer so the video model can understand moving images! 🎥🧠
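
To illustrate the "rotated with RoPE" step, here is a minimal sketch of the textbook 1D rotary position embedding. Consecutive pairs of a token vector are rotated by an angle that depends on the token position. The function name `rope_rotate` and the `base=10000.0` default are assumptions for illustration. The model's own RoPE code may differ, for example in how it pairs dimensions or handles several position axes.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Apply textbook 1D RoPE to one token vector (illustrative sketch only)."""
    dim = len(vec)
    if dim % 2:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair k = i / 2 uses frequency base^(-2k / dim) = base^(-i / dim).
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by angle theta.
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


token = [1.0, 0.0, 0.5, 0.5]
rotated = rope_rotate(token, position=3)
# A rotation changes direction but not length, so the squared norm is preserved.
print(sum(v * v for v in token), sum(v * v for v in rotated))
```

Position 0 leaves the vector unchanged, and the same content at different positions gets different rotations, which is how the Transformer can tell token positions apart.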

By iimate24.com
