
Conversation

@danbev
Member

@danbev danbev commented Sep 11, 2025

This commit applies the same changes as were applied in #15917 to the cuda, metal, opencl, sycl, and vulkan backends.

Refs: #15917

@danbev danbev requested a review from 0cc4m as a code owner September 11, 2025 04:25
@github-actions github-actions bot added the Nvidia GPU, Vulkan, ggml, SYCL, Apple Metal, and OpenCL labels Sep 11, 2025
Member

@ggerganov ggerganov left a comment


I think we should add a comment to the declaration of this function to clarify that it will pad the resulting tensor if the requested dim is odd. So here:

llama.cpp/ggml/include/ggml.h

Lines 2123 to 2130 in 28b5f19

// Ref: https://github.com/CompVis/stable-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L151
// timesteps: [N,]
// return: [N, dim]
GGML_API struct ggml_tensor * ggml_timestep_embedding(
        struct ggml_context * ctx,
        struct ggml_tensor  * timesteps,
        int                   dim,
        int                   max_period);

Change to:

// return: [N, PAD(dim, 2)]

Btw, I'm not sure what the reasoning was for this padding logic. It does not seem to be present in the reference Python implementation. Maybe @leejet can clarify?
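
For reference, PAD(dim, 2) here just means rounding dim up to the nearest multiple of 2. A minimal sketch of that arithmetic (the PAD macro below is written out for illustration in the style of ggml's round-up macro; treat the exact name as an assumption):

#include <stdio.h>

// Round x up to the next multiple of n (n must be a power of two).
// Written out here for illustration; ggml defines a similar round-up macro.
#define PAD(x, n) (((x) + (n) - 1) & ~((n) - 1))

int main(void) {
    printf("PAD(14, 2) = %d\n", PAD(14, 2)); // 14: even dims are unchanged
    printf("PAD(15, 2) = %d\n", PAD(15, 2)); // 16: odd dims round up by one
    return 0;
}

With such a comment, a caller asking for dim = 15 would know to expect a [N, 16] tensor from the current implementation.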

@danbev
Member Author

danbev commented Sep 11, 2025

I forgot to add [no ci] so I manually cancelled the workflows, hence the "Some checks haven't completed yet".

@leejet
Contributor

leejet commented Sep 11, 2025

> I think we should add a comment to the declaration of this function to clarify that it will pad the resulting tensor if the requested dim is odd. So here:
>
> llama.cpp/ggml/include/ggml.h
>
> Lines 2123 to 2130 in 28b5f19
>
> // Ref: https://github.com/CompVis/stable-diffusion/blob/main/ldm/modules/diffusionmodules/util.py#L151
> // timesteps: [N,]
> // return: [N, dim]
> GGML_API struct ggml_tensor * ggml_timestep_embedding(
>         struct ggml_context * ctx,
>         struct ggml_tensor  * timesteps,
>         int                   dim,
>         int                   max_period);
>
> Change to:
>
> // return: [N, PAD(dim, 2)]
>
> Btw, I'm not sure what the reasoning was for this padding logic. It does not seem to be present in the reference Python implementation. Maybe @leejet can clarify?

It seems that the padding is incorrect behavior. I don't really remember why I added it in the first place; it was too long ago. Given the current situation, I suggest removing the padding.

@danbev
Member Author

danbev commented Sep 12, 2025

@leejet Thanks for the feedback!
I'll take a closer look at removing the padding next week.

@danbev
Member Author

danbev commented Sep 15, 2025

@ggerganov @leejet I've taken a closer look at this and I think that the padding might be required after all.

Looking at the reference Python implementation, we have the following:

import math

import torch
from einops import repeat


def timestep_embedding(timesteps, dim, max_period=10000, repeat_only=False):
    if not repeat_only:
        half = dim // 2
        freqs = torch.exp(
            -math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32) / half
        ).to(device=timesteps.device)
        args = timesteps[:, None].float() * freqs[None]
        embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
        if dim % 2:
            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
    else:
        embedding = repeat(timesteps, 'b -> b d', d=dim)
    return embedding

Specifically these lines:

        if dim % 2:                                                             
            embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)

The if dim % 2 check tests whether the dimension is odd; if so, an extra column of zeros is appended so that the final embedding has the requested dimension.

This padding is necessary because sinusoidal embeddings produce pairs of cosine/sine values. For dim=15, you get 7 complete pairs (14 values) plus 1 padding zero to reach the target dimension. Without this padding, the GGML implementation leaves uninitialized memory that can contain garbage values, which is what we observed in the test failures. I believe the padding in GGML is required to match this reference behavior.
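
To make the shapes concrete, here is a small standalone sketch in C of what the Python reference computes for one timestep (an illustration only, not ggml or any backend's kernel code): for dim = 15 it writes 7 cosine values, 7 sine values, and a single trailing zero, so the row is exactly 15 elements wide.

#include <math.h>
#include <stdio.h>

// Standalone illustration of the reference sinusoidal timestep embedding
// for a single timestep. The output row has exactly `dim` elements:
// cos values in [0, half), sin values in [half, 2*half), and one trailing
// zero when dim is odd.
static void timestep_embedding_ref(float t, int dim, int max_period, float * out) {
    const int half = dim / 2;
    for (int i = 0; i < half; ++i) {
        const float freq = expf(-logf((float) max_period) * (float) i / (float) half);
        out[i]        = cosf(t * freq);
        out[half + i] = sinf(t * freq);
    }
    if (dim % 2 != 0) {
        out[dim - 1] = 0.0f; // explicit zero pad; the row width stays at dim
    }
}

int main(void) {
    float emb[15];
    timestep_embedding_ref(3.0f, 15, 10000, emb);
    for (int i = 0; i < 15; ++i) {
        printf("%8.4f%c", emb[i], i + 1 < 15 ? ' ' : '\n');
    }
    return 0;
}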

@danbev danbev force-pushed the ggml-timestep-embedding-backends branch from 004eaba to d304f6b Compare September 16, 2025 04:20
@ggerganov
Member

@danbev I think the problem is that for an input dim=15 the Python code produces an output dimension of 15, while ggml produces 16. At least this is what I understand from reading the implementation in ggml.c:

llama.cpp/ggml/src/ggml.c

Lines 4919 to 4932 in 261e6a2

// ggml_timestep_embedding

struct ggml_tensor * ggml_timestep_embedding(
        struct ggml_context * ctx,
        struct ggml_tensor  * timesteps,
        int                   dim,
        int                   max_period) {
    int actual_dim = dim;
    if (dim % 2 != 0) {
        actual_dim = dim + 1;
    }

    struct ggml_tensor * result = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, actual_dim, timesteps->ne[0]);

@danbev
Member Author

danbev commented Sep 16, 2025

@ggerganov Oh right, I misunderstood the comment about removing the padding and took it to mean the padding in the kernels. It does indeed look like adding an extra dimension here is not needed. I'll update this and test it.
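
Concretely, the shape change under discussion amounts to dropping the round-up in ggml.c so that the result has exactly dim columns. A sketch against the lines quoted above (not the merged patch; each backend kernel also needs to handle the odd-dim zero column itself rather than assuming a padded width):

 struct ggml_tensor * ggml_timestep_embedding(
         struct ggml_context * ctx,
         struct ggml_tensor  * timesteps,
         int                   dim,
         int                   max_period) {
-    int actual_dim = dim;
-    if (dim % 2 != 0) {
-        actual_dim = dim + 1;
-    }
-
-    struct ggml_tensor * result = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, actual_dim, timesteps->ne[0]);
+    struct ggml_tensor * result = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, dim, timesteps->ne[0]);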

@danbev danbev force-pushed the ggml-timestep-embedding-backends branch from d304f6b to aa2c30c Compare September 16, 2025 09:42
@danbev danbev requested review from ggerganov and removed request for 0cc4m September 16, 2025 09:43
@danbev danbev merged commit 3913f87 into ggml-org:master Sep 16, 2025
47 of 48 checks passed
angt pushed a commit to angt/llama.cpp that referenced this pull request Sep 17, 2025
* ggml : remove adding extra dim timestep embedding

This commit updates the ggml_timestep_embedding function to no longer
add an extra dimension when the specified dimension is odd.

The motivation for this change is that this introduces an unnecessary
dimension when the dimension is odd, which caused an issue in the
kernels which were not expecting this extra dimension and it resulted in
uninitialized memory for the second to last dimension.

* ggml-cuda : fix padding in timestep embedding kernel

This commit removes the zeroing out of the last dimension now that we
are not adding the extra padding dimension.

* ggml-metal : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel.

* ggml-opencl : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel.

* ggml-sycl : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel.

* ggml-vulkan : fix padding in timestep embedding kernel

This commit fixes the zero padding for odd dimensions in
the timestep embedding kernel.

* ggml-cpu : fix padding in timestep embedding function

This commit removes the zeroing out of the last dimension now that we
are not adding the extra padding dimension.
@danbev danbev deleted the ggml-timestep-embedding-backends branch September 24, 2025 06:19