self.t_emb_layers = nn.ModuleList([
    nn.Sequential(
        nn.SiLU(),
        nn.Linear(t_emb_dim, out_channels)
    )
    for _ in range(num_layers)
])
Shouldn't it be the following instead?
self.t_emb_layers = nn.ModuleList([
    nn.Sequential(
        nn.Linear(t_emb_dim, out_channels),
        nn.SiLU(),
        nn.Linear(out_channels, out_channels)
    )
    for _ in range(num_layers)
])
I checked unet_2d.py from Hugging Face, and it also has two linear (FC) layers. Let me know if I am missing something.
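
To make the difference concrete, here is a minimal sketch that builds both variants side by side and compares their output shapes and parameter counts. The sizes (`t_emb_dim=128`, `out_channels=64`, `num_layers=2`) and the helper names are assumptions chosen for illustration, not values from the repository.

```python
import torch
from torch import nn

# Illustrative sizes only (assumed, not taken from the original code).
t_emb_dim, out_channels, num_layers = 128, 64, 2


def single_linear_layers():
    # Variant currently in the code: SiLU followed by one projection.
    return nn.ModuleList([
        nn.Sequential(nn.SiLU(), nn.Linear(t_emb_dim, out_channels))
        for _ in range(num_layers)
    ])


def two_linear_layers():
    # Proposed variant: Linear -> SiLU -> Linear.
    return nn.ModuleList([
        nn.Sequential(
            nn.Linear(t_emb_dim, out_channels),
            nn.SiLU(),
            nn.Linear(out_channels, out_channels),
        )
        for _ in range(num_layers)
    ])


def count_params(module):
    # Total number of learnable scalars in a module.
    return sum(p.numel() for p in module.parameters())


current = single_linear_layers()
proposed = two_linear_layers()

t_emb = torch.randn(4, t_emb_dim)  # dummy batch of 4 time embeddings
out_current = current[0](t_emb)
out_proposed = proposed[0](t_emb)

print(out_current.shape, out_proposed.shape)
print(count_params(current[0]), count_params(proposed[0]))
```

Both variants map the time embedding to `out_channels`, so either output can be added to the feature map in the same way. The two-layer version just adds one extra `out_channels x out_channels` projection per block.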