Replies: 3 comments 1 reply
-
Hi @cbe135, the input channel of the mask autoencoder is 8. This is mainly because we use a binary representation to encode the input mask, which saves memory: 8 channels can represent 2**8 = 256 labels (0 to 255), with each channel holding one bit (see `tutorials/generation/maisi/scripts/utils.py`, lines 175 to 190 at commit 8b90a16). For example, label 1 is encoded as [0, 0, 0, 0, 0, 0, 0, 1]. For your use case, are the labels of your dataset covered by the pre-defined label dict?
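For readers without the repo handy, here is a minimal sketch of what such a bit-wise encoding could look like. This is a hypothetical re-implementation for illustration only, not the actual MAISI code; the real functions live in `tutorials/generation/maisi/scripts/utils.py` and may differ in ordering, dtype, and API.

```python
import numpy as np

def encode_mask_binary(mask: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Encode an integer label mask into a multi-channel binary form.

    Each output channel holds one bit of the label value (most significant
    bit first), so num_bits channels can represent 2**num_bits labels
    (0..255 for 8 bits).
    """
    bits = [(mask >> b) & 1 for b in range(num_bits - 1, -1, -1)]
    return np.stack(bits, axis=0).astype(np.float32)

def decode_mask_binary(encoded: np.ndarray) -> np.ndarray:
    """Invert the binary encoding back to integer labels."""
    num_bits = encoded.shape[0]
    weights = 2 ** np.arange(num_bits - 1, -1, -1)
    return np.tensordot(weights, encoded.round(), axes=1).astype(np.int64)

mask = np.array([[0, 1], [5, 255]])
enc = encode_mask_binary(mask)  # shape (8, 2, 2); label 1 -> [0,0,0,0,0,0,0,1]
```

With this convention, a single-channel integer mask becomes an 8-channel binary tensor, which matches the example above where label 1 is encoded as [0, 0, 0, 0, 0, 0, 0, 1].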
-
Hi @guopengf, thank you.
-
While implementing our own mask autoencoder, we ran into two questions.
Specifically, our dataset's masks are single-channel, so we would use an input channel count of 1 and an output channel count of 1.
Our questions concern the different input and output channel settings used when pretraining the provided model weights.
The output channel count of 128 makes sense, since there are 128 possible label choices for organs and disease phenomena.
As for the input channel count, we were wondering: why is the input channel 7?
Also, for fine-tuning on our dataset, we were thinking of averaging/modifying the first and last layers of the model to match the desired input and output channels. Are there other recommended approaches?
Thank you.
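As a concrete reading of the averaging idea mentioned above, here is one way the first/last-layer adaptation could be sketched in PyTorch. The `nn.Sequential` model below is a hypothetical stand-in for the pretrained mask autoencoder (the real MAISI model and its layer names differ); only the weight-surgery pattern is the point.

```python
import torch
import torch.nn as nn

# Hypothetical stub standing in for a pretrained 8-in / 128-out autoencoder.
pretrained_in, pretrained_out = 8, 128
model = nn.Sequential(
    nn.Conv3d(pretrained_in, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(32, pretrained_out, kernel_size=3, padding=1),
)

# Adapt the first layer: average the pretrained kernels over the
# input-channel dimension so a 1-channel input reuses the learned filters.
old_first = model[0]
new_first = nn.Conv3d(1, old_first.out_channels,
                      kernel_size=old_first.kernel_size,
                      padding=old_first.padding)
with torch.no_grad():
    new_first.weight.copy_(old_first.weight.mean(dim=1, keepdim=True))
    new_first.bias.copy_(old_first.bias)
model[0] = new_first

# Replace the last layer for a 1-channel output. It is freshly initialized,
# since the pretrained 128-channel head has no direct mapping to 1 channel.
old_last = model[-1]
model[-1] = nn.Conv3d(old_last.in_channels, 1,
                      kernel_size=old_last.kernel_size,
                      padding=old_last.padding)

x = torch.randn(1, 1, 8, 8, 8)
y = model(x)  # shape (1, 1, 8, 8, 8)
```

Averaging preserves the pretrained filters' low-level structure for the new single-channel input, while the re-initialized head is typically learned from scratch during fine-tuning.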