Support for multi-class probability in extrapolators #75

@kiranvad

Description

The extrapolator classes in AFL.double_agent.Extrapolator squash the multi-class probabilities down to a single value per grid point:

y_prob.sum(axis=-1), dims=self.grid_dim

This behavior is limiting when we want to access the full multi-class probabilities for downstream computations or visualization. To return these probabilities in an xr.DataArray, we need to introduce dummy dimensions for the multiple classes. For example:

self.output[self._prefix_output("y_prob")] = xr.DataArray(
    probabilities.detach().numpy(),
    dims=(self.grid_dim, self._prefix_output("n_classes")),
)

However, this approach breaks when the number of classes changes between iterations: once the labeler discovers a new class, the size of the class dimension no longer matches what is already stored. Because the PipelineOp runs the following after .calculate,

dataset1 = op.add_to_dataset(dataset1, copy_dataset=False)

it throws an error:

ValueError: cannot reindex or align along dimension 'phase_n_classes' because of conflicting dimension sizes:

This happens because, upon discovering a new phase, the dimension phase_n_classes gains an extra entry, while the existing dataset still has fewer entries.
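The size conflict can be reproduced with xarray alone, outside of AFL. The snippet below is a minimal sketch, not the actual `add_to_dataset` code path: it assumes that adding an op's output to the dataset amounts to an xarray merge, and the helper `make_probs` and the variable name `phase_y_prob` are made up for illustration.

```python
import numpy as np
import xarray as xr


def make_probs(n_grid, n_classes, dim="phase_n_classes"):
    """Hypothetical helper: uniform (grid, n_classes) probabilities, no class coordinate."""
    probs = np.full((n_grid, n_classes), 1.0 / n_classes)
    return xr.DataArray(probs, dims=("grid", dim), name="phase_y_prob")


# Iteration 1: the labeler knows two phases.
dataset = make_probs(n_grid=4, n_classes=2).to_dataset()

# Iteration 2: a third phase is discovered, so the class dimension grows.
new_probs = make_probs(n_grid=4, n_classes=3)

try:
    # Assumption: this stands in for what add_to_dataset does internally.
    xr.merge([dataset, new_probs.to_dataset()])
    merge_failed = False
except ValueError as err:
    merge_failed = True
    print(f"merge failed: {err}")
```

Because `phase_n_classes` has no coordinate labels, xarray cannot reindex the two arrays onto a common axis and can only compare sizes, which is why the mismatch surfaces as a `ValueError`.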

This seems like a general issue that hasn’t shown up in earlier AFL pipelines because they use clustering as the labeler, where the number of phases is effectively fixed in advance. Supporting the more general case, where phases can be discovered dynamically, would make the framework more robust. Pipelines that know all possible phases at the start would fit naturally into this more general solution.
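One possible direction, sketched here with plain xarray only, is to drop any stored variables along a dimension whose size has changed before writing the new output. This is an assumption about what a fix could look like, not existing AFL behavior; `drop_conflicting_dims` and the example variable names are hypothetical.

```python
import numpy as np
import xarray as xr


def drop_conflicting_dims(dataset, new_array):
    """Hypothetical helper: drop dims (and their variables) whose size differs from new_array."""
    stale = [
        dim
        for dim, size in new_array.sizes.items()
        if dim in dataset.sizes and dataset.sizes[dim] != size
    ]
    # drop_dims returns a new Dataset without the listed dims and their variables.
    return dataset.drop_dims(stale) if stale else dataset


# Dataset from a previous iteration with two known phases.
dataset = xr.Dataset(
    {
        "composition": ("grid", np.linspace(0.0, 1.0, 4)),
        "phase_y_prob": (("grid", "phase_n_classes"), np.full((4, 2), 0.5)),
    }
)

# New output after a third phase has been discovered.
new_probs = xr.DataArray(
    np.full((4, 3), 1.0 / 3.0), dims=("grid", "phase_n_classes")
)

updated = drop_conflicting_dims(dataset, new_probs)
updated["phase_y_prob"] = new_probs
print(dict(updated.sizes))
```

The trade-off is that every variable stored along the stale dimension is discarded, so this only suits outputs that are fully recomputed each iteration. An alternative would be to attach class labels as a coordinate so xarray can align by label rather than by size.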
