Use of np.float16 causes ValueError due to sparse array conversion #539

@gratus907

In graphium/features/featurizer.py, line 634:

def mol_to_adj_and_features(
    mol: Union[str, dm.Mol],
    atom_property_list_onehot: List[str] = [],
    atom_property_list_float: List[Union[str, Callable]] = [],
    conformer_property_list: List[str] = [],
    edge_property_list: List[str] = [],
    add_self_loop: bool = False,
    explicit_H: bool = False,
    use_bonds_weights: bool = False,
    pos_encoding_as_features: Dict[str, Any] = None,
    dtype: np.dtype = np.float16,
    mask_nan: Union[str, float, type(None)] = "raise",
) -> Union[
    coo_matrix,
    Union[Tensor, None],
    Union[Tensor, None],
    Dict[str, Tensor],
    Union[Tensor, None],
    Dict[str, Tensor],
]:

graphium uses np.float16 as the default dtype for mol_to_adj_and_features. However, this function calls

def mol_to_adjacency_matrix(
    mol: dm.Mol,
    use_bonds_weights: bool = False,
    add_self_loop: bool = False,
    dtype: np.dtype = np.float32,
) -> coo_matrix:

(line 791), whose default dtype is np.float32.

The problem is that mol_to_adjacency_matrix converts the adjacency matrix to a scipy sparse matrix using the dtype it is given:

if len(adj_val) > 0:  # ensure tensor is not empty
    adj = coo_matrix(
        (torch.as_tensor(adj_val), torch.as_tensor(adj_idx).T.reshape(2, -1)),
        shape=(mol.GetNumAtoms(), mol.GetNumAtoms()),
        dtype=dtype,
    )

In my environment, this raises:

ValueError: scipy.sparse does not support dtype float16. The only supported types are: bool, int8, uint8, int16, uint16, int32, uint32, int64, uint64, longlong, ulonglong, float32, float64, longdouble, complex64, complex128, clongdouble.
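For reference, here is a minimal sketch that should reproduce the behavior outside graphium. It uses only numpy and scipy; the toy 2-node graph and the helper name build_adjacency are made up for illustration and are not part of graphium. Whether the float16 call fails depends on the installed scipy version, so the sketch records the outcome instead of assuming it.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy 2-node graph with one undirected edge (illustrative data only).
vals = np.array([1.0, 1.0], dtype=np.float32)
rows = np.array([0, 1])
cols = np.array([1, 0])


def build_adjacency(dtype):
    """Hypothetical helper: build a COO adjacency matrix with the requested dtype."""
    return coo_matrix((vals, (rows, cols)), shape=(2, 2), dtype=dtype)


# float32 is in scipy.sparse's list of supported dtypes.
adj32 = build_adjacency(np.float32)

# float16 is not in that list. Recent scipy releases raise ValueError here;
# older releases may accept it, so capture the outcome either way.
try:
    build_adjacency(np.float16)
    float16_error = None
except ValueError as err:
    float16_error = err

print("float32 result dtype:", adj32.dtype)
print("float16 error:", float16_error)
```

If float16_error is not None, the message should match the ValueError quoted above.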

As far as I know, this was discussed in scipy (scipy/scipy#7408), and recent versions enforce the check more strictly (scipy/scipy#20207).

I believe this can be fixed simply by using np.float32 instead. However, if small dtypes are critical for memory efficiency, a workaround would be more complicated.
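One possible shape for such a fix, as a rough sketch only: promote float16 to float32 just for the sparse matrix, and cast back to float16 after densifying if half precision is still wanted downstream. The helpers sparse_safe_dtype and build_adjacency are hypothetical names, not graphium functions, and this is not a proposed patch to the actual code.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sparse_safe_dtype(dtype):
    """Hypothetical helper: map float16 to float32, leave other dtypes unchanged."""
    dtype = np.dtype(dtype)
    return np.dtype(np.float32) if dtype == np.float16 else dtype


def build_adjacency(values, rows, cols, num_nodes, dtype=np.float16):
    """Hypothetical helper: build a COO adjacency matrix with a scipy-supported dtype."""
    safe = sparse_safe_dtype(dtype)
    return coo_matrix(
        (np.asarray(values, dtype=safe), (np.asarray(rows), np.asarray(cols))),
        shape=(num_nodes, num_nodes),
        dtype=safe,
    )


# The caller still asks for float16 (the default), but the sparse matrix is float32.
adj = build_adjacency([1.0, 1.0], [0, 1], [1, 0], num_nodes=2)

# Dense numpy arrays do support float16, so the cast can happen after densifying.
dense_half = adj.toarray().astype(np.float16)

print(adj.dtype, dense_half.dtype)
```

The trade-off is that the sparse adjacency matrix itself takes float32 storage; only the dense arrays derived from it can be kept in float16.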
