Use of `np.float16` causes `ValueError` due to sparse array conversion

In `graphium/features/featurizer.py`, line 634: 
```py
def mol_to_adj_and_features(
    mol: Union[str, dm.Mol],
    atom_property_list_onehot: List[str] = [],
    atom_property_list_float: List[Union[str, Callable]] = [],
    conformer_property_list: List[str] = [],
    edge_property_list: List[str] = [],
    add_self_loop: bool = False,
    explicit_H: bool = False,
    use_bonds_weights: bool = False,
    pos_encoding_as_features: Dict[str, Any] = None,
    dtype: np.dtype = np.float16,
    mask_nan: Union[str, float, type(None)] = "raise",
) -> Union[
    coo_matrix,
    Union[Tensor, None],
    Union[Tensor, None],
    Dict[str, Tensor],
    Union[Tensor, None],
    Dict[str, Tensor],
]:
```
`graphium` uses `np.float16` as the default `dtype` for this function. However, `mol_to_adj_and_features` calls
```py
def mol_to_adjacency_matrix(
    mol: dm.Mol,
    use_bonds_weights: bool = False,
    add_self_loop: bool = False,
    dtype: np.dtype = np.float32,
) -> coo_matrix:
```
(line 791), which has a default `dtype` of `np.float32`, so the `np.float16` default of `mol_to_adj_and_features` is presumably what gets passed down.

The problem is that in `mol_to_adjacency_matrix`, the adjacency matrix is converted to a sparse array using the given `dtype`:
```py
if len(adj_val) > 0:  # ensure tensor is not empty
    adj = coo_matrix(
        (torch.as_tensor(adj_val), torch.as_tensor(adj_idx).T.reshape(2, -1)),
        shape=(mol.GetNumAtoms(), mol.GetNumAtoms()),
        dtype=dtype,
    )
```

In my environment, this raises:
```
ValueError: scipy.sparse does not support dtype float16. The only supported types are: bool, int8, uint8, int16, uint16, int32, uint32, int64, uint64, longlong, ulonglong, float32, float64, longdouble, complex64, complex128, clongdouble.
```
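For reference, here is a minimal sketch that should reproduce the error without `graphium` or `torch`. The helper name `float16_sparse_error` is my own and not part of any library, and whether the exception is raised depends on the installed SciPy version (older releases may accept the dtype at construction time), so the function returns `None` in that case.

```python
import numpy as np
from scipy.sparse import coo_matrix


def float16_sparse_error():
    """Return the ValueError message if SciPy rejects float16, else None.

    Hypothetical helper for illustration only.
    """
    data = np.ones(2, dtype=np.float32)
    row = np.array([0, 1])
    col = np.array([1, 0])
    try:
        # Same pattern as in mol_to_adjacency_matrix: a float16 dtype is
        # requested when building the sparse adjacency matrix.
        coo_matrix((data, (row, col)), shape=(2, 2), dtype=np.float16)
    except ValueError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(float16_sparse_error())
```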

As far as I know, this has been discussed in SciPy (https://github.com/scipy/scipy/issues/7408), and in recent versions the dtype checks have become stricter (https://github.com/scipy/scipy/issues/20207).

I believe this can be fixed simply by using `np.float32` as the default instead. However, if using small dtypes for memory efficiency is critical, a workaround would be more complicated.
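One possible shape for such a workaround, as a rough sketch only: promote `float16` to `float32` just for the sparse matrix and leave the dtype of everything else alone. The names `sparse_safe_dtype` and `build_adjacency` are hypothetical and do not exist in `graphium`, and I use plain NumPy arrays here instead of `torch` tensors to keep the example self-contained.

```python
import numpy as np
from scipy.sparse import coo_matrix


def sparse_safe_dtype(dtype):
    """Promote float16 to float32; return any other dtype unchanged."""
    dtype = np.dtype(dtype)
    if dtype == np.float16:
        return np.dtype(np.float32)
    return dtype


def build_adjacency(adj_val, adj_idx, num_atoms, dtype=np.float16):
    """Build a COO adjacency matrix with a dtype that scipy.sparse accepts.

    adj_idx is a sequence of (row, col) pairs, adj_val the matching values.
    """
    safe = sparse_safe_dtype(dtype)
    idx = np.asarray(adj_idx, dtype=np.int64).reshape(-1, 2).T
    val = np.asarray(adj_val, dtype=safe)
    return coo_matrix(
        (val, (idx[0], idx[1])),
        shape=(num_atoms, num_atoms),
        dtype=safe,
    )


if __name__ == "__main__":
    # A single bond between atoms 0 and 1, stored in both directions.
    adj = build_adjacency([1.0, 1.0], [[0, 1], [1, 0]], num_atoms=2)
    print(adj.dtype, adj.toarray())
```

With this approach the adjacency matrix costs twice the memory of a `float16` one, but the node and edge feature tensors could still be kept in `float16`, which is probably where most of the memory goes.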
