Metrics: using batches for ARI and NMI (clustering_overlap) (#68)

seohyonkim · web-flow · commit e4f0b7c40335 · 2025-08-28T16:25:56.000+02:00
* working ari and nmi batch

* consistent naming

* add changelog
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,8 @@
 * Added `metrics/kbet_pg` and `metrics/kbet_pg_label` components (PR #52).
 * Added `method/drvi` component (PR #61).
 
+* Added `ARI_batch` and `NMI_batch` to `metrics/clustering_overlap` (PR #68).
+
 ## Minor changes
 
 * Un-pin the scPRINT version and update parameters (PR #51)
diff --git a/src/metrics/clustering_overlap/config.vsh.yaml b/src/metrics/clustering_overlap/config.vsh.yaml
@@ -49,6 +49,51 @@ info:
       min: 0
       max: 1
       maximize: true
+    - name: ari_batch
+      label: ARI_batch
+      summary: One minus the Adjusted Rand Index between batch labels and clusters, counting both correct
+        overlaps and correct disagreements; higher values indicate better batch mixing.
+      description: |
+        The Adjusted Rand Index (ARI) compares the overlap of two clusterings;
+        it counts both correct clustering overlaps and correct disagreements
+        between two clusterings. We compared the batch labels with the
+        NMI-optimized Louvain clustering computed on the integrated dataset.
+        The adjustment of the Rand index corrects for randomly correct labels.
+        ARI_batch is reported as 1 - ARI, so scores of 0 or 1 correspond to no batch
+        correction or well-corrected batches, respectively.
+      references:
+        doi:
+          - 10.1038/s41592-021-01336-8
+          - 10.1007/bf01908075
+      links:
+        homepage: https://scib.readthedocs.io/en/latest/
+        documentation: https://scib.readthedocs.io/en/latest/api/scib.metrics.ari.html
+        repository: https://github.com/theislab/scib
+      min: 0
+      max: 1
+      maximize: true
+    - name: nmi_batch
+      label: NMI_batch
+      summary: One minus the Normalized Mutual Information between batch labels and optimized Louvain
+        clusters, scaled by the mean entropy of both labelings; higher values indicate better batch mixing.
+      description: |
+        Normalized Mutual Information (NMI) compares the overlap of two clusterings.
+        We used NMI to compare the batch labels with Louvain clusters computed on
+        the integrated dataset. The mutual information was normalized by the mean of the
+        entropy terms for batch and cluster labels, and the resulting NMI was subtracted from 1.
+        Thus, NMI_batch scores of 0 or 1 correspond to no batch correction or well-corrected batches,
+        respectively. We performed optimized Louvain clustering for this metric to obtain the best outcome of batch correction.
+      references:
+        doi:
+          - 10.1145/2808797.2809344
+          - 10.1038/s41592-021-01336-8
+      links:
+        homepage: https://scib.readthedocs.io/en/latest/
+        documentation: https://scib.readthedocs.io/en/latest/api/scib.metrics.nmi.html
+        repository: https://github.com/theislab/scib
+      min: 0
+      max: 1
+      maximize: true
 arguments:
   - name: --resolutions
     type: double
diff --git a/src/metrics/clustering_overlap/script.py b/src/metrics/clustering_overlap/script.py
@@ -47,14 +47,20 @@
 print('Compute NMI score', flush=True)
 nmi_score = nmi(adata, cluster_key=cluster_key, label_key="cell_type")
 
+print('Compute ARI score with batches', flush=True)
+ari_batch_score = 1 - ari(adata, cluster_key=cluster_key, label_key="batch")
+
+print('Compute NMI score with batches', flush=True)
+nmi_batch_score = 1 - nmi(adata, cluster_key=cluster_key, label_key="batch")
+
 print("Create output AnnData object", flush=True)
 output = ad.AnnData(
     uns={
         "dataset_id": adata.uns['dataset_id'],
         'normalization_id': adata.uns['normalization_id'],
         "method_id": adata.uns['method_id'],
-        "metric_ids": [ "ari", "nmi" ],
-        "metric_values": [ ari_score, nmi_score ]
+        "metric_ids": [ "ari", "nmi", "ari_batch", "nmi_batch" ],
+        "metric_values": [ ari_score, nmi_score, ari_batch_score, nmi_batch_score ]
     }
 )
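`script.py` scores batch mixing as one minus the ARI and one minus the NMI between the cluster labels and the `batch` labels. The sketch below reproduces that arithmetic in plain Python so the direction of the scores is easy to check. It is a minimal illustration, not the implementation used here: it computes the textbook ARI and an arithmetic-mean NMI directly instead of calling the `ari`/`nmi` helpers imported by `script.py`, and the function names (`adjusted_rand`, `normalized_mutual_info`, `batch_scores`) and toy labels are hypothetical. scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` (default `average_method="arithmetic"`) compute the same two quantities.

```python
from collections import Counter
from math import comb, log


def _contingency(labels_a, labels_b):
    """Return joint (a, b) counts and the per-label marginal counts."""
    return Counter(zip(labels_a, labels_b)), Counter(labels_a), Counter(labels_b)


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand Index computed from the pair-counting contingency table."""
    n = len(labels_a)
    joint, a_counts, b_counts = _contingency(labels_a, labels_b)
    index = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())
    expected = sum_a * sum_b / comb(n, 2)  # index expected under random labelling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both labelings are identical and trivial
        # (one cluster, or all singletons); treat as perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


def _entropy(counts, n):
    return -sum((c / n) * log(c / n) for c in counts.values())


def normalized_mutual_info(labels_a, labels_b):
    """NMI: mutual information divided by the arithmetic mean of both entropies."""
    n = len(labels_a)
    joint, a_counts, b_counts = _contingency(labels_a, labels_b)
    mutual_info = sum(
        (c / n) * log(c * n / (a_counts[a] * b_counts[b]))
        for (a, b), c in joint.items()
    )
    mean_entropy = (_entropy(a_counts, n) + _entropy(b_counts, n)) / 2
    if mean_entropy == 0:
        # Both labelings put every cell in one group: identical partitions.
        return 1.0
    return mutual_info / mean_entropy


def batch_scores(clusters, batches):
    """Hypothetical helper: (1 - ARI, 1 - NMI) of clusters against batch labels."""
    return (
        1 - adjusted_rand(clusters, batches),
        1 - normalized_mutual_info(clusters, batches),
    )


if __name__ == "__main__":
    batches = ["b0", "b0", "b1", "b1"]
    # Clusters that reproduce the batches exactly: no batch correction.
    print(batch_scores(["c0", "c0", "c1", "c1"], batches))
    # Each cluster holds one cell from each batch: batches fully mixed.
    print(batch_scores(["c0", "c1", "c0", "c1"], batches))
```

In this toy example the first call gives scores of about 0 for both metrics. The second gives an NMI_batch of 1 but an ARI_batch of about 1.5: ARI drops below 0 when two labelings agree less than expected by chance, so on inputs this small 1 - ARI is not strictly bounded by the `max: 1` declared in the config.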