
Commit 6d26749

Merge pull request #326 from cvs-health/patch/v0.5.3
Patch release: `v0.5.3`
2 parents 7acd188 + 0d9d93e commit 6d26749

File tree

10 files changed: +1010 −106 lines


docs/source/_notebooks/examples/long_text_graph_demo.ipynb

Lines changed: 6 additions & 4 deletions

@@ -12,10 +12,12 @@
 " of how to use these methods with <code>uqlm</code>. The available scorers and papers from which they are adapted are below:\n",
 " </p>\n",
 " \n",
-"* Long-text Uncertainty Quantification (LUQ) ([Zhang et al., 2024](https://arxiv.org/abs/2403.20279))\n",
-"* LUQ-Atomic ([Zhang et al., 2024](https://arxiv.org/abs/2403.20279))\n",
-"* LUQ-pair ([Zhang et al., 2024](https://arxiv.org/abs/2403.20279))\n",
-"* Generalized LUQ-pair ([Zhang et al., 2024](https://arxiv.org/abs/2403.20279))\n",
+"* Closeness Centrality ([Jiang et al., 2024](https://arxiv.org/abs/2410.20783))\n",
+"* Betweenness Centrality ([Jiang et al., 2024](https://arxiv.org/abs/2410.20783))\n",
+"* PageRank ([Jiang et al., 2024](https://arxiv.org/abs/2410.20783))\n",
+"* Degree Centrality ([Zhang et al., 2024](https://arxiv.org/abs/2403.20279))\n",
+"* Harmonic Centrality\n",
+"* Laplacian Centrality\n",
 "\n",
 "</div>\n",
 "\n",

docs/source/scorer_definitions/long_text/graph.rst

Lines changed: 2 additions & 2 deletions

@@ -11,7 +11,7 @@ Graph-based scorers, proposed by Jiang et al. (2024), decompose original and sam

 * **Degree Centrality** - :math:`\frac{1}{m} \sum_{j=1}^m P(\text{entail}|y_j, s)` is the average edge weight, measured by entailment probability for claim node `s`.

-* **Betweenness Centrality** - :math:`\frac{1}{B_{\text{max}}}\sum_{u \neq v \neq s} \frac{\sigma_{uv}(s)}{\sigma_{uv}}` measures uncertainty by calculating the proportion of shortest paths between node pairs that pass through node :math:`s`, where :math:`\sigma_{uv}` represents all shortest paths between nodes :math:`u` and :math:`v`, and :math:`B_{\text{max}}` is the maximum possible value, given by :math:`B_{\text{max}}=\frac{1}{2} [m^2 (p + 1)^2 + m (p + 1)(2t - p - 1) - t (2p - t + 3)]`, `p = \frac{(|\mathbf{s}| - 1)}{m}`, and `t = (|\mathbf{s}| - 1) \mod m`.
+* **Betweenness Centrality** - :math:`\frac{1}{B_{\text{max}}}\sum_{u \neq v \neq s} \frac{\sigma_{uv}(s)}{\sigma_{uv}}` measures uncertainty by calculating the proportion of shortest paths between node pairs that pass through node :math:`s`, where :math:`\sigma_{uv}` represents all shortest paths between nodes :math:`u` and :math:`v`, and :math:`B_{\text{max}}` is the maximum possible value, given by :math:`B_{\text{max}}=\frac{1}{2} [m^2 (p + 1)^2 + m (p + 1)(2t - p - 1) - t (2p - t + 3)]`, :math:`p = \frac{(|\mathbf{s}| - 1)}{m}`, and :math:`t = (|\mathbf{s}| - 1) \mod m`.

 * **Closeness Centrality** - :math:`\frac{m + 2(|\mathbf{s}| - 1) }{\sum_{v \neq s}dist(s, v)}` measures the inverse sum of distances to all other nodes, normalized by the minimum possible distance.

@@ -27,7 +27,7 @@ where :math:`\mathbf{y}^{(s)}_{\text{cand}} = \{y_1^{(s)}, ..., y_m^{(s)}\}` are

 **Key Properties:**

 - Claim or sentence-level scoring
-- Less complex (cost and latency) than other long-form scoring methods
+- More complex (cost and latency) than LUQ-style scoring methods
 - Score range: :math:`[0, 1]`

 How It Works
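The degree-centrality definition above (mean entailment probability over a claim node's edges) and the shortest-path centralities can be illustrated on a toy graph. This is a minimal sketch, not the uqlm implementation: the edge weights are made-up placeholder values rather than real NLI outputs, the node names are hypothetical, and `networkx` is used with its own normalization conventions, which differ from the :math:`B_{\text{max}}` and closeness normalizations given in the text.

```python
import networkx as nx

# Toy entailment probabilities between the claim node "s" and candidate
# response nodes -- placeholder values, not output of a real NLI model.
edges = {
    ("s", "y1"): 0.9,
    ("s", "y2"): 0.7,
    ("s", "y3"): 0.8,
    ("y1", "y2"): 0.5,
}

G = nx.Graph()
for (u, v), p in edges.items():
    # Edge weight = entailment probability; use 1 - p as a distance so
    # that strongly entailed pairs are "close" for path-based centralities.
    G.add_edge(u, v, weight=p, distance=1.0 - p)

# Degree centrality of claim node "s": average incident edge weight,
# matching (1/m) * sum_j P(entail | y_j, s) from the definition above.
m = G.degree("s")
degree_score = sum(G["s"][v]["weight"] for v in G.neighbors("s")) / m
print(round(degree_score, 3))  # mean of 0.9, 0.7, 0.8 -> 0.8

# Shortest-path centralities on the same graph (networkx applies its own
# normalization, not the B_max / minimum-distance constants in the text).
betweenness = nx.betweenness_centrality(G, weight="distance")["s"]
closeness = nx.closeness_centrality(G, distance="distance")["s"]
```

With fractional distances, `closeness_centrality` can exceed 1; the text's scorers rescale such quantities into :math:`[0, 1]`.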

docs/source/scorer_definitions/long_text/luq.rst

Lines changed: 2 additions & 2 deletions

@@ -6,13 +6,13 @@ Long-Text Uncertainty Quantification (LUQ)
 Definition
 ----------

-The Long-text UQ (LUQ) approach demonstrated here is adapted from Zhang et al. (2024). Similar to standard black-box UQ, this approach requires generating a original response and sampled candidate responses to the same prompt. The original response is then decomposed into units (claims or sentences). Unit-level confidence scores are then obtained by averaging entailment probabilities across candidate responses:
+The Long-text UQ (LUQ) approach demonstrated here is adapted from Zhang et al. (2024). Similar to standard black-box UQ, this approach requires generating an original response and sampled candidate responses to the same prompt. The original response :math:`y` is then decomposed into units (claims or sentences). A confidence score for each unit :math:`s` is then obtained by averaging entailment probabilities across candidate responses:

 .. math::

     c_g(s; \mathbf{y}_{\text{cand}}) = \frac{1}{m} \sum_{j=1}^m P(\text{entail}|y_j, s)

-where :math:`\mathbf{y}^{(s)}_{\text{cand}} = {y_1^{(s)}, ..., y_m^{(s)}}` are :math:`m` candidate responses, and :math:`P(\text{entail}|y_j, s)` denotes the NLI-estimated probability that :math:`s` is entailed in :math:`y_j`.
+where :math:`\mathbf{y}^{(s)}_{\text{cand}} = \{y_1^{(s)}, ..., y_m^{(s)}\}` are :math:`m` candidate responses, and :math:`P(\text{entail}|y_j, s)` denotes the NLI-estimated probability that :math:`s` is entailed in :math:`y_j`.

 **Key Properties:**
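The unit-level formula :math:`c_g(s; \mathbf{y}_{\text{cand}}) = \frac{1}{m} \sum_{j=1}^m P(\text{entail}|y_j, s)` is just an average, which a few lines of Python make concrete. This is a sketch, not the uqlm API: `luq_confidence` and `entail_prob` are hypothetical names, and the stub scores stand in for a real NLI model.

```python
def luq_confidence(claim, candidates, entail_prob):
    """Average NLI entailment probability of `claim` across candidates:
    c_g(s; y_cand) = (1/m) * sum_j P(entail | y_j, s)."""
    return sum(entail_prob(y_j, claim) for y_j in candidates) / len(candidates)

# Stub entailment scores standing in for a real NLI model's outputs.
scores = {"y1": 0.95, "y2": 0.40, "y3": 0.75}
conf = luq_confidence(
    "Paris is in France.",
    ["y1", "y2", "y3"],
    lambda y, s: scores[y],
)
print(round(conf, 2))  # (0.95 + 0.40 + 0.75) / 3 -> 0.7
```

A unit contradicted by most sampled candidates (low entailment probabilities) thus receives a low confidence score, flagging a likely hallucination.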

docs/source/scorer_definitions/long_text/qa.rst

Lines changed: 2 additions & 2 deletions

@@ -6,7 +6,7 @@ QA-Based Uncertainty Quantification (LUQ)
 Definition
 ----------

-The Claim-QA approach demonstrated here is adapted from Farquhar et al. (2024). It uses an LLM to convert each unit (sentence or claim) into a question for which that unit would be the answer. The method measures consistency across multiple responses to these questions, effectively applying standard black-box uncertainty quantification to those sampled responses to the unit questions. Formally, a claim-QA scorer :math:`c_g(s;\cdot)` is defined as follows:
+The Claim-QA approach demonstrated here is adapted from Farquhar et al. (2024). The original response :math:`y` is decomposed into units (claims or sentences) and an LLM is used to convert each unit :math:`s` into a question for which that unit would be the answer. The method measures consistency across multiple responses to these questions, effectively applying standard black-box uncertainty quantification to those sampled responses to the unit questions. Formally, a claim-QA scorer :math:`c_g(s;\cdot)` is defined as follows:

 .. math::

@@ -17,7 +17,7 @@ where :math:`y_0^{(s)}` is the original unit response, :math:`\mathbf{y}^{(s)}_{

 **Key Properties:**

 - Claim or sentence-level scoring
-- Less complex (cost and latency) than other long-form scoring methods
+- More complex (cost and latency) than LUQ-style scoring methods
 - Score range: :math:`[0, 1]`

 How It Works
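The Claim-QA pipeline described above (unit → generated question → sampled answers → consistency score) can be sketched as follows. This is a hedged illustration, not the uqlm or Farquhar et al. implementation: `claim_qa_score`, `ask_question`, `answer`, and `match_prob` are hypothetical stand-ins for LLM and NLI calls.

```python
def claim_qa_score(unit, ask_question, answer, match_prob, m=3):
    """Claim-QA sketch: turn `unit` into a question, sample m answers to
    that question, and average how well each answer matches the unit."""
    question = ask_question(unit)                    # LLM call in practice
    answers = [answer(question) for _ in range(m)]   # sampled LLM responses
    # Black-box consistency: mean match/entailment score vs. the unit.
    return sum(match_prob(unit, a) for a in answers) / m

# Stub callables standing in for real LLM / NLI calls.
score = claim_qa_score(
    "The Eiffel Tower is 330 m tall.",
    ask_question=lambda u: "How tall is the Eiffel Tower?",
    answer=lambda q: "330 metres",
    match_prob=lambda u, a: 0.9,
)
print(round(score, 2))  # 0.9 with these stub scores
```

In practice any standard black-box consistency scorer (e.g. pairwise NLI or semantic similarity) can fill the `match_prob` role, which is why the method inherits the cost profile of multiple generations per unit.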

examples/README.md

Lines changed: 17 additions & 17 deletions

@@ -10,35 +10,35 @@ The notebooks are organized into core methods, long-form techniques, and advance

 | Tutorial | Great fit for... | LLM Compatibility | Added Cost/Latency |
 |----------|-------------|-------------------|--------------|
-| [Black-Box UQ](black_box_demo.ipynb) | Quick setup with any LLM; no need for model internals | All LLMs (API-only access) | Medium-High (multiple generations and comparisons) |
-| [White-Box UQ (Single-Generation)](white_box_single_generation_demo.ipynb) | Fastest and most efficient UQ when you have token probabilities | Requires token probability access | Negligible (single generation) |
-| [White-Box UQ (Multi-Generation)](white_box_multi_generation_demo.ipynb) | Higher accuracy UQ when compute budget allows | Requires token probability access | Medium-High (multiple generations) |
-| [LLM-as-a-Judge](judges_demo.ipynb) | Leveraging one or more LLMs to assess hallucination likelihood | All LLMs (API-only access) | Low-Medium (depends on which judge(s)) |
-| [Train a UQ Ensemble](ensemble_tuning_demo.ipynb) | Maximizing performance by combining multiple UQ methods | Depends on ensemble components | Low-High (depends on selected components) |
+| [Black-Box UQ](https://github.com/cvs-health/uqlm/blob/main/examples/black_box_demo.ipynb) | Quick setup with any LLM; no need for model internals | All LLMs (API-only access) | Medium-High (multiple generations and comparisons) |
+| [White-Box UQ (Single-Generation)](https://github.com/cvs-health/uqlm/blob/main/examples/white_box_single_generation_demo.ipynb) | Fastest and most efficient UQ when you have token probabilities | Requires token probability access | Negligible (single generation) |
+| [White-Box UQ (Multi-Generation)](https://github.com/cvs-health/uqlm/blob/main/examples/white_box_multi_generation_demo.ipynb) | Higher accuracy UQ when compute budget allows | Requires token probability access | Medium-High (multiple generations) |
+| [LLM-as-a-Judge](https://github.com/cvs-health/uqlm/blob/main/examples/judges_demo.ipynb) | Leveraging one or more LLMs to assess hallucination likelihood | All LLMs (API-only access) | Low-Medium (depends on which judge(s)) |
+| [Train a UQ Ensemble](https://github.com/cvs-health/uqlm/blob/main/examples/ensemble_tuning_demo.ipynb) | Maximizing performance by combining multiple UQ methods | Depends on ensemble components | Low-High (depends on selected components) |

 ### Tutorials for Long-Form Uncertainty Quantification Methods (for long-text outputs)

 | Tutorial | Great fit for... | LLM Compatibility | Added Cost/Latency |
 |----------|-------------|-------------------|--------------|
-| [LUQ method](luq_demo.ipynb) | Detecting claim-level hallucinations in long-form text without model internals | All LLMs (API-only access) | Medium-High (operates over all claims/sentences in original response) |
-| [Graph-based method](graph_based_demo.ipynb) | Analyzing claim relationships in complex responses | All LLMs (API-only access) | Very High (operates over all claims/sentences in original response and sampled responses) |
-| [Generalized Long-form semantic entropy](long_form_semantic_entropy_demo.ipynb) | Reflexlive, detailed approach to claim-level hallucination detection | All LLMs (API-only access) | High (operates over all claims/sentences in original response) |
+| [LUQ method](https://github.com/cvs-health/uqlm/blob/main/examples/long_text_uq_demo.ipynb) | Detecting claim-level hallucinations in long-form text without model internals | All LLMs (API-only access) | Medium-High (operates over all claims/sentences in original response) |
+| [Graph-based method](https://github.com/cvs-health/uqlm/blob/main/examples/long_text_graph_demo.ipynb) | Analyzing claim relationships in complex responses | All LLMs (API-only access) | Very High (operates over all claims/sentences in original response and sampled responses) |
+| [Generalized Long-form semantic entropy](https://github.com/cvs-health/uqlm/blob/main/examples/long_text_qa_demo.ipynb) | Reflexive, detailed approach to claim-level hallucination detection | All LLMs (API-only access) | High (operates over all claims/sentences in original response) |

 ### Other Tutorials and SOTA Method Examples

 | Tutorial | Great fit for... | LLM Compatibility | Added Cost/Latency |
 |----------|-------------|-------------------|--------------|
-| [Multimodal UQ](multimodal_demo.ipynb) | Uncertainty quantification with image+text inputs | Requires image-to-text model | Varies by method |
-| [Score Calibration](score_calibration_demo.ipynb) | Converting raw scores to calibrated probabilities as a postprocessing step | Works with any UQ method | Negligible |
-| [Semantic Entropy](semantic_entropy_demo.ipynb) | State-of-the-art UQ when token probabilities are available | Requires token probability access | Medium-High (multiple generations and comparisons) |
-| [Semantic Density](semantic_density_demo.ipynb) | Newest SOTA method for high-accuracy UQ | Requires token probability access | Medium-High (multiple generations and comparisons) |
-| [BS Detector Off-the-Shelf UQ Ensemble](ensemble_off_the_shelf_demo.ipynb) | Ready-to-use ensemble without training | Depends on ensemble components | Medium-High (multiple generations and comparisons) |
+| [Multimodal UQ](https://github.com/cvs-health/uqlm/blob/main/examples/multimodal_demo.ipynb) | Uncertainty quantification with image+text inputs | Requires image-to-text model | Varies by method |
+| [Score Calibration](https://github.com/cvs-health/uqlm/blob/main/examples/score_calibration_demo.ipynb) | Converting raw scores to calibrated probabilities as a postprocessing step | Works with any UQ method | Negligible |
+| [Semantic Entropy](https://github.com/cvs-health/uqlm/blob/main/examples/semantic_entropy_demo.ipynb) | State-of-the-art UQ when token probabilities are available | Requires token probability access | Medium-High (multiple generations and comparisons) |
+| [Semantic Density](https://github.com/cvs-health/uqlm/blob/main/examples/semantic_density_demo.ipynb) | Newest SOTA method for high-accuracy UQ | Requires token probability access | Medium-High (multiple generations and comparisons) |
+| [BS Detector Off-the-Shelf UQ Ensemble](https://github.com/cvs-health/uqlm/blob/main/examples/ensemble_off_the_shelf_demo.ipynb) | Ready-to-use ensemble without training | Depends on ensemble components | Medium-High (multiple generations and comparisons) |

-## Getting Started
+## Where should I start?

-We recommend starting with the [Black-Box UQ](black_box_demo.ipynb) notebook if you're new to uncertainty quantification or don't have access to model internals.
+We recommend starting with the [Black-Box UQ](https://github.com/cvs-health/uqlm/blob/main/examples/black_box_demo.ipynb) notebook if you're new to uncertainty quantification or don't have access to model internals.

-For the most efficient approach with minimal compute requirements, try the [White-Box UQ (Single-Generation)](white_box_single_generation_demo.ipynb) notebook if you have access to token probabilities.
+For the most efficient approach with minimal compute requirements, try the [White-Box UQ (Single-Generation)](https://github.com/cvs-health/uqlm/blob/main/examples/white_box_single_generation_demo.ipynb) notebook if you have access to token probabilities.

-For long-form text evaluation, the [LUQ method](luq_demo.ipynb) provides a good starting point that works with any LLM API.
+For long-form text evaluation, the [LUQ method](https://github.com/cvs-health/uqlm/blob/main/examples/long_text_uq_demo.ipynb) provides a good starting point that works with any LLM API.
