Merge pull request #319 from cvs-health/patch/v0.5.1

dylanbouchard · web-flow · commit 7bd62f1b9129 · 2026-01-09T09:37:28.000-05:00
Patch release: `v0.5.1`
diff --git a/docs/source/_notebooks/index.rst b/docs/source/_notebooks/index.rst
@@ -244,13 +244,16 @@ UQLM offers a broad collection of tutorial notebooks to demonstrate usage of the
 .. toctree::
    :hidden:
 
+   examples/black_box_demo.ipynb
+   examples/white_box_single_generation_demo.ipynb
+   examples/white_box_multi_generation_demo.ipynb
    examples/ensemble_off_the_shelf_demo.ipynb
    examples/ensemble_tuning_demo.ipynb
    examples/judges_demo.ipynb
+   examples/long_text_uq_demo.ipynb
+   examples/long_text_graph_demo.ipynb
+   examples/long_text_qa_demo.ipynb
    examples/semantic_entropy_demo.ipynb
    examples/semantic_density_demo.ipynb
-   examples/white_box_multi_generation_demo.ipynb
-   examples/white_box_single_generation_demo.ipynb
-   examples/black_box_demo.ipynb
    examples/multimodal_demo.ipynb
    examples/score_calibration_demo.ipynb
diff --git a/docs/source/getstarted.rst b/docs/source/getstarted.rst
@@ -174,12 +174,12 @@ Below is a sample of code illustrating how to use the LongTextUQ class to conduc
     #   'entailment': 0.9548099517822266
     # }
 
+.. raw:: html
 
    <p align="center">
      <img src="./_static/images/long_text_output.png" />
    </p>
 
-
 Above, `response` and `entailment` reflect the original response and response-level confidence score, while `refined_response` and `refined_entailment` are the corresponding values after response refinement. The `claims_data` column includes granular data for each response, including claims, claim-level confidence scores, and whether each claim is retained in the response refinement process. We use `ChatOpenAI` in this example, but any `LangChain Chat Model <https://js.langchain.com/docs/integrations/chat/>`_ may be used. For a more detailed demo, refer to our `Long-Text UQ Demo <_notebooks/examples/long_text_uq_demo.ipynb>`_.
 
 
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -172,7 +172,7 @@ These scorers leverage a weighted average of multiple individual scorers to prov
 
 .. _long-text-scorers:
 
-1. Long-Text Scorers (Claim-Level)
+5. Long-Text Scorers (Claim-Level)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 .. image:: ./_static/images/luq_example.png
diff --git a/docs/source/scorer_definitions/long_text/graph.rst b/docs/source/scorer_definitions/long_text/graph.rst
@@ -1,5 +1,5 @@
-Long-Text Uncertainty Quantification (LUQ)
-==========================================
+Graph-Based Uncertainty Quantification (LUQ)
+============================================
 
 .. currentmodule:: uqlm.scorers
 
diff --git a/docs/source/scorer_definitions/long_text/index.rst b/docs/source/scorer_definitions/long_text/index.rst
@@ -3,9 +3,9 @@ Long-Text Scorers
 
 Long-form uncertainty quantification implements a three-stage pipeline after response generation:
 
-1. Response Decomposition: The response $y$ is decomposed into units (claims or sentences), where a unit as denoted as $s$.
+1. Response Decomposition: The response :math:`y` is decomposed into units (claims or sentences), where a unit is denoted as :math:`s`.
 
-2. Unit-Level Confidence Scoring: Confidence scores are computed using function $c_g(s;\cdot) \in [0, 1]$. Higher scores indicate greater likelihood of factual correctness. Units with scores below threshold $\tau$ are flagged as potential hallucinations.
+2. Unit-Level Confidence Scoring: Confidence scores are computed using a unit-level scoring function with values in :math:`[0, 1]`. Higher scores indicate greater likelihood of factual correctness. Units with scores below threshold :math:`\tau` are flagged as potential hallucinations.
 
 3. Response-Level Aggregation: Unit scores are combined to provide an overall response confidence.
 
@@ -21,46 +21,14 @@ Long-form uncertainty quantification implements a three-stage pipeline after res
 - **Limited Compatibility:** Multiple generations and comparison calculations increase latency
 
 
-Claim-Response Scorers
-----------------------
-
-These scorers directly compare claims or sentences in the original responses with sampled responses generated from the same prompt.
-
-.. toctree::
-   :maxdepth: 1
-
-   entailment
-   noncontradiction
-   contrasted_entailment
-
-Graph-Based Scorers
--------------------
-
-These scorers decompose original and sampled responses into claims, obtain the union of unique claims across all responses, and compute graph centrality metrics on the bipartite graph of claim-response entailment to measure uncertainty.
-
-.. toctree::
-   :maxdepth: 1
-
-   closeness_centrality
-   harmonic_centrality
-   degree_centrality
-   betweenness_centrality
-   laplacian_centrality
-   page_rank
-
-
-Claim-QA Scorers
-----------------
-
-These scorers decompose responses into granular units (sentences or claims), convert each claim or sentence to a question, sample LLM responses to those questions, and measure consistency among those answers to score the claim.
+Long-Text Scoring Methods
+-------------------------
+ 
+There are three main categories of long-text scoring methods offered by UQLM:
 
 .. toctree::
    :maxdepth: 1
 
-   semantic_negentropy
-   semantic_sets_confidence
-   noncontradiction
-   entailment
-   exact_match
-   bert_score
-   cosine_sim
+   luq
+   graph
+   qa
diff --git a/docs/source/scorer_definitions/long_text/qa.rst b/docs/source/scorer_definitions/long_text/qa.rst
@@ -1,5 +1,5 @@
-Long-Text Uncertainty Quantification (LUQ)
-==========================================
+QA-Based Uncertainty Quantification (LUQ)
+=========================================
 
 .. currentmodule:: uqlm.scorers
 
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "uqlm"
-version = "0.5.0"
+version = "0.5.1"
 description = "UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection."
 authors = ["Dylan Bouchard <dylan.bouchard@cvshealth.com>", "Mohit Singh Chauhan <mohitsingh.chauhan@cvshealth.com>"]
 maintainers = [
@@ -28,6 +28,7 @@ packages = [
     { include = "uqlm/judges" },
     { include = "uqlm/black_box" },
     { include = "uqlm/white_box" },
+    { include = "uqlm/longform" },
     { include = "uqlm/calibration" },
     { include = "uqlm/resources" },
     { include = "uqlm/utils" },
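
Reviewer note: the three-stage long-text pipeline described in `docs/source/scorer_definitions/long_text/index.rst` (decomposition, unit-level scoring against a threshold, response-level aggregation) can be illustrated with a minimal sketch. This is not UQLM's implementation or API. The function names `decompose`, `score_units`, and `aggregate`, the regex sentence splitter, the verbatim-match toy scorer, and the mean aggregation are all assumptions chosen for illustration; a real scorer would plausibly use entailment or a similar consistency measure.

```python
import re
from statistics import mean
from typing import Callable, List, Tuple

# Hypothetical helper names; none of these are part of the UQLM API.


def decompose(response: str) -> List[str]:
    """Stage 1: split a response into sentence-level units (naive regex split)."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def score_units(
    units: List[str], scorer: Callable[[str], float], tau: float
) -> List[Tuple[str, float, bool]]:
    """Stage 2: score each unit in [0, 1] and flag units scoring below tau."""
    results = []
    for unit in units:
        confidence = min(1.0, max(0.0, scorer(unit)))  # clamp to [0, 1]
        results.append((unit, confidence, confidence < tau))
    return results


def aggregate(scored: List[Tuple[str, float, bool]]) -> float:
    """Stage 3: combine unit scores into one response score (simple mean)."""
    return mean(c for _, c, _ in scored) if scored else 0.0


# Toy scorer (an assumption): share of sampled responses that contain the
# unit verbatim. This only stands in for a real consistency measure.
samples = [
    "Paris is the capital of France. It is a large city.",
    "Paris is the capital of France. It has 2 million residents.",
]


def toy_scorer(unit: str) -> float:
    return sum(unit in s for s in samples) / len(samples)


response = "Paris is the capital of France. It has 90 million residents."
scored = score_units(decompose(response), toy_scorer, tau=0.5)
for unit, confidence, flagged in scored:
    print(f"{confidence:.2f} flagged={flagged} {unit}")
print("response-level confidence:", aggregate(scored))
```

In this toy run the first sentence appears in both samples and is kept, while the second appears in neither and falls below the threshold, so it is flagged as a potential hallucination.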