Commit 7bd62f1

Merge pull request #319 from cvs-health/patch/v0.5.1
Patch release: `v0.5.1`
2 parents 4ce62b2 + db25340 commit 7bd62f1

7 files changed: +23 -51 lines changed

docs/source/_notebooks/index.rst

Lines changed: 6 additions & 3 deletions
@@ -244,13 +244,16 @@ UQLM offers a broad collection of tutorial notebooks to demonstrate usage of the
 .. toctree::
 :hidden:

+examples/black_box_demo.ipynb
+examples/white_box_single_generation_demo.ipynb
+examples/white_box_multi_generation_demo.ipynb
 examples/ensemble_off_the_shelf_demo.ipynb
 examples/ensemble_tuning_demo.ipynb
 examples/judges_demo.ipynb
+examples/long_text_uq_demo.ipynb
+examples/long_text_graph_demo.ipynb
+examples/long_text_qa_demo.ipynb
 examples/semantic_entropy_demo.ipynb
 examples/semantic_density_demo.ipynb
-examples/white_box_multi_generation_demo.ipynb
-examples/white_box_single_generation_demo.ipynb
-examples/black_box_demo.ipynb
 examples/multimodal_demo.ipynb
 examples/score_calibration_demo.ipynb

docs/source/getstarted.rst

Lines changed: 1 addition & 1 deletion
@@ -174,12 +174,12 @@ Below is a sample of code illustrating how to use the LongTextUQ class to conduc
 # 'entailment': 0.9548099517822266
 # }

+.. raw:: html

 <p align="center">
 <img src="./_static/images/long_text_output.png" />
 </p>

-
 Above `response` and `entailment` reflect the original response and response-level confidence score, while `refined_response` and `refined_entailment` are the corresponding values after response refinement. The `claims_data` column includes granular data for each response, including claims, claim-level confidence scores, and whether each claim is retained in the response refinement process. We use `ChatOpenAI` in this example, any `LangChain Chat Model <https://js.langchain.com/docs/integrations/chat/>`_ may be used. For a more detailed demo, refer to our `Long-Text UQ Demo <_notebooks/examples/long_text_uq_demo.ipynb>`_.


docs/source/index.rst

Lines changed: 1 addition & 1 deletion
@@ -172,7 +172,7 @@ These scorers leverage a weighted average of multiple individual scorers to prov

 .. _long-text-scorers:

-1. Long-Text Scorers (Claim-Level)
+5. Long-Text Scorers (Claim-Level)
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 .. image:: ./_static/images/luq_example.png

docs/source/scorer_definitions/long_text/graph.rst

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-Long-Text Uncertainty Quantification (LUQ)
-==========================================
+Graph-Based Uncertainty Quantification (LUQ)
+============================================

 .. currentmodule:: uqlm.scorers
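
Editor's sketch (not part of the commit): the long_text/index.rst text touched by this commit describes the graph-based scorers as computing centrality metrics on the bipartite graph of claim-response entailment. The snippet below illustrates one such metric, degree centrality, on a toy graph. It does not use UQLM's API: the function name `claim_degree_centrality`, the boolean entailment table, and the normalization by the number of responses are all assumptions made for illustration.

```python
from typing import Dict, List


def claim_degree_centrality(entails: Dict[str, List[bool]]) -> Dict[str, float]:
    """Degree centrality of each claim node in a bipartite claim-response graph.

    entails[claim][j] is True when sampled response j entails the claim,
    i.e. when an edge connects the claim node to response node j.
    (Hypothetical input layout, chosen for this sketch only.)
    """
    row_lengths = {len(row) for row in entails.values()}
    if len(row_lengths) > 1:
        raise ValueError("every claim needs one entry per response")

    centrality = {}
    for claim, row in entails.items():
        # Degree = number of responses entailing the claim, normalized by
        # the size of the opposite partition (the number of responses).
        centrality[claim] = sum(row) / len(row) if row else 0.0
    return centrality


# Toy data: claim "A" is entailed by all four responses, claim "B" by one.
toy_entailment = {
    "A": [True, True, True, True],
    "B": [True, False, False, False],
}
toy_centrality = claim_degree_centrality(toy_entailment)
```

Under these assumptions, a claim supported by most sampled responses gets a centrality near 1, while a claim that appears in only one response gets a low value and would be a candidate for flagging.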

docs/source/scorer_definitions/long_text/index.rst

Lines changed: 9 additions & 41 deletions
@@ -3,9 +3,9 @@ Long-Text Scorers

 Long-form uncertainty quantification implements a three-stage pipeline after response generation:

-1. Response Decomposition: The response $y$ is decomposed into units (claims or sentences), where a unit as denoted as $s$.
+1. Response Decomposition: The response :math:`y` is decomposed into units (claims or sentences), where a unit as denoted as $s$.

-2. Unit-Level Confidence Scoring: Confidence scores are computed using function $c_g(s;\cdot) \in [0, 1]$. Higher scores indicate greater likelihood of factual correctness. Units with scores below threshold $\tau$ are flagged as potential hallucinations.
+2. Unit-Level Confidence Scoring: Confidence scores are computed using a unit-level scoring function with values in :math:`[0, 1]`. Higher scores indicate greater likelihood of factual correctness. Units with scores below threshold $\tau$ are flagged as potential hallucinations.

 3. Response-Level Aggregation: Unit scores are combined to provide an overall response confidence.

@@ -21,46 +21,14 @@ Long-form uncertainty quantification implements a three-stage pipeline after res
 - **Limited Compatibility:** Multiple generations and comparison calculations increase latency


-Claim-Response Scorers
-----------------------
-
-These scorers directly compare claims or sentences in the original responses with sampled responses generated from the same prompt.
-
-.. toctree::
-:maxdepth: 1
-
-entailment
-noncontradiction
-contrasted_entailment
-
-Graph-Based Scorers
--------------------
-
-These scorers decompose original and sampled responses into claims, obtain the union of unique claims across all responses, and compute graph centrality metrics on the bipartite graph of claim-response entailment to measure uncertainty.
-
-.. toctree::
-:maxdepth: 1
-
-closeness_centrality
-harmonic_centrality
-degree_centrality
-betweenness_centrality
-laplacian_centrality
-page_rank
-
-
-Claim-QA Scorers
-----------------
-
-These scorers decompose responses into granular units (sentences or claims), convert each claim or sentence to a question, sample LLM responses to those questions, and measure consistency among those answers to score the claim.
+Long-Text Scoring Methods
+-------------------------
+
+There are three main categories of long-text scoring methods offered by UQLM:

 .. toctree::
 :maxdepth: 1

-semantic_negentropy
-semantic_sets_confidence
-noncontradiction
-entailment
-exact_match
-bert_score
-cosine_sim
+luq
+graph
+qa
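
Editor's sketch (not part of the commit): the three-stage pipeline described in the hunk above (decomposition, unit-level scoring with threshold flagging, response-level aggregation) can be illustrated in a few lines of plain Python. None of this is UQLM's API. The names `decompose` and `score_long_text`, the regex sentence splitter, the lookup-table scorer, and the use of a simple mean for aggregation are assumptions made for illustration; the documentation only states that unit scores are "combined".

```python
import re
from statistics import mean
from typing import Callable, Dict, List


def decompose(response: str) -> List[str]:
    """Stage 1: split the response y into sentence-level units s."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]


def score_long_text(
    response: str,
    unit_scorer: Callable[[str], float],
    tau: float = 0.5,
) -> Dict[str, object]:
    """Run the three stages and return units, scores, flags, and confidence."""
    units = decompose(response)

    # Stage 2: one confidence score in [0, 1] per unit.
    scores = [unit_scorer(u) for u in units]
    if any(not 0.0 <= c <= 1.0 for c in scores):
        raise ValueError("unit scores must lie in [0, 1]")

    # Units scoring below the threshold tau are flagged as potential hallucinations.
    flagged = [u for u, c in zip(units, scores) if c < tau]

    # Stage 3: aggregate unit scores (a plain mean is assumed here).
    confidence = mean(scores) if scores else 0.0
    return {"units": units, "scores": scores, "flagged": flagged, "confidence": confidence}


# Hypothetical scorer: a lookup table standing in for a real consistency-based scorer.
toy_scores = {
    "Paris is the capital of France.": 0.9,
    "The Eiffel Tower is 900 meters tall.": 0.2,
}
result = score_long_text(
    "Paris is the capital of France. The Eiffel Tower is 900 meters tall.",
    unit_scorer=lambda unit: toy_scores[unit],
    tau=0.5,
)
```

In this toy run the second sentence falls below the threshold and is flagged, while the response-level confidence is the average of the two unit scores. A real scorer would replace the lookup table with a comparison against sampled responses.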

docs/source/scorer_definitions/long_text/qa.rst

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
-Long-Text Uncertainty Quantification (LUQ)
-==========================================
+QA-Based Uncertainty Quantification (LUQ)
+=========================================

 .. currentmodule:: uqlm.scorers

pyproject.toml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "uqlm"
-version = "0.5.0"
+version = "0.5.1"
 description = "UQLM (Uncertainty Quantification for Language Models) is a Python package for UQ-based LLM hallucination detection."
 authors = ["Dylan Bouchard <dylan.bouchard@cvshealth.com>", "Mohit Singh Chauhan <mohitsingh.chauhan@cvshealth.com>"]
 maintainers = [
@@ -28,6 +28,7 @@ packages = [
 { include = "uqlm/judges" },
 { include = "uqlm/black_box" },
 { include = "uqlm/white_box" },
+{ include = "uqlm/longform" },
 { include = "uqlm/calibration" },
 { include = "uqlm/resources" },
 { include = "uqlm/utils" },
