Releases: cvs-health/uqlm
v0.5.7
What's Changed
- Fixed doc site inaccuracies by @vgyani in #367
- Patch release: v0.5.7 by @dylanbouchard in #368
Full Changelog: v0.5.6...v0.5.7
v0.5.6
Highlights
- package upgrades from dependabot
- update badge colors and links
- update citation information
What's Changed
- Badge by @dylanbouchard in #351
- Revise publication links and citation details by @dylanbouchard in #352
- Badge by @dylanbouchard in #353
- Bump nbsphinx from 0.9.6 to 0.9.8 by @dependabot[bot] in #356
- Bump pytest-asyncio from 1.1.1 to 1.3.0 by @dependabot[bot] in #250
- Bump sphinx-gallery from 0.18.0 to 0.20.0 by @dependabot[bot] in #354
- Update rich requirement from <14.0.0,>=13.8.0 to >=13.8.0,<15.0.0 by @dependabot[bot] in #355
- Patch release: v0.5.6 by @dylanbouchard in #357
Full Changelog: v0.5.5...v0.5.6
v0.5.5
Highlights
- replace agenerate with ainvoke where generation is used, since ainvoke appears to be the better-maintained and better-documented LangChain method
- replace poetry with uv for dependency management
- update badges
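As a rough illustration of the ainvoke calling pattern mentioned in the first highlight, the sketch below issues one coroutine per prompt and awaits them concurrently. StubChatModel is a hypothetical stand-in that mimics only an async ainvoke method so the example runs without API keys; it is not part of uqlm or LangChain, and a real LangChain chat model returns a message object rather than a plain string.

```python
import asyncio


class StubChatModel:
    """Hypothetical stand-in exposing an async ainvoke(), for illustration only."""

    async def ainvoke(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"echo: {prompt}"


async def generate_all(llm, prompts):
    # One ainvoke call per prompt, awaited concurrently; gather keeps input order.
    return await asyncio.gather(*(llm.ainvoke(p) for p in prompts))


responses = asyncio.run(
    generate_all(StubChatModel(), ["What is 2+2?", "Name a prime."])
)
```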
What's Changed
- Badge updates by @dylanbouchard in #348
- Switch from Poetry to astral-sh/uv by @vgyani in #349
- Patch release: v0.5.5 by @dylanbouchard in #350
Full Changelog: v0.5.4...v0.5.5
v0.5.4
Highlights
1. Add new white-box scorers to UQEnsemble accepted scorers list:
Top-logprobs scorers (3):
- min_token_negentropy - Minimum negentropy across tokens
- mean_token_negentropy - Average negentropy across tokens
- probability_margin - Mean difference between top-2 token probabilities
Sampled-logprobs scorers (4):
- semantic_negentropy - Entropy based on semantic clustering
- semantic_density - Density-based confidence measure
- monte_carlo_probability - Average sequence probability across samples
- consistency_and_confidence - Cosine similarity × response probability
P(True) scorer (1):
- p_true - LLM's estimate of P(response is true)
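To make the three top-logprobs scorers more concrete, here is a minimal sketch that computes them from per-token top-k log probabilities. It assumes token negentropy is defined as 1 - H / log(k) over the renormalized top-k candidates; the release notes do not state the exact formula, so treat that normalization (and all function names) as assumptions rather than the library's implementation.

```python
import math


def token_negentropy(top_logprobs):
    """Confidence for one token from its top-k logprobs (assumed: 1 - H / log k, k >= 2)."""
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize over the top-k candidates
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def token_margin(top_logprobs):
    """Gap between the two most likely candidates for one token."""
    first, second = sorted((math.exp(lp) for lp in top_logprobs), reverse=True)[:2]
    return first - second


def score_response(per_token_top_logprobs):
    negentropies = [token_negentropy(t) for t in per_token_top_logprobs]
    margins = [token_margin(t) for t in per_token_top_logprobs]
    return {
        "min_token_negentropy": min(negentropies),
        "mean_token_negentropy": sum(negentropies) / len(negentropies),
        "probability_margin": sum(margins) / len(margins),
    }


# Two tokens: one confident, one a coin flip between two candidates.
toy = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.5), math.log(0.5)],
]
scores = score_response(toy)
```

In this toy input the coin-flip token drives the minimum negentropy to roughly zero, which is the kind of weakest-link signal a minimum-based scorer is meant to surface.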
2. Fix embeddings model specification for cosine_sim and consistency_and_confidence, enable with WhiteBoxUQ
Corrects a string-handling error in the embedding model specification via the sentence_transformer parameter of BlackBoxUQ. Previously, the string was forced to begin with "sentence-transformers/"; now the full string is specified with the parameter.
Previous: sentence_transformer=all-MiniLM-L12-v2 was specified, and "sentence-transformers/" was then prepended to the string when storing the class attribute.
Now: sentence_transformer=sentence-transformers/all-MiniLM-L12-v2 is specified. This allows other embedding models that don't start with "sentence-transformers/", such as jinaai/jina-embeddings-v2-base-code, to be specified.
Also adds the missing sentence_transformer parameter for WhiteBoxUQ.
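The before-and-after behavior described above can be sketched with two small functions. Both names (old_model_id, new_model_id) are hypothetical and exist only for this illustration; they model the string handling described in the notes, not the library's internal code.

```python
PREFIX = "sentence-transformers/"


def old_model_id(sentence_transformer: str) -> str:
    # Behavior described as "Previous": the prefix was always prepended.
    return PREFIX + sentence_transformer


def new_model_id(sentence_transformer: str) -> str:
    # Behavior described as "Now": the full string is used exactly as given.
    return sentence_transformer


# The same model under the old and new conventions resolves to one identifier.
same_model = old_model_id("all-MiniLM-L12-v2") == new_model_id(
    "sentence-transformers/all-MiniLM-L12-v2"
)

# A model outside the sentence-transformers namespace only resolves correctly now.
jina_old = old_model_id("jinaai/jina-embeddings-v2-base-code")
jina_new = new_model_id("jinaai/jina-embeddings-v2-base-code")
```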
What's Changed
- v0.5.3 updates by @dylanbouchard in #327
- Fix embedding model specification by @dylanbouchard in #332
- Enable use of new white-box scorers in UQEnsemble by @dylanbouchard in #333
- Feature/enable all white box scorers by @kaushik-42 in #328
- Patch release: v0.5.4 by @dylanbouchard in #334
Full Changelog: v0.5.3...v0.5.4
v0.5.3
Highlights
- added new demo notebook to illustrate langgraph-uqlm integration
- upgrade package versions per dependabot
- fix some LaTeX in docs site
- fix links in readme
What's Changed
- v0.5.2 updates by @dylanbouchard in #322
- fix latex in docs site by @dylanbouchard in #324
- Added LangGraph demo notebook by @vnnair98 in #323
- Security updates by @dylanbouchard in #325
- Patch release: v0.5.3 by @dylanbouchard in #326
Full Changelog: v0.5.2...v0.5.3
v0.5.2
Highlights
- Create uqlm.nli.EntailmentClassifier class for LLM-based entailment classification. This is well-suited for long-text scoring when responses exceed the length that can be handled by the Hugging Face NLI model
- Update LongTextGraph, LongTextUQ, UnitResponseScorer, GraphScorer and associated notebooks to allow for LLM-based entailment classification.
- Update unit tests
- Misc. docs site cleanup
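A minimal sketch of what LLM-based entailment classification can look like is shown below. The prompt wording, the build_prompt and classify_entailment helpers, and the canned stand-in models are all hypothetical; they do not reproduce the uqlm.nli.EntailmentClassifier interface, which these notes do not describe.

```python
def build_prompt(premise: str, hypothesis: str) -> str:
    # Hypothetical prompt template; the real class may phrase this differently.
    return (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer 'yes' or 'no'."
    )


def classify_entailment(llm, premise: str, hypothesis: str) -> bool:
    # `llm` is any callable mapping a prompt string to a text reply.
    answer = llm(build_prompt(premise, hypothesis))
    return answer.strip().lower().startswith("yes")


# Canned stand-ins for a model, so the sketch runs offline.
always_yes = lambda prompt: "Yes, it does."
always_no = lambda prompt: "no"

entailed = classify_entailment(always_yes, "Paris is the capital of France.", "Paris is in France.")
not_entailed = classify_entailment(always_no, "Paris is the capital of France.", "Lyon is the capital.")
```

Because the premise is passed as prompt text, its length is bounded by the LLM's context window rather than by an NLI model's input limit, which is the motivation given in the highlight.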
What's Changed
- Add LLM-based entailment classification + Docs cleanup by @dylanbouchard in #320
- Patch release: v0.5.2 by @dylanbouchard in #321
Full Changelog: v0.5.1...v0.5.2
v0.5.1
Highlights
- fixes rendering of long-form scorer content on the docs site
- adds missing uqlm/longform subpackage to pyproject.toml so it appears in API reference on docs site
- misc. docs site cleanup
What's Changed
- v0.5.0 updates by @dylanbouchard in #316
- Add longform subpackage and fix docs links by @dylanbouchard in #317
- fix code block in get started by @dylanbouchard in #318
- Patch release: v0.5.1 by @dylanbouchard in #319
Full Changelog: v0.5.0...v0.5.1
v0.5.0
New Methods: Long-Form UQ
Short-form UQ methods have been shown to generalize poorly to long-form LLM outputs. Fine-grained methods for long-form UQ address these limitations by first decomposing responses into granular units (sentences or claims) and then scoring each unit.
Response Decomposition
We enable decomposition of responses into sentences or claims using our ResponseDecomposer class. This class implements claim decomposition using an LLM or sentence decomposition using a rule-based approach.
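For intuition, a rule-based sentence decomposition could look like the naive sketch below. The split_sentences function and its regular expression are assumptions made for illustration; the actual rule used by ResponseDecomposer is not specified in these notes and is likely more robust (for example, around abbreviations).

```python
import re


def split_sentences(response: str) -> list[str]:
    # Split on whitespace that follows ., ! or ? and precedes a capital letter or digit.
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", response.strip())
    return [p for p in parts if p]


sentences = split_sentences("Paris is in France. It has 2 million people! Is it old?")
```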
Scoring methods
We add four families of fine-grained scorers for long-form uncertainty quantification: Unit-Response, Matched-Unit, Unit-QA, and Graph-Based.
1. Unit-Response (Based on the LUQ/LUQ-Atomic methods)
These scorers measure whether sampled responses entail each unit (sentence or claim) in the original response and average across sampled responses to obtain unit-level confidence scores. This is implemented with the uqlm.scorers.longform.LongTextUQ class.
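The averaging step can be sketched as follows. Here toy_entailment is a hypothetical word-overlap stand-in for an NLI model or LLM judge, used only so the example runs offline; the function names are not the LongTextUQ API.

```python
def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for NLI: share of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)


def unit_response_scores(units, sampled_responses, entail=toy_entailment):
    # For each unit, average the entailment score over all sampled responses.
    return [
        sum(entail(sample, unit) for sample in sampled_responses) / len(sampled_responses)
        for unit in units
    ]


units = ["paris is the capital", "it has ten rivers"]
samples = ["paris is the capital of france", "the capital of france is paris"]
unit_scores = unit_response_scores(units, samples)
```

The first unit is supported by both samples and the second by neither, so the unit-level confidences separate cleanly in this toy case.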

2. Matched-Unit (Based on the LUQ-pair method)
These scorers work by matching each original sentence or claim to its most similar counterpart in sampled responses before computing entailment scores. Matched scores are then averaged across sampled responses to obtain a confidence score for each unit in the original response. This is implemented with the uqlm.scorers.longform.LongTextUQ class.
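A minimal sketch of the match-then-score idea follows. It assumes Jaccard word overlap as the similarity used for matching and a word-overlap toy in place of an entailment model; both are hypothetical choices for illustration, not the metrics used by the library.

```python
def jaccard(a: str, b: str) -> float:
    # Assumed similarity for matching: word-set overlap.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for NLI: share of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)


def matched_unit_scores(units, sampled_units, similarity=jaccard, entail=toy_entailment):
    """sampled_units holds one list of units per sampled response."""
    scores = []
    for unit in units:
        per_sample = []
        for sample in sampled_units:
            # Pick the most similar counterpart in this sample, then score entailment.
            match = max(sample, key=lambda s: similarity(unit, s))
            per_sample.append(entail(match, unit))
        scores.append(sum(per_sample) / len(per_sample))
    return scores


units = ["paris is the capital", "it has ten rivers"]
sampled_units = [
    ["paris is the capital of france", "it has one river"],
    ["the capital is paris", "it has two rivers"],
]
matched_scores = matched_unit_scores(units, sampled_units)
```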

3. Unit-QA (Based on the Longform Semantic Entropy method)
These scorers work by decomposing a response into granular units (sentences or claims), generating questions whose answers are those units given the context, sampling multiple answers to each question, and computing black-box UQ scores across these answers. This is implemented with the uqlm.scorers.longform.LongTextQA class.

4. Graph-Based (Based on Jiang et al., 2024)
Graph-based scorers decompose original and sampled responses into claims, obtain the union of unique claims across all responses, and compute graph centrality metrics on the bipartite graph of claim-response entailment to measure uncertainty. This is implemented with the uqlm.scorers.longform.LongTextGraph class.
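As one simple instance of a centrality metric on the claim-response graph, the sketch below computes degree centrality for each claim: the fraction of responses that entail it. The toy_entails check and the function names are hypothetical, and degree centrality is only an assumed example; the notes do not list which centrality metrics LongTextGraph computes.

```python
def toy_entails(response: str, claim: str) -> bool:
    # Hypothetical stand-in for entailment: every claim word occurs in the response.
    return set(claim.lower().split()) <= set(response.lower().split())


def claim_degree_centrality(claims, responses, entails=toy_entails):
    # Bipartite edges link claim c to response index r when the response entails the claim.
    edges = {
        (c, r)
        for c in claims
        for r, response in enumerate(responses)
        if entails(response, c)
    }
    # Degree centrality of a claim node: its edge count over the number of response nodes.
    return {
        c: sum((c, r) in edges for r in range(len(responses))) / len(responses)
        for c in claims
    }


responses = [
    "paris is the capital of france",
    "paris is the capital and lyon is large",
    "lyon is the capital",
]
# Assumed to be the deduplicated union of claims across all responses.
claims = ["paris is the capital", "lyon is large"]
centrality = claim_degree_centrality(claims, responses)
```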

These scorer classes all share the same parent class: uqlm.scorers.longform.baseclass.LongFormUQ.
Response Refinement with Uncertainty Aware Decoding
Response refinement works by dropping claims whose confidence scores (from the scorer specified with the claim_filtering_scorer parameter) fall below a specified threshold (set with the response_refinement_threshold parameter) and reconstructing the response from the retained claims. This functionality is available in combination with any of the four methods described above by setting response_refinement=True in the constructor of the corresponding scorer class.
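The filtering step can be sketched in a few lines. The refine_response function is hypothetical, and joining the retained claims with spaces is an assumed placeholder for the reconstruction step, which these notes do not specify.

```python
def refine_response(claims, confidence_scores, threshold):
    # Drop claims scoring below the threshold, preserving the original order.
    kept = [c for c, s in zip(claims, confidence_scores) if s >= threshold]
    # Naive reconstruction (assumption): concatenate the retained claims.
    return " ".join(kept)


refined = refine_response(
    ["Paris is the capital of France.", "It has ten rivers.", "It lies on the Seine."],
    [0.9, 0.2, 0.7],
    threshold=0.5,
)
```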
Performance Evaluation
We enable FactScore-based grading using an LLM. This works by comparing units (sentences or claims) in a generated response to a FactScore question against the corresponding text of the subject's Wikipedia article.
New docs site pages
We have added a "Scorer Definitions" tab to the docs site, intended to serve as an 'encyclopedia' of available scoring methods. It provides formal definitions, explanations in simple terms, and code snippets for all available methods.
Other changes
- uqlm.scorers has now been refactored into two subfolders: uqlm.scorers.shortform (which contains existing scorer classes as of v0.4) and uqlm.scorers.longform (which contains classes to implement the above-mentioned scoring methods)
- the readme has been updated to reflect new longform scorers, and a new readme has been added inside the examples/ directory to provide more details on the available tutorials
- various package upgrades to address security vulnerabilities identified by dependabot
Breaking changes
- normalized_probability has been deprecated from the list of accepted white-box scorers in WhiteBoxUQ and UQEnsemble in favor of sequence_probability with length_normalize=True (default). This also affects the key/column names in the returned UQResult object.
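To show why the two scorers can be merged, the sketch below computes a sequence probability with and without length normalization. It assumes that length normalization means taking the geometric mean of token probabilities; the function is a standalone illustration that borrows the names above, not the library's implementation.

```python
import math


def sequence_probability(token_logprobs, length_normalize=True):
    total = sum(token_logprobs)
    if length_normalize:
        # Assumed definition: geometric mean of the token probabilities.
        total /= len(token_logprobs)
    return math.exp(total)


logprobs = [math.log(0.9), math.log(0.5), math.log(0.8)]
raw = sequence_probability(logprobs, length_normalize=False)
normalized = sequence_probability(logprobs)
```

Without normalization the score shrinks as responses get longer, which is why a length-normalized default is a reasonable replacement for a separate normalized_probability scorer.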
What's Changed
- v0.3 updates by @dylanbouchard in #197
- LLM based NLI + ResponseDecomposer upgrades + restructured prompts by @dskarbrevik in #199
- Minor refactor by @dylanbouchard in #201
- add aggregation method by @dylanbouchard in #202
- Add mode, granularity parameters in place of scorers by @dylanbouchard in #204
- Long-form Semantic Entropy by @mohitcek in #203
- add factscore grader by @dylanbouchard in #207
- Enable more granular score return by @dylanbouchard in #208
- Binary style for NLI class by @dskarbrevik in #206
- update grader by @dylanbouchard in #215
- Longform Feature: evaluate method to compute semantic entropy by @mohitcek in #217
- Refactor ClaimQA class by @mohitcek in #218
- Patch/v0.3.1 by @dylanbouchard in #225
- v0.3.1 updates by @dylanbouchard in #224
- update question template by @dylanbouchard in #227
- Feat: ClaimQA class - multiple questions per factoid/claim by @mohitcek in #228
- Claimqa updates by @dylanbouchard in #235
- v0.4.4 updates by @dylanbouchard in #279
- Merge develop -> longform UQ branch by @dylanbouchard in #282
- v0.4.5 updates by @dylanbouchard in #286
- LongForm UQ by @dylanbouchard in #283
- Created new directories for short-form and long-form responses by @mohitcek in #288
- Refactor uqlm.scorers for shortform vs. longform parent classes by @dylanbouchard in #289
- Issue #244 - Added Scorer Definitions on Docs Site by @vgyani in #287
- Add long-text definition to docs by @dylanbouchard in #298
- Rearrange subpackages by @dylanbouchard in #300
- Rename modules, add UAD scorer specification by @dylanbouchard in #304
- Update notebooks by @dylanbouchard in #308
- Graph based long-form scoring by @dskarbrevik in #307
- Fix links and test by @dylanbouchard in #309
- Add new unit tests by @dylanbouchard in #310
- update uad graphics by @dylanbouchard in #311
- update luq graphic and version by @dylanbouchard in #313
- add qa unit test by @dylanbouchard in #314
- Minor release: v0.5.0 by @dylanbouchard in #315
Full Changelog: v0.4.5...v0.5.0
v0.4.5
Highlights
- fix bug in model name string checking when retrieving logprobs, per issue #284
What's Changed
- patch release: v0.4.5 by @dylanbouchard in #285
Full Changelog: v0.4.4...v0.4.5
v0.4.4
Highlights
- adds max_length parameter to WhiteBoxUQ to avoid the CUDA OutOfMemoryError
- updates demo and docstring accordingly
What's Changed
- v0.4.3 updates by @dylanbouchard in #276
- Patch release: v0.4.4 by @dylanbouchard in #277
Full Changelog: v0.4.3...v0.4.4