13 Mar 16:56

dylanbouchard

35aa5ef

v0.5.7 Latest


What's Changed

Fixed doc site inaccuracies by @vgyani in #367
Patch release: v0.5.7 by @dylanbouchard in #368

Full Changelog: v0.5.6...v0.5.7

Contributors

vgyani and dylanbouchard


02 Mar 19:18

dylanbouchard

v0.5.6

6ee9312

v0.5.6

Highlights

package upgrades from dependabot
update badge colors and links
update citation information

What's Changed

Badge by @dylanbouchard in #351
Revise publication links and citation details by @dylanbouchard in #352
Badge by @dylanbouchard in #353
Bump nbsphinx from 0.9.6 to 0.9.8 by @dependabot[bot] in #356
Bump pytest-asyncio from 1.1.1 to 1.3.0 by @dependabot[bot] in #250
Bump sphinx-gallery from 0.18.0 to 0.20.0 by @dependabot[bot] in #354
Update rich requirement from <14.0.0,>=13.8.0 to >=13.8.0,<15.0.0 by @dependabot[bot] in #355
Patch release: v0.5.6 by @dylanbouchard in #357

Full Changelog: v0.5.5...v0.5.6

Contributors

dependabot and dylanbouchard


25 Feb 20:56

dylanbouchard

v0.5.5

d367798

v0.5.5

Highlights

replace agenerate with ainvoke where generation is used, since ainvoke appears to be the better-maintained and better-documented LangChain method
replace poetry with uv for dependency management
update badges
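
A minimal sketch of the `ainvoke` pattern mentioned in the first highlight, assuming a LangChain-style model that exposes an async `ainvoke(prompt)` method. `EchoModel` and `generate_all` are hypothetical stand-ins so the snippet runs without any provider; they are not part of uqlm, and a real chat model returns a message object rather than a plain string.

```python
import asyncio


class EchoModel:
    """Hypothetical stand-in for a LangChain chat model (not a uqlm class)."""

    async def ainvoke(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"echo: {prompt}"


async def generate_all(llm, prompts):
    """Issue one ainvoke call per prompt and await them concurrently."""
    return await asyncio.gather(*(llm.ainvoke(p) for p in prompts))


responses = asyncio.run(generate_all(EchoModel(), ["a", "b"]))
```

`asyncio.gather` preserves input order, so `responses[i]` corresponds to `prompts[i]`.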

What's Changed

Badge updates by @dylanbouchard in #348
Switch from Poetry to astral-sh/uv by @vgyani in #349
Patch release: v0.5.5 by @dylanbouchard in #350

Full Changelog: v0.5.4...v0.5.5

Contributors

vgyani and dylanbouchard


30 Jan 17:06

dylanbouchard

v0.5.4

84a6121

v0.5.4

Highlights

1. Add new white-box scorers to `UQEnsemble` accepted scorers list:

Top-logprobs scorers (3):

min_token_negentropy - Minimum negentropy across tokens
mean_token_negentropy - Average negentropy across tokens
probability_margin - Mean difference between top-2 token probabilities

Sampled-logprobs scorers (4):

semantic_negentropy - Entropy based on semantic clustering
semantic_density - Density-based confidence measure
monte_carlo_probability - Average sequence probability across samples
consistency_and_confidence - Cosine similarity × response probability

P(True) scorer (1):

p_true - LLM's estimate of P(response is true)
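
The top-logprobs scorers listed above can be illustrated with a small sketch. It assumes negentropy is normalized as 1 - H / log(k) over the k returned candidates, which is one common convention and may differ from uqlm's exact definition. The function names are hypothetical and this is not the library's implementation.

```python
import math


def token_negentropy(top_logprobs):
    """1 - normalized entropy of one token's top-k distribution (assumed form)."""
    probs = [math.exp(lp) for lp in top_logprobs]
    total = sum(probs)
    probs = [p / total for p in probs]  # renormalize over the top-k candidates
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def probability_margin(top_logprobs):
    """Gap between the two most likely candidates for one token."""
    first, second = sorted((math.exp(lp) for lp in top_logprobs), reverse=True)[:2]
    return first - second


def score_response(tokens):
    """tokens: one list of top-k logprobs (k >= 2) per generated token."""
    negentropies = [token_negentropy(t) for t in tokens]
    margins = [probability_margin(t) for t in tokens]
    return {
        "min_token_negentropy": min(negentropies),
        "mean_token_negentropy": sum(negentropies) / len(negentropies),
        "probability_margin": sum(margins) / len(margins),
    }
```

Higher values indicate a more peaked token distribution, which is why these quantities can serve as confidence scores.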

2. Fix embeddings model specification for `cosine_sim` and `consistency_and_confidence`, enable with `WhiteBoxUQ`

Corrects a string-handling error in the embedding model specification via the sentence_transformer parameter of BlackBoxUQ. Previously, the string was forced to begin with "sentence-transformers/"; now the full model identifier is passed through the parameter.

Previous: sentence_transformer=all-MiniLM-L12-v2 was specified, and "sentence-transformers/" was then prepended to the string when storing the class attribute.

Now: sentence_transformer=sentence-transformers/all-MiniLM-L12-v2 is specified. This allows embedding models whose identifiers don't start with "sentence-transformers/", such as jinaai/jina-embeddings-v2-base-code, to be specified.

Also adds the missing sentence_transformer parameter to WhiteBoxUQ.
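
The change in behavior can be sketched with two small functions. Both names are hypothetical and only mirror the description above; they are not taken from the uqlm source.

```python
def resolve_model_name_old(sentence_transformer: str) -> str:
    """Pre-v0.5.4 behavior as described above: the prefix was always prepended."""
    return "sentence-transformers/" + sentence_transformer


def resolve_model_name_new(sentence_transformer: str) -> str:
    """v0.5.4 behavior as described above: the full identifier is used as given."""
    return sentence_transformer
```

Under the old rule, passing jinaai/jina-embeddings-v2-base-code would have produced an identifier with the wrong prefix, which is why models from other organizations could not be selected.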

What's Changed

v0.5.3 updates by @dylanbouchard in #327
Fix embedding model specification by @dylanbouchard in #332
Enable use of new white-box scorers in UQEnsemble by @dylanbouchard in #333
Feature/enable all white box scorers by @kaushik-42 in #328
Patch release: v0.5.4 by @dylanbouchard in #334

Full Changelog: v0.5.3...v0.5.4

Contributors

kaushik-42 and dylanbouchard


20 Jan 20:51

dylanbouchard

v0.5.3

6d26749

v0.5.3

Highlights

added new demo notebook to illustrate langgraph-uqlm integration
upgrade package versions per dependabot
fix some LaTeX in docs site
fix links in readme

What's Changed

v0.5.2 updates by @dylanbouchard in #322
fix latex in docs site by @dylanbouchard in #324
Added LangGraph demo notebook by @vnnair98 in #323
Security updates by @dylanbouchard in #325
Patch release: v0.5.3 by @dylanbouchard in #326

New Contributors

@vnnair98 made their first contribution in #323

Full Changelog: v0.5.2...v0.5.3

Contributors

dylanbouchard and vnnair98


14 Jan 16:05

dylanbouchard

v0.5.2

7acd188

v0.5.2

Highlights

Create uqlm.nli.EntailmentClassifier class for LLM-based entailment classification. This is well-suited for long-text scoring when responses exceed the length that can be handled by the Hugging Face NLI model
Update LongTextGraph, LongTextUQ, UnitResponseScorer, GraphScorer and associated notebooks to allow for LLM-based entailment classification.
Update unit tests
Misc. docs site cleanup

What's Changed

Add LLM-based entailment classification + Docs cleanup by @dylanbouchard in #320
Patch release: v0.5.2 by @dylanbouchard in #321

Full Changelog: v0.5.1...v0.5.2

Contributors

dylanbouchard


09 Jan 14:38

dylanbouchard

v0.5.1

7bd62f1

v0.5.1

Highlights

fixes rendering of long-form scorer content on the docs site
adds missing uqlm/longform subpackage to pyproject.toml so it appears in API reference on docs site
misc. docs site cleanup

What's Changed

v0.5.0 updates by @dylanbouchard in #316
Add longform subpackage and fix docs links by @dylanbouchard in #317
fix code block in get started by @dylanbouchard in #318
Patch release: v0.5.1 by @dylanbouchard in #319

Full Changelog: v0.5.0...v0.5.1

Contributors

dylanbouchard


08 Jan 17:57

dylanbouchard

v0.5.0

4ce62b2

v0.5.0

New Methods: Long-Form UQ

Short-form UQ methods have been shown to generalize poorly to long-form LLM outputs. Fine-grained methods for long-form UQ address these limitations by first decomposing responses into granular units (sentences or claims) and then scoring each unit.

Response Decomposition

We enable decomposition of responses into sentences or claims using our ResponseDecomposer class. This class implements claim decomposition using an LLM or sentence decomposition using a rule-based approach.
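
As a rough illustration of rule-based sentence decomposition, the sketch below splits on sentence-ending punctuation followed by whitespace. The rule and the `split_sentences` name are assumptions for illustration; the source does not describe which rules ResponseDecomposer applies.

```python
import re


def split_sentences(response: str) -> list[str]:
    """Split on ., ! or ? followed by whitespace; a naive rule, not uqlm's."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p for p in parts if p]
```

A rule this simple mis-splits abbreviations such as "Dr. Smith", which is one reason a production decomposer needs more careful handling.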

Scoring methods

We add four families of fine-grained scorers for long-form uncertainty quantification: Unit-Response, Matched-Unit, Unit-QA, and Graph-Based.

1. Unit-Response (Based on the LUQ/LUQ-Atomic methods)

These scorers measure whether sampled responses entail each unit (sentence or claim) in the original response and average across sampled responses to obtain unit-level confidence scores. This is implemented with the uqlm.scorers.longform.LongTextUQ class.
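
The averaging step can be sketched as follows. `toy_entails` is a deliberately crude substring check standing in for an NLI model or LLM judge, and both function names are hypothetical rather than part of LongTextUQ.

```python
from statistics import mean


def toy_entails(premise: str, hypothesis: str) -> float:
    """Toy entailment: 1.0 if the hypothesis appears verbatim in the premise."""
    return 1.0 if hypothesis.lower() in premise.lower() else 0.0


def unit_response_scores(units, sampled_responses, entails=toy_entails):
    """Average, per unit, the entailment score over all sampled responses."""
    return [mean(entails(s, u) for s in sampled_responses) for u in units]
```

Each returned value is the fraction of sampled responses that support the corresponding unit, so a claim that appears in only half of the samples scores 0.5.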

2. Matched-Unit (Based on the LUQ-pair method)

These scorers work by matching each original sentence or claim to its most similar counterpart in sampled responses before computing entailment scores. Matched scores are then averaged across sampled responses to obtain a confidence score for each unit in the original response. This is implemented with the uqlm.scorers.longform.LongTextUQ class.
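
A sketch of the match-then-score idea, under stated assumptions: matching uses character-level similarity from the standard library, and entailment is a toy exact-match check. Neither is what the library uses, and all function names are hypothetical.

```python
from difflib import SequenceMatcher
from statistics import mean


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def toy_entails(premise: str, hypothesis: str) -> float:
    """Toy entailment: exact match after case-folding."""
    return 1.0 if premise.lower() == hypothesis.lower() else 0.0


def matched_unit_scores(units, sampled_units):
    """sampled_units: one non-empty list of units per sampled response."""
    scores = []
    for unit in units:
        per_sample = []
        for candidates in sampled_units:
            # Pick the most similar counterpart, then score entailment against it.
            best = max(candidates, key=lambda c: similarity(unit, c))
            per_sample.append(toy_entails(best, unit))
        scores.append(mean(per_sample))
    return scores
```

Compared with scoring against whole sampled responses, matching first restricts the entailment check to a single candidate unit per sample.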

3. Unit-QA (Based on the Longform Semantic Entropy method)

These scorers work by decomposing a response into granular units (sentences or claims), generating questions whose answers are the claims given context, sampling multiple answers, and computing black-box UQ scores across these answers. This is implemented with the uqlm.scorers.longform.LongTextQA class.

4. Graph-Based (Based on Jiang et al., 2024)

Graph-based scorers decompose original and sampled responses into claims, obtain the union of unique claims across all responses, and compute graph centrality metrics on the bipartite graph of claim-response entailment to measure uncertainty. This is implemented with the uqlm.scorers.longform.LongTextGraph class.
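
The source does not say which centrality metrics are computed, so the sketch below picks degree centrality as the simplest example: a claim's score is the share of responses that entail it. The function name and input layout are assumptions for illustration.

```python
def claim_degree_centrality(claims, edges, n_responses):
    """Degree of each claim node divided by the number of response nodes.

    edges: iterable of (claim, response_id) pairs, one per entailment link
    in the bipartite claim-response graph.
    """
    neighbours = {claim: set() for claim in claims}
    for claim, response_id in edges:
        neighbours[claim].add(response_id)
    return {c: len(r) / n_responses for c, r in neighbours.items()}
```

A claim entailed by every response gets 1.0, while a claim entailed by none gets 0.0; other centrality measures weight the graph structure differently.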

These scorer classes all share the same parent class: uqlm.scorers.longform.baseclass.LongFormUQ.

Response Refinement with Uncertainty Aware Decoding

Response refinement works by dropping claims with confidence scores (specified with claim_filtering_scorer parameter) below a specified threshold (specified with response_refinement_threshold parameter) and reconstructing the response from the retained claims. This functionality is available in combination with any of the four methods described above by setting response_refinement=True in the constructor of the corresponding scorer class.
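
The filtering step can be sketched in a few lines. Joining the retained claims with spaces is a placeholder for the reconstruction step, whose actual mechanism is not described here, and `refine_response` is a hypothetical name.

```python
def refine_response(claims, scores, threshold):
    """Keep claims whose confidence meets the threshold and rejoin them."""
    kept = [c for c, s in zip(claims, scores) if s >= threshold]
    return " ".join(kept)
```

For example, with scores of 0.9 and 0.2 and a threshold of 0.5, only the first claim survives.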

Performance Evaluation

We enable FactScore-based grading using an LLM. This works by comparing units (sentences or claims) in a generated response to a FactScore question against the corresponding text of the subject's Wikipedia article.

New docs site pages

We have added a "Scorer Definitions" tab to the docs site, intended to serve as an 'encyclopedia' of available scoring methods. It provides formal definitions, explanations in simple terms, and code snippets for all available methods.

Other changes

uqlm.scorers has been refactored into two subfolders: uqlm.scorers.shortform (which contains the scorer classes that existed as of v0.4) and uqlm.scorers.longform (which contains the classes implementing the scoring methods described above)
the readme has been updated to reflect new longform scorers, and a new readme has been added inside the examples/ directory to provide more details on the available tutorials
various package upgrades to address security vulnerabilities identified by dependabot

Breaking changes

normalized_probability has been deprecated from the accepted white-box scorer list in WhiteBoxUQ and UQEnsemble in favor of sequence_probability with length_normalize=True (default). This also affects the key/column names in the returned UQResult object.
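
To show what the flag changes numerically, here is a sketch that assumes length normalization means taking the geometric mean of the token probabilities, which is the usual convention. The function below is illustrative and is not uqlm's implementation.

```python
import math


def sequence_probability(token_logprobs, length_normalize=True):
    """Joint probability of the tokens, or its geometric mean if normalized."""
    total = sum(token_logprobs)
    if length_normalize:
        total /= len(token_logprobs)  # average log-probability per token
    return math.exp(total)
```

Without normalization the score shrinks as responses get longer, which makes scores hard to compare across responses of different lengths.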

What's Changed

v0.3 updates by @dylanbouchard in #197
LLM based NLI + ResponseDecomposer upgrades + restructured prompts by @dskarbrevik in #199
Minor refactor by @dylanbouchard in #201
add aggregation method by @dylanbouchard in #202
Add mode, granularity parameters in place of scorers by @dylanbouchard in #204
Long-form Semantic Entropy by @mohitcek in #203
add factscore grader by @dylanbouchard in #207
Enable more granular score return by @dylanbouchard in #208
Binary style for NLI class by @dskarbrevik in #206
update grader by @dylanbouchard in #215
Longform Feature: evaluate method to compute semantic entropy by @mohitcek in #217
Refactor ClaimQA class by @mohitcek in #218
Patch/v0.3.1 by @dylanbouchard in #225
v0.3.1 updates by @dylanbouchard in #224
update question template by @dylanbouchard in #227
Feat: ClaimQA class - multiple questions per factoid/claim by @mohitcek in #228
Claimqa updates by @dylanbouchard in #235
v0.4.4 updates by @dylanbouchard in #279
Merge develop -> longform UQ branch by @dylanbouchard in #282
v0.4.5 updates by @dylanbouchard in #286
LongForm UQ by @dylanbouchard in #283
Created new directories for short-form and long-form responses by @mohitcek in #288
Refactor uqlm.scorers for shorform vs. longform parent classes by @dylanbouchard in #289
Issue #244 - Added Scorer Definitions on Docs Site by @vgyani in #287
Add long-text definition to docs by @dylanbouchard in #298
Rearrange subpackages by @dylanbouchard in #300
Rename modules, add UAD scorer specification by @dylanbouchard in #304
Update notebooks by @dylanbouchard in #308
Graph based long-form scoring by @dskarbrevik in #307
Fix links and test by @dylanbouchard in #309
Add new unit tests by @dylanbouchard in #310
update uad graphics by @dylanbouchard in #311
update luq graphic and version by @dylanbouchard in #313
add qa unit test by @dylanbouchard in #314
Minor release: v0.5.0 by @dylanbouchard in #315

Full Changelog: v0.4.5...v0.5.0

Contributors

dskarbrevik, mohitcek, and 2 other contributors


08 Dec 16:18

dylanbouchard

v0.4.5

9d30468

v0.4.5

Highlights

fix bug in model name string checking when retrieving logprobs, per issue #284

What's Changed

patch release: v0.4.5 by @dylanbouchard in #285

Full Changelog: v0.4.4...v0.4.5

Contributors

dylanbouchard


04 Dec 15:42

dylanbouchard

v0.4.4

e9305e4

v0.4.4

Highlights

adds max_length parameter to WhiteBoxUQ to avoid CUDA OutOfMemoryError
updates demo and docstring accordingly

What's Changed

v0.4.3 updates by @dylanbouchard in #276
Patch release: v0.4.4 by @dylanbouchard in #277

Full Changelog: v0.4.3...v0.4.4

Contributors

dylanbouchard


Releases: cvs-health/uqlm
