Commit 286cf03

📦 Port all ESR logic from pymc-labs/llmconsumerstudies
1 parent 96d2614 commit 286cf03

File tree

14 files changed: +2140 -5457 lines changed

README.md

Lines changed: 66 additions & 23 deletions
@@ -1,37 +1,80 @@
-# embeddings_similarity_rating
+# Embeddings-Similarity Rating (ESR)

-*Description of the project/package. Make it super easy for people to understand what it does. Add links to external resources like Notion, SOWs, etc.if needed.*
+A Python package implementing the Embeddings-Similarity Rating methodology for converting LLM textual responses to Likert scale probability distributions using semantic similarity against reference statements.

-## Features
+## Overview

-*Bullet form list of the most important features of the project/package.*
+The ESR methodology addresses the challenge of mapping rich textual responses from Large Language Models (LLMs) to structured Likert scale ratings. Instead of forcing a single numerical rating, ESR preserves the inherent uncertainty and nuance in textual responses by generating probability distributions over all possible Likert scale points.

-## Usage
+This package provides a distilled, reusable implementation of the ESR methodology described in the paper "Measuring Synthetic Consumer Purchase Intent Using Embeddings-Similarity Ratings" by Maier & Aslak (2025).

-*How to use `embeddings_similarity_rating`. Include examples and code snippets.*
+## Installation

-## Project Structure
+### Local Development
+To install this package locally for development, run:
+```bash
+pip install -e .
+```

-- `embeddings_similarity_rating/`: Contains the package logic
-- `tests/`: Contains tests for the package
-- `notebooks/`: Contains exploratory code for testing new features
+### From GitHub Repository
+To install this package into your own project from GitHub, run:
+```bash
+pip install git+https://github.com/pymc-labs/embeddings-similarity-rating.git
+```

-## Development
+## Quick Start

-This package has been created with [pymc-labs/project-starter](https://github.com/pymc-labs/project-starter). It features:
+```python
+import numpy as np
+import polars as po
+from embeddings_similarity_rating import EmbeddingsRater

-- 📦 **`pixi`** for dependency and environment management.
-- 🧹 **`pre-commit`** for formatting, spellcheck, etc. If everyone uses the same standard formatting, then PRs won't have flaky formatting updates that distract from the actual contribution. Reviewing code will be much easier.
-- 🏷️ **`beartype`** for runtime type checking. If you know what's going in and out of functions just by reading the code, then it's easier to debug. And if these types are even enforced at runtime with tools like `beartype`, then there's a whole class of bugs that can never enter your code.
-- 🧪 **`pytest`** for testing. Meanwhile, with `beartype` handling type checks, tests do not have to assert types, and can merely focus on whether the actual logic works.
-- 🔄 **Github Actions** for running the pre-commit checks on each PR, automated testing and dependency management (dependabot).
+# Create reference sentences with embeddings
+reference_data = po.DataFrame({
+    'id': ['set1'] * 5,
+    'int_response': [1, 2, 3, 4, 5],
+    'sentence': [
+        "It's very unlikely that I'd buy it.",
+        "It's unlikely that I'd buy it.",
+        "I might buy it or not. I don't know.",
+        "It's somewhat possible I'd buy it.",
+        "It's possible I'd buy it."
+    ],
+    'embedding_small': [np.random.rand(384).tolist() for _ in range(5)]
+})

-### Prerequisites
+# Initialize the rater
+rater = EmbeddingsRater(reference_data, embeddings_column='embedding_small')

-- Python 3.11 or higher
-- [Pixi package manager](https://pixi.sh/latest/)
+# Convert LLM response embeddings to probability distributions
+llm_responses = np.random.rand(10, 384)
+pdfs = rater.get_response_pdfs('set1', llm_responses)

-### Get started
+# Get overall survey distribution
+survey_pdf = rater.get_survey_response_pdf(pdfs)
+print(f"Survey distribution: {survey_pdf}")
+```

-1. Run `pixi install` to install the dependencies.
-2. Run `pixi r test` to run the tests.
+## Methodology
+
+The ESR methodology works by:
+1. Defining reference statements for each Likert scale point
+2. Computing cosine similarities between LLM response embeddings and reference statement embeddings
+3. Converting similarities to probability distributions using minimum similarity subtraction and normalization
+4. Optionally applying temperature scaling for distribution control
+
+## Core Components
+
+- `EmbeddingsRater`: Main class implementing the ESR methodology
+- `response_embeddings_to_pdf()`: Core function for similarity-to-probability conversion
+- `scale_pdf()` and `scale_pdf_no_max_temp()`: Temperature scaling functions
+
+## Citation
+
+```
+Maier, B. F., & Aslak, U. (2025). Measuring Synthetic Consumer Purchase Intent Using Embeddings-Similarity Ratings.
+```
+
+## License
+
+MIT License
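
The Methodology section of the new README lists the similarity-to-probability steps, but the commit's `compute` module itself is not shown above. The snippet below is a minimal illustrative sketch of those steps using plain NumPy; the function names, signatures, and the power-based form of temperature scaling are assumptions for demonstration, not the package's actual `response_embeddings_to_pdf()`, `scale_pdf()`, or `scale_pdf_no_max_temp()` implementations.

```python
import numpy as np


def esr_pdf_sketch(response_embeddings: np.ndarray, reference_embeddings: np.ndarray) -> np.ndarray:
    """Map response embeddings (n, d) to Likert PDFs (n, k) against k reference embeddings (k, d)."""
    # Step 2: cosine similarity = dot product of L2-normalized rows.
    resp = response_embeddings / np.linalg.norm(response_embeddings, axis=1, keepdims=True)
    refs = reference_embeddings / np.linalg.norm(reference_embeddings, axis=1, keepdims=True)
    sims = resp @ refs.T  # shape (n_responses, k_scale_points)

    # Step 3: subtract each row's minimum similarity, then normalize rows to sum to 1.
    shifted = sims - sims.min(axis=1, keepdims=True)
    return shifted / shifted.sum(axis=1, keepdims=True)


def temperature_scale_sketch(pdf: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Step 4 (optional): sharpen (T < 1) or flatten (T > 1) a PDF; one plausible scaling form."""
    powered = pdf ** (1.0 / temperature)
    return powered / powered.sum(axis=-1, keepdims=True)


# Toy example: 10 mock responses vs. 5 reference statements in a 384-d embedding space.
rng = np.random.default_rng(0)
pdfs = esr_pdf_sketch(rng.random((10, 384)), rng.random((5, 384)))
survey_pdf = temperature_scale_sketch(pdfs.mean(axis=0), temperature=0.8)
print(survey_pdf, survey_pdf.sum())  # each row of pdfs, and survey_pdf, sums to 1
```

Averaging the per-response PDFs, as in the last lines, is one simple way to aggregate to a survey-level distribution; the package's `get_survey_response_pdf()` shown in the Quick Start may aggregate differently.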
embeddings_similarity_rating/__init__.py

Lines changed: 20 additions & 3 deletions
@@ -1,9 +1,26 @@
-"""Top-level module for embeddings_similarity_rating."""
+"""
+Embeddings-Similarity Rating (ESR) Package
+
+A package for converting LLM textual responses to Likert scale probability distributions
+using semantic similarity against reference statements.
+
+This package implements the ESR methodology described in the paper:
+"Measuring Synthetic Consumer Purchase Intent Using Embeddings-Similarity Ratings"
+"""

from beartype.claw import beartype_this_package

-from .model import my_model
+from .compute import response_embeddings_to_pdf, scale_pdf, scale_pdf_no_max_temp
+from .embeddings_rater import EmbeddingsRater
+
+__version__ = "1.0.0"
+__author__ = "Ben F. Maier, Ulf Aslak"

-__all__ = ["my_model"]
+__all__ = [
+    "EmbeddingsRater",
+    "response_embeddings_to_pdf",
+    "scale_pdf",
+    "scale_pdf_no_max_temp",
+]

beartype_this_package()

0 commit comments
