# StringSight
### *Extract, cluster, and analyze behavioral properties from Large Language Models*

**Annoyed at having to look through your long model conversations or agentic traces? Fear not: StringSight has come to ease your woes. Understand and compare model behavior by automatically extracting behavioral properties from model responses, grouping similar behaviors together, and quantifying how important those behaviors are.**

## Installation

```bash
# (Optional) create and activate a dedicated environment
conda create -n stringsight python=3.11
conda activate stringsight

# Install the core library from PyPI
pip install stringsight

# Install with all optional extras (recommended for notebooks and advanced workflows)
pip install "stringsight[full]"
```

For local development or contributing, you can install from source in editable mode (clone the repository and run `pip install -e .` from the repository root).

For a comprehensive tutorial with detailed explanations, see [starter_notebook.ipynb](starter_notebook.ipynb) or open it directly in [Google Colab](https://colab.research.google.com/drive/1XBQqDqTK6-9wopqRB51j8cPfnTS5Wjqh?usp=drive_link).
### 1. Extract and Cluster Properties with `explain()`
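The full walkthrough for this step is in the starter notebook; the snippet below is only a minimal sketch of what a call might look like, assuming `explain` is importable from the top-level `stringsight` package, accepts a pandas DataFrame of prompts and responses, and returns an object summarizing the extracted properties and their clusters. The column names `prompt`, `model`, and `model_response` are assumptions for illustration, not a documented schema.

```python
# Hypothetical sketch -- import path, column names, and return value are assumptions.
import pandas as pd
from stringsight import explain  # assumed import location

df = pd.DataFrame({
    "prompt": ["What is machine learning?", "Explain overfitting."],
    "model": ["gpt-4", "gpt-4"],
    "model_response": ["Machine learning is...", "Overfitting happens when..."],
})

results = explain(df)  # extract behavioral properties and cluster similar ones
print(results)         # inspect the clustered properties and their importance
```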
Use the React frontend or other visualization tools to explore your results.
### Side-by-Side Comparisons
**Option 1: Tidy Data (Auto-pairing)**

If your data is in tidy single-model format with multiple models, StringSight can pair them automatically: the pipeline matches rows where both models answered the same prompt.
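As an illustration only, a tidy two-model dataframe might look like the sketch below. The column names (`prompt`, `model`, `model_response`) are assumed for this example and are not specified in this section.

```python
import pandas as pd

# Assumed tidy schema: one row per (prompt, model) pair.
tidy_df = pd.DataFrame({
    "prompt": [
        "What is machine learning?", "What is machine learning?",
        "Explain overfitting.", "Explain overfitting.",
    ],
    "model": ["gpt-4", "claude-3", "gpt-4", "claude-3"],
    "model_response": [
        "Machine learning is...", "ML involves...",
        "Overfitting happens when...", "Overfitting means...",
    ],
})
# Rows that share a prompt are paired automatically, so no model_a/model_b columns are needed.
```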
**Option 2: Pre-paired Data**

**Required Columns:**

| Column | Description | Example |
|--------|-------------|---------|
| `prompt` | Question given to both models | `"What is machine learning?"` |
| `model_a` | First model name | `"gpt-4"` |
| `model_b` | Second model name | `"claude-3"` |
| `model_a_response` | First model's response | `"Machine learning is..."` |
| `model_b_response` | Second model's response | `"ML involves..."` |

**Optional Columns:**

| Column | Description | Example |
|--------|-------------|---------|
| `score` | Winner and metrics | `{"winner": "model_a", "helpfulness_a": 4.2, "helpfulness_b": 3.8}` |
| `score_columns` | Alternative: separate columns for each metric with `_a` and `_b` suffixes (e.g., `accuracy_a`, `accuracy_b`) | `score_columns=["accuracy_a", "accuracy_b", "helpfulness_a", "helpfulness_b"]` |
| `prompt_column` | Name of the prompt column in your dataframe (default: `"prompt"`) | `prompt_column="query"` |
| `model_a_column` | Name of the model_a column (default: `"model_a"`) | `model_a_column="model_1"` |
| `model_b_column` | Name of the model_b column (default: `"model_b"`) | `model_b_column="model_2"` |
| `model_a_response_column` | Name of the model_a_response column (default: `"model_a_response"`) | `model_a_response_column="response_1"` |
| `model_b_response_column` | Name of the model_b_response column (default: `"model_b_response"`) | `model_b_response_column="response_2"` |
| `question_id_column` | Name of the question_id column (default: `"question_id"` if the column exists) | `question_id_column="qid"` |

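For completeness, here is a sketch of passing a pre-paired dataframe with renamed columns to `explain()`. The import path, and the assumption that the column-name overrides listed above are keyword arguments to `explain()`, are inferences from the option names, not a documented signature.

```python
import pandas as pd
from stringsight import explain  # assumed import location

# Pre-paired data with custom column names, mapped via the options above.
paired_df = pd.DataFrame({
    "query": ["What is machine learning?"],
    "model_1": ["gpt-4"],
    "model_2": ["claude-3"],
    "response_1": ["Machine learning is..."],
    "response_2": ["ML involves..."],
    "score": [{"winner": "model_a", "helpfulness_a": 4.2, "helpfulness_b": 3.8}],
})

results = explain(
    paired_df,
    prompt_column="query",                 # default: "prompt"
    model_a_column="model_1",              # default: "model_a"
    model_b_column="model_2",              # default: "model_b"
    model_a_response_column="response_1",  # default: "model_a_response"
    model_b_response_column="response_2",  # default: "model_b_response"
)
```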