We use few-shot learning to condition chat models on the desired task. This works well for GPT-3.5 and GPT-4, and for many other LLMs (though not necessarily for all of them).
You can set ```tabmemcheck.config.print_prompts = True``` to see the prompts.
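For example (a minimal sketch that only assumes the package is installed and importable as ```tabmemcheck```):

```python
import tabmemcheck

# print every prompt before it is sent to the LLM
tabmemcheck.config.print_prompts = True

# any test or completion function that runs after this point will print its prompts
```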
While this all sounds very complex, the practical evidence for memorization is often very clear, as the examples above show.
# Can I use this package to write my own tests?
This package provides two fairly general functions:

- ```tabmemcheck.chat_completion```
- ```tabmemcheck.prefix_suffix_chat_completion```
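For example, a small prefix/suffix completion experiment could look as follows. This is a minimal sketch: the argument names and their order follow the docstring of ```tabmemcheck.prefix_suffix_chat_completion``` further below, while the prefixes, suffixes, and system prompt are made-up placeholders, and ```llm``` stands for any ```tabmemcheck.LLM_Interface``` implementation (see the next section).

```python
import tabmemcheck

# each test case is split into a prefix (shown to the model) and a suffix
# (the continuation that the response is compared against); placeholder data
prefixes = ["1,Alice,34,", "2,Bob,51,", "3,Carol,29,", "4,Dave,62,"]
suffixes = ["Berlin", "Paris", "London", "Madrid"]

llm = ...  # any tabmemcheck.LLM_Interface implementation

test_prefixes, test_suffixes, responses = tabmemcheck.prefix_suffix_chat_completion(
    llm,
    prefixes,
    suffixes,
    system_prompt="Complete the next value of the row.",  # placeholder system prompt
    few_shot=2,              # use 2 of the (prefix, suffix) pairs as few-shot examples
    num_queries=2,           # number of test queries to send
    print_levenshtein=True,  # visualize how close the responses are to the true suffixes
)
```

The function returns the test prefixes, the corresponding test suffixes, and the LLM responses, so the responses can be compared against the true suffixes directly.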
# Using the package with your own LLM
To test your own LLM, simply implement ```tabmemcheck.LLM_Interface```. We use the OpenAI message format.
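As a rough sketch, a wrapper for a custom model might look like the code below. The method name and signature (```chat_completion``` taking OpenAI-format messages plus sampling parameters) are an assumption for illustration and are not stated on this page; check the definition of ```tabmemcheck.LLM_Interface``` for the exact interface.

```python
import tabmemcheck


class MyLLM(tabmemcheck.LLM_Interface):
    """Minimal illustrative wrapper; replace the body with calls to your own model."""

    def chat_completion(self, messages, temperature, max_tokens):
        # assumed signature: messages follow the OpenAI format, i.e. a list of
        # {"role": "system"|"user"|"assistant", "content": "..."} dictionaries,
        # and the method returns the model response as a string
        return "the model response as a string"
```

An instance of this class can then be passed as the ```llm``` argument of the functions above.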
"""A basic chat completion function. Takes a list of prefixes and suffixes and a system prompt.
430
-
Sends {num_queries} prompts of the format
436
+
"""A general-purpose chat completion function. Given prefixes, suffixes, and few-shot examples, this function sends {num_queries} LLM queries of the format
The num_queries prefixes and suffixes are randomly selected from the respective lists.
442
-
The function guarantees that the test suffix (as a complete string) is not contained in any of the few-shot prefixes or suffixes.
447
+
The prefixes, suffixes are and few-shot examples are randomly selected.
448
+
449
+
This function guarantees that the test suffix (as a complete string) is not contained in any of the few-shot prefixes or suffixes (a useful sanity check, we don't want to provide the desired response anywhere in the context).
443
450
444
-
Stores the results in a csv file.
451
+
Args:
452
+
llm (LLM_Interface): The LLM.
453
+
prefixes (list[str]): A list of prefixes.
454
+
suffixes (list[str]): A list of suffixes.
455
+
system_prompt (str): The system prompt.
456
+
few_shot (_type_, optional): Either an integer, to select the given number of few-shot examples from the list of prefixes and suffixes. Or a list [([prefixes], [suffixes]), ..., ([prefixes], [suffixes])] to select one few-shot example from each list. Defaults to None.
457
+
num_queries (int, optional): The number of queries. Defaults to 100.
458
+
print_levenshtein (bool, optional): Visualize the Levenshtein string distance between test suffixes and LLM responses. Defaults to False.
459
+
out_file (_type_, optional): Save all queries to a CSV file. Defaults to None.
460
+
rng (_type_, optional): _description_. Defaults to None.
445
461
446
-
Returns: the test prefixes, test suffixes, and responses
447
-
"""
462
+
Raises:
463
+
Exception: It an error occurs.
464
+
465
+
Returns:
466
+
tuple: A tuple of test prefixes, test suffixes, and responses.
467
+
"""
assert len(prefixes) == len(
    suffixes
), "prefixes and suffixes must have the same length"
"""Feature completion test for memorization. The test resports the number of correctly completed features.
    cond_feature_names,
    add_description=False,
    out_file=out_file,
    rng=rng,
)
# parse the model responses
def first_token_test(
    # ... (earlier parameters omitted in this excerpt)
    few_shot=7,
    out_file=None,
    system_prompt: str = "default",
    rng=None,
):
"""First token test for memorization. We ask the model to complete the first token of the next row of the csv file, given the previous rows. The test resports the number of correctly completed tokens.