
Commit 8ecc6a8

mo374z, timo282, and finitearth authored
v1.3.0 (#34)
* Feature/workflows (#8)
* chore: add codeowners file
* chore: add python poetry action and docs workflow
* chore: update pre-commit file
* chore: update docs
* chore: update logo
* chore: add cicd pipeline for automated deployment
* chore: update poetry version
* chore: fix action versioning
* chore: add gitattributes to ignore line count in jupyter notebooks
* chore: add and update docstrings
* chore: fix end of files
* chore: update action versions
* Update README.md
---------
Co-authored-by: mo374z <[email protected]>
* Fix/workflows (#11)
* chore: fix workflow execution
* chore: fix version check in CICD pipeline
* Opro implementation (#7)
* update gitignore
* initial implementation of opro
* formatting of prompt template
* added opro test run
* opro refinements
* fixed sampling error
* add docs to opro
* fix pre commit issues
* fix pre commit issues
* fixed end of line
* Patch/pre commit config (#10)
* fixed pre commit config and removed end of file line breaks in tempaltes
* added /
* Feature/prompt generation (#12)
* added prompt_creation.py
* change version
* Create LICENSE (#14)
* Refactor/remove deepinfra (#16)
* Remove deepinfra file
* change langchain-community version
* Usability patches (#15)
* renamed get_tasks to get_task and change functionality accordingly. moved templates and data_sets
* init
* move templates to templates.py
* Add nested asyncio to make it useable in notebooks
* Update README.md
* changed getting_started.ipynb and created helper functions
* added sampling of initial population
* fixed config
* fixed callbacks
* adjust runs
* fix run evaluation api token
* fix naming convention in opro, remove on epoch end for logger callback, fixed to allow for numeric values in class names
* Update promptolution/llms/api_llm.py
Co-authored-by: Timo Heiß <[email protected]>
* fixed comments
* Update pyproject.toml
* resolve comments
---------
Co-authored-by: mo374z <[email protected]>
Co-authored-by: Timo Heiß <[email protected]>
Co-authored-by: Moritz Schlager <[email protected]>
* Feature/examplar selection (#17)
* implemented random selector
* added random search selector
* increased version count
* fix typos
* Update promptolution/predictors/base_predictor.py
Co-authored-by: Timo Heiß <[email protected]>
* Update promptolution/tasks/classification_tasks.py
Co-authored-by: Timo Heiß <[email protected]>
* resolve comments
* resolve comments
---------
Co-authored-by: Timo Heiß <[email protected]>
* Chore/docs release notes (#18)
* Update release-notes.md
* Fix release note links
* revert "Chore/docs release notes (#18)". This reverts commit e23dd74.
* revert last commit
* updated release notes and read me
* Feature/read from df (#21)
* Delete Experiment files
* Removed config necessities
* improved opro meta-prompts
* added read from data frame feature
* changed required python version to 3.9
* Update pyproject.toml
* Update release-notes.md
* merge
* merge
* resolve merge mistakes
* delete duplicated lines
* Update release-notes.md (#24)
* Fix/dependencies (#28)
* delete poetry.lock and upgrade transformers dependency
* Update release-notes.md
* Add vllm as feature and a llm_test_run_script
* small fixes in vllm class
* differentiate between vllm and api inference
* set up experiment over multiple tasks and prompts
* change csv saving
* add base llm super class
* add changes from PR review
* change some VLLM params
* fix tensor parallel size to 1
* experiment with batch size
* experiment with larger batch sizes
* add continuous batch llm
* remove arg
* remove continuous batch inference try
* add batching to vllm
* add batching in script
* Add release notes and increase version number
* remove llm_test_run.py script
* change system prompt
* Fix/vllm (#33)
* add token count, flexible batch size and kwargs to vllm class
* add testing script for implementation
* fix batch size calculation
* small changes
* add revision test
* add argument to parser
* max model len to int
* remove script
* Change version and Release notes
* changed callback behaviour and impelemented token count callback
* added super inits
* allow for splits not based on white space (such as new line break etc)
* include task descriptions
* add tokenizer based token count to vllm class
* update test run script
* use classifiers accordingly
* small fix
* add storage path
* helpers should use classificator
* use different model
* changes in opro test
* change get_predictor function
* fix callback calling
* change optimizer test run script
* small alignments
* small alignments
* small alignments
* some changes to match the current optimizer implementation
* changes in template and config
* allow for batching of prompt creation
* update release notes and version
* extend csvcallback functionality
* change callback csv export
* change step time calculation
* small changes
* remove llm_test_run script
* update release notes
* fix issues in token stepswise calculation
* small fix
---------
Co-authored-by: finitearth <[email protected]>
* implement changes from review
* add typing to token count callback
---------
Co-authored-by: Timo Heiß <[email protected]>
Co-authored-by: Tom Zehle <[email protected]>
Co-authored-by: Timo Heiß <[email protected]>
1 parent 0eb5409 commit 8ecc6a8

22 files changed: 480 additions, 166 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -7,4 +7,5 @@ __pycache__/
 temp/
 dist/
 outputs/
+results/
 poetry.lock

docs/release-notes.md

Lines changed: 14 additions & 0 deletions
@@ -1,5 +1,19 @@
 # Release Notes
 
+## Release v1.3.0
+### What's changed
+#### Added features
+* new features for the VLLM Wrapper (automatic batch size determination, accepting kwargs)
+* allow callbacks to terminate optimization run
+* add token count functionality
+* renamed "Classificator"-Predictor to "FirstOccurenceClassificator"
+* introduced "MarkerBasedClassifcator"
+* automatic task description creation
+* use task description in prompt creation
+* implement CSV callbacks
+
+**Full Changelog**: [here](https://github.com/finitearth/promptolution/compare/v1.2.0...v1.3.0)
+
 ## Release v1.2.0
 ### What's changed
 #### Added features
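The most structural of these changes is that callbacks can now terminate an optimization run: as the `promptolution/callbacks.py` diff below shows, every hook now returns `True` to continue or `False` to stop. A minimal sketch of a custom stopping callback under that contract (the wall-clock budget is an invented example, not part of this commit):

```python
import time

from promptolution.callbacks import Callback


class TimeBudgetCallback(Callback):
    """Hypothetical callback: stop optimizing once a wall-clock budget is spent."""

    def __init__(self, max_seconds: float):
        self.max_seconds = max_seconds
        self.start = time.time()

    def on_step_end(self, optimizer) -> bool:
        # Returning False signals the optimizer to terminate the run.
        return (time.time() - self.start) < self.max_seconds
```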

promptolution/callbacks.py

Lines changed: 103 additions & 18 deletions
@@ -1,7 +1,10 @@
 """Callback classes for logging, saving, and tracking optimization progress."""
 
 import os
+import time
+from typing import Literal
 
+import numpy as np
 import pandas as pd
 from tqdm import tqdm
 
@@ -14,24 +17,33 @@ def on_step_end(self, optimizer):
 
         Args:
             optimizer: The optimizer object that called the callback.
+
+        Returns:
+            Bool: True if the optimization should continue, False if it should stop.
         """
-        pass
+        return True
 
     def on_epoch_end(self, optimizer):
         """Called at the end of each optimization epoch.
 
         Args:
             optimizer: The optimizer object that called the callback.
+
+        Returns:
+            Bool: True if the optimization should continue, False if it should stop.
         """
-        pass
+        return True
 
     def on_train_end(self, optimizer):
         """Called at the end of the entire optimization process.
 
         Args:
             optimizer: The optimizer object that called the callback.
+
+        Returns:
+            Bool: True if the optimization should continue, False if it should stop.
         """
-        pass
+        return True
 
 
 class LoggerCallback(Callback):
@@ -57,14 +69,21 @@ def on_step_end(self, optimizer):
             self.logger.critical(f"*** Prompt {i}: Score: {score}")
             self.logger.critical(f"{prompt}")
 
+        return True
+
     def on_train_end(self, optimizer, logs=None):
         """Log information at the end of training.
 
         Args:
             optimizer: The optimizer object that called the callback.
             logs: Additional information to log.
         """
-        self.logger.critical(f"Training ended - {logs}")
+        if logs is None:
+            self.logger.critical("Training ended")
+        else:
+            self.logger.critical(f"Training ended - {logs}")
+
+        return True
 
 
 class CSVCallback(Callback):
@@ -73,25 +92,25 @@ class CSVCallback(Callback):
     This callback saves prompts and scores at each step to a CSV file.
 
     Attributes:
-        path (str): The path to the CSV file.
+        dir (str): Directory the CSV file is saved to.
         step (int): The current step number.
     """
 
-    def __init__(self, path):
+    def __init__(self, dir):
         """Initialize the CSVCallback.
 
         Args:
-            path (str): The path to the CSV file.
+            dir (str): Directory the CSV file is saved to.
         """
-        # if dir does not exist
-        if not os.path.exists(os.path.dirname(path)):
-            os.makedirs(os.path.dirname(path))
-
-        # create file in path with header: "step,prompt,score"
-        with open(path, "w") as f:
-            f.write("step,prompt,score\n")
-        self.path = path
+        if not os.path.exists(dir):
+            os.makedirs(dir)
+
+        self.dir = dir
         self.step = 0
+        self.input_tokens = 0
+        self.output_tokens = 0
+        self.start_time = time.time()
+        self.step_time = time.time()
 
     def on_step_end(self, optimizer):
         """Save prompts and scores to csv.
@@ -101,17 +120,50 @@ def on_step_end(self, optimizer):
         """
         self.step += 1
         df = pd.DataFrame(
-            {"step": [self.step] * len(optimizer.prompts), "prompt": optimizer.prompts, "score": optimizer.scores}
+            {
+                "step": [self.step] * len(optimizer.prompts),
+                "input_tokens": [optimizer.meta_llm.input_token_count - self.input_tokens] * len(optimizer.prompts),
+                "output_tokens": [optimizer.meta_llm.output_token_count - self.output_tokens] * len(optimizer.prompts),
+                "time_elapsed": [time.time() - self.step_time] * len(optimizer.prompts),
+                "score": optimizer.scores,
+                "prompt": optimizer.prompts,
+            }
         )
-        df.to_csv(self.path, mode="a", header=False, index=False)
+        self.step_time = time.time()
+        self.input_tokens = optimizer.meta_llm.input_token_count
+        self.output_tokens = optimizer.meta_llm.output_token_count
+
+        if not os.path.exists(self.dir + "step_results.csv"):
+            df.to_csv(self.dir + "step_results.csv", index=False)
+        else:
+            df.to_csv(self.dir + "step_results.csv", mode="a", header=False, index=False)
+
+        return True
 
     def on_train_end(self, optimizer):
         """Called at the end of training.
 
         Args:
             optimizer: The optimizer object that called the callback.
         """
-        pass
+        df = pd.DataFrame(
+            dict(
+                steps=self.step,
+                input_tokens=optimizer.meta_llm.input_token_count,
+                output_tokens=optimizer.meta_llm.output_token_count,
+                time_elapsed=time.time() - self.start_time,
+                score=np.array(optimizer.scores).mean(),
+                best_prompts=str(optimizer.prompts),
+            ),
+            index=[0],
+        )
+
+        if not os.path.exists(self.dir + "train_results.csv"):
+            df.to_csv(self.dir + "train_results.csv", index=False)
+        else:
+            df.to_csv(self.dir + "train_results.csv", mode="a", header=False, index=False)
+
+        return True
 
 
 class BestPromptCallback(Callback):
@@ -139,6 +191,8 @@ def on_step_end(self, optimizer):
             self.best_score = optimizer.scores[0]
             self.best_prompt = optimizer.prompts[0]
 
+        return True
+
     def get_best_prompt(self):
         """Get the best prompt and score achieved during optimization.
 
@@ -173,10 +227,41 @@ def on_step_end(self, optimizer):
         """
        self.pbar.update(1)
 
+        return True
+
     def on_train_end(self, optimizer):
         """Close the progress bar at the end of training.
 
         Args:
             optimizer: The optimizer object that called the callback.
         """
         self.pbar.close()
+
+        return True
+
+
+class TokenCountCallback(Callback):
+    """Callback for stopping optimization based on the total token count."""
+
+    def __init__(
+        self,
+        max_tokens_for_termination: int,
+        token_type_for_termination: Literal["input_tokens", "output_tokens", "total_tokens"],
+    ):
+        """Initialize the TokenCountCallback.
+
+        Args:
+            max_tokens_for_termination (int): Maximum number of tokens which is allowed befor the algorithm is stopped.
+            token_type_for_termination (str): Can be one of either "input_tokens", "output_tokens" or "total_tokens".
+        """
+        self.max_tokens_for_termination = max_tokens_for_termination
+        self.token_type_for_termination = token_type_for_termination
+
+    def on_step_end(self, optimizer):
+        """Check if the total token count exceeds the maximum allowed. If so, stop the optimization."""
+        token_counts = optimizer.predictor.llm.get_token_count()
+
+        if token_counts[self.token_type_for_termination] > self.max_tokens_for_termination:
+            return False
+
+        return True
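Wiring the reworked callbacks together might look like the sketch below. Note that `CSVCallback` builds its file paths by string concatenation (`self.dir + "step_results.csv"`), so the directory argument needs a trailing slash; the token budget is a placeholder value, not something prescribed by this commit.

```python
from promptolution.callbacks import CSVCallback, TokenCountCallback

callbacks = [
    # Trailing slash matters: paths are built as dir + "step_results.csv".
    CSVCallback(dir="results/"),
    # Stop once the downstream LLM has consumed 1M input tokens (placeholder budget).
    TokenCountCallback(
        max_tokens_for_termination=1_000_000,
        token_type_for_termination="input_tokens",
    ),
]
```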

promptolution/config.py

Lines changed: 6 additions & 2 deletions
@@ -17,15 +17,17 @@ class Config:
         ds_path (str): Path to the dataset. Should not be None if used.
         n_steps (int): Number of optimization steps. Should not be None if used.
         optimizer (str): Name of the optimizer to use. Should not be None if used.
+        predictor (str): Name of the predictor to use. Defaults to "FirstOccurenceClassificator".
         meta_llm (str): Name of the meta language model. Should not be None if used.
         downstream_llm (str): Name of the downstream language model. Should not be None if used.
         evaluation_llm (str): Name of the evaluation language model. Should not be None if used.
         init_pop_size (int): Initial population size. Defaults to 10.
         logging_dir (str): Directory for logging. Defaults to "logs/run.csv".
         experiment_name (str): Name of the experiment. Defaults to "experiment".
-        include_task_desc (bool): Whether to include task description. Defaults to False.
+        task_description (str): Task Description fed to the optimizer. Defaults to None.
         donor_random (bool): Whether to use random donor prompts for EvoPromptDE. Defaults to False.
         random_seed (int): Random seed for reproducibility. Defaults to 42.
+        model_storage_path (str): Path to the model storage directory (used for VLLM). Defaults to "../models/".
         selection_mode (str): Selection mode for EvoPromptGA. Defaults to "random".
         meta_bs (int): Batch size for local meta LLM. Should not be None if llm is run locally. Defaults to None.
         downstream_bs (int): Batch size for local downstream LLM.
@@ -46,16 +48,18 @@ class Config:
     task_name: str = None
     ds_path: Path = None
     optimizer: str = None
+    predictor: Literal["MarkerBasedClassificator", "FirstOccurenceClassificator"] = "FirstOccurenceClassificator"
     meta_llm: str = None
     downstream_llm: str = None
     evaluation_llm: str = None
     n_steps: int = None
     init_pop_size: int = None
     logging_dir: Path = Path("logs/run.csv")
     experiment_name: str = "experiment"
-    include_task_desc: bool = True
+    task_description: str = None
     donor_random: bool = False
     random_seed: int = 42
+    model_storage_path: Optional[Path] = Path("../models/")
     selection_mode: Optional[Literal["random", "wheel", "tour"]] = "random"
     meta_bs: Optional[int] = None
     downstream_bs: Optional[int] = None
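With the new fields in place, a run could be configured roughly as follows. The task, dataset path, optimizer, and model names are illustrative assumptions, not values taken from this commit; only the field names come from the dataclass above.

```python
from pathlib import Path

from promptolution.config import Config

config = Config(
    task_name="agnews",                     # illustrative task name
    ds_path=Path("data/agnews"),            # illustrative dataset path
    optimizer="evopromptga",                # illustrative optimizer name
    predictor="MarkerBasedClassificator",   # new in this release
    meta_llm="meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model id
    n_steps=10,
    task_description=None,                  # None exercises the automatic task description creation
    model_storage_path=Path("../models/"),  # where locally served (vLLM) weights are stored
)
```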

promptolution/helpers.py

Lines changed: 12 additions & 5 deletions
@@ -9,7 +9,7 @@
 from promptolution.exemplar_selectors import get_exemplar_selector
 from promptolution.llms import get_llm
 from promptolution.optimizers import get_optimizer
-from promptolution.predictors import Classificator
+from promptolution.predictors import FirstOccurrenceClassificator, MarkerBasedClassificator
 from promptolution.tasks import get_task
 
 
@@ -27,7 +27,7 @@ def run_experiment(config: Config):
     return df
 
 
-def run_optimization(config: Config):
+def run_optimization(config: Config, callbacks: List = None):
     """Run the optimization phase of the experiment.
 
     Args:
@@ -37,8 +37,13 @@ def run_optimization(config: Config):
         List[str]: The optimized list of prompts.
     """
     task = get_task(config)
-    llm = get_llm(config.meta_llm, token=config.api_token)
-    predictor = Classificator(llm, classes=task.classes)
+    llm = get_llm(config.meta_llm, token=config.api_token, model_storage_path=config.model_storage_path)
+    if config.predictor == "MarkerBasedClassificator":
+        predictor = MarkerBasedClassificator(llm, classes=task.classes)
+    elif config.predictor == "FirstOccurenceClassificator":
+        predictor = FirstOccurrenceClassificator(llm, classes=task.classes)
+    else:
+        raise ValueError(f"Predictor {config.predictor} not supported.")
 
     if config.init_pop_size:
         init_pop = np.random.choice(task.initial_population, size=config.init_pop_size, replace=True)
@@ -52,6 +57,8 @@ def run_optimization(config: Config):
         task=task,
         predictor=predictor,
         n_eval_samples=config.n_eval_samples,
+        callbacks=callbacks,
+        task_description=predictor.extraction_description,
     )
 
     prompts = optimizer.optimize(n_steps=config.n_steps)
@@ -76,7 +83,7 @@ def run_evaluation(config: Config, prompts: List[str]):
     task = get_task(config, split="test")
 
     llm = get_llm(config.evaluation_llm, token=config.api_token)
-    predictor = Classificator(llm, classes=task.classes)
+    predictor = FirstOccurrenceClassificator(llm, classes=task.classes)
 
     scores = task.evaluate(prompts, predictor, subsample=True, n_samples=config.n_eval_samples)
     df = pd.DataFrame(dict(prompt=prompts, score=scores))
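The new `callbacks` parameter closes the loop between `helpers.py` and `callbacks.py`. A hypothetical end-to-end call, reusing the `config` and `callbacks` objects sketched in the earlier examples:

```python
from promptolution.helpers import run_evaluation, run_optimization

# `config` and `callbacks` as sketched above.
prompts = run_optimization(config, callbacks=callbacks)

# Evaluation still hard-codes FirstOccurrenceClassificator, per the diff above,
# and returns a DataFrame with `prompt` and `score` columns.
results = run_evaluation(config, prompts)
print(results.sort_values("score", ascending=False).head())
```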

promptolution/llms/api_llm.py

Lines changed: 4 additions & 3 deletions
@@ -3,7 +3,7 @@
 import asyncio
 import time
 from logging import INFO, Logger
-from typing import List
+from typing import Any, List
 
 import nest_asyncio
 import openai
@@ -63,7 +63,7 @@ class APILLM(BaseLLM):
         get_response_async: Asynchronously get responses for a list of prompts.
     """
 
-    def __init__(self, model_id: str, token: str = None):
+    def __init__(self, model_id: str, token: str = None, **kwargs: Any):
         """Initialize the APILLM with a specific model.
 
         Args:
@@ -73,14 +73,15 @@ def __init__(self, model_id: str, token: str = None):
         Raises:
             ValueError: If an unknown model identifier is provided.
         """
+        super().__init__()
        if "claude" in model_id:
             self.model = ChatAnthropic(model=model_id, api_key=token)
         elif "gpt" in model_id:
             self.model = ChatOpenAI(model=model_id, api_key=token)
         else:
             self.model = ChatDeepInfra(model_name=model_id, deepinfra_api_token=token)
 
-    def get_response(self, prompts: List[str]) -> List[str]:
+    def _get_response(self, prompts: List[str]) -> List[str]:
         """Get responses for a list of prompts in a synchronous manner.
 
         This method includes retry logic for handling connection errors and rate limits.
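The rename from `get_response` to `_get_response`, together with the added `super().__init__()`, suggests a template-method pattern: a public `BaseLLM.get_response` wrapper that performs the token accounting read by `CSVCallback` and `TokenCountCallback`, then delegates to the subclass hook. `BaseLLM` itself is not part of this diff, so the following is an assumption about its shape, not the actual implementation:

```python
from typing import Dict, List


class BaseLLM:
    """Assumed shape of the base class; not taken from this commit's diff."""

    def __init__(self):
        self.input_token_count = 0
        self.output_token_count = 0

    def get_response(self, prompts: List[str]) -> List[str]:
        responses = self._get_response(prompts)  # subclass hook, e.g. APILLM._get_response
        # Rough whitespace-based accounting as a stand-in; the commit message notes
        # that splitting need not be whitespace-based and that the VLLM wrapper
        # uses its tokenizer instead.
        self.input_token_count += sum(len(p.split()) for p in prompts)
        self.output_token_count += sum(len(r.split()) for r in responses)
        return responses

    def _get_response(self, prompts: List[str]) -> List[str]:
        raise NotImplementedError

    def get_token_count(self) -> Dict[str, int]:
        # Keys match what TokenCountCallback reads in callbacks.py.
        return {
            "input_tokens": self.input_token_count,
            "output_tokens": self.output_token_count,
            "total_tokens": self.input_token_count + self.output_token_count,
        }
```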
