Commit 2818a79

Merge pull request #12 from compspec/add-tracker-results

feat: enhanced support for results directory and more metadata

2 parents: b447d25 + c0ec067

File tree

27 files changed (+1602 / -766 lines)

examples/agent/README.md

Lines changed: 14 additions & 0 deletions
````diff
@@ -57,6 +57,12 @@ fractale agent --plan ./plans/run-lammps.yaml
 
 # or try using with the cache
 fractale agent --plan ./plans/run-lammps.yaml --use-cache
+
+# Save metadata
+fractale agent --plan ./plans/run-lammps.yaml --results ./results
+
+# Save metadata and include incremental results
+fractale agent --plan ./plans/run-lammps.yaml --results ./results --incremental
 ```
 
 We haven't hit the case yet where the manager needs to take over - that needs further development, along with being goal oriented (e.g., parsing a log and getting an output).
@@ -66,11 +72,19 @@ We haven't hit the case yet where the manager needs to take over - that needs fu
 #### To do items
 
 - Figure out optimization agent (with some goal)
+- The LLM absolutely needs detail about the data, and what to run.
+- Error messages from programs are immensely important now, since the LLM makes decisions entirely from them.
+- Right now when we restart, we do so with a fresh slate (no log memory) - should there be one?
+- We likely want some way to quantify the amount of change between prompts, and the difficulty of the task.
+- When we return to the manager, the last response (which might say why it is returning) should inform step selection - and not just step selection, but also the updated prompt for the step that is missing something.
+- Right now we rely on random sampling of the space to avoid whatever the issue might be.
 
 #### Research Questions
 
 **And experiment ideas**
 
+- Why does it make the same mistakes? E.g., always forgetting ca-certificates. Did it learn from data that this was OK to do, so that errors result from inconsistencies between the way things used to work and the way they do now?
+- Insight: if I don't know how to run an app, it's unlikely the LLM can do it, because I can't give any guidance (and it guesses).
 - How do we define stability?
 - What are the increments of change (e.g., "adding a library")? We should be able to keep track of times for each stage and what changed, and an analyzer LLM can look at the result and understand (categorize) the most salient contributions to change.
 - We can also time how long it takes to do subsequent changes, when relevant. For example, if we are building, we should be able to use cached layers (and the build times speed up) if the LLM is changing content later in the Dockerfile.
````

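The `--results` and `--incremental` flags above are new in this commit. As a rough sketch only (the actual fractale CLI wiring is not shown in this diff, and the parser below is hypothetical), the flags could map onto the new agent constructor options like so:

```python
import argparse

def build_parser():
    # Hypothetical sketch of the agent flags shown in the README above;
    # the real fractale CLI implementation may differ.
    parser = argparse.ArgumentParser(prog="fractale")
    parser.add_argument("--plan", help="path to a plan YAML file")
    parser.add_argument("--use-cache", action="store_true", help="reuse cached step contexts")
    parser.add_argument("--results", dest="results_dir", help="directory to save metadata")
    parser.add_argument(
        "--incremental",
        dest="save_incremental",
        action="store_true",
        help="also save incremental (per-attempt) result objects",
    )
    return parser

# Parse the example invocation from the README
args = build_parser().parse_args(
    ["--plan", "./plans/run-lammps.yaml", "--results", "./results", "--incremental"]
)
```

The destination names (`results_dir`, `save_incremental`) are chosen to line up with the new `Agent.__init__` keyword arguments in `fractale/agent/base.py` below.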
fractale/agent/__init__.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 from fractale.agent.build import BuildAgent
-from fractale.agent.kubernetes_job import KubernetesJobAgent
+from fractale.agent.kubernetes import KubernetesJobAgent
 from fractale.agent.manager import ManagerAgent
```

fractale/agent/base.py

Lines changed: 79 additions & 25 deletions
```diff
@@ -1,13 +1,16 @@
-import json
+import copy
 import os
+import re
 import sys
+import time
 
 import google.generativeai as genai
 
 import fractale.agent.defaults as defaults
 import fractale.agent.logger as logger
 import fractale.utils as utils
 from fractale.agent.context import get_context
+from fractale.agent.decorators import save_result, timed
 
 
 class Agent:
```
```diff
@@ -22,28 +25,37 @@ class Agent:
     """
 
     # name and description should be on the class
+    state_variables = ["result", "error_message"]
 
-    def __init__(self, use_cache=False):
+    def __init__(
+        self, use_cache=False, results_dir=None, save_incremental=False, max_attempts=None
+    ):
+        self.attempts = 0
+        self.max_attempts = max_attempts
 
-        # Max attempts defaults to unlimited
-        # We start counting at 1 for the user to see.
-        # Eat your heart out, Matlab.
-        self.attempts = 1
-        self.max_attempts = None
+        # For now, assume these are for the manager.
+        # They get added to other agents via the step creation
+        # We can optionally save incremental result objects
+        self.results_dir = results_dir or os.getcwd()
+        self.save_incremental = save_incremental
 
         # The user can save if desired - caching the context to skip steps that already run.
         self.setup_cache(use_cache)
 
+        # This supports saving custom logs and step (attempt) metadata
+        self.init_metadata()
+
         # Custom initialization functions
         self.init()
 
+    def init_metadata(self):
+        self.metadata = {"times": {}, "assets": {}, "retries": 0, "failures": []}
+
+    @save_result
     def run(self, context):
         """
         Run the agent - a wrapper around internal function _run that prepares it.
         """
-        # Init attempts. Each agent has an internal counter for total attempts
-        self.attempts = self.attempts or 1
-
         # Load cached context. This is assumed to override user provided args
         # If we have a saved context, we assume we want to use it, return early
         cached_context = self.load_cache()
```
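The `@save_result` decorator comes from the new `fractale.agent.decorators` module, which is not included in this diff. A toy sketch of what such a decorator could plausibly do (the real implementation may differ), namely recording the returned context into the agent's metadata:

```python
import functools

def save_result(func):
    # Hypothetical sketch: the actual fractale.agent.decorators.save_result
    # is not shown in this commit and may behave differently.
    @functools.wraps(func)
    def wrapper(self, context):
        context = func(self, context)
        # Record the (possibly updated) context under the agent's metadata
        self.metadata.setdefault("results", []).append(context)
        return context
    return wrapper

class ToyAgent:
    def __init__(self):
        self.metadata = {"times": {}, "assets": {}, "retries": 0, "failures": []}

    @save_result
    def run(self, context):
        context["result"] = "ok"
        return context

agent = ToyAgent()
out = agent.run({"task": "demo"})
```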
```diff
@@ -57,7 +69,8 @@ def run(self, context):
         context = get_context(context)
 
         # Run, wrapping with a load and save of cache
-        context = self._run(context)
+        # This will return here when the internal loop is done
+        context = self.run_step(context)
         self.save_cache(context)
         return context
```
```diff
@@ -70,6 +83,32 @@ def print_result(self, result):
         """
         pass
 
+    def reset_context(self, context):
+        """
+        Remove output and any stateful variables. This is assuming we
+        are starting again.
+        """
+        for key in self.state_variables:
+            if key in context:
+                del context[key]
+
+        # Since we will try again, let's move current metadata into a subsection
+        metadata = copy.deepcopy(self.metadata)
+
+        # We don't want this to recurse forever
+        failures = metadata.get("failures") or []
+        if "failures" in metadata:
+            del metadata["failures"]
+        failures.append(metadata)
+
+        # Reset metadata, save retries
+        self.init_metadata()
+        self.metadata["failures"] = failures
+        self.metadata["retries"] = metadata["retries"]
+
+        # We don't need a return here, but let's be explicit
+        return context
+
     def setup_cache(self, use_cache=False):
         """
         Setup (or load) a cache.
```
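The interesting detail in `reset_context` is how it archives metadata: the current attempt's metadata is snapshotted into `failures`, with the snapshot's own `failures` list popped out first so old failures never nest recursively. The same logic, extracted as a standalone sketch:

```python
import copy

def archive_failures(metadata):
    """Sketch of the reset_context metadata archival above: move the current
    attempt's metadata into 'failures' without recursively nesting old failures."""
    snapshot = copy.deepcopy(metadata)
    # Pop the snapshot's own failures so they don't nest inside themselves
    failures = snapshot.pop("failures", []) or []
    failures.append(snapshot)
    # Fresh metadata, preserving the retry count and the accumulated failures
    return {"times": {}, "assets": {}, "retries": snapshot["retries"], "failures": failures}

meta = {"times": {"build": 3.2}, "assets": {}, "retries": 1, "failures": []}
meta = archive_failures(meta)
```

After one archival, `meta["failures"]` holds a flat list with one snapshot, and the snapshot itself carries no `failures` key, so repeated retries grow the list linearly rather than nesting.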
```diff
@@ -123,10 +162,7 @@ def reached_max_attempts(self):
         # Unset (None) or 1.
         if not self.max_attempts:
             return False
-        return self.attempts >= self.max_attempts
-
-    def set_max_attempts(self, max_attempts):
-        self.max_attempts = max_attempts
+        return self.attempts > self.max_attempts
 
     def add_shared_arguments(self, agent):
         """
```
```diff
@@ -190,29 +226,25 @@ def get_code_block(self, content, code_type):
         """
         Parse a code block from the response
         """
+        pattern = f"```(?:{code_type})?\n(.*?)```"
+        match = re.search(pattern, content, re.DOTALL)
+        if match:
+            return match.group(1).strip()
         if content.startswith(f"```{code_type}"):
             content = content[len(f"```{code_type}") :]
         if content.startswith("```"):
             content = content[len("```") :]
         if content.endswith("```"):
             content = content[: -len("```")]
-        return content
+        return content.strip()
 
-    def _run(self, context):
+    def run_step(self, context):
         """
         Run the agent. This expects to be called with a loaded context.
         """
         assert context
         raise NotImplementedError(f"The {self.name} agent is missing internal 'run' function")
 
-    def get_initial_prompt(self, context):
-        """
-        Get the initial prompt (with details) to provide context to the manager.
-
-        If we don't do this, the manager can provide a bad instruction for how to fix the error.
-        """
-        return self.get_prompt(context)
-
     def get_prompt(self, context):
         """
         This function should take the same context as run and return the parsed prompt that
```
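The regex-first version of `get_code_block` above handles a fenced block embedded anywhere in the response, while the old prefix/suffix stripping remains as a fallback for unterminated fences. Extracted as a standalone function for illustration:

```python
import re

def get_code_block(content, code_type):
    # Prefer a complete fenced block anywhere in the content
    pattern = f"```(?:{code_type})?\n(.*?)```"
    match = re.search(pattern, content, re.DOTALL)
    if match:
        return match.group(1).strip()
    # Fallback: strip a leading/trailing fence if present (e.g., no closing fence)
    if content.startswith(f"```{code_type}"):
        content = content[len(f"```{code_type}") :]
    if content.startswith("```"):
        content = content[len("```") :]
    if content.endswith("```"):
        content = content[: -len("```")]
    return content.strip()

# A block surrounded by prose, as an LLM response often is
response = "Here is the build:\n```dockerfile\nFROM ubuntu:22.04\nRUN apt-get update\n```\nDone."
block = get_code_block(response, "dockerfile")
```

The old startswith/endswith approach alone would have returned the surrounding prose too; the regex isolates just the fenced body.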
```diff
@@ -235,19 +267,41 @@ def init(self):
         except KeyError:
             sys.exit("ERROR: GEMINI_API_KEY environment variable not set.")
 
+    # We don't add timed here because we do it custom
     def ask_gemini(self, prompt, with_history=True):
         """
         Ask gemini adds a wrapper with some error handling.
         """
         try:
+            start = time.perf_counter()
             if with_history:
                 response = self.chat.send_message(prompt)
             else:
                 response = self.model.generate_content(prompt)
+            end = time.perf_counter()
+
+            if self.save_incremental:
+                self.save_gemini_metadata(end - start, response, with_history)
 
             # This line can fail. If it succeeds, return entire response
             return response.text.strip()
 
         except ValueError as e:
             print(f"[Error] The API response was blocked and contained no text: {str(e)}")
             return "GEMINI ERROR: The API returned an error (or stop) and we need to try again."
+
+    def save_gemini_metadata(self, elapsed_time, response, with_history):
+        """
+        Save gemini response metadata and elapsed time
+        """
+        if "ask_gemini" not in self.metadata:
+            self.metadata["ask_gemini"] = []
+        self.metadata["ask_gemini"].append(
+            {
+                "conversation_history": with_history,
+                "prompt_token_count": response.usage_metadata.prompt_token_count,
+                "candidates_token_count": response.usage_metadata.candidates_token_count,
+                "total_token_count": response.usage_metadata.total_token_count,
+                "time_seconds": elapsed_time,
+            }
+        )
```

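The timing and token bookkeeping in `ask_gemini` / `save_gemini_metadata` follows a simple pattern: time the call with `perf_counter`, then append one record per call. A sketch of that bookkeeping, with a plain dict standing in for the Gemini response's `usage_metadata` object:

```python
import time

def record_call(metadata, usage, with_history, elapsed):
    """Sketch of the per-call bookkeeping save_gemini_metadata does.
    Here `usage` is a plain dict standing in for response.usage_metadata."""
    if "ask_gemini" not in metadata:
        metadata["ask_gemini"] = []
    metadata["ask_gemini"].append(
        {
            "conversation_history": with_history,
            "prompt_token_count": usage["prompt"],
            "candidates_token_count": usage["candidates"],
            "total_token_count": usage["total"],
            "time_seconds": elapsed,
        }
    )

metadata = {}
start = time.perf_counter()
# ... the model call would happen here ...
elapsed = time.perf_counter() - start
record_call(metadata, {"prompt": 120, "candidates": 40, "total": 160}, True, elapsed)
```

Because `save_incremental` gates the call in the diff, these records accumulate only when the user asked for incremental results, keeping the default path free of per-call overhead.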