Going through setting up e2b, I had issues with the initial onboarding examples at https://e2b.dev/docs/quickstart/connect-llms. I noticed a couple of things going on that make it hard to demonstrate how the sandbox execution works without handling edge cases:
- The AI might not even generate code
- AI code can either generate results with the repr or with printing
TL;DR: I suspect the easiest solution here is a system message that very clearly says to use one of [print or repr], and to make sure the AI always calls a tool for calculations. Then handle that case in the examples.
1) The AI might not even generate code
When I initially ran the first example (OpenAI), I added print statements to see where e2b was being called and realized it wasn't being called at all; the model was just generating the correct answer without doing a function call. That's this bit of code, where the `if` branch never executes:
```python
response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
)

# Append the response message to the messages list
response_message = response.choices[0].message
messages.append(response_message)

# Execute the tool if it's called by the model
if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        if tool_call.function.name == "execute_python":
            # Create a sandbox and execute the code
            with Sandbox.create() as sandbox:
                code = json.loads(tool_call.function.arguments)['code']
                execution = sandbox.run_code(code)
                result = execution.text
                ...
```
The example likely needs an additional system message to get the AI to do a tool call there. The user can add their own print statements to figure out what's getting called where, but it would be useful if the tool call worked the way it's supposed to out of the box.
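One way to make this deterministic (a sketch of my suggestion, not something from the docs) is to combine a system message with OpenAI's `tool_choice` parameter, which can force the model to call a tool rather than answer inline. The prompt wording and the `build_request` helper below are my own; only `execute_python` comes from the quickstart:

```python
# Suggested system prompt (my wording, not from the e2b docs): pin down both
# the "always call the tool" behavior and the output convention from the TL;DR.
SYSTEM_PROMPT = (
    "You must always call the execute_python tool to perform any calculation. "
    "End your code with a print() of the final result."
)

def build_request(user_prompt, tools, model="gpt-4o"):
    """Assemble kwargs for client.chat.completions.create that force a tool call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "tools": tools,
        # "required" forces the model to call *some* tool; to pin it to this
        # specific tool you can instead pass:
        # {"type": "function", "function": {"name": "execute_python"}}
        "tool_choice": "required",
    }
```

With `client.chat.completions.create(**build_request(...))`, the `if response_message.tool_calls:` branch should run reliably.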
2) The AI's code can output results nondeterministically, and there's no catch-all
The AI can write it either as

```python
word = 'strawberry'
print(word.count('r'))
```

or

```python
res = word.count('r')
res
```
The `.text` prop on the Execution only handles the repr'd case, not the printed one, so the demo gets a bit complex here. If the AI uses print and you check `.text`, you get `None` (although the demo still gets the answer right, because the AI is smart enough nowadays). The Anthropic example at https://e2b.dev/docs/quickstart/connect-llms does use `execution.logs.stdout` to handle that case, but then the repr method gives `None` instead. This can maybe be solved via more deterministic prompting about output formatting (as should be done for the prior issue). However, having something that pulls any text from the output would be nice. I do see there's a `.to_json()` as well that might be better to pass instead.
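A catch-all along these lines could just combine the two fields. This is a sketch based on my reading of the Execution shape above (`.text` plus `.logs.stdout`), not an official helper:

```python
def execution_output(execution) -> str:
    """Return whatever text the sandboxed code produced, whether the model
    ended with a bare expression (repr -> execution.text) or used print()
    (captured in execution.logs.stdout)."""
    if execution.text:
        return execution.text
    # Fall back to printed output; stdout is a list of captured chunks.
    return "".join(execution.logs.stdout)
```

Passing `execution_output(execution)` back as the tool result would make the demo work regardless of which style the model picked.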
- Note: the `.to_json()` thing may also have an issue with double-jsonifying, which can take a lot of extra tokens if not done carefully. Sending all the outputs to the AI is usually best in my experience though, so maybe I'm overcomplicating this and the right thing is just passing the JSON instead of text.
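To illustrate the double-jsonify concern with a hypothetical payload: if the execution result is already a JSON string and you `json.dumps()` it again when appending the tool message, every quote gets backslash-escaped, inflating the token count for no benefit:

```python
import json

# Hypothetical execution output, already structured as a dict.
result = {"text": "3", "stdout": ["3\n"]}

once = json.dumps(result)   # what you'd want to send back to the model
twice = json.dumps(once)    # accidental double encoding: quotes get escaped

# The double-encoded payload is strictly longer because of the escaping,
# and the model sees noise like \" instead of plain quotes.
assert len(twice) > len(once)
```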