Going through setting up e2b, I had issues with the initial onboarding examples at https://e2b.dev/docs/quickstart/connect-llms. I noticed a couple of things going on that make it hard to demonstrate how the sandbox execution works without handling edge cases:
- The AI might not even generate code
- AI code can either generate results with the repr or with printing
TL;DR: I suspect the easiest solution here is a system message that very clearly says to use one of [print or repr], and to make sure the AI always calls a tool for calculations. Then handle that case in the examples.
1) The AI might not even generate code
When I initially ran the first example (OpenAI), I added print statements to see where e2b was being called and realized it wasn't being called at all; the model was just generating the correct answer without doing a function call. That's this bit of code, where the `if` branch never executes:
```python
response = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=tools,
)

# Append the response message to the messages list
response_message = response.choices[0].message
messages.append(response_message)

# Execute the tool if it's called by the model
if response_message.tool_calls:
    for tool_call in response_message.tool_calls:
        if tool_call.function.name == "execute_python":
            # Create a sandbox and execute the code
            with Sandbox.create() as sandbox:
                code = json.loads(tool_call.function.arguments)['code']
                execution = sandbox.run_code(code)
                result = execution.text
                ...
```
The example likely needs an additional system message to get the AI to do a tool call there. The user can add their own print statements to figure out what's getting called where, but it would be useful if the tool call worked the way it's supposed to out of the box.
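One way to make this deterministic (a sketch of my suggestion, not something from the docs) is to combine a system message with OpenAI's `tool_choice` parameter, which can force the model to call a tool rather than answer inline. The prompt wording and the `build_request` helper below are my own; only `execute_python` comes from the quickstart:

```python
# Suggested system prompt (my wording, not from the e2b docs): pin down both
# the "always call the tool" behavior and the output convention from the TL;DR.
SYSTEM_PROMPT = (
    "You must always call the execute_python tool to perform any calculation. "
    "End your code with a print() of the final result."
)

def build_request(user_prompt, tools, model="gpt-4o"):
    """Assemble kwargs for client.chat.completions.create that force a tool call."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_prompt},
        ],
        "tools": tools,
        # "required" forces the model to call *some* tool; to pin it to this
        # specific tool you can instead pass:
        # {"type": "function", "function": {"name": "execute_python"}}
        "tool_choice": "required",
    }
```

With `client.chat.completions.create(**build_request(...))`, the `if response_message.tool_calls:` branch should run reliably.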
2) The AI's code can output results nondeterministically, and there's no catch-all
The AI can write it either as

```python
word = 'strawberry'
print(word.count('r'))
```

or

```python
res = word.count('r')
res
```
The `.text` prop on the Execution only handles the repr'd case, not the printed one, so the demo gets a bit complex here. If the AI uses print and you check `.text`, you get `None` (although the demo still gets the answer right, because the AI is smart enough nowadays). The Anthropic example at https://e2b.dev/docs/quickstart/connect-llms does use `execution.logs.stdout` to handle that case, but then the repr method gives `None` instead. This can maybe be solved via more deterministic prompting about output formatting (as should be done for the prior issue). However, having something that pulls any text from the output would be nice. I do see there's a `.to_json()` as well that might be better to pass instead.
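A catch-all along these lines could just combine the two fields. This is a sketch based on my reading of the Execution shape above (`.text` plus `.logs.stdout`), not an official helper:

```python
def execution_output(execution) -> str:
    """Return whatever text the sandboxed code produced, whether the model
    ended with a bare expression (repr -> execution.text) or used print()
    (captured in execution.logs.stdout)."""
    if execution.text:
        return execution.text
    # Fall back to printed output; stdout is a list of captured chunks.
    return "".join(execution.logs.stdout)
```

Passing `execution_output(execution)` back as the tool result would make the demo work regardless of which style the model picked.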
- Note: the `.to_json()` thing may also have an issue with double-jsonifying, which can take a lot of extra tokens if not done carefully. Sending all the outputs to the AI is usually best in my experience though, so maybe I'm overcomplicating this and the right thing is just passing the JSON instead of text.
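To illustrate the double-jsonify concern with a hypothetical payload: if the execution result is already a JSON string and you `json.dumps()` it again when appending the tool message, every quote gets backslash-escaped, inflating the token count for no benefit:

```python
import json

# Hypothetical execution output, already structured as a dict.
result = {"text": "3", "stdout": ["3\n"]}

once = json.dumps(result)   # what you'd want to send back to the model
twice = json.dumps(once)    # accidental double encoding: quotes get escaped

# The double-encoded payload is strictly longer because of the escaping,
# and the model sees noise like \" instead of plain quotes.
assert len(twice) > len(once)
```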