Commit dc28860

samuelcolvin, claude and davidhewitt authored
remove external_functions parameter, auto-detect with NameLookup (#214)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: David Hewitt <mail@davidhewitt.dev>
1 parent fb7b7b5 commit dc28860

79 files changed, +4502 -2586 lines changed

README.md

Lines changed: 38 additions & 36 deletions
@@ -25,22 +25,24 @@ Monty avoids the cost, latency, complexity and general faff of using a full cont
 Instead, it lets you safely run Python code written by an LLM embedded in your agent, with startup times measured in single digit microseconds not hundreds of milliseconds.

 What Monty **can** do:
-* Run a reasonable subset of Python code - enough for your agent to express what it wants to do
-* Completely block access to the host environment: filesystem, env variables and network access are all implemented via external function calls the developer can control
-* Call functions on the host - only functions you give it access to
-* Run typechecking - monty supports full modern python type hints and comes with [ty](https://docs.astral.sh/ty/) included in a single binary to run typechecking
-* Be snapshotted to bytes at external function calls, meaning you can store the interpreter state in a file or database, and resume later
-* Startup extremely fast (<1μs to go from code to execution result), and has runtime performance that is similar to CPython (generally between 5x faster and 5x slower)
-* Be called from Rust, Python, or Javascript - because Monty has no dependencies on cpython, you can use it anywhere you can run Rust
-* Control resource usage - Monty can track memory usage, allocations, stack depth, and execution time and cancel execution if it exceeds preset limits
-* Collect stdout and stderr and return it to the caller
-* Run async or sync code on the host via async or sync code on the host
+
+- Run a reasonable subset of Python code - enough for your agent to express what it wants to do
+- Completely block access to the host environment: filesystem, env variables and network access are all implemented via external function calls the developer can control
+- Call functions on the host - only functions you give it access to
+- Run typechecking - monty supports full modern python type hints and comes with [ty](https://docs.astral.sh/ty/) included in a single binary to run typechecking
+- Be snapshotted to bytes at external function calls, meaning you can store the interpreter state in a file or database, and resume later
+- Startup extremely fast (<1μs to go from code to execution result), and has runtime performance that is similar to CPython (generally between 5x faster and 5x slower)
+- Be called from Rust, Python, or Javascript - because Monty has no dependencies on cpython, you can use it anywhere you can run Rust
+- Control resource usage - Monty can track memory usage, allocations, stack depth, and execution time and cancel execution if it exceeds preset limits
+- Collect stdout and stderr and return it to the caller
+- Run async or sync code on the host via async or sync code on the host

 What Monty **cannot** do:
-* Use the standard library (except a few select modules: `sys`, `typing`, `asyncio`, `dataclasses` (soon), `json` (soon))
-* Use third party libraries (like Pydantic), support for external python library is not a goal
-* define classes (support should come soon)
-* use match statements (again, support should come soon)
+
+- Use the standard library (except a few select modules: `sys`, `typing`, `asyncio`, `dataclasses` (soon), `json` (soon))
+- Use third party libraries (like Pydantic), support for external python library is not a goal
+- define classes (support should come soon)
+- use match statements (again, support should come soon)

 ---

@@ -49,10 +51,11 @@ In short, Monty is extremely limited and designed for **one** use case:
 **To run code written by agents.**

 For motivation on why you might want to do this, see:
-* [Codemode](https://blog.cloudflare.com/code-mode/) from Cloudflare
-* [Programmatic Tool Calling](https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling) from Anthropic
-* [Code Execution with MCP](https://www.anthropic.com/engineering/code-execution-with-mcp) from Anthropic
-* [Smol Agents](https://github.com/huggingface/smolagents) from Hugging Face
+
+- [Codemode](https://blog.cloudflare.com/code-mode/) from Cloudflare
+- [Programmatic Tool Calling](https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling) from Anthropic
+- [Code Execution with MCP](https://www.anthropic.com/engineering/code-execution-with-mcp) from Anthropic
+- [Smol Agents](https://github.com/huggingface/smolagents) from Hugging Face

 In very simple terms, the idea of all the above is that LLMs can work faster, cheaper and more reliably if they're asked to write Python (or Javascript) code, instead of relying on traditional tool calling. Monty makes that possible without the complexity of a sandbox or risk of running code directly on the host.

@@ -105,7 +108,6 @@ prompt: str = ''
 m = pydantic_monty.Monty(
     code,
     inputs=['prompt'],
-    external_functions=['call_llm'],
     script_name='agent.py',
     type_check=True,
     type_check_stubs=type_definitions,
@@ -151,13 +153,13 @@ data = fetch(url)
 len(data)
 """

-m = pydantic_monty.Monty(code, inputs=['url'], external_functions=['fetch'])
+m = pydantic_monty.Monty(code, inputs=['url'])

 # Start execution - pauses when fetch() is called
 result = m.start(inputs={'url': 'https://example.com'})

 print(type(result))
-#> <class 'pydantic_monty.MontySnapshot'>
+#> <class 'pydantic_monty.FunctionSnapshot'>
 print(result.function_name)  # fetch
 #> fetch
 print(result.args)
@@ -174,7 +176,7 @@ print(result.output)

 #### Serialization

-Both `Monty` and `MontySnapshot` can be serialized to bytes and restored later.
+Both `Monty` and snapshot types like `FunctionSnapshot` can be serialized to bytes and restored later.
 This allows caching parsed code or suspending execution across process boundaries:

 ```python
@@ -190,12 +192,12 @@ print(m2.run(inputs={'x': 41}))
 #> 42

 # Serialize execution state mid-flight
-m = pydantic_monty.Monty('fetch(url)', inputs=['url'], external_functions=['fetch'])
+m = pydantic_monty.Monty('fetch(url)', inputs=['url'])
 progress = m.start(inputs={'url': 'https://example.com'})
 state = progress.dump()

 # Later, restore and resume (e.g., in a different process)
-progress2 = pydantic_monty.MontySnapshot.load(state)
+progress2 = pydantic_monty.FunctionSnapshot.load(state)
 result = progress2.resume(return_value='response data')
 print(result.output)
 #> response data
@@ -215,7 +217,7 @@ def fib(n):
 fib(x)
 "#;

-let runner = MontyRun::new(code.to_owned(), "fib.py", vec!["x".to_owned()], vec![]).unwrap();
+let runner = MontyRun::new(code.to_owned(), "fib.py", vec!["x".to_owned()]).unwrap();
 let result = runner.run(vec![MontyObject::Int(10)], NoLimitTracker, &mut PrintWriter::Stdout).unwrap();
 assert_eq!(result, MontyObject::Int(55));
 ```
@@ -228,7 +230,7 @@ assert_eq!(result, MontyObject::Int(55));
 use monty::{MontyRun, MontyObject, NoLimitTracker, PrintWriter};

 // Serialize parsed code
-let runner = MontyRun::new("x + 1".to_owned(), "main.py", vec!["x".to_owned()], vec![]).unwrap();
+let runner = MontyRun::new("x + 1".to_owned(), "main.py", vec!["x".to_owned()]).unwrap();
 let bytes = runner.dump().unwrap();

 // Later, restore and run
@@ -337,15 +339,15 @@ I'll try to run through the most obvious alternatives, and why there aren't righ

 NOTE: all these technologies are impressive and have widespread uses, this commentary on their limitations for our use case should not be seen as a criticism. Most of these solutions were not conceived with the goal of providing an LLM sandbox, which is why they're not necessary great at it.

-| Tech | Language completeness | Security | Start latency | FOSS | Setup complexity | File mounting | Snapshotting |
-|--------------------|-----------------------|--------------|----------------|------------|------------------|----------------|--------------|
-| Monty | partial | strict | 0.06ms | free / OSS | easy | easy | easy |
-| Docker | full | good | 195ms | free / OSS | intermediate | easy | intermediate |
-| Pyodide | full | poor | 2800ms | free / OSS | intermediate | easy | hard |
-| starlark-rust | very limited | good | 1.7ms | free / OSS | easy | not available? | impossible? |
-| WASI / Wasmer | partial, almost full | strict | 66ms | free * | intermediate | easy | intermediate |
-| sandboxing service | full | strict | 1033ms | not free | intermediate | hard | intermediate |
-| YOLO Python | full | non-existent | 0.1ms / 30ms | free / OSS | easy | easy / scary | hard |
+| Tech | Language completeness | Security | Start latency | FOSS | Setup complexity | File mounting | Snapshotting |
+| ------------------ | --------------------- | ------------ | ------------- | ---------- | ---------------- | -------------- | ------------ |
+| Monty | partial | strict | 0.06ms | free / OSS | easy | easy | easy |
+| Docker | full | good | 195ms | free / OSS | intermediate | easy | intermediate |
+| Pyodide | full | poor | 2800ms | free / OSS | intermediate | easy | hard |
+| starlark-rust | very limited | good | 1.7ms | free / OSS | easy | not available? | impossible? |
+| WASI / Wasmer | partial, almost full | strict | 66ms | free \* | intermediate | easy | intermediate |
+| sandboxing service | full | strict | 1033ms | not free | intermediate | hard | intermediate |
+| YOLO Python | full | non-existent | 0.1ms / 30ms | free / OSS | easy | easy / scary | hard |

 See [./scripts/startup_performance.py](scripts/startup_performance.py) for the script used to calculate the startup performance numbers.

@@ -397,7 +399,7 @@ Running Python in WebAssembly via [Wasmer](https://wasmer.io/).
 - **Security**: In principle WebAssembly should provide strong sandboxing guarantees.
 - **Start latency**: The [wasmer](https://pypi.org/project/wasmer/) python package hasn't been updated for 3 years and I couldn't find docs on calling Python in wasmer from Python, so I called it via subprocess. Start latency was 66ms.
 - **Setup complexity**: wasmer download is 100mb, the "python/python" package is 50mb.
-- **FOSS**: I marked this as "free *" since the cost is zero but not everything seems to be open source. As of 2026-02-10 the [`python/python` wasmer package](https://wasmer.io/python/python) package has no readme, no license, no source link and no indication of how it's built, the recently uploaded versions show size as "0B" although the download is ~50MB - the build process for the Python binary is not clear and transparent. _(If I'm wrong here, please create an issue to correct correct me)_
+- **FOSS**: I marked this as "free \*" since the cost is zero but not everything seems to be open source. As of 2026-02-10 the [`python/python` wasmer package](https://wasmer.io/python/python) package has no readme, no license, no source link and no indication of how it's built, the recently uploaded versions show size as "0B" although the download is ~50MB - the build process for the Python binary is not clear and transparent. _(If I'm wrong here, please create an issue to correct correct me)_
 - **File mounting**: Supported
 - **Snapshotting**: Supported via journaling
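
As a rough illustration of the start, dump, load and resume flow that the README hunks above describe (pause at `fetch()`, serialize, restore elsewhere, resume with a return value), here is a minimal sketch in plain Python. It does not use `pydantic_monty`: `ToySnapshot`, `start` and the `pending_op` field are hypothetical names invented for this example, and JSON stands in for whatever byte format Monty actually uses.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, List


@dataclass
class ToySnapshot:
    """Hypothetical stand-in for execution paused at a host function call."""

    function_name: str  # the host function the sandboxed code tried to call
    args: List[Any]     # positional arguments it passed
    pending_op: str     # what the paused code does with the return value

    def dump(self) -> bytes:
        # Assumption: JSON is used here only to keep the sketch self-contained.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def load(cls, data: bytes) -> "ToySnapshot":
        return cls(**json.loads(data.decode("utf-8")))

    def resume(self, return_value: Any) -> Any:
        # Finish the paused program using the value supplied by the host.
        if self.pending_op == "len":
            return len(return_value)
        raise ValueError(f"unknown pending operation: {self.pending_op!r}")


def start(url: str) -> ToySnapshot:
    """Pretend to run `data = fetch(url); len(data)` up to the fetch() call."""
    return ToySnapshot(function_name="fetch", args=[url], pending_op="len")


# Pause, serialize, restore (possibly in another process), then resume.
state = start("https://example.com").dump()
restored = ToySnapshot.load(state)
print(restored.function_name)            # fetch
print(restored.resume("response data"))  # 13
```

The point of the sketch is only that the paused state is plain data, so the host can keep it in a file or database between the external call and the resume.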

crates/fuzz/fuzz_targets/string_input_panic.rs

Lines changed: 0 additions & 1 deletion
@@ -28,7 +28,6 @@ fuzz_target!(|code: String| {
         code.to_owned(),
         "fuzz.py",
         vec![], // no inputs
-        vec![], // no external functions
     ) else {
         return; // Parse errors are expected for random input
     };

crates/fuzz/fuzz_targets/tokens_input_panic.rs

Lines changed: 1 addition & 1 deletion
@@ -543,7 +543,7 @@ fuzz_target!(|tokens: Tokens| {
     let code = tokens.to_code();

     // Try to parse the code
-    let Ok(runner) = MontyRun::new(code, "fuzz.py", vec![], vec![]) else {
+    let Ok(runner) = MontyRun::new(code, "fuzz.py", vec![]) else {
         return; // Parse errors are expected
     };
549549

crates/monty-cli/src/main.rs

Lines changed: 34 additions & 32 deletions
@@ -6,8 +6,8 @@ use std::{

 use clap::Parser;
 use monty::{
-    LimitedTracker, MontyObject, MontyRepl, MontyRun, NoLimitTracker, PrintWriter, ReplContinuationMode,
-    ResourceLimits, ResourceTracker, RunProgress, detect_repl_continuation_mode,
+    LimitedTracker, MontyObject, MontyRepl, MontyRun, NameLookupResult, NoLimitTracker, PrintWriter,
+    ReplContinuationMode, ResourceLimits, ResourceTracker, RunProgress, detect_repl_continuation_mode,
 };
 use rustyline::{DefaultEditor, error::ReadlineError};
 // disabled due to format failing on https://github.com/pydantic/monty/pull/75 where CI and local wanted imports ordered differently
@@ -201,9 +201,8 @@ fn run_script(file_path: &str, code: String, type_check_enabled: bool, tracker:

     let input_names = vec![];
     let inputs = vec![];
-    let ext_functions = vec!["add_ints".to_owned()];

-    let runner = match MontyRun::new(code, file_path, input_names, ext_functions) {
+    let runner = match MontyRun::new(code, file_path, input_names) {
         Ok(ex) => ex,
         Err(err) => {
             eprintln!("{BOLD_RED}error{RESET}:\n{err}");
@@ -278,23 +277,15 @@ fn run_script(file_path: &str, code: String, type_check_enabled: bool, tracker:
 fn run_repl(file_path: &str, code: String, tracker: impl ResourceTracker) -> ExitCode {
     let input_names = vec![];
     let inputs = vec![];
-    let ext_functions = vec!["add_ints".to_owned()];
-
-    let (mut repl, init_output) = match MontyRepl::new(
-        code,
-        file_path,
-        input_names,
-        ext_functions,
-        inputs,
-        tracker,
-        &mut PrintWriter::Stdout,
-    ) {
-        Ok(v) => v,
-        Err(err) => {
-            eprintln!("{BOLD_RED}error{RESET} initializing repl:\n{err}");
-            return ExitCode::FAILURE;
-        }
-    };
+
+    let (mut repl, init_output) =
+        match MontyRepl::new(code, file_path, input_names, inputs, tracker, &mut PrintWriter::Stdout) {
+            Ok(v) => v,
+            Err(err) => {
+                eprintln!("{BOLD_RED}error{RESET} initializing repl:\n{err}");
+                return ExitCode::FAILURE;
+            }
+        };

     if init_output != MontyObject::None {
         println!("{init_output}");
@@ -401,15 +392,10 @@ fn run_until_complete(mut progress: RunProgress<impl ResourceTracker>) -> Result
     loop {
         match progress {
             RunProgress::Complete(value) => return Ok(value),
-            RunProgress::FunctionCall {
-                function_name,
-                args,
-                state,
-                ..
-            } => {
-                let return_value = resolve_external_call(&function_name, &args)?;
-                progress = state
-                    .run(return_value, &mut PrintWriter::Stdout)
+            RunProgress::FunctionCall(call) => {
+                let return_value = resolve_external_call(&call.function_name, &call.args)?;
+                progress = call
+                    .resume(return_value, &mut PrintWriter::Stdout)
                     .map_err(|err| format!("{err}"))?;
             }
             RunProgress::ResolveFutures(state) => {
@@ -418,8 +404,24 @@ fn run_until_complete(mut progress: RunProgress<impl ResourceTracker>) -> Result
                     state.pending_call_ids()
                 ));
             }
-            RunProgress::OsCall { function, args, .. } => {
-                return Err(format!("OS calls not supported in CLI: {function:?}({args:?})"));
+            RunProgress::NameLookup(lookup) => {
+                let result = if lookup.name == "add_ints" {
+                    NameLookupResult::Value(MontyObject::Function {
+                        name: "add_ints".to_string(),
+                        docstring: None,
+                    })
+                } else {
+                    NameLookupResult::Undefined
+                };
+                progress = lookup
+                    .resume(result, &mut PrintWriter::Stdout)
+                    .map_err(|err| format!("{err}"))?;
+            }
+            RunProgress::OsCall(call) => {
+                return Err(format!(
+                    "OS calls not supported in CLI: {:?}({:?})",
+                    call.function, call.args
+                ));
             }
         }
     }
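
The `run_until_complete` hunk above suggests how host functions are discovered after this change: rather than being declared up front, an unknown name pauses execution with a `NameLookup`, and the host either supplies a function value or reports the name as undefined. The following is a sketch of that handshake in plain Python, with a generator standing in for the interpreter. Every name here (`toy_program`, `UNDEFINED`, the three dataclasses) is hypothetical and only loosely modelled on the Rust variants in the diff; it is not Monty's implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generator, Tuple

UNDEFINED = object()  # sentinel the host sends back for an unknown name


@dataclass
class NameLookup:
    name: str


@dataclass
class FunctionCall:
    function_name: str
    args: Tuple[Any, ...]


@dataclass
class Complete:
    value: Any


def toy_program() -> Generator[Any, Any, None]:
    """Stands in for sandboxed code equivalent to `add_ints(1, 2)`."""
    resolved = yield NameLookup("add_ints")  # pause: ask the host about the name
    if resolved is UNDEFINED:
        raise NameError("name 'add_ints' is not defined")
    result = yield FunctionCall("add_ints", (1, 2))  # pause: host runs the call
    yield Complete(result)


def run_until_complete(
    program: Generator[Any, Any, None],
    host_functions: Dict[str, Callable[..., Any]],
) -> Any:
    """Drive the program, answering each pause until it completes."""
    progress = next(program)
    while True:
        if isinstance(progress, Complete):
            return progress.value
        if isinstance(progress, NameLookup):
            known = progress.name in host_functions
            progress = program.send(progress.name if known else UNDEFINED)
        elif isinstance(progress, FunctionCall):
            func = host_functions[progress.function_name]
            progress = program.send(func(*progress.args))
        else:
            raise TypeError(f"unexpected progress value: {progress!r}")


print(run_until_complete(toy_program(), {"add_ints": lambda a, b: a + b}))  # 3
```

In this sketch, passing an empty `host_functions` dict makes the lookup come back undefined, so the toy program raises `NameError`, which mirrors the `NameLookupResult::Undefined` branch in the CLI code.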

crates/monty-js/README.md

Lines changed: 2 additions & 5 deletions
@@ -30,7 +30,7 @@ const result = m.run({ inputs: { x: 10, y: 20 } }) // returns 30
 For synchronous external functions, pass them directly to `run()`:

 ```ts
-const m = new Monty('add(2, 3)', { externalFunctions: ['add'] })
+const m = new Monty('add(2, 3)')

 const result = m.run({
   externalFunctions: {
@@ -46,7 +46,6 @@ import { Monty, runMontyAsync } from '@pydantic/monty'

 const m = new Monty('fetch_data(url)', {
   inputs: ['url'],
-  externalFunctions: ['fetch_data'],
 })

 const result = await runMontyAsync(m, {
@@ -65,7 +64,7 @@ const result = await runMontyAsync(m, {
 For fine-grained control over external function calls, use `start()` and `resume()`:

 ```ts
-const m = new Monty('a() + b()', { externalFunctions: ['a', 'b'] })
+const m = new Monty('a() + b()')

 let progress = m.start()
 while (progress instanceof MontySnapshot) {
@@ -161,13 +160,11 @@ if (snapshot instanceof MontySnapshot) {
 - `Monty.load(data)` - Deserialize from binary format
 - `scriptName` - The script name (default: `'main.py'`)
 - `inputs` - Declared input variable names
-- `externalFunctions` - Declared external function names

 ### `MontyOptions`

 - `scriptName?: string` - Name used in tracebacks (default: `'main.py'`)
 - `inputs?: string[]` - Input variable names
-- `externalFunctions?: string[]` - External function names
 - `typeCheck?: boolean` - Enable type checking on construction
 - `typeCheckPrefixCode?: string` - Code to prepend for type checking
