Delphyne is a programming framework for building reliable and modular LLM applications. It offers a powerful approach for integrating traditional programming and prompting.
Users can describe high-level problem solving strategies as arbitrary programs with unresolved choice points. Policies for navigating the resulting search spaces with LLMs can be independently assembled and tuned. Examples of correct choices can be expressed in a dedicated demonstration language that supports interactive, test-driven development.
Let us illustrate Delphyne with a complete example. Consider the task of finding a parameter value that makes a mathematical expression nonnegative for all values of x. For example, given the expression x² - 2x + n, n = -1 is an incorrect answer (take x = 0), but n = 1 is a correct answer since x² - 2x + 1 = (x - 1)². Here is a Delphyne strategy for solving this problem:
import sympy as sp
import delphyne as dp
from delphyne import Branch, Fail, IPDict, Strategy, strategy

@strategy
def find_param_value(expr: str) -> Strategy[Branch | Fail, IPDict, int]:
    """
    Find an integer `n` that makes a given math expression nonnegative
    for all real `x`. Prove that the resulting expression is nonnegative
    by rewriting it into an equivalent form.
    """
    x, n = sp.Symbol("x", real=True), sp.Symbol("n")
    symbs = {"x": x, "n": n}
    try:
        # First choice point: guess a value for the parameter n.
        n_val = yield from dp.guess(int, using=[expr])
        expr_sp = sp.parse_expr(expr, symbs).subs({n: n_val})
        # Second choice point: guess an equivalent, evidently nonnegative form.
        equiv = yield from dp.guess(str, using=[str(expr_sp)])
        equiv_sp = sp.parse_expr(equiv, symbs)
        # Check both guesses symbolically with sympy.
        equivalent = (expr_sp - equiv_sp).simplify() == 0
        yield from dp.ensure(equivalent, "not_equivalent")
        yield from dp.ensure(equiv_sp.is_nonnegative, "not_nonneg")
        return n_val
    except Exception as e:
        # Parsing or simplification errors are turned into a failure node.
        yield from dp.fail("sympy_error", message=str(e))
A strategy is a program with unresolved choice points, represented here by the guess operator. At runtime, LLM oracles are tasked with producing return values for guess, in such a way as to pass all ensure assertions. Such a program induces a search tree that can be explored in many different ways, as specified by a separate policy:
def serial_policy():
    model = dp.standard_model("gpt-5-mini")
    return dp.dfs() & {
        "n_val": dp.few_shot(model),
        "equiv": dp.take(2) @ dp.few_shot(model)}
This policy uses sequential depth-first search (dfs), making at most two proof attempts for every parameter guess and sampling choices using gpt-5-mini. Here is another policy that uses a parallel variant of dfs, where multiple completions are repeatedly sampled and the resulting branches explored in parallel:
def parallel_policy():
    model = dp.standard_model("gpt-5-mini")
    return dp.loop() @ dp.par_dfs() & {
        "n_val": dp.few_shot(model, max_requests=1, num_completions=3),
        "equiv": dp.few_shot(model, max_requests=1, num_completions=2)}
Given an associated policy and a budget limit, our strategy can now be executed:
budget = dp.BudgetLimit({dp.NUM_REQUESTS: 8, dp.DOLLAR_PRICE: 1e-3})
res, _ = (find_param_value("2*x**2 - 4*x + n")
    .run_toplevel(dp.PolicyEnv(), serial_policy())
    .collect(budget=budget, num_generated=1))
print(res[0].tracked.value)  # e.g. 2
Here, no demonstrations are provided, so choice points are resolved via zero-shot prompting. To increase reliability, one can provide examples of correct decisions by adding demonstrations. Demonstrations can be expressed in a dedicated language that supports an interactive, test-driven development workflow. In the screenshot below, the VS Code extension indicates the next unresolved choice point for solving a particular problem instance, which the user can answer before receiving new feedback:
A complete explanation of this example is provided in the Delphyne Manual.
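The answers quoted in this example can be cross-checked by hand without any LLM call. The sketch below is not part of Delphyne: it relies only on the standard fact that a quadratic a·x² + b·x + c is nonnegative for all real x exactly when a > 0 and b² - 4ac ≤ 0 (or when it degenerates to a nonnegative constant). The helper names `is_nonneg_quadratic` and `smallest_valid_n` are hypothetical and introduced purely for illustration.

```python
from typing import Iterable, Optional


def is_nonneg_quadratic(a: int, b: int, c: int) -> bool:
    """Return True iff a*x**2 + b*x + c >= 0 for every real x."""
    if a == 0:
        # A non-constant linear function takes negative values somewhere.
        return b == 0 and c >= 0
    # Upward-opening parabola with at most one real root.
    return a > 0 and b * b - 4 * a * c <= 0


def smallest_valid_n(a: int, b: int, candidates: Iterable[int]) -> Optional[int]:
    """First n among the candidates making a*x**2 + b*x + n nonnegative."""
    return next((n for n in candidates if is_nonneg_quadratic(a, b, n)), None)


# x**2 - 2*x + n: n = -1 is rejected, n = 1 is accepted.
print(is_nonneg_quadratic(1, -2, -1))  # False
print(is_nonneg_quadratic(1, -2, 1))   # True

# 2*x**2 - 4*x + n: the smallest valid integer in [-5, 5] is 2.
print(smallest_valid_n(2, -4, range(-5, 6)))  # 2
```

This shortcut only covers quadratics. The strategy above instead asks the oracle for an equivalent form and lets sympy verify it, which is not tied to a particular family of expressions.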
- Separate your business logic from your search logic. Prompting offers a powerful but unreliable programming primitive, making validation and search essential. At the same time, retries and backtracking are costly. Resolving this tension requires careful tuning. Delphyne enforces a strict separation between business logic (i.e. strategies) and search logic (i.e. policies), allowing users to iterate on the latter without any refactoring.
- Write and repair demonstrations interactively. Few-shot examples are crucial components of an LLM pipeline, but writing them and keeping them consistent as the pipeline evolves can be error-prone and time-consuming. Delphyne's dedicated demonstration language allows bundling examples with unit tests and updating them interactively.
- Refactor without fear. Delphyne leverages static types to enforce consistency between strategies and policies, and a testing DSL to enforce consistency between strategies and demonstrations. Modifying policies is guaranteed to never break demonstrations.
- Compose heterogeneous strategies in a modular fashion. Strategies of different kinds can be combined while their associated policies remain independent. This is enabled by having arbitrary search algorithms cooperate through a shared protocol. In addition, any prompt can be locally and transparently refined into a dedicated strategy.
- Leverage advanced search algorithms. Thanks to an extensible effect system, strategies can produce search trees with arbitrary metadata, structure, and shape, which advanced algorithms (e.g. HyperTree Proof Search) can exploit. Tree transformers enable conversions between tree types, allowing the same strategy to pair with vastly different algorithms.
- Precisely control your resource spending. Delphyne offers a resource-aware, combinator-based language for defining search policies that prevents overspending by construction.
- Leverage the full Python ecosystem. Delphyne integrates seamlessly with Python libraries and tooling. It is fully typed and works smoothly with the Python debugger.
- Access a rich library of components. Delphyne comes with batteries included: its standard library supports many models and prompting idioms, including structured output, tool use, classification, assistant priming, and chain of thought.
- Run reproducible experiments. Delphyne provides utilities for benchmarking and tuning LLM pipelines. Request caching allows experiments to be replicated at no cost.
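To make the separation between strategies and policies from the list above more concrete, here is a toy model in plain Python. It is a minimal sketch under simplifying assumptions and does not reflect Delphyne's actual implementation or API: the strategy is a generator that yields the name of each choice point, the "oracle" is a stub function returning candidate answers (standing in for an LLM), and the search is a depth-first traversal that replays the strategy from scratch for each prefix of choices. All names (`Failure`, `find_param`, `replay`, `dfs`, `narrow_oracle`, `wide_oracle`) are hypothetical.

```python
from typing import Callable, Generator, Iterable, Iterator, Optional, Tuple

Choices = Tuple[int, ...]
Oracle = Callable[[str, Choices], Iterable[int]]


class Failure(Exception):
    """Raised by the toy strategy when a check does not hold."""


def find_param() -> Generator[str, int, int]:
    """Toy strategy for x**2 - 2*x + n: guess n, then a shift h such that
    the expression equals (x - h)**2 + (n - h**2) with n - h**2 >= 0."""
    n = yield "n_val"  # first unresolved choice point
    h = yield "equiv"  # second unresolved choice point
    if h != 1:  # matching the coefficient of x requires -2*h == -2
        raise Failure("not_equivalent")
    if n - h * h < 0:  # the leftover constant must be nonnegative
        raise Failure("not_nonneg")
    return n


def replay(strategy: Callable[[], Generator[str, int, int]],
           choices: Choices) -> Tuple[str, Optional[object]]:
    """Re-run the strategy on a prefix of choices and report what happens next."""
    gen = strategy()
    try:
        name = next(gen)
        for choice in choices:
            name = gen.send(choice)
        return ("ask", name)  # another choice point is pending
    except StopIteration as stop:
        return ("done", stop.value)  # the strategy returned a result
    except Failure:
        return ("fail", None)  # a check failed on this branch


def dfs(strategy: Callable[[], Generator[str, int, int]],
        oracle: Oracle, choices: Choices = ()) -> Iterator[int]:
    """Depth-first search over the tree induced by the strategy."""
    status, payload = replay(strategy, choices)
    if status == "done":
        yield payload
    elif status == "ask":
        for candidate in oracle(payload, choices):
            yield from dfs(strategy, oracle, choices + (candidate,))


# Two interchangeable "policies": the strategy itself is left untouched.
def narrow_oracle(name: str, prefix: Choices) -> Iterable[int]:
    return {"n_val": [-1, 1], "equiv": [0, 1]}[name]


def wide_oracle(name: str, prefix: Choices) -> Iterable[int]:
    return {"n_val": [3, 2, 1], "equiv": [1]}[name]


print(list(dfs(find_param, narrow_oracle)))  # [1]
print(list(dfs(find_param, wide_oracle)))    # [3, 2, 1]
```

Swapping `narrow_oracle` for `wide_oracle` changes which branches are explored and in what order, while `find_param` stays the same. Delphyne's own policies additionally handle budgets, prompting, and typed composition, none of which this toy attempts to model.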
First, download the Delphyne repository and enter it:
git clone git@github.com:jonathan-laurent/delphyne.git
cd delphyne
git checkout v0.11.1 # latest stable version
Then, to install the Delphyne library and CLI in your current Python environment:
pip install -e ".[dev]"
Note that Python 3.12 (or more recent) is required, since Delphyne makes heavy use of PEP 695 generics. Next, build the Delphyne VS Code extension. Assuming Node.js (version 22 or later) is installed, run:
cd vscode-ui
npm install
npx vsce package
The last command should create a delphyne-xxx.vsix extension archive, which can be installed in VS Code using the Extensions: Install from VSIX command (use Ctrl+Shift+P to search for it in the command palette).
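If the installation fails with syntax errors, the Python 3.12 requirement noted above is a likely cause. The following sketch is not part of Delphyne: it checks whether the current interpreter accepts PEP 695 generic syntax by compiling a small snippet held in a string, so the check itself also runs on older interpreters. The names `PEP695_SNIPPET` and `supports_pep695` are hypothetical.

```python
import sys

# A function using PEP 695 type-parameter syntax, kept as a string so that
# this file still parses on interpreters older than Python 3.12.
PEP695_SNIPPET = "def first[T](items: list[T]) -> T:\n    return items[0]\n"


def supports_pep695() -> bool:
    """Return True if the running interpreter can compile PEP 695 generics."""
    try:
        compile(PEP695_SNIPPET, "<pep695-check>", "exec")
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    version = ".".join(str(part) for part in sys.version_info[:3])
    status = "supported" if supports_pep695() else "NOT supported"
    print(f"Python {version}: PEP 695 generics {status}")
```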
You can verify your installation by running the short test suite:
make test
To test the Delphyne extension, we recommend reading the corresponding manual chapter, opening the examples/find_invariants/abduct_and_branch.demo.yaml demonstration file, and then executing the Delphyne: Evaluate All Demonstrations in File command (accessible from the palette via Ctrl+Shift+P). Diagnostics should appear to indicate that all tests passed.
- The key concepts underlying Delphyne are presented in the Manual.
- A complete API Reference is also available, along with a series of How-To Guides.
- The examples directory provides a gallery of Delphyne programs (see README files).
- For a more technical presentation, see the original paper that introduces Delphyne: Oracular Programming: A Modular Foundation for Building LLM-Enabled Software.
If you use Delphyne in an academic paper, please cite our work as follows:
@article{oracular-programming-2025,
title={Oracular Programming: A Modular Foundation for Building LLM-Enabled Software},
author={Laurent, Jonathan and Platzer, Andr{\'e}},
journal={arXiv preprint arXiv:2502.05310},
year={2025}
}
This work was supported by the Alexander von Humboldt Professorship program.