GitHub - jonathan-laurent/delphyne: A framework for building reliable and modular LLM-enabled applications.

Delphyne is a programming framework for building reliable and modular LLM applications. It offers a powerful approach for integrating traditional programming and prompting.

Users can describe high-level problem solving strategies as arbitrary programs with unresolved choice points. Policies for navigating the resulting search spaces with LLMs can be independently assembled and tuned. Examples of correct choices can be expressed in a dedicated demonstration language that supports interactive, test-driven development.

Quick Example

Let us illustrate Delphyne with a complete example. Consider the task of finding a parameter value that makes a mathematical expression nonnegative for all values of x. For example, given expression x² - 2x + n, n = -1 is an incorrect answer (take x = 0), but n = 1 is a correct answer since x² - 2x + 1 = (x - 1)². Here is a Delphyne strategy for solving this problem:

import sympy as sp
import delphyne as dp 
from delphyne import Branch, Fail, Strategy, strategy

@strategy
def find_param_value(expr: str) -> Strategy[Branch | Fail, IPDict, int]:
    """
    Find an integer `n` that makes a given math expression nonnegative
    for all real `x`. Prove that the resulting expression is nonnegative
    by rewriting it into an equivalent form.
    """
    x, n = sp.Symbol("x", real=True), sp.Symbol("n")
    symbs = {"x": x, "n": n}
    try:
        n_val = yield from dp.guess(int, using=[expr])
        expr_sp = sp.parse_expr(expr, symbs).subs({n: n_val})
        equiv = yield from dp.guess(str, using=[str(expr_sp)])
        equiv_sp = sp.parse_expr(equiv, symbs)
        equivalent = (expr_sp - equiv_sp).simplify() == 0
        yield from dp.ensure(equivalent, "not_equivalent")
        yield from dp.ensure(equiv_sp.is_nonnegative, "not_nonneg")
        return n_val
    except Exception as e:
        yield from dp.fail("sympy_error", message=str(e))

A strategy is a program with unresolved choice points, represented here by the guess operator. At runtime, LLM oracles are tasked with producing return values for guess, in such a way as to pass all ensure assertions. Such a program induces a search tree that can be explored in many different ways, as specified by a separate policy:

def serial_policy():
    model = dp.standard_model("gpt-5-mini")
    return dp.dfs() & {
        "n_val": dp.few_shot(model),
        "equiv": dp.take(2) @ dp.few_shot(model)}

This policy uses sequential depth-first search (dfs), making at most two proof attempts for every parameter guess and sampling choices using gpt-5-mini. Here is another policy that uses a parallel variant of dfs, where multiple completions are repeatedly sampled and the resulting branches explored in parallel:

def parallel_policy():
    model = dp.standard_model("gpt-5-mini")
    return dp.loop() @ dp.par_dfs() & {
        "n_val": dp.few_shot(model, max_requests=1, num_completions=3),
        "equiv": dp.few_shot(model, max_requests=1, num_completions=2)}

Given an associated policy and a budget limit, our strategy can now be executed:

budget = dp.BudgetLimit({dp.NUM_REQUESTS: 8, dp.DOLLAR_PRICE: 1e-3})
res, _ = (find_param_value("2*x**2 - 4*x + n")
          .run_toplevel(dp.PolicyEnv(), serial_policy())
          .collect(budget=budget, num_generated=1))
print(res[0].tracked.value)  # e.g. 2

Here, choice points are resolved via zero-shot prompting. To increase reliability, one can provide examples of correct decisions by adding demonstrations. Demonstrations can be expressed in a dedicated language that supports an interactive, test-driven development workflow. In the screenshot below, the VSCode extension indicates the next unresolved choice point for solving a particular problem instance, which the user can answer before receiving new feedback:

A complete explanation of this example is provided in the Delphyne Manual.

Features Overview

Separate your business logic from your search logic. Prompting offers a powerful but unreliable programming primitive, making validation and search essential. At the same time, retries and backtracking are costly. Resolving this tension requires careful tuning. Delphyne enforces a strict separation between business logic (i.e. strategies) and search logic (i.e. policies), allowing users to iterate on the latter without any refactoring.
Write and repair demonstrations interactively. Few-shot examples are crucial components of an LLM pipeline, but writing them and keeping them consistent as the pipeline evolves can be error-prone and time-consuming. Delphyne's dedicated demonstration language allows bundling examples with unit tests and updating them interactively.
Refactor without fear. Delphyne leverages static types to enforce consistency between strategies and policies, and a testing DSL to enforce consistency between strategies and demonstrations. Modifying policies is guaranteed to never break demonstrations.
Compose heterogeneous strategies in a modular fashion. Delphyne allows modularly composing heterogeneous strategies while keeping the associated policies independent. This is enabled by having arbitrary search algorithms cooperate through a shared protocol. In addition, any prompt can be locally and transparently refined into a dedicated strategy.
Leverage advanced search algorithms. Thanks to an extensible effect system, strategies can produce search trees with arbitrary metadata, structure, and shape, which advanced algorithms (e.g. HyperTree Proof Search) can exploit. Tree transformers enable conversions between tree types, allowing the same strategy to pair with vastly different algorithms.
Precisely control your resource spending. Delphyne offers a resource-aware, combinator-based language for defining search policies that prevents overspending by construction.
Leverage the full Python ecosystem. Delphyne integrates seamlessly with Python libraries and tooling. It is fully typed and works smoothly with the Python debugger.
Access a rich library of components. Delphyne comes with batteries included: its standard library supports many models and prompting idioms, including structured output, tool use, classification, assistant priming, and chain of thought.
Run reproducible experiments. Delphyne provides utilities for benchmarking and tuning LLM pipelines. Request caching allows experiments to be replicated at no cost.

Installation

First, download the Delphyne repository and enter it:

git clone [email protected]:jonathan-laurent/delphyne.git
cd delphyne
git checkout v0.11.1  # latest stable version

Then, to install the Delphyne library and CLI in your current Python environment:

pip install -e ".[dev]"

Note that Python 3.12 (or more recent) is required, since Delphyne makes heavy use of PEP 695 generics. Next, you should build the Delphyne vscode extension. For this, assuming you have Node.js installed (version 22 or later), run:

cd vscode-ui
npm install
npx vsce package

The last command should create a delphyne-xxx.vsix extensions archive, which can be installed in vscode using the Extensions: Install from VSIX command (use Ctrl+Shift+P to search for it in the command palette).

Testing your installation

You can verify your installation by running the short test suite:

make test

To test the Delphyne extension, we recommend reading the corresponding manual chapter, opening the examples/find_invariants/abduct_and_branch.demo.yaml demonstration file, and then executing the Delphyne: Evaluate All Demonstrations in File command (accessible from the palette via Ctrl+Shift+P). Diagnostics should appear to indicate that all tests passed.

Learning More

The key concepts underlying Delphyne are presented in the Manual.
A complete API Reference is also available, along with a series of How-To Guides.
The examples directory provides a gallery of Delphyne programs (see README files).
For a more technical presentation, see the original paper that introduces Delphyne: Oracular Programming: A Modular Foundation for Building LLM-Enabled Software.

Citing this Work

If you use Delphyne in an academic paper, please cite our work as follows:

@article{oracular-programming-2025,
  title={Oracular Programming: A Modular Foundation for Building LLM-Enabled Software},
  author={Laurent, Jonathan and Platzer, Andr{\'e}},
  journal={arXiv preprint arXiv:2502.05310},
  year={2025}
}

Acknowledgements

This work was supported by the Alexander von Humboldt Professorship program.

Name		Name	Last commit message	Last commit date
Latest commit History 826 Commits
.github/workflows		.github/workflows
.vscode		.vscode
docs		docs
examples		examples
scripts		scripts
src/delphyne		src/delphyne
tests		tests
vscode-ui		vscode-ui
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CITATION.bib		CITATION.bib
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Quick Example

Features Overview

Installation

Testing your installation

Learning More

Citing this Work

Acknowledgements

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

jonathan-laurent/delphyne

Folders and files

Latest commit

History

Repository files navigation

Quick Example

Features Overview

Installation

Testing your installation

Learning More

Citing this Work

Acknowledgements

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Packages