CHANGELOG

Unreleased

Features

New generated_output field: DatasetRecord now has a generated_output field that stores model-generated output separately from ground truth, so a single dataset record can hold both the expected output (ground truth) and the actual model output. In the UI, this field is displayed as "Generated Output".

Example:
```
from galileo.schema.datasets import DatasetRecord

record = DatasetRecord(
    input="What is 2+2?",
    output="4",  # Ground truth
    generated_output="The answer is 4"  # Model-generated output
)
```

Ground Truth naming support: The existing output field is now displayed as "Ground Truth" in the Galileo UI for better clarity. The SDK supports both output and ground_truth field names when creating records - both are normalized to output internally, ensuring full backward compatibility. You can use either field name, and access the value via the ground_truth property.

Example:
```
from galileo.schema.datasets import DatasetRecord

# Using 'output' (backward compatible)
record1 = DatasetRecord(input="What is 2+2?", output="4")
assert record1.ground_truth == "4"  # Property accessor

# Using 'ground_truth' (new recommended way)
record2 = DatasetRecord(input="What is 2+2?", ground_truth="4")
assert record2.output == "4"  # Normalized internally
assert record2.ground_truth == "4"  # Property accessor
```
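
To make the normalization described above concrete, here is a minimal standalone sketch of how two accepted names can collapse into one stored field. SketchRecord is a hypothetical stand-in written for illustration; it is not the SDK's DatasetRecord, and the choice of which name wins when both are passed is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(init=False)
class SketchRecord:
    """Hypothetical stand-in for DatasetRecord; not the SDK's code."""

    input: str
    output: Optional[str]
    generated_output: Optional[str]

    def __init__(
        self,
        input: str,
        output: Optional[str] = None,
        ground_truth: Optional[str] = None,
        generated_output: Optional[str] = None,
    ) -> None:
        self.input = input
        # Both names collapse into the single stored field `output`.
        # Assumption of this sketch: `output` wins if both are given.
        self.output = output if output is not None else ground_truth
        self.generated_output = generated_output

    @property
    def ground_truth(self) -> Optional[str]:
        # Read-only alias for the stored `output` value.
        return self.output


record = SketchRecord(input="What is 2+2?", ground_truth="4")
assert record.output == "4"
assert record.ground_truth == "4"
```

Storing a single field and exposing the second name as a property keeps one source of truth, which is one way to stay backward compatible with code that already reads output.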

v0.10.0 (2025-05-29)

Bug Fixes

Langchain on_chain_start with_kwargs name and serialised is none (#138, 132955c)

Chores

deps: Bump langchain-core from 0.3.40 to 0.3.58 (#122, e2e6a49)

Features

Add codeflash optimizer on our repo (#151, de1897b)
Enable setting session ID (#139, e646df3)

This changes the start_session function to return the session ID, and adds a new set_session function.

langchain: Implement duration_ns metric for langchain callback async handler (#150, eed7a81)
langchain: Implement duration_ns metric for langchain callback handler (#145, 55f3cdf)
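
The session change in #139 above (start_session returning the session ID, plus a new set_session function) suggests a store-and-resume pattern. The sketch below illustrates that pattern with a hypothetical stand-in class; SketchSessionLogger, its attribute names, and the argument-free start_session signature are assumptions, not the SDK's actual API.

```python
import uuid
from typing import Optional


class SketchSessionLogger:
    """Hypothetical stand-in for a logger with session support; not the SDK's code."""

    def __init__(self) -> None:
        self.session_id: Optional[str] = None

    def start_session(self) -> str:
        # Create a new session and return its ID so the caller can store it.
        self.session_id = str(uuid.uuid4())
        return self.session_id

    def set_session(self, session_id: str) -> None:
        # Attach to an existing session instead of creating a new one.
        self.session_id = session_id


first = SketchSessionLogger()
saved_id = first.start_session()

# Later (for example, in another worker), resume the same session.
second = SketchSessionLogger()
second.set_session(saved_id)
assert second.session_id == saved_id
```

Returning the ID from start_session is what makes the resume step possible: without it, the caller would have no handle to pass to set_session.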

v0.9.2 (2025-05-22)

Bug Fixes

Creating empty dataset (#134, 3b13f52)
RuntimeWarning: coroutine AsyncMockMixin._execute_mock_call was never awaited (#135, bd76cff)
Updating Langchain handler issues (#118, 4304d16)

Chores

deps-dev: Bump mkdocs-material from 9.6.13 to 9.6.14 (#136, 71d6676)
release: V0.9.2 (12ad78c)

Automatically generated by python-semantic-release

v0.9.1 (2025-05-15)

Bug Fixes

Reverting a change from a previous context manager fix (#133, b3bd345)

Chores

deps-dev: Bump mkdocs-material from 9.6.12 to 9.6.13 (#131, f44be0c)
deps-dev: Bump pytest-asyncio from 0.25.3 to 0.26.0 (#129, 43e2af7)
release: V0.9.1 (b80a218)

Automatically generated by python-semantic-release

Continuous Integration

Run tests on all OSes (#127, d6f3d1b)

v0.9.0 (2025-05-10)

Bug Fixes

Using context var stacks for saving context with nested calls (#126, 42c1424)

Chores

deps: Bump codecov/codecov-action from 5.4.0 to 5.4.2 (#119, ba53ebf)
deps: Bump openai-agents from 0.0.7 to 0.0.12 (#123, 0150eee)
deps-dev: Bump mkdocs-material from 9.6.5 to 9.6.12 (#125, 707149a)
deps-dev: Bump mkdocstrings from 0.27.0 to 0.29.1 (#124, bdb2c47)
deps-dev: Bump pytest from 8.3.4 to 8.3.5 (#121, 7e5ab19)
release: V0.9.0 (eeb6ddf)

Automatically generated by python-semantic-release

Features

Local scorers for local runner experiments (#116, 4f7d8e0)
Session management (#120, fed6f65)

v0.8.1 (2025-04-30)

Chores

Restrict galileo-core version to only float forward for patch releases (#115, 37a4c36)
release: V0.8.1 (a308655)

Automatically generated by python-semantic-release

v0.8.0 (2025-04-30)

Bug Fixes

Use correct logging schemas from galileo_core.schemas.logging (#114, fa387d3)

Chores

Pulling the latest openapi.yaml and regenerating the API client (#112, bfa38fb)
deps: Bump h11 from 0.14.0 to 0.16.0 (#113, 1336bef)
release: V0.8.0 (3335cb9)

Automatically generated by python-semantic-release

Documentation

Add supported py versions, refactor urls (#111, d6e7169)

Features

Enable mypy on ci (#108, c445a09)

v0.7.0 (2025-04-17)

Bug Fixes

Adding list_prompt_templates (#102, 5814c90)
Error occurred during execution: shutdown: 'GalileoTracingProcessor' object has no attribute '_commit' (#105, e2ece22)
Fix functionality to add new rows to dataset (#103, 5f2d633)
Fixing the OpenAI wrapper when it's used in an active trace (#109, 10bb6f6)

Chores

release: V0.7.0 (76ab22d)

Automatically generated by python-semantic-release

Features

Bubble up openai status_code (#107, 8074ea3)

Testing

Bump codecov target (#106, 04ee6eb)

v0.6.0 (2025-04-10)

Bug Fixes

Adding tools sent to the chat model in the Langchain callback handlers (#95, 125de12)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

Json datasets are not being passed into runner functions correctly (#100, e833481)
Updating README snippets to point to the correct OpenAI client wrapper (#92, ffd2a68)
Updating the Langchain handler retriever, tool, async chain callbacks (#101, ce29a91)
Use Message and MessageRole models from galileo-core (#99, c9ad656)

Chores

Enum name normalization (#91, e084dac)
Increase codecov target, since we are already > 75% (#93, 4a09189)
Revert "Enum name normalization" (#98, 508af9e)
release: V0.6.0 (ff166d8)

Automatically generated by python-semantic-release

Features

Add dataset version methods (#94, 4081828)
Allowing custom hosted scorers to be used (#87, e908d87)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

Error if a non existent metric is specified (#97, 93a4914)
Improve error handling for create_dataset (#96, 0dda7ba)

v0.5.0 (2025-04-04)

Chores

release: V0.5.0 (2203995)

Automatically generated by python-semantic-release

Features

integration: Add support for openai agents (with tree) (#88, 78b0c56)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

v0.4.0 (2025-04-04)

Chores

release: V0.4.0 (8d700bc)

Automatically generated by python-semantic-release

Features

Langchain async callback handler (#89, ea8d4f8)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

v0.3.0 (2025-04-02)

Bug Fixes

Error handling in Experiments.get (#84, 0d36b8c)
Improve error message when galileo api key is empty or invalid (#82, 31593c4)
Log "No traces to flush" at info level (#83, babcd2f)
Making the run_experiment return object consistent (#86, 4a3059f)

Chores

release: V0.3.0 (8fee183)

Automatically generated by python-semantic-release

Features

Add ability to disable logger (#68, b417bd6)
Experiment names should not be reused (#79, 483367b)

v0.2.4 (2025-03-28)

Bug Fixes

Improve error handling and print better error when create prompt template fails (#81, 8839f8b)
Set correct openai status code for successful run (#80, 6d8224f)
Update explanation for Context Relevance (#77, 43e8230)

Chores

deps: Bump galileo-core to v3.19.0 (#78, 8391fb3)
release: V0.2.4 (4e68e33)

Automatically generated by python-semantic-release

Documentation

Add badges (codecov and pypi) to readme (#75, 9f268fe)
Remove extra brackets (#76, 9f0ccc0)

Testing

Add unit tests for experiments (#71, 19870ee)

v0.2.3 (2025-03-21)

Bug Fixes

Move from packaging.version import Version to openai section (#74, 085e989)

Chores

release: V0.2.3 (5aa9b41)

Automatically generated by python-semantic-release

v0.2.2 (2025-03-18)

Bug Fixes

run_experiment() (#73, 209ef0f)

Chores

release: V0.2.2 (2a00be2)

Automatically generated by python-semantic-release

v0.2.1 (2025-03-18)

Bug Fixes

Fixup after refactoring: PromptTemplates (#72, cae8e70)

Chores

Add more informative console messages after running an experiment (#70, fff4d0c)
release: V0.2.1 (780b1b9)

Automatically generated by python-semantic-release

v0.2.0 (2025-03-18)

Bug Fixes

Langgraph metadata errors in the Langchain handler (#62, a8f8133)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

Log tools on LLM span in OpenAI decorator (#47, d2ef9aa)

Co-authored-by: ajaynayak ajaynayak@gmail.com

Making sure the final span output bubbles up to the trace during flush (#64, bb70952)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

Chores

release: V0.2.0 (7d39fcb)

Automatically generated by python-semantic-release

Features

Add prompt_settings to run_experiment (#67, 02de6ea)
Adding a test and updating the sdk version (#69, 180e8b4)
Properly implement run_experiments with datasets (#66, 17879ef)
Use @log with ThreadPoolExecutor (#65, f13c746)

Refactoring

Newly added classes to align with our naming convention (#63, c540c2a)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

v0.1.0 (2025-03-14)

Bug Fixes

Adding init, reset, and flush_all methods to galileo_context, adding tests, fixing existing tests (#28, 569d1d0)
Change job name according to new api version (#52, 9c9352d)
Fixing an issue with parsing OpenAI tool calls outputs (#36, 55c0fe4)
Fixing get log_stream by name (#40, 2571024)
Output parsing for retriever spans (#34, c87f926)
Serialization of non-serialized types or classes (#48, 39d7611)
Serializing trace, workflow, and tool span inputs and outputs (#41, e54cb95)
Set min Python version for ruff to py39 (#23, fcfc0d1)
Typo inside ./scripts/auto-generate-api-client.sh (#32, 252c061)
Update add_llm_span example (#39, e01838f)

Chores

Add missing keys to pyproject.toml (#57, 500ac21)
Fix path to __init__ (#61, 8d6425d)
Remove js dir and files (#56, 962ae5d)
Set version field in pyproject.toml correctly (#60, 1560b76)
Setup repo + package similarly to our other Python repos (#25, 4daac02)
deps: Bump galileo-core to v3.2+ (#29, ed234e3)
deps: Bump codecov/codecov-action from 5.3.1 to 5.4.0 (#42, 41d7955)
deps: Bump python-semantic-release/python-semantic-release from 9.20.0 to 9.21.0 (#43, 95543ce)
release: V0.1.0 (e443300)

Automatically generated by python-semantic-release

Continuous Integration

Bump python-semantic-release/python-semantic-release from 9.17.0 to 9.20.0 (#26, ac9aaab)

Documentation

Add reference docs to the more client-facing pages (#37, 08b3457)

Co-authored-by: ajaynayak ajaynayak@gmail.com

Add small note about poetry shell (#12, d598aae)

Features

Add streaming support to openai wrapper (#31, 78687c7)
Adding a client-type header to all requests (#54, 764ee2d)
Adding a way to conclude all spans in a trace; restoring defaults in the decorator (#35, 130fe02)
Catch and handle errors throughout the client (#46, d6a82ee)
Changes to support the new core logging schemas (#30, a2c6ca8)

Changes to the client based on the following core and api changes: rungalileo/core#232 rungalileo/api#3489

There were some DX changes I made in this PR that will need to get moved to core:
- renaming user_metadata to metadata for the logging functions
- allowing more flexible types to be used for documents in the add_retriever_span() method. The current traces_logger method is too restrictive (forces a user to specify a list of dicts or a list of Documents, else throws an error). Since we're using function decorators, we need to be more permissive of method outputs which will map to the retriever documents field.

Decorator should create trace but reraise original exception (#45, d3dbb7a)
Implement get/create for prompt templates (#49, c647da3)
Langchain callback (#44, b6dd9a3)
Replace app.galileo.ai with api.galileo.ai if users specify it incorrectly (#53, 9a7ac9e)
Run experiment with a runner function and hosted metrics (#58, 5c37ff7)

Co-authored-by: Andrii Soldatenko ubuntu@ip-172-31-28-161.eu-central-1.compute.internal

Co-authored-by: ajaynayak ajaynayak@gmail.com

Run experiment with run_prompt and hosted metrics (#50, c78908f)
Updating the readme and pyproject for release (#59, 867a327)

Refactoring

Removing project and log_stream from the log decorator (#55, d348dbd)

Testing

Add tests to emulate openai errors and galileo api errors (#38, 16f73bb)
Adding unit tests for openai wrapper (#27, fdc15d1)
Don't ignore async test (#33, 6f75fa8)
