CHANGELOG

Unreleased

Features

  • New generated_output field: Add generated_output field to DatasetRecord for storing model-generated outputs separately from ground truth. This allows you to track both the expected output (ground truth) and the actual model output in the same dataset record. In the UI, this field is displayed as "Generated Output".

    Example:

    from galileo.schema.datasets import DatasetRecord
    
    record = DatasetRecord(
        input="What is 2+2?",
        output="4",  # Ground truth
        generated_output="The answer is 4"  # Model-generated output
    )
  • Ground Truth naming support: The existing output field is now displayed as "Ground Truth" in the Galileo UI for better clarity. The SDK accepts both output and ground_truth as field names when creating records; both are normalized to output internally, so existing code remains backward compatible. Whichever name you use, the value is also available through the ground_truth property.

    Example:

    from galileo.schema.datasets import DatasetRecord
    
    # Using 'output' (backward compatible)
    record1 = DatasetRecord(input="What is 2+2?", output="4")
    assert record1.ground_truth == "4"  # Property accessor
    
    # Using 'ground_truth' (new recommended way)
    record2 = DatasetRecord(input="What is 2+2?", ground_truth="4")
    assert record2.output == "4"  # Normalized internally
    assert record2.ground_truth == "4"  # Property accessor

v0.10.0 (2025-05-29)

Bug Fixes

  • Langchain on_chain_start with_kwargs name and serialized is None (#138, 132955c)

Chores

  • deps: Bump langchain-core from 0.3.40 to 0.3.58 (#122, e2e6a49)

Features

This changes the start_session function to return the session ID, and adds a new set_session function.
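As a rough illustration of the pattern this note describes (a start call that returns an identifier, which a later set call can reuse), here is a minimal self-contained sketch. The SessionTracker class, its session_id attribute, and the UUID-based IDs are hypothetical stand-ins and not the SDK's implementation; only the names start_session and set_session come from the note above.

```python
import uuid
from typing import Optional


class SessionTracker:
    """Hypothetical stand-in for a logger that tracks the active session."""

    def __init__(self) -> None:
        self.session_id: Optional[str] = None

    def start_session(self) -> str:
        # Create a new session and return its ID so the caller can store it.
        self.session_id = str(uuid.uuid4())
        return self.session_id

    def set_session(self, session_id: str) -> None:
        # Attach to an existing session instead of creating a new one.
        self.session_id = session_id


first = SessionTracker()
saved_id = first.start_session()

# Later, possibly from different code: continue the same session.
second = SessionTracker()
second.set_session(saved_id)
```

Returning the ID from the start call is what makes the second step possible: the caller can persist it and hand it back later without inspecting internal state.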

  • langchain: Implement duration_ns metric for langchain callback async handler (#150, eed7a81)

  • langchain: Implement duration_ns metric for langchain callback handler (#145, 55f3cdf)

v0.9.2 (2025-05-22)

Bug Fixes

  • Creating empty dataset (#134, 3b13f52)

  • RuntimeWarning: coroutine AsyncMockMixin._execute_mock_call was never awaited (#135, bd76cff)

  • Updating Langchain handler issues (#118, 4304d16)

Chores

  • deps-dev: Bump mkdocs-material from 9.6.13 to 9.6.14 (#136, 71d6676)

  • release: V0.9.2 (12ad78c)

Automatically generated by python-semantic-release

v0.9.1 (2025-05-15)

Bug Fixes

  • Reverting a change from a previous context manager fix (#133, b3bd345)

Chores

  • deps-dev: Bump mkdocs-material from 9.6.12 to 9.6.13 (#131, f44be0c)

  • deps-dev: Bump pytest-asyncio from 0.25.3 to 0.26.0 (#129, 43e2af7)

  • release: V0.9.1 (b80a218)

Automatically generated by python-semantic-release

Continuous Integration

v0.9.0 (2025-05-10)

Bug Fixes

  • Using context var stacks for saving context with nested calls (#126, 42c1424)

Chores

  • deps: Bump codecov/codecov-action from 5.4.0 to 5.4.2 (#119, ba53ebf)

  • deps: Bump openai-agents from 0.0.7 to 0.0.12 (#123, 0150eee)

  • deps-dev: Bump mkdocs-material from 9.6.5 to 9.6.12 (#125, 707149a)

  • deps-dev: Bump mkdocstrings from 0.27.0 to 0.29.1 (#124, bdb2c47)

  • deps-dev: Bump pytest from 8.3.4 to 8.3.5 (#121, 7e5ab19)

  • release: V0.9.0 (eeb6ddf)

Automatically generated by python-semantic-release

Features

v0.8.1 (2025-04-30)

Chores

  • Restrict galileo-core version to only float forward for patch releases (#115, 37a4c36)

  • release: V0.8.1 (a308655)

Automatically generated by python-semantic-release

v0.8.0 (2025-04-30)

Bug Fixes

  • Use correct logging schemas from galileo_core.schemas.logging (#114, fa387d3)

Chores

  • Pulling the latest openapi.yaml and regenerating the API client (#112, bfa38fb)

  • deps: Bump h11 from 0.14.0 to 0.16.0 (#113, 1336bef)

  • release: V0.8.0 (3335cb9)

Automatically generated by python-semantic-release

Documentation

  • Add supported py versions, refactor urls (#111, d6e7169)

Features

v0.7.0 (2025-04-17)

Bug Fixes

  • Adding list_prompt_templates (#102, 5814c90)

  • Error occurred during execution: shutdown: 'GalileoTracingProcessor' object has no attribute '_commit' (#105, e2ece22)

  • Fix functionality to add new rows to dataset (#103, 5f2d633)

  • Fixing the OpenAI wrapper when it's used in an active trace (#109, 10bb6f6)

Chores

Automatically generated by python-semantic-release

Features

Testing

v0.6.0 (2025-04-10)

Bug Fixes

  • Adding tools sent to the chat model in the Langchain callback handlers (#95, 125de12)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

  • Json datasets are not being passed into runner functions correctly (#100, e833481)

  • Updating README snippets to point to the correct OpenAI client wrapper (#92, ffd2a68)

  • Updating the Langchain handler retriever, tool, async chain callbacks (#101, ce29a91)

  • Use Message and MessageRole models from galileo-core (#99, c9ad656)

Chores

Automatically generated by python-semantic-release

Features

  • Add dataset version methods (#94, 4081828)

  • Allowing custom hosted scorers to be used (#87, e908d87)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

  • Error if a non existent metric is specified (#97, 93a4914)

  • Improve error handling for create_dataset (#96, 0dda7ba)

v0.5.0 (2025-04-04)

Chores

Automatically generated by python-semantic-release

Features

  • integration: Add support for openai agents (with tree) (#88, 78b0c56)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

v0.4.0 (2025-04-04)

Chores

Automatically generated by python-semantic-release

Features

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

v0.3.0 (2025-04-02)

Bug Fixes

  • Error handling in Experiments.get (#84, 0d36b8c)

  • Improve error message when galileo api key is empty or invalid (#82, 31593c4)

  • Log "No traces to flush" at info level (#83, babcd2f)

  • Making the run_experiment return object consistent (#86, 4a3059f)

Chores

Automatically generated by python-semantic-release

Features

  • Add ability to disable logger (#68, b417bd6)

  • Experiment names should not be reused (#79, 483367b)

v0.2.4 (2025-03-28)

Bug Fixes

  • Improve error handling and print better error when create prompt template fails (#81, 8839f8b)

  • Set correct openai status code for successful run (#80, 6d8224f)

  • Update explanation for Context Relevance (#77, 43e8230)

Chores

Automatically generated by python-semantic-release

Documentation

Testing

v0.2.3 (2025-03-21)

Bug Fixes

  • Move from packaging.version import Version to openai section (#74, 085e989)

Chores

Automatically generated by python-semantic-release

v0.2.2 (2025-03-18)

Bug Fixes

Chores

Automatically generated by python-semantic-release

v0.2.1 (2025-03-18)

Bug Fixes

  • Fixup after refactoring: PromptTemplates (#72, cae8e70)

Chores

  • Add more informative console messages after running an experiment (#70, fff4d0c)

  • release: V0.2.1 (780b1b9)

Automatically generated by python-semantic-release

v0.2.0 (2025-03-18)

Bug Fixes

  • Langgraph metadata errors in the Langchain handler (#62, a8f8133)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

  • Log tools on LLM span in OpenAI decorator (#47, d2ef9aa)

Co-authored-by: ajaynayak ajaynayak@gmail.com

  • Making sure the final span output bubbles up to the trace during flush (#64, bb70952)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

Chores

Automatically generated by python-semantic-release

Features

  • Add prompt_settings to run_experiment (#67, 02de6ea)

  • Adding a test and updating the sdk version (#69, 180e8b4)

  • Implement properly run_experiments with datasets (#66, 17879ef)

  • Use @log with ThreadPoolExecutor (#65, f13c746)

Refactoring

  • Newly added classes to align with our naming convention (#63, c540c2a)

Co-authored-by: Andrii Soldatenko andrii.soldatenko@gmail.com

v0.1.0 (2025-03-14)

Bug Fixes

  • Adding init, reset, and flush_all methods to galileo_context, adding tests, fixing existing tests (#28, 569d1d0)

  • Change job name according to new api version (#52, 9c9352d)

  • Fixing an issue with parsing OpenAI tool calls outputs (#36, 55c0fe4)

  • Fixing get log_stream by name (#40, 2571024)

  • Output parsing for retriever spans (#34, c87f926)

  • Serialization of non-serialized types or classes (#48, 39d7611)

  • Serializing trace, workflow, and tool span inputs and outputs (#41, e54cb95)

  • Set min Python version for ruff to py39 (#23, fcfc0d1)

  • Typo inside ./scripts/auto-generate-api-client.sh (#32, 252c061)

  • Update add_llm_span example (#39, e01838f)

Chores

  • Add missing keys to pyproject.toml (#57, 500ac21)

  • Fix path to __init__ (#61, 8d6425d)

  • Remove js dir and files (#56, 962ae5d)

  • Set version field in pyproject.toml correctly (#60, 1560b76)

  • Setup repo + package similarly to our other Python repos (#25, 4daac02)

  • deps: Bump galileo-core to v3.2+ (#29, ed234e3)

  • deps: Bump codecov/codecov-action from 5.3.1 to 5.4.0 (#42, 41d7955)

  • deps: Bump python-semantic-release/python-semantic-release from 9.20.0 to 9.21.0 (#43, 95543ce)

  • release: V0.1.0 (e443300)

Automatically generated by python-semantic-release

Continuous Integration

  • Bump python-semantic-release/python-semantic-release from 9.17.0 to 9.20.0 (#26, ac9aaab)

Documentation

  • Add reference docs to the more client-facing pages (#37, 08b3457)

Co-authored-by: ajaynayak ajaynayak@gmail.com

  • Add small note about poetry shell (#12, d598aae)

Features

  • Add streaming support to openai wrapper (#31, 78687c7)

  • Adding a client-type header to all requests (#54, 764ee2d)

  • Adding a way to conclude all spans in a trace; restoring defaults in the decorator (#35, 130fe02)

  • Catch and handle errors throughout the client (#46, d6a82ee)

  • Changes to support the new core logging schemas (#30, a2c6ca8)

Changes to the client based on the following core and api changes: rungalileo/core#232 rungalileo/api#3489

There were some DX changes I made in this PR that will need to get moved to core:

- renaming user_metadata to metadata for the logging functions

- allowing more flexible types to be used for documents in the add_retriever_span() method. The current traces_logger method is too restrictive (forces a user to specify a list of dicts or a list of Documents, else throws an error). Since we're using function decorators, we need to be more permissive of method outputs which will map to the retriever documents field.

  • Decorator should create trace but reraise original exception (#45, d3dbb7a)

  • Implement get/create for prompt templates (#49, c647da3)

  • Langchain callback (#44, b6dd9a3)

  • Replace app.galileo.ai with api.galileo.ai if users specify it incorrectly (#53, 9a7ac9e)

  • Run experiment with a runner function and hosted metrics (#58, 5c37ff7)

Co-authored-by: Andrii Soldatenko ubuntu@ip-172-31-28-161.eu-central-1.compute.internal

Co-authored-by: ajaynayak ajaynayak@gmail.com

  • Run experiment with run_prompt and hosted metrics (#50, c78908f)

  • Updating the readme and pyproject for release (#59, 867a327)
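The retriever-span note under #30 above argues for accepting whatever a decorated function returns, rather than only a list of dicts or Documents. A minimal sketch of that kind of permissive coercion follows. The normalize_documents helper and the content/metadata dict shape are assumptions made for illustration, not the SDK's actual code.

```python
from typing import Any, Dict, List


def normalize_documents(output: Any) -> List[Dict[str, Any]]:
    """Hypothetical helper: coerce a function's return value into documents."""
    if output is None:
        return []
    # Treat a single value as a one-element list.
    items = output if isinstance(output, (list, tuple)) else [output]
    documents: List[Dict[str, Any]] = []
    for item in items:
        if isinstance(item, dict) and "content" in item:
            # Already document-shaped: keep the content and copy any metadata.
            documents.append(
                {
                    "content": str(item["content"]),
                    "metadata": dict(item.get("metadata") or {}),
                }
            )
        else:
            # Anything else is stringified rather than rejected.
            documents.append({"content": str(item), "metadata": {}})
    return documents
```

For example, normalize_documents("a") yields a single document whose content is "a", and a mixed list of strings and dicts is accepted without raising.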

Refactoring

  • Removing project and log_stream from the log decorator (#55, d348dbd)

Testing

  • Add tests to emulate openai errors and galileo api errors (#38, 16f73bb)

  • Adding unit tests for openai wrapper (#27, fdc15d1)

  • Don't ignore async test (#33, 6f75fa8)