feature: Giskard v3 #18
Draft
kevinmessiaen
wants to merge
21
commits into
main
from
feature/giskard-v3
Changes from 18 commits
9564072
chore: remove giskard legacy from doc
kevinmessiaen 6189a9a
custom sidebar for checks docs
mattbit 1f13434
fix context for partial toctree
mattbit 3b1ea05
draft checks docs
mattbit c6ee339
small fixes
mattbit 8e63877
docs(oss-checks): update examples to use fluent builder pattern
kevinmessiaen 88bf862
chore: update dependencies in pyproject.toml and uv.lock
kevinmessiaen 0bfd74f
docs(checks): update documentation for scenario-based approach
kevinmessiaen 12b0955
docs(checks): refresh quickstart scenario flow
kevinmessiaen dbedbaf
docs(checks): clarify core concepts flow
kevinmessiaen dce3721
Apply suggestions from code review
kevinmessiaen f7ed564
docs(checks): switch snippets to gpt-5-mini
kevinmessiaen 57bcca8
Use expected value for `GreaterThan`
kevinmessiaen b366598
docs(checks): clarify async run note
kevinmessiaen e586f47
docs(checks): sharpen single-turn risk examples
kevinmessiaen fd69646
docs(checks): refresh multi-turn risk scenarios
kevinmessiaen 00f6eb3
Merge remote-tracking branch 'origin/main' into feature/giskard-check…
kevinmessiaen b31b849
chore: remove link of removed page
kevinmessiaen 7f22fd0
docs: update check names in docs
kevinmessiaen a8bb09c
docs: fix sidebar configuration
kevinmessiaen ba0d398
docs(oss): Upgrade to main
kevinmessiaen
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1 @@ | ||
| 3.12 | ||
| 3.13 |
Binary file added
BIN
+62.2 KB
source/_static/images/oss/checks/quickstart-simple_example_result.png
Binary file added
BIN
+126 KB
source/_static/images/oss/checks/quickstart-structured_interactions.png
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| <nav class="table w-full min-w-full my-6 lg:my-8"> | ||
| {{ toctree_from_doc('oss/checks/index', collapse=False, maxdepth=20, includehidden=True, titles_only=False) }} | ||
| </nav> |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -70,7 +70,6 @@ def update_sidebar_templates(): | |
|
|
||
| extensions = [ | ||
| "myst_parser", | ||
| "nbsphinx", | ||
| "sphinx_design", | ||
| "sphinx.ext.todo", | ||
| "sphinx.ext.napoleon", | ||
|
|
@@ -126,28 +125,20 @@ def update_sidebar_templates(): | |
| html_js_files = ["custom.js"] | ||
| html_favicon = "_static/favicon.ico" | ||
|
|
||
| html_sidebars = { | ||
| "oss/checks/**": [ | ||
| "sidebar_main_nav_links.html", | ||
| "sidebars/sidebar_oss_checks.html", | ||
| ], | ||
| } | ||
|
|
||
| # Do not execute the notebooks when building the docs | ||
| docs_version = os.getenv("READTHEDOCS_VERSION", "latest") | ||
| if docs_version == "latest" or docs_version == "stable": | ||
| branch = "main" | ||
| else: | ||
| branch = docs_version.replace("-", "/") | ||
| branch = "main" | ||
|
Comment on lines 129 to 134 (Contributor)
||
|
|
||
| # -- Options for nbsphinx ---------------------------------------------------- | ||
| nbsphinx_execute = "never" | ||
| # fmt: off | ||
| nbsphinx_prolog = """ | ||
| .. raw:: html | ||
|
|
||
| <div class="open-in-colab__wrapper"> | ||
| <a href="https://colab.research.google.com/github/Giskard-AI/giskard-hub/blob/""" + branch + """/script-docs/{{ env.doc2path(env.docname, base=None) }}" target="_blank"><img src="https://colab.research.google.com/assets/colab-badge.svg" style="display: inline; margin: 0" alt="Open In Colab"/></a> | ||
| <a href="https://github.com/Giskard-AI/giskard-hub/tree/""" + branch + """/script-docs/{{ env.doc2path(env.docname, base=None) }}" target="_blank"><img src="https://img.shields.io/badge/github-view%20source-black.svg" style="display: inline; margin: 0" alt="View Notebook on GitHub"/></a> | ||
| </div> | ||
| """ | ||
| # fmt: on | ||
|
|
||
|
|
||
| theme_options = ThemeOptions( | ||
| show_prev_next=True, | ||
| show_scrolltop=True, | ||
|
|
@@ -158,7 +149,7 @@ def update_sidebar_templates(): | |
| "Overview": "/index", | ||
| "Hub UI": "/hub/ui/index", | ||
| "Hub SDK": "/hub/sdk/index", | ||
| "Open Source": "/oss/sdk/index", | ||
| "Checks": "/oss/checks/index", | ||
| }, | ||
| ) | ||
| html_theme_options = asdict(theme_options) | ||
|
|
@@ -193,6 +184,49 @@ def update_sidebar_templates(): | |
| ogp_image = "https://docs.giskard.ai/_static/open-graph-image.png" | ||
|
|
||
|
|
||
| # Add custom template function to render toctree from a specific document | ||
| def setup(app): | ||
| def html_page_context(app, pagename, templatename, context, doctree): | ||
| def toctree_from_doc(docname, **kwargs): | ||
| """Render toctree starting from a specific document""" | ||
| from sphinx.environment.adapters.toctree import TocTree | ||
| from sphinx import addnodes | ||
|
Comment on lines +190 to +191 (Contributor)
||
| source_doctree = app.env.get_doctree(docname) | ||
| toctrees = list(source_doctree.findall(addnodes.toctree)) | ||
|
|
||
| if not toctrees: | ||
| return "" | ||
|
|
||
| toctree_adapter = TocTree(app.env) | ||
| resolved = [ | ||
| toctree_adapter.resolve( | ||
| pagename, # Use current page context, not the toctree source | ||
| app.builder, | ||
| toctree, | ||
| prune=False, | ||
| maxdepth=kwargs.get("maxdepth", -1), | ||
| titles_only=kwargs.get("titles_only", False), | ||
| collapse=kwargs.get("collapse", False), | ||
| includehidden=kwargs.get("includehidden", False), | ||
| ) | ||
| for toctree in toctrees | ||
| ] | ||
|
|
||
| resolved = [r for r in resolved if r is not None] | ||
| if not resolved: | ||
| return "" | ||
|
|
||
| result = resolved[0] | ||
| for toctree in resolved[1:]: | ||
| result.extend(toctree.children) | ||
|
|
||
| return app.builder.render_partial(result)["fragment"] | ||
|
|
||
| context["toctree_from_doc"] = toctree_from_doc | ||
|
|
||
| app.connect("html-page-context", html_page_context) | ||
|
|
||
|
|
||
| # make github links resolve | ||
| def linkcode_resolve(domain, info): | ||
| if domain != "py": | ||
|
|
||
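The merge step at the end of `toctree_from_doc` can be read in isolation. The sketch below is a minimal, dependency-free illustration of that step only. `Node` and `merge_resolved` are hypothetical stand-ins for the docutils nodes returned by `TocTree.resolve`, not Sphinx APIs. It assumes, as the diff does, that some toctrees may resolve to `None` and that the remaining ones should be folded into the first.

```python
class Node:
    """Hypothetical stand-in for a docutils element with child nodes."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = list(children or [])

    def extend(self, nodes):
        # Mirrors the list-like extend() used on the resolved toctree in the diff.
        self.children.extend(nodes)


def merge_resolved(resolved):
    """Drop unresolved entries, then fold every later tree into the first one."""
    resolved = [r for r in resolved if r is not None]
    if not resolved:
        return None  # the diff returns "" here because it renders HTML
    result = resolved[0]
    for other in resolved[1:]:
        result.extend(other.children)
    return result


first = Node("toc-1", [Node("quickstart"), Node("core-concepts")])
second = Node("toc-2", [Node("multi-turn")])
merged = merge_resolved([first, None, second])
labels = [child.label for child in merged.children]
```

The first tree is mutated in place, which matches the diff. That is safe there because each call resolves fresh nodes for the current page.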
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,179 @@ | ||
| ============= | ||
| Core Concepts | ||
| ============= | ||
|
|
||
| Understanding the key concepts in Giskard Checks will help you write effective tests for your AI applications. | ||
|
|
||
|
|
||
| Overview | ||
| -------- | ||
|
|
||
| Giskard Checks is built around a few core primitives that work together: | ||
|
|
||
| * **Interaction**: A single turn of data exchange (inputs and outputs) | ||
| * **InteractionSpec**: A specification for generating interactions dynamically | ||
| * **Trace**: An immutable snapshot of all interactions in a scenario | ||
| * **Check**: A validation that runs on a trace and returns a result | ||
| * **Scenario**: A list of steps (interactions and checks) executed sequentially | ||
|
|
||
| At runtime, the flow looks like this: | ||
|
|
||
| 1. A Scenario is created with a sequence of steps. | ||
|
|
||
| 2. For each step in order: | ||
|
|
||
| a. Each InteractionSpec is resolved into a concrete Interaction. | ||
| b. The Interaction is appended to the Trace. | ||
| c. Checks run against the current Trace. | ||
|
|
||
| 3. Results are returned as a ScenarioResult. | ||
|
|
||
| Interaction | ||
| ----------- | ||
|
|
||
| An ``Interaction`` represents a single turn of data exchange with the system under test. | ||
| Interactions are created at execution time, when each ``InteractionSpec`` is resolved and the result is appended to the trace. | ||
|
|
||
| **Properties:** | ||
|
|
||
| * ``inputs``: The input to your system (string, dict, Pydantic model, etc.) | ||
| * ``outputs``: The output from your system (any serializable type) | ||
| * ``metadata``: Optional dictionary for additional context (timings, model info, etc.) | ||
|
|
||
| Interactions are **immutable**, as they represent something that has already happened. | ||
|
|
||
|
|
||
| InteractionSpec | ||
| --------------- | ||
|
|
||
| An ``InteractionSpec`` describes *how* to generate an interaction. Scenarios are defined in terms of specs rather than concrete interactions. | ||
| When you call ``.interact(...)`` in the fluent API, it adds an ``InteractionSpec`` to the scenario sequence. | ||
| Inputs and outputs can be static values or dynamic callables, and you can mix both. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from giskard.checks import InteractionSpec | ||
| from openai import OpenAI | ||
| import random | ||
|
|
||
| def generate_random_question() -> str: | ||
| return f"What is 2 + {random.randint(0, 10)}?" | ||
|
|
||
| def generate_answer(inputs: str) -> str: | ||
| client = OpenAI() | ||
| response = client.chat.completions.create( | ||
| model="gpt-5-mini", | ||
|
||
| messages=[{"role": "user", "content": inputs}], | ||
| ) | ||
| return response.choices[0].message.content | ||
|
|
||
| spec = InteractionSpec( | ||
| inputs=generate_random_question, | ||
| outputs=generate_answer, | ||
| metadata={ | ||
| "category": "math", | ||
| "difficulty": "easy" | ||
| } | ||
| ) | ||
|
|
||
| Specs are resolved into interactions during scenario execution. This is common in multi-turn scenarios, where inputs and outputs are generated based on previous interactions. See :doc:`multi-turn` for practical examples. | ||
|
|
||
| Trace | ||
| ----- | ||
|
|
||
| A ``Trace`` is an immutable snapshot of all data exchanged with the system under test. In its simplest form, it is a list of interactions. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from giskard.checks import Trace, Interaction | ||
|
|
||
| trace = Trace(interactions=[ | ||
| Interaction(inputs="Hello", outputs="Hi there!"), | ||
| Interaction(inputs="How are you?", outputs="I'm doing well, thanks!") | ||
| ]) | ||
|
|
||
| Traces are typically created during scenario execution by resolving each ``InteractionSpec`` into a frozen interaction. | ||
|
|
||
|
|
||
| Checks | ||
| ------ | ||
|
|
||
| A ``Check`` validates something about a trace and returns a ``CheckResult``. There's a library of built-in checks, but you can also create your own. | ||
|
|
||
| When referencing values in a trace, use JSONPath expressions that start with ``trace.``. The ``last`` property is a shortcut for ``interactions[-1]`` and can be used in both JSONPath keys and Python code. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from giskard.checks import Groundedness, Trace | ||
|
|
||
| check = Groundedness( | ||
| answer_key="trace.last.outputs", | ||
| context="Giskard Checks is a testing framework for AI systems." | ||
| ) | ||
|
|
||
|
|
||
| Scenario | ||
| -------- | ||
|
|
||
| A ``Scenario`` is a list of steps (interactions and checks) that are executed sequentially with a shared trace. Scenarios work for both single-turn and multi-turn tests. | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from giskard.checks import scenario | ||
|
|
||
| test_scenario = ( | ||
| scenario("test_with_checks") | ||
| .interact(inputs="test input", outputs="test output") | ||
| .check(check1) | ||
| .check(check2) | ||
| ) | ||
|
|
||
| result = await test_scenario.run() | ||
|
|
||
| .. note:: | ||
| The ``run()`` method is asynchronous. When running in a script, use ``asyncio.run()``: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| import asyncio | ||
|
|
||
| async def main(): | ||
| result = await test_scenario.run() | ||
| return result | ||
|
|
||
| result = asyncio.run(main()) | ||
|
|
||
| In async contexts (like pytest with ``@pytest.mark.asyncio``), you can use ``await`` directly. | ||
|
|
||
| The returned result object contains the outcome of each check in the scenario. | ||
|
|
||
|
|
||
| Fluent API Mapping | ||
| ------------------ | ||
|
|
||
| The fluent API is the preferred user-facing entry point and maps directly to the core primitives above: | ||
|
|
||
| * ``scenario(name)`` creates a ``Scenario`` builder. | ||
| * ``.interact(...)`` adds an ``InteractionSpec`` to the scenario sequence. | ||
| * ``.check(...)`` adds a ``Check`` to the scenario sequence. | ||
| * ``.run()`` resolves specs to interactions, builds the ``Trace``, runs checks, and returns a ``ScenarioResult``. | ||
|
|
||
| For example, we can test a simple conversation flow with two turns: | ||
|
|
||
| .. code-block:: python | ||
|
|
||
| from giskard.checks import scenario, Conformity | ||
|
|
||
| test_scenario = ( | ||
| scenario("conversation_flow") | ||
| .interact(inputs="Hello", outputs=generate_answer) | ||
| .check(Conformity(key="trace.last.outputs", rule="response should be a friendly greeting")) | ||
| .interact(inputs="Who invented HTML?", outputs=generate_answer) | ||
| .check(Conformity(key="trace.last.outputs", rule="response should mention Tim Berners-Lee as the inventor of HTML")) | ||
| ) | ||
|
|
||
| # Run with asyncio.run() if in a script | ||
| import asyncio | ||
| result = await test_scenario.run() # or: result = asyncio.run(test_scenario.run()) | ||
|
|
||
| For a practical introduction to the fluent API, see :doc:`quickstart`. | ||
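The runtime flow described in the Core Concepts page (resolve each spec, append the interaction to the trace, run checks against the current trace) can be sketched without the library. Everything below is a hypothetical model written for illustration. `Spec`, `run_steps`, and the lambda checks are assumptions and do not reflect the actual `giskard.checks` implementation or its signatures.

```python
import asyncio
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Interaction:
    """Immutable record of one turn (hypothetical model)."""
    inputs: Any
    outputs: Any


@dataclass(frozen=True)
class Trace:
    """Immutable snapshot of all interactions so far (hypothetical model)."""
    interactions: tuple = ()

    @property
    def last(self) -> Interaction:
        # Shortcut for interactions[-1], as described in the docs.
        return self.interactions[-1]


@dataclass
class Spec:
    """Static values or callables that produce the inputs and outputs."""
    inputs: Any
    outputs: Any

    def resolve(self) -> Interaction:
        inputs = self.inputs() if callable(self.inputs) else self.inputs
        outputs = self.outputs(inputs) if callable(self.outputs) else self.outputs
        return Interaction(inputs, outputs)


async def run_steps(steps):
    """Execute steps in order: specs extend the trace, checks inspect it."""
    trace = Trace()
    results = []
    for step in steps:
        if isinstance(step, Spec):
            # Build a new trace instead of mutating the old one.
            trace = Trace(trace.interactions + (step.resolve(),))
        else:
            results.append(step(trace))
    return trace, results


steps = [
    Spec(inputs="Hello", outputs=str.upper),    # dynamic output
    lambda t: t.last.outputs == "HELLO",        # check on the first turn
    Spec(inputs=lambda: "2 + 2", outputs="4"),  # dynamic input, static output
    lambda t: len(t.interactions) == 2,         # check sees both turns
]
trace, results = asyncio.run(run_steps(steps))
```

Because each check receives the trace as it stood when the check ran, a check placed after the first turn never sees the second one. This is the property the sequential step ordering is meant to provide.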
The placeholder "your-api-key" has been added to the detect-secrets configuration. This is a significant security risk as it might be overlooked and could lead to real secrets being excluded from scans if this pattern is copied. This placeholder should be removed. If a specific secret needs to be excluded, it should be done using its ID or a more specific regex.

"your-api-key" is not a real API key and is used in example code.