8 changes: 7 additions & 1 deletion deepeval/prompt/utils.py
@@ -256,7 +256,13 @@ def build_node(field_list: List[OutputSchemaField]) -> Dict[str, Any]:
field_type = (
field.type.value if hasattr(field.type, "value") else field.type
)
field_schema = {"type": map_type(field.type)}
normalized_type = (
SchemaDataType(field_type)
if not isinstance(field_type, SchemaDataType)
else field_type
)

field_schema = {"type": map_type(normalized_type)}

# Add description if available
if field.description:
39 changes: 39 additions & 0 deletions docs/docs/evaluation-prompts.mdx
@@ -405,3 +405,42 @@ There are **TWO** output settings you can associate with a prompt:

- `output_type`: The type of output to use for generation (for example, `OutputType.SCHEMA`).
- `output_schema`: The schema of type `BaseModel` of the output, if `output_type` is `OutputType.SCHEMA`.
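
For instance, a minimal sketch of pushing a prompt with both output settings could look like the following; the `OutputType` import path and the exact `push()` keyword names are assumptions based on the bullets above:

```python
from pydantic import BaseModel

from deepeval.prompt import Prompt
from deepeval.prompt.api import OutputType  # assumed import path

class AnswerSchema(BaseModel):
    answer: str

prompt = Prompt(alias="YOUR-PROMPT-ALIAS")
prompt.push(
    text="Answer the user's question.",
    output_type=OutputType.SCHEMA,  # enum member referenced in the bullet above
    output_schema=AnswerSchema,     # a `BaseModel` subclass describing the output
)
```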

### Tools

The tools in a prompt specify the tools your agent has access to. Tools are identified by their name, so each tool name must be unique.

```python
from deepeval.prompt import Prompt, Tool
from deepeval.prompt.api import ToolMode
from pydantic import BaseModel

class ToolInputSchema(BaseModel):
result: str
confidence: float

prompt = Prompt(alias="YOUR-PROMPT-ALIAS")
tool = Tool(
name="ExploreTool",
description="Tool used for browsing the internet",
mode=ToolMode.STRICT,
structured_schema=ToolInputSchema,
)

prompt.push(
text="This is a prompt with a tool",
tools=[tool]
)

# You can also update an existing tool by using the new tool in the push / update method:
tool2 = Tool(
name="ExploreTool", # Must have the same name to update a tool
description="Tool used for browsing the internet",
mode=ToolMode.ALLOW_ADDITIONAL,
structured_schema=ToolInputSchema,
)

prompt.update(
tools=[tool2]
)
```
170 changes: 162 additions & 8 deletions docs/docs/metrics-custom.mdx
@@ -9,7 +9,8 @@ sidebar_label: Do it yourself
</head>

import MetricTagsDisplayer from '@site/src/components/MetricTagsDisplayer';
import { Timeline, TimelineItem } from '@site/src/components/Timeline';
import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

<MetricTagsDisplayer custom={true} usesLLMs={false} />

@@ -31,26 +32,42 @@ There are many ways one can implement an LLM evaluation metric. Here is a [great

## Rules To Follow When Creating A Custom Metric

<Timeline>
<TimelineItem title="Inherit the `BaseMetric` class"></TimelineItem>
</Timeline>

### 1. Inherit the `BaseMetric` class

To begin, create a class that inherits from `deepeval`'s `BaseMetric` class:

<Tabs groupId="single-multi-turns">

<TabItem value="single-turn" label="Single-Turn">

```python
from deepeval.metrics import BaseMetric

class CustomMetric(BaseMetric):
...
```

This is important because the `BaseMetric` class will help `deepeval` acknowledge your custom metric during evaluation.
This is important because the `BaseMetric` class helps `deepeval` recognize your custom metric as a single-turn metric during evaluation.

</TabItem>
<TabItem value="multi-turn" label="Multi-Turn">

```python
from deepeval.metrics import BaseConversationalMetric

class CustomConversationalMetric(BaseConversationalMetric):
...
```

This is important because the `BaseConversationalMetric` class helps `deepeval` recognize your custom metric as a multi-turn metric during evaluation.

</TabItem>

</Tabs>

### 2. Implement the `__init__()` method

The `BaseMetric` class gives your custom metric a few properties that you can configure and be displayed post-evaluation, either locally or on Confident AI.
The `BaseMetric` and `BaseConversationalMetric` classes give your custom metric a few properties that you can configure and that are displayed post-evaluation, either locally or on Confident AI.

An example is the `threshold` property, which determines whether the `LLMTestCase` being evaluated has passed or not. Although **the `threshold` property is all you need to make a custom metric functional**, here are some additional properties for those who want even more customizability:

@@ -65,6 +82,10 @@ Don't read too much into the advanced properties for now, we'll go over how they

The `__init__()` method is a great place to set these properties:

<Tabs groupId="single-multi-turns">

<TabItem value="single-turn" label="Single-Turn">

```python
from deepeval.metrics import BaseMetric

@@ -86,6 +107,33 @@ class CustomMetric(BaseMetric):
self.async_mode = async_mode
```

</TabItem>
<TabItem value="multi-turn" label="Multi-Turn">

```python
from typing import Optional

from deepeval.metrics import BaseConversationalMetric

class CustomConversationalMetric(BaseConversationalMetric):
    def __init__(
        self,
        threshold: float = 0.5,
        # Optional (needs a default since it follows a defaulted parameter)
        evaluation_model: Optional[str] = None,
        include_reason: bool = True,
        strict_mode: bool = True,
        async_mode: bool = True
    ):
        self.threshold = threshold
        # Optional
        self.evaluation_model = evaluation_model
        self.include_reason = include_reason
        self.strict_mode = strict_mode
        self.async_mode = async_mode
```

</TabItem>

</Tabs>

### 3. Implement the `measure()` and `a_measure()` methods

The `measure()` and `a_measure()` methods are where all the evaluation happens. In `deepeval`, evaluation is the process of applying a metric to an `LLMTestCase` to generate a score and, optionally, a reason for the score (if you're using an LLM) based on the scoring algorithm.
@@ -114,6 +162,12 @@ Both `measure()` and `a_measure()` **MUST**:

You can also optionally set `self.reason` in the measure methods (if you're using an LLM for evaluation), or wrap everything in a `try` block to catch any exceptions and store them in `self.error`. Here's a hypothetical example:


<Tabs groupId="single-multi-turns">

<TabItem value="single-turn" label="Single-Turn">


```python
from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase
@@ -150,6 +204,49 @@ class CustomMetric(BaseMetric):
raise
```

</TabItem>
<TabItem value="multi-turn" label="Multi-Turn">

```python
from deepeval.metrics import BaseConversationalMetric
from deepeval.test_case import ConversationalTestCase

class CustomConversationalMetric(BaseConversationalMetric):
...

def measure(self, test_case: ConversationalTestCase) -> float:
# Although not required, we recommend catching errors
# in a try block
try:
self.score = generate_hypothetical_score(test_case)
if self.include_reason:
self.reason = generate_hypothetical_reason(test_case)
self.success = self.score >= self.threshold
return self.score
except Exception as e:
# set metric error and re-raise it
self.error = str(e)
raise

async def a_measure(self, test_case: ConversationalTestCase) -> float:
# Although not required, we recommend catching errors
# in a try block
try:
self.score = await async_generate_hypothetical_score(test_case)
if self.include_reason:
self.reason = await async_generate_hypothetical_reason(test_case)
self.success = self.score >= self.threshold
return self.score
except Exception as e:
# set metric error and re-raise it
self.error = str(e)
raise
```

</TabItem>

</Tabs>

:::tip

Oftentimes, the blocking part of an LLM evaluation metric stems from the API calls made to your LLM provider (such as OpenAI's API endpoints), so ultimately you'll have to ensure that LLM inference can indeed be made asynchronous.
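
For example, if your scoring logic wraps a synchronous client, one option is to offload it to a worker thread; a minimal sketch, reusing the hypothetical `generate_hypothetical_score()` from the examples above:

```python
import asyncio

async def async_generate_hypothetical_score(test_case):
    # Offload the blocking scoring call to a worker thread so that
    # a_measure() does not block the event loop while waiting on I/O.
    return await asyncio.to_thread(generate_hypothetical_score, test_case)
```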
@@ -174,6 +271,10 @@ You can also [click here to find an example of offloading LLM inference to a sep

Under the hood, `deepeval` calls the `is_successful()` method to determine the status of your metric for a given `LLMTestCase`. We recommend copying and pasting the code below directly as your `is_successful()` implementation:

<Tabs groupId="single-multi-turns">

<TabItem value="single-turn" label="Single-Turn">

```python
from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase
@@ -185,13 +286,46 @@ class CustomMetric(BaseMetric):
if self.error is not None:
self.success = False
else:
return self.success
try:
self.success = self.score >= self.threshold
except TypeError:
self.success = False
return self.success
```

</TabItem>
<TabItem value="multi-turn" label="Multi-Turn">

```python
from deepeval.metrics import BaseConversationalMetric
from deepeval.test_case import ConversationalTestCase

class CustomConversationalMetric(BaseConversationalMetric):
...

def is_successful(self) -> bool:
if self.error is not None:
self.success = False
else:
try:
self.success = self.score >= self.threshold
except TypeError:
self.success = False
return self.success
```

</TabItem>

</Tabs>

### 5. Name Your Custom Metric

Probably the easiest step: all that's left is to name your custom metric:

<Tabs groupId="single-multi-turns">

<TabItem value="single-turn" label="Single-Turn">

```python
from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase
Expand All @@ -204,6 +338,26 @@ class CustomMetric(BaseMetric):
return "My Custom Metric"
```

</TabItem>
<TabItem value="multi-turn" label="Multi-Turn">

```python
from deepeval.metrics import BaseConversationalMetric
from deepeval.test_case import ConversationalTestCase

class CustomConversationalMetric(BaseConversationalMetric):
...

@property
def __name__(self):
return "My Custom Metric"
```

</TabItem>

</Tabs>


**Congratulations 🎉!** You've just learnt how to build a custom metric that is 100% integrated with `deepeval`'s ecosystem. In the following section, we'll go through a few real-life examples.
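
As a quick sanity check, a finished custom metric can be run like any built-in one; a minimal sketch with placeholder test case values:

```python
from deepeval import evaluate
from deepeval.test_case import LLMTestCase

# Instantiate the custom single-turn metric defined above
metric = CustomMetric(threshold=0.5)

test_case = LLMTestCase(
    input="What does your custom metric measure?",
    actual_output="It scores test cases according to my own criteria.",
)

evaluate(test_cases=[test_case], metrics=[metric])
```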

## More Examples
12 changes: 6 additions & 6 deletions docs/integrations/models/openrouter.mdx
@@ -1,5 +1,5 @@
---
# id: openrouter
id: openrouter
title: OpenRouter
sidebar_label: OpenRouter
---
@@ -43,7 +43,7 @@ model = OpenRouterModel(
model="openai/gpt-4.1",
api_key="your-openrouter-api-key",
# Optional: override the default OpenRouter endpoint
# base_url="https://openrouter.ai/api/v1",
base_url="https://openrouter.ai/api/v1",
# Optional: pass OpenRouter headers via **kwargs
default_headers={
"HTTP-Referer": "https://your-site.com",
@@ -59,12 +59,12 @@ There are **ZERO** mandatory and **SEVEN** optional parameters when creating an
- [Optional] `model`: A string specifying the OpenRouter model to use. Defaults to `OPENROUTER_MODEL_NAME` if set; otherwise falls back to "openai/gpt-4.1".
- [Optional] `api_key`: A string specifying your OpenRouter API key for authentication. Defaults to `OPENROUTER_API_KEY` if not passed; raises an error at runtime if unset.
- [Optional] `base_url`: A string specifying the base URL for the OpenRouter API endpoint. Defaults to `OPENROUTER_BASE_URL` if set; otherwise falls back to "https://openrouter.ai/api/v1".
- [Optional] `temperature`: A float specifying the model temperature. Defaults to `TEMPERATURE` if not passed; falls back to `0.0` if unset; raises if < 0.
- [Optional] `cost_per_input_token`: A float specifying the cost for each input token for the provided model. Defaults to `OPENROUTER_COST_PER_INPUT_TOKEN` if set.
- [Optional] `cost_per_output_token`: A float specifying the cost for each output token for the provided model. Defaults to `OPENROUTER_COST_PER_OUTPUT_TOKEN` if set.
- [Optional] `temperature`: A float specifying the model temperature. Defaults to `TEMPERATURE` if not passed; falls back to `0.0` if unset.
- [Optional] `cost_per_input_token`: A float specifying the cost for each input token for the provided model. Defaults to `OPENROUTER_COST_PER_INPUT_TOKEN` if not passed; raises an error at runtime if unset.
- [Optional] `cost_per_output_token`: A float specifying the cost for each output token for the provided model. Defaults to `OPENROUTER_COST_PER_OUTPUT_TOKEN` if not passed; raises an error at runtime if unset.
- [Optional] `generation_kwargs`: A dictionary of additional generation parameters forwarded to OpenRouter's `chat.completions.create(...)` call.
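
For example, a sketch of setting the cost parameters explicitly and handing the model to a metric; the `deepeval.models` import path mirrors `deepeval`'s other model integrations and the metric choice is just illustrative:

```python
from deepeval.models import OpenRouterModel
from deepeval.metrics import AnswerRelevancyMetric

model = OpenRouterModel(
    model="openai/gpt-4.1",
    cost_per_input_token=0.000002,   # illustrative values, not OpenRouter's actual pricing
    cost_per_output_token=0.000008,
)

metric = AnswerRelevancyMetric(model=model)
```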

Any additional **kwargs you would like to use for your OpenRouter client can be passed directly to OpenRouterModel(...). These are forwarded to the underlying OpenAI client constructor. We recommend double-checking the parameters and headers supported by your chosen model in the [official OpenRouter docs](https://openrouter.ai/docs).
Any additional `**kwargs` you would like to use for your `OpenRouter` client can be passed directly to `OpenRouterModel(...)`. These are forwarded to the underlying OpenAI client constructor. We recommend double-checking the parameters and headers supported by your chosen model in the [official OpenRouter docs](https://openrouter.ai/docs).

:::tip
Pass headers specific to OpenRouter via kwargs:
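
A sketch of what that can look like, based on the `default_headers` shown earlier (the import path is assumed and the header values are placeholders):

```python
from deepeval.models import OpenRouterModel

model = OpenRouterModel(
    model="openai/gpt-4.1",
    default_headers={
        "HTTP-Referer": "https://your-site.com",  # attribution header recognized by OpenRouter
        "X-Title": "Your App Name",               # display name used by OpenRouter
    },
)
```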
1 change: 1 addition & 0 deletions docs/sidebarIntegrations.js
@@ -24,6 +24,7 @@ module.exports = {
'models/openai',
'models/azure-openai',
'models/ollama',
'models/openrouter',
'models/anthropic',
'models/amazon-bedrock',
'models/gemini',