
[ATH-3800] Create functionality to run the evals in datasets#13

Draft
codyathina wants to merge 3 commits into main from ATH-3800-create-functionality-run-evals-in-datasets

Conversation

@codyathina
Contributor

@codyathina codyathina commented Apr 10, 2025

Summary by CodeRabbit

  • New Features
    • Introduced support for initiating evaluation runs with configurable parameters, allowing users to trigger evaluations directly.
    • Added functionality for logging dataset events by capturing event details, enhancing tracking and analysis capabilities.
    • Enhanced the Dataset class with a method to programmatically trigger evaluation runs using specified configurations.
    • Added example code in the Jupyter notebook for users to utilize the new evaluation functionality.

@coderabbitai

coderabbitai bot commented Apr 10, 2025

Walkthrough

The changes introduce a new static method run_evals to both the AthinaApiService and Dataset classes within the athina_client module. This method orchestrates the evaluation process by creating an evaluation run and triggering a dataset event. Each class's implementation includes error handling and raises a CustomException for failures. Additionally, a new code cell has been added to a Jupyter notebook to demonstrate how to use the Dataset.run_evals method with evaluation configurations.
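To make the new surface concrete, here is a minimal sketch of the request body `Dataset.run_evals` might assemble. `build_run_evals_request` is a hypothetical helper; the field names (`source_column`, `destination_field`, `isConditionalNode`) are taken from this PR's notebook example, and the exact wire schema is an assumption.

```python
# Sketch only: field names come from this PR's examples; the exact
# wire schema is an assumption, not a frozen API.
from typing import Any, Dict, List

def build_run_evals_request(
    eval_configs: List[Dict[str, Any]],
    dataset_id: str,
    is_conditional_node: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate inputs and assemble the POST body."""
    if not eval_configs:
        raise ValueError("eval_configs cannot be empty")
    return {
        "dataset_id": dataset_id,
        "eval_configs": eval_configs,
        "isConditionalNode": is_conditional_node,
    }

request = build_run_evals_request(
    eval_configs=[{
        "source_column": "run_prompt",
        "source_column_type": "string",
        "destination_field": "text",
    }],
    dataset_id="ds-123",
)
```

Validating `eval_configs` up front (as one of the review comments below also suggests) keeps an empty list from reaching the API at all.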

Changes

| File | Change Summary |
| --- | --- |
| `athina_client/.../athina_api_service.py` | Added a new static method `run_evals` to orchestrate evaluations, along with internal methods `_create_eval_run` (POST to `/api/v1/eval_run`) and `_create_dataset_event` (POST to `/api/v1/dataset_event/{datasetId}`), both with error handling. |
| `athina_client/.../dataset.py` | Added a new static method `run_evals` to trigger evaluation runs for datasets, with error handling. |
| `examples/.../datasets.ipynb` | Added a new code cell demonstrating `Dataset.run_evals` with evaluation configurations and error handling. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Client as Client
  participant Service as AthinaApiService
  participant Dataset as Dataset
  participant EvalRunAPI as EvalRun API (/api/v1/eval_run)
  participant DatasetEventAPI as DatasetEvent API (/api/v1/dataset_event/{datasetId})

  Client->>Service: call run_evals(eval_configs, dataset_id, is_conditional_node)
  Service->>EvalRunAPI: POST request to create evaluation run
  EvalRunAPI-->>Service: Return response (Status, Data)
  alt Success (Status 200/201)
    Service->>DatasetEventAPI: POST request to create dataset event
    DatasetEventAPI-->>Service: Return dataset event response
    Service-->>Client: Return dataset event response
  else Failure
    Service-->>Client: Raise CustomException
  end

  Client->>Dataset: call run_evals(dataset_id, eval_configs, is_conditional_node)
  Dataset->>DatasetEventAPI: POST request to trigger dataset event
  DatasetEventAPI-->>Dataset: Return dataset event response
  alt Success
    Dataset-->>Client: Return dataset event response
  else Failure
    Dataset-->>Client: Raise CustomException
  end
```
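The service-side flow in the diagram can be sketched in plain Python with an injected `post` callable, so it runs without a live backend. This is a sketch under the assumption that the service first creates the eval run and then posts a dataset event referencing it; the local `CustomException` stand-in, the `"run_evals"` event type, and `fake_post` are illustrative only.

```python
# Illustrative sketch of the two-step orchestration shown in the diagram.
# Endpoint paths come from the PR; everything else is assumed.
from typing import Any, Callable, Dict, List

class CustomException(Exception):
    """Local stand-in for athina_client.errors.CustomException."""
    def __init__(self, error: str, details: str):
        super().__init__(f"{error}: {details}")

def run_evals(
    post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    eval_configs: List[Dict[str, Any]],
    dataset_id: str,
    is_conditional_node: bool = False,
) -> Dict[str, Any]:
    # Step 1: create the eval run.
    eval_run = post("/api/v1/eval_run", {
        "create_eval_run": True,
        "eval_configs": eval_configs,
        "dataset_id": dataset_id,
    })
    # Step 2: trigger the dataset event referencing the new run.
    return post(f"/api/v1/dataset_event/{dataset_id}", {
        "event_type": "run_evals",  # assumed event name
        "payload": {"eval_run_id": eval_run["id"]},
        "isConditionalNode": is_conditional_node,
    })

# Exercise the flow with an in-memory fake transport.
calls = []
def fake_post(path, body):
    calls.append(path)
    return {"id": "run-1", "status": "queued"}

result = run_evals(fake_post, [{"source_column": "run_prompt"}], "ds-123")
```

Injecting the transport keeps the orchestration testable; the real implementation would use `requests.post` with the service's headers instead.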

Poem

Hopping through lines with joyful might,
I scurry and munch on bugs in sight.
A new method to run and create,
With careful hops, we celebrate fate.
In every call, a dance so bright,
Code and rabbits, a wondrous sight!
🐰✨



@codyathina codyathina requested a review from vivek-athina April 10, 2025 01:06

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
athina_client/services/athina_api_service.py (3)

664-673: Consider adding input validation for parameters

The method correctly sets up the API request, but it could benefit from basic parameter validation.

Consider validating that eval_configs is not empty before making the API call, as sending an empty list might result in unexpected behavior:

```diff
 def create_eval_run(create_eval_run: bool, eval_configs: List[Dict[str, Any]], dataset_id: Optional[str] = None):
     """
     Wrapper for POST /api/v1/eval_run

     Parameters:
     - create_eval_run (bool): Flag indicating whether to create a new eval run.
     - eval_configs (List[Dict]): Evaluation configurations.
     - dataset_id (Optional[str]): Optional dataset ID.

     Returns:
     - eval_run_id (str): The ID of the created evaluation run.
     """
     try:
+        if not eval_configs:
+            raise CustomException("eval_configs cannot be empty", "Please provide at least one evaluation configuration")
         endpoint = f"{AthinaApiService._base_url()}/api/v1/eval_run"
```

676-679: Maintain consistent error handling pattern with other methods

For consistency with other methods in the class, consider handling 401 errors separately to provide the specific "check your API key" message.

```diff
-            if response.status_code not in [200, 201]:
+            if response.status_code == 401:
+                error_message = response_json.get("error", "Unknown Error")
+                details_message = "please check your athina api key and try again"
+                raise CustomException(error_message, details_message)
+            elif response.status_code not in [200, 201]:
                 error_message = response_json.get("error", "Unknown Error")
                 details_message = response_json.get("details", {}).get("message", "No details")
                 raise CustomException(error_message, details_message)
```

Apply the same change to the create_dataset_event method for consistency.


652-661: Enhance docstring to match style of other methods in the class

The current docstring is minimal compared to other methods in the class.

Consider expanding the docstring to be more descriptive and include information about potential exceptions, similar to other methods:

```diff
 def create_eval_run(create_eval_run: bool, eval_configs: List[Dict[str, Any]], dataset_id: Optional[str] = None):
     """
-    Wrapper for POST /api/v1/eval_run
+    Creates an evaluation run by calling the Athina API.

     Parameters:
     - create_eval_run (bool): Flag indicating whether to create a new eval run.
     - eval_configs (List[Dict]): Evaluation configurations.
     - dataset_id (Optional[str]): Optional dataset ID.

     Returns:
-    - eval_run_id (str): The ID of the created evaluation run.
+    - The ID of the created evaluation run.
+
+    Raises:
+    - CustomException: If the API call fails or returns an error.
     """
```

Apply a similar improvement to the create_dataset_event method's docstring.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59ae448 and fca427a.

📒 Files selected for processing (1)
  • athina_client/services/athina_api_service.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
athina_client/services/athina_api_service.py (1)
athina_client/errors/exceptions.py (1)
  • CustomException (5-16)
🪛 Ruff (0.8.2)
athina_client/services/athina_api_service.py

650-650: Undefined name Optional

(F821)


683-683: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


685-685: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


689-689: Undefined name Optional

(F821)


722-722: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


724-724: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

Comment on lines 687 to 724
```python
@staticmethod
@retry(stop_max_attempt_number=2, wait_fixed=1000)
def create_dataset_event(dataset_id: str, event_type: str, payload: Dict[str, Any], is_conditional_node: Optional[bool] = False):
    """
    Wrapper for POST /api/v1/dataset_event/{datasetId}

    Parameters:
    - dataset_id (str): The dataset ID.
    - event_type (str): Type of the event.
    - payload (Dict): Payload associated with the event.
    - is_conditional_node (bool, optional): Flag indicating if the node is conditional.

    Returns:
    - Dataset event response data.
    """
    try:
        endpoint = f"{AthinaApiService._base_url()}/api/v1/dataset_event/{dataset_id}"
        response = requests.post(
            endpoint,
            headers=AthinaApiService._headers(),
            json={
                "event_type": event_type,
                "payload": payload,
                "isConditionalNode": is_conditional_node
            }
        )
        response_json = response.json()

        if response.status_code not in [200, 201]:
            error_message = response_json.get("error", "Unknown Error")
            details_message = response_json.get("details", {}).get("message", "No details")
            raise CustomException(error_message, details_message)

        return response_json["data"]
    except requests.RequestException as e:
        raise CustomException("Request failed", str(e))
    except Exception as e:
        raise CustomException("Unexpected error occurred", str(e))
```

🛠️ Refactor suggestion

Improve exception handling and maintain consistency in implementation style

The implementation of this method is good, but there are similar issues as in the previous method:

  1. Exception handling should use raise ... from err pattern
  2. The docstring could be more descriptive, similar to other methods in the class
```diff
-            raise CustomException("Request failed", str(e))
+            raise CustomException("Request failed", str(e)) from e
-            raise CustomException("Unexpected error occurred", str(e))
+            raise CustomException("Unexpected error occurred", str(e)) from e
```

Also, consider standardizing how response status codes are checked. Some methods in the class check for 401 separately before checking other error codes, while these new methods use a different pattern.
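One way to implement that standardization is a single shared status-check helper used by every wrapper. `raise_for_error` and the local `CustomException` stand-in below are hypothetical, sketched only to illustrate the consistent 401 handling the reviewer describes.

```python
# Hypothetical helper sketching the reviewer's suggestion: one shared
# status-code check so every wrapper treats 401 the same way.
from typing import Any, Dict

class CustomException(Exception):
    """Local stand-in for athina_client.errors.CustomException."""
    def __init__(self, error: str, details: str):
        self.error = error
        self.details = details
        super().__init__(f"{error}: {details}")

def raise_for_error(status_code: int, response_json: Dict[str, Any]) -> None:
    """Raise CustomException for non-2xx responses, with a dedicated 401 message."""
    if status_code in (200, 201):
        return
    error_message = response_json.get("error", "Unknown Error")
    if status_code == 401:
        details_message = "please check your athina api key and try again"
    else:
        details_message = response_json.get("details", {}).get("message", "No details")
    raise CustomException(error_message, details_message)

# Each wrapper would then call:
#     raise_for_error(response.status_code, response_json)
```

Centralizing the check means a future change (say, handling 429 retries) lands in one place instead of every method.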

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
```python
@staticmethod
@retry(stop_max_attempt_number=2, wait_fixed=1000)
def create_dataset_event(dataset_id: str, event_type: str, payload: Dict[str, Any], is_conditional_node: Optional[bool] = False):
    """
    Wrapper for POST /api/v1/dataset_event/{datasetId}

    Parameters:
    - dataset_id (str): The dataset ID.
    - event_type (str): Type of the event.
    - payload (Dict): Payload associated with the event.
    - is_conditional_node (bool, optional): Flag indicating if the node is conditional.

    Returns:
    - Dataset event response data.
    """
    try:
        endpoint = f"{AthinaApiService._base_url()}/api/v1/dataset_event/{dataset_id}"
        response = requests.post(
            endpoint,
            headers=AthinaApiService._headers(),
            json={
                "event_type": event_type,
                "payload": payload,
                "isConditionalNode": is_conditional_node
            }
        )
        response_json = response.json()

        if response.status_code not in [200, 201]:
            error_message = response_json.get("error", "Unknown Error")
            details_message = response_json.get("details", {}).get("message", "No details")
            raise CustomException(error_message, details_message)

        return response_json["data"]
    except requests.RequestException as e:
        raise CustomException("Request failed", str(e)) from e
    except Exception as e:
        raise CustomException("Unexpected error occurred", str(e)) from e
```

@vivek-athina vivek-athina requested review from Akhilathina and removed request for vivek-athina April 10, 2025 02:35
Contributor

@vivek-athina vivek-athina left a comment


We are not supposed to expose internal concepts to the user in this feature request.

In fact, just think about how a user outside Athina would understand `dataset_event` and `eval_run`.

@codyathina codyathina requested a review from vivek-athina April 11, 2025 14:08

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
athina_client/datasets/dataset.py (1)

1-5: ⚠️ Potential issue

Missing import for CustomException.

The run_evals method raises a CustomException, but this class isn't imported in the file. Make sure to import it to avoid runtime errors.

```diff
 from typing import Any, Dict, List, Optional
 from dataclasses import dataclass, field
 from athina_client.services import AthinaApiService
 from athina_client.constants import MAX_DATASET_ROWS
+from athina_client.exceptions import CustomException
```
🧹 Nitpick comments (3)
examples/datasets.ipynb (2)

212-218: Consider enhancing documentation for eval configurations.

The eval_configs structure is introduced without explanation of its purpose and structure. Adding a comment explaining what each field represents would improve usability for developers who are new to the API.

```diff
 eval_configs = [
     {
+        # Maps the column "run_prompt" from the dataset to the "text" field in the evaluation
         "source_column":"run_prompt",
         "source_column_type":"string",
         "destination_field":"text"
     }
 ]
```

222-222: Add response structure details.

The code prints the raw response without explaining its structure. Consider adding a more detailed handling of the response to help users understand what to expect.

```diff
-    print("Eval run triggered successfully:", response)
+    print("Eval run triggered successfully!")
+    print(f"Evaluation ID: {response.get('id', 'N/A')}")
+    print(f"Status: {response.get('status', 'N/A')}")
+    print("Full response:", response)
```
athina_client/datasets/dataset.py (1)

370-397: Add usage example to docstring.

The method has a comprehensive docstring, but adding a usage example would improve developer experience, similar to what you've done for the update_cells method.

````diff
         Returns:
             Dict[str, Any]: The response from the dataset event trigger.

         Raises:
             CustomException: If the evaluation trigger fails.
+
+        Example:
+            ```python
+            eval_configs = [
+                {
+                    "source_column": "run_prompt",
+                    "source_column_type": "string",
+                    "destination_field": "text"
+                }
+            ]
+            result = Dataset.run_evals("dataset-123", eval_configs)
+            ```
         """
````
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 8f8fe0c and c19562d.

📒 Files selected for processing (2)
  • athina_client/datasets/dataset.py (1 hunks)
  • examples/datasets.ipynb (1 hunks)
🔇 Additional comments (2)
examples/datasets.ipynb (1)

221-222: Ensure variable visibility in example.

The notebook uses dataset.id which refers to the dataset created in the previous cell. If someone runs this cell independently, they might encounter an error if dataset isn't defined.

Consider adding a fallback option or clarifying comment:

```diff
-    response = Dataset.run_evals(eval_configs=eval_configs, dataset_id=dataset.id)
+    # Make sure dataset.id is defined from a previous cell or use a specific dataset_id
+    dataset_id = dataset.id if 'dataset' in locals() else "your-dataset-id-here"
+    response = Dataset.run_evals(eval_configs=eval_configs, dataset_id=dataset_id)
```
athina_client/datasets/dataset.py (1)

391-395: LGTM - Implementation correctly delegates to AthinaApiService.

The implementation correctly delegates to the AthinaApiService.run_evals() method with all the required parameters and proper error handling.

Contributor

@vivek-athina vivek-athina left a comment


Let's park this PR for now

@vivek-athina vivek-athina marked this pull request as draft April 12, 2025 07:10