@ekin-aisi
Collaborator

This PR contains:

  • New features

What is the current behavior? (You can also link to an open issue here)

Reading large eval log files is slow, particularly when extracting messages and events fields from samples. The current implementation uses Pydantic validation for all fields, which adds significant overhead.

What is the new behavior?

Added a read_eval_log_as_json() function that:

  • Bypasses Pydantic validation entirely for faster field extraction
  • Uses multiprocessing to parallelize reading of sample files from eval-formatted log files
  • Provides a 10-20x speedup for extracting the messages and events fields from large eval logs
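
For reviewers who want a feel for the approach, here is a minimal sketch of the general technique, not the code in this PR. It assumes the eval log is a zip archive with one JSON file per sample under a samples/ prefix. The helper names (list_sample_members, read_sample_fields, read_fields_parallel) are hypothetical and are not part of inspect_ai. Each worker parses its sample with plain json, so no Pydantic models are built, and keeps only the requested fields.

```python
import json
import zipfile
from concurrent.futures import ProcessPoolExecutor
from functools import partial

# Fields to pull out of each sample (the two named in this PR).
FIELDS = ("messages", "events")


def list_sample_members(log_path: str) -> list[str]:
    """Return archive members that look like per-sample JSON files.

    Assumption: samples are stored as samples/*.json inside the archive.
    """
    with zipfile.ZipFile(log_path) as zf:
        return [
            name
            for name in zf.namelist()
            if name.startswith("samples/") and name.endswith(".json")
        ]


def read_sample_fields(log_path: str, member: str, fields=FIELDS) -> dict:
    """Parse one sample with plain json (no validation) and keep selected fields."""
    # Each call opens the archive itself, so it is safe to run in a separate process.
    with zipfile.ZipFile(log_path) as zf:
        raw = json.loads(zf.read(member))
    return {name: raw.get(name) for name in fields}


def read_fields_parallel(log_path: str, fields=FIELDS, workers: int = 4) -> list[dict]:
    """Read every sample file, optionally spreading the work over processes."""
    members = list_sample_members(log_path)
    worker = partial(read_sample_fields, log_path, fields=fields)
    if workers <= 1:
        # Serial fallback, handy for small logs and for debugging.
        return [worker(member) for member in members]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() preserves the order of `members`.
        return list(pool.map(worker, members))
```

When calling the parallel path from a script, guard the call with `if __name__ == "__main__":` so it also works on platforms that spawn worker processes. The results are plain dicts and lists, so callers that need typed objects should keep using read_eval_log().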

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No breaking changes. This adds a new optional function alongside existing functionality. Users can continue using the validated read_eval_log() when full validation is needed, or opt into read_eval_log_as_json() when performance is critical and validation can be skipped.
