Introduced the enum OutputFormat to validate the output format type by dantuzi · Pull Request #19 · SeaseLtd/llm-search-quality-evaluation

dantuzi · 2025-12-30T09:25:03Z

Notes

The dataset generator can export the dataset to Quepid, RRE, and MTEB. The export output format is managed internally as a string, which could cause possible issues due to the lack of proper type checking.

Changes

Introduced a new Enum OutputFormat
Replaced all occurrences of the string quepid, rre, mteb with the proper enum value

Tests

I executed all unit tests
I generated a small dataset with no errors

Copilot

Pull request overview

This PR introduces type safety for output format handling by replacing string-based validation with a dedicated OutputFormat enum. This change reduces the risk of typos and provides better IDE support through autocomplete and type checking.

Introduced new OutputFormat enum with three supported formats: QUEPID, RRE, and MTEB
Updated all references from string literals to enum values across production and test code
Refactored WriterFactory registry to use enum keys instead of strings

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
src/llm_search_quality_evaluation/shared/models/output_format.py	New enum class defining the three supported output formats (quepid, rre, mteb)
src/llm_search_quality_evaluation/shared/writers/writer_config.py	Updated output_format field type from Literal to OutputFormat enum; removed unused Literal import
src/llm_search_quality_evaluation/shared/writers/writer_factory.py	Updated OUTPUT_FORMAT_REGISTRY to use OutputFormat enum keys instead of string keys
src/llm_search_quality_evaluation/shared/writers/init.py	Added OutputFormat to module exports
src/llm_search_quality_evaluation/dataset_generator/config.py	Updated output_format field type to OutputFormat; refactored validation logic to use enum comparison
src/llm_search_quality_evaluation/dataset_generator/main.py	Updated MTEB format check to use enum comparison instead of string
src/llm_search_quality_evaluation/vector_search_doctor/approximate_search_evaluator/main.py	Updated WriterConfig instantiation to use OutputFormat.RRE
tests/llm_search_quality_evaluation/test_cross_plataform.py	Updated test to use OutputFormat.QUEPID enum value
tests/llm_search_quality_evaluation/shared/writers/test_quepid_writer.py	Updated fixture to use OutputFormat.QUEPID enum value
tests/llm_search_quality_evaluation/shared/writers/test_rre_writer.py	Updated fixture to use OutputFormat.RRE enum value
tests/llm_search_quality_evaluation/shared/writers/test_mteb_writer.py	Updated fixture to use OutputFormat.MTEB enum value
tests/llm_search_quality_evaluation/dataset_generator/test_main_autosave.py	Updated Config instantiation to use OutputFormat.QUEPID
tests/llm_search_quality_evaluation/dataset_generator/test_config_dataset_generator.py	Updated assertions to compare against enum values instead of strings

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-30T09:28:38Z

src/llm_search_quality_evaluation/dataset_generator/config.py

            raise ValueError("id_field is required when output_format='rre'")
-        elif self.output_format == "rre" and not self.rre_query_placeholder:
+        elif not self.rre_query_placeholder:
            raise ValueError("rre_query_placeholder is required when output_format='rre'")
-        elif self.output_format == "rre" and not self.rre_query_template and not self.query_template:
+        elif not self.rre_query_template and not self.query_template:
            raise ValueError("At least one query template is required when output_format='rre'")


The error messages still reference the old string format 'rre' instead of the enum. Consider updating these error messages to be consistent with the enum-based approach. For example, change "output_format='rre'" to "output_format=OutputFormat.RRE" or simply "output_format is RRE".

Copilot · 2025-12-30T09:28:39Z

src/llm_search_quality_evaluation/shared/models/output_format.py

+    """Supported output formats for dataset generation."""
+    QUEPID = "quepid"
+    RRE = "rre"


Consider adding docstrings to each enum member to describe what each output format is used for. For example:

QUEPID: CSV format for Quepid evaluation tool

RRE: Format for Rated Ranking Evaluator

MTEB: Format for Massive Text Embedding Benchmark

This would improve code documentation and make it clearer for developers what each format represents.

Suggested change

"""Supported output formats for dataset generation."""

QUEPID = "quepid"

RRE = "rre"

"""Supported output formats for dataset generation."""

# CSV format for Quepid evaluation tool

QUEPID = "quepid"

# Format for Rated Ranking Evaluator (RRE)

RRE = "rre"

# Format for Massive Text Embedding Benchmark (MTEB)

Introduced the enum OutputFormat to validate the output format type

fe1a1d0

dantuzi requested review from Copilot and nicolo-rinaldi December 30, 2025 09:25

Copilot started reviewing on behalf of dantuzi December 30, 2025 09:25 View session

Copilot AI reviewed Dec 30, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduced the enum OutputFormat to validate the output format type#19

Introduced the enum OutputFormat to validate the output format type#19
dantuzi wants to merge 1 commit intomainfrom
refactoring

dantuzi commented Dec 30, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 30, 2025

Uh oh!

Copilot AI Dec 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

dantuzi commented Dec 30, 2025

Notes

Changes

Tests

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 30, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants