Refactor: benchmark dataset too small for meaningful accuracy measurement #83

@Mar10-Labs

Description

The current benchmark dataset contains 20 synthetic traces. That is enough to validate basic functionality, but too few to measure real-world accuracy or to compare prompt strategies reliably.

Goal: expand the dataset to at least 50–100 traces covering edge cases and mixed-signal scenarios.
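
To make the sample-size concern concrete, here is a minimal sketch of how wide a 95% confidence interval on accuracy is at different dataset sizes. It uses the standard Wilson score interval. The 80% observed accuracy is purely an illustrative assumption, not a measured result from this benchmark, and the function names are hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def interval_width(correct: int, total: int) -> float:
    """Width of the Wilson interval; smaller means a more precise estimate."""
    low, high = wilson_interval(correct, total)
    return high - low


# ASSUMPTION: 80% observed accuracy, chosen only for illustration.
for n in (20, 50, 100):
    correct = round(0.8 * n)
    low, high = wilson_interval(correct, n)
    print(f"n={n:3d}  accuracy={correct / n:.2f}  95% CI=({low:.3f}, {high:.3f})")
```

Under that assumed accuracy, the interval at 20 traces spans about 34 percentage points, compared with about 16 at 100 traces. Two prompt strategies that differ by a few points of accuracy would be hard to tell apart at the current size.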

Metadata

Assignees

No one assigned

    Labels

    refactor: Code that needs cleanup or restructuring
    testing: Related to tests, benchmarks or test data

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests