Evaluate re-ranker performance #2313

@paynejd

Description

Re-ranking is already available with a default language model (LM) that provides normalized (0..1), consistent scoring across a multi-algorithm candidate pool.
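For reference, here is a minimal sketch of what that re-ranking step looks like over a multi-algorithm candidate pool. The names (`Candidate`, `score_pair`, `rerank`) are illustrative placeholders, not the actual implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    concept_id: str
    display_name: str
    source_algorithm: str  # e.g. "lexical" or "semantic" retrieval
    raw_score: float       # algorithm-specific, not comparable across algorithms

def rerank(query: str, pool: list[Candidate],
           score_pair: Callable[[str, str], float]) -> list[tuple[Candidate, float]]:
    """Assign each candidate a normalized (0..1) LM score and sort by it,
    so results from different retrieval algorithms become directly comparable."""
    scored = [(c, score_pair(query, c.display_name)) for c in pool]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```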

This task is to design a straightforward evaluation comparing the performance of our default re-ranking LM to one or more alternative LMs (e.g., a medically trained LM) across several validation datasets; a rough sketch of one possible evaluation follows the decision points below. Decision points:

  • Is our current model good enough that we should make re-ranking available to all users?
  • Do users need the option to enable/disable re-ranking, or can it always be enabled?
  • Does a medically trained model give better results?
  • Is one LM for all projects sufficient to start? Or do different projects require different LMs for re-ranking to be useful?
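
One possible shape for the evaluation, assuming each validation dataset can be expressed as (query, candidate pool, relevant concept IDs) triples and each re-ranker follows the `rerank()` signature sketched above. The dataset format, metric choices, and model handles (`default_reranker`, `medical_reranker`) are assumptions to be refined, not decisions:

```python
from statistics import mean

def precision_at_k(ranked_ids: list[str], relevant: set[str], k: int = 1) -> float:
    """Fraction of the top-k ranked concept IDs that are relevant."""
    return sum(1 for cid in ranked_ids[:k] if cid in relevant) / k

def reciprocal_rank(ranked_ids: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant concept, or 0 if none is retrieved."""
    for rank, cid in enumerate(ranked_ids, start=1):
        if cid in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(reranker, dataset):
    """dataset: iterable of (query, candidate_pool, relevant_concept_ids)."""
    p1, mrr = [], []
    for query, pool, relevant in dataset:
        ranked = [c.concept_id for c, _score in reranker(query, pool)]
        p1.append(precision_at_k(ranked, relevant, k=1))
        mrr.append(reciprocal_rank(ranked, relevant))
    return {"precision@1": mean(p1), "MRR": mean(mrr)}

# Compare the default LM against a medically trained alternative on each
# validation dataset and inspect per-dataset deltas, e.g.:
# for name, dataset in validation_datasets.items():
#     print(name, evaluate(default_reranker, dataset), evaluate(medical_reranker, dataset))
```

Reporting the metrics per dataset (rather than a single pooled number) should also answer the last decision point: if the deltas vary widely by project, a single LM for all projects is probably not sufficient.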
