At the moment, we run our benchmarking code manually to assess model changes and guard against regressions. This is imperfect because:
- It relies on someone manually checking match rates against our ground-truth datasets.
- It’s time-consuming to run the benchmarks against more than one dataset when deploying changes.
- We don’t have a complete history of model performance over time.
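The manual check in the first bullet boils down to computing a match rate against a ground-truth dataset. A minimal sketch of that calculation (the function name and exact-equality notion of a "match" are illustrative assumptions, not our actual benchmarking code):

```python
def match_rate(predictions, ground_truth):
    """Fraction of predictions that exactly match the ground-truth labels.

    Both arguments are equal-length sequences. A "match" here is exact
    equality, which is the simplest version of the check we currently
    perform by hand against each dataset.
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth must be the same length")
    matches = sum(p == g for p, g in zip(predictions, ground_truth))
    return matches / len(ground_truth)
```

Automating this per dataset, and recording the result on every release, is what would give us the missing history.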
Now that we have MLflow running in GitHub, it would make sense to set up a new repo (or workflow) that automatically runs model evaluation whenever we publish a new GitHub release, and logs the resulting accuracy to MLflow so we build up a performance history.
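As a rough idea of shape, the release trigger could be a GitHub Actions workflow along these lines. This is a sketch only: the script name (`evaluate.py`), dataset layout, and MLflow tracking secret are assumptions, not things that exist in the repo yet.

```yaml
# Sketch: .github/workflows/evaluate.yml (filename and contents are assumptions)
name: Model evaluation
on:
  release:
    types: [published]
jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - name: Run evaluation and log match rates to MLflow
        env:
          MLFLOW_TRACKING_URI: ${{ secrets.MLFLOW_TRACKING_URI }}
        run: python evaluate.py --datasets data/ground_truth/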