Description
We want to benchmark the inference pipeline to see how many documents it can successfully process on staging.
To do this, we would first create a benchmarking script that can run in any environment (local, staging, etc.), feed full texts into the classifier API, and track the results. After confirming the script works locally, we can then benchmark the staging server.
Implementation Considerations
Make a script that loops through the available full texts within COSMOS, up to a maximum of 5000, and sends them one by one to the classifier API.
It should record the returned job_ids and then poll the classifier to see how many classifications succeeded and how long they took. A rough sketch of this flow is given below.
You will need to reference the API documentation located here: https://github.com/NASA-IMPACT/llm-app-classifier-pipeline.
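As a starting point, here is a minimal sketch of the submit-and-poll loop described above. The endpoint paths, payload shape, and response field names (`/classify`, `/jobs/{job_id}`, `job_id`, `status`) are placeholders; the actual contract must be taken from the API documentation linked above.

```python
import time
import requests

CLASSIFIER_URL = "https://classifier.example.gov"  # placeholder; point at local or staging
MAX_DOCS = 5000

def submit_documents(full_texts):
    """Send each full text to the classifier and collect the returned job ids."""
    job_ids = []
    for text in full_texts[:MAX_DOCS]:
        # Endpoint and payload shape are assumptions; see the linked API docs.
        resp = requests.post(f"{CLASSIFIER_URL}/classify", json={"text": text}, timeout=30)
        resp.raise_for_status()
        job_ids.append(resp.json()["job_id"])
    return job_ids

def poll_jobs(job_ids, poll_interval=10, max_wait=3600):
    """Poll each job until it reaches a terminal state; return statuses and elapsed time."""
    start = time.monotonic()
    statuses = {}
    pending = set(job_ids)
    while pending and (time.monotonic() - start) < max_wait:
        for job_id in list(pending):
            # Status endpoint and field names are assumptions as well.
            resp = requests.get(f"{CLASSIFIER_URL}/jobs/{job_id}", timeout=30)
            status = resp.json().get("status", "unknown")
            if status in ("success", "failed"):
                statuses[job_id] = status
                pending.discard(job_id)
        if pending:
            time.sleep(poll_interval)
    # Jobs that never reached a terminal state within max_wait count as "unknown".
    for job_id in pending:
        statuses[job_id] = "unknown"
    elapsed = time.monotonic() - start
    return statuses, elapsed
```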
Deliverable
- stats on classification completion rates:
  - number of documents sent
  - number of documents in each status (failed, unknown, success)
- stats on classification throughput:
  - number of documents sent
  - total time taken to classify all the documents
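These numbers fall out directly from the polling results; a sketch of how the script could report them (following the placeholder names in the code above):

```python
from collections import Counter

def report(statuses, elapsed_seconds):
    """Print completion-rate and throughput stats from the polling results."""
    counts = Counter(statuses.values())
    total = len(statuses)
    print(f"documents sent: {total}")
    for status in ("success", "failed", "unknown"):
        print(f"  {status}: {counts.get(status, 0)}")
    print(f"total time: {elapsed_seconds:.1f}s")
    if elapsed_seconds > 0:
        print(f"throughput: {total / elapsed_seconds:.2f} docs/sec")
```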