Update evaluation README with metrics and API usage details, and add 'Contributing' section

sahilds1 · sahilds1 · commit d1dd75c2c927 · 2025-07-15T10:25:10.000-04:00
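
Reviewer note: the three extractiveness metrics named in the README hunk below (coverage, density, compression) can be illustrated with a short sketch. This is not the `evals` script's implementation. It is a minimal greedy fragment matcher written from the README's one-line definitions, assuming lowercase whitespace tokenization; the function names `extractive_fragments` and `extractiveness` are hypothetical.

```python
def extractive_fragments(article_tokens, summary_tokens):
    """Greedily find, for each summary position, the longest run of
    tokens that also appears contiguously in the article."""
    fragments = []
    i = 0
    while i < len(summary_tokens):
        best = 0
        j = 0
        while j < len(article_tokens):
            if summary_tokens[i] == article_tokens[j]:
                # Extend the match as far as both sequences agree.
                k = 0
                while (i + k < len(summary_tokens)
                       and j + k < len(article_tokens)
                       and summary_tokens[i + k] == article_tokens[j + k]):
                    k += 1
                best = max(best, k)
                j += k
            else:
                j += 1
        if best > 0:
            fragments.append(summary_tokens[i:i + best])
        # Skip past the matched fragment, or past one unmatched token.
        i += max(best, 1)
    return fragments


def extractiveness(article, summary):
    """Return coverage, density and compression for one article/summary pair.

    Assumption: tokens are lowercase whitespace-separated words.
    """
    article_tokens = article.lower().split()
    summary_tokens = summary.lower().split()
    n = len(summary_tokens)
    if n == 0:
        return {"coverage": 0.0, "density": 0.0, "compression": 0.0}
    fragments = extractive_fragments(article_tokens, summary_tokens)
    return {
        # Share of summary words that sit inside an extractive fragment.
        "coverage": sum(len(f) for f in fragments) / n,
        # Average length of the fragment each summary word belongs to.
        "density": sum(len(f) ** 2 for f in fragments) / n,
        # Word ratio between the article and the summary.
        "compression": len(article_tokens) / n,
    }
```

For example, `extractiveness("the cat sat on the mat", "the cat on the mat")` finds the fragments `the cat` and `on the mat`, so every summary word is covered and density is the length-weighted average of fragment lengths 2 and 3.
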
diff --git a/evaluation/README.md b/evaluation/README.md
@@ -2,23 +2,23 @@
 
 ## `evals`: LLM evaluations to test and improve model outputs
 
-LLM evals test a prompt with a set of test data by scoring each item in the data set
-
-To test Balancer's structured text extraction of medication rules, `evals` computes:
+### Metrics
 
 [Extractiveness](https://huggingface.co/docs/lighteval/en/metric-list#automatic-metrics-for-generative-tasks):
 
+Natural Language Generation Performance:
+
 * Extractiveness Coverage: 
     - Percentage of words in the summary that are part of an extractive fragment with the article
 * Extractiveness Density: 
     - Average length of the extractive fragment to which each word in the summary belongs
 * Extractiveness Compression: 
     - Word ratio between the article and the summary
 
-API usage:
+API Performance:
 
-* Token usage (input/output)
-* Estimated cost in USD
+* Token Usage (input/output)
+* Estimated Cost in USD
 * Duration (in seconds)
 
 ### Test Data
@@ -152,4 +152,7 @@ for i, metric in enumerate(all_metrics):
 
 plt.tight_layout()
 plt.show()
-```
+
+```
+
+### Contributing