Commit c483e69

DOC Add TODO items, update comments and improve code comments for clarity
1 parent 6e41cf7 commit c483e69

2 files changed: 42 additions, 1 deletion

evaluation/README.md

Lines changed: 35 additions & 0 deletions
@@ -1,6 +1,8 @@

 # Evaluations

+#TODO: OpenAI evals documentation: https://platform.openai.com/docs/guides/evals
+
 ## LLM Output Evaluator

 The `evals` script evaluates the outputs of Large Language Models (LLMs) and estimates the associated token usage and cost.
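
As a rough illustration of the token usage and cost estimate mentioned in the README context above, a minimal sketch follows. It does not reproduce the `evals` script itself: the `Pricing` class, the `estimate_cost` function, and the per-million-token prices are hypothetical placeholders chosen for the example, not real rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (placeholders, not real rates)."""

    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the estimated cost in USD for a single request."""
    input_cost = input_tokens / 1_000_000 * pricing.input_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_per_million
    return input_cost + output_cost


# Assumed example values; replace them with the provider's published prices.
example_pricing = Pricing(input_per_million=1.0, output_per_million=4.0)
cost = estimate_cost(input_tokens=2_000, output_tokens=500, pricing=example_pricing)
print(f"Estimated cost: ${cost:.6f}")
```

Keeping the prices in a small data class makes it straightforward to swap in one entry per model when comparing several handlers.
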
@@ -12,6 +14,8 @@ It supports batch evaluation via a configuration CSV and produces a detailed met
 This script evaluates LLM outputs using the `lighteval` library:
 https://huggingface.co/docs/lighteval/en/metric-list#automatic-metrics-for-generative-tasks

+##TODO: Use uv to execute scripts without manually managing environments https://docs.astral.sh/uv/guides/scripts/
+
 Ensure you have the `lighteval` library and any model SDKs (e.g., OpenAI) configured properly.


@@ -138,6 +142,37 @@ df_grouped = df_grouped.rename(columns={'formatted_chunk': 'concatenated_chunks'
 df_grouped.to_csv('~/Desktop/formatted_chunks.csv', index=False)
 ```

+```
+echo 'export PATH="/Applications/Postgres.app/Contents/Versions/latest/bin:$PATH"' >> ~/.zshrc
+source ~/.zshrc
+
+createdb backupDBBalancer07012025
+pg_restore -v -d backupDBBalancer07012025 ~/Downloads/backupDBBalancer07012025.sql
+
+pip install psycopg2-binary
+
+from sqlalchemy import create_engine
+import pandas as pd
+
+# Alternative: Standard psycopg2 connection (if you get psycopg2 working)
+# engine = create_engine("postgresql://sahildshah@localhost:5432/backupDBBalancer07012025")
+
+# Fixed the variable name (was "database query", now "query")
+query = "SELECT * FROM api_embeddings;"
+
+# Execute the query and load into DataFrame
+df = pd.read_sql(query, engine)
+
+df['formatted_chunk'] = df.apply(lambda row: f"ID: {row['chunk_number']} | CONTENT: {row['text']}", axis=1)
+# Ensure the chunks are joined in order of chunk_number by sorting the DataFrame before grouping and joining
+df = df.sort_values(by=['name', 'upload_file_id', 'chunk_number'])
+df_grouped = df.groupby(['name', 'upload_file_id'])['formatted_chunk'].apply(lambda chunks: "\n".join(chunks)).reset_index()
+df_grouped = df_grouped.rename(columns={'formatted_chunk': 'concatenated_chunks'})
+df_grouped.to_csv('~/Desktop/formatted_chunks.csv', index=False)
+```
+
+
+
 - Path where the evaluation results will be saved

 import pandas as pd
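
The block added to the README mixes shell commands with Python in one fence, and its only `engine = create_engine(...)` line is commented out, so `pd.read_sql(query, engine)` would fail as written. The sketch below isolates the chunk-formatting step so it can be tried without a database. The in-memory rows and the `concatenate_chunks` helper are hypothetical stand-ins for the `api_embeddings` table; the column names are taken from the diff.

```python
import pandas as pd

# Hypothetical rows standing in for the api_embeddings table.
df = pd.DataFrame(
    {
        "name": ["guide.pdf", "guide.pdf", "notes.pdf"],
        "upload_file_id": [1, 1, 2],
        "chunk_number": [2, 1, 1],
        "text": ["second chunk", "first chunk", "only chunk"],
    }
)


def concatenate_chunks(frame: pd.DataFrame) -> pd.DataFrame:
    """Join each file's chunks, in chunk_number order, into one string per file."""
    # Sort first so that the join below sees chunks in ascending order.
    ordered = frame.sort_values(by=["name", "upload_file_id", "chunk_number"]).copy()
    ordered["formatted_chunk"] = (
        "ID: " + ordered["chunk_number"].astype(str) + " | CONTENT: " + ordered["text"]
    )
    grouped = (
        ordered.groupby(["name", "upload_file_id"])["formatted_chunk"]
        .apply(lambda chunks: "\n".join(chunks))
        .reset_index()
    )
    return grouped.rename(columns={"formatted_chunk": "concatenated_chunks"})


df_grouped = concatenate_chunks(df)
print(df_grouped.to_string(index=False))
```

Once a working engine is available, the same helper could be applied to the result of `pd.read_sql(query, engine)` before writing the CSV.
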

server/api/services/llm_services.py

Lines changed: 7 additions & 1 deletion
@@ -16,6 +16,8 @@ def handle_request(
 ) -> tuple[str, dict[str, int], dict[str, float], float]:
 pass

+# LLM Pricing Calculator: https://www.llm-prices.com/
+
 # Anthropic Model Pricing: https://docs.anthropic.com/en/docs/about-claude/pricing#model-pricing

 class GPT4OMiniHandler(BaseModelHandler):
@@ -78,7 +80,7 @@ class GPT41NanoHandler(BaseModelHandler):

 # Instructions

-- Identify decision points for bipolar medications
+- Identify decision points for bipolar medications #TODO: "pharmacological and procedural interventions"

 - For each decision point you find, return a JSON object using the following format:

@@ -88,11 +90,15 @@ class GPT41NanoHandler(BaseModelHandler):
 "medications": ["<medication 1>", "<medication 2>", ...],
 "reason": "<short explanation for why this criterion applies>",
 "sources": ["<ID-X>"]
+"hierarchy": Primary: Contraindications for allergies
+"override" Exclude for allergy
 }


 - Only extract bipolar medication decision points that are explicitly stated or strongly implied in the context and never rely on your own knowledge

+- TODO: Test against medication indication file
+
 # Output Format

 - Return the extracted bipolar medication decision points as a JSON array and if no decision points are found in the context return an empty array
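
Because the prompt asks the model for a JSON array of decision points, a small validator can catch malformed responses before they reach downstream code. The sketch below is an assumption, not part of this commit: `parse_decision_points` and `REQUIRED_KEYS` are hypothetical names, and only the three keys visible in this hunk (`medications`, `reason`, `sources`) are checked. Keys outside the hunk and the draft `hierarchy` and `override` lines are left unvalidated.

```python
import json

# Keys visible in the diff hunk, mapped to the JSON type each should hold.
REQUIRED_KEYS = {"medications": list, "reason": str, "sources": list}


def parse_decision_points(raw: str) -> list[dict]:
    """Parse a model response and check the fields shown in the prompt format."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of decision points")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not a JSON object")
        for key, expected_type in REQUIRED_KEYS.items():
            if not isinstance(item.get(key), expected_type):
                raise ValueError(
                    f"item {index}: {key!r} must be a {expected_type.__name__}"
                )
    return data


# Hypothetical response using placeholder values only.
sample = (
    '[{"medications": ["medication A"], '
    '"reason": "placeholder reason", "sources": ["ID-3"]}]'
)
points = parse_decision_points(sample)
print(points[0]["sources"])
```

An empty array passes validation, which matches the instruction to return an empty array when no decision points are found.
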
