Commit 1c64a63: first commit (0 parents; 31 files changed, +88062 / -0 lines)

README.md

Lines changed: 54 additions & 0 deletions

# AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning

## Introduction

**Background**: Clinical calculators play a vital role in healthcare by offering accurate evidence-based predictions for various purposes, such as diagnosis and prognosis. Nevertheless, their widespread utilization is often hindered by usability challenges and poor dissemination. Augmenting large language models (LLMs) with extensive collections of clinical calculators presents an opportunity to overcome these obstacles and improve workflow efficiency, but the scalability of manual curation and machine adoption poses a significant challenge.

**Methods**: We introduce AgentMD, a novel LLM-based autonomous agent capable of curating and applying calculators across various clinical contexts. Using the medical literature, AgentMD first curates a diverse set of clinical calculators with executable functions as an automatic tool maker. With the curated tools, AgentMD autonomously selects and applies the relevant clinical calculators to any given patient as a tool user.

**Findings**: As a tool maker, AgentMD has curated RiskCalcs, a large collection of 2,164 diverse clinical calculators with over 85% accuracy for quality checks and a pass rate of over 90% for unit tests. With regard to its tool-using capability, AgentMD significantly outperforms chain-of-thought prompting with GPT-4 (87.7% vs. 40.9% accuracy) on RiskQA, an end-to-end benchmark manually annotated in this work. Further evaluation results on 698 emergency department admission notes confirm that AgentMD accurately computes medical risks with real-world patient data at an individual level. Moreover, AgentMD can provide population-level insights for institutional risk management, as demonstrated using 9,822 patient notes from MIMIC-III.

**Interpretation**: Our study illustrates the exceptional capabilities of language agents to learn clinical calculators and to further utilize curated calculators for both individual patient care and at-scale healthcare analytics.

## Configuration

To run AgentMD, you first need to set up the OpenAI API, either directly through OpenAI or through Microsoft Azure. Here we use Microsoft Azure because it is compliant with the Health Insurance Portability and Accountability Act (HIPAA). Please set the environment variables accordingly:

```bash
export OPENAI_ENDPOINT=YOUR_AZURE_OPENAI_ENDPOINT_URL
export OPENAI_API_KEY=YOUR_AZURE_OPENAI_API_KEY
```
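
The evaluation scripts (for example, those in `./ed_evaluation`) build their client with `azure_endpoint=os.getenv("OPENAI_ENDPOINT")` and `api_key=os.getenv("OPENAI_API_KEY")`, so an unset variable is silently passed as `None` and the problem only shows up later as a less obvious client error. A minimal pre-flight check, assuming these two variables are the only ones required; the helper `missing_azure_vars` is hypothetical and not part of the repository:

```python
import os

# The two variables the evaluation scripts read via os.getenv().
REQUIRED_VARS = ("OPENAI_ENDPOINT", "OPENAI_API_KEY")


def missing_azure_vars(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the process environment; pass a dict to test it.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

For example, call `missing_azure_vars()` before launching a run and stop if it returns a non-empty list.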

The code has been tested with Python 3.9.13 on CentOS Linux release 7.9.2009 (Core). Install the required Python packages with:

```bash
pip install -r requirements.txt
```

## Instructions

This repository contains the evaluation scripts for three use cases of AgentMD:

- **Evaluation on the RiskQA dataset**. The RiskQA dataset, the RiskCalcs toolkit, and the evaluation code are available under `./riskqa_evaluation`. Please follow the instructions in `./riskqa_evaluation/README.md` to use AgentMD.
- **Evaluation with ED patients**. The AgentMD code for our experiments with Yale emergency department (ED) provider notes is under `./ed_evaluation`. Due to privacy concerns, we are not able to release the ED provider notes from Yale Medicine, so users need to run the code on their own clinical notes.
- **Evaluation with MIMIC patients**. The preprocessing code for MIMIC-III notes and the AgentMD code are available under `./mimic_evaluation`. To run the code, first download and preprocess the MIMIC-III dataset following the instructions in `./mimic_evaluation`.

## Acknowledgments

This research was supported by the NIH Intramural Research Program, National Library of Medicine, and 1K99LM014024.

## Disclaimer

This tool shows the results of research conducted in the Computational Biology Branch, NCBI/NLM. The information produced on this website is not intended for direct diagnostic use or medical decision-making without review and oversight by a clinical professional. Individuals should not change their health behavior solely on the basis of information produced on this website. NIH does not independently verify the validity or utility of the information produced by this tool. If you have questions about the information produced on this website, please see a health care professional. More information about NCBI's disclaimer policy is available.

## Citation

If you find this repo helpful, please cite AgentMD as follows:

```bibtex
@article{jin2024agentmd,
  title={AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning},
  author={Jin, Qiao and Wang, Zhizheng and Yang, Yifan and Zhu, Qingqing and Wright, Donald and Huang, Thomas and Wilbur, W John and He, Zhe and Taylor, Andrew and Chen, Qingyu and others},
  journal={arXiv preprint arXiv:2402.13225},
  year={2024}
}
```

ed_evaluation/README.md

Lines changed: 41 additions & 0 deletions

## Data

Due to privacy concerns, we cannot release the emergency department notes from Yale Medicine. This directory provides code that can run on user-provided clinical notes from emergency care.
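
Because the notes are not released, you need to supply your own file. The scripts read `dataset/notes_deidentified_verified.csv` (relative to `ed_evaluation`) with pandas and use two columns from it: `PAT_ENC_CSN_ID` as the note ID and `deid_text_combined` as the note text. A minimal sketch for writing notes in that layout with the standard `csv` module; the function name `write_notes_csv` and its `(note_id, text)` input shape are assumptions, not part of the repository:

```python
import csv

# Column names the ED scripts read from the notes CSV.
FIELDS = ["PAT_ENC_CSN_ID", "deid_text_combined"]


def write_notes_csv(notes, path="dataset/notes_deidentified_verified.csv"):
    """Write (note_id, deidentified_text) pairs as a two-column CSV.

    Multi-line note text is quoted by the csv writer, so it round-trips
    through pandas.read_csv. The texts should already be de-identified,
    since the scripts send them to the model endpoint.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for note_id, text in notes:
            writer.writerow({"PAT_ENC_CSN_ID": note_id, "deid_text_combined": text})
```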
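
## Step 1: AgentMD tool selection

Each script takes the Azure OpenAI deployment name as its only command-line argument and should be run from inside `ed_evaluation`, since it uses relative paths such as `dataset/` and `results/`.

```bash
python step1_selecting_calcs.py YOUR_MODEL_DEPLOYMENT_NAME
```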
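
Step 1 stores the model's raw reply per note, and step 2 later parses it with `json.loads(...)["calculators"]`, skipping the patient entirely when parsing fails. If you want to inspect or salvage those replies, a tolerant parser could look like the sketch below; the name `parse_calc_selection` is hypothetical, and the fence stripping assumes the model sometimes wraps its JSON in a Markdown code fence:

```python
import json
import re


def parse_calc_selection(raw):
    """Return the calculator IDs, as strings, from one step-1 model reply.

    Assumes the requested format {"explanation": ..., "calculators": [...]},
    optionally wrapped in a Markdown code fence. Returns [] when the reply
    cannot be parsed into that shape.
    """
    # Strip an optional code fence (three backticks, optional "json" tag).
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", raw, re.DOTALL)
    text = fenced.group(1) if fenced else raw
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []
    calculators = parsed.get("calculators") if isinstance(parsed, dict) else None
    if not isinstance(calculators, list):
        return []
    # Step 2 converts IDs with str() before looking them up in
    # tools/id2calculator.json, so return strings here as well.
    return [str(calc_id) for calc_id in calculators]
```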

## Step 2: AgentMD tool computation

```bash
python step2_using_calcs.py YOUR_MODEL_DEPLOYMENT_NAME
```
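
In step 2, `apply_calc` extracts the calculator's Python code from its description, appends each script the model writes, runs the combination with `exec`, and sends the captured stdout (or traceback) back to the model as the next user message. The self-contained sketch below reproduces only that execution step; the BMI function and the model-written script are hypothetical stand-ins, not calculators from RiskCalcs:

```python
import contextlib
import io
import traceback


def run_and_capture(code):
    """Execute `code` in a fresh namespace; return stdout plus any traceback."""
    namespace = {}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        try:
            exec(code, namespace)
        except Exception:
            # The traceback goes to the redirected stderr, i.e. into the buffer.
            traceback.print_exc()
    return buffer.getvalue()


# Hypothetical calculator function, standing in for the code block
# extracted from a calculator description.
calculator_code = """
def bmi(weight_kg, height_m):
    return weight_kg / height_m ** 2
"""

# Hypothetical script written by the model in one round of the dialogue.
model_code = "print(round(bmi(70, 1.75), 1))"

print(run_and_capture(calculator_code + "\n" + model_code))
```

Note that `exec` runs the model-written code in the same Python process, with the same file system and network access as the script itself.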

## Step 3: AgentMD tool summarization and scoring

```bash
python step3_ranking_results.py YOUR_MODEL_DEPLOYMENT_NAME
```

ed_evaluation/dataset/__init__.py

Whitespace-only changes.

ed_evaluation/results/__init__.py

Whitespace-only changes.

Lines changed: 64 additions & 0 deletions

__author__ = "qiao"

"""
Select the calculators to use for each patient.
"""

import json
import os
import sys

import pandas as pd
from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2023-09-01-preview",
    azure_endpoint=os.getenv("OPENAI_ENDPOINT"),
    api_key=os.getenv("OPENAI_API_KEY"),
)

if __name__ == "__main__":

    # Azure OpenAI deployment name
    model = sys.argv[1]

    # loading cached results
    output_path = f"results/{model}_calc_selections.json"

    if os.path.exists(output_path):
        with open(output_path) as f:
            outputs = json.load(f)
    else:
        outputs = {}

    system = "You are a helpful assistant and your task is to select the calculators that a given patient is eligible for. Here are the candidate calculators:\n"
    with open("tools/calc_desc.txt", "r") as f:
        system += f.read()
    system += 'Please first explain what calculators a given patient is eligible for, and then output the list of calculator IDs. Please output a JSON dict formatted as Dict{"explanation": Str(explanation), "calculators": List[Int(ID)]}. Please be strict.'

    notes = pd.read_csv("dataset/notes_deidentified_verified.csv")

    for _, row in notes.iterrows():
        # JSON object keys are always strings, so use string IDs to match the cache
        note_id = str(row["PAT_ENC_CSN_ID"])

        if note_id in outputs:
            continue

        note = row["deid_text_combined"]
        prompt = "Here is the patient note:\n"
        prompt += note + "\n"
        prompt += "Output in JSON: "

        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt}
        ]

        response = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0,
        )

        output = response.choices[0].message.content
        outputs[note_id] = output

        # save after every note so that an interrupted run can resume from the cache
        with open(output_path, "w") as f:
            json.dump(outputs, f, indent=4)

ed_evaluation/step2_using_calcs.py

Lines changed: 164 additions & 0 deletions

__author__ = "qiao"

"""
Apply the selected tools to each patient with AgentMD.
"""

import contextlib
import io
import json
import os
import re
import sys
import traceback

import pandas as pd
from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2023-09-01-preview",
    azure_endpoint=os.getenv("OPENAI_ENDPOINT"),
    api_key=os.getenv("OPENAI_API_KEY"),
)


def capture_exec_output_and_errors(code):
    """
    Executes the given code and captures its printed output and any error messages.

    Parameters:
    code (str): The Python code to execute.

    Returns:
    str: The captured output and error messages of the executed code.
    """
    exec_globals = {}

    with io.StringIO() as buffer, contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        try:
            exec(code, exec_globals)
        except Exception:
            # print the traceback into the buffer so the model can see the error
            traceback.print_exc()

        # read the buffer before the with block closes it
        return buffer.getvalue()


def extract_python_code(text):
    """Concatenate the code from all Markdown Python code blocks in text."""
    pattern = r"```python\n(.*?)```"
    matches = re.findall(pattern, text, re.DOTALL)
    return "\n".join(matches)


def apply_calc(patient, calc, model):
    """Apply the calculator to a specific patient in a multi-round code-execution dialogue."""
    system = "You are a helpful assistant. Your task is to apply a medical calculator to a patient and interpret the result. You can write Python scripts and the user will execute them for you. The Python function in the calculator has already been defined in the environment, and you can re-use or revise it if there is a bug. Your responses will be used for research purposes only. Please start with \"Summary: \" to summarize the messages in one paragraph if you have finished the task. Please make sure to include the raw results of the calculator in the summary."

    prompt = "Here is the calculator:\n"
    prompt += calc + "\n"
    prompt += "Here is the patient information:\n"
    prompt += patient + "\n"
    prompt += "Please apply this calculator to the patient. If there are missing values, please make a range estimation based on best and worst case scenarios inferred from the calculator computing logics. Please write Python scripts and print the results to help the computation. I will provide the stdout to you."

    # the calculator's own code, prepended to every script the model writes
    prompt_code = extract_python_code(prompt)

    messages = [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]

    n = 0

    while True:

        response = client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0,
        )

        message = response.choices[0].message

        n += 1
        print(f"Round {n}")
        print(message.content)

        messages.append({"role": "assistant", "content": message.content})

        if "Summary: " in message.content:
            return messages, message.content.split("Summary: ")[-1]

        message_code = extract_python_code(message.content)

        if message_code:
            # the newline keeps the two code blocks from being fused on one line
            code = prompt_code + "\n" + message_code
            output = "I have executed your code, and the output is:\n" + capture_exec_output_and_errors(code)
        else:
            output = "If you have successfully applied the calculator, please start a new message with \"Summary: \"."

        messages.append(
            {"role": "user", "content": output}
        )

        # give up after 20 rounds
        if n >= 20:
            return messages, "Failed"


if __name__ == "__main__":
    # Azure OpenAI deployment name, e.g., gpt-4-32k or gpt-35-turbo-16k
    model = sys.argv[1]

    # loading the cached results
    output_path = f"results/patient_results_{model}.json"

    if os.path.exists(output_path):
        with open(output_path) as f:
            output = json.load(f)
    else:
        output = {}

    # tool selection results
    with open(f"results/{model}_calc_selections.json") as f:
        patient_tools = json.load(f)

    with open("tools/id2calculator.json") as f:
        id2calc_txt = json.load(f)

    # loading the patient dataset
    notes = pd.read_csv("dataset/notes_deidentified_verified.csv")

    for _, row in notes.iterrows():
        patient_id = str(row["PAT_ENC_CSN_ID"])
        note = row["deid_text_combined"]

        if patient_id not in output:
            output[patient_id] = {}

        # no selection result for this patient (e.g., step 1 did not finish)
        if patient_id not in patient_tools:
            continue

        tool_selection = patient_tools[patient_id]

        try:
            tools = json.loads(tool_selection)["calculators"]
        except (json.JSONDecodeError, KeyError, TypeError):
            # the selection output is not valid JSON in the requested format
            continue

        for tool in tools:
            tool = str(tool)

            # already cached
            if tool in output[patient_id]:
                continue

            # errors in the selection step
            if tool not in id2calc_txt:
                continue

            calc_text = id2calc_txt[tool]

            try:
                messages, summary = apply_calc(note, calc_text, model)
            except Exception:
                # skip API and other errors; the pair is retried on the next run
                continue

            output[patient_id][tool] = [summary, messages]

            # save after every calculator so that an interrupted run can resume
            with open(output_path, "w") as f:
                json.dump(output, f, indent=4)
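
The step 3 script below loads `results/calc2results_{model}.json` and iterates it as calculator ID to {patient ID to summary}, whereas step 2 writes `results/patient_results_{model}.json`, keyed by patient first and storing `[summary, messages]` pairs. The conversion between the two is not among the files shown here. A minimal sketch of it, assuming summaries equal to `"Failed"` (the round limit was hit) should be dropped; both function names are hypothetical:

```python
import json


def invert_patient_results(patient_results):
    """Regroup step-2 results by calculator for step 3.

    Input  (step 2): {patient_id: {calc_id: [summary, messages]}}
    Output (step 3): {calc_id: {patient_id: summary}}
    Dropping "Failed" summaries is an assumption of this sketch.
    """
    calc2results = {}
    for patient_id, calc_results in patient_results.items():
        for calc_id, (summary, _messages) in calc_results.items():
            if summary == "Failed":
                continue
            calc2results.setdefault(calc_id, {})[patient_id] = summary
    return calc2results


def convert(model):
    """Hypothetical file-level wrapper using the paths the two scripts use."""
    with open(f"results/patient_results_{model}.json") as f:
        patient_results = json.load(f)
    with open(f"results/calc2results_{model}.json", "w") as f:
        json.dump(invert_patient_results(patient_results), f, indent=4)
```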

Lines changed: 67 additions & 0 deletions

__author__ = "qiao"

"""
Give an overall score to each calculator result so that patients can be ranked.
"""

import json
import os
import sys

from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2023-09-01-preview",
    azure_endpoint=os.getenv("OPENAI_ENDPOINT"),
    api_key=os.getenv("OPENAI_API_KEY"),
)

if __name__ == "__main__":
    # model choice (Azure OpenAI deployment name)
    model = sys.argv[1]

    # loading cached results
    output_path = f"results/calc2results_score_{model}.json"

    if os.path.exists(output_path):
        with open(output_path) as f:
            outputs = json.load(f)
    else:
        outputs = {}

    # loading the calculator2results dict: {calc_id: {patient_id: summary}}
    with open(f"results/calc2results_{model}.json") as f:
        calc2results = json.load(f)

    system = "You are a helpful medical assistant, and your task is to give an overall score (0-100) given a summary of medical calculation. Higher scores denote more urgent and severe conditions that require immediate attention. If the calculation result contains a wide range, give a low score due to its uncertainty."

    for calc_id, pid2results in calc2results.items():

        # if calc id not even in outputs, initialize the dict
        if calc_id not in outputs:
            outputs[calc_id] = {}

        # pid and the computed summary
        for pid, summary in pid2results.items():

            # already cached, then continue
            if pid in outputs[calc_id]:
                continue

            prompt = f"Here is the summary: {summary}\n"
            prompt += "Output only the overall score (0-100): "

            messages = [
                {"role": "system", "content": system},
                {"role": "user", "content": prompt}
            ]

            response = client.chat.completions.create(
                model=model,
                messages=messages,
                temperature=0,
            )

            output = response.choices[0].message.content
            outputs[calc_id][pid] = output

            # save after every score so that an interrupted run can resume
            with open(output_path, "w") as f:
                json.dump(outputs, f, indent=4)
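
The script above stores the model's raw reply for each calculator and patient in `results/calc2results_score_{model}.json`; the ranking itself is not part of this file. A minimal sketch of turning those replies into a per-calculator ranking, assuming each reply contains the requested 0-100 number; `parse_score` and `rank_patients` are hypothetical helpers:

```python
import re


def parse_score(reply):
    """Return the last number in a step-3 reply as a float, or None if absent.

    Taking the last number is an assumption, meant to skip an echoed "0-100".
    """
    numbers = re.findall(r"\d+(?:\.\d+)?", reply)
    return float(numbers[-1]) if numbers else None


def rank_patients(scores_for_calc):
    """Sort (patient_id, score) pairs by descending score, skipping unparsable replies."""
    parsed = {pid: parse_score(reply) for pid, reply in scores_for_calc.items()}
    valid = [(pid, score) for pid, score in parsed.items() if score is not None]
    return sorted(valid, key=lambda item: item[1], reverse=True)
```

For example, `rank_patients(outputs[calc_id])` returns the patients for one calculator from the highest (most urgent) to the lowest score.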
