Commit ca1c0b4

Update README.md
1 parent a4129c9 commit ca1c0b4

1 file changed: +39 -1 lines changed

README.md

Lines changed: 39 additions & 1 deletion
@@ -1,4 +1,9 @@
-# TrialGPT
+# TrialGPT: Matching Patients to Clinical Trials with Large Language Models

## Introduction

Clinical trials are often hindered by the challenge of patient recruitment. In this work, we introduce TrialGPT, a first-of-its-kind large language model (LLM) framework to assist patient-to-trial matching. Given a patient note, TrialGPT predicts the patient’s eligibility on a criterion-by-criterion basis and then consolidates these predictions to assess the patient’s eligibility for the target trial. We evaluate the trial-level prediction performance of TrialGPT on three publicly available cohorts of 184 patients with over 18,000 trial annotations. We also engaged three physicians to label over 1,000 patient-criterion pairs to assess its criterion-level prediction accuracy. Experimental results show that TrialGPT achieves a criterion-level accuracy of 87.3% with faithful explanations, close to the expert performance (88.7%–90.0%). The aggregated TrialGPT scores are highly correlated with human eligibility judgments, and they outperform the best competing models by 32.6% to 57.2% in ranking and excluding clinical trials. Furthermore, our user study reveals that TrialGPT can significantly reduce screening time (by 42.6%) in a real-life clinical trial matching task. These results and analyses demonstrate promising opportunities for clinical trial matching with LLMs such as TrialGPT.

## Configuration

@@ -17,6 +22,39 @@ config = {
}
```

## Datasets

We provide pre-processed datasets for three publicly available cohorts in `./datasets`:
- `./datasets/trial_sigir.json` for the SIGIR cohort
- `./datasets/trial_2021.json` for the TREC Clinical Trials 2021 cohort
- `./datasets/trial_2022.json` for the TREC Clinical Trials 2022 cohort

The pre-processed set of clinical trials used is provided in `./datasets/trial2info.json`.
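As a quick sanity check before running the pipeline, the files listed above can be inspected from Python. The following is a minimal sketch, not part of the repository: the helper `describe_json` and the `DATASET_FILES` mapping are hypothetical names, and because this README does not document the internal schema of the files, the sketch only reports the top-level JSON type and entry count.

```python
import json
from pathlib import Path

# Paths as listed in this README. The mapping name and keys are illustrative.
DATASET_FILES = {
    "sigir": "datasets/trial_sigir.json",
    "2021": "datasets/trial_2021.json",
    "2022": "datasets/trial_2022.json",
    "trial2info": "datasets/trial2info.json",
}


def describe_json(path):
    """Return (top-level type name, number of top-level entries) for a JSON file.

    Hypothetical helper: it makes no assumption about the file's schema.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    size = len(data) if isinstance(data, (dict, list)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    for name, path in DATASET_FILES.items():
        if Path(path).exists():
            print(name, describe_json(path))
        else:
            print(name, "not found at", path)
```

Run it from the repository root so that the relative `datasets/` paths resolve.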

## Step 1: Criterion-level Prediction

The first step of TrialGPT is to generate the criterion-level predictions, which include (1) the explanation of patient-criterion relevance, (2) locations of relevant sentences, and (3) the eligibility predictions.

Run the following code to get the GPT-4-based TrialGPT results for the three cohorts:
```bash
# format: python run_matching.py {split} {model}
python run_matching.py sigir gpt-4
python run_matching.py 2021 gpt-4
python run_matching.py 2022 gpt-4
```
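To make the three parts of a criterion-level prediction concrete, here is a hedged sketch of how one such record might be represented in Python. The class `CriterionPrediction`, its field names, and the label strings are assumptions for illustration only; the actual output format of `run_matching.py` is not described in this README and may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CriterionPrediction:
    """Hypothetical container for one patient-criterion prediction."""

    criterion: str    # text of the inclusion or exclusion criterion
    explanation: str  # (1) explanation of patient-criterion relevance
    # (2) locations of relevant sentences, assumed here to be indices into the note
    relevant_sentence_ids: List[int] = field(default_factory=list)
    # (3) eligibility prediction; this label vocabulary is an assumption
    eligibility: str = "not enough information"


# Made-up values, purely to show the shape of a record.
example = CriterionPrediction(
    criterion="Age 18 years or older",
    explanation="The note states that the patient is 54 years old.",
    relevant_sentence_ids=[0],
    eligibility="included",
)
print(example.eligibility, example.relevant_sentence_ids)
```

Keeping the explanation and sentence locations next to the label makes each prediction auditable by a human reviewer.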

## Step 2: Trial-level Aggregation

The second step of TrialGPT is to aggregate the criterion-level predictions to get trial-level scores, including one score for relevance and one score for eligibility.

Please make sure that the step 1 results are ready before running the step 2 code:
```bash
# format: python run_aggregation.py {split} {model}
python run_aggregation.py sigir gpt-4
python run_aggregation.py 2021 gpt-4
python run_aggregation.py 2022 gpt-4
```
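The idea of turning criterion-level labels into two trial-level scores can be illustrated with a toy example. This is not the aggregation implemented in `run_aggregation.py`, which this README does not describe; the function `aggregate_trial_scores`, the dict keys, the label strings, and the scoring rule below are all assumptions chosen for simplicity.

```python
from typing import Dict, List, Tuple


def aggregate_trial_scores(predictions: List[Dict[str, str]]) -> Tuple[float, float]:
    """Toy aggregation of criterion-level labels into (relevance, eligibility).

    Each prediction is a dict with hypothetical keys:
      "type":  "inclusion" or "exclusion"
      "label": "met", "not met", or "not applicable"
    """
    if not predictions:
        return 0.0, 0.0

    # Relevance: share of criteria that apply to this patient at all.
    applicable = [p for p in predictions if p["label"] != "not applicable"]
    relevance = len(applicable) / len(predictions)
    if not applicable:
        return relevance, 0.0

    # Eligibility: +1 for a met inclusion or an unmet exclusion, -1 otherwise,
    # averaged over the applicable criteria (range -1.0 to 1.0).
    votes = 0
    for p in applicable:
        favorable = (p["type"] == "inclusion") == (p["label"] == "met")
        votes += 1 if favorable else -1
    return relevance, votes / len(applicable)


if __name__ == "__main__":
    demo = [
        {"type": "inclusion", "label": "met"},
        {"type": "exclusion", "label": "not met"},
        {"type": "exclusion", "label": "met"},
        {"type": "inclusion", "label": "not applicable"},
    ]
    print(aggregate_trial_scores(demo))
```

Keeping relevance and eligibility separate lets a trial be ranked as relevant to a patient even when the patient is ultimately excluded from it.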

## Acknowledgments

This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.
