Run the following code to get the GPT-4-based TrialGPT results for the three cohorts:
``` bash
# format: python run_matching.py {split} {model}
python run_matching.py sigir gpt-4
python run_matching.py 2021 gpt-4
python run_matching.py 2022 gpt-4
```
Please make sure that the step 1 results are ready before running the step 2 code:
``` bash
# format: python run_aggregation.py {split} {model}
python run_aggregation.py sigir gpt-4
python run_aggregation.py 2021 gpt-4
python run_aggregation.py 2022 gpt-4
```
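For intuition, the trial-level percentage features that appear in the step 3 output (`% inc`, `% not inc`, `% exc`, `% not exc`) can be thought of as simple aggregations over the criterion-level predictions from step 1. The sketch below is only an illustration: the record schema and label strings (`"included"`, `"not included"`, `"excluded"`, `"not excluded"`) are assumptions, not the actual format used by `run_aggregation.py`:

```python
# Illustrative aggregation of criterion-level labels into trial-level
# percentage features. The record format and label strings here are
# assumptions; run_aggregation.py may use a different schema.
def aggregate_features(criterion_preds):
    """Turn per-criterion labels into trial-level percentage features."""
    inc = [p["label"] for p in criterion_preds if p["type"] == "inclusion"]
    exc = [p["label"] for p in criterion_preds if p["type"] == "exclusion"]

    def frac(labels, target):
        # Fraction of criteria carrying the target label (0.0 if none exist).
        return labels.count(target) / len(labels) if labels else 0.0

    return {
        "% inc": frac(inc, "included"),
        "% not inc": frac(inc, "not included"),
        "% exc": frac(exc, "excluded"),
        "% not exc": frac(exc, "not excluded"),
    }

# Hypothetical example: three inclusion criteria and one exclusion criterion.
example = [
    {"type": "inclusion", "label": "included"},
    {"type": "inclusion", "label": "included"},
    {"type": "inclusion", "label": "not included"},
    {"type": "exclusion", "label": "not excluded"},
]
features = aggregate_features(example)
```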

# Step 3: Computing Performance

The third step is to compute the performance of different linear features, LLM features, and the combined features.

Please make sure that the step 1 and step 2 results are ready before running the step 3 code:
``` bash
# first convert the results of each split into a CSV file
# format: python convert_results_to_csv.py {split} {model}
python convert_results_to_csv.py sigir gpt-4
python convert_results_to_csv.py 2021 gpt-4
python convert_results_to_csv.py 2022 gpt-4

# then compute the results
# format: python get_ranking_results.py {model}
python get_ranking_results.py gpt-4
```

An example output is:
``` bash
Ranking NDCG@10
comb 0.8164884118874282
% inc 0.6332474730345071
% not inc 0.5329210870830088
% exc 0.43696962433262426
% not exc 0.45405418648143114
bool not inc 0.5329768607974994
bool exc 0.43696962433262426
random 0.37846131596973925
eligibility 0.7065496001369167
relevance 0.7338932013178386
Ranking Prec@10
comb 0.7327619047619052
% inc 0.5749776817540412
% not inc 0.4977844888166035
% exc 0.4148365417832818
% not exc 0.4310829292061639
bool not inc 0.4977844888166035
bool exc 0.4148365417832818
random 0.36647619047619046
eligibility 0.5659880952380945
relevance 0.552433886908542
Ranking MRR
comb 0.9098095238095236
% inc 0.3827687074829934
% not inc 0.019997732426303858
% exc 0.0009523809523809524
% not exc 0.020113378684807254
bool not inc 0.019997732426303858
bool exc 0.0009523809523809524
random 0.5900770975056686
eligibility 0.8301904761904761
relevance 0.7437573696145123
Auc
comb 0.774898491501416
% inc 0.6524326107402266
% not inc 0.6561815920536348
% exc 0.6512699942037056
% not exc 0.6279445988475326
bool not inc 0.6559597180944899
bool exc 0.6521852962178314
random 0.49775549502869065
eligibility 0.6377132521512072
relevance 0.6495563326979852
```

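The ranking metrics in the table (NDCG@10, Prec@10, MRR) are standard. As a minimal sketch of what they measure, given a ranked list of relevance labels for one query (this uses linear gain; `get_ranking_results.py` may use a different gain function or tie handling):

```python
import math

def dcg_at_k(rels, k=10):
    # Discounted cumulative gain over the top-k relevance labels, in ranked order.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    # Normalize by the DCG of the ideal (descending) ordering.
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

def prec_at_k(rels, k=10):
    # Fraction of the top-k positions holding a relevant (non-zero) item.
    return sum(1 for rel in rels[:k] if rel > 0) / k

def mrr(rels):
    # Reciprocal rank of the first relevant item; 0 if none is relevant.
    for rank, rel in enumerate(rels, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0
```

A perfectly ordered list scores NDCG@10 of 1.0, which is why `comb` approaching the ceiling across metrics indicates the combined features rank eligible trials near the top.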
## Acknowledgments

This work was supported by the Intramural Research Programs of the National Institutes of Health, National Library of Medicine.