UCP/UGP prompting methods: faithful DSPy prompt reproduction
This commit adds a few changes
@warner-benjamin Adding a short summary for your reference. There are two types of prompting, Zero-Shot (ZS) and Few-Shot (FS), and three types of methods: Unified Global Prompting (UGP), Uniform Category-Specific Prompting (UCP), and Filtered Category-Specific Prompting (FCSP). ZS contains only instructions, while FS contains instructions plus output examples. There is also ZS-CoT, which adds the phrase "Let's think step by step" to the prompt; the paper suggests skipping this since it did not significantly improve clinical extraction, and it was only applied to Qwen2.5:32B. UGP is the baseline approach, where one large prompt combines all 13 categories into a single LLM call. I am not sure if the input segments are available in the HF dataset; the authors don't provide the inputs for FCSP prompting.
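A minimal sketch of how the UGP vs. UCP prompt construction could look. The category names and instruction wording below are placeholders, not the paper's actual DSPy prompts; only the structural difference (one combined call vs. one call per category) is taken from the summary above.

```python
# Placeholder categories: the benchmark uses 13, these 3 are illustrative only.
CATEGORIES = ["demographics", "history", "diagnostic_tests"]

# Placeholder instructions, not the paper's wording.
INSTRUCTIONS = {
    c: f"Extract all {c.replace('_', ' ')} findings as a JSON list." for c in CATEGORIES
}

def build_ugp_prompt(case_text: str) -> str:
    """Unified Global Prompting: one prompt covering every category in a single LLM call."""
    lines = [f"- {c}: {INSTRUCTIONS[c]}" for c in CATEGORIES]
    return (
        "Extract the following categories from the case report:\n"
        + "\n".join(lines)
        + f"\n\nCase report:\n{case_text}"
    )

def build_ucp_prompts(case_text: str) -> dict:
    """Uniform Category-Specific Prompting: one LLM call per category, same full input."""
    return {c: f"{INSTRUCTIONS[c]}\n\nCase report:\n{case_text}" for c in CATEGORIES}
```

FCSP would additionally filter the input to category-relevant segments before each call, which cannot be sketched here since the authors do not provide those inputs.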
This PR implements the CaseReportBench environment for dense clinical information extraction from case reports.
Metric Replication: Implemented Token Set Ratio (TSR), BLEU-1/4, ROUGE-L, Omission, and Hallucination metrics exactly as found in the authors' eval_metrics.py.
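For reference, a rough stdlib sketch of what these metrics compute. The token set ratio follows the fuzzywuzzy-style definition, and the omission/hallucination rates use an assumed exact-match definition; the authors' eval_metrics.py is authoritative and may differ in details.

```python
from difflib import SequenceMatcher

def token_set_ratio(a: str, b: str) -> float:
    """Fuzzywuzzy-style token set ratio (0-100): compare the sorted token
    intersection against each string's intersection + remainder, take the best."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    inter = " ".join(sorted(ta & tb))
    sa = (inter + " " + " ".join(sorted(ta - tb))).strip()
    sb = (inter + " " + " ".join(sorted(tb - ta))).strip()
    pairs = [(inter, sa), (inter, sb), (sa, sb)]
    return max(SequenceMatcher(None, x, y).ratio() for x, y in pairs) * 100

def omission_rate(gold: list, pred: list) -> float:
    """Fraction of gold items missing from the prediction (assumed definition)."""
    if not gold:
        return 0.0
    return sum(g not in pred for g in gold) / len(gold)

def hallucination_rate(gold: list, pred: list) -> float:
    """Fraction of predicted items absent from the gold annotations (assumed definition)."""
    if not pred:
        return 0.0
    return sum(p not in gold for p in pred) / len(pred)
```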
Prompts: Extracted DSPy prompts.
Two items remained ambiguous in the source repository (for which I opened issues in the original repo):
Missing Lab_Image Prompt: This category appears in the author's evaluation scripts but has no corresponding DSPy signature in the source code. It has been excluded from this implementation.
Hence, the scope includes 13 extraction categories instead of 14.
Missing Preprocessing Logic: The original repository references a preprocessing_llm_output.py file that was not included in the public repo. Used medarc_verifiers.parsers.JSONParser and flattening logic based on the paper's description to reliably extract structured data from model outputs.
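Since the original preprocessing_llm_output.py is unavailable, here is a hypothetical stand-in for the parse-and-flatten step, using only the stdlib (the actual implementation uses medarc_verifiers.parsers.JSONParser): pull the first JSON object out of a possibly noisy model response and flatten each category's value into a list of normalized strings.

```python
import json
import re

def parse_and_flatten(raw: str) -> dict:
    """Hypothetical reconstruction of the missing preprocessing step:
    extract the outermost JSON object from a model response and flatten
    each category's value into a list of lowercased strings."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)  # greedy: first '{' to last '}'
    if not match:
        return {}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}
    flat = {}
    for category, value in data.items():
        if isinstance(value, dict):
            items = [f"{k}: {v}" for k, v in value.items()]
        elif isinstance(value, list):
            items = [str(v) for v in value]
        else:
            items = [str(value)]
        flat[category] = [s.strip().lower() for s in items if s.strip()]
    return flat
```

The exact flattening rules (e.g. how nested dicts are rendered) are assumptions based on the paper's description, not the authors' code.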