Commit f0e87e6

mskarlin, jamesbraza, and pre-commit-ci-lite[bot] authored
Clinical trials docs and bugfixes (#819)
Co-authored-by: James Braza <[email protected]> Co-authored-by: pre-commit-ci-lite[bot] <117423508+pre-commit-ci-lite[bot]@users.noreply.github.com>
1 parent 641f583 commit f0e87e6

6 files changed, +284 -27 lines changed

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
+# PaperQA2 for Clinical Trials
+
+PaperQA2 now natively supports querying clinical trials in addition to any documents supplied by the user. It
+uses a new tool, the aptly named `clinical_trials_search` tool. Users don't have to provide any clinical
+trials to the tool itself; it uses the `clinicaltrials.gov` API to retrieve them on the fly. As of
+January 2025, the tool is not enabled by default, but it's easy to configure. Here's an example
+where we query only clinical trials, without using any documents:
+
+```python
+from paperqa import Settings, agent_query
+
+answer_response = await agent_query(
+    query="What drugs have been found to effectively treat Ulcerative Colitis?",
+    settings=Settings.from_name("search_only_clinical_trials"),
+)
+
+print(answer_response.session.answer)
+```
+
+### Output
+
+Several drugs have been found to effectively treat Ulcerative Colitis (UC),
+targeting different mechanisms of the disease.
+
+Golimumab, a tumor necrosis factor (TNF) inhibitor marketed as Simponi®, has demonstrated efficacy
+in treating moderate-to-severe UC. Administered subcutaneously, it was shown to maintain clinical
+response through Week 54 in patients, as assessed by the Partial Mayo Score (NCT02092285).
+
+Mesalazine, an anti-inflammatory drug, is commonly used for UC treatment. In a study comparing
+mesalazine enemas to faecal microbiota transplantation (FMT) for left-sided UC,
+mesalazine enemas (4g daily) were effective in inducing clinical remission (Mayo score ≤ 2) (NCT03104036).
+
+Antibiotics have also shown potential in UC management. A combination of doxycycline,
+amoxicillin, and metronidazole induced remission in 60-70% of patients with moderate-to-severe
+UC in prior studies. These antibiotics are thought to alter gut microbiota, reducing pathobionts
+and promoting beneficial bacteria (NCT02217722, NCT03986996).
+
+Roflumilast, a phosphodiesterase-4 (PDE4) inhibitor, is being investigated for mild-to-moderate UC.
+Preliminary findings suggest it may improve disease severity and biochemical markers when
+added to conventional treatments (NCT05684484).
+
+These treatments highlight diverse therapeutic approaches, including immunosuppression,
+microbiota modulation, and anti-inflammatory mechanisms.
+
+Each clinical trial used in the response is cited in-line. If you'd like
+to see more data on the specific contexts that were used to answer the query:
+
+```python
+print(answer_response.session.contexts)
+```
+
+[Context(context='The excerpt mentions that a search on ClinicalTrials.gov for clinical trials related to drugs
+treating Ulcerative Colitis yielded 689 trials. However, it does not provide specific information about which
+drugs have been found effective for treating Ulcerative Colitis.', text=Text(text='', name=...
+
+Using `Settings.from_name('search_only_clinical_trials')` is a shortcut, but note that you can easily
+add `clinical_trials_search` into any custom `Settings` by just explicitly naming it as a tool:
+
+```python
+from pathlib import Path
+from paperqa import Settings, agent_query
+from paperqa.agents.tools import DEFAULT_TOOL_NAMES
+
+# you can start with the default list of PaperQA tools
+print(DEFAULT_TOOL_NAMES)
+# >>> ['paper_search', 'gather_evidence', 'gen_answer', 'reset', 'complete']
+
+# we can start with a directory with a potentially useful paper in it
+print(list(Path("my_papers").iterdir()))
+
+# now let's query using standard tools + clinical_trials
+answer_response = await agent_query(
+    query="What drugs have been found to effectively treat Ulcerative Colitis?",
+    settings=Settings(
+        paper_directory="my_papers",
+        agent={"tool_names": DEFAULT_TOOL_NAMES + ["clinical_trials_search"]},
+    ),
+)
+
+# let's check out the formatted answer (with references included)
+print(answer_response.session.formatted_answer)
+```
+
+Question: What drugs have been found to effectively treat Ulcerative Colitis?
+
+Several drugs have been found effective in treating Ulcerative Colitis (UC), with treatment
+strategies varying based on disease severity and extent. For mild-to-moderate UC, 5-aminosalicylic
+acid (5-ASA) is the first-line therapy. Topical 5-ASA, such as mesalazine suppositories (1 g/day),
+is effective for proctitis or distal colitis, inducing remission in 31-80% of patients. Oral mesalazine
+at higher doses (e.g., 4.8 g/day) can accelerate clinical improvement in more extensive disease
+(meier2011currenttreatmentof pages 1-2; meier2011currenttreatmentof pages 3-4).
+
+For moderate-to-severe cases, corticosteroids are commonly used. Oral steroids like prednisolone
+(40-60 mg/day) or intravenous steroids such as methylprednisolone (60 mg/day) and hydrocortisone
+(400 mg/day) are standard for inducing remission (meier2011currenttreatmentof pages 3-4). Tumor
+necrosis factor (TNF)-α blockers, such as infliximab, are effective for steroid-refractory cases
+(meier2011currenttreatmentof pages 2-3; meier2011currenttreatmentof pages 3-4).
+
+Immunosuppressive agents, including azathioprine and 6-mercaptopurine, are used for maintenance
+therapy in steroid-dependent or refractory cases (meier2011currenttreatmentof pages 2-3;
+meier2011currenttreatmentof pages 3-4). Antibiotics, such as combinations of penicillin,
+tetracycline, and metronidazole, have shown promise in altering the microbiota and inducing
+remission in some patients, though their efficacy varies (NCT02217722).
+
+References
+
+1. (meier2011currenttreatmentof pages 2-3): Johannes Meier and Andreas Sturm. Current treatment
+of ulcerative colitis. World journal of gastroenterology, 17 27:3204-12, 2011.
+URL: https://doi.org/10.3748/wjg.v17.i27.3204, doi:10.3748/wjg.v17.i27.3204.
+
+2. (meier2011currenttreatmentof pages 3-4): Johannes Meier and Andreas Sturm. Current treatment
+of ulcerative colitis. World journal of gastroenterology, 17 27:3204-12, 2011. URL:
+https://doi.org/10.3748/wjg.v17.i27.3204, doi:10.3748/wjg.v17.i27.3204.
+
+3. (NCT02217722): Prof. Arie Levine. Use of the Ulcerative Colitis Diet for Induction of
+Remission. Prof. Arie Levine. 2014. ClinicalTrials.gov Identifier: NCT02217722
+
+4. (meier2011currenttreatmentof pages 1-2): Johannes Meier and Andreas Sturm. Current
+treatment of ulcerative colitis. World journal of gastroenterology, 17 27:3204-12, 2011.
+URL: https://doi.org/10.3748/wjg.v17.i27.3204, doi:10.3748/wjg.v17.i27.3204.
+
+We now see both papers and clinical trials cited in our response. For convenience, there is a
+named `Settings.from_name` configuration that works as well:
+
+```python
+from paperqa import Settings, agent_query
+
+answer_response = await agent_query(
+    query="What drugs have been found to effectively treat Ulcerative Colitis?",
+    settings=Settings.from_name("clinical_trials"),
+)
+```
+
+This works with the `pqa` CLI as well:
+
+```bash
+pqa --settings 'search_only_clinical_trials' ask 'what is Ibuprofen effective at treating?'
+```
+
+...
+[13:29:50] Completing 'what is Ibuprofen effective at treating?' as 'certain'.
+Answer: Ibuprofen is a non-steroidal anti-inflammatory drug (NSAID) effective
+in treating various conditions, including pain, inflammation, and fever.
+It is widely used for tension-type
+headaches, with studies showing that ibuprofen sodium provides significant
+pain relief and reduces pain intensity compared to standard ibuprofen and placebo
+over a 3-hour period (NCT01362491).
+Intravenous ibuprofen is effective in managing postoperative pain, particularly
+in orthopedic surgeries, and helps control the inflammatory process. When combined
+with opioids, it reduces opioid
+consumption and associated side effects, making it a key component of
+multimodal analgesia (NCT05401916, NCT01773005).
+
+Ibuprofen is also effective in pediatric populations as a first-line
+anti-inflammatory and antipyretic agent due to its relatively
+low adverse effects compared to other NSAIDs (NCT01478022).
+Additionally, it has been studied for its potential use in managing
+chronic periodontitis through subgingival irrigation with a 2% ibuprofen
+mouthwash, which reduces periodontal pocket depth and
+bleeding on probing, improving periodontal health (NCT02538237).
+
+These findings highlight ibuprofen's versatility in treating pain, inflammation,
+fever, and specific conditions like tension headaches, postoperative pain, and periodontal diseases.
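
The sample answers in this documentation cite trials in-line by their ClinicalTrials.gov identifiers, for example `NCT02092285`. As a minimal sketch, assuming the answer is available as a plain string (such as `answer_response.session.answer`), those identifiers could be collected with a regular expression. `extract_nct_ids` is a hypothetical helper written for illustration and is not part of PaperQA2.

```python
import re

# ClinicalTrials.gov identifiers are "NCT" followed by eight digits.
NCT_ID_PATTERN = re.compile(r"\bNCT\d{8}\b")


def extract_nct_ids(answer: str) -> list[str]:
    """Return the unique NCT identifiers in `answer`, in order of first appearance."""
    # dict.fromkeys de-duplicates while preserving insertion order.
    return list(dict.fromkeys(NCT_ID_PATTERN.findall(answer)))


sample_answer = (
    "Golimumab maintained clinical response through Week 54 (NCT02092285). "
    "Antibiotic combinations induced remission in prior studies "
    "(NCT02217722, NCT03986996), and NCT02217722 is cited twice here."
)
print(extract_nct_ids(sample_answer))
# ['NCT02092285', 'NCT02217722', 'NCT03986996']
```

The de-duplicated list could then be used to look up each trial on ClinicalTrials.gov when checking a citation by hand.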

paperqa/agents/main.py

Lines changed: 5 additions & 1 deletion
@@ -31,6 +31,7 @@
 from .models import AgentStatus, AnswerResponse, SimpleProfiler
 from .search import SearchDocumentStorage, SearchIndex, get_directory_index
 from .tools import (
+    DEFAULT_TOOL_NAMES,
     Complete,
     EnvironmentState,
     GatherEvidence,
@@ -117,7 +118,10 @@ async def run_agent(
     )
 
     # Build the index once here, and then all tools won't need to rebuild it
-    await get_directory_index(settings=settings)
+    # only build if a search tool is requested
+    if PaperSearch.TOOL_FN_NAME in (settings.agent.tool_names or DEFAULT_TOOL_NAMES):
+        await get_directory_index(settings=settings)
+
     if isinstance(agent_type, str) and agent_type.lower() == FAKE_AGENT_TYPE:
         session, agent_status = await run_fake_agent(
             query, settings, docs, **runner_kwargs
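
The guard added to `run_agent` relies on `or` to fall back to the default tool list when no tool names are configured. Below is a minimal sketch of that check in isolation. It assumes `DEFAULT_TOOL_NAMES` matches the list printed in the documentation in this commit and that `"paper_search"` is the value of `PaperSearch.TOOL_FN_NAME`; `needs_paper_index` is a hypothetical helper name.

```python
from typing import Optional

# Assumed to mirror the default tool list printed in this commit's docs.
DEFAULT_TOOL_NAMES = ["paper_search", "gather_evidence", "gen_answer", "reset", "complete"]
PAPER_SEARCH_TOOL_NAME = "paper_search"  # assumed value of PaperSearch.TOOL_FN_NAME


def needs_paper_index(tool_names: Optional[list[str]]) -> bool:
    """Return True when the paper search tool is among the active tools."""
    # None (or an empty list) falls back to the defaults, as in the diff.
    return PAPER_SEARCH_TOOL_NAME in (tool_names or DEFAULT_TOOL_NAMES)


trials_only = ["gather_evidence", "gen_answer", "clinical_trials_search", "complete"]
print(needs_paper_index(None))  # True: the defaults include paper_search
print(needs_paper_index(trials_only))  # False: no paper search requested
print(needs_paper_index(DEFAULT_TOOL_NAMES + ["clinical_trials_search"]))  # True
```

Under these assumptions, a trials-only tool list skips the directory index build, while the default list and the combined list still trigger it.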

paperqa/configs/clinical_trials.json

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+{
+  "answer": {
+    "evidence_k": 15,
+    "answer_max_sources": 5,
+    "max_concurrent_requests": 10
+  },
+  "agent": {
+    "tool_names": [
+      "gather_evidence",
+      "paper_search",
+      "gen_answer",
+      "clinical_trials_search",
+      "complete"
+    ]
+  },
+  "parsing": {
+    "use_doc_details": true,
+    "chunk_size": 9000,
+    "overlap": 750
+  }
+}
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+{
+  "answer": {
+    "evidence_k": 15,
+    "answer_max_sources": 5,
+    "max_concurrent_requests": 10
+  },
+  "agent": {
+    "tool_names": [
+      "gather_evidence",
+      "gen_answer",
+      "clinical_trials_search",
+      "complete"
+    ]
+  },
+  "parsing": {
+    "use_doc_details": true,
+    "chunk_size": 9000,
+    "overlap": 750
+  }
+}

paperqa/sources/clinical_trials.py

Lines changed: 50 additions & 26 deletions
@@ -17,7 +17,7 @@
 from paperqa.docs import Docs
 from paperqa.settings import Settings
 from paperqa.types import DocDetails, Embeddable, Text
-from paperqa.utils import gather_with_concurrency
+from paperqa.utils import gather_with_concurrency, logging_filters
 
 logger = logging.getLogger(__name__)
 
@@ -29,45 +29,61 @@
 SEARCH_PAGE_SIZE = 1000
 TRIAL_API_FIELDS = "protocolSection,derivedSection"
 DOWNLOAD_CONCURRENCY = 20
-TRIAL_CHAR_TRUNCATION_SIZE = 30_000  # larger will prevent embeddings from working
+TRIAL_CHAR_TRUNCATION_SIZE = 28_000  # stay under 8k tokens for embeddings context limit
 MALFORMATTED_QUERY_STATUS: int = 400
 
 
+class CookieWarningFilter(logging.Filter):
+    """Filters out invalid cookie warning.
+
+    clinicaltrials.gov always sends an x-enc header which aiohttp parsers can't handle
+    """
+
+    def filter(self, record):
+        return "Can not load response cookies" not in record.getMessage()
+
+
 @retry(
     stop=stop_after_attempt(3),
     wait=wait_incrementing(0.1, 0.1),
     retry=retry_if_exception_type(ClientResponseError),
 )
 async def api_search_clinical_trials(query: str, session: ClientSession) -> dict:
-    async with session.get(
-        STUDIES_API_URL,
-        params={
-            "query.term": query,
-            "fields": SEARCH_API_FIELDS,
-            "pageSize": SEARCH_PAGE_SIZE,
-            "countTotal": "true",
-            "sort": "@relevance",
-        },
-    ) as response:
-        if response.status == MALFORMATTED_QUERY_STATUS:
-            # the 400s from clinicaltrials.gov are not JSON
-            raise HTTPBadRequest(reason=await response.text())
-        response.raise_for_status()
-        return await response.json()
+
+    with logging_filters(loggers={"aiohttp.client"}, filters={CookieWarningFilter}):
+        async with (
+            session.get(
+                STUDIES_API_URL,
+                params={
+                    "query.term": query,
+                    "fields": SEARCH_API_FIELDS,
+                    "pageSize": SEARCH_PAGE_SIZE,
+                    "countTotal": "true",
+                    "sort": "@relevance",
+                },
+            ) as response,
+        ):
+            if response.status == MALFORMATTED_QUERY_STATUS:
+                # the 400s from clinicaltrials.gov are not JSON
+                raise HTTPBadRequest(reason=await response.text())
+            response.raise_for_status()
+            return await response.json()
 
 
 @retry(
     stop=stop_after_attempt(3),
     wait=wait_incrementing(0.1, 0.1),
 )
 async def api_get_clinical_trial(nct_id: str, session: ClientSession) -> dict | None:
-    with suppress(ClientResponseError):
-        async with session.get(
-            f"{STUDIES_API_URL}/{nct_id}", params={"fields": TRIAL_API_FIELDS}
-        ) as response:
-            response.raise_for_status()
-            return await response.json()
-    return None
+    with logging_filters(loggers={"aiohttp.client"}, filters={CookieWarningFilter}):
+        with suppress(ClientResponseError):
+            async with session.get(
+                f"{STUDIES_API_URL}/{nct_id}",
+                params={"fields": TRIAL_API_FIELDS},
+            ) as response:
+                response.raise_for_status()
+                return await response.json()
+        return None
 
 
 async def search_retrieve_clinical_trials(
@@ -234,16 +250,20 @@ async def add_clinical_trials_to_docs(
         tuple[int, int, str | None]:
             Total number of trials found, number of trials added, and error message if any.
     """
-    session = aiohttp.ClientSession() if session is None else session
+    # Cookies are not needed, and malformed via clinicaltrials.gov
+    _session = aiohttp.ClientSession() if session is None else session
 
     logger.info(f"Querying clinical trials for: {query}.")
 
     try:
         trials, total_result_count = await search_retrieve_clinical_trials(
-            query, session, limit, offset
+            query, _session, limit, offset
         )
     except Exception as e:
         logger.warning(f"Failed to retrieve clinical trials for query: {query}.")
+        # close session if it was ephemeral
+        if session is None:
+            await _session.close()
         return (0, 0, str(e))
 
     logger.info(f"Successfully found {len(trials)} trials.")
@@ -300,6 +320,10 @@ async def add_clinical_trials_to_docs(
             settings=settings,
         )
 
+    # close session if it was ephemeral
+    if session is None:
+        await _session.close()
+
     return (total_result_count, len(docs.texts) - inital_docs_size, None)
 
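
The change to `add_clinical_trials_to_docs` closes the session in two places, once on the error path and once before the final return, and only when the function created the session itself. As an alternative sketch (not what the commit does), the same ownership rule can be written once with `try`/`finally`. `FakeSession` and `fetch_with_optional_session` are hypothetical stand-ins so the example runs without `aiohttp` or network access.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession; it only tracks closure."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def fetch_with_optional_session(
    session: Optional[FakeSession] = None, fail: bool = False
) -> tuple[FakeSession, Optional[str]]:
    """Use the caller's session if given; otherwise create one and close it."""
    _session = FakeSession() if session is None else session
    error: Optional[str] = None
    try:
        if fail:
            raise RuntimeError("simulated request failure")
    except RuntimeError as e:
        error = str(e)
    finally:
        # Only close sessions created here; callers own the ones they pass in.
        if session is None:
            await _session.close()
    return _session, error


async def main() -> None:
    owned, _ = await fetch_with_optional_session()
    print(owned.closed)  # True: the ephemeral session was closed
    shared = FakeSession()
    _, error = await fetch_with_optional_session(shared, fail=True)
    print(shared.closed, error)  # False simulated request failure


asyncio.run(main())
```

Keeping the check `session is None` against the original argument, rather than the local `_session`, is what distinguishes an ephemeral session from one supplied by the caller.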

paperqa/utils.py

Lines changed: 25 additions & 0 deletions
@@ -1,6 +1,7 @@
 from __future__ import annotations
 
 import asyncio
+import contextlib
 import hashlib
 import logging
 import logging.config
@@ -519,3 +520,27 @@ def extract_thought(content: str | None) -> str:
     "peer-review": "misc",  # No direct equivalent, so 'misc' is used
     "other": "article",  # Assume an article if we don't know the type
 }
+
+
+@contextlib.contextmanager
+def logging_filters(
+    loggers: Collection[str], filters: Collection[type[logging.Filter]]
+):
+    """Temporarily add a filter to each specified logger."""
+    filters_added: dict[str, list[logging.Filter]] = {}
+    try:
+        for logger_name in loggers:
+            log_to_filter = logging.getLogger(logger_name)
+            for log_filter in filters:
+                _filter = log_filter()
+                log_to_filter.addFilter(_filter)
+                if logger_name not in filters_added:
+                    filters_added[logger_name] = [_filter]
+                else:
+                    filters_added[logger_name] += [_filter]
+        yield
+    finally:
+        for logger_name, log_filters_to_remove in filters_added.items():
+            log_with_filter = logging.getLogger(logger_name)
+            for log_filter_to_remove in log_filters_to_remove:
+                log_with_filter.removeFilter(log_filter_to_remove)
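
To show how a context manager like `logging_filters` behaves, here is a small self-contained sketch. The helper is repeated in condensed form so the block runs on its own. `DropNoisyFilter`, `ListHandler`, and the `demo.client` logger name are hypothetical and exist only for this illustration.

```python
import contextlib
import logging
from collections.abc import Collection, Iterator


class DropNoisyFilter(logging.Filter):
    """Hypothetical filter that drops records containing the word 'noisy'."""

    def filter(self, record: logging.LogRecord) -> bool:
        return "noisy" not in record.getMessage()


@contextlib.contextmanager
def logging_filters(
    loggers: Collection[str], filters: Collection[type[logging.Filter]]
) -> Iterator[None]:
    """Condensed copy of the helper in this diff, repeated so the example runs alone."""
    added: list[tuple[logging.Logger, logging.Filter]] = []
    try:
        for name in loggers:
            for filter_cls in filters:
                instance = filter_cls()
                logging.getLogger(name).addFilter(instance)
                added.append((logging.getLogger(name), instance))
        yield
    finally:
        # Always detach the filters, even if the body raised.
        for logger, instance in added:
            logger.removeFilter(instance)


class ListHandler(logging.Handler):
    """Collects emitted messages so the effect of the filter is visible."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


demo_logger = logging.getLogger("demo.client")
demo_logger.propagate = False
handler = ListHandler()
demo_logger.addHandler(handler)

with logging_filters(loggers={"demo.client"}, filters={DropNoisyFilter}):
    demo_logger.warning("noisy cookie warning")  # suppressed inside the block
    demo_logger.warning("real problem")
demo_logger.warning("noisy again")  # filter removed, so this is emitted

print(handler.messages)
# ['real problem', 'noisy again']
```

Note that a filter attached to a logger only sees records logged directly through that logger, so the logger name passed in has to be the one that actually emits the warning.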
