
Commit feb3c43

Journal quality fixes (#1034)
1 parent 608850e

9 files changed (+1441, -6 lines)

src/paperqa/clients/client_data/journal_quality.csv

Lines changed: 14 additions & 1 deletion
@@ -5143,6 +5143,8 @@ annual review of neuroscience,2
 annual review of nuclear and particle science,1
 annual review of nutrition,3
 annual review of pathology-mechanisms of disease,2
+annual review of pathology: mechanisms of disease,2
+annual review of pathology,2
 annual review of pharmacology and toxicology,3
 annual review of physical chemistry,1
 annual review of physiology,3
@@ -6057,6 +6059,15 @@ biochimica et biophysica acta: molecular basis of disease,1
 biochimica et biophysica acta: molecular cell research,1
 biochimica et biophysica acta: proteins and proteomics,1
 biochimica et biophysica acta: reviews on cancer,1
+biochimica et biophysica acta (bba) - bioenergetics,1
+biochimica et biophysica acta (bba) - biomembranes,1
+biochimica et biophysica acta (bba) - gene regulatory mechanisms,1
+biochimica et biophysica acta (bba) - general subjects,1
+biochimica et biophysica acta (bba) - molecular and cell biology of lipids,1
+biochimica et biophysica acta (bba) - molecular basis of disease,1
+biochimica et biophysica acta (bba) - molecular cell research,1
+biochimica et biophysica acta (bba) - proteins and proteomics,1
+biochimica et biophysica acta (bba) - reviews on cancer,1
 biochimie,1
 biochip journal,1
 bioconjugate chemistry,1
@@ -17343,6 +17354,8 @@ proceedings of the linnean society of new south wales,1
 proceedings of the london mathematical society,3
 proceedings of the national academy of sciences india section b: biologicalsciences,1
 proceedings of the national academy of sciences of the united states of america,3
+proceedings of the national academy of sciences,3
+pnas,3
 proceedings of the nutrition society,1
 proceedings of the prehistoric society,2
 proceedings of the risø international symposium on materials science,1
@@ -31738,7 +31751,6 @@ proceedings of international conference on the advancement of steam,0
 selected papers of internet research,0
 bat research news,1
 imerides endymasiologias - praktika,1
-scientific reports,0
 ecaade proceedings,0
 traficomin tutkimuksia ja selvityksiä,0
 esignals research,0
@@ -32339,6 +32351,7 @@ radical philosophy review,1
 psychology of popular media,1
 electronic research archive,1
 bmc ecology and evolution,2
+bmc evolutionary biology,2
 annales fennici mathematici,2
 minerva surgery,1
 forces in mechanics,0
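One plausible reason for deleting the `scientific reports,0` row is that the same journal name may also appear elsewhere in the file with a different rating. If the loader builds a plain dict from the rows (an assumption; `load_data` is not shown in this diff), whichever row comes last silently wins. The sketch below is a hypothetical checker, not part of paperqa, that flags names appearing with conflicting ratings. It assumes two-column `name,rating` rows and skips any row, such as a header, whose last field is not an integer.

```python
import csv
from collections.abc import Iterable


def find_conflicting_duplicates(lines: Iterable[str]) -> dict[str, set[int]]:
    """Return casefolded journal names that occur with more than one rating.

    Hypothetical helper: assumes rows shaped like ``name,rating`` and ignores
    rows (e.g. a header) whose last field is not an integer.
    """
    seen: dict[str, set[int]] = {}
    for row in csv.reader(lines):
        # Skip blank/short rows and anything whose rating is not an integer.
        if len(row) < 2 or not row[-1].strip().lstrip("-").isdigit():
            continue
        # Re-join in case an unquoted name contained commas.
        name = ",".join(row[:-1]).casefold()
        seen.setdefault(name, set()).add(int(row[-1]))
    # Keep only names whose ratings disagree; exact repeats are harmless.
    return {name: ratings for name, ratings in seen.items() if len(ratings) > 1}
```

For example, with illustrative rows `["journal,quality", "scientific reports,1", "pnas,3", "scientific reports,0"]` the checker reports `{"scientific reports": {0, 1}}`, while a row repeated with the same rating is not flagged.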

src/paperqa/clients/journal_quality.py

Lines changed: 14 additions & 5 deletions
@@ -3,7 +3,7 @@
 import csv
 import logging
 import os
-from typing import Any
+from typing import Any, ClassVar
 
 from pydantic import ValidationError
 
@@ -18,6 +18,10 @@
 
 
 class JournalQualityPostProcessor(MetadataPostProcessor[JournalQuery]):
+
+    # these will be deleted from any journal names before querying
+    CASEFOLD_PHRASES_TO_REMOVE: ClassVar[list[str]] = ["amp;"]
+
     def __init__(self, journal_quality_path: os.PathLike | str | None = None) -> None:
         if journal_quality_path is None:
             # Construct the path relative to module
@@ -41,17 +45,22 @@ async def _process(
     ) -> DocDetails:
         if not self.data:
             self.load_data()
+
+        # TODO: not super scalable, but unless we need more than this we can just grugbrain
+        journal_query = query.journal.casefold()
+        for phrase in self.CASEFOLD_PHRASES_TO_REMOVE:
+            journal_query = journal_query.replace(phrase, "")
+
         # docname can be blank since the validation will add it
         # remember, if both have docnames (i.e. key) they are
         # wiped and re-generated with resultant data
         return doc_details + DocDetails(
             doc_id=doc_details.doc_id,  # ensure doc_id is preserved
             dockey=doc_details.dockey,  # ensure dockey is preserved
             source_quality=max(
-                [
-                    self.data.get(query.journal.casefold(), DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
-                    self.data.get("the " + query.journal.casefold(), DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
-                ]
+                self.data.get(journal_query, DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
+                self.data.get("the " + journal_query, DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
+                self.data.get(journal_query.replace("&", "and"), DocDetails.UNDEFINED_JOURNAL_QUALITY),  # type: ignore[union-attr]
             ),
         )
 
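The new lookup can be sketched as a standalone function: casefold the journal name, strip the leftover `amp;` from HTML-escaped `&amp;`, then take the best rating over three spellings (as-is, with a leading `the `, and with `&` spelled `and`). This is a minimal sketch under stated assumptions, not the post-processor itself: `normalize_journal` and `lookup_quality` are hypothetical names, the plain `dict` stands in for the loaded CSV data, and `UNDEFINED_JOURNAL_QUALITY = -1` is a placeholder because the real value of `DocDetails.UNDEFINED_JOURNAL_QUALITY` is not shown in the diff.

```python
from collections.abc import Mapping

# Hypothetical placeholder for DocDetails.UNDEFINED_JOURNAL_QUALITY.
UNDEFINED_JOURNAL_QUALITY = -1

# Same phrase list as the diff: residue of an HTML-escaped "&amp;".
CASEFOLD_PHRASES_TO_REMOVE = ["amp;"]


def normalize_journal(journal: str) -> str:
    """Casefold the name and delete each noise phrase."""
    query = journal.casefold()
    for phrase in CASEFOLD_PHRASES_TO_REMOVE:
        query = query.replace(phrase, "")
    return query


def lookup_quality(journal: str, table: Mapping[str, int]) -> int:
    """Return the best rating over the three spellings the diff tries."""
    query = normalize_journal(journal)
    candidates = (query, "the " + query, query.replace("&", "and"))
    return max(table.get(candidate, UNDEFINED_JOURNAL_QUALITY) for candidate in candidates)
```

With a table of `{"seminars in cell and developmental biology": 1, "the lancet": 3}`, the escaped name `Seminars in Cell &amp; Developmental Biology` normalizes to `seminars in cell & developmental biology` and resolves to 1 through the `&`-to-`and` variant, while `Lancet` resolves to 3 through the `the ` prefix. The substitutions are purely textual, so a name written as `a&b` would become `aandb` and still miss; the CSV aliases added in this commit cover spellings these rules cannot reach.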

tests/cassettes/test_tricky_journal_quality_results[10.1016-j.bbcan.2023.188947-1].yaml

Lines changed: 201 additions & 0 deletions

tests/cassettes/test_tricky_journal_quality_results[10.1016-j.semcdb.2016.08.024-1].yaml

Lines changed: 289 additions & 0 deletions

tests/cassettes/test_tricky_journal_quality_results[10.1038-s41598-018-27044-6-1].yaml

Lines changed: 375 additions & 0 deletions

tests/cassettes/test_tricky_journal_quality_results[10.1073-pnas.1205508109-3].yaml

Lines changed: 127 additions & 0 deletions
