Commit f489cdd

fix: Article IDs correctly extracted and tested
1 parent 832b190 commit f489cdd

5 files changed: +54 -2 lines changed

.github/workflows/test.yml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+---
+name: Source
+on: [push, release]
+
+jobs:
+  test-source-install:
+    runs-on: ubuntu-latest
+    strategy:
+      max-parallel: 3
+      matrix:
+        python-version:
+          - "3.10"
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+      - name: Set up Python
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt
+      - name: Install package from source
+        run: pip install -e .
+      - name: Test package from source
+        run: |
+          python -c "import pymed_paperscraper"
+          python -m pytest pymed_paperscraper

README.md

Lines changed: 11 additions & 0 deletions
@@ -28,10 +28,21 @@ pubmed = PubMed(tool="MyTool", email="my@email.address")
 results = pubmed.query("Some query", max_results=500)
 ```
 
+## Bugfixes compared to archived [`pymed`](https://github.com/gijswobben/pymed):
+- Article IDs are correctly extracted [`pymed#22`](https://github.com/gijswobben/pymed/issues/22)
+- Automatic retries if API is unresponsive/overloaded. Support for `max_tries` in `PubMed` class.
+
 ## Notes on the API
 The original documentation of the PubMed API can be found here: [PubMed Central](https://www.ncbi.nlm.nih.gov/pmc/tools/developers/). PubMed Central kindly requests you to:
 
 > - Do not make concurrent requests, even at off-peak times; and
 > - Include two parameters that help to identify your service or application to our servers
 >   * _tool_ should be the name of the application, as a string value with no internal spaces, and
 >   * _email_ should be the e-mail address of the maintainer of the tool, and should be a valid e-mail address.
+
+## Citation
+If you use `pymed_paperscraper` in your work, please cite:
+```bib
+(Citation follows)
+```
+
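The README change above mentions automatic retries and a `max_tries` option, but this commit does not show how they are implemented. As a rough illustration of the general idea only, a retry loop with exponential backoff might look like the sketch below. The function name `call_with_retries` and the `base_delay` parameter are hypothetical and are not taken from `pymed_paperscraper`.

```python
import time


def call_with_retries(func, max_tries=3, base_delay=1.0):
    """Call func() up to max_tries times, sleeping between failed attempts.

    Hypothetical helper for illustration; not the package's actual code.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    for attempt in range(1, max_tries + 1):
        try:
            return func()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_tries:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            time.sleep(base_delay * 2 ** (attempt - 1))
```

A caller would wrap the request in a zero-argument callable, for example `call_with_retries(lambda: fetch(url), max_tries=5)`, where `fetch` stands in for whatever performs the HTTP request.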

pymed_paperscraper/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 from .api import PubMed
 
 __all__ = ["PubMed"]
-__version__ = "1.0.3"
+__version__ = "1.0.4"

pymed_paperscraper/article.py

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ def __init__(
             self.__setattr__(field, kwargs.get(field, None))
 
     def _extractPubMedId(self: object, xml_element: TypeVar("Element")) -> str:
-        path = ".//ArticleId[@IdType='pubmed']"
+        path = ".//PubmedData/ArticleIdList/ArticleId[@IdType='pubmed']"
         return getContent(element=xml_element, path=path)
 
     def _extractTitle(self: object, xml_element: TypeVar("Element")) -> str:
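The narrower path matters because a PubMed record can contain `ArticleId` elements in more than one place, for example inside the entries of a reference list, so the old `.//ArticleId` search could also pick up the IDs of cited articles. The sketch below shows the difference with the standard library's `xml.etree.ElementTree`. It assumes a simplified, hand-written record: the sample XML and the `find_ids` helper are illustrative and are not part of the package.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified record (an assumption for illustration):
# one ID for the article itself and one for a cited reference.
SAMPLE = """
<PubmedArticle>
  <PubmedData>
    <ArticleIdList>
      <ArticleId IdType="pubmed">111</ArticleId>
      <ArticleId IdType="doi">10.1000/example</ArticleId>
    </ArticleIdList>
    <ReferenceList>
      <Reference>
        <ArticleIdList>
          <ArticleId IdType="pubmed">222</ArticleId>
        </ArticleIdList>
      </Reference>
    </ReferenceList>
  </PubmedData>
</PubmedArticle>
"""


def find_ids(root, path):
    """Return the text of every element matched by path (illustrative helper)."""
    return [element.text for element in root.findall(path)]


root = ET.fromstring(SAMPLE.strip())

# Old path: matches any descendant ArticleId, including the one in the reference list.
old_ids = find_ids(root, ".//ArticleId[@IdType='pubmed']")

# New path: matches only ArticleId elements directly under PubmedData/ArticleIdList.
new_ids = find_ids(root, ".//PubmedData/ArticleIdList/ArticleId[@IdType='pubmed']")

print("old:", old_ids)
print("new:", new_ids)
```

On this sample the old path returns two IDs and the new path returns one. If `getContent` joins all matches with newlines, as the `split("\n")` check in the new test suggests, the old path would produce a multi-line `pubmed_id`, which is the condition the test guards against.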
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+from pymed_paperscraper import PubMed
+
+
+def test_unique_id():
+    pubmed = PubMed(tool="MyTool", email="my@email.address")
+    query = '((Haliaeetus leucocephalus[Title/Abstract])) AND ((prey[Title/Abstract]) OR (diet[Title/Abstract]))'
+    results = pubmed.query(query, max_results=30)
+
+    for r in results:
+        ids = r.pubmed_id.strip().split("\n")
+        print('org',r.pubmed_id, 'IDS', ids)
+        assert len(ids) == 1
