Commit 9531d7d

Restructure README.md: focus on generated docs (#144)
1 parent c24717f commit 9531d7d

3 files changed

+71
-110
lines changed


README.md

Lines changed: 47 additions & 99 deletions
Original file line number | Diff line number | Diff line change
@@ -1,17 +1,8 @@
11
# arxiv.py
2-
[![PyPI](https://img.shields.io/pypi/v/arxiv)](https://pypi.org/project/arxiv/) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/arxiv) [![GitHub Workflow Status (branch)](https://img.shields.io/github/actions/workflow/status/lukasschwab/arxiv.py/python-package.yml?branch=master)](https://github.com/lukasschwab/arxiv.py/actions?query=branch%3Amaster)
2+
[![PyPI](https://img.shields.io/pypi/v/arxiv)](https://pypi.org/project/arxiv/) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/arxiv) [![GitHub Workflow Status (branch)](https://img.shields.io/github/actions/workflow/status/lukasschwab/arxiv.py/python-package.yml?branch=master)](https://github.com/lukasschwab/arxiv.py/actions?query=branch%3Amaster) [![Full package documentation](https://img.shields.io/badge/docs-hosted-brightgreen)](https://lukasschwab.me/arxiv.py/index.html)
33

44
Python wrapper for [the arXiv API](https://arxiv.org/help/api/index).
55

6-
## Quick links
7-
8-
+ [Full package documentation](https://lukasschwab.me/arxiv.py/index.html)
9-
+ [Example: fetching results](#example-fetching-results): the most common usage.
10-
+ [Example: downloading papers](#example-downloading-papers)
11-
+ [Example: fetching results with a custom client](#example-fetching-results-with-a-custom-client)
12-
13-
## About arXiv
14-
156
[arXiv](https://arxiv.org/) is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.
167

178
## Usage
@@ -28,94 +19,48 @@ In your Python script, include the line
2819
import arxiv
2920
```
3021

31-
### Search
32-
33-
A `Search` specifies a search of arXiv's database.
34-
35-
```python
36-
arxiv.Search(
37-
query: str = "",
38-
id_list: List[str] = [],
39-
max_results: int | None = None,
40-
sort_by: SortCriterion = SortCriterion.Relevance,
41-
sort_order: SortOrder = SortOrder.Descending
42-
)
43-
```
44-
45-
+ `query`: an arXiv query string. Advanced query formats are documented in the [arXiv API User Manual](https://arxiv.org/help/api/user-manual#query_details).
46-
+ `id_list`: list of arXiv record IDs (typically of the format `"0710.5765v1"`). See [the arXiv API User's Manual](https://arxiv.org/help/api/user-manual#search_query_and_id_list) for documentation of the interaction between `query` and `id_list`.
47-
+ `max_results`: The maximum number of results to be returned in an execution of this search. To fetch every result available, set `max_results=None` (default); to fetch up to 10 results, set `max_results=10`. The API's limit is 300,000 results.
48-
+ `sort_by`: The sort criterion for results: `relevance`, `lastUpdatedDate`, or `submittedDate`.
49-
+ `sort_order`: The sort order for results: `'descending'` or `'ascending'`.
50-
51-
To fetch arXiv records matching a `Search`, use `(Client).results(search)` to get a generator yielding `Result`s.
22+
### Examples
5223

53-
#### Example: fetching results
54-
55-
Print the titles of the 10 most recent articles related to the keyword "quantum":
24+
#### Fetching results
5625

5726
```python
5827
import arxiv
5928

29+
# Construct the default API client.
30+
client = arxiv.Client()
31+
32+
# Search for the 10 most recent articles matching the keyword "quantum."
6033
search = arxiv.Search(
6134
query = "quantum",
6235
max_results = 10,
6336
sort_by = arxiv.SortCriterion.SubmittedDate
6437
)
6538

66-
for result in arxiv.Client().results(search):
67-
    print(result.title)
68-
```
39+
results = client.results(search)
6940

70-
Use the `query` syntax documented in the [arXiv API User Manual](https://arxiv.org/help/api/user-manual#query_details):
71-
72-
```python
73-
import arxiv
41+
# `results` is a generator; you can iterate over its elements one by one...
42+
for r in client.results(search):
43+
    print(r.title)
44+
# ...or exhaust it into a list. Careful: this is slow for large result sets.
45+
all_results = list(results)
46+
print([r.title for r in all_results])
7447

48+
# For advanced query syntax documentation, see the arXiv API User Manual:
49+
# https://arxiv.org/help/api/user-manual#query_details
7550
search = arxiv.Search(query = "au:del_maestro AND ti:checkerboard")
76-
first_result = next(arxiv.Client().results(search))
51+
first_result = next(client.results(search))
7752
print(first_result)
78-
```
79-
80-
Fetch and print the title of the paper with ID "1605.08386v1:"
8153

82-
```python
83-
import arxiv
84-
85-
client = arxiv.Client()
86-
search = arxiv.Search(id_list=["1605.08386v1"])
87-
88-
paper = next(arxiv.Client().results(search))
89-
print(paper.title)
54+
# Search for the paper with ID "1605.08386v1"
55+
search_by_id = arxiv.Search(id_list=["1605.08386v1"])
56+
# Reuse client to fetch the paper, then print its title.
57+
first_result = next(client.results(search_by_id))
58+
print(first_result.title)
9059
```
9160

92-
### Result
93-
94-
<!-- TODO: improve this section. -->
61+
#### Downloading papers
9562

96-
The `Result` objects yielded by `(Client).results()` include metadata about each paper and some helper functions for downloading their content.
97-
98-
The meaning of the underlying raw data is documented in the [arXiv API User Manual: Details of Atom Results Returned](https://arxiv.org/help/api/user-manual#_details_of_atom_results_returned).
99-
100-
+ `result.entry_id`: A url `https://arxiv.org/abs/{id}`.
101-
+ `result.updated`: When the result was last updated.
102-
+ `result.published`: When the result was originally published.
103-
+ `result.title`: The title of the result.
104-
+ `result.authors`: The result's authors, as `arxiv.Author`s.
105-
+ `result.summary`: The result abstract.
106-
+ `result.comment`: The authors' comment if present.
107-
+ `result.journal_ref`: A journal reference if present.
108-
+ `result.doi`: A URL for the resolved DOI to an external resource if present.
109-
+ `result.primary_category`: The result's primary arXiv category. See [arXiv: Category Taxonomy](https://arxiv.org/category_taxonomy).
110-
+ `result.categories`: All of the result's categories. See [arXiv: Category Taxonomy](https://arxiv.org/category_taxonomy).
111-
+ `result.links`: Up to three URLs associated with this result, as `arxiv.Link`s.
112-
+ `result.pdf_url`: A URL for the result's PDF if present. Note: this URL also appears among `result.links`.
113-
114-
They also expose helper methods for downloading papers: `(Result).download_pdf()` and `(Result).download_source()`.
115-
116-
#### Example: downloading papers
117-
118-
To download a PDF of the paper with ID "1605.08386v1," run a `Search` and then use `(Result).download_pdf()`:
63+
To download a PDF of the paper with ID "1605.08386v1," run a `Search` and then use `Result.download_pdf()`:
11964

12065
```python
12166
import arxiv
@@ -143,24 +88,7 @@ paper.download_source(filename="downloaded-paper.tar.gz")
14388
paper.download_source(dirpath="./mydir", filename="downloaded-paper.tar.gz")
14489
```
14590

146-
### Client
147-
148-
A `Client` specifies a strategy for fetching results from arXiv's API; it obscures pagination and retry logic. For most use cases the default client should suffice.
149-
150-
```python
151-
# Default client properties.
152-
arxiv.Client(
153-
page_size: int = 100,
154-
delay_seconds: float = 3.0,
155-
num_retries: int = 3
156-
)
157-
```
158-
159-
+ `page_size`: the number of papers to fetch from arXiv per page of results. Smaller pages can be retrieved faster, but may require more round-trips. The API's limit is 2000 results.
160-
+ `delay_seconds`: the number of seconds to wait between requests for pages. [arXiv's Terms of Use](https://arxiv.org/help/api/tou) ask that you "make no more than one request every three seconds."
161-
+ `num_retries`: The number of times the client will retry a request that fails, either with a non-200 HTTP status code or with an unexpected number of results given the search parameters.
162-
163-
#### Example: fetching results with a custom client
91+
#### Fetching results with a custom client
16492

16593
```python
16694
import arxiv
@@ -176,9 +104,9 @@ for result in big_slow_client.results(arxiv.Search(query="quantum")):
176104
    print(result.title)
177105
```
178106

179-
#### Example: logging
107+
#### Logging
180108

181-
To inspect this package's network behavior and API logic, configure an `INFO`-level logger.
109+
To inspect this package's network behavior and API logic, configure a `DEBUG`-level logger.
182110

183111
```pycon
184112
>>> import logging, arxiv
@@ -190,3 +118,23 @@ INFO:arxiv.arxiv:Requesting page (first: False, try: 0): https://export.arxiv.or
190118
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): export.arxiv.org:443
191119
DEBUG:urllib3.connectionpool:https://export.arxiv.org:443 "GET /api/query?search_query=&id_list=1605.08386v1&sortBy=relevance&sortOrder=descending&start=0&max_results=100&user-agent=arxiv.py%2F1.4.8 HTTP/1.1" 200 979
192120
```
121+
122+
## Types
123+
124+
### Client
125+
126+
A `Client` specifies a reusable strategy for fetching results from arXiv's API. For most use cases the default client should suffice.
127+
128+
Client configurations specify pagination and retry logic. *Reusing* a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.
129+
130+
### Search
131+
132+
A `Search` specifies a search of arXiv's database. Use `Client.results` to get a generator yielding `Result`s.
133+
134+
### Result
135+
136+
The `Result` objects yielded by `Client.results` include metadata about each paper and helper methods for downloading their content.
137+
138+
The meaning of the underlying raw data is documented in the [arXiv API User Manual: Details of Atom Results Returned](https://arxiv.org/help/api/user-manual#_details_of_atom_results_returned).
139+
140+
`Result` also exposes helper methods for downloading papers: `Result.download_pdf` and `Result.download_source`.
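
The README text above describes `Client.results` as returning a generator, with `page_size` and `max_results` bounding how much is fetched. The following is a minimal sketch of that lazy, page-by-page pattern using only the standard library. It is not the package's implementation: `fake_fetch_page` and `paged_results` are hypothetical names introduced for illustration, and no network request is made.

```python
from typing import Callable, Generator, List, Optional


def fake_fetch_page(start: int, size: int, total: int = 25) -> List[str]:
    """Hypothetical stand-in for one API request: returns up to `size`
    fake titles starting at offset `start`, out of `total` records."""
    return [f"paper-{i}" for i in range(start, min(start + size, total))]


def paged_results(
    fetch_page: Callable[[int, int], List[str]],
    page_size: int = 10,
    max_results: Optional[int] = None,
) -> Generator[str, None, None]:
    """Lazily yield results page by page, stopping at max_results if set."""
    yielded = 0
    start = 0
    while max_results is None or yielded < max_results:
        page = fetch_page(start, page_size)
        if not page:
            return  # the source has no more results
        for item in page:
            if max_results is not None and yielded >= max_results:
                return
            yield item
            yielded += 1
        start += len(page)


# Nothing is fetched until the generator is consumed.
results = paged_results(fake_fetch_page, page_size=10, max_results=12)
first = next(results)      # triggers only the first page fetch
remaining = list(results)  # exhausts the rest; slow for large result sets
```

Because the generator is stateful, `remaining` holds only the items not already consumed by `next`; calling the function again starts a fresh iteration.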

arxiv/__init__.py

Lines changed: 23 additions & 10 deletions
@@ -427,9 +427,9 @@ class Search(object):
427427
max_results: int | None
428428
"""
429429
The maximum number of results to be returned in an execution of this
430-
search.
430+
search. To fetch every result available, set `max_results=None`.
431431
432-
To fetch every result available, set `max_results=None`.
432+
The API's limit is 300,000 results per query.
433433
"""
434434
sort_by: SortCriterion
435435
"""The sort criterion for results."""
@@ -484,14 +484,13 @@ def _url_args(self) -> Dict[str, str]:
484484

485485
def results(self, offset: int = 0) -> Generator[Result, None, None]:
486486
"""
487-
Executes the specified search using a default arXiv API client.
488-
489-
For info on default behavior, see `Client.__init__` and `Client.results`.
487+
Executes the specified search using a default arXiv API client. For info
488+
on default behavior, see `Client.__init__` and `Client.results`.
490489
491490
**Deprecated** after 2.0.0; use `Client.results`.
492491
"""
493492
warnings.warn(
494-
"The '(Search).results' method is deprecated, use 'Client.results' instead",
493+
"The 'Search.results' method is deprecated, use 'Client.results' instead",
495494
DeprecationWarning,
496495
stacklevel=2,
497496
)
@@ -507,13 +506,27 @@ class Client(object):
507506
"""
508507

509508
query_url_format = "https://export.arxiv.org/api/query?{}"
510-
"""The arXiv query API endpoint format."""
509+
"""
510+
The arXiv query API endpoint format.
511+
"""
511512
page_size: int
512-
"""Maximum number of results fetched in a single API request."""
513+
"""
514+
Maximum number of results fetched in a single API request. Smaller pages can
515+
be retrieved faster, but may require more round-trips.
516+
517+
The API's limit is 2000 results per page.
518+
"""
513519
delay_seconds: float
514-
"""Number of seconds to wait between API requests."""
520+
"""
521+
Number of seconds to wait between API requests.
522+
523+
[arXiv's Terms of Use](https://arxiv.org/help/api/tou) ask that you "make no
524+
more than one request every three seconds."
525+
"""
515526
num_retries: int
516-
"""Number of times to retry a failing API request."""
527+
"""
528+
Number of times to retry a failing API request before raising an Exception.
529+
"""
517530

518531
_last_request_dt: datetime
519532
_session: requests.Session
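
The docstrings above describe `delay_seconds` as a wait between API requests and `num_retries` as the number of retries before an exception is raised. One way such a policy could be structured is sketched below. This is an illustration under stated assumptions, not the code in `arxiv/__init__.py`: the class `ThrottledRetrier` and its injectable `clock` and `sleep` parameters are hypothetical, added so the behavior can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottledRetrier:
    """Hypothetical helper: spaces out calls and retries failing ones."""

    def __init__(
        self,
        delay_seconds: float = 3.0,
        num_retries: int = 3,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.delay_seconds = delay_seconds
        self.num_retries = num_retries
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None  # time of the previous attempt

    def call(self, request: Callable[[], T]) -> T:
        """Run `request`, retrying on any exception up to num_retries times."""
        last_error: Optional[Exception] = None
        # One initial attempt plus up to num_retries retries.
        for _ in range(self.num_retries + 1):
            self._wait_for_slot()
            try:
                return request()
            except Exception as err:
                last_error = err
        assert last_error is not None  # holds whenever num_retries >= 0
        raise last_error

    def _wait_for_slot(self) -> None:
        """Sleep until at least delay_seconds have passed since the last attempt."""
        if self._last_call is not None:
            remaining = self.delay_seconds - (self._clock() - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
        self._last_call = self._clock()
```

Keeping the last-attempt timestamp on the instance is what makes reuse matter: a single shared instance spaces out all of its calls, whereas constructing a new one per call would reset the timer each time.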

arxiv/arxiv.py

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@
77
from .__init__ import * # noqa: F403
88
import warnings
99

10-
warnings.warn("**Deprecated** after 2.0.0; use `import arxiv` instead.")
10+
warnings.warn("**Deprecated** after 2.0.0; use 'import arxiv' instead.")
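
The two deprecation notices touched by this commit call `warnings.warn` differently: the module-level one passes only a message, while `Search.results` passes `DeprecationWarning` and `stacklevel=2`. The sketch below shows how those two call styles are categorized by Python's standard `warnings` module. The wrapper functions `legacy_import_notice` and `legacy_method_notice` are hypothetical names used only to make the example self-contained.

```python
import warnings


def legacy_import_notice() -> None:
    # No category is given, so Python emits this as a UserWarning (the default).
    warnings.warn("**Deprecated** after 2.0.0; use 'import arxiv' instead.")


def legacy_method_notice() -> None:
    # Explicit DeprecationWarning; stacklevel=2 attributes it to the caller.
    warnings.warn(
        "The 'Search.results' method is deprecated, use 'Client.results' instead",
        DeprecationWarning,
        stacklevel=2,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, bypassing default filters
    legacy_import_notice()
    legacy_method_notice()

categories = [w.category for w in caught]
messages = [str(w.message) for w in caught]
```

The category matters for visibility: Python's default filters show `UserWarning` but hide `DeprecationWarning` unless it is triggered by code running in `__main__`.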
