+ [Fetching results with a custom client](#fetching-results-with-a-custom-client)
## About arXiv
[arXiv](https://arxiv.org/) is a project by the Cornell University Library that provides open access to 1,000,000+ articles in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance, and Statistics.
## Usage
In your Python script, include the line

```python
import arxiv
```
### Search
A `Search` specifies a search of arXiv's database.
```python
arxiv.Search(
    query: str = "",
    id_list: List[str] = [],
    max_results: int | None = None,
    sort_by: SortCriterion = SortCriterion.Relevance,
    sort_order: SortOrder = SortOrder.Descending
)
```

+ `query`: an arXiv query string. Advanced query formats are documented in the [arXiv API User Manual](https://arxiv.org/help/api/user-manual#query_details).
+ `id_list`: a list of arXiv record IDs (typically of the format `"0710.5765v1"`). See the [arXiv API User Manual](https://arxiv.org/help/api/user-manual#search_query_and_id_list) for documentation of the interaction between `query` and `id_list`.
+ `max_results`: the maximum number of results to be returned in an execution of this search. To fetch every result available, set `max_results=None` (default); to fetch up to 10 results, set `max_results=10`. The API's limit is 300,000 results.
+ `sort_by`: the sort criterion for results: `relevance`, `lastUpdatedDate`, or `submittedDate`.
+ `sort_order`: the sort order for results: `'descending'` or `'ascending'`.

To fetch arXiv records matching a `Search`, use `Client.results(search)` to get a generator yielding `Result`s.

### Examples
#### Fetching results

Print the titles of the 10 most recent articles related to the keyword "quantum":

```python
import arxiv

# Construct the default API client.
client = arxiv.Client()

# Search for the 10 most recent articles matching the keyword "quantum".
search = arxiv.Search(
    query="quantum",
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate,
)

results = client.results(search)

# `client.results` returns a generator; you can iterate over its elements one by one...
for r in client.results(search):
    print(r.title)
# ...or exhaust one into a list. Careful: this is slow for large result sets.
all_results = list(results)
print([r.title for r in all_results])

# For advanced query syntax documentation, see the arXiv API User Manual:
# https://arxiv.org/help/api/user-manual#query_details

# Reuse the client to fetch the first result of the search, then print its title.
first_result = next(client.results(search))
print(first_result.title)
```
### Result
The `Result` objects yielded by `Client.results` include metadata about each paper and helper methods for downloading their content.

The meaning of the underlying raw data is documented in the [arXiv API User Manual: Details of Atom Results Returned](https://arxiv.org/help/api/user-manual#_details_of_atom_results_returned).

+ `result.entry_id`: a URL of the form `https://arxiv.org/abs/{id}`.
+ `result.updated`: when the result was last updated.
+ `result.published`: when the result was originally published.
+ `result.title`: the title of the result.
+ `result.authors`: the result's authors, as `arxiv.Author`s.
+ `result.summary`: the result's abstract.
+ `result.comment`: the authors' comment, if present.
+ `result.journal_ref`: a journal reference, if present.
+ `result.doi`: a URL for the resolved DOI to an external resource, if present.
+ `result.primary_category`: the result's primary arXiv category. See [arXiv: Category Taxonomy](https://arxiv.org/category_taxonomy).
+ `result.categories`: all of the result's categories. See [arXiv: Category Taxonomy](https://arxiv.org/category_taxonomy).
+ `result.links`: up to three URLs associated with this result, as `arxiv.Link`s.
+ `result.pdf_url`: a URL for the result's PDF, if present. Note: this URL also appears among `result.links`.

`Result` also exposes helper methods for downloading papers: `Result.download_pdf` and `Result.download_source`.

#### Downloading papers

To download a PDF of the paper with ID "1605.08386v1", run a `Search` and then call `Result.download_pdf()` on the result.

### Client

A `Client` specifies a reusable strategy for fetching results from arXiv's API; it handles pagination and retry logic for you. For most use cases the default client should suffice.

```python
# Default client properties.
arxiv.Client(
    page_size: int = 100,
    delay_seconds: float = 3.0,
    num_retries: int = 3
)
```

+ `page_size`: the number of papers to fetch from arXiv per page of results. Smaller pages can be retrieved faster, but may require more round-trips. The API's limit is 2000 results per page.
+ `delay_seconds`: the number of seconds to wait between requests for pages. [arXiv's Terms of Use](https://arxiv.org/help/api/tou) ask that you "make no more than one request every three seconds."
+ `num_retries`: the number of times the client will retry a request that fails, either with a non-200 HTTP status code or with an unexpected number of results given the search parameters.

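The trade-off between `page_size` and `delay_seconds` can be estimated with a little arithmetic. This is a back-of-envelope model, not the package's actual scheduling: it assumes every page request succeeds on the first try, that the search has at least `max_results` matches, and that the client pauses only between consecutive requests. `estimate_fetch` is a hypothetical helper.

```python
import math
from typing import Tuple


def estimate_fetch(max_results: int, page_size: int = 100, delay_seconds: float = 3.0) -> Tuple[int, float]:
    """Return (page requests, minimum total seconds spent waiting between them)."""
    requests = math.ceil(max_results / page_size)
    # Model assumption: one pause between consecutive requests, none before the first.
    min_wait = max(requests - 1, 0) * delay_seconds
    return requests, min_wait


print(estimate_fetch(1000))                  # default settings -> (10, 27.0)
print(estimate_fetch(1000, page_size=1000))  # one large page   -> (1, 0.0)
```

Under this model, raising `page_size` removes most of the deliberate waiting, at the cost of larger and slower individual responses.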
#### Fetching results with a custom client
```python
import arxiv

# ...

for result in big_slow_client.results(arxiv.Search(query="quantum")):
    print(result.title)
```
#### Logging

To inspect this package's network behavior and API logic, configure a `DEBUG`-level logger.

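A minimal sketch using only the standard `logging` module. It assumes the package logs through a logger named `"arxiv"` or one of its children (the usual `logging.getLogger(__name__)` convention). If your version uses a different name, `logging.basicConfig(level=logging.DEBUG)` enables DEBUG output for every library instead.

```python
import logging

# Attach a stderr handler to the root logger; the root level stays at its default (WARNING).
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s: %(message)s")

# Assumption: the package logs under the "arxiv" logger name. Lowering only this
# logger's level surfaces its DEBUG records without enabling DEBUG everywhere,
# because propagated records are filtered by handler levels, not the root's level.
arxiv_logger = logging.getLogger("arxiv")
arxiv_logger.setLevel(logging.DEBUG)
```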
#### Reusing a client

Client configurations specify pagination and retry logic. *Reusing* a client allows successive API calls to use the same connection pool and ensures they abide by the rate limit you set.