Commit f14c17c

Updated docs
1 parent 2174978 commit f14c17c

3 files changed (+36, -16 lines)

README.md

Lines changed: 35 additions & 14 deletions
@@ -5,21 +5,33 @@ collections have archived the URL. This kind of information can sometimes
 provide insight about why a particular web resource or set of web resources were
 archived from the web.

-## Install
+## Run

-pip install waybackprov
+If you have [uv] installed you can run `waybackprov` easily without installing anything:
+
+```
+uvx waybackprov
+```
+
+Otherwise you'll probably want to install it with `pip`:
+
+```
+pip install waybackprov
+```

 ## Basic Usage

 To check a particular URL here's how it works:

-% waybackprov https://twitter.com/EPAScottPruitt
-364 https://archive.org/details/focused_crawls
-306 https://archive.org/details/edgi_monitor
-151 https://archive.org/details/www3.epa.gov
-60 https://archive.org/details/epa.gov4
-47 https://archive.org/details/epa.gov5
-...
+```shell
+waybackprov https://twitter.com/EPAScottPruitt
+364 https://archive.org/details/focused_crawls
+306 https://archive.org/details/edgi_monitor
+151 https://archive.org/details/www3.epa.gov
+60 https://archive.org/details/epa.gov4
+47 https://archive.org/details/epa.gov5
+...
+```

 The first column contains the number of crawls for a particular URL, and the
 second column contains the URL for the Internet Archive collection that added
@@ -30,14 +42,18 @@ it.
 By default waybackprov will only look at the current year. If you would like it
 to examine a range of years use the `--start` and `--end` options:

-% waybackprov --start 2016 --end 2018 https://twitter.com/EPAScottPruitt
+```shell
+waybackprov --start 2016 --end 2018 https://twitter.com/EPAScottPruitt
+```

 ## Multiple Pages

 If you would like to look at all URLs at a particular URL prefix you can use the
 `--prefix` option:

-% waybackprov --prefix https://twitter.com/EPAScottPruitt
+```shell
+waybackprov --prefix https://twitter.com/EPAScottPruitt
+```

 This will use the Internet Archive's [CDX API](https://github.com/webrecorder/pywb/wiki/CDX-Server-API) to also include URLs that are extensions of the URL you supply, so it would include for example:
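As a side note, the two-column report shown under Basic Usage is straightforward to consume from a script. A minimal sketch (the parsing code is illustrative, not part of waybackprov; the sample lines are copied from the README output above):

```python
# Illustrative only: parse waybackprov's two-column report
# (crawl count, collection URL) into a dict keyed by collection.
sample = """364 https://archive.org/details/focused_crawls
306 https://archive.org/details/edgi_monitor
151 https://archive.org/details/www3.epa.gov"""

crawls = {}
for line in sample.splitlines():
    # Each line is "<count> <collection URL>"; URLs contain no spaces,
    # so a single split is enough.
    count, url = line.split(maxsplit=1)
    crawls[url] = int(count)

print(crawls["https://archive.org/details/edgi_monitor"])  # 306
print(sum(crawls.values()))  # 821
```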

@@ -53,7 +69,9 @@ interested in is highly recommended since it prevents lots of lookups for CSS,
 JavaScript and image files that are components of the resource that was
 initially crawled.

-% waybackprov --prefix --match 'status/\d+$' https://twitter.com/EPAScottPruitt
+```
+waybackprov --prefix --match 'status/\d+$' https://twitter.com/EPAScottPruitt
+```

 ## Collections
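One note on the `--match` option used above: the `status/\d+$` value reads as a regular expression that keeps only individual tweet status pages. A small sketch of that filtering, assuming regex matching against candidate URLs (the URL list here is hypothetical):

```python
import re

# Assumed behavior: --match patterns are regular expressions tested against
# candidate URLs; 'status/\d+$' keeps URLs ending in a numeric status ID.
pattern = re.compile(r"status/\d+$")

urls = [
    "https://twitter.com/EPAScottPruitt/status/123456789",  # kept
    "https://twitter.com/EPAScottPruitt/media",             # dropped
    "https://twitter.com/EPAScottPruitt/status/987654321",  # kept
]

matched = [u for u in urls if pattern.search(u)]
print(len(matched))  # 2
```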

@@ -78,12 +96,15 @@ rather than a summary.
 If you would like to see detailed information about what *waybackprov* is doing
 use the `--log` option to supply a file path to log to:

-% waybackprov --log waybackprov.log https://example.com/
+```shell
+waybackprov --log waybackprov.log https://example.com/
+```

 ## Test

 If you would like to test it, first install [pytest] and then:

-pytest test.py
+uv run pytest test.py

 [pytest]: https://docs.pytest.org/en/latest/
+[uv]: https://docs.astral.sh/uv/

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "waybackprov"
-version = "0.0.9"
+version = "0.1.0"
 description = "Checks the provenance of a URL in the Wayback machine"
 readme = "README.md"
 authors = [

src/waybackprov/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -138,7 +138,6 @@ def get_crawls(
     # month. So some spots in the first and last row are null. Not
     # every day has any data if the URL wasn't crawled then.
     logging.info("getting calendar year %s for %s", year, url)
-    print("getting calendar year %s for %s", year, url)
     cal = get_json(api % (url, year))
     for month in cal:
         for week in month:
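Dropping that `print` call is a genuine fix rather than cleanup: `print` does not perform the lazy `%s` interpolation that the `logging` methods do, so the removed line emitted the raw format string followed by the arguments. A minimal illustration (the demo logger name and buffer are arbitrary):

```python
import io
import logging

# logging methods interpolate '%s' placeholders from the extra arguments.
buf = io.StringIO()
logger = logging.getLogger("waybackprov-demo")
logger.addHandler(logging.StreamHandler(buf))
logger.setLevel(logging.INFO)

logger.info("getting calendar year %s for %s", 2018, "https://example.com/")
print(buf.getvalue().strip())
# -> getting calendar year 2018 for https://example.com/

# print() just space-separates its arguments, leaving the %s untouched:
print("getting calendar year %s for %s", 2018, "https://example.com/")
# -> getting calendar year %s for %s 2018 https://example.com/
```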
