@@ -5,21 +5,33 @@ collections have archived the URL. This kind of information can sometimes
 provide insight about why a particular web resource or set of web resources were
 archived from the web.

-## Install
+## Run

-    pip install waybackprov
+If you have [uv] installed you can run `waybackprov` easily without installing anything:
+
+```
+uvx waybackprov
+```
+
+Otherwise you'll probably want to install it with `pip`:
+
+```
+pip install waybackprov
+```

 ## Basic Usage

 To check a particular URL here's how it works:

-    % waybackprov https://twitter.com/EPAScottPruitt
-    364 https://archive.org/details/focused_crawls
-    306 https://archive.org/details/edgi_monitor
-    151 https://archive.org/details/www3.epa.gov
-    60 https://archive.org/details/epa.gov4
-    47 https://archive.org/details/epa.gov5
-    ...
+```shell
+waybackprov https://twitter.com/EPAScottPruitt
+364 https://archive.org/details/focused_crawls
+306 https://archive.org/details/edgi_monitor
+151 https://archive.org/details/www3.epa.gov
+60 https://archive.org/details/epa.gov4
+47 https://archive.org/details/epa.gov5
+...
+```
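Each line of the listing above pairs a crawl count with a collection URL, so it is straightforward to post-process. The following is only a rough sketch and not part of waybackprov: `parse_counts` is a hypothetical helper, and the sample text is copied from the output shown above.

```python
# Sample text copied from the example output above.
sample = """\
364 https://archive.org/details/focused_crawls
306 https://archive.org/details/edgi_monitor
151 https://archive.org/details/www3.epa.gov
60 https://archive.org/details/epa.gov4
47 https://archive.org/details/epa.gov5
"""


def parse_counts(text):
    """Return (count, collection_url) pairs from waybackprov-style output.

    Hypothetical helper for illustration; not provided by waybackprov.
    """
    pairs = []
    for line in text.splitlines():
        parts = line.split()
        # Keep only "<integer> <url>" lines; skip blanks and the "..." marker.
        if len(parts) == 2 and parts[0].isdigit():
            pairs.append((int(parts[0]), parts[1]))
    return pairs


pairs = parse_counts(sample)
total = sum(count for count, _ in pairs)
print(total)
```

In practice the text would come from the command's standard output rather than from a hard-coded string.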

 The first column contains the number of crawls for a particular URL, and the
 second column contains the URL for the Internet Archive collection that added
 By default waybackprov will only look at the current year. If you would like it
 to examine a range of years use the `--start` and `--end` options:

-    % waybackprov --start 2016 --end 2018 https://twitter.com/EPAScottPruitt
+```shell
+waybackprov --start 2016 --end 2018 https://twitter.com/EPAScottPruitt
+```

 ## Multiple Pages

 If you would like to look at all URLs at a particular URL prefix you can use the
 `--prefix` option:

-    % waybackprov --prefix https://twitter.com/EPAScottPruitt
+```shell
+waybackprov --prefix https://twitter.com/EPAScottPruitt
+```

 This will use the Internet Archive's [CDX API](https://github.com/webrecorder/pywb/wiki/CDX-Server-API) to also include URLs that are extensions of the URL you supply, so it would include for example:

@@ -53,7 +69,9 @@ interested in is highly recommended since it prevents lots of lookups for CSS,
 JavaScript and image files that are components of the resource that was
 initially crawled.

-    % waybackprov --prefix --match 'status/\d+$' https://twitter.com/EPAScottPruitt
+```
+waybackprov --prefix --match 'status/\d+$' https://twitter.com/EPAScottPruitt
+```
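To get a feel for what the `status/\d+$` pattern selects, here is a small Python sketch. It rests on two assumptions: that the pattern is applied as a Python regular expression searched anywhere in the URL (as `re.search` does), and that URLs look like the hypothetical ones listed, which are made up for illustration.

```python
import re

# Pattern copied from the --match example above.
pattern = re.compile(r"status/\d+$")

# Hypothetical URLs, invented for illustration only.
urls = [
    "https://twitter.com/EPAScottPruitt/status/1234567890",
    "https://twitter.com/EPAScottPruitt/status/1234567890/photo/1",
    "https://twitter.com/EPAScottPruitt/media",
]

# Assumption: a URL is kept when the pattern is found anywhere in it.
kept = [u for u in urls if pattern.search(u)]
print(kept)
```

Because the pattern ends with `$`, only URLs that finish with `status/` followed by digits are kept; the photo and media URLs are filtered out.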

 ## Collections

@@ -78,12 +96,15 @@ rather than a summary.
 If you would like to see detailed information about what *waybackprov* is doing
 use the `--log` option to supply a file path to log to:

-    % waybackprov --log waybackprov.log https://example.com/
+```shell
+waybackprov --log waybackprov.log https://example.com/
+```

 ## Test

 If you would like to test it first install [pytest] and then:

-    pytest test.py
+    uv run pytest test.py

 [pytest]: https://docs.pytest.org/en/latest/
+[uv]: https://docs.astral.sh/uv/