[PyPI](https://pypi.org/project/waybacktweets) · [DOI](https://doi.org/10.5281/zenodo.12528447) · [Web app](https://waybacktweets.streamlit.app) · [Colab notebook](https://colab.research.google.com/drive/1tnaM3rMWpoSHBZ4P_6iHFPjraWRQ3OGe?usp=sharing)
Retrieves CDX data for archived tweets from the Wayback Machine, parses it (see [Field Options](https://claromes.github.io/waybacktweets/field_options.html)), and saves the results in CSV, JSON, and HTML formats. The HTML output embeds the tweets in iframe tags for easy viewing.
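
To make the retrieval step concrete, here is a minimal sketch of how a query against the Wayback Machine's public CDX endpoint could be assembled from the same kinds of options the CLI exposes. It is not taken from the package: the helper `build_cdx_url` and the tweet URL pattern are assumptions for illustration, and only the CDX parameter names (`url`, `output`, `collapse`, `from`, `to`, `limit`, `resumeKey`, `matchType`) come from the CDX server's documented API.

```python
from urllib.parse import urlencode

# Public CDX search endpoint of the Wayback Machine.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(username, collapse=None, date_from=None, date_to=None,
                  limit=None, resumption_key=None, matchtype=None):
    """Build a CDX query URL for a user's archived tweets (hypothetical helper)."""
    # Assumed URL pattern for tweet captures; the package may use a different one.
    params = {
        "url": f"https://twitter.com/{username}/status/*",
        "output": "json",
    }
    optional = {
        "collapse": collapse,         # e.g. "timestamp:4" compares the first 4 digits (the year)
        "from": date_from,            # YYYYmmdd
        "to": date_to,                # YYYYmmdd
        "limit": limit,
        "resumeKey": resumption_key,  # continue from the end of a previous query
        "matchType": matchtype,       # exact, prefix, host, or domain
    }
    # Only send the options that were actually provided.
    params.update({key: value for key, value in optional.items() if value is not None})
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


url = build_cdx_url("jack", collapse="timestamp:4",
                    date_from="20150101", date_to="20191231", limit=250)
print(url)
```

The function only builds the URL and performs no network request, so it can be inspected or logged before any call is made.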

@@ -11,21 +10,50 @@ Retrieves archived tweets CDX data from the Wayback Machine, performs necessary
 pip install waybacktweets
 ```
 
-## Quickstart
-
-### Using Wayback Tweets as a standalone command line tool
-
-waybacktweets [OPTIONS] USERNAME
+## CLI
 
 ```shell
-waybacktweets --from 20150101 --to 20191231 --limit 250 jack
+Usage: waybacktweets [OPTIONS] USERNAME
+
+  USERNAME: The Twitter username without @
+
+Options:
+  -c, --collapse [urlkey|digest|timestamp:XX]
+                                  Collapse results based on a field, or a
+                                  substring of a field. XX in the timestamp
+                                  value ranges from 1 to 14, comparing the
+                                  first XX digits of the timestamp field. It
+                                  is recommended to use from 4 onwards, to
+                                  compare at least by years.
+  -f, --from DATE                 Filtering by date range from this date.
+                                  Format: YYYYmmdd
+  -t, --to DATE                   Filtering by date range up to this date.
+                                  Format: YYYYmmdd
+  -l, --limit INTEGER             Query result limits.
+  -rk, --resumption_key TEXT      Allows for a simple way to scroll through
+                                  the results. Key to continue the query from
+                                  the end of the previous query.
+  -mt, --matchtype [exact|prefix|host|domain]
+                                  Results matching a certain prefix, a certain
+                                  host or all subdomains.
+  -v, --verbose                   Shows the log.
+  --version                       Show the version and exit.
+  -h, --help                      Show this message and exit.
+
+  Examples:
+
+  Retrieve all tweets: waybacktweets jack
+
+  With options and verbose output: waybacktweets --from 20200305 --to 20231231 --limit 300 --verbose jack
+
+  Documentation:
+
+  https://claromes.github.io/waybacktweets/
 ```
 
-### Using Wayback Tweets as a Web App
-
-[Open the application](https://waybacktweets.streamlit.app), a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
+## Module
 
-### Using Wayback Tweets as a Python Module
+[](https://colab.research.google.com/drive/1tnaM3rMWpoSHBZ4P_6iHFPjraWRQ3OGe?usp=sharing)
 
 ```python
 from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter
docs/conf.py: 1 addition & 1 deletion
@@ -5,7 +5,7 @@
 project = "Wayback Tweets"
 release, version = get_version("waybacktweets")
 rst_epilog = f".. |release| replace:: v{release}"
-copyright = f"2023 - {datetime.datetime.now().year}, Claromes · Icon by The Doodle Library · Title font by Google, licensed under the Open Font License · Pre-release: v{release}"  # noqa: E501
+copyright = f"2023 - {datetime.datetime.now().year}, Claromes · Icon by The Doodle Library · Title font by Google, licensed under the Open Font License · Release: v{release}"  # noqa: E501
 author = "Claromes"
 
 # -- General configuration ---------------------------------------------------
docs/field_options.rst: 2 additions & 0 deletions
@@ -40,3 +40,5 @@ The package performs several parses to facilitate the analysis of archived tweet
 - ``archived_digest``: (`str`) The ``SHA1`` hash digest of the content, excluding the headers. It's usually a base-32-encoded string.
 
 - ``archived_length``: (`int`) The compressed byte size of the corresponding WARC record, which includes WARC headers, HTTP headers, and content payload.
+
+- ``resumption_key``: (`str`) Allows for a simple way to scroll through the results. Key to continue the query from the end of the previous query.
docs/outputs.rst: 5 additions & 1 deletion
@@ -14,10 +14,14 @@ This format allows for easy viewing of the archived tweets, through the use of t
 
 - ``original_tweet_url``: (`str`) The original tweet URL.
 
-- ``parsed_tweet_url``: (`str`) The original tweet URL after parsing. Old URLs were archived in a nested manner. The parsing applied here unnests these URLs, when necessary. Check the :ref:`utils`.
+- ``parsed_tweet_url``: (`str`) The original tweet URL after parsing. Old URLs were archived in a nested manner. The parsing applied here unnests these URLs when necessary. Refer to the :ref:`utils` for more details.
 
 Additionally, other fields are displayed.
 
+.. note::
+
+    The iframes (accordions) are best viewed in Firefox.
waybacktweets --from 20150101 --to 20191231 --limit 250 jack
 
-Web App
--------------
-
-Using Wayback Tweets as a Streamlit Web App.
-
-`Open the application <https://waybacktweets.streamlit.app>`_, a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
-
 Module
 -------------
@@ -35,14 +28,34 @@ Using Wayback Tweets as a Python Module.
`Open the application <https://waybacktweets.streamlit.app>`_, a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.