Merged
1 change: 0 additions & 1 deletion .github/FUNDING.yml

This file was deleted.

3 changes: 3 additions & 0 deletions .gitignore
@@ -1,6 +1,9 @@
*.csv
*.json
*.html
*.txt

test.py

waybacktweets/__pycache__
waybacktweets/api/__pycache__
22 changes: 9 additions & 13 deletions CITATION.cff
@@ -14,29 +14,25 @@ authors:
identifiers:
- type: doi
value: 10.5281/zenodo.12528447
description: The concept DOI of the work.
description: Retrieves archived tweets from Wayback Machine in HTML, CSV, and JSON.
- type: url
value: "https://pypi.org/project/waybacktweets/"
description: Python Package Index.
- type: url
value: "https://claromes.github.io/waybacktweets/"
value: "https://waybacktweets.claromes.com/"
description: Documentation.
repository-code: "https://github.com/claromes/waybacktweets"
url: "https://claromes.github.io/waybacktweets"
url: "https://waybacktweets.claromes.com/"
abstract: >-
Retrieves archived tweets CDX data from the Wayback
Machine, performs necessary parsing, and saves the data in
HTML (for easy viewing of the tweets using the iframe
tag), CSV, and JSON formats.
Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing, and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.
keywords:
- Twitter
- Wayback Machine
- X
- Tweets
- Python
- Wayback Machine
- OSINT
- SOCMINT
- X
- Python
license: GPL-3.0
commit: 16f9997a8e2e2b87932ca061bf5731cd65d1d588
version: 1.0a5
date-released: "2024-06-24"
version: 1.0
date-released: "2025-05-26"
96 changes: 74 additions & 22 deletions README.md
@@ -1,31 +1,61 @@
# Wayback Tweets

[![PyPI](https://img.shields.io/pypi/v/waybacktweets)](https://pypi.org/project/waybacktweets) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.12528447.svg)](https://doi.org/10.5281/zenodo.12528447) [![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app) [![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tnaM3rMWpoSHBZ4P_6iHFPjraWRQ3OGe?usp=sharing)
[![PyPI](https://img.shields.io/pypi/v/waybacktweets)](https://pypi.org/project/waybacktweets) [![PyPI Downloads](https://static.pepy.tech/badge/waybacktweets)](https://pepy.tech/projects/waybacktweets)


Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://claromes.github.io/waybacktweets/field_options.html)), and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.
Retrieves archived tweets CDX data from the Wayback Machine, performs necessary parsing (see [Field Options](https://waybacktweets.claromes.com/field_options)), and saves the data in HTML, for easy viewing of the tweets using the iframe tags, CSV, and JSON formats.

## Installation

It is compatible with Python versions 3.10 and above. [See installation options](https://waybacktweets.claromes.com/installation).

```shell
pip install waybacktweets
pipx install waybacktweets
```

## Quickstart

### Using Wayback Tweets as a standalone command line tool

waybacktweets [OPTIONS] USERNAME
## CLI

```shell
waybacktweets --from 20150101 --to 20191231 --limit 250 jack
Usage:
waybacktweets [OPTIONS] USERNAME
USERNAME: The Twitter username without @

Options:
-c, --collapse [urlkey|digest|timestamp:xx]
Collapse results based on a field, or a
substring of a field. XX in the timestamp
value ranges from 1 to 14, comparing the
first XX digits of the timestamp field. It
is recommended to use from 4 onwards, to
compare at least by years.
-f, --from DATE Filtering by date range from this date.
Format: YYYYmmdd
-t, --to DATE Filtering by date range up to this date.
Format: YYYYmmdd
-l, --limit INTEGER Query result limits.
-rk, --resumption_key TEXT Allows for a simple way to scroll through
the results. Key to continue the query from
the end of the previous query.
-mt, --matchtype [exact|prefix|host|domain]
Results matching a certain prefix, a certain
host or all subdomains.
-v, --verbose Shows the log.
--version Show the version and exit.
-h, --help Show this message and exit.

Examples:
waybacktweets jack
waybacktweets --from 20200305 --to 20231231 --limit 300 --verbose jack

Repository:
https://github.com/claromes/waybacktweets

Documentation:
https://waybacktweets.claromes.com
```
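
The `--collapse timestamp:xx` option compares only the first XX digits of each capture's 14-digit timestamp. The following is a minimal sketch of that idea, not the CDX server's or the package's implementation: it assumes duplicates are dropped only among adjacent captures, and the function name `collapse_by_timestamp` and the sample timestamps are made up for illustration.

```python
from itertools import groupby


def collapse_by_timestamp(timestamps, digits):
    """Keep the first of each run of adjacent timestamps that share
    the same first `digits` characters (hypothetical helper)."""
    if not 1 <= digits <= 14:
        raise ValueError("digits must be between 1 and 14")
    # groupby only merges neighbouring items with an equal key,
    # which mirrors collapsing on adjacent capture lines.
    return [next(run) for _, run in groupby(timestamps, key=lambda ts: ts[:digits])]


# Illustrative capture timestamps in YYYYMMDDhhmmss form, already sorted.
captures = [
    "20150101120000",
    "20150615083000",
    "20160301000000",
    "20160301235959",
    "20170101000000",
]

print(collapse_by_timestamp(captures, 4))  # at most one capture per year
print(collapse_by_timestamp(captures, 8))  # at most one capture per day
```

With 4 digits, the two 2015 captures and the two 2016 captures each collapse to one entry. With 8 digits, only the two captures from 2016-03-01 are merged.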

### Using Wayback Tweets as a Web App

[Open the application](https://waybacktweets.streamlit.app), a prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.
## Module

### Using Wayback Tweets as a Python Module
[![Open In Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1tnaM3rMWpoSHBZ4P_6iHFPjraWRQ3OGe?usp=sharing)

```python
from waybacktweets import WaybackTweets, TweetsParser, TweetsExporter
@@ -37,29 +67,51 @@ archived_tweets = api.get()

if archived_tweets:
    field_options = [
        "archived_urlkey",
        "archived_timestamp",
        "original_tweet_url",
        "parsed_archived_timestamp",
        "archived_tweet_url",
        "parsed_archived_tweet_url",
        "original_tweet_url",
        "parsed_tweet_url",
        "available_tweet_text",
        "available_tweet_is_RT",
        "available_tweet_info",
        "archived_mimetype",
        "archived_statuscode",
        "archived_digest",
        "archived_length",
        "resumption_key",
    ]

    parser = TweetsParser(archived_tweets, USERNAME, field_options)
    parsed_tweets = parser.parse()

    exporter = TweetsExporter(parsed_tweets, USERNAME, field_options)
    exporter.save_to_csv()
    exporter.save_to_json()
    exporter.save_to_html()
```
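
The field list above includes both `archived_timestamp` and `parsed_archived_timestamp`, and the CLI expects dates as `YYYYmmdd`. As a rough sketch of how such values relate, the snippet below uses only the standard library and assumes the Wayback Machine's 14-digit `YYYYMMDDhhmmss` timestamp. The helper names are hypothetical, and the exact output format the package uses for its parsed field is not shown in this diff.

```python
from datetime import datetime


def parse_wayback_timestamp(timestamp: str) -> datetime:
    """Parse a 14-digit Wayback Machine timestamp (hypothetical helper)."""
    return datetime.strptime(timestamp, "%Y%m%d%H%M%S")


def to_cli_date(moment: datetime) -> str:
    """Format a datetime as YYYYmmdd, the shape --from and --to expect."""
    return moment.strftime("%Y%m%d")


archived = parse_wayback_timestamp("20150101123045")
print(archived.isoformat())   # 2015-01-01T12:30:45
print(to_cli_date(archived))  # 20150101
```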

## Web App

[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://waybacktweets.streamlit.app)

A prototype written in Python with the Streamlit framework and hosted on Streamlit Cloud.

Important: Starting from version 1.0, the web app will no longer receive all updates from the official package. To access all features, prefer using the package from PyPI.

## Documentation

- [Wayback Tweets documentation](https://claromes.github.io/waybacktweets)
- [Wayback CDX Server API (Beta) documentation](https://archive.org/developers/wayback-cdx-server.html)
- [Wayback Tweets documentation](https://waybacktweets.claromes.com/).
- [Wayback CDX Server API (Beta) documentation](https://archive.org/developers/wayback-cdx-server.html).

## Acknowledgements

- Tristan Lee (Bellingcat's Data Scientist) for the idea of the application.
- Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit/Snowflake team for the additional server resources on Streamlit Cloud.
- OSINT Community for recommending the application.
- Tristan Lee (Bellingcat's Data Scientist) for the idea.
- Jessica Smith (Snowflake's Community Growth Specialist) and Streamlit team for the additional server resources on Streamlit Cloud.
- OSINT Community for recommending the package and the application.

## License

> [!NOTE]
> If the Streamlit application is down, please check the [Streamlit Cloud Status](https://www.streamlitstatus.com/).
[GPL-3.0](LICENSE.md)