A command-line Python scraper that uses PRAW to extract Reddit comments for NLP and GPT model analysis. It collects submissions from a specified subreddit, processes the data, and saves the output into batched JSON files.
- Conda: This project uses `conda` for environment management. You can install it via Anaconda or Miniconda.
- Reddit & Google API Credentials: You will need API keys for both Reddit and any Google services you intend to use.
Follow these steps to set up the project environment using conda.
1. Clone the Repository

```bash
git clone <your-repository-url>
cd ask_reddit
```

2. Create and Activate the Conda Environment
The repository includes an `environment.yml` file that contains all the necessary dependencies. Run the following command from your terminal to create the environment:

```bash
conda env create -f environment.yml
```

Once the process is complete, activate the new environment:

```bash
conda activate ask_reddit
```

3. Configure Environment Variables
Create a file named `.env` in the project's root directory. This file stores your API keys and configuration outside of source control. Copy the following template into the `.env` file and add your credentials.
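For reference, a minimal sketch of how these variables can be consumed in Python. This assumes the variables are already in the process environment (for example, loaded from `.env` with the `python-dotenv` package); the project's actual configuration code may differ.

```python
import os

def load_reddit_config() -> dict:
    # Sketch only: reads the variables defined in the .env template.
    # In practice you would first populate the environment, e.g. with
    # `from dotenv import load_dotenv; load_dotenv()` (python-dotenv).
    return {
        "client_id": os.getenv("REDDIT_CLIENT_ID"),
        "client_secret": os.getenv("REDDIT_CLIENT_SECRET"),
        "user_agent": os.getenv("REDDIT_USER_AGENT"),
        "password": os.getenv("REDDIT_PASSWORD"),
    }

# These values would typically be handed to praw.Reddit(...) to build
# the API client; see the PRAW documentation for the exact auth flow.
```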
```ini
# .env file

# --- Reddit API Credentials ---
REDDIT_CLIENT_ID="YOUR_CLIENT_ID_HERE"
REDDIT_CLIENT_SECRET="YOUR_CLIENT_SECRET_HERE"
REDDIT_USER_AGENT="A_DESCRIPTIVE_USER_AGENT_STRING"
REDDIT_PASSWORD="YOUR_REDDIT_PASSWORD"

# --- Google Generative AI Configuration ---
GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY_HERE"
GENAI_MODEL="gemini-2.5-flash"

# --- File & Data Configuration ---
FILE_LOCATION="data/"
SOURCE="reddit"
```

Run the module from your terminal with the required arguments.
```bash
python -m ask_reddit --subreddit <name> --days <number> --batch <M|D>
```

- `--subreddit`: (Required) The name of the subreddit to scrape (e.g., `python`).
- `--days`: (Required) The number of days back from today to collect submissions.
- `--batch`: (Required) The batching mode for output files (`D` for daily, `M` for monthly).
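The interface above could be defined with `argparse` roughly as follows. This is a hypothetical sketch of the documented flags, not the project's actual parser:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI described above; the
    # project's real argument handling may differ.
    parser = argparse.ArgumentParser(prog="ask_reddit")
    parser.add_argument("--subreddit", required=True,
                        help="Name of the subreddit to scrape, e.g. 'python'.")
    parser.add_argument("--days", type=int, required=True,
                        help="Number of days back from today to collect submissions.")
    parser.add_argument("--batch", choices=["M", "D"], required=True,
                        help="Batching mode: D for daily, M for monthly output files.")
    return parser
```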
```bash
python -m ask_reddit --subreddit dataisbeautiful --days 30 --batch M
```

The script will generate JSON files inside the `data/` directory, which is created automatically if it does not exist.
- Daily Batching (`D`): `r_subreddit_YYYY-MM-DD.json`
- Monthly Batching (`M`): `r_subreddit_YYYY-MM.json`
This project is licensed under the terms of the MIT license. See LICENSE for more details.
```bibtex
@misc{ask-reddit,
  author       = {john-james-ai},
  title        = {A Python scraper using PRAW to extract Reddit comments for NLP and GPT model analysis},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/ask-reddit/ask-reddit}}
}
```