A Streamlit web application designed to fetch, analyze, and visualize posts and comments from Reddit subreddits. It helps understand community sentiment, key topics, and feedback patterns using data analysis and optional Large Language Model (LLM) integration.
The application employs a hybrid approach:
- It first attempts to fetch data using public scraping methods (via `requests` on `old.reddit.com`), which doesn't require authentication but might be less reliable or limited.
- It uses the PRAW (Python Reddit API Wrapper) library as a more robust fallback and for features like subreddit searching and specific comment sorting, which requires user-provided Reddit API credentials.
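
The scrape-first, PRAW-second order described above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: `fetch_posts_hybrid`, `scrape`, and `praw_fetch` are hypothetical names, and the two fetchers are passed in as plain callables so the fallback logic can be read (and tried) without network access. Treating an empty scrape result as a reason to fall back is also an assumption.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

Post = Dict[str, object]
# A fetcher takes (subreddit_name, max_posts) and returns a list of posts.
Fetcher = Callable[[str, int], List[Post]]


def fetch_posts_hybrid(
    subreddit: str,
    limit: int,
    scrape: Fetcher,
    praw_fetch: Optional[Fetcher] = None,
) -> List[Post]:
    """Try the unauthenticated scraper first, then fall back to PRAW."""
    try:
        posts = scrape(subreddit, limit)
        if posts:
            return posts
        logger.warning("Public scrape returned no posts for r/%s", subreddit)
    except Exception as exc:  # e.g. network errors, rate limits, parse failures
        logger.warning("Public scrape failed for r/%s: %s", subreddit, exc)

    if praw_fetch is None:
        # No Reddit credentials configured, so there is nothing to fall back to.
        return []
    return praw_fetch(subreddit, limit)
```

In a real setup, `scrape` would wrap a `requests` call and `praw_fetch` would wrap a PRAW client; passing `praw_fetch=None` models the case where no credentials were provided.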
- Subreddit Selection:
  - Specify target subreddit by Name (e.g., `learnpython`).
  - Specify target subreddit by URL (e.g., `https://www.reddit.com/r/learnpython`).
  - Search for subreddits by keyword (requires PRAW connection).
- Post Fetching:
  - Fetch posts based on sorting criteria (Hot, New, Top, Controversial) with time filters (Day, Week, Month, Year, All Time).
  - Filter posts by date range (e.g., Last 30 days, Last Year).
  - Specify the maximum number of posts to fetch.
- Comment Fetching:
  - Select fetched posts to retrieve their comments.
  - Specify maximum comments per post.
  - Utilizes PRAW (if connected) for reliable comment fetching and sorting (Top, New, Controversial, etc.). Falls back to public scraping.
- Data Display & Filtering:
  - View fetched posts and comments in interactive, sortable tables.
  - Filter displayed comments by keywords.
- Analysis & Visualization:
  - Sentiment Analysis: Calculates polarity, subjectivity (TextBlob), and compound sentiment (NLTK VADER) for posts and comments.
  - Sentiment Distribution: Displays a histogram showing the spread of comment sentiment scores.
  - Word Cloud: Generates a word cloud from the displayed comment text to highlight frequent terms.
- LLM Integration (Optional):
  - Connect to the Groq API for fast AI-powered analysis.
  - Select different analysis focuses (Overall Summary, Themes, Sentiment Detail, Pain Points, Praise, Actionable Insights).
  - Choose from various compatible Groq language models.
- Data Export:
  - Download fetched posts data as a CSV file.
  - Download displayed (potentially filtered) comments data as a CSV file.
- Caching: Caches API responses and LLM results to improve performance and reduce API calls. Cache can be cleared manually via the sidebar.
- Logging: Logs application activity and errors to `app.log` for debugging.
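
Since the Subreddit Selection feature accepts either a name or a URL, both inputs have to be reduced to the same bare subreddit name before fetching. The sketch below shows one way this could be done with the standard library only. `normalize_subreddit` is a hypothetical helper, not a function taken from the app, and the validation pattern is only an approximation of Reddit's naming rules.

```python
import re
from urllib.parse import urlparse

# Assumption: letters, digits, and underscores, 2 to 21 characters long.
_NAME_RE = re.compile(r"[A-Za-z0-9_]{2,21}")


def normalize_subreddit(value: str) -> str:
    """Return the bare subreddit name from a name, an r/ prefix, or a URL."""
    text = value.strip()
    if "://" in text:
        # Full URL: keep only the path, e.g. "/r/learnpython"
        text = urlparse(text).path
    # Pick the segment that follows "r/" at the start or after a slash.
    match = re.search(r"(?:^|/)r/([^/?#\s]+)", text)
    name = match.group(1) if match else text.strip("/")
    if not _NAME_RE.fullmatch(name):
        raise ValueError(f"Not a valid subreddit name: {value!r}")
    return name
```

With this helper, `learnpython`, `r/learnpython/`, and `https://www.reddit.com/r/learnpython` all resolve to the same name, while input such as `not a sub!` raises `ValueError` instead of being sent to Reddit.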
- Language: Python 3.9+
- Web Framework: Streamlit
- Reddit API: PRAW (Python Reddit API Wrapper)
- Public Scraping: Requests
- Data Handling: Pandas
- NLP/Sentiment: TextBlob, NLTK (VADER, Stopwords)
- Visualization: Plotly Express, Matplotlib, WordCloud
- LLM API: Groq Python SDK
- Environment: python-dotenv
1. Prerequisites:
   - Python 3.9 or higher installed.
   - Git installed and configured in your PATH (see Troubleshooting below if the `git` command fails).

2. Clone the Repository:

       git clone https://github.com/[Your GitHub Username]/reddit-voc-analyzer.git
       cd reddit-voc-analyzer

   (Replace `[Your GitHub Username]` with your actual username.)

3. (Recommended) Create and Activate a Virtual Environment:

       # Create environment (use python3 on macOS/Linux if needed)
       python -m venv venv
       # Activate (Windows PowerShell)
       .\venv\Scripts\Activate.ps1
       # Activate (Windows Command Prompt)
       .\venv\Scripts\activate.bat
       # Activate (macOS/Linux Bash/Zsh)
       source venv/bin/activate

4. Install Dependencies:

       pip install -r requirements.txt

5. Download NLTK Data: The application attempts to download necessary NLTK data (`vader_lexicon`, `stopwords`) on first run if not found. If this fails due to network issues or permissions, you can download them manually:

       python -m nltk.downloader vader_lexicon stopwords

6. Create and Configure `.env` File:
   - Copy the example file:

         # Windows
         copy .env.example .env
         # macOS/Linux
         cp .env.example .env

   - Edit the new `.env` file and fill in your credentials:
     - `REDDIT_CLIENT_ID`: Your Reddit script app's Client ID.
     - `REDDIT_CLIENT_SECRET`: Your Reddit script app's Secret.
     - `REDDIT_APP_NAME`: The exact name you gave your app on Reddit.
     - `REDDIT_USERNAME` (Optional): Your Reddit username (without `/u/`).
     - `GROQ_API_KEY` (Optional): Your API key from console.groq.com if you want to use LLM features.
   - Get Reddit Credentials: Create a new application of type 'script' on https://www.reddit.com/prefs/apps. You will find the Client ID (under the app name) and the Secret there.
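
Before starting the app, it can help to confirm that the `.env` values listed above are actually set. The following is a small sketch under stated assumptions: `missing_credentials` is a hypothetical helper, and the split into required and optional keys simply mirrors the "(Optional)" labels in the list above. It reads from `os.environ` by default, so it assumes the `.env` file has already been loaded into the environment (for example by python-dotenv).

```python
import os
from typing import List, Mapping

# Keys not marked "(Optional)" in the setup list are treated as required here.
REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_APP_NAME")
OPTIONAL_KEYS = ("REDDIT_USERNAME", "GROQ_API_KEY")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are absent or blank in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


def llm_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """LLM features only make sense when a Groq API key is present."""
    return bool(env.get("GROQ_API_KEY", "").strip())
```

An empty list from `missing_credentials()` means the PRAW-related values are at least present; it does not verify that Reddit accepts them.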
- Make sure your virtual environment (if created) is activated.
- Navigate to the project directory in your terminal.
- Run the Streamlit app:

      streamlit run app.py

- Open the URL shown in your terminal (usually `http://localhost:8501`) in your web browser.
- Use the sidebar to configure PRAW/Groq connections and select a subreddit to begin analysis.
- `git` command not found: See Step 1 of Setup. Ensure Git is installed and its `cmd` directory is in your system's PATH environment variable. Restart your terminal after making PATH changes.
- PRAW Connection Errors (401 Unauthorized): This almost always means your `REDDIT_CLIENT_ID` or `REDDIT_CLIENT_SECRET` in the `.env` file (or entered in the UI) is incorrect. Double-check them against https://www.reddit.com/prefs/apps. Also ensure the app type on Reddit is 'script'.
- PRAW Connection Errors (Other): Network issues, Reddit API downtime, or incorrect App Name might cause other errors. Check the status message in the app and the `app.log` file.
- NLTK Data Errors: If sentiment analysis or word clouds fail with `LookupError`, try running the manual NLTK download command from Step 5 of Setup.
- Other Errors: Check the `app.log` file in the project directory for detailed error messages and tracebacks.
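
When `app.log` grows long, a short script can pull out the most recent problem lines. This is only a sketch: `recent_errors` is a hypothetical helper, and the exact log format is not documented here, so it assumes error entries contain the standard `logging` level names (`ERROR`, `CRITICAL`) or the word `Traceback`.

```python
from collections import deque
from pathlib import Path
from typing import List

# Assumption: problem lines contain one of these markers.
MARKERS = ("ERROR", "CRITICAL", "Traceback")


def recent_errors(log_path: str = "app.log", tail: int = 200) -> List[str]:
    """Return lines with an error marker from the last `tail` lines of the log."""
    path = Path(log_path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final `tail` lines in memory.
        last_lines = deque(handle, maxlen=tail)
    return [
        line.rstrip("\n")
        for line in last_lines
        if any(marker in line for marker in MARKERS)
    ]
```

Running `print("\n".join(recent_errors()))` from the project directory would list the matching lines, which is usually enough to find where a traceback begins before opening the full file.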
- More sophisticated NLP analysis (Topic Modeling, NER).
- User authentication for saving settings (requires more complex setup).
- Support for analyzing user profiles or specific post URLs directly.
- More visualization options.
- Error handling improvements.
(If you make the repo public later, uncomment the MIT license badge at the top and add a LICENSE file with the MIT license text.)
- Tayeeb Khan
- GitHub: @followtayeeb
- LinkedIn: Tayeeb Khan