Reddit Voice of Customer (VOC) Analyzer

A Streamlit web application designed to fetch, analyze, and visualize posts and comments from Reddit subreddits. It helps understand community sentiment, key topics, and feedback patterns using data analysis and optional Large Language Model (LLM) integration.

The application employs a hybrid approach:

It first attempts to fetch data using public scraping methods (via requests on old.reddit.com) which doesn't require authentication but might be less reliable or limited.
It uses the PRAW (Python Reddit API Wrapper) library as a more robust fallback and for features like subreddit searching and specific comment sorting, which requires user-provided Reddit API credentials.

Key Features

Subreddit Selection:
- Specify target subreddit by Name (e.g., learnpython).
- Specify target subreddit by URL (e.g., https://www.reddit.com/r/learnpython).
- Search for subreddits by keyword (requires PRAW connection).
Post Fetching:
- Fetch posts based on sorting criteria (Hot, New, Top, Controversial) with time filters (Day, Week, Month, Year, All Time).
- Filter posts by date range (e.g., Last 30 days, Last Year).
- Specify the maximum number of posts to fetch.
Comment Fetching:
- Select fetched posts to retrieve their comments.
- Specify maximum comments per post.
- Utilizes PRAW (if connected) for reliable comment fetching and sorting (Top, New, Controversial, etc.). Falls back to public scraping.
Data Display & Filtering:
- View fetched posts and comments in interactive, sortable tables.
- Filter displayed comments by keywords.
Analysis & Visualization:
- Sentiment Analysis: Calculates polarity, subjectivity (TextBlob), and compound sentiment (NLTK VADER) for posts and comments.
- Sentiment Distribution: Displays a histogram showing the spread of comment sentiment scores.
- Word Cloud: Generates a word cloud from the displayed comment text to highlight frequent terms.
LLM Integration (Optional):
- Connect to the Groq API for fast AI-powered analysis.
- Select different analysis focuses (Overall Summary, Themes, Sentiment Detail, Pain Points, Praise, Actionable Insights).
- Choose from various compatible Groq language models.
Data Export:
- Download fetched posts data as a CSV file.
- Download displayed (potentially filtered) comments data as a CSV file.
Caching: Caches API responses and LLM results to improve performance and reduce API calls. Cache can be cleared manually via the sidebar.
Logging: Logs application activity and errors to app.log for debugging.

Tech Stack

Language: Python 3.9+
Web Framework: Streamlit
Reddit API: PRAW (Python Reddit API Wrapper)
Public Scraping: Requests
Data Handling: Pandas
NLP/Sentiment: TextBlob, NLTK (VADER, Stopwords)
Visualization: Plotly Express, Matplotlib, WordCloud
LLM API: Groq Python SDK
Environment: python-dotenv

Setup Instructions

Prerequisites:
- Python 3.9 or higher installed.
- Git installed and configured in your PATH (see Troubleshooting below if git command fails).

Clone the Repository:

git clone https://github.com/[Your GitHub Username]/reddit-voc-analyzer.git
cd reddit-voc-analyzer

(Replace [Your GitHub Username] with your actual username)

(Recommended) Create and Activate a Virtual Environment:

# Create environment (use python3 on macOS/Linux if needed)
python -m venv venv

# Activate (Windows PowerShell)
.\venv\Scripts\Activate.ps1

# Activate (Windows Command Prompt)
.\venv\Scripts\activate.bat

# Activate (macOS/Linux Bash/Zsh)
source venv/bin/activate

Install Dependencies:
```
pip install -r requirements.txt
```
Download NLTK Data: The application attempts to download necessary NLTK data (vader_lexicon, stopwords) on first run if not found. If this fails due to network issues or permissions, you can download them manually:
```
python -m nltk.downloader vader_lexicon stopwords
```
Create and Configure .env File:
- Copy the example file:
```
# Windows
copy .env.example .env
# macOS/Linux
cp .env.example .env
```
- Edit the new .env file and fill in your credentials:
  - REDDIT_CLIENT_ID: Your Reddit script app's Client ID.
  - REDDIT_CLIENT_SECRET: Your Reddit script app's Secret.
  - REDDIT_APP_NAME: The exact name you gave your app on Reddit.
  - REDDIT_USERNAME (Optional): Your Reddit username (without /u/).
  - GROQ_API_KEY (Optional): Your API key from console.groq.com if you want to use LLM features.
- Get Reddit Credentials: Create a new application of type 'script' on https://www.reddit.com/prefs/apps. You will find the Client ID (under the app name) and the Secret there.

Running the Application

Make sure your virtual environment (if created) is activated.
Navigate to the project directory in your terminal.
Run the Streamlit app:
```
streamlit run app.py
```
Open the URL shown in your terminal (usually http://localhost:8501) in your web browser.
Use the sidebar to configure PRAW/Groq connections and select a subreddit to begin analysis.

Troubleshooting

git command not found: See Step 1 of Setup - ensure Git is installed and its cmd directory is in your system's PATH environment variable. Restart your terminal after making PATH changes.
PRAW Connection Errors (401 Unauthorized): This almost always means your REDDIT_CLIENT_ID or REDDIT_CLIENT_SECRET in the .env file (or entered in the UI) is incorrect. Double-check them against https://www.reddit.com/prefs/apps. Also ensure the app type on Reddit is 'script'.
PRAW Connection Errors (Other): Network issues, Reddit API downtime, or incorrect App Name might cause other errors. Check the status message in the app and the app.log file.
NLTK Data Errors: If sentiment analysis or word clouds fail with LookupError, try running the manual NLTK download command from Step 5 of Setup.
Other Errors: Check the app.log file in the project directory for detailed error messages and tracebacks.

Future Enhancements (Ideas)

More sophisticated NLP analysis (Topic Modeling, NER).
User authentication for saving settings (requires more complex setup).
Support for analyzing user profiles or specific post URLs directly.
More visualization options.
Error handling improvements.

License

(If you make the repo public later, uncomment the MIT license badge at the top and add a LICENSE file with the MIT license text.)

Author

Tayeeb Khan
- GitHub: @followtayeeb
- LinkedIn: Tayeeb Khan

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.devcontainer		.devcontainer
pages		pages
tests		tests
.gitignore		.gitignore
app.py		app.py
config.py		config.py
fetcher.py		fetcher.py
readme.md		readme.md
requirements.txt		requirements.txt
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reddit Voice of Customer (VOC) Analyzer

Key Features

Tech Stack

Setup Instructions

Running the Application

Troubleshooting

Future Enhancements (Ideas)

License

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Reddit Voice of Customer (VOC) Analyzer

Key Features

Tech Stack

Setup Instructions

Running the Application

Troubleshooting

Future Enhancements (Ideas)

License

Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages