feat: add social media post analysis application (#1)

TorresjDev · web-flow · commit 6d4a46588246 · 2025-06-30T00:16:28.000-05:00
* feat: add social media post analysis application

- Initial project structure and README
- Setup for sentiment analysis and engagement metrics
- Multi-platform social media support planned

* files updated for project social media analyzer

* Remove socialmediapostdata.csv from tracking

- Remove CSV data file from Git repository
- Update .gitignore to prevent future tracking
- File is loaded directly from URL, no need to store locally

* refactor: update README and main.py for social media analysis application; enhance functionality and clarify instructions

* refactor: enhance README and main.py for social media analysis application; improve user instructions and add visualizations

* feat: implement dynamic author selection in social media analysis tool; enhance user interaction and visualization capabilities
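
Reviewer note on dynamic author selection: `main.py` filters on the raw `input()` string, so a typo or different capitalization would leave an empty frame and blank plots. Below is a minimal sketch of one way to validate the name first, using only the standard library. `resolve_author` is a hypothetical helper that is not part of this commit, and it assumes the list of known names would come from something like `post_data['author'].unique()`.

```python
import difflib


def resolve_author(requested, known_authors):
    """Return the dataset's spelling of `requested`, or raise ValueError with hints.

    Hypothetical helper for illustration; not part of the committed main.py.
    """
    # Map lower-cased names to their original spelling for case-insensitive lookup.
    by_lower = {name.lower(): name for name in known_authors}
    key = requested.strip().lower()
    if key in by_lower:
        return by_lower[key]
    # Offer up to three near matches so a typo yields a hint instead of an empty plot.
    hints = difflib.get_close_matches(key, list(by_lower), n=3, cutoff=0.6)
    suggestions = [by_lower[hint] for hint in hints]
    raise ValueError(f"Unknown author {requested!r}; close matches: {suggestions}")


if __name__ == "__main__":
    # Author names taken from the README examples.
    authors = ["jimmyfallon", "taylorswift13"]
    print(resolve_author("  JimmyFallon ", authors))
```

If adopted, the prompt could loop until `resolve_author` succeeds, and the returned name could then be used for both the `content_counts` and `post_data` filters.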
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,22 @@
+# This is a list of files that should be ignored by Git.
+# It includes files that are not necessary for version control, such as temporary files, logs, and environment files.
 
-.env
+# Environment file for storing sensitive information like API keys
+.env 
 
+# Stock data files
 stock-data-with-yfinance/apple_stock_1y_history.csv
+stock-data-with-yfinance/apple-1yr-stock-graph.png
+
+# Social media analysis files
+socialmediapostdata.csv
+main-v1.py
+main-template.py
+
+# Local text file kept out of version control
+git.txt
+
+#  vscode settings and extensions
+.vscode
+
 
-stock-data-with-yfinance/apple-1yr-stock-graph.png
diff --git a/README.md b/README.md
@@ -26,6 +26,12 @@ This collection showcases multiple Python-based projects for data analysis, visu
   A Python script that retrieves, analyzes, and visualizes 5 years of Apple Inc. stock data using yfinance and Plotly.
 - **Location:** `stock-data-with-yfinance/`
 
+### 📱 Social Media Post Analysis Application
+
+- **Description:**  
+  A powerful Python tool for analyzing social media posting patterns with interactive visualizations and dynamic celebrity engagement analytics. Features user-selectable author analysis, automated data processing, dual-mode visualizations (matplotlib & Plotly), word cloud generation, and comprehensive engagement metrics tracking.
+- **Location:** `social-media-analyzer/`
+
 ---
 
 ## 🚀 Getting Started
@@ -44,6 +50,7 @@ This collection showcases multiple Python-based projects for data analysis, visu
 Python-Programs/
 │
 ├── ai-chat-bot-google-gemini/    # AI chat bot using Google Gemini API
+├── social-media-analyzer/        # Social media post analysis with dynamic author selection
 ├── stock-data-app/               # Streamlit web app
 ├── stock-data-with-yfinance/     # Data analysis & visualization script
 └── README.md                     # This documentation
diff --git a/ai-chat-bot-google-gemini/README.md b/ai-chat-bot-google-gemini/README.md
@@ -91,7 +91,7 @@ ai-chat-bot-google-gemini/
 
 ## 🙏 Acknowledgements
 
-This project was created for educational purposes and as a demonstration of integrating Google Gemini's API with Python as an assignment for the CIDM 4310/5310 Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
+This project was created for educational purposes, as a demonstration of integrating Google Gemini's API with Python, for the Computer Information and Decision Management Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
 
 Special thanks to the open-source community and the authors of the libraries used!
 
diff --git a/social-media-analyzer/README.md b/social-media-analyzer/README.md
@@ -0,0 +1,129 @@
+# 📱 Social Media Post Analysis Application
+
+**A powerful Python tool for analyzing social media posting patterns with interactive visualizations and celebrity engagement analytics.**
+
+Welcome to the **Social Media Post Analysis Application**!  
+This comprehensive platform provides deep insights into social media behavior, featuring automated data processing, dual-mode visualizations, and professional-grade analytics perfect for marketers, researchers, and data enthusiasts.
+
+---
+
+## 🚀 Key Features
+
+- **Interactive User Interface**: Welcomes users with clear instructions and prompts for both CSV data URL input and author selection.
+- **Dynamic Author Selection**: Users can choose any author from the dataset for focused analysis, making it flexible for any celebrity or influencer.
+- **Flexible Data Source**: Accepts custom CSV URLs or uses default environment configuration for seamless data loading.
+- **Cached Data Download**: Downloads the CSV to `socialmediapostdata.csv` only when that file does not already exist, then loads it with pandas.
+- **Advanced Data Cleaning**: Removes unnecessary columns (country, id, language, latitude, longitude) for focused analysis.
+- **Multi-Author Time Series Analysis**: Converts timestamps and tracks posting patterns across different social media accounts.
+- **Dual Visualization Engine**: Creates both static matplotlib charts and interactive Plotly Express visualizations.
+- **Personalized Celebrity Analysis**: Deep-dive into any selected author's posting patterns, engagement metrics, and content themes.
+- **Engagement Analytics**: Analyzes likes and shares data with comparative trend visualization for the chosen author.
+- **Text Processing**: Generates word clouds from post content, filtering the default English stop words plus URL fragments (`https`, `co`, `t`).
+- **Professional User Experience**: Includes welcome messages, progress updates, and friendly conclusion messages.
+
+---
+
+## 📊 What This Application Does
+
+✅ **Interactive user onboarding** with welcome messages and clear instructions  
+✅ **Flexible data input** - accepts custom URLs or uses environment defaults  
+✅ **Smart data downloading** - fetches the CSV file only when no local copy exists  
+✅ **Comprehensive data cleaning** - removes geographic and metadata columns  
+✅ **Advanced time series processing** - converts datetime formats for analysis  
+✅ **Multi-author analytics** - tracks posting frequency across all social media accounts  
+✅ **Dynamic author selection** - user chooses which celebrity/influencer to analyze in detail  
+✅ **Personalized engagement metrics** - likes and shares analysis for the selected author  
+✅ **Static visualizations** - professional matplotlib charts with grid styling  
+✅ **Interactive charts** - Plotly Express with hover effects, zoom, and responsive design  
+✅ **Custom text analytics** - word cloud generation from the chosen author's content  
+✅ **User experience design** - progress updates and friendly conclusion messages
+
+---
+
+## 📂 Folder Structure
+
+```
+social-media-analyzer/
+│
+├── main.py             # Main analysis engine with interactive interface
+├── requirements.txt    # Python dependencies
+├── .env                # Environment variables (optional, user-created)
+└── README.md           # This documentation
+```
+
+---
+
+## 🎯 What Makes This Special
+
+This isn't just another data analysis script - it's a user-friendly social media insights platform designed with real-world usability in mind. The application features an intuitive command-line interface that guides users through the entire process, from data input to author selection to final visualizations.
+
+The dynamic author selection feature allows users to analyze any celebrity, influencer, or social media personality in the dataset. Whether you're interested in Jimmy Fallon's posting patterns, Taylor Swift's engagement metrics, or any other author's social media behavior, this tool adapts to your research needs - making it invaluable for social media managers, brand strategists, entertainment analysts, or academic researchers studying digital communication patterns.
+
+**Key differentiators:**
+
+- **Dynamic author selection** - analyze any personality in your dataset, not just pre-coded examples
+- **Environment-aware configuration** for seamless deployment
+- **Professional user experience** with guided prompts and status updates
+- **Dual-mode visualization** combining static publication-quality charts with interactive web-ready plots
+- **Flexible celebrity analytics** demonstrating real-world entertainment industry applications
+- **Personalized engagement metrics** tailored to your chosen author's posting frequency and audience interaction
+
+---
+
+## 🛠️ How to Run
+
+1. **Clone the repository**
+
+   ```bash
+   git clone https://github.com/TorresjDev/Python-Programs.git
+   cd Python-Programs/social-media-analyzer
+   ```
+
+2. **Set up environment (Optional)**
+
+   Create a `.env` file in the project directory:
+
+   ```env
+   CVS_URL=https://your-default-csv-url.com/data.csv
+   ```
+
+3. **Install dependencies**
+
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+4. **Run the analysis**
+
+   ```bash
+   python main.py
+   ```
+
+5. **Follow the interactive prompts**
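+   - Enter a CSV URL when prompted, or press Enter to use the default from your `.env` file (the URL is only used when `socialmediapostdata.csv` is not already present in the working directory)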
+   - Choose any author from the dataset for personalized analysis (e.g., 'jimmyfallon', 'taylorswift13', etc.)
+   - Enjoy the automated analysis and visualizations tailored to your selected author!
+
+---
+
+## 📚 References
+
+- [Pandas Documentation](https://pandas.pydata.org/docs/)
+- [Matplotlib Documentation](https://matplotlib.org/stable/contents.html)
+- [Plotly Express Documentation](https://plotly.com/python/plotly-express/)
+- [WordCloud Documentation](https://pypi.org/project/wordcloud/)
+- [python-dotenv Documentation](https://pypi.org/project/python-dotenv/)
+
+---
+
+## 🙏 Acknowledgements
+
+This application was developed by **Jesus Torres** utilizing modern data science tools and visualization libraries.
+
+Special thanks to the open-source community and the creators of Pandas, Matplotlib, Plotly, and WordCloud!
+
+---
+
+## 📝 License
+
+This project is for educational and demonstration purposes.
diff --git a/social-media-analyzer/main.py b/social-media-analyzer/main.py
@@ -0,0 +1,169 @@
+from wordcloud import WordCloud, STOPWORDS
+import csv
+import pandas as pd
+import matplotlib.pyplot as plt
+from dotenv import load_dotenv
+import plotly.express as px
+import requests
+import os
+
+load_dotenv()
+cvs_file = "socialmediapostdata.csv"
+cvs_url_key = "CVS_URL"
+
+# welcome message
+print("Welcome to the Social Media Analyzer!")
+print("-" * 75)
+print("This program will analyze social media posts data and visualize the number of posts per author per day.")
+print("Please follow the instructions to enter the URL of the CSV file containing the social media posts data.")
+print("-" * 75)
+
+# Prompt for the CSV URL; fall back to the CVS_URL environment variable when the input is empty
+cvs_url = input(
+    "Please enter the URL to the CSV file (or press Enter to use the default): ") or os.getenv(cvs_url_key)
+
+# Check if the CSV file exists, if not download it from the provided URL
+if not os.path.exists(cvs_file):
+    response = requests.get(cvs_url)
+    with open(cvs_file, "wb") as csv_file:
+        csv_file.write(response.content)
+
+# Load the dataset using pandas as pd
+post_data = pd.read_csv(cvs_file, encoding='utf-8')
+print(post_data)
+
+# Clean the dataset by removing unnecessary columns using the drop method from pandas
+post_data = post_data.drop(
+    ['country', 'id', 'language', 'latitude', 'longitude'], axis=1)
+# print(post_data)
+
+# Parse 'date_time' (format: day/month/year hour:minute) and keep only the calendar date so posts can be grouped by day
+post_data['date_time'] = pd.to_datetime(
+    post_data['date_time'], format='%d/%m/%Y %H:%M').dt.date
+print(post_data)
+
+# Calculate the daily number of posts created by each user
+content_counts = post_data.groupby(['author', 'date_time'])[
+    'content'].count().reset_index(name='content_count')
+print(content_counts)
+
+# Convert 'date_time' back to a datetime column so the plots treat the x-axis as dates
+content_counts['date_time'] = pd.to_datetime(content_counts['date_time'])
+
+#  Plot the number of posts per author per day calculated above using matplotlib.pyplot
+plt.figure(figsize=(12, 6))
+for author in content_counts['author'].unique():
+    author_data = content_counts[content_counts['author'] == author]
+    plt.plot(author_data['date_time'],
+             author_data['content_count'], marker='o', label=author)
+plt.title('Number of Posts per Author per Day')
+plt.xlabel('Date')
+plt.ylabel('Number of Posts')
+plt.legend(title='Author')
+plt.tight_layout()
+plt.show()
+
+# Plot the number of posts per author per day using Plotly Express
+fig = px.line(
+    content_counts,
+    x='date_time',
+    y='content_count',
+    color='author',
+    markers=True,
+    labels={'content_count': 'Number of Posts', 'date_time': 'Date'},
+    title='Number of Posts per Author per Day')
+fig.show()
+
+# Prompt for an author name, then filter the daily counts to that author
+author_name = input(
+    "Please enter the author name to filter (e.g., 'jimmyfallon'): ")
+author_data = content_counts[content_counts['author'] == author_name].copy()
+
+# Ensure that 'date_time' is in datetime format for plotting
+author_data['date_time'] = pd.to_datetime(author_data['date_time'])
+
+# Plot the number of posts by the selected author per day using matplotlib
+plt.figure(figsize=(12, 6))
+plt.plot(author_data['date_time'], author_data['content_count'],
+         label=author_name, linestyle='-', color='blue')
+plt.title(f'Number of Posts per Day for {author_name}')
+plt.xlabel('Date')
+plt.ylabel('Number of Posts')
+plt.grid(axis='y', linestyle='--')
+plt.tight_layout()
+plt.show()
+
+# Filter the DataFrame for the selected author's posts
+author_posts = post_data[post_data['author'] == author_name].copy()
+
+# Ensure that 'date_time' is in datetime format for plotting
+author_posts['date_time'] = pd.to_datetime(
+    author_posts['date_time'])
+
+# Extract the content of the selected author's posts
+author_content = author_posts['content']
+
+print(author_content)
+
+# Generate a word cloud from the selected author's posts content
+all_content = ' '.join(author_content.astype(str))
+# Extend the default stop words with URL fragments; build a new set because set.update() returns None
+updated_stop_words = STOPWORDS | {"https", "co", "t"}
+# Generate the word cloud using WordCloud from wordcloud library
+wordcloud = WordCloud(stopwords=updated_stop_words, width=800,
+                      height=400, background_color="white").generate(all_content)
+
+# Display the generated word cloud using matplotlib
+plt.figure(figsize=(10, 5))
+plt.imshow(wordcloud)
+plt.axis("off")
+plt.show()
+
+# Convert 'date_time' to datetime format for accurate plotting
+author_posts['date_time'] = pd.to_datetime(
+    author_posts['date_time'])
+
+print(author_posts)
+
+# Plot the daily number of likes and shares for the selected author using matplotlib
+plt.figure(figsize=(12, 6))
+plt.plot(author_posts['date_time'], author_posts['number_of_likes'],
+         label='Daily Likes', linestyle='-', color='blue')
+plt.plot(author_posts['date_time'], author_posts['number_of_shares'],
+         label='Daily Shares', linestyle='-', color='orange')
+plt.title(f'Daily Likes and Shares for {author_name}')
+plt.xlabel('Date')
+plt.ylabel('Count of Likes/Shares')
+plt.legend()
+plt.grid(axis='y', linestyle='--')
+plt.tight_layout()
+plt.show()
+
+
+# Plot the daily number of likes and shares for the selected author using Plotly Express
+fig = px.line(
+    author_posts,
+    x='date_time',
+    y=['number_of_likes', 'number_of_shares'],
+    labels={'date_time': 'Date', 'value': 'Count'},
+    title=f'Daily Likes and Shares for {author_name}',
+    markers=True,
+    color_discrete_map={'number_of_likes': 'blue',
+                        'number_of_shares': 'orange'}
+)
+fig.update_layout(
+    xaxis_title='Date',
+    yaxis_title='Count of Likes/Shares',
+    legend_title='Metrics'
+)
+fig.show()
+
+
+# ending message
+print("Thank you for using the Social Media Analyzer!")
+print("-" * 75)
+print("We hope you found the analysis and visualizations helpful.")
+print("Feel free to explore the data further or modify the code for your own analysis.")
+print("-" * 75)
+print("Goodbye!")
+print("-" * 75)
diff --git a/social-media-analyzer/requirements.txt b/social-media-analyzer/requirements.txt
@@ -0,0 +1,17 @@
+# Social Media Post Analysis - Project Requirements
+
+# Core Data Analysis (Required)
+pandas>=2.0.0
+
+# Data Visualization (Required)
+matplotlib>=3.7.0
+plotly>=5.15.0
+
+# Word Cloud Generation (For text analysis)
+wordcloud>=1.9.2
+
+# HTTP Requests (For downloading CSV data)
+requests>=2.31.0
+
+# Environment Variables (For loading CVS_URL from .env; imported in main.py as dotenv)
+python-dotenv>=1.0.0
diff --git a/stock-data-with-yfinance/README.md b/stock-data-with-yfinance/README.md
@@ -68,7 +68,8 @@ stock-data-with-yfinance/
 
 ## 🙏 Acknowledgements
 
-This program was adapted and modified by Jesus Torres as an assignment for the CIDM 4310/5310 Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
+This program was created for educational purposes
+for the Computer Information and Decision Management Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
 
 - The original codebase was provided as part of the course material.