Commit 6d4a465

feat: add social media post analysis application (#1)
* feat: add social media post analysis application
  - Initial project structure and README
  - Setup for sentiment analysis and engagement metrics
  - Multi-platform social media support planned
* files updated for project social media analyzer
* Remove socialmediapostdata.csv from tracking
  - Remove CSV data file from Git repository
  - Update .gitignore to prevent future tracking
  - File is loaded directly from URL, no need to store locally
* refactor: update README and main.py for social media analysis application; enhance functionality and clarify instructions
* refactor: enhance README and main.py for social media analysis application; improve user instructions and add visualizations
* feat: implement dynamic author selection in social media analysis tool; enhance user interaction and visualization capabilities
1 parent 8114435 commit 6d4a465

7 files changed

+343
-4
lines changed

.gitignore

Lines changed: 18 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,22 @@
1+
# This is a list of files that should be ignored by Git.
2+
# It includes files that are not necessary for version control, such as temporary files, logs, and environment files.
13

2-
.env
4+
# Environment file for storing sensitive information like API keys
5+
.env
36

7+
# Stock data files
48
stock-data-with-yfinance/apple_stock_1y_history.csv
9+
stock-data-with-yfinance/apple-1yr-stock-graph.png
10+
11+
# Social media analysis files
12+
socialmediapostdata.csv
13+
main-v1.py
14+
main-template.py
15+
16+
# Local notes file (not tracked)
17+
git.txt
18+
19+
# vscode settings and extensions
20+
.vscode
21+
522

6-
stock-data-with-yfinance/apple-1yr-stock-graph.png

README.md

Lines changed: 7 additions & 0 deletions
@@ -26,6 +26,12 @@ This collection showcases multiple Python-based projects for data analysis, visu
2626
A Python script that retrieves, analyzes, and visualizes 5 years of Apple Inc. stock data using yfinance and Plotly.
2727
- **Location:** `stock-data-with-yfinance/`
2828

29+
### 📱 Social Media Post Analysis Application
30+
31+
- **Description:**
32+
A powerful Python tool for analyzing social media posting patterns with interactive visualizations and dynamic celebrity engagement analytics. Features user-selectable author analysis, automated data processing, dual-mode visualizations (matplotlib & Plotly), word cloud generation, and comprehensive engagement metrics tracking.
33+
- **Location:** `social-media-analyzer/`
34+
2935
---
3036

3137
## 🚀 Getting Started
@@ -44,6 +50,7 @@ This collection showcases multiple Python-based projects for data analysis, visu
4450
Python-Programs/
4551
4652
├── ai-chat-bot-google-gemini/ # AI chat bot using Google Gemini API
53+
├── social-media-analyzer/ # Social media post analysis with dynamic author selection
4754
├── stock-data-app/ # Streamlit web app
4855
├── stock-data-with-yfinance/ # Data analysis & visualization script
4956
└── README.md # This documentation

ai-chat-bot-google-gemini/README.md

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ ai-chat-bot-google-gemini/
9191

9292
## 🙏 Acknowledgements
9393

94-
This project was created for educational purposes and as a demonstration of integrating Google Gemini's API with Python as an assignment for the CIDM 4310/5310 Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
94+
This project was created for educational purposes, as a demonstration of integrating Google Gemini's API with Python, for the Computer Information and Decision Management Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
9595

9696
Special thanks to the open-source community and the authors of the libraries used!
9797

social-media-analyzer/README.md

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
1+
# 📱 Social Media Post Analysis Application
2+
3+
**A powerful Python tool for analyzing social media posting patterns with interactive visualizations and celebrity engagement analytics.**
4+
5+
Welcome to the **Social Media Post Analysis Application**!
6+
This comprehensive platform provides deep insights into social media behavior, featuring automated data processing, dual-mode visualizations, and professional-grade analytics perfect for marketers, researchers, and data enthusiasts.
7+
8+
---
9+
10+
## 🚀 Key Features
11+
12+
- **Interactive User Interface**: Welcomes users with clear instructions and prompts for both CSV data URL input and author selection.
13+
- **Dynamic Author Selection**: Users can choose any author from the dataset for focused analysis, making it flexible for any celebrity or influencer.
14+
- **Flexible Data Source**: Accepts custom CSV URLs or uses default environment configuration for seamless data loading.
15+
- **Intelligent Data Pipeline**: Automatically downloads and processes social media datasets with built-in file existence checking.
16+
- **Advanced Data Cleaning**: Removes unnecessary columns (country, id, language, latitude, longitude) for focused analysis.
17+
- **Multi-Author Time Series Analysis**: Converts timestamps and tracks posting patterns across different social media accounts.
18+
- **Dual Visualization Engine**: Creates both static matplotlib charts and interactive Plotly Express visualizations.
19+
- **Personalized Celebrity Analysis**: Deep-dive into any selected author's posting patterns, engagement metrics, and content themes.
20+
- **Engagement Analytics**: Analyzes likes and shares data with comparative trend visualization for the chosen author.
21+
- **Advanced Text Processing**: Generates beautiful word clouds from post content with intelligent stop-word filtering.
22+
- **Professional User Experience**: Includes welcome messages, progress updates, and friendly conclusion messages.
23+
24+
---
25+
26+
## 📊 What This Application Does
27+
28+
- **Interactive user onboarding** with welcome messages and clear instructions
29+
- **Flexible data input** - accepts custom URLs or uses environment defaults
30+
- **Smart data downloading** - automatically fetches CSV files when needed
31+
- **Comprehensive data cleaning** - removes geographic and metadata columns
32+
- **Advanced time series processing** - converts datetime formats for analysis
33+
- **Multi-author analytics** - tracks posting frequency across all social media accounts
34+
- **Dynamic author selection** - user chooses which celebrity/influencer to analyze in detail
35+
- **Personalized engagement metrics** - likes and shares analysis for the selected author
36+
- **Static visualizations** - professional matplotlib charts with grid styling
37+
- **Interactive charts** - Plotly Express with hover effects, zoom, and responsive design
38+
- **Custom text analytics** - word cloud generation from the chosen author's content
39+
- **User experience design** - progress updates and friendly conclusion messages
40+
41+
---
42+
43+
## 📂 Folder Structure
44+
45+
```
46+
social-media-analyzer/
47+
48+
├── main.py # Main analysis engine with interactive interface
49+
├── requirements.txt # Python dependencies
50+
├── .env # Environment variables (optional, user-created)
51+
└── README.md # This documentation
52+
```
53+
54+
---
55+
56+
## 🎯 What Makes This Special
57+
58+
This isn't just another data analysis script - it's a user-friendly social media insights platform designed with real-world usability in mind. The application features an intuitive command-line interface that guides users through the entire process, from data input to author selection to final visualizations.
59+
60+
The dynamic author selection feature allows users to analyze any celebrity, influencer, or social media personality in the dataset. Whether you're interested in Jimmy Fallon's posting patterns, Taylor Swift's engagement metrics, or any other author's social media behavior, this tool adapts to your research needs - making it invaluable for social media managers, brand strategists, entertainment analysts, or academic researchers studying digital communication patterns.
61+
62+
**Key differentiators:**
63+
64+
- **Dynamic author selection** - analyze any personality in your dataset, not just pre-coded examples
65+
- **Environment-aware configuration** for seamless deployment
66+
- **Professional user experience** with guided prompts and status updates
67+
- **Dual-mode visualization** combining static publication-quality charts with interactive web-ready plots
68+
- **Flexible celebrity analytics** demonstrating real-world entertainment industry applications
69+
- **Personalized engagement metrics** tailored to your chosen author's posting frequency and audience interaction
70+
71+
---
72+
73+
## 🛠️ How to Run
74+
75+
1. **Clone the repository**
76+
77+
```bash
78+
git clone https://github.com/TorresjDev/Python-Programs.git
79+
cd Python-Programs/social-media-analyzer
80+
```
81+
82+
2. **Set up environment (Optional)**
83+
84+
Create a `.env` file in the project directory:
85+
86+
```env
87+
CVS_URL=https://your-default-csv-url.com/data.csv
88+
```
89+
90+
3. **Install dependencies**
91+
92+
```bash
93+
pip install -r requirements.txt
94+
```
95+
96+
4. **Run the analysis**
97+
98+
```bash
99+
python main.py
100+
```
101+
102+
5. **Follow the interactive prompts**
103+
- Enter a CSV URL when prompted, or press Enter to use the default from your `.env` file
104+
- Choose any author from the dataset for personalized analysis (e.g., 'jimmyfallon', 'taylorswift13', etc.)
105+
- Enjoy the automated analysis and visualizations tailored to your selected author!
106+
107+
---
108+
109+
## 📚 References
110+
111+
- [Pandas Documentation](https://pandas.pydata.org/docs/)
112+
- [Matplotlib Documentation](https://matplotlib.org/stable/contents.html)
113+
- [Plotly Express Documentation](https://plotly.com/python/plotly-express/)
114+
- [WordCloud Documentation](https://pypi.org/project/wordcloud/)
115+
- [python-dotenv Documentation](https://pypi.org/project/python-dotenv/)
116+
117+
---
118+
119+
## 🙏 Acknowledgements
120+
121+
This application was developed by **Jesus Torres** utilizing modern data science tools and visualization libraries.
122+
123+
Special thanks to the open-source community and the creators of Pandas, Matplotlib, Plotly, and WordCloud!
124+
125+
---
126+
127+
## 📝 License
128+
129+
This project is for educational and demonstration purposes.

social-media-analyzer/main.py

Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
1+
from wordcloud import WordCloud, STOPWORDS
2+
import csv
3+
import pandas as pd
4+
import matplotlib.pyplot as plt
5+
from dotenv import load_dotenv
6+
import plotly.express as px
7+
import requests
8+
import os
9+
10+
load_dotenv()
11+
cvs_file = "socialmediapostdata.csv"
12+
cvs_url_key = "CVS_URL"
13+
14+
# welcome message
15+
print("Welcome to the Social Media Analyzer!")
16+
print("-" * 75)
17+
print("This program will analyze social media posts data and visualize the number of posts per author per day.")
18+
print("Please follow the instructions to enter the URL of the CSV file containing the social media posts data.")
19+
print("-" * 75)
20+
21+
# Prompt for the CSV URL; fall back to the CVS_URL environment variable when the input is empty
22+
cvs_url = input(
23+
"Please enter the URL to the CSV file (or press Enter to use the default): ") or os.getenv(cvs_url_key)
24+
25+
# Check if the CSV file exists, if not download it from the provided URL
26+
if not os.path.exists(cvs_file):
27+
    response = requests.get(cvs_url)
28+
    with open(cvs_file, "wb") as csv_file:
29+
        csv_file.write(response.content)
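
The download step assumes that a URL is always available and that the request succeeds. A minimal sketch of a more defensive version follows. `resolve_csv_url` and `download_csv` are hypothetical helper names that do not appear in this commit, and the `http(s)` prefix check is an assumption about what counts as a usable URL.

```python
import os
from typing import Mapping


def resolve_csv_url(user_input: str,
                    env: Mapping[str, str] = os.environ,
                    key: str = "CVS_URL") -> str:
    """Return the typed URL, or the environment default when the input is empty."""
    url = user_input.strip() or env.get(key, "").strip()
    if not url:
        raise ValueError(f"No CSV URL entered and {key} is not set")
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"Not an http(s) URL: {url!r}")
    return url


def download_csv(url: str, destination: str, timeout: float = 30.0) -> None:
    """Download url to destination, failing loudly on HTTP errors."""
    import requests  # imported lazily so resolve_csv_url works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
    with open(destination, "wb") as csv_file:
        csv_file.write(response.content)
```

With helpers like these, pressing Enter without a configured default produces a clear error message rather than a failure inside `requests.get(None)`.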
30+
31+
# Load the dataset using pandas as pd
32+
post_data = pd.read_csv(cvs_file, encoding='utf-8')
33+
print(post_data)
34+
35+
# Clean the dataset by removing unnecessary columns using the drop method from pandas
36+
post_data = post_data.drop(
37+
['country', 'id', 'language', 'latitude', 'longitude'], axis=1)
38+
# print(post_data)
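
Because the CSV URL is user-supplied, a dataset may not contain every column named in the `drop` call, in which case pandas raises a `KeyError`. A small sketch, assuming it is acceptable to skip missing columns silently, uses the `errors="ignore"` option of `DataFrame.drop`. The sample frame is made up for illustration.

```python
import pandas as pd

# Toy frame that lacks 'language', 'latitude' and 'longitude'
posts = pd.DataFrame({
    "author": ["alice"],
    "content": ["hello"],
    "country": ["US"],
    "id": [1],
})

unused_columns = ["country", "id", "language", "latitude", "longitude"]

# errors="ignore" drops the columns that exist and skips the ones that do not
cleaned = posts.drop(columns=unused_columns, errors="ignore")
print(list(cleaned.columns))
```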
39+
40+
# Parse 'date_time' (day/month/year hour:minute) and keep only the calendar date so posts can be grouped by day
41+
post_data['date_time'] = pd.to_datetime(
42+
post_data['date_time'], format='%d/%m/%Y %H:%M').dt.date
43+
print(post_data)
44+
45+
# Calculate the daily number of posts created by each user
46+
content_counts = post_data.groupby(['author', 'date_time'])[
47+
'content'].count().reset_index(name='content_count')
48+
print(content_counts)
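
The date parsing and grouping steps can be checked on a tiny hand-made frame. This is an illustrative sketch: the author names and posts are invented, and only the column names and the `%d/%m/%Y %H:%M` format are taken from the script.

```python
import pandas as pd

posts = pd.DataFrame({
    "author": ["alice", "alice", "alice", "bob"],
    "date_time": ["01/02/2017 09:15", "01/02/2017 18:40",
                  "02/02/2017 07:05", "01/02/2017 12:00"],
    "content": ["first", "second", "third", "hello"],
})

# Parse day/month/year timestamps and keep only the calendar date
posts["date_time"] = pd.to_datetime(
    posts["date_time"], format="%d/%m/%Y %H:%M").dt.date

# One row per (author, day) with the number of posts on that day
daily_counts = (
    posts.groupby(["author", "date_time"])["content"]
    .count()
    .reset_index(name="content_count")
)
print(daily_counts)
```

Here the two posts by `alice` on 1 February 2017 collapse into a single row with a count of 2.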
49+
50+
# Convert 'date_time' back to a pandas datetime so both plotting libraries treat it as a time axis
51+
content_counts['date_time'] = pd.to_datetime(content_counts['date_time'])
52+
53+
# Plot the number of posts per author per day calculated above using matplotlib.pyplot
54+
plt.figure(figsize=(12, 6))
55+
for author in content_counts['author'].unique():
56+
    author_data = content_counts[content_counts['author'] == author]
57+
    plt.plot(author_data['date_time'],
58+
             author_data['content_count'], marker='o', label=author)
59+
plt.title('Number of Posts for Author per Day')
60+
plt.xlabel('Date')
61+
plt.ylabel('Number of Posts')
62+
plt.legend(title='Author')
63+
plt.tight_layout()
64+
plt.show()
65+
66+
# Plot the number of posts per author per day using Plotly Express
67+
fig = px.line(
68+
content_counts,
69+
x='date_time',
70+
y='content_count',
71+
color='author',
72+
markers=True,
73+
labels={'content_count': 'Number of Posts', 'date_time': 'Date'},
74+
title='Number of Posts per Author per Day')
75+
fig.show()
76+
77+
# Filter the DataFrame prompt user for input author
78+
author_name = input(
79+
"Please enter the author name to filter (e.g., 'jimmyfallon'): ")
80+
author_data = content_counts[content_counts['author'] == author_name].copy()
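
The filter is an exact string match, so a misspelled or differently cased name yields an empty frame and blank plots. A minimal sketch of a lookup that tolerates case and stray whitespace is shown here. `match_author` is a hypothetical helper, not part of this commit, and matching case-insensitively is an assumption about the desired behavior.

```python
from typing import Iterable, Optional


def match_author(requested: str, known_authors: Iterable[str]) -> Optional[str]:
    """Return the dataset's spelling of the requested author, or None if absent."""
    lookup = {name.casefold(): name for name in known_authors}
    return lookup.get(requested.strip().casefold())


authors = ["jimmyfallon", "taylorswift13"]
print(match_author(" JimmyFallon ", authors))
print(match_author("nobody", authors))
```

In the script, the list of known authors could come from `content_counts['author'].unique()`, and a `None` result could trigger a second prompt.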
81+
82+
# Ensure that 'date_time' is in datetime format for plotting
83+
author_data['date_time'] = pd.to_datetime(author_data['date_time'])
84+
85+
# Plot the number of posts by the selected author per day using matplotlib
86+
plt.figure(figsize=(12, 6))
87+
plt.plot(author_data['date_time'], author_data['content_count'],
88+
label=author_name, linestyle='-', color='blue')
89+
plt.title(f'Number of Posts per Day for {author_name}')
90+
plt.xlabel('Date')
91+
plt.ylabel('Number of Posts')
92+
plt.grid(axis='y', linestyle='--')
93+
plt.tight_layout()
94+
plt.show()
95+
96+
# Filter the DataFrame for the selected author's posts
97+
author_posts = post_data[post_data['author'] == author_name].copy()
98+
99+
# Ensure that 'date_time' is in datetime format for plotting
100+
author_posts['date_time'] = pd.to_datetime(
101+
author_posts['date_time'])
102+
103+
# Extract the content of the selected author's posts
104+
author_content = author_posts['content']
105+
106+
print(author_content)
107+
108+
# Generate a word cloud from the selected author's posts content
109+
all_content = ' '.join(author_content.astype(str))
110+
# Update the stop words to include common words that may not be useful in the word cloud
111+
updated_stop_words = STOPWORDS.union(["https", "co", "t"])
112+
# Generate the word cloud using WordCloud from wordcloud library
113+
wordcloud = WordCloud(stopwords=updated_stop_words, width=800,
114+
height=400, background_color="white").generate(all_content)
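
One detail worth noting for the stop-word step: `set.update` modifies a set in place and returns `None`, whereas `union` (or `|`) returns a new set and leaves the original untouched. The sketch below uses a small stand-in set rather than wordcloud's `STOPWORDS` (which is also a Python `set`) so that it runs without the library installed.

```python
# Stand-in for wordcloud.STOPWORDS, used only for illustration
base_stop_words = {"the", "and", "a"}

# update() mutates its receiver and returns None
update_result = set(base_stop_words).update(["https", "co", "t"])
print(update_result)

# union / | builds a new set, so the shared default stays unchanged
extended_stop_words = base_stop_words | {"https", "co", "t"}
print(sorted(extended_stop_words))
```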
115+
116+
# Display the generated word cloud using matplotlib
117+
plt.figure(figsize=(10, 5))
118+
plt.imshow(wordcloud)
119+
plt.axis("off")
120+
plt.show()
121+
122+
# Convert 'date_time' to datetime format for accurate plotting
123+
author_posts['date_time'] = pd.to_datetime(
124+
author_posts['date_time'])
125+
126+
print(author_posts)
127+
128+
# Plot the daily number of likes and shares for the selected author using matplotlib
129+
plt.figure(figsize=(12, 6))
130+
plt.plot(author_posts['date_time'], author_posts['number_of_likes'],
131+
label='Daily Likes', linestyle='-', color='blue')
132+
plt.plot(author_posts['date_time'], author_posts['number_of_shares'],
133+
label='Daily Shares', linestyle='-', color='orange')
134+
plt.title(f'Daily Likes and Shares for {author_name}')
135+
plt.xlabel('Date')
136+
plt.ylabel('Count of Likes/Shares')
137+
plt.legend()
138+
plt.grid(axis='y', linestyle='--')
139+
plt.tight_layout()
140+
plt.show()
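
The likes and shares plot draws one point per post, so a day with several posts contributes several points. If a single value per day is wanted, the posts can be summed by date before plotting. This is a sketch on invented numbers, and summing (rather than averaging) is an assumption about what "daily" should mean.

```python
import pandas as pd

# Toy data: two posts on the first day, one on the second
author_posts = pd.DataFrame({
    "date_time": pd.to_datetime(["2017-02-01", "2017-02-01", "2017-02-02"]),
    "number_of_likes": [10, 5, 7],
    "number_of_shares": [1, 2, 3],
})

# Sum likes and shares per calendar day
daily_engagement = (
    author_posts.groupby("date_time")[["number_of_likes", "number_of_shares"]]
    .sum()
    .reset_index()
)
print(daily_engagement)
```

The resulting frame has one row per day and can be passed to the same `plt.plot` or `px.line` calls.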
141+
142+
143+
# Plot the daily number of likes and shares for the selected author using Plotly Express
144+
fig = px.line(
145+
author_posts,
146+
x='date_time',
147+
y=['number_of_likes', 'number_of_shares'],
148+
labels={'date_time': 'Date', 'value': 'Count'},
149+
title=f'Daily Likes and Shares for {author_name}',
150+
markers=True,
151+
color_discrete_map={'number_of_likes': 'blue',
152+
'number_of_shares': 'orange'}
153+
)
154+
fig.update_layout(
155+
xaxis_title='Date',
156+
yaxis_title='Count of Likes/Shares',
157+
legend_title='Metrics'
158+
)
159+
fig.show()
160+
161+
162+
# ending message
163+
print("Thank you for using the Social Media Analyzer!")
164+
print("-" * 75)
165+
print("We hope you found the analysis and visualizations helpful.")
166+
print("Feel free to explore the data further or modify the code for your own analysis.")
167+
print("-" * 75)
168+
print("Goodbye!")
169+
print("-" * 75)

social-media-analyzer/requirements.txt

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
1+
# Social Media Post Analysis - Project Requirements
2+
3+
# Core Data Analysis (Required)
4+
pandas>=2.0.0
5+
6+
# Data Visualization (Required)
7+
matplotlib>=3.7.0
8+
plotly>=5.15.0
9+
10+
# Word Cloud Generation (For text analysis)
11+
wordcloud>=1.9.2
12+
13+
# HTTP Requests (For downloading CSV data)
14+
requests>=2.31.0
15+
16+
# Environment Variables (For loading the optional .env file)
17+
python-dotenv>=1.0.0

stock-data-with-yfinance/README.md

Lines changed: 2 additions & 1 deletion
@@ -68,7 +68,8 @@ stock-data-with-yfinance/
6868

6969
## 🙏 Acknowledgements
7070

71-
This program was adapted and modified by Jesus Torres as an assignment for the CIDM 4310/5310 Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
71+
This program was created for educational purposes as an assignment
72+
for the Computer Information and Decision Management Business Intelligence and Decision Support Systems course at West Texas A&M University, under the guidance of Dr. Cheng (Carl) Zhang.
7273

7374
- The original codebase was provided as part of the course material.
7475
