This project involves a comprehensive analysis of Amazon Prime’s movies and TV shows data using SQL. The goal is to extract valuable insights and answer various business questions based on the dataset. The following README provides a detailed account of the project's objectives, business problems, solutions, findings, and conclusions.
- Analyze the distribution of content types (movies vs TV shows).
- Identify the most common ratings for movies and TV shows.
- List and analyze content based on release years, countries, and durations.
- Explore and categorize content based on specific criteria and keywords.
Include details about the dataset used, such as:
amazon_prime_titles.csvfile- Source of the data (e.g., Kaggle or any public repository)
Below are 15 business questions addressed using SQL on the Amazon Prime Movies and TV Shows dataset.
SELECT
type,
COUNT(*) AS total_content
FROM amazon
GROUP BY type;SELECT DISTINCT type, rating
FROM (
SELECT
type,
rating,
COUNT(*),
RANK() OVER(PARTITION BY type ORDER BY COUNT(*) DESC) AS ranking
FROM amazon
GROUP BY 1, 2
) AS t1
WHERE ranking = 1;- 📌 List all "Movies" released in 2018
SELECT *
FROM amazon
WHERE type = 'Movie'
AND release_year = 2018;- 📌 Top 5 countries with the most content on Amazon
SELECT
UNNEST(STRING_TO_ARRAY(country, ',')) AS new_country,
COUNT(*) AS total_content
FROM amazon
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 5;- 📌 Identify the longest movie or TV show duration
SELECT *
FROM amazon
WHERE type = 'Movie'
AND duration = (SELECT MAX(duration) FROM amazon);- 📌 Find content added in the last 5 years
SELECT *
FROM amazon
WHERE TO_DATE(date_added, 'Month,dd,yyyy') >= CURRENT_DATE - INTERVAL '5 years';- 📌 Find all the content by director "Mark"
SELECT *
FROM amazon
WHERE director ILIKE '%Mark%';- 📌 List all TV shows with more than 2 seasons
SELECT *
FROM amazon
WHERE type = 'TV Show'
AND SPLIT_PART(duration, ' ', 1)::INT >= 2;- 📌 Count of content items in each genre
SELECT
UNNEST(STRING_TO_ARRAY(listed_in, ',')) AS genre,
COUNT(*)
FROM amazon
GROUP BY 1;- 📌 Top 5 years with highest average content released by India
SELECT
EXTRACT(YEAR FROM TO_DATE(date_added, 'Month,dd,yyyy')) AS year,
COUNT(*) AS yearly_content,
ROUND(
COUNT(*)::INT / (SELECT COUNT(*) FROM amazon WHERE country = 'India')::INT * 100, 2
) AS avg_release_per_year
FROM amazon
WHERE country = 'India'
GROUP BY 1;- 📌 List two movies that are Documentaries
SELECT *
FROM amazon
WHERE listed_in ILIKE '%Documentary%'
LIMIT 2;- 📌 Find all content without a director
SELECT *
FROM amazon
WHERE director IS NULL;- 📌 Movies where Salman Khan appeared in the last 10 years
SELECT *
FROM amazon
WHERE casts ILIKE '%Salman%'
AND release_year > EXTRACT(YEAR FROM CURRENT_DATE) - 10;- 📌 Top 10 actors in Indian-produced movies
SELECT
UNNEST(STRING_TO_ARRAY(casts, ',')) AS actors,
COUNT(*) AS total_content
FROM amazon
WHERE casts ILIKE '%ind%'
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;- 📌 Categorize content as "Good" or "Bad" based on keywords
WITH new_table AS (
SELECT *,
CASE
WHEN description ILIKE '%kill%' OR description ILIKE '%violence%' THEN 'Bad_Content'
ELSE 'Good_Content'
END AS category
FROM amazon
)
SELECT category, COUNT(*) AS total_content
FROM new_table
GROUP BY 1;As someone who recently started learning SQL and data analysis, I took on the task of exploring Amazon Prime's movie and TV show dataset to apply my skills in a practical way. Here are the key insights I discovered:
Content Distribution: The platform offers a wide variety of movies and TV shows. This helped me understand how content is distributed across different types and categories.
Common Ratings: I found the most frequent content ratings, which gave me an idea about the general audience Amazon Prime targets.
Geographical Insights: By analyzing content by country, especially India, I learned how regional trends impact content availability and production.
Content Categorization: Using keywords like "kill" and "violence," I practiced classifying content into categories like "Good" or "Bad," which helped me understand basic sentiment-style analysis.
