An automated system that fetches and maintains a list of the top 50 recent movies and series, updated daily via GitHub Actions.
This project automatically scrapes IMDB's "Most Popular Movies" and "Most Popular TV Shows" charts to create a combined ranking of the top 50 entertainment titles. The list is intended primarily for autobrr filter automation, and it updates daily at 6:15 AM UTC via GitHub Actions.
- Daily Updates: Automatically runs every day at 6:15 AM UTC
- No API Keys Required: Uses web scraping from public IMDB pages
- Combined List: Merges movies and TV shows into a single ranked list
- Content Filtering: Filter by release year and user rating for fresh, high-quality content
- Private Tracker Optimized: Default filters target recent, well-rated content ideal for autobrr
- Simple Output Format: Clean JSON format with just titles
- Configurable: Fully configurable via environment variables or .env file
- Type Safe: Built with Pydantic models and full type annotations
- Modular Architecture: Clean separation of concerns with dedicated classes
- GitHub Pages Deployment: Public URLs accessible via simple HTTP GET requests
- Robust Error Handling: Graceful failure with empty output on scraping errors
The main output file `top-list.json` contains a simple array of objects:

```json
[
  { "title": "Superman" },
  { "title": "Squid Game" },
  { "title": "Wicked" },
  { "title": "Wednesday" },
  { "title": "Moana 2" }
]
```

A detailed version, `top-list-detailed.json`, includes additional metadata:
```json
{
  "last_updated": "2025-01-08T15:30:00.000Z",
  "total_items": 50,
  "items": [
    {
      "title": "Superman",
      "type": "movie",
      "average_rating": 7.8
    }
  ]
}
```
1. Fork this repository to your GitHub account.

2. Enable GitHub Actions in your repository:
   - Go to the "Actions" tab in your repository
   - If prompted, click "I understand my workflows, go ahead and enable them"

3. Configure repository permissions:
   - Go to Settings → Actions → General
   - Under "Workflow permissions", select "Read and write permissions"
   - Check "Allow GitHub Actions to create and approve pull requests"
   - Click "Save"

4. Enable GitHub Pages:
   - Go to Settings → Pages
   - Under "Source", select "Deploy from a branch"
   - Choose the "gh-pages" branch and "/ (root)" folder
   - Click "Save"

5. Manual trigger (optional):
   - Go to the "Actions" tab
   - Click on "Update Top Movies and Series List"
   - Click "Run workflow" to test the setup
The generated JSON files are automatically deployed to GitHub Pages. Once you've set up the repository, the data will be available at:

- Simple list: `https://YOUR_USERNAME.github.io/YOUR_REPO_NAME/top-list.json`
- Detailed list: `https://YOUR_USERNAME.github.io/YOUR_REPO_NAME/top-list-detailed.json`
- Web interface: `https://YOUR_USERNAME.github.io/YOUR_REPO_NAME/`
You can fetch the data directly with any HTTP client:

```bash
# Get the simple list
curl https://YOUR_USERNAME.github.io/YOUR_REPO_NAME/top-list.json

# Get the detailed list
curl https://YOUR_USERNAME.github.io/YOUR_REPO_NAME/top-list-detailed.json
```

```javascript
// Fetch in JavaScript
fetch("https://YOUR_USERNAME.github.io/YOUR_REPO_NAME/top-list.json")
  .then((response) => response.json())
  .then((data) => console.log(data));
```

```python
# Fetch in Python
import requests

response = requests.get("https://YOUR_USERNAME.github.io/YOUR_REPO_NAME/top-list.json")
data = response.json()
```

The URLs always point to the latest data, with no authentication, downloading, or unpacking required.
The scraper is fully configurable via environment variables. Create a `.env` file in the root directory to customize settings:

```bash
# Copy the example configuration
cp .env.example .env
```

| Variable | Default | Description |
|---|---|---|
| `SCRAPER_MAX_MOVIES` | `50` | Maximum number of movies to fetch (1-100) |
| `SCRAPER_MAX_TV_SHOWS` | `50` | Maximum number of TV shows to fetch (1-100) |
| `SCRAPER_MAX_TOTAL_ITEMS` | `100` | Maximum total items in the final list (1-200) |
| `SCRAPER_MIN_YEAR` | last 5 years | Minimum release year filter (`None` to disable) |
| `SCRAPER_MAX_YEAR` | `None` | Maximum release year filter (`None` to disable) |
| `SCRAPER_MIN_RATING` | `6.0` | Minimum user rating filter (`None` to disable) |
| `SCRAPER_MAX_RATING` | `None` | Maximum user rating filter (`None` to disable) |
| `SCRAPER_REQUEST_TIMEOUT` | `15` | Request timeout in seconds (5-60) |
| `SCRAPER_REQUEST_DELAY` | `2.0` | Delay between requests in seconds (0.1-10.0) |
| `SCRAPER_SIMPLE_OUTPUT_FILE` | `top-list.json` | Simple output filename |
| `SCRAPER_DETAILED_OUTPUT_FILE` | `top-list-detailed.json` | Detailed output filename |
The scraper supports filtering content by release year and user rating so that only fresh, high-quality content is included, which makes it well suited to private tracker optimization:
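A minimal sketch of how this year/rating filtering might work, where a bound set to `None` is disabled just like the `SCRAPER_*` settings (the function and field names here are illustrative, not the project's actual code):

```python
from typing import Optional


def passes_filters(
    year: Optional[int],
    rating: Optional[float],
    min_year: Optional[int] = None,
    max_year: Optional[int] = None,
    min_rating: Optional[float] = None,
    max_rating: Optional[float] = None,
) -> bool:
    """Return True if an item satisfies the configured year/rating bounds."""
    if min_year is not None and (year is None or year < min_year):
        return False
    if max_year is not None and (year is None or year > max_year):
        return False
    if min_rating is not None and (rating is None or rating < min_rating):
        return False
    if max_rating is not None and (rating is None or rating > max_rating):
        return False
    return True


# Keep only recent, well-rated items (bounds roughly matching the defaults)
items = [
    {"title": "Superman", "year": 2025, "rating": 7.8},
    {"title": "Old Classic", "year": 1999, "rating": 8.9},
    {"title": "New But Poor", "year": 2024, "rating": 4.2},
]
kept = [
    i["title"]
    for i in items
    if passes_filters(i["year"], i["rating"], min_year=2021, min_rating=6.0)
]
print(kept)  # ['Superman']
```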
Filter Examples:

```bash
# Only very recent content (last 2 years)
SCRAPER_MIN_YEAR=2023
SCRAPER_MAX_YEAR=None

# High-quality content only
SCRAPER_MIN_RATING=7.5
SCRAPER_MAX_RATING=None

# Specific year range
SCRAPER_MIN_YEAR=2022
SCRAPER_MAX_YEAR=2024

# Rating range
SCRAPER_MIN_RATING=6.5
SCRAPER_MAX_RATING=8.0

# Disable all filters (get all popular content)
SCRAPER_MIN_YEAR=None
SCRAPER_MAX_YEAR=None
SCRAPER_MIN_RATING=None
SCRAPER_MAX_RATING=None
```

```bash
# Fetch more content
SCRAPER_MAX_MOVIES=40
SCRAPER_MAX_TV_SHOWS=40
SCRAPER_MAX_TOTAL_ITEMS=75

# Slower, more respectful scraping
SCRAPER_REQUEST_TIMEOUT=20
SCRAPER_REQUEST_DELAY=3.0

# Custom output files
SCRAPER_SIMPLE_OUTPUT_FILE=my-top-list.json
SCRAPER_DETAILED_OUTPUT_FILE=my-detailed-list.json
```

- Scraping: The script fetches data from IMDB's popular movies and TV shows charts
- Processing: Removes ranking numbers, extracts clean titles
- Combining: Interleaves movies and TV shows for variety
- Output: Generates both simple and detailed JSON files
- Automation: GitHub Actions deploys the files to GitHub Pages daily
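The combining step above can be sketched as a round-robin merge of the two lists (an illustration of the interleaving idea, not the project's exact implementation):

```python
from itertools import zip_longest


def interleave(movies: list[str], tv_shows: list[str], max_total: int) -> list[str]:
    """Alternate movies and TV shows for variety, then cap the list length."""
    _sentinel = object()  # marks exhausted positions when the lists differ in length
    merged = [
        title
        for pair in zip_longest(movies, tv_shows, fillvalue=_sentinel)
        for title in pair
        if title is not _sentinel
    ]
    return merged[:max_total]


top = interleave(["Superman", "Wicked", "Moana 2"], ["Squid Game", "Wednesday"], max_total=4)
print(top)  # ['Superman', 'Squid Game', 'Wicked', 'Wednesday']
```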
- IMDB Most Popular Movies: https://www.imdb.com/chart/moviemeter/
- IMDB Most Popular TV Shows: https://www.imdb.com/chart/tvmeter/
These lists are updated by IMDB based on user activity and page views, providing a good indicator of current popularity.
If scraping fails, the script will output empty arrays to ensure the files are always valid JSON. Check the GitHub Actions logs for error details.
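That fallback behavior can be approximated like this (a sketch assuming the simple output shape shown earlier; `write_outputs` and the demo filename are illustrative, not the project's actual code):

```python
import json
from pathlib import Path


def write_outputs(items: list[dict], simple_file: str = "top-list.json") -> None:
    """Write the list, falling back to an empty array so the file stays valid JSON."""
    try:
        payload = [{"title": item["title"]} for item in items]
    except (KeyError, TypeError):
        payload = []  # scraping produced malformed data; emit an empty but valid list
    Path(simple_file).write_text(json.dumps(payload, indent=2))


# Simulate a scraping failure that produced malformed items
write_outputs([{"name": "no title key"}], simple_file="demo-top-list.json")
print(json.loads(Path("demo-top-list.json").read_text()))  # []
```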
- Ensure GitHub Actions are enabled in your repository
- Check that workflow permissions are set to "Read and write"
- Verify the repository is not private (or you have GitHub Pro for private repos)
If you get a "Permission denied" error when the workflow tries to push to gh-pages:

1. Check repository settings:
   - Go to Settings → Actions → General
   - Under "Workflow permissions", select "Read and write permissions"
   - Save the settings

2. Verify the GitHub Pages configuration:
   - Go to Settings → Pages
   - Ensure "Deploy from a branch" is selected
   - Choose "gh-pages" as the source branch
   - The workflow will create this branch automatically on first run

3. Check repository ownership:
   - Make sure you have admin/write access to the repository
   - If it's a forked repository, you may need to enable Actions in your fork
The script includes delays between requests to be respectful to IMDB's servers. If you encounter rate limiting, you can increase the `SCRAPER_REQUEST_DELAY` setting.
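The delay amounts to something like the following (a sketch; reading `SCRAPER_REQUEST_DELAY` from the environment here is for illustration, and `fetch_all` is not the project's actual function):

```python
import os
import time


def fetch_all(urls: list[str], delay: float) -> list[str]:
    """Visit each URL in turn, sleeping `delay` seconds between requests."""
    fetched = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay)  # pause between requests to avoid hammering the server
        fetched.append(url)  # a real scraper would issue the HTTP GET here
    return fetched


# Mirror the SCRAPER_REQUEST_DELAY setting (default 2.0 seconds)
delay = float(os.environ.get("SCRAPER_REQUEST_DELAY", "2.0"))
charts = fetch_all(
    ["https://www.imdb.com/chart/moviemeter/", "https://www.imdb.com/chart/tvmeter/"],
    delay=delay,
)
print(charts)
```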
This project scrapes publicly available data from IMDB for personal/educational use. Please ensure compliance with:
- IMDB's terms of service
- Rate limiting (built into the script)
- Fair use principles
Feel free to submit issues or pull requests to improve the project. Some ideas:
- Add more data sources
- Improve error handling
- Add filtering options
- Include additional metadata
This project is open source and available under the MIT License.