A simple open-source tool for downloading dog images from Amazon's 404 error pages.
This project downloads dog images that appear on Amazon's error pages. These images are accessible via predictable URLs like:
- https://images-na.ssl-images-amazon.com/images/G/01/error/1._TTD_.jpg
- https://images-na.ssl-images-amazon.com/images/G/01/error/2._TTD_.jpg
- And so on...
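As a small illustration of the pattern above, the URLs can be generated from the image number alone. This is a minimal sketch: the helper name `image_url` is hypothetical and is not taken from the project's code.

```python
# Base URL copied from the example links above.
BASE_URL = "https://images-na.ssl-images-amazon.com/images/G/01/error/"


def image_url(number: int) -> str:
    """Return the error-page image URL for a 1-based image number."""
    if number < 1:
        raise ValueError("image numbers start at 1")
    return f"{BASE_URL}{number}._TTD_.jpg"


if __name__ == "__main__":
    # Print the first few URLs in the sequence.
    for n in (1, 2, 3):
        print(image_url(n))
```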
The project includes 200 pre-downloaded dog images ready to use.
git clone https://github.com/good-sellers/amazon-dogs.git
cd amazon-dogs

The downloaded images are located in data/cleaned_dogs/images/:
- 200 dog images (cleaned_dog_1.jpg to cleaned_dog_200.jpg)
- Index file: data/cleaned_dogs/cleaned_index.json
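
To confirm that a checkout contains all of the bundled images, the file names listed above can be checked in a few lines. This is only a sketch: the helpers `expected_images` and `missing_images` are hypothetical names, and it assumes the script is run from the repository root. The format of the index file is not described here, so it is not parsed.

```python
from __future__ import annotations

from pathlib import Path

# Directory and naming scheme as described above.
IMAGE_DIR = Path("data/cleaned_dogs/images")


def expected_images(image_dir: Path = IMAGE_DIR, count: int = 200) -> list[Path]:
    """Paths cleaned_dog_1.jpg .. cleaned_dog_<count>.jpg inside image_dir."""
    return [image_dir / f"cleaned_dog_{i}.jpg" for i in range(1, count + 1)]


def missing_images(image_dir: Path = IMAGE_DIR, count: int = 200) -> list[Path]:
    """Expected image paths that do not exist on disk."""
    return [p for p in expected_images(image_dir, count) if not p.is_file()]


if __name__ == "__main__":
    missing = missing_images()
    print(f"{200 - len(missing)} of 200 images present")
```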
You can download just the images from the repository without cloning the entire codebase.
The downloader works by:
- URL Pattern: Amazon uses a predictable URL pattern for error page images
  - Base URL: https://images-na.ssl-images-amazon.com/images/G/01/error/
  - File format: {number}._TTD_.jpg (starting from 1)
- Sequential Fetching: The crawler tries image URLs in sequence (1, 2, 3, 4...)
  - Waits 3 seconds between requests to avoid rate limiting
- Smart Stopping: Stops when either condition is met:
  - Reaches the max number of images (default: 1000)
  - Gets 100 consecutive 404 errors
- Image Storage: Successfully downloaded images are saved to data/dogs/ with an index file
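
The steps above can be sketched roughly as follows. This is not the code in dog_crawler.py; it is a minimal illustration under stated assumptions. The names `crawl` and `http_fetch` and the output file name `dog_{number}.jpg` are hypothetical. The sketch also assumes that the image limit counts saved images rather than attempted URLs. Writing the index file is left out because its format is not described here.

```python
from __future__ import annotations

import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Optional

BASE_URL = "https://images-na.ssl-images-amazon.com/images/G/01/error/"


def http_fetch(number: int) -> Optional[bytes]:
    """Download image `number`; return None if the server answers 404."""
    url = f"{BASE_URL}{number}._TTD_.jpg"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are not handled in this sketch


def crawl(
    fetch: Callable[[int], Optional[bytes]],
    out_dir: Path,
    max_images: int = 1000,
    max_misses: int = 100,
    delay: float = 3.0,
) -> list[int]:
    """Try images 1, 2, 3, ... and return the numbers that were saved."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: list[int] = []
    misses = 0  # consecutive 404 responses seen so far
    number = 1
    while len(saved) < max_images and misses < max_misses:
        data = fetch(number)
        if data is None:
            misses += 1
        else:
            misses = 0  # a hit resets the consecutive-404 counter
            # Hypothetical file name; the project's naming may differ.
            (out_dir / f"dog_{number}.jpg").write_bytes(data)
            saved.append(number)
        number += 1
        time.sleep(delay)  # pause between requests
    return saved
```

Under these assumptions, `crawl(http_fetch, Path("data/dogs"), max_images=5)` would fetch until five images are saved. Passing the fetch function as a parameter keeps the stopping logic testable without network access.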
If you want to download more images:
# Install dependencies
pip install -r requirements.txt
# Run the crawler
python dog_crawler.py

MIT License