Python CLI tool that downloads all photos and text content from an Instagram profile. Produces two outputs: a folder of highest-resolution photos (posts, stories, highlights) and a consolidated JSON file with all text content for LLM voice profiling.
- Python 3.12+
- `instaloader>=4.15` (only dependency; pulls in `requests` transitively)

Set up a virtual environment and install the dependency:

    python3.12 -m venv venv
    source venv/bin/activate
    pip install instaloader

Or using the venv directly (no activate needed):

    ./venv/bin/python instagram_scraper.py ...

First run (prompts for password, saves session for reuse):

    python instagram_scraper.py <target_username> --login <your_username>

Subsequent runs reuse the saved session automatically:

    python instagram_scraper.py <target_username> --login <your_username>

Quick test run:

    python instagram_scraper.py <target_username> --login <your_username> --max-posts 5 --skip-stories --skip-highlights --skip-comments

| Flag | Description |
|---|---|
| `target` | Instagram username to scrape (positional) |
| `--login` | Your Instagram username for authentication (required) |
| `--output-dir` | Base output directory (default: `instagram_scrape`) |
| `--max-posts` | Limit number of posts to scrape (default: all) |
| `--skip-stories` | Skip active story scraping |
| `--skip-highlights` | Skip highlight album scraping |
| `--skip-comments` | Skip owner comment scraping (faster) |
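
The flag table above maps naturally onto an `argparse` parser. The following is a minimal sketch of how such a parser could look, not the tool's actual source; the `build_parser` name, the help strings, and the use of `None` to mean "all posts" are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the flag table (illustrative sketch only)."""
    parser = argparse.ArgumentParser(
        prog="instagram_scraper.py",
        description="Download photos and text content from an Instagram profile.",
    )
    parser.add_argument("target", help="Instagram username to scrape")
    parser.add_argument("--login", required=True,
                        help="Your Instagram username for authentication")
    parser.add_argument("--output-dir", default="instagram_scrape",
                        help="Base output directory")
    # Assumption: None stands for "no limit", i.e. scrape all posts.
    parser.add_argument("--max-posts", type=int, default=None,
                        help="Limit number of posts to scrape")
    parser.add_argument("--skip-stories", action="store_true",
                        help="Skip active story scraping")
    parser.add_argument("--skip-highlights", action="store_true",
                        help="Skip highlight album scraping")
    parser.add_argument("--skip-comments", action="store_true",
                        help="Skip owner comment scraping")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["some_user", "--login", "my_user", "--max-posts", "5", "--skip-stories"]
)
print(args.target, args.max_posts, args.skip_stories, args.skip_highlights)
# prints: some_user 5 True False
```

Because `--login` is declared with `required=True`, omitting it makes `argparse` exit with a usage error, which matches the "required" note in the table.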

    instagram_scrape/<username>/
        <username>_content.json     # All text content for LLM ingestion
        photos/
            profile_pic.jpg
            posts/
                <shortcode>_01.jpg  # Single or carousel images
            stories/
                story_<date>_<id>.jpg
            highlights/
                <album_title>/
                    <date>_<id>.jpg

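
The layout above can be expressed with `pathlib`. This sketch is illustrative only: `output_paths` and `post_image_path` are hypothetical helpers, and the two-digit, 1-based image index is inferred from the `_01` suffix shown in the tree.

```python
from pathlib import Path


def output_paths(base_dir: str, username: str) -> dict[str, Path]:
    """Return the main output locations for one scraped profile."""
    root = Path(base_dir) / username
    photos = root / "photos"
    return {
        "root": root,
        "content_json": root / f"{username}_content.json",
        "profile_pic": photos / "profile_pic.jpg",
        "posts": photos / "posts",
        "stories": photos / "stories",
        "highlights": photos / "highlights",
    }


def post_image_path(posts_dir: Path, shortcode: str, index: int) -> Path:
    """Name a post image <shortcode>_NN.jpg (assumed: NN is the 1-based position)."""
    return posts_dir / f"{shortcode}_{index:02d}.jpg"


paths = output_paths("instagram_scrape", "example_user")
print(paths["content_json"].as_posix())
# prints: instagram_scrape/example_user/example_user_content.json
print(post_image_path(paths["posts"], "ABC123", 1).as_posix())
# prints: instagram_scrape/example_user/photos/posts/ABC123_01.jpg
```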
The JSON file includes: bio, captions, hashtags, mentions, the target user's own comments, story/highlight captions, and an aggregated text_summary section designed for LLM voice profiling.
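
A downstream script might read the consolidated file roughly as follows. This sketch assumes `text_summary` is a top-level key of a JSON object, which the description above does not confirm; `load_text_summary` is a hypothetical helper and the demo file is a stand-in, not real output.

```python
import json
import tempfile
from pathlib import Path


def load_text_summary(content_path: Path):
    """Return the text_summary section, or None if the key is absent."""
    with content_path.open(encoding="utf-8") as fh:
        content = json.load(fh)
    # Assumption: the file holds a JSON object with a top-level "text_summary" key.
    return content.get("text_summary")


# Demo against a stand-in file; the real file's fields are not documented here.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "example_user_content.json"
    demo.write_text(json.dumps({"text_summary": "placeholder"}), encoding="utf-8")
    summary = load_text_summary(demo)

print(summary)
# prints: placeholder
```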
- Authentication required - stories, highlights, and comments need a logged-in session
- 2FA supported - prompts for code when needed; codes are one-time
- Resume support - re-running skips already-downloaded photos
- Ctrl+C safe - saves partial JSON before exiting
- Photos only - videos are skipped
- Rate limiting - large profiles (500+ posts) may take a while; don't use Instagram in another tab during scraping
- Session files are saved to `~/.config/instaloader/` and contain auth tokens - don't share them
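
The resume and Ctrl+C behaviour described in the notes could be implemented along these lines. This is a sketch under assumptions, not the tool's code: `scrape_with_resume`, the `fetch` callback, the `.part` temporary suffix, and the shape of the saved JSON are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def scrape_with_resume(
    items: Iterable[tuple[str, str]],
    photo_dir: Path,
    json_path: Path,
    fetch: Callable[[str], bytes],
) -> dict:
    """Download (filename, url) pairs, skipping files that already exist.

    `fetch` is a hypothetical downloader supplied by the caller. The JSON
    summary is written even when the run is interrupted with Ctrl+C.
    """
    photo_dir.mkdir(parents=True, exist_ok=True)
    result = {"downloaded": [], "skipped": [], "interrupted": False}
    try:
        for filename, url in items:
            dest = photo_dir / filename
            if dest.exists():
                # Resume support: a file already on disk is not fetched again.
                result["skipped"].append(filename)
                continue
            # Write to a temporary name first so an interrupted download
            # never leaves a truncated file under the final name.
            tmp = dest.with_name(dest.name + ".part")
            tmp.write_bytes(fetch(url))
            tmp.replace(dest)
            result["downloaded"].append(filename)
    except KeyboardInterrupt:
        # Ctrl+C: record that the run was cut short, then fall through to the save.
        result["interrupted"] = True
    finally:
        json_path.write_text(json.dumps(result, indent=2), encoding="utf-8")
    return result
```

Calling the function a second time with the same `items` skips every file that finished on the first pass, so an interrupted run picks up where it left off.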