
AndrewQRobb/instascraper

InstaScraper

A Python CLI tool that downloads all photos and text content from an Instagram profile. It produces two outputs: a folder of highest-resolution photos (posts, stories, highlights) and a consolidated JSON file containing all text content, intended for LLM voice profiling.

Requirements

  • Python 3.12+
  • instaloader>=4.15 (only dependency; pulls in requests transitively)

Setup

python3.12 -m venv venv
source venv/bin/activate
pip install "instaloader>=4.15"

Or call the venv's interpreter directly (no activation needed):

./venv/bin/python instagram_scraper.py ...

Usage

First run (prompts for password, saves session for reuse):

python instagram_scraper.py <target_username> --login <your_username>

Subsequent runs reuse the saved session automatically:

python instagram_scraper.py <target_username> --login <your_username>

Quick test run:

python instagram_scraper.py <target_username> --login <your_username> --max-posts 5 --skip-stories --skip-highlights --skip-comments

Options

Flag               Description
target             Instagram username to scrape (positional)
--login            Your Instagram username for authentication (required)
--output-dir       Base output directory (default: instagram_scrape)
--max-posts        Limit number of posts to scrape (default: all)
--skip-stories     Skip active story scraping
--skip-highlights  Skip highlight album scraping
--skip-comments    Skip owner comment scraping (faster)
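
For reference, the flags above could be declared with argparse roughly as follows. This is a minimal sketch, not the script's actual source: the function name build_parser, the help strings, and the choice of None to represent "all posts" are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(prog="instagram_scraper.py")
    parser.add_argument("target", help="Instagram username to scrape")
    parser.add_argument("--login", required=True,
                        help="Your Instagram username for authentication")
    parser.add_argument("--output-dir", default="instagram_scrape",
                        help="Base output directory")
    # Assumption: None stands for "no limit" (scrape all posts).
    parser.add_argument("--max-posts", type=int, default=None,
                        help="Limit number of posts to scrape")
    parser.add_argument("--skip-stories", action="store_true",
                        help="Skip active story scraping")
    parser.add_argument("--skip-highlights", action="store_true",
                        help="Skip highlight album scraping")
    parser.add_argument("--skip-comments", action="store_true",
                        help="Skip owner comment scraping")
    return parser


# Parse an argument list equivalent to a shortened test run.
args = build_parser().parse_args(
    ["some_profile", "--login", "my_account", "--max-posts", "5", "--skip-stories"]
)
print(args)
```

Note that argparse exposes dashed flags with underscores, so --max-posts is read as args.max_posts and --output-dir as args.output_dir.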

Output

instagram_scrape/<username>/
  <username>_content.json     # All text content for LLM ingestion
  photos/
    profile_pic.jpg
    posts/
      <shortcode>_01.jpg      # Single or carousel images
    stories/
      story_<date>_<id>.jpg
    highlights/
      <album_title>/
        <date>_<id>.jpg
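
To work with this layout from other scripts, the paths can be built with pathlib. The helpers below are a sketch based only on the tree above: the function names are hypothetical, and the two-digit, zero-padded image index is inferred from the <shortcode>_01.jpg pattern rather than confirmed by the script.

```python
from pathlib import Path


def content_json_path(base_dir: str, username: str) -> Path:
    """Path of the consolidated JSON file for one profile."""
    return Path(base_dir) / username / f"{username}_content.json"


def post_photo_path(base_dir: str, username: str, shortcode: str, index: int) -> Path:
    """Path of the index-th image of a post (assumed 1-based, zero-padded)."""
    return (Path(base_dir) / username / "photos" / "posts"
            / f"{shortcode}_{index:02d}.jpg")


def highlight_photo_path(base_dir: str, username: str, album_title: str,
                         date: str, item_id: str) -> Path:
    """Path of one highlight image inside its album folder."""
    return (Path(base_dir) / username / "photos" / "highlights"
            / album_title / f"{date}_{item_id}.jpg")


print(post_photo_path("instagram_scrape", "alice", "Cxyz123", 1).as_posix())
```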

The JSON file includes: bio, captions, hashtags, mentions, the target user's own comments, story/highlight captions, and an aggregated text_summary section designed for LLM voice profiling.
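
A downstream script might read the text_summary section like this. It is a minimal sketch: only the key name text_summary and the file location come from this README. The internal shape of that section is not documented here, so the demo uses a stand-in file with placeholder content, and load_text_summary is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path


def load_text_summary(base_dir, username):
    """Return the text_summary section, or None if the key is absent."""
    path = Path(base_dir) / username / f"{username}_content.json"
    with path.open(encoding="utf-8") as fh:
        content = json.load(fh)
    return content.get("text_summary")


# Demo against a stand-in file; a real file is written by the scraper.
with tempfile.TemporaryDirectory() as tmp:
    profile_dir = Path(tmp) / "alice"
    profile_dir.mkdir()
    (profile_dir / "alice_content.json").write_text(
        json.dumps({"text_summary": {"placeholder": True}}), encoding="utf-8"
    )
    summary = load_text_summary(tmp, "alice")

print(summary)
```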

Notes

  • Authentication required - stories, highlights, and comments need a logged-in session
  • 2FA supported - prompts for code when needed; codes are one-time
  • Resume support - re-running skips already-downloaded photos
  • Ctrl+C safe - saves partial JSON before exiting
  • Photos only - videos are skipped
  • Rate limiting - large profiles (500+ posts) may take a while; avoid using Instagram in another browser tab while a scrape is running
  • Session files are saved to ~/.config/instaloader/ and contain auth tokens - don't share them
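
The resume and Ctrl+C behaviour noted above can be pictured with a small pattern like the one below. This is an illustrative sketch, not the script's code: download_all, the fetch callable, and the JSON keys "posts" and "complete" are all hypothetical names chosen for the example.

```python
import json
from pathlib import Path


def download_all(items, photo_dir, fetch, json_path):
    """Download photos, skipping existing files; always save JSON on exit.

    items: iterable of (filename, caption) pairs.
    fetch: hypothetical callable taking a filename and returning image bytes.
    Returns True if every item was processed, False if interrupted.
    """
    photo_dir = Path(photo_dir)
    photo_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    complete = False
    try:
        for filename, caption in items:
            target = photo_dir / filename
            if not target.exists():  # resume: keep files already on disk
                target.write_bytes(fetch(filename))
            collected.append({"file": filename, "caption": caption})
        complete = True
    except KeyboardInterrupt:
        pass  # Ctrl+C: fall through and save what was gathered so far
    finally:
        Path(json_path).write_text(
            json.dumps({"posts": collected, "complete": complete}, indent=2),
            encoding="utf-8",
        )
    return complete
```

Writing the JSON in a finally clause means the partial result is saved whether the loop ends normally, is interrupted with Ctrl+C, or fails with another error, which is then re-raised after the file is written.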

About

Python tool for downloading photos from public Instagram profiles
