Torrafox/e621-metadata-extractor

e621 Metadata Fetcher and Extractor

Overview

e621 Metadata Fetcher and Extractor is a tool for downloading and processing database exports from e621. It matches local files against the e621 data export and generates a clean CSV file containing the relevant metadata.

This tool is useful for managing local galleries by providing metadata such as tags, ratings, descriptions, and more.

Features

  • Fetch the latest database exports from e621.
  • Process local files in a directory by matching their MD5 checksums against the data export.
  • Extract relevant metadata (e.g., tags, ratings, URLs) into a CSV file and, optionally, a JSON file.
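
The MD5 matching step can be illustrated with a minimal sketch. This is not the project's own code: the helper names md5_of_file and md5_index are hypothetical, and scanning subfolders recursively is an assumption.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large media files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_index(directory):
    """Map each file's MD5 digest to its path (recursive scan is an assumption)."""
    return {md5_of_file(p): p for p in Path(directory).rglob("*") if p.is_file()}
```

The resulting digests are what would be compared against the checksums listed in the data export.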

Installation

Prerequisites

  • Python 3.8 or later
  • Stable internet connection

One-liner Installation Command:

For Linux/macOS:

git clone https://github.com/Torrafox/e621-metadata-extractor.git && cd e621-metadata-extractor && python -m venv venv && source venv/bin/activate && pip install -e .

For Windows:

git clone https://github.com/Torrafox/e621-metadata-extractor.git && cd e621-metadata-extractor && python -m venv venv && venv\Scripts\activate && pip install -e .

Manual Steps (Optional):

  1. Clone the Repository:

    git clone https://github.com/Torrafox/e621-metadata-extractor.git
    cd e621-metadata-extractor
  2. Set Up Virtual Environment:

    python -m venv venv
  3. Activate the Virtual Environment:

    • For Linux/macOS:
      source venv/bin/activate
    • For Windows:
      venv\Scripts\activate
  4. Install Dependencies:

    pip install -r requirements.txt

Configuration

Edit config.json in the project directory:

{
  "data_directory": "/path/to/your/e621/media/folder",
  "export_json": false
}
  • data_directory: Path to the local gallery directory to process.
  • export_json: Whether to export the extracted metadata to a JSON file (in addition to CSV).
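
If you want to sanity-check the file before a long run, a small loader along these lines may help. It is only a sketch: load_config is a hypothetical helper, and treating a missing export_json as false is an assumption rather than documented behavior.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Load config.json and check the two documented keys."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)

    data_directory = config.get("data_directory")
    if not data_directory or not Path(data_directory).is_dir():
        raise ValueError(f"data_directory is missing or not a folder: {data_directory!r}")

    # Assumption: a missing export_json means "CSV only".
    config["export_json"] = bool(config.get("export_json", False))
    return config
```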

Usage

Standalone Execution

You can run the tool directly to fetch and process metadata:

python main.py

By default, the script:

  1. Downloads the latest database exports (posts and tags) from e621.
  2. Processes the directory set as data_directory in config.json.
  3. Outputs the extracted metadata to e621_metadata.csv (and to a JSON file if export_json is enabled).

As a Library

The repository can also be used as a library in other Python projects. Import the necessary functions:

from e621_metadata_extractor.fetcher import get_latest_dump_urls, download_file
from e621_metadata_extractor.extractor import process_directory

# Example usage: look up the latest export URLs and download both dumps
dump_urls = get_latest_dump_urls()
download_file(dump_urls.get("posts"), "posts_dump.csv.gz")
download_file(dump_urls.get("tags"), "tags_dump.csv.gz")

# Match the local files against the dumps and write the results to output.csv
process_directory("/path/to/e621/media/folder", "posts_dump.csv.gz", "tags_dump.csv.gz", "output.csv")
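
For readers curious what a lookup against the posts dump might involve, here is a rough sketch that streams a gzipped CSV and keeps the rows whose checksum matches a local file. It is not the implementation of process_directory: find_posts_by_md5 is a hypothetical helper, and the md5 column name is an assumption about the export layout.

```python
import csv
import gzip

# Long text fields (e.g. descriptions) may exceed the csv module's default field limit.
csv.field_size_limit(2**31 - 1)


def find_posts_by_md5(posts_dump_path, wanted_md5s):
    """Return {md5: row} for rows whose md5 column (assumed name) is in wanted_md5s."""
    wanted = set(wanted_md5s)
    matches = {}
    # Stream the dump row by row instead of loading the whole file into memory.
    with gzip.open(posts_dump_path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            if row.get("md5") in wanted:
                matches[row["md5"]] = row
                if len(matches) == len(wanted):
                    break  # every requested checksum has been found
    return matches
```

Streaming keeps memory use flat regardless of dump size, at the cost of one full pass over the file in the worst case.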

Troubleshooting

  • Missing Dependencies: Ensure all dependencies from requirements.txt are installed.
  • Gallery Not Found: Ensure the data_directory in config.json points to the correct folder.

Limitations

  • Site-Specific: This tool only works with metadata from e621.net and cannot process files from other sites.
  • Large Files: The e621 metadata dump is approximately 1.4 GB, so ensure you have sufficient disk space and a stable internet connection.
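
A quick pre-flight check for free disk space could look like the sketch below. The has_free_space helper is hypothetical, and the 3 GiB default is an assumed safety margin over the approximate dump size, not a figure from the project.

```python
import shutil


def has_free_space(directory=".", required_bytes=3 * 1024**3):
    """Return True if the filesystem holding directory has at least required_bytes free."""
    return shutil.disk_usage(directory).free >= required_bytes
```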

License

This project is licensed under the MIT License.
