Adturang/justpaste-scraper-engine

JustPasteIt Link Finder 🌐

Download

Discover, catalog, and retrieve shared content links from JustPaste.it with surgical precision.


🚀 New Repository Idea: LinkOrchard – The Decentralized Content Harvester

Repository Name: LinkOrchard

Tagline: "Where scattered links become structured knowledge: a self-contained, privacy-first link aggregation engine for pastebin-style platforms."

Inspired by the idea of extracting links from JustPaste.it, LinkOrchard extends that idea into a modular, extensible, multi-platform link harvesting framework. Think of it as a digital mycologist: it doesn't just find mushrooms (links); it identifies their species, maps their mycelial relationships, and helps you cultivate a garden of interconnected information.


✨ What Makes LinkOrchard Distinct?

While the original justpaste.it-link-finder focused narrowly on a single source, LinkOrchard introduces:

  • Multi-platform support: JustPaste.it, Pastebin, Ghostbin, PrivateBin, and custom paste services
  • Semantic relationship mapping: Automatically connects links that reference the same domain, topic, or content hash
  • Offline-first architecture: Your harvested content stays on your machine, with no cloud dependencies and no data sales
  • AI-enhanced categorization: (Optional) Claude or OpenAI API integration for auto-tagging and summarization
  • Responsive web dashboard with dark/light mode and real-time link visualization
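
The relationship mapping described in the list above can be illustrated with a small grouping routine. This is only a sketch under stated assumptions, not LinkOrchard's actual implementation: the record shape (`url`, `body`) and the function names `content_hash` and `group_links` are hypothetical, and Python is used here purely for illustration.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def content_hash(body: str) -> str:
    """Return a SHA-256 hex digest of the body after collapsing whitespace."""
    normalized = " ".join(body.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_links(links: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Group link URLs by host name and by content hash.

    Each input dict is a hypothetical record: {"url": ..., "body": ...}.
    """
    by_domain: dict[str, list[str]] = defaultdict(list)
    by_hash: dict[str, list[str]] = defaultdict(list)
    for link in links:
        # urlsplit().hostname is already lower-cased; it is None for odd inputs.
        host = urlsplit(link["url"]).hostname or ""
        by_domain[host].append(link["url"])
        by_hash[content_hash(link["body"])].append(link["url"])
    return {"domain": dict(by_domain), "hash": dict(by_hash)}
```

Links that share a key in either mapping are candidates for an edge in a relationship graph; topic-based grouping would need additional signals that this sketch does not attempt.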

📦 Quick Start

Prerequisites

| Dependency | Version |
| --- | --- |
| Node.js | ≥ 18.0.0 |
| npm or yarn | Latest |
| Redis (optional) | ≥ 6.0, for caching |

Installation

git clone https://github.com/example/LinkOrchard.git
cd LinkOrchard
npm install

Initial Harvest

# Single URL scan
npx link-orchard harvest --url https://justpaste.it/abc123

# Bulk scan from file
npx link-orchard batch --file urls.txt --output ./results.json

# Continuous monitoring
npx link-orchard watch --platform justpaste.it --interval 300
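
The README does not specify the format of `urls.txt`. As a hedged illustration, the sketch below assumes one URL per line, with blank lines and lines starting with `#` ignored; the function name `parse_url_list` is hypothetical and Python is used for illustration only.

```python
from urllib.parse import urlsplit


def parse_url_list(text: str) -> list[str]:
    """Return the http(s) URLs found in a one-URL-per-line text file.

    Assumed format (not confirmed by the project): blank lines and
    lines beginning with '#' are skipped.
    """
    urls = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = urlsplit(line)
        # Keep only absolute http/https URLs that have a host component.
        if parts.scheme in ("http", "https") and parts.netloc:
            urls.append(line)
    return urls
```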


⚙️ Configuration

Example Profile Configuration (link-orchard.config.yml)

app:
  name: "LinkOrchard Instance"
  version: "1.0.0-2026"
  
platforms:
  justpaste_it:
    enabled: true
    rate_limit: 10 # requests per minute
    user_agent: "LinkOrchard/1.0"
    
  pastebin:
    enabled: false
    api_key: "${PASTEBIN_API_KEY}"  # from environment variables
    
  custom:
    - name: "internal-paste"
      base_url: "https://pastes.internal.company.com"
      scraper_module: "./custom_scrapers/internal_scraper.js"

ai:
  categorization:
    provider: "openai" # or "claude"
    model: "gpt-4-turbo" # or "claude-3-opus-20240229"
    api_key: "${AI_API_KEY}"
    features:
      - auto_tagging
      - content_summarization
      - sentiment_analysis
      
storage:
  backend: "sqlite" # or "postgresql"
  cache_ttl: 86400 # 24 hours in seconds
  enable_compression: true
  
ui:
  theme: "auto" # light, dark, auto
  language: "en" # multilingual support built-in
  
notifications:
  email:
    enabled: false
    smtp_server: ""
  webhook:
    url: ""
    events: ["new_link", "dead_link", "error"]
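
The configuration above uses `${NAME}` placeholders for secrets taken from environment variables. How LinkOrchard resolves them is not documented here; the following is a minimal sketch of one way such placeholders could be expanded after the YAML has been parsed into plain dicts and lists. The function name `expand_env` and the fail-on-missing behaviour are assumptions.

```python
import re

# Matches ${NAME} where NAME looks like a shell variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env):
    """Recursively replace ${NAME} placeholders in strings with env[NAME].

    Raises KeyError when a referenced variable is missing, so a
    misconfigured secret fails loudly instead of becoming an empty string.
    """
    if isinstance(value, str):
        def _substitute(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _PLACEHOLDER.sub(_substitute, value)
    if isinstance(value, dict):
        return {key: expand_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [expand_env(item, env) for item in value]
    # Booleans, numbers, and None pass through unchanged.
    return value
```

In practice the second argument would typically be `os.environ`; passing the mapping explicitly keeps the function easy to test.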

Example Console Invocation

# Full-featured harvest with AI categorization
$ link-orchard harvest \
  --url https://justpaste.it/xyz789 \
  --ai-tags \
  --ai-summary \
  --output json \
  --verbose

[INFO] Connecting to JustPaste.it...
[INFO] Extracted 14 links from page
[INFO] AI categorization starting...
[INFO] Tags detected: ["documentation", "api-reference", "python", "tutorial"]
[INFO] Summary generated: "A collection of Python API documentation links from 2026"
[INFO] Results saved to ./output/harvest_2026-01-15.json
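
The "Extracted 14 links from page" step in the log above can be pictured as a pass over the anchor tags of the fetched HTML. The sketch below is an illustration using Python's standard `html.parser`, not the project's extractor; the names `LinkCollector` and `extract_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                if urlsplit(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return links in document order, duplicates included."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Non-web schemes such as `mailto:` are dropped; de-duplication is left to a later pipeline stage.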

🏛️ Architecture Overview

graph TB
    subgraph "Input Layer"
        A[JustPaste.it URLs] 
        B[Pastebin URLs]
        C[Custom Paste Sources]
        D[Batch File]
    end
    
    subgraph "Harvester Engine"
        E[Rate Limiter]
        F[Content Fetcher]
        G[Link Extractor]
        H[Content Validator]
    end
    
    subgraph "Processing Pipeline"
        I[De-duplication]
        J[Relationship Mapper]
        K[AI Categorization]
        L[Tag Registry]
    end
    
    subgraph "Storage Layer"
        M[SQLite/PostgreSQL]
        N[Redis Cache]
        O[Compressed Archive]
    end
    
    subgraph "Output Layer"
        P[Web Dashboard]
        Q[JSON/CSV Export]
        R[REST API]
        S[Webhook Notifications]
    end
    
    A --> E
    B --> E
    C --> E
    D --> E
    E --> F
    F --> G
    G --> H
    H --> I
    I --> J
    J --> K
    K --> L
    L --> M
    M --> N
    N --> O
    O --> P
    O --> Q
    O --> R
    R --> S
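
Every input in the diagram passes through the Rate Limiter before content is fetched, and the sample configuration sets `rate_limit: 10` requests per minute. The project's limiter is not shown in this README; as a sketch under that assumption, a sliding-window limiter could look like the following. The class name `SlidingWindowLimiter` and the injectable `clock` parameter are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds interval."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can use a fake clock
        self._stamps = deque()  # timestamps of accepted requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request would be accepted."""
        now = self.clock()
        # Forget requests that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._stamps[0])

    def try_acquire(self) -> bool:
        """Record a request and return True, or return False if over the limit."""
        if self.wait_time() > 0:
            return False
        self._stamps.append(self.clock())
        return True
```

A fetch loop would call `try_acquire()` before each request and sleep for `wait_time()` seconds when it returns `False`.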

💻 OS Compatibility

| Operating System | Version | Status | Emoji |
| --- | --- | --- | --- |
| Windows | 10, 11, Server 2026 | ✅ Full Support | 🪟 |
| macOS | Ventura, Sonoma, Sequoia | ✅ Full Support | 🍎 |
| Ubuntu | 20.04, 22.04, 24.04 LTS | ✅ Full Support | 🐧 |
| Debian | 11, 12 | ✅ Full Support | 🐧 |
| Fedora | 38, 39, 40 | ✅ Full Support | 🐧 |
| Arch Linux | Rolling release | ✅ Community Tested | 🐧 |
| Alpine | 3.18+ | ⚠️ Limited (no UI) | 🏔️ |
| FreeBSD | 13, 14 | ⚠️ Community Port | 👹 |
| Raspberry Pi OS | 32/64-bit | ✅ Full Support on ARM64 | 🥧 |

📋 Feature Matrix

Core Features

| Feature | Status | Priority |
| --- | --- | --- |
| Multi-platform paste scraping | ✅ Complete | High |
| Link deduplication and clustering | ✅ Complete | High |
| Recursive link following (depth configurable) | ✅ Complete | Medium |
| Dead link detection (HTTP status + content checks) | ✅ Complete | High |
| Export to JSON, CSV, Markdown, HTML | ✅ Complete | Medium |
| Scheduled/interval harvesting | ✅ Complete | Medium |
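
Link deduplication usually depends on comparing URLs in a normalized form. The rules LinkOrchard applies are not documented here, so the sketch below assumes a simple policy: lower-case the scheme and host, drop default ports and fragments, and treat an empty path as `/`. The function names are hypothetical, and userinfo and IPv6 literals are deliberately not handled.

```python
from urllib.parse import urlsplit, urlunsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Return a canonical form of url for equality comparison."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # hostname is already lower-cased
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != _DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    # The fragment is dropped: it never reaches the server.
    return urlunsplit((scheme, host, path, parts.query, ""))


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = normalize_url(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique
```

Query strings are compared verbatim, so `?a=1&b=2` and `?b=2&a=1` count as different links under this policy.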

Advanced Features

| Feature | Status | Priority |
| --- | --- | --- |
| AI-powered auto-tagging (OpenAI/Claude) | 🚧 In Development | High |
| Semantic relationship graphing | 🚧 In Development | Medium |
| Responsive web dashboard | 🚧 In Development | High |
| Multilingual UI (15 languages) | 🚧 In Development | Medium |
| 24/7 automated monitoring | 🚧 In Development | High |
| Custom scraper plugin system | 📋 Planned | Low |
| Blockchain-based link verification | 📋 Planned | Low |

🔑 Key Highlights

  1. Responsive UI – The dashboard adapts seamlessly from mobile to 4K displays, providing real-time link visualization with interactive force-directed graphs.

  2. Multilingual Support – Interface available in English, Spanish, French, German, Japanese, Chinese (Simplified & Traditional), Korean, Portuguese, Russian, Arabic, Hindi, Dutch, Italian, Turkish, and Polish.

  3. Customer Support – Community forums are monitored daily; enterprise subscribers receive priority email and chat support with a guaranteed 2-hour response time during business hours.

  4. Privacy by Design – No data leaves your machine unless you explicitly enable cloud synchronization or an AI provider. Your harvested links remain your private orchard.


🤖 API Integrations

OpenAI Integration

LinkOrchard can optionally leverage OpenAI's GPT models to enrich harvested links:

  • Content Summarization: Auto-generate concise descriptions of linked pages
  • Smart Tagging: Identify topics, categories, and keywords without manual effort
  • Relationship Discovery: Find thematic connections between disparate links

# Enable via config or environment variable
export OPENAI_API_KEY="sk-your-key-here"
link-orchard harvest --url https://justpaste.it/example --ai-categorize

Claude API Integration

For teams preferring Anthropic's Claude:

  • Long-form Analysis: Claude excels at understanding nuanced technical documentation
  • Safety-First Filtering: Automatically flag potentially harmful or misleading links
  • Structured Output: Returns analysis in precise JSON format for downstream processing

export ANTHROPIC_API_KEY="sk-ant-your-key-here"
link-orchard add-platform --name "claude-filtered" --ai-provider claude

Note: All AI features are optional and opt-in. No API calls are made without explicit configuration.


🧩 Use Cases

| Use Case | Description | Who Benefits |
| --- | --- | --- |
| Security Research | Monitor paste sites for leaked credentials or exposed configuration files | Security analysts, SOC teams |
| Content Curation | Collect and organize links from paste-based tutorials and guides | Technical writers, educators |
| Competitive Intelligence | Track public pastes mentioning competitors or industry keywords | Market researchers |
| Personal Archiving | Save and categorize interesting links from social media pastes | Knowledge workers, students |
| Compliance Monitoring | Check that no confidential information is being shared on public pastes | Legal teams, compliance officers |

🙋 Community & Support

  • 📚 Documentation: Full API reference and user guide available in the /docs directory
  • 💬 Discussions: Join our GitHub Discussions for feature requests and Q&A
  • 🐛 Issue Tracker: Report bugs or suggest enhancements via GitHub Issues
  • 🏫 Workshops: Monthly community workshops on advanced link harvesting techniques

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


⚠️ Disclaimer

Important Notice Regarding Usage:

LinkOrchard is designed exclusively for ethical link harvesting and personal knowledge management. Users are solely responsible for ensuring compliance with:

  1. Terms of Service of any platform you scrape
  2. Applicable laws regarding data collection, copyright, and web scraping in your jurisdiction
  3. Privacy regulations such as GDPR, CCPA, or similar frameworks

The developers of LinkOrchard:

  • Do not encourage or condone unauthorized access to private systems
  • Do not provide any warranty regarding the legality of specific use cases
  • Are not liable for any damages arising from misuse of this tool
  • Recommend always respecting robots.txt files and rate limits
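
The recommendation to respect robots.txt can be checked programmatically before a URL is fetched. The sketch below uses Python's standard `urllib.robotparser` on robots.txt text that has already been downloaded; the helper name `is_allowed` is hypothetical, and the user agent string is borrowed from the sample configuration as an assumption about how requests are identified.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access happens here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, a call for `https://paste.example/private/secret` returns `False`, while other paths on the same host are allowed.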

Remember: With great harvesting power comes great responsibility. Use this tool to cultivate knowledge, not to exploit it.


LinkOrchard v1.0.0-2026: Harvest wisely, connect meaningfully. 🍏🌳
