Ndolo7/jewel

Jewel: AI-Powered Web Scraper

Jewel is an AI-powered tool for conducting web research and information gathering. It leverages LLMs to refine queries, filter search results from web search engines, scrape content, and provide comprehensive research summaries.


Features

  • ⚙️ Modular Architecture – Clean separation between search, scrape, and LLM workflows.
  • 🤖 Multi-Model Support – Easily switch between OpenAI, Claude, Gemini, or local models served through Ollama.
  • 💻 CLI-First Design – Built for terminal warriors and automation ninjas.
  • 🐳 Docker-Ready – Optional Docker deployment for clean, isolated usage.
  • 📝 Custom Reporting – Save investigation output to file for reporting or further analysis.
  • 🧩 Extensible – Easy to plug in new search engines, models, or output formats.

⚠️ Disclaimer

This tool is intended for educational and lawful research purposes only. Always respect website terms of service, robots.txt files, and rate limits when scraping. The author is not responsible for any misuse of this tool or the data gathered using it.

Use responsibly and at your own risk. Ensure you comply with all relevant laws, website terms of service, and institutional policies before conducting web scraping activities.

Additionally, Jewel leverages third-party APIs (including LLMs). Be cautious when sending potentially sensitive queries, and review the terms of service for any API or model provider you use.

Installation

Note

This tool performs direct HTTP/HTTPS requests to web search engines and websites. No special network configuration is required. The tool includes rate limiting to be respectful to target websites.
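
The rate limiting mentioned above is not documented in detail here. As a rough illustration only, a minimum-interval limiter in Python could look like the sketch below. The class name `MinIntervalLimiter` and its parameters are hypothetical and are not taken from Jewel's source.

```python
import time


class MinIntervalLimiter:
    """Keep consecutive calls to wait() at least `min_interval` seconds apart.

    Hypothetical helper for illustration; not Jewel's actual implementation.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._last = None        # time of the previous permitted call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon since the last request: pause for the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A scraper would call `limiter.wait()` immediately before each outgoing request to the same host.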

Tip

You can provide an OpenAI, Anthropic, or Google API key either by creating a .env file (refer to the sample env file in the repo) or by setting environment variables in your shell.

For Ollama, use http://host.docker.internal:11434 as the Ollama URL when running from the Docker image, or http://127.0.0.1:11434 for the other methods.
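
To make the lookup order concrete, here is a minimal Python sketch of reading a setting from the environment first and a .env file second, plus the Ollama URL choice described above. The function names and the variable name `OPENAI_API_KEY` are assumptions for illustration; check the sample env file in the repo for the names Jewel actually expects.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Return the parsed contents of a .env file, or {} if it is missing."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return parse_env_text(fh.read())


def resolve_setting(name, env_file_values, default=None):
    """Prefer a real environment variable, then the .env value, then default."""
    return os.environ.get(name) or env_file_values.get(name) or default


def ollama_url(in_docker):
    """Pick the Ollama URL from the tip above based on how Jewel is run."""
    if in_docker:
        return "http://host.docker.internal:11434"
    return "http://127.0.0.1:11434"
```

For example, `resolve_setting("OPENAI_API_KEY", load_env_file())` returns the exported variable if one is set and falls back to the .env entry otherwise.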

Docker (Web UI Mode) [Recommended]

docker run --rm \
   -v "$(pwd)/.env:/app/.env" \
   --add-host=host.docker.internal:host-gateway \
   -p 4000:4000 \
   jewel:latest ui --ui-port 4000 --ui-host 0.0.0.0

Release Binary (CLI Mode)

  • Download the appropriate binary for your system from the latest release
  • Unzip the file and make it executable:
chmod +x jewel
  • Run the binary as:
jewel cli --model gpt-4.1 --query "python web scraping tutorials"

Using Python (Development Version)

  • With Python 3.10+ installed, run the following:
pip install -r requirements.txt
python main.py -m gpt-4.1 -q "machine learning resources" -t 12

Usage

Jewel: AI-Powered Web Scraper

options:
  -h, --help            show this help message and exit
  --model {gpt4o,gpt-4.1,claude-3-5-sonnet-latest,llama3.1,gemini-2.5-flash}, -m {gpt4o,gpt-4.1,claude-3-5-sonnet-latest,llama3.1,gemini-2.5-flash}
                        Select LLM model (e.g., gpt4o, claude sonnet 3.5, ollama models, gemini 2.5 flash)
  --query QUERY, -q QUERY
                        Web search query
  --threads THREADS, -t THREADS
                        Number of threads to use for scraping (Default: 5)
  --output OUTPUT, -o OUTPUT
                        Filename to save the final research summary. If not provided, a filename based on the
                        current date and time is used.

Example commands:
 - jewel -m gpt4o -q "python web scraping" -t 12
 - jewel --model claude-3-5-sonnet-latest --query "machine learning tutorials" --threads 8 --output filename
 - jewel -m llama3.1 -q "open source projects"
 - jewel -m gemini-2.5-flash -q "data science resources"
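
For readers who want to script around the CLI, the options listed above can be mirrored with Python's `argparse`. This is a sketch reconstructed from the help text, not Jewel's actual parser. Whether `--model` and `--query` are required is not stated, so they are left optional here, and the timestamp filename pattern in `default_output_name` is an assumption.

```python
import argparse
from datetime import datetime

# Model choices copied from the help output above.
MODELS = ["gpt4o", "gpt-4.1", "claude-3-5-sonnet-latest", "llama3.1", "gemini-2.5-flash"]


def build_parser():
    parser = argparse.ArgumentParser(
        prog="jewel", description="Jewel: AI-Powered Web Scraper"
    )
    parser.add_argument("--model", "-m", choices=MODELS, help="Select LLM model")
    parser.add_argument("--query", "-q", help="Web search query")
    parser.add_argument(
        "--threads", "-t", type=int, default=5,
        help="Number of threads to use for scraping (Default: 5)",
    )
    parser.add_argument(
        "--output", "-o",
        help="Filename to save the final research summary",
    )
    return parser


def default_output_name(now=None):
    """Build a date-and-time based filename (the pattern is an assumption)."""
    now = now or datetime.now()
    return now.strftime("summary_%Y-%m-%d_%H-%M-%S.md")


if __name__ == "__main__":
    args = build_parser().parse_args()
    output = args.output or default_output_name()
    print(f"model={args.model} threads={args.threads} output={output}")
```

Because `choices` is set on `--model`, an unsupported model name is rejected by the parser before any search or scraping starts.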

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  • Fork the repository
  • Create your feature branch (git checkout -b feature/amazing-feature)
  • Commit your changes (git commit -m 'Add some amazing feature')
  • Push to the branch (git push origin feature/amazing-feature)
  • Open a Pull Request

Open an Issue for any of these situations:

  • If you spot a bug or bad code
  • If you have a feature request
  • If you have questions or doubts about usage

Acknowledgements

About

AI-Powered Web Scraping Tool, utilizing Langchain
