🦅 Web-Scrapper-with-AI

A powerful, Streamlit-based web scraping tool that combines Selenium, BeautifulSoup, and LLaMA 3.2 (via Ollama) to extract meaningful insights from website content — with a built-in chatbot for interactive Q&A!

🔧 Features

🌐 Scrape any website using Selenium + BeautifulSoup
🧹 Clean and structure DOM content automatically
🧠 Ask LLMs (LLaMA 3.2) to parse and extract insights
💬 Built-in chatbot interface for natural language queries
💻 Streamlit UI — fast, simple, and user-friendly

📁 Project Structure

Web-Scrapper-with-AI/
│
├── Firefoxdriver/             # Firefox geckodriver directory
│   └── geckodriver.exe
│
├── __pycache__/               # Python cache
├── .env                       # Environment variables (set geckodriver path)
├── requirements.txt           # Python dependencies
├── main.py                    # Streamlit frontend
├── scrape.py                  # Scraper & DOM cleaning logic
├── parse.py                   # LLaMA-powered content parser
├── OllamaChatBot.py           # Chatbot using LLaMA 3.2
├── eagle.bmp                  # App icon
├── image1.png                 # Optional preview/logo

⚙️ Setup Instructions

1. Clone the Repository

git clone https://github.com/Rahul-18r/Web-Scrapper-with-AI.git
cd Web-Scrapper-with-AI

2. Configure Environment

Create a .env file and specify the path to your geckodriver.exe:

SBR_WEBDRIVER=absolute/path/to/Firefoxdriver/geckodriver.exe

3. Install Dependencies

pip install -r requirements.txt

Ensure Ollama and the llama3 model are installed and running locally:

ollama run llama3

4. Launch the App

streamlit run main.py

🚀 How It Works

🔍 Home Tab

Input the URL of any website
Scrape and clean up raw HTML
Specify what data you want (e.g., product names, blog titles)
LLaMA processes and extracts your requested info

💬 Chatbot Tab

Chat with the LLaMA 3.2 model directly
Ask questions about scraped content or general topics

🧪 Example Use Cases

🔹 "Extract all blog post titles and dates from this page."
🔹 "Summarize all product names and prices."
🔹 "What is the main topic of this website?"

🛠 Tech Stack

Python
Streamlit
Selenium
BeautifulSoup
LangChain
Ollama + LLaMA 3.2

🪪 License

Licensed under the MIT License.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🦅 Web-Scrapper-with-AI

🔧 Features

📁 Project Structure

⚙️ Setup Instructions

1. Clone the Repository

2. Configure Environment

3. Install Dependencies

4. Launch the App

🚀 How It Works

🔍 Home Tab

💬 Chatbot Tab

🧪 Example Use Cases

🛠 Tech Stack

🪪 License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Firefoxdriver		Firefoxdriver
__pycache__		__pycache__
.env		.env
1bojtpyo.bmp		1bojtpyo.bmp
LICENSE		LICENSE
OllamaChatBot.py		OllamaChatBot.py
README.md		README.md
eagle.bmp		eagle.bmp
image1.png		image1.png
main.py		main.py
parse.py		parse.py
requirements.txt		requirements.txt
scrape.py		scrape.py

License

Rahul-18r/Web-Scrapper-with-AI

Folders and files

Latest commit

History

Repository files navigation

🦅 Web-Scrapper-with-AI

🔧 Features

📁 Project Structure

⚙️ Setup Instructions

1. Clone the Repository

2. Configure Environment

3. Install Dependencies

4. Launch the App

🚀 How It Works

🔍 Home Tab

💬 Chatbot Tab

🧪 Example Use Cases

🛠 Tech Stack

🪪 License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages