A Streamlit-based web scraping tool that combines Selenium, BeautifulSoup, and LLaMA 3.2 (via Ollama) to extract insights from website content, with a built-in chatbot for interactive Q&A.
- 🌐 Scrape any website using Selenium + BeautifulSoup
- 🧹 Clean and structure DOM content automatically
- 🧠 Ask LLMs (LLaMA 3.2) to parse and extract insights
- 💬 Built-in chatbot interface for natural language queries
- 💻 Streamlit UI — fast, simple, and user-friendly
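
The DOM-cleaning feature can be pictured with a short BeautifulSoup sketch. This is an illustration only, not necessarily what `scrape.py` does: the function name `clean_body_content` and the choice of tags to drop are assumptions.

```python
from bs4 import BeautifulSoup


def clean_body_content(html: str) -> str:
    """Return the visible text of a page, one non-empty line per text fragment.

    Hypothetical helper for illustration; the project's own cleaner may differ.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Work on <body> when present; fall back to the whole document otherwise.
    root = soup.body if soup.body is not None else soup

    # Drop elements whose text is never shown to the reader.
    for tag in root.find_all(["script", "style", "noscript"]):
        tag.decompose()

    # Join text fragments with newlines, then trim and discard blank lines.
    text = root.get_text(separator="\n")
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = (
        "<html><body><h1>Shop</h1>"
        "<script>track()</script><p> Widget </p></body></html>"
    )
    print(clean_body_content(sample))
```

Stripping scripts and styles before handing text to the model keeps prompts shorter and avoids feeding it markup it cannot use.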
Web-Scrapper-with-AI/
│
├── Firefoxdriver/ # Firefox geckodriver directory
│   └── geckodriver.exe
│
├── __pycache__/ # Python cache
├── .env # Environment variables (set geckodriver path)
├── requirements.txt # Python dependencies
├── main.py # Streamlit frontend
├── scrape.py # Scraper & DOM cleaning logic
├── parse.py # LLaMA-powered content parser
├── OllamaChatBot.py # Chatbot using LLaMA 3.2
├── eagle.bmp # App icon
├── image1.png # Optional preview/logo
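
The `.env` file in the layout above holds the geckodriver path under `SBR_WEBDRIVER`. A minimal sketch of how a scraper might consume it follows. It assumes the variable has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`) and that Selenium 4 is installed; the helper names `resolve_driver_path` and `build_firefox_driver` are hypothetical, not taken from `scrape.py`.

```python
import os
from pathlib import Path


def resolve_driver_path(env=None) -> Path:
    """Read SBR_WEBDRIVER and return it as a Path, failing early if unset or missing."""
    env = os.environ if env is None else env
    raw = env.get("SBR_WEBDRIVER", "").strip()
    if not raw:
        raise RuntimeError("SBR_WEBDRIVER is not set; add it to .env")
    path = Path(raw).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"geckodriver not found at {path}")
    return path


def build_firefox_driver(headless: bool = True):
    """Start Firefox through geckodriver using the Selenium 4 Service API."""
    # Imported lazily so the path check above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.firefox.service import Service

    options = Options()
    if headless:
        options.add_argument("-headless")
    service = Service(executable_path=str(resolve_driver_path()))
    return webdriver.Firefox(service=service, options=options)
```

Validating the path before launching the browser turns a vague WebDriver startup failure into a clear configuration error.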
git clone https://github.com/Rahul-18r/Web-Scrapper-with-AI.git
cd Web-Scrapper-with-AI

Create a .env file and specify the path to your geckodriver.exe:

SBR_WEBDRIVER=absolute/path/to/Firefoxdriver/geckodriver.exe

Install the dependencies:

pip install -r requirements.txt

Ensure Ollama and the llama3 model are installed and running locally:

ollama run llama3

Start the app:

streamlit run main.py

- Input the URL of any website
- Scrape and clean up raw HTML
- Specify what data you want (e.g., product names, blog titles)
- LLaMA processes and extracts your requested info
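
The extraction step in the list above can be sketched as: split the cleaned text into chunks that fit the model's context, prompt the model once per chunk, and join the answers. Everything named here (`split_content`, `extract`, the prompt wording, the 6000-character default) is an assumption for illustration. The model is passed in as a plain callable so that any client wrapping the local Ollama model can be plugged in.

```python
from typing import Callable, Iterable, List

# Hypothetical prompt; the wording used by the project's parser may differ.
PROMPT_TEMPLATE = (
    "Extract only the information that matches this description: {request}\n"
    "Return an empty string if nothing matches.\n\n"
    "Content:\n{chunk}"
)


def split_content(text: str, max_chars: int = 6000) -> List[str]:
    """Split text into consecutive pieces of at most max_chars characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def extract(chunks: Iterable[str], request: str, llm: Callable[[str], str]) -> str:
    """Ask the model about each chunk and join the non-empty answers."""
    answers = []
    for chunk in chunks:
        prompt = PROMPT_TEMPLATE.format(request=request, chunk=chunk)
        answer = llm(prompt).strip()
        if answer:
            answers.append(answer)
    return "\n".join(answers)


if __name__ == "__main__":
    # Stand-in for a real model call, used only to show the data flow.
    def echo_model(prompt: str) -> str:
        return prompt.rsplit("Content:\n", 1)[1][:20]

    page = "Widget 9.99\nGadget 19.99\n" * 3
    print(extract(split_content(page, max_chars=24), "product names", echo_model))
```

Fixed-size character chunks are the simplest option; splitting on paragraph or element boundaries would avoid cutting a record in half, at the cost of more code.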
- Chat with the LLaMA 3.2 model directly
- Ask questions about scraped content or general topics
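
One way to let the chatbot answer questions about scraped content is to prepend that content and the recent turns to every prompt. The sketch below is an assumption about how this could work, not a description of `OllamaChatBot.py`; `ChatSession` and its fields are hypothetical names, and the model is again an injected callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ChatSession:
    """Keeps recent turns and the scraped text so each prompt carries context."""

    llm: Callable[[str], str]
    page_text: str = ""
    max_turns: int = 5
    history: List[Tuple[str, str]] = field(default_factory=list)

    def build_prompt(self, question: str) -> str:
        parts = []
        if self.page_text:
            parts.append("Scraped content:\n" + self.page_text)
        # Only the most recent turns are replayed, to bound the prompt size.
        recent = self.history[-self.max_turns:] if self.max_turns > 0 else []
        for user, bot in recent:
            parts.append(f"User: {user}\nAssistant: {bot}")
        parts.append(f"User: {question}\nAssistant:")
        return "\n\n".join(parts)

    def ask(self, question: str) -> str:
        """Send the question with context, record the turn, return the answer."""
        answer = self.llm(self.build_prompt(question)).strip()
        self.history.append((question, answer))
        return answer
```

In a Streamlit app, an object like this would typically be kept in `st.session_state` so the history survives script reruns.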
- 🔹 "Extract all blog post titles and dates from this page."
- 🔹 "Summarize all product names and prices."
- 🔹 "What is the main topic of this website?"
- Python
- Streamlit
- Selenium
- BeautifulSoup
- LangChain
- Ollama + LLaMA 3.2
Licensed under the MIT License.