This project lets you scrape websites and analyze their content with Ollama's Llama3 AI model. The scraper fetches and processes data from websites so you can ask questions about the extracted DOM content and download the results.
If you prefer the Italian version of this project, you can find it here.
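At its core, the scraper reduces a page's raw HTML to the visible text that gets handed to the model. As a minimal stdlib sketch of that idea (not the project's actual implementation, which lives in `ddbqscript/main.py`), the DOM cleanup might look like:

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes while skipping <script> and <style> content."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # >0 while inside a tag we want to ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def clean_dom(html: str) -> str:
    """Reduce raw HTML to whitespace-normalized visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    return " ".join(parser.chunks)
```

The cleaned text is what you would feed to Llama3 along with your question, keeping the prompt free of markup noise.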
Before you begin, make sure you have the following installed:
- Python 3.8 or higher
- pip (Python package installer)
Download Ollama:
- Ollama is a platform that runs AI models like Llama3. You can download Ollama for your system from the official website:
Ollama Download
Install Llama3 model:
- After downloading Ollama, you need to install the Llama3 model. Open a terminal or command prompt and run the following command to install the Llama3 model:
```
ollama pull llama3
```
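Once the model is pulled and the Ollama server is running, applications talk to it over Ollama's local HTTP API (by default on port 11434, via `POST /api/generate`). As a rough illustration of how scraped DOM text and a question could be sent to Llama3 — not necessarily how this project's code does it, and with `build_request`/`ask_llama3` being hypothetical helper names — the call might look like:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(question, dom_text, model="llama3"):
    """Assemble a non-streaming payload for Ollama's /api/generate endpoint."""
    return {
        "model": model,
        "prompt": (
            "Answer using only this page content:\n"
            f"{dom_text}\n\nQuestion: {question}"
        ),
        "stream": False,  # ask for one JSON object instead of a token stream
    }


def ask_llama3(question, dom_text):
    """Send the payload to a locally running Ollama server (requires it to be up)."""
    payload = json.dumps(build_request(question, dom_text)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream` set to `False`, the server returns a single JSON object whose `response` field holds the model's full answer.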
Clone the repository:
- First, clone the GitHub repository to your local machine by running the following command:
```
git clone https://github.com/dddevid/AI-Web-Scraper-Ollama.git
```
Navigate into the project directory:
- Change into the project directory by running:
```
cd AI-Web-Scraper-Ollama
```
Create a virtual environment (optional but recommended):
- Create a virtual environment to manage the project's dependencies:
```
python -m venv venv
```
Activate the virtual environment:
- On Windows:

```
venv\Scripts\activate
```

- On macOS/Linux:

```
source venv/bin/activate
```
Install required dependencies:
- The `requirements.txt` file is located in the `ddbqscript` folder. Install the necessary Python libraries by running:

```
pip install -r ddbqscript/requirements.txt
```
- Run the script using the batch file:
- Use the `EseguiScript.bat` file to start the application:

```
.\EseguiScript.bat
```
- Run the script using Streamlit:
- Start the Streamlit app by running the following command:
```
streamlit run ddbqscript/main.py
```
- Access the app:
- The Streamlit interface should open in your web browser. You can now enter a website URL, scrape its content, and analyze it using Llama3.
- Scraping a website: Enter a URL and click on "Scrape the website" to scrape the site's content.
- Analyzing content: After scraping, you can input a description of what you want to analyze, and the AI will process the content.
- Downloading results: Once the analysis is complete, you can download the results in both JSON and TXT formats.
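The download step above — exporting the same results as both JSON and TXT — can be sketched like this. `save_results` is a hypothetical helper for illustration, not code from this repository:

```python
import json
from pathlib import Path


def save_results(results, basename="analysis"):
    """Write an analysis results dict to <basename>.json and <basename>.txt."""
    json_path = Path(f"{basename}.json")
    txt_path = Path(f"{basename}.txt")

    # Machine-readable copy: pretty-printed JSON, keeping non-ASCII text intact.
    json_path.write_text(
        json.dumps(results, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # Human-readable copy: one "key: value" line per result field.
    txt_path.write_text(
        "\n".join(f"{key}: {value}" for key, value in results.items()),
        encoding="utf-8",
    )
    return json_path, txt_path
```

The JSON file is convenient for feeding results into other tools, while the TXT file is easier to skim by eye.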
- Ollama not found: If you encounter an error related to Ollama, make sure it is correctly installed and added to your system's PATH.
- Missing dependencies: Ensure all required Python packages are installed by running `pip install -r ddbqscript/requirements.txt`.
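If you want to verify the PATH issue programmatically, a quick stdlib check — a diagnostic sketch, not part of this project — is:

```python
import shutil


def check_ollama():
    """Return the resolved path of the `ollama` executable, or a hint if missing."""
    path = shutil.which("ollama")
    if path is None:
        return "ollama not found on PATH - reinstall it or add its folder to PATH"
    return path
```

Run it from a Python shell; if it prints a hint instead of a path, Ollama is not visible to the scraper either.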
By clicking the link, you can choose to donate any amount or subscribe for 30€ per month to support the ongoing development of this project.
