
Noon.com Scrapers

Description

This repository contains Python-based scrapers for Noon.com search results and product pages. These scrapers leverage the Crawlbase Crawling API to handle JavaScript rendering, CAPTCHA challenges, and anti-bot protections. The extracted data is processed using BeautifulSoup for HTML parsing and Pandas for structured storage.

➡ Read the full blog here to learn more.

Scrapers Overview

Noon.com Search Results Scraper

The Noon.com Search Results Scraper (noon_serp_scraper.py) extracts:

  1. Product Title
  2. Price & Currency
  3. Ratings
  4. Product Page URL

It also handles pagination automatically so that every results page is covered, and saves the extracted data to a CSV file.
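
A minimal sketch of how such a search-results scraper can be wired together with the Crawlbase Crawling API, BeautifulSoup, and Pandas is shown below. The CSS selectors, search URL pattern, and wait options are illustrative assumptions and may not match noon_serp_scraper.py or Noon.com's current markup exactly.

from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI
import pandas as pd

# JS token, since Noon.com renders search results with JavaScript
api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def scrape_search_page(query, page):
    url = f"https://www.noon.com/uae-en/search/?q={query}&page={page}"
    # ajax_wait/page_wait give the JS-rendered listing time to load
    response = api.get(url, {"ajax_wait": "true", "page_wait": 5000})
    if response["status_code"] != 200:
        return []
    soup = BeautifulSoup(response["body"], "html.parser")
    products = []
    for card in soup.select("div.productContainer"):  # assumed selector
        title = card.select_one("div[data-qa='product-name']")
        price = card.select_one("strong.amount")
        currency = card.select_one("span.currency")
        rating = card.select_one("div.ratingBadge")  # assumed selector
        link = card.select_one("a[href]")
        products.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
            "currency": currency.get_text(strip=True) if currency else None,
            "rating": rating.get_text(strip=True) if rating else None,
            "url": "https://www.noon.com" + link["href"] if link else None,
        })
    return products

# Walk through result pages until one comes back empty, then export to CSV
all_products, page, max_pages = [], 1, 5
while page <= max_pages:
    batch = scrape_search_page("smartphone", page)
    if not batch:
        break
    all_products.extend(batch)
    page += 1

pd.DataFrame(all_products).to_csv("noon_search_results.csv", index=False)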

Noon.com Product Page Scraper

The Noon.com Product Page Scraper (noon_product_page_scraper.py) extracts detailed product information, including:

  1. Product Name
  2. Price
  3. Product Highlights
  4. Specifications

It saves the extracted data in a CSV file.
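
A similar sketch for a single product page is shown below. The selectors for price, highlights, and specifications are assumptions for illustration and may differ from what noon_product_page_scraper.py actually uses.

from bs4 import BeautifulSoup
from crawlbase import CrawlingAPI
import pandas as pd

api = CrawlingAPI({"token": "YOUR_CRAWLBASE_JS_TOKEN"})

def scrape_product_page(url):
    response = api.get(url, {"ajax_wait": "true", "page_wait": 5000})
    soup = BeautifulSoup(response["body"], "html.parser")

    name = soup.select_one("h1")  # product name is typically the page's only h1
    price = soup.select_one("div[data-qa='div-price-now']")  # assumed selector

    # Highlights usually appear as a bullet list; the selector is an assumption
    highlights = [li.get_text(strip=True)
                  for li in soup.select("div.productHighlights ul li")]

    # Specifications are usually a two-column table: attribute | value
    specs = {}
    for row in soup.select("div.specifications table tr"):  # assumed selector
        cells = row.find_all("td")
        if len(cells) >= 2:
            specs[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)

    return {
        "name": name.get_text(strip=True) if name else None,
        "price": price.get_text(strip=True) if price else None,
        "highlights": " | ".join(highlights),
        "specifications": specs,
    }

# Hypothetical product URL for illustration only
row = scrape_product_page("https://www.noon.com/uae-en/example-product/P000000000/p/")
pd.DataFrame([row]).to_csv("noon_product_data.csv", index=False)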

Environment Setup

Ensure that Python is installed on your system. Check the version using:

# Use python3 if you're on Linux with Python 3 installed
python --version

Next, install the required dependencies:

pip install crawlbase beautifulsoup4 pandas
  • Crawlbase – Python client for the Crawling API, which handles JavaScript rendering and bypasses bot protections.
  • BeautifulSoup – Parses and extracts structured data from HTML.
  • Pandas – Formats and stores extracted data, enabling CSV exports.

Running the Scrapers

  1. Get Your Crawlbase Access Token

    • Sign up for Crawlbase here to get an API token.
    • Use the JS token for Noon.com scraping, as the site uses JavaScript-rendered content.
  2. Update the Scraper with Your Token

    • Replace "YOUR_CRAWLBASE_TOKEN" in the script with your Crawlbase JS Token.
  3. Run the Scraper

# Use python3 if required (for Linux/macOS)
python SCRAPER_FILE_NAME.py

Replace "SCRAPER_FILE_NAME.py" with the actual script name (noon_serp_scraper.py or noon_product_page_scraper.py).

To-Do List

  • Expand scrapers to extract additional product details.
  • Optimize data storage and export formats (e.g., JSON, database integration).
  • Enhance scraper efficiency and speed.

Why Use This Scraper?

  • Bypasses anti-bot protections with Crawlbase.
  • Handles JavaScript-rendered content seamlessly.
  • Extracts accurate and structured product data efficiently.

