Guardianmy Reviews Spider Scraper

A robust tool for extracting detailed product reviews from Guardian Malaysia product pages. It transforms scattered customer feedback into clean, structured datasets for analysis, monitoring, and reporting. Ideal for teams needing reliable Guardian Malaysia review data at scale.

Created by Bitbash, built to showcase our approach to Scraping and Automation!
If you are looking for guardianmy-reviews-spider, you've just found your team. Let’s chat.

Introduction

This project extracts product review information from Guardian Malaysia product pages and converts it into structured, analysis-ready data. It solves the problem of manually collecting and organizing customer reviews across multiple products. It is built for analysts, e-commerce teams, researchers, and product managers who rely on accurate review intelligence.

Product Review Intelligence for Guardian Malaysia

Collects all available reviews directly from product pages
Normalizes ratings, titles, content, and metadata into consistent fields
Supports batch processing of multiple product URLs
Enables longitudinal analysis using crawl and publication dates
Designed for clean exports into analytics and BI workflows
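As an illustration of the longitudinal analysis mentioned above, the sketch below groups review records by their `Year_Quarter` label and reports a count and mean `Rating` per quarter. It assumes records are dicts shaped like the example output further down. `quarterly_summary` is a hypothetical helper, not part of this project's code.

```python
from collections import defaultdict
from statistics import mean


def quarterly_summary(reviews):
    """Group review dicts by Year_Quarter; report review count and mean Rating.

    Hypothetical helper: assumes each record has "Year_Quarter" and "Rating"
    keys as in the example output. Records with a None rating are skipped.
    """
    buckets = defaultdict(list)
    for review in reviews:
        if review.get("Rating") is None:
            continue
        buckets[review["Year_Quarter"]].append(review["Rating"])
    # "YYYY-Qn" labels sort chronologically as plain strings.
    return {
        quarter: {"count": len(ratings), "mean_rating": mean(ratings)}
        for quarter, ratings in sorted(buckets.items())
    }


print(quarterly_summary([
    {"Year_Quarter": "2024-Q2", "Rating": 100},
    {"Year_Quarter": "2024-Q2", "Rating": 80},
    {"Year_Quarter": "2024-Q3", "Rating": 60},
]))
```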

Features

Feature	Description
Review Extraction	Scrapes complete review text, titles, and ratings from product pages.
Product Metadata	Captures product identifiers, names, segments, and categories.
Batch URL Support	Processes multiple Guardian Malaysia product URLs in one run.
Structured Output	Returns normalized, analysis-ready review objects.
Temporal Tagging	Includes review dates, crawl dates, and quarter labeling.
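Quarter labeling could be derived from the review date roughly as follows. This is a sketch, not the scraper's actual logic: it assumes `Date` is formatted day-month-year (`06-04-2024`), which the example output does not confirm, so the format string is a parameter. `year_quarter` is a hypothetical name.

```python
from datetime import datetime


def year_quarter(date_str, fmt="%d-%m-%Y"):
    """Derive a "YYYY-Qn" label from a review date string.

    Assumption: dates are DD-MM-YYYY by default; pass another strptime
    format if the source uses a different order.
    """
    parsed = datetime.strptime(date_str, fmt)
    quarter = (parsed.month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
    return f"{parsed.year}-Q{quarter}"


print(year_quarter("06-04-2024"))  # 2024-Q2
```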

What Data This Scraper Extracts

Field Name	Field Description
Product_Id	Unique identifier of the product being reviewed.
Review_Id	Unique identifier for each individual review.
Rating	Numerical rating score for the review (the example output shows a value of 100).
Title	Review title or short summary text.
Body	Main textual content of the review.
Full_Review	Combined review text used for analysis.
Product_Name	Name of the reviewed product.
Product_Segment	High-level product category.
Product_Segment2	Secondary product classification.
Gender	Intended gender segment of the product.
Country	Country associated with the product listing.
Date	Original publication date of the review.
Year_Quarter	Derived year and quarter label for trend analysis.
URL	Source product page URL.
Crawled_Date	Date when the review data was collected.
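A simple completeness check against the fields listed above might look like the sketch below. It treats a field as missing if it is absent or `None`; empty strings (such as an empty `Section`) count as present. `DOCUMENTED_FIELDS` and `missing_fields` are hypothetical names, and this is not necessarily what `validators.py` does.

```python
# Field names taken from the table above (hypothetical constant).
DOCUMENTED_FIELDS = (
    "Product_Id", "Review_Id", "Rating", "Title", "Body", "Full_Review",
    "Product_Name", "Product_Segment", "Product_Segment2", "Gender",
    "Country", "Date", "Year_Quarter", "URL", "Crawled_Date",
)


def missing_fields(record):
    """Return documented fields that are absent from a record or set to None."""
    return [name for name in DOCUMENTED_FIELDS if record.get(name) is None]
```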

Example Output

[
  {
    "Product_Id": "121068601",
    "Review_Id": "121068601-rev-1",
    "Rating": 100,
    "Title": "dove-shower-1l-beauty-nour-121068601",
    "Body": "Best!",
    "Sentiment": null,
    "Section": "",
    "Higher_Topic": null,
    "Granular_Topic": null,
    "Source": "Guardian",
    "Full_Review": "Best!",
    "Review_Type": "Product Review",
    "Title_Trans": "",
    "Body_Trans": "Best!",
    "Full_Review_Trans": "Best!",
    "Product_Name_Trans": "dove-shower-1l-beauty-nour-121068601",
    "Product_Segment": "Supplements",
    "Gender": "Unisex",
    "Product_Segment2": "Wellness",
    "Year_Quarter": "2024-Q2",
    "Country": "Malaysia",
    "Date": "06-04-2024",
    "Product_Name": "dove-shower-1l-beauty-nour-121068601",
    "Brand": null,
    "URL": "https://www.guardian.com.my/dove-shower-1l-beauty-nour-121068601.html?page=1",
    "Crawled_Date": "10-06-2025"
  }
]
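To consume output like the example above from Python, one option is to map each record onto a small dataclass. The sketch keeps only a subset of fields. `Review`, `from_record`, and `load_reviews` are hypothetical names, and the record shape is assumed from the single example.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Review:
    """Typed view over a subset of the fields in one output record."""
    product_id: str
    review_id: str
    rating: Optional[int]
    body: str
    date: str
    url: str

    @classmethod
    def from_record(cls, record: dict) -> "Review":
        # Keys follow the example output; optional ones fall back to defaults.
        return cls(
            product_id=record["Product_Id"],
            review_id=record["Review_Id"],
            rating=record.get("Rating"),
            body=record.get("Body", ""),
            date=record["Date"],
            url=record["URL"],
        )


def load_reviews(text: str) -> List[Review]:
    """Parse a JSON array of review records into Review objects."""
    return [Review.from_record(item) for item in json.loads(text)]
```

Assuming `data/sample_output.json` holds an array shaped like the example, it could be loaded with `load_reviews(open("data/sample_output.json", encoding="utf-8").read())`.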

Directory Structure Tree

Guardianmy Reviews Spider/
├── src/
│   ├── main.py
│   ├── review_parser.py
│   ├── product_parser.py
│   ├── validators.py
│   └── utils.py
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── config/
│   └── settings.example.json
├── requirements.txt
└── README.md

Use Cases

E-commerce teams use it to monitor customer feedback, so they can improve product positioning and listings.
Market analysts use it to study review trends, so they can identify shifts in consumer sentiment.
Brand managers use it to track product perception, so they can respond to recurring issues faster.
Data scientists use it to build sentiment or rating models, so they can predict product performance.

FAQs

How do I provide input URLs? You supply an array of Guardian Malaysia product page URLs, and the scraper processes each page sequentially.
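Before submitting URLs, it can help to check that each one is a Guardian Malaysia product page. The sketch below infers the product ID from the single example URL, where `Product_Id` is the run of digits at the end of the slug just before `.html`. It also builds IDs in the `<Product_Id>-rev-<n>` shape seen in the example output. Both patterns are assumptions from one sample, and `product_id_from_url` and `review_id` are hypothetical helpers.

```python
import re
from urllib.parse import urlparse

# Assumption inferred from one example URL: the product ID is the trailing
# run of digits in the slug, immediately before ".html".
_PRODUCT_ID = re.compile(r"-(\d+)\.html$")


def product_id_from_url(url):
    """Extract the product ID from a Guardian Malaysia product URL."""
    parts = urlparse(url)
    host = parts.hostname or ""
    if host != "guardian.com.my" and not host.endswith(".guardian.com.my"):
        raise ValueError(f"not a Guardian Malaysia URL: {url}")
    match = _PRODUCT_ID.search(parts.path)  # query string (?page=1) is ignored
    if match is None:
        raise ValueError(f"no product ID found in: {url}")
    return match.group(1)


def review_id(product_id, index):
    """Build a review ID in the shape shown in the example output."""
    return f"{product_id}-rev-{index}"
```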

Does it support multiple products at once? Yes, multiple product URLs can be processed in a single run for batch review extraction.
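A batch run can be sketched as a sequential loop that skips duplicate URLs and records per-URL failures instead of aborting the whole run. This is an illustration only. `scrape_batch` is hypothetical, and `scrape_product` stands in for whatever function fetches and parses one product page and returns a list of review dicts.

```python
def scrape_batch(urls, scrape_product):
    """Process product URLs one at a time, skipping duplicates.

    `scrape_product` is a hypothetical callable: url -> list of review dicts.
    Returns (all reviews, {url: error message}) so one bad page does not
    stop the rest of the batch.
    """
    results, failures = [], {}
    seen = set()
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        try:
            results.extend(scrape_product(url))
        except Exception as exc:  # record the failure and keep going
            failures[url] = str(exc)
    return results, failures
```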

Are translations required for usage? No, translation fields are optional and can be ignored if not needed for your workflow.
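If the translation fields are not needed, they can be dropped after loading. The sketch assumes every translation field ends in `_Trans`, as in the example output (`Title_Trans`, `Body_Trans`, `Full_Review_Trans`, `Product_Name_Trans`). `drop_translation_fields` is a hypothetical helper.

```python
TRANSLATION_SUFFIX = "_Trans"  # assumption based on the example output's field names


def drop_translation_fields(record):
    """Return a copy of a review record without the *_Trans fields."""
    return {k: v for k, v in record.items() if not k.endswith(TRANSLATION_SUFFIX)}
```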

Is the output suitable for analytics tools? Yes, the structured format is designed to integrate easily with databases, dashboards, and data pipelines.
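For tools that prefer flat files, the JSON records can be written to CSV with the standard library. In this sketch, `reviews_to_csv` is a hypothetical helper that keeps only the requested columns, ignores any others, and writes `null` values as empty cells.

```python
import csv
import io


def reviews_to_csv(reviews, fieldnames):
    """Serialize review dicts to CSV text with the given column order."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys not listed in fieldnames;
    # csv writes None as an empty string.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(reviews)
    return buffer.getvalue()
```

To write straight to disk instead, pass a file opened with `open(path, "w", newline="", encoding="utf-8")` to `csv.DictWriter` in place of the `StringIO` buffer.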

Performance Benchmarks and Results

Primary Metric: Processes dozens of reviews per minute from each product page, on average.

Reliability Metric: Maintains a high success rate across paginated review pages with consistent field coverage.

Efficiency Metric: Optimized parsing minimizes redundant page processing and memory usage.

Quality Metric: Delivers high data completeness with consistent review-to-product mapping and timestamp accuracy.

"Bitbash is a top-tier automation partner, innovative, reliable, and dedicated to delivering real results every time."

Nathan Pennington
Marketer
★★★★★

"Bitbash delivers outstanding quality, speed, and professionalism, truly a team you can rely on."

Eliza
SEO Affiliate Expert
★★★★★

"Exceptional results, clear communication, and flawless delivery. Bitbash nailed it."

Syed
Digital Strategist
★★★★★
