dorattodoreaczw/guardianmy-reviews-spider

Guardianmy Reviews Spider Scraper

A robust tool for extracting detailed product reviews from Guardian Malaysia product pages. It transforms scattered customer feedback into clean, structured datasets for analysis, monitoring, and reporting. Ideal for teams needing reliable Guardian Malaysia review data at scale.


Created by Bitbash, built to showcase our approach to Scraping and Automation!

Introduction

This project extracts product review information from Guardian Malaysia product pages and converts it into structured, analysis-ready data. It solves the problem of manually collecting and organizing customer reviews across multiple products. It is built for analysts, e-commerce teams, researchers, and product managers who rely on accurate review intelligence.

Product Review Intelligence for Guardian Malaysia

  • Collects all available reviews directly from product pages
  • Normalizes ratings, titles, content, and metadata into consistent fields
  • Supports batch processing of multiple product URLs
  • Enables longitudinal analysis using crawl and publication dates
  • Designed for clean exports into analytics and BI workflows

Features

| Feature | Description |
| --- | --- |
| Review Extraction | Scrapes complete review text, titles, and ratings from product pages. |
| Product Metadata | Captures product identifiers, names, segments, and categories. |
| Batch URL Support | Processes multiple Guardian Malaysia product URLs in one run. |
| Structured Output | Returns normalized, analysis-ready review objects. |
| Temporal Tagging | Includes review dates, crawl dates, and quarter labeling. |

What Data This Scraper Extracts

| Field Name | Field Description |
| --- | --- |
| Product_Id | Unique identifier of the product being reviewed. |
| Review_Id | Unique identifier for each individual review. |
| Rating | Numerical rating score associated with the review. |
| Title | Review title or short summary text. |
| Body | Main textual content of the review. |
| Full_Review | Combined review text used for analysis. |
| Product_Name | Name of the reviewed product. |
| Product_Segment | High-level product category. |
| Product_Segment2 | Secondary product classification. |
| Gender | Intended gender segment of the product. |
| Country | Country associated with the product listing. |
| Date | Original publication date of the review. |
| Year_Quarter | Derived year and quarter label for trend analysis. |
| URL | Source product page URL. |
| Crawled_Date | Date when the review data was collected. |
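The Year_Quarter field is described as a label derived from the review date. A minimal sketch of one way to compute it is shown here. The function name `year_quarter` is hypothetical, and the day-month-year format is an assumption: the sample date `06-04-2024` is ambiguous between `DD-MM-YYYY` and `MM-DD-YYYY`, so adjust `fmt` if the scraper uses the other order.

```python
from datetime import datetime


def year_quarter(date_str: str, fmt: str = "%d-%m-%Y") -> str:
    """Return a label such as '2024-Q2' for a date string.

    Assumption: dates are day-month-year; pass a different fmt otherwise.
    """
    parsed = datetime.strptime(date_str, fmt)
    # Months 1-3 map to Q1, 4-6 to Q2, 7-9 to Q3, 10-12 to Q4.
    quarter = (parsed.month - 1) // 3 + 1
    return f"{parsed.year}-Q{quarter}"


if __name__ == "__main__":
    print(year_quarter("06-04-2024"))  # 2024-Q2
```

For this particular sample value, both date interpretations (6 April and 4 June) fall in the second quarter, so the label alone does not settle which format is in use.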

Example Output

[
  {
    "Product_Id": "121068601",
    "Review_Id": "121068601-rev-1",
    "Rating": 100,
    "Title": "dove-shower-1l-beauty-nour-121068601",
    "Body": "Best!",
    "Sentiment": null,
    "Section": "",
    "Higher_Topic": null,
    "Granular_Topic": null,
    "Source": "Guardian",
    "Full_Review": "Best!",
    "Review_Type": "Product Review",
    "Title_Trans": "",
    "Body_Trans": "Best!",
    "Full_Review_Trans": "Best!",
    "Product_Name_Trans": "dove-shower-1l-beauty-nour-121068601",
    "Product_Segment": "Supplements",
    "Gender": "Unisex",
    "Product_Segment2": "Wellness",
    "Year_Quarter": "2024-Q2",
    "Country": "Malaysia",
    "Date": "06-04-2024",
    "Product_Name": "dove-shower-1l-beauty-nour-121068601",
    "Brand": null,
    "URL": "https://www.guardian.com.my/dove-shower-1l-beauty-nour-121068601.html?page=1",
    "Crawled_Date": "10-06-2025"
  }
]
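In the sample record, Product_Id matches the trailing digits of the product URL slug, and Review_Id looks like the product ID followed by `-rev-` and a counter. The sketch below reproduces that pattern. It is inferred from this single example rather than taken from the scraper's source, and the helper names `product_id_from_url` and `build_review_id` are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Assumption: product pages end in "-<digits>.html", as in the sample URL.
PRODUCT_ID_RE = re.compile(r"-(\d+)\.html$")


def product_id_from_url(url: str) -> Optional[str]:
    """Extract the numeric product ID from a product page URL, if present."""
    # urlsplit separates the path from the "?page=1" query string.
    path = urlsplit(url).path
    match = PRODUCT_ID_RE.search(path)
    return match.group(1) if match else None


def build_review_id(product_id: str, index: int) -> str:
    """Build an ID shaped like the sample's '121068601-rev-1'."""
    return f"{product_id}-rev-{index}"


if __name__ == "__main__":
    url = "https://www.guardian.com.my/dove-shower-1l-beauty-nour-121068601.html?page=1"
    pid = product_id_from_url(url)
    print(pid, build_review_id(pid, 1))
```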

Directory Structure Tree

Guardianmy Reviews Spider/
├── src/
│   ├── main.py
│   ├── review_parser.py
│   ├── product_parser.py
│   ├── validators.py
│   └── utils.py
├── data/
│   ├── sample_input.json
│   └── sample_output.json
├── config/
│   └── settings.example.json
├── requirements.txt
└── README.md

Use Cases

  • E-commerce teams use it to monitor customer feedback, so they can improve product positioning and listings.
  • Market analysts use it to study review trends, so they can identify shifts in consumer sentiment.
  • Brand managers use it to track product perception, so they can respond to recurring issues faster.
  • Data scientists use it to build sentiment or rating models, so they can predict product performance.
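As an illustration of the trend-analysis use case, the following sketch averages ratings per Year_Quarter using only the standard library. The function name `mean_rating_by_quarter` is hypothetical. The rating scale is not documented here (the sample record shows `100`), so the averages are on whatever scale the scraper emits.

```python
from collections import defaultdict
from statistics import mean


def mean_rating_by_quarter(records):
    """Group review records by Year_Quarter and average their Rating."""
    buckets = defaultdict(list)
    for record in records:
        rating = record.get("Rating")
        quarter = record.get("Year_Quarter")
        # Skip records that lack a rating or a quarter label.
        if rating is None or not quarter:
            continue
        buckets[quarter].append(rating)
    # Sort by label so quarters appear in chronological order.
    return {quarter: mean(values) for quarter, values in sorted(buckets.items())}


if __name__ == "__main__":
    sample = [
        {"Rating": 100, "Year_Quarter": "2024-Q2"},
        {"Rating": 60, "Year_Quarter": "2024-Q2"},
        {"Rating": 80, "Year_Quarter": "2024-Q1"},
    ]
    print(mean_rating_by_quarter(sample))
```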

FAQs

How do I provide input URLs? You supply an array of Guardian Malaysia product page URLs, and the scraper processes each page sequentially.
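Before passing an array of URLs to the scraper, it can help to drop duplicates and anything that is not a Guardian Malaysia page. The sketch below is one possible pre-check, not the project's own validator: the function name `clean_urls` and the allowed-host list are assumptions based on the sample URL.

```python
from urllib.parse import urlsplit

# Assumption: product pages live on these hosts, as in the sample output URL.
ALLOWED_HOSTS = {"www.guardian.com.my", "guardian.com.my"}


def clean_urls(raw_urls):
    """Return unique http(s) Guardian Malaysia URLs, preserving input order."""
    seen = set()
    cleaned = []
    for raw in raw_urls:
        url = raw.strip()
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue
        if parts.hostname not in ALLOWED_HOSTS:
            continue
        if url in seen:
            continue
        seen.add(url)
        cleaned.append(url)
    return cleaned


if __name__ == "__main__":
    urls = [
        "https://www.guardian.com.my/dove-shower-1l-beauty-nour-121068601.html",
        "https://www.guardian.com.my/dove-shower-1l-beauty-nour-121068601.html",
        "https://example.com/not-a-product.html",
    ]
    print(clean_urls(urls))
```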

Does it support multiple products at once? Yes, multiple product URLs can be processed in a single run for batch review extraction.

Are translations required for usage? No, translation fields are optional and can be ignored if not needed for your workflow.

Is the output suitable for analytics tools? Yes, the structured format is designed to integrate easily with databases, dashboards, and data pipelines.
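As a rough illustration of feeding the output into a downstream tool, this sketch flattens review records to CSV with the standard library. The function name `records_to_csv` and the column selection in `FIELDS` are illustrative choices, not part of the scraper. Fields outside the selection are ignored, and missing fields are written as empty cells.

```python
import csv
import io

# Illustrative subset of the documented output fields.
FIELDS = ["Product_Id", "Review_Id", "Rating", "Body", "Date", "Year_Quarter", "URL"]


def records_to_csv(records):
    """Serialize review records to a CSV string with a header row."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not listed in FIELDS.
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{
        "Product_Id": "121068601",
        "Review_Id": "121068601-rev-1",
        "Rating": 100,
        "Body": "Best!",
        "Date": "06-04-2024",
        "Year_Quarter": "2024-Q2",
        "URL": "https://www.guardian.com.my/dove-shower-1l-beauty-nour-121068601.html?page=1",
        "Brand": None,
    }]
    print(records_to_csv(sample))
```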


Performance Benchmarks and Results

Primary Metric: Throughput is on the order of dozens of reviews per minute for each product page.

Reliability Metric: Maintains a high success rate across paginated review pages with consistent field coverage.

Efficiency Metric: Optimized parsing minimizes redundant page processing and memory usage.

Quality Metric: Delivers high data completeness with consistent review-to-product mapping and timestamp accuracy.
