A web data crawler for e-commerce sites, written in PHP
To scrape a shop at www.XXX.com, you need to write the script src/Site/Xxx.php. You can start by copying one of the existing scripts in src/Site/.
Inside Xxx.php you need to implement three functions:
- fetchCategories(): get all the links to the product category pages.
- fetchProducts(): get all the links to the product pages.
- fetchProductData(): scrape the product information from a product page.
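Here is a minimal sketch of what src/Site/Xxx.php could contain. It assumes each of the three functions receives the HTML of a page and returns plain arrays. The real scripts in src/Site/ may extend a shared base class, download pages themselves, or use different signatures. The class name `XxxSketch`, the markup it looks for and the XPath queries are all hypothetical and would need adapting to the target shop. It uses PHP's built-in DOM extension (`DOMDocument`, `DOMXPath`).

```php
<?php
// Hypothetical sketch of src/Site/Xxx.php: each method receives a page's HTML and
// returns plain arrays. The real scripts may use a base class and other signatures.
class XxxSketch
{
    private string $baseUrl;

    public function __construct(string $baseUrl = 'https://www.XXX.com')
    {
        $this->baseUrl = rtrim($baseUrl, '/');
    }

    // Category links; assumes markup like <nav class="categories"><a href="...">.
    public function fetchCategories(string $html): array
    {
        return $this->links($html, '//nav[@class="categories"]//a/@href');
    }

    // Product links on a category page; assumes <a class="product-link ...">.
    public function fetchProducts(string $html): array
    {
        return $this->links($html, '//a[contains(@class, "product-link")]/@href');
    }

    // Product fields keyed like columns of the products table (null when not found).
    public function fetchProductData(string $html): array
    {
        $xpath = $this->xpath($html);
        $text = function (string $query) use ($xpath): ?string {
            $node = $xpath->query($query)->item(0);
            return $node === null ? null : trim($node->textContent);
        };

        return [
            'title'        => $text('//h1'),
            'product_code' => $text('//*[@itemprop="sku"]'),
            'manufacturer' => $text('//*[@itemprop="brand"]'),
            'description'  => $text('//*[@itemprop="description"]'),
        ];
    }

    private function xpath(string $html): DOMXPath
    {
        $doc = new DOMDocument();
        // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
        // Note: without a declared charset, loadHTML() assumes ISO-8859-1.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        return new DOMXPath($doc);
    }

    // Resolves each matched href against the site root and removes duplicates.
    private function links(string $html, string $query): array
    {
        $urls = [];
        foreach ($this->xpath($html)->query($query) as $attr) {
            $href = trim($attr->nodeValue);
            if ($href === '') {
                continue;
            }
            $urls[] = preg_match('#^https?://#i', $href)
                ? $href
                : $this->baseUrl . '/' . ltrim($href, '/');
        }
        return array_values(array_unique($urls));
    }
}
```

Parsing from a string rather than a URL keeps each function testable against saved HTML.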
You also need to register your new script Xxx.php in:
- src/Command/Fetch.php
- src/Site.php
To run tests and check each function independently, use src/Commands/Test.php.
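The contents of src/Commands/Test.php aren't shown here, so the following is only a hypothetical sketch of the kind of independent checks it could perform. A list of links should be non-empty and contain only absolute http(s) URLs, and scraped product data should have its required fields filled. Treating `title` and `price` as required is an assumption.

```php
<?php
// Hypothetical validation helpers; not the project's actual Test.php.

// Returns a list of problems found in a list of links (empty list = OK).
function checkLinks(array $links): array
{
    if ($links === []) {
        return ['no links found'];
    }
    $problems = [];
    foreach ($links as $url) {
        if (!preg_match('#^https?://#i', (string) $url)) {
            $problems[] = "not an absolute URL: $url";
        }
    }
    return $problems;
}

// Returns the required product fields that are missing or empty.
function checkProductData(array $data, array $required = ['title', 'price']): array
{
    $missing = [];
    foreach ($required as $field) {
        if (!isset($data[$field]) || $data[$field] === '') {
            $missing[] = $field;
        }
    }
    return $missing;
}

// Example: report problems for one scraped product (prints "missing: price").
$problems = checkProductData(['title' => 'Phone X']);
echo $problems ? 'missing: ' . implode(', ', $problems) . PHP_EOL : "ok\n";
```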
To run the tests, use:
php index.php test --sites xxx
Once the functions are done, run the crawler with:
php index.php fetch --sites xxx
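Presumably the fetch command chains the three functions. The category links lead to category pages, those pages yield product links, and each product page yields one row of product data. The sketch below shows that flow under three assumptions: a hypothetical `SiteScraper` interface for the three methods, an injected `$download` callable that maps a URL to its HTML, and rows returned to the caller instead of being saved. The project's actual Fetch command may download, throttle and store differently.

```php
<?php
// Hypothetical interface mirroring the three functions a site script implements.
interface SiteScraper
{
    public function fetchCategories(string $html): array;   // category page URLs
    public function fetchProducts(string $html): array;     // product page URLs
    public function fetchProductData(string $html): array;  // product fields
}

// Walks home page -> category pages -> product pages and collects product rows.
// $download is any callable mapping a URL to its HTML (e.g. a cURL wrapper).
function crawl(SiteScraper $site, string $homeUrl, callable $download): array
{
    $rows = [];
    $seen = []; // product URLs already scraped; a product can appear in several categories
    foreach ($site->fetchCategories($download($homeUrl)) as $categoryUrl) {
        foreach ($site->fetchProducts($download($categoryUrl)) as $productUrl) {
            if (isset($seen[$productUrl])) {
                continue;
            }
            $seen[$productUrl] = true;
            $rows[] = ['url' => $productUrl] + $site->fetchProductData($download($productUrl));
        }
    }
    return $rows;
}
```

In practice, `$download` could wrap cURL or `file_get_contents()`. Passing it in keeps the traversal testable offline with canned pages.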
The crawler's data is stored in two MySQL tables, products and categories:

```sql
CREATE TABLE `products` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `site` varchar(50) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  `product_code` varchar(255) DEFAULT NULL,
  `title` varchar(255) DEFAULT NULL,
  `description` text,
  `image` text,
  `video` varchar(255) DEFAULT NULL,
  `model` varchar(150) DEFAULT NULL,
  `manufacturer` varchar(150) DEFAULT NULL,
  `warranty` varchar(150) DEFAULT NULL,
  `delivery` varchar(150) DEFAULT NULL,
  `price` int(11) DEFAULT NULL,
  `sale_price` int(11) DEFAULT NULL,
  `ship_price` int(11) DEFAULT NULL,
  `options` json DEFAULT NULL,
  `category` json DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `visited` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `site` (`site`,`url`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
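The UNIQUE KEY on (site, url) lets a scraped product be saved with a single INSERT ... ON DUPLICATE KEY UPDATE, so crawling the same page again updates the existing row instead of failing on a duplicate. The function below is a hypothetical helper, not the project's code. It whitelists the scraped columns and JSON-encodes `options` and `category` for the json columns. It also sets `created_at`, `updated_at` and `visited` on insert and refreshes `updated_at` and `visited` on update; that reading of the timestamp columns is an assumption.

```php
<?php
// Hypothetical helper (not from the project): builds an upsert for the products table.
// The UNIQUE KEY (site, url) turns a repeat visit into an UPDATE of the existing row.
// Returns [sql, params] for PDO::prepare() / PDOStatement::execute().
function buildProductUpsert(array $row): array
{
    // Whitelist the scraped columns so array keys can never inject SQL.
    $allowed = ['site', 'url', 'product_code', 'title', 'description', 'image', 'video',
        'model', 'manufacturer', 'warranty', 'delivery', 'price', 'sale_price',
        'ship_price', 'options', 'category'];
    $row = array_intersect_key($row, array_flip($allowed));
    if (!isset($row['site'], $row['url'])) {
        throw new InvalidArgumentException('site and url are required');
    }
    // The options and category columns are json, so arrays are encoded to JSON text.
    foreach (['options', 'category'] as $jsonColumn) {
        if (isset($row[$jsonColumn]) && !is_string($row[$jsonColumn])) {
            $row[$jsonColumn] = json_encode($row[$jsonColumn], JSON_UNESCAPED_UNICODE);
        }
    }

    $columns = array_keys($row);
    $updates = [];
    foreach (array_diff($columns, ['site', 'url']) as $c) {
        // VALUES() is deprecated since MySQL 8.0.20 but still accepted.
        $updates[] = "`$c` = VALUES(`$c`)";
    }
    $updates[] = '`updated_at` = NOW()';
    $updates[] = '`visited` = NOW()';

    $sql = 'INSERT INTO `products` (`' . implode('`, `', $columns) . '`, `created_at`, `updated_at`, `visited`)'
        . ' VALUES (:' . implode(', :', $columns) . ', NOW(), NOW(), NOW())'
        . ' ON DUPLICATE KEY UPDATE ' . implode(', ', $updates);

    $params = [];
    foreach ($row as $c => $v) {
        $params[":$c"] = $v;
    }
    return [$sql, $params];
}
```

With an existing PDO connection `$pdo`, a row would be written with `[$sql, $params] = buildProductUpsert($row); $pdo->prepare($sql)->execute($params);`.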
```sql
CREATE TABLE categories (
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  title varchar(255) DEFAULT NULL,
  url varchar(255) DEFAULT NULL,
  site varchar(50) DEFAULT NULL,
  updated_at datetime DEFAULT NULL,
  visited datetime DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY url (url,site)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```
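The `visited` column suggests that the crawler records when each category page was last crawled. That reading is an assumption, since it isn't documented here. Under it, the hypothetical helpers below build the query for a site's categories that are due for a re-crawl (never-visited ones first) and the statement that marks a category as visited.

```php
<?php
// Hypothetical helpers for the categories table. They assume `visited` stores when a
// category page was last crawled (NULL = never), which this README does not state.

// Categories of one site not crawled within the last $maxAgeHours hours.
function dueCategoriesQuery(string $site, int $maxAgeHours = 24): array
{
    $sql = 'SELECT id, url FROM categories'
        . ' WHERE site = :site'
        . ' AND (visited IS NULL OR visited < NOW() - INTERVAL :hours HOUR)'
        // MySQL sorts NULLs first in ascending order, so never-visited rows come first.
        . ' ORDER BY visited ASC';
    return [$sql, [':site' => $site, ':hours' => $maxAgeHours]];
}

// Records that a category page was just crawled.
function markCategoryVisitedQuery(int $id): array
{
    return ['UPDATE categories SET visited = NOW() WHERE id = :id', [':id' => $id]];
}
```

With PDO, `[$sql, $params] = dueCategoriesQuery('xxx'); $stmt = $pdo->prepare($sql); $stmt->execute($params);` followed by `$stmt->fetchAll(PDO::FETCH_ASSOC)` would return the due categories.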