This is a Scrapy project that scrapes the online book store at http://books.toscrape.com/ and stores the data in MongoDB.
This project is meant for educational purposes only.
Bookstore Website
Book URL Selection
Next Page URL Selection
Image URL Selection
Title Selection
Price Selection
Stock and Ratings Selection
Description Selection
Product Type, Price Incl Tax, Price Excl Tax, Tax, Availability, Reviews Selection
MongoDB Stored Data
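The selection steps listed above can be sketched with XPath-style expressions. The snippet below is a hedged illustration using only the standard library's ElementTree on a simplified copy of the listing markup; the markup and the expressions are assumptions about the page structure, not the project's actual spider code (Scrapy's own selectors support a much richer XPath/CSS dialect):

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the listing markup (an assumption about
# the page structure at http://books.toscrape.com/, for illustration only).
LISTING = """
<ol>
  <li><article class="product_pod">
    <h3><a href="catalogue/a-light-in-the-attic_1000/index.html"
           title="A Light in the Attic">A Light in ...</a></h3>
    <p class="price_color">£51.77</p>
  </article></li>
</ol>
"""

root = ET.fromstring(LISTING)
rows = []
for pod in root.iter("article"):              # each book card on the page
    link = pod.find("h3/a")                   # book URL and full title
    price = pod.find("p[@class='price_color']")
    rows.append((link.get("href"), link.get("title"), price.text))

print(rows[0])
```

In the real spider the same ideas apply per field: one expression for the book link, one for the next-page link, one per detail (title, price, stock, rating, description), matching the selection steps above.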
This project extracts the following fields: availability, description, image_urls, images (downloaded and renamed), url, instock_availability, number_of_reviews, price, price__excl_tax, price_incl_tax, product_type, rating, tax, title, upc. The extracted data looks like this sample:
{
'price_without_tax': ['£27.70'],
'price_with_tax': ['£27.70'],
'product_type': ['Books'],
'tax': ['£0.00'],
'upc': ['d510567580c8be52'],
'number_of_reviews': 'Four'
}
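The sample above stores raw strings such as '£27.70' and 'Four'. As a hedged illustration (these helpers are not part of the project), small parsers could normalize such values before or after storage:

```python
# Hypothetical helpers (not in the project) to normalize the raw strings
# shown in the sample item above.
WORD_TO_INT = {"Zero": 0, "One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}

def parse_price(raw):
    # '£27.70' -> 27.7
    return float(raw.lstrip("£"))

def parse_review_count(raw):
    # The site spells small counts out as words: 'Four' -> 4
    return WORD_TO_INT[raw]

print(parse_price("£27.70"))       # 27.7
print(parse_review_count("Four"))  # 4
```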
In settings.py, enable the pipeline that writes the scraped data to MongoDB:
ITEM_PIPELINES = {
    'scrapyMongoDbExample.pipelines.ScrapymongodbexamplePipeline': 300,
}
Also add the MongoDB connection settings:
MONGODB_SERVER = 'localhost'
MONGODB_PORT = 27017
MONGODB_DB = 'books'
MONGODB_COLLECTION = 'products'
The pipeline class in pipelines.py reads these settings and stores each product's details in the database:
from pymongo import MongoClient
from scrapy.utils.project import get_project_settings

class ScrapymongodbexamplePipeline(object):
    # database configuration
    def __init__(self):
        settings = get_project_settings()
        connection = MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT'])
        db = connection[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]

    # insert each scraped item into the database
    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        return item
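The pipeline contract can be illustrated with a minimal, self-contained sketch; the FakeCollection stand-in and the two-field item are hypothetical, used here only so the example runs without a MongoDB server:

```python
class FakeCollection:
    """In-memory stand-in for a pymongo collection (illustration only)."""
    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)

class DemoPipeline:
    """Same shape as the project's pipeline: store, then pass the item on."""
    def __init__(self, collection):
        self.collection = collection

    def process_item(self, item, spider):
        # Scrapy items behave like dicts; store a plain dict copy.
        self.collection.insert_one(dict(item))
        return item  # returning the item lets later pipelines see it too

col = FakeCollection()
pipe = DemoPipeline(col)
pipe.process_item({"title": "A Light in the Attic", "price": "£51.77"}, spider=None)
print(len(col.docs))  # 1
```

Returning the item from process_item matters: Scrapy feeds the return value to any lower-priority pipelines, so dropping it would silently starve them.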
This project contains one spider, which you can list using the list command:
$ scrapy list
scrapyMongoDbDemoSpider
The spider extracts the data from the book store.
You can run a spider using the scrapy crawl command, for example:
$ scrapy crawl scrapyMongoDbDemoSpider