Crawl categories and products from tiki.vn, store them in a PostgreSQL database, and use Flask to query the data. An analysis presentation is also embedded in the Flask website. The Heroku app and database are deployed at https://tiki-postgresql-app.herokuapp.com
- Prepare the database
- Here we use the Python `psycopg2` module to create a connection to Postgres.
- Run `Tiki_crawl_categories.ipynb` to crawl categories on Tiki and store them in the `categories` table.
- Run `classProduct.ipynb` to crawl products on Tiki and store them in the `products` table.
- Optionally, we use multithreading with Python's `concurrent.futures.ThreadPoolExecutor` to speed up crawling. When using multithreading, a new database connection has to be created and closed for each database access, because one shared connection could not handle concurrent access from multiple workers.
- Product data includes: product title, brand name, regular price, discount, final price, category, comments, number of ratings, TikiNOW availability, image, and link.
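The per-access connection pattern can be sketched as follows. This is a minimal illustration, not the code from `classProduct.ipynb`: the function `save_rows_concurrently`, the `connect` factory parameter, and the SQL passed in are hypothetical. Each worker thread opens its own connection, writes one row, and closes the connection, so no connection object is ever shared between threads.

```python
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from typing import Callable, Iterable, Sequence


def save_rows_concurrently(
    rows: Iterable[Sequence],
    connect: Callable[[], object],
    insert_sql: str,
    max_workers: int = 4,
) -> int:
    """Insert each row on a worker thread, using a fresh connection per access.

    `connect` is any DB-API connection factory (hypothetical parameter for this
    sketch), e.g. a call to psycopg2.connect or sqlite3.connect.
    """

    def save_one(row: Sequence) -> int:
        # Open a new connection for this single access and always close it,
        # so no connection is shared between threads.
        with closing(connect()) as conn:
            with conn:  # commits on success, rolls back if an exception is raised
                cur = conn.cursor()
                cur.execute(insert_sql, tuple(row))
                cur.close()
        return 1

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() re-raises any worker exception when the results are consumed
        return sum(pool.map(save_one, rows))
```

With psycopg2 the factory could be `lambda: psycopg2.connect(dbname="tiki", user="postgres", password="password", host="localhost")` and the SQL `"INSERT INTO products (title, final_price) VALUES (%s, %s)"`; the database name and column names there are assumptions, not the project's actual schema. Opening a connection per row is slower than a connection pool, but it keeps the threading logic simple.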
- Database schema:
- Data analysis
- Our analysis is in `Analysis.ipynb`. We perform category analysis, seller analysis, and product analysis, and use `seaborn` to make charts.
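As an illustration of the kind of aggregation a seller analysis needs before charting, the sketch below counts products and averages final prices per seller. It is not taken from `Analysis.ipynb`; the function name `seller_summary` and the `(brand_name, final_price)` row shape are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def seller_summary(
    rows: Iterable[Tuple[str, float]], top_n: int = 10
) -> List[Tuple[str, int, float]]:
    """Return (seller, product_count, average_final_price) for the top sellers.

    Sellers are ranked by number of products; ties are broken alphabetically.
    """
    prices_by_seller: Dict[str, List[float]] = defaultdict(list)
    for brand, price in rows:
        prices_by_seller[brand].append(price)
    summary = [
        (brand, len(prices), sum(prices) / len(prices))
        for brand, prices in prices_by_seller.items()
    ]
    summary.sort(key=lambda item: (-item[1], item[0]))
    return summary[:top_n]
```

For example, `seller_summary([("FORD", 100.0), ("FORD", 300.0), ("Acme", 50.0)])` returns `[("FORD", 2, 200.0), ("Acme", 1, 50.0)]`. The result could then be drawn with seaborn, for instance `sns.barplot(x=[s for s, _, _ in summary], y=[n for _, n, _ in summary])`.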
- Database connection
- We use `SQLAlchemy` from `flask_sqlalchemy` to create data class models and query them: `models.py` creates the data class models and `config.py` contains the configuration for our app.
- The database can be queried in real time by entering a URL path.
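`config.py` is not reproduced in this README, but since `APP_SETTINGS` is set to `config.ProductionConfig` or `config.DevelopmentConfig` during deployment, a plausible minimal layout is sketched below. The base class name `Config`, the attribute values, the local default URL, and the `database_uri` helper are assumptions. The `postgres://` rewrite is included because Heroku has issued `DATABASE_URL` values with that scheme, which SQLAlchemy 1.4 and later no longer accept.

```python
import os


def database_uri(default: str = "postgresql://postgres:password@localhost:5432/tiki") -> str:
    """Read DATABASE_URL (hypothetical local default), normalizing postgres:// for SQLAlchemy."""
    url = os.environ.get("DATABASE_URL", default)
    if url.startswith("postgres://"):
        # SQLAlchemy 1.4+ only recognizes the "postgresql" dialect name
        url = "postgresql://" + url[len("postgres://"):]
    return url


class Config:
    DEBUG = False
    SQLALCHEMY_TRACK_MODIFICATIONS = False
    # Evaluated once, when config.py is imported
    SQLALCHEMY_DATABASE_URI = database_uri()


class ProductionConfig(Config):
    DEBUG = False


class DevelopmentConfig(Config):
    DEBUG = True
```

In `app.py`, Flask can then load whichever class the environment names with `app.config.from_object(os.environ["APP_SETTINGS"])`, because `from_object` also accepts an import string.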
- Start Flask app
- In a terminal, run `python app.py` to start the Flask app.
- Navigate using the URL
- The URL path is used to pass a query to the app:
- /product/getid/[id] will query and return the product by id
- /product/getseller/[seller] will query by seller and return all products from that seller (the [seller] input is case sensitive). For example, /product/getseller/FORD will return all products by FORD.
- /product/getcategory/[categoryid] will query the products by the category id
- /category/getid/[id] will return the category by id
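One likely reason the seller lookup is case sensitive is that an equality filter on a text column compares strings exactly (in PostgreSQL with its default deterministic collations, and in SQLite with its default `BINARY` collation). The sketch below shows that behaviour with an in-memory SQLite table standing in for the Postgres `products` table; the table layout and the function name `products_by_seller` are hypothetical, and the actual app queries through Flask-SQLAlchemy models instead.

```python
import sqlite3
from typing import List, Tuple


def products_by_seller(conn: sqlite3.Connection, seller: str) -> List[Tuple[int, str, str]]:
    """Return (id, title, brand_name) rows whose brand_name equals `seller` exactly."""
    cur = conn.execute(
        "SELECT id, title, brand_name FROM products WHERE brand_name = ? ORDER BY id",
        (seller,),  # parameterized, so the URL segment is never spliced into SQL
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT, brand_name TEXT)")
    conn.executemany(
        "INSERT INTO products (title, brand_name) VALUES (?, ?)",
        [("Car mat", "FORD"), ("Seat cover", "FORD"), ("Phone case", "Acme")],
    )
    print(products_by_seller(conn, "FORD"))  # both FORD rows
    print(products_by_seller(conn, "ford"))  # [] since 'ford' != 'FORD'
```

A Flask-SQLAlchemy equivalent would be `Product.query.filter_by(brand_name=seller).all()`, where `Product` and `brand_name` are assumed names for the model and column.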
- View Tiki analysis presentation
- The Tiki analysis is embedded at /presentation and can be viewed by going to /presentation or clicking the 'Go to Tiki analysis Slides' button.
- Create app and push database to Heroku
- Install the Heroku CLI and log in.
- Create an environment with `virtualenv env`, then `source env/bin/activate`, and install the required Python modules (Flask, flask_script, flask_migrate, psycopg2-binary, gunicorn).
- Create `requirements.txt` with `pip freeze > requirements.txt`.
- Create `runtime.txt` containing `python-3.6.5`.
- Create `Procfile` containing `web: gunicorn app:app`.
- Create the app with `heroku create [app-name]`.
- Create the remote so the app is ready to push: `git remote add prod https://git.heroku.com/tiki-postgresql-app.git`.
- Configure Heroku: `heroku config:set APP_SETTINGS=config.ProductionConfig --remote prod`.
- Create the database remotely: `heroku addons:create heroku-postgresql:hobby-dev --app [app-name]`.
- View the configuration with `heroku config --app [app-name]`.
- Push the local Postgres database to the remote: `PGUSER=postgres PGPASSWORD=password heroku pg:push postgresql://postgres:password@localhost:5432/[localdbname] DATABASE_URL --app [app-name]`. Here `DATABASE_URL` is the config var name listed in the Heroku configuration after the remote database creation step.
- If there is a `pg_dump` version mismatch, upgrade Postgres with `brew upgrade postgresql`.
- Create a `.env` file containing `export APP_SETTINGS="config.DevelopmentConfig"` and `export DATABASE_URL="[remoteDatabaseURL]"` (also edit `DATABASE_URL` in `config.py` to be the remote database URL).
- Run `git init` in the root folder if it has not been initialized yet, stage the files with `git add .`, then run `git commit -m "message here"` and `git push prod master`.
- All done: go to the Heroku URL to use the app.
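The `.env` file holds shell `export` lines, which a shell can load with `source .env`. If you would rather load them from Python before the app reads `os.environ` (for example when starting `python app.py` from an IDE), a minimal parser is sketched below. `load_env_file` is a hypothetical helper, not part of the project, and it only handles simple `KEY=value` lines with an optional `export` prefix and quotes.

```python
import os
import shlex
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Load simple `export KEY="value"` or `KEY=value` lines into os.environ."""
    loaded: Dict[str, str] = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            if line.startswith("export "):
                line = line[len("export "):]
            key, sep, value = line.partition("=")
            if not sep:
                continue  # not an assignment
            key = key.strip()
            # shlex removes surrounding quotes the way a POSIX shell would
            parts = shlex.split(value)
            value = parts[0] if parts else ""
            loaded[key] = value
            # By default, values already set in the environment win
            if override or key not in os.environ:
                os.environ[key] = value
    return loaded
```

Calling `load_env_file()` at the top of `app.py`, before the configuration is selected, would make `APP_SETTINGS` and `DATABASE_URL` available without sourcing the file in the shell.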
