The goal of this project was to analyze the text of property listings and build three different search engines on the Airbnb data. Given a query as input, each engine returns the houses that match the query.
Link to the rendered project: http://nbviewer.jupyter.org/github/dusicastepic/ADMThirdHomework/blob/master/Homework%203%20-%20group%20%2319.ipynb
Instructions for using the project:
1. Download the Airbnb data
2. Use the files functions.py and Homework 3 - group #19.ipynb
3. Change the cells that are stored as markdown cells in the .ipynb file to code cells and run them (they should be executed only once; their results are then saved as files and loaded from the working directory)
4. There should be a folder named 'data' where .tsv files are created and stored
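
The "execute once, then save and load" pattern from steps 3 and 4 can be sketched as below. This is only an illustration: the helper `load_or_build`, its parameters, and the file name used in the note are hypothetical and do not come from functions.py or the notebook.

```python
import csv
from pathlib import Path


def load_or_build(name, build_rows, data_dir=Path("data")):
    """Return the rows of <data_dir>/<name>.tsv, building the file on first use.

    `build_rows` is a hypothetical callable that performs the expensive step
    and returns an iterable of rows (each row a list of strings).
    """
    path = Path(data_dir) / f"{name}.tsv"
    if not path.exists():
        # First run: create the folder, compute the rows, and save them.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="", encoding="utf-8") as f:
            csv.writer(f, delimiter="\t").writerows(build_rows())
    # Every run (including the first): load the rows from disk.
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="\t"))
```

Calling `load_or_build("listings", some_builder)` twice would run `some_builder` only on the first call; the second call just reads `data/listings.tsv`.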
The repository consists of the following files:
- Homework 3 - group #19.ipynb: A Jupyter notebook which provides the following:
  - Search Engine 1 - Conjunctive query. The first search engine evaluates queries based on the `description` and `title` of each document. It uses an inverted index to return the result of the query. The inverted index is a dictionary (key=term_id, value=list of document_ids).
  - Search Engine 2 - Conjunctive query & Ranking score. Given a query, the second search engine returns the top-k documents related to the query, sorted by their _Cosine similarity_ to the query. It uses a second inverted index to return the result of the query. The second inverted index is a dictionary (key=term_id, value=list of tuples (doc_id, dict{key=(term, doc_id), value=tf_idf value})). The scores are then stored and sorted using a heap structure, which is also used to return the top-5 houses.
  - Search Engine 3 - Conjunctive query & a new score
- functions.py: A Python script which provides all the functions used in the Homework 3 - group #19.ipynb notebook.
- Maps_radius.html: A map that shows the houses within the radius the user chose around the location they entered. The code is in the Homework 3 - group #19.ipynb notebook.
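
As a rough illustration of Search Engine 1 and Search Engine 2, the sketch below builds an inverted index, answers a conjunctive query, and ranks the matches by cosine similarity over tf-idf weights using a heap. It is a minimal sketch under assumptions, not the code from functions.py: the toy listings, the whitespace tokenizer, the tf-idf formula (relative term frequency times log(N/df)), and all function names are made up for this example, and the second index is simplified to term_id -> list of (doc_id, tf-idf) pairs.

```python
import heapq
import math
from collections import Counter, defaultdict

# Toy listings (made up) standing in for the `description` and `title` text.
docs = {
    0: "cozy house with garden",
    1: "modern apartment near beach",
    2: "beach house with garden and pool",
}


def tokenize(text):
    """Lowercase and split on whitespace (a placeholder for real preprocessing)."""
    return text.lower().split()


# Vocabulary: term -> term_id.
vocabulary = {}
for text in docs.values():
    for term in tokenize(text):
        vocabulary.setdefault(term, len(vocabulary))

# Search Engine 1 index: term_id -> list of document_ids.
inverted_index = defaultdict(list)
for doc_id, text in docs.items():
    for term in set(tokenize(text)):
        inverted_index[vocabulary[term]].append(doc_id)


def conjunctive_search(query):
    """Return the ids of documents that contain every query term."""
    postings = []
    for term in tokenize(query):
        if term not in vocabulary:
            return set()  # an unknown term cannot match any document
        postings.append(set(inverted_index[vocabulary[term]]))
    return set.intersection(*postings) if postings else set()


def tfidf_vector(tokens):
    """Map term_id -> tf-idf weight for a token list (unknown terms are skipped)."""
    vector = {}
    for term, count in Counter(tokens).items():
        if term in vocabulary:
            term_id = vocabulary[term]
            idf = math.log(len(docs) / len(inverted_index[term_id]))
            vector[term_id] = (count / len(tokens)) * idf
    return vector


def norm(vector):
    """Euclidean length of a sparse vector stored as a dict."""
    return math.sqrt(sum(w * w for w in vector.values()))


# Search Engine 2 index (simplified): term_id -> list of (doc_id, tf-idf) pairs.
tfidf_index = defaultdict(list)
doc_norms = {}
for doc_id, text in docs.items():
    vector = tfidf_vector(tokenize(text))
    doc_norms[doc_id] = norm(vector)
    for term_id, weight in vector.items():
        tfidf_index[term_id].append((doc_id, weight))


def top_k(query, k=5):
    """Rank the conjunctive matches by cosine similarity and keep the k best."""
    matches = conjunctive_search(query)
    query_vector = tfidf_vector(tokenize(query))
    query_norm = norm(query_vector)
    dots = defaultdict(float)  # doc_id -> dot product with the query vector
    for term_id, query_weight in query_vector.items():
        for doc_id, weight in tfidf_index[term_id]:
            if doc_id in matches:
                dots[doc_id] += query_weight * weight
    scored = []
    for doc_id in matches:
        denom = query_norm * doc_norms[doc_id]
        scored.append((dots[doc_id] / denom if denom else 0.0, doc_id))
    # heapq.nlargest selects the k highest (score, doc_id) pairs via a heap.
    return heapq.nlargest(k, scored)


print(conjunctive_search("beach house"))  # {2}
print([doc_id for _, doc_id in top_k("house garden")])  # [0, 2]
```

With this weighting, a term that occurs in every document gets an idf of zero, so a query made only of such terms scores 0 for all matches; the guard on `denom` covers that case.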
Team members:
* Dusica Stepic
* Giulia Maslov
* Daniele Figoli