dusicastepic/ADMThirdHomework

Homework 3 - Find the perfect place to stay in Texas!

The goal of this project was to analyze the text of property listings and build three different search engines on the Airbnb data that, given a query as input, return the houses that match it.

Link to the rendered project: http://nbviewer.jupyter.org/github/dusicastepic/ADMThirdHomework/blob/master/Homework%203%20-%20group%20%2319.ipynb

Instructions for using the project:

1. Download the Airbnb data.
2. Use the files functions.py and Homework 3 - group #19.ipynb.
3. Convert the markdown cells in the .ipynb file to code cells and run them (they need to be executed only once; after that, their results are saved as files and loaded from the working directory).
4. There should be a folder named 'data', where the .tsv files are created and stored.
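
The "execute once, then load from file" workflow in step 3 can be sketched as follows. This is only a minimal illustration, not the notebook's actual code: the helper name `load_or_build`, its parameters, and the file name used in the usage note are hypothetical; only the 'data' folder and the .tsv format come from the instructions above.

```python
import csv
from pathlib import Path

DATA_DIR = Path("data")  # the folder named in step 4


def load_or_build(name, build_rows, data_dir=DATA_DIR):
    """Return the rows of <data_dir>/<name>.tsv, creating the file on first use.

    `build_rows` is a hypothetical callable that performs the expensive
    computation and returns a list of rows (lists of strings).
    """
    path = Path(data_dir) / f"{name}.tsv"
    if not path.exists():
        # First run: compute once and save the result as a .tsv file.
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("w", newline="", encoding="utf-8") as fh:
            csv.writer(fh, delimiter="\t").writerows(build_rows())
    # Every run: load the saved result from disk.
    with path.open(newline="", encoding="utf-8") as fh:
        return list(csv.reader(fh, delimiter="\t"))
```

Calling `load_or_build("documents", build)` twice would run `build` only the first time; the second call reads the saved file.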

The repository consists of the following files:

  1. Homework 3 - group #19.ipynb:

    A Jupyter notebook which provides the following:

    Search Engine 1 - Conjunctive query
         The first search engine evaluates queries against the `description` and `title` of each document and uses an inverted index to return the result of the query. The inverted index is a dictionary (key = term_id, value = list of document_ids).
    
    Search Engine 2 - Conjunctive query & Ranking score
    Given a query, the second search engine returns the top-k documents related to the query,
    sorted by the calculated _cosine similarity_.
    It uses a second inverted index to return the result of the query. The second inverted index is a dictionary (key = term_id, value = list of tuples (doc_id, dict{key = (term, doc_id), value = tf_idf value})).
    The scores are then stored and sorted using a heap, which is also used to return the top-5 houses.
    
    Search Engine 3 - Conjunctive query & a new score
    
  2. functions.py:

    A Python script that provides all the functions used in the Homework 3 - group #19.ipynb notebook.

  3. Maps_radius.html:

    A map that shows the houses within the radius the user chose around the location they entered. The code is in the Homework 3 - group #19.ipynb notebook.
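
The two inverted indexes described for Search Engines 1 and 2 can be sketched on a toy corpus as follows. This is a minimal sketch under stated assumptions, not the code in functions.py: the three listings are made up, tokenization is a plain whitespace split, the tf-idf weighting variant is an assumption, and the second index is simplified to store `(doc_id, tf_idf)` pairs rather than the nested dictionary described above. All function and variable names are hypothetical.

```python
import heapq
import math
from collections import Counter, defaultdict

# Made-up listings standing in for the title + description of each document.
docs = {
    0: "cozy house near downtown austin",
    1: "spacious house with pool",
    2: "cozy studio downtown with pool",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}

# Vocabulary: term -> term_id.
vocab = {}
for words in tokens.values():
    for w in words:
        vocab.setdefault(w, len(vocab))

# First inverted index: term_id -> list of doc_ids.
index1 = defaultdict(list)
for doc_id, words in tokens.items():
    for w in set(words):
        index1[vocab[w]].append(doc_id)


def idf(word):
    """Inverse document frequency of a word that is in the vocabulary."""
    return math.log(len(docs) / len(index1[vocab[word]]))


# Second inverted index (simplified): term_id -> list of (doc_id, tf_idf).
index2 = defaultdict(list)
for doc_id, words in tokens.items():
    for w, count in Counter(words).items():
        index2[vocab[w]].append((doc_id, count / len(words) * idf(w)))

# Per-document tf-idf vectors, derived from the second index.
doc_vectors = defaultdict(dict)
for term_id, postings in index2.items():
    for doc_id, weight in postings:
        doc_vectors[doc_id][term_id] = weight


def conjunctive(query):
    """Return the doc_ids that contain every term of the query."""
    term_ids = [vocab.get(w) for w in query.split()]
    if not term_ids or None in term_ids:
        return set()
    return set.intersection(*(set(index1[t]) for t in term_ids))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def top_k(query, k=5):
    """Rank the conjunctive matches by cosine similarity, keeping the best k."""
    matches = conjunctive(query)
    if not matches:
        return []
    words = query.split()
    query_vec = {
        vocab[w]: count / len(words) * idf(w) for w, count in Counter(words).items()
    }
    scored = [(cosine(query_vec, doc_vectors[d]), d) for d in matches]
    # heapq.nlargest uses a heap internally to select the k highest scores.
    return heapq.nlargest(k, scored)
```

For example, `conjunctive("cozy downtown")` matches documents 0 and 2, and `top_k("cozy downtown")` returns those two documents as `(score, doc_id)` pairs, best match first.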
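
The radius filter behind a map like Maps_radius.html could look roughly like the sketch below. The README does not say how the notebook computes distances or renders the map, so the haversine formula, the function names, and the `lat`/`lon` field names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def houses_within(houses, center, radius_km):
    """Keep the houses whose distance from `center` is at most `radius_km`.

    `houses` is a list of dicts with hypothetical 'lat' and 'lon' keys;
    `center` is a (lat, lon) tuple for the location the user entered.
    """
    lat0, lon0 = center
    return [
        h
        for h in houses
        if haversine_km(lat0, lon0, h["lat"], h["lon"]) <= radius_km
    ]
```

The filtered list can then be handed to whatever mapping library draws the markers.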

Team members: Dusica Stepic, Giulia Maslov, Daniele Figoli

About

Third homework for Algorithmic Methods of Data Mining - Group #19
