Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Binary file removed .DS_Store
Binary file not shown.
124 changes: 124 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@

# Created by https://www.gitignore.io/api/python,visualstudiocode
# Edit at https://www.gitignore.io/?templates=python,visualstudiocode

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# Mr Developer
.mr.developer.cfg
.project
.pydevproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

### VisualStudioCode ###
.vscode/*
!.vscode/settings.json
!.vscode/tasks.json
!.vscode/launch.json
!.vscode/extensions.json

### VisualStudioCode Patch ###
# Ignore all local history of files
.history

# End of https://www.gitignore.io/api/python,visualstudiocode

airbnb_scraper/settings.py
rv-data/
3 changes: 0 additions & 3 deletions .vscode/settings.json

This file was deleted.

26 changes: 4 additions & 22 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,7 @@
# airbnb_scraper :spider:

Spider built with scrapy and ScrapySplash to crawl listings

## Checklist

*This checklist is for personal use and isn't relevant to using the scraper.*

- [x] Spider can successfully parse one page of listings
- [x] Spider can successfully parse mutliple/all pages of designated location
- [x] Spider can take price ranges as arguments (`price_lb` and `price_ub`)
- [x] Spider can take location as argument
Spider built with scrapy and ScrapySplash to crawl Airbnb and HomeAway listings of Luxembourg.
Fork of [airbnb_scraper](https://github.com/kailu3/airbnb-scraper)

## Set up

Expand All @@ -33,18 +25,8 @@ See [scrapy-splash](https://github.com/scrapy-plugins/scrapy-splash) if you run

## Crawling

Run the spider with `scrapy crawl airbnb -o {filename}.json -a city='{cityname}' -a price_lb='{pricelowerbound}' -a price_ub='{priceupperbound}'`

`cityname` refers to a valid city name

`pricelowerbound` refers to a lower bound for price from 0 to 999

`priceupperbound` refers to upper bound for price from 0 to 999. Spider will close if `priceupperbound` is less than
`pricelowerbound`
**Note: Airbnb only returns a maximum of ~300 listings per specific filter (price range). To get more listings, I recommend scraping multiple times using small increments in price and concatenating the datasets.**

If you would like to do multiple scrapes over a wide price range (e.g. 10-spaced intervals from 20 to 990), see `cancun.sh` which I used to crawl a large number listings for Cancún.
TBD

## Acknowledgements

I would like to thank **Ahmed Rafik** for his guidance and teachings.
I would like to thank **kailu3** from whom I forked this project.
Loading