Python script to search Craigslist and export the search results to a .csv file.
## Dependencies
Python packages:
- urllib (standard library)
- requests
- csv (standard library)
- BeautifulSoup (`bs4`)
To search postings for "Honda 2015" in the Albany region:
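```python
from Scraper import GetCraiglistSites, GetCraiglistData

site = GetCraiglistSites()                # site lookup, default continent US
g_data = GetCraiglistData("Honda 2015")   # search term
g_data.seturl(site.forcity("albany"))     # search the Albany site
g_data.saveresults()                      # export results to CSV
```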
The code above exports the results to SearchResult.csv in the current working directory. To print the first 50 results to the console instead of exporting to CSV:
```python
g_data.printresults(50)
```
See runner.py for more sample usage.
## Classes and their methods
### Class `GetCraiglistSites`
Use `GetCraiglistSites` to look up the Craigslist site URL for a city, or to print the list of Craigslist site URLs.
`continent` is an optional parameter that can be set to:
- `US` = United States
- `EU` = Europe
- `ASIA` = Asia
- `Oceana` = Australia/New Zealand
- `LATAM` = Latin America
- `AF` = Africa

The default continent is `US`.
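As an illustration of an optional parameter with a default of `US`, here is a minimal sketch that validates a continent code against the list above. The function name `resolve_continent` is hypothetical, and the README does not show whether the real class takes `continent` positionally or as a keyword.

```python
# Hypothetical helper, not part of this project: validates an optional
# continent code against the codes listed in this README.
CONTINENTS = {
    "US": "United States",
    "EU": "Europe",
    "ASIA": "Asia",
    "Oceana": "Australia/New Zealand",
    "LATAM": "Latin America",
    "AF": "Africa",
}


def resolve_continent(continent: str = "US") -> str:
    """Return the code unchanged if supported; US is used when omitted."""
    if continent not in CONTINENTS:
        raise ValueError(
            f"Unknown continent {continent!r}; expected one of {sorted(CONTINENTS)}"
        )
    return continent


print(resolve_continent())        # US
print(resolve_continent("EU"))    # EU
```

Note that the codes are case-sensitive in this sketch (`Oceana`, not `OCEANA`), matching the spelling listed above.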
`GetCraiglistSites` has the following methods:
- `forcity(<city name>)`: Finds the site URL for the given city name. The only accepted argument is a city-name string.
- `printsitelist()`: Prints all available cities and their site URLs for the configured continent. Takes no arguments.
- `getsuggestions(<city name>)`: Gets/prints suggestions for a partial city name. This method is called internally when `forcity()` cannot find the site URL.
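The relationship between `forcity()` and `getsuggestions()` can be sketched as an exact lookup that falls back to suggestions. This is a minimal sketch assuming a city-to-URL dictionary; `SITES`, `get_suggestions` and `for_city` are hypothetical names, the table is a small illustrative subset, and the prefix-plus-`difflib` matching is one possible technique, not necessarily the one the project uses. The real class prints suggestions, whereas this sketch raises an error that contains them.

```python
import difflib

# Hypothetical subset of a city -> site URL table (the real class builds its own).
SITES = {
    "albany": "https://albany.craigslist.org/",
    "albuquerque": "https://albuquerque.craigslist.org/",
    "boston": "https://boston.craigslist.org/",
    "new york": "https://newyork.craigslist.org/",
}


def get_suggestions(partial: str, sites: dict = SITES) -> list:
    """Suggest city names for a partial or misspelled name."""
    partial = partial.strip().lower()
    # Cities whose name starts with the typed text come first ...
    prefix = [city for city in sites if city.startswith(partial)]
    # ... followed by close fuzzy matches that were not already found.
    fuzzy = difflib.get_close_matches(partial, list(sites), n=5, cutoff=0.6)
    return prefix + [city for city in fuzzy if city not in prefix]


def for_city(name: str, sites: dict = SITES) -> str:
    """Return the site URL for an exact city name, else raise with suggestions."""
    key = name.strip().lower()
    if key in sites:
        return sites[key]
    raise KeyError(f"No site for {name!r}; did you mean {get_suggestions(name, sites)}?")


print(get_suggestions("alb"))   # ['albany', 'albuquerque']
```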
### Class `GetCraiglistData`
Use `GetCraiglistData` to search for items on a Craigslist site. The default site URL is https://newyork.craigslist.org/; this can be changed with the `seturl()` method.

The search term is passed to the constructor as a string. If no search term is specified, the scraper pulls all available items for sale. The Craigslist server usually returns at most 2,500 postings. (Warning: this may take a significant amount of time to process.)
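To show how a search term and site URL could combine into a request URL, here is a minimal sketch using only the standard library. The `search/sss` path and `query` parameter are assumptions based on Craigslist's public search pages and are not taken from this project's code; `build_search_url` is a hypothetical name.

```python
from urllib.parse import urlencode, urljoin

DEFAULT_SITE = "https://newyork.craigslist.org/"


def build_search_url(term: str = "", site: str = DEFAULT_SITE) -> str:
    """Build a for-sale search URL on `site`.

    ASSUMPTION: "search/sss" and "query" mirror Craigslist's public
    search URLs; they are not confirmed by this project.
    """
    base = urljoin(site, "search/sss")
    # No term: return the bare for-sale listing, i.e. all items.
    if not term:
        return base
    # urlencode escapes the term, turning spaces into '+'.
    return base + "?" + urlencode({"query": term})


print(build_search_url("Honda 2015"))
# https://newyork.craigslist.org/search/sss?query=Honda+2015
```

`urljoin` also handles a site URL given without a trailing slash, such as `https://albany.craigslist.org`.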
`GetCraiglistData` has the following methods:
- `seturl()`: Tells the scraper to search Craigslist postings on a specific city site. `seturl()` accepts either a URL passed as a string or the URL returned by the `forcity()` method of the `GetCraiglistSites` class.
- `printresults()`: Prints the search results to the console. Takes an optional `int` argument; if given, the method prints only that many results, starting from the first. If not specified, it prints all results.
- `saveresults()`: Saves the search results in .csv format to current_path/SearchResults.csv.

The CSV file has the following headers:

| Post Title | Posting URL | Price | Location | Posted on | Posted Time | Updated on | Updated Time | Description |
|---|---|---|---|---|---|---|---|---|
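The behavior of `printresults()` and `saveresults()` can be sketched with the standard `csv` module. This is a minimal sketch assuming each result is a dict keyed by the CSV headers above; `print_results`, `save_results` and the row format are hypothetical, not the project's internals.

```python
import csv
import io

# Column names as listed in the table above.
HEADERS = [
    "Post Title", "Posting URL", "Price", "Location",
    "Posted on", "Posted Time", "Updated on", "Updated Time", "Description",
]


def print_results(results, limit=None, out=None):
    """Print the first `limit` results, or all of them when limit is None."""
    # results[:None] is the whole list, so no special case is needed.
    for row in results[:limit]:
        print(row.get("Post Title", ""), row.get("Price", ""),
              row.get("Posting URL", ""), sep=" | ", file=out)


def save_results(results, fileobj):
    """Write results as CSV to an open text file; missing fields stay empty."""
    writer = csv.DictWriter(fileobj, fieldnames=HEADERS)
    writer.writeheader()
    writer.writerows(results)


rows = [{"Post Title": "Honda Civic 2015", "Price": "$9,000",
         "Posting URL": "https://example.org/1", "Location": "Albany"}]
buf = io.StringIO()
save_results(rows, buf)   # values containing commas, like "$9,000", are quoted
```

To write a real file, open it with `newline=""` as the `csv` module documentation recommends, e.g. `with open("SearchResults.csv", "w", newline="", encoding="utf-8") as f: save_results(rows, f)`.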
`GetCraiglistData` properties:
- `getterm`: Retrieves the search string that was passed to the class when the object was created.
- `geturl`: Gets the Craigslist URL that was used when performing the search.
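Because `getterm` and `geturl` are described as properties, they would presumably be read without parentheses (e.g. `g_data.getterm`). The following minimal sketch shows read-only properties of that kind; `SearchSketch` is a hypothetical stand-in class, not `GetCraiglistData` itself.

```python
class SearchSketch:
    """Hypothetical stand-in showing read-only getterm/geturl properties."""

    def __init__(self, term="", url="https://newyork.craigslist.org/"):
        self._term = term
        self._url = url

    def seturl(self, url):
        """Change the site URL used for the search."""
        self._url = url

    @property
    def getterm(self):
        """Search string passed when the object was created."""
        return self._term

    @property
    def geturl(self):
        """Craigslist site URL used for the search."""
        return self._url


s = SearchSketch("Honda 2015")
s.seturl("https://albany.craigslist.org/")
print(s.getterm, s.geturl)   # Honda 2015 https://albany.craigslist.org/
```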
## To do
- Build a GUI
- Build a stand-alone executable
- Extract keywords
- Add additional configuration methods for searching.
- Add threading to optimize performance.
## Notice
This script was developed solely for research and educational purposes. Any commercial use of this script may violate Craigslist's terms of use.