Releases: simonsdave/cloudfeaster
v0.9.9
v0.9.8
Added
- max concurrency per spider property is now part of
the output from Spider.get_validated_metadata() regardless
of whether or not it is specified as part of the explicit spider
metadata declaration
- added paranoia_level to spider metadata
- added max_crawl_time_in_seconds to spider metadata
- ttl_in_seconds now has an upper bound of 86,400 (1 day in seconds)
- max_concurrency now has an upper bound of 25
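The upper bounds listed above can be illustrated with a small check. This is only a sketch: the function name `check_metadata_bounds`, the flat dict layout, and the choice to treat the bounds as inclusive are assumptions made for illustration, not Cloudfeaster's actual validation code.

```python
# Upper bounds come from the release notes; the names and the inclusive
# comparison are assumptions made for this sketch.
MAX_TTL_IN_SECONDS = 86400  # 1 day in seconds
MAX_MAX_CONCURRENCY = 25


def check_metadata_bounds(metadata):
    """Return a list of problems found in a (hypothetical) flat metadata dict."""
    problems = []

    ttl = metadata.get("ttl_in_seconds")
    if ttl is not None and ttl > MAX_TTL_IN_SECONDS:
        problems.append(
            "ttl_in_seconds is %d, upper bound is %d" % (ttl, MAX_TTL_IN_SECONDS)
        )

    concurrency = metadata.get("max_concurrency")
    if concurrency is not None and concurrency > MAX_MAX_CONCURRENCY:
        problems.append(
            "max_concurrency is %d, upper bound is %d"
            % (concurrency, MAX_MAX_CONCURRENCY)
        )

    return problems


if __name__ == "__main__":
    for problem in check_metadata_bounds(
        {"ttl_in_seconds": 90000, "max_concurrency": 30}
    ):
        print(problem)
```

Properties that are absent are skipped rather than reported, on the assumption that defaults are filled in elsewhere.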
Changed
- Selenium 3.7.0 -> 3.8.1
- ChromeDriver 2.33 -> 2.34
- breaking change: ttl -> ttl_in_seconds in spider metadata
Removed
- Nothing
v0.9.7
Added
- added .prep-for-release-master-branch-changes.sh so package version number
is automatically bumped when cutting a release
- .prep-for-release-master-branch-changes.sh now generates Python packages
for PyPI from release branch
Changed
- bug fix in .prep-for-release-release-branch-changes.sh so links in main README.md
work correctly after a release
Removed
- removed cloudfeaster.util module since it wasn't used
v0.9.6
Added
- added --log command line option to spiders.py
- added --samples command line option to spiders.py
- cloudfeaster.webdriver_spider.WebElement now has
an is_element_present() method that functions just
like cloudfeaster.webdriver_spider.Browser
Changed
- per this article headless Chrome
is now available and Cloudfeaster will use it by default, which means we're also
able to remove the need for Xvfb, which is a really
nice simplification and reduction in required crawling resources
- also, because we're removing Xvfb, bin/spiderhost.sh was also removed
- selenium 3.3.3 -> 3.7.0
- requests 2.13.0 -> >=2.18.2
- ndg-httpsclient 0.4.2 -> 0.4.3
- ChromeDriver 2.29 -> 2.33
- simonsdave/cloudfeaster docker image
now uses the latest version of pip
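For readers unfamiliar with headless Chrome, the sketch below shows one common way to start it from Selenium without Xvfb. It is not Cloudfeaster's own code: the helper names and the window size are hypothetical, and the `options=` keyword assumes Selenium 3.8 or later (older releases used `chrome_options=`).

```python
def headless_chrome_args(window_size=(1280, 1024)):
    """Build Chrome command-line flags for a headless session.

    The helper and its default window size are illustrative assumptions.
    """
    width, height = window_size
    return [
        "--headless",
        "--disable-gpu",
        "--window-size=%d,%d" % (width, height),
    ]


def make_headless_browser():
    """Start Chrome headlessly; requires selenium, Chrome and ChromeDriver."""
    # Imported lazily so the flag helper above works without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)


if __name__ == "__main__":
    print(" ".join(headless_chrome_args()))
```

Because Chrome renders off-screen itself in this mode, no virtual display such as Xvfb needs to be running on the crawling host.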
Removed
- removed all code related to Signal FX
v0.9.5
Added
- pypi_spider.py now included with distro in cloudfeaster.samples
Changed
- upgrade selenium 3.0.2 -> 3.3.3
- upgrade chromedriver 2.27 -> 2.29
Removed
- Nothing
v0.9.4
Added
- added _crawl_time to crawl results
Changed
- upgrade to ChromeDriver 2.27 from 2.24
Removed
- Nothing
v0.9.3
Added
- Nothing
Changed
- fix crawl response key errors - _status & _status_code in crawl
response were missing the leading underscore for the following responses
  - SC_CTR_RAISED_EXCEPTION
  - SC_INVALID_CRAWL_RETURN_TYPE
  - SC_CRAWL_RAISED_EXCEPTION
  - SC_SPIDER_NOT_FOUND
Removed
- Nothing
v0.9.2
Added
- Nothing
Changed
- dev env upgraded to docker 1.12
- BREAKING CHANGE = selenium 2.53.6 -> 3.0.1 which resulted in
requiring an upgrade to ChromeDriver 2.24 from 2.22
since it turns out 2.22 does not work with selenium 3.0.1
- spider version # in crawl results now includes hash algo along
with the hash value
- BREAKING CHANGE = the spidering infrastructure augments crawl results
with data such as the time to crawl, spider name & version number, etc - in
order to more easily differentiate crawl results from augmented data, the
top level property names for all augmented data are now prefixed with an underscore - as
an example, below shows the new output from running the PyPI
sample spider
>./pypi_spider.py | jq .
{
  "virtualenv": {
    "count": 46718553,
    "link": "http://pypi-ranking.info/module/virtualenv",
    "rank": 5
  },
  "_status_code": 0,
  "setuptools": {
    "count": 63758431,
    "link": "http://pypi-ranking.info/module/setuptools",
    "rank": 2
  },
  "simplejson": {
    "count": 182739575,
    "link": "http://pypi-ranking.info/module/simplejson",
    "rank": 1
  },
  "requests": {
    "count": 53961784,
    "link": "http://pypi-ranking.info/module/requests",
    "rank": 4
  },
  "six": {
    "count": 54950976,
    "link": "http://pypi-ranking.info/module/six",
    "rank": 3
  },
  "_spider": {
    "version": "sha1:ccb6a042dd11f2f7fb7b9541d4ec888fc908a8ef",
    "name": "__main__.PyPISpider"
  },
  "_crawl_time_in_ms": 4773,
  "_status": "Ok"
}
- upgrade dev env to docker 1.12
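A consumer of crawl results can rely on the underscore prefix to separate what the spider returned from what the infrastructure added. The sketch below works on a shortened copy of the sample output above; the function names `split_crawl_response` and `parse_spider_version` are hypothetical helpers, not part of Cloudfeaster.

```python
import json


def split_crawl_response(response):
    """Split a crawl response into (spider data, augmented data).

    Top-level keys starting with an underscore are treated as augmented data.
    """
    crawl_data = {k: v for k, v in response.items() if not k.startswith("_")}
    augmented = {k: v for k, v in response.items() if k.startswith("_")}
    return crawl_data, augmented


def parse_spider_version(version):
    """Split a version such as 'sha1:<digest>' into (hash algo, hash value)."""
    algo, _, digest = version.partition(":")
    return algo, digest


# Shortened copy of the sample output shown above.
response = json.loads("""
{
  "simplejson": {"count": 182739575, "rank": 1},
  "six": {"count": 54950976, "rank": 3},
  "_status_code": 0,
  "_spider": {
    "version": "sha1:ccb6a042dd11f2f7fb7b9541d4ec888fc908a8ef",
    "name": "__main__.PyPISpider"
  },
  "_crawl_time_in_ms": 4773,
  "_status": "Ok"
}
""")

crawl_data, augmented = split_crawl_response(response)
algo, digest = parse_spider_version(augmented["_spider"]["version"])

print(sorted(crawl_data))
print(sorted(augmented))
print(algo, digest)
```

Only top-level keys are inspected, which matches the statement that the prefix applies to top level property names.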
Removed
- Nothing
v0.9.1
Added
- Nothing
Changed
- fixed bug that was duplicating crawl response data in
CrawlResponseOk
Removed
- Nothing
v0.9.0
Added
- support docker 1.12
Changed
- version bumps for dependencies:
  - chromedriver 2.22
  - selenium 2.53.6
  - requests 2.11.0
  - ndg-httpsclient 0.4.2
- set of simplifications in dev env setup
Removed
- temporary removal of authenticated proxy support