Releases: simonsdave/cloudfeaster

v0.9.9

03 Feb 00:12

Added

  • Nothing

Changed

Removed

  • Nothing

v0.9.8

10 Jan 19:21

Added

  • the per-spider max_concurrency property is now part of
    the output from Spider.get_validated_metadata() regardless
    of whether or not it is specified as part of the explicit spider
    metadata declaration
  • added paranoia_level to spider metadata
  • added max_crawl_time_in_seconds to spider metadata
  • ttl_in_seconds now has an upper bound of 86,400 (1 day in seconds)
  • max_concurrency now has an upper bound of 25
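The new bounds can be illustrated with a minimal sketch. This is not cloudfeaster's Spider.get_validated_metadata(). validate_metadata and default_max_concurrency are hypothetical names, and metadata is modelled as a plain dict. Only the two upper bounds stated in this release are checked. The default for max_concurrency is left to the caller because these notes don't say what value cloudfeaster fills in.

```python
# Upper bounds taken from the v0.9.8 release notes.
MAX_TTL_IN_SECONDS = 86400  # 1 day in seconds
MAX_MAX_CONCURRENCY = 25


def validate_metadata(metadata, default_max_concurrency):
    """Return a validated copy of spider metadata held in a plain dict.

    Hypothetical helper, not cloudfeaster's Spider.get_validated_metadata().
    default_max_concurrency is an assumption supplied by the caller; the
    release notes do not state the default cloudfeaster uses.
    """
    validated = dict(metadata)  # never mutate the caller's declaration

    ttl = validated.get("ttl_in_seconds")
    if ttl is not None and ttl > MAX_TTL_IN_SECONDS:
        raise ValueError("ttl_in_seconds exceeds %d" % MAX_TTL_IN_SECONDS)

    # max_concurrency is always present in the output, declared or not.
    validated.setdefault("max_concurrency", default_max_concurrency)
    if validated["max_concurrency"] > MAX_MAX_CONCURRENCY:
        raise ValueError("max_concurrency exceeds %d" % MAX_MAX_CONCURRENCY)

    return validated
```

For example, validate_metadata({"ttl_in_seconds": 60}, 1) returns a dict that also contains max_concurrency. A ttl_in_seconds of 86,401 raises ValueError.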

Changed

  • Selenium 3.7.0 -> 3.8.1
  • ChromeDriver 2.33 -> 2.34
  • BREAKING CHANGE = spider metadata property ttl renamed to ttl_in_seconds
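Spiders written against earlier releases need their metadata updated for the rename. Below is a minimal migration sketch. migrate_ttl is a hypothetical helper, and it assumes the old ttl value was already expressed in seconds, so only the key changes.

```python
def migrate_ttl(metadata):
    """Return a copy of spider metadata with a pre-0.9.8 ``ttl`` key renamed.

    Hypothetical helper; assumes the old ttl value was already in seconds.
    """
    migrated = dict(metadata)
    if "ttl" in migrated:
        if "ttl_in_seconds" in migrated:
            raise ValueError("metadata declares both ttl and ttl_in_seconds")
        migrated["ttl_in_seconds"] = migrated.pop("ttl")
    return migrated
```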

Removed

  • Nothing

v0.9.7

28 Nov 02:14

Added

  • added .prep-for-release-master-branch-changes.sh so the package version number
    is automatically bumped when cutting a release
  • .prep-for-release-master-branch-changes.sh now generates Python packages
    for PyPI from the release branch
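The script itself is a shell script, and its exact bump rule isn't described here. The following Python sketch assumes a simple patch-level bump, which matches the v0.9.x sequence of these releases. bump_patch is a hypothetical name.

```python
def bump_patch(version):
    """Increment the PATCH part of a MAJOR.MINOR.PATCH version string.

    Hypothetical sketch; the real release script may use a different rule.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    return "%d.%d.%d" % (major, minor, patch + 1)
```

Parsing each part as an integer means "0.9.9" becomes "0.9.10" rather than wrapping around.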

Changed

  • bug fix in .prep-for-release-release-branch-changes.sh so links in the main README.md
    work correctly after a release

Removed

  • removed cloudfeaster.util module since it wasn't used

v0.9.6

25 Nov 22:33

Added

  • added --log command line option to spiders.py
  • added --samples command line option to spiders.py
  • cloudfeaster.webdriver_spider.WebElement now has
    an is_element_present() method that functions just
    like the one on cloudfeaster.webdriver_spider.Browser

Changed

  • per this article, headless Chrome
    is now available and Cloudfeaster will use it by default; this also
    removes the need for Xvfb, which is a really nice simplification and
    reduces the resources required for crawling. Because Xvfb was removed,
    bin/spiderhost.sh was also removed
  • selenium 3.3.3 -> 3.7.0
  • requests 2.13.0 -> >=2.18.2
  • ndg-httpsclient 0.4.2 -> 0.4.3
  • ChromeDriver 2.29 -> 2.33
  • simonsdave/cloudfeaster docker image
    now uses the latest version of pip

Removed

  • removed all code related to SignalFx

v0.9.5

18 Apr 03:07

Added

  • pypi_spider.py now included with distro in cloudfeaster.samples

Changed

  • upgrade selenium 3.0.2 -> 3.3.3
  • upgrade chromedriver 2.27 -> 2.29

Removed

  • Nothing

v0.9.4

05 Mar 22:28

Added

  • added _crawl_time to crawl results

Changed

Removed

  • Nothing

v0.9.3

04 Mar 01:11

Added

  • Nothing

Changed

  • fix crawl response key errors: _status & _status_code in the crawl
    response were missing the leading underscore for the following responses:

    • SC_CTR_RAISED_EXCEPTION
    • SC_INVALID_CRAWL_RETURN_TYPE
    • SC_CRAWL_RAISED_EXCEPTION
    • SC_SPIDER_NOT_FOUND
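A client that may still see responses produced before this fix could normalize the keys itself. Below is a minimal sketch under that assumption. normalize_crawl_response is a hypothetical helper. Spider data also lives at the top level of a crawl response, so this should only be applied to the error responses listed above, where a bare status key cannot be spider data.

```python
# Pre-0.9.3 key -> corrected key, per the fix described above.
LEGACY_KEYS = {"status": "_status", "status_code": "_status_code"}


def normalize_crawl_response(response):
    """Return a copy of an error crawl response using underscore-prefixed keys.

    Hypothetical helper; a legacy key is renamed only when the corrected
    key is not already present.
    """
    normalized = dict(response)
    for old_key, new_key in LEGACY_KEYS.items():
        if old_key in normalized and new_key not in normalized:
            normalized[new_key] = normalized.pop(old_key)
    return normalized
```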

Removed

  • Nothing

v0.9.2

12 Feb 23:08

Added

  • Nothing

Changed

  • dev env upgraded to docker 1.12
  • BREAKING CHANGE = selenium 2.53.6 -> 3.0.1, which required upgrading
    ChromeDriver from 2.22 to 2.24 because ChromeDriver 2.22 does not
    work with selenium 3.0.1
  • spider version # in crawl results now includes the hash algorithm
    along with the hash value
  • BREAKING CHANGE = the spidering infrastructure augments crawl results
    with data such as the time to crawl, spider name & version number, etc.;
    to make it easier to differentiate crawl results from augmented data, the
    top-level property names for all augmented data are now prefixed with an
    underscore. As an example, below is the new output from running the PyPI
    sample spider:
    > ./pypi_spider.py | jq .
    {
      "virtualenv": {
        "count": 46718553,
        "link": "http://pypi-ranking.info/module/virtualenv",
        "rank": 5
      },
      "_status_code": 0,
      "setuptools": {
        "count": 63758431,
        "link": "http://pypi-ranking.info/module/setuptools",
        "rank": 2
      },
      "simplejson": {
        "count": 182739575,
        "link": "http://pypi-ranking.info/module/simplejson",
        "rank": 1
      },
      "requests": {
        "count": 53961784,
        "link": "http://pypi-ranking.info/module/requests",
        "rank": 4
      },
      "six": {
        "count": 54950976,
        "link": "http://pypi-ranking.info/module/six",
        "rank": 3
      },
      "_spider": {
        "version": "sha1:ccb6a042dd11f2f7fb7b9541d4ec888fc908a8ef",
        "name": "__main__.PyPISpider"
      },
      "_crawl_time_in_ms": 4773,
      "_status": "Ok"
    }
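The underscore convention makes both sides easy to sketch. The infrastructure can time a crawl and add prefixed keys, and a consumer can split the result again by key prefix. augment, split_crawl_result and parse_spider_version are hypothetical names, not cloudfeaster APIs. The key names _spider and _crawl_time_in_ms and the algo:hexdigest version format are taken from the sample output above. The timing approach is an assumption.

```python
import time


def augment(spider_name, spider_version, crawl):
    """Run ``crawl`` and add underscore-prefixed infrastructure data.

    Hypothetical sketch; ``crawl`` is any zero-argument callable that
    returns the spider's own results as a dict.
    """
    start = time.monotonic()
    result = dict(crawl())
    elapsed_ms = int((time.monotonic() - start) * 1000)
    result["_spider"] = {"name": spider_name, "version": spider_version}
    result["_crawl_time_in_ms"] = elapsed_ms
    return result


def split_crawl_result(result):
    """Separate spider data from underscore-prefixed augmented data."""
    data = {k: v for k, v in result.items() if not k.startswith("_")}
    augmented = {k: v for k, v in result.items() if k.startswith("_")}
    return data, augmented


def parse_spider_version(version):
    """Split an ``algo:hexdigest`` spider version into its two parts."""
    algorithm, _, digest = version.partition(":")
    if not algorithm or not digest:
        raise ValueError("expected 'algorithm:digest', got %r" % version)
    return algorithm, digest
```

Applied to the sample above, split_crawl_result() would put the five package entries in the first dict and the four underscore-prefixed keys in the second.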

Removed

  • Nothing

v0.9.1

17 Aug 13:12

Added

  • Nothing

Changed

  • fixed bug that was duplicating crawl response data in CrawlResponseOk

Removed

  • Nothing

v0.9.0

16 Aug 20:06

Added

  • support docker 1.12

Changed

  • version bumps for dependencies:
    • chromedriver 2.22
    • selenium 2.53.6
    • requests 2.11.0
    • ndg-httpsclient 0.4.2
  • a set of simplifications to the dev env setup

Removed

  • temporary removal of authenticated proxy support