Releases: jamesturk/spatula
v0.9.1
Changelog
!!! note
spatula 1.0 should be ready in a few months, providing a more stable interface to build upon. Until then, interfaces may change between releases.
0.9.1 - 2024-07-10
- add support for new versions of lxml and Python
0.9.0 - 2022-02-10
- add `Page.accept_response` method that can be overridden to trigger custom retry logic
- add preliminary `spatula.config` for setting/overriding global defaults (this feature is not yet considered stable and will likely be modified before 1.0)
0.8.10 - 2022-01-31
- update click dependency
0.8.9 - 2021-12-14
- fix for `--rmdir` not recreating directory
0.8.8 - 2021-12-09
- add `--rmdir` flag to `spatula scrape`
0.8.7 - 2021-11-09
- add support for raising `SkipItem` from a detail page to resume processing without yielding data from the page
0.8.6 - 2021-10-13
- add `timeout` argument to URL source
- add `--subpages` argument to `spatula test` which runs similarly to `spatula scrape` but writes output to the terminal
0.8.5 - 2021-08-09
- add `verify` argument to URL source
- improve messaging when using `spatula test`
- add `--dump` flag to `spatula scrape` to control output format
0.8.4 - 2021-07-15
- `self.skip` is deprecated in favor of raising `SkipItem`
- add experimental support for module arguments to `scrape` command
0.8.3 - 2021-06-23
- fix bug where default headers were cleared by default
- update to scrapelib 2.0.6 which contains a bugfix for a redirect follow bug
0.8.2 - 2021-06-22
- fix `spatula --version` to report correct version
- allow `--data` command line flags to override `example_input` values
- add caching of dependencies
- fix pagination on non-list pages
- add advanced documentation & anatomy of a scrape
0.8.1 - 2021-06-17
- remove undocumented `page_to_items` function
- added `Page.do_scrape` to programmatically get all items from a scrape
- added `--source` parameter to scout & scrape commands
0.8.0 - 2021-06-15
- remove undocumented `Workflow`
- allow using `Page` instances (as opposed to just the type) for scout & scrape
- add check for `get_filename` on output classes to override default filename
- improved automatic `pydantic` support
- add --timeout, --no-verify, --retries, --retry-wait options
- add --fastmode option to use local cache
- fix all CLI commands to obey various scraper options
0.7.1 - 2021-06-14
- remove undocumented default behavior for `get_source_from_input`
- major documentation overhaul
- fixes for scout scrape when working with raw data returns
0.7.0 - 2021-06-04
- add `spatula scout` command
- make error messages a bit more clear
- improvements to documentation
- added more CLI options to control verbosity, user agent, etc.
- if module cannot be found, search current directory
0.6.0 - 2021-04-12
- add full typing to library
- small bugfixes
0.5.0 - 2021-02-04
- add `ExcelListPage`
- improve `Page.logger` and CLI output
- move to simpler `Workflow` class
- `spatula scrape` can now take the name of a page, and will use the default `Workflow`
- bugfix: inconsistent name for `process_error_response`
0.4.1 - 2021-02-01
- bugfix: dependencies are instantiated from parent page input
0.4.0 - 2021-02-01
- restore Python 3.7 compatibility
- add behavior to handle returning additional `Page` subclasses to continue scraping
- add default behavior when `Page.input` has a `url` attribute.
- add `PdfPage`
- add `page_to_items` helper
- add `Page.example_input` and `Page.example_source` for test command
- add `Page.logger` for logging
- allow use of `dataclasses` in addition to `attrs` as input objects
- improve output of HTML elements
- bugfix: not specifying a page processor on workflow is no longer an error
0.3.0 - 2021-01-18
- first documented major release