- Add `encoding` config option (see All available config options)
- Validate url before processing a request (Base#request_to)
- Fix console command bug (see issue 21)
- In the project template, set Ruby version as >= 2.5 (previously hard-coded to 2.5.1)
- Remove .ruby-version file (was hard-coded to 2.5.1) from the project template
- Fixed bug in Base#save_to
- Remove persistence database feature (because it's slow and makes things complicated)
- Add `--include` and `--exclude` options to CLI#runner
- Add Base `#create_browser` method to easily create additional browser instances
- Add Capybara::Session `#scroll_to_bottom`
- Add skip_on_failure feature to `retry_request_errors` config option
- Add info about `add_event` method to the README
- Improve Runner
- Fix time helper in schedule.rb
- Add proxy validation to browser builders
- Allow passing different arguments to the `Base.parse` method
- Add ability to add an array of values to the storage (`Base::Storage#add`)
- Add `exception_on_fail` option to `Base.crawl!`
- Add ability to pass a request hash to `start_urls` (you can use an array of hashes as well, like: `@start_urls = [{ url: "https://example.com/cat?id=1", data: { category: "First Category" } }]`)
- Implement `skip_request_errors` config feature. Added Handle request errors chapter to the README.
- Add option to choose response type for `Session#current_response` (`:html` default, or `:json`)
- Add option to provide custom chrome and chromedriver paths
- Refactor `Runner`
- Fix `Base#Saver` (automatically create file if it doesn't exist in case of persistence database)
- Do not deep merge config's `headers:` option
`browser` config option deprecated. All sub-options inside `browser` should now be placed directly into the `@config` hash, without the `browser` parent key. Example:

    # Was:
    @config = {
      browser: {
        retry_request_errors: [Net::ReadTimeout],
        restart_if: {
          memory_limit: 350_000,
          requests_limit: 100
        },
        before_request: {
          change_proxy: true,
          change_user_agent: true,
          clear_cookies: true,
          clear_and_set_cookies: true,
          delay: 1..3
        }
      }
    }

    # Now:
    @config = {
      retry_request_errors: [Net::ReadTimeout],
      restart_if: {
        memory_limit: 350_000,
        requests_limit: 100
      },
      before_request: {
        change_proxy: true,
        change_user_agent: true,
        clear_cookies: true,
        clear_and_set_cookies: true,
        delay: 1..3
      }
    }

- Add `storage` object with additional methods and persistence database feature
- Add events feature to `run_info`
- Add `skip_duplicate_requests` config option to automatically skip already visited urls when using `request_to`
- Add `extensions` config option to allow injecting JS code into the browser (supported only by poltergeist_phantomjs engine)
- Add Capybara::Session#within_new_window_by method
- Add the last backtrace line to pipeline output when item was dropped
- Do not destroy driver if it doesn't exist (for Base.parse! method)
- Handle possible Net::ReadTimeout error while trying to #quit driver
- Fix Mechanize::Driver#proxy (there was a bug while using proxy for mechanize engine without authorization)
- Fix request retry logic
- Add missing `logger` method to pipeline
- Fix `set_proxy` in Mechanize and Poltergeist builders
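
For configs still written in the deprecated `browser:` layout noted in this changelog, the move to the flat layout can be sketched in plain Ruby. This is only an illustration: `flatten_browser_config` is a hypothetical helper, not part of the library, and letting an existing top-level key win over a nested one with the same name is an assumption rather than documented behavior.

```ruby
# Hypothetical helper (not part of the library): lift the sub-options of a
# deprecated :browser key to the top level of a config hash.
# Returns a new hash; the original config is left untouched.
def flatten_browser_config(config)
  nested = config[:browser] || {}
  top_level = config.reject { |key, _value| key == :browser }
  # Assumption: on a key collision the existing top-level value is kept.
  nested.merge(top_level)
end

# Sample config in the old layout, using option names from this changelog.
old_config = {
  skip_duplicate_requests: true,
  browser: {
    restart_if: { memory_limit: 350_000, requests_limit: 100 },
    before_request: { delay: 1..3 }
  }
}

new_config = flatten_browser_config(old_config)
p new_config
```

The result can be assigned to `@config` in place of the old hash. The merge is shallow, so nested hashes such as `restart_if` are carried over unchanged.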