- New features
- Added support for configuring the jsoup parser to use.
- Breaking changes
- Dropped support for Scala 2.12.
- Bug fixes
- Reverted Scala 3 version back to 3.3 (LTS) to fix binary compatibility issues introduced in the last version.
Maintenance update to update dependency versions.
Maintenance update to update dependency versions.
Maintenance update to update dependency versions.
- New features
- Added a new `ownText` method to `Element`.
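As a rough illustration of the `ownText` addition, the sketch below contrasts it with `text` on a small, made-up HTML fragment. It assumes scala-scraper is on the classpath and uses the jsoup-backed browser; the sample markup and value names are hypothetical.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical sample markup, used only for illustration.
val ownTextBrowser = JsoupBrowser()
val ownTextDoc = ownTextBrowser.parseString("<p>Hello <b>world</b></p>")
val paragraphElem = ownTextDoc >> element("p")

// `text` includes the text of descendant elements.
val fullText = paragraphElem.text

// `ownText` only covers text nodes that are direct children of the element.
// Trimmed here so the result does not depend on whitespace handling.
val directText = paragraphElem.ownText.trim
```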
- Breaking changes
- Upgraded htmlunit to 3.x.
Support for Scala 3.1 was introduced.
- New features
- Added a new `withProxy` method to `Browser`, allowing users to configure the proxy regardless of the browser implementation being used.
- Breaking changes
- Removed all previously deprecated code.
Support for Scala 2.11 was dropped.
Support for Scala 2.13 was introduced.
- Deprecations
- `ProxyUtils` was deprecated in favor of setting proxy servers per `Browser` instance (see below);
- New features
- `JsoupBrowser` and `HtmlUnitBrowser` can now be created with proxy settings that are applied only to the created instance, superseding the usage of `ProxyUtils`;
- Added a new `table` content extractor allowing the extraction of cells from HTML tables.
- Breaking changes
- Extracting using a CSS query string as extractor will now extract elements instead of text. This allows easier chaining of extractors and CSS selectors and fits the current extractor model better. The old behavior can be recovered by wrapping the CSS query string in the `texts` content extractor, e.g. `doc >> texts("myQuery")`;
- `HtmlExtractor`, `HtmlValidator` and `ElementQuery` now have an additional type parameter for the type of `Element` they work on. If you have custom instances of one of those classes, filling the missing parameter with `Element` (which is a superclass of all elements) should be enough for them to work with all source code using scala-scraper 1.x;
- Methods for loading extractors and validators from a config were extracted to a separate module. In order to use them, users must add `scala-scraper-config` to their SBT dependencies and import `net.ruippeixotog.scalascraper.config.dsl.DSL._`;
- The implicit conversion of `Validated`/`Either` to a `RightProjection` in order to expose `foreach`, `map` and `flatMap` in for comprehensions was moved to a separate object that is not imported together with the DSL. Either upgrade to Scala 2.12 (in which `Either` is already right-biased) or import the new `net.ruippeixotog.scalascraper.util.EitherRightBias` support object;
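To make the first breaking change concrete, here is a minimal sketch of the difference between extracting elements and extracting text. It assumes scala-scraper 2.x or later is on the classpath; the HTML fragment, the `.item` selector and the value names are made up for illustration.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical sample markup, used only for illustration.
val listBrowser = JsoupBrowser()
val listDoc = listBrowser.parseString(
  """<ul><li class="item">one</li><li class="item">two</li></ul>""")

// Element extractors yield elements, which can be queried further.
val itemElems = listDoc >> elementList(".item")

// Wrapping the query in `texts` yields the text of each match,
// which is what a bare CSS query string returned in 1.x.
val itemTexts = (listDoc >> texts(".item")).toList
```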
- Deprecations
- `SimpleExtractor` and `SimpleValidator` are now deprecated. The classes remain available for the time being, but DSL methods that returned those classes now return only `HtmlExtractor` and `HtmlValidator` instances;
- The `Validated` type alias is now deprecated. Users should now use `Either`, `Right` and `Left` directly;
- The `asDate` content parser was deprecated in favor of `asLocalDate` and `asDateTime`;
- The DSL validation operator `~/~` was renamed to `>/~` in order to have the same precedence as the extraction operators `>>` and `>?>`;
- The `and` DSL operator is deprecated and will be removed in future versions;
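The replacements named in these deprecations might be used roughly as follows. This is a sketch under the assumption that scala-scraper 2.x or later is on the classpath; the page content, the `#date` selector and the date format are invented for the example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._
import net.ruippeixotog.scalascraper.dsl.DSL.Parse._

// Hypothetical sample page, used only for illustration.
val pageBrowser = JsoupBrowser()
val pageDoc = pageBrowser.parseString(
  """<html><head><title>Test page</title></head>
    |<body><span id="date">2014-10-26</span></body></html>""".stripMargin)

// `asLocalDate` is one of the two parsers that replace the deprecated `asDate`.
val parsedDate = pageDoc >> extractor("#date", text, asLocalDate("yyyy-MM-dd"))

// `>/~` replaces `~/~`; the result is a plain `Either`,
// not the deprecated `Validated` alias.
val validatedDoc = pageDoc >/~ validator(text("title"))(_ == "Test page")
```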
- New features
- The concrete type of the models in scala-scraper is now passed down from the `Browser` to `Element` instances extracted from documents. This allows users to use features unique to each browser (such as modifying or interacting with elements) while still using the scala-scraper DSL to extract and query them;
- `HtmlExtractor[E, A]` is now a proper instance of `ElementQuery[E] => A` and has `map` and `mapQuery` methods to map the extraction results and the preceding query, respectively;
- Content extractors, which were previously just functions, are now full-fledged `HtmlExtractor` instances and can be used by themselves, e.g. `doc >> elements`, `doc >> elementList("myQuery") >> formData`;
- A new `PolyHtmlExtractor` class was created, allowing the implementation of extractors whose return type depends on the type of the element or document being extracted;
- Overall code cleanup and simplification of some concepts.
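As a small sketch of the `map` method mentioned above, the snippet below transforms the result of a content extractor before it is applied to a document. It assumes scala-scraper 2.x or later on the classpath and that `map` can be called directly on the extractor returned by `text("title")`; the sample page and value names are hypothetical.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical sample page, used only for illustration.
val mapBrowser = JsoupBrowser()
val mapDoc = mapBrowser.parseString(
  "<html><head><title>Test page</title></head><body></body></html>")

// `map` post-processes the extracted value; here the title text
// is turned into its length before being returned.
val titleLength = mapDoc >> text("title").map(_.length)
```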
- Bug fixes
- Fix type parameter usage in three-arg `>?>` DSL operator.
- New features
- Support for Scala 2.12;
- New method `closeAll` in `HtmlUnitBrowser`, for closing opened windows;
- New model `Node` representing a DOM node, which in this library is either an `ElementNode` or a `TextNode`;
- New methods `childNodes` and `siblingNodes` in `Element`.
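The `Node` model and `childNodes` could be used along these lines. The sketch assumes scala-scraper is on the classpath and that `ElementNode` and `TextNode` live in the `model` package and can be pattern matched on their single field; the markup and value names are made up.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._
import net.ruippeixotog.scalascraper.model.{ElementNode, TextNode}

// Hypothetical sample markup, used only for illustration.
val nodeBrowser = JsoupBrowser()
val nodeDoc = nodeBrowser.parseString("<p>Hello <b>world</b>!</p>")
val nodeParent = nodeDoc >> element("p")

// `childNodes` returns both text and element children, in document order.
val nodeKinds = nodeParent.childNodes.toList.map {
  case TextNode(content) => s"text:${content.trim}"
  case ElementNode(elem) => s"element:${elem.tagName}"
}
```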
- New features
- New methods `clearCookies`, `parseInputStream` and `parseResource` in `Browser`;
- New methods `hasAttr` and `siblings` in `Element`;
- Support for SOCKS proxies.
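A brief sketch of some of these methods in use: it parses a document from an in-memory stream and then applies `hasAttr` and `siblings`. It assumes scala-scraper is on the classpath and that `parseInputStream` accepts the charset name as a string; the markup and value names are hypothetical.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets

import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Hypothetical sample markup, used only for illustration.
val linksHtml = """<div><a href="/home">Home</a><a>About</a></div>"""
val linksStream = new ByteArrayInputStream(linksHtml.getBytes(StandardCharsets.UTF_8))

val streamBrowser = JsoupBrowser()
val streamDoc = streamBrowser.parseInputStream(linksStream, "UTF-8")
val anchors = streamDoc >> elementList("a")

// Keep only the anchors that actually carry an href attribute.
val linkedLabels = anchors.filter(_.hasAttr("href")).map(_.text)

// `siblings` lists the other elements sharing the same parent.
val firstAnchorSiblings = anchors.head.siblings.size
```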
- Bug fixes
- Correct handling of missing name and value attributes in the `formData` extractor.
First stable version.