|
6 | 6 |
|
7 | 7 | #### New features |
8 | 8 |
|
| 9 | +- **Browser processor:** Loads fetched pages in a local browser (Firefox/ChromeDriver), records all browser requests, |
| 10 | + and runs pluggable behaviors (e.g. scrolling, link extraction). [#653](https://github.com/internetarchive/heritrix3/pull/653) |
| 11 | + - Uses the [WebDriver BiDi protocol](https://www.w3.org/TR/webdriver-bidi/) for browser automation. |
| 12 | + - The recording proxy is built on Jetty's ProxyHandler and the FetchHTTP2 module. |
| 13 | + - **Status:** Working for small crawls but needs more robust error handling (browser crashes, resource limits). |
| 14 | + |
9 | 15 | - **Basic web auth:** You can now switch the web interface from Digest authentication to Basic authentication |
10 | 16 | with the `--web-auth basic` command-line option. This is useful when running Heritrix behind a reverse proxy that |
11 | | - adds external authentication. |
| 17 | + adds external authentication. [#654](https://github.com/internetarchive/heritrix3/pull/654) |
12 | 18 |
|
13 | 19 | - **Robots.txt wildcards:** The `*` and `$` wildcard rules from RFC 9309 are now supported. |
14 | 20 | [#656](https://github.com/internetarchive/heritrix3/pull/656) |
|
17 | 23 |
|
18 | 24 | - **Code editor:** The configuration editor and script console were upgraded to CodeMirror 6. This resolves some browser |
19 | 25 | incompatibilities, allowing CodeMirror’s own find function to be re-enabled for reliable text search of content far |
20 | | - outside the viewport. |
| 26 | + outside the viewport. [#651](https://github.com/internetarchive/heritrix3/pull/651) |
21 | 27 |
|
22 | 28 | #### Removals |
23 | 29 |
|
24 | 30 | - **Removed Apache HttpClient 3**: If you have custom Heritrix modules you may need to update the following |
25 | | - class references in your code: |
| 31 | + class references in your code: |
26 | 32 |
|
27 | 33 | | Removed | Replacement | |
28 | 34 | |-----------------------------------------------------------|--------------------------------------| |
29 | 35 | | `org.apache.commons.httpclient.URIException` | `org.archive.url.URIException` | |
30 | 36 | | `org.apache.commons.httpclient.Header` | `org.archive.format.http.HttpHeader` | |
31 | 37 |
|
32 | 38 | Note that Apache HttpClient 4 (`org.apache.http`) was not removed. |
| 39 | + [#652](https://github.com/internetarchive/heritrix3/pull/652) |
33 | 40 |
|
34 | 41 | #### Dependency Upgrades |
35 | 42 |
|
36 | 43 | - **codemirror**: 2.23 → 6 |
37 | | -- **easmock**: 5.5.0 → removed |
| 44 | +- **easymock**: 5.5.0 → removed |
38 | 45 | - **junit**: 5.12.2 → 5.13.1 |
39 | 46 | - **spring**: 6.2.6 → 6.2.7 |
40 | 47 | - **webarchive-commons**: 1.3.0 → 2.0.1 |
|
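
The robots.txt wildcard entry above names the `*` and `$` rules from RFC 9309 without showing how they behave. The following is a minimal sketch of those matching semantics, not the code from #656: the class `RobotsPatternSketch` and its `matches` method are hypothetical names, and the sketch assumes the rule pattern and the URL path have already been normalized (percent-encoding is not handled). A rule is treated as a prefix match, `*` matches any run of characters, and a trailing `$` anchors the rule to the end of the path.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only; not Heritrix's actual implementation.
public class RobotsPatternSketch {

    /** Returns true if the URL path is matched by the robots.txt rule pattern. */
    public static boolean matches(String rulePattern, String path) {
        // A trailing '$' means the rule must match through the end of the path.
        boolean anchored = rulePattern.endsWith("$");
        if (anchored) {
            rulePattern = rulePattern.substring(0, rulePattern.length() - 1);
        }

        // Quote the literal pieces and join them with ".*" so that each '*'
        // in the rule matches zero or more characters.
        String regex = Arrays.stream(rulePattern.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));

        // Without '$', a rule is a prefix match, so allow any suffix.
        if (!anchored) {
            regex += ".*";
        }
        return Pattern.compile(regex, Pattern.DOTALL).matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("/fish", "/fish/salmon"));     // plain prefix rule
        System.out.println(matches("/fish*", "/fishing"));        // '*' wildcard
        System.out.println(matches("/*.php$", "/index.php"));     // '$' end anchor
        System.out.println(matches("/*.php$", "/index.php?x=1")); // anchor rejects suffix
    }
}
```

Translating the rule to a regular expression keeps the sketch short; a production matcher would more likely scan the pattern directly to avoid compiling a regex per rule.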