Commit ef26826

Merge branch 'main' into fulldom-env
* main:
  Release v9.1.1
  Improve changelog
  Add test when mime type contains charset
  Fix PDF extraction when MIME type contains charset
2 parents 08e0359 + 6cea3ae commit ef26826

5 files changed: 18 additions & 4 deletions

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -12,6 +12,16 @@ All changes that impact users of this module are documented in this file, in the
 - Add debugging options to disable headless mode for visual troubleshooting during development; set `FETCHER_NO_HEADLESS=1` to show browser window
 - Add sandbox control for improved compatibility with Docker and containerized environments; set `FETCHER_NO_SANDBOX=1` when running in containers
 
+## 9.1.1 - 2025-10-07
+
+_Full changeset and discussions: [#1198](https://github.com/OpenTermsArchive/engine/pull/1198)._
+
+> Development of this release was supported by the [French Ministry for Foreign Affairs](https://www.diplomatie.gouv.fr/fr/politique-etrangere-de-la-france/diplomatie-numerique/) through its ministerial [State Startups incubator](https://beta.gouv.fr/startups/open-terms-archive.html) under the aegis of the Ambassador for Digital Affairs.
+
+### Fixed
+
+- Increase robustness of PDF content type detection
+
 ## 9.1.0 - 2025-10-01
 
 _Full changeset and discussions: [#1197](https://github.com/OpenTermsArchive/engine/pull/1197)._
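The two context entries in this hunk describe opt-in environment flags for the browser-based fetcher. As a minimal sketch of how such flags could map onto browser launch options, the block below assumes a Puppeteer-style options object (`headless`, `args`) and assumes a flag counts as set only when its value is exactly `1`. `launchOptionsFromEnv` is a hypothetical helper for illustration, not the engine's actual code.

```javascript
// Hypothetical sketch: translate the FETCHER_NO_HEADLESS / FETCHER_NO_SANDBOX
// flags from the changelog into Puppeteer-style launch options.
// Assumption: a flag is "on" only when its value is the string '1'.
function launchOptionsFromEnv(env = process.env) {
  const noHeadless = env.FETCHER_NO_HEADLESS === '1';
  const noSandbox = env.FETCHER_NO_SANDBOX === '1';

  return {
    // Disabling headless mode shows the browser window for troubleshooting.
    headless: !noHeadless,
    // Chromium's sandbox often cannot start inside unprivileged containers,
    // so these Chromium command-line switches turn it off.
    args: noSandbox ? ['--no-sandbox', '--disable-setuid-sandbox'] : [],
  };
}

console.log(launchOptionsFromEnv({ FETCHER_NO_SANDBOX: '1' }));
```

Keeping the flags opt-in means the default remains a headless, sandboxed browser, and only environments that need it (local debugging, containers) change behavior.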

package-lock.json

Lines changed: 2 additions & 2 deletions
Generated file; diff not rendered by default.

package.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "name": "@opentermsarchive/engine",
-  "version": "9.1.0",
+  "version": "9.1.1",
   "description": "Tracks and makes visible changes to the terms of online services",
   "homepage": "https://opentermsarchive.org",
   "bugs": {

src/archivist/extract/index.js

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ export { ExtractDocumentError } from './errors.js';
  */
 export default async function extract(sourceDocument) {
   try {
-    if (sourceDocument.mimeType == mime.getType('pdf')) {
+    if (mime.getExtension(sourceDocument.mimeType) == 'pdf') {
       return await extractFromPDF(sourceDocument);
     }
 
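The fix replaces an exact string comparison with an extension lookup. A `Content-Type` value such as `application/pdf; charset=utf-8` carries a parameter after the `;`, so comparing it verbatim with `mime.getType('pdf')` (which yields `application/pdf`) fails, and the PDF was not routed to `extractFromPDF`. The new test indicates that `mime.getExtension()` disregards such parameters. The sketch below reproduces the idea without the `mime` dependency; `essenceOf` is a hypothetical helper, not part of the engine or of `mime`, and it assumes parameters always follow the first `;`.

```javascript
// Hypothetical helper: reduce a Content-Type value to its bare
// "type/subtype" form, dropping parameters such as "charset=utf-8".
function essenceOf(contentType) {
  if (typeof contentType !== 'string') return null;
  // Everything after the first ";" is treated as parameters.
  const [essence] = contentType.split(';');
  // Media type and subtype names are case-insensitive, so normalize case.
  const normalized = essence.trim().toLowerCase();
  return normalized || null;
}

const PDF_TYPE = 'application/pdf';
const headerValue = 'application/pdf; charset=utf-8';

// Mirrors the previous check: exact comparison, broken by the parameter.
const oldCheck = headerValue === PDF_TYPE; // false
// Parameter-insensitive check, in the spirit of the fix.
const newCheck = essenceOf(headerValue) === PDF_TYPE; // true

console.log({ oldCheck, newCheck });
```

Comparing only the type and subtype keeps the detection stable whatever parameters a server appends, which is the robustness the 9.1.1 changelog entry refers to.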
src/archivist/extract/index.test.js

Lines changed: 4 additions & 0 deletions
@@ -534,6 +534,10 @@ describe('Extract', () => {
       expect(await extract({ content: pdfContent, mimeType: mime.getType('pdf') })).to.equal(expectedExtractedContent);
     });
 
+    it('extracts content from PDF when MIME type includes charset parameter', async () => {
+      expect(await extract({ content: pdfContent, mimeType: 'application/pdf; charset=utf-8' })).to.equal(expectedExtractedContent);
+    });
+
     context('when PDF contains no text', () => {
       it('throws an ExtractDocumentError error', async () => {
         await expect(extract({ content: await fs.readFile(path.resolve(__dirname, '../../../test/fixtures/termsWithoutText.pdf')), mimeType: mime.getType('pdf') })).to.be.rejectedWith(ExtractDocumentError, /contains no text/);

0 commit comments
