Releases: openzim/sotoki

3.0.2

22 Dec 10:15
e0f6359

Fixed

  • Fix extraction of header on non-stackoverflow and non-stackexchange domains (#385)

3.0.1

18 Dec 11:22
d2cb519

  • Fix scraper looping over failing URLs forever (#379)
  • Fix extraction of header content for stackoverflow domains (#378)
  • Rewrite CSS files (#383)

3.0.0

12 Dec 09:32
d6ca6cc

Changed

  • Upgrade to Python 3.14 + Debian bookworm (#351)
  • BREAKING: Replace usage of multiple --tag with single CSV --tags for Zimfarm integration (#351)
  • Enhance logic to download images to better respect upstream servers (#367 and #369)
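
As an illustration of the switch from repeated --tag options to a single CSV --tags value, here is a minimal sketch using argparse. The parse_tags helper, the program name, and the sample tag values are assumptions made for the example; this is not sotoki's actual argument parser.

```python
import argparse


def parse_tags(value: str) -> list[str]:
    """Split a comma-separated string into stripped, non-empty tags (hypothetical helper)."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


# Hypothetical parser: only the --tags option name comes from the release notes.
parser = argparse.ArgumentParser(prog="sotoki-sketch")
parser.add_argument("--tags", type=parse_tags, default=[], help="comma-separated list of tags")

args = parser.parse_args(["--tags", "stackexchange, offline,,qa"])
print(args.tags)  # ['stackexchange', 'offline', 'qa']
```

A single CSV value is easier to pass through a task scheduler that models each option as one key/value pair, which is presumably why the change helps Zimfarm integration.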

Fixed

  • Docker image does not build anymore on main branch (#346)
  • Scraper fails while processing posts tags (#338)
  • Posts tags can be split by | or by >< characters (#356)
  • Fix numbers passed to Users progresser (#353)
  • Fix display of comments randomly not present (#361)
  • Add SVG icons for all left menu entries (#365)
  • Fetch, rewrite and include real online page header (including potential SVG image) (#241)
  • Simplify mobile topbar to better reuse styles already fetched (#364)
  • Fix display of user infos in posts (#362)
  • Fix display of post vote counts (#355)
  • Add support for SVG images (#364)
  • Fix computation of article excerpt to produce valid safe HTML (#320)
  • Adjust computation of linked questions and remove related ones (#263)
  • Scraper keeps re-uploading same images (#373)
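
The tag-splitting fix (#356) can be pictured with a short sketch that accepts both delimiter styles. The function name and the sample strings are assumptions; the actual parsing code in sotoki may differ.

```python
import re


def split_post_tags(raw: str) -> list[str]:
    """Split a raw tags attribute written as '<a><b>' or '|a|b|' (hypothetical helper)."""
    # Any run of '|', '<' or '>' is treated as a separator; empty pieces are dropped.
    return [tag for tag in re.split(r"[|<>]+", raw) if tag]


print(split_post_tags("<python><django>"))  # ['python', 'django']
print(split_post_tags("|python|django|"))   # ['python', 'django']
```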

2.2.1

26 Jun 07:53
10af887

Fixed

  • Reduce number of workers and add backoff on 429 and non-HTTP (network, ...) errors
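
A rough sketch of what backoff on 429 and network errors can look like follows. The fetch callable, the attempt count, and the delay values are all assumptions chosen for illustration; they are not taken from sotoki's code.

```python
import time


def fetch_with_backoff(fetch, url, attempts=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call fetch(url) until it succeeds, waiting exponentially longer between tries.

    `fetch` is a hypothetical callable returning (status, body) and raising
    OSError on network failures. HTTP 429 and OSError are retried; any other
    status is returned to the caller as-is.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            status, body = fetch(url)
        except OSError as exc:
            last_error = exc
        else:
            if status != 429:
                return status, body
            last_error = RuntimeError(f"HTTP 429 for {url}")
        # Sleep before the next try, but not after the final one.
        if attempt < attempts - 1:
            sleep(min(cap, base * 2 ** attempt))
    # Bounded attempts: give up instead of retrying forever.
    raise last_error
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.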

Changed

  • Changed default --redis-url behavior (#333)
    • It now uses the REDIS_URL environment variable if set, falling back to redis://localhost:6379.
    • Docker image sets REDIS_URL to unix:///var/run/redis.sock by default.
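
The new --redis-url default can be summarized by the sketch below. The function name is hypothetical, and treating an empty REDIS_URL as unset is an assumption, not something the release notes state.

```python
import os

DEFAULT_REDIS_URL = "redis://localhost:6379"


def default_redis_url(environ=None) -> str:
    """Return REDIS_URL from the environment, or the localhost default (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: an empty value is treated the same as an unset variable.
    return environ.get("REDIS_URL") or DEFAULT_REDIS_URL


print(default_redis_url({}))  # redis://localhost:6379
print(default_redis_url({"REDIS_URL": "unix:///var/run/redis.sock"}))  # unix:///var/run/redis.sock
```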

2.2.0

10 Jun 11:57
eaf48dc

Changed

  • Breaking changes: adapt to new StackExchange dumps and missing Sites.xml (#322)
    • Only working on recent dumps (June 2024 and later)
    • --title and --description CLI parameters are now mandatory to specify ZIM metadata
    • Dropped -l/--list-all CLI action to list all SE sites (not working anymore)

Fixed

  • Fix duplicate english Language metadata (#321)
  • Change image processing order to save memory (#325)
  • Fix confusion between selection and flavour in ZIM name (#327)
  • New XML dump files have changed (#329)

2.1.3

29 Oct 13:36
e4c7a7d

Fixed

  • Fix Mathjax equations not displayed properly (#283)

2.1.2

13 May 09:25
a80bc68

Fixed

  • User icons don't load properly (#301)
  • Revert adaptations to upstream XML format changes (#313)

2.1.1

07 May 14:47
1bc3b9b

Fixed

  • Adapt to upstream XML format changes (#305)
  • Add continuous delivery to Pypi (#303)

2.1.0

28 Mar 16:40
d5ad19a

Added

  • Redirection from /questions/{questionId} to the question page (#277)

Changed

  • ZIM Tags now include _videos:no;_details:no and conditionally include _pictures:no (#278)
  • Default filename now uses nopic instead of all if using --without-images (#278)
  • Multi-language domains now handled as such:
    • Language metadata to be set to eng,xxx (xxx being the second language)
    • Name metadata to be like "{domain}mul{variant}"
    • Filename metadata to match Name
  • Using zimscraperlib 3.3
  • Changed default publisher metadata from 'Kiwix' to 'openZIM'
  • description metadata is now limited to 80 chars, full description goes to the long_description (#290)
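
To make the 80-character description rule concrete, here is a minimal sketch. The function name and the truncation style (cutting and appending an ellipsis) are assumptions; the release notes only say that the short description is limited to 80 chars and the full text goes to the long description.

```python
from typing import Optional


def split_description(text: str, limit: int = 80) -> tuple[str, Optional[str]]:
    """Return (description, long_description) for ZIM metadata (hypothetical helper)."""
    if len(text) <= limit:
        # Short enough: no long description is needed.
        return text, None
    # Assumed truncation style: keep limit - 1 characters and append an ellipsis.
    return text[: limit - 1].rstrip() + "…", text


description, long_description = split_description("x" * 100)
print(len(description))       # 80
print(len(long_description))  # 100
```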

Fixed

  • Multilanguage ZIM are not perfectly handled (#259)
  • Incorrect image displayed (#284)
  • Markdown text formatting is not rendered (#286)
  • Harmonize default publisher to openZIM (#291)
  • Docker image: align redis binaries with Python distribution (#294)
  • Issue with xml.sax.saxutils (#298)

2.0.2

31 Oct 13:17

Changed

  • Fixed language-code-looking project codes setting incorrect Language (ell, or, vi)
  • Fixed --name parameter not being used to set Name or filename (#267)
  • Sax parser now explicitly closed after use
  • Fixed same-protocol links being considered relative paths during rewriting (#265)
  • More reliable database commits
  • Updated to zimscraperlib 1.8.0 and lxml 4.9.1
  • Removed inline JS to comply with some CSP
  • Renamed redis module to avoid confusion
  • External link icon now included
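
The same-protocol link fix (#265) concerns URLs of the form //host/path, which carry a host but no scheme. The sketch below shows one way to tell them apart from relative paths with the standard library. The helper name and the example URLs are assumptions, not sotoki's rewriting code.

```python
from urllib.parse import urljoin, urlsplit


def is_relative_path(href: str) -> bool:
    """True only when href has neither a scheme nor a host (hypothetical helper)."""
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc


print(is_relative_path("questions/42"))                # True
print(is_relative_path("//i.stack.imgur.com/abc.png"))  # False: host present, scheme inherited
print(is_relative_path("https://example.com/a"))        # False

# A same-protocol link takes its scheme from the page it appears on.
print(urljoin("https://stackoverflow.com/questions/42", "//i.stack.imgur.com/abc.png"))
# https://i.stack.imgur.com/abc.png
```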