Commit 42b2f78
fix: examples accuracy, export stalling, missing images, third-party asset filtering
Docs & examples:
- Fix install instructions (not on PyPI, use git clone + local install)
- Fix invalid format name markdown-github -> html,md
- Fix placeholder URLs (your-org -> 19-84, security@example.com)
- Fix EXPORT_FORMATS docs to use valid names (html, md, hybrid)
- Remove 9 deadwood config options ([export.html/markdown/github])
Docker:
- Dockerfiles now pip install the package (not bare deps)
- Fix healthcheck to use python -m chronicon.cli
- Fix postgres compose: API volume not read-only, watch gets external network
- Fix double entrypoint in postgres compose usage comment
- Remove deprecated version key and duplicate restart in prod compose
Export stalling:
- Add iter_topics_batched() to database layer for paginated iteration
- Rewrite search indexer to stream JSON to disk (single pass, no list accumulation)
- Fix SEO context to use current page posts instead of loading all
- Sitemap writes directly to file instead of building in-memory list
- HTML and markdown exporters use batched topic iteration
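
The batched iteration and streaming index write described above could look roughly like the following. This is a minimal sketch, not the code from this commit: only the name `iter_topics_batched` comes from the commit message, while its signature, the `topics(id, title)` schema, the keyset-pagination query, and the `write_search_index` helper are all assumptions made for illustration.

```python
import json
import sqlite3
from typing import Iterator


def iter_topics_batched(
    conn: sqlite3.Connection, batch_size: int = 500
) -> Iterator[list[tuple]]:
    """Yield topics in fixed-size batches (hypothetical signature).

    Uses keyset pagination on the id column so only one batch is held
    in memory at a time. Assumes ids are positive integers.
    """
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, title FROM topics WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]  # resume after the last id seen


def write_search_index(
    conn: sqlite3.Connection, path: str, batch_size: int = 500
) -> int:
    """Stream a JSON array to disk record by record (hypothetical helper).

    No list of all records is ever built; returns the number written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("[")
        for batch in iter_topics_batched(conn, batch_size):
            for topic_id, title in batch:
                if count:
                    fh.write(",")  # separator before every record but the first
                json.dump({"id": topic_id, "title": title}, fh)
                count += 1
        fh.write("]")
    return count
```

Memory use then scales with `batch_size` rather than with the total number of topics, which is the property the stalling fixes rely on.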
Missing images:
- Extract data-src/data-original attributes (Discourse lazy loading)
- Fix emoji URL filter (class="emoji" is sufficient, don't require "emoji" in path)
- Normalize protocol-relative URLs (//domain) to https:// everywhere
- Fix lightbox URL extraction doubling domain on // URLs
- Log download failures with URL and error type
- Handle filename collisions with hash suffix
- Fix Tier 3 fallback: exact filename match instead of substring
- Fix Windows path separator in get_assets_for_topic()
- Remove bogus /images/favicon.ico and /images/logo.png fallback downloads
- Filter third-party URLs before downloading (only forum domain + CDN)
- Skip emoji class images in extract_image_sets (handled separately)
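
Three of the image fixes above (protocol-relative normalization, third-party filtering, and hash-suffixed filenames on collision) can be sketched with the standard library alone. The function names, the `allowed_hosts` parameter, the choice of SHA-256, and the 8-character suffix are hypothetical; the commit message does not say how the project implements these.

```python
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Turn a protocol-relative URL (//host/path) into an https:// URL."""
    if url.startswith("//"):
        return "https:" + url
    return url


def is_first_party(url: str, allowed_hosts: set[str]) -> bool:
    """Return True if the URL points at the forum domain or its CDN.

    allowed_hosts is an assumed set of lowercase host names.
    """
    parts = urlsplit(normalize_url(url))
    if parts.scheme not in ("", "http", "https"):
        return False  # data:, mailto:, and similar are never downloaded
    if parts.hostname is None:
        return True  # relative path, resolves against the forum itself
    return parts.hostname in allowed_hosts  # hostname is already lowercased


def collision_safe_name(filename: str, url: str, taken: set[str]) -> str:
    """Append a short hash of the source URL when the name is already used."""
    if filename not in taken:
        return filename
    path = PurePosixPath(filename)
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:8]
    return f"{path.stem}-{digest}{path.suffix}"
```

Deriving the suffix from the source URL keeps the renamed file stable across runs, so a re-export of the same asset does not produce a new name each time.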
CLI:
- Add --base-url flag for canonical URL (was config-only)
- Add --timeout, --retry-max, --posts-per-page flags
- Wire timeout/retry into API client from config
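
To illustrate how the new flags might override config-file values, here is a sketch using `argparse`. Only the flag names come from the commit message; the real CLI may use a different argument library, and the `merge_settings` helper, the value types, and the config keys are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the four flags named in the commit message."""
    parser = argparse.ArgumentParser(prog="chronicon-sketch")
    # Defaults are None so that "flag not given" can be told apart
    # from an explicit value and the config file can win in that case.
    parser.add_argument("--base-url", default=None)
    parser.add_argument("--timeout", type=float, default=None)
    parser.add_argument("--retry-max", type=int, default=None)
    parser.add_argument("--posts-per-page", type=int, default=None)
    return parser


def merge_settings(config: dict, args: argparse.Namespace) -> dict:
    """Overlay CLI values on config values; flags that were not given are ignored."""
    merged = dict(config)
    for key, value in vars(args).items():
        if value is not None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    file_config = {"timeout": 30.0, "retry_max": 3}  # hypothetical config values
    cli_args = build_parser().parse_args(["--timeout", "5"])
    print(merge_settings(file_config, cli_args))
```

The merged dictionary would then be what gets passed to the API client, so a timeout given on the command line takes precedence over the configured one.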
Tests:
- Add test_search_indexer_streaming.py (10 tests)
- Add test_examples_accuracy.py (21 e2e tests for examples)
- Add data-src/data-original extraction tests
- Add protocol-relative URL tests
- Add iter_topics_batched and cross-platform path tests
- Update emoji filter and site asset tests for new behavior

1 parent 59dd1d1 commit 42b2f78
File tree
31 files changed, +1052 -308 lines changed
- examples
- docker
- config
- systemd
- src/chronicon
- exporters
- fetchers
- processors
- storage
- utils
- tests