@lanijw
Contributor

@lanijw lanijw commented Nov 2, 2025

This PR implements a Python script that runs in the CI pipeline and converts the entire hanabi.github.io content (except for one page) into an EPUB e-book.

This is still a draft, because I haven’t quite figured out how to get the pipeline to run the script automatically, but the script itself is complete. If someone wants to adjust the pipeline config to complete that, please be my guest.

To run the script locally, just check out the branch and run ./scripts/epub.sh. Download my latest compilation of the epub here (too large for github attachments, expires 2025-01-01).

Author’s note: I’m so glad I’m finally done. This project probably took me upwards of 60 hours to complete. Even though 95 % of it was great fun, the other 5 % was agonising debugging hell.

@lanijw
Contributor Author

lanijw commented Nov 2, 2025

closes #1350

@Zamiell
Collaborator

Zamiell commented Nov 3, 2025

can you rebase on main? then you can probably get rid of the TODO: Investigate what the logic behind this path compilation is. part

@Zamiell
Collaborator

Zamiell commented Nov 3, 2025

additionally, if you make it pass CI then I will take a deeper look

@lanijw
Contributor Author

lanijw commented Nov 3, 2025

> can you rebase on main? then you can probably get rid of the TODO: Investigate what the logic behind this path compilation is. part

Unfortunately that doesn’t do it. This is needed because docusaurus seems to recompile the reference examples/5-pull-double-finesse.mdx in level-19 to pull-double-finesse.mdx.

@Zamiell
Collaborator

Zamiell commented Nov 3, 2025

oh that's interesting.
I hypothesize that it's because using a number as the first character is illegal.
maybe we should change the name of the page from 5-pull-double-finesse.mdx to five-pull-double-finesse.mdx, and then we can remove the special-case code.
what do you think about that?

@Zamiell
Collaborator

Zamiell commented Nov 3, 2025

actually, upon further investigation, we have 2-save.mdx and 5-save.mdx.
is special code needed for those files too?

@Zamiell
Collaborator

Zamiell commented Nov 3, 2025

please remove the changes to package.json + package-lock.json.
we already have prettier installed as part of complete-lint.

@lanijw
Contributor Author

lanijw commented Nov 3, 2025

> actually, upon further investigation, we have 2-save.mdx and 5-save.mdx.
> is special code needed for those files too?

@Zamiell

When you first asked, I was unsure and couldn’t remember why I hadn’t handled those files in the first place, but I’ve now figured it out. I think the situation is this:

  • 5-pull-double-finesse.mdx is the only file that’s referenced from another .mdx file.
  • The other two examples (2-save, 5-save) are only referenced in the sidebars.ts file.

(See the outputs of grep -rnE '\]\((.+/)*[0-9]' docs and grep -rn docs -e '2-save'.)
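For anyone who wants to repeat that search from inside a script, here is a minimal Python sketch of the first grep. The name `find_digit_links` and the sample text are made up for illustration and are not taken from epub.py. The pattern is also slightly stricter than the grep one: it only reports links whose file name (the last path segment) starts with a digit.

```python
import re

# Matches a Markdown link target whose file name (last path segment)
# starts with a digit, e.g. "](examples/5-pull-double-finesse.mdx)".
DIGIT_LINK = re.compile(r"\]\(((?:[^)\s]*/)?\d[^)\s/]*)\)")


def find_digit_links(mdx_text: str) -> list[str]:
    """Return every link target whose file name begins with a digit."""
    return DIGIT_LINK.findall(mdx_text)


# Hypothetical sample text, not copied from the docs folder.
sample = (
    "See [the example](examples/5-pull-double-finesse.mdx) "
    "and [level 2](level-2.mdx)."
)
print(find_digit_links(sample))
# ['examples/5-pull-double-finesse.mdx']
```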

My guess is that having a link from one mdx file to another is somehow illegal from docusaurus’ point of view, so they rewrite it.

Even though I’m a little annoyed I have to fix another mistake, I think I should include logic for this, because docusaurus supports it.

@Zamiell
Collaborator

Zamiell commented Nov 3, 2025

> 5-pull-double-finesse.mdx is the only file that’s referenced from another .mdx file.

by this, do you mean to say "the only mdx file that is referenced from another .mdx file"?
because if so, that's not true, there are tons of mdx files linking to other mdx files, like for example at the bottom of level 2, it links to the level 2 challenge questions.

@lanijw
Contributor Author

lanijw commented Nov 3, 2025

> > 5-pull-double-finesse.mdx is the only file that’s referenced from another .mdx file.
>
> by this, do you mean to say "the only mdx file that is referenced from another .mdx file"? because if so, that's not true, there are tons of mdx files linking to other mdx files, like for example at the bottom of level 2, it links to the level 2 challenge questions.

You’re right. I didn’t express myself clearly. There was a word missing there. I meant to say that 5-pull-double-finesse.mdx is the only file starting with a digit that’s referenced from another .mdx file.

@Zamiell
Collaborator

Zamiell commented Nov 3, 2025

oh I see.
it's super annoying that docusaurus does that.
we could just change this file to start with a letter, but then it's kind of a footgun because the next time a new example gets added, we might not remember to make it start with a letter.
so probably better to add code to dynamically handle this case whenever it finds an mdx --> mdx link.
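A rough Python sketch of what such dynamic handling could look like. This is not the fix that ended up in the PR: `strip_number_prefix` is a hypothetical helper, and the rule it applies (drop a leading run of digits plus one "-" or "_" from the file name) is an assumption based only on the single case observed in this thread, where examples/5-pull-double-finesse.mdx ended up as pull-double-finesse.mdx.

```python
import re

# Assumed rule: a run of digits followed by "-" or "_" at the start of a
# file name is dropped. Only the "5-" case was actually observed.
NUMBER_PREFIX = re.compile(r"^\d+[-_]")


def strip_number_prefix(link: str) -> str:
    """Return the link with a numeric prefix removed from its file name."""
    head, sep, name = link.rpartition("/")
    stripped = NUMBER_PREFIX.sub("", name)
    # Fall back to the original name if stripping would leave nothing.
    return f"{head}{sep}{stripped or name}"


print(strip_number_prefix("examples/5-pull-double-finesse.mdx"))
# examples/pull-double-finesse.mdx
print(strip_number_prefix("level-19.mdx"))
# level-19.mdx
```

Applying the same normalisation to every mdx --> mdx link target would avoid a special case for one file name, so a newly added example that starts with a digit would be covered automatically.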

@lanijw
Contributor Author

lanijw commented Nov 3, 2025

> better to add code to dynamically handle this case whenever it finds an mdx --> mdx link.

I agree. I’ll do that later, though. I started using the epub and found another small bug that I’ll have to fix. I’m out of time for today, though. Gotta do other stuff.

I also fixed the CI as far as I could without changing how my transformer is launched (it expects the build dir to exist, but it doesn’t seem to). Problem for later me.

@lanijw
Contributor Author

lanijw commented Nov 5, 2025

@Zamiell Implemented the fix we talked about and two more. Also got the pipeline to run through.

Aside from your review of the code, the big thing that’s left is to decide where to link to the epub file. It’s stored under assets/epub/hgroup-conventions.epub. I don’t see a logical page to put it on, since there’s no downloads or assets page, and creating one just for a single file seems a little weird, since there probably won’t be another file for quite a while. My first thought was to add a fourth item on the “Home” page below “Learning Path”.

@Zamiell
Collaborator

Zamiell commented Nov 5, 2025

I think you can put a link on this page: https://hanabi.github.io/about

@lanijw
Contributor Author

lanijw commented Nov 22, 2025

Did you get a chance to take a closer look @Zamiell?

@Zamiell
Collaborator

Zamiell commented Nov 22, 2025

i'll take a look now

@Zamiell
Collaborator

Zamiell commented Nov 23, 2025

I rebased your branch on main and then made several commits, you can review them above if you'd like.

Now: Is there a reason that the script has to be in Python? We don't have any other Python files in this repo, so we don't have the tooling set up to ensure that Python files are linted, to ensure that Python files are formatted, and so on.

Docusaurus uses TypeScript, and we already wrote the YAML --> SVG code in TypeScript. So, I think TypeScript might also be the correct choice for this script. If you did that, then we could also clean up the actions/setup-python@v6 stuff that you added to the "ci.yml" file.

Assuming that there isn't a good reason for it to be in Python, can you use an LLM to convert it from Python to TypeScript? And then ensure that the converted version still works?

@Zamiell
Collaborator

Zamiell commented Nov 23, 2025

Also, I tried to run the Python script and it didn't work:

james@1qaz MSYS /d/Repositories/hanabi.github.io (lanijw/main)
$ ./scripts/epub.sh
Traceback (most recent call last):
  File "D:\Repositories\hanabi.github.io\scripts\epub.py", line 655, in <module>
    main()
  File "D:\Repositories\hanabi.github.io\scripts\epub.py", line 56, in main
    linked_files = collect_linked_files(toc, link_map)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Repositories\hanabi.github.io\scripts\epub.py", line 194, in collect_linked_files
    collect_linked_files(node["children"], files)
  File "D:\Repositories\hanabi.github.io\scripts\epub.py", line 167, in collect_linked_files
    mdx_str = f.read()
              ^^^^^^^^
  File "C:\Users\james\AppData\Local\Programs\Python\Python312\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 512: character maps to <undefined>

This might be because:

  1. my commits to your branch messed something up somehow
  2. I'm on Windows and you are on macOS
  3. something else

For now, I'll just report it here, but ultimately we should get this fixed, because before I merge this PR, I would want to get the script working correctly on my local computer so that I can open the epub and check it out.

@lanijw
Contributor Author

lanijw commented Nov 23, 2025

I removed the newline from the mimetype file, because some e-readers might not handle that well. The EPUB specification requires that this file contain only the string application/epub+zip (20 bytes) and no newlines. If there is a newline, epubcheck fails:

(venv) hanabi.github.io [main●] epubcheck build/assets/epub/hgroup-conventions.epub
ERROR(PKG-007): build/assets/epub/hgroup-conventions.epub/./build/assets/epub/hgroup-conventions.epub(-1,-1): Mimetype file should only contain the string "application/epub+zip" and should not be compressed.
Validating using EPUB version 2.0.1 rules.

Check finished with errors
Messages: 0 fatals / 1 error / 0 warnings / 0 infos

EPUBCheck completed
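To illustrate the two things that error message asks for (exact content and no compression), here is a minimal sketch using Python's standard zipfile module. It is not the code from epub.py: `build_epub` and the sample entry are hypothetical, and the result is not a complete or validated EPUB.

```python
import io
import zipfile

MIMETYPE = "application/epub+zip"  # exactly 20 bytes, no trailing newline


def build_epub(files: dict[str, bytes]) -> bytes:
    """Zip the given files with an uncompressed mimetype entry first."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and stored without compression.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            archive.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


# Hypothetical placeholder content, only to show the call.
epub_bytes = build_epub({"META-INF/container.xml": b"<container/>"})
```

Writing the string from a constant instead of copying a mimetype file from disk means an editor or formatter cannot add a trailing newline to it.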

@lanijw
Contributor Author

lanijw commented Nov 23, 2025

> I tried to run the Python script and it didn't work

I hope this won't be a long back and forth, but I added some fixes. The error you got most likely comes from Python falling back to the system's default file encoding when no encoding is specified. On Windows that default is not UTF-8 (cp1252 in your traceback), so it chokes as soon as it reads a UTF-8 byte sequence that the code page can't decode. I should have used UTF-8 explicitly from the start and have now updated all file operations to do so, which should fix this issue. Let's hope there are no other platform-specific issues.
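A small sketch of the difference, assuming nothing about the real file contents: the helper name `read_text_utf8` and the sample character are made up, and the character was picked only because its UTF-8 encoding happens to contain the byte 0x8f from the traceback.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path: Path) -> str:
    """Read a file as UTF-8 regardless of the platform's default encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def fails_as_cp1252(raw: bytes) -> bool:
    """Report whether decoding the bytes as cp1252 raises an error."""
    try:
        raw.decode("cp1252")
    except UnicodeDecodeError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.mdx"
    # U+1F0CF encodes to the UTF-8 bytes f0 9f 83 8f.
    sample.write_bytes("\U0001F0CF".encode("utf-8"))
    text = read_text_utf8(sample)
    cp1252_fails = fails_as_cp1252(sample.read_bytes())
```

Here `text` holds the original character, while `cp1252_fails` is True because cp1252 has no mapping for byte 0x8f.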

The other important thing you need to have is a valid build in the build folder, since the SVGs for the examples are extracted from there.
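A guard along these lines could turn a missing build into a clear error message instead of a failure somewhere deep in the script. This is only a sketch: `require_build_dir` is a hypothetical name, and "valid" is reduced here to "the directory exists and is not empty", which is an assumption rather than what epub.py actually checks.

```python
import tempfile
from pathlib import Path


def require_build_dir(build_dir: Path) -> Path:
    """Return build_dir, or raise if it is missing or empty."""
    if not build_dir.is_dir() or not any(build_dir.iterdir()):
        raise FileNotFoundError(
            f"No build output found in {build_dir}; build the site first."
        )
    return build_dir


with tempfile.TemporaryDirectory() as tmp:
    empty_build = Path(tmp) / "build"
    empty_build.mkdir()
    try:
        require_build_dir(empty_build)
        empty_dir_rejected = False
    except FileNotFoundError:
        empty_dir_rejected = True
```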

@lanijw
Contributor Author

lanijw commented Nov 23, 2025

> Is there a reason that the script has to be in Python?

Not really, other than me knowing it would be possible. I will try to get that working in TS, but it'll probably take me a while, because I assume that none of the LLMs I have access to would be powerful enough to do this at all, let alone without errors.

@lanijw
Contributor Author

lanijw commented Nov 23, 2025

And thank you for taking the time to look at it in depth!

@Zamiell
Collaborator

Zamiell commented Nov 23, 2025

gemini 3 is free and should be able to accomplish this task
