@cderv cderv commented Nov 13, 2024

This is an additional fix to #11177 so that categories with special characters and UTF-8 characters are handled correctly.

It first fixes #11358 and then adds to #11177 to fix #8517. A category with an apostrophe no longer causes an issue (#10829), but clicking on it was still not working.

About UTF-8 in category

Using base64 encoding works, within the limits of the ASCII character range. For content with UTF-8 characters, we need to take extra measures.

This PR uses encodeURIComponent and decodeURIComponent to bring the content into that range of characters. It seems a UTF-8 stream can be used as well, so we could switch to that.
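To illustrate the idea, here is a minimal sketch of the round trip, assuming the encoding is done with the standard `btoa` / `atob` globals. The helper names `encodeCategory` and `decodeCategory` are hypothetical and are not the names used in this PR.

```typescript
// Hypothetical helper names, for illustration only.

// Percent-encode first so that the string handed to btoa is ASCII-only,
// then base64 encode the result.
function encodeCategory(category: string): string {
  return btoa(encodeURIComponent(category));
}

// Reverse the two steps in the opposite order.
function decodeCategory(encoded: string): string {
  return decodeURIComponent(atob(encoded));
}

// A category mixing an apostrophe and a non-ASCII character.
const original = "café's";
const roundTripped = decodeCategory(encodeCategory(original));
console.log(roundTripped === original);
```

The percent-encoding step makes the base64 payload longer, which is one reason switching to a UTF-8 byte stream could be worth considering later.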

About clicking on category with special char

It took me a while to find the problem, but the issue was that:

  • categories are metadata processed by us in JS, but also passed to Pandoc for use in the title-block template.

  • Pandoc parses any string value in metadata as Markdown, and we default to markdown+smart for the readers. The smart extension therefore applies, and so ' was transformed into another character.

  • This led to our JS script not being able to find the right category to activate, because the two strings are not the same once base64 encoded.

    Example
     ---
     title: "Font test"
     categories: "apos'trophe"
     ---
     
     test

    Render to native

     Pandoc
       Meta
         { unMeta =
             fromList
               [ ( "categories" , MetaInlines [ Str "apos\8217trophe" ] )
               , ( "title"
                 , MetaInlines [ Str "Font" , Space , Str "test" ]
                 )
               ]
         }
       [ Para [ Str "test" ] ]
    

    \8217 (U+2019, the right single quotation mark) is not ' (U+0027)
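The mismatch can be reproduced outside of Quarto. This is a small sketch, assuming the lookup key is built with base64 over encodeURIComponent as described above. `encodeForLookup` is a hypothetical name, not the function used in the PR.

```typescript
// Hypothetical stand-in for the encoding used to build the lookup key.
const encodeForLookup = (s: string): string => btoa(encodeURIComponent(s));

const authored = "apos'trophe";       // U+0027, as written in the YAML
const smartened = "apos\u2019trophe"; // U+2019, as produced by +smart (\8217)

console.log(authored.charCodeAt(4));  // 39
console.log(smartened.charCodeAt(4)); // 8217

// The two keys differ, so a lookup by encoded value cannot match.
console.log(encodeForLookup(authored) === encodeForLookup(smartened)); // false
```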

We had such issues in the past with the revealjs-url metadata, so I am using the same "trick": wrapping the value in pandocNativeStr() so that it can be passed unparsed as metadata to Pandoc.

This requires processing the format metadata to override the existing one. To do that:

  • it uses projectExtras, as this really applies to website projects.
  • it uses a metadataOverride, though, because mergeConfigs currently appends arrays together otherwise.
  • There needs to be special handling in the (again) problematic engineMetadata override. As discussed in the past, I think this should be removed, but this is not the place to discuss it. The fact is that categories is not listed in isQuartoMetadata(), so it does get identified as a possible engine metadata override, but that can't be the case here as we are the ones processing it. So, like the revealjs theme key, there is now an exception. This whole part will need to be reconsidered when we deal with mergeConfigs and this piece of code.
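To show why an override is needed rather than a plain merge, here is a rough sketch. Both functions are hypothetical stand-ins written for this example. They are not Quarto's mergeConfigs or its metadataOverride handling, and the `<raw ...>` value is only a placeholder for the wrapped string.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical append-style merge: arrays under the same key are concatenated.
function mergeAppending(base: Metadata, extra: Metadata): Metadata {
  const result: Metadata = { ...base };
  for (const [key, value] of Object.entries(extra)) {
    const existing = result[key];
    result[key] = Array.isArray(existing) && Array.isArray(value)
      ? [...existing, ...value]
      : value;
  }
  return result;
}

// Hypothetical override: the processed value replaces the authored one.
function applyOverride(base: Metadata, override: Metadata): Metadata {
  return { ...base, ...override };
}

const authoredMeta: Metadata = { categories: ["apos'trophe"] };
// Placeholder for the category after it has been wrapped as a raw string.
const processedMeta: Metadata = { categories: ["<raw apos'trophe>"] };

const appended = mergeAppending(authoredMeta, processedMeta);
const overridden = applyOverride(authoredMeta, processedMeta);

console.log((appended.categories as unknown[]).length);   // 2: both versions kept
console.log((overridden.categories as unknown[]).length); // 1: only the processed one
```

With the append behaviour, the parsed and the raw version of each category would both reach Pandoc, which is why the override path is used here.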

Hopefully this explains clearly what I tried. Happy to discuss and consider another solution if you think of one.

Consequence of this change: categories can only be raw strings, that is, non-Markdown strings. I think this is what we currently expect (but it was not enforced until now).

…nserted in templates.

This also prevents Pandoc's +smart extension from applying and modifying characters such as a single quote
@cderv cderv added the needs-discussion Issues that require a team-wide discussion before proceeding further label Nov 13, 2024
@cderv cderv requested a review from cscheid November 13, 2024 23:24
@cscheid cscheid merged commit f38a78d into main Nov 14, 2024
47 checks passed
@cscheid cscheid deleted the fix/utf8-categories branch November 14, 2024 23:06