Fix OpenType Context Substitution Format 3 and decomposition type handling for Arabic/emoji shaping #824

Open
fjobeir wants to merge 12 commits into opentypejs:master from fjobeir:master

fjobeir commented Apr 1, 2026

Description

Replace the contextSubstitutionFormat3 implementation in src/features/featureQuery.mjs with one that uses lookupCoverageList and the standard lookup infrastructure (getLookupByIndex, getLookupMethod, getSubstitutionType) for proper GSUB lookup type 5, substFormat 3 handling. Additionally, fix both contextSubstitutionFormat3 and chainingSubstitutionFormat3 to use lookupRecord.sequenceIndex instead of iterating all input lookups, and add support for decomposition substitution type 21 in both functions.
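For readers unfamiliar with the terms, here is a minimal sketch of the two ideas in this description: Format 3 matching against a list of coverage tables (one per input position), and dispatching a nested lookup on a type string built from the lookup type and subtable format ('12' for single substitution format 2, '21' for multiple/decomposition substitution format 1). It is a simplified model and not the code in this PR: coverage tables are plain arrays, and the helper names `matchesCoverages`, `substitutionType` and `applyNestedLookup` as well as the sample glyph IDs are assumptions made for illustration.

```javascript
// Simplified model for illustration only; not the opentype.js source.
// Coverage tables are modelled as plain arrays of glyph IDs (an assumption).

// Format 3 matching: one coverage table per input position.
function matchesCoverages(coverages, glyphs) {
    if (glyphs.length < coverages.length) return false;
    return coverages.every((coverage, i) => coverage.includes(glyphs[i]));
}

// Type string = lookupType followed by substFormat, e.g. '12' or '21'.
function substitutionType(lookupTable, subtable) {
    return String(lookupTable.lookupType) + String(subtable.substFormat);
}

// Apply a nested lookup to a single glyph. The result is an array because a
// type '21' substitution can expand one glyph into several.
function applyNestedLookup(lookupTable, glyphId) {
    for (const subtable of lookupTable.subtables) {
        const index = subtable.coverage.indexOf(glyphId);
        if (index === -1) continue; // this subtable does not cover the glyph
        const type = substitutionType(lookupTable, subtable);
        if (type === '12') return [subtable.substitute[index]];
        if (type === '21') return subtable.sequences[index].slice();
        throw new Error(`Substitution type ${type} is not handled in this sketch`);
    }
    return [glyphId]; // not covered anywhere: the glyph passes through unchanged
}

// Hypothetical lookup tables with made-up glyph IDs.
const single = {
    lookupType: 1,
    subtables: [{ substFormat: 2, coverage: [10], substitute: [20] }],
};
const multiple = {
    lookupType: 2,
    subtables: [{ substFormat: 1, coverage: [30], sequences: [[31, 32]] }],
};

console.log(matchesCoverages([[10, 11], [30]], [10, 30])); // true
console.log(applyNestedLookup(single, 10));                // [ 20 ]
console.log(applyNestedLookup(multiple, 30));              // [ 31, 32 ]
```

Returning an array from the nested lookup keeps the single and decomposition cases uniform for the caller, which is one way to support both types; the PR itself may structure this differently.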

Also adds a version field (1.4.0-beta.1) to package.json to support downstream dependency resolution.

Motivation and Context

Some Arabic fonts (e.g. IBM Plex Sans Arabic) use OpenType Context Substitution Format 3 (GSUB lookup type 5, substFormat 3) for contextual letter forms. The previous implementation bypassed the standard lookup infrastructure and could fail to correctly resolve substitutions. Additionally, both contextSubstitutionFormat3 and chainingSubstitutionFormat3 iterated all input lookups for each lookup record instead of targeting the specific glyph at sequenceIndex, causing duplicate glyph output. Fonts like noto-emoji that use nested decomposition substitution (type 21) within context/chaining lookups also failed because only type 12 was handled.
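The duplicate-output problem can be reproduced with a toy model. The sketch below is not the opentype.js implementation: a "lookup" is reduced to a plain function on glyph IDs, and the names `applyToAllInputs` and `applyAtSequenceIndex` plus the glyph IDs are hypothetical. It only contrasts the two iteration patterns described above.

```javascript
// Toy model for illustration only; not the opentype.js source.
// A "lookup" here is just a function from glyph ID to substitute glyph ID.
const lookups = {
    0: (glyphId) => glyphId + 1, // hypothetical single substitution
};

// Two lookup records, each targeting one position of a two-glyph input.
const lookupRecords = [
    { sequenceIndex: 0, lookupListIndex: 0 },
    { sequenceIndex: 1, lookupListIndex: 0 },
];

// Problematic pattern: every record visits every input glyph, so the output
// grows to records.length * input.length entries.
function applyToAllInputs(input, records) {
    const out = [];
    for (const record of records) {
        for (const glyphId of input) {
            out.push(lookups[record.lookupListIndex](glyphId));
        }
    }
    return out;
}

// Intended pattern: each record substitutes only the glyph at its
// sequenceIndex, so the output keeps the length of the input.
function applyAtSequenceIndex(input, records) {
    const out = input.slice();
    for (const record of records) {
        const i = record.sequenceIndex;
        out[i] = lookups[record.lookupListIndex](out[i]);
    }
    return out;
}

const input = [53, 53];
console.log(applyToAllInputs(input, lookupRecords));     // [ 54, 54, 54, 54 ]
console.log(applyAtSequenceIndex(input, lookupRecords)); // [ 54, 54 ]
```

With two records and two input glyphs, the first pattern yields four glyphs where two are expected, which mirrors the `[54,54,54,54]` versus `[54,54]` symptom mentioned in the commit messages.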

This is part of a broader effort to fix RTL (Arabic/Hebrew) text rendering in Next.js OG image generation, which uses Satori -> opentype.js for font shaping.

How Has This Been Tested?

  • All 328 existing tests pass (npm run test), including:
    • featureQuery.mjs - "should parse multiple glyphs - ligature substitution format 3 (53)" — validates correct substitution output [54, 54]
    • bidi.mjs - "shape emoji with sub_5" — validates flag emoji shaping produces [1850] (not [1850, 1850])
    • featureQuery.mjs - "should find a substitute - chaining context substitution format 3 (63)" — validates chaining substitution still produces [1348]
  • Downstream Satori test suite (432 tests) passes with this change, including the Arabic language rendering test

Screenshots (if appropriate):

N/A — changes affect font shaping logic, validated via automated tests.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I did npm run test and all tests passed green (including code styling checks).
  • I have added tests to cover my changes.
  • My change requires a change to the documentation.
  • I have updated the README accordingly.
  • I have read the Contribute README section.

claude and others added 11 commits March 31, 2026 23:07
Replace contextSubstitutionFormat3 with implementation that uses
lookupCoverageList and getLookupByIndex for proper GSUB lookup type 5
substFormat 3 handling. Some Arabic fonts (e.g. IBM Plex Sans Arabic)
use this lookup format for contextual letter forms.

https://claude.ai/code/session_0192MAXgejpjkKBgChuyTFcd
Fix OpenType Context Substitution Format 3 for Arabic font shaping
Each lookupRecord should only substitute the glyph at its
sequenceIndex, not iterate all input lookups. This caused
duplicate results (e.g. [54,54,54,54] instead of [54,54]).

https://claude.ai/code/session_0192MAXgejpjkKBgChuyTFcd
Fix contextSubstitutionFormat3 producing duplicate substitutions
The chaining context substitution handler only supported type '12'
(single substitution format 2) and threw on all other types. Some
fonts (e.g. noto-emoji) use type '21' (multiple/decomposition
substitution) within chaining context lookups.

https://claude.ai/code/session_0192MAXgejpjkKBgChuyTFcd
Handle decomposition substitution type 21 in chainingSubstitutionFormat3
Same fix as contextSubstitutionFormat3: each lookupRecord should only
substitute the glyph at its sequenceIndex, not iterate all input
lookups. Fixes duplicate glyph output in emoji/flag shaping.

https://claude.ai/code/session_0192MAXgejpjkKBgChuyTFcd
Fix chainingSubstitutionFormat3 to use lookupRecord.sequenceIndex
Context substitution format 3 only handled nested lookup type '12'
(single substitution). Some fonts (e.g. noto-emoji) use type '21'
(multiple/decomposition substitution) in context substitution lookup
records. Without handling type '21', the second covered glyph was not
processed by the context substitution, causing it to be re-matched
and duplicated in the ccmp pipeline.

https://claude.ai/code/session_0192MAXgejpjkKBgChuyTFcd
fjobeir pushed a commit to fjobeir/satori that referenced this pull request Apr 3, 2026
Keep @shuding/opentype.js at 1.4.0-beta.0 (the published version).
The opentype.js Context Substitution Format 3 fix is tracked
separately in opentypejs/opentype.js#824. Restore webkit-text-stroke
snapshots to match the published dependency.

https://claude.ai/code/session_0192MAXgejpjkKBgChuyTFcd

fjobeir commented Apr 8, 2026

Hi @fdb, would you please take a look whenever possible?