Add AedTreeBuilder #127

larissakl · 2025-05-17T11:47:39Z

Adds an AedTreeBuilder, which can later be used by a TreeLabelsyncBeamSearch. The search tree contains only labels: no blanks, self-loops, or skip transitions.
The sentence-end token is taken from the special sentence-end lemma in the lexicon and added as a label that is reachable from the root.
As in the CtcTreeBuilder, a word-boundary root is added if a word-boundary token is present in the lexicon.
I moved some helper functions from CtcTreeBuilder to AbstractTreeBuilder so that I can reuse them easily.
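A minimal sketch of the idea described above, not the PR's implementation: each pronunciation is inserted label by label into a prefix tree that has no blank states, self-loops, or skip arcs. The sentence-end label is attached directly below the root, and a second root is created only when the lexicon defines a word-boundary token. The types `Lexicon`, `Node`, `LabelTree` and the function `buildLabelTree` are hypothetical, and lemmas are reduced to plain label sequences.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <vector>

// Hypothetical, simplified lexicon: every pronunciation is a sequence of label indices.
struct Lexicon {
    std::vector<std::vector<int>> pronunciations;
    int sentenceEndLabel;                  // label of the "sentence-end" special lemma
    std::optional<int> wordBoundaryLabel;  // set only if a "word-boundary" lemma exists
};

// One node per label; no blank states, self-loops, or skip transitions are created.
struct Node {
    int label = -1;                       // -1 marks a root
    bool isWordEnd = false;
    std::map<int, std::size_t> children;  // label -> node index
};

struct LabelTree {
    std::vector<Node> nodes;
    std::size_t root = 0;
    std::optional<std::size_t> wordBoundaryRoot;

    std::size_t addNode(int label) {
        Node node;
        node.label = label;
        nodes.push_back(node);
        return nodes.size() - 1;
    }

    // Follows `labels` from `from`, creating nodes only where no prefix is shared yet.
    std::size_t insert(std::size_t from, const std::vector<int>& labels) {
        std::size_t current = from;
        for (int label : labels) {
            auto it = nodes[current].children.find(label);
            if (it != nodes[current].children.end()) {
                current = it->second;
            } else {
                std::size_t next = addNode(label);  // may reallocate `nodes`; `it` is not used afterwards
                nodes[current].children[label] = next;
                current = next;
            }
        }
        nodes[current].isWordEnd = true;
        return current;
    }
};

LabelTree buildLabelTree(const Lexicon& lexicon) {
    LabelTree tree;
    tree.root = tree.addNode(-1);
    for (const auto& pronunciation : lexicon.pronunciations) {
        tree.insert(tree.root, pronunciation);
    }
    // Sentence-end becomes a single label directly below the root.
    tree.insert(tree.root, {lexicon.sentenceEndLabel});
    // Assumption: an extra root exists only if the lexicon has a word-boundary token.
    // How it connects to word ends is not modeled here; the label just hangs below it.
    if (lexicon.wordBoundaryLabel) {
        tree.wordBoundaryRoot = tree.addNode(-1);
        tree.insert(*tree.wordBoundaryRoot, {*lexicon.wordBoundaryLabel});
    }
    return tree;
}
```

Fed the eight regular word pronunciations of the example lexicon in this thread (with A=1, B=2, C=3 and sentence-end=6), this sketch shares the prefixes A and B and ends up with 11 label nodes plus one sentence-end node under a single root.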

larissakl · 2025-05-28T13:51:04Z

The helper functions used by CtcTreeBuilder and AedTreeBuilder are now in the shared base class ~~CtcAedSharedBaseClassTreeBuilder~~ SharedBaseClassTreeBuilder instead of AbstractTreeBuilder.
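The refactoring can be pictured with a stripped-down C++ hierarchy. This is a hypothetical sketch. Only the class names come from this PR. The members `addPronunciation`, `extendState`, `numStates` and `numLoopStates` are invented for illustration, and the CTC builder models only self-loops, not blank states or skip transitions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical common interface of all tree builders (heavily simplified).
class AbstractTreeBuilder {
public:
    virtual ~AbstractTreeBuilder() = default;
    virtual void addPronunciation(const std::vector<int>& labels) = 0;
};

// Hypothetical shared base: state bookkeeping and the label-extension helper live here,
// so the CTC and AED builders reuse them without putting them into AbstractTreeBuilder.
class SharedBaseClassTreeBuilder : public AbstractTreeBuilder {
public:
    std::size_t numStates() const { return numStates_; }

protected:
    // Returns the successor of `state` for `label`, creating it on first use.
    std::size_t extendState(std::size_t state, int label) {
        auto key = std::make_pair(state, label);
        auto it = edges_.find(key);
        if (it != edges_.end()) {
            return it->second;
        }
        std::size_t next = numStates_++;
        edges_[key] = next;
        return next;
    }

    std::size_t root_ = 0;
    std::size_t numStates_ = 1;  // counts the root
    std::map<std::pair<std::size_t, int>, std::size_t> edges_;
};

// CTC flavor: every label state gets a self-loop (blank states are omitted in this sketch).
class CtcTreeBuilder : public SharedBaseClassTreeBuilder {
public:
    void addPronunciation(const std::vector<int>& labels) override {
        std::size_t state = root_;
        for (int label : labels) {
            state = extendState(state, label);
            loopStates_.insert(state);
        }
    }
    std::size_t numLoopStates() const { return loopStates_.size(); }

private:
    std::set<std::size_t> loopStates_;
};

// AED flavor: labels only, no loops, blanks, or skips.
class AedTreeBuilder : public SharedBaseClassTreeBuilder {
public:
    void addPronunciation(const std::vector<int>& labels) override {
        std::size_t state = root_;
        for (int label : labels) {
            state = extendState(state, label);
        }
    }
};
```

Both subclasses build the same label states through the inherited helper and differ only in what they add around them. A likely benefit of this layout, offered as an interpretation rather than the PR's stated rationale, is that AbstractTreeBuilder stays free of helpers that only the CTC and AED builders need.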

SimBe195 · 2025-07-28T14:24:55Z

@larissakl Do you have any plots that show the generated tree structure for a simple example lexicon? If so, it would be nice to include one in this PR.

larissakl · 2025-07-29T06:40:58Z

Sure, here is an example tree and this is the corresponding example lexicon:

```xml
<?xml version="1.0" ?>
<lexicon>
  <phoneme-inventory>
    <phoneme>
      <symbol>_</symbol>
      <variation>none</variation>
    </phoneme>
    <phoneme>
      <symbol>A</symbol>
      <variation>none</variation>
    </phoneme>
    <phoneme>
      <symbol>B</symbol>
      <variation>none</variation>
    </phoneme>
    <phoneme>
      <symbol>C</symbol>
      <variation>none</variation>
    </phoneme>
    <phoneme>
      <symbol>[SILENCE]</symbol>
      <variation>none</variation>
    </phoneme>
    <phoneme>
      <symbol>[UNKNOWN]</symbol>
      <variation>none</variation>
    </phoneme>
    <phoneme>
      <symbol>&lt;/s&gt;</symbol>
      <variation>none</variation>
    </phoneme>
    <phoneme>
      <symbol>@</symbol>
      <variation>none</variation>
    </phoneme>
  </phoneme-inventory>
  <lemma special="silence">
    <orth>[SILENCE]</orth>
    <orth/>
    <phon>[SILENCE]</phon>
    <synt/>
    <eval/>
  </lemma>
  <lemma special="unknown">
    <orth>[UNKNOWN]</orth>
    <phon>[UNKNOWN]</phon>
    <synt>
      <tok>&lt;UNK&gt;</tok>
    </synt>
  </lemma>
  <lemma special="sentence-end">
    <orth>&lt;/s&gt;</orth>
    <phon>&lt;/s&gt;</phon>
  </lemma>
  <lemma special="word-boundary">
    <orth>@</orth>
    <phon>@</phon>
  </lemma>
  <lemma>
    <orth>AA</orth>
    <phon>A A</phon>
  </lemma>
  <lemma>
    <orth>AB</orth>
    <phon>A B</phon>
  </lemma>
  <lemma>
    <orth>AAA</orth>
    <phon>A A A</phon>
  </lemma>
  <lemma>
    <orth>AAB</orth>
    <phon>A A B</phon>
  </lemma>
  <lemma>
    <orth>ABA</orth>
    <phon>A B A</phon>
  </lemma>
  <lemma>
    <orth>ACA</orth>
    <phon>A C A</phon>
  </lemma>
  <lemma>
    <orth>BA</orth>
    <phon>B A</phon>
  </lemma>
  <lemma>
    <orth>BAC</orth>
    <phon>B A C</phon>
  </lemma>
</lexicon>
```

m=... is the AM index: 1 for A, 2 for B, 3 for C, 4 for [SILENCE], 5 for [UNKNOWN], and 6 for sentence-end, following the order of the phonemes in the inventory. The blank symbol _ is still part of the lexicon but no longer appears in this tree.
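The index scheme stated above can be reproduced by numbering the phoneme inventory from 0. This sketch assumes that such a mapping is consistent with the six listed values, not that the acoustic model's label indices are actually computed this way. `kInventory` and `amIndex` are hypothetical names, and the XML-escaped `&lt;/s&gt;` is written in its decoded form `</s>`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Phoneme inventory of the example lexicon, in declaration order (entities decoded).
const std::vector<std::string> kInventory = {
    "_", "A", "B", "C", "[SILENCE]", "[UNKNOWN]", "</s>", "@"};

// Assumed mapping: the AM index is the 0-based position in the inventory,
// so the blank "_" takes index 0 and the labels used in the tree start at 1.
std::optional<std::size_t> amIndex(const std::string& symbol) {
    for (std::size_t i = 0; i < kInventory.size(); ++i) {
        if (kInventory[i] == symbol) {
            return i;
        }
    }
    return std::nullopt;  // symbol not in the inventory
}
```

Under this assumption the blank takes index 0, which never appears in the label-only tree, and the word-boundary token @ would receive index 7.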

SimBe195 · 2025-07-29T11:04:00Z

It looks like "word-boundary" is part of the lexicon but not of the tree. From the code, it seems the word-boundary token should be integrated, though. Is this really the right picture for the given lexicon?

larissakl · 2025-07-29T11:19:10Z

Oh yes, you're right. This was the tree without the word-boundary token. With the word-boundary token in the lexicon, it looks like this:

Add AedTreeBuilder

5ae3486

larissakl requested review from SimBe195 and curufinwe May 17, 2025 11:47

larissakl added 2 commits May 19, 2025 18:54

Add assertion

87953e9

Introduce shared base class for Ctc- and AedTreeBuilder

f0af175

larissakl mentioned this pull request May 30, 2025

Add tree label-synchronous beam-search algorithm #129 (open)

Merge branch 'master' into aed-treebuilder

23709b0

larissakl added 2 commits August 28, 2025 09:54

Rename CtcAedSharedBaseClassTreeBuilder to SharedBaseClassTreeBuilder

971fdc3

Formatting

3699201

larissakl mentioned this pull request Oct 20, 2025

Add HmmTreeBuilder #154 (open)
