k-MTSLIA algorithm for arbitrary k.
Generalizes the 2-MTSLIA of McMullin, Aksënova & De Santo (2019) to arbitrary k and lifts any assumptions about tier overlap.
Presented as a poster at SCiL 2023 (Rudaitis, 2023).
import k_mtslia

# Learn an MTSL₃ grammar with overlapping tiers
G = k_mtslia.learn(['miimimii', 'uumumu', 'mumumu', 'iimimi'], k=3, overlap=True)

print(
    k_mtslia.scan('mimi', G),  # ⇒ True
    k_mtslia.scan('mumu', G),  # ⇒ True
    k_mtslia.scan('mimu', G),  # ⇒ False
)

# Print each restricted n-gram next to its tier conditions
for n_gram, tier_conditions in G:
    print(
        '*' + ''.join(n_gram),
        ' ∧ '.join('(' + ' ∨ '.join(clause) + ')' for clause in tier_conditions)
    )

This prints out the following:
*><< (i ∨ m) ∧ (m ∨ u)
*>>< (i ∨ m) ∧ (m ∨ u)
*>u<
*>ui
*>um
*>i<
*>iu
*>im
*>m<
*>mm (i) ∧ (u)
*uu< (m)
*uuu (m)
*uui
*ui<
*uiu
and 22 more lines.
Let us interpret the first line of the output. > and < are word boundary symbols, which k-MTSLIA adds automatically to each string. *><< is the restriction that the trigram ><< must not occur on certain tiers. The formula (i ∨ m) ∧ (m ∨ u) specifies which tiers these are: a tier must contain i or m, and it must also contain m or u. That gives the following tiers:
- {i, m, >, <},
- {i, u, >, <},
- {m, >, <},
- {m, u, >, <},
- and any superset of the above.
In other words, these are the tiers that satisfy the formula and contain the restricted trigram's symbols.
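The reading above can be checked mechanically. The following is a minimal sketch, not part of k_mtslia: the helper names `tier_is_restricted`, `restricted_tiers` and `violates` are hypothetical. It assumes that every symbol is a single character, that the boundary symbols belong to every tier, that an empty formula holds for every tier, and that each string is padded with k-1 boundary symbols on both sides (inferred from the trigrams ><< and >>< in the output). Enumerating all tiers is exponential in the alphabet size, so this is for illustration only.

```python
from itertools import combinations

BOUNDARIES = {'>', '<'}  # assumption: boundary symbols are on every tier


def tier_is_restricted(tier, n_gram, tier_conditions):
    """Return True if n_gram is banned on this tier under the reading above."""
    tier = set(tier)
    # The tier must contain all symbols of the restricted n-gram.
    if not set(n_gram) <= tier:
        return False
    # The formula is a conjunction of clauses; each clause is a disjunction
    # that holds when at least one of its symbols is on the tier.
    return all(any(sym in tier for sym in clause) for clause in tier_conditions)


def restricted_tiers(alphabet, n_gram, tier_conditions):
    """List every tier over the alphabet on which n_gram is banned."""
    symbols = sorted(alphabet)
    tiers = []
    for size in range(len(symbols) + 1):
        for subset in combinations(symbols, size):
            tier = set(subset) | BOUNDARIES
            if tier_is_restricted(tier, n_gram, tier_conditions):
                tiers.append(tier)
    return tiers


def violates(word, alphabet, n_gram, tier_conditions):
    """Return True if the word breaks this one restriction on some tier."""
    k = len(n_gram)
    # Assumption: pad with k-1 boundary symbols on each side.
    padded = '>' * (k - 1) + word + '<' * (k - 1)
    target = ''.join(n_gram)
    for tier in restricted_tiers(alphabet, n_gram, tier_conditions):
        # Project the padded word onto the tier by erasing off-tier symbols.
        projection = ''.join(sym for sym in padded if sym in tier)
        if target in projection:
            return True
    return False


alphabet = {'i', 'm', 'u'}

# First output line: *><< (i ∨ m) ∧ (m ∨ u)
for tier in restricted_tiers(alphabet, ('>', '<', '<'), [('i', 'm'), ('m', 'u')]):
    print(sorted(tier - BOUNDARIES))

# Another output line: *uuu (m)
print(violates('uiuiu', alphabet, ('u', 'u', 'u'), [('m',)]))
print(violates('umumu', alphabet, ('u', 'u', 'u'), [('m',)]))
```

For the first restriction the loop prints the tiers {m}, {i, m}, {i, u}, {m, u} and {i, m, u} (boundaries omitted), which are the four tiers listed above plus their one remaining superset. For *uuu (m), the string uiuiu is rejected because i is invisible on the tier {m, u, >, <}, so its projection contains uuu, while umumu is accepted by this restriction.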