USFM parsing errors #837

@Enkidu93


We've seen some more USFM parsing errors in three different projects. Contact me if you need the project names.

The latter two are in non-Scripture books (XXA and XXB), which I don't believe we should be parsing at all. I suspect that when I reworked the preprocessing logic, I may have removed the safeguard that skipped them. The fix is probably as simple as passing the list of canonical books as the text ids when processing the whole corpus.
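
A minimal C# sketch of that idea follows. The helper name `CanonicalTextIdFilter` and its hard-coded book list are hypothetical and only for illustration; the sketch assumes the 66-book OT/NT set, whereas the real fix would presumably reuse whatever canon definition the library already has (which may include deuterocanonical books).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of SIL.Machine): restricts a corpus's text ids to
// Scripture books so that non-Scripture books such as XXA, XXB or FRT are never
// handed to the USFM parser.
public static class CanonicalTextIdFilter
{
    // Illustrative 66-book OT/NT set, using USFM book codes. Assumption: a real fix
    // would reuse the library's own canon definition, which may also cover
    // deuterocanonical books.
    private static readonly HashSet<string> CanonicalBookIds = new HashSet<string>(
        new[]
        {
            // Old Testament
            "GEN", "EXO", "LEV", "NUM", "DEU", "JOS", "JDG", "RUT", "1SA", "2SA",
            "1KI", "2KI", "1CH", "2CH", "EZR", "NEH", "EST", "JOB", "PSA", "PRO",
            "ECC", "SNG", "ISA", "JER", "LAM", "EZK", "DAN", "HOS", "JOL", "AMO",
            "OBA", "JON", "MIC", "NAM", "HAB", "ZEP", "HAG", "ZEC", "MAL",
            // New Testament
            "MAT", "MRK", "LUK", "JHN", "ACT", "ROM", "1CO", "2CO", "GAL", "EPH",
            "PHP", "COL", "1TH", "2TH", "1TI", "2TI", "TIT", "PHM", "HEB", "JAS",
            "1PE", "2PE", "1JN", "2JN", "3JN", "JUD", "REV",
        },
        StringComparer.OrdinalIgnoreCase);

    // Keeps only the text ids that name a canonical book, preserving the input order.
    public static IReadOnlyList<string> Filter(IEnumerable<string> textIds) =>
        textIds.Where(id => CanonicalBookIds.Contains(id)).ToList();
}
```

The filtered ids could then be passed to the `GetRows(IEnumerable<string> textIds)` overloads that appear in the stack traces below, instead of enumerating every text in the corpus. This would only address the `XXA` and `XXB` failures: `1KI` is a canonical book, so that error would still need its own fix.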

System.InvalidOperationException: An error occurred while parsing the text '1KI' in project ///. Verse: 1KI 1:0, line: 3, character: 1, error: 'Stack empty.'
       ---> System.InvalidOperationException: Stack empty.
         at System.Collections.Generic.Stack`1.ThrowForEmptyStack()
         at System.Collections.Generic.Stack`1.Pop()
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.NextElement(String marker)
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.StartParentElement(String marker)
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.StartSidebar(UsfmParserState state, String marker, String category)
         at SIL.Machine.Corpora.UsfmParser.ProcessToken()
         at SIL.Machine.Corpora.UsfmTextBase.GetVersesInDocOrder()
         --- End of inner exception stack trace ---
         at SIL.Machine.Corpora.UsfmTextBase.GetVersesInDocOrder()
         at SIL.Machine.Corpora.ScriptureText.GetRows()
         at System.Linq.Enumerable.SelectManySingleSelectorIterator`2.MoveNext()
         at System.Linq.Enumerable.SelectEnumerableIterator`2.MoveNext()
         at System.Linq.Enumerable.WhereIterator[TSource](IEnumerable`1 source, Func`3 predicate)+MoveNext()
         at SIL.Machine.Corpora.TextCorpusEnumerator.MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IList`1 enumerators)+MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at SIL.Machine.Corpora.MergedTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at SIL.Machine.Corpora.TextCorpusEnumerator.CollectVerses()
         at SIL.Machine.Corpora.TextCorpusEnumerator.MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IList`1 enumerators)+MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at System.Collections.Generic.LargeArrayBuilder`1.AddRange(IEnumerable`1 items)
         at System.Collections.Generic.EnumerableHelpers.ToArray[T](IEnumerable`1 source)
         at SIL.ServiceToolkit.Services.ParallelCorpusPreprocessingService.PreprocessAsync(IReadOnlyList`1 corpora, Func`2 train, Func`4 inference, Boolean useKeyTerms, HashSet`1 ignoreUsfmMarkers) in /app/src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs:line 165
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 47
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.PreprocessBuildJob`1.DoWorkAsync(String engineId, String buildId, IReadOnlyList`1 data, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs:line 44
         at Serval.Machine.Shared.Services.HangfireBuildJob`2.RunAsync(String engineId, String buildId, TData data, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/HangfireBuildJob.cs:line 56
Failed to process the job '69206425ae2c0992fc32e756': an exception occurred.
      System.InvalidOperationException: An error occurred while parsing the text 'XXA' in project ///'. Verse: XXA 1:0, line: 2, character: 1, error: 'Stack empty.'
       ---> System.InvalidOperationException: Stack empty.
         at System.Collections.Generic.Stack`1.ThrowForEmptyStack()
         at System.Collections.Generic.Stack`1.Pop()
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.NextElement(String marker)
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.StartParentElement(String marker)
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.StartSidebar(UsfmParserState state, String marker, String category)
         at SIL.Machine.Corpora.UsfmParser.ProcessToken()
         at SIL.Machine.Corpora.UsfmTextBase.GetVersesInDocOrder()
         --- End of inner exception stack trace ---
         at SIL.Machine.Corpora.UsfmTextBase.GetVersesInDocOrder()
         at SIL.Machine.Corpora.ScriptureText.GetRows()
         at System.Linq.Enumerable.SelectManySingleSelectorIterator`2.MoveNext()
         at System.Linq.Enumerable.SelectEnumerableIterator`2.MoveNext()
         at SIL.Machine.Corpora.TextCorpusEnumerator.MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IList`1 enumerators)+MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at SIL.Machine.Corpora.MergedTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at System.Linq.Enumerable.WhereIterator[TSource](IEnumerable`1 source, Func`3 predicate)+MoveNext()
         at SIL.Machine.Corpora.TextCorpusEnumerator.CollectVerses()
         at SIL.Machine.Corpora.TextCorpusEnumerator.MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IList`1 enumerators)+MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at SIL.Machine.Corpora.ParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at System.Collections.Generic.LargeArrayBuilder`1.AddRange(IEnumerable`1 items)
         at System.Collections.Generic.EnumerableHelpers.ToArray[T](IEnumerable`1 source)
         at SIL.ServiceToolkit.Services.ParallelCorpusPreprocessingService.PreprocessAsync(IReadOnlyList`1 corpora, Func`2 train, Func`4 inference, Boolean useKeyTerms, HashSet`1 ignoreUsfmMarkers) in /app/src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs:line 120
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 47
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken)
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.PreprocessBuildJob`1.DoWorkAsync(String engineId, String buildId, IReadOnlyList`1 data, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs:line 44
         at Serval.Machine.Shared.Services.HangfireBuildJob`2.RunAsync(String engineId, String buildId, TData data, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/HangfireBuildJob.cs:line 56
         at Serval.Machine.Shared.Services.HangfireBuildJob`2.RunAsync(String engineId, String buildId, TData data, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/HangfireBuildJob.cs:line 120
         at Serval.Machine.Shared.Services.HangfireBuildJob`2.RunAsync(String engineId, String buildId, TData data, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/HangfireBuildJob.cs:line 124
      Failed to process the job '692e9619ae2c0992fc33faad': an exception occurred.
      System.InvalidOperationException: An error occurred while parsing the text 'XXB' in project ///. Verse: XXB 1:0, line: 2, character: 1, error: 'Stack empty.'
       ---> System.InvalidOperationException: Stack empty.
         at System.Collections.Generic.Stack`1.ThrowForEmptyStack()
         at System.Collections.Generic.Stack`1.Pop()
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.NextElement(String marker)
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.StartParentElement(String marker)
         at SIL.Machine.Corpora.ScriptureRefUsfmParserHandlerBase.StartSidebar(UsfmParserState state, String marker, String category)
         at SIL.Machine.Corpora.UsfmParser.ProcessToken()
         at SIL.Machine.Corpora.UsfmTextBase.GetVersesInDocOrder()
         --- End of inner exception stack trace ---
         at SIL.Machine.Corpora.UsfmTextBase.GetVersesInDocOrder()
         at SIL.Machine.Corpora.ScriptureText.GetRows()
         at System.Linq.Enumerable.SelectManySingleSelectorIterator`2.MoveNext()
         at System.Linq.Enumerable.SelectEnumerableIterator`2.MoveNext()
         at SIL.Machine.Corpora.TextCorpusEnumerator.MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IList`1 enumerators)+MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at SIL.Machine.Corpora.MergedTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at System.Linq.Enumerable.WhereIterator[TSource](IEnumerable`1 source, Func`3 predicate)+MoveNext()
         at SIL.Machine.Corpora.TextCorpusEnumerator.CollectVerses()
         at SIL.Machine.Corpora.TextCorpusEnumerator.MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IList`1 enumerators)+MoveNext()
         at SIL.Machine.Corpora.NParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at SIL.Machine.Corpora.ParallelTextCorpus.GetRows(IEnumerable`1 textIds)+MoveNext()
         at System.Collections.Generic.LargeArrayBuilder`1.AddRange(IEnumerable`1 items)
         at System.Collections.Generic.EnumerableHelpers.ToArray[T](IEnumerable`1 source)
         at SIL.ServiceToolkit.Services.ParallelCorpusPreprocessingService.PreprocessAsync(IReadOnlyList`1 corpora, Func`2 train, Func`4 inference, Boolean useKeyTerms, HashSet`1 ignoreUsfmMarkers) in /app/src/ServiceToolkit/src/SIL.ServiceToolkit/Services/ParallelCorpusPreprocessingService.cs:line 124
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 47
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken)
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.TranslationPreprocessBuildJob.WriteDataFilesAsync(String buildId, IReadOnlyList`1 corpora, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/TranslationPreprocessBuildJob.cs:line 83
         at Serval.Machine.Shared.Services.PreprocessBuildJob`1.DoWorkAsync(String engineId, String buildId, IReadOnlyList`1 data, String buildOptions, CancellationToken cancellationToken) in /app/src/Machine/src/Serval.Machine.Shared/Services/PreprocessBuildJob.cs:line 44

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    Projects

    Status

    🔖 Ready

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests