Add option to generate LM image and GC via two separate jobs #446

NeoLegends · 2023-08-21T16:34:02Z

Closes #430
Closes #514

Now testing...

recognition/advanced_tree_search.py

See rwth-i6/rasr@d58a228

NeoLegends · 2023-08-29T15:26:51Z

I just discovered during testing that the LM image and GC are set on the post config, not on the normal config. This means splitting the LMGC job into two does not change the hash of any existing jobs.

Therefore, I think this flag can be enabled by default, or rather, we can in-line the flag and make it the new default! WDYT?

To finish testing I need to run a decoding w/ the new GC and new LM, but otherwise the changes here are now tested. This is what it looks like when the flag is set in a pipeline w/ thousands of jobs (note that because the GC is not hashed, only a few GC jobs are run at all):

[2023-08-29 17:30:46,918] INFO: Finished updating job states
[2023-08-29 17:30:46,934] INFO: Experiment directory: /u/mgunz/setups/2023-08--subsampling-new      Call: /u/mgunz/src/sisyphus/sis m -r -io
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/lm/lm_image/CreateLmImageJob.gNAo29L3jB0Y>
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/recognition/advanced_tree_search/BuildGlobalCacheJob.9kZVSb7jUVMk>
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/recognition/advanced_tree_search/BuildGlobalCacheJob.VesmsCzSguWG>
[2023-08-29 17:30:47,002] INFO: runnable(3) running(8) waiting(878)
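
The hashing argument in this comment can be illustrated with a toy model. This is a minimal sketch, not Sisyphus or i6_core code: `ToySearchJob`, its `hash()` method, and every key and path below are hypothetical. The only assumption taken from the thread is that values set on the post config do not enter the job hash.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class ToySearchJob:
    """Hypothetical stand-in for a search job with a hashed and an unhashed config."""

    config: dict
    post_config: dict = field(default_factory=dict)

    def hash(self) -> str:
        # Assumption: only `config` contributes to the hash; `post_config` is ignored.
        payload = json.dumps(self.config, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()[:12]


# Illustrative settings only; these are not real RASR option names.
config = {"lm.scale": 10.0, "lm.file": "lm.arpa.gz"}

# LM image and GC coming from one combined job vs. from two separate jobs (made-up paths).
joined = ToySearchJob(config, {"lm.image": "LmGcJob/lm.image", "global-cache": "LmGcJob/gc"})
split = ToySearchJob(config, {"lm.image": "LmImageJob/lm.image", "global-cache": "GcJob/gc"})

# Swapping the source of the LM image and GC leaves the toy hash untouched.
print(joined.hash() == split.hash())
```

Under this assumption, only a change to `config` itself would produce a different hash, which is why the search jobs keep their hashes while the jobs that build the LM image and GC are exchanged.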

NeoLegends · 2023-08-29T15:36:48Z

Apparently switching the default here changes hashes at AppTek. If it is possible to live w/ the hash breakage (e.g. because it is in unused parts of the code), I'd like the flag to be on by default so as many folks as possible can benefit from the changes here. If the hash breakage is unacceptable, we leave it off by default, of course.

michelwi · 2023-08-30T12:37:22Z

> I just discovered during testing that the LM image and GC are set on the post config, not on the normal config. This means splitting the LMGC job into two does not change the hash of any existing jobs.

*Any existing search jobs. The graph, however, does change: all of the LMGC jobs are removed and separate LM and GC jobs are added. This is caught in the pipeline.

> In case it is possible to live w/ the hash breakage (e.g. because it is in unused parts of the code) I'd like the flag to be on by default so as many folks as possible can profit from the changes here. If the hash breakage is unacceptable we leave it off by default of course.

Unfortunately all parts that are tested in the pipeline are used.

NeoLegends · 2023-08-31T09:10:39Z

I have successfully run recognitions w/ this setup. This is tested now.

Marvin84 · 2023-11-14T16:13:00Z

@curufinwe @michelwi can we merge this?

michelwi · 2023-11-14T19:10:58Z

> @curufinwe @michelwi can we merge this?

Since the AppTek test passes, I see no objections from our side.

I'll dismiss my old review, but I currently don't have much time to re-review it, sorry.

No time to follow up on it.

Atticus1806 · 2023-11-15T14:54:23Z

recognition/advanced_tree_search.py

        )
-        for i, lm_config in enumerate(arpa_lms):
-            lm_config[1].image = lm_gc.out_lm_images[i + 1]
+        for i, (_lm_config, lm_post_config) in enumerate(arpa_lms):


Same as above

Atticus1806 · 2023-11-15T14:58:50Z

> can we merge this?

With two approvals, yes :). I just have a few comments, then I can approve, but please ask someone else to review as well.

michelwi

I think many open comments should be addressed before merging.

michelwi · 2025-07-15T13:08:03Z

Is anyone still interested in having this merged, or has this job become irrelevant at the chair?

Marvin84 · 2025-07-15T13:35:28Z

> is anyone still interested in having this merged or has this Job become irrelevant at the chair?

I am interested, but not willing to spend time on it myself.

michelwi

I think all open comments are addressed and the pipeline is green.

@Atticus1806, @JackTemaki, @NeoLegends would you be able to take another look? Thx.

NeoLegends

One nit, otherwise LGTM! :)

NeoLegends · 2025-08-14T10:38:04Z

recognition/advanced_tree_search.py

        :param lmgc_alias: Alias for the AdvancedTreeSearchLmImageAndGlobalCacheJob
        :param lmgc_scorer: Dummy scorer for the AdvancedTreeSearchLmImageAndGlobalCacheJob which is required but unused
+        :param lm_cache_method: Specifies, how the LM image and the global cache should be created:
+            JOINED (default) -> automatically create lm images and global cache as output of one job


Nit: consider informing about the tradeoff between the different options here, i.e. that the default has hash issues.
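
To make the parameter under discussion concrete, here is a minimal sketch of how such an enum-valued option could select between the two behaviours. It is an illustration under assumptions, not the code in this PR: only the `JOINED` member and the job class names appear in this thread, while the member name `SEPARATE`, the enum values, and the helper `plan_cache_jobs` are hypothetical.

```python
from enum import Enum
from typing import List


class LmCacheMethod(Enum):
    # Named in the docstring above: one job outputs both LM images and global cache.
    JOINED = "joined"
    # Hypothetical member name: LM image and global cache come from two separate jobs.
    SEPARATE = "separate"


def plan_cache_jobs(method: LmCacheMethod = LmCacheMethod.JOINED) -> List[str]:
    """Return the job class names a pipeline would schedule for `method` (illustrative)."""
    if method is LmCacheMethod.JOINED:
        return ["AdvancedTreeSearchLmImageAndGlobalCacheJob"]
    if method is LmCacheMethod.SEPARATE:
        return ["CreateLmImageJob", "BuildGlobalCacheJob"]
    raise ValueError(f"unsupported method: {method!r}")


print(plan_cache_jobs())
print(plan_cache_jobs(LmCacheMethod.SEPARATE))
```

Keeping `JOINED` as the default in this sketch mirrors the decision in the thread to leave existing setups unchanged, so the tradeoff the nit asks to document stays visible at the call site.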

NeoLegends · 2025-08-15T13:53:56Z

I cannot approve since I originally created this PR; but LGTM!

feat: Add option to generate LM image and GC via two separate jobs

620a042

Closes #430

NeoLegends added the enhancement New feature or request label Aug 21, 2023

NeoLegends requested review from JackTemaki and christophmluscher August 21, 2023 16:34

NeoLegends self-assigned this Aug 21, 2023

NeoLegends marked this pull request as draft August 21, 2023 16:34

NeoLegends added 2 commits August 21, 2023 18:34

chore: Document parameter

a68265c

fix: Always assign the (possibly to None) lm_gc property

b74c654

NeoLegends force-pushed the feat/separate-lmi-gc-generation branch from 6b40167 to b74c654 Compare August 21, 2023 16:39

michelwi previously requested changes Aug 22, 2023

recognition/advanced_tree_search.py (outdated)

recognition/advanced_tree_search.py (outdated)

recognition/advanced_tree_search.py (outdated)

NeoLegends added 5 commits August 22, 2023 10:22

fix bug, assign jobs to class if possible

79cbd2a

refactor find_arpa_lms into standalone function

78f6bed

Merge branch 'main' into feat/separate-lmi-gc-generation

291b734

fix bugs from trial runs

d209e8f

Re-enable lm-util

02d7088

See rwth-i6/rasr@d58a228

NeoLegends requested a review from michelwi August 29, 2023 14:44

NeoLegends added 2 commits August 29, 2023 16:48

more mem for LM + GC jobs

6a236c8

make mem configurable

a61b03f

NeoLegends marked this pull request as ready for review August 29, 2023 14:49

even more mem

008c9dc

enable split behavior by default, document hash implications

07e1136

NeoLegends changed the title ~~feat: Add option to generate LM image and GC via two separate jobs~~ Add option to generate LM image and GC via two separate jobs Aug 29, 2023

disable flag by default

cdc791a

michelwi reviewed Aug 30, 2023

lm/lm_image.py

rasr/crp.py

recognition/advanced_tree_search.py (outdated)

recognition/advanced_tree_search.py

JackTemaki reviewed Sep 6, 2023

recognition/advanced_tree_search.py (outdated)

recognition/advanced_tree_search.py (outdated)

Rename flag to be more clear

ab010c5

Merge branch 'main' into feat/separate-lmi-gc-generation

115b160

Atticus1806 reviewed Nov 9, 2023

lm/util.py

Atticus1806 reviewed Nov 15, 2023

michelwi reviewed Dec 4, 2023

JackTemaki mentioned this pull request Jun 18, 2024

Exclusion of unnecessary parameters from AdvancedTreeSearchLmImageAndGlobalCacheJob hash calculation #514

Open

NeoLegends removed their assignment Jun 18, 2024

michelwi requested changes Jun 19, 2024

recognition/advanced_tree_search.py (outdated)

recognition/advanced_tree_search.py (outdated)

recognition/advanced_tree_search.py

recognition/advanced_tree_search.py (outdated)

Merge branch 'main' into feat/separate-lmi-gc-generation

49315ea

Daniel Mann and others added 11 commits July 30, 2025 11:38

introduce enum

0182ad6

Apply suggestions from code review

f3199fb

Co-authored-by: michelwi <[email protected]>
Co-authored-by: Benedikt Hilmes <[email protected]>

Merge branch 'feat/separate-lmi-gc-generation' of github.com:rwth-i6/…

f784e87

…i6_core into feat/separate-lmi-gc-generation

change util function signature

f02c2c9

more reviewer comments

9b61538

ruff formatting

d2507d0

adjust rescoring job

a5a6af9

fix parameter name typo

f74c52f

fix empty post config

0f82f3f

postpone arpa discovery -> should fix hash test

ffe3681

ruff

ef5e8ce

michelwi approved these changes Aug 7, 2025

NeoLegends commented Aug 14, 2025

Daniel Mann added 2 commits August 15, 2025 09:32

reviewer comments

a2292d0

more reviewer comments

7d1bcc9
