Conversation

@NeoLegends
Member

@NeoLegends NeoLegends commented Aug 21, 2023

Closes #430
Closes #514

Now testing...

@NeoLegends NeoLegends added the enhancement New feature or request label Aug 21, 2023
@NeoLegends NeoLegends self-assigned this Aug 21, 2023
@NeoLegends NeoLegends marked this pull request as draft August 21, 2023 16:34
@NeoLegends NeoLegends force-pushed the feat/separate-lmi-gc-generation branch from 6b40167 to b74c654 August 21, 2023 16:39
@NeoLegends NeoLegends requested a review from michelwi August 29, 2023 14:44
@NeoLegends NeoLegends marked this pull request as ready for review August 29, 2023 14:49
@NeoLegends
Member Author

NeoLegends commented Aug 29, 2023

I just discovered during testing that the LM image and GC are set on the post config, not on the normal config. This means splitting the LMGC job into two does not change the hash of any existing jobs.

Therefore, I think this flag can be enabled by default, or rather, we can in-line the flag and make it the new default! WDYT?
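To illustrate the point about the post config, here is a minimal sketch of a hash that only looks at the normal config. Everything in it is hypothetical: `toy_job_hash`, the dictionary keys, and the file names are made up for illustration and are not the actual Sisyphus or i6_core hashing code. It only shows why swapping where the LM image and GC come from would leave such a hash untouched.

```python
import hashlib
import json


def toy_job_hash(config: dict, post_config: dict) -> str:
    """Toy hash: only `config` contributes; `post_config` is deliberately ignored."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


# Hypothetical search-job settings; every key and value is an assumption.
config = {"lm_type": "ARPA", "lm_scale": 10.0}

# Same search job, but the LM image and GC come from one joined job
# or from two separate jobs (file names are placeholders).
post_joined = {"lm_image": "joined-job/lm.image", "global_cache": "joined-job/global.cache"}
post_split = {"lm_image": "lm-job/lm.image", "global_cache": "gc-job/global.cache"}

hash_joined = toy_job_hash(config, post_joined)
hash_split = toy_job_hash(config, post_split)
print(hash_joined == hash_split)  # True: only `config` feeds the hash
```

In this toy model, only a change to `config` itself (for example a different `lm_scale`) would produce a new hash.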

To finish testing I need to run a decoding w/ the new GC and new LM, but otherwise the stuff here is now tested. This is what it looks like when the flag is set in a pipeline w/ thousands of jobs (note that, because the GC is not hashed, only a few GC jobs are even run):

[2023-08-29 17:30:46,918] INFO: Finished updating job states
[2023-08-29 17:30:46,934] INFO: Experiment directory: /u/mgunz/setups/2023-08--subsampling-new      Call: /u/mgunz/src/sisyphus/sis m -r -io
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/lm/lm_image/CreateLmImageJob.gNAo29L3jB0Y>
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/recognition/advanced_tree_search/BuildGlobalCacheJob.9kZVSb7jUVMk>
[2023-08-29 17:30:47,002] INFO: runnable: Job<work/i6_core/recognition/advanced_tree_search/BuildGlobalCacheJob.VesmsCzSguWG>
[2023-08-29 17:30:47,002] INFO: runnable(3) running(8) waiting(878)

@NeoLegends
Member Author

Apparently switching the default here changes hashes at AppTek. In case it is possible to live w/ the hash breakage (e.g. because it is in unused parts of the code) I'd like the flag to be on by default so as many folks as possible can profit from the changes here. If the hash breakage is unacceptable we leave it off by default of course.

@NeoLegends NeoLegends changed the title from "feat: Add option to generate LM image and GC via two separate jobs" to "Add option to generate LM image and GC via two separate jobs" Aug 29, 2023
@michelwi
Contributor

I just discovered during testing that the LM image and GC are set on the post config, not on the normal config. This means splitting the LMGC job into two does not change the hash of any existing jobs.

*Any existing search jobs. The graph, however, does change: all of the LMGC jobs are removed and separate LM and GC jobs are added. This is caught in the pipeline.

In case it is possible to live w/ the hash breakage (e.g. because it is in unused parts of the code) I'd like the flag to be on by default so as many folks as possible can profit from the changes here. If the hash breakage is unacceptable we leave it off by default of course.

Unfortunately all parts that are tested in the pipeline are used.

@NeoLegends
Member Author

I have successfully run recognitions w/ this setup. This is tested now.

@Marvin84
Contributor

@curufinwe @michelwi can we merge this?

@michelwi
Contributor

@curufinwe @michelwi can we merge this?

Since the AppTek test passes, I see no objections from our side.

I'll dismiss my old review, but I currently don't have much time to re-review it, sorry.

@michelwi michelwi dismissed their stale review November 14, 2023 19:11

No time to follow up on it.

    )
    for i, lm_config in enumerate(arpa_lms):
        lm_config[1].image = lm_gc.out_lm_images[i + 1]
    for i, (_lm_config, lm_post_config) in enumerate(arpa_lms):
Contributor

Same as above

@Atticus1806
Contributor

can we merge this?

With two approvals, yes :). I just have a few comments, then I can approve, but please ask someone else to review as well.

Contributor

@michelwi michelwi left a comment

I think many open comments should be addressed before merging.

@michelwi
Contributor

Is anyone still interested in having this merged, or has this Job become irrelevant at the chair?

@Marvin84
Contributor

Is anyone still interested in having this merged, or has this Job become irrelevant at the chair?

I am interested, but not willing to spend time on it myself.

Contributor

@michelwi michelwi left a comment

I think all open comments are addressed and the pipeline is green.

@Atticus1806, @JackTemaki, @NeoLegends would you be able to take another look? Thx.

Member Author

@NeoLegends NeoLegends left a comment

One nit, otherwise LGTM! :)

:param lmgc_alias: Alias for the AdvancedTreeSearchLmImageAndGlobalCacheJob
:param lmgc_scorer: Dummy scorer for the AdvancedTreeSearchLmImageAndGlobalCacheJob which is required but unused
:param lm_cache_method: Specifies how the LM image and the global cache should be created:
    JOINED (default) -> automatically create LM images and global cache as output of one job
Member Author

Nit: consider informing about the tradeoff between the different options here, i.e. that the default has hash issues.
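As a rough sketch of what such a documented option could look like: the enum below is a hypothetical stand-in, not the code from this PR. Only `JOINED` and the three job names appear in this thread. The member name `SEPARATE`, the class name `LmCacheMethod`, and the helper `planned_jobs` are assumptions made for illustration.

```python
from enum import Enum
from typing import List


class LmCacheMethod(Enum):
    """Hypothetical option type behind `lm_cache_method`."""

    # Default per the docstring: one job outputs both LM images and the global cache.
    # Per the review nit, this default is the option with hash issues.
    JOINED = "joined"
    # Assumed name: LM image and global cache are generated by two separate jobs.
    SEPARATE = "separate"


def planned_jobs(method: LmCacheMethod = LmCacheMethod.JOINED) -> List[str]:
    """Return the job names that would appear in the graph for a given method."""
    if method is LmCacheMethod.JOINED:
        return ["AdvancedTreeSearchLmImageAndGlobalCacheJob"]
    return ["CreateLmImageJob", "BuildGlobalCacheJob"]


print(planned_jobs())
print(planned_jobs(LmCacheMethod.SEPARATE))
```

Putting the tradeoff next to each enum member keeps it visible wherever the option is chosen, rather than only in the function docstring.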

@NeoLegends
Member Author

I cannot approve since I originally created this PR, but LGTM!

Labels

enhancement New feature or request

7 participants