Fix: Alignment stream analyzer added a new forward hook for each utterance by tobmi1 · Pull Request #455 · resemble-ai/chatterbox

tobmi1 · 2026-02-01T18:47:56Z

A new alignment stream analyzer is currently instantiated on every call of T3.inference:

chatterbox/src/chatterbox/models/t3/t3.py

Lines 273 to 288 in ed27b95

    
        self.compiled = False

        # TODO? synchronize the expensive compile function
        # with self.compile_lock:
        if not self.compiled:
            # Default to None for English models, only create for multilingual
            alignment_stream_analyzer = None
            if self.hp.is_multilingual:
                alignment_stream_analyzer = AlignmentStreamAnalyzer(
                    self.tfmr,
                    None,
                    text_tokens_slice=(len_cond, len_cond + text_tokens.size(-1)),
                    alignment_layer_idx=9, # TODO: hparam or something?
                    eos_idx=self.hp.stop_speech_token,
                )
                assert alignment_stream_analyzer.eos_idx == self.hp.stop_speech_token

As a result, every call registers another forward hook on the backbone's self-attention layers, and the hooks from earlier calls stay attached. When generating many utterances, generation becomes progressively slower: after 1000 utterances, the attention masks are copied from GPU to CPU 1000 times for each forward pass through the hooked layers when running on CUDA. This affects the multilingual model, since the alignment stream analyzer is enabled by default for it; see, for example, #352.
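To make the growth concrete, here is a rough count of hook invocations. It is only a back-of-the-envelope model, assuming one extra hook per utterance, no hook removal, and one invocation per hook per decoding step; the function names and the `steps_per_utterance` parameter are made up for illustration. It counts hook calls, not wall-clock time.

```python
def total_hook_calls_leaky(n_utterances: int, steps_per_utterance: int = 1) -> int:
    """Hook calls when utterance k runs with k hooks attached (one added per call)."""
    return sum(k * steps_per_utterance for k in range(1, n_utterances + 1))


def total_hook_calls_single(n_utterances: int, steps_per_utterance: int = 1) -> int:
    """Hook calls when exactly one hook stays attached for all utterances."""
    return n_utterances * steps_per_utterance


leaky = total_hook_calls_leaky(1000)    # 1 + 2 + ... + 1000 = 500500
single = total_hook_calls_single(1000)  # 1000
print(leaky, single)
```

Under these assumptions the number of hook calls grows quadratically with the number of utterances when hooks accumulate, and only linearly when a single hook is reused.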

This PR changes the logic so that the alignment stream analyzer is only instantiated once, and its internal state is instead reset for each utterance without adding new forward hooks to the backbone model.
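The register-once, reset-per-utterance pattern can be sketched in plain Python. This is not the code from this PR: `TinyLayer`, `Analyzer`, `reset`, and `generate` are hypothetical stand-ins, and `TinyLayer` only imitates the hook mechanism of a real backbone layer so that the example runs without any dependencies.

```python
class TinyLayer:
    """Hypothetical stand-in for a layer that supports forward hooks."""

    def __init__(self):
        self._hooks = []

    def register_forward_hook(self, fn):
        self._hooks.append(fn)

    def __call__(self, x):
        out = x * 2  # placeholder for the real attention computation
        for fn in self._hooks:
            fn(self, x, out)
        return out


class Analyzer:
    """Hypothetical analyzer: hooks the layer once, keeps per-utterance state."""

    def __init__(self, layer, text_tokens_slice):
        self.reset(text_tokens_slice)
        layer.register_forward_hook(self._on_forward)  # registered exactly once

    def _on_forward(self, module, inputs, output):
        self.calls += 1
        self.last_output = output

    def reset(self, text_tokens_slice):
        """Clear per-utterance state without touching the registered hook."""
        self.text_tokens_slice = text_tokens_slice
        self.last_output = None
        self.calls = 0


def generate(layer, cache, text_tokens_slice, steps=3):
    """Reuse the cached analyzer if there is one; otherwise create it."""
    analyzer = cache.get("analyzer")
    if analyzer is None:
        analyzer = cache["analyzer"] = Analyzer(layer, text_tokens_slice)
    else:
        analyzer.reset(text_tokens_slice)
    for step in range(steps):
        layer(step)
    return analyzer.calls


layer = TinyLayer()
cache = {}
counts = [generate(layer, cache, (0, 10 + i)) for i in range(5)]
print(counts, len(layer._hooks))
```

In this sketch the number of hook calls per utterance stays constant and the layer keeps a single hook, no matter how many utterances are generated.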

I tested the fix by generating ~8500 short utterances, which now takes ~4.5h instead of ~50h on my setup.

A big thanks to @benHeid for pointing out the issue in #352! 😊

Commit 69f4d13: Fix alignment stream analyzer to not add a new forward hook for each utterance

tobmi1 marked this pull request as ready for review February 1, 2026 18:49

tobmi1 mentioned this pull request Feb 1, 2026

Inference getting slower in large scale generation of multiple files #352

Open

tobmi1 wants to merge 1 commit into resemble-ai:master from tobmi1:fix_alignment_stream_analyzer
