- `beam_width` This controls how broad the beam search is. Higher values are more likely to find top beams, but they also will make your beam search exponentially slower. Furthermore, the longer your outputs, the more time large beams will take. This is an important parameter that represents a tradeoff you need to make based on your dataset and needs.
- `num_processes` Parallelize the batch using num_processes workers. You probably want to pass the number of CPUs your machine has. You can find this in Python with `import multiprocessing` then `n_cpus = multiprocessing.cpu_count()`. Default 4.
- `blank_id` This should be the index of the CTC blank token (probably 0).
- `log_probs_input` If your outputs have passed through a softmax and represent probabilities, this should be False; if they have passed through a LogSoftmax and represent negative log likelihoods, you need to pass True. If you aren't sure, run `print(output[0][0].sum())`: if it's a negative number you've probably got NLL and need to pass True; if it sums to ~1.0 you should pass False. Default False. (A construction sketch follows this list.)
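Taken together, here is a minimal construction sketch using the parameters above; the `labels` vocabulary and the hyperparameter values are illustrative placeholders, not recommendations:

```python
import multiprocessing

from ctcdecode import CTCBeamDecoder

# Illustrative vocabulary: index 0 is the CTC blank token.
labels = ["_", " ", "a", "b", "c"]

decoder = CTCBeamDecoder(
    labels,
    model_path=None,        # path to a KenLM binary; None means no language model
    alpha=0,                # LM weight; 0 means the LM has no effect
    beta=0,                 # word-count weight
    cutoff_top_n=40,        # only the top 40 characters are considered per step
    cutoff_prob=1.0,        # 1.0 disables probability pruning
    beam_width=100,         # the speed/accuracy tradeoff discussed above
    num_processes=multiprocessing.cpu_count(),
    blank_id=0,
    log_probs_input=False,  # outputs are softmax probabilities, not log-probs
)
```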
1. `beam_results` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS. A batch containing the series of characters (these are ints; you still need to decode them back to your text) representing results from a given beam search. Note that the beams are almost always shorter than the total number of timesteps, and the additional data is nonsensical, so to see the top beam (as int labels) from the first item in the batch, you need to run `beam_results[0][0][:out_lens[0][0]]`.
1. `beam_scores` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS. A batch with the approximate CTC score of each beam (look at the code [here](https://github.com/parlance/ctcdecode/blob/master/ctcdecode/src/ctc_beam_search_decoder.cpp#L191-L192) for more info). You can get the model's confidence that a beam is correct with `p = 1/np.exp(beam_score)`.
1. `timesteps` - Shape: BATCHSIZE x N_BEAMS. The timestep at which the nth output character has peak probability. Can be used as alignment between the audio and the transcript.
1. `out_lens` - Shape: BATCHSIZE x N_BEAMS. `out_lens[i][j]` is the length of the jth beam_result of item i in your batch (see the decoding sketch just below for how all four outputs fit together).
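A sketch of how these four outputs are typically consumed, assuming the `decoder` and `labels` from the construction example above and a model output tensor `output` of shape batch x num_timesteps x num_labels:

```python
import numpy as np

beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)

# Top beam of the first item in the batch, trimmed to its real length.
best_beam = beam_results[0][0][:out_lens[0][0]]

# Map the int labels back to characters.
transcript = "".join(labels[int(n)] for n in best_beam)

# Approximate confidence for that beam, per the formula above.
confidence = 1 / np.exp(beam_scores[0][0])
```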
The corresponding docstrings in `ctcdecode/__init__.py`:
    class CTCBeamDecoder(object):
        """
        PyTorch wrapper for DeepSpeech PaddlePaddle Beam Search Decoder.

        Args:
            labels (list): The tokens/vocab used to train your model.
                They should be in the same order as they are in your model's outputs.
            model_path (basestring): The path to your external KenLM language model (LM).
            alpha (float): Weighting associated with the LM's probabilities.
                A weight of 0 means the LM has no effect.
            beta (float): Weight associated with the number of words within our beam.
            cutoff_top_n (int): Cutoff number in pruning. Only the top cutoff_top_n characters
                with the highest probability in the vocab will be used in beam search.
            cutoff_prob (float): Cutoff probability in pruning. 1.0 means no pruning.
            beam_width (int): This controls how broad the beam search is. Higher values are more
                likely to find top beams, but they also will make your beam search exponentially slower.
            num_processes (int): Parallelize the batch using num_processes workers.
            blank_id (int): Index of the CTC blank token (probably 0) used when training your model.
            log_probs_input (bool): False if your model has passed through a softmax and output
                probabilities sum to 1.
        """

        ...

        def decode(self, probs, seq_lens=None):
            """
            Conducts the beam search on model outputs and returns results.

            Args:
                probs (Tensor) - A rank 3 tensor representing model outputs. Shape is
                    batch x num_timesteps x num_labels.
                seq_lens (Tensor) - A rank 1 tensor representing the sequence length of the items
                    in the batch. Optional; if not provided, the size of axis 1 (num_timesteps)
                    of `probs` is used for all items.

            Returns:
                beam_results (Tensor): A 3-dim tensor representing the top n beams of a batch
                    of items. Shape: batchsize x num_beams x num_timesteps.
                    Results are still encoded as ints at this stage.
                beam_scores (Tensor): A 3-dim tensor representing the likelihood of each beam
                    in beam_results. Shape: batchsize x num_beams x num_timesteps.
                timesteps (Tensor): A 2-dim tensor representing the timesteps at which the nth
                    output character has peak probability. To be used as alignment between audio
                    and transcript. Shape: batchsize x num_beams.
                out_lens (Tensor): A 2-dim tensor representing the length of each beam in
                    beam_results. Shape: batchsize x n_beams.
            """
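To make the shapes the docstring describes concrete, here is a small sketch with dummy data; the dimensions are arbitrary, the `int32` dtype for `seq_lens` is an assumption, and `decoder` and `labels` are the placeholders from the earlier example:

```python
import torch

batch_size, num_timesteps, num_labels = 2, 50, len(labels)

# Fake model outputs: a softmax over the vocab at every timestep.
probs = torch.randn(batch_size, num_timesteps, num_labels).softmax(dim=2)

# Optional per-item lengths; here the second item only has 30 valid frames.
seq_lens = torch.tensor([50, 30], dtype=torch.int32)

beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs, seq_lens)

print(beam_results.shape)  # batchsize x num_beams x num_timesteps
print(out_lens.shape)      # batchsize x num_beams
```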