
Commit 1b1f425

Author: Sean Narenthiran

Cleaned up docstrings for PEP 8; added missing details for beam scores by referring to the code

1 parent 4487b31 · commit 1b1f425

File tree: 2 files changed, +28 −18 lines


README.md

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)
 will make your beam search exponentially slower. Furthermore, the longer your outputs, the more time large beams will take.
 This is an important parameter that represents a tradeoff you need to make based on your dataset and needs.
 - `num_processes` Parallelize the batch using num_processes workers. You probably want to pass the number of CPUs your computer has. You can find this in Python with `import multiprocessing` then `n_cpus = multiprocessing.cpu_count()`. Default 4.
-- `blank_id` This should be the index of the blank token (probably 0) used when training your model so that ctcdecode can remove it during decoding.
+- `blank_id` This should be the index of the CTC blank token (probably 0).
 - `log_probs_input` If your outputs have passed through a softmax and represent probabilities, this should be False; if they have passed through a LogSoftmax and represent negative log likelihoods, you need to pass True. If you aren't sure, run `print(output[0][0].sum())`: if it's a negative number you've probably got NLL and need to pass True; if it sums to ~1.0 you should pass False. Default False.

 ### Inputs to the `decode` method
@@ -60,7 +60,7 @@ beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)
 
 4 things get returned from `decode`
 1. `beam_results` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS. A batch containing the series of characters (these are ints; you still need to decode them back to your text) representing results from a given beam search. Note that the beams are almost always shorter than the total number of timesteps, and the additional data is nonsensical, so to see the top beam (as int labels) from the first item in the batch, you need to run `beam_results[0][0][:out_lens[0][0]]`.
-1. `beam_scores` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS A batch with the likelihood of each beam (I think this is p=1/e\**beam_score). If this is true, you can get the model's confidence that that beam is correct with `p=1/np.exp(beam_score)` **more info needed**
+1. `beam_scores` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS. A batch with the approximate CTC score of each beam (look at the code [here](https://github.com/parlance/ctcdecode/blob/master/ctcdecode/src/ctc_beam_search_decoder.cpp#L191-L192) for more info). You can get a rough confidence that the beam is correct with `p=1/np.exp(beam_score)`.
 1. `timesteps` - Shape: BATCHSIZE x N_BEAMS. The timestep at which the nth output character has peak probability. Can be used as alignment between the audio and the transcript.
 1. `out_lens` - Shape: BATCHSIZE x N_BEAMS. `out_lens[i][j]` is the length of the jth beam_result of item i of your batch.
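To make the four return values above concrete, here is a short sketch that pulls out the top transcript and a rough confidence. It assumes the `labels`, `decoder`, and `output` variables from the setup earlier in the README, with `labels` holding single-character tokens, and it treats `beam_scores` as holding one score per beam:

```python
import numpy as np

beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)

# Top beam of the first batch item: slice off the meaningless tail with
# out_lens before mapping the int labels back to characters.
top_beam = beam_results[0][0][:out_lens[0][0]]
transcript = "".join(labels[int(n)] for n in top_beam)

# Rough confidence for that beam, using the conversion suggested above.
confidence = 1 / np.exp(beam_scores[0][0].item())
print(transcript, confidence)
```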

ctcdecode/__init__.py

Lines changed: 26 additions & 16 deletions
@@ -4,21 +4,24 @@
 
 class CTCBeamDecoder(object):
     """
-    Pytorch wrapper for DeepSpeech PaddlePaddle Beam Search Decoder
-
+    PyTorch wrapper for DeepSpeech PaddlePaddle Beam Search Decoder.
     Args:
-        labels (list): The tokens/vocab used to train your model. They should be in the same order as they are in your model's outputs.
+        labels (list): The tokens/vocab used to train your model.
+            They should be in the same order as they are in your model's outputs.
         model_path (basestring): The path to your external KenLM language model (LM)
-        alpha (float): Weighting associated with the LMs probabilities. A weight of 0 means the LM has no effect.
+        alpha (float): Weighting associated with the LM's probabilities.
+            A weight of 0 means the LM has no effect.
         beta (float): Weight associated with the number of words within our beam.
-        cutoff_top_n (int): Cutoff number in pruning. Only the top cutoff_top_n characters with the highest probability in the vocab will be used in beam search.
+        cutoff_top_n (int): Cutoff number in pruning. Only the top cutoff_top_n characters
+            with the highest probability in the vocab will be used in beam search.
         cutoff_prob (float): Cutoff probability in pruning. 1.0 means no pruning.
-        beam_width (int): This controls how broad the beam search is. Higher values are more likely to find top beams, but they also
-            will make your beam search exponentially slower.
+        beam_width (int): This controls how broad the beam search is. Higher values are more likely to find top beams,
+            but they also will make your beam search exponentially slower.
         num_processes (int): Parallelize the batch using num_processes workers.
-        blank_id (int): Index of the blank token (probably 0) used when training your model so that ctcdecode can remove it during decoding.
-        log_probs_input (bool): Pass False if your model has passed through a softmax and output probabilities sum to 1. Pass True otherwise.
+        blank_id (int): Index of the CTC blank token (probably 0) used when training your model.
+        log_probs_input (bool): False if your model has passed through a softmax and output probabilities sum to 1.
     """
+
     def __init__(self, labels, model_path=None, alpha=0, beta=0, cutoff_top_n=40, cutoff_prob=1.0, beam_width=100,
                  num_processes=4, blank_id=0, log_probs_input=False):
         self.cutoff_top_n = cutoff_top_n
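Before the `decode` changes below, a quick aside: the constructor documented above maps onto a call like this minimal sketch. The five-token label set and the random model output are made up for illustration; only the `CTCBeamDecoder` signature comes from the diff itself.

```python
import multiprocessing

import torch
from ctcdecode import CTCBeamDecoder

# Hypothetical character vocabulary; index 0 is the CTC blank token,
# matching blank_id=0 below.
labels = ["_", " ", "a", "b", "c"]

decoder = CTCBeamDecoder(
    labels,
    beam_width=100,
    num_processes=multiprocessing.cpu_count(),  # parallelize across all CPUs
    blank_id=0,
    log_probs_input=False,  # the outputs below come from a softmax
)

# The README's sanity check for log_probs_input: softmax outputs sum to
# ~1.0 per timestep, while LogSoftmax outputs sum to a negative number.
output = torch.randn(1, 50, len(labels)).softmax(dim=-1)
print(output[0][0].sum())  # ~1.0 here, so log_probs_input=False is right
```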
@@ -36,19 +39,26 @@ def __init__(self, labels, model_path=None, alpha=0, beta=0, cutoff_top_n=40, cu
 
     def decode(self, probs, seq_lens=None):
         """
-        Conduct the beamsearch on model outputs and return results
-
+        Conducts the beamsearch on model outputs and returns results.
         Args:
             probs (Tensor) - A rank 3 tensor representing model outputs. Shape is batch x num_timesteps x num_labels.
-            seq_lens (Tensor) - A rank 1 tensor representing the sequence length of the items in the batch. Optional, if not provided the size of axis 1 (num_timesteps) of `probs` is used for all items
+            seq_lens (Tensor) - A rank 1 tensor representing the sequence length of the items in the batch. Optional;
+                if not provided, the size of axis 1 (num_timesteps) of `probs` is used for all items.
 
         Returns:
             tuple: (beam_results, beam_scores, timesteps, out_lens)
 
-            beam_results (Tensor): A rank 3 tensor representing the top n beams of a batch of items. Shape: batchsize x num_beams x num_timeteps. Results are still encoded as ints at this stage.
-            beam_scores (Tensor): A rank 3 tensor representing the likelihood of each beam in beam_results. Shape: batchsize x num_beams x num_timeteps
-            timesteps (Tensor): A rank 2 tensor representing the timesteps at which the nth output character has peak probability. To be used as alignment between audio and transcript. Shape: batchsize x num_beams
-            out_lens (Tensor): A rank 2 tensor representing the length of each beam in beam_results. Shape: batchsize x n_beams.
+            beam_results (Tensor): A 3-dim tensor representing the top n beams of a batch of items.
+                Shape: batchsize x num_beams x num_timesteps.
+                Results are still encoded as ints at this stage.
+            beam_scores (Tensor): A 3-dim tensor representing the likelihood of each beam in beam_results.
+                Shape: batchsize x num_beams x num_timesteps
+            timesteps (Tensor): A 2-dim tensor representing the timesteps at which the nth output character
+                has peak probability.
+                To be used as alignment between audio and transcript.
+                Shape: batchsize x num_beams
+            out_lens (Tensor): A 2-dim tensor representing the length of each beam in beam_results.
+                Shape: batchsize x n_beams.
 
         """
         probs = probs.cpu().float()
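To round out the docstring, a small usage sketch for `seq_lens` with a padded batch. It reuses the hypothetical `decoder` and `labels` from the earlier sketch, and the `torch.IntTensor` dtype is an assumption; the docstring only promises that a rank 1 tensor of lengths works:

```python
import torch

# Hypothetical padded batch: item 0 has 80 valid timesteps, item 1 only 60.
probs = torch.randn(2, 80, len(labels)).softmax(dim=-1)
seq_lens = torch.IntTensor([80, 60])  # rank 1: one length per batch item

beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs, seq_lens)

# Timesteps past seq_lens[i] are ignored for item i of the batch.
print(out_lens[:, 0])  # length of the top beam for each item
```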
