
Commit ffa7c9f

committed
⚡ Supported Streaming RNN Transducer
1 parent 1372623 commit ffa7c9f

18 files changed: +1243 -82 lines changed

README.md

Lines changed: 3 additions & 0 deletions
@@ -19,6 +19,7 @@ TensorFlowASR implements some automatic speech recognition architectures such as

 ## What's New?

+- (10/18/2020) Supported Streaming Transducer [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)
 - (10/15/2020) Add gradients accumulation and Refactor to TensorflowASR
 - (10/10/2020) Update documents and upload package to pypi
 - (10/6/2020) Change `nlpaug` version to `>=1.0.1`

@@ -32,6 +33,8 @@ TensorFlowASR implements some automatic speech recognition architectures such as

 - **Transducer Models** (End2end models using RNNT Loss for training)
   - **Conformer Transducer** (Reference: [https://arxiv.org/abs/2005.08100](https://arxiv.org/abs/2005.08100))
     See [examples/conformer](./examples/conformer)
+  - **Streaming Transducer** (Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621))
+    See [examples/streaming_transducer](./examples/streaming_transducer)

 ## Setup Environment and Datasets
Lines changed: 77 additions & 0 deletions
# Streaming End-to-end Speech Recognition For Mobile Devices

Reference: [https://arxiv.org/abs/1811.06621](https://arxiv.org/abs/1811.06621)

## Example Model YAML Config

```yaml
speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  feature_type: log_mel_spectrogram
  num_feature_bins: 80
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_feature: False

decoder_config:
  vocabulary: null
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: streaming_transducer
  subsampling:
    type: time_reduction
    factor: 3
  encoder_dim: 320
  encoder_units: 1024
  encoder_layers: 7
  encoder_layer_norm: True
  encoder_type: lstm
  embed_dim: 320
  embed_dropout: 0.1
  num_rnns: 1
  rnn_units: 320
  rnn_type: lstm
  layer_norm: True
  joint_dim: 320

learning_config:
  augmentations:
    after:
      time_masking:
        num_masks: 10
        mask_factor: 100
        p_upperbound: 0.2
      freq_masking:
        num_masks: 1
        mask_factor: 27

  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...

  running_config:
    batch_size: 4
    num_epochs: 22
    outdir: ...
    log_interval_steps: 400
    save_interval_steps: 400
    eval_interval_steps: 1000
```
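A minimal sketch of consuming a config like the one above, following the loading pattern used by this commit's test scripts (`UserConfig`, `TFSpeechFeaturizer`, `CharFeaturizer`, `StreamingTransducer`); the config path here is a placeholder, not a path guaranteed by the repository:

```python
# Sketch only: build the model from a YAML config like the example above.
# Mirrors the pattern in this commit's test scripts; the path is a placeholder.
from tensorflow_asr.configs.user_config import UserConfig
from tensorflow_asr.featurizers.speech_featurizers import TFSpeechFeaturizer
from tensorflow_asr.featurizers.text_featurizers import CharFeaturizer
from tensorflow_asr.models.streaming_transducer import StreamingTransducer

CONFIG_PATH = "config.yml"  # placeholder path to a config such as the one above

config = UserConfig(CONFIG_PATH, CONFIG_PATH, learning=True)
speech_featurizer = TFSpeechFeaturizer(config["speech_config"])
text_featurizer = CharFeaturizer(config["decoder_config"])

model = StreamingTransducer(
    vocabulary_size=text_featurizer.num_classes,
    **config["model_config"]
)
model._build(speech_featurizer.shape)  # build layers for the featurizer's input shape
model.summary(line_length=150)
```

From here the test scripts further down in this commit hand the built model, together with both featurizers, to `BaseTester`.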
## Usage
Training, see `python examples/streamingTransducer/train_streaming_transducer.py --help`

Testing, see `python examples/streamingTransducer/test_streaming_transducer.py --help`

TFLite Conversion, see `python examples/streamingTransducer/tflite_streaming_transducer.py --help`
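The TFLite conversion script itself is not included in this excerpt, so purely as a hedged illustration, the sketch below drives an already-converted model file with TensorFlow's generic `tf.lite.Interpreter`. The file name `streaming_transducer.tflite` and the single audio-input / single-output signature are assumptions for illustration; the real exported signature (including any state tensors needed for streaming) may differ.

```python
# Hedged sketch: invoke an already-converted TFLite model with the generic
# tf.lite.Interpreter API. The model path and single-input/single-output
# signature are assumptions; the actual export from this repo may differ.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="streaming_transducer.tflite")  # hypothetical file
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# One second of 16 kHz audio as a placeholder input signal.
signal = np.zeros([16000], dtype=np.float32)
interpreter.resize_tensor_input(input_details[0]["index"], signal.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]["index"], signal)
interpreter.invoke()

print(interpreter.get_tensor(output_details[0]["index"]))
```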
Lines changed: 79 additions & 0 deletions
# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

speech_config:
  sample_rate: 16000
  frame_ms: 25
  stride_ms: 10
  num_feature_bins: 80
  feature_type: log_mel_spectrogram
  preemphasis: 0.97
  normalize_signal: True
  normalize_feature: True
  normalize_per_feature: False

decoder_config:
  vocabulary: null
  target_vocab_size: 1024
  max_subword_length: 4
  blank_at_zero: True
  beam_width: 5
  norm_score: True

model_config:
  name: streaming_transducer
  reduction_factor: 2
  reduction_positions: [1]
  encoder_dim: 320
  encoder_units: 1024
  encoder_layers: 8
  encoder_layer_norm: True
  encoder_type: lstm
  embed_dim: 320
  embed_dropout: 0.1
  num_rnns: 1
  rnn_units: 320
  rnn_type: lstm
  layer_norm: True
  joint_dim: 320

learning_config:
  augmentations:
    after:
      time_masking:
        num_masks: 10
        mask_factor: 100
        p_upperbound: 0.05
      freq_masking:
        num_masks: 1
        mask_factor: 27

  dataset_config:
    train_paths:
      - /mnt/Data/ML/ASR/Raw/LibriSpeech/train-clean-100/transcripts.tsv
    eval_paths:
      - /mnt/Data/ML/ASR/Raw/LibriSpeech/dev-clean/transcripts.tsv
      - /mnt/Data/ML/ASR/Raw/LibriSpeech/dev-other/transcripts.tsv
    test_paths:
      - /mnt/Data/ML/ASR/Raw/LibriSpeech/test-clean/transcripts.tsv
    tfrecords_dir: null

  running_config:
    batch_size: 2
    accumulation_steps: 1
    num_epochs: 20
    outdir: /mnt/Projects/asrk16/trained/local/librispeech/streaming_transducer
    log_interval_steps: 300
    eval_interval_steps: 500
    save_interval_steps: 1000
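In the `model_config` above, `reduction_factor: 2` applied at `reduction_positions: [1]` subsamples the encoder's time axis, in the spirit of the time-reduction layer from the referenced paper (arXiv:1811.06621). The sketch below illustrates that idea by stacking adjacent frames along the feature axis; it is written for intuition only and is not taken from this commit's implementation (the class name `TimeReduction` here is hypothetical).

```python
# Illustrative sketch of a time-reduction step: concatenate `factor` adjacent
# frames along the feature axis so the time dimension shrinks by `factor`.
# Not the implementation from this commit; for intuition only.
import tensorflow as tf


class TimeReduction(tf.keras.layers.Layer):
    def __init__(self, factor: int, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        # inputs: [batch, time, feature]
        batch = tf.shape(inputs)[0]
        time = tf.shape(inputs)[1]
        feature = inputs.shape[-1]
        pad = (-time) % self.factor  # pad time so it divides evenly
        inputs = tf.pad(inputs, [[0, 0], [0, pad], [0, 0]])
        return tf.reshape(
            inputs, [batch, (time + pad) // self.factor, feature * self.factor])


# reduction_factor: 2 halves the number of encoder time steps:
x = tf.random.normal([1, 10, 320])
print(TimeReduction(2)(x).shape)  # (1, 5, 640)
```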
Lines changed: 100 additions & 0 deletions
# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import argparse
from tensorflow_asr.utils import setup_environment, setup_devices

setup_environment()
import tensorflow as tf

DEFAULT_YAML = os.path.join(os.path.abspath(os.path.dirname(__file__)), "config.yml")

tf.keras.backend.clear_session()

parser = argparse.ArgumentParser(prog="Streaming Transducer Testing")

parser.add_argument("--config", type=str, default=DEFAULT_YAML,
                    help="The file path of model configuration file")

parser.add_argument("--saved", type=str, default=None,
                    help="Path to saved model")

parser.add_argument("--tfrecords", default=False, action="store_true",
                    help="Whether to use tfrecords as dataset")

parser.add_argument("--mxp", default=False, action="store_true",
                    help="Enable mixed precision")

parser.add_argument("--device", type=int, default=0,
                    help="Device's id to run test on")

parser.add_argument("--cpu", default=False, action="store_true",
                    help="Whether to only use cpu")

parser.add_argument("--output_name", type=str, default="test",
                    help="Result filename name prefix")

args = parser.parse_args()

# enable mixed precision and select the device before importing the rest
tf.config.optimizer.set_experimental_options({"auto_mixed_precision": args.mxp})

setup_devices([args.device], cpu=args.cpu)

from tensorflow_asr.configs.user_config import UserConfig
from tensorflow_asr.datasets.asr_dataset import ASRTFRecordDataset, ASRSliceDataset
from tensorflow_asr.featurizers.speech_featurizers import TFSpeechFeaturizer
from tensorflow_asr.featurizers.text_featurizers import CharFeaturizer
from tensorflow_asr.runners.base_runners import BaseTester
from tensorflow_asr.models.streaming_transducer import StreamingTransducer

# featurizers are built from the YAML config (character-level text featurizer)
config = UserConfig(DEFAULT_YAML, args.config, learning=True)
speech_featurizer = TFSpeechFeaturizer(config["speech_config"])
text_featurizer = CharFeaturizer(config["decoder_config"])

tf.random.set_seed(0)
assert args.saved

# test dataset: TFRecords if requested, otherwise slices built from transcript files
if args.tfrecords:
    test_dataset = ASRTFRecordDataset(
        data_paths=config["learning_config"]["dataset_config"]["test_paths"],
        tfrecords_dir=config["learning_config"]["dataset_config"]["tfrecords_dir"],
        speech_featurizer=speech_featurizer,
        text_featurizer=text_featurizer,
        stage="test", shuffle=False
    )
else:
    test_dataset = ASRSliceDataset(
        data_paths=config["learning_config"]["dataset_config"]["test_paths"],
        speech_featurizer=speech_featurizer,
        text_featurizer=text_featurizer,
        stage="test", shuffle=False
    )

# build model and restore the saved weights
streaming_transducer = StreamingTransducer(
    vocabulary_size=text_featurizer.num_classes,
    **config["model_config"]
)
streaming_transducer._build(speech_featurizer.shape)
streaming_transducer.load_weights(args.saved, by_name=True)
streaming_transducer.summary(line_length=150)
streaming_transducer.add_featurizers(speech_featurizer, text_featurizer)

# run evaluation over the test dataset
streaming_transducer_tester = BaseTester(
    config=config["learning_config"]["running_config"],
    output_name=args.output_name
)
streaming_transducer_tester.compile(streaming_transducer)
streaming_transducer_tester.run(test_dataset)
Lines changed: 108 additions & 0 deletions
# Copyright 2020 Huy Le Nguyen (@usimarit)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import argparse
from tensorflow_asr.utils import setup_environment, setup_devices

setup_environment()
import tensorflow as tf

DEFAULT_YAML = os.path.join(os.path.abspath(os.path.dirname(__file__)), "config.yml")

tf.keras.backend.clear_session()

parser = argparse.ArgumentParser(prog="Streaming Transducer Testing")

parser.add_argument("--config", type=str, default=DEFAULT_YAML,
                    help="The file path of model configuration file")

parser.add_argument("--saved", type=str, default=None,
                    help="Path to saved model")

parser.add_argument("--tfrecords", default=False, action="store_true",
                    help="Whether to use tfrecords as dataset")

parser.add_argument("--mxp", default=False, action="store_true",
                    help="Enable mixed precision")

parser.add_argument("--device", type=int, default=0,
                    help="Device's id to run test on")

parser.add_argument("--cpu", default=False, action="store_true",
                    help="Whether to only use cpu")

parser.add_argument("--subwords", type=str, default=None,
                    help="Path to file that stores generated subwords")

parser.add_argument("--output_name", type=str, default="test",
                    help="Result filename name prefix")

args = parser.parse_args()

# enable mixed precision and select the device before importing the rest
tf.config.optimizer.set_experimental_options({"auto_mixed_precision": args.mxp})

setup_devices([args.device], cpu=args.cpu)

from tensorflow_asr.configs.user_config import UserConfig
from tensorflow_asr.datasets.asr_dataset import ASRTFRecordDataset, ASRSliceDataset
from tensorflow_asr.featurizers.speech_featurizers import TFSpeechFeaturizer
from tensorflow_asr.featurizers.text_featurizers import SubwordFeaturizer
from tensorflow_asr.runners.base_runners import BaseTester
from tensorflow_asr.models.streaming_transducer import StreamingTransducer

config = UserConfig(DEFAULT_YAML, args.config, learning=True)
speech_featurizer = TFSpeechFeaturizer(config["speech_config"])

# this variant requires a pre-generated subword vocabulary file
if args.subwords and os.path.exists(args.subwords):
    print("Loading subwords ...")
    text_featurizer = SubwordFeaturizer.load_from_file(config["decoder_config"], args.subwords)
else:
    raise ValueError("subwords must be set")

tf.random.set_seed(0)
assert args.saved

# test dataset: TFRecords if requested, otherwise slices built from transcript files
if args.tfrecords:
    test_dataset = ASRTFRecordDataset(
        data_paths=config["learning_config"]["dataset_config"]["test_paths"],
        tfrecords_dir=config["learning_config"]["dataset_config"]["tfrecords_dir"],
        speech_featurizer=speech_featurizer,
        text_featurizer=text_featurizer,
        stage="test", shuffle=False
    )
else:
    test_dataset = ASRSliceDataset(
        data_paths=config["learning_config"]["dataset_config"]["test_paths"],
        speech_featurizer=speech_featurizer,
        text_featurizer=text_featurizer,
        stage="test", shuffle=False
    )

# build model and restore the saved weights
streaming_transducer = StreamingTransducer(
    vocabulary_size=text_featurizer.num_classes,
    **config["model_config"]
)
streaming_transducer._build(speech_featurizer.shape)
streaming_transducer.load_weights(args.saved, by_name=True)
streaming_transducer.summary(line_length=150)
streaming_transducer.add_featurizers(speech_featurizer, text_featurizer)

# run evaluation over the test dataset
streaming_transducer_tester = BaseTester(
    config=config["learning_config"]["running_config"],
    output_name=args.output_name
)
streaming_transducer_tester.compile(streaming_transducer)
streaming_transducer_tester.run(test_dataset)
