Commit 1ddc6e6

Lint the Transfer Learning with YAMNet tutorial
1 parent 1285360 commit 1ddc6e6

File tree

1 file changed: +30 -30 lines changed


site/en/tutorials/audio/transfer_learning_audio.ipynb

Lines changed: 30 additions & 30 deletions
@@ -64,7 +64,7 @@
 "source": [
 "# Transfer Learning with YAMNet for environmental sound classification\n",
 "\n",
-"[YAMNet](https://tfhub.dev/google/yamnet/1) is an audio event classifier that can predict audio events from [521 classes](https://github.com/tensorflow/models/blob/master/research/audioset/yamnet/yamnet_class_map.csv), like laughter, barking, or a siren. \n",
+"[YAMNet](https://tfhub.dev/google/yamnet/1) is a pretrained deep neural network that can predict audio events from [521 classes](https://github.com/tensorflow/models/blob/master/research/audioset/yamnet/yamnet_class_map.csv), like laughter, barking, or a siren. \n",
 "\n",
 " In this tutorial you will learn how to:\n",
 "\n",
@@ -130,27 +130,27 @@
 "source": [
 "## About YAMNet\n",
 "\n",
-"YAMNet is an audio event classifier that takes audio waveform as input and makes independent predictions for each of 521 audio events from the [AudioSet](https://research.google.com/audioset/) ontology.\n",
+"[YAMNet](https://github.com/tensorflow/models/tree/master/research/audioset/yamnet) is a pretrained neural network that employs the [MobileNetV1](https://arxiv.org/abs/1704.04861) depthwise-separable convolution architecture. It can use an audio waveform as input and classify 521 audio events from the [AudioSet](http://g.co/audioset) corpus.\n",
 "\n",
-"Internally, the model extracts \"frames\" from the audio signal and processes batches of these frames. This version of the model uses frames that are 0.96s long and extracts one frame every 0.48s.\n",
+"Internally, the model extracts \"frames\" from the audio signal and processes batches of these frames. This version of the model uses frames that are 0.96 seconds long and extracts one frame every 0.48 seconds.\n",
 "\n",
 "The model accepts a 1-D float32 Tensor or NumPy array containing a waveform of arbitrary length, represented as mono 16 kHz samples in the range `[-1.0, +1.0]`. This tutorial contains code to help you convert a `.wav` file into the correct format.\n",
 "\n",
 "The model returns 3 outputs, including the class scores, embeddings (which you will use for transfer learning), and the log mel spectrogram. You can find more details [here](https://tfhub.dev/google/yamnet/1), and this tutorial will walk you through using these in practice.\n",
 "\n",
-"One specific use of YAMNet is as a high-level feature extractor: the `1024-D` embedding output of YAMNet can be used as the input features of another shallow model which can then be trained on a small amount of data for a particular task. This allows the quick creation of specialized audio classifiers without requiring a lot of labeled data and without having to train a large model end-to-end.\n",
+"One specific use of YAMNet is as a high-level feature extractor: the 1024-dimensional embedding output of YAMNet can be used as the input features of another shallow model which can then be trained on a small amount of data for a particular task. This allows the quick creation of specialized audio classifiers without requiring a lot of labeled data and without having to train a large model end-to-end.\n",
 "\n",
 "You will use YAMNet's embeddings output for transfer learning and train one or more [Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense) layers on top of this.\n",
 "\n",
 "First, you will try the model and see the results of classifying audio. You will then construct the data pre-processing pipeline.\n",
 "\n",
 "### Loading YAMNet from TensorFlow Hub\n",
 "\n",
-"You are going to use YAMNet from [Tensorflow Hub](https://tfhub.dev/) to extract the embeddings from the sound files.\n",
+"You are going to use a pre-trained YAMNet from [TensorFlow Hub](https://tfhub.dev/) to extract the embeddings from the sound files.\n",
 "\n",
-"Loading a model from TensorFlow Hub is straightforward: choose the model, copy its URL and use the `load` function.\n",
+"Loading a model from TensorFlow Hub is straightforward: choose the model, copy its URL, and use the `load` function.\n",
 "\n",
-"Note: to read the documentation of the model, you can use the model url in your browser."
+"Note: to read the documentation of the model, use the model URL in your browser."
 ]
 },
 {
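
For reference, the loading step this hunk describes is a single call; a minimal sketch, assuming the YAMNet handle cited in the tutorial:

```python
import tensorflow_hub as hub

# Load the pretrained YAMNet model from TensorFlow Hub.
yamnet_model_handle = 'https://tfhub.dev/google/yamnet/1'
yamnet_model = hub.load(yamnet_model_handle)
```
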
@@ -171,7 +171,7 @@
 "id": "GmrPJ0GHw9rr"
 },
 "source": [
-"With the model loaded and following the [models's basic usage tutorial](https://www.tensorflow.org/hub/tutorials/yamnet) you'll download a sample wav file and run the inference.\n"
+"With the model loaded, you can follow the [YAMNet basic usage tutorial](https://www.tensorflow.org/hub/tutorials/yamnet) and download a sample WAV file to run inference.\n"
 ]
 },
 {
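
A hedged sketch of that download step using `tf.keras.utils.get_file`; the file name and URL below are placeholders, not the tutorial's actual sample:

```python
import tensorflow as tf

# Download a sample WAV file to run inference on.
# The URL here is hypothetical; substitute the sample from the basic usage tutorial.
sample_wav_path = tf.keras.utils.get_file(
    'sample_16k.wav',
    'https://example.com/sample_16k.wav',  # placeholder URL
    cache_dir='./', cache_subdir='test_data')
```
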
@@ -196,9 +196,9 @@
 "id": "mBm9y9iV2U_-"
 },
 "source": [
-"You will need a function to load the audio files. They will also be used later when working with the training data.\n",
+"You will need a function to load audio files, which will also be used later when working with the training data.\n",
 "\n",
-"Note: The returned `wav_data` from `load_wav_16k_mono` is already normalized to values in `[-1.0, 1.0]` (as stated in the model's [documentation](https://tfhub.dev/google/yamnet/1))."
+"Note: The returned `wav_data` from `load_wav_16k_mono` is already normalized to values in the `[-1.0, 1.0]` range (as stated in the model's [documentation](https://tfhub.dev/google/yamnet/1))."
 ]
 },
 {
@@ -209,7 +209,7 @@
 },
 "outputs": [],
 "source": [
-"# Util functions for loading audio files and ensure the correct sample rate\n",
+"# Utility functions for loading audio files and ensuring the correct sample rate\n",
 "\n",
 "@tf.function\n",
 "def load_wav_16k_mono(filename):\n",
@@ -248,7 +248,7 @@
 "source": [
 "### Load the class mapping\n",
 "\n",
-"It's important to load the class names that YAMNet is able to recognize. The mapping file is present at `yamnet_model.class_map_path()`, in the `csv` format."
+"It's important to load the class names that YAMNet is able to recognize. The mapping file is present at `yamnet_model.class_map_path()` in the CSV format."
 ]
 },
 {
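
A minimal sketch of reading that mapping, assuming the class map CSV exposes a `display_name` column (as the linked `yamnet_class_map.csv` does):

```python
import csv

# Read the class names from the model's bundled class map CSV.
class_map_path = yamnet_model.class_map_path().numpy().decode('utf-8')
with open(class_map_path) as f:
    class_names = [row['display_name'] for row in csv.DictReader(f)]
```
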
@@ -275,7 +275,7 @@
 "source": [
 "### Run inference\n",
 "\n",
-"YAMNet provides frame-level class-scores (i.e., 521 scores for every frame). In order to determine clip-level predictions, the scores can be aggregated per-class across frames (e.g., using mean or max aggregation). This is done below by `scores_np.mean(axis=0)`. Finally, in order to find the top-scored class at the clip-level, we take the maximum of the 521 aggregated scores.\n"
+"YAMNet provides frame-level class scores (i.e., 521 scores for every frame). To determine clip-level predictions, the scores can be aggregated per class across frames (e.g., using mean or max aggregation). This is done below by `scores_np.mean(axis=0)`. Finally, to find the top-scored class at the clip level, you take the maximum of the 521 aggregated scores.\n"
 ]
 },
 {
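
A short sketch of that aggregation, assuming `wav_data` holds a 16 kHz mono waveform from `load_wav_16k_mono` and `class_names` is the list loaded above:

```python
# Run YAMNet: it returns per-frame scores, embeddings, and a log mel spectrogram.
scores, embeddings, log_mel_spectrogram = yamnet_model(wav_data)

# Aggregate the 521 per-frame scores into clip-level scores with the mean,
# then pick the top-scoring class.
scores_np = scores.numpy()
clip_scores = scores_np.mean(axis=0)
top_class = class_names[clip_scores.argmax()]
print(f'The main sound is: {top_class}')
```
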
@@ -301,7 +301,7 @@
 "id": "YBaLNg5H5IWa"
 },
 "source": [
-"Note: The model correctly inferred an animal sound. Your goal is to increase accuracy for specific classes. Also, notice that the the model generated 13 embeddings, 1 per frame."
+"Note: The model correctly inferred an animal sound. Your goal in this tutorial is to increase the model's accuracy for specific classes. Also, notice that the model generated 13 embeddings, 1 per frame."
 ]
 },
 {
@@ -312,9 +312,9 @@
 "source": [
 "## ESC-50 dataset\n",
 "\n",
-"The [ESC-50 dataset](https://github.com/karolpiczak/ESC-50#repository-content), well described [here](https://www.karolpiczak.com/papers/Piczak2015-ESC-Dataset.pdf), is a labeled collection of 2000 environmental audio recordings (each 5 seconds long). The data consists of 50 classes, with 40 examples per class.\n",
+"The [ESC-50 dataset](https://github.com/karolpiczak/ESC-50#repository-content) - described in detail [here](https://www.karolpiczak.com/papers/Piczak2015-ESC-Dataset.pdf) - is a labeled collection of 2,000 five-second-long environmental audio recordings. The data consists of 50 classes, with 40 examples per class.\n",
 "\n",
-"Next, you will download and extract it. \n"
+"Download the dataset and extract it. \n"
 ]
 },
 {
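
A hedged sketch of the download-and-extract step, assuming the repository's GitHub archive URL (the extracted folder matches the `./datasets/ESC-50-master/` path referenced in the next hunk):

```python
import tensorflow as tf

# Download the ESC-50 repository archive and extract it under ./datasets/.
_ = tf.keras.utils.get_file(
    'esc-50.zip',
    'https://github.com/karolpiczak/ESC-50/archive/master.zip',
    cache_dir='./', cache_subdir='datasets',
    extract=True)
```
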
@@ -344,7 +344,7 @@
 "\n",
 "and all the audio files are in `./datasets/ESC-50-master/audio/`\n",
 "\n",
-"You will create a pandas dataframe with the mapping and use that to have a clearer view of the data.\n"
+"You will create a pandas DataFrame with the mapping and use that to have a clearer view of the data.\n"
 ]
 },
 {
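
For illustration, loading that mapping into a DataFrame might look like this; the `meta/esc50.csv` location is an assumption about the extracted layout:

```python
import pandas as pd

# Load the ESC-50 metadata (filename-to-label mapping) into a DataFrame.
esc50_csv = './datasets/ESC-50-master/meta/esc50.csv'  # assumed path
pd_data = pd.read_csv(esc50_csv)
print(pd_data.head())
```
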
@@ -370,11 +370,11 @@
 "source": [
 "### Filter the data\n",
 "\n",
-"Given the data on the dataframe, you will apply some transformations:\n",
+"Now that the data is stored in the DataFrame, apply some transformations:\n",
 "\n",
-"- filter out rows and use only the selected classes (dog and cat). If you want to use any other classes, this is where you can choose them.\n",
-"- change the filename to have the full path. This will make loading easier later.\n",
-"- change targets to be within a specific range. In this example, dog will remain 0, but cat will become 1 instead of its original value of 5."
+"- Filter out rows and use only the selected classes - `dog` and `cat`. If you want to use any other classes, this is where you can choose them.\n",
+"- Amend the filename to have the full path. This will make loading easier later.\n",
+"- Change targets to be within a specific range. In this example, `dog` will remain at `0`, but `cat` will become `1` instead of its original value of `5`."
 ]
 },
 {
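
A sketch of those three transformations, assuming the ESC-50 metadata columns `category`, `filename`, and `target`:

```python
import os

my_classes = ['dog', 'cat']
map_class_to_id = {'dog': 0, 'cat': 1}

# Keep only the rows for the selected classes.
filtered_pd = pd_data[pd_data.category.isin(my_classes)]

# Remap targets: dog stays 0, cat becomes 1 (instead of its original 5).
filtered_pd = filtered_pd.assign(
    target=filtered_pd['category'].map(map_class_to_id))

# Expand each filename to a full path for easier loading later.
base_data_path = './datasets/ESC-50-master/audio/'
filtered_pd = filtered_pd.assign(
    filename=filtered_pd['filename'].apply(
        lambda name: os.path.join(base_data_path, name)))
```
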
@@ -418,9 +418,9 @@
 "id": "AKDT5RomaDKO"
 },
 "source": [
-"Your model will use each frame as one input so you need to to create a new column that has one frame per row. You also need to expand the labels and fold column to proper reflect these new rows.\n",
+"Your model will use each frame as one input. Therefore, you need to create a new column that has one frame per row. You also need to expand the labels and the fold column to properly reflect these new rows.\n",
 "\n",
-"The expanded fold column keeps the original value. You cannot mix frames because, when doing the splits, you might end with parts of the same audio on different splits and that would make our validation and test steps less effective."
+"The expanded fold column keeps the original value. You cannot mix frames because, when performing the splits, you might end up having parts of the same audio on different splits - that would make your validation and test steps less effective."
 ]
 },
 {
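
One way to implement that expansion with `tf.data`; a sketch, not the commit's exact code, assuming a dataset `main_ds` of `(wav_data, label, fold)` tuples:

```python
def extract_embedding(wav_data, label, fold):
    """Run YAMNet and repeat the label and fold once per extracted frame."""
    scores, embeddings, spectrogram = yamnet_model(wav_data)
    num_embeddings = tf.shape(embeddings)[0]
    return (embeddings,
            tf.repeat(label, num_embeddings),
            tf.repeat(fold, num_embeddings))

# unbatch() after the map yields one row per frame.
main_ds = main_ds.map(extract_embedding).unbatch()
```
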
@@ -525,7 +525,7 @@
 "## Create your model\n",
 "\n",
 "You did most of the work!\n",
-"Next, define a very simple Sequential Model to start with -- one hiden layer and 2 outputs to recognize cats and dogs.\n"
+"Next, define a very simple Sequential model with one hidden layer and two outputs to recognize cats and dogs.\n"
 ]
 },
 {
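
A sketch of such a model, taking the 1024-dimensional YAMNet embeddings as input; the hidden-layer width is illustrative:

```python
my_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,), dtype=tf.float32,
                          name='input_embedding'),
    tf.keras.layers.Dense(512, activation='relu'),  # one hidden layer
    tf.keras.layers.Dense(2)                        # two outputs: dog, cat
], name='my_model')
```
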
@@ -641,15 +641,15 @@
 "id": "k2yleeev645r"
 },
 "source": [
-"## Save a model that can directly take a wav file as input\n",
+"## Save a model that can directly take a WAV file as input\n",
 "\n",
 "Your model works when you give it the embeddings as input.\n",
 "\n",
-"In a real situation you'll want to give it the sound data directly.\n",
+"In a real-world scenario, you'll want to use audio data as a direct input.\n",
 "\n",
-"To do that you will combine YAMNet with your model into one single model that you can export for other applications.\n",
+"To do that, you will combine YAMNet with your model into a single model that you can export for other applications.\n",
 "\n",
-"To make it easier to use the model's result, the final layer will be a `reduce_mean` operation. When using this model for serving, as you will see bellow, you will need the name of the final layer. If you don't define one, TF will auto define an incremental one that makes it hard to test as it will keep changing everytime you train the model. When using a raw tf operation you can't assign a name to it. To address this issue, you'll create a custom layer that just apply `reduce_mean` and you will call it 'classifier'.\n"
+"To make it easier to use the model's result, the final layer will be a `reduce_mean` operation. When using this model for serving, as you will see below, you will need the name of the final layer. If you don't define one, TensorFlow will auto-define an incremental one that makes it hard to test, as it will keep changing every time you train the model. When using a raw TensorFlow operation, you can't assign a name to it. To address this issue, you'll create a custom layer that just applies `reduce_mean`, and you will call it `'classifier'`.\n"
 ]
 },
 {
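
A sketch of such a named wrapper layer; subclassing `tf.keras.layers.Layer` lets you pass `name='classifier'` where a raw op could not be named:

```python
class ReduceMeanLayer(tf.keras.layers.Layer):
    """Thin wrapper so the final reduce_mean op can carry a stable name."""

    def __init__(self, axis=0, **kwargs):
        super().__init__(**kwargs)
        self.axis = axis

    def call(self, inputs):
        return tf.math.reduce_mean(inputs, axis=self.axis)

# Usage: per-frame outputs are averaged into one clip-level prediction.
# output = ReduceMeanLayer(axis=0, name='classifier')(frame_outputs)
```
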
@@ -828,9 +828,9 @@
 "source": [
 "## Next steps\n",
 "\n",
-"You just created a model that can classify sounds from dogs or cats. With the same idea and proper data you could, for example, build a bird recognizer based on their singing.\n",
+"You have created a model that can classify sounds from dogs or cats. With the same idea and a different dataset you can try, for example, building an [acoustic identifier of birds](https://www.kaggle.com/c/birdclef-2021/) based on their singing.\n",
 "\n",
-"Let us know what you come up with! Share your project with us on social media.\n"
+"Share your project with the TensorFlow team on social media!\n"
 ]
 },
 ]
 }
 ],
