Update distributed model saving/loading tutorial: Add example showing how saving works after calling .fit. Also fix some typos and lint issues, and add some minor details.

tensorflower-gardener · copybara-github · commit 8aea6cc53d01 · 2022-03-29T15:22:48.000-07:00
PiperOrigin-RevId: 438140288
diff --git a/site/en/tutorials/distribute/save_and_load.ipynb b/site/en/tutorials/distribute/save_and_load.ipynb
@@ -71,7 +71,9 @@
       "source": [
         "## Overview\n",
         "\n",
-        "It's common to save and load a model during training. There are two sets of APIs for saving and loading a keras model: a high-level API, and a low-level API. This tutorial demonstrates how you can use the SavedModel APIs when using `tf.distribute.Strategy`. To learn about SavedModel and serialization in general, please read the [saved model guide](../../guide/saved_model.ipynb), and the [Keras model serialization guide](https://www.tensorflow.org/guide/keras/save_and_serialize). Let's start with a simple example: "
+        "This tutorial demonstrates how you can save and load models in the SavedModel format with `tf.distribute.Strategy` during or after training. There are two kinds of APIs for saving and loading a Keras model: high-level (`tf.keras.Model.save` and `tf.keras.models.load_model`) and low-level (`tf.saved_model.save` and `tf.saved_model.load`).\n",
+        "\n",
+        "To learn about SavedModel and serialization in general, please read the [saved model guide](../../guide/saved_model.ipynb), and the [Keras model serialization guide](https://www.tensorflow.org/guide/keras/save_and_serialize). Let's start with a simple example: "
       ]
     },
     {
@@ -102,7 +104,7 @@
         "id": "qqapWj98ptNV"
       },
       "source": [
-        "Prepare the data and model using `tf.distribute.Strategy`:"
+        "Load and prepare the data with TensorFlow Datasets and `tf.data`, and create the model using `tf.distribute.MirroredStrategy`:"
       ]
     },
     {
@@ -116,7 +118,7 @@
         "mirrored_strategy = tf.distribute.MirroredStrategy()\n",
         "\n",
         "def get_data():\n",
-        "  datasets, ds_info = tfds.load(name='mnist', with_info=True, as_supervised=True)\n",
+        "  datasets = tfds.load(name='mnist', as_supervised=True)\n",
         "  mnist_train, mnist_test = datasets['train'], datasets['test']\n",
         "\n",
         "  BUFFER_SIZE = 10000\n",
@@ -157,7 +159,7 @@
         "id": "qmU4Y3feS9Na"
       },
       "source": [
-        "Train the model: "
+        "Train the model with `tf.keras.Model.fit`: "
       ]
     },
     {
@@ -181,11 +183,11 @@
       "source": [
         "## Save and load the model\n",
         "\n",
-        "Now that you have a simple model to work with, let's take a look at the saving/loading APIs. \n",
-        "There are two sets of APIs available:\n",
+        "Now that you have a simple model to work with, let's explore the saving/loading APIs. \n",
+        "There are two kinds of APIs available:\n",
         "\n",
-        "*   High level keras `model.save` and `tf.keras.models.load_model`\n",
-        "*   Low level `tf.saved_model.save` and `tf.saved_model.load`\n"
+        "*   High-level (Keras): `Model.save` and `tf.keras.models.load_model`\n",
+        "*   Low-level: `tf.saved_model.save` and `tf.saved_model.load`\n"
       ]
     },
     {
@@ -194,7 +196,7 @@
         "id": "FX_IF2F1tvFs"
       },
       "source": [
-        "### The Keras APIs"
+        "### The Keras API"
       ]
     },
     {
@@ -203,7 +205,7 @@
         "id": "O8xfceg4Z3H_"
       },
       "source": [
-        "Here is an example of saving and loading a model with the Keras APIs:"
+        "Here is an example of saving and loading a model with the Keras API:"
       ]
     },
     {
@@ -214,7 +216,7 @@
       },
       "outputs": [],
       "source": [
-        "keras_model_path = \"/tmp/keras_save\"\n",
+        "keras_model_path = '/tmp/keras_save'\n",
         "model.save(keras_model_path)"
       ]
     },
@@ -245,9 +247,9 @@
         "id": "gYAnskzorda-"
       },
       "source": [
-        "After restoring the model, you can continue training on it, even without needing to call `compile()` again, since it is already compiled before saving. The model is saved in the TensorFlow's standard `SavedModel` proto format. For more information, please refer to [the guide to `saved_model` format](../../guide/saved_model.ipynb).\n",
+        "After restoring the model, you can continue training on it, even without needing to call `Model.compile` again, since it was already compiled before saving. The model is saved in TensorFlow's standard `SavedModel` proto format. For more information, please refer to [the guide to the `SavedModel` format](../../guide/saved_model.ipynb).\n",
         "\n",
-        "Now to load the model and train it using a `tf.distribute.Strategy`:"
+        "Now, restore the model and train it using a `tf.distribute.Strategy`:"
       ]
     },
     {
@@ -258,7 +260,7 @@
       },
       "outputs": [],
       "source": [
-        "another_strategy = tf.distribute.OneDeviceStrategy(\"/cpu:0\")\n",
+        "another_strategy = tf.distribute.OneDeviceStrategy('/cpu:0')\n",
         "with another_strategy.scope():\n",
         "  restored_keras_model_ds = tf.keras.models.load_model(keras_model_path)\n",
         "  restored_keras_model_ds.fit(train_dataset, epochs=2)"
@@ -270,7 +272,7 @@
         "id": "PdiiPmL5tQk5"
       },
       "source": [
-        "As you can see, loading works as expected with `tf.distribute.Strategy`. The strategy used here does not have to be the same strategy used before saving. "
+        "As the `Model.fit` output shows, loading works as expected with `tf.distribute.Strategy`. The strategy used here does not have to be the same strategy used before saving. "
       ]
     },
     {
@@ -279,7 +281,7 @@
         "id": "3CrXIbmFt0f6"
       },
       "source": [
-        "### The `tf.saved_model` APIs"
+        "### The `tf.saved_model` API"
       ]
     },
     {
@@ -288,7 +290,7 @@
         "id": "HtGzPp6et4Em"
       },
       "source": [
-        "Now let's take a look at the lower level APIs. Saving the model is similar to the keras API:"
+        "Saving the model with the lower-level API is similar to the Keras API:"
       ]
     },
     {
@@ -300,7 +302,7 @@
       "outputs": [],
       "source": [
         "model = get_model()  # get a fresh model\n",
-        "saved_model_path = \"/tmp/tf_save\"\n",
+        "saved_model_path = '/tmp/tf_save'\n",
         "tf.saved_model.save(model, saved_model_path)"
       ]
     },
@@ -310,7 +312,7 @@
         "id": "q1QNRYcwuRll"
       },
       "source": [
-        "Loading can be done with `tf.saved_model.load()`. However, since it is an API that is on the lower level (and hence has a wider range of use cases), it does not return a Keras model. Instead, it returns an object that contain functions that can be used to do inference. For example:"
+        "Loading can be done with `tf.saved_model.load`. However, since it is a lower-level API (and hence has a wider range of use cases), it does not return a Keras model. Instead, it returns an object that contains functions that can be used to do inference. For example:"
       ]
     },
     {
@@ -321,7 +323,7 @@
       },
       "outputs": [],
       "source": [
-        "DEFAULT_FUNCTION_KEY = \"serving_default\"\n",
+        "DEFAULT_FUNCTION_KEY = 'serving_default'\n",
         "loaded = tf.saved_model.load(saved_model_path)\n",
         "inference_func = loaded.signatures[DEFAULT_FUNCTION_KEY]"
       ]
@@ -332,7 +334,7 @@
         "id": "x65l7AaHUZCA"
       },
       "source": [
-        "The loaded object may contain multiple functions, each associated with a key. The `\"serving_default\"` is the default key for the inference function with a saved Keras model. To do an inference with this function: "
+        "The loaded object may contain multiple functions, each associated with a key. The `\"serving_default\"` key is the default key for the inference function with a saved Keras model. To do inference with this function: "
       ]
     },
     {
@@ -375,7 +377,9 @@
         "\n",
         "  # Calling the function in a distributed manner\n",
         "  for batch in dist_predict_dataset:\n",
-        "    another_strategy.run(inference_func,args=(batch,))"
+        "    result = another_strategy.run(inference_func, args=(batch,))\n",
+        "    print(result)\n",
+        "    break"
       ]
     },
     {
@@ -384,7 +388,7 @@
         "id": "hWGSukoyw3fF"
       },
       "source": [
-        "Calling the restored function is just a forward pass on the saved model (predict). What if yout want to continue training the loaded function? Or embed the loaded function into a bigger model? A common practice is to wrap this loaded object to a Keras layer to achieve this. Luckily, [TF Hub](https://www.tensorflow.org/hub) has [hub.KerasLayer](https://github.com/tensorflow/hub/blob/master/tensorflow_hub/keras_layer.py) for this purpose, shown here:"
+        "Calling the restored function is just a forward pass on the saved model (`tf.keras.Model.predict`). What if you want to continue training the loaded function? Or what if you need to embed the loaded function into a bigger model? A common practice is to wrap this loaded object into a Keras layer to achieve this. Luckily, [TF Hub](https://www.tensorflow.org/hub) has [`hub.KerasLayer`](https://github.com/tensorflow/hub/blob/master/tensorflow_hub/keras_layer.py) for this purpose, shown here:"
       ]
     },
     {
@@ -421,7 +425,7 @@
         "id": "Oe1z_OtSJlu2"
       },
       "source": [
-        "As you can see, `hub.KerasLayer` wraps the result loaded back from `tf.saved_model.load()` into a Keras layer that can be used to build another model. This is very useful for transfer learning. "
+        "In the above example, TensorFlow Hub's `hub.KerasLayer` wraps the result loaded back from `tf.saved_model.load` into a Keras layer that is used to build another model. This is very useful for transfer learning. "
       ]
     },
     {
@@ -439,11 +443,11 @@
         "id": "GC6GQ9HDLxD6"
       },
       "source": [
-        "For saving, if you are working with a keras model, it is almost always recommended to use the Keras's `model.save()` API. If what you are saving is not a Keras model, then the lower level API is your only choice. \n",
+        "For saving, if you are working with a Keras model, use the Keras `Model.save` API unless you need the additional control allowed by the low-level API. If what you are saving is not a Keras model, then the lower-level API, `tf.saved_model.save`, is your only choice. \n",
         "\n",
-        "For loading, which API you use depends on what you want to get from the loading API. If you cannot (or do not want to) get a Keras model, then use `tf.saved_model.load()`. Otherwise, use `tf.keras.models.load_model()`. Note that you can get a Keras model back only if you saved a Keras model. \n",
+        "For loading, your API choice depends on what you want to get from the model loading API. If you cannot (or do not want to) get a Keras model, then use `tf.saved_model.load`. Otherwise, use `tf.keras.models.load_model`. Note that you can get a Keras model back only if you saved a Keras model. \n",
         "\n",
-        "It is possible to mix and match the APIs. You can save a Keras model with `model.save`, and load a non-Keras model with the low-level API, `tf.saved_model.load`. "
+        "It is possible to mix and match the APIs. You can save a Keras model with `Model.save`, and load a non-Keras model with the low-level API, `tf.saved_model.load`. "
       ]
     },
     {
@@ -456,11 +460,11 @@
       "source": [
         "model = get_model()\n",
         "\n",
-        "# Saving the model using Keras's save() API\n",
-        "model.save(keras_model_path) \n",
+        "# Saving the model using Keras `Model.save`\n",
+        "model.save(keras_model_path)\n",
         "\n",
         "another_strategy = tf.distribute.MirroredStrategy()\n",
-        "# Loading the model using lower level API\n",
+        "# Loading the model using the lower-level API\n",
         "with another_strategy.scope():\n",
         "  loaded = tf.saved_model.load(keras_model_path)"
       ]
@@ -471,7 +475,7 @@
         "id": "0Z7lSj8nZiW5"
       },
       "source": [
-        "### Saving/Loading from local device"
+        "### Saving/Loading from a local device"
       ]
     },
     {
@@ -480,7 +484,7 @@
         "id": "NVAjWcosZodw"
       },
       "source": [
-        "When saving and loading from a local io device while running remotely, for example using a cloud TPU, the option `experimental_io_device` must be used to set the io device to localhost."
+        "When saving and loading from a local I/O device while training on remote devices—for example, when using a Cloud TPU—you must use the option `experimental_io_device` in `tf.saved_model.SaveOptions` and `tf.saved_model.LoadOptions` to set the I/O device to `localhost`. For example:"
       ]
     },
     {
@@ -494,7 +498,7 @@
         "model = get_model()\n",
         "\n",
         "# Saving the model to a path on localhost.\n",
-        "saved_model_path = \"/tmp/tf_save\"\n",
+        "saved_model_path = '/tmp/tf_save'\n",
         "save_options = tf.saved_model.SaveOptions(experimental_io_device='/job:localhost')\n",
         "model.save(saved_model_path, options=save_options)\n",
         "\n",
@@ -517,14 +521,10 @@
     {
       "cell_type": "markdown",
       "metadata": {
-        "id": "Tzog2ti7YYgy"
+        "id": "2cCSZrD7VCxe"
       },
       "source": [
-        "A special case is when you have a Keras model that does not have well-defined inputs. For example, a Sequential model can be created without any input shapes (`Sequential([Dense(3), ...]`). Subclassed models also do not have well-defined inputs after initialization. In this case, you should stick with the lower level APIs on both saving and loading, otherwise you will get an error. \n",
-        "\n",
-        "To check if your model has well-defined inputs, just check if `model.inputs` is `None`. If it is not `None`, you are all good. Input shapes are automatically defined when the model is used in `.fit`, `.evaluate`, `.predict`, or when calling the model (`model(inputs)`). \n",
-        "\n",
-        "Here is an example:"
+        "One special case is when you create Keras models in certain ways, and then save them before training. For example:"
       ]
     },
     {
@@ -536,6 +536,7 @@
       "outputs": [],
       "source": [
         "class SubclassedModel(tf.keras.Model):\n",
+        "  \"\"\"Example model defined by subclassing `tf.keras.Model`.\"\"\"\n",
         "\n",
         "  output_name = 'output_layer'\n",
         "\n",
@@ -548,8 +549,89 @@
         "    return self._dense_layer(inputs)\n",
         "\n",
         "my_model = SubclassedModel()\n",
-        "# my_model.save(keras_model_path)  # ERROR! \n",
-        "tf.saved_model.save(my_model, saved_model_path)"
+        "try:\n",
+        "  my_model.save(keras_model_path)\n",
+        "except ValueError as e:\n",
+        "  print(f'{type(e).__name__}:', *e.args)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "D4qMyXFDSPDO"
+      },
+      "source": [
+        "A SavedModel saves the `tf.types.experimental.ConcreteFunction` objects generated when you trace a `tf.function` (check _When is a Function tracing?_ in the [Introduction to graphs and tf.function](../../guide/intro_to_graphs.ipynb) guide to learn more). If you get a `ValueError` like this, it's because `Model.save` was not able to find or create a traced `ConcreteFunction`.\n",
+        "\n",
+        "**Caution:** You shouldn't save a model without at least one `ConcreteFunction`, since the low-level API will otherwise generate a SavedModel with no `ConcreteFunction` signatures ([learn more](../../guide/saved_model.ipynb) about the SavedModel format). For example:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "064SE47mYDj8"
+      },
+      "outputs": [],
+      "source": [
+        "tf.saved_model.save(my_model, saved_model_path)\n",
+        "x = tf.saved_model.load(saved_model_path)\n",
+        "x.signatures"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "LRTxlASJX-cY"
+      },
+      "source": [
+        "\n",
+        "Usually the model's forward pass—the `call` method—will be traced automatically when the model is called for the first time, often via the Keras `Model.fit` method. A `ConcreteFunction` can also be generated by the Keras [Sequential](https://www.tensorflow.org/guide/keras/sequential_model) and [Functional](https://www.tensorflow.org/guide/keras/functional) APIs, if you set the input shape, for example, by making the first layer either a `tf.keras.layers.InputLayer` or another layer type, and passing it the `input_shape` keyword argument. \n",
+        "\n",
+        "To verify whether your model has any traced `ConcreteFunction`s, check whether `Model.save_spec` returns `None`:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "xAXise4eR0YJ"
+      },
+      "outputs": [],
+      "source": [
+        "print(my_model.save_spec() is None)"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {
+        "id": "G2G_FQrWJAO3"
+      },
+      "source": [
+        "Train the model with `tf.keras.Model.fit`, and notice that `save_spec` is now defined and that saving the model works:"
+      ]
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "cv5LTi0zDkKS"
+      },
+      "outputs": [],
+      "source": [
+        "BATCH_SIZE_PER_REPLICA = 4\n",
+        "BATCH_SIZE = BATCH_SIZE_PER_REPLICA * mirrored_strategy.num_replicas_in_sync\n",
+        "\n",
+        "dataset_size = 100\n",
+        "dataset = tf.data.Dataset.from_tensors(\n",
+        "    (tf.range(5, dtype=tf.float32), tf.range(5, dtype=tf.float32))\n",
+        "    ).repeat(dataset_size).batch(BATCH_SIZE)\n",
+        "\n",
+        "my_model.compile(optimizer='adam', loss='mean_squared_error')\n",
+        "my_model.fit(dataset, epochs=2)\n",
+        "\n",
+        "print(my_model.save_spec() is None)\n",
+        "my_model.save(keras_model_path)"
       ]
     }
   ],