|
1066 | 1066 | "source": [
|
1067 | 1067 | "There is some overhead to parsing the CSV data. For small models this can be the bottleneck in training.\n",
|
1068 | 1068 | "\n",
|
1069 |      | - "Depending on your use case, it may be a good idea to use `Dataset.cache` or `tf.data.experimental.snapshot`, so that the CSV data is only parsed on the first epoch.\n",
     | 1069 | + "Depending on your use case, it may be a good idea to use `Dataset.cache` or `tf.data.Dataset.snapshot`, so that the CSV data is only parsed on the first epoch.\n",
1070 | 1070 | "\n",
|
1071 | 1071 | "The main difference between the `cache` and `snapshot` methods is that `cache` files can only be used by the TensorFlow process that created them, but `snapshot` files can be read by other processes.\n",
|
1072 | 1072 | "\n",
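To make that distinction concrete, here is a minimal sketch (not part of the tutorial) contrasting the two methods on a toy dataset; the file path is a placeholder:

```python
import tensorflow as tf

# Toy stand-in for the tutorial's parsed CSV pipeline.
ds = tf.data.Dataset.range(10)

# `Dataset.cache`: elements are stored (in memory here, or in files if a
# path is given) after the first pass, but the cache is only readable by
# the TensorFlow process that created it.
cached = ds.cache()

# `Dataset.snapshot`: elements are written to disk in files that other
# TensorFlow processes can read back.
snapped = ds.snapshot('/tmp/toy.tfsnap')

for epoch in range(2):
    for _ in snapped:  # the source is only materialized on the first pass
        pass
```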
|
|
1120 | 1120 | "id": "wN7uUBjmgNZ9"
|
1121 | 1121 | },
|
1122 | 1122 | "source": [
|
1123 |      | - "Note: The `tf.data.experimental.snapshot` files are meant for *temporary* storage of a dataset while in use. This is *not* a format for long term storage. The file format is considered an internal detail, and not guaranteed between TensorFlow versions."
     | 1123 | + "Note: The `tf.data.Dataset.snapshot` files are meant for *temporary* storage of a dataset while in use. This is *not* a format for long-term storage. The file format is considered an internal detail, and is not guaranteed between TensorFlow versions."
1124 | 1124 | ]
|
1125 | 1125 | },
|
1126 | 1126 | {
|
|
1132 | 1132 | "outputs": [],
|
1133 | 1133 | "source": [
|
1134 | 1134 | "%%time\n",
|
1135 |      | - "snapshot = tf.data.experimental.snapshot('titanic.tfsnap')\n",
1136 |      | - "snapshotting = traffic_volume_csv_gz_ds.apply(snapshot).shuffle(1000)\n",
     | 1135 | + "snapshotting = traffic_volume_csv_gz_ds.snapshot('titanic.tfsnap').shuffle(1000)\n",
1137 | 1136 | "\n",
|
1138 | 1137 | "for i, (batch, label) in enumerate(snapshotting.shuffle(1000).repeat(20)):\n",
|
1139 | 1138 | " if i % 40 == 0:\n",
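For context on the change above: the experimental free function produced a transformation that had to be composed with `Dataset.apply`, whereas the stabilized `snapshot` is a method on `Dataset` itself (from around TF 2.6; the exact version is an assumption here), so the two-line pattern collapses into a single call:

```python
# Before: the experimental free function builds a transformation,
# which is then composed onto the pipeline with `apply`.
snapshot = tf.data.experimental.snapshot('titanic.tfsnap')
snapshotting = traffic_volume_csv_gz_ds.apply(snapshot).shuffle(1000)

# After: `snapshot` is a method on `Dataset` itself.
snapshotting = traffic_volume_csv_gz_ds.snapshot('titanic.tfsnap').shuffle(1000)
```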
|
|
1147 | 1146 | "id": "fUSSegnMCGRz"
|
1148 | 1147 | },
|
1149 | 1148 | "source": [
|
1150 |      | - "If your data loading is slowed by loading CSV files, and `Dataset.cache` and `tf.data.experimental.snapshot` are insufficient for your use case, consider re-encoding your data into a more streamlined format."
     | 1149 | + "If your data loading is slowed by parsing CSV files, and `Dataset.cache` and `tf.data.Dataset.snapshot` are insufficient for your use case, consider re-encoding your data into a more streamlined format."
1151 | 1150 | ]
|
1152 | 1151 | },
|
1153 | 1152 | {
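The tutorial leaves "a more streamlined format" open-ended; one common choice in TensorFlow is TFRecord files of serialized `tf.train.Example` protos, which avoid text parsing entirely. A minimal sketch, assuming a parsed dataset of float feature vectors and scalar labels (the names `parsed_ds`, `features`, and `label` are hypothetical):

```python
import tensorflow as tf

def to_example(features, label):
    # Pack one (features, label) pair into a serialized tf.train.Example.
    feature_map = {
        'features': tf.train.Feature(
            float_list=tf.train.FloatList(value=features)),
        'label': tf.train.Feature(
            float_list=tf.train.FloatList(value=[label])),
    }
    example = tf.train.Example(
        features=tf.train.Features(feature=feature_map))
    return example.SerializeToString()

# Pay the CSV parsing cost once, writing binary records for later runs.
with tf.io.TFRecordWriter('data.tfrecord') as writer:
    for features, label in parsed_ds.as_numpy_iterator():
        writer.write(to_example(features, label))
```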
|
|
1862 | 1861 | "source": [
|
1863 | 1862 | "For another example of increasing CSV performance by using large batches, refer to the [Overfit and underfit tutorial](../keras/overfit_and_underfit.ipynb).\n",
|
1864 | 1863 | "\n",
|
1865 |      | - "This sort of approach may work, but consider other options like `Dataset.cache` and `tf.data.experimental.snapshot`, or re-encoding your data into a more streamlined format."
     | 1864 | + "This sort of approach may work, but consider other options like `Dataset.cache` and `tf.data.Dataset.snapshot`, or re-encoding your data into a more streamlined format."
1866 | 1865 | ]
|
1867 | 1866 | }
|
1868 | 1867 | ],
|
|