|
241 | 241 | "#### Pattern 1: Python object manipulation and variable creation intended to be done only once get run multiple times\n",
|
242 | 242 | "<a id=\"pattern-1\"></a>\n",
|
243 | 243 | "\n",
|
244 |
| - "In TF1.x programs that rely on graphs and sessions, the expectation is usually that all Python logic in your program will only run once. However, with eager execution and `tf.function` it is fair to expect that your Python logic will be run at least once, but possibly more times (either multiple times eagerly, or multiple times across different `tf.function` traces). Any Python logic within a `tf.function` will be traced at least twice due to how `tf.function` works. Refer to the `tf.function` [guide](https://www.tensorflow.org/guide/function) for more details.\n", |
| 244 | + "In TF1.x programs that rely on graphs and sessions, the expectation is usually that all Python logic in your program will only run once. However, with eager execution and `tf.function` it is fair to expect that your Python logic will be run at least once, but possibly more times (either multiple times eagerly, or multiple times across different `tf.function` traces). Sometimes, `tf.function` will even trace twice on the same input, causing unexpected behaviors (see Examples 1 and 2). Refer to the `tf.function` [guide](https://www.tensorflow.org/guide/function) for more details.\n", |
245 | 245 | "\n",
|
246 | 246 | "Note: This pattern usually causes your code to silently misbehave when executing eagerly without `tf.function`s, but generally raises an `InaccessibleTensorError` or a `ValueError` when attempting to wrap the problematic code inside of a `tf.function`. To discover and debug this issue, it is recommended you wrap your code with `tf.function` early on, and use [pdb](https://docs.python.org/3/library/pdb.html) or interactive debugging to identify the source of the `InaccessibleTensorError`.\n",
|
247 | 247 | "\n",
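| + "For instance, here is a minimal sketch of how this class of error can arise (adapted from the leaked-tensor pattern in the `tf.function` guide; a generic `except Exception` is used because the exact error type varies across TF versions):\n", |
| + "\n", |
| + "```python\n", |
| + "@tf.function\n", |
| + "def leaky_function(a):\n", |
| + "  global x\n", |
| + "  x = a + 1  # Bad: leaks a graph tensor into the enclosing Python scope\n", |
| + "  return a + 2\n", |
| + "\n", |
| + "correct_a = leaky_function(tf.constant(1))\n", |
| + "print(correct_a.numpy())  # Good: use the function's return value\n", |
| + "\n", |
| + "try:\n", |
| + "  x.numpy()  # Bad: the leaked graph tensor cannot be used eagerly\n", |
| + "except Exception as e:\n", |
| + "  print(e)\n", |
| + "```\n", |
| + "\n", |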
|
248 | 248 | "**Example 1: Variable creation**\n",
|
249 | 249 | "\n",
|
250 |
| - "TF1.x code often creates variables without checking that they have already been made (because it runs the Python logic only once at all times). Naively mapping this code to eager execution may cause it to accidentally create new variables in each training step.\n", |
| 250 | + "Consider the example below, where the function creates a variable when called:\n", |
251 | 251 | "\n",
|
252 |
| - "**Example 2: Manipulating a global Python list**\n", |
| 252 | + "```python\n", |
| 253 | + "def f():\n", |
| 254 | + "  v = tf.Variable(1.0)\n", |
| 255 | + "  return v\n", |
| 256 | + "\n", |
| 257 | + "with tf.Graph().as_default():\n", |
| 258 | + "  with tf.compat.v1.Session() as sess:\n", |
| 259 | + "    res = f()\n", |
| 260 | + "    sess.run(tf.compat.v1.global_variables_initializer())\n", |
| 261 | + "    sess.run(res)\n", |
| 262 | + "```\n", |
| 263 | + "\n", |
| 264 | + "However, naively wrapping the above function with `tf.function` is not allowed, because it creates a variable each time it is called. `tf.function` only supports [singleton variable creation on the first call](https://www.tensorflow.org/guide/function#creating_tfvariables). To enforce this, when `tf.function` detects variable creation in the first call, it will attempt to trace again and raise an error if a variable is also created in the second trace.\n", |
| 265 | + "\n", |
| 266 | + "```python\n", |
| 267 | + "@tf.function\n", |
| 268 | + "def f():\n", |
| 269 | + "  print(\"trace\")  # This will print twice because the Python body is run twice\n", |
| 270 | + "  v = tf.Variable(1.0)\n", |
| 271 | + "  return v\n", |
| 272 | + "\n", |
| 273 | + "try:\n", |
| 274 | + "  f()\n", |
| 275 | + "except ValueError as e:\n", |
| 276 | + "  print(e)\n", |
| 277 | + "```\n", |
| 278 | + "\n", |
| 279 | + "A workaround is to cache and reuse the variable after it is created in the first call.\n", |
| 280 | + "\n", |
| 281 | + "```python\n", |
| 282 | + "class Model(tf.Module):\n", |
| 283 | + "  def __init__(self):\n", |
| 284 | + "    self.v = None\n", |
| 285 | + "\n", |
| 286 | + "  @tf.function\n", |
| 287 | + "  def __call__(self):\n", |
| 288 | + "    print(\"trace\")  # This will print twice because the Python body is run twice\n", |
| 289 | + "    if self.v is None:\n", |
| 290 | + "      self.v = tf.Variable(0)\n", |
| 291 | + "    return self.v\n", |
| 292 | + "\n", |
| 293 | + "m = Model()\n", |
| 294 | + "m()\n", |
| 295 | + "```\n", |
| 296 | + "\n", |
| 297 | + "**Example 2: Out-of-scope Tensors due to `tf.function` retracing**\n", |
| 298 | + "\n", |
| 299 | + "As demonstrated in Example 1, `tf.function` will retrace when it detects variable creation in the first call. This can cause extra confusion because the two traces create two graphs. When the second graph from retracing attempts to access a Tensor from the graph generated during the first tracing, TensorFlow will raise an error complaining that the Tensor is out of scope. To demonstrate the scenario, the code below creates a dataset on the first `tf.function` call; this runs as expected.\n", |
| 300 | + "\n", |
| 301 | + "```python\n", |
| 302 | + "class Model(tf.Module):\n", |
| 303 | + "  def __init__(self):\n", |
| 304 | + "    self.dataset = None\n", |
| 305 | + "\n", |
| 306 | + "  @tf.function\n", |
| 307 | + "  def __call__(self):\n", |
| 308 | + "    print(\"trace\")  # This will print once: only traced once\n", |
| 309 | + "    if self.dataset is None:\n", |
| 310 | + "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", |
| 311 | + "    it = iter(self.dataset)\n", |
| 312 | + "    return next(it)\n", |
| 313 | + "\n", |
| 314 | + "m = Model()\n", |
| 315 | + "m()\n", |
| 316 | + "```\n", |
| 317 | + "\n", |
| 318 | + "However, if we also attempt to create a variable on the first `tf.function` call, the code will raise an error complaining that the dataset is out of scope. This is because the dataset lives in the first graph, while the second graph created by retracing also attempts to access it.\n", |
| 319 | + "\n", |
| 320 | + "```python\n", |
| 321 | + "class Model(tf.Module):\n", |
| 322 | + "  def __init__(self):\n", |
| 323 | + "    self.v = None\n", |
| 324 | + "    self.dataset = None\n", |
| 325 | + "\n", |
| 326 | + "  @tf.function\n", |
| 327 | + "  def __call__(self):\n", |
| 328 | + "    print(\"trace\")  # This will print twice because the Python body is run twice\n", |
| 329 | + "    if self.v is None:\n", |
| 330 | + "      self.v = tf.Variable(0)\n", |
| 331 | + "    if self.dataset is None:\n", |
| 332 | + "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", |
| 333 | + "    it = iter(self.dataset)\n", |
| 334 | + "    return [self.v, next(it)]\n", |
| 335 | + "\n", |
| 336 | + "m = Model()\n", |
| 337 | + "try:\n", |
| 338 | + "  m()\n", |
| 339 | + "except TypeError as e:\n", |
| 340 | + "  print(e)  # <tf.Tensor ...> is out of scope and cannot be used here.\n", |
| 341 | + "```\n", |
| 342 | + "\n", |
| 343 | + "The most straightforward solution is to ensure that both the variable creation and the dataset creation happen outside of the `tf.function` call. For example:\n", |
| 344 | + "\n", |
| 345 | + "```python\n", |
| 346 | + "class Model(tf.Module):\n", |
| 347 | + "  def __init__(self):\n", |
| 348 | + "    self.v = None\n", |
| 349 | + "    self.dataset = None\n", |
| 350 | + "\n", |
| 351 | + "  def initialize(self):\n", |
| 352 | + "    if self.dataset is None:\n", |
| 353 | + "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", |
| 354 | + "    if self.v is None:\n", |
| 355 | + "      self.v = tf.Variable(0)\n", |
| 356 | + "\n", |
| 357 | + "  @tf.function\n", |
| 358 | + "  def __call__(self):\n", |
| 359 | + "    it = iter(self.dataset)\n", |
| 360 | + "    return [self.v, next(it)]\n", |
| 361 | + "\n", |
| 362 | + "m = Model()\n", |
| 363 | + "m.initialize()\n", |
| 364 | + "m()\n", |
| 365 | + "```\n", |
| 366 | + "\n", |
| 367 | + "However, sometimes creating variables inside a `tf.function` is unavoidable (such as slot variables in some [TF Keras optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Optimizer#slots)). Still, we can simply move the dataset creation outside of the `tf.function` call. This works because `tf.function` receives the dataset as an implicit input, so both graphs can access it properly.\n", |
| 368 | + "\n", |
| 369 | + "```python\n", |
| 370 | + "class Model(tf.Module):\n", |
| 371 | + "  def __init__(self):\n", |
| 372 | + "    self.v = None\n", |
| 373 | + "    self.dataset = None\n", |
| 374 | + "\n", |
| 375 | + "  def initialize(self):\n", |
| 376 | + "    if self.dataset is None:\n", |
| 377 | + "      self.dataset = tf.data.Dataset.from_tensors([1, 2, 3])\n", |
| 378 | + "\n", |
| 379 | + "  @tf.function\n", |
| 380 | + "  def __call__(self):\n", |
| 381 | + "    if self.v is None:\n", |
| 382 | + "      self.v = tf.Variable(0)\n", |
| 383 | + "    it = iter(self.dataset)\n", |
| 384 | + "    return [self.v, next(it)]\n", |
| 385 | + "\n", |
| 386 | + "m = Model()\n", |
| 387 | + "m.initialize()\n", |
| 388 | + "m()\n", |
| 389 | + "```\n", |
| 390 | + "\n", |
| 391 | + "**Example 3: Unexpected TensorFlow object re-creation due to dict usage**\n", |
| 392 | + "\n", |
| 393 | + "`tf.function` has very limited support for Python side effects such as appending to a list or checking/adding to a dictionary. More details are in [\"Better performance with tf.function\"](https://www.tensorflow.org/guide/function#executing_python_side_effects). In the example below, the code uses dictionaries to cache datasets and iterators. For the same key, each call to the model returns the same iterator of the dataset.\n", |
| 394 | + "\n", |
| 395 | + "```python\n", |
| 396 | + "class Model(tf.Module):\n", |
| 397 | + "  def __init__(self):\n", |
| 398 | + "    self.datasets = {}\n", |
| 399 | + "    self.iterators = {}\n", |
| 400 | + "\n", |
| 401 | + "  def __call__(self, key):\n", |
| 402 | + "    if key not in self.datasets:\n", |
| 403 | + "      self.datasets[key] = tf.compat.v1.data.Dataset.from_tensor_slices([1, 2, 3])\n", |
| 404 | + "      self.iterators[key] = self.datasets[key].make_initializable_iterator()\n", |
| 405 | + "    return self.iterators[key]\n", |
| 406 | + "\n", |
| 407 | + "with tf.Graph().as_default():\n", |
| 408 | + "  with tf.compat.v1.Session() as sess:\n", |
| 409 | + "    m = Model()\n", |
| 410 | + "    it = m('a')\n", |
| 411 | + "    sess.run(it.initializer)\n", |
| 412 | + "    for _ in range(3):\n", |
| 413 | + "      print(sess.run(it.get_next()))  # prints 1, 2, 3\n", |
| 414 | + "```\n", |
| 415 | + "\n", |
| 416 | + "However, the pattern above will not work as expected with `tf.function`. During tracing, `tf.function` ignores the Python side effect of adding to the dictionaries; it only remembers the creation of a new dataset and iterator. As a result, each call to the model will always return a new iterator. This issue is hard to notice unless it noticeably affects numerical results or performance, so we recommend thinking carefully about the code before naively wrapping it in `tf.function`.\n", |
| 417 | + "\n", |
| 418 | + "```python\n", |
| 419 | + "class Model(tf.Module):\n", |
| 420 | + "  def __init__(self):\n", |
| 421 | + "    self.datasets = {}\n", |
| 422 | + "    self.iterators = {}\n", |
| 423 | + "\n", |
| 424 | + "  @tf.function\n", |
| 425 | + "  def __call__(self, key):\n", |
| 426 | + "    if key not in self.datasets:\n", |
| 427 | + "      self.datasets[key] = tf.data.Dataset.from_tensor_slices([1, 2, 3])\n", |
| 428 | + "      self.iterators[key] = iter(self.datasets[key])\n", |
| 429 | + "    return self.iterators[key]\n", |
| 430 | + "\n", |
| 431 | + "m = Model()\n", |
| 432 | + "for _ in range(3):\n", |
| 433 | + "  print(next(m('a')))  # prints 1, 1, 1\n", |
| 434 | + "```\n", |
| 435 | + "\n", |
| 436 | + "We can use [`tf.init_scope`](https://www.tensorflow.org/api_docs/python/tf/init_scope) to lift the dataset and iterator creation outside of the graph, to achieve the expected behavior:\n", |
| 437 | + "\n", |
| 438 | + "```python\n", |
| 439 | + "class Model(tf.Module):\n", |
| 440 | + "  def __init__(self):\n", |
| 441 | + "    self.datasets = {}\n", |
| 442 | + "    self.iterators = {}\n", |
| 443 | + "\n", |
| 444 | + "  @tf.function\n", |
| 445 | + "  def __call__(self, key):\n", |
| 446 | + "    if key not in self.datasets:\n", |
| 447 | + "      # Lifts ops out of function-building graphs\n", |
| 448 | + "      with tf.init_scope():\n", |
| 449 | + "        self.datasets[key] = tf.data.Dataset.from_tensor_slices([1, 2, 3])\n", |
| 450 | + "        self.iterators[key] = iter(self.datasets[key])\n", |
| 451 | + "    return self.iterators[key]\n", |
| 452 | + "\n", |
| 453 | + "m = Model()\n", |
| 454 | + "for _ in range(3):\n", |
| 455 | + "  print(next(m('a')))  # prints 1, 2, 3\n", |
| 456 | + "```\n", |
| 457 | + "\n", |
| 458 | + "The general rule of thumb is to avoid relying on Python side effects in your logic and only use them to debug your traces.\n", |
| 459 | + "\n", |
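| + "As a quick sketch of that rule (based on the tracing behavior described in the `tf.function` guide): a Python `print` is a side effect that runs only while the function is being traced, whereas `tf.print` becomes a graph op and runs on every call, which is why Python side effects are mainly useful for debugging traces.\n", |
| + "\n", |
| + "```python\n", |
| + "@tf.function\n", |
| + "def step(x):\n", |
| + "  print(\"Tracing\")     # Python side effect: runs only during tracing\n", |
| + "  tf.print(\"Running\")  # Graph op: runs on every call\n", |
| + "  return x + 1\n", |
| + "\n", |
| + "step(tf.constant(1))  # prints \"Tracing\" then \"Running\"\n", |
| + "step(tf.constant(2))  # prints only \"Running\" (same input signature, no retrace)\n", |
| + "```\n", |
| + "\n", |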
| 460 | + "**Example 4: Manipulating a global Python list**\n", |
253 | 461 | "\n",
|
254 | 462 | "The following TF1.x code uses a global list of losses to maintain only the losses generated by the current training step. Note that the Python logic that appends losses to the list will only be called once, regardless of how many training steps the session is run for.\n",
|
255 | 463 | "\n",
|
|
477 | 685 | "\n",
|
478 | 686 | "However, it is ***possible though unlikely*** that these stronger consistency guarantees may increase the memory usage of your specific program. Please file an [issue](https://github.com/tensorflow/tensorflow/issues) if you find this to be the case. Additionally, if you have unit tests relying on exact string comparisons against the operator names in a graph corresponding to variable reads, be aware that enabling resource variables may slightly change the name of these operators.\n",
|
479 | 687 | "\n",
|
480 |
| - "To isolate the impact of this behavior change on your code, if eager execution is disabled you can use `tf.compat.v1.disable_resource_variables()` and `tf.compat.v1.enable_resource_variables()` to globally disable or enable this behavior change. `ResourceVariables` will always be used if eager execution is enabled. You can also \n" |
| 688 | + "To isolate the impact of this behavior change on your code, you can use `tf.compat.v1.disable_resource_variables()` and `tf.compat.v1.enable_resource_variables()` to globally disable or enable this behavior change while eager execution is disabled. `ResourceVariables` will always be used if eager execution is enabled.\n", |
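| + "\n", |
| + "A minimal sketch of toggling this in graph mode (the variable names here are illustrative):\n", |
| + "\n", |
| + "```python\n", |
| + "import tensorflow.compat.v1 as tf1\n", |
| + "tf1.disable_eager_execution()\n", |
| + "\n", |
| + "tf1.disable_resource_variables()  # opt back into legacy reference variables\n", |
| + "with tf1.Graph().as_default():\n", |
| + "  v_ref = tf1.get_variable(\"v_ref\", shape=[], initializer=tf1.zeros_initializer())\n", |
| + "  print(type(v_ref).__name__)  # expected to print: RefVariable\n", |
| + "\n", |
| + "tf1.enable_resource_variables()  # restore the TF2 default\n", |
| + "with tf1.Graph().as_default():\n", |
| + "  v_res = tf1.get_variable(\"v_res\", shape=[], initializer=tf1.zeros_initializer())\n", |
| + "  print(type(v_res).__name__)  # expected to print: ResourceVariable\n", |
| + "```\n" |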
481 | 689 | ]
|
482 | 690 | },
|
483 | 691 | {
|
|
925 | 1133 | ],
|
926 | 1134 | "metadata": {
|
927 | 1135 | "colab": {
|
928 |
| - "collapsed_sections": [ |
929 |
| - "Tce3stUlHN0L" |
930 |
| - ], |
| 1136 | + "collapsed_sections": [], |
931 | 1137 | "name": "tf1_vs_tf2.ipynb",
|
932 | 1138 | "provenance": [],
|
933 | 1139 | "toc_visible": true
|
|