
Commit a077cd6

MarkDaoust authored and copybara-github committed
Fix notebooks.
PiperOrigin-RevId: 483639934
1 parent 9f36543 commit a077cd6

4 files changed (+60, -49 lines)


site/en/guide/migrate/migration_debugging.ipynb

Lines changed: 1 addition & 1 deletion
@@ -416,7 +416,7 @@
 " decay_steps=params['decay_steps'],\n",
 " end_learning_rate=params['end_lr'],\n",
 " power=params['lr_power']) \n",
-" self.optimizer = tf.keras.optimizers.SGD(learning_rate_fn)\n",
+" self.optimizer = tf.keras.optimizers.legacy.SGD(learning_rate_fn)\n",
 " self.compiled_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)\n",
 " self.logs = {\n",
 " 'lr': [],\n",

site/en/tutorials/audio/transfer_learning_audio.ipynb

Lines changed: 3 additions & 1 deletion
@@ -99,7 +99,9 @@
 },
 "outputs": [],
 "source": [
-"!pip install tensorflow_io"
+"!pip install \"tensorflow==2.10.*\"\n",
+"# tensorflow_io 0.27 is compatible with TensorFlow 2.10\n",
+"!pip install \"tensorflow_io==0.27.*\""
 ]
 },
 {
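
The pins above keep tensorflow_io matched to a compatible TensorFlow release, since each tensorflow_io version is built against a specific TensorFlow. A quick post-install sanity check one might run (a sketch; the exact version strings depend on the environment):

import tensorflow as tf
import tensorflow_io as tfio

# Expect a matching pair here, e.g. TensorFlow 2.10.x with tensorflow_io 0.27.x.
print('TensorFlow:', tf.__version__)
print('tensorflow_io:', tfio.__version__)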

site/en/tutorials/reinforcement_learning/actor_critic.ipynb

Lines changed: 48 additions & 33 deletions
@@ -161,8 +161,7 @@
 "source": [
 "%%bash\n",
 "# Install additional packages for visualization\n",
-"sudo apt-get install -y xvfb python-opengl > /dev/null 2>&1\n",
-"pip install pyvirtualdisplay > /dev/null 2>&1\n",
+"sudo apt-get install -y python-opengl > /dev/null 2>&1\n",
 "pip install git+https://github.com/tensorflow/docs > /dev/null 2>&1"
 ]
 },

@@ -187,11 +186,10 @@
 "\n",
 "\n",
 "# Create the environment\n",
-"env = gym.make(\"CartPole-v0\")\n",
+"env = gym.make(\"CartPole-v1\")\n",
 "\n",
 "# Set seed for experiment reproducibility\n",
 "seed = 42\n",
-"env.seed(seed)\n",
 "tf.random.set_seed(seed)\n",
 "np.random.seed(seed)\n",
 "\n",

@@ -307,7 +305,7 @@
 "def env_step(action: np.ndarray) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:\n",
 " \"\"\"Returns state, reward and done flag given an action.\"\"\"\n",
 "\n",
-" state, reward, done, _ = env.step(action)\n",
+" state, reward, done, truncated, info = env.step(action)\n",
 " return (state.astype(np.float32), \n",
 " np.array(reward, np.int32), \n",
 " np.array(done, np.int32))\n",
@@ -431,15 +429,22 @@
 {
 "cell_type": "markdown",
 "metadata": {
-"id": "1hrPLrgGxlvb"
+"id": "qhr50_Czxazw"
 },
 "source": [
 "### 3. The actor-critic loss\n",
 "\n",
 "Since a hybrid actor-critic model is used, the chosen loss function is a combination of actor and critic losses for training, as shown below:\n",
 "\n",
-"$$L = L_{actor} + L_{critic}$$\n",
-"\n",
+"$$L = L_{actor} + L_{critic}$$"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "nOQIJuG1xdTH"
+},
+"source": [
 "#### Actor loss\n",
 "\n",
 "The actor loss is based on [policy gradients with the critic as a state dependent baseline](https://www.youtube.com/watch?v=EKqxumCuAAY&t=62m23s) and computed with single-sample (per-episode) estimates.\n",

@@ -456,8 +461,15 @@
 "\n",
 "A negative term is added to the sum since the idea is to maximize the probabilities of actions yielding higher rewards by minimizing the combined loss.\n",
 "\n",
-"<br>\n",
-"\n",
+"<br>"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "Y304O4OAxiAv"
+},
+"source": [
 "##### Advantage\n",
 "\n",
 "The $G - V$ term in our $L_{actor}$ formulation is called the [advantage](https://spinningup.openai.com/en/latest/spinningup/rl_intro.html#advantage-functions), which indicates how much better an action is given a particular state over a random action selected according to the policy $\\pi$ for that state.\n",

@@ -468,8 +480,15 @@
 "\n",
 "For instance, suppose that two actions for a given state would yield the same expected return. Without the critic, the algorithm would try to raise the probability of these actions based on the objective $J$. With the critic, it may turn out that there's no advantage ($G - V = 0$) and thus no benefit gained in increasing the actions' probabilities and the algorithm would set the gradients to zero.\n",
 "\n",
-"<br>\n",
-"\n",
+"<br>"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "1hrPLrgGxlvb"
+},
+"source": [
 "#### Critic loss\n",
 "\n",
 "Training $V$ to be as close possible to $G$ can be set up as a regression problem with the following loss function:\n",
@@ -596,11 +615,11 @@
 "\n",
 "min_episodes_criterion = 100\n",
 "max_episodes = 10000\n",
-"max_steps_per_episode = 1000\n",
+"max_steps_per_episode = 500\n",
 "\n",
-"# Cartpole-v0 is considered solved if average reward is >= 195 over 100 \n",
+"# Cartpole-v1 is considered solved if average reward is >= 475 over 500 \n",
 "# consecutive trials\n",
-"reward_threshold = 195\n",
+"reward_threshold = 475\n",
 "running_reward = 0\n",
 "\n",
 "# Discount factor for future rewards\n",

@@ -609,16 +628,17 @@
 "# Keep last episodes reward\n",
 "episodes_reward: collections.deque = collections.deque(maxlen=min_episodes_criterion)\n",
 "\n",
-"with tqdm.trange(max_episodes) as t:\n",
-" for i in t:\n",
-" initial_state = tf.constant(env.reset(), dtype=tf.float32)\n",
+"t = tqdm.trange(max_episodes)\n",
+"for i in t:\n",
+" initial_state, info = env.reset()\n",
+" initial_state = tf.constant(initial_state, dtype=tf.float32)\n",
 " episode_reward = int(train_step(\n",
 " initial_state, model, optimizer, gamma, max_steps_per_episode))\n",
 " \n",
 " episodes_reward.append(episode_reward)\n",
 " running_reward = statistics.mean(episodes_reward)\n",
 " \n",
-" t.set_description(f'Episode {i}')\n",
+"\n",
 " t.set_postfix(\n",
 " episode_reward=episode_reward, running_reward=running_reward)\n",
 " \n",

@@ -655,31 +675,26 @@
 "\n",
 "from IPython import display as ipythondisplay\n",
 "from PIL import Image\n",
-"from pyvirtualdisplay import Display\n",
-"\n",
-"\n",
-"display = Display(visible=0, size=(400, 300))\n",
-"display.start()\n",
 "\n",
+"render_env = gym.make(\"CartPole-v1\", render_mode='rgb_array')\n",
 "\n",
 "def render_episode(env: gym.Env, model: tf.keras.Model, max_steps: int): \n",
-" screen = env.render(mode='rgb_array')\n",
-" im = Image.fromarray(screen)\n",
-"\n",
-" images = [im]\n",
-" \n",
-" state = tf.constant(env.reset(), dtype=tf.float32)\n",
+" state, info = render_env.reset()\n",
+" state = tf.constant(state, dtype=tf.float32)\n",
+" screen = render_env.render()\n",
+" images = [Image.fromarray(screen)]\n",
+" \n",
 " for i in range(1, max_steps + 1):\n",
 " state = tf.expand_dims(state, 0)\n",
 " action_probs, _ = model(state)\n",
 " action = np.argmax(np.squeeze(action_probs))\n",
 "\n",
-" state, _, done, _ = env.step(action)\n",
+" state, reward, done, truncated, info = render_env.step(action)\n",
 " state = tf.constant(state, dtype=tf.float32)\n",
 "\n",
 " # Render screen every 10 steps\n",
 " if i % 10 == 0:\n",
-" screen = env.render(mode='rgb_array')\n",
+" screen = render_env.render()\n",
 " images.append(Image.fromarray(screen))\n",
 " \n",
 " if done:\n",

@@ -690,7 +705,7 @@
 "\n",
 "# Save GIF image\n",
 "images = render_episode(env, model, max_steps_per_episode)\n",
-"image_file = 'cartpole-v0.gif'\n",
+"image_file = 'cartpole-v1.gif'\n",
 "# loop=0: loop forever, duration=1: play each frame for 1ms\n",
 "images[0].save(\n",
 " image_file, save_all=True, append_images=images[1:], loop=0, duration=1)"

site/en/tutorials/video/video_classification.ipynb

Lines changed: 8 additions & 14 deletions
@@ -115,11 +115,7 @@
 "\n",
 "import tensorflow as tf\n",
 "import keras\n",
-"from keras.models import Model, Sequential\n",
-"from tensorflow.keras import layers\n",
-"from tensorflow.keras.optimizers import Adam\n",
-"from keras.losses import sparse_categorical_crossentropy\n",
-"from keras.utils.vis_utils import plot_model"
+"from keras import layers\n"
 ]
 },
 {

@@ -550,7 +546,7 @@
 " \"\"\"\n",
 " def __init__(self, units):\n",
 " super().__init__()\n",
-" self.seq = Sequential([\n",
+" self.seq = keras.Sequential([\n",
 " layers.Dense(units),\n",
 " layers.LayerNormalization()\n",
 " ])\n",

@@ -681,9 +677,9 @@
 "\n",
 "x = layers.GlobalAveragePooling3D()(x)\n",
 "x = layers.Flatten()(x)\n",
-"x = layers.Dense(10, activation = 'softmax')(x)\n",
+"x = layers.Dense(10)(x)\n",
 "\n",
-"model = Model(input, x)"
+"model = keras.Model(input, x)"
 ]
 },
 {

@@ -707,7 +703,7 @@
 "outputs": [],
 "source": [
 "# Visualize the model\n",
-"plot_model(model, expand_nested=True, dpi=60, show_shapes=True)"
+"keras.utils.plot_model(model, expand_nested=True, dpi=60, show_shapes=True)"
 ]
 },
 {

@@ -729,8 +725,8 @@
 },
 "outputs": [],
 "source": [
-"model.compile(loss = sparse_categorical_crossentropy, \n",
-" optimizer = Adam(learning_rate = 0.0001), \n",
+"model.compile(loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True), \n",
+" optimizer = keras.optimizers.Adam(learning_rate = 0.0001), \n",
 " metrics = ['accuracy'])"
 ]
 },
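
This compile change pairs with the earlier removal of the softmax activation from the final Dense(10) layer: the model now emits raw logits, so the loss is built with from_logits=True. A minimal sketch of the pattern, with a toy input standing in for the video classifier:

import keras
from keras import layers

# Toy stand-in: the real model is a 3D-convolutional video classifier.
inputs = keras.Input(shape=(8,))
outputs = layers.Dense(10)(inputs)   # no softmax, so the outputs are logits
model = keras.Model(inputs, outputs)

# from_logits=True lets the loss apply the softmax internally,
# which is more numerically stable than a softmax layer plus a plain loss.
model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer=keras.optimizers.Adam(learning_rate=0.0001),
              metrics=['accuracy'])
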
@@ -740,8 +736,6 @@
 "id": "nZT1Xlx9stP2"
 },
 "source": [
-"\n",
-"\n",
 "Train the model for 50 epoches with the Keras `Model.fit` method.\n",
 "\n",
 "Note: This example model is trained on fewer data points (300 training and 100 validation examples) to keep training time reasonable for this tutorial. Moreover, this example model may take over one hour to train."

@@ -871,7 +865,7 @@
 " actual = [labels for _, labels in dataset.unbatch()]\n",
 " predicted = model.predict(dataset)\n",
 "\n",
-" actual = tf.concat(actual, axis=0)\n",
+" actual = tf.stack(actual, axis=0)\n",
 " predicted = tf.concat(predicted, axis=0)\n",
 " predicted = tf.argmax(predicted, axis=1)\n",
 "\n",
