
Commit d0a7b3a

DEKHTIARJonathan authored and zsdonghao committed
AMSGrad optimizer added - Issue #583 (#636)
* AMSGrad and related unittest added. Missing Documentation.
* YAPF error correct
* PR number added
* Codacy errors fix
* TL documentation updated
* Documentation Error Fix
* Changelog Updated
1 parent 36dd7ae commit d0a7b3a

File tree: 7 files changed, +316 −5 lines


CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -77,13 +77,18 @@ To release a new version, please update the changelog as followed:
   - `test_utils_predict.py` added to reproduce and fix issue #288 (by @2wins in #566)
   - `Layer_DeformableConvolution_Test` added to reproduce issue #572 with deformable convolution (by @DEKHTIARJonathan in #573)
   - `Array_Op_Alphas_Test` and `Array_Op_Alphas_Like_Test` added to test `tensorlayer/array_ops.py` file (by @DEKHTIARJonathan in #580)
+  - `test_optimizer_amsgrad.py` added to test `AMSGrad` optimizer (by @DEKHTIARJonathan in #636)
 - CI Tool:
   - Danger CI has been added to enforce the update of the changelog (by @lgarithm and @DEKHTIARJonathan in #563)
   - https://github.com/apps/stale/ added to clean stale issues (by @DEKHTIARJonathan in #573)
 - Layer:
   - ElementwiseLambdaLayer added to use custom function to connect multiple layer inputs (by @One-sixth in #579)
 - Documentation:
   - Release semantic version added on index page (by @DEKHTIARJonathan in #633)
+  - Optimizers page added (by @DEKHTIARJonathan in #636)
+  - `AMSGrad` added on the Optimizers page (by @DEKHTIARJonathan in #636)
+- Optimizer:
+  - AMSGrad Optimizer added based on `On the Convergence of Adam and Beyond (ICLR 2018)` (by @DEKHTIARJonathan in #636)

 ### Changed
 - Tensorflow CPU & GPU dependencies moved to separated requirement files in order to allow PyUP.io to parse them (by @DEKHTIARJonathan in #573)

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ method, this part of the documentation is for you.
   modules/models
   modules/nlp
   modules/layers
+  modules/optimizers
   modules/prepro
   modules/rein
   modules/utils

docs/modules/optimizers.rst

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
API - Optimizers
================

.. automodule:: tensorlayer.optimizers

TensorLayer provides a simple API and tools to ease research and development, and to reduce the time to production.
Therefore, we provide the latest state-of-the-art optimizers that work with TensorFlow.

Optimizers List
---------------

.. autosummary::

   AMSGrad

AMSGrad Optimizer
-----------------
.. autoclass:: AMSGrad
   :members:
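A minimal usage sketch may help here. It mirrors the unit test added in this commit (tests/test_optimizer_amsgrad.py), with the network trimmed to a single dense layer for brevity; the hyperparameter values are the ones used in that test.

import tensorflow as tf
import tensorlayer as tl

x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
y_ = tf.placeholder(tf.int64, shape=[None], name='y_')

# Any TensorLayer network works here; a single dense layer keeps the sketch short.
network = tl.layers.InputLayer(x, name='input')
network = tl.layers.DenseLayer(network, n_units=10, act=tf.identity, name='output')

cost = tl.cost.cross_entropy(network.outputs, y_, name='cost')

# AMSGrad follows the usual tf.train optimizer interface.
optimizer = tl.optimizers.AMSGrad(learning_rate=1e-4, beta1=0.9, beta2=0.999, epsilon=1e-8)
train_op = optimizer.minimize(cost, var_list=network.all_params)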

tensorlayer/__init__.py

Lines changed: 10 additions & 5 deletions
@@ -7,23 +7,28 @@
 import tensorflow

 from . import activation
-from .array_ops import alphas, alphas_like
+from . import array_ops
 from . import cost
+from . import distributed
 from . import files
 from . import iterate
 from . import layers
 from . import models
-from . import utils
-from . import visualize
-from . import prepro
 from . import nlp
+from . import optimizers
+from . import prepro
 from . import rein
-from . import distributed
+from . import utils
+from . import visualize

 # alias
 act = activation
 vis = visualize

+alphas = array_ops.alphas
+alphas_like = array_ops.alphas_like
+
+# global vars
 global_flag = {}
 global_dict = {}
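Note that the diff above replaces the direct `from .array_ops import alphas, alphas_like` import with top-level aliases, so both access paths remain valid after this commit. A minimal check, assuming only what the diff shows:

import tensorlayer as tl

# The top-level names are now plain aliases of the array_ops module attributes.
assert tl.alphas is tl.array_ops.alphas
assert tl.alphas_like is tl.array_ops.alphas_like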

tensorlayer/optimizers/__init__.py

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
"""
TensorLayer provides a simple API and tools to ease research and development,
and to reduce the time to production. This package exposes the latest
state-of-the-art optimizers that work with TensorFlow and follow the
``tf.train.Optimizer`` interface.
More information can be found in the `TensorFlow API <https://www.tensorflow.org/versions/master/api_docs/index.html>`__.
"""

from .amsgrad import AMSGrad
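Because the package re-exports the class and `tensorlayer/__init__.py` now imports the package, both import styles below should work; the no-argument construction relies on the defaults defined in `amsgrad.py`.

from tensorlayer.optimizers import AMSGrad   # direct package import
import tensorlayer as tl

opt = AMSGrad()                  # defaults: learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8
opt = tl.optimizers.AMSGrad()    # same class via the top-level namespace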

tensorlayer/optimizers/amsgrad.py

Lines changed: 197 additions & 0 deletions
@@ -0,0 +1,197 @@
"""AMSGrad Implementation based on the paper: "On the Convergence of Adam and Beyond" (ICLR 2018)

Article Link: https://openreview.net/pdf?id=ryQu7f-RZ
Original Implementation by: https://github.com/taki0112/AMSGrad-Tensorflow
"""

from tensorflow.python.eager import context
from tensorflow.python.framework import ops
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import resource_variable_ops
from tensorflow.python.ops import state_ops
from tensorflow.python.ops import variable_scope
from tensorflow.python.training import optimizer


class AMSGrad(optimizer.Optimizer):
    """Implementation of the AMSGrad optimization algorithm.

    See: `On the Convergence of Adam and Beyond - [Reddi et al., 2018] <https://openreview.net/pdf?id=ryQu7f-RZ>`__.

    Parameters
    ----------
    learning_rate: float
        A Tensor or a floating point value. The learning rate.
    beta1: float
        A float value or a constant float tensor.
        The exponential decay rate for the 1st moment estimates.
    beta2: float
        A float value or a constant float tensor.
        The exponential decay rate for the 2nd moment estimates.
    epsilon: float
        A small constant for numerical stability.
        This epsilon is "epsilon hat" in the Kingma and Ba paper
        (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
    use_locking: bool
        If True, use locks for update operations.
    name: str
        Optional name for the operations created when applying gradients.
        Defaults to "AMSGrad".
    """

    def __init__(self, learning_rate=0.01, beta1=0.9, beta2=0.99, epsilon=1e-8, use_locking=False, name="AMSGrad"):
        """Construct a new AMSGrad optimizer."""
        super(AMSGrad, self).__init__(use_locking, name)
        self._lr = learning_rate
        self._beta1 = beta1
        self._beta2 = beta2
        self._epsilon = epsilon

        # Tensor versions of the constructor arguments, created in _prepare().
        self._lr_t = None
        self._beta1_t = None
        self._beta2_t = None
        self._epsilon_t = None

        # Variables holding the running products beta1^t and beta2^t.
        self._beta1_power = None
        self._beta2_power = None

    def _create_slots(self, var_list):
        first_var = min(var_list, key=lambda x: x.name)

        create_new = self._beta1_power is None
        if not create_new and context.in_graph_mode():
            create_new = (self._beta1_power.graph is not first_var.graph)

        if create_new:
            with ops.colocate_with(first_var):
                self._beta1_power = variable_scope.variable(self._beta1, name="beta1_power", trainable=False)
                self._beta2_power = variable_scope.variable(self._beta2, name="beta2_power", trainable=False)

        # Create slots for the first and second moments.
        for v in var_list:
            self._zeros_slot(v, "m", self._name)
            self._zeros_slot(v, "v", self._name)
            self._zeros_slot(v, "vhat", self._name)

    def _prepare(self):
        self._lr_t = ops.convert_to_tensor(self._lr)
        self._beta1_t = ops.convert_to_tensor(self._beta1)
        self._beta2_t = ops.convert_to_tensor(self._beta2)
        self._epsilon_t = ops.convert_to_tensor(self._epsilon)

    def _apply_dense(self, grad, var):
        beta1_power = math_ops.cast(self._beta1_power, var.dtype.base_dtype)
        beta2_power = math_ops.cast(self._beta2_power, var.dtype.base_dtype)
        lr_t = math_ops.cast(self._lr_t, var.dtype.base_dtype)
        beta1_t = math_ops.cast(self._beta1_t, var.dtype.base_dtype)
        beta2_t = math_ops.cast(self._beta2_t, var.dtype.base_dtype)
        epsilon_t = math_ops.cast(self._epsilon_t, var.dtype.base_dtype)

        lr = (lr_t * math_ops.sqrt(1 - beta2_power) / (1 - beta1_power))

        # m_t = beta1 * m + (1 - beta1) * g_t
        m = self.get_slot(var, "m")
        m_scaled_g_values = grad * (1 - beta1_t)
        m_t = state_ops.assign(m, beta1_t * m + m_scaled_g_values, use_locking=self._use_locking)

        # v_t = beta2 * v + (1 - beta2) * (g_t * g_t)
        v = self.get_slot(var, "v")
        v_scaled_g_values = (grad * grad) * (1 - beta2_t)
        v_t = state_ops.assign(v, beta2_t * v + v_scaled_g_values, use_locking=self._use_locking)

        # amsgrad: keep the element-wise maximum of all past v_t
        vhat = self.get_slot(var, "vhat")
        vhat_t = state_ops.assign(vhat, math_ops.maximum(v_t, vhat))
        v_sqrt = math_ops.sqrt(vhat_t)

        var_update = state_ops.assign_sub(var, lr * m_t / (v_sqrt + epsilon_t), use_locking=self._use_locking)
        return control_flow_ops.group(*[var_update, m_t, v_t, vhat_t])

    def _resource_apply_dense(self, grad, var):
        var = var.handle
        beta1_power = math_ops.cast(self._beta1_power, grad.dtype.base_dtype)
        beta2_power = math_ops.cast(self._beta2_power, grad.dtype.base_dtype)
        lr_t = math_ops.cast(self._lr_t, grad.dtype.base_dtype)
        beta1_t = math_ops.cast(self._beta1_t, grad.dtype.base_dtype)
        beta2_t = math_ops.cast(self._beta2_t, grad.dtype.base_dtype)
        epsilon_t = math_ops.cast(self._epsilon_t, grad.dtype.base_dtype)

        lr = (lr_t * math_ops.sqrt(1 - beta2_power) / (1 - beta1_power))

        # m_t = beta1 * m + (1 - beta1) * g_t
        m = self.get_slot(var, "m").handle
        m_scaled_g_values = grad * (1 - beta1_t)
        m_t = state_ops.assign(m, beta1_t * m + m_scaled_g_values, use_locking=self._use_locking)

        # v_t = beta2 * v + (1 - beta2) * (g_t * g_t)
        v = self.get_slot(var, "v").handle
        v_scaled_g_values = (grad * grad) * (1 - beta2_t)
        v_t = state_ops.assign(v, beta2_t * v + v_scaled_g_values, use_locking=self._use_locking)

        # amsgrad
        vhat = self.get_slot(var, "vhat").handle
        vhat_t = state_ops.assign(vhat, math_ops.maximum(v_t, vhat))
        v_sqrt = math_ops.sqrt(vhat_t)

        var_update = state_ops.assign_sub(var, lr * m_t / (v_sqrt + epsilon_t), use_locking=self._use_locking)
        return control_flow_ops.group(*[var_update, m_t, v_t, vhat_t])

    def _apply_sparse_shared(self, grad, var, indices, scatter_add):
        beta1_power = math_ops.cast(self._beta1_power, var.dtype.base_dtype)
        beta2_power = math_ops.cast(self._beta2_power, var.dtype.base_dtype)
        lr_t = math_ops.cast(self._lr_t, var.dtype.base_dtype)
        beta1_t = math_ops.cast(self._beta1_t, var.dtype.base_dtype)
        beta2_t = math_ops.cast(self._beta2_t, var.dtype.base_dtype)
        epsilon_t = math_ops.cast(self._epsilon_t, var.dtype.base_dtype)

        lr = (lr_t * math_ops.sqrt(1 - beta2_power) / (1 - beta1_power))

        # m_t = beta1 * m + (1 - beta1) * g_t
        m = self.get_slot(var, "m")
        m_scaled_g_values = grad * (1 - beta1_t)
        m_t = state_ops.assign(m, m * beta1_t, use_locking=self._use_locking)
        with ops.control_dependencies([m_t]):
            m_t = scatter_add(m, indices, m_scaled_g_values)

        # v_t = beta2 * v + (1 - beta2) * (g_t * g_t)
        v = self.get_slot(var, "v")
        v_scaled_g_values = (grad * grad) * (1 - beta2_t)
        v_t = state_ops.assign(v, v * beta2_t, use_locking=self._use_locking)
        with ops.control_dependencies([v_t]):
            v_t = scatter_add(v, indices, v_scaled_g_values)

        # amsgrad
        vhat = self.get_slot(var, "vhat")
        vhat_t = state_ops.assign(vhat, math_ops.maximum(v_t, vhat))
        v_sqrt = math_ops.sqrt(vhat_t)
        var_update = state_ops.assign_sub(var, lr * m_t / (v_sqrt + epsilon_t), use_locking=self._use_locking)
        return control_flow_ops.group(*[var_update, m_t, v_t, vhat_t])

    def _apply_sparse(self, grad, var):
        return self._apply_sparse_shared(
            grad.values,
            var,
            grad.indices,
            lambda x, i, v: state_ops.scatter_add(  # pylint: disable=g-long-lambda
                x, i, v, use_locking=self._use_locking
            )
        )

    def _resource_scatter_add(self, x, i, v):
        with ops.control_dependencies([resource_variable_ops.resource_scatter_add(x.handle, i, v)]):
            return x.value()

    def _resource_apply_sparse(self, grad, var, indices):
        return self._apply_sparse_shared(grad, var, indices, self._resource_scatter_add)

    def _finish(self, update_ops, name_scope):
        # Update the power accumulators.
        with ops.control_dependencies(update_ops):
            with ops.colocate_with(self._beta1_power):
                update_beta1 = self._beta1_power.assign(
                    self._beta1_power * self._beta1_t, use_locking=self._use_locking
                )
                update_beta2 = self._beta2_power.assign(
                    self._beta2_power * self._beta2_t, use_locking=self._use_locking
                )
        return control_flow_ops.group(*update_ops + [update_beta1, update_beta2], name=name_scope)
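To make the update rule in `_apply_dense` easier to follow, here is a small NumPy reference sketch of one AMSGrad step on a dense variable. It is not part of the commit: `amsgrad_step` is a hypothetical helper, locking and TensorFlow ops are omitted, and `t` stands for the 1-based step count that the optimizer tracks through `beta1_power` / `beta2_power`.

import numpy as np

def amsgrad_step(var, grad, m, v, vhat, t, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """One AMSGrad update mirroring AMSGrad._apply_dense (simplified NumPy sketch)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate (same as Adam)
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate (same as Adam)
    vhat = np.maximum(vhat, v)                  # AMSGrad: effective second moment never decreases
    lr_t = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)   # bias-corrected step size
    var = var - lr_t * m / (np.sqrt(vhat) + eps)
    return var, m, v, vhat

The only difference from Adam is the `vhat` line; keeping the running maximum of the second-moment estimate is exactly the fix proposed in the ICLR 2018 paper.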

tests/test_optimizer_amsgrad.py

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import unittest

import tensorflow as tf
import tensorlayer as tl

try:
    from tests.unittests_helper import CustomTestCase
except ImportError:
    from unittests_helper import CustomTestCase


class Optimizer_AMSGrad_Test(CustomTestCase):

    @classmethod
    def setUpClass(cls):
        cls.x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
        cls.y_ = tf.placeholder(tf.int64, shape=[None], name='y_')

        # define the network
        cls.network = tl.layers.InputLayer(cls.x, name='input')
        cls.network = tl.layers.DropoutLayer(cls.network, keep=0.8, name='drop1')
        cls.network = tl.layers.DenseLayer(cls.network, 800, tf.nn.relu, name='relu1')
        cls.network = tl.layers.DropoutLayer(cls.network, keep=0.5, name='drop2')
        cls.network = tl.layers.DenseLayer(cls.network, 800, tf.nn.relu, name='relu2')
        cls.network = tl.layers.DropoutLayer(cls.network, keep=0.5, name='drop3')

        cls.network = tl.layers.DenseLayer(cls.network, n_units=10, act=tf.identity, name='output')

        # define cost function and metric
        cls.y = cls.network.outputs
        cls.cost = tl.cost.cross_entropy(cls.y, cls.y_, name='cost')

        correct_prediction = tf.equal(tf.argmax(cls.y, 1), cls.y_)

        cls.acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

        # define the optimizer
        train_params = cls.network.all_params
        optimizer = tl.optimizers.AMSGrad(learning_rate=1e-4, beta1=0.9, beta2=0.999, epsilon=1e-8)
        cls.train_op = optimizer.minimize(cls.cost, var_list=train_params)

    @classmethod
    def tearDownClass(cls):
        tf.reset_default_graph()

    def test_training(self):

        with self.assertNotRaises(Exception):

            X_train, y_train, X_val, y_val, _, _ = tl.files.load_mnist_dataset(shape=(-1, 784))

            with tf.Session() as sess:
                # initialize all variables in the session
                tl.layers.initialize_global_variables(sess)

                # print network information
                self.network.print_params()
                self.network.print_layers()

                # train the network
                tl.utils.fit(
                    sess, self.network, self.train_op, self.cost, X_train, y_train, self.x, self.y_, acc=self.acc,
                    batch_size=500, n_epoch=2, print_freq=1, X_val=X_val, y_val=y_val, eval_train=False
                )


if __name__ == '__main__':

    # tf.logging.set_verbosity(tf.logging.INFO)
    tf.logging.set_verbosity(tf.logging.DEBUG)

    unittest.main()

0 commit comments
