@@ -193,16 +193,55 @@ def logp_(z_g):
def advi_minibatch(vars=None, start=None, model=None, n=5000, n_mcsamples=1,
                   minibatch_RVs=None, minibatch_tensors=None, minibatches=None,
                   local_RVs=None, observed_RVs=None, encoder_params=[],
-                  total_size=None, scales=None, optimizer=None, learning_rate=.001,
+                  total_size=None, optimizer=None, learning_rate=.001,
                   epsilon=.1, random_seed=None, verbose=1):
- """Run mini-batch ADVI.
+ """Perform mini-batch ADVI.

- minibatch_tensors and minibatches should be in the same order.
+ This function implements mini-batch ADVI with the mean-field
+ approximation. Autoencoding variational inference is also supported.
+
+ The log probability terms for mini-batches, corresponding to RVs in
+ minibatch_RVs, are scaled by total_size / (the number of samples in each
+ mini-batch), where total_size is an argument giving the total size of the
+ data.
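To make this scaling concrete, here is a minimal sketch in plain NumPy (all names are illustrative, not part of the PyMC3 API): the summed mini-batch log likelihood is rescaled into an unbiased estimate of the full-data term.

import numpy as np

def scaled_minibatch_logp(logp_batch, total_size):
    # Rescale the summed mini-batch log likelihood by
    # total_size / minibatch_size to estimate the full-data term.
    return (float(total_size) / len(logp_batch)) * np.sum(logp_batch)

logp_batch = np.random.randn(100)  # stand-in per-sample log likelihoods
estimate = scaled_minibatch_logp(logp_batch, total_size=10000)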
+
+ minibatch_tensors is a list of tensors (which can be shared variables) to
+ which mini-batch samples are set during the optimization. In most cases,
+ these tensors are observations for RVs in the model.
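For example, a Theano shared variable can hold the current mini-batch and act as the observation tensor; a minimal sketch with illustrative names:

import numpy as np
import theano

data = np.random.randn(10000, 1).astype(theano.config.floatX)
minibatch_size = 100

# Shared variable to which each mini-batch is set during optimization;
# it would be passed as minibatch_tensors=[x_t] and serve as the observed
# data of an RV in the model.
x_t = theano.shared(data[:minibatch_size])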
+
+ local_RVs and observed_RVs are used for autoencoding variational Bayes.
+ Both kinds of RVs are associated with each of the given samples.
+ The difference is that local_RVs are unknown and their posterior
+ distributions are approximated.
+
+ local_RVs is an OrderedDict whose keys are RVs and whose values are
+ tuples of two objects. The first is the Theano expression of the
+ variational parameters (mean and log of the std) of the approximate
+ posterior, which are encoded from the given samples by an arbitrary
+ deterministic function, e.g., an MLP. The other is a scaling constant
+ that multiplies the log probability term corresponding to the RV.
+
+ observed_RVs is also an OrderedDict with RVs as the keys, but its values
+ are only the scaling constants, as in local_RVs. In this case, total_size
+ is ignored.
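Schematically, the two dictionaries might be built as in the sketch below. The names are stand-ins (plain placeholders here, not real RVs), and the exact shape of the variational-parameter expression, assumed here to be a single encoder output holding mean and log-std per sample, follows the description above:

from collections import OrderedDict
import theano.tensor as tt

total_size, minibatch_size = 10000, 100
scale = float(total_size) / minibatch_size

# Stand-ins: in real use, `z` is a latent RV of the model, `x` an
# ObservedRV, and `enc` the encoder's output expression holding the
# variational mean and log-std for each sample in the mini-batch.
z, x = 'z_rv', 'x_rv'
enc = tt.matrix('enc')

local_RVs = OrderedDict([(z, (enc, scale))])
observed_RVs = OrderedDict([(x, scale)])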
+
+ If local_RVs is None (thus no autoencoder is used), the following two
+ settings are equivalent:
+
+ - observed_RVs=OrderedDict([(rv, total_size / minibatch_size)])
+ - minibatch_RVs=[rv], total_size=total_size
+
+ where minibatch_size is minibatch_tensors[0].shape[0].
+
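A self-contained sketch of this equivalence, built only from the signature shown in this diff (the toy model and all names are hypothetical): the two calls below should give the same scaling of the log likelihood.

from collections import OrderedDict
import numpy as np
import theano
import pymc3 as pm

total_size, minibatch_size = 10000, 100
data = np.random.randn(total_size).astype(theano.config.floatX)
x_t = theano.shared(data[:minibatch_size])

def minibatches():
    # Endless stream of mini-batches, one array per tensor in
    # minibatch_tensors.
    while True:
        yield [data[np.random.randint(0, total_size, minibatch_size)]]

with pm.Model():
    mu = pm.Normal('mu', mu=0., sd=1.)
    x = pm.Normal('x', mu=mu, sd=1., observed=x_t)

    # Setting 1: scale via observed_RVs.
    v_params = pm.variational.advi_minibatch(
        n=100, minibatch_tensors=[x_t], minibatches=minibatches(),
        observed_RVs=OrderedDict([(x, float(total_size) / minibatch_size)]))

    # Setting 2: scale via minibatch_RVs and total_size; equivalent.
    v_params = pm.variational.advi_minibatch(
        n=100, minibatch_tensors=[x_t], minibatches=minibatches(),
        minibatch_RVs=[x], total_size=total_size)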
+ The variational parameters and the parameters of the autoencoder are
+ simultaneously optimized with the given optimizer, which is a function
+ that returns a dictionary of parameter updates, as provided to a Theano
+ function. See the docstring of pymc3.variational.advi().
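A minimal sketch of a custom optimizer honoring this contract, using plain SGD in place of the default Adagrad (the handling of a single tensor versus a list of tensors is an assumption, not confirmed by this diff):

from collections import OrderedDict
import theano.tensor as tt

def sgd_optimizer(loss, param):
    # Return a Theano updates dictionary mapping each shared parameter
    # to its plain gradient-descent update.
    params = param if isinstance(param, (list, tuple)) else [param]
    updates = OrderedDict()
    for p in params:
        updates[p] = p - 0.001 * tt.grad(loss, p)
    return updates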

Parameters
----------
vars : object
-    Random variables.
+    List of random variables. If None, variational posteriors (normal
+    distributions) are fit for all RVs in the given model.
start : Dict or None
    Initial values of parameters (variational means).
model : Model
@@ -212,25 +251,36 @@ def advi_minibatch(vars=None, start=None, model=None, n=5000, n_mcsamples=1,
n_mcsamples : int
    Number of Monte Carlo samples to approximate the ELBO.
minibatch_RVs : list of ObservedRVs
-    Random variables for mini-batch.
+    Random variables in the model for which mini-batch tensors are set.
+    When this argument is given, both of the arguments local_RVs and
+    observed_RVs must be None.
minibatch_tensors : list of (tensors or shared variables)
    Tensors used to create ObservedRVs in minibatch_RVs.
minibatches : generator of list
    Generates a set of minibatches when calling next().
    The length of the returned list must be the same as the number of
    tensors in `minibatch_tensors` (see the generator sketch after this
    parameter list).
total_size : int
-    Total size of training samples.
+    Total size of training samples. This is used to appropriately scale
+    the log likelihood terms corresponding to mini-batches in the ELBO.
+ local_RVs : OrderedDict
+    Encoded variational parameters and a scaling constant for the
+    corresponding RV. See the description above.
+ observed_RVs : OrderedDict
+    A scaling constant for the corresponding RV. See the description
+    above.
+ encoder_params : list of Theano shared variables
+    Parameters of the encoder.
optimizer : (loss, tensor) -> dict or OrderedDict
    A function that returns parameter updates given loss and parameter
    tensor. If :code:`None` (default), a default Adagrad optimizer is
    used with parameters :code:`learning_rate` and :code:`epsilon` below.
learning_rate : float
    Base learning rate for Adagrad. This parameter is ignored when
-    optimizer is given.
+    an optimizer is given.
epsilon : float
    Offset in the denominator of the scale of the learning rate in Adagrad.
-    This parameter is ignored when optimizer is given.
+    This parameter is ignored when an optimizer is given.
random_seed : int
    Seed to initialize the random state.
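The minibatches generator contract referenced above, as a minimal runnable sketch (illustrative names; the returned list has one entry per tensor in minibatch_tensors):

import numpy as np

def create_minibatches(data, minibatch_size=100):
    # Yield a list with one array per tensor in minibatch_tensors,
    # drawn uniformly at random from the full data set.
    rng = np.random.RandomState(0)
    while True:
        idx = rng.randint(0, len(data), minibatch_size)
        yield [data[idx]]

minibatches = create_minibatches(np.random.randn(10000, 1))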