Commit c3bc1b8

Update link to Hoffman paper (online VB LDA)
The previous link pointed to Matthew Hoffman's Google Scholar profile, not to the official paper. Use full author names at the first occurrence and "Hoffman et al." afterward.
1 parent 0be9891 commit c3bc1b8

File tree

1 file changed (+18, -16 lines)


gensim/models/ldamodel.py

Lines changed: 18 additions & 16 deletions
@@ -13,9 +13,10 @@
 for online training.

 The core estimation code is based on the `onlineldavb.py script
-<https://github.com/blei-lab/onlineldavb/blob/master/onlineldavb.py>`_, by `Hoffman, Blei, Bach:
+<https://github.com/blei-lab/onlineldavb/blob/master/onlineldavb.py>`_, by
+`Matthew D. Hoffman, David M. Blei, Francis Bach:
 Online Learning for Latent Dirichlet Allocation, NIPS 2010
-<https://scholar.google.com/citations?hl=en&user=IeHKeGYAAAAJ&view_op=list_works>`_.
+<https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_.

 The algorithm:
2122
@@ -199,7 +200,7 @@ def blend(self, rhot, other, targetsize=None):
 The number of documents is stretched in both state objects, so that they are of comparable magnitude.
 This procedure corresponds to the stochastic gradient update from
 `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
-<https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_, see equations (5) and (9).
+<https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_, see equations (5) and (9).

 Parameters
 ----------
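The `blend` update this hunk documents can be sketched in plain Python. This is a simplified illustration under stated assumptions, not gensim's implementation: the real `LdaState.blend` operates on a topics-by-terms matrix of sufficient statistics, while here the statistics are a flat list of floats and the function name `blend_sstats` is hypothetical:

```python
def blend_sstats(rhot, my_sstats, my_numdocs, other_sstats, other_numdocs, targetsize):
    """Weighted merge of two sufficient-statistics vectors.

    Both sides are first stretched to `targetsize` documents so that they
    are of comparable magnitude, then combined as one stochastic gradient
    step: new = (1 - rhot) * self + rhot * other.
    """
    my_scale = targetsize / my_numdocs if my_numdocs else 1.0
    other_scale = targetsize / other_numdocs if other_numdocs else 1.0
    return [
        (1.0 - rhot) * my_scale * mine + rhot * other_scale * theirs
        for mine, theirs in zip(my_sstats, other_sstats)
    ]

# Merge stats from 2 seen documents with stats from a 4-document batch,
# stretching both to a target size of 4 documents, with step size 0.5.
merged = blend_sstats(0.5, [2.0, 4.0], 2, [8.0, 8.0], 4, 4)
```

The stretching step is why the merge stays sane when the two states have seen very different numbers of documents: without it, the larger state would dominate regardless of `rhot`.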
@@ -311,8 +312,9 @@ def load(cls, fname, *args, **kwargs):


 class LdaModel(interfaces.TransformationABC, basemodel.BaseTopicModel):
-"""Train and use Online Latent Dirichlet Allocation (OLDA) models as presented in
-`Hoffman et al. :"Online Learning for Latent Dirichlet Allocation" <https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_.
+"""Train and use Online Latent Dirichlet Allocation models as presented in
+`Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
+<https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_.

 Examples
 -------
@@ -395,12 +397,12 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
 decay : float, optional
     A number between (0.5, 1] to weight what percentage of the previous lambda value is forgotten
     when each new document is examined. Corresponds to Kappa from
-    `Matthew D. Hoffman, David M. Blei, Francis Bach:
-    "Online Learning for Latent Dirichlet Allocation NIPS'10" <https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_.
+    `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
+    <https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_.
 offset : float, optional
     Hyper-parameter that controls how much we will slow down the first steps the first few iterations.
-    Corresponds to Tau_0 from `Matthew D. Hoffman, David M. Blei, Francis Bach:
-    "Online Learning for Latent Dirichlet Allocation NIPS'10" <https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_.
+    Corresponds to Tau_0 from `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
+    <https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_.
 eval_every : int, optional
     Log perplexity is estimated every that many updates. Setting this to one slows down training by ~2x.
 iterations : int, optional
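The `decay` (Kappa) and `offset` (Tau_0) hyper-parameters documented in this hunk enter the paper's learning-rate schedule rho_t = (Tau_0 + t) ** (-Kappa). A minimal sketch; the helper name `rho` is ours, not gensim's:

```python
def rho(t, offset=1.0, decay=0.5):
    """Step size for mini-batch number t: rho_t = (offset + t) ** -decay.

    A larger `offset` shrinks the earliest step sizes, so the first few
    noisy mini-batches perturb the topics less; a larger `decay` makes
    the step sizes die off faster overall.
    """
    return (offset + t) ** (-decay)

# With offset=1.0 and decay=0.5 the first step has weight 1.0 and
# later steps shrink monotonically toward zero.
rates = [rho(t, offset=1.0, decay=0.5) for t in range(5)]
```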
@@ -643,7 +645,7 @@ def inference(self, chunk, collect_sstats=False):
 """Given a chunk of sparse document vectors, estimate gamma (parameters controlling the topic weights)
 for each document in the chunk.

-This function does not modify the model The whole input chunk of document is assumed to fit in RAM;
+This function does not modify the model. The whole input chunk of document is assumed to fit in RAM;
 chunking of a large corpus must be done earlier in the pipeline. Avoids computing the `phi` variational
 parameter directly using the optimization presented in
 `Lee, Seung: Algorithms for non-negative matrix factorization"
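The optimization this docstring mentions, as used in Hoffman et al.'s onlineldavb, updates gamma without materializing the full phi matrix: only phi's per-word normalizer is needed. A condensed pure-Python sketch for a single dense document, not gensim's code (the function names are ours, gensim's `inference` is vectorized over a whole chunk with NumPy, and we plug topic-word probabilities directly in place of exp(E[log beta]) to keep the example small):

```python
import math

def digamma(x):
    # Psi via the recurrence psi(x) = psi(x + 1) - 1/x and an asymptotic
    # series once x >= 6; accurate to ~1e-8, enough for this sketch.
    r = 0.0
    while x < 6.0:
        r -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return r + math.log(x) - 0.5 / x - f * (1.0 / 12 - f * (1.0 / 120 - f / 252))

def infer_gamma(counts, topic_word, alpha=0.1, iterations=100):
    """Fixed-point update of one document's gamma; phi is never stored.

    `counts[w]` is the count of term w in the document; `topic_word[t][w]`
    stands in for exp(E[log beta]) of the real algorithm. Only phi's
    per-word normalizer `phinorm` is ever computed.
    """
    num_topics = len(topic_word)
    num_terms = len(counts)
    gamma = [1.0] * num_topics  # variational Dirichlet parameter
    for _ in range(iterations):
        s = digamma(sum(gamma))
        # exp(E[log theta]) under q(theta) = Dirichlet(gamma)
        exp_elog_theta = [math.exp(digamma(g) - s) for g in gamma]
        phinorm = [
            sum(exp_elog_theta[t] * topic_word[t][w] for t in range(num_topics)) + 1e-100
            for w in range(num_terms)
        ]
        gamma = [
            alpha + exp_elog_theta[t] * sum(
                counts[w] * topic_word[t][w] / phinorm[w] for w in range(num_terms)
            )
            for t in range(num_topics)
        ]
    return gamma

# A document dominated by term 0, under two 3-term topics: topic 0 should win.
g = infer_gamma([5.0, 0.0, 1.0], [[0.9, 0.05, 0.05], [0.05, 0.05, 0.9]])
```

A useful invariant to sanity-check the update: after every iteration, sum(gamma) equals alpha * num_topics + total word count (up to the 1e-100 smoothing), because the phi responsibilities for each word sum to one.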
@@ -863,8 +865,8 @@ def update(self, corpus, chunksize=None, decay=None, offset=None,
 This update also supports updating an already trained model with new documents; the two models are then merged
 in proportion to the number of old vs. new documents. This feature is still experimental for non-stationary
 input streams. For stationary input (no topic drift in new documents), on the other hand, this equals the
-online update of `Matthew D. Hoffman, David M. Blei, Francis Bach:
-"Online Learning for Latent Dirichlet Allocation NIPS'10" <https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_.
+online update of `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
+<https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_
 and is guaranteed to converge for any `decay` in (0.5, 1.0). Additionally, for smaller corpus sizes, an
 increasing `offset` may be beneficial (see Table 1 in the same paper).
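The convergence guarantee for `decay` in (0.5, 1.0) mentioned in this hunk is the classic Robbins-Monro condition: the step sizes rho_t = (offset + t) ** -decay must sum to infinity (so the iterates can move arbitrarily far from any starting point) while their squares sum to a finite value (so the gradient noise averages out). A quick numeric check; the helper name `rho_partial_sums` is ours:

```python
def rho_partial_sums(decay, offset=1.0, steps=100_000):
    """Partial sums of rho_t and rho_t ** 2 for rho_t = (offset + t) ** -decay."""
    s1 = s2 = 0.0
    for t in range(steps):
        r = (offset + t) ** (-decay)
        s1 += r
        s2 += r * r
    return s1, s2

# For decay = 0.7: the plain sum roughly doubles when run 10x longer
# (it diverges like steps ** 0.3), while the sum of squares has almost
# stopped growing (it converges, since 2 * 0.7 > 1).
small = rho_partial_sums(0.7, steps=10_000)
large = rho_partial_sums(0.7, steps=100_000)
```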
@@ -878,12 +880,12 @@ def update(self, corpus, chunksize=None, decay=None, offset=None,
 decay : float, optional
     A number between (0.5, 1] to weight what percentage of the previous lambda value is forgotten
     when each new document is examined. Corresponds to Kappa from
-    `Matthew D. Hoffman, David M. Blei, Francis Bach:
-    "Online Learning for Latent Dirichlet Allocation NIPS'10" <https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_.
+    `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
+    <https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_.
 offset : float, optional
     Hyper-parameter that controls how much we will slow down the first steps the first few iterations.
-    Corresponds to Tau_0 from `Matthew D. Hoffman, David M. Blei, Francis Bach:
-    "Online Learning for Latent Dirichlet Allocation NIPS'10" <https://www.di.ens.fr/~fbach/mdhnips2010.pdf>`_.
+    Corresponds to Tau_0 from `Hoffman et al. :"Online Learning for Latent Dirichlet Allocation"
+    <https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf>`_.
 passes : int, optional
     Number of passes through the corpus during training.
 update_every : int, optional
