Commit 2513620

Merge pull request numpy#20350 from WarrenWeckesser/zipf-example
DOC: random: Fix a mistake in the zipf example.
2 parents e564fcb + abb136c commit 2513620

2 files changed: 42 additions, 25 deletions

numpy/random/_generator.pyx

Lines changed: 21 additions & 13 deletions
@@ -3107,7 +3107,7 @@ cdef class Generator:
 `a` > 1.

 The Zipf distribution (also known as the zeta distribution) is a
-continuous probability distribution that satisfies Zipf's law: the
+discrete probability distribution that satisfies Zipf's law: the
 frequency of an item is inversely proportional to its rank in a
 frequency table.
@@ -3135,9 +3135,10 @@ cdef class Generator:
 -----
 The probability density for the Zipf distribution is

-.. math:: p(x) = \\frac{x^{-a}}{\\zeta(a)},
+.. math:: p(k) = \\frac{k^{-a}}{\\zeta(a)},

-where :math:`\\zeta` is the Riemann Zeta function.
+for integers :math:`k \geq 1`, where :math:`\\zeta` is the Riemann Zeta
+function.

 It is named for the American linguist George Kingsley Zipf, who noted
 that the frequency of any word in a sample of a language is inversely
@@ -3153,22 +3154,29 @@ cdef class Generator:
 --------
 Draw samples from the distribution:

->>> a = 2. # parameter
->>> s = np.random.default_rng().zipf(a, 1000)
+>>> a = 4.0
+>>> n = 20000
+>>> s = np.random.default_rng().zipf(a, size=n)

 Display the histogram of the samples, along with
-the probability density function:
+the expected histogram based on the probability
+density function:

 >>> import matplotlib.pyplot as plt
->>> from scipy import special # doctest: +SKIP
+>>> from scipy.special import zeta # doctest: +SKIP
+
+`bincount` provides a fast histogram for small integers.

-Truncate s values at 50 so plot is interesting:
+>>> count = np.bincount(s)
+>>> k = np.arange(1, s.max() + 1)

->>> count, bins, ignored = plt.hist(s[s<50],
-... 50, density=True)
->>> x = np.arange(1., 50.)
->>> y = x**(-a) / special.zetac(a) # doctest: +SKIP
->>> plt.plot(x, y/max(y), linewidth=2, color='r') # doctest: +SKIP
+>>> plt.bar(k, count[1:], alpha=0.5, label='sample count')
+>>> plt.plot(k, n*(k**-a)/zeta(a), 'k.-', alpha=0.5,
+... label='expected count') # doctest: +SKIP
+>>> plt.semilogy()
+>>> plt.grid(alpha=0.4)
+>>> plt.legend()
+>>> plt.title(f'Zipf sample, a={a}, size={n}')
 >>> plt.show()

 """

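Note on the normalizing constant: the removed lines divided by `special.zetac(a)`, which SciPy documents as the Riemann zeta function minus 1, while the formula in the Notes section uses ζ(a) itself. The following is a minimal standard-library sketch (not part of this commit) of how the two constants differ. It assumes `a = 4` so that the closed form ζ(4) = π⁴/90 can stand in for `scipy.special.zeta`; the helper name `partial_mass` is hypothetical.

```python
import math

a = 4.0
zeta_a = math.pi ** 4 / 90   # closed form for zeta(4); only valid for a = 4
zetac_a = zeta_a - 1.0       # the value scipy.special.zetac(4) represents


def partial_mass(norm, terms=10_000):
    """Sum k**-a / norm over k = 1..terms (hypothetical helper)."""
    return sum(k ** -a / norm for k in range(1, terms + 1))


total_with_zeta = partial_mass(zeta_a)
total_with_zetac = partial_mass(zetac_a)

print(f"sum using zeta(a):     {total_with_zeta:.6f}")
print(f"sum using zeta(a) - 1: {total_with_zetac:.6f}")
```

The first sum should come out very close to 1, as a probability mass function requires; the second should land well above 1.
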
numpy/random/mtrand.pyx

Lines changed: 21 additions & 12 deletions
@@ -3609,7 +3609,7 @@ cdef class RandomState:
 `a` > 1.

 The Zipf distribution (also known as the zeta distribution) is a
-continuous probability distribution that satisfies Zipf's law: the
+discrete probability distribution that satisfies Zipf's law: the
 frequency of an item is inversely proportional to its rank in a
 frequency table.
@@ -3642,9 +3642,10 @@ cdef class RandomState:
 -----
 The probability density for the Zipf distribution is

-.. math:: p(x) = \\frac{x^{-a}}{\\zeta(a)},
+.. math:: p(k) = \\frac{k^{-a}}{\\zeta(a)},

-where :math:`\\zeta` is the Riemann Zeta function.
+for integers :math:`k \geq 1`, where :math:`\\zeta` is the Riemann Zeta
+function.

 It is named for the American linguist George Kingsley Zipf, who noted
 that the frequency of any word in a sample of a language is inversely
@@ -3660,21 +3661,29 @@ cdef class RandomState:
 --------
 Draw samples from the distribution:

->>> a = 2. # parameter
->>> s = np.random.zipf(a, 1000)
+>>> a = 4.0
+>>> n = 20000
+>>> s = np.random.zipf(a, n)

 Display the histogram of the samples, along with
-the probability density function:
+the expected histogram based on the probability
+density function:

 >>> import matplotlib.pyplot as plt
->>> from scipy import special # doctest: +SKIP
+>>> from scipy.special import zeta # doctest: +SKIP
+
+`bincount` provides a fast histogram for small integers.

-Truncate s values at 50 so plot is interesting:
+>>> count = np.bincount(s)
+>>> k = np.arange(1, s.max() + 1)

->>> count, bins, ignored = plt.hist(s[s<50], 50, density=True)
->>> x = np.arange(1., 50.)
->>> y = x**(-a) / special.zetac(a) # doctest: +SKIP
->>> plt.plot(x, y/max(y), linewidth=2, color='r') # doctest: +SKIP
+>>> plt.bar(k, count[1:], alpha=0.5, label='sample count')
+>>> plt.plot(k, n*(k**-a)/zeta(a), 'k.-', alpha=0.5,
+... label='expected count') # doctest: +SKIP
+>>> plt.semilogy()
+>>> plt.grid(alpha=0.4)
+>>> plt.legend()
+>>> plt.title(f'Zipf sample, a={a}, size={n}')
 >>> plt.show()

 """

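For readers without matplotlib or SciPy, the comparison that the revised example plots can also be checked numerically. This is a rough sketch, not part of the commit. It assumes the `Generator` API from the first file rather than the legacy `np.random.zipf`, an arbitrary seed chosen only for repeatability, and the closed form ζ(4) = π⁴/90 in place of `scipy.special.zeta` (valid only because `a = 4`).

```python
import math

import numpy as np

a = 4.0
n = 20000
rng = np.random.default_rng(12345)  # arbitrary seed, an assumption for repeatability
s = rng.zipf(a, size=n)

# bincount returns counts for 0..s.max(); Zipf samples start at 1,
# so count[0] is zero and count[1:] lines up with k.
count = np.bincount(s)
k = np.arange(1, s.max() + 1)

zeta_a = math.pi ** 4 / 90          # closed form for zeta(4) only
expected = n * k ** -a / zeta_a     # expected count for each integer k

# Print observed vs. expected counts for the first few values of k.
for rank, (obs, exp) in enumerate(zip(count[1:6], expected[:5]), start=1):
    print(f"k={rank}: observed={obs}, expected={exp:.1f}")
```

Observed and expected counts should agree closely for small `k` and scatter more in the sparse tail, which is why the docstring example switches to a log-scaled y axis.
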