
Commit d3aceef

Update docs for API changes
1 parent c39e7b3 commit d3aceef


2 files changed: 17 additions, 15 deletions


docs/api.rst

Lines changed: 7 additions & 10 deletions
@@ -1,12 +1,16 @@
 WordSegment API Reference
 =========================

+`WordSegment`_ API reference.
+
+.. _`WordSegment`: http://www.grantjenks.com/docs/wordsegment/
+
 .. py:function:: clean(text)
    :module: wordsegment

    Return `text` lower-cased with non-alphanumeric characters removed.

-.. py:function:: divide(text, limit=24)
+.. py:function:: divide(text)
    :module: wordsegment

    Yield (prefix, suffix) pairs from `text` with len(prefix) not
@@ -36,18 +40,11 @@ WordSegment API Reference
    :module: wordsegment

    Mapping of (unigram, count) pairs.
-   Loaded from the file 'wordsegment_data/unigrams.txt'.
+   Loaded from the file 'wordsegment/unigrams.txt'.

 .. py:data:: BIGRAMS
    :module: wordsegment

    Mapping of (bigram, count) pairs.
    Bigram keys are joined by a space.
-   Loaded from the file 'wordsegment_data/bigrams.txt'.
-
-.. py:data:: TOTAL
-   :module: wordsegment
-
-   Total number of unigrams in the corpus.
-   Need not match `sum(UNIGRAMS.values())`.
-   Defaults to 1,024,908,267,229.
+   Loaded from the file 'wordsegment/bigrams.txt'.
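
A hedged sketch of the two documented helpers after this change, used as api.rst now documents them; the return values in the comments are illustrative, not verbatim output:

    import wordsegment

    # clean(): lower-case the text and drop non-alphanumeric characters.
    print(wordsegment.clean("Can't segment THIS!"))  # 'cantsegmentthis'

    # divide(): the limit= keyword is gone from the public signature;
    # the prefix-length cap is no longer a caller-facing parameter.
    print(list(wordsegment.divide('ab')))  # [('a', 'b'), ('ab', '')]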

docs/using-a-different-corpus.rst

Lines changed: 10 additions & 5 deletions
@@ -33,6 +33,7 @@ dictionaries: ``wordsegment.clean``, ``wordsegment.BIGRAMS`` and
 .. code:: python

     import wordsegment
+    wordsegment.load()

 .. code:: python

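This one-line addition is the behavioral change driving the doc update: the unigram and bigram mappings now start out empty until `load()` reads the bundled data files, so it must run before segmenting. A hedged sanity check:

    import wordsegment

    wordsegment.load()  # reads the bundled unigrams.txt / bigrams.txt files
    print(len(wordsegment.UNIGRAMS), len(wordsegment.BIGRAMS))  # non-zero once loaded
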
@@ -75,7 +76,8 @@ Now we'll build our dictionaries.

     from collections import Counter

-    wordsegment.UNIGRAMS = Counter(tokenize(text))
+    wordsegment.UNIGRAMS.clear()
+    wordsegment.UNIGRAMS.update(Counter(tokenize(text)))

     def pairs(iterable):
         iterator = iter(iterable)
@@ -85,7 +87,8 @@ Now we'll build our dictionaries.
             yield ' '.join(values)
             del values[0]

-    wordsegment.BIGRAMS = Counter(pairs(tokenize(text)))
+    wordsegment.BIGRAMS.clear()
+    wordsegment.BIGRAMS.update(Counter(pairs(tokenize(text))))

 That's it.

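Both hunks make the same substitution: mutate the existing dictionaries with clear()/update() rather than rebinding the module attributes. A self-contained sketch of the rebuilt flow; tokenize() and the toy text are hypothetical stand-ins for the corpus setup defined earlier in this doc, and pairs() is reconstructed from the context lines above:

    import re
    from collections import Counter

    import wordsegment

    def tokenize(text):
        # Hypothetical stand-in for the doc's tokenizer.
        return re.findall('[a-z0-9]+', text.lower())

    text = 'all happy families are alike'  # placeholder corpus

    def pairs(iterable):
        # Yield space-joined bigrams: 'all happy', 'happy families', ...
        iterator = iter(iterable)
        values = [next(iterator)]
        for value in iterator:
            values.append(value)
            yield ' '.join(values)
            del values[0]

    # Mutating in place matters: the segmenter holds references to these
    # dict objects, so plain assignment would leave it using stale counts.
    wordsegment.UNIGRAMS.clear()
    wordsegment.UNIGRAMS.update(Counter(tokenize(text)))
    wordsegment.BIGRAMS.clear()
    wordsegment.BIGRAMS.update(Counter(pairs(tokenize(text))))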

@@ -97,10 +100,12 @@ input to ``segment``.

 .. code:: python

+    from wordsegment import _segmenter
+
     def identity(value):
         return value

-    wordsegment.clean = identity
+    _segmenter.clean = identity

 .. code:: python

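The new import explains the switch: `wordsegment.clean` is a reference to the shared segmenter's method, so rebinding the module attribute would not change what `segment` calls internally; the override has to land on the instance itself. A hedged sketch:

    import wordsegment
    from wordsegment import _segmenter

    wordsegment.load()

    def identity(value):
        return value

    # Patch the instance attribute; segment() resolves clean via the
    # Segmenter instance, not via the module namespace.
    _segmenter.clean = identity

    print(wordsegment.segment('wantofawife'))  # ['want', 'of', 'a', 'wife']
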
@@ -111,12 +116,12 @@ input to ``segment``.
     ['want', 'of', 'a', 'wife']

 If you find this behaves poorly then you may need to change the
-``wordsegment.TOTAL`` variable to reflect the total of all unigrams. In
+``_segmenter.total`` variable to reflect the total of all unigrams. In
 our case that's simply:

 .. code:: python

-    wordsegment.TOTAL = float(sum(wordsegment.UNIGRAMS.values()))
+    _segmenter.total = float(sum(wordsegment.UNIGRAMS.values()))

 WordSegment doesn't require any fancy machine learning training
 algorithms. Simply update the unigram and bigram count dictionaries and
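
The same instance-versus-module distinction applies here: the old `TOTAL` module constant is gone (see the api.rst hunk above), and the denominator used for unigram probabilities now lives on the segmenter as `total`. A hedged end-to-end check after rebuilding the counts:

    import wordsegment
    from wordsegment import _segmenter

    wordsegment.load()  # or rebuild UNIGRAMS/BIGRAMS in place as shown earlier

    # Keep the probability denominator consistent with the current counts.
    _segmenter.total = float(sum(wordsegment.UNIGRAMS.values()))

    print(wordsegment.segment('wantofawife'))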
