@@ -33,6 +33,7 @@ dictionaries: ``wordsegment.clean``, ``wordsegment.BIGRAMS`` and
.. code:: python

    import wordsegment
+    wordsegment.load()

.. code:: python

@@ -75,7 +76,8 @@ Now we'll build our dictionaries.

    from collections import Counter

-    wordsegment.UNIGRAMS = Counter(tokenize(text))
+    wordsegment.UNIGRAMS.clear()
+    wordsegment.UNIGRAMS.update(Counter(tokenize(text)))

    def pairs(iterable):
        iterator = iter(iterable)
@@ -85,7 +87,8 @@ Now we'll build our dictionaries.
            yield ' '.join(values)
            del values[0]

-    wordsegment.BIGRAMS = Counter(pairs(tokenize(text)))
+    wordsegment.BIGRAMS.clear()
+    wordsegment.BIGRAMS.update(Counter(pairs(tokenize(text))))

That's it.

@@ -97,10 +100,12 @@ input to ``segment``.

.. code:: python

+    from wordsegment import _segmenter
+
    def identity(value):
        return value

-    wordsegment.clean = identity
+    _segmenter.clean = identity

.. code:: python

@@ -111,12 +116,12 @@ input to ``segment``.
    ['want', 'of', 'a', 'wife']

If you find this behaves poorly then you may need to change the
- ``wordsegment.TOTAL`` variable to reflect the total of all unigrams. In
+ ``_segmenter.total`` variable to reflect the total of all unigrams. In
our case that's simply:

.. code:: python

-    wordsegment.TOTAL = float(sum(wordsegment.UNIGRAMS.values()))
+    _segmenter.total = float(sum(wordsegment.UNIGRAMS.values()))

WordSegment doesn't require any fancy machine learning training
algorithms. Simply update the unigram and bigram count dictionaries and
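The switch from rebinding ``wordsegment.UNIGRAMS`` to calling ``clear()`` and ``update()`` matters because other code may already hold a reference to the original dictionary; rebinding the module attribute leaves that reference pointing at stale counts. A minimal stdlib sketch of the difference (the names ``unigrams`` and ``segmenter_ref`` are illustrative, not the library's internals):

```python
from collections import Counter

# Stand-ins for wordsegment.UNIGRAMS and the segmenter's reference to it.
unigrams = Counter({'old': 1})
segmenter_ref = unigrams

# Rebinding the name creates a brand-new object: the segmenter's
# reference still sees the stale counts.
unigrams = Counter({'fresh': 2})
assert segmenter_ref == Counter({'old': 1})

# Mutating the shared object in place, as the revised docs do,
# is visible through every reference to it.
unigrams = segmenter_ref           # back to the shared object
unigrams.clear()
unigrams.update({'fresh': 2})
assert segmenter_ref == Counter({'fresh': 2})
```

This is why the diff replaces assignment with in-place mutation for both ``UNIGRAMS`` and ``BIGRAMS``.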