
Commit 6254e82: More copyright updates and other minor changes
(1 parent: 48b169a)

File tree: 3 files changed, +12 -8 lines


README.rst

Lines changed: 4 additions & 4 deletions
@@ -1,9 +1,6 @@
 Python Word Segmentation
 ========================
 
-.. image:: https://api.travis-ci.org/grantjenks/wordsegment.svg
-    :target: http://www.grantjenks.com/blog/portfolio-post/english-word-segmentation-python/
-
 `WordSegment`_ is an Apache2 licensed module for English word
 segmentation, written in pure-Python, and based on a trillion-word corpus.
 
@@ -35,6 +32,9 @@ Features
 - Developed on Python 2.7
 - Tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy 2.5+, PyPy3 2.4+
 
+.. image:: https://api.travis-ci.org/grantjenks/wordsegment.svg
+    :target: http://www.grantjenks.com/docs/wordsegment/
+
 Quickstart
 ----------
 
@@ -135,7 +135,7 @@ Reference and Indices
 WordSegment License
 -------------------
 
-Copyright 2015 Grant Jenks
+Copyright 2016 Grant Jenks
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

docs/index.rst

Lines changed: 6 additions & 2 deletions
@@ -1,8 +1,8 @@
 Python Word Segmentation
 ========================
 
-Python WordSegment is an Apache2 licensed module for English word segmentation,
-written in pure-Python, and based on a trillion-word corpus.
+`WordSegment`_ is an Apache2 licensed module for English word
+segmentation, written in pure-Python, and based on a trillion-word corpus.
 
 Based on code from the chapter "`Natural Language Corpus Data`_" by Peter
 Norvig from the book "`Beautiful Data`_" (Segaran and Hammerbacher, 2009).
@@ -14,6 +14,7 @@ data. The unigram data includes only the most common 333,000 words. Similarly,
 bigram data includes only the most common 250,000 phrases. Every word and
 phrase is lowercased with punctuation removed.
 
+.. _`WordSegment`: http://www.grantjenks.com/docs/wordsegment/
 .. _`Natural Language Corpus Data`: http://norvig.com/ngrams/
 .. _`Beautiful Data`: http://oreilly.com/catalog/9780596157111/
 .. _`Google Web Trillion Word Corpus`: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
@@ -31,6 +32,9 @@ Features
 - Developed on Python 2.7
 - Tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy 2.5+, PyPy3 2.4+
 
+.. image:: https://api.travis-ci.org/grantjenks/wordsegment.svg
+    :target: http://www.grantjenks.com/docs/wordsegment/
+
 Quickstart
 ----------
 

docs/using-a-different-corpus.rst

Lines changed: 2 additions & 2 deletions
@@ -45,12 +45,12 @@ dictionaries: ``wordsegment.clean``, ``wordsegment.BIGRAMS`` and
 .. code:: python
 
     print wordsegment.UNIGRAMS.items()[:3]
-    print wordsegment.BIGRAMS.items()[:3]
+    print wordsegment.BIGRAMS.items()[:2]
 
 .. parsed-literal::
 
     [('biennials', 37548.0), ('verplank', 48349.0), ('tsukino', 19771.0)]
-    [('personal effects', 151369.0), ('basic training', 294085.0), ('it absolutely', 130505.0)]
+    [('personal effects', 151369.0), ('basic training', 294085.0)]
 
 Ok, so ``wordsegment.UNIGRAMS`` is just a dictionary mapping
 unigrams to their counts. Let's write a method to tokenize our text.
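The prose around this last hunk describes ``wordsegment.UNIGRAMS`` as a plain dictionary mapping words to counts. A minimal sketch of how such counts can drive segmentation, in the spirit of the Norvig chapter the docs cite; the counts, the unknown-word penalty, and the ``score``/``segment`` names here are illustrative stand-ins, not the module's actual data or API:

```python
from functools import lru_cache

# Made-up counts on the rough order of the real unigram data (illustrative only).
UNIGRAMS = {'this': 4.0e9, 'is': 4.7e9, 'a': 9.1e9, 'test': 6.8e7}
TOTAL = 1024908267229.0  # token count of the Google trillion-word corpus

def score(word):
    """Unigram probability; unknown words are penalized by length."""
    if word in UNIGRAMS:
        return UNIGRAMS[word] / TOTAL
    return 10.0 / (TOTAL * 10 ** len(word))

@lru_cache(maxsize=None)
def segment(text):
    """Return the most probable split of `text` as a tuple of words."""
    if not text:
        return ()
    def prob(words):
        p = 1.0
        for w in words:
            p *= score(w)
        return p
    # Try every first-word prefix; memoization makes the recursion cheap.
    candidates = [(text[:i],) + segment(text[i:]) for i in range(1, len(text) + 1)]
    return max(candidates, key=prob)

print(list(segment('thisisatest')))  # ['this', 'is', 'a', 'test']
```

The length-based penalty is what keeps the model from leaving long unknown runs unsplit: a single 11-character unknown word scores far worse than four common known words.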
