
Commit 6254e82: More copyright updates and other minor changes
(1 parent: 48b169a)

File tree: 3 files changed, +12 -8 lines


README.rst

Lines changed: 4 additions & 4 deletions
@@ -1,9 +1,6 @@
 Python Word Segmentation
 ========================
 
-.. image:: https://api.travis-ci.org/grantjenks/wordsegment.svg
-    :target: http://www.grantjenks.com/blog/portfolio-post/english-word-segmentation-python/
-
 `WordSegment`_ is an Apache2 licensed module for English word
 segmentation, written in pure-Python, and based on a trillion-word corpus.
 
@@ -35,6 +32,9 @@ Features
 - Developed on Python 2.7
 - Tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy 2.5+, PyPy3 2.4+
 
+.. image:: https://api.travis-ci.org/grantjenks/wordsegment.svg
+    :target: http://www.grantjenks.com/docs/wordsegment/
+
 Quickstart
 ----------
 
@@ -135,7 +135,7 @@ Reference and Indices
 WordSegment License
 -------------------
 
-Copyright 2015 Grant Jenks
+Copyright 2016 Grant Jenks
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.

docs/index.rst

Lines changed: 6 additions & 2 deletions
@@ -1,8 +1,8 @@
 Python Word Segmentation
 ========================
 
-Python WordSegment is an Apache2 licensed module for English word segmentation,
-written in pure-Python, and based on a trillion-word corpus.
+`WordSegment`_ is an Apache2 licensed module for English word
+segmentation, written in pure-Python, and based on a trillion-word corpus.
 
 Based on code from the chapter "`Natural Language Corpus Data`_" by Peter
 Norvig from the book "`Beautiful Data`_" (Segaran and Hammerbacher, 2009).
@@ -14,6 +14,7 @@ data. The unigram data includes only the most common 333,000 words. Similarly,
 bigram data includes only the most common 250,000 phrases. Every word and
 phrase is lowercased with punctuation removed.
 
+.. _`WordSegment`: http://www.grantjenks.com/docs/wordsegment/
 .. _`Natural Language Corpus Data`: http://norvig.com/ngrams/
 .. _`Beautiful Data`: http://oreilly.com/catalog/9780596157111/
 .. _`Google Web Trillion Word Corpus`: http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
@@ -31,6 +32,9 @@ Features
 - Developed on Python 2.7
 - Tested on CPython 2.6, 2.7, 3.2, 3.3, 3.4 and PyPy 2.5+, PyPy3 2.4+
 
+.. image:: https://api.travis-ci.org/grantjenks/wordsegment.svg
+    :target: http://www.grantjenks.com/docs/wordsegment/
+
 Quickstart
 ----------
 

docs/using-a-different-corpus.rst

Lines changed: 2 additions & 2 deletions
@@ -45,12 +45,12 @@ dictionaries: ``wordsegment.clean``, ``wordsegment.BIGRAMS`` and
 .. code:: python
 
     print wordsegment.UNIGRAMS.items()[:3]
-    print wordsegment.BIGRAMS.items()[:3]
+    print wordsegment.BIGRAMS.items()[:2]
 
 .. parsed-literal::
 
     [('biennials', 37548.0), ('verplank', 48349.0), ('tsukino', 19771.0)]
-    [('personal effects', 151369.0), ('basic training', 294085.0), ('it absolutely', 130505.0)]
+    [('personal effects', 151369.0), ('basic training', 294085.0)]
 
 Ok, so ``wordsegment.UNIGRAMS`` is just a dictionary mapping
 unigrams to their counts. Let's write a method to tokenize our text.
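The prose around this last hunk describes ``wordsegment.UNIGRAMS`` as a plain dictionary mapping words to counts. A minimal sketch of how such counts can drive segmentation, in the spirit of the Norvig chapter the docs cite; the counts, the unknown-word penalty, and the ``score``/``segment`` names here are illustrative stand-ins, not the module's actual data or API:

```python
from functools import lru_cache

# Made-up counts on the rough order of the real unigram data (illustrative only).
UNIGRAMS = {'this': 4.0e9, 'is': 4.7e9, 'a': 9.1e9, 'test': 6.8e7}
TOTAL = 1024908267229.0  # token count of the Google trillion-word corpus

def score(word):
    """Unigram probability; unknown words are penalized by length."""
    if word in UNIGRAMS:
        return UNIGRAMS[word] / TOTAL
    return 10.0 / (TOTAL * 10 ** len(word))

@lru_cache(maxsize=None)
def segment(text):
    """Return the most probable split of `text` as a tuple of words."""
    if not text:
        return ()
    def prob(words):
        p = 1.0
        for w in words:
            p *= score(w)
        return p
    # Try every first-word prefix; memoization makes the recursion cheap.
    candidates = [(text[:i],) + segment(text[i:]) for i in range(1, len(text) + 1)]
    return max(candidates, key=prob)

print(list(segment('thisisatest')))  # ['this', 'is', 'a', 'test']
```

The length-based penalty is what keeps the model from leaving long unknown runs unsplit: a single 11-character unknown word scores far worse than four common known words.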
