Commit 000918b

Updated utils
remove_stopwords now takes a string instead of a list of words and tokenizes that string using NLTK's word_tokenize.
1 parent e054212 commit 000918b

4 files changed: +4 -7 lines changed

.DS_Store

0 Bytes
Binary file not shown.

examples/ytpy.py

Lines changed: 2 additions & 6 deletions
@@ -1,11 +1,6 @@
 import couchdb
 
 from sys import argv,path
-
-
-
-
-#initialization
 path.append('/Volumes/My Book/Dropbox/ToxTweet/Software/APIs/ytpy/lib')
 path.append('/Volumes/My Book/Dropbox/ToxTweet/Software/APIs/ytpy/examples')
 
@@ -54,7 +49,8 @@ def save(video,format='txt'):
 
 data = YouTubeSearch(drug)
 
-def update_database
+def update_database():
+    pass
 
 server = couchdb.Server()

lib/utils.py

Lines changed: 2 additions & 1 deletion
@@ -160,8 +160,9 @@ def clean(tweet,keepTags = False):
 
 def remove_stopwords(utterance, languages=['english','spanish','french']):
     #languages is a list
+    utterance = word_tokenize(utterance)
     from nltk.corpus import stopwords
-    my_stopwords = map(lambda word: word.rstrip('\n'),open('stopwords','rb').readlines())
+    my_stopwords = map(lambda word: word.rstrip('\n'),open('/Volumes/My Book/Dropbox/ToxTweet/Software/APIs/ytpy/constants/stopwords','rb').readlines())
     allowed_languages = {'english','spanish','french'} #This is a set
     #Serial processing is least obfuscated
     for language in set(languages).intersection(allowed_languages):
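
Note on the lib/utils.py change: the following is a minimal Python 3 sketch of the behavior the commit message describes (accept a raw string, tokenize it, then filter out stopwords). It is not the repository's code. `simple_tokenize` is a hypothetical regex stand-in for NLTK's `word_tokenize`, used only so the sketch runs without NLTK data, and it will not split text exactly as NLTK does. `load_stopwords` and its `path` argument are likewise assumed names; they illustrate passing the stopword file location in rather than hard-coding an absolute path. The sketch reads the file in text mode and builds a set, since in Python 3 `map` returns a one-shot iterator and `'rb'` mode yields bytes.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in for nltk.word_tokenize (assumption, not NLTK).

    Splits into runs of word characters and single punctuation marks.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def load_stopwords(path):
    """Read one stopword per line from `path` (an assumed, caller-supplied path).

    The `with` block closes the file, and a set gives fast membership tests.
    """
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def remove_stopwords(utterance, stopwords, tokenize=simple_tokenize):
    """Tokenize a raw string and drop tokens whose lowercase form is a stopword."""
    return [token for token in tokenize(utterance) if token.lower() not in stopwords]
```

With NLTK installed and its tokenizer data downloaded, `tokenize=nltk.word_tokenize` could be passed in place of the stand-in; the caller would then supply the stopword set from `load_stopwords` or from `nltk.corpus.stopwords`.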

lib/utils.pyc

91 Bytes
Binary file not shown.
