Running Sentences One at a Time Gives a Different Result than Batching Them #10
Open
Description
Running a list of tweets in self.data through sentiment_score one at a time gives different results than passing the same data through sentiment_scores_of_sents in batches of 25 or 100.
It's not just floating-point noise, either. I ran a list of 9068 tweets: the largest difference between the single and batched scores was 0.9579128974724299, and 161 tweets had scores that differed by more than 0.5!
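For reference, the two statistics above (largest difference, and how many tweets differ by more than 0.5) can be computed with a small helper like the one below. This is a minimal sketch: `compare_scores` is a hypothetical name, not part of the library, and it assumes both score lists are plain floats in the same order as the input tweets.

```python
from typing import Sequence, Tuple


def compare_scores(
    single: Sequence[float],
    batched: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[float, int]:
    """Return (largest absolute difference, count of items whose difference exceeds threshold)."""
    # Scores are compared position by position, so both lists must line up one-to-one.
    if len(single) != len(batched):
        raise ValueError(f"length mismatch: {len(single)} vs {len(batched)}")
    diffs = [abs(a - b) for a, b in zip(single, batched)]
    largest = max(diffs, default=0.0)
    over = sum(1 for d in diffs if d > threshold)
    return largest, over


if __name__ == "__main__":
    # Made-up toy scores, for illustration only.
    single = [0.10, 0.80, -0.30]
    batched = [0.10, -0.20, -0.25]
    print(compare_scores(single, batched))
```

The length check matters: if the two lists are not aligned, every later position is compared against the wrong tweet and the differences become meaningless.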
Code for running them one at a time:
```python
out = []
for tweet in self.data:
    out.append(sentiment_score(tweet))
```
Code for running the data in batches:
```python
out = []
for batch in self.batch_data:
    out.extend(sentiment_scores_of_sents(batch))
```
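One way to make the two runs directly comparable is to score each batch both ways in the same loop and check that the batch call returns exactly one score per sentence. The sketch below assumes only that the single scorer takes a string and the batch scorer takes a list of strings; `score_both_ways` is a hypothetical helper, and the lambdas in the usage example are stand-ins, not the real `sentiment_score` / `sentiment_scores_of_sents`.

```python
from typing import Callable, List, Sequence, Tuple


def score_both_ways(
    batches: Sequence[Sequence[str]],
    score_one: Callable[[str], float],
    score_batch: Callable[[List[str]], Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Score the same sentences one at a time and per batch; return (single, batched)."""
    single: List[float] = []
    batched: List[float] = []
    for batch in batches:
        batch = list(batch)
        # One-at-a-time path over exactly the sentences in this batch.
        single.extend(score_one(s) for s in batch)
        scores = list(score_batch(batch))
        # The comparison only makes sense if the batch call returns one score per input, in order.
        if len(scores) != len(batch):
            raise ValueError(f"batch of {len(batch)} sentences returned {len(scores)} scores")
        batched.extend(scores)
    return single, batched


if __name__ == "__main__":
    # Stand-in scorers (hypothetical): score = sentence length.
    batches = [["good", "bad day"], ["ok"]]
    single, batched = score_both_ways(
        batches,
        score_one=lambda s: float(len(s)),
        score_batch=lambda b: [float(len(s)) for s in b],
    )
    print(single, batched)
```

With real scorers plugged in, any position where `single` and `batched` disagree points at a specific tweet and batch, which makes the discrepancy easier to reproduce.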
Code for batching the data:
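```python
temp_list = []
count = 0  # number of tweets in the current batch
for x in cls.data:
    if count >= 25:
        # Current batch is full: store it and start a new one.
        cls.batch_data.append(temp_list)
        temp_list = []
        count = 0
    temp_list.append(x)
    count += 1
# Keep the final, possibly shorter, batch.
if len(temp_list) > 0:
    cls.batch_data.append(temp_list)
```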
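The same batching can be written with list slicing, which avoids tracking a separate counter. This is a sketch, not the project's code: `make_batches` is a hypothetical name, and it assumes the input is a sequence (such as a list) that supports slicing.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[T], size: int = 25) -> List[List[T]]:
    """Split items into consecutive batches of `size`; the last batch may be shorter."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    batches = make_batches(list(range(60)), 25)
    print([len(b) for b in batches])
```

Flattening the result gives back the original list in order, so the batched scores can be compared position by position with the one-at-a-time scores.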