chapters/3.SentimentAnalysis/emotion.qmd
Lines changed: 10 additions & 11 deletions
@@ -3,7 +3,7 @@ title: "Emotion Detection"
editor: visual
---
-Emotion detection is another NLP technique aimed at identifying and quantifying human emotions expressed in text, which builds directly on traditional sentiment polarity analysis focusing on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotions—one might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.
+Emotion detection is another NLP technique aimed at identifying and quantifying human emotions expressed in text, which builds directly on traditional sentiment polarity analysis focusing on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotions: one might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.
We will use the `syuzhet` package ([more info](https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html)) to help us classify emotions detected in our dataset. The name “syuzhet” is inspired by the work of Russian Formalists Victor Shklovsky and Vladimir Propp, who distinguished between two aspects of a narrative: the fabula and the syuzhet. The fabula represents the chronological sequence of events, while the syuzhet refers to the way these events are presented or structured: the narrative’s technique or “device.” In other words, syuzhet focuses on how the story (fabula) is organized and conveyed to the audience.
@@ -22,18 +22,17 @@ You may explore NRC's lexicon Tableau dashboard to explore words associated with
Now that we have a better understanding of this package, let's get back to business and perform emotion detection on our data.
-####Emotion Detection with Syuzhet's NRC Lexicon
+### Emotion Detection with Syuzhet's NRC Lexicon
-#####Detecting Emotions per Comment/Sentence
+#### Detecting Emotions per Comment/Sentence
```r
sentences<- get_sentences(comments$comments)
```
-#####Compute Emotion Scores per Sentence
+#### Compute Emotion Scores per Sentence
```r
emotion_score<- get_nrc_sentiment(sentences)
@@ -43,7 +42,7 @@ The `get_nrc_sentiment()` function assigns emotion and sentiment scores (based o

-#####Review Summary of Emotion Scores
+#### Review Summary of Emotion Scores
Let's now compute basic statistics (min, max, mean, etc.) for each emotion column and get an overview of how frequent or strong each emotion is on our example dataset.
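A minimal sketch of this step, assuming the scores are a data frame with one numeric column per NRC category (the shape `get_nrc_sentiment()` returns): base R's `summary()` reports quartiles, mean, min and max per column, and `sapply()` can collect just the statistics named above. The `emotion_score` values here are hypothetical toy numbers, not results from the chapter's dataset.

```r
# Hypothetical toy stand-in for get_nrc_sentiment() output:
# one row per sentence, one count column per NRC category
emotion_score <- data.frame(
  anger = c(0, 2, 1), anticipation = c(1, 0, 0), disgust = c(0, 1, 0),
  fear = c(0, 1, 3), joy = c(2, 0, 0), sadness = c(0, 2, 1),
  surprise = c(1, 0, 0), trust = c(1, 0, 2),
  negative = c(0, 3, 2), positive = c(3, 0, 1)
)

# Quartiles, median, mean, min and max for every column
summary(emotion_score)

# Only min, max and mean: one row per statistic, one column per emotion
emotion_stats <- sapply(emotion_score, function(x) {
  c(min = min(x), max = max(x), mean = mean(x))
})
round(emotion_stats, 3)

# Emotions ordered from highest to lowest average score
sort(emotion_stats["mean", ], decreasing = TRUE)
```

Averages show how common an emotion is across sentences, while the max column flags the few sentences with extreme scores, which is why the two can rank emotions differently.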
@@ -59,7 +58,7 @@ Based on the results the overall emotion in these comments leans heavily toward
On the flip side, **Disgust** was the rarest emotion, with the lowest average (0.145). It's also worth noting that while Sadness and Trust are the most *common*, a few comments really went off the rails with **Trust (47.000), Anger (44.000)**, and **Fear (37.000)**, hitting the highest extreme scores.
-#####Regroup with comments and IDs
+#### Regroup with comments and IDs
After computing scores for emotions, we want to link them back to their **original comment and ID**.
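A minimal base R sketch of the regrouping, assuming a hypothetical `comments` data frame with `id` and `comments` columns and a score table with exactly one row per comment, in the same order. That assumption matters: `get_sentences()` can split one comment into several sentences, so if scores were computed per sentence the row counts may not line up and binding by position would pair the wrong rows or fail. The check below makes that mismatch an explicit error.

```r
# Hypothetical comments table: one ID and one text per row
comments <- data.frame(
  id = c("s1_001", "s1_002", "s2_001"),
  comments = c("What a finale!", "The pacing felt slow.", "I was terrified and thrilled."),
  stringsAsFactors = FALSE
)

# Hypothetical per-comment scores (in the chapter these come from get_nrc_sentiment())
emotion_score <- data.frame(anger = c(0, 1, 0), fear = c(0, 0, 1), joy = c(1, 0, 1))

# cbind() pairs rows by position, so first confirm one score row per comment
stopifnot(nrow(comments) == nrow(emotion_score))

# ID and text first, followed by one column per emotion
emotion_data <- cbind(comments, emotion_score)
head(emotion_data)
```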
@@ -103,7 +102,7 @@ ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +

-#####Add a “Season” Variable (Grouping) and Summarize
+#### Add a “Season” Variable (Grouping) and Summarize
Let's now add a new column called `season` by looking at the ID pattern (for example, `s1_` means season 1 and `s2_` means season 2). This makes it easy to compare the emotional tone across seasons.
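A minimal base R sketch of this grouping step, assuming IDs of the form `s<number>_...`; the column names, values and the `Season N` labels are hypothetical. `sub()` extracts the season number from the ID prefix, and `aggregate()` averages each emotion column within a season. Note that `sub()` returns non-matching IDs unchanged, so it is worth checking the resulting `season` values.

```r
# Hypothetical scored comments; IDs follow the s<season>_<n> pattern
emotion_data <- data.frame(
  id = c("s1_001", "s1_002", "s2_001", "s2_002"),
  anger = c(0, 2, 1, 0),
  joy = c(2, 0, 0, 1),
  stringsAsFactors = FALSE
)

# "s1_001" -> "Season 1", "s2_002" -> "Season 2"; non-matching IDs are left as-is
emotion_data$season <- sub("^s([0-9]+)_.*$", "Season \\1", emotion_data$id)

# Mean score of each emotion within each season
emotion_cols <- c("anger", "joy")
season_means <- aggregate(
  emotion_data[, emotion_cols],
  by = list(season = emotion_data$season),
  FUN = mean
)
season_means
```

Averaging (rather than summing) keeps seasons with different numbers of comments comparable.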
chapters/3.SentimentAnalysis/introduction.qmd
Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ Our analysis pipeline will follow a two-step approach. First, we will compute ba
Let’s start by installing and loading the necessary packages, then bringing in the cleaned dataset so we can begin our sentiment analysis. We will discuss the role of each package in the next episodes.
```r
-# Install packages (remove comments for packages you might have skipped)
+# Install packages (remove comments for packages you might have skipped in previous episodes)
chapters/3.SentimentAnalysis/polarity.qmd
Lines changed: 10 additions & 10 deletions
@@ -23,21 +23,21 @@ Words like “but,” “however,” and “although” also influence the senti
With this approach, we can explore more confidently whether the show’s viewers felt positive, neutral, or negative about it.
-####Computing Polarity with Sentiment R (Valence Sifters Capability)
+### Computing Polarity with Sentiment R (Valence Shifters Capability)
-#####Calculating sentiment scores
+#### Calculating sentiment scores
-Here we’re using the **`sentiment_by()`** function which looks at each comment and calculates a **sentiment score** representing how positive or negative comments are.
+Here we’re using the **`sentiment_by()`** function which looks at each comment and calculates a **sentiment score** representing how positive or negative comments are. Let's enter the following code to select all the values contained in the comments column:
So after running this, we get a new object called `sentiment_scores` with the average sentiment for every comment. Can you guess why the SD column is empty? A single data point (sentence/row) does not have a standard deviation by itself.
-#####Adding those scores back to our dataset
+#### Adding those scores back to our dataset
Now we’re using the **`dplyr`** package to make our dataset more informative. We take our `comments` dataset, and with **`mutate()`**, we add two new columns: `score` and `sentiment label`. The little rule inside **`case_when()`** decides what label to give. The small buffer around zero (±0.1) helps us avoid overreacting to tiny fluctuations.
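A minimal sketch of that labelling rule with `dplyr`, using hypothetical comments and a hypothetical `avg_scores` vector in place of the `sentiment_by()` averages. The exact boundaries are an assumption: here scores strictly above 0.1 are positive, strictly below -0.1 are negative, and everything in between (including exactly ±0.1) is neutral. The label column is named `sentiment_label` because a name containing a space would need backticks in R.

```r
library(dplyr)

# Hypothetical comments and their average sentiment scores
comments <- data.frame(
  comments = c("Loved it", "It was fine", "Dragged on", "Okay I guess"),
  stringsAsFactors = FALSE
)
avg_scores <- c(0.45, 0.05, -0.30, -0.10)

comments_labeled <- comments %>%
  mutate(
    score = avg_scores,
    # case_when() checks conditions top to bottom; TRUE catches whatever is left
    sentiment_label = case_when(
      score > 0.1  ~ "positive",
      score < -0.1 ~ "negative",
      TRUE         ~ "neutral"   # the buffer zone around zero
    )
  )

comments_labeled
```

The last row sits exactly on the -0.1 boundary and falls through to `"neutral"`; switching to `<=`/`>=` would move such scores out of the buffer.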
@@ -55,17 +55,17 @@ Let's now take a look at the `sentiment_scores` data frame:

-To get a sense of the overall mood of our dataset let's run:
+To get a sense of the overall "mood" of our dataset let's run:
Overall, the majority of viewers reacted positively to the show, with positive opinions more than double the negative ones, indicating a generally favorable reception. However, this is only part of the story: positive sentiment can range from mildly favorable to very enthusiastic. To better visualize the full distribution of opinions, a histogram is presented below.
-####Plotting Scores
+### Plotting Scores
Next, let's plot some results and histograms to check the distribution for the scores:
-This histogram suggests that the overall sentiment toward the Severance show was **mostly neutral to slightly positive**. This suggests that most viewers are expressing their opinions in a **measured, nuanced, or factual** manner, rather than with intense emotional language (either extremely positive or negative).
+This histogram suggests that the overall sentiment toward the Severance show was **mostly neutral to slightly positive**. In other words, most viewers are expressing their opinions without using intense emotional language (either extremely positive or negative).
We can also break the data down by season to compare how audience opinions vary over each season finale:
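The season comparison can be sketched in base R with `table()` and `prop.table()`: first the overall share of each label, then a season-by-label cross-tabulation converted to row proportions so that seasons with different numbers of comments are comparable. The IDs, labels and `Season N` naming are hypothetical, and this is one possible approach rather than necessarily the chapter's own code.

```r
# Hypothetical labeled comments; IDs follow the s<season>_<n> pattern
labeled <- data.frame(
  id = c("s1_001", "s1_002", "s1_003", "s2_001", "s2_002"),
  sentiment_label = c("positive", "neutral", "positive", "negative", "positive"),
  stringsAsFactors = FALSE
)
labeled$season <- sub("^s([0-9]+)_.*$", "Season \\1", labeled$id)

# Overall mood: count and share of each label
overall <- table(labeled$sentiment_label)
round(prop.table(overall), 2)

# Per season: counts, then row proportions (each season's row sums to 1)
by_season <- table(season = labeled$season, label = labeled$sentiment_label)
season_shares <- prop.table(by_season, margin = 1)
round(season_shares, 2)
```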