Commit aafb70f

Merge pull request #56 from UCSB-Library-Research-Data-Services/renata
improving formatting
2 parents cbc022a + 0542e95 commit aafb70f

4 files changed: 23 additions and 24 deletions

_quarto.yml

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ website:
 - section: "Sentiment Analysis"
 contents:
 - href: chapters/3.SentimentAnalysis/introduction.qmd
-text: Introduction to Sentiment Analysis
+text: What is Sentiment Analysis?
 - href: chapters/3.SentimentAnalysis/polarity.qmd
 text: Polarity Classification
 - href: chapters/3.SentimentAnalysis/emotion.qmd

chapters/3.SentimentAnalysis/emotion.qmd

Lines changed: 10 additions & 11 deletions
@@ -3,7 +3,7 @@ title: "Emotion Detection"
 editor: visual
 ---

-Emotion detection is another NLP technique aimed at identifying and quantifying human emotions expressed in text, which builds directly on traditional sentiment polarity analysis focusing on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotionsone might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.
+Emotion detection is another NLP technique aimed at identifying and quantifying human emotions expressed in text, which builds directly on traditional sentiment polarity analysis focusing on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotions: one might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.

 We will use the `syuzhet` package ([more info](https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html)) to help us classify emotions detected in our dataset. The name “syuzhet” is inspired by the work of Russian Formalists Victor Shklovsky and Vladimir Propp, who distinguished between two aspects of a narrative: the fabula and the syuzhet. The fabula represents the chronological sequence of events, while the syuzhet refers to the way these events are presented or structured; the narrative’s technique or “device.” In other words, syuzhet focuses on how the story (fabula) is organized and conveyed to the audience.

@@ -22,18 +22,17 @@ You may explore NRC's lexicon Tableau dashboard to explore words associated with
 ```{=html}
 <iframe width="780" height="500" src="https://public.tableau.com/views/NRC-Emotion-Lexicon-viz1/NRCEmotionLexicon-viz1?:embed=y&:loadOrderID=0&:display_count=no&:showVizHome=no" title="NRC Lexicon Interactive Visualization"></iframe>
 ```
-
 Now that we have a better understanding of this package, let's get back to business and perform emotion detection on our data.

-#### Emotion Detection with Syuzhet's NRC Lexicon
+### Emotion Detection with Syuzhet's NRC Lexicon

-##### Detecting Emotions per Comment/Sentence
+#### Detecting Emotions per Comment/Sentence

 ``` r
 sentences <- get_sentences(comments$comments)
 ```

-##### Compute Emotion Scores per Sentence
+#### Compute Emotion Scores per Sentence

 ``` r
 emotion_score <- get_nrc_sentiment(sentences)
@@ -43,7 +42,7 @@ The `get_nrc_sentiment()` function assigns emotion and sentiment scores (based o

 ![](images/emotions_scores-dataframe.png)

-##### Review Summary of Emotion Scores
+#### Review Summary of Emotion Scores

 Let's now compute basic statistics (min, max, mean, etc.) for each emotion column and get an overview of how frequent or strong each emotion is in our example dataset.
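The lesson's own code for this step falls outside the diff context, so the following is only a minimal sketch of how such per-column statistics could be obtained with base R. The `emotion_score` values and the reduced set of columns are invented for illustration; the name `emotion_means` is hypothetical.

```r
# Hypothetical stand-in for the data frame returned by get_nrc_sentiment();
# the values below are invented for illustration only.
emotion_score <- data.frame(
  anger = c(0, 2, 1),
  joy   = c(1, 0, 4),
  trust = c(3, 1, 2)
)

# summary() reports min, quartiles, median, mean, and max for each column
summary(emotion_score)

# Column means on their own, handy for ranking emotions by average score
emotion_means <- sapply(emotion_score, mean)
```

Sorting `emotion_means` would then give a quick view of which emotions are most and least common on average.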

@@ -59,7 +58,7 @@ Based on the results the overall emotion in these comments leans heavily toward

 On the flip side, **Disgust** was the rarest emotion, with the lowest average (0.145). It's also worth noting that while Sadness and Trust are the most *common*, a few comments really went off the rails with **Trust (47.000), Anger (44.000)**, and **Fear (37.000)**, hitting the highest extreme scores.

-##### Regroup with comments and IDs
+#### Regroup with comments and IDs

 After computing scores for emotions, we want to link them back to their **original comment and ID**.

@@ -70,7 +69,7 @@ emotion_data <- bind_cols(comments, emotion_score)

 `bind_cols()` merges the original `comments` data frame with the new `emotion_score` table.

-##### Summarize Emotion Counts Across All Sentences
+#### Summarize Emotion Counts Across All Sentences

 Now, let's count **how many times each emotion appears** overall.
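The counting step mentioned in this hunk is only partly visible in the diff (the next hunk header shows it starts with `emotion_summary <- emotion_data %>%`). One possible way to build such a table is sketched here, assuming `dplyr` and `tidyr` are installed; the toy `emotion_data` values and the two emotion columns are invented, and the lesson's actual pipeline may differ.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for emotion_data (hypothetical IDs and scores)
emotion_data <- data.frame(
  id    = c("s1_0001", "s1_0002", "s2_0001"),
  anger = c(1, 0, 2),
  joy   = c(0, 3, 1)
)

# Sum each emotion column, then reshape to one row per emotion
emotion_summary <- emotion_data %>%
  summarise(across(c(anger, joy), sum)) %>%
  pivot_longer(everything(), names_to = "emotion", values_to = "count")
```

The long format (one `emotion` column, one `count` column) matches what the `ggplot()` call in the following hunk expects for `aes(x = emotion, y = count)`.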

@@ -88,7 +87,7 @@ emotion_summary <- emotion_data %>%

 ![](images/emotion-counts.png){width="194"}

-##### Plot the Overall Emotion Distribution
+#### Plot the Overall Emotion Distribution

 ``` r
 ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +
@@ -103,7 +102,7 @@ ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +

 ![](images/barchart-emotions.png)

-##### Add a “Season” Variable (Grouping) and Summarize
+#### Add a “Season” Variable (Grouping) and Summarize

 Let's now add a new column called `season` by looking at the ID pattern: for example, `s1_` means season 1 and `s2_` means season 2. This makes it easy to compare the emotional tone across seasons.
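The code that derives `season` is not part of the visible hunk, so here is a minimal sketch of the idea under stated assumptions: the ID column is assumed to be named `id`, the season labels are invented, and the toy data is hypothetical. Only the `s1_` / `s2_` prefixes and the `emotion_seasons` name come from the diff.

```r
library(dplyr)

# Toy stand-in for emotion_data; the `id` column name is an assumption
emotion_data <- data.frame(
  id  = c("s1_0001", "s1_0002", "s2_0001"),
  joy = c(1, 0, 2)
)

# Derive the season from the ID prefix; unmatched IDs become NA
emotion_seasons <- emotion_data %>%
  mutate(
    season = case_when(
      startsWith(id, "s1_") ~ "Season 1",
      startsWith(id, "s2_") ~ "Season 2",
      TRUE ~ NA_character_
    )
  )
```

Keeping an explicit fallback to `NA` makes IDs that follow neither pattern easy to spot before grouping by season.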

@@ -124,7 +123,7 @@ emotion_by_season <- emotion_seasons %>%
 )
 ```

-##### Plotting the Data
+#### Plotting the Data

 Comparing emotions by season:

chapters/3.SentimentAnalysis/introduction.qmd

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: "Introduction to Sentiment Analysis"
+title: "What is Sentiment Analysis?"
 ---

 Now that we have completed all the key preprocessing steps and our example dataset is in much better shape, we can finally proceed with sentiment analysis.
@@ -23,7 +23,7 @@ Our analysis pipeline will follow a two-step approach. First, we will compute ba
 Let’s start by installing and loading the necessary packages, then bringing in the cleaned dataset so we can begin our sentiment analysis. We will discuss the role of each package in the next episodes.

 ``` r
-# Install packages (remove comments for packages you might have skipped)
+# Install packages (remove comments for packages you might have skipped in previous episodes)
 install.packages("sentimentr")
 install.packages("syuzhet")
 # install.packages("dplyr")

chapters/3.SentimentAnalysis/polarity.qmd

Lines changed: 10 additions & 10 deletions
@@ -23,21 +23,21 @@ Words like “but,” “however,” and “although” also influence the senti

 With this approach, we can explore more confidently whether the show’s viewers felt positive, neutral, or negative about it.

-#### Computing Polarity with Sentiment R (Valence Shifters Capability)
+### Computing Polarity with Sentiment R (Valence Shifters Capability)

-##### Calculating sentiment scores
+#### Calculating sentiment scores

-Here we’re using the **`sentiment_by()`** function which looks at each comment and calculates a **sentiment score** representing how positive or negative comments are.
+Here we’re using the **`sentiment_by()`** function which looks at each comment and calculates a **sentiment score** representing how positive or negative comments are. Let's enter the following code to select all the values contained in the comments column:

 ``` r
 sentiment_scores <- sentiment_by(comments$comments)
 ```

-![Sentiment Scores Output](images/sentiment-scores.png){width="342"}
+![Sentiment Scores Output](images/sentiment-scores.png){width="418"}

 So after running this, we get a new object called `sentiment_scores` with the average sentiment for every comment. Can you guess why the SD column is empty? A single data point (sentence/row) does not have a standard deviation by itself.

-##### Adding those scores back to our dataset
+#### Adding those scores back to our dataset

 Now we’re using the **`dplyr`** package to make our dataset more informative. We take our `comments` dataset, and with **`mutate()`**, we add two new columns: `score` and `sentiment_label`. The little rule inside **`case_when()`** decides what label to give. The small buffer around zero (±0.1) helps us avoid overreacting to tiny fluctuations.
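The labeling code itself sits outside the visible diff context. As a hedged sketch of the rule described above, the following assumes strict inequalities at ±0.1 and uses an invented `scores` vector in place of `sentiment_scores$ave_sentiment`; the toy `comments` rows are hypothetical, and the lesson's actual thresholds or column handling may differ.

```r
library(dplyr)

# Toy stand-in for the lesson's comments data (hypothetical rows)
comments <- data.frame(
  id       = c("s1_0001", "s1_0002", "s2_0001"),
  comments = c("Loved the finale.", "That was awful.", "It aired on Friday.")
)

# Invented scores; in the lesson these would come from sentiment_by()
scores <- c(0.45, -0.30, 0.05)

# Add the score and a label, treating anything within +/-0.1 as neutral
polarity <- comments %>%
  mutate(
    score = scores,
    sentiment_label = case_when(
      score >  0.1 ~ "positive",
      score < -0.1 ~ "negative",
      TRUE         ~ "neutral"
    )
  )
```

With a buffer like this, a score of 0.05 is labeled neutral instead of being counted as a weak positive.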

@@ -55,17 +55,17 @@ Let's now take a look at the `sentiment_scores` data frame:

 ![Sentiment Scores with Polarity Results](images/polarity-scores.png)

-To get a sense of the overall mood of our dataset let's run:
+To get a sense of the overall "mood" of our dataset let's run:

 ``` r
 table(polarity$sentiment_label)
 ```

-![Overall Polarity Count](images/overall-polarity-count.png){width="363"}
+![Polarity Count](images/overall-polarity-count.png){width="363"}

 Overall, the majority of viewers reacted positively to the show, with positive opinions more than double the negative ones, indicating a generally favorable reception. However, this is only part of the story: positive sentiment can range from mildly favorable to very enthusiastic. To better visualize the full distribution of opinions, a histogram is presented below.

-#### Plotting Scores
+### Plotting Scores

 Next, let's plot some results and histograms to check the distribution of the scores:

@@ -77,9 +77,9 @@ ggplot(polarity, aes(x = score)) +
 labs(title = "Sentiment Score Distribution", x = "Average Sentiment", y = "Count")
 ```

-![Polarity Distribution](images/histogram-polarity.png){width="529"}
+![Polarity Distribution](images/histogram-polarity.png){width="699"}

-This histogram suggests that the overall sentiment toward the Severance show was **mostly neutral to slightly positive**. This suggests that most viewers are expressing their opinions in a **measured, nuanced, or factual** manner, rather than with intense emotional language (either extremely positive or negative).
+This histogram suggests that the overall sentiment toward the Severance show was **mostly neutral to slightly positive**. This suggests that most viewers are expressing their opinions without using intense emotional language (either extremely positive or negative).

 We can also break the data down by season to compare how audience opinions vary over each season finale:
