Merge pull request #56 from UCSB-Library-Research-Data-Services/renata

jairomelo · web-flow · commit aafb70f92148 · 2025-11-13T16:49:10.000-08:00
improving formatting
diff --git a/_quarto.yml b/_quarto.yml
@@ -62,7 +62,7 @@ website:
       - section: "Sentiment Analysis"
         contents:
           - href: chapters/3.SentimentAnalysis/introduction.qmd
-            text: Introduction to Sentiment Analysis
+            text: What is Sentiment Analysis?
           - href: chapters/3.SentimentAnalysis/polarity.qmd
             text: Polarity Classification
           - href: chapters/3.SentimentAnalysis/emotion.qmd
diff --git a/chapters/3.SentimentAnalysis/emotion.qmd b/chapters/3.SentimentAnalysis/emotion.qmd
@@ -3,7 +3,7 @@ title: "Emotion Detection"
 editor: visual
 ---
 
-Emotion detection is another NLP technique aimed at identifying and quantifying human emotions expressed in text, which builds directly on traditional sentiment polarity analysis focusing on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotions—one might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.
+Emotion detection is another NLP technique, aimed at identifying and quantifying human emotions expressed in text. It builds directly on traditional sentiment polarity analysis but focuses on capturing more nuanced emotional states. While polarity classification identifies whether a text expresses positive, negative, or neutral sentiment, it does not capture the specific type of emotion behind that sentiment. For example, two negative texts could express very different emotions: one might convey anger, while another reflects sadness. By extending polarity into multiple emotional dimensions, emotion detection provides more granular and more actionable insights into how people truly feel.
 
 We will use the `syuzhet` package ([more info](https://cran.r-project.org/web/packages/syuzhet/vignettes/syuzhet-vignette.html)) to help us classify emotions detected in our dataset. The name “syuzhet” is inspired by the work of Russian Formalists Victor Shklovsky and Vladimir Propp, who distinguished between two aspects of a narrative: the fabula and the syuzhet. The fabula represents the chronological sequence of events, while the syuzhet refers to the way these events are presented or structured, that is, the narrative’s technique or “device.” In other words, syuzhet focuses on how the story (fabula) is organized and conveyed to the audience.
 
@@ -22,18 +22,17 @@ You may explore NRC's lexicon Tableau dashboard to explore words associated with
 ```{=html}
 <iframe width="780" height="500" src="https://public.tableau.com/views/NRC-Emotion-Lexicon-viz1/NRCEmotionLexicon-viz1?:embed=y&:loadOrderID=0&:display_count=no&:showVizHome=no" title="NRC Lexicon Interactive Visualization"></iframe>
 ```
-
 Now that we have a better understanding of this package, let's get back to business and perform emotion detection on our data.
 
-#### Emotion Detection with Syuzhet's NRC Lexicon
+### Emotion Detection with Syuzhet's NRC Lexicon
 
-##### Detecting Emotions per Comment/Sentence
+#### Detecting Emotions per Comment/Sentence
 
 ``` r
 sentences <- get_sentences(comments$comments)
 ```
 
-##### Compute Emotion Scores per Sentence
+#### Compute Emotion Scores per Sentence
 
 ``` r
 emotion_score <- get_nrc_sentiment(sentences)
@@ -43,7 +42,7 @@ The `get_nrc_sentiment()` function assigns emotion and sentiment scores (based o
 
 ![](images/emotions_scores-dataframe.png)
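 
 To see what these two steps produce, here is a minimal sketch on three made-up sentences (the `toy_text` string is a hypothetical stand-in for `comments$comments`). The calls are namespaced with `syuzhet::` because `sentimentr` also exports a function named `get_sentences()`, so the version you get without the prefix depends on package load order.
 
 ``` r
 library(syuzhet)
 
 # Hypothetical example text, not taken from the workshop dataset
 toy_text <- "I love this show. The finale made me so angry. It was a sad ending."
 
 # Split the text into a character vector of sentences
 toy_sentences <- syuzhet::get_sentences(toy_text)
 
 # One row per sentence; columns are the eight NRC emotions plus negative/positive
 toy_scores <- syuzhet::get_nrc_sentiment(toy_sentences)
 
 length(toy_sentences)
 colnames(toy_scores)
 ```
 
 Each cell counts how many words in that sentence match the NRC lexicon for that emotion, so longer sentences can reach higher scores.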
 
-##### Review Summary of Emotion Scores
+#### Review Summary of Emotion Scores
 
 Let's now compute basic statistics (min, max, mean, etc.) for each emotion column to get an overview of how frequent or strong each emotion is in our example dataset.
 
@@ -59,7 +58,7 @@ Based on the results the overall emotion in these comments leans heavily toward
 
 On the flip side, **Disgust** was the rarest emotion, with the lowest average (0.145). It's also worth noting that while Sadness and Trust are the most *common*, a few comments really went off the rails with **Trust (47.000), Anger (44.000)**, and **Fear (37.000)**, hitting the highest extreme scores.
 
-##### Regroup with comments and IDs
+#### Regroup with comments and IDs
 
 After computing the emotion scores, we want to link each row back to its **original comment and ID**.
 
@@ -70,7 +69,7 @@ emotion_data <- bind_cols(comments, emotion_score)
 
 `bind_cols()` merges the original `comments` data frame with the new `emotion_score` table.
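 
 `bind_cols()` matches rows purely by position, so it assumes both tables have the same number of rows in the same order. A quick check before binding can catch the case where sentence splitting produced more rows than there are comments. The sketch below uses small hypothetical data frames (`toy_comments`, `toy_scores`, and an assumed `id` column) rather than the workshop objects.
 
 ``` r
 library(dplyr)
 
 # Hypothetical stand-ins for `comments` and `emotion_score`
 toy_comments <- data.frame(id = c("s1_001", "s2_001"),
                            comments = c("Loved it.", "So sad."))
 toy_scores <- data.frame(joy = c(1, 0), sadness = c(0, 1))
 
 # bind_cols() pairs rows by position, so the row counts must agree
 stopifnot(nrow(toy_comments) == nrow(toy_scores))
 
 toy_emotion_data <- bind_cols(toy_comments, toy_scores)
 toy_emotion_data
 ```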
 
-##### Summarize Emotion Counts Across All Sentences
+#### Summarize Emotion Counts Across All Sentences
 
 Now, let's count **how many times each emotion appears** overall.
 
@@ -88,7 +87,7 @@ emotion_summary <- emotion_data %>%
 
 ![](images/emotion-counts.png){width="194"}
 
-##### Plot the Overall Emotion Distribution
+#### Plot the Overall Emotion Distribution
 
 ``` r
 ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +
@@ -103,7 +102,7 @@ ggplot(emotion_summary, aes(x = emotion, y = count, fill = emotion)) +
 
 ![](images/barchart-emotions.png)
 
-##### Add a “Season” Variable (Grouping) and Summarize
+#### Add a “Season” Variable (Grouping) and Summarize
 
 Let's now add a new column called `season` by looking at the ID pattern: for example, `s1_` means season 1 and `s2_` means season 2. This makes it easy to compare the emotional tone across seasons.
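 
 One way to derive such a column, sketched here under the assumption that the identifier column is named `id` (the real column name may differ) and that every ID starts with `s1_` or `s2_`:
 
 ``` r
 library(dplyr)
 
 # Hypothetical IDs following the s1_/s2_ pattern described above
 toy_data <- data.frame(id = c("s1_001", "s1_002", "s2_001"),
                        joy = c(2, 0, 1))
 
 toy_seasons <- toy_data %>%
   mutate(season = case_when(
     grepl("^s1_", id) ~ "Season 1",  # ID begins with s1_
     grepl("^s2_", id) ~ "Season 2",  # ID begins with s2_
     TRUE ~ NA_character_             # anything else is left unlabelled
   ))
 
 table(toy_seasons$season)
 ```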
 
@@ -124,7 +123,7 @@ emotion_by_season <- emotion_seasons %>%
   )
 ```
 
-##### Plotting the Data
+#### Plotting the Data
 
 Comparing emotions by season:
 
diff --git a/chapters/3.SentimentAnalysis/introduction.qmd b/chapters/3.SentimentAnalysis/introduction.qmd
@@ -1,5 +1,5 @@
 ---
-title: "Introduction to Sentiment Analysis"
+title: "What is Sentiment Analysis?"
 ---
 
 Now that we have completed all the key preprocessing steps and our example dataset is in much better shape, we can finally proceed with sentiment analysis.
@@ -23,7 +23,7 @@ Our analysis pipeline will follow a two-step approach. First, we will compute ba
 Let’s start by installing and loading the necessary packages, then bringing in the cleaned dataset so we can begin our sentiment analysis. We will discuss the role of each package in the next episodes.
 
 ``` r
-# Install packages (remove comments for packages you might have skipped)
+# Install packages (uncomment the lines for any packages you skipped in previous episodes)
 install.packages("sentimentr")
 install.packages("syuzhet")
 # install.packages("dplyr")
diff --git a/chapters/3.SentimentAnalysis/polarity.qmd b/chapters/3.SentimentAnalysis/polarity.qmd
@@ -23,21 +23,21 @@ Words like “but,” “however,” and “although” also influence the senti
 
 With this approach, we can explore more confidently whether the show’s viewers felt positive, neutral, or negative about it.
 
-#### Computing Polarity with Sentiment R (Valence Sifters Capability)
+### Computing Polarity with sentimentr (Valence Shifters Capability)
 
-##### Calculating sentiment scores
+#### Calculating sentiment scores
 
-Here we’re using the **`sentiment_by()`** function which looks at each comment and calculates a **sentiment score** representing how positive or negative comments are.
+Here we use the **`sentiment_by()`** function, which looks at each comment and calculates a **sentiment score** representing how positive or negative it is. Run the following code, which passes every value in the `comments` column to the function:
 
 ``` r
 sentiment_scores <- sentiment_by(comments$comments)
 ```
 
-![Sentiment Scores Output](images/sentiment-scores.png){width="342"}
+![Sentiment Scores Output](images/sentiment-scores.png){width="418"}
 
 After running this, we get a new object called `sentiment_scores` with the average sentiment for every comment. Can you guess why the SD column is empty? The standard deviation is computed across the sentences within a comment, and a single data point (one sentence per row) does not have a standard deviation by itself.
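 
 The same behaviour is easy to reproduce with base R's `sd()`, which needs at least two values (the numbers below are made up for illustration):
 
 ``` r
 # A comment with a single sentence contributes one score: no spread to measure
 one_sentence <- c(0.25)
 sd(one_sentence)      # NA
 
 # A comment with several sentences has a spread across its sentence scores
 three_sentences <- c(0.25, -0.10, 0.40)
 sd(three_sentences)   # a positive number
 ```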
 
-##### Adding those scores back to our dataset
+#### Adding those scores back to our dataset
 
 Now we’re using the **`dplyr`** package to make our dataset more informative. We take our `comments` dataset, and with **`mutate()`**, we add two new columns: `score` and `sentiment_label`. The little rule inside **`case_when()`** decides which label to assign. The small buffer around zero (±0.1) helps us avoid overreacting to tiny fluctuations.
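 
 A minimal sketch of that labelling rule on made-up scores follows. The column names `score` and `sentiment_label` come from the text above; the `toy` data frame and the label strings are assumptions for illustration.
 
 ``` r
 library(dplyr)
 
 # Hypothetical scores standing in for sentiment_scores$ave_sentiment
 toy <- data.frame(score = c(0.45, 0.05, -0.02, -0.30))
 
 toy <- toy %>%
   mutate(sentiment_label = case_when(
     score >  0.1 ~ "Positive",   # clearly above the buffer
     score < -0.1 ~ "Negative",   # clearly below the buffer
     TRUE         ~ "Neutral"     # within 0.1 of zero on either side
   ))
 
 toy$sentiment_label
 ```
 
 Widening or narrowing the buffer changes how many comments end up as neutral, so it is worth rerunning the counts with a different threshold to see how sensitive the result is.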
 
@@ -55,17 +55,17 @@ Let's now take a look at the `sentiment_scores` data frame:
 
 ![Sentiment Scores with Polarity Results](images/polarity-scores.png)
 
-To get a sense of the overall mood of our dataset let's run:
+To get a sense of the overall "mood" of our dataset, let's run:
 
 ``` r
 table(polarity$sentiment_label)
 ```
 
-![Overall Polarity Count](images/overall-polarity-count.png){width="363"}
+![Polarity Count](images/overall-polarity-count.png){width="363"}
 
 Overall, the majority of viewers reacted positively to the show, with positive opinions more than double the negative ones, indicating a generally favorable reception. However, this is only part of the story: positive sentiment can range from mildly favorable to very enthusiastic. To better visualize the full distribution of opinions, a histogram is presented below.
 
-#### Plotting Scores
+### Plotting Scores
 
 Next, let's plot a histogram to check the distribution of the scores:
 
@@ -77,9 +77,9 @@ ggplot(polarity, aes(x = score)) +
   labs(title = "Sentiment Score Distribution", x = "Average Sentiment", y = "Count")
 ```
 
-![Polarity Distribution](images/histogram-polarity.png){width="529"}
+![Polarity Distribution](images/histogram-polarity.png){width="699"}
 
-This histogram suggests that the overall sentiment toward the Severance show was **mostly neutral to slightly positive**. This suggests that most viewers are expressing their opinions in a **measured, nuanced, or factual** manner, rather than with intense emotional language (either extremely positive or negative).
+This histogram suggests that the overall sentiment toward the show *Severance* was **mostly neutral to slightly positive**. In other words, most viewers expressed their opinions without intense emotional language (either extremely positive or extremely negative).
 
 We can also break the data down by season to compare how audience opinions differ between season finales: