|
| 1 | + |
| 2 | +<!-- README.md is generated from README.Rmd. Please edit that file --> |
| 3 | + |
| 4 | +# tidyEmoji |
| 5 | + |
| 6 | +<!-- badges: start --> |
| 7 | + |
| 8 | +[](https://github.com/PursuitOfDataScience/tidyEmoji/actions) |
| 9 | +<!-- badges: end --> |
| 10 | + |
| 11 | +The goal of tidyEmoji is to make it as easy as possible for R users to |
| 12 | +work with text data that contains Emoji. The most common text data in |
| 13 | +this category is Tweets. When people tweet their emotions, ideas, |
| 14 | +celebrations, etc., Emoji sometimes appear in their Tweets, making the |
| 15 | +rendered text more colorful. Researchers and other users who work with |
| 16 | +this type of text often want to know which Emoji appear in it and how |
| 17 | +often. tidyEmoji provides functions to summarize, filter, extract, and |
| 18 | +categorize them. |
| 19 | + |
| 20 | +## Installation |
| 21 | + |
| 22 | +Please install the released version of `tidyEmoji` from CRAN with: |
| 23 | + |
| 24 | +``` r |
| 25 | +install.packages("tidyEmoji") |
| 26 | +``` |
| 27 | + |
| 28 | +Alternatively, you can install the latest development version from |
| 29 | +GitHub with: |
| 30 | + |
| 31 | +``` r |
| 32 | +# install.packages("devtools") |
| 33 | +devtools::install_github("PursuitOfDataScience/tidyEmoji") |
| 34 | +``` |
| 35 | + |
| 36 | +## Usage |
| 37 | + |
| 38 | +Here a tweet-like data frame is created for a brief demonstration. |
| 39 | + |
| 40 | +``` r |
| 41 | +library(tidyEmoji) |
| 42 | +library(dplyr) |
| 43 | +``` |
| 44 | + |
| 45 | +``` r |
| 46 | +tweet_df <- data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603", |
| 47 | + "R is my language! \U0001f601\U0001f606\U0001f605", |
| 48 | + "This Tweet does not have Emoji!", |
| 49 | + "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.", |
| 50 | + "Emoji does not appear in all Tweets", |
| 51 | + "A flag \U0001f600\U0001f3c1")) |
| 52 | +``` |
| 53 | + |
| 54 | +### Emoji Tweets summary |
| 55 | + |
| 56 | +Emoji Tweets are defined as Tweets containing at least one Emoji. |
| 57 | + |
| 58 | +- `emoji_summary()`: |
| 59 | + |
| 60 | +``` r |
| 61 | +tweet_df %>% |
| 62 | + emoji_summary(tweets) |
| 63 | +#> # A tibble: 1 x 2 |
| 64 | +#> emoji_tweets total_tweets |
| 65 | +#> <int> <int> |
| 66 | +#> 1 4 6 |
| 67 | +``` |
| 68 | + |
| 69 | +`emoji_summary()` gives an overview of how many Emoji Tweets and how many |
| 70 | +Tweets in total the data has. |
| 71 | + |
| 72 | +- `emoji_tweets()`: |
| 73 | + |
| 74 | +``` r |
| 75 | +tweet_df %>% |
| 76 | + emoji_tweets(tweets) |
| 77 | +#> tweets |
| 78 | +#> 1 I love tidyverse <U+0001F600><U+0001F603><U+0001F603> |
| 79 | +#> 2 R is my language! <U+0001F601><U+0001F606><U+0001F605> |
| 80 | +#> 3 Wearing a mask<U+0001F637><U+0001F637><U+0001F637><U+0001F637>. |
| 81 | +#> 4 A flag <U+0001F600><U+0001F3C1> |
| 82 | +``` |
| 83 | + |
| 84 | +`emoji_tweets()` filters out non-Emoji Tweets while preserving the raw |
| 85 | +data structure. |
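
To make the definition of an Emoji Tweet concrete, here is a minimal base-R sketch that flags Tweets containing at least one Emoji. `has_emoji()` is a hypothetical helper, not part of tidyEmoji, and the code-point range it checks is an assumption that covers the Emoji in the demo data but not every Emoji. tidyEmoji itself may detect Emoji differently.

```r
# Demo Tweets, same text as in tweet_df above.
tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

# Hypothetical helper (not a tidyEmoji function): TRUE when a string has at
# least one code point in U+1F300..U+1FAFF. The range is an assumption and
# misses some Emoji (for example, older symbols below U+1F300).
has_emoji <- function(x) {
  vapply(x, function(s) {
    code_points <- utf8ToInt(s)
    any(code_points >= 0x1F300 & code_points <= 0x1FAFF)
  }, logical(1), USE.NAMES = FALSE)
}

emoji_flags <- has_emoji(tweets)

# Counts comparable to emoji_summary(): Emoji Tweets and total Tweets.
emoji_tweet_count <- sum(emoji_flags)   # 4
total_tweet_count <- length(tweets)     # 6

# Subset comparable to emoji_tweets(): keep only the Emoji Tweets.
emoji_only <- tweets[emoji_flags]
```

On the demo data this reproduces the 4 of 6 split reported by `emoji_summary()` above.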
| 86 | + |
| 87 | +### Popular Emoji Tweets |
| 88 | + |
| 89 | +- `top_n_emojis()`: |
| 90 | + |
| 91 | +``` r |
| 92 | +tweet_df %>% |
| 93 | + top_n_emojis(tweets, n = 2) |
| 94 | +#> # A tibble: 2 x 4 |
| 95 | +#> emoji_name unicode emoji_category n |
| 96 | +#> <chr> <chr> <chr> <int> |
| 97 | +#> 1 face_with_medical_mask "\U0001f637" Smileys & Emotion 4 |
| 98 | +#> 2 grinning "\U0001f600" Smileys & Emotion 2 |
| 99 | +``` |
| 100 | + |
| 101 | +`top_n_emojis()` returns a tibble of the most popular Emojis in the |
| 102 | +entire data. `n` is the number of most popular Emojis to return. By |
| 103 | +default, it is 20. |
| 104 | + |
| 105 | +### Emoji extraction |
| 106 | + |
| 107 | +- `emoji_extract_unnest()`: |
| 108 | + |
| 109 | +``` r |
| 110 | +tweet_df %>% |
| 111 | + emoji_extract_unnest(tweets) |
| 112 | +#> # A tibble: 8 x 3 |
| 113 | +#> row_number .emoji_unicode emoji_count |
| 114 | +#> <int> <chr> <int> |
| 115 | +#> 1 1 "\U0001f600" 1 |
| 116 | +#> 2 1 "\U0001f603" 2 |
| 117 | +#> 3 2 "\U0001f601" 1 |
| 118 | +#> 4 2 "\U0001f605" 1 |
| 119 | +#> 5 2 "\U0001f606" 1 |
| 120 | +#> 6 4 "\U0001f637" 4 |
| 121 | +#> 7 6 "\U0001f3c1" 1 |
| 122 | +#> 8 6 "\U0001f600" 1 |
| 123 | +``` |
| 124 | + |
| 125 | +The tibble above has three columns: `row_number`, `.emoji_unicode`, and |
| 126 | +`emoji_count`. `row_number` is the row of the raw data in which each |
| 127 | +Tweet is located, `.emoji_unicode` is an Emoji found in that Tweet, and |
| 128 | +`emoji_count` is how many times it appears there. This gives users a |
| 129 | +global overview of Emoji and their counts. |
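
The shape of that output can be illustrated with a small base-R sketch that builds one row per distinct Emoji per Tweet. `extract_emoji_counts()` is a hypothetical helper, not tidyEmoji's implementation, and it assumes Emoji are single code points in U+1F300..U+1FAFF, which holds for the demo data only.

```r
# Demo Tweets, same text as in tweet_df above.
tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

# Hypothetical helper (not a tidyEmoji function): one row per distinct
# Emoji per Tweet, with the Tweet's position and the within-Tweet count.
extract_emoji_counts <- function(x) {
  rows <- lapply(seq_along(x), function(i) {
    code_points <- utf8ToInt(x[i])
    # Assumed Emoji range; real Emoji detection is broader than this.
    code_points <- code_points[code_points >= 0x1F300 & code_points <= 0x1FAFF]
    if (length(code_points) == 0) return(NULL)
    counts <- table(code_points)
    data.frame(
      row_number = i,
      .emoji_unicode = vapply(as.integer(names(counts)), intToUtf8, character(1)),
      emoji_count = as.integer(counts),
      stringsAsFactors = FALSE
    )
  })
  # Drop Tweets without Emoji before stacking the per-Tweet data frames.
  do.call(rbind, Filter(Negate(is.null), rows))
}

emoji_long <- extract_emoji_counts(tweets)

# Total Emoji per Tweet, indexed by the Tweet's row in the raw data.
emoji_per_tweet <- tapply(emoji_long$emoji_count, emoji_long$row_number, sum)
```

A long layout like this makes per-Tweet and per-Emoji aggregation a one-line grouping step, which is the kind of overview the `row_number` and `emoji_count` columns are meant to support.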
| 129 | + |
| 130 | +- `emoji_extract_nest()`: |
| 131 | + |
| 132 | +`emoji_extract_nest()` is analogous to `emoji_extract_unnest()`, but it |
| 133 | +preserves the raw data with one extra column `.emoji_unicode` added. |
| 134 | + |
| 135 | +``` r |
| 136 | +tweet_df %>% |
| 137 | + emoji_extract_nest(tweets) |
| 138 | +#> tweets |
| 139 | +#> 1 I love tidyverse <U+0001F600><U+0001F603><U+0001F603> |
| 140 | +#> 2 R is my language! <U+0001F601><U+0001F606><U+0001F605> |
| 141 | +#> 3 This Tweet does not have Emoji! |
| 142 | +#> 4 Wearing a mask<U+0001F637><U+0001F637><U+0001F637><U+0001F637>. |
| 143 | +#> 5 Emoji does not appear in all Tweets |
| 144 | +#> 6 A flag <U+0001F600><U+0001F3C1> |
| 145 | +#> .emoji_unicode |
| 146 | +#> 1 <U+0001F600>, <U+0001F603>, <U+0001F603> |
| 147 | +#> 2 <U+0001F601>, <U+0001F606>, <U+0001F605> |
| 148 | +#> 3 |
| 149 | +#> 4 <U+0001F637>, <U+0001F637>, <U+0001F637>, <U+0001F637> |
| 150 | +#> 5 |
| 151 | +#> 6 <U+0001F600>, <U+0001F3C1> |
| 152 | +``` |
| 153 | + |
| 154 | +### Emoji category |
| 155 | + |
| 156 | +- `emoji_categorize()`: |
| 157 | + |
| 158 | +``` r |
| 159 | +tweet_df %>% |
| 160 | + emoji_categorize(tweets) |
| 161 | +#> # A tibble: 4 x 2 |
| 162 | +#> tweets .emoji_category |
| 163 | +#> <chr> <chr> |
| 164 | +#> 1 "I love tidyverse \U0001f600\U0001f603\U0001f603" Smileys & Emotion |
| 165 | +#> 2 "R is my language! \U0001f601\U0001f606\U0001f605" Smileys & Emotion |
| 166 | +#> 3 "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637." Smileys & Emotion |
| 167 | +#> 4 "A flag \U0001f600\U0001f3c1" Smileys & Emotion|F~ |
| 168 | +``` |
| 169 | + |
| 170 | +Each Emoji Tweet is categorized based on its Emoji(s). If a Tweet's |
| 171 | +Emojis fall into more than one category, the `.emoji_category` column |
| 172 | +separates the categories with `|`. |
| 173 | + |
| 174 | +For more information about tidyEmoji, please refer to the package |
| 175 | +vignette for a comprehensive introduction. |