
Commit ea31a25

Author: Youzhi Yu
Commit message: added README
Parent: f218e1c


5 files changed (+351, -0 lines)


.Rbuildignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,3 +1,5 @@
 ^.*\.Rproj$
 ^\.Rproj\.user$
 ^LICENSE\.md$
+^README\.Rmd$
+^\.github$
```

.github/.gitignore

Lines changed: 1 addition & 0 deletions
```
*.html
```

.github/workflows/R-CMD-check.yaml

Lines changed: 46 additions & 0 deletions
```yaml
# Workflow derived from https://github.com/r-lib/actions/tree/master/examples
# Need help debugging build failures? Start at https://github.com/r-lib/actions#where-to-find-help
on:
  push:
    branches: [main, master]
  pull_request:
    branches: [main, master]

name: R-CMD-check

jobs:
  R-CMD-check:
    runs-on: ${{ matrix.config.os }}

    name: ${{ matrix.config.os }} (${{ matrix.config.r }})

    strategy:
      fail-fast: false
      matrix:
        config:
          - {os: macOS-latest, r: 'release'}
          - {os: windows-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'devel', http-user-agent: 'release'}
          - {os: ubuntu-latest, r: 'release'}
          - {os: ubuntu-latest, r: 'oldrel-1'}

    env:
      GITHUB_PAT: ${{ secrets.GITHUB_TOKEN }}
      R_KEEP_PKG_SOURCE: yes

    steps:
      - uses: actions/checkout@v2

      - uses: r-lib/actions/setup-pandoc@v1

      - uses: r-lib/actions/setup-r@v1
        with:
          r-version: ${{ matrix.config.r }}
          http-user-agent: ${{ matrix.config.http-user-agent }}
          use-public-rspm: true

      - uses: r-lib/actions/setup-r-dependencies@v1
        with:
          extra-packages: rcmdcheck

      - uses: r-lib/actions/check-r-package@v1
```

README.Rmd

Lines changed: 127 additions & 0 deletions

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# tidyEmoji

<!-- badges: start -->
[![R-CMD-check](https://github.com/PursuitOfDataScience/tidyEmoji/workflows/R-CMD-check/badge.svg)](https://github.com/PursuitOfDataScience/tidyEmoji/actions)
<!-- badges: end -->

The goal of tidyEmoji is to help R users work with text data that contains Emoji as easily as possible. The most common text data of this kind is Tweets. When people tweet their emotions, ideas, celebrations, etc., Emoji sometimes appear in their Tweets, making the text more colorful. Researchers and users who work with this type of text often want to know which Emoji appear in it and how often. tidyEmoji makes dealing with Emoji easy.

## Installation

Please install the released version of `tidyEmoji` from CRAN with:

``` r
install.packages("tidyEmoji")
```

Alternatively, you can install the latest development version from GitHub with:

``` r
# install.packages("devtools")
devtools::install_github("PursuitOfDataScience/tidyEmoji")
```

## Usage

Here a tweet-like data frame is created for a brief demonstration.

```{r}
library(tidyEmoji)
library(dplyr)
```

```{r}
tweet_df <- data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                                  "R is my language! \U0001f601\U0001f606\U0001f605",
                                  "This Tweet does not have Emoji!",
                                  "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
                                  "Emoji does not appear in all Tweets",
                                  "A flag \U0001f600\U0001f3c1"))
```


### Emoji Tweets summary

Emoji Tweets are defined as Tweets containing at least one Emoji.

- `emoji_summary()`:

```{r}
tweet_df %>%
  emoji_summary(tweets)
```

`emoji_summary()` gives an overview of how many Emoji Tweets and how many Tweets in total the data has.
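
The counting behind such a summary can be sketched in base R. This is a minimal illustration under stated assumptions, not tidyEmoji's implementation: it treats any code point between U+1F300 and U+1FAFF as an Emoji (a simplification; real Emoji detection needs the full Unicode Emoji data), and `has_emoji()` is a hypothetical helper.

```r
# Minimal sketch, NOT tidyEmoji's code.
# Assumption: an Emoji is any code point in U+1F300..U+1FAFF.
tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

# Hypothetical helper: TRUE if one string has a code point in the assumed range
has_emoji <- function(text) {
  cps <- utf8ToInt(text)
  any(cps >= 0x1F300 & cps <= 0x1FAFF)
}

is_emoji_tweet <- vapply(tweets, has_emoji, logical(1), USE.NAMES = FALSE)
tweet_summary <- data.frame(emoji_tweets = sum(is_emoji_tweet),
                            total_tweets = length(tweets))
tweet_summary
```

On the example Tweets this counts 4 Emoji Tweets out of 6, the same numbers `emoji_summary()` reports in the rendered README.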

- `emoji_tweets()`:

```{r}
tweet_df %>%
  emoji_tweets(tweets)
```
`emoji_tweets()` filters out non-Emoji Tweets while preserving the raw data structure.
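
The filtering idea can be sketched the same way. This base-R version is illustrative only, not how tidyEmoji does it, and reuses the simplifying assumption that Emoji are code points between U+1F300 and U+1FAFF; `has_emoji()` is again a hypothetical helper. Subsetting rows with a logical index keeps every column of the original data frame.

```r
# Sketch of Emoji-Tweet filtering in base R, NOT tidyEmoji's implementation.
tweet_df <- data.frame(
  tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
             "R is my language! \U0001f601\U0001f606\U0001f605",
             "This Tweet does not have Emoji!",
             "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
             "Emoji does not appear in all Tweets",
             "A flag \U0001f600\U0001f3c1"),
  stringsAsFactors = FALSE
)

# Assumption: an Emoji is any code point in U+1F300..U+1FAFF
has_emoji <- function(text) {
  cps <- utf8ToInt(text)
  any(cps >= 0x1F300 & cps <= 0x1FAFF)
}

# Logical row index: keeps every column, drops rows without Emoji
keep <- vapply(tweet_df$tweets, has_emoji, logical(1), USE.NAMES = FALSE)
emoji_only <- tweet_df[keep, , drop = FALSE]
emoji_only
```

Unlike the tidyEmoji output, this base-R subset keeps the original row names (1, 2, 4, 6) of the retained rows.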


### Popular Emoji Tweets

- `top_n_emojis()`:

```{r}
tweet_df %>%
  top_n_emojis(tweets, n = 2)
```

`top_n_emojis()` returns a tibble of the most popular Emojis in the entire data. `n` sets how many of the most popular Emojis to return; by default, it is 20.
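
A rough base-R sketch of ranking Emoji by frequency follows. It is not tidyEmoji's implementation, uses the same assumed U+1F300 to U+1FAFF range, and returns only the characters and their counts, without the name and category columns that `top_n_emojis()` provides.

```r
# Sketch of ranking Emoji by frequency, NOT tidyEmoji's implementation.
tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

n <- 2  # how many of the most frequent Emoji to keep

# Every code point from every Tweet, filtered to the assumed Emoji range
all_cps <- unlist(lapply(tweets, utf8ToInt))
emoji_cps <- all_cps[all_cps >= 0x1F300 & all_cps <= 0x1FAFF]

# Count each distinct Emoji and sort from most to least frequent
counts <- sort(table(intToUtf8(emoji_cps, multiple = TRUE)), decreasing = TRUE)
k <- seq_len(min(n, length(counts)))
top_emojis <- data.frame(unicode = names(counts)[k],
                         n = as.integer(counts)[k],
                         stringsAsFactors = FALSE)
top_emojis
```

In the example data U+1F600 and U+1F603 both appear twice, so the second row of this sketch may break that tie differently from `top_n_emojis()`.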


### Emoji extraction

- `emoji_extract_unnest()`:

```{r}
tweet_df %>%
  emoji_extract_unnest(tweets)
```

The tibble above has three columns: `row_number`, `.emoji_unicode`, and `emoji_count`. `row_number` is the row of the raw data in which each Tweet is located, and `emoji_count` is how many times each Emoji appears in that Tweet. This gives users a global overview of the Emojis and their counts.
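
The long format described above can be sketched in base R, building one small data frame per Tweet and binding them with `rbind()`. This is an illustration, not tidyEmoji's code; the column names mirror the output above, and the U+1F300 to U+1FAFF Emoji range is a simplifying assumption.

```r
# Sketch of a long "one row per Tweet and Emoji" table, NOT tidyEmoji's code.
tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

per_tweet <- lapply(seq_along(tweets), function(i) {
  cps <- utf8ToInt(tweets[i])
  cps <- cps[cps >= 0x1F300 & cps <= 0x1FAFF]  # assumed Emoji range
  if (length(cps) == 0) return(NULL)            # Tweets without Emoji drop out
  counts <- table(intToUtf8(cps, multiple = TRUE))
  data.frame(row_number = i,
             .emoji_unicode = names(counts),
             emoji_count = as.integer(counts),
             stringsAsFactors = FALSE)
})
unnested <- do.call(rbind, per_tweet)  # rbind() skips the NULL entries
unnested
```

On the example data this yields 8 rows covering Tweets 1, 2, 4, and 6, with 12 Emoji in total.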


- `emoji_extract_nest()`:

`emoji_extract_nest()` is analogous to `emoji_extract_unnest()`, but it preserves the raw data with one extra column `.emoji_unicode` added.

```{r}
tweet_df %>%
  emoji_extract_nest(tweets)
```
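
The nested variant can be sketched with a base-R list column. This is illustrative only, not tidyEmoji's implementation; `extract_emoji()` is a hypothetical helper and the U+1F300 to U+1FAFF range is an assumption. Tweets without Emoji get an empty vector, so every row of the raw data stays in place.

```r
# Sketch of adding a list column of Emoji, NOT tidyEmoji's implementation.
tweet_df <- data.frame(
  tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
             "R is my language! \U0001f601\U0001f606\U0001f605",
             "This Tweet does not have Emoji!",
             "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
             "Emoji does not appear in all Tweets",
             "A flag \U0001f600\U0001f3c1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper; assumption: an Emoji is any code point in U+1F300..U+1FAFF
extract_emoji <- function(text) {
  cps <- utf8ToInt(text)
  intToUtf8(cps[cps >= 0x1F300 & cps <= 0x1FAFF], multiple = TRUE)
}

# One character vector per row (character(0) when a Tweet has no Emoji);
# the original rows and columns are all kept
tweet_df$.emoji_unicode <- lapply(tweet_df$tweets, extract_emoji)
tweet_df
```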

### Emoji category

- `emoji_categorize()`:

```{r}
tweet_df %>%
  emoji_categorize(tweets)
```
Each Emoji Tweet is categorized based on the Emoji(s) it contains. If its Emojis fall into more than one category, the categories in the `.emoji_category` column are separated by `|`.
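
Categorizing requires Emoji metadata. The sketch below is not tidyEmoji's implementation: `category_lookup` is a hypothetical, hand-written table covering only the Emoji in `tweet_df`, with category names assumed to follow the Unicode Emoji groups. Each Tweet's distinct categories are joined with `|`, and Tweets without a known Emoji are dropped.

```r
# Sketch of per-Tweet Emoji categories, NOT tidyEmoji's implementation.
tweets <- c("I love tidyverse \U0001f600\U0001f603\U0001f603",
            "R is my language! \U0001f601\U0001f606\U0001f605",
            "This Tweet does not have Emoji!",
            "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
            "Emoji does not appear in all Tweets",
            "A flag \U0001f600\U0001f3c1")

# Hypothetical stand-in for real Emoji metadata: hex code point -> category.
# It covers only the Emoji used in this example.
category_lookup <- c("1F600" = "Smileys & Emotion", "1F601" = "Smileys & Emotion",
                     "1F603" = "Smileys & Emotion", "1F605" = "Smileys & Emotion",
                     "1F606" = "Smileys & Emotion", "1F637" = "Smileys & Emotion",
                     "1F3C1" = "Flags")

categorize <- function(text) {
  keys <- sprintf("%X", utf8ToInt(text))  # e.g. 128512 becomes "1F600"
  cats <- unique(unname(category_lookup[keys[keys %in% names(category_lookup)]]))
  if (length(cats) == 0) NA_character_ else paste(cats, collapse = "|")
}

emoji_category <- vapply(tweets, categorize, character(1), USE.NAMES = FALSE)
categorized <- data.frame(tweets = tweets, .emoji_category = emoji_category,
                          stringsAsFactors = FALSE)
categorized <- categorized[!is.na(categorized$.emoji_category), ]  # Emoji Tweets only
categorized
```

For the flag Tweet this sketch produces `Smileys & Emotion|Flags`.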


For more information about tidyEmoji, please refer to the package vignette for a comprehensive introduction.


README.md

Lines changed: 175 additions & 0 deletions

<!-- README.md is generated from README.Rmd. Please edit that file -->

# tidyEmoji

<!-- badges: start -->

[![R-CMD-check](https://github.com/PursuitOfDataScience/tidyEmoji/workflows/R-CMD-check/badge.svg)](https://github.com/PursuitOfDataScience/tidyEmoji/actions)
<!-- badges: end -->

The goal of tidyEmoji is to help R users work with text data that
contains Emoji as easily as possible. The most common text data of this
kind is Tweets. When people tweet their emotions, ideas, celebrations,
etc., Emoji sometimes appear in their Tweets, making the text more
colorful. Researchers and users who work with this type of text often
want to know which Emoji appear in it and how often. tidyEmoji makes
dealing with Emoji easy.

## Installation

Please install the released version of `tidyEmoji` from CRAN with:

``` r
install.packages("tidyEmoji")
```

Alternatively, you can install the latest development version from
GitHub with:

``` r
# install.packages("devtools")
devtools::install_github("PursuitOfDataScience/tidyEmoji")
```

## Usage

Here a tweet-like data frame is created for a brief demonstration.

``` r
library(tidyEmoji)
library(dplyr)
```

``` r
tweet_df <- data.frame(tweets = c("I love tidyverse \U0001f600\U0001f603\U0001f603",
                                  "R is my language! \U0001f601\U0001f606\U0001f605",
                                  "This Tweet does not have Emoji!",
                                  "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637.",
                                  "Emoji does not appear in all Tweets",
                                  "A flag \U0001f600\U0001f3c1"))
```

### Emoji Tweets summary

Emoji Tweets are defined as Tweets containing at least one Emoji.

- `emoji_summary()`:

``` r
tweet_df %>%
  emoji_summary(tweets)
#> # A tibble: 1 x 2
#> emoji_tweets total_tweets
#> <int> <int>
#> 1 4 6
```

`emoji_summary()` gives an overview of how many Emoji Tweets and how
many Tweets in total the data has.

- `emoji_tweets()`:

``` r
tweet_df %>%
  emoji_tweets(tweets)
#> tweets
#> 1 I love tidyverse <U+0001F600><U+0001F603><U+0001F603>
#> 2 R is my language! <U+0001F601><U+0001F606><U+0001F605>
#> 3 Wearing a mask<U+0001F637><U+0001F637><U+0001F637><U+0001F637>.
#> 4 A flag <U+0001F600><U+0001F3C1>
```

`emoji_tweets()` filters out non-Emoji Tweets while preserving the raw
data structure.

### Popular Emoji Tweets

- `top_n_emojis()`:

``` r
tweet_df %>%
  top_n_emojis(tweets, n = 2)
#> # A tibble: 2 x 4
#> emoji_name unicode emoji_category n
#> <chr> <chr> <chr> <int>
#> 1 face_with_medical_mask "\U0001f637" Smileys & Emotion 4
#> 2 grinning "\U0001f600" Smileys & Emotion 2
```

`top_n_emojis()` returns a tibble of the most popular Emojis in the
entire data. `n` sets how many of the most popular Emojis to return; by
default, it is 20.

### Emoji extraction

- `emoji_extract_unnest()`:

``` r
tweet_df %>%
  emoji_extract_unnest(tweets)
#> # A tibble: 8 x 3
#> row_number .emoji_unicode emoji_count
#> <int> <chr> <int>
#> 1 1 "\U0001f600" 1
#> 2 1 "\U0001f603" 2
#> 3 2 "\U0001f601" 1
#> 4 2 "\U0001f605" 1
#> 5 2 "\U0001f606" 1
#> 6 4 "\U0001f637" 4
#> 7 6 "\U0001f3c1" 1
#> 8 6 "\U0001f600" 1
```

The tibble above has three columns: `row_number`, `.emoji_unicode`, and
`emoji_count`. `row_number` is the row of the raw data in which each
Tweet is located, and `emoji_count` is how many times each Emoji appears
in that Tweet. This gives users a global overview of the Emojis and
their counts.

- `emoji_extract_nest()`:

`emoji_extract_nest()` is analogous to `emoji_extract_unnest()`, but it
preserves the raw data with one extra column `.emoji_unicode` added.

``` r
tweet_df %>%
  emoji_extract_nest(tweets)
#> tweets
#> 1 I love tidyverse <U+0001F600><U+0001F603><U+0001F603>
#> 2 R is my language! <U+0001F601><U+0001F606><U+0001F605>
#> 3 This Tweet does not have Emoji!
#> 4 Wearing a mask<U+0001F637><U+0001F637><U+0001F637><U+0001F637>.
#> 5 Emoji does not appear in all Tweets
#> 6 A flag <U+0001F600><U+0001F3C1>
#> .emoji_unicode
#> 1 <U+0001F600>, <U+0001F603>, <U+0001F603>
#> 2 <U+0001F601>, <U+0001F606>, <U+0001F605>
#> 3
#> 4 <U+0001F637>, <U+0001F637>, <U+0001F637>, <U+0001F637>
#> 5
#> 6 <U+0001F600>, <U+0001F3C1>
```

### Emoji category

- `emoji_categorize()`:

``` r
tweet_df %>%
  emoji_categorize(tweets)
#> # A tibble: 4 x 2
#> tweets .emoji_category
#> <chr> <chr>
#> 1 "I love tidyverse \U0001f600\U0001f603\U0001f603" Smileys & Emotion
#> 2 "R is my language! \U0001f601\U0001f606\U0001f605" Smileys & Emotion
#> 3 "Wearing a mask\U0001f637\U0001f637\U0001f637\U0001f637." Smileys & Emotion
#> 4 "A flag \U0001f600\U0001f3c1" Smileys & Emotion|F~
```

Each Emoji Tweet is categorized based on the Emoji(s) it contains. If
its Emojis fall into more than one category, the categories in the
`.emoji_category` column are separated by `|`.

For more information about tidyEmoji, please refer to the package
vignette for a comprehensive introduction.
