DESCRIPTION

-Description: Imports conversation transcripts into R, concatenates them into a single dataframe appending event identifiers, cleans and formats the text, then yokes user-specified psycholinguistic database values to each word. `ConversationAlign` then computes alignment indices between two interlocutors across each transcript for >40 possible semantic, lexical, and affective dimensions. In addition to alignment, `ConversationAlign` also produces a table of analytics (e.g., token count, type-token-ratio) in a summary table describing your particular text corpus.
+Description: Imports conversation transcripts into R, concatenates them into a single dataframe appending event identifiers, cleans and formats the text, then yokes user-specified psycholinguistic database values to each word. 'ConversationAlign' then computes alignment indices between two interlocutors across each transcript for >40 possible semantic, lexical, and affective dimensions. In addition to alignment, 'ConversationAlign' also produces a table of analytics (e.g., token count, type-token-ratio) in a summary table describing your particular text corpus.
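The Description field traces the package pipeline: import transcripts, clean and yoke norms, compute alignment, summarize the corpus. A minimal end-to-end sketch in R, assembled from the function names that appear elsewhere in this changeset; the folder path and argument values are illustrative assumptions, not package defaults:

```r
# Sketch of the pipeline described above; the transcript folder path and the
# exact read_dyads() input are assumptions, not documented package behavior.
library(ConversationAlign)

dat_read  <- read_dyads("my_transcripts/")            # import transcripts, append Event_IDs
dat_prep  <- prep_dyads(dat_read)                     # clean text, yoke psycholinguistic norms
sumdat    <- summarize_dyads(df_prep = dat_prep)      # alignment indices per conversation
analytics <- corpus_analytics(dat_prep = dat_prep)    # corpus summary table
```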
-warning(paste0("Some conversations are shorter than 50 exchanges (100 turns). It is recomended that conversations are longer than 50 exchanges. Attached is a list of conversations with fewer than 50 exchanges:\n",
-               paste(small_dyads, collapse="\n")))
+warning(paste0("Some conversations are shorter than 50 exchanges (100 turns). ",
+               "It is recommended that conversations are longer than 50 exchanges. ",
R/compute_lagcorr.R (2 additions, 0 deletions)
@@ -1,6 +1,8 @@
 #'
 #' computes lagged correlations alignment measure across partners within each conversation
 #' @name compute_lagcorr
+#' @returns
+#' internal function to summarize_dyads that produces a dataframe with lagged correlations across turns (-2,0,2 as default) for each dimension of interest.
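The lagged-correlation idea is to shift one interlocutor's turn-level series relative to the other's and correlate at each offset. A minimal base-R sketch with toy inputs; this is an illustration of the technique, not the package's internal implementation:

```r
# Sketch of a lagged Spearman correlation between two speakers' turn series.
# a_series and b_series are toy vectors assumed to be aligned by turn pair.
a_series <- c(3.1, 2.8, 3.5, 3.0, 2.9, 3.3)  # e.g., speaker A's dimension values per turn
b_series <- c(2.9, 3.0, 3.2, 3.1, 2.7, 3.4)  # e.g., speaker B's dimension values per turn

lagged_spearman <- function(a, b, lag = 0) {
  n <- length(a)
  if (lag < 0) return(lagged_spearman(b, a, -lag))  # negative lag shifts the other way
  cor(a[1:(n - lag)], b[(1 + lag):n],
      method = "spearman", use = "pairwise.complete.obs")
}

# the default lags reported by compute_lagcorr are -2, 0, 2
sapply(c(-2, 0, 2), function(l) lagged_spearman(a_series, b_series, lag = l))
```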
R/corpus_analytics.R (2 additions, 16 deletions)
@@ -3,7 +3,8 @@
 #' Produces a table of corpus analytics including numbers of complete observations at each step, word counts, lexical diversity (e.g., TTR), stopword ratios, etc. Granularity of the summary statistics are guided by the user (e.g., by conversation, by conversation and speaker, collapsed all)
 #' @name corpus_analytics
 #' @param dat_prep takes dataframe produced from the df_prep() function
-#' @return dataframe with summary analytics for a conversation corpus
+#' @returns
+#' dataframe with summary statistics (mean, SD, range) for numerous corpus analytics (e.g., token count, type-token-ratio, word-count-per-turn) for the target conversation corpus. Summary data structured in table format for easy export to a journal method section.
 #' @importFrom dplyr across
 #' @importFrom dplyr bind_rows
 #' @importFrom dplyr everything
@@ -51,21 +52,6 @@
 # TTR (clean): Group by Event_ID, distinct Text_Clean divided by Text_Clean
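The comment above describes the type-token-ratio calculation: unique cleaned words divided by total cleaned words, per conversation. A short dplyr sketch of that computation, assuming the one-word-per-row layout with Event_ID and Text_Clean columns that prep_dyads produces:

```r
library(dplyr)

# Sketch of the TTR comment above: distinct Text_Clean / total Text_Clean,
# grouped by Event_ID. Assumes a one-word-per-row dataframe from prep_dyads.
ttr_clean <- dat_prep %>%
  filter(!is.na(Text_Clean)) %>%
  group_by(Event_ID) %>%
  summarize(ttr = n_distinct(Text_Clean) / n(), .groups = "drop")
```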
R/prep_dyads.R

-#' Cleans, vectorizes and appends lexical norms to all content words in a language corpus. User guides options for stopword removal and lemmatization. User selects up to three psycholinguistic dimensions to yoke norms on each content word in the transcript.
+#' Cleans, vectorizes and appends lexical norms to all content words in a language corpus.
+#' User guides options for stopword removal and lemmatization. User selects up to three psycholinguistic dimensions to yoke norms
+#' on each content word in the original conversation transcript.
 #' @name prep_dyads
-#' @param dat_read data frame produced from the read_dyads() function
+#' @param dat_read dataframe produced from read_dyads() function
+#' @param omit_stops option to remove stopwords, default TRUE
 #' @param lemmatize logical, should words be lemmatized (switched to base morphological form), default is TRUE
-#' @param which_stoplist user specifies stopword removal method with options including "none", "SMART", "MIT_stops", "CA_OriginalStops", or "Temple_Stopwords25". "Temple_Stopwords25 is the default list
-#' @return dataframe with cleaned text data, formatted with one word per row
+#' @param which_stoplist user-specified stopword removal method with options including "none", "SMART", "MIT_stops", "CA_OriginalStops", or "Temple_Stopwords25".
+#' "Temple_Stopwords25" is the default list
+#' @returns
+#' dataframe with text cleaned and vectorized to a one-word-per-row format.
+#' Lexical norms and metadata are appended to each content word. Cleaned text appears under a new column
+#' called 'Text_Clean'. Any selected dimensions (e.g., word length) and metadata are also appended to each word along
+#' with speaker identity, turn, and Event_ID (conversation identifier).
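A usage sketch built only from the parameters documented above; the read_dyads() input is an assumption, and the interactive selection of up to three psycholinguistic dimensions happens at the prompt, so it is not shown:

```r
# Usage sketch from the documented prep_dyads parameters; the transcript folder
# is assumed, and dimension selection is interactive at runtime.
dat_read <- read_dyads("my_transcripts/")
dat_prep <- prep_dyads(dat_read,
                       omit_stops     = TRUE,                  # remove stopwords
                       lemmatize      = TRUE,                  # reduce words to base forms
                       which_stoplist = "Temple_Stopwords25")  # default stoplist
```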
R/summarize_dyads.R (4 additions, 9 deletions)
@@ -1,12 +1,15 @@
 #' summarize_dyads
 #'
-#' Calculates and appends 3 measures for quantifying alignment. Appends the mean score for each dimension by turn. Calculates and Spearman's rank correlation between interlocutor time series and appends by transcript. Calculates the area under the curve of the absolute difference time series between interlocutor time series. The length of the difference time series can be standardized the shortest number of exchanges present in the group using an internally defined resampling function, called with resample = TRUE. Spearman's rank correlation and area under the curve become less reliable for dyads under 30 exchanges.
+#' Calculates and appends 3 measures for quantifying alignment. Appends the averaged value for each selected dimension by turn and speaker. Calculates Spearman's rank correlation between interlocutor time series and appends by transcript. Calculates the area under the curve of the absolute difference time series between interlocutor time series. The length of the difference time series can be standardized to the shortest number of exchanges present in the group using an internally defined resampling function, called with resample = TRUE. Spearman's rank correlation and area under the curve become less reliable for dyads under 30 exchanges.
 #'
 #' @name summarize_dyads
 #' @param df_prep produced in the align_dyads function
 #' @param custom_lags integer vector, should any lags be added in addition to -2, 0, 2
 #' @param sumdat_only default=TRUE, group and summarize data, two rows per conversation, one row for each participant, false will fill down summary statistics across all exchanges
+#' @returns either:
+#' - a grouped dataframe with summary data aggregated by conversation (Event_ID) and participant if sumdat_only=TRUE.
+#' - the original dataframe 'filled down' with summary data (e.g., AUC, turn-by-turn correlations) for each conversation if sumdat_only=FALSE.
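A usage sketch contrasting the two return shapes documented above; the input dataframe and custom_lags values are illustrative assumptions:

```r
# Sketch of the two output modes documented above; dat_prep and the extra lag
# values are assumptions for illustration.
sumdat <- summarize_dyads(df_prep = dat_prep, sumdat_only = TRUE)   # two rows per conversation
filled <- summarize_dyads(df_prep = dat_prep, sumdat_only = FALSE,  # summary stats filled down
                          custom_lags = c(-4, 4))                   # added to default -2, 0, 2
```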