
Commit ff4d994

Update data_manipulation/join_multiple_datasets.r
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
1 parent 66d17c3 commit ff4d994


1 file changed: +33 -0 lines changed


data_manipulation/join_multiple_datasets.r

Lines changed: 33 additions & 0 deletions
@@ -1,5 +1,38 @@
# join_multiple_datasets.r
#
# Algorithm Description:
# This script provides a function to join multiple datasets (data frames or CSV file paths)
# by their common columns using inner joins. The function reads input datasets, removes
# empty or invalid ones, and then sequentially merges them on their shared columns.
# The merging is performed using dplyr's inner_join, and missing values are replaced with
# empty strings. The algorithm joins datasets iteratively, so its cost is roughly O(n * m)
# when join keys are unique, where n is the number of datasets and m is the average row count.
#
# Example usage:
# result <- join_multiple_datasets(list("data1.csv", "data2.csv", df3))
# head(result)
#

#' Join Multiple Datasets by Common Columns
#'
#' This function takes a list of data frames or CSV file paths and joins them sequentially
#' on their common columns using inner joins. It reads CSV files if paths are provided,
#' removes empty or invalid datasets, and merges the remaining datasets. Missing values
#' in the result are replaced with empty strings.
#'
#' @param inputs A list of data frames and/or character strings representing CSV file paths.
#' @return A data frame resulting from the inner join of all valid input datasets on their common columns.
#' @examples
#' # Example 1: Joining three data frames
#' df1 <- data.frame(id = 1:3, val1 = c("A", "B", "C"))
#' df2 <- data.frame(id = 2:3, val2 = c("X", "Y"))
#' df3 <- data.frame(id = 3, val3 = "Z")
#' result <- join_multiple_datasets(list(df1, df2, df3))
#' print(result)
#'
#' # Example 2: Joining CSV files and a data frame
#' result <- join_multiple_datasets(list("file1.csv", "file2.csv", df3))
#' head(result)
library(dplyr)
library(purrr)
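
This commit adds only documentation, so the function body is not visible here. As a rough illustration of the behaviour the comments describe (read inputs, drop empty or invalid ones, inner-join sequentially on shared columns, replace missing values with empty strings), a minimal sketch might look like the following. The name join_multiple_datasets_sketch is hypothetical, and details such as using read.csv, stopping when no columns are shared, and converting every column to character are assumptions, not the repository's actual implementation.

```r
library(dplyr)
library(purrr)

# Hypothetical sketch, NOT the function from the repository.
join_multiple_datasets_sketch <- function(inputs) {
  # Turn each input into a data frame, or NULL when it is not usable.
  loaded <- map(inputs, function(x) {
    if (is.data.frame(x)) {
      return(x)
    }
    if (is.character(x) && length(x) == 1 && file.exists(x)) {
      return(read.csv(x, stringsAsFactors = FALSE))
    }
    NULL
  })

  # Drop invalid (NULL) and empty (zero-row) datasets.
  datasets <- keep(loaded, function(d) is.data.frame(d) && nrow(d) > 0)
  if (length(datasets) == 0) {
    return(data.frame())
  }

  # Join left to right on whatever columns each pair shares.
  joined <- reduce(datasets, function(acc, nxt) {
    common <- intersect(names(acc), names(nxt))
    if (length(common) == 0) {
      stop("Datasets share no common columns")  # assumed behaviour
    }
    inner_join(acc, nxt, by = common)
  })

  # Replace NA with "" (assumption: this coerces every column to character).
  mutate(joined, across(everything(), ~ ifelse(is.na(.x), "", as.character(.x))))
}

# Data frames from Example 1 in the documentation above.
df1 <- data.frame(id = 1:3, val1 = c("A", "B", "C"))
df2 <- data.frame(id = 2:3, val2 = c("X", "Y"))
df3 <- data.frame(id = 3, val3 = "Z")
result <- join_multiple_datasets_sketch(list(df1, df2, df3))
print(result)
```

With these inputs only id 3 appears in all three data frames, so the sketch returns a single row. Note that if join keys repeat within a dataset, each inner join can produce more rows than either input, so the cost of the sequential approach can grow well beyond a simple n * m estimate.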
