
Commit d0a4f37

"Revise ?mergelist Details re. join-from/join-to (fixes #7190)"
1 parent c8bbb58 commit d0a4f37


1 file changed

+15
-10
lines changed


man/mergelist.Rd

Lines changed: 15 additions & 10 deletions
@@ -26,17 +26,22 @@
26 26

27 27
Merging is performed sequentially from "left to right", so that for \code{l} of 3 tables, it will do something like \code{merge(merge(l[[1L]], l[[2L]]), l[[3L]])}. \emph{Non-equi joins} are not supported. Column names to merge on must be common in both tables on each merge.
28 28

29-
Arguments \code{on}, \code{how}, \code{mult}, \code{join.many} could be lists as well, each of length \code{length(l)-1L}, to provide argument to be used for each single tables pair to merge, see examples.
29+
Arguments \code{on}, \code{how}, \code{mult}, \code{join.many} may also be lists, each of length \code{length(l)-1L}, providing the argument to be used at each successive merge; see examples.
30 30

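The per-merge list form of `on` and `how` amounts to a left-to-right fold in which the i-th element of each list is used for the i-th pairwise merge. The sketch below illustrates that idea only; it does not call `mergelist`. The helpers `step_merge()` and `fold_merge()` and the example tables are hypothetical, and data.table's `merge()` with `all.x`/`all.y` stands in for `how`, so row and column order follow `merge()` rather than `mergelist`.

```r
library(data.table)

# Hypothetical helper: one pairwise merge, using data.table's merge() and its
# all.x / all.y arguments as a stand-in for the `how` argument of mergelist.
step_merge <- function(x, y, on, how) {
  switch(how,
    left  = merge(x, y, by = on, all.x = TRUE),
    right = merge(x, y, by = on, all.y = TRUE),
    inner = merge(x, y, by = on),
    full  = merge(x, y, by = on, all = TRUE),
    stop("unsupported how: ", how))
}

# Hypothetical helper: merge from left to right, taking the i-th element of
# `on` and `how` for the i-th merge, so each list has length(l) - 1 elements.
fold_merge <- function(l, on, how) {
  n_merges <- length(l) - 1L
  stopifnot(length(on) == n_merges, length(how) == n_merges)
  out <- l[[1L]]
  for (i in seq_len(n_merges)) {
    out <- step_merge(out, l[[i + 1L]], on = on[[i]], how = how[[i]])
  }
  out
}

orders   <- data.table(id = 1:3, item = c("a", "b", "c"))
payments <- data.table(id = c(1L, 3L), amount = c(10, 30))
statuses <- data.table(amount = c(10, 20), status = c("paid", "partial"))

# Same as step_merge(step_merge(orders, payments, "id", "left"),
#                    statuses, "amount", "inner")
res <- fold_merge(list(orders, payments, statuses),
                  on  = list("id", "amount"),
                  how = list("left", "inner"))
```

Because each step sees only the accumulated result and the next table, the join columns for step i must exist in both of them, which is why the column names to merge on must be common to both tables at each merge.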
31-
The terms \emph{join-to} and \emph{join-from} indicate which in a pair of tables is the "baseline" or "authoritative" source -- this governs the ordering of rows and columns.
31+
Heuristically speaking, the \emph{join-from} table searches the \emph{join-to} table. More precisely:
32+
\itemize{
33+
\item{ \code{mult} determines the policy when a row of \emph{join-from} finds multiple matches in \emph{join-to}. }
34+
\item{ When \code{on} is missing, the key of \emph{join-to} is used as the join column(s). }
35+
}
32 36
Whether each refers to the "left" or "right" table of a pair depends on the \code{how} argument:
33 37
\enumerate{
34-
\item{ \code{how \%in\% c("left", "semi", "anti")}: \emph{join-to} is \emph{RHS}, \emph{join-from} is \emph{LHS}. }
35-
\item{ \code{how \%in\% c("inner", "full", "cross")}: \emph{LHS} and \emph{RHS} tables are treated equally, so that the terms are interchangeable. }
36-
\item{ \code{how == "right"}: \emph{join-to} is \emph{LHS}, \emph{join-from} is \emph{RHS}. }
38+
\item{ \code{how \%in\% c("left", "semi", "anti")}: \emph{join-from} is \emph{LHS}, \emph{join-to} is \emph{RHS}. }
39+
\item{ \code{how == "right"}: \emph{join-from} is \emph{RHS}, \emph{join-to} is \emph{LHS}. }
40+
\item{ \code{how \%in\% c("inner", "full")}: \emph{LHS} and \emph{RHS} are treated equally, so that each is both \emph{join-from} and \emph{join-to}. }
41+
\item{ \code{how == "cross"}: \code{mult} must be \code{"all"} and \code{on} is not used, so the terms are not relevant. }
37 42
}
38 43

39-
Using \code{mult="error"} will throw an error when multiple rows in \emph{join-to} table match to the row in \emph{join-from} table. It should not be used just to detect duplicates, which might not have matching row, and thus would silently be missed.
44+
Using \code{mult="error"} will throw an error when a row in the \emph{join-from} table finds multiple matching rows in the \emph{join-to} table. It should not be used just to detect duplicates in \emph{join-to}, as these might not have a matching row in \emph{join-from} and would thus be silently missed.
40 45

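The join-from/join-to roles and the `mult="error"` caveat can be illustrated with data.table's bracket join `x[i, on =, mult =]`, where each row of `i` looks up matching rows of `x` and `mult` chooses among several matches in `x`. Reading `i` as join-from and `x` as join-to is an analogy assumed here for illustration; the sketch does not call `mergelist` and does not use `mult="error"`. The tables `to` and `from` are hypothetical.

```r
library(data.table)

# join-to (assumed role): the table being searched; ids 1 and 4 are duplicated
to   <- data.table(id = c(1L, 1L, 2L, 4L, 4L), v = c("a", "b", "c", "d", "e"))
# join-from (assumed role): each of its rows searches `to` on `id`
from <- data.table(id = c(1L, 2L, 3L))

# Every row of `from` is kept; id 3 has no match, so v is NA (default nomatch).
all_rows   <- to[from, on = "id", mult = "all"]    # id 1 yields two rows
first_rows <- to[from, on = "id", mult = "first"]  # first match per row of `from`
last_rows  <- to[from, on = "id", mult = "last"]   # last match per row of `from`

# id 4 is duplicated in `to`, but no row of `from` has id 4, so a check made
# while joining never sees it. Checking `to` itself finds every duplicated id.
dup_ids <- unique(to[duplicated(to, by = "id"), id])
```

The last line is the kind of direct duplicate check the paragraph recommends instead of relying on `mult="error"`: it reports id 4 even though no row of `from` would ever trigger a multiple-match error for it.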
41 46
When not specified, \code{mult} takes its default depending on the \code{how} argument:
42 47
\enumerate{
@@ -45,10 +50,10 @@
45 50
\item{ When \code{how == "cross"}, \code{mult="all"}. }
46 51
}
47 52

48-
When the \code{on} argument is missing, it will be determined based \code{how} argument:
49-
\enumerate{
50-
\item{ When \code{how \%in\% c("left", right", "semi", "anti")}, \code{on} becomes the key column(s) of the \emph{join-to} table. }
51-
\item{ When \code{how \%in\% c("inner", full")}, if only one table has a key, then this key is used; if both tables have keys, then \code{on = intersect(key(lhs), key(rhs))}, having its order aligned to shorter key. }
53+
When \code{how \%in\% c("inner", "full")}, \emph{LHS} and \emph{RHS} are treated symmetrically as \emph{join-from} and \emph{join-to}, as follows:
54+
\itemize{
55+
\item{ When \code{mult \%in\% c("first", "last", "error")}, the rows joined at each distinct value of the join column(s) are, respectively, first-to-first, last-to-last, and only-to-only (or else an error). }
56+
\item{ If only one table has a key, then this key is used; if both tables have keys, then \code{on = intersect(key(lhs), key(rhs))}, having its order aligned to the shorter key. }
52 57
}
53 58

54 59
When joining tables that are not directly linked to a single table, e.g. a snowflake schema (see References), a \emph{right} outer join can be used to optimize the sequence of merges, see Examples.
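The first-to-first and last-to-last pairing described for `how %in% c("inner", "full")` can be approximated for an inner join by reducing each side to one row per distinct join value and then joining. This is a sketch of the described semantics, not the `mergelist` implementation: the helper `symmetric_inner()` and the tables are hypothetical, the `"error"` policy, full joins and key-based defaults for `on` are not modelled, and row and column order follow data.table's `merge()`.

```r
library(data.table)

lhs <- data.table(id = c(1L, 1L, 2L, 3L), a = c("l1", "l2", "l3", "l4"))
rhs <- data.table(id = c(1L, 1L, 1L, 2L), b = c("r1", "r2", "r3", "r4"))

# Hypothetical helper: keep only the first (or last) row of each distinct
# join value on both sides, then inner-join, so matched rows pair up
# first-to-first (or last-to-last) regardless of which side is LHS.
symmetric_inner <- function(x, y, on, mult = c("first", "last")) {
  mult <- match.arg(mult)
  from_last <- identical(mult, "last")
  merge(unique(x, by = on, fromLast = from_last),
        unique(y, by = on, fromLast = from_last),
        by = on)
}

first_to_first <- symmetric_inner(lhs, rhs, "id", "first")
last_to_last   <- symmetric_inner(lhs, rhs, "id", "last")
```

Swapping `lhs` and `rhs` pairs the same rows (only the column order changes), which is the symmetry the Details describe: each table acts as both join-from and join-to.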
