|
26 | 26 |
|
27 | 27 | Merging is performed sequentially from "left to right", so that for \code{l} of 3 tables, it will do something like \code{merge(merge(l[[1L]], l[[2L]]), l[[3L]])}. \emph{Non-equi joins} are not supported. Column names to merge on must be common in both tables on each merge. |
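A minimal base-R sketch of the left-to-right order described above. It uses base \code{merge()} on plain data frames, which performs an inner join on the common column names. It does not use data.table's merging functions, so it illustrates only the folding order: \code{Reduce()} applies the merges exactly as the nested call does.

```r
# Three tables sharing the column "id"; base merge() joins on common names.
l <- list(
  data.frame(id = 1:3, a = c(10, 20, 30)),
  data.frame(id = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE),
  data.frame(id = 2:3, c = c(TRUE, FALSE))
)
nested <- merge(merge(l[[1L]], l[[2L]]), l[[3L]])  # explicit left-to-right nesting
folded <- Reduce(merge, l)                          # Reduce() folds from the left by default
identical(nested, folded)  # TRUE
folded$id                  # 2 3 (inner join keeps ids present in all three tables)
```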
28 | 28 |
|
29 | | - Arguments \code{on}, \code{how}, \code{mult}, \code{join.many} could be lists as well, each of length \code{length(l)-1L}, to provide argument to be used for each single tables pair to merge, see examples. |
| 29 | + Arguments \code{on}, \code{how}, \code{mult}, \code{join.many} may also be lists, each of length \code{length(l)-1L}, providing the argument to be used at each successive merge; see Examples. |
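Per-merge arguments can be pictured as a loop that uses the i-th element of each list at the i-th merge. The sketch below is hypothetical: \code{merge_seq()} is not a data.table function, and base \code{merge(by = )} (an inner join) stands in for \code{on}.

```r
# Hypothetical helper: merge l[[1]] with l[[2]], then the result with l[[3]], ...
# using on[[i]] as the join column(s) for the i-th merge.
merge_seq <- function(l, on) {
  stopifnot(is.list(on), length(on) == length(l) - 1L)
  out <- l[[1L]]
  for (i in seq_along(on)) {
    out <- merge(out, l[[i + 1L]], by = on[[i]])
  }
  out
}

items  <- data.frame(id = 1:3, x = c("a", "b", "c"), stringsAsFactors = FALSE)
owners <- data.frame(id = c(1L, 3L), grp = c(10L, 20L))
groups <- data.frame(grp = c(10L, 20L), y = c("p", "q"), stringsAsFactors = FALSE)

# First merge on "id", second merge on "grp".
res <- merge_seq(list(items, owners, groups), on = list("id", "grp"))
res$y  # "p" "q"
```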
30 | 30 |
|
31 | | - The terms \emph{join-to} and \emph{join-from} indicate which in a pair of tables is the "baseline" or "authoritative" source -- this governs the ordering of rows and columns. |
| 31 | + Heuristically speaking, the \emph{join-from} table searches the \emph{join-to} table. More precisely: |
| 32 | + \itemize{ |
| 33 | + \item{ \code{mult} determines the policy when a row of \emph{join-from} finds multiple matches in \emph{join-to}. } |
| 34 | + \item{ When \code{on} is missing, the key of \emph{join-to} is used as the join column(s). } |
| 35 | + } |
32 | 36 | Whether each refers to the "left" or "right" table of a pair depends on the \code{how} argument: |
33 | 37 | \enumerate{ |
34 | | - \item{ \code{how \%in\% c("left", "semi", "anti")}: \emph{join-to} is \emph{RHS}, \emph{join-from} is \emph{LHS}. } |
35 | | - \item{ \code{how \%in\% c("inner", "full", "cross")}: \emph{LHS} and \emph{RHS} tables are treated equally, so that the terms are interchangeable. } |
36 | | - \item{ \code{how == "right"}: \emph{join-to} is \emph{LHS}, \emph{join-from} is \emph{RHS}. } |
| 38 | + \item{ \code{how \%in\% c("left", "semi", "anti")}: \emph{join-from} is \emph{LHS}, \emph{join-to} is \emph{RHS}. } |
| 39 | + \item{ \code{how == "right"}: \emph{join-from} is \emph{RHS}, \emph{join-to} is \emph{LHS}. } |
| 40 | + \item{ \code{how \%in\% c("inner", "full")}: \emph{LHS} and \emph{RHS} are treated equally, so that each is both \emph{join-from} and \emph{join-to}. } |
| 41 | + \item{ \code{how == "cross"}: \code{mult} must be \code{"all"} and \code{on} is not used, so the terms are not relevant. } |
37 | 42 | } |
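To make the \emph{join-from}/\emph{join-to} roles concrete, the following base-R sketch models the lookup: each row of \emph{join-from} searches \emph{join-to}, and \code{mult} decides what happens when several rows match. \code{match_rows()} is a hypothetical helper written for illustration, not data.table's implementation; it returns, for each \emph{join-from} row, the indices of the \emph{join-to} rows it would be joined to. Its default \code{mult = "all"} is a choice of the sketch, not a statement about data.table's defaults.

```r
# Hypothetical model of the lookup: rows of `from` search rows of `to`.
match_rows <- function(from, to, on, mult = c("all", "first", "last", "error")) {
  mult <- match.arg(mult)
  lapply(seq_len(nrow(from)), function(i) {
    hits <- which(to[[on]] == from[[on]][i])  # join-to rows matching this join-from row
    if (length(hits) > 1L) {
      hits <- switch(mult,
        all   = hits,
        first = hits[1L],
        last  = hits[length(hits)],
        error = stop("row ", i, " of join-from matches ", length(hits), " rows of join-to")
      )
    }
    hits
  })
}

lhs <- data.frame(id = c(1L, 2L, 3L))
rhs <- data.frame(id = c(1L, 1L, 2L))

# how = "left": join-from is LHS, join-to is RHS.
match_rows(lhs, rhs, "id", mult = "first")  # list(1L, 3L, integer(0))
# how = "right": the roles are swapped.
match_rows(rhs, lhs, "id", mult = "error")  # list(1L, 1L, 2L); each RHS row finds one LHS row
```

Note that the same two tables raise an error under \code{mult = "error"} in the left-join direction but not in the right-join direction, since only \emph{RHS} contains the duplicated value.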
38 | 43 |
|
39 | | - Using \code{mult="error"} will throw an error when multiple rows in \emph{join-to} table match to the row in \emph{join-from} table. It should not be used just to detect duplicates, which might not have matching row, and thus would silently be missed. |
| 44 | + Using \code{mult="error"} will throw an error when a row in the \emph{join-from} table finds multiple matching rows in the \emph{join-to} table. It should not be used just to detect duplicates in \emph{join-to}, as these might not have a matching row in \emph{join-from}, and would thus be silently missed. |
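The pitfall can be reproduced without data.table. In this sketch, counting the \emph{join-to} matches of each \emph{join-from} row stands in for the check that \code{mult="error"} performs (an assumption about how that check behaves, not its actual code); it sees nothing wrong, while base \code{anyDuplicated()} on the \emph{join-to} column finds the duplicate.

```r
# join-to contains a duplicated id (1), but no join-from row looks up id 1.
join_to   <- data.frame(id = c(1L, 1L, 2L))
join_from <- data.frame(id = 2L)

# Number of join-to matches found by each join-from row.
n_matches <- vapply(join_from$id, function(v) sum(join_to$id == v), integer(1))
any(n_matches > 1L)             # FALSE: a multiple-match check never fires
anyDuplicated(join_to$id) > 0L  # TRUE: an explicit duplicate check does
```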
40 | 45 |
|
41 | 46 | When not specified, \code{mult} takes its default depending on the \code{how} argument: |
42 | 47 | \enumerate{ |
|
45 | 50 | \item{ When \code{how == "cross"}, \code{mult="all"}. } |
46 | 51 | } |
47 | 52 |
|
48 | | - When the \code{on} argument is missing, it will be determined based \code{how} argument: |
49 | | - \enumerate{ |
50 | | - \item{ When \code{how \%in\% c("left", right", "semi", "anti")}, \code{on} becomes the key column(s) of the \emph{join-to} table. } |
51 | | - \item{ When \code{how \%in\% c("inner", full")}, if only one table has a key, then this key is used; if both tables have keys, then \code{on = intersect(key(lhs), key(rhs))}, having its order aligned to shorter key. } |
| 53 | + When \code{how \%in\% c("inner", "full")}, \emph{LHS} and \emph{RHS} are treated symmetrically, each acting as both \emph{join-from} and \emph{join-to}, as follows: |
| 54 | + \itemize{ |
| 55 | + \item{ When \code{mult \%in\% c("first", "last", "error")}, at each distinct value of the join column(s) the rows joined are respectively first-to-first, last-to-last, and only-to-only (or else an error). } |
| 56 | + \item{ If only one table has a key, then this key is used; if both tables have keys, then \code{on = intersect(key(lhs), key(rhs))}, having its order aligned to the shorter key. } |
52 | 57 | } |
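The two rules above can be modeled in base R. Both helpers are hypothetical and not part of data.table. \code{symmetric_pairs()} covers only the matched (inner) part of the join and returns, for each distinct join value, the \emph{LHS} and \emph{RHS} row indices that are joined under \code{mult}. \code{default_on()} takes the two keys as character vectors, with \code{NULL} meaning no key (as returned by \code{key()}), and derives \code{on}; when both keys have the same length, the sketch assumes the \emph{LHS} order is kept.

```r
# Hypothetical model of symmetric mult handling for the matched part of an inner/full join.
# l, r: join-column values of LHS and RHS.
symmetric_pairs <- function(l, r, mult = c("first", "last", "error")) {
  mult <- match.arg(mult)
  pick <- function(idx, side, v) {
    switch(mult,
      first = idx[1L],              # first-to-first
      last  = idx[length(idx)],     # last-to-last
      error = if (length(idx) == 1L) idx else stop("multiple ", side, " rows for value ", v)
    )
  }
  vals <- intersect(l, r)  # distinct values present on both sides
  do.call(rbind, lapply(vals, function(v) {
    c(lhs = pick(which(l == v), "LHS", v), rhs = pick(which(r == v), "RHS", v))
  }))
}

# Hypothetical model of the default `on` when only keys are available.
default_on <- function(key_lhs, key_rhs) {
  if (is.null(key_lhs)) return(key_rhs)
  if (is.null(key_rhs)) return(key_lhs)
  # intersect() keeps the order of its first argument, so pass the shorter key first.
  if (length(key_lhs) <= length(key_rhs)) intersect(key_lhs, key_rhs) else intersect(key_rhs, key_lhs)
}

symmetric_pairs(c(1L, 1L, 2L), c(1L, 2L, 2L), mult = "first")  # rows (1, 1) and (3, 2)
default_on(c("a", "b", "c"), c("c", "a"))                       # "c" "a"
```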
53 | 58 |
|
54 | 59 | When joining tables that are not directly linked to a single table, e.g. a snowflake schema (see References), a \emph{right} outer join can be used to optimize the sequence of merges, see Examples. |
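One way to read the snowflake remark, sketched with base \code{merge()}, where \code{all.y = TRUE} gives a right outer join: the dimension table is first enriched from its sub-dimension, and only then is the large fact table joined, so at each step the \emph{join-from} table is the one whose rows must all be kept. This corresponds to merging \code{list(regions, stores, sales)} left to right with right joins. The table names are hypothetical, and this illustrates the merge order only, not data.table's code.

```r
regions <- data.frame(region_id = 1:2, region = c("North", "South"), stringsAsFactors = FALSE)
stores  <- data.frame(store_id = 1:3, region_id = c(1L, 2L, 1L))
sales   <- data.frame(sale_id = 1:5, store_id = c(1L, 2L, 3L, 3L, 1L),
                      amount = c(5, 8, 2, 4, 7))

# Step 1: right join, so every store is kept and looks up its region (small tables only).
stores_full <- merge(regions, stores, by = "region_id", all.y = TRUE)
# Step 2: right join, so every sale is kept and looks up its already enriched store.
res <- merge(stores_full, sales, by = "store_id", all.y = TRUE)
res[order(res$sale_id), c("sale_id", "store_id", "region")]
```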
|