merge {base} | R Documentation |
Description
Merge two data frames by common columns or row names, or do other versions of database join operations.
Usage
merge(x, y, ...)

## Default S3 method:
merge(x, y, ...)

## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), no.dups = TRUE,
      incomparables = NULL, ...)
Arguments
x , y | data frames, or objects to be coerced to one. |
by , by.x , by.y | specifications of the columns used for merging. See ‘Details’. |
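all | logical; all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE. |
all.x | logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output. |
all.y | logical; analogous to all.x. |
sort | logical. Should the result be sorted on the by columns? |
suffixes | a character vector of length 2 specifying the suffixes to be used for making unique the names of columns in the result which are not used for merging (appearing in by etc). |
no.dups | logical indicating that suffixes are appended in more cases to avoid duplicated column names in the result. |
incomparables | values which cannot be matched. See match. |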
... | arguments to be passed to or from methods. |
Details
merge is a generic function whose principal method is for data frames: the default method coerces its arguments to data frames and calls the "data.frame" method.
By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each. For the precise meaning of ‘match’, see match.
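As a minimal sketch of the "all possible matches" rule, consider the following toy data (the data frames `left` and `right` and their contents are made up for illustration). The key value 1 occurs twice on each side, so it should contribute 2 x 2 rows, while the unmatched keys 2 and 3 are dropped under the default `all = FALSE`.

```r
## Hypothetical toy data: key 1 appears twice in each data frame.
left  <- data.frame(id = c(1, 1, 2), a = c("a1", "a2", "a3"))
right <- data.frame(id = c(1, 1, 3), b = c("b1", "b2", "b3"))

## Merged on the only shared column name, 'id'.
res <- merge(left, right)

nrow(res)        # 4: every 'left' row with id 1 pairs with every 'right' row with id 1
unique(res$id)   # 1: ids 2 and 3 have no partner and are not kept
```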
Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.
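The two ways of requesting a merge on row names can be sketched as follows. The data frames `df1` and `df2` are hypothetical; only the rows named "r2" and "r3" occur in both.

```r
## Hypothetical data frames that share some row names.
df1 <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
df2 <- data.frame(b = 4:6, row.names = c("r2", "r3", "r4"))

by_name <- merge(df1, df2, by = "row.names")  # row names selected by name
by_zero <- merge(df1, df2, by = 0)            # row names selected by the number 0

names(by_name)                   # "Row.names" "a" "b"
as.character(by_name$Row.names)  # "r2" "r3"
```

Both calls are expected to select the same rows; the key is returned as an ordinary column rather than as row names of the result.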
If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
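A small illustration of the Cartesian product case, using made-up inputs `x` (2 rows, 1 column) and `y` (3 rows, 1 column):

```r
## Hypothetical inputs with no columns in common.
x <- data.frame(a = 1:2)
y <- data.frame(b = c("u", "v", "w"))

## by = NULL requests the Cartesian product.
r <- merge(x, y, by = NULL)

dim(r)   # 6 2, i.e. c(nrow(x) * nrow(y), ncol(x) + ncol(y))
```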
If all.x is true, all the non matching cases of x are appended to the result as well, with NA filled in the corresponding columns of y; analogously for all.y.
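The NA filling can be seen in a short sketch. The data frames below are hypothetical; the row of `x` with `id` 1 has no partner in `y`.

```r
## Hypothetical data: id 1 occurs only in x, id 4 only in y.
x <- data.frame(id = 1:3, vx = c("a", "b", "c"))
y <- data.frame(id = c(2L, 3L, 4L), vy = c(20, 30, 40))

## Keep every row of x; unmatched rows get NA in the columns coming from y.
left <- merge(x, y, all.x = TRUE)

left$id   # 1 2 3
left$vy   # NA 20 30
```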
If the columns in the data frames not used in merging have any common names, these have suffixes (".x" and ".y" by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown.
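For instance, with two hypothetical data frames that both carry a non-key column called `value`, the default and a custom pair of suffixes give the following names. The suffix strings "_left" and "_right" are arbitrary choices for this sketch.

```r
## Hypothetical data: 'value' is present in both inputs but is not a merge key.
x <- data.frame(id = 1:2, value = c(10, 20))
y <- data.frame(id = 1:2, value = c(0.1, 0.2))

names(merge(x, y, by = "id"))
## "id" "value.x" "value.y"

names(merge(x, y, by = "id", suffixes = c("_left", "_right")))
## "id" "value_left" "value_right"
```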
If a by.x column name matches one of y, and if no.dups is true (as by default), the y version gets suffixed as well, avoiding duplicate column names in the result.
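A sketch of the situation described above, with made-up data: `x` is merged on its column `key`, while `y` is merged on `id` but also has an unrelated column named `key`.

```r
## Hypothetical data: 'key' is the merge column of x and a plain data column of y.
x <- data.frame(key = 1:2, a = c("p", "q"))
y <- data.frame(id = 1:2, key = c("k1", "k2"))

## With the default no.dups = TRUE, y's 'key' column receives the second suffix.
m <- merge(x, y, by.x = "key", by.y = "id")

names(m)   # "key" "a" "key.y"
```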
The complexity of the algorithm used is proportional to the length of the answer.
In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join. DBMSes do not match NULL records, equivalent to incomparables = NA in R.
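The correspondence with SQL join types can be checked by counting rows. In this sketch the data frames are hypothetical and share the keys 2 and 3; key 1 occurs only in `x` and key 4 only in `y`.

```r
## Hypothetical data with partially overlapping keys.
x <- data.frame(id = c(1, 2, 3), vx = c("a", "b", "c"))
y <- data.frame(id = c(2, 3, 4), vy = c("B", "C", "D"))

nrow(merge(x, y))                 # 2: natural (inner) join, ids 2 and 3
nrow(merge(x, y, all.x = TRUE))   # 3: left outer join, adds id 1
nrow(merge(x, y, all.y = TRUE))   # 3: right outer join, adds id 4
nrow(merge(x, y, all = TRUE))     # 4: full outer join, ids 1 to 4
```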
Value
A data frame. The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order. The columns are the common columns followed by the remaining columns in x and then those in y. If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.
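The column order and default sorting can be seen on a small example. The inputs are hypothetical, and the shared column `key` is deliberately placed last in both.

```r
## Hypothetical inputs: the common column 'key' comes last in both.
x <- data.frame(a = 1:2, key = c("k1", "k2"))
y <- data.frame(b = 3:4, key = c("k2", "k1"))

m <- merge(x, y)

names(m)      # "key" "a" "b": common column first, then x's, then y's
m$key         # "k1" "k2": sorted on the common column
m$b           # 4 3: values of y follow their keys
rownames(m)   # "1" "2": automatic row names
```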
Note
This is intended to work with data frames with vector-like columns: some aspects work with data frames containing matrices, but not all.
Currently long vectors are not accepted for inputs, which are thus restricted to less than 2^31 rows. That restriction also applies to the result for 32-bit platforms.
See Also
data.frame, by, cbind.
dendrogram for a class which has a merge method.
Examples
authors <- data.frame(
    ## I(*) : use character columns of names to get sensible sort order
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
authorN <- within(authors, { name <- surname; rm(surname) })
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

(m0 <- merge(authorN, books))
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
 m2 <- merge(books, authors, by.x = "name", by.y = "surname")
stopifnot(exprs = {
   identical(m0, m2[, names(m0)])
   as.character(m1[, 1]) == as.character(m2[, 1])
   all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ])
   identical(dim(merge(m1, m2, by = NULL)),
             c(nrow(m1)*nrow(m2), ncol(m1)+ncol(m2)))
})

## "R core" is missing from authors and appears only here :
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

## example of using 'incomparables'
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
merge(x, y, by = c("k1","k2")) # NA's match
merge(x, y, by = "k1") # NA's match, so 6 rows
merge(x, y, by = "k2", incomparables = NA) # 2 rows
[Package base version 4.4.1 Index]