R: Merge Two Data Frames (2024)

merge {base}    R Documentation

Description

Merge two data frames by common columns or row names, or do other versions of database join operations.

Usage

merge(x, y, ...)

## Default S3 method:
merge(x, y, ...)

## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), no.dups = TRUE,
      incomparables = NULL, ...)

Arguments

x, y

data frames, or objects to be coerced to one.

by, by.x, by.y

specifications of the columns used for merging. See ‘Details’.

all

logical; all = L is shorthand for all.x = L and all.y = L, where L is either TRUE or FALSE.

all.x

logical; if TRUE, then extra rows will be added to the output, one for each row in x that has no matching row in y. These rows will have NAs in those columns that are usually filled with values from y. The default is FALSE, so that only rows with data from both x and y are included in the output.

all.y

logical; analogous to all.x.

sort

logical. Should the result be sorted on the by columns?

suffixes

a character vector of length 2 specifying the suffixes to be used for making unique the names of columns in the result which are not used for merging (appearing in by etc).

no.dups

logical indicating that suffixes are appended in more cases to avoid duplicated column names in the result. This was implicitly false before R version 3.5.0.

incomparables

values which cannot be matched. See match. This is intended to be used for merging on one column, so these are incomparable values of that column.

...

arguments to be passed to or from methods.

Details

merge is a generic function whose principal method is for data frames: the default method coerces its arguments to data frames and calls the "data.frame" method.

By default the data frames are merged on the columns with names they both have, but separate specifications of the columns can be given by by.x and by.y. The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each. For the precise meaning of ‘match’, see match.
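As a minimal sketch of the matching rule above, the following uses two made-up data frames (customers and orders are hypothetical names, not part of the documentation) whose key columns are named differently, so by.x and by.y are needed. Customer 1 has two matching orders and therefore contributes two rows.

```r
# Hypothetical data: two orders for customer 1, none for customer 3,
# and one order (cust = 4) with no matching customer.
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"))
orders    <- data.frame(cust = c(1, 1, 2, 4), amount = c(10, 20, 30, 40))

# The key columns have different names, so name them separately.
res <- merge(customers, orders, by.x = "id", by.y = "cust")
print(res)
```

The key column keeps its name from x (here id), and unmatched rows on either side are dropped because all defaults to FALSE.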

Columns to merge on can be specified by name, number or by a logical vector: the name "row.names" or the number 0 specifies the row names. If specified by name it must correspond uniquely to a named column in the input.
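A short sketch of merging on row names, assuming two small illustrative data frames (a and b, with made-up measurements) that share some row names:

```r
# Hypothetical data keyed by row names rather than by a column.
a <- data.frame(height = c(150, 160, 170), row.names = c("r1", "r2", "r3"))
b <- data.frame(weight = c(55, 65), row.names = c("r2", "r3"))

by_name <- merge(a, b, by = "row.names")  # specify row names by name
by_zero <- merge(a, b, by = 0)            # or by the number 0
print(by_name)
```

Both calls match on the row names; the matched names are returned in a leading Row.names column and the result itself gets automatic row names.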

If by or both by.x and by.y are of length 0 (a length zero vector or NULL), the result, r, is the Cartesian product of x and y, i.e., dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
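The dimension formula can be checked on a toy case; the data frames below are illustrative only.

```r
# Two small data frames with no columns in common.
x <- data.frame(a = 1:3)
y <- data.frame(b = c("u", "v"))

# by = NULL requests the Cartesian product: every row of x with every row of y.
r <- merge(x, y, by = NULL)
print(dim(r))
```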

If all.x is true, all the non matching cases of x are appended to the result as well, with NA filled in the corresponding columns of y; analogously for all.y.

If the columns in the data frames not used in merging have any common names, these have suffixes (".x" and ".y" by default) appended to try to make the names of the result unique. If this is not possible, an error is thrown.
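A minimal sketch of the suffix behaviour, using two made-up data frames that share a non-key column called score:

```r
# Both inputs have a non-key column named "score".
x <- data.frame(id = 1:2, score = c(10, 20))
y <- data.frame(id = 1:2, score = c(0.1, 0.2))

default_names <- names(merge(x, y, by = "id"))
custom_names  <- names(merge(x, y, by = "id",
                             suffixes = c("_left", "_right")))
print(default_names)
print(custom_names)
```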

If a by.x column name matches one of y, and if no.dups is true (as by default), the y version gets suffixed as well, avoiding duplicate column names in the result.
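The no.dups case can be illustrated with a contrived pair of data frames (hypothetical names) in which y is matched on its key column but also carries an unrelated column called id, the same name as the by.x column:

```r
# x is keyed on "id"; y is keyed on "key" but also has a non-key column "id".
x <- data.frame(id = 1:2, v = c("a", "b"))
y <- data.frame(key = 1:2, id = c("p", "q"))

# With the default no.dups = TRUE, y's "id" column is suffixed.
res <- merge(x, y, by.x = "id", by.y = "key")
print(names(res))
```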

The complexity of the algorithm used is proportional to the length of the answer.

In SQL database terminology, the default value of all = FALSE gives a natural join, a special case of an inner join. Specifying all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer) join, and both (all = TRUE) a (full) outer join. DBMSes do not match NULL records, equivalent to incomparables = NA in R.
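The four join types can be compared side by side on a small made-up example; x and y below share keys 2 and 3 only.

```r
# Illustrative inputs: key 1 appears only in x, key 4 only in y.
x <- data.frame(k = c(1, 2, 3), xv = c("a", "b", "c"))
y <- data.frame(k = c(2, 3, 4), yv = c("B", "C", "D"))

inner <- merge(x, y)                # natural / inner join
left  <- merge(x, y, all.x = TRUE)  # left outer join: keeps every row of x
right <- merge(x, y, all.y = TRUE)  # right outer join: keeps every row of y
full  <- merge(x, y, all = TRUE)    # full outer join: keeps rows of both

print(full)
```

Rows without a partner get NA in the columns that come from the other data frame.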

Value

A data frame. The rows are by default lexicographically sorted on the common columns, but for sort = FALSE are in an unspecified order. The columns are the common columns followed by the remaining columns in x and then those in y. If the matching involved row names, an extra character column called Row.names is added at the left, and in all cases the result has ‘automatic’ row names.

Note

This is intended to work with data frames with vector-like columns: some aspects work with data frames containing matrices, but not all.

Currently long vectors are not accepted for inputs, which are thus restricted to less than 2^31 rows. That restriction also applies to the result for 32-bit platforms.

See Also

data.frame, by, cbind.

dendrogram for a class which has a merge method.

Examples

authors <- data.frame(
    ## I(*) : use character columns of names to get sensible sort order
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
authorN <- within(authors, { name <- surname; rm(surname) })
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))

(m0 <- merge(authorN, books))
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
 m2 <- merge(books, authors, by.x = "name", by.y = "surname")
stopifnot(exprs = {
   identical(m0, m2[, names(m0)])
   as.character(m1[, 1]) == as.character(m2[, 1])
   all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ])
   identical(dim(merge(m1, m2, by = NULL)),
             c(nrow(m1)*nrow(m2), ncol(m1)+ncol(m2)))
})

## "R core" is missing from authors and appears only here :
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)

## example of using 'incomparables'
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
merge(x, y, by = c("k1","k2")) # NA's match
merge(x, y, by = "k1") # NA's match, so 6 rows
merge(x, y, by = "k2", incomparables = NA) # 2 rows

[Package base version 4.4.1 Index]

FAQs

How to combine two data frames into one in R?

To join two data frames (datasets) vertically, use the rbind function. The two data frames must have the same variables, but they do not have to be in the same order. If data frame A has variables that data frame B does not, one option is to delete the extra variables in data frame A before binding.
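A brief sketch of vertical binding, using made-up data frames a and b whose columns are in a different order, plus a variant a2 with an extra column that is dropped before binding:

```r
# Same variables, different column order: rbind() on data frames
# matches columns by name.
a <- data.frame(id = 1:2, grp = c("x", "y"))
b <- data.frame(grp = "z", id = 3L)
stacked <- rbind(a, b)

# a2 has an extra variable; keep only the variables that b also has.
a2 <- cbind(a, extra = c(TRUE, FALSE))
stacked2 <- rbind(a2[names(b)], b)

print(stacked)
```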

How do I combine multiple data frames into one?

In pandas, the pandas.concat() function combines DataFrames either vertically (along rows) or horizontally (along columns). It takes a list of DataFrames as input and concatenates them along the specified axis (0 for vertical, 1 for horizontal).

How do I join more than two data frames in R?

To join more than two data frames in R, use the reduce() function from the purrr package (part of the tidyverse), or Reduce() from base R. Either one takes the data frames as a list and applies a two-table join repeatedly, joining on the specified column.
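As a sketch of the idea, the following uses base R's Reduce() in place of purrr's reduce() so that it runs without extra packages. The three data frames and the shared key column id are hypothetical.

```r
# Three illustrative data frames sharing the key column "id".
d1 <- data.frame(id = 1:3, a = c("x", "y", "z"))
d2 <- data.frame(id = 2:4, b = c(10, 20, 30))
d3 <- data.frame(id = 1:3, c = c(TRUE, FALSE, TRUE))

# Fold merge() over the list: merge(merge(d1, d2), d3), always on "id".
joined <- Reduce(function(x, y) merge(x, y, by = "id"), list(d1, d2, d3))
print(joined)
```

Because each step is an inner join, only ids present in every data frame survive; passing all = TRUE inside the function would keep the rest.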

How do I concatenate two data frames?

Here are some general guidelines for pandas:
  1. Use 'concat()' for combining DataFrames or Series along a particular axis (rows or columns) without considering any common keys or indexes. ...
  2. Use 'join()' for combining DataFrames based on their index values. ...
  3. Use 'merge()' for combining DataFrames based on a common column or index.

How to merge two datasets?

In SAS, you use a MERGE statement and a BY statement within a data step, like this: DATA New-Dataset-Name (OPTIONS); MERGE Dataset-Name-1 (OPTIONS) Dataset-Name-2 (OPTIONS); BY Variable(s); RUN; You must sort both datasets on your matching variable(s) before merging them.

How do I combine two data frames by row?

In pandas, the concat() function concatenates two or more DataFrames along a particular axis. For example, given two DataFrames df1 and df2 with the same two columns and three rows each, pd.concat([df1, df2], axis=0) stacks them along the row axis.

How do you merge two DataFrames into a new one?

Example 1: Combining Two DataFrames Using the append() Method

In pandas versions before 2.0, two DataFrames df1 and df2 could be combined with the append() method, giving a new DataFrame that contains all rows from both; passing ignore_index=True produced a continuous index. DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0, so current code should use pd.concat() instead.

How to merge two DataFrames based on a common column in R?

The merge() function in base R merges two data frames by common columns or row names. By default it behaves like an inner join: only rows with matching keys in both inputs are kept, and the result has automatic row names rather than those of the inputs. The columns appear as the key columns first, followed by the remaining columns of the first data frame and then those of the second.

How do I merge two data frames with the same column?

In pandas, the merge() function combines two DataFrames based on a common column or index, placing the columns of matching rows side by side; stacking rows vertically is the job of concat(). In a call such as pd.merge(df1, df2, on='index'), df1 and df2 are the DataFrames to merge and on names the column or index level to match on.

How do I combine two or more datasets in R?

merge() adds variables to a dataset. Merging two datasets requires that both have at least one variable in common (either string or numeric). If the variable is a string, make sure the categories have the same spelling (e.g., country names).

How can you merge two data frames without losing any rows?

Outer Join

If a row doesn't have a match in the other DataFrame based on the key column(s), then you won't lose the row like you would with an inner join. Instead, the row will be in the merged DataFrame, with NaN values filled in where appropriate.

How to merge DataFrames in R dplyr?

Using 'dplyr' package:

dplyr has no single join() function. Merging is done with a family of functions, inner_join(), left_join(), right_join() and full_join(), each taking the data frames x and y to be merged; the function name selects the type of join.

How to combine two data frames in R?

In R, use the merge() function from base R to merge two data frames; the dplyr package offers the analogous *_join() functions. The most important condition for joining two data frames is that the columns on which the merge happens have the same type. merge() works much like a join in a DBMS.

Which function is used to merge two data frames?

In pandas, the merge() method combines the content of two DataFrames using the specified join method and returns a new DataFrame. Use its parameters to control which keys are matched and which rows are kept.

How do you combine data from multiple DataFrames?

The merge() operation combines two data frames based on one or more common columns, also called keys. By default, the resulting data frame contains only the rows whose keys match in both data frames. The merge() function is similar to the SQL JOIN operation.

How do I combine two lists into a DataFrame in R?

The expand.grid function creates a data frame from all combinations of the supplied lists, vectors or factors. For example, given two lists List1 and List2, the call expand.grid(List1, List2) creates such a data frame.
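A small sketch of expand.grid(), assuming List1 and List2 are plain atomic vectors (the values are made up):

```r
# Two illustrative inputs; the names List1/List2 follow the text above.
List1 <- c("a", "b")
List2 <- 1:3

# One row per combination; unnamed arguments become columns Var1, Var2.
combos <- expand.grid(List1, List2)
print(combos)
```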

How do I combine two data series into a DataFrame?

Use pd.concat() to concatenate s1 and s2 along axis 1 (columns) to create a DataFrame df. The resulting DataFrame has two columns, with the values from s1 in the first column and the values from s2 in the second column. Note that the index labels from the original Series are preserved in the resulting DataFrame.

How do I combine two columns into one DataFrame in R?

Merge Two Columns into One using paste0()

The paste0() function concatenates two columns into one column of an R data frame without any separator. For example, it can be used to create a new column called Location in a data frame df.
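A minimal sketch, assuming df has two character columns named City and State (hypothetical names and values chosen for illustration):

```r
# Illustrative data frame with two columns to be combined.
df <- data.frame(City = c("Austin", "Boise"), State = c("TX", "ID"))

# paste0() joins the values element by element with no separator.
df$Location <- paste0(df$City, df$State)
print(df)
```

To put a separator between the parts, paste(df$City, df$State, sep = ", ") can be used instead.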

How do I combine two data frames in R with different column names?

To join data frames on differently named columns in R, use either the base merge() function (with by.x and by.y) or the dplyr join functions. The dplyr joins are often reported to be faster than merge() on large data, though this depends on the data. dplyr provides several join functions, and all of them support merging on different column names.

Author: Maia Crooks Jr