How to Create and Interpret Pairs Plots in R?
Last Updated :
27 Jan, 2022
In this article, we will discuss how to create and interpret pair plots in the R language.
A pair plot (also called a pairs plot or scatter plot matrix) helps us visualize the relationship between every pair of variables in a data set and, depending on the implementation, the distribution of each single variable. It is a good way to spot trends between variables that deserve follow-up analysis. A pair plot is essentially a multipanel figure in which each off-diagonal panel contains a scatter plot of one pair of variables.
Method 1: Create Pair Plots in Base R
To create a pair plot in base R, we use the pairs() function. It is part of the graphics package that ships with R, so no extra installation is needed. The pairs() function takes a data frame as an argument and draws a matrix of scatter plots, one for each pair of variables in the data frame.
Syntax: pairs( df )
Parameter:
- df: the data frame whose columns are plotted against each other.
Example:
Here is a basic pair plot in base R.
R
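# create sample_data (the seed value is arbitrary; it only makes the plot reproducible)
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500, 0, 10)  # x plus noise with standard deviation 10
z <- x - rnorm(500, 0, 7)   # x minus noise with standard deviation 7
sample_data <- data.frame(x, y, z)
# create pairs plot
pairs(sample_data)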
Output:
In the above pair plot, the diagonal boxes show the names of the variables x, y, and z. All other boxes display a scatter plot for one pairwise combination of variables. For example, in the first row the second box shows the scatter plot of x and y, whereas the third box shows the scatter plot of x and z.
This default pair plot has two limitations. First, it gives no numerical summary of the relationships, such as a correlation coefficient. Second, only three of the six scatter plots are distinct: the x-y and y-x panels show the same data with the axes swapped, and the same holds for x-z and z-x and for y-z and z-y. Half of the panels are therefore spent on mirrored copies. The ggpairs() function of the GGally package, an extension of ggplot2, addresses both points.
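Base R can also go part of the way on its own, because pairs() accepts upper.panel and lower.panel arguments that replace the default scatter plot in one triangle of the matrix. The following is a minimal sketch under stated assumptions: the helper name panel_cor and the seed are illustrative and not part of the original example.

```r
# Illustrative seed so the sketch is reproducible (an assumption)
set.seed(123)

x <- rnorm(500)
y <- x + rnorm(500, 0, 10)
z <- x - rnorm(500, 0, 7)
sample_data <- data.frame(x, y, z)

# Hypothetical helper: writes the Pearson correlation into a panel
# instead of drawing a scatter plot
panel_cor <- function(x, y, ...) {
  # Switch the panel to a 0-1 coordinate system and restore it on exit
  old_par <- par(usr = c(0, 1, 0, 1))
  on.exit(par(old_par))
  text(0.5, 0.5, sprintf("r = %.2f", cor(x, y)))
}

# Scatter plots below the diagonal, correlation coefficients above it
pairs(sample_data, upper.panel = panel_cor)
```

With this change the upper triangle no longer repeats the lower one, although the diagonal still shows only the variable names.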
Method 2: Create Pair Plots Using ggplot2 and GGally
To create a pair plot with ggplot2, we use the ggpairs() function of the GGally package. GGally extends ggplot2 by adding several functions that reduce the complexity of combining geoms with transformed data. The ggpairs() function builds a matrix of plots from a given data set. For numeric variables it shows, by default, a scatter plot for each pair of variables, a density plot for each variable, and the Pearson correlation coefficient of each pair.
Syntax:
ggpairs( df )
Parameter:
- df: the data frame whose columns are plotted against each other.
Example:
Here is a basic pair plot using the ggplot2 and GGally packages.
R
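# load libraries ggplot2 and GGally
library(ggplot2)
library(GGally)
# create sample_data (the seed value is arbitrary; it only makes the plot reproducible)
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500, 0, 10)  # x plus noise with standard deviation 10
z <- x - rnorm(500, 0, 7)   # x minus noise with standard deviation 7
sample_data <- data.frame(x, y, z)
# create pairs plot
ggpairs(sample_data)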
Output:
In the above pair plot, the variable names x, y, and z are displayed on the outer edges of the matrix. The boxes along the diagonal display the density plot of each variable, whereas the boxes in the lower-left corner display the scatter plot for each pair of variables. The boxes in the upper-right corner display the Pearson correlation coefficient for each pair of variables.
The Pearson correlation coefficient measures the strength of the linear relationship between two variables. It takes a value between -1 and 1, where -1 signifies a perfect negative linear correlation, 0 signifies no linear correlation, and +1 signifies a perfect positive linear correlation.
The pair plots made with ggpairs() are more informative because they show more in the same space without repeating any plot. They also report the Pearson correlation coefficient, which helps quantify the linear relationship between each pair of variables.
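To make the coefficient concrete, the sketch below computes it for x and y in two ways: with the built-in cor() function, whose default method is Pearson, and by hand as the covariance divided by the product of the standard deviations. The seed and the variable names r_builtin and r_manual are illustrative assumptions.

```r
# Illustrative seed so the sketch is reproducible (an assumption)
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500, 0, 10)

# Built-in: cor() uses the Pearson method by default
r_builtin <- cor(x, y)

# By hand: covariance divided by the product of the standard deviations
r_manual <- cov(x, y) / (sd(x) * sd(y))

# Both routes should agree up to floating-point error
all.equal(r_builtin, r_manual)

# A perfectly linear relationship gives +1 or -1 (up to floating-point error)
cor(x, 2 * x + 3)
cor(x, -2 * x + 3)
```

Because the noise added to y has a much larger spread than x itself, the coefficient for x and y should come out small even though y was built from x.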
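The three regions of the matrix can also be set explicitly through the lower, diag, and upper arguments of ggpairs(). The sketch below spells out the defaults described above and makes the points semi-transparent with GGally's wrap() helper. The seed, the title text, and the alpha and size values are illustrative assumptions.

```r
library(ggplot2)
library(GGally)

# Illustrative seed so the sketch is reproducible (an assumption)
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500, 0, 10)
z <- x - rnorm(500, 0, 7)
sample_data <- data.frame(x, y, z)

p <- ggpairs(
  sample_data,
  columns = 1:3,                                   # which columns to plot
  lower = list(continuous = wrap("points", alpha = 0.3, size = 0.8)),
  diag  = list(continuous = "densityDiag"),        # density plot per variable
  upper = list(continuous = "cor"),                # Pearson correlation per pair
  title = "Pair plot of sample_data"               # illustrative title
)

# Draw the plot matrix
print(p)
```

Storing the result in p before printing makes it possible to save or restyle the plot matrix later.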