Correlation Matrix in R Programming
Last Updated: 24 Nov, 2023
Correlation describes the relationship between two variables. More precisely, the correlation coefficient measures the degree of linear association between two random variables and takes values in the interval [-1, 1]. The value -1 indicates a perfect negative linear relationship, 1 indicates a perfect positive linear relationship, and 0 indicates no linear relationship. However, a value of 0 does not mean that the variables are completely independent of each other, since they may still be related non-linearly. A correlation matrix in R holds the degree of linear relationship for a set of random variables, computed one pair at a time for every pair of variables in the data.
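As a minimal sketch of the definition above, the Pearson coefficient can be computed by hand as the covariance of two variables divided by the product of their standard deviations, and compared with cor(). The built-in mtcars data set and the columns wt and mpg are used here only as illustrative data; they are not part of the original example.

```r
# Two columns of the built-in mtcars data set, used purely for illustration
x <- mtcars$wt
y <- mtcars$mpg

# Pearson's r: sum of cross-deviations divided by the square root of the
# product of the sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# The same quantity from the built-in function
r_builtin <- cor(x, y)

print(r_manual)
print(r_builtin)
print(all.equal(r_manual, r_builtin))
```

The two values should agree up to floating-point error, and both lie within [-1, 1].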
Properties of Correlation Matrix in R
- All the diagonal elements of the correlation matrix must be 1, because the correlation of a variable with itself is always perfect: cii = 1.
- The matrix must be symmetric: cij = cji.
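Both properties can be checked directly on a computed matrix. The sketch below assumes the built-in mtcars data set and an arbitrary choice of three columns; neither comes from the original example.

```r
# Correlation matrix for three illustrative columns of mtcars
M <- cor(mtcars[, c("mpg", "wt", "hp")])

# Property 1: every diagonal element equals 1 (up to floating-point error)
diag_is_one <- all(abs(diag(M) - 1) < 1e-12)

# Property 2: the matrix equals its own transpose
is_sym <- isSymmetric(M)

print(diag_is_one)
print(is_sym)
```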
Computing Correlation Matrix in R
In R Programming Language, a correlation matrix can be computed using the cor() function, which has the following syntax:
Syntax: cor (x, use = , method = )
Parameters:
x: It is a numeric matrix or a data frame.
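use: Specifies how missing data is handled. If it is not given, the default is everything, under which any pair involving a missing value yields NA.
- all.obs: this parameter value assumes that the data frame has no missing values and throws an error in case of violation.
- complete.obs: listwise deletion (rows with any missing value are dropped).
- pairwise.complete.obs: pairwise deletion (each pair uses the rows where both variables are present).
method: Specifies the type of correlation coefficient. One of "pearson", "spearman", or "kendall" can be used for computation. The default method is "pearson".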
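The effect of the use and method arguments can be seen on data with missing values. The following is only a sketch: the mtcars columns and the positions of the injected NA values are arbitrary choices made for illustration.

```r
# Take three columns of mtcars and inject a few missing values
df <- mtcars[, c("mpg", "wt", "hp")]
df$wt[c(2, 5)] <- NA
df$hp[7] <- NA

# Default handling: any pair that involves a missing value yields NA
c_default <- cor(df)

# Listwise deletion: rows with any NA are dropped before computing
c_listwise <- cor(df, use = "complete.obs")

# Pairwise deletion combined with Spearman rank correlation
c_pairwise <- cor(df, use = "pairwise.complete.obs", method = "spearman")

# all.obs raises an error when missing values are present
res_all_obs <- tryCatch(
  cor(df, use = "all.obs"),
  error = function(e) "error"
)

print(c_default)
print(c_listwise)
print(c_pairwise)
print(res_all_obs)
```

Listwise deletion keeps every pair on the same set of rows, while pairwise deletion uses as many rows as each pair allows, so the two can give slightly different coefficients.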
Correlation in R Programming Language
The correlation matrix is computed after loading the data. The following code snippet shows the usage of the cor() function:
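R
# Load the data set. The call and file name are truncated in the source:
# read.csv is assumed, and file_path is a placeholder for the CSV path.
data <- read.csv(file_path, header = TRUE, fileEncoding = "latin1")
print("Original Data")
head(data)
# Pearson correlation matrix of all (numeric) columns
cor_data <- cor(data)
print("Correlation matrix")
print(cor_data)
|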
Output:
[1] "Original Data"
Year Mileage..thousands. Price
1 1998 27 9991
2 1997 17 9925
3 1998 28 10491
4 1998 5 10990
5 1997 38 9493
6 1997 36 9991
[1] "Correlation matrix"
Year Mileage..thousands. Price
Year 1.0000000 -0.7480982 0.9343679
Mileage..thousands. -0.7480982 1.0000000 -0.8113807
Price 0.9343679 -0.8113807 1.0000000
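Since the CSV file itself is not included here, a rough way to try the same call is to rebuild the six rows printed by head(data) by hand. This is only a sketch: the full data set evidently has more rows, so the coefficients computed from these six rows will not match the matrix shown above.

```r
# Rebuild only the six rows shown in the head() output above
data <- data.frame(
  Year = c(1998, 1997, 1998, 1998, 1997, 1997),
  Mileage..thousands. = c(27, 17, 28, 5, 38, 36),
  Price = c(9991, 9925, 10491, 10990, 9493, 9991)
)

# Correlation matrix of the three columns
cor_data <- cor(data)
print(cor_data)
```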
Computing Correlation Coefficients of Correlation Matrix in R
The rcorr() function from the Hmisc package generates the correlation coefficients and a table of p-values for all possible column pairs of a data frame. It computes the significance levels for Pearson and Spearman correlations.
Syntax: rcorr(x, type = c("pearson", "spearman"))
In order to run this function in R, we need to install the Hmisc package and load it into the environment. This can be done in the following way:
install.packages("Hmisc")
library("Hmisc")
The following code snippet indicates the computation of correlation coefficients in R:
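R
# Load the data set. The call and file name are truncated in the source:
# read.csv is assumed, and file_path is a placeholder for the CSV path.
data <- read.csv(file_path, header = TRUE, fileEncoding = "latin1")
print("Original Data")
head(data)
# install.packages("Hmisc")  # run once if the package is not installed
library("Hmisc")
# rcorr() expects a matrix and returns coefficients (r), counts (n) and p-values (P)
res <- rcorr(as.matrix(data))
print(res)
|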
Output:
[1] "Original Data"
Year Mileage..thousands. Price
1 1998 27 9991
2 1997 17 9925
3 1998 28 10491
4 1998 5 10990
5 1997 38 9493
6 1997 36 9991
                     Year Mileage..thousands. Price
Year                 1.00               -0.75  0.93
Mileage..thousands. -0.75                1.00 -0.81
Price                0.93               -0.81  1.00

n= 23

P
                    Year Mileage..thousands. Price
Year                     0                   0
Mileage..thousands. 0                        0
Price               0    0
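To see what the P table holds, a similar table of p-values can be sketched in base R by running cor.test() on each pair of columns. This is an illustration only, not how rcorr() is implemented; the mtcars columns are an arbitrary choice, and the diagonal is left as NA because a variable is not tested against itself.

```r
# Three illustrative columns of the built-in mtcars data set
vars <- mtcars[, c("mpg", "wt", "hp")]
k <- ncol(vars)

# Empty matrix of p-values; the diagonal stays NA
p_mat <- matrix(NA_real_, nrow = k, ncol = k,
                dimnames = list(names(vars), names(vars)))

# Test each unordered pair once and fill both triangles
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    p <- cor.test(vars[[i]], vars[[j]], method = "pearson")$p.value
    p_mat[i, j] <- p
    p_mat[j, i] <- p
  }
}

print(signif(p_mat, 3))
```

Very small p-values are rounded to 0 when printed with few digits, which is why the P table above shows 0 for every pair.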
Visualize a Correlation Matrix in R
In R, we shall use the “corrplot” package to implement a correlogram. Hence, to install the package from the R Console we should execute the following command:
install.packages("corrplot")
Once we have installed the package properly, we shall load the package in our R script using the library() function as follows:
library("corrplot")
We will use the corrplot() function and specify the shape through its method argument.
R
library(corrplot)
head(mtcars)
# Correlation matrix of the built-in mtcars data set
M <- cor(mtcars)
head(round(M, 2))
# Draw the correlogram with each shape in turn
corrplot(M, method = "circle")
corrplot(M, method = "pie")
corrplot(M, method = "color")
corrplot(M, method = "number")
|
Output:

Visualize Correlogram as a pie chart
R
corrplot(M, method = "pie")
|
Output:

Visualize Correlogram as colored rectangles
R
corrplot(M, method = "color")
|
Output:

Visualize Correlogram as numbers
R
corrplot(M, method = "number")
|
Output:

Visualize Correlogram as ellipses
R
corrplot(M, method = "ellipse")
|
Output:

Visualize Correlogram as shaded squares
R
corrplot(M, method = "shade")
|
Output:

You can choose the visualization method that best suits your needs or preferences. The corrplot package provides various customization options for each visualization method.
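As a rough sketch of such customization, the call below combines a few corrplot() arguments: type to show only the upper triangle, order to group similar variables by hierarchical clustering, addCoef.col to print the coefficients, and tl.col and tl.srt to style the labels. The particular values are arbitrary choices for illustration, and the requireNamespace() guard is added only so that the script still runs where the package is not installed.

```r
# Correlation matrix of the built-in mtcars data set
M <- cor(mtcars)

# Draw the plot only if corrplot is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(
    M,
    method = "color",        # colored cells
    type = "upper",          # show only the upper triangle
    order = "hclust",        # reorder variables by hierarchical clustering
    addCoef.col = "black",   # print the coefficient in each cell
    number.cex = 0.7,        # shrink the printed coefficients
    tl.col = "black",        # label color
    tl.srt = 45              # rotate the column labels
  )
}
```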