
Find Correlation Matrix in R Using All Variables of a Data Frame
A correlation matrix helps us determine the direction and strength of the linear relationship between every pair of variables at once. This makes it easier to decide which variables to use in a linear model and which ones could be dropped. We can find the correlation matrix by passing the data frame to the cor function, which expects numeric columns and computes Pearson correlations by default.
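When a data frame also holds non-numeric columns, cor cannot be applied to it directly. The following is a minimal sketch of one way to handle that, assuming the built-in iris dataset as a stand-in (it is not part of the original example); the names is_num and cor_iris are illustrative only.

```r
# Sketch: keep only the numeric columns before calling cor().
# iris is a built-in R dataset used purely for illustration; its
# Species column is a factor, so cor(iris) would stop with an error.
is_num <- sapply(iris, is.numeric)   # logical vector, one entry per column
cor_iris <- cor(iris[, is_num])      # Pearson correlations of the numeric columns
round(cor_iris, 2)
```

The result is a symmetric matrix with ones on the diagonal, one row and one column per numeric variable.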
Example
Consider the following data frame of continuous variables −
> set.seed(9)
> x1<-rnorm(20)
> x2<-rnorm(20,0.2)
> x3<-rnorm(20,0.5)
> x4<-rnorm(20,0.8)
> x5<-rnorm(20,1)
> df<-data.frame(x1,x2,x3,x4,x5)
> df
            x1          x2          x3           x4          x5
1  -0.76679604  1.95699294 -0.30845634  1.081222227  1.11407587
2  -0.81645834  0.38225214 -1.51938169 -0.402708626 -0.05365988
3  -0.14153519 -0.06688875 -0.23872407  1.265163691  1.15599915
4  -0.27760503  1.12642163  0.88288656  1.152016386  2.30039421
5   0.43630690 -0.49333188  2.23086367  0.210143783 -0.15588645
6  -1.18687252  2.88199007  0.29691805 -0.053599959  1.21604185
7   1.19198691  0.42252448 -0.49639735  0.553267880  1.80447819
8  -0.01819034 -0.50667241 -0.80653629  2.339338571  0.26788427
9  -0.24808460  0.61721325 -0.49783160  1.346077684 -0.61809812
10 -0.36293689  0.56955678 -0.06502873  2.364961851  1.83906927
11  1.27757055 -0.71376435  2.25205784  1.049670178  0.64856205
12 -0.46889715 -0.11691475 -0.04777135 -1.162418630  0.28371561
13  0.07105410  1.24905921 -0.35852571 -0.009060223  0.05970815
14 -0.26603845  0.36811181  0.54929453  0.301314912  1.73016571
15  1.84525720  0.23144021  0.29995552  1.105121769  0.56212952
16 -0.83944966 -0.81033054 -0.60395445  0.510792758  0.75061790
17 -0.07744806  0.58275153  0.74058804  2.257714201  0.32792906
18 -2.61770553 -0.61969653  0.88111362  1.673755484  1.80101407
19  0.88788403  0.56171109  2.73045895 -0.152956042 -0.48886193
20 -0.70749145  0.29337136  1.69920239  0.768324524  1.45401160
Finding the correlation matrix for all variables in df −
> cor(df)
            x1         x2          x3          x4          x5
x1  1.00000000 -0.1332350  0.25115920 -0.04210749 -0.28891754
x2 -0.13323501  1.0000000 -0.15071432 -0.15398933  0.14759671
x3  0.25115920 -0.1507143  1.00000000 -0.05268172 -0.02505888
x4 -0.04210749 -0.1539893 -0.05268172  1.00000000  0.27861734
x5 -0.28891754  0.1475967 -0.02505888  0.27861734  1.00000000
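To decide which variables are most strongly related, it can help to list the pairs in order of absolute correlation rather than scanning the matrix by eye. The sketch below rebuilds df with the same calls as above and ranks each unique pair; the names cm, pair_cor, var1, var2 and r are illustrative choices, not part of the original example.

```r
# Rebuild the example data frame with the same calls as in the article.
set.seed(9)
x1 <- rnorm(20)
x2 <- rnorm(20, 0.2)
x3 <- rnorm(20, 0.5)
x4 <- rnorm(20, 0.8)
x5 <- rnorm(20, 1)
df <- data.frame(x1, x2, x3, x4, x5)

cm <- cor(df)

# Keep each pair once: blank out the diagonal and the lower triangle.
cm[lower.tri(cm, diag = TRUE)] <- NA

# Convert to long form; as.table() carries the row and column names along.
pair_cor <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(pair_cor) <- c("var1", "var2", "r")   # illustrative column names
pair_cor <- pair_cor[!is.na(pair_cor$r), ]

# Sort by absolute correlation, strongest pair first.
pair_cor <- pair_cor[order(-abs(pair_cor$r)), ]
head(pair_cor, 3)
```

With five variables this gives ten unique pairs. Sorting on the absolute value treats strong negative and strong positive relationships alike.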
Now consider a data frame of discrete count variables drawn from Poisson distributions −
> a1<-rpois(20,2)
> a2<-rpois(20,5)
> a3<-rpois(20,8)
> a4<-rpois(20,10)
> a5<-rpois(20,15)
> df_new<-data.frame(a1,a2,a3,a4,a5)
> df_new
   a1 a2 a3 a4 a5
1   2  8  9  5 13
2   1  4  7 11 16
3   2  2  5 12 11
4   1  3 12  9 15
5   1  4  8  4 14
6   0  6  9  8 14
7   2  6 12 10  9
8   7  5 13 11 20
9   0  6  6 13 19
10  4  7 10  8 12
11  0  3 14  8 20
12  3  2 10 15 13
13  2  8  7 12 14
14  2  6 10 11 14
15  2  1  5 10 21
16  2  3 12 10 14
17  3  6  7  9 17
18  0  7  6 14 16
19  2  6  6  9 15
20  2  3  7  8 12
Finding the correlation matrix for all variables in df_new −
> cor(df_new)
            a1          a2          a3          a4           a5
a1 1.000000000  0.02485671  0.26409706  0.05617819  0.009229284
a2 0.024856710  1.00000000 -0.04540504 -0.10727065 -0.184062998
a3 0.264097059 -0.04540504  1.00000000 -0.17991092 -0.013487095
a4 0.056178192 -0.10727065 -0.17991092  1.00000000  0.115063107
a5 0.009229284 -0.18406300 -0.01348709  0.11506311  1.000000000
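For count data such as this, a rank-based measure is sometimes preferred over the default Pearson coefficient. As a rough sketch, cor accepts a method argument for this purpose. The seed and the data frame name counts below are assumptions made for illustration, so the values will not match df_new above.

```r
# Hypothetical count data, generated only for this sketch.
set.seed(123)   # assumed seed, not taken from the original example
counts <- data.frame(a1 = rpois(20, 2),
                     a2 = rpois(20, 5),
                     a3 = rpois(20, 8))

pearson  <- cor(counts)                        # default: method = "pearson"
spearman <- cor(counts, method = "spearman")   # rank-based alternative

# Show how much the two measures differ for each pair.
round(spearman - pearson, 3)
```

Spearman correlation works on ranks, so it is less sensitive to skewness and outliers, at the cost of measuring monotonic rather than strictly linear association.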