
Detect Multicollinearity in Categorical Variables Using R
Multicollinearity is a term that applies to numerical variables. It means that the independent variables in a model are numerical and linearly correlated with each other. Categorical variables are either ordinal or nominal, so we cannot say that they are linearly correlated.
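As a minimal sketch of why linear correlation does not apply here, the snippet below passes two character vectors to base R's cor(). The vectors and the name result are made up for illustration. cor() expects numeric input, so the call is wrapped in tryCatch() to capture the error message instead of stopping the script.

```r
# Illustrative vectors (assumed for this sketch, not taken from the article's data)
x <- c("A", "B", "C", "D", "A", "B")
y <- c("a", "b", "c", "d", "b", "a")

# cor() only accepts numeric (or logical) input, so this call fails;
# tryCatch() returns the error message as a string instead of aborting
result <- tryCatch(cor(x, y), error = function(e) conditionMessage(e))
print(result)
```

The captured message shows that a Pearson-style correlation is not defined for nominal labels, which is why a different measure of association is needed for categorical columns.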
Example
Consider the following data frame −
x<-sample(LETTERS[1:4],30,replace=TRUE)
y<-sample(letters[1:4],30,replace=TRUE)
response<-rnorm(30)
df<-data.frame(x,y,response)
df
Output
   x y     response
1  C c  0.742577646
2  C b  0.151037885
3  A d  0.872867986
4  D c  1.668988206
5  C a -0.310929854
6  B b -0.582732624
7  A a -1.189979792
8  A d  0.869424789
9  B c  1.321981265
10 A c -0.378250113
11 B b  1.077948111
12 D b -1.166599657
13 A b  1.218434700
14 B b -0.938781129
15 B a  0.393036330
16 D a  0.031261588
17 B c -0.926288814
18 D b  0.807480575
19 A d  2.056935369
20 B c  0.464491514
21 B d  0.466033703
22 D b  0.236794674
23 D b  0.761648127
24 C b -0.438568617
25 D c -1.806599022
26 B c  0.885648179
27 A b -0.830359221
28 A b  0.545703187
29 D d  0.007146744
30 C a -0.243890913
Have a look at the categorical columns x and y and think about how we can measure the association between them, given that linear correlation does not apply.
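One common way to check whether two categorical columns are associated is Pearson's chi-squared test of independence on their contingency table. The sketch below is an illustration under stated assumptions, not necessarily the method this article goes on to use. It rebuilds a data frame of the same shape with a fixed seed (an assumption, so the values differ from the output above) and simulates the p-value because 30 rows spread over 16 cells give small expected counts. The names tab, test and cramers_v are introduced here for illustration.

```r
set.seed(123)  # assumption: fixed seed so the sketch is reproducible

# Same construction as the article's data frame
x <- sample(LETTERS[1:4], 30, replace = TRUE)
y <- sample(letters[1:4], 30, replace = TRUE)
response <- rnorm(30)
df <- data.frame(x, y, response)

# Contingency table of the two categorical columns
tab <- table(df$x, df$y)

# Chi-squared test of independence; the p-value is simulated because
# many expected cell counts are below 5 with only 30 observations
test <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
print(test)

# Cramer's V rescales the chi-squared statistic to the range [0, 1]
n <- sum(tab)
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))
print(cramers_v)
```

A large p-value suggests no evidence of association between x and y, which is expected here because both columns are sampled independently. A small p-value together with a Cramér's V close to 1 would indicate that the two categorical predictors carry overlapping information.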