
Extract First Digit from Character Column in R Data Frame
If a character column in a data frame mixes letters and digits, and the first digit in each value carries meaning that helps the analysis, we can extract that digit on its own. For this purpose, we can use the stri_extract_first function from the stringi package. With regex = "\\d", it returns the first digit of each string as a character value, or NA if the string contains no digit.
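If you would rather not add a package dependency, base R can do the same job. The helper below, first_digit(), is a hypothetical name (not part of stringi or base R). It is a minimal sketch built on regexpr() and substr(), and like stri_extract_first() it returns NA for strings that contain no digit.

```r
# first_digit() is a hypothetical base-R helper, not a stringi function.
first_digit <- function(x) {
  # Position of the first digit in each string: -1 if none, NA if x is NA
  pos <- as.integer(regexpr("[0-9]", x))
  out <- rep(NA_character_, length(x))
  hit <- !is.na(pos) & pos > 0
  # Take the single character at the matched position
  out[hit] <- substr(x[hit], pos[hit], pos[hit])
  out
}

first_digit(c("HT23L", "UK5RT1", "abc123", "none"))
# [1] "2" "5" "1" NA
```

Keeping the result the same length as the input, with NA for non-matches, means it can be assigned straight back to a data frame column.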
Example 1
Consider the following data frame −
x1 <- 1:20
# sample() draws at random, so your rows will differ from the output below
y1 <- sample(c("HT23L", "HT14L", "HT32L"), 20, replace = TRUE)
df1 <- data.frame(x1, y1)
df1
Output
   x1    y1
1   1 HT14L
2   2 HT14L
3   3 HT23L
4   4 HT14L
5   5 HT32L
6   6 HT32L
7   7 HT14L
8   8 HT32L
9   9 HT32L
10 10 HT32L
11 11 HT23L
12 12 HT32L
13 13 HT14L
14 14 HT23L
15 15 HT14L
16 16 HT23L
17 17 HT23L
18 18 HT23L
19 19 HT23L
20 20 HT23L
Loading the stringi package and extracting the first digit in column y1 −
library(stringi)
stri_extract_first(df1$y1, regex = "\\d")
Output
 [1] "1" "1" "2" "1" "3" "3" "1" "3" "3" "3" "2" "3" "1" "2" "1" "2" "2" "2" "2"
[20] "2"
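The extracted digits come back as character values, so a typical next step is to convert them to numbers before analysis. The sketch below stores the first digit as an integer column and counts how often each digit occurs. It swaps in base R's sub() for stringi so it runs without extra packages, and it uses four fixed codes instead of sample() so the result is reproducible; the column name y_first is hypothetical.

```r
# Four fixed codes (an assumption) in place of the random sample()
df <- data.frame(x = 1:4, y = c("HT23L", "HT14L", "HT32L", "HT14L"))

# Skip leading non-digits, capture the first digit, discard the rest.
# A value with no digit would pass through sub() unchanged and become NA
# (with a coercion warning) in as.integer(), so this assumes every code has a digit.
df$y_first <- as.integer(sub("^\\D*(\\d).*$", "\\1", df$y))

df$y_first
# [1] 2 1 3 1

# Frequency of each leading digit
table(df$y_first)
```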
Example 2
x2 <- sample(c("India1RT1", "UK5RT1", "Egypt2PT4"), 20, replace = TRUE)
y2 <- rpois(20, 5)
df2 <- data.frame(x2, y2)
df2
Output
          x2 y2
1  India1RT1  2
2  India1RT1  8
3  India1RT1  7
4  India1RT1  6
5     UK5RT1  6
6  India1RT1  5
7     UK5RT1  6
8  India1RT1  6
9  India1RT1  7
10    UK5RT1 10
11 Egypt2PT4  8
12 Egypt2PT4  5
13 Egypt2PT4  7
14 India1RT1  2
15    UK5RT1  3
16 Egypt2PT4  5
17    UK5RT1  3
18 Egypt2PT4  6
19 Egypt2PT4  3
20    UK5RT1  5
Extracting the first digit in column x2 −
stri_extract_first(df2$x2, regex = "\\d")
Output
 [1] "1" "1" "1" "1" "5" "1" "5" "1" "1" "5" "2" "2" "2" "1" "5" "2" "5" "2" "2"
[20] "5"
Example 3
x3 <- sample(c("abc123", "dfe456"), 20, replace = TRUE)
y3 <- rnorm(20)
df3 <- data.frame(x3, y3)
df3
Output
       x3         y3
1  abc123  0.1027005
2  dfe456  0.2297002
3  dfe456 -0.1441151
4  dfe456  1.0510760
5  abc123  0.8182656
6  dfe456 -0.5018968
7  dfe456  0.2957634
8  abc123 -0.4240910
9  dfe456 -1.0700713
10 dfe456 -0.3374661
11 dfe456 -0.4654241
12 dfe456 -0.4542710
13 abc123  0.6969808
14 dfe456 -0.6514574
15 abc123  0.2258769
16 dfe456 -0.5348958
17 abc123  0.6629195
18 dfe456  1.0998636
19 dfe456 -1.3147809
20 dfe456 -2.3015384
Extracting the first digit in column x3 −
stri_extract_first(df3$x3, regex = "\\d")
Output
 [1] "1" "4" "4" "4" "1" "4" "4" "1" "4" "4" "4" "4" "1" "4" "1" "4" "1" "4" "4"
[20] "4"
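Once extracted, the first digit can serve as a grouping variable. As a sketch, the code below copies the first five rows of the third example's output as fixed values (that output came from a random run). It extracts the digit with base R's regexpr() and regmatches() instead of stringi, then averages y3 within each digit group using tapply(). The column name d is hypothetical.

```r
# First five rows of the third example's output, copied as fixed values
df3 <- data.frame(
  x3 = c("abc123", "dfe456", "dfe456", "dfe456", "abc123"),
  y3 = c(0.1027005, 0.2297002, -0.1441151, 1.0510760, 0.8182656)
)

# regexpr() locates the first digit and regmatches() pulls it out.
# Caution: regmatches() silently drops strings with no match, so the result
# would be shorter than the column if any value lacked a digit
# (every value here has one).
df3$d <- regmatches(df3$x3, regexpr("[0-9]", df3$x3))

# Mean of y3 within each first-digit group
group_means <- tapply(df3$y3, df3$d, mean)
group_means
```

The dropped-match behaviour of regmatches() is one practical reason to prefer stri_extract_first(), which keeps the output aligned with the input by returning NA.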