
Extract String Vector Elements Up to a Fixed Number of Characters in R
To extract the elements of a string vector up to a fixed number of characters in R, we can use the substring function of base R.
For example, if we have a vector of strings, say X, that contains 100 string values and we want the first five characters of each value, we can use the command given below:
substring(X,1,5)
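As a minimal sketch of how this call behaves at the edges, the snippet below uses a small illustrative vector (the values in X here are assumptions made for the example, not the 100 values mentioned above). It shows that values shorter than the limit come back unchanged and that empty strings and missing values pass through as they are.

```r
# Illustrative vector: long values, short values, an empty string and an NA
X <- c("Alabama", "Guam", "Ohio", "", NA)

# Keep at most the first five characters of each element
first5 <- substring(X, 1, 5)
print(first5)
# [1] "Alaba" "Guam"  "Ohio"  ""      NA
```

Because the result always has the same length as the input, it can be stored next to the original values without any realignment.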
Example 1
The following snippet creates a sample character vector:
x1<-c("Alabama", "Alaska", "American Samoa", "Arizona", "Arkansas",
  "California", "Colorado", "Connecticut", "Delaware", "District of Columbia",
  "Florida", "Georgia", "Guam", "Hawaii", "Idaho", "Illinois", "Indiana",
  "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
  "Massachusetts", "Michigan", "Minnesota", "Minor Outlying Islands",
  "Mississippi", "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire",
  "New Jersey", "New Mexico", "New York", "North Carolina", "North Dakota",
  "Northern Mariana Islands", "Ohio", "Oklahoma", "Oregon", "Pennsylvania",
  "Puerto Rico", "Rhode Island", "South Carolina", "South Dakota", "Tennessee",
  "Texas", "U.S. Virgin Islands", "Utah", "Vermont", "Virginia", "Washington",
  "West Virginia", "Wisconsin", "Wyoming")
x1
The following vector is created:
 [1] "Alabama" "Alaska"
 [3] "American Samoa" "Arizona"
 [5] "Arkansas" "California"
 [7] "Colorado" "Connecticut"
 [9] "Delaware" "District of Columbia"
[11] "Florida" "Georgia"
[13] "Guam" "Hawaii"
[15] "Idaho" "Illinois"
[17] "Indiana" "Iowa"
[19] "Kansas" "Kentucky"
[21] "Louisiana" "Maine"
[23] "Maryland" "Massachusetts"
[25] "Michigan" "Minnesota"
[27] "Minor Outlying Islands" "Mississippi"
[29] "Missouri" "Montana"
[31] "Nebraska" "Nevada"
[33] "New Hampshire" "New Jersey"
[35] "New Mexico" "New York"
[37] "North Carolina" "North Dakota"
[39] "Northern Mariana Islands" "Ohio"
[41] "Oklahoma" "Oregon"
[43] "Pennsylvania" "Puerto Rico"
[45] "Rhode Island" "South Carolina"
[47] "South Dakota" "Tennessee"
[49] "Texas" "U.S. Virgin Islands"
[51] "Utah" "Vermont"
[53] "Virginia" "Washington"
[55] "West Virginia" "Wisconsin"
[57] "Wyoming"
To find the first two characters of each value in x1, add the following code to the above snippet:
substring(x1,1,2)
Output
If you execute all the above snippets as a single program, it generates the following output:
 [1] "Al" "Al" "Am" "Ar" "Ar" "Ca" "Co" "Co" "De" "Di" "Fl" "Ge" "Gu" "Ha" "Id"
[16] "Il" "In" "Io" "Ka" "Ke" "Lo" "Ma" "Ma" "Ma" "Mi" "Mi" "Mi" "Mi" "Mi" "Mo"
[31] "Ne" "Ne" "Ne" "Ne" "Ne" "Ne" "No" "No" "No" "Oh" "Ok" "Or" "Pe" "Pu" "Rh"
[46] "So" "So" "Te" "Te" "U." "Ut" "Ve" "Vi" "Wa" "We" "Wi" "Wy"
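Base R offers two close relatives of substring that may be worth knowing for the same task: substr and strtrim. The sketch below compares them on a short illustrative vector (the name states and its values are assumptions for this example). For plain ASCII text like this the three calls agree, although strtrim counts display width rather than characters, so results can differ for wide characters.

```r
# Small illustrative vector (not the full x1 from Example 1)
states <- c("Alabama", "Guam", "U.S. Virgin Islands")

with_substring <- substring(states, 1, 2)  # first = 1, last = 2
with_substr    <- substr(states, 1, 2)     # start = 1, stop = 2
with_strtrim   <- strtrim(states, 2)       # trim to a display width of 2

print(with_substring)
# [1] "Al" "Gu" "U."

# For ASCII input all three approaches give the same result
print(identical(with_substring, with_substr))   # TRUE
print(identical(with_substring, with_strtrim))  # TRUE
```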
Example 2
The following snippet creates a sample character vector:
x2<-c("Austria", "Belgium", "Bulgaria", "Croatia", "Cyprus", "Czechia",
  "Denmark", "Estonia", "Finland", "France", "Germany", "Greece", "Hungary",
  "Ireland", "Italy", "Latvia", "Lithuania", "Luxembourg", "Malta",
  "Netherlands", "Poland", "Portugal", "Romania", "Slovakia", "Slovenia",
  "Spain", "Sweden")
x2
The following vector is created:
 [1] "Austria" "Belgium" "Bulgaria" "Croatia" "Cyprus"
 [6] "Czechia" "Denmark" "Estonia" "Finland" "France"
[11] "Germany" "Greece" "Hungary" "Ireland" "Italy"
[16] "Latvia" "Lithuania" "Luxembourg" "Malta" "Netherlands"
[21] "Poland" "Portugal" "Romania" "Slovakia" "Slovenia"
[26] "Spain" "Sweden"
To find the first two characters of each value in x2, add the following code to the above snippet:
substring(x2,1,2)
Output
If you execute all the above snippets as a single program, it generates the following output:
 [1] "Au" "Be" "Bu" "Cr" "Cy" "Cz" "De" "Es" "Fi" "Fr" "Ge" "Gr" "Hu" "Ir" "It"
[16] "La" "Li" "Lu" "Ma" "Ne" "Po" "Po" "Ro" "Sl" "Sl" "Sp" "Sw"
Example 3
The following snippet creates a sample character vector:
x3<-c("Cuba", "Cyprus", "Czech Republic", "Djibouti", "Dominica",
  "Dominican Republic", "East Timor", "Ecuador", "Egypt", "El Salvador",
  "Equatorial Guinea", "Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland",
  "France", "Metropolitan", "French Guiana", "Gambia", "Georgia", "Germany",
  "Ghana", "Greenland", "Grenada", "Guatemala", "Honduras", "Hong Kong",
  "Hungary", "Iceland", "India", "Indonesia", "Iran", "Iraq", "Ireland",
  "Israel", "Italy", "Jamaica", "Japan", "Jordan", "Kazakhstan", "Kenya",
  "Mozambique", "Namibia", "Nepal", "Netherlands", "Nigeria", "Norway",
  "Oman", "Paraguay", "Peru", "Philippines")
x3
The following vector is created:
 [1] "Cuba" "Cyprus" "Czech Republic"
 [4] "Djibouti" "Dominica" "Dominican Republic"
 [7] "East Timor" "Ecuador" "Egypt"
[10] "El Salvador" "Equatorial Guinea" "Eritrea"
[13] "Estonia" "Ethiopia" "Fiji"
[16] "Finland" "France" "Metropolitan"
[19] "French Guiana" "Gambia" "Georgia"
[22] "Germany" "Ghana" "Greenland"
[25] "Grenada" "Guatemala" "Honduras"
[28] "Hong Kong" "Hungary" "Iceland"
[31] "India" "Indonesia" "Iran"
[34] "Iraq" "Ireland" "Israel"
[37] "Italy" "Jamaica" "Japan"
[40] "Jordan" "Kazakhstan" "Kenya"
[43] "Mozambique" "Namibia" "Nepal"
[46] "Netherlands" "Nigeria" "Norway"
[49] "Oman" "Paraguay" "Peru"
[52] "Philippines"
To find the first two characters of each value in x3, add the following code to the above snippet:
substring(x3,1,2)
Output
If you execute all the above snippets as a single program, it generates the following output:
 [1] "Cu" "Cy" "Cz" "Dj" "Do" "Do" "Ea" "Ec" "Eg" "El" "Eq" "Er" "Es" "Et" "Fi"
[16] "Fi" "Fr" "Me" "Fr" "Ga" "Ge" "Ge" "Gh" "Gr" "Gr" "Gu" "Ho" "Ho" "Hu" "Ic"
[31] "In" "In" "Ir" "Ir" "Ir" "Is" "It" "Ja" "Ja" "Jo" "Ka" "Ke" "Mo" "Na" "Ne"
[46] "Ne" "Ni" "No" "Om" "Pa" "Pe" "Ph"
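The same call should also work when the strings are stored in a data frame column rather than a standalone vector. The following is only a sketch under that assumption: the data frame df and the column names country and prefix are hypothetical and chosen for illustration.

```r
# Hypothetical data frame with one character column
df <- data.frame(
  country = c("Cuba", "Cyprus", "Fiji"),
  stringsAsFactors = FALSE
)

# Store the first two characters of each value in a new column
df$prefix <- substring(df$country, 1, 2)
print(df$prefix)
# [1] "Cu" "Cy" "Fi"
```

Setting stringsAsFactors = FALSE keeps the column as character in older R versions. substring would convert a factor to character anyway, but being explicit avoids surprises elsewhere.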