14.6 Converting categorical data
The Gender column contains categorical data rather
than numeric data. The items in the column belong to a fixed set of values, which are usually
strings. In this case, the values are 'F' and 'M'. While we can check
whether an item is equal to one of these values, it is often easier to convert the categorical column to
multiple numeric “dummy columns”, one per category, each containing 1 where the row belongs to that category and 0 otherwise.
Here are the first two rows of df:
df.head(2)
          Locality  Postcode  Breed  Colour Gender
0  DANDENONG NORTH      3175  DOMSH     TAB      F
1  DANDENONG NORTH      3175  DOMLH  BLAWHI      M
and this is what we get when we use get_dummies on the
Gender column:
pd.get_dummies(df, columns=["Gender"]).head(2)
          Locality  Postcode  Breed  Colour  Gender_F  Gender_M
0  DANDENONG NORTH      3175  DOMSH     TAB         1         0
1  DANDENONG NORTH      3175  DOMLH ...
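
To try this without the full dataset, here is a minimal self-contained sketch. It rebuilds only the two rows shown above into a small stand-in DataFrame (the variable name dummies is our own choice), so it illustrates the call rather than reproducing the whole analysis. One assumption worth flagging: from pandas 2.0 onward, get_dummies returns True/False columns by default, so we pass dtype=int to get the 0/1 values shown in this section.

```python
import pandas as pd

# Stand-in frame containing only the two rows displayed above
df = pd.DataFrame({
    "Locality": ["DANDENONG NORTH", "DANDENONG NORTH"],
    "Postcode": [3175, 3175],
    "Breed": ["DOMSH", "DOMLH"],
    "Colour": ["TAB", "BLAWHI"],
    "Gender": ["F", "M"],
})

# Replace the Gender column with one indicator column per category.
# dtype=int requests 0/1 integers; pandas 2.0+ would otherwise return booleans.
dummies = pd.get_dummies(df, columns=["Gender"], dtype=int)

# The original Gender column is gone; Gender_F and Gender_M are appended
print(dummies.columns.tolist())
print(dummies[["Gender_F", "Gender_M"]])
```

Each row has a 1 in exactly one of the new columns, so a condition such as dummies["Gender_F"] == 1 selects the same rows as df["Gender"] == "F".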