15.4 Feature selection and reduction
Recall that our df DataFrame has 5 features/columns/dimensions. Do we need all these features for our analysis, or are any of them redundant?
In our df DataFrame, we do not need both the F and M columns. When an entry in one column is 0, the corresponding entry in the other is 1, and vice versa. This relationship is easy to see, but other, more subtle relationships may exist among columns.
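A quick numeric check of this complementary relationship might look like the following sketch. It assumes a small, made-up stand-in for df containing only F and M, since the chapter's actual data is not shown here; with real data you would run the same check on df itself.

```python
import pandas as pd

# Hypothetical stand-in for the chapter's df: only the F and M columns,
# filled with made-up 0/1 values for illustration.
df = pd.DataFrame({"F": [1, 0, 0, 1, 0], "M": [0, 1, 1, 0, 1]})

# If F and M are complementary indicators, every row sums to exactly 1.
complementary = bool((df["F"] + df["M"] == 1).all())
print(complementary)

# M then carries no information beyond F, so one column can be dropped.
reduced = df.drop(columns="M")
print(list(reduced.columns))
```

Dropping M rather than F is an arbitrary choice here; either column alone determines the other.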
For example, if a, b, and c are floating-point numbers and X, Y, and Z are columns, we might have z = a x + b y + c for every row, where x, y, and z are that row's values in X, Y, and Z. In column notation, Z = a X + b Y + c.
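One way to detect such a linear relationship numerically is sketched below with NumPy. It is an illustration under stated assumptions, not the chapter's method. The data is synthetic, and the coefficients a = 2.0, b = -0.5, and c = 3.0 are arbitrary values chosen so that Z is built exactly from X and Y. Appending a column of ones turns the constant c into an ordinary coefficient. A least-squares fit then recovers a, b, and c, and the matrix of all four columns has rank 3 instead of 4, which signals that one column is redundant.

```python
import numpy as np

# Synthetic data: X and Y are random; Z is constructed to satisfy
# Z = a X + b Y + c exactly, with illustrative coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
Y = rng.normal(size=100)
a, b, c = 2.0, -0.5, 3.0
Z = a * X + b * Y + c

# A column of ones lets least squares estimate the constant term c.
A = np.column_stack([X, Y, np.ones_like(X)])
coeffs, residuals, rank, _ = np.linalg.lstsq(A, Z, rcond=None)
print(coeffs)  # approximately a, b, c

# With Z included, the four columns span only a 3-dimensional space,
# so the rank is 3 rather than 4: Z adds no new information.
full = np.column_stack([X, Y, Z, np.ones_like(X)])
print(np.linalg.matrix_rank(full))
```

With real data, the relationship is rarely exact, so in practice you would look for a small residual or a very small singular value rather than an exact rank drop.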
Exercise 15.11
Show that F = -M + 1.
Exercise 15.12
Interpret this correlation coefficient matrix for the F and M columns:
df[['F', 'M']].corr(method="pearson")
F M
F...