Decision Tree
- The (normalized) sum of the reductions of the splitting criterion (information gain, Gini index) brought by that feature. Also known as the Gini importance.
- The sklearn documentation states:
The importance of a feature is computed as the (normalized) total reduction of the criterion brought by that feature. It is also known as the Gini importance.
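A minimal sketch of reading this attribute, using a small synthetic dataset (the data and model settings here are illustrative assumptions, not from the original notes). Because the values are normalized, they sum to 1 over the features.

```python
# Sketch: Gini importance from a fitted sklearn decision tree.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data: 4 features, only 2 of them informative.
X, y = make_classification(n_samples=200, n_features=4,
                           n_informative=2, random_state=0)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# feature_importances_ is the normalized total criterion reduction
# per feature, so the entries sum to 1.
print(clf.feature_importances_)
print(clf.feature_importances_.sum())
```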
XGBoost
- Feature importance is only defined when a tree model is used as the base learner (booster=gbtree); it is undefined for other base learners such as the linear learner (booster=gblinear).
- Parameter importance_type (string, default "gain"); other accepted values are "weight", "cover", "total_gain", and "total_cover".
- weight: the number of times the feature is chosen as a splitting feature; gain: the average gain the feature brings across all splits that use it; total_gain: the total gain across all splits that use it; cover: the average coverage across all splits that use it (coverage: the number of samples affected by the split); total_cover: the total coverage across all splits that use it.
- xgboost Python API documentation: XGBoost Python API Reference
- Full xgboost parameter documentation on GitHub: XGBoost Parameters
- Quoted from the xgboost documentation:
"weight" is the number of times a feature appears in a tree
"gain" is the average gain of splits which use the feature
"cover" is the average coverage of splits which use the feature where coverage is defined as the number of samples affected by the split
LightGBM
- Parameter importance_type (string, optional (default='split'))
- split: the number of times the feature is used in the model; gain: equivalent to xgboost's total_gain.
- lightgbm Python API documentation: LightGBM Python API
- Quoted from the lightgbm documentation:
If 'split', result contains numbers of times the feature is used in a model. If 'gain', result contains total gains of splits which use the feature.