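Here is how the help documentation describes the method:
predict_proba(X)
Probability estimates.
The returned estimates for all classes are ordered by the label of classes.
For a multi_class problem, if multi_class is set to be “multinomial” the softmax function is used to find the predicted probability of each class. Else use a one-vs-rest approach, i.e calculate the probability of each class assuming it to be positive using the logistic function. and normalize these values across all the classes.
In other words, the method returns a probability estimate for every class, with the columns ordered by class label. For a multiclass problem, if the multi_class parameter is set to "multinomial", the softmax function produces the predicted probability of each class. Otherwise the OvR (one-vs-rest) approach is used: each class in turn is treated as the only positive class, its probability is computed with the sigmoid function, and the resulting probabilities are then normalized across all classes.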
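To make the difference between the two normalizations concrete, here is a minimal sketch in plain Python. The decision scores are hypothetical numbers chosen for illustration, and the helper names (sigmoid, softmax_proba, ovr_proba) are not part of scikit-learn. In a real model the "multinomial" and OvR fits would also learn different scores; the same scores are reused here only to isolate the normalization step.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax_proba(scores):
    """Softmax over per-class scores (the "multinomial" style)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ovr_proba(scores):
    """OvR style: sigmoid per class, then normalize so the row sums to 1."""
    probs = [sigmoid(s) for s in scores]
    total = sum(probs)
    return [p / total for p in probs]


# Hypothetical decision scores for one sample and three classes
scores = [2.0, 0.5, -3.0]

p_softmax = softmax_proba(scores)
p_ovr = ovr_proba(scores)

print("softmax:", p_softmax)
print("ovr:    ", p_ovr)
```

Both results sum to 1 and rank the classes in the same order, but the individual probabilities differ because the sigmoid scores each class independently before the normalization.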
We use the load_iris dataset as the example here; it is a three-class dataset.
from sklearn.datasets import load_iris
import numpy as np
X,y = load_iris(return_X_y=True)
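Note that the multi_class parameter of LogisticRegression has no 'ovo' option.
We start with the 'ovr' case and build a model for the multiclass problem.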
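from sklearn.linear_model import LogisticRegression
# Fit a one-vs-rest model; max_iter is raised above the default of 100
clf = LogisticRegression(max_iter=1000, multi_class='ovr').fit(X, y)
# Predicted class probabilities for the first five samples
clf.predict_proba(X[:5, :])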
---
array([[8.96807569e-01, 1.03191359e-01, 1.07219602e-06],
[7.78979389e-01, 2.21019299e-01, 1.31168933e-06],
[8.34864184e-01, 1.65134802e-01, 1.01485082e-06],
[7.90001986e-01, 2.09996107e-01, 1.90723705e-06],
[9.12050403e-01, 8.79485212e-02, 1.07537143e-06]])
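The OvR recipe from the documentation can be checked by hand: take the per-class decision scores, pass them through the sigmoid, and divide each row by its sum. The sketch below does this under one assumption that differs from the code above: it wraps LogisticRegression in OneVsRestClassifier from sklearn.multiclass instead of passing multi_class='ovr', since that parameter is deprecated in recent scikit-learn releases. The fitted coefficients, and therefore the printed numbers, are not guaranteed to match the array above exactly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary logistic regression per class (swapped in for multi_class='ovr')
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

# Raw decision scores, one column per class: shape (5, 3)
scores = ovr.decision_function(X[:5])

# Sigmoid per class, then normalize each row so it sums to 1
sig = 1.0 / (1.0 + np.exp(-scores))
manual = sig / sig.sum(axis=1, keepdims=True)

# Probabilities reported by the classifier itself
reported = ovr.predict_proba(X[:5])

print(np.allclose(manual, reported))
```

If the two arrays agree, the reported probabilities are just the normalized sigmoid outputs of the three binary models.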
Sum along each row:
clf.predict_proba(X[:5,:]