Commit 78052f9

Pushing the docs to dev/ for branch: main, commit 34db65a3addfc83d99e64ec55d5e6896ecfbb940
1 parent 49d1599 commit 78052f9

2,351 files changed (+1,303,951 −319,249 lines)


dev/.buildinfo

+1 −1

@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 9b326f5a24b70ee04933e1c4cba406d7
+config: 0eeb1325ca94d95b00a60ff91bce101a
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/0785ea6d45bde062e5beedda88131215/plot_release_highlights_1_3_0.ipynb

+2 −2

@@ -72,7 +72,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## New display `model_selection.ValidationCurveDisplay`\n:class:`model_selection.ValidationCurveDisplay` is now available to plot results\nfrom :func:`model_selection.validation_curve`.\n\n"
+"## New display :class:`~model_selection.ValidationCurveDisplay`\n:class:`model_selection.ValidationCurveDisplay` is now available to plot results\nfrom :func:`model_selection.validation_curve`.\n\n"
 ]
 },
 {
@@ -108,7 +108,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Grouping infrequent categories in :class:`preprocessing.OrdinalEncoder`\nSimilarly to :class:`preprocessing.OneHotEncoder`, the class\n:class:`preprocessing.OrdinalEncoder` now supports aggregating infrequent categories\ninto a single output for each feature. The parameters to enable the gathering of\ninfrequent categories are `min_frequency` and `max_categories`.\nSee the `User Guide <encoder_infrequent_categories>` for more details.\n\n"
+"## Grouping infrequent categories in :class:`~preprocessing.OrdinalEncoder`\nSimilarly to :class:`preprocessing.OneHotEncoder`, the class\n:class:`preprocessing.OrdinalEncoder` now supports aggregating infrequent categories\ninto a single output for each feature. The parameters to enable the gathering of\ninfrequent categories are `min_frequency` and `max_categories`.\nSee the `User Guide <encoder_infrequent_categories>` for more details.\n\n"
 ]
 },
 {
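
As context for this hunk: a minimal sketch of the display the new cross-reference points to. Illustrative only, not part of the commit; the toy dataset and the `C` grid are assumptions.

import numpy as np

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ValidationCurveDisplay

# Plot train/validation scores as a function of the regularization strength C.
X, y = make_classification(n_samples=500, random_state=0)
ValidationCurveDisplay.from_estimator(
    LogisticRegression(), X, y, param_name="C", param_range=np.logspace(-3, 2, 6)
)
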
Binary file not shown.

dev/_downloads/133f2198d3ab792c75b39a63b0a99872/plot_cost_sensitive_learning.ipynb

+1 −1

@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Post-tuning the decision threshold for cost-sensitive learning\n\nOnce a classifier is trained, the output of the :term:`predict` method outputs class\nlabel predictions corresponding to a thresholding of either the :term:`decision\nfunction` or the :term:`predict_proba` output. For a binary classifier, the default\nthreshold is defined as a posterior probability estimate of 0.5 or a decision score of\n0.0.\n\nHowever, this default strategy is most likely not optimal for the task at hand.\nHere, we use the \"Statlog\" German credit dataset [1]_ to illustrate a use case.\nIn this dataset, the task is to predict whether a person has a \"good\" or \"bad\" credit.\nIn addition, a cost-matrix is provided that specifies the cost of\nmisclassification. Specifically, misclassifying a \"bad\" credit as \"good\" is five\ntimes more costly on average than misclassifying a \"good\" credit as \"bad\".\n\nWe use the :class:`~sklearn.model_selection.TunedThresholdClassifierCV` to select the\ncut-off point of the decision function that minimizes the provided business\ncost.\n\nIn the second part of the example, we further extend this approach by\nconsidering the problem of fraud detection in credit card transactions: in this\ncase, the business metric depends on the amount of each individual transaction.\n.. topic:: References\n\n .. [1] \"Statlog (German Credit Data) Data Set\", UCI Machine Learning Repository,\n [Link](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29).\n\n .. [2] [Charles Elkan, \"The Foundations of Cost-Sensitive Learning\",\n International joint conference on artificial intelligence.\n Vol. 17. No. 1. Lawrence Erlbaum Associates Ltd, 2001.](https://cseweb.ucsd.edu/~elkan/rescale.pdf)\n"
+"\n# Post-tuning the decision threshold for cost-sensitive learning\n\nOnce a classifier is trained, the output of the :term:`predict` method outputs class\nlabel predictions corresponding to a thresholding of either the\n:term:`decision_function` or the :term:`predict_proba` output. For a binary classifier,\nthe default threshold is defined as a posterior probability estimate of 0.5 or a\ndecision score of 0.0.\n\nHowever, this default strategy is most likely not optimal for the task at hand.\nHere, we use the \"Statlog\" German credit dataset [1]_ to illustrate a use case.\nIn this dataset, the task is to predict whether a person has a \"good\" or \"bad\" credit.\nIn addition, a cost-matrix is provided that specifies the cost of\nmisclassification. Specifically, misclassifying a \"bad\" credit as \"good\" is five\ntimes more costly on average than misclassifying a \"good\" credit as \"bad\".\n\nWe use the :class:`~sklearn.model_selection.TunedThresholdClassifierCV` to select the\ncut-off point of the decision function that minimizes the provided business\ncost.\n\nIn the second part of the example, we further extend this approach by\nconsidering the problem of fraud detection in credit card transactions: in this\ncase, the business metric depends on the amount of each individual transaction.\n\n.. rubric :: References\n\n.. [1] \"Statlog (German Credit Data) Data Set\", UCI Machine Learning Repository,\n    [Link](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29).\n\n.. [2] [Charles Elkan, \"The Foundations of Cost-Sensitive Learning\",\n    International joint conference on artificial intelligence.\n    Vol. 17. No. 1. Lawrence Erlbaum Associates Ltd, 2001.](https://cseweb.ucsd.edu/~elkan/rescale.pdf)\n"
 ]
 },
 {
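
As background for the rewritten docstring, a hedged sketch of the class it documents. The 5:1 cost ratio mirrors the text; the synthetic data and the scorer are illustrative stand-ins, not the example's actual code.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import TunedThresholdClassifierCV

X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=0)

def negated_cost(y_true, y_pred):
    # Missing a positive costs 5, a false alarm costs 1 (illustrative ratio).
    # Returning the negated cost lets "greater is better" scoring apply.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return -(5 * fn + 1 * fp)

model = TunedThresholdClassifierCV(
    LogisticRegression(), scoring=make_scorer(negated_cost)
)
model.fit(X, y)
print(f"tuned cut-off: {model.best_threshold_:.3f}")
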

dev/_downloads/1b8827af01c9a70017a4739bcf2e21a8/plot_gpr_co2.py

+5 −6

@@ -4,20 +4,19 @@
 ====================================================================================
 
 This example is based on Section 5.4.3 of "Gaussian Processes for Machine
-Learning" [RW2006]_. It illustrates an example of complex kernel engineering
+Learning" [1]_. It illustrates an example of complex kernel engineering
 and hyperparameter optimization using gradient ascent on the
 log-marginal-likelihood. The data consists of the monthly average atmospheric
 CO2 concentrations (in parts per million by volume (ppm)) collected at the
 Mauna Loa Observatory in Hawaii, between 1958 and 2001. The objective is to
 model the CO2 concentration as a function of the time :math:`t` and extrapolate
 for years after 2001.
 
-.. topic: References
+.. rubric:: References
 
-.. [RW2006] `Rasmussen, Carl Edward.
-    "Gaussian processes in machine learning."
-    Summer school on machine learning. Springer, Berlin, Heidelberg, 2003
-    <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_.
+.. [1] `Rasmussen, Carl Edward. "Gaussian processes in machine learning."
+    Summer school on machine learning. Springer, Berlin, Heidelberg, 2003
+    <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_.
 """
 
 print(__doc__)
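
A condensed, hedged sketch of the kernel-engineering recipe the docstring summarizes, with a synthetic stand-in for the Mauna Loa series; the kernel constants are assumptions, not the example's real values.

import numpy as np

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

# Synthetic stand-in for the monthly CO2 series: trend + seasonality + noise.
rng = np.random.RandomState(0)
t = np.linspace(1958, 2001, 200)
co2 = 330 + 1.5 * (t - 1958) + 3 * np.sin(2 * np.pi * t) + rng.normal(scale=0.5, size=t.size)

# Composite kernel: long-term trend, yearly periodicity, and a noise term.
kernel = (
    50.0**2 * RBF(length_scale=50.0)
    + 2.0**2 * RBF(length_scale=100.0) * ExpSineSquared(length_scale=1.0, periodicity=1.0)
    + WhiteKernel(noise_level=0.5)
)

# fit() maximizes the log-marginal-likelihood w.r.t. the kernel hyperparameters.
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(t.reshape(-1, 1), co2)
mean, std = gpr.predict(np.array([[2010.0]]), return_std=True)  # extrapolate past 2001
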

dev/_downloads/23614d75e8327ef369659da7d2ed62db/plot_nested_cross_validation_iris.py

+6 −6

@@ -30,17 +30,17 @@
 performance of non-nested and nested CV strategies by taking the difference
 between their scores.
 
-.. topic:: See Also:
+.. seealso::
 
     - :ref:`cross_validation`
     - :ref:`grid_search`
 
-.. topic:: References:
+.. rubric:: References
 
-    .. [1] `Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and
-        subsequent selection bias in performance evaluation.
-        J. Mach. Learn. Res 2010,11, 2079-2107.
-        <http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf>`_
+.. [1] `Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and
+    subsequent selection bias in performance evaluation.
+    J. Mach. Learn. Res 2010,11, 2079-2107.
+    <http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf>`_
 
 """
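
The pattern this docstring describes, as a short sketch; the parameter grid and `cv=4` are assumptions, and the example's own code differs in detail.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Inner loop: GridSearchCV picks hyperparameters on each training split.
inner_cv_model = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4)

# Outer loop: cross_val_score measures generalization of the whole procedure.
nested_score = cross_val_score(inner_cv_model, X, y, cv=4).mean()
print(f"nested CV accuracy: {nested_score:.3f}")
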

dev/_downloads/2402de18d671ce5087e3760b2540184f/plot_grid_search_stats.ipynb

+1 −1

@@ -330,7 +330,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-".. topic:: References\n\n .. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n Neural computation, 10(7).\n .. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n In Advances in neural information processing systems.\n .. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n .. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper [here](https://github.com/janezd/baycomp).\n .. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
+".. rubric:: References\n\n.. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n Neural computation, 10(7).\n.. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n In Advances in neural information processing systems.\n.. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n.. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper [here](https://github.com/janezd/baycomp).\n.. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
 ]
 }
 ],

dev/_downloads/32173eb704d697c23dffbbf3fd74942a/plot_digits_denoising.py

+5 −5

@@ -12,12 +12,12 @@
 
 We will use USPS digits dataset to reproduce presented in Sect. 4 of [1]_.
 
-.. topic:: References
+.. rubric:: References
 
-    .. [1] `Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
-        "Learning to find pre-images."
-        Advances in neural information processing systems 16 (2004): 449-456.
-        <https://papers.nips.cc/paper/2003/file/ac1ad983e08ad3304a97e147f522747e-Paper.pdf>`_
+.. [1] `Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
+    "Learning to find pre-images."
+    Advances in neural information processing systems 16 (2004): 449-456.
+    <https://papers.nips.cc/paper/2003/file/ac1ad983e08ad3304a97e147f522747e-Paper.pdf>`_
 
 """
 

dev/_downloads/3316f301d7c7651c033565a5ae51c295/plot_release_highlights_1_3_0.py

+4 −4

@@ -101,8 +101,8 @@
 tree.predict(X)
 
 # %%
-# New display `model_selection.ValidationCurveDisplay`
-# ----------------------------------------------------
+# New display :class:`~model_selection.ValidationCurveDisplay`
+# ------------------------------------------------------------
 # :class:`model_selection.ValidationCurveDisplay` is now available to plot results
 # from :func:`model_selection.validation_curve`.
 from sklearn.datasets import make_classification
@@ -141,8 +141,8 @@
 cross_val_score(gbdt, X, y).mean()
 
 # %%
-# Grouping infrequent categories in :class:`preprocessing.OrdinalEncoder`
-# -----------------------------------------------------------------------
+# Grouping infrequent categories in :class:`~preprocessing.OrdinalEncoder`
+# ------------------------------------------------------------------------
 # Similarly to :class:`preprocessing.OneHotEncoder`, the class
 # :class:`preprocessing.OrdinalEncoder` now supports aggregating infrequent categories
 # into a single output for each feature. The parameters to enable the gathering of
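
To make the corrected heading concrete, a hedged sketch of infrequent-category grouping in `OrdinalEncoder`; the toy data and the `min_frequency` threshold are assumptions.

import numpy as np

from sklearn.preprocessing import OrdinalEncoder

# Categories seen fewer than min_frequency times are pooled into one bucket.
X = np.array([["cat"] * 20 + ["rabbit"] * 10 + ["dog"] * 5 + ["snake"] * 3]).T
enc = OrdinalEncoder(min_frequency=6).fit(X)

print(enc.infrequent_categories_)                    # [array(['dog', 'snake'], ...)]
print(enc.transform([["cat"], ["dog"], ["snake"]]))  # "dog" and "snake" share one code
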

dev/_downloads/3c3c738275484acc54821615bf72894a/plot_permutation_importance.py

+3 −3

@@ -18,10 +18,10 @@
 This example shows how to use Permutation Importances as an alternative that
 can mitigate those limitations.
 
-.. topic:: References:
+.. rubric:: References
 
-    * :doi:`L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
-      2001. <10.1023/A:1010933404324>`
+* :doi:`L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
+  2001. <10.1023/A:1010933404324>`
 
 """
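
A compact, hedged sketch of the API behind the retitled references section: permutation importance on held-out data. Synthetic data; not the example's code.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)  # mean drop in score when each feature is shuffled
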

dev/_downloads/45916745bb89ca49be3a50aa80e65e3f/plot_nested_cross_validation_iris.ipynb

+1 −1

@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Nested versus non-nested cross-validation\n\nThis example compares non-nested and nested cross-validation strategies on a\nclassifier of the iris data set. Nested cross-validation (CV) is often used to\ntrain a model in which hyperparameters also need to be optimized. Nested CV\nestimates the generalization error of the underlying model and its\n(hyper)parameter search. Choosing the parameters that maximize non-nested CV\nbiases the model to the dataset, yielding an overly-optimistic score.\n\nModel selection without nested CV uses the same data to tune model parameters\nand evaluate model performance. Information may thus \"leak\" into the model\nand overfit the data. The magnitude of this effect is primarily dependent on\nthe size of the dataset and the stability of the model. See Cawley and Talbot\n[1]_ for an analysis of these issues.\n\nTo avoid this problem, nested CV effectively uses a series of\ntrain/validation/test set splits. In the inner loop (here executed by\n:class:`GridSearchCV <sklearn.model_selection.GridSearchCV>`), the score is\napproximately maximized by fitting a model to each training set, and then\ndirectly maximized in selecting (hyper)parameters over the validation set. In\nthe outer loop (here in :func:`cross_val_score\n<sklearn.model_selection.cross_val_score>`), generalization error is estimated\nby averaging test set scores over several dataset splits.\n\nThe example below uses a support vector classifier with a non-linear kernel to\nbuild a model with optimized hyperparameters by grid search. We compare the\nperformance of non-nested and nested CV strategies by taking the difference\nbetween their scores.\n\n.. topic:: See Also:\n\n - `cross_validation`\n - `grid_search`\n\n.. topic:: References:\n\n .. [1] [Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and\n subsequent selection bias in performance evaluation.\n J. Mach. Learn. Res 2010,11, 2079-2107.](http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf)\n"
+"\n# Nested versus non-nested cross-validation\n\nThis example compares non-nested and nested cross-validation strategies on a\nclassifier of the iris data set. Nested cross-validation (CV) is often used to\ntrain a model in which hyperparameters also need to be optimized. Nested CV\nestimates the generalization error of the underlying model and its\n(hyper)parameter search. Choosing the parameters that maximize non-nested CV\nbiases the model to the dataset, yielding an overly-optimistic score.\n\nModel selection without nested CV uses the same data to tune model parameters\nand evaluate model performance. Information may thus \"leak\" into the model\nand overfit the data. The magnitude of this effect is primarily dependent on\nthe size of the dataset and the stability of the model. See Cawley and Talbot\n[1]_ for an analysis of these issues.\n\nTo avoid this problem, nested CV effectively uses a series of\ntrain/validation/test set splits. In the inner loop (here executed by\n:class:`GridSearchCV <sklearn.model_selection.GridSearchCV>`), the score is\napproximately maximized by fitting a model to each training set, and then\ndirectly maximized in selecting (hyper)parameters over the validation set. In\nthe outer loop (here in :func:`cross_val_score\n<sklearn.model_selection.cross_val_score>`), generalization error is estimated\nby averaging test set scores over several dataset splits.\n\nThe example below uses a support vector classifier with a non-linear kernel to\nbuild a model with optimized hyperparameters by grid search. We compare the\nperformance of non-nested and nested CV strategies by taking the difference\nbetween their scores.\n\n.. seealso::\n\n - `cross_validation`\n - `grid_search`\n\n.. rubric:: References\n\n.. [1] [Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and\n subsequent selection bias in performance evaluation.\n J. Mach. Learn. Res 2010,11, 2079-2107.](http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf)\n"
 ]
 },
 {

dev/_downloads/4cf0456267ced0f869a458ef4776d4c5/plot_release_highlights_1_1_0.py

+2 −2

@@ -24,8 +24,8 @@
 # %%
 # .. _quantile_support_hgbdt:
 #
-# Quantile loss in :class:`ensemble.HistGradientBoostingRegressor`
-# ----------------------------------------------------------------
+# Quantile loss in :class:`~ensemble.HistGradientBoostingRegressor`
+# -----------------------------------------------------------------
 # :class:`~ensemble.HistGradientBoostingRegressor` can model quantiles with
 # `loss="quantile"` and the new parameter `quantile`.
 from sklearn.ensemble import HistGradientBoostingRegressor
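
A brief, hedged sketch of the feature named in the corrected heading; the toy data and quantile pair are assumptions.

import numpy as np

from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=500)

# One model per quantile: loss="quantile" with the matching `quantile` value.
models = {
    q: HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X, y)
    for q in (0.05, 0.95)
}
low, high = models[0.05].predict(X), models[0.95].predict(X)
print(np.mean((y >= low) & (y <= high)))  # coverage should be roughly 0.9
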

dev/_downloads/4e46f015ab8300f262e6e8775bcdcf8a/plot_adaboost_multiclass.py

+11 −11

@@ -17,11 +17,11 @@
 be selected. This ensures that subsequent iterations of the algorithm focus on
 the difficult-to-classify samples.
 
-.. topic:: References:
+.. rubric:: References
 
-    .. [1] :doi:`J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost."
-        Statistics and its Interface 2.3 (2009): 349-360.
-        <10.4310/SII.2009.v2.n3.a8>`
+.. [1] :doi:`J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost."
+    Statistics and its Interface 2.3 (2009): 349-360.
+    <10.4310/SII.2009.v2.n3.a8>`
 
 """
 
@@ -231,16 +231,16 @@ def misclassification_error(y_true, y_pred):
 # decision. Indeed, this exactly is the formulation of updating the base
 # estimators' weights after each iteration in AdaBoost.
 #
-# |details-start| Mathematical details |details-split|
+# .. dropdown:: Mathematical details
 #
-# The weight associated with a weak learner trained at the stage :math:`m` is
-# inversely associated with its misclassification error such that:
+#   The weight associated with a weak learner trained at the stage :math:`m` is
+#   inversely associated with its misclassification error such that:
 #
-# .. math:: \alpha^{(m)} = \log \frac{1 - err^{(m)}}{err^{(m)}} + \log (K - 1),
+#   .. math:: \alpha^{(m)} = \log \frac{1 - err^{(m)}}{err^{(m)}} + \log (K - 1),
 #
-# where :math:`\alpha^{(m)}` and :math:`err^{(m)}` are the weight and the error
-# of the :math:`m` th weak learner, respectively, and :math:`K` is the number of
-# classes in our classification problem. |details-end|
+#   where :math:`\alpha^{(m)}` and :math:`err^{(m)}` are the weight and the error
+#   of the :math:`m` th weak learner, respectively, and :math:`K` is the number of
+#   classes in our classification problem.
 #
 # Another interesting observation boils down to the fact that the first weak
 # learners of the model make fewer errors than later weak learners of the
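
The dropdown's formula, worked as a tiny sketch; the error values below are invented for illustration.

import numpy as np

def samme_estimator_weight(err, n_classes):
    """SAMME weight: log((1 - err) / err) + log(K - 1)."""
    return np.log((1 - err) / err) + np.log(n_classes - 1)

# With K = 3 classes, chance level is err = 2/3; a learner at chance gets
# weight ~0, and lower error yields a larger (positive) vote.
K = 3
print(samme_estimator_weight(2 / 3, K))  # ~0.0, uninformative learner
print(samme_estimator_weight(0.30, K))   # ~1.54, stronger vote
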
