Commit 78052f9

Pushing the docs to dev/ for branch: main, commit 34db65a3addfc83d99e64ec55d5e6896ecfbb940
1 parent 49d1599 commit 78052f9

2,351 files changed (+1,303,951 −319,249 lines)


dev/.buildinfo

+1 −1

@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 9b326f5a24b70ee04933e1c4cba406d7
+config: 0eeb1325ca94d95b00a60ff91bce101a
 tags: 645f666f9bcd5a90fca523b33c5a78b7

dev/_downloads/0785ea6d45bde062e5beedda88131215/plot_release_highlights_1_3_0.ipynb

+2 −2

@@ -72,7 +72,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## New display `model_selection.ValidationCurveDisplay`\n:class:`model_selection.ValidationCurveDisplay` is now available to plot results\nfrom :func:`model_selection.validation_curve`.\n\n"
+"## New display :class:`~model_selection.ValidationCurveDisplay`\n:class:`model_selection.ValidationCurveDisplay` is now available to plot results\nfrom :func:`model_selection.validation_curve`.\n\n"
 ]
 },
 {
@@ -108,7 +108,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Grouping infrequent categories in :class:`preprocessing.OrdinalEncoder`\nSimilarly to :class:`preprocessing.OneHotEncoder`, the class\n:class:`preprocessing.OrdinalEncoder` now supports aggregating infrequent categories\ninto a single output for each feature. The parameters to enable the gathering of\ninfrequent categories are `min_frequency` and `max_categories`.\nSee the `User Guide <encoder_infrequent_categories>` for more details.\n\n"
+"## Grouping infrequent categories in :class:`~preprocessing.OrdinalEncoder`\nSimilarly to :class:`preprocessing.OneHotEncoder`, the class\n:class:`preprocessing.OrdinalEncoder` now supports aggregating infrequent categories\ninto a single output for each feature. The parameters to enable the gathering of\ninfrequent categories are `min_frequency` and `max_categories`.\nSee the `User Guide <encoder_infrequent_categories>` for more details.\n\n"
 ]
 },
 {
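
As context for this hunk: a minimal sketch of the display the new cross-reference points to. Illustrative only, not part of the commit; the toy dataset and the `C` grid are assumptions.

import numpy as np

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ValidationCurveDisplay

# Plot train/validation scores as a function of the regularization strength C.
X, y = make_classification(n_samples=500, random_state=0)
ValidationCurveDisplay.from_estimator(
    LogisticRegression(), X, y, param_name="C", param_range=np.logspace(-3, 2, 6)
)
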
Binary file not shown.

dev/_downloads/133f2198d3ab792c75b39a63b0a99872/plot_cost_sensitive_learning.ipynb

+1 −1

@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Post-tuning the decision threshold for cost-sensitive learning\n\nOnce a classifier is trained, the output of the :term:`predict` method outputs class\nlabel predictions corresponding to a thresholding of either the :term:`decision\nfunction` or the :term:`predict_proba` output. For a binary classifier, the default\nthreshold is defined as a posterior probability estimate of 0.5 or a decision score of\n0.0.\n\nHowever, this default strategy is most likely not optimal for the task at hand.\nHere, we use the \"Statlog\" German credit dataset [1]_ to illustrate a use case.\nIn this dataset, the task is to predict whether a person has a \"good\" or \"bad\" credit.\nIn addition, a cost-matrix is provided that specifies the cost of\nmisclassification. Specifically, misclassifying a \"bad\" credit as \"good\" is five\ntimes more costly on average than misclassifying a \"good\" credit as \"bad\".\n\nWe use the :class:`~sklearn.model_selection.TunedThresholdClassifierCV` to select the\ncut-off point of the decision function that minimizes the provided business\ncost.\n\nIn the second part of the example, we further extend this approach by\nconsidering the problem of fraud detection in credit card transactions: in this\ncase, the business metric depends on the amount of each individual transaction.\n.. topic:: References\n\n .. [1] \"Statlog (German Credit Data) Data Set\", UCI Machine Learning Repository,\n [Link](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29).\n\n .. [2] [Charles Elkan, \"The Foundations of Cost-Sensitive Learning\",\n International joint conference on artificial intelligence.\n Vol. 17. No. 1. Lawrence Erlbaum Associates Ltd, 2001.](https://cseweb.ucsd.edu/~elkan/rescale.pdf)\n"
+"\n# Post-tuning the decision threshold for cost-sensitive learning\n\nOnce a classifier is trained, the output of the :term:`predict` method outputs class\nlabel predictions corresponding to a thresholding of either the\n:term:`decision_function` or the :term:`predict_proba` output. For a binary classifier,\nthe default threshold is defined as a posterior probability estimate of 0.5 or a\ndecision score of 0.0.\n\nHowever, this default strategy is most likely not optimal for the task at hand.\nHere, we use the \"Statlog\" German credit dataset [1]_ to illustrate a use case.\nIn this dataset, the task is to predict whether a person has a \"good\" or \"bad\" credit.\nIn addition, a cost-matrix is provided that specifies the cost of\nmisclassification. Specifically, misclassifying a \"bad\" credit as \"good\" is five\ntimes more costly on average than misclassifying a \"good\" credit as \"bad\".\n\nWe use the :class:`~sklearn.model_selection.TunedThresholdClassifierCV` to select the\ncut-off point of the decision function that minimizes the provided business\ncost.\n\nIn the second part of the example, we further extend this approach by\nconsidering the problem of fraud detection in credit card transactions: in this\ncase, the business metric depends on the amount of each individual transaction.\n\n.. rubric :: References\n\n.. [1] \"Statlog (German Credit Data) Data Set\", UCI Machine Learning Repository,\n    [Link](https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29).\n\n.. [2] [Charles Elkan, \"The Foundations of Cost-Sensitive Learning\",\n    International joint conference on artificial intelligence.\n    Vol. 17. No. 1. Lawrence Erlbaum Associates Ltd, 2001.](https://cseweb.ucsd.edu/~elkan/rescale.pdf)\n"
 ]
 },
 {
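
As background for the rewritten docstring, a hedged sketch of the class it documents. The 5:1 cost ratio mirrors the text; the synthetic data and the scorer are illustrative stand-ins, not the example's actual code.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import TunedThresholdClassifierCV

X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=0)

def negated_cost(y_true, y_pred):
    # Missing a positive costs 5, a false alarm costs 1 (illustrative ratio).
    # Returning the negated cost lets "greater is better" scoring apply.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return -(5 * fn + 1 * fp)

model = TunedThresholdClassifierCV(
    LogisticRegression(), scoring=make_scorer(negated_cost)
)
model.fit(X, y)
print(f"tuned cut-off: {model.best_threshold_:.3f}")
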

dev/_downloads/1b8827af01c9a70017a4739bcf2e21a8/plot_gpr_co2.py

+5 −6

@@ -4,20 +4,19 @@
 ====================================================================================
 
 This example is based on Section 5.4.3 of "Gaussian Processes for Machine
-Learning" [RW2006]_. It illustrates an example of complex kernel engineering
+Learning" [1]_. It illustrates an example of complex kernel engineering
 and hyperparameter optimization using gradient ascent on the
 log-marginal-likelihood. The data consists of the monthly average atmospheric
 CO2 concentrations (in parts per million by volume (ppm)) collected at the
 Mauna Loa Observatory in Hawaii, between 1958 and 2001. The objective is to
 model the CO2 concentration as a function of the time :math:`t` and extrapolate
 for years after 2001.
 
-.. topic: References
+.. rubric:: References
 
-.. [RW2006] `Rasmussen, Carl Edward.
-    "Gaussian processes in machine learning."
-    Summer school on machine learning. Springer, Berlin, Heidelberg, 2003
-    <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_.
+.. [1] `Rasmussen, Carl Edward. "Gaussian processes in machine learning."
+    Summer school on machine learning. Springer, Berlin, Heidelberg, 2003
+    <http://www.gaussianprocess.org/gpml/chapters/RW.pdf>`_.
 """
 
 print(__doc__)
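
A condensed, hedged sketch of the kernel-engineering recipe the docstring summarizes, with a synthetic stand-in for the Mauna Loa series; the kernel constants are assumptions, not the example's real values.

import numpy as np

from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

# Synthetic stand-in for the monthly CO2 series: trend + seasonality + noise.
rng = np.random.RandomState(0)
t = np.linspace(1958, 2001, 200)
co2 = 330 + 1.5 * (t - 1958) + 3 * np.sin(2 * np.pi * t) + rng.normal(scale=0.5, size=t.size)

# Composite kernel: long-term trend, yearly periodicity, and a noise term.
kernel = (
    50.0**2 * RBF(length_scale=50.0)
    + 2.0**2 * RBF(length_scale=100.0) * ExpSineSquared(length_scale=1.0, periodicity=1.0)
    + WhiteKernel(noise_level=0.5)
)

# fit() maximizes the log-marginal-likelihood w.r.t. the kernel hyperparameters.
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gpr.fit(t.reshape(-1, 1), co2)
mean, std = gpr.predict(np.array([[2010.0]]), return_std=True)  # extrapolate past 2001
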

dev/_downloads/23614d75e8327ef369659da7d2ed62db/plot_nested_cross_validation_iris.py

+6 −6

@@ -30,17 +30,17 @@
 performance of non-nested and nested CV strategies by taking the difference
 between their scores.
 
-.. topic:: See Also:
+.. seealso::
 
     - :ref:`cross_validation`
     - :ref:`grid_search`
 
-.. topic:: References:
+.. rubric:: References
 
-    .. [1] `Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and
-        subsequent selection bias in performance evaluation.
-        J. Mach. Learn. Res 2010,11, 2079-2107.
-        <http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf>`_
+.. [1] `Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and
+    subsequent selection bias in performance evaluation.
+    J. Mach. Learn. Res 2010,11, 2079-2107.
+    <http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf>`_
 
 """
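
The pattern this docstring describes, as a short sketch; the parameter grid and `cv=4` are assumptions, and the example's own code differs in detail.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1]}

# Inner loop: GridSearchCV picks hyperparameters on each training split.
inner_cv_model = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=4)

# Outer loop: cross_val_score measures generalization of the whole procedure.
nested_score = cross_val_score(inner_cv_model, X, y, cv=4).mean()
print(f"nested CV accuracy: {nested_score:.3f}")
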

dev/_downloads/2402de18d671ce5087e3760b2540184f/plot_grid_search_stats.ipynb

+1 −1

@@ -330,7 +330,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-".. topic:: References\n\n .. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n Neural computation, 10(7).\n .. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n In Advances in neural information processing systems.\n .. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n .. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper [here](https://github.com/janezd/baycomp).\n .. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
+".. rubric:: References\n\n.. [1] Dietterich, T. G. (1998). [Approximate statistical tests for\n comparing supervised classification learning algorithms](http://web.cs.iastate.edu/~jtian/cs573/Papers/Dietterich-98.pdf).\n Neural computation, 10(7).\n.. [2] Nadeau, C., & Bengio, Y. (2000). [Inference for the generalization\n error](https://papers.nips.cc/paper/1661-inference-for-the-generalization-error.pdf).\n In Advances in neural information processing systems.\n.. [3] Bouckaert, R. R., & Frank, E. (2004). [Evaluating the replicability\n of significance tests for comparing learning algorithms](https://www.cms.waikato.ac.nz/~ml/publications/2004/bouckaert-frank.pdf).\n In Pacific-Asia Conference on Knowledge Discovery and Data Mining.\n.. [4] Benavoli, A., Corani, G., Dem\u0161ar, J., & Zaffalon, M. (2017). [Time\n for a change: a tutorial for comparing multiple classifiers through\n Bayesian analysis](http://www.jmlr.org/papers/volume18/16-305/16-305.pdf).\n The Journal of Machine Learning Research, 18(1). See the Python\n library that accompanies this paper [here](https://github.com/janezd/baycomp).\n.. [5] Diebold, F.X. & Mariano R.S. (1995). [Comparing predictive accuracy](http://www.est.uc3m.es/esp/nueva_docencia/comp_col_get/lade/tecnicas_prediccion/Practicas0708/Comparing%20Predictive%20Accuracy%20(Dielbold).pdf)\n Journal of Business & economic statistics, 20(1), 134-144.\n\n"
 ]
 }
 ],

dev/_downloads/32173eb704d697c23dffbbf3fd74942a/plot_digits_denoising.py

+5 −5

@@ -12,12 +12,12 @@
 
 We will use USPS digits dataset to reproduce presented in Sect. 4 of [1]_.
 
-.. topic:: References
+.. rubric:: References
 
-    .. [1] `Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
-        "Learning to find pre-images."
-        Advances in neural information processing systems 16 (2004): 449-456.
-        <https://papers.nips.cc/paper/2003/file/ac1ad983e08ad3304a97e147f522747e-Paper.pdf>`_
+.. [1] `Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf.
+    "Learning to find pre-images."
+    Advances in neural information processing systems 16 (2004): 449-456.
+    <https://papers.nips.cc/paper/2003/file/ac1ad983e08ad3304a97e147f522747e-Paper.pdf>`_
 
 """
 

dev/_downloads/3316f301d7c7651c033565a5ae51c295/plot_release_highlights_1_3_0.py

+4 −4

@@ -101,8 +101,8 @@
 tree.predict(X)
 
 # %%
-# New display `model_selection.ValidationCurveDisplay`
-# ----------------------------------------------------
+# New display :class:`~model_selection.ValidationCurveDisplay`
+# ------------------------------------------------------------
 # :class:`model_selection.ValidationCurveDisplay` is now available to plot results
 # from :func:`model_selection.validation_curve`.
 from sklearn.datasets import make_classification
@@ -141,8 +141,8 @@
 cross_val_score(gbdt, X, y).mean()
 
 # %%
-# Grouping infrequent categories in :class:`preprocessing.OrdinalEncoder`
-# -----------------------------------------------------------------------
+# Grouping infrequent categories in :class:`~preprocessing.OrdinalEncoder`
+# ------------------------------------------------------------------------
 # Similarly to :class:`preprocessing.OneHotEncoder`, the class
 # :class:`preprocessing.OrdinalEncoder` now supports aggregating infrequent categories
 # into a single output for each feature. The parameters to enable the gathering of
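
To make the corrected heading concrete, a hedged sketch of infrequent-category grouping in `OrdinalEncoder`; the toy data and the `min_frequency` threshold are assumptions.

import numpy as np

from sklearn.preprocessing import OrdinalEncoder

# Categories seen fewer than min_frequency times are pooled into one bucket.
X = np.array([["cat"] * 20 + ["rabbit"] * 10 + ["dog"] * 5 + ["snake"] * 3]).T
enc = OrdinalEncoder(min_frequency=6).fit(X)

print(enc.infrequent_categories_)                    # [array(['dog', 'snake'], ...)]
print(enc.transform([["cat"], ["dog"], ["snake"]]))  # "dog" and "snake" share one code
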

dev/_downloads/3c3c738275484acc54821615bf72894a/plot_permutation_importance.py

+3 −3

@@ -18,10 +18,10 @@
 This example shows how to use Permutation Importances as an alternative that
 can mitigate those limitations.
 
-.. topic:: References:
+.. rubric:: References
 
-    * :doi:`L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
-      2001. <10.1023/A:1010933404324>`
+* :doi:`L. Breiman, "Random Forests", Machine Learning, 45(1), 5-32,
+  2001. <10.1023/A:1010933404324>`
 
 """
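
A compact, hedged sketch of the API behind the retitled references section: permutation importance on held-out data. Synthetic data; not the example's code.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print(result.importances_mean)  # mean drop in score when each feature is shuffled
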

dev/_downloads/45916745bb89ca49be3a50aa80e65e3f/plot_nested_cross_validation_iris.ipynb

+1 −1

@@ -4,7 +4,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"\n# Nested versus non-nested cross-validation\n\nThis example compares non-nested and nested cross-validation strategies on a\nclassifier of the iris data set. Nested cross-validation (CV) is often used to\ntrain a model in which hyperparameters also need to be optimized. Nested CV\nestimates the generalization error of the underlying model and its\n(hyper)parameter search. Choosing the parameters that maximize non-nested CV\nbiases the model to the dataset, yielding an overly-optimistic score.\n\nModel selection without nested CV uses the same data to tune model parameters\nand evaluate model performance. Information may thus \"leak\" into the model\nand overfit the data. The magnitude of this effect is primarily dependent on\nthe size of the dataset and the stability of the model. See Cawley and Talbot\n[1]_ for an analysis of these issues.\n\nTo avoid this problem, nested CV effectively uses a series of\ntrain/validation/test set splits. In the inner loop (here executed by\n:class:`GridSearchCV <sklearn.model_selection.GridSearchCV>`), the score is\napproximately maximized by fitting a model to each training set, and then\ndirectly maximized in selecting (hyper)parameters over the validation set. In\nthe outer loop (here in :func:`cross_val_score\n<sklearn.model_selection.cross_val_score>`), generalization error is estimated\nby averaging test set scores over several dataset splits.\n\nThe example below uses a support vector classifier with a non-linear kernel to\nbuild a model with optimized hyperparameters by grid search. We compare the\nperformance of non-nested and nested CV strategies by taking the difference\nbetween their scores.\n\n.. topic:: See Also:\n\n - `cross_validation`\n - `grid_search`\n\n.. topic:: References:\n\n .. [1] [Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and\n subsequent selection bias in performance evaluation.\n J. Mach. Learn. Res 2010,11, 2079-2107.](http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf)\n"
+"\n# Nested versus non-nested cross-validation\n\nThis example compares non-nested and nested cross-validation strategies on a\nclassifier of the iris data set. Nested cross-validation (CV) is often used to\ntrain a model in which hyperparameters also need to be optimized. Nested CV\nestimates the generalization error of the underlying model and its\n(hyper)parameter search. Choosing the parameters that maximize non-nested CV\nbiases the model to the dataset, yielding an overly-optimistic score.\n\nModel selection without nested CV uses the same data to tune model parameters\nand evaluate model performance. Information may thus \"leak\" into the model\nand overfit the data. The magnitude of this effect is primarily dependent on\nthe size of the dataset and the stability of the model. See Cawley and Talbot\n[1]_ for an analysis of these issues.\n\nTo avoid this problem, nested CV effectively uses a series of\ntrain/validation/test set splits. In the inner loop (here executed by\n:class:`GridSearchCV <sklearn.model_selection.GridSearchCV>`), the score is\napproximately maximized by fitting a model to each training set, and then\ndirectly maximized in selecting (hyper)parameters over the validation set. In\nthe outer loop (here in :func:`cross_val_score\n<sklearn.model_selection.cross_val_score>`), generalization error is estimated\nby averaging test set scores over several dataset splits.\n\nThe example below uses a support vector classifier with a non-linear kernel to\nbuild a model with optimized hyperparameters by grid search. We compare the\nperformance of non-nested and nested CV strategies by taking the difference\nbetween their scores.\n\n.. seealso::\n\n - `cross_validation`\n - `grid_search`\n\n.. rubric:: References\n\n.. [1] [Cawley, G.C.; Talbot, N.L.C. On over-fitting in model selection and\n subsequent selection bias in performance evaluation.\n J. Mach. Learn. Res 2010,11, 2079-2107.](http://jmlr.csail.mit.edu/papers/volume11/cawley10a/cawley10a.pdf)\n"
 ]
 },
 {

dev/_downloads/4cf0456267ced0f869a458ef4776d4c5/plot_release_highlights_1_1_0.py

+2 −2

@@ -24,8 +24,8 @@
 # %%
 # .. _quantile_support_hgbdt:
 #
-# Quantile loss in :class:`ensemble.HistGradientBoostingRegressor`
-# ----------------------------------------------------------------
+# Quantile loss in :class:`~ensemble.HistGradientBoostingRegressor`
+# -----------------------------------------------------------------
 # :class:`~ensemble.HistGradientBoostingRegressor` can model quantiles with
 # `loss="quantile"` and the new parameter `quantile`.
 from sklearn.ensemble import HistGradientBoostingRegressor
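
A brief, hedged sketch of the feature named in the corrected heading; the toy data and quantile pair are assumptions.

import numpy as np

from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=500)

# One model per quantile: loss="quantile" with the matching `quantile` value.
models = {
    q: HistGradientBoostingRegressor(loss="quantile", quantile=q).fit(X, y)
    for q in (0.05, 0.95)
}
low, high = models[0.05].predict(X), models[0.95].predict(X)
print(np.mean((y >= low) & (y <= high)))  # coverage should be roughly 0.9
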

dev/_downloads/4e46f015ab8300f262e6e8775bcdcf8a/plot_adaboost_multiclass.py

+11 −11

@@ -17,11 +17,11 @@
 be selected. This ensures that subsequent iterations of the algorithm focus on
 the difficult-to-classify samples.
 
-.. topic:: References:
+.. rubric:: References
 
-    .. [1] :doi:`J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost."
-        Statistics and its Interface 2.3 (2009): 349-360.
-        <10.4310/SII.2009.v2.n3.a8>`
+.. [1] :doi:`J. Zhu, H. Zou, S. Rosset, T. Hastie, "Multi-class adaboost."
+    Statistics and its Interface 2.3 (2009): 349-360.
+    <10.4310/SII.2009.v2.n3.a8>`
 
 """
 
@@ -231,16 +231,16 @@ def misclassification_error(y_true, y_pred):
 # decision. Indeed, this exactly is the formulation of updating the base
 # estimators' weights after each iteration in AdaBoost.
 #
-# |details-start| Mathematical details |details-split|
+# .. dropdown:: Mathematical details
 #
-# The weight associated with a weak learner trained at the stage :math:`m` is
-# inversely associated with its misclassification error such that:
+#   The weight associated with a weak learner trained at the stage :math:`m` is
+#   inversely associated with its misclassification error such that:
 #
-# .. math:: \alpha^{(m)} = \log \frac{1 - err^{(m)}}{err^{(m)}} + \log (K - 1),
+#   .. math:: \alpha^{(m)} = \log \frac{1 - err^{(m)}}{err^{(m)}} + \log (K - 1),
 #
-# where :math:`\alpha^{(m)}` and :math:`err^{(m)}` are the weight and the error
-# of the :math:`m` th weak learner, respectively, and :math:`K` is the number of
-# classes in our classification problem. |details-end|
+#   where :math:`\alpha^{(m)}` and :math:`err^{(m)}` are the weight and the error
+#   of the :math:`m` th weak learner, respectively, and :math:`K` is the number of
+#   classes in our classification problem.
 #
 # Another interesting observation boils down to the fact that the first weak
 # learners of the model make fewer errors than later weak learners of the
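
The dropdown's formula, worked as a tiny sketch; the error values below are invented for illustration.

import numpy as np

def samme_estimator_weight(err, n_classes):
    """SAMME weight: log((1 - err) / err) + log(K - 1)."""
    return np.log((1 - err) / err) + np.log(n_classes - 1)

# With K = 3 classes, chance level is err = 2/3; a learner at chance gets
# weight ~0, and lower error yields a larger (positive) vote.
K = 3
print(samme_estimator_weight(2 / 3, K))  # ~0.0, uninformative learner
print(samme_estimator_weight(0.30, K))   # ~1.54, stronger vote
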
