Commit fb3f05b

Pushing the docs to dev/ for branch: main, commit bb5ab53f94637e6dc9341f6f8d817910ccec4cbb
1 parent f008f32 commit fb3f05b

1,234 files changed: +4901 -5111 lines changed


dev/_downloads/26f110ad6cff1a8a7c58b1a00d8b8b5a/plot_column_transformer_mixed_types.ipynb (+55 -8)

@@ -26,14 +26,61 @@
 },
 "outputs": [],
 "source": [
-"# Author: Pedro Morales <[email protected]>\n#\n# License: BSD 3 clause\n\nimport numpy as np\n\nfrom sklearn.compose import ColumnTransformer\nfrom sklearn.datasets import fetch_openml\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.impute import SimpleImputer\nfrom sklearn.preprocessing import StandardScaler, OneHotEncoder\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import train_test_split, GridSearchCV\n\nnp.random.seed(0)\n\n# Load data from https://www.openml.org/d/40945\nX, y = fetch_openml(\"titanic\", version=1, as_frame=True, return_X_y=True)\n\n# Alternatively X and y can be obtained directly from the frame attribute:\n# X = titanic.frame.drop('survived', axis=1)\n# y = titanic.frame['survived']"
+"# Author: Pedro Morales <[email protected]>\n#\n# License: BSD 3 clause"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"import numpy as np\n\nfrom sklearn.compose import ColumnTransformer\nfrom sklearn.datasets import fetch_openml\nfrom sklearn.pipeline import Pipeline\nfrom sklearn.impute import SimpleImputer\nfrom sklearn.preprocessing import StandardScaler, OneHotEncoder\nfrom sklearn.linear_model import LogisticRegression\nfrom sklearn.model_selection import train_test_split, GridSearchCV\n\nnp.random.seed(0)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Load data from https://www.openml.org/d/40945\n\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"X, y = fetch_openml(\"titanic\", version=1, as_frame=True, return_X_y=True)\n\n# Alternatively X and y can be obtained directly from the frame attribute:\n# X = titanic.frame.drop('survived', axis=1)\n# y = titanic.frame['survived']"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Use ``ColumnTransformer`` by selecting column by names\n\nWe will train our classifier with the following features:\n\nNumeric Features:\n\n* ``age``: float;\n* ``fare``: float.\n\nCategorical Features:\n\n* ``embarked``: categories encoded as strings ``{'C', 'S', 'Q'}``;\n* ``sex``: categories encoded as strings ``{'female', 'male'}``;\n* ``pclass``: ordinal integers ``{1, 2, 3}``.\n\nWe create the preprocessing pipelines for both numeric and categorical data.\nNote that ``pclass`` could either be treated as a categorical or numeric\nfeature.\n\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"collapsed": false
+},
+"outputs": [],
+"source": [
+"numeric_features = [\"age\", \"fare\"]\nnumeric_transformer = Pipeline(\n steps=[(\"imputer\", SimpleImputer(strategy=\"median\")), (\"scaler\", StandardScaler())]\n)\n\ncategorical_features = [\"embarked\", \"sex\", \"pclass\"]\ncategorical_transformer = OneHotEncoder(handle_unknown=\"ignore\")\n\npreprocessor = ColumnTransformer(\n transformers=[\n (\"num\", numeric_transformer, numeric_features),\n (\"cat\", categorical_transformer, categorical_features),\n ]\n)"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Use ``ColumnTransformer`` by selecting column by names\n We will train our classifier with the following features:\n\n Numeric Features:\n\n * ``age``: float;\n * ``fare``: float.\n\n Categorical Features:\n\n * ``embarked``: categories encoded as strings ``{'C', 'S', 'Q'}``;\n * ``sex``: categories encoded as strings ``{'female', 'male'}``;\n * ``pclass``: ordinal integers ``{1, 2, 3}``.\n\n We create the preprocessing pipelines for both numeric and categorical data.\n Note that ``pclass`` could either be treated as a categorical or numeric\n feature.\n\n"
+"Append classifier to preprocessing pipeline.\nNow we have a full prediction pipeline.\n\n"
 ]
 },
 {
@@ -44,14 +91,14 @@
 },
 "outputs": [],
 "source": [
-"numeric_features = [\"age\", \"fare\"]\nnumeric_transformer = Pipeline(\n steps=[(\"imputer\", SimpleImputer(strategy=\"median\")), (\"scaler\", StandardScaler())]\n)\n\ncategorical_features = [\"embarked\", \"sex\", \"pclass\"]\ncategorical_transformer = OneHotEncoder(handle_unknown=\"ignore\")\n\npreprocessor = ColumnTransformer(\n transformers=[\n (\"num\", numeric_transformer, numeric_features),\n (\"cat\", categorical_transformer, categorical_features),\n ]\n)\n\n# Append classifier to preprocessing pipeline.\n# Now we have a full prediction pipeline.\nclf = Pipeline(\n steps=[(\"preprocessor\", preprocessor), (\"classifier\", LogisticRegression())]\n)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n\nclf.fit(X_train, y_train)\nprint(\"model score: %.3f\" % clf.score(X_test, y_test))"
+"clf = Pipeline(\n steps=[(\"preprocessor\", preprocessor), (\"classifier\", LogisticRegression())]\n)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n\nclf.fit(X_train, y_train)\nprint(\"model score: %.3f\" % clf.score(X_test, y_test))"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## HTML representation of ``Pipeline`` (display diagram)\n When the ``Pipeline`` is printed out in a jupyter notebook an HTML\n representation of the estimator is displayed as follows:\n\n"
+"HTML representation of ``Pipeline`` (display diagram)\n\nWhen the ``Pipeline`` is printed out in a jupyter notebook an HTML\nrepresentation of the estimator is displayed:\n\n"
 ]
 },
 {
@@ -62,14 +109,14 @@
 },
 "outputs": [],
 "source": [
-"from sklearn import set_config\n\nset_config(display=\"diagram\")\nclf"
+"clf"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Use ``ColumnTransformer`` by selecting column by data types\n When dealing with a cleaned dataset, the preprocessing can be automatic by\n using the data types of the column to decide whether to treat a column as a\n numerical or categorical feature.\n :func:`sklearn.compose.make_column_selector` gives this possibility.\n First, let's only select a subset of columns to simplify our\n example.\n\n"
+"Use ``ColumnTransformer`` by selecting column by data types\n\nWhen dealing with a cleaned dataset, the preprocessing can be automatic by\nusing the data types of the column to decide whether to treat a column as a\nnumerical or categorical feature.\n:func:`sklearn.compose.make_column_selector` gives this possibility.\nFirst, let's only select a subset of columns to simplify our\nexample.\n\n"
 ]
 },
 {
@@ -123,7 +170,7 @@
 },
 "outputs": [],
 "source": [
-"from sklearn.compose import make_column_selector as selector\n\npreprocessor = ColumnTransformer(\n transformers=[\n (\"num\", numeric_transformer, selector(dtype_exclude=\"category\")),\n (\"cat\", categorical_transformer, selector(dtype_include=\"category\")),\n ]\n)\nclf = Pipeline(\n steps=[(\"preprocessor\", preprocessor), (\"classifier\", LogisticRegression())]\n)\n\n\nclf.fit(X_train, y_train)\nprint(\"model score: %.3f\" % clf.score(X_test, y_test))"
+"from sklearn.compose import make_column_selector as selector\n\npreprocessor = ColumnTransformer(\n transformers=[\n (\"num\", numeric_transformer, selector(dtype_exclude=\"category\")),\n (\"cat\", categorical_transformer, selector(dtype_include=\"category\")),\n ]\n)\nclf = Pipeline(\n steps=[(\"preprocessor\", preprocessor), (\"classifier\", LogisticRegression())]\n)\n\n\nclf.fit(X_train, y_train)\nprint(\"model score: %.3f\" % clf.score(X_test, y_test))\nclf"
 ]
 },
 {
@@ -159,7 +206,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"## Using the prediction pipeline in a grid search\n Grid search can also be performed on the different preprocessing steps\n defined in the ``ColumnTransformer`` object, together with the classifier's\n hyperparameters as part of the ``Pipeline``.\n We will search for both the imputer strategy of the numeric preprocessing\n and the regularization parameter of the logistic regression using\n :class:`~sklearn.model_selection.GridSearchCV`.\n\n"
+"Using the prediction pipeline in a grid search\n\nGrid search can also be performed on the different preprocessing steps\ndefined in the ``ColumnTransformer`` object, together with the classifier's\nhyperparameters as part of the ``Pipeline``.\nWe will search for both the imputer strategy of the numeric preprocessing\nand the regularization parameter of the logistic regression using\n:class:`~sklearn.model_selection.GridSearchCV`.\n\n"
 ]
 },
 {
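The notebook diff above only reorganizes cells; the underlying `ColumnTransformer` pattern it demonstrates is unchanged. A minimal self-contained sketch of that pattern (using an invented toy frame in place of the Titanic data fetched via `fetch_openml`, so it runs offline):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy stand-in for the Titanic frame used in the example notebook.
X = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0],       # has a missing value, like the real data
    "fare": [7.25, 71.28, 8.05, 53.1],
    "embarked": ["S", "C", "S", "S"],
    "sex": ["male", "female", "female", "female"],
    "pclass": [3, 1, 3, 1],
})
y = [0, 1, 1, 1]

# Numeric columns: impute missing values, then standardize.
numeric_transformer = Pipeline(
    steps=[("imputer", SimpleImputer(strategy="median")),
           ("scaler", StandardScaler())]
)

# Categorical columns: one-hot encode, ignoring unseen categories.
preprocessor = ColumnTransformer(
    transformers=[
        ("num", numeric_transformer, ["age", "fare"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"),
         ["embarked", "sex", "pclass"]),
    ]
)

# Full prediction pipeline: preprocessing + classifier.
clf = Pipeline(steps=[("preprocessor", preprocessor),
                      ("classifier", LogisticRegression())])
clf.fit(X, y)
print("training accuracy: %.3f" % clf.score(X, y))
```

On real data one would split into train/test sets first, as the notebook does with `train_test_split`.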

dev/_downloads/3a10dcfbc1a4bf1349c7101a429aa47b/plot_feature_transformation.py (-4)

@@ -25,10 +25,6 @@
 #
 # License: BSD 3 clause
 
-from sklearn import set_config
-
-set_config(display="diagram")
-
 # %%
 # First, we will create a large dataset and split it into three sets:
 #

dev/_downloads/3c9b7bcd0b16f172ac12ffad61f3b5f0/plot_stack_predictors.ipynb (+1 -1)

@@ -26,7 +26,7 @@
 },
 "outputs": [],
 "source": [
-"# Authors: Guillaume Lemaitre <[email protected]>\n# Maria Telenczuk <https://github.com/maikia>\n# License: BSD 3 clause\n\nfrom sklearn import set_config\n\nset_config(display=\"diagram\")"
+"# Authors: Guillaume Lemaitre <[email protected]>\n# Maria Telenczuk <https://github.com/maikia>\n# License: BSD 3 clause"
 ]
 },
 {

dev/_downloads/51833337bfc73d152b44902e5baa50ff/plot_lasso_lars_ic.ipynb (-11)

@@ -29,17 +29,6 @@
 "# Author: Alexandre Gramfort\n# Guillaume Lemaitre\n# License: BSD 3 clause"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
-"import sklearn\n\nsklearn.set_config(display=\"diagram\")"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},

dev/_downloads/51e6f272e94e3b63cfd48c4b41fbaa10/plot_feature_selection_pipeline.ipynb (-11)

@@ -18,17 +18,6 @@
 "\n# Pipeline ANOVA SVM\n\nThis example shows how a feature selection can be easily integrated within\na machine learning pipeline.\n\nWe also show that you can easily introspect part of the pipeline.\n"
 ]
 },
-{
-"cell_type": "code",
-"execution_count": null,
-"metadata": {
-"collapsed": false
-},
-"outputs": [],
-"source": [
-"from sklearn import set_config\n\nset_config(display=\"diagram\")"
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {},

dev/_downloads/58580795dd881384f33e7e6492e154e2/plot_lasso_model_selection.py (-5)

@@ -19,11 +19,6 @@
 # Guillaume Lemaitre
 # License: BSD 3 clause
 
-# %%
-import sklearn
-
-sklearn.set_config(display="diagram")
-
 # %%
 # Dataset
 # -------

dev/_downloads/5a7e586367163444711012a4c5214817/plot_feature_selection_pipeline.py (-4)

@@ -10,10 +10,6 @@
 
 """
 
-from sklearn import set_config
-
-set_config(display="diagram")
-
 # %%
 # We will start by generating a binary classification dataset. Subsequently, we
 # will divide the dataset into two subsets.
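The change repeated across all of the script and notebook diffs above is the deletion of `set_config(display="diagram")` (or `sklearn.set_config(display="diagram")`), which suggests the HTML diagram representation no longer needs to be enabled explicitly in these examples. A minimal sketch of inspecting and toggling the setting by hand (assuming a scikit-learn version that exposes `get_config`/`set_config` with a `display` option):

```python
from sklearn import get_config, set_config
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# A small pipeline; in a Jupyter notebook its repr is rendered as an
# HTML diagram whenever display="diagram" is the active setting.
clf = Pipeline(steps=[("scaler", StandardScaler()),
                      ("classifier", LogisticRegression())])

print(get_config()["display"])  # current display mode

# The calls deleted in this commit forced diagram mode explicitly;
# either mode can still be requested by hand:
set_config(display="diagram")   # HTML diagram repr in notebooks
set_config(display="text")      # plain-text repr
```

Because the setting is process-global, a script that changes it affects every estimator repr rendered afterwards, which is one reason examples avoid setting it unless they need to.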

0 commit comments