<!DOCTYPE html>
<html lang="en" data-content_root="../" >
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta property="og:title" content="Developing scikit-learn estimators" />
<meta property="og:type" content="website" />
<meta property="og:url" content="https://scikit-learn/stable/developers/develop.html" />
<meta property="og:site_name" content="scikit-learn" />
<meta property="og:description" content="Whether you are proposing an estimator for inclusion in scikit-learn, developing a separate package compatible with scikit-learn, or implementing custom components for your own projects, this chapt..." />
<meta property="og:image" content="https://scikit-learn.org/stable/_static/scikit-learn-logo-small.png" />
<meta property="og:image:alt" content="scikit-learn" />
<meta name="description" content="Whether you are proposing an estimator for inclusion in scikit-learn, developing a separate package compatible with scikit-learn, or implementing custom components for your own projects, this chapt..." />
<title>Developing scikit-learn estimators — scikit-learn 1.7.dev0 documentation</title>
<script data-cfasync="false">
document.documentElement.dataset.mode = localStorage.getItem("mode") || "";
document.documentElement.dataset.theme = localStorage.getItem("theme") || "";
</script>
<!--
This gives us a CSS class that will be invisible only if JS is disabled
-->
<noscript>
<style>
.pst-js-only { display: none !important; }
</style>
</noscript>
<!-- Loaded before other Sphinx assets -->
<link href="../_static/styles/theme.css?digest=8878045cc6db502f8baf" rel="stylesheet" />
<link href="../_static/styles/pydata-sphinx-theme.css?digest=8878045cc6db502f8baf" rel="stylesheet" />
<link rel="stylesheet" type="text/css" href="../_static/pygments.css?v=8f2a1f02" />
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css?v=76b2166b" />
<link rel="stylesheet" type="text/css" href="../_static/plot_directive.css" />
<link rel="stylesheet" type="text/css" href="https://fonts.googleapis.com/css?family=Vibur" />
<link rel="stylesheet" type="text/css" href="../_static/jupyterlite_sphinx.css?v=2c9f8f05" />
<link rel="stylesheet" type="text/css" href="../_static/sg_gallery.css?v=d2d258e8" />
<link rel="stylesheet" type="text/css" href="../_static/sg_gallery-binder.css?v=f4aeca0c" />
<link rel="stylesheet" type="text/css" href="../_static/sg_gallery-dataframe.css?v=2082cf3c" />
<link rel="stylesheet" type="text/css" href="../_static/sg_gallery-rendered-html.css?v=1277b6f3" />
<link rel="stylesheet" type="text/css" href="../_static/sphinx-design.min.css?v=95c83b7e" />
<link rel="stylesheet" type="text/css" href="../_static/styles/colors.css?v=cc94ab7d" />
<link rel="stylesheet" type="text/css" href="../_static/styles/custom.css?v=8f525996" />
<!-- So that users can add custom icons -->
<script src="../_static/scripts/fontawesome.js?digest=8878045cc6db502f8baf"></script>
<!-- Pre-loaded scripts that we'll load fully later -->
<link rel="preload" as="script" href="../_static/scripts/bootstrap.js?digest=8878045cc6db502f8baf" />
<link rel="preload" as="script" href="../_static/scripts/pydata-sphinx-theme.js?digest=8878045cc6db502f8baf" />
<script src="../_static/documentation_options.js?v=473747f4"></script>
<script src="../_static/doctools.js?v=9bcbadda"></script>
<script src="../_static/sphinx_highlight.js?v=dc90522c"></script>
<script src="../_static/clipboard.min.js?v=a7894cd8"></script>
<script src="../_static/copybutton.js?v=97f0b27d"></script>
<script src="../_static/jupyterlite_sphinx.js?v=96e329c5"></script>
<script src="../_static/design-tabs.js?v=f930bc37"></script>
<script data-domain="scikit-learn.org" defer="defer" src="https://views.scientific-python.org/js/script.js"></script>
<script>DOCUMENTATION_OPTIONS.pagename = 'developers/develop';</script>
<script>
DOCUMENTATION_OPTIONS.theme_version = '0.16.1';
DOCUMENTATION_OPTIONS.theme_switcher_json_url = 'https://scikit-learn.org/dev/_static/versions.json';
DOCUMENTATION_OPTIONS.theme_switcher_version_match = '1.7.dev0';
DOCUMENTATION_OPTIONS.show_version_warning_banner =
true;
</script>
<script src="../_static/scripts/dropdown.js?v=d6825577"></script>
<script src="../_static/scripts/version-switcher.js?v=a6dd8357"></script>
<script src="../_static/scripts/sg_plotly_resize.js?v=2167d4db"></script>
<link rel="canonical" href="https://scikit-learn.org/stable/developers/develop.html" />
<link rel="icon" href="../_static/favicon.ico"/>
<link rel="author" title="About these documents" href="../about.html" />
<link rel="search" title="Search" href="../search.html" />
<link rel="next" title="Developers’ Tips and Tricks" href="tips.html" />
<link rel="prev" title="Crafting a minimal reproducer for scikit-learn" href="minimal_reproducer.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
<meta name="docsearch:version" content="1.7" />
</head>
<body data-bs-spy="scroll" data-bs-target=".bd-toc-nav" data-offset="180" data-bs-root-margin="0px 0px -60%" data-default-mode="">
<div id="pst-skip-link" class="skip-link d-print-none"><a href="#main-content">Skip to main content</a></div>
<div id="pst-scroll-pixel-helper"></div>
<button type="button" class="btn rounded-pill" id="pst-back-to-top">
<i class="fa-solid fa-arrow-up"></i>Back to top</button>
<dialog id="pst-search-dialog">
<form class="bd-search d-flex align-items-center"
action="../search.html"
method="get">
<i class="fa-solid fa-magnifying-glass"></i>
<input type="search"
class="form-control"
name="q"
placeholder="Search the docs ..."
aria-label="Search the docs ..."
autocomplete="off"
autocorrect="off"
autocapitalize="off"
spellcheck="false"/>
<span class="search-button__kbd-shortcut"><kbd class="kbd-shortcut__modifier">Ctrl</kbd>+<kbd>K</kbd></span>
</form>
</dialog>
<div class="pst-async-banner-revealer d-none">
<aside id="bd-header-version-warning" class="d-none d-print-none" aria-label="Version warning"></aside>
</div>
<header class="bd-header navbar navbar-expand-lg bd-navbar d-print-none">
<div class="bd-header__inner bd-page-width">
<button class="pst-navbar-icon sidebar-toggle primary-toggle" aria-label="Site navigation">
<span class="fa-solid fa-bars"></span>
</button>
<div class=" navbar-header-items__start">
<div class="navbar-item">
<a class="navbar-brand logo" href="../index.html">
<img src="../_static/scikit-learn-logo-small.png" class="logo__image only-light" alt="scikit-learn homepage"/>
<img src="../_static/scikit-learn-logo-small.png" class="logo__image only-dark pst-js-only" alt="scikit-learn homepage"/>
</a></div>
</div>
<div class=" navbar-header-items">
<div class="me-auto navbar-header-items__center">
<div class="navbar-item">
<nav>
<ul class="bd-navbar-elements navbar-nav">
<li class="nav-item ">
<a class="nav-link nav-internal" href="../install.html">
Install
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../user_guide.html">
User Guide
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../api/index.html">
API
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../auto_examples/index.html">
Examples
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-external" href="https://blog.scikit-learn.org/">
Community
</a>
</li>
<li class="nav-item dropdown">
<button class="btn dropdown-toggle nav-item" type="button"
data-bs-toggle="dropdown" aria-expanded="false"
aria-controls="pst-nav-more-links">
More
</button>
<ul id="pst-nav-more-links" class="dropdown-menu">
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../getting_started.html">
Getting Started
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../whats_new.html">
Release History
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../glossary.html">
Glossary
</a>
</li>
<li class=" current active">
<a class="nav-link dropdown-item nav-internal" href="index.html">
Development
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../faq.html">
FAQ
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../support.html">
Support
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../related_projects.html">
Related Projects
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../roadmap.html">
Roadmap
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../governance.html">
Governance
</a>
</li>
<li class=" ">
<a class="nav-link dropdown-item nav-internal" href="../about.html">
About us
</a>
</li>
</ul>
</li>
</ul>
</nav></div>
</div>
<div class="navbar-header-items__end">
<div class="navbar-item navbar-persistent--container">
<button class="btn btn-sm pst-navbar-icon search-button search-button__button pst-js-only" title="Search" aria-label="Search" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="fa-solid fa-magnifying-glass fa-lg"></i>
</button>
</div>
<div class="navbar-item">
<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button pst-js-only" aria-label="Color mode" data-bs-title="Color mode" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="theme-switch fa-solid fa-sun fa-lg" data-mode="light" title="Light"></i>
<i class="theme-switch fa-solid fa-moon fa-lg" data-mode="dark" title="Dark"></i>
<i class="theme-switch fa-solid fa-circle-half-stroke fa-lg" data-mode="auto" title="System Settings"></i>
</button></div>
<div class="navbar-item"><ul class="navbar-icon-links"
aria-label="Icon Links">
<li class="nav-item">
<a href="https://github.com/scikit-learn/scikit-learn" title="GitHub" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-square-github fa-lg" aria-hidden="true"></i>
<span class="sr-only">GitHub</span></a>
</li>
</ul></div>
<div class="navbar-item">
<div class="version-switcher__container dropdown pst-js-only">
<button id="pst-version-switcher-button-2"
type="button"
class="version-switcher__button btn btn-sm dropdown-toggle"
data-bs-toggle="dropdown"
aria-haspopup="listbox"
aria-controls="pst-version-switcher-list-2"
aria-label="Version switcher list"
>
Choose version <!-- this text may get changed later by javascript -->
<span class="caret"></span>
</button>
<div id="pst-version-switcher-list-2"
class="version-switcher__menu dropdown-menu list-group-flush py-0"
role="listbox" aria-labelledby="pst-version-switcher-button-2">
<!-- dropdown will be populated by javascript on page load -->
</div>
</div></div>
</div>
</div>
<div class="navbar-persistent--mobile">
<button class="btn btn-sm pst-navbar-icon search-button search-button__button pst-js-only" title="Search" aria-label="Search" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="fa-solid fa-magnifying-glass fa-lg"></i>
</button>
</div>
<button class="pst-navbar-icon sidebar-toggle secondary-toggle" aria-label="On this page">
<span class="fa-solid fa-outdent"></span>
</button>
</div>
</header>
<div class="bd-container">
<div class="bd-container__inner bd-page-width">
<dialog id="pst-primary-sidebar-modal"></dialog>
<div id="pst-primary-sidebar" class="bd-sidebar-primary bd-sidebar">
<div class="sidebar-header-items sidebar-primary__section">
<div class="sidebar-header-items__center">
<div class="navbar-item">
<nav>
<ul class="bd-navbar-elements navbar-nav">
<li class="nav-item ">
<a class="nav-link nav-internal" href="../install.html">
Install
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../user_guide.html">
User Guide
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../api/index.html">
API
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../auto_examples/index.html">
Examples
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-external" href="https://blog.scikit-learn.org/">
Community
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../getting_started.html">
Getting Started
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../whats_new.html">
Release History
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../glossary.html">
Glossary
</a>
</li>
<li class="nav-item current active">
<a class="nav-link nav-internal" href="index.html">
Development
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../faq.html">
FAQ
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../support.html">
Support
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../related_projects.html">
Related Projects
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../roadmap.html">
Roadmap
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../governance.html">
Governance
</a>
</li>
<li class="nav-item ">
<a class="nav-link nav-internal" href="../about.html">
About us
</a>
</li>
</ul>
</nav></div>
</div>
<div class="sidebar-header-items__end">
<div class="navbar-item">
<button class="btn btn-sm nav-link pst-navbar-icon theme-switch-button pst-js-only" aria-label="Color mode" data-bs-title="Color mode" data-bs-placement="bottom" data-bs-toggle="tooltip">
<i class="theme-switch fa-solid fa-sun fa-lg" data-mode="light" title="Light"></i>
<i class="theme-switch fa-solid fa-moon fa-lg" data-mode="dark" title="Dark"></i>
<i class="theme-switch fa-solid fa-circle-half-stroke fa-lg" data-mode="auto" title="System Settings"></i>
</button></div>
<div class="navbar-item"><ul class="navbar-icon-links"
aria-label="Icon Links">
<li class="nav-item">
<a href="https://github.com/scikit-learn/scikit-learn" title="GitHub" class="nav-link pst-navbar-icon" rel="noopener" target="_blank" data-bs-toggle="tooltip" data-bs-placement="bottom"><i class="fa-brands fa-square-github fa-lg" aria-hidden="true"></i>
<span class="sr-only">GitHub</span></a>
</li>
</ul></div>
<div class="navbar-item">
<div class="version-switcher__container dropdown pst-js-only">
<button id="pst-version-switcher-button-3"
type="button"
class="version-switcher__button btn btn-sm dropdown-toggle"
data-bs-toggle="dropdown"
aria-haspopup="listbox"
aria-controls="pst-version-switcher-list-3"
aria-label="Version switcher list"
>
Choose version <!-- this text may get changed later by javascript -->
<span class="caret"></span>
</button>
<div id="pst-version-switcher-list-3"
class="version-switcher__menu dropdown-menu list-group-flush py-0"
role="listbox" aria-labelledby="pst-version-switcher-button-3">
<!-- dropdown will be populated by javascript on page load -->
</div>
</div></div>
</div>
</div>
<div class="sidebar-primary-items__start sidebar-primary__section">
<div class="sidebar-primary-item">
<nav class="bd-docs-nav bd-links"
aria-label="Section Navigation">
<p class="bd-links__title" role="heading" aria-level="1">Section Navigation</p>
<div class="bd-toc-item navbar-nav"><ul class="current nav bd-sidenav">
<li class="toctree-l1"><a class="reference internal" href="contributing.html">Contributing</a></li>
<li class="toctree-l1"><a class="reference internal" href="minimal_reproducer.html">Crafting a minimal reproducer for scikit-learn</a></li>
<li class="toctree-l1 current active"><a class="current reference internal" href="#">Developing scikit-learn estimators</a></li>
<li class="toctree-l1"><a class="reference internal" href="tips.html">Developers’ Tips and Tricks</a></li>
<li class="toctree-l1"><a class="reference internal" href="utilities.html">Utilities for Developers</a></li>
<li class="toctree-l1"><a class="reference internal" href="performance.html">How to optimize for speed</a></li>
<li class="toctree-l1"><a class="reference internal" href="cython.html">Cython Best Practices, Conventions and Knowledge</a></li>
<li class="toctree-l1"><a class="reference internal" href="advanced_installation.html">Installing the development version of scikit-learn</a></li>
<li class="toctree-l1"><a class="reference internal" href="bug_triaging.html">Bug triaging and issue curation</a></li>
<li class="toctree-l1"><a class="reference internal" href="maintainer.html">Maintainer Information</a></li>
<li class="toctree-l1"><a class="reference internal" href="plotting.html">Developing with the Plotting API</a></li>
</ul>
</div>
</nav></div>
</div>
<div class="sidebar-primary-items__end sidebar-primary__section">
</div>
</div>
<main id="main-content" class="bd-main" role="main">
<div class="bd-content">
<div class="bd-article-container">
<div class="bd-header-article d-print-none">
<div class="header-article-items header-article__inner">
<div class="header-article-items__start">
<div class="header-article-item">
<nav aria-label="Breadcrumb" class="d-print-none">
<ul class="bd-breadcrumbs">
<li class="breadcrumb-item breadcrumb-home">
<a href="../index.html" class="nav-link" aria-label="Home">
<i class="fa-solid fa-home"></i>
</a>
</li>
<li class="breadcrumb-item"><a href="index.html" class="nav-link">Developer’s Guide</a></li>
<li class="breadcrumb-item active" aria-current="page"><span class="ellipsis">Developing scikit-learn estimators</span></li>
</ul>
</nav>
</div>
</div>
</div>
</div>
<div id="searchbox"></div>
<article class="bd-article">
<section id="developing-scikit-learn-estimators">
<span id="develop"></span><h1>Developing scikit-learn estimators<a class="headerlink" href="#developing-scikit-learn-estimators" title="Link to this heading">#</a></h1>
<p>Whether you are proposing an estimator for inclusion in scikit-learn,
developing a separate package compatible with scikit-learn, or
implementing custom components for your own projects, this chapter
details how to develop objects that safely interact with scikit-learn
pipelines and model selection tools.</p>
<p>This section details the public API you should use and implement for a
scikit-learn-compatible estimator. Inside scikit-learn itself, we experiment with and
use some private tools; our goal is always to make them public once they are stable
enough, so that you can also use them in your own projects.</p>
<section id="apis-of-scikit-learn-objects">
<span id="api-overview"></span><h2>APIs of scikit-learn objects<a class="headerlink" href="#apis-of-scikit-learn-objects" title="Link to this heading">#</a></h2>
<p>There are two major types of estimators. The first group is simple
estimators, which make up most estimators, such as
<a class="reference internal" href="../modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression" title="sklearn.linear_model.LogisticRegression"><code class="xref py py-class docutils literal notranslate"><span class="pre">LogisticRegression</span></code></a> or
<a class="reference internal" href="../modules/generated/sklearn.ensemble.RandomForestClassifier.html#sklearn.ensemble.RandomForestClassifier" title="sklearn.ensemble.RandomForestClassifier"><code class="xref py py-class docutils literal notranslate"><span class="pre">RandomForestClassifier</span></code></a>. The second group is
meta-estimators, which are estimators that wrap other estimators.
<a class="reference internal" href="../modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline" title="sklearn.pipeline.Pipeline"><code class="xref py py-class docutils literal notranslate"><span class="pre">Pipeline</span></code></a> and <a class="reference internal" href="../modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV" title="sklearn.model_selection.GridSearchCV"><code class="xref py py-class docutils literal notranslate"><span class="pre">GridSearchCV</span></code></a>
are two examples of meta-estimators.</p>
<p>Here we start with a few vocabulary terms, and then we illustrate how you can implement
your own estimators.</p>
<p>Elements of the scikit-learn API are described more definitively in the
<a class="reference internal" href="../glossary.html#glossary"><span class="std std-ref">Glossary of Common Terms and API Elements</span></a>.</p>
<section id="different-objects">
<h3>Different objects<a class="headerlink" href="#different-objects" title="Link to this heading">#</a></h3>
<p>The main objects in scikit-learn are (one class can implement multiple interfaces):</p>
<dl class="field-list">
<dt class="field-odd">Estimator<span class="colon">:</span></dt>
<dd class="field-odd"><p>The base object, implements a <code class="docutils literal notranslate"><span class="pre">fit</span></code> method to learn from data, either:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">estimator</span> <span class="o">=</span> <span class="n">estimator</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">targets</span><span class="p">)</span>
</pre></div>
</div>
<p>or:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">estimator</span> <span class="o">=</span> <span class="n">estimator</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
</dd>
<dt class="field-even">Predictor<span class="colon">:</span></dt>
<dd class="field-even"><p>For supervised learning, or some unsupervised problems, implements:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">prediction</span> <span class="o">=</span> <span class="n">predictor</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
<p>Classification algorithms usually also offer a way to quantify certainty
of a prediction, either using <code class="docutils literal notranslate"><span class="pre">decision_function</span></code> or <code class="docutils literal notranslate"><span class="pre">predict_proba</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">probability</span> <span class="o">=</span> <span class="n">predictor</span><span class="o">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
</dd>
<dt class="field-odd">Transformer<span class="colon">:</span></dt>
<dd class="field-odd"><p>For modifying the data in a supervised or unsupervised way (e.g. by adding, changing,
or removing columns, but not by adding or removing rows). Implements:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">new_data</span> <span class="o">=</span> <span class="n">transformer</span><span class="o">.</span><span class="n">transform</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
<p>When fitting and transforming can be performed much more efficiently
together than separately, implements:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">new_data</span> <span class="o">=</span> <span class="n">transformer</span><span class="o">.</span><span class="n">fit_transform</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
</dd>
<dt class="field-even">Model<span class="colon">:</span></dt>
<dd class="field-even"><p>A model that can give a <a class="reference external" href="https://en.wikipedia.org/wiki/Goodness_of_fit">goodness of fit</a> measure or a likelihood of
unseen data, implements (higher is better):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">score</span> <span class="o">=</span> <span class="n">model</span><span class="o">.</span><span class="n">score</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
</pre></div>
</div>
</dd>
</dl>
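<p>As a rough illustration of how one class can implement several of these interfaces,
the sketch below defines a hypothetical <code class="docutils literal notranslate"><span class="pre">MeanCenterer</span></code> in plain Python. It is not part
of scikit-learn; the class name, the <code class="docutils literal notranslate"><span class="pre">means_</span></code> attribute, and the toy data are
assumptions made for this example only. The object is both an estimator and a transformer:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>
```python
class MeanCenterer:
    """Hypothetical toy object: an estimator (fit) that is also a transformer."""

    def fit(self, data, targets=None):
        n_rows = len(data)
        n_cols = len(data[0])
        # State learned from the data: one mean per column.
        self.means_ = [sum(row[j] for row in data) / n_rows for j in range(n_cols)]
        # Returning self allows the form: estimator = estimator.fit(data)
        return self

    def transform(self, data):
        # Column values change; the number of rows stays the same.
        return [[value - mean for value, mean in zip(row, self.means_)] for row in data]

    def fit_transform(self, data, targets=None):
        # Naive version: fit, then transform the same data.
        return self.fit(data, targets).transform(data)


data = [[1.0, 10.0], [3.0, 30.0]]
centerer = MeanCenterer().fit(data)
new_data = centerer.transform(data)
print(new_data)
```
</pre></div>
</div>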
</section>
<section id="estimators">
<h3>Estimators<a class="headerlink" href="#estimators" title="Link to this heading">#</a></h3>
<p>The API has one predominant object: the estimator. An estimator is an
object that fits a model based on some training data and is capable of
inferring some properties on new data. It can be, for instance, a
classifier or a regressor. All estimators implement the fit method:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">estimator</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
</pre></div>
</div>
<p>Out of all the methods that an estimator implements, <code class="docutils literal notranslate"><span class="pre">fit</span></code> is usually the one you
want to implement yourself. Other methods such as <code class="docutils literal notranslate"><span class="pre">set_params</span></code>, <code class="docutils literal notranslate"><span class="pre">get_params</span></code>, etc.
are implemented in <a class="reference internal" href="../modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator" title="sklearn.base.BaseEstimator"><code class="xref py py-class docutils literal notranslate"><span class="pre">BaseEstimator</span></code></a>, which you should inherit from.
You might need to inherit from more mixins, which we will explain later.</p>
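<p>A minimal sketch of this pattern, assuming scikit-learn is installed: the hypothetical
<code class="docutils literal notranslate"><span class="pre">MeanEstimator</span></code> below (the class, its <code class="docutils literal notranslate"><span class="pre">offset</span></code> parameter, and the toy data are
invented for illustration) implements only <code class="docutils literal notranslate"><span class="pre">__init__</span></code> and <code class="docutils literal notranslate"><span class="pre">fit</span></code>, and inherits
<code class="docutils literal notranslate"><span class="pre">get_params</span></code> from <code class="docutils literal notranslate"><span class="pre">BaseEstimator</span></code>:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre>
```python
from sklearn.base import BaseEstimator


class MeanEstimator(BaseEstimator):
    """Hypothetical estimator that only learns the mean of the targets."""

    def __init__(self, offset=0.0):
        self.offset = offset

    def fit(self, X, y):
        # Attributes learned from data conventionally end with an underscore.
        self.mean_ = sum(y) / len(y) + self.offset
        return self


est = MeanEstimator(offset=1.0).fit([[0], [1], [2]], [1.0, 2.0, 3.0])
print(est.mean_)
# get_params is not defined above; it comes from BaseEstimator.
print(est.get_params())
```
</pre></div>
</div>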
<section id="instantiation">
<h4>Instantiation<a class="headerlink" href="#instantiation" title="Link to this heading">#</a></h4>
<p>This concerns the creation of an object. The object’s <code class="docutils literal notranslate"><span class="pre">__init__</span></code> method might accept
constants as arguments that determine the estimator’s behavior (like the <code class="docutils literal notranslate"><span class="pre">alpha</span></code>
constant in <a class="reference internal" href="../modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier" title="sklearn.linear_model.SGDClassifier"><code class="xref py py-class docutils literal notranslate"><span class="pre">SGDClassifier</span></code></a>). It should not, however, take
the actual training data as an argument, as this is left to the <code class="docutils literal notranslate"><span class="pre">fit()</span></code> method:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">clf2</span> <span class="o">=</span> <span class="n">SGDClassifier</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="mf">2.3</span><span class="p">)</span>
<span class="n">clf3</span> <span class="o">=</span> <span class="n">SGDClassifier</span><span class="p">([[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">],</span> <span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]],</span> <span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">])</span> <span class="c1"># WRONG!</span>
</pre></div>
</div>
<p>Ideally, the arguments accepted by <code class="docutils literal notranslate"><span class="pre">__init__</span></code> should all be keyword arguments with a
default value. In other words, a user should be able to instantiate an estimator without
passing any arguments to it. In some cases, where there is no sane default for an
argument, it can be left without a default value. In scikit-learn itself, this happens
in very few places, only in some meta-estimators, where the sub-estimator(s) argument is
a required argument.</p>
<p>Most arguments correspond to hyperparameters describing the model or the optimisation
problem the estimator tries to solve. Other parameters might define how the estimator
behaves, e.g. defining the location of a cache to store some data. These initial
arguments (or parameters) are always remembered by the estimator. Also note that they
should not be documented under the “Attributes” section, but rather under the
“Parameters” section for that estimator.</p>
<p>In addition, <strong>every keyword argument accepted by</strong> <code class="docutils literal notranslate"><span class="pre">__init__</span></code> <strong>should
correspond to an attribute on the instance</strong>. Scikit-learn relies on this to
find the relevant attributes to set on an estimator when doing model selection.</p>
<p>To summarize, an <code class="docutils literal notranslate"><span class="pre">__init__</span></code> should look like:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">param1</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">param2</span><span class="o">=</span><span class="mi">2</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">param1</span> <span class="o">=</span> <span class="n">param1</span>
<span class="bp">self</span><span class="o">.</span><span class="n">param2</span> <span class="o">=</span> <span class="n">param2</span>
</pre></div>
</div>
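<p>As a rough illustration of why the attribute names must match the argument names, the
sketch below discovers parameters by inspecting the <code class="docutils literal notranslate"><span class="pre">__init__</span></code> signature and reading
attributes of the same name. <code class="docutils literal notranslate"><span class="pre">ToyEstimator</span></code> and <code class="docutils literal notranslate"><span class="pre">collect_params</span></code> are hypothetical names
used only here, and the code is a simplified stand-in rather than scikit-learn’s actual
implementation.</p>

```python
import inspect


class ToyEstimator:
    """Hypothetical estimator that follows the __init__ convention."""

    def __init__(self, param1=1, param2=2):
        # Store every argument unchanged under its own name.
        self.param1 = param1
        self.param2 = param2


def collect_params(estimator):
    """Hypothetical helper: read back the constructor arguments by name."""
    signature = inspect.signature(type(estimator).__init__)
    names = [name for name in signature.parameters if name != "self"]
    # This lookup only works if each argument was stored as an attribute
    # with exactly the same name as in the constructor.
    return {name: getattr(estimator, name) for name in names}


params = collect_params(ToyEstimator(param2=5))
print(params)
```

<p>If <code class="docutils literal notranslate"><span class="pre">__init__</span></code> stored an argument under a different name, the lookup in this sketch
would fail with an <code class="docutils literal notranslate"><span class="pre">AttributeError</span></code>.</p>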
<p>There should be no logic in <code class="docutils literal notranslate"><span class="pre">__init__</span></code>, not even input validation, and the parameters
should not be changed. This also means that, ideally, they should not be mutable objects
such as lists or dictionaries. If they are mutable, they should be copied before being
modified. The corresponding logic should be put where the parameters are used, typically in <code class="docutils literal notranslate"><span class="pre">fit</span></code>.
The following is wrong:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">param1</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">param2</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">param3</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
<span class="c1"># WRONG: parameters should not be modified</span>
<span class="k">if</span> <span class="n">param1</span> <span class="o">></span> <span class="mi">1</span><span class="p">:</span>
<span class="n">param2</span> <span class="o">+=</span> <span class="mi">1</span>
<span class="bp">self</span><span class="o">.</span><span class="n">param1</span> <span class="o">=</span> <span class="n">param1</span>
<span class="c1"># WRONG: the object's attributes should have exactly the name of</span>
<span class="c1"># the argument in the constructor</span>
<span class="bp">self</span><span class="o">.</span><span class="n">param3</span> <span class="o">=</span> <span class="n">param2</span>
</pre></div>
</div>
<p>The reason for postponing the validation is that if <code class="docutils literal notranslate"><span class="pre">__init__</span></code> includes input
validation, then the same validation would have to be performed in <code class="docutils literal notranslate"><span class="pre">set_params</span></code>, which
is used in algorithms like <a class="reference internal" href="../modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV" title="sklearn.model_selection.GridSearchCV"><code class="xref py py-class docutils literal notranslate"><span class="pre">GridSearchCV</span></code></a>.</p>
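<p>A minimal sketch of deferring validation to <code class="docutils literal notranslate"><span class="pre">fit</span></code> could look like the following.
<code class="docutils literal notranslate"><span class="pre">ToyCenter</span></code> and its <code class="docutils literal notranslate"><span class="pre">strategy</span></code> parameter are hypothetical, and plain Python lists
are used instead of arrays to keep the example self-contained.</p>

```python
import statistics


class ToyCenter:
    """Hypothetical estimator; names are illustrative only."""

    def __init__(self, strategy="mean"):
        # No validation here: the value is only remembered.
        self.strategy = strategy

    def fit(self, X, y=None):
        # Validate the parameter where it is used.
        allowed = ("mean", "median")
        if self.strategy not in allowed:
            raise ValueError(
                f"strategy must be one of {allowed}, got {self.strategy!r}"
            )
        reducer = statistics.mean if self.strategy == "mean" else statistics.median
        # Learned state gets a trailing underscore.
        self.center_ = [reducer(column) for column in zip(*X)]
        return self


est = ToyCenter(strategy="mode")  # invalid, but no error at construction time
est.strategy = "median"  # a later parameter update replaces the value
est.fit([[1, 10], [2, 20], [6, 60]])
print(est.center_)
```

<p>Because the check lives in <code class="docutils literal notranslate"><span class="pre">fit</span></code>, it applies equally to values given at
construction time and to values changed afterwards.</p>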
<p>Also, attributes with a trailing <code class="docutils literal notranslate"><span class="pre">_</span></code> are expected <strong>not to be set
inside the</strong> <code class="docutils literal notranslate"><span class="pre">__init__</span></code> <strong>method</strong>. More details on attributes that are not init
arguments come shortly.</p>
</section>
<section id="fitting">
<h4>Fitting<a class="headerlink" href="#fitting" title="Link to this heading">#</a></h4>
<p>The next thing you will probably want to do is to estimate some parameters in the model.
This is implemented in the <code class="docutils literal notranslate"><span class="pre">fit()</span></code> method, and it’s where the training happens.
For instance, this is where you have the computation to learn or estimate coefficients
for a linear model.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">fit()</span></code> method takes the training data as arguments, which can be one
array in the case of unsupervised learning, or two arrays in the case
of supervised learning. Other metadata that come with the training data, such as
<code class="docutils literal notranslate"><span class="pre">sample_weight</span></code>, can also be passed to <code class="docutils literal notranslate"><span class="pre">fit</span></code> as keyword arguments.</p>
<p>Note that the model is fitted using <code class="docutils literal notranslate"><span class="pre">X</span></code> and <code class="docutils literal notranslate"><span class="pre">y</span></code>, but the object holds no
reference to <code class="docutils literal notranslate"><span class="pre">X</span></code> and <code class="docutils literal notranslate"><span class="pre">y</span></code>. There are, however, some exceptions to this, as in
the case of precomputed kernels where this data must be stored for use by
the predict method.</p>
<div class="pst-scrollable-table-container"><table class="table">
<thead>
<tr class="row-odd"><th class="head"><p>Parameters</p></th>
<th class="head"></th>
</tr>
</thead>
<tbody>
<tr class="row-even"><td><p>X</p></td>
<td><p>array-like of shape (n_samples, n_features)</p></td>
</tr>
<tr class="row-odd"><td><p>y</p></td>
<td><p>array-like of shape (n_samples,)</p></td>
</tr>
<tr class="row-even"><td><p>kwargs</p></td>
<td><p>optional data-dependent parameters</p></td>
</tr>
</tbody>
</table>
</div>
<p>The number of samples, i.e. <code class="docutils literal notranslate"><span class="pre">X.shape[0]</span></code>, should be the same as <code class="docutils literal notranslate"><span class="pre">y.shape[0]</span></code>. If this
requirement is not met, an exception of type <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> should be raised.</p>
<p><code class="docutils literal notranslate"><span class="pre">y</span></code> might be ignored in the case of unsupervised learning. However, to
make it possible to use the estimator as part of a pipeline that can
mix both supervised and unsupervised transformers, even unsupervised
estimators need to accept a <code class="docutils literal notranslate"><span class="pre">y=None</span></code> keyword argument in
the second position that is just ignored by the estimator.
For the same reason, <code class="docutils literal notranslate"><span class="pre">fit_predict</span></code>, <code class="docutils literal notranslate"><span class="pre">fit_transform</span></code>, <code class="docutils literal notranslate"><span class="pre">score</span></code>
and <code class="docutils literal notranslate"><span class="pre">partial_fit</span></code> methods need to accept a <code class="docutils literal notranslate"><span class="pre">y</span></code> argument in
the second place if they are implemented.</p>
<p>The method should return the object (<code class="docutils literal notranslate"><span class="pre">self</span></code>). This pattern makes it possible
to write quick one-liners in an IPython session, such as:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">y_predicted</span> <span class="o">=</span> <span class="n">SGDClassifier</span><span class="p">(</span><span class="n">alpha</span><span class="o">=</span><span class="mi">10</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">)</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X_test</span><span class="p">)</span>
</pre></div>
</div>
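<p>The signature conventions above can be sketched with a small unsupervised estimator.
<code class="docutils literal notranslate"><span class="pre">ToyColumnMean</span></code> is a hypothetical class written with plain lists. A real estimator
would normally delegate the checks to scikit-learn’s validation helpers instead of
comparing lengths by hand.</p>

```python
class ToyColumnMean:
    """Hypothetical unsupervised estimator written with plain lists."""

    def fit(self, X, y=None):
        # y is accepted in second position; when given, only its length
        # is compared with the number of samples in X.
        if y is not None and len(X) != len(y):
            raise ValueError(
                f"X has {len(X)} samples but y has {len(y)} samples"
            )
        n_samples = len(X)
        self.mean_ = [sum(column) / n_samples for column in zip(*X)]
        # Returning self is what allows method chaining.
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.mean_)] for row in X]


X = [[1.0, 2.0], [3.0, 6.0]]
centered = ToyColumnMean().fit(X).transform(X)
print(centered)
```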
<p>Depending on the nature of the algorithm, <code class="docutils literal notranslate"><span class="pre">fit</span></code> can sometimes also accept additional
keyword arguments. However, any parameter that can have a value assigned prior to
having access to the data should be an <code class="docutils literal notranslate"><span class="pre">__init__</span></code> keyword argument. Ideally, <strong>fit
parameters should be restricted to directly data dependent variables</strong>. For instance, a
Gram matrix or an affinity matrix that is precomputed from the data matrix <code class="docutils literal notranslate"><span class="pre">X</span></code> is
data dependent. A tolerance stopping criterion <code class="docutils literal notranslate"><span class="pre">tol</span></code> is not directly data dependent
(although the optimal value according to some scoring function probably is).</p>
<p>When <code class="docutils literal notranslate"><span class="pre">fit</span></code> is called, any previous call to <code class="docutils literal notranslate"><span class="pre">fit</span></code> should be ignored. In
general, calling <code class="docutils literal notranslate"><span class="pre">estimator.fit(X1)</span></code> and then <code class="docutils literal notranslate"><span class="pre">estimator.fit(X2)</span></code> should
be the same as only calling <code class="docutils literal notranslate"><span class="pre">estimator.fit(X2)</span></code>. However, this may not be
true in practice when <code class="docutils literal notranslate"><span class="pre">fit</span></code> depends on some random process, see
<a class="reference internal" href="../glossary.html#term-random_state"><span class="xref std std-term">random_state</span></a>. Another exception to this rule is when the
hyper-parameter <code class="docutils literal notranslate"><span class="pre">warm_start</span></code> is set to <code class="docutils literal notranslate"><span class="pre">True</span></code> for estimators that
support it. <code class="docutils literal notranslate"><span class="pre">warm_start=True</span></code> means that the previous state of the
trainable parameters of the estimator is reused instead of using the
default initialization strategy.</p>
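<p>The refit behavior and the <code class="docutils literal notranslate"><span class="pre">warm_start</span></code> exception can be illustrated with a toy
iterative estimator. <code class="docutils literal notranslate"><span class="pre">ToyIterativeMean</span></code>, <code class="docutils literal notranslate"><span class="pre">n_iter</span></code> and <code class="docutils literal notranslate"><span class="pre">learning_rate</span></code> are
hypothetical names, and <code class="docutils literal notranslate"><span class="pre">X</span></code> is a flat list of numbers to keep the sketch short.</p>

```python
class ToyIterativeMean:
    """Hypothetical estimator that approaches the mean by gradient steps."""

    def __init__(self, n_iter=5, learning_rate=0.5, warm_start=False):
        self.n_iter = n_iter
        self.learning_rate = learning_rate
        self.warm_start = warm_start

    def fit(self, X, y=None):
        if self.warm_start and hasattr(self, "mean_"):
            estimate = self.mean_  # reuse the previous state
        else:
            estimate = 0.0  # default initialization
        target = sum(X) / len(X)
        for _ in range(self.n_iter):
            estimate += self.learning_rate * (target - estimate)
        self.mean_ = estimate
        return self


X1, X2 = [10.0, 10.0], [2.0, 4.0]
fresh = ToyIterativeMean().fit(X2)
refit = ToyIterativeMean().fit(X1).fit(X2)
warm = ToyIterativeMean(warm_start=True).fit(X1).fit(X2)

print(refit.mean_ == fresh.mean_)  # the first fit left no trace
print(warm.mean_ == fresh.mean_)  # the first fit influenced the result
```

<p>Without <code class="docutils literal notranslate"><span class="pre">warm_start</span></code>, the second call starts from the default initialization, so the
earlier call to <code class="docutils literal notranslate"><span class="pre">fit</span></code> has no effect on the result.</p>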
</section>
<section id="estimated-attributes">
<h4>Estimated Attributes<a class="headerlink" href="#estimated-attributes" title="Link to this heading">#</a></h4>
<p>According to scikit-learn conventions, attributes that have been estimated or learned
from the data and that you want to expose to your users as public attributes must always
have a name ending with a trailing underscore. For example, the coefficients of some
regression estimator would be stored in a <code class="docutils literal notranslate"><span class="pre">coef_</span></code> attribute after <code class="docutils literal notranslate"><span class="pre">fit</span></code> has been
called. Similarly, attributes that you learn in the process and would like to store but
not expose to the user should have a leading underscore, e.g. <code class="docutils literal notranslate"><span class="pre">_intermediate_coefs</span></code>.
The first group (with a trailing underscore) needs to be documented under “Attributes”;
the second group (with a leading underscore) does not need to be documented.</p>
<p>The estimated attributes are expected to be overridden when you call <code class="docutils literal notranslate"><span class="pre">fit</span></code> a second
time.</p>
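<p>The naming convention can be sketched as follows. <code class="docutils literal notranslate"><span class="pre">ToyRegressor</span></code> and the helper
<code class="docutils literal notranslate"><span class="pre">fitted_attributes</span></code> are hypothetical. The helper only illustrates that the trailing
underscore makes learned state easy to tell apart from constructor parameters and private
intermediates.</p>

```python
class ToyRegressor:
    """Hypothetical one-feature regressor through the origin."""

    def __init__(self, scale=1.0):
        self.scale = scale  # constructor parameter: no underscore affix

    def fit(self, X, y):
        # Private intermediate result: leading underscore, undocumented.
        self._sum_xy = sum(x * t for x, t in zip(X, y))
        sum_xx = sum(x * x for x in X)
        # Public learned attribute: trailing underscore, documented.
        self.coef_ = self.scale * self._sum_xy / sum_xx
        return self


def fitted_attributes(estimator):
    """Hypothetical helper listing the public attributes learned in fit."""
    return sorted(
        name
        for name in vars(estimator)
        if name.endswith("_") and not name.startswith("_")
    )


reg = ToyRegressor()
before = fitted_attributes(reg)
reg.fit([1.0, 2.0], [2.0, 4.0])
after = fitted_attributes(reg)
print(before, after, reg.coef_)
```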
</section>
<section id="universal-attributes">
<h4>Universal attributes<a class="headerlink" href="#universal-attributes" title="Link to this heading">#</a></h4>
<p>Estimators that expect tabular input should set a <code class="docutils literal notranslate"><span class="pre">n_features_in_</span></code>
attribute at <code class="docutils literal notranslate"><span class="pre">fit</span></code> time to indicate the number of features that the estimator
expects for subsequent calls to <a class="reference internal" href="../glossary.html#term-predict"><span class="xref std std-term">predict</span></a> or <a class="reference internal" href="../glossary.html#term-transform"><span class="xref std std-term">transform</span></a>.
See <a class="reference external" href="https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep010/proposal.html">SLEP010</a>
for details.</p>
<p>Similarly, if estimators are given dataframes, such as pandas or polars dataframes, they should
set a <code class="docutils literal notranslate"><span class="pre">feature_names_in_</span></code> attribute to indicate the feature names of the input data,
detailed in <a class="reference external" href="https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep007/proposal.html">SLEP007</a>.
Using <a class="reference internal" href="../modules/generated/sklearn.utils.validation.validate_data.html#sklearn.utils.validation.validate_data" title="sklearn.utils.validation.validate_data"><code class="xref py py-func docutils literal notranslate"><span class="pre">validate_data</span></code></a> would automatically set these
attributes for you.</p>
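<p>To show what <code class="docutils literal notranslate"><span class="pre">n_features_in_</span></code> is for, here is a hand-rolled sketch with plain lists.
<code class="docutils literal notranslate"><span class="pre">ToyShifter</span></code> is hypothetical, and the manual check below is only a stand-in for the
validation that scikit-learn’s own helpers perform.</p>

```python
class ToyShifter:
    """Hypothetical transformer with a hand-rolled feature-count check."""

    def fit(self, X, y=None):
        # Remember how many features were seen during fit.
        self.n_features_in_ = len(X[0])
        self.min_ = [min(column) for column in zip(*X)]
        return self

    def transform(self, X):
        n_features = len(X[0])
        if n_features != self.n_features_in_:
            raise ValueError(
                f"X has {n_features} features, but this estimator was "
                f"fitted with {self.n_features_in_} features."
            )
        return [[v - m for v, m in zip(row, self.min_)] for row in X]


shifter = ToyShifter().fit([[1, 5], [3, 4]])
print(shifter.n_features_in_)
print(shifter.transform([[2, 6]]))
```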
</section>
</section>
</section>
<section id="rolling-your-own-estimator">
<span id="id1"></span><h2>Rolling your own estimator<a class="headerlink" href="#rolling-your-own-estimator" title="Link to this heading">#</a></h2>
<p>If you want to implement a new estimator that is scikit-learn compatible, there are
several internals of scikit-learn that you should be aware of in addition to
the scikit-learn API outlined above. You can check whether your estimator
adheres to the scikit-learn interface and standards by running
<a class="reference internal" href="../modules/generated/sklearn.utils.estimator_checks.check_estimator.html#sklearn.utils.estimator_checks.check_estimator" title="sklearn.utils.estimator_checks.check_estimator"><code class="xref py py-func docutils literal notranslate"><span class="pre">check_estimator</span></code></a> on an instance. The
<a class="reference internal" href="../modules/generated/sklearn.utils.estimator_checks.parametrize_with_checks.html#sklearn.utils.estimator_checks.parametrize_with_checks" title="sklearn.utils.estimator_checks.parametrize_with_checks"><code class="xref py py-func docutils literal notranslate"><span class="pre">parametrize_with_checks</span></code></a> pytest
decorator can also be used (see its docstring for details and possible
interactions with <code class="docutils literal notranslate"><span class="pre">pytest</span></code>):</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.utils.estimator_checks</span><span class="w"> </span><span class="kn">import</span> <span class="n">check_estimator</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.tree</span><span class="w"> </span><span class="kn">import</span> <span class="n">DecisionTreeClassifier</span>
<span class="gp">>>> </span><span class="n">check_estimator</span><span class="p">(</span><span class="n">DecisionTreeClassifier</span><span class="p">())</span> <span class="c1"># passes</span>
<span class="go">[...]</span>
</pre></div>
</div>
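<p>In a test module, the decorator form might look like the following sketch. The test
function name is arbitrary, the two estimators are only examples, and the snippet assumes
that <code class="docutils literal notranslate"><span class="pre">pytest</span></code> is installed.</p>

```python
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.estimator_checks import parametrize_with_checks


# One test case is generated per (estimator, check) pair when this
# module is collected by pytest.
@parametrize_with_checks([LogisticRegression(), DecisionTreeClassifier()])
def test_sklearn_compatible_estimator(estimator, check):
    check(estimator)
```

<p>Compared with a single <code class="docutils literal notranslate"><span class="pre">check_estimator</span></code> call, this form reports each check as a
separate test, which can make failures easier to locate.</p>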
<p>The main motivation to make a class compatible with the scikit-learn estimator
interface might be that you want to use it together with model evaluation and
selection tools such as <a class="reference internal" href="../modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV" title="sklearn.model_selection.GridSearchCV"><code class="xref py py-class docutils literal notranslate"><span class="pre">GridSearchCV</span></code></a> and
<a class="reference internal" href="../modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline" title="sklearn.pipeline.Pipeline"><code class="xref py py-class docutils literal notranslate"><span class="pre">Pipeline</span></code></a>.</p>
<p>Before detailing the required interface below, we describe two ways to achieve
the correct interface more easily.</p>
<aside class="topic">
<p class="topic-title">Project template:</p>
<p>We provide a <a class="reference external" href="https://github.com/scikit-learn-contrib/project-template/">project template</a> which helps in the
creation of Python packages containing scikit-learn compatible estimators. It
provides:</p>
<ul class="simple">
<li><p>an initial git repository with Python package directory structure</p></li>
<li><p>a template of a scikit-learn estimator</p></li>
<li><p>an initial test suite including use of <code class="xref py py-func docutils literal notranslate"><span class="pre">parametrize_with_checks</span></code></p></li>
<li><p>directory structures and scripts to compile documentation and example
galleries</p></li>
<li><p>scripts to manage continuous integration (testing on Linux, macOS, and Windows)</p></li>
<li><p>instructions from getting started to publishing on <a class="reference external" href="https://pypi.org/">PyPI</a></p></li>
</ul>
</aside>
<aside class="topic">
<p class="topic-title"><a class="reference internal" href="../modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator" title="sklearn.base.BaseEstimator"><code class="xref py py-class docutils literal notranslate"><span class="pre">base.BaseEstimator</span></code></a> and mixins:</p>
<p>We tend to use “duck typing” instead of checking for <a class="reference external" href="https://docs.python.org/3/library/functions.html#isinstance" title="(in Python v3.13)"><code class="xref py py-func docutils literal notranslate"><span class="pre">isinstance</span></code></a>, which means
it’s technically possible to implement an estimator without inheriting from
scikit-learn classes. However, if you don’t inherit from the right mixins, either
there will be a large amount of boilerplate code for you to implement and keep in
sync with scikit-learn development, or your estimator might not function the same
way as a scikit-learn estimator. Here we only document how to develop an estimator
using our mixins. If you’re interested in implementing your estimator without
inheriting from scikit-learn mixins, you’d need to check our implementations.</p>
<p>For example, below is a custom classifier, with more examples included in the
scikit-learn-contrib <a class="reference external" href="https://github.com/scikit-learn-contrib/project-template/blob/master/skltemplate/_template.py">project template</a>.</p>
<p>It is particularly important to notice that mixins should be “on the left” while
the <code class="docutils literal notranslate"><span class="pre">BaseEstimator</span></code> should be “on the right” in the inheritance list for proper
MRO.</p>
<div class="doctest highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">import</span><span class="w"> </span><span class="nn">numpy</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="nn">np</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.base</span><span class="w"> </span><span class="kn">import</span> <span class="n">BaseEstimator</span><span class="p">,</span> <span class="n">ClassifierMixin</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.utils.validation</span><span class="w"> </span><span class="kn">import</span> <span class="n">validate_data</span><span class="p">,</span> <span class="n">check_is_fitted</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.utils.multiclass</span><span class="w"> </span><span class="kn">import</span> <span class="n">unique_labels</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.metrics</span><span class="w"> </span><span class="kn">import</span> <span class="n">euclidean_distances</span>
<span class="gp">>>> </span><span class="k">class</span><span class="w"> </span><span class="nc">TemplateClassifier</span><span class="p">(</span><span class="n">ClassifierMixin</span><span class="p">,</span> <span class="n">BaseEstimator</span><span class="p">):</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">demo_param</span><span class="o">=</span><span class="s1">'demo'</span><span class="p">):</span>
<span class="gp">... </span> <span class="bp">self</span><span class="o">.</span><span class="n">demo_param</span> <span class="o">=</span> <span class="n">demo_param</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="k">def</span><span class="w"> </span><span class="nf">fit</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="c1"># Check that X and y have correct shape, set n_features_in_, etc.</span>
<span class="gp">... </span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">validate_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">)</span>
<span class="gp">... </span> <span class="c1"># Store the classes seen during fit</span>
<span class="gp">... </span> <span class="bp">self</span><span class="o">.</span><span class="n">classes_</span> <span class="o">=</span> <span class="n">unique_labels</span><span class="p">(</span><span class="n">y</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="bp">self</span><span class="o">.</span><span class="n">X_</span> <span class="o">=</span> <span class="n">X</span>
<span class="gp">... </span> <span class="bp">self</span><span class="o">.</span><span class="n">y_</span> <span class="o">=</span> <span class="n">y</span>
<span class="gp">... </span> <span class="c1"># Return the classifier</span>
<span class="gp">... </span> <span class="k">return</span> <span class="bp">self</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="k">def</span><span class="w"> </span><span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="c1"># Check if fit has been called</span>
<span class="gp">... </span> <span class="n">check_is_fitted</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="c1"># Input validation</span>
<span class="gp">... </span> <span class="n">X</span> <span class="o">=</span> <span class="n">validate_data</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">reset</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
<span class="gp">...</span>
<span class="gp">... </span> <span class="n">closest</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">argmin</span><span class="p">(</span><span class="n">euclidean_distances</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">X_</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="gp">... </span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">y_</span><span class="p">[</span><span class="n">closest</span><span class="p">]</span>
</pre></div>
</div>
</aside>
<p>And you can check that the above estimator passes all common checks:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.utils.estimator_checks</span><span class="w"> </span><span class="kn">import</span> <span class="n">check_estimator</span>
<span class="gp">>>> </span><span class="n">check_estimator</span><span class="p">(</span><span class="n">TemplateClassifier</span><span class="p">())</span> <span class="c1"># passes </span>
</pre></div>
</div>
<section id="get-params-and-set-params">
<h3>get_params and set_params<a class="headerlink" href="#get-params-and-set-params" title="Link to this heading">#</a></h3>
<p>All scikit-learn estimators have <code class="docutils literal notranslate"><span class="pre">get_params</span></code> and <code class="docutils literal notranslate"><span class="pre">set_params</span></code> functions.</p>
<p>The <code class="docutils literal notranslate"><span class="pre">get_params</span></code> function takes no positional arguments and returns a dict of the
<code class="docutils literal notranslate"><span class="pre">__init__</span></code> parameters of the estimator, together with their values.</p>
<p>It takes one keyword argument, <code class="docutils literal notranslate"><span class="pre">deep</span></code>, which receives a boolean value that determines
whether the method should return the parameters of sub-estimators (only relevant for
meta-estimators). The default value for <code class="docutils literal notranslate"><span class="pre">deep</span></code> is <code class="docutils literal notranslate"><span class="pre">True</span></code>. For instance considering
the following estimator:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.base</span><span class="w"> </span><span class="kn">import</span> <span class="n">BaseEstimator</span>
<span class="gp">>>> </span><span class="kn">from</span><span class="w"> </span><span class="nn">sklearn.linear_model</span><span class="w"> </span><span class="kn">import</span> <span class="n">LogisticRegression</span>
<span class="gp">>>> </span><span class="k">class</span><span class="w"> </span><span class="nc">MyEstimator</span><span class="p">(</span><span class="n">BaseEstimator</span><span class="p">):</span>
<span class="gp">... </span> <span class="k">def</span><span class="w"> </span><span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">subestimator</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">my_extra_param</span><span class="o">=</span><span class="s2">"random"</span><span class="p">):</span>
<span class="gp">... </span> <span class="bp">self</span><span class="o">.</span><span class="n">subestimator</span> <span class="o">=</span> <span class="n">subestimator</span>
<span class="gp">... </span> <span class="bp">self</span><span class="o">.</span><span class="n">my_extra_param</span> <span class="o">=</span> <span class="n">my_extra_param</span>
</pre></div>
</div>
<p>The parameter <code class="docutils literal notranslate"><span class="pre">deep</span></code> controls whether or not the parameters of the
<code class="docutils literal notranslate"><span class="pre">subestimator</span></code> should be reported. Thus when <code class="docutils literal notranslate"><span class="pre">deep=True</span></code>, the output will be:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="n">my_estimator</span> <span class="o">=</span> <span class="n">MyEstimator</span><span class="p">(</span><span class="n">subestimator</span><span class="o">=</span><span class="n">LogisticRegression</span><span class="p">())</span>
<span class="gp">>>> </span><span class="k">for</span> <span class="n">param</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">my_estimator</span><span class="o">.</span><span class="n">get_params</span><span class="p">(</span><span class="n">deep</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">param</span><span class="si">}</span><span class="s2"> -> </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="go">my_extra_param -> random</span>
<span class="go">subestimator__C -> 1.0</span>
<span class="go">subestimator__class_weight -> None</span>
<span class="go">subestimator__dual -> False</span>
<span class="go">subestimator__fit_intercept -> True</span>
<span class="go">subestimator__intercept_scaling -> 1</span>
<span class="go">subestimator__l1_ratio -> None</span>
<span class="go">subestimator__max_iter -> 100</span>
<span class="go">subestimator__multi_class -> deprecated</span>
<span class="go">subestimator__n_jobs -> None</span>
<span class="go">subestimator__penalty -> l2</span>
<span class="go">subestimator__random_state -> None</span>
<span class="go">subestimator__solver -> lbfgs</span>
<span class="go">subestimator__tol -> 0.0001</span>
<span class="go">subestimator__verbose -> 0</span>
<span class="go">subestimator__warm_start -> False</span>
<span class="go">subestimator -> LogisticRegression()</span>
</pre></div>
</div>
<p>If the meta-estimator takes multiple sub-estimators, those sub-estimators often have
names (e.g. the named steps in a <a class="reference internal" href="../modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline" title="sklearn.pipeline.Pipeline"><code class="xref py py-class docutils literal notranslate"><span class="pre">Pipeline</span></code></a> object), in which case the
keys should become <code class="docutils literal notranslate"><span class="pre">&lt;name&gt;__C</span></code>, <code class="docutils literal notranslate"><span class="pre">&lt;name&gt;__class_weight</span></code>, etc.</p>
<p>When <code class="docutils literal notranslate"><span class="pre">deep=False</span></code>, the output will be:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="gp">>>> </span><span class="k">for</span> <span class="n">param</span><span class="p">,</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">my_estimator</span><span class="o">.</span><span class="n">get_params</span><span class="p">(</span><span class="n">deep</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
<span class="gp">... </span> <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">param</span><span class="si">}</span><span class="s2"> -> </span><span class="si">{</span><span class="n">value</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="go">my_extra_param -> random</span>
<span class="go">subestimator -> LogisticRegression()</span>
</pre></div>
</div>
<p>On the other hand, <code class="docutils literal notranslate"><span class="pre">set_params</span></code> takes the parameters of <code class="docutils literal notranslate"><span class="pre">__init__</span></code> as keyword
arguments, unpacks them into a dict of the form <code class="docutils literal notranslate"><span class="pre">'parameter':</span> <span class="pre">value</span></code> and sets the
parameters of the estimator using this dict. It returns the estimator itself.</p>
<p>The <a class="reference internal" href="../modules/generated/sklearn.base.BaseEstimator.html#sklearn.base.BaseEstimator.set_params" title="sklearn.base.BaseEstimator.set_params"><code class="xref py py-func docutils literal notranslate"><span class="pre">set_params</span></code></a> function is used, for instance, to set parameters during
grid search.</p>
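<p>The behavior described above can be sketched as follows, again with a small pipeline whose step names (<code class="docutils literal notranslate"><span class="pre">"scaler"</span></code>, <code class="docutils literal notranslate"><span class="pre">"clf"</span></code>) are hypothetical. The <code class="docutils literal notranslate"><span class="pre">candidate</span></code> dict only stands in for one parameter combination that a search procedure might try; it is an assumption made for illustration, not a description of how grid search is implemented.</p>

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# "scaler" and "clf" are hypothetical step names picked for this sketch.
pipe = Pipeline([("scaler", StandardScaler()), ("clf", LogisticRegression())])

# Keyword arguments use the same <name>__<parameter> keys that
# get_params(deep=True) reports for nested estimators.
returned = pipe.set_params(clf__C=10.0, scaler__with_mean=False)

# set_params returns the estimator itself, so calls can be chained.
print(returned is pipe)
print(pipe.get_params()["clf__C"])

# A dict of candidate values (illustrative only) can be unpacked the same way.
candidate = {"clf__C": 0.1}
pipe.set_params(**candidate)
print(pipe.named_steps["clf"].C)
```

<p>Because the keys accepted by <code class="docutils literal notranslate"><span class="pre">set_params</span></code> mirror those returned by <code class="docutils literal notranslate"><span class="pre">get_params</span></code>, a parameter dict read from one estimator can be applied to another of the same structure.</p>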
</section>
<section id="cloning">
<span id="id2"></span><h3>Cloning<a class="headerlink" href="#cloning" title="Link to this heading">#</a></h3>
<p>As already mentioned, when constructor arguments are mutable, they should be