<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> <title>Underfitting vs. Overfitting — scikit-learn 0.16.1 documentation</title> <!-- htmltitle is before nature.css - we use this hack to load bootstrap first --> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <link rel="stylesheet" href="../_static/css/bootstrap.min.css" media="screen" /> <link rel="stylesheet" href="../_static/css/bootstrap-responsive.css"/> <link rel="stylesheet" href="../_static/nature.css" type="text/css" /> <link rel="stylesheet" href="../_static/pygments.css" type="text/css" /> <link rel="stylesheet" href="../_static/gallery.css" type="text/css" /> <script type="text/javascript"> var DOCUMENTATION_OPTIONS = { URL_ROOT: '../', VERSION: '0.16.1', COLLAPSE_INDEX: false, FILE_SUFFIX: '.html', HAS_SOURCE: true }; </script> <script type="text/javascript" src="../_static/jquery.js"></script> <script type="text/javascript" src="../_static/underscore.js"></script> <script type="text/javascript" src="../_static/doctools.js"></script> <script type="text/javascript" src="../_static/js/copybutton.js"></script> <link rel="shortcut icon" href="../_static/favicon.ico"/> <link rel="author" title="About these documents" href="../about.html" /> <link rel="top" title="scikit-learn 0.16.1 documentation" href="../index.html" /> <script type="text/javascript" src="../_static/sidebar.js"></script> <meta name="viewport" content="width=device-width, initial-scale=1.0" /> <script src="../_static/js/bootstrap.min.js" type="text/javascript"></script> <link rel="canonical" href="https://scikit-learn.org/stable/auto_examples/plot_underfitting_overfitting.html" /> <script type="text/javascript"> $("div.buttonNext, 
div.buttonPrevious").hover( function () { $(this).css('background-color', '#FF9C34'); }, function () { $(this).css('background-color', '#A7D6E2'); } ); var bodywrapper = $('.bodywrapper'); var sidebarbutton = $('#sidebarbutton'); sidebarbutton.css({'height': '900px'}); </script> </head> <body> <div class="header-wrapper"> <div class="header"> <p class="logo"><a href="../index.html"> <img src="../_static/scikit-learn-logo-small.png" alt="Logo"/> </a> </p><div class="navbar"> <ul> <li><a href="../../stable/index.html">Home</a></li> <li><a href="../../stable/install.html">Installation</a></li> <li class="btn-li"><div class="btn-group"> <a href="../documentation.html">Documentation</a> <a class="btn dropdown-toggle" data-toggle="dropdown"> <span class="caret"></span> </a> <ul class="dropdown-menu"> <li class="link-title">Scikit-learn 0.16 (Stable)</li> <li><a href="../tutorial/index.html">Tutorials</a></li> <li><a href="../user_guide.html">User guide</a></li> <li><a href="../modules/classes.html">API</a></li> <li><a href="../faq.html">FAQ</a></li> <li class="divider"></li> <li><a href="http://scikit-learn.org/dev/documentation.html">Development</a></li> <li><a href="http://scikit-learn.org/0.15/">Scikit-learn 0.15</a></li> </ul> </div> </li> <li><a href="index.html">Examples</a></li> </ul> <div class="search_form"> <div id="cse" style="width: 100%;"></div> </div> </div> <!-- end navbar --></div> </div> <!-- Github "fork me" ribbon --> <a href="https://github.com/scikit-learn/scikit-learn"> <img class="fork-me" style="position: absolute; top: 0; right: 0; border: 0;" src="../_static/img/forkme.png" alt="Fork me on GitHub" /> </a> <div class="content-wrapper"> <div class="sphinxsidebar"> <div class="sphinxsidebarwrapper"> <p class="doc-version">This documentation is for scikit-learn <strong>version 0.16.1</strong> — <a 
href="http://scikit-learn.org/stable/support.html#documentation-resources">Other versions</a></p> <p class="citing">If you use the software, please consider <a href="../about.html#citing-scikit-learn">citing scikit-learn</a>.</p> <ul> <li><a class="reference internal" href="#">Underfitting vs. Overfitting</a></li> </ul> </div> </div> <div class="content"> <div class="documentwrapper"> <div class="bodywrapper"> <div class="body"> <div class="section" id="underfitting-vs-overfitting"> <span id="example-plot-underfitting-overfitting-py"></span><h1>Underfitting vs. Overfitting<a class="headerlink" href="#underfitting-vs-overfitting" title="Permalink to this headline">¶</a></h1> <p>This example demonstrates the problems of underfitting and overfitting, and shows how linear regression with polynomial features can be used to approximate nonlinear functions. The plot shows the function that we want to approximate, which is a part of the cosine function. It also displays the noisy samples drawn from the true function and the approximations produced by different models. The models have polynomial features of different degrees. We can see that a linear function (a polynomial of degree 1) is not sufficient to fit the training samples. This is called <strong>underfitting</strong>. A polynomial of degree 4 approximates the true function almost perfectly. However, for higher degrees (here, degree 15) the model will <strong>overfit</strong> the training data, i.e.
it learns the noise of the training data.</p> <img alt="../_images/plot_underfitting_overfitting_0011.png" class="align-center" src="../_images/plot_underfitting_overfitting_0011.png" /> <p><strong>Python source code:</strong> <a class="reference download internal" href="../_downloads/plot_underfitting_overfitting1.py"><tt class="xref download docutils literal"><span class="pre">plot_underfitting_overfitting.py</span></tt></a></p> <div class="highlight-python"><div class="highlight"><pre><span class="k">print</span><span class="p">(</span><span class="n">__doc__</span><span class="p">)</span> <span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span> <span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span> <span class="kn">from</span> <span class="nn">sklearn.pipeline</span> <span class="kn">import</span> <a href="../modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline"><span class="n">Pipeline</span></a> <span class="kn">from</span> <span class="nn">sklearn.preprocessing</span> <span class="kn">import</span> <a href="../modules/generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures"><span class="n">PolynomialFeatures</span></a> <span class="kn">from</span> <span class="nn">sklearn.linear_model</span> <span class="kn">import</span> <a href="../modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression"><span class="n">LinearRegression</span></a> <a href="http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.random.seed.html#numpy.random.seed"><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span></a><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="n">n_samples</span> <span 
class="o">=</span> <span class="mi">30</span> <span class="n">degrees</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">15</span><span class="p">]</span> <span class="n">true_fun</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">X</span><span class="p">:</span> <a href="http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.cos.html#numpy.cos"><span class="n">np</span><span class="o">.</span><span class="n">cos</span></a><span class="p">(</span><span class="mf">1.5</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">pi</span> <span class="o">*</span> <span class="n">X</span><span class="p">)</span> <span class="n">X</span> <span class="o">=</span> <a href="http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.sort.html#numpy.sort"><span class="n">np</span><span class="o">.</span><span class="n">sort</span></a><span class="p">(</span><a href="http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.random.rand.html#numpy.random.rand"><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">rand</span></a><span class="p">(</span><span class="n">n_samples</span><span class="p">))</span> <span class="n">y</span> <span class="o">=</span> <span class="n">true_fun</span><span class="p">(</span><span class="n">X</span><span class="p">)</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">n_samples</span><span class="p">)</span> <span class="o">*</span> <span class="mf">0.1</span> <a 
href="http://matplotlib.org/api/figure_api.html#matplotlib.figure"><span class="n">plt</span><span class="o">.</span><span class="n">figure</span></a><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">14</span><span class="p">,</span> <span class="mi">4</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">degrees</span><span class="p">)):</span> <span class="n">ax</span> <span class="o">=</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.subplot"><span class="n">plt</span><span class="o">.</span><span class="n">subplot</span></a><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">degrees</span><span class="p">),</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.setp"><span class="n">plt</span><span class="o">.</span><span class="n">setp</span></a><span class="p">(</span><span class="n">ax</span><span class="p">,</span> <span class="n">xticks</span><span class="o">=</span><span class="p">(),</span> <span class="n">yticks</span><span class="o">=</span><span class="p">())</span> <span class="n">polynomial_features</span> <span class="o">=</span> <a href="../modules/generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures"><span class="n">PolynomialFeatures</span></a><span class="p">(</span><span class="n">degree</span><span class="o">=</span><span class="n">degrees</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span 
class="n">include_bias</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span> <span class="n">linear_regression</span> <span class="o">=</span> <a href="../modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression"><span class="n">LinearRegression</span></a><span class="p">()</span> <span class="n">pipeline</span> <span class="o">=</span> <a href="../modules/generated/sklearn.pipeline.Pipeline.html#sklearn.pipeline.Pipeline"><span class="n">Pipeline</span></a><span class="p">([(</span><span class="s">"polynomial_features"</span><span class="p">,</span> <span class="n">polynomial_features</span><span class="p">),</span> <span class="p">(</span><span class="s">"linear_regression"</span><span class="p">,</span> <span class="n">linear_regression</span><span class="p">)])</span> <span class="n">pipeline</span><span class="o">.</span><span class="n">fit</span><span class="p">(</span><span class="n">X</span><span class="p">[:,</span> <a href="http://docs.scipy.org/doc/numpy-1.6.0/reference/arrays.indexing.html#numpy.newaxis"><span class="n">np</span><span class="o">.</span><span class="n">newaxis</span></a><span class="p">],</span> <span class="n">y</span><span class="p">)</span> <span class="n">X_test</span> <span class="o">=</span> <a href="http://docs.scipy.org/doc/numpy-1.6.0/reference/generated/numpy.linspace.html#numpy.linspace"><span class="n">np</span><span class="o">.</span><span class="n">linspace</span></a><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">100</span><span class="p">)</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot"><span class="n">plt</span><span class="o">.</span><span class="n">plot</span></a><span class="p">(</span><span class="n">X_test</span><span 
class="p">,</span> <span class="n">pipeline</span><span class="o">.</span><span class="n">predict</span><span class="p">(</span><span class="n">X_test</span><span class="p">[:,</span> <a href="http://docs.scipy.org/doc/numpy-1.6.0/reference/arrays.indexing.html#numpy.newaxis"><span class="n">np</span><span class="o">.</span><span class="n">newaxis</span></a><span class="p">]),</span> <span class="n">label</span><span class="o">=</span><span class="s">"Model"</span><span class="p">)</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot"><span class="n">plt</span><span class="o">.</span><span class="n">plot</span></a><span class="p">(</span><span class="n">X_test</span><span class="p">,</span> <span class="n">true_fun</span><span class="p">(</span><span class="n">X_test</span><span class="p">),</span> <span class="n">label</span><span class="o">=</span><span class="s">"True function"</span><span class="p">)</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter"><span class="n">plt</span><span class="o">.</span><span class="n">scatter</span></a><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="s">"Samples"</span><span class="p">)</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xlabel"><span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span></a><span class="p">(</span><span class="s">"x"</span><span class="p">)</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.ylabel"><span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span></a><span class="p">(</span><span class="s">"y"</span><span class="p">)</span> <a 
href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.xlim"><span class="n">plt</span><span class="o">.</span><span class="n">xlim</span></a><span class="p">((</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.ylim"><span class="n">plt</span><span class="o">.</span><span class="n">ylim</span></a><span class="p">((</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <a href="http://matplotlib.org/api/legend_api.html#matplotlib.legend"><span class="n">plt</span><span class="o">.</span><span class="n">legend</span></a><span class="p">(</span><span class="n">loc</span><span class="o">=</span><span class="s">"best"</span><span class="p">)</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.title"><span class="n">plt</span><span class="o">.</span><span class="n">title</span></a><span class="p">(</span><span class="s">"Degree </span><span class="si">%d</span><span class="s">"</span> <span class="o">%</span> <span class="n">degrees</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <a href="http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.show"><span class="n">plt</span><span class="o">.</span><span class="n">show</span></a><span class="p">()</span> </pre></div> </div> <p><strong>Total running time of the example:</strong> 0.20 seconds ( 0 minutes 0.20 seconds)</p> </div> </div> </div> </div> <div class="clearer"></div> </div> </div> <div class="footer"> © 2010 - 2014, scikit-learn developers (BSD License). 
<a href="../_sources/auto_examples/plot_underfitting_overfitting.txt" rel="nofollow">Show this page source</a> </div> <div class="rel rellarge"> </div> <script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-22606712-2']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script> <script src="http://www.google.com/jsapi" type="text/javascript"></script> <script type="text/javascript"> google.load('search', '1', {language : 'en'}); google.setOnLoadCallback(function() { var customSearchControl = new google.search.CustomSearchControl('016639176250731907682:tjtqbvtvij0'); customSearchControl.setResultSetSize(google.search.Search.FILTERED_CSE_RESULTSET); var options = new google.search.DrawOptions(); options.setAutoComplete(true); customSearchControl.draw('cse', options); }, true); </script> <script src="https://scikit-learn.org/versionwarning.js"></script> </body> </html>