Happy Learning Starts with Translation: How to Convert a Time Series to a Supervised Learning Problem in Python

One-Step Univariate Forecasting

It is standard practice in time series forecasting to use lagged observations (e.g. t-1) as input variables to forecast the current time step (t).

This is called one-step forecasting.
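To make the idea concrete, here is a minimal sketch of a one-step framing built directly with pandas `shift()`. The toy values and the column names `t` and `t-1` are assumptions chosen for illustration; they are not part of the tutorial's own function.

```python
from pandas import DataFrame

# A toy univariate series; the values are illustrative only.
series = [10, 20, 30, 40, 50]

df = DataFrame({'t': series})

# shift(1) moves every value down one row, so each row holds the
# previous observation (t-1) next to the current one (t).
df['t-1'] = df['t'].shift(1)

# Put the input column first and the target column last.
framed = df[['t-1', 't']]
print(framed)
```

The first row has no previous observation, so its `t-1` entry is NaN and cannot be used as a training example.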

The example below uses a single lagged time step (t-1) to predict the current time step (t).

 
    from pandas import DataFrame
    from pandas import concat

    def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
        """
        Frame a time series as a supervised learning dataset.
        Arguments:
            data: Sequence of observations as a list or NumPy array.
            n_in: Number of lag observations as input (X).
            n_out: Number of observations as output (y).
            dropnan: Boolean whether or not to drop rows with NaN values.
        Returns:
            Pandas DataFrame of series framed for supervised learning.
        """
        n_vars = 1 if type(data) is list else data.shape[1]
        df = DataFrame(data)
        cols, names = list(), list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
            names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
            if i == 0:
                names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
            else:
                names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
        # put it all together
        agg = concat(cols, axis=1)
        agg.columns = names
        # drop rows with NaN values
        if dropnan:
            agg.dropna(inplace=True)
        return agg

    values = [x for x in range(10)]
    data = series_to_supervised(values)
    print(data)

Running the example prints the reframed time series.

 
       var1(t-1)  var1(t)
    1        0.0        1
    2        1.0        2
    3        2.0        3
    4        3.0        4
    5        4.0        5
    6        5.0        6
    7        6.0        7
    8        7.0        8
    9        8.0        9

We can see that the observations are named "var1", the input observation is labeled (t-1), and the output time step is labeled (t).

We can also see that rows with NaN values have been automatically removed from the DataFrame.
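It may help to see where the NaN row comes from before it is dropped. The sketch below is a simplified stand-in for the body of series_to_supervised() with one lag, assuming a shorter series of five values; the variable name `cleaned` is introduced here for illustration only.

```python
from pandas import DataFrame, concat

df = DataFrame([x for x in range(5)])

# Lagged copy (t-1) beside the original (t), as in the one-lag example.
agg = concat([df.shift(1), df], axis=1)
agg.columns = ['var1(t-1)', 'var1(t)']

# The first row has no earlier observation, so shift() fills it with NaN.
# NaN is a float, which is why the lagged column is displayed as 0.0, 1.0, ...
print(agg)

# dropna() discards that incomplete row; the original index labels are kept,
# so the remaining rows start at index 1 rather than 0.
cleaned = agg.dropna()
print(cleaned)
```

This also explains two details of the printed output: the index begins at 1, and the var1(t-1) column shows floats while var1(t) shows integers.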

We can repeat this example with an input sequence of arbitrary length, such as 3, by passing the length of the input sequence as an argument. For example:

 
    data = series_to_supervised(values, 3)

The complete example is listed below.

 
    from pandas import DataFrame
    from pandas import concat

    def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
        """
        Frame a time series as a supervised learning dataset.
        Arguments:
            data: Sequence of observations as a list or NumPy array.
            n_in: Number of lag observations as input (X).
            n_out: Number of observations as output (y).
            dropnan: Boolean whether or not to drop rows with NaN values.
        Returns:
            Pandas DataFrame of series framed for supervised learning.
        """
        n_vars = 1 if type(data) is list else data.shape[1]
        df = DataFrame(data)
        cols, names = list(), list()
        # input sequence (t-n, ... t-1)
        for i in range(n_in, 0, -1):
            cols.append(df.shift(i))
            names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
        # forecast sequence (t, t+1, ... t+n)
        for i in range(0, n_out):
            cols.append(df.shift(-i))
            if i == 0:
                names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
            else:
                names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
        # put it all together
        agg = concat(cols, axis=1)
        agg.columns = names
        # drop rows with NaN values
        if dropnan:
            agg.dropna(inplace=True)
        return agg

    values = [x for x in range(10)]
    data = series_to_supervised(values, 3)
    print(data)

Again, running the example prints the reframed series. The input sequence is in the correct left-to-right order, with the output variable to be predicted on the far right.

 
       var1(t-3)  var1(t-2)  var1(t-1)  var1(t)
    3        0.0        1.0        2.0        3
    4        1.0        2.0        3.0        4
    5        2.0        3.0        4.0        5
    6        3.0        4.0        5.0        6
    7        4.0        5.0        6.0        7
    8        5.0        6.0        7.0        8
    9        6.0        7.0        8.0        9
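Because the inputs sit on the left and the target on the far right, the reframed table can be split into inputs (X) and output (y) by column position. The following is a rough sketch under that assumption; it rebuilds the 3-lag frame inline with `shift()` so that it runs on its own, and the names `framed`, `X`, and `y` are illustrative rather than taken from the tutorial.

```python
from pandas import DataFrame, concat

values = [x for x in range(10)]
df = DataFrame(values)

# Rebuild the 3-lag frame directly with shift(); for a univariate list this
# mirrors the table printed by series_to_supervised(values, 3).
cols = [df.shift(i) for i in range(3, 0, -1)] + [df]
framed = concat(cols, axis=1).dropna()
framed.columns = ['var1(t-3)', 'var1(t-2)', 'var1(t-1)', 'var1(t)']

# Inputs are every column except the last; the target is the last column.
array = framed.values
X, y = array[:, :-1], array[:, -1]

# 10 observations minus 3 lags leaves 7 usable rows.
print(X.shape, y.shape)
```

Each lag added to the input sequence costs one row at the start of the series, so longer input sequences leave fewer training examples.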