Comprehensive Time Series Forecasting Survey
Abstract—Time series forecasting (TSF) has become an increasingly vital tool in various decision-making applications, including
business intelligence and scientific discovery, in today’s rapidly evolving digital landscape. Over the years, a wide range of methods for
TSF has been proposed, spanning from traditional statistics-based models to more recent machine learning-driven, data-intensive
approaches. Despite the extensive body of research, there is still no universally accepted, unified problem statement or systematic
elaboration of the core challenges and characteristics of TSF. The extent to which deep TSF models can address fundamental
issues—such as data sparsity and non-stationarity—remains unclear, and the broader TSF research landscape continues to evolve,
shaped by diverse methodological trends. This comprehensive survey aims to address these gaps by examining the key entities in TSF
(e.g., covariates) and their characteristics (e.g., frequency, length, missing values). We introduce a general problem formulation and
challenge analysis for TSF, propose a taxonomy that classifies representative methodologies from the preprocessing and forecasting
perspectives, and highlight emerging topics like transfer learning and trustworthy forecasting. Finally, we discuss promising research
directions that are poised to drive innovation in this dynamic and rapidly advancing field. The related paper list is available at
[Link]
Index Terms—Time Series, Time Series Forecasting, Statistical Models, Deep Learning
1 INTRODUCTION

presented by real-world time series data. We also highlight the importance of time series preprocessing, offering a detailed examination of techniques such as tokenization and graph transformations, which have gained attention in recent literature for their effectiveness in improving model performance. Additionally, we discuss the progress in time series forecasting, including statistical models, data-driven approaches, transfer learning methods, and trustworthy forecasting techniques. By synthesizing diverse TSF methods under a single unifying framework, our survey aims to clarify the rich methodological landscape while exposing unresolved challenges and open questions.

The remainder of this article is organized as follows. Section 2 formally defines the fundamental concepts of TSF and introduces the widely used evaluation protocols. Following this, Section 3 provides an analysis of key data characteristics and challenges. In Section 4, we describe several mainstream time series preprocessing strategies. We then present the classical statistical approaches in Section 5, before transitioning to data-driven methods in Section 6, which covers both machine-learning-based and deep-learning-based approaches. Besides, Section 7 focuses on recent advances in transfer learning models, and Section 8 discusses trustworthy TSF. Additionally, we introduce some prevalent benchmark datasets and representative applications of TSF in various domains in Section 9, and highlight open research directions in Section 10. Finally, we conclude the survey in Section 11.

2 FOUNDATION CONCEPT DESCRIPTION

2.1 Introduction to Time Series Data

A time series is commonly defined as a sequence of data points indexed in chronological order, where each observation is obtained at a specific (often uniformly spaced) time interval. Although the data are frequently recorded at discrete intervals (e.g., hourly, daily, monthly), many real-world phenomena underlying these observations can be treated as continuous and, in principle, unbounded in both time and value. As a result, time series modeling must address both the discrete nature of the measurements and the potential continuity of the process itself.

A typical time series can often be decomposed into several key components:

• Trend: A long-term progression of increasing or decreasing values in the series over extended periods.
• Seasonality: Regular, repeating patterns that recur at fixed intervals (e.g., daily, weekly, annually), driven by predictable factors such as calendar effects or environmental cycles.
• Cyclicity: Fluctuations that are not strictly tied to a fixed calendar period but exhibit recognizable rises and falls over time, often linked to economic or other systemic influences.
• Irregular or Random Variations: Unpredictable changes in the series that are not explained by the trend, seasonality, or cyclical behavior, often treated as residual noise.

Because time series data are recorded in chronological order, statistical dependencies exist among observations, making it crucial to consider temporal structures when designing forecasting or analysis techniques. These properties—including continuous, unbounded states and typically uniform sampling intervals—distinguish time series data from cross-sectional data, where observations lack inherent temporal ordering.

2.2 Basic Concepts of Time Series Forecasting

Time series forecasting aims to predict future values of a target series based on past observations and, potentially, additional explanatory variables. The core formulation typically involves two key components: the look-back window (i.e., a set of recent past observations) as input, and the predicted window (i.e., one or more future time steps) as output. Below, we outline several fundamental concepts:

• Look-back Window of the Target Series: This is a sequence of consecutive time steps (e.g., the previous L observations) from the target series itself. It acts as the primary source of historical context, capturing trends, seasonal patterns, and other temporal dependencies.
• Covariates (Exogenous Variables): Beyond the target series, many applications leverage auxiliary factors such as weather conditions, economic indicators, or demographic information. These additional inputs, known as covariates or exogenous variables, can offer valuable insights when the target series is influenced by external drivers.
• Predicted Window of the Target Series: In forecasting tasks, the output is typically a set of future time steps to be estimated. This window could be as short as a single time step (e.g., predicting the next hour) or span multiple future periods (e.g., predicting the next week or month).
• One-step vs. Multi-step Prediction: One-step Prediction aims to forecast a single future time step at a time (e.g., t + 1 given observations up to t). It is often simpler to implement, but may require repeated application to forecast multiple steps ahead. Multi-step Prediction directly generates forecasts for several consecutive future steps (e.g., t + 1 to t + H) in one shot. This approach can capture long-range dependencies but often entails greater complexity and potential error accumulation.
• Univariate vs. Multivariate Forecasting: Univariate Forecasting considers only a single target series without additional external information. Multivariate Forecasting involves multiple interrelated series or auxiliary inputs to exploit cross-dependencies and potentially improve accuracy.
• Iterative vs. Direct Strategies: Iterative Forecasting forecasts one step ahead repeatedly, feeding each predicted value back as input for the next step. Direct Forecasting trains separate models (or a single model with multi-output) to predict each future step or window of interest directly, avoiding error propagation at the cost of additional modeling complexity.

In summary, the essential building blocks of time series forecasting encompass which segments of past data (i.e., the look-back window) and what external factors (i.e.,
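The iterative and direct strategies contrasted in the bullets above can be made concrete with a toy sketch. The AR(1)-style least-squares fits below are hypothetical stand-ins for any one-step or per-horizon forecaster; none of the function names come from a specific library.

```python
# Toy comparison of iterative vs. direct multi-step forecasting.
# The AR(1)-style "models" here are illustrative stand-ins.

def fit_ar1(series):
    """Least-squares fit of y[t] ~ a * y[t-1] (a hypothetical one-step model)."""
    num = sum(series[t - 1] * series[t] for t in range(1, len(series)))
    den = sum(series[t - 1] ** 2 for t in range(1, len(series)))
    return num / den

def iterative_forecast(series, horizon):
    """Forecast one step at a time, feeding each prediction back as input."""
    a = fit_ar1(series)
    history = list(series)
    preds = []
    for _ in range(horizon):
        preds.append(a * history[-1])
        history.append(preds[-1])  # predicted value re-enters the input
    return preds

def direct_forecast(series, horizon):
    """Fit a separate one-shot model y[t+h] ~ a_h * y[t] for each lead time h."""
    preds = []
    for h in range(1, horizon + 1):
        num = sum(series[t] * series[t + h] for t in range(len(series) - h))
        den = sum(series[t] ** 2 for t in range(len(series) - h))
        preds.append((num / den) * series[-1])
    return preds

series = [1.0, 0.9, 0.81, 0.729, 0.6561]  # geometric toy series, ratio 0.9
print(iterative_forecast(series, 3))
print(direct_forecast(series, 3))
```

On this noiseless geometric series both strategies coincide; with noisy data, the iterative variant compounds its own errors while the direct variant trades that for one model per horizon.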
PREPRINT 3
TABLE 1: Overview of the metrics widely evaluated in point-based forecasting and probabilistic forecasting tasks.

Continuous Ranked Probability Score (CRPS) is used to evaluate the accuracy of probabilistic predictions, where F(z) represents the cumulative distribution function and y is the value to be predicted. The ρ-Quantile Loss, with ŷ as the predicted value for the quantile level ρ, and I_{ŷ>y} and I_{ŷ≤y} as indicator functions, computes the quantile-level prediction accuracy. In the context of Negative Log Likelihood (NLL), D_f is related to the data distribution or prediction model, and p_{D_f}(y) is the predictive probability density function, with NLL mainly aiming to assess the model's goodness of fit to the data, where a smaller value implies a better fit. Besides, the Variogram (VG) describes the spatial variability of the data, helping to understand its spatial structure and correlation. It involves observed values y_a and y_b potentially related to x_a and x_b, a given parameter ρ, and the data distribution D_f.

3 CHALLENGES ANALYSIS

Sequence signals collected in chronological order can be classified as time series data, a distinct data modality derived from various sensors or real-world observations. Time series data captures the dynamic evolution of systems over time, reflecting both short-term fluctuations and long-term trends. While rich in information, time series data exhibits several key characteristics that pose specific challenges for accurate forecasting. In this survey, we provide a comprehensive discussion of these characteristics and challenges as follows:

3.1 Noise, Anomalies, and Outliers

Time series data, which reflects a system through multiple sensors or evaluations, inevitably raises concerns related to data quality. This can result in noise, anomalies, and outliers within the collected series, ultimately undermining the performance of downstream forecasting tasks. Despite significant efforts focused on individual imputation and anomaly detection [23], [24], mainstream research still often assumes that forecasting datasets are well-filtered and of high quality. As a result, there remains a challenge in exploring effective methods to mitigate the impact of low-quality data on forecasting models.

3.2 Irregular Time Series

Although continuous formulations are widely adopted in the field of time series forecasting, many real-world datasets are irregularly sampled due to technical constraints or limitations inherent in the data collection process. As a result, observations are recorded at variable time intervals. The time gaps between these observations themselves contain crucial information about the underlying time series, thereby presenting significant challenges in handling the irregular sampling intervals and effectively capturing the evolving latent dynamics [25], [26].

3.3 Long-term Sequence Forecasting

Time series data essentially consists of a sequence of continuous numerical values and can thus be considered an intermediary modality between the extensively studied fields of image and language data, with further inherent characteristics. Modeling the temporal dependencies within time series data remains a prominent research focus in time series forecasting [27]. From a sequence perspective, time series data often spans longer time intervals, resulting in extended sequences. Additionally, from a numerical standpoint, time series data typically lacks well-defined upper and lower bounds, further complicating accurate forecasting. More specifically, many real-world applications are framed as long-term time series forecasting tasks [28], [29], requiring forecasts over extended horizons and windows, which presents substantial challenges in capturing long-term dependencies within the series. Furthermore, a critical
challenge lies in mitigating the impact of error accumulation during the numerical sequence modeling process [30].

3.4 Multivariate Dependence Modeling

Collected time series datasets typically exhibit multivariate characteristics, primarily due to the complex cross-channel dependencies embedded within the data. On the one hand, causal or leading effects may exist in certain scenarios, including traffic load prediction and temperature analysis [31]. However, clear prior assumptions or experiential insights about these dependencies are often lacking, especially in complex and high-dimensional systems. This absence of guidance presents a significant challenge in accurately identifying and modeling cross-channel dependencies. On the other hand, in many complex systems, such as the human body or weather systems, pre-existing knowledge often makes it difficult to identify the relevant variables. This necessitates the collection of data from multiple perspectives to enable a precise and holistic analysis of the entire system [4]. Consequently, sophisticated approaches are required to effectively capture the hidden patterns and dependencies across multiple time series. The current channel-dependent [32] and channel-independent [33] solutions still warrant further exploration [34].

3.5 Exogenous Variables Modeling

Real-world systems often exhibit a partially observed nature due to incomplete prior knowledge, which can lead to suboptimal forecasting outcomes when relying solely on endogenous variables [35], [36], [37]. In particular, certain scenarios necessitate the incorporation of external variables, such as weather, policies, holidays, and macroeconomic indicators. The impact of these external variables on time series objectives may be nonlinear and time-varying. However, it remains unclear how to effectively identify the key exogenous variables and integrate this multimodal information into a unified forecasting framework.

3.6 Distribution Shift Modeling

Time series data typically exhibit non-stationary characteristics, meaning that the distribution of the data may change rapidly over time. This results in discrepancies between different time spans, ultimately hindering the generalization capabilities of deep learning models, as the distribution shift contradicts their fundamental assumption of independent and identically distributed (I.I.D.) data [38], [39]. Furthermore, traditional methods for addressing distribution shift, such as domain adaptation and domain generalization, may not be well-suited to this task, as defining a domain for time series data is not straightforward [12]. As a result, it remains an open challenge to develop models or frameworks that can effectively mitigate the impact of non-stationarity.

3.7 Trend-Seasonal Pattern Recognition

Time series data are usually numerical responses to natural indicators or human behavior, making them highly influenced by natural rhythms and exhibiting distinct, structured numerical patterns. It is widely accepted that time series data can be decomposed into several independent components, and considerable efforts have been dedicated to developing advanced decomposition techniques over the years [40], [41]. In forecasting tasks, a common approach involves a simple neural decomposition layer that separates the original series into trend and seasonal components. This decomposition captures the long-term progression and seasonal patterns inherent in the data, thereby facilitating more accurate forecasting [42]. However, precise modeling of structural characteristics, multi-periodicity, and the effects of holidays in time series data remains a challenge that requires further exploration.

3.8 Multi-scale Representations

The multi-scale characteristics of time series data are primarily exhibited from two perspectives: multi-granularity and hierarchy [43]. On the one hand, time series data consists of discrete records sampled from a continuous time-space. Depending on the sampling frequency, it inherently exhibits multi-granularity characteristics, where high-frequency series tend to be more informative but noisier, while low-frequency series offer smoother trends but less detail. On the other hand, both local disturbances and global trends significantly influence the time series data. Consequently, determining an appropriate modeling granularity, fully exploiting multi-scale features, and achieving effective feature fusion present key research challenges [44], [45].

3.9 Computational Efficiency

In time series forecasting, data often consists of series from multiple channels, and both the forecast horizon and prediction lengths are typically long. This easily leads to a significant computational efficiency challenge when applying traditional neural networks to time series forecasting, which is particularly problematic in tasks requiring real-time performance, such as stock price forecasting [46] and human physiological signal prediction [47]. Consequently, developing prediction methods that effectively balance computational efficiency with model representation capability remains a key research challenge.

3.10 Generalizability and Transferability

Unlike the pixels used in computer vision (CV) and word tokens in natural language processing (NLP), there is no standardized definition of unified semantic units in time series analysis. Although these series share a similar data format, the values within them may have inconsistent meanings across different contexts due to their distinct physical significance in various scenarios. As a result, most forecasting methodologies are typically trained and evaluated on a single dataset, limiting their generalizability and transferability. Therefore, the development of a robust and scalable forecasting foundation model, built on extensive multi-source pretraining data, represents a significant challenge [48], [49]. Furthermore, the operational mechanisms, such as scaling laws and the influence of data distribution on model performance, remain unclear [50].
former allows the model to automatically find the optimal

Application scenarios of the DFT, DCT, DWT, and STFT:
• DFT: Best for stationary signals with constant properties, such as periodic behaviors.
• DCT: Used for compression and feature extraction in stationary signals, like images or videos.
• DWT: Suitable for non-stationary signals with changing frequencies, such as financial or biomedical data.
• STFT: Ideal for analyzing signals with localized frequency changes, like speech or audio.
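To make the first row of this comparison concrete, a small NumPy sketch: the DFT of a stationary sinusoid concentrates its energy in a single frequency bin, which is exactly the regime the DFT entry describes. The signal, sampling rate, and frequency below are our own illustrative choices.

```python
import numpy as np

# A stationary periodic signal: its DFT concentrates energy in one bin.
fs = 64                              # samples per unit time
t = np.arange(fs) / fs               # one full observation window
signal = np.sin(2 * np.pi * 4 * t)   # pure sinusoid at frequency 4

spectrum = np.fft.rfft(signal)       # real-input discrete Fourier transform
magnitudes = np.abs(spectrum)
peak_bin = int(np.argmax(magnitudes))
print(peak_bin)  # 4 -- the dominant bin matches the sinusoid's frequency
```

For a signal whose frequency content drifts over time, this single global spectrum blurs the changes, which is the motivation for the windowed (STFT) and multi-resolution (DWT) transforms in the other columns.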
effectively capture cross-channel relationships. Generally, spatial-temporal graphs can be generated through either heuristic-based or learning-based approaches [14], as illustrated in Figure 4.

Fig. 4: Illustration of the process of constructing spatial-temporal graphs. [Figure: time series → adjacency matrix construction (via heuristic metrics or learned metrics) → spatial-temporal graphs.]

Heuristic-based methods construct the spatial-temporal graph based on the inherent characteristics of the original time series data. For instance, spatial distance and connectivity metrics between variables can provide valuable guidance in typical scenarios such as traffic load forecasting and weather prediction [100]. In addition, for forecasting tasks where prior knowledge of spatial correlations between variables is lacking, compositional methods based on data similarity metrics are widely used. These methods construct an approximate adjacency matrix based on the Pearson Correlation Coefficient (PCC) [101] or Dynamic Time Warping (DTW) distance [102] between the multivariate time series.

Learning-based approaches focus on learning the graph structure in an end-to-end manner, aligning the learning process with the forecasting task itself. This allows for the data-driven discovery of less obvious graph structures. Typically, these methods employ additional neural networks alongside the main forecasting model to generate the adjacency matrix by leveraging interactions between learned variable embeddings [103], [104] or a parametric distribution [105] based on the observed time series data.

within time series data, such as lagged values and error terms, to identify and predict future series, effectively capturing the underlying dynamics of the data. Classical time series models typically assume a linear relationship, relying on established statistical principles to produce accurate forecasts. In this section, we primarily introduce ARIMA [6] and its extensions, Exponential Smoothing methods, the State-Space Model (SSM), and the Gaussian Mixture Model (GMM), discussing their key characteristics.

5.1 Auto-Regressive Moving Average and Extensions

Autoregressive (AR) and moving average (MA) components are fundamental techniques in time series analysis. Several variants of the moving average (MA) model, such as the Simple Moving Average (SMA) and the Exponential Moving Average (EMA), have been developed to improve forecast accuracy. SMA smooths short-term fluctuations by averaging over a fixed window [106], while EMA assigns exponentially decreasing weights to recent observations, making it particularly responsive to trends and seasonal changes [107]. These variants enhance flexibility in capturing different series patterns.

The Auto-Regressive Moving Average (ARMA) model combines AR and MA components and is effective for stationary time series, where statistical properties like mean, variance, and autocorrelation remain constant [27]. However, for non-stationary data, ARMA is insufficient. The Auto-Regressive Integrated Moving Average (ARIMA) model addresses this by introducing differencing (I) to stabilize the mean and make the series stationary, making it suitable for a broader range of data [6]. For data with seasonal fluctuations, the Seasonal ARIMA (SARIMA) model extends ARIMA by incorporating seasonal components to capture periodic patterns, though its complexity in model specification and parameter tuning can be challenging [6].
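The SMA and EMA variants described above can be sketched directly; the window size and smoothing factor below are illustrative choices, not values from the cited works.

```python
# Simple and exponential moving averages, as described in Section 5.1.

def sma(series, window):
    """Average over a fixed trailing window; smooths short-term noise."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def ema(series, alpha):
    """Exponentially decaying weights; more responsive to recent values."""
    out = [series[0]]
    for x in series[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sma(series, 3))    # [2.0, 3.0, 4.0]
print(ema(series, 0.5))  # [1.0, 1.5, 2.25, 3.125, 4.0625]
```

On this trending toy series the EMA tracks the latest value more closely than the SMA, which is the responsiveness-versus-smoothness trade-off noted in the text.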
[Taxonomy excerpt for Section 6, Emerging Architectures (6.3): SSMs [146], [147], [148], [149]; KANs [150], [151], [152].]
6.1.1 Support Vector Regression

Support Vector Regression (SVR) [201] is a regression form of Support Vector Machines (SVM) [202] designed to predict continuous values. The objective of SVR is to ensure that most of the data points (x_i, y_i) satisfy the condition that the prediction f(x_i) lies within an ϵ range of the actual value y_i. SVR can be solved by transforming the original optimization problem into its dual form with Lagrange multipliers α_i and α_i* [203]. The final regression function is given by:

f(x) = Σ_{i=1}^{n} (α_i − α_i*) K(x_i, x) + b.  (3)

The function K(x_i, x) is the kernel function. Common kernel functions include the Linear kernel, the Polynomial kernel, the Gaussian (RBF) kernel, and so on.

In time series forecasting, SVR is widely used across various fields [114]. For example, Cao et al. [115] use SVR to predict the relative price change trends of futures contracts, Lu et al. [116] use SVR to forecast air quality parameters, and Pai et al. [117] propose a Seasonal Support Vector Regression (SSVR) model to address the challenges of seasonal time series forecasting. Zhang et al. [118] propose a multiple support vector regression (MSVR) model to reduce error accumulation.

6.1.2 Regression Trees

The regression tree is a regression method based on decision trees. Its core concept is to iteratively partition the feature space into mutually exclusive regions, with each region associated with a predicted value.

The Classification and Regression Trees (CART) algorithm [204] is one of the most widely used methods. CART recursively splits data based on selected features and corresponding split points to minimize the sum of mean squared errors (MSE) for the resulting subsets. This process continues until a predefined stopping criterion is met, and the target value is typically predicted by averaging the target values within the leaf nodes. Moreover, to reduce overfitting and improve prediction accuracy, ensemble learning methods like Random Forest [205], Gradient Boosted Decision Trees (GBDT) [206], Extreme Gradient Boosting (XGBoost) [207], and LightGBM [208] are often used.

where N(x) denotes the K nearest neighbors to the sample x, and y_i are their corresponding target values in the training set. In early research, KNN regression was used to deal with properties like repetitive patterns [123] or seasonality [124]. Researchers have also explored its application in multivariate time series forecasting [125], [126].

6.2 Neural Networks and Deep Learning Models

With the development of deep learning, numerous neural forecasting models have been proposed that achieve superior forecasting accuracy, owing to their powerful capability to capture both temporal and cross-channel dependencies. In this section, we present recent advancements in deep learning-based forecasting approaches, focusing on mainstream architectures, as illustrated in Figure 6.

6.2.1 Recurrent Neural Networks

Recurrent Neural Networks (RNNs) have gained a lot of attention due to their unique advantage in modeling sequential data. The basic recursive structure can be written as follows:

h_t = σ(U h_{t−1} + V x_t + b_h),  ŷ_t = W h_t + b_y,  (5)

where x_t, ŷ_t, and h_t are the input, output, and hidden state at time step t, respectively; h_{t−1} is the hidden state at the previous time step t − 1; U, V, and W are the weight matrices; b_h and b_y are the bias terms; and σ is a nonlinear activation function.

In recent years, many studies have used RNNs as backbone networks for time series forecasting [210]. DeepAR [19], MQRNN [211], and DF-Model [130] are probabilistic forecasting models designed for uncertainty quantification. DeepAR generates the probability distribution for future time steps by jointly learning historical patterns and seasonal features across multiple sequences. MQRNN employs an Encoder-Decoder structure, with an LSTM as the encoder and two MLP branches as the decoder, to simultaneously predict quantiles for multiple future time steps. DF-Model decomposes time series into global and local parts, using an RNN to extract complex non-linear patterns globally and capturing individual random effects for each time series locally with probabilistic models like Gaussian processes (GP).
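The recurrence in Eq. (5) can be unrolled in a few lines. A minimal NumPy sketch follows, with illustrative dimensions and tanh as the activation σ; the weights are random placeholders, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 1-D input, 3-D hidden state, 1-D output.
d_in, d_h, d_out = 1, 3, 1
U = rng.normal(size=(d_h, d_h))    # hidden-to-hidden weights
V = rng.normal(size=(d_h, d_in))   # input-to-hidden weights
W = rng.normal(size=(d_out, d_h))  # hidden-to-output weights
b_h = np.zeros(d_h)
b_y = np.zeros(d_out)

def rnn_forward(xs):
    """Unroll Eq. (5): h_t = tanh(U h_{t-1} + V x_t + b_h), y_t = W h_t + b_y."""
    h = np.zeros(d_h)
    outputs = []
    for x in xs:
        h = np.tanh(U @ h + V @ np.atleast_1d(x) + b_h)  # sigma = tanh here
        outputs.append(W @ h + b_y)
    return np.array(outputs)

ys = rnn_forward([0.1, 0.2, 0.3, 0.4])
print(ys.shape)  # one output per time step
```

The same hidden state h carries information across steps, which is the source of both the strength (temporal memory) and the weaknesses (sequential computation, vanishing gradients) discussed below.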
Fig. 6: Illustration of several types of deep neural networks for time series forecasting.
In the point-based forecasting area, DA-RNN [128] and MH-TAL [129] both apply the attention mechanism to enhance context capture and prediction accuracy. DA-RNN introduces input attention at the encoder stage and temporal attention at the decoder stage, while MH-TAL uses the attention mechanism to establish a connection between future and historical time step features. Besides, the CRU model [25] introduces a new RNN variant for modeling irregular time series, and SegRNN [132] proposes a GRU-based model that employs a channel-independent strategy and replaces the original time point-wise iterations with sequence segment-wise iterations. Additionally, hybrid methods have also been studied: LSTNet [127] combines CNN and RNN architectures to capture both short-term and long-term dependencies, with an autoregressive component enhancing stability in multi-step forecasting. Its recurrent-skip component helps alleviate gradient vanishing in long-sequence modeling. ESLSTM [131] combines the exponential smoothing model with LSTM to create a hybrid hierarchical forecasting model.

In summary, RNNs have several strengths, including the ability to process time series of arbitrary length and model temporal dependencies. But they also have certain limitations [212], [213], such as the difficulty in capturing long-term dependencies due to vanishing or exploding gradients and challenges with parallelization due to their sequential nature. Recent advancements, such as RWKV [214] and xLSTM [215], have provided new insights into RNNs. The integration of these advancements into time series forecasting is still an open question and requires further exploration.

6.2.2 Convolutional Neural Networks

The Convolutional Neural Network (CNN) is a prevalent architecture in deep learning, demonstrating exceptional performance in fields such as image processing, video analysis, and natural language processing. In the field of time series analysis, the convolution operation, a fundamental operation in the CNN, can be defined as follows:

Y(t) = Σ_{c=1}^{C} Σ_{k=0}^{K−1} X_c(t + k) · h_c(k),  (6)

where X is the input time series sequence of length L with C channels, and h is the convolutional kernel of size K. Y is the output of the convolution operation at time t for a single output channel.

Convolutional Neural Networks (CNNs) exhibit three key characteristics when analyzing time series data: local connectivity, weight sharing, and translation invariance [216]. Local connectivity ensures that each neuron focuses on a local segment of the input data, weight sharing uses the same weights across different segments of the input, and translation invariance allows the network to recognize features learned at any position in the input. These characteristics are crucial for capturing local patterns in time series that may shift over time, while significantly reducing the number of parameters and enhancing the network's ability to efficiently analyze time series data. Given CNNs' strong performance in feature extraction and computational efficiency, researchers suggest considering convolutional networks as one of the primary candidate models for modeling sequence data [217].

The initial convolutional neural network designed specifically for time series data was the Temporal Convolutional Network (TCN) [217]. TCN employs dilated convolutions, allowing it to achieve a wider receptive field with the same number of model layers. Subsequently, researchers have continued to explore the potential of convolutional neural networks in time series analysis. MLCNN [133] utilizes a multi-layer CNN (more than 10 layers) to learn deep abstract features of time series, while DSANet [218] and MICN [134] employ convolutional kernels of two different scales, extracting both local and global features of time series simultaneously. Unlike the traditional organization of convolutional kernels, SCINet [135] uses sequence downsampling segmentation and organizes the convolutional networks according to binary trees, alleviating the issues of limited receptive fields in lower layers of TCN and the inability of a single convolutional filter to capture complex features. To fully leverage the frequency characteristics of time series, FTMixer [136] employs convolutional neural networks to extract both frequency-domain and time-domain features of time series simultaneously.

As exploration in the field of time series continues to advance, researchers have started to focus on the study of generic frameworks for time series analysis. CNNs, with their powerful data understanding capabilities and high computational efficiency, have been widely used in the design of such generic frameworks. TimesNet [96] folds the time series according to its primary cycles, treats the folded time series as images, and uses 2D convolution to extract abstract features. ModernTCN [75] employs Depth-Wise Separable convolutions instead of traditional convolutions, achieving high effectiveness and computational efficiency in time series analysis. It also introduces the concept of reparameterization, which stabilizes the learning of large convolutional kernels. At the same time, ConvTimeNet [78] adopts a multi-scale deep convolutional neural network with large kernels to simultaneously learn global representations and deep representations. TS2Vec [137], which utilizes the TCN architecture to extract features, employs contrastive learning on the feature views at each layer, providing strong contextual support for each timestamp.

In summary, convolutional neural networks excel in computational efficiency and performance in time series forecasting. However, due to their parameter-sharing and local-perception characteristics, these networks comparatively struggle to capture dependencies between distant time points in long sequences.

6.2.3 Attention-based Neural Networks

The Transformer [219] is one of the most successful architectures in the era of deep learning, which has brought significant advancements in various research areas. The most outstanding design of the Transformer is its attention mechanism, which is expressed as:

Attention(Q, K, V) = Softmax(QK^T / √d) V,  (7)

where Q, K, V are the query, key, and value vectors with dimension d, respectively. Based on this, the Transformer is stacked from multiple blocks which function as:

H′_l = LayerNorm(SelfAttn(H_l) + H_l),
H_{l+1} = LayerNorm(FFN(H′_l) + H′_l).  (8)

Here SelfAttn is a special form of the attention mechanism

Fourier Transform (FFT). The aforementioned representative methods have achieved a computational complexity of O(L log L), while some more recent methods have pushed the complexity to O(L): Pyraformer [220] introduces the pyramidal attention module (PAM), in which the inter-scale tree structure summarizes features at different resolutions and the intra-scale neighboring connections model the temporal dependencies of different ranges. FEDformer [139] proposes frequency-enhanced blocks to capture important structures in time series through frequency-domain mapping, which also achieves linear complexity by randomly selecting a fixed number of Fourier components. Moreover, from the non-stationarity perspective, the Non-stationary Transformer [221] exploits the De-stationary Attention mechanism to boost the forecasting performance of mainstream Transformers.

In terms of the overall structure, early research on Transformer variants for forecasting predominantly adopted the traditional encoder-decoder structure [30], [32], [42], [139], [221], [222]. The encoders process the long-horizon series into hidden states, which are later decoded by the decoders to generate future forecasting results through one forward procedure. Later works point out that complex decoders are not necessary and adopt the encoder-only structure by replacing the decoder part with a linear prediction layer. The experimental results show that encoder-only Transformers achieve more accurate forecasting results on the benchmark
Here SelfAttn is a special form of the attention mechanism datasets [45], [73], [140], [223], [224]. Moreover, recent stud-
where all three vectors are derived from the same input. Be- ies have focused on training a time series foundation model,
sides, Hl is the input of l−th block, FFN is the feed-forward the auto-regressive decoder-only transformer structure is
network made up of multi-layer perception, and LayerNorm widely utilized due to its ability to process and generate
refers to the layer normalization operation. Through the arbitrary lengths of series [48], [49], [153].
iterative modeling approach, the Transformer has shown In summary, the Transformer architecture has been ex-
great modeling ability for long-range dependencies, and tensively studied for forecasting tasks due to its ability to
recent works have proposed many variants of Transformers effectively capture long-term dependencies and its scala-
tailored for time series forecasting tasks [13]. bility as a foundation model. However, when applied to
Early Transformer-based forecasting models generally small-scale domain-specific time series data, the substantial
adopt the traditional method of projecting multi-channel data requirements for training Transformers may result in
data at a single time step into a hidden state [30]. However, overfitting issues.
due to the inherent noise in time series data, a single time
step lacks semantic meaning comparable to that of a word 6.2.4 Multi-layer Perceptrons
in a sentence. To address this limitation, PatchTST [73] Time series forecasting task basically regresses future values
introduces a patching technique that enhances locality and based on the observation within a lookback window. The
captures richer semantic information, a method now widely Multi-layer Perceptrons (MLP) based approaches assume
adopted by recent studies. Additionally, iTransformer [140] that the major correlation exists in a linear form which
refines this approach by projecting individual time points linear models with high computational efficiency and in-
into variate-level tokens, enabling a more effective repre- terpretability can well capture.
sentation of multivariate correlations. From the model’s architecture perspective, N-BEATS
On the other hand for the core attention mechanism, [141] is a groundbreaking work that constructs a deep
the original attention module operates on each time step, neural architecture based on backward and forward residual
exhibiting quadratic time and memory complexity of O(L2 ), links and an exceptionally deep stack of fully connected lay-
where L is the sequence length. Given that time series data ers. Building on this stacking approach, Nhits [44] integrates
are often formed in long sequences, the conventional atten- innovative hierarchical interpolation and multi-rate data
tion mechanism may lead to a significant computational sampling techniques to reduce computational complexity
burden and be susceptible to disturbances from noise or and achieve forecast volatility, and Koopa [142] disentangles
outliers. To this end, LogTrans [138] suggests a sparse con- time-variant and time-invariant components from intricate
volutional self-attention mechanism that generates queries non-stationary series by Fourier Filter and designs MLP-
and keys using causal convolution for lowering computa- based Koopman Predictor to advance respective dynam-
tional complexity. Besides, Autoformer [42] has developed ics forward. Additionally, TimeMixer [145] notes that time
the Auto-Correlation mechanism, discovers and represents series display unique patterns at various sampling scales.
dependencies at the sub-series level through the use of Fast Consequently, it constructs a fully MLP-based architecture
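The lookback-window regression view underlying these MLP-style approaches can be made concrete with a minimal linear forecaster, fit by ordinary least squares on a synthetic series (an illustrative sketch, not the implementation of N-BEATS or any other cited model):

```python
import numpy as np

# Synthetic series: a daily-style seasonal pattern plus noise (illustration only).
rng = np.random.default_rng(0)
t = np.arange(2000)
series = np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(t.size)

L, H = 96, 24  # lookback and horizon lengths

# Build (lookback, horizon) training pairs by sliding a window over the series.
n = len(series) - L - H
X = np.stack([series[i:i + L] for i in range(n)])
Y = np.stack([series[i + L:i + L + H] for i in range(n)])

# A single linear map from the lookback window to the horizon, fit by least
# squares: the simplest instance of "regressing future values on the window".
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

forecast = series[-L:] @ W   # predict the next H steps from the latest window
print(forecast.shape)        # (24,)
```

Models such as N-BEATS and TimeMixer replace the single matrix W with stacked (and multi-scale) layers, but the input/output contract, mapping a length-L window to a length-H horizon in one shot, is the same.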
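On the attention side, Eq. (7) of Section 6.2.3 can be transcribed almost literally; the (L × L) score matrix it materializes is the source of the O(L²) cost that the efficient variants discussed there try to avoid (a self-contained sketch, not tied to any specific cited model):

```python
import numpy as np

def attention(Q, K, V):
    """Direct transcription of Eq. (7): Softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                     # (L, L): the quadratic cost
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted mix of value rows

rng = np.random.default_rng(0)
L, d = 8, 16
Q, K, V = [rng.standard_normal((L, d)) for _ in range(3)]
out = attention(Q, K, V)
print(out.shape)   # (8, 16)
```

Setting Q = K = V to the same (projected) input recovers the self-attention used inside each Transformer block of Eq. (8).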
[…] on multiple datasets, TimeVAE demonstrates its ability to accurately represent temporal attributes, performing well in both similarity and next-step prediction tasks. Furthermore, it can integrate domain-specific patterns, such as polynomial trends and seasonalities, to generate interpretable outputs. This feature is especially beneficial for applications requiring transparency.

Another important contribution is HyVAE [155], which combines the strengths of VAEs with diffusion models by employing a hybrid variational inference approach. This integration enhances the model's ability to capture temporal dependencies and uncertainty, leading to improved performance in time series forecasting tasks.

6.4.3 Generative Adversarial Networks (GANs)
GANs consist of two components: a generator, which learns to produce realistic samples, and a discriminator, which distinguishes between real and generated data [228]. This adversarial training framework enables GANs to generate highly realistic data, making them powerful tools for data synthesis. In the context of time series, GANs must additionally address the challenge of modeling sequential dependencies.

A seminal work in this field is TimeGAN [156], which adapts the GAN framework specifically to time series data by incorporating a recurrent architecture into both the generator and the discriminator. This allows the model to capture temporal dependencies while maintaining statistical consistency with the original data. TimeGAN has been shown to effectively generate realistic time series data, which can be used for tasks such as simulation, data augmentation, and anomaly detection. Recent studies have extended this framework to specific domains. For example, Context-aware Traffic Flow Forecasting in New Roads [157] presents a GAN-based model that predicts traffic flow on new roads by considering contextual factors such as weather and day type, demonstrating effectiveness in scenarios with limited data. Another approach, Curb-GAN [158], addresses the urban traffic estimation problem with a conditional GAN that uses dynamic convolutional layers and self-attention mechanisms to capture both spatial and temporal dependencies, providing accurate estimations even under unprecedented travel demand patterns.

6.4.4 Flow-based Models
Normalizing-flow-based models transform a simple base distribution into a more complex target distribution through a series of invertible and differentiable transformations [229]. This method facilitates exact likelihood estimation, making it particularly effective for modeling high-dimensional data with complex dependencies. In the context of time series, MAF [21] utilizes normalizing flows to model multivariate time series by conditioning on past observations, capturing intricate temporal dependencies.

Recent developments have further extended this technique in various ways. For instance, Conditional Flow Matching for Time Series (CFMTS) [159] improves the training of neural ODEs by regressing vector fields of conditional probability paths, outperforming traditional methods on long-trajectory tasks. Trajectory Flow Matching (TFM) [160] introduces a simulation-free training method for neural SDEs, enhancing stability and scalability, with promising results on clinical time series. Additionally, FM-TS [161] simplifies time series generation through a flow-matching-based framework, offering efficient training and inference while outperforming diffusion models in both unconditional and conditional time series generation.

6.4.5 Diffusion Models
In time series forecasting, Denoising Diffusion Probabilistic Models (DDPMs) have gained prominence for capturing complex temporal patterns, achieving high predictive performance, and generating realistic data samples. DDPMs operate via a diffusion-denoising process: noise is incrementally added during forward diffusion and subsequently removed in reverse diffusion to recover the original data. This allows the model to learn complex data distributions and produce high-quality forecasts.

Early models such as TimeGrad [162] introduced autoregressive denoising with Langevin sampling. TSDiff [163] improved short-term accuracy and data generation through self-guidance, while ScoreGrad [164] utilized stochastic differential equations (SDEs) for continuous-time forecasting, addressing irregularly sampled data. Conditional diffusion models, such as TimeDiff [165] and DiffLoad [230], incorporated external information to improve accuracy, with applications ranging from power load forecasting to sparse ICU and ECG data [231], [232].

Recent advancements include Latent Diffusion Models (LDMs), with examples such as Latent Diffusion Transformers (LDT) [233], which have demonstrated notable improvements in both precipitation forecasting and scalability. Innovations such as DSPD and CSPD [53] extend diffusion to function space for anomaly detection and interpolation. Models like FDF [166] address challenges in trend modeling by integrating linear layers and conditional modules to capture trend and seasonal components, improving long-term forecasting.

Diffusion models have also been applied to diverse domains, including flood forecasting [234], stock market prediction [235], and electric vehicle load forecasting [236]. While they show strong potential, limitations remain in effectively modeling trends and long-term dependencies. Future advancements in denoising techniques and trend-separation methods, combined with diffusion models' high parallelization and cross-domain applicability, offer promising opportunities for further enhancement and deployment across various fields.

7 TRANSFER LEARNING METHODS
In this section, we introduce transfer learning techniques for time series forecasting, including self-supervised pre-training, domain adaptation, and LLM-based methods.

7.1 Self-supervised Pre-training Methods
Self-supervised learning alleviates the dependence on large labeled datasets by enabling models to learn transferable representations through pre-training on unlabeled data. Prominent self-supervised pre-training techniques include contrastive learning, denoising masked autoencoders, and autoregressive pre-training models. The proposed taxonomy is illustrated in Figure 8, and the related works can be found in Table 4.
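Among the pre-training objectives just listed, the denoising masked-autoencoder idea is easy to sketch end to end: hide random patches of the series, reconstruct them, and score the loss only on the hidden positions. In the sketch below a simple linear interpolation stands in for the neural encoder, purely to make the objective concrete:

```python
import numpy as np

# Sketch of the masked-reconstruction pretext task used in self-supervised
# pre-training (illustration only; real methods train a neural encoder).
rng = np.random.default_rng(1)
series = np.sin(np.arange(512) / 8.0)

patch = 16
n_patches = series.size // patch
mask = rng.random(n_patches) < 0.4           # hide ~40% of patches
mask_pts = np.repeat(mask, patch)            # point-level boolean mask

corrupted = np.where(mask_pts, 0.0, series)  # masked input the model would see

# Stand-in "model": linear interpolation from the visible points.
visible = np.flatnonzero(~mask_pts)
recon = np.interp(np.arange(series.size), visible, series[visible])

# Loss computed only on masked positions, as in masked autoencoders.
loss = np.mean((recon[mask_pts] - series[mask_pts]) ** 2)
print(round(float(loss), 4))
```

A trained encoder would replace the interpolation step, learning representations that transfer to downstream forecasting once the reconstruction head is discarded.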
[…] domain discriminator to learn domain-invariant latent features, leveraging statistical strengths from the source domain to enhance target-domain performance. STONE [175] focuses on learning invariant node dependencies for out-of-distribution (OOD) spatial-temporal data, addressing both structural and temporal shifts. Moreover, FOIL [176] introduces an innovative surrogate loss function designed to alleviate the influence of unobserved variables, coupled with a joint optimization strategy. This facilitates the acquisition of invariant representations across the inferred environments, thereby enhancing forecasting accuracy on OOD time series data.

7.3 LM-based Models
The LM-based time series predictor is an innovative approach that utilizes advanced language models to forecast future values. These language models, pretrained on extensive textual data, possess the ability to capture rich semantic information and exhibit robust reasoning capabilities [238], [239]. This enables them to effectively analyze and reason about time series data, enhancing their understanding of, and predictive abilities regarding, time series patterns and trends [182]. Moreover, compared with traditional time series predictors, language models can further improve prediction accuracy by integrating relevant textual information, such as individual characteristics and sampling backgrounds, with the corresponding numerical features through specific prompts or quantification processes [178], [180], [240]. We categorize LM-based time series predictors into two types: tuning-based and tuning-free.

7.3.1 Tuning-Based
The tuning-based approach makes precise adjustments to the backbone parameters to adapt to specific time series data. It usually starts from a pre-trained language model and involves additional training on specific time series data to adjust the model weights. The fine-tuning process helps the model more accurately understand and predict the patterns and trends within the time series. To this end, researchers have explored multiple tuning perspectives. First, regarding the forecasting paradigm, AutoTimes [154] utilizes an autoregressive generation approach, which is particularly suited to simulating left-to-right sequential relationships and incrementally constructing the target sequence [153], [173]. To avoid the potential error accumulation associated with autoregression, other models such as FPT [177] opt for a one-step prediction approach, generating the entire forecast target sequence in one go. Second, regarding the training paradigm, models like CrossTimeNet [86] introduce a pre-training phase of self-supervised representation learning, which deepens the model's perception and understanding of time series data. Additionally, models such as TEMPO [180] incorporate related textual information as input, enabling the effective use of text features to assist prediction. Third, regarding base-model parameter updates, some models like LLM4TS [179] employ a LoRA [241] fine-tuning strategy to make low-rank adjustments to critical components, while other models [86], [177] may directly update certain key components or the entire set of base parameters. Finally, for the processing of the input layer, models like Chronos [22] and UniTime [178] may directly feed embedded time series data into the model, whereas models like TimeLLM [182] adopt a feature-fusion approach, reprogramming temporal features with textual information to enhance the informativeness of the input. The combined application of these diverse strategies enables the models to exhibit strong performance on time series forecasting tasks. To illustrate, we select several representative models and present their tuning perspectives in Table 5.

7.3.2 Tuning-Free
The tuning-free approach utilizes pre-trained language models for direct time series forecasting without additional fine-tuning. This offers the advantage of rapid deployment, significantly reducing computational costs and time. By leveraging the general linguistic features already learned by the model, effective predictions can be made across various types of time series data. In this context, LLMTime [181] incorporates the sampling background of an instance as textual information, constructing it together with the input sequence into a prompt. Going further, LSTPrompt [242] introduces a TimeBreak design that simulates human thinking, allowing the model to take a break after every k predictions. When designing prompts for predicting patient survival probabilities [243], Zhu et al. make full use of the patients' Electronic Health Records (EHR) and adopt an in-context learning strategy consistent with the clinical environment. Besides, TimeRAF [244] adopts the retrieval-augmented generation (RAG) concept, using a k-nearest-neighbors approach to retrieve the k closest time series samples from a database and constructing them into prompts for reference by the language model. Similarly, TimeRAG [245] integrates multiple retrieved neighboring samples with the original text into JSON-formatted strings, which are then fed into the language model. Meanwhile, TableTime [246] validates the capabilities of the large language model Llama-3.1-405b-instruct in understanding and classifying time series. From an agent perspective, TESSA [247] designs a time series annotation scheme that leverages both general and domain-specific annotation agents to generate textual annotations for time series, significantly enhancing the understanding and reasoning capabilities of language models (such as GPT-4o) regarding time series data. It is noteworthy that A. Merrill et al. [248] found that tuning-free forecasting models perform only marginally better than random guessing on time series reasoning tasks, and that introducing related context offers only modest improvements in predictive ability. These weaknesses indicate that time series reasoning is an influential yet severely underdeveloped direction for tuning-free LM-based time series predictors.

TABLE 5: A summary of the tuning perspectives of several representative methods, corresponding to the definitions in Section 7.3. In the "Parameter Updating" column, D and L represent "Directly Updating Parameters" and "LoRA Fine-tuning" respectively, with the specific tuning components indicated in parentheses.

7.4 Time Series Foundation Models
Recent years have witnessed the emergence of time series foundation models pretrained on large-scale temporal data, which drive innovations in time series forecasting through their cross-domain transferable representations. Existing architectures of time series foundation models generally adopt two paradigms: encoder-based structures that extract universal temporal features, and decoder-based structures that focus on autoregressive generation capabilities. The encoder-based approaches include MOMENT [184], which integrates multi-domain data from transportation and healthcare to establish a multi-task pretraining system supporting classification, forecasting, and anomaly detection. Following similar principles, Moirai [49] introduces attention mechanisms for arbitrary variates and achieves generalizable forecasting through masked pretraining on the LOTSA dataset, demonstrating effectiveness in out-of-distribution scenarios such as electricity load forecasting. In contrast, decoder-based architectures exhibit distinct design philosophies: TimesFM [185] combines patching techniques with decoder structures and frequency-specific tokenization to enable zero-shot generalization, while Chronos [22] discretizes continuous time series into token buckets and leverages synthetic data with cross-entropy training for few-shot forecasting. Recently, Timer [153] unified diverse downstream tasks through decoder-only architectures and autoregressive generative training strategies.

Despite architectural variations, these models share fundamental principles of simplicity and generality, avoiding over-specialized designs. Current foundation models focus primarily on constructing large-scale temporal datasets, developing cross-domain modeling techniques, and establishing unified frameworks for multivariate analysis. Notably, their data scale surpasses previous end-to-end methods by orders of magnitude (e.g., 27B time steps in LOTSA). Furthermore, synthetic time series data is emerging as a critical pretraining resource, with synthetic data constituting 20% of TimesFM's pretraining corpus. These developments signal a paradigm shift toward data-centric methodologies in both foundation model development and downstream forecasting tasks.

8 TRUSTWORTHY TIME SERIES FORECASTING
Recently, the demand for reliable and trustworthy time series forecasting has grown rapidly, driven by its wide applications across domains including finance, healthcare, and energy management. As these forecasting models become increasingly integrated into critical decision-making processes, ensuring their trustworthiness becomes paramount. Next, we discuss research advancements in trustworthy time series forecasting, focusing on interpretability, robustness, and privacy preservation.

8.1 Interpretability
Trustworthy time series forecasting involves not only delivering accurate predictions but also addressing key concerns such as model interpretability, which enables users to understand and trust the reasoning behind predictions. To achieve interpretability in forecasting models, mainstream research primarily focuses on two approaches: causal discovery and physics-informed neural networks (PINNs).

Causal discovery enhances time series forecasting by uncovering the underlying cause-and-effect relationships between variables [249]. This approach provides deeper insight into how variables influence one another over time, improving both accuracy and interpretability, particularly in complex systems where predictions rely on the causal structure [250], [251]. Statistical models such as VAR utilize Granger causality to enhance predictions [186], while dynamic Bayesian networks capture temporal dependencies and adapt to changing causal structures [187]. Deep learning models also benefit from integrating causal inference, enhancing their interpretability and their ability to manage complex, non-stationary patterns [188].

On the other hand, integrating PINNs into time series forecasting has recently emerged as a promising direction for improving both prediction accuracy and interpretability [189], [190]. PINNs guide data-driven approaches using physical principles, such as conservation laws and differential equations, ensuring that predictions are consistent with both the data and physical reality, particularly in scenarios with limited or noisy data. By embedding physical knowledge directly into time series models, PINNs ensure that predictions align with physical laws [191], [192], [193]. This not only enhances the learning process but also improves the robustness and reliability of the models, making them better suited for practical applications where physical consistency is crucial.

8.2 Robustness
Despite significant advancements in recent forecasting models, they remain vulnerable to adversarial attacks, which raises important concerns about their trustworthy deployment in critical applications. Understanding the robustness […]
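The PINN recipe of Section 8.1, a data-fit term augmented with a physics-residual penalty, can be written down compactly. The decay law du/dt = −k·u below is an assumed toy example for illustration, not drawn from any cited work:

```python
import numpy as np

# Schematic PINN-style objective for a series assumed to follow the toy
# decay law du/dt = -k * u (an assumption for illustration only).
k, dt = 0.5, 0.1
t = np.arange(0, 5, dt)
truth = np.exp(-k * t)
rng = np.random.default_rng(2)
obs = truth + 0.05 * rng.standard_normal(t.size)   # noisy observations

def pinn_loss(pred, obs, k, dt, lam=1.0):
    """Data-fit term plus physics-residual term, as in PINN training."""
    data_term = np.mean((pred - obs) ** 2)
    residual = np.diff(pred) / dt + k * pred[:-1]   # finite-diff du/dt + k*u
    physics_term = np.mean(residual ** 2)
    return data_term + lam * physics_term

# A physically consistent prediction scores lower than one that merely
# copies the noisy data, because its residual term is near zero.
print(pinn_loss(truth, obs, k, dt) < pinn_loss(obs, obs, k, dt))   # True
```

In actual PINN training the λ-weighted residual is backpropagated through a neural predictor, steering it toward forecasts that respect the assumed dynamics rather than the noise.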
[…] city development by improving energy scheduling, smart grid management, and air quality tracking, supporting intelligent and sustainable environmental management.

[…] distinct tasks. Furthermore, to better model the contextual characteristics mentioned above, an urgent research challenge lies in effectively integrating multiple modalities, such as text, images, and graphs, into the forecasting process.

On the other hand, it remains inconclusive whether training a universal foundation model is the optimal solution for time series forecasting. This uncertainty, however, opens the door to several potentially valuable research directions. First, it is worth exploring the operational mechanisms of general foundation models in more depth, such as scaling laws and emergent phenomena. Additionally, investigating the impact of different data ratios on model performance, to guide the construction of pretraining datasets, is a promising avenue. Lastly, training domain-specific time series forecasting foundation models, which fully integrate the inductive biases and prior knowledge of the respective domain for more precise predictions, is also a direction worth exploring.

10.2 Trustworthy Time Series Forecasting
Most current mainstream time series forecasting methods rely on black-box neural network models, which, while offering high accuracy, are difficult to apply in many sensitive real-world settings such as healthcare analysis and decision-making. Therefore, a promising research direction is to leverage advanced techniques such as explainable AI and causal inference, in combination with existing powerful forecasting models, to construct interpretable and trustworthy forecasting systems that can facilitate practical, real-world applications.

Moreover, with the increasing demand for training time series forecasting models on multi-source, cross-domain data, ensuring data privacy and model security has become an important research topic. Specifically, multi-source cross-domain training often relies on federated learning frameworks. This raises critical issues such as how to protect the privacy of client-side time series data from leakage using techniques like differential privacy, and how to improve the robustness of models against data-poisoning attacks from malicious clients. These are pressing research challenges that need to be addressed.

[…] direction, and many recent efforts have been devoted to it as pioneering explorations. Despite their effectiveness, there remain multiple unresolved issues worth exploring.

10.4 Comprehensive Benchmark Evaluation
Beyond model design, performance evaluation and the design of appropriate benchmarks are also important future research directions in time series forecasting. Existing benchmarks often lack a sufficiently broad range of data distributions and fail to clearly differentiate the difficulty levels of tasks. Moreover, the evaluation metrics used are typically too simplistic, making it difficult to assess the strengths and weaknesses of each model from multiple perspectives. This limitation may even encourage an arms-race-style search for per-model hyperparameters that maximize test performance. Therefore, there is a need to develop more general evaluation datasets and diverse evaluation metrics to ensure the healthy and well-rounded development of the entire research field.

11 CONCLUSION
In this survey, we have presented a holistic and structured examination of recent advancements in time series forecasting, encompassing foundational concepts, methodological evolutions, and critical challenges. We established a unified perspective that bridges classical statistical modeling and cutting-edge deep learning paradigms, highlighting how shifts in data availability, computational power, and algorithmic sophistication are reshaping the field. Through careful attention to key hurdles, ranging from handling non-stationarity and uncertainty quantification to managing high dimensionality and interpretability, we have illustrated the complexity and dynamism that define modern time series forecasting tasks. We further surveyed prominent benchmark datasets and evaluation metrics, underscoring the importance of robust and fair performance comparisons to drive meaningful progress. Crucially, we identified emerging trends, such as leveraging generative models, explainable AI techniques, and integrative frameworks that combine domain knowledge with data-driven insights. The work presented here not only consolidates the state of the art but also illuminates avenues for future innovation, offering researchers and practitioners a coherent reference point as they navigate the ever-evolving landscape of time series forecasting research.
…ing," Advances in Neural Information Processing Systems, vol. 34, pp. 22419–22430, 2021.
[43] M. C. Mozer, "Induction of multiscale temporal structure," Advances in Neural Information Processing Systems, vol. 4, 1991.
[44] C. Challu, K. G. Olivares, B. N. Oreshkin, F. G. Ramirez, M. M. Canseco, and A. Dubrawski, "Nhits: Neural hierarchical interpolation for time series forecasting," in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 6, 2023, pp. 6989–6997.
[45] M. A. Shabani, A. H. Abdi, L. Meng, and T. Sylvain, "Scaleformer: Iterative multi-scale refining transformers for time series forecasting," in The Eleventh International Conference on Learning Representations, 2023.
[46] M. Hou, C. Xu, Y. Liu, W. Liu, J. Bian, L. Wu, Z. Li, E. Chen, and T.-Y. Liu, "Stock trend prediction with multi-granularity data: A contrastive learning approach with adaptive fusion," in Proceedings of the 30th ACM International Conference on Information & Knowledge Management, 2021, pp. 700–709.
[47] B. Rim, N.-J. Sung, S. Min, and M. Hong, "Deep learning in physiological signal data: A survey," Sensors, vol. 20, no. 4, p. 969, 2020.
[48] Z. Liu, J. Yang, M. Cheng, Y. Luo, and Z. Li, "Generative pretrained hierarchical transformer for time series forecasting," in Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, pp. 2003–2013.
[49] G. Woo, C. Liu, A. Kumar, C. Xiong, S. Savarese, and D. Sahoo, "Unified training of universal time series forecasting transformers," in Forty-first International Conference on Machine Learning, 2024.
[50] J. Shi, Q. Ma, H. Ma, and L. Li, "Scaling law for time series forecasting," arXiv preprint arXiv:2405.15124, 2024.
[51] P. Bansal, P. Deshpande, and S. Sarawagi, "Missing value imputation on multidimensional time series," Proceedings of the VLDB Endowment, vol. 14, no. 11, pp. 2533–2545, 2021.
[52] Y. Luo, X. Cai, Y. Zhang, J. Xu et al., "Multivariate time series imputation with generative adversarial networks," Advances in Neural Information Processing Systems, vol. 31, 2018.
[53] M. Biloš, K. Rasul, A. Schneider, Y. Nevmyvaka, and S. Günnemann, "Modeling temporal data as continuous functions with stochastic process diffusion," in International Conference on Machine Learning. PMLR, 2023, pp. 2452–2470.
[54] N. E. Huang, Z. Shen, S. R. Long, M. C. Wu, H. H. Shih, Q. Zheng, N.-C. Yen, C. C. Tung, and H. H. Liu, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 454, no. 1971, pp. 903–995, 1998.
[55] P. Chaovalit, A. Gangopadhyay, G. Karabatis, and Z. Chen, "Discrete wavelet transform-based time series analysis and mining," ACM Computing Surveys (CSUR), vol. 43, no. 2, pp. 1–37, 2011.
…ordinate computing and iewt reconstruction," Energy Conversion and Management, vol. 167, pp. 203–219, 2018.
[64] J. Xin, C. Zhou, Y. Jiang, Q. Tang, X. Yang, and J. Zhou, "A signal recovery method for bridge monitoring system using tvfemd and encoder-decoder aided lstm," Measurement, vol. 214, p. 112797, 2023.
[65] P. Bonizzi, J. M. Karel, O. Meste, and R. L. Peeters, "Singular spectrum decomposition: A new method for time series decomposition," Advances in Adaptive Data Analysis, vol. 6, no. 04, p. 1450011, 2014.
[66] L. Karthikeyan and D. N. Kumar, "Predictability of nonstationary time series using wavelet and emd based arma models," Journal of Hydrology, vol. 502, pp. 103–119, 2013.
[67] N. A. Agana and A. Homaifar, "Emd-based predictive deep belief network for time series prediction: An application to drought forecasting," Hydrology, vol. 5, no. 1, p. 18, 2018.
[68] W.-c. Wang, K.-w. Chau, D.-m. Xu, and X.-Y. Chen, "Improving forecasting accuracy of annual runoff time series using arima based on eemd decomposition," Water Resources Management, vol. 29, pp. 2655–2675, 2015.
[69] M. Theodosiou, "Forecasting monthly and quarterly time series using stl decomposition," International Journal of Forecasting, vol. 27, no. 4, pp. 1178–1195, 2011.
[70] J. Nasir, M. Aamir, Z. U. Haq, S. Khan, M. Y. Amin, and M. Naeem, "A new approach for forecasting crude oil prices based on stochastic and deterministic influences of lmd using arima and lstm models," IEEE Access, vol. 11, pp. 14322–14339, 2023.
[71] M. Wang, J. Yang, B. Yang, H. Li, T. Gong, B. Yang, and J. Cui, "Towards lightweight time series forecasting: a patch-wise transformer with weak data enriching," arXiv preprint arXiv:2501.10448, 2025.
[72] C. Ying and J. Lu, "Tfeformer: Temporal feature enhanced transformer for multivariate time series forecasting," IEEE Access, 2024.
[73] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, "A time series is worth 64 words: Long-term forecasting with transformers," in The Eleventh International Conference on Learning Representations, 2023.
[74] P. Chen, Y. Zhang, Y. Cheng, Y. Shu, Y. Wang, Q. Wen, B. Yang, and C. Guo, "Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting," arXiv preprint arXiv:2402.05956, 2024.
[75] D. Luo and X. Wang, "Moderntcn: A modern pure convolution structure for general time series analysis," in The Twelfth International Conference on Learning Representations, 2024.
[76] Z. Gong, Y. Tang, and J. Liang, "Patchmixer: A patch-mixing architecture for long-term time series forecasting," arXiv preprint arXiv:2310.00655, 2023.
[56] T. Yoon, Y. Park, E. K. Ryu, and Y. Wang, “Robust probabilistic [77] V. Ekambaram, A. Jati, N. Nguyen, P. Sinthong, and
time series forecasting,” in International Conference on Artificial J. Kalagnanam, “Tsmixer: Lightweight mlp-mixer model for mul-
Intelligence and Statistics. PMLR, 2022, pp. 1336–1358. tivariate time series forecasting,” in Proceedings of the 29th ACM
[57] N. Passalis, A. Tefas, J. Kanniainen, M. Gabbouj, and A. Iosifidis, SIGKDD Conference on Knowledge Discovery and Data Mining, 2023,
“Deep adaptive input normalization for time series forecasting,” pp. 459–469.
IEEE transactions on neural networks and learning systems, vol. 31, [78] M. Cheng, J. Yang, T. Pan, Q. Liu, and Z. Li, “Convtimenet: A
no. 9, pp. 3760–3765, 2019. deep hierarchical fully convolutional model for multivariate time
[58] W. Fan, P. Wang, D. Wang, D. Wang, Y. Zhou, and Y. Fu, “Dish- series analysis,” arXiv preprint arXiv:2403.01493, 2024.
ts: a general paradigm for alleviating distribution shift in time [79] Q. Huang, L. Shen, R. Zhang, J. Cheng, S. Ding, Z. Zhou, and
series forecasting,” in Proceedings of the AAAI conference on artificial Y. Wang, “Hdmixer: Hierarchical dependency with extendable
intelligence, vol. 37, no. 6, 2023, pp. 7522–7529. patch for multivariate time series forecasting,” in Proceedings of
[59] L. Han, H.-J. Ye, and D.-C. Zhan, “SIN: Selective and interpretable the AAAI Conference on Artificial Intelligence, vol. 38, no. 11, 2024,
normalization for long-term time series forecasting,” in Forty-first pp. 12 608–12 616.
International Conference on Machine Learning, 2024. [80] S. Zhong, S. Song, W. Zhuo, G. Li, Y. Liu, and S.-H. G. Chan, “A
[60] S. Lahmiri, “A variational mode decompoisition approach for multi-scale decomposition mlp-mixer for time series analysis,”
analysis and forecasting of economic and financial time series,” arXiv preprint arXiv:2310.11959, 2023.
Expert Systems with Applications, vol. 55, pp. 268–273, 2016. [81] Y. Zhang, L. Ma, S. Pal, Y. Zhang, and M. Coates, “Multi-
[61] E. Ghanbari and A. Avar, “Short-term wind power forecasting resolution time-series transformer for long-term forecasting,”
using the hybrid model of multivariate variational mode de- in International Conference on Artificial Intelligence and Statistics.
composition (mvmd) and long short-term memory (lstm) neural PMLR, 2024, pp. 4222–4230.
networks,” Electrical Engineering, pp. 1–31, 2024. [82] K. Rasul, A. Ashok, A. R. Williams, A. Khorasani, G. Adamopou-
[62] Y. Wang, S. Sun, X. Chen, X. Zeng, Y. Kong, J. Chen, Y. Guo, and los, R. Bhagwatkar, M. Biloš, H. Ghonia, N. Hassen, A. Schneider
T. Wang, “Short-term load forecasting of industrial customers et al., “Lag-llama: Towards foundation models for time series
based on svmd and xgboost,” International Journal of Electrical forecasting,” in R0-FoMo: Robustness of Few-shot and Zero-shot
Power & Energy Systems, vol. 129, p. 106830, 2021. Learning in Large Foundation Models, 2023.
[63] Y. Li, H. Wu, and H. Liu, “Multi-step wind speed forecasting [83] J. Xie, W. Mao, Z. Bai, D. J. Zhang, W. Wang, K. Q. Lin,
using ewt decomposition, lstm principal computing, relm sub- Y. Gu, Z. Chen, Z. Yang, and M. Z. Shou, “Show-o: One single
PREPRINT 24
transformer to unify multimodal understanding and generation,” in neural information processing systems, vol. 33, pp. 17 804–17 815,
arXiv preprint arXiv:2408.12528, 2024. 2020.
[84] P. Anastassiou, J. Chen, J. Chen, Y. Chen, Z. Chen, Z. Chen, [105] C. Shang, J. Chen, and J. Bi, “Discrete graph structure learning
J. Cong, L. Deng, C. Ding, L. Gao et al., “Seed-tts: A family of for forecasting multiple time series,” in International Conference on
high-quality versatile speech generation models,” arXiv preprint Learning Representations, 2021.
arXiv:2406.02430, 2024. [106] F. Johnston, J. E. Boyland, M. Meadows, and E. Shale, “Some
[85] P. Schäfer and M. Högqvist, “Sfa: a symbolic fourier approx- properties of a simple moving average when applied to fore-
imation and index for similarity search in high dimensional casting a time series,” Journal of the Operational Research Society,
datasets,” in Proceedings of the 15th international conference on vol. 50, no. 12, pp. 1267–1271, 1999.
extending database technology, 2012, pp. 516–527. [107] C. C. Holt, “Forecasting seasonals and trends by exponentially
[86] M. Cheng, X. Tao, Q. Liu, H. Zhang, Y. Chen, and C. Lei, “Learn- weighted moving averages,” International journal of forecasting,
ing transferable time series classifier with cross-domain pre- vol. 20, no. 1, pp. 5–10, 2004.
training from language model,” arXiv preprint arXiv:2403.12372, [108] E. S. Gardner Jr, “Exponential smoothing: The state of the art,”
2024. Journal of forecasting, vol. 4, no. 1, pp. 1–28, 1985.
[87] A. Van Den Oord, O. Vinyals et al., “Neural discrete representa- [109] E. Ostertagova and O. Ostertag, “Forecasting using simple ex-
tion learning,” Advances in neural information processing systems, ponential smoothing method,” Acta Electrotechnica et Informatica,
vol. 30, 2017. vol. 12, no. 3, p. 62, 2012.
[88] A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, [110] C. Chatfield, A. B. Koehler, J. K. Ord, and R. D. Snyder, “A new
A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu et al., look at models for exponential smoothing,” Journal of the Royal
“Wavenet: A generative model for raw audio,” arXiv preprint Statistical Society: Series D (The Statistician), vol. 50, no. 2, pp. 147–
arXiv:1609.03499, vol. 12, 2016. 159, 2001.
[89] M. Łajszczak, G. Cámbara, Y. Li, F. Beyhan, A. van Korlaar, [111] A. C. Harvey, “Forecasting, structural time series models and the
F. Yang, A. Joly, Á. Martı́n-Cortinas, A. Abbas, A. Michal- kalman filter,” 1990.
ski et al., “Base tts: Lessons from building a billion-parameter [112] Y. Zhang, “Prediction of financial time series with hidden markov
text-to-speech model on 100k hours of data,” arXiv preprint models,” 2004.
arXiv:2402.08093, 2024. [113] T. Lux, “The markov-switching multifractal model of asset re-
[90] C. Wang, S. Chen, Y. Wu, Z. Zhang, L. Zhou, S. Liu, Z. Chen, turns: Gmm estimation and linear forecasting of volatility,” Jour-
Y. Liu, H. Wang, J. Li et al., “Neural codec language mod- nal of business & economic statistics, vol. 26, no. 2, pp. 194–210,
els are zero-shot text to speech synthesizers,” arXiv preprint 2008.
arXiv:2301.02111, 2023. [114] N. I. Sapankevych and R. Sankar, “Time series prediction using
[91] M. Cheng, Y. Chen, Q. Liu, Z. Liu, Y. Luo, and E. Chen, “In- support vector machines: a survey,” IEEE computational intelli-
structime: Advancing time series classification with multimodal gence magazine, vol. 4, no. 2, pp. 24–38, 2009.
language modeling,” in Proceedings of the Eighteenth ACM Interna- [115] L.-J. Cao and F. E. H. Tay, “Support vector machine with adaptive
tional Conference on Web Search and Data Mining, 2025, pp. 792–800. parameters in financial time series forecasting,” IEEE Transactions
[92] S. Winograd, “On computing the discrete fourier transform,” on neural networks, vol. 14, no. 6, pp. 1506–1518, 2003.
Mathematics of computation, vol. 32, no. 141, pp. 175–199, 1978. [116] W. Lu, W. Wang, A. Y. Leung, S.-M. Lo, R. K. Yuen, Z. Xu,
[93] N. Ahmed, T. Natarajan, and K. R. Rao, “Discrete cosine trans- and H. Fan, “Air pollutant parameter forecasting using support
form,” IEEE transactions on Computers, vol. 100, no. 1, pp. 90–93, vector machines,” in Proceedings of the 2002 International Joint
1974. Conference on Neural Networks. IJCNN’02 (Cat. No. 02CH37290),
[94] M. J. Shensa et al., “The discrete wavelet transform: wedding vol. 1. IEEE, 2002, pp. 630–635.
the a trous and mallat algorithms,” IEEE Transactions on signal [117] P.-F. Pai, K.-P. Lin, C.-S. Lin, and P.-T. Chang, “Time series fore-
processing, vol. 40, no. 10, pp. 2464–2482, 1992. casting by a seasonal support vector regression model,” Expert
[95] A. V. Oppenheim, Discrete-time signal processing. Pearson Edu- Systems with Applications, vol. 37, no. 6, pp. 4261–4265, 2010.
cation India, 1999. [118] L. Zhang, W.-D. Zhou, P.-C. Chang, J.-W. Yang, and F.-Z. Li,
[96] H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long, “Timesnet: “Iterated time series prediction with multiple support vector
Temporal 2d-variation modeling for general time series analysis,” regression models,” Neurocomputing, vol. 99, pp. 411–422, 2013.
in The Eleventh International Conference on Learning Representations, [119] T. Januschowski, Y. Wang, K. Torkkola, T. Erkkilä, H. Hasson,
2023. and J. Gasthaus, “Forecasting with trees,” International Journal of
[97] X. Zhu, D. Shen, H. Wang, and Y. Hao, “Fcnet: Fully complex Forecasting, vol. 38, no. 4, pp. 1473–1481, 2022.
network for time series forecasting,” IEEE Internet of Things [120] B. Wang, P. Wu, Q. Chen, and S. Ni, “Prediction and analysis
Journal, 2024. of train passenger load factor of high-speed railway based on
[98] P. Liu, B. Wu, N. Li, T. Dai, F. Lei, J. Bao, Y. Jiang, and S.-T. lightgbm algorithm,” Journal of Advanced Transportation, vol. 2021,
Xia, “Wftnet: Exploiting global and local periodicity in long-term no. 1, p. 9963394, 2021.
time series forecasting,” in ICASSP 2024-2024 IEEE International [121] H. Wu, Y. Cai, Y. Wu, R. Zhong, Q. Li, J. Zheng, D. Lin, and Y. Li,
Conference on Acoustics, Speech and Signal Processing (ICASSP). “Time series analysis of weekly influenza-like illness rate using a
IEEE, 2024, pp. 5960–5964. one-year period of factors in random forest regression,” Bioscience
[99] A. Ma, D. Luo, and M. Sha, “Mmfnet: Multi-scale frequency trends, vol. 11, no. 3, pp. 292–296, 2017.
masking neural network for multivariate time series forecasting,” [122] V. Mayrink and H. S. Hippert, “A hybrid method using ex-
arXiv preprint arXiv:2410.02070, 2024. ponential smoothing and gradient boosting for electrical short-
[100] B. Yu, H. Yin, and Z. Zhu, “Spatio-temporal graph convolutional term load forecasting,” in 2016 IEEE Latin American Conference on
networks: a deep learning framework for traffic forecasting,” in Computational Intelligence (LA-CCI). IEEE, 2016, pp. 1–6.
Proceedings of the 27th International Joint Conference on Artificial [123] F. Martı́nez, M. P. Frı́as, M. D. Pérez, and A. J. Rivera, “A method-
Intelligence, 2018, pp. 3634–3640. ology for applying k-nearest neighbor to time series forecasting,”
[101] X. Zhang, R. Cao, Z. Zhang, and Y. Xia, “Crowd flow forecasting Artificial Intelligence Review, vol. 52, no. 3, pp. 2019–2037, 2019.
with multi-graph neural networks,” in 2020 International Joint [124] F. Martı́nez, M. P. Frı́as, M. D. Pérez-Godoy, and A. J. Rivera,
Conference on Neural Networks (IJCNN). IEEE, 2020, pp. 1–7. “Dealing with seasonality by narrowing the training set in time
[102] M. Li and Z. Zhu, “Spatial-temporal fusion graph neural net- series forecasting with knn,” Expert systems with applications, vol.
works for traffic flow forecasting,” in Proceedings of the AAAI 103, pp. 38–48, 2018.
conference on artificial intelligence, vol. 35, no. 5, 2021, pp. 4189– [125] B. Rajagopalan and U. Lall, “A k-nearest-neighbor simulator for
4196. daily precipitation and other weather variables,” Water resources
[103] Z. Wu, S. Pan, G. Long, J. Jiang, X. Chang, and C. Zhang, research, vol. 35, no. 10, pp. 3089–3101, 1999.
“Connecting the dots: Multivariate time series forecasting with [126] F. H. Al-Qahtani and S. F. Crone, “Multivariate k-nearest neigh-
graph neural networks,” in Proceedings of the 26th ACM SIGKDD bour regression for time series data—a novel algorithm for
international conference on knowledge discovery & data mining, 2020, forecasting uk electricity demand,” in The 2013 international joint
pp. 753–763. conference on neural networks (IJCNN). IEEE, 2013, pp. 1–8.
[104] L. Bai, L. Yao, C. Li, X. Wang, and C. Wang, “Adaptive graph [127] G. Lai, W.-C. Chang, Y. Yang, and H. Liu, “Modeling long-and
convolutional recurrent network for traffic forecasting,” Advances short-term temporal patterns with deep neural networks,” in The
PREPRINT 25
41st international ACM SIGIR conference on research & development [148] M. A. Ahamed and Q. Cheng, “Timemachine: A time series
in information retrieval, 2018, pp. 95–104. is worth 4 mambas for long-term forecasting,” arXiv preprint
[128] Y. Qin, D. Song, H. Chen, W. Cheng, G. Jiang, and G. Cottrell, arXiv:2403.09898, 2024.
“A dual-stage attention-based recurrent neural network for time [149] Z. Wang, F. Kong, S. Feng, M. Wang, X. Yang, H. Zhao, D. Wang,
series prediction,” arXiv preprint arXiv:1704.02971, 2017. and Y. Zhang, “Is mamba effective for time series forecasting?”
[129] C. Fan, Y. Zhang, Y. Pan, X. Li, C. Zhang, R. Yuan, D. Wu, arXiv preprint arXiv:2403.11144, 2024.
W. Wang, J. Pei, and H. Huang, “Multi-horizon time series [150] Z. Liu, Y. Wang, S. Vaidya, F. Ruehle, J. Halverson, M. Soljačić,
forecasting with temporal attention learning,” in Proceedings of the T. Y. Hou, and M. Tegmark, “Kan: Kolmogorov-arnold net-
25th ACM SIGKDD International conference on knowledge discovery works,” arXiv preprint arXiv:2404.19756, 2024.
& data mining, 2019, pp. 2527–2535. [151] C. J. Vaca-Rubio, L. Blanco, R. Pereira, and M. Caus,
[130] Y. Wang, A. Smola, D. Maddix, J. Gasthaus, D. Foster, and “Kolmogorov-arnold networks (kans) for time series analysis,”
T. Januschowski, “Deep factors for forecasting,” in International arXiv preprint arXiv:2405.08790, 2024.
conference on machine learning. PMLR, 2019, pp. 6607–6617. [152] R. Genet and H. Inzirillo, “A temporal kolmogorov-arnold
[131] S. Smyl, “A hybrid method of exponential smoothing and re- transformer for time series forecasting,” arXiv preprint
current neural networks for time series forecasting,” International arXiv:2406.02486, 2024.
journal of forecasting, vol. 36, no. 1, pp. 75–85, 2020. [153] Y. Liu, H. Zhang, C. Li, X. Huang, J. Wang, and M. Long, “Timer:
generative pre-trained transformers are large time series mod-
[132] S. Lin, W. Lin, W. Wu, F. Zhao, R. Mo, and H. Zhang, “Segrnn:
els,” in Proceedings of the 41st International Conference on Machine
Segment recurrent neural network for long-term time series
Learning, 2024, pp. 32 369–32 399.
forecasting,” arXiv preprint arXiv:2308.11200, 2023.
[154] Y. Liu, G. Qin, X. Huang, J. Wang, and M. Long, “Autotimes:
[133] J. Cheng, K. Huang, and Z. Zheng, “Towards better forecasting Autoregressive time series forecasters via large language mod-
by fusing near and distant future visions,” in Proceedings of the els,” Advances in Neural Information Processing Systems, vol. 37,
AAAI Conference on Artificial Intelligence, vol. 34, no. 04, 2020, pp. pp. 122 154–122 184, 2025.
3593–3600.
[155] B. Cai, S. Yang, L. Gao, and Y. Xiang, “Hybrid variational autoen-
[134] H. Wang, J. Peng, F. Huang, J. Wang, J. Chen, and Y. Xiao, “Micn: coder for time series forecasting,” Knowledge-Based Systems, vol.
Multi-scale local and global context modeling for long-term series 281, p. 111079, 2023.
forecasting,” in The eleventh international conference on learning [156] J. Yoon, D. Jarrett, and M. Van der Schaar, “Time-series generative
representations, 2023. adversarial networks,” Advances in neural information processing
[135] M. Liu, A. Zeng, M. Chen, Z. Xu, Q. Lai, L. Ma, and Q. Xu, systems, vol. 32, 2019.
“Scinet: Time series modeling and forecasting with sample con- [157] N. Kim, D.-K. Chae, J. A. Shin, S.-W. Kim, D. H. Chau, and
volution and interaction,” Advances in Neural Information Process- S. Park, “Context-aware traffic flow forecasting in new roads,” in
ing Systems, vol. 35, pp. 5816–5828, 2022. Proceedings of the 31st ACM International Conference on Information
[136] Z. Li, Y. Qin, X. Cheng, and Y. Tan, “Ftmixer: Frequency and time & Knowledge Management, 2022, pp. 4133–4137.
domain representations fusion for time series modeling,” arXiv [158] Y. Zhang, Y. Li, X. Zhou, X. Kong, and J. Luo, “Curb-gan:
preprint arXiv:2405.15256, 2024. Conditional urban traffic estimation through spatio-temporal
[137] Z. Yue, Y. Wang, J. Duan, T. Yang, C. Huang, Y. Tong, and B. Xu, generative adversarial networks,” in Proceedings of the 26th ACM
“Ts2vec: Towards universal representation of time series,” in SIGKDD International Conference on Knowledge Discovery & Data
Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, Mining, 2020, pp. 842–852.
no. 8, 2022, pp. 8980–8987. [159] E. Tamir, N. Laabid, M. Heinonen, V. Garg, and A. Solin,
[138] S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, and X. Yan, “Conditional flow matching for time series modelling,” ICML
“Enhancing the locality and breaking the memory bottleneck 2024 Workshop on Structured Probabilistic Inference and Generative
of transformer on time series forecasting,” Advances in neural Modeling, 2023.
information processing systems, vol. 32, 2019. [160] X. N. Zhang, Y. Pu, Y. Kawamura, A. Loza, Y. Bengio, D. Shung,
[139] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: and A. Tong, “Trajectory flow matching with applications to
Frequency enhanced decomposed transformer for long-term se- clinical time series modelling,” Advances in Neural Information
ries forecasting,” in International conference on machine learning. Processing Systems, vol. 37, pp. 107 198–107 224, 2025.
PMLR, 2022, pp. 27 268–27 286. [161] Y. Hu, X. Wang, L. Wu, H. Zhang, S. Z. Li, S. Wang, and T. Chen,
[140] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “Fm-ts: Flow matching for time series generation,” arXiv preprint
“itransformer: Inverted transformers are effective for time series arXiv:2411.07506, 2024.
forecasting,” in The Twelfth International Conference on Learning [162] K. Rasul, C. Seward, I. Schuster, and R. Vollgraf, “Autoregressive
Representations, 2024. denoising diffusion models for multivariate probabilistic time
[141] B. N. Oreshkin, D. Carpov, N. Chapados, and Y. Bengio, “N-beats: series forecasting,” in International Conference on Machine Learning.
Neural basis expansion analysis for interpretable time series fore- PMLR, 2021, pp. 8857–8868.
casting,” in International Conference on Learning Representations, [163] M. Kollovieh, A. F. Ansari, M. Bohlke-Schneider, J. Zschiegner,
2020. H. Wang, and Y. B. Wang, “Predict, refine, synthesize: Self-
guiding diffusion models for probabilistic time series forecast-
[142] Y. Liu, C. Li, J. Wang, and M. Long, “Koopa: Learning non-
ing,” Advances in Neural Information Processing Systems, vol. 36,
stationary time series dynamics with koopman predictors,” Ad-
2024.
vances in Neural Information Processing Systems, vol. 36, 2024.
[164] T. Yan, H. Zhang, T. Zhou, Y. Zhan, and Y. Xia, “Score-
[143] Z. Xu, A. Zeng, and Q. Xu, “FITS: Modeling time series with grad: Multivariate probabilistic time series forecasting with
$10k$ parameters,” in The Twelfth International Conference on continuous energy-based generative models,” arXiv preprint
Learning Representations, 2024. arXiv:2106.10121, 2021.
[144] S. Lin, W. Lin, W. Wu, H. Chen, and J. Yang, “SparseTSF: Mod- [165] L. Shen and J. T. Kwok, “Non-autoregressive conditional dif-
eling long-term time series forecasting with *1k* parameters,” in fusion models for time series prediction,” in Proceedings of the
Forty-first International Conference on Machine Learning, 2024. 40th International Conference on Machine Learning (ICML), 2023,
[145] S. Wang, H. Wu, X. Shi, T. Hu, H. Luo, L. Ma, J. Y. Zhang, pp. 31 016–31 029.
and J. ZHOU, “Timemixer: Decomposable multiscale mixing for [166] J. Zhang, M. Cheng, X. Tao, Z. Liu, and D. Wang, “Fdf: Flexi-
time series forecasting,” in The Twelfth International Conference on ble decoupled framework for time series forecasting with con-
Learning Representations, 2024. ditional denoising and polynomial modeling,” arXiv preprint
[146] S. S. Rangapuram, M. W. Seeger, J. Gasthaus, L. Stella, Y. Wang, arXiv:2410.13253, 2024.
and T. Januschowski, “Deep state space models for time series [167] S. Tonekaboni, D. Eytan, and A. Goldenberg, “Unsupervised rep-
forecasting,” Advances in neural information processing systems, resentation learning for time series with temporal neighborhood
vol. 31, 2018. coding,” in International Conference on Learning Representations,
[147] M. Zhang, K. K. Saab, M. Poli, T. Dao, K. Goel, and C. Re, “Ef- 2021.
fectively modeling time series with simple discrete state spaces,” [168] E. Eldele, M. Ragab, Z. Chen, M. Wu, C. K. Kwoh, X. Li, and
in The Eleventh International Conference on Learning Representations, C. Guan, “Time-series representation learning via temporal and
2023. contextual contrasting,” arXiv preprint arXiv:2106.14112, 2021.
PREPRINT 26
[169] X. Zhang, Z. Zhao, T. Tsiligkaridis, and M. Zitnik, “Self- ries for cuffless blood pressure estimation,” npj Digital Medicine,
supervised contrastive pre-training for time series via time- vol. 6, no. 1, p. 110, 2023.
frequency consistency,” Advances in Neural Information Processing [190] F. M. Abushaqra, H. Xue, Y. Ren, and F. D. Salim, “Seqlink: A
Systems, vol. 35, pp. 3988–4003, 2022. robust neural-ode architecture for modelling partially observed
[170] G. Zerveas, S. Jayaraman, D. Patel, A. Bhamidipaty, and C. Eick- time series,” Transactions on Machine Learning Research, 2024.
hoff, “A transformer-based framework for multivariate time [191] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed
series representation learning,” in Proceedings of the 27th ACM neural networks: A deep learning framework for solving forward
SIGKDD conference on knowledge discovery & data mining, 2021, pp. and inverse problems involving nonlinear partial differential
2114–2124. equations,” Journal of Computational physics, vol. 378, pp. 686–707,
[171] M. Cheng, Q. Liu, Z. Liu, H. Zhang, R. Zhang, and E. Chen, 2019.
“Timemae: Self-supervised representations of time series with de- [192] B. Huang and J. Wang, “Applications of physics-informed neural
coupled masked autoencoders,” arXiv preprint arXiv:2303.00320, networks in power systems-a review,” IEEE Transactions on Power
2023. Systems, vol. 38, no. 1, pp. 572–588, 2022.
[172] J. Dong, H. Wu, H. Zhang, L. Zhang, J. Wang, and M. Long, [193] A. Bracco, J. Brajard, H. A. Dijkstra, P. Hassanzadeh, C. Lessig,
“Simmtm: A simple pre-training framework for masked time- and C. Monteleoni, “Machine learning for the physics of climate,”
series modeling,” Advances in Neural Information Processing Sys- Nature Reviews Physics, pp. 1–15, 2024.
tems, vol. 36, 2024. [194] R. Dang-Nhu, G. Singh, P. Bielik, and M. Vechev, “Adversarial
[173] D. Wang, M. Cheng, Z. Liu, Q. Liu, and E. Chen, “Timedart: attacks on probabilistic autoregressive forecasting models,” in
A diffusion autoregressive transformer for self-supervised time International Conference on Machine Learning. PMLR, 2020, pp.
series representation,” arXiv preprint arXiv:2410.05711, 2024. 2356–2365.
[174] X. Jin, Y. Park, D. Maddix, H. Wang, and Y. Wang, “Domain [195] L. Liu, Y. Park, T. N. Hoang, H. Hasson, and L. Huan, “Ro-
adaptation for time series forecasting via attention sharing,” in bust multivariate time-series forecasting: Adversarial attacks and
International Conference on Machine Learning. PMLR, 2022, pp. defense mechanisms,” in The Eleventh International Conference on
10 280–10 297. Learning Representations, 2023.
[175] B. Wang, J. Ma, P. Wang, X. Wang, Y. Zhang, Z. Zhou, and [196] F. Liu, W. Zhang, and H. Liu, “Robust spatiotemporal traffic
Y. Wang, “Stone: A spatio-temporal ood learning framework kills forecasting with reinforced dynamic adversarial training,” in
both spatial and temporal shifts,” in Proceedings of the 30th ACM Proceedings of the 29th ACM SIGKDD Conference on Knowledge
SIGKDD Conference on Knowledge Discovery and Data Mining, 2024, Discovery and Data Mining, 2023, pp. 1417–1428.
pp. 2948–2959. [197] X. Lin, Z. Liu, D. Fu, R. Qiu, and H. Tong, “Backtime: Backdoor
[176] H. Liu, H. Kamarthi, L. Kong, Z. Zhao, C. Zhang, and B. A. attacks on multivariate time series forecasting,” in The Thirty-
Prakash, “Time-series forecasting for out-of-distribution gener- eighth Annual Conference on Neural Information Processing Systems,
alization using invariant learning,” in Forty-first International 2024.
Conference on Machine Learning, 2024. [198] C. Meng, S. Rambhatla, and Y. Liu, “Cross-node federated graph
[177] T. Zhou, P. Niu, L. Sun, R. Jin et al., “One fits all: Power general neural network for spatio-temporal data modeling,” in Proceed-
time series analysis by pretrained lm,” Advances in neural infor- ings of the 27th ACM SIGKDD conference on knowledge discovery &
mation processing systems, vol. 36, pp. 43 322–43 355, 2023. data mining, 2021, pp. 1202–1211.
[199] S. Chen, G. Long, T. Shen, and J. Jiang, “Prompt federated
[178] X. Liu, J. Hu, Y. Li, S. Diao, Y. Liang, B. Hooi, and R. Zim-
learning for weather forecasting: toward foundation models on
mermann, “Unitime: A language-empowered unified model for
meteorological data,” in Proceedings of the Thirty-Second Interna-
cross-domain time series forecasting,” in Proceedings of the ACM
tional Joint Conference on Artificial Intelligence, 2023, pp. 3532–3540.
on Web Conference 2024, 2024, pp. 4095–4106.
[200] Q. Liu, X. Liu, C. Liu, Q. Wen, and Y. Liang, “Time-ffm: Towards
[179] C. Chang, W.-C. Peng, and T.-F. Chen, “Llm4ts: Two-stage fine-
lm-empowered federated foundation model for time series fore-
tuning for time-series forecasting with pre-trained llms,” arXiv
casting,” arXiv preprint arXiv:2405.14252, 2024.
preprint arXiv:2308.08469, 2023.
[201] H. Drucker, C. J. Burges, L. Kaufman, A. Smola, and V. Vapnik,
[180] D. Cao, F. Jia, S. O. Arik, T. Pfister, Y. Zheng, W. Ye, and Y. Liu, “Support vector regression machines,” Advances in neural infor-
“TEMPO: Prompt-based generative pre-trained transformer for mation processing systems, vol. 9, 1996.
time series forecasting,” in The Twelfth International Conference on [202] C. Cortes, “Support-vector networks,” Machine Learning, 1995.
Learning Representations, 2024.
[203] A. J. Smola and B. Schölkopf, “A tutorial on support vector
[181] N. Gruver, M. Finzi, S. Qiu, and A. G. Wilson, “Large language regression,” Statistics and computing, vol. 14, pp. 199–222, 2004.
models are zero-shot time series forecasters,” Advances in Neural [204] L. BREIMAN, “Classification and regression trees,” Monterey, CA:
Information Processing Systems, vol. 36, 2024. Wadsworth and Brools, 1984.
[182] M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, P.-Y. Chen, [205] L. Breiman, “Random forests,” Machine learning, vol. 45, pp. 5–32,
Y. Liang, Y.-F. Li, S. Pan et al., “Time-llm: Time series forecast- 2001.
ing by reprogramming large language models,” arXiv preprint [206] J. H. Friedman, “Greedy function approximation: a gradient
arXiv:2310.01728, 2023. boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
[183] C. Sun, H. Li, Y. Li, and S. Hong, “TEST: Text prototype aligned [207] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting sys-
embedding to activate LLM’s ability for time series,” in The tem,” in Proceedings of the 22nd acm sigkdd international conference
Twelfth International Conference on Learning Representations, 2024. on knowledge discovery and data mining, 2016, pp. 785–794.
[184] M. Goswami, K. Szafer, A. Choudhry, Y. Cai, S. Li, [208] G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-
and A. Dubrawski, “Moment: A family of open time- Y. Liu, “Lightgbm: A highly efficient gradient boosting decision
series foundation models,” 2024. [Online]. Available: https: tree,” Advances in neural information processing systems, vol. 30,
//[Link]/abs/2402.03885 2017.
[185] A. Das, W. Kong, R. Sen, and Y. Zhou, “A decoder-only [209] T. Cover and P. Hart, “Nearest neighbor pattern classification,”
foundation model for time-series forecasting,” arXiv preprint IEEE transactions on information theory, vol. 13, no. 1, pp. 21–27,
arXiv:2310.10688, 2023. 1967.
[186] S. Johansen, “Estimation and hypothesis testing of cointegration [210] H. Hewamalage, C. Bergmeir, and K. Bandara, “Recurrent neural
vectors in gaussian vector autoregressive models,” Econometrica: networks for time series forecasting: Current status and future
journal of the Econometric Society, pp. 1551–1580, 1991. directions,” International Journal of Forecasting, vol. 37, no. 1, pp.
[187] L. Song, M. Kolar, and E. Xing, “Time-varying dynamic bayesian 388–427, 2021.
networks,” Advances in neural information processing systems, [211] R. Wen, K. Torkkola, B. Narayanaswamy, and D. Madeka,
vol. 22, 2009. “A multi-horizon quantile recurrent forecaster,” arXiv preprint
[188] P. Cui, Z. Shen, S. Li, L. Yao, Y. Li, Z. Chu, and J. Gao, “Causal arXiv:1711.11053, 2017.
inference meets machine learning,” in Proceedings of the 26th [212] Y. Bengio, P. Simard, and P. Frasconi, “Learning long-term de-
ACM SIGKDD international conference on knowledge discovery & pendencies with gradient descent is difficult,” IEEE transactions
data mining, 2020, pp. 3527–3528. on neural networks, vol. 5, no. 2, pp. 157–166, 1994.
[189] K. Sel, A. Mohammadi, R. I. Pettigrew, and R. Jafari, “Physics- [213] R. Pascanu, “On the difficulty of training recurrent neural net-
informed neural networks for modeling physiological time se- works,” arXiv preprint arXiv:1211.5063, 2013.
PREPRINT 27
[214] B. Peng, E. Alcaide, Q. Anthony, A. Albalak, S. Arcadinho, S. Biderman, H. Cao, X. Cheng, M. Chung, M. Grella et al., “Rwkv: Reinventing rnns for the transformer era,” arXiv preprint arXiv:2305.13048, 2023.
[215] M. Beck, K. Pöppel, M. Spanring, A. Auer, O. Prudnikova, M. Kopp, G. Klambauer, J. Brandstetter, and S. Hochreiter, “xlstm: Extended long short-term memory,” arXiv preprint arXiv:2405.04517, 2024.
[216] Y. LeCun, Y. Bengio et al., “Convolutional networks for images, speech, and time series,” The Handbook of Brain Theory and Neural Networks, vol. 3361, 1995.
[217] S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv preprint arXiv:1803.01271, 2018.
[218] S. Huang, D. Wang, X. Wu, and A. Tang, “Dsanet: Dual self-attention network for multivariate time series forecasting,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2129–2132.
[219] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[220] S. Liu, H. Yu, C. Liao, J. Li, W. Lin, A. X. Liu, and S. Dustdar, “Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting,” in International Conference on Learning Representations, 2022.
[221] Y. Liu, H. Wu, J. Wang, and M. Long, “Non-stationary transformers: Exploring the stationarity in time series forecasting,” Advances in Neural Information Processing Systems, vol. 35, pp. 9881–9893, 2022.
[222] G. Woo, C. Liu, D. Sahoo, A. Kumar, and S. Hoi, “Etsformer: Exponential smoothing transformers for time-series forecasting,” arXiv preprint arXiv:2202.01381, 2022.
[223] J. Jiang, C. Han, W. X. Zhao, and J. Wang, “Pdformer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 4, 2023, pp. 4365–4373.
[224] R. Ilbert, A. Odonnat, V. Feofanov, A. Virmaux, G. Paolo, T. Palpanas, and I. Redko, “Samformer: unlocking the potential of transformers in time series forecasting with sharpness-aware minimization and channel-wise attention,” in Proceedings of the 41st International Conference on Machine Learning, 2024, pp. 20924–20954.
[225] A. Gu and T. Dao, “Mamba: Linear-time sequence modeling with selective state spaces,” arXiv preprint arXiv:2312.00752, 2023.
[226] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
[227] D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
[228] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” Advances in Neural Information Processing Systems, vol. 27, 2014.
[229] D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in International Conference on Machine Learning. PMLR, 2015, pp. 1530–1538.
[230] Z. Wang, Q. Wen, C. Zhang, L. Sun, and Y. Wang, “Diffload: uncertainty quantification in load forecasting with diffusion model,” arXiv preprint arXiv:2306.01001, 2023.
[231] P. Chang, H. Li, S. F. Quan, S. Lu, S.-F. Wung, J. Roveda, and A. Li, “Tdstf: Transformer-based diffusion probabilistic model for sparse time series forecasting,” arXiv preprint arXiv:2301.06625, 2023.
[232] N. Neifar, A. Ben-Hamadou, A. Mdhaffar, and M. Jmaiel, “Diffecg: A versatile probabilistic diffusion model for ecg signals synthesis,” arXiv preprint arXiv:2306.01875, 2023.
[233] S. Feng, C. Miao, Z. Zhang, and P. Zhao, “Latent diffusion transformer for probabilistic time series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 11, 2024, pp. 11979–11987.
[234] P. Shao, J. Feng, J. Lu, P. Zhang, and C. Zou, “Data-driven and knowledge-guided denoising diffusion model for flood forecasting,” Expert Systems with Applications, vol. 244, p. 122908, 2024.
[235] D. Daiya, M. Yadav, and H. S. Rao, “Diffstock: Probabilistic relational stock market predictions using diffusion models,” in ICASSP 2024–2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2024, pp. 7335–7339.
[236] S. Li, H. Xiong, and Y. Chen, “Diffplf: A conditional diffusion model for probabilistic forecasting of ev charging load,” arXiv preprint arXiv:2402.13548, 2024.
[237] M. Ragab, E. Eldele, Z. Chen, M. Wu, C.-K. Kwoh, and X. Li, “Self-supervised autoregressive domain adaptation for time series data,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 1, pp. 1341–1351, 2022.
[238] S. Mirchandani, F. Xia, P. Florence, B. Ichter, D. Driess, M. G. Arenas, K. Rao, D. Sadigh, and A. Zeng, “Large language models as general pattern machines,” arXiv preprint arXiv:2307.04721, 2023.
[239] Y. Wang, Z. Chu, X. Ouyang, S. Wang, H. Hao, Y. Shen, J. Gu, S. Xue, J. Y. Zhang, Q. Cui et al., “Enhancing recommender systems with large language model reasoning graphs,” arXiv preprint arXiv:2308.10835, 2023.
[240] F. Jia, K. Wang, Y. Zheng, D. Cao, and Y. Liu, “Gpt4mts: Prompt-based large language model for multimodal time-series forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 21, 2024, pp. 23343–23351.
[241] E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in International Conference on Learning Representations, 2022.
[242] H. Liu, Z. Zhao, J. Wang, H. Kamarthi, and B. A. Prakash, “Lstprompt: Large language models as zero-shot time series forecasters by long-short-term prompting,” arXiv preprint arXiv:2402.16132, 2024.
[243] Y. Zhu, Z. Wang, J. Gao, Y. Tong, J. An, W. Liao, E. M. Harrison, L. Ma, and C. Pan, “Prompting large language models for zero-shot clinical prediction with structured longitudinal electronic health record data,” arXiv preprint arXiv:2402.01713, 2024.
[244] H. Zhang, C. Xu, Y.-F. Zhang, Z. Zhang, L. Wang, J. Bian, and T. Tan, “Timeraf: Retrieval-augmented foundation model for zero-shot time series forecasting,” arXiv preprint arXiv:2412.20810, 2024.
[245] M. Xiao, Z. Jiang, Z. Chen, D. Li, S. Chen, S. Ananiadou, J. Huang, M. Peng, and Q. Xie, “Timerag: It’s time for retrieval-augmented generation in time-series forecasting.”
[246] J. Wang, M. Cheng, Q. Mao, Q. Liu, F. Xu, X. Li, and E. Chen, “Tabletime: Reformulating time series classification as zero-shot table understanding via large language models,” arXiv preprint arXiv:2411.15737, 2024.
[247] M. Lin, Z. Chen, Y. Liu, X. Zhao, Z. Wu, J. Wang, X. Zhang, S. Wang, and H. Chen, “Decoding time series with llms: A multi-agent framework for cross-domain annotation,” arXiv preprint arXiv:2410.17462, 2024.
[248] M. A. Merrill, M. Tan, V. Gupta, T. Hartvigsen, and T. Althoff, “Language models still struggle to zero-shot reason about time series,” arXiv preprint arXiv:2404.11757, 2024.
[249] C. Glymour, K. Zhang, and P. Spirtes, “Review of causal discovery methods based on graphical models,” Frontiers in Genetics, vol. 10, p. 524, 2019.
[250] J. Tian and J. Pearl, “Causal discovery from changes,” arXiv preprint arXiv:1301.2312, 2013.
[251] C. K. Assaad, E. Devijver, and E. Gaussier, “Survey and evaluation of causal discovery methods for time series,” Journal of Artificial Intelligence Research, vol. 73, pp. 767–819, 2022.
[252] C. Miller, A. Kathirgamanathan, B. Picchetti, P. Arjunan, J. Y. Park, Z. Nagy, P. Raftery, B. W. Hobson, Z. Shi, and F. Meggers, “The building data genome project 2, energy meter data from the ashrae great energy predictor iii competition,” Scientific Data, vol. 7, no. 1, p. 368, 2020.
[253] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, J. Albrecht et al., “Smart*: An open data set and tools for enabling research in sustainable homes,” SustKDD, August, vol. 111, no. 112, p. 108, 2012.
[254] D. A. Bashawyah and S. M. Qaisar, “Machine learning based short-term load forecasting for smart meter energy consumption data in london households,” in 2021 IEEE 12th International Conference on Electronics and Information Technologies (ELIT). IEEE, 2021, pp. 99–102.
[255] J. Zhou, X. Lu, Y. Xiao, J. Su, J. Lyu, Y. Ma, and D. Dou, “Sdwpf: A dataset for spatial dynamic wind power forecasting challenge at kdd cup 2022,” arXiv preprint arXiv:2208.04360, 2022.
[256] A. Alexandrov, K. Benidis, M. Bohlke-Schneider, V. Flunkert, J. Gasthaus, T. Januschowski, D. C. Maddix, S. Rangapuram, D. Salinas, J. Schulz et al., “Gluonts: Probabilistic and neural time series modeling in python,” Journal of Machine Learning Research, vol. 21, no. 116, pp. 1–6, 2020.
[257] J. Wang, J. Jiang, W. Jiang, C. Li, and W. X. Zhao, “Libcity: An open library for traffic prediction,” in Proceedings of the 29th International Conference on Advances in Geographic Information Systems, 2021, pp. 145–148.
[258] Google, “Web traffic time series forecasting,” [Link]/c/web-traffic-time-series-forecasting, 2017.
[259] H. V. Jagadish, J. Gehrke, A. Labrinidis, Y. Papakonstantinou, J. M. Patel, R. Ramakrishnan, and C. Shahabi, “Big data and its technical challenges,” Communications of the ACM, vol. 57, no. 7, pp. 86–94, 2014.
[260] S. Mouatadid, P. Orenstein, G. Flaspohler, M. Oprescu, J. Cohen, F. Wang, S. Knight, M. Geogdzhayeva, S. Levang, E. Fraenkel et al., “Subseasonalclimateusa: a dataset for subseasonal forecasting and benchmarking,” Advances in Neural Information Processing Systems, vol. 36, 2024.
[261] X. Qiu, J. Hu, L. Zhou, X. Wu, J. Du, B. Zhang, C. Guo, A. Zhou, C. S. Jensen, Z. Sheng et al., “Tfb: Towards comprehensive and fair benchmarking of time series forecasting methods,” Proceedings of the VLDB Endowment, vol. 17, no. 9, pp. 2363–2377, 2024.
[262] Y. Zheng, X. Yi, M. Li, R. Li, Z. Shan, E. Chang, and T. Li, “Forecasting fine-grained air quality based on big data,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 2267–2276.
[263] S. Chen, “Beijing Multi-Site Air Quality,” UCI Machine Learning Repository, 2017, DOI: [Link]
[264] S. Makridakis, A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, and R. Winkler, “The accuracy of extrapolation (time series) methods: Results of a forecasting competition,” Journal of Forecasting, vol. 1, no. 2, pp. 111–153, 1982.
[265] S. Makridakis and M. Hibon, “The m3-competition: results, conclusions and implications,” International Journal of Forecasting, vol. 16, no. 4, pp. 451–476, 2000.
[266] S. Makridakis, E. Spiliotis, and V. Assimakopoulos, “The m4 competition: Results, findings, conclusion and way forward,” International Journal of Forecasting, vol. 34, no. 4, pp. 802–808, 2018.
[267] ——, “M5 accuracy competition: Results, findings, and conclusions,” International Journal of Forecasting, vol. 38, no. 4, pp. 1346–1364, 2022.
[268] S. B. Taieb, G. Bontempi, A. F. Atiya, and A. Sorjamaa, “A review and comparison of strategies for multi-step ahead time series forecasting based on the nn5 forecasting competition,” Expert Systems with Applications, vol. 39, no. 8, pp. 7067–7083, 2012.
[269] G. Athanasopoulos, R. J. Hyndman, H. Song, and D. C. Wu, “The tourism forecasting competition,” International Journal of Forecasting, vol. 27, no. 3, pp. 822–844, 2011.
[270] W. G. van Panhuis, A. Cross, and D. S. Burke, “Project tycho 2.0: a repository to improve the integration and reuse of data for global population health,” Journal of the American Medical Informatics Association, vol. 25, no. 12, pp. 1608–1617, 2018.
[271] F. Piccialli, F. Giampaolo, E. Prezioso, D. Camacho, and G. Acampora, “Artificial intelligence and healthcare: Forecasting of medical bookings through multi-source time-series fusion,” Information Fusion, vol. 74, pp. 1–16, 2021.
[272] J. Xie and Q. Wang, “Benchmarking machine learning algorithms on blood glucose prediction for type i diabetes in comparison with classical time-series models,” IEEE Transactions on Biomedical Engineering, vol. 67, no. 11, pp. 3101–3124, 2020.
[273] E. Hwang, Y.-S. Park, J.-Y. Kim, S.-H. Park, J. Kim, and S.-H. Kim, “Intraoperative hypotension prediction based on features automatically generated within an interpretable deep learning model,” IEEE Transactions on Neural Networks and Learning Systems, 2023.
[274] F. Lu, W. Li, Z. Zhou, C. Song, Y. Sun, Y. Zhang, Y. Ren, X. Liao, H. Jin, A. Luo et al., “A composite multi-attention framework for intraoperative hypotension early warning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, no. 12, 2023, pp. 14374–14381.
[275] M. Cheng, J. Zhang, Z. Liu, C. Liu, and Y. Xie, “Hmf: A hybrid multi-factor framework for dynamic intraoperative hypotension prediction,” arXiv preprint arXiv:2409.11064, 2024.
[276] C.-Y. Hsu and W.-C. Liu, “Multiple time-series convolutional neural network for fault detection and diagnosis and empirical study in semiconductor manufacturing,” Journal of Intelligent Manufacturing, vol. 32, no. 3, pp. 823–836, 2021.
[277] H. Ben Ameur, S. Boubaker, Z. Ftiti, W. Louhichi, and K. Tissaoui, “Forecasting commodity prices: empirical evidence using deep learning tools,” Annals of Operations Research, vol. 339, no. 1, pp. 349–367, 2024.
[278] W. Kong, H. Li, C. Yu, J. Xia, Y. Kang, and P. Zhang, “A deep spatio-temporal forecasting model for multi-site weather prediction post-processing,” Communications in Computational Physics, vol. 31, no. 1, pp. 131–153, 2022.
[279] H. Wu, H. Zhou, M. Long, and J. Wang, “Interpretable weather forecasting for worldwide stations with a unified deep model,” Nature Machine Intelligence, vol. 5, no. 6, pp. 602–611, 2023.
[280] S. Guo, Y. Lin, N. Feng, C. Song, and H. Wan, “Attention based spatial-temporal graph convolutional networks for traffic flow forecasting,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 922–929.
[281] J. Fan, W. Weng, H. Tian, H. Wu, F. Zhu, and J. Wu, “Rgdan: A random graph diffusion attention network for traffic prediction,” Neural Networks, vol. 172, p. 106093, 2024.