Electricity Price Thresholding and Classification

Hamidreza Zareipour, Senior Member, IEEE, Arya Janjani, Member, IEEE, Henry Leung, Member, IEEE, Amir Motamedi, Student Member, IEEE, and Anthony Schellenberg, Member, IEEE

This work was financially supported by NSERC Canada and the University of Calgary's Research Grant Committee (URGC). The authors are with the Department of Electrical and Computer Engineering, University of Calgary, AB, Canada (e-mail: h.zareipour, janjania, leungh, amotamed, [email protected]).

Abstract—With the advent of smart grids, electricity consumers are expected to be enabled to respond to electricity price fluctuations. Thus, forecasting short-term future electricity prices is a key component of optimal demand-side management in an enabled competitive environment. While several price forecasting approaches are available, achieving low forecasting errors is not always possible, especially for markets with high price volatility. On the other hand, for demand-side management applications the exact value of future prices is not primarily required; rather, whether prices hit certain thresholds is the basis of making planning decisions. In this paper, classification of future electricity market prices with respect to pre-specified price thresholds is investigated. Two alternative models based on support vector machines are proposed in a multi-class, multi-step-ahead price classification context. Numerical results are provided for price classification in Ontario's and Alberta's markets.

Index Terms—Demand-side management, smart grid, forecasting, classification.
I. INTRODUCTION

DEMAND-side participation in the operation of modern electric energy systems, especially from the residential and commercial sectors, is expected to improve energy efficiency, supply reliability and operational flexibility [1], [2]. In a smart electricity grid environment, demand-side resources will be enabled to participate in electricity markets through an advanced metering and control infrastructure. Enabled customers will have the choice to reduce their energy costs through energy conservation and load shifting by switching loads on/off depending on the dynamics of electricity prices [3]. The on/off decisions are usually made when electricity prices exceed a certain threshold. For example, a consumer may decide to turn off a certain load if prices are over $300/MWh, and cut the load even further if prices are above $500/MWh. Thus, forecasting the future prices of electricity with respect to predefined thresholds is of interest.

Forecasting electricity prices in competitive electricity markets has been widely addressed in the literature; see recent reviews in [4], [5]. This has been driven by the critical role that electricity prices play in the bidding strategy, decision making, and operation scheduling of wholesale customers on both the demand and supply sides [6]. Available price forecasting methods are in fact "point forecasters" or "numerical forecasters". In other words, these methods try to approximately model the underlying process of price formation in a market and use the model to forecast the exact value of future prices [7]-[10]. However, prices in electricity markets are influenced by various unpredictable events (e.g., forced outages) and by the complex and irregular bidding strategies of market players. As such, while relatively low point price forecasting errors, in the order of 5%, have been reported in some cases, price forecasting errors are usually high, up to 36%.

The available point price forecasting methods usually forecast the exact value of prices at a given time in the future. Alternatively, classification of electricity prices with respect to user-defined thresholds [11] is discussed in the present paper. The premise of price classification is to label prices in different classes, instead of trying to model price behaviour. In other words, price classification is based on pattern recognition, whereas point price forecasting is based on function approximation. Thus, the cost of price classification is the loss of exact price information; this information may not be required in various applications, such as demand-side management. However, the benefit is improved classification accuracy, as demonstrated in the present paper.

In general, price classification is particularly useful when it is only required to know whether prices exceed certain thresholds. In price classification, price classes are predicted with respect to pre-specified price thresholds. Price thresholds are defined by users based on their operation and planning objectives. Observe that one way to implement price classification is to label the future unknown prices according to the specified thresholds based on the point estimates given by point forecasting methods. As demonstrated later in this paper, using a direct classifier, such as the ones discussed in this paper, provides more accurate results.

In this paper, the application of two alternative models based on support vector machines (SVMs) to electricity price thresholding is investigated. The classifiers are designed in a multi-class, multi-step-ahead classification context. A cost-based measure is introduced for assessing the economic value, to the users, of improving classification accuracies.

The remainder of this paper is organized as follows: In Section II, a review of the background pertaining to this work is presented. The proposed models are discussed in Section III, followed by the numerical results in Section IV. Finally, the main findings of this paper are summarized in Section V.

II. BACKGROUND REVIEW

When building a data-driven model, two major steps are involved. In the first step, the model inputs which potentially describe the variations of the target variable need to be decided. This step, generally known as feature selection, starts with a large set of potential inputs, and by using a feature selection technique, a smaller and yet informative set of inputs is determined. In the present work, historical price, load
and reserve information are considered in the initial set of features to describe the variations of electricity market prices. The Support Vector Machine Recursive Feature Elimination (SVMRFE) [12] and Kernel-based Feature Vector Selection (KFVS) [13] methods are employed for feature selection, and both are briefly reviewed later in this section. In the next step, a model structure needs to be identified that relates the input data to the target variable. In this paper, SVMs are employed as the core model for classification, and they are also briefly reviewed in this section.

A. SVMRFE Method

The original SVMRFE algorithm is presented in [12]. The steps of this approach are summarized as follows:
1) Consider an initial set of N potential features X = {x_n, n = 1, 2, ..., N, x_n ∈ R}.
2) Train an SVM classifier using X, and find the corresponding model parameters or weights, denoted here by Ω = {ω_{x_1}, ω_{x_2}, ..., ω_{x_N}}, where ω_{x_n} represents the weight associated with feature x_n.
3) Eliminate the features whose normalized absolute weights are less than a defined threshold, to further mitigate the computational costs. The value of this threshold can be anything larger than zero and less than one, depending on how much computational time can be tolerated; the lower the threshold, the more features are carried into the next steps.
4) Using the remaining features, generate a list of features ranked in descending order of the absolute value of their corresponding weights, denoted here by X^(R) = {x_i, x_j, ...}, where ω_{x_i}^2 ≥ ω_{x_j}^2.
5) Form the selected feature set X^(S), starting from X^(S) = {x_i | ω_{x_i} = max(Ω)}. Evaluate the performance of the classification model using X^(S) as the input.
6) Add the next feature in X^(R) to X^(S) and re-evaluate the model performance.
7) If the newly added feature does not improve model accuracy, it is eliminated from X^(S).
8) X^(S) is the final feature set if the stopping criterion is met or there are no more features left in X^(R) to add to X^(S). Otherwise, go to step 6.

Various stopping criteria for feature selection algorithms have been reported, such as a predefined level of classification accuracy or a predefined number of features [12]. In the present work, since roughly 60 minutes elapse between the time the latest hourly price becomes available and the time the class of the next hour's price must be determined, the stopping criterion is the computational time, which is desired to be less than 60 minutes.
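For illustration only, a minimal Python sketch of this ranking-and-growing procedure is given below. It is not the authors' implementation; the function name svmrfe_select, the use of scikit-learn's LinearSVC, the 3-fold cross-validation used to score candidate subsets, and the 0.05 weight threshold are assumptions made for the example.

# Sketch of SVMRFE-style selection: rank features by linear-SVM weight
# magnitude, then grow the selected set, keeping a feature only if it helps.
import time
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def svmrfe_select(X, y, weight_threshold=0.05, max_minutes=60.0):
    """X: (instances, features) array; y: class labels."""
    start = time.time()
    # Step 2: train a linear SVM on all features and take the weight magnitudes.
    svm = LinearSVC(C=1.0, max_iter=10000).fit(X, y)
    w = np.abs(svm.coef_).sum(axis=0)
    w = w / w.max()                                  # normalized absolute weights
    # Step 3: drop features whose normalized weight is below the threshold.
    kept = np.where(w >= weight_threshold)[0]
    # Step 4: rank the remaining features by squared weight, descending.
    ranked = kept[np.argsort(w[kept] ** 2)[::-1]]
    # Steps 5-8: grow the selected set greedily.
    selected = [ranked[0]]
    best = cross_val_score(LinearSVC(max_iter=10000), X[:, selected], y, cv=3).mean()
    for f in ranked[1:]:
        if (time.time() - start) / 60.0 > max_minutes:   # stopping criterion: time budget
            break
        trial = selected + [f]
        score = cross_val_score(LinearSVC(max_iter=10000), X[:, trial], y, cv=3).mean()
        if score > best:                              # keep the feature only if accuracy improves
            selected, best = trial, score
    return selected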
B. KFVS Method

Feature vector selection is necessary when each element of the feature space is itself a vector, as is the case in one of the proposed classification models in this paper. The KFVS method employed in the present work is a customized version of the algorithm presented in [13], in the sense that the original algorithm is a filter method, but it is employed as a wrapper method here. The steps of this method are summarized as follows:
1) Consider an initial set of input features X = {X_k, k = 1, 2, ..., K, X_k ∈ R^ϑ}.
2) Map each feature vector X_k in X into a new feature space Φ = {Φ_k, k = 1, 2, ..., K, Φ_k ∈ R^ζ, ζ ≤ ϑ} using a pre-defined mapping function Φ(X_k).
3) Calculate the fitness values, which in fact are measures of similarity, of each Φ_k ∈ Φ based on the normalized Euclidean distance, as explained in [13], and move the feature vector having the maximum global fitness value and the one having the minimum local fitness value from the set Φ to an originally empty set of selected feature vectors Φ^(S).
4) Evaluate the performance of the classification model using the feature vectors whose mapped equivalents are in Φ^(S) as input.
5) Recalculate the fitness of the feature vectors remaining in Φ and move the feature vector having the lowest local fitness value from Φ to Φ^(S).
6) Re-evaluate the performance of the classification model using the feature vectors whose mapped equivalents are in Φ^(S). The newly added feature vector survives elimination from Φ^(S) only if it improves model accuracy.
7) Φ^(S) is the final feature set if the stopping criterion is met or there are no more features left in Φ. Otherwise, go to step 5.

In the present study, the stopping criterion of total computational time is also added to the original set of stopping criteria in [13]. These two methods resulted in reasonably accurate results at affordable computational costs.
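The following simplified Python sketch conveys the wrapper-style loop above, under strong assumptions: it does not reproduce the kernel-based fitness of [13] (a plain distance between mapped vectors is used as a stand-in), the mapping function is an arbitrary three-number summary, and the names kfvs_like_select, map_vector and the user-supplied X_builder callable are hypothetical.

# Simplified, illustrative feature-vector (day-column) selection loop.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def map_vector(col):
    """Map a 24-hour day vector to a small summary (assumes positive entries)."""
    return np.array([col.mean(), np.exp(np.mean(np.log(col))), col.std()])

def kfvs_like_select(day_columns, X_builder, y, max_vectors=10):
    """day_columns: list of 24-hour vectors; X_builder(days) builds a training matrix."""
    mapped = [map_vector(c) for c in day_columns]
    remaining = list(range(len(day_columns)))
    # Seed with the day whose mapped vector is farthest from the rest (proxy fitness).
    fitness = [np.mean([np.linalg.norm(mapped[i] - mapped[j]) for j in remaining if j != i])
               for i in remaining]
    selected = [remaining.pop(int(np.argmax(fitness)))]
    best = cross_val_score(SVC(), X_builder(selected), y, cv=3).mean()
    while remaining and len(selected) < max_vectors:
        scores = [np.mean([np.linalg.norm(mapped[i] - mapped[s]) for s in selected])
                  for i in remaining]
        cand = remaining.pop(int(np.argmin(scores)))
        trial = selected + [cand]
        acc = cross_val_score(SVC(), X_builder(trial), y, cv=3).mean()
        if acc > best:          # the new vector survives only if it improves accuracy
            selected, best = trial, acc
    return selected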
C. Binary Support Vector Machines

In SVM-based models, the main idea is to find separating hyperplanes that distinguish the different data classes in such a way that the hyperplanes have the maximum possible distance from either of the data sets; the theory of SVMs is explained in detail in [14], [15]. SVMs can achieve reasonably accurate classification results using a small set of training instances compared to other classification techniques [15]. This feature is particularly important in electricity market price classification, considering that regime changes over relatively short periods of time have been observed in electricity markets [16]. Moreover, SVMs are relatively robust against outlier data in the training set because of the way they find the alignment of the hyperplanes [14], [15]. This is important given that outlier prices have been repeatedly observed in electricity market price patterns [17]. In addition, SVMs have previously been applied to several applications with competitive classification accuracy at reasonably low computational time [18]. Therefore, SVMs are employed as the core classifier in the present work.

A binary SVM is a "maximum margin classifier" used to separate a set of training data into two classes while maximizing the margin between the two [14]. Assume {(X_i, Y_i), X_i ∈ R^N, Y_i ∈ {-1, 1}, i ∈ {1, ..., I}} is the set
of I linearly-separable training instances, where X_i is the N-dimensional vector of features and Y_i represents the class of instance i, say Y_i = 1 for one class and Y_i = -1 for the other. The goal of training the SVM is to find two maximum-margin hyper-planes W·X_i + b = 1 and W·X_i + b = -1 which would separate the instances having Y_i = 1 from those having Y_i = -1. It can be shown that 2/||W|| is the margin bounded by the two parallel hyper-planes [15]. To maximize this margin, and to avoid non-convexity, the following equivalent quadratic optimization problem is solved to train the SVM and find the SVM parameters W and b:

\min_{W,b} \; \frac{1}{2}\|W\|^2 + \gamma \sum_{i=1}^{I} \epsilon_i    (1)

s.t. \; Y_i (W \cdot \phi(X_i) + b) \ge 1 - \epsilon_i, \quad i \in \{1, \ldots, I\}    (2)

where φ(X_i) maps the input vector X_i into a high-dimensional space, γ is a penalty factor, and the ε_i terms are slack variables. The mapped input feature and the penalty term are used when the data are linearly nonseparable [14], [15], [18]. Constraint (2) prevents any instance from crossing over the hyper-planes and falling into the margin, and in case one does, this constraint penalizes the objective function accordingly. Optimization problem (1)-(2) may be solved using the original optimal hyper-plane algorithm proposed by Vapnik [14]. However, when the size of the problem is high, the Sequential Minimal Optimization approach proposed in [19] is more computationally efficient.
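As a concrete illustration of (1)-(2), the short Python sketch below trains a soft-margin SVM on synthetic two-class data using scikit-learn's SVC, whose underlying libsvm solver is of the SMO family. The data, the RBF kernel width and the penalty value are arbitrary choices for the example, not settings used in this paper.

# Hedged example: training a soft-margin binary SVM of the form (1)-(2).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
Y = np.array([-1] * 50 + [1] * 50)             # Yi in {-1, +1}

clf = SVC(C=10.0, kernel="rbf", gamma=0.5)     # C plays the role of the penalty factor in (1)
clf.fit(X, Y)

print(clf.predict([[0.0, 0.0], [3.0, 3.0]]))   # predicted class labels
print(clf.decision_function([[1.5, 1.5]]))     # signed value of W.phi(x)+b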
D. Multi-Class Support Vector Machines

In the present work, it is assumed that the forecast user considers M+1 pre-defined price thresholds, T_1, T_2, ..., T_{M+1}, where {p_min = T_1 < ... < T_{M+1} = p_max} and p_min and p_max are the minimum and maximum observed prices. Thus, there would exist M price classes, i.e., c_1, c_2, ..., c_M, corresponding to the price ranges (p_min = T_1 ≤ p_t < T_2), (T_2 ≤ p_t < T_3), ..., (T_M ≤ p_t < T_{M+1} = p_max), which are separated by M-1 classification boundaries. Observe that T_1 and T_{M+1} do not separate any two classes in this context, as they are considered to be at the minimum and maximum price values, respectively.
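To make the thresholding concrete, the following minimal Python sketch labels hourly prices with classes c_1, ..., c_M for a given set of thresholds; the numerical threshold values are the HOEP values used later in Section IV, and the price samples are invented for the example.

# Labeling prices with classes given thresholds T1 < ... < T(M+1).
import numpy as np

thresholds = [-2000.0, 50.0, 100.0, 2000.0]      # T1..T4 -> M = 3 classes
prices = np.array([32.5, 76.0, 410.0, 95.0])     # example hourly prices ($/MWh)

classes = np.digitize(prices, thresholds[1:-1]) + 1   # 1 -> c1, 2 -> c2, 3 -> c3
print(classes)                                    # [1 2 3 2]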
There are three alternative extensions of the binary SVM when classification of the data into more than two classes is of concern. These extensions are as follows.

1) One-Against-All: In this approach, for an M-class problem, M independent classifiers are trained to classify the input data. Each classifier decides whether or not a given training instance belongs to a certain class. The training instance may either be found to be a member of that class, or it will otherwise be assigned as a member of an "others" class. If a given training instance X_i is not found to be a member of any of the classes, it will be assigned to the class for which the value of the decision function W·X_i + b of the classifier is the highest [20].

2) One-Against-One: In this approach, an independent classifier is applied to each possible pair of classes for a given training instance in an M-class problem, and the class that gets the highest number of votes is chosen as the one to which that given candidate input data is assigned. Therefore, for an M-class problem with M-1 decision boundaries, there will be M(M-1)/2 classifiers [20]. Each of these classifiers again distinguishes one of the classes in each pair from the other through use of the bounding classification boundaries. For a large number of classes, this approach is less computationally costly.

3) Single Machine Approach: In this approach, all available classes are considered at the same time. In this model, the mth decision function separates the input data belonging to class m from the other inputs. Therefore, there will be a total of M decision functions, but they will all be obtained by solving only one optimization problem [20], [21]. The single machine extension of (1)-(2) for an M-class problem can be written as follows [20]:

\min_{W,b} \; \frac{1}{2}\sum_{m=1}^{M}\|W_m\|^2 + \gamma \sum_{i=1}^{I}\sum_{m=1}^{M} \epsilon_i^m    (3)

s.t. \; (W_j \cdot X_i + b_j) \ge (W_m \cdot X_i + b_m) + 2 - \epsilon_i^m    (4)

where j is the index of the specific class to which X_i belongs, W_m and b_m are the weight coefficients and bias term of the corresponding separating hyperplane belonging to class m, and i ∈ {1, ..., I}. Since the size of this problem could be very high for large values of M, solution techniques which reduce problem (4) into M sub-problems have been proposed [20]. If more than one class is identified for a given training instance X_i, the class that has the highest decision value will be assigned to X_i.

A comparative study reported in [20] has examined the three alternative multi-class approaches for a wide range of classification applications. While the three methods presented different levels of accuracy for a given set of data in some cases, none of the methods consistently outperformed the others across all applications. In the present work, the three approaches were implemented, but the overall accuracy results were not found to be significantly different; thus, the simulation results presented in the following sections are for the one-against-all approach.
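Since the one-against-all strategy is the one carried forward in this paper, a minimal sketch of its decision rule is shown below. This is a hand-rolled illustration rather than the authors' code or scikit-learn's built-in multi-class handling; the function names and parameters are assumptions.

# One-against-all: M binary SVMs ("class m" vs. "others"); a sample is assigned
# to the class whose decision function W.x+b is largest.
import numpy as np
from sklearn.svm import SVC

def train_one_against_all(X, y, classes):
    models = {}
    for m in classes:
        y_bin = np.where(y == m, 1, -1)          # class m vs. all others
        models[m] = SVC(kernel="rbf", C=10.0).fit(X, y_bin)
    return models

def predict_one_against_all(models, X_new):
    labels = sorted(models)
    scores = np.column_stack([models[m].decision_function(X_new) for m in labels])
    return np.array(labels)[np.argmax(scores, axis=1)]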
III. THE PROPOSED SVM CLASSIFIERS

The present work focuses on 24-hour-ahead classification of hourly electricity market prices. Assume that previous prices up to hour t, i.e., {..., p_{t-1}, p_t}, are available and the objective is to determine price classes for the next 24 hours, i.e., for {p_{t+1}, p_{t+2}, ..., p_{t+24}}. For simplicity, it is assumed that p_t is the price at the last hour of Day d and that the 24 hourly price classes for Day d+1 are to be determined. Linear [7] and nonlinear [22] correlation studies have shown that there is a strong correlation between the price at any given hour and a number of preceding prices. In other words, p_{t+l}, l ∈ {1, 2, ..., 24}, has a strong correlation with p_{t+l-q}, where typical values of q are as follows [7], [9], [22]:

q \in Q = \{1, 2, 3, 23, 24, 25, 47, 48, 49, 72, 73, 96, 97, 120, 121, 144, 145, 168, 336\}    (5)
Thus, these preceding prices, along with other features (e.g., load), have been consistently included in the price models proposed in the literature for numerical electricity market price forecasting [7], [9], [22]. Focusing only on past prices, observe that when forecasting the class of p_{t+l}, l > 1, the actual values of a number of preceding prices are not available (e.g., for the above values of q, and for forecasting p_{t+5}, the actual values of p_{t+4}, p_{t+3} and p_{t+2} are not available). In numerical price forecasting, the unavailable prices are normally replaced by their corresponding forecast values as the best guess. In the present work, however, the output of a classification model is no longer a numerical value, but a class, and thus, unavailable prices cannot be replaced by their corresponding predictions anymore. Keeping in mind the mentioned unavailability of some of the input features in the classification stage, two alternative classification models are proposed in the next sections. In both of the proposed models, such input features are not included in the initial set of features. Also, through various simulations, generating numerical forecasts for the unavailable features was not found to be a competitive alternative for the proposed models. The one-against-all multi-class SVM classifier, discussed in Section II-D1, is the core classifier in all of the proposed models.
A. Base Model (BM): Independent Classifiers for Each Hour

In this model, the price time series {..., p_{t-2}, p_{t-1}, p_t} is broken into the sub-time series {{..., p_{d-1}^{(h=1)}, p_d^{(h=1)}}, ..., {..., p_{d-1}^{(h=24)}, p_d^{(h=24)}}}, where p_d^{(h=j)} represents the price at Hour j on Day d. Twenty-four independent classifiers are trained to predict the class of each of the hourly prices for the next 24 hours. In fact, the 24-hour-ahead classification problem is broken into 24 one-step-ahead classification problems in this model. Thus, to predict the class of p_{d+1}^{(h=j)} using the data from the D previous days, the set of initial features, denoted here by X_{BM}^{(h=j,d+1)}, can be written as:

X_{BM}^{(h=j,d+1)} = \{ p_{d-\delta}^{(h=j)}, \Gamma_{BM}^{(h=j)}, \; \delta = 0, 1, \ldots, (D-1) \}    (6)

where p_{d-δ}^{(h=j)} is the price at Hour j on δ+1 days previous to Day d+1, Γ_{BM}^{(h=j)} represents the sub-set of all other non-price features (e.g., load) for Hour j for this model, and the subscript BM refers to the base model. The disadvantage of this approach is that neither the price autocorrelation information nor the most recent price information is represented in the model. The SVMRFE approach [12] is used to select the final set of features out of X_{BM}^{(h=j,d+1)} for each of the 24 models.
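For illustration, the snippet below sketches how the price part of the base-model feature vector of (6) can be assembled from an hourly price history; the non-price terms in Γ_BM are omitted, and the data layout hourly_prices[d][h] is an assumption made for the example.

# Assembling the base-model feature vector (prices only) for one target hour.
import numpy as np

def base_model_features(hourly_prices, d, j, D=35):
    """Features for predicting the class of the price at Hour j of Day d+1:
    the Hour-j prices of days d, d-1, ..., d-(D-1)."""
    return np.array([hourly_prices[d - delta][j - 1] for delta in range(D)])

# Example (hourly_prices must be provided by the user):
# x = base_model_features(hourly_prices, d=200, j=17)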
B. Model M2: Independent Classifiers for Each Hour Considering Price Autocorrelations

In this model, similar to the base model, the price at each hour is classified independently. Unlike the base model, for which only the historical prices for Hour j from the D previous days were considered, in Model M2 all the 24 hourly prices of the D previous days are taken into account. Thus, the initial feature set for predicting the class of p_{d+1}^{(h=j)} can be written in the following matrix form:

X_{M2}^{(h=j,d+1)} =
\begin{bmatrix}
p_d^{(h=1)} & p_{d-1}^{(h=1)} & \cdots & p_{d-(D-1)}^{(h=1)} \\
p_d^{(h=2)} & p_{d-1}^{(h=2)} & \cdots & p_{d-(D-1)}^{(h=2)} \\
\vdots & \vdots & & \vdots \\
p_d^{(h=24)} & p_{d-1}^{(h=24)} & \cdots & p_{d-(D-1)}^{(h=24)} \\
\Gamma_{d,M2}^{(h=j)} & \Gamma_{d-1,M2}^{(h=j)} & \cdots & \Gamma_{d-(D-1),M2}^{(h=j)}
\end{bmatrix}    (7)

where Γ_{d-δ,M2}^{(h=j)} represents the sub-set of all other non-price features from Day d-δ considered for Hour j. Observe that all 24 hourly prices of the D previous days are common features for all individual hours, i.e., for j = 1, 2, ..., 24, and the difference in the feature sets for different hours lies in Γ_{d-δ,M2}^{(h=j)}. Although the most recent price information is not represented in this model either, the historical price autocorrelation information is represented in the initial set of features. It should be noted that for this model the initial feature set is a 'matrix', compared to the 'vector' of features for the base model. Thus, the task in the feature selection stage for Model M2 is to select the most informative days from the D previous days, i.e., the most informative columns of (7). This retains intra-day price and demand patterns; by doing this, it is ensured that if a certain day is found informative, its full price pattern is represented in the model. Therefore, feature selection in this case is in fact 'feature vector' selection, and hence, the KFVS approach of [13] is used for this model.
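The following short sketch illustrates the matrix-valued feature set of (7) with the non-price rows omitted; again, the data layout and function name are assumptions for the example, not details from the paper.

# Building the 24 x D price matrix of (7): one column per historical day.
import numpy as np

def m2_feature_matrix(hourly_prices, d, D=35):
    """Columns are the 24-hour price profiles of days d, d-1, ..., d-(D-1)."""
    return np.column_stack([hourly_prices[d - delta] for delta in range(D)])

Day (column) selection on this matrix is exactly the feature-vector selection task for which the KFVS procedure of Section II-B is used.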
IV. NUMERICAL RESULTS

Historical data from the Ontario and Alberta electricity markets, which have the most volatile prices in North America [23], [24], are selected for the numerical simulations in this work. Ontario's physical electricity market is a real-time joint energy and reserve market and is cleared every five minutes. The hourly average of cleared energy prices is referred to as the Hourly Ontario Energy Price (HOEP) and applies to most demand- and supply-side wholesale market participants [23]. High numerical forecasting errors have been reported in the literature for the HOEP, i.e., from 16 to 22% [9], [25]. Alberta's market is an energy-only, real-time market which is cleared every minute, and the average of the 60 market clearing prices over an hour, referred to as the Hourly Alberta Pool Price (HAPP), is used as the basis of financial settlements [26].

For Ontario's market, price, load and reserve information are considered in the initial set of features. For representing reserve, the predicted supply cushion (PSC), a variant of the available reserve in the system for each hour, as defined in more detail in [9], is considered. For Alberta's market, price and load information are considered in the initial set of features. In Alberta's electricity market, system operating reserves are auctioned independently of energy on a weekly basis. Moreover, supply adequacy assessments are carried out for 2-year periods and the data is not completely available to the public. Thus, no reserve metric has been considered for Alberta.

The models proposed in Section III are built and tested for 250 consecutive days starting from April 26, 2004. For each
given day, historical data for the 35 previous days, i.e., D = 35, are considered in the initial set of features. The choice of 35 days is based on various trial and error analyses and has resulted in the most accurate overall results.

The following price thresholds are considered for each market:
• For the HOEP: T_1^HOEP = -2000, T_2^HOEP = 50, T_3^HOEP = 100 and T_4^HOEP = 2000, all in $/MWh.
• For the HAPP: T_1^HAPP = 0, T_2^HAPP = 55, T_3^HAPP = 100 and T_4^HAPP = 1000, all in $/MWh.

Observe that the values of T_1^HOEP & T_1^HAPP and T_4^HOEP & T_4^HAPP are the official price floor and price cap defined by the applicable market rules in Ontario and Alberta. The values of T_2 are selected here as the annual average of the HOEP and the HAPP for year 2004. In a practical situation, however, the user would normally define the thresholds. For the above price thresholds, the price data are classified into three classes, i.e., c_1 for prices between T_1 and T_2, c_2 for prices between T_2 and T_3, and c_3 for prices between T_3 and T_4. Thus, 3-class, one-against-all SVM-based classifiers, briefly discussed in Section II-D1, are built for HOEP and HAPP classification.

The measure of overall classification error in this section is defined as follows:

\mathrm{MPCE}(\%) = \frac{\eta_{mc}}{\eta_{Tot}} \times 100    (8)

where MPCE is the mean percentage classification error, η_mc is the number of misclassifications and η_Tot is the total number of classification instances. A cost-based accuracy assessment is also presented in Section IV-D. Given the number of developed models, and the consideration of three classes in this work, presenting other more detailed error/accuracy measures (e.g., a confusion matrix) is not possible due to page limitations.
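A minimal sketch of the MPCE measure of (8) is given below; the class labels are invented for the example.

# MPCE of (8): the percentage of misclassified instances.
import numpy as np

def mpce(true_classes, predicted_classes):
    t = np.asarray(true_classes)
    p = np.asarray(predicted_classes)
    return 100.0 * np.mean(t != p)

print(mpce([1, 2, 3, 2, 1], [1, 3, 3, 2, 2]))   # 40.0: 2 of 5 instances misclassified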
A. Base Model

Considering the structure proposed in Section III-A, the base model is trained and tested for HOEP and HAPP classification using three alternative initial feature sets. The alternative initial feature sets are intended to examine whether additional information can contribute to improving model accuracy when the pattern in historical prices is already captured. In the first scenario, the base model is trained using an initial feature set which only consists of historical prices. Thus, in this case, the set of initial features for classifying p_{d+1}^{(h=j)} is considered as follows:

X_{BM,P}^{(h=j,d+1)} = \{ p_{d-\delta}^{(h=j)}, \; \delta = 0, 1, \ldots, 34 \}    (9)

In other words, the prices at Hour j for the 35 previous days are considered as inputs to the model. Twenty-four models are built for j = 1, ..., 24, and the resulting 24-hour-ahead classification models for the HOEP and the HAPP are referred to here as BM_{(P)}^{(HOEP)} and BM_{(P)}^{(HAPP)}, respectively.

In the second scenario, load data are also added to the set of initial features, as follows:

X_{BM,P,L}^{(h=j,d+1)} = \{ X_{BM,P}^{(h=j,d+1)}, L_{d-\delta}^{(h=j)}, \hat{L}_{d+1}^{(h=j)} \}    (10)

where δ ∈ {0, 1, ..., 34}, L_{d-δ}^{(h=j)} represents the historical load for Hour j of Day d-δ and L̂_{d+1}^{(h=j)} is the load forecast for Hour j of Day d+1. The resulting 24-hour-ahead classification models for HOEP and HAPP classification are denoted here by BM_{(P,L)}^{(HOEP)} and BM_{(P,L)}^{(HAPP)}, respectively.

In a third scenario, for Ontario's market only, PSC data for the 35 previous days, i.e., PSC_{d-δ}^{(h=j)}, δ ∈ {0, 1, ..., 34}, and for the day under consideration, i.e., PSC_{d+1}^{(h=j)}, are also added to the initial feature set, as follows:

X_{BM,P,L,PSC}^{(h=j,d+1)} = \{ X_{BM,P,L}^{(h=j,d+1)}, PSC_{d-\delta}^{(h=j)}, PSC_{d+1}^{(h=j)} \}    (11)

The resulting 24-hour-ahead classification models for the HOEP are referred to here as BM_{(P,L,PSC)}^{(HOEP)}.

The classification errors of the three scenarios for the base model over the 250-day test period are presented in Table I. Observe from the results that the additional load and reserve information has not improved the classification accuracy.

TABLE I
MPCE (%) FOR THE BASE MODEL FOR THE HOEP AND THE HAPP

BM_{(P)}^{(HOEP)}: 13.95   BM_{(P,L)}^{(HOEP)}: 14.08   BM_{(P,L,PSC)}^{(HOEP)}: 14.03   BM_{(P)}^{(HAPP)}: 15.43   BM_{(P,L)}^{(HAPP)}: 15.7

B. Model M2

Model M2 is also trained for HOEP and HAPP classification in three different scenarios for its initial feature set. In the first scenario, when only historical prices are considered, the initial feature sets are identical for all 24 hours, i.e., in (7) Γ_{d-δ,M2}^{(h=j)} = ∅, δ ∈ {0, 1, ..., 34}. In this case, the initial feature matrix is composed of 35 columns, each column consisting of the 24 hourly prices of a historical day. For vector feature selection in this scenario, and according to the KFVS algorithm in [13], a mapping function is needed to reduce the dimension of the feature vectors. A mapping function composed of the daily geometric mean of hourly prices, the intra-day price volatility [23] and the daily average of hourly prices is used in the present work. The intra-day price volatility is a measure of price variations over a given day [23]. The presence of the geometric mean mitigates the negative impact of extreme outlier prices, whereas the price volatility ensures that intra-day price fluctuations are represented. This mapping function basically maps the columns of the initial feature matrix from a 24-dimensional space into a 3-dimensional space. The outcome of the feature selection process here is the best set of previous days, out of the 35 initially considered days, whose data describe the price classes the best.
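A hedged sketch of such a 24-to-3 dimensional mapping is shown below. The exact intra-day volatility definition used in the paper is the one of [23]; the hour-to-hour log-change standard deviation used here is only a stand-in for it, and the code assumes strictly positive prices.

# Mapping a 24-hour day column to (geometric mean, volatility proxy, average).
import numpy as np

def map_day(prices_24):
    p = np.asarray(prices_24, dtype=float)
    geo_mean = np.exp(np.mean(np.log(p)))
    avg = np.mean(p)
    volatility = np.std(np.diff(np.log(p)))      # proxy for the intra-day volatility of [23]
    return np.array([geo_mean, volatility, avg])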
In the second scenario, load data are added to the set of initial features. Thus, the non-price feature sets can be written as follows:

\Gamma_{d-\delta,M2,L}^{(h=j)} = \{ L_{d-\delta}^{(h=1)}, \ldots, L_{d-\delta}^{(h=24)}, \hat{L}_{d+1}^{(h=j)} \}    (12)

where δ ∈ {0, 1, ..., 34}. Observe that the historical loads for all 24 hours of the previous days are included in the feature sets, and the load forecast is the only feature that varies in the feature sets for different hours. For feature selection, similar to the first scenario, the daily geometric mean demand, the daily average demand, the intra-day demand volatility and L̂_{d+1}^{(h=j)}
are considered as the mapped feature set, in addition to those considered in the first scenario.

In the third scenario, and for Ontario's market only, the PSC data are added to the sets of initial features, as follows:

\Gamma_{d-\delta,M2,L,PSC}^{(h=j)} = \{ \Gamma_{d-\delta,M2,L}^{(h=j)}, PSC_{d-\delta}^{(h=1)}, \ldots, PSC_{d-\delta}^{(h=24)}, PSC_{d+1}^{(h=j)} \}, \quad \delta = 0, \ldots, 34    (13)

In this scenario, the daily average PSC, the daily geometric mean PSC and PSC_{d+1}^{(h=j)} are added to the mapped features of the second scenario. The resulting 24-hour-ahead models in the three scenarios are referred to here as M2_{(P)}^{(HOEP)}, M2_{(P,L)}^{(HOEP)} and M2_{(P,L,PSC)}^{(HOEP)} for the HOEP. The models built for the HAPP are named similarly.

The classification errors for Model M2 in the three scenarios are presented in Table II. Observe that, compared to the results of the base model in Table I, the classification accuracy has improved significantly, i.e., by 5.78 percentage points for the HOEP and 6.56 percentage points for the HAPP. In this case also, similar to the base model, the additional reserve and load information has not improved classification accuracy.

TABLE II
MPCE (%) FOR MODEL M2 FOR THE HOEP AND THE HAPP

M2_{(P)}^{(HOEP)}: 8.17   M2_{(P,L)}^{(HOEP)}: 8.37   M2_{(P,L,PSC)}^{(HOEP)}: 8.28   M2_{(P)}^{(HAPP)}: 8.87   M2_{(P,L)}^{(HAPP)}: 8.89
C. Comparison of the Results with the Previous Literature

The performance of the proposed models is compared to the relevant available literature in this section. To do so, the time series models presented in [9] for numerical forecasting of the HOEP over a 42-day period are considered. The 42-day period in [9] consists of three 2-week periods in spring, summer and winter 2004 with typical price fluctuations. The numerical forecasts generated using the ARIMA, Transfer Function (TF) and Dynamic Regression (DR) models in [9], and the Pre-dispatch Prices (PDPs) published by the Ontario Independent Electricity System Operator for this period, have a mean absolute percentage error ranging between 16.8 and 40%. These numerical forecasts are classified according to the thresholds specified in the present work, and the classification errors are presented in Table III. The classification errors of all of the proposed models in the present work over the same 42-day period were also evaluated, and the classification error of Model M2_{(P)}^{(HOEP)}, which was the lowest among them, is also presented in this table. Observe that Model M2_{(P)}^{(HOEP)} significantly outperforms all other models. No previous literature exists for HAPP forecasting and thus no comparisons are made.

TABLE III
MPCE (%) OF THE MODELS IN [9] FOR THE HOEP

ARIMA: 21.52   TF: 17.85   DR: 17.26   PDP: 26.39   M2_{(P)}^{(HOEP)}: 6.84

D. Cost-based Accuracy Assessment

The MPCE measure defined in (8) provides the user with an overall measure of model performance. However, in practical situations the user may have different costs associated with different types of misclassification. For an M-class problem, assume the total number of instances with class c_m which are misclassified as class c_m' is denoted by β_(c_m'|c_m). Further assume α_(c_m'|c_m) represents the cost associated with the misclassification of an instance with class c_m as an instance with class c_m'. Thus, the total misclassification cost (TMCC) can be written as:

\mathrm{TMCC} = \sum_{m=1}^{M} \sum_{\substack{m'=1 \\ m' \neq m}}^{M} \alpha_{(c_{m'}|c_m)} \times \beta_{(c_{m'}|c_m)}    (14)

The model that minimizes the TMCC is preferred by the user, regardless of its overall MPCE. Note that for an M-class problem, there will be M(M-1) values of α_(c_m'|c_m) to be specified by the user.

In this section, the classification errors of the proposed models in the present work are compared with those of the models presented in [9] based on the cost-based error measure defined in (14). In order to reduce the number of pre-specified α_(c_m'|c_m)'s for this comparison, the proposed models for the HOEP are trained in a 2-class case where c_1 and c_2 refer to prices under and above T = $50/MWh, respectively. Without loss of generality, it is assumed that the normalized cost of misclassifying prices over $50/MWh as prices under $50/MWh is α_(c_1|c_2) = α (%), and the cost of misclassifying prices under $50/MWh as prices over $50/MWh is α_(c_2|c_1) = 1 - α (%), where 0 ≤ α ≤ 1. By varying the value of α, the TMCC (%) values are calculated for the ARIMA, TF, DR and PDP models in [9], and for Model M2_{(P)}^{(HOEP)}, over the same 42-day period, and the results are depicted in Fig. 1.

Fig. 1. Change in TMCC (%) for the models in [9] and Model M2_{(P)}^{(HOEP)} for 0 ≤ α ≤ 1. (Figure not reproduced here.)

Observe from this figure that Model M2_{(P)}^{(HOEP)} outperforms all other models for a wide range of α, except for the extreme case of α ≈ 1, for which all models perform similarly. Model M2_{(P)}^{(HOEP)} also presents a steady performance of TMCC ≤ 5% for various values of α, which indicates that this model is not biased toward misclassifying either c_2 as c_1 or c_1 as c_2. For the other four models, however, small values of α result in higher TMCCs, which means misclassifying c_1 as c_2 occurs more often than misclassifying c_2 as c_1. Also, observe that for α = 1, the PDP forecasts yield the lowest TMCC despite their highest MPCE.
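For reference, a minimal sketch of the 2-class cost measure used in this comparison is given below; the function name and the example usage are assumptions, and no normalization to a percentage is applied here.

# TMCC of (14) specialized to the 2-class case of Fig. 1.
import numpy as np

def tmcc_two_class(true_classes, predicted_classes, alpha):
    t = np.asarray(true_classes)
    p = np.asarray(predicted_classes)
    n_c2_as_c1 = np.sum((t == 2) & (p == 1))     # prices over the threshold called "under"
    n_c1_as_c2 = np.sum((t == 1) & (p == 2))     # prices under the threshold called "over"
    return alpha * n_c2_as_c1 + (1.0 - alpha) * n_c1_as_c2

# Sweeping alpha from 0 to 1 reproduces the kind of comparison shown in Fig. 1.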
V. CONCLUSIONS

Classification of future electricity prices with respect to a number of user-specified price thresholds was proposed in this paper. Multi-class support vector machines were employed as the core classifier for multi-step-ahead classification. Considering the 24-hour classification horizon in this work, the most recent price information was not available in the classification stage, and thus two alternative models were proposed for which the unavailable features were not considered in the initial feature set. Numerical results were presented for Ontario and Alberta electricity prices, two markets with high price volatilities. The classification accuracy of the models was compared to the previous literature where possible. The simulation results showed that the proposed price-class prediction models provided significantly more accurate price classification results compared to the available point-forecasting models. This is particularly important in markets with high price volatility, where point forecasting accuracy is considerably low. The cost of the achieved higher classification accuracy is the loss of point-forecasting information; this information, however, has a marginal value for certain groups of market participants.

REFERENCES

[1] A. Ipakchi and F. Albuyeh, "Grid of the future," IEEE Power Energy Mag., vol. 7, no. 2, pp. 52-62, Mar. 2009.
[2] F. Rahimi and A. Ipakchi, "Demand response as a market resource under the smart grid paradigm," IEEE Trans. Smart Grid, vol. 1, no. 1, pp. 82-88, Jun. 2010.
[3] T. Lui, W. Stirling, and H. Marcy, "Get smart," IEEE Power Energy Mag., vol. 8, no. 3, pp. 66-78, May 2010.
[4] R. Weron, Modeling and Forecasting Electricity Loads and Prices: A Statistical Approach. New York: Wiley, 2006.
[5] L. Wu and M. Shahidehpour, "A hybrid model for day-ahead price forecasting," IEEE Trans. Power Syst., vol. 25, no. 3, pp. 1519-1530, Aug. 2010.
[6] A. J. Conejo, F. J. Nogales, and J. M. Arroyo, "Price-taker bidding strategy under price uncertainty," IEEE Trans. Power Syst., vol. 17, no. 4, pp. 1081-1088, Nov. 2002.
[7] F. J. Nogales, J. Contreras, A. J. Conejo, and R. Espinola, "Forecasting next-day electricity prices by time series models," IEEE Trans. Power Syst., vol. 17, no. 2, pp. 342-348, May 2002.
[8] J. Contreras, R. Espinola, F. J. Nogales, and A. J. Conejo, "ARIMA models to predict next-day electricity prices," IEEE Trans. Power Syst., vol. 18, no. 3, pp. 1014-1020, Aug. 2003.
[9] H. Zareipour, C. Canizares, K. Bhattacharya, and J. Thomson, "Application of public-domain market information to forecast Ontario wholesale electricity prices," IEEE Trans. Power Syst., vol. 21, no. 4, pp. 1707-1717, Nov. 2006.
[10] J. H. Zhao, Z. Y. Dong, Z. Xu, and K. P. Wong, "A statistical approach for interval forecasting of the electricity price," IEEE Trans. Power Syst., vol. 23, no. 2, pp. 267-276, May 2008.
[11] H. Zareipour, A. Janjani, H. Leung, A. Motamedi, and A. Schellenberg, "Classification of future electricity market prices," IEEE Trans. Power Syst., 2010, to appear.
[12] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 389-422, Jan. 2002.
[13] G. Baudat and F. Anouar, "Feature vector selection and projection using kernels," Neurocomputing, vol. 55, pp. 21-38, 2003.
[14] V. N. Vapnik, Statistical Learning Theory. USA: John Wiley & Sons, 1998.
[15] C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition. Boston, USA: Kluwer Academic Publishers, 1998.
[16] S. Vucetic, K. Tomsovic, and Z. Obradovic, "Discovering price-load relationships in California's electricity market," IEEE Trans. Power Syst., vol. 16, no. 2, pp. 280-286, May 2001.
[17] J. H. Zhao, Z. Y. Dong, X. Li, and K. P. Wong, "A framework for electricity price spike analysis with advanced data mining methods," IEEE Trans. Power Syst., vol. 22, no. 1, pp. 376-385, Feb. 2007.
[18] D. Meyer, F. Leisch, and K. Hornik, "The support vector machine under test," Neurocomputing, vol. 55, no. 1-2, pp. 169-186, Sep. 2003.
[19] J. C. Platt, "Sequential minimal optimization: A fast algorithm for training support vector machines," Microsoft Research, Tech. Rep., Apr. 1998.
[20] C.-W. Hsu and C.-J. Lin, "A comparison of methods for multiclass support vector machines," IEEE Trans. Neural Netw., vol. 13, no. 2, pp. 415-425, Mar. 2002.
[21] M. Gonen, A. Tanugur, and E. Alpaydin, "Multiclass posterior probability support vector machines," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 130-139, Jan. 2008.
[22] N. Amjady and F. Keynia, "Day-ahead price forecasting of electricity markets by mutual information technique and cascaded neuro-evolutionary algorithm," IEEE Trans. Power Syst., vol. 24, no. 1, pp. 306-318, Feb. 2009.
[23] H. Zareipour, K. Bhattacharya, and C. Canizares, "Electricity market price volatility: the case of Ontario," Energy Policy, vol. 35, no. 9, pp. 4739-4748, Sep. 2007.
[24] A. Janjani, "Classification of future prices in competitive electricity markets," Master's thesis, Department of Electrical and Computer Engineering, University of Calgary, 2009, available upon request, pending patent application.
[25] C. P. Rodriguez and G. J. Anders, "Energy price forecasting in the Ontario competitive power system market," IEEE Trans. Power Syst., vol. 19, no. 1, pp. 366-374, Feb. 2004.
[26] J. MacCormack, H. Zareipour, and W. Rosehart, "A reduced model of the Alberta electric system for policy, regulatory, and future development studies," in Proc. IEEE Power and Energy Society General Meeting, Jul. 2008, pp. 1-8.

Hamidreza (Hamid) Zareipour (SM'09) is currently an Assistant Professor with the Department of Electrical and Computer Engineering, University of Calgary, Calgary, Alberta, Canada. His research focuses on economics, planning and management of electric energy systems.

Arya Janjani (S'06) received the Master's degree (2009) in Electrical and Computer Engineering from the University of Calgary. His research focuses on applications of data mining in power systems planning.

Henry Leung (M'90) is a Professor at the Department of Electrical and Computer Engineering, University of Calgary. His research interests include chaos, computational intelligence, data mining, nonlinear signal processing, multimedia, radar, sensor fusion and wireless communications.

Amir Motamedi (S'07) received his M.Sc. degree in Electrical Engineering from Sharif University of Technology, Iran, in 2007. Currently, he is a Ph.D. student at the Department of Electrical and Computer Engineering, University of Calgary.

Anthony Schellenberg (S'02, M'06) received the Ph.D. degree in Electrical Engineering from the University of Calgary in 2006, where he is currently an Adjunct Professor. His research interests are in optimization under uncertainty and renewable energy systems.
