
Building and Environment 267 (2025) 112312

Contents lists available at ScienceDirect

Building and Environment


journal homepage: www.elsevier.com/locate/buildenv

Unsupervised automated fault detection and diagnosis for light commercial buildings' HVAC systems
Milad Babadi Soultanzadeh (a,*), Mazdak Nik-Bakht (a), Mohamed M. Ouf (a), Pierre Paquette (b), Steve Lupien (b)
(a) Building, Civil and Environmental Engineering Department, Concordia University, Montreal, Canada
(b) Strato Automation Co, Montreal, Canada

ARTICLE INFO

Keywords: Automated fault detection and diagnosis; HVAC; Commercial building; PCA; Unsupervised learning

ABSTRACT

Fault detection in light commercial building HVAC systems can significantly improve the energy efficiency of this class of buildings. A light commercial building is a commercial structure with fewer than six stories and a floor plan area of less than 2500 ft². Data extracted from existing buildings in this class are generally unlabeled, raw, and characterized by many inconsistencies and discontinuities, making Automated Fault Detection and Diagnosis (AFDD) particularly challenging. This study aims to develop an unsupervised AFDD method tailored for light commercial buildings, which is transferable among different HVAC configurations within this building class. The method is designed to handle unlabeled, incomplete, and raw datasets provided by their Building Energy Management Systems (BEMS). Principal Component Analysis (PCA) was selected as the core method due to its scalability and transferability. Specific techniques were introduced to address time series analysis and fault detection and diagnosis (FDD) based on the dynamics of the system, using appropriate window sizing. The method was validated using two different light commercial buildings with distinct configurations and data availability. The primary building, an office in Montreal, Canada, and the secondary building, a small industrial facility in Ireland, served as the test cases. The proposed method demonstrated promising results in detecting and isolating faulty inputs, providing information on the severity levels and locations of faults. It successfully identified whether faults were at the level of the central system or within specific zones in both studied cases.
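The PCA-based detection idea summarized in this abstract can be illustrated with a minimal Python sketch that uses only NumPy and the standard library. It is not the authors' implementation. It combines the pieces described in Sections 2.2, 3.1, and 3.2:

- PCA on standardized inputs, using the 92 % cumulative-variance rule.
- The reconstruction error and SPE of Eqs. (1)-(3).
- The Jackson-Mudholkar SPE limit of Eq. (4).
- Per-input contributions.
- The iterative daily minimum-SPE cleaning loop with the mean + m·std rule.
- The moving-average / moving-standard-deviation band.

The following assumptions are also labeled in the code:

- The first term of the daily threshold is read as the mean of the daily minimum SPE series.
- The moving average is taken as a simple trailing mean over ws samples.
- Contributions use squared residuals e_j²/‖e‖². This ranks inputs identically to the magnitude of e/‖e‖².
- The function names (fit_pca, spe, spe_threshold, q_contributions, clean_days, spe_band) and the synthetic six-input, 30-day dataset are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from statistics import NormalDist


def fit_pca(X, min_cv=0.92):
    """Standardize X and keep the smallest k whose cumulative variance reaches min_cv."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0                          # guard against constant inputs
    Z = (X - mean) / std
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cv = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cv, min_cv)) + 1
    k = min(k, len(eigvals) - 1)                 # keep at least one residual direction
    return {"mean": mean, "std": std, "P": eigvecs[:, :k], "eigvals": eigvals, "k": k}


def spe(model, X):
    """Per-sample SPE (Eq. 3) and errors e = X - X P P^T (Eqs. 1-2), in standardized units."""
    Z = (X - model["mean"]) / model["std"]
    P = model["P"]
    E = Z - Z @ P @ P.T
    return (E ** 2).sum(axis=1), E


def spe_threshold(model, alpha=0.01):
    """Jackson-Mudholkar SPE limit (Eq. 4) computed from the residual eigenvalues."""
    lam = model["eigvals"][model["k"]:]
    t1, t2, t3 = (float(np.sum(lam ** i)) for i in (1, 2, 3))
    h0 = 1.0 - 2.0 * t1 * t3 / (3.0 * t2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)
    base = c_alpha * np.sqrt(2.0 * t2 * h0 ** 2) / t1 + 1.0 + t2 * h0 * (h0 - 1.0) / t1 ** 2
    return t1 * base ** (1.0 / h0)


def q_contributions(E):
    """Share of each input in the SPE, e_j^2 / ||e||^2 (each row sums to 1)."""
    sq = E ** 2
    return sq / sq.sum(axis=1, keepdims=True)


def clean_days(X, day, min_cv=0.92, m=3.0, max_iter=20):
    """Drop whole days whose minimum SPE exceeds mean + m * std of the daily minima; repeat."""
    keep = np.ones(len(X), dtype=bool)
    for _ in range(max_iter):
        model = fit_pca(X[keep], min_cv)
        q, _ = spe(model, X[keep])
        kept_days = day[keep]
        days = np.unique(kept_days)
        daily_min = np.array([q[kept_days == d].min() for d in days])
        # Assumption: the first term of Eq. (6) is read as the mean of the daily minima.
        threshold = daily_min.mean() + m * daily_min.std(ddof=1)
        outliers = days[daily_min > threshold]
        if outliers.size == 0:
            break
        keep &= ~np.isin(day, outliers)          # remove every sample of the flagged days
    else:
        model = fit_pca(X[keep], min_cv)         # refit if max_iter was exhausted
    return model, keep


def spe_band(q, ws, m_std=3.0):
    """Moving average, MSD (Eq. 8, 1/ws normalization) and the +/- m_std band (Eq. 9)."""
    windows = sliding_window_view(q, ws)         # row i holds samples i .. i + ws - 1
    ma = windows.mean(axis=1)                    # assumption: simple trailing moving average
    msd = windows.std(axis=1)                    # population std, matching the 1/ws factor
    return ma, ma + m_std * msd, ma - m_std * msd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(30 * 96)                       # hypothetical: 30 days of 15-min samples
    day = t // 96
    F = np.column_stack([np.sin(2 * np.pi * t / 96), np.sin(2 * np.pi * t / 480 + 1.0)])
    A = np.array([[1.0, 0.8, -0.5, 0.3, 0.6, -0.7],
                  [0.2, -0.6, 0.4, 1.0, 0.5, 0.3]])
    X = F @ A + 0.05 * rng.standard_normal((len(t), 6))
    X[day == 7, 2] += 1.0                        # hypothetical sensor offset on day 7
    model, keep = clean_days(X, day)
    print("removed days:", sorted(set(day[~keep].tolist())), "k =", model["k"])
    q, E = spe(model, X)
    print("SPE limit:", spe_threshold(model))
    print("top contributor on day 7:", q_contributions(E[day == 7]).mean(axis=0).argmax())
    ma, ul, ll = spe_band(q, ws=96)
```

The injected offset on day 7 breaks the correlation structure shared by the other days, so its minimum SPE should stand far above the other daily minima. The contribution ranking should point to input 2. A few normal days may also be dropped, because daily minima are skewed rather than normally distributed. This is why the text leaves m tunable.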

* Corresponding author.
E-mail address: [email protected] (M. Babadi Soultanzadeh).

https://doi.org/10.1016/j.buildenv.2024.112312
Received 1 August 2024; Received in revised form 21 October 2024; Accepted 12 November 2024
Available online 15 November 2024
0360-1323/Crown Copyright © 2024 Published by Elsevier Ltd. This is an open access article under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

1. Introduction

Various types of faults in HVAC system components can impact both the energy usage of buildings and the comfort of occupants across different building types. In the United States, faults contribute an additional energy consumption of 103 to 500 terawatt-hours (TWh) in the building sector [1]. Detecting and diagnosing faults in HVAC systems can reduce energy consumption by 20 % to 30 % [2]. Commercial buildings in Canada covered an area of 709,029,612 m² in 2019. These buildings accounted for 948,216,746 GJ of energy consumption [3], and HVAC systems are responsible for 25 % to 50 % of this energy consumption [4]. FDD of HVAC systems in light commercial buildings therefore holds significant potential for energy savings within this building category. Taking all these factors into consideration, a generic Automated Fault Detection and Diagnosis (AFDD) framework is vital, especially for commercial buildings. Such a framework should be transferable and applicable across different buildings within the same class. Previous studies [5,6] introduced AFDD frameworks specifically designed for light commercial buildings. Recent research has continued the search for the most generalizable framework. Within the context of our studies, a light commercial building is considered to be a commercial structure with six stories or fewer and a floor area of 2500 square feet or less. Examples of such buildings include small office spaces, bank branches, medical clinics, and small manufacturing facilities. Generally, these buildings share similar HVAC systems in configuration, size, and components. These similarities allow for the development of a transferable AFDD system.

Researchers have used various fault detection and diagnosis methods, which can be traditionally categorized as Data-Driven (DD) methods, gray-box methods, and a priori knowledge-based approaches [7]. Data-driven methods stand out for their independence from human expertise or physical models, relying solely on real-world operational data from the system. Rather than depending on intricate theoretical frameworks or pre-established models, these methods extract insights directly from the system's behavior. With the increasing sophistication of data collection technologies and analytical tools, data-driven approaches have gained significant traction among researchers [8]. Their appeal lies in their ability to adapt to the complexities of HVAC systems without prior assumptions or extensive manual intervention. This makes them particularly valuable for Fault Detection and Diagnosis (FDD) in modern buildings [9].

Data-driven methods encompass both supervised and unsupervised techniques. In the past, supervised classification was predominantly employed for FDD tasks, while unsupervised methods were commonly used for preprocessing or fault detection (FD) [10]. This is mainly because supervised algorithms have clear labels for prediction, which makes them easier to validate and interpret. They are also straightforward to apply when labeled data is available. Supervised methods use labeled data to learn and predict labels for new data, while unsupervised methods identify patterns within datasets without predefined labels [11]. A labeled dataset in FDD indicates normal conditions as well as various types of faults with different severity levels. Such datasets are scarce in real buildings and are primarily used in theoretical FDD algorithms based on experimental or simulated data [8].

While theoretical applications may have a certain amount of labeled data, datasets collected from Building Energy Management (BEM) systems or Building Automation Systems (BAS) are different. They are usually extensive in size but frequently suffer from missing data points and inconsistencies. Furthermore, they lack labels or information regarding the various operational patterns. Cleaning and labeling these datasets requires prior knowledge of the system and is highly time-consuming. Additionally, such labeling is typically restricted to faults that occurred during the available data collection period, potentially overlooking numerous other possible fault scenarios [12].

Faults in HVAC systems can generally be classified into three categories based on their nature [13]:

• Condition-based faults: These refer to physical conditions that deviate from the norm, such as a stuck valve, fouled coil, or broken fan.
• Behavior-based faults: These involve undesirable system behaviors, like simultaneous heating and cooling, erratic fan speeds, or improper cycling.
• Outcome-based faults: These are identified through performance metrics, such as a reduced coefficient of performance (COP), increased hot water flow, or higher energy consumption than expected.

Each type of fault requires distinct detection and diagnosis approaches to ensure effective system maintenance and optimization.

The main goal of this paper is to develop a generalizable unsupervised AFDD method tailored for light commercial buildings. This method aims to be generally applicable across various buildings within this class. It should be capable of utilizing different types of historical data from BEM systems, even if the data are unlabeled, inconsistent, or fragmented. Additionally, the AFDD method should provide minimal yet essential information about faulty conditions to HVAC operators. Thus, the objectives of this paper can be divided into the following:

1. Develop a data cleaning process to identify the most suitable subset for training a PCA-based time series FDD, then train a PCA on the selected subset.
2. Establish a PCA-based time series fault detection method using historical raw unlabeled data.
3. Develop an automated method to identify the primary sources of faults and their locations.
4. Assess the method's applicability and effectiveness in various light commercial building HVAC configurations.

PCA was selected among other techniques because it can satisfy the main goals and objectives of the research. Methods like autoencoding also use Reconstruction Error (RE) for fault detection. However, they have high computational costs and are typically limited to the detection phase [14]. In contrast, PCA-based fault detection can be easily transferred between different datasets and buildings with minor adjustments and tuning. It can also be used for diagnosis as well as detection. Most existing BEM systems have limited computational and storage resources, and PCA-based AFDD can be easily implemented using simple dependencies and basic Python libraries (such as Pandas and NumPy).

2. Literature review

This section reviews previous work on data-driven AFDD methods for building HVAC systems, focusing particularly on AFDD methods that use PCA. To conduct a comprehensive investigation of important previous works, the SCOPUS API was used. A query was performed with the keywords "AFDD," "HVAC," and "Automatic Fault Detection" for the general data-driven methods. The most relevant papers identified in the search results were manually selected for review. All the papers chosen for this review were published after the year 2000.

2.1. Data-Driven fault detection in HVAC systems

DD methods include unsupervised, supervised, semi-supervised, and regression-based approaches [8]. They can be categorized into three general classes: traditional Machine Learning (ML), Deep Learning (DL), and hybrid methods [15]. Traditional ML techniques remained the dominant approach to data-driven AFDD of HVAC systems for the first two decades of the 21st century. Various researchers have continued to explore and modify traditional ML. They have also recently started to use DL and hybrid methods to improve system performance and fault detection accuracy [9].

Supervised ML methods mostly employ classification techniques. Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Decision Trees (DT), Random Forests (RF), and Neural Networks are frequently utilized for FDD in HVAC systems [16]. Additionally, some data-driven methods use regression-based techniques, primarily to establish a baseline for normal conditions; these can be classified under the supervised methods [8]. Supervised FDD methods require labeled datasets. Consequently, they are sensitive to the data structure, the number of labeled faults, and the features selected for FDD of different types of faults [17]. It is challenging to make them generalizable across different HVAC systems, which may have varying components, sizes, and configurations, each with unique faulty and unfaulty patterns [18]. Additionally, increasing the dataset's dimensions can improve classification precision, but it also significantly raises the computational cost of the models [19].

Various researchers have applied unsupervised methods for FDD in HVAC systems [20]. These include different types of clustering (such as K-means, DBSCAN, and Hierarchical), PCA, and Autoencoding. Unsupervised ML methods analyze patterns in historical HVAC operational data without assigned labels. Because faulty data for training are difficult to obtain, unsupervised FDD is easier to deploy than supervised FDD [21]. However, unsupervised ML methods are less effective than supervised ML at diagnosis, because the complexity of HVAC systems leads to intricate fault symptoms. Without pre-labeled data, the interdependencies among components at different levels complicate diagnosis through fault symptom propagation and compensation. As a result, unsupervised methods are primarily used for detecting sensor faults [22,23].

To leverage the advantages of both supervised and unsupervised ML techniques, semi-supervised techniques have been developed and employed for the FDD of HVAC systems. Semi-supervised methods use a small portion of labeled data to predict the labels of a larger set of similar unlabeled data [24]. While semi-supervised ML methods require only a
small portion of labeled data, this labeled dataset must be reliable and clearly define the patterns for different labels to ensure robust FDD performance [25]. Data labeling expenses make it highly challenging to fully exploit massive HVAC datasets for FDD purposes. Additionally, these methods are constrained by the labels provided for a small portion of the data [26].

2.2. PCA application in HVAC FDD

PCA is a multivariable statistical tool for analyzing process measurements. It can reveal how different variables change, both individually and with respect to each other [27]. It is essentially a linear transformation that maps correlated variables (dimensions) onto orthogonal, uncorrelated dimensions. These orthogonal dimensions are the eigenvectors of the covariance matrix of the original data, and they form the columns of the loading matrix. Sample points are projected onto these eigenvectors, and the projections are represented back in the original dimensions; this produces the reconstructed data samples. The number of eigenvectors used for this purpose is the number of principal components (k) in the PCA method. No single optimal value of k exists, and various methods can be used to choose the number of principal components, as detailed in the literature [28].

The difference between the original data and the reconstructed data (projected on the PC directions) is given by Eq. (1). Here, $\vec{e}$ is the reconstruction error vector, $\vec{X}$ is the original sample vector, and $\hat{\vec{X}}$ is the reconstructed sample vector. Fig. 1 illustrates the process of calculating the projected error.

$$\vec{e} = \vec{X} - \hat{\vec{X}} \tag{1}$$

Fig. 1. Concept of PCA-Based AFDD.

The reconstructed vector is found using Eq. (2), where $P$ is the loading matrix, i.e., the collection of the k eigenvectors associated with the k largest eigenvalues.

$$\hat{\vec{X}} = \vec{X} P P^{T} \tag{2}$$

The SPE (Squared Prediction Error) is calculated as the sum of the squared elements of the reconstruction error, as shown in Eq. (3):

$$SPE = \lVert \vec{e} \rVert^{2} \tag{3}$$

Thus, a sample vector with n dimensions yields an n-dimensional Reconstruction Error (RE) vector and, from it, a scalar (single-value) SPE. Traditionally, fault detection applies a threshold to the SPE, as shown by Eq. (4) [29]. The symbols in Eq. (4) are as follows:

- $c_\alpha$ is the standard normal deviate corresponding to the (1 − α) percentile, α being the expected fraction of normal SPE values above the threshold.
- $\lambda_j$ is the eigenvalue corresponding to the j-th eigenvector.
- $\theta_i$ (i = 1, 2, 3) are the first- to third-order moments (power sums) of the eigenvalues from the (k+1)-th to the n-th component.
- $h_0$ is a term that adjusts the SPE distribution.

$$\mathrm{Threshold}_{SPE} = \theta_1 \left[ \frac{c_\alpha \sqrt{2\theta_2 h_0^{2}}}{\theta_1} + 1 + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^{2}} \right]^{1/h_0}, \qquad \theta_i = \sum_{j=k+1}^{n} \lambda_j^{\,i},\; i = 1, 2, 3, \qquad h_0 = 1 - \frac{2\theta_1 \theta_3}{3\theta_2^{2}} \tag{4}$$

After a fault is detected, the Q-contribution is traditionally employed to pinpoint the faulty inputs. It does so by identifying which input has the greatest influence on the Q-statistic (the SPE), as shown in Eq. (5) [30].

$$Q_{contribution} = \frac{\vec{e}}{\lVert \vec{e} \rVert^{2}} \tag{5}$$

Since the foundation of this paper's method is PCA, the remainder of this review examines PCA-based HVAC FDD studies in more detail.

Some of the earliest applications of PCA in HVAC FDD were reported by Wang S. and Xiao F. et al. [31–36]. These initial studies focused primarily on fault detection. The authors later extended their methods to the diagnosis of sensor faults in various HVAC systems, including Air Handling Units (AHU), Variable Air Volume (VAV) terminals, and Vapor Compression Systems (VCS). They used a variety of data sources for their studies, including simulated data, field data, and experimental data published by ASHRAE.

Du et al. combined various rules with PCA to detect and diagnose sensor faults in a chiller + AHU system, experimenting with different numbers of principal components (PCs). Using a labeled dataset, they showed that a robust FDD process for each sensor fault requires a specific number of PCs (cumulative variance). They expanded their research using simulated labeled data and a combination of multi-level PCA and Fisher Discriminant Analysis to enhance the detection and diagnosis of sensor faults. They found that drifting sensor faults can persist in the system for a long time. They therefore simulated exaggerated drifting of the sensors to make these faults more detectable [37–39].

Xiao F. and Wang S. et al. expanded their research on AFDD by using PCA with an expert-based multivariate decoupling method for sensor fault detection and isolation. They revealed that although PCA is powerful for detecting sensor faults in various engineering processes, it
has significant weaknesses in the diagnosis phase. They found that the Q-contribution plot, commonly used for fault isolation, is not effective even for sensor fault diagnosis in HVAC systems. They therefore applied expert-system-based diagnostics after PCA-based detection [40]. Wang et al. utilized normal operational data from an HVAC system, including AHUs, chillers, and cooling towers, and simulated fault scenarios to develop a system-level sensor fault detection and diagnosis method. They employed performance indicators at each level to enhance the detection and diagnosis process [41]. The research on PCA-based AFDD of HVAC systems continued with the studies summarized in Table 1.

Table 1
PCA-based HVAC systems AFDD studies.
Study | Methodology | Dataset used | Type of faults detected | Key findings
Hu et al. [29] | Adaptive PCA | Simulated data | Sensor faults | Errors in normal data used for PCA training decrease FDD efficiency; PCA is less effective for negative faults.
Li et al. [27] | PCA-Wavelet model | ASHRAE RP-1312 | Outdoor damper stuck, heating coil leaking | Data pretreatment is essential; improved detection of outdoor damper stuck vs. heating coil leaking faults.
Li et al. [42] | PCA with similarity analysis | ASHRAE RP-1312 | Various faults | Enhanced PCA-based FDD with better performance using ASHRAE RP-1312 data.
Padilla et al. [43] | Passive-active sensor fault detection | Simulated and experimental data | Sensor faults | SPE outperformed T² for experimental tests; both performed equally well with simulated data.
Yan et al. [44] | PCA with k-distance and OPTICS | Simulated data | Sensor faults | Appropriate thresholds for specific faults and systems are crucial for efficient FDD.
Hu et al. [45] | PCA with preprocessing | Operational data | Sensor faults | Steady-state condition consideration and outlier removal (z-score) significantly improved FDD efficiency.
Guo et al. [46,47] | PCA with expert-based multivariable decoupling and SG method | Simulated data | System and sensor faults | FDD performance varied (59 %-84 %) for different faults; diagnosis performance >90 % for all faults.
Li et al. [48] | PCA with DBSCAN | Operational data | Sensor faults | Clustering operational conditions and applying PCA separately enhanced FDD performance.
Montazeri et al. [49] (2020) | PCA, KPCA, and ANN | Simulated data, ASHRAE RP-1312 | Sensor faults | PCA FDD performance was 60 %, KPCA was 62 %, ANN reached 98.7 %.
Gu et al. [50] | Gaussian Mixture Model with PCA | Operational data | System faults | PCA for data dimensionality reduction significantly increased FDD performance.
Burgas et al. [51] | Unfolded-PCA | Operational data | Sensor faults, leakages | SPE was more effective for detecting sensor faults; Hotelling's T² was more efficient for identifying leakages.
Yang et al. [52] | PCA with thermal load matching | ASHRAE RP-1312 | Sensor faults | FDD performance highly depends on fault severity; method for identifying the best normal-condition training set.
Liang et al. [53] | Hybrid clustering-isolation forest-PCA | ASHRAE RP-1043 | Sensor faults | Well-chosen feature combination significantly improved sensor fault detection performance.
Wen et al. [54] (2023) | DBSCAN, SG smoothing, and PCA | Operational data | Temperature sensor faults | Enhanced sensor FDD for temperature sensors; applied to sensors not part of feedback regulation systems.
Yang et al. [55] | PCA with data window analysis | ASHRAE RP-1312 | Sensor faults | Larger data window sample size enhances fault-free condition ratio by capturing more system dynamics.
Ma et al. [56] | BPNN-PCA | Simulated and operational data | Sensor faults | Improved fault diagnosis accuracy when using BPNN-PCA compared to BPNN alone.
Li et al. [57] | PCA and Bayesian inference | Field data, ASHRAE RP-1312 | Sensor faults | Detectable faults in AHU system; combined PCA with Bayesian inference for HVAC sensor FDD and automatic calibration.

As Table 1 shows, PCA has established itself as a valuable tool in HVAC FDD, both as a standalone method and in combination with other techniques, or for dimensionality reduction. The reviewed literature indicates that PCA-based algorithms predominantly focus on sensor faults, particularly those not involved in feedback loops. A limited number of studies address component faults, often relying on labeled data for diagnosis or focusing solely on fault detection. PCA-based algorithms are highly sensitive to the quality of training data, with outliers and inconsistencies significantly impacting performance. The SPE has proven to be more effective than Hotelling's T² for detecting and diagnosing sensor faults and malfunctions. For specific fault detection in particular systems, appropriate thresholds and cumulative variance values must be applied.

Furthermore, PCA is sensitive to dynamic and transient conditions, leading many researchers to use steady-state data for training or to implement smoothing techniques. Steady state refers to a physical model that defines the system's operation when the characteristics of the system are time-independent. Because it is only a model, this condition can never be detected exactly in real-world operation. Therefore, the steady-state condition is considered a stable condition with relative time independence (featuring very small fluctuations), and it is usually identified through statistical analysis of sensor data. Transient refers to the operational condition in which the system's characteristics change significantly over time. The dynamics of the system encompass both the steady-state and transient conditions, capturing the full time-dependency of the system.

PCA-based FDD algorithms are also more effective for faults above a certain severity, meaning that faults with lower severity might go undetected. Overall, while PCA-based methods are powerful, their effectiveness depends on careful data preparation and the application of tailored thresholds and techniques.

2.3. Existing gap in HVAC FDD literature

The use of various data-driven techniques for FDD in building HVAC systems highlights the ongoing growth in this field. Many researchers have employed different data-driven techniques to develop tailored AFDD solutions for specific HVAC systems. However, these methods often lack transparency during testing and implementation, providing little information on how faults are detected and isolated [58]. It is a significant challenge for building operators to understand the mechanisms and inferences of data-driven methods and to trust their results [59]. This is especially true of black-box supervised methods, which rely on complex relationships within the data. Consequently, building operators in practice need AFDD systems that provide at least minimal information about the operational conditions leading to faults, enabling them to analyze and trust the results.

Commercial buildings' HVAC systems are designed and installed based on the specific conditions of each building. Factors such as weather, internal loads, occupant behavior, and schedules vary from building to building, leading to different historical data behaviors. Consequently, a data-driven AFDD tailored for one building may not apply to others [8]. A generalizable AFDD that can be applied to different buildings within the same class with acceptable performance is therefore a significant need in the commercial building HVAC AFDD market.

The availability of data for the training stage of AFDD is another significant challenge. Obtaining comprehensive faulty data that includes all types and severities of faults is nearly impossible. Additionally, Building Energy System (BES) data from existing buildings often suffer from several problems: low quality, short and discontinuous time ranges, numerous missing values, and inconsistencies [21]. Furthermore, many previous AFDD studies developed methods using simulated data, experimental data, or real operational labeled data. These datasets contain specific faults with specific severities for particular systems, which limits the generalizability of the methods [5,6]. Therefore, there is a need for a data-driven AFDD system that can utilize raw, unlabeled data from BEMS, representing another crucial aspect for the advancement of commercial building AFDD systems.

Another significant gap in previous studies is the lack of consideration for the historical behavior of systems when using PCA-based FDD. To train PCA, most researchers have used only a subset of unfaulty cleaned data without time sequence or continuity, or just a segment of cleaned steady-state unfaulty data. In reality, a true steady state is never fully achieved, as systems constantly experience fluctuations. Previous works have instead defined steady state as a period during which the system remains relatively stable and have used statistical methods to detect it [60]. These steady-state periods are often intermittent, even within a single day, due to changing thermal loads, operating modes, and setpoints. As a result, significant information about the system's dynamic behavior, including its time-dependent variations, is often lost. Although PCA has been applied for time series analysis in various fields (such as quality control) [28], PCA-based FDDs have yet to fully leverage time series analysis.

3. Methodology and proposed framework

The proposed AFDD method is divided into three steps based on the objectives. Each step is presented separately, and their combination forms the comprehensive AFDD framework illustrated in Fig. 2. The methodology starts with data cleaning for training a PCA to be used for FDD. It then continues with the proposed fault detection method and finally presents the diagnosis part.

3.1. Data cleaning and PCA training

Dealing with long-term, raw, unlabeled data from existing light commercial buildings presents several challenges. These arise from the system's dynamics, varied modes of operation, maintenance breaks, changing weather conditions, and firmware updates. Such data inherently contain diverse patterns, outliers, missing data, and inconsistencies. Consequently, manually identifying a comprehensive set of samples for PCA training that encompasses all periods is nearly impossible. Several preprocessing steps are essential for PCA, including removing missing values, scaling, and centering the data.

A cleaning loop has been devised to identify an appropriate dataset for training. This loop uses PCA and the daily SPE to remove days with significant deviations from the dataset. The variability in system behavior significantly influences the minimum SPE of each day: days with greater deviation exhibit higher minimum SPE values. Other data cleansing techniques rely on the initial dataset, so some data samples within outlier days might be misclassified as normal, especially when using PCA. Additionally, traditional data cleaning methods require detailed consideration of operational modes, seasonal changes, and weather conditions to determine the normal range of each input. This can be complex and time-consuming. In contrast, this approach automates the cleaning process while ensuring that these factors are accounted for. Furthermore, this process is designed for PCA-based fault detection. The PCA loop therefore ensures that the final selected subset excludes any detectable daily outliers, preventing them from influencing the results of the testing period.

The cleaning process begins by determining the number of PCs needed to achieve a target Cumulative Variance (CV), i.e., the percentage of variance explained by the selected PCs. The minimum acceptable CV can vary between 75 % and 90 % for different purposes [6]. This study, however, sets the minimum CV at 92 %, chosen through trial and error because it gave promising results during the tuning phase. The process continues by applying PCA to the initial dataset and generating an SPE time series. This series is then resampled daily, keeping the minimum SPE value of each day. Outlier days are flagged for removal from the dataset using a statistical outlier detection mechanism with the threshold defined by Eq. (6):

Fig. 2. Proposed AFDD framework.

$$\mathrm{Threshold}^{\mathrm{Daily}}_{\mathrm{SPE}} = \mathrm{SPE}^{\mathrm{Daily}}_{\min} + m \cdot \mathrm{SPE}^{\mathrm{Daily}}_{\mathrm{std}} \tag{6}$$

Since $\mathrm{SPE}^{\mathrm{Daily}}_{\min}$ may not be normally distributed in all cases, the value of $m$ can be adjusted. It is set at 3 in this study, and it can be tuned based on the dynamic characteristics of the building's HVAC operations and the skewness of $\mathrm{SPE}^{\mathrm{Daily}}_{\min}$. If any outlier days are identified, all data corresponding to those days are excluded from the initial dataset. PCA and the minimum daily SPE analysis are then applied iteratively to the refined dataset until no outliers remain. These days are removed from the initial dataset to obtain an appropriate training subset for the primary PCA training for FDD. Later, during the test steps, they can be used as inputs to the fault detection part to determine whether any faults occurred on those days. Upon completion of this first objective, the final PCA model, trained after complete cleansing, is saved for use in the subsequent fault detection phases. The cleaning and training processes are illustrated in Fig. 2.

3.2. PCA-Based fault detection

This section aims to detect faulty patterns in the time series data of the BEMS, considering the historical behavior of HVAC systems in light commercial buildings. To achieve this, the SPE is smoothed. A specific period (window size) of previous data is then analyzed to identify faulty patterns at the current time.

The detection phase begins by applying PCA #1 (the PCA trained in the previous step) and calculating the SPE of the input data. The input data can be any of the following:
- the entire initial dataset;
- a specific period of the initial dataset that is of particular interest;
- a new period of raw data not included in the initial training phase.

This process generates an SPE time series aligned with the original timestamps of the data. Then, the simple moving average (SMA) of the SPE at the current time step ($\mathrm{SPE}^{t}_{\mathrm{ma}}$) for a window size "ws" is calculated using Eq. (7):

$$\mathrm{SPE}^{t}_{\mathrm{ma}} = \frac{1}{ws}\sum_{i=t-ws+1}^{t} \mathrm{SPE}_i \tag{7}$$

In addition, the Moving Standard Deviation (MSD) of the SPE at the current time step ($\mathrm{SPE}^{t}_{\mathrm{msd}}$) is calculated as shown in Eq. (8):

$$\mathrm{SPE}^{t}_{\mathrm{msd}} = \sqrt{\frac{1}{ws}\sum_{i=t-ws+1}^{t}\left(\mathrm{SPE}_i - \mathrm{SPE}^{t}_{\mathrm{ma}}\right)^2} \tag{8}$$

The dynamic band of the system, adopted from Bollinger Bands [61], is determined by the upper band (UB) and the lower band (LB), as shown in Eq. (9):

$$\mathrm{UB} = \mathrm{SPE}^{t}_{\mathrm{ma}} + m_{\mathrm{std}} \cdot \mathrm{SPE}^{t}_{\mathrm{msd}}, \qquad \mathrm{LB} = \mathrm{SPE}^{t}_{\mathrm{ma}} - m_{\mathrm{std}} \cdot \mathrm{SPE}^{t}_{\mathrm{msd}} \tag{9}$$

where $m_{\mathrm{std}}$ is a tunable multiplier that can be selected based on the desired sensitivity to deviations. In this study, $m_{\mathrm{std}} = 3$, which is a standard statistical choice.

The primary threshold for fault detection is based on $\mathrm{SPE}^{t}_{\mathrm{ma}}$. Traditional statistical thresholds based on confidence levels or high quantiles are common. However, they may not account for the smoothness of $\mathrm{SPE}^{t}_{\mathrm{ma}}$ in this case, which can lead to high false alarm rates during fault-free conditions. Hence, this study proposes Eq. (10) to determine the main threshold ($\mathrm{Threshold}_{\mathrm{SPE}}$) for fault detection.

The parameter $m_{\mathrm{thrsh}}$ can be adjusted during the tuning phase for each specific building, according to the HVAC operators' considerations. To avoid missed alarms, $m_{\mathrm{thrsh}}$ should not be set too high relative to $m_{\mathrm{std}}$. Ideally, it should be between 10 % and 20 % of $m_{\mathrm{std}}$. This study uses $m_{\mathrm{thrsh}} = 0.5$, which is 16.7 % of $m_{\mathrm{std}}$.

$$\mathrm{Threshold}_{\mathrm{SPE}} = \mathrm{SPE}^{t}_{\mathrm{ma}} + m_{\mathrm{thrsh}} \cdot \mathrm{SPE}^{t}_{\mathrm{msd}} \tag{10}$$

Any $\mathrm{SPE}^{t}_{\mathrm{ma}}$ above $\mathrm{Threshold}_{\mathrm{SPE}}$ is flagged as a potential fault. The samples are then labeled and separated into faulty and unfaulty subsets. Additionally, any $\mathrm{SPE}_i$ outside the dynamic band of the system may be considered an outlier. The combination of a moving SPE above the threshold and SPE values outside the dynamic band indicates a higher level of fault severity.
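The cleaning loop and detection rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses NumPy only and z-score scaling before PCA, and the function names (`n_components`, `fit_pca`, `spe`, `clean`, `detect`) are hypothetical.

Two points are assumptions because the text leaves them open:
- Eq. (6) is read as the mean plus $m$ standard deviations of the daily-minimum SPE series.
- A threshold built from $\mathrm{SPE}^{t}_{\mathrm{ma}}$ itself can never be exceeded by $\mathrm{SPE}^{t}_{\mathrm{ma}}$. The sketch therefore computes the threshold and the dynamic band at step $t$ from the window ending at step $t-1$.

Here `ws` counts samples, whereas the study specifies the window in days.

```python
import numpy as np

# Hypothetical sketch of the PCA-loop cleaning and moving-SPE detection rules.
# Assumptions (not the authors' code): NumPy only, z-score scaling before PCA,
# Eq. (6) uses the mean/std of the daily-minimum SPE series, and the threshold
# and band at step t come from the window ending at step t - 1.


def _scale(X):
    mean, std = X.mean(axis=0), X.std(axis=0)
    return mean, np.where(std == 0, 1.0, std)  # avoid dividing by zero for constant inputs


def n_components(X, cv_min=0.92):
    """Fewest PCs whose cumulative explained variance reaches cv_min."""
    mean, std = _scale(X)
    s = np.linalg.svd((X - mean) / std, compute_uv=False)
    cv = np.cumsum(s**2) / np.sum(s**2)
    return min(int(np.searchsorted(cv, cv_min)) + 1, len(s))


def fit_pca(X, k):
    """Return scaling parameters and the first k loading vectors (n_features x k)."""
    mean, std = _scale(X)
    _, _, vt = np.linalg.svd((X - mean) / std, full_matrices=False)
    return mean, std, vt[:k].T


def spe(X, model):
    """Squared prediction error of each row: squared norm of its reconstruction error."""
    mean, std, P = model
    Z = (X - mean) / std
    return np.sum((Z - Z @ P @ P.T) ** 2, axis=1)


def clean(X, day, cv_min=0.92, m=3.0, max_iter=50):
    """Drop whole days whose minimum SPE is an outlier; refit until none remain."""
    k = n_components(X, cv_min)  # number of PCs fixed once, on the initial dataset
    keep = np.ones(len(X), dtype=bool)
    for _ in range(max_iter):
        model = fit_pca(X[keep], k)
        s, d = spe(X[keep], model), day[keep]
        days = np.unique(d)
        daily_min = np.array([s[d == u].min() for u in days])
        limit = daily_min.mean() + m * daily_min.std()  # Eq. (6), assumed form
        outlier_days = days[daily_min > limit]
        if outlier_days.size == 0:
            break
        keep &= ~np.isin(day, outlier_days)
    return keep, model


def detect(spe_t, ws, m_std=3.0, m_thrsh=0.5):
    """Severity per step: 0 = normal, 1 = low (level 1), 2 = high (level 2)."""
    n = len(spe_t)
    ma, msd = np.full(n, np.nan), np.full(n, np.nan)
    for t in range(ws - 1, n):
        w = spe_t[t - ws + 1 : t + 1]
        ma[t], msd[t] = w.mean(), w.std()  # Eqs. (7)-(8); ddof=0 matches the 1/ws factor
    severity = np.zeros(n, dtype=int)
    for t in range(ws, n):  # the first ws steps only build the history
        threshold = ma[t - 1] + m_thrsh * msd[t - 1]  # Eq. (10), lagged by one step
        ub = ma[t - 1] + m_std * msd[t - 1]  # Eq. (9), lagged by one step
        lb = ma[t - 1] - m_std * msd[t - 1]
        if ma[t] > threshold:
            outside_band = spe_t[t] > ub or spe_t[t] < lb
            severity[t] = 2 if outside_band else 1
    return severity
```

For example, `detect(np.array([1, 2, 1, 2, 1, 2, 1, 2, 2.5, 10.0]), ws=4)` returns a severity of 1 at the ninth step. There the moving average exceeds the lagged threshold while the sample stays inside the band. At the tenth step it returns 2, because the sample also leaves the band, and it returns 0 elsewhere. The one-step lag keeps the data requirement at ws + 1 steps, in line with the stated need for 16 days of data with a 15-day window.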


3.3. PCA-Based fault isolation and localization

After a fault is detected, the fault diagnosis process relies on three key aspects: localization, identification, and severity assessment [18]. Identifying and localizing the root cause of a fault is generally more difficult than detecting the fault itself, because various faults can produce similar symptoms even within a specific HVAC system. Accurate diagnosis of the root cause usually demands an in-depth understanding of the HVAC configuration and control strategies, which are unique to each building [8].

However, the proposed method is intended as a generic AFDD framework applicable to different light commercial buildings. Its fault diagnosis therefore focuses on two tasks:
- identifying the problematic inputs (sensor data, control values, etc.);
- pinpointing the location of the fault within the HVAC system hierarchy, either at the central system or in the zones.

Additionally, in this study, each fault is classified in a binary manner into two severity levels, 'high' and 'low'. This information is then provided to HVAC operators, enabling them to make informed decisions and accurately identify the fault.

The diagnosis process starts by applying PCA to the subset of data identified as unfaulty during the fault detection phase. Both the unfaulty and the faulty subsets are transformed and reconstructed with the PCA model trained on the unfaulty samples, which gives a reconstruction error matrix for each subset. Following the principle of Reconstruction Error (RE) illustrated in Fig. 1, the RE of unfaulty and faulty samples should differ distinctly. The average RE deviation between the unfaulty and faulty subsets therefore points to the specific inputs causing the fault. If the inputs with high RE deviations belong to central system components (e.g., Roof Top Unit, AHU), these inputs are reported and the fault location is identified as the 'central system level'. Otherwise, the fault is considered to be at the 'zone level'.

For inputs associated with zone-level faults, further investigation is required. The diagnosis process is reapplied solely to the inputs belonging to the central system, and the deviated RE of these inputs is calculated. If a deviated RE is detected for a central-system input, the fault location is reported at the central system level. Otherwise, the fault is reported at the zone level, and the faulty zone is specified based on the affected inputs.

Regarding severity, if only $\mathrm{SPE}^{t}_{\mathrm{ma}}$ is above the threshold, the severity is considered low (level 1). If $\mathrm{SPE}^{t}_{\mathrm{ma}}$ is above the threshold and $\mathrm{SPE}_i$ values are also outside the dynamic band, the fault severity is considered high (level 2). The reported inputs, fault location, and severity level enable HVAC operators to conduct further investigations and, using their experience with the system, identify the specific fault.

The proposed localization technique addresses a key limitation of previous PCA-based methods, which only report faulty sensors without considering the system's hierarchy. By distinguishing the system level at which the fault originates, the approach provides better fault traceability. It also reduces the confusion operators often face in multilevel systems.

An important consideration is the amount of data the method requires. To address this, various data durations were tested for training and testing. Based on the findings from the two case studies, a minimum of 3 months of data is necessary for the training phase. For testing (operational use), the data supplied to each execution must span at least one step more than the window size used for the moving SPE. For instance, if a 15-day window is employed, at least 16 days of data are needed. During the initial 15 days, the method learns the system dynamics but does not detect faults.

The training period and the window size for testing can be chosen by HVAC operators based on their specific concerns. In this study, a training period of at least 3 months and a 15-day window for testing were selected after analyzing fault-free conditions, in order to reduce false alarms. As a result, the false alarm rate decreased from 18 %-26 % with one month of training data to 0 %-4 % across various fault-free intervals. Similarly, the false alarm rate dropped from 23 %-37 % with a 5-day window to 0 %-5.5 % with a 15-day window, depending on the testing interval.

During weekdays and weekends, and across seasons, the HVAC system operates in various modes: occupied and unoccupied, heating and cooling, and even free ventilation. These modes can change frequently with building occupancy, outdoor weather conditions, and building setpoints. Because of this variability, the proposed AFDD method must capture the system's full range of operational behaviors. To this end, it analyzes a window of data long enough to reflect all of these operational modes. This lets it capture the system's historical performance more accurately, including its dynamic responses to different conditions. A minimum of 3 months of training data exposes the model to the full range of operational scenarios, including seasonal variations, which makes fault detection and diagnosis more reliable.

Finally, given the computational limits of the existing building system, the method is intended to be executed once per day. The execution time can be chosen based on the HVAC operator's preference and operational requirements.

4. Validation case studies

Two case studies were used to validate the presented method and check its generalizability:
- the HVAC system of a typical light commercial building in Montreal, Canada (primary case study);
- a dataset from a small industrial facility in Ireland (secondary case study).

Both represent typical challenges such as unlabeled, missing, and noisy data, which makes them excellent test cases.

4.1. Primary case study

The HVAC system of a typical light commercial building in Montreal, Canada, was selected as the primary building for validating the presented method. This choice was made because detailed information about the system and access to its HVAC operators were available.

The system includes a single-duct AHU with electric heating, combined with four VAV boxes with electric reheat coils. Its main characteristics are as follows:
- The AHU's electric heating coil has a maximum heating capacity of 21.2 kW.
- The supply fan can deliver air at a maximum rate of 2500 cubic feet per minute (CFM) and is powered by a 1.5 horsepower (HP) motor.
- The system's cooling capacity is about 6 tons of refrigeration.
- The heating coil's steady-state efficiency is between 80.5 % and 81.1 %, and the direct expansion (DX) cooling coil's efficiency is 80 %.
- The system has an Energy Efficiency Ratio (EER) of 12 in cooling and a Seasonal Energy Efficiency Ratio (SEER) of 14.

Given the nature of the data and the extensive period it covers, this building is a suitable example for developing and evaluating AFDD methods for light commercial structures. The floor plan and the schematic of the HVAC system, taken from the building's OCN+ BEM system, are presented in Fig. 3. OCN+ is the building energy management system of the building of interest. It provides information about sensor locations and descriptions, and it can be used to monitor data trends and export data.

Data were collected from the building's OCN+ BEM system from January 8, 2022, to March 18, 2023. The input data for the AFDD were selected through a detailed analysis of the sensor descriptions in the OCN+ BEM system. This selection was guided by consultations with the HVAC system operators and by the critical variables recommended in ASHRAE Guideline 36 for high-performance sequences of operation [62].

The data frame contains sensor data, including analog values, as well as control values, which are both binary and analog. Additionally, three calculated features have been added to the dataset. The first is the total airflow rate of the AHU, determined from the continuity (conservation of mass) equation. The second and third


Fig. 3. The Floor plan (a) and HVAC system (b, c) of the studied light commercial building (From OCN+ BEM system).

features are the average room temperature and the average supply temperature across the zones, respectively. With these features included, the data frame contains 94 distinct features, referred to as the inputs of the AFDD. Table 2 lists all the inputs grouped by level (location), where (i) denotes the zone number.

4.2. Secondary case study

One aim of this study is to investigate the generalizability of the proposed method for light commercial buildings. For this purpose, a second case was chosen from references [12,63], with the dataset available in [64], as the secondary dataset for validation. This dataset, collected from a small industrial facility in Ireland, is an excellent test case for generalizability because the building can be considered representative of a light commercial building. Additionally, the dataset was obtained from a real BMS and includes the typical challenges of real data used for method validation:
- broken data (missing data points);
- poor-quality data (noisy sensor data);
- most importantly, the absence of labels.

Table 3 compares the data used for validation and for the generalizability analysis. The configuration and feature descriptions of the dataset are detailed in [64].

The number of inputs, the types of data, and the data quality can differ between the two validation datasets. The most important requirement when using different inputs is that the cumulative variance threshold of 92 %, mentioned in the Methodology section, is satisfied. However, using different inputs may affect the AFDD results.

5. Results and discussion

This section presents and discusses the results of the proposed method applied to the primary and secondary case study buildings. It starts with the data cleaning and training phase and continues with fault detection and isolation. The results begin with the outcomes of the comprehensive framework for the primary building, chosen because of the extensive information obtained from its HVAC operators. After that, an example of AFDD results from the secondary building is presented and discussed.

5.1. Data cleaning and PCA training

The number of principal components (PCs) required for the subsequent steps was first determined for the office building in Montreal, as shown in Fig. 4. The results indicate that a minimum of 17 PCs is necessary to meet the framework's requirements. The cumulative explained variance of 0.926 for 17 PCs means that these components capture 92.6 % of the total variance in the dataset. Therefore, the next step


Table 2
The description of input data for the AFDD method.
Input parameter | Feature | Type
Central system level (AHU)
Airflow | airflow_AC | Calculated
Supply temperature | supply_temp_AC | Sensor_Analog
Return temperature | return_temp | Sensor_Analog
Outdoor Air Temperature | OAT | Sensor_Analog
Room setpoint AC | room_sp_AC | Control value
Supply pressure | pressure | Sensor_Analog
Damper positions | dmp_pos_AC | Control value
Heating loop | heating_load_AC | Control value
Cooling loop | cooling_load_AC | Control value
Fan mode | fan_ac | Control_Binary
Cooling mode | cooling_ac | Control_Binary
Heating mode | heating_ac | Control_Binary
Cooling fan demand | cooling_fan_demand | Control_Binary
Heating fan demand | heating_fan_demand | Control_Binary
Active heating room temperature setpoint AC | act_htg_RTSP_AC | Control value
Active cooling room temperature setpoint AC | act_clg_RTSP_AC | Control value
Standby mode room temperature setpoint cooling offset | STBmode_RTSP_clg_offset | Control value
Standby mode room temperature setpoint heating offset | STBmode_RTSP_htg_offset | Control value
Zones level
Airflow | airflow_z(i) | Sensor_Analog
Room Temperature | room_temp_z(i) | Sensor_Analog
Average Room Temperature | Room_temp_avg | Calculated
Supply Temperature | supply_temp_z(i) | Sensor_Analog
Average Supply Temperature | Supply_temp_avg | Calculated
Heating loop | heating_load_z(i) | Control value
Damper Position | dmp_pos_z(i) | Control value
Room temperature setpoint cooling occupied | RTSP_clg_occ_z(i) | Control value
Room temperature setpoint heating occupied | RTSP_htg_occ_z(i) | Control value
Room temperature setpoint cooling unoccupied | RTSP_clg_unocc_z(i) | Control value
Room temperature setpoint heating unoccupied | RTSP_htg_unocc_z(i) | Control value
Motion sensor | MotionSens_z(i) | Sensor_Binary
Occupied mode | OccMode_z(i) | Control value
Active heating room temperature setpoint | act_htg_RTSP_z(i) | Control value
Active cooling room temperature setpoint | act_clg_RTSP_z(i) | Control value
Active flow setpoint | act_FlowSP_z(i) | Control value
Minimum Airflow heating | MinAF_htg_z(i) | Control value
Minimum Airflow cooling | MinAF_clg_z(i) | Control value

Fig. 4. Cumulative Variance Explained by Number of PCs.

Table 3
Comparison between validation data and generalizability analysis data.
Characteristics | Industrial facility, Ireland (Secondary Building) | Office building, Montreal (Primary Building)
Location | Ireland | Canada
Nature of data | Unlabeled, raw data | Unlabeled, raw data
Data quality | Several missing values/inconsistencies | Several missing values/inconsistencies
Time step | 15 minutes | 5 minutes
System configuration | AHU + 2 VAVs | AHU + 4 VAVs
Heating loop | Water heating coils | Electric heating/reheating coils
Cooling loop | Water cooling coil | DX cooling coil
Performance characteristic | Energy consumption available | Energy consumption unavailable
Number of inputs | 20 inputs | 94 inputs

involves using these 17 PCs for the cleaning and training processes. Fig. 5 shows the initial step, two intermediate steps, and the final step of the data cleaning process for training the PCA. The horizontal axis represents the days in the dataset, and the vertical axis shows the minimum daily SPE. The threshold for the minimum daily SPE clearly decreases with each step of the loop. The maximum acceptable minimum daily SPE for training the PCA is calculated to be 0.23. The entire cleaning process took 9 iterations, during which 27.5 % of the data was removed from the initial dataset.

5.2. Fault detection and isolation in primary case study

This section presents the results of fault detection and isolation using a dataset spanning two months. It discusses two examples of faults within the initial dataset. All result figures use the same conventions:
- the green band indicates the dynamic range of the system;
- the black line represents the moving SPE;
- the yellow line indicates the moving SPE threshold;
- the dashed blue line shows the traditional SPE threshold at a 95 % confidence level.

The first example of a faulty pattern was detected on April 27, 2022, as shown in Fig. 6 (a). The inputs contributing most to this fault, including room_temp_z4, were at the zone level, as indicated by the mean reconstruction errors in Fig. 6 (b). Fig. 6 (c) and (d) show that the deviation of room_temp_z4 was constant. With this information, the system operators confirmed this condition as a sensor disconnection for room_temp_z4, where the sensor data reached their maximum allowable limit in the system.

A comparison between the date of the sensor disconnection and the date of fault detection revealed that the moving SPE reached its threshold before this incident. This is attributed to the fluctuating behavior of room_sp_AC during a system reconfiguration period. Moreover, the method successfully detected and isolated the fault related to the room_temp_z4 disconnection, situated at the zone level. The analysis also indicated that, following the sensor disconnection, this fault significantly impacted the system. This finding was corroborated by outliers outside the dynamic system band concurrent with the sensor disconnection fault, which raised the fault severity to level 2, as detailed in the methodology section. Compared to traditional PCA, the proposed approach can detect deviations in room_sp_AC, indicating the sensor disconnection.

As another example, from a different source, the fault triggered on 2023–02–22 is shown in Fig. 7. The initial inputs contributing to the reconstruction error deviation were the room temperature setpoints for zone 2 and zone 5, which prompted further analysis of the AHU. The AHU analysis also indicated a fault, with significant deviations in pressure and airflow, as shown in Figs. 7 (c) and (d). Based on these results, the fault location was reported at the AHU level, with pressure and airflow identified as the two inputs with the highest deviation. This led the HVAC operators to confirm a dirty filter fault. As shown in Fig. 7(b), the deviation of the setpoints is much higher than that of the pressure and airflow, owing to the normalization applied before the PCA analysis. This


Fig. 5. The results in internal steps of the cleaning process using PCA loop.

indicates that the dirty filter significantly impacted the occupants' thermal comfort by decreasing the airflow, which led to a setpoint override to improve comfort. This override had a substantial effect on the entire system, making the fault easier to detect. Additionally, all four highly contributing inputs showed decreased values, demonstrating that the method is effective in identifying faults with both increasing and decreasing input behaviors. Furthermore, there are no outliers beyond the dynamic band while the moving SPE is above the threshold, so the fault severity is reported as level 1. Finally, although the method shows a delay in detecting the dirty filter fault, it is much more effective than traditional PCA, which almost entirely missed this fault.

5.3. Fault detection and isolation in secondary case study

The entire method was applied to the secondary dataset, and testing showed consistent outcomes. A good example of the model's performance is the fault triggered on 2021–08–02, shown in Fig. 8 (a). Based on Fig. 8 (c), the most contributing input was ReHeatVlvPos_2, the reheat valve position signal for zone 2. The RE fluctuates in Fig. 8 (d), indicating that the signal is not stuck. Since there is no deviation in the inputs related to the AHU, the fault is located at zone 2, with ReHeatVlvPos_2 as the problematic input. The location and inputs still have to be reported to the HVAC system operators so that they can decide whether there was a fault at that time. Nevertheless, the method shows that it can detect deviations, isolate the problematic input, and determine the level of divergence.

For both the primary and secondary buildings, the method showed promising effectiveness in detecting and isolating faults with the same tuning parameters, including the window size, $m$, $m_{\mathrm{std}}$, and $m_{\mathrm{thrsh}}$. Note that the secondary building's data were resampled using forward filling to match the primary building's 5-minute time step. Furthermore, the results indicated that different data from various HVAC configurations and components of light commercial buildings can be used effectively as inputs, which shows the method's adaptability to different types and volumes of data. However, the method still requires HVAC operators to confirm faults based on the information it provides.

In both buildings, the AFDD exhibited a specific limitation when dealing with faults of lower severity. Since the method is designed to be executed once a day, the detection time can vary. For instance, the disconnected sensor fault in the primary building was detected at the first execution of the method. In contrast, detection took up to six executions for the ReHeatVlvPos_2 signal fault in the secondary building. This indicates that the method detects condition-based faults once they begin to significantly impact the behavior of the HVAC system. For both buildings, the diagnosis results were highly precise. As soon as a fault was detected, the method accurately isolated the source of the detected faults (or potential faults in the secondary building) without exception.

To further verify the effectiveness of the AFDD method, additional testing was conducted using new data from the OCN+ BEM system of the primary building. An example is the fault triggered on 2023–09–17, which was related to the pressure sensor configuration during a firmware update. This example confirms that the pre-trained PCA can handle new data following firmware updates, which can affect data dynamics.

Furthermore, to investigate the efficiency of the proposed AFDD method, it was also applied to a simulated labeled dataset for a single-duct AHU from LBNL [65]. The fault studied was an outdoor damper stuck fully closed. The method detected the fault after 5 executions, with an accuracy of 86 %. During the diagnosis phase, the main contributors to the fault were identified as the supply air pressure setpoint (SA_SPSTP), the supply air pressure (SA_SP), and the outdoor damper position (OA_DMPR). Two anomalies in the simulated data explain this result. First, SA_SP dropped to zero while the supply air CFM remained unchanged. Second, SA_SPSTP dropped to -400 inches of H₂O, which is an unrealistic condition. These findings suggest that AFDD methods developed using simulated data, especially supervised methods, may not be effective in real conditions. A likely reason is that faulty conditions are difficult to represent accurately in simulations.

6. Conclusion

In conclusion, this study has developed an unsupervised PCA-based AFDD method tailored for light commercial buildings. The method is designed to detect and diagnose faults using raw, unlabeled data from BEMS. It can therefore work with datasets from existing buildings without any prior investigation or labeling: the data can be exported from the BEMS and used directly in the AFDD method. Validation was conducted on a light commercial building in Montreal, Canada, and the method's transferability and generalizability were confirmed by successfully testing it on another building in Ireland. These properties allow it to be applied to different HVAC configurations with varying numbers of inputs, ensuring effective performance across diverse HVAC setups. Furthermore, the method demonstrated promising performance in detecting and isolating faults in both initial and unseen datasets. Different faults related to sensors and system conditions have been detected and isolated using this method, illustrating its capability beyond sensor faults. Generally, any fault that affects the system's


Fig. 6. Disconnected sensor fault on 2022–04–27.

behavior can be identified and isolated using this approach.

The most important aspect of using this method is choosing appropriate tuning parameters. HVAC consultants or operators can select a good set of tuning parameters through initial trials on the dataset to optimize fault detection. The method automates fault detection, isolates faulty inputs, and identifies fault severity levels, but it has some limitations:

• The involvement of an HVAC operator is always necessary for comprehensive diagnosis and final confirmation.
• The method requires at least three months of data to begin functioning. It is not applicable during the initial running period of the system, or after major system changes, until sufficient data accumulate. However, additional operational data can always be appended to the initial dataset, providing a more comprehensive dataset for the training phase.
• During the fault detection phase, a single day of data is insufficient: at least n+1 consecutive days of data (where n is the window size in days) are needed. The method cannot detect faults within the first n days, as it relies on a window of n days to capture the dynamics of the system.
• If a continuous fault in the testing dataset started before the date of the first sample and persisted until the test date, the method will not be able to detect it, as the system's dynamics remain unchanged.
• The proposed AFDD method can detect faults only when they cause changes in historical trends.

Fig. 7. Dirty Filter Fault detected on 2023–02–22.

However, this method offers a straightforward AFDD solution with minimal computational overhead, suitable for diverse HVAC systems and configurations in light commercial buildings. One of its significant advantages is its ability to use an unlabeled raw dataset without any prior information about the faults and their severity, which makes it highly adaptable to different scenarios. Unlike previous


Fig. 8. Generalizability analysis of the method.

approaches that require a set of unfaulty samples for PCA training, this method operates without known unfaulty samples, so it can run with minimal information about the system condition. This flexibility means the method can be readily applied in various situations, even when detailed system information is lacking. Furthermore, earlier methods focused solely on steady-state conditions, which increases false alarms during dynamic changes. This method instead incorporates the historical dynamic information of the system and uses both dynamic and steady-state conditions. This comprehensive approach significantly enhances fault detection accuracy, reducing the likelihood of false alarms and ensuring reliable performance in real-world applications.

Future work could involve developing a set of rules using machine learning techniques, such as association rule mining or Bayesian networks. These rules would identify antecedents that lead to particular faults with a specific level of support and confidence. Such probabilistic rules could accelerate the diagnosis process for HVAC operators, enhancing the automation and efficiency of fault detection and


diagnosis. Additionally, the long-term application of the proposed AFDD method can offer valuable insights into its reliability by continuously addressing various faults that may arise over time.

In summary, this method presents an efficient AFDD solution that leverages minimal information to deliver accurate and reliable AFDD when combined with operator experience for precise diagnosis. Its applicability across various HVAC configurations, together with its potential to reduce operational costs and improve system efficiency, makes it a valuable tool for HVAC operators in light commercial buildings. All HVAC operators in light commercial buildings can benefit from this method: its ability to detect and isolate faults efficiently not only enhances system performance but also contributes to energy savings and cost reductions, making it an economically beneficial solution. Additionally, the continuous operation of this AFDD method enables building HVAC operators to label the data. Upon detecting a potential fault, the information provided about the fault’s location and source allows operators to use their field experience to assign a specific fault label. These labels can be stored in the dataset, allowing for the extraction of rules that indicate which fault sources and locations correspond to specific fault labels. This process helps improve automatic fault diagnosis over time.

CRediT authorship contribution statement

Milad Babadi Soultanzadeh: Writing – original draft, Visualization, Validation, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Mazdak Nik-Bakht: Writing – review & editing, Supervision. Mohamed M. Ouf: Writing – review & editing, Supervision, Funding acquisition. Pierre Paquette: Writing – review & editing, Supervision, Resources. Steve Lupien: Writing – review & editing, Supervision, Resources.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment

The research team extends their deepest gratitude to the Natural Sciences and Engineering Research Council of Canada (NSERC) (#ALLRP 576761-22), MITACS (#IT32282), and Strato Automation. Their generous financial support, steadfast assistance, and provision of crucial data were vital to the successful completion of this study.

Data availability

The authors do not have permission to share data.

References

[1] S.P. Melgaard, K.H. Andersen, A. Marszal-Pomianowska, R.L. Jensen, and P.K. Heiselberg, “Fault detection and diagnosis encyclopedia for building systems: a systematic review”, Jun. 01, 2022, MDPI. doi: 10.3390/en15124366.
[2] F. Xiao and S. Wang, “Progress and methodologies of lifecycle commissioning of HVAC systems to enhance building sustainability”, Jun. 2009. doi: 10.1016/j.rser.2008.03.006.
[3] Statistics Canada, Survey of commercial and institutional energy use, Canadian Government, 2019.
[4] Z. Chen, P. Xu, F. Feng, Y. Qiao, W. Luo, Data mining algorithm and framework for identifying HVAC control strategies in large commercial buildings, Build. Simul. 14 (1) (2021) 63–74, https://doi.org/10.1007/s12273-019-0599-0.
[5] M. Babadi Soultanzadeh, M. Ouf, N.B. Nik-Bakht, P. Paquette, S. Lupien, A framework for automated fault detection in light commercial buildings HVAC system, ASHRAE Trans. 130 (2024) 590–599. Accessed: Jul. 28, 2024. [Online]. Available: https://www.scopus.com/record/display.uri?eid=2-s2.0-85198914362&origin=resultslist.
[6] M. Babadi Soultanzadeh, M.M. Ouf, M. Nik-Bakht, P. Paquette, S. Lupien, Fault detection and diagnosis in light commercial buildings’ HVAC systems: A comprehensive framework, application, and performance evaluation, Energy Build. 316 (2024) 114341, https://doi.org/10.1016/j.enbuild.2024.114341.
[7] S. Katipamula, M.R. Brambley, Review article: Methods for fault detection, diagnostics, and prognostics for building systems—A review, part I, HVAC R Res. 11 (1) (2005) 3–25, https://doi.org/10.1080/10789669.2005.10391123.
[8] Z. Chen, et al., A review of data-driven fault detection and diagnostics for building HVAC systems, Appl. Energy 339 (2023), https://doi.org/10.1016/j.apenergy.2023.121030.
[9] M.S. Mirnaghi, F. Haghighat, Fault Detection and Diagnosis of Large-Scale HVAC Systems in Buildings Using Data-Driven methods: A comprehensive Review, Elsevier Ltd, 2020, https://doi.org/10.1016/j.enbuild.2020.110492.
[10] A. Hosseini Gourabpasi, M. Nik-Bakht, Knowledge discovery by analyzing the state of the art of data-driven fault detection and diagnostics of building HVAC, CivilEng 2 (4) (2021) 986–1008, https://doi.org/10.3390/civileng2040053.
[11] Z. Shi and W. O’Brien, “Development and implementation of automated fault detection and diagnostics for building systems: A review”, Aug. 01, 2019, Elsevier B.V. doi: 10.1016/j.autcon.2019.04.002.
[12] M. Ahern, D.T.J. O’Sullivan, K. Bruton, Implementation of the IDAIC framework on an air handling unit to transition to proactive maintenance, Energy Build. 284 (2023) 112872, https://doi.org/10.1016/j.enbuild.2023.112872.
[13] S. Frank, G. Lin, X. Jin, R. Singla, A. Farthing, J. Granderson, A performance evaluation framework for building fault detection and diagnosis algorithms, Energy Build. 192 (2019) 84–92, https://doi.org/10.1016/j.enbuild.2019.03.024.
[14] M.A.F. Abdollah, R. Scoccia, M. Aprile, Transformer encoder based self-supervised learning for HVAC fault detection with unlabeled data, Build. Environ. 258 (2024), https://doi.org/10.1016/j.buildenv.2024.111568.
[15] J. Bi et al., “AI in HVAC fault detection and diagnosis: A systematic review”, Jun. 01, 2024, Elsevier B.V. doi: 10.1016/j.enrev.2024.100071.
[16] Y. Zhao, C. Zhang, Y. Zhang, Z. Wang, J. Li, A review of data mining technologies in building energy systems: load prediction, pattern identification, fault detection and diagnosis, KeAi Communications Co, 2020, https://doi.org/10.1016/j.enbenv.2019.11.003.
[17] Y. Bezyan, M. Nik-Bakht, F. Nasiri, Fault detection and diagnosis for of chillers under transient conditions, Canadian Society of Civil engineering, Moncton, 2023.
[18] P. Barandier, M. Mendes, A.J. Marques Cardoso, Comparative analysis of four classification algorithms for fault detection of heat pumps, Energy Build. 316 (2024) 114342, https://doi.org/10.1016/j.enbuild.2024.114342.
[19] Q. Ma, C. Yue, M. Yu, Y. Song, P. Cui, Y. Yu, Research on fault diagnosis strategy of air-conditioning system based on signal demodulation and BPNN-PCA, Internat. J. Refrig. 158 (2024) 124–134, https://doi.org/10.1016/j.ijrefrig.2023.12.008.
[20] S.L. Zhou, A.A. Shah, P.K. Leung, X. Zhu, Q. Liao, A comprehensive review of the applications of machine learning for HVAC, DeCarbon 2 (2023) 100023, https://doi.org/10.1016/j.decarb.2023.100023.
[21] J. Chen, L. Zhang, Y. Li, Y. Shi, X. Gao, Y. Hu, A review of computing-based automated fault detection and diagnosis of heating, ventilation and air conditioning systems, Elsevier Ltd, 2022, https://doi.org/10.1016/j.rser.2022.112395.
[22] K. Verbert, R. Babuška, B. De Schutter, Combining knowledge and historical data for system-level fault diagnosis of HVAC systems, Eng. Appl. Artif. Intell. 59 (2017) 260–273, https://doi.org/10.1016/j.engappai.2016.12.021.
[23] D. Dey, B. Dong, A probabilistic approach to diagnose faults of air handling units in buildings, Energy Build. 130 (2016) 177–187, https://doi.org/10.1016/j.enbuild.2016.08.017.
[24] G. Ma, H. Ding, Semi-Supervised Random Forest Methodology for Fault Diagnosis in Air-Handling Units, Buildings 13 (1) (2023), https://doi.org/10.3390/buildings13010014.
[25] M.G. Albayati, J. Faraj, A. Thompson, P. Patil, R. Gorthala, S. Rajasekaran, Semi-supervised machine learning for fault detection and diagnosis of a rooftop unit, Big Data Min. Analyt. 6 (2) (2023) 170–184, https://doi.org/10.26599/BDMA.2022.9020015.
[26] C. Fan, Q. Wu, Y. Zhao, L. Mo, Integrating active learning and semi-supervised learning for improved data-driven HVAC fault diagnosis performance, Appl. Energy 356 (2024), https://doi.org/10.1016/j.apenergy.2023.122356.
[27] S. Li, J. Wen, A model-based fault detection and diagnostic methodology based on PCA method and wavelet transform, Energy Build. 68 (Part A) (2014) 63–71, https://doi.org/10.1016/j.enbuild.2013.08.044.
[28] J.E. Jackson, A User’s Guide to Principal Components, Wiley, 1991.
[29] Y. Hu, H. Chen, J. Xie, X. Yang, C. Zhou, Chiller sensor fault detection using a self-adaptive principal component analysis method, Energy Build. 54 (2012) 252–258, https://doi.org/10.1016/j.enbuild.2012.07.014.
[30] G. Li, Y. Hu, Improved sensor fault detection, diagnosis and estimation for screw chillers using density-based clustering and principal component analysis, Energy Build. 173 (2018) 502–515, https://doi.org/10.1016/j.enbuild.2018.05.025.
[31] S. Wang, F. Xiao, Detection and diagnosis of AHU sensor faults using principal component analysis method, Energy Convers. Manage 45 (17) (2004) 2667–2686, https://doi.org/10.1016/j.enconman.2003.12.008.
[32] S. Wang, F. Xiao, AHU sensor fault diagnosis using principal component analysis method, Energy Build. 36 (2) (2004) 147–160, https://doi.org/10.1016/j.enbuild.2003.10.002.
[33] S. Wang, J. Qin, Sensor fault detection and validation of VAV terminals in air conditioning systems, Energy Convers. Manage 46 (15–16) (2005) 2482–2500, https://doi.org/10.1016/j.enconman.2004.11.011.
[34] S. Wang, J. Cui, A robust fault detection and diagnosis strategy for centrifugal chillers, HVAC R Res. 12 (3) (2006) 407–428, https://doi.org/10.1080/10789669.2006.10391187.
[35] S. Wang, F. Xiao, Sensor fault detection and diagnosis of air-handling units using a condition-based adaptive statistical method, HVAC R Res. 12 (1) (2006) 127–150, https://doi.org/10.1080/10789669.2006.10391171.
[36] F. Xiao, S. Wang, J. Zhang, A diagnostic tool for online sensor health monitoring in air-conditioning systems, Autom. Constr. 15 (4) (2006) 489–503, https://doi.org/10.1016/j.autcon.2005.06.001.
[37] Z. Du, X. Jin, Multiple faults diagnosis for sensors in air handling unit using Fisher discriminant analysis, Energy Convers. Manage 49 (12) (2008) 3654–3665, https://doi.org/10.1016/j.enconman.2008.06.032.
[38] Z. Du, X. Jin, Detection and diagnosis for sensor fault in HVAC systems, Energy Convers. Manage 48 (3) (2007) 693–702, https://doi.org/10.1016/j.enconman.2006.09.023.
[39] Z. Du, X. Jin, Detection and diagnosis for sensor fault in HVAC systems, Energy Convers. Manage 48 (3) (2007) 693–702, https://doi.org/10.1016/j.enconman.2006.09.023.
[40] F. Xiao, S. Wang, X. Xu, G. Ge, An isolation enhanced PCA method with expert-based multivariate decoupling for sensor FDD in air-conditioning systems, Appl. Therm. Eng. 29 (4) (2009) 712–722, https://doi.org/10.1016/j.applthermaleng.2008.03.046.
[41] S. Wang, Q. Zhou, F. Xiao, A system-level fault detection and diagnosis strategy for HVAC systems involving sensor faults, Energy Build. 42 (4) (2010) 477–490, https://doi.org/10.1016/j.enbuild.2009.10.017.
[42] S. Li, J. Wen, Application of pattern matching method for detecting faults in air handling unit system, Autom. Constr. 43 (2014) 49–58, https://doi.org/10.1016/j.autcon.2014.03.002.
[43] M. Padilla, D. Choinière, A combined passive-active sensor fault detection and isolation approach for air handling units, Energy Build. 99 (2015) 214–219, https://doi.org/10.1016/j.enbuild.2015.04.035.
[44] R. Yan, Z. Ma, G. Kokogiannakis, Y. Zhao, A sensor fault detection strategy for air handling units using cluster analysis, Autom. Constr. 70 (2016) 77–88, https://doi.org/10.1016/j.autcon.2016.06.005.
[45] Y. Hu, H. Chen, G. Li, H. Li, R. Xu, J. Li, A statistical training data cleaning strategy for the PCA-based chiller sensor fault detection, diagnosis and data reconstruction method, Energy Build. 112 (2016) 270–278, https://doi.org/10.1016/j.enbuild.2015.11.066.
[46] Y. Guo, et al., An enhanced PCA method with Savitzky-Golay method for VRF system sensor fault detection and diagnosis, Energy Build. 142 (2017) 167–178, https://doi.org/10.1016/j.enbuild.2017.03.026.
[47] Y. Guo, et al., Modularized PCA method combined with expert-based multivariate decoupling for FDD in VRF systems including indoor unit faults, Appl. Therm. Eng. 115 (2017) 744–755, https://doi.org/10.1016/j.applthermaleng.2017.01.008.
[48] G. Li, Y. Hu, Improved sensor fault detection, diagnosis and estimation for screw chillers using density-based clustering and principal component analysis, Energy Build. 173 (2018) 502–515, https://doi.org/10.1016/j.enbuild.2018.05.025.
[49] A. Montazeri, S.M. Kargar, Fault detection and diagnosis in air handling using data-driven methods, J. Build. Eng. 31 (2020), https://doi.org/10.1016/j.jobe.2020.101388.
[50] Y. Guo, H. Chen, Fault diagnosis of VRF air-conditioning system based on improved Gaussian mixture model with PCA approach, Internat. J. Refrig. 118 (2020) 1–11, https://doi.org/10.1016/j.ijrefrig.2020.06.009.
[51] L. Burgas, J. Colomer, J. Melendez, F.I. Gamero, S. Herraiz, Integrated unfold-pca monitoring application for smart buildings: An ahu application example, Energies (Basel) 14 (1) (2021), https://doi.org/10.3390/en14010235.
[52] X. Yang, R. He, J. Wang, X. Li, R. Liu, Using thermal load matching strategy to locate historical benchmark data for moving-window PCA based fault detection in air handling units, Sustain. Energy Techn. Assess. 52 (2022), https://doi.org/10.1016/j.seta.2022.102238.
[53] A. Liang, Y. Hu, G. Li, The impact of improved PCA method based on anomaly detection on chiller sensor fault detection, Internat. J. Refrig. 155 (2023) 184–194, https://doi.org/10.1016/j.ijrefrig.2023.09.002.
[54] S. Wen, et al., An enhanced principal component analysis method with Savitzky–Golay filter and clustering algorithm for sensor fault detection and diagnosis, Appl. Energy 337 (2023), https://doi.org/10.1016/j.apenergy.2023.120862.
[55] X. Yang, J. Chen, X. Gu, R. He, J. Wang, Sensitivity analysis of scalable data on three PCA related fault detection methods considering data window and thermal load matching strategies, Expert Syst. Appl. 234 (2023), https://doi.org/10.1016/j.eswa.2023.121024.
[56] Q. Ma, C. Yue, M. Yu, Y. Song, P. Cui, Y. Yu, Research on fault diagnosis strategy of air-conditioning system based on signal demodulation and BPNN-PCA, Internat. J. Refrig. 158 (2024) 124–134, https://doi.org/10.1016/j.ijrefrig.2023.12.008.
[57] G. Li, C. Xiong, J. Gao, H. Zhu, C. Wang, J. Xiao, Fault detection, diagnosis and calibration of heating, ventilation and air conditioning sensors by combining principal component analysis and improved bayesian inference, J. Build. Eng. 82 (2024) 108230, https://doi.org/10.1016/j.jobe.2023.108230.
[58] R. Yan, Z. Ma, Y. Zhao, G. Kokogiannakis, A decision tree based data-driven diagnostic strategy for air handling units, Energy Build. 133 (2016) 37–45, https://doi.org/10.1016/j.enbuild.2016.09.039.
[59] C. Fan, F. Xiao, C. Yan, C. Liu, Z. Li, J. Wang, A novel methodology to explain and evaluate data-driven building energy performance models based on interpretable machine learning, Appl. Energy 235 (2019) 1551–1560, https://doi.org/10.1016/j.apenergy.2018.11.081.
[60] W.Y. Lee, J.M. House, N.H. Kyong, Subsystem level fault diagnosis of a building’s air-handling unit using general regression neural networks, Appl. Energy 77 (2) (2004) 153–170, https://doi.org/10.1016/S0306-2619(03)00107-7.
[61] John Bollinger, 2020, www.bollingerbands.com.
[62] M.M. Hydeman et al., “ASHRAE standing guideline project committee 36 cognizant TC: 1.4, control theory and application SPLS Liaison: Julie M. Ferguson”, 2018. [Online]. Available: www.ashrae.org/technology.
[63] M. Ahern, D.T.J. O’Sullivan, K. Bruton, Development of a framework to aid the transition from reactive to proactive maintenance approaches to enable energy reduction, Appl. Sci. (Switzerland) 12 (13) (2022), https://doi.org/10.3390/app12136704.
[64] M. Ahern, D.T.J. O’Sullivan, and K. Bruton, “Specifications table value of the data”, 109208 Energy Build, vol. 48, p. 2023, 2023, doi: 10.17632/8.
[65] Jessica Granderson, Guanjing Lin, Yimin Chen, Armando Casillas, Sen Huang, and Draguna Vrabie, “LBNL fault detection and diagnostics data sets: single duct air handling unit”, 2022.