Atmos. Meas. Tech., 11, 3717–3735, 2018
https://doi.org/10.5194/amt-11-3717-2018
© Author(s) 2018. This work is distributed under
the Creative Commons Attribution 4.0 License.

Performance of NO, NO2 low cost sensors and three calibration approaches within a real world application
Alessandro Bigi¹, Michael Mueller², Stuart K. Grange³, Grazia Ghermandi¹, and Christoph Hueglin²
¹ “Enzo Ferrari” Department of Engineering, University of Modena and Reggio Emilia, Modena, Italy
² Empa, Swiss Federal Laboratories for Materials Science and Technology, Duebendorf, Switzerland
³ Wolfson Atmospheric Chemistry Laboratory, University of York, York, UK

Correspondence: Alessandro Bigi ([email protected])

Received: 16 February 2018 – Discussion started: 12 March 2018
Revised: 29 May 2018 – Accepted: 12 June 2018 – Published: 26 June 2018

Abstract. Low cost sensors for measuring atmospheric pollutants are experiencing an increase in popularity worldwide among practitioners, academia and environmental agencies, and a large amount of data from these devices is being delivered to the public. Nevertheless, their behaviour, performance and reliability are not yet fully investigated and understood. In the present study we investigate the medium term performance of a set of NO and NO2 electrochemical sensors in Switzerland using three different regression algorithms within a field calibration approach. In order to mimic a realistic application of these devices, the sensors were initially co-located at a rural regulatory monitoring site for a 4-month calibration period, and subsequently deployed for 4 months at two distant regulatory urban sites in traffic and urban background conditions, where the performance of the calibration algorithms was explored. The applied algorithms were Multivariate Linear Regression, Support Vector Regression and Random Forest; these were tested, along with the sensors, in terms of generalisability, selectivity, drift, uncertainty, bias, noise and suitability for spatial mapping of intra-urban pollution gradients with hourly resolution. Results from the deployment at the urban sites show a better performance of the non-linear algorithms (Support Vector Regression and Random Forest), achieving RMSE < 5 ppb, R² between 0.74 and 0.95 and MAE between 2 and 4 ppb. The combined use of both NO and NO2 sensor output in the estimate of each pollutant showed some contribution by the NO sensor to the NO2 estimate and vice versa. All algorithms exhibited a drift, ranging between 5 and 10 ppb for Random Forest and 15 ppb for Multivariate Linear Regression at the end of the deployment. The lowest concentration correctly estimated, with a 25 % relative expanded uncertainty, was ca. 15–20 ppb and was provided by the non-linear algorithms. As an assessment of the suitability of the tested sensors for a targeted application, the probability of resolving hourly concentration differences in cities was investigated. It was found that NO concentration differences of 5–10 ppb (8–10 for NO2) can reliably be detected (90 % confidence), depending on the air pollution level. The findings of this study, although derived from a specific sensor type and sensor model, are based on a flexible methodology and have extensive potential for exploring the performance of other low cost sensors that are different in their target pollutant and sensing technology.

Published by Copernicus Publications on behalf of the European Geosciences Union.

1 Introduction

Air quality assessment for regulatory purposes is addressed by means of monitoring stations following a strict QA/QC protocol in order to deliver measurements having an uncertainty within a specific range that is appropriate for the purpose (2008/50/EC, Council of Europe, 2008). The costs associated with these monitoring sites led to a reconfiguration of regulatory air quality networks across Europe over the last decade, resulting in improved but still spatially sparse regulatory air quality networks over the continent. Although this trend towards optimisation is coherent with main regulatory needs, it is not consistent with the increasing demand for spatio-temporal air quality information in urban areas, where the largest part of the worldwide population lives (United Nations, 2015). Up to now, two of the most promising approaches for estimating air quality conditions in complex environments
such as urban areas are simulation models and small low cost sensors. The former approach includes dispersion modelling (e.g. Ghermandi et al., 2015), while the latter approach consists in sensor deployment for time-resolved air quality mapping (e.g. Mueller et al., 2016), plume tracking or other tasks. Besides some devices based on the absorption in the infrared region by the target gas, most common low cost sensors for gas phase compounds are based on either metal oxide or electrochemical technology. The high expectations from these two latter types of low cost sensors were seldom met, as they often face problems of calibration (Spinelle et al., 2013), stability (Fonollosa et al., 2016), cross-sensitivity (Mead et al., 2013) and low repeatability and reproducibility (Rai et al., 2017), calling for more research and tests for their mindful use (Lewis and Edwards, 2016). Among these problems, calibration is one of the major unsolved issues, preventing broad use of these devices: ideally a calibration should include a full description of the sensor's physical or chemical working principles along with its response to all environmental conditions and with ageing. Calibration approaches should be consistent with the intended application, and the resulting measuring device, made up of a sensing unit and its calibration model, should meet the performance required by the application. Indications about possible minimum requirements for air quality studies can be taken from the EU directive 2008/50/EC, requiring an expanded uncertainty of 25 % for indicative measurement devices.

Main current calibration solutions involve either sensor testing in the laboratory under controlled conditions or field co-location of sensors next to a calibrated reference instrument, with the former being an approach based on first principles and the latter an approach based on co-location data. Until now the former approach provided unsatisfactory results during the model validation in the field (e.g. Spinelle et al., 2017; Fonollosa et al., 2016), making a field calibration approach more commonly and successfully applied. However, this latter approach introduced issues about the generalisability of a calibration model, because of the limited and site-specific range of environmental conditions occurring during the calibration period. This holds even more true in case the calibration and the following measurements are performed at two different sites, i.e. in case of relocation, with the additional possible influence of sensor handling and transport. Nonetheless, in the common case of field calibration, the subsequent relocation is extremely likely in a realistic application of these devices, because of the sparsity of the regulatory monitoring networks and given the most straightforward applications of these sensors, i.e. the collection of time-resolved air quality data where no data is available. In the literature the effect of relocation is scarcely described, while several studies show results from a field calibration and further deployment at the same site. For this latter case several algorithms have been tested: since field calibration consists in a data driven approach, the algorithm used has a large impact on the final results. Some studies used models from classical statistics, e.g. Multivariate Linear Regression (Mijling et al., 2018; Mueller et al., 2017), or more sophisticated methods such as high-dimensional model representation (Cross et al., 2017). In other studies several machine learning algorithms have been tested, for both metal oxide and electrochemical sensors, including also laboratory calibration: different types of Artificial Neural Networks (ANN, e.g. De Vito et al., 2009; Esposito et al., 2016; Spinelle et al., 2015), Reservoir Computing (Fonollosa et al., 2015), Random Forest (Zimmerman et al., 2018) and a recent comparison of three algorithms fed by dynamic and static input shows promising results by Support Vector Regression (De Vito et al., 2018).

This latter literature showed how calibration procedures involving non-linear methods generally outperform those using classical statistics, and better capture the effects of environmental factors on sensor response. However, the performance shown by several of the methods cited above does not take into account the effects of relocation, which has to be expected in a realistic use of similar devices. The main notable exception is a study on SO2 electrochemical sensors by Hagan et al. (2018), who achieved RMSE values of ∼ 8 ppb and R² ∼ 0.88 during a 4-month relocation using a Hybrid Regression model, combining a linear with a non-linear solution. Other studies involving relocation include Esposito et al. (2018), who showed a significant degradation of the NO2 estimate by electrochemical sensors after their relocation within the urban area of Oslo (Norway), along with the one by Zimmerman et al. (2018), who showed a good performance from a Random Forest regression model on a 4-week relocation in the vicinity of the calibration site.

In the present study we installed a set of electrochemical sensors at a rural site exposed to highway traffic emission for calibration and subsequently deployed these same sensors in two distant urban sites in traffic and background conditions. The first aim of the study is to compare state-of-the-art calibration algorithms, using a data-driven approach, within this realistic framework. The second is to investigate the change in performance over time and after a relocation of these measuring devices, i.e. of the sensor units (the hardware) and of their individual calibration (data processing algorithm). The final aim is the quantitative assessment of the measurement uncertainty of sensor units deployed in a network and to investigate whether they are suitable for mapping intra-urban pollution gradients of NO and NO2. The results strictly apply to the type and model of sensors involved (actually extremely popular among sensor systems) and to the environmental conditions during sampling; nonetheless, the flexibility of the methodology used here has a large potential for other low cost sensing instruments.

In Sect. 2, the sensor units and the calibration methods are described. Results from the calibration and the deployment periods are found in Sect. 3. Finally the results are discussed and main conclusions are drawn. All data processing
has been performed with the software R 3.4.2 (R Core Team, 2017).

2 Materials and methods

2.1 Sensor units

Four identical sensor units have been jointly developed with Decentlab GmbH (Dübendorf, Switzerland) and used for this study. The sensor units used are labelled SU009, SU010, SU011 and SU012. Each unit consists of one box that includes two NO2 sensors (Alphasense NO2-B43F), two NO sensors (Alphasense NO-B4), a relative humidity (RH) and temperature (T) sensor (Sensirion SHT21) and a data transmission module using a GSM/GPRS connection. The system is battery powered. Two identical NO sensors and two identical NO2 sensors are used for a better control of the data quality. NO and NO2 sensors are housed inside the box to better protect their gas permeable membrane, and a small blower is used to draw ambient air through a teflon (PTFE) manifold to which the sensors are connected. The electrochemical (EC) sensors used employ four electrodes: working, reference and counter electrodes account for target gas concentration, while a fourth auxiliary electrode compensates for zero current. The former three electrodes represent an electrochemical cell where a redox reaction of the target gas occurs, generating an electric current directly proportional to the gas concentration, while the auxiliary electrode accounts for changes in baseline signal (further details in Baron and Saffell, 2017; Alphasense Ltd, 2014). The blower is operated for 7 s every 20 s as a compromise between battery consumption and sample collection, and its main benefit is threefold: the air is not reaching the EC by diffusion only; therefore, the dependence on ambient conditions, especially on wind speed, is decreased; it reduces the overall response time, since shielding the sensor inside the box results in a certain dead volume; it limits water vapour condensation on the EC membrane, and/or enhances its evaporation. The 1 min averaging time is longer than the fluctuations in the flow; therefore, changes in the performance characteristics due to the intermittence of the blower are expected to be negligible.

The signal of each sensor is sampled every 20 s. Three such values are aggregated by the SU to a 1 min mean value. These 1 min values are transmitted to a central database every 180 min. Data transmission implied both an increase in energy requirement by the transmission module, causing a drop in battery level, and spikes in electrode output. A despiking procedure based upon battery level data was applied: this consisted in selecting the data associated with single drops in battery level and removing them. In case a few occasional spikes remained after this first procedure, these were selectively identified and removed by the following procedure: a running median of k original readings (r_rm) was calculated, the standard deviation of the difference between the original readings and r_rm was computed (σ_dif), then each original reading having a difference to r_rm larger than s times σ_dif was removed. This latter procedure used the command despike in the oce package, where the k and s parameters were individually set for each electrochemical sensor. The 1 min despiked data were subsequently averaged to 10 min readings and used for all following analyses, except where stated otherwise.

2.2 Calibration and deployment sites

All four units were initially installed at the Härkingen (Switzerland) monitoring site within the Swiss Federal Air Quality Monitoring Network (HAE: 47.311° N, 7.820° E, 430 m a.s.l.). SU009, SU011 and SU012 were installed on 13 April 2017, while SU010 was installed on 5 May 2017. All boxes were removed from HAE on 20 July 2017. The HAE monitoring site encounters clean (rural) air masses when northern winds blow and polluted (highway) air masses when southern winds blow. This allows an exposure of sensors to a wide range of pollutant concentrations (Hueglin et al., 2006). The data collected at HAE represent the calibration dataset (or training dataset) and were used to develop, train and validate the three regression algorithms tested in this study. In order to estimate the performance of the sensor units within a realistic application framework, the regression models calibrated upon this latter dataset were subsequently used to estimate concentrations after deploying the units to different sites, experiencing different pollution levels and different environmental conditions. On 28 July 2017 the units were moved to two different air quality monitoring sites: SU009 and SU010 were deployed at Zurich-Kaserne, an urban background site in Switzerland (ZUE: 47.378° N, 8.530° E, 408 m a.s.l.). SU011 and SU012 were deployed at an urban traffic site in Lausanne, Switzerland (LAU: 46.522° N, 6.640° E, 495 m a.s.l.). At these monitoring sites NO, NO2, O3, temperature (T) and relative humidity (RH) were available and were used to verify the concentration estimate by the sensor units: the data collected at ZUE and LAU represent the deployment dataset (or testing dataset), which includes data until 5 December 2017. Table S1 of the Supplement shows the descriptive statistics of the meteorological and pollution conditions by the regulatory network instruments at the three sites, during their respective study period. The time series of the complete dataset is shown in Fig. S1 in the Supplement, the linear correlation matrix for these same data is shown in Fig. S2 and the NO2 / NOx ratio is in Fig. S3 in the Supplement. The range in NO and NO2 levels at the calibration site is similar to the deployment sites, benefiting the data driven calibration approach used, with the calibration site showing pollution conditions more similar to ZUE than to LAU.
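The running-median despiking and the subsequent averaging described in Sect. 2.1 can be sketched as follows. This is a minimal illustration in Python, not the despike command of the oce package that the study ran in R: the function names despike and block_mean, the default values k = 5 and s = 3, and the truncation of the median window at the series edges are all assumptions made for the example.

```python
from statistics import median, pstdev


def despike(readings, k=5, s=3.0):
    """Drop readings deviating from a running median by more than s * sigma_dif.

    k and s are illustrative defaults; the study set them per sensor.
    """
    half = k // 2
    # Running median r_rm over a centred window of up to k readings
    # (the window is simply truncated at the series edges: an assumption).
    rrm = [median(readings[max(0, i - half): i + half + 1])
           for i in range(len(readings))]
    # Difference between each original reading and the running median.
    diffs = [x - m for x, m in zip(readings, rrm)]
    # Standard deviation of those differences (sigma_dif in the text).
    sigma_dif = pstdev(diffs)
    return [x for x, d in zip(readings, diffs) if abs(d) <= s * sigma_dif]


def block_mean(values, n):
    """Average consecutive, non-overlapping blocks of n values.

    For example, n = 10 turns 1 min values into 10 min readings;
    an incomplete trailing block is discarded.
    """
    return [sum(values[i:i + n]) / n
            for i in range(0, len(values) - n + 1, n)]


# Synthetic 1 min series with one transmission-like spike.
one_minute = [10.0] * 10 + [100.0] + [10.0] * 10
cleaned = despike(one_minute, k=5, s=3.0)
ten_minute = block_mean(cleaned, 10)
```

In this sketch spikes are dropped rather than flagged, so the cleaned series can be shorter than the input; a time-stamped implementation would more likely mark them as missing before averaging.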

2.3 Regression models and explanatory variables

Three different calibration algorithms have been tested: a Multivariate Linear Regression model (MLR), a Support Vector Regression model (SVR) and a Random Forest regression model (RF). These methods were used to estimate the atmospheric concentration of NO and NO2 using only information available by each SU, i.e. voltage output by the EC sensors, T and RH. Two identical NO and NO2 sensors in each sensor unit allow the use of tens of different combinations of explanatory variables in the regression models, for example a set based on the mean of the net voltages of the replicate EC sensors or on the individual net signals of both. Firstly, the best set of explanatory variables was selected by comparing the performance of the algorithms in using 10 different model equations. For each tested model SVR was tuned for each pollutant and each of the SUs, while the same hyperparameter set was used for RF. In this task, for tuning and performance estimate, only the calibration dataset was used, consistently with the realistic framework of this study. Finally, the best performing model was selected and the regression models, tuned and calibrated upon the calibration dataset, were applied to the deployment dataset to estimate pollutant concentration. The equations of the four main covariate combinations that were tested are listed in Appendix A: these models are labelled minimal when using one EC sensor only (Eqs. A1, A2), basic when using one NO and one NO2 EC sensor (Eqs. A3 and A4), single replicate when using 2 EC sensors of the same gas (Eqs. A5, A6) and double replicate when using the four EC sensors (Eqs. A7, A8). All equations include ambient RH and T readings by their respective sensor within each SU.

All plots and results in the remainder of the text proceed from the model including all four EC sensors, i.e. the one achieving the best performance on the calibration dataset. However, as the redundancy in EC sensors is a feature specific to the SUs used in this study, for the sake of comparability with the literature and to verify the benefit of a redundant design, the final performance of the SUs at the deployment sites using the four main regression models listed in Appendix A is shown in Figs. S4, S5 and in Table S2 in the Supplement.

2.3.1 Multivariate Linear Regression

The MLR model used in this study partly included MLR requirements of independent covariates. In a previous study Mueller et al. (2017) employed Alphasense NO2 B42F sensors and used, among the explanatory variables, both the weighted cumulative index of past RH changes and the change in sensor sensitivity with temperature (as observed in lab tests, Alphasense Ltd, 2017). The latter covariate was included in the four tested models (see Appendix A). In the present study the final regression model for NO and NO2 followed Eq. (1), where V_NO indicates the mean net voltage produced by the replicate EC sensors for NO, V_NO2 indicates the mean net voltage produced by the replicate EC sensors for NO2, with net voltage being the difference between the working and auxiliary electrodes. Note that this model is also listed in the Appendix in Eq. (A7).

$$
\begin{aligned}
\mathrm{NO} &= \beta_0 + \beta_1 V_{\mathrm{NO}} + \beta_2 V_{\mathrm{NO_2}} + \beta_3 T + \beta_4 \mathrm{RH} + \beta_5 V_{\mathrm{NO}} \times T + \varepsilon \\
\mathrm{NO_2} &= \beta_0 + \beta_1 V_{\mathrm{NO}} + \beta_2 V_{\mathrm{NO_2}} + \beta_3 T + \beta_4 \mathrm{RH} + \beta_5 V_{\mathrm{NO_2}} \times T + \varepsilon
\end{aligned}
\tag{1}
$$

2.3.2 Support Vector Regression

SVR modelling consists in a machine composed of three main steps: in the first the input data are mapped into a (high dimensional) feature space by means of a function, generally a kernel. In the second step the flattest function fitting the images of the input is found in the feature space by solving the corresponding constrained optimisation equation: support vectors are the points corresponding to the non-null Lagrangian multipliers of this latter function. In the last step the results are mapped back into the input space. More details on SVR modelling can be found in Smola and Schölkopf (2004). In the present study we used ε-SVR featured by a Gaussian radial basis kernel: the three main hyperparameters of this model are ε, the parameter of the insensitive-loss function, σ, the inverse kernel width, and C, the cost of constraints violation. These hyperparameters were tuned upon the calibration dataset by a 5-fold cross-validation approach and the best performing set was selected using three different goodness-of-fit metrics, i.e. the mean of squared errors, the root-mean of squared errors and the coefficient of determination. The hyperparameters were individually tuned for each sensor unit and each pollutant.

SVR modelling and tuning were achieved using the kernlab and mlr packages for R (Karatzoglou et al., 2004; Bischl et al., 2016). Fast and optimal SVR hyperparameter tuning is an active research area within the scientific community, motivated by the hyperparameters' reciprocal interaction and leading to large hyperparameter spaces being explored for an optimal result. The computing time and computing resources needed to tune the calibration dataset were significantly larger than for the other models (70–300 core-hours per sensor per pollutant on one Intel i7-6700 CPU at 3.40 GHz). Moreover, SVR showed a tendency to overfit the data and it often led to similar fitting performance with different hyperparameter sets: for final optimal results, a minor manual tuning on ε was occasionally applied on a model bias-variance trade-off basis (Cawley and Talbot, 2010).

2.3.3 Random Forest Regression

RF modelling consists of growing M randomised trees, representing the forest, where each tree is built on a random subset of the p-dimensional initial sample X_p. A tree is grown by performing optimal cuts of each tree node (starting from
the root), until the cardinality of each final cell is lower than nodesize. Cut optimality is estimated using the Classification and Regression Trees split criterion (CART) (Breiman et al., 1993): this algorithm compares the variance of the uncut node with the variance of all possible cuts along mtry directions, where mtry is a random subset of sample coordinates p. The prediction is produced by averaging all tree estimates into a (pointwise) forest estimate. More details on RF regression modelling can be found in Breiman (2001).

Two main approaches exist to overcome the RF standard pointwise estimate and build an interval for model prediction, i.e. to include modelling uncertainty in the final estimate: forest-based quantile regression (QRF) and inference on RF estimates (RF-CI). Predictions by quantile regression forest result in keeping all observations in every node in every tree and estimating a weighted mean for each observation (Meinshausen, 2006). Confidence interval for RF estimates is an open research topic being tackled in different ways (e.g. Wager et al., 2014; Sexton and Laake, 2009; Mentch and Hooker, 2016). In this study, the uncertainty of point predictions was tentatively assessed by using both approaches, although still experimental. For the assessment of confidence intervals we used the approach by Athey et al. (2017), who base the inference on asymptotically Gaussian RF predictions and use the bootstrap of little bags algorithm (Sexton and Laake, 2009) to compute asymptotically valid confidence intervals. In this study standard RF modelling was performed using the randomForestSRC package in R, while quantile regression and confidence interval estimate were both performed using the grf package in R.

Main RF hyperparameters (mtry, nodesize, M) were tuned upon the calibration dataset by a 5-fold cross validation by investigating several goodness-of-fit metrics. The possible range of RF hyperparameters is narrower than for SVR, and the RF model showed a minor sensitivity to changes in mtry and nodesize, because of the small number of covariates. Finally nodesize and mtry were set to 7 and 5 respectively, slightly larger than their recommended values, to further avoid overfitting, an unlikely event for RF models (Breiman, 2001). The number of trees was set to 1000 for standard forest and to 4000 for QRF and RF-CI forests. These hyperparameter values were used for all SUs and all pollutants. It is worth noting that small differences exist between randomForestSRC and grf, which are mainly due to the splitting algorithm, i.e. the use of fair and unfair forests (Athey et al., 2017), besides that the QRF central estimate is the forest median, while the other two RF flavours use the forest mean.

2.3.4 Features of machine learning regression models

SVR and RF modelling share the ability to build a non-linear regression model using several time series as explanatory variables and are superior to MLR in handling both autocorrelation and multicollinearity. This ability allowed for the free testing of any combination of the possible covariates and finally, for both SVR and RF, led to the regression model in Eq. (2), where V_NO^A indicates the net voltage by the NO sensor A, V_NO2^A indicates the net voltage by the NO2 sensor A, and consistently V_NO^B and V_NO2^B for the respective replicate sensor B. The model in Eq. (2) is also listed in the Appendix as Eq. (A8).

$$
\begin{aligned}
\mathrm{NO} &= \mathrm{function}\left(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO_2}}^{A}, V_{\mathrm{NO}}^{B}, V_{\mathrm{NO_2}}^{B}, T, \mathrm{RH}\right) \\
\mathrm{NO_2} &= \mathrm{function}\left(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO_2}}^{A}, V_{\mathrm{NO}}^{B}, V_{\mathrm{NO_2}}^{B}, T, \mathrm{RH}\right)
\end{aligned}
\tag{2}
$$

Using a similar model structure for MLR would strongly violate the requirements for a reliable estimate of MLR errors. It is worth noting that the residuals from the SVR and RF application of Eq. (2) are independent, contrary to the MLR residuals from Eq. (1). This latter model shows autocorrelated residuals, to be expected from an ordinary linear regression on a time series, and inflated variance for its coefficients, because of the multicollinearity of the regressors. Nonetheless MLR has been included among the regression methods in this study for its wide use in low cost sensor calibration. A further difference among algorithms is that MLR and SVR allow extrapolation outside the range of their input dataset, while the estimates provided by RF can only be within the bounds of the calibration space, RF being a tree-based algorithm. This noteworthy feature of RF on one side implies a constraint on its application to relocated SUs, on the other side it will guarantee only positive estimates.

The role of each predictor in MLR, SVR and RF models was assessed by estimating its partial dependence, which consists in evaluating the average prediction when the covariate of interest is held constant, repeating this prediction for a set of values across the distribution of this covariate. Partial dependence plots allow the investigation of the effect of each covariate on the prediction. For RF models only, it is also possible to estimate the importance of each variable by computing the increase in prediction error by randomly permuting each covariate in every tree and averaging this prediction error over the forest (Breiman, 2001): the larger the increase in prediction error, the larger the importance of the variable for that RF model. This importance metric of a variable is the error occurring if an RF model, calibrated including that variable, is used in prediction without that same variable.

3 Results

Several goodness-of-fit indexes were used to assess the overall performance of the four SUs when individually calibrated using the different described calibration approaches: these include root mean square error (RMSE), centred root mean square error (CRMSE), mean bias error (MBE), mean absolute error (MAE) and the coefficient of determination (R²). Temporal variability of these indexes was investigated, along with an overall performance of the sensing devices.
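The goodness-of-fit indexes listed above can be computed from paired reference and sensor-estimate series as in the following minimal Python sketch. It is not the code used in the study (which was written in R). The function name fit_metrics, the sign convention of the bias (estimate minus reference) and the computation of R² as the squared Pearson correlation are assumptions, since the text does not state these definitions.

```python
import math


def fit_metrics(ref, est):
    """Return RMSE, CRMSE, MBE, MAE and R2 for paired series.

    ref: reference instrument concentrations (e.g. ppb)
    est: sensor-unit estimates for the same time steps
    """
    n = len(ref)
    mean_ref = sum(ref) / n
    mean_est = sum(est) / n
    # Error convention (assumed): estimate minus reference.
    err = [e - r for r, e in zip(ref, est)]
    mbe = sum(err) / n
    mae = sum(abs(d) for d in err) / n
    rmse = math.sqrt(sum(d * d for d in err) / n)
    # Centred RMSE: the RMSE left once the mean bias is removed.
    crmse = math.sqrt(sum((d - mbe) ** 2 for d in err) / n)
    # R2 taken here as the squared Pearson correlation (assumed definition).
    sxy = sum((r - mean_ref) * (e - mean_est) for r, e in zip(ref, est))
    sxx = sum((r - mean_ref) ** 2 for r in ref)
    syy = sum((e - mean_est) ** 2 for e in est)
    r2 = sxy * sxy / (sxx * syy)
    return {"RMSE": rmse, "CRMSE": crmse, "MBE": mbe, "MAE": mae, "R2": r2}


# Made-up values, for illustration only.
metrics = fit_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 19.0, 33.0, 40.0])
```

With these definitions RMSE² = MBE² + CRMSE², so the CRMSE separates the scatter of the estimates from a constant offset.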

[Figure 1 panels. SVR and RF: NO (ppb) vs. V_NO2^A (mV), V_NO2^B (mV), V_NO^A (mV), V_NO^B (mV), T (°C) and RH (%). MLR: NO (ppb) vs. V_NO2 (mV), V_NO (mV), T (°C), RH (%) and mV × T (mV °C).]

Figure 1. Partial plots for SVR, RF and MLR for the calibration dataset from SU009, NO. Rug on the abscissa indicates the range of the
covariate.
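A partial plot of this kind can be approximated by forcing one covariate to a grid of values and averaging the model response over the calibration records. The sketch below assumes this standard averaging definition; `partial_dependence`, `toy_model`, the covariate names and all numbers are hypothetical, and the linear toy model only stands in for a fitted SVR, RF or MLR.

```python
from statistics import mean


def partial_dependence(predict, rows, feature, grid):
    """Average model response when `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite one covariate in every row, keep the others as observed.
        forced = [dict(row, **{feature: value}) for row in rows]
        curve.append((value, mean(predict(row) for row in forced)))
    return curve


def toy_model(row):
    """Hypothetical calibrated model: stand-in for a fitted SVR, RF or MLR."""
    return 0.5 * row["v_no"] + 0.1 * row["t"] - 0.02 * row["rh"]


# Hypothetical calibration records: net voltage (mV), T (degC), RH (%).
rows = [
    {"v_no": 10.0, "t": 20.0, "rh": 50.0},
    {"v_no": 30.0, "t": 10.0, "rh": 70.0},
]
curve = partial_dependence(toy_model, rows, "v_no", [0.0, 20.0])
```

For a linear model the resulting curve is a straight line whose slope equals the regression coefficient, whereas a non-linear model may yield a curved or non-monotonic response.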

3.1 Results for the calibration dataset

Partial plots applied to the calibration dataset of SU009 are shown in Figs. 1 and 2 and of SU010, SU011 and SU012 in Figs. S6 through S11. These provide insights into the role of each predictor within the model, a remedy for the widely perceived black box nature of machine learning algorithms. The most prominent result from these plots is the difference among the three algorithms: MLR implies a linear response from each covariate, while SVR and RF allow non-linearity. The partial plots for EC net voltage vs. its target gas show a similar pattern across all SUs and all algorithms, indicating that the final model structure generalises well across the hardware for this covariate, and that the differences among SUs are minor in this case. Both SVR and RF exploit the replicate EC sensors: the former algorithm shows a significant response by replicate gas sensors in the estimate of their target gas (i.e. by both NO2 EC sensors in predicting atmospheric NO2), while RF shows a large response by both replicate sensors only in the case of NO by SU009 and by SU011. The similarity in the response of atmospheric variables according to SVR and RF is notable, supporting the result also by these specific partial plots. RF correctly identifies the most informative variable (as supported by the variable importance plots in Figs. 3 and S12) and it appears to be the most efficient algorithm among the three:

– it shows a quasi-linear response of the EC net voltage towards its target gas, contrary to the often non-monotonic behaviour shown by SVR;

– this linear behaviour is held across a large part of the full range of the EC net voltage output;

– for RF estimating a gas, the net voltage of the EC sensor targeting that same gas has the broadest response among all covariates. The non-monotonicity in the partial response from SVR suggests that a minor overfit is still present, although this is not significantly affecting performance during deployment.

[Figure 2 panels. SVR and RF: NO2 (ppb) vs. V_NO2^A (mV), V_NO2^B (mV), V_NO^A (mV), V_NO^B (mV), T (°C) and RH (%). MLR: NO2 (ppb) vs. V_NO2 (mV), V_NO (mV), T (°C), RH (%) and mV × T (mV °C).]

Figure 2. Partial plots for SVR, RF and MLR for the calibration dataset from SU009, NO2. Rug on the abscissa indicates the range of the covariate.

Variable importance plots (Figs. 3 and S12 in the Supplement), possible for RF only, show how the main regression variable is the net voltage by the EC sensor of the corresponding target gas. Its importance is generally ∼ 4 times larger than that of the second most important variable; however, for NO prediction by SU009 and SU011, the second most important variable is the replicate NO sensor and in this case its importance is closer to that of the most important variable.

Figure 3. Variable importance plot for the prediction by SU009 of NO (a) and NO2 (b).

The effect of RH on sensor response is extremely low for all algorithms, consistent with results from laboratory studies (e.g. Spinelle et al., 2017). Nonetheless, humidity transients are known for being responsible for spurious responses by the EC sensors (Mueller et al., 2017; Alphasense Ltd, 2017; Pang et al., 2017), but this effect was not parameterized in this study, possibly leading to a slightly degraded model estimate. However, no anomalous peak was evident in the 10 min data, although rapid variations in atmospheric RH occurred.

Independently of the calibration algorithm, partial plots indicate a contribution by NO2 and NO EC sensors to NO and NO2, respectively: this might be due to the inability of the algorithms to untangle the large correlation of these pollutants in the atmosphere, and/or an existing cross-sensitivity of the EC sensors. The latter cause cannot be excluded completely and was highlighted in several field deployments of EC sensors: both NO2 sensors Alphasense NO2-B4 and Alphasense NO2-B43F exhibited a significant cross-sensitivity to CO2 at atmospheric levels (Lewis and Edwards, 2016; Kim et al., 2018), while NO2 sensor Alphasense NO2-B42F was shown to have large cross-sensitivity to NO by Kim et al. (2018). Nonetheless, the available literature studies do not provide a clear and consistent picture about sensor selectivity and further laboratory tests are required to shed light on this topic. During this study no concurrent suitable data of atmospheric CO2 were available, preventing an investigation of possible bias in sensor estimates of NO2 induced by the cross-sensitivity to CO2 in the field.

The cross-sensitivity, along with a site- and time-specific NO–NO2 correlation, may prevent the application of a calibrated regression model over a wide spatial and temporal scale, because of a different NO / NO2 ratio at the calibration and the deployment site. The SU performance at LAU and ZUE (Sect. 3.2) allows the evaluation of the effect of relocation of the sensors on the data quality, since the two sites represent urban air pollution situations that are different from the site where the collocated measurements have been performed (HAE; see Table S1 and Figs. S1, S2 and S3 in the Supplement).

In order to further test the generalisability of the response by each covariate and hence of the proposed models, the three algorithms were calibrated also using the deployment dataset, in order to build partial plots at ZUE and LAU (Figs. S13 through S20 in the Supplement): note that SVR and RF were not tuned in this case, i.e. the same hyperparameter sets as for HAE were used. These latter partial plots are largely similar to those derived from the HAE dataset, particularly for MLR and RF, while SVR still exhibits some overfit. Each SU shows similar patterns between its partial plots for the calibration and the deployment dataset, including for the response of the EC sensors to their non-target gas. A minor exception to this latter point is the response by NO sensor B in SU010 (Figs. S7 and S16 in the Supplement), suggesting that the NO / NO2 ratio partly influences the response of non-target gas sensors. Overall these latter partial plots also show how the main behaviour of each SU was not significantly affected by the 7-month outdoor installation, notwithstanding the relocation and the change in environmental conditions.


Figure 4. Comparison of NO (a) and NO2 (b) estimates by SU009 with observations by reference instruments, at the urban background site
Zurich-Kaserne. 1 : 1 red dashed line is added in the scatterplots.


Figure 5. Comparison of NO (a) and NO2 (b) estimates by SU011 with observations by reference instruments, at the urban traffic site of
Lausanne. 1 : 1 red dashed line is added in the scatterplots.


3.2 Results for the deployment dataset

Time series of estimates from SU009, deployed at the urban background site ZUE, and from SU011, deployed at the urban traffic site LAU, are summarised in Figs. 4 and 5. Summary plots for SU010 and SU012, deployed at the background and traffic site respectively, are in Figs. S21 and S22. SVR and RF performed similarly and generally better than MLR, with an RMSE ranging between 2 and 5 ppb for both NO and NO2. Notwithstanding their similar goodness-of-fit indexes, RF showed a more regular performance than SVR across the SUs and the pollutants, and its time series predictions are more stable than the ones by SVR, which occasionally show negative spikes (e.g. NO2 by SU012 in Fig. S22 in the Supplement).

Several analyses have been performed to detail the performance of each device during deployment. Time series of goodness-of-fit indexes, computed with a rolling window of 1 week, indicate the change of model performance over time: in target plots (Spinelle et al., 2015) the change in performance is plotted in terms of CRMSE and MBE, both normalised by the standard deviation of the reference (σref), and the right quadrants are used when the standard deviation of the reference is lower than the one from model predictions, and vice versa. In target plots the distance of each target score to the origin equals RMSE normalised by σref. Finally, a unit circumference is added to this diagram, containing model predictions that have residuals with a standard deviation smaller than σref. Time-resolved target plots for the deployment dataset highlight significant variability in performance with time depending on the device, on the gas and on the algorithm. All three algorithms provide results within the unit circumference for most of the deployment period, and confirm how SVR and RF results are generally better than those by MLR (Figs. 6 and S23 through S25 in the Supplement).
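The target-plot coordinates described above can be sketched as follows. This is an illustrative reading of the text rather than the authors' implementation: `target_score` is a hypothetical name, the population standard deviation is assumed, and the sample values are invented.

```python
import math
from statistics import mean, pstdev


def target_score(pred, ref):
    """Return (x, y) target-plot coordinates for one evaluation window."""
    sigma_ref = pstdev(ref)    # assumption: population standard deviation
    sigma_pred = pstdev(pred)
    mean_pred, mean_ref = mean(pred), mean(ref)
    mbe = mean_pred - mean_ref
    crmse = math.sqrt(
        mean(((p - mean_pred) - (r - mean_ref)) ** 2 for p, r in zip(pred, ref))
    )
    # Right quadrants when the reference varies less than the predictions.
    sign = 1.0 if sigma_ref < sigma_pred else -1.0
    return sign * crmse / sigma_ref, mbe / sigma_ref


# Hypothetical estimates and reference values for one window, in ppb.
x, y = target_score([12.0, 18.0, 33.0, 41.0], [10.0, 20.0, 30.0, 40.0])
distance = math.hypot(x, y)  # equals RMSE / sigma_ref
```

A score with `distance` below 1 falls inside the unit circumference, i.e. the RMSE of the estimates is smaller than the standard deviation of the reference.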
Figure 6. Target plot for timeseries of 1 week rolling goodness-of-fit indexes of NO (a) and NO2 (b) estimate by SU009, in Zurich urban background.

The time series of 1 week rolling RMSE in Fig. 7 indicate an overall lower performance in the estimate of NO, most likely due to its larger variability, and a steadier trend for NO2. The RMSE for MLR is, on most occasions, the largest among the three algorithms, while SVR and RF performed similarly. The lowest variation in RMSE, within a range of 2 ppb, was observed for NO estimates by RF on SU010 data, while an increase of up to 6 ppb occurred in the case of NO2 predictions by MLR on SU010 readings. In some cases the increasing trend in RMSE is evident, e.g. for NO2 by SU009; in others the large variability hinders a clear assessment of the status of the SU, e.g. for NO2 by SU012.

The sensing devices (i.e. the sensor units and their individual calibration) were investigated also in terms of their drift, uncertainty, bias, noise and ability to resolve spatial differences in pollution levels: for better comparison with common regulatory measurements, all these analyses were performed using 1 h average input data, instead of 10 min as for the previous ones. Nonetheless the use of 10 min data delivered similar and consistent results (not shown). As a proxy for the overall drift in the estimate by sensor devices, the time series of mean daily residuals was computed: results in Fig. 8 confirm the occurrence of a drift in all cases, although only occasionally with a clear trend, and among algorithms RF generally outperforms both SVR and MLR, achieving an absolute variation in the residuals between 5 and 10 ppb after 4 months of deployment. As a specific proxy for zero-drift we used the SU estimates coupled to reference instruments


Figure 7. Timeseries of 1 week rolling RMSE for 10 min data of NO (a) and NO2 (b) at the deployment sites.

Figure 8. Time series of mean daily residuals for NO and NO2 estimates, from 1 h average data. Smooth lines from locally weighted polynomial regression, by loess function in R, were added.
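The two diagnostics shown in Figs. 7 and 8, a 1 week rolling RMSE and the mean daily residuals, could be computed along the following lines. The function names, the trailing-window convention and the sample series are assumptions made for illustration; the loess smoothing of Fig. 8 is not reproduced.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def rolling_rmse(times, pred, ref, window=timedelta(weeks=1)):
    """RMSE over the trailing `window` ending at each (sorted) timestamp."""
    out, start, sq_sum = [], 0, 0.0
    for i, t in enumerate(times):
        sq_sum += (pred[i] - ref[i]) ** 2
        # Drop samples that are one window or more older than t.
        while times[start] <= t - window:
            sq_sum -= (pred[start] - ref[start]) ** 2
            start += 1
        out.append(math.sqrt(max(sq_sum, 0.0) / (i - start + 1)))
    return out


def mean_daily_residuals(times, pred, ref):
    """Mean of (estimate - reference) for each calendar day."""
    by_day = defaultdict(list)
    for t, p, r in zip(times, pred, ref):
        by_day[t.date()].append(p - r)
    return {day: sum(v) / len(v) for day, v in by_day.items()}


# Hypothetical series: one sample per day, residual 1 ppb then 3 ppb.
times = [datetime(2017, 1, 1) + timedelta(days=d) for d in range(10)]
ref = [20.0] * 10
pred = [21.0] * 7 + [23.0] * 3
rmse_series = rolling_rmse(times, pred, ref)
daily = mean_daily_residuals(times, pred, ref)
```

In this toy series the rolling RMSE rises once the larger residuals enter the window, which is the kind of signature that would point to a drifting device.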

measurements < 0.5 ppb: this analysis, not possible for NO2 due to its low statistics of quasi-null values, confirms the better performance of the two machine learning algorithms and hints at a zero-drift of ∼ −10 ppb or ∼ +2 ppb in the worst and in the best case, respectively. The values of these proxies for the overall drift and the zero-drift are consistent with the results for these same EC sensors by Kim et al. (2018), who reported an absolute zero-drift (from laboratory measurements) of 2 and 16 ppb for NO and NO2 after 2.5 months of field deployment.

The uncertainty of the devices was computed as relative expanded uncertainty according to the guidelines for the data quality objective required by the directive 2008/50/EC (WG, 2010) and compared both to the expanded uncertainty of the reference instrument (EMPA, 2016) and to the 25 % recommendation for indicative measurements by the same directive, as a reasonable threshold required for the detection of pollution gradients within urban areas, i.e. for a possible application of these devices. Results show some variability between the two deployment sites, with highest uncertainty for NO in Lausanne (traffic site). According to this procedure, the calculated relative expanded measurement uncertainties of the SUs are within 25 % for mixing ratios larger than about 15–20 ppb for both NO and NO2. Calibration models based on RF have generally the lowest uncertainty among the three algorithms (Figs. 9 and S26 in the Supplement).

Figure 9. Comparison of expanded relative uncertainty and reference NO and NO2 concentration for the SU009, as deployed at the urban background site Zurich-Kaserne, using 1 h average data.

A further assessment of the uncertainty of these devices at the deployment sites was obtained by binning reference concentration in 1 ppb intervals and estimating for each bin the corresponding 5th–95th quantile range of the predictions, along with the median. The quantile range was calculated only for the RF estimates using 1 h input data and if at least 10 values were available. Results are shown in Fig. S27 and include the 1 : 1 line along with its 25 and 35 % uncertainty intervals. In these figures the bottom shortest rug (red coloured) indicates whether the median is included in the 25 % uncertainty bounds. The rug in green (blue) indicates if the 5–95 % percentile range is included in the 35 % (25 %) uncertainty range. The estimate by the sensor units is linear over a broad range of NO and NO2, with a fairly constant 5–95 % percentile range in most cases, except for NO in Lausanne (traffic site), hinting at a fairly steady precision for these devices. The bias for the median is in the order of several ppb over large parts of the concentration range for both pollutants and most of the SUs.

The lowest concentration correctly estimated on 90 % of occurrences with a specified uncertainty is again dependent on the SU, on the site and on the pollutant: at the urban background site (Zurich) this lowest concentration is provided by SU010 and results in ∼ 15 ppb (∼ 20 ppb) for NO (NO2) and this is also the best result across all 4 devices. At the urban traffic site (Lausanne) the lowest concentration correctly estimated (on 90 % of occurrences and with a 25 % uncertainty) is ∼ 50 ppb (by SU012) and ∼ 30 ppb (by SU011) for NO and NO2 respectively; these latter thresholds reduce to ∼ 15 ppb for both pollutants if a 35 % uncertainty is considered.

The potential benefit of using eight EC sensors in the same RF model was tested by combining the data of the two SUs deployed at the same site into the same RF model. Results for Zurich (combining SU009 with SU010) and Lausanne (combining SU011 with SU012) lead to figures similar to the best performing SU at the respective site, i.e. did not lead to a decrease in uncertainty, suggesting that the latter has a more fundamental constraint, either from the calibration approach or from the EC sensors and the measurement system themselves. Nonetheless, the combined use of the two SUs led to a slight improvement in the overall goodness-of-fit indexes, with a decrease of the RMSE of ∼ 0.5 ppb (see Table S3).

The overall sensor noise for each bin was computed as the 2σ of the RF estimate, if at least 10 estimates were available for the bin. The median of this 2σ noise ranged in ±4–7 ppb and in ±5–8 ppb for NO and NO2, i.e. only 1–2 ppb larger than the noise observed by Kim et al. (2018) under laboratory conditions on 10 s data, and half of the 2σ noise reported by the EC sensor manufacturer.

Finally, we were interested in whether the tested sensor units would be appropriate for a targeted application, i.e. for resolving the intra-urban concentration gradient with hourly resolution. Assume that sensor units are deployed in the same urban environment at two distant sites A and B, where A is typically less polluted (urban background site) compared to site B (site impacted by nearby sources such as road traffic). For this assessment, the data from all four SUs have been pooled in order to account for different performance of individual sensor units, and similarly to the previous uncertainty assessments, only RF estimates using 1 h input data were


[Figure 10 panels. Left: intra-urban ΔNO between site B and site A (ppb) vs. NO concentration at site A (ppb). Right: intra-urban ΔNO2 between site B and site A (ppb) vs. NO2 concentration at site A (ppb). Colour scale: probability of resolving the difference in concentration between site B and site A (0 to 1).]

Numbered cities: 1. Barcelona; 2. Bologna; 3. Brussels; 4. Budapest; 5. Hamburg; 6. Krakow; 7. London; 8. Madrid; 9. Manchester; 10. Marseille; 11. Milan; 12. Modena; 13. Munich; 14. Paris; 15. Prague; 16. Reggio Emilia; 17. Rome; 18. Turin; 19. Vienna; 20. Zürich.

Figure 10. Probability of resolving spatial intra-urban difference in NO and NO2 between site B and site A, with the latter exposed to
lower concentrations. Red dots indicate the concentration difference between site B and A that can be detected with a probability of 90 %.
Numbered dot coordinates indicate pollution condition for 20 European cities: the x coordinate is the urban median concentration and the
y coordinate is the median intra-urban gradient for hourly concentration data by the air quality monitoring sites within that same urban area.
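The probability mapped in Fig. 10 can be approximated by pairing the sensor estimates that fall in two reference bins and counting how often the estimate for the more polluted bin is the larger one. The sketch below is a simplified reading of that procedure: the function names, the requirement of at least 10 estimates in both bins and the sample data are assumptions.

```python
import math
from collections import defaultdict
from itertools import product


def bin_estimates(reference, estimates, width=1.0):
    """Group sensor estimates by the reference bin (in ppb) they belong to."""
    bins = defaultdict(list)
    for ref, est in zip(reference, estimates):
        bins[math.floor(ref / width)].append(est)
    return bins


def prob_resolved(bins, conc_a, delta, min_count=10, width=1.0):
    """Fraction of estimate pairs for which site B reads higher than site A."""
    est_a = bins.get(math.floor(conc_a / width), [])
    est_b = bins.get(math.floor((conc_a + delta) / width), [])
    # Assumption: both bins need at least `min_count` estimates.
    if len(est_a) < min_count or len(est_b) < min_count:
        return None
    wins = sum(1 for a, b in product(est_a, est_b) if b - a > 0)
    return wins / (len(est_a) * len(est_b))


# Hypothetical pooled data: ten estimates around 10.5 ppb, ten around 15.5 ppb.
reference = [10.5] * 10 + [15.5] * 10
estimates = [float(v) for v in range(8, 18)] + [float(v) for v in range(13, 23)]
bins = bin_estimates(reference, estimates)
p_5ppb = prob_resolved(bins, 10.5, 5.0)
p_missing = prob_resolved(bins, 10.5, 20.0)
```

Here a 5 ppb difference is resolved for 85 % of the pairs, below the 90 % level marked by the red dots in Fig. 10, while a bin without enough estimates returns no probability at all.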

used. Next, the concentrations measured by the reference instruments were binned in 1 ppb intervals and denoted reference bins. The corresponding sensor measurements were then linked to the reference bins. Any concentration difference between sites B and A can now be simulated by the reference bins, and the probability distribution of the concentration difference as measured by the tested sensors can be expressed by the concentration differences of the sensor measurements assigned to the corresponding reference bins. Integrating the sample probability distribution of the concentration difference over values larger than zero provided the probability that the concentration gradient between site B and A is resolved by two different SUs. This probability was computed if at least 10 estimates were available for both site A and site B. Figure 10 shows the probability that, for a given reference concentration at site A and its difference in concentration with site B, the measurements by a SU at site B are larger than measurements by a SU at site A. In Fig. 10 red dots indicate the concentration difference between site B and A that can be detected with a probability of 90 %. Figure 10 highlights how the possibility of resolving the gradient depends both on the gradient amount and on the concentration at site A, besides some influence by the sample size, as evident by the lower chance of resolving differences at higher (and less frequent) levels. Generally gradients in NO above ∼ 5 to ∼ 10 ppb are likely to be captured by these devices, while for NO2 a gradient of almost 10 ppb is needed.

These results were compared to the hourly gradient in a pool of European cities, including several sites in the Po valley, a NOx hotspot for Europe. The data used come from 2 years of regulatory measurements at reference monitoring sites: data for years 2016 and 2017 were used for Italy and delivered directly by the local Environmental Agencies, data for years 2015 and 2016 were used for the other cities and provided by the Air Quality e-Reporting (European Environment Agency, 2017) (boxplots summarising this dataset are found in Figs. S28 and S29 in the Supplement). For each city, the intra-urban gradient was computed as the maximum hourly difference between traffic and background urban sites within the same urban area; when more than two reference sites were available, the pair of sites showing the largest concentration difference was selected. In Fig. 10 the ordinate of each city indicates its intra-urban gradient, while the abscissa expresses its median over the analysed period.

As a final step, the uncertainty in RF estimates was tentatively estimated by using experimental Quantile Random Forest Regression (QRF) and Confidence Interval estimates (RF-CI). Results for QRF (band including 5th to 95th quantiles) show that ca. 80 % of reference values are within the QRF band for both NO and NO2. On the contrary, confidence bands by RF-CI, containing ca. 20 % of the predictions, appear excessively narrow, although the mean prediction still indicates a


Figure 11. Comparison of QRF and RF-CI estimates of NO (a) and NO2 (b) by SU009 with observations by reference instruments, at the
urban background site Zurich-Kaserne. The grey shaded area indicates either the 5–95 % quantiles band (QRF case) or the 95 % confidence
interval (RF-CI case). 1 : 1 red dashed line is added in the scatterplots.
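How well a 5–95 % band brackets the reference values can be checked with a simple coverage count. The sketch below is not a quantile random forest: `quantile_band` merely takes the 5th and 95th percentiles of a hypothetical sample of candidate values, and `band_coverage` reports the fraction of reference values inside given bounds. All names and numbers are invented for illustration.

```python
from statistics import quantiles


def quantile_band(samples):
    """5th and 95th percentiles of a sample (cut points of 20 equal groups)."""
    cuts = quantiles(samples, n=20)
    return cuts[0], cuts[-1]


def band_coverage(reference, lower, upper):
    """Fraction of reference values lying inside [lower, upper] at each step."""
    inside = sum(
        1 for r, lo, hi in zip(reference, lower, upper) if lo <= r <= hi
    )
    return inside / len(reference)


# Hypothetical candidate values for one prediction, in ppb.
low, high = quantile_band(list(range(1, 101)))

# Hypothetical reference values and band limits for five hours, in ppb.
coverage = band_coverage(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    [8.0, 21.0, 25.0, 35.0, 60.0],
    [12.0, 25.0, 35.0, 45.0, 70.0],
)
```

A well-calibrated 5–95 % band would be expected to contain roughly 90 % of the reference values, so coverage figures far below that suggest a band that is too narrow.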


good performance for this model (Figs. 11, S30, S31 and S32 in the Supplement).

4 Conclusions

Four sensor units (SU) using low cost electrochemical sensors (EC) were tested with three calibration approaches. The study simulates a possible realistic application of these devices and consisted of field-calibrating the units at a single air quality monitoring site and subsequently deploying the units at two distant air quality monitoring sites. This procedure added relocation to the other well documented sources of uncertainty by low cost sensors (e.g. stability, cross-sensitivity and reproducibility), involving further possible errors generated by differences in pollution levels and environmental conditions between the calibration and deployment site, and between the calibration and the deployment period. Within this realistic framework the performance of three state-of-the-art calibration algorithms was tested: Multivariate Linear Regression (MLR), Support Vector Regression (SVR) and Random Forest (RF). For each SU and for each algorithm, the overall performance and its change over time were estimated according to several metrics. Drift, uncertainty, bias and noise were assessed, along with the probability to resolve spatial concentration differences by using these SUs, still within the same realistic framework.

Each unit hosted two EC sensors for each of the two monitored pollutants (NO and NO2), resulting in several possible covariate combinations for the regression models. For all three algorithms the model fully exploiting the replicate EC sensors performed best, with RF proving the most successful algorithm. MLR achieved the worst performance according to all goodness-of-fit indexes, along with a large drift over time, which is not surprising given the large autocorrelation in its residuals, indicating that important information from the input data was not included in the regression model. SVR overall performance is comparable to, or occasionally better than, that of RF throughout the deployment period; however, the tuning of its parameters is computer intensive and the algorithm exhibited a tendency to overfit (as shown by the occasional lack of monotonicity in its partial plots), discouraging its use in a realistic production application, potentially involving several sensor units.

The lowest correctly estimated concentration proved mainly dependent on the SU, on the pollutant and on the algorithm: best results for this study indicate 15–20 ppb for both NO and NO2, if an expanded uncertainty of 25 % is considered. RMSE ranged between 3 and 7 ppb, drift was a few ppb larger and the 2σ noise showed figures similar to RMSE. When calibrated, the sensors proved capable of detecting concentration differences of about 5–10 ppb for NO and 8–10 ppb for NO2, depending on the urban background level.

It is worth noting how the performance of the three algorithms is strongly dependent on the comparability between the calibration and the deployment space: the more similar these spaces are, the better the performance of the measuring device will be in case of field calibration. Standard RF is not able to extrapolate out-of-sample, as clearly shown e.g. by the steady NO prediction corresponding to observations larger than 100 ppb (Fig. S19 in the Supplement): notwithstanding the remarkable performance achieved by this algorithm, this feature of RF on one side represents a main limitation, on the other it confines the estimates within the calibration space and helps to identify possible misalignments between the calibration and the deployment spaces. Finally, although the use of a confidence band in the estimates by low cost sensors should be recommended, in the present study confidence bands for RF proved too experimental to be used for application studies.

On a broader view, these results recommend investigating whether these sensors are fit for the intended purpose and the intended environment, prior to their use. Given the performance of these devices, they proved unsuitable for cleaner urban areas (e.g. in background mountain locations) and unsuitable to reliably map small intra-urban gradients; nonetheless they also showed a large potential for time-resolved monitoring of NO and NO2 in medium-to-high polluted areas and for quantitatively resolving intra-urban concentration gradients on an hourly basis in higher polluted and larger cities. Targeted QA/QC protocols for the management of this class of sensors and/or of a network of sensors need to be implemented for achieving the best and constant data quality during medium to long-term deployment.

Data availability. All raw data can be provided by the authors upon request.


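Appendix A

Equation of the minimal model for Multivariate Linear Regression; only EC sensor A for the target pollutant is used.

\begin{align}
\mathrm{NO} &= \beta_0 + \beta_1 V_{\mathrm{NO}}^{A} + \beta_2 T + \beta_3 \mathrm{RH} + \beta_4 V_{\mathrm{NO}}^{A} \times T + \varepsilon \nonumber \\
\mathrm{NO_2} &= \beta_0 + \beta_1 V_{\mathrm{NO_2}}^{A} + \beta_2 T + \beta_3 \mathrm{RH} + \beta_4 V_{\mathrm{NO_2}}^{A} \times T + \varepsilon \tag{A1}
\end{align}

Equation of the minimal model for Support Vector Regression and Random Forest; only EC sensor A for the target pollutant is used.

\begin{align}
\mathrm{NO} &= \mathrm{function}(V_{\mathrm{NO}}^{A}, T, \mathrm{RH}) \nonumber \\
\mathrm{NO_2} &= \mathrm{function}(V_{\mathrm{NO_2}}^{A}, T, \mathrm{RH}) \tag{A2}
\end{align}

Equation of the basic model for Multivariate Linear Regression; EC sensors A both for NO and NO2 are used.

\begin{align}
\mathrm{NO} &= \beta_0 + \beta_1 V_{\mathrm{NO}}^{A} + \beta_2 V_{\mathrm{NO_2}}^{A} + \beta_3 T + \beta_4 \mathrm{RH} + \beta_5 V_{\mathrm{NO}}^{A} \times T + \varepsilon \nonumber \\
\mathrm{NO_2} &= \beta_0 + \beta_1 V_{\mathrm{NO}}^{A} + \beta_2 V_{\mathrm{NO_2}}^{A} + \beta_3 T + \beta_4 \mathrm{RH} + \beta_5 V_{\mathrm{NO_2}}^{A} \times T + \varepsilon \tag{A3}
\end{align}

Equation of the basic model for Support Vector Regression and Random Forest; EC sensors A both for NO and NO2 are used.

\begin{align}
\mathrm{NO} &= \mathrm{function}(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO_2}}^{A}, T, \mathrm{RH}) \nonumber \\
\mathrm{NO_2} &= \mathrm{function}(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO_2}}^{A}, T, \mathrm{RH}) \tag{A4}
\end{align}

Equation of the single replicate model for Multivariate Linear Regression; V_NO indicates the mean net voltage produced by the twin EC sensors for NO, V_NO2 indicates the net voltage produced by the two EC sensors for NO2.

\begin{align}
\mathrm{NO} &= \beta_0 + \beta_1 V_{\mathrm{NO}} + \beta_2 T + \beta_3 \mathrm{RH} + \beta_4 V_{\mathrm{NO}} \times T + \varepsilon \nonumber \\
\mathrm{NO_2} &= \beta_0 + \beta_1 V_{\mathrm{NO_2}} + \beta_2 T + \beta_3 \mathrm{RH} + \beta_4 V_{\mathrm{NO_2}} \times T + \varepsilon \tag{A5}
\end{align}

Equation of the single replicate model for Support Vector Regression and Random Forest; either EC sensor A for NO and EC sensor A for NO2 are used.

\begin{align}
\mathrm{NO} &= \mathrm{function}(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO_2}}^{A}, T, \mathrm{RH}) \nonumber \\
\mathrm{NO_2} &= \mathrm{function}(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO_2}}^{A}, T, \mathrm{RH}) \tag{A6}
\end{align}

Equation of the double replicate and final model for Multivariate Linear Regression; V_NO indicates the mean net voltage produced by the twin EC sensors for NO, V_NO2 indicates the net voltage produced by the two EC sensors for NO2.

\begin{align}
\mathrm{NO} &= \beta_0 + \beta_1 V_{\mathrm{NO}} + \beta_2 V_{\mathrm{NO_2}} + \beta_3 T + \beta_4 \mathrm{RH} + \beta_5 V_{\mathrm{NO}} \times T + \varepsilon \nonumber \\
\mathrm{NO_2} &= \beta_0 + \beta_1 V_{\mathrm{NO}} + \beta_2 V_{\mathrm{NO_2}} + \beta_3 T + \beta_4 \mathrm{RH} + \beta_5 V_{\mathrm{NO_2}} \times T + \varepsilon \tag{A7}
\end{align}

Equation of the double replicate and final model for Support Vector Regression and Random Forest; EC sensors A and B for both NO and NO2 are used.

\begin{align}
\mathrm{NO} &= \mathrm{function}(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO}}^{B}, V_{\mathrm{NO_2}}^{A}, V_{\mathrm{NO_2}}^{B}, T, \mathrm{RH}) \nonumber \\
\mathrm{NO_2} &= \mathrm{function}(V_{\mathrm{NO}}^{A}, V_{\mathrm{NO}}^{B}, V_{\mathrm{NO_2}}^{A}, V_{\mathrm{NO_2}}^{B}, T, \mathrm{RH}) \tag{A8}
\end{align}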
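As an illustration of how the final MLR model of Eq. (A7) turns sensor readings into a concentration, the sketch below evaluates both equations for one record. The coefficient values are hypothetical placeholders, not fitted values from the study, and the function names are introduced here only for the example.

```python
def mean_net_voltage(v_a, v_b):
    """Mean net voltage (mV) of the two replicate EC sensors for one gas."""
    return 0.5 * (v_a + v_b)


def mlr_final_no(beta, v_no, v_no2, t, rh):
    """NO estimate following Eq. (A7): interaction term is V_NO x T."""
    b0, b1, b2, b3, b4, b5 = beta
    return b0 + b1 * v_no + b2 * v_no2 + b3 * t + b4 * rh + b5 * v_no * t


def mlr_final_no2(beta, v_no, v_no2, t, rh):
    """NO2 estimate following Eq. (A7): interaction term is V_NO2 x T."""
    b0, b1, b2, b3, b4, b5 = beta
    return b0 + b1 * v_no + b2 * v_no2 + b3 * t + b4 * rh + b5 * v_no2 * t


# Hypothetical coefficients and one hypothetical record.
beta = (1.0, 0.5, 0.1, 0.2, 0.01, 0.001)
v_no = mean_net_voltage(38.0, 42.0)   # mV
v_no2 = mean_net_voltage(4.0, 6.0)    # mV
no_ppb = mlr_final_no(beta, v_no, v_no2, t=20.0, rh=50.0)
no2_ppb = mlr_final_no2(beta, v_no, v_no2, t=20.0, rh=50.0)
```

In practice each pollutant would have its own fitted coefficient set; a single `beta` is reused here only to keep the example short.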


The Supplement related to this article is available online at https://doi.org/10.5194/amt-11-3717-2018-supplement.

Competing interests. The sensor units have been jointly developed by Decentlab and Empa. The authors declare that they have no conflict of interest.

Acknowledgements. Alessandro Bigi was supported by the Swiss National Science Foundation International Short Visit Grant (IZK0Z2-174969). Stuart K. Grange was supported by Anthony Wild with the provision of the Wild Fund Scholarship. The authors thank the two anonymous referees for investing their time in reviewing this manuscript and for providing valuable comments to improve the manuscript.

Edited by: Francis Pope
Reviewed by: two anonymous referees

References

Alphasense Ltd: Alphasense 4-Electrode Individual Sensor Board (ISB), Great Notley, UK, 085-2217 edn., 2014.

Alphasense Ltd: Environmental changes: temperature, pressure, humidity, Tech. Rep. AAN 110, Great Notley, UK, available at: http://www.alphasense.com/WEB1213/wp-content/uploads/2013/07/AAN_110.pdf (last access: 20 June 2018), 2017.

Athey, S., Tibshirani, J., and Wager, S.: Generalized Random

De Vito, S., Piga, M., Martinotto, L., and Francia, G. D.: CO, NO2 and NOx urban pollution monitoring with on-field calibrated electronic nose by automatic bayesian regularization, Sens. Actuat. B-Chem., 143, 182–191, https://doi.org/10.1016/j.snb.2009.08.041, 2009.

De Vito, S., Esposito, E., Salvato, M., Popoola, O., Formisano, F., Jones, R., and Francia, G. D.: Calibrating chemical multisensory devices for real world applications: An in-depth comparison of quantitative machine learning approaches, Sens. Actuat. B-Chem., 255, 1191–1210, https://doi.org/10.1016/j.snb.2017.07.155, 2018.

EMPA: Technical report for the national monitoring network of atmospheric pollutants (NABEL), 2016 (in German), Tech. rep., EMPA, available at: https://www.empa.ch/documents/56101/246436/Technischer+Bericht+2016/0bc321a3-f489-4f20-bcda-a323fbc4ca8a (last access: 7 May 2018), 2016.

Esposito, E., De Vito, S., Salvato, M., Bright, V., Jones, R., and Popoola, O.: Dynamic neural network architectures for on field stochastic calibration of indicative low cost air quality sensing systems, Sens. Actuat. B-Chem., 231, 701–713, https://doi.org/10.1016/j.snb.2016.03.038, 2016.

Esposito, E., Salvato, M., De Vito, S., Fattoruso, G., Castell, N., Karatzas, K., and Di Francia, G.: Assessing the Relocation Robustness of on Field Calibrations for Air Quality Monitoring Devices, 303–312, Springer International Publishing, https://doi.org/10.1007/978-3-319-66802-4_38, 2018.

European Environment Agency: Eionet Central Data Repository, available at: http://cdr.eionet.europa.eu/ (last access: 20 June 2018), 2017.

Fonollosa, J., Sheik, S., Huerta, R., and Marco, S.: Reservoir computing compensates slow response of chemosensor
Forests, https://arxiv.org/pdf/1610.01271v3.pdf (last access: 25 arrays exposed to fast varying gas concentrations in con-
June 2018), 2017. tinuous monitoring, Sens. Actuat. B-Chem., 215, 618–629,
Baron, R. and Saffell, J.: Amperometric Gas Sensors as a Low https://doi.org/10.1016/j.snb.2015.03.028, 2015.
Cost Emerging Technology Platform for Air Quality Moni- Fonollosa, J., Fernández, L., Gutièrrez-Gálvez, A., Huerta,
toring Applications: A Review, ACS Sensors, 2, 1553–1566, R., and Marco, S.: Calibration transfer and drift coun-
https://doi.org/10.1021/acssensors.7b00620, 2017. teraction in chemical sensor arrays using Direct Stan-
Bischl, B., Lang, M., Kotthoff, L., Schiffner, J., Richter, J., dardization, Sens. Actuat. B-Chem., 236, 1044–1053,
Studerus, E., Casalicchio, G., and Jones, Z. M.: mlr: Machine https://doi.org/10.1016/j.snb.2016.05.089, 2016.
Learning in R, J. Machine Learn. Res., 17, 1–5, 2016. Ghermandi, G., Fabbi, S., Zaccanti, M., Bigi, A., and Teggi, S.:
Breiman, L.: Random Forests, Machine Learn., 45, 5–32, Micro–scale simulation of atmospheric emissions from power–
https://doi.org/10.1023/A:1010933404324, 2001. plant stacks in the Po Valley, Atmos. Pollut. Res., 6, 382–388,
Breiman, L., Friedman, J., Stone, C., and Olshen, R.: Classifica- https://doi.org/10.5094/APR.2015.042, 2015.
tion and Regression Trees, New York; London: Chapman & Hall, Hagan, D. H., Isaacman-VanWertz, G., Franklin, J. P., Wallace, L.
358 pp., 1993. M. M., Kocar, B. D., Heald, C. L., and Kroll, J. H.: Calibra-
Cawley, G. C. and Talbot, N. L. C.: On over-fitting in model selec- tion and assessment of electrochemical air quality sensors by co-
tion and subsequent selection bias in performance evaluation, J. location with regulatory-grade instruments, Atmos. Meas. Tech.,
Machine Learn. Res., 11, 2079–2107, 2010. 11, 315–328, https://doi.org/10.5194/amt-11-315-2018, 2018.
Council of Europe: Directive 2008/50/EC of the European Parlia- Hueglin, C., Buchmann, B., and Weber, R.: Long-term ob-
ment and of the Council of 21 May 2008 on ambient air qual- servation of real-world road traffic emission factors on a
ity and cleaner air for Europe, Official Journal of the Euro- motorway in Switzerland, Atmos. Environ., 40, 3696–3709,
pean Union, Official Journal of the European Union, L152/1– https://doi.org/10.1016/j.atmosenv.2006.03.020, 2006.
L152/144, 2008. Karatzoglou, A., Smola, A., Hornik, K., and Zeileis, A.: kernlab -
Cross, E. S., Williams, L. R., Lewis, D. K., Magoon, G. R., Onasch, An S4 Package for Kernel Methods in R, J. Stat. Softw., 11, 1–20,
T. B., Kaminsky, M. L., Worsnop, D. R., and Jayne, J. T.: Use https://doi.org/10.18637/jss.v011.i09, 2004.
of electrochemical sensors for measurement of air pollution: cor- Kim, J., Shusterman, A. A., Lieschke, K. J., Newman, C.,
recting interference response and validating measurements, At- and Cohen, R. C.: The Berkeley Atmospheric CO2 Ob-
mos. Meas. Tech., 10, 3575–3588, https://doi.org/10.5194/amt- servation Network: field calibration and evaluation of low-
10-3575-2017, 2017.

Atmos. Meas. Tech., 11, 3717–3735, 2018 www.atmos-meas-tech.net/11/3717/2018/


A. Bigi et al.: Low cost sensors in a real world application 3735

cost air quality sensors, Atmos. Meas. Tech., 11, 1937–1946, Sexton, J. and Laake, P.: Standard Errors for Bagged and Ran-
https://doi.org/10.5194/amt-11-1937-2018, 2018. dom Forest Estimators, Comput. Stat. Data Anal., 53, 801–811,
Lewis, A. and Edwards, P.: Validate personal air-pollution sensors, https://doi.org/10.1016/j.csda.2008.08.007, 2009.
Nature, 535, 29–31, https://doi.org/10.1038/535029a, 2016. Smola, A. J. and Schölkopf, B.: A tutorial on sup-
Mead, M., Popoola, O., Stewart, G., Landshoff, P., Calleja, M., port vector regression, Stat. Comput., 14, 199–222,
Hayes, M., Baldovi, J., McLeod, M., Hodgson, T., Dicks, J., https://doi.org/10.1023/B:STCO.0000035301.49549.88, 2004.
Lewis, A., Cohen, J., Baron, R., Saffell, J., and Jones, R.: The Spinelle, L., Aleixandre, M., and Gerboles, M.: Protocol of evalu-
use of electrochemical sensors for monitoring urban air quality ation and calibration of low-cost gas sensors for the monitoring
in low-cost, high-density networks, Atmos. Environ., 70, 186– of air pollution, Technical report EUR 26112 EN, Joint Research
203, https://doi.org/10.1016/j.atmosenv.2012.11.060, 2013. Centre, 44 pp., https://doi.org/10.2788/9916, 2013.
Meinshausen, N.: Quantile Regression Forests, J. Machine Learn. Spinelle, L., Gerboles, M., Villani, M. G., Aleixandre, M., and
Res., 7, 983–999, 2006. Bonavitacola, F.: Field calibration of a cluster of low-cost
Mentch, L. and Hooker, G.: Quantifying Uncertainty in Random available sensors for air quality monitoring. Part A: Ozone
Forests via Confidence Intervals and Hypothesis Tests, J. Ma- and nitrogen dioxide, Sens. Actuat. B-Chem., 215, 249–257,
chine Learn. Res., 17, 1–41, 2016. https://doi.org/10.1016/j.snb.2015.03.031, 2015.
Mijling, B., Jiang, Q., de Jonge, D., and Bocconi, S.: Spinelle, L., Gerboles, M., Kotsev, A., and Signorini, M.:
Field calibration of electrochemical NO2 sensors in a cit- Evaluation of low-cost sensors for air pollution monitoring,
izen science context, Atmos. Meas. Tech., 11, 1297–1312, Technical report EUR 28601 EN, Joint Research Centre,
https://doi.org/10.5194/amt-11-1297-2018, 2018. https://doi.org/10.2760/548327, 2017.
Mueller, M., Hasenfratz, D., Saukh, O., Fierz, M., and Hueglin, United Nations: World Urbanization Prospects: The 2014 Revision,
C.: Statistical modelling of particle number concentration in Tech. Rep. ST/ESA/SER.A/366, Department of Economic and
Zurich at high spatio-temporal resolution utilizing data from Social Affairs, Population Division, 27 pp., 2015.
a mobile sensor network, Atmos. Environ., 126, 171–181, Wager, S., Hastie, T., and Efron, B.: Confidence Intervals for Ran-
https://doi.org/10.1016/j.atmosenv.2015.11.033, 2016. dom Forests: The Jackknife and the Infinitesimal Jackknife, J.
Mueller, M., Meyer, J., and Hueglin, C.: Design of an ozone and Machine Learn. Res., 15, 1625–1651, 2014.
nitrogen dioxide sensor unit and its long-term operation within WG, E.: Guide to the demonstration of equivalence of ambi-
a sensor network in the city of Zurich, Atmos. Meas. Tech., 10, ent air monitoring methods, Tech. rep., EC Working Group
3783–3799, https://doi.org/10.5194/amt-10-3783-2017, 2017. on Guidance for the Demonstration of Equivalence, avail-
Pang, X., Shaw, M. D., Lewis, A. C., Carpenter, L. J., and Batchel- able at: http://ec.europa.eu/environment/air/quality/legislation/
lier, T.: Electrochemical ozone sensors: A miniaturised alter- pdf/equivalence.pdf (last access: 20 June 2018), 2010.
native for ozone measurements in laboratory experiments and Zimmerman, N., Presto, A. A., Kumar, S. P. N., Gu, J., Hauryliuk,
air-quality monitoring, Sens. Actuat. B-Chem., 240, 829–837, A., Robinson, E. S., Robinson, A. L., and Subramanian, R.: A
https://doi.org/10.1016/j.snb.2016.09.020, 2017. machine learning calibration model using random forests to im-
R Core Team: R: A Language and Environment for Statistical Com- prove sensor performance for lower-cost air quality monitoring,
puting, R Foundation for Statistical Computing, Vienna, Aus- Atmos. Meas. Tech., 11, 291–313, https://doi.org/10.5194/amt-
tria, available at: https://www.R-project.org/ (last access: 20 June 11-291-2018, 2018.
2018), 2017.
Rai, A. C., Kumar, P., Pilla, F., Skouloudis, A. N., Sabatino,
S. D., Ratti, C., Yasar, A., and Rickerby, D.: End-user
perspective of low-cost sensors for outdoor air pol-
lution monitoring, Sci. Total Environ., 607, 691–705,
https://doi.org/10.1016/j.scitotenv.2017.06.266, 2017.

www.atmos-meas-tech.net/11/3717/2018/ Atmos. Meas. Tech., 11, 3717–3735, 2018

You might also like