Reputation-Aware Data Fusion and Malicious Participant Detection in Mobile Crowdsensing
Abstract—Mobile crowdsensing, an emerging sensing paradigm, promotes scalability and reduction in the deployment of specialized sensing devices for large-scale data collection in a decentralized fashion. However, its open structure allows malicious entities to interrupt a system by reporting fabricated or erroneous data, making trust evaluation a highly important issue in mobile crowdsensing applications. The goal of this research is to show that introducing a reputation system into the process of correlated sensor-based data fusion enhances the overall quality of the sensed data. To do so, we design a reputation-aware data fusion mechanism to ensure data integrity. We use the Gompertz function in our reputation method to rate the trustworthiness of the data reported by a crowdsensing participant. The proposed mechanism, on one hand, is capable of defending against a data corruption attack and identifying malicious or honest participants based on their reported data in real time. On the other hand, this mechanism yields more accurate data prediction in terms of lower data prediction error. We conducted experiments using two different real-world datasets. We compare our correlated data and reputation-aware data prediction (CDR) method with other popular methods, and the results show that our method incurs lower data prediction error.
Keywords: Big data analytics; anomaly detection; data fusion; mobile crowdsensing; spatial-temporal data analysis
I. INTRODUCTION
With the advent of better wireless technology and an increase in smartphone usage, a new mode of data collection named mobile crowdsensing (MCS) has emerged. Mobile crowdsensing has a number of practical applications: traffic monitoring, epidemic disease monitoring, reporting from disaster situations and environment monitoring [1], [2], [3]. For example, an environmental air quality sensing system was deployed on street sweeping vehicles to monitor air quality in San Francisco [4]. These applications are usually open to the public and receive sensor data from multiple participants. This reduces data sparsity at lower cost in comparison with traditional sensor networks. Despite these advantages, MCS’s people-centric architecture also admits more inaccurate and corrupted data [5]. Malicious participants can manipulate the MCS data collection process with ease. These entities can interrupt a system by reporting fabricated or erroneous data, making trust evaluation a highly important issue in MCS applications. Therefore, validating the accuracy of contributions is essential to ensure the reliability of the application system.
In this paper, we consider the data corruption attack behavior of a malicious participant. By malicious we mean a participant who sends incorrect data either intentionally or unintentionally. An unintentional error can arise because a participant carelessly performed the sensing task, or due to a sensor error. In contrast, a malicious participant can deliberately fabricate the sensed data to infiltrate the system. For example, in air quality monitoring, a malicious participant may hold the sensor beside a burning cigarette or place it over sand instead of exposing it to the air. Thus, the reported data will not represent the actual air quality. In the related contemporary works [6], [7], [8], [9], the authors did not consider the participants’ malicious behavior. Thus, these works were not able to distinguish the sensing data reported by malicious or careless users. This limitation of the existing works motivates us to design reputation-aware real-time data fusion algorithms for MCS to ensure data integrity. Our method can detect malicious participants and prevent them from infiltrating the system in real time.
We develop an online method for data quality prediction in MCS considering the heterogeneous trust level of the participants. We take into account spatio-temporal and inter sensor-category correlations. We consider the users who are willing to participate in sensing at the same time. The terms participant and node are used to denote a user with sensing capability.
We implement our Correlated Data and Reputation-aware Data Prediction (CDR) method on two real-world datasets [10], [11]. The sensing was performed for four days, and there are 289 taxi values in the first real dataset. The taxis move around different parts of Rome sensing temperature. The second data set consists of Beijing’s air quality data. One hundred and forty-nine taxis with four types of sensors collect PM2.5, PM1.0, NO2 and humidity data from Beijing during seven days.
The main contributions of this paper are as follows.
Authorized licensed use limited to: University of Pittsburgh. Downloaded on January 02,2021 at 20:34:54 UTC from IEEE Xplore. Restrictions apply.
• We propose a novel reputation-aware correlated sensor-based light-weight real-time data fusion and malicious participant detection mechanism for mobile crowdsensing data streams.
• Extensive experiments using two real-world data sets demonstrate the efficiency and accuracy of our data prediction mechanism over state-of-the-art techniques.
We organize our paper as follows. In section II, we discuss the related work. In section III, we discuss different modules of our overall system. In section IV, we discuss our performance evaluation. Finally, concluding remarks and future works are offered in section V.
II. RELATED WORK
In this section, we discuss the most pertinent works. Zhang et al. [12] proposed a data cleaning method for environmental sensing based on the incrementally adjusted reliability of individual sensors. As time advanced, they incrementally adjusted the reliability of each sensor depending on the sensing data accuracy. Trustworthiness has been considered as a measure of data quality estimation [13]. Huang et al. [14] showed that using a reputation framework helped to weed out non-colluding malicious attackers. Their reputation framework produced more accurate results than not using a reputation framework. However, the authors assumed that data is coming from every discrete block of space-time, which is not practical in real-world scenarios. In contrast, Peng et al. [15] used unsupervised learning for data quality estimation. Because this method works only after the collection of historical data from all the users, it is not an online method.
Nowadays, instead of traditional static wireless sensor networks, the sensing is distributed among a crowd of people. This brings heterogeneity into the sensor networks and makes the computation more complex. The most recent work on data quality estimation in mobile crowdsensing is by Liu et al. [16]. The authors addressed real-time data estimation in mobile crowdsensing and proposed a context-aware method for data quality estimation. The limitation of this work is that the authors considered the presence of exactly one mobile user at each point of interest (PoI). Kishino et al. [17] mounted sensor nodes on garbage trucks that drive around the city. Their motivation was to detect target events by analyzing vehicle-mounted sensor data streams. The authors used machine learning methods to do so. On the other hand, Koukoutsidis [7] proposed a sampling method named stratified sampling for calculating the mean temperature of a linear area. That work considered only the random waypoint mobility model for the movement of the sensing devices.
There are several works focusing on the cleaning of data streams. Most of the previous works on sensor data cleaning focused on the reduction of consumed energy. To achieve this reduction, the authors [18], [19], [20] tried to reduce the inter-node communication. In these works, it was assumed that sensor data are always aggregated during submission. There have been significant works on using compressive sensing for data reconstruction in static sensor networks [21], [22].
Recently, researchers [23], [24], [25], [26], [16], [27] have been designing frameworks to deal with big data services. In the past, the data size was not as big as it is today, which motivates researchers to design and develop scalable mechanisms to correct any kind of inaccuracy in data streams. Liu et al. [26] designed a framework for big data cleaning. This paper gives direction on how to achieve a reliable database in big data applications. They used context to find similarity between data items. Moreover, the authors exploited usage patterns to classify and group data items that are not related contextually. One of the challenging tasks in dealing with big data is to shrink the data size by filtering out the irrelevant subset. Dong et al. [28], in contrast, argued that having more data does not always provide more information. During data integration, proper selection of reliable sources among all available sources results in higher data accuracy.
Another line of literature focuses on finding outliers in sensor data streams. In order to find global outliers in the data, Branch et al. [18] proposed a distance-based ranking method. The other existing methods for finding outliers in sensor data are geometry-based [29], polygon-based spatial outlier detection [30], clustering-based [31], kernel density-based [8] and histogram approaches [32]. Bosman et al. [33] tried to answer the question of whether adding more neighbors makes the anomaly detection perform better. This paper considered static sensor nodes and varied the neighborhood size by changing the communication range of the sensors.
However, to the best of our knowledge, we are the first to develop a reputation-aware correlated sensor-based real-time data fusion and malicious participant detection mechanism for mobile crowdsensing data streams.
III. METHODOLOGY
In this section we first present an overview of the proposed mechanism, Correlated Data and Reputation-aware Data Prediction (CDR), then a detailed description of the components, and finally how we fit them together to create our full structure.
Fig. 1. Three-dimensional Tensor
A. Overview
CDR consists of two parts: a reputation calculation method and correlated data [6]. The reputation method considers two types of trust for each sensor, cooperation and reputation, and both parameters are calculated at the application server level. The reputation calculation method is applied to multiple types of sensor data streams. These varied sensors are correlated with each other. It is important for our mechanism to take the granularity of time and space into account. We discretized our time into epochs, and space into equal-sized grids. The framework is applied only on data from sensors within the same region and the same epoch. CDR is applied to each different type of data and then the final, discretized space-time blocks are used to produce a least-square regression on the target data type. This regression can be used to predict both future data and missing data. We borrow the concept of three-dimensional tensors shown in Fig. 1 from [6]. The authors considered temporal interpolation for the sparse regions. However, Kang et al. [6] assumed that all incoming data from sensors was accurate.
B. Cooperation
Cooperation scores of sensors are measured per epoch; they measure the proportion of the inverse square root error of the data from the sensor over the sum of the proportion of the inverse square root error from all sensors. For our cooperation parameter, we used an inverse proportion of the square root of the absolute error so as not to punish small deviations from the average as much. In the data sets we tested, temperature data and air quality data, small variations from the average are common. The equation for the cooperation score is shown in Eq. 1.
$$p_i = \frac{\dfrac{1}{\sqrt{|x_i - r|} \Big/ \sum_{m=1}^{n} \sqrt{|x_m - r|}}}{\sum_{j=1}^{n} \dfrac{1}{\sqrt{|x_j - r|} \Big/ \sum_{m=1}^{n} \sqrt{|x_m - r|}}} \tag{1}$$
Where $r$ is the robust average of the data in that epoch and $x_i$ is the measurement from sensor $i$. The robust average of the data provides an idea of where the data clusters, and this increases the accuracy of the data by assigning more weight to values that occur more frequently. We calculate the robust average using Eq. 2.
$$r = \sum_{i=1}^{n} p_i \, x_i \tag{2}$$
C. Reputation
Reputation scores are updated at the end of each epoch; they measure how accurate the crowdsensing participant has been over time. To calculate reputation from cooperation scores, first the cooperation scores are normalized [14] using Eq. 3. Here, $p_i$ is the cooperation score of participant $i$. $\min(p)$ and $\max(p)$ denote the minimum and maximum cooperation score among all the participants during that epoch. After normalization, the cooperation scores belong to the range [−1, 1].
$$p_i^{norm} = \frac{2\,(p_i - \min(p))}{\max(p) - \min(p)} - 1 \tag{3}$$
We want to maximize the impact of the most recent epochs and minimize the impact of the least recent ones. To make the aging effective, we age the normalized cooperation scores with Eq. 4.
$$p'_{i,k} = \sum_{k'=1}^{k} \lambda^{k-k'} \, p^{norm}_{i,k'} \tag{4}$$
Here, $k$ denotes the current epoch and $k'$ takes values from 1 to the current epoch. The aging parameter $\lambda$ takes a value in [0, 1]. Finally, reputation is calculated using the Gompertz function [14], shown in Eq. 5.
$$R_{i,k} = a\, e^{\,b\, e^{\,c\, p'_{i,k}}} \tag{5}$$
Here, $a$, $b$ and $c$ are function parameters. The parameter $a$ denotes the upper asymptote, the displacement along the x-axis is controlled by $b$, and the growth rate is controlled by the parameter $c$.
D. Full Structure
We discretize the space into regions and the time into epochs, then we run CDR on every discrete block of space-time.
First we run an Expectation Maximization Algorithm (EM), shown in Algorithm 1, on the “reputable” sensors. To be classified as a reputable sensor, the participant must have a reputation higher than the threshold. This threshold is application dependent. Initially, all sensors are classified as reputable with equal cooperation score.
Algorithm 1 Expectation Maximization on Cooperation Scores for Robust Average
Input: Robust Average ($r$), Cooperation Scores ($p_i$)
Output: Robust Average ($r$)
Initialize: all $p_i$ to $1/n$, where $n$ is the number of sensors, and $l = 0$, where $l$ is the iteration
while $p_i^{l}$ and $p_i^{l+1}$ don’t converge do
    Compute $r^{l+1}$ from $p_i^{l}$’s using Eq. 2
    Compute $p_i^{l+1}$’s from $r^{l}$ using Eq. 1
    $l = l + 1$
end
return $r^{l+1}$
After running the EM algorithm once on only the reputable sensors, we then check the reported values from “disreputable” sensors, or sensors with a reputation lower than the threshold. If the reported value from any of these sensors is within an acceptable error range of the robust average calculated from the reputable sensors’ reported data, then it is added as a faux reputable sensor in that block of space-time. After
finding all the sensors from the set of disreputable sensors
that contributed acceptable data in the block of space-time,
EM is then run again on the new set of reputable sensors.
The reason that we run EM twice is to provide sensors in
the disreputable set a chance to move into the reputable set
if they consistently contribute accurate data, because only
sensors with a cooperation score for the epoch will have their
reputations updated. The second EM run gives a new robust
average as well as updated cooperation scores for each sensor.
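The two EM passes described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the small constant `eps` that guards against division by zero, the convergence tolerance `tol`, and the relative acceptance band `accept` are all hypothetical, and the sketch recomputes the cooperation scores from the freshly updated average in each iteration.

```python
import numpy as np

def cooperation_scores(x, r, eps=1e-9):
    """Eq. 1: inverse square-root-error proportions, normalized to sum to 1."""
    d = np.sqrt(np.abs(x - r)) + eps   # eps is an assumed guard against division by zero
    inv = 1.0 / (d / d.sum())          # inverse of each sensor's share of the error
    return inv / inv.sum()

def robust_average(x, tol=1e-6, max_iter=100):
    """Alternate Eq. 2 and Eq. 1 until the scores stop changing (tol is assumed)."""
    x = np.asarray(x, dtype=float)
    p = np.full(x.size, 1.0 / x.size)  # equal initial cooperation scores
    for _ in range(max_iter):
        r = float(np.dot(p, x))        # Eq. 2: score-weighted average
        p_new = cooperation_scores(x, r)
        done = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if done:
            break
    return float(np.dot(p, x)), p

def two_pass_em(x, reputation, threshold, accept=0.1):
    """First pass on reputable sensors, second pass after admitting close reports."""
    x = np.asarray(x, dtype=float)
    reputable = np.asarray(reputation, dtype=float) > threshold
    r_first, _ = robust_average(x[reputable])
    # Assumed acceptance rule: within a relative band around the first-pass average.
    close = np.abs(x - r_first) <= accept * abs(r_first)
    used = reputable | close           # reputable plus "faux reputable" sensors
    r_second, p = robust_average(x[used])
    return r_second, used, p

# Made-up readings: the fourth sensor reports an outlier, the fifth has a low
# reputation but reports a plausible value.
readings = [20.1, 19.9, 20.0, 35.0, 20.2]
reputation = [0.9, 0.9, 0.9, 0.1, 0.1]
r, used, p = two_pass_em(readings, reputation, threshold=0.5)
print(r, used.tolist())
```

In this toy block the low-reputation sensor with a plausible reading is admitted to the second pass, while the outlier is left out and therefore receives no cooperation score for the epoch.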
The new cooperation scores are then normalized to the range
[−1, 1] using Eq. 3. The normalized cooperation scores are
then aged based on their cooperation rating. Sensors with a
cooperation score above a certain threshold are labeled as
“cooperative” and sensors with a cooperation score below
that threshold are labeled as “uncooperative”. Depending on
the sensor’s classification for the latest block, the normalized
cooperation is multiplied by a different aging parameter, λ.
Cooperative sensors are multiplied by a lower aging parameter
than uncooperative sensors. This means that the growth and
decay rates of reputation will be different; the decay rate will
be higher, and this provides higher punishment for bad data
and thus helps quickly detect malicious users. Finally, the aged
cooperation score is fed into Eq. 5.
Fig. 2. Prediction results for test set 1: out of 612 predictions, CDR performed better in 466 and was within 5% of the true value in 290 cases
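The reputation update of Eqs. 3 to 5 can be sketched as below. This is only an illustration: the two aging parameters, the classification threshold applied to the normalized score, the handling of all-equal scores, and the Gompertz parameters `a`, `b`, `c` are assumed values chosen for the example, not values reported in this paper.

```python
import math

def normalize(p):
    """Eq. 3: rescale one epoch's cooperation scores to [-1, 1]."""
    lo, hi = min(p), max(p)
    if hi == lo:                       # assumed convention: all-equal scores map to 0
        return [0.0 for _ in p]
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in p]

def aged_score(history, lam_coop=0.7, lam_uncoop=0.9, threshold=0.0):
    """Eq. 4 with the aging parameter chosen per epoch from the classification.

    history holds one sensor's normalized cooperation scores, oldest first.
    Cooperative epochs use the lower parameter, so good behavior fades faster
    than bad behavior (lam_coop, lam_uncoop and threshold are assumptions).
    """
    k = len(history)
    total = 0.0
    for k_prime, p_norm in enumerate(history, start=1):
        lam = lam_coop if p_norm >= threshold else lam_uncoop
        total += lam ** (k - k_prime) * p_norm
    return total

def gompertz(aged, a=1.0, b=-2.5, c=-0.85):
    """Eq. 5: R = a * exp(b * exp(c * aged)); a, b, c are illustrative values."""
    return a * math.exp(b * math.exp(c * aged))

honest = [1.0, 1.0, 1.0]        # three consecutive fully cooperative epochs
malicious = [-1.0, -1.0, -1.0]  # three consecutive uncooperative epochs
print(gompertz(aged_score(honest)), gompertz(aged_score(malicious)))
```

With these assumed parameters, three uncooperative epochs accumulate a larger aged magnitude than three cooperative ones, which mirrors the asymmetric growth and decay described in the text.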
Once all the blocks are processed for each data type, then
we use the processed data to create a least-square fit with the
non-target data as the coefficient matrix, A, and the target data
type as the dependent matrix, b, as shown in Eq. 6.
$$A\hat{x} = b \tag{6}$$
The regression, x̂, is then used to predict the target value given
knowledge of all the other data values.
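A least-squares fit of this form can be sketched with NumPy. The numbers below are synthetic and exist only to make the example runnable; the column meanings, the coefficients in `true_x`, and the helper name `predict` are assumptions. No intercept term is included because Eq. 6 does not show one.

```python
import numpy as np

# Synthetic, made-up values: one row per processed space-time block, one column
# per non-target data type (for example PM1.0, NO2 and humidity).
A = np.array([[10.0, 40.0, 60.0],
              [12.0, 35.0, 55.0],
              [15.0, 50.0, 70.0],
              [ 9.0, 30.0, 65.0]])
true_x = np.array([2.0, 0.5, -0.1])   # hypothetical coefficients used to build b
b = A @ true_x                         # target-type values for the same blocks

# Least-squares solution of A x = b (Eq. 6).
x_hat, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)

def predict(non_target_values):
    """Predict the target value for one block from its non-target values."""
    return float(np.asarray(non_target_values, dtype=float) @ x_hat)

print(x_hat, predict([11.0, 38.0, 58.0]))
```

Because the synthetic targets were generated without noise, the fit recovers the generating coefficients; with real cleaned data the residuals would be non-zero.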
IV. PERFORMANCE EVALUATION
We used percentage absolute difference and Root Mean Square Error (RMSE) as performance metrics of data prediction accuracy. We compared the performance of our CDR method against mean-based and temporal linear regression-based data prediction models. We tested using two real-world data sets. In the first data set, the target type is temperature, and two types of simulated correlated data are used. In the second data set, the target type is particulate matter with a diameter under 2.5 µm (PM2.5), and three types of real correlated data (PM1.0, NO2 and humidity) are used.
Fig. 3. Prediction results for test set 2: out of 612 predictions, CDR performed better in 453 and was within 5% of the true value in 261 cases
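The two metrics can be written down as a short sketch. The text does not spell out the exact formula for percentage absolute difference, so the version below, which measures the absolute difference relative to the true value, is an assumption; the function names are hypothetical as well.

```python
import math

def percentage_absolute_difference(predicted, actual):
    """Absolute difference as a percentage of the true value (assumed definition)."""
    return abs(predicted - actual) / abs(actual) * 100.0

def rmse(predicted, actual):
    """Root mean square error over paired predictions and true values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up values: a prediction of 21.0 against a true value of 20.0.
print(percentage_absolute_difference(21.0, 20.0))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Under this definition, a prediction counts as "within 5% of the true value" when the first function returns at most 5.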
A. Temperature
The temperature data was from an area of roughly 22 km by 23 km and was taken over four days. The experimental area was split into 25 regions using a 5x5 equal-sized grid. We split the execution time into 96 epochs with each epoch being one hour long. We tested the performance of our CDR method against the existing mean-based method in three test data sets. To imitate the data impurity, continuous or random errors were applied on the temperature data streams. The data error from malicious participants ranged from 25% to 75%. Figures 2 through 4 show CDR’s percentage improvement over the mean-based method, and each figure shows 612 predictions.
Fig. 4. Prediction results for test set 3: out of 612 predictions, CDR performed better in 498 and was within 5% of the true value in 213 cases
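The space-time discretization used for the temperature data (a 5x5 grid over roughly 22 km by 23 km, one-hour epochs) can be sketched as a mapping from a reading to its block. The coordinate convention (kilometres from one corner of the area, hours from the start of the experiment), the row-major region numbering, and the function names are assumptions made for this example.

```python
from collections import defaultdict

def block_index(x_km, y_km, t_hours, width_km=22.0, height_km=23.0,
                grid=5, epoch_hours=1.0):
    """Map a position and timestamp to (region, epoch).

    Regions are numbered row by row from 0 to grid*grid - 1 (assumed scheme).
    Points on the far edge are clamped into the last row or column.
    """
    col = min(int(x_km / (width_km / grid)), grid - 1)
    row = min(int(y_km / (height_km / grid)), grid - 1)
    epoch = int(t_hours // epoch_hours)
    return row * grid + col, epoch

def group_by_block(readings):
    """Group (x_km, y_km, t_hours, value) tuples by their space-time block."""
    blocks = defaultdict(list)
    for x_km, y_km, t_hours, value in readings:
        blocks[block_index(x_km, y_km, t_hours)].append(value)
    return dict(blocks)

# Made-up readings: two taxis in the same cell and hour, one elsewhere.
sample = [(1.0, 1.0, 0.2, 20.0), (2.0, 2.0, 0.8, 21.0), (10.0, 10.0, 0.5, 19.0)]
print(group_by_block(sample))
```

Each resulting group is one block of space-time on which the per-block processing would run.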
Fig. 5. Prediction results for test set 1: out of 640 predictions, CDR performed better in 379 cases
Fig. 6. Prediction results for test set 2: out of 640 predictions, CDR performed better in 445 cases
B. PM2.5
The air quality data was collected from an area of roughly
120 km by 150 km. The duration was seven days (149 hours).
CDR was tested against the existing mean-based and temporal
linear regression-based data prediction methods on five test
data sets. To imitate the data impurity, continuous or random
errors were applied on the crowdsensing data streams. The
data error ranged from 25% to 75%.
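One way to imitate such data impurity is sketched below. The text states only that continuous or random errors between 25% and 75% were applied, so the details here are assumptions: the random variant draws a fresh relative error and a random sign for each reading, the continuous variant applies one fixed relative error to every reading, and the function names are hypothetical.

```python
import random

def random_error(value, rng, low=0.25, high=0.75):
    """Perturb one reading by a relative error drawn uniformly from [low, high]."""
    magnitude = rng.uniform(low, high)
    sign = rng.choice((-1.0, 1.0))     # assumed: errors may inflate or deflate
    return value * (1.0 + sign * magnitude)

def continuous_error(values, error=0.5):
    """Apply the same fixed relative error to every reading in a stream."""
    return [v * (1.0 + error) for v in values]

rng = random.Random(0)                 # seeded so the run is repeatable
clean = [35.0, 40.0, 42.0]             # made-up PM2.5 readings
print([random_error(v, rng) for v in clean])
print(continuous_error(clean))
```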
We tested the performance of our algorithm for different levels of erroneous data from malicious users. We also varied the knowledge level of the participants with regard to the experimental environment to imitate sophisticated data manipulation by a malicious crowdsensing participant. Test set 1 (Fig. 5, Fig. 10, Fig. 15) was used for missing data prediction. We tested with sequential and random data loss patterns. In the first experiment with erroneous data from malicious users (Fig. 6, Fig. 8, Fig. 11, Fig. 13, Fig. 16, Fig. 18), we assumed the participants did not have any prior knowledge about the experimental environment. The data error ranged from 25% to 75%. One group of malicious participants reported a fixed percentage of error throughout the experiment. In the second experiment, we considered that the malicious participant has extended knowledge about the sensing area (Fig. 7, Fig. 9, Fig. 12, Fig. 14, Fig. 17, Fig. 19). Thus, these participants try to change the sensing data by adding noise to the air quality data of that particular spatio-temporal unit.
Fig. 7. Prediction results for test set 3: out of 640 predictions, CDR performed better in 442 cases
1) Percent Error per Prediction: Figures 5 through 9 show
CDR’s percentage improvement over the mean-based method,
and each figure shows 640 predictions. On average CDR
performed better in 70% of cases and is 70% more accurate.
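The 70% win rate can be cross-checked against the counts given in the captions of Figures 5 through 9. The short calculation below uses only those caption values; the variable names are arbitrary.

```python
# Number of predictions in which CDR performed better, keyed by figure number,
# taken from the captions of Figs. 5 to 9.
wins = {5: 379, 6: 445, 7: 442, 8: 454, 9: 533}
predictions_per_figure = 640

win_rate = sum(wins.values()) / (predictions_per_figure * len(wins))
print(f"{win_rate:.1%}")
```

The captions give 2253 wins out of 3200 predictions, which is consistent with the stated average of about 70% of cases.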
Fig. 8. Prediction results for test set 4: out of 640 predictions, CDR performed better in 454 cases
2) Root Mean Square Error by Epoch: Figures 10 through 19 show CDR’s improvement of the root mean square error (RMSE) normalized by epoch. We calculated RMSE and
Fig. 9. Prediction results for test set 5: out of 640 predictions, CDR performed better in 533 cases
Fig. 10. Prediction results for test set 1: out of 149 epochs, CDR performed better in 88 epochs
Fig. 11. Prediction results for test set 2: out of 149 epochs, CDR performed better in 88 epochs
Fig. 12. Prediction results for test set 3: out of 149 epochs, CDR performed better in 90 epochs
Fig. 14. Prediction results for test set 5: out of 149 epochs, CDR performed better in 115 epochs
Fig. 15. Prediction results for test set 1: out of 149 epochs, CDR performed better in 119 epochs
Fig. 16. Prediction results for test set 2: out of 149 epochs, CDR performed better in 119 epochs
Fig. 17. Prediction results for test set 3: out of 149 epochs, CDR performed better in 105 epochs
Fig. 18. Prediction results for test set 4: out of 149 epochs, CDR performed better in 93 epochs
Fig. 19. Prediction results for test set 5: out of 149 epochs, CDR performed better in 95 epochs
V. CONCLUSION & FUTURE WORK
In this paper, we proposed a novel method, named CDR, for reputation-aware data fusion for mobile crowdsensing data streams. We showed that the proposed mechanism outperforms the existing mean-based and temporal linear regression-based data prediction models. We evaluated the approaches on two datasets, the Rome crowdsensing temperature dataset and the Beijing air quality dataset, to demonstrate CDR’s efficacy in different scenarios. For the Rome crowdsensing dataset, we achieved 16% better accuracy. Specifically, the 9.3% prediction error in temperature measurements of our approach equates to roughly 1 degree difference, which is negligible in real-life applications. With this in mind, we can say that our mechanism predicts temperature values with high accuracy. In the case of the air quality dataset, our CDR method incurred on average 25% and 59% less RMSE than the mean-based and temporal linear regression models, respectively. Our data fusion method incurred an average RMSE of 0.66 per epoch, which indicates higher data prediction accuracy. The success of our approach lies in the integration of dynamic trust evaluation of the sensed data, which allows us to defend against data corruption attacks and identify malicious or honest participants based on their reported data in real time. In the future, we will extend this work to consider collusion attacks by malicious participants.
ACKNOWLEDGMENT
The work was supported by the National Science Foundation Grant number CNS-1560134 for the Research Experience for Undergraduates, Advanced Secured Sensor Enabling Technologies, and the Dissertation Year Fellowship support provided by Florida International University’s Graduate School. The authors would like to thank Eric Xu for his contribution.
REFERENCES
[1] F. Chen, P. Deng, J. Wan, D. Zhang, A. V. Vasilakos, and X. Rong, “Data mining for the internet of things: literature review and challenges,” International Journal of Distributed Sensor Networks, vol. 11, no. 8, p. 431047, 2015.
[2] Z. Feng and Y. Zhu, “A survey on trajectory data mining: techniques and applications,” IEEE Access, vol. 4, pp. 2056–2067, 2016.
[3] F. Restuccia, N. Ghosh, S. Bhattacharjee, S. K. Das, and T. Melodia, “Quality of information in mobile crowdsensing: Survey and research challenges,” ACM Transactions on Sensor Networks (TOSN), vol. 13, no. 4, p. 34, 2017.
[4] P. M. Aoki, R. Honicky, A. Mainwaring, C. Myers, E. Paulos, S. Subramanian, and A. Woodruff, “A vehicle for research: using street sweepers to explore the landscape of environmental community action,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 2009, pp. 375–384.
[5] H. Mousa, S. B. Mokhtar, O. Hasan, O. Younes, M. Hadhoud, and L. Brunie, “Trust management and reputation systems in mobile participatory sensing applications: A survey,” Computer Networks, vol. 90, pp. 49–73, 2015.
[6] X. Kang, L. Liu, and H. Ma, “Data correlation based crowdsensing enhancement for environment monitoring,” in Communications (ICC), 2016 IEEE International Conference on. IEEE, 2016, pp. 1–6.
[7] I. Koukoutsidis, “Estimating spatial averages of environmental parameters based on mobile crowdsensing,” ACM Transactions on Sensor Networks (TOSN), vol. 14, no. 1, p. 2, 2018.
[8] S. Subramaniam, T. Palpanas, D. Papadopoulos, V. Kalogeraki, and D. Gunopulos, “Online outlier detection in sensor data using non-parametric models,” in Proceedings of the 32nd international conference on Very large data bases. VLDB Endowment, 2006, pp. 187–198.
[9] S. Tasnim, N. Pissinou, and S. Iyengar, “A novel cleaning approach of environmental sensing data streams,” in Consumer Communications & Networking Conference (CCNC), 2017 14th IEEE Annual. IEEE, 2017, pp. 632–633.
[10] L. Bracciale, M. Bonola, P. Loreti, G. Bianchi, R. Amici, and A. Rabuffi, “CRAWDAD dataset roma/taxi (v. 2014-07-17),” Downloaded from http://crawdad.org/roma/taxi/20140717, Jul. 2014.
[11] Y. Zheng, F. Liu, and H.-P. Hsieh, “U-air: When urban air quality inference meets big data,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2013, pp. 1436–1444.
[12] Y. Zhang, C. Szabo, and Q. Z. Sheng, “Cleaning environmental sensing data streams based on individual sensor reliability,” in International Conference on Web Information Systems Engineering. Springer, 2014, pp. 405–414.
[13] H.-S. Lim, Y.-S. Moon, and E. Bertino, “Provenance-based trustworthiness assessment in sensor networks,” in Proceedings of the Seventh International Workshop on Data Management for Sensor Networks. ACM, 2010, pp. 2–7.
[14] K. L. Huang, S. S. Kanhere, and W. Hu, “On the need for a reputation system in mobile phone based sensing,” Ad Hoc Networks, vol. 12, pp. 130–149, 2014.
[15] D. Peng, F. Wu, and G. Chen, “Pay as how well you do: A quality based incentive mechanism for crowdsensing,” in Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing. ACM, 2015, pp. 177–186.
[16] S. Liu, Z. Zheng, F. Wu, S. Tang, and G. Chen, “Context-aware data quality estimation in mobile crowdsensing,” in INFOCOM 2017-IEEE Conference on Computer Communications, IEEE. IEEE, 2017, pp. 1–9.
[17] Y. Kishino, K. Takeuchi, Y. Shirai, F. Naya, and N. Ueda, “Datafying city: Detecting and accumulating spatio-temporal events by vehicle-mounted sensors,” in Big Data (Big Data), 2017 IEEE International Conference on. IEEE, 2017, pp. 4098–4104.
[18] J. W. Branch, C. Giannella, B. Szymanski, R. Wolff, and H. Kargupta, “In-network outlier detection in wireless sensor networks,” Knowledge and information systems, vol. 34, no. 1, pp. 23–54, 2013.
[19] A. Deligiannakis, Y. Kotidis, V. Vassalos, V. Stoumpos, and A. Delis, “Another outlier bites the dust: Computing meaningful aggregates in sensor networks,” in Data Engineering, 2009. ICDE’09. IEEE 25th International Conference on. IEEE, 2009, pp. 988–999.
[20] N. Giatrakos, Y. Kotidis, A. Deligiannakis, V. Vassalos, and Y. Theodoridis, “Taco: tunable approximate computation of outliers in wireless sensor networks,” in Proceedings of the 2010 ACM SIGMOD International Conference on Management of data. ACM, 2010, pp. 279–290.
[21] G. Chen, X.-Y. Liu, L. Kong, J.-L. Lu, Y. Gu, W. Shu, and M.-Y. Wu, “Multiple attributes-based data recovery in wireless sensor networks,” in Global Communications Conference (GLOBECOM), 2013 IEEE. IEEE, 2013, pp. 103–108.
[22] L. Kong, M. Xia, X.-Y. Liu, M.-Y. Wu, and X. Liu, “Data loss and reconstruction in sensor networks,” in INFOCOM, 2013 Proceedings IEEE. IEEE, 2013, pp. 1654–1662.
[23] S. Gill, B. Lee, and E. Neto, “Context aware model-based cleaning of data streams,” in Signals and Systems Conference (ISSC), 2015 26th Irish. IEEE, 2015, pp. 1–6.
[24] S. Krishnan, J. Wang, E. Wu, M. J. Franklin, and K. Goldberg, “Activeclean: interactive data cleaning for statistical modeling,” Proceedings of the VLDB Endowment, vol. 9, no. 12, pp. 948–959, 2016.
[25] A. Lazar, L. Jin, C. A. Spurlock, K. Wu, and A. Sim, “Data quality challenges with missing values and mixed types in joint sequence analysis,” in Big Data (Big Data), 2017 IEEE International Conference on. IEEE, 2017, pp. 2620–2627.
[26] H. Liu, A. K. Tk, J. P. Thomas, and X. Hou, “Cleaning framework for bigdata: An interactive approach for data cleaning,” in Big Data Computing Service and Applications (BigDataService), 2016 IEEE Second International Conference on. IEEE, 2016, pp. 174–181.
[27] S. Tasnim, J. Caldas, N. Pissinou, S. Iyengar, and Z. Ding, “Semantic-aware clustering-based approach of trajectory data stream mining,” in 2018 International Conference on Computing, Networking and Communications (ICNC). IEEE, 2018, pp. 88–92.
[28] X. L. Dong, B. Saha, and D. Srivastava, “Less is more: Selecting sources wisely for integration,” in Proceedings of the VLDB Endowment, vol. 6, no. 2. VLDB Endowment, 2012, pp. 37–48.
[29] S. Burdakis and A. Deligiannakis, “Detecting outliers in sensor networks using the geometric approach,” in Data Engineering (ICDE), 2012 IEEE 28th International Conference on. IEEE, 2012, pp. 1108–1119.
[30] C. Franke and M. Gertz, “Orden: Outlier region detection and exploration in sensor networks,” in Proceedings of the 2009 ACM SIGMOD International Conference on Management of data. ACM, 2009, pp. 1075–1078.
[31] M. Keally, G. Zhou, and G. Xing, “Watchdog: Confident event detection in heterogeneous sensor networks,” in Real-Time and Embedded Technology and Applications Symposium (RTAS), 2010 16th IEEE. IEEE, 2010, pp. 279–288.
[32] B. Sheng, Q. Li, W. Mao, and W. Jin, “Outlier detection in sensor networks,” in Proceedings of the 8th ACM international symposium on Mobile ad hoc networking and computing. ACM, 2007, pp. 219–228.
[33] H. H. Bosman, G. Iacca, A. Tejada, H. J. Wörtche, and A. Liotta, “Spatial anomaly detection in sensor networks using neighborhood information,” Information Fusion, vol. 33, pp. 41–56, 2017.