UJIIndoorLoc: WLAN Indoor Localization Database



UJIIndoorLoc: A new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems

Conference Paper · October 2014


DOI: 10.1109/IPIN.2014.7275492


All content following this page was uploaded by Joaquín Torres-Sospedra on 16 March 2020.



This is a preprint version of the paper.

J. Torres-Sospedra et al., "UJIIndoorLoc: A new multi-building and multi-floor database for WLAN fingerprint-based indoor localization problems", in Proceedings of the Fifth International Conference on Indoor Positioning and Indoor Navigation (IPIN 2014), Busan (Korea), 2014, pp. 261-270.

UJIIndoorLoc: A New Multi-building and Multi-floor Database for WLAN Fingerprint-based Indoor Localization Problems

Joaquín Torres-Sospedra*, Raúl Montoliu*, Adolfo Martínez-Usó†, Joan P. Avariento*, Tomás J. Arnau*, Mauri Benedito-Bordonau*, and Joaquín Huerta*

* Institute of New Imaging Technologies, Universitat Jaume I, Avda. Vicente Sos Baynat S/N, 12071, Castellón, Spain.
† Dpto. de Sistemas Informáticos y Computación, Universitat Politècnica de València, Valencia, Spain.

Abstract—Although indoor localization is a key topic for mobile computing, it is still very difficult for the mobile sensing community to compare state-of-the-art localization algorithms due to the scarcity of databases. This work therefore presents a multi-building and multi-floor localization database based on WLAN fingerprinting, which is made publicly available to the research community. The proposed database is not only the biggest database in the literature but also the first publicly available one. Among other comprehensively described features, it provides full raw information captured by more than 20 users with 25 devices.

I. INTRODUCTION

Many real world applications need to know the location of a user in the world to provide their services. Therefore, automatic user localization has been a hot research topic in recent years. Automatic user localization consists of estimating the position of the user (latitude, longitude and altitude) by using an electronic device, usually a mobile phone. The outdoor localization problem can be solved very accurately thanks to the inclusion of GPS sensors in mobile devices. However, indoor localization is still an open problem, mainly due to the loss of GPS signal in indoor environments.

A spectacular growth of indoor localization studies has been witnessed during the last decade (see Figure 1), and WLAN-based techniques are the basis for many indoor localization approaches. This is mainly due to the proliferation of both wireless local area networks (WLANs) and mobile devices. Nowadays WLANs can be found anywhere, and mobile phones have increasingly become an indispensable part of our daily lives; therefore, we can safely expect that the user is at the same location as the mobile device. The last generation of these devices (also known as smartphones) not only provides programmable abilities but also carries embedded sensors [1] such as GPS, accelerometer, gyroscope, microphone, camera, Bluetooth, etc., which have even been used to study social interactions [2] or predict human behavior [3], among many other studies.

Fig. 1. Timeline and number of research work records on indoor localization from 2004 to 2013 (records collected in September 2013 [4]).

WLAN fingerprint-based positioning systems are based on the Received Signal Strength Indicator (RSSI) value. Commonly, two phases are needed: calibration and operation [5]. In the calibration phase, a radio map of the area where the users should be detected is constructed. Later, during the operational phase, a user obtains the signal strength of all visible access points of the WLAN that can be detected from his/her position and creates a test sample. This sample is sent to the server to be compared with the training samples of the radio map. Basically, the user's location corresponds to the position associated with the most similar sample in the radio map.

One of the major advantages of WLAN fingerprint-based methods is that they do not require the installation of any additional hardware, since they use the existing WLAN infrastructure. Therefore, the location of the user can be obtained without additional infrastructure and costs. However, WLANs were not natively designed to support a positioning function. Taking into account the obstacles introduced by the indoor environment (including reflections and multipath interference), the spread of the radio signal in indoor environments is very hard to predict [5]. In addition, in WLAN-based positioning systems, the user typically carries the mobile device with him/her, so his/her motion and how the device is carried are important factors that affect the measured RSSI values [6].

Although there are many papers in the literature trying to solve the indoor localization problem using a WLAN fingerprint-based method, there still exists one important drawback in this field, which is the lack of a common database for comparison purposes. Each approach presents its estimated results using its own database and describes how the experiment was carried out. Under these conditions, it is not possible to compare different methods since the particularities of each experiment are hardly reproducible. In the Pattern Recognition and Machine Learning research fields, the common practice is
to test the results of each proposal either using a well-known dataset or providing the dataset used. In this way, researchers are able to fairly compare different methodologies in the literature. For instance, the UCI Machine Learning Repository¹ is a well-known example [7] in this sense. However, no such database exists in the WLAN fingerprint-based indoor localization field.

In this paper, the UJIIndoorLoc database is presented to overcome this gap. We expect that the proposed database will become the reference database to compare different indoor localization methodologies. As far as we know, the proposed database is the first publicly accessible database in this field, and researchers can access the database at this URL².

The main contribution of this work is the creation and presentation of the UJIIndoorLoc database, which is the biggest database in the literature, as previously mentioned. It would also be the first publicly available database that could be used to make comparisons among different methods in this field. The main characteristics of the database³ are:

• It covers a surface of 108703 m² including 3 buildings with 4 or 5 floors depending on the building.
• The number of different places (reference points) appearing in the database is 933.
• 21049 sampled points have been captured: 19938 for training/learning and 1111 for validation/testing.
• Dataset independence has been assured by taking Validation (or testing) samples 4 months after Training ones.
• The number of different wireless access points (WAPs) appearing in the database is 520.
• Data were collected by more than 20 users using 25 different models of mobile devices (some users used more than one model).

Two Android applications have been used to create the database: CaptureLoc and ValidationLoc. Both applications use, as a reference, map services that are published in ArcGIS Server. These services contain the geographic information of the building interiors as well as the location of the training reference points. Using these services, the applications show the maps to improve the user localization for training and validation. Data were collected at three multi-floor buildings of the Jaume I University⁴ (UJI).

The rest of the paper is organized as follows: Section II presents the related work. Section III explains in detail the proposed database. How this database has been created is described in Section IV. Section V provides a baseline obtained with a basic indoor positioning system. Section VI describes the most important challenges we contemplate using the proposed database. Finally, in Section VII some conclusions are given.

¹ [Link]
² [Link]
³ The first version of the database only covers 3 buildings; our idea is to cover all the university facilities (30 buildings), so the final version of the database will be approximately 10 times larger.
⁴ [Link]

II. RELATED WORK

Indoor positioning and localization literature is vast. In [8], the authors categorised approaches according to the technique used for localization into several paradigms, including calibration-free localization [9], WLAN-based techniques [10], dead-reckoning [11], simultaneous localization and mapping (SLAM) [12] and multi-modal sensing [13]. Another classification can be found in [14], where fingerprint-based indoor localization has been classified into two categories: infrastructure-based and infrastructure-less approaches. Infrastructure-based approaches rely on the deployment of customized Radio-Frequency beacons (RFID, infrared, ultrasound, Bluetooth, LED lights, etc.) that can be carefully optimized for a particular purpose. The main drawback of these approaches is that they need their own customized hardware. In contrast, infrastructure-less approaches use the already available wireless signals to profile a location, taking advantage of the powerful sensors of mobile phones. Our work is an infrastructure-less approach since we use the already available WLAN access points (WAPs) to construct the database by using mobile phones.

There are also many works dealing with the indoor localization problem by using WLAN-based techniques. Table I shows the main characteristics of the data used in some of the most important papers in this field. In [5] a new fingerprint-based method was proposed which uses a previously stored map of the signal strength at several positions and determines the position using similarity functions and majority rules. According to the authors, their proposed method is able to obtain high rates determining the building, the floor and the place, with an average error around 3 meters. The database used in [5] has (see Table I) 9358 sample points taken from 2 buildings of 3 floors each, with 101 different WAPs in the database. However, the authors do not provide information about the number of users or the number of different devices used to capture the samples. Information about the covered surface was also not provided. The database used in [5] was the biggest one in the previous literature, but it is not public and it has 55% fewer samples than the one we propose here, 80% fewer WAPs and 60% fewer places, among other data.

TABLE I. Main characteristics of the data used in some of the most important papers in the indoor localization field: Number of buildings (NB), Surface (Surf.), Number of floors (NF), Number of places (NP), Number of Samples (NS), Number of WAPs (NW), and Number of Devices (ND). N/A stands for information not available or not provided by authors.

Work    NB   Surf.        NF    NP    NS      NW    ND
[5]     2    N/A          3     392   9358    101   N/A
[15]    1    N/A          1     96    2880    206   2
[16]    1    N/A          N/A   N/A   N/A     9     N/A
[17]    1    980 m²       1     50    N/A     N/A   N/A
[14]a   3    2730 m²      3     120   N/A     434   1
[14]b   3    69000 m²     1     13    N/A     379   1
Ours    3    108703 m²    4/5   933   21049   520   25

In [15] a different approach that uses only the rankings of the RSSI values is used. The authors argue that their method is better at avoiding the well-known problem of hardware and software differences between user devices, which means
that the RSSI reported by the current mobile device may differ from the RSSI in the database, which can degrade the positioning accuracy. According to the information provided by the authors, the database used is also significantly smaller than the one proposed in [5]. For instance, the number of WAPs, which is the only criterion in which that work is superior to [5], is still 60% less than ours. In [16] two improvements are proposed to the common way used to solve the WLAN-based fingerprint problem. The first addresses differing antenna attenuation among different devices. The second deals with environments where not every beacon is visible everywhere. Although the methods proposed in [15] and in [16] are both promising, since they try to solve some of the common problems in this field, the databases used (see Table I) are very small and are different.

Other WLAN-based works such as [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33] have not been included in Table I since they do not provide information about the characteristics of the database, or the information provided shows that their databases are very small with respect to the one proposed in this paper or even the one used in [5]. The last three rows of Table I before ours show information about the databases used in two papers, [17] and [14], which are not WLAN-based. [17] uses an RF-based method, and not all information about the database is available. [14] proposed an FM signal-based method. They performed experiments in two scenarios, named [14]a and [14]b in Table I. In both cases, the authors tested their proposed FM-based method together with a WLAN-based methodology. The database of the second scenario has a similar size to the one proposed in this paper, but it is still smaller than ours. For instance, it covers 37% less surface, and the number of WAPs is 28% smaller. In addition, just one device was used in their experiments.

In general, each indoor localization work used its own data to publish the results, and therefore it is quite difficult to compare the proposed methods. A common database is needed to perform this task. As commented before, in the Pattern Recognition and Machine Learning research fields it is a common practice to share databases to allow other researchers to provide comparable results when using their proposals. To date, WLAN-based indoor localization methods have used relatively small databases which, in some cases, come from just one building and are often captured using a small number of devices or by a small number of users. The database presented in this paper overcomes all these shortcomings.

III. UJIIndoorLoc DATABASE DESCRIPTION

In this section, the proposed database is completely described. First, Section III-A provides details about the information published for each capture (i.e. for each sampled point or record). Second, Section III-B shows how the whole database has been split into sets for training and validation purposes.

A. Description of elements stored

As previously introduced, the whole proposed database contains 21049 records. Each record is directly related to a single capture and it contains the following 529 numeric elements:

001-520   RSSI levels
521-523   Real world coordinates of the sample points
524       BuildingID
525       SpaceID
526       Relative position with respect to SpaceID
527       UserID
528       PhoneID
529       Timestamp

As an example, Table II shows an extract of one record. Due to the length of this record, only two RSSI levels are shown and the values of the spatial coordinates are truncated to one decimal. The example corresponds to the 7754th record from the training set.

1) RSSI Levels: The most important information for WLAN fingerprinting comparison purposes is the WAPs detected and their RSSI level values. In the proposed database, this information represents 98% of the data given in each record (520 vector positions out of 529) as a 520-element vector of integer values. These values represent the RSSI levels, whereas the WAP identifiers (MAC addresses) are linked to the vector positions. Table III expands the example record given in Table II, showing its RSSI levels. This representation has been adopted due to the number of different WAPs detected in the three buildings (that is, 520) and the fact that Android provides integer RSSI levels.

The method getScanResults() included in the WifiManager⁵ Android class has been used to obtain the list of detected WAPs from each location in each capture. This list, shown in Table IV, contains the MAC address and the corresponding intensity level for any detected WAP. The MAC addresses are coded as strings, and the RSSI levels correspond to negative integer values⁶ measured in dBm, where -100 dBm is equivalent to a very weak signal, whereas 0 dBm means that the detected WAP has an extremely good signal. Although the list shown in the example is sorted according to the intensity value, this ordering depends mostly on the device (model and Android version).

The database includes 520 WAPs identified by the MAC address. These addresses have been alphabetically sorted and sequentially renamed to WAPnnnn. We use these new identifiers instead of the MAC addresses due to privacy reasons.

A total number of 520 WAPs appear in the database, and the 520-element vector from each record contains the raw intensity levels of the detected WAPs from a single WiFi scan. Obviously, not all the WAPs are detected in each scan. For instance, only 14 WAP identifiers were detected in the scan example shown in Table IV. The RSSI levels for these WAPs remain unaltered, and the artificial value +100 dBm is used by default for those WAPs that have not been detected by the device.

⁵ [Link]
⁶ [Link]

It is important to mention how the RSSI values are distributed in the proposed database. Figure 2 introduces the
frequency distribution of all the individual intensity levels recorded in our database (374234 intensity values). Although these levels range from -104 dBm to 0 dBm, the number of individual measures inside the range [-45 dBm ... 0 dBm] is insignificant and covers only 1.7% of the total RSSI levels recorded in our database. Similarly, the number of single measures reporting a value lower than -95 dBm is also insignificant (2.3% of total RSSI measures).

[Figure 2: x-axis, RSSI level; left y-axis, number of WAPs detected (log); right y-axis, number of WAPs detected (linear)]

Fig. 2. Frequency distribution of the number of times that an RSSI value appears in the proposed database. Red bars stand for the values in linear scale (right scale) and blue bars stand for the values in logarithmic scale (left scale).

TABLE II. Example of one database entry (7754th record). It was captured on June 4th, 2013 ([Link] PM GMT+02) by User 11 with an HTC Wildfire S A510e (Android version 2.3.5). The device detected 14 WAPs (negative RSSI values) on the reference point located outside office 111 on the third floor of the TI building.

[1]      ...   [520]    [521]        [522]          [523]   [524]        [525]     [526]           [527]    [528]     [529]
WAP001   ...   WAP520   Longitude    Latitude       Floor   BuildingID   SpaceID   Rel. position   UserID   PhoneID   Time
-97      ...   +100     -7594.7...   4864983.9...   3       0            111       2               11       13        1370340142

TABLE III. Extract of the vector that represents the RSSI values. MAC addresses have been anonymized due to privacy reasons.

WAP001   ...   WAP031   WAP032   WAP033   WAP034   WAP035   WAP036   ...   WAP520
-97      ...   +100     -97      +100     +100     -65      -65      ...   +100

TABLE IV. Extract of the WAPs list with 14 RSSI values provided by getScanResults().

Pos. in list   WAP identifier   RSSI level
1st            WAP032           -97 dBm
2nd            WAP001           -97 dBm
3rd            WAP268           -97 dBm
4th            WAP150           -94 dBm
···
11th           WAP036           -65 dBm
12th           WAP035           -65 dBm
13th           WAP142           -48 dBm
14th           WAP143           -46 dBm

Figure 3 shows the number of WAPs detected in a single capture. This number ranges from 0 (where there is no WiFi coverage at all) to 51, so locations with no coverage have not been removed from the database. Finally, the average number of WAPs scanned in each capture is 17.92; therefore, approximately 500 elements of the previously described vector contain out-of-range values (represented as +100 dBm). It is worth saying that, according to our experiments, the main factors that affect the number of WAPs reported by a WiFi scan are the location, the phone model (Android version and hardware) and how the device is held, although we are aware that other factors exist, as demonstrated in [6], [5].

[Figure 3: x-axis, number of WAPs detected; y-axis, number of database records]

Fig. 3. Frequency distribution of the number of WAPs that are detected on a single capture.

2) Real-world coordinates: Real-world coordinates are represented in each sample/capture by means of three values in each record (vector positions from 521 to 523): the longitude and latitude coordinates (in meters with UTM from WGS84) and the floor of the building. An example of these values is shown in Table V.

TABLE V. Detailed coordinates and floor of the place where the capture stored in the 7754th record was taken.

Longitude            Latitude            Floor
-7594.736999999732   4864983.902400002   3

3) Space identifiers: BuildingID, vector position 524, is an integer value (from 0 to 2) that corresponds to the building in which the capture was taken. Figure 4 shows: the UJI University campus (left); the three buildings of the School of Technology and Experimental Sciences (center image), hereafter ESTCE; and a zoom inside the third floor of the TI building (right). This figure has been introduced to show where the example point (Table V) is exactly located. Table VI shows the BuildingID for each building of the ESTCE.

TABLE VI. Relation between BuildingID and the real building.

Building ID   Real Building
0             ESTCE - TI
1             ESTCE - TD
2             ESTCE - TC
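
As an illustration of the record layout described in this section, the following minimal Python sketch splits one 529-element record into its RSSI vector and its named metadata fields. The vector positions, the +100 placeholder for undetected WAPs and the BuildingID mapping are taken from the text; the function name, the dictionary keys and the shortened example record (only the four RSSI values visible in Table III are filled in) are assumptions made for illustration only.

```python
# Hypothetical helper; positions follow the 529-element layout described above.
NOT_DETECTED = 100   # artificial value used for WAPs that were not detected
NUM_WAPS = 520       # vector positions 001-520 hold the RSSI levels
BUILDINGS = {0: "ESTCE - TI", 1: "ESTCE - TD", 2: "ESTCE - TC"}  # Table VI

# Names chosen here for positions 521-529 (assumed, not an official schema).
FIELD_NAMES = ["longitude", "latitude", "floor", "building_id", "space_id",
               "relative_position", "user_id", "phone_id", "timestamp"]


def parse_record(values):
    """Split one 529-element record into detected WAPs and named metadata."""
    if len(values) != NUM_WAPS + len(FIELD_NAMES):
        raise ValueError("expected 529 elements")
    rssi = values[:NUM_WAPS]
    record = dict(zip(FIELD_NAMES, values[NUM_WAPS:]))
    # Keep only detected WAPs, keyed by 1-based position (WAP001 -> 1).
    record["detected"] = {i + 1: level for i, level in enumerate(rssi)
                          if level != NOT_DETECTED}
    record["building"] = BUILDINGS[int(record["building_id"])]
    return record


# Shortened example: only the RSSI values shown in Table III are set,
# every other WAP is left at the placeholder (the real record has 14 WAPs).
example = [NOT_DETECTED] * NUM_WAPS
for wap, level in {1: -97, 32: -97, 35: -65, 36: -65}.items():
    example[wap - 1] = level
example += [-7594.737, 4864983.9024, 3, 0, 111, 2, 11, 13, 1370340142]

rec = parse_record(example)
print(rec["building"], rec["floor"], sorted(rec["detected"]))
```

Keeping the detected WAPs in a dictionary rather than a dense vector is only a convenience for inspection; a distance-based method would normally work on the full 520-element vector.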
[Figure 4 panels: UJI Campus; Tx Buildings; Interior TI Building, Third Floor (office 111 marked)]

Fig. 4. Map of the UJI Riu Sec Campus and zoom on the Tx Buildings. Pink refers to the ESTCE - Tx building on the UJI Campus map (left). On the Tx building zoom (right): red refers to TI building, green corresponds to TD building and blue stands for TC building. On the interior of TI building, the blue point is the reference point.

The position 525 of the vector that represents each sample is called SpaceID and contains a single integer value that, in this case, is used to identify the particular space (offices, labs, etc.) where the capture was taken. The relative position with respect to the space is also provided in position 526, and it denotes whether the capture was taken inside (value 1) or outside (value 2) the space, at the corridor. Outside means in front of the door of the space. Following our previous example, the values of these fields in the 7754th record are shown in Table VII. According to these values, the capture was taken on the reference point located outside office 111 at the ESTCE - TI building (TI).

TABLE VII. Reference point in which the 7754th record was taken.

Floor   BuildingID   SpaceID   Rel. position
3       0            111       2

As Section III-B describes, the database is split into two subsets: the training subset and the validation subset. In the training subset, the reference points are well-specified, and each of them was captured by at least two users. However, in the validation subset, the measures were taken at arbitrary points, as would happen in a real localization system, and for this reason these reference points (identified by SpaceID and Relative position) are not stored in the validation records. This fact is denoted by assigning the default value 0 to both fields.

4) User Identifier: UserID, position 527, contains an integer value that ranges from 1 to 18. This value is used to represent the 18 different users who participated in the procedure to generate the training samples. This field has not been recorded in the validation phase, so the default value 0 is used to denote it. The height of each user is also provided. This information could be useful because the concrete spatial position of the device has a direct impact on the measured RSSI values [6]. Table VIII shows the correspondence among UserID, anonymized user name and the user's height.

TABLE VIII. Correspondence among UserID, the anonymized user who took the capture and its height.

UserID   Anonymized user              Height
0        USER0000 (Validation User)   -
1        USER0001                     170
2        USER0002                     176
3        USER0003                     172
4        USER0004                     174
5        USER0005                     184
6        USER0006                     180
7        USER0007                     160
8        USER0008                     176
9        USER0009                     177
10       USER0010                     186
11       USER0011                     176
12       USER0012                     158
13       USER0013                     174
14       USER0014                     173
15       USER0015                     174
16       USER0016                     171
17       USER0017                     166
18       USER0018                     162

5) Phone Identifier: Similarly, PhoneID (position 528) contains an integer value to represent the Android device used in each capture. Table IX shows the correspondence between each PhoneID and its associated device (model and version). Moreover, the device used by each user is also shown. As mentioned before, users from USER0001 to USER0018 are those who participated in the procedure of generating the training set, whereas user USER0000 is used to denote that the device was used to generate the validation set.

As can be noticed from Table IX, the number of different devices used is 20 (25 if the Android version is considered). There have been a few cases in which some users have the same device model. Concretely, three different users own a Nexus 4 and they participated in the procedure to create the validation set. Moreover, an LT22i phone has been shared by two users to generate the training samples.

6) Timestamp: Finally, we have also included a Timestamp register in position 529 of the vector, representing the time (in Unix time format) at which the capture was taken. This time was set by a centralized server to avoid outliers. The timestamps provided by each device are not recorded because the devices' timing settings could be different and we could not trust the time provided by them. In fact, wrong timing settings could add noise to our records or invalidate them. E.g., a capture taken in the morning could be incorrectly stored as taken in the evening. Another severe case occurs if the time has not been set up, so captures taken at 00:00 Jan 1st 2001 should not be recorded because the proposed database was created in 2013.
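
The Timestamp field can be decoded with standard library tools. The short Python sketch below converts the Unix time of the example record (Table II) into a calendar date in GMT+02, the offset given in the Table II caption, and shows the kind of plausibility check that the paragraph above motivates. The helper names and the year-based check are assumptions for illustration; they are not part of the capture applications described in the paper.

```python
from datetime import datetime, timedelta, timezone

# GMT+02 is the offset stated in the Table II caption.
CAPTURE_TZ = timezone(timedelta(hours=2))


def capture_time(unix_seconds):
    """Convert a Timestamp field (Unix time) into a timezone-aware datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=CAPTURE_TZ)


def plausible(unix_seconds, year=2013):
    """Hypothetical sanity check: accept only captures dated in the given year."""
    return capture_time(unix_seconds).year == year


t = capture_time(1370340142)   # Timestamp of the example record in Table II
print(t.isoformat())           # date and time of the example capture
print(plausible(1370340142))   # the example record passes the check
print(plausible(0))            # an unset clock (Unix epoch) is rejected
```

A check of this kind would only be needed for device-provided times; the database itself stores server-side timestamps precisely to avoid the problem.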
TABLE IX. C ORRESPONDENCE BETWEEN PhoneID AND REAL
DEVICE . R EAL DEVICE ’ S INFORMATION INCLUDES THE MODEL
IV. H OW IT WAS MADE
DESCRIPTION AND A NDROID VERSION . U SERS WHO EMPLOYED THE
DEVICE ARE ALSO SHOWN .
As mentioned in Section III, the UJIIndoorLoc database is
split into two independent subsets: the Training set, and the
PhoneID Android Device Android Ver. UserID Validation set. This section introduces how the two sets were
0 Celkon A27 4.0.4(6577) 0 created.
1 GT-I8160 2.3.6 8
2 GT-I8160 4.1.2 0
3 GT-I9100 4.0.4 5 A. Generating the training set
4 GT-I9300 4.1.2 0 An Android application called CaptureLoc has been devel-
5 GT-I9505 4.2.2 0 oped to capture records for the training set. This application
6 GT-S5360 2.3.6 7 collects all the required information (See Section III) and
7 GT-S6500 2.3.6 14 sends it to a centralized server which permanently stores it.
8 Galaxy Nexus 4.2.2 10 This collect-and-send procedure is automatically repeated ten
9 Galaxy Nexus 4.3 0 times for each captured location due to the harsh nature of
10 HTC Desire HD 2.3.5 18 the WLAN signals propagation [34]. Interaction user-device
11 HTC One 4.1.2 15 is required to select the user identifier and to select the place
12 HTC One 4.2.2 0 where the capture will be taken. Figure 5 shows an example
13 HTC Wildfire S 2.3.5 0,11 of the user-device interaction.
14 LT22i 4.0.4 0,1,9,16
15 LT22i 4.1.2 0
16 LT26i 4.0.4 3
17 M1005D 4.0.4 13
18 MT11i 2.3.4 4
19 Nexus 4 4.2.2 6
20 Nexus 4 4.3 0
21 Nexus S 4.1.2 0
22 Orange Monte Carlo 2.3.5 17
23 Transformer TF101 4.0.3 2
24 bq Curie 4.1.1 12

B. Database division Fig. 5. Screenshot of CaptureLoc. On the left, example where the capture
is done (red circle). Button Send Fingerprint starts the collect-and-send
procedure. On the right, the result of a capturing process that reports four errors.

The whole database is split into two different sets: the training set and the validation set. The training set provides fully detailed measurements whose locations correspond to predefined reference points, whereas the validation set provides the same information at arbitrary points. Table X summarizes the two sets.

TABLE X. BASIC FEATURES OF BOTH DATABASE SUBSETS

              Training            Validation
Captures      19674               1111
WAPs          465                 367
RSSI range    [-104 ... 0] dBm    [-102 ... -34] dBm
Ref. points   933                 None*
Users         18                  Unknown**
Devices       16                  11

* No reference points were established for validation.
** The validation stage does not store the user ID, in order to be more realistic.

Although both the training subset and the validation subset contain the same fields, the latter stores the value 0 in some of them: SpaceID, Relative Position with respect to SpaceID, and UserID. As mentioned earlier, this information was not recorded because the validation captures were taken at arbitrary points and the users were not tracked in this phase. This choice is intended to simulate a real localization system.

To generate the training set, all the closed spaces of the three buildings (offices, laboratories, classrooms, WCs, among other spaces) were initially considered as important places where captures should be taken. Then, one reference point inside each space and at least one reference point outside each space (i.e. in corridors) were selected for all the considered closed spaces. The point inside the space is located at the centroid of the closed space, whereas the outside point is located in front of the door. If the space has multiple accesses, we selected one reference point per entrance (door). Figure 6 shows a graphical example of how and where the reference points are located.

Then, 18 users performed the captures to generate the training set. The reference points were uniformly distributed among the users, with the restriction that every reference point should be covered by at least two users. No further suggestion, advice or direct order was given to the users, and they were free to capture the assigned reference points in their own way. Figure 5 (right) shows a capture process in which there were some errors (captures 5, 8, 9 and 10 were not recorded), so the user decided to repeat the process at the same reference point. The user was in charge of deciding whether the capture procedure had to be repeated. Capture errors are often related to low internet coverage (neither 3G nor WiFi) and they have only been reported in a few places.
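The distribution rule described above (reference points spread evenly among users, each point covered by at least two users) can be illustrated with a short sketch. The paper does not describe how the assignment was actually carried out, so the round-robin scheme, the function name `assign_reference_points` and the `coverage` parameter are all assumptions made for illustration only.

```python
from collections import defaultdict


def assign_reference_points(point_ids, user_ids, coverage=2):
    """Hypothetical round-robin assignment (not the authors' procedure).

    Every reference point is handed to `coverage` distinct users, and the
    per-user workload differs by at most one point.
    """
    if coverage > len(user_ids):
        raise ValueError("coverage cannot exceed the number of users")
    assignment = defaultdict(list)  # user id -> list of assigned points
    slot = 0
    for point in point_ids:
        # Consecutive slots map to distinct users because coverage <= len(user_ids).
        for _ in range(coverage):
            user = user_ids[slot % len(user_ids)]
            assignment[user].append(point)
            slot += 1
    return dict(assignment)
```

Under these assumptions, distributing the 933 suggested reference points among 18 users with `coverage=2` gives every user 103 or 104 points.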
Fig. 6. Example of reference points located on the first floor of the TI building (left) and the third floor of the TC building (right). Red points correspond to the reference points inside closed spaces, whereas blue points stand for the reference points taken in front of the door/doors (outside the spaces).

Finally, the 18 users covered 924 reference points. In general, each reference point was registered by at least two users, so more than 18400 training samples were expected to be recorded (19937 records were finally obtained). There were a few cases in which the user repeated the capture procedure due to connection errors on the first trial (Figure 5) and, moreover, some areas were covered by 3 users. Although all the suggested reference points located outside the spaces were captured, the users were not always allowed to capture inside some restricted spaces (chemical laboratories with biohazard labels, private offices, among other facilities).

B. Generating the validation set

Another Android application, ValidateLoc, was developed to capture more points for validation purposes. This application performs the operation phase by sending the required information (only the detected WAPs and their RSSI levels) to a centralized server, and it gets back a point inside a building (given by its longitude, latitude and floor). The localization is performed on the server side, so the procedure to obtain the position from the fingerprint is totally transparent to the user. Then, the application asks the user whether the provided localization is correct. If it is correct, the WiFi fingerprint and the successfully predicted localization are sent to the server and permanently stored. Otherwise, the application asks the user for the real localization and sends the information to be stored on the server side. Figures 7 and 8 show two execution examples of ValidateLoc.

To generate the validation samples, 14 users installed the application on their Android devices and ran it for approximately 20 minutes in each of the three Tx buildings. The users localized themselves with the application in their own way; no advice or suggestion was given to them. Moreover, no reference points were provided to them, so they could capture WiFi signals in places that were not covered in the training phase. Despite this, users were close to reference points in most cases, so they were correctly located.

Finally, 1111 captures were recorded. Note that the ValidateLoc application only sends one validation capture after localizing the user.

V. UJIIndoorLoc BASELINE

In this section, the proposed database is used with a basic indoor positioning system to provide a baseline for further comparisons. Note that the aim of this work is not to provide an accurate indoor positioning system, but to provide an objective database that can be used for comparing positioning systems and other algorithms based on WLAN fingerprinting.

We considered that the distance-based technique k-Nearest Neighbor (kNN) [35] can be used as a baseline for comparison purposes. In particular, we have developed the 1NN technique (k = 1) in conjunction with the Euclidean Distance as a basic indoor localization system. The necessary steps to localize a current fingerprint are:

• The Euclidean Distance of the current fingerprint with respect to all the fingerprints included in the training set is calculated.

• If there is only one candidate with the shortest distance, the location of the current fingerprint corresponds to the location of that closest training fingerprint.

• When several candidates share the shortest distance, we apply a voting procedure to extract the "winning" building and floor. Then, the position corresponds to the average of the locations provided by the closest training fingerprints that are on the winning building and floor. In case of a tie, a localization error is raised.

So, the 1111 validation fingerprints were located using this simple positioning system and the results can be considered the baseline for this database. The results are shown in Table XI and include the Success rate, the Error in positioning and Time.

TABLE XI. UJIIndoorLoc RESULTS WITH 1NN IN CONJUNCTION WITH Euclidean Distance

Error in positioning   7.9 m
Success rate           89.92%
Time                   495.26 ± 0.54 ms

The success rate corresponds to the percentage of validation fingerprints correctly located inside the corresponding building and floor. The error in positioning is the average error in meters of the validation fingerprints correctly located inside the corresponding building and floor. The Time field contains the average time in milliseconds required to obtain the precise location (longitude, latitude, floor) per fingerprint. Time was calculated after repeating the experiments 20 times. The experiments were done by means of a non-optimized Matlab script run on an Intel Core i7 based computer.

With the basic indoor location system the error is, on average, 7.9 meters when the fingerprint has been located in the correct building and floor. In more than 10% of cases, the fingerprint was localized on another floor and/or in another building. In particular, there were 3 out of 1111 cases (0.27%) in which the building was not correctly predicted, and 109 errors (9.81% of total cases) in detecting the correct floor. Other researchers can use the results shown here for further comparisons.
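The three localization steps of the 1NN baseline can be sketched in Python with NumPy. This is not the authors' Matlab script: the function name `locate_1nn`, the array layout (one row per fingerprint, one column per WAP) and the choice to vote on the (building, floor) pair jointly are assumptions made for illustration.

```python
from collections import Counter

import numpy as np


class LocalizationError(Exception):
    """Raised when the building/floor vote among the closest fingerprints is tied."""


def locate_1nn(query, train_rssi, train_lonlat, train_building, train_floor):
    """Illustrative 1NN localization with the Euclidean distance.

    query: (n_waps,) RSSI vector; train_rssi: (n_train, n_waps);
    train_lonlat: (n_train, 2); train_building, train_floor: (n_train,).
    Returns (longitude, latitude, building, floor).
    """
    # Step 1: distance from the query to every training fingerprint.
    dists = np.linalg.norm(np.asarray(train_rssi) - np.asarray(query), axis=1)
    # All training fingerprints that share the shortest distance.
    nearest = np.flatnonzero(dists == dists.min())

    def label(i):
        return int(train_building[i]), int(train_floor[i])

    # Step 2: a single closest fingerprint gives the location directly.
    if nearest.size == 1:
        i = nearest[0]
        lon, lat = train_lonlat[i]
        return (float(lon), float(lat)) + label(i)

    # Step 3: vote on (building, floor); a tied vote is a localization error.
    ranked = Counter(label(i) for i in nearest).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise LocalizationError("tie in the building/floor vote")
    winner = ranked[0][0]
    # Average the positions of the closest fingerprints on the winning floor.
    winners = [i for i in nearest if label(i) == winner]
    lon, lat = np.asarray(train_lonlat)[winners].mean(axis=0)
    return (float(lon), float(lat)) + winner
```

Exact equality is used to find the tied candidates, which is reasonable when RSSI levels are integers; with real-valued features a tolerance would be a safer choice.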
Fig. 7. Screenshots of the ValidateLoc application. The first image corresponds to the initial screen before localizing the user. The second one shows the localization and asks the user if the position is correct; the user presses the "yes" button. The last image informs the user that the validation fingerprint has successfully arrived at the server. The blue point stands for the predicted position and the green one for the position assigned to the fingerprint.

Fig. 8. Screenshots of the ValidateLoc application. The first image shows again the localization and asks the user if the position is correct. In this case, the user presses the "no" button. The second image shows the screen in which the real position is introduced. The last image informs the user that the validation fingerprint has successfully arrived at the server. The blue point stands for the predicted position and the red one for the position assigned to the fingerprint.

VI. DISCUSSION AND CHALLENGES

This database was initially generated for indoor localization on our University Campus. Therefore, testing localization algorithms is the first use that external researchers can make of the published dataset. New indoor localization algorithms are often proposed using private datasets that are not publicly available, as mentioned in Sections I and II. Two different algorithms can hardly be compared using only the information and the results provided by their authors. This public dataset can be used to test the accuracy of any localization algorithm based on RSSI levels or to compare localization algorithms under the same experimental framework.

However, the database can also be used for the alternative problems discussed here. For instance:

• It could be of interest to analyze how the internal structure of the building is related to the WLAN access points and, therefore, how the number and position of these points can be optimised without leaving any area out of WiFi range. This could result in important savings in terms of hardware acquisition and installation efforts. In summary, a very desirable preprocessing step in any WiFi infrastructure is to remove redundant access points while keeping complete coverage.

• The location of the WAPs can be inferred from the database provided. Although some WAPs are visible, the vast majority of them are hidden from the human eye. Commonly, the WLAN antennas are located inside restricted areas or in the ceiling. Knowing the location of the WAPs can strongly support the localization algorithm.

• Detection of low-coverage places such as the ones recorded in our dataset. On the one hand, adding new antennas can improve the localization algorithm, because those places cannot be localized by RSSI-based algorithms. On the other hand, it would improve the internet connection for users who are in those places.

• Detection of WLAN collision places where some WAPs are emitting on the same channel, and WLAN connectivity may be degraded.

• The automatic detection of removed and new WAPs may be interesting, since re-mapping could be avoided. The procedure to fully map a building requires planning, elaboration of a mapping strategy and working hours. The automatic detection may reduce the maintenance costs of fingerprint databases. In the introduced database, the validation fingerprints were taken 3 months later than the training ones, and some WAPs disappeared and new ones were introduced. Of the 520 WAPs detected in the UJIIndoorLoc database, 312 were detected in both the training and validation phases, 153 WAPs were only detected in the training phase, and 55 new WAPs appeared in the validation phase.
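The WAP counts in the last item (312 + 153 = 465 WAPs in training and 312 + 55 = 367 in validation, matching Table X) come down to set operations over the WAP columns. A minimal sketch follows, assuming each subset is an array with one column per WAP and that a sentinel value marks a WAP that was not detected; the sentinel `100` and the function names are assumptions for illustration, not part of the text above.

```python
import numpy as np

NOT_DETECTED = 100  # assumed sentinel for "WAP not detected in this capture"


def detected_waps(rssi, not_detected=NOT_DETECTED):
    """Return the set of column indices (WAPs) seen in at least one capture."""
    rssi = np.asarray(rssi)
    seen = (rssi != not_detected).any(axis=0)
    return set(np.flatnonzero(seen).tolist())


def compare_wap_sets(train_rssi, valid_rssi):
    """Split WAPs into those seen in both phases, removed ones and new ones."""
    train = detected_waps(train_rssi)
    valid = detected_waps(valid_rssi)
    return {
        "both": train & valid,     # detected in training and validation
        "removed": train - valid,  # detected only in training
        "new": valid - train,      # detected only in validation
    }
```

The sizes of the three returned sets correspond to the three counts reported for the database; a WAP that appears in "removed" may of course be merely out of range of the validation captures rather than physically removed.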
Other practical applications, related to how devices detect the WAPs, are:

• How two devices differ when obtaining the individual RSSI values at the same place. Table XII shows a summary of the RSSI values of the same WAP provided by two different devices. Although the range of possible values is similar for both devices, one of them tends to provide lower values according to the mean and median values.

• The study of anomalies in data such as the one detailed in Table XIII, where the RSSI values of the same WAP are shown for 4 different records located in the same place. One of the devices detected the highest possible value, 0, but the same device detected a very low signal strength in the same place for the same WAP. This seems to be an anomaly in the data.

• According to our records, the number of RSSI values scanned by a device depends on the environment and on the device itself. Table XIV shows an example, where the number of WAPs detected by two devices is shown. It is well known that not all devices detect the same number of WAPs. However, this database not only confirms this fact but also allows a detailed analysis of it to be performed.

TABLE XII. SUMMARY OF THE RSSI VALUES OF WAP0034 PROVIDED BY TWO DIFFERENT DEVICES. MEASURES WERE TAKEN IN FRONT OF OFFICE 120 LOCATED ON THE FIRST FLOOR OF THE TI BUILDING.

PhoneID   min       max       mean        median
13        -52 dBm   -41 dBm   -48.5 dBm   -50.5 dBm
14        -49 dBm   -37 dBm   -41.5 dBm   -39 dBm

TABLE XIII. EXTRACT OF WAP0517 RSSI VALUES. MEASURES TAKEN IN FRONT OF OFFICE 215 (3RD FLOOR OF TC BUILDING).

Record   PhoneID   UserID   WAP0517 RSSI level
628      23        2        -87 dBm
846      23        2        not detected
2643     19        6        -81 dBm
2677     19        6        0 dBm

TABLE XIV. NUMBER OF RSSI VALUES SCANNED BY TWO DIFFERENT DEVICES IN TWO SCENARIOS: 1) IN FRONT OF OFFICE 121 (1ST FLOOR OF TI BUILDING) AND 2) CONSIDERING THE WHOLE DATABASE.

Outside office 121
PhoneID   min   max   mean    median
13        12    17    14.9    15
14        9     22    19.5    21

General
PhoneID   min   max   mean    median
13        0     32    15.55   15
14        2     48    17.98   17

Finally, similarly to other public databases (for instance, the ones included in the UCI Machine Learning Repository [7]), the proposed database can be used to evaluate new Pattern Recognition methods, such as the ones related to feature selection, editing & condensation, and new classification strategies, among others.

VII. CONCLUSIONS

This paper introduces a new database for indoor localization, UJIIndoorLoc, based on a WLAN fingerprinting (RSSI levels) environment. First, the database has been described in detail, including the features used in the database, their meaning, and their value ranges. Second, the procedure and the applications used to generate the database have also been described. To address the problem of sample diversity and to obtain a realistic approach, more than 20 users participated in generating the database. Each training reference point was initially assigned to at least two different users. No suggestion or advice about capturing was given to the users. In addition, the device was always held by a human user, in contrast to other datasets in which the device was left in a place to take several samples. All the samples were collected by a human user because the human body partially blocks the radio wave communication [36]. Therefore, the samples taken for UJIIndoorLoc can be considered very realistic. Due to privacy issues, some information has been anonymized.

While the WLAN-based localization databases used in the literature tend to cover small areas [17] or one-floor buildings [14], UJIIndoorLoc covers three buildings with 4 or more floors and almost 110,000 m². Moreover, the shape and internal structure are quite different among the three buildings where the samples were collected. In addition, more than 20 people and 25 different devices have been used to generate the database, in contrast to other databases that were generated using a single device or few devices [25]. Our proposed database can also be very useful for validation and comparison purposes, since validation samples have also been provided. All these features make UJIIndoorLoc suitable for testing and benchmarking localization algorithms.

Alternatively, UJIIndoorLoc can be used for other purposes such as analysis of device accuracy, improvement of WiFi coverage, and WAP optimization (localization and distribution), among others. The proposed UJIIndoorLoc database is not only the biggest database in the literature reviewed, but also the first publicly available database that could be used to make comparisons among different methods in the field. In addition, a basic positioning system has been developed using the k-Nearest Neighbor rule in order to provide a baseline for comparison purposes.

In summary, the scarcity of publicly available localization databases (none, as far as we know) reflects the need for a common public database for research purposes such as the one presented here.

ACKNOWLEDGMENT

We would like to thank Yasmina Andreu, Óscar Belmonte, Irene Garcia-Martí, Diego Gargallo, Nadal Francisco, Josep López, Ruben Martínez, Roberto Mediero, Javier Ortells, Nacho Piqueras, Ianisse Quizán, David Rambla, Luis E. Rodríguez, Ana Sanchís, Carlos Serra and Sergi Trilles for their help in creating this database.

We also thank Javier Fernández, Ángel Ramos, Álvaro Arranz and Guillermo Amat for their collaboration and comments, as members of Percepción Project (Ministerio de Industria, Energía y Comercio - Programa Avanza2 - 2012), and for their useful conversations in this area.
REFERENCES

[1] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury, and A. T. Campbell, "A survey of mobile phone sensing," IEEE Communications Magazine, vol. 48, no. 9, pp. 140–150, 2010.
[2] R. Montoliu, J. Blom, and D. Gatica-Perez, "Discovering places of interest in everyday life from smartphone data," Multimedia Tools and Applications, vol. 62, no. 1, pp. 179–207, 2013.
[3] R. Montoliu, A. Martinez-Uso, and J. Martinez-Sotoca, "Semantic place prediction by combining smart binary classifiers," in Proceedings of the Mobile Data Challenge by Nokia Workshop, in conjunction with International Conference on Pervasive Computing (Pervasive'12), 2012.
[4] Google, "Google Scholar," 2013. [Online]. Available: [Link] [Link]/
[5] N. Marques, F. Meneses, and A. Moreira, "Combining similarity functions and majority rules for multi-building, multi-floor, wifi positioning," in Proceedings of the 3rd International Conference on Indoor Positioning and Indoor Navigation (IPIN'2012), 2012.
[6] K. Kaemarungsi and P. Krishnamurthy, "Properties of indoor received signal strength for wlan location fingerprinting," in Proceedings of the 1st Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous'04), 2004, pp. 14–23.
[7] K. Bache and M. Lichman, "UCI machine learning repository," 2013. [Online]. Available: [Link]
[8] H. Wang, S. Sen, A. Elgohary, M. Farid, M. Youssef, and R. R. Choudhury, "No need to war-drive: unsupervised indoor localization," in Proceedings of the 10th international conference on Mobile systems, applications, and services (MobiSys'12), 2012, pp. 197–210.
[9] K. Chintalapudi, A. Padmanabha Iyer, and V. N. Padmanabhan, "Indoor localization without the pain," in Proceedings of the 16th Annual International Conference on Mobile Computing and Networking (MobiCom'10), 2010, pp. 173–184.
[10] W. G. Griswold, P. Shanahan, S. W. Brown, R. T. Boyer, M. Ratto, R. B. Shapiro, and T. M. Truong, "Activecampus: Experiments in community-oriented ubiquitous computing," IEEE Computer, vol. 37, no. 10, pp. 73–81, 2004.
[11] I. Constandache, R. R. Choudhury, and I. Rhee, "Towards mobile phone localization without war-driving," in Proceedings of the 29th IEEE International Conference on Computer Communications (INFOCOM'10), 2010, pp. 2321–2329.
[12] B. Ferris, D. Fox, and N. Lawrence, "Wifi-slam using gaussian process latent variable models," in Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI'07), 2007, pp. 2480–2485.
[13] J. Lester, T. Choudhury, and G. Borriello, "A practical approach to recognizing physical activities," in Proceedings of the 4th international conference on Pervasive Computing (Pervasive'06), 2006, pp. 1–16.
[14] Y. Chen, D. Lymberopoulos, J. Liu, and B. Priyantha, "Indoor localization using fm signals," IEEE Transactions on Mobile Computing, vol. 12, no. 8, pp. 1502–1517, 2013.
[15] J. Machaj, P. Brida, and J. Benikovsky, "Optimization of rank based fingerprinting localization algorithm," in Proceedings of the 3rd International Conference on Indoor Positioning and Indoor Navigation (IPIN'12), 2012, pp. 1–7.
[16] C. Beder and M. Klepal, "Fingerprinting based localisation revisited," in Proceedings of the 3rd International Conference on Indoor Positioning and Indoor Navigation (IPIN'12), 2012, pp. 1–7.
[17] P. Bahl and V. N. Padmanabhan, "Radar: An in-building rf-based user location and tracking system," in Proceedings of the 19th IEEE International Conference on Computer Communications (INFOCOM'00), 2000, pp. 775–784.
[18] S. Yang, P. Dessai, M. Verma, and M. Gerla, "Freeloc: Calibration-free crowdsourced indoor localization," in Proceedings of the 32nd IEEE International Conference on Computer Communications (INFOCOM'13), 2013.
[19] P. Mirowski, H. Steck, P. Whiting, R. Palaniappan, M. MacDonald, and T. Kam Ho, "Kl-divergence kernel regression for non-gaussian fingerprint based localization," in Proceedings of the 2nd International Conference on Indoor Positioning and Indoor Navigation (IPIN'11), 2011, pp. 1–10.
[20] J. Machaj, P. Brida, and R. Piché, "Rank based fingerprinting algorithm for indoor positioning," in Proceedings of the 2nd International Conference on Indoor Positioning and Indoor Navigation (IPIN'11), 2011, pp. 1–6.
[21] M. Kranz, C. Fischer, and A. Schmidt, "A comparative study of DECT and WLAN signals for indoor localization," in Proceedings of the 8th Annual IEEE International Conference on Pervasive Computing and Communications (PerCom'10), 2010, pp. 235–243.
[22] V. Honkavirta, T. Perälä, S. Ali-Löytty, and R. Piché, "A comparative survey of wlan location fingerprinting methods," in Proceedings of the 6th Workshop on Positioning, Navigation and Communication (WPNC'09), 2009, pp. 243–251.
[23] M. Gunawan, B. Li, T. J. Gallagher, A. G. Dempster, and G. Retscher, "A new method to generate and maintain a wifi fingerprinting database automatically by using rfid," in Proceedings of the 3rd International Conference on Indoor Positioning and Indoor Navigation (IPIN'12), 2012, pp. 1–6.
[24] Y. Gu, A. Lo, and I. G. Niemegeers, "A survey of indoor positioning systems for wireless personal networks," IEEE Communications Surveys and Tutorials, vol. 11, no. 1, pp. 13–32, 2009.
[25] T. J. Gallagher, B. Li, A. G. Dempster, and C. Rizos, "A sector-based campus-wide indoor positioning system," in Proceedings of the 1st International Conference on Indoor Positioning and Indoor Navigation (IPIN'10), 2010, pp. 1–8.
[26] S. Brüning, J. Zapotoczky, P. Ibach, and V. Stantchev, "Cooperative positioning with magicmap," in Proceedings of the 4th Workshop on Positioning, Navigation and Communication (WPNC'07), 2007, pp. 17–22.
[27] N. Swangmuang and P. Krishnamurthy, "Location fingerprint analyses toward efficient indoor positioning," in Proceedings of the 6th IEEE International Conference on Pervasive Computing and Communications (PerCom'08), 2008, pp. 100–109.
[28] C. Nerguizian, C. L. Despins, and S. Affes, "Indoor geolocation with received signal strength fingerprinting technique and neural networks," in Proceedings of the 11th International Conference on Telecommunications (ICT'04), vol. 3124, 2004, pp. 866–875.
[29] E. Martin, O. Vinyals, G. Friedland, and R. Bajcsy, "Precise indoor localization using smart phones," in Proceedings of the international conference on Multimedia (MM'10), 2010, pp. 787–790.
[30] J. Geun Park, B. Charrow, D. Curtis, J. Battat, E. Minkov, J. Hicks, S. J. Teller, and J. Ledlie, "Growing an organic indoor location system," in Proceedings of the 8th International Conference on Mobile Systems, Applications, and Services (MobiSys'10), 2010, pp. 271–284.
[31] E. C. L. Chan, G. Baciu, and S. C. Mak, "Orientation-based wi-fi positioning on the google nexus one," in Proceedings of the 6th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob'10), 2010, pp. 392–397.
[32] J. Marti, J. Sales, and E. Marin, R. Jimenez-Ruiz, "Localization of mobile sensors and actuators for intervention in low-visibility conditions: The zigbee fingerprinting approach," International Journal of Distributed Sensor Networks, vol. 2012, p. 10, 2012.
[33] J. Marti, J. Sales, R. Marin, and P. Sanz, "Multi-sensor localization and navigation for remote manipulation in smoky areas," International Journal of Advanced Robotic Systems, vol. 10, p. 8, 2013.
[34] H. A. Karimi, Ed., Advanced Location-Based Technologies and Services. Taylor & Francis, 2013.
[35] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theor., vol. 13, no. 1, pp. 21–27, Sep. 1967.
[36] A. Natarajan, M. Motani, B. de Silva, K.-K. Yap, and K. C. Chua, "Investigating network architectures for body sensor networks," in Proceedings of the 1st ACM International Workshop on Systems and Networking Support for Healthcare and Assisted Living Environments (SIGMOBILE'07), 2007, pp. 19–24.
