
User Guide
Version 1.3 (2012/08/01)
1. Data Description
This GPS trajectory dataset was collected in the Geolife project (Microsoft Research Asia) by 182 users over a period of more than five years
(from April 2007 to August 2012). Each GPS trajectory in this dataset is represented by a sequence of time-stamped points, each of
which contains latitude, longitude and altitude. The dataset contains 17,621 trajectories with a total distance of
1,292,951 kilometers and a total duration of 50,176 hours. These trajectories were recorded by different GPS loggers and GPS
phones, and have a variety of sampling rates. 91.5 percent of the trajectories are logged in a dense representation, e.g. every 1~5
seconds or every 5~10 meters per point.
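The point and trajectory description above can be illustrated with a minimal Python sketch. It assumes an in-memory representation of our own choosing (the `GpsPoint` class and all helper names are hypothetical, not part of the dataset's file format). It uses the haversine formula as one common way to approximate the distance between consecutive points. It also treats a trajectory as "dense" when every consecutive pair of points is at most 5 seconds or at most 10 meters apart, which is only one possible reading of the sampling-rate description.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Mean Earth radius in kilometers; an assumed constant for the haversine formula.
EARTH_RADIUS_KM = 6371.0088


@dataclass
class GpsPoint:
    """One time-stamped point (hypothetical in-memory layout, not the file format)."""
    time: datetime
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees
    altitude: float   # unit not specified in this section


def haversine_km(a: GpsPoint, b: GpsPoint) -> float:
    """Great-circle distance between two points, in kilometers."""
    phi1, phi2 = math.radians(a.latitude), math.radians(b.latitude)
    dphi = phi2 - phi1
    dlam = math.radians(b.longitude - a.longitude)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against tiny floating-point overshoots above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def total_distance_km(traj: List[GpsPoint]) -> float:
    """Sum of distances between consecutive points."""
    return sum(haversine_km(p, q) for p, q in zip(traj, traj[1:]))


def duration_hours(traj: List[GpsPoint]) -> float:
    """Elapsed time between the first and last point, in hours."""
    if len(traj) < 2:
        return 0.0
    return (traj[-1].time - traj[0].time).total_seconds() / 3600.0


def is_dense(traj: List[GpsPoint], max_seconds: float = 5.0, max_meters: float = 10.0) -> bool:
    """Assumed density test: every gap is within max_seconds OR within max_meters."""
    for p, q in zip(traj, traj[1:]):
        gap_s = (q.time - p.time).total_seconds()
        gap_m = haversine_km(p, q) * 1000.0
        if gap_s > max_seconds and gap_m > max_meters:
            return False
    return True


if __name__ == "__main__":
    start = datetime(2008, 10, 23, 2, 53, 4)
    # Three hypothetical points sampled 5 seconds apart.
    traj = [GpsPoint(start + timedelta(seconds=5 * i), 39.9840 + 0.0001 * i, 116.3180, 150.0)
            for i in range(3)]
    print(total_distance_km(traj), duration_hours(traj), is_dense(traj))
```

Summing per-segment distances in this way is how a per-trajectory total distance, and hence a dataset-wide total, could be accumulated; the guide does not state which distance formula was actually used.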
This dataset records a broad range of users’ outdoor movements, including not only daily routines such as going home and going to work
but also entertainment and sports activities, such as shopping, sightseeing, dining, hiking, and cycling. This trajectory
dataset can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social
networks, location privacy, and location recommendation.
Although this dataset is widely distributed across over 30 cities in China and even some cities in the USA and Europe,
the majority of the data was created in Beijing, China. Figure 1 plots the distribution (heat map) of this dataset in Beijing. The
numbers on the right side of the heat bar denote the number of points generated in a location.
Figure 1 Distribution of the dataset in Beijing city (A: data overview in Beijing; B: within the 5th Ring Road of Beijing)
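A heat map like the one in Figure 1 is commonly built by counting points per spatial cell. The sketch below assumes a uniform latitude/longitude grid; the 0.01-degree cell size and the `grid_counts` name are hypothetical choices for illustration, not the procedure used to draw Figure 1.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def grid_counts(points: Iterable[Tuple[float, float]], cell_deg: float = 0.01) -> Counter:
    """Count (latitude, longitude) points per grid cell of size cell_deg degrees.

    The cell size is a hypothetical parameter; each cell is keyed by the
    integer indices of its south-west corner.
    """
    counts: Counter = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample points near central Beijing.
    sample = [(39.9012, 116.4013), (39.9049, 116.4049), (39.9950, 116.3100)]
    for cell, n in grid_counts(sample).most_common():
        print(cell, n)
```

The per-cell counts are what a heat bar would map to colors; a smaller cell size gives finer detail at the cost of sparser counts.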
The distributions of distance and duration of the trajectories are presented in Figure 2 and Figure 3.
In the data collection program, some users carried a GPS logger for years, while others only have trajectory data covering a
few weeks. This distribution is presented in Figure 4, and the distribution of the number of trajectories collected by each user
is shown in Figure 5.
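Per-user statistics such as those in Figures 4 and 5 can be derived from one record per trajectory. The sketch below assumes a hypothetical record layout of `(user_id, start_time, end_time)` and reports, for each user, the number of trajectories and the number of whole days between the start of the user's first trajectory and the end of the last one; this span definition is an assumption, not necessarily the one used for Figure 4.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical record: (user_id, trajectory start time, trajectory end time).
Record = Tuple[str, datetime, datetime]


def per_user_summary(records: Iterable[Record]) -> Dict[str, Tuple[int, int]]:
    """Map each user to (number of trajectories, collection span in whole days)."""
    counts: Dict[str, int] = defaultdict(int)
    first: Dict[str, datetime] = {}
    last: Dict[str, datetime] = {}
    for user, start, end in records:
        counts[user] += 1
        first[user] = min(first.get(user, start), start)
        last[user] = max(last.get(user, end), end)
    return {user: (counts[user], (last[user] - first[user]).days) for user in counts}


if __name__ == "__main__":
    demo = [
        ("000", datetime(2008, 1, 1, 8), datetime(2008, 1, 1, 9)),
        ("000", datetime(2008, 1, 31, 8), datetime(2008, 1, 31, 9)),
        ("001", datetime(2009, 5, 1, 10), datetime(2009, 5, 1, 11)),
    ]
    print(per_user_summary(demo))
```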
Figure 2 Distribution of trajectories by distance
< 5km: 36%; 5km ~ 20km: 36%; 20km ~ 100km: 23%; ≥ 100km: 5%
Figure 3 Distribution of trajectories by effective duration
< 1h: 58%; 1h ~ 6h: 26%; 6h ~ 12h: 11%; ≥ 12h: 5%
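The category shares in Figures 2 and 3 amount to binning each trajectory's distance and duration into four ranges. The following sketch assumes ranges that are closed below and open above (for example, a trajectory of exactly 5 km falls into "5km ~ 20km"), since the figures do not state how boundary values are assigned; `bucket` and `share_by_bucket` are hypothetical helper names.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List, Sequence

# Range boundaries and labels taken from Figures 2 and 3.
DISTANCE_EDGES_KM = [5, 20, 100]
DISTANCE_LABELS = ["< 5km", "5km ~ 20km", "20km ~ 100km", "≥ 100km"]
DURATION_EDGES_H = [1, 6, 12]
DURATION_LABELS = ["< 1h", "1h ~ 6h", "6h ~ 12h", "≥ 12h"]


def bucket(value: float, edges: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label of the range containing value (closed below, open above)."""
    return labels[bisect_right(edges, value)]


def share_by_bucket(values: List[float], edges: Sequence[float],
                    labels: Sequence[str]) -> Dict[str, float]:
    """Percentage of values falling into each labeled range."""
    if not values:
        return {label: 0.0 for label in labels}
    counts = Counter(bucket(v, edges, labels) for v in values)
    return {label: 100.0 * counts[label] / len(values) for label in labels}


if __name__ == "__main__":
    # Hypothetical trajectory distances in kilometers.
    distances = [1.2, 3.5, 12.0, 48.0]
    print(share_by_bucket(distances, DISTANCE_EDGES_KM, DISTANCE_LABELS))
```

Using `bisect_right` on the sorted boundary list keeps the lookup logarithmic and makes the boundary convention explicit in one place.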