Activity Recognition From Accelerometer Data
Nishkam Ravi and Nikhil Dandekar and Preetham Mysore and Michael L. Littman
Department of Computer Science
Rutgers University
Piscataway, NJ 08854
{nravi,nikhild,preetham,mlittman}@cs.rutgers.edu
A Bluetooth library was used for programming Bluetooth. The data was then converted to ASCII format using a Python script.

We collected data for a set of eight activities:
• Standing
• Walking
• Running
• Climbing up stairs
• Climbing down stairs
• Sit-ups
• Vacuuming
• Brushing teeth.

The activities were performed by two subjects in multiple rounds over different days. No noise filtering was carried out on the data.

Label generation is semi-automatic. As the users performed activities, they were timed with a stopwatch. The time values were then fed into a Perl script, which labeled the data: acceleration data collected between the start and stop times of an activity was labeled with the name of that activity. Since the subject is probably standing still or sitting while recording the start and stop times, the activity labels around these times may not correspond to the actual activity performed. Figure 2 shows the lifecycle of the data. To minimize mislabeling, data within 10 s of the start and stop times was discarded. Figure 3 shows the x-axis readings of the accelerometer for various activities.
Figure 3: X-axis readings for different activities
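The timestamp-based labeling described above is simple to script. The following is a minimal sketch in Python for illustration only (the paper's own script was written in Perl); the sample layout, the interval format, and the function name are assumptions, not details from the paper.

# Minimal sketch of the semi-automatic labeling step (illustrative only).
# Samples are assumed to be (time, x, y, z) tuples, and each activity is
# assumed to be recorded from the stopwatch as (name, start_time, stop_time).

GUARD = 10.0  # seconds discarded around each start/stop time to limit mislabeling

def label_samples(samples, intervals):
    """Attach an activity label to each sample, dropping boundary data."""
    labeled = []
    for t, x, y, z in samples:
        for name, start, stop in intervals:
            if start + GUARD <= t <= stop - GUARD:  # keep only samples well inside
                labeled.append((t, x, y, z, name))
                break
    return labeled

if __name__ == "__main__":
    samples = [(i / 50.0, 0.1, 0.0, 9.8) for i in range(3000)]    # 60 s of fake 50 Hz data
    intervals = [("walking", 0.0, 30.0), ("standing", 30.0, 60.0)]
    print(len(label_samples(samples, intervals)))                 # samples that kept a label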
Feature extraction

Features were extracted from the raw accelerometer data using a window size of 256 samples, with 128 samples overlapping between consecutive windows. Feature extraction on windows with 50% overlap has demonstrated success in previous work (Bao & Intille 2004). At a sampling frequency of 50 Hz, each window represents 5.12 seconds of data. A window of several seconds can sufficiently capture cycles in activities such as walking, running, and climbing stairs. Furthermore, a window size of 256 samples enables fast computation of the FFTs used for one of the features.

Four features were extracted from each of the three axes of the accelerometer, giving a total of twelve attributes. The features extracted were:
• Mean
• Standard Deviation
• Energy
• Correlation.

The usefulness of these features has been demonstrated in prior work (Bao & Intille 2004). The DC component of the signal over the window is the mean acceleration value. Standard deviation was used to capture the fact that the range of possible acceleration values differs for different activities such as walking and running.

The periodicity in the data is reflected in the frequency domain. To capture it, the energy feature was calculated. Energy is the sum of the squared magnitudes of the discrete FFT components of the signal, divided by the window length for normalization. If x_1, x_2, ..., x_{|w|} are the FFT components of the window, then

Energy = \frac{\sum_{i=1}^{|w|} |x_i|^2}{|w|}.

Correlation is calculated between each pair of axes as the ratio of the covariance to the product of the standard deviations:

corr(x, y) = \frac{cov(x, y)}{\sigma_x \sigma_y}.

Correlation is especially useful for differentiating among activities that involve translation in just one dimension. For example, we can differentiate walking and running from stair climbing using correlation: walking and running usually involve translation in one dimension, whereas climbing involves translation in more than one dimension.
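The twelve attributes described above (mean, standard deviation, energy, and pairwise correlation over 256-sample windows with 50% overlap) can be computed with a few lines of NumPy. The sketch below follows the formulas in the text; the function and variable names are illustrative and not taken from the paper's scripts.

import numpy as np

WINDOW = 256   # samples per window (5.12 s at 50 Hz)
STEP = 128     # 50% overlap between consecutive windows

def window_features(data):
    """data: (N, 3) array of x, y, z accelerations -> list of 12-attribute vectors."""
    features = []
    for start in range(0, len(data) - WINDOW + 1, STEP):
        w = data[start:start + WINDOW]                       # one window, shape (256, 3)
        mean = w.mean(axis=0)                                # mean per axis
        std = w.std(axis=0)                                  # standard deviation per axis
        fft = np.fft.fft(w, axis=0)                          # discrete FFT per axis
        energy = (np.abs(fft) ** 2).sum(axis=0) / WINDOW     # normalized energy per axis
        corr = [np.corrcoef(w[:, i], w[:, j])[0, 1]          # cov(x, y) / (sigma_x sigma_y)
                for i, j in ((0, 1), (0, 2), (1, 2))]
        features.append(np.concatenate([mean, std, energy, corr]))
    return features

Each returned vector corresponds to one 5.12-second window and forms a single instance for the classifiers described in the next section.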
Data Interpretation

The activity recognition algorithm should be able to recognize the accelerometer signal pattern corresponding to every activity. Figure 3 shows the x-axis readings for the different activities; it is easy to see that every activity has a distinct pattern. We formulate activity recognition as a classification problem in which classes correspond to activities and a test data instance is a set of acceleration values collected over a time interval and post-processed into a single instance of {mean, standard deviation, energy, correlation}. We evaluated the performance of the following base-level classifiers, available in the Weka toolkit (an illustrative sketch follows the list):
• Decision Tables
• Decision Trees (C4.5)
• K-nearest neighbors
• SVM
• Naive Bayes.
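The experiments in this paper were run with the Weka implementations of these classifiers. Purely as an illustrative sketch of the classification setup, and not the authors' code, comparable base-level classifiers can be trained and cross-validated on the 12-attribute windows with scikit-learn; the specific estimators, the k = 5 neighbors, and the 10 cross-validation folds below are assumptions, and Weka's Decision Table has no direct analogue here and is omitted.

# Illustrative only: rough scikit-learn analogues of the base-level classifiers;
# the paper's experiments used the Weka implementations instead.
from sklearn.tree import DecisionTreeClassifier      # ~ C4.5 decision tree
from sklearn.neighbors import KNeighborsClassifier   # k-nearest neighbors
from sklearn.svm import SVC                          # support vector machine
from sklearn.naive_bayes import GaussianNB           # Naive Bayes
from sklearn.model_selection import cross_val_score

def evaluate_base_classifiers(X, y):
    """X: (n_windows, 12) feature matrix, y: activity labels."""
    classifiers = {
        "Decision Tree": DecisionTreeClassifier(),
        "kNN": KNeighborsClassifier(n_neighbors=5),   # k = 5 is an assumption
        "SVM": SVC(),
        "Naive Bayes": GaussianNB(),
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=10)    # fold count is an assumption
        print(f"{name}: {scores.mean():.2%}")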
We also evaluated the performance of some state-of-the-art meta-level classifiers. Although the overall performance of meta-level classifiers is known to be better than that of base-level classifiers, base-level classifiers are known to outperform meta-level classifiers on several data sets. One of the goals of this work was to find out whether combining classifiers is indeed the right thing to do for activity recognition from accelerometer data, which, to the best of our knowledge, has not been studied earlier.

Meta-level classifiers can be clustered into three frameworks: voting (used in bagging and boosting), stacking (Wolpert 1992; Dzeroski & Zenko 2004), and cascading (Gama & Brazdil 2000). In voting, each base-level classifier gives a vote for its prediction, and the class receiving the most votes is the final prediction. In stacking, a learning algorithm is used to learn how to combine the predictions of the base-level classifiers; the induced meta-level classifier is then used to obtain the final prediction from the predictions of the base-level classifiers. The state-of-the-art methods in stacking are stacking with class probability distributions using Meta Decision Trees (MDTs) (Todorovski & Dzeroski 2003), stacking with class probability distributions using Ordinary Decision Trees (ODTs) (Todorovski & Dzeroski 2003), and stacking using multi-response linear regression (Seewald 2002). Cascading is an iterative process of combining classifiers: at each iteration, the training data set is extended with the predictions obtained in the previous iteration. Cascading in general gives sub-optimal results compared to the other two schemes.

To have a near-exhaustive set of classifiers, we chose the following meta-level classifiers: Boosting, Bagging, Plurality Voting, Stacking with Ordinary Decision Trees (ODTs), and Stacking with Meta Decision Trees (MDTs).
• Boosting (Meir & Ratsch 2003) is used to improve the classification accuracy of any given base-level classifier. Boosting applies a single learning algorithm repeatedly and combines the hypotheses learned each time (using voting), such that the final classification accuracy is improved. It does so by assigning a certain weight to each example in the training set, and then modifying the weight after each iteration depending on whether the example was correctly or incorrectly classified by the current hypothesis. The final hypothesis learned can thus be given as

f(x) = \sum_{t=1}^{T} \alpha_t h_t(x),

where \alpha_t denotes the coefficient with which the hypothesis h_t is combined. Both \alpha_t and h_t are learned during the boosting procedure. (Boosting is available in the Weka toolkit.)

• Bagging (Breiman 1996) is another simple meta-level classifier that uses just one base-level classifier at a time. It works by training each classifier on a random redistribution of the training set: each classifier's training set is generated by randomly drawing, with replacement, N instances from the original training set, where N is the size of the original training set itself. Many of the original examples may be repeated in the resulting training set while others may be left out. The final bagged estimator h_{bag}(\cdot) is the expected value of the prediction over the trained hypotheses. If h_k(\cdot) is the hypothesis learned for training sample k,

h_{bag}(\cdot) = \frac{1}{M} \sum_{k=1}^{M} h_k(\cdot).

(A generic sketch of both combination rules is given below.)
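The two combination rules above are straightforward to express directly. The following generic sketch illustrates the weighted vote used by boosting and the uniform averaging used by bagging (which, for class labels, reduces to a plurality vote); it is not the Weka implementation used in the paper, and the trained hypotheses and weights are assumed to come from an earlier training step.

from collections import defaultdict

def boosted_predict(hypotheses, alphas, x):
    """Weighted vote f(x) = sum_t alpha_t * h_t(x), resolved per class label."""
    votes = defaultdict(float)
    for h, alpha in zip(hypotheses, alphas):
        votes[h(x)] += alpha               # each hypothesis votes with weight alpha_t
    return max(votes, key=votes.get)       # class with the largest total weight

def bagged_predict(hypotheses, x):
    """Uniform weights: every bootstrap-trained hypothesis counts equally."""
    return boosted_predict(hypotheses, [1.0] * len(hypotheses), x)

if __name__ == "__main__":
    # Toy hypotheses standing in for classifiers trained in earlier iterations.
    hs = [lambda x: "walking", lambda x: "running", lambda x: "walking"]
    print(boosted_predict(hs, [0.5, 0.7, 0.4], [0.1] * 12))   # -> walking (0.9 vs 0.7)
    print(bagged_predict(hs, [0.1] * 12))                     # -> walking (2 votes vs 1)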
All the above meta-level classifiers, except MDTs, are available in the Weka toolkit. We downloaded the source code for MDTs and compiled it with Weka.

Alternate approaches to activity recognition include the use of Hidden Markov Models (HMMs) or regression. HMMs would be useful for recognizing a sequence of activities in order to model human behavior; in this paper, we concentrate on recognizing a single activity. Regression is normally used when a real-valued output is desired; otherwise, classification is a natural choice. Signal processing can be helpful in automatically extracting features from raw data. Signal processing, however, is computationally expensive and not very suitable for resource-constrained, battery-powered devices.

Results

All the base-level and meta-level classifiers mentioned above were run on data sets in four different settings:

Setting 1: Data collected for a single subject over different days, mixed together and cross-validated.

Setting 2: Data collected for multiple subjects over different days, mixed together and cross-validated.

Setting 3: Data collected for a single subject on one day used as training data, and data collected for the same subject on another day used as testing data.

Setting 4: Data collected for one subject on one day used as training data, and data collected for another subject on another day used as testing data.
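The four settings differ only in how the labeled windows are split into training and testing data. The sketch below shows one way such splits could be set up; the per-window subject/day bookkeeping and the use of scikit-learn here are assumptions made for illustration, not the paper's Weka procedure.

from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# X: (n, 12) NumPy feature matrix; y: activity labels;
# subject, day: per-window metadata arrays (assumed bookkeeping, not from the paper).

def mixed_and_cross_validated(X, y):
    """Settings 1 and 2: mix all windows together and cross-validate."""
    return cross_val_score(GaussianNB(), X, y, cv=10).mean()   # fold count is an assumption

def train_on_one_test_on_other(X, y, train_mask, test_mask):
    """Settings 3 and 4: train on one day (or subject) and test on another."""
    clf = GaussianNB().fit(X[train_mask], y[train_mask])
    return clf.score(X[test_mask], y[test_mask])

# Setting 3 (same subject, different days):
#   train_mask = (subject == "s1") & (day == 1)
#   test_mask  = (subject == "s1") & (day == 2)
# Setting 4 (different subjects, different days):
#   train_mask = (subject == "s1") & (day == 1)
#   test_mask  = (subject == "s2") & (day == 2)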
Classifier accuracies (%) in the four settings:

Classifier              Setting 1   Setting 2   Setting 3   Setting 4
Naive Bayes (NB)            98.86       96.69       89.96       64.00
Boosted NB                  98.86       98.71       89.96       64.00
Bagged NB                   98.58       96.88       90.39       59.33
SVM                         98.15       98.16       68.78       63.00
Boosted SVM                 99.43       98.16       67.90       73.33
Boosted kNN                 99.15       99.26       72.93       49.67
Bagged kNN                  99.15       99.26       70.52       46.67
Decision Table (DT)         92.45       91.91       55.68       46.33
Boosted DT                  97.86       98.53       55.68       46.33
Bagged DT                   93.30       94.85       55.90       46.67
Decision Tree (DTr)         97.29       98.53       77.95       57.00

[Figure: plot of classifier accuracy (60-100%) for the base-level and meta-level classifiers, including Naive Bayes, kNN, SVM, Decision Table, and Decision Tree, their boosted and bagged variants, Plurality Voting, and Stacking (MDTs).]
Table 3: Effect of dropping an attribute on classification accuracy

Attribute                   Average no. of misclassifications
Drop None                   14.05
Drop Mean                   21.83
Drop Standard Deviation     32.44
Drop Energy                 14.72
Drop Correlation            28.38
Acknowledgments

Our sincere thanks to Amit Gaur and Muthu Muthukrishnan for lending us the accelerometer.
References

Bao, L., and Intille, S. S. 2004. Activity recognition from user-annotated acceleration data. In Proceedings of the 2nd International Conference on Pervasive Computing, 1–17.

Breiman, L. 1996. Bagging predictors. Machine Learning 123–140.

Bussmann, J.; Martens, W.; Tulen, J.; Schasfoort, F.; van den Berg-Emons, H.; and Stam, H. J. 2001. Measuring daily behavior using ambulatory accelerometry: the activity monitor. Behavior Research Methods, Instruments, and Computers 349–356.

DeVaul, R., and Dunn, S. 2001. Real-Time Motion Classification for Wearable Computing Applications. Technical report, MIT Media Laboratory.

Dzeroski, S., and Zenko, B. 2004. Is combining classifiers with stacking better than selecting the best one? Machine Learning 255–273.

Foerster, F.; Smeja, M.; and Fahrenberg, J. 1999. Detection of posture and motion by accelerometry: a validation in ambulatory monitoring. Computers in Human Behavior 571–583.

Freund, Y., and Schapire, R. E. 1996. Experiments with a new boosting algorithm. In International Conference on Machine Learning, 148–156.

Gama, J., and Brazdil, P. 2000. Cascade generalization. Machine Learning 315–343.

Harter, A., and Hopper, A. 1994. A distributed location system for the active office. IEEE Network 8(1).

Lee, S., and Mase, K. 2002. Activity and location recognition using wearable sensors. IEEE Pervasive Computing 24–32.

Makikawa, M.; Kurata, S.; Higa, Y.; Araki, Y.; and Tokue, R. 2001. Ambulatory monitoring of behavior in daily life by accelerometers set at both-near-sides of the joint. In Proceedings of MedInfo, 840–843.

Meir, R., and Ratsch, G. 2003. An introduction to boosting and leveraging. 118–183.

Priyantha, N. B.; Chakraborty, A.; and Balakrishnan, H. 2000. The Cricket location-support system. In Mobile Computing and Networking, 32–43.

Randell, C., and Muller, H. 2000. Context awareness by analysing accelerometer data. In MacIntyre, B., and Iannucci, B., eds., The Fourth International Symposium on Wearable Computers, 175–176. IEEE Computer Society.

Seewald, A. K. 2002. How to make stacking better and faster while also taking care of an unknown weakness. In Proceedings of the Nineteenth International Conference on Machine Learning, 554–561. Morgan Kaufmann Publishers Inc.

Todorovski, L., and Dzeroski, S. 2003. Combining classifiers with meta decision trees. Machine Learning 223–249.

Want, R.; Hopper, A.; Falcao, V.; and Gibbons, J. 1992. The Active Badge location system. Technical Report 92.1, ORL, 24a Trumpington Street, Cambridge CB2 1QA.

Witten, I., and Frank, E. 1999. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.

Wolpert, D. H. 1992. Stacked generalization. Neural Networks 241–259.