Theory and Practice of Log Analysis
Agenda
• Introduction
• Automated log analysis using state machines,
work by James Andrews
• Break
• Practical implementation and its difficulties
• Connection to other research fields
• Questions and Discussion
Why analyze logs?
cout << "Message " << i << " received" << endl;
[Link]().logMessage("Message %d received", i);
LOG_MESSAGE("Warning", "Message %d received", i)
Who uses logs?
Programmers
Testers
Customers
Analyze that
Wed Jun 23 2004 [Link].645549 0 infrastructure P2P Communication 10
MEC->PROMPT: (GET_BUFFER : sid=15653,lid=10)
Wed Jun 23 2004 [Link].646357 0 infrastructure P2P Communication 10
PROMPT->MEC: (BUFFER_AVAILABLE : sid=15653,lid=10)
Wed Jun 23 2004 [Link].646939 0 infrastructure P2P Communication 10
MEC->PMI: (BUFFER_AVAILABLE : sid=15653,lid=10)
Wed Jun 23 2004 [Link].689819 2 PMI_GW PMIStreamDataVoicePlay Communication 10
PMI_GW->PMI_GW: (type=GET_ANOTHER_BUF,sid=15653,lid=10)
Wed Jun 23 2004 [Link].690896 0 infrastructure P2P Communication 10
PMI->MEC: (GET_BUFFER : sid=15653,lid=10)
Wed Jun 23 2004 [Link].691655 0 infrastructure P2P Communication 10
MEC->PROMPT: (GET_BUFFER : sid=15653,lid=10)
Wed Jun 23 2004 [Link].692457 0 infrastructure P2P Communication 10
PROMPT->MEC: (BUFFER_AVAILABLE : sid=15653,lid=10)
Wed Jun 23 2004 [Link].693035 0 infrastructure P2P Communication 10
MEC->PMI: (BUFFER_AVAILABLE : sid=15653,lid=10)
Wed Jun 23 2004 [Link].530243 2 PMI_GW PMIStreamDataVoicePlay Communication 10
PMI_GW->PMI_GW: (type=GET_ANOTHER_BUF,sid=15653,lid=10)
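A first step toward analyzing a log like this one is to turn each message line into a structured record. The following is a minimal sketch in Python, not a description of the real log format: the regular expression is inferred only from the sample lines on this slide, and the names `Message`, `MESSAGE_RE` and `parse_message` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Message(NamedTuple):
    sender: str
    receiver: str
    kind: str
    sid: int
    lid: int


# Pattern guessed from the sample: "A->B: (KIND : sid=N,lid=N)" or
# "A->B: (type=KIND,sid=N,lid=N)". Real logs may contain other shapes.
MESSAGE_RE = re.compile(
    r"(?P<sender>\w+)->(?P<receiver>\w+):\s*"
    r"\((?:type=)?(?P<kind>\w+)\s*[:,]\s*"
    r"sid=(?P<sid>\d+),lid=(?P<lid>\d+)\)"
)


def parse_message(line: str) -> Optional[Message]:
    """Return the message found in a log line, or None for other lines."""
    match = MESSAGE_RE.search(line)
    if match is None:
        return None
    return Message(
        sender=match["sender"],
        receiver=match["receiver"],
        kind=match["kind"],
        sid=int(match["sid"]),
        lid=int(match["lid"]),
    )
```

Header lines (date, component, level) carry no `A->B` part, so `parse_message` returns `None` for them and a caller can simply skip those lines.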
It almost sounds trivial
• At first glance, a quick web search should
provide the answer
• In fact, almost no material is available on the
subject
There is one exception
• Analyzers for web server logs are abundant
• Web server logs are standardized
• Questions are common and known
• There are also log monitors
James H. Andrews
• Professor in the CS department of the
University of Western Ontario
• Lives in London… Ontario,
Canada
• Wrote a paper called “Theory
and Practice of Log Analysis” in
1998
“What we seem to need is log file analyzers which are conceived
and expressed as a set of grammars, or state machines, running
in parallel on the reports in the log file”
Log File Characteristics
• Log file is a distinct output file of ASCII text
(and in English)
• On startup the log file is empty or contains
information from previous runs. The system
only appends lines to the log, never changing
previously written text
• Each "line" reports on some specific event
• "The information reported on is the information
that programmers feel will be useful in
monitoring the system and / or locating faults"
- an exact quote
Definitions
R – set of report elements
K – set of keywords, K ⊆ R
Report – a finite sequence of report elements
beginning with a keyword.
Report trace – a finite or infinite sequence of
reports.
Note: in order to move from the notion of report trace to log
file, a portrayal function is defined
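To make these definitions concrete, here is a minimal sketch in Python of one possible portrayal function and its inverse. It assumes, purely for illustration, that each report is written on its own line with its elements separated by spaces. The keyword set is borrowed from the heater example later in these slides, and the names `portray` and `parse` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

# A report: a finite sequence of report elements beginning with a keyword.
Report = Tuple[str, ...]

# Assumed keyword set K (taken from the example log in these slides).
KEYWORDS: FrozenSet[str] = frozenset({"temp", "heater", "malloc", "free"})


def portray(trace: List[Report]) -> str:
    """Map a finite report trace to log file text, one report per line."""
    return "".join(" ".join(report) + "\n" for report in trace)


def parse(log_text: str, keywords: FrozenSet[str] = KEYWORDS) -> List[Report]:
    """Recover the report trace from log text, checking the keyword rule."""
    trace: List[Report] = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        elements = tuple(line.split())
        if not elements or elements[0] not in keywords:
            raise ValueError(
                f"line {number} does not begin with a keyword: {line!r}"
            )
        trace.append(elements)
    return trace
```

With this choice, `parse(portray(trace))` returns the original trace as long as no report element contains whitespace.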
A (Log File) Machine
1. An identifying name
2. A countable set Q of machine states
3. A distinguished initial state i ∈ Q
4. A set F ⊆ Q of final states
5. A countable set N ⊆ R of reports which the machine notices
6. A transition relation, a subset of Q × N × Q; a triple (s1, r, s2) in it represents
the fact that if the machine is in state s1 and receives report r, it can make a
transition to state s2
A (log file) analyzer is defined as a countable set of machines
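The six components map naturally onto a small record type. The sketch below is one possible encoding in Python, under the simplifying assumption that Q, N and the transition relation are finite sets (the definition only requires them to be countable). The `heater` machine and all field names are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set, Tuple

Report = Tuple[str, ...]  # a keyword followed by further report elements


@dataclass(frozen=True)
class Machine:
    name: str                                        # 1. identifying name
    states: FrozenSet[str]                           # 2. Q
    initial: str                                     # 3. i, a member of Q
    final: FrozenSet[str]                            # 4. F, a subset of Q
    noticed: FrozenSet[Report]                       # 5. N
    transitions: FrozenSet[Tuple[str, Report, str]]  # 6. subset of Q x N x Q

    def __post_init__(self) -> None:
        # Reject records that violate the definition.
        if self.initial not in self.states:
            raise ValueError("the initial state must belong to Q")
        if not self.final <= self.states:
            raise ValueError("F must be a subset of Q")
        for s1, report, s2 in self.transitions:
            if s1 not in self.states or s2 not in self.states:
                raise ValueError("transitions must connect states in Q")
            if report not in self.noticed:
                raise ValueError("transitions must be on noticed reports")

    def successors(self, state: str, report: Report) -> Set[str]:
        """States reachable from `state` on `report` (a relation, so 0..n)."""
        return {
            s2 for s1, r, s2 in self.transitions if s1 == state and r == report
        }


# An analyzer is simply a collection of machines.
Analyzer = Tuple[Machine, ...]

ON: Report = ("heater", "on")
OFF: Report = ("heater", "off")
heater = Machine(
    name="heater",
    states=frozenset({"off", "on"}),
    initial="off",
    final=frozenset({"off"}),
    noticed=frozenset({ON, OFF}),
    transitions=frozenset({("off", ON, "on"), ("on", OFF, "off")}),
)
```

Because item 6 is a relation rather than a function, `successors` returns a set: it may be empty (no transition possible) or contain several states (a nondeterministic machine).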
How it works
A log analyzer accepts a given report trace if the
reports in the trace cause every machine in the
analyzer to move through transitions beginning
at its initial state and, if the report trace is
finite, ending at a final state.
How it works
The report trace is rejected by the analyzer if:
• A report in the trace is not noticed by any of the
analyzer’s machines
• A report is noticed by some machine but that
machine cannot make a transition on it
• One of the machines is not in a final state at
the end of the report trace
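The acceptance and rejection rules can be sketched as a single loop over the trace. This is a minimal illustration in Python, assuming finite machines and finite traces. The names `Machine`, `analyze` and the sample `heater` machine are hypothetical and not taken from any existing tool.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Sequence, Set, Tuple

Report = Tuple[str, ...]


class Machine(NamedTuple):
    name: str
    initial: str
    final: FrozenSet[str]
    noticed: FrozenSet[Report]
    # (state, report) -> possible next states; a missing key means "stuck"
    delta: Dict[Tuple[str, Report], FrozenSet[str]]


def analyze(machines: Sequence[Machine], trace: List[Report]) -> Tuple[bool, str]:
    """Return (accepted, reason) for a finite report trace."""
    # Track every state each machine could be in (machines may be
    # nondeterministic, since transitions form a relation).
    current: Dict[str, Set[str]] = {m.name: {m.initial} for m in machines}
    for index, report in enumerate(trace, start=1):
        noticing = [m for m in machines if report in m.noticed]
        if not noticing:
            return False, f"report {index} {report} is not noticed by any machine"
        for m in noticing:
            reachable: Set[str] = set()
            for state in current[m.name]:
                reachable |= m.delta.get((state, report), frozenset())
            if not reachable:
                return False, (
                    f"machine {m.name!r} cannot make a transition "
                    f"on report {index} {report}"
                )
            current[m.name] = reachable
    for m in machines:
        if not current[m.name] & m.final:
            return False, f"machine {m.name!r} is not in a final state at the end"
    return True, "accepted"


ON: Report = ("heater", "on")
OFF: Report = ("heater", "off")
heater = Machine(
    name="heater",
    initial="off",
    final=frozenset({"off"}),
    noticed=frozenset({ON, OFF}),
    delta={("off", ON): frozenset({"on"}), ("on", OFF): frozenset({"off"})},
)
```

Machines that do not notice a report simply stay where they are, which is what lets several small machines run "in parallel" over the same log.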
An example
• temp 21
• malloc 2096
• temp 19
• malloc 2088
• malloc 1016
• heater on
• temp 19
• temp 21
• free 2088
• heater off
• temp 21
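One way to check the log above is with three small machines running side by side. The sketch below is an illustration in Python, not the analyzer from the talk. The rules are assumptions: the heater must alternate on/off and end off, an address may be freed only while allocated, any integer temperature is accepted, and blocks may remain allocated at the end. The memory machine's state is the set of allocated addresses, which shows why the state set is countable rather than finite.

```python
from typing import List, Optional, Tuple

Report = Tuple[str, ...]

LOG = """\
temp 21
malloc 2096
temp 19
malloc 2088
malloc 1016
heater on
temp 19
temp 21
free 2088
heater off
temp 21
"""


class TempMachine:
    name = "temp"

    def notices(self, report: Report) -> bool:
        return report[:1] == ("temp",)

    def step(self, report: Report) -> bool:
        # Assumed rule: any integer reading is fine.
        return len(report) == 2 and report[1].lstrip("-").isdigit()

    def in_final_state(self) -> bool:
        return True


class HeaterMachine:
    name = "heater"

    def __init__(self) -> None:
        self.on = False

    def notices(self, report: Report) -> bool:
        return report[:1] == ("heater",)

    def step(self, report: Report) -> bool:
        if len(report) != 2 or report[1] not in ("on", "off"):
            return False
        turning_on = report[1] == "on"
        if turning_on == self.on:      # "on" while on, or "off" while off
            return False
        self.on = turning_on
        return True

    def in_final_state(self) -> bool:
        return not self.on


class MemoryMachine:
    name = "memory"

    def __init__(self) -> None:
        self.allocated = set()         # the state: unbounded, but countable

    def notices(self, report: Report) -> bool:
        return report[:1] in (("malloc",), ("free",))

    def step(self, report: Report) -> bool:
        if len(report) != 2:
            return False
        keyword, address = report
        if keyword == "malloc":
            if address in self.allocated:
                return False
            self.allocated.add(address)
            return True
        if address not in self.allocated:
            return False
        self.allocated.remove(address)
        return True

    def in_final_state(self) -> bool:
        return True                    # assumed: leftover blocks are allowed


def check(log_text: str) -> Optional[str]:
    """Return None if the log is accepted, else the reason for rejection."""
    machines = [TempMachine(), HeaterMachine(), MemoryMachine()]
    trace: List[Report] = [tuple(l.split()) for l in log_text.splitlines() if l.strip()]
    for number, report in enumerate(trace, start=1):
        noticing = [m for m in machines if m.notices(report)]
        if not noticing:
            return f"line {number}: {' '.join(report)!r} is not noticed by any machine"
        for m in noticing:
            if not m.step(report):
                return f"line {number}: machine {m.name!r} cannot make a transition"
    for m in machines:
        if not m.in_final_state():
            return f"machine {m.name!r} is not in a final state"
    return None
```

Under these assumed rules `check(LOG)` accepts the log, while appending a second `free 2088` or a second `heater off` makes the corresponding machine reject it.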
An example
[Figure: analyzer for the example log, written in LFAL]
Before we move forward…
BREAK
Scaling up
Program has performed an illegal
operation and will be closed
Программа выполнила недопустимую
операцию и будет закрыта
(the same message, localized to Russian)
Scaling up
Calculator
pressed 1
pressed 2
pressed +
pressed 5
pressed =
display 17
pressed *
pressed 2
pressed =
display 34
Context
sent packet 14643 received packet 78764
sent packet 78764 received packet 14643
sent packet 23425 received packet 23425
. .
. .
. .
Context
sent packet 5 received 5, sent 456 received 456
. . .
. . .
. . .
Complexity
Once upon a time there lived in a
certain village a little country girl, the
prettiest creature who was ever seen.
Her mother was excessively fond of
her; and her grandmother doted on
her still more. This good woman had a
little red riding hood made for her. It
suited the girl so extremely well that
everybody called her Little Red Riding
Hood.
Information
Error occurred 12/10/2005 [Link]
Session 475 has received an
unexpected message with
type CREATE_SESSION.
Message will be ignored.
Requirements
"The information reported on is the information
that programmers feel will be useful in
monitoring the system and / or locating faults" -
an exact quote
Requirements
log specification
Connections
[Diagram: Log Analysis linked to the following fields]
• Software Development Process
• Executable Software Models
• Automatic Verification
• Testing
Future
• Language support for logging.
– Logger implemented as an aspect
– Contract specifies which logs should be
written
• Example
[LOG(name="sendMessage",
     log="Message <m as Message> sent through channel <c as Channel>")]
void sendMessage(Message msg)
{
    Channel cn = getChannel();
    [Link](msg);
    LOG_MESSAGE(name="sendMessage", m=msg, c=cn);
}
Future
• Logs are used throughout the development
process
• Requirements specify which logs should be
written
• Each function declares which events it logs
• Automatic testing for conformance
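The idea of functions declaring their logs, with conformance checked automatically, can be approximated today with a decorator. The following is a speculative sketch in Python, not an existing language feature or library. The decorator `declares_log`, the `log` list passed to each function, and the `<name>` placeholder syntax are all assumptions made for illustration.

```python
import functools
import re
from typing import Callable, List


def declares_log(template: str) -> Callable:
    """Declare that a function writes a log line matching `template`.

    Placeholders such as <m> stand for any non-empty text. After each
    call the wrapper checks that a matching line was actually written.
    """
    literal_parts = re.split(r"<\w+>", template)
    pattern = re.compile("(.+?)".join(re.escape(p) for p in literal_parts))

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, log: List[str], **kwargs):
            already_written = len(log)
            result = func(*args, log=log, **kwargs)
            new_lines = log[already_written:]
            if not any(pattern.fullmatch(line) for line in new_lines):
                raise AssertionError(
                    f"{func.__name__} did not write a log matching {template!r}"
                )
            return result

        wrapper.declared_log = template  # lets tools list declared logs
        return wrapper

    return decorator


@declares_log("Message <m> sent through channel <c>")
def send_message(msg: str, channel: str, *, log: List[str]) -> None:
    log.append(f"Message {msg} sent through channel {channel}")


@declares_log("Message <m> sent through channel <c>")
def broken_send_message(msg: str, channel: str, *, log: List[str]) -> None:
    pass  # forgets to write the declared log line
```

Calling `send_message("hello", "ch1", log=[])` passes the check, while `broken_send_message` raises `AssertionError`, which is the kind of conformance failure an automated test run would surface.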
Summary
• Logging is a common and useful technique
• No generic, industrial-strength software tools are
currently available for log analysis
• James Andrews has developed a useful
formalism for log analysis using state machines
• Real-life log analysis can be complicated
• Proper use of logs alongside powerful log
analysis tools may improve the software
development process
Thank You
Questions?