Linux Performance Analysis New Tools and Old Secrets: Brendan Gregg
Brendan Gregg
Senior Performance Architect
Performance Engineering Team
[email protected]
@brendangregg
Porting these to Linux…
• Massive Amazon EC2 Linux cloud
– Tens of thousands of instances
– Autoscale by ~3k each day
– CentOS and Ubuntu
• FreeBSD for content delivery
– Approx 33% of US Internet traffic at night
• Performance is critical
– Customer satisfaction: >50M subscribers
– $$$ price/performance
– Develop tools for cloud-wide analysis; use server tools as needed
Brendan Gregg
• Senior Performance Architect, Netflix
– Linux and FreeBSD performance
– Performance Engineering team (@coburnw)
• Recent work:
– Linux perf-tools: ftrace & perf_events
– Testing of other tracers: eBPF
• Previously:
– Performance of Linux, Solaris, ZFS, DBs, TCP/IP, hypervisors, …
– Flame graphs, heat maps, methodologies, DTrace tools, DTraceToolkit
Agenda
1. Some one-liners
2. Background
3. Technology
4. Tools
1. Some one-liners
(cut to the chase!)
tpoint for disk I/O
• Who is creating disk I/O, and of what type?
# ./tpoint -H block:block_rq_insert
Tracing block:block_rq_insert. Ctrl-C to end.
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
flush-9:0-9318 [013] 1936182.007914: block_rq_insert: 202,16 W 0 () 160186560 + 8 [flush-9:0]
flush-9:0-9318 [013] 1936182.007939: block_rq_insert: 202,16 W 0 () 280100936 + 8 [flush-9:0]
java-9469 [014] 1936182.316184: block_rq_insert: 202,1 R 0 () 1319592 + 72 [java]
java-9469 [000] 1936182.331270: block_rq_insert: 202,1 R 0 () 1125744 + 8 [java]
java-9469 [000] 1936182.341418: block_rq_insert: 202,1 R 0 () 2699008 + 88 [java]
java-9469 [000] 1936182.341419: block_rq_insert: 202,1 R 0 () 2699096 + 88 [java]
java-9469 [000] 1936182.341419: block_rq_insert: 202,1 R 0 () 2699184 + 32 [java]
java-9469 [000] 1936182.345870: block_rq_insert: 202,1 R 0 () 1320304 + 24 [java]
java-9469 [000] 1936182.351590: block_rq_insert: 202,1 R 0 () 1716848 + 16 [java]
^C
Ending tracing...
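The question on this slide (who is creating disk I/O, and of what type?) can be answered from the raw trace lines with a little post-processing. The following is a minimal sketch, not part of perf-tools: the function name summarize_block_inserts is hypothetical, and it assumes process names contain no spaces and that the line layout matches the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: summarize block_rq_insert trace lines read from stdin.
# Output, one line per (process, type) pair: COMM RWBS COUNT SECTORS
summarize_block_inserts() {
    awk '
    {
        # Locate the event name; later fields follow the tracepoint format.
        ev = 0
        for (i = 1; i <= NF; i++) if ($i == "block_rq_insert:") { ev = i; break }
        if (ev == 0) next              # skip headers and unrelated lines
        rwbs = $(ev + 2)               # I/O type flags, eg R or W
        comm = $NF                     # "[comm]" is the last field
        gsub(/^\[|\]$/, "", comm)      # strip the surrounding brackets
        key = comm " " rwbs
        count[key]++
        sectors[key] += $(NF - 1)      # sector count precedes "[comm]"
    }
    END {
        for (k in count) printf "%s %d %d\n", k, count[k], sectors[k]
    }' | sort
}
```

Usage would be along the lines of `./tpoint -H block:block_rq_insert | summarize_block_inserts`, ended with Ctrl-C. Note this summarizes in user space after every event has been printed, which costs more than an in-kernel count.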
• Automate:
# functrace tcp_retransmit_skb
• Document:
# man functrace
[…]
SYNOPSIS
functrace [-hH] [-p PID] [-d secs] funcstring
[…]
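For context on what is being automated, here is a minimal sketch of tracing one kernel function using the raw ftrace control files. It is not functrace's actual source: the helper names are hypothetical, and the default path assumes debugfs is mounted at /sys/kernel/debug. Root privileges are needed on a real system.

```shell
#!/bin/sh
# Hypothetical helpers showing the manual ftrace steps for one function.
# The tracing directory is a parameter; the default path is an assumption.

ftrace_function_start() {
    func=$1
    tracing=${2:-/sys/kernel/debug/tracing}
    # Restrict function tracing to the named function, then enable the tracer.
    echo "$func" > "$tracing/set_ftrace_filter" &&
    echo function > "$tracing/current_tracer"
}

ftrace_function_stop() {
    tracing=${1:-/sys/kernel/debug/tracing}
    # Switch back to the nop tracer and clear the function filter.
    echo nop > "$tracing/current_tracer" &&
    echo > "$tracing/set_ftrace_filter"
}
```

After `ftrace_function_start tcp_retransmit_skb`, events can be read from the trace or trace_pipe files in the same directory. Forgetting the stop step leaves tracing enabled, which is one reason to wrap the steps in a tool.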
ftrace Interface
• Plus many more capabilities
– buffered (trace) or live tracing (trace_pipe)
– filters for conditional tracing
– stack traces on events
– function triggers to enable/disable tracing
– functions with arguments (via kprobes)
• See Documentation/trace/ftrace.txt
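As an illustration of filters for conditional tracing, the sketch below enables a tracepoint with a filter expression through the per-event files. The helper names are hypothetical and the default tracing path assumes debugfs is mounted at /sys/kernel/debug; the filter shown in the usage note is an example, not one taken from the slides.

```shell
#!/bin/sh
# Hypothetical helpers: enable a tracepoint with a filter, then tear it down.
# $1 = event as subsystem/name, eg block/block_rq_insert

tracepoint_enable_filtered() {
    event=$1
    filter=$2
    tracing=${3:-/sys/kernel/debug/tracing}
    # Set the filter before enabling, so unfiltered events are never emitted.
    echo "$filter" > "$tracing/events/$event/filter" &&
    echo 1 > "$tracing/events/$event/enable"
}

tracepoint_disable() {
    event=$1
    tracing=${2:-/sys/kernel/debug/tracing}
    # Disable the event, then write 0 to clear its filter.
    echo 0 > "$tracing/events/$event/enable" &&
    echo 0 > "$tracing/events/$event/filter"
}
```

For example, `tracepoint_enable_filtered block/block_rq_insert 'rwbs ~ "*R*"'` would match only requests whose type flags contain R, with the filtering done in the kernel rather than in a post-processing script.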
perf_events
• Use via the “perf” command
• Add from linux-tools-common, …
– Source code is in Linux: tools/perf
• Powerful multi-tool and profiler
– interval sampling, CPU performance counter events
– user and kernel dynamic tracing
– kernel line tracing and local variables (debuginfo)
– kernel filtering, and in-kernel counts (perf stat)
• Not very programmable, yet
– limited kernel summaries. May improve with eBPF.
perf_events tracing
• Static tracing of block_rq_insert tracepoint:
– trace, dump, post-process
# perf record -e block:block_rq_insert -a
^C[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.172 MB perf.data (~7527 samples) ]

# perf script
# ========
# captured on: Wed Nov 12 20:50:05 2014
# hostname : bgregg-test-i-92b81f78
[…]
# ========
#
java 9940 [015] 1199510.044783: block_rq_insert: 202,1 R 0 () 4783360 + 88 [java]
java 9940 [015] 1199510.044786: block_rq_insert: 202,1 R 0 () 4783448 + 88 [java]
java 9940 [015] 1199510.044786: block_rq_insert: 202,1 R 0 () 4783536 + 24 [java]
java 9940 [000] 1199510.065194: block_rq_insert: 202,1 R 0 () 4864000 + 88 [java]
java 9940 [000] 1199510.065195: block_rq_insert: 202,1 R 0 () 4864088 + 88 [java]
java 9940 [000] 1199510.065196: block_rq_insert: 202,1 R 0 () 4864176 + 80 [java]
java 9940 [000] 1199510.083745: block_rq_insert: 202,1 R 0 () 4864344 + 88 [java]
[…]
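One way to post-process a dump like this is to classify each request as sequential or random, by checking whether it starts at the sector where the previous request on the same device ended. This is a minimal sketch under assumptions: the function name classify_sequential is hypothetical, process names are assumed to contain no spaces, and per-device ordering in the dump is taken as issue order.

```shell
#!/bin/sh
# Hypothetical post-processor for "perf script" block_rq_insert lines on stdin.
# Prints: sequential N random M
classify_sequential() {
    awk '
    {
        ev = 0
        for (i = 1; i <= NF; i++) if ($i == "block_rq_insert:") { ev = i; break }
        if (ev == 0) next                 # skip header and comment lines
        dev = $(ev + 1)                   # major,minor device numbers
        sector = $(NF - 3)                # start sector
        nr = $(NF - 1)                    # number of sectors
        # Sequential if this request begins where the previous one ended.
        if ((dev in expected) && sector == expected[dev]) seq++
        else rnd++
        expected[dev] = sector + nr
    }
    END { printf "sequential %d random %d\n", seq, rnd }'
}
```

It would be used as `perf script | classify_sequential`. Dumping every event to user space has a cost that grows with event rate, which is the limitation in-kernel summaries aim to remove.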
perf_events One-Liners
• Great one-liners. From http://www.brendangregg.com/perf.html:
# List all currently known events:
perf list

# Various basic CPU statistics, system wide, for 10 seconds:
perf stat -e cycles,instructions,cache-references,cache-misses -a sleep 10

# Count ext4 events for the entire system, for 10 seconds:
perf stat -e 'ext4:*' -a sleep 10

# Sample CPU stack traces for the entire system, at 99 Hertz, for 10 seconds:
perf record -F 99 -ag -- sleep 10

# Sample CPU stack traces, once every 100 last level cache misses, for 5 seconds:
perf record -e LLC-load-misses -c 100 -ag -- sleep 5

# Trace all block device (disk I/O) requests with stack traces, until Ctrl-C:
perf record -e block:block_rq_issue -ag

# Add a tracepoint for the kernel tcp_sendmsg() function return:
perf probe 'tcp_sendmsg%return'

# Add a tracepoint for tcp_sendmsg, with size and socket state (needs debuginfo):
perf probe 'tcp_sendmsg size sk->__sk_common.skc_state'

# Show perf.data as a text report, with data coalesced and percentages:
perf report -n --stdio
eBPF
• Extended BPF: programs on tracepoints
– High performance filtering: JIT
– In-kernel summaries: maps
• eg, in-kernel latency heat map (showing bimodal):
[Figure: latency heat map over time, showing two modes: low latency cache hits and high latency device I/O]
eBPF
• Created by Alexei Starovoitov
• Gradually being included in Linux (see lkml)
• Has been difficult to program directly
– Other tools can become front-ends: ftrace, perf_events, SystemTap, ktap?
Other Tracers
• Discussion:
– SystemTap
– ktap
– LTTng
– DTrace ports
– sysdig
The Tracing Landscape, Nov 2014
(my opinion)
[Figure: tracers positioned by ease of use (brutal to less brutal), stage of development (alpha to mature), and scope & capability. Tracers shown: dtrace4L., ktap, sysdig, perf, stap, ftrace, eBPF]
4. Tools
Tools
one-liners: many
# ./iosnoop -h
USAGE: iosnoop [-hQst] [-d device] [-i iotype] [-p PID] [-n name] [duration]
    -d device   # device string (eg, "202,1")
    -i iotype   # match type (eg, '*R*' for all reads)
    -n name     # process name to match on I/O issue
    -p PID      # PID to match on I/O issue
    -Q          # include queueing time in LATms
    -s          # include start time of I/O (s)
    -t          # include completion time of I/O (s)
    -h          # this usage message
    duration    # duration seconds, and use buffers
[…]
iolatency
• Block I/O (disk) latency distributions:
# ./iolatency
Tracing block I/O. Output every 1 seconds. Ctrl-C to end.

>=(ms) .. <(ms) : I/O |Distribution |
0 -> 1 : 1144 |######################################|
1 -> 2 : 267 |######### |
2 -> 4 : 10 |# |
4 -> 8 : 5 |# |
8 -> 16 : 248 |######### |
16 -> 32 : 601 |#################### |
32 -> 64 : 117 |#### |
[…]
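The power-of-two ranges in this output can be reproduced with a short bucketing routine. The sketch below is not iolatency's actual code: latency_histogram is a hypothetical name, and it assumes one latency value in milliseconds per input line.

```shell
#!/bin/sh
# Hypothetical helper: power-of-two latency histogram.
# Input: one latency in ms per line. Output rows: LO -> HI : COUNT
latency_histogram() {
    awk '
    {
        ms = $1 + 0
        # Bucket 0 holds [0,1) ms; bucket n holds [2^(n-1), 2^n) ms.
        b = 0; hi = 1
        while (ms >= hi) { b++; hi *= 2 }
        count[b]++
        if (b > max) max = b
    }
    END {
        if (NR == 0) exit
        lo = 0; hi = 1
        for (b = 0; b <= max; b++) {
            printf "%d -> %d : %d\n", lo, hi, count[b]
            lo = hi; hi *= 2
        }
    }'
}
```

Doubling bucket widths keep the row count small while still separating modes that are far apart, such as the sub-2 ms and 8 to 32 ms groups in the output above.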
The block tracepoint output format comes from include/trace/events/block.h:
DECLARE_EVENT_CLASS(block_rq,
[...]
TP_printk("%d,%d %s %u (%s) %llu + %u [%s]",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->rwbs, __entry->bytes, __get_str(cmd),
(unsigned long long)__entry->sector,
__entry->nr_sector, __entry->comm)
preemption latency
wakeup latency
Trace Compass
perf CPU Flame Graph
[Figure: CPU flame graph, annotated with: Kernel, TCP/IP, GC, Locks, epoll, Idle thread, Time, and Broken Java stacks (missing frame pointer)]
perf Block I/O Latency Heat Map
Summary
1. Some one-liners
2. Background
3. Technology
4. Tools
• Questions?
• http://slideshare.net/brendangregg
• http://www.brendangregg.com
• [email protected]
• @brendangregg