DTracing the Cloud
Brendan Gregg
Lead Performance Engineer [email protected] @brendangregg
October, 2012
DTracing the Cloud
whoami
G'Day, I'm Brendan. These days I do performance analysis of the cloud. I use the right tool for the job; sometimes traditional, often DTrace.
Traditional + some DTrace
All DTrace
DTrace
DTrace is a magician that conjures up rainbows, ponies and unicorns and does it all entirely safely and in production!
DTrace
Or, the version with fewer ponies: DTrace is a performance analysis and troubleshooting tool
Instruments all software, kernel and user-land. Production safe. Designed for minimum overhead. Default in SmartOS, Oracle Solaris, Mac OS X and FreeBSD. Two Linux ports are in development.
There are a couple of awesome books about it.
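As a tiny taste (an illustrative sketch, not from the original deck), a one-liner that counts system calls by process name until Ctrl-C:

# dtrace -n 'syscall:::entry { @[execname] = count(); }'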
illumos
Joyent's SmartOS uses (and contributes to) the illumos kernel.
illumos is the most DTrace-featured kernel
The illumos community includes Bryan Cantrill & Adam Leventhal, DTrace co-inventors (pictured on the right).
Agenda
Theory
  Cloud types and DTrace visibility
Reality
  DTrace and Zones
  DTrace Wins
Tools
  DTrace Cloud Tools
  Cloud Analytics
Case Studies
Theory
Cloud Types
We deploy two types of virtualization on SmartOS/illumos:
Hardware Virtualization: KVM
OS Virtualization: Zones
Cloud Types, cont.
Both virtualization types can co-exist:
[Diagram: Linux and Windows KVM guests each run cloud tenant apps on their own guest kernel, above virtual device drivers in the SmartOS host kernel; a SmartOS tenant (zone) runs its apps directly on the host kernel.]
Cloud Types, cont.
KVM
  Used for Linux and Windows guests
  Legacy apps
Zones
  Used for SmartOS guests (zones), called SmartMachines
  Preferred over Linux:
    Bare-metal performance
    Less memory overhead
    Better visibility (debugging)
  Global Zone == host, Non-Global Zone == guest
  Also used to encapsulate KVM guests (double-hull security)
Cloud Types, cont.
DTrace can be used for:
Performance analysis: user- and kernel-level
Troubleshooting

Specifically, for the cloud:
  Performance effects of multi-tenancy
  Effectiveness and troubleshooting of performance isolation

Four contexts: KVM host, KVM guest, Zones host, Zones guest
FAQ: What can DTrace see in each context?
Hardware Virtualization: DTrace Visibility
As the cloud operator (host):
[Diagram: three KVM guests (Linux, Linux, Windows), each with cloud tenant apps and a guest kernel, above virtual device drivers in the SmartOS host kernel; the host's view is highlighted.]
Hardware Virtualization: DTrace Visibility
Host can see:
  Entire host: kernel, apps
  Guest disk I/O (block-interface level)
  Guest network I/O (packets)
  Guest CPU MMU context register

Host can't see:
  Guest kernel
  Guest apps
  Guest disk/network context (kernel stack)
  ... unless the guest has DTrace, and access (SSH) is allowed
Hardware Virtualization: DTrace Visibility
As a tenant (guest):
[Diagram: the same stack, highlighting one guest running an OS with DTrace; the tenant's view covers only that guest's apps and kernel.]
Hardware Virtualization: DTrace Visibility
Guest can see:
  Guest kernel and apps, provided DTrace is available

Guest can't see:
  Other guests
  Host kernel, apps
OS Virtualization: DTrace Visibility
As the cloud operator (host):
[Diagram: three SmartOS zones, each running cloud tenant apps directly on the shared SmartOS host kernel; the host's view covers everything.]
OS Virtualization: DTrace Visibility
Host can see:
  Entire host: kernel, apps
  Entire guests: apps
OS Virtualization: DTrace Visibility
Operators can trivially see the entire cloud
Direct visibility from host of all tenant processes
Each blob is a tenant. The background shows one entire data center (availability zone).
OS Virtualization: DTrace Visibility
Zooming in, 1 host, 10 guests: all can be examined with one DTrace invocation; no need for multiple SSH or API logins per guest. This reduces observability framework overhead by a factor of 10 (guests/host).
This pic was just created from a process snapshot (ps) http://dtrace.org/blogs/brendan/2011/10/04/visualizing-the-cloud/
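For instance (an illustrative sketch, not from the slides), a single invocation from the global zone can aggregate across all tenants at once by keying on the zonename built-in:

# dtrace -n 'syscall:::entry { @[zonename, execname] = count(); }'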
OS Virtualization: DTrace Visibility
As a tenant (guest):
[Diagram: the same three zones; the tenant's view is limited to its own apps.]
OS Virtualization: DTrace Visibility
Guest can see:
  Guest apps
  Some host kernel (in guest context), as configured by DTrace zone privileges

Guest can't see:
  Other guests
  Host kernel (in non-guest context), apps
OS Stack DTrace Visibility
Entire operating system stack (example):
  Applications: DBs, all server types, ...
  Virtual Machines
  System Libraries
  System Call Interface
  VFS / Sockets
  UFS/... / ZFS / TCP/UDP
  Volume Managers / IP
  Block Device Interface / Ethernet
  Device Drivers
  Devices
OS Stack DTrace Visibility
The same stack, annotated: DTrace instruments all of it, both user-level (applications, libraries) and kernel-level (system calls on down).
Reality
DTrace and Zones
DTrace and Zones were developed in parallel for Solaris 10, and then integrated. DTrace functionality for the Global Zone (GZ) was added first.
This is the host context, and allows operators to use DTrace to inspect all tenants.
DTrace functionality for the Non-Global Zone (NGZ) was harder, and some capabilities were added later (2006):
  Providers: syscall, pid, profile
This is the guest context, and allows customers to use DTrace to inspect themselves only (they can't see neighbors).
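For example (an illustrative sketch; the PID is a placeholder), a tenant can use the pid provider on one of their own processes:

# dtrace -n 'pid$target::malloc:entry { @["requested bytes"] = quantize(arg0); }' -p PID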
DTrace and Zones, cont.
DTrace and Zones, cont.
GZ DTrace works well. We found many issues in practice with NGZ DTrace:
Can't read fds[] to translate file descriptors, which makes using the syscall provider more difficult.
# dtrace -n 'syscall::read:entry /fds[arg0].fi_fs == "zfs"/ { @ = quantize(arg2); }'
dtrace: description 'syscall::read:entry ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 4: syscall::read:entry): invalid kernel access in predicate at DIF offset 64
dtrace: error on enabled probe ID 1 (ID 4: syscall::read:entry): invalid kernel access in predicate at DIF offset 64
dtrace: error on enabled probe ID 1 (ID 4: syscall::read:entry): invalid kernel access in predicate at DIF offset 64
dtrace: error on enabled probe ID 1 (ID 4: syscall::read:entry): invalid kernel access in predicate at DIF offset 64
[...]
DTrace and Zones, cont.
Can't read curpsinfo or curlwpsinfo, which breaks many scripts (e.g., curpsinfo->pr_psargs, or curpsinfo->pr_dmodel).
# dtrace -n 'syscall::exec*:return { trace(curpsinfo->pr_psargs); }'
dtrace: description 'syscall::exec*:return ' matched 1 probe
dtrace: error on enabled probe ID 1 (ID 103: syscall::exece:return): invalid kernel access in action #1 at DIF offset 0
dtrace: error on enabled probe ID 1 (ID 103: syscall::exece:return): invalid kernel access in action #1 at DIF offset 0
dtrace: error on enabled probe ID 1 (ID 103: syscall::exece:return): invalid kernel access in action #1 at DIF offset 0
dtrace: error on enabled probe ID 1 (ID 103: syscall::exece:return): invalid kernel access in action #1 at DIF offset 0
[...]
Missing proc provider. Breaks this common one-liner:
# dtrace -n 'proc:::exec-success { trace(execname); }'
dtrace: invalid probe specifier proc:::exec-success { trace(execname); }: probe description proc:::exec-success does not match any probes
[...]
DTrace and Zones, cont.
Missing vminfo, sysinfo, and sched providers.
Can't read the cpu built-in.
profile probes behave oddly: e.g., profile:::tick-1s only fires if the tenant is on-CPU at the same time the probe would fire, which makes any script that produces interval output unreliable.
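For example, a common interval-output pattern (an illustrative sketch, not from the deck) that this behavior made unreliable from within a zone, since the tick-1s probe may simply not fire:

# dtrace -n 'syscall:::entry { @ = count(); }
    profile:::tick-1s { printa("syscalls: %@d\n", @); clear(@); }'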
DTrace and Zones, cont.
These and other bugs have since been fixed for SmartOS/illumos (thanks Bryan Cantrill!). Now, from a SmartOS Zone:
# dtrace -n 'proc:::exec-success { @[curpsinfo->pr_psargs] = count(); }
    profile:::tick-5s { exit(0); }'
dtrace: description 'proc:::exec-success ' matched 2 probes
CPU     ID                    FUNCTION:NAME
 13  71762                         :tick-5s

  -bash                                                      1
  /bin/cat -s /etc/motd                                      1
  /bin/mail -E                                               1
  /usr/bin/hostname                                          1
  /usr/sbin/quota                                            1
  /usr/bin/locale -a                                         2
  ls -l                                                      3
  sh -c /usr/bin/locale -a                                   4
Trivial DTrace one-liner, but represents much needed functionality.
DTrace Wins
Aside from the NGZ issues, DTrace has worked well in the cloud and solved numerous issues. For example (these are mostly from operator context):
http://dtrace.org/blogs/brendan/2012/08/09/10-performance-wins/
DTrace Wins, cont.
Not surprising given DTrace's visibility...
DTrace Wins, cont.
For example, DTrace script counts from the DTrace book:
[Diagram: the OS stack from the earlier slide, annotated with per-layer script counts ranging from 3 to 21 (plus several "10+" layers), covering applications, libraries, syscalls, VFS, ZFS, TCP/IP, device drivers, and devices.]
Tools
Ad-hoc
Write DTrace scripts as needed
Execute individually on hosts, or, with ad-hoc scripting, execute across all hosts (the cloud; see the sketch below)
My ad-hoc tools include:
  DTrace Cloud Tools
  Flame Graphs
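A minimal sketch of the "execute across all hosts" idea above (the hosts file and the one-liner are hypothetical, not part of dtrace-cloud-tools):

for host in $(cat hosts.txt); do
        ssh $host "dtrace -n 'syscall::read:entry { @ = quantize(arg2); } tick-10s { exit(0); }'" &
done
wait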
Ad-hoc: DTrace Cloud Tools
Contains around 70 ad-hoc DTrace tools that I wrote for operators and cloud customers.
./fs/metaslab_free.d
./fs/spasync.d
./fs/zfsdist.d
./fs/zfsslower.d
./fs/zfsslowzone.d
./fs/zfswhozone.d
./fs/ziowait.d
./mysql/innodb_pid_iolatency.d
./mysql/innodb_pid_ioslow.d
./mysql/innodb_thread_concurrency.d
./mysql/libmysql_pid_connect.d
./mysql/libmysql_pid_qtime.d
./mysql/libmysql_pid_snoop.d
./mysql/mysqld_latency.d
./mysql/mysqld_pid_avg.d
./mysql/mysqld_pid_filesort.d
./mysql/mysqld_pid_fslatency.d
[...]
./net/dnsconnect.d
./net/tcp-fbt-accept_sdc5.d
./net/tcp-fbt-accept_sdc6.d
./net/tcpconnreqmaxq-pid_sdc5.d
./net/tcpconnreqmaxq-pid_sdc6.d
./net/tcpconnreqmaxq_sdc5.d
./net/tcpconnreqmaxq_sdc6.d
./net/tcplistendrop_sdc5.d
./net/tcplistendrop_sdc6.d
./net/tcpretranshosts.d
./net/tcpretransport.d
./net/tcpretranssnoop_sdc5.d
./net/tcpretranssnoop_sdc6.d
./net/tcpretransstate.d
./net/tcptimewait.d
./net/tcptimewaited.d
./net/tcptimretransdropsnoop_sdc6.d
[...]
Customer scripts are linked from the smartmachine directory
https://github.com/brendangregg/dtrace-cloud-tools
Ad-hoc: DTrace Cloud Tools, cont.
For example, tcplistendrop.d traces each kernel-dropped SYN due to TCP backlog overflow (saturation):
# ./tcplistendrop.d
TIME                  SRC-IP             PORT      DST-IP             PORT
2012 Jan 19 01:22:49  10.17.210.103      25691  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.17.210.108      18423  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.17.210.116      38883  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.17.210.117      10739  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.17.210.112      27988  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.17.210.106      28824  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.12.143.16       65070  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.17.210.100      56392  -> 192.192.240.212    80
2012 Jan 19 01:22:49  10.17.210.99       24628  -> 192.192.240.212    80
[...]
Can explain multi-second client connect latency.
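For a quick first check without DTrace, the kernel's TCP MIB keeps a listen-drop counter (a sketch; this only confirms drops are occurring, without the per-connection detail above):

# netstat -s | grep tcpListenDrop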
Ad-hoc: DTrace Cloud Tools, cont.
tcplistendrop.d processes IP and TCP headers from the in-kernel packet buffer:
fbt::tcp_input_listener:entry  { self->mp = args[1]; }
fbt::tcp_input_listener:return { self->mp = 0; }

mib:::tcpListenDrop
/self->mp/
{
	this->iph = (ipha_t *)self->mp->b_rptr;
	this->tcph = (tcph_t *)(self->mp->b_rptr + 20);
	printf("%-20Y %-18s %-5d -> %-18s %-5d\n", walltimestamp,
	    inet_ntoa(&this->iph->ipha_src),
	    ntohs(*(uint16_t *)this->tcph->th_lport),
	    inet_ntoa(&this->iph->ipha_dst),
	    ntohs(*(uint16_t *)this->tcph->th_fport));
}
Since this uses the fbt provider (kernel), it is operator-only.
Ad-hoc: DTrace Cloud Tools, cont.
A related example: tcpconnreqmaxq-pid*.d prints a summary, showing backlog lengths (on SYN arrival), the current max, and drops:
tcp_conn_req_cnt_q distributions:

  cpid:3063                     max_q:8
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
               1 |                                         0

  cpid:11504                    max_q:128
           value  ------------- Distribution ------------- count
              -1 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@     7279
               1 |@@                                       405
               2 |@                                        255
               4 |@                                        138
               8 |                                         81
              16 |                                         83
              32 |                                         62
              64 |                                         67
             128 |                                         34
             256 |                                         0

tcpListenDrops:
  cpid:11504                    max_q:128        34
Ad-hoc: Flame Graphs
Visualizing CPU time using DTrace profiling and SVG:
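A sketch of the typical workflow (assumes the FlameGraph scripts from https://github.com/brendangregg/FlameGraph are available; the target process name is hypothetical):

# dtrace -x ustackframes=100 -n 'profile-997 /execname == "mysqld" && arg1/ {
    @[ustack()] = count(); } tick-60s { exit(0); }' -o out.stacks
# stackcollapse.pl out.stacks > out.folded
# flamegraph.pl out.folded > mysqld.svg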
Product
Cloud observability products including DTrace:
Joyent's Cloud Analytics
Product: Cloud Analytics
Syscall latency across the entire cloud, as a heat map!
Product: Cloud Analytics, cont.
For operators and cloud customers
Observes the entire cloud, in real time
Latency focus, including heat maps
Instrumentation: DTrace and kstats
Front-end: browser JavaScript
Back-end: node.js and C
Product: Cloud Analytics, cont.
Creating an instrumentation:
Product: Cloud Analytics, cont.
Aggregating data across cloud:
Product: Cloud Analytics, cont.
Visualizing data:
Product: Cloud Analytics, cont.
By-host breakdowns are essential:
Switch from cloud to host in one click
Case Studies
Case Studies
Slow disks
Kernel scheduler
Slow disks
Customer complains of poor MySQL performance.
They noticed the disks were busy via iostat-based monitoring software, and blamed noisy neighbors for causing disk I/O contention.
Multi-tenancy and performance isolation are common cloud issues
Slow disks, cont.
Unix 101
Process
Syscall Interface
VFS
ZFS ...
Block Device Interface
Disks
Slow disks, cont.
Unix 101
The same stack, annotated: latency measured at the syscall/VFS level is synchronous to the process, while iostat(1) observes the block device level, which is often asynchronous (write buffering, read ahead).
Slow disks, cont.
By measuring FS latency in application-synchronous context, we can either confirm or rule out FS/disk-origin latency.
Including expressing FS latency during a MySQL query, so that the issue can be quantified and the speedup calculated.
Ideally, this would be possible from within the SmartMachine, so both customer and operator can run the DTrace script. This is possible using:
pid provider: trace and time MySQL FS functions
syscall provider: trace and time read/write syscalls for FS file descriptors (hence needing fds[].fi_fs; otherwise cache open()); see the sketch after this list
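A minimal sketch of the syscall-provider approach (illustrative only; PID is a placeholder, and it assumes fds[] is readable in the zone, i.e. the fix described later):

# dtrace -p PID -n '
    syscall::read:entry, syscall::write:entry
    /pid == $target && fds[arg0].fi_fs == "zfs"/ { self->ts = timestamp; }
    syscall::read:return, syscall::write:return
    /self->ts/ { @[probefunc] = quantize(timestamp - self->ts); self->ts = 0; }'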
Slow disks, cont.
mysqld_pid_fslatency.d from dtrace-cloud-tools:
# ./mysqld_pid_fslatency.d -n 'tick-10s { exit(0); }' -p 7357
Tracing PID 7357... Hit Ctrl-C to end.
MySQL filesystem I/O: 55824; latency (ns):

  read
           value  ------------- Distribution ------------- count
            1024 |                                         0
            2048 |@@@@@@@@@@                               9053
            4096 |@@@@@@@@@@@@@@@@@                        15490
            8192 |@@@@@@@@@@@                              9525
           16384 |@@                                       1982
           32768 |                                         121
           65536 |                                         28
          131072 |                                         6
          262144 |                                         0

  write
           value  ------------- Distribution ------------- count
            2048 |                                         0
            4096 |                                         1
            8192 |@@@@@@                                   3003
           16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@             13532
           32768 |@@@@@                                    2590
           65536 |@                                        370
          131072 |                                         58
          262144 |                                         27
          524288 |                                         12
         1048576 |                                         1
         2097152 |                                         0
         4194304 |                                         10
         8388608 |                                         14
        16777216 |                                         1
        33554432 |                                         0
Slow disks, cont.
The same output, annotated: the fast mode of the distributions (microseconds) is DRAM cache hits; the slow outliers (milliseconds and up) are disk I/O.
Slow disks, cont.
mysqld_pid_fslatency.d is about 30 lines of DTrace:
pid$target::os_file_read:entry,
pid$target::os_file_write:entry,
pid$target::my_read:entry,
pid$target::my_write:entry
{
	self->start = timestamp;
}

pid$target::os_file_read:return  { this->dir = "read"; }
pid$target::os_file_write:return { this->dir = "write"; }
pid$target::my_read:return       { this->dir = "read"; }
pid$target::my_write:return      { this->dir = "write"; }

pid$target::os_file_read:return,
pid$target::os_file_write:return,
pid$target::my_read:return,
pid$target::my_write:return
/self->start/
{
	@time[this->dir] = quantize(timestamp - self->start);
	@num = count();
	self->start = 0;
}

dtrace:::END
{
	printa("MySQL filesystem I/O: %@d; latency (ns):\n", @num);
	printa(@time);
	clear(@time); clear(@num);
}
Slow disks, cont.
The same script, annotated: thank you, MySQL! If it's not that easy, try the syscall provider with fds[].
Slow disks, cont.
Going for the slam dunk:
# ./mysqld_pid_fslatency_slowlog.d 29952
2011 May 16 23:34:00 filesystem I/O during query > 100 ms: query 538 ms, fs 509 ms, 83 I/O
2011 May 16 23:34:11 filesystem I/O during query > 100 ms: query 342 ms, fs 303 ms, 75 I/O
2011 May 16 23:34:38 filesystem I/O during query > 100 ms: query 479 ms, fs 471 ms, 44 I/O
2011 May 16 23:34:58 filesystem I/O during query > 100 ms: query 153 ms, fs 152 ms, 1 I/O
2011 May 16 23:35:09 filesystem I/O during query > 100 ms: query 383 ms, fs 372 ms, 72 I/O
2011 May 16 23:36:09 filesystem I/O during query > 100 ms: query 406 ms, fs 344 ms, 109 I/O
2011 May 16 23:36:44 filesystem I/O during query > 100 ms: query 343 ms, fs 319 ms, 75 I/O
2011 May 16 23:36:54 filesystem I/O during query > 100 ms: query 196 ms, fs 185 ms, 59 I/O
2011 May 16 23:37:10 filesystem I/O during query > 100 ms: query 254 ms, fs 209 ms, 83 I/O
This shows FS latency as a proportion of query latency. mysqld_pid_fslatency_slowlog*.d is in dtrace-cloud-tools.
Slow disks, cont.
The cloud operator can trace kernel internals, e.g. the VFS->ZFS interface using zfsslower.d:
# ./zfsslower.d 10
TIME                 PROCESS          D    KB     ms FILE
2012 Sep 27 13:45:33 zlogin           W     0     11 /zones/b8b2464c/var/adm/wtmpx
2012 Sep 27 13:45:36 bash             R     0     14 /zones/b8b2464c/opt/local/bin/zsh
2012 Sep 27 13:45:58 mysqld           R  1024     19 /zones/b8b2464c/var/mysql/ibdata1
2012 Sep 27 13:45:58 mysqld           R  1024     22 /zones/b8b2464c/var/mysql/ibdata1
2012 Sep 27 13:46:14 master           R     1      6 /zones/b8b2464c/root/opt/local/libexec/postfix/qmgr
2012 Sep 27 13:46:14 master           R     4      5 /zones/b8b2464c/root/opt/local/etc/postfix/master.cf
[...]
My go-to tool (covers all apps). This example shows whether there was any VFS-level I/O slower than 10 ms (the argument == 10). Stupidly easy to do.
Slow disks, cont.
zfs_read() entry -> return; same for zfs_write().
[...]
fbt::zfs_read:entry,
fbt::zfs_write:entry
{
	self->path = args[0]->v_path;
	self->kb = args[1]->uio_resid / 1024;
	self->start = timestamp;
}

fbt::zfs_read:return,
fbt::zfs_write:return
/self->start && (timestamp - self->start) >= min_ns/
{
	this->iotime = (timestamp - self->start) / 1000000;
	this->dir = probefunc == "zfs_read" ? "R" : "W";
	printf("%-20Y %-16s %1s %4d %6d %s\n", walltimestamp,
	    execname, this->dir, self->kb, this->iotime,
	    self->path != NULL ? stringof(self->path) : "<null>");
}
[...]
zfsslower.d originated from the DTrace book
Slow disks, cont.
The operator can use deeper tools as needed. Anywhere in ZFS.
# dtrace -n 'io:::start { @[stack()] = count(); }'
dtrace: description 'io:::start ' matched 6 probes
^C
              genunix`ldi_strategy+0x53
              zfs`vdev_disk_io_start+0xcc
              zfs`zio_vdev_io_start+0xab
              zfs`zio_execute+0x88
              zfs`zio_nowait+0x21
              zfs`vdev_mirror_io_start+0xcd
              zfs`zio_vdev_io_start+0x250
              zfs`zio_execute+0x88
              zfs`zio_nowait+0x21
              zfs`arc_read_nolock+0x4f9
              zfs`arc_read+0x96
              zfs`dsl_read+0x44
              zfs`dbuf_read_impl+0x166
              zfs`dbuf_read+0xab
              zfs`dmu_buf_hold_array_by_dnode+0x189
              zfs`dmu_buf_hold_array+0x78
              zfs`dmu_read_uio+0x5c
              zfs`zfs_read+0x1a3
              genunix`fop_read+0x8b
              genunix`read+0x2a7
              143
Slow disks, cont.
Cloud Analytics, for either operator or customer, can be used to examine the full latency distribution, including outliers:
Outliers
This heat map shows FS latency for an entire cloud data center
Slow disks, cont.
Found that the customer problem was not disks or FS (99% of the time), but CPU usage during table joins. On Joyent's IaaS architecture, it's usually not the disks or filesystem; it is useful to rule that out quickly. Some of the time it is, due to:
  Bad disks (1000+ ms I/O)
  Controller issues (PERC)
  Big I/O (how quick is a 40 Mbyte read from cache?)
  Other tenants (benchmarking!); much less for us now with ZFS I/O throttling (thanks Bill Pijewski), used for disk performance isolation in the SmartOS cloud
Slow disks, cont.
The customer resolved the real issue. Prior to DTrace analysis, they had endured months of poor performance believing the disks were to blame.
Kernel scheduler
Customer problem: occasional latency outliers
Analysis: no smoking gun. No slow I/O or locks, etc. Some random dispatcher queue latency, but with CPU headroom.
$ prstat -mLc 1
   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 17930 103        21 7.6 0.0 0.0 0.0  53  16 9.1 57K   1 73K   0 beam.smp/265
 17930 103        20 7.0 0.0 0.0 0.0  57  16 0.4 57K   2 70K   0 beam.smp/264
 17930 103        20 7.4 0.0 0.0 0.0  53  18 1.7 63K   0 78K   0 beam.smp/263
 17930 103        19 6.7 0.0 0.0 0.0  60  14 0.4 52K   0 65K   0 beam.smp/266
 17930 103       2.0 0.7 0.0 0.0 0.0  96 1.6 0.0  6K   0  8K   0 beam.smp/267
 17930 103       1.0 0.9 0.0 0.0 0.0  97 0.9 0.0   4   0  47   0 beam.smp/280
[...]
Kernel scheduler, cont.
Unix 101
[Diagram: threads (R = ready to run, O = on-CPU) wait on a run queue before the scheduler places them on a CPU; preemption puts a running thread back on the run queue.]
Kernel scheduler, cont.
Unix 102: TS (and FSS) check for CPU starvation.
[Diagram: a long run queue of ready threads starved of CPU; priority promotion boosts the starved threads so they get to run.]
Kernel scheduler, cont.
Experimentation: run 2 CPU-bound threads, 1 CPU
Subsecond-offset heat maps:
Kernel scheduler, cont.
The same heat maps, annotated: THIS SHOULDN'T HAPPEN.
Kernel scheduler, cont.
Worst case (4 threads, 1 CPU): 44 seconds of dispatcher queue latency.
# dtrace -n 'sched:::off-cpu /execname == "burn1"/ { self->s = timestamp; }
    sched:::on-cpu /self->s/ { @["off-cpu (ms)"] = lquantize((timestamp - self->s) /
    1000000, 0, 100000, 1000); self->s = 0; }'

  off-cpu (ms)
           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 387184
            1000 |                                         2256
            2000 |                                         1078
            3000 |                                         862
            4000 |                                         1070
            5000 |                                         637
            6000 |                                         535
[...]
           41000 |                                         3
           42000 |                                         2
           43000 |                                         2
           44000 |                                         1
           45000 |                                         0

Annotations on the distribution: the 0 ms bucket is expected; latencies of seconds are bad; tens of seconds are inconceivable.
For the TS class, ts_maxwait at priority 59 is 32s; what does FSS use?
Kernel scheduler, cont.
FSS scheduler class bug:
FSS uses a more complex technique to avoid CPU starvation. Due to the bug, a thread's priority could stay high, keeping it on-CPU for many seconds, before the priority decayed to allow another thread to run. Analyzed (with more DTrace) and fixed (thanks Jerry Jelinek); a sketch of one way to watch thread priorities follows below.
Under (too) high CPU load, your runtime can be bound by how well you get scheduled, not by how fast you do work.
Cloud workloads scale fast and hit (new) scheduler issues.
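One way to watch how the scheduler adjusts thread priorities under load (an illustrative sketch; the process name is taken from the earlier prstat example):

# dtrace -n 'profile-997 /execname == "beam.smp"/ {
    @["thread priority"] = lquantize(curlwpsinfo->pr_pri, 0, 60, 5); }'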
Kernel scheduler, cont.
Required the operator of the cloud to debug
Even if the customer doesn't have kernel DTrace access in the zone, they still benefit from the cloud provider having access. Ask your cloud provider to trace scheduler internals, in case you have something similar.
On Hardware Virtualization, scheduler issues can be terrifying
Kernel scheduler, cont.
Each kernel believes it owns the hardware.
[Diagram: three KVM guests, each with cloud tenant apps, a guest kernel, and VCPUs, running above the host kernel and its physical CPUs.]
Kernel scheduler, cont.
One scheduler:
[Diagram: the same stack, highlighting the host kernel's scheduler placing guest VCPUs onto the physical CPUs.]
Kernel scheduler, cont.
Many schedulers. Kernel fight!
[Diagram: the same stack, now highlighting every guest kernel's scheduler as well as the host's, all making independent decisions.]
Kernel scheduler, cont.
Had a networking performance issue on KVM; debugged using:
Host: DTrace
Guests: prototype DTrace for Linux, SystemTap
Took weeks to debug the kernel scheduler interactions and determine the fix, for an 8x win. Office wall (output from many perf tools, including Flame Graphs):
Thank you!
http://dtrace.org/blogs/brendan
email [email protected]
twitter @brendangregg
Resources:
http://www.slideshare.net/bcantrill/dtrace-in-the-nonglobal-zone
http://dtrace.org/blogs/dap/2011/07/27/oscon-slides/
https://github.com/brendangregg/dtrace-cloud-tools
http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
http://dtrace.org/blogs/brendan/2012/08/09/10-performance-wins/
http://dtrace.org/blogs/brendan/2011/10/04/visualizing-the-cloud/
Thanks to @dapsays and team for Cloud Analytics, Bryan Cantrill for DTrace fixes, @rmustacc for the KVM perf war, and @DeirdreS for another great event.