Picking Up The Pieces After Your LINUX System Crashes

This document discusses gathering crash data after a Linux system crash. It outlines goals such as preparing the system to provide information about crashes and hangs. It recommends configuring tools like SysRq, sysstat, and system logs to collect useful data. The document also discusses different crash scenarios and analyzing gathered information to reconstruct the crash.

Uploaded by

Amit Mehta
Copyright
© Attribution Non-Commercial (BY-NC)

Picking Up the Pieces after

your LINUX System Crashes


An Administrator's Overview to
Gathering Crash Data

Alan Boda
Hewlett-Packard Company
Introduction
 Did my system crash or hang?
 Why did my system crash or hang?
 What do I save?
 What should I do next time?
 What should I do now?

2
Alan Boda - HP 8/15/2006
What we will be discussing
 System Admin goals
 Crash analogy
 Difference between a crash and a hang
 Environment Scenarios
 Tools to have in place now
 What data to gather and how to gather it
 What to do before and after the reboot
 Reconstructing the crash scene
 Ways to look at gathered data

What we won’t be discussing
 Crash dump analysis
 Tool installation and configuration details
 System or Application Performance
Tuning

Goals as System Administrator
 Prepare system so it can tell you what
happened if it hangs or crashes
 Reconstruct the Crash/Hang Scene
 Develop emergency procedures
[Diagram: sources of crash evidence: sar, environment, logs, profilers, diagnostics, LEDs, track record, dumps, updates]

Car Crash Analogy
 Accident Reconstruction Consultants
 Evidence and Clues
 Making the accident scene tell its story
 More clues = clearer picture
 Goal: Prep system to tell you what
happened

Car Crash vs. System Crash
Car crash           System crash
skid marks          Performance degradation
weather             System environment changes
blown tire          Failures in storage, fan, power supply, etc.
eye witnesses       System administrator or user who saw the failure or hang
survivors           Do any processes respond?
                    Login response?
                    Db/sql query response?
                    Ping response?
black box           System activity data
                    Message logs
                    System management log entries
                    Profiler tool data
                    Dumps
physical evidence   Damaged hardware
                    Diagnostic LEDs
                    Physical or remote console messages
age of vehicle      New system installation?
                    Extended system/application track record?
service records     New package installations
                    Kernel or package upgrades
                    System h/w upgrade records
travel plan         Scheduled tests, changes, maintenance logs


System Hang vs. Crash
 System Hang
– Partially responsive
– Resource deficiency
– Not responsive, but no crash
– Runaway high priority process
– Bug in driver’s interrupt handling code
 Oops
 System Crash
– Nonresponsive system
– Panic due to logical error
Different Environment Scenarios
 Lights Out environment?
 Cluster environment?
 Database server?

System Tools
to Configure Now!

SysRq
 aka magic keys
 used during hang/freeze situations
 Alt-SysRq-<command key> sequence
 provides memory, stack trace, process info
 commands to sync disks, crash system
 logs to messages and netdump (RHEL) – best effort
 RHEL and SLES kernels have SysRq configured but not
enabled.
 To verify if enabled:
# cat /proc/sys/kernel/sysrq
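(1 = enabled, 0 = disabled; newer kernels also accept a bitmask of allowed functions)
 Security risk: anyone with keyboard or console access can use the keys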

SysRq Configuration
 Must set /proc/sys/kernel/sysrq to 1

# echo 1 > /proc/sys/kernel/sysrq

-or-

# sysctl -w kernel.sysrq=1

 To retain setting across reboots

RHEL - Edit /etc/sysctl.conf:

kernel.sysrq = 1

(then run "# sysctl -p" to load the file without rebooting)

SLES - Edit /etc/sysconfig/sysctl:

ENABLE_SYSRQ="yes"
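
The check on the running value can be wrapped in a small helper. This is a minimal sketch, not part of the original slides: the function name sysrq_state is hypothetical, and the "restricted" case assumes the bitmask behaviour of newer kernels, which the slides do not cover.

```sh
#!/bin/sh
# Report whether magic SysRq is enabled, based on the kernel's sysrq value.
# The first argument is the file to read; it defaults to the real /proc entry.
# (sysrq_state is a hypothetical helper name used for illustration.)
sysrq_state() {
    file="${1:-/proc/sys/kernel/sysrq}"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    value=$(cat "$file")
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        # Assumption: any other value is a bitmask of allowed functions.
        *) echo "restricted (bitmask $value)" ;;
    esac
}
```

Calling sysrq_state with no argument reads /proc/sys/kernel/sysrq on the live system; passing a file name makes it easy to try the helper without touching kernel settings.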

Sample SysRq Output
SysRq : Show Regs
Pid/TGid: 0/0, comm: swapper
EIP: 0060:[<c0109129>] CPU: 1
EIP is at default_idle [kernel] 0x29 (2.4.21-20.ELsmp)
ESP: 080b:c01091c2 EFLAGS: 00000246 Tainted: P
EAX: 00000000 EBX: c0109100 ECX: c043bc80 EDX: c9b20000
ESI: c9b20000 EDI: c9b20000 EBP: c0109100 DS: 0068 ES: 0068 FS: 0000 GS: 0000
CR0: 8005003b CR2: b729f000 CR3: 376c9900 CR4: 000006f0
Call Trace: [<c01091c2>] cpu_idle [kernel] 0x42 (0xc9b21fb0)
[<c01291c3>] call_console_drivers [kernel] 0x63 (0xc9b21fc4)
[<c01294f3>] printk [kernel] 0x153 (0xc9b21ffc)

Zone:Normal freepages:108783 min: 1279 low: 4544 high: 6304
Zone:HighMem freepages:1209405 min: 255 low: 20990 high: 31485
Free pages: 1321089 (1209405 HighMem)
( Active: 78806/14876, inactive_laundry: 4493, inactive_clean: 0, free: 1321089 )

sysstat
 package containing iostat, sadc, sar, mpstat
 System activity data collected
 snapshots taken every 10 minutes
 saves 7 days of reports by default (RHEL)
 to verify:
# rpm -qa | grep sysstat
sysstat-5.0.1-35.4

# chkconfig --list | grep sysstat


sysstat 0:off 1:on 2:on 3:on 4:on 5:on 6:off

Contents of /var/log/sa (RHEL)
# ls -l /var/log/sa
total 4060
-rw-r--r-- 1 root root 207600 Jan 20 23:50 sa20
-rw-r--r-- 1 root root 207600 Jan 21 23:50 sa21
-rw-r--r-- 1 root root 207600 Jan 22 23:50 sa22
-rw-r--r-- 1 root root 207600 Jan 23 23:50 sa23
-rw-r--r-- 1 root root 207600 Jan 24 23:50 sa24
-rw-r--r-- 1 root root 207600 Jan 25 23:50 sa25
-rw-r--r-- 1 root root 207600 Jan 26 23:50 sa26
-rw-r--r-- 1 root root 207600 Jan 27 23:50 sa27
-rw-r--r-- 1 root root 88080 Jan 28 10:00 sa28
-rw-r--r-- 1 root root 287976 Jan 20 23:53 sar20
-rw-r--r-- 1 root root 287976 Jan 21 23:53 sar21
-rw-r--r-- 1 root root 287976 Jan 22 23:53 sar22
-rw-r--r-- 1 root root 287976 Jan 23 23:53 sar23
-rw-r--r-- 1 root root 287976 Jan 24 23:53 sar24
-rw-r--r-- 1 root root 287976 Jan 25 23:53 sar25
-rw-r--r-- 1 root root 287976 Jan 26 23:53 sar26
-rw-r--r-- 1 root root 287976 Jan 27 23:53 sar27
Contents of /var/log/sa (SLES)
# ls /var/log/sa
. sa.2006_01_13 sa.2006_01_24 sar.2006_01_10 sar.2006_01_21
.. sa.2006_01_14 sa.2006_01_25 sar.2006_01_11 sar.2006_01_22
sa.2006_01_04 sa.2006_01_15 sa.2006_01_26 sar.2006_01_12 sar.2006_01_23
sa.2006_01_05 sa.2006_01_16 sa.2006_01_27 sar.2006_01_13 sar.2006_01_24
sa.2006_01_06 sa.2006_01_17 sa.2006_01_28 sar.2006_01_14 sar.2006_01_25
sa.2006_01_07 sa.2006_01_18 sar.2006_01_04 sar.2006_01_15 sar.2006_01_26
sa.2006_01_08 sa.2006_01_19 sar.2006_01_05 sar.2006_01_16 sar.2006_01_27
sa.2006_01_09 sa.2006_01_20 sar.2006_01_06 sar.2006_01_17
sa.2006_01_10 sa.2006_01_21 sar.2006_01_07 sar.2006_01_18
sa.2006_01_11 sa.2006_01_22 sar.2006_01_08 sar.2006_01_19
sa.2006_01_12 sa.2006_01_23 sar.2006_01_09 sar.2006_01_20

Sample sar report
# more sar20
Linux 2.4.21-37.ELsmp (karp.alf.cpqcorp.net) 2006-01-20

00:00:00 proc/s
00:10:00 0.03
00:20:00 0.01
00:30:00 0.01
00:40:00 0.01
00:50:00 0.01
01:00:00 0.01
01:10:00 0.03
01:20:00 0.01
01:30:00 0.01
01:40:00 0.01
01:50:00 0.01
02:00:00 0.01
02:10:00 0.03
02:20:00 0.01
02:30:00 0.01
02:40:00 0.01

System Management Tools
 SNMP-based tools
 Examples:
– IBM: IBM Director Agents
– Dell: OpenManage Server Administrator
– HP: Insight Manager and Agents
 Agents monitor and log to system logs
 Predictive fault (if supported by driver)

Other tools
 Special situations
 vendor-specific cron script to gather
– /proc/meminfo
– top
– /proc/slabinfo
– vmstat
– netstat
– /proc/interrupts
– lsof
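
A generic version of such a cron script might look like the sketch below. It is not any vendor's script: the function names capture and gather_snapshot, the output layout, and the specific command options (top -b -n 1, vmstat 1 5, netstat -an, lsof -n) are assumptions chosen for illustration.

```sh
#!/bin/sh
# capture NAME CMD [ARGS...]: run CMD if it is installed and save its
# output (stdout and stderr) to $SNAP_DIR/NAME.txt. A failing command
# does not abort the rest of the snapshot.
capture() {
    name="$1"; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$SNAP_DIR/$name.txt" 2>&1 || true
    else
        echo "$1: not installed" > "$SNAP_DIR/$name.txt"
    fi
}

# gather_snapshot BASE: create a timestamped directory under BASE and
# collect the data sources listed on the slide. Prints the directory name.
gather_snapshot() {
    SNAP_DIR="$1/snapshot-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$SNAP_DIR" || return 1
    for f in meminfo slabinfo interrupts; do
        # /proc/slabinfo is usually readable by root only.
        if [ -r "/proc/$f" ]; then
            cat "/proc/$f" > "$SNAP_DIR/$f.txt" 2>&1
        fi
    done
    capture top top -b -n 1
    capture vmstat vmstat 1 5
    capture netstat netstat -an
    capture lsof lsof -n
    echo "$SNAP_DIR"
}
```

Saved as a script that ends with a call such as gather_snapshot /var/log/snapshots (a hypothetical directory), it can be run from root's crontab at whatever interval suits the problem being chased. Old snapshot directories need to be pruned separately.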

Crash Dump Issues
 Inconsistent crash dump methods
 Standard kernel
 deadlocks
 resources
 network throughput for network-based dumps
 assumes trusted kernel state
 where to dump
 ASR interference

Crash Dump tools

 netdump - RHEL
 diskdump - RHEL
 LKCD - SLES
 mkdump
 kdump

Netdump
 dumps to remote disk
 nic must support polled operation
 log file of panic, oops and other SysRq output
 Verify:
– # service netdump status
– # service netdump-server status
– Check /etc/sysconfig/netdump
 DEV=eth0 (or other nic)
 NETDUMPADDR={netdump-server IP}

Diskdump
 dumps to local disk
 limited controllers
 dump levels
 Available as of RHEL 3 U3
 Verify:

– # service diskdump status


– # cat /etc/sysconfig/diskdump

LKCD
 dumps to local disk
 can also dump to netdump-server (default)
 different dump levels
 Verify:

– # lkcd query

mkdump
 minikernel dump (based on mkexec)
 OpenSource
 Uses netdump and LKCD dump format

kdump
 kexec-based kernel crash dump mechanism
 OpenSource
 Use crash to analyze dump file

Dump Suggestions
 Disk-based dumps
 Network-based dumps
 Automatic Server Recovery (ASR)
timeouts
 Synchronize time
 Best effort

Test out Dump
 Enable the magic sysrq key
# sysctl -w kernel.sysrq=1
 Enable panic_on_oops
# sysctl -w kernel.panic_on_oops=1
 netdump: check to see if netlog is working
# echo h > /proc/sysrq-trigger
 netdump: Test SysRq writes to netdump log file
# echo m > /proc/sysrq-trigger
 Sync all mounted file systems
# echo s > /proc/sysrq-trigger
 Crash the system (test systems only: this forces a real crash)
– # echo c > /proc/sysrq-trigger (RHEL)
– # echo d > /proc/sysrq-trigger (SLES)
– crash.c (RHEL)
 diskdump – check /var/crash/127.0.0.1-<date>
 lkcd – check /var/log/dump/

System Snapshot
 Take snapshot of working system now
 Run normal working load while taking
snapshot
 Will discuss tools one can use shortly

Now that the System has
Crashed or Hung

Gathering the clues

Assume that tools have been preconfigured!


Before You Reboot
 Don’t delay!
 Console messages?
 ping?
 telnet, ssh, or rsh?
 If db server, query response?
 SysRq keys
 LED’s?
 Physical environment changes?
 Dump?
 Time of hang or crash
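
The ping and remote-login checks, together with the time they were made, can be recorded from a second machine. The sketch below is one way to do that; the function names probe and check_host are hypothetical, and it assumes the ssh client options BatchMode and ConnectTimeout are acceptable in your environment.

```sh
#!/bin/sh
# probe LABEL CMD [ARGS...]: run one responsiveness check and print a
# timestamped line "<UTC time> LABEL ok|FAILED".
probe() {
    label="$1"; shift
    if "$@" >/dev/null 2>&1; then
        result="ok"
    else
        result="FAILED"
    fi
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) $label $result"
}

# check_host HOST: the ping and ssh checks from the slide, run from a
# second machine against the system that appears hung.
check_host() {
    host="$1"
    probe "ping" ping -c 1 "$host"
    # BatchMode avoids hanging on a password prompt.
    probe "ssh" ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}
```

Redirecting the output of check_host to a file gives a timestamped record of which services still answered, which helps later when deciding between a hang and a crash.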

After the Reboot
 kernel (uname -a)
 loaded modules (lsmod)
 bus information (lspci -v)
 boot information (dmesg)
 system logs (/var/log/*)
 memory (/proc/meminfo)
 cpu (/proc/cpuinfo)
 disk (/proc/scsi/scsi)
 disk partition (/proc/partitions)
 installed RPMs (/var/log/rpminfo, “rpm -qa”)
 time of hang or crash
 cpu details – (dmidecode)
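
When no snapshot tool is installed, the items above can be collected into a single report by hand. The following is a rough sketch under stated assumptions: section and post_reboot_report are hypothetical names, and lspci -v is used for the bus listing. System logs under /var/log are not included and should be copied separately.

```sh
#!/bin/sh
# section TITLE CMD [ARGS...]: append a titled section with the output of
# CMD to the file named by $REPORT. A failing or missing command is noted
# in the report instead of aborting the run.
section() {
    title="$1"; shift
    {
        echo "===== $title ====="
        "$@" 2>&1 || echo "(command failed or not available: $1)"
        echo
    } >> "$REPORT"
}

# post_reboot_report FILE: collect the items from the slide into FILE.
post_reboot_report() {
    REPORT="$1"
    : > "$REPORT" || return 1
    section "kernel" uname -a
    section "loaded modules" lsmod
    section "bus information" lspci -v
    section "boot information" dmesg
    section "memory" cat /proc/meminfo
    section "cpu" cat /proc/cpuinfo
    section "disk" cat /proc/scsi/scsi
    section "disk partitions" cat /proc/partitions
    section "installed RPMs" rpm -qa
    section "DMI hardware details" dmidecode
}
```

Run as root, for example post_reboot_report /root/after-crash.txt (a hypothetical path), and record the time of the hang or crash alongside the report.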

Snapshot Tools for After Reboot

 sysreport (RHEL)
 sitar (SLES)
 config.sh (SLES)
 cfg2html (OpenSource)
 h/w diagnostic tools (vendor-specific)

sysreport
 Verify: rpm -qa | grep sysreport
 What does it generate?
# ls -w 50
boot free ksyms mount rpm-Va
date hardware.py lib proc uname
df hostname ls-boot ps uptime
etc ifconfig lsmod pstree var
fdisk-l installed-rpms lspci route

sitar
 Generates various reports detailing
– add-on's
– installed packages
– system info
– yast installed packages
 Reports created in different formats

sitar-generated files
# sitar
# ls /tmp/sitar-fwills.america.cpqcorp.net-2006020104/
.
..
sitar-addon-fwills.america.cpqcorp.net-yast2.sel
sitar-fwills.america.cpqcorp.net-yast1.sel
sitar-fwills.america.cpqcorp.net.html
sitar-fwills.america.cpqcorp.net.sdocbook.xml
sitar-fwills.america.cpqcorp.net.tex
sitar-sles-fwills.america.cpqcorp.net-yast2.sel

Sitar .html report
fwills.america.cpqcorp.net, Wed Feb 1 04:48:57 2006
Linux fwills 2.6.5-7.193-default #1 Wed Jul 20 14:39:18 UTC 2005 i686 i686 i386 GNU/Linux
SUSE LINUX Enterprise Server 9 (i586)

Table of Contents
 1. General Information
 2. CPU


1. General Information
Hostname fwills.america.cpqcorp.net
Operating System SUSE LINUX Enterprise Server 9 (i586)
UName Linux fwills 2.6.5-7.193-default #1 Wed Jul 20 14:39:18 UTC 2005 i686 i686 i386 GNU/Linux
Date Wed Feb 1 04:48:57 2006
Main Memory 385976 KByte
Cmdline root=/dev/sda2 vga=0x317 selinux=0 resume=/dev/sda3 elevator=cfq splash=silent
Load 0.00 0.00 0.00 1/79 2084
Uptime (minutes hours days) 181715 3028 124
Idletime (minutes hours days) 37934 632 26

2. CPU

config.sh
 What does it generate?

# ls -w 50
. iscsi.txt performance.txt
.. lvm.txt rcd.txt
boot.txt messages.txt release.txt
chkconfig.txt modules.txt rpm.txt
config.sh.txt mpio.txt rug.txt
cron.txt ncp.txt scsi.txt
env.txt network.txt siga.txt
evms.txt nss.txt softraid.txt
hwinfo.txt pam.txt y2log.txt

Tools to view sysstat data
 sar
 isag
 sarcheck

Sample sar commands
# sar -u 2 4
Linux 2.4.21-37.ELsmp (karp.alf.cpqcorp.net) 02/01/2006

03:01:50 PM CPU %user %nice %system %iowait %idle


03:01:52 PM all 0.50 0.00 0.25 0.75 98.49
03:01:54 PM all 0.00 0.00 0.50 0.00 99.50
03:01:56 PM all 0.00 0.00 0.00 0.25 99.75
03:01:58 PM all 0.25 0.00 0.50 0.00 99.25
Average: all 0.19 0.00 0.31 0.25 99.25

# cd /var/log/sa
# sar -A -f sa01 > sar01-new
# ls -l sa*01*
-rw-r--r-- 1 root root 132720 Feb 1 15:10 sa01
-rw-r--r-- 1 root root 181575 Feb 1 15:08 sar01-new
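
The same "sar -A -f" conversion can be applied to every daily file at once. This is a small sketch, assuming the RHEL-style saNN file names shown earlier (SLES uses a different naming pattern); sa_to_text is a hypothetical helper name.

```sh
#!/bin/sh
# sa_to_text DIR: for each binary sysstat file DIR/saNN, write the full
# report to DIR/saNN.txt using "sar -A -f" as on the slide.
sa_to_text() {
    dir="${1:-/var/log/sa}"
    for f in "$dir"/sa[0-9][0-9]; do
        [ -f "$f" ] || continue      # no match: the glob stays literal
        sar -A -f "$f" > "$f.txt" || echo "sar failed on $f" >&2
    done
}
```

Converting on the affected system itself is the safer choice, since binary sa files may not be readable by a different sysstat version on another machine.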

Reconstructing the Crash Scene
Check logs
 /var/log/messages
– Search for kernel load entry
– Work backwards and look for:
 Errors or Warnings
 Oops messages with trace output

 Other log files


 df output
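
Working backwards from the boot entry can be scripted. The sketch below prints the lines that precede the last restart marker in a log file; before_last_boot is a hypothetical name, and the default marker pattern ("syslogd ... restart") is an assumption that matches sysklogd-style logs and may need changing for other syslog daemons.

```sh
#!/bin/sh
# before_last_boot LOG [COUNT] [MARKER]: print up to COUNT lines (default
# 50) that come immediately before the last line matching MARKER.
before_last_boot() {
    log="$1"
    count="${2:-50}"
    marker="${3:-syslogd .*restart}"
    # Line number of the last boot marker in the file.
    line=$(grep -n "$marker" "$log" | tail -n 1 | cut -d: -f1)
    if [ -z "$line" ]; then
        echo "no boot marker found in $log" >&2
        return 1
    fi
    # Nothing precedes a marker on the first line.
    [ "$line" -gt 1 ] || return 0
    start=$((line - count))
    if [ "$start" -lt 1 ]; then
        start=1
    fi
    sed -n "${start},$((line - 1))p" "$log"
}
```

For example, before_last_boot /var/log/messages 100 shows the last hundred lines logged before the most recent restart, which is where errors, warnings, and Oops traces are most likely to appear.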

Oops
Oct 30 00:05:34 karp kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000008
Oct 30 00:05:34 karp kernel:  printing eip:
Oct 30 00:05:34 karp kernel: c011ec5d
Oct 30 00:05:34 karp kernel: *pde = 2aefa001
Oct 30 00:05:34 karp kernel: Oops: 0000
Oct 30 00:05:34 karp kernel: Kernel 2.4.9-e.38enterprise
Oct 30 00:05:34 karp kernel: CPU:    1
Oct 30 00:05:34 karp kernel: EIP:    0010:[get_module_list+61/816]    Tainted: P
Oct 30 00:05:34 karp kernel: EIP:    0010:[<c011ec5d>]    Tainted: P
Oct 30 00:05:34 karp kernel: EFLAGS: 00010246
Oct 30 00:05:34 karp kernel: EIP is at get_module_list [kernel] 0x3d
Oct 30 00:50:00 karp syslogd 1.4.1: restart.
Oct 30 00:50:00 karp syslog: syslogd startup succeeded
Oct 30 00:50:00 karp kernel: klogd 1.4.1, log source = /proc/kmsg started.
Oct 30 00:50:00 karp kernel: Inspecting /boot/System.map-2.4.9-e.38enterprise
Oct 30 00:50:00 karp syslog: klogd startup succeeded

ksymoops
>>EIP; c0113f8c <sys_init_module+49c/4d0>
Trace; c011d3f5 <sys_mremap+295/370>
Trace; c011af5f <do_generic_file_read+5bf/5f0>
Trace; c011afe9 <file_read_actor+59/60>
Trace; c011d2bc <sys_mremap+15c/370>
Trace; c010e80f <do_sigaltstack+ff/1a0>
Trace; c0107c39 <overflow+9/c>
Trace; c0107b30 <tracesys+1c/23>
Trace; 00001000 Before first symbol

SAR Data Example
15:20:00    dentunusd   file-sz  %file-sz  inode-sz  super-sz %super-sz  dquot-sz %dquot-sz  rtsig-sz %rtsig-sz
15:30:00      1866554      2728      2.08   2055024         0       0.00         0      0.00         1      0.10
15:40:01      1866909      2785      2.12   2055020         0       0.00         0      0.00         1      0.10

17:20:00      1870217      2786      2.13   2055019         0       0.00         0      0.00         1      0.10
17:30:00      1870516      2762      2.11   2055022         0       0.00         0      0.00         1      0.10
17:40:00      1870848      2785      2.12   2055019         0       0.00         0      0.00         1      0.10
17:50:00      1569671      2156      1.64   1743619         0       0.00         0      0.00         1      0.10
18:00:00      1570730      1984      1.51   1744880         0       0.00         0      0.00         1      0.10
18:10:00      1571240      1792      1.37   1745241         0       0.00         0      0.00         1      0.10
18:20:00      1571768      1510      1.15   1745796         0       0.00         0      0.00         1      0.10
18:30:00      1572100      1483      1.13   1745826         0       0.00         0      0.00         1      0.10
18:40:00      1573175        16      0.01   1747980         0       0.00         0      0.00         1      0.10
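
Sudden changes like the ones in this table can be picked out mechanically. The following is a minimal sketch, assuming 24-hour timestamps in the first column as in the sample above (a 12-hour format with AM/PM shifts every column by one); sar_drops is a hypothetical helper name.

```sh
#!/bin/sh
# sar_drops FILE COLUMN [PERCENT]: print each sample where the value in
# COLUMN fell by more than PERCENT (default 10) relative to the previous
# sample. COLUMN is counted from 1, with the timestamp as column 1.
sar_drops() {
    awk -v col="$2" -v pct="${3:-10}" '
        # Only consider rows that start with an HH:MM:SS timestamp and
        # carry a numeric value in the chosen column (this skips headers).
        $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ && $col ~ /^[0-9.]+$/ {
            if (seen && prev > 0 && (prev - $col) * 100 / prev > pct)
                print $1, prev, "->", $col
            prev = $col; seen = 1
        }
    ' "$1"
}
```

With the table above saved to a file, column 5 is inode-sz and column 3 is file-sz, so sar_drops report.txt 5 10 would flag samples where inode-sz fell by more than ten percent between readings (report.txt is a placeholder name).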

ISAG View of Sar Data

The Question of Debuggers
Torvalds quote
“I do see some good points in a kernel debugger, but I have yet to be
convinced that the good things outweigh the bad. The only valid uses of
debuggers is to get a stack backtrace and a register dump, imho, and
that is what you get from a kernel panic anyway (and the ksymoops.cc
program will actually make it readable for others than just me ;-)

I'm afraid that I've seen too many people fix bugs by looking at
debugger output, and that almost inevitably leads to fixing the symptoms
rather than the underlying problems.”

Ref: http://www.ussg.iu.edu/hypermail/linux/kernel/9510/0103.html

In The Dumps
 Recover the vmcore
 Tools to analyze:
– netdump / diskdump: use crash
– LKCD: use lcrash or crash
– mkdump: use lcrash or crash
 Key items: process stacks, system calls
 Requirements needed from crashed system

Netdump
 Check netdump log file first
– Oops or panic messages
– loaded modules
– SysRq memory, trace, process info
– Stack trace
 Use “crash” on vmcore
 syslog
 Ref: /usr/share/doc/netdump-*

Netdump-server Files

# ls -l /var/crash/16.113.5.104-2003-12-15-12:21
total 141108
-rw------- 1 netdump netdump 63067 Dec 15 2003 log
-rw------- 1 netdump netdump 134205440 Dec 15 2003 vmcore

# du -sk /var/crash/16.113.5.104-2003-12-15-12:21/vmcore
131192 /var/crash/16.113.5.104-2003-12-15-12:21/vmcore

Netdump-server log
# more log
Oops: 0002
Kernel 2.4.9-e.3
CPU: 0
EIP: 0010:[<c8a44076>] Tainted: P
EFLAGS: 00010282
EIP is at init_module [crash] 0x16
eax: 00000013 ebx: c8a44000 ecx: 00000000 edx: c543e000
esi: 00000000 edi: 00000000 ebp: c3149f28 esp: c3149f20
ds: 0018 es: 0018 ss: 0018
Process insmod (pid: 5619, stackpage=c3149000)
Stack: 00000000 00000060 00000060 c0118eb5 00000000 c36fb000 00000098 c35c6000
00000060 ffffffea 00000005 c468b740 00000060 c8a3f000 c8a44060 000002e8
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Call Trace: [<c0118eb5>] sys_init_module [kernel] 0x535
[<c8a44060>] init_module [crash] 0x0
[<c0106f03>] system_call [kernel] 0x33

Code: c6 05 00 00 00 00 00 b8 00 00 00 00 c9 c3 63 72 61 73 68 69
< netdump activated - performing handshake with the client. >

Process: 5619, { insmod}


Disk Dump

 check for vmcore in /var/crash/127.0.0.1-<date>


 use same crash tool as for netdump to debug
 Ref: /usr/share/doc/diskdumputils-*/

LKCD

 check /var/log/dump/n
 Use lcrash or crash to analyze vmcore
 Ref: /usr/share/doc/packages/lkcdutils/

What next?
 H/W vendor
 System Service Provider
 OS vendor
 OpenSource Community

Summary

 Tools to use to prep system


 Reconstruction of hang / crash scene
 Before a reboot
 After the reboot
 Methods to approach looking at data

Questions & Answers
???

Make your system talk
Prepare your system now
so it can tell you what
happened!

Appendix – More Information
 http://www.novell.com/coolsolutions/tools/16106.html -- SLES config.sh
 http://come.to/cfg2html -- cfg2html utility to gather system information
 http://www.linuxtroubleshooting.com/wiki/index.php?title=Main_Page -- Linux troubleshooting tools
 http://www.volny.cz/linux_monitor/isag/ -- isag
 http://rpmfind.net//linux/RPM/contrib/noarch/noarch/isag-4.1.1-1.noarch.html -- isag
 http://www.sarcheck.com/sclinux.htm -- sarcheck
 http://linuxgazette.net/issue59/nazario.html -- good dmesg description
 http://lkcd.sourceforge.net -- lkcd
 http://lkcd.sourceforge.net/doc/lcrash.pdf -- lcrash HOWTO
 http://lkcd.sourceforge.net/doc/lkcd_tutorial.pdf -- good lkcd tutorial
 /usr/share/doc/packages/lkcdutils/README.SuSE -- LKCD setup
 http://www.novell.com/coolsolutions/feature/14813.html -- SLES lkcd
 http://support.novell.com/cgi-bin/search/searchtid.cgi?10099561.htm -- SLES lkcd howto
 /usr/share/doc/diskdumputils-*/README -- diskdump setup
 http://www.redhat.com/support/wpapers/redhat/netdump/ -- netdump
 /usr/share/doc/netdump*/README* -- netdump / netdump-server
 http://www.linuxforums.org/forum/peripherals-hardware/35963-cpu-naming-schemes-x86-386-486-586-amd-64-ia64-em64t.html?highlight=naming+schemes -- good cpu chip reference
 http://mkdump.sourceforge.net -- mkdump
 http://lse.sourceforge.net/kdump/ -- kdump
 http://www.linuxdevcenter.com/lpt/a/1319 -- “Linux System Failure Post-Mortem”, by Jennifer Vesperman (O’Reilly Network)
 http://www.die.net/doc/linux/man/man5/proc.5.html -- manpage for /proc details
 http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0509wright/?ca=dgr-lnxw06DB2Linux -- good article on Linux memory utilization
 http://www.ataassociates.com/Process.htm -- accident reconstruction

Appendix - Vocabulary
 AMD64/EM64T – Similar X86 architectures w/ 64 bit mem registers
collectively known as X86_64
 ARC – Accident Reconstruction Consultant
 ASR - Automatic Server Recovery
 IA64 – CPU based on 64-bit Itanium chipset
 ISAG – Interactive System Activity Grapher
 lkcd – Linux Kernel Crash Dump utility
 mkdump – minikernel dump utility
 RHEL - Red Hat Enterprise Linux
 SAR – System Activity Report
 SLES - SuSE Linux Enterprise Server
 SysRq (aka magic keys) – key sequence intercepted by kernel to
perform certain operations
 x86 – CPU based on Intel 80x86 chipset

Appendix – Pre-Crash Check List
 Enable SysRq
 Enable sysstat
 Enable system management tools
 Develop emergency procedures
 Train staff in emergency procedures
 Configure and enable dump utility
 Take system snapshot on loaded/running system
 Setup remote console access
