The Red Hat Crash Utility is a kernel-specific debugger. It is usually used for postmortem analysis of a system that has panicked, locked up, or become unresponsive. In this short article, Eugene Teo gives a quick overview of how to install crash and how to use it to extract important information from crash dump files for debugging and root-cause analysis.
Prerequisites
The crash utility has the following three prerequisites:
- Kernel object file: A vmlinux kernel object file. The vmlinux file associated with the running kernel is typically found in the /boot directory for Red Hat Enterprise Linux 3 and in the /usr/lib/debug/lib/modules/ directory for both Red Hat Enterprise Linux 4 and 5.
- Kernel crash dump: A kernel crash dump file generated by any of the three crash dump facilities (Diskdump, Netdump, or Kdump). The file is called vmcore, or vmcore.incomplete if it was not generated completely, and is found in /var/crash/ by default. Diskdump is discussed later in this article.
- Linux kernel versions: The crash utility is backwards-compatible to at least Red Hat Linux 6.0, up to Red Hat Enterprise Linux 5.
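The first two prerequisites can be checked before starting crash. The following is a minimal sketch, not part of the crash package: the function names find_dump and check_prereqs are hypothetical, and the only facts taken from the article are the vmcore and vmcore.incomplete file names.

```shell
# find_dump DIR
# Print the dump file in DIR: "vmcore" if present, else "vmcore.incomplete".
# Returns non-zero when neither file exists. (Hypothetical helper.)
find_dump() {
    if [ -f "$1/vmcore" ]; then
        echo "$1/vmcore"
    elif [ -f "$1/vmcore.incomplete" ]; then
        echo "$1/vmcore.incomplete"
    else
        return 1
    fi
}

# check_prereqs VMLINUX DIR
# Confirm that a readable kernel object file and a dump file are both present.
check_prereqs() {
    if [ ! -r "$1" ]; then
        echo "no readable kernel object file at $1" >&2
        return 1
    fi
    if ! find_dump "$2" >/dev/null; then
        echo "no vmcore or vmcore.incomplete in $2" >&2
        return 1
    fi
    echo "ok: $1 + $(find_dump "$2")"
}
```

A call such as check_prereqs /usr/lib/debug/lib/modules/2.6.9-22.EL/vmlinux /var/crash/127.0.0.1-2007-04-30-21:38 would then report whether both files are in place.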
Install crash
Starting with the Red Hat Enterprise Linux 3 release, the crash utility is automatically installed during the system installation if the Development Tools package set is selected. If the crash utility is not installed, download and install the binary RPM as follows:
    # rpm -ivh crash-4.0-2.30.i386.rpm
    Preparing...                ########################################### [100%]
       1:crash                  ########################################### [100%]

The crash executable will be installed in the /usr/bin directory.
Also, before you can invoke crash on a vmcore, you need to install the associated kernel debuginfo package. The vmlinux kernel debug information is stored in a separate debuginfo file. The debuginfo package needs to match the kernel version, variant (like “smp” or “hugemem”) and architecture. You can download the packages at ftp://ftp.redhat.com/pub. See the comments in the following example:
    # file ./vmcore    <-- will show you the kernel architecture
    ./vmcore: ELF 32-bit LSB core file Intel 80386, version 1 (SYSV), SVR4-style, from 'vmlinux'
    # strings vmcore | fgrep -m1 'Linux '    <-- will show you the kernel variant
    Linux version 2.6.9-22.EL (bhcompile@porky.build.redhat.com) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 Mon Sep 19 18:20:28 EDT 2005
    # rpm -qa kernel-debuginfo    <-- will show you all the versions installed
    # rpm -ivh kernel-debuginfo-2.6.9-22.EL.i686.rpm
    Preparing...                ########################################### [100%]
       1:kernel-debuginfo       ########################################### [100%]
    # ls /usr/lib/debug/lib/modules/ -l
    total 24
    drwxr-xr-x 3 root root 4096 May 2 10:41 2.6.9-22.EL
    drwxr-xr-x 3 root root 4096 May 2 10:41 2.6.9-22.ELhugemem
    drwxr-xr-x 3 root root 4096 May 2 10:41 2.6.9-22.ELsmp
    # ls /usr/lib/debug/lib/modules/2.6.9-22.EL -l
    total 32848
    drwxr-xr-x 9 root root 4096 May 1 19:50 kernel
    -rwxr-xr-x 1 root root 33583473 Sep 20 2005 vmlinux
Take note that:
- You should use -ivh rather than -Uvh when installing the kernel package. This preserves the older version of the kernel so that you can revert to a known working version should you encounter any problems with the new one.
- The kernel-debuginfo package for an older kernel can safely remain installed when installing a newer version. The kernel-debuginfo must match the kernel version, variant, and architecture that created the vmcore. See the file ./vmcore and strings vmcore | fgrep -m1 'Linux ' commands in the above output.
- In Red Hat Enterprise Linux 5, the kernel-debuginfo package is divided into two packages: kernel-debuginfo-version.arch.rpm and kernel-debuginfo-common-version.arch.rpm. Both are required in order to perform crash dump analysis on Red Hat Enterprise Linux 5 kernels.
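Matching the debuginfo to the dump comes down to reading the release out of the "Linux version" banner. Here is a small sketch of that step; the function names kernel_release and debug_vmlinux are made up for illustration, and the path layout is the Red Hat Enterprise Linux 4 one shown in the listing above.

```shell
# kernel_release BANNER
# The release is the third whitespace-separated field of the banner, e.g.
#   banner=$(strings vmcore | fgrep -m1 'Linux version ')
kernel_release() {
    printf '%s\n' "$1" | awk '{ print $3 }'
}

# debug_vmlinux RELEASE
# Path of the debug vmlinux for RELEASE, assuming the
# /usr/lib/debug/lib/modules/<release>/ layout shown above.
debug_vmlinux() {
    echo "/usr/lib/debug/lib/modules/$1/vmlinux"
}

banner='Linux version 2.6.9-22.EL (bhcompile@porky.build.redhat.com) (gcc version 3.4.4 20050721 (Red Hat 3.4.4-2)) #1 Mon Sep 19 18:20:28 EDT 2005'
debug_vmlinux "$(kernel_release "$banner")"
```

A variant kernel carries its suffix in the release (for example 2.6.9-22.ELsmp), so the same lookup lands in the matching variant directory.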
Run crash
When crash is run on a vmcore, at least two arguments are always required:
- The vmlinux file associated with the running kernel, typically found in the /usr/lib/debug/lib/modules/ directory.
- The vmcore file.

For example:

    # crash /usr/lib/debug/lib/modules/2.6.9-22.EL/vmlinux /var/crash/127.0.0.1-2007-04-30-21\:38/vmcore
    [...]
          KERNEL: /usr/lib/debug/lib/modules/2.6.9-22.EL/vmlinux
        DUMPFILE: /home/eteo/crash/127.0.0.1-2007-04-30-21:38/vmcore
            CPUS: 1
            DATE: Mon Apr 30 21:38:40 2007
          UPTIME: 00:04:04
    LOAD AVERAGE: 0.36, 0.23, 0.08
           TASKS: 36
        NODENAME: localhost.localdomain
         RELEASE: 2.6.9-22.EL
         VERSION: #1 Mon Sep 19 18:20:28 EDT 2005
         MACHINE: i686 (1862 Mhz)
          MEMORY: 1 GB
           PANIC: "Oops: 0002 [#1]" (check log for details)
             PID: 2857
         COMMAND: "bash"
            TASK: f7b677f0 [THREAD_INFO: f7191000]
             CPU: 0
           STATE: TASK_RUNNING (SYSRQ)
    crash>
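The argument order is easy to get wrong, so it can help to print the command line before running it. This is only a sketch: crash_cmdline is a hypothetical helper, and it assumes the debug vmlinux lives in the per-release directory under /usr/lib/debug/lib/modules/.

```shell
# crash_cmdline RELEASE DUMPFILE
# Print (rather than run) the crash invocation: vmlinux first, then the dump.
crash_cmdline() {
    printf 'crash %s %s\n' "/usr/lib/debug/lib/modules/$1/vmlinux" "$2"
}

crash_cmdline 2.6.9-22.EL /var/crash/127.0.0.1-2007-04-30-21:38/vmcore
```

Printing first is a deliberate dry run: the path can be checked by eye before crash spends time loading the debug information.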
Setup Diskdump
Diskdump is one of the two crash dump facilities that we shipped with Red Hat Enterprise Linux 3 and 4. This article does not cover Netdump or Kdump.
Before you begin setting up Diskdump on your machine, read /usr/share/doc/diskdumputils-version/README to make sure that your machine has a supported storage adapter.
Assign a disk device to dump memory. It may be:
- a full disk device (for Red Hat Enterprise Linux 3 only), e.g. /dev/sda
- a partition of a disk device, e.g. /dev/sda2
- a swap partition (for Red Hat Enterprise Linux 4 only), e.g. /dev/sda2

Define the disk device in /etc/sysconfig/diskdump. In this example, we will use /dev/sda2:

    # vi /etc/sysconfig/diskdump    <-- add the line "DEVICE=/dev/sda2"
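If you would rather not edit the file by hand, the DEVICE line can be set from a script. The sketch below is an assumption-laden illustration rather than anything shipped with diskdumputils: set_dump_device is a hypothetical name, and it only assumes that the file holds one DEVICE=... line as described above.

```shell
# set_dump_device FILE DEVICE
# Replace any existing DEVICE= line in FILE with DEVICE=<device>,
# leaving every other line untouched. Safe to run more than once.
set_dump_device() (
    conf=$1
    dev=$2
    tmp=$(mktemp) || exit 1
    if [ -f "$conf" ]; then
        # grep -v exits non-zero when nothing is left; that is not an error here.
        grep -v '^DEVICE=' "$conf" > "$tmp" || true
    fi
    echo "DEVICE=$dev" >> "$tmp"
    # Write through the existing file so its ownership and mode are kept.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
)
```

On a real system the call would be set_dump_device /etc/sysconfig/diskdump /dev/sda2, run as root.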
Load the kernel module:
    # tail -f /var/log/messages &
    # modprobe diskdump
    Apr 30 21:29:20 kerndev kernel: disk_dump: Maximum block size: 16384
    Apr 30 21:29:20 kerndev kernel: disk_dump: total blocks required: 261770 (header 3 + bitmap 8 + memory 261759)
See /proc/diskdump after loading the diskdump kernel module:

    # cat /proc/diskdump
    # sample_rate: 8
    # block_order: 2
    # fallback_on_err: 1
    # allow_risky_dumps: 1
    # dump_level: 0
    # total_blocks: 261770
    #
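The total_blocks value here matches the figure logged when the module was loaded (header 3 + bitmap 8 + memory 261759 = 261770). To pick a single setting out of this listing in a script, something like the following sketch would do; diskdump_param is a hypothetical helper that assumes the "# name: value" layout shown above.

```shell
# diskdump_param NAME [FILE]
# Print the value of one "# name: value" line; FILE defaults to /proc/diskdump.
diskdump_param() {
    awk -v name="$1" '$1 == "#" && $2 == (name ":") { print $3; exit }' \
        "${2:-/proc/diskdump}"
}
```

For example, diskdump_param total_blocks prints the number of blocks the dump will need on a machine with the module loaded.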
Format the diskdump device:
    # service diskdump initialformat
    /dev/sda2: [100.0%]
See /proc/diskdump after formatting:

    # tail -n2 /proc/diskdump
    sda2 102398310 10233405
Enable the Diskdump service:

    # chkconfig diskdump on
    # service diskdump start
    Starting diskdump: [ OK ]
    # Apr 30 21:31:19 kerndev diskdump: activating succeeded
Test that Diskdump works. The following commands will crash your machine:

    # echo 1 > /proc/sys/kernel/sysrq
    # echo c > /proc/sysrq-trigger

Make sure that you run these two commands on a console (press Ctrl + Alt + F1), so that you can see what happens when the system crashes. You need to do this to have a vmcore file with which to follow the rest of this article. It will be located in /var/crash.
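After the reboot, the dump sits in a per-crash subdirectory of /var/crash. A small sketch for finding the most recent one follows; latest_dump is a hypothetical helper, and it assumes one subdirectory per dump, as in the /var/crash/127.0.0.1-2007-04-30-21:38/ path used earlier.

```shell
# latest_dump [BASE]
# Print the most recently modified vmcore* file one level below BASE
# (default /var/crash). Prints nothing if there is no dump yet.
latest_dump() {
    # ls -t sorts newest first; head keeps only the first entry.
    ls -1t "${1:-/var/crash}"/*/vmcore* 2>/dev/null | head -n 1
}
```

The vmcore* pattern also picks up a vmcore.incomplete file, so an interrupted dump is still found.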
Commonly Used Crash Commands
There are many commands in crash. It is also possible to extend crash with new commands, either by writing new code and compiling it into the crash executable, or by creating a shared object library that can be dynamically loaded with the extend command. The following are some commonly used crash commands:
- help – get help

crash has help information built into the utility, available by typing help. Each command has its own man-like page, which can be viewed by typing help command-name.

    crash> help

    *              files          mod            runq           union
    alias          foreach        mount          search         vm
    ascii          fuser          net            set            vtop
    bt             gdb            p              sig            waitq
    btop           help           ps             struct         whatis
    dev            irq            pte            swap           wr
    dis            kmem           ptob           sym            q
    eval           list           ptov           sys
    exit           log            rd             task
    extend         mach           repeat         timer

    crash version: 4.0-3.3   gdb version: 6.1
    For help on any command above, enter "help <command>".
    For help on input options, enter "help input".
    For help on output options, enter "help output".

Tip: all the crash commands can be piped to external programs or redirected to files:

    crash> log > log.txt

This will send the in-kernel log to a local file called log.txt.

    crash> ps | fgrep bash | wc -l

This will count the number of bash tasks that were running.

- sys – system data
    crash> sys
          KERNEL: /usr/lib/debug/lib/modules/2.6.9-22.EL/vmlinux
        DUMPFILE: /home/eteo/crash/127.0.0.1-2007-04-30-21:38/vmcore
            CPUS: 1
            DATE: Mon Apr 30 21:38:40 2007
          UPTIME: 00:04:04
    LOAD AVERAGE: 0.36, 0.23, 0.08
           TASKS: 36
        NODENAME: localhost.localdomain
         RELEASE: 2.6.9-22.EL
         VERSION: #1 Mon Sep 19 18:20:28 EDT 2005
         MACHINE: i686 (1862 Mhz)
          MEMORY: 1 GB
           PANIC: "Oops: 0002 [#1]" (check log for details)

The sys output contains information about the system (e.g. kernel release, kernel version, number of CPUs, amount of memory), the time at which the vmcore was taken, the uptime, and the panic (e.g. oops type, panic task/PID/command).

- bt – backtrace
    crash> bt
    PID: 2857   TASK: f7b677f0   CPU: 0   COMMAND: "bash"
     #0 [f7191e04] start_disk_dump at f89d7bb3
     #1 [f7191e18] die at c010682e
     #2 [f7191e48] do_page_fault at c011ab00
    [...]
     #9 [f7191fc0] system_call at c030f918
        EAX: 00000004  EBX: 00000001  ECX: b7de7000  EDX: 00000002
        DS:  007b      ESI: 00000002  ES:  007b      EDI: b7de7000
        SS:  007b      ESP: bfe01650  EBP: bfe01670
        CS:  0073      EIP: 003297a2  ERR: 00000004  EFLAGS: 00000246
- log – dump system message buffer
    crash> log
    [...]
    SysRq : Crashing the kernel by request
    Unable to handle kernel NULL pointer dereference at virtual address 00000000
    printing eip: c0233fa7
    *pde = 3e9f3067
    Oops: 0002 [#1]
    Modules linked in: md5 ipv6 autofs4 i2c_dev i2c_core sunrpc scsi_dump diskdump dm_mirror dm_mod button battery ac yenta_socket pcmcia_core uhci_hcd ehci_hcd shpchp snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ipw2200 ieee80211 ieee80211_crypt tg3 floppy ext3 jbd ata_piix libata sd_mod scsi_mod
    CPU: 0
    EIP: 0060:[] Not tainted VLI
    EFLAGS: 00010246 (2.6.9-22.EL)
    EIP is at sysrq_handle_crash+0x0/0x8
    eax: 00000063 ebx: c0370db4 ecx: 00000000 edx: 00000000
    esi: 00000063 edi: 00000000 ebp: 00000000 esp: f7191f60
    ds: 007b es: 007b ss: 0068
    Process bash (pid: 2857, threadinfo=f7191000 task=f7b677f0)
    Stack: c02342d8 c032dc4e c032f105 00000003 00000002 f7b6adc0 00000002 f7191fac
           c01a8a13 c0362740 c0168205 f7191fac b7de7000 f7b6adc0 fffffff7 b7de7000
           f7191000 c01682cf f7191fac 00000000 00000000 00000000 00000001 00000002
    Call Trace:
     [] __handle_sysrq+0x58/0xc6
     [] write_sysrq_trigger+0x23/0x29
     [] vfs_write+0xb6/0xe2
     [] sys_write+0x3c/0x62
     [] syscall_call+0x7/0xb
    Code: 4c 11 42 c0 05 00 00 00 c7 05 50 11 42 c0 2f cc 31 c0 c7 05 54 11 42 c0 00 00 00 00 c7 05 58 11 42 c0 00 00 00 00 e9 e5 0b f0 ff 05 00 00 00 00 00 c3 e9 e1 59 f3 ff e9 1e bc f3 ff 85 d2 89
The log command dumps the kernel log buffer contents in chronological order. This is similar to what you would see when you type dmesg on a running machine. It is useful when you want to look at the panic or oops message. An oops is triggered by an exception; it is a dump of the CPU register state and kernel stack at that instant. From the panic message, we can find hints as to how the panic was triggered (e.g. the function, process, PID, command, or address that triggered it), the register contents, the list of loaded kernel modules, whether the kernel is tainted by proprietary kernel modules, and so on. Let's walk through the panic message to see what we can learn from it. See the comments below each section of the log:

    crash> log
    [...]
    SysRq : Crashing the kernel by request    <-- this panic is intentional
    Unable to handle kernel NULL pointer dereference at virtual address 00000000

This is the address that the kernel attempted to reference.

    printing eip: c0233fa7

This is the address at which the failure occurred.

    *pde = 3e9f3067
    Oops: 0002 [#1]

Often one oops will trigger more; only the first is reliable. The [#1] counter shows that this is the first oops.

    Modules linked in: md5 ipv6 autofs4 i2c_dev i2c_core sunrpc scsi_dump diskdump dm_mirror dm_mod button battery ac yenta_socket pcmcia_core uhci_hcd ehci_hcd shpchp snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore ipw2200 ieee80211 ieee80211_crypt tg3 floppy ext3 jbd ata_piix libata sd_mod scsi_mod
    CPU: 0
    EIP: 0060:[] Not tainted VLI

The first part of the EIP line is the code segment and instruction address. If the kernel is tainted, "Tainted" will be followed by one or more of these flags:

- G – All modules loaded have a GPL or compatible license
- P – Proprietary modules loaded
- F – Module forcibly loaded
- S – Oops on hardware that is not SMP capable
- R – Module forcibly unloaded
- M – Machine Check Exception (MCE) occurred

For other flags, see the Further reading section.

    EFLAGS: 00010246 (2.6.9-22.EL)

This line denotes the program status; the register contents follow.

    EIP is at sysrq_handle_crash+0x0/0x8
    eax: 00000063 ebx: c0370db4 ecx: 00000000 edx: 00000000
    esi: 00000063 edi: 00000000 ebp: 00000000 esp: f7191f60
    ds: 007b es: 007b ss: 0068
    Process bash (pid: 2857, threadinfo=f7191000 task=f7b677f0)
    Stack: c02342d8 c032dc4e c032f105 00000003 00000002 f7b6adc0 00000002 f7191fac
           c01a8a13 c0362740 c0168205 f7191fac b7de7000 f7b6adc0 fffffff7 b7de7000
           f7191000 c01682cf f7191fac 00000000 00000000 00000000 00000001 00000002

These are the stack contents, including return addresses.

    Call Trace:

This is the backtrace of function calls.

     [] __handle_sysrq+0x58/0xc6
     [] write_sysrq_trigger+0x23/0x29
     [] vfs_write+0xb6/0xe2
     [] sys_write+0x3c/0x62
     [] syscall_call+0x7/0xb
    Code: 4c 11 42 c0 05 00 00 00 c7 05 50 11 42 c0 2f cc 31 c0 c7 05 54 11 42 c0 00 00 00 00 c7 05 58 11 42 c0 00 00 00 00 e9 e5 0b f0 ff 05 00 00 00 00 00 c3 e9 e1 59 f3 ff e9 1e bc f3 ff 85 d2 89

The printing eip: c0233fa7 line gives the address at which the failure occurred. The following command gives more hints as to which function, source line, or assembly statement in the kernel triggered it:

    crash> dis -lr c0233fa7
    /usr/src/build/614745-i686/BUILD/kernel-2.6.9/linux-2.6.9/drivers/char/sysrq.c: 115
    0xc0233fa7 <sysrq_handle_crash>:    movb   $0x0,0x0
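Entries such as sysrq_handle_crash+0x0/0x8 and __handle_sysrq+0x58/0xc6 pack three things together. Following the reading given in the reader comments at the end of this article (offset into the function, then approximate function length), a minimal sketch for splitting them apart could look like this; split_symbol_offset is a hypothetical name.

```shell
# split_symbol_offset ENTRY
# ENTRY looks like "__handle_sysrq+0x58/0xc6". Prints the symbol with the
# offset and length converted from hexadecimal to decimal.
split_symbol_offset() (
    entry=$1
    sym=${entry%%+*}     # text before the first "+"
    rest=${entry#*+}     # e.g. "0x58/0xc6"
    off=${rest%%/*}      # e.g. "0x58"
    len=${rest#*/}       # e.g. "0xc6"
    # printf accepts 0x-prefixed arguments for %d.
    printf '%s offset=%d length=%d\n' "$sym" "$off" "$len"
)

split_symbol_offset '__handle_sysrq+0x58/0xc6'
```

An offset of 0, as in sysrq_handle_crash+0x0/0x8, means the fault is at the very first instruction of the function, which agrees with the dis output above.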
- ps – display process status information
    crash> ps
       PID   PPID  CPU    TASK    ST  %MEM   VSZ   RSS  COMM
         0      0    0  c0358be0  RU   0.0     0     0  [swapper]
         1      0    0  f7e01770  IN   0.1  1680   684  init
    [...]
      2380      1    0  f7ac2800  IN   0.0  1604   504  mingetty
      2769   2371    0  f7ac3970  IN   0.2  5740  1636  bash
      2852      1    0  f7b1a880  IN   0.2  4240  2012  sshd
      2855   2852    0  f7b66680  IN   0.3  8316  2756  sshd
    > 2857   2855    0  f7b677f0  RU   0.2  6260  1628  bash

Sometimes it is useful to know which process belongs to which parent, or vice versa. ps has the -c and -p options to show the child and parent processes:

    crash> ps -p 2857
    PID: 0      TASK: c0358be0  CPU: 0   COMMAND: "swapper"
    PID: 1      TASK: f7e01770  CPU: 0   COMMAND: "init"
    PID: 2852   TASK: f7b1a880  CPU: 0   COMMAND: "sshd"
    PID: 2855   TASK: f7b66680  CPU: 0   COMMAND: "sshd"
    PID: 2857   TASK: f7b677f0  CPU: 0   COMMAND: "bash"
- files – open files
    crash> files
    PID: 2857   TASK: f7b677f0  CPU: 0   COMMAND: "bash"
    ROOT: /    CWD: /root
     FD    FILE     DENTRY    INODE    TYPE  PATH
      0  f7a6e7c0  f7790198  f7b0fdcc  CHR   /dev/pts/0
      1  f7b6adc0  f7190130  f7b9ca4c  REG   /proc/sysrq-trigger
      2  f7a6e7c0  f7790198  f7b0fdcc  CHR   /dev/pts/0
     10  f7a6e7c0  f7790198  f7b0fdcc  CHR   /dev/pts/0
    255  f7a6e7c0  f7790198  f7b0fdcc  CHR   /dev/pts/0

    crash> files 2852
    PID: 2852   TASK: f7b1a880  CPU: 0   COMMAND: "sshd"
    ROOT: /    CWD: /
     FD    FILE     DENTRY    INODE    TYPE  PATH
      0  f7b336c0  f78001d8  f7cb1ba4  CHR   /dev/null
      1  f7b336c0  f78001d8  f7cb1ba4  CHR   /dev/null
      2  f7b336c0  f78001d8  f7cb1ba4  CHR   /dev/null
      3  f7b69600  f7bf5280  f7aadafc  SOCK  socket:/[6277]
- dev – device data
    crash> help dev
    [...]
    If no argument is entered, this command dumps the contents of the chrdevs and blkdevs arrays.

    crash> dev
    CHRDEV    NAME         OPERATIONS
       1      mem          (none)
       4      /dev/vc/0    (none)
       4      tty          (none)
    [...]
    BLKDEV    NAME         OPERATIONS
       1      ramdisk      c0376d08
       2      fd           (unknown)
       8      sd           f880e070
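The commands above can also be collected into a file and run in one go, which is handy when the same first-look report is needed for every dump. The sketch below only writes the command file; write_crash_script is a hypothetical helper, and the idea of feeding the file to crash with its -i option is an assumption that should be checked against the help output of your crash version.

```shell
# write_crash_script FILE
# Write the commands covered above, one per line, ending with "exit"
# so that crash quits once the report is complete.
write_crash_script() {
    printf '%s\n' sys bt log ps files dev exit > "$1"
}

# Possible usage (assumes crash accepts "-i FILE" to read commands from a file):
#   write_crash_script /tmp/first-look.cmds
#   crash -i /tmp/first-look.cmds vmlinux vmcore > report.txt
```

Keeping the list in one place means every dump gets the same set of commands, in the same order, which makes reports easier to compare.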
Further reading
To learn more about this topic, check out the following reference links:
- White Paper: Red Hat Crash Utility by David Anderson, 2003
- More about Oops
- To find out more about Diskdump, read /usr/share/doc/diskdumputils-version/README
- Red Hat Kbase: How do I configure a Netdump Server and a Netdump Client?
- Red Hat Kbase: How do I configure kexec/kdump on Red Hat Enterprise Linux 5?
Acknowledgements
The author would like to thank Guy Streeter, Wade Mealing, and Masahiro Matsuya for reviewing and suggesting improvements for several drafts of this article.
About the author
Eugene Teo is Technical Account Manager of Red Hat’s Asia Pacific Global Support Services. Eugene received his bachelor’s degree in Computing from the National University of Singapore. In his spare time, Eugene enjoys learning new things, auditing the Linux kernel source code, and contributing kernel fixes.
15 responses to “A quick overview of Linux kernel crash dump analysis”
August 15th, 2007 at 12:27 pm
The link “ftp://ftp.redhat.com/pub” is formatted very strangely. The quotes for the tag’s href attribute appear to be “smart quotes” style or something other than a standard quote.
August 24th, 2007 at 10:44 pm
The URLs for the Red Hat Knowledge Base are bad. The URL and text before and after the FAQ URL is the problem.
How do I configure a Netdump Server and a Netdump Client?
- Bad: http://www.redhatmagazine.com/2007/08/15/a-quick-overview-of-linux-kernel-crash-dump-analysis/%E2%80%9Dhttp://kbase.redhat.com/faq/FAQ_43_2467.shtm%E2%80%9D
- Good: http://kbase.redhat.com/faq/FAQ_43_2467.shtm
How do I configure kexec/kdump on Red Hat Enterprise Linux 5?
- Bad: http://www.redhatmagazine.com/2007/08/15/a-quick-overview-of-linux-kernel-crash-dump-analysis/%E2%80%9Dhttp://kbase.redhat.com/faq/FAQ_105_9036.shtm%E2%80%9D
- Good: http://kbase.redhat.com/faq/FAQ_105_9036.shtm
August 25th, 2007 at 4:51 am
Thanks guys. I have informed the editor about the formatting problems.
September 4th, 2007 at 10:23 am
It is good article.
Can any one post few more details on crash dump on different hardware platform.
so we can come to know more error.
Thanks.
Ashish Barot.
September 20th, 2007 at 8:40 am
Hi,
Not very usable because of the formatting.
It would be better to provide also a pdf version.
Thanks
September 27th, 2007 at 8:02 am
Ashish, thanks for the suggestion.
Safir, I will provide a pdf version, and post the link here soon. Stay tuned.
October 9th, 2007 at 6:30 pm
Where can we get the kernel-debuginfo RPM’s for the production Red Hat kernels? YUM, UP2DATE and RHN don’t seem to have them available.
Do I need to rebuild the kernel from the SRPM?
October 9th, 2007 at 6:34 pm
Hi Philip,
You can download the packages at ftp://ftp.redhat.com/pub. No you do not need to rebuild the kernel from the source RPM.
Thanks.
October 9th, 2007 at 6:57 pm
Hi Philip,
If you are using yum, take a look at the .repo files in /etc/yum.repos.d/. For example, in Fedora 7, you can change enabled=0 to enabled=1 under [fedora-debuginfo] in the /etc/yum.repos.d/fedora.repo file. Once you have done that, type “yum clean all” on the command line, and then use yum -y install to install the kernel-debuginfo rpm you need.
Hope this info helps.
Eugene
October 17th, 2008 at 2:28 pm
This is nice but do you have something a little bit more recent? It seems “printing eip:” has been deprecated in Redhat 5.2. Unless I am doing something wrong.
TIA
November 16th, 2008 at 11:41 pm
[...] A quick overview of Linux kernel crash dump analysis [...]
November 18th, 2008 at 6:58 pm
Thanks Mag. I will try to update the article soon. Will keep you posted.
November 25th, 2008 at 10:20 pm
In the calltrace, does anyone know what the numbers after the + mean (see below)? For example, after __handle_sysrq is 0x58/0xc6. Is this some sort of offset into the function where the call was made? I need to pinpoint an exception in a lengthy function. Thanks in advance.
Call Trace:
[] __handle_sysrq+0x58/0xc6
[] write_sysrq_trigger+0x23/0x29
December 3rd, 2008 at 9:50 pm
Manish, reg:
+x/y
x represents the approx offset into the function.
y represents the approx total length of that function. Actually, y is the distance to the next global symbol. Therefore, these are approximations and not exact – but they do give you a “good enough” idea of where the faulting code lies. Typically, you could now try using objdump to disassemble and look at the offsets that show up, matching to the closest ‘x’ offset above.
December 16th, 2008 at 6:31 pm
Thanks Kaiwan!