Linux Kdump & Fadump: Comprehensive Guide
As an associate system administrator, I worked on Red Hat Linux servers, handling user management, permissions, services, and performance monitoring. I automated routine administrative tasks using Bash scripting and cron jobs, reducing manual effort by ~30%. I am an AWS Certified SysOps Administrator and a Google Certified Cloud Engineer, and I am determined to transition my career into a cloud architect or cloud support role.
This guide simplifies the technical concepts of Kdump and Fadump, the two primary mechanisms used to capture "crash dumps" (memory snapshots) when a Linux kernel fails.
1. The Basics: Why do we need Crash Dumps?
When a server crashes or hangs, the goal is First Failure Data Capture (FFDC). This means gathering all the system's memory information (vmcore) the moment the problem occurs so you can analyze it later using tools like crash.
Kdump (The Standard)
Kdump is the go-to mechanism for most Linux systems. It uses kexec to boot into a second, "clean" kernel (called the capture kernel) without performing a full hardware reboot.
- How it works: A small amount of RAM is reserved at boot time for this capture kernel. When the main kernel crashes, the system immediately jumps to the capture kernel to save the memory data to a disk or network.
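To confirm that the reservation described above took effect, one option is to look for the crashkernel= argument on the kernel command line and to read /sys/kernel/kexec_crash_loaded. The following is a minimal sketch; the helper names crashkernel_from_cmdline and capture_kernel_loaded are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract the crashkernel= value from a kernel command line.
# Pass the command line as $1; with no argument, the running kernel's
# /proc/cmdline is used. Returns 1 if no crashkernel= argument is present.
crashkernel_from_cmdline() {
    cmdline=${1:-$(cat /proc/cmdline 2>/dev/null)}
    for word in $cmdline; do
        case "$word" in
            crashkernel=*)
                printf '%s\n' "${word#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Hypothetical helper: succeed only if a capture kernel is loaded.
# The sysfs file contains 1 when a capture kernel is loaded and 0 otherwise.
# The optional $1 lets you point at a different file for testing.
capture_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}
```

A quick check on a live system could then be: crashkernel_from_cmdline && capture_kernel_loaded && echo "kdump is armed". If several crashkernel= arguments are present, this sketch reports only the first one.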
Fadump (The IBM POWER Alternative)
Firmware-Assisted Dump (Fadump) is specific to IBM POWER systems. Instead of relying on a second kernel sitting in memory, it uses the system's firmware to preserve memory.
- Why use it? It is more robust than Kdump because it fully resets the hardware (PCI slots, I/O devices) before capturing the dump. This ensures a "clean" environment if a hardware driver caused the crash.
2. Configuration Comparison: RHEL vs. SLES
While both distributions use the same underlying technology, the commands and file paths differ slightly.
Step 1: Reserving Memory
You must tell the system how much RAM to set aside for the crash mechanism via the crashkernel boot parameter.
Feature | RHEL (Red Hat / CentOS) | SLES (SUSE) |
Tool | grubby | GRUB configuration file |
Command | | |
Apply Changes | Reboot system | |
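The commands for this step can be sketched as follows. On RHEL, grubby can append the argument to every installed kernel. On SLES, one common approach is to append it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerate the GRUB configuration. The value 256M is only a placeholder, and add_grub_default_arg is a hypothetical helper, not a distribution tool.

```shell
#!/bin/sh
# RHEL (run as root): append the argument to all installed kernels, then reboot.
#   grubby --update-kernel=ALL --args="crashkernel=256M"
#
# SLES (run as root): edit the defaults file, regenerate the config, then reboot.
#   add_grub_default_arg /etc/default/grub crashkernel=256M
#   grub2-mkconfig -o /boot/grub2/grub.cfg

# Hypothetical helper: append ARG to GRUB_CMDLINE_LINUX_DEFAULT="..." in FILE.
# Assumes the value is double-quoted on a single line and that ARG contains
# no '|' or '&' characters (both are special in the sed expression below).
add_grub_default_arg() {
    file=$1
    arg=$2
    # Skip the edit if this exact argument is already present.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${arg}" "$file"; then
        return 0
    fi
    tmp=$(mktemp) || return 1
    # Re-emit the line with ARG inserted just before the closing quote.
    sed "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${arg}\"|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}
```

The helper only skips the edit when the exact same argument is already there, so an existing crashkernel= entry with a different size should be removed by hand first.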
Step 2: Selecting a Target
Where should the vmcore file be saved? You can choose local storage or a remote server.
Local: /var/crash (default).
Network (SSH): Sends the dump to a remote server over an encrypted SSH connection.
Network (NFS): Mounts a remote file system to save the dump.
Raw: Writes directly to a specific partition (e.g., /dev/sdb1).
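On RHEL, each of these targets maps to a few lines in /etc/kdump.conf. The sketch below prints candidate lines for one target type so they can be reviewed before being added to the file. The directive names (path, ssh, sshkey, nfs, raw) are kdump.conf keywords; the helper name kdump_target_lines and any host, export, or device you pass in are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print candidate /etc/kdump.conf lines for one target type.
# $1 is the target type (local, ssh, nfs, raw); $2 is the target value,
# for example user@host for ssh, server:/export for nfs, or a partition for raw.
kdump_target_lines() {
    case "$1" in
        local)
            printf 'path /var/crash\n'
            ;;
        ssh)
            # sshkey points at the key kdump uses to log in to the remote host.
            printf 'ssh %s\nsshkey /root/.ssh/kdump_id_rsa\npath /var/crash\n' "$2"
            ;;
        nfs)
            printf 'nfs %s\npath /var/crash\n' "$2"
            ;;
        raw)
            printf 'raw %s\n' "$2"
            ;;
        *)
            echo "unknown target type: $1" >&2
            return 1
            ;;
    esac
}
```

For example, kdump_target_lines raw /dev/sdb1 prints the single line "raw /dev/sdb1". On SLES the equivalent setting lives in /etc/sysconfig/kdump rather than in kdump.conf, so this sketch applies to the RHEL file only.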
3. Optimizing the Dump with makedumpfile
Memory dumps can be massive. To save space, Linux uses a "core collector" called makedumpfile to compress the data and exclude unnecessary parts.
Common Filtering Levels (-d flag):
Level 1: Exclude zero-filled pages.
Level 16: Exclude free pages (most common).
Level 31: Exclude everything except kernel data (smallest file size).
RHEL Example in /etc/kdump.conf:
core_collector makedumpfile -l --message-level 1 -d 31
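The -d value is a bitmask, which is why 31 (1 + 2 + 4 + 8 + 16) excludes every optional page type at once. As an illustration, the sketch below decodes a level into the page types it excludes. The bit meanings follow the makedumpfile man page; the helper name decode_dump_level and the short labels it prints are made up here.

```shell
#!/bin/sh
# Hypothetical helper: list the page types a makedumpfile -d level excludes.
# Bit values, per the makedumpfile man page:
#   1 = zero pages, 2 = non-private cache pages, 4 = private cache pages,
#   8 = user data pages, 16 = free pages
decode_dump_level() {
    level=$1
    out=""
    for pair in 1:zero-pages 2:cache-pages 4:private-cache-pages \
                8:user-data-pages 16:free-pages; do
        bit=${pair%%:*}     # number before the colon
        name=${pair#*:}     # label after the colon
        if [ $(( $level & $bit )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    # Strip the leading space left by the first append.
    printf '%s\n' "${out# }"
}
```

For example, decode_dump_level 17 prints "zero-pages free-pages", the combination of levels 1 and 16 from the list above.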
4. Specifics for Fadump (IBM POWER)
Fadump must be explicitly enabled. It uses the /sys/kernel/fadump/ directory for management.
To enable Fadump:
Add fadump=on to your kernel boot parameters using grubby (RHEL) or by editing the GRUB file (SLES).
Verify status: cat /sys/kernel/fadump/enabled (1 means active).
Registration: The system must register with the firmware to handle the crash:
echo 1 > /sys/kernel/fadump/registered
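The two sysfs checks above can be combined into one status report. This is a minimal sketch that assumes the enabled and registered files sit under /sys/kernel/fadump/ as described in this section; fadump_status and the status words it prints are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: summarise fadump state from its sysfs directory.
# $1 is the directory to inspect; it defaults to /sys/kernel/fadump.
fadump_status() {
    dir=${1:-/sys/kernel/fadump}
    if [ ! -r "$dir/enabled" ]; then
        # No fadump entries here (for example, not an IBM POWER system).
        echo "unsupported"
        return 1
    fi
    if [ "$(cat "$dir/enabled")" != "1" ]; then
        # fadump=on was not in effect at boot.
        echo "disabled"
    elif [ "$(cat "$dir/registered" 2>/dev/null)" = "1" ]; then
        # The kernel has registered with the firmware.
        echo "registered"
    else
        # Enabled but not yet registered; as root:
        #   echo 1 > /sys/kernel/fadump/registered
        echo "enabled-unregistered"
    fi
}
```

Running fadump_status with no argument inspects the live system; passing a directory makes it easy to try the logic against sample files.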
5. Summary Checklist
Install: Ensure kexec-tools is installed.
Reserve: Set the crashkernel size in GRUB.
Configure: Set the destination path and compression in /etc/kdump.conf (RHEL) or /etc/sysconfig/kdump (SLES).
Test: Trigger a "fake" crash to ensure it works. Run the two commands below as root, one after the other; the second one crashes the system immediately:
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
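After the system comes back up, the test only counts as a success if a dump was actually written. The sketch below lists dump files under a crash directory. It assumes the default local target /var/crash and that each dump lands in its own subdirectory as a file whose name starts with vmcore; list_vmcores is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: list dump files saved under a crash directory.
# $1 is the directory to search; it defaults to /var/crash.
# The pattern also matches companion files such as vmcore-dmesg.txt.
list_vmcores() {
    find "${1:-/var/crash}" -type f -name 'vmcore*' 2>/dev/null | sort
}
```

An empty result after a test crash suggests the capture kernel did not run or could not reach its target. A listed vmcore can then be opened with the crash tool mentioned earlier, together with a debug-symbol vmlinux that matches the crashed kernel.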