https://blogs.oracle.com/linux/generating-a-vmcore-in-oci
https://blogs.oracle.com/linux/vmcore-smaller-than-you-think
https://blogs.oracle.com/linux/enter-the-drgn
https://blogs.oracle.com/linux/extracting-kernel-stack-function-arguments-from-linux-x86-64-kernel-crash-dumps
https://man7.org/linux/man-pages/man2/kexec_load.2.html
https://www.kernel.org/doc/Documentation/kdump/kdump.txt
https://github.com/makedumpfile/makedumpfile
https://manpages.debian.org/testing/makedumpfile/makedumpfile.conf.5.en.html#FILTER_COMMANDS
https://man7.org/linux/man-pages/man5/elf.5.html
https://sourceforge.net/projects/lkdump/
https://github.com/makedumpfile/makedumpfile/blob/f23bb943568188a2746dbf9b6692668f5a2ac3b6/diskdump_mod.h#L44
https://linux.die.net/man/8/makedumpfile
https://github.com/ptesarik/libkdumpfile/tree/tip/examples
https://qemu-project.gitlab.io/qemu/interop/qemu-qmp-ref.html#qapidoc-1360
https://xenbits.xen.org/docs/4.6-testing/misc/dump-core-format.txt
https://github.com/crash-utility/crash/blob/3253e5ac87c67dd7742e2b2bd9d912f21c1d2711/main.c#L427-L694
https://crash-utility.github.io/
https://sourceforge.net/projects/pykdump/
https://github.com/ptesarik/libkdumpfile
https://drgn.readthedocs.io/en/latest/
https://github.com/brenns10/kernel_stuff/tree/master/vmcoreinfo
https://github.com/torvalds/linux/blob/master/fs/proc/kcore.c
https://github.com/makedumpfile/makedumpfile/blob/master/diskdump_mod.h
https://github.com/qemu/qemu/blob/master/dump/dump.c
https://github.com/crash-utility/crash/blob/master/diskdump.c
https://github.com/ptesarik/libkdumpfile/blob/tip/src/kdumpfile/diskdump.c
https://github.com/ptesarik/libkdumpfile/blob/tip/src/kdumpfile/elfdump.c

Linux kernel core dumps are often critical for diagnosing and fixing problems with the OS. We’ve published several blogs related to kernel core dumps, including how to generate them, how to estimate their size, how to analyze them with Drgn, and even how to manually extract stack function arguments from them. But have you ever wondered what’s really in a core dump? In this blog, we’ll answer that question. We’ll start by discussing the different software which can actually produce a vmcore (there are more than you might think). Next, we’ll discuss the contents of vmcores, including what important metadata needs to be present in order to make analysis possible. We’ll then dive into the details of a few of the most prominent vmcore formats and variations on them, before finishing with a quick overview of some tools that can be used to analyze them.
This topic has a lot of history and so much variation that there’s no hope of covering it all. Instead, we’ll focus on the sources and formats most commonly found with Oracle Linux, which should cover much of the modern desktop and server Linux landscape. This blog isn’t intended to be a step-by-step guide to achieving a particular task; it’s just a reference and introduction to a field that’s not frequently discussed.
Core dump sources
When the Linux kernel encounters an unrecoverable error (a “panic”) or a hang, it’s incredibly useful to save the state of memory, registers, etc. into a file for later analysis & debugging. In these situations, the system may be in a dire state, so the process which creates the core dump must be reliable. Further, downtime and disk space can be expensive, so the process must also be reasonably quick, and should avoid capturing unnecessary information. This is a tall order, and so there are multiple ways a core dump can be created, each with different trade-offs.
For distributions like Oracle Linux, the most common way to create a vmcore is by using kexec(8), generally with makedumpfile. However, it can also be done by a hypervisor or firmware level crash dump system. We’ll explain and discuss the different possibilities in this section.
Kexec crash kernels
On a system which is configured for kexec crash dumps, some memory is reserved at boot time for a second Linux kernel (the “kdump kernel”). On startup, the system uses kexec_load(2) to load a kernel image into this reserved memory region. If a panic occurs, all CPUs are halted and control is transferred to the kdump kernel. The kdump kernel boots up and represents the memory image of the previous kernel as the file /proc/vmcore – thus the name “vmcore”. Normally, the kdump kernel is configured to execute a tool (typically makedumpfile) which will then save this file to a disk or network location, and then reboot.
The /proc/vmcore file is thus one of the most common sources for kernel core dumps. The data is represented in ELF (Executable and Linkable Format), which we will discuss a bit more later on. However, if you were to go searching on your Linux machine for a /proc/vmcore file, you probably wouldn’t find it, because it only appears when you’re running within a kdump kernel. (More precisely, only when the elfcorehdr command-line option points at a valid ELF header created by the original kernel). But there is also another ELF formatted core image which the kernel provides: /proc/kcore.
Linux running kernels
Unlike /proc/vmcore, the file /proc/kcore is always available. Rather than showing the memory image of the previously crashed kernel, it shows the live memory image of the currently running kernel. It’s common for people to run live debugging tools like crash or drgn against /proc/kcore, since it’s always present and easy to access, assuming that security features such as lockdown=confidentiality are not active.
You could also create a copy of /proc/kcore, much as you could with /proc/vmcore; however, this isn’t terribly common. Typical use of /proc/kcore is for live debugging, while /proc/vmcore is normally saved for later inspection.
Makedumpfile
Both /proc/vmcore and /proc/kcore are direct representations of the memory space of a kernel, using the ELF format. This means that the files take up roughly the same amount of space as the physical memory in use by the kernel. On a laptop with 8GiB or 16GiB of memory, that may not be too bad, but for servers with 100s of GiBs or even TiBs of memory, that’s just not acceptable.
The makedumpfile tool is used to create a much smaller dump file, using two main strategies. First, it can omit data in memory that may not be useful for debugging the kernel (e.g. memory filled with zeroes, free memory, userspace memory, etc). Second, for the data that is included, it can compress each page of memory, if the output format supports it. Incidentally, it also supports the ability to “filter out” data for particular symbols or data structures, but this doesn’t reduce the data size: it simply redacts the data.
In a typical configuration, the kdump kernel is configured to use makedumpfile to save the /proc/vmcore file to disk or the network. You might expect this to take longer than simply copying the file to disk, but usually it’s faster: the slow part is writing the file to disk, and by omitting and compressing data, the time spent doing I/O is greatly decreased. So the end result is a vmcore which is quicker to generate, and takes up much less space than the original /proc/vmcore file would have. It’s less common, but perfectly valid to use makedumpfile to create a compressed dump of your currently-running kernel too, by running it on /proc/kcore instead. Makedumpfile also supports running against already-created core dump files, in order to re-filter them or convert the format.
Hypervisors
So far, all of the core dumps we’ve discussed came from Linux’s /proc/kcore or /proc/vmcore, with a possible intermediate step through makedumpfile. However, the Linux kernel is frequently run as a virtual machine guest, and in those cases, a hypervisor is responsible for managing the VM’s memory. It’s entirely possible for a hypervisor to create a core dump itself, by pausing the execution of the VM (to ensure consistency), and then saving a copy of that memory and the CPU state. This may be necessary in rare cases where the VM guest is unresponsive for some reason. If the normal means of triggering a panic & kdump within the guest OS are unsuccessful, a hypervisor core dump could provide you the necessary information to resolve the issue.
A common current example of a hypervisor creating a core dump is QEMU, which supports the dump-guest-memory QMP/HMP command, frequently used via libvirt’s virsh dump command. You can also create memory dumps with Hyper-V, Xen, and other hypervisors.
Some hypervisors, such as Hyper-V, have their own custom dump formats. Others, like QEMU, support a range of formats. And naturally, each hypervisor tends to create its core dump format using its own implementation and quirks. For example, ELF vmcores generated by QEMU will look different than the ELF /proc/vmcore, which itself even looks different than the /proc/kcore file. As a result, it’s not always enough to simply know what format a vmcore is in: you may also need to know what created it.
Others
While /proc/vmcore, /proc/kcore, and hypervisors are certainly the most common sources of core dumps we see, this list is by no means exhaustive. Some server vendors provide firmware-based diagnostics and dumping mechanisms, which are similar in principle to hypervisor core dumps: the firmware takes the place of the hypervisor, halting the machine and saving elements of physical memory. These vendor-specific solutions will vary widely in capability, quality, and their output formats. And even outside of these solutions, there are other application-specific solutions (e.g. those which are better suited for embedded devices). And of course, there are historical systems & formats which are mostly unused today.
For the purpose of this article, we can’t cover all of those, so we’ll stick to the world of Linux ELF vmcores, makedumpfile, and hypervisors. These are the ones we see most commonly with Oracle Linux.
Data contained in core dumps
As we’ve seen, there’s quite a diversity of tools which can be used to create a kernel core dump. But they’re all trying to achieve the same end goal: provide enough data that an analysis tool can understand the dump, allowing a user to analyze the information and debug whatever issue led to the core dump being created in the first place.
The main information provided by the core dump is, of course, the contents of memory. However, on its own, the memory contents are not enough information for tools to interpret meaningful information such as variable values, stack traces, log buffer contents, etc. Tools will need additional information to properly interpret a core dump:
Probably the most important information is the exact kernel release as reported by uname -r, for example: 5.15.0-104.119.4.2.el9uek.x86_64. This string usually identifies a specific kernel build released by your distribution, and it is generally enough to identify the specific *-debuginfo package which applies to that kernel. Increasingly, debuggers are relying on a special value called the “build ID” which is unique to each build and can also be used to find debugging information, so the build ID may be an important piece of metadata as well. The debugging information typically contains an ELF symbol table as well as DWARF information that describes variables, types, and much more, allowing debuggers to introspect data structures.
Another fundamental requirement is to know the architecture that the core dump came from. This obviously includes the architecture name (x86_64, aarch64, etc.), but it may also include details such as page size, word size, or endianness, which are not always implied by the architecture name alone.
The kernel is responsible for managing memory, and that includes maintaining the page tables. The vast majority of the kernel works with virtual addresses, and so debugging tools need to understand those virtual address mappings. Some core dump formats can represent the virtual address space as part of their memory encoding. In other cases, the core dump simply represents physical memory addresses, and leaves the debugger to find and interpret the page tables via other metadata. In Linux, the symbol swapper_pg_dir refers to the virtual address of the root page table. With some architecture specific metadata to translate that virtual address to its corresponding physical address, and with page table support for the architecture, a debugger can traverse these tables and understand the virtual address mappings.
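To make that traversal a bit more concrete, here is a minimal sketch in Python (assuming x86_64 with 4 KiB pages and 4-level paging; the constants are not taken from any particular tool) of how a debugger splits a virtual address into the indices used at each level of the page table rooted at swapper_pg_dir:

# Hypothetical illustration: split an x86_64 kernel virtual address into
# 4-level page table indices, assuming 4 KiB pages and 48-bit addressing.
def pgtable_indices(vaddr: int) -> dict:
    return {
        "pgd": (vaddr >> 39) & 0x1FF,   # index into the top-level table (swapper_pg_dir)
        "pud": (vaddr >> 30) & 0x1FF,
        "pmd": (vaddr >> 21) & 0x1FF,
        "pte": (vaddr >> 12) & 0x1FF,
        "offset": vaddr & 0xFFF,        # byte offset within the 4 KiB page
    }

print(pgtable_indices(0xffffffff81000000))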
Most kernel configurations randomize at boot time the physical base address at which the kernel is loaded, as well as the virtual memory addresses where the kernel is mapped. These are different forms of so-called KASLR, or Kernel Address Space Layout Randomization. While it’s possible to search for well-known data in a core dump in order to “break” the KASLR, this takes a long time and can be error-prone, so it’s important for the dump to contain these KASLR offsets.
Finally, the values of the registers for each CPU are very important. These include the program counter & stack pointer, which are crucial for creating an accurate stack trace for the code executing on a particular CPU. These are typically represented in an architecture-specific ELF note called NT_PRSTATUS.
These are just the low-level metadata that a debugger would need in order to interpret variables and data structures in memory, as well as unwind the stacks of active tasks. However, there are other kinds of metadata which need to be considered. For example, makedumpfile, which we’ve already discussed, has the ability to omit certain kinds of memory pages from its output. In order to do that, makedumpfile needs to be able to understand the kernel’s array of page frames. It can use debuginfo for this, but in practice, debuginfo is rarely available at runtime. So makedumpfile can use additional metadata that declares the sizes and member offsets of certain structures, as well as addresses of certain essential symbols. Other tools directly use the vmcore (without debuginfo) to extract the kernel log buffer, and so metadata for symbols and types related to the log buffer are also frequently needed.
All told, there’s quite a bit of metadata that’s necessary to interpret the kernel core dump. Some of this metadata is represented by the dump format intrinsically (e.g. ELF contains the architecture information in its header, and can hold virtual address mappings in the program headers). However, many of the Linux-specific details, such as page table locations, KASLR offsets, kernel release, and more are contained in a special piece of data called the “vmcoreinfo”.
The vmcoreinfo is a text-formatted, key=value block of data. Normally, a page of memory is allocated at startup by the Linux kernel, and the data is formatted and written into this page. The page is never overwritten or deallocated, so it’s always available – if you can find it. In the event of a panic, the kernel includes the already-created vmcoreinfo into the ELF /proc/kcore and /proc/vmcore files as an ELF note. This makes it easy to explore on a running system: simply run eu-readelf -n /proc/kcore as root and look for the “VMCOREINFO” note and its data. Much of the above metadata is represented as keys in this text, for example:
OSRELEASE – the kernel release
BUILD-ID – the build ID of the kernel (vmlinux)
PAGESIZE – physical page frame size
KERNELOFFSET – this is x86_64 specific, but it contains the KASLR offset
SYMBOL(swapper_pg_dir) – the root page table
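Because the note is plain key=value text, pulling it apart requires nothing more than string handling. Here is a minimal Python sketch; the sample values are illustrative (the OSRELEASE string is borrowed from earlier in this article, and the swapper_pg_dir address is made up):

# Minimal sketch: parse the text of a VMCOREINFO note (as printed by
# "eu-readelf -n") into a dictionary. The note is simply KEY=VALUE lines.
def parse_vmcoreinfo(text: str) -> dict:
    info = {}
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            info[key.strip()] = value.strip()
    return info

sample = (
    "OSRELEASE=5.15.0-104.119.4.2.el9uek.x86_64\n"
    "PAGESIZE=4096\n"
    "SYMBOL(swapper_pg_dir)=ffffffff84012000\n"   # made-up address for illustration
)
info = parse_vmcoreinfo(sample)
print(info["OSRELEASE"], int(info["PAGESIZE"]))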
The strength of the vmcoreinfo is that it is text-based, so it’s easy to extend with new information that wasn’t anticipated, unlike binary file formats. Another benefit is that it’s included in kernel memory anyway, so it’s almost always present somewhere in a core dump, if you’re willing to search for it. The downside is that core dump formats need to be aware of it and either include a copy, or include a pointer to it. If that copy or pointer is lost, or if the tool which created the dump never knew where it was (e.g. hypervisors), then it can be difficult or impossible to use with tools like Crash and Drgn.
Core dump formats
Now that we know the diversity of core dump producers, and that there is a large amount of metadata that needs to be well represented in a core dump, we’re ready to tackle an incomplete list of the more common core dump formats. In each section, we’ll show how to identify the format (frequently, by using the file utility, but sometimes using a hex dumper like xxd). We’ll also describe the benefits and drawbacks, along with the common producers and consumers.
ELF
The Executable and Linkable Format (ELF) is ubiquitous in the Linux world. Most programs are in ELF, as are the intermediate outputs of the compiler, and also the core dumps of userspace programs. In order to suit that diversity of use cases, ELF has to be very flexible.
ELF is essentially composed of four parts:
The ELF header
The section headers (optional)
The program headers (optional)
Data
The ELF header points to section & program headers, and gives some basic information about the architecture. The section & program headers are optional (but at least one needs to be included), and they define regions of the data and provide metadata about them. The section headers are intended to be more useful for a compiler or linker: they define a series of “sections” of data in the file, all of which are named and have different types and some flags. The program headers, on the other hand, define “segments”, which are intended to be used while executing a program. Each entry describes which parts of the file should be loaded into memory at what address, with what permissions. It’s like a recipe for creating a process image in memory.
ELF is typically used to represent core dumps using the program headers. Rather than describing how a loader should create the program in memory, a core dump contains the memory contents of the program, and the program headers describe where the contents were mapped at the time of the crash. The Linux kernel creates userspace core dumps using the ELF format in this manner, and so it’s not surprising that /proc/kcore and /proc/vmcore are also represented as ELF files with memory regions described by program headers.
But how is the metadata handled? Some of it is included in the ELF header, especially the architecture. The remainder is usually stored in ELF “notes”. Program headers and section headers can both declare a section of the file as containing specially-formatted “note” data that can have arbitrary contents. The following notes are contained in vmcore/kcore files generated by Linux:
Notes of type PRSTATUS, which contain the registers for every CPU. (The /proc/kcore file only contains the PRSTATUS for the running CPU. Getting the data for other CPUs would require sending inter-processor-interrupts to collect the data, and by the time it was returned to userspace, the data would be stale anyway.)
A note of type PRPSINFO can contain the kernel command line.
A note of type VMCOREINFO will contain the vmcoreinfo text, with the metadata described above.
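For a sense of how little machinery is needed to walk these notes, here is a rough Python sketch that iterates over the PT_NOTE segments of a little-endian 64-bit ELF core dump and prints each note, dumping the VMCOREINFO text when it finds it. It uses only the standard library; real tools such as readelf and eu-readelf handle many more cases:

# Rough sketch: list the ELF notes in a vmcore/kcore-style ELF core dump.
# Assumes a little-endian 64-bit ELF file.
import struct, sys

def elf_notes(path):
    with open(path, "rb") as f:
        ehdr = f.read(64)
        (_ident, _type, _machine, _version, _entry, e_phoff, _shoff, _flags,
         _ehsize, e_phentsize, e_phnum, _shentsize, _shnum, _shstrndx) = \
            struct.unpack("<16sHHIQQQIHHHHHH", ehdr)
        for i in range(e_phnum):
            f.seek(e_phoff + i * e_phentsize)
            p_type, _pflags, p_offset, _vaddr, _paddr, p_filesz, _memsz, _align = \
                struct.unpack("<IIQQQQQQ", f.read(56))
            if p_type != 4:  # PT_NOTE
                continue
            f.seek(p_offset)
            data, pos = f.read(p_filesz), 0
            while pos + 12 <= len(data):
                namesz, descsz, n_type = struct.unpack_from("<III", data, pos)
                pos += 12
                name = data[pos:pos + namesz].rstrip(b"\0").decode(errors="replace")
                pos += (namesz + 3) & ~3        # note names are padded to 4 bytes
                desc = data[pos:pos + descsz]
                pos += (descsz + 3) & ~3
                yield name, n_type, desc

for name, n_type, desc in elf_notes(sys.argv[1]):
    print(f"{name:12} type={n_type:#x} size={len(desc)}")
    if name == "VMCOREINFO":
        print(desc.decode(errors="replace"))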
Detecting ELF Core Dumps
You can tell a file is an ELF core dump via the file command. For example:
$ sudo file /proc/kcore
/proc/kcore: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'BOOT_IMAGE=(hd0,gpt3)/vmlinuz-5.14.0-284.25.1.0.1.el9_2.x86_64 root=/dev/mapper'
You can also view the program headers that define the core dump memory via eu-readelf -l FILE, and you can see the notes (with contents, if possible) via eu-readelf -n FILE. For example, here are the program headers for a /proc/kcore and a /proc/vmcore respectively:
sh-5.1# readelf -l /proc/kcore

Elf file type is CORE (Core file)
Entry point 0x0
There are 9 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000238 0x0000000000000000 0x0000000000000000
                 0x0000000000001b60 0x0000000000000000         0x0
  LOAD           0x00007fffff602000 0xffffffffff600000 0xffffffffffffffff
                 0x0000000000001000 0x0000000000001000  RWE    0x1000
  LOAD           0x00007fff81002000 0xffffffff81000000 0x000000007d600000
                 0x0000000001630000 0x0000000001630000  RWE    0x1000
  LOAD           0x0000490000002000 0xffffc90000000000 0xffffffffffffffff
                 0x00001fffffffffff 0x00001fffffffffff  RWE    0x1000
  LOAD           0x00007fffc0002000 0xffffffffc0000000 0xffffffffffffffff
                 0x000000003f000000 0x000000003f000000  RWE    0x1000
  LOAD           0x0000088000003000 0xffff888000001000 0x0000000000001000
                 0x000000000009e000 0x000000000009e000  RWE    0x1000
  LOAD           0x00006a0000002000 0xffffea0000000000 0xffffffffffffffff
                 0x0000000000003000 0x0000000000003000  RWE    0x1000
  LOAD           0x000008806f003000 0xffff88806f001000 0x000000006f001000
                 0x000000000ffff000 0x000000000ffff000  RWE    0x1000
  LOAD           0x00006a0001bc2000 0xffffea0001bc0000 0xffffffffffffffff
                 0x0000000000400000 0x0000000000400000  RWE    0x1000

sh-5.1# readelf -l /proc/vmcore

Elf file type is CORE (Core file)
Entry point 0x0
There are 4 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000001000 0x0000000000000000 0x0000000000000000
                 0x0000000000001700 0x0000000000001700         0x0
  LOAD           0x0000000000003000 0xffffffff88600000 0x0000000056200000
                 0x0000000001630000 0x0000000001630000  RWE    0x0
  LOAD           0x0000000001633000 0xffff9a7ec0100000 0x0000000000100000
                 0x000000006ef00000 0x000000006ef00000  RWE    0x0
  LOAD           0x0000000070533000 0xffff9a7f3f000000 0x000000007f000000
                 0x0000000000fe0000 0x0000000000fe0000  RWE    0x0
You can see the /proc/kcore file contains many program header entries: this is because it is representing several virtual address regions, like the kernel’s vmalloc and vmemmap regions, in addition to the kernel’s mapping of code, and the kernel’s direct mapping of physical memory.
By comparison, /proc/vmcore contains far fewer headers. It doesn’t represent vmalloc or vmemmap regions, because the currently running kernel doesn’t have enough information to find them.
Benefits & Drawbacks of ELF Core Dumps
Since ELF is such a widespread, enduring standard, there are very many tools that support it. As we’ve seen above, readelf and its elfutils-based variant, eu-readelf can both show detailed information related to the file format, without the need for a user to read or write code to analyze it. Further, ELF is one of the only formats that “normal debuggers” (i.e. those which don’t specialize on the Linux kernel) can use.
However, ELF has a few drawbacks. Much of the data in a kernel core dump is not useful for debugging, and can or should be omitted. The ELF program header, which is used to define the memory of the core dump, can be used to help omit this data: segments can be split up so that the gaps between them exclude unneeded data. Unfortunately, this comes at a cost: each additional segment requires an entry of 56 bytes (for a 64-bit ELF file), and the number of program headers is limited by the size of the header field e_phnum, a 16-bit integer whose max value is 65,535, which is a limit that could conceivably be hit by systems with large amounts of memory. This limit can be exceeded (search for PN_XNUM in elf(5)), but it is a bit clunky. Despite these oddities, makedumpfile -E does support outputting ELF-formatted vmcores with pages excluded in exactly this way. It’s worth trying this out for yourself, if you want to explore: makedumpfile -E -d 31 /proc/kcore my_dump.elf would create such a file.
The far more important limitation is the ELF program header does not define any field or flag to indicate compression for a segment. So there is no broadly compatible way for an ELF vmcore to include compressed data. Of course, it is possible to compress the entire ELF file, but this may not be practical due to memory or CPU constraints.
Variant: QEMU ELF
Not all ELF kernel core dumps are created by the kernel or makedumpfile, though. QEMU (and thus, virsh dump) has the option to create an ELF-formatted vmcore. It uses mostly the same format as the Linux kernel, but it does include some extra QEMU-specific data for each CPU, in addition to the PRSTATUS note. It also includes a nearly empty section header table containing no information of value.
The main concern with QEMU’s ELF core dumps is whether they contain the vmcoreinfo note. Hypervisors may have access to the guest memory, but they don’t have any general-purpose way to see the vmcoreinfo data that the kernel has prepared at boot time. This means that, unless you’ve explicitly configured it, vmcores generated by QEMU won’t have a vmcoreinfo note. Thankfully, QEMU implements a virtualized “vmcoreinfo” device. The guest Linux kernel can detect its presence at boot and write its vmcoreinfo data into this device once it is ready. The QEMU hypervisor can then store this data alongside the virtual machine, in case it must later create a vmcore. If you’ve run QEMU with -device vmcoreinfo and you have a properly configured kernel, then QEMU will include that data into its core dumps.
It’s worth noting that, even if the ELF core dump doesn’t contain the vmcoreinfo as an ELF note, the data is still there buried in the core dump. With the right tools, you can search through the memory contents and find it.
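One simple approach, sketched below in Python, is to scan the raw bytes of the dump for the "OSRELEASE=" marker that begins the vmcoreinfo text. This only works when the page holding the vmcoreinfo is stored uncompressed (as in ELF vmcores); the file path and chunk size here are arbitrary choices:

# Rough sketch: brute-force search a dump file's raw bytes for the vmcoreinfo
# text by looking for its "OSRELEASE=" marker.
import sys

MARKER = b"OSRELEASE="
CHUNK = 16 * 1024 * 1024

def find_vmcoreinfo(path):
    with open(path, "rb") as f:
        offset, prev = 0, b""
        while True:
            buf = f.read(CHUNK)
            if not buf:
                return None
            idx = (prev + buf).find(MARKER)
            if idx != -1:
                f.seek(offset - len(prev) + idx)
                return f.read(4096)       # the vmcoreinfo fits well within one page
            prev = buf[-len(MARKER):]     # keep a tail to catch boundary-straddling hits
            offset += len(buf)

blob = find_vmcoreinfo(sys.argv[1])
if blob:
    print(blob.split(b"\0")[0].decode(errors="replace"))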
More Variation: Virtual Addresses in ELF
As we’ve seen, ELF program headers are a quite flexible way of storing memory metadata. They allow each memory segment to be assigned a virtual and physical memory address. This means that an ELF core dump can represent the kernel’s virtual address space as well as its physical address space. For /proc/kcore and /proc/vmcore, the kernel does exactly that, since it already knows its own virtual address mappings. When creating ELF vmcores, makedumpfile will also populate the virtual address field according to the kernel’s virtual address mappings.
However, hypervisor or firmware level vmcores tend not to include the virtual address information in core dumps. While the memory mapping information is available to a hypervisor, it may not be easily accessible. For example, KVM-based hypervisors delegate page table management to processor virtualization extensions wherever possible, so the hypervisor itself may not even know the current guest page tables. Parsing the page tables from the guest memory would be expensive and could result in a denial of service if a malicious guest crafted a very large set of page tables. So hypervisors tend to avoid that complexity, and they’ll only include the physical addresses for memory. QEMU does have the support for including virtual address information, but it must be explicitly enabled (e.g. dump-guest-memory -p).
When hypervisors don’t have virtual address information available, the behavior is not well-defined. Some, like QEMU, include a fake virtual address (the same value as the physical address). This can make it difficult for a debugger to detect whether a vmcore really has accurate virtual memory mappings.
Kdump-compressed Format
While ELF may be a ubiquitous file format for programs, libraries, and userspace core dumps, the kdump-compressed format is by far the most common format used by our customers for kernel core dumps. A default Oracle Linux installation with kdump enabled will use makedumpfile to create core dumps in kdump-compressed format, which offers several advantages over ELF (the main one being compression, which reduces file size significantly). However, this format wasn’t always ubiquitous, nor did it originate with makedumpfile.
In the dark days prior to kexec being a viable way to make crash dumps, there were a few projects that contained out-of-tree Linux kernel patches to enable core dumps, as well as utilities to use these dumps. Your author was not of an age to pay attention at the time that these systems were at their zenith, so you’ll be spared the history lesson. One such project was lkdump, in whose diskdumputils-0.1.5.tar.bz2 distribution you can see an early definition of a file format called diskdump, in dumpheader.h. While this project seems to have died off, the format seems to be the predecessor to the one currently in use by makedumpfile.
For the purposes of this description, we’ll call the dump format “kdump-compressed”, though in truth, it goes by many names. This is because there is no single standard, except what’s in makedumpfile’s source code. You would be forgiven for calling it a modified “diskdump” format, as that is what makedumpfile’s own code seems to refer to it as. However, the file utility, as well as makedumpfile(8), refer to it as kdump-compressed, so we’ll try to use that name too. You can see how recent versions of file recognize the format below:
$ file vmcore
vmcore: Kdump compressed dump v6, system Linux, node stepbren-ol9.local, release 5.14.0-284.25.1.0.1.el9_2.x86_64, version #1 SMP PREEMPT_DYNAMIC Fri Aug 4 09:00:16 PDT 2023, machine x86_64, domain (none)
Leaving aside matters of naming and provenance, the kdump-compressed format is primarily output by makedumpfile (though QEMU can output a variant of the format), and it is primarily read by crash, although the libkdumpfile library is a powerful option for consuming it as well. The strength of the format is in its ability to include or omit any page, as well as compress each page. A typical kdump-compressed vmcore can be just a fraction of the size of physical memory, though it will depend on the memory usage of the system, as well as the dump level and compression arguments passed to makedumpfile. This makes it ideal for use when a vmcore needs to be sent to remote engineers for analysis.
The downside of the kdump-compressed format, of course, is that it is quite niche. Any kernel-specific debugger, such as Crash and Drgn, can understand it, so in the common case, there aren’t many compatibility issues. However, if you’re having trouble opening a kdump-compressed vmcore (e.g. due to implementation bugs, corruption, etc), you’ll find fewer tools available to help you. There is no readelf equivalent for kdump-compressed files. You’ll likely find yourself writing your own tools to help in diagnosing these issues, possibly with the help of libkdumpfile. In fact, the examples directory in the libkdumpfile project contains some genuinely useful programs for investigating issues.
As shown above, recent versions of the file tool do a good job of identifying this format. However, older versions may unhelpfully report only: data. You can manually look for the file header using xxd:
$ xxd ./vmcore | head -n 4
00000000: 4b44 554d 5020 2020 0600 0000 4c69 6e75  KDUMP   ....Linu
00000010: 7800 0000 0000 0000 0000 0000 0000 0000  x...............
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
If the first 8 bytes are "KDUMP " (that is, KDUMP followed by three spaces), then you can be confident that you’re looking at a kdump-compressed vmcore. If the first 8 bytes are "DISKDUMP", then you have a quite old core dump from the old diskdump utilities. If, however, the first 12 bytes are "makedumpfile", then you should continue reading, as this is a “flattened” kdump file, discussed below.
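The same checks are easy to script. Here is a minimal Python sketch that mirrors the magic-byte tests described above:

# Minimal sketch: classify a dump file by its magic bytes.
import sys

def dump_kind(path):
    with open(path, "rb") as f:
        head = f.read(16)
    if head[:8] == b"KDUMP   ":            # "KDUMP" followed by three spaces
        return "kdump-compressed"
    if head[:8] == b"DISKDUMP":
        return "legacy diskdump"
    if head[:12] == b"makedumpfile":
        return "flattened kdump (needs reassembly)"
    if head[:4] == b"\x7fELF":
        return "ELF"
    return "unknown"

print(dump_kind(sys.argv[1]))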
Kdump Format Details
The modern kdump-compressed format manages to achieve impressive compactness through a few optimizations. First, it uses a bitmap to represent the availability of any given page. This is quite efficient for the common cases, where there is a mix of pages included and excluded. For the common page size of 4096 bytes, each gigabyte of physical memory requires 32KiB of bitmap space in the output file, regardless of whether any of that memory is actually included in the output file.
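The arithmetic behind that figure is straightforward; a quick Python check, assuming 4096-byte pages as above:

# Worked example of the bitmap overhead quoted above.
GiB = 1 << 30
PAGE_SIZE = 4096
pages_per_gib = GiB // PAGE_SIZE            # 262,144 pages per GiB
bitmap_bytes = pages_per_gib // 8           # one bit per page -> 32,768 bytes
print(pages_per_gib, bitmap_bytes // 1024, "KiB")   # 262144 32 KiB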
In addition to this bitmap, the format maintains an array of 24-byte descriptors, one for each page included in the dump. The descriptor contains information about compression, size, and location of the data in the dump. It also reserves 8 bytes within this descriptor for a field called page_flags, which seems intended to be populated from the corresponding struct page in the kernel. From quick inspection, neither crash nor libkdumpfile, the two major consumers of the format, use this field. So there is always room for improvement.
To store metadata, the format uses an interesting trick. Rather than define its own format to store PRSTATUS or VMCOREINFO notes, the latest version (v6) of the format simply reuses the ELF Note structures & formats. This is a major advantage, since it allows any defined ELF note type (or custom notes) to be inserted into the vmcore, and it allows sharing code in systems which may output either ELF or kdump-compressed files.
Finally, the kdump-compressed format is not designed with any particular mechanism for encoding the kernel’s virtual address mappings. Instead, consumers like crash and libkdumpfile are expected to use architecture-specific code and VMCOREINFO metadata to find and interpret the page tables.
Variant: flattened format
To properly write the kdump-compressed format, the core dump creator (typically makedumpfile) needs to seek between two locations in the output file: the header area, which contains the bitmaps and page descriptors, and the data area, where the actual compressed page contents are written out. Since the compressed size of pages cannot be known ahead of time, the descriptors can’t be written before the pages. The descriptors need to be written near the beginning of the file to serve as an index for the variable-size page data. While seeking between these locations is no problem for conventional files, it is impossible to do this if you would like to output the core dump on stdout, or transmit it via a network socket: pipes and sockets do not support lseek(2).
To resolve this issue, makedumpfile introduced the “flattened” variant of the kdump-compressed format. Instead of using lseek(fd, SEEK_SET, offset) to go to a particular offset, and then write(fd, data, size), makedumpfile simply writes offset and size out, followed by the data. Before reading the data, a “reassembly” phase is necessary, which simply reads each offset + size record, and follows the instructions to create the final output file.
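Conceptually, reassembly is just “seek and write”. The sketch below shows the idea in Python; it is based on the author’s reading of the format rather than any specification, and the details flagged in the comments (a 4096-byte flat header, big-endian 64-bit offset/size pairs, and a negative offset as the end marker) are assumptions that should be checked against makedumpfile’s source or makedumpfile-R.pl before relying on them:

# Rough reassembly sketch of the flattened format (see makedumpfile -R and
# makedumpfile-R.pl for the real implementations). Layout assumptions:
#  - the flat header occupies the first 4096 bytes,
#  - each record is a pair of big-endian signed 64-bit integers (offset, size)
#    followed by the data to place at that offset,
#  - a negative offset marks the end of the stream.
import struct, sys

def reassemble(flat_path, out_path):
    with open(flat_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(4096)                      # skip the "makedumpfile" flat header
        while True:
            hdr = src.read(16)
            if len(hdr) < 16:
                break
            offset, size = struct.unpack(">qq", hdr)
            if offset < 0 or size < 0:      # end-of-stream marker
                break
            dst.seek(offset)
            dst.write(src.read(size))

reassemble(sys.argv[1], sys.argv[2])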
This format has a header which starts with the 12 bytes “makedumpfile”, so it can be recognized quite easily with xxd as shown above.
There is only one advantage to this format: it’s the most practical way to output the kdump-compressed format to a socket or pipe. However, once the core dump is saved, there is no point in using the flattened format! Flattened vmcores can be “reassembled” into a normal kdump-compressed vmcore using the command makedumpfile -R, or makedumpfile-R.pl.
For convenience, some analysis tools support directly reading the flattened variant of the format: Crash, starting with version 5.1.2; libkdumpfile, starting with 0.5.3; and Drgn, starting with 0.0.25 (alongside capable libkdumpfile). However, this support comes at a cost, which is frequently not advertised to the user. In order to directly read a flattened vmcore, these tools must first build an index of what the reassembled file should look like. This process is time-consuming for large vmcores, and as of this writing, no implementation saves the resulting index, so it needs to be recomputed each time the file is opened. Frequently, tools like Crash don’t inform the user about what the indexing process is, nor do they inform the user that they could avoid the indexing phase by simply using makedumpfile -R one time only to generate a standard vmcore.
To add even more confusion to the situation, QEMU’s default “kdump” output setting creates a flattened vmcore, not a standard kdump-compressed vmcore. This was done for much the same reason as makedumpfile: to allow outputting core dumps to pipe file descriptors. But unlike makedumpfile, which falls back to the flattened format only when lseek() fails, QEMU versions prior to 8.2 simply only implement the flattened format. Since QEMU 8.2, a new output setting called “kdump-raw” has been added, which corresponds to the standard kdump format. Since it is opt-in, users are forced to know the difference between the flattened and standard formats. Users who don’t know it may end up with a vmcore incompatible with their tools, or which is painfully slow to analyze.
What’s worse, it’s possible for QEMU to omit the VMCOREINFO note from its vmcores, as we’ve discussed already. In the unhappy case where a flattened vmcore is produced without a VMCOREINFO note, the resulting file is nearly impossible to load in Crash, but Crash will spend a good deal of time trying to open it prior to failing. Your author has witnessed many well-qualified engineers give up on these sorts of core dumps, unaware of the subtleties of these formats, and unaware that none of the obstacles are insurmountable.
Xen ELF
While we have already covered ELF formatted vmcores, the ELF format used by the Xen hypervisor really deserves its own category. A core dump created by running xm dump-core from the dom0 (hypervisor) results in an ELF file, which the file utility will cheerfully report as:
$ file xendump
xendump: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), no program header
This all seems fine, except for that last bit: “no program header”. As we saw previously, the ELF format for vmcores used by Linux and QEMU use the program headers to describe the memory segments and where they belong in memory. ELF section headers don’t have fields to represent this sort of information. Let’s take a look at the ELF sections that are present:
$ readelf -S xendump
There are 7 section headers, starting at offset 0x40:

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .shstrtab         STRTAB           0000000000000000  100bfdfc0
       0000000000000048  0000000000000000           0     0     0
  [ 2] .note.Xen         NOTE             0000000000000000  00000200
       0000000000000568  0000000000000000           0     0     0
  [ 3] .xen_prstatus     PROGBITS         0000000000000000  00000768
       0000000000002860  0000000000001430           0     0     8
  [ 4] .xen_shared_info  PROGBITS         0000000000000000  00002fc8
       0000000000001000  0000000000001000           0     0     8
  [ 5] .xen_pages        PROGBITS         0000000000000000  00004000
       00000001003f8000  0000000000001000           0     0     4096
  [ 6] .xen_pfn          PROGBITS         0000000000000000  1003fc000
       0000000000801fc0  0000000000000008           0     0     8
There are some immediately recognizable items here. .note.Xen is a NOTE section, so it probably contains some metadata about the hypervisor. The .xen_prstatus section shares a name with the PRSTATUS notes we saw in previous core dumps, so it almost certainly contains the register state of each CPU. For the remainder, we can refer to a greatly appreciated documentation file turned up by a web search. To quote:
".xen_pfn" section
    name        ".xen_pfn"
    type        SHT_PROGBITS
    structure   array of uint64_t
    description
        This elements represents the frame number of the page
        in .xen_pages section.
        The size of arrays is stored in xch_nr_pages member of header
        note descriptor in .note.Xen note section.
        The entries are stored in ascending order.
        The value, ~(uint64_t)0, means invalid pfn and the
        corresponding page has zero. There might exist invalid
        pfn's at the end part of this array.
        This section must exist when the domain is auto translated
        physmap mode. Currently x86 full virtualized domain and
        ia64 domain.

[...]

".xen_pages" section
    name        ".xen_pages"
    type        SHT_PROGBITS
    structure   array of page where page is page size byte array
    description
        This section includes the contents of pages.
        The corresponding address is described in .xen_p2m section
        or .xen_pfn section.
        The page size is stored in xch_page_size member of header note
        descriptor in .note.Xen section.
        The array size is stored in xch_nr_pages member of header note
        descriptor in .note.Xen section.
        This section must exist.
So, the .xen_pages section contains the actual memory data (which makes sense, as it is the largest), and the .xen_pfn section provides the PFN (roughly the same as the address) of each page. This also gives some info on the note contents: they are providing some of that critical metadata like page sizes.
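Put differently, once the .xen_pfn array and the file offset of the .xen_pages section have been extracted (for example with readelf or a small ELF parser), looking up a page is an index calculation. A tiny illustrative Python sketch, with the page size and the already-parsed inputs as assumptions:

# Illustrative sketch of the relationship described above. Assumes the caller
# has already parsed the .xen_pfn array into a list of ints and knows the file
# offset of .xen_pages; a 4096-byte page size is also an assumption here.
PAGE_SIZE = 4096

def read_xen_page(dump, pfns, xen_pages_offset, target_pfn):
    """dump: open binary file object; pfns: ints parsed from .xen_pfn."""
    try:
        index = pfns.index(target_pfn)      # entries are in ascending order,
    except ValueError:                      # so a bisect search would also work
        return None                         # page not present in the dump
    dump.seek(xen_pages_offset + index * PAGE_SIZE)
    return dump.read(PAGE_SIZE)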
All this is to say that, while the Xen core dump may be in the ELF format, it is not at all similar to the ELF vmcores we’ve seen before. This is both a testament to, and a consequence of, ELF’s incredible flexibility. It’s worth noting that, thanks to the .xen_pfn section, it seems like it’s possible for this format to exclude pages from the vmcore, much like makedumpfile does with the kdump-compressed format. However, it’s unlikely that Xen actually uses this capability, since deciding which pages should be excluded is difficult for a hypervisor. On the other hand, unlike the kdump-compressed format, this format still cannot support per-page compression: the metadata is designed with the expectation that each page in the .xen_pages takes up the full PAGE_SIZE bytes.
Without more experience with the format, it’s difficult to say what its advantages are (beyond being the format available if you’re using Xen). It could be that the various metadata can provide insight into the hypervisor state as well. However, the disadvantages here are clear. Though ELF tools will analyze it, the non-standard use of application-specific sections instead of program headers precludes most standard debugging tools from understanding it. Of course, this does not apply to Crash, which does fully support Xen vmcores.
Other formats
As we’ve already said, this listing of kernel core dump formats is far from complete. Simply browsing the code in Crash which is responsible for identifying the core dump format is overwhelming, as it reveals several others which aren’t even mentioned here:
A few additional Xen-specific formats
A “kvmdump” format which seems to be obsolete now
An “sadump” format which seems related to a BIOS dump capability on some Fujitsu servers
Dump formats for the netdump and LKCD projects which predate kdump
Some formats for VMWare, as well as one related to Mission Critical Linux.
And these are just the formats Crash supports! Your author has also had the pleasure of using Hyper-V to create a virtual machine snapshot, and then using the vm2core tool provided by azure-linux-utils to create an ELF file which was marginally useful (with some tweaking) with Crash. Surely there are even more exotic formats yet to be found.
Core dump consumers
Throughout this article, we’ve mentioned a few tools which can be used to analyze Linux vmcores, namely: GDB, Crash, libkdumpfile, and drgn. In this section, we’ll briefly give some pros and cons of each tool, especially as they relate to their supported input formats and analysis capabilities.
GDB
GDB, the GNU project’s well-known general-purpose debugger, needs no introduction. It is of course capable of debugging running processes, but it can also read ELF-formatted core dumps, typically by running:
$ gdb EXECUTABLE
...
(gdb) core-file CORE
It is capable of debugging some Linux kernel core dumps, as well as /proc/kcore, and in fact the Linux kernel contains a set of scripts which can be used for debugging the kernel with GDB. However, this solution is limited in a number of ways. GDB only supports standard ELF core dumps, and it relies on the virtual addresses in the program headers to perform address translation. If the program headers are missing for certain ranges, or if the ELF vmcore came from a source that didn’t include virtual address mappings, then GDB won’t be able to understand the core.
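For illustration only (the paths and file names below are placeholders), a session using the kernel’s bundled GDB helpers from a build tree might look roughly like this; depending on your configuration, you may also need to allow the script with add-auto-load-safe-path:

$ gdb vmlinux
(gdb) core-file vmcore
(gdb) source /path/to/kernel-build/vmlinux-gdb.py
(gdb) lx-dmesg

The vmlinux-gdb.py script and the lx-* commands come from the kernel’s scripts/gdb directory.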
It also seems that GDB may not support kernels with KASLR enabled, but further research is needed to confirm that this is still a relevant concern. Needless to say, while GDB is a quite powerful debugger, it’s not designed for the kernel, and so it’s not regularly used for it.
Crash
The Crash utility can be thought of as a successor to the kernel’s GDB scripts. It wraps a GDB process and implements support for a broad variety of core dump formats, as well as details like page table translation for various architectures. Users can run common GDB commands or several quite useful kernel-specific helpers. The PyKdump framework can be used to further extend this system with Python scripting.
Crash is able to combine the power of GDB with support for almost every core dump format under the sun, which makes it a quite impressive debugging tool. It sets the standard for interactive kernel core dump debuggers.
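As a quick illustration (file names here are placeholders), an interactive session starts by pointing crash at the kernel’s debuginfo and the dump file, after which its built-in commands are available:

$ crash vmlinux vmcore
crash> sys
crash> bt
crash> log

Here, sys summarizes the system and the panic message, bt prints a backtrace of the crashed context, and log dumps the kernel log buffer.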
libkdumpfile
Unlike Crash and GDB, libkdumpfile is not a debugger per se. Instead, it is a library which implements the ability to read many different types of core dumps, including the ELF and kdump-compressed formats. Its bundled library, libaddrxlat, implements the details of address translation for a variety of architectures. You can write simple applications to read data out of a vmcore, and by building on this library, your tools will support a surprising number of core dump formats and architectures.
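Here is a minimal sketch of what such an application could look like (error handling is abbreviated, and the attribute names and signatures should be double-checked against the kdumpfile.h header and the bundled examples for your installed version):

#include <fcntl.h>
#include <stdio.h>
#include <libkdumpfile/kdumpfile.h>

int main(int argc, char **argv)
{
    kdump_ctx_t *ctx = kdump_new();
    unsigned char buf[64];
    size_t len = sizeof(buf);
    int fd;

    if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
        fprintf(stderr, "usage: %s VMCORE\n", argv[0]);
        return 1;
    }

    /* Hand the open file descriptor to libkdumpfile; it probes the format. */
    if (kdump_set_number_attr(ctx, KDUMP_ATTR_FILE_FD, fd) != KDUMP_OK ||
        kdump_set_string_attr(ctx, KDUMP_ATTR_OSTYPE, "linux") != KDUMP_OK) {
        /* Setting the OS type lets libaddrxlat set up address translation. */
        fprintf(stderr, "setup: %s\n", kdump_get_err(ctx));
        return 1;
    }

    /* Read up to 64 bytes of kernel physical memory starting at address 0.
     * The address is purely illustrative; it may well be excluded from the dump. */
    if (kdump_read(ctx, KDUMP_KPHYSADDR, 0, buf, &len) != KDUMP_OK)
        fprintf(stderr, "read: %s\n", kdump_get_err(ctx));
    else
        printf("read %zu bytes\n", len);

    kdump_free(ctx);
    return 0;
}

Built with something like cc read_dump.c -lkdumpfile, the same small program should work whether the input is an ELF vmcore, a kdump-compressed file, or another format the library recognizes.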
drgn
Drgn is a kernel (and userspace) debugger built as a Python library. Rather than using debugger commands, it allows users to write Python code that treats the kernel’s data structures like regular Python objects. It supports standard ELF core dumps (and the running kernel) natively, and it relies on libkdumpfile to understand other formats, like kdump-compressed files. Its support for more esoteric core dumps (e.g. hypervisor core dumps which may be missing metadata) lags behind alternatives like Crash, but it is improving.
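As a small taste of the workflow (file names here are placeholders, and drgn needs debugging symbols for the kernel in question), the drgn CLI drops you into a Python interpreter with a prog object representing the crashed kernel:

$ drgn -c vmcore -s vmlinux
>>> prog["init_task"].comm
(char [16])"swapper/0"

From there, the helpers under drgn.helpers.linux can walk task lists, mounts, slab caches, and so on.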
Other tools
When presented with a core dump that is difficult to understand, you may want to revisit some of the tools used throughout this article (a combined example follows this list):
file is a good first step! Be sure to use the most recent possible version, as its detection is constantly changing.
readelf and eu-readelf (from binutils and elfutils, respectively) are critical tools for examining ELF files:
-l for viewing program headers
-S for viewing section headers
-n for viewing notes. Prefer eu-readelf here, because it supports printing the contents of some note types, like build IDs and VMCOREINFO.
xxd is quite useful for viewing data headers, though any hex dumper will do.
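For example, a first look at an unidentified dump (called vmcore here for illustration) might go:

$ file vmcore
$ eu-readelf -n vmcore
$ xxd -l 64 vmcore

If the file isn’t ELF at all, the readelf step will simply fail, but the hex dump of the first 64 bytes is usually enough to spot a format’s magic signature.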
Some more specific needs may require some scripts, or even a libkdumpfile-based tool. You can find some example tools in the libkdumpfile examples directory, as well as some of my own tools here.
Finally, some of the best diagnostic information when analyzing core dumps can come from reading the code of the tools designed to generate or read them. To that end, here are links to a few relevant portions of several important projects:
Linux: kcore.c implements /proc/kcore
Makedumpfile: diskdump_mod.h contains the definition of the kdump-compressed format
QEMU: dump.c contains QEMU’s implementation of creating ELF and kdump-compressed vmcores.
Crash: diskdump.c contains the implementation of reading kdump-compressed files. However, there are lots of other relevant files for different formats and variants.
libkdumpfile: diskdump.c contains the implementation related to kdump-compressed vmcores, and elfdump.c contains the implementation related to ELF vmcores.
Conclusion
Kernel core dumps are complex. They are not simply copies of system memory; they contain plenty of extra metadata which is critical to understanding their contents. And as with any other type of data, careful file format design enables a great deal of flexibility and power. However, the broad variety of tools out there makes for an overwhelming diversity of dump formats, and the lack of documentation or specifications compounds the problem. While the ecosystem, and especially high-quality tools like Crash, makedumpfile, libkdumpfile, and drgn, generally works very well together, there are still compatibility issues that can be difficult to work around. Hopefully this guide provides a first step in understanding these issues, so that you can be better equipped to fix your vmcores in the future.