mquire: keeping up with kallsyms on Linux 7.x

mquire rebuilds the symbol table by reading kallsyms straight from a memory dump, so it depends on the exact way that data is laid out in the kernel image. Its parser was originally written for 6.4+ kernels, but the recently released Linux 7.0 changed the layout again and broke it.

I introduced mquire in an earlier post, back before the 7.0 kernels were out. Getting it to support them came down to three separate changes: one in how symbol addresses are stored, and two smaller ones that both fall out of a single change to how the kallsyms data is aligned. None of them is complicated on its own, but together they show how fragile it is to rely on a data layout that the kernel never meant to be read from the outside.

How mquire reads kallsyms

On a live system you can read /proc/kallsyms as plain text, because a running kernel is there to unpack it for you. In a dump there is no kernel to do that. The data is still present, but only as a set of raw sections in .rodata with nothing pointing at them, and mquire can’t walk from one to the next by adding sizes, because the gaps between sections shift from one kernel version to the next. It has to find each section on its own, from what its bytes look like.

The method is to anchor on a few landmarks: byte patterns distinctive enough to recognize without a pointer. Once a landmark gives mquire one section, it reads a bounded window of memory next to it and looks for the neighbor there, instead of searching all of memory again. And because a landmark can match in more than one place, mquire never commits to a single answer: it carries every candidate forward as its own session, and the wrong ones drop out later, when the data they imply fails to line up.

Three landmarks do most of the work:

  • The token table. Symbol names are compressed against a table of 256 tokens, and that table almost always contains the single-character tokens for 0 through 9 as one contiguous run: 0, null, 1, null, and so on up to 9, null. That run is distinctive enough to be mquire’s entry point: it scans the whole kernel address space for it, then walks outward to recover where the table starts and ends.
  • The markers. kallsyms_markers is an array of offsets into the names data, and its first entry is always zero. mquire searches the window just before the token table for that leading zero, keeping only candidates whose offsets grow by a plausible amount from one 256-symbol group to the next.
  • The symbol count. kallsyms_num_syms is a bare four-byte number, impossible to recognize on its own. But it sits right after linux_banner, the “Linux version x.x.x …” string that every kernel carries in plain text. The linker places that string right before the kallsyms block, so mquire finds the banner in the window before the markers and reads the count from just after it.
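To make the first two landmarks concrete, here is a minimal Python sketch, assuming a little-endian dump already loaded as a bytes object. The function names and the group-size bounds are hypothetical, chosen for illustration rather than taken from mquire. It collects every match of the digit run instead of stopping at the first, and checks a candidate marker array for a leading zero followed by plausible growth.

```python
import struct

# The single-character tokens "0".."9", each followed by its terminating zero:
# b"0\x001\x00...9\x00".
DIGIT_RUN = b"".join(bytes([c, 0]) for c in b"0123456789")

# Hypothetical bounds on how many bytes of compressed names one 256-symbol
# group can take. The lower bound assumes every entry needs at least a length
# byte and one token byte; the upper bound is purely illustrative.
MIN_GROUP_BYTES = 2 * 256
MAX_GROUP_BYTES = 64 * 256


def find_digit_runs(image: bytes) -> list[int]:
    """Return every offset where the digit-token run occurs (all candidates)."""
    hits = []
    pos = image.find(DIGIT_RUN)
    while pos != -1:
        hits.append(pos)
        pos = image.find(DIGIT_RUN, pos + 1)
    return hits


def plausible_markers(image: bytes, pos: int, count: int) -> bool:
    """Check `count` little-endian u32 markers at `pos`: a leading zero,
    then deltas that stay within the hypothetical group-size bounds."""
    if pos < 0 or pos + 4 * count > len(image):
        return False
    values = struct.unpack_from(f"<{count}I", image, pos)
    if values[0] != 0:
        return False
    return all(MIN_GROUP_BYTES <= b - a <= MAX_GROUP_BYTES
               for a, b in zip(values, values[1:]))


def marker_candidates(image: bytes, table_start: int,
                      window: int = 0x10000, count: int = 4) -> list[int]:
    """Search a bounded window before the token table for marker arrays,
    checking only the first `count` entries (a hypothetical choice)."""
    lo = max(0, table_start - window)
    lo += -lo % 4  # assume 4-byte-aligned entries relative to the buffer
    return [pos for pos in range(lo, table_start - 4 * count + 1, 4)
            if plausible_markers(image, pos, count)]
```

Returning lists rather than single offsets mirrors the idea that every match becomes its own candidate.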

Everything else follows from those three. The token index is fully determined by the token table, so mquire builds the exact bytes it expects and finds them right next to the table. The offset array and the names data each start right after a section mquire has already located (the token index and the symbol count, respectively), so reaching them is just a matter of alignment. The candidate that survives is the one whose sections decompress into a consistent set of symbols; the rest are thrown away.
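The token-index step can be sketched the same way. Assuming 16-bit little-endian entries, each holding the offset of a token from the start of the table, the expected bytes follow directly from the parsed tokens, and the search only needs a few bytes of slack for alignment padding. The helper names and the slack value are hypothetical.

```python
import struct


def expected_token_index(tokens: list[bytes]) -> bytes:
    """Build the kallsyms_token_index bytes implied by a parsed token table.

    Entry i is the offset of token i from the start of the table. This sketch
    assumes 16-bit little-endian entries.
    """
    offsets = []
    pos = 0
    for token in tokens:
        offsets.append(pos)
        pos += len(token) + 1  # the token's bytes plus its terminating zero
    return struct.pack(f"<{len(offsets)}H", *offsets)


def find_token_index(image: bytes, table_end: int, tokens: list[bytes],
                     slack: int = 8) -> int:
    """Look for the expected index within `slack` bytes of the table's end
    (room for alignment padding). Returns -1 if it is not there."""
    needle = expected_token_index(tokens)
    return image.find(needle, table_end, table_end + slack + len(needle))
```

Because the needle is an exact byte string, a wrong table candidate almost never finds a matching index next to it.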

The rest of this post is about that data, and how it changed in 7.0.

The layout, before and after

It helps to see the whole layout at once. On a 6.x kernel the kallsyms sections are laid out like this:

section kallsyms_num_syms
section kallsyms_names
section kallsyms_markers
section kallsyms_token_table
section kallsyms_token_index

%if !base_relative
    section kallsyms_addresses
%else
    section kallsyms_offsets
    section kallsyms_relative_base
%endif

section kallsyms_seqs_of_names

On a 7.0 kernel it looks like this:

section kallsyms_num_syms
section kallsyms_names
section kallsyms_markers
section kallsyms_token_table
section kallsyms_token_index
section kallsyms_offsets
section kallsyms_seqs_of_names

The order of the sections is the same. The difference is at the bottom: the conditional block from 6.x, with its two storage modes, collapses into a single kallsyms_offsets array. The next section is about what that array means in 7.0.

PC-relative symbol offsets

After mquire has found the symbol names, it needs their addresses. How those addresses are stored is the biggest change in 7.0.

On a 6.x kernel the addresses could be stored in two different ways, and mquire supported both. Most builds used a base-relative layout: the kernel stored one absolute address called kallsyms_relative_base, and then a 32-bit offset for each symbol. The real address was the base plus the offset. Builds that turned this option off stored the full 64-bit address of every symbol instead, in an array called kallsyms_addresses. This was simpler to read but used twice the space: eight bytes per symbol instead of four. mquire checked which of the two was present and used it.
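As a rough Python sketch of the two 6.x modes, assuming a little-endian 64-bit target and raw section bytes already located (both function names are hypothetical): the base-relative mode treats every offset as an unsigned 32-bit value added to kallsyms_relative_base, and this sketch leaves out any special-case encodings a configuration might add.

```python
import struct


def addresses_base_relative(offsets: bytes, relative_base: int) -> list[int]:
    """6.x base-relative layout: one kallsyms_relative_base plus a 32-bit
    offset per symbol. Assumes unsigned little-endian u32 offsets and ignores
    any special-case encodings."""
    count = len(offsets) // 4
    return [relative_base + off
            for off in struct.unpack_from(f"<{count}I", offsets)]


def addresses_absolute(addresses: bytes) -> list[int]:
    """6.x kallsyms_addresses layout: a full 64-bit address per symbol
    (assumes a little-endian, 64-bit target)."""
    count = len(addresses) // 8
    return list(struct.unpack_from(f"<{count}Q", addresses))
```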

7.0 removes both of them. kallsyms_relative_base is gone, the record_relative_base() function that produced it was removed from scripts/kallsyms.c, and the kallsyms_addresses array is gone too. There is only one array left, kallsyms_offsets, but the meaning of each entry is different now. Each offset is relative to the position of the entry itself. To get the address of a symbol, you take the virtual address where its offset is stored, and you add the value of the offset to it. There is no base address to read anymore, because the answer only depends on where the entry is and what it contains.

Self-relative offsets have a useful property. If the whole kernel image is loaded at a different address because of KASLR, the entry and the symbol move by the same amount, so the difference between them does not change. The offsets stay correct without a base address and without any fixup at boot. For mquire, the address is just one addition: the virtual address of the entry plus the value stored in it.
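The 7.0 computation fits in a few lines. This Python sketch assumes each kallsyms_offsets entry is a little-endian signed 32-bit value, and that mquire knows the virtual address at which it found the first entry; the function name is hypothetical.

```python
import struct


def pc_relative_addresses(section: bytes, section_vaddr: int) -> list[int]:
    """Resolve 7.0-style kallsyms_offsets entries.

    `section` holds the raw array bytes and `section_vaddr` is the virtual
    address of its first entry. Each entry is assumed to be a little-endian
    signed 32-bit value; the symbol address is the entry's own address plus
    that value.
    """
    count = len(section) // 4
    values = struct.unpack_from(f"<{count}i", section)
    return [section_vaddr + 4 * i + value for i, value in enumerate(values)]
```

Passing a section_vaddr shifted by some KASLR slide shifts every returned address by exactly that slide, with the section bytes themselves untouched.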

The four-byte alignment change

The other two changes are smaller and share a cause: in 7.0 the alignment of the kallsyms data went from eight bytes to four. In the generator script, .balign 8 became .balign 4. It looks like a tiny detail, but it broke mquire in two places, and both had relied on the padding that the larger alignment used to leave around the data.

The first was the symbol count. kallsyms_num_syms has always been four bytes, but mquire read eight. Under eight-byte alignment that was harmless: four zero bytes of padding followed the count, so the extra read just pulled in zeros. Under four-byte alignment the padding is gone, so those four bytes are now the start of the compressed names and the count comes out wildly wrong. The fix is the same one-character change, 8 to 4, in two places: the size of the read and the alignment used to locate the count. Read four bytes, align to four.
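A minimal sketch of the corrected read, assuming banner_end is the buffer offset just past the banner's terminating zero and that buffer offset 0 is suitably aligned; align_up and read_num_syms are hypothetical names.

```python
import struct


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to a multiple of `alignment` (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def read_num_syms(image: bytes, banner_end: int, alignment: int = 4) -> int:
    """Read kallsyms_num_syms as a 4-byte little-endian value at the first
    `alignment`-aligned offset after the end of linux_banner.

    `banner_end` is the offset just past the banner's terminating zero; it is
    assumed that buffer offset 0 is itself suitably aligned.
    """
    pos = align_up(banner_end, alignment)
    (count,) = struct.unpack_from("<I", image, pos)
    return count
```

Reading eight bytes at the same offset instead would fold the first four bytes of kallsyms_names into the upper half of the result.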

The second was harder. mquire finds kallsyms_token_table by its content: the table almost always contains single-character tokens for digits and letters in a predictable run, which mquire uses as an anchor, scanning backwards from it to find the start. That backward scan is what relied on the alignment. The section before the table, kallsyms_markers, is an array of 32-bit offsets that reads as random bytes, and under eight-byte alignment there was always some zero padding between the two. mquire scanned back to the nearest pair of zero bytes and took the table as starting right after them. Under four-byte alignment that padding isn’t guaranteed: the markers can run right up to the table, and their trailing offset bytes can look exactly like the start of a token. Walking backwards no longer leads to a byte that can be trusted as the start.
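For contrast, the old heuristic looks roughly like this in Python. It is a reconstruction from the description above, not mquire's code, and the function name is hypothetical. It relies on tokens being separated by single zeros, so a double zero could only come from padding.

```python
from typing import Optional


def old_table_start(image: bytes, anchor: int) -> Optional[int]:
    """Pre-7.0 heuristic (sketch): walk back from the digit-run anchor to
    the nearest pair of zero bytes and assume the table starts right after.
    Tokens are separated by single zeros, so a double zero was taken to be
    padding between kallsyms_markers and the table."""
    for pos in range(anchor - 1, 0, -1):
        if image[pos] == 0 and image[pos - 1] == 0:
            return pos + 1
    return None
```

If kallsyms_markers ends with a zero entry followed by the bytes 43 42 41 00, with no padding before the table, this scan stops inside the markers and reports a table that begins with a bogus "CBA" token.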

The fix here is to stop looking for a single answer. The backward scan now returns every candidate it can find, where a candidate is any position holding a printable byte right after a zero byte. Each candidate goes to the forward table reader, which accepts a start only if it can read 256 tokens from it, each a non-empty string terminated by exactly one zero byte. Wrong candidates fail and are dropped, and the real start is the one that parses.
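The candidate-based version can be sketched as follows, again in Python with hypothetical names. It assumes token bytes are printable, non-space ASCII and uses an arbitrary 4096-byte window; the forward reader requires 256 non-empty tokens, each closed by exactly one zero byte.

```python
from typing import Optional


def is_token_byte(b: int) -> bool:
    """Assumption: token bytes are printable, non-space ASCII."""
    return 0x20 < b < 0x7F


def start_candidates(image: bytes, anchor: int, window: int = 4096) -> list[int]:
    """Every position in the window before the anchor that holds a printable
    byte right after a zero byte, nearest first."""
    lo = max(1, anchor - window)
    return [pos for pos in range(anchor - 1, lo - 1, -1)
            if image[pos - 1] == 0 and is_token_byte(image[pos])]


def read_token_table(image: bytes, start: int,
                     count: int = 256) -> Optional[list[bytes]]:
    """Forward reader: accept `start` only if `count` tokens follow it, each
    non-empty and terminated by exactly one zero byte."""
    tokens = []
    pos = start
    for _ in range(count):
        end = image.find(b"\x00", pos)
        if end <= pos:  # -1 (no terminator) or an empty token
            return None
        token = image[pos:end]
        if not all(is_token_byte(b) for b in token):
            return None
        tokens.append(token)
        pos = end + 1
    return tokens


def table_starts(image: bytes, anchor: int, window: int = 4096) -> list[int]:
    """Keep only the candidates the forward reader accepts."""
    return [c for c in start_candidates(image, anchor, window)
            if read_token_table(image, c) is not None]
```

Candidates inside the table fail because they run out of tokens at the table's end. The reader does not always leave exactly one survivor, though: when the last marker's low bytes happen to spell a valid token, the candidate just before it parses too. Under the session approach described earlier, that is one more candidate for the later consistency checks to discard.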

This shipped in mquire 1.2.7, so the latest release reads 7.0 dumps.
