Unlike the first time I (incompletely) implemented ELF handling, by reading readelf.c and linux/arch/*/include/asm/elf.h, this time I'm reading the specs. All of the specs (at least for the architectures I care about; friendly PR providers can add their own stuff later, but I'm not gonna).
It turns out that Wikipedia (https://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Specifications), the uclibc page (https://uclibc.org/specs.html), and especially the kernel page (https://refspecs.linuxfoundation.org/) refer to out-of-date documentation that doesn't match the specification as implemented. Luckily, for nearly all of the specifications that don't match, it's just because the spec has been updated and the reference hasn't. This is most noticeable with the most actively developed architectures: x86_64, RISC-V, and ARM. So it's just a matter of tracking down the new specs, which I've done, along with all the old, not-likely-to-change specs, and gathered them all here: https://github.com/novafacing/elf/tree/main/specifications.
Ok, with the "how am I getting my information" question answered (a combination of up-to-date docs and, yeah, glibc+linux code, because it is not precisely gospel but is so close to it that we may as well sing it), the first decision in writing an implementation of the spec is how to handle the object file. Computers are pretty fast now, and I don't intend this to be used in an operating system or anything that's stupendously performance sensitive. I tried, well, see for yourself:
pub struct ElfHeaderVersion {
    pub version: u64,
}

pub trait ElfHeaderVersionKind {
    fn version(&self) -> ElfHeaderVersion;
}

pub struct Elf32HeaderVersion(Elf32Half);

impl ElfHeaderVersionKind for Elf32HeaderVersion { /* ...boring... */ }
Yucky!!!! Not good. Obviously the goal is:
If that's all we need, why not just parse everything into Elf64 with some metadata so we can convert it back later? I'm going to do this, because otherwise it's a mess, and if I regret the choice later, I'll just change the implementation, because it's Rust and we can do that.
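To make the "parse everything into Elf64 with some metadata" idea concrete, here is a minimal sketch. The names (`Class`, `WideAddress`) are made up for illustration and are not the crate's actual API, and it assumes little-endian input only to keep things short; byte order would be carried as metadata in the same way.

```rust
use std::convert::{TryFrom, TryInto};

/// Hypothetical metadata: which class the value had in the file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Class {
    Elf32,
    Elf64,
}

/// Hypothetical widened address: always stored as u64, plus the original class.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WideAddress {
    pub value: u64,
    pub class: Class,
}

impl WideAddress {
    /// Read 4 or 8 little-endian bytes depending on `class`, widening to u64.
    /// Returns `None` if `bytes` is too short.
    pub fn from_le_bytes(bytes: &[u8], class: Class) -> Option<Self> {
        let value = match class {
            Class::Elf32 => u64::from(u32::from_le_bytes(bytes.get(..4)?.try_into().ok()?)),
            Class::Elf64 => u64::from_le_bytes(bytes.get(..8)?.try_into().ok()?),
        };
        Some(Self { value, class })
    }

    /// Convert back to the on-disk width. Returns `None` if a class-32 value
    /// was modified so that it no longer fits in 32 bits.
    pub fn to_le_bytes(self) -> Option<Vec<u8>> {
        match self.class {
            Class::Elf32 => u32::try_from(self.value)
                .ok()
                .map(|v| v.to_le_bytes().to_vec()),
            Class::Elf64 => Some(self.value.to_le_bytes().to_vec()),
        }
    }
}
```

The catch this sketch makes visible is the conversion back: once a 32-bit value lives in a `u64`, nothing stops you from writing something too big into it, so the narrowing step has to be fallible.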
A surprising thing you'll notice after reading the various specs and code is that there is way less architecture-specific stuff than you would expect. There's a lot, but it's not like every vendor defines an extension twice the size of the original spec, like with PDF.
It's mostly relocation-specific stuff and flags.
Ok, unfortunately I have nerd sniped myself. As soon as I pasted the snippet above I realized there is actually a way more legit way to do this because of how I mentioned earlier that ELF is a pretty good format. The only abstraction is over bitwidth and byte ordering, and we know them up front (and if we don't, we can guess pretty accurately). So... behold:
/// An address in an ELF file. Represented as 32 bits for class 32 and 64 bits for class 64.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ElfAddress<const EC: u8, const ED: u8>(pub u64);
A happy medium: now we can do the entire decode from the top with the right shape. This has one unfortunate side effect: if we implement a best-effort mode and it turns out at any point that we guessed wrong, we'll need to start over from the top. Luckily, even HUGE ELFs are under a few hundred MB, and this is a factor of 2, not n, so I think it's fine.
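Here is a rough sketch of what "decode from the top with the right shape" could look like. The `ElfAddress` type matches the one above; everything else (`width`, `decode`, `decode_entry_as`, `decode_entry`) is a hypothetical helper I made up for illustration. The constant values and offsets (EI_CLASS at byte 4, EI_DATA at byte 5, e_entry at byte 24) come from the ELF header layout in the gABI.

```rust
// e_ident[EI_CLASS] and e_ident[EI_DATA] values from the ELF spec.
pub const ELFCLASS32: u8 = 1;
pub const ELFCLASS64: u8 = 2;
pub const ELFDATA2LSB: u8 = 1;
pub const ELFDATA2MSB: u8 = 2;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct ElfAddress<const EC: u8, const ED: u8>(pub u64);

impl<const EC: u8, const ED: u8> ElfAddress<EC, ED> {
    /// On-disk width in bytes for this class, or `None` for an unknown class.
    pub const fn width() -> Option<usize> {
        match EC {
            ELFCLASS32 => Some(4),
            ELFCLASS64 => Some(8),
            _ => None,
        }
    }

    /// Decode from the start of `bytes`, zero-extending into a u64.
    pub fn decode(bytes: &[u8]) -> Option<Self> {
        let raw = bytes.get(..Self::width()?)?;
        let mut buf = [0u8; 8];
        match ED {
            ELFDATA2LSB => {
                // Little-endian: low bytes first, pad the high end.
                buf[..raw.len()].copy_from_slice(raw);
                Some(Self(u64::from_le_bytes(buf)))
            }
            ELFDATA2MSB => {
                // Big-endian: pad the front, value bytes at the end.
                buf[8 - raw.len()..].copy_from_slice(raw);
                Some(Self(u64::from_be_bytes(buf)))
            }
            _ => None,
        }
    }
}

/// Decode e_entry assuming a given (class, data) pair. This is the single
/// place where runtime values pick the compile-time shape.
pub fn decode_entry_as(file: &[u8], class: u8, data: u8) -> Option<u64> {
    // e_ident (16) + e_type (2) + e_machine (2) + e_version (4) = offset 24.
    let field = file.get(24..)?;
    match (class, data) {
        (ELFCLASS32, ELFDATA2LSB) => ElfAddress::<ELFCLASS32, ELFDATA2LSB>::decode(field).map(|a| a.0),
        (ELFCLASS32, ELFDATA2MSB) => ElfAddress::<ELFCLASS32, ELFDATA2MSB>::decode(field).map(|a| a.0),
        (ELFCLASS64, ELFDATA2LSB) => ElfAddress::<ELFCLASS64, ELFDATA2LSB>::decode(field).map(|a| a.0),
        (ELFCLASS64, ELFDATA2MSB) => ElfAddress::<ELFCLASS64, ELFDATA2MSB>::decode(field).map(|a| a.0),
        _ => None,
    }
}

/// Decode e_entry using the class and byte order the file claims in e_ident.
pub fn decode_entry(file: &[u8]) -> Option<u64> {
    decode_entry_as(file, *file.get(4)?, *file.get(5)?)
}
```

In this shape, a best-effort mode that guessed wrong would just call `decode_entry_as` (or its whole-file equivalent) again with a different pair, which is the "start over from the top" cost described above.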