Accounting for endianness has always been one of the more challenging aspects of writing a YARA rule. It’s hard to explain why the MZ header, which appears in a hex editor as "4D5A", has to be written as "5A4D" in a YARA condition such as `uint16(0) == 0x5A4D`.

TIL that using the big-endian ("be") variants of the integer functions (uint16be, uint32be, and so on) lets us write byte sequences exactly as we see them in our hex editors, without swapping for endianness and without impacting performance in any meaningful way, e.g.,

PE: uint16be(0) == 0x4D5A
PDF: uint32be(0) == 0x25504446
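
To see why the two spellings differ, here is a minimal Python sketch that reads the same header bytes in both byte orders. It is only an illustration of the idea, not YARA itself: the helper names (`read_u16_le`, `read_u16_be`, `read_u32_be`) and the sample bytes are made up for this example, and it assumes Python's `struct` module as a stand-in for YARA's `uint16`, `uint16be` and `uint32be`.

```python
import struct

def read_u16_le(data: bytes, offset: int = 0) -> int:
    """Little-endian 16-bit read, analogous to YARA's uint16(offset)."""
    return struct.unpack_from("<H", data, offset)[0]

def read_u16_be(data: bytes, offset: int = 0) -> int:
    """Big-endian 16-bit read, analogous to YARA's uint16be(offset)."""
    return struct.unpack_from(">H", data, offset)[0]

def read_u32_be(data: bytes, offset: int = 0) -> int:
    """Big-endian 32-bit read, analogous to YARA's uint32be(offset)."""
    return struct.unpack_from(">I", data, offset)[0]

# Hypothetical sample data: the first bytes of a PE file and of a PDF.
pe_header = bytes.fromhex("4D5A9000")   # shown as "4D 5A 90 00" in a hex editor
pdf_header = b"%PDF-1.7"                # "25 50 44 46 ..." in a hex editor

# The little-endian read swaps the bytes, the big-endian read keeps hex-editor order.
print(hex(read_u16_le(pe_header)))
print(hex(read_u16_be(pe_header)))
print(hex(read_u32_be(pdf_header)))
```

The big-endian reads give back the bytes in the order the hex editor shows them, which is why the "be" conditions above can be copied straight from the hex view.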

Huge thanks to @glesnewich and @stvemillertime for this enlightening moment!

#100DaysofYara

@snkhan @glesnewich @stvemillertime
Is there a good reason to use this syntax over defining the MZ header bytes as a string (e.g., $mz = {4D 5A}) and using it in the condition ($mz at 0x0)?

Curious since that’s what I’ve been doing, but I know that many don’t write their rules like that.

I’ve personally found that this matches what I see in the hex editor.

@0x1c Great question!

It’s because `$mz at 0` isn’t very efficient. In the background, that condition causes YARA to first search for *every* single instance of "MZ" in the file. And because that is such a short sequence of bytes, there are likely to be a great number of them. Only after YARA has found ALL "MZ" occurrences does it evaluate the `at 0` portion of the rule.

In comparison, `uint16be(0) == 0x4D5A` (and other $string-less conditions) reads the bytes at that offset directly and is evaluated immediately, with no string search, so it is more performant. That really makes a difference when searching across a huge corpus of samples. Hope this explanation helps!
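
A rough Python analogy of the two approaches, in case it helps. This is a sketch of the reasoning above and not how YARA is implemented internally. The function names (`find_all`, `string_at_zero`, `uint16be_at_zero`) and the sample buffer are hypothetical, made up for illustration.

```python
import struct

def find_all(data: bytes, needle: bytes) -> list:
    """Collect the offset of every occurrence of needle, scanning the whole buffer."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)
    return offsets

def string_at_zero(data: bytes) -> bool:
    """Analogy for `$mz at 0`: find every "MZ" first, then check for offset 0."""
    return 0 in find_all(data, b"MZ")

def uint16be_at_zero(data: bytes) -> bool:
    """Analogy for `uint16be(0) == 0x4D5A`: read two bytes at offset 0, nothing else."""
    return len(data) >= 2 and struct.unpack_from(">H", data, 0)[0] == 0x4D5A

# Hypothetical sample: an MZ header followed by padding and a few stray "MZ" pairs.
sample = b"MZ" + b"\x00" * 16 + b"MZ" * 3

print(find_all(sample, b"MZ"))   # every offset the string search has to record
print(string_at_zero(sample))    # same answer...
print(uint16be_at_zero(sample))  # ...without scanning the rest of the buffer
```

Both checks agree on the result, but the first one does work proportional to the size of the file and the number of "MZ" hits, while the second touches only two bytes.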

#100DaysofYara

@snkhan Ah- that does make sense! Will keep that in mind from now on, thanks!