i don't see enough people with one of the best tool improvements i've ever made for reverse engineering, so i had to write a blog post about it!
@simonomi by sheer chance my eyes locked onto the single C0 in the monochrome example
did I win?
@simonomi Seeing this preview image makes me want the improvement of alphabetizing my hex editor.
But seriously this is incredible, thank you!
@simonomi That reminds me of debugging Sinclair QL code.
The screen could be mapped to the next bank up of RAM, where code and data live. Instead of a colour coded hex dump, the screen was then a colour coded bit dump - each pixel was one of four colours indicating its two bit value.
So two bits per pixel, sixteen pixels per thirty-two bit pointer. With a bit of practice we could follow pointer chains around the screen, really helpful for debugging complex data structures.
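A rough sketch of the arithmetic in that description: a 32-bit value splits into sixteen 2-bit fields, each of which would pick one of four colours. The function name and the glyph table are hypothetical, and treating consecutive bit pairs as pixels is an assumption for illustration; real hardware may arrange the bits of a pixel differently within a word.

```python
# Stand-ins for four screen colours (hypothetical choice of glyphs).
GLYPHS = " .o#"


def two_bit_pixels(word: int) -> list[int]:
    """Split a 32-bit value into sixteen 2-bit fields, most significant first."""
    word &= 0xFFFFFFFF
    return [(word >> shift) & 0b11 for shift in range(30, -2, -2)]


def render(word: int) -> str:
    """Draw one 32-bit value as sixteen 'pixels'."""
    return "".join(GLYPHS[p] for p in two_bit_pixels(word))


if __name__ == "__main__":
    # A pointer-like value shows up as a recognisable sixteen-character pattern.
    print(repr(render(0x0002A3C0)))
```

With a fixed mapping like this, equal pointers produce equal patterns, which is what makes following a chain by eye plausible.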
@simonomi i'm curious about the statement here:
the bitstream is much more colorful and chaotic because good compression algorithms output data that looks visually random.
not disputing its correctness but this is a very nontrivial claim described in visual terms that are somewhat removed from the discussion just above regarding prefix codes. i'm curious about how you arrived at this and in particular if your reverse engineering work motivated this intuitive description
@hipsterelectron it mostly came from the intuition of having looked at so many different types of binary. stuff with really high information density (compressed, executable, media, etc) tends to look very busy, because there's simply more information squished into fewer bytes
i've ended up writing my own whole fancy binary parsing system for my tool carbonizer. it's pretty specialized to the patterns used in the game files for Fossil Fighters, but i'm reasonably happy with it overall
zip crate which was initially just to avoid multiple reads for data of known size but also supports safe and very performant SIMD searching for magic bytes (every single impl i've seen for zips works byte-by-byte and has trouble with the file comment). took some more work to use the same specification for writes
@simonomi yes very good. I wrote my own hexdump tool for similar reasons (and some other bits and bobs I wanted), and would encourage regular hexdump users to write their own too and tune it to their preferences
the main other feature I added is an `-r` flag to specify a byte range, and I use something like `hexd -r 0-0x40 *.foo` way too often to compare headers of a bunch of files quickly (e.g.)
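For anyone writing their own, a minimal sketch of how a range argument like `0-0x40` could be handled. This is not how `hexd` itself works; the function names are hypothetical, and treating the end offset as exclusive is an assumption.

```python
def parse_byte_range(spec: str) -> tuple[int, int]:
    """Parse 'START-END', where each side may be decimal or 0x-prefixed hex."""
    start_text, sep, end_text = spec.partition("-")
    if not sep:
        raise ValueError(f"expected START-END, got {spec!r}")
    # int(text, 0) accepts '64', '0x40', '0o100' and '0b1000000'.
    start, end = int(start_text, 0), int(end_text, 0)
    if end < start:
        raise ValueError(f"range end {end:#x} is before start {start:#x}")
    return start, end


def read_range(path: str, spec: str) -> bytes:
    """Read only the requested bytes from a file (end is exclusive: an assumption)."""
    start, end = parse_byte_range(spec)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

Seeking and reading only the slice keeps header comparisons across many large files cheap.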
@simonomi Very nice. I can see similar use cases for PDU parsing.
A caveat I'll throw out is that these sorts of things need a bit of palette diversity to cover color blindness. I've worked with a statistically unusual number of such folk over the years in software development.
@simonomi I haven’t done web dev in ages but sites and apps exist.
No endorsement but here is an example.
@simonomi I did something similar for hashes/digests: https://github.com/boredzo/hashvis
Uses both color and shape to make hashes easily visually comparable.
@simonomi @b0rk this is great! As you noted, there are lots (infinite) ways to choose your coloring scheme and I really like how you use so many colors in yours.
My own such tool uses a far smaller palette for text/Unicode purposes (and also supports wider displays and a few other tweaks): https://github.com/adamhotep/misc-scripts/blob/main/hd
(Edit: that screenshot isn't great since I took it on my phone with Termux and its Unicode support isn't great.)
You might be interested in "hevi", my own project. It colors the output based on the binary format and its semantics. Right now, I'm (slowly) working on greatly improving the way to define formats.
@simonomi Interesting! I'll have to try doing hexdumps with a full colour spectrum.
I did find it was a lot easier to scrub through unknown files after making a hexdump tool with defaults that worked for me (colours for 00/FF + ASCII range + everything else, symbol column using CP437). I also made a tool for byte histograms, which provides pretty good results for discerning compressed files from encrypted
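A small sketch of the two ideas in that reply: sorting bytes into a few colour classes, and counting how often each byte value occurs. All names here are hypothetical, and reading "ASCII range" as printable 0x20 to 0x7E is an assumption. The entropy figure is only a one-number summary of the histogram; compressed and encrypted data can both score high on it, so the full histogram is the thing to look at.

```python
import math
from collections import Counter


def classify(byte: int) -> str:
    """Pick a colour class for one byte: 00, FF, printable ASCII, or other."""
    if byte == 0x00:
        return "zero"
    if byte == 0xFF:
        return "ff"
    if 0x20 <= byte <= 0x7E:  # assumed meaning of "ASCII range"
        return "ascii"
    return "other"


def byte_histogram(data: bytes) -> list[int]:
    """Return 256 counts, one per possible byte value."""
    counts = Counter(data)
    return [counts[value] for value in range(256)]


def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte distribution, from 0.0 up to 8.0."""
    total = len(data)
    if total == 0:
        return 0.0
    return -sum(
        (n / total) * math.log2(n / total) for n in byte_histogram(data) if n
    )
```

A flat histogram means every byte value appears about equally often; spikes at 00, FF, or in the printable range usually point to padding or text.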