Okay a little bit of #unpaper ghosts for tonight.

I'm nearly at the end of completing the work @federicomena started: moving all the options into an Options structure instead of using a bunch of separate globals (which I had at least already turned into locals of main()).

This would unblock, among others, the option of processing each input file in parallel instead of doing them one by one.

I was hoping to hide some of those locals by defining them in a block within the parser… but tests fail.

one last note for the day from my #unpaper refactoring work: what is even the point of doing bitwise operations in C to pack up to four edge flags (left, top, right, bottom) into an `int` variable?

`struct Edges { bool left; bool top; bool right; bool bottom; }` is literally the same size; the only difference is that instead of keeping the data in the lower four bits of an int, it stores one byte per edge!
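To make the comparison concrete, here's a minimal sketch (the `EDGE_*` constants and the converter are hypothetical names, not unpaper's actual ones). On every mainstream ABI, `bool` is one byte, so the four-field struct occupies four bytes, the same as a typical `int`:

```c
#include <stdbool.h>

/* The old style: four edge flags packed into the low bits of an int. */
enum {
    EDGE_LEFT = 1 << 0,
    EDGE_TOP = 1 << 1,
    EDGE_RIGHT = 1 << 2,
    EDGE_BOTTOM = 1 << 3,
};

/* The struct alternative: one byte per edge on typical ABIs, so the
 * whole struct is 4 bytes -- the same as an int on most platforms. */
struct Edges {
    bool left;
    bool top;
    bool right;
    bool bottom;
};

/* Hypothetical converter from the old bitmask representation. */
static struct Edges edges_from_mask(int mask) {
    return (struct Edges){
        .left = mask & EDGE_LEFT,
        .top = mask & EDGE_TOP,
        .right = mask & EDGE_RIGHT,
        .bottom = mask & EDGE_BOTTOM,
    };
}
```

The struct version also gets you named field access for free, instead of masking and shifting at every use site.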

Yesterday's maintenance work on #unpaper clearly illustrates the point I was making about the opportunities in treating *specific* #LLMs as Computer-Aided Software Engineering (CASE) tools, so I thought I would post a quick thread here, since I don't think I'll manage to write it up on the blog any time soon.

Full disclosure before I start: I work for Meta, which clearly has been betting a lot on AI — but this is my personal point of view, and I don't work on AI projects.

Only a minor #unpaper task today because I'm still recovering from a very annoying flu, but I decided to re-do the parsing of physical dimensions so that it stops depending, quite randomly, on the order in which -dpi (which is the wrong name anyway) and the sizes themselves are passed.

Okay I guess by this point I need to put numbers together and make a blog post — on a different system (native rather than WSL), a clang -O2 build running the same #unpaper pipeline takes 6.1s.

The same revision but built "ricer" (-O3 -march=native -ftree-vectorize) takes 3.5s! But the text size of the binary increases quite a bit: 75KB vs 91KB.

Unfortunately codiff (at least the openSUSE version) doesn't like those DWARVES :(

Okay either there is something wrong in the code, or wow does GCC not keep up.

(Editing for clarity)

Of the two #unpaper binaries, one built with clang, the other with gcc, both with -O2 and LTO, the former takes 16s to run, the latter 33s!

Maybe it's the LTO that makes a difference? If that's the case, it's quite possible that I could see a significant performance regression with the refactoring, with GCC — the code is now split across multiple units rather than being in a single one.

@flameeyes

These conditions (on image formats) always evaluate to the same result, so they aren't that costly: they go through a fast path thanks to easy branch prediction. However, that's still extra work. It's possible that the introduction of function pointers removed the possibility of optimizations (the compiler doesn't know which function will be called, so at the very least it can't inline the call).

The code would be much faster if the format switch was outside of the pixel loop. But you'd probably want to pull in some more dependencies instead of reinventing the wheel, and the improvement would have to be worthwhile to justify the effort.
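A toy sketch of what hoisting the format switch out of the pixel loop looks like (the format names and the summing function are made up for illustration, not unpaper's actual code):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical image formats, standing in for unpaper's real ones. */
enum format { FMT_GRAY8, FMT_RGB24 };

struct image {
    enum format fmt;
    size_t width, height;
    uint8_t *data;
};

/* Per-pixel dispatch: the switch runs width*height times. */
static unsigned sum_slow(const struct image *img) {
    unsigned sum = 0;
    for (size_t i = 0; i < img->width * img->height; i++) {
        switch (img->fmt) {
        case FMT_GRAY8:
            sum += img->data[i];
            break;
        case FMT_RGB24:
            sum += img->data[i * 3]; /* red channel only, for brevity */
            break;
        }
    }
    return sum;
}

/* Hoisted dispatch: one switch, then a tight format-specific loop
 * that the compiler is free to unroll or vectorize. */
static unsigned sum_fast(const struct image *img) {
    unsigned sum = 0;
    size_t n = img->width * img->height;
    switch (img->fmt) {
    case FMT_GRAY8:
        for (size_t i = 0; i < n; i++)
            sum += img->data[i];
        break;
    case FMT_RGB24:
        for (size_t i = 0; i < n; i++)
            sum += img->data[i * 3];
        break;
    }
    return sum;
}
```

Both versions compute the same result; the hoisted one just pays for the dispatch once per image instead of once per pixel.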

What is your main use case for using #unpaper? I've NIH'ed code ("digigami" which I hope to publish some day) to handle various paper scanning and digitization tasks.

Talk about the non-obviousness of optimizations! I thought that, since most of the time in #unpaper was spent on the set/get pixel, and these both had huge switch/case blocks, getting rid of those in favour of using function pointers could have helped.

Nope! It regressed by over two seconds on a six-second run. That's *quite a bit worse.* I guess I'll keep optimizing for readability and trust the compiler.
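For context, the shape of the function-pointer version I tried looks roughly like this (hypothetical names; unpaper's real getters take an image struct and coordinates). The indirect call per pixel is exactly what blocks inlining:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical per-format pixel getters. */
typedef uint8_t (*get_pixel_fn)(const uint8_t *data, size_t i);

static uint8_t get_gray8(const uint8_t *data, size_t i) {
    return data[i];
}

static uint8_t get_rgb24_red(const uint8_t *data, size_t i) {
    return data[i * 3]; /* red channel only, for brevity */
}

/* The loop now makes an indirect call per pixel; unless the compiler
 * can prove which function is being called, it cannot inline it, and
 * the per-pixel call overhead adds up. */
static unsigned sum_pixels(const uint8_t *data, size_t n, get_pixel_fn get) {
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += get(data, i);
    return sum;
}
```

Compared to a switch/case, which the compiler can see through and specialize, the pointer is opaque unless devirtualization kicks in.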

Okay #unpaper poll time: I'm definitely going to change the way it configures masks: there's both auto-masking and user-provided masks. Right now it allocates 100 slots for each of those, with a running count. The order *shouldn't* matter: they are only added to during option parsing and iterated over a few times.

I'm going to change these with a structure that holds a count as well as the actual masks, either with a flexible array or with a linked list.

Which one do you think makes more sense?

Flexible array in blocks of 5
0%
Flexible array in blocks of 10
50%
Linked list of single masks
50%
Linked list of blocks of N masks
0%
Poll ended.
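For what it's worth, a minimal sketch of the flexible-array option, growing in blocks of 10 (all names here are hypothetical, not unpaper's actual types):

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical mask rectangle, standing in for unpaper's real type. */
struct mask {
    int x1, y1, x2, y2;
};

/* Count plus a C99 flexible array member, reallocated in blocks. */
struct mask_list {
    size_t count;
    size_t capacity;
    struct mask masks[]; /* flexible array member */
};

#define MASK_BLOCK 10

/* Append a mask, growing the list by MASK_BLOCK entries when full.
 * Pass NULL to start a new list; realloc(NULL, ...) acts like malloc. */
static struct mask_list *mask_list_append(struct mask_list *list,
                                          struct mask m) {
    if (list == NULL || list->count == list->capacity) {
        size_t cap = (list ? list->capacity : 0) + MASK_BLOCK;
        size_t old = list ? list->count : 0;
        list = realloc(list, sizeof(*list) + cap * sizeof(struct mask));
        if (!list)
            abort(); /* out of memory: just bail out in this sketch */
        list->capacity = cap;
        list->count = old;
    }
    list->masks[list->count++] = m;
    return list;
}
```

The upside over a linked list: the masks stay contiguous, so the "iterate a few times" access pattern stays cache-friendly, and one free() releases everything.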