Today's bug is a `duperemove` infinite looping bug: https://github.com/markfasheh/duperemove/pull/376
There `duperemove` was not able to dedupe against a NoCOW file:
$ dd if=/dev/urandom bs=8M count=1 > a
$ touch b
$ chattr +C b
$ cat a >> b
$ ./duperemove -d -q --batchsize=0 --dedupe-options=partial,same a b
<hangup>
I noticed it about a month ago but only got around to debugging it today. It's a regression in 0.15. The fix is trivial once bisected.
I am getting a little curious whether the duperemove process that has been running since 29 November will ever finish, or whether it will become part of my estate.
Today's `duperemove` bug is https://github.com/markfasheh/duperemove/issues/332.
There `duperemove` crashes when the file being deduped gets truncated to zero length.
And the bug is already fixed!
`duperemove-0.14` is a lot faster than `duperemove-0.13`!
Unfortunately, it sometimes crashes on my input data. It takes about 10 minutes to observe the crash.
I wrote a trivial fuzzer to generate funny filesystem states for `duperemove`. Guess how long it takes to crash `duperemove` with it.
Spoiler: https://trofi.github.io/posts/305-fuzzing-duperemove.html
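To give a feel for what "funny filesystem states" can mean, here is a minimal sketch of a seeded file-mutation loop. It is not the fuzzer from the linked post: the function name `mutate_tree`, the file names, the chunk size and the set of operations are all assumptions made up for illustration. It only mutates files and never runs `duperemove` itself.

```python
import os
import random
import tempfile


def mutate_tree(root, steps, seed):
    """Attempt `steps` random file operations under `root`; return the op log.

    Hypothetical helper for illustration only, not part of duperemove.
    """
    rng = random.Random(seed)
    names = [f"f{i}" for i in range(4)]
    # A few fixed random chunks, so different files end up sharing data.
    size = 64 * 1024
    chunks = [rng.getrandbits(size * 8).to_bytes(size, "big") for _ in range(3)]
    log = []
    for _ in range(steps):
        path = os.path.join(root, rng.choice(names))
        op = rng.choice(["append", "truncate", "zero", "remove"])
        if op == "append":
            with open(path, "ab") as f:
                f.write(rng.choice(chunks))
        elif op == "truncate" and os.path.exists(path):
            # Shrink to a random length, possibly to 0 bytes.
            os.truncate(path, rng.randrange(os.path.getsize(path) + 1))
        elif op == "zero":
            # Create the file empty, or truncate an existing one to zero.
            open(path, "wb").close()
        elif op == "remove" and os.path.exists(path):
            os.remove(path)
        # The log records attempted ops, including ones skipped above.
        log.append((op, os.path.basename(path)))
    return log


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for op, name in mutate_tree(root, steps=20, seed=0):
            print(op, name)
```

For an actual run, the directory would have to live on a filesystem that supports dedupe, with `duperemove -d` started on it between or during rounds of mutation. The fixed seed makes a crashing sequence reproducible.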
Today's `duperemove` bug is https://github.com/markfasheh/duperemove/pull/324.
There the quite aggressive `--dedupe-options=partial` option used a poorly optimized `sqlite` query to fetch unique file extents. It caused a scan of the whole database each time data was queried for an individual file.
The fix replaced the `JOIN` query with a nested `SELECT` query, turning the full scan into an index lookup.
The idea of the change is to replace the linear scan of the extents table with a lookup in it in the block dedupe phase. Here are the query explanations by sqlite: $ sqlite3 /tmp/foo.db sqlite> .eqp on Before...
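The `JOIN` versus nested `SELECT` difference can be tried out on a toy database. The sketch below assumes a made-up `extents(digest, ino, loff, len)` table with two indexes. This is not `duperemove`'s real schema, and the two queries are illustrative guesses at the shape of such a change, not the queries from the pull request. Which plan SQLite picks depends on the schema, the indexes and the SQLite version, so the code prints the plans instead of assuming them.

```python
import sqlite3

# Hypothetical "unique extents of one file" query, JOIN form:
# first compute all digests that occur exactly once, then join.
JOIN_QUERY = """
SELECT e.loff, e.len
FROM extents AS e
JOIN (SELECT digest FROM extents GROUP BY digest HAVING count(*) = 1) AS u
  ON e.digest = u.digest
WHERE e.ino = ?
"""

# Same result, nested form: a per-row correlated lookup by digest.
NESTED_QUERY = """
SELECT e.loff, e.len
FROM extents AS e
WHERE e.ino = ?
  AND (SELECT count(*) FROM extents AS x WHERE x.digest = e.digest) = 1
"""


def make_db():
    """Build an in-memory database with a made-up extents table."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE extents(digest TEXT, ino INTEGER, loff INTEGER, len INTEGER);
        CREATE INDEX idx_extents_digest ON extents(digest);
        CREATE INDEX idx_extents_ino ON extents(ino);
    """)
    rows = [
        ("aa", 1, 0, 4096), ("bb", 1, 4096, 4096), ("cc", 1, 8192, 4096),
        ("aa", 2, 0, 4096), ("dd", 2, 4096, 4096),
    ]
    con.executemany("INSERT INTO extents VALUES (?, ?, ?, ?)", rows)
    return con


def unique_extents(con, query, ino):
    """Return (loff, len) of extents of `ino` whose digest occurs only once."""
    return sorted(con.execute(query, (ino,)).fetchall())


def query_plan(con, query, ino):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    cur = con.execute("EXPLAIN QUERY PLAN " + query, (ino,))
    return [row[-1] for row in cur]


if __name__ == "__main__":
    con = make_db()
    for name, query in (("JOIN", JOIN_QUERY), ("nested", NESTED_QUERY)):
        print(name, unique_extents(con, query, 1))
        for line in query_plan(con, query, 1):
            print("   ", line)
```

Both forms return the same rows. What differs is how much of the table SQLite has to visit per file, and that is what the plan output shows.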
Today's `duperemove` bug is a minor accounting bug: https://github.com/markfasheh/duperemove/pull/323
$ ls -lh /nix/var/nix/db/db.sqlite
1.4G /nix/var/nix/db/db.sqlite
Before the change:
$ ./show-shared-extents /nix/var/nix/db/db.sqlite
/nix/var/nix/db/db.sqlite: 27065321263104 shared bytes
After the change:
$ ./show-shared-extents /nix/var/nix/db/db.sqlite
/nix/var/nix/db/db.sqlite: 1169276928 shared bytes
The size reduction is not as impressive as initially reported :)
Before the change, `filerec_count_shared()` incorrectly accounted for the extent end relative to the file end: $ ls -lh /nix/var/nix/db/db.sqlite -rw-r--r-- 1 root root 1.4G Nov 9 22:21 /nix/var/nix/db/db.sq...
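A quick sanity check on the two numbers above, as a sketch: a file cannot share more bytes than it contains. The helper names `human()` and `is_plausible()` are made up for this example. The file size bound is an approximation, assuming the rounded `1.4G` from `ls -lh` means at most 1.4 GiB.

```python
def human(n):
    """Format a byte count using binary units, one decimal place."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n)
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def is_plausible(shared_bytes, file_size):
    """Shared bytes of a file can never exceed the file size."""
    return 0 <= shared_bytes <= file_size


# Approximate upper bound taken from the rounded `ls -lh` output (assumption).
FILE_SIZE_UPPER = int(1.4 * 2**30)

BEFORE = 27065321263104  # reported before the change
AFTER = 1169276928       # reported after the change

if __name__ == "__main__":
    for label, value in (("before", BEFORE), ("after", AFTER)):
        print(label, human(value), is_plausible(value, FILE_SIZE_UPPER))
```

The "before" value comes out in the tens of TiB for a file of about 1.4 GiB, so it fails the check. The "after" value is a bit over 1 GiB and passes.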