I can’t believe that nobody has thought about copying files earlier. Do I really need to write my own program to sync folders efficiently? A good design would roughly double the transfer speed / cut time in half.

Why is there still no sane Linux sync tool for file systems that does stat-based compare, whole-file replace, preallocation, one file at a time, and a bounded read/write queue for concurrent streaming? Everything is either rsync-style overcomplicated, cloud-ish, inefficient, or brittle.

#linux #sync #filecopy #fast #efficient
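
The first three items on that wish list (stat-based compare, whole-file replace, preallocation) can be sketched roughly as below. This is only an illustration under assumptions, not the spec or the tool discussed in this thread: the function names `needs_copy` and `replace_whole_file`, the `.part` temp suffix, and the 1 MiB chunk size are all made up for the example.

```python
import os


def needs_copy(src: str, dst: str) -> bool:
    """Stat-based compare: copy if dst is missing or size/mtime differ."""
    s = os.stat(src)
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return True
    return s.st_size != d.st_size or s.st_mtime_ns != d.st_mtime_ns


def replace_whole_file(src: str, dst: str, chunk: int = 1 << 20) -> None:
    """Write a complete new copy next to dst, then swap it in with a rename."""
    tmp = dst + ".part"  # hypothetical temp-name convention
    s = os.stat(src)
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        # Preallocate the final size so the filesystem can reserve space
        # up front. Skipped for empty files and where it is unsupported.
        if s.st_size > 0 and hasattr(os, "posix_fallocate"):
            try:
                os.posix_fallocate(fout.fileno(), 0, s.st_size)
            except OSError:
                pass
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
        # Trim any preallocated tail if the source shrank while copying.
        fout.truncate()
    # Carry over the timestamps so the next stat-based compare matches.
    os.utime(tmp, ns=(s.st_atime_ns, s.st_mtime_ns))
    os.replace(tmp, dst)
```

Comparing only size and mtime trades certainty for speed: a file changed without touching either value would be missed, which is the same trade-off any stat-only comparison makes.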

@sl I use syncthing. I'm not a big user (neither large shares nor large turnover) but it's been solid.

And still rsync for one-off backups etc ;-)

@buckfiftyseven I use rsync exactly for backups, but it’s slow. Based on my discussion, I asked an AI to draft a specification for the software, in case I later ask it to write and test it. I’ll drop the spec in the next message, because I’ll delete it later.
Share CLVZ

@sl interesting. I suppose I could at least rsync --size-only and --whole-file. That would probably be compatible with my methods.
@buckfiftyseven I use --whole-file, but it won’t solve the problem: synchronized writes. They make reads and writes clearly take turns, which blocks efficient parallel copying. #rsync
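
For contrast with the read-then-write alternation described above, here is a minimal sketch of a bounded read/write queue where a reader and a writer run concurrently. It is an assumption-laden illustration, not rsync's or fastsync's actual code: `stream_copy`, the 1 MiB chunk size, and the queue depth of 8 are hypothetical choices.

```python
import queue
import threading


def stream_copy(fin, fout, chunk: int = 1 << 20, depth: int = 8) -> None:
    """Copy fin to fout with reading and writing overlapped.

    At most `depth` chunks are in flight, so memory use stays bounded.
    """
    q = queue.Queue(maxsize=depth)
    errors = []

    def writer():
        try:
            while True:
                buf = q.get()
                if buf is None:  # sentinel: reader is done
                    return
                fout.write(buf)
        except Exception as exc:
            errors.append(exc)
            # Keep draining so the reader never blocks on a full queue.
            while q.get() is not None:
                pass

    t = threading.Thread(target=writer)
    t.start()
    try:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            q.put(buf)  # blocks while the queue is full
    finally:
        q.put(None)
        t.join()
    if errors:
        raise errors[0]
```

Blocking file reads and writes release Python's global interpreter lock, so the two threads can overlap on real disk I/O even though this is plain threading.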

@sl this all did encourage me to review my processes and improve my scripts, so thank you.

Good luck on the effort

@buckfiftyseven The first iteration used huge async read/write buffering, and that already helped, but it didn’t reveal what the root cause was. Today, after thinking for a while, I tried an alternative approach, and now it works like EVERY copy program should work…

The Root of rsync Slowness

After this final iteration, I finally understand why rsync is darn slow. It all comes down to proper cache management and executing disk operations in an efficient and optimized way. This is amazing. I am still totally baffled that nobody had thought about this previously. With efficient cache management implemented, I could turn down the buffering dramatically.

This is the secret sauce, which rsync is NOT doing:

_fadvise(src_fd, 0, 0, 'POSIX_FADV_SEQUENTIAL')
_fadvise(src_fd, 0, 0, 'POSIX_FADV_NOREUSE')
_fadvise(current_fd, 0, 0, 'POSIX_FADV_SEQUENTIAL')
_fadvise(current_fd, 0, 0, 'POSIX_FADV_NOREUSE')

And that’s the key to actually doubling the performance. A huge async buffer is one way to get there, but these hints address the actual root cause.

This seems to prevent the excess buffering and flush pauses that made the huge read-side buffer necessary. I dropped the buffer from 1 GiB to 8 MiB and that’s still working perfectly. So a large buffer is not needed when the cache is managed efficiently.
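
For anyone who wants to try the same hints, a minimal sketch using Python's standard `os.posix_fadvise` follows. The `_fadvise` helper in the post above is the author's own wrapper and is not shown in this thread, so `advise_streaming` and `copy_with_hints` are hypothetical names, and the 8 MiB chunk size is only borrowed from the buffer size mentioned above. The calls are advisory: the kernel may ignore them, so treat any speedup as something to measure rather than assume.

```python
import os


def advise_streaming(fd: int) -> bool:
    """Hint that fd will be accessed once, front to back.

    Returns False if the platform lacks posix_fadvise or rejects the hint.
    """
    if not hasattr(os, "posix_fadvise"):
        return False
    try:
        # offset=0, len=0 means "the whole file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_NOREUSE)
    except OSError:
        return False
    return True


def copy_with_hints(src: str, dst: str, chunk: int = 8 << 20) -> None:
    """Plain sequential copy with cache hints on both descriptors."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        advise_streaming(fin.fileno())
        advise_streaming(fout.fileno())
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
```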

@sl @buckfiftyseven Interesting! I'm a bit skeptical though.
rsync has been around for 30 years and is maintained by very experienced systems programmers. The idea that a fundamental optimization like fadvise hints was simply overlooked seems unlikely to me. POSIX_FADV_SEQUENTIAL and POSIX_FADV_NOREUSE are kernel hints, not commands; the kernel is free to ignore them, and their effect varies heavily by OS, kernel version, and workload.
@sl @buckfiftyseven A few things worth considering:
- What was your actual bottleneck? Network, disk I/O, or CPU? rsync is network-bound in most real-world scenarios, where local cache hints would have zero impact.
- Have you compared against cp --reflink or rclone for the same workload?
- Are the benchmarks controlled for filesystem cache warmup, file sizes, and concurrency?
@sl @buckfiftyseven Your results sound interesting for a specific use case (e.g. local, sequential, large-file copies?), but "rsync is twice as fast" is a strong generalization. It might be more accurate to say: "in my specific workload, adding fadvise hints eliminated a cache pressure bottleneck"?
@Madic @buckfiftyseven Those are excellent questions. To address them comprehensively, I will write a making-of blog post.
Sami Lehtinen - Fastsync: How I Doubled rsync's Speed

Writing fastsync: Why is copying files still so slow, and how I doubled rsync’s speed
I wrote a tool called fastsync when I finally got too frustrated with rsync’s poor local copying performance. This post goes through the making of the tool, the root cause I found, and why proper cache management

@sl
Thank you. Will read it later
@buckfiftyseven