@buckfiftyseven The first iteration used huge async read/write buffering, and that already helped, but it didn’t reveal what the root cause was. Today, after thinking about it for a while, I tried an alternative approach, and now it works like EVERY copy program should work…
The Root of rsync Slowness
After this final iteration, I finally understand why rsync is darn slow. It all comes down to proper cache management and executing disk operations efficiently. This is amazing. I am still totally baffled that nobody had thought about this previously. With efficient cache management implemented, I could turn the buffering down dramatically.
This is the secret sauce, which rsync is NOT doing:
    _fadvise(src_fd, 0, 0, 'POSIX_FADV_SEQUENTIAL')
    _fadvise(src_fd, 0, 0, 'POSIX_FADV_NOREUSE')
    _fadvise(current_fd, 0, 0, 'POSIX_FADV_SEQUENTIAL')
    _fadvise(current_fd, 0, 0, 'POSIX_FADV_NOREUSE')
And that is the key to actually doubling the performance. A huge async buffer is one way to get there, but this is what reveals the root cause.
This seems to prevent the excess buffering and flush pauses that made the huge read-side buffer necessary in the first place. I dropped the buffer from 1 GiB to 8 MiB and it’s still working perfectly. So a large buffer is not needed when the cache is managed efficiently.
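For anyone who wants to try the idea, here is a minimal sketch of a copy loop that gives both file descriptors the same two hints. It is not the actual code from my tool: it assumes Python's standard `os.posix_fadvise` as a stand-in for the `_fadvise` wrapper above, and the names `copy_with_fadvise`, `_advise` and `BUF_SIZE` are made up for illustration. The 8 MiB default is simply the buffer size mentioned above.

```python
import os

BUF_SIZE = 8 * 1024 * 1024  # 8 MiB, the reduced buffer size from the post


def _advise(fd):
    """Hint sequential, use-once access for the whole file (offset 0, length 0)."""
    # Assumption: os.posix_fadvise is only present on some Unix platforms,
    # so silently skip the hints where it is missing.
    if not hasattr(os, "posix_fadvise"):
        return
    for advice in (os.POSIX_FADV_SEQUENTIAL, os.POSIX_FADV_NOREUSE):
        try:
            os.posix_fadvise(fd, 0, 0, advice)
        except OSError:
            pass  # the call is advisory only; a failure must not abort the copy


def copy_with_fadvise(src_path, dst_path, buf_size=BUF_SIZE):
    """Copy src_path to dst_path in buf_size chunks; return the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Same four hints as in the post: both advices on source and destination.
        _advise(src.fileno())
        _advise(dst.fileno())
        while True:
            chunk = src.read(buf_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Calling `copy_with_fadvise("big.img", "/mnt/backup/big.img")` (hypothetical paths) copies the file with the default 8 MiB chunks. How much the hints help will depend on the kernel and filesystem, so measure on your own setup.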