OK, here's a #TechieQuestion: if I want to sync directories from a Linux server using a Python script, does it make much difference whether I invoke a shell command with rsync from the Python script or use a dedicated Python sync command such as pyrsync?

(Boosts welcome if this isn't your area of expertise)

#python #rsync #linux

@statsguy

Quick sanity check:

pyrsync: 13 commits total, last one 14 years ago. https://github.com/isislovecruft/pyrsync
rsync: 7800 commits total, last one yesterday. https://github.com/rsyncproject/rsync

Software which is just perfect and needs no mending and fixing all the time is absolutely fantastic. Yet I doubt pyrsync is in that league. 😎 😀

(Whereby I am absolutely not implying rsync is bad because it has many commits. The above merely describes a seemingly unattainable perfect state 🙂.)

@HaraldKi @statsguy Yeah, the main point is that OG rsync is being maintained... bitrot is very real: it takes time and commits to, e.g., keep up with vuln discoveries and compiler improvements.

As you say it could be that the last bug died 14 years ago and no more can ever be found in that... or...

@statsguy

Performance might vary a bit depending on whether there is a lot of data to transfer or not. Rsync hashes file chunks to avoid sending data when it's not necessary, and the C version might perform slightly better for that. Otherwise the process is I/O bound, which should mean you get roughly the same performance when there is a lot of data that needs to be transferred.
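
To make the chunk-hashing idea concrete, here is a deliberately simplified sketch. It only compares fixed, same-position blocks, whereas real rsync also uses a rolling checksum so it can match blocks at arbitrary offsets. The function names and the tiny block size are made up for illustration and are not taken from rsync or pyrsync.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen only to keep the example readable


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of `data` (the last block may be shorter)."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def blocks_to_send(old, new, block_size=BLOCK_SIZE):
    """Return the indices of blocks in `new` that differ from the
    same-position block in `old`, i.e. the blocks that would need sending."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        i for i, digest in enumerate(new_hashes)
        if i >= len(old_hashes) or old_hashes[i] != digest
    ]
```

With this sketch, changing one byte in the middle of a file marks only the block containing it as needing transfer. The hashing is the CPU-bound part where a C implementation could plausibly have an edge.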

Personally, I'd shell to rsync, but then I'd probably use a shell script instead of Python as well, unless there is some other need for the capabilities that Python provides.

@statsguy In practice, no. Theoretically, starting up a separate process (e.g. to run rsync) places a slight extra burden on the system, but it's not going to make any noticeable difference unless you're doing this many thousands of times per second or something like that. There may also be differences in how fast the two pieces of software work, but again, it typically wouldn't matter, and if it does the only way to know is to test it.

That is, assuming that both pieces of software properly do what they're supposed to do. As @HaraldKi mentioned, you can be a lot more confident that rsync does what it's supposed to do than you can that pyrsync does what it's supposed to do. So if you need reliability, that's a point in favor of rsync.
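
If you want a rough feel for the process start-up cost on your own machine, something like the sketch below would do. Note the assumption: it launches a fresh Python interpreter running `pass` as a stand-in for an external command, which is not the same as launching rsync, so treat the number as an order-of-magnitude indication only. The function name is hypothetical.

```python
import subprocess
import sys
import time


def mean_spawn_seconds(runs=20):
    """Average wall-clock time to start and finish a trivial child process."""
    start = time.perf_counter()
    for _ in range(runs):
        # sys.executable is the current Python interpreter; "-c pass" does nothing.
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    print(f"mean spawn time: {mean_spawn_seconds() * 1000:.1f} ms")
```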

@statsguy Everybody is giving good guidance. But I don't understand the question. Meaning: rsync does it and has tons of documentation and examples. Was there something that gave you pause and prompted you to consider alternatives?

The difference in performance between Python and a binary is probably not going to be material. If the Python is written well, your waiting time will be spent waiting on I/O. It's not a CPU-intensive activity.

@paco I have indeed had good guidance and I will be invoking original rsync from the shell within my Python script.

The question arose because I will be using a Python script, and sometimes if you can do something directly in Python then it seems neater to me to do it that way rather than invoking a shell command. That would be particularly true if the Python command were more efficient than the shell command, though that looks like it won't be true here.
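
For anyone landing on this thread later, here is a minimal sketch of calling rsync from a Python script, assuming rsync is installed and on the PATH. The helper names, host, and paths are placeholders of my own, not anything from the thread. `-a` (archive mode) and `--dry-run` are standard rsync options. The command is passed as a list, so no shell is involved and no quoting is needed.

```python
import shutil
import subprocess


def build_rsync_command(source, dest, dry_run=False):
    """Build the argument list for an rsync call (hypothetical helper)."""
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, preserves times, permissions, etc.)
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    cmd += [source, dest]
    return cmd


def sync(source, dest, dry_run=False):
    """Run rsync and raise if it is missing or exits with a non-zero status."""
    if shutil.which("rsync") is None:
        raise FileNotFoundError("rsync not found on PATH")
    return subprocess.run(
        build_rsync_command(source, dest, dry_run),
        check=True,           # raise CalledProcessError on failure
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
    )


if __name__ == "__main__":
    # Placeholder host and paths; a trailing slash on the source copies its contents.
    result = sync("user@example.org:/srv/data/", "/tmp/data/", dry_run=True)
    print(result.stdout)
```

Keeping the command builder separate from the call makes it easy to log or check the exact command before anything is transferred.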