I have 90,000 files to move to S3. The AWS CLI takes at least 1 second for every single one, even running in-cloud. I wrote my own client, in bash of all things, that's at least 5x faster. AWS doesn't invest in its tooling. Slow tools are an OPERATIONAL LIABILITY! With the AWS client this job would take a whole day.
(Yes, you can use `parallel -j N` or `xargs -P N`, the traditional ways of working around Python, but still.)
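For reference, the `xargs -P` workaround looks something like this. A minimal sketch with a hypothetical `keys.txt` and made-up key names, using `echo` as a stand-in for the real `aws s3 rm` so it runs without credentials:

```shell
# keys.txt: one S3 URI per line (hypothetical sample data)
printf 's3://mybucket/a\ns3://mybucket/b\ns3://mybucket/c\n' > keys.txt

# Fan out up to 8 commands at once; swap `echo` for the real `aws` binary.
# `sort` just makes the interleaved output deterministic for display.
xargs -P 8 -I{} echo aws s3 rm {} < keys.txt | sort
```

Each of the 8 slots still pays the full interpreter-startup cost per invocation, which is the whole complaint.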
This trivial job, just one HTTP request per file, quickly becomes CPU bound! Whatever they're doing with Python makes no sense at all. There is no CPU-heavy part to dispatching an `aws s3 rm` (which is what I am doing now), yet running a few in parallel can completely saturate a multicore machine.
#aws #awsfail #python No kidding:
OK, an update to this. There is `aws s3 sync`, which helpfully forks parallel processes for you. …by default, I think, only 10. You can increase it with this:

```
$ cat ~/.aws/config
[default]
s3 =
  max_concurrent_requests = 200
```

…but here's the kicker. Every process uses 305m. To make… one API request.
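If you want to check a per-process memory figure like that yourself, `ps` will report resident set size per PID. A minimal sketch, inspecting the current shell as a testable stand-in for an `aws` worker process (in practice you'd grep for the actual PIDs):

```shell
# RSS in kilobytes for one process; substitute an aws worker's PID for $$.
ps -o rss= -p $$
```

On a real run you'd sum this over all `max_concurrent_requests` workers, which is how 200 x ~300 MB stops fitting on a normal machine.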