Command-line Tools can be 235x Faster than your Hadoop Cluster
"This find | xargs mawk | mawk pipeline gets us down to a runtime of about 12 seconds, or about 270MB/sec, which is around 235 times faster than the Hadoop implementation."

#complexity #ShellTools #RightToolForTheRightJob #Hadoop #computing

Adam Drake

Adam Drake is an advisor to scale-up tech companies. He writes about ML/AI/data, leadership, and building tech teams.


Apropos an ongoing project, looking at sed, and realising:

  • It can execute external commands (e).
  • It can read in entire files at a given address within the input stream (r).
  • It can read in specified external files on a line-by-line basis at a specified address within the input stream (R).
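A rough sketch of those three commands, assuming GNU sed (e and R are GNU extensions, so this will likely not work with BSD/macOS sed). The file names input.txt and extra.txt and the variable names are made up for illustration:

```shell
# Assumption: GNU sed. The `e` and `R` commands are GNU extensions.
# input.txt and extra.txt are hypothetical scratch files.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$tmp/input.txt"
printf 'A\nB\n' > "$tmp/extra.txt"

# r: after line 2, append the entire contents of extra.txt.
r_out=$(sed "2r $tmp/extra.txt" "$tmp/input.txt")

# R: after every input line, append the next single line of extra.txt.
# Once extra.txt runs out, nothing more is appended.
R_out=$(sed "R $tmp/extra.txt" "$tmp/input.txt")

# e: at line 2, run an external command; its output is written
# before line 2 itself is printed.
e_out=$(sed '2e echo inserted' "$tmp/input.txt")

printf '%s\n---\n%s\n---\n%s\n' "$r_out" "$R_out" "$e_out"
rm -rf "$tmp"
```

With r, both lines of extra.txt land together after "two"; with R, they are interleaved one per input line; with e, the command output appears ahead of the addressed line.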

I've only been using sed for, oh, 40 years.

https://www.gnu.org/software/sed/manual/sed.html

#sed #ShellTools #linux #unix #til #shellScripting

sed, a stream editor
