Fed up with calculating dataset splits (train, validation, test, dev, silly, etc.) for multiple classes to make sure they're balanced? Me too.
I built a tool to help me:
https://sbrl.github.io/research-smflooding/dataset-split-calculator.html
Put 1 integer value per line.
It even spits out shell commands to cut line-based files (e.g. jsonl, csv) into separate files!
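For a rough idea of what cutting a line-based file by line count can look like, here is a minimal sketch. It is not the tool's actual output: the file names, the 7/2/1 split sizes, and the use of `sed` are all assumptions made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input: a 10-line file standing in for one class's records.
workdir="$(mktemp -d)"
seq 1 10 > "$workdir/class_a.jsonl"

# Assumed split sizes; together they must add up to the file's line count.
n_train=7
n_validation=2
n_test=1

# First and last line number of each split, computed from the sizes above.
validation_start=$((n_train + 1))
validation_end=$((n_train + n_validation))
test_start=$((validation_end + 1))
test_end=$((validation_end + n_test))

# sed -n 'A,Bp' prints only lines A through B of the input.
sed -n "1,${n_train}p" "$workdir/class_a.jsonl" > "$workdir/train.jsonl"
sed -n "${validation_start},${validation_end}p" "$workdir/class_a.jsonl" > "$workdir/validation.jsonl"
sed -n "${test_start},${test_end}p" "$workdir/class_a.jsonl" > "$workdir/test.jsonl"

# Show how many lines ended up in each split.
wc -l "$workdir/train.jsonl" "$workdir/validation.jsonl" "$workdir/test.jsonl"
```

This cuts the file in its existing order, so shuffle the lines beforehand if the order matters for your data.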
May write a proper blog post soon!
#AI #DataScience #BigData #Automation #JSONL #CSV #Bash / #Shell #Scripts #AreAwesome

