We have a git repo that uses git-lfs. We had a scare when we noticed the repo was much bigger on disk than the files in it, and concluded something large was not in lfs. In fact, the problem was just that the lfs cache was big.

For a minute there, I was considering writing a script that checks every file's lfs status and reports the largest file that is not in lfs, and maybe the file extension that contributes most to non-lfs repo weight. But now I wonder: does a script like that exist already?
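
For what it's worth, here is a rough sketch of what that script might look like, not a tested tool. It assumes `git` and `git lfs` are on the PATH and that it runs from inside the repo. The function names (`parse_ls_tree`, `summarize`, `report`) are made up for illustration. It only looks at one revision (default `HEAD`), and it assumes `git lfs ls-files --name-only` prints paths in the same form `git ls-tree` does, which may not hold for unusual filenames.

```python
import subprocess
from collections import defaultdict
from pathlib import PurePosixPath


def git(*args):
    """Run a git command in the current directory and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def parse_ls_tree(output):
    """Parse `git ls-tree -r -l -z` output into (size, path) pairs for blobs."""
    entries = []
    for record in output.split("\0"):
        if not record:
            continue
        meta, path = record.split("\t", 1)
        _mode, objtype, _sha, size = meta.split()
        # Submodule entries have type "commit" and size "-", so skip them.
        if objtype == "blob":
            entries.append((int(size), path))
    return entries


def summarize(entries, lfs_paths):
    """Return (largest non-lfs entry or None, [(extension, bytes)] largest first)."""
    non_lfs = [(size, path) for size, path in entries if path not in lfs_paths]
    by_ext = defaultdict(int)
    for size, path in non_lfs:
        by_ext[PurePosixPath(path).suffix or "(none)"] += size
    ranked = sorted(by_ext.items(), key=lambda item: item[1], reverse=True)
    return (max(non_lfs) if non_lfs else None), ranked


def report(rev="HEAD"):
    """Print the largest non-lfs file and per-extension totals for one revision."""
    entries = parse_ls_tree(git("ls-tree", "-r", "-l", "-z", rev))
    lfs_paths = set(git("lfs", "ls-files", "--name-only", rev).splitlines())
    largest, ranked = summarize(entries, lfs_paths)
    if largest is not None:
        print(f"largest non-lfs file: {largest[1]} ({largest[0]} bytes)")
    for ext, total in ranked:
        print(f"{total:>14}  {ext}")
```

Calling `report()` from inside the repo prints the results. The sizes are the blob sizes git stores, so a file that is in lfs would only show up as its small pointer file anyway; excluding the lfs paths just keeps the pointers out of the per-extension totals.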

It seems like this would take seconds to write, but the part that gets a little more complicated is that you probably also want to look through the git *history*, to spot if there's, like, a 2GB file that predates adding lfs to the repo and that you've been lugging around the whole time. Off the top of my head, I'm not sure I know how to scan the entire history like that. You could check out each commit in turn, but in theory git offers more powerful tools than that.
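
One possible approach to the history scan, as a sketch rather than a vetted tool: `git rev-list --objects --all` lists every object reachable from any ref, and `git cat-file --batch-check` can report each object's type and size, so no checkouts are needed and the working tree is never touched. The function names below are invented for illustration. The sizes are uncompressed blob sizes, not on-disk pack sizes, and each blob is listed once, under one of the paths it has had.

```python
import subprocess


def parse_batch_check(output):
    """Parse lines of the form 'type sha size path' into (size, sha, path) for blobs."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # commits, trees, tags, or objects reported as missing
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return blobs


def largest_blobs(blobs, limit=10):
    """Return the `limit` largest blobs, biggest first."""
    return sorted(blobs, reverse=True)[:limit]


def scan_history(limit=10):
    """List the largest blobs reachable from any ref in the current repo."""
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True,
    ).stdout
    # %(rest) echoes whatever followed the object id on the input line (the path).
    checked = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, check=True, capture_output=True, text=True,
    ).stdout
    return largest_blobs(parse_batch_check(checked), limit)
```

Calling `scan_history()` from inside the repo returns `(size, sha, path)` tuples. A blob that was committed before lfs was set up shows up here at its full size, while anything stored through lfs appears only as a pointer of roughly a hundred bytes.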
@mcc You might be able to use git-bisect for this, if you had a script to give it that checks the condition. It's an iterative search, though. Maybe start in the middle of history and run the test on commits at some large step size, moving earlier if it fails and later if it passes, and then run git-bisect between the first failing commit and the step before it?
@veviser i think the version where you just check out every commit would work, it's just disruptive to do checkouts on a working repository. you'd need to clone.