We have a git repo that uses git-lfs. We had a scare where we realized the repo was much bigger than the files in it and concluded something large was not in lfs. In fact the problem was the lfs cache was big.

For a minute there, I was considering writing a script that checked every file and its lfs status, and gave you the largest file that is not in lfs and maybe the file extension that contributes most to non-lfs repo weight. But now I wonder: Does a script like that exist already?
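For what it's worth, here is a rough Python sketch of the current-checkout version of that script. It assumes git and git-lfs are on the PATH, and that the paths printed by `git lfs ls-files --name-only` match the ones from `git ls-files` (paths with unusual characters may not). The function names `summarize_non_lfs` and `scan_worktree` are made up for illustration.

```python
import os
import subprocess
from collections import Counter


def summarize_non_lfs(sizes, lfs_paths):
    """Given {path: size_in_bytes} and the set of LFS-tracked paths, return
    (largest_non_lfs_file, [(extension, total_bytes), ...] sorted descending)."""
    non_lfs = {p: s for p, s in sizes.items() if p not in lfs_paths}
    by_ext = Counter()
    for path, size in non_lfs.items():
        ext = os.path.splitext(path)[1].lower() or "(no extension)"
        by_ext[ext] += size
    largest = max(non_lfs.items(), key=lambda kv: kv[1], default=None)
    return largest, by_ext.most_common()


def scan_worktree(repo="."):
    """Collect working-tree sizes for tracked files and summarize the non-LFS ones."""
    def git(*args):
        result = subprocess.run(["git", "-C", repo, *args],
                                check=True, capture_output=True, text=True)
        return result.stdout

    tracked = [p for p in git("ls-files", "-z").split("\0") if p]
    lfs_paths = set(git("lfs", "ls-files", "--name-only").splitlines())
    sizes = {}
    for path in tracked:
        full = os.path.join(repo, path)
        if os.path.isfile(full):  # skip deleted files, submodules, etc.
            sizes[path] = os.path.getsize(full)
    return summarize_non_lfs(sizes, lfs_paths)


if __name__ == "__main__":
    largest, by_ext = scan_worktree(".")
    print("largest non-LFS file:", largest)
    for ext, total in by_ext[:10]:
        print(f"{total:>12}  {ext}")
```

This only looks at the files currently checked out, so it says nothing about what is sitting in history.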

It seems like this is seconds to write, but the part that gets more complicated is that you probably also want to look through the git *history*, to spot something like a 2GB file that predates adding lfs to the repo and that you've been lugging around the whole time. Off the top of my head, I'm not sure I know how to scan the entire history like that. You could check out each commit in turn, but in theory git offers more powerful tools for this.
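One way to scan history without checking out each commit is to list every object reachable from any ref with `git rev-list --objects --all` and pipe that into `git cat-file --batch-check` to get sizes. A rough Python sketch follows, with the hypothetical helper names `parse_batch_check` and `largest_blobs_in_history`. Since lfs pointer files are only a few hundred bytes at most, any big blob that shows up here is real content stored in git itself. Note that the sizes are uncompressed object sizes, not what the blob costs on disk after packing.

```python
import subprocess

# Format string for cat-file: %(rest) echoes back the path that rev-list printed.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def parse_batch_check(output):
    """Turn cat-file --batch-check output into [(size, sha, path), ...], largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)  # the path may itself contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue  # ignore commits, trees, tags
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    return sorted(blobs, reverse=True)


def largest_blobs_in_history(repo=".", top=20):
    """Return the `top` largest blobs reachable from any ref in the repo."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        check=True, capture_output=True, text=True).stdout
    report = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=objects, check=True, capture_output=True, text=True).stdout
    return parse_batch_check(report)[:top]


if __name__ == "__main__":
    for size, sha, path in largest_blobs_in_history("."):
        print(f"{size:>12}  {sha[:10]}  {path}")
```

For a very large repo it would be better to stream between the two commands instead of holding the whole object list in memory, but this keeps the sketch short.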

@mcc we did this exercise with DDA a few months ago, I'll look through my chat history and see if there's anything there someone else can use.

We went into it 100% certain it was tileset and/or audio files, but it was 90%+ .pot file updates.

@mcc ok I tracked it down, "a few months ago" was just shy of a year ago (what is time anyway, it's all squishy and wobbly and gross), and it looks like it was an invocation of git filter-repo --analyze
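In case it helps anyone, a minimal sketch of driving that from Python. It assumes git-filter-repo is installed as a git subcommand, and that the reports land in a filter-repo/analysis folder inside the git directory, which is my understanding of the tool's behavior and not something confirmed in this thread. The names `analysis_reports` and `analyze` are invented for the example.

```python
import subprocess
from pathlib import Path


def analysis_reports(git_dir):
    """List report file names under <git_dir>/filter-repo/analysis (assumed location)."""
    analysis = Path(git_dir) / "filter-repo" / "analysis"
    if not analysis.is_dir():
        return []
    return sorted(p.name for p in analysis.iterdir() if p.is_file())


def analyze(repo="."):
    """Run the read-only analysis, then return the names of the reports it wrote."""
    subprocess.run(["git", "-C", repo, "filter-repo", "--analyze"], check=True)
    git_dir = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--absolute-git-dir"],
        check=True, capture_output=True, text=True).stdout.strip()
    return analysis_reports(git_dir)


if __name__ == "__main__":
    for name in analyze("."):
        print(name)
```

The reports are plain text files, so once they exist you can just open them and look for the biggest paths and extensions.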

Someone has probably said so already.

"Analyze repository history and create a report that may be useful in determining what to filter in a subsequent run (or in determining if a previous filtering command did what you wanted). Will not modify your repo."