does anyone use Forgejo and have issues with larger repositories? i have a local copy of FreeBSD's ports.git (1.8GB on disk) where a normal 'git pull' from poudriere takes ~10 minutes, which doesn't seem right.

this is on an 8-core server with 2 mirrored SSDs, but the pull seems blocked on a single-threaded, CPU-bound git command: /usr/local/libexec/git-core/git --shallow-file pack-objects --revs --thin --stdout --shallow --delta-base-offset --include-tag

do i need to 'optimise' the repo somehow or what am i missing here? because i don't think this behaviour is normal or expected.
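On the 'optimise' question, here is a minimal sketch of standard Git housekeeping. It assumes you point it at the repository path yourself, and the helper names repo_report and repo_maintain are hypothetical. The first helper only shows how the objects are packed. The second consolidates packs and writes a commit-graph, which speeds up many history walks. Nothing in this thread establishes that it fixes the slow shallow fetch.

```shell
# Hypothetical helpers; pass the repository path as the first argument.

repo_report() {
    # Loose vs. packed object counts and on-disk sizes (-H = human-readable).
    git -C "$1" count-objects -vH
}

repo_maintain() {
    # Consolidate loose objects and packfiles.
    git -C "$1" gc --quiet
    # Write a commit-graph covering every reachable commit. This helps
    # commit walks in general; it is not known to fix the shallow case.
    git -C "$1" commit-graph write --reachable
}
```

For example, run `repo_report /path/to/ports.git` before and after `repo_maintain /path/to/ports.git`. The path is a placeholder for wherever Forgejo keeps the bare repository.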

#forgejo #freebsd

@lw Pre-Forgejo, the #HardenedBSD project tried for a while to use #Gitea. But that fell over sideways with src and ports, especially with scraper bots continuously looking up each and every commit.

At the time, the problem stemmed from the fact that the #golang git package that everyone uses will load the entire repo history just to look up a single commit.

Rinse and repeat for thousands of hits per second, and kaboom!

HardenedBSD currently uses a self-hosted #GitLab Enterprise instance. We're hoping to eventually migrate to #Radicle.

@lattera i did wonder if it was perhaps looking at the entire history to decide what to send to the client... maybe related to the fact that poudriere does a shallow clone by default?

this also only seems to affect HTTP pulls, i haven't had any issues with git over SSH.

@lw ah, yeah, for poudriere, I usually pass in the -D option to do a full clone when creating jails or ports.

@lw @lattera shallow clones are quite a bit more expensive to compute than full clones (see e.g. ¹). Homebrew was directly asked by GitHub to stop using shallow clones because of the load this created². However, I don't know of any reason why SSH clones should be speedy while HTTPS ones are slow; that's quite odd and possibly a clue.
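To compare the two cases on a local copy, here is a minimal sketch. It assumes you supply a source repository path and an empty scratch directory; clone_both and is_shallow are hypothetical names. Cloning through a file:// URL makes Git go through upload-pack and pack-objects, the same machinery seen in the first post. A plain local-path clone takes a copy shortcut instead and ignores --depth.

```shell
# Hypothetical helpers: $1 = source repository, $2 = empty scratch directory.

clone_both() {
    src=$1 dest=$2
    # --depth 1 requests a shallow history; file:// forces the real transport.
    git clone -q --depth 1 "file://$src" "$dest/shallow"
    # Same source, full history.
    git clone -q "file://$src" "$dest/full"
}

is_shallow() {
    # Prints "true" for a shallow clone, "false" otherwise.
    git -C "$1" rev-parse --is-shallow-repository
}
```

Prefixing each `git clone` with `time` gives a rough comparison of how long the two requests take against the same repository.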

Do you happen to have any more verbose logs? E.g. with GIT_TRACE=1 or Trace2³?

¹: https://github.blog/open-source/git/counting-objects/

²: https://github.com/Homebrew/brew/pull/9383

³: https://git-scm.com/docs/api-trace2
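Here is a minimal sketch of collecting such logs on the client side for a single pull. traced_pull and the output directory are hypothetical. GIT_TRACE, GIT_TRACE_PACKET, GIT_TRACE_CURL and GIT_TRACE2_PERF are standard Git environment variables, and when one is set to an absolute path, Git appends to that file instead of writing to stderr. Over HTTP this only traces the client. Forgejo's own server-side git processes would need tracing separately. With a file:// remote, as in a local test, the upload-pack side runs as a local child process, so it is traced too.

```shell
# Hypothetical helper: $1 = repository to pull in, $2 = directory for logs.
traced_pull() {
    repo=$1 outdir=$2
    mkdir -p "$outdir"
    # Git only writes trace output to a file when given an absolute path.
    outdir=$(cd "$outdir" && pwd)
    GIT_TRACE="$outdir/trace.log" \
    GIT_TRACE_PACKET="$outdir/packet.log" \
    GIT_TRACE_CURL="$outdir/curl.log" \
    GIT_TRACE2_PERF="$outdir/trace2-perf.log" \
        git -C "$repo" pull
}
```

The trace2 perf log records timing per Git process and region, which helps show where the ~10 minutes go.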

@lw did you ever discover anything about that slow Forgejo git clone issue? I'm quite curious to hear the unraveling of that mystery!

@gnomon i haven’t been at my computer, so no, but i was planning to look again later or maybe tomorrow

@lw ah, OK! I shan't pester (hope I didn't come across that way), but if you dig into it sometime and want to bounce hypotheses off someone, I'm at your service

@lw Have a look at top/btop/htop to see how many CPU cores are active, and whether they are actually doing work rather than sitting in an iowait state. Also check iotop and network throughput. Finally, check that the memory load is reasonable and that there is no swap activity (heavy swapping would kill performance).

Those are usually the best places to look when performance is not as expected. If the memory load is fine, with little or no swap activity, and disk and network traffic are low, then your bottleneck is the CPU. In that case it may also be that the software itself is written in a way that causes performance issues.
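Following the CPU angle, here is a minimal sketch for checking whether the pack-objects process from the first post is the one pinning a core. show_busy is a hypothetical helper. pgrep -f and the ps output keywords used here exist on both FreeBSD and Linux.

```shell
# Hypothetical helper: $1 = pattern to match against full command lines.
show_busy() {
    pids=$(pgrep -f "$1") || { echo "no process matches: $1" >&2; return 1; }
    for pid in $pids; do
        # PID, CPU percentage, elapsed time and full command line, no header.
        ps -o pid= -o pcpu= -o etime= -o args= -p "$pid"
    done
}
```

For example, run `show_busy 'pack-objects --revs'` while a slow pull is in progress. On FreeBSD, `top -H` lists threads individually, which shows whether that process is using more than one thread.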