Nomad_Sim (@sedonaroxx)

When the model's parameter count m is smaller than the number of observations n, the model is under-parameterized; near m = n the loss spikes, and for larger models there are many ways to fit the data, which produces the double descent phenomenon. A short post summarizing the key theory behind the training curves of large models.

https://x.com/sedonaroxx/status/2049439721714266218

#doubledescent #deeplearning #modeltraining #machinelearning #theory

Nomad_Sim (@sedonaroxx) on X

@_avichawla In the beginning, the model is under-parameterized; then at m (parameters) = n (observations), the fit is tight as the model can only fit the data one way, and the loss explodes slightly above that point. Then, for m > n, the model has more parameters than observations and can find many ways to fit the data, leading to double descent.

#Mastodon is great at updating and correcting my "basic knowledge" where I did not expect anything new. In the past month I learned, e.g., that:
- Carbon-14 (the radiocarbon-dating isotope) decays to Nitrogen-14, not Carbon-12, making a safe nuclear battery in an implant possible
- #DoubleDescent in #DeepLearning has been well studied for 3 years now and is not just a ghost anymore
- #phage therapy is now serious science alongside antibiotics, not a tale about fringe doctors messing around in sewers
etc.
I feel it is a very good society here. Thanks!