Yann LeCun's argument against auto-regressive models seems convincing. I wonder whether there is some factorization other than auto-regressive where errors don't compound exponentially.
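For concreteness, here is the compounding-error arithmetic as I understand it: if each generated token is independently correct with probability (1 - e), an n-token output is entirely correct with probability (1 - e)^n, which decays exponentially in n. The independence assumption and the specific e and n values below are illustrative, not from LeCun's talk:

```python
def p_sequence_correct(per_token_error: float, length: int) -> float:
    """Probability that an autoregressive sample of `length` tokens
    contains no error, assuming independent per-token errors."""
    return (1.0 - per_token_error) ** length

# Even a 1% per-token error rate wipes out long outputs:
for n in (10, 100, 1000):
    print(n, p_sequence_correct(0.01, n))
# 10 tokens: ~0.90, 100 tokens: ~0.37, 1000 tokens: ~0.00004
```

This is the pessimistic reading; in practice per-token errors are neither independent nor unrecoverable, which is part of why the argument is debated.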

I guess the JEPA model is supposed to address this: https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMRU_Nbi/view

@crude2refined

I don't disagree, but what he leaves out is the data. LLMs are trained on all the data they can find, not just factually correct statements. Any model that relies on vast amounts of data has this annotation problem, regardless of whether it's autoregressive.