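Yann LeCun's argument against autoregressive models seems convincing: if each generated token has some probability of being wrong, the chance that a long sequence stays correct shrinks exponentially with its length. I wonder if there is some factorization other than autoregressive where error doesn't compound exponentially.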
I guess the JEPA model is supposed to address this: https://drive.google.com/file/d/1BU5bV3X5w65DwSMapKcsr0ZvrMRU_Nbi/view
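To make the "compounds exponentially" part concrete, here is a minimal sketch of the arithmetic as the argument is usually stated. It assumes each token is wrong independently with a fixed probability `e`, which is a big simplification (real token errors are neither independent nor constant, and not every deviation is unrecoverable). The function name `p_correct` is just for illustration.

```python
# Simplified compounding-error model: assumes every generated token is
# independently wrong with the same fixed probability e.
def p_correct(e: float, n: int) -> float:
    """Probability that all n tokens are correct, i.e. (1 - e) ** n."""
    return (1.0 - e) ** n


if __name__ == "__main__":
    # Even a 1% per-token error rate decays quickly with sequence length.
    for n in (10, 100, 1000):
        print(n, p_correct(0.01, n))
```

With `e = 0.01` this gives roughly 0.90, 0.37 and 0.00004 for 10, 100 and 1000 tokens, which is the exponential decay the argument relies on.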