Tested Cogito V1 8B on my Linux server. 83 t/s, 5.4GB VRAM, 131k context. The real story is where it deliberately wrote worse code because it decided a beginner needed simplicity over efficiency -- and admitted it! That's IDA self-reflection making a live call.
I guess a 5GB model with a conscience is worth more than a 70B model with none?
Read the full breakdown below.
