Is there any reason the #AI #LLM world keeps pushing purely for bigger models that require more compute/RAM? At this point the smart decision would seem to be an optimisation phase - go for a 2x, or even 10x, improvement in how well models run on smaller, more generalised hardware. With the economics of #tokenmaxxing suddenly hitting home, the way to get ahead of the bubble would be to take a current frontier model, optimise it to run on consumer hardware, then build from there.