The current and future state of AI workloads, at least judging by the most recent discussions, appears to converge around scaling challenges.
There is an ongoing debate about whether pre-training of large language models is facing a deceleration in improvement rates, possibly due to limitations in available data and optimization techniques. However, post-training and run-time inference continue to show progress, driven by improvements in data quality and memory efficiency.
As AI infrastructure evolves, there is a shift towards smaller, more efficient models, which can operate effectively on less powerful hardware. This is a good thing. While some companies are focusing on large cluster sizes for pre-training, the majority are exploring alternative architectures for inference to handle expanding context windows. This would allow for “less prompting” and even greater accessibility for all users, across both the consumer and enterprise segments.
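To make the context-window point concrete, here is a minimal back-of-the-envelope sketch of KV-cache memory at inference time. The function name and the model dimensions are illustrative assumptions (roughly a 7B-class model with grouped-query attention), not figures tied to any specific product, but they show why longer contexts push memory demands and motivate smaller models and alternative architectures.

```python
# Rough estimate of KV-cache memory needed at inference time.
# Dimensions are illustrative assumptions, not vendor specifications.

def kv_cache_gib(num_layers: int, num_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_value: int = 2) -> float:
    """Memory for keys and values across all layers, in GiB."""
    # Factor of 2 covers both the key and the value tensors per token.
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * context_len / (1024 ** 3)

if __name__ == "__main__":
    for context in (8_192, 128_000, 1_000_000):
        size = kv_cache_gib(num_layers=32, num_kv_heads=8, head_dim=128,
                            context_len=context)
        print(f"{context:>9,} tokens -> ~{size:.1f} GiB of KV cache")
```

Under these assumed dimensions, an 8K context needs about 1 GiB of cache while a million-token context needs well over 100 GiB, which is why memory efficiency and architectural alternatives matter so much for inference.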
My two cents: the average enterprise will have to focus on investing in better, more context-aware agents and care more about inference than training, which will expand demand for new hardware alternatives. We could all benefit from more optimization and openness.