Maybe this is a controversial take, but SoTA small LLMs are where it's at, IMO. My team and I use them at work. They cost near-$0: we run 8B-or-smaller models like Phi locally on instances we were already running (didn't even need to upgrade). At home I have Llama 3B hooked up to HomeAssistant Voice. They aren't wise, but the tech is useful with a human-in-the-loop, and you can do useful things on a power budget of a handful of watts, without spending thousands of dollars on GPUs.
Some examples of what we use small LLMs for to accelerate our work:
* If a human engineer judges that an issue is a true positive, an LLM drafts a customer-facing justification with additional context. A human engineer reviews and corrects minor issues before sending.
* Human engineers write rough notes on what they worked on each week. An LLM transforms the team's notes into draft updates at various depths: a technical update we review in standup, a higher-level update for program managers and leadership, etc.
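The notes-to-update workflow above is simple to wire up against any small model served behind an OpenAI-compatible endpoint (llama.cpp's server and Ollama both expose one). A minimal sketch, with the caveat that the endpoint URL, model name, and audience presets here are illustrative assumptions, not our actual setup:

```python
# Hypothetical sketch: turn rough per-engineer notes into a draft update
# at a chosen depth, using a small local model behind an OpenAI-compatible
# chat endpoint. A human still reviews every draft before it goes out.
import json
import urllib.request

# Assumed audience presets -- adjust to taste.
AUDIENCES = {
    "standup": "a technical update for the engineering standup",
    "leadership": "a high-level summary for program managers and leadership",
}

def build_prompt(notes: list[str], audience: str) -> str:
    """Combine everyone's rough notes into one instruction for the model."""
    joined = "\n".join(f"- {n}" for n in notes)
    return (
        f"Rewrite these rough engineering notes as {AUDIENCES[audience]}. "
        f"Stick to what the notes say; a human will review before sending.\n"
        f"{joined}"
    )

def draft_update(notes: list[str], audience: str,
                 url: str = "http://localhost:8080/v1/chat/completions") -> str:
    """Ask the local model for a draft and return its text."""
    payload = {
        "model": "llama3.1:8b",  # any small local model works
        "messages": [{"role": "user",
                      "content": build_prompt(notes, audience)}],
        "temperature": 0.3,  # drafts should be conservative, not creative
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The same `draft_update` call with a different `audience` key gives you the standup version and the leadership version from the same underlying notes, which is most of the value: write once, render per reader.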