Listening to Paige Bailey talk about the tradeoffs between small and large language models: cost and latency vs. output quality. #sw2con
Small models today can compete with large models from 6-9 months ago.
She thinks smaller models augmented with retrieval are probably the sweet spot.
(Also, her general rule of thumb: code assistants need to round-trip in under 500 ms.)
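
My own rough sketch of what "small model + retrieval, inside a 500 ms budget" could look like, not anything shown in the talk. Everything here is a stand-in I made up for illustration: the `DOCS` corpus, the word-overlap `retrieve`, and the `small_model` placeholder, which would be a real model call in practice.

```python
import time

# Hypothetical toy corpus; a real assistant would index a codebase or docs.
DOCS = [
    "def parse_config(path): load settings from a TOML file",
    "def retry(fn, attempts): call fn again after a failure",
    "def open_db(url): create a pooled database connection",
]

BUDGET_MS = 500  # round-trip budget from the rule of thumb above


def retrieve(query, docs, k=2):
    """Rank docs by shared words with the query (stand-in for a real retriever)."""
    words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def small_model(prompt):
    """Placeholder for a small-model call; it just echoes the last prompt line."""
    return "completion for: " + prompt.splitlines()[-1]


def complete(query):
    """Retrieve context, call the model, and report whether we met the budget."""
    start = time.perf_counter()
    context = retrieve(query, DOCS)
    prompt = "\n".join(context + [query])
    answer = small_model(prompt)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return answer, context, elapsed_ms, elapsed_ms < BUDGET_MS
```

Calling `complete("retry after a failure")` returns the answer, the retrieved context, the elapsed milliseconds, and a flag for whether the round trip stayed under budget. With a real model, retrieval and inference both eat into the same 500 ms, which is presumably part of why the smaller model is attractive.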