Do you hate #broligarchs?
#Billionaires? #AiSlop but still think there is merit in #AI?

Here is my proposal for a stand-alone
OFF-GRID COMMUNITY AI SYSTEM.

That's right. Your very own co-op AI.

The calculations are very much back of the envelope, first cut, but quite feasible.
A 32-billion-parameter, open-source #llm with near-frontier performance. The power requirement is that of 3 AC units, including cooling. Serves 15-20 concurrent users, which covers 40 households of 4 people each (based on actual distributed AI usage metrics and contention ratios).

40 households, subscribing at $30/month over 2 years + power (solar). Train with your own datasets.
Entire setup takes half a rack.
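To show the back-of-envelope math works out, here is a minimal sketch of the budget and contention numbers above. The subscription figures come from the post; the 4-people-per-household and 20-concurrent-user values are the stated assumptions, and everything else is illustrative.

```python
# Back-of-envelope budget for a co-op AI node.
# Figures are the post's assumptions, for illustration only.
households = 40
monthly_fee = 30          # USD per household
months = 24               # 2-year subscription horizon

total_budget = households * monthly_fee * months  # pooled hardware fund

# Contention: 160 potential users, but only a fraction are ever
# active at once, so 15-20 concurrent slots can serve them all.
people = households * 4
concurrent = 20           # upper end of the 15-20 concurrent figure
contention_ratio = people // concurrent

print(f"Budget over {months} months: ${total_budget:,}")
print(f"Contention ratio: {contention_ratio}:1 ({people} people, {concurrent} concurrent)")
```

At roughly $28,800 over two years, that is in the ballpark of a half-rack GPU inference server plus a modest solar array, which is what makes the proposal feasible on paper.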

LET'S GO!!!

#OpenSource #FOSS #CommunityTech #OpenHardware #EthicalAI #ResponsibleAI #AIForGood #TechForGood #Solarpunk #RegenerativeCulture #Degrowth #AppropriateTechnology #OffGrid #SelfSufficient #Homesteading #Permaculture #RightToRepair #MakerSpace #DIYTech #decentralizedtech

@n_dimension also, I think if we trim down to quality datasets, like Wikipedia and open-source books, we can build a smaller model that runs on lower-spec hardware.

I run Qwen-3/Jan-code models on my RTX 2060 - no sweat for inference, and I can use it for 80-90% of my work. It's like having an interactive encyclopedia offline. I love it.

Specific models for specific use-cases/communities might also be a good idea. Like an agri-trained llm for agriculture.

@mahadevank

Google has just released a super tight, great local #LLM. I haven't had a chance to look at it yet.

I really like your agri model idea.

I was thinking a basic medical (nurse level) one for the third world/post-collapse.

@n_dimension @mahadevank I recall there was some research a while back which showed that domain-specific fine-tuning really did not work well.

There were attempts at training astronomy-specific models, and while they outperformed similar-sized models at questions like "describe the light curve of binary star mergers", they suffered from much higher hallucination rates and were worse at generalising outside the specific documents they were fine-tuned on.

Now admittedly, this was back in the Llama2 days, so maybe "modern" architectures would behave differently. But it seems that a broad dataset is necessary for generalising, even within a specific domain.

@n_dimension @mahadevank (con't)

For example, there is other research easily available which shows that including programming in the training data MASSIVELY improved performance in mathematics and general problem solving.

@n_dimension @mahadevank (con't again, sorry I started rambling)

That is not to say that general-purpose systems are always best and specialised systems won't work.

There was some work apparently by Yann LeCun (I haven't read it myself yet though), and apparently the optimal architecture for game-playing AIs was small LLMs combined with domain-specific tools.

@AuntyRed @n_dimension I'm a total newbie to LLMs and such. Just know a few things about basic neural networks, so not much idea on what makes LLMs tick.

I saw one model though - shellm - which was just a 378M-parameter model that was great at writing out shell commands for a given prompt.

@mahadevank @n_dimension oh yeah, there's a whole lot you can do with a small model, especially when combined with an external trusted source of data, e.g. a tool to search Wikipedia
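The "small model plus a trusted source" pattern above can be sketched roughly as follows. Both the retrieval tool and the model reply here are stubs I made up for illustration; a real setup might pair a small local LLM (e.g. via llama.cpp) with the MediaWiki search API.

```python
# Sketch of grounding a small model's answers in a trusted source.
# search_wikipedia is a STUB, not the real API; it stands in for any
# external retrieval tool the model can call.

def search_wikipedia(query: str) -> str:
    """Stub retrieval tool: returns a canned snippet on a keyword hit."""
    corpus = {
        "binary star": "A binary star is a system of two stars orbiting a common barycenter.",
    }
    for key, text in corpus.items():
        if key in query.lower():
            return text
    return ""

def answer(query: str) -> str:
    """Only answer when the reply can be grounded in retrieved text."""
    context = search_wikipedia(query)
    if context:
        return f"According to the source: {context}"
    return "No trusted source found; refusing to guess."

print(answer("What is a binary star?"))
```

The design point is that the small model never has to memorise facts, so hallucination pressure drops: it either quotes the retrieved text or declines.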

@mahadevank @AuntyRed

New models push to prod every week, release after release.

However, I don't doubt the research. The high-dimensional vector spaces that underpin #LLMs have some very peculiar traversal patterns.

To be honest, I haven't had enough time to look into the local models. There is a local education-focused RAG hybrid I stumbled upon that actually looks real solid. The purdy picture is all I have tho.

@n_dimension do you have the name, I'd love to give it a go

@mahadevank

#Gemma4. It comes in 4 sizes, with the biggest at 32B parameters, and apparently it runs pretty decently on... a mobile device!!!

Apache license.
Sounds like a #LLAMA2 #Qwen #deepseek killer.

Let me know how you find its utility, please.
It's literally just out.

@n_dimension nice! I'll get that downloaded - I use llama-cli for the models, works great. llama-server does some funny stuff and fails to allocate a ton of memory, no idea why

@n_dimension gemma-4 didn't work on llama-cli. Might give it a try later but honestly, I'm very happy with the models I have on local.

The RAG thing, I'm not sold on - most of them don't send the right slices of context to the LLM, so you get weird responses. I'd rather train a new model.
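The "wrong slices of context" failure usually comes from naive chunking: a sentence that straddles a chunk boundary never matches the query. A minimal sketch of one common mitigation, overlapping chunks scored for relevance. Scoring here is plain word overlap for brevity; a real pipeline would use embeddings.

```python
# Why chunking matters for RAG: overlapping chunks keep boundary-straddling
# sentences retrievable. The corpus and query below are made up.

def chunk(text: str, size: int = 12, overlap: int = 4) -> list[str]:
    """Split text into word chunks where consecutive chunks share `overlap` words."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def score(query: str, passage: str) -> int:
    """Crude relevance: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(passage.lower().split()))

def top_slices(query: str, text: str, k: int = 2) -> list[str]:
    """Return the k most query-relevant chunks to hand to the LLM."""
    return sorted(chunk(text), key=lambda c: score(query, c), reverse=True)[:k]

doc = ("Solar panels charge the battery bank during the day. "
       "The inverter feeds the GPU server at night. "
       "Cooling uses roughly as much power as three AC units.")
print(top_slices("how much power does cooling use", doc))
```

If the retriever skips the overlap (or picks chunks by position rather than relevance), the LLM gets context that never mentions the question, which is exactly the weird-response failure mode described above.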

Also, the probabilistic nature of all models, and therefore of whatever "thinking" etc. happens, means human verification at all levels, for sure.