Introducing Mistral Small 4 | Mistral AI

https://aussie.zone/post/30679863

Introducing Mistral Small 4 | Mistral AI - Aussie Zone

Key architectural details:

- Mixture of Experts (MoE): 128 experts, with 4 active per token, enabling efficient scaling and specialization.
- 119B total parameters, with 6B active parameters per token (8B including embedding and output layers).
- 256k context window, supporting long-form interactions and document analysis.
- Configurable reasoning effort: toggle between fast, low-latency responses and deep, reasoning-intensive outputs.
- Native multimodality: accepts both text and image inputs, unlocking use cases from document parsing to visual analysis.
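The routing idea behind those numbers: a small gating network scores all 128 experts and only the top 4 actually run for each token, which is why the active parameter count (6B) is so much smaller than the total (119B). Here is a toy top-k routing sketch to make that concrete; it is not Mistral's implementation, and all names and shapes are illustrative.

```python
import numpy as np

def moe_layer(x, gate_w, experts, k=4):
    """Toy top-k MoE routing (illustrative, not Mistral's code).
    x: (d,) token activation; gate_w: (d, n_experts) router weights;
    experts: list of n_experts callables, each mapping (d,) -> (d,)."""
    logits = x @ gate_w                                # router score per expert
    top = np.argsort(logits)[-k:]                      # indices of the k best experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                           # softmax over the chosen k only
    # Only k of n_experts execute, so per-token compute scales with the
    # ~6B "active" parameters rather than the 119B total.
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```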

LLM Architecture Gallery

https://lemdro.id/post/37634842

Short Doco: How LLMs Took Over The World - Everything is a Pattern

https://aussie.zone/post/30630219

Short Doco: How LLMs Took Over The World - Everything is a Pattern - Aussie Zone

sorry, I know this isn't strictly local-LLM related, but I did find it fascinating. It's technical, but not so technical that only specialists can understand it.

I am wondering: is the path the big AI corporations are taking, serving models from huge server farms, somewhat at odds with how capitalism normally works?
Normally costs fall over time (see solar panels or microchips). LLMs keep getting smaller, and suddenly they fit on your device.

I checked OVHcloud's cloud-model offerings. They would all fit on a 64 GB Strix Halo, probably even in 32 GB of RAM. The SOTA models still have an edge, but honestly not much.

#localllm #localllama

CanIRun.ai — Can your machine run AI models?

https://discuss.tchncs.de/post/56526579

CanIRun.ai — Can your machine run AI models? - tchncs


llama.cpp + mcp - docker and more

https://lemmy.zip/post/60682562

How to... (Maybe I am missing something)

https://downonthestreet.eu/post/560800

How to... (Maybe I am missing something) - Down On The Street

Well, I run my own OpenWebUI with Ollama, installed with docker compose and running locally on my home server with an NVIDIA GPU, and I am pretty happy with the overall result. I have only installed local open-source models like gpt-oss, deepseek-r1, llama (3.2, 4), qwen3… My use case is mostly asking questions about documentation for development (details of programming-language syntax and such). I have been running it for months now, and it came to my mind that it would be useful for the following tasks as well:

- audio transcription (voice messages to text)
- image generation (logos, small art for my games and such)

I fiddled around a bit but got nowhere. How do you do that from the OpenWebUI web interface? (I have never used Ollama directly, only through the OpenWebUI GUI.)
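For what it's worth, OpenWebUI can handle both of these without going through Ollama: it ships a built-in Whisper-based speech-to-text engine and can call out to an external image backend such as AUTOMATIC1111 or ComfyUI. A minimal sketch of the relevant docker-compose environment section is below; the variable names are my recollection of the OpenWebUI docs, so treat them as assumptions and verify them against the documentation for your version.

```yaml
# Hypothetical docker-compose excerpt; verify variable names against
# the OpenWebUI docs for your installed version.
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      # Speech-to-text: the default engine runs faster-whisper locally.
      - AUDIO_STT_ENGINE=            # empty = built-in local Whisper
      - WHISPER_MODEL=base           # bigger models transcribe better, slower
      # Image generation via a separately running AUTOMATIC1111 instance.
      - ENABLE_IMAGE_GENERATION=true
      - IMAGE_GENERATION_ENGINE=automatic1111
      - AUTOMATIC1111_BASE_URL=http://stable-diffusion:7860
```

With something like this in place, audio attachments can be transcribed from the chat input, and image generation appears as an option in the chat UI once the backend is reachable (at least in recent versions).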

Guide to run Qwen3.5 locally

https://communick.news/post/5598650

Guide to run Qwen3.5 locally - Communick News


Anyone interested in AI radio?

https://lemmy.world/post/43936980

Anyone interested in AI radio? - Lemmy.World

Hey lemmy! I was wondering if anyone was interested in checking out an AI radio app that I've put together. I'm really just looking for feedback and overall impressions. Obviously, if it's against the rules (I didn't see a rule this would violate), I won't post the link. If you are interested, I'll edit the post to include the link. Thanks guys!

A possible hardware solution for ultra speed (73x faster than H200) self hosted small models that is not dependent on RAM

https://lemmy.ca/post/61063772

A possible hardware solution for ultra speed (73x faster than H200) self hosted small models that is not dependent on RAM - Lemmy.ca

The approach hardwires the model weights into transistors and uses an older 6 nm process. They are targeting 70B model sizes (presumably at 16-bit precision) by year end. It should cost much less than a 140 GB card, but I don't know the details.
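The 140 GB figure follows directly from the parameter count: 16-bit weights take two bytes each. A quick back-of-the-envelope check (my arithmetic, not from the post):

```python
params = 70e9            # 70B parameters
bytes_per_param = 2      # 16-bit (FP16/BF16) weights
total_gb = params * bytes_per_param / 1e9
print(f"{total_gb:.0f} GB")  # -> 140 GB, matching the "140 GB card" comparison
```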