Who needs 4K antialiasing, depth of field, ray-traced lighting, shadows, and motion blur when you can have a 'Games Board'?
#nogpu #mos6502 #PAL2 #Sybex

Qwen3-Next achieved a processing speed of 103 tokens/second on an Intel Xeon v4 CPU (22 cores, 64 GB RAM) with NO GPU REQUIRED! This is a significant step forward, enabling large language models (LLMs) to run more efficiently on ordinary CPU hardware, especially thanks to the MoE architecture and an optimized Llama build.
#AI #LLM #CPU #Xeon #Qwen3 #NoGPU #Performance #TechNews #ArtificialIntelligence #LargeLanguageModels #Technology
https://www.reddit.com/r/ollama/comments/1pb61u4/qwen3next_103_tokenssecond_in_cpu_xeon_v4_with_no/
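The CPU-speed claim above leans on the MoE (Mixture-of-Experts) architecture: each token activates only a few experts, so per-token compute is a small fraction of the total parameter count. A minimal NumPy sketch of that routing idea (the dimensions, expert count, and ReLU FFN here are illustrative assumptions, not Qwen3-Next's real configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256
n_experts, top_k = 8, 2  # only top_k of n_experts run per token

# One up- and down-projection per expert (biases omitted for brevity).
W_up = rng.normal(0, 0.02, (n_experts, d_model, d_ff))
W_down = rng.normal(0, 0.02, (n_experts, d_ff, d_model))
W_gate = rng.normal(0, 0.02, (d_model, n_experts))  # router weights

def moe_forward(x):
    logits = x @ W_gate
    idx = np.argsort(logits)[-top_k:]            # pick top-k experts for this token
    gates = np.exp(logits[idx]) / np.exp(logits[idx]).sum()
    out = np.zeros(d_model)
    for g, e in zip(gates, idx):                 # only the selected experts compute
        h = np.maximum(x @ W_up[e], 0)           # ReLU FFN (stand-in activation)
        out += g * (h @ W_down[e])
    return out

x = rng.normal(size=d_model)
y = moe_forward(x)
# Fraction of expert parameters touched per token:
print(top_k / n_experts)  # 0.25
```

With this shape of routing, a model's total parameters can stay large while the per-token FLOPs (the thing a CPU actually pays for) scale only with the active experts.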
Ah, the groundbreaking revelation that you can still achieve low-bit quantization of #LLMs without a GPU—because clearly, everyone has a spare supercomputer lying around 🙄. We humbly thank the Simons Foundation for this earth-shattering news that no one asked for. And let's not forget to tip our hats to the brave souls who dared to write this fanfic of a research paper 📜😂.
https://arxiv.org/abs/2503.07657 #lowbitquantization #noGPU #SimonsFoundation #researchhumor #techsatire #HackerNews #ngated
SplitQuantV2: Enhancing Low-Bit Quantization of LLMs Without GPUs
The quantization of large language models (LLMs) is crucial for deploying them on devices with limited computational resources. While advanced quantization algorithms offer improved performance compared to basic linear quantization, they typically require high-end graphics processing units (GPUs), are often restricted to specific deep neural network (DNN) frameworks, and require calibration datasets. This limitation poses challenges for using such algorithms on various neural processing units (NPUs) and edge AI devices, which have diverse model formats and frameworks. In this paper, we show that SplitQuantV2, an innovative algorithm designed to enhance low-bit linear quantization of LLMs, can achieve results comparable to those of advanced algorithms. SplitQuantV2 preprocesses models by splitting linear and convolution layers into functionally equivalent, quantization-friendly structures. The algorithm's platform-agnostic, concise, and efficient nature allows for implementation without the need for GPUs. Our evaluation on the Llama 3.2 1B Instruct model using the AI2's Reasoning Challenge (ARC) dataset demonstrates that SplitQuantV2 improves the accuracy of the INT4 quantization model by 11.76%p, matching the performance of the original floating-point model. Remarkably, SplitQuantV2 took only 2 minutes 6 seconds to preprocess the 1B model and perform linear INT4 quantization using only an Apple M4 CPU. SplitQuantV2 provides a practical solution for low-bit quantization on LLMs, especially when complex, computation-intensive algorithms are inaccessible due to hardware limitations or framework incompatibilities.
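The core idea in the abstract, splitting a layer into functionally equivalent pieces so each piece gets its own quantization scale, can be illustrated with a minimal NumPy sketch. This is not the paper's actual algorithm: the row-block split and symmetric per-tensor INT4 scheme below are illustrative assumptions, chosen only to show why splitting helps when one outlier channel would otherwise inflate the global scale.

```python
import numpy as np

def quantize_int4(w):
    # Symmetric per-tensor INT4 linear quantization (a baseline scheme,
    # not necessarily the one used in the paper). Returns dequantized weights.
    scale = np.abs(w).max() / 7.0  # symmetric INT4 range [-7, 7]
    return np.clip(np.round(w / scale), -7, 7) * scale

def split_linear(w, n_splits=2):
    # Split a linear layer's weight matrix into row blocks.
    # y = W x == concat(W1 x, W2 x), so the split is functionally equivalent.
    return np.array_split(w, n_splits, axis=0)

rng = np.random.default_rng(0)
# A weight matrix with one outlier row: per-tensor quantization suffers,
# because the outlier sets the scale for every row.
w = rng.normal(0, 0.02, size=(8, 16))
w[0] *= 50
x = rng.normal(size=16)

y_ref = w @ x
y_whole = quantize_int4(w) @ x
y_split = np.concatenate([quantize_int4(b) @ x for b in split_linear(w, 2)])

err_whole = np.abs(y_ref - y_whole).mean()
err_split = np.abs(y_ref - y_split).mean()
print(err_whole, err_split)  # the split version isolates the outlier rows, reducing error
```

The split itself is exact (concatenating the block outputs reproduces `W x` bit-for-bit); only the quantization error changes, which is what makes this kind of preprocessing cheap enough to run on a laptop CPU.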
When your computer is so underpowered that you are forced to play everything with pixelart graphics:
#pixelart #gaming #laptop #noGPU #toaster #retro #retrogaming #linuxgaming #linux #steam