How do you tell Pixi it can use CUDA? System requirements are the way to do it!

#pixi #cuda #packagemanagement #software
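A minimal sketch of what that looks like in a project manifest. The `[system-requirements]` table tells Pixi's solver which host capabilities it may assume (exposed as virtual packages like `__cuda`), so CUDA builds of packages become installable. The project name, version pin, and `pytorch-gpu` dependency below are illustrative, not prescriptive:

```toml
# pixi.toml -- declare that this project can rely on a host CUDA driver.
[project]
name = "cuda-project"          # hypothetical project name
channels = ["conda-forge"]
platforms = ["linux-64"]

[system-requirements]
cuda = "12"                    # minimum CUDA version the host driver must support

[dependencies]
python = "3.12.*"
pytorch-gpu = "*"              # can now resolve to a CUDA build
```

Match the `cuda` version to what your NVIDIA driver actually supports; the solver will refuse CUDA packages that need a newer runtime than the declared requirement.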

#HPC won't be an #x86 monoculture forever – and it's starting to show
#Intel might once have ruled the HPC roost, but its influence is waning, and other processors are making significant inroads. #Nvidia's 2006 #CUDA launch transformed #GPUs into general-purpose computing machines with dramatic speedups for parallel data workloads. #AMD has gained considerable ground on Intel within the x86 HPC market, and #Arm is another strong contender.
https://www.theregister.com/2025/11/27/arm_riscv_hpc/
HPC won't be an x86 monoculture forever – and it's starting to show

Feature: Arm and RISC-V would like a word

The Register

Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation

#CUDA #PTX #Triton #ProgrammingLanguages #Package

https://hgpu.org/?p=30481

Tilus: A Tile-Level GPGPU Programming Language for Low-Precision Computation

Serving Large Language Models (LLMs) is critical for AI-powered applications, yet it demands substantial computational resources, particularly in memory bandwidth and computational throughput. Low-…

hgpu.org

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

#CUDA #AI #Memory #Package

https://hgpu.org/?p=30480

Memory-Efficient Acceleration of Block Low-Rank Foundation Models on Resource Constrained GPUs

Recent advances in transformer-based foundation models have made them the default choice for many tasks, but their rapidly growing size makes fitting a full model on a single GPU increasingly diffi…

hgpu.org

Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs

#CUDA #ProgrammingLanguages

https://hgpu.org/?p=30478

Optimal Software Pipelining and Warp Specialization for Tensor Core GPUs

GPU architectures have continued to grow in complexity, with recent incarnations introducing increasingly powerful fixed-function units for matrix multiplication and data movement to accompany high…

hgpu.org

PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations

#CUDA #HIP #HLSL #AI #LLM #NLP

https://hgpu.org/?p=30477

PEAK: A Performance Engineering AI-Assistant for GPU Kernels Powered by Natural Language Transformations

Advancements in large language models (LLMs) are showing promising impact in software development and programming assistance. However, these models struggle when operating on low-level backend code…

hgpu.org

CUDA vs Vulkan compared on an RTX 3080: CUDA usually comes out ahead, but Vulkan surprises on some models when part of the work is offloaded to the GPU.
- GLM4 9B Q6: prompt processing (PP) 2.2x faster, token generation (TG) 1.7x faster.
- Ministral3 14B Q4: PP 4.4x faster, TG 1.6x faster.
- Qwen3 8B Q6: PP 1.5x faster.

#AI #LLM #CUDA #Vulkan #Benchmark #Testing #GPU #Processors #Technology

https://www.reddit.com/r/LocalLLaMA/comments/1pydegt/benchmarking_local_llms_for_speed_with_cuda_and/

New results show Vulkan can be faster than CUDA for certain models. For example, Ministral3 14B 2512 Q4 sees a 4.4x prompt-processing speedup. CUDA remains the best choice in most cases. #Vulkan #CUDA #ModelOptimization #TechNews


https://www.reddit.com/r/LocalLLaMA/comments/1pydegt/benchmarking_local_llms_for_speed_with_cuda_and/

RTX 5090 + llama.cpp hangs after 2-3 model runs (VFIO setup, Ubuntu 24.04): "illegal memory access" errors, GPU faults, fans spinning at 100%. Many fixes tried, none worked. Question: is the bug in Blackwell, the driver, or should a Windows VM be used instead? #llamaCPP #RTX5090 #GPU #MLOps #VFIO #CUDA #Linux #GPUError #AIModel #Ubuntu

https://www.reddit.com/r/LocalLLaMA/comments/1pxv14g/help_rtx_5090_llamacpp_crashes_after_23/

Setting up and integrating CUDA with llama.cpp on Ubuntu is now simpler thanks to two new tools:
1. Ubuntu-Cuda-Llama.cpp-Executable: one-step setup, optimized for NVIDIA GPUs.
2. llcuda: a Python library for quickly wiring CUDA into llama.cpp to build local agents. #AI #Python #CUDA #Llama #Linux #MachineLearning

https://www.reddit.com/r/LocalLLaMA/comments/1pxssyc/project_simplified_cuda_setup_python_bindings_for/