Mastodawn

Is using a Mac Studio as an Ollama host really a sensible choice?

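Discusses what value a Mac Studio (M4 Max, 64GB) brings as a local LLM server compared with an RTX 3090-class GPU cluster. Based on user experience, 8B to 32B models are practical on the Mac Studio, but larger models still require the cloud. The Mac Studio is characterized as the choice for convenience and stability, and the GPU cluster as the choice for raw performance. Criticism of Ollama, and alternatives to it, are also discussed.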

https://news.hada.io/topic?id=26257

#ollama #llm #macstudio #gpucluster #localai

Asks what value a Mac Studio (M4 Max, 64GB) has as a local LLM server compared with an RTX 3090-class GPU cluster...

GeekNews
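
Weight-only memory arithmetic gives a rough sense of why 8B to 32B models are comfortable on a 64GB Mac Studio: bytes ≈ parameters × bits per weight / 8. This is a sketch under stated assumptions. It ignores KV cache, activations, and runtime overhead. The 0.75 headroom factor is a hypothetical safety margin, not a measured macOS or Ollama limit. The 24 GB figure is the VRAM of a single RTX 3090.

```python
# Weight-only memory estimate for a dense LLM (a rough sketch).
# Assumption: memory ~= parameters * bits_per_weight / 8; KV cache,
# activations and runtime overhead are ignored on purpose.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (decimal, 1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits: float, mem_gb: float, headroom: float = 0.75) -> bool:
    """True if the weights fit in headroom * mem_gb.
    The 0.75 default is a hypothetical safety margin, not a measured limit."""
    return weight_gb(params_billion, bits) <= mem_gb * headroom


MAC_STUDIO_GB = 64  # M4 Max configuration discussed in the post (unified memory)
RTX_3090_GB = 24    # VRAM of a single RTX 3090

for size in (8, 32, 70):
    for bits in (4, 8, 16):
        print(f"{size}B @ {bits}-bit: {weight_gb(size, bits):5.1f} GB | "
              f"64 GB Mac: {fits(size, bits, MAC_STUDIO_GB)} | "
              f"one 3090: {fits(size, bits, RTX_3090_GB)}")
```

Under these assumptions, a 32B model at 4-bit needs about 16 GB for its weights and fits on either machine. A 70B model at 16-bit (about 140 GB) fits on neither. Real quantization formats typically add some per-block overhead beyond the nominal bit width.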

HGPU group Dec 14

Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

#GPUcluster #TaskScheduling #Package

https://hgpu.org/?p=30451

Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters and limi…

hgpu.org
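
The excerpt is cut off before the paper's method, so what follows is not its hybrid learning approach. As a point of reference, a common greedy baseline for heterogeneous GPU clusters is earliest-finish-time list scheduling. Jobs are taken longest first, and each one goes to the GPU where it would finish soonest. The GPU names, relative speeds, and job sizes below are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    name: str
    speed: float                 # hypothetical relative throughput (work units per second)
    busy_until: float = 0.0      # time at which this GPU becomes free
    jobs: list = field(default_factory=list)


def schedule(jobs: dict[str, float], gpus: list[Gpu]) -> float:
    """Greedy earliest-finish-time assignment; returns the makespan.
    `jobs` maps a job name to its (hypothetical) amount of work."""
    # Longest-processing-time-first order tends to reduce the makespan.
    for name, work in sorted(jobs.items(), key=lambda kv: kv[1], reverse=True):
        # Pick the GPU on which this job would complete earliest.
        best = min(gpus, key=lambda g: g.busy_until + work / g.speed)
        best.busy_until += work / best.speed
        best.jobs.append(name)
    return max(g.busy_until for g in gpus)


fast, slow = Gpu("gpu_fast", speed=2.0), Gpu("gpu_slow", speed=1.0)
makespan = schedule({"a": 4.0, "b": 2.0, "c": 2.0}, [fast, slow])
print(makespan, fast.jobs, slow.jobs)
```

This heuristic ignores the things that make the real problem hard, such as unknown job durations, multi-GPU jobs, and placement constraints. Those gaps are what learning- and optimization-based schedulers aim to address.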

HGPU group Oct 26, 2025

Collective Communication for 100k+ GPUs

#CUDA #GPUcluster #LLM #Performance #Package

https://hgpu.org/?p=30315

The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. T…

hgpu.org
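
All-reduce is the workhorse collective in LLM training. The sketch below simulates the textbook ring all-reduce, a reduce-scatter followed by an all-gather, on plain Python lists. It is a standard reference algorithm shown for illustration, not the framework described in the paper. Each of the p ranks sends 2(p - 1) chunks of size n/p, which is why the ring is bandwidth-efficient but accumulates latency as p grows.

```python
def ring_allreduce(data: list[list[float]]) -> list[list[float]]:
    """Simulate a sum all-reduce over a ring of p ranks.
    data[r] is rank r's vector; its length must be divisible by p."""
    p, n = len(data), len(data[0])
    assert n % p == 0, "vector length must be divisible by the number of ranks"
    c = n // p
    buf = [list(v) for v in data]  # copy so the inputs are not mutated

    def chunk(i: int) -> slice:
        i %= p  # chunk indices wrap around the ring
        return slice(i * c, (i + 1) * c)

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) to rank r + 1,
    # which adds it into its own copy. Afterwards rank r holds the full sum of chunk r + 1.
    for s in range(p - 1):
        sent = [buf[r][chunk(r - s)] for r in range(p)]  # snapshot: sends are simultaneous
        for r in range(p):
            dst, sl = (r + 1) % p, chunk(r - s)
            buf[dst][sl] = [a + b for a, b in zip(buf[dst][sl], sent[r])]

    # Phase 2, all-gather: in step s, rank r forwards its finished chunk (r + 1 - s).
    for s in range(p - 1):
        sent = [buf[r][chunk(r + 1 - s)] for r in range(p)]
        for r in range(p):
            buf[(r + 1) % p][chunk(r + 1 - s)] = sent[r]
    return buf


data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
print(ring_allreduce(data))
```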

HGPU group Jul 13, 2025

Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms

#CUDA #GPUcluster #Communication

https://hgpu.org/?p=30035

The NVIDIA Collective Communication Library (NCCL) is a critical software layer enabling high-performance collectives on large-scale GPU clusters. Despite being open source with a documented API, i…

hgpu.org
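
NCCL's collective algorithms include Ring and Tree. A simplified alpha-beta cost model shows why the better choice depends on message size. Here alpha is per-message latency and beta is seconds per byte. This is a textbook approximation, not NCCL's internal tuning model. It ignores reduction compute cost and pipelining, and the link parameters are hypothetical.

```python
import math


def ring_allreduce_time(p: int, n_bytes: float, alpha: float, beta: float) -> float:
    """Ring all-reduce: 2(p - 1) steps, each moving n/p bytes."""
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * n_bytes * beta


def tree_allreduce_time(p: int, n_bytes: float, alpha: float, beta: float) -> float:
    """Naive (non-pipelined) binary-tree reduce then broadcast:
    2 * ceil(log2 p) hops, each carrying the full message."""
    return 2 * math.ceil(math.log2(p)) * (alpha + n_bytes * beta)


# Hypothetical link parameters: 5 us latency, 10 GB/s bandwidth, 8 ranks.
ALPHA, BETA, P = 5e-6, 1e-10, 8
for n in (1e3, 1e6, 1e9):
    r = ring_allreduce_time(P, n, ALPHA, BETA)
    t = tree_allreduce_time(P, n, ALPHA, BETA)
    print(f"{n:>8.0e} B  ring {r:.3e} s  tree {t:.3e} s  -> {'tree' if t < r else 'ring'}")
```

In this model, small messages favor the tree because its latency term grows with log p rather than p, and large messages favor the ring. NCCL's actual Tree algorithm is pipelined over a double binary tree, so at large sizes it narrows the gap to the ring far more than this naive model suggests.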

HGPU group Jun 22, 2025

LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters

#GPUcluster

https://hgpu.org/?p=29950

Parallel computing with multiple GPUs has become the dominant paradigm for machine learning tasks, especially those of large language models (LLMs). To reduce the latency incurred by inter-GPU comm…

hgpu.org

HGPU group May 25, 2025

FLASH: Fast All-to-All Communication in GPU Clusters

#GPUcluster #Communication #MPI

https://hgpu.org/?p=29914

Scheduling All-to-All communications efficiently is fundamental to minimizing job completion times in distributed systems. Incast and straggler flows can slow down All-to-All transfers; and GPU clu…

hgpu.org
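
The excerpt names incast as one reason All-to-All transfers slow down. A simple, widely used baseline is the shifted (linear-shift) schedule. It runs p - 1 rounds, and in round r each rank i sends its block to rank (i + r) mod p, so every rank receives exactly one flow per round. This is a reference schedule for illustration. The excerpt is truncated, so it does not describe FLASH's own algorithm.

```python
def shifted_all_to_all(p: int) -> list[list[tuple[int, int]]]:
    """Return p - 1 rounds of (src, dst) transfers; in round r, rank i sends to (i + r) % p."""
    return [[(i, (i + r) % p) for i in range(p)] for r in range(1, p)]


def check_schedule(p: int) -> None:
    """Verify completeness and the absence of incast within a round."""
    rounds = shifted_all_to_all(p)
    pairs = [pair for rnd in rounds for pair in rnd]
    # Every ordered pair of distinct ranks is covered exactly once.
    assert sorted(pairs) == sorted((i, j) for i in range(p) for j in range(p) if i != j)
    # Within a round, each rank is the destination of exactly one transfer.
    for rnd in rounds:
        assert sorted(dst for _, dst in rnd) == list(range(p))


for rnd in shifted_all_to_all(4):
    print(rnd)
check_schedule(4)
```

The schedule assumes every block is the same size and every link is equally fast. Skewed block sizes and stragglers break that assumption, which is the gap dedicated All-to-All schedulers target.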

HGPU group Aug 4, 2024

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

#CUDA #MPI #GPUcluster #TaskScheduling #DeepLearning #DL #PyTorch

https://hgpu.org/?p=29319

Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. Effi…

hgpu.org

HGPU group Jun 9, 2024

Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

#HeterogeneousSystems #GPUcluster #LLM

https://hgpu.org/?p=29242

This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving on heterogeneous GPU clusters. A key idea behind Helix is to formulate inferenc…

hgpu.org
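
The title says Helix serves LLMs "via Max-Flow on Heterogeneous GPUs", but the excerpt stops before the details, so its actual graph construction is not shown here. For orientation only, below is a generic Edmonds-Karp max-flow on a toy graph. GPUs are nodes and edge capacities stand for hypothetical throughputs. All node names and numbers are invented for illustration.

```python
from collections import deque


def max_flow(cap: dict[str, dict[str, float]], s: str, t: str) -> float:
    """Edmonds-Karp: repeatedly augment along shortest (BFS) paths in the residual graph."""
    # Residual capacities, with reverse edges initialised to 0.
    res: dict[str, dict[str, float]] = {}
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res.setdefault(u, {})
            res[u][v] = res[u].get(v, 0) + c
            res.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # BFS for a shortest augmenting path from s to t.
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow  # no augmenting path left

        # Bottleneck capacity along the path.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        # Push flow: decrease forward residuals, increase reverse residuals.
        v = t
        while parent[v] is not None:
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical two-stage pipeline: capacities are invented throughputs.
graph = {
    "source": {"gpu_a": 3, "gpu_b": 2},
    "gpu_a": {"gpu_c": 2, "gpu_d": 2},
    "gpu_b": {"gpu_d": 2},
    "gpu_c": {"sink": 3},
    "gpu_d": {"sink": 2},
}
print(max_flow(graph, "source", "sink"))
```

In this toy graph the edges gpu_a to gpu_c and gpu_d to sink form the minimum cut. Total throughput is therefore 4, even though the source could feed 5.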

HGPU group Apr 14, 2024

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

#SYCL #GPUcluster #HPC #Package

https://hgpu.org/?p=29182

Reducing the need for users to manually manage the details of work and data distribution is an important goal of high-level many-task runtime systems. For distributed memory platforms this means th…

hgpu.org

GVogeler Dec 20, 2023

We at ZIM @dh_graz are looking for technical expertise to help build a GPU cluster for the humanities in Austria: https://informationsmodellierung.uni-graz.at/de/neuigkeiten/detail/article/stellenausschreibung-projektassistenz-im-bereich-machine-learning/ We look forward to every application! #Stellenausschreibung #MachineLearning #GPUCluster

Job posting: Project assistant in the field of Machine Learning