Tangram: Hiding GPU Heterogeneity for Efficient LLM Parallelization

#GPUcluster #LLM #Performance

https://hgpu.org/?p=30879

The scale of LLM training jobs requires parallelization planning over large GPU clusters. Due to different GPU types and interconnects added over time, these GPU clusters are increasingly heterogen…

Hybrid Learning and Optimization-Based Dynamic Scheduling for DL Workloads on Heterogeneous GPU Clusters

#GPUcluster #TaskScheduling #Package

https://hgpu.org/?p=30451

Modern cloud platforms increasingly host large-scale deep learning (DL) workloads, demanding high-throughput, low-latency GPU scheduling. However, the growing heterogeneity of GPU clusters and limi…

Collective Communication for 100k+ GPUs

The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. T…

hgpu.org

Demystifying NCCL: An In-depth Analysis of GPU Communication Protocols and Algorithms

#CUDA #GPUcluster #Communication

https://hgpu.org/?p=30035

The NVIDIA Collective Communication Library (NCCL) is a critical software layer enabling high-performance collectives on large-scale GPU clusters. Despite being open source with a documented API, i…

LiteGD: Lightweight and dynamic GPU Dispatching for Large-scale Heterogeneous Clusters

#GPUcluster

https://hgpu.org/?p=29950

Parallel computing with multiple GPUs has become the dominant paradigm for machine learning tasks, especially those of large language models (LLMs). To reduce the latency incurred by inter-GPU comm…

FLASH: Fast All-to-All Communication in GPU Clusters

#GPUcluster #Communication #MPI

https://hgpu.org/?p=29914

Scheduling All-to-All communications efficiently is fundamental to minimizing job completion times in distributed systems. Incast and straggler flows can slow down All-to-All transfers; and GPU clu…

Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing

#CUDA #MPI #GPUcluster #TaskScheduling #DeepLearning #DL #PyTorch

https://hgpu.org/?p=29319

Deep learning (DL) has demonstrated significant success across diverse fields, leading to the construction of dedicated GPU accelerators within GPU clusters for high-quality training services. Effi…

Helix: Distributed Serving of Large Language Models via Max-Flow on Heterogeneous GPUs

#HeterogeneousSystems #GPUcluster #LLM

https://hgpu.org/?p=29242

This paper introduces Helix, a distributed system for high-throughput, low-latency large language model (LLM) serving on heterogeneous GPU clusters. A key idea behind Helix is to formulate inferenc…

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

#SYCL #GPUcluster #HPC #Package

https://hgpu.org/?p=29182

Reducing the need for users to manually manage the details of work and data distribution is an important goal of high-level many-task runtime systems. For distributed memory platforms this means th…

We at ZIM @dh_graz are looking for technical expertise to help build a GPU cluster for the Austrian humanities: https://informationsmodellierung.uni-graz.at/de/neuigkeiten/detail/article/stellenausschreibung-projektassistenz-im-bereich-machine-learning/ and we look forward to every application! #Stellenausschreibung #MachineLearning #GPUCluster
Job posting: Project assistant in the field of machine learning