Aurora: A Leverage-Aware Optimizer for Rectangular Matrices

We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity, or strong convexity of each agent's cost. This is contrary to the mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper we provide the first step toward bridging the gap, by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method, dubbed $\mathtt{GT-DSGD}$, achieves order-optimal HP convergence rates for both non-convex and Polyak-Łojasiewicz costs, of order $\mathcal{O}\Big(\frac{\log(1/\delta)}{\sqrt{nT}}\Big)$ and $\mathcal{O}\Big(\frac{\log(1/\delta)}{nT}\Big)$, respectively, where $n$ is the number of agents, $T$ is the time horizon and $\delta\in (0,1)$ is the confidence parameter. Our results establish that $\mathtt{GT-DSGD}$ converges in the HP sense under the same conditions on the cost as in the MSE sense, while achieving comparable transient times. To the best of our knowledge, these are the first HP guarantees for decentralized optimization methods incorporating bias-correction. Numerical experiments on real and synthetic data verify our theoretical findings, underlining the superior performance of $\mathtt{GT-DSGD}$ and highlighting that the benefits of incorporating bias-correction are also maintained in the HP sense.
Sea of Nodes
Sea of Nodes is a compiler intermediate representation (IR) devised by Cliff Click in the early 1990s. It serves as the core IR of the HotSpot C2 JIT compiler, enabling fast generation of high-quality code. Beyond the JVM, this IR has influenced several major compilers, including Google's V8 and the Graal compiler, and it remains widely used today. Open-source tutorials for learning the concepts and implementation of Sea of Nodes, along with repositories porting it to several languages (Java, Rust, C++, Go), are publicly available and provide useful learning material for compiler developers.
#compiler #intermediaterepresentation #jit #optimization #opensource
Removing fsync from our local storage engine
https://fractalbits.com/blog/remove-fsync/
#HackerNews #removingfsync #localstorage #engine #performance #optimization #database
Making Julia as Fast as C++
https://flow.byu.edu/posts/julia-c++
#HackerNews #Julia #C++ #speed #optimization #programming #languages #performance
Sudo su (@sudoingX)
An agent is writing CUDA C++ dispatch hooks and mmqcu patches for llama.cpp to route Q8 matmuls to its own optimized kernels. It is presented as an impressive example of a 27B model on a DGX Spark modifying the inference engine itself.
A Second Life for a 2011 MacBook Air: How Linux Mint, ZRam, and the Magic of Swapspace Worked a Miracle
Each of us has an old, faithful friend. Mine is a 2011 MacBook Air. A thin aluminum body, a sleek and beautiful design with an excellent backlit keyboard, but with a sentence handed down by Apple: just 2 GB of RAM, with no possibility of an upgrade. Modern macOS turned it into a "brick" that froze for a minute on every click. But I decided it was too early to write it off to the scrap heap of history. The way out was found where enthusiasts always look for it: in the world of open-source software.
The Art and Science of ArcSOC #Optimization in @arcgisxprise https://tinyurl.com/jenxpem6
#ArcGISEnterprise #ArcGISAdmin #GIS #WebGIS #esri #arcgis #performance #GISchat #geospatial @esri @esrifederalgovt @esrislgov @esritraining @urisa
Genetic algorithms apply principles of evolution to solve complex problems.
This session at Nebraska Code() from Barry Stahl explores:
• Representing problems in genetic terms
• Defining solution “DNA”
• Tuning parameters for continuous improvement
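The ideas in the session outline can be sketched as a minimal genetic algorithm in Python. The toy problem (evolving an all-ones bitstring), the fitness function, and the parameter values are illustrative assumptions, not material from the session itself:

```python
import random

random.seed(42)

# Toy problem: evolve an all-ones bitstring (OneMax).
TARGET = [1] * 20
POP_SIZE = 30         # illustrative tuning parameters
MUTATION_RATE = 0.02
GENERATIONS = 200

def fitness(dna):
    # "Solution DNA" is a list of bits; fitness counts matches with the target.
    return sum(a == b for a, b in zip(dna, TARGET))

def crossover(p1, p2):
    # Single-point crossover: splice two parents at a random cut.
    cut = random.randrange(1, len(p1))
    return p1[:cut] + p2[cut:]

def mutate(dna):
    # Flip each bit independently with a small probability.
    return [bit ^ 1 if random.random() < MUTATION_RATE else bit
            for bit in dna]

def evolve():
    pop = [[random.randint(0, 1) for _ in TARGET] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == len(TARGET):
            break
        survivors = pop[: POP_SIZE // 2]   # truncation selection
        # Refill the population with mutated offspring of random survivors.
        pop = survivors + [
            mutate(crossover(random.choice(survivors),
                             random.choice(survivors)))
            for _ in range(POP_SIZE - len(survivors))
        ]
    return max(pop, key=fitness)

best = evolve()
print(f"best fitness: {fitness(best)}/{len(TARGET)}")
```

Population size, mutation rate, and generation count are exactly the kind of parameters the session's "tuning for continuous improvement" bullet refers to: too much mutation destroys good solutions, too little stalls the search.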