Captioning ~50k images/day calls for a powerful GPU. Sharing experience with VLMs such as uform-gen2-qwen-500m or qwen2.5-vl:7b. Recommended: an L40 GPU or AWS G5 instances to balance speed and cost. #GPU #AI_vi #DeepLearning #CaptionGenerator #MLOptimization
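A back-of-envelope sketch of the sizing question above: how many GPUs does ~50k images/day require at a given per-image caption latency? The latency and utilization figures below are assumptions for illustration, not benchmarks; measure your own model on your own hardware.

```python
import math

def gpus_needed(images_per_day, secs_per_image, utilization=0.7):
    """Rough GPU count for a sustained captioning workload.

    utilization discounts for batching gaps, loading, and retries
    (assumed value, tune to your pipeline).
    """
    # Required sustained throughput in images/second.
    needed_rate = images_per_day / 86_400
    # Throughput one GPU can sustain at the assumed utilization.
    per_gpu_rate = utilization / secs_per_image
    return math.ceil(needed_rate / per_gpu_rate)

# Example: if one L40 captions an image in ~1.2 s (assumed),
# 50k/day fits on a single GPU with headroom.
print(gpus_needed(50_000, 1.2))  # → 1
# A slower 7B VLM at ~4 s/image would need several GPUs.
print(gpus_needed(50_000, 4.0))  # → 4
```

The 50k/day figure works out to only ~0.6 images/second, so a small VLM on one mid-range GPU is often enough; the latency assumption dominates the answer.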
Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing
https://arxiv.org/abs/2508.12631
#HackerNews #LLMs #PerformanceEfficiency #AIResearch #MLOptimization #CostReduction
Balancing performance and efficiency is a central challenge in large language model (LLM) advancement. GPT-5 addresses this with test-time routing, dynamically assigning queries to either an efficient or a high-capacity model during inference. In this work, we present Avengers-Pro, a test-time routing framework that ensembles LLMs of varying capacities and efficiencies, providing a unified solution for all performance-efficiency tradeoffs. The Avengers-Pro embeds and clusters incoming queries, then routes each to the most suitable model based on a performance-efficiency score. Across 6 challenging benchmarks and 8 leading models -- including GPT-5-medium, Gemini-2.5-pro, and Claude-opus-4.1 -- Avengers-Pro achieves state-of-the-art results: by varying a performance-efficiency trade-off parameter, it can surpass the strongest single model (GPT-5-medium) by +7% in average accuracy. Moreover, it can match the average accuracy of the strongest single model at 27% lower cost, and reach ~90% of that performance at 63% lower cost. Last but not least, it achieves a Pareto frontier, consistently yielding the highest accuracy for any given cost, and the lowest cost for any given accuracy, among all single models. Code is available at https://github.com/ZhangYiqun018/AvengersPro.
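The routing idea in the abstract can be sketched in a few lines: cluster query embeddings, then send each query to the model that maximizes a performance-efficiency score for its nearest cluster. This is a toy illustration under made-up stats and model names, not the released AvengersPro code (see the linked repo for that).

```python
import math

# Assumed per-cluster (accuracy, normalized cost) stats per model.
CLUSTER_STATS = {
    0: {"big-model": (0.92, 1.00), "small-model": (0.78, 0.10)},
    1: {"big-model": (0.88, 1.00), "small-model": (0.86, 0.10)},
}
CENTROIDS = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # toy 2-d cluster centroids

def nearest_cluster(embedding):
    # Assign the query to the closest centroid (Euclidean distance).
    return min(CENTROIDS, key=lambda c: math.dist(embedding, CENTROIDS[c]))

def route(embedding, alpha):
    """Pick the model maximizing alpha*accuracy - (1-alpha)*cost."""
    stats = CLUSTER_STATS[nearest_cluster(embedding)]
    return max(stats, key=lambda m: alpha * stats[m][0] - (1 - alpha) * stats[m][1])

# alpha near 1 favors accuracy; alpha near 0 favors cheap models.
print(route([0.9, 0.1], alpha=1.0))  # → big-model
print(route([0.1, 0.9], alpha=0.3))  # → small-model
```

Sweeping `alpha` is what traces out the performance-cost trade-off curve the abstract describes: each value yields a different accuracy/cost operating point.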
Surprising fact: Focusing solely on ML model accuracy in enterprise deployments ignores a crucial factor – operational costs!
This means the best model isn't always the most accurate, but the most "cost-effective".
What are your thoughts on prioritizing cost-performance over pure accuracy in enterprise AI?
#BeyondAccuracy #MLOptimization #CostPerformance #DeepTech
MLP Accelerators are Changing TinyML for Edge Computing
https://rackenzik.com/how-mlp-accelerators-are-changing-tinyml-for-edge-computing/
#TinyML #EdgeAI #MLPAccelerators #FPGAs #EmbeddedAI #AIOnTheEdge #LowPowerAI #MachineLearning #EdgeComputing #AIHardware #AIInnovation #SmartDevices #OnDeviceAI #MLOptimization #TechForGood
A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
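A minimal toy to make "gradients through model training" concrete (an illustration of the chain rule involved, not the paper's algorithm): take one SGD step on a scalar quadratic training loss, then differentiate the validation loss with respect to the learning rate through that step, and check against a finite difference.

```python
def train_step(w, lr, a=3.0):
    # One SGD step on the training loss (w - a)^2.
    grad = 2.0 * (w - a)
    return w - lr * grad

def val_loss(w, b=2.0):
    # Validation loss (w - b)^2, measured after training.
    return (w - b) ** 2

def metagradient(w0, lr, a=3.0, b=2.0):
    # Chain rule through the training step:
    #   d val_loss / d lr = val_loss'(w1) * d w1 / d lr
    w1 = train_step(w0, lr, a)
    dval_dw1 = 2.0 * (w1 - b)
    dw1_dlr = -2.0 * (w0 - a)  # derivative of the SGD update w.r.t. lr
    return dval_dw1 * dw1_dlr

# Sanity check against a central finite difference.
w0, lr, eps = 0.0, 0.1, 1e-6
analytic = metagradient(w0, lr)
numeric = (val_loss(train_step(w0, lr + eps)) -
           val_loss(train_step(w0, lr - eps))) / (2 * eps)
print(abs(analytic - numeric) < 1e-4)  # → True
```

At scale the same derivative is taken through many training steps rather than one, and with respect to richer inputs such as per-example data weights, which is what makes the paper's "smooth model training" framework necessary.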