Captioning ~50k images/day calls for a capable GPU. The thread shares experience with VLMs such as uform-gen2-qwen-500m and qwen2.5-vl:7b, and suggests an L40 GPU or AWS G5 instances to balance speed and cost. #GPU #AI_vi #DeepLearning #CaptionGenerator #MLOptimization

https://www.reddit.com/r/LocalLLaMA/comments/1pun4kk/which_gpu_should_i_use_to_caption_50k_imagesday/
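For scale, 50k images/day is a fairly modest sustained rate. A rough capacity estimate might look like the sketch below; the per-image latencies and the 80% utilization figure are assumptions for illustration, not measured benchmarks:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def required_throughput(images_per_day: int) -> float:
    """Images per second needed to keep up with the daily volume."""
    return images_per_day / SECONDS_PER_DAY

def gpus_needed(images_per_day: int, sec_per_image: float, utilization: float = 0.8) -> int:
    """GPUs required given an assumed per-image latency and duty cycle."""
    demand = required_throughput(images_per_day) * sec_per_image
    return math.ceil(demand / utilization)

rate = required_throughput(50_000)
print(f"Required sustained rate: {rate:.2f} images/sec")  # ~0.58 images/sec

# Hypothetical latencies: a small VLM at ~0.5 s/image vs a 7B VLM at ~2 s/image.
print("Small VLM (0.5 s/img):", gpus_needed(50_000, 0.5), "GPU")
print("7B VLM   (2.0 s/img):", gpus_needed(50_000, 2.0), "GPU(s)")
```

The takeaway: at under one image per second, batching on a single mid-range GPU is often enough, which is why the thread leans toward one L40 or G5 instance rather than a large cluster.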

Beyond GPT-5: Making LLMs Cheaper and Better via Performance-Efficiency Optimized Routing

Balancing performance and efficiency is a central challenge in large language model (LLM) advancement. GPT-5 addresses this with test-time routing, dynamically assigning queries to either an efficient or a high-capacity model during inference. In this work, we present Avengers-Pro, a test-time routing framework that ensembles LLMs of varying capacities and efficiencies, providing a unified solution for all performance-efficiency tradeoffs. The Avengers-Pro embeds and clusters incoming queries, then routes each to the most suitable model based on a performance-efficiency score. Across 6 challenging benchmarks and 8 leading models -- including GPT-5-medium, Gemini-2.5-pro, and Claude-opus-4.1 -- Avengers-Pro achieves state-of-the-art results: by varying a performance-efficiency trade-off parameter, it can surpass the strongest single model (GPT-5-medium) by +7% in average accuracy. Moreover, it can match the average accuracy of the strongest single model at 27% lower cost, and reach ~90% of that performance at 63% lower cost. Last but not least, it achieves a Pareto frontier, consistently yielding the highest accuracy for any given cost, and the lowest cost for any given accuracy, among all single models. Code is available at https://github.com/ZhangYiqun018/AvengersPro.
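The routing recipe in the abstract (embed the query, assign it to a cluster, then score each model) can be sketched in a few lines. Everything below is a placeholder: the model names, cluster centroids, per-cluster accuracies, costs, and the trade-off weight alpha are invented for illustration, not values from the paper:

```python
import math

MODELS = {
    # model: assumed accuracy per query cluster, assumed relative cost per query
    "big-model":   {"perf": [0.92, 0.88], "cost": 1.00},
    "small-model": {"perf": [0.80, 0.85], "cost": 0.10},
}
CENTROIDS = [(1.0, 0.0), (0.0, 1.0)]  # pretend query-embedding cluster centers

def nearest_cluster(embedding) -> int:
    """Assign an incoming query embedding to its closest cluster."""
    dists = [math.dist(embedding, c) for c in CENTROIDS]
    return dists.index(min(dists))

def route(embedding, alpha: float = 0.5) -> str:
    """Pick the model maximizing alpha*performance - (1-alpha)*normalized cost."""
    cluster = nearest_cluster(embedding)
    max_cost = max(m["cost"] for m in MODELS.values())
    def score(m):
        return alpha * m["perf"][cluster] - (1 - alpha) * m["cost"] / max_cost
    return max(MODELS, key=lambda name: score(MODELS[name]))

q = (0.9, 0.1)                     # lands in cluster 0
print(route(q, alpha=0.9))         # performance-weighted: picks big-model
print(route(q, alpha=0.2))         # cost-weighted: picks small-model
```

Sweeping alpha from 0 to 1 is what traces out the performance-efficiency trade-off curve the paper reports.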

arXiv.org

Surprising fact: Focusing solely on ML model accuracy in enterprise deployments ignores a crucial factor – operational costs!

This means the best model isn't always the most accurate one, but the most cost-effective one.

What are your thoughts on prioritizing cost-performance over pure accuracy in enterprise AI?
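As a toy illustration of that trade-off, ranking by "accuracy per dollar" instead of raw accuracy can flip the choice. All model names, accuracy figures, and prices below are invented:

```python
# Toy comparison: "most accurate" vs "most cost-effective" model choice.
candidates = {
    "frontier-model": {"accuracy": 0.94, "usd_per_1k_queries": 15.00},
    "mid-tier-model": {"accuracy": 0.90, "usd_per_1k_queries": 1.50},
}

def accuracy_per_dollar(model: dict) -> float:
    """Crude cost-effectiveness metric: accuracy divided by price."""
    return model["accuracy"] / model["usd_per_1k_queries"]

best_accuracy = max(candidates, key=lambda n: candidates[n]["accuracy"])
best_value = max(candidates, key=lambda n: accuracy_per_dollar(candidates[n]))
print(best_accuracy)  # frontier-model wins on raw accuracy
print(best_value)     # mid-tier-model: 4 points less accurate, 10x cheaper
```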

#BeyondAccuracy #MLOptimization #CostPerformance #DeepTech

MLP Accelerators are Changing TinyML for Edge Computing - Rackenzik

MLP accelerators enhance TinyML by optimizing AI inference on edge devices, enabling fast, power-efficient processing without cloud reliance.

Rackenzik
🎩🤖 "Metagradient Descent" promises the magic of optimizing ML, but is more like watching paint dry at warp speed. 📉👏 With support from the mystical Simons Foundation, we now have another wizardry paper that's essentially just trying to make gradients great again. 🧙‍♂️✨
https://arxiv.org/abs/2503.13751 #MetagradientDescent #MLoptimization #SimonsFoundation #AIresearch #GradientMagic #HackerNews #ngated
Optimizing ML Training with Metagradient Descent

A major challenge in training large-scale machine learning models is configuring the training process to maximize model performance, i.e., finding the best training setup from a vast design space. In this work, we unlock a gradient-based approach to this problem. We first introduce an algorithm for efficiently calculating metagradients -- gradients through model training -- at scale. We then introduce a "smooth model training" framework that enables effective optimization using metagradients. With metagradient descent (MGD), we greatly improve on existing dataset selection methods, outperform accuracy-degrading data poisoning attacks by an order of magnitude, and automatically find competitive learning rate schedules.
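The core idea, taking the gradient of the final validation loss through the entire unrolled training run and then descending on a hyperparameter, can be shown on a toy 1-D problem. Everything here is illustrative: the quadratic losses and constants are made up, and the metagradient is computed by finite differences rather than the paper's exact, scalable algorithm:

```python
def train(lr: float, steps: int = 10, w0: float = 5.0) -> float:
    """Inner loop: gradient descent on the train loss L(w) = (w - 2)^2."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * (w - 2.0)  # dL/dw
    return w

def val_loss(lr: float) -> float:
    """Outer objective: validation loss (w - 2.5)^2 at the final trained w."""
    w = train(lr)
    return (w - 2.5) ** 2

def metagradient(lr: float, eps: float = 1e-5) -> float:
    """d(val_loss)/d(lr) through the unrolled training run (finite differences)."""
    return (val_loss(lr + eps) - val_loss(lr - eps)) / (2 * eps)

# Metagradient descent on the learning rate itself.
lr = 0.01
for _ in range(200):
    lr -= 0.0005 * metagradient(lr)
print(f"tuned lr = {lr:.3f}, val loss = {val_loss(lr):.6f}")
```

The same pattern, with the learning rate swapped for per-example data weights, is how MGD-style methods do dataset selection: differentiate the downstream loss with respect to the knob, then step.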

arXiv.org