Vertex AI Context Caching + Priority PayGo latency benchmark (400 runs, Gemini 3 Flash)

A benchmark of the latency gains from Vertex AI's Context Caching and Priority PayGo, run on the Gemini 3 Flash model, found that the Thinking Level setting (DEFAULT, LOW, MINIMAL) affects latency optimization more than whether caching is enabled.

https://news.hada.io/topic?id=26627

#vertexai #benchmark #latency #gemini3flash #optimization

Using a ~7,500-token system prompt (input) and a ~100-token response (output) from an AI chatbot service as the baseline, Vertex AI's Context Caching and the newly...

GeekNews
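A latency comparison of this kind can be sketched as a small timing harness. This is a minimal sketch, not the benchmark's actual code: `fake_request` is a hypothetical stand-in for one model call, the grid of caching on/off crossed with the three thinking levels is an assumed layout, and how the original 400 runs were split across configurations is not stated in the post.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Return mean, median, and nearest-rank p95 of a non-empty latency list."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: smallest sample with >= 95% of samples at or below it.
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
    }


def benchmark(send_request: Callable[[], object], runs: int) -> Dict[str, float]:
    """Time `send_request` `runs` times and summarize wall-clock latency in ms."""
    latencies_ms: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return summarize(latencies_ms)


def fake_request() -> None:
    """Hypothetical stand-in for one request; a real run would call the model here."""
    time.sleep(0.001)


# Assumed grid: caching off/on crossed with the three thinking levels.
results = {}
for caching in (False, True):
    for level in ("DEFAULT", "LOW", "MINIMAL"):
        results[(caching, level)] = benchmark(fake_request, runs=5)

for (caching, level), stats in results.items():
    print(
        f"caching={caching!s:<5} thinking={level:<8} "
        f"median={stats['median']:.2f} ms p95={stats['p95']:.2f} ms"
    )
```

Reporting the median and a tail percentile alongside the mean keeps a few slow outliers from hiding the typical request time.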

Gemini 3 Flash now leverages zoom for fine‑grained visual detail, boosting PlanCheckSolver’s accuracy by 5 %. The upgrade sharpens AI Vision, enhances agentic inspection loops, and tightens code‑compliance checks. See how iterative inspection reshapes visual AI workflows. #Gemini3Flash #AIVision #PlanCheckSolver #IterativeInspection

🔗 https://aidailypost.com/news/gemini-3-flash-uses-zoom-fine-detail-improving-planchecksolver

Gemini 3 Flash’s new ‘Agentic Vision’ improves image responses – 9to5google

Abner Li | Jan 27 2026 – 11:40 am PT

Agentic Vision is a new capability for the Gemini 3 Flash model to make image-related tasks more accurate by “grounding answers in visual evidence.”

Frontier AI models like Gemini typically process the world in a single, static glance. If they miss a fine-grained detail — like a serial number on a microchip or a distant street sign — they are forced to guess.

This new approach “treats vision as an active investigation” by combining visual reasoning with code execution and other tools in the future.

To answer prompts with images, Gemini 3 Flash will formulate “plans to zoom in, inspect and manipulate images step-by-step.” Specifically, Agentic Vision leverages a “Think, Act, Observe loop.”

  • Think: The model analyzes the user query and the initial image, formulating a multi-step plan.
  • Act: The model generates and executes Python code to actively manipulate images (e.g. cropping, rotating, annotating) or analyze them (e.g. running calculations, counting bounding boxes, etc.).
  • Observe: The transformed image is appended to the model’s context window. This allows the model to inspect the new data with better context before generating a final response.

Instead of just describing an image it’s given, Gemini 3 Flash “can execute code to draw directly on the canvas to ground its reasoning.” One example of this image annotation in the Gemini app is asking “to count the digits on a hand.”

To avoid counting errors, it uses Python to draw bounding boxes and numeric labels over each finger it identifies. This “visual scratchpad” ensures that its final answer is based on pixel-perfect understanding.
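The Think, Act, Observe loop can be illustrated with a toy sketch. This is not how Gemini implements Agentic Vision: the image is a nested list of grayscale values, and `think` is a hypothetical planner that simply picks the tightest box around non-zero pixels, standing in for the model's own reasoning.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def crop(image: Image, box: Box) -> Image:
    """Act: cut the boxed region out of the image (a simple 'zoom')."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def think(image: Image) -> Box:
    """Hypothetical planner: choose the tightest box around all non-zero pixels."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for row in image for c, value in enumerate(row) if value]
    if not rows:
        # Nothing of interest: keep the whole image.
        return (0, 0, len(image), len(image[0]) if image else 0)
    return (min(rows), min(cols), max(rows) + 1, max(cols) + 1)


def think_act_observe(image: Image, max_steps: int = 3) -> List[Image]:
    """Run the loop and return the context: the original plus each zoomed view."""
    context = [image]
    for _ in range(max_steps):
        current = context[-1]
        box = think(current)         # Think: plan the next region to inspect
        zoomed = crop(current, box)  # Act: manipulate the image
        if zoomed == current:        # no tighter view is available, so stop
            break
        context.append(zoomed)       # Observe: append the new view to the context
    return context


image = [
    [0, 0, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 7, 0],
    [0, 0, 0, 0],
]
views = think_act_observe(image)
print(views[-1])
```

Each pass appends a transformed view to the context, mirroring the Observe step, so a final answer can be based on the zoomed region instead of the full frame.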

Meanwhile, Gemini 3 Flash will zoom in when it detects fine-grained details in the image. Agentic Vision can also “parse high-density tables and execute Python code to visualize the findings.”

Agentic Vision results in a “consistent 5-10% quality boost across most vision benchmarks” for Gemini 3 Flash.

This is starting to roll out to the Gemini app with the Thinking model. For developers, it’s available today with the Gemini API in Google AI Studio and Vertex AI.

#9to5GoogleCom #AgenticVision #ExecuteCode #Gemini #Gemini3Flash #GeminiApp #Google #ImageQuality #NewFromGemini

Revolution called off? New research shows GPT-5.2 and Gemini 3 are still not fit for real office work

Two years ago Satya Nadella promised that AI would take over “knowledge work.” Yet if you look around law firms or banks, people are still indispensable there.

Why? A new report from Mercor bluntly exposes the weaknesses of the latest models: faced with the mess of real office work, artificial intelligence simply gets lost.

The reality check: APEX-Agents

Forget asking AI to write a poem or solve a logic puzzle. Mercor has built a new benchmark called APEX-Agents that simulates the real tasks of knowledge workers. Instead of sterile test questions, the models were given tasks such as: “Check this Slack thread, compare it with the policy in the PDF, look at the spreadsheet, and tell us whether we comply with GDPR.”

The results? A disaster (for AI)

The results are a bucket of cold water for automation enthusiasts. Even the very top of the market, Gemini 3 Flash and GPT-5.2, could not exceed 25% accuracy.

• Gemini 3 Flash: 24% correct answers.
• GPT-5.2: 23% correct answers.

The rest of the field was stuck in the teens. This means that in roughly 3 out of 4 cases the AI either gave a wrong answer or gave up partway through the task.

Why did AI fail?

Brendan Foody, CEO of Mercor, points to the culprit: context. People naturally “jump” between different sources of information (email, chat, text file) and connect the dots. For AI, this “information noise” is paralyzing. Models handle a single, specific task very well, but they get lost when they have to synthesize data from many scattered sources at once.

Your new, incompetent intern

The report sums up the current state of the technology with an apt metaphor: today's AI is not a “seasoned professional” who will take your job, but a “clueless intern” whose work has to be watched closely, because it gets things wrong in 75% of cases.

Does this mean we can sleep soundly? Not quite. Although a score of 24% looks laughable, it is worth remembering the pace of change. A year ago the same models scored around 5-10% in similar tests. Progress is therefore exponential. But for now, in January 2026, your office job is safe. At least until they teach robots to use Slack.

#APEXAgents #Gemini3Flash #GPT52 #MercorBenchmark #news #pracaBiurowaAI #przyszłośćPracy

Open‑source Falcon H1R 7B just hit 83.1% on AIME 2025, out‑reasoning models up to 7× its size, including GPT‑5.2 and Gemini 3 Flash. The results showcase how community‑driven research can rival big‑lab efforts in mathematical reasoning. Dive into the details and see what this means for future AI benchmarks. #FalconH1R7B #AIME2025 #GPT52 #Gemini3Flash

🔗 https://aidailypost.com/news/falcon-h1r-7b-scores-831-aime-2025-outreasoning-models-up-7-its-size

Google AI announcements from December – Google

The latest AI news we announced in December

Dec 29, 2025, 6 min read

Here’s a recap of our biggest AI updates from December, including the launch of Gemini 3 Flash, the release of new AI verification tools in the Gemini app and the arrival of Gemini’s powerful translation capabilities in Google Translate.

Keyword Team

For more than 20 years, we’ve invested in machine learning and AI research, tools and infrastructure to build products that make everyday life better for more people. Teams across Google are working on ways to unlock AI’s benefits in fields as wide-ranging as healthcare, crisis response and education. To keep you posted on our progress, we’re doing a regular roundup of Google’s most recent AI news.

Here’s a look back at some of our AI announcements from December.

December is usually a time for reflection and looking ahead. That’s why this month we’ve been focused on taking frontier intelligence out of the lab and putting it into your hands in ways that actually matter for your day-to-day. Whether it’s the lightning speed of Gemini 3 Flash helping you tackle tasks in seconds, the new video verification tools in the Gemini app or the simple relief of having GenTabs tame your open tabs, these updates share a single goal: making technology adapt to you, not the other way around. And as we push these boundaries, we’re staying grounded in responsibility, launching new tools to help you verify AI content so you can explore this new frontier with confidence.

We released Gemini 3 Flash, featuring frontier intelligence built for speed. Gemini 3 Flash brings frontier intelligence to virtually every corner of the Google ecosystem, combining the speed of our most advanced models with improved reasoning capabilities to help with everyday tasks, all while keeping costs significantly lower. It’s rolling out as the default model in the Gemini app and AI Mode in Search so people everywhere can now experience the incredible reasoning of our frontier model, right in our consumer products. And we’ve scaled this rollout to a global community, including developers building in the API and in Antigravity, our new agentic development platform, as well as enterprise customers on Vertex AI.

We added new AI verification tools for videos in the Gemini app. We’re bringing video verification capabilities directly to the Gemini app. People can now upload videos (up to 100 MB or 90 seconds) and simply ask if the content was generated or edited using Google AI. Gemini uses imperceptible SynthID watermarks to analyze both audio and visual tracks, pinpointing exactly which segments contain AI-generated elements.

We announced a new experiment to improve browsing and manage complex online tasks. We’ve all felt the friction of juggling dozens of tabs to research a topic or plan a trip. Enter Disco, a new browsing experience from Google Labs designed to tame that complexity. Disco features GenTabs, an experiment that proactively synthesizes your open tabs and chat history to build custom, interactive web applications, transforming a scattered browser session into a streamlined tool for getting things done.

#2025 #AIUpdates #artificialIntelligence #Gemini3 #Gemini3Flash #Google #GoogleAI