⛐ Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

https://arxiv.org/abs/2511.00076

#cs #graphics #text #characters #cg #béziercurves #llm #ai #vision #machinevision

Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

While Vision-language Models (VLMs) have demonstrated strong semantic capabilities, their ability to interpret the underlying geometric structure of visual information is less explored. Pictographic characters, which combine visual form with symbolic structure, provide an ideal test case for this capability. We formulate this visual recognition challenge in the mathematical domain, where each character is represented by an executable program of geometric primitives. This is framed as a program synthesis task, training a VLM to decompile raster images into programs composed of Bézier curves. Our model, acting as a "visual decompiler", demonstrates performance superior to strong zero-shot baselines, including GPT-4o. The most significant finding is that when trained solely on modern Chinese characters, the model is able to reconstruct ancient Oracle Bone Script in a zero-shot context. This generalization provides strong evidence that the model acquires an abstract and transferable geometric grammar, moving beyond pixel-level pattern recognition to a more structured form of visual understanding.
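To picture what "an executable program of geometric primitives" can mean here, below is a minimal Python sketch of a single cubic Bézier stroke primitive and a toy rasterizer; the control-point layout and the idea of a character as a list of such strokes are illustrative assumptions, not the paper's actual program format.

```python
# Minimal sketch: evaluate and rasterize one cubic Bézier "stroke primitive".
# Representing a character as a list of such primitives is an assumption for
# illustration, not the paper's actual program format.
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=100):
    """Sample n points along a cubic Bézier curve given 4 control points."""
    t = np.linspace(0.0, 1.0, n)[:, None]          # (n, 1) parameter values
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

def rasterize(strokes, size=64):
    """Render strokes (each: 4 control points in [0, 1]^2) onto a
    size x size binary raster, as a stand-in for a character image."""
    canvas = np.zeros((size, size), dtype=np.uint8)
    for p0, p1, p2, p3 in strokes:
        pts = cubic_bezier(*(np.asarray(p, dtype=float) for p in (p0, p1, p2, p3)))
        ij = np.clip((pts * (size - 1)).round().astype(int), 0, size - 1)
        canvas[ij[:, 1], ij[:, 0]] = 1             # y -> row, x -> column
    return canvas

# Example: a single curved stroke, roughly an arc.
img = rasterize([((0.1, 0.8), (0.3, 0.1), (0.7, 0.1), (0.9, 0.8))])
print(img.sum(), "pixels set")
```

The "visual decompiler" direction is the inverse of this sketch: predicting the control points from the raster rather than rendering from them.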

arXiv.org

Kyocera triple-lens AI depth camera could help robots stop fumbling tiny objects

https://fed.brid.gy/r/https://nerds.xyz/2025/11/kyocera-triple-lens-ai-depth-camera/

VisionWave asserts its pioneering position in machine vision and industrial automation in Vietnam. Following the 2025 Vietnam–Korea Digital Forum, the company positions itself as a bridge between Korean technology and Vietnamese smart manufacturing, bringing advanced solutions to factories. #VisionWave #MachineVision #IndustrialAutomation #KoreanTech #SmartManufacturing #CôngNghệHànQuốc #SảnXuấtThôngMinh #ThịGiácMáy #TựĐộngHóaCôngNghiệp #MadeInVietnam #InnovativeManufacturing #VietnamTech #VTCNews

https://vtcnews.vn/visionwave-cau-n

Nature: Fair human-centric image dataset for ethical AI benchmarking. “…we introduce the Fair Human-Centric Image Benchmark (FHIBE, pronounced ‘Feebee’), a publicly available human image dataset implementing best practices for consent, privacy, compensation, safety, diversity and utility. FHIBE can be used responsibly as a fairness evaluation dataset for many human-centric computer vision […]

https://rbfirehose.com/2025/11/06/nature-fair-human-centric-image-dataset-for-ethical-ai-benchmarking/

Nature: Fair human-centric image dataset for ethical AI benchmarking | ResearchBuzz: Firehose

ResearchBuzz: Firehose | Individual posts from ResearchBuzz

💔 Are Foundation Models Ready for Industrial Defect Recognition? A Reality Check on Real-World Data

https://arxiv.org/abs/2509.20479

#computing #cs #machinevision #llm #ai #ml #manufacturing #defects #automation

Are Foundation Models Ready for Industrial Defect Recognition? A Reality Check on Real-World Data

Foundation Models (FMs) have shown impressive performance on various text and image processing tasks. They can generalize across domains and datasets in a zero-shot setting. This could make them suitable for automated quality inspection during series manufacturing, where various types of images are being evaluated for many different products. Replacing tedious labeling tasks with a simple text prompt to describe anomalies and utilizing the same models across many products would save significant efforts during model setup and implementation. This is a strong advantage over supervised Artificial Intelligence (AI) models, which are trained for individual applications and require labeled training data. We test multiple recent FMs on both custom real-world industrial image data and public image data. We show that all of those models fail on our real-world data, while the very same models perform well on public benchmark datasets.
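For context on what "replacing tedious labeling tasks with a simple text prompt" can look like, here is a minimal zero-shot sketch using CLIP via Hugging Face transformers; the paper benchmarks several foundation models, so the specific model and prompt wording below are assumptions, not the authors' setup.

```python
# Minimal sketch of prompt-based, zero-shot defect classification with CLIP.
# Model choice and prompts are illustrative assumptions, not the paper's setup.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Describe the anomaly in text instead of collecting labeled training images.
prompts = [
    "a photo of a machined metal part with no defects",
    "a photo of a machined metal part with scratches or dents",
]

image = Image.open("part.jpg")  # hypothetical inspection image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]

for prompt, p in zip(prompts, probs.tolist()):
    print(f"{p:.2f}  {prompt}")
```

Per the abstract, it is exactly this kind of setup that performs well on public benchmarks yet fails on the authors' real-world industrial data.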

arXiv.org

My brain after debugging for hours: needs spatial-temporal reasoning. General Intuition just raised $134M to teach AI agents exactly that, leveraging video game data (the same kind OpenAI wanted!). We're talking AI that understands movement through space and time. What's one mundane task you'd immediately hand over to an AI with perfect spatial reasoning?

#AI #Robotics #MachineVision #TechCrunch #FutureTech https://techcrunch.com/2025/10/16/general-intuition-lands-134m-seed-to-teach-agents-spatial-reasoning-using-video-game-clips/

General Intuition lands $134M seed to teach agents spatial reasoning using video game clips | TechCrunch

Late last year, OpenAI reportedly tried to buy Medal and its vast trove of video game data for $500M. Today, the company spun out a frontier research lab that's using that data to build AI agents that understand how they move through space and time, a concept called spatial-temporal reasoning.

TechCrunch

🔍 New on the blog: Fundamentals of Computer Vision for Industrial Systems

The post shows how computer vision replaces manual inspection with cameras and algorithms. I cover image capture, resolution / pixels / intensity, plus applications such as localization, measurement, inspection, and identification, all with a focus on industrial systems that demand robustness and reliability.
Check it out: https://danieltak.com.br/posts/vision/computer-vision-fundamentos/
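As a tiny companion to the concepts listed above (capture, resolution, pixel intensity, simple inspection), a minimal OpenCV sketch in Python might look like the following; the file name and threshold are placeholder assumptions, not code from the blog post.

```python
# Minimal sketch of the basics mentioned above: resolution, pixel intensity,
# and a crude presence/absence check. File name and values are placeholders.
import cv2

img = cv2.imread("captured_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical capture
h, w = img.shape
print(f"resolution: {w} x {h} pixels")
print(f"mean intensity: {img.mean():.1f} (0 = black, 255 = white)")

# A simple inspection step: count bright pixels above a fixed threshold,
# e.g. to check whether a part is present in the field of view.
_, mask = cv2.threshold(img, 128, 255, cv2.THRESH_BINARY)
bright_ratio = (mask > 0).mean()
print(f"bright pixel ratio: {bright_ratio:.2%}")
```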

#VisãoComputacional #MachineVision #danieltak #ComputerVision

Industrial Vision Systems - Basic Fundamentals

Basic fundamentals of computer vision in industry.

danieltak

The next Nerves meetup is Wednesday, October 29! Vittoria will showcase the capability of running ML inference pipelines on a Raspberry Pi 5 with the Hailo HAT using Nerves. You'll learn about writing pre or post-processing code with Nx, on low-cost hardware, without sending data over the internet.

RSVP here: https://www.meetup.com/nerves/events/305303640

#MachineVision #AI #ElixirLang #RaspberryPi

📏 Automating Leaf Area Measurement in Citrus: The Development and Validation of a Python-Based Tool

https://www.mdpi.com/2076-3417/15/17/9750

#citrus #metrology #measurement #vision #machinevision #python #plants #software #softwaredevelopment #programming

🧾 Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing

https://arxiv.org/abs/2509.04469

#software #ai #ml #llm #vision #machinevision #computervision #cs #invoice #receipt

Multi-Modal Vision vs. Text-Based Parsing: Benchmarking LLM Strategies for Invoice Processing

This paper benchmarks eight multi-modal large language models from three families (GPT-5, Gemini 2.5, and open-source Gemma 3) on three diverse openly available invoice document datasets using zero-shot prompting. We compare two processing strategies: direct image processing using multi-modal capabilities and a structured parsing approach converting documents to markdown first. Results show native image processing generally outperforms structured approaches, with performance varying across model types and document characteristics. This benchmark provides insights for selecting appropriate models and processing strategies for automated document systems. Our code is available online.
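To make the two strategies concrete, here is a hedged sketch of "direct image processing" versus "parse to markdown first" against a generic multi-modal chat API; the client, model name, and field schema are assumptions for illustration, not the paper's code.

```python
# Sketch of the two strategies compared in the paper: (a) send the invoice
# image directly to a multi-modal model, (b) convert to markdown text first.
# Client, model name, and field schema are illustrative assumptions.
import base64
from openai import OpenAI

client = OpenAI()
FIELDS = "invoice_number, issue_date, total_amount, currency"
QUESTION = f"Extract {FIELDS} from this invoice and answer as JSON."

def extract_from_image(path: str) -> str:
    """Strategy A: zero-shot extraction from the raster invoice image."""
    b64 = base64.b64encode(open(path, "rb").read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the multi-modal models benchmarked
        messages=[{"role": "user", "content": [
            {"type": "text", "text": QUESTION},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

def extract_from_markdown(markdown_text: str) -> str:
    """Strategy B: extraction from a markdown rendering of the same document
    (produced by a separate OCR/layout step, not shown here)."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{QUESTION}\n\n{markdown_text}"}],
    )
    return resp.choices[0].message.content
```

Per the abstract, the direct-image path (Strategy A) generally came out ahead in the benchmark, with results varying by model family and document characteristics.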

arXiv.org