Arint - SEO+KI (@Arint@arint.info)

RT @googlegemma: Meet Gemma 4 12B!

More on Arint.info: https://arint.info/@Arint/116689851167947945

#AI #Gemma4 #MachineLearning #Multimodal #OpenSource #TechNews #arint_info

https://x.com/googlegemma/status/2062202706882883696#m


AI is revolutionizing music. How should the Grammys handle it?

On the podcast, Recording Academy CEO Harvey Mason Jr. says generative AI has become ubiquitous in music production over the past 18 months and discusses what that means for the Grammy Awards. He also reflects on earlier predictions and the current state of AI tools in…

https://www.theverge.com/podcast/940831/ai-grammys-music-recording-harvey-mason
#multimodal

AI is blowing up music. How should the Grammys handle it?

Recording Academy CEO Harvey Mason Jr. discusses how AI tools are taking over music production and how to keep human creativity centered.

The Verge

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

An overview of Gemma 4 12B, a model designed to bring high-performance multimodal intelligence directly to your laptop.

Google

Lots of interest for our poster on RGB and thermal scene reconstruction. I’m glad people were as surprised about our results as I was originally :).

#icra #robotics #computervision #science #machinelearning #artificialintelligence #multimodal

Introducing the unified multi-modal `MLX` engine architecture in LM Studio

Leveraging `mlx-lm` and `mlx-vlm` to achieve unified multi-modal LLM inference in LM Studio's `mlx-engine`.

LM Studio Blog

Complete streets is not a trend. It's what a street is supposed to be.

Public streets belong to everyone who uses them. Before postwar auto-centric planning, most urban streets were multimodal by necessity. We engineered our way out of that and spent 70 years calling it progress. Complete streets is the correction, not the revolution.

#CompleteStreets #UrbanPlanning #Multimodal #StreetDesign

We are headed to #ICRA 2026!

We will be showcasing two of our latest research breakthroughs at the International Conference on Robotics and Automation (ICRA) next week.
_

1️⃣ D-CAT 😼: Decoupled Cross-modal Knowledge Transfer
📍 Track: Interactive Session 6 − Hall C (Poster Session)
📅 Date/Time: Thursday, 15:00-16:30, Paper ThI2I.59
🧑‍🔬 Authors: Leen Daher,* Zhaobo Wang, Malcolm Mielle

Our work on D-CAT solves a major real-world challenge: training a system with rich, multi-modal data (like video, IMU, and audio) but allowing it to operate using only a single sensor during inference.

By enabling cross-modal knowledge transfer, D-CAT reduces hardware redundancy and costs without sacrificing accuracy—even showing up to a 10% F1-score gain in certain scenarios!
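The post does not describe D-CAT's training objective, so consult the linked paper for the actual method. To illustrate the general idea of training with rich multimodal data and running on a single sensor, here is a minimal sketch of classic teacher-student knowledge distillation for a classification task. A teacher that sees all modalities produces soft targets, and a single-sensor student is trained to match them. This is the standard Hinton-style distillation loss and not D-CAT's decoupled method. The function names, the temperature value, and the example logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: List[float],
                      student_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Hypothetical setup (generic distillation, not D-CAT's method):
    - teacher_logits: output of a model trained on all modalities (e.g. video + IMU + audio)
    - student_logits: output of a model that only sees one sensor (e.g. IMU) at inference
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]  # hypothetical multimodal teacher logits
    student = [1.5, 0.7, -0.5]  # hypothetical single-sensor student logits
    print(distillation_loss(teacher, student))
```

In practice, this term is usually combined with an ordinary cross-entropy loss on the ground-truth labels. The teacher is only needed during training, so the deployed system needs just the single sensor.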

Paper: https://arxiv.org/pdf/2509.09747

* Leen Daher is currently looking for a PhD position or job in the EU, and will be presenting in person.
_

2️⃣ Unpaired Multi-Modal Reconstruction 📸🔥
📍 Track: Workshop Paper - MM-SpatialAI Workshop
📅 Date/Time: June 1st
🧑‍🔬 Authors: Jean Cordonnier, Chenghao Xu, Olga Fink, Malcolm Mielle

We introduce a framework for RGB-thermal novel view synthesis from independently captured image sets. It combines VGGT, feature matching, and multi-modal 3D Gaussian Splatting (3DGS) to accept decoupled RGB/TIR images, showing that misalignment between modalities doesn't have to compromise reconstruction quality.

#ICRA2026 #Robotics #MachineLearning #ComputerVision #MultiModal #SensorFusion

Google's new AI model is wild

Google has launched an anything-to-anything AI model that can generate videos. The article's author used it to make videos of a toy deer on vacation.

https://www.theverge.com/tech/936507/gemini-omni-hands-on-deepfake-ai-video
#multimodal

Google’s new anything-to-anything AI model is wild

Gemini’s Omni Flash model can generate AI video from real images and clips. Making it look like you’re on a flight to France just got way easier.

The Verge

Towards speed-of-light text generation with Nemotron-Labs language models

Nemotron-Labs has developed diffusion language models that generate text at extremely high speeds.

https://huggingface.co/blog/nvidia/nemotron-labs-diffusion
#multimodal

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models

A Blog post by NVIDIA on Hugging Face

AI is being used to resurrect the voices of dead pilots

People have used AI to reconstruct the voices of deceased pilots from spectrogram images of cockpit recordings.

https://techcrunch.com/2026/05/22/ai-is-being-used-to-resurrect-the-voices-of-dead-pilots
#multimodal

AI is being used to resurrect the voices of dead pilots | TechCrunch

People used AI on a spectrogram image of cockpit recordings to reconstruct them, forcing the NTSB to temporarily block access to its docket system.

TechCrunch