Mastodawn

田中義弘 | taziku CEO / AI × Creative (@taziku_co)

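Observing that the value of long videos no longer lies in "watching the whole thing", the post introduces an example in which sending a URL to Grok returns the key points of the entire video. It predicts that information intake may shift from a competition over playback time to one over extraction accuracy, and that an era of receiving the structure first may be arriving.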

https://x.com/taziku_co/status/2034202956791447781

#grok #videosummarization #multimodal #aiassistant

田中義弘 | taziku CEO / AI × Creative (@taziku_co) on X

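The value of long-form video is ceasing to be "watching all of it". Grok returns the key points of an entire video when you simply send it the URL. Information intake may turn into a competition over extraction accuracy rather than playback time. We have entered an era in which you receive just the structure first, before watching and understanding. via: @cb_doge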

X (formerly Twitter)

sayzard 16h ago

Angry Tom (@AngryTomtweets)

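An announcement that OpenClaw agents, on top of their existing calendar, email, and work-automation capabilities, can now automatically generate "real videos" complete with music, transitions, and pacing. Using multiple video models, the aim is to produce finished videos without editing skills or complex prompts.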

https://x.com/AngryTomtweets/status/2034057409069793348

#agents #videogeneration #automation #multimodal

Angry Tom (@AngryTomtweets) on X

Your OpenClaw agent already runs your calendar, answers emails, and automates your work. But now… it can make full videos too. Real videos with music, transitions, and pacing from your favorite video models. No editing skills. No prompt gymnastics. Here’s how it works 👇

sayzard 1d ago

Simon Willison (@simonw)

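Notes on today's GPT-5.4 mini and nano releases, reporting in particular that the nano model looks like it could describe every image in a personal library of 76,000 photos for a total of $52. It points to a cost-efficient multimodal use case for lightweight models.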

https://x.com/simonw/status/2033991803050070082

#gpt5.4 #openai #multimodal #costefficiency

Simon Willison (@simonw) on X

Notes and pelicans for today's GPT-5.4 mini and nano releases - the nano model looks like it could describe every image in my 76,000 photo library for $52 total https://t.co/YtsNLXHWU1
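
The two figures quoted in the post (76,000 photos, $52 total) imply an average per-image cost. A minimal sketch of that arithmetic follows; it uses only the numbers from the post and makes no assumption about actual token counts or pricing tiers, which the post does not state.

```python
# Figures quoted in the post: 76,000 photos described for $52 in total.
TOTAL_COST_USD = 52.0
PHOTO_COUNT = 76_000


def cost_per_image(total_cost: float, count: int) -> float:
    """Average cost of describing one image, in USD."""
    return total_cost / count


per_image = cost_per_image(TOTAL_COST_USD, PHOTO_COUNT)
per_thousand = per_image * 1_000

print(f"per image: ${per_image:.6f}")            # per image: $0.000684
print(f"per 1,000 images: ${per_thousand:.2f}")  # per 1,000 images: $0.68
```

This is an average only; real per-image cost would vary with image size and description length.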

Hacker News 1d ago

Antfly: Distributed, Multimodal Search and Memory and Graphs in Go

https://github.com/antflydb/antfly

#HackerNews #Antfly #Distributed #Search #Multimodal #Memory #Graphs #Go

GitHub - antflydb/antfly

GitHub

sayzard 1d ago

H (@hcompany_ai)

At NVIDIA GTC, H released Holotron-12B, an open-source multimodal model developed with NVIDIA. Designed for high throughput and aimed at the era of "computer-use agents", it can be tried right away via the Hugging Face hub and an accompanying technical deep dive.

https://x.com/hcompany_ai/status/2033851052714320083

#nvidia #huggingface #multimodal #holotron12b #opensource

H (@hcompany_ai) on X

🚀 Live from @NVIDIAGTC, we're releasing Holotron-12B! Developed with @nvidia, it's a high-throughput, open-source, multimodal model engineered specifically for the age of computer-use agents. Get started today! 🤗Hugging Face: https://t.co/SyAuqLIacS 📖Technical Deep Dive:

Leon Furze Jan 14, 2024

What is Multimodal Generative Artificial Intelligence?

The term multimodal generative intelligence is getting thrown around a lot recently - even more so now that the most popular models like GPT have added features like image recognition and generation. But what does 'multimodal' actually mean? What is "Multimodal"? Although the term "multimodal" might seem self-explanatory, there's more to it than you might think. The term is being bandied around right now by many AI developers, but I like to consider it from a different perspective. My […]

https://leonfurze.com/2024/01/15/what-is-multimodal-generative-artificial-intelligence/

sayzard 1d ago

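Mistral Small 4 released: a new model that unifies fast instruct, strong reasoning, multimodal input (text and images), and code agents in one. MoE (128 experts), 119B total parameters (6–8B active per token), 256k context, and response depth adjustable via reasoning_effort. Apache 2.0 open source, with support in vLLM, Transformers, and others. 40% lower latency and 3x higher throughput. A general-purpose model for development, enterprise, and research use.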

https://mistral.ai/news/mistral-small-4

#ai #multimodal #opensource #llm #inference

Introducing Mistral Small 4 | Mistral AI
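
The parameter counts quoted above (119B total, 6–8B active per token) give a rough sense of how sparse the MoE is. A minimal sketch of that ratio, using only the figures from the post; it says nothing about memory footprint or real inference cost, which depend on details the post does not give.

```python
# Figures quoted in the post, in billions of parameters.
TOTAL_PARAMS_B = 119.0
ACTIVE_PARAMS_B = (6.0, 8.0)  # stated range of parameters active per token


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active_b / total_b


low, high = (active_fraction(a, TOTAL_PARAMS_B) for a in ACTIVE_PARAMS_B)
print(f"active share per token: {low:.1%} to {high:.1%}")
# active share per token: 5.0% to 6.7%
```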

sayzard 1d ago

cedric (@cedric_chee)

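Mistral's new model "Mistral Small 4 119B A6B" is a versatile model that combines Magistral's reasoning, Pixtral's multimodal capabilities, and Devstral's agentic coding strengths into one, with adjustable reasoning effort. The post notes that weights in FP8 or NVFP4 format can be downloaded from Hugging Face.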

https://x.com/cedric_chee/status/2033695928167899294

#mistral #llm #multimodal #huggingface #fp8

cedric (@cedric_chee) on X

Mistral Small 4 119B A6B combines Magistral's reasoning, Pixtral's multimodal capabilities, and Devstral's agentic coding strengths into a single versatile model with configurable reasoning effort. Download FP8 or NVFP4 weights on HF.

sayzard 5d ago

OpenAI Developers (@OpenAIDevs)

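A small bug in GPT-5.4's image encoder has been fixed, somewhat improving the quality of image input processing. As a result, some image-understanding use cases may see better results, and no action is required on the user's side.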

https://x.com/OpenAIDevs/status/2032555646399427051

#gpt5.4 #imageencoder #modelupdate #multimodal

OpenAI Developers (@OpenAIDevs) on X

We updated our image encoder to fix a small bug for image inputs in GPT-5.4. Some image understanding use cases may now see improved quality. No action needed. https://t.co/OUvMWsRRtm

sayzard 5d ago

Claude (@claudeai)

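Opus 4.6 scored 78.3% on MRCR v2 at 1 million tokens, the highest among frontier models, making it possible to load entire codebases, large document sets, and long-running agents. The post also announces an expanded media limit of 600 images or PDF pages per request.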

https://x.com/claudeai/status/2032509550239297864

#opus #mrcr #longcontext #llm #multimodal

Claude (@claudeai) on X

Opus 4.6 scores 78.3% on MRCR v2 at 1 million tokens, highest among frontier models. Load entire codebases, large document sets, and long-running agents. Media limits expand to 600 images or PDF pages per request.
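
A per-request cap of 600 images or PDF pages means a larger set has to be split across several requests. The sketch below shows one way to batch a list of pages under that cap. The file names and the `batch_media` helper are hypothetical and for illustration only; no API call is made, and nothing here reflects how any particular client library handles the limit.

```python
from typing import Iterator, List, Sequence

MAX_MEDIA_PER_REQUEST = 600  # limit quoted in the post


def batch_media(
    items: Sequence[str], limit: int = MAX_MEDIA_PER_REQUEST
) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `limit` entries."""
    for start in range(0, len(items), limit):
        yield list(items[start:start + limit])


# Hypothetical example: a 1,500-page scan exported as one image per page.
pages = [f"page-{i:04d}.png" for i in range(1, 1501)]
batches = list(batch_media(pages))
print([len(b) for b in batches])  # [600, 600, 300]
```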
