Jay Sensei (@hckinz)

The observation is that the vast majority of major text-to-image and text-to-video generation systems still use diffusion-family techniques at their core, and that methods like flow matching can effectively be seen as modernized diffusion. In other words, it offers the technical insight that today's generative vision models still rest on diffusion-family technology.

https://x.com/hckinz/status/2034056299978362931

#diffusion #flowmatching #texttoimage #texttovideo #generativeai

Jay Sensei👾 (@hckinz) on X

Almost every major text-to-image / text-to-video system still relies on diffusion (or very close relatives like flow matching, which is basically modernized diffusion) at its core.

X (formerly Twitter)
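A minimal sketch (not from the post) of why the two count as close relatives: both objectives train a network to predict a fixed target along a path between data and Gaussian noise, and differ mainly in how the path and target are parameterized. The `model` signature and the rectified-flow variant of flow matching are assumptions for illustration:

```python
import torch

def ddpm_loss(model, x0, alpha_bar):
    """Diffusion (DDPM-style): x_t = sqrt(ab)*x0 + sqrt(1-ab)*eps, predict eps."""
    t = torch.randint(0, alpha_bar.numel(), (x0.shape[0],))
    ab = alpha_bar[t].view(-1, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps
    return ((model(x_t, t) - eps) ** 2).mean()

def flow_matching_loss(model, x0):
    """Flow matching (rectified-flow style): x_t = (1-t)*noise + t*x0, predict x0 - noise."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.shape[0]).view(-1, *([1] * (x0.dim() - 1)))
    x_t = (1 - t) * noise + t * x0
    return ((model(x_t, t.flatten()) - (x0 - noise)) ** 2).mean()
```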

New research shows over 90% of gamers find playing with AI-powered NPCs to be "enjoyable and rewarding"

This is what I've been saying ever since I first played around with GPT-2 (I found GPT-3 and later versions rather boring in comparison, and they don't even write better meaningless Dada poetry than ancient GPT-2): #transformer type #LLM powered NPCs can make gaming so much more fun. Just use fairly small LLMs that have been trained on all the lore and run locally on the GPU, and you get NPCs you can actually talk to. And if they hallucinate non-existent lore, real humans often do things like that as well. Give every important NPC a couple of scripted lines that contain the essential information, and use an LLM so the player can (through their character) talk to the NPCs about topics that aren't in the script.

If you write such an NPC, you put a list of things that character happens to know or believe into the NPC prompt, right after the general character description, and as soon as the player deviates from the scripted parts, the LLM drives the conversation. You will most likely have a UI that gives you a screen mask with separate text boxes: character description, the character's knowledge and beliefs, the character's current situation, and a box filled with all conversations with any player character so far. If you as a player character talk to some random stranger on the street, the LLM generates everything.

Natural-sounding speech synthesis has been a thing for a couple of years now, just like generating a natural human voice from nothing but a prompt describing the speaker (age, gender, accent, personality, current emotional state). It is therefore possible to give each randomly generated NPC their own voice and their own personality, each of them completely unique yet completely generic. A mesh #diffusion type 3D model generator can be used to automatically generate variations of original 3D models from a prompt, producing slightly changed hairstyles, clothes, jewelry, tools, etc. on the fly on the GPU when needed.
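A minimal sketch of that prompt layout, assuming a local model behind a generic `generate(prompt) -> str` callable (hypothetical; any small instruction-tuned LLM would do):

```python
from dataclasses import dataclass, field

@dataclass
class NPC:
    description: str                       # general character description
    knowledge: list[str]                   # what this NPC knows or believes
    situation: str                         # where the NPC is, what it is doing
    history: list[str] = field(default_factory=list)  # conversations so far

    def prompt(self, player_line: str) -> str:
        lore = "\n".join(f"- {fact}" for fact in self.knowledge)
        log = "\n".join(self.history)
        return (
            f"You are roleplaying this character:\n{self.description}\n\n"
            f"The character knows or believes:\n{lore}\n\n"
            f"Current situation: {self.situation}\n\n"
            f"Conversation so far:\n{log}\n"
            f"Player: {player_line}\nCharacter:"
        )

    def talk(self, generate, player_line: str) -> str:
        # Scripted lines would bypass this entirely; the LLM only takes
        # over once the player leaves the script.
        reply = generate(self.prompt(player_line)).strip()
        self.history += [f"Player: {player_line}", f"Character: {reply}"]
        return reply
```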

All the AI models needed for this can be made small enough to run on the GPU, although you'll probably need a computer in the >900€ range to run it, or maybe more like >1200€ now that #BigAI are buying up all the hardware. If people can't get decent gaming hardware, computer games will become either much simpler than today or very much dependent on external computing centres for much of the compute, even if they don't use much AI. We don't even need humongous AI models to make gaming better, only small ones that each do one thing well enough for the game. Of course NVIDIA will do everything they can to monopolise gaming AI by putting their own models in their GPU firmware, with game developers having to pay fees to train their own LoRAs before they can use the NVIDIA AI in their own games. However, open source machine learning models aren't restricted in that way, and they can be used in games too.

Oh, by the way, many games already use latent diffusion type models for graphics. This is how you get real-time raytracing with lots of detail in 2K or even 4K: part of the GPU renders the scene with accurate lighting at low resolution, then an AI is used for upscaling and adding/reconstructing the finer details. The AI in question has been trained on high quality renderings of the same scenes. You make a couple of seconds of Pixar-quality animation of each scene in the game, train a LoRA per scene for a video upscaler/filter, and then run the low resolution, low detail raytraced video stream through the upscaler; that's basically how they do it. The actual workflow is much more complicated, but that doesn't matter if you just want to understand the basic idea. The trick is to render accurately only what needs to be accurate, like lighting, and use AI to generate a good enough approximation of everything else.
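A toy sketch of that render-low, upscale-with-AI idea; the model here is a stand-in (real pipelines such as DLSS-style upscalers are proprietary, far more elaborate, and run as optimized GPU inference rather than plain PyTorch):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyUpscaler(nn.Module):
    """Bilinear 2x upsample plus a small CNN that adds back fine detail."""
    def __init__(self):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, low_res):
        up = F.interpolate(low_res, scale_factor=2, mode="bilinear",
                           align_corners=False)
        return up + self.refine(up)        # residual detail on top of upsample

upscaler = ToyUpscaler()                   # would be trained per scene (LoRA)
frame = torch.rand(1, 3, 540, 960)         # one accurately lit 960x540 frame
with torch.no_grad():
    hi_res = upscaler(frame)               # -> 1080p approximation
print(hi_res.shape)                        # torch.Size([1, 3, 1080, 1920])
```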

https://www.gamesindustry.biz/new-research-shows-over-90-of-gamers-find-playing-with-ai-powered-npcs-to-be-enjoyable-and-rewarding

New research shows over 90% of gamers find playing with AI-powered NPCs to be "enjoyable and rewarding"

"Players kick back at AI that is taking away from creativity. But when AI is used to power totally new types of interactive experience, then it’s a very differe

GamesIndustry.biz

At moments like this I wish GoToSocial had reactions, because I want to leave a clown emoji.

https://zeroes.ca/@kimcrawley/116229223350917046

The manipulative condescension on top of it is a special treat, of course.

#GenAI #LLM #diffusion #log #people #shit #manipulation

Kim Crawley 😷 (she/her) (@[email protected])

Attached: 1 image @[email protected] David, I thought you were really cool. But it turns out you're not really against Gen AI after all.

zeroes.ca

Mark Gadala-Maria (@markgadala)

New video generation model "HELIOS" announced: a 14B-parameter autoregressive diffusion model said to generate up to 60 seconds of coherent video from a single text prompt. Reported throughput is 19.5 frames per second on a single NVIDIA H100, close to real time, apparently a first for a model of this size.

https://x.com/markgadala/status/2029572916141007273

#videogeneration #diffusion #helios #nvidia #h100

Mark Gadala-Maria (@markgadala) on X

🚨 BREAKING: NEW VIDEO MODEL "HELIOS" GENERATES 1 FULL MINUTE OF VIDEO FROM A SINGLE PROMPT >MODEL: 14B autoregressive diffusion model — first of its size to hit real-time >OUTPUT: Up to 60 seconds of coherent video from a single text prompt >SPEED: 19.5 FPS on one NVIDIA H100

X (formerly Twitter)


Python Trending (@pythontrending)

dLLM (dllm): announcement of a project/tool called Simple Diffusion Language Modeling. It appears to be a simple implementation or research reference applying diffusion-based techniques to language modeling, best read as an open project for experimenting with and researching diffusion-based LLMs.

https://x.com/pythontrending/status/2029150890003722658

#diffusion #llm #languagemodeling #opensource

Python Trending 🇺🇦 (@pythontrending) on X

dllm - dLLM: Simple Diffusion Language Modeling https://t.co/C1OuWPVlg2

X (formerly Twitter)
Montpellier: Broadcast | Radio FM-Plus | Entrée Libre | Diffusion, Wednesday, March 4, 2026, from 12:00 to 13:00. https://www.agendadulibre.org/events/34694 #montpelLibre #radio #fmPlus #diffusion #libre
Broadcast | Radio FM-Plus | Entrée Libre | Diffusion

Montpel'libre produces a series of regular broadcasts on Radio FM-Plus called "Entrée Libre". These broadcasts are the weekly presentation of Montpel'libre's activities. After the jingle, which briefly introduces Montpel'libre, we put the spotlight on the activities that…

From Noise to Image · Lighthouse Software

An interactive, visual guide to the magic behind how AIs generate images from text.

fly51fly (@fly51fly)

[Paper] "dLLM: Simple Diffusion Language Modeling" (UC Berkeley & UIUC, 2026). Research proposing a diffusion-based approach to language modeling (dLLM), introducing a new methodology for applying diffusion models to text generation together with experimental results. arXiv link included.

https://x.com/fly51fly/status/2027498090912223416

#diffusion #llm #research #generativemodels

Abhishek Yadav (@abhishek__AI)

PersonaLive is an open source project that turns a single image into infinite-length, expressive animated talking-head video in real time. It offers ComfyUI support, a real-time diffusion framework, streaming on 12GB of VRAM, and a WebUI, and targets roughly 2x faster performance via TensorRT. A useful development tool for real-time animation and avatars for live streaming.

https://x.com/abhishek__AI/status/2026683626348490894

#realtime #diffusion #comfyui #tensorrt #livestream

Abhishek Yadav (@abhishek__AI) on X

Live portrait anime just went real time 🤯 PersonaLive turns a single image into infinite length, expressive talking head video for live streaming. → ComfyUI support → Real time diffusion framework → 12GB VRAM streaming support → WebUI and TensorRT (~2x faster) 100% Open

X (formerly Twitter)