It feels SO good to generate tracks on a 16 GB RTX card, oh my!.. They fly out like pancakes off a griddle!..
#музыка #generativeai #comfyui #flac #stableaudio3
Another slop-house track.
This time it's a pure generation on my RTX 5060 in ComfyUI with Stable Audio 3 Base.
Interestingly, this is generation 8 of 8.
Unfortunately, I have to note that this neural network still leaves a lot of "metal" or "sand" in the tracks it generates, BUT IT DOES 6 MINUTES in one go!..
It was fun to watch it work in Task Manager: at first it took about 5 GB, then it suddenly grabbed all 16 GB, and then it dropped back to the minimum while flushing the file to disk (by the way, you can hook up a FLAC saver if, like me, you don't like MP3).
#музыка #musicproduction #house #comfyui #stableaudio3
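For anyone who wants to log that memory curve instead of eyeballing it in Task Manager, here is a minimal sketch. It assumes an NVIDIA driver with the `nvidia-smi` CLI on the PATH and reads only the first GPU; the helper names `parse_memory_line` and `poll_vram` are made up for illustration and are not part of ComfyUI or Stable Audio.

```python
import subprocess
import time

# nvidia-smi query that prints "used, total" in MiB, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line):
    """Parse one 'used, total' CSV line (values in MiB) into two ints."""
    used, total = (int(field.strip()) for field in line.split(","))
    return used, total


def poll_vram(samples=10, interval_s=1.0):
    """Print VRAM usage of the first GPU once per interval (hypothetical helper)."""
    for _ in range(samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        used, total = parse_memory_line(out.splitlines()[0])
        print(f"{used} / {total} MiB ({used / total:.0%})")
        time.sleep(interval_s)
```

Calling `poll_vram(samples=120)` in a second terminal while a generation runs should show the same pattern described above, if the observation holds on your setup: a low plateau, a spike toward the card's limit, then a drop once the file is written.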
Stability AI launches Stable Audio 3.0 - GadgetFlux

Stability AI launches Stable Audio 3.0, open-weight audio models with advanced music generation, extended durations, and on-device composition.

GadgetFlux
Ah yes, "Stable Audio 3" - because nothing stabilizes audio quite like an unreadable jumble of arXiv IDs and a metric ton of academic jargon! 📚🎶 Congrats on making sound as exciting as a spreadsheet. 🙄🔊
https://arxiv.org/abs/2605.17991 #StableAudio3 #AcademicJargon #SoundTech #AudioResearch #TechHumor #HackerNews #ngated
Stable Audio 3

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoid the cost of producing full-length generations for short sounds. We also support inpainting, enabling targeted audio editing and the continuation of short recordings. Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent. Finally, we run adversarial post-training to both accelerate inference and improve generation quality, reducing the number of inference steps while improving fidelity and prompt adherence. Stable Audio 3 models are trained on licensed and Creative Commons data to generate music and sounds in less than 2 s on an H200 GPU and less than a few seconds on a MacBook Pro M4. We release the weights of small and medium, which can run on consumer-grade hardware, together with their training and inference pipeline.

arXiv.org