Making the voice assistants Marusya and Salute swear without coercion or special tools

Hi everyone! At some point I had a simple question: "Can you get an assistant to say something it normally shouldn't?" No API, no programming skills, no automation, and so on. It turns out you can.

https://habr.com/ru/articles/1019688/

#voice_assistant #prompt_injection #LLM #безопасность #голосовые_ассистенты #AI #TTS #NLP #уязвимости #user_input

Hi everyone! This isn't the first Habr post I had planned: there are a couple of more serious and interesting topics I intend to share, but perfectionism hasn't let me finish them yet. And then this happened...

Habr
Oxytude – Accessibility, computing and new technologies

Four open-source models exist right now that do something the previous generation struggled with. They do not just generate speech: they clone a voice from a short audio sample and produce output that is genuinely difficult to distinguish from the original speaker.

The gap between open-source and commercial TTS has been closing for a while. These four models suggest it has effectively closed, at least for voice cloning.
https://firethering.com/open-source-tts-voice-cloning/

#opensource #tts #ai #trending #texttospeech

4 Open-Source TTS Models That Can Clone Voices and Actually Sound Human

Voice cloning used to mean expensive studio software, proprietary APIs with per-character pricing, or models so heavy they needed server infrastructure just to run. That changed quietly over the last few months. Four open-source models exist right now that do something the previous generation struggled with. They do not just generate speech: they clone a voice from a short audio sample and produce output that is genuinely difficult to distinguish from the original speaker. The gap between open-source and commercial TTS has been closing for a while. These four models suggest it has effectively closed, at least for voice cloning. Here is what each one actually does and who it is for.

Firethering

Wanted: an app that reads documents aloud
#tts #texttospeech #text_to_speech
maybe also #blind #disabled #screenreader

For hands-free studying I'm looking for an app that reads PDFs aloud to me. Specifically, one that doesn't send me into a full-blown auditory meltdown and doesn't hand my content over to a Big Tech company.

I've found SherpaTTS, but which app do I use it with on my phone? Or what alternatives are there for a Linux PC / Firefox?

(While I'm at it: #barrierefreiheit, i.e. accessibility, should be pushed forward in general!)

#OCR: #MinerU
#Translate: Ebook-Translator-Calibre-Plugin + Google Translate
#TTS: #legado-with-MD3 + #MultiTTS + #讯飞水哥
#Read: #Kindle + #KOReader + #霞鹜文楷 (LXGW WenKai font)
#伴读 (reading companion): #GoogleAIStudio + #NotebookLM
All done in one smooth pass.

Maider and Antton: I have adapted Basque TTS voices for the Piper engine

Lately I've been working with local artificial intelligence models, or rather, running tests at home in my own computing "lab". I have several concerns and curiosities about AI (I won't start listing them all now), and one of them has long been the need for free (libre) Basque TTS (Text To Speech). We do have technology that turns text into Basque audio, but until now there was no way to integrate it into free software. Moreover, several of the free software projects I use already support this TTS technology; Piper is one example. How...
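Piper voices are normally distributed as an ONNX model plus a JSON config stored next to it (for example `<voice>.onnx` and `<voice>.onnx.json`). Below is a minimal Python sketch, assuming the key layout seen in published Piper voice configs (`audio.sample_rate`, `espeak.voice`, `inference.length_scale`, `num_speakers`) and assuming a Basque voice uses the eSpeak language code `eu`. The helpers `describe_piper_voice` and `load_piper_voice_config` are hypothetical, not part of Piper's API; they just summarize a voice config, e.g. to check which language a downloaded voice is phonemized as.

```python
import json
from pathlib import Path


def describe_piper_voice(config: dict) -> dict:
    """Summarize a Piper voice config (hypothetical helper).

    Assumes the key layout of published Piper voice configs:
    audio.sample_rate, espeak.voice, inference.length_scale, num_speakers.
    """
    inference = config.get("inference", {})
    return {
        # eSpeak language used for phonemization, e.g. "eu" for Basque (assumed)
        "espeak_voice": config["espeak"]["voice"],
        # sample rate of the generated audio, in Hz
        "sample_rate": config["audio"]["sample_rate"],
        # phoneme length multiplier: > 1.0 speaks slower, < 1.0 faster
        "length_scale": inference.get("length_scale", 1.0),
        # number of speakers in the model; single-speaker if the key is absent
        "num_speakers": config.get("num_speakers", 1),
    }


def load_piper_voice_config(model_path: str) -> dict:
    """Load the JSON config that sits next to a model as '<model>.onnx.json'."""
    config_path = Path(model_path + ".json")
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)
```

For a hypothetical Basque voice whose config contains `"espeak": {"voice": "eu"}`, `describe_piper_voice(config)["espeak_voice"]` returns `"eu"`; missing `inference` or `num_speakers` keys fall back to Piper-like defaults rather than raising.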

ゆうすけ (@yusuke_kizuna)

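Aratako appears to have released a new voice-generation tool, Irodori-TTS-500M-v2-VoiceDesign. It lets users design the timbre of a voice, and the poster rates both its TTS quality and its range of voice qualities as better than an existing GPT-SoVITS2-based tool. An interesting new project in speech synthesis.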

https://x.com/yusuke_kizuna/status/2039295281800650870

#tts #voice #opensource #huggingface #audiogen

ゆうすけ (@yusuke_kizuna) on X

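Aratako has released something amazing again! One that lets you create voice qualities! Compared with the GPT-SoVITS2-based voice designer I made, the TTS is better, and the range of voice qualities is probably amazing too!? https://t.co/C96SxafeDm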

X (formerly Twitter)

For those who may not know: #Piper, the neural #TTS engine, is now usable on #iOS, but keep in mind that the experience is terrible. The voices are often unresponsive, and when they do respond, they speak so slowly that even at 80% speech rate it sounds ridiculously slow. I wouldn't try it if I were you.