Mastodawn

Three new Kitten TTS models – smallest less than 25MB

#HackerNews #KittenTTS #KITTENML #TextToSpeech #AIModels #SmallModels

GitHub - KittenML/KittenTTS: State-of-the-art TTS model under 25MB 😻

State-of-the-art TTS model under 25MB 😻. Contribute to KittenML/KittenTTS development by creating an account on GitHub.

GitHub

River City Random ☑️ Mar 12

Remember when computer-generated voices and virtual pop idols were cool and cute and a completely new and exciting music genre and not the constant background noise of our horrifying computerized dystopia? Pepperidge Farm remembers. Man this is a bop, even a decade and a half later.

https://www.youtube.com/watch?v=duPJqfKiA78

#hatsunemiku #vocaloid #texttospeech #baka #triplebaka #music #synthesizer #electro #jpop

slamp Mar 11

Fish Audio has open-sourced S2, a #texttospeech model that supports fine-grained inline control of prosody and emotion using natural-language tags like [laugh], [whispers], and [super happy]

https://github.com/fishaudio/fish-speech

#AI

GitHub - fishaudio/fish-speech: SOTA Open Source TTS

SOTA Open Source TTS. Contribute to fishaudio/fish-speech development by creating an account on GitHub.

GitHub

Inautilo Mar 11

#Business #Guides
Your browser can already speak a page · How to activate read-aloud features on web pages https://ilo.im/16b5hy

_____
#Reading #Audio #Accessibility #TextToSpeech #Text #Content #Webpages #Browsers

Your Browser Can Already Speak a Page

Users can customize the features built into the browser, something not often available from third-party approaches. Is an “AI” company offering to provide spoken versions of your pages for users? Is an overlay company promising to make your content more accessible by its overlay speaking it? Is some other vendor…

Adrian Roselli

eqtv Mar 10

Google AI Studio — The Only App Builder You’ll Ever Need

https://peertube.eqver.se/w/w7GqLAE9VKoEauQJZ6y2bA

red_027_en

PeerTube

PsychoticSheep Mar 10

Local AI Text-to-Speech Demo with Coqui TTS

Coqui TTS is an AI-powered text-to-speech synthesis platform that can automatically convert written text into natural-sounding speech. The system is based on modern deep learning models and can run entirely locally, making it particularly suitable for privacy-friendly applications and offline projects.

In this example, Coqui TTS is used directly through the Python API. This allows the model to be flexibly integrated into custom scripts and controlled automatically, for example to convert text into audio files or to process larger amounts of text.

Since many text-to-speech models handle very long inputs poorly, the input text is split into smaller sections (chunks) before processing. The chunks are synthesized one after another and then combined into a single audio output.
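The chunking step described above could look roughly like the sketch below. It is not the code from the demo: `chunk_text`, `synthesize_all`, the 400-character limit, and the `part_` file prefix are all assumptions made for illustration. The `synthesize` callback stands in for whatever TTS call is used; with Coqui TTS it would presumably wrap something like `tts.tts_to_file(text=..., file_path=...)`.

```python
import re
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 400) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_all(
    chunks: List[str],
    synthesize: Callable[[str, str], None],
    out_prefix: str = "part_",
) -> List[str]:
    """Call synthesize(text, path) for each chunk in order and return the paths."""
    paths = []
    for i, chunk in enumerate(chunks):
        path = f"{out_prefix}{i:04d}.wav"
        # Assumption: with Coqui TTS this callback could wrap
        # tts.tts_to_file(text=chunk, file_path=path).
        synthesize(chunk, path)
        paths.append(path)
    return paths
```

Passing the TTS call in as a callback keeps the chunking logic testable without a model installed, and the zero-padded file names keep the parts in order when they are combined later.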

In this example, the model is executed locally on the CPU. Although some AI models support GPU acceleration, Coqui TTS can run reliably without specialized hardware and can therefore be used on many different systems.

The audio the model generates is initially raw and unprocessed. To improve sound quality, additional post-processing is recommended, such as removing clicks or artifacts, slightly smoothing audio transitions, or applying other minor corrections.
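One simple way to soften clicks at chunk boundaries is a short linear fade at the start and end of each chunk before joining. The sketch below illustrates that idea on plain lists of integer samples. It is an assumption about what "smoothing audio transitions" could mean here, not the post-processing used in the demo, and `apply_fade`, `join_chunks`, and the 64-sample fade length are hypothetical.

```python
from typing import List


def apply_fade(samples: List[int], fade_len: int) -> List[int]:
    """Linearly fade in the first and fade out the last fade_len samples."""
    n = len(samples)
    # Never let the two fades overlap on very short chunks.
    fade_len = min(fade_len, n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len  # ramps from 0.0 up towards 1.0
        out[i] = int(out[i] * gain)
        out[n - 1 - i] = int(out[n - 1 - i] * gain)
    return out


def join_chunks(chunks: List[List[int]], fade_len: int = 64) -> List[int]:
    """Fade each chunk at both ends, then concatenate them in order."""
    joined: List[int] = []
    for chunk in chunks:
        joined.extend(apply_fade(chunk, fade_len))
    return joined
```

Because every chunk starts and ends at zero amplitude, the joins no longer jump between arbitrary sample values, which is a common cause of audible clicks.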

The Creepypasta used in this demo is in German and contains disturbing content.

https://creepypasta.fandom.com/de/wiki/Trypophobia

Video workflow:

- Recorded with OBS
- Edited in Kdenlive
- Transcoded with VAAPI (H.264)

No cloud, no API keys, real hardware, just Python.
Everything runs on Linux + Python (FOSS), so anyone can set this up.
No GPU? In this case… it doesn't matter.

#AI #TextToSpeech #CoquiTTS #Python #AIVoice #SpeechSynthesis #foss #LocalAI #OpenSourceAI #AItools #ArtificialIntelligence #AIDevelopment

Jeff Fortin T. (風の庭園のNekohayo) Mar 7

We should do a crowdfunding campaign for a "Starcraft Terran siege tank driver" text-to-speech voice for Piper, so that Orca can angrily read GTK widgets at you with this kind of confident and upbeat intonation: https://youtu.be/dtoIv9BzPHk?t=16

#Piper #Orca #TTS #texttospeech #speechsynthesis #GNOME

Siege Tank All Quotes - StarCraft Remastered

YouTube

Digital Freedom Foundation Mar 7

2/2 If you don't want to learn #HTML and #XML coding, you can use #FreeSoftware like #Scribus to design and generate #EPUB formats for you. #Calibre can organize your #ebook #library. Bard incorporates #FLite #TextToSpeech to read ebooks aloud to you. #DFD2026

Who Let The Dogs Out 🐾 Mar 6

#tts #TextToSpeech #AI

#6. CONCATENATION (LOSSLESS)

ls "$WORKDIR"/part_*.mp3 | sort | xargs -I {} echo "file '{}'" > "$WORKDIR/list.txt"
ffmpeg -f concat -safe 0 -i "$WORKDIR/list.txt" -c copy -y "$FINAL_FILE" > /dev/null 2>&1
rm -rf "$WORKDIR"
echo "DONE! File: $FINAL_FILE"

https://habr.com/ru/companies/selectel/articles/1006288/

Project "Prometheus": how to voice an entire library in one evening with AI

Each of you has at least once caught yourself thinking: "Why not start listening to books instead of reading them?" While riding the subway, sitting in traffic, doing household chores, or instead of the same old...

Habr

Who Let The Dogs Out 🐾 Mar 6

#tts #TextToSpeech #AI

#4. TEXT PROCESSING AND SPLITTING

sed '/^$/d' "$WORKDIR/raw.txt" > "$WORKDIR/source.txt"
fold -s -w 2000 "$WORKDIR/source.txt" | tr -d '\r' > "$WORKDIR/formatted.txt"
split -l 100 -d -a 4 "$WORKDIR/formatted.txt" "$WORKDIR/part_"
TOTAL_PARTS=$(ls "$WORKDIR"/part_[0-9]* | wc -l)

#5. PIPELINED SYNTHESIS

export VOICE WORKDIR RATE
do_tts() {
local file=$1
edge-tts --rate="$RATE" --voice "$VOICE" --file "$file" --write-media "$file.mp3" > /dev/null 2>&1
}
export -f do_tts

# Queue via xargs (this is where the speed comes from)

ls "$WORKDIR"/part_[0-9]* | xargs -P "$THREADS" -I {} bash -c 'do_tts "{}"'