New post on our blog! 🤖
A continuation of "Building Your First Voice Agent", this time the author discusses practical strategies to build AI voice agents with Pipecat!
https://blog.codeminer42.com/technical-challenges-in-building-voice-agents/
New post on our blog! 🤖
A continuation of "Building Your First Voice Agent", this time the author discusses practical strategies to build AI voice agents with Pipecat!
https://blog.codeminer42.com/technical-challenges-in-building-voice-agents/
Build an ultra-low-latency voice agent with NVIDIA open models. Learn how Nemotron Speech ASR achieves sub-25ms transcription, how Nemotron 3 Nano LLM and Magpie TTS work together, and how to optimize architecture for real-time voice AI deployment.
- Requires Gemini API & Daily API keys
- Supports Web, React, iOS, Android & more
- Server runs #Pipecat pipeline for orchestration
💡 Architecture Benefits:
- Smart proxy approach for optimal performance
- Server-side Python logic customization
Building Voice #AI Applications: #Gemini Multimodal Live API with #WebRTC and #Pipecat 🎯
🧵
🔧 Simple web app using #opensource #PipecatSDK integrates with Gemini API through WebRTC for voice communication
🌐 #WebRTC advantages over WebSockets:
📝 If you missed my talk at #ODSCWest yesterday, don't worry! The writeup is ready ✨
1️⃣ Let's see the challenges of building voice bots that actually sound human 💁♀️ https://www.zansara.dev/posts/2024-09-05-building-voice-agents-with-open-source-tools-part-1/
2️⃣ Let's make a simple #Pipecat voice bot with the same latency as #GPT4o #VoiceMode for a fraction of the cost 💸 https://www.zansara.dev/posts/2024-10-30-building-voice-agents-with-open-source-tools-part-2/
This is part one of the write-up of my talk at ODSC Europe 2024 and ODSC West 2024. In the last few years, the world of voice agents saw dramatic leaps forward in the state of the art of all its most basic components. Thanks mostly to OpenAI, bots are now able to understand human speech almost like a human would, they’re able to speak back with completely naturally sounding voices, and are able to hold a free conversation that feels extremely natural.
Thank you to everyone that just attended my virtual tutorial session at #ODSC Europe!
We talked about the state of the art of voice agents in the age of #GenAI and #LLMs, we implemented a little bot with #Pipecat and then tested out an innovative approach to use system prompts to control the conversation without using huge and overly complicated system prompts.
You can find all the material here: https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/
Stay tuned for more content 📚
Announcement, slides and notebook. All resources can also be found on ODSC’s website and in my archive. (Note: this is a recording of the notebook walkthrough only. The full recording will be shared soon). At ODSC Europe 2024 I talked about building modern and reliable voice bots using Pipecat, a recently released open source tool. I gave an overview of the general structure of voice bots, of the improvements their underlying tech recently saw, and the new challenges that developers face when implementing one of these systems.