New post on our blog! 🤖

A continuation of "Building Your First Voice Agent": this time, the author discusses practical strategies for building AI voice agents with Pipecat!

https://blog.codeminer42.com/technical-challenges-in-building-voice-agents/

#Codeminer42 #AI #AiAgent #VoiceAgent #DIY #Pipecat

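🌗 Building Ultra-Low-Latency Voice AI Agents with NVIDIA Open Models: Technical Deep Dive and Implementation Guide
➤ Breaking the proprietary-model monopoly: how an open-source architecture delivers sub-25-millisecond voice interaction
https://www.daily.co/blog/building-voice-agents-with-nvidia-open-models/
As AI technology moves into 2026, voice agents are now embedded in customer service, medical appointment scheduling, and business automation. In the past, strict requirements for real-time response and natural-sounding speech led most developers to rely on proprietary models. NVIDIA's Nemotron family of open models has broken that deadlock. This article takes a close look at how to combine Nemotron Speech ASR, Nemotron 3 Nano LLM, and Magpie TTS with the Pipecat framework to build a voice agent that responds in under 25 milliseconds. The system not only rivals commercial offerings in accuracy, it also gives enterprises full control over privacy protection, cost management, and architectural customization.
+ "ASR latency under 25 ms is practically black magic. It solves the most immersion-breaking part of voice AI, the 'conversational dead air
#ArtificialIntelligence #NVIDIA #VoiceAgents #OpenSourceModels #LowLatency #Pipecat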
Building Voice Agents with NVIDIA Open Models

Build an ultra-low-latency voice agent with NVIDIA open models. Learn how Nemotron Speech ASR achieves sub-25ms transcription, how Nemotron 3 Nano LLM and Magpie TTS work together, and how to optimize architecture for real-time voice AI deployment.

Daily API: Developer Tips to Build Real-time Voice, Video, and AI into Apps
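
The post above names three stages: speech recognition (ASR), a language model (LLM), and speech synthesis (TTS). As a rough illustration of how one conversational turn flows through such a chain, here is a minimal sketch in plain Python. It does not use Pipecat or any NVIDIA model: `transcribe`, `generate_reply`, `synthesize`, and `run_turn` are hypothetical stubs invented for this example, and the timing dictionary only shows where per-stage latency could be measured.

```python
import asyncio
import time

# Hypothetical stand-ins for the three stages. None of them calls a real model.

async def transcribe(audio: bytes) -> str:
    """Stub ASR stage: treats the incoming bytes as UTF-8 text."""
    return audio.decode("utf-8")

async def generate_reply(transcript: str) -> str:
    """Stub LLM stage: echoes the transcript back in a fixed template."""
    return f"You said: {transcript}"

async def synthesize(text: str) -> bytes:
    """Stub TTS stage: encodes the reply text back into bytes."""
    return text.encode("utf-8")

async def run_turn(audio: bytes) -> tuple[bytes, dict[str, float]]:
    """Run one turn through ASR -> LLM -> TTS and time each stage."""
    timings: dict[str, float] = {}

    start = time.perf_counter()
    transcript = await transcribe(audio)
    timings["asr"] = time.perf_counter() - start

    start = time.perf_counter()
    reply = await generate_reply(transcript)
    timings["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    speech = await synthesize(reply)
    timings["tts"] = time.perf_counter() - start

    return speech, timings

reply_audio, stage_seconds = asyncio.run(run_turn(b"hello"))
```

In a real agent each stub would be replaced by a streaming service, and the per-stage timings are what you would compare against a latency target such as the sub-25 ms ASR figure quoted above.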

What is the real difference between Pipecat and LiveKit? Both are open-source Python frameworks for building AI phone assistants, but how do you choose between them? #Pipecat #LiveKit #AI #AIPhoneAssistant #OpenSource #Python

https://www.reddit.com/r/LocalLLaMA/comments/1oqrm0r/the_real_difference_between_pipecat_and_livekit/

@cbase now has its own board computer: c-beam aka. cassandra - your friendly <--> honest AI
#pipecat #bhnt #bhnt106 @BHNT

- Requires Gemini API & Daily API keys
- Supports Web, React, iOS, Android & more
- Server runs #Pipecat pipeline for orchestration

💡 Architecture Benefits:
- Smart proxy approach for optimal performance
- Server-side Python logic customization

Building Voice #AI Applications: #Gemini Multimodal Live API with #WebRTC and #Pipecat 🎯

🧵

🔧 Simple web app using #opensource #PipecatSDK integrates with Gemini API through WebRTC for voice communication

🌐 #WebRTC advantages over WebSockets:

📝 If you missed my talk at #ODSCWest yesterday, don't worry! The writeup is ready ✨

1️⃣ Let's see the challenges of building voice bots that actually sound human 💁‍♀️ https://www.zansara.dev/posts/2024-09-05-building-voice-agents-with-open-source-tools-part-1/

2️⃣ Let's make a simple #Pipecat voice bot with the same latency as #GPT4o #VoiceMode for a fraction of the cost 💸 https://www.zansara.dev/posts/2024-10-30-building-voice-agents-with-open-source-tools-part-2/

#AI #GenAI #LLM #GPT #VoiceAgent #Chatbot #ODSCWest

Building Reliable Voice Bots with Open Source Tools - Part 1

This is part one of the write-up of my talk at ODSC Europe 2024 and ODSC West 2024. In the last few years, the world of voice agents saw dramatic leaps forward in the state of the art of all its most basic components. Thanks mostly to OpenAI, bots are now able to understand human speech almost like a human would, they’re able to speak back with completely natural-sounding voices, and are able to hold a free conversation that feels extremely natural.

Sara Zan

Thank you to everyone who just attended my virtual tutorial session at #ODSC Europe!

We talked about the state of the art of voice agents in the age of #GenAI and #LLMs, implemented a little bot with #Pipecat, and then tested an innovative approach to controlling the conversation with system prompts that stay small instead of growing huge and overly complicated.

You can find all the material here: https://www.zansara.dev/talks/2024-09-05-odsc-europe-voice-agents/

Stay tuned for more content 📚

ODSC Europe: Building Reliable Voice Agents with Open Source tools

Announcement, slides and notebook. All resources can also be found on ODSC’s website and in my archive. (Note: this is a recording of the notebook walkthrough only. The full recording will be shared soon). At ODSC Europe 2024 I talked about building modern and reliable voice bots using Pipecat, a recently released open source tool. I gave an overview of the general structure of voice bots, of the improvements their underlying tech recently saw, and the new challenges that developers face when implementing one of these systems.

Sara Zan