Building Solovey — a macOS app that cleans up audio recordings
You know the deal: you recorded a podcast, interview, or lecture — and the audio has background noise, hum, and volume jumping all over the place.
Solovey fixes that in a couple of clicks: drop in a file, the app removes noise, levels out the volume — and gives you a clean result. Everything runs locally on your Mac, no cloud uploads.
Speed-wise: a 10-minute recording processes in about 3 minutes on a MacBook Pro.
Currently polishing things up before an App Store release. If you work with audio and want to give it a try — reach out, I'd love the feedback.
#audio #podcasting #macos #indiedev #audioprocessing #noisereduction #solovey #buildinpublic
**11/**
Want to adapt this for your domain?
The approach works for any specialized translation:
• Legal
• Medical
• Technical
• Gaming
Build a dictionary. Fine-tune a small model. Beat the giants.
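One way the dictionary step can work is glossary injection: scan the input for known domain terms and put only the matching entries into the prompt. A minimal sketch — the glossary entries, the English→French pair, and the prompt format are all illustrative assumptions, not the thread author's exact setup:

```python
# Hypothetical domain glossary (cooking terms, English -> French).
GLOSSARY = {
    "sauté": "faire revenir",
    "simmer": "laisser mijoter",
    "fold in": "incorporer délicatement",
}

def build_prompt(source_text: str) -> str:
    # Inject only the glossary terms that actually occur in the input,
    # keeping the prompt short and the model focused.
    hits = {term: tr for term, tr in GLOSSARY.items()
            if term in source_text.lower()}
    glossary_block = "\n".join(f"{t} -> {tr}" for t, tr in hits.items())
    return (
        "Translate to French. Use this glossary:\n"
        f"{glossary_block}\n\n"
        f"Text: {source_text}"
    )

print(build_prompt("Sauté the onions, then simmer for 10 minutes."))
```

The same scaffolding works for legal, medical, technical, or gaming terms — only the dictionary changes.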
**10/**
Key lessons:
1. Remove what you don't need
2. Domain dictionaries > model size
3. 16-bit LoRA >> 4-bit QLoRA
4. Measure everything
5. Iterate relentlessly
**9/**
Final results:
• 400K recipes in 2.5 hours
• RTX 4090
• $5 electricity
• 90% quality
• 155ms per translation
DeepL would need 55 hours and $5,000.
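The two timing figures only reconcile if 155 ms is single-request latency while the 2.5-hour run used batched inference — an assumption, but the arithmetic makes it plausible:

```python
# Back-of-envelope check of the numbers above.
recipes = 400_000
wall_clock_s = 2.5 * 3600    # 2.5 hours
latency_s = 0.155            # 155 ms per translation

sequential_s = recipes * latency_s             # one-at-a-time estimate
effective_s = wall_clock_s / recipes           # actual time per item
parallelism = sequential_s / wall_clock_s      # implied batching factor

print(f"sequential: {sequential_s / 3600:.1f} h")       # ~17.2 h
print(f"effective per item: {effective_s * 1000:.1f} ms")  # 22.5 ms
print(f"implied parallelism: {parallelism:.1f}x")        # ~6.9x
```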
**8/**
The size paradox:
| Model | Params | Quality |
|-------|--------|---------|
| DeepSeek V3 | 671B | 70-80% |
| HY-MT | 7B | 86% |
| TranslateGemma | 4B | 90% |
4B beats 671B. 168x smaller. 20% better.
Specialization > Size.
**7/**
7 training iterations:
1. Base fine-tune: 62%
2. More data: 69%
3. Morphological fix: +46% on steps!
4. rsLoRA + DoRA + NEFTune: +18.7%
5. Domain dictionary (5K terms)
6. Prompt optimization
7. Final tuning
Result: 90%
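For step 4, the gains come from three published tweaks: rsLoRA replaces the standard LoRA scaling α/r with α/√r so the adapter update doesn't vanish at higher ranks; DoRA decomposes weights into magnitude and direction; NEFTune adds uniform noise to embeddings during training. A stdlib-only sketch of the two formulas (the rank/alpha values are illustrative, not the thread's actual hyperparameters):

```python
import math

def lora_scale(alpha: int, r: int, rslora: bool = False) -> float:
    # Standard LoRA scales the adapter update by alpha / r;
    # rank-stabilized LoRA (rsLoRA) uses alpha / sqrt(r) instead,
    # so the update does not shrink as the rank grows.
    return alpha / math.sqrt(r) if rslora else alpha / r

def neftune_noise_scale(alpha: float, seq_len: int, dim: int) -> float:
    # NEFTune: uniform noise on embeddings, scaled by
    # alpha / sqrt(sequence_length * hidden_dim).
    return alpha / math.sqrt(seq_len * dim)

for r in (8, 64):
    print(r, lora_scale(16, r), lora_scale(16, r, rslora=True))
print(neftune_noise_scale(5.0, 512, 4096))
```

In practice these map to `use_rslora=True` / `use_dora=True` in PEFT's `LoraConfig` and `neftune_noise_alpha` in TRL's `SFTConfig`.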
**6/**
Wrote a script to remove vision_tower and multi_modal_projector.
Model size: 8.6GB → 7.76GB
Now I could use 16-bit LoRA instead of 4-bit QLoRA.
This was the key unlock.
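The script itself isn't shown; the core idea is just dropping every weight under the vision prefixes from the checkpoint. A minimal sketch operating on a plain dict with dummy values standing in for tensors — real checkpoints would be loaded and saved with something like safetensors, and the model config updated to match:

```python
# Prefixes of the modules named above that a text-only model doesn't need.
DROP_PREFIXES = ("vision_tower.", "multi_modal_projector.")

def strip_vision(state_dict: dict) -> dict:
    # Keep only the text-model weights.
    return {k: v for k, v in state_dict.items()
            if not k.startswith(DROP_PREFIXES)}

# Toy checkpoint: floats stand in for real tensors.
checkpoint = {
    "model.embed_tokens.weight": 0.0,
    "vision_tower.encoder.layer.0.weight": 0.0,
    "multi_modal_projector.linear.weight": 0.0,
    "lm_head.weight": 0.0,
}
text_only = strip_vision(checkpoint)
print(sorted(text_only))  # only text-model keys remain
```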
**5/**
The problem: TranslateGemma is multimodal.
It has a vision layer for images — useless for text translation.
800MB of wasted VRAM blocking 16-bit fine-tuning.
Solution? Surgery. 🔪