Aleksandr Beshkenadze

@beshkenadze
Startup Founder | AI Product Architect | Building Privacy-First Legal & EdTech Tools (RAG, Hybrid Search, MCP Servers) | Node.js • TypeScript • Flutter • DevOps

Currently polishing things up before an App Store release. If you work with audio and want to give it a try — reach out, I'd love the feedback.

#audio #podcasting #macos #indiedev #audioprocessing #noisereduction #solovey #buildinpublic

Solovey fixes that in a couple of clicks: drop in a file, the app removes noise, levels out the volume — and gives you a clean result. Everything runs locally on your Mac, no cloud uploads.

Speed-wise: a 10-minute recording processes in about 3 minutes on a MacBook Pro.

Building Solovey — a macOS app that cleans up audio recordings

You know the deal: you recorded a podcast, interview, or lecture — and the audio has background noise, hum, and volume jumping all over the place.

**11/**
Want to adapt this for your domain?

The approach works for any specialized translation:
• Legal
• Medical
• Technical
• Gaming

Build a dictionary. Fine-tune a small model. Beat the giants.
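One way the "build a dictionary" step can plug into inference is glossary injection: scan the input for known domain terms and list their required translations in the prompt. A minimal sketch (the term pairs and function names here are illustrative, not from the thread):

```python
# Hypothetical legal-domain glossary; target language assumed Russian
# to match the thread's morphological-fix context.
GLOSSARY = {
    "consideration": "встречное предоставление",
    "estoppel": "эстоппель",
}

def build_prompt(source_text: str, glossary: dict) -> str:
    # Keep the prompt short: include only terms that occur in the input.
    hits = {t: tr for t, tr in glossary.items()
            if t.lower() in source_text.lower()}
    lines = [f"- {t} -> {tr}" for t, tr in hits.items()]
    glossary_block = "\n".join(lines) if lines else "(none)"
    return (
        "Translate the text below. Use these required term translations:\n"
        f"{glossary_block}\n\nText: {source_text}"
    )

prompt = build_prompt("The doctrine of estoppel applies here.", GLOSSARY)
```

The same pattern works for medical, technical, or gaming vocabularies: only the dictionary changes.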

**10/**
Key lessons:

1. Remove what you don't need
2. Domain dictionaries > model size
3. 16-bit LoRA >> 4-bit QLoRA
4. Measure everything
5. Iterate relentlessly

**9/**
Final results:

• 400K recipes in 2.5 hours
• RTX 4090
• $5 electricity
• 90% quality
• 155ms per translation

DeepL would need 55 hours and $5,000.

**8/**
The size paradox:

| Model | Params | Quality |
|-------|--------|---------|
| DeepSeek V3 | 671B | 70-80% |
| HY-MT | 7B | 86% |
| TranslateGemma | 4B | 90% |

4B beats 671B. 168x smaller. 20% better.

Specialization > Size.

**7/**
7 training iterations:

1. Base fine-tune: 62%
2. More data: 69%
3. Morphological fix: +46% on steps!
4. rsLoRA + DoRA + NEFTune: +18.7%
5. Domain dictionary (5K terms)
6. Prompt optimization
7. Final tuning

Result: 90%
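For reference, rsLoRA, DoRA, and NEFTune are all switchable in the Hugging Face `peft`/`trl` stack. A config sketch matching the techniques named in step 4; the specific ranks, alphas, and target modules below are assumptions, not the thread's actual settings:

```python
# Requires `peft` and `trl`. Values are illustrative.
from peft import LoraConfig
from trl import SFTConfig

lora = LoraConfig(
    r=16,
    lora_alpha=32,
    use_rslora=True,   # rank-stabilized LoRA: scales by alpha / sqrt(r)
    use_dora=True,     # weight-decomposed low-rank adaptation (DoRA)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

train_args = SFTConfig(
    output_dir="out",
    neftune_noise_alpha=5,  # NEFTune: noisy embeddings during fine-tuning
    bf16=True,              # 16-bit training, feasible after the vision strip
)
```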

**6/**
Wrote a script to remove vision_tower and multi_modal_projector.

Model size: 8.6GB → 7.76GB

Now I could use 16-bit LoRA instead of 4-bit QLoRA.

This was the key unlock.
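The core of such a script is just filtering the checkpoint's state dict by parameter-name prefix. A minimal sketch of the idea (not the author's actual script; a real checkpoint would be loaded and saved via `safetensors` or `transformers`, and a plain dict stands in for the state dict here):

```python
# Drop every tensor belonging to the vision stack, keep the text model.
DROP_PREFIXES = ("vision_tower.", "multi_modal_projector.")

def strip_vision(state_dict: dict) -> dict:
    return {
        name: tensor
        for name, tensor in state_dict.items()
        if not name.startswith(DROP_PREFIXES)
    }

# Toy state dict with placeholder values instead of real tensors.
toy = {
    "language_model.model.layers.0.self_attn.q_proj.weight": 0,
    "vision_tower.encoder.layers.0.attn.qkv.weight": 0,
    "multi_modal_projector.linear.weight": 0,
}
text_only = strip_vision(toy)  # only the language_model key survives
```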

**5/**
The problem: TranslateGemma is multimodal.

It has a vision layer for images — useless for text translation.

800MB of wasted VRAM blocking 16-bit fine-tuning.

Solution? Surgery. 🔪