A Really Big State Machine: How We Built Cloud Recording and AI Note-Taking in Telemost

Hi everyone! My name is Ilya Grigoryev, and I'm a senior backend developer on the Telemost team. In this article I'll walk through our experience building two features from the past year: an AI summary with Alice Pro and cloud recording to Disk. I'll show how we designed their architecture, why not everything worked on the first try, which systemic and technical constraints we hit while working with media data, and how we ultimately built the pipeline for processing and analysing it.

https://habr.com/ru/companies/yandex/articles/1032168/

#backend #java #postgresql #ffmpeg #statemachine #telemost #mediaserver #optimization #performanceoptimization #backenddevelopment

I've written an article, covering what can be determined so far, about the problem of FFmpeg-related code turning up in the multi-feature Rust encoder that Mr. Karpelès, known from "that" Mt. Gox, built in less than a month!

Information is still scarce, but I hope this is useful as a reference
🙏

FFmpeg-related code may have slipped into the MIT-licensed "OxideAV", an encoder that Mt. Gox's Mark Karpelès built in a "clean room" - osumiakari.jp
www.osumiakari.jp/articles/20260506-oxideav/ #news #OxideAV #FFmpeg #FLOSS

Karpelès had been building it since at least mid-April; a core FFmpeg developer pointed out the problem, and Karpelès apologised.

For the #wmhack demo, my screen recorder app was failing so I fell back to this handy shell function:

record_screen () {
    # Default to output.mp4 when no filename is given
    file=${1:-output.mp4}
    # Root window size, e.g. "1920x1080"
    screen_size=$(xdpyinfo | awk '/dimensions/ {print $2}')
    # Grab display :0 from the top-left corner at that size
    ffmpeg -video_size "$screen_size" -f x11grab -i :0.0+0,0 "$file"
}
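The `${1:-output.mp4}` default in the function is plain POSIX parameter expansion; a minimal sketch of the fallback behaviour:

```shell
# ${1:-output.mp4}: use $1 if it is set and non-empty, otherwise fall back.
set --                      # clear positional parameters (no argument given)
file=${1:-output.mp4}
echo "$file"                # prints output.mp4

set -- demo.mp4             # now an argument is present
file=${1:-output.mp4}
echo "$file"                # prints demo.mp4
```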

#shell #ffmpeg #cli #screencast

FFmpeg Developers released #FFmpeg version 8.1.1. https://ffmpeg.org/download.html

The following hashtags are trending across South African Mastodon instances:

#ryzen
#ffmpeg
#whisper
#opera
#church

Based on recent posts made by non-automated accounts. Posts with more boosts, favourites, and replies are weighted higher.

The following hashtags are trending across South African Mastodon instances:

#Wordle
#wordle1778
#internationalscurvyawarenessday
#scurvyawareness
#vitaminc
#ryzen
#ffmpeg
#whisper
#opera
#church

Based on recent posts made by non-automated accounts. Posts with more boosts, favourites, and replies are weighted higher.

I recently replaced my #Ryzen 7 5825U laptop with a Ryzen 7 7840U laptop. WOW! I expected maybe a 10 or 20% boost in performance, but it's really night and day. Whether I am using #FFMPEG to transcode a video, transcribing in #Whisper or conducting #proteomics in FragPipe or DIA-NN, the new machine is a wonder.

Details: HP EliteBook 845 G10 with 32 GB RAM; boot drive replaced with WD Black SN770 2TB.

Stop believing you can build a Twitch clone on a single server. It's an engineering lie. You need a dedicated Origin Server.

We published a brutally honest guide on building a production-grade Nginx-RTMP streaming engine:

✅ Compile from source (ignore apt packages)
✅ Bypass SSD wear with tmpfs RAM disks
✅ Enterprise NVENC GPU limits
✅ Stop wildcard CORS hijackers
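The tmpfs point in the list above boils down to mounting a RAM-backed filesystem for the HLS fragment directory; a sketch of the idea (path and size are illustrative, and the mount needs root):

```shell
# RAM-backed directory for HLS fragments: segments are rewritten every few
# seconds, so keeping them in tmpfs spares the SSD that constant write churn.
mkdir -p /var/hls
mount -t tmpfs -o size=512m,mode=0755 tmpfs /var/hls
# Then point nginx-rtmp at it, e.g.:  hls_path /var/hls;
```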

Read the blueprint:
🔗 https://www.servermo.com/howto/build-live-streaming-server-nginx-rtmp/

#SysAdmin #DevOps #Nginx #FFmpeg #LiveStreaming

archive DataHoarders

markdown formatted

A nice, presentable DataHoarders archive has been created covering the Epstein files

The archive is online accessible as given in the sources matrix.

Even if the content is less interesting to you, the manner in which the frontend & backend are built is quite interesting. I have an interest in both backend and frontend programming & networking, so I think this is a treasure trove from both perspectives.

YMMV

When you glance through the Wikipedia pages on Jeffrey you will find interesting tidbits about his nature, rise and fall. When you read them multiple times you will know more than you may want to about this man, enabled by different forces to flourish in his behaviour. Go in with a neutral mind and read the sources; go there if you want to know more.

The Wikipedia entry on Epstein is LONG and the amount of data is massive. Don't expect to even glance over it in just a few minutes.

There are 305 references in this document

When you go to this datahoarders media archive you will have a pleasant representation of the visual and printed data as released by the USA DOJ

Quotes from the archive creators:

Hey! We are two college students and we just want to share the technical part of our project because you might appreciate it. The DOJ released the Epstein files and we decided to host the entire thing ourselves and build a proper interface on top of it. Here is what the archive actually looks like.

354GB total. 160GB of raw data from the original files and 194GB of our own processed data. Around 600,000 PDF files which actually contain roughly 1,400,000 individual pages inside them since many PDFs bundle multiple pages together when you scroll down. All 3,200 videos have been converted to HLS with adaptive bitrate streaming so quality adjusts automatically to your connection the same way Netflix does it.
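Their HLS conversion step is not shown, but with ffmpeg a two-rendition adaptive ladder looks roughly like this (filenames, resolutions and bitrates are illustrative, not the archive's actual settings):

```shell
# Illustrative two-rendition HLS ladder (not the archive's actual pipeline):
# split the video, scale each copy, encode both, and emit a master playlist
# so the player can switch quality with the connection, Netflix-style.
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=2[a][b];[a]scale=1280:720[v720];[b]scale=640:360[v360]" \
  -map "[v720]" -c:v:0 libx264 -b:v:0 2800k \
  -map "[v360]" -c:v:1 libx264 -b:v:1 800k \
  -map 0:a -c:a:0 aac -map 0:a -c:a:1 aac \
  -f hls -hls_time 6 -hls_playlist_type vod \
  -var_stream_map "v:0,a:0 v:1,a:1" \
  -master_pl_name master.m3u8 \
  stream_%v/index.m3u8
```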

For the videos we ran a full audio extraction pipeline, converting video to audio MP4 and then audio to text, generating SRT subtitle files for every single video that contains spoken content. This means you can search for a word that was spoken in any video and find the exact moment it was said.
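The extraction step they describe likely reduces to something like the following sketch (the function name and codec choice are assumptions, not their code): strip the video track, keep the audio, then hand the result to a speech-to-text step that emits SRT.

```shell
# Illustrative only: video -> audio-only MP4/M4A, ready for transcription.
extract_audio () {
  in=$1
  out=${in%.*}.m4a                      # e.g. clip.mp4 -> clip.m4a
  ffmpeg -i "$in" -vn -c:a aac "$out"   # -vn drops video; keep AAC audio only
}
# Usage:  extract_audio deposition.mp4
```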

For the PDFs we converted every single page to PNG and ran OCR across all 1,400,000 pages. We then used Go to run AI agents that analyze and summarize the OCR output across the documents. The search engine works through tags associated with each specific file, built on top of all that processed data.
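The page-to-PNG and OCR steps aren't spelled out; with common open-source tools (pdftoppm from Poppler and Tesseract, their actual stack is unknown) one document's worth of the pipeline might look like:

```shell
# Illustrative only: rasterise a PDF's pages, then OCR the first page.
pdftoppm -png -r 300 doc.pdf page    # writes page-1.png, page-2.png, ...
tesseract page-1.png page-1          # writes page-1.txt with the OCR text
```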

The frontend is React Native, infrastructure runs through Cloudflare.

We also added the possibility for a user to make an anonymous account to like, add a comment and reply to others or make your own investigation post on our platform.

We are not stopping here. There is still a lot to do and we are pushing updates constantly.


Naturally, ffmpeg and curl are a crucial tool combo for all of this converting, fetching and serving to work smoothly, but I don't need to tell you that. There are many more tools used; go in, read and learn!

Sources:

https://exposingepstein.com/home

https://en.wikipedia.org/wiki/Jeffrey_Epstein

https://www.reddit.com/r/DataHoarder/comments/1shx4po/we_scraped_processed_and_now_host_the_entire_doj/

#programming #database #video #HLS #pdf #recoding #streaming #json #backend #frontend #react #srt #subtitles #FFMPEG


#handbrake encoding with the default #Mesa stack does not use GPU acceleration without the proprietary amf-amdgpu-pro drivers. Those are available in the #AUR but look to have been abandoned after AMD stopped publishing AMF separately from their stack, building on top of the open amdgpu drivers instead, and they offer inferior performance in other contexts.

#ffmpeg can use the GPU through #VAAPI, but hammering out scripts for the amount of media I'm archiving is less than ideal.

Could spin up a VM with the AMF drivers, but setting it up with SingleGPUPassthrough seems like overkill and a pain in the ass.

Any other tools utilizing VAAPI that could make things a little easier?
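For reference, the ffmpeg/VAAPI route mentioned above is a one-liner per file; a minimal sketch (the render-node path is the usual default on single-GPU systems and may differ, and the quality setting is illustrative):

```shell
# Minimal VAAPI hardware H.264 encode: upload frames to the GPU in NV12
# format, then encode with the h264_vaapi encoder.
ffmpeg -vaapi_device /dev/dri/renderD128 -i input.mkv \
  -vf 'format=nv12,hwupload' \
  -c:v h264_vaapi -qp 24 \
  output.mp4
```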