Building an ML-Powered Transaction Classifier with Retraining and A/B Testing
Every month I download a CSV from my bank with all our household transactions. Each one needs a category: groceries, fuel, mortgage, subscriptions, insurance.
https://www.hylkerozema.nl/2026/02/26/building-an-ml-powered-transaction-classifier-with-retraining-and-a-b-testing/
#DataScience #MachineLearningEngineering #classification #Flask #MachineLearning #MLflow #MLOps #mongodb #NAS #Optuna #Python
Building an ML-Powered Transaction Classifier with Retraining and A/B Testing – Hylke Rozema

We’re looking for a way to version and catalogue self-trained deep learning models (training data, code revision, etc.) from our Tissue-Concepts family of medical foundation models.

We’ve briefly looked at #W&B and #MLflow (now integrated into GitLab), and have tried extensively to store more-or-less documented model snapshots on disk.

Has anyone had good or bad experiences with these tools in research / medical ML settings? Any recommendations?

#ComputationalPathology #datascience #deeplearning

I like W&B
I like MLflow
Dumping to disk is just fine
Something else!
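If MLflow ends up as the catalogue, the provenance described above (training data, code revision) can be attached to each run as tags. A minimal sketch under that assumption: hash the training-data files, read the current git commit, and store both on the active run. The tag names data.sha256 and code.git_commit and all helper names are hypothetical illustrations, not an MLflow convention.

```python
import hashlib
import subprocess
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of one file, read in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_fingerprint(paths: list[Path]) -> str:
    """Combined hash over paths (as passed) and contents, independent of argument order."""
    digest = hashlib.sha256()
    for path in sorted(paths, key=str):
        digest.update(str(path).encode())
        digest.update(file_sha256(path).encode())
    return digest.hexdigest()


def git_revision(repo_dir: str = ".") -> str:
    """Commit hash of the checked-out code, or 'unknown' outside a git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return result.stdout.strip()


def log_provenance(data_files: list[Path], repo_dir: str = ".") -> None:
    """Attach data and code identifiers to the active MLflow run as tags (hypothetical names)."""
    import mlflow  # lazy import: the hashing helpers work without MLflow installed

    mlflow.set_tags({
        "data.sha256": dataset_fingerprint(data_files),
        "code.git_commit": git_revision(repo_dir),
    })
```

Called inside mlflow.start_run(), the two tags sit next to the run's parameters and metrics, so a logged model can be traced back to the exact data files and commit. Passing relative paths keeps the fingerprint stable when the dataset directory is moved.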

And, finally, success. Figuring out access controls took a bit of work too. It was particularly weird that I could navigate to the 'create user' section in the browser but not actually create the user. The solution was to set the --cors-allowed-origins flag to the site's own URL (I'm sure this makes perfect sense to a web developer).
It will take some getting used to that some management tasks, such as changing user passwords, happen through Python scripts.
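For the password-management part, MLflow's basic-auth app provides a Python client obtained through mlflow.server.get_app_client. The sketch below assumes the server was started with --app-name basic-auth and that an admin's credentials are in the MLFLOW_TRACKING_USERNAME and MLFLOW_TRACKING_PASSWORD environment variables; the helper names are hypothetical.

```python
import getpass
import os
import sys
from typing import Mapping

CREDENTIAL_VARS = ("MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD")


def missing_credentials(env: Mapping[str, str] = os.environ) -> list[str]:
    """Names of the credential environment variables that are unset or empty."""
    return [name for name in CREDENTIAL_VARS if not env.get(name)]


def change_password(tracking_uri: str, username: str, new_password: str) -> None:
    """Set a new password for username via the basic-auth app's Python client."""
    # Imported lazily so the rest of the script works without MLflow installed.
    from mlflow.server import get_app_client

    # The client authenticates with the credentials found in CREDENTIAL_VARS.
    auth_client = get_app_client("basic-auth", tracking_uri=tracking_uri)
    auth_client.update_user_password(username=username, password=new_password)


def main(argv: list[str]) -> int:
    """Entry point, e.g. main(sys.argv[1:]) with TRACKING_URI USERNAME arguments."""
    if len(argv) != 2:
        print("usage: TRACKING_URI USERNAME", file=sys.stderr)
        return 2
    missing = missing_credentials()
    if missing:
        print("set " + ", ".join(missing) + " first", file=sys.stderr)
        return 1
    tracking_uri, username = argv
    # Prompting keeps the new password out of shell history and process listings.
    new_password = getpass.getpass(f"New password for {username}: ")
    change_password(tracking_uri, username, new_password)
    return 0
```

Saved as a script that ends with raise SystemExit(main(sys.argv[1:])), it would be run with the server URL and the username as its two arguments.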

#mlflow

For anyone curious: 'Invalid Host header' means that the '--allowed-hosts' flag has to be set to the actual web address through which MLflow will be accessed. That works now.
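Because both --allowed-hosts and --cors-allowed-origins have to match the address people actually type into the browser, one option is to derive the whole server command from that single URL. A sketch under stated assumptions: both flags accept comma-separated lists, the server listens on all interfaces of the VM, and the basic-auth app is enabled. The function name and example hosts are hypothetical.

```python
from urllib.parse import urlsplit


def mlflow_server_argv(public_url: str, port: int = 5000) -> list[str]:
    """Build an 'mlflow server' command whose host and origin flags match public_url."""
    parts = urlsplit(public_url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"expected a full URL such as https://host, got {public_url!r}")
    # Host header values to accept: the bare name, plus host:port if the URL has a port.
    allowed_hosts = [parts.hostname]
    if parts.port is not None:
        allowed_hosts.append(f"{parts.hostname}:{parts.port}")
    # A browser origin is scheme://host[:port] without any path.
    origin = f"{parts.scheme}://{parts.netloc}"
    return [
        "mlflow", "server",
        "--host", "0.0.0.0",  # listen on all interfaces of the VM (assumption)
        "--port", str(port),
        "--app-name", "basic-auth",  # username/password login (assumption)
        "--allowed-hosts", ",".join(allowed_hosts),
        "--cors-allowed-origins", origin,
    ]
```

Printing shlex.join(mlflow_server_argv("https://mlflow.example.org")) gives a command to paste into a shell or service file, while subprocess.run(...) on the list would start the server directly. Behind a reverse proxy, binding to 127.0.0.1 instead of 0.0.0.0 would probably be the safer choice.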

The next step is checking that the Python API also works, and then figuring out access control(s).
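One way to check the Python API path end to end is a throwaway run that logs a random marker and reads it back through MlflowClient. This is a sketch: the experiment name "connectivity-check" and the helper names are hypothetical, and the tracking URI is whatever address the browser already uses.

```python
import uuid
from typing import Mapping


def values_round_tripped(params: Mapping[str, str], metrics: Mapping[str, float], marker: str) -> bool:
    """True if the fetched run contains exactly what the smoke test logged."""
    return params.get("marker") == marker and metrics.get("ping") == 1.0


def smoke_test_tracking(tracking_uri: str, experiment: str = "connectivity-check") -> str:
    """Log one param and one metric, read them back, and return the run ID."""
    # Imported lazily so values_round_tripped can be used without MLflow installed.
    import mlflow
    from mlflow.tracking import MlflowClient

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment)  # created on first use if it does not exist
    marker = uuid.uuid4().hex
    with mlflow.start_run() as run:
        mlflow.log_param("marker", marker)
        mlflow.log_metric("ping", 1.0)
    fetched = MlflowClient(tracking_uri=tracking_uri).get_run(run.info.run_id)
    if not values_round_tripped(fetched.data.params, fetched.data.metrics, marker):
        raise RuntimeError("run was created, but the logged values did not round-trip")
    return run.info.run_id
```

Reading the run back, rather than only writing it, catches the case where logging calls succeed against the wrong server or experiment.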

Given my zero web development knowledge, I am feeling very technical.

#mlflow

The next hurdle was actually connecting to the machine over the internet. There are two access paths that need to work:

- through a browser, so we can inspect our results.
- through the Python API, so we can actually log our results.
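For the browser path, a plain HTTP request from a machine outside the VM answers the first question: does anything respond at that address? The sketch below uses only the standard library and assumes the tracking server exposes a /health route. Any HTTP status, including an error response such as an 'Invalid Host header' rejection, presumably means the network path works and the remaining problem is server configuration. The helper names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional


def health_url(base_url: str) -> str:
    """Append the /health path to the server's public base URL."""
    return base_url.rstrip("/") + "/health"


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """HTTP status code for GET url, or None if no HTTP server answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A non-2xx reply (400, 401, 404, ...) still proves the server is reachable.
        return err.code
    except OSError:
        # URLError (DNS failure, connection refused) and timeouts both land here.
        return None
```

Distinguishing "no answer" (None) from "an error answer" (a status code) separates firewall or DNS problems, which are IT's side, from flag settings on the MLflow side.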

This morning I figured out how to get the browser thing to work. Or, more precisely, a friendly IT guy told me what my error message meant, so I could figure out what to change in the setup.

#mlflow

I thought I would start a thread documenting my exploration of #mlflow as a #neptune replacement to track machine learning experiments and results in my group.

First results:
I got the university to set up a virtual machine (which was, disappointingly, not free) on which to run the service. That took a lot of back-and-forth with IT until a few PhD students and I could actually log in. But we got there.

#selfhosting

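Testing AI Agent Safety with MLflow: A GPT vs Gemini Red-Team Experiment

A three-model red-teaming framework that uses MLflow to evaluate AI agent safety systematically. The post presents the results of a GPT vs Gemini experiment and how to apply the approach in practice.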

https://aisparkup.com/posts/7821

AI Skills 2025: LangChain, RAG & MLOps—The Complete Guide

Comprehensive guide to the three critical AI competencies reshaping hiring in 2025: LangChain for orchestration, RAG for knowledge grounding, and MLOps for production deployment.

TechLife

Benchmark Driven Development: Why We Stopped Trusting Other People's Benchmarks

In this article, we explain how we arrived at an approach we call Benchmark Driven Development (BDD) internally: development driven by benchmarks on our own data. (Yes, we know BDD also stands for Behavior Driven Development; we have our own expansion here 🙂)

https://habr.com/ru/articles/975188/

#ML #mlflow #datascience #benchmark #ocr

New LLMs, OCR systems, multimodal models, and agents appear every day. The news is all headlines: "Model X beat every benchmark". Management wants "the newest and most cutting-edge", the team wants "the most...

Habr
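The core of benchmarking on your own data can be sketched in a few lines: score every candidate on the same in-house labelled examples and log one MLflow run per candidate so the results sit side by side. Exact-match accuracy is a placeholder assumption (OCR work would more likely use character error rate), the function names are hypothetical, and this is not the method from the linked article.

```python
from typing import Callable, Mapping, Sequence

Example = tuple[str, str]  # (input, expected output) from our own data
Predictor = Callable[[str], str]


def exact_match(predict: Predictor, examples: Sequence[Example]) -> float:
    """Fraction of examples where the candidate's output equals the label."""
    if not examples:
        raise ValueError("need at least one labelled example")
    hits = sum(predict(x) == y for x, y in examples)
    return hits / len(examples)


def rank_candidates(
    candidates: Mapping[str, Predictor],
    examples: Sequence[Example],
    log_to_mlflow: bool = False,
) -> list[tuple[str, float]]:
    """Score every candidate on the same examples and return them best first."""
    scores = []
    for name, predict in candidates.items():
        score = exact_match(predict, examples)
        if log_to_mlflow:
            import mlflow  # lazy import: scoring itself does not need MLflow

            with mlflow.start_run(run_name=name):
                mlflow.log_param("candidate", name)
                mlflow.log_metric("exact_match", score)
        scores.append((name, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Calling mlflow.set_experiment first and passing log_to_mlflow=True gives each candidate its own run, which the MLflow UI can then compare directly.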

🚀 New blog post: Choosing the Right LLM with MLflow

How do you systematically evaluate which open-source LLM performs best for your use case?

I wrote a hands-on guide covering:
✅ Comparing multiple models (Llama, Mistral, DeepSeek)
✅ Docker Compose infrastructure
✅ Complete Python & bash code

The evaluation pipeline works unchanged from local → staging → production.

https://www.lotharschulz.info/2025/12/08/choosing-the-right-llm-systematic-model-evaluation-with-mlflow/

#MLOps #LLM #MLflow #Python #DevOps #MachineLearning #OpenSource #AI
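One common way to make the same evaluation code run unchanged from local to staging to production is to keep the server address out of the code, since MLflow reads the MLFLOW_TRACKING_URI environment variable on its own. The sketch adds a fallback table keyed by a hypothetical DEPLOY_ENV variable with placeholder URLs. It illustrates the idea and is not necessarily how the linked guide does it.

```python
import os
from typing import Mapping, Optional

# Placeholder addresses per stage; real values would come from deployment config.
DEFAULT_URIS = {
    "local": "http://localhost:5000",
    "staging": "https://mlflow-staging.example.org",
    "production": "https://mlflow.example.org",
}


def resolve_tracking_uri(env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer an explicit MLFLOW_TRACKING_URI, else look up DEPLOY_ENV (default 'local')."""
    env = os.environ if env is None else env
    explicit = env.get("MLFLOW_TRACKING_URI")
    if explicit:
        return explicit
    stage = env.get("DEPLOY_ENV", "local")
    if stage not in DEFAULT_URIS:
        raise ValueError(f"unknown DEPLOY_ENV {stage!r}; expected one of {sorted(DEFAULT_URIS)}")
    return DEFAULT_URIS[stage]


def configure_mlflow(env: Optional[Mapping[str, str]] = None) -> str:
    """Point the MLflow client at the resolved server and return the URI used."""
    import mlflow  # lazy import so resolve_tracking_uri is testable without MLflow

    uri = resolve_tracking_uri(env)
    mlflow.set_tracking_uri(uri)
    return uri
```

With Docker Compose, only the environment section would then differ between stages, and the evaluation code itself stays identical.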