@strickvl

Machine Learning Engineer, researcher (& author of a few books in my old life as a historian).
Love learning languages (machine and human), cats and sharks. Studying Mathematics @ the Open University. Budding J enthusiast.
LinkedIn: https://linkedin.com/in/strickvl
GitHub: https://github.com/strickvl
Blog: https://mlops.systems/
Wrote my first RL environment this evening. A very simple one, mind, but 'verifiers' (by @PrimeIntellect and @willccbb) makes it very easy to slot in the pieces.
I'm now transitioning from the part of my agentic RL exploration where I learned the high-level concepts to seeing what people are doing in practice. Part of that has meant navigating the world of RL environment/training frameworks, which I have to say is slightly overwhelming! (New domain vocabulary, new players, an explosion of projects...)

For example, GLM 5.2 came out yesterday and they explicitly highlight how they needed PPO for long-horizon agentic RL work. (https://z.ai/blog/glm-5.2)

Blog even includes a little video of me talking through my understanding of GRPO... (with obviously the huge caveat that I'm early days in my learning etc etc!)

https://alexstrick.com/posts/2026-06-18-grpo-explained.html

Whenever a frontier lab drops a new model you always see their employees posting things like "you’ll be surprised by how good we made our new model! throw your hardest problems at it". Today (while trying to build up an intuition for how GRPO works) I think I realised that there are actually two things going on there:

1. "Our model is better than you think / give it credit for"
2. "We need hard tasks and examples to train on so we can make the next version of our model even better."
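
A rough sketch of why point 2 falls out of the algorithm, assuming the commonly described GRPO setup where each sampled answer's reward is normalised against the other answers sampled for the same prompt. The function name, the `eps` value and the use of the population standard deviation are my own illustrative choices, not taken from any particular library. If every sample in the group gets the same reward (task too easy or too hard), the advantages are all zero and there is nothing to learn from:

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Score each reward relative to its own group (hypothetical helper).

    Assumption: advantage = (reward - group mean) / (group std + eps).
    Some GRPO variants drop the division by the standard deviation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers per prompt, reward 1.0 = correct, 0.0 = wrong.
too_easy = [1.0, 1.0, 1.0, 1.0]    # model already solves it every time
too_hard = [0.0, 0.0, 0.0, 0.0]    # model never solves it
just_right = [1.0, 0.0, 0.0, 1.0]  # model solves it some of the time

easy_adv = group_relative_advantages(too_easy)
hard_adv = group_relative_advantages(too_hard)
mixed_adv = group_relative_advantages(just_right)

# Identical rewards give zero advantage everywhere: no training signal.
print(easy_adv)
print(hard_adv)
# Mixed rewards push up the correct samples and push down the wrong ones.
print(mixed_adv)
```

So the useful prompts are the ones the current model gets right only some of the time, and once a model improves, yesterday's hard tasks stop producing signal.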

In the GIF I recorded below, you can see we have a whole bunch of playgrounds designed to help me understand how GRPO works. An important starting point for me is to get a rough mental model for how the algorithm works and these widgets help a LOT.

Published a new post on our Kitaru adapter for Claude Agent SDK.

Claude owns the agent loop. Kitaru records the completed invocation as durable workflow state: result, artifacts, waits, and replay boundary.

One completed invocation = one checkpoint.

OpenAI Agents SDK is a great harness. When you move your agent to production, you're probably going to need and want more. That's where Kitaru comes in... We built an adapter so you can keep your OpenAI Agents SDK code, but throw in some durability and other goodies on top.

I wrote about how Kitaru wraps it without changing what the agent does underneath. So you get to keep all your durable approval waits, replay boundaries, inspectable execution history, etc.

Just made a bumper release this evening: 25 new format adapters covering the major cloud annotation platforms, autonomous-driving and aerial datasets, document layout, synthetic data, and the long tail of academic/community formats. Panlabel now reads and writes 40+ object detection annotation formats, which I think covers almost all the options!

I think I'll move on to a new domain / format now. Either segmentation or maybe I'll dip my toes into text datasets / formats!
https://github.com/strickvl/panlabel/releases/tag/v0.7.0

We just shipped migration skills that help you try out ZenML from 11 ML/data platforms: Airflow, Argo, AzureML, Dagster, Databricks, Flyte, Kedro, Metaflow, Prefect, SageMaker, Vertex AI.

Each skill has hand-curated concept maps showing what maps 1:1, what's approximate, and what needs redesign. Plus ZenML best practices baked in and a conservative approach that flags uncertainty rather than guessing.

#MLOps #AgenticCoding #MachineLearning #OpenSource