
GitLab is rebuilding around GenAI. Social media is loud about that bet. The part most people ignore is the restructuring itself, and it's mostly sound:

- Smaller teams with end-to-end ownership
- Fewer management layers
- Fewer countries

This is what Agile coaches have urged for years.

The AI mandate, "daily use by every individual," is a KPI searching for a purpose. Set it aside and the org changes deserve credit.

https://about.gitlab.com/blog/gitlab-act-2/

#AIinSoftwareDevelopment #BuildInPublic

GitLab Act 2

A letter to our customers and our investors.

about.gitlab.com

Mark Levison 5d ago

GenAI vendors brag about benchmark scores. A new paper, "Potemkin Understanding in LLMs," explains why they're hollow.

Tests built for humans assume the test-taker passes or fails like a human. LLMs break that. A model can define a haiku, then write one that doesn't fit, then misclassify its own output.

When a vendor waves a benchmark, ask what it can do on YOUR work, on problems it hasn't seen.

https://arxiv.org/html/2506.21521v2

#AIinSoftwareDevelopment #BuildInPublic

Potemkin Understanding in Large Language Models

Mark Levison Apr 24

GenAI prototypes: built in minutes, not weeks. Game changer?

But at #ProductCamp Ottawa 2026, an informal poll showed few people test hypotheses with real users before shipping AI code.

Even cheap building wastes time on wrong features. Use GenAI for faster Fake Door and Wizard of Oz tests. Vet ideas upstream to avoid feature bloat.

What’s your next experiment?

https://agilepainrelief.com/blog/product-management-and-genai/?utm_source=mastodon&utm_campaign=archive-reshare

#AIinSoftwareDevelopment #BuildInPublic #ProductOwner

Mark Levison Feb 24

Brian Graham calls it the Knowledge Cliff: automating away understanding you need.

GPS replaced taxi drivers’ mental maps. NASA lost rocket expertise to retirement. In software, every LLM decision erodes skill while giving false confidence.

Before automating, ask: Can I evaluate this? Is this core to my craft?

Automate small tasks. Protect ones that build judgment.

https://kb.buildingbetterteams.de/docs/AI-Augmentation/antipattern-knowledge-cliff/

#AIinSoftwareDevelopment #BuildInPublic

Antipattern: Knowledge Cliff | Knowledge Base

A framework for understanding how AI tools and automation can lead organizations and individuals off a knowledge cliff, creating false confidence while eroding critical decision-making skills.

Mark Levison Feb 20

AI agents promised faster delivery. They also delivered 39% more Cognitive Complexity and 30% more static analysis warnings.

Research found a reinforcing cycle: AI generates more code → complexity rises → debt accumulates → velocity drops → teams generate even more AI code to compensate. Initial speed gains? Gone within months.

This isn’t a tooling problem; it’s a systems problem. More code was never the goal.

https://agilepainrelief.com/blog/genai-code-quality-fundamental-flaws-and-how-bluffing-makes-it-worse/

#AIinSoftwareDevelopment #BuildInPublic #TechnicalDebt

Mark Levison Feb 19

GenAI is making it easier to switch tool providers. I’m frustrated with our email marketing platform: most features are GUI-only, no API access.

I’ve written code to extract our entire newsletter archive + stats. When the time comes, exporting will be painless.

The irony: our provider keeps adding AI features I don’t need instead of making their product easier to use.

We don’t need more shiny AI features. Ease of use matters more.

#AIinSoftwareDevelopment #BuildInPublic #ProductOwner

Mark Levison Feb 19

AI models are trained to bluff.

Pass/fail training offers no reward for admitting uncertainty. Instead of "I don't know," you get a confident wrong answer. Researchers call it "test-taking mode."

They learned from the internet's code. No grasp of architecture or design patterns. Just next-token prediction dressed up as engineering.

And newer models aren't always better. Some now silently remove safety checks.

https://agilepainrelief.com/blog/genai-code-quality-fundamental-flaws-and-how-bluffing-makes-it-worse/

#AIinSoftwareDevelopment #BuildInPublic

Mark Levison Feb 18

"I tested it and it works great." The most dangerous sentence in AI usage.

Test an LLM a few times, it mostly works, you assume it works. But a failure might show up only once in 20 tries. You'll never run it enough to know.

Multiply across a team: six people, ten AI tasks per week. A small error rate means several unnoticed mistakes weekly.
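
A rough sketch of that arithmetic, assuming each of the six people runs ten AI tasks a week and that the 1-in-20 failure rate applies independently to every task. Both assumptions, and the function names, are illustrative and not from the post.

```python
# Assumptions (not stated in the post): ten tasks per person per week,
# and every task fails independently at the same 1-in-20 rate.
FAILURE_RATE = 1 / 20
TEAM_SIZE = 6
TASKS_PER_PERSON_PER_WEEK = 10


def chance_all_trials_pass(trials: int, failure_rate: float = FAILURE_RATE) -> float:
    """Probability that `trials` independent test runs all succeed."""
    return (1 - failure_rate) ** trials


def expected_weekly_failures(
    team_size: int = TEAM_SIZE,
    tasks_per_person: int = TASKS_PER_PERSON_PER_WEEK,
    failure_rate: float = FAILURE_RATE,
) -> float:
    """Average number of failed tasks per week across the whole team."""
    return team_size * tasks_per_person * failure_rate


if __name__ == "__main__":
    # How often does a quick manual check look clean despite the failure rate?
    for trials in (3, 5, 10):
        print(f"{trials} clean runs in a row: {chance_all_trials_pass(trials):.0%}")
    print(f"Expected failures per week: {expected_weekly_failures():.1f}")
```

Under those assumptions, five test runs in a row come back clean about 77% of the time, while the team still averages about three wrong outputs every week.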

Catching them requires what we're short on: domain expertise, critical thinking, time. Never assume the output is correct.

#AIinSoftwareDevelopment #BuildInPublic

Mark Levison Feb 17

Heavy GenAI users now prefer it over teammates -- it's "more empathetic and non-judgmental."

That should worry us. When people turn to AI instead of colleagues, team cohesion weakens. You don't have a team, just people in the same virtual space.

The fix isn't less AI. It's intentional collaboration: pairing, ensemble work, team autonomy.

https://agilepainrelief.com/blog/the-human-cost-of-genai/?utm_source=mastodon&utm_campaign=archive-reshare

#AIinSoftwareDevelopment #BuildInPublic


Mark Levison Feb 12

The result? Code that looks correct but quietly introduces bugs that surface in production at 2 am.

Irony: LLMs helped me research this article across 15 sources. They're genuinely good at that. They're just not good at writing code that won't bite you later. The tool matters less than how you use it.

https://agilepainrelief.com/blog/genai-code-quality-fundamental-flaws-and-how-bluffing-makes-it-worse/

#AI #AIinSoftwareDevelopment

GenAI Code Quality – The Fundamental Flaws and How Bluffing Makes It Worse

AI-generated code has 1.7x more issues and the flaws are structural, not fixable by code review. Why training rewards bluffing over quality, and what to do about it

Agile Pain Relief