I've been working on a new Python tool: labeille. Its main purpose is to look for CPython JIT crashes by running real-world test suites.

https://github.com/devdanzin/labeille

But it's grown a feature that might interest more people: benchmarking using PyPI packages.

How does that work?

labeille allows you to run test suites in 2 different configurations. Say, with coverage on and off, or memray on and off. Here's an example:

https://gist.github.com/devdanzin/63528343df98779b5fedf657bf8286cd

#Python #labeille #fuzzing #JIT #PyPI #benchmarking

danzin Feb 28

labeille runs test suites from popular PyPI packages against a JIT-enabled CPython build and catches crashes: segfaults, assertion failures, etc.

If requests, flask, attrs, etc. all pass their tests under the JIT, that's evidence the JIT is handling real-world code correctly. If one crashes, there's a bug with a reproducer. We've found one crash so far: https://github.com/python/cpython/issues/145197

This requires curating a local package registry with repo URLs, install and test commands, etc.
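
To make "catches crashes" concrete, here is a minimal sketch of how a runner could tell a crash apart from an ordinary test failure. It is not labeille's actual code: the function name `classify_run`, the status labels, and the default timeout are assumptions for illustration. It relies on the POSIX convention that `subprocess` reports a negative return code when the child was killed by a signal (SIGSEGV for a segfault, SIGABRT for a failed C assertion).

```python
import signal
import subprocess
import sys


def classify_run(command, timeout=600):
    """Run a test command and return (status, signal_name).

    Hypothetical helper: status is PASS, FAIL, CRASH, or TIMEOUT.
    """
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except subprocess.TimeoutExpired:
        return "TIMEOUT", None
    if proc.returncode == 0:
        return "PASS", None
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return "CRASH", signal.Signals(-proc.returncode).name
    # Any other non-zero exit is treated as an ordinary test failure.
    return "FAIL", None


if __name__ == "__main__":
    # os.abort() raises SIGABRT, which is what a failed C assertion does.
    print(classify_run([sys.executable, "-c", "import os; os.abort()"]))
```

In a real run, `command` would be the package's test command executed with the JIT-enabled interpreter inside that package's venv.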

#Python #CPython #JIT #fuzzing #labeille #testing

JIT: segfault from invalid frame in `_PyFrame_GetFunction` · Issue #145197 · python/cpython

Crash report What happened? It's possible to segfault a patched JIT build by running ipython's test_completer.py with pytest: pytest tests/test_completer.py Necessary patch diff --git a/Include/int...

GitHub

danzin

labeille has a bisect command that binary-searches through a package's git history to find the commit that triggers a JIT crash:

labeille bisect requests --good=v2.30.0 --bad=HEAD --target-python /path/to/cpython-jit

https://github.com/devdanzin/labeille#bisecting-crashes

Commits that won't build get skipped automatically (like git bisect skip), revisions get a fresh venv so dependency versions don't leak, and you can filter by crash signature when a package has distinct crashes.
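
The search itself can be sketched as a binary search that tolerates untestable commits. This is an illustration under assumptions, not labeille's implementation: `bisect_commits` and the `check` callback (returning "good", "bad", or "skip") are hypothetical names, and the commit list is assumed to be ordered oldest to newest with a known-good first entry and a known-bad last entry.

```python
def bisect_commits(commits, check):
    """Find the first bad commit, skipping commits that cannot be tested.

    Returns (first_bad, unresolved), where unresolved lists skipped commits
    right before first_bad that could not be ruled out.
    """
    lo, hi = 0, len(commits) - 1  # commits[lo] is good, commits[hi] is bad
    skipped = set()
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Try commits strictly between lo and hi, nearest the midpoint first.
        candidates = sorted(range(lo + 1, hi), key=lambda i: abs(i - mid))
        for i in candidates:
            if i in skipped:
                continue
            verdict = check(commits[i])
            if verdict == "skip":
                skipped.add(i)
                continue
            if verdict == "bad":
                hi = i
            else:
                lo = i
            break
        else:
            # Everything left between lo and hi is untestable; stop here.
            break
    return commits[hi], [commits[i] for i in range(lo + 1, hi)]
```

In this sketch, `check` would build a fresh venv for the revision, run the tests under the target Python, and return "skip" when the commit won't build.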

#Python #CPython #JIT #debugging #labeille

danzin Feb 28

labeille can compare 2 test runs and show what changed and why it changed.

When a package goes from PASS to CRASH, labeille checks which commit of the package's repo each run tested. If the commit is the same, it's likely a CPython/JIT regression. Otherwise, the package itself might be the cause:

requests: PASS → CRASH
Repo: abc1234 → abc1234 (unchanged — likely a CPython/JIT regression)

flask: CRASH → PASS
Repo: 222bbbb → 333cccc (changed)

This lets you reach conclusions like "3 of these are JIT regressions".
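
The attribution rule above fits in a few lines. The sketch below is an assumption about the shape of the logic, not labeille's code: `explain_change`, the `status` and `commit` keys, and the returned labels are all hypothetical.

```python
def explain_change(old, new):
    """Attribute a status change between two runs of the same package.

    old and new are dicts with hypothetical keys "status" and "commit".
    """
    if old["status"] == new["status"]:
        return "unchanged"
    repo_changed = old["commit"] != new["commit"]
    if old["status"] == "PASS" and new["status"] == "CRASH":
        if repo_changed:
            # The package's code moved, so the package might be the cause.
            return "possibly the package"
        # Same package code, different result: suspect the interpreter.
        return "likely CPython/JIT regression"
    return "repo changed" if repo_changed else "repo unchanged"


requests_change = explain_change(
    {"status": "PASS", "commit": "abc1234"},
    {"status": "CRASH", "commit": "abc1234"},
)
flask_change = explain_change(
    {"status": "CRASH", "commit": "222bbbb"},
    {"status": "PASS", "commit": "333cccc"},
)
```

Counting the packages labeled as likely regressions gives the kind of summary quoted above.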

#Python #CPython #JIT #labeille #testing

danzin Feb 28

I built labeille to find CPython JIT crashes, but it's really a "run real-world test suites at scale" platform.

It also works for:
— Checking which packages pass their tests on a new CPython version
— Testing free-threaded (no-GIL) CPython compatibility
— Measuring coverage.py or memray overhead across hundreds of packages
— Comparing CPython vs PyPy performance on real code

The registry of 350+ packages with install/test commands is the core.

#Python #CPython #PyPI #testing #benchmarking #labeille

danzin Feb 28

The most important and tedious part of labeille is the registry.

So far it has 350+ PyPI packages, each with a repo URL, install and test commands, and metadata about whether it has C extensions, which Python versions to skip, and whether it needs xdist disabled.

"Just run pytest" doesn't work for all packages. Some need specific test markers or editable installs. Some have tests that might hang. Some need extra dependencies that aren't in their dev requirements.
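
As an illustration of what one registry entry has to carry, here is a sketch using a dataclass. The field names, the helper functions, and the example commands are assumptions and do not reflect labeille's real schema. The only external fact relied on is that pytest accepts `-p no:xdist` to disable the xdist plugin, which assumes the test command is pytest-based.

```python
from dataclasses import dataclass, field


@dataclass
class RegistryEntry:
    """Hypothetical shape of a per-package registry record."""

    name: str
    repo_url: str
    install_command: str
    test_command: str
    has_c_extensions: bool = False
    skip_python_versions: list[str] = field(default_factory=list)
    disable_xdist: bool = False
    timeout_seconds: int = 600  # guards against tests that might hang


def should_run(entry, python_version):
    """Return False when the entry asks to skip this Python version."""
    return python_version not in entry.skip_python_versions


def effective_test_command(entry):
    """Append pytest's plugin-disabling flag when xdist must be off."""
    if entry.disable_xdist:
        return f"{entry.test_command} -p no:xdist"
    return entry.test_command


# Example values are illustrative, not copied from the real registry.
example = RegistryEntry(
    name="requests",
    repo_url="https://github.com/psf/requests",
    install_command="pip install -e .",
    test_command="python -m pytest tests",
    disable_xdist=True,
    skip_python_versions=["3.9"],
)
```

Keeping these quirks as data rather than code is what lets the same runner handle hundreds of packages.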

#Python #PyPI #testing #labeille