Nothing like debugging heap graph traversal code at the airport with a cocktail & steak
Nothing like debugging heap graph traversal code at the airport with a cocktail & steak
Uuuugh so this was a bad idea.. because now I thought deeply about the existing code and realized I probably made a dumb mistake, and I think I found an edge case that shows I never really computed dominators right.
So, let's go back to basics, why do we care about dominators in a jvm heap anyway?
I learned about dominators from reading the Android Studio code, and how it computes retained size.
An easy def is that a dominator D of a node A is a node that if removed would make A unreachable.
Hence why if you sum up the size of all the nodes that a dominator dominates, you get the retained size.
The immediate dominator is the closest dominator to the node.
I had this algorithm in my head for progressively building up dominators: do a graph traversal, first time you find a node set it's parent as dominator, then every time you find that node again you go up the dominator chain you've already built, on both the previous parent side and the new parent side, until you find a common node, that's the new dominator.
I need to re read the AS code to see if it's doing something similar or not.
Anyway, the problem is, this doesn't account for running into the original parent node from yet another spot after you've done that. At which point you're not updating the dominated node because you're not exploring it, just its parent.
I wrote a failing unit test against the dominator data structure and confirmed it's broken.
So now I need to craft a fake heap dump and confirm I compute retained size wrong.
Then check Android Studio & Yourkit, see if they got different results.
So, I made a real life example, ran the code then dumped the heap. The results are interesting.
Note "AppWatcher.objectWatcher.watch(guide)" which tells LeakCanary that HitchHiker.guide is pretend leaking and to dump the heap.
Much to my surprise, LeakCanary is computing the correct result this time: "Everything" only retains but not "Universe", nor the answer "42". That's not what I was getting with my unit test, so I need to figure out what's different.
References are still 4 bytes here, but on an Android VM every instance has an additional shadow$_klass_ ref field and a shadow$_monitor_ int field.
Android Studio and LeakCanary seem in sync on retained size.
LeakCanary now does incorrectly show 3 objects leaks / 40 bytes, when it should be 2 objects (Everything + Universe) and 28 bytes.
YourKit is also correct in its dominator computation.
So, at this point, I've confirmed again & again that LeakCanary's approach is broken, and now I need to figure out a fix.. Unfortunately, a quick search shows standard dominators is expensive to compute (simple algo is quadratic!).
Here's where I first documented what I intended to do: https://github.com/square/leakcanary/issues/1344 and the PR: https://github.com/square/leakcanary/pull/1348
Before that, I was using the perflib impl: https://cs.android.com/android-studio/platform/tools/base/+/mirror-goog-studio-main:perflib/src/main/java/com/android/tools/perflib/heap/analysis/LinkEvalDominators.kt;l=36;drc=499fa43009666c0f0a686d8e21722dbea8b2ecf0
Looking through cs.android.com, I've also found a ahat impl: https://cs.android.com/android/platform/superproject/main/+/main:art/tools/ahat/src/main/com/android/ahat/dominators/Dominators.java;l=281;drc=f9ccccd1dd3e1a3e24f1f2238281555658090619
And some CPP impls: https://cs.android.com/android/platform/superproject/main/+/main:external/deqp-deps/SPIRV-Tools/source/opt/dominator_tree.cpp;l=23-38;drc=1a7f71afb42983b8b21bef656260964eb3852942
or