Influence estimation + Tree ensembles + Lots of empirical results = Our new paper in JMLR!

My two favorite results:
1. TracIn is easily adapted to trees, and works great.
2. In some settings, approximate influence estimates are much better than exact!

https://jmlr.org/papers/v24/22-0449.html

#NewPaper #MachineLearning #InfluenceEstimation #GBDT

Adapting and Evaluating Influence-Estimation Methods for Gradient-Boosted Decision Trees

Q: How do we define influence? That is, how do we reduce the impact of a single training example to one number?

A (simple): Leave-one-out (LOO). Remove one example, retrain, and then measure the change in loss. Call that the influence.
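The LOO recipe above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it uses scikit-learn's GradientBoostingRegressor, squared loss, and a single test point, and the function name `loo_influence` is mine.

```python
# Minimal leave-one-out (LOO) influence sketch for a gradient-boosted model.
# Influence of training example i on a test point = change in test loss
# after removing i and retraining from scratch. Illustrative, not the paper's code.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def loo_influence(X_train, y_train, x_test, y_test, seed=0):
    """Squared-loss LOO influence of each training example on (x_test, y_test)."""
    full = GradientBoostingRegressor(random_state=seed).fit(X_train, y_train)
    base_loss = (full.predict(x_test[None])[0] - y_test) ** 2

    influences = np.zeros(len(X_train))
    for i in range(len(X_train)):
        mask = np.arange(len(X_train)) != i  # drop example i
        model = GradientBoostingRegressor(random_state=seed).fit(
            X_train[mask], y_train[mask])
        loss = (model.predict(x_test[None])[0] - y_test) ** 2
        influences[i] = loss - base_loss  # positive => removing i increases loss
    return influences
```

One retrain per training example is exactly why LOO is slow: n examples means n+1 full model fits.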

But LOO is *slow*, so there are a bunch of approximate methods for influence estimation.

Some only work on neural nets, so we adapt them for gradient-boosted trees.

Surprisingly, approximate methods beat exact LOO if you remove a GROUP of examples!

Why is LOO so bad, when it’s exact?

Because LOO bakes in exactly how the tree structure changes when you remove one example. But once you've removed a few examples, the structure has already changed, so those estimates no longer apply.

Ironically, it’s better to assume a fixed structure! It’s less wrong than thinking you know how the structure will change.
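Here's the fixed-structure idea on a single regression tree, as an illustration (my own sketch, not the paper's method): keep the learned splits, and just recompute the leaf value as the mean of the remaining targets after dropping one example.

```python
# Fixed-structure sketch: the tree's splits stay frozen; only the leaf value
# that the test point lands in is recomputed without the dropped example.
# Assumes a scikit-learn DecisionTreeRegressor fit with squared error,
# whose leaf values are means of the targets in each leaf. Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fixed_structure_prediction(tree, X_train, y_train, x_test, drop_idx):
    """Prediction at x_test after dropping one example, with splits held fixed."""
    leaves = tree.apply(X_train)             # leaf id of each training example
    test_leaf = tree.apply(x_test[None])[0]  # leaf the test point falls into
    in_leaf = (leaves == test_leaf)
    in_leaf[drop_idx] = False                # no-op if drop_idx is in another leaf
    return y_train[in_leaf].mean()           # recomputed leaf value
```

Comparing this prediction to the original tree's gives a cheap influence estimate with no retraining at all.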

Here's a graph from the paper. When you remove examples in the order suggested by their influence estimates, LOO picks the best first example... but its later picks are not so good.
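That removal experiment is easy to sketch: rank training examples by any influence score, remove the top k, retrain, and track test loss. A hedged toy version (function and parameter names are mine, and `influence` is any per-example score):

```python
# Remove-and-retrain evaluation sketch: drop the k examples an influence
# method ranks highest, refit, and record test MSE for each k.
# Names and setup are illustrative, not from the paper.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def removal_curve(X_train, y_train, X_test, y_test, influence,
                  ks=(0, 1, 5, 10), seed=0):
    """Test MSE after removing the k most-influential training examples."""
    order = np.argsort(influence)[::-1]  # most influential first
    losses = []
    for k in ks:
        keep = np.setdiff1d(np.arange(len(X_train)), order[:k])
        model = GradientBoostingRegressor(random_state=seed).fit(
            X_train[keep], y_train[keep])
        losses.append(np.mean((model.predict(X_test) - y_test) ** 2))
    return losses
```

A better influence method should push the loss curve up (or down, depending on the goal) faster as k grows.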

Which method is best? Well, it depends on lots of things, like how you define influence.

But BoostIn, our adaptation of TracIn to boosted trees, does pretty well. It often identifies sets of examples that, when removed, impact the loss more than the sets chosen by other methods.

/END