🧵🗺️ 🦌 💻 A little thread about #MachineLearning education in ecology.

This fall, I am teaching an ML class based on species distributions. Yesterday's activity was a little game where we kept the same data and tweaked the model to see which combination of data preparation and classifier we could use and whether we "liked" the results. It was, essentially, a computer-assisted vibe check.

The data look like this:

The first thing we tried was a PCA followed by a Random Forest because we are so extremely basic.

It was very obvious that this was not going to work, so we started thinking a little about how RF generally works. Maybe the PCA is the problem? Let's replace it with a simple Standardizer.
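For anyone who wants to play the same game at home, here is a rough sketch of the swap. Big caveats: the thread does not say which tooling the class used, so scikit-learn is an assumption here, and the data are a synthetic stand-in generated with `make_classification`, not the species data. The cross-validated accuracy is only a convenient number to print; in class, the judgment was whether we "liked" the results.

```python
# A minimal sketch, NOT the setup used in class.
# Assumptions: scikit-learn as the library, synthetic data instead of
# real species occurrences, and arbitrary hyperparameters.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for presence/absence labels and environmental predictors.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)

# Same classifier in both cases; only the data preparation step changes.
pipelines = {
    "PCA + RF": make_pipeline(
        PCA(n_components=3),
        RandomForestClassifier(n_estimators=50, random_state=0),
    ),
    "Standardizer + RF": make_pipeline(
        StandardScaler(),
        RandomForestClassifier(n_estimators=50, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy for each combination.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in pipelines.items()
}

for name, score in scores.items():
    print(f"{name}: mean CV accuracy = {score:.3f}")
```

Keeping the data and the classifier fixed and changing one step at a time is the whole point of the exercise: any difference you see comes from the preparation step alone.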

@tpoisot For prediction of presence, I have often noticed that RFs tend to overfit, resulting in weird predictions. GAMs seem to work better...
@OMorissette Absolutely. RF will overfit anything you want, especially if there is no tree pruning or too many trees. Ensembles of weak learners are a lot more conservative.