This is quite a neat demonstration of the counter-intuitive dangers of using current ML tech in critical situations.

You'd think that more reflective = more visible, but "visibility" is how the problem looks to humans. For an ML system, more reflective means more unusual, and unusual inputs mean the system doesn't know what to do.
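A toy sketch of that effect (nothing to do with any real perception stack; the single "reflectance" feature, the numbers, and the `novelty` function are all invented for illustration): a system that has only seen typical inputs treats a highly reflective one as far from anything in its training data.

```python
import numpy as np

# Invented training data: "reflectance" values for typical pedestrians.
rng = np.random.default_rng(0)
train = rng.normal(loc=0.3, scale=0.05, size=1000)

def novelty(x, train):
    """Score how unusual x is: distance to the nearest training example."""
    return np.min(np.abs(train - x))

typical = 0.32     # close to what the system was trained on
reflective = 0.95  # high-visibility safety gear, rare in training data

# The reflective pedestrian is far more "novel" to the system,
# even though it is far more visible to a human.
print(novelty(typical, train), novelty(reflective, train))
```

The point of the sketch: nothing about the reflective input is hard in itself; it is simply far from the training distribution, which is exactly where learned systems behave unpredictably.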

This creates a whole host of problems. For example, when fashion changes rapidly, these systems will get worse at recognizing people.

https://usa.streetsblog.org/2025/01/10/alarming-report-shows-that-two-auto-braking-systems-cant-see-people-in-reflective-garb

Alarming Report Shows that Two Auto-Braking Systems Can't See People in Reflective Garb — Streetsblog USA

The safety strips are useless in the eyes of automatic braking systems on two very popular car models.

In short, we are nowhere near ready to do anything like this. It looks ready, because we test in settings that are close to the training data. But we're not testing generalization.

Here's a test. Train your self-driving car on US data, and have it drive safely in Japan (zero-shot). If you can do that, maybe I'll consider getting in one.

@pbloem the current paradigm is "train on more data until nothing is out of distribution", and while that's not "true" generalization, with enough capital this approach can yield wonders, and has.

What we see above is perhaps just the effects of not enough data...

@varavs True, but it also creates the false impression that the product is getting better and that we're "nearly there", when in fact we're not making any real progress.

It's fine for some cases, but not for critical stuff like this.

Ultimately a car is not safe on the road until it can deal safely with something it has never seen before, at least to the extent that people can (e.g. weird clothes or unusual backgrounds should cause zero deviation from normal behavior).

@pbloem this sounds an awful lot like "no true AGI" kind of argument...

I agree that we don't have this kind of generalization yet, but on the other hand, we do have Waymo stats saying they're generally safer than humans where they operate.

Basically, "nowhere near" feels way too strong, given that the Waymo miracle does exist. We have self-driving cars (under some limited conditions), and that's pure magic; it's far closer to self-driving cars everywhere than to self-driving cars nowhere.

@varavs I didn't intend that. I just don't think we have enough generalization yet, and more data doesn't seem to fix the problem; it just hides it. I think we need more innovation in either the architectures or the training setup (probably the latter).

The stats showing good performance in controlled settings are misleading, I think. If the setting is controlled, you can cover it by scaling up the training data, but only because you never encounter the long tail.