Mastodawn

Increasingly thinking about designing a #RecSys benchmark where “number go up” forces you to care about real-world algorithmic issues (e.g. user controls like thumbs and/or explicit category preferences, but not serving latency)

Michael Ekstrand Jun 15

@karlhigley it didn’t do a bunch of what I’m sure you’re thinking about, but you might want to look at the EvalRS challenge from CIKM 2022 or so for incomplete prior art if you haven’t read their report.