By computing the conditional mutual information between these quantities, our metric, CMIP, measures how debiased a model is.
While debiasedness alone is not sufficient (a random click model is debiased), CMIP helps discard biased models and predict out-of-distribution metrics.
That's the idea behind our metric: analyze the correlations between the relevance scores predicted by the logging policy (the reference classmate) and those predicted by the candidate click model. If they are correlated beyond what the true relevance signal explains, the model has failed to debias the logged data.
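To make the intuition concrete, here is a minimal plug-in sketch of estimating conditional mutual information I(A; B | C) by discretizing the scores into bins and counting joint frequencies. The function name, binning scheme, and equal-width buckets are illustrative assumptions, not the paper's actual CMIP implementation:

```python
import numpy as np

def conditional_mutual_information(a, b, c, bins=5):
    """Plug-in estimate of I(A; B | C) from samples.

    a: logging-policy relevance scores, b: candidate click-model
    scores, c: true relevance signal (all 1-D arrays of equal
    length). Each variable is binned into `bins` equal-width
    buckets; the estimate is the CMI of the empirical joint
    distribution over the bins.
    """
    def discretize(x):
        x = np.asarray(x, dtype=float)
        edges = np.linspace(x.min(), x.max(), bins + 1)
        return np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)

    a, b, c = map(discretize, (a, b, c))
    n = len(a)
    # Empirical joint distribution p(a, b, c)
    p_abc = np.zeros((bins, bins, bins))
    for i, j, k in zip(a, b, c):
        p_abc[i, j, k] += 1.0 / n
    p_ac = p_abc.sum(axis=1)       # p(a, c)
    p_bc = p_abc.sum(axis=0)       # p(b, c)
    p_c = p_abc.sum(axis=(0, 1))   # p(c)

    # I(A; B | C) = sum p(a,b,c) * log[ p(c) p(a,b,c) / (p(a,c) p(b,c)) ]
    cmi = 0.0
    for i in range(bins):
        for j in range(bins):
            for k in range(bins):
                if p_abc[i, j, k] > 0:
                    cmi += p_abc[i, j, k] * np.log(
                        p_c[k] * p_abc[i, j, k] / (p_ac[i, k] * p_bc[j, k])
                    )
    return cmi
```

A click model whose scores track the logging policy beyond the true relevance signal yields a high value, flagging residual bias; a well-debiased model yields a value near zero.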
Among the 24 papers performing offline evaluation of RL-based RecSys, 22 used this protocol. But we argue that (1) it is myopic and does not account for long-term outcomes, (2) what it treats as ground truth is suboptimal, and (3) it hides deficiencies of RL agents trained offline.