Title: P1: causal inference A/B learning [2024-10-29 Tue]
bound (where N is the number of arms and T is the number
of time steps).
: O(√(N*T*log(T)))
*regret* is the difference between the maximum possible
reward and the reward actually collected. *optimal* means
algorithms are able to achieve minimal regret when T → ∞.
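To make the regret definition concrete for myself, here is a minimal sketch that runs a bandit policy on Bernoulli arms and adds up the per-step gap between the best arm's mean and the chosen arm's mean. The notes above do not name an algorithm, so UCB1 is my assumption here, and the names `ucb1_regret`, `regret_bound`, `means`, `horizon` and `seed` are made up for illustration. `regret_bound` only evaluates the √(N*T*log(T)) expression, with the constant hidden by the O(·) dropped.

```python
import math
import random


def ucb1_regret(means, horizon, seed=0):
    """Run UCB1 (an assumed policy) on Bernoulli arms; return cumulative pseudo-regret."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best = max(means)       # mean reward of the best arm
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # pull every arm once before using the index
        else:
            # empirical mean plus exploration bonus
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # regret grows by the gap between the best arm and the chosen arm
        regret += best - means[arm]
    return regret


def regret_bound(n_arms, horizon):
    """Evaluate sqrt(N * T * log(T)) with the O(.) constant dropped."""
    return math.sqrt(n_arms * horizon * math.log(horizon))


if __name__ == "__main__":
    arm_means = [0.2, 0.5, 0.8]  # hypothetical arm means
    for T in (100, 1000, 10000):
        print(T, ucb1_regret(arm_means, T), regret_bound(len(arm_means), T))
```

Printing the regret next to the bound expression for growing T is a quick way to eyeball whether regret grows slower than linearly in T.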
I hacked my first remote machine. I created a separate
account and cleared logs. I didn't break any
configuration. Now I am going to spend a day or so to
#dailyreport #abtest #multiarmedbandit #mab