make this hack automatic.
#dailyreport #abtest #multiarmedbandit #mab
Title: P2: P0: causal inference A/B learning [2024-10-29 Tue]
Popular algorithms:
- Upper Confidence Bound (UCB) - deterministic, optimal
- Thompson Sampling - stochastic, optimal
- Epsilon Greedy - stochastic, approximate
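A minimal sketch of the three selection rules above on a Bernoulli bandit (each arm pays 1 with a fixed probability). The names `epsilon_greedy`, `ucb1`, `thompson` and `run` are mine, the UCB variant is the standard UCB1 bonus, and epsilon = 0.1 and the Beta(1, 1) prior are assumptions, not something from these notes.

```python
import math
import random

def epsilon_greedy(counts, sums, t, rng, epsilon=0.1):
    """Stochastic: explore a random arm with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    # Unplayed arms get an infinite mean so each one is tried at least once.
    means = [s / c if c else float("inf") for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=means.__getitem__)

def ucb1(counts, sums, t, rng):
    """Deterministic: pick the arm with the highest mean plus confidence bonus."""
    for arm, c in enumerate(counts):
        if c == 0:
            return arm  # play every arm once before using the bonus
    scores = [s / c + math.sqrt(2 * math.log(t) / c) for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=scores.__getitem__)

def thompson(counts, sums, t, rng):
    """Stochastic: sample each arm's success rate from its Beta posterior."""
    samples = [rng.betavariate(1 + s, 1 + c - s) for s, c in zip(sums, counts)]
    return max(range(len(counts)), key=samples.__getitem__)

def run(policy, probs, steps, seed=0):
    """Play `steps` rounds and return how many times each arm was pulled."""
    rng = random.Random(seed)
    counts = [0] * len(probs)  # pulls per arm
    sums = [0] * len(probs)    # total reward per arm
    for t in range(1, steps + 1):
        arm = policy(counts, sums, t, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

Usage: `run(thompson, [0.2, 0.8], 2000)` returns the pull counts per arm; with a gap this large all three policies should end up pulling the second arm far more often.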
Thompson Sampling and UCB have a worst-case regret upper
bound (where N is the number of arms and T is the number
of time steps):
: O(√(N*T*log(T)))
This matches the general lower bound Ω(√(N*T)) up to the
log factor.
*regret* is the difference between the maximum possible
reward and the reward actually collected. *optimal* means
the algorithm is able to achieve minimal regret as T → ∞.
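A small sketch of the regret definition above, computed from pull counts. It assumes Bernoulli arms with known success probabilities and uses expected rewards (so this is the expected regret, not the regret of one noisy run); `expected_regret` is a hypothetical helper name.

```python
def expected_regret(probs, counts):
    """Expected reward of always pulling the best arm minus the expected
    reward of the pulls actually made (counts[i] pulls of arm i)."""
    steps = sum(counts)
    best = max(probs)
    collected = sum(c * p for c, p in zip(counts, probs))
    return steps * best - collected
```

For example, with `probs = [0.2, 0.8]` and `counts = [100, 900]`, each of the 100 pulls of the worse arm costs 0.8 - 0.2 = 0.6 in expectation, so the regret is about 60. Pulling only the best arm gives zero regret.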
I hacked my first remote machine. I created a separate
account and cleared logs. I didn't break any
configuration. Now I am going to spend a day or so to