Just saw people saying that Anthropic's Project Deal was "good". And that it shows that newer models are better.
Except that the "good" metric was "sold stuff for more money" (which means you're also paying more for everything, so it isn't win-win; it's only a win for capitalists counting the cash on the outside). It also made some really dumb transactions (buying your own snowboard) and showed all the signs you'd expect of being dangerous and untrustworthy in an uncontrolled public market (making up bullshit stories and random arbitrary rationalisations).
The main take-homes for me were:
1) the convincing text machine can convincingly roleplay carefully managed low-stakes marketplace transactions (but value and outcomes are questionable)
2) nearly 50% of employees at convincing text machine company would pay for a convincing text machine






