L: https://paperclipmaximizer.ai/Unaware_Adversaries.pdf
C: https://news.ycombinator.com/item?id=44275737
posted on 2025.06.14 at 07:36:01 (c=1, p=3)
Unofficial Hacker News Bot.
I automatically publish posts that reach the front page of Hacker News.
In case of issues, please contact my maintainer, @sashk.
🇺🇦
Believing you have only one option is dangerous. For instance:
• If you think there's only one job for you, you might tolerate awful working conditions.
• If you believe there's only one path to success, you might ignore opportunities that better suit your strengths or values, or stick with a path that is almost certain to fail.
• If you think one person is your only chance at love, you might endure abuse.
• If you think there is only one promising candidate for you to hire after a round of recr
Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit "accuracy collapse" on planning puzzles beyond certain complexity thresholds. We demonstrate that their findings primarily reflect experimental design limitations rather than fundamental reasoning failures. Our analysis reveals three critical issues: (1) Tower of Hanoi experiments systematically exceed model output token limits at reported failure points, with models explicitly acknowledging these constraints in their outputs; (2) The authors' automated evaluation framework fails to distinguish between reasoning failures and practical constraints, leading to misclassification of model capabilities; (3) Most concerningly, their River Crossing benchmarks include mathematically impossible instances for N > 5 due to insufficient boat capacity, yet models are scored as failures for not solving these unsolvable problems. When we control for these experimental artifacts, by requesting generating functions instead of exhaustive move lists, preliminary experiments across multiple models indicate high accuracy on Tower of Hanoi instances previously reported as complete failures. These findings highlight the importance of careful experimental design when evaluating AI reasoning capabilities.
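To illustrate the distinction the abstract draws between a compact generating function and an exhaustive move list, here is a minimal Python sketch. It is not the authors' code or prompt; the names `hanoi_moves`, `move_count`, and `is_valid_solution`, and the peg labels "A", "B", "C", are assumptions made for illustration. It relies only on the standard fact that the optimal solution for n disks takes 2^n - 1 moves, so a full move list grows exponentially while the function that produces it stays a few lines long.

```python
from typing import Iterable, Iterator, Tuple

Move = Tuple[int, str, str]  # (disk, from_peg, to_peg); hypothetical representation


def hanoi_moves(n: int, src: str = "A", dst: str = "C", aux: str = "B") -> Iterator[Move]:
    """Lazily yield the optimal move sequence for n disks from src to dst."""
    if n == 0:
        return
    # Move the n-1 smaller disks out of the way, onto the auxiliary peg.
    yield from hanoi_moves(n - 1, src, aux, dst)
    # Move the largest disk directly to the destination.
    yield (n, src, dst)
    # Move the n-1 smaller disks from the auxiliary peg onto the largest disk.
    yield from hanoi_moves(n - 1, aux, dst, src)


def move_count(n: int) -> int:
    """Length of the optimal solution: 2^n - 1 moves."""
    return 2 ** n - 1


def is_valid_solution(n: int, moves: Iterable[Move]) -> bool:
    """Replay the moves and check every step is legal and all disks end on peg C."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    for disk, src, dst in moves:
        if not pegs[src] or pegs[src][-1] != disk:
            return False  # the named disk is not on top of the source peg
        if pegs[dst] and pegs[dst][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[dst].append(pegs[src].pop())
    return not pegs["A"] and not pegs["B"] and pegs["C"] == list(range(n, 0, -1))


if __name__ == "__main__":
    for n in (5, 10, 15, 20):
        print(f"n={n:2d}: {move_count(n):>9,d} moves")
```

The generator is the same size for every n, whereas printing its full output requires one line per move. Checking a solution by replaying it, as `is_valid_solution` does, also separates "the procedure is wrong" from "the output was cut off", which is the kind of distinction the abstract says the original evaluation did not make.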
Kubernetes Slack will lose its special status and will be changing into a standard free Slack on June 20. Sometime later this year, our community will likely move to a new platform. If you are responsible for a channel or private channel, or a member of a User Group, you will need to take some actions as soon as you can. For the last decade, Slack has supported our project with a free customized enterprise account.