Evaluating Chain-of-Thought Monitorability
https://openai.com/index/evaluating-chain-of-thought-monitorability/
#HackerNews #Evaluating #Chain-of-Thought #Monitorability #AI #Research #MachineLearning #OpenAI #ThoughtProcess #Insights
Evaluating Argon2 Adoption and Effectiveness in Real-World Software
https://arxiv.org/abs/2504.17121
#HackerNews #Evaluating #Argon2 #Adoption #Effectiveness #RealWorldSoftware #Cybersecurity #Cryptography
Modern password hashing remains a critical defense against credential cracking, yet the transition from theoretically secure algorithms to robust real-world implementations is fraught with challenges. This paper presents a dual analysis of Argon2, the Password Hashing Competition winner, combining attack simulations that quantify how parameter configurations impact guessing costs under realistic budgets with the first large-scale empirical study of Argon2 adoption across public GitHub software repositories. Our economic model, validated against cryptocurrency mining benchmarks, demonstrates that OWASP's recommended 46 MiB configuration reduces compromise rates by 42.5% compared to SHA-256 at $1/account attack budgets for strong user passwords. However, memory-hardness exhibits diminishing returns: increasing the allocation to RFC 9106's 2048 MiB provides just 23.3% ($1) and 17.7% ($20) additional protection despite 44.5 times greater memory demands. Crucially, both configurations fail to mitigate risks from weak passwords, with 96.9-99.8% compromise rates for RockYou-like credentials regardless of algorithm choice. Our repository analysis shows accelerating Argon2 adoption, yet weak configuration practices: 46.6% of deployments use weaker-than-OWASP parameters. Surprisingly, sensitive applications (password managers, encryption tools) show no stronger configurations than general software. Our findings highlight that a secure algorithm alone cannot ensure security; effective parameter guidance and developer education remain essential for realizing Argon2's theoretical advantages.
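The memory figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the paper's model: the `Argon2Params` class and the `is_below_memory_baseline` helper are hypothetical names, the time-cost and parallelism values are assumptions taken from commonly published guidance rather than from the abstract, and the memory-only comparison is a crude stand-in for whatever "weaker-than-OWASP" rule the authors actually used.

```python
from dataclasses import dataclass

KIB_PER_MIB = 1024


@dataclass(frozen=True)
class Argon2Params:
    """Hypothetical container for the three main Argon2 cost parameters."""
    memory_kib: int
    time_cost: int
    parallelism: int


# Memory sizes come from the abstract; time_cost and parallelism are assumed.
OWASP_46_MIB = Argon2Params(memory_kib=46 * KIB_PER_MIB, time_cost=1, parallelism=1)
RFC9106_2048_MIB = Argon2Params(memory_kib=2048 * KIB_PER_MIB, time_cost=1, parallelism=4)


def memory_ratio(a: Argon2Params, b: Argon2Params) -> float:
    """How many times more memory configuration a demands than configuration b."""
    return a.memory_kib / b.memory_kib


def is_below_memory_baseline(candidate: Argon2Params, baseline: Argon2Params) -> bool:
    """Crude, assumed check: flags only a smaller memory allocation.

    It ignores trade-offs where less memory is offset by more iterations.
    """
    return candidate.memory_kib < baseline.memory_kib


if __name__ == "__main__":
    ratio = memory_ratio(RFC9106_2048_MIB, OWASP_46_MIB)
    print(f"RFC 9106 vs OWASP memory ratio: {ratio:.1f}x")
    small = Argon2Params(memory_kib=19 * KIB_PER_MIB, time_cost=2, parallelism=1)
    print("19 MiB below 46 MiB baseline:", is_below_memory_baseline(small, OWASP_46_MIB))
```

The ratio reproduces the "44.5 times" memory figure quoted above; the baseline check shows why a memory-only rule is too blunt, since it flags a 19 MiB configuration even when its iteration count is higher.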
Evaluating the Infinity Cache in AMD Strix Halo
https://chipsandcheese.com/p/evaluating-the-infinity-cache-in
#HackerNews #Evaluating #Infinity #Cache #AMD #Strix #Halo #Hardware #Review #Technology
Evaluating LLMs for my personal use case
https://darkcoding.net/software/personal-ai-evals-aug-2025/
#HackerNews #Evaluating #LLMs #personaluse #case #AItechnology #machinelearning
An #Expert Who Has #Testified in #FosterCare #Cases Across #Colorado Admits Her #Evaluations Are #Unscientific.
#DianeBaird had spent four decades #evaluating the #relationships of #poor #families with their #children.
#Women #Transgender #LGBTQ #LGBTQIA #Fostering #FosterParents #Courts #Abuse
https://www.propublica.org/article/expert-in-foster-care-cases-admits-her-method-is-unscientific
Diane Baird labeled her method for assessing families the “Kempe Protocol” after the renowned University of Colorado institute where she worked for decades. The school has yet to publicly disavow it.
Jiawei Zhou, a PhD student in Georgia Tech’s School of Interactive Computing. Existing machine learning (ML) models used to detect online misinformation are less effective when matched against content created by ChatGPT or other large language […]