Mastodawn

Evaluating Chain-of-Thought Monitorability

https://openai.com/index/evaluating-chain-of-thought-monitorability/

#HackerNews #Evaluating #Chain-of-Thought #Monitorability #AI #Research #MachineLearning #OpenAI #ThoughtProcess #Insights

We introduce evaluations for chain-of-thought monitorability and study how it scales with test-time compute, reinforcement learning, and pretraining.

Hacker News Oct 22, 2025

Evaluating Argon2 Adoption and Effectiveness in Real-World Software

https://arxiv.org/abs/2504.17121

#HackerNews #Evaluating #Argon2 #Adoption #Effectiveness #RealWorldSoftware #Cybersecurity #Cryptography

Modern password hashing remains a critical defense against credential cracking, yet the transition from theoretically secure algorithms to robust real-world implementations remains fraught with challenges. This paper presents a dual analysis of Argon2, the Password Hashing Competition winner, combining attack simulations quantifying how parameter configurations impact guessing costs under realistic budgets, with the first large-scale empirical study of Argon2 adoption across public GitHub software repositories. Our economic model, validated against cryptocurrency mining benchmarks, demonstrates that OWASP's recommended 46 MiB configuration reduces compromise rates by 42.5% compared to SHA-256 at $1/account attack budgets for strong user passwords. However, memory-hardness exhibits diminishing returns, as increasing allocations to RFC 9106's 2048 MiB provides just 23.3% ($1) and 17.7% ($20) additional protection despite 44.5 times greater memory demands. Crucially, both configurations fail to mitigate risks from weak passwords, with 96.9-99.8% compromise rates for RockYou-like credentials regardless of algorithm choice. Our repository analysis shows accelerating Argon2 adoption, yet weak configuration practices: 46.6% of deployments use weaker-than-OWASP parameters. Surprisingly, sensitive applications (password managers, encryption tools) show no stronger configurations than general software. Our findings highlight that a secure algorithm alone cannot ensure security; effective parameter guidance and developer education remain essential for realizing Argon2's theoretical advantages.

arXiv.org
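
The "44.5 times greater memory demands" figure in the abstract above can be checked from the two memory settings it quotes. This is only a minimal arithmetic sketch; the constant and function names are made up for illustration and do not come from the paper.

```python
# Memory settings quoted in the abstract, in MiB.
OWASP_MIB = 46        # OWASP's recommended configuration
RFC9106_MIB = 2048    # RFC 9106 configuration cited in the abstract


def memory_ratio(large_mib: float, small_mib: float) -> float:
    """Return how many times larger the first allocation is than the second."""
    return large_mib / small_mib


ratio = memory_ratio(RFC9106_MIB, OWASP_MIB)
print(f"{ratio:.1f}x")  # 44.5x
```

The ratio only compares memory per hash; it says nothing about the attack-cost model the paper uses.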

Hacker News Oct 22, 2025

Evaluating the Infinity Cache in AMD Strix Halo

https://chipsandcheese.com/p/evaluating-the-infinity-cache-in

#HackerNews #Evaluating #Infinity #Cache #AMD #Strix #Halo #Hardware #Review #Technology

Strix Halo is the codename for AMD’s highest end mobile chip, which is used in the Ryzen AI MAX series.

Chips and Cheese

Osna.FM Sep 24, 2025

The German government has acknowledged US President Donald Trump's remarks during the UN General Debate, in which he suggested Germany was returning to fossil f... https://news.osna.fm/?p=16697 | #news #comments #dismisses #energy #evaluating

Germany Dismisses Evaluating Trump's Energy Comments - Osna.FM

Germany responds to President Trump's UN comments on energy policy, declining to directly evaluate his remarks regarding a potential shift away from renewables.

Osna.FM

Hacker News Aug 24, 2025

Evaluating LLMs for my personal use case

https://darkcoding.net/software/personal-ai-evals-aug-2025/

#HackerNews #Evaluating #LLMs #personaluse #case #AItechnology #machinelearning

My life is not a math Olympiad

Graham King

openSUSE Linux Apr 3, 2024

We received 22 submissions for this year's #Google #Summer of #Code under the @opensuse #mentoring #Org. Thank you to all those who submitted a proposal. The mentors & admins look forward to #evaluating these proposals & will rank them by the #deadline on April 24.

Susan Larson ♀️🏳️‍🌈🏳️‍⚧️🌈 Mar 18, 2024

An #Expert Who Has #Testified in #FosterCare #Cases Across #Colorado Admits Her #Evaluations Are #Unscientific.

#DianeBaird had spent four decades #evaluating the #relationships of #poor #families with their #children.

#Women #Transgender #LGBTQ #LGBTQIA #Fostering #FosterParents #Courts #Abuse

https://www.propublica.org/article/expert-in-foster-care-cases-admits-her-method-is-unscientific

An Expert Who Has Testified in Foster Care Cases Across Colorado Admits Her Evaluations Are Unscientific

Diane Baird labeled her method for assessing families the “Kempe Protocol” after the renowned University of Colorado institute where she worked for decades. The school has yet to publicly disavow it.

ProPublica

The Triangle Agency Apr 21, 2023

Understanding AI-generated misinformation and evaluating algorithmic and human solutions https://triangleagency.co.uk/understanding-ai-generated-misinformation-and-evaluating-algorithmic-and-human-solutions/?utm_source=dlvr.it&utm_medium=mastodon #TheTriangleAgencyNews #AIGenerated #Algorithmic #evaluating

Understanding AI-generated misinformation and evaluating algorithmic and human solutions - The Triangle Agency

Jiawei Zhou, a PhD student in Georgia Tech’s School of Interactive Computing. Existing machine learning (ML) models used to detect online misinformation are less effective when matched against content created by ChatGPT or other large language […]

The Triangle Agency

Mohammad Hajiaghayi Feb 9, 2023

Now (7pm ET) watch
https://youtu.be/LKvkaJfWoEg
(SUBSCRIBE TO YOUTUBE @hajiaghayi FOR MORE)
Lesson 6: Introduction to Algorithms by Mohammad Hajiaghayi: Smart Algorithm Design through Strong Induction: #celebrity problem, #evaluating a polynomial, maximum #consecutive #subsequence

Lesson 6: Introduction to Algorithms by Mohammad Hajiaghayi: Smart Algorithm Design Strong Induction

YouTube

Spaceflight 🚀 Dec 18, 2022

@christiankoestner They don't know yet: "The IDA will be vital in #evaluating how well the Gateway module structure shields the interior habitable volume from radiation" https://www.nasa.gov/feature/gateway-instruments-to-improve-radiation-detection-for-artemis-astronauts

Gateway Instruments to Improve Radiation Detection for Artemis Astronauts

NASA