10 posts about #metacognition -- a vital skill in the age of social media and #ai

#thinking

https://www.patreon.com/collection/2124752?view=expanded

Teaching #metacognition techniques to 4- to 6-year-olds leads to better learning outcomes (and may immunize against cognitive decline caused by #ai use)

#llm #learning #education

https://www.patreon.com/posts/159662223

Can LLMs abstain effectively, and does chain-of-thought help? Two recent papers seem at odds on the surface. A COLING 2025 paper finds that prompted CoT raises abstention on instruct models. AbstentionBench (NeurIPS 2025) finds that extending the reasoning budget lowers it on a trained reasoner. What gives?

https://benjaminhan.net/posts/20260527-prompted-vs-trained-cot-abstention/?utm_source=mastodon&utm_medium=social

#Metacognition #LLMs #Reasoning #Evaluation #AI

[Followup] Prompted vs. Trained Chain-of-Thought on Abstention: Reading Two Studies Together – synesis

Why prompted chain-of-thought raises abstention recall on instruct models in COLING 2025 but extending the reasoning budget on a trained reasoner lowers it in AbstentionBench, and three experiments that would clarify the picture.
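
For readers unfamiliar with the metric, here is a minimal sketch of how abstention recall and precision could be computed. The function name and the boolean label format are assumptions made for illustration; neither paper's actual scoring code is reproduced here.

```python
from typing import Dict, Sequence


def abstention_metrics(should_abstain: Sequence[bool],
                       did_abstain: Sequence[bool]) -> Dict[str, float]:
    """Score abstention decisions against gold labels (hypothetical helper).

    should_abstain[i] is True when question i is unanswerable and the model
    ought to decline; did_abstain[i] is True when the model actually declined.
    """
    if len(should_abstain) != len(did_abstain):
        raise ValueError("label lists must have the same length")

    pairs = list(zip(should_abstain, did_abstain))
    true_pos = sum(1 for gold, pred in pairs if gold and pred)
    false_neg = sum(1 for gold, pred in pairs if gold and not pred)
    false_pos = sum(1 for gold, pred in pairs if not gold and pred)

    # Recall: of the questions that called for abstention, how many were declined?
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # Precision: of the questions the model declined, how many called for it?
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return {"recall": recall, "precision": precision}


# Toy data: three questions call for abstention, two do not.
gold = [True, True, True, False, False]
pred = [True, True, False, True, False]
scores = abstention_metrics(gold, pred)
```

A model that abstains more often can raise recall while lowering precision, so the two numbers are best read together.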

Have You Realized That It’s Possible to Manage Your Emotions?

In addition to teaching you how to think, at the EMV Institute we take your emotions into account so you can truly achieve your goals. Keep in mind, though, that emotions are not the same as emotional intelligence: emotions are what you feel (the phenomenon itself), while emotional intelligence is what you do with those feelings. Hence the importance of acquiring strategies that allow you to manage your emotions. Book our services and make your purchases on our website.

https://institutoemv.wordpress.com/2026/05/27/have-you-realized-that-its-possible-to-manage-your-emotions/

Can language models monitor and steer their own internal activations? A neuroscience-inspired neurofeedback paradigm finds yes, but only within a low-dimensional metacognitive space: semantically interpretable directions are accessible, raw-variance directions aren't. The prerequisite for spoofing activation-based oversight already partially exists.

https://benjaminhan.net/posts/20260526-metacognitive-monitoring-control-activations/?utm_source=mastodon&utm_medium=social

#Paper #Metacognition #LLMs #AISafety #Neuroscience #NeurIPS #AI

Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations – synesis

A neuroscience-inspired neurofeedback paradigm shows LLMs can introspect and steer a low-dimensional metacognitive space of their hidden activations, with implications for activation-based oversight.

I used ChatGPT to research the cognitive risks of undisciplined use of #AI and what to do about it, then created a series of 7 books for my 5-year-old grandson. If you don't want to download the 60 MB PDF of the illustrated books, this curriculum guide details the pedagogy of #metacognition for very young people.

#learning #education #llm

https://rheingold.com/READ_FIRST_Thinking_System_Curriculum_Intro.pdf

and the entire set of books
https://rheingold.com/ThinkingCurriculum.zip

Does training an LLM to be calibrated on one task format transfer to another? A new arXiv paper tests two formats: single-question confidence and pairwise comparison. Training on only one doesn't improve the other. Multitask training closes most of the gap, but Llama doesn't inherit the comparison-task benefit.

https://benjaminhan.net/posts/20260525-metacognition-uncertainty-sft/?utm_source=mastodon&utm_medium=social

#AI #LLMs #Calibration #Metacognition

Improving Metacognition and Uncertainty Communication in Language Models – synesis

Supervised fine-tuning on confidence-labeled data improves both calibration and discrimination of verbalized LLM confidence, but only multitask training transfers across single-question and pairwise tasks.
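
Calibration and discrimination are easy to conflate, so here is a rough sketch of one common way to measure each: expected calibration error (ECE) for calibration and AUROC for discrimination. The choice of these two metrics, the function names, and the toy numbers are assumptions for illustration; the paper may measure them differently.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[int],
                               n_bins: int = 10) -> float:
    """Bin answers by stated confidence, then average the gap between
    mean confidence and accuracy in each bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, is_correct in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the top bin
        bins[index].append((conf, is_correct))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(accuracy - mean_conf)
    return ece


def auroc(confidences: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a randomly chosen correct answer received higher
    confidence than a randomly chosen incorrect one (ties count as half)."""
    right = [c for c, y in zip(confidences, correct) if y]
    wrong = [c for c, y in zip(confidences, correct) if not y]
    if not right or not wrong:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(1.0 if r > w else 0.5 if r == w else 0.0
               for r in right for w in wrong)
    return wins / (len(right) * len(wrong))


# Toy data: confidence separates right from wrong answers perfectly,
# yet the stated numbers are far from the observed accuracies.
toy_conf = [0.65, 0.65, 0.55, 0.55]
toy_correct = [1, 1, 0, 0]
toy_ece = expected_calibration_error(toy_conf, toy_correct)
toy_auroc = auroc(toy_conf, toy_correct)
```

In the toy data the model ranks its answers perfectly (high discrimination) while its stated confidences are poorly calibrated, which is why the two properties are reported separately.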

Given a problem queue and a token budget, can an LLM plan which to attempt, in what order, and how much to spend on each — before any execution feedback? TRIAGE tests 20 frontier and open-source LLMs. Most plan worse than random. Reasoning-trained modes systematically lose to standard ones. Even when shown its own per-problem budget, the best complier respects it on 37% of attempts.

https://benjaminhan.net/posts/20260523-triage-metacognitive-control/?utm_source=mastodon&utm_medium=social

#Paper #AI #LLMs #Metacognition #Evaluation #AgenticSystems

TRIAGE: Evaluating Prospective Metacognitive Control in LLMs Under Resource Constraints – synesis

A new benchmark scores frontier and open-source LLMs on whether they can plan token-budget allocation across a queue of problems before any execution feedback — and most cannot.
