No, your brain does not perform better after LLM use or during LLM use.

See our paper for more results: "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task": https://www.brainonllm.com

For 4 months, 54 students were divided into three groups: ChatGPT, Search Engine (Google), and Brain-only. Across 3 sessions, each wrote essays on SAT prompts. In an optional 4th session, participants switched: LLM users used no tools (LLM-to-Brain), and the Brain-only group used ChatGPT (Brain-to-LLM).
I. NLP and Essay Content
- LLM Group: Essays were highly homogeneous within each topic, showing little variation. Participants often relied on the same expressions or ideas.
- Brain-only Group: Diverse and varied approaches across participants and topics.
- Search Engine Group: Essays were shaped by search engine-optimized content; their ontology overlapped with the LLM group but not with the Brain-only group.
๐ˆ๐ˆ. ๐„๐ฌ๐ฌ๐š๐ฒ ๐’๐œ๐จ๐ซ๐ข๐ง๐  (๐“๐ž๐š๐œ๐ก๐ž๐ซ๐ฌ ๐ฏ๐ฌ. ๐€๐ˆ ๐‰๐ฎ๐๐ ๐ž)
- Teachers detected patterns typical of AI-generated content and scoring LLM essays lower for originality and structure.
- AI Judge gave consistently higher scores to LLM essays, missing human-recognized stylistic traits.
๐ˆ๐ˆ๐ˆ: ๐„๐„๐† ๐€๐ง๐š๐ฅ๐ฒ๐ฌ๐ข๐ฌ
Connectivity: Brain-only group showed the highest neural connectivity, especially in alpha, theta, and delta bands. LLM users had the weakest connectivity, up to 55% lower in low-frequency networks. Search Engine group showed high visual cortex engagement, aligned with web-based information gathering.

๐‘บ๐’†๐’”๐’”๐’Š๐’๐’ 4 ๐‘น๐’†๐’”๐’–๐’๐’•๐’”:
- LLM-to-Brain (๐Ÿค–๐Ÿค–๐Ÿค–๐Ÿง ) participants underperformed cognitively with reduced alpha/beta activity and poor content recall.
- Brain-to-LLM (๐Ÿง ๐Ÿง ๐Ÿง ๐Ÿค–) participants showed strong re-engagement, better memory recall, and efficient tool use.

LLM-to-Brain participants showed potential limitations in achieving the robust neural synchronization essential for complex cognitive tasks.

Results for Brain-to-LLM participants suggest that introducing AI tools after an initial phase of self-driven effort may enhance engagement and neural integration.
๐ˆ๐•. ๐๐ž๐ก๐š๐ฏ๐ข๐จ๐ซ๐š๐ฅ ๐š๐ง๐ ๐‚๐จ๐ ๐ง๐ข๐ญ๐ข๐ฏ๐ž ๐„๐ง๐ ๐š๐ ๐ž๐ฆ๐ž๐ง๐ญ
- Quoting Ability: LLM users failed to quote accurately, while Brain-only participants showed robust recall and quoting skills.
- Ownership: Brain-only group claimed full ownership of their work; LLM users expressed either no ownership or partial ownership.
- Critical Thinking: Brain-only participants cared more about *what* and *why* they wrote; LLM users focused on *how*.
- Cognitive Debt: Repeated LLM use led to shallow content repetition and reduced critical engagement. This suggests a buildup of "cognitive debt", deferring mental effort at the cost of long-term cognitive depth.
@nataliyakosmyna Thank you for this. I am finishing a paper with my students on hands-on cybersecurity education where a major part of our argument is that we can't afford this kind of LLM malaise/cognitive debt in the field.
@bcallah we cannot really afford any cognitive debt as this is the one which cannot be written off.
@nataliyakosmyna @bcallah Why do you say that cognitive debt "cannot be written off"? Have you done any follow up work on getting people from the LLM groups trying to catch up with the Brain group?

@SubductionRheology @bcallah we actually did more than one study, e.g. more than an essay writing task. I will not talk about it too much as even the preprint is not available just yet, but here is one additional, important point which is in this paper: the Brain-to-LLM group in session 4 accepted LLM suggestions.

So more work ahead to understand all the implications further.

@nataliyakosmyna @SubductionRheology @bcallah
Ever more evidence that using LLMs for real tasks is more like having a machine lift weights for you than using a dishwasher.

@a_cubed @nataliyakosmyna @SubductionRheology @bcallah

You will be shocked by the earthmoving advances in the last 200 years!

@bcallah @nataliyakosmyna on the other hand, we are lacking 3 million security specialists.
So LLMs can help
@Okuna @nataliyakosmyna Closer to 5 million according to the latest industry research. And we don't know if/how much LLMs can help. That's the whole point of doing research. I have a study (not even in preprint stage yet) that suggests that for human-facing cybersecurity tasks (e.g., security awareness training), LLMs can be more of a hindrance than a help.
@bcallah @nataliyakosmyna We already know how LLMs can help. The point is to train your LLM with security-related data to get rid of hallucinations, and there are some around. We should not use the LLM to decide, though, but as a counterpart to ask questions, or to scan the attack surface in the local network. You can feed the CVEs into the LLM to learn from them and then ask about your vulnerabilities. Summarization of security findings and reports is another use. In general, anything that is routine can be done: scanning huge amounts of log files, finding patterns in them, …
In my ITSRM classes, one chapter is AI: how it can harm and how it can help.
@bcallah @nataliyakosmyna https://youtu.be/4QzBdeUQ0Dc?feature=shared
IBM on AI in Cybersecurity. A good starting point, I think
@nataliyakosmyna Hi! Thank you for your work on this. I am not a scientist but I am (reluctantly) an "AI skeptic", and I can see many fellow skeptics quoting this approvingly as if it is smoking-gun proof that LLMs are pure brain poison. Anything that confirms my worldview this directly, I find, is cause for skepticism! Would you mind some methodological questions that may be somewhat ill-informed from a layperson?
@glyph sure, go for it! Note that we state it clearly in the paper - there are several limitations (group size, geographical location of participants, etc.), but we believe it is a good first step towards these studies! But I hear "confirmation bias" mentioned a lot today as well!
@nataliyakosmyna One of the biggest questions that I have is that the selection of such a wide variety of apparently unrelated metrics is the sort of thing one sees in psychometric research that is doing (wittingly or unwittingly) p-hacking: collect 200 metrics, write a paper on the 30 where p<0.05. AI judge, human essay eval, EEG, NLP, etc sort of smells like that. Were any other metrics measured that didn't make it into the paper? What were the criteria used to select the things you measured?
@nataliyakosmyna I have admittedly only had time to skim thus far but I didn't see the motivation or reasoning behind the selection of all the stuff that got measured here in the "EXPERIMENTAL DESIGN" section of the paper, which seemed mostly focused on detailing the protocol involved
@glyph Experimental Design section only talks about what was done e.g., the study protocol. You need to go into the subsections - eg NLP, EEG

@glyph ok so I would need to send you back to read the paper, but I will still try to unpack it a bit for you! The first thing first: to denote different levels of significance in figures and results, we adopted the following convention:
• p < 0.05 was considered statistically significant and is marked with a single asterisk (*)
• p < 0.01 with a double asterisk (**)
• p < 0.001 with a triple asterisk (***)

Most things you see as results in the paper are actually ***. (1)
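The asterisk convention above is a simple threshold rule; here is a minimal sketch of it as a helper (the function name is mine, purely illustrative):

```python
def significance_marker(p):
    """Map a p-value to the asterisk convention used in the paper's figures."""
    if p < 0.001:
        return "***"  # highly significant
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"    # threshold for statistical significance
    return ""         # not statistically significant
```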

@glyph we did it this way because I originally had a hypothesis that the n-grams used might correlate with the brain activity of the subjects. And to show this - you need to analyze both.
@nataliyakosmyna thanks, I had skipped those sections because I was looking for an overview that talked about the collection rather than the individual metrics. Will head back to look at those!
@nataliyakosmyna I have some other questions but they might be obviated by understanding your reasoning on this part better, so maybe I will have more in a few days :)
@glyph sure! I will be going offline soon, but do try to give it a read! Hopefully my explanations made it a bit clearer/better! Also I recommend you follow a TL;DR suggestion - go through the Abstract, Intro, Discussion, Conclusion and Limitations first. Then ease yourself into the rest of the paper or the parts that sound more exciting to you!
@glyph as the task is essay writing - when the essays were written - I did not feel able to actually evaluate them, as I'm not trained in this type of task. So I went ahead and hired two teachers (two, not one, to avoid bias, etc.) to do this work for us. I felt that this was the right thing to do - the task should be evaluated by the experts. They were not given any details about the study/groups, etc., so that also avoids any potential conflicts of interest/additional bias.
@glyph full transparency - I was told that there need to be three separate papers, not one (as of right now - I am currently literally breaking this document into three for peer-review submission, as not a single journal will accept such a lengthy manuscript). But I felt that presenting the whole picture is "worth" it, so everyone can read for themselves how correlated all the findings are. For me - it is not about how many papers total will come out of this one - it is about the findings.

@nataliyakosmyna Useful info!

Did this research track differences in neurotypes?

@nataliyakosmyna
This one's for you again @Looping ⤴️!
@nataliyakosmyna been thinking about this a bunch today. "Cognitive debt" and "bot-splaining" are two good recent additions to my vocabulary
@nataliyakosmyna how do you format fancy like that?
@nataliyakosmyna (rewriting this for the third time) I feel this post might sound too harsh; I am surprised to find, in such a study, the need to do some PR for OpenAI: I feel here you adopt a common pov that AI use is inevitable, thus finding its "best use" desirable. Is it caution with respect to pro-AI backlash?

@BrKloeckner no PR for OpenAI, important point mentioned in the paper: "Brain-to-LLM group in session 4 accepted LLM suggestions."

So more work ahead to understand all the implications further! (And we did some more already, the results are even worse…)

@nataliyakosmyna I must confess I do not understand your answer.

To clarify my point, which was possibly made too superficially: you somehow seem to be looking at how to best use LLMs (by timing strategically), while the main results suggest that the best might be not to use them at all. I can see how anyone willing to use LLMs or promote their use could basically dismiss the core of your work by saying "sure, you have to do it *right*, good timing, etc." - just like we hear all the time "sure, you have to use it ethically, but you have to use it" (and then ethics get forgotten anyway).

@nataliyakosmyna (More generally, I wonder why you chose to have Brain-to-LLM and LLM-to-Brain groups, instead of Brain vs LLM. This choice already feels like LLM use is deemed inevitable.)
@BrKloeckner there are both Brain-only and LLM-only groups in the paper, as well as the third group, the Search Engine, and we do compare those pair-wise.
@nataliyakosmyna Ok, sorry!
@BrKloeckner you are good! But if truly curious - do check the abstract (in the paper, not the arxiv one, that one is too short), intro, discussion, conclusion and limitations.
@BrKloeckner just do it yourself, do not use AI for a summary! It is around 10ish pages total, so more like a regular size conference paper.
@nataliyakosmyna @BrKloeckner Using "AI" to summarise that paper would be quite an insult =)
@richlv @BrKloeckner yes! And it does the job very poorly beyond that table we made for it!
@BrKloeckner the idea of the paper is not and was not to find the best timings for LLM use. I see where you are coming from with your comment though; I feel like "confirmation bias" is going strong here, e.g. everyone who hates LLMs will run with our paper, and then the "tech bros who are waking up on the West coast" (quote courtesy of someone on LinkedIn) will try to use it to justify their use and investments. If you read the paper - we did science and delivered it in a scientific tone.
@BrKloeckner my suggestion is for you to read the paper, I understand that you might not have time for a 200 page read, but follow up on our TL;DR recommendation to read the Discussion, Limitations and Conclusion.

@nataliyakosmyna

translation: ai likes its own farts

@nataliyakosmyna Oh wow. AI rates AI higher than humans. #whodathunk
@nataliyakosmyna this is a very small sample from which to derive conclusions.
@nataliyakosmyna thanks for the summary! The results are scary. We already had a large experiment with uncontrolled exposure to social media and ubiquitous screens, now we are exposing a large part of students and workers to LLMs, with unforeseeable results. Especially the loss of original thought and a variety of thought through the use of LLMs may have devastating results.
@carstenfranke yes. The experiment is happening in real time and none of us consented to it!
@nataliyakosmyna yup, I am living with the results of the first experiment... My 20 year old stepson has an extreme screen addiction, he was given a tablet at age 6, way before I met him. It is hard to get him to do anything apart from a screen, he is on it from 7 am to midnight... When we now degrade thinking skills...
@carstenfranke doom scrolling and social media are really the worst enemies of a healthy (developing) brain. I grew up with no phone in school, and I got my first iPad in 2010/2011 (used the money from school that was given for a computer and got the very first iPad instead). It ended up at my parents'. They loved it. It never worked for me. I buy paper books and take paper notes with a pen/pencil - always a pleasure to get those from new and interesting locations when traveling!
@nataliyakosmyna @carstenfranke and your association with google???
@donhawkins @carstenfranke yep, I joined them after the study was done and analyzed as a visiting researcher! Thus no association whatsoever with this work, but I do disclose this potential conflict of interest in the paper clearly!
@carstenfranke @nataliyakosmyna IMHO Google is the dark-side. Or, a major component.
@nataliyakosmyna thanks so much for the great summary thread! Any plans to do similar research on programming, or aware of anyone else doing it?
Iโ€™d speculate results would be similar
@dgodon we have already done one in CS! Results are even worse… The paper is incoming! We could not package these two together - the paper would have been 400 pages and no one except an LLM would read it