No, your brain does not perform better after LLM use or during LLM use.

See our paper for more results: "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task": https://www.brainonllm.com

For 4 months, 54 students were divided into three groups: ChatGPT, Search Engine (Google), and Brain-only. Across 3 sessions, each wrote essays on SAT prompts. In an optional 4th session, participants switched: LLM users used no tools (LLM-to-Brain), and the Brain-only group used ChatGPT (Brain-to-LLM).
I. NLP and Essay Content
- LLM Group: Essays were highly homogeneous within each topic, showing little variation. Participants often relied on the same expressions or ideas.
- Brain-only Group: Diverse and varied approaches across participants and topics.
- Search Engine Group: Essays were shaped by search engine-optimized content; their ontology overlapped with the LLM group but not with the Brain-only group.
๐ˆ๐ˆ. ๐„๐ฌ๐ฌ๐š๐ฒ ๐’๐œ๐จ๐ซ๐ข๐ง๐  (๐“๐ž๐š๐œ๐ก๐ž๐ซ๐ฌ ๐ฏ๐ฌ. ๐€๐ˆ ๐‰๐ฎ๐๐ ๐ž)
- Teachers detected patterns typical of AI-generated content and scoring LLM essays lower for originality and structure.
- AI Judge gave consistently higher scores to LLM essays, missing human-recognized stylistic traits.
๐ˆ๐ˆ๐ˆ: ๐„๐„๐† ๐€๐ง๐š๐ฅ๐ฒ๐ฌ๐ข๐ฌ
Connectivity: Brain-only group showed the highest neural connectivity, especially in alpha, theta, and delta bands. LLM users had the weakest connectivity, up to 55% lower in low-frequency networks. Search Engine group showed high visual cortex engagement, aligned with web-based information gathering.

๐‘บ๐’†๐’”๐’”๐’Š๐’๐’ 4 ๐‘น๐’†๐’”๐’–๐’๐’•๐’”:
- LLM-to-Brain (๐Ÿค–๐Ÿค–๐Ÿค–๐Ÿง ) participants underperformed cognitively with reduced alpha/beta activity and poor content recall.
- Brain-to-LLM (๐Ÿง ๐Ÿง ๐Ÿง ๐Ÿค–) participants showed strong re-engagement, better memory recall, and efficient tool use.

LLM-to-Brain participants showed potential limitations in achieving the robust neural synchronization essential for complex cognitive tasks.

Results for Brain-to-LLM participants suggest that introducing AI tools after an initial phase of self-driven effort may enhance engagement and neural integration.
๐ˆ๐•. ๐๐ž๐ก๐š๐ฏ๐ข๐จ๐ซ๐š๐ฅ ๐š๐ง๐ ๐‚๐จ๐ ๐ง๐ข๐ญ๐ข๐ฏ๐ž ๐„๐ง๐ ๐š๐ ๐ž๐ฆ๐ž๐ง๐ญ
- Quoting Ability: LLM users failed to quote accurately, while Brain-only participants showed robust recall and quoting skills.
- Ownership: Brain-only group claimed full ownership of their work; LLM users expressed either no ownership or partial ownership.
- Critical Thinking: Brain-only participants cared more about *what* and *why* they wrote; LLM users focused on *how*.
- Cognitive Debt: Repeated LLM use led to shallow content repetition and reduced critical engagement. This suggests a buildup of "cognitive debt", deferring mental effort at the cost of long-term cognitive depth.
@nataliyakosmyna Thank you for this. I am finishing a paper with my students on hands-on cybersecurity education where a major part of our argument is that we can't afford this kind of LLM malaise/cognitive debt in the field.
@bcallah we cannot really afford any cognitive debt as this is the one which cannot be written off.
@nataliyakosmyna @bcallah Why do you say that cognitive debt "cannot be written off"? Have you done any follow up work on getting people from the LLM groups trying to catch up with the Brain group?

@SubductionRheology @bcallah we actually did more than one study, e.g. more than an essay writing task. I will not talk about it too much as even the preprint is not available just yet, but here is one additional, important point which is in this paper: the Brain-to-LLM group in session 4 accepted LLM suggestions.

So more work ahead to understand all the implications further.

@nataliyakosmyna @SubductionRheology @bcallah
Ever more evidence that using LLMs for real tasks is more like having a machine lift weights for you than using a dishwasher.

@a_cubed @nataliyakosmyna @SubductionRheology @bcallah

You will be shocked by the earthmoving advances in the last 200 years!

@bcallah @nataliyakosmyna on the other hand, we are lacking 3 million security specialists.
So LLMs can help
@Okuna @nataliyakosmyna Closer to 5 million according to the latest industry research. And we don't know if/how much LLMs can help. That's the whole point of doing research. I have a study (not even in preprint stage yet) that suggests that for human-facing cybersecurity tasks (e.g., security awareness training), LLMs can be more of a hindrance than a help.
@bcallah @nataliyakosmyna We already know how LLMs can help. The point is to train your LLM with security-related data to get rid of hallucinations, and there are some around. We should not use the LLM to decide, though, but as a counterpart to ask questions, or to scan the attack surface in the local network. You can feed the CVEs into the LLM to learn from them and then ask about your vulnerabilities. Summarization of security findings and reports is another use. In general, anything that is routine can be done: scanning huge amounts of log files, finding patterns in them, …
In my ITSRM classes, one chapter is AI: how it can harm and how it can help.
@bcallah @nataliyakosmyna https://youtu.be/4QzBdeUQ0Dc?feature=shared
IBM on AI in Cybersecurity. A good starting point, I think
@nataliyakosmyna Hi! Thank you for your work on this. I am not a scientist but I am (reluctantly) an "AI skeptic", and I can see many fellow skeptics quoting this approvingly as if it is smoking-gun proof that LLMs are pure brain poison. Anything that confirms my worldview this directly, I find, is cause for skepticism! Would you mind some methodological questions that may be somewhat ill-informed from a layperson?
@glyph sure, go for it! Note that we state it clearly in the paper - there are several limitations (group size, geographical location of participants, etc.), but we believe it is a good first step towards these studies! But I hear "confirmation bias" mentioned a lot today as well!
@nataliyakosmyna One of the biggest questions that I have is that the selection of such a wide variety of apparently unrelated metrics is the sort of thing one sees in psychometric research that is doing (wittingly or unwittingly) p-hacking: collect 200 metrics, write a paper on the 30 where p<0.05. AI judge, human essay eval, EEG, NLP, etc sort of smells like that. Were any other metrics measured that didn't make it into the paper? What were the criteria used to select the things you measured?
@nataliyakosmyna I have admittedly only had time to skim thus far but I didn't see the motivation or reasoning behind the selection of all the stuff that got measured here in the "EXPERIMENTAL DESIGN" section of the paper, which seemed mostly focused on detailing the protocol involved
@glyph Experimental Design section only talks about what was done e.g., the study protocol. You need to go into the subsections - eg NLP, EEG

@glyph ok so I would need to send you back to read the paper, but I will still try to unpack it a bit for you! The first thing first: to denote different levels of significance in figures and results, we adopted the following convention:
• p < 0.05 was considered statistically significant and is marked with a single asterisk (*)
• p < 0.01 with a double asterisk (**)
• p < 0.001 with a triple asterisk (***)

Most things you see as results in the paper are actually ***. (1)
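The asterisk convention above is a simple threshold rule; here is a minimal sketch of it as a helper (the function name is mine, purely illustrative):

```python
def significance_marker(p):
    """Map a p-value to the asterisk convention used in the paper's figures."""
    if p < 0.001:
        return "***"  # highly significant
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"    # threshold for statistical significance
    return ""         # not statistically significant
```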

@glyph we did it this way because I originally had a hypothesis that the n-grams used might correlate with the brain activity of the subjects. And to show this - you need to analyze both.
@nataliyakosmyna thanks, I had skipped those sections because I was looking for an overview that talked about the collection rather than the individual metrics. Will head back to look at those!
@nataliyakosmyna I have some other questions but they might be obviated by understanding your reasoning on this part better, so maybe I will have more in a few days :)
@glyph sure! I will be going offline soon, but do try to give it a read! Hopefully my explanations made it a bit clearer/better! Also I recommend you follow a TL;DR suggestion - go through the Abstract, Intro, Discussion, Conclusion and Limitations first. Then ease yourself into the rest of the paper or the parts that sound more exciting to you!
@glyph as the task is essay writing - when the essays were written - I did not feel able to actually evaluate them, as I'm not trained in this type of task. So I went ahead and hired two teachers (two, not one, to avoid bias, etc.) to do this work for us. I felt that this was the right thing to do - the task should be evaluated by the experts. They were not given any details about the study/groups, etc., so that also avoids any potential conflicts of interest/additional bias.
@glyph full transparency - I was told that there need to be three separate papers, not one (as of right now - I am currently literally breaking this document into three for peer-review submission, as not a single journal will accept such a lengthy manuscript). But I felt that presenting the whole picture is "worth" it, so everyone can read for themselves how correlated all the findings are. For me - it is not about how many papers total will come out of this one - it is about the findings.

@nataliyakosmyna Useful info!

Did this research track differences in neurotypes?

@nataliyakosmyna
This one's for you again @Looping ⤴️!
@nataliyakosmyna been thinking about this a bunch today. "Cognitive debt" and "bot-splaining" are two good recent additions to my vocabulary
@nataliyakosmyna how do you format fancy like that?
@nataliyakosmyna (rewriting this for the third time) I feel this post might sound too harsh; I am surprised to find, in such a study, the need to do some PR for OpenAI: I feel here you adopt a common pov that AI use is inevitable, thus finding its "best use" desirable. Is it caution with respect to pro-AI backlash?

@BrKloeckner no PR for OpenAI, important point mentioned in the paper: "Brain-to-LLM group in session 4 accepted LLM suggestions."

So more work ahead to understand all the implications further! (And we did some more already, the results are even worse…)

@nataliyakosmyna I must confess I do not understand your answer.

To clarify my point, which was possibly made too superficially: you somehow seem to be looking at how to best use LLMs (by timing strategically), while the main results suggest that the best might be not to use them at all. I can see how anyone willing to use LLMs or promote their use could basically dismiss the core of your work by saying "sure, you have to do it *right*, good timing, etc." - just like we hear all the time "sure, you have to use it ethically, but you have to use it" (and then ethics get forgotten anyway).

@nataliyakosmyna (More generally, I wonder why you chose to have Brain-to-LLM and LLM-to-Brain groups, instead of Brain vs LLM. This choice already feels like LLM use is deemed inevitable.)
@BrKloeckner there are both Brain-only and LLM-only groups in the paper, as well as the third group, the Search Engine, and we do compare those pair-wise.
@nataliyakosmyna Ok, sorry!
@BrKloeckner you are good! But if truly curious - do check the abstract (in the paper, not the arxiv one, that one is too short), intro, discussion, conclusion and limitations.
@BrKloeckner just do it yourself, do not use AI for a summary! It is around 10ish pages total, so more like a regular size conference paper.
@nataliyakosmyna @BrKloeckner Using "AI" to summarise that paper would be quite an insult =)
@richlv @BrKloeckner yes! And it does the job very poorly beyond that table we made for it!
@BrKloeckner the idea of the paper is not and was not to find the best timings for LLM use. I see where you are coming from with your comment though; I feel like "confirmation bias" is going strong here, e.g. everyone who hates LLMs will run with our paper, and then the "tech bros who are waking up on the West coast" (quote courtesy of someone on LinkedIn) will try to use it to justify their use and investments. If you read the paper - we did science and delivered it in a scientific tone.
@BrKloeckner my suggestion is for you to read the paper, I understand that you might not have time for a 200 page read, but follow up on our TL;DR recommendation to read the Discussion, Limitations and Conclusion.

@nataliyakosmyna

translation: ai likes its own farts

@nataliyakosmyna Oh wow. AI rates AI higher than humans. #whodathunk
@nataliyakosmyna this is a very small sample from which to derive conclusions.
@nataliyakosmyna thanks for the summary! The results are scary. We already had a large experiment with uncontrolled exposure to social media and ubiquitous screens, now we are exposing a large part of students and workers to LLMs, with unforeseeable results. Especially the loss of original thought and a variety of thought through the use of LLMs may have devastating results.
@carstenfranke yes. The experiment is happening in real time and none of us consented to it!
@nataliyakosmyna yup, I am living with the results of the first experiment... My 20 year old stepson has an extreme screen addiction, he was given a tablet at age 6, way before I met him. It is hard to get him to do anything apart from a screen, he is on it from 7 am to midnight... When we now degrade thinking skills...
@carstenfranke doom scrolling and social media are really the worst enemies of a healthy (developing) brain. I grew up with no phone in school, and I got my first iPad in 2010/2011 (used the money from school that was given for a computer and got the very first iPad instead). It ended up at my parents'. They loved it. It never worked for me. I buy paper books and take paper notes with a pen/pencil - always a pleasure to get those from new and interesting locations when traveling!
@nataliyakosmyna @carstenfranke and your association with google???
@donhawkins @carstenfranke yep, I joined them after the study was done and analyzed as a visiting researcher! Thus no association whatsoever with this work, but I do disclose this potential conflict of interest in the paper clearly!
@carstenfranke @nataliyakosmyna IMHO Google is the dark-side. Or, a major component.
@nataliyakosmyna thanks so much for the great summary thread! Any plans to do similar research on programming, or aware of anyone else doing it?
Iโ€™d speculate results would be similar
@dgodon we have already done one in CS! Results are even worse… The paper is incoming! We could not package these two together - the paper would have been 400 pages and no one except an LLM would read it