| Blog | https://edtechdev.wordpress.com/ |

Effective mathematics education requires identifying and responding to students' mistakes. For AI to support pedagogical applications, models must perform well across different levels of student proficiency. Our work provides an extensive, year-long snapshot of how 11 vision-language models (VLMs) perform on DrawEduMath, a QA benchmark involving real students' handwritten, hand-drawn responses to math problems. We find that models' weaknesses concentrate on a core component of math education: student error. All evaluated VLMs underperform when describing work from students who require more pedagogical help, and across all QA, they struggle the most on questions related to assessing student error. Thus, while VLMs may be optimized to be math problem solving experts, our results suggest that they require alternative development incentives to adequately support educational use cases.

LLMs increasingly excel on AI benchmarks, but doing so does not guarantee validity for downstream tasks. This study evaluates the performance of leading foundation models (FMs, i.e., generative pre-trained base LLMs) with out-of-distribution (OOD) tasks of the teaching and learning of schoolchildren. Across all FMs, inter-model behaviors on disparate tasks correlate higher than they do with expert human behaviors on target tasks. These biases shared across LLMs are poorly aligned with downstream measures of teaching quality and often *negatively aligned with learning outcomes*. Further, we find multi-model ensembles, both unanimous model voting and expert-weighting by benchmark performance, further exacerbate misalignment with learning. We measure that 50% of the variation in misalignment error is shared across foundation models, suggesting that common pretraining accounts for much of the misalignment in these tasks. We demonstrate methods for robustly measuring alignment of complex tasks and provide unique insights into both educational applications of foundation models and to understanding limitations of models.
In the Phaedrus, Plato argued that the invention of writing would destroy our memory and replace true wisdom with a mere shadow of it. He believed that when we stop internalizing knowledge and start relying on external tools, we lose the ability to actually think. Thousands of years later, we are having the exact same conversation about Large Language Models and ChatGPT.
The danger of AI is not that it will become too smart, but that it will make us too lazy to be wise. True education is what Plato called a turning of the soul, a difficult process that requires active engagement. If you let a machine summarize the world for you, you are only holding onto dead speech. We must treat writing and thinking as a practice of the mind rather than a task to be automated.
🧠 Plato feared that external tools create the illusion of knowledge.
⚡ Large Language Models offer quick results while bypassing understanding.
🎓 Genuine insight comes from human dialectic and struggle.
🔍 We must focus on literacy that teaches how these algorithms function.
https://www.templeton.org/news/plato-warned-us-about-chatgpt-and-told-us-what-to-do-about-it
#ArtificialIntelligence #Philosophy #Learning #ChatGPT #Education #Teaching #AI #Technology
Test your browser's fingerprint and weep.
https://coveryourtracks.eff.org/
In my case, "at least 18.2 bits of information", but that seems to be an underestimate, since it is based only on being unique among the site's recent visitors (~300k ≈ 2^18). If the information sources available on me were independent, they would sum to about 75 bits.
Less than 8 bits of information with the Tor browser.
Browser manufacturers should offer a reduced fingerprint option (rather than only trying to block fingerprinting sites).
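The bits-of-information figures above follow directly from self-information: a fingerprint trait shared by a fraction p of visitors carries -log2(p) bits. A minimal sketch of that arithmetic (the 300,000-visitor count is the rough figure quoted above, not an exact EFF statistic):

```python
import math

def surprisal_bits(matching_fraction: float) -> float:
    """Identifying information (in bits) carried by a trait shared by
    the given fraction of visitors: the self-information -log2(p)."""
    return -math.log2(matching_fraction)

# Being unique among roughly 300,000 recent visitors:
unique_bits = surprisal_bits(1 / 300_000)
print(f"{unique_bits:.1f} bits")  # ≈ 18.2 bits

# Independent traits simply add their bits, which is why a handful of
# quirks (fonts, canvas, screen size, ...) can total ~75 bits combined.
combined = surprisal_bits(1 / 4) + surprisal_bits(1 / 1000)  # toy example
print(f"{combined:.1f} bits")
```

This also shows why "at least 18.2 bits" is only a lower bound: uniqueness among ~300k visitors caps the measurable surprisal at about log2(300,000) ≈ 18.2 bits, regardless of how much more distinctive the fingerprint actually is.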
Sharing this one for those outside of Australia as it's a doozy.
How Australia's university students are using AI to cheat their way to a degree
Students are graduating with degrees they never earned as AI tools write their assignments, sit their exams and secure High Distinctions. Why aren’t our universities doing anything about it?