Preparing for an oral exam takes practice… but not necessarily complicated tools. ✅🙂
With Mon Oral, you can create structured practice sessions that are accessible to everyone.
Find out more here 👇
https://outilstice.com/2023/02/mon-oral-application-libre-et-gratuite-pour-creer-des-entrainements-a-loral/
#Enseignants #Classe #Evaluation #CompetencesOrales #NumeriqueEducatif #OpenSource
Mon Oral: a free tool for recording students' audio and preparing for the Grand Oral

Your students record themselves in one click, with no account needed. Mon Oral is a free tool for preparing for the Grand Oral, the brevet, and oral activities in class.

Les Outils Tice

In a few (fantastic) words, another definition of the famous "h-index" which, for those who believe in it, aims "to quantify scientific productivity" (wiki):

"The H-index is that marvelous invention which lends a scientific veneer to an age-old human passion: comparing the size of one's symbolic attributes while claiming to talk about #excellence."

https://mrmondialisation.org/tribune-recherche-se-vendre/

#HIndex #recherche #science #évaluation #ESR
ping @academia_carnet
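
For readers who have not met the metric being mocked: under the standard definition, an author's h-index is the largest h such that h of their papers have at least h citations each. A minimal sketch of that definition follows; the function name and the sample citation counts are illustrative assumptions, not taken from the linked piece.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for one author's five papers.
print(h_index([10, 8, 5, 4, 3]))
```

The sketch also shows why the number is blunt: one paper with 25 citations and one with 2,500 contribute identically once they clear the threshold.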

A court orders Swiss authorities to disclose the evaluation criteria used to select the F-35A

https://fed.brid.gy/r/https://www.opex360.com/2026/05/30/un-tribunal-enjoint-les-autorites-suisses-a-devoiler-les-criteres-devaluation-utilises-pour-le-choix-du-f-35a/

Does brief AI help in a learning task hurt later unaided performance? Three controlled trials (N=1,222) on math and reading say yes, and the effect is carried entirely by the 61% who asked for direct answers. Participants who asked for hints or clarifications performed the same as the no-AI control. Whether AI assistance impairs later performance depends on how the user prompts it.

https://benjaminhan.net/posts/20260528-ai-assistance-reduces-persistence/?utm_source=mastodon&utm_medium=social

#AI #Paper #Education #Evaluation #CognitiveScience

AI Assistance Reduces Persistence and Hurts Independent Performance – synesis

Three randomized controlled trials (N = 1,222) show that ten-minute AI-assisted sessions on math and reading tasks lower participants’ subsequent unaided solve rate and raise their give-up rate, with effects concentrated among users who prompt the AI for direct answers.

synesis

There are still open consultation appointments on evaluation in June! Our Julia Panzer and Vincent Schmid-Loertzer from the Impact Unit will advise you in 30-minute video calls on evaluating your #Wisskomm. The service is free of charge.

Book an appointment now in just a few clicks via the Impact Unit website:
https://impactunit.de/evaluationsberatung/

#Wissenschaft #Forschung #Evaluation

On whether LLMs can abstain effectively and whether chain-of-thought can help, two recent papers seem at odds on the surface. COLING 2025 finds prompted CoT raises abstention on instruct models. AbstentionBench (NeurIPS 2025) finds extending the reasoning budget lowers it on a trained reasoner. What gives?

https://benjaminhan.net/posts/20260527-prompted-vs-trained-cot-abstention/?utm_source=mastodon&utm_medium=social

#Metacognition #LLMs #Reasoning #Evaluation #AI

[Followup] Prompted vs. Trained Chain-of-Thought on Abstention: Reading Two Studies Together – synesis

Why prompted chain-of-thought raises abstention recall on instruct models in COLING 2025 but extending the reasoning budget on a trained reasoner lowers it in AbstentionBench, and three experiments that would clarify the picture.

synesis
California officials say potential crack on overheated chemical tank could lower risk
Orange County Fire Authority Capt. Wayhowe Huang said officials will be continuing to evaluate the tank on Sunday after emergency crews spotted the potential crack overnight. As of Sunday morning, he said it does not appear that any of the highly volatile chemicals in the tank have leaked.
https://www.cbc.ca/news/world/overheated-chemical-tank-california-crack-evacuations-9.7210539?cmp=rss

How we taught an AI agent to answer for its words: 10,000 messages, the Hungarian algorithm, and a bit of magic

This is Sergey Smirnov, AI engineer and founder of LLMStart.ru. Today we are tackling the sorest spot in AI agent development: how to prove that agents are actually getting smarter rather than just putting on a show. In the article I will show the inner workings of our evaluation system: how 10,000 real conversations became reference examples for tests; why standard metrics mercilessly failed our agent (and why we needed the Hungarian algorithm from 1955); and what to do when a metric drops simply because the AI turned out to be smarter than your outdated reference! Read the full breakdown with numbers, case studies, and candid failures…

https://habr.com/ru/companies/llmstart/articles/1038512/

#evaluation #метрики_качества #LLMагенты #Ragas #LangFuse #RAG #Венгерский_алгоритм #AIdriven_разработка #LangChain #langchain_агенты

How we taught an AI agent to answer for its words: 10,000 messages, the Hungarian algorithm, and a bit of magic

This is Sergey Smirnov, AI engineer and founder of LLMStart.ru. We build AI systems for business....

Habr
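
The post does not say how the Hungarian algorithm fits into the evaluation, so the following is only a sketch under an assumption: that it is used for optimal one-to-one matching, for example pairing each item an agent produced with the reference item it most plausibly corresponds to before scoring. The function name, the cost matrix, and the interpretation of its entries are all hypothetical. The implementation is the well-known O(n²m) potentials variant of the Hungarian algorithm in pure Python.

```python
def min_cost_assignment(cost):
    """Hungarian algorithm: assign each row to a distinct column at minimum total cost.

    `cost` is a list of n rows with m columns each, n <= m.
    Returns a list where entry i is the column assigned to row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (m + 1)    # column potentials (1-based)
    p = [0] * (m + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (m + 1)  # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Shift potentials so at least one new zero-reduced-cost edge appears.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Walk the augmenting path backwards and flip the matching.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


# Hypothetical costs: rows are predicted items, columns are reference items,
# and each entry is a dissimilarity score (lower means a better match).
cost = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
pairs = min_cost_assignment(cost)
total = sum(cost[i][j] for i, j in enumerate(pairs))
print(pairs, total)
```

Matching globally rather than greedily matters here: a greedy pass would grab the zero-cost pair in the second row first and end up with a higher total cost than the optimal assignment.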

Last week, the communications managers from all @NFDI consortia met in Mainz to exchange best practices and discuss joint campaigns. 👀

Jana Bendigs attended the meeting together with Inga Mohr and led a workshop on the #evaluation of #sciencecommunication.

Thank you for the pleasant atmosphere and the stimulating discussions, and thanks to @nfdi4culture, @NFDI4Memory and @NFDI4Chem for the excellent organization.

Given a problem queue and a token budget, can an LLM plan which to attempt, in what order, and how much to spend on each — before any execution feedback? TRIAGE tests 20 frontier and open-source LLMs. Most plan worse than random. Reasoning-trained modes systematically lose to standard ones. Even when shown its own per-problem budget, the best complier respects it on 37% of attempts.

https://benjaminhan.net/posts/20260523-triage-metacognitive-control/?utm_source=mastodon&utm_medium=social

#Paper #AI #LLMs #Metacognition #Evaluation #AgenticSystems

TRIAGE: Evaluating Prospective Metacognitive Control in LLMs Under Resource Constraints – synesis

A new benchmark scores frontier and open-source LLMs on whether they can plan token-budget allocation across a queue of problems before any execution feedback — and most cannot.

synesis