GPT-5-Pro scored 90% on the 2025 Miklós Schweitzer competition, beating expectations from Metaculus. An impressive result that marks another step forward for artificial intelligence. #GPT5Pro #AI #Metaculus #ArtificialIntelligence
GPT-5-pro may be a universal agentic gateway / Large Agent Model. Indicators suggest gpt-5-pro is a large agent model. #GPT5pro #LargeAgentModel #ArtificialIntelligence #AI #UniversalAgenticGateway
https://www.reddit.com/r/LocalLLaMA/comments/1oz6msr/gpt5pro_is_likely_a_universal_agentic_gateway/
Has there been any improvement in AI detectors over the last 12 months? A GPT-5 Pro literature review
I ran this report to help me explore whether my 2023/24 arguments from Generative AI for Academics still hold. Sharing it here because others might find it useful.
TL;DR: Over the last 12 months, the centre of gravity has moved further away from “catch‑and‑punish” AI detection toward assessment redesign, process evidence, and transparency. Regulators (e.g., TEQSA in Australia) now say reliable detection “is all but impossible,” several universities have disabled AI detection features, and independent guidance for instructors argues detectors don’t work well enough to be relied on. Vendors keep publishing high headline accuracy numbers, and there’s active research on watermarking and authorship verification—but none of this has translated into a dependable, classroom‑safe detector for typical student writing.
From the “assessment panic” to a new normal
In the book we described the 2023–24 “great assessment panic”—a rush to outsource academic judgment to detectors in hopes of restoring order overnight. The last year shows that order won’t be restored by a score. What is emerging instead is a culture shift: instructors accept that students will use GenAI, and programmes re-emphasise authentic tasks, process artefacts, and viva/oral components to evidence learning.
That turn is also consistent with your broader argument about not over‑automating interpersonal judgment: even where automation looks tempting, we should beware brittle tools that offload risk onto students and staff. Detection has become the latest test case for that etiquette.
What changed in 2024–25
1) Policy and regulator signals hardened
2) Universities kept switching off or downgrading detectors
3) Vendors continued strong claims; independent guidance stayed sceptical
4) The equity problem didn’t go away
Has AI detection gotten more reliable?
Short answer: not in the way that matters for day‑to‑day teaching.
Bottom line: There’s no compelling evidence of a step‑change in detector reliability over the last year for typical coursework (short, hybrid, multi‑draft, multilingual). What has changed is policy clarity: “use with caution—never as sole evidence.”
What’s promising (but not there yet): provenance & watermarking
If there’s progress, it’s more on provenance than on retroactive text detection.
Implication: Expect forward‑looking provenance (label at creation) to matter more than after‑the‑fact detection. That helps in journalism and platform governance; it’s much less helpful when grading a student draft pasted from an unknown source.
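One way to see why creation-time labelling is more tractable: statistical watermarking schemes in the research literature (e.g., the “green-list” approach) bias generation toward a pseudorandom subset of tokens, so anyone holding the key can run a simple hypothesis test afterwards. A toy Python sketch of that test; the hashing scheme, green fraction, and threshold here are illustrative, not any vendor’s actual implementation:

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # gamma: expected green-token rate in unwatermarked text

def is_green(prev_token: str, token: str) -> bool:
    """Pseudorandom green/red split keyed on the previous token
    (illustrative stand-in for a keyed watermarking PRF)."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def watermark_z_score(tokens: list[str]) -> float:
    """z-score of the observed green-token count against the null
    hypothesis (no watermark => green rate ~= GREEN_FRACTION)."""
    n = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (greens - expected) / std

# Unwatermarked text should hover near z ~ 0; a watermarked generation,
# which deliberately oversamples green tokens, pushes z far above that.
print(watermark_z_score("the quick brown fox jumps over the lazy dog".split()))
```

The point is the asymmetry: with a key and a label applied at creation, detection is a clean statistical test; for arbitrary text of unknown origin there is no such test to run, which is why retroactive detection keeps disappointing.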
Where detectors might fit (narrow, safeguarded use)
If your institution still exposes an AI score, keep it in a triage role only: a prompt for human review, never evidence of misconduct on its own.
This aligns with your book’s argument to use GAI as a meta‑collaborator for process (drafts, notes, meeting digests) rather than as a shortcut to verdicts. Building a traceable writing process—version histories, short oral defenses, design logs—yields better authorship evidence than any detector.
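To make that concrete, here is a minimal Python sketch of a triage rule in the spirit described above; the fields, thresholds, and outcomes are all hypothetical, and any real policy would be set by your institution:

```python
from dataclasses import dataclass

@dataclass
class Submission:
    detector_score: float      # vendor's "AI-likelihood", 0..1 (noisy; unreliable alone)
    has_version_history: bool  # process evidence: drafts, edit trail, design logs
    passed_oral_check: bool    # short viva / authorship conversation

def triage(sub: Submission) -> str:
    """Use the detector score only to prioritise human attention,
    never as evidence of misconduct on its own."""
    if sub.detector_score < 0.8:  # hypothetical triage threshold
        return "no action"
    if sub.has_version_history or sub.passed_oral_check:
        return "no action: process evidence outweighs the score"
    return "invite a conversation about process (not an accusation)"
```

Note the design choice: process evidence always overrides the score, so the detector can only ever add work for staff, never subtract due process for students.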
What universities actually did instead (2024–25)
Why the consensus calls detection a “dead end” (for now)
A practical playbook (what we recommend now)
What to watch next
Bottom line
The weight of evidence this year backs your intuition: AI detection has not become reliably trustworthy for routine academic integrity decisions. The consensus hasn’t just declared it a dead end—it has pivoted toward making assessment resilient to AI rather than attempting to police it perfectly. That shift fits the values you map throughout the project: keep the human at the centre, build better processes, and treat GAI as a tool for learning—not a trap for students.
References & further reading (selection)
Dev Day surprises from OpenAI! 🚀 GPT-5 Pro, Sora 2, and gpt-realtime mini models are coming to the API. A new era is beginning for developers in the AI world. Innovation is about to accelerate!
🚩 #OpenAI #ArtificialIntelligence #GPT5Pro #Sora2 #AITechnology #Developer
TechCrunch: OpenAI ramps up developer push with more powerful models in its API. “OpenAI unveiled new API updates at its Dev Day on Monday, introducing GPT-5 Pro, its latest language model, its new video generation model Sora 2, and a smaller, cheaper voice model.”
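For developers, access goes through the standard OpenAI Python SDK. A minimal sketch, assuming the API model identifier matches the announced name ("gpt-5-pro"); verify what your account can actually see before relying on it:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "gpt-5-pro" is the name from the Dev Day coverage above; confirm the
# exact identifier via client.models.list() before depending on it.
response = client.responses.create(
    model="gpt-5-pro",
    input="In three bullets, when is Pro worth the extra cost over GPT-5?",
)
print(response.output_text)
```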
GPT-5 vs GPT-5 Pro Differences
GPT-5 and GPT-5 Pro differ mainly in accuracy, speed, and cost. Pro delivers deeper reasoning with internet access.
https://www.olamnews.com/technology/ai/1482/gpt-5-vs-gpt-5-pro-differences/
Claim: GPT-5-pro can prove new interesting mathematics
https://twitter.com/SebastienBubeck/status/1958198661139009862
Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked that the proof is correct. Details below.
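For readers who want the shape of the claim: the linked thread concerned gradient descent on smooth convex functions, where the open question was the range of step sizes for which the paper's guarantee holds. A hedged LaTeX sketch of that setup; the exact property and constants are not stated here and should be checked against the linked tweet and paper:

```latex
% Setup (as discussed around the linked thread): f convex and L-smooth,
% gradient descent with a fixed step size \eta. The paper reportedly
% established its guarantee for \eta \le 1/L; the claim is that
% gpt-5-pro's proof extends the admissible range to a strictly larger
% constant c > 1, i.e. "a better bound than what is in the paper".
\[
  x_{k+1} = x_k - \eta\,\nabla f(x_k), \qquad
  \text{guarantee holds for } \eta \le \frac{c}{L}, \; c > 1 .
\]
```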