Mark Crowley

@TaliaRinger @tao totally agree, perfectly said
@tao so great

It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance one gives the tool, and how one reports the results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over two days, each contestant is given four and a half hours per day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) is permitted during this period, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.

The IMO is widely regarded as a highly selective measure of mathematical achievement, and it is a significant accomplishment for a high school student to score well enough to receive a medal, particularly a gold medal or a perfect score; this year the threshold for gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an "honorable mention". (1/3)

@tao I think there is a broader problem wherein competitions (math, programming, games, whatever) are meant to measure something difficult for humans, but tools work so fundamentally differently from us that success for a tool isn't even necessarily meaningful. AI companies have long viewed the IMO Grand Challenge as a sign of achieving "AGI," but no matter what set of rules a machine follows, there's no reason to believe success for a machine will correlate with broader mathematical or "reasoning" abilities in the way it does for human participants.
Testing the new Mastodon to BlueSky bridge. This should let my posts here on Sigmoid.social show up on BlueSky. I'm not planning to do the reverse yet; we'll see how it goes.
@emtiyaz oh good, it had been such a slow news week, I was gonna get doom withdrawal (sarcasm)
@[email protected] Talking to myself...
@[email protected] [spider mans pointing at each other meme]

@donni Daa-da-da-demm, DAA-da-da-DEMM!

love that song, but yah, now that you mention it...

We have two open post-doc positions. You don't have to be a Bayesian, but you should be interested in working at the intersection of DL, Bayes, and optimization.

https://www.riken.jp/en/careers/researchers/20240917_2/index.html

Interest in understanding deep learning and continual lifelong learning is a plus!

Seeking a Research Scientist or a Postdoctoral Researcher at Approximate Bayesian Inference Team (W24162) | RIKEN