I was able to use an extended conversation with an AI https://chatgpt.com/share/68ded9b1-37dc-800e-b04c-97095c70eb29 to help answer a MathOverflow question https://mathoverflow.net/questions/501066/is-the-least-common-multiple-sequence-textlcm1-2-dots-n-a-subset-of-t/501125#501125 . I had already conducted a theoretical analysis suggesting that the answer to this question was negative, but needed some numerical parameters verifying certain inequalities in order to conclusively build a counterexample. Initially I asked the AI to supply Python code to search for a counterexample that I could run and adjust myself, but found that the run time was infeasible and that the initial choice of parameters would have doomed the search to failure anyway. I then switched strategies and instead engaged in a step-by-step conversation with the AI in which it performed heuristic calculations to locate feasible choices of parameters. Eventually, the AI was able to produce parameters which I could then verify separately (admittedly using Python code supplied by the same AI, but this was a simple 29-line program that I could visually inspect to confirm that it did what was asked, and it also produced numerical values in line with previous heuristic predictions).

Here, the AI tool use was a significant time saver: doing the same task unassisted would likely have required multiple hours of manual coding and debugging (the AI was able to use the provided context to spot several mathematical mistakes in my requests, and fix them before generating code). Indeed, I would have been very unlikely to even attempt this numerical search without AI assistance (and would have sought a theoretical asymptotic analysis instead).
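
As a rough illustration of the "propose, then verify separately" workflow described above, here is a minimal Python sketch. It is not the 29-line program from the conversation, and the constraints are hypothetical placeholders rather than the actual inequalities; only lcm(1, 2, ..., n) is taken from the question title. The names `lcm_up_to`, `check_candidate` and `constraints` are invented for this example, and `math.lcm` assumes Python 3.9 or later.

```python
from fractions import Fraction
from math import lcm  # available in Python 3.9+


def lcm_up_to(n: int) -> int:
    """Return lcm(1, 2, ..., n) using exact integer arithmetic."""
    result = 1
    for k in range(2, n + 1):
        result = lcm(result, k)
    return result


def check_candidate(params: dict, constraints: dict) -> dict:
    """Evaluate every named constraint on the candidate and record pass/fail."""
    return {name: bool(test(params)) for name, test in constraints.items()}


# Hypothetical constraints, for illustration only.
# They are NOT the inequalities used in the actual counterexample.
constraints = {
    "n_is_positive": lambda p: p["n"] > 0,
    "divisible_by_all_k": lambda p: all(
        lcm_up_to(p["n"]) % k == 0 for k in range(1, p["n"] + 1)
    ),
    # Exact rational comparison avoids floating-point rounding.
    "growth_ratio_at_most_n": lambda p: Fraction(
        lcm_up_to(p["n"]), lcm_up_to(p["n"] - 1)
    ) <= p["n"],
}

candidate = {"n": 10}  # stands in for parameters proposed by the AI
report = check_candidate(candidate, constraints)
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'FAILED'}")
```

The point of the design is that the checker is short enough to read in full and uses only exact arithmetic, so trusting the result does not require trusting whatever process produced the candidate.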

ChatGPT - Conjecture disproving strategy

A conversational AI system that listens, learns, and challenges

ChatGPT
I encountered no issues with hallucinations or other AI-generated nonsense. I think the reason for this is that I already had a pretty good idea of the tedious computational tasks that needed to be performed, and could explain them in detail to the AI in a step-by-step fashion, with each step confirmed in conversation with the AI before moving on to the next. After switching strategies to the conversational approach, external validation with Python was only used at the very end, when the AI was able to generate numerical outputs that it claimed obeyed the required constraints (which they did).
@tao I agree that the step-by-step strategy works most of the time
@tao that said, recent chain-of-thought work has made hallucination a lot better. I’m still curious what sorts of solutions will come from the neuro-symbolic approach and how much incremental value these approaches will have.
@mchav When you say that hallucination is a lot better now what do you mean? That it is less frequent? That it is more obviously wrong so it is easier to spot? That it is as frequent as before but somehow less incorrect?
@oantolin @mchav As a rough analogy, without chain-of-thought the model is blurting out the first thing that comes to mind. With chain-of-thought, the model will think about the options, try different solutions, double-check its work, etc. And testing shows that the more it thinks, the more likely it is to get the right answer.
@erjiang So, of the various options I described you claim hallucinations are less frequent?
@oantolin Yes, and on ChatGPT the reasoning models tend to search more, which further reduces hallucinations. Although I’m not sure how you are defining hallucinations vs incorrectness.
@erjiang I don't know how to define "hallucinations" either, I use that word because I see other people use it. I don't know if people who use it actually know what they mean by it or not. Let's say for purposes of this discussion I meant "incorrectness" the entire time, which is the thing I actually care about.
@oantolin Ok yeah, so if we’re talking about the domain of mathematics, then the model’s probability of getting the answer right scales with the amount of thinking (aka “test-time compute”), though non-linearly and only up to a certain point. Since GPT-5 Pro spends a lot more compute than GPT-5 Thinking, it’s much less likely to give e.g. an incorrect derivation or result. Obviously this doesn’t mean infinite compute will solve a Millennium Problem though.
@oantolin If you don’t have a Pro subscription and you have any problems that you feel are a bit out of reach of GPT-5, feel free to send me a prompt and I’ll run it through GPT-5 Pro and see how it fares.

@tao I see this in Software Development too.

I prefer working out the specifications in a step-by-step conversation, reviewing each step and then the final specs, and only then generating the code and testing it.

People have begun calling this Spec-Driven Development.

@tao do you enjoy this process?
@tao ChatGPT, deepSeek and grok can confirm that arXiv on sieves is observably consistent w Cantor's 1+1 diagonal sister=arguments (zero origin stories, base-ten ZOS) left base-9423 (general origin stories, GOS) where a person's "everything" is her chirality + "everything else" is atoms.⚛️ Reading c=√(E/mass) means others have 1+1 grandmas where quadratic scaling hallucinations are "-1" using [9,4,2,3] "integerity"—where Pi=expand π=extend Π=resist; Triplet N=(Spiral N×24)–25≠±9424Pi≠3x3141Pi🚦:.👈
@tao Yeah the key was that you already had a very fast/good way to verify membership of the solution set. It's the ideal problem for AI use.

@tchauhan @tao indeed, one-way problem sets, much?

Quick to verify but in practice impossible to explore the vast problem space without the expert knowledge to assist.

@tao

/ featured in [de-DE]
 https://www.derstandard.at/story/3100000290748/der-weltbeste-mathematiker-fragte-fuer-ein-ungeloestes-problem-erfolgreich-chatgpt via startpage

Thank you for posting it in an open, non-plagiarism protocol.

The world's best mathematician successfully asked ChatGPT about an unsolved problem

With the help of precise questions to the AI, Tao was able to confirm a hypothesis from number theory. Math enthusiasts on online platforms also helped

DER STANDARD