I was able to use an extended conversation with an AI https://chatgpt.com/share/68ded9b1-37dc-800e-b04c-97095c70eb29 to help answer a MathOverflow question https://mathoverflow.net/questions/501066/is-the-least-common-multiple-sequence-textlcm1-2-dots-n-a-subset-of-t/501125#501125 . I had already conducted a theoretical analysis suggesting that the answer to this question was negative, but needed numerical parameters verifying certain inequalities in order to conclusively build a counterexample. Initially I asked the AI to supply Python code to search for a counterexample that I could run and adjust myself, but found that the run time was infeasible, and that the initial choice of parameters would have doomed the search to failure anyway. I then switched strategies and instead engaged in a step-by-step conversation with the AI, in which it performed heuristic calculations to locate feasible choices of parameters. Eventually, the AI produced parameters that I could verify separately (admittedly using Python code supplied by the same AI, but this was a simple 29-line program that I could visually inspect to confirm it did what was asked, and which also produced numerical values in line with the previous heuristic predictions).
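For illustration, here is a minimal sketch of the kind of verification script involved. The function names and the specific inequality checked are hypothetical stand-ins; the actual 29-line program checked the particular constraints produced by the heuristic analysis:

```python
from functools import reduce
from math import lcm

def lcm_upto(n):
    """Compute L(n) = lcm(1, 2, ..., n)."""
    return reduce(lcm, range(1, n + 1), 1)

def check_parameters(m, n, c):
    """Hypothetical constraint check: does L(m) exceed c * L(n)?
    (A stand-in for the real inequalities from the analysis.)"""
    return lcm_upto(m) > c * lcm_upto(n)

if __name__ == "__main__":
    print(lcm_upto(10))  # 2520 = 2^3 * 3^2 * 5 * 7
    print(check_parameters(20, 10, 1000))
```

The point is that such a script is short enough to audit by eye, so its output can serve as independent confirmation even when the code itself was AI-generated.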

Here, the AI tool use was a significant time saver: doing the same task unassisted would likely have required multiple hours of manual coding and debugging (the AI was able to use the provided context to spot several mathematical mistakes in my requests, and fix them before generating code). Indeed, I would have been very unlikely to even attempt this numerical search without AI assistance (and would have sought a theoretical asymptotic analysis instead).

ChatGPT - Conjecture disproving strategy

A conversational AI system that listens, learns, and challenges

ChatGPT
I encountered no issues with hallucinations or other AI-generated nonsense. I think the reason for this is that I already had a pretty good idea of what tedious computational tasks needed to be performed, and could explain them in detail to the AI in a step-by-step fashion, with each step confirmed in conversation with the AI before moving on to the next. After switching strategies to the conversational approach, external validation with Python was only used at the very end, when the AI generated numerical outputs that it claimed obeyed the required constraints (which they did).
@tao I agree that the step-by-step strategy usually works
@tao that said recent chain of thought work has made hallucination a lot better. I’m still curious what sorts of solutions will come from the neuro symbolic approach and how much incremental value these approaches will have.
@mchav When you say that hallucination is a lot better now what do you mean? That it is less frequent? That it is more obviously wrong so it is easier to spot? That it is as frequent as before but somehow less incorrect?
@oantolin @mchav As a rough analogy, without chain-of-thought the model is blurting out the first thing that comes to mind. With chain-of-thought, the model will think about the options, try different solutions, double-check its work, etc. And testing shows that the more it thinks, the more likely it is to get the right answer.
@erjiang So, of the various options I described you claim hallucinations are less frequent?
@oantolin Yes, and on chatgpt the reasoning models tend to search more which further reduces hallucinations. Although I’m not sure how you are defining hallucinations vs incorrectness.
@erjiang I don't know how to define "hallucinations" either, I use that word because I see other people use it. I don't know if people who use it actually know what they mean by it or not. Let's say for purposes of this discussion I meant "incorrectness" the entire time, which is the thing I actually care about.
@oantolin Ok yeah, so if we’re talking about the domain of mathematics, then the model’s probability of getting the answer right scales with the amount of thinking (aka “test-time compute”), though non-linearly and only up to a certain point. Since GPT-5 Pro spends a lot more compute than GPT-5 Thinking, it’s much less likely to give e.g. an incorrect derivation or result. Obviously this doesn’t mean infinite compute will solve a Millennium Problem though.
@oantolin If you don’t have a Pro subscription and you have any problems that you feel are a bit out of reach of GPT-5, feel free to send me a prompt and I’ll run it through GPT-5 Pro and see how it fares.

@tao I see this in Software Development too.

I prefer getting the specifications in a step by step conversation, review at each step and the final specs, and then finally generate the code and test it.

People have begun calling it Spec-Driven Development.

@tao do you enjoy this process?
@tao ChatGPT, deepSeek and grok can confirm that arXiv on sieves is observably consistent w Cantor's 1+1 diagonal sister=arguments (zero origin stories, base-ten ZOS) left base-9423 (general origin stories, GOS) where a person's "everything" is her chirality + "everything else" is atoms.⚛️ Reading c=√(E/mass) means others have 1+1 grandmas where quadratic scaling hallucinations are "-1" using [9,4,2,3] "integerity"—where Pi=expand π=extend Π=resist; Triplet N=(Spiral N×24)–25≠±9424Pi≠3x3141Pi🚦:.👈
@tao Yeah the key was that you already had a very fast/good way to verify membership of the solution set. It's the ideal problem for AI use.

@tchauhan @tao indeed, a classic one-way problem, no?

Quick to verify but in practice impossible to explore the vast problem space without the expert knowledge to assist.

@tao

featured in [de-DE] https://www.derstandard.at/story/3100000290748/der-weltbeste-mathematiker-fragte-fuer-ein-ungeloestes-problem-erfolgreich-chatgpt via Startpage

Thank you for posting it in an open, non-plagiarism protocol.

The world's best mathematician successfully asked ChatGPT about an unsolved problem

Using precise questions to the AI, Tao was able to confirm a hypothesis from number theory. Math enthusiasts on online platforms also helped

DER STANDARD
@tao ChatGPT is a great way to unlearn any subject & unteach yourself to think about problems!

@tao @jackemled Okay, I don’t know how to put this, but the person you’re replying to literally has a Fields Medal. I’m sure that your concern is an issue with some folks (e.g. grade school students), but I wouldn’t worry about Terence Tao unlearning and unteaching himself to think about problems.

He’s a pure mathematician, the problems themselves are his interest.

@kaleidosium I know that. Relying on chatgpt for everything is unhealthy for yourself* & for the world**, it doesn't matter if you have a formal education in the subject you're asking it about or not. "Use it or lose it" applies to everyone regardless of experience, & your level of education does not somehow make it less harmful to yourself or others.

I don't care if he unteaches himself anything, that's his choice, but I think we should not be glorifying a technology that is fundamentally abusive towards people. I left this comment to call out use of LLMs for what they are, a device marketed for offloading thought onto even though they do not think, nothing else. My intent isn't to debate this, just to explain my reasoning, & I will not continue.

*"Use it or lose it", false information, & it's basically gambling.
**They waste insanely large amounts of energy & create massive amounts of pollution, & they do it without even giving a single cent to the people whose work was stolen to train them.

@jackemled I’m not sure how he’s “relying on ChatGPT for everything”. I understand your environmental and data concerns, and I do agree with your assessment that yes, LLMs are notoriously limited in what they can do (every AI researcher worth their salt agrees with this).

But this is just him mentioning that he used it to help solve one problem; he isn’t blindly glorifying the technology or claiming it’s a god or whatever. I find his tone to be more skeptical of the technology in general, actually.

Your reasoning isn’t sound, and I’ll leave it at that.

@kaleidosium I said I'm not continuing. Go debate someone else.
@jackemled @tao And, if you are an expert in the topic of discussion, a potential way to save a few hours of your time to use for other things! (For this example, teaching people how to better use ChatGPT.)
@drScott @tao No, just teach them to use a search engine & fact check. It takes less time & is more reliable for them & for you. Better for the environment & cheaper for them as well. Boolean searches & fuzzy searches have existed for decades. Google has both available for free.

@jackemled We all get to choose how to spend those saved few hours =)

(I don't understand what your 'no' is refuting, my best guess is that it refers to my description of what Tao did with his few hours saved.)

@drScott The "no" was to "teach them to use a LLM". It takes much longer to learn how to use one of those & they never give reliable results no matter how "good" you are at it. It's like gambling; you feel like you're getting better at it, but the chances of a good outcome never actually change.

For boolean searches, you do get better at it; you learn how to filter out unwanted search results. For example, you might search how to program a videogame AI -"chatgpt" -"llm" ~"java" ~"python" to find a guide on making a decision tree or something for a videogame character, in Java or Python, without results being polluted by SEO LLM slop. Google even has a search builder tool so you don't need to learn this in order to make good use of it too.

@drScott We do all get to choose how we spend the time we save with our tools, but in my experience (and literally everyone else's, even people that thought it helped at first), LLMs waste more time than they save. Sure, there's a 1% chance it gives you exactly what you need immediately, but the 99% of the time it makes you pull the slot machine lever again vastly outweighs that.
@tao does it really prove anything
@tao Thanks for sharing your experiences! Do you find that the customization options are useful for the chat interface; is there a particular set of “custom instructions” that you find work better? And does the “workspace” functionality that it's talking about have any impact on quality?
@tao Have you tried the GPT-5 Pro model, and if so, how does it compare? In theory it should give consistently better / more correct results at the expense of longer wait time.

@tao Yes, if you know what you are doing and what the expected result should be, and invest the time to explain the steps to a very keen, very fast, very convincing but extremely stupid intern, it can be a great tool.

Unfortunately, that's not how it's sold. Or generally used.

@tao should we add AI as a coauthor or at least mention in acknowledgements? Say, if AI gave you an idea or approach you knew but didn't think of.
@tao This was GPT-5 Auto? Do you notice any significant advantage with the $200/month GPT-5 Pro, which is supposed to think more?