David Andersen Feb 25, 2025

Today's third slide.

Prompt: Write, in C++, a solution to the following problem: in the first 1,000,000 prime numbers, how many are divisible by 13?

Copilot: Sure!!!!!! Yay! I can has output a program!!!

...

ok so first: My dear students, if someone asked you to do this task, WHAT WOULD YOU SAY?

Right. "You're a fucking moron, they're prime."

Ok maybe not quite that language at work.

Second: LOOK AT THAT ISPRIME FUNCTION. Wtf.

(1) use a sieve, for the love of puppies

(2) Use a LIBRARY. primesieve is great. It's fast. It's efficient. It's already written and is correct and someone's already tested the snot out of it.

And storing all of those primes in a vector before testing them? Is this Microsoft's new approach to propping up the DRAM market?

I mean don't write that program in the first place because it's stupid and you could simply write return(1) if you were, for some reason, forced to write it, but if you're going to solve a _similar_ problem, don't add more freshman-level code to the universe.
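A minimal sketch of the sieve approach suggested above, applied to a similar counting problem. It streams over a sieve of Eratosthenes and counts matches as it goes, so no vector of primes is ever built. The function names are hypothetical, and sizing the sieve with Rosser's bound p_n < n(ln n + ln ln n) (valid for n >= 6) is an assumption made for illustration, not something from the slide. A tested library such as primesieve would still be the better choice in practice.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Upper bound on the n-th prime: p_n < n (ln n + ln ln n) for n >= 6.
// (Hypothetical helper; the bound is Rosser's theorem.)
std::uint64_t nth_prime_upper_bound(std::uint64_t n) {
    if (n < 6) return 13;  // the first five primes are 2, 3, 5, 7, 11
    const double x = static_cast<double>(n);
    return static_cast<std::uint64_t>(
        std::ceil(x * (std::log(x) + std::log(std::log(x)))));
}

// Counts how many of the first n primes are divisible by d.
// Primes are consumed as the sieve finds them; none are stored.
std::uint64_t count_divisible_in_first_primes(std::uint64_t n, std::uint64_t d) {
    assert(d != 0);
    const std::uint64_t limit = nth_prime_upper_bound(n);
    std::vector<bool> composite(limit + 1, false);  // one bit per candidate
    std::uint64_t seen = 0;
    std::uint64_t hits = 0;
    for (std::uint64_t i = 2; i <= limit && seen < n; ++i) {
        if (composite[i]) continue;  // i has a smaller prime factor
        ++seen;                      // i is the next prime
        if (i % d == 0) ++hits;
        // Cross off multiples of i, starting at i*i.
        for (std::uint64_t j = i * i; j <= limit; j += i) composite[j] = true;
    }
    return hits;
}
```

Calling `count_divisible_in_first_primes(1000000, 13)` returns 1, because 13 is the only prime divisible by 13. The bit-packed sieve for that call covers about 16.4 million candidates, roughly 2 MB.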

Wan Shen Lim Feb 26, 2025

@dave_andersen FWIW, Claude-3.5-Sonnet immediately recognizes that there's a trick: "I'll help write a solution to this problem. Note that this is actually a trick question - no prime number (except 13 itself) can be divisible by 13, as any number divisible by 13 would be composite by definition! However, I'll write a program that demonstrates this and verifies our reasoning".

No sieve, but their isPrime function only checks up to sqrt(n), skips even numbers with += 2, uses standard (n % i) == 0 testing. It ends with "A more efficient solution wouldn't need to compute anything - we could just return 1 as the answer since we know mathematically that 13 is the only prime that can be divisible by 13. However, I provided the computational solution to demonstrate the verification of this mathematical property".
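For reference, a trial-division check of the kind described above might look like the sketch below. This is a reconstruction from the description only (stop at sqrt(n), skip even candidates with += 2, test with n % i == 0), not Claude's actual output; the name `is_prime` and the overflow-safe loop condition are assumptions.

```cpp
#include <cstdint>

// Trial division: test odd divisors i while i*i <= n.
// constexpr so it can also be evaluated at compile time (C++14 or later).
constexpr bool is_prime(std::uint64_t n) {
    if (n < 2) return false;
    if (n < 4) return true;        // 2 and 3 are prime
    if (n % 2 == 0) return false;  // even numbers above 2 are composite
    // "i <= n / i" is the same test as i*i <= n, without overflow.
    for (std::uint64_t i = 3; i <= n / i; i += 2) {
        if (n % i == 0) return false;
    }
    return true;
}
```

Each call costs O(sqrt(n)) divisions, so testing every candidate this way does far more work than a sieve when many primes are needed.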

Additionally, following up with "can you use a library?" generates code that uses primesieve.

In general, I find that Claude-3.5-Sonnet (which is the default AI model for the Cursor IDE) outperforms Copilot significantly and is a better representation of current AI coding assistance capabilities. I've tried and given up on a lot of the predecessors over the years (e.g., ChatGPT web interface, VSCode + Copilot); they weren't worth using at the time and I became very skeptical of AI-generated code. But Cursor+Claude changed my mind, and I actually pay for that now (fixed cost of $20/month).

David Andersen

@capybara You have free access to Claude 3.5 through GitHub Copilot, which is free for academic use, if you want to save some money.

I also use Claude for my own coding. And I like it -- you'll note that my post a few back was about a positive use of LLMs. :)

I'm pretty sure I can find ways that Claude falls on its face that a human wouldn't, but I do agree with you that it's better than whatever's available in the base copilot version. The problem of writing freshman-level code still applies -- even though Claude did better (and recognized the logical trap), it still generated code that had no sieve and unnecessarily stored all of the primes in a vector. It's a _better_ student but it's still regurgitating student-level code.

Now, I admit, this isn't how I use LLMs -- I tell them exactly what to do and _how to do it_ and they work pretty well. It saves me a lot of googling for the syntax of specific libraries, etc.

But the thing I'm trying to get across to my students is that they need to already be good programmers to use LLMs well. Because used naively, all of the assistants right now are worse than, say, a junior CMU CS student.

Wan Shen Lim Feb 26, 2025

@dave_andersen TIL about Copilot including Claude. Thanks!

And fully agreed. I think I'm seeing an interesting split in student opinion on AI code assistance for 15-799's first project right now (there is extremely little public code on GitHub that demonstrates how to use Apache Calcite) - in office hours, I feel like half the students told me that it's really helpful for API discovery, and the other half told me that it just keeps generating garbage. Maybe we should run a poll once the project is over. :)