This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
I don't think the LLM was asked to check 10,000 files given these models' context windows. I suspect they went file by file too.
That's kind of the point - I think there are three scenarios here
a) this is just the first time an LLM has done such a thorough minesweeping
b) previous versions of Claude did not detect this bug (seems the least likely)
c) Anthropic have done this several times, but the false positive rate was so high that they never checked it properly
Between a) and c) I don't have a high confidence either way to be honest.
> But the entire value is that it can be automated. If you try to automate a small model to look for vulnerabilities over 10,000 files, it's going to say there are 9,500 vulns. Or none.
'Or none' is ruled out since it found the same vulnerability - I agree there's a question about the smaller model's precision, but barring further analysis '9,500' just feels like pure vibes on your part? Also (out of interest) did Anthropic post their false-positive rate?
The smaller model is clearly the more automatable one IMO if it has comparable precision, since it's just so much cheaper - you could even run it multiple times for consensus.
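That consensus idea is cheap to implement: run the small model several times and keep only the findings it flags repeatedly. A minimal sketch, assuming a hypothetical `scan()` callable that returns a set of flagged locations per run (none of these names come from Anthropic's setup):

```python
from collections import Counter

def consensus_findings(scan, runs=5, threshold=0.6):
    """Keep only findings flagged in at least `threshold` of `runs`
    independent scans by the cheaper model."""
    counts = Counter()
    for _ in range(runs):
        counts.update(set(scan()))  # de-dupe within a single run
    return {f for f, c in counts.items() if c / runs >= threshold}

# Toy example with a stubbed scanner: three noisy runs
results = iter([
    {"ipc.c:42", "net.c:7"},    # run 1
    {"ipc.c:42"},               # run 2
    {"ipc.c:42", "parse.c:3"},  # run 3
])
flagged = consensus_findings(lambda: next(results), runs=3, threshold=0.6)
# Only the repeatedly flagged finding survives the vote
```

With a 0.6 threshold over three runs, `ipc.c:42` (flagged 3/3) survives while the one-off flags are filtered out, which is exactly the false-positive suppression that would make the cheaper model usable at scale.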