Large Language Model Performance Doubles Every 7 Months
*with 50 percent reliability.
Heck of an asterisk on this claim.
That sounds like a coin flip, but 50% reliability can be really useful.
If a model has a 50% chance of completing a task that would cost me an hour - and I can easily check it was completed correctly - then on average I’m saving half of the time it would take me to complete it myself.
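To make that arithmetic concrete, here is a minimal sketch. The function name, the `check_minutes` parameter, and the assumption that a failed attempt gets redone by hand from scratch are mine for illustration, not something from the article.

```python
def expected_minutes_saved(task_minutes, success_prob, check_minutes=0.0):
    """Expected time saved per task compared with doing it by hand.

    Assumptions (hypothetical, for illustration only):
    - every model attempt is checked, costing check_minutes
    - a failed attempt is redone manually, costing task_minutes
    """
    # Time spent when using the model: always check, redo by hand on failure.
    expected_with_model = check_minutes + (1.0 - success_prob) * task_minutes
    return task_minutes - expected_with_model


if __name__ == "__main__":
    print(expected_minutes_saved(60, 0.5))      # checking is free
    print(expected_minutes_saved(60, 0.5, 10))  # checking takes 10 minutes
```

Under these assumptions the saving is `success_prob * task_minutes - check_minutes`: 30 minutes on a one-hour task if checking is free, 20 minutes if a check takes 10, and nothing once a check takes 30.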
That said, exponentials don’t exist in the real world, we’re just seeing the middle of a sigmoid curve, which will soon yield diminishing returns.
Yes, but the tricky thing is we have no idea when the seemingly exponential growth will flip over into the plateauing phase. We could be there already or it could be another 30 years.
For comparison, Moore’s law is almost certainly a sigmoid too, but we’ve been seeing exponential growth for 50 years now.
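A rough sketch of why the flip is so hard to spot from inside the curve: well before its midpoint, a logistic (sigmoid) curve is numerically almost identical to a plain exponential. The parameter names and default values below are illustrative assumptions, not fitted to any real data.

```python
import math


def logistic(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """Standard logistic curve that saturates at `ceiling`."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def early_exponential(t, ceiling=1.0, rate=1.0, midpoint=0.0):
    """The exponential that the logistic curve approaches for t far below midpoint."""
    return ceiling * math.exp(rate * (t - midpoint))


def relative_gap(t, **params):
    """How far the exponential overshoots the logistic, as a fraction of the logistic."""
    s = logistic(t, **params)
    e = early_exponential(t, **params)
    return (e - s) / s


if __name__ == "__main__":
    for t in (-5, -3, -1, 0):
        print(t, relative_gap(t))
```

With these defaults the two curves differ by under 1% at `t = -5`, while at the midpoint (`t = 0`) the exponential is already double the logistic. Early data points alone say very little about where the ceiling is.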
and I can easily check it was completed correctly
Can you always though?
My son has doubled in size every month for the last few months. At this rate he’ll be fifty foot tall by the time he’s seven years old.
Yeah, it’s a stupid claim to make on the face of it. It also ignores practical realities. The first of those is training data, and the second is context windows. For an AI to successfully write a novel or code a large-scale piece of software like a video game, it would need to be able to hold that entire thing in its context window at once. Context windows are strongly tied to hardware usage, so scaling them to the point where they’re big enough for an entire novel may not ever be feasible (at least from a cost/benefit perspective).
I think there’s also the issue of how you define “success” for the purpose of a study like this. The article claims that AI may one day write a novel, but how do you define “successfully” writing a novel? Is the goal here that one day we’ll have a machine that can produce algorithmically mediocre works of art? What’s the value in that?
Yup, that’s what I was alluding to. While it may no longer be the case for transistors, they did take 50-odd years to get there. Push that trend line from the figure out 50 years, heh (not saying you should; 5 seems much more conservative).
Take a look at Nvidia’s pace wrt Moore’s law (of FLOPS) netrouting.com/nvidia-surpassing-moores-law-gpu-i…
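For a sense of scale, here is a quick sketch of what a fixed 7-month doubling time implies if you simply extend the trend line. It assumes the doubling time from the headline stays constant, which is exactly the assumption under debate here; the function name is made up for illustration.

```python
def growth_factor(months, doubling_months=7.0):
    """Multiplicative growth after `months`, assuming a constant doubling time."""
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    print(f"5 years:  {growth_factor(5 * 12):.3g}x")
    print(f"50 years: {growth_factor(50 * 12):.3g}x")
```

Five years works out to a few hundred times the current task length; fifty years gives a factor above 10^25, which is why the 5-year extrapolation is the more defensible one.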
This. It’s the old “to the moon” mentality.
If my 2yo continues to grow at the current rate, we’ll have to buy new doors soon because at age 10 the kid will be the tallest person on Earth.
Is it just me, or is this graph (first graph in the article) completely unintelligible?
The X-axis being time is self-explanatory, but the Y-axis is somehow exponential time but then also mapping random milestones of performance, meaning those milestones are hard-linked to that time-based Y-axis? Then individual models are mapped onto it, indicating that the Y-axis is some abstract performance metric, but none of it makes sense in relation to the property of time? What?
I saw something once that explained how you can have an ai trained on a set of soccer games and have it generate soccer games as a use for it.
The idea is that the model has compressed all the soccer games into a smaller data size form than the total of having let’s say 100+ games on video or whatever.
That’s the real utility I see in generative ai that I know can keep going basically as long as we want to.
That graph is hilarious. Enormous error bars, totally arbitrary quantization of complexity, and its title? “Task time for a human that an AI model completes with a 50 percent success rate”. 50 percent success is useless, lmao.
On a more sober note, I’m very disappointed that IEEE is publishing this kind of trash.
Deepseek-r1:1.5b
Thinking… Alright, so I’m trying to figure out how many 'r’s are in the word “strawberry.” Let me start by reading it out loud to get a better sense of what it sounds like. S-T-R-A-W-B-E-R-R-Y Okay, let’s break this down letter by letter. The first letter is S, then T, R, A, W, B, E, and two R’s at the end. So far, I’ve got one ‘R’ in the third position after S-T- and before A-W-B-E-R-R-Y. Wait a minute, there are two R’s in “strawberry.” So that means there are two instances of the letter ‘r’ in the word. Is that all? Let me double-check to make sure I’m not missing any other 'r’s. Scanning through each letter again: 1. S - No 2. T - No 3. R - Yes, one R here. 4. A - No 5. W - No 6. B - No 7. E - No 8. R - Second R 9. R - Third R Wait, am I counting correctly? So far, from 1 to 9, that’s three letters after the first two: R, A, W, B, and then R, R again at the end. Hold on, maybe I should visualize it like this: S T R A W B E R R Y So positions: 1. S 2. T 3. R (position 3) 4. A 5. W 6. B 7. E 8. R (position 8) 9. R (position 9) And then the word ends with ‘Y.’ Hmm, so only two 'R’s in total? Let me confirm: Looking back: positions 3 and 8 and 9. Wait, that makes three 'R’s. Did I miss one? Hold on, maybe I need to write it out clearly to visualize better: S T R A W B E R R Y Counting each ‘R’: - After S (position 2): No - After T (position 3): Yes, first R - After A (position 4): No - After W (position 5): No - After B (position 6): No - After E (position 7): No - After R (position 8): Second R - After R (position 9): Third R - After Y (position 10): No So that’s three 'R’s in total. Did I count right? Maybe I should write it down step by step to ensure accuracy. Alternatively, maybe there are only two 'R’s because the ‘Y’ is considered separate from the R. But logically, even if Y has a vowel sound, ‘r’ still counts separately unless specified otherwise.
Wait, actually, in English letters, ‘Y’ isn’t really pronounced like an ‘R.’ It sounds more like a consonant. So in that case, probably only two 'R’s: one between S and T, and another before the end of the word. But I’m still confused because when I count, I sometimes make mistakes. Maybe using an alphabet list or a counting tool would help. Let me write down each letter with its position: 1. S 2. T 3. R (count 1) 4. A 5. W 6. B 7. E 8. R (count 2) 9. R (count 3) 10. Y So that’s three 'R’s in total. Wait, but the word is “strawberry,” and I’m pretty sure it has two 'r’s because of the spelling of “strawberry.” Maybe the counting matches that. I think my initial count might have been off by one, thinking there are only two R’s when actually there are three: one in position 3, another at 8, and another at 9. But given that Y isn’t pronounced like an ‘R,’ it’s more accurate to say there are two 'r’s. Hmm, maybe I need to consider the pronunciation of each letter as well. Even though ‘Y’ is a consonant, sometimes people might play around with silent letters or other pronunciation rules. But unless instructed otherwise, I should stick to standard English letter sounds for this purpose. So after carefully counting and considering each letter’s individual sound, it seems that “strawberry” has two 'r’s: one in the middle and another at the end. …done thinking.
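For reference, the count the model keeps going back and forth on is easy to check mechanically. A minimal sketch (the helper name is made up for illustration):

```python
def letter_positions(word, letter):
    """Return the 1-based positions at which `letter` occurs in `word` (case-insensitive)."""
    target = letter.lower()
    return [i for i, ch in enumerate(word.lower(), start=1) if ch == target]


if __name__ == "__main__":
    positions = letter_positions("strawberry", "r")
    print(positions, len(positions))  # [3, 8, 9] 3
```

So the positions the model listed mid-way (3, 8 and 9) were right, and the final answer of two is wrong.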
Reminded me of this