Good insight. I’ve arrived at the same conclusion (that LLMs are better at solving known problems than new ones) just by messing with AI for my personal projects.
Sometimes I actually lean into it: I prompt Claude to “prioritize selecting a solution that would be preferred by 99% of world class developers.”
Prompt Engineering is not as important as it was in the GPT 3.5 days, but it still helps to narrow the window that the LLM calculates probability from. And narrowing it to the middle of the bell curve, where the most “battle tested” solutions live, seems to be especially effective.
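In case it's useful, here is roughly how I wire that directive in: a minimal sketch using the Anthropic Python SDK. The system-prompt wording and the model name are just placeholders, not anything official.

```python
import anthropic  # pip install anthropic

# Placeholder wording for the "mainstream solution" nudge described above.
SYSTEM = (
    "When several approaches are possible, prioritize the solution that would be "
    "preferred by 99% of world-class developers: the common, battle-tested pattern, "
    "not a clever or obscure one."
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; use whichever model you have access to
    max_tokens=1024,
    system=SYSTEM,
    messages=[{"role": "user", "content": "Write a Python function that parses a CSV file."}],
)
print(response.content[0].text)
```

The same idea works as a plain instruction pasted at the top of a chat; the SDK call is only there to show where the directive goes.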
Thanks!
> And narrowing it to the middle of the bell curve, where the most “battle tested” solutions live, seems to be especially effective.
Yes, being productive with AI means not fighting against what it naturally does. It is just problematic because we never really know where the slope changes on the bell curve. The most we can assume is that, for something relatively well known, the AI's output will likely be more reliable.
Most of those LLMs will write you a perfectly working Zig implementation if you give them the context from the Zig reference manual, meaning you download it from the official source and then attach it to your prompt. I do this with Pinescript 6 and it works perfectly fine with any long-context LLM. I'm not trying to defend AI or anything like that, but sometimes you just need to add a little bit of extra knowledge when it comes to newer stuff, and it'll make fewer mistakes because it's referencing documents that are right there in its context window as opposed to the huge ball of information stored in its training data.
Yes, that sometimes works, but not always. Grok's agents will go and search for the information they need to complete a task, but they still failed this one.
In this case, I just now tried supplying the reference manual to Grok, Gemini, and Claude. They still failed. Most hallucinate that Zig has XML parsing built in. They also tend to make a lot of syntax errors. Some of these are likely due to Zig's changing specification. The models don't do well at understanding library versioning: if there is a lot of older code in a particular pattern, they will often produce that pattern even when you tell them to use the latest version, etc.
Now, it is not that I can't get it to work. I did eventually get there with Zig, without attaching the manual, but it took several rounds of explaining that there is no built-in XML parsing and correcting its syntax.
The main point is simply that LLM capabilities are data constrained. The AI hasn't actually learned how to program or to understand language. If it had, you could give it something new and it would figure it out.
Basically save this HTML file and attach it when needed: https://ziglang.org/documentation/master/
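For anyone who wants to try it, this is roughly what that workflow looks like in Python. It's a sketch, not anything official: the model name is a placeholder, and the full manual is a large chunk of HTML, so you may need to trim it (or extract just the text) to fit a given context window.

```python
import urllib.request
import anthropic  # pip install anthropic

# One-time download of the Zig reference manual (a single large HTML page).
ZIG_DOCS_URL = "https://ziglang.org/documentation/master/"
urllib.request.urlretrieve(ZIG_DOCS_URL, "zig-reference.html")

# Later, attach it to a prompt. Caveat: the raw HTML is big, so you may need
# to trim it or extract the text to stay within the model's context window.
with open("zig-reference.html", encoding="utf-8") as f:
    manual = f.read()

client = anthropic.Anthropic()  # any long-context model/provider works; this is just one choice
response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=2048,
    messages=[{
        "role": "user",
        "content": (
            "Here is the current Zig reference manual:\n\n" + manual +
            "\n\nUsing only what this manual documents, write a Zig program that "
            "counts the lines in a file. Do not assume the standard library has an XML parser."
        ),
    }],
)
print(response.content[0].text)
```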
Non-programmer here. ISTM you nailed something more clearly than I've seen it put elsewhere. This is a crude example, but I gather that if "2+2=5" appeared often enough in its training data, an AI model wouldn't "see" any problem in offering it up. That is, all that prevents it from offering it up is that it almost never occurs in the training data. So AI is likely to be accurate in the most common cases, hit and miss in niche ones, and usually useless in new ones. As you say, not worth the level of investment (which is up against practical limits anyway).
Yes, that is correct. It doesn't actually understand anything. It perceives no right or wrong, correct or incorrect etc. It is all a probability calculation.
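To make the "it is all a probability calculation" point concrete, here is a deliberately crude toy in Python. It is not how a real LLM works internally, just the same failure mode in miniature: the "model" knows nothing about arithmetic, only how often each completion appeared in its (made-up) training data.

```python
from collections import Counter

# Made-up training data: the wrong answer appears occasionally, the right one mostly.
training_data = ["2+2=4"] * 990 + ["2+2=5"] * 10

def complete(prompt: str) -> str:
    # Count how each training example that starts with the prompt continues,
    # then return the most frequent continuation. No arithmetic, no notion of truth.
    continuations = Counter(
        example[len(prompt):] for example in training_data if example.startswith(prompt)
    )
    return continuations.most_common(1)[0][0]

print(complete("2+2="))  # prints "4", but only because "4" dominates the data.
# Flip the frequencies (10 vs 990) and the same code confidently prints "5".
```

Real models generalize over vastly more data and context, but the selection principle is the same: pick what is statistically likely, not what is verified to be true.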