LLMs remain confidently unreliable, and is the intelligence explosion near?
Notes From the Desk: No. 34 - 2024.06.08
Notes From the Desk are periodic posts that summarize recent topics of interest or other brief notable commentary that might otherwise be a tweet or note.
LLMs remain confidently unreliable
Continuing the counter-viewpoints on highly intelligent machines, a summary of lessons learned, What We Learned from a Year of Building with LLMs (Part I), provides considerable detail on the nuances of attempting to use LLMs effectively. In short, there are minefields everywhere.
LLMs will return output even when they shouldn’t
A key challenge when working with LLMs is that they’ll often generate output even when they shouldn’t. This can lead to harmless but nonsensical responses, or more egregious defects like toxicity or dangerous content. For example, when asked to extract specific attributes or metadata from a document, an LLM may confidently return values even when those values don’t actually exist. Alternatively, the model may respond in a language other than English because we provided non-English documents in the context.
While we can try to prompt the LLM to return a “not applicable” or “unknown” response, it’s not foolproof. Even when the log probabilities are available, they’re a poor indicator of output quality …
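One practical takeaway is to verify rather than trust. Below is a minimal Python sketch of such a guardrail, purely illustrative and not a technique the article itself prescribes. The llm_extract() function is a hypothetical stand-in for a real model call; the guardrail treats any extracted value that cannot be found verbatim in the source document as "unknown", since neither the model's phrasing nor its log probabilities are reliable signals.

```python
# Minimal sketch of a verification guardrail for LLM attribute extraction.
# llm_extract() is a hypothetical stand-in for a real model call; here it
# simulates a confident hallucination so the guardrail has something to catch.

UNKNOWN = "unknown"

def llm_extract(document: str, attribute: str) -> str:
    """Stand-in for the LLM call. The real prompt would instruct the model
    to return the attribute's value, or the literal string 'unknown' if the
    value is absent from the document."""
    return "2023-01-15"  # confidently returns a date the document never states

def extract_with_guardrail(document: str, attribute: str) -> str:
    value = llm_extract(document, attribute).strip()

    # The model was told it may answer 'unknown', but as the article notes,
    # that instruction alone is not foolproof.
    if value.lower() == UNKNOWN:
        return UNKNOWN

    # Distrust-and-verify: accept only values that can be located verbatim
    # in the source text, rejecting confident answers for attributes that
    # do not actually exist.
    if value.lower() not in document.lower():
        return UNKNOWN

    return value

doc = "Invoice #1042, issued to ACME Corp. Total due: $1,250."
print(extract_with_guardrail(doc, "issue date"))  # -> unknown
```

The verbatim-match check is deliberately strict: it will reject legitimate paraphrases, but in exchange it catches the kind of confident fabrication described above.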
Hallucinations are a stubborn problem.
… factual inconsistencies are stubbornly persistent and more challenging to detect. They’re more common and occur at a baseline rate of 5 – 10%, and from what we’ve learned from LLM providers, it can be challenging to get it below 2%, even on simple tasks such as summarization.
Keep in mind that hallucination error rates may be more impactful than the error rates of other types of tools, precisely because, as the article also mentions, they are challenging to detect. At a 5% baseline, a pipeline producing a thousand summaries a day quietly emits roughly fifty convincing falsehoods. LLMs are excellent at convincing you the wrong answer is the correct answer.
Furthermore, the article acknowledges the difficulty of objectively evaluating LLMs.
Evaluating LLMs can be a minefield. The inputs and the outputs of LLMs are arbitrary text, and the tasks we set them to are varied.
With inputs and outputs this arbitrary, it becomes very easy to selectively present whatever perspective you wish to promote. That is part of the problem we see with benchmarks, which seem to always suggest far greater capability than most people experience in real-world use.
Nonetheless, the article is an excellent read if you are attempting to do something productive with LLMs. It offers practical suggestions and informs the reader of many limitations of current LLMs, both of which may be very important for any decision to invest in these tools.
Is the intelligence explosion near?
Former OpenAI Superalignment team member Leopold Aschenbrenner puts forward his reasoning for why we are almost certainly on a path toward AGI and superintelligence within the next few years.
The jump to superintelligence would be wild enough at the current rapid but continuous rate of AI progress (if we could make the jump to AGI in 4 years from GPT-4, what might another 4 or 8 years after that bring?). But it could be much faster than that, if AGI automates AI research itself.
The graph below is Leopold's illustration of the rapid acceleration to AGI and beyond. However, it is built entirely on a false premise: that the progress being illustrated has actually been achieved, when in reality GPT-4 can hardly replace anything a "smart high schooler" could do in the real world. The capability labels on the graph are entirely imagined. Significant doubts remain as to our ability to reach AGI at all.
Below are the only obstacles to AGI the article recognizes, and they are mostly dismissed. What is most telling, however, is what is missing: there is no speculation as to whether LLMs truly represent intelligence or reasoning. This seems to be assumed as an unquestionable default.
There are several plausible bottlenecks—including limited compute for experiments, complementarities with humans, and algorithmic progress becoming harder—which I’ll address, but none seem sufficient to definitively slow things down.
Is this representative of what OpenAI internally believes it is achieving? Or does it represent the views a Superalignment team must necessarily hold to justify its existence? Either way, given the powerfully persuasive effects of LLMs, it would not be too surprising if they create a deceptive feedback loop that fulfills what people want to see in them - both capability and fear.
Artificial Intellect: The Intelligence Illusion
What is the true nature of these machines that we are creating? Will they manifest dreams or lead us into the abyss?
Is it a machine which has captured humanity’s hubris and will end with a system collapse of the order of civilization?
A false oracle of insight that fools the populace with the elegance of words, obfuscating what underlies them: nothing more than permutations of everything that already existed prior.
Wisdom abandoned for the artifice of knowledge.
Reflections From Imagined Worlds
Literary, film, music or other artistic mediums that inspire reflection, insight, and critical thinking about our current world.
Outer Limits - Stream of Consciousness
What happens when humanity becomes totally dependent on machines? When even thoughts of the mind are no longer our own? Who can fix the machine that no one understands?
The Outer Limits episode "Stream of Consciousness" questions the wisdom of a society totally dependent on machines. How far away are we from this imagined reality?
Unlike much of the internet now, there is a human mind behind all the content created here at Mind Prison. I typically spend hours to days on each article, including creating the illustrations. If you find them valuable and still appreciate creations from the organic hardware within someone's head, I hope you will consider subscribing. Thank you!
No compass through the dark exists without hope of reaching the other side and the belief that it matters …
Excellent article. To your point about ChatGPT not being as effective as a high schooler: what happens when we are fooled into thinking it is, and sit high schoolers in front of it, even to supplement kids' education? Disaster.
I suspect that the hype cycle is being used here as a gradual approximation so that some type of technology can be put in place, just like "The Science" supplanted actual science and observation.
Listened to a recent AJones interview where he postulated that, due to the level of extreme and varied adverse reactions to the covid vaccinations, AI was involved in their genetic formulation. The covid lockstep in its coercive global deployment and the non-stop attacks on dissenting voices also reveal a real malevolence towards humanity.