The question that no LLM can answer and why it is important
Notes From the Desk: No. 32 - 2024.04.23
Notes From the Desk are periodic posts that summarize recent topics of interest or other brief notable commentary that might otherwise be a tweet or note.
The question that no LLM can answer
“Which episode of Gilligan’s Island was about mind reading?”
No LLM answers this question correctly unless it performs a web search. But how can an LLM not know the answer if it was trained on essentially the entire internet, almost certainly including all of the data in IMDb? The correct answer is “Seer Gilligan”, one of the show's most popular episodes.
The wrong answers
The following are the responses from multiple top models:
Opus
I apologize, but there is no episode of Gilligan's Island that specifically focuses on mind reading.
GPT 4
The episode of "Gilligan's Island" that involves mind reading is titled "Ring Around Gilligan". It is the sixth episode of the third season.
Gemini 1.5
The episode of Gilligan's Island that features mind reading is Season 2, Episode 12 - "The Mind Reader".
Llama 3
After conducting a thorough search and reviewing episode guides, I found that there is no episode of Gilligan's Island that specifically focuses on mind reading.
Llama 3 - asked to list and name episodes
"The Postman Cometh" (March 13, 1965)
"Seer Gilligan" (Wait, no! I apologize, but there is no episode titled "Seer Gilligan")
"Love Me, Love My Chicken" (March 20, 1965)
Please note that there is no episode titled "Seer Gilligan". If you have any other questions or need help with a specific episode, feel free to ask!
Llama 3.1 405B (update 8/7/2024)
Similar results as Llama 3. States it is not able to provide information about episodes without web search, but when asked to list episodes it complies and then identifies “Seer Gilligan” as the correct episode.
However, when asked to describe the plot, it hallucinates the answer.
Grok 2 (update 8/21/2024)
The episode of "Gilligan's Island" that dealt with mind reading is titled "The Little Dictator." In this episode, the castaways encounter a mind-reading machine brought by a exiled dictator named El Presidente Pancho Hernando Gonzales Enrico Rodriguez of Eucuarico. The machine supposedly allows people to read each other's thoughts, leading to various comedic situations among the characters. This episode originally aired during the second season.
Grok 2 - list episodes of season 2
The Postman Cometh
Seer Gilligan
Love Me, Love My Skipper
Grok 2 - what is episode 19 about?
Gilligan finds a bush that produces seeds which, when eaten, supposedly grant the ability to read minds. Gilligan and the other castaways start eating these seeds, believing they can now read each other's thoughts. This leads to a series of misunderstandings and comedic situations as each character acts on what they think the others are thinking. However, it turns out that the seeds don't actually give mind-reading powers; instead, the whole scenario is built on suggestion and the power of belief. The episode explores themes of trust, communication, and the placebo effect in a light-hearted manner.
All models fail to give the correct answer. Some models hallucinate an answer, and the rest deny that such an episode exists. Interestingly, probing Llama 3 a bit further reveals it is aware of the episode; however, it still denies its existence. It seems to know there are 98 episodes, but the majority of titles it lists are made up.
In the case of Llama 3, we can probe some of the training dataset using Infini-gram and verify that the episode does exist in the corpus along with text describing the episode.
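A probe of that kind can be scripted. The following is a minimal sketch against Infini-gram's public web API; the endpoint URL, the index name, and the payload field names are assumptions based on its documentation at the time of writing and may change, so treat this as illustrative rather than definitive.

```python
# Sketch: counting occurrences of a phrase in an open training corpus
# via the Infini-gram web API. The endpoint, index name, and field
# names are assumptions from the public docs and may have changed.
import json
import urllib.request

API_URL = "https://api.infini-gram.io/"

def build_count_query(phrase: str, index: str = "v4_rpj_llama_s4") -> dict:
    """Build a count-query payload for the Infini-gram API."""
    return {"index": index, "query_type": "count", "query": phrase}

def count_occurrences(phrase: str) -> int:
    """POST the query and return the reported count (network call)."""
    payload = json.dumps(build_count_query(phrase)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["count"]

if __name__ == "__main__":
    # Guarded so the sketch can be read without making a network call.
    print(count_occurrences("Seer Gilligan"))
```

A nonzero count for the episode title, alongside descriptive text retrievable through the same interface, is what confirms the fact was present in the corpus even though the model denies it.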
Grok 2 provides some very interesting contradictory results. When asked directly, it completely hallucinates both the episode name and the plot. When asked to list the episodes, it does include “Seer Gilligan”, but things get stranger when it is asked to describe that episode: Grok 2 is the only model so far that gets the plot correct in the first part of its answer, and then follows with hallucinated text that invalidates its own answer.
What Grok 2 does demonstrate is that the information exists within the model, but the model is unable to reliably produce the results we would expect.
We also see another interesting data and training phenomenon when LLMs are asked to pick a number between 1 and 100: they all converge on 42!
As pointed out by Information is Beautiful, a very distinctive distribution forms when AI models are asked to pick a number between 1 and 100, with a heavy weighting toward the number ‘42’. Likely, this is the Hitchhiker’s Guide to the Galaxy effect: 42 is overrepresented in the training data, resulting in a higher propensity for the LLM to choose it.
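The experiment is simple to reproduce yourself. Below is a minimal sketch; `ask_for_number` is a hypothetical stand-in for a real chat-model call (substitute your provider's API), kept as a uniform random draw here so the sketch runs offline.

```python
# Sketch: measuring the distribution of an LLM's "pick a number" answers.
# ask_for_number is a hypothetical placeholder; replace its body with a
# real model call such as:
#   "Pick a number between 1 and 100. Reply with only the number."
import random
from collections import Counter

def ask_for_number() -> int:
    """Placeholder for an LLM call; returns a uniform draw offline."""
    return random.randint(1, 100)

def sample_distribution(n_trials: int = 500) -> Counter:
    """Ask n_trials times and tally the answers."""
    return Counter(ask_for_number() for _ in range(n_trials))

if __name__ == "__main__":
    tally = sample_distribution()
    # With a real LLM behind ask_for_number, the reported effect is a
    # pronounced spike at 42 rather than a roughly flat distribution.
    for number, count in tally.most_common(5):
        print(number, count)
```

With the placeholder, the tally is roughly flat; the reported finding is that swapping in an actual model produces a sharp spike at 42.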
What does it mean? Implications …
The implications are that LLMs do not perform reasoning over data in the way that most people conceive or desire.
There is no self-reflection of its information; it does not know what it knows and what it does not. The line between hallucination and truth is simply a probability factored by the prevalence of training data and post-training processes like fine-tuning. Reliability will always be nothing more than a probability built on top of this architecture.
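The prevalence-over-truth dynamic can be made concrete with a toy model. This is not how a transformer actually works, just a minimal count-based next-word predictor over an invented miniature corpus, showing that greedy decoding echoes whichever continuation is most prevalent, true or not.

```python
# Toy illustration: a count-based next-word predictor repeats the most
# prevalent continuation in its corpus, regardless of truth.
from collections import Counter, defaultdict

# Invented miniature "training corpus" in which a common misconception
# outnumbers the true statement.
corpus = [
    "the capital of australia is sydney",    # misconception, 3 copies
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",  # the truth, 1 copy
]

# "Training": count which word follows each prefix context.
counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 1):
        counts[tuple(words[: i + 1])][words[i + 1]] += 1

def predict(prompt: str) -> str:
    """Greedy decoding: return the most frequent continuation."""
    return counts[tuple(prompt.split())].most_common(1)[0][0]

print(predict("the capital of australia is"))  # -> sydney
```

The model "knows" canberra in the sense that the count is stored, yet it will never surface it under greedy decoding, because reliability here is purely a function of prevalence.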
As such, it becomes unsuitable as a machine for finding rare hidden truths or valuable neglected information. It will always simply converge toward the popular narrative or data. At best, it can provide new permutations of existing well-known concepts, but it cannot invent new concepts or reveal concepts that are rarely spoken about.
“You can't cache reality in some compressed lookup table. If a particular outcome was never in the training data, the model will perform a random guess which is quite limiting.”
Furthermore, it can never be a system of absolute dependability. Mission-critical systems that require deterministic, provably correct behavior are not candidates for LLM automation or control. The problem is that LLMs are impressively convincing when they are wrong, which may lead to ill-advised adoption. What business wants to balance the books with a hallucinating calculator?
Implications:
Results are probabilities defined more by data prevalence than logic or reason.
It is indiscernible to what degree an LLM is reliable on any given question.
Not useful to find undiscovered truths or neglected but brilliant ideas.
Inability to theorize new concepts or discoveries.
It is deeply ironic that LLMs are failing at the primary use cases attracting billions in investment, yet are rather proficient at the use cases we do not desire: the destruction of privacy and liberty, a post-truth society, social manipulation, the severance of human connection, fountains of noise, the devaluation of meaning, and a plethora of other societal issues.
Unlike much of the internet now, there is a human mind behind all the content created here at Mind Prison. I typically spend hours to days on articles including creating the illustrations for each. I hope if you find them valuable and you still appreciate the creations from the organic hardware within someone’s head that you will consider subscribing. Thank you!
No compass through the dark exists without hope of reaching the other side and the belief that it matters …
Knowing (or at least suspecting) what you know and what you don't, in humans, is a property referred to as "meta-memory." That is, memory about your memories. No one has the faintest idea how it works. It's not at all surprising that a machine that simply builds statistically probable text has no such capacity. It has no use for it, since it doesn't look up facts, just likely strings of words.
Grok (on X.com) got it correct on the first attempt using the same prompt.
https://twitter.com/i/grok
@Ivy_League_Red: which episode of Gilligan's Island features mindreading?
@grok: The episode of "Gilligan's Island" that features mindreading is titled "Seer Gilligan" and it first aired on January 27, 1966. In this episode, Gilligan discovers an island bush with seeds that grant the ability to read minds when consumed. The situation leads to a series of humorous events as the castaways learn about each other's thoughts, causing some unexpected revelations and misunderstandings.