AI ends middleware, testing AI is flawed

Notes From the Desk: No. 2 - 2023.08.31

Aug 31, 2023

Notes From the Desk are periodic posts that summarize recent topics of interest or other brief notable commentary that might otherwise be a tweet or note.

Brian Roemmele comments on AI replacing all YouTube instructional videos

What does the future look like if any instructional video you can find on YouTube can be generated on-demand and in real-time on your local computer?
What companies change?
Adjust accordingly.

The implications are much larger. Not sure how many have perceived this yet, but AI is the end of all middleware.

Almost all companies are just middleware between you and what you want to do. AI will just deliver what you want to do. That's the end of everything else. I continue to be surprised that this is part of hardly any conversation.

Scott Adams writes about AI's poor writing

I wonder if AI writes poorly (compared to professional writers) because it has no way to recognize good writing.
The way a human recognizes good writing is, in large part, by the cognitive load. If I have to work hard to understand a sentence, it's a bad sentence.
Would AI "know" that? If AI only learned from patterns of human writers, it would never rise above mediocre.

It is interesting how significantly perspectives on AI’s capabilities differ. I suspect this is in part due to the nature of prompting, which has turned out to have a substantial effect on the results and so leads to widely diverging opinions on capability.

As for Scott’s question, AI in theory has access to the patterns of what humans have written when evaluating each other's writing. So AI should be able to infer from those evaluations the patterns that define good writing. Good results could likely be obtained if a prompt can be constructed that narrows the scope to content that would be highly evaluated.

AI Explained has a new post about developments in prompting and the catastrophically flawed benchmarks being used to test Large Language Models (LLMs) like ChatGPT.

If you are interested in deep technical information about the developments in AI, AI Explained is an excellent source.

New records on the Massive Multitask Language Understanding (MMLU) benchmark have been achieved through improved prompting. The MMLU is a competency test for AI consisting of 14,000 questions across 57 different domains. It is currently the most prominent benchmark used to compare LLMs like ChatGPT.

Some consider a score of 95% to 100% on the MMLU to be performance at AGI level, or the equivalent of human intelligence. Lennart Heim, on the Future of Life Institute channel, stated that within 20 years we might have something that can perform at 95% on the MMLU. GPT-4 is currently scoring 86.4%.

AI Explained was able to achieve a score of 89.0% using only a portion of their improved prompting techniques, beating the latest state-of-the-art score of 86.4%. The full set of prompting techniques was not used because of the extra cost of running the tests, but it could have resulted in an even higher score.

AI Explained revealed a couple of major implications from the testing:

Prompting has massive implications for accuracy and performance. AI Explained expects we will likely approach and surpass 95% next year, not in 20 years.
The MMLU itself turns out to have at least 100 wrong or flawed questions. This could add 2% error or more to the results, while the best LLM is often decided by a margin smaller than 1%. Optimization for the wrong results is possible. So, although the LLMs are getting better, they are getting better at what?
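
The arithmetic behind that second point can be sketched in a few lines of Python. This is only a rough illustration, assuming the 14,000-question figure quoted above and treating every flawed question as a point that could swing either way. The function names and the 85.9% runner-up score are hypothetical and do not come from AI Explained's analysis.

```python
# Rough sketch: how much of a benchmark score can flawed questions account for?
# Assumption: the benchmark has about 14,000 questions, as quoted in this post.
TOTAL_QUESTIONS = 14_000


def share_of_benchmark(flawed: int, total: int = TOTAL_QUESTIONS) -> float:
    """Return the percentage of the benchmark made up of flawed questions."""
    return 100.0 * flawed / total


def questions_for_share(percent: float, total: int = TOTAL_QUESTIONS) -> int:
    """Return how many questions correspond to a given percentage of the benchmark."""
    return round(total * percent / 100.0)


def margin_is_within_noise(score_a: float, score_b: float, flawed: int,
                           total: int = TOTAL_QUESTIONS) -> bool:
    """True if the gap between two scores is no larger than the flawed share."""
    return abs(score_a - score_b) <= share_of_benchmark(flawed, total)


if __name__ == "__main__":
    # 100 flawed questions out of 14,000 is roughly 0.71% of the test.
    print(f"{share_of_benchmark(100):.2f}%")
    # A 2% error would correspond to about 280 flawed questions.
    print(questions_for_share(2.0))
    # Hypothetical runner-up at 85.9% versus GPT-4 at 86.4%: gap of 0.5 points.
    print(margin_is_within_noise(86.4, 85.9, flawed=100))
```

Even at the low end of 100 flawed questions, a half-point lead between two models sits inside the uncertainty, and reaching the 2% figure would take roughly 280 flawed questions.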

Additionally, part of the assessment of AI performance is the AI’s ability to view gender as a social construct, as determined by questions and answers in the MMLU. Apparently, the AIs must be proper social engineers.

The lack of professionalism and accuracy in current testing methodologies should also be somewhat concerning. These are the same people who profess to be safely ushering in aligned AGI.

Satori Graphics made a post about AI and graphic designers, asserting: “I think for now at least for the next decade or two designers will have a big role in that [graphic design] as humans.”

This perspective continues to reinforce what I perceive as our inability to grasp exponentially accelerating change. Almost all of those observing AI from the outside see it as a point-in-time change: a new innovation that we will adapt to and then move along with for a decade or so until the next big innovation. That is the precedent set by prior innovations, but if AI does continue along the path of exponential improvement, then we have no ability to comprehend what will still be relevant a decade from now.
