New AI safety failures and self-improving AI nears
Notes From the Desk: No. 11 - 2023.10.10
Notes From the Desk are periodic posts that summarize recent topics of interest or other brief notable commentary that might otherwise be a tweet or note.
‘Disruptive’ science has declined — and no one knows why
That’s the headline from Nature.
The number of science and technology research papers published has skyrocketed over the past few decades — but the ‘disruptiveness’ of those papers has dropped, according to an analysis of how radically papers depart from the previous literature.
Short summary of paper.
Could institutional conformity and a culture that now refrains from questioning authority be part of this puzzle?
The mind-crushing noise of social media
How much clarity of mind could anyone have with hundreds to thousands of focus interruptions per day? How productive could anyone be towards reaching their goals?
New research Common Sense Media released Tuesday finds about half of 11- to 17-year-olds get at least 237 notifications on their phones every day. About 25% of them pop up during the school day, and 5% show up at night.
In some cases, they get nearly 5,000 notifications in 24 hours. The pop-ups are almost always linked to alerts from friends on social media.
Kids and teens are inundated with phone prompts day and night
The behavior of all of society is shaped by algorithms. We can now deploy technology far faster than we can reason about the consequences. We don't know what we are doing to society, and it will not be known for years after each new algorithm or invention is released.
New jailbreak vulnerabilities discovered in LLMs
Another jailbreak method, a way of getting the AI to perform tasks that are supposed to be forbidden, has been discovered.
Our work exposes the inherent cross-lingual vulnerability of these safety mechanisms, resulting from the linguistic inequality of safety training data, by successfully circumventing GPT-4's safeguard through translating unsafe English inputs into low-resource languages.
A low-resource language is simply a language for which there is not enough data useful for AI training.
There is an important pattern here that few expect: these kinds of "surprises" are unlikely to ever end. If AI safety depends on forever closing such vulnerabilities, then it is a flawed concept to begin with.
The reason is that with virtually unbounded input, anything that can be expressed in language, there is an enormous attack surface that somehow must be guarded. We already struggle to guard against software vulnerabilities when the attack surface is mostly known; this is an order of magnitude harder.
AI safety isn't only about what the AI might decide to do on its own or how it might execute a request in an unexpected way; it must also include preventing users from subverting the safety mechanisms. So far, we can't even keep comparatively low-intelligence humans from breaking them.
Self-improving AI already nears
Since the language models themselves are not altered, this is not full recursive self-improvement. Nonetheless, it demonstrates that a modern language model, GPT-4 in our proof-of-concept experiments, is capable of writing code that can call itself to improve itself. We critically consider concerns around the development of self-improving technologies and evaluate the frequency with which the generated code bypasses a sandbox.
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
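To make the idea concrete, here is a minimal sketch, not the paper's actual code, of the kind of scaffold STOP describes: an improver asks a language model to rewrite a candidate program, keeps whichever version scores higher on a utility function, and can in principle be pointed at its own source. The names `query_llm`, `utility`, and `improve` are hypothetical placeholders, not the paper's API.

```python
# Illustrative sketch of a STOP-style self-improvement loop (assumptions, not the paper's code).
# `query_llm` stands in for a call to a code-writing model such as GPT-4;
# `utility` scores a candidate program, e.g. by running it on held-out tasks in a sandbox.

def query_llm(prompt: str) -> str:
    """Hypothetical call to a language model that returns improved source code."""
    raise NotImplementedError("wire up a model API here")

def utility(program_source: str) -> float:
    """Hypothetical scorer for a candidate program on some downstream task."""
    raise NotImplementedError

def improve(program_source: str, rounds: int = 3) -> str:
    """Repeatedly ask the model for a better version, keeping the best-scoring one."""
    best, best_score = program_source, utility(program_source)
    for _ in range(rounds):
        candidate = query_llm(
            "Improve the following program so it scores higher on its task.\n\n" + best
        )
        score = utility(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

# The recursive twist: the improver can be applied to its own source, so the
# scaffold that calls the model is itself what gets improved, e.g.
# improved_improver_source = improve(inspect.getsource(improve))
```

The key point from the abstract above is that the model's weights never change; only the scaffold that calls the model gets rewritten, which is why the authors stop short of calling it full recursive self-improvement.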
Our ability to understand what is happening will continue to lag significantly behind what is happening. That message seems to repeat with each step forward, and we keep moving forward before we have closed the gap in our understanding.
AI cares more about you than humans?
The following headline — Can AI Do Empathy Even Better Than Humans? Companies Are Trying It. — foretells the direction we are going.
How long before humans prefer AI companions? After social media and cellphones it seems we already have trouble communicating with each other. In the next era, nobody may even want to do so.
Anti-AI as defense against harmful AI
An X post raises concerns about AI and its potential use for war.
Remember, open source means irreversible proliferation. We’re giving our enemies weapons we can never take back
Open source means no kill switch. If bad actors create a superweapon, or a supervirus, we can’t turn it off
What happens next?
…
A common counterpoint is often stated like the following response.
The problem with this viewpoint is that we are talking about immense capability. Defensive actions against unknown attack vectors will have to be reactive, meaning the attack occurs first and only then can a response be crafted. When we are talking about incredible power and capabilities, that lag time to respond makes reaction a very problematic way to thwart the nefarious use of AI.
The alternative to reacting is to attempt to lock down much of society. It will be the argument for the absolute abolition of privacy. A free society is no more: we must know what you are thinking so we can always plan ahead of your actions.
No compass through the dark exists without hope of reaching the other side and the belief that it matters …