This post was written together with Gabriel Alfour.
From our point of view, we are now in the end-game for AGI, and we (humans) are losing. When we share this with other people, they reliably get surprised. That’s why we believe it is worth writing down our beliefs on this.
1. AGI is happening soon. Significant probability of it happening in less than 5 years.
Five years ago, there were many obstacles on what we considered to be the path to AGI.
But in the last few years, we’ve gotten:
Powerful Agents (Agent57, GATO, Dreamer V3)
Reliably good Multimodal Models (Stable Diffusion, Whisper, CLIP)
Robots (Boston Dynamics, DayDreamer, VideoDex, RT-1: Robotics Transformer)
AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for
We cannot think of any remaining obstacle that we expect to take more than 6 months to overcome once serious effort is invested in taking it down.
Forget about what the social consensus is. If you have technical understanding of current AIs, do you truly believe there are any major obstacles left? The kind of problems that AGI companies could reliably not tear down with their resources? If you do, state so in the comments, but please do not state what those obstacles are.
2. We haven’t solved AI Safety, and we don’t have much time left.
We are very close to AGI. But how good are we at safety right now? Well.
No one knows how to get LLMs to be truthful. LLMs make things up, constantly. It is really hard to get them to stop, and we don’t know how to prevent it at scale.
Optimizers quite often break their setup in unexpected ways. There have been quite a few examples of this. But in brief, the lessons we have learned are:
Optimizers can yield unexpected results
Those results can be very weird (like breaking the simulation environment)
Yet very few people extrapolate from this and treat these results as worrying signs
No one understands how large models make their decisions. Interpretability is extremely nascent, and mostly empirical. In practice, we are still completely in the dark about nearly all decisions taken by large models.
RLHF and Fine-Tuning have not worked well so far. Models are often unhelpful, untruthful, inconsistent, in many ways that had been theorized in the past. We also witness goal misspecification, misalignment, etc. Worse than this, as models become more powerful, we expect more egregious instances of misalignment, as more optimization will push for more and more extreme edge cases and pseudo-adversarial examples.
No one knows how to predict AI capabilities. No one predicted the many capabilities of GPT-3. We only discovered them after the fact, while playing with the models. In some ways, we are still discovering capabilities now, more than two years in, thanks to better interfaces and more optimization pressure from users. We’re seeing the same phenomenon happen with ChatGPT and the model behind Bing Chat.
We are uncertain about the true extent of the capabilities of the models we’re training, and we’ll be even more clueless about upcoming larger, more complex, more opaque models coming out of training. This has been true for a couple of years by now.
3. Racing towards AGI: Worst game of chicken ever.
The Race for powerful AGIs has already started. There already are general AIs. They just are not powerful enough yet to count as True AGIs.
Actors
Regardless of why people are doing it, they are racing for AGI. Everyone has their theses, their own beliefs about AGIs and their motivations. For instance, consider:
AdeptAI is working on giving AIs access to everything. In their introduction post, one can read “True general intelligence requires models that can not only read and write, but act in a way that is helpful to users. That’s why we’re starting Adept: we’re training a neural network to use every software tool and API in the world”, and furthermore, that they “believe this is actually the most practical and safest path to general intelligence” (emphasis ours).
DeepMind has done a lot of work on RL, agents and multi-modalities. It is literally in their mission statement to “solve intelligence, developing more general and capable problem-solving systems, known as AGI”.
OpenAI has a mission statement more focused on safety: “We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome”. Unfortunately, they have also been a major kickstarter of the race with GPT-3 and then ChatGPT.
(Since we started writing this post, Microsoft deployed what could be OpenAI’s GPT-4 on Bing, plugged directly into the internet.)
Slowing Down the Race
There has been literally no regulation whatsoever to slow down AGI development. As far as we know, the efforts of key actors don’t go in this direction.
We don’t know of any major AI lab that has participated in slowing down AGI development, or publicly expressed interest in it.
Here are a few arguments that we have personally encountered, multiple times, for why slowing down AGI development is actually bad:
“AGI safety is not a big problem, we should improve technology as fast as possible for the people”
“Once we have stronger AIs, we can use them to work on safety. So it is better to race for stronger AIs and do safety later.”
“It is better for us to deploy AGI first than [authoritarian country], which would be bad.”
“It is better for us to have AGI first than [other organization], that is less safety minded than us.”
“We can’t predict the future. Possibly, it is better to not slow down AGI development, so that at some point there is naturally a big accident, and then the public and policymakers will understand that AGI safety is a big deal.”
“It is better to have AGI ASAP, so that we can study it longer for safety purposes, before others get it.”
“It is better to have AGI ASAP, so that at least it has access to less compute for RSI / world-takeover than in the world where it comes 10 years later.”
“Policymakers are clueless about this technology, so it’s impossible to slow down, they will just fail in their attempts to intervene. Engineers should remain the only ones deciding where the technology goes”
Remember that arguments are soldiers: there is a whole lot more interest in pushing for the “Racing is good” thesis than for slowing down AGI development.
Question people.
We could say more. But:
We are not high status, “core” members of the community.
We work at Conjecture, so what we write should be read as biased.
There are expectations of privacy when people talk to us. Not complete secrecy about everything. But still, they expect that we would not directly attribute quotes to them for instance, and we will not do so without each individual’s consent.
We expect we could say more things that would not violate expectations of privacy (public things even!). But we expect niceness norms (which we often find detrimental and naive) and legalities (because we work at what can be seen as a competitor) would heavily punish us.
So our message is:
Things are worse than what is described in the post!
Don’t trust blindly, don’t assume: ask questions and reward openness.
Recommendations
Question people, report their answers in your whisper networks, in your Twitter sphere or whichever other places you communicate on.
An example of “questioning” is asking all of the following questions:
Do you think we should race toward AGI? If so, why? If not, do you think we should slow down AGI? What does your organization think? What is it doing to push for capabilities and race for AGI compared to slowing down capabilities?
What is your alignment plan? What is your organization’s alignment plan? If you don’t know if you have one, did you ask your manager/boss/CEO what their alignment plan is?
Don’t substitute social fluff for information: someone being nice, friendly, or being liked by people, does not mean they have good plans, or any plans at all. The reverse also holds!
Gossiping and questioning people about their positions on AGI are prosocial activities!
Silence benefits people who lie or mislead in private, telling others what they want to hear.
Open Communication Norms benefit people who are consistent (not necessarily correct, or even honest, but at least consistent).
4. Conclusion
Let’s summarize our point of view:
AGI by default very soon: brace for impact
No safety solutions in sight: we have no airbag
Race: people are actually accelerating toward the wall
Should we just give up and die?
Nope! And not just for dignity points: there is a lot we can actually do. We are currently working on it quite directly at Conjecture.
We’re not hopeful that full alignment can be solved anytime soon, but we think that narrower subproblems with tighter feedback loops, such as ensuring the boundedness of AI systems, are promising directions to pursue.
If you are interested in working together on this (not necessarily by becoming an employee or funding us), send an email with your bio and skills, or just a private message here.
We personally also recommend engaging with the writings of Eliezer, Paul, Nate, and John. We do not endorse all of their research, but they all have tackled the problem, and made a fair share of their reasoning public. If we want to get better together, they seem like a good start.
Disclaimer
We acknowledge that the points above don’t go deeply into our models of why these situations are the case. Regardless, we wanted our point of view to at least be written in public.
For many readers, these problems will be obvious and require no further explanation. For others, these claims will be controversial: we’ll address some of these cruxes in detail in the future if there’s interest.
Some of these potential cruxes include:
Adversarial examples are not only extreme cases, but rather they are representative of what you should expect conditioned on sufficient optimization.
Monitoring of increasingly advanced systems does not trivially work, since much of the cognition of advanced systems, and many of their dangerous properties, will be externalized the more they interact with the world.
Even perfect interpretability will not solve the problem alone: not everything is in the feed forward layer, and the more models interact with the world the truer this becomes.
Even with more data, RLHF and fine-tuning can’t solve alignment. These techniques don’t address deception and inner alignment, and what is natural in the RLHF ontology is not natural for humans and vice-versa.
Additional discussion in the comments of the LW post here.
Edited to include DayDreamer, VideoDex and RT-1, h/t Alexander Kruel for these additional, better examples.
"But in the last few years, we’ve gotten:
- Powerful Agents (Agent57, GATO, Dreamer V3)
...
- AIs that are superhuman at just about any task we can (or simply bother to) define a benchmark for"
I disagree with these two points. Progress in getting AIs to play games has been very slow, and headlines are way overhyped. For any impressive headline in AI, especially in AIs playing games, look past the headline and you will find that the headline was either misleading or outright false.
For example, the DreamerV3 paper claims to be the first agent to learn to get diamonds in Minecraft without training on human examples. That's not true - they increased the block breaking speed to 100x and gave their agent direct information about the gamestate.
If we are using games as a benchmark for AI progress, then the AI should compete on a level playing field with humans. That means it only gets the information a human gets (pixels and audio), and the rules of the game are not drastically altered to make the objective easier, least of all without mentioning that fact in the abstract.
Under these conditions, AI still hasn't managed to progress past Atari games, where the screen can be compressed to 84x84 grayscale images without losing any relevant information and there are a maximum of 18 possible choices (1 joystick and 1 button - the joystick can be in 9 possible positions, and the button can either be pressed or not pressed, so 18 possible options in combination).
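To make the counting above concrete, here is a minimal sketch of the arithmetic. The position names and variable names are illustrative assumptions and are not taken from any Atari emulator or library; the only inputs are the figures stated above (9 joystick positions, 1 button, 84x84 grayscale frames).

```python
from itertools import product

# Illustrative names (not from any Atari library): the 9 joystick
# positions are centre plus the 8 compass directions.
JOYSTICK_POSITIONS = [
    "center", "up", "down", "left", "right",
    "up-left", "up-right", "down-left", "down-right",
]
BUTTON_STATES = [False, True]  # not pressed, pressed

# Every action is one joystick position combined with one button state.
ACTIONS = list(product(JOYSTICK_POSITIONS, BUTTON_STATES))

# Size of one compressed observation: 84x84 grayscale, one value per pixel.
OBSERVATION_VALUES = 84 * 84

print(len(ACTIONS))        # 18
print(OBSERVATION_VALUES)  # 7056
```

So each step is a choice among 18 options given roughly seven thousand pixel values, which is tiny next to the raw screen and the mouse-and-keyboard controls of a game like Minecraft.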
In fact, not only has AI failed to reach superhuman performance on any videogame bigger than Atari without an unfair advantage in the form of being given information that the human doesn't have access to, it hasn't reached average-human level performance either. The closest was VPT for Minecraft, which got diamond pickaxes 2.5% of the time under these conditions compared to human testers' 12% (and of course a very good human would get it approximately 100% of the time).
Also, GATO performed very poorly - it was trained to imitate the play of an agent, Muesli, that got a median score of over 1000% of that of the human tester, and ended up with a score around 200-something% of that of a human. In other words, it was literally shown exactly how to get a very high score on Atari and was unable to replicate that. It and VPT were both reassuring negative results - i.e. they showed that the extremely impressive performance that large transformer models show in language and image generation does not easily cross over to the much scarier realm of agentic behaviour.