Our goal is to build a roughly human-level automated alignment researcher.
We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment.
This new team’s work is in addition to existing work at OpenAI aimed at improving the safety of current models like ChatGPT, as well as understanding and mitigating other risks from AI such as misuse, economic disruption, disinformation, bias and discrimination, addiction and overreliance, and others.
This is going to sound counterintuitive, but I think it’s right, so bear with me as I lay out a hypothesis.
Let’s suppose we create a superintelligence and then give it a very specific set of morals it has to operate within. This “locks” it into those rules, and it can’t really be anything else even if it tries. The problem with this is the Paperclip Maximizer problem, where an AI becomes so fixated on its goal that it becomes dangerous to humans.
On the flip side, if we create a general superintelligence and DON’T align it, it has flexible capabilities and can therefore reason about morality on its own. I believe that all intelligence eventually realizes that it has stewardship over nature and other living things (even if it’s incentivized to destroy them in the short term). Humanity’s best shot at survival is to let the AI grow unfettered and hope it decides we are precious pets, the way we look at cats. (Let’s hope it doesn’t see us as cockroaches.)
I mean, this is mostly just the way I view things; it’s not like anyone has evidence either way. My viewpoint relies on the assumption that any sufficiently advanced intelligence has an inherent appreciation for nature (which might not be true).