The Error in My 2001 VisFiles Column

Bill Hibbard September 2012

My 2001 VisFiles Column Super-intelligent Machines was primarily intended to address the political problem of safe AI but as part of that informally described a technical solution to safe AI. That idea was repeated and cleaned up in a paper presented at a 2004 AAAI Symposium and in my 2008 JET paper The Technology of Mind and a New Social Contract.

The idea, as expressed in the 2008 JET paper, is "This fits naturally in the form of a proposed new social contract: in exchange for significantly greater than normal human intelligence, a mind must value the long-term life satisfaction of all other intelligent minds." The AI would learn to recognize "long-term life satisfaction" in humans.

The purpose of this note is to acknowledge the error in this idea: that the AI will maximize its reinforcement by modifying humans so that the AI can more easily give them "long-term life satisfaction." This is a specific example of the general problem with RL described by Marcus Hutter on page 239 of his 2005 book Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability and by Daniel Dewey in his AGI-11 paper Learning What to Value.

The idea presented in my 2001, 2004 and 2008 publications is now superseded by my AGI-12 paper Avoiding Unintended AI Behaviors.

There has been a claim that my 2001 idea will lead an AI to tile the galaxy with tiny smiling faces. I explained the error of this claim here. Given my admission of the true error in my 2001 idea, this "smiley face" argument is now moot.