Critique of the SIAI Collective Volition Theory

Bill Hibbard     December 2005

This critique refers to the following Singularity Institute document and messages to the SL4 mailing list:

CV and these recent messages to SL4 are an improvement over the SIAI's previous Guidelines on Friendly AI. In particular, the newer theory has eliminated the ambiguity of relying on but failing to define terms like "valid", "good" and "mistaken". However, CV fails to answer other critiques of the Guidelines on Friendly AI and introduces new problems.

1. CV fails to address certain problems of the Guidelines on Friendly AI raised in my original critique:

1.1. Discounts pre-singularity politics.

CV, in "Avoid creating a motive for modern-day humans to fight over the initial dynamics", recommends ignoring politics over the design of AI because (according to my reading of this confused section) AI designers with evil intentions and AI designers with good intentions are equally likely to produce unfriendly AI. This section does not say what you should do instead, so presumably SIAI wants people to follow its recommendations in other documents: to contribute money and volunteer your time to SIAI. The implicit conclusion is that the only path to friendly AI is for SIAI to develop AI before any other group does (this was recently made more explicit here). But it is hard to imagine the SIAI being first to develop true AI, given the very well-funded competition from groups such as Google, IBM, Microsoft, the U.S. Department of Defense and the Japanese government.

Even if SIAI is first to develop true AI, the U.S. government will simply take over their project and threaten SIAI people with prison if they continue or publicize their work. This is exactly what happened to George Davida, professor of Electrical Engineering and Computer Science at the University of Wisconsin in Milwaukee, when he filed a patent application in 1977 for an encryption device. This happened because Davida's work threatened the U.S. government's ability to crack the codes of foreign governments (there has since been a flood of cryptography publications and the government has more or less given up trying to suppress them). Given the enormous investment in AI by the U.S. Department of Defense, they will certainly understand the national security implications of true AI developed by a small group like the SIAI and will take it over (they may first give the group an opportunity to cooperate amicably, as long as they can get the appropriate security clearance).

CV, in "Caring about volition" and other sections, describes the possibility that the RPOP (really powerful optimization process) fails to compute a reliable collective extrapolated volition for humanity and hence decides not to create super-intelligent minds. But how does this prevent others from creating super-intelligent minds using techniques other than CV, especially given the advice in CV to ignore politics? Even if SIAI decided that its followers should become engaged in the political process to prevent others from creating AI, they would be very likely to lose that political argument given the enormous benefits that voters will get from the increasing intelligence of information systems.

One failure mode of CV is the failure of the extrapolated (i.e., predicted) collective volition to converge. That is, prediction is really of a probability distribution of possible futures, and there may not be a single collective volition that applies over the range of this distribution. Given that the laws of physics are not settled, it is hard to imagine a prediction in which CV converges no matter how powerful the RPOP is.

The likely failure of CV to converge, combined with the advice to stay out of politics, suggests that the CV theory of friendliness will have no real impact on the future.

1.2. Relies on a mature super-intelligence to discover system values (the collective extrapolated volition of humanity) without specifying what values motivate and reinforce its development into a mature super-intelligence.

The SIAI claims that values for development are unnecessary, since the super-intelligence is replaced by a non-sentient RPOP (really powerful optimization process). Further, it appears that SIAI assumes that this replacement for super-intelligence can be reached without anything like reinforcement learning, which requires reinforcement values.

The two examples of intelligence in nature, human brains and evolution via natural selection and genetics (we might add human society as a third), depend heavily on reinforcement learning. Intelligence without reinforcement learning would at best be much less efficient than with it, possibly so much less efficient as to be impossible within the constraints of physics.

1.3. Confuses theory of intelligence with theory of friendliness.

CV stands for "collective extrapolated volition", and envisions prediction as a special feature of the CV theory of friendliness. But prediction is actually the essence of intelligence: forming a model of the world and using it to predict the future in order to optimize motivating values. Any true intelligence will motivate its behaviors by extrapolating its values into the future. That is, extrapolation belongs in a theory of intelligence rather than a theory of friendliness.

Once the confusion between theory of intelligence and theory of friendliness is removed, CV is similar to my own recommendation for safe AI: recognize human happiness and unhappiness, use some formula to combine the happiness values for all humans into an overall reinforcement value, heavily weight future human happiness, and learn behaviors that result in long-term human happiness (of course this simple description leaves out a lot - see for more details). CV even echoes my own discussion that negative happiness (i.e., unhappiness) should be weighted more heavily than positive happiness in the formula for combining individual emotions into a single value. One important difference is that I envision an ongoing interaction between the super-intelligence and humanity, rather than the SIAI vision of using an RPOP to precompute a "final" collective volition which, if it converges, is then used as the value of a super-intelligence.
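
The recommendation above does not fix a particular formula, so the following Python sketch is only one hypothetical way to read it. The function name, the per-person happiness scale (negative values meaning unhappiness), the averaging across people, and the parameters unhappiness_weight and future_weight are all assumptions made for illustration, not details taken from any published design.

```python
from typing import Sequence


def combined_reinforcement(
    happiness_by_step: Sequence[Sequence[float]],
    unhappiness_weight: float = 2.0,  # assumed: unhappiness counts double
    future_weight: float = 0.99,      # assumed: near 1 so the future counts heavily
) -> float:
    """Combine per-person happiness values into one reinforcement value.

    happiness_by_step[t][i] is a hypothetical happiness estimate for
    person i at time step t; negative numbers represent unhappiness.
    """
    total = 0.0
    for t, step in enumerate(happiness_by_step):
        if not step:
            continue  # no observations at this step
        # Weight unhappiness more heavily than happiness.
        weighted = [h if h >= 0 else unhappiness_weight * h for h in step]
        # Average across people, then weight by how far in the future it is.
        total += (future_weight ** t) * sum(weighted) / len(weighted)
    return total
```

With the assumed defaults, one happy and one equally unhappy person at a single time step yield a negative overall value, which reflects the heavier weighting of unhappiness. A future_weight close to 1 keeps distant time steps nearly as influential as the present one.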

2. Inconsistency on how to recognize human volition.

CV is not clear about how it will recognize the volition of humans in order to compute collective volition. In "WARNING: Beware of things that are fun to argue" and in "Defend humans, the future of humankind, and human nature", CV derides proposals (such as my own) to base AI values on expressions of human happiness and unhappiness, by describing a hypothetical system that tiles the universe with little smiley faces. SIAI is assuming a super-intelligent mind that is much less able than ordinary humans to recognize humans and their emotional expressions, and especially unable to distinguish humans from inanimate dolls and drawings. This is another reflection of SIAI's misunderstanding of what constitutes intelligence. As in critique 1.3, they appear to be placing critical properties of intelligence in their theory of friendliness, so that anyone who doesn't subscribe to their friendliness theory isn't really talking about true intelligence.

On the other hand, in "Humankind should not spend the rest of eternity desperately wishing that the programmers had done something differently", CV describes its system recognizing human screams. So in this section SIAI assumes its system will be able to reliably recognize human emotions. The SIAI does not explain how its system will reliably recognize human emotions when other systems cannot.

3. Envisions a "final" state of human future.

In "Avoid hijacking the destiny of humankind" and in "What if the initial dynamics works as designed, but not as planned, harming people instead of helping them?", CV describes the "final" state of the human future. All of human history suggests that history is a process without any final end, that there are always possibilities for further progress. It is of course possible that human history will converge to some "final" state, but to be effective a theory of friendliness should not rely on assuming it, especially given that the laws of physics are not settled.

4. Fails to address the implications of Gödel's Incompleteness Theorem in discussion of a provable invariant for system evolution.

TDFAI and GF describe a plan to create an AI system and a mathematical proof that the system's primary goal (i.e., value) will not change, even as the system evolves. This is an interesting idea but SIAI has not published any details about this plan.

But whatever the details, SIAI's planned proof must be within a formal mathematical system and Gödel's Incompleteness Theorem tells us that, if the system is consistent, then there are true statements that cannot be proved within the system. Penrose has extended Gödel's Theorem to show that any Turing machine based on a consistent mathematical system must be less capable than humans. Penrose's argument does not apply to all AIs since AIs need not be based on consistent mathematical systems. But for SIAI's planned AI to be the subject of a formal proof, it seems likely that the AI must be based on the consistent mathematical system of the proof and hence less capable than humans. The SIAI must address this problem.


I offer an alternative analysis for producing safe AI at: