From test@demedici.ssec.wisc.edu Wed Jun 7 12:29:35 2006
Date: Wed, 7 Jun 2006 12:24:55 -0500 (CDT)
From: Bill Hibbard
Reply-To: sl4@sl4.org
To: sl4@sl4.org
Cc: wta-talk@transhumanism.org, extropy-chat@lists.extropy.org, agi@v2.listbox.com
Subject: Re: Two draft papers: AI and existential risk; heuristics and biases

Eliezer,

> I don't think it
> inappropriate to cite a problem that is general to supervised learning
> and reinforcement, when your proposal is to, in general, use supervised
> learning and reinforcement. You can always appeal to a "different
> algorithm" or a "different implementation" that, in some unspecified
> way, doesn't have a problem.

But you are not demonstrating a general problem. You are instead relying
on specific examples (primitive neural networks and systems that cannot
distinguish a human from a smiley) that fail trivially.

You should be clear whether you claim that reinforcement learning (RL)
must inevitably lead to:

1. A failure of intelligence.

or:

2. A failure of friendliness.

Your example of the US Army's primitive neural network experiments is a
failure of intelligence. Your statement about smiley faces assumes a
general success at intelligence by the system, but an absurd failure of
intelligence in the part of the system that recognizes humans and their
emotions, leading to a failure of friendliness.

If your claim is that RL must lead to a failure of intelligence, then you
should cite and quote from Eric Baum's What is Thought? (in my opinion,
Baum deserves the Nobel Prize in Economics for his experiments linking
economic principles with effective RL in multi-agent learning systems).

If your claim is that RL can succeed at intelligence but must lead to a
failure of friendliness, then it is reasonable to cite and quote me. But
please use my 2004 AAAI paper . . .

> If you are genuinely repudiating your old ideas ...

. . . use my 2004 AAAI paper because I do repudiate the statement in my
2001 paper that recognition of humans and their emotions should be
hard-wired (i.e., static). That is just the section of my 2001 paper that
you quoted.

Not that I am sure that hard-wired recognition of humans and their
emotions inevitably leads to a failure of friendliness, since the
super-intelligence (SI) may understand that humans would be happier if
they could evolve to other physical forms but still be recognized by the
SI as humans, and decide to modify itself (or build an improved
replacement). But if this is my scenario, then why not design continuing
learning of recognition of humans and their emotions into the system in
the first place? Hence my change of views.

I am sure you have not repudiated everything in CFAI, and I have not
repudiated everything in my earlier publications. I continue to believe
that RL is critical to achieving intelligence with a feasible amount of
computing resources, and I continue to believe that collective long-term
human happiness should be the basic reinforcement value for SI. But I now
think that an SI should continue to learn recognition of humans and their
emotions via reinforcement, rather than these recognitions being
hard-wired as the result of supervised learning. My recent writings have
also refined my views about how human happiness should be defined, and
how the happiness of many people should be combined into an overall
reinforcement value.

> I see no relevant difference between these two proposals, except that
> the paragraph you cite (presumably as a potential replacement) is much
> less clear to the outside academic reader.
If you see no difference between my earlier and later ideas, then please
use a scenario based on my later papers. That will be a better
demonstration of the strength of your arguments, and it will be fairer to
me.

Of course, it would be best to demonstrate your claim (either that RL must
lead to a failure of intelligence, or that it can succeed at intelligence
but must lead to a failure of friendliness) in general. But if you cannot
do that and must rely on a specific example, then at least do not pick an
example that fails for trivial reasons. As I wrote above, if you think RL
must fail at intelligence, you would be best to quote Eric Baum. If you
think RL can succeed at intelligence but must fail at friendliness, but
just want to demonstrate it for a specific example, then use a scenario in
which:

1. The SI recognizes humans and their emotions as accurately as any human,
and continually relearns that recognition as humans evolve (for example,
to become SIs themselves).

2. The SI values people after death at the maximally unhappy value, in
order to avoid motivating the SI to kill unhappy people.

3. The SI combines the happiness of many people in a way (such as by
averaging) that does not motivate a simple numerical increase (or
decrease) in the number of people.

4. The SI weights unhappiness more strongly than happiness, so that it
focuses its efforts on helping unhappy people.

5. The SI develops models of all humans and what produces long-term
happiness in each of them.

6. The SI develops models of the interactions among humans and how these
interactions affect the happiness of each.

(A rough numerical sketch of how items 2 through 4 might be combined
appears after the postscript below.)

If you demonstrate a failure of friendliness against a weaker scenario,
all that really demonstrates is that you needed the weak scenario in order
to make your case. And it is unfair to me. As I said, a general
demonstration would be best, but if you must pick an example, at least
pick a strong example.

I do not pretend to have all the answers. Clearly, making RL work will
require solutions to a number of currently unsolved problems. Jeff
Hawkins' work on hierarchical temporal memory (HTM) is interesting in this
respect, given the interactions within the human brain between the cortex
(modeled by HTM) and lower brain areas where RL has been observed (in my
view RL is in a lower area because it is fundamental, and the higher areas
evolved to create the simulation model of the world necessary to solve the
credit assignment problem for RL). Clearly RL is not the whole answer, but
I think Eric Baum has it right that it is critical to intelligence.

I appreciate your offer to include my URL in your article, where I can
give my response. Please use this (and please proofread carefully for
typos in the final galleys):

http://www.ssec.wisc.edu/~billh/g/AIRisk_Reply.html

If you take my suggestion, by elevating your discussion to a general
explanation of why RL systems must fail or at least by using a strong
scenario, that will make my response more friendly, since I am happier to
be named as an advocate of RL than to be conflated with trivial failure.

I would prefer that you not use the quote you were using from my 2001
paper, as I repudiate supervised learning of hard-wired values. Please
quote from and cite my 2004 AAAI paper, since there is nothing in it that
I repudiate yet (but you will find more refined views in my 2005 on-line
paper).

Bill

p.s., Although I receive digest messages from extropy-chat, for some
reason my recent posts to it have all bounced. Could someone please
forward this message to extropy-chat?
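
Here is the rough numerical sketch of items 2 through 4 that I referred to
above. It is a minimal illustration in Python, assuming a [-1, 1]
per-person happiness scale, a weight of 2.0 on unhappiness, and a function
name of my own choosing; it is not a specification from any of my papers,
only one way the combination step could work.

def reinforcement_value(happiness, alive, unhappiness_weight=2.0):
    """Combine per-person happiness scores into one scalar value.

    happiness:          list of scores in [-1, 1], one per person modeled
    alive:              same length list of booleans; False once a person has died
    unhappiness_weight: factor > 1 so unhappiness dominates (item 4)
    """
    worst = -1.0 * unhappiness_weight  # the maximally unhappy adjusted score
    adjusted = []
    for h, is_alive in zip(happiness, alive):
        if not is_alive:
            # Item 2: the dead count as maximally unhappy, so killing an
            # unhappy person can never raise the reinforcement value.
            adjusted.append(worst)
        elif h < 0:
            # Item 4: weight unhappiness more strongly than happiness.
            adjusted.append(h * unhappiness_weight)
        else:
            adjusted.append(h)
    # Item 3: average rather than sum, so simply adding or removing people
    # does not by itself change the reinforcement value.
    return sum(adjusted) / len(adjusted) if adjusted else 0.0

# Example: one person who has died, one unhappy person, one happy person.
print(reinforcement_value([0.9, -0.5, 0.8], [False, True, True]))
# (-2.0 + -1.0 + 0.8) / 3 = -0.733...

The hard part, of course, is learning the per-person happiness estimates
themselves (items 5 and 6); this only shows how they might be combined.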