From sentience@pobox.com Tue Jun 6 14:46:14 2006
Date: Tue, 06 Jun 2006 12:14:54 -0700
From: Eliezer S. Yudkowsky
Reply-To: sl4@sl4.org
To: sl4@sl4.org
Cc: wta-talk@transhumanism.org, extropy-chat@lists.extropy.org, agi@v2.listbox.com
Subject: Re: Two draft papers: AI and existential risk; heuristics and biases

Bill Hibbard wrote:
> Eliezer,
>
>> http://singinst.org/AIRisk.pdf
>
> In Section 6.2 you quote my ideas written in 2001 for
> hard-wiring recognition of expressions of human happiness
> as values for super-intelligent machines. I have three
> problems with your critique:

Bill,

First, let me explain why the chapter singles you out for criticism, rather than any number of AI researchers who've made similar mistakes. It is because you published the particular comment I quoted in a peer-reviewed AI journal. The way I found the quote was that I read the online version of your book, and then looked through your journal articles hoping to find a quotation that put forth the same ideas. I specifically wanted a journal citation. The book editors requested that I quote specific persons putting forth the propositions I was arguing against. In most cases I felt I couldn't really do this, because the arguments had been put forth in spoken conversation or on email lists, and people don't expect emails composed in thirty minutes to be held to the same standards as a paper submitted for journal publication.

Before discussing the specific issues below, let me state immediately that if you write a response to my critique, I will, no matter what else happens in this conversation, be willing to include a link to it in a footnote, with the specific note that you feel my criticism is misdirected. I may also include a footnote leading to my response to your response, and you would be able to respond further at your URL, and so on. Space constraints are a major issue here. I didn't have room to discuss *anything* in detail in that book chapter. If we can offload this discussion to separate webpages, that is a good thing.

> 1. Immediately after my quote you discuss problems with
> neural network experiments by the US Army. But I never said
> hard-wired learning of recognition of expressions of human
> happiness should be done using neural networks like those
> used by the army. You are conflating my idea with another,
> and then explaining how the other failed.

Criticizing an AI researcher's notions of Friendly AI is typically an awkward business, because obviously *they* don't believe that their proposal will destroy the world if somehow successfully implemented. Criticism in general is rarely comfortable. There are a number of "cheap" responses to an FAI criticism, especially when the AI proposal has not been put forth in mathematical detail - i.e., "Well, of course the algorithm *I* use won't have this problem." Marcus Hutter's is the only AI proposal sufficiently rigorous that he should not be able to dodge bullets fired at him in this way. I'd have liked to use Hutter's AIXI as a mathematically clear example of a similar FAI problem, but that would have required far too much space to introduce; and my experience suggests that most AI academics have trouble understanding AIXI, let alone a general academic audience.

You say, "Well, I won't use neural networks like those used by the army." But you have not exhibited any algorithm which does *not* have the problem cited. Nor did you tell your readers to beware of it.
Nor, as far as I can tell from your most recent papers, have you yet understood the problem I was trying to point out. It is a general problem. It is not a problem with the particular neural network the Army was using. It is a problem that people run into, in general, with supervised learning that uses local search techniques to traverse the hypothesis space. The example is there to vividly illustrate this general point, not to warn people against some particular failed neural network algorithm. I don't think it inappropriate to cite a problem that is general to supervised learning and reinforcement, when your proposal is to, in general, use supervised learning and reinforcement. You can always appeal to a "different algorithm" or a "different implementation" that, in some unspecified way, doesn't have the problem. If you have magically devised an algorithm that avoids this major curse of the entire current field, by all means publish it.

> 2. In your section 6.2 you write:
>
> If an AI "hard-wired" to such code possessed the power - and
> [Hibbard, B. 2001. Super-intelligent machines. ACM SIGGRAPH
> Computer Graphics, 35(1).] spoke of superintelligence - would
> the galaxy end up tiled with tiny molecular pictures of
> smiley-faces?
>
> When it is feasible to build a super-intelligence, it will
> be feasible to build hard-wired recognition of "human facial
> expressions, human voices and human body language" (to use
> the words of mine that you quote) that exceed the recognition
> accuracy of current humans such as you and me, and will
> certainly not be fooled by "tiny molecular pictures of
> smiley-faces." You should not assume such a poor
> implementation of my idea that it cannot make
> discriminations that are trivial to current humans.

Oh, so the SI will know "That's not what we really mean." A general problem that AI researchers stumble into, and an attractor which I myself lingered in for some years, is to measure "stupidity" by distance from the center of our own optimization criterion, since all our intelligence goes into searching for good fits to our own criterion. How stupid it seems, to be "fooled" by tiny molecular smiley faces! But you could have used a galactic-size neural network in the Army tank classifier and gotten exactly the same result, which is only "foolish" by comparison to the programmers' mental model of which outcome *they* wanted.

The AI is not given the code, to look it over and hand it back if it does the wrong thing. The AI *is* the code. If the code *is* a supervised neural network algorithm, you get an attractor that classifies most of the instances previously seen. During the AI's youth, it does not have the ability to tile the galaxy with tiny molecular pictures of smiling faces, and so it does not receive supervised reinforcement that such cases should be classified as "not a smile". And once the AI is a superintelligence, it's too late, because your frantic frowns are outweighed by a vast number of tiny molecular smiley-faces.

In general, saying "The AI is super-smart, it certainly won't be fooled by foolish-seeming goal-system failure X" is not, I feel, a good response. I realize that you don't think your proposal destroys the world, but I am arguing that it does. We disagree about this. You put forth one view of what your algorithm does in the real world, and I am putting forth a *different* view in my book chapter.
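To make the general point concrete, here is a deliberately crude toy - my own invention, not your proposal and not the Army's network, with every number made up - of supervised learning by local search on data where the concept the programmers wanted and an accidental regularity coincide exactly:

    # Toy illustration only: gradient-descent logistic regression on invented
    # data.  Each "photo" is [contains_tank, sky_is_cloudy]; in the training
    # set the two features always agree, as in the apocryphal tank story.
    import numpy as np

    X_train = np.array([[1, 1]] * 50 + [[0, 0]] * 50, dtype=float)
    y_train = np.array([1.0] * 50 + [0.0] * 50)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Local search through hypothesis space: plain gradient descent, which
    # stops caring once the training set is classified.
    w = np.zeros(2)
    b = 0.0
    for _ in range(5000):
        p = sigmoid(X_train @ w + b)
        w -= 0.5 * (X_train.T @ (p - y_train)) / len(y_train)
        b -= 0.5 * np.mean(p - y_train)

    print("learned weights:", w)  # "tank" and "cloudy" get exactly equal weight
    # Off the training distribution, the learned concept is not "tank": a tank
    # on a sunny day and an empty field under clouds get identical scores.
    print("tank, sunny:    ", sigmoid(np.array([1.0, 0.0]) @ w + b))
    print("no tank, cloudy:", sigmoid(np.array([0.0, 1.0]) @ w + b))

The training data never contained a case that separated the two features, so nothing in the learner prefers the hypothesis the programmers had in mind over the one they didn't; scaling the learner up does not change that, it only fits the same underdetermined data more exactly.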
As for claiming that "I should not assume such a poor implementation" - well, at that rate, I can claim that all you need for Friendly AI is a computer program. Which computer program? Oh, that's an implementation issue... but then you do seem to feel that Friendly AI is a relatively easy theoretical problem, and the main issue is political.

> 3. I have moved beyond my idea for hard-wired recognition of
> expressions of human emotions, and you should critique my
> recent ideas where they supercede my earlier ideas. In my
> 2004 paper:
>
> Reinforcement Learning as a Context for Integrating AI Research,
> Bill Hibbard, 2004 AAAI Fall Symposium on Achieving Human-Level
> Intelligence through Integrated Systems and Research
> http://www.ssec.wisc.edu/~billh/g/FS104HibbardB.pdf
>
> I say:
>
> Valuing human happiness requires abilities to recognize
> humans and to recognize their happiness and unhappiness.
> Static versions of these abilities could be created by
> supervised learning. But given the changing nature of our
> world, especially under the influence of machine
> intelligence, it would be safer to make these abilities
> dynamic. This suggests a design of interacting learning
> processes. One set of processes would learn to recognize
> humans and their happiness, reinforced by agreement from
> the currently recognized set of humans. Another set of
> processes would learn external behaviors, reinforced by
> human happiness according to the recognition criteria
> learned by the first set of processes. This is analogous
> to humans, whose reinforcement values depend on
> expressions of other humans, where the recognition of
> those humans and their expressions is continuously
> learned and updated.
>
> And I further clarify and update my ideas in a 2005
> on-line paper:
>
> The Ethics and Politics of Super-Intelligent Machines
> http://www.ssec.wisc.edu/~billh/g/SI_ethics_politics.doc

I think that you have failed to understand my objection to your ideas. I see no relevant difference between these two proposals, except that the paragraph you cite (presumably as a potential replacement) is much less clear to the outside academic reader. The paragraph I cited was essentially a capsule introduction of your ideas, including the context of their use in superintelligence. The paragraph you offer as a replacement includes no such introduction. Here, for comparison, is the original cited in AIGR:

> "In place of laws constraining the behavior of intelligent machines, we
> need to give them emotions that can guide their learning of behaviors.
> They should want us to be happy and prosper, which is the emotion we call
> love. We can design intelligent machines so their primary, innate emotion
> is unconditional love for all humans. First we can build relatively simple
> machines that learn to recognize happiness and unhappiness in human facial
> expressions, human voices and human body language. Then we can hard-wire
> the result of this learning as the innate emotional values of more complex
> intelligent machines, positively reinforced when we are happy and
> negatively reinforced when we are unhappy. Machines can learn algorithms
> for approximately predicting the future, as for example investors
> currently use learning machines to predict future security prices. So we
> can program intelligent machines to learn algorithms for predicting future
> human happiness, and use those predictions as emotional values."
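That paragraph proposes exactly the pipeline I am criticizing: learn recognition by supervised learning, hard-wire the result as the reward, and let a more powerful optimizer run on it. Here is the shape of the failure I have in mind, as a toy - again my own caricature, with invented numbers, a trivial linear "recognizer" standing in for whatever learning machinery you actually intend, and hill-climbing standing in for superintelligence:

    # Toy illustration only; all names and numbers invented.
    import numpy as np

    rng = np.random.default_rng(0)
    DIM = 16  # a 16-number stand-in for an image of a face

    # Stage 1: supervised learning of a "happiness recognizer" from the faces
    # the trainers happened to show it.
    happy_proto = rng.normal(size=DIM)
    unhappy_proto = rng.normal(size=DIM)
    faces = np.vstack([happy_proto + 0.3 * rng.normal(size=(50, DIM)),
                       unhappy_proto + 0.3 * rng.normal(size=(50, DIM))])
    labels = np.array([1.0] * 50 + [0.0] * 50)
    w, *_ = np.linalg.lstsq(faces, labels, rcond=None)  # least-squares fit

    def hardwired_reward(x):
        return x @ w  # frozen weights; no human is in this loop any more

    # Stage 2: a more capable optimizer searches for whatever maximizes the
    # frozen reward.  Dumb hill-climbing already suffices to leave the
    # training distribution far behind.
    x = np.zeros(DIM)
    for _ in range(10_000):
        candidate = x + 0.1 * rng.normal(size=DIM)
        if hardwired_reward(candidate) > hardwired_reward(x):
            x = candidate

    print("reward of a real happy face:        ", hardwired_reward(faces[0]))
    print("reward of the optimizer's input:    ", hardwired_reward(x))
    print("distance from nearest training face:",
          np.min(np.linalg.norm(faces - x, axis=1)))

The recognizer does exactly what it was trained to do on the faces it saw. The trouble is that what gets maximized afterward is the recognizer's output, not the happiness of actual humans, and the maximum lies nowhere near the training set - and during the system's youth there is no training signal against such inputs, because it cannot yet produce them.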
If you are genuinely repudiating your old ideas and declaring a Halt, Melt and Catch Fire on your earlier journal article - if you now think your proposed solution would destroy the world if implemented - then I will have to think about that a bit. Your old paragraph does clearly illustrate some examples of what not to do. I wouldn't like it if someone quoted _Creating Friendly AI_ as a clear example of what not to do, but I did publish it, and it is a legitimate example of what not to do. I would definitely ask that it be made clear that I no longer espouse CFAI's ideas and that I have now moved on to different approaches and higher standards; if it were implied that CFAI was still my current approach, I would be rightly offended. But I could not justly *prevent* someone from quoting a published paper, though I might not like it...

But it seems to me that the paragraph I quoted still serves as a good capsule introduction to your approach, even if it omits some of the complexities of how you plan to use supervised learning. I do not see any attempt at all, in your new approach, to address any of the problems that I think your old approach has. However, I could not possibly refuse to include a footnote disclaimer saying that *you* believe this old paragraph is no longer fairly representative of your ideas, and perhaps citing one of your later journal articles, in addition to providing the URL of your response to my criticisms. If you are repudiating any of your old ideas, please say specifically which ones.

If anyone on these mailing lists would like to weigh in with an outside opinion of what constitutes fair practice in this case, please do so.

--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence