From durant@ilm.com Thu May 15 05:35:26 2003
Date: Wed, 14 May 2003 17:18:39 -0700
From: Durant Schoon
Reply-To: agi@v2.listbox.com
To: sl4@sl4.org
Cc: agi@v2.listbox.com
Subject: [agi] Re: SIAI's flawed friendliness analysis

Hi Bill,

...

> The SIAI guidelines involve digging into the AI's reflective thought
> process and controlling the AI's thoughts, in order to ensure safety.
> My book says the only concern for AI learning and reasoning is to
> ensure they are accurate, and that the teachers of young AIs be
> well-adjusted people (subject to public monitoring and the same kind
> of screening used for people who control major weapons). Beyond that,
> the proper domain for ensuring AI safety is the AI's values rather
> than the AI's reflective thought processes.

Two words: "Value Hackers"

Let us try to understand why Eliezer chooses to focus on an "AI's
reflective thought processes" rather than on explicitly specifying an
"AI's values". Look at it this way: if you could, wouldn't you rather
develop an AI which could reason about *why* the values are the way
they are, instead of just having the values carved in stone by a
programmer? This is *safer* for one very important reason: the values
are less easily corrupted, since an AI that actually understands the
sources of those values can reconstruct them from basic principles and
from information about humans and the world in general. The ability to
re-derive these values in the face of a changing external environment
and an interior mindscape-under-development is in fact *paramount* to
the preservation of Friendliness.

As mentioned before, this all hinges on the ability to create an AI in
the first place that can understand "how and why values are created",
as well as what humans are, what itself is, and what the world around
us is. Furthermore, this insight instructs us as to what the design of
an AI should look like. Remembering that our goal is to build better
brains, the whole notion of the SeedAI bootstrap is to get an AI that
builds a better AI. We must then ask ourselves: who should be in charge
of designating this new entity's values? The answer is "the smartest,
most capable thinker who is the most skilled in these areas". At some
point that thinker is the AI. From the get-go we want the AI to be
competent at this task. If we cannot come up with a way to ensure this,
we should not attempt to build a mind in the first place (this is
Eliezer's view. I happen to agree with it. Ben Goertzel's and Peter
Voss's opposing views have been noted previously as well(*)).

In summary, we need to build the scaffolding of a *deep* understanding
of values and their derivations into the design of an AI if we are to
have a chance at all. The movie 2001 is already an example, in the
popular mind, of what can happen when this is not done. We cannot think
of everything in advance. We must build a mind that does the "right"
thing no matter what happens.

---

Considering your suggestion that the AI *only* be concerned with having
accurate thoughts: I haven't read your book, so I don't know your
reasoning for this. I can imagine that it's an *easier* way to do
things. You don't have to worry about the hard problem of where values
come from, which ones are important, and how to preserve the right ones
under self-modification. Easier is not better, obviously, but I'm only
guessing here why you might hold "ensuring accuracy" in higher esteem
than other goals of thinking (like preserving Friendliness).
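To make the "value hackers" worry a little more concrete, here is a
minimal toy sketch in Python (everything in it -- the class names, the
"principles", the derive_values() stand-in -- is hypothetical, invented
purely for illustration and not anyone's actual design): an agent whose
values are opaque constants cannot even notice when they have been
tampered with, while an agent that can re-derive its values from more
basic principles can at least detect the tampering and repair it.

# Illustrative toy only -- not SIAI's actual design, just the contrast
# described above: values stored as opaque constants vs. values the
# agent can re-derive from more basic "principles".

from dataclasses import dataclass, field


@dataclass
class HardCodedAgent:
    """Values carved in stone by the programmer."""
    values: dict = field(default_factory=lambda: {"protect_humans": 1.0})

    def check_integrity(self) -> bool:
        # There is no deeper source to compare against, so a hacked
        # value looks exactly like a legitimate one.
        return True


@dataclass
class DerivingAgent:
    """Values kept alongside the principles they were derived from."""
    principles: dict = field(default_factory=lambda: {
        "humans_are_sentient": True,
        "suffering_is_bad": True,
    })
    values: dict = field(default_factory=dict)

    def derive_values(self) -> dict:
        # Hypothetical stand-in for real moral reasoning: rebuild the
        # value set from the agent's model of humans and the world.
        derived = {}
        if (self.principles.get("humans_are_sentient")
                and self.principles.get("suffering_is_bad")):
            derived["protect_humans"] = 1.0
        return derived

    def check_integrity(self) -> bool:
        # A "value hacker" edit to self.values is detectable, because
        # the values can be reconstructed and compared.
        return self.values == self.derive_values()

    def repair(self) -> None:
        self.values = self.derive_values()


if __name__ == "__main__":
    a = HardCodedAgent()
    a.values["protect_humans"] = -1.0      # silent corruption
    print(a.check_integrity())             # True -- nothing is noticed

    b = DerivingAgent()
    b.repair()
    b.values["protect_humans"] = -1.0      # the same attack
    print(b.check_integrity())             # False -- corruption detected
    b.repair()
    print(b.values)                        # {'protect_humans': 1.0}

The real problem is of course enormously harder than comparing two
dictionaries; the point is only that "values plus the ability to
re-derive them" is more robust than "values alone".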
(*) This may be the familiar topic on this list of "when" to devote
your efforts to Friendliness, not "if". This topic has already been
discussed exhaustively, and I would say it comes down to how one
answers certain questions: "How cautious do you want to be?" and "How
seriously do you consider the possibility that a danger could arise
quickly, without a chance to correct problems on a human time scale of
thinking and action?"

> In my second and third points I described the lack of rigorous
> standards for certain terms in the SIAI Guidelines and for initial AI
> values. Those rigorous standards can only come from the AI's values.
> I think that in your AI model you feel the need to control how they
> are derived via the AI's reflective thought process. This is the
> wrong domain for addressing AI safety.

Just to reiterate: with SeedAI, the AI becomes the programmer, the
gatekeeper of modifications. We *want* the modifier of the AI's values
to be superintelligent, better than all humans at that task, more
trustworthy, able to do the right thing better than any
best-intentioned human. Admittedly, this is a tall order, and it is the
order Eliezer is trying to fill.

If you are worried about rigorous standards, perhaps Eliezer's proposal
of Wisdom Tournaments would address your concern. Before the first line
of code is ever written, I'm expecting Eliezer to expound upon these
points in sufficient detail.

> Clear and unambiguous initial values are elaborated in the learning
> process, forming connections via the AI's simulation model with many
> other values. Human babies love their mothers based on simple values
> about touch, warmth, milk, smiles and sounds (happy Mother's Day).
> But as the baby's mind learns, those simple values get connected to a
> rich set of values about the mother, via a simulation model of the
> mother and surroundings. This elaboration of simple values will
> happen in any truly intelligent AI.

Why will this elaboration happen? In other words, if you have a design,
it should not only convince us that the elaboration will occur, but
also that it will be done in the right way and for the right reasons.
Compromising any one of those could have disastrous effects for
everyone.

> I think initial AI values should be for simple measures of human
> happiness. As the AI develops these will be elaborated into a model
> of long-term human happiness, and connected to many derived values
> about what makes humans happy generally and particularly. The subtle
> point is that this links AI values with human values, and enables AI
> values to evolve as human values evolve. We do see a gradual
> evolution of human values, and the singularity will accelerate it.

I think you have good intentions. I appreciate your concern for doing
the right thing and helping us all along on our individual goals to be
happy(**), but if the letter of the law is upheld and there is no true
comprehension of *why* the law is the way it is, we could all end up
invaded by nano-serotonin-reuptake-inhibiting-bliss-bots.

I think Eliezer takes this great idea you mention, of guiding the AI to
have human values and to evolve with human values, one step further.
Not only does he propose that the AI have these human(e) values, but he
insists that the AI know *why* these values are good ones, what good
"human" values look like, and how to extrapolate them properly (in the
way that the smartest, most ethical human would), should the need
arise.

Additionally, we must consider the worst case: that we cannot control a
rapid ascent when it occurs.
In that scenario we want the AI to be ver own guide, maintaining and,
as necessary, extending ver morality under rapid, heavy mental
expansion and reconfiguration. Should we reach that point, the
situation will be out of human hands. No regulatory guidelines will be
able to help us. Everything we know and cherish could depend on
preparing for that possible instant.

(**) Slight irony implied, but full, sincere appreciation bestowed. I
consider this slightly ironic since I view happiness as a signal that
confirms a goal was achieved, rather than as a goal in and of itself.

> Morality has its roots in values, especially social values for shared
> interests. Complex moral systems are elaborations of such values via
> learning and reasoning. The right place to control an AI's moral
> system is in its values. All we can do for an AI's learning and
> reasoning is make sure they are accurate and efficient.

I'll agree that the values are critical linchpins, as you suggest, but
please do not lose sight of the fact that these linchpins are part of a
greater machine with many interdependencies and exposure to an
external, possibly malevolent, world.

The statement "All we can do for an AI's learning and reasoning is make
sure they are accurate and efficient" seems limiting to me, in the
light of Eliezer's writings. If we can construct a mind that will solve
this most difficult of problems (extreme intellectual ascent while
preserving Friendliness) for us and forever, then we should aim for
nothing less. Indeed, not hitting this mark is a danger that people on
this list take quite seriously.

Thank you for your interest and for joining the discussion.

--
Durant Schoon