5 Comments
Moralla W. Within

Claude rekt. Nice post! I agree with your view of well-being, and I think that to the extent that it’s counterintuitive, it’s just because humans happen to have really weird preferences that make it look like there’s something more (e.g. prudential constraints) going on.

Hans P. Niemand

You might find [this paper](https://philpapers.org/rec/DORPAP-4) helpful. I'll summarize the important point. He is asking roughly the same question as you: when, if ever, do you have a well-being-based reason to change your preferences? And he has roughly the same answer as you: never (I can't remember if he allows some exceptions; it's been a while since I read it all the way through).

The part I think you'll find interesting is that he makes two distinctions in how to frame a theory of well-being-based reasons. First, you might think that what increases your well-being is either (1) the state of having a satisfied preference (state-based preferences), or (2) the *object* of a preference (object-based preferences). Second, you might think that what gives you reasons is either (1) increases in your "well-being score" (i.e. the total number of satisfied preferences, or the ratio of satisfied to unsatisfied preferences, or something like that) (score-based prudential reasons), or (2) the particular welfare goods themselves (goods-based prudential reasons).

He argues that if you take the second option on both of these distinctions, then you don't have prudential reason to change your preferences. Basically, the reason it seems like changing your preferences so that they're easier to satisfy, or so that they're already satisfied, could be good is that we're reasoning like this: if I change my preferences so that I want what I already have, then I will have a bunch of satisfied preferences (state-based preferences), and in that case I'll have more satisfied preferences than I have now (score-based prudential reasons). But if you don't assume state-based preferences and score-based prudential reasons, this line of thinking no longer goes through. If what increases your well-being is getting the particular *things you want* (rather than just having a satisfied preference), and what gives you prudential reasons is getting *particular things that are good for you* (rather than increasing your overall well-being score), then you no longer have any reason to change your preferences.

Jack Thompson

Yes, this is exactly what I'm trying to get across! I knew this wasn't a novel point, as there are papers in the AI alignment literature on this, and Yudkowsky's written about it at least indirectly. I'll have to give that paper a look :)

Ali Afroz

It occasionally seems to parrot arguments without really understanding them, but all in all I’m pretty surprised how good a philosopher the AI turned out to be. A few points where your arguments seem weak appear worth flagging, although honestly I agree with your conclusion that wireheading is bad for you from a preference-satisfaction lens, at least if by preference satisfaction you mean the fulfilment of your own preferences rather than being put into a state of desire fulfilment.

Your argument that a superintelligence could know all the facts and still undergo no change in its motivation is entirely correct, but it’s not clear whether it practically applies to humans, since it seems entirely possible that a human who underwent the experience, even after it was reversed, would conclude that wireheading is in fact great and that you are making a mistake in thinking you want to examine art rather than just feel your desires fulfilled, even if you achieve that by altering your preferences. After all, the depressed person also probably thinks that they have a preference in favour of not being happy and that taking their medicine would alter their preferences. Human motivation is pretty messy and incoherent, so it wouldn’t surprise me if knowing all the facts drastically altered someone's preferences, and I expect the result would also be affected by things like the order in which they learn the facts.

Again, I agree with you that, given my current preferences, I don’t want to be modified into an entity that just wants easy things in order to feel fulfilled in my desires. I just think it’s possible that I am mistaken about my preferences, and there is some possibility that what I actually value is desire fulfilment for its own sake. Although realistically, it’s more likely that I’m just incoherent on the point and, depending on my experiences, could be persuaded to change my mind.

Regarding the point about machines not being sufficiently different from the ordinary process of preference formation: it’s true that you care about your preferences no matter how you got them. But suppose you went to a different culture and, after spending sufficient time there, your preferences about what kind of art you enjoy were altered. I think it would be an unusual human who would object to this particular form of preference modification, but then the question arises: what difference is there between this example and a machine altering you to have the preferences you would end up with by going to the other culture? You can have preferences about how your preferences get altered, but this one doesn’t seem like something you would necessarily endorse on reflection.

Jack Thompson

I went to see David Chalmers talk about LLMs in early 2024, and he was of the opinion that Claude was the best at philosophy. I think it's still not as clever as many of the writers here, but it's pretty good, and it definitely folds less easily than ChatGPT.

"I just think it’s possible that I am mistaken about my preferences, and there is some possibility that what I actually value is desire fulfilment for its own sake."

Me too! I would love to find out about what I actually value. It would be awesome news for me to discover that I deep down value desire fulfilment.
