5 Comments
Silas Abrahamsen

I really liked this (and wanted to read it)! And I think your analysis is very compelling!

As for the value question, this has given me a lot to think about, but just as an initial reaction, I feel myself pulled in two directions:

On the one hand, your wire-heading examples are quite intuitively moving, and I share your feeling that I would never want to change my terminal values.

On the other, there is something incredibly strange to me about how much weight this puts on procedure. If I start out having relationships as a terminal value, it would be bad for me to be wire-headed into having hermitage as a terminal value. But if I started off with the reverse preferences, the reverse change would also be bad.

Suppose I am about to have a child, and I can choose what he will be like. I can either choose to have hermit-Silas Jr. or relationship-Silas Jr. Suppose also that life would be somewhat easier for hermit-Silas Jr. In that case, it seems like I should choose to have hermit-Silas Jr., as his life would be better due to his terminal values being more easily attained or something. However, if I already had relationship-Silas Jr., it would be bad for me to wire-head him into hermit-Silas Jr. That is really strange to me!

We do see these sorts of structures in theories of action (e.g. with deontic side-constraints), but it is a lot stranger to me to have them in a theory of value.

In any case this is very good!

(Also just a fun sidenote: I actually think the view you seem to be gesturing at fits better with the (admittedly imprecise) idea that I shouldn't care whether my life is objectively bad. If I should sometimes be wire-headed despite not wanting to, that seems like a case where I should in some sense care about something I don't--whereas if what matters is my current terminal values, what matters is just what I care about (in a more direct sense)).

Jack Thompson

I agree on both counts. I don't usually think about the value of a state as path-dependent, and that does feel weird. And your sidenote is why I no longer like the title of my last post: it is *precisely* my preferences, in the form of values, that make it bad that I get modified in certain ways!

Steffee

I think bodily autonomy is an important thing for society to protect.

So I think the weird part about accidental wireheading is that it undoubtedly violates a person's autonomy... but then seems to immediately undo the violation?

If the wireheading was forced onto you by someone else, it would be a straightforward case of assault, of mindrape.

But when it's accidental, it sort of feels like no harm, no foul? But why should the badness of this event depend on who or what caused the event? That's not how morality normally works.

One possible resolution to this tension would be to argue that mindrape isn't that bad. But that sounds wild. So I don't know.

Linch

I think I used to relate to terminal values ~the same way as you do. I no longer do.

I think I have several objections. The first objection is game-theoretic. In certain situations, becoming an agent that terminally values vengeance (or reciprocity) may yield greater utility according to my other values than being an agent that only values vengeance when the reputational benefits exceed the costs. This is because an agent that only instrumentally values vengeance might have "one thought too many" when it comes to vengeance. So (e.g.) you can be exploited by other agents who predict that you won't seek vengeance on them, because they judge that after being aggrieved you'd realize that the costs of retaliating outweigh the long-term benefits of your credible commitments.
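To make the incentive structure concrete, here's a toy sketch in Python (the payoff numbers are made up purely for illustration, not anything from the post): an exploiter who can read your values attacks only if it predicts you won't retaliate, and only the terminally vengeful agent credibly retaliates once the harm is already done.

```python
# Toy model of the vengeance/commitment point. Payoff numbers are invented
# for illustration only.

RETALIATION_COST = 2    # cost the victim pays to actually carry out vengeance
EXPLOIT_HARM = 5        # harm the victim suffers if exploited

def exploiter_attacks(victim_will_retaliate: bool) -> bool:
    """With transparent values, the exploiter attacks only if it predicts no retaliation."""
    return not victim_will_retaliate

def victim_payoff(terminally_vengeful: bool) -> int:
    # An instrumentally-vengeful agent, once already wronged, sees retaliation
    # as a pure cost going forward and so doesn't retaliate; a terminally-
    # vengeful agent does, and that prediction is what deters the exploiter.
    will_retaliate = terminally_vengeful
    if exploiter_attacks(will_retaliate):
        return -EXPLOIT_HARM - (RETALIATION_COST if will_retaliate else 0)
    return 0

print("terminally vengeful:", victim_payoff(True))       # 0  -> never exploited
print("instrumentally vengeful:", victim_payoff(False))  # -5 -> exploited
```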

The second objection is a generalization of the first to other multi-agent dilemmas with transparent values. For example, some parties may only agree to contracts with you if you demonstrate that you terminally value fulfilling contracts (or a close proxy like keeping promises) at >1%.

Note that while these examples require transparent values, I consider them much more realistic than "arbitrary predictor problems," as you put it.

The third objection comes from the opposite of transparent values. Our current terminal values, for most of us, are murky not only to others but to ourselves. While idealized agents may have a clean separation of terminal and instrumental values, I'm not convinced that humans do. (And indeed, empirically I observe many people either being confused about their values, or being very quickly certain about their values in ways that I think are not tracking important epistemic processes.) Thus, I think it is bad to prematurely "lock in" what you perceive to be your terminal values, as much of the difficulty is figuring out what those values are in the first place, consistent or otherwise.

Jack Thompson

Re: game theory, yeah, I think I mentioned this in footnote two. Ultimately this only matters if others are actually looking at your *motives* and not just your *actions*, because otherwise you could just act the way a vengeful person would act. In the case where they look at your motives, it's trivially true that there can be reason to self-modify *any* part of you. One way to distinguish terminal values even in this case is that, if possible, you would love to set up some delayed precommitment mechanism such that your terminal values will be restored after predictions are made.

Re: murkiness, I absolutely agree that people can be confused and that lock-in is very bad. This is why I probably wouldn't self-modify as drastically as I suggested in the post. I still think we can try to approximate the idealized agent, in the same way that nobody can do utilitarian calculus for real (it would require computing all downstream causes & effects), but utilitarianism can still give directions as to how to act, and those directions might be more prudential/deontological than the ones an ideal utilitarian agent would follow.
