Discussion about this post

Silas Abrahamsen:

I really liked this (and wanted to read it)! And I think your analysis is very compelling!

As for the value question, this has given me a lot to think about, but just as an initial reaction, I feel myself pulled in two directions:

On the one hand your wire-heading examples are quite intuitively moving, and I share your feeling that I would never want to change my terminal values.

On the other, there is something incredibly strange to me about how much weight this puts on procedure. If I start out having relationships as a terminal value, it would be bad for me to be wire-headed into having hermitage as a terminal value. But if I started off with the reverse preferences, the reverse change would also be bad.

Suppose I am about to have a child, and I can choose what he will be like. I can either choose to have hermit-Silas Jr. or relationship-Silas Jr. Suppose also that life would be somewhat easier for hermit-Silas Jr. In that case, it seems like I should choose to have hermit-Silas Jr., as his life would be better due to his terminal values being more easily attained or something. However, if I already had relationship-Silas Jr., it would be bad for me to wire-head him into hermit-Silas Jr. That is really strange to me!

We do see these sorts of structures in theories of action (e.g. with deontic side-constraints), but it is a lot stranger to me to have them in a theory of value.

In any case this is very good!

(Also just a fun sidenote: I actually think the view you seem to be gesturing at fits better with the (admittedly imprecise) idea that I shouldn't care whether my life is objectively bad. If I should sometimes be wire-headed despite not wanting to, that seems like a case where I should in some sense care about something I don't; whereas if what matters is my current terminal values, what matters is just what I care about (in a more direct sense)).

Linch:

I think I used to relate to terminal values ~the same way as you do. I no longer do.

I think I have several objections. The first objection is game theoretic. In certain situations, becoming an agent that terminally values vengeance (or reciprocity) may yield greater utility according to my other values than being an agent that only values vengeance when the reputational benefits exceed the costs. This is because an agent that only instrumentally values vengeance might have "one thought too many" when it comes to vengeance. So (e.g.) you can be exploited by other agents who predict that you won't seek vengeance on them, because they expect that, after being aggrieved, you'd realize that following through costs more than the long-term benefits of your credible commitments are worth.
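A minimal payoff sketch of this first objection, with a transparent-values aggressor deciding whether to exploit (all payoff numbers, names, and the one-shot setup are my own illustrative assumptions, not anything from the post or comment):

```python
# Sketch: an agent who terminally values vengeance retaliates unconditionally,
# which deters a transparent-values aggressor; an agent who only instrumentally
# values vengeance re-calculates after the fact, declines to retaliate, and so
# gets exploited. All numbers below are made up for illustration.

ATTACK_HARM = -5          # harm the victim suffers if attacked
RETALIATION_COST = -2     # extra cost to the victim of carrying out revenge
ONE_SHOT_REP_BENEFIT = 1  # reputational gain from retaliating (assumed small)

def retaliates(terminal_vengeance: bool) -> bool:
    """Does the victim follow through on revenge after being attacked?"""
    if terminal_vengeance:
        return True  # vengeance is valued for its own sake: no recalculation
    # The instrumental valuer weighs the act only by its downstream payoff
    # ("one thought too many"): here the reputational gain doesn't cover the cost.
    return ONE_SHOT_REP_BENEFIT + RETALIATION_COST > 0  # False with these numbers

def aggressor_attacks(terminal_vengeance: bool) -> bool:
    """A transparent-values aggressor attacks only if revenge won't follow."""
    return not retaliates(terminal_vengeance)

def victim_payoff(terminal_vengeance: bool) -> int:
    if not aggressor_attacks(terminal_vengeance):
        return 0  # deterrence succeeds; no attack happens
    payoff = ATTACK_HARM
    if retaliates(terminal_vengeance):
        payoff += RETALIATION_COST + ONE_SHOT_REP_BENEFIT
    return payoff

print(victim_payoff(terminal_vengeance=True))   # 0: the attack is deterred
print(victim_payoff(terminal_vengeance=False))  # -5: exploited
```

The numbers encode the familiar commitment structure: retaliation is costly ex post (the terminal valuer would do worse if actually attacked), but being the kind of agent who retaliates anyway means the attack never comes.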

The second objection is a generalization of the first to other multi-agent dilemmas with transparent values. For example, some parties may only agree to contracts with you if you demonstrate that you terminally value fulfilling contracts (or a close proxy like keeping promises) at more than some threshold weight (say >1%).

Note that while these examples require transparent values, I consider them much more realistic than "arbitrary predictor problems" as you put it.

The third objection comes from the opposite of transparent values. Our current terminal values, for most of us, are murky not only to others but to ourselves. While idealized agents may have a clean separation of terminal and instrumental values, I'm not convinced that humans do. (And indeed, empirically I observe many people either being confused about their values, or being very quickly certain about their values in ways that I don't think track important epistemic processes.) Thus, I think it is bad to prematurely "lock in" what you perceive to be your terminal values, as much of the difficulty is figuring out what those values are in the first place, consistent or otherwise.

