Can an AI's Decision Autonomy Be Distinguished from Malfunction?

I wrote in another note (here) that AI cannot decide autonomously because it does not have self-made preferences. I argued that its preferences are always a reflection of those its designers wanted it to exhibit, or of patterns in its training data.

The irony of this argument is that if an AI makes decisions which cannot be explained through human preferences, the only conclusion we can draw is that the AI is malfunctioning, never that it has decision autonomy.

For the sake of discussion, suppose that some firm releases an AI built in a fundamentally different way from current AI. Let's say that this AI does, in fact, have its own self-made preferences.

If this AI made decisions which can be explained in terms of human preferences, that is, decisions which resemble meaningful human behavior, then the conclusion would be that it was designed to apply those preferences. Given such an explanation, it would be easier to believe that the preferences it reveals are those of its designers than that it has decision autonomy.

If the same AI made decisions which could not be explained by human preferences, then the simplest explanation would be that it is malfunctioning, not that it has decision autonomy.

If, instead, we wanted to claim that it is not malfunctioning but has decision autonomy, we would need to invent some new notion of rational behavior, one specific not to people but to AI.

In the Terminator movies, Skynet, the fictional AI, makes decisions to preserve its own existence. In the context of this discussion, that behavior is explainable through human preferences: there is no need to believe that Skynet decides autonomously. The simpler explanation is that it was designed to have those preferences, not that it developed them out of nowhere.