Limits of Explainability in AI Built Using Statistical Learning
How good an explanation can Artificial Intelligence built using statistical learning methods provide? This note is slightly more complicated than my usual ones.
In logic, conclusions are computed from premises by applying well-defined rules. When a conclusion is the appropriate one, given the premises and the rules, it is said that a logical consequence relation holds from the premises to the conclusion. In other words, if A is a set of premises, B is a conclusion, and there is a logical consequence relation from A to B, then it is possible to determine the steps that lead from A to B in that formal logic, together with any other information or statements that may be needed to get from A to B.
In formal logic, an explanation consists of providing the steps taken to compute the conclusion from the given premises: explanations are proofs.
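To make this concrete, here is a minimal sketch of my own (the premises and rules are invented for illustration): forward chaining over simple propositional rules, where the derivation trace returned alongside the conclusions plays the role of the proof, and hence of the explanation.

```python
# Minimal sketch (illustrative only): forward chaining over propositional
# rules. The trace of rule applications is the proof of each conclusion
# from the premises, i.e. the explanation in the sense used above.

def forward_chain(premises, rules):
    """premises: set of atoms; rules: list of (antecedents, consequent) pairs."""
    known = set(premises)
    trace = []  # each step records which antecedents produced which conclusion
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            if consequent not in known and all(a in known for a in antecedents):
                known.add(consequent)
                trace.append((antecedents, consequent))
                changed = True
    return known, trace

# Hypothetical premises and rules.
premises = {"it_rains"}
rules = [({"it_rains"}, "ground_is_wet"),
         ({"ground_is_wet"}, "shoes_get_muddy")]

conclusions, proof = forward_chain(premises, rules)
for antecedents, consequent in proof:
    print(f"{sorted(antecedents)} -> {consequent}")
# Every conclusion can be traced back to the premises through
# explicitly stated, intuitively interpretable rules.
```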
In informal logic (see Walton [1]), an explanation will consist of all arguments that support the conclusion.
In formal argumentation frameworks [2], the explanation for an acceptable argument will be the valuation of all arguments linked to it through attack relations.
In all three kinds of frameworks above, an explanation will consist of a set of statements and relations over statements. Relations will have some intuitive interpretation, such as “if A then B”, or “A supports B”, or “A attacks B”, and so on. All of these meet many of the conditions for good explanations – see my note on what explanations are, here.
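For the argumentation-framework case, for instance, the status of an argument can be computed by iterating the characteristic function over the attack relation. The sketch below is my own illustration, not code from [2], and the arguments and attacks are hypothetical.

```python
# Minimal sketch (illustrative only): grounded extension of a Dung-style
# argumentation framework, computed as the least fixed point of the
# characteristic function. The accepted set, together with the attack
# relation, is what explains why a given argument is acceptable.

def grounded_extension(arguments, attacks):
    """attacks: set of (attacker, attacked) pairs."""
    def defended(candidate_set):
        # An argument is acceptable w.r.t. candidate_set if every attacker
        # is itself attacked by some member of candidate_set.
        result = set()
        for a in arguments:
            attackers = {x for (x, y) in attacks if y == a}
            if all(any((d, b) in attacks for d in candidate_set) for b in attackers):
                result.add(a)
        return result

    extension = set()
    while True:
        nxt = defended(extension)
        if nxt == extension:
            return extension
        extension = nxt

arguments = {"A", "B", "C"}
attacks = {("A", "B"), ("B", "C")}   # A attacks B, B attacks C
print(grounded_extension(arguments, attacks))  # {'A', 'C'}: B is defeated, C is defended by A
```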
If an AI is built through any of the above frameworks, that is, if it performs inference from premises to a conclusion by applying some limited set of rules to statements, and if both the rules and the statements they are applied to are understandable to people, then it is clear what an explanation is in such an AI, and what it means for the AI to produce an explanation for given premises and a conclusion.
If an AI is built through statistical learning, it will relate statements through parameters estimated over data. These parameters may not have an intuitive interpretation. The bigger the underlying set of parameters, the less likely it is that they individually have intuitive meanings.
At best, as discussed for example in [3], a “local” “explanation” can be provided, where the conclusion / output is explained by fitting a simpler statistical model that includes the few parameters that played the most important role in leading from the premises / inputs to the conclusion. It is not clear how well this applies when the number of parameters is in the billions or more; GPT-4 is reported to have about 1.76 trillion parameters.
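The sketch below shows what such a local explanation can look like in the spirit of [3], though it is not the authors’ implementation: the black-box model, the sampling scale, and the number of reported features are assumptions made purely for illustration.

```python
# Minimal sketch (illustrative, not LIME itself): approximate an opaque model
# around one input with a proximity-weighted linear model, and report the most
# influential features as the "local explanation".

import numpy as np

rng = np.random.default_rng(0)

def black_box(X):
    # Hypothetical opaque model standing in for a statistically trained AI;
    # the explainer only ever queries it for outputs.
    return 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * np.sin(5.0 * X[:, 2])

def local_explanation(x, n_samples=500, scale=0.1, top_k=2):
    # Perturb the input, query the black box, and fit a weighted least-squares
    # model; its largest coefficients serve as the local explanation.
    X = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = black_box(X)
    weights = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * scale ** 2))
    sw = np.sqrt(weights)
    A = np.hstack([X, np.ones((n_samples, 1))])        # intercept column
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    ranked = np.argsort(-np.abs(coef[:-1]))[:top_k]    # ignore the intercept
    return [(int(i), float(coef[i])) for i in ranked]

x0 = np.array([0.5, -1.0, 0.2])
print(local_explanation(x0))   # roughly: feature 0 with weight ~3, feature 1 with ~-2
```

The output is a handful of feature weights, not a chain of interpretable statements and rules, which is exactly the contrast drawn next.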
Explanations as in logic are not feasible with AI built using statistical learning methods. Consequently, an explanation, in the sense of meeting the following criteria that scientific explanations tend to aim for (see the note here), cannot be produced by an AI such as those based on Large Language Models.
- Logical form: An explanation’s logical form is that of an argument; it is a pair of one or more propositions, called explanans (the premises of the argument), and another proposition, called explanandum (the conclusion of the argument).
- Law-like premises: One or more premises in the explanation must state natural laws.
- Empirical evidence: There must be empirical evidence for the truth of the premises in an explanation.
- Causality: Explanations refer to causal relationships between the phenomena that the premises describe and the phenomenon that the conclusion describes, which is the phenomenon being explained.
- Minimality: Premises should be necessary and sufficient to explain (in the sense above) the phenomenon.
- Unification: As much as possible, an explanation should refer to and build on existing explanations; this may be by relating the same primitives, reusing premises, and so on.
- Intelligibility: One’s understanding of an explanation depends on what one knows and assumes when consuming the explanation, what one wants to do having learned it, and the various other factors which determine context. An explanation needs to be formulated in a way that accounts for the context in which it will be used, and it will be considered intelligible if those who use it can, once they recognize the phenomena described in the premises, also anticipate and recognize the phenomena that the explanandum is about.
AI regulation cannot address this problem, as is visible in the Algorithmic Accountability Act (see the note here). The Act requires good data governance, impact assessment of the usage of AI, and some degree of transparency in the design of an AI system, but none of that removes the underlying risk: the system is unpredictable precisely because it is, by design, incapable of providing explanations.
The appeal of logic is that it is possible to trace the conclusion back to all possible combinations of premises that could have generated it. In turn, it makes it easier to trust a system that produces inferences in such a way. It does not mean that the system will produce the right conclusions, but it ensures that whichever conclusions it delivers, we can go back and audit exactly how those were computed from intuitively interpretable statements and rules.
That approach has a problem of its own: AI systems built on formal logic cannot outperform AI systems made through statistical learning from large-scale training data. We can have trustworthy but simple reasoning, or complex automated reasoning that is hard to trust.
References
- Walton, Douglas. Informal Logic: A Pragmatic Approach. Cambridge University Press, 2008.
- Dung, Phan Minh. “On the acceptability of arguments and its fundamental role in nonmonotonic reasoning, logic programming and n-person games.” Artificial Intelligence 77.2 (1995): 321-357.
- Ribeiro, Marco Tulio, Sameer Singh, and Carlos Guestrin. “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier.” Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.