In a previous note, here, I wrote that one of the requirements for Generative AI products/services in China is that if they use data that contains personal information, the consent of the holder of the personal information must be obtained. It seems self-evident that this needs to be a requirement. It is also not specific to Generative AI: most non-AI systems also seek consent when they record personal data.
In the case of Generative AI, or AI in general, the personal data consent requirement leads to an interesting problem.
If I provide consent to give, say, my age, home address, and hobbies, these are useful in an AI product/service only if they lead to an action that is a function of those parameters (and probably many others). Simply put, there’s no point asking me for consent for that information if it is not used, and the most likely uses are going to be
- to make a recommendation to me that takes into account my age, home address, and hobbies, and/or
- to observe my actions, in order to infer general properties of others who are similar to me across these parameters, that is, similar in age, living nearby, and/or having similar interests.
After I provide consent, my personal data will be used to compute some new data, let’s call it derived data.
As a side note, if we were to formalize this in a mathematical logic, my consent would lead to new propositions in the knowledge base, and the closure of the knowledge base would change. See, for example, the classic, here: Alchourrón, Carlos E., Peter Gärdenfors, and David Makinson. “On the logic of theory change: Partial meet contraction and revision functions.” The Journal of Symbolic Logic 50.2 (1985): 510-530. Or you can read the entry on the Logic of Belief Revision, Stanford Encyclopedia of Philosophy, here.
In simpler terms, my consent adds new data as input to AI, and AI computes something new as a result. The input data increased, and computed data increased.
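To make the idea of derived data concrete, here is a minimal sketch in Python. It treats consented data as propositions and computes the closure of a small knowledge base under if-then rules. The rules, the proposition names, and the `closure` function are all assumptions made up for illustration; a real AI product would compute derived data in far less transparent ways.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule says: if all premises are known, the conclusion is known too.
Rule = Tuple[FrozenSet[str], str]


def closure(facts: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Return every proposition reachable from `facts` by applying `rules`."""
    known = set(facts)
    rule_list: List[Rule] = list(rules)
    changed = True
    while changed:  # repeat until no rule adds anything new
        changed = False
        for premises, conclusion in rule_list:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, invented for this example only.
RULES: List[Rule] = [
    (frozenset({"age:35", "hobby:cycling"}), "segment:adult_cyclists"),
    (frozenset({"segment:adult_cyclists", "address:Brussels"}),
     "recommend:local_bike_routes"),
]

# Before consent the knowledge base holds nothing about me.
before = closure(set(), RULES)

# After consent, three new propositions enter as input.
consented = {"age:35", "address:Brussels", "hobby:cycling"}
after = closure(consented, RULES)

# Derived data: what is in the closure but was never provided directly.
derived = after - consented
print(sorted(derived))
```

Three propositions went in, and two more came out that I never provided myself. Those two are the derived data.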
What if I now retract my consent, which I should normally be able to do: does that remove only the data I consented to share, or also the data that was generated through computations on the data I consented to provide?
Removing only the data I consented to provide, versus also removing the data that was computed from it, are two different problems, both for AI product design and for product operation. The former does not necessarily require retraining the AI; the latter does.
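The difference between the two kinds of retraction can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the `Store` class, its field names, and the example records are hypothetical, and it assumes every piece of derived data is stored as a separate record that remembers which inputs it was computed from. A trained model does not keep such per-record provenance, which is why the second kind of retraction tends to mean retraining.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Store:
    # Personal data I consented to share: key -> value.
    consented: Dict[str, str] = field(default_factory=dict)
    # Derived records: name -> names of the inputs they were computed from
    # (inputs may be consented keys, other derived records, or other data).
    derived: Dict[str, Set[str]] = field(default_factory=dict)

    def retract_inputs_only(self, keys: Set[str]) -> None:
        """First option: delete the consented data, keep everything derived."""
        for key in keys:
            self.consented.pop(key, None)

    def retract_with_derived(self, keys: Set[str]) -> Set[str]:
        """Second option: also delete whatever was computed from that data."""
        self.retract_inputs_only(keys)
        tainted = set(keys)
        removed: Set[str] = set()
        changed = True
        while changed:  # follow dependencies until nothing new is tainted
            changed = False
            for name, sources in list(self.derived.items()):
                if sources & tainted:
                    del self.derived[name]
                    tainted.add(name)
                    removed.add(name)
                    changed = True
        return removed


def example_store() -> Store:
    # Hypothetical contents, invented for this example only.
    return Store(
        consented={"age": "35", "home_address": "Brussels", "hobbies": "cycling"},
        derived={
            "segment": {"age", "hobbies"},
            "recommendation": {"segment", "home_address"},
            "catalog_stats": {"catalog"},  # not computed from my data
        },
    )


first = example_store()
first.retract_inputs_only({"age"})
print(sorted(first.derived))   # all three derived records are still there

second = example_store()
removed = second.retract_with_derived({"age"})
print(sorted(removed))         # records that depended on my age
print(sorted(second.derived))  # what survives
```

In the first case my age is gone but the segment and the recommendation computed from it remain. In the second case both are removed, and only the record that never depended on my data survives.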