Private Data Use Consent as a Generative AI Compliance Requirement

In a previous note, I wrote that one of the requirements for Generative AI products and services in China is that, if they use data containing personal information, the consent of the holder of that information must be obtained. It seems self-evident that this should be a requirement. It is also not specific to Generative AI: most non-AI systems seek consent when they record personal data.

In the case of Generative AI, or AI in general, the personal data consent requirement leads to an interesting problem.

If I consent to provide, say, my age, home address, and hobbies, these are useful in an AI product/service only if they lead to an action that is a function of those parameters (and probably many others). Simply put, there’s no point asking me for consent for that information if it is not used, and the most likely uses are going to be:

to make a recommendation to me that takes into account my age, home address, and hobbies, and/or
to observe my actions in order to infer general properties of others who are similar to me across these parameters, that is, of similar age, living nearby, and/or having similar interests.

After I provide consent, my personal data will be used to compute some new data, let’s call it derived data.

As a side note, if we were to formalize this in a mathematical logic, my consent would lead to new propositions in the knowledge base, and the closure of the knowledge base would change. See, for example, the classic paper: Alchourrón, Carlos E., Peter Gärdenfors, and David Makinson. “On the logic of theory change: Partial meet contraction and revision functions.” The Journal of Symbolic Logic 50.2 (1985): 510-530. Or you can read the entry on the Logic of Belief Revision in the Stanford Encyclopedia of Philosophy.
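To make the side note slightly more concrete, here is a minimal sketch in the notation commonly used for this kind of belief change. The symbols are my choice for illustration: K is the knowledge base closed under logical consequence, Cn is the consequence operator, and p is the proposition added by my consent. Treating consent as expansion and retraction as contraction is an assumption of this sketch, not something the requirement itself states.

```latex
% Expansion: giving consent adds p, and the closure can only grow.
K + p = \mathrm{Cn}(K \cup \{p\}), \qquad K \subseteq K + p

% Contraction: retracting consent must give up p (unless p is a tautology)
% without adding anything new, which is generally more than deleting p alone.
K \div p \subseteq K, \qquad p \notin K \div p
```

Expansion is defined by a single formula, while contraction is only constrained by postulates such as the two above, which is one way to see why retracting consent is the harder operation.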

In simpler terms, my consent adds new data as input to AI, and AI computes something new as a result. The input data increased, and computed data increased.

What if I now retract my consent, which I should normally be able to do? Does that remove only the data I consented to share, or also the data that was generated through computations on the basis of the data I consented to provide?
Removing only the data I consented to provide and removing the computed data as well are two different problems, both for AI product design and for product operation. The former does not necessarily require retraining the AI; the latter does.
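The difference between the two kinds of removal can be sketched with a small store that remembers which records each derived record was computed from. Everything here is hypothetical and for illustration only: the `ConsentStore` and `Record` names, the `cascade` flag, and the example keys are my assumptions, not part of any regulation or existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str
    value: object
    # Keys of the records this one was computed from; empty for consented data.
    sources: frozenset = frozenset()


class ConsentStore:
    """Hypothetical store that tracks provenance of derived data."""

    def __init__(self):
        self.records = {}

    def add_personal(self, key, value):
        self.records[key] = Record(key, value)

    def add_derived(self, key, value, sources):
        missing = [s for s in sources if s not in self.records]
        if missing:
            raise KeyError(f"unknown sources: {missing}")
        self.records[key] = Record(key, value, frozenset(sources))

    def dependents(self, key):
        """Return every record derived, directly or indirectly, from `key`."""
        found = set()
        frontier = [key]
        while frontier:
            current = frontier.pop()
            for record in self.records.values():
                if current in record.sources and record.key not in found:
                    found.add(record.key)
                    frontier.append(record.key)
        return found

    def retract(self, key, cascade):
        """Remove `key`; with cascade=True also remove all data derived from it."""
        to_remove = {key}
        if cascade:
            to_remove |= self.dependents(key)
        for k in to_remove:
            del self.records[k]
        return sorted(to_remove)


def build_example():
    store = ConsentStore()
    store.add_personal("age", 41)
    store.add_personal("hobbies", ["cycling"])
    store.add_derived("segment", "40-45/cyclists", sources=["age", "hobbies"])
    store.add_derived("recommendation", "bike tour", sources=["segment"])
    return store


shallow = build_example()
print(shallow.retract("age", cascade=False))  # ['age']
print(sorted(shallow.records))  # ['hobbies', 'recommendation', 'segment']

deep = build_example()
print(deep.retract("age", cascade=True))  # ['age', 'recommendation', 'segment']
print(sorted(deep.records))  # ['hobbies']
```

In the first case the derived records survive the retraction even though they were computed from my age. In the second case they are removed along with it. For a trained model, the derived "record" is the set of model parameters, which cannot be deleted selectively in this way, and that is why the second kind of removal points toward retraining.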
