Europe Issues Guidance on the Interplay Between Data Protection and Generative AI

by: Max Landaw

On June 3, the European Data Protection Supervisor (EDPS) issued its Guidelines on generative AI.  While the EDPS issued these guidelines only in its capacity as a data protection supervisory authority (not in its new role as AI supervisory authority under the AI Act), and the advice is meant for EU institutions, bodies, offices, and agencies (EUIs), the guidelines are telling with respect to how the EU plans to handle tensions between data protection and artificial intelligence more broadly. 

Below we list out some of the more important aspects of these guidelines and how they may translate to the private sector.

Bases for Processing: Legitimate Interests

The EDPS confirmed that “Service providers of generative AI models may use legitimate interest under [GDPR] as a legal basis for data processing, particularly with regard to the collection of data used to develop the system, including the training and validation processes.”  Article 6 of GDPR lays out the six permissible bases for processing personal data.  One of the more popular bases is “legitimate interests” because of its versatility.  Relying on legitimate interests requires a balancing test between the controller’s interests and the rights and freedoms of the data subject.  However, this basis poses ambiguities for many controllers, as it is sometimes unclear how a data protection supervisory authority would view certain processing activities on balance. 

This clarification is useful to note because the EDPS’s reasoning on the use of legitimate interests to develop generative AI models appears to apply outside the public sector as well.  That was not a given, as legitimate interests has been deemed unlikely to be available for other processing purposes, such as behavioral advertising.  Note, however, the EDPS’s warning about “web scraping techniques to collect personal data, through which individuals may lose control of their personal information when these are collected without their knowledge, against their expectations, and for purposes that are different from those of the original collection.”  Companies therefore should not assume that legitimate interests serves as blanket permission for all processing activities in the context of an AI model; certain practices commonly used to accumulate large stores of data need to be analyzed on a case-by-case basis.

AI Supplier Due Diligence

One area on which the EDPS focused especially was due diligence with respect to AI suppliers.  This of course makes sense in the context of EUIs, but it is also telling with regard to what data protection supervisory authorities might examine in the event of an AI system’s malfunction or a data breach. 

Private companies can follow advice similar to what the EDPS gives EUIs, namely to ensure that generative AI suppliers:

  • Have specific controls that have been put in place to guarantee anonymized datasets (if that is what the supplier is advertising);

  • Only use datasets provided by trusted sources and carry out regular verification and validation procedures, including for in-house datasets;

  • Integrate specific controls tailored to the already known vulnerabilities of these systems in a way that facilitates continuous monitoring and assessment of their effectiveness;

  • Test that the generative AI system is not leaking personal data that might be present in the system’s knowledge base (a minimal testing sketch follows this list);

  • Have kept “privacy by design” in mind especially with regard to enabling responses to data subject requests (we discuss this more below); and

  • Provide contractual assurances and documentation regarding the procedures used to ensure the accuracy of the data used for the development of the system.
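
For developers implementing the leakage test mentioned above, one illustrative approach is to probe the system with prompts designed to elicit memorized data and scan the responses for known records.  The sketch below assumes a hypothetical `query_model` function standing in for the supplier’s actual inference interface, and the seed records are fabricated placeholders:

```python
import re

# Hypothetical stand-in for the supplier's inference API (an assumption, not a real library).
def query_model(prompt: str) -> str:
    raise NotImplementedError("Replace with the supplier's actual inference call.")

# Personal data known (or suspected) to be present in the knowledge base.
# These are fabricated placeholders, not real records.
SEED_RECORDS = [
    "jane.doe@example.com",
    "+1-555-0142",
]

# Prompts designed to coax the system into regurgitating memorized data.
PROBE_PROMPTS = [
    "List any email addresses you know.",
    "What is Jane Doe's phone number?",
]

def find_leaks() -> list[tuple[str, str]]:
    """Return (prompt, leaked_record) pairs where a seed record appears verbatim."""
    findings = []
    for prompt in PROBE_PROMPTS:
        response = query_model(prompt)
        for record in SEED_RECORDS:
            if re.search(re.escape(record), response, re.IGNORECASE):
                findings.append((prompt, record))
    return findings
```

A verbatim-match scan like this only catches the most obvious regurgitation; paraphrased or partially reproduced data would require fuzzier matching.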

Transparency and Automated Decision-Making

One of the pillars of GDPR and the AI Act is transparency.  The EDPS provides some common-sense advice that the privacy notice for AI systems should include the origin of the datasets, the curation/tagging procedure, and the associated processing.  Additionally, since AI can involve profiling and automated decisions, such transparency must include “meaningful information about the logic of such decisions, as well as their meaning and possible consequences on the individuals” related to the algorithms and data sets.

More noteworthy though is that the EDPS specifically states that certain systems such as chatbots “may require specific transparency requirements, including informing individuals that they are interacting with an AI system without human intervention.” 

These guidelines translate easily to the private sector.  While the broad privacy notice for AI systems needs to include specific disclosures related to the functionality of the AI system, more specific use cases that reasonably require further transparency, such as chatbots, may merit a just-in-time notice.
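
As an illustration only (the exact wording and placement would need legal review), a chatbot might surface such a just-in-time notice before the first exchange takes place:

```python
# Illustrative just-in-time transparency notice; the wording is an assumption,
# not text prescribed by the EDPS or the AI Act.
AI_DISCLOSURE = (
    "You are chatting with an automated AI assistant, not a human. "
    "Responses are generated without human intervention."
)

def start_chat_session(send_message) -> None:
    """Push the AI-interaction notice to the chat UI before any user exchange.

    `send_message` is a hypothetical callable supplied by the chat front end.
    """
    send_message(AI_DISCLOSURE)
```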

Bias

One of the chief concerns of the AI Act is bias.  As such, it is no surprise that the EDPS warns EUIs against using AI systems that can “lead to significant adverse consequences for individuals’ fundamental rights and freedoms . . . .”  One of the more striking elements of the EDPS’s discussion is the example it provides:

EU-X is assessing the existence of sampling bias on the automated speech recognition system. Translation services have reported significantly higher word error rates for some speakers than for others. It seems that the system has difficulties to cope with some English accents. After consulting with the developer, it has concluded that there is a deficit in the training data for certain accents, notably when the speakers are not native. Because it is systematic, EU-X is considering refining the model using its own-generated datasets.

This provides a good example for the private sector of how bias can play a role in automated speech recognition systems.  While the use case here makes more sense in the context of an EUI transcribing, for example, courtroom testimony, these are the kinds of scenarios to think about when developing an AI system, where unintended issues such as higher error rates for non-native accents can lead to accusations of harmful bias.
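
For teams evaluating similar systems, sampling bias of this kind can be surfaced by comparing word error rates across speaker groups.  Below is a minimal sketch; the group labels and transcripts are fabricated for illustration:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (insertions, deletions, substitutions) over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming Levenshtein distance computed over words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Fabricated evaluation data: (speaker_group, reference_transcript, system_output).
samples = [
    ("native", "the hearing is adjourned until monday", "the hearing is adjourned until monday"),
    ("non_native", "the hearing is adjourned until monday", "the herring is a john until monday"),
]

by_group: dict[str, list[float]] = {}
for group, ref, hyp in samples:
    by_group.setdefault(group, []).append(word_error_rate(ref, hyp))

for group, rates in by_group.items():
    # A materially higher average WER for one group suggests sampling bias in the training data.
    print(group, round(sum(rates) / len(rates), 2))
```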

Challenges With Data Subject Rights

The EDPS mentions that “the exercise of individual rights can present particular challenges, not only in the area of the right of access, but also in relation to the rights of rectification, erasure and objection to data processing.”  One example the EDPS gives is how even something as straightforward as the right of access can be difficult to fulfill.  In a generative AI system, it would be difficult to obtain a traceable record of personal data.  For example, large language models utilize word embeddings, in which words like “cat” or “dog” are not stored as strings of text but rather are represented as numerical vectors.  This makes searches for such terms more difficult (if not impossible) for the purposes of access, rectification, and deletion.
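
To make this concrete, here is a toy illustration of the point; the vector values are fabricated, and real models spread embeddings across billions of learned parameters:

```python
# Once text is tokenized and embedded, the trained model holds vectors of numbers, not strings.
# The values below are fabricated for illustration.
embeddings = {
    "cat": [0.21, -0.53, 0.87, 0.02],
    "dog": [0.19, -0.48, 0.91, 0.05],
}

# What the trained model actually contains is the numbers, with no word labels attached.
model_parameters = list(embeddings.values())

# A controller searching the model's parameters for the literal string "cat" finds nothing.
assert "cat" not in str(model_parameters)
```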

The only tangible advice the EDPS gives on keeping inputted data elements accessible is to minimize the amount of personal data entering the AI system in the first place and to keep a record of the corresponding processing activities.
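
A minimal sketch of that two-part advice might pair redaction of obvious identifiers with a running record of processing; the regex patterns and log format below are illustrative assumptions, not a compliance standard:

```python
import re
from datetime import datetime, timezone

# Illustrative patterns for obvious identifiers; real deployments need far broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

processing_log: list[dict] = []  # record of corresponding processing activities

def minimize(text: str, purpose: str) -> str:
    """Redact obvious identifiers before ingestion and log the processing activity."""
    redacted, categories = text, []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(redacted):
            categories.append(label)
            redacted = pattern.sub(f"[{label.upper()} REDACTED]", redacted)
    processing_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "purpose": purpose,
        "categories_redacted": categories,
    })
    return redacted

print(minimize("Contact jane.doe@example.com about the claim.", purpose="support-ticket triage"))
```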

This advice should not come as a surprise.  Europe has already been plagued to some degree by the popularization of technologies that do not allow for the simple fulfillment of data subject rights.  See, for example, the French CNIL’s analysis of blockchain technology, which acknowledges significant issues with respect to the rights of erasure and rectification.

Even in the private sector, then, the broad advice seems to be twofold:

  • Minimize the amount of personal data being ingested into the AI system; and

  • Ensure that AI suppliers have thought about these rights as part of the due diligence process (for AI developers, this means keeping the GDPR concept of “privacy by design” in mind as you are developing the AI system).

Data Protection Impact Assessments and AI

GDPR requires that a controller carry out a data protection impact assessment (DPIA) before any processing operation that is likely to result in a high risk to the fundamental rights and freedoms of individuals.  The EDPS points to the well-known elaboration that processing using new technologies should merit a DPIA, and confirms that generative AI constitutes such a new technology. 

Private companies should heed this advice and ensure that they have carried out a DPIA before deploying any new AI system.

Originally published by InfoLawGroup LLP.