Detect privacy risks of AI models

Published at 12 Sept 2025 - Updated at 18 Sept 2025

The diversification of the uses of Artificial Intelligence raises new confidentiality risks. Targeted cyber-attacks aim to extract training data from models. In this context, the Group DataLab (TEC) is developing new tools to anticipate and detect these threats, so that appropriate responses can be put in place and users' trust preserved.

This work, carried out jointly with École Polytechnique as part of the “AI de Confiance et Responsable” chair, has already been shared at the Google Responsible AI Summit (Paris, 2024) and the AFIA Industrial Forum on AI (Paris, 2025). It will also be presented at the highly selective ECAI conference in Bologna in October 2025.

How can we anticipate cyber-attacks on the confidentiality of AI systems?

What is a privacy attack?

Like any information system, Artificial Intelligence solutions can be subject to many types of cyber-attacks, whether to poison training data, influence decision-making, or divert the system from the use case for which it was designed. All of these risks threaten the safety and reliability of AI systems in production, so properly assessing them is crucial to anticipating them and deploying the most appropriate responses.

Some of these cyber-attacks aim to recover training data from AI models. Current AI systems, particularly in the banking sector, are trained on large volumes of data that may include confidential, internal and/or personal data. These attacks measure the system’s behaviour when it is presented with malicious input data, in order to “guess” the training data that may have induced such behaviour. In the most severe cases, they can lead to the verbatim regurgitation of training data.
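To make the idea of "guessing" training data from a model's behaviour concrete, here is a minimal sketch of one well-known and very simple privacy attack, loss-threshold membership inference: an example on which the model is unusually confident is suspected of having been part of the training set. This is an illustration only, not necessarily the kind of attack studied by the Group DataLab; the probabilities, labels and threshold below are hypothetical toy values.

```python
import math

def cross_entropy(probs, label):
    """Per-example loss: negative log-probability assigned to the true label."""
    return -math.log(max(probs[label], 1e-12))

def loss_threshold_attack(losses, threshold):
    """Flag an example as a suspected training member when its loss is below the threshold."""
    return [loss < threshold for loss in losses]

# Hypothetical model outputs: (predicted class probabilities, true label)
outputs = [
    ([0.98, 0.02], 0),  # very confident and correct: behaviour typical of memorised data
    ([0.55, 0.45], 0),  # hesitant: behaviour typical of unseen data
    ([0.03, 0.97], 1),
    ([0.40, 0.60], 1),
]

losses = [cross_entropy(probs, label) for probs, label in outputs]

# The threshold is an assumption; a real attacker would calibrate it on reference data
suspected_members = loss_threshold_attack(losses, threshold=0.1)
print(suspected_members)
```

In this toy setting, only the first and third examples fall below the threshold and are flagged as suspected training members.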

What are the defence techniques?

Researchers and industry practitioners have defined a general framework for designing AI systems that are more robust to these threats. Two types of defensive approaches exist:

Proactive approaches, aimed at improving the intrinsic robustness of models during their design phase;
Reactive approaches, aimed at strengthening the control of data input to the model in production.

However, these defensive techniques come at a significant cost in terms of system performance, speed and availability. It is therefore important to measure each system's vulnerability accurately, so that the defences can be sized to protect it as effectively as possible while maintaining its performance.

Our approach to predicting system vulnerability

We have developed a new approach, simple to implement and computationally lightweight, to predict the vulnerability of training data in an AI system. It analyses how the AI model represents each piece of data and how that data interacts with the rest of the training data. It has been validated both theoretically and through experiments on multiple models and datasets.
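As a purely illustrative sketch of what "looking at how a model represents each piece of data relative to the rest of the training set" can mean, the snippet below scores each example by how isolated its representation is from its nearest neighbours. This is a hypothetical proxy chosen for illustration, not the DataLab's published method: the `isolation_scores` function, the toy embeddings and the choice of `k` are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def isolation_scores(embeddings, k=2):
    """Mean distance from each embedding to its k nearest neighbours.

    Hypothetical proxy: a higher score means the example sits further
    away from the rest of the training data in representation space.
    """
    scores = []
    for i, emb in enumerate(embeddings):
        dists = sorted(
            euclidean(emb, other)
            for j, other in enumerate(embeddings)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy 2-D representations (hypothetical): three clustered points and one outlier
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]

scores = isolation_scores(embeddings, k=2)
most_isolated = max(range(len(scores)), key=scores.__getitem__)
print(most_isolated, scores)
```

Here the outlier at index 3 receives by far the highest score. A score of this kind only needs the representations themselves, which is one way an analysis can stay light in computation time.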

Ultimately, this approach will increase model performance by ensuring that each piece of data receives the right level of protection, maintaining trust while preserving model efficiency. Finally, to keep pace with the inventiveness of attackers, defensive techniques like ours must evolve continuously, which is why our approach is part of a broader methodological and technological framework for designing trusted AI.

And specifically at Crédit Agricole?

For several years, the Group DataLab has been interested in specialised cyber-attacks as part of its R&D work on the design of trusted AI. In addition, the work of the “AI de Confiance et Responsable” chair with École Polytechnique deals in particular with attacks on text-based generative AI, the use of which is intensifying in digitalised banking processes. Protecting these systems has therefore become a priority.

This work made it possible to establish a framework for assessing the robustness of this type of AI against confidentiality attacks, in order to deploy the most appropriate defence techniques. It led to the publication of a research paper at a major scientific conference in the field, the European Conference on Artificial Intelligence (ECAI) (1), where it will be presented on 27 October in Bologna.

This new work enhances the Group’s Certified Methodology (LNE) and its Common Technology Assets (2) to promote the design of trustworthy AI systems, with reactive defence techniques that raise an alert in the event of an attack and thus limit its impact. It will also enrich the AI risk detection and reduction levers integrated into the Group Design Authority IA normative framework. Finally, the AI Factory Group will build on these advances to support entities wishing to strengthen the resilience of their AI systems.

For more information on this topic, please contact:

Aymen SHABOU, CTO DataLab Group & AI Factory Group
Jérémie Dentan, PhD candidate at École Polytechnique / Crédit Agricole Group DataLab - AI Chair in Trust and Head of

Notes

(1) ECAI is a conference that brings together hundreds of researchers, students and industry professionals each year for a week of discussions, workshops and presentations around Artificial Intelligence. This event serves as a platform for innovation and collaboration in AI, attracting participants from Europe and beyond.

(2) Common asset: an industrial software solution or building block that is sufficiently generic (architecture, frameworks, methods, code, components of the CAGIP shared offer, etc.) to be adapted to new contexts and uses with a controlled effort.
