Future Internet, Vol. 18, Pages 41: AnonymAI: An Approach with Differential Privacy and Intelligent Agents for the Automated Anonymization of Sensitive Data


Future Internet, Vol. 18, Pages 41: AnonymAI: An Approach with Differential Privacy and Intelligent Agents for the Automated Anonymization of Sensitive Data

Future Internet doi: 10.3390/fi18010041

Authors:
Marcelo Nascimento Oliveira Soares
Leonardo Barbosa Oliveira
Antonio João Gonçalves Azambuja
Jean Phelipe de Oliveira Lima
Anderson Silva Soares

Data governance for responsible AI systems remains challenged by the lack of automated tools that can apply robust privacy-preserving techniques without destroying analytical value. We propose AnonymAI, a novel methodological framework that integrates LLM-based intelligent agents, the mathematical guarantees of differential privacy, and an automated workflow to generate anonymized datasets for analytical applications. This framework produces data tables with formally verifiable privacy protection, dramatically reducing the need for manual classification and the risk of human error. Focusing on the protection of tabular data containing sensitive personal information, AnonymAI is designed as a generalized, replicable pipeline adaptable to different regulations (e.g., General Data Protection Regulation) and use-case scenarios. The novelty lies in combining the contextual classification capabilities of LLMs with the mathematical rigor of differential privacy, enabling an end-to-end pipeline from raw data to a protected, analysis-ready dataset. The efficiency and formal guarantees of this approach offer significant advantages over conventional anonymization methods, which are often manual, inconsistent, and lack the verifiable protections of differential privacy. Validation studies, covering both controlled experiments on four types of synthetic datasets and broader tests on 19 real-world public tables from various domains, confirmed the applicability of the framework, with the agent-based classifier achieving high overall accuracy in identifying confidential columns. The results demonstrate that the protected data maintains high value for statistical analysis and machine learning models, highlighting AnonymAI’s potential to advance responsible data sharing. This work paves the way for trustworthy and scalable data governance in AI through a rigorously engineered automated anonymization pipeline.



Source link

Marcelo Nascimento Oliveira Soares www.mdpi.com