TL;DR: BigID has launched ‘Data Cleansing for AI,’ a new capability that helps organizations reduce generative AI risk by automatically removing or tokenizing sensitive and high-risk data before it is used in AI tools and large language models (LLMs). The goal is to let security teams build more secure and trustworthy AI pipelines without holding back innovation.
NEW YORK – August 6, 2025 – BigID, a data security, privacy, compliance, and AI governance platform, today announced the release of ‘Data Cleansing for AI.’ The capability helps organizations reduce AI-related risk by automatically removing or tokenizing high-risk and sensitive data before generative AI tools and large language models (LLMs) ingest it. The aim is to support safer, more trustworthy AI pipelines so that security teams can advance AI initiatives with confidence without slowing innovation.
“Every AI pipeline is only as secure as the data behind it,” stated Nimrod Vax, Chief Product Officer and Co-Founder at BigID. “With Data Cleansing for AI, we’re giving security teams the power to take action so they can protect sensitive data, reduce risk, and drive AI initiatives with confidence.”
The ‘Data Cleansing for AI’ solution lets organizations redact or tokenize sensitive information at scale across both structured and unstructured data. This helps prevent confidential data from being inadvertently embedded in model outputs, leaked through prompts, or misused in downstream AI applications. The capability natively supports a wide range of data types, including emails, PDFs, collaboration files, and databases, giving teams control over data before it reaches AI systems.
Key benefits and features of BigID’s Data Cleansing for AI include:
- Automatic removal or tokenization of sensitive and high-risk data before it is used in AI.
- Stronger generative AI pipeline security through pre-cleansed, policy-compliant datasets.
- Reduced exposure to critical risks such as data leakage, prompt injection, and unauthorized data use.
- Support for a range of data formats, including structured data, unstructured files such as PDFs and emails, and SaaS files.
- Integration into BigID’s broader Secure Data Pipeline solution, which also includes GenAI Catalog, Search, and Safe-for-AI Labeling.
BigID’s mission is to help organizations connect the dots in data and AI for security, privacy, compliance, and AI data management. The company helps customers discover, understand, manage, protect, and act on high-risk and high-value data across their entire data landscape, whether in the cloud, on-premises, or elsewhere. BigID’s recognitions include being named a World Economic Forum Technology Pioneer, a Forbes Cloud 100 company, and a leader in Data Security Posture Management (DSPM) and Privacy Management.