Gartner: Top 3 Best Practices for Automating Data Discovery in Privacy Management

Image courtesy: https://www.bmc.com/blogs/data-masking/

By Nader Henein

Data governance involves a comprehensive set of rules that define how an organization manages its data. These rules are often spread across multiple policies such as privacy policies, data retention policies, or data classification policies. The cornerstone of any successful data governance initiative is a thorough understanding of the data an organization possesses, how it flows through enterprise systems, and the purposes for which it is used.

Data governance rules stem from two primary sources: regulatory requirements and self-imposed organizational standards. Regulatory rules, driven by legal obligations, often vary across jurisdictions and carry significant penalties for non-compliance. Self-imposed rules, on the other hand, are based on best practices and business needs. The first step in applying either kind of rule is to gain a clear understanding of where data is stored and how it is used, which is achieved through a systematic data discovery process covering both unstructured and structured data. While many organizations start this journey manually, automation becomes crucial for scalability. Before embarking on automation, security and risk management (SRM) leaders must consider three key best practices for effective automated data discovery, which together assess the automated discovery platform’s capacity to read, interpret, and act upon their data.

Evaluating Platform Connectivity and Data Reading Capabilities

When selecting a data discovery platform, SRM leaders must ensure it can effectively connect to and read data from diverse sources. Start by compiling a comprehensive list of your organization’s data repositories before engaging with technology providers. The ideal platform should feature a robust library of upstream connectors, capable of ingesting and analyzing data from 80% to 90% of your data stores, depending on how many specialist or legacy systems the organization maintains.
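
As a rough illustration of the coverage check described above, the sketch below compares an inventory of data repositories against a vendor's connector list and reports the share that is covered. The inventory, the connector list, and the names DataStore and connector_coverage are all hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    name: str
    technology: str  # e.g. "postgresql", "sharepoint", "mainframe-vsam"


def connector_coverage(inventory, supported_technologies):
    """Return (coverage_fraction, uncovered_stores) for a candidate platform."""
    supported = {t.lower() for t in supported_technologies}
    uncovered = [s for s in inventory if s.technology.lower() not in supported]
    covered = len(inventory) - len(uncovered)
    fraction = covered / len(inventory) if inventory else 0.0
    return fraction, uncovered


# Hypothetical inventory and vendor connector list, for illustration only.
inventory = [
    DataStore("crm", "salesforce"),
    DataStore("hr-db", "postgresql"),
    DataStore("file-share", "smb"),
    DataStore("intranet", "sharepoint"),
    DataStore("legacy-billing", "mainframe-vsam"),
]
vendor_connectors = ["Salesforce", "PostgreSQL", "SMB", "SharePoint", "S3"]

fraction, uncovered = connector_coverage(inventory, vendor_connectors)
print(f"Coverage: {fraction:.0%}")
print("Needs a custom connector:", [s.name for s in uncovered])
```

The uncovered list is the starting point for the custom-connector cost discussion with each provider.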

Additionally, assess the cost and feasibility of developing custom connectors for data stores not covered by the platform. It is crucial to determine whether these connectors can be developed internally using available APIs or if they require external provider support.

For unstructured data, confirm that the platform can read all file types used within your organization. Common file types such as PDF are widely supported. However, some organizations rely on specialized formats such as CAD design documents or the Digital Imaging and Communications in Medicine (DICOM) images used by healthcare providers. Verify whether your technology partner supports these file types or if custom file interpreters are necessary.
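
One simple way to prepare for that conversation is to count the file extensions actually in use and flag those a platform does not list as supported. The sketch below assumes a list of file paths gathered beforehand and a hypothetical vendor-supplied list of supported extensions. Extensions are only a proxy for file type, so treat the result as a first pass.

```python
from collections import Counter
from pathlib import PurePath


def unsupported_file_types(paths, supported_extensions):
    """Count files per extension that the platform does not list as supported."""
    supported = {e.lower().lstrip(".") for e in supported_extensions}
    counts = Counter()
    for p in paths:
        # Files without an extension are grouped under "(none)".
        ext = PurePath(p).suffix.lower().lstrip(".") or "(none)"
        if ext not in supported:
            counts[ext] += 1
    return counts


# Hypothetical sample of paths and supported extensions.
paths = [
    "contracts/msa.pdf",
    "plant/valve.dwg",          # CAD drawing
    "radiology/scan001.dcm",    # DICOM image
    "radiology/scan002.DCM",
    "notes/todo.txt",
]
gaps = unsupported_file_types(paths, ["pdf", "docx", "txt"])
for ext, count in gaps.most_common():
    print(f"{ext}: {count} file(s) need a custom interpreter or vendor support")
```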

For structured data, evaluate the platform’s ability to connect to structured data repositories via Java Database Connectivity (JDBC) or Open Database Connectivity (ODBC) connectors, or via application-specific APIs.
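
To make the idea of structured discovery concrete, the sketch below walks a database catalog and flags columns whose names hint at personal data. It uses Python's built-in sqlite3 module purely as a stand-in for a JDBC or ODBC connection. A real platform would use the appropriate driver and catalog queries, and would typically sample column values as well. The name hints and the function candidate_columns are assumptions made for this example.

```python
import sqlite3

# Hypothetical name hints; a real platform would also inspect column values.
NAME_HINTS = ("email", "phone", "ssn", "birth", "salary")


def candidate_columns(conn):
    """List (table, column) pairs whose names suggest personal data."""
    cur = conn.cursor()
    cur.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    tables = [row[0] for row in cur.fetchall()]
    hits = []
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        for col in cur.execute(f'PRAGMA table_info("{table}")').fetchall():
            name = col[1]
            if any(hint in name.lower() for hint in NAME_HINTS):
                hits.append((table, name))
    return hits


# In-memory demo schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (id INTEGER, full_name TEXT, email TEXT, salary REAL);
    CREATE TABLE products (sku TEXT, title TEXT);
    """
)
print(candidate_columns(conn))
```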

By thoroughly considering these factors, SRM leaders can select a discovery platform that effectively covers the enterprise landscape, balancing out-of-the-box capabilities with the need for custom development.

Evaluating the Platform’s Learning and Recognition Capabilities for User-Defined Data Attributes

When assessing a data discovery platform, SRM leaders must consider its ability to learn and recognize user-defined data attributes. While solutions often come with preprogrammed tags or labels such as “Personal,” “Sensitive,” or “HR,” it is essential to configure these to align with your organization’s specific data needs. The technology used to ingest data and extract appropriate tags based on data attributes may be pattern-driven (regular expressions), AI-driven (machine learning, natural language processing, and computer vision), or a combination of both.
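
The pattern-driven approach can be sketched in a few lines. The example below maps tags to regular expressions and returns every tag with at least one match. The patterns and the names TAG_PATTERNS and extract_tags are illustrative assumptions, not rules taken from any particular product, and real rules would need validation against your own data.

```python
import re

# Illustrative patterns only; production rules need validation and tuning.
TAG_PATTERNS = {
    "Personal": [
        re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),  # email address
        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),              # US SSN-like format
    ],
    "HR": [
        re.compile(r"\b(?:salary|payroll|performance review)\b", re.IGNORECASE),
    ],
}


def extract_tags(text):
    """Return the sorted list of tags with at least one matching pattern."""
    return sorted(
        tag
        for tag, patterns in TAG_PATTERNS.items()
        if any(p.search(text) for p in patterns)
    )


print(extract_tags("Contact jane.doe@example.com about the payroll run."))
print(extract_tags("Quarterly product roadmap"))
```

Patterns like these are cheap to write and audit, but they miss context, which is why many platforms pair them with AI-driven techniques.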

It is unrealistic to expect that all necessary tags will be preprogrammed into the platform. Therefore, a critical evaluation point is the platform’s ability to “learn” and recognize new data attributes through training or programming. Conduct a simple trial to assess the platform’s proficiency in identifying new data attributes defined by your organization and applying the appropriate tags to relevant data or files. This process typically involves collaboration between the vendor’s implementation team and your internal team, who will later manage the platform. The trial could range from tagging PDF files with an “invoice” label to more complex tasks like extracting order numbers from scanned invoices and applying custom tags.
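
One way to put numbers on such a trial is to hand-label a small sample and compare it with the tags the platform applied. The sketch below computes precision and recall for a single tag. These metrics are a suggestion made here rather than something the trial prescribes, and the sample data and the function score_trial are hypothetical.

```python
def score_trial(expected, predicted, tag):
    """Precision and recall for one tag over a hand-labeled sample.

    expected and predicted map a file name to the set of tags on that file.
    """
    tp = fp = fn = 0
    for name, truth in expected.items():
        got = predicted.get(name, set())
        if tag in got and tag in truth:
            tp += 1  # platform applied the tag correctly
        elif tag in got:
            fp += 1  # platform applied the tag where it does not belong
        elif tag in truth:
            fn += 1  # platform missed a file that should carry the tag
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical trial: labels assigned by staff vs. tags applied by the platform.
expected = {"a.pdf": {"invoice"}, "b.pdf": {"invoice"}, "c.pdf": set(), "d.pdf": {"invoice"}}
predicted = {"a.pdf": {"invoice"}, "b.pdf": set(), "c.pdf": {"invoice"}, "d.pdf": {"invoice"}}

precision, recall = score_trial(expected, predicted, "invoice")
print(f"precision={precision:.2f} recall={recall:.2f}")
```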

Ensuring Your Platform Can Orchestrate Data Governance Activities

Once your data repositories are scanned and tagged, the next step is to leverage this information effectively. The objective is not merely to understand the data but to operationalize the discovered tags to automate data governance activities. For instance, these tags can be used to automate data classification based on various tag combinations or to trigger rules in your data retention schedules. An example of this is a “CV” document governed by the General Data Protection Regulation (GDPR) being automatically deleted once the time elapsed since its “last modified” date exceeds a predefined limit.
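
The retention example can be sketched as a rule that combines a tag with the last-modified date. In the code below, the retention schedule, the one-year limit, and the names ScannedFile and files_due_for_deletion are placeholders chosen for illustration. They are not legal guidance, and the choice to apply the shortest limit when several tags match is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ScannedFile:
    path: str
    tags: frozenset
    last_modified: datetime


# Hypothetical schedule: tag -> maximum age. Placeholder value, not legal guidance.
RETENTION_LIMITS = {"CV": timedelta(days=365)}


def files_due_for_deletion(files, now):
    """Return paths whose age exceeds the retention limit of any of their tags."""
    due = []
    for f in files:
        limits = [RETENTION_LIMITS[t] for t in f.tags if t in RETENTION_LIMITS]
        # Assumption: when several tagged limits apply, the shortest one wins.
        if limits and now - f.last_modified > min(limits):
            due.append(f.path)
    return due


now = datetime(2025, 1, 1)
files = [
    ScannedFile("hr/cv_old.pdf", frozenset({"CV"}), datetime(2023, 6, 1)),
    ScannedFile("hr/cv_new.pdf", frozenset({"CV"}), datetime(2024, 9, 1)),
    ScannedFile("fin/report.pdf", frozenset({"Finance"}), datetime(2020, 1, 1)),
]
print(files_due_for_deletion(files, now))
```

In practice the function would feed a review queue or a deletion workflow rather than delete files directly.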

To achieve this, ensure that the platform you select can orchestrate your planned downstream tasks based on your data’s characteristics, such as automating data retention or classification. This orchestration can be accomplished either natively within the discovery solution or through downstream connectors to third-party platforms, like archival solutions within your enterprise architecture. Some specialized discovery platforms even offer enterprise clients the ability to develop their own downstream connectors using documented APIs. By verifying these capabilities, SRM leaders can ensure seamless integration and execution of data governance activities across the organization.
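
The distinction between native orchestration and downstream connectors can be pictured as a routing table from tags to actions. The sketch below is a minimal illustration: the two action functions, the ROUTES table, and orchestrate are hypothetical, and archive_via_connector merely stands in for a call to a third-party archival platform.

```python
from typing import Callable, Dict, List

Action = Callable[[str], str]


def native_classify(path: str) -> str:
    # Stand-in for an action performed inside the discovery platform itself.
    return f"classified {path} as Confidential"


def archive_via_connector(path: str) -> str:
    # Stand-in for a call to a third-party archival platform's API.
    return f"sent {path} to archive"


# Hypothetical routing table: tag -> actions to run, in order.
ROUTES: Dict[str, List[Action]] = {
    "HR": [native_classify],
    "Invoice": [native_classify, archive_via_connector],
}


def orchestrate(path: str, tags: List[str]) -> List[str]:
    """Run each action routed from the file's tags once, in first-seen order."""
    results = []
    seen = set()
    for tag in tags:
        for action in ROUTES.get(tag, []):
            if action not in seen:
                seen.add(action)
                results.append(action(path))
    return results


print(orchestrate("inv-2024-001.pdf", ["HR", "Invoice"]))
```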

Gartner analysts will discuss top trends, strategies and technology related to security and risk management at the Gartner Security & Risk Management Summit, taking place March 10-11, 2025, in Mumbai, India.

(The author is Nader Henein, VP Analyst at Gartner, and the views expressed in this article are his own)