By Srinivas Poosarla
Today, data fuels the digital economy. The data that we as digital natives generate every moment, when we shop, drive, dine, talk or even walk, can be either an asset or a liability depending on what we choose to do with it. As an asset, data can provide rich insights into sales, customer experiences, supply-chain issues and, through AI and big data, hidden correlations, all of which create value by improving business outcomes. Data becomes a liability if it is used inappropriately or held indefinitely, leading to regulatory non-compliance. For an enterprise, data needs to be managed regardless of whether it is an asset or a liability. Some of the common questions one encounters in an enterprise that lacks a robust data governance program are: How did we use outdated data to make such an important business decision? Don’t we know what data we lost in yesterday’s fire in the server room? How did we end up with multiple sources for the same data?
Data governance comprises the processes, roles, standards, metrics, and tools that maximize value from data while minimizing risk throughout the data life cycle. How is data governance relevant to reducing liability for entities that process personal information? Three areas are prominent.
Data Privacy
The first step towards privacy compliance is developing a thorough understanding of the personal information intended to be processed, its purpose and context, the intended data recipients, the involvement of any third parties, and the country of processing, among others. This data inventory exercise, often preceded by data discovery, covers the entire spectrum of personal information. In the context of privacy, this broadly includes identifiers such as name and photograph; quasi-identifiers such as age and gender; and data streams such as purchase history, geolocation trail and driving habits that we constantly leave in cyberspace as we participate in the digital ecosystem. A data inventory also helps assess the impact of a cyber-attack and fulfil individuals’ rights of access and erasure. Tagging, by virtue of being machine readable, greatly reduces the time taken to retrieve data to fulfil such rights. To comply with the storage limitation principle, routine data deletion is imperative as part of data governance. Unstructured data, whether generated in the course of business because temporary files are not routinely purged or stored outside databases, poses both security and privacy risks due to limited visibility and control, and needs to be part of data governance.
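A tagged inventory of this kind can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the record fields, the function names and the sample entries are hypothetical and do not come from any particular tool or regulation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class InventoryRecord:
    """One tagged entry in a hypothetical personal-data inventory."""
    subject_id: str      # assumed identifier for the individual
    category: str        # "identifier", "quasi-identifier" or "data stream"
    attribute: str       # e.g. "name", "age", "purchase_history"
    purpose: str         # why the data is processed
    collected_on: date
    retention_days: int  # assumed retention period for this purpose


def records_for_subject(inventory, subject_id):
    """Right of access: find everything held about one individual."""
    return [r for r in inventory if r.subject_id == subject_id]


def erase_subject(inventory, subject_id):
    """Right to erasure: return the inventory without that individual's data."""
    return [r for r in inventory if r.subject_id != subject_id]


def purge_expired(inventory, today):
    """Storage limitation: keep only records still within their retention period."""
    return [
        r for r in inventory
        if r.collected_on + timedelta(days=r.retention_days) > today
    ]


# Made-up sample entries, purely for illustration.
inventory = [
    InventoryRecord("u1", "identifier", "name", "billing", date(2023, 1, 1), 365),
    InventoryRecord("u1", "data stream", "purchase_history", "analytics", date(2024, 5, 1), 90),
    InventoryRecord("u2", "quasi-identifier", "age", "marketing", date(2024, 6, 1), 30),
]

still_valid = purge_expired(inventory, today=date(2024, 6, 15))
```

Because every record carries machine-readable tags, an access or erasure request becomes a simple filter rather than a manual search, and routine deletion can run on a schedule.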
Data Security
While the domain of information security comprises diverse activities including physical security, asset management, background checks, incident monitoring, and business continuity, the primary intent is always to ensure the confidentiality, integrity, and availability of the underlying data. Not all data needs the same degree of protection. The measures deployed, including access control systems, may vary with the sensitivity of the data, the harm likely to result from its loss or unavailability, and any sectoral regulatory requirement, based on a data classification exercise carried out on the data inventory. While a ‘crown jewel’, for instance, may warrant biometric authentication, multi-factor authentication may be a more cost-effective measure for ‘restricted data’, and simple password authentication may be a reasonable security measure for ‘company internal data’. As part of data governance, such a classification exercise must be routinely revisited with all stakeholders, including data owners and users. Labelling data based on the classification will also help automatically detect or prevent leakage of confidential information with the help of Data Loss Prevention (DLP) tools. Data provenance, which records historical information about data, helps in forensic analysis, detecting integrity violations, detecting fraud and ascertaining data quality.
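The link between classification labels and access controls can be sketched in a few lines of Python. The three labels and their authentication methods follow the examples above; the numeric strength ranking and the function name are assumptions made only for illustration, and a real organization would define its own.

```python
from enum import Enum


class Classification(Enum):
    CROWN_JEWEL = "crown jewel"
    RESTRICTED = "restricted data"
    INTERNAL = "company internal data"


# Minimum authentication per label, taken from the examples in the text.
REQUIRED_AUTH = {
    Classification.CROWN_JEWEL: "biometric",
    Classification.RESTRICTED: "multi-factor",
    Classification.INTERNAL: "password",
}

# Assumed ordering of authentication strength (hypothetical ranking).
AUTH_STRENGTH = {"password": 1, "multi-factor": 2, "biometric": 3}


def access_allowed(label, auth_method):
    """Allow access only if the method used is at least as strong as required."""
    required = REQUIRED_AUTH[label]
    return AUTH_STRENGTH[auth_method] >= AUTH_STRENGTH[required]
```

Under these assumptions, `access_allowed(Classification.RESTRICTED, "password")` is refused while `access_allowed(Classification.INTERNAL, "multi-factor")` is permitted. Keeping the mapping in one table also makes it easy to update when the classification exercise is revisited.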
Responsible AI
Machine learning, whether supervised or unsupervised, derives patterns, inferences or content from what it learns from data. When such data is personal information, it is all the more important to know the source of the data, how it moved across entities, and what changes, if any, were made to it. Termed data lineage, this is a crucial part of AI governance that helps deploy the right measures before data is used to train algorithms, thereby minimizing bias and inaccuracies. For instance, if data was scraped from public websites and not anonymized, it is important for the entity using the AI to know how data principal rights will be handled and who will be accountable for determining the legal grounds for processing the data. Data that is part of a user query to a generative AI model, often used for continuous machine learning, can also contain personally identifiable information and falls within data governance.
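A lineage record and a pre-training check could look roughly like the Python sketch below. All class, field and function names are hypothetical, and the single rule it applies (personal data must be either anonymized or backed by a documented legal basis) is a simplification for illustration, not a statement of any legal requirement.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineageEvent:
    holder: str       # entity that held the data at this step
    change: str = ""  # what was changed, empty if nothing


@dataclass
class Dataset:
    name: str
    source: str                        # e.g. "public web scrape"
    contains_personal_info: bool
    anonymized: bool = False
    legal_basis: Optional[str] = None  # documented grounds for processing
    lineage: List[LineageEvent] = field(default_factory=list)

    def record(self, holder, change=""):
        """Append one step to the lineage trail."""
        self.lineage.append(LineageEvent(holder, change))


def training_blockers(ds):
    """List reasons the dataset should not yet be used for training."""
    issues = []
    if not ds.lineage:
        issues.append("no lineage recorded")
    if ds.contains_personal_info and not ds.anonymized and ds.legal_basis is None:
        issues.append("personal data without anonymization or a documented legal basis")
    return issues


# Made-up example: scraped data that has changed hands once.
scraped = Dataset("forum_posts", "public web scrape", contains_personal_info=True)
scraped.record("data vendor", "collected from public pages")
scraped.record("AI user entity", "deduplicated")
```

Here `training_blockers(scraped)` reports the missing anonymization or legal basis, which is the kind of question lineage is meant to surface before training begins.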
Executing a comprehensive data governance program is not a one-time activity but a sustained function. The exact roles and structure will vary across organizations, but they will largely include a data owner who decides who can access the data and how long it should be kept; a data custodian who is responsible for storage and ease of access; and a data steward who takes ownership of the quality and use of data. For data governance of personal information, the data protection officer (DPO) often plays the role of data steward. Given the growing data trove and the pace of new technologies to harness value from it, data governance will assume much greater significance. In the long term, organizations that take a holistic approach, integrating governance processes and roles with the right tools, will maximize value from data while containing the associated risks.
(The author is Srinivas Poosarla, Global Chief Privacy Officer, Infosys, and the views expressed in this article are his own)