Everyone knows that today’s world revolves around data, and that staying competitive requires enterprises to be data-driven. According to Accenture, 70% of the world’s most valuable corporations are data-driven, up from only 30% in 2008. Being data-driven requires enterprises to collect and store more and more data, and enterprises need to ensure that data is stored and used safely and appropriately. In other words, they need good data governance.
But for most enterprises, good data governance is really a broken promise. Or, as one financial services enterprise recently described it to us:
"We have been up and down with data governance a few times in the past few years. There was a governance office. We “check-boxed” our way through and it was ultimately seen as more of a barrier than a benefit."
Enterprises cannot “check-box” their way through Data Governance. Check boxes imply manual processes. Check boxes also imply that their data really isn’t being governed.
Enterprises need to operationalize their Data Governance. They need DataGovOps.
Let’s peel this onion.
What is DataGovOps?
DataGovOps refers to the collaborative data management practice focused on improving the communication, integration and automation of context and policy among all Data Governance stakeholders in an organization, including Security, Compliance, Privacy, and Data Owners. DataGovOps automates the integration of security and compliance at every phase of the data lifecycle.
In order to fully appreciate what DataGovOps is and why it’s needed, it’s important to address:
- What data governance is supposed to be
- Why data governance in most corporations is a broken promise
- How DataGovOps fixes that broken promise.
To get there, we first need a common understanding of what Data Governance is, who is responsible for it, how Data Governance functions typically operate, and where most Data Governance programs fall short.
What is Data Governance?
According to Google Cloud, Data Governance is (with added emphasis):
…everything you do to ensure data is secure, private, accurate, available, and usable. It includes the actions people must take, the processes they must follow, and the technology that supports them throughout the data life cycle… Data governance means setting internal standards—data policies—that apply to how data is gathered, stored, processed, and disposed of. It governs who can access what kinds of data and what kinds of data are under governance.
Who’s Responsible for Data Governance?
Every enterprise has a Data Governance function. Whether or not it’s formally called “Data Governance” or has employees with “Data Governance” in their titles is another question.
In many large organizations, Data Governance is a distributed function or “program” across multiple teams, including:
- Security
- Compliance
- Privacy
- Data
- And maybe a few others
Because Data Governance is a distributed function, relatively few professionals actually have Data Governance in their job title. A few searches on LinkedIn reveal:
- 1,540,000 professionals with “security” in their job title;
- 635,000 professionals with “compliance” in their job title; and
- 16,000 professionals with “data” and “governance” in their job title -- a 40X to 100X difference.
So Data Governance is a cross-functional fabric that spans multiple teams and/or departments. Or, as we like to say, Data Governance takes a village -- it’s a shared responsibility that requires a significant amount of coordination and collaboration across multiple teams.
How Do Data Governance Programs Typically Operate?
Two words: divide and conquer.
To understand how a Data Governance program operates, we need to break down the above definition of data governance.
If Data Governance is everything you do to ensure data is available, secure, private, accurate, and usable, then Data Governance subsumes multiple functions. These functions include:
- Data Protection
  - Data Preservation
    - Ensuring data is automatically backed up
    - Ensuring data is replicated/highly available
    - Archiving data
    - Retaining/not retaining data
  - Data Security
    - Ensuring data is encrypted
    - Threat monitoring
    - Managing access control
    - Authentication/Identity Management
    - Breach Assessment and Recovery
    - Data Loss Prevention
- Compliance & Privacy
  - Tracking and understanding relevant regulations
  - Tracking third-party data processing agreements (DPAs) that the enterprise needs to honor
  - Creating policies to stay compliant with regulations and DPAs
  - Establishing best practices
  - Managing Data Subject Requests (DSRs)
  - Conducting regular audits to ensure compliance
- Data Management
  - Creating and maintaining a data catalog
  - Classifying data
  - Tagging data/adding metadata to data
  - Ensuring data discoverability
  - Data quality/data cleaning
  - Managing enterprise data architecture
  - Modeling data processes
Visually, the Data Governance function might look like this. (Credit to the Storage Networking Industry Association for most of the section under Data Protection.)
Figure 1: The Data Governance Function
In practice, the Data Governance function shown above splits into separate teams that operate in silos and interact only occasionally, through periodic sensitive data audits or access audits.
Figure 2: Data Governance Functional Silos
The Problem with Data Governance and Functional Silos
There are four key problems when Data Governance is siloed by function:
- Functional silos result in technology solutions that are also siloed. These solutions lack integration and automation across the data ecosystem and lifecycle, and they reinforce functional silos rather than breaking them down.
- Collaboration between silos occurs periodically rather than continuously.
- When functions do collaborate, collaboration takes the form of manual, time-consuming audits.
- Between the periodic audits, data governance is left to chance and best intentions.
Data suffers from a fundamental lack of governance between the periodic audits. This is what we call the broken promise of data governance.
In other words, state-of-the-art Data Governance currently looks like this:
Figure 3: Data Governance via Cross-Functional Periodic Audits
Ideally, Data Governance should behave/operate like this:
Figure 4: How Data Governance Should Ideally Operate
The Problem with Non-Continuous Data Governance
To illustrate the problem with non-continuous data governance, let’s assume that a team of employees is working on a special project, and they need access to a specific set of data that they don’t typically have access to. They submit a User Access Request Form. Every organization has one. Your organization has one too.
The User Access Request Form typically kicks off a process that looks something like the diagram in Figure 5 (below). The Security team receives the request. The Security team may need to validate the contents of the target data set with the Data Team; the Security team may also need to escalate the request to the employee’s manager or executive sponsor of the team, and there may also be a secondary approval process. Once approved, the database and/or IAM permissions need to be updated to reflect the new permissions.
The biggest problem with this process diagram is the “Stop” oval. During the process, data is being governed carefully and meticulously. But, even after IAM and database permissions have been provisioned, many bad things can happen. For example:
- A team member who has just been granted access to the data set promptly copies sensitive data into another table that all employees can access.
- A day after the team has been granted access to the data set, a data engineer adds more sensitive data to the data set. Now that team has access to much more sensitive data than originally intended.
- A new employee is added to the team, but the new team member isn’t working on the special project. The new employee may inadvertently get access to the sensitive data.
- An employee leaves the team and no longer needs access to the sensitive data set. After transferring departments, the employee still has ongoing access to the data set because someone forgot to file a request to reduce access.
- An employee on the team leaves the company. Because access to this sensitive data set was not granted as part of typical onboarding, the off-boarding process does not deprovision it, leaving a “ghost” user and creating a potential security vulnerability.
Without continuous monitoring of context and policy -- i.e., without operationalization -- the world of data governance becomes a massive collection of unenforced contracts. Even periodic access control audits and sensitive data audits leave data essentially ungoverned between audits.
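To make the idea of continuous monitoring concrete, here is a minimal Python sketch of a check that could run on a schedule after the "Stop" oval. It covers three of the scenarios above: the employee who changed teams, the employee who left the company, and sensitive data added after approval. The `Grant` record, the `find_drift` function, and all sample names and data are hypothetical and only illustrate the approach; a real system would pull this context from IAM, HR, and data catalog sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    user: str
    dataset: str
    project: str  # the project that justified the grant (hypothetical field)


def find_drift(grants, project_members, active_employees,
               approved_columns, current_sensitive_columns):
    """Compare approved grants against current context and list mismatches."""
    findings = []
    for grant in grants:
        if grant.user not in active_employees:
            # "Ghost" user: the grant outlived the employment.
            findings.append(f"{grant.user}: left the company, still has {grant.dataset}")
        elif grant.user not in project_members.get(grant.project, set()):
            # The grant outlived the project assignment.
            findings.append(f"{grant.user}: off {grant.project}, still has {grant.dataset}")
    for dataset, columns in current_sensitive_columns.items():
        # Sensitive columns that were not part of the original approval.
        added = columns - approved_columns.get(dataset, set())
        if added:
            findings.append(
                f"{dataset}: sensitive columns added since approval: {sorted(added)}"
            )
    return findings


# Hypothetical snapshot taken some time after access was provisioned.
grants = [
    Grant("alice", "claims", "special-project"),
    Grant("bob", "claims", "special-project"),
    Grant("carol", "claims", "special-project"),
]
project_members = {"special-project": {"alice", "carol"}}  # bob changed teams
active_employees = {"alice", "bob"}                        # carol left the company
approved_columns = {"claims": {"diagnosis"}}
current_sensitive_columns = {"claims": {"diagnosis", "ssn"}}

findings = find_drift(grants, project_members, active_employees,
                      approved_columns, current_sensitive_columns)
for finding in findings:
    print(finding)
```

The point of the sketch is the comparison itself: each grant is re-evaluated against the context that justified it, rather than being treated as settled once the request is approved.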
Figure 5: User Access Request Process
Data Governance Policies: More than Just Access Control
Many people -- and many commercial software solutions, for that matter -- tend to reduce Data Governance to managing access control, for example:
- Determine what kind of data resides where via data classification
- Manage who has access to a specific database, schema, or table
- Manage what kind of permission they have (read, read/write).
Next-gen access control solutions might also include self-service portals for employees to request access, and obfuscated access, where employees can access data, but specific fields are masked or tokenized on the fly.
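As a rough illustration of obfuscated access, the Python sketch below applies a per-role policy to each column based on its classification, returning values in the clear, masked, tokenized, or not at all. The classification labels, role names, and the `read_row` helper are assumptions made for this example and do not describe any particular product. The hash-based `tokenize` is a stand-in only; production tokenization would typically use a keyed or vault-backed scheme.

```python
import hashlib

# Hypothetical output of data classification: column name -> sensitivity class.
COLUMN_CLASSES = {
    "name": "pii",
    "email": "pii",
    "ssn": "restricted",
    "region": "public",
}

# Hypothetical policy: what each role may see for each sensitivity class.
ROLE_POLICY = {
    "analyst": {"public": "clear", "pii": "mask", "restricted": "deny"},
    "privacy_officer": {"public": "clear", "pii": "clear", "restricted": "tokenize"},
}


def mask(value: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    return value[:1] + "*" * (len(value) - 1)


def tokenize(value: str) -> str:
    """Illustrative token: truncated SHA-256 digest (not a production scheme)."""
    return "tok_" + hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def read_row(role: str, row: dict) -> dict:
    """Return the row as the given role is allowed to see it."""
    visible = {}
    for column, value in row.items():
        action = ROLE_POLICY[role][COLUMN_CLASSES[column]]
        if action == "clear":
            visible[column] = value
        elif action == "mask":
            visible[column] = mask(value)
        elif action == "tokenize":
            visible[column] = tokenize(value)
        # "deny": the column is omitted from the result entirely.
    return visible


row = {"name": "Ada", "email": "ada@example.com", "ssn": "123-45-6789", "region": "EU"}
analyst_view = read_row("analyst", row)
officer_view = read_row("privacy_officer", row)
print(analyst_view)
print(officer_view)
```

Here the same row yields different views depending on the caller's role, which is the behavior the self-service and masking features aim for.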
Access control is a necessary bedrock of a good data governance program. At the same time, access control is insufficient for good data governance.
Let’s explore some other types of Data Governance policies that enterprises have, and how those policies can often end up as well-intended pieces of paper on someone’s desk that aren’t actually enforced -- i.e., they never get operationalized.
Examples of Data Governance Policies | Data Governance Broken Promises |
---|---|
Data Preservation Policies | |
Data Security Policies | |
Compliance & Privacy Policies | |
Data Management Policies | |
Data Governance Needs to be Operationalized with DataGovOps
SalesOps measures and evaluates sales data to determine the effectiveness of a product, sales process, or campaign. Similarly, MarketingOps measures and evaluates marketing data to determine the effectiveness of marketing programs and campaigns.
DevOps is the combination of philosophies, practices, and tools that increases an organization's ability to deliver applications and services at high velocity.
DevSecOps automates the integration of security at every phase of the software development lifecycle, from initial design through integration, testing, deployment, and software delivery.
By analogy, Data Governance Operations -- or DataGovOps -- is the combination of practices and tools that:
- Automatically make data more secure, private, accurate, available and usable;
- Guide people to take appropriate action and follow established processes to better govern data; and
- Continually measure and evaluate how internal data standards -- i.e., data policies -- are being adhered to.
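One way to picture the third point is to express policies as small executable checks and compute an adherence rate for each across a data inventory. The Python sketch below does that under stated assumptions: the `Dataset` fields, the three example policies, the seven-day backup threshold, and the sample inventory are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List


@dataclass
class Dataset:
    name: str
    encrypted: bool
    classified: bool
    last_backup: date


@dataclass
class Policy:
    name: str
    check: Callable[[Dataset], bool]  # returns True when the dataset complies


def build_policies(today: date) -> List[Policy]:
    """Hypothetical policies; real ones would come from the governance teams."""
    return [
        Policy("encrypted at rest", lambda d: d.encrypted),
        Policy("classified", lambda d: d.classified),
        Policy("backed up within 7 days",
               lambda d: today - d.last_backup <= timedelta(days=7)),
    ]


def adherence_report(datasets: List[Dataset], policies: List[Policy]) -> dict:
    """For each policy, report the share of compliant datasets and the failures."""
    report = {}
    for policy in policies:
        failing = [d.name for d in datasets if not policy.check(d)]
        passing = len(datasets) - len(failing)
        rate = passing / len(datasets) if datasets else 1.0
        report[policy.name] = {"rate": rate, "failing": failing}
    return report


today = date(2024, 1, 15)  # fixed date so the example is reproducible
datasets = [
    Dataset("orders", encrypted=True, classified=True, last_backup=date(2024, 1, 14)),
    Dataset("clickstream", encrypted=False, classified=True, last_backup=date(2024, 1, 2)),
]
report = adherence_report(datasets, build_policies(today))
for policy_name, result in report.items():
    print(policy_name, result)
```

Run on a schedule, a report like this turns a written policy into a number that can be tracked over time, along with the list of data sets that need attention.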
DataGovOps is the collaborative data management practice focused on improving the communication, integration and automation of context and policy among all Data Governance stakeholders in an organization, including Security, Compliance, Privacy, and Data Owners. DataGovOps automates the integration of security and compliance at every phase of the data lifecycle. It’s the much-needed engineering counterpart to traditional Data Governance.
The cloud has transformed both the volume of data kept in organizations and the speed at which that data is growing. Given cloud scale and cloud velocity, Data Governance can no longer be a hodge-podge of manual steps, occasional audits, and a series of broken promises. It’s imperative for enterprises to automate and scale their Data Governance functions and invest in systems that continuously ensure that their data is being appropriately inventoried, stored, used, and deleted.
Now is the time to fix the broken promises of data governance. Now is the time for DataGovOps.