Securing the Data Lifecycle: Key to Data Security and Compliance

Is data the new oil, or is it the new garden? If you think of data as something archived, it's the new oil. But if data is growing, transforming, or providing insights, it's a garden, with many sowers, irrigators, harvesters, visitors, and trespassers. The fluidity and organic nature of modern data make it hard to secure.

Data security is more broken than ever, and companies recognize it. So why does this gap in security persist, why are there so many breaches, and how did we get here?

In this first post of our data lifecycle series, we discuss what the data lifecycle means for businesses, why it is at risk despite all of the security solutions in place, and how businesses can ultimately ensure that their data lifecycle is secure.

Our data is our digital twin. There's probably more data about how we live than there is information in our genes about why we breathe.

It's generated through our everyday actions: when we stream the latest movie on Netflix, email our co-workers, use social media, or order from our favorite restaurant. We give our consent to companies, hoping they will protect our data while knowing they can't.

Why is it so hard for multi-billion-dollar companies to protect our data? To understand the root cause, we need to perceive data as a living thing that is born, grows, spawns, and, in some form, dies. This journey is what we call the Data Lifecycle.

What is the Data Lifecycle?

The data lifecycle is the sequence of stages that data goes through from its creation until it is eventually archived or deleted. At the most basic level, it goes through three phases:

Pre-usage: Getting data ready for use
Usage: The countless ways data is used
Post-usage: Eventual data archival or deletion

Pre-usage

In this stage, consumers tell companies whether they consent to having their personal data collected. Although this usually happens in fine print that consumers skim over, the data lifecycle starts here. After that, data is usually imported into a data store, such as a data warehouse or data lake, for future use. That data should be (but seldom is) fully cataloged, classified, reviewed by multiple teams, cleaned, and organized for business use.

Usage

Usage is where data custodians lose line of sight. Loss of visibility means less control and more risk.

In the usage stage, hundreds of users (and in a large company, thousands) are given access to data depending on their roles. Each user is part of a larger team (e.g., Data Science, Engineering, Analytics) and has different needs and skills when it comes to interacting with data. At this stage, data is copied, edited, manipulated, and transformed by users through different tools and services (e.g., self-service BI, SQL queries, APIs), several times a day. In some cases, it is even encrypted or obfuscated so that only users who have the "key" can access or change the data.

Post-usage

The lifecycle ends when data is decommissioned or archived for future use. At this stage, data may also be deleted at a consumer's request, where an applicable regulation grants that right, or destroyed once its retention period has expired. For example, the GDPR and the CCPA allow consumers to request the deletion of their data.
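The post-usage decision described above can be sketched as a small rule: honor a consumer's deletion request when an erasure right applies, and otherwise destroy the record once its retention period has passed. This is a minimal Python sketch, not a compliance tool. The `DataRecord` fields, the `post_usage_action` function, and the boolean `erasure_right_applies` flag are hypothetical; in practice, whether the GDPR or CCPA applies is a legal determination, not a field on a record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    DELETE = "delete"    # consumer-requested erasure
    DESTROY = "destroy"  # retention period has expired


@dataclass
class DataRecord:
    record_id: str
    collected_at: datetime
    retention: timedelta                 # hypothetical per-record retention policy
    deletion_requested: bool = False     # the consumer asked for deletion
    erasure_right_applies: bool = False  # hypothetical: a regulation such as GDPR/CCPA grants erasure


def post_usage_action(record: DataRecord, now: datetime) -> Action:
    """Decide what to do with a record at the end of its lifecycle."""
    # A valid erasure request takes priority over the retention schedule.
    if record.deletion_requested and record.erasure_right_applies:
        return Action.DELETE
    # Otherwise, destroy the record once its retention period is over.
    if now >= record.collected_at + record.retention:
        return Action.DESTROY
    return Action.KEEP
```

This only decides what should happen to the primary record; copies and transformed versions elsewhere in the company's systems would need the same treatment.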

Like diamonds, data is forever. Even when data is archived or deleted, there is a high probability that somewhere in the company's diverse and distributed systems, either an exact copy of the data or a transformed version lives on. But let's save that story for another day.

Is your data lifecycle secure even with access control and DLP?

The assumption is that if you control who can access data and what they can export, your data will be safe. This assumption is flawed: 74% of data breaches start with privileged credential abuse.

Solutions like access control, DLP, and encryption are deployed to protect data. They are useful, but as that 74% figure shows, they are not enough.

Here's why. These tools are binary and rule-based. They act as bookends for the data lifecycle: they secure both ends but fail to secure the middle, where most of the risk lies. This will sound familiar: they focus solely on controlling who has access to sensitive data and on preventing data loss, rather than on understanding how sensitive data is actually stored, moved, changed, and used.

As a result, the technologies available today are ill-equipped to detect scenarios like the following, which involve credentialed users:
- An engineer snoops on his spouse's (or a celebrity's) ride shares, shopping habits, and chat history
- An inexperienced intern uses the dashboard in a BI tool to request a broad range of data (and exfiltrates PII fields in the process)
- A sales ops rep creates a temporary table and forgets to delete it after use
In all of these situations, data is at risk, and so too are the lives of millions of consumers. To secure the data lifecycle, companies must ask themselves four critical questions:
1. Who has access to what data?
2. What regulatory and security properties apply to that data?
3. Where is data stored, moved, or copied to?
4. How is data being used?
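One way to make the four questions concrete is to keep an inventory of data assets that records, for each one, who can read it, which regulatory and security tags apply, where it lives, and when it was last used. The Python sketch below assumes such an inventory already exists; `DataAsset`, `SENSITIVE`, `sensitive_asset_report`, and the 30-day idle threshold are hypothetical. Flagging idle sensitive assets is one simple way to surface something like the sales ops rep's forgotten temporary table.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical set of tags that mark an asset as sensitive.
SENSITIVE = {"PII", "PHI", "PCI"}


@dataclass
class DataAsset:
    name: str
    location: str               # Q3: where the data is stored
    tags: frozenset[str]        # Q2: regulatory and security properties
    readers: frozenset[str]     # Q1: users or roles with access
    last_used: datetime | None  # Q4: most recent observed use (None if never used)


def sensitive_asset_report(assets, now, max_idle=timedelta(days=30)):
    """Answer the four questions for every asset carrying a sensitive tag."""
    report = []
    for asset in assets:
        if not (asset.tags & SENSITIVE):
            continue  # skip assets with no sensitive tags
        idle = asset.last_used is None or now - asset.last_used > max_idle
        report.append({
            "asset": asset.name,
            "who": sorted(asset.readers),
            "properties": sorted(asset.tags),
            "where": asset.location,
            "idle": idle,  # sensitive but unused: a candidate for review or cleanup
        })
    return report
```

The hard part in practice is populating the inventory accurately, which is why the questions about movement and usage below matter.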

How can companies make their data lifecycle secure?

Adopt a data lifecycle approach

Data security on its own does include data classification, discovery, and cataloging, but it is part of a much larger problem: protecting the data lifecycle itself. A data lifecycle approach means being aware at all times of where your data lives, breathes, and moves within your data infrastructure. It also means dissecting the nature of these data interactions so that companies have the context they need to govern and respond to all underlying risks. This helps ensure that the privacy, quality, and integrity of data are protected. Throughout this entire process, companies must also comply with all applicable privacy and data protection regulations.

Know data movement and lineage

Since data is constantly being used by hundreds of people, it's essential that businesses keep track of the changes made to data since its initial creation. This includes knowing where data is moved within the data infrastructure (e.g., copies, duplicates) and how it is transformed as users interact with it, whether those users are employees, contractors, or partners. Companies must also implement a data classification system so that the usage and compliance policies attached to a given data field continue to apply throughout its entire lifecycle, regardless of where it is moved or copied.
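Propagating classifications along lineage can be sketched as a graph walk: tag a source column once, record each derivation as an edge, and let every downstream copy inherit the source's tags. This is a minimal Python sketch; the `LineageGraph` class and the column names in the usage note are hypothetical, and it assumes column-level lineage edges are already being captured somehow.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Column-level lineage: edges point from a source column to the columns derived from it."""

    def __init__(self):
        self.children = defaultdict(set)  # column -> columns derived from it
        self.tags = defaultdict(set)      # column -> classifications applied directly

    def tag(self, column, *labels):
        self.tags[column].update(labels)

    def derive(self, source, target):
        self.children[source].add(target)

    def effective_tags(self):
        """Push each directly applied tag to every downstream column (breadth-first)."""
        result = defaultdict(set)
        for column, labels in self.tags.items():
            queue, seen = deque([column]), {column}
            while queue:
                current = queue.popleft()
                result[current].update(labels)
                for child in self.children.get(current, ()):
                    if child not in seen:  # guard against cycles in the lineage graph
                        seen.add(child)
                        queue.append(child)
        return dict(result)
```

For example, after `g.tag("crm.customers.email", "PII", "GDPR")` and edges from that column to a hashed copy and from the hashed copy to a scratch table, `g.effective_tags()` reports the PII and GDPR tags for both downstream columns. Propagating every tag unconditionally is deliberately conservative; a real system might let a verified anonymizing transformation drop some tags.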

Understand the use of data

The biggest concern in all of this is the unknown: once users are granted access to data, it's difficult for businesses to know how they are interacting with it and what risk that poses. For example, an employee could spy on his ex's purchase history and use that information in his personal life, or an outside hacker could steal an employee's credentials and use them to download PII. To prevent these situations, companies must understand how their sensitive data is used and whether it is truly used for the purpose for which permission was given.
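Understanding how data is used means looking at the context of each access (who read whose data, and why), not just whether access was granted. The Python sketch below assumes access events are logged with a declared purpose and a flag for whether a ticket or case justifies the lookup; `AccessEvent`, `flag_suspicious`, and the threshold of three uncased lookups are all hypothetical. It checks each access against the purposes the data subject consented to and flags a user who keeps looking up the same person without a case, the pattern behind the snooping examples above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    user: str
    subject_id: str  # whose personal data was read
    purpose: str     # hypothetical: purpose the user declared for the access
    has_case: bool   # hypothetical: whether a ticket or case justifies the lookup


def flag_suspicious(events, consented_purposes, max_uncased_lookups=3):
    """Return (event, reason) pairs for accesses inconsistent with consent or apparent intent."""
    flags = []
    uncased = Counter()
    for event in events:
        # Rule 1: the declared purpose must be one the data subject consented to.
        if event.purpose not in consented_purposes.get(event.subject_id, set()):
            flags.append((event, "purpose not consented"))
        # Rule 2: count lookups of the same person with no case; flag once when the limit is exceeded.
        if not event.has_case:
            key = (event.user, event.subject_id)
            uncased[key] += 1
            if uncased[key] == max_uncased_lookups + 1:
                flags.append((event, "repeated lookups without a case"))
    return flags
```

These two rules are only a starting point; a production system would likely compare each user's behavior against a richer baseline of how their team normally uses the data.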

The data lifecycle, complex as it is, is an untapped pool of insight for businesses. In the next post in our data lifecycle series, we will take a closer look at how companies can ensure data compliance throughout the data lifecycle.

This blog post was featured on AI TechPark.
