Data Security Defined: Your Guide to Industry Jargon

Cloud-Native Application Protection Platform (CNAPP)

Cloud-Native Application Protection Platform (CNAPP) is a security and compliance solution that helps teams build, deploy, and run secure cloud-native applications. CNAPPs provide centralized controls, threat detection, and incident response capabilities. They also help security teams collaborate more effectively with developers and DevOps. 

CNAPPs consolidate a large number of previously siloed capabilities, including: 

  • Container scanning
  • Cloud Security Posture Management (CSPM)
  • Cloud Service Network Security (CSNS)
  • Cloud Workload Protection Platform (CWPP)

CNAPPs provide complete end-to-end cloud security via a single holistic platform. They help teams identify hidden risks from misconfigurations, threats, and vulnerabilities.

Cloud Security Posture Management (CSPM)

  • Overview of CSPM: Cloud Security Posture Management (CSPM) is a set of IT security tools that enhance cloud security. They offer organizations automated visibility, continuous monitoring, and remediation workflows to assess, maintain, and improve security posture in the cloud.

  • Assessment & Alignment: CSPM ensures that an organization's cloud resources and configurations adhere to security best practices, industry standards, and regulatory requirements. It proactively identifies and addresses vulnerabilities, misconfigurations, and risks.

  • Functional Capabilities of CSPM Tools: These tools offer automated visibility, uninterrupted monitoring, threat detection, and remediation workflows. They're equipped to examine various cloud infrastructures, including SaaS, PaaS, IaaS, containers, and serverless code. A minimal example of such an automated check is sketched below.
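
As a concrete illustration, a CSPM-style check might flag cloud storage that lacks guardrails against public exposure. The sketch below uses AWS's boto3 SDK; treating a missing S3 public-access block as a finding is an assumption made for this example, not a full CSPM implementation.

```python
# Minimal sketch of a CSPM-style misconfiguration check (assumes AWS
# credentials are configured and the boto3 SDK is installed).
import boto3
from botocore.exceptions import ClientError

def buckets_missing_public_access_block():
    """Flag S3 buckets with no public-access block configuration at all."""
    s3 = boto3.client("s3")
    findings = []
    for bucket in s3.list_buckets()["Buckets"]:
        try:
            s3.get_public_access_block(Bucket=bucket["Name"])
        except ClientError as err:
            # Treating a missing configuration as a finding is an
            # assumption made for this illustration.
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                findings.append(bucket["Name"])
            else:
                raise
    return findings

if __name__ == "__main__":
    for name in buckets_missing_public_access_block():
        print(f"Potential misconfiguration: {name} has no public access block")
```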

Cloud Service Network Security (CSNS)

CSNS solutions protect user traffic to and from cloud-based applications. They also secure network connectivity between cloud services. 

CSNS solutions include:

  • Web application firewalls
  • Intrusion detection systems
  • Deep packet inspection
  • Virtual private networks (VPNs)
  • Distributed denial-of-service protection
  • Load balancing
  • Transport Layer Security (TLS) inspection

Cloud Workload Protection Platform (CWPP)

A cloud workload protection platform (CWPP) is a cybersecurity solution that protects workloads in cloud and data center environments.

  • Agent-Based Monitoring: A CWPP typically relies on a software agent that runs on target machines in cloud and data center environments. This agent gathers security-related data and events to provide protection.
  • Versatile Security Controls: CWPPs offer security oversight and controls for various workloads, including physical machines, virtual machines, containers, and serverless architectures. They are designed to mitigate vulnerabilities resulting from poor cybersecurity practices and help prioritize high-risk issues.
  • Multi-Cloud and IaaS Compatibility: Primarily used for securing server workloads in public cloud Infrastructure as a Service (IaaS) environments, CWPPs enable cloud providers and customers to maintain the security of workloads as they pass through different domains.

Compliance Regulatory Standards

GDPR

GDPR stands for General Data Protection Regulation. It's the core of Europe's digital privacy legislation. The General Data Protection Regulation is a regulation of the European Union (EU) that became effective on May 25, 2018. It strengthens and builds on the EU's previous data protection framework, replacing the 1995 Data Protection Directive.

CCPA

The California Consumer Privacy Act (CCPA) is a law that allows any California consumer to demand to see all the information a company has saved on them, as well as a full list of all the third parties that data is shared with. The law also allows consumers to sue companies if its privacy guidelines are violated, even if there is no breach. In essence, the CCPA is about letting consumers know what data a company collects, when it is sold or shared for a business purpose, which third parties it is shared with, and what sources of personal data have been used.

PCI

PCI DSS is the global security standard for all entities that store, process or transmit cardholder data and/or sensitive authentication data. PCI DSS sets a baseline level of protection for consumers and helps reduce fraud and data breaches across the entire payment ecosystem. It is applicable to any organization that accepts or processes payment cards.

SOX

SOX compliance is an annual obligation derived from the Sarbanes-Oxley Act (SOX) that requires publicly traded companies doing business in the U.S. to establish financial reporting standards, including safeguarding data, tracking attempted breaches, logging electronic records for auditing, and proving compliance.

FERPA

The Family Educational Rights and Privacy Act (FERPA) is a federal law enacted in 1974 that protects the privacy of student education records. FERPA applies to any public or private elementary, secondary, or post-secondary school. It also applies to any state or local education agency that receives funds under an applicable program of the US Department of Education.

HIPAA

The Health Insurance Portability and Accountability Act (HIPAA) of 1996 is a set of regulatory standards intended to protect private and sensitive patient data held by hospitals, insurance companies, and healthcare providers.

Data Access Governance

Data Access Governance (frequently referred to as DAG) is a market segment that focuses on identifying and addressing the malicious and non-malicious threats that can come from unauthorized access to sensitive and valuable unstructured data.

Organizations look to Data Access Governance to:

  • Determine if sensitive and valuable files are being stored in secure locations
  • Identify who has access to these files
  • Correct and enforce access permissions (a minimal permission-audit sketch follows this list)
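
To make the first two goals concrete, here is a minimal sketch of a permission audit that walks a directory tree and flags world-readable files. The path "/srv/sensitive" and the idea that everything under it is sensitive are assumptions for illustration.

```python
# Minimal sketch of a Data Access Governance permission audit:
# flag world-readable files under a directory assumed to hold sensitive data.
import stat
from pathlib import Path

def find_world_readable(root):
    """Return files under `root` that anyone on the machine can read."""
    return [path for path in Path(root).rglob("*")
            if path.is_file() and path.stat().st_mode & stat.S_IROTH]

if __name__ == "__main__":
    # "/srv/sensitive" is a hypothetical location for this example.
    for f in find_world_readable("/srv/sensitive"):
        print(f"World-readable sensitive file: {f}")
```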

Data deployment

  • Data deployment refers to the process of moving, transferring, or releasing data from one environment to another.
  • This can involve various stages of data management, including data integration, data migration, and data synchronization.
  • Data deployment is a critical aspect of modern IT operations and is used to ensure that accurate and up-to-date data is available in the appropriate systems and environments.

Data exfiltration

Data exfiltration is sometimes referred to as data extrusion, data exportation, or data theft. All of these terms are used to describe the unauthorized transfer of data from a computer or other device.

Data flows

  • Data flows refer to the movement, transformation, and manipulation of data as it travels through various stages of a system, application, or business process.
  • A data flow illustrates how data moves from one point to another, how it is processed or modified along the way, and how it ultimately reaches its intended destination.
  • Data flows are essential for understanding how information is managed within an organization's technology ecosystem.

Data Governance

Data governance is the process of organizing, securing, managing, and presenting data using methods and technologies that ensure it remains correct, consistent, and accessible to verified users.

Data lakes

A data lake is a centralized repository that ingests and stores large volumes of data in its original form. The data can then be processed and used as a basis for a variety of analytic needs. Due to its open, scalable architecture, a data lake can accommodate all types of data from any source, from structured (database tables, Excel sheets) to semi-structured (XML files, webpages) to unstructured (images, audio files, tweets), all without sacrificing fidelity. The data files are typically stored in staged zones – raw, cleansed and curated – so that different types of users may use the data in its various forms to meet their needs. Data lakes provide core data consistency across a variety of applications, powering big data analytics, machine learning, predictive analytics and other forms of intelligent action.

Data lakes vs Data warehouses

While data lakes and data warehouses are similar in that they both store and process data, each has its specialties and, therefore, its use cases. That's why it's common for an enterprise-level organization to include a data lake and a data warehouse in its analytics ecosystem. Both repositories work together to form a secure, end-to-end system for storage, processing, and faster time to insight.

A data lake captures both relational and non-relational data from a variety of sources—business applications, mobile apps, IoT devices, social media, or streaming—without having to define the structure or schema of the data until it is read. Schema-on-read ensures that any type of data can be stored in its raw form. As a result, data lakes can hold a wide variety of data types, from structured to semi-structured to unstructured, at any scale. Their flexible and scalable nature makes them essential for performing complex forms of data analysis using different types of compute processing tools like Apache Spark or Azure Machine Learning.

By contrast, a data warehouse is relational in nature. The structure or schema is modeled or predefined by business and product requirements that are curated, conformed, and optimized for SQL query operations. While a data lake holds data of all structure types, including raw and unprocessed data, a data warehouse stores data that has been treated and transformed with a specific purpose in mind, which can then be used to source analytic or operational reporting. This makes data warehouses ideal for producing more standardized forms of BI analysis, or for serving a business use case that has already been defined.
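
The schema-on-read versus schema-on-write distinction can be shown in a few lines. In the sketch below, with invented sample records, raw JSON lines stand in for a data lake (structure is applied only when reading), while a SQLite table with a predefined schema stands in for a warehouse.

```python
# Minimal sketch of schema-on-read (data lake) vs. schema-on-write
# (data warehouse); the sample records are invented.
import json
import sqlite3

raw_events = [
    '{"user": "alice", "action": "login", "device": "mobile"}',
    '{"user": "bob", "action": "purchase", "amount": 42.0}',  # different fields
]

# Lake style: keep records raw; interpret the structure only at read time.
for line in raw_events:
    event = json.loads(line)  # schema-on-read
    print(event.get("user"), event.get("amount", "n/a"))

# Warehouse style: the schema is fixed before any data is written.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (user TEXT, amount REAL)")  # schema-on-write
for line in raw_events:
    event = json.loads(line)
    if event.get("action") == "purchase":  # only conforming rows are loaded
        conn.execute("INSERT INTO purchases VALUES (?, ?)",
                     (event["user"], event["amount"]))
print(conn.execute("SELECT user, amount FROM purchases").fetchall())
```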

Data lineage

  • Data lineage refers to the visualization and documentation of the end-to-end journey of data as it moves through various stages of processing within a system or across different systems.
  • It provides a detailed view of the flow, transformations, and interactions that data undergoes from its origin to its final destination.
  • Data lineage helps organizations understand data movement, transformation processes, and relationships, which is crucial for data governance, compliance, troubleshooting, and decision-making.

Data locations

  • "Data locations" can refer to the physical or logical places where data is stored, managed, or processed within an organization's information technology infrastructure.
  • These locations could include physical data centers, cloud-based storage, databases, file systems, and more.
  • The concept of data locations is essential for understanding where data resides and how it's accessed and managed.

Data Maps

In the world of data privacy, data mapping is the process of inventorying the personal data in your business systems. This inventory is called a data map. An up-to-date data map is vital for compliance with modern data privacy regulations – like GDPR in the EU and CCPA in the US.

  • Data maps, produced through data mapping, are graphical or textual representations that illustrate the flow, transformation, and relationships of data between different systems, databases, applications, or components within a technology ecosystem.
  • Data mapping is fundamental to data integration, migration, and synchronization processes.
  • It helps ensure that data is accurately and appropriately transferred or transformed as it moves from one source to another.

Data mapping requires answers to basic questions, including the following (the sketch after the list shows one way to record such answers):

  • What personal data does my company collect?
  • When does my company erase this data?
  • Why does my company collect and process this data?
  • How does my company process this data?
  • Besides my company, who else receives this data?
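
The sketch below captures these answers as one structured entry per system, using an invented record layout; the field names and values are assumptions for illustration, not a standard data-map format.

```python
# Minimal sketch of one entry in a data map; the layout is invented.
from dataclasses import dataclass

@dataclass
class DataMapEntry:
    system: str                # where the data lives
    data_collected: list[str]  # what personal data is collected
    purpose: str               # why it is collected and processed
    processing: str            # how it is processed
    recipients: list[str]      # who else receives it
    retention: str             # when it is erased

crm_entry = DataMapEntry(
    system="CRM",  # hypothetical system name
    data_collected=["name", "email address"],
    purpose="customer support",
    processing="stored and queried by support staff",
    recipients=["email delivery vendor"],
    retention="deleted 24 months after last contact",
)
print(crm_entry)
```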

Data observability

  • Data observability, also known as dataOps observability, refers to the practice of monitoring and understanding the behavior, performance, and quality of data as it moves through various stages of data pipelines and systems.
  • Similar to the concept of observability in software systems, data observability focuses on gaining insights into the "black box" of data operations to ensure data reliability, accuracy, and compliance. Two common checks, freshness and volume, are sketched below.
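
A freshness check asks whether data arrived recently enough; a volume check asks whether roughly the expected number of rows arrived. The sketch below hard-codes thresholds and sample values as assumptions for illustration.

```python
# Minimal sketch of two data observability checks: freshness and volume.
# Thresholds and sample values below are assumptions for illustration.
from datetime import datetime, timedelta, timezone

def is_fresh(last_loaded_at, max_age=timedelta(hours=1)):
    """Freshness: has the table been updated recently enough?"""
    return datetime.now(timezone.utc) - last_loaded_at <= max_age

def volume_ok(row_count, expected=10_000, tolerance=0.2):
    """Volume: is today's row count within tolerance of the expected count?"""
    return abs(row_count - expected) <= expected * tolerance

last_load = datetime.now(timezone.utc) - timedelta(minutes=30)
print("fresh:", is_fresh(last_load))   # True: loaded 30 minutes ago
print("volume ok:", volume_ok(7_500))  # False: 25% below expectation
```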

Data pipelines

  • A data pipeline is a series of processes and tools used to extract, transform, and load (ETL) data from various sources and deliver it to its destination, typically a data storage or analysis platform; a minimal ETL sketch follows this list.
  • Data pipelines are a fundamental component of modern data architecture, allowing organizations to efficiently collect, manage, and utilize large volumes of data for various purposes, such as analytics, reporting, and machine learning.
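
The sketch below shows the three ETL stages in miniature: extracting records from an in-memory CSV source, transforming them, and loading them into SQLite. The source data and field names are invented for the example.

```python
# Minimal sketch of an extract-transform-load (ETL) pipeline.
# The source data and schema are invented for this example.
import csv
import io
import sqlite3

SOURCE = "name,signup_date\nAlice,2023-01-15\nBob,2023-02-20\n"

def extract(source_text):
    """Extract: read raw records from the source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(source_text)))

def transform(rows):
    """Transform: normalize names and derive a signup year."""
    return [{"name": row["name"].strip().lower(),
             "signup_year": int(row["signup_date"][:4])} for row in rows]

def load(rows, conn):
    """Load: write transformed rows to the destination store."""
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, signup_year INTEGER)")
    conn.executemany("INSERT INTO users VALUES (:name, :signup_year)", rows)

conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), conn)
print(conn.execute("SELECT * FROM users").fetchall())
# [('alice', 2023), ('bob', 2023)]
```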

Data portfolio

  • A data portfolio, also known as a data asset portfolio, refers to the collection of all the data assets owned, managed, and utilized by an organization.
  • It encompasses a wide range of data types, including structured and unstructured data, from various sources and systems.
  • A data portfolio is a strategic view of an organization's data assets, outlining their characteristics, usage, value, and relationships.

Data Quality

Data quality is the degree to which data is accurate, complete, timely, and consistent with your business’s requirements. Here are some data quality dimensions (two of which are measured in the sketch below):

  • Compliance
  • Consistency
  • Integrity
  • Latency
  • Recoverability
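
As a small illustration, the sketch below measures completeness and consistency over invented sample records; the specific rules are assumptions for the example.

```python
# Minimal sketch of two data quality measurements over invented records.
records = [
    {"id": 1, "email": "a@example.com", "country": "US"},
    {"id": 2, "email": None, "country": "US"},               # incomplete
    {"id": 3, "email": "c@example.com", "country": "usa"},   # inconsistent
]

def completeness(rows, field):
    """Share of rows where the field is present and non-empty."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def consistency(rows, field, allowed):
    """Share of rows whose value conforms to an agreed set of codes."""
    return sum(1 for r in rows if r.get(field) in allowed) / len(rows)

print(f"email completeness: {completeness(records, 'email'):.0%}")                  # 67%
print(f"country consistency: {consistency(records, 'country', {'US', 'CA'}):.0%}")  # 67%
```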

Data repositories

  • Data repositories are centralized storage locations or databases where data is stored, managed, and organized for easy access and retrieval.
  • These repositories provide a structured way to store data, making it available for various purposes such as analysis, reporting, application development, and more.
  • Data repositories play a crucial role in modern data management and are used by organizations to maintain data consistency, improve data quality, and enable efficient data sharing.

Data residency

  • Data residency refers to the physical or geographical location where data is stored, processed, or maintained.
  • It is a crucial consideration for organizations that handle sensitive or regulated data, as data residency often intersects with data privacy laws, compliance regulations, and security concerns.

Data Retention

Data retention, or record retention, is exactly what it sounds like — the practice of storing and managing data and records for a designated period of time. There are any number of reasons why a business might need to retain data: to maintain accurate financial records, to abide by local, state and federal laws, to comply with industry regulations, to ensure that information is easily accessible for eDiscovery and litigation purposes and so on. To fulfill these and other business requirements, it’s imperative that every organization develop and implement data retention policies.

Typically, a data retention policy will define the following (a sketch of applying such a policy in code appears after the list):

  • What data needs to be retained
  • The format in which it should be kept
  • How long it should be stored for
  • Whether it should eventually be archived or deleted
  • Who has the authority to dispose of it, and
  • What procedure to follow in the event of a policy violation
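
The sketch below encodes a toy retention policy and decides whether a record should be retained, archived, or deleted based on its age; the record categories and retention periods are invented for the example.

```python
# Minimal sketch of applying a data retention policy.
# Record categories and retention periods are invented for this example.
from datetime import date, timedelta

RETENTION_POLICY = {
    # category: (how long to keep, what to do afterwards)
    "financial_records": (timedelta(days=7 * 365), "archive"),
    "support_tickets": (timedelta(days=2 * 365), "delete"),
}

def retention_action(category, created_on, today=None):
    today = today or date.today()
    keep_for, end_of_life = RETENTION_POLICY[category]
    if today - created_on < keep_for:
        return "retain"
    return end_of_life

print(retention_action("support_tickets", date(2020, 1, 1)))    # "delete"
print(retention_action("financial_records", date(2023, 6, 1)))  # "retain"
```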

Disambiguated Users

Let's say there is a data store that is accessible only to User A and User B. Now suppose an application, such as a BI tool, can also access this data store using its own credentials, say BI_USER. When the BI tool queries the data store, the queries appear under the BI tool's username. But Users C and D, who have no direct access to the data store, may be querying it through the BI tool. Resolving which real person is behind each shared-credential query is called disambiguation, and C and D are the disambiguated users.
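
The sketch below illustrates one way such attribution might work: joining the data store's query log (which only sees the shared BI_USER credential) with the BI tool's own session log, which knows the human behind each session. The log formats and field names are invented for the example.

```python
# Minimal sketch of disambiguating end users behind a shared service account.
# Both logs and their fields are invented for this example.

# The data store only sees the BI tool's shared credential, plus a session id
# that the BI tool passes along (an assumption for this illustration).
datastore_queries = [
    {"login": "BI_USER", "session": "s1", "query": "SELECT * FROM sales"},
    {"login": "BI_USER", "session": "s2", "query": "SELECT * FROM salaries"},
]

# The BI tool's own audit log knows which human started each session.
bi_sessions = {"s1": "user_c", "s2": "user_d"}

def disambiguate(queries, sessions):
    """Attribute each shared-credential query to the real end user."""
    return [{**q, "actual_user": sessions.get(q["session"], "unknown")}
            for q in queries]

for q in disambiguate(datastore_queries, bi_sessions):
    print(f'{q["actual_user"]} ran: {q["query"]}')
```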

Data risk assessment (DRA)

  • Data risk assessment, also known as data risk analysis or data security risk assessment, is the process of evaluating potential risks and vulnerabilities related to an organization's data assets.
  • The goal of a data risk assessment is to identify and prioritize risks that could lead to data breaches, unauthorized access, data loss, or other security incidents.
  • By understanding these risks, organizations can develop strategies to mitigate them and protect their data assets.

Data security

  • Data security refers to the practice of protecting digital data from unauthorized access, use, disclosure, modification, or destruction.
  • It encompasses a range of measures and strategies designed to safeguard sensitive information and ensure that data remains confidential, available, and reliable.
  • Data security is crucial to prevent data breaches, protect individual privacy, maintain business continuity, and comply with data protection regulations.

Data Security Posture Management (DSPM)

  • Definition & Need: DSPM offers a practical approach to secure cloud data (structured and unstructured), ensuring sensitive and regulated data maintains the correct security posture, regardless of location.
  • Core Functions: DSPM helps organizations discover and classify cloud data, detect and alert on policy violations, prioritize these alerts, and offer remediation strategies.
  • Three-Step Process: DSPM operates on a three-fold principle: Find (locating and classifying data), Flag (identifying security risks), and Fix (remediating those risks), as sketched below.
  • Benefits: DSPM answers the pivotal cybersecurity question: “Where is my data?” providing visibility into data storage, access, usage, and security posture.
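
The Find, Flag, Fix loop can be sketched in a few lines. Everything below, from the toy data stores to the regex classifier and the placeholder remediation, is an invented stand-in meant only to show the shape of the process.

```python
# Minimal sketch of DSPM's Find / Flag / Fix loop.
# The data stores, classifier, and remediation are invented stand-ins.
import re

data_stores = {
    "s3://reports": ["quarterly totals", "board deck"],
    "s3://exports": ["ssn: 123-45-6789", "customer list"],  # fake SSN for demo
}

def find(stores):
    """Find: locate and classify sensitive data (SSN-like strings here)."""
    ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
    return {loc: [r for r in records if ssn.search(r)]
            for loc, records in stores.items()}

def flag(findings):
    """Flag: turn classified data into prioritized policy violations."""
    return [{"location": loc, "severity": "high", "count": len(hits)}
            for loc, hits in findings.items() if hits]

def fix(alert):
    """Fix: apply a remediation (a placeholder action in this sketch)."""
    print(f"Restrict access to {alert['location']} ({alert['count']} record(s))")

for alert in flag(find(data_stores)):
    fix(alert)
```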

Dive deeper into DSPM by reading our white paper, "Empowering Data Security: DSPM and Beyond," today!

Data security governance (DSG)

  • Data security governance refers to the set of policies, processes, and controls that an organization establishes to manage and safeguard its data assets.
  • The goal of data security governance is to ensure that data is protected, maintained, and used in a secure and compliant manner.
  • It involves defining roles, responsibilities, and procedures to manage risks related to data breaches, unauthorized access, data loss, and other security threats.

Data security posture

  • A data security posture refers to an organization's overall approach, readiness, and effectiveness in safeguarding its data assets against potential security threats and risks.
  • It encompasses the combination of security measures, policies, practices, and technologies that an organization has in place to protect its data from unauthorized access, breaches, data loss, and other security incidents.
  • A strong data security posture reflects the organization's commitment to data protection and risk mitigation.

Data sets

  • A dataset is a collection of structured or unstructured data that is organized and stored in a specific format.
  • Datasets are used for various purposes, including analysis, research, training machine learning models, and generating insights.
  • They can contain data of different types, such as text, numbers, images, audio, and video.

Data sensitivity

  • Data sensitivity refers to the level of confidentiality and importance associated with a piece of data.
  • It indicates how critical or private the data is and determines the level of protection and access control that should be applied to it.
  • Understanding data sensitivity is crucial for implementing appropriate security measures and ensuring that data is handled, stored, and shared in a manner that aligns with its importance and confidentiality.

Data Sovereignty

Data sovereignty is not just about where the data is stored but also about the laws and regulations that govern the data at the location where it is physically stored. Under data sovereignty, the same data may be subject to different privacy and security regulations depending on the location of the data centers where it is stored.

Data Store Sprawl

Data store sprawl, or data sprawl, is the accumulation of vast amounts of data by organizations, to the point where they no longer know what data they have or what is happening with that data.

Data vulnerabilities

  • Data vulnerabilities refer to weaknesses or gaps in the security measures and practices that expose data to potential threats, breaches, unauthorized access, or loss.
  • These vulnerabilities can arise from various factors, such as software flaws, misconfigurations, human errors, and inadequate security controls.
  • Exploiting data vulnerabilities can lead to data breaches, compromise sensitive information, and result in financial, legal, and reputational damage for organizations.

Data warehouses

A data warehouse is a centralized database that stores huge amounts of business information and is accessible for analysis and decision-making.

In short: it's a database system on steroids for managing and storing data on a large scale, especially at the enterprise level.

In this system, data from different sources is extracted, transformed, and loaded into the data warehouse, making it accessible, easy to manage, and ready to be translated into useful information for various business and technological purposes.

Personally Identifiable Information (PII)

In short, both direct and quasi-identifiers refer to pieces of information that can be used to identify an individual, either by themselves or in combination with other readily available information.

Direct Identifiers

As the name suggests, direct personal identifiers are pieces of information that can be used to directly identify an individual. Examples of direct identifiers include:

- Name

- Social Security number

- Email address

- Credit card number

- Medical record number

Quasi-Identifiers

Quasi-identifiers, also known as indirect personal identifiers, are pieces of information that, when combined with other data, can identify an individual. Examples of quasi-identifiers include the following (a short identifier-scanning sketch follows the examples):

- Date of birth

- Gender

- Zip code

- Occupation

- Medical diagnosis

- Approximate location
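
As a small illustration, the sketch below scans text for two direct identifiers, email addresses and US Social Security numbers, using deliberately simplified regular expressions; production PII scanners are far more thorough, and quasi-identifiers generally cannot be found by pattern matching alone.

```python
# Minimal sketch of scanning text for two direct identifiers.
# The regular expressions are simplified assumptions, not production-grade.
import re

PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def scan_for_direct_identifiers(text):
    """Return all matches for each identifier pattern found in the text."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

sample = "Contact alice@example.com; SSN on file: 123-45-6789 (fake, for demo)."
for label, hits in scan_for_direct_identifiers(sample).items():
    print(label, hits)
```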

Semi Structured Data

  • Semi-structured data is information that doesn't fit the relational model of structured data but still has some structure to it.
  • Semi-structured data often consists of documents held in JavaScript Object Notation (JSON) format. It also includes key-value stores and graph databases.

Shadow data

  • Shadow data, also known as "dark data," refers to data that is generated, collected, or stored by employees or within an organization's IT environment without the knowledge or control of the IT or security teams.
  • This data often exists outside the official data management and governance processes and can pose security, compliance, and operational risks.

Structured Data

  • Structured data is generally tabular data that is represented by columns and rows in a database.
  • Databases that hold tables in this form are called relational databases.
  • The mathematical term "relation" refers to a set of data organized as a table.
  • In structured data, every row in a table has the same set of columns.
  • SQL (Structured Query Language) is the programming language used to query structured data.

Unstructured Data

  • Unstructured data is information that is not organized in a pre-defined manner or does not have a pre-defined data model.
  • Unstructured information is typically text-heavy, but it may contain data such as numbers, dates, and facts as well.
  • Videos, audio, and binary data files might not have any specific structure; they are also classified as unstructured data.

Data localization

Data localization demands that all data generated within a country's borders remains within them. Unlike data residency and data sovereignty, data localization is consistently applied to the creation and storage of personal data.