Why Data Discovery is Key to PCI Compliance

PCI DSS requires merchants to protect cardholder data. It distinguishes between cardholder authentication data (including CVV2, CVC2 and CID codes, track data from the magnetic stripe, and PIN data), which must not be stored after authorisation, and non-authentication data (PAN, cardholder name, service code and expiration date), which can be stored as long as it is protected by encryption or other compensating controls.

To achieve PCI DSS compliance, the merchant must scan their databases for both authentication data and non-authentication data. Any cardholder authentication data discovered must be removed. Merchants then need to decide, based on the applications and available solutions, if databases with non-authentication data should be encrypted or protected using compensating controls.

Data Management: Ideal Versus Reality

Ideally, database security is about protecting all data, in all databases, all the time, against all potential attacks. In reality, resources and budgets are constrained, so merchants need to prioritise critical databases and implement effective data risk management processes to reduce risk to data.

PCI DSS supports a risk-based approach for managing data protection. Data risk analysis should take into account the data that needs to be protected. An effective programme will analyse the threats that are of utmost importance, while developing guidelines for gradual deployment.

There are three steps that need to take place to achieve PCI data security:

  • Database discovery: finding the servers that are in scope for PCI.
  • Data classification: understanding the information that resides on each server.
  • Vulnerability assessment: gaining insight into the security posture of each server.

Database Discovery

Managing databases in today’s dynamic environments is a growing challenge. Many merchants struggle to keep track of database instances and configuration changes, and most cannot describe the specific content of those databases. Yet PCI compliance depends on scanning every database for PCI regulated data. Mapping out the databases on the network also helps manage the project scope.

So how can it be done? Here’s a checklist to help effectively discover data:

  • Obtain a list of database servers that includes IP address, port number and type of database.
  • Identify servers (ICMP ping/TCP ping).
  • Scan ‘default’ ports.
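
The last two checklist items can be sketched with a plain TCP connect check. This is a minimal illustration rather than a replacement for a dedicated scanner: the port map lists common vendor defaults only, and `is_port_open` and `scan_default_ports` are hypothetical helper names introduced here.

```python
import socket

# Common vendor default listener ports. Databases are often moved to other
# ports, so a default-port scan is only a first pass.
DEFAULT_DB_PORTS = {
    1433: "Microsoft SQL Server",
    1521: "Oracle",
    3306: "MySQL",
    5432: "PostgreSQL",
    50000: "IBM Db2",
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds (a 'TCP ping')."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat as closed.
        return False


def scan_default_ports(host, ports=DEFAULT_DB_PORTS, timeout=1.0):
    """Return {port: likely database type} for the ports that accept connections."""
    return {
        port: name
        for port, name in ports.items()
        if is_port_open(host, port, timeout)
    }
```

An open default port only suggests a database type. The result should be checked against the server inventory before the host is treated as in scope.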

PCI requirements:

  • PCI DSS section 1.1.5 requires a review of all open ports on servers holding PCI regulated information, and a justification for any port other than a few selected services (HTTP, HTTPS, SSH, VPN).
  • PCI mandates (at least) quarterly scans, which means you must:
    • Repeat scans.
    • Locate and present new information.
    • Navigate results by different criteria (segment, DB type, etc.).
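
Repeating scans is only useful if each run can be compared with the previous one. The sketch below assumes each scan result is a list of dictionaries with `ip`, `port`, `db_type` and `segment` keys. That record layout and the function names are illustrative assumptions, not a prescribed format.

```python
from collections import defaultdict


def new_instances(previous, current):
    """Return records found in the current scan but not in the previous one."""
    seen = {(rec["ip"], rec["port"]) for rec in previous}
    return [rec for rec in current if (rec["ip"], rec["port"]) not in seen]


def group_by(results, key):
    """Group scan records by one criterion, e.g. 'segment' or 'db_type'."""
    grouped = defaultdict(list)
    for rec in results:
        grouped[rec[key]].append(rec)
    return dict(grouped)


# Two quarterly scans with made-up addresses.
q1 = [{"ip": "10.0.1.5", "port": 1521, "db_type": "Oracle", "segment": "cde"}]
q2 = q1 + [{"ip": "10.0.2.9", "port": 3306, "db_type": "MySQL", "segment": "dev"}]

added = new_instances(q1, q2)         # the MySQL instance that appeared in Q2
by_type = group_by(q2, "db_type")     # {"Oracle": [...], "MySQL": [...]}
```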

Data Classification

Classification is the next step for PCI compliance. Classification identifies the type of data contained within databases and database objects and helps determine which measures should be taken to protect it. Database objects are not created equal. Some don’t contain any sensitive information, even if the object name looks like it would. Other objects may not have a name that tells as much, yet hold very sensitive information.

Understanding the data type is as important as identifying the sensitive object. Not all sensitive information is sensitive in the same way, and some data types need to be treated differently from others. A good example is PCI regulated data: cardholder authentication information requires deletion, while non-authentication data needs to be encrypted. Encrypting the database without first deleting authentication data still leaves the merchant short of PCI compliance.

Collecting granular details about the database objects holding sensitive data, including table name, column name and data type, is essential. The best way to get these details is by performing a credentialed scan that identifies the relevant objects and columns. There are three main methods used for scanning databases:

  1. ‘Name-based’ classification scan, which searches for table or column candidates through the data dictionary. This approach relies on tables and columns having ‘meaningful’ names. While this approach is very fast, it may yield false positives and false negatives.
  2. ‘Content-based’ classification scan, which looks for data within tables. This approach is very accurate but is also time-consuming. Performing statistical sampling of tables can improve the performance of the scan.
  3. A combined approach uses name-based classification in order to find candidate objects and then applies content-based classification on those candidates. This approach provides accurate results in a timely manner.
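
The combined approach can be sketched as follows, using an in-process SQLite database so that the example is self-contained. The name patterns, the sample size, the 50% threshold and all function names are assumptions chosen for illustration. The content check looks for 13 to 19 digit values that pass the Luhn checksum, which is one common way to recognise a candidate PAN.

```python
import re
import sqlite3

# Name hints are deliberately loose: 'pan' also matches 'company', which
# shows why a name-based scan alone produces false positives.
NAME_HINT = re.compile(r"card|pan|cc_?num|credit", re.IGNORECASE)
PAN_SHAPE = re.compile(r"^\d{13,19}$")


def luhn_valid(number):
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_pan(value):
    text = re.sub(r"[ -]", "", str(value))   # ignore spaces and dashes
    return bool(PAN_SHAPE.match(text)) and luhn_valid(text)


def candidate_columns(conn):
    """Name-based pass: list (table, column) pairs with suggestive names."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()]
    found = []
    for table in tables:
        for col in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            if NAME_HINT.search(col[1]):     # col[1] is the column name
                found.append((table, col[1]))
    return found


def confirm_by_content(conn, table, column, sample=100, threshold=0.5):
    """Content-based pass on a random sample of non-null values."""
    rows = conn.execute(
        f'SELECT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL '
        "ORDER BY RANDOM() LIMIT ?", (sample,)).fetchall()
    if not rows:
        return False
    hits = sum(looks_like_pan(row[0]) for row in rows)
    return hits / len(rows) >= threshold


def classify(conn):
    """Combined approach: name-based candidates confirmed by content."""
    return [(t, c) for t, c in candidate_columns(conn)
            if confirm_by_content(conn, t, c)]
```

Against a table with `card_number` and `company` columns, the name-based pass flags both, and the content pass keeps only the column whose sampled values actually look like card numbers. `ORDER BY RANDOM()` is simple but reads the whole table, so a production scanner would use the sampling facility of the specific database.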

The chosen approach to data classification needs to be accurate, flexible and scalable. Scans should be easy to manage and repeatable. Results should be meaningful and support change analysis.

Vulnerability Assessment and Configuration Scans

PCI section 11.2 requires a quarterly vulnerability assessment. Vulnerability scans allow merchants to measure the security posture of their critical databases. Configuration scans find compliance gaps against internal policies and industry standards. Scanning databases for potential threats and misconfigurations provides a security score and can be used to calculate risk.

Do’s and don’ts – performing vulnerability assessments and configuration scans

Don’t look for vulnerabilities via exploitation (i.e. by trying to attack the database). Methods such as attack simulation, privilege abuse or brute force login are not recommended on live systems as they consume server resources and may bring the system down. You risk blocking legitimate users or business processes.

Do identify vulnerabilities by sending legitimate queries to the server and analysing the results. Look for known vulnerabilities in the following ways:

  • Check the version of the database and look for known vulnerabilities.
  • Query the database for indicative evidence (e.g. whether a vulnerable stored procedure exists in the database).
  • Check if the vulnerability can be exploited by unprivileged users in the specific setup.
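
The first check in the list above can be sketched as a comparison between the version a server reports and a list of advisories. The advisory entries, product names and identifiers below are invented placeholders. A real scanner would load this data from vendor bulletins or a vulnerability feed, and would need a version parser suited to each database product.

```python
def parse_version(text):
    """Turn a dotted numeric version such as '5.7.30' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical advisory data: each entry names the first fixed version.
ADVISORIES = [
    {"id": "EXAMPLE-0001", "product": "exampledb", "fixed_in": "5.7.30"},
    {"id": "EXAMPLE-0002", "product": "exampledb", "fixed_in": "5.6.10"},
    {"id": "EXAMPLE-0003", "product": "otherdb", "fixed_in": "12.2"},
]


def known_vulnerabilities(product, version, advisories=ADVISORIES):
    """Return advisory ids that apply to this product at this version."""
    installed = parse_version(version)
    return [
        adv["id"]
        for adv in advisories
        if adv["product"] == product
        and installed < parse_version(adv["fixed_in"])
    ]


findings = known_vulnerabilities("exampledb", "5.7.12")
```

A version match is only the first step. The remaining checks in the list decide whether the finding is actually exploitable in the specific setup.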

To make tests coherent, they should be mapped against known benchmarks, for example the Defense Information Systems Agency Security Technical Implementation Guides (DISA STIGs). Findings should adhere to known standards, e.g. the Security Content Automation Protocol (SCAP). To ensure accuracy, tests should be carefully designed to test the benchmark’s requirement for each specific database type. For scalability, results should be legible and understandable, tying each finding to the appropriate benchmark item. Results need to be tracked over time and the process must be repeatable.

Calculating a data risk score: assign a score to each discovered vulnerability, using either a custom scoring system or the Common Vulnerability Scoring System (CVSS). Then calculate the risk based on the vulnerability scan results and the type of data contained on the server. A risk calculation that takes into account both the severity of the vulnerabilities and the type of data they expose helps prioritise systems for mitigation efforts.
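
One simple way to combine the two inputs is to weight the most severe vulnerability score on a server by the sensitivity of the data it holds. The weights and the formula below are illustrative assumptions, not part of PCI DSS or CVSS, and the server names are made up.

```python
# Hypothetical sensitivity weights per data classification result.
DATA_WEIGHTS = {
    "authentication": 1.0,       # data that should not be stored at all
    "non_authentication": 0.7,   # PAN, cardholder name, etc.
    "none": 0.1,                 # no cardholder data found
}


def risk_score(cvss_scores, data_type):
    """Weight the highest CVSS score (0-10) by the data sensitivity."""
    if not cvss_scores:
        return 0.0
    return round(max(cvss_scores) * DATA_WEIGHTS[data_type], 1)


def prioritise(servers):
    """Return server names ordered from highest to lowest risk."""
    return sorted(
        servers,
        key=lambda name: risk_score(*servers[name]),
        reverse=True,
    )


servers = {
    "db-test": ([9.8], "none"),
    "db-payments": ([7.5, 4.3], "authentication"),
}
order = prioritise(servers)
```

In this example the payments server ranks first even though the test server has the more severe vulnerability, because the payments server holds the more sensitive data.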


Merchants today process more data in general – and specifically more cardholder data – than ever before. Managing confidential and sensitive cardholder data creates a data security and compliance challenge as merchants must ensure that it is protected from theft, abuse and misuse. In order to protect cardholder data, merchants must know where data resides and which data types exist in their data systems. Continuous discovery of databases and classification of the information they store ensures new data can be included in security and compliance efforts. Once merchants understand how sensitive data is distributed across their data repositories, they can better enforce audit and security policies and apply more effective controls.

