Data Compliance for PII

Achieve Data Compliance in Your Organization: Finding PCI & PII


Just about every day, headlines report data breaches that have compromised thousands or millions of confidential electronic records, often containing credit card or Social Security numbers. Many of the targeted companies had not followed the best practices promoted by regulatory or industry standards. Notable thefts, as well as the expansion of work-from-home initiatives, have prompted organizations to review their policies to avoid becoming fodder for the next news cycle. At Gimmal, we are concerned with governing, analyzing, and searching organizational data. This post focuses on one compliance challenge: finding sensitive, unprotected information in all your static repositories before it can hurt your business. Locating vulnerable data helps you identify and quickly remediate areas where policy is failing.

Two Key Areas of Data Compliance 

Two key areas of data compliance revolve around Payment Card Industry (PCI) data and Personally Identifiable Information (PII). PCI data falls under the aegis of the PCI Data Security Standard (PCI DSS), in version 3.2.1 at the time of writing, promulgated by a council of global payment brands (Visa, American Express, etc.). PII includes:

  • Social security numbers 
  • Date of birth 
  • Personal health information 
  • Any other points that can identify an individual 

Various privacy laws and industry regulations detail how PII and PCI records should be handled. Recommended protections range from maintaining secure networks to managing the flow of private information. For data at rest, the details generally boil down to four points: don’t transmit or store sensitive data in plain text, reduce reliance on static personal data, audit your systems, and employ policies to ensure you remain in compliance.

How to Achieve Data Compliance  

The best way to start this process is to determine the extent of the problem. One helpful technique is to scrutinize your systems for PCI or PII that would be visible to intruders in the event of a data breach. If restricted information is found in plain text, you know that you need to take steps to fix the issue. There are advanced solutions that locate this sensitive information using artificial intelligence or machine learning (including Gimmal Discover), but that option may not be available to everyone. Since the PCI or PII in question usually matches a pattern, a basic and far more accessible technique for locating problematic data is pattern-based searching in the form of regular expressions. However, items that match the search criteria but are not actually PCI or PII (called false positives or mismatches) can overwhelm the core task.
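As a rough illustration of pattern-based searching, the sketch below scans a string for candidate Social Security and card numbers. The two patterns and the find_candidates helper are assumptions made for this example, not a vetted rule set or any product's built-in logic, and they are deliberately loose.

```python
import re

# Illustrative, deliberately loose patterns (assumptions, not a vetted rule set).
SSN_PATTERN = re.compile(r"\d{3}[- ]?\d{2}[- ]?\d{4}")   # nine digits, optional separators
CARD_PATTERN = re.compile(r"\d(?:[- ]?\d){12,18}")       # 13 to 19 digits, optional separators


def find_candidates(text):
    """Return (kind, matched_text) pairs for every loose pattern hit."""
    hits = []
    for kind, pattern in (("ssn", SSN_PATTERN), ("card", CARD_PATTERN)):
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits


sample = "Customer SSN 123-45-6789 paid with card 4111 1111 1111 1111."
print(find_candidates(sample))
```

Patterns this loose also flag harmless text. For instance, find_candidates("ZIP 12345-6789") reports the ZIP+4 code as an "ssn" hit, which is the false-positive problem in miniature.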

False Positives 

False positives are to be expected, and any methodology should account for them. Often a myriad of data points match PCI or PII patterns. For example, a typical Social Security number fits a nine-digit pattern, but absent any constraints, so does a ZIP code in ZIP+4 format. Additionally, credit card and Social Security number patterns match data found in log files, random URLs, and spreadsheets. The trick is to catch all instances of PCI or PII data to achieve compliance while eliminating as many mismatches as possible. Using precise criteria to locate the data will speed up the review and resolution of any problem areas.

After establishing the basic details, run your scan on a small set of data. It is essential that the sample contain data that will match the PCI and PII that you seek (i.e., be responsive), but also data that doesn’t. A simple regular expression may well leave too many outliers that are responsive but not relevant. Checking similar but unrelated information will reveal whether there are problems with your chosen configuration.
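One way to run such a trial is to hand-label a few sample strings and count how a pattern performs against them. The sketch below assumes a hypothetical labeled sample and a loose nine-digit pattern standing in for whatever rule is under test. The names CANDIDATE, LABELED_SAMPLE, and evaluate are illustrative only.

```python
import re

# Loose nine-digit pattern standing in for the rule under test (an assumption).
CANDIDATE = re.compile(r"\d{3}[- ]?\d{2}[- ]?\d{4}")

# Hypothetical hand-labeled sample: (text, contains_real_ssn)
LABELED_SAMPLE = [
    ("Employee SSN: 123-45-6789", True),
    ("Ship to Springfield, IL 62704-1234", False),   # ZIP+4 look-alike
    ("Order reference 987654321 confirmed", False),  # nine-digit reference
    ("No numbers of interest here", False),
]


def evaluate(pattern, labeled):
    """Count true positives, false positives, and misses for one pattern."""
    counts = {"true_positive": 0, "false_positive": 0, "missed": 0}
    for text, is_sensitive in labeled:
        matched = pattern.search(text) is not None
        if matched and is_sensitive:
            counts["true_positive"] += 1
        elif matched and not is_sensitive:
            counts["false_positive"] += 1
        elif is_sensitive:
            counts["missed"] += 1
    return counts


print(evaluate(CANDIDATE, LABELED_SAMPLE))
```

On this sample the loose pattern finds the one real number but also flags both look-alikes. Rerunning the same evaluation after each refinement shows whether false positives drop without valid hits being lost.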

To reduce false positives, it helps to apply a bit of programmatic logic on top of your regular expressions. This is somewhat easier for credit card numbers, because a prospective match can be validated against its checksum. Social Security numbers follow a few rules (they never start with ‘000’ or ‘666’, for example) and appear in several basic patterns with specific delimiters. Sticking to those rules and patterns filters out further false matches. Another trick is to check the characters on either side of the matched text for additional digits, which indicate that the match is only part of a longer number. While testing at Gimmal, we found that applying this logic (programmed into Gimmal Discover’s default PATTERN functionality) removed over 75% of false positives in our sample sets.
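The three refinements described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Gimmal Discover’s implementation. The card checksum used here is the Luhn algorithm. The Social Security number pattern also rejects area numbers 900 to 999, group 00, and serial 0000, which are additional published rules not mentioned above. The helper names are hypothetical.

```python
import re


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:        # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


SSN = re.compile(
    r"(?<!\d)"                    # no digit immediately before the match
    r"(?!000|666|9\d\d)(\d{3})"   # area: never 000, 666, or 900-999
    r"([- ]?)"                    # delimiter: hyphen, space, or none
    r"(?!00)(\d{2})"              # group: never 00
    r"\2"                         # the same delimiter must be reused
    r"(?!0000)(\d{4})"            # serial: never 0000
    r"(?!\d)"                     # no digit immediately after the match
)

CARD = re.compile(r"(?<!\d)\d(?:[- ]?\d){12,18}(?!\d)")


def find_ssns(text):
    """Return substrings that satisfy the SSN structure rules."""
    return [m.group() for m in SSN.finditer(text)]


def find_cards(text):
    """Return card-like substrings that also pass the Luhn checksum."""
    return [m.group() for m in CARD.finditer(text) if luhn_valid(m.group())]


print(find_ssns("SSN 123-45-6789, ZIP 62704-1234, ref 666-12-3456"))
print(find_cards("paid with 4111-1111-1111-1111, tracking 4111111111111112"))
```

In this example the ZIP+4 code fails the delimiter rule, the 666 prefix is rejected outright, and the second card-like number fails the checksum, so only one value survives from each line.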

Retest the data set, then expand the scope with a more extensive sample before committing to a large-scale scan. Some mismatches may persist because identically formatted numbers are used for other, non-regulated purposes (e.g., account numbers). At this point, professional knowledge of the electronically stored information (ESI) comes in handy for crafting appropriate refinements to the search. Tools like Gimmal Discover that give users the flexibility to add criteria such as proximity or Boolean logic can be very useful in pinpointing accurate matches. Once you have reduced the false positives without impacting valid hits, your search is ready to deploy.
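A proximity criterion can be approximated in a few lines: keep a candidate only when a context term appears within a set number of characters of it. The sketch below is a generic illustration, not how any particular tool implements proximity. The context terms, the 40-character window, and the hits_near_context helper are assumptions chosen for the example.

```python
import re

CANDIDATE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")
# Hypothetical context terms; a real list would come from knowledge of the data.
CONTEXT_TERMS = re.compile(r"\b(?:ssn|social security)\b", re.IGNORECASE)


def hits_near_context(text, window=40):
    """Keep candidate matches with a context term within `window` characters."""
    kept = []
    for match in CANDIDATE.finditer(text):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        if CONTEXT_TERMS.search(text[start:end]):
            kept.append(match.group())
    return kept


text = (
    "Applicant social security number: 123-45-6789. "
    "Unrelated filler text separates the two records in this sample. "
    "Internal account 321-54-9876 was closed last quarter."
)
print(hits_near_context(text))
```

Here the account number has the same format as the first value but no context term nearby, so it is dropped. The trade-off is that a genuine number with no nearby label would be dropped too, which is why the window and term list need testing against sample data.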

Achieving Continuous Data Compliance 

All of this tuning serves a key step in compliance and risk reduction: passing an audit for unencrypted PCI or PII. Solutions like Gimmal Discover can help you with this task. Gimmal’s tools allow you to examine your data consistently and repeatedly using basic or advanced methodology, regardless of where your data is stored. Gimmal Discover gives your in-house resources the capability to scan a variety of content repositories, such as file shares, workstations, SharePoint, Box, Google Drive, OneDrive, and more. Not only can Gimmal Discover help you locate sensitive information, but it also allows you to take the appropriate action to remove the risk from your data repositories.

If your policy is sound and your organization’s confidential data flow is well regulated, genuine hits should never be found. However, if your proactive steps do turn up unwanted PCI or PII data, you now have an opportunity to resolve the situation, via policy and governance solutions like Gimmal Discover, before your business is featured, rather disparagingly, on the nightly news.

To learn more about how Gimmal’s tools can help you, please visit our website at