How to Secure and Protect Data During Testing
When it comes to testing, several factors need to be considered to ensure that data is used correctly and protected.
Legislation such as the Privacy Act sets out requirements for organisations to ensure that different types of data are carefully managed and protected. Test data should ideally be created in a generic form with no relation to live system data.
However, test data often needs to reflect real data for testing to be accurate. If you must use “real” data for testing purposes, consider implementing a robust data masking technique to protect it.
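As an illustration of generic test data, the following Python sketch generates synthetic customer records from small, made-up value pools instead of copying live records. The `Customer` fields, the name lists and `make_synthetic_customers` are hypothetical; a real generator would follow your own schema.

```python
import random
from dataclasses import dataclass

# Hypothetical generic value pools; nothing here is copied from a live system.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor", "Casey"]
LAST_NAMES = ["Smith", "Lee", "Patel", "Nguyen", "Garcia"]


@dataclass
class Customer:
    customer_id: int
    name: str
    email: str
    balance_cents: int


def make_synthetic_customers(count: int, seed: int = 0) -> list[Customer]:
    """Generate fake customer records that have no relation to production data."""
    rng = random.Random(seed)  # a fixed seed makes test runs reproducible
    customers = []
    for i in range(1, count + 1):
        first = rng.choice(FIRST_NAMES)
        last = rng.choice(LAST_NAMES)
        customers.append(
            Customer(
                customer_id=i,
                name=f"{first} {last}",
                # example.com is reserved for documentation (RFC 2606), so no real inbox is used
                email=f"{first.lower()}.{last.lower()}{i}@example.com",
                balance_cents=rng.randint(0, 500_000),
            )
        )
    return customers
```

Seeding the generator means a failing test can be rerun against exactly the same data, while no record ever originates from production.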
When using data for testing, an organisation should ensure that the data is:
- Anonymised – Any personal or confidential information used is protected by deleting or modifying it.
- Carefully selected and secured for the period of testing.
- Securely deleted when testing is complete.
- Protected by agreed processes, which are themselves securely managed.
Data Masking Types
Several types of data masking are commonly used to secure sensitive data.
- Static Data Masking - Sensitive data in a copy of the database is altered so that the copy can be safely shared.
- Deterministic Data Masking - Values are mapped so that a given original value is always replaced by the same masked value. For example, the name “Bob Jones” is always replaced with “Bobby Brown”. This is easy to do but inherently less secure: anyone who discovers the mapping between the two values can recover the original data.
- On-the-Fly Data Masking - Data is masked while it is transferred from production systems to test or development systems, before it is saved to disk. This is used when data needs to be streamed continuously from production to multiple test environments. The process sends smaller subsets of masked data as they are required, and each subset is stored in the dev/test environment for use by the non-production system.
- Dynamic Data Masking - Similar to on-the-fly masking, except that the data is never stored in a secondary data store in the dev/test environment; instead, it is streamed directly from the production system and consumed by another system in the dev/test environment.
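A minimal Python sketch of deterministic masking, applied to a copy of the data as in static masking, is shown below. It assumes a secret key and uses HMAC-SHA256 to choose a replacement from a hypothetical list of fake names, so “Bob Jones” always maps to the same fake name for a given key. The function names and the name pool are illustrative only.

```python
import hashlib
import hmac

# Hypothetical pool of replacement names; a real pool would be much larger.
FAKE_NAMES = ["Bobby Brown", "Ann Lee", "Chris Wu", "Dana Park", "Eli Ross"]


def mask_name(name: str, key: bytes) -> str:
    """Deterministic masking: the same name and key always yield the same fake name."""
    # A keyed hash (HMAC-SHA256) picks the replacement, so someone without the
    # secret key cannot recompute the mapping for a given name.
    digest = hmac.new(key, name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]


def mask_copy(rows: list[dict], key: bytes) -> list[dict]:
    """Static masking: build a masked copy and leave the source rows untouched."""
    return [{**row, "name": mask_name(row["name"], key)} for row in rows]
```

Because the pool is small, different real names can map to the same fake name, so the mapping is not one-to-one. Anyone holding the key can rebuild the mapping for any name, which is why the key needs the same protection as the data.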
Data Masking Techniques
- Data Encryption - An encryption algorithm locks the data so that no one other than the holder of the key can see it. For testing purposes it is often not helpful, as it requires the system to continually lock and unlock the data, and processes need to be in place to manage and share encryption keys.
- Data Scrambling - The characters in a value are rearranged in a random order, replacing the original content. For example, a number such as 985467 in a production database could be replaced by 649857 in a test database. This is easy to do but can be less secure if someone figures out the process and can reverse engineer the changes.
- Nulling Out - Data is replaced with “null” or deleted. This is not helpful during testing if the data is needed to perform certain functions or to check that outputs appear correctly on a page.
- Value Variance - Original data values are replaced using a function, such as the range between the lowest and highest values in a series. For example, if a list of product prices runs from $10 to $50, each exact price can be replaced with the range between the lowest and highest price paid. This helps prevent anyone who gains access to the masked dataset from recovering the original values.
- Data Substitution - Data values are substituted with fake, but realistic, alternative values. For example, real names or numbers are replaced by random names and numbers from a phonebook.
- Data Shuffling - Similar to substitution, except that values are switched within the same dataset. Data in each column is rearranged using a random sequence; for example, real customer names are swapped between customer records. The output looks like real data, but it doesn’t show the real information for each individual or record.
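The following Python sketch shows minimal versions of several of the techniques above, operating on rows held as dictionaries. The function names are hypothetical, and `value_variance` follows one reading of the description (each price replaced by the lowest-to-highest range). Encryption is left out because Python's standard library has no symmetric cipher.

```python
import random


def scramble(value: str, rng: random.Random) -> str:
    """Data scrambling: reorder the characters of one value at random."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def null_out(rows: list[dict], column: str) -> list[dict]:
    """Nulling out: replace every value in a column with None."""
    return [{**row, column: None} for row in rows]


def value_variance(prices: list[float]) -> list[tuple[float, float]]:
    """Value variance: replace each exact price with the lowest-to-highest range."""
    band = (min(prices), max(prices))
    return [band for _ in prices]


def substitute(rows: list[dict], column: str, fakes: list[str], rng: random.Random) -> list[dict]:
    """Data substitution: swap real values for realistic fake ones."""
    return [{**row, column: rng.choice(fakes)} for row in rows]


def shuffle_column(rows: list[dict], column: str, rng: random.Random) -> list[dict]:
    """Data shuffling: permute one column's real values across records."""
    values = [row[column] for row in rows]
    rng.shuffle(values)
    return [{**row, column: value} for row, value in zip(rows, values)]
```

For example, `scramble("985467", random.Random(1))` returns some rearrangement of the same six digits. Note that a random shuffle can leave some values in their original records, so shuffling alone is weaker on small datasets.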
Pseudonymisation - A term used in the EU General Data Protection Regulation (GDPR) that covers processes such as data masking, encryption and hashing. The GDPR defines pseudonymisation as processing personal data so that it can no longer be attributed to a specific individual without the use of additional information, which must be kept separately and protected. It requires removing direct identifiers and, preferably, avoiding multiple identifiers that, when combined, can identify a person.
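A minimal Python sketch of pseudonymisation follows, assuming records are dictionaries with an `email` field and an ISO-format `date_of_birth`. Direct identifiers are removed and replaced with a keyed pseudonym, and the key plays the role of the separately held additional information. The field names and the `pseudonymise` function are hypothetical.

```python
import hashlib
import hmac

# Assumption: these fields are treated as direct identifiers and removed.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymise(record: dict, key: bytes) -> dict:
    """Replace direct identifiers with a keyed pseudonym and coarsen a quasi-identifier."""
    # Without `key` (the additional information, stored separately from the test
    # data), the pseudonym cannot be recomputed from an email address.
    pseudonym = hmac.new(key, record["email"].encode("utf-8"), hashlib.sha256).hexdigest()[:16]
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["subject_id"] = pseudonym
    # Reduce an ISO date of birth (YYYY-MM-DD) to the year, so it is less
    # identifying when combined with other fields.
    if "date_of_birth" in out:
        out["birth_year"] = out.pop("date_of_birth")[:4]
    return out
```

The same email and key always give the same `subject_id`, so records for one person can still be linked during testing without exposing who that person is.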
Data Masking Best Practices
- Determine the project scope, including which sensitive data must be masked and where it is used.
- Securely manage encryption keys, or other data that can be used to revert to the original data values.
- Ensure referential integrity: each “type” of information coming from a business application must be masked using the same algorithm, so that masked values still match across tables and systems.
- Secure the data masking algorithms. If someone learns which repeatable masking algorithms are being used, they can reverse engineer large blocks of sensitive information.
- Ensure separation of duties. This is explicitly required by some regulations.
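To illustrate referential integrity and key management, the Python sketch below masks a customer ID with the same keyed algorithm in two hypothetical tables, so orders still join to customers after masking. The key is read from an environment variable named `MASKING_KEY`, which is an assumption; a dedicated secrets manager may be more appropriate.

```python
import hashlib
import hmac
import os


def load_masking_key() -> bytes:
    """Read the masking key from the environment rather than hard-coding it.

    MASKING_KEY is a hypothetical variable name used for illustration.
    """
    value = os.environ.get("MASKING_KEY")
    if not value:
        raise RuntimeError("MASKING_KEY is not set")
    return value.encode("utf-8")


def mask_id(customer_id: str, key: bytes) -> str:
    """Mask a customer ID with the same keyed algorithm wherever it appears."""
    return "C" + hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def mask_tables(customers: list[dict], orders: list[dict], key: bytes) -> tuple[list[dict], list[dict]]:
    """Mask the customer ID in both tables so order rows still join to customer rows."""
    masked_customers = [{**c, "customer_id": mask_id(c["customer_id"], key)} for c in customers]
    masked_orders = [{**o, "customer_id": mask_id(o["customer_id"], key)} for o in orders]
    return masked_customers, masked_orders
```

If the two tables were masked with different algorithms or keys, every order would point to a customer that no longer exists in the test data.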
Supporting evidence that an evaluator of this control may request includes:
- Documentation of policies and procedures in place to protect data being used in test environments.