It's important to keep data secure at all times, whether at rest, in use or in transit. Two popular data obfuscation methods are data masking and data encryption.
While both methods transform data for security purposes, they're not the same thing. Let's look at what each does and how they compare.
What is data masking and how does it work?
Data masking is the process of turning sensitive data into fake, or masked, data that looks similar to the authentic data. Masking reveals no genuine information, making it useless to an attacker if intercepted.
Data masking is challenging. The masked data set needs to maintain the complexity and unique characteristics of the original unmasked data set so queries and analysis still yield the same results. This means masked data must maintain referential integrity across systems and databases. An individual's Social Security number, for example, must be masked to the same substitute value everywhere it appears so that primary and foreign keys, and the relationships between them, are preserved. It's important to note, however, that not every data field needs masking.
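One way to illustrate the referential integrity requirement is deterministic masking, where the same input always yields the same masked output. The sketch below is a minimal illustration, not a production design: the function name `mask_ssn`, the key `MASKING_KEY` and the sample records are all hypothetical, and using a keyed hash (HMAC) is just one possible approach.

```python
import hashlib
import hmac

# Hypothetical secret for illustration; a real key would live in a key vault.
MASKING_KEY = b"example-masking-key"


def mask_ssn(ssn: str, key: bytes = MASKING_KEY) -> str:
    """Map an SSN to a fake value in SSN format, consistently for a given key."""
    digest = hmac.new(key, ssn.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the digest to nine decimal digits and lay them out as NNN-NN-NNNN.
    digits = f"{int.from_bytes(digest, 'big') % 10**9:09d}"
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"


# Two hypothetical tables that share the SSN as a key.
customers = {"123-45-6789": "Alice"}
orders = [("123-45-6789", "order-1"), ("123-45-6789", "order-2")]

masked_customers = {mask_ssn(ssn): name for ssn, name in customers.items()}
masked_orders = [(mask_ssn(ssn), order_id) for ssn, order_id in orders]

# Every masked order still joins to a masked customer row.
joinable = all(ssn in masked_customers for ssn, _ in masked_orders)
```

Because the mapping depends only on the input and the key, each table can be masked independently and the joins still line up. Reducing the digest to nine digits means two different SSNs could, in rare cases, collide, so a real system would need collision handling or a different technique.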
Types of data masking
A variety of data masking techniques can be used to obfuscate data depending on the type, including the following:
- Scrambling randomly reorders alphanumeric characters to obscure the original content.
- Substitution replaces the original data with another value, while preserving the original characteristics of the data.
- Shuffling rearranges values within a column, such as user surnames.
- Date aging increases or decreases a date field by a specific date range.
- Variance applies a variance to number or date fields. It is often used to mask financial and transaction information.
- Masking out scrambles only part of a value. It is commonly applied to credit card numbers where only the last four digits remain unchanged.
- Nullifying replaces the real values with a null value.
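Several of the techniques listed above can be sketched in a few lines. This is a simplified illustration using only the Python standard library; the function names, default parameters and sample values are assumptions made for the example, and masking out is shown here as replacing the hidden characters with a fill character, which is one common interpretation.

```python
import random
from datetime import date, timedelta


def mask_out(value: str, visible: int = 4, fill: str = "X") -> str:
    """Masking out: hide everything except the last `visible` characters."""
    hidden = max(len(value) - visible, 0)
    return fill * hidden + value[hidden:]


def shuffle_column(values: list, rng: random.Random) -> list:
    """Shuffling: rearrange the values within a column, leaving the input intact."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def age_date(value: date, days: int) -> date:
    """Date aging: shift a date forward (positive) or backward (negative)."""
    return value + timedelta(days=days)


def apply_variance(amount: float, rng: random.Random, pct: float = 0.10) -> float:
    """Variance: perturb a number by up to +/- `pct` of its value."""
    return round(amount * (1 + rng.uniform(-pct, pct)), 2)


def nullify(_value):
    """Nullifying: replace the real value with a null."""
    return None


rng = random.Random(42)  # seeded only so the example is repeatable
surnames = ["Nguyen", "Okafor", "Silva", "Kim"]

masked_card = mask_out("4111111111111111")
shuffled_surnames = shuffle_column(surnames, rng)
aged_birthdate = age_date(date(1990, 5, 17), days=30)
varied_amount = apply_variance(100.0, rng)
```

Each function handles a single field in isolation. A real masking job would also have to decide which technique suits each column and keep related fields consistent with one another.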
The three main types of data masking are the following:
- Dynamic data masking is applied in real time to provide role-based security -- for example, returning masked data to a user who does not have the authority to see the real data.
- Static data masking creates a separate masked set of the data that can be used for research and development.
- On-the-fly data masking enables development teams to quickly read and mask a small subset of production data to use in a test environment.
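Dynamic data masking in particular lends itself to a short illustration, since the masking decision is made at read time based on who is asking. The sketch below assumes a hypothetical `Customer` record, a hypothetical set of privileged roles and a `read_customer` helper. Real products typically enforce this in the database or a proxy layer rather than in application code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    ssn: str


# Hypothetical roles that are authorized to see the real values.
PRIVILEGED_ROLES = {"compliance_officer"}


def read_customer(record: Customer, role: str) -> Customer:
    """Return the real record to privileged roles and a masked copy to others."""
    if role in PRIVILEGED_ROLES:
        return record
    # The stored data is untouched; only the returned view is masked.
    return Customer(name=record.name, ssn="***-**-" + record.ssn[-4:])


stored = Customer(name="Alice", ssn="123-45-6789")
analyst_view = read_customer(stored, role="analyst")
officer_view = read_customer(stored, role="compliance_officer")
```

The stored record is never modified, which is what distinguishes this approach from static masking, where a separate masked copy of the data is created.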
What is data encryption and how does it work?
Encryption is considered the ultimate safeguard to ensure the security and privacy of data. It provides confidentiality in the security triad of confidentiality, integrity and availability. If encrypted data is lost, stolen or accessed without authorization, it remains meaningless.
Data, or plaintext, is encrypted using an encryption algorithm and an encryption key. Once encrypted, the data, now called ciphertext, appears scrambled and unreadable. To view the ciphertext as plaintext again, the data must be decrypted using the correct key. Encryption protects data at rest and in transit. Data is at rest when it is stored in a file or database or archived on backup tapes. Data is in transit when it is being sent to another location, such as across a network to another device.
The most commonly used encryption methods are symmetric and asymmetric ciphers:
- Symmetric ciphers encrypt and decrypt data using the same secret key and are commonly used to protect data at rest. AES-128 and AES-256 are used to secure sensitive information as they are considered safe against brute-force attacks. While AES-256 is significantly stronger than AES-128, it requires more processing power and is slower. When power or latency is an issue, such as on mobile or IoT devices, AES-128 is the preferred option.
- Asymmetric encryption uses two interdependent keys: one public and one private. When data is encrypted with a public key, only the related private key can decrypt it, and vice versa. RSA is the most popular asymmetric cipher. It is well suited to protecting data when it is transferred across trust boundaries. Because RSA is resource-intensive, the data itself is often encrypted using AES, with only the AES key protected via RSA encryption.
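The pattern of encrypting data with AES and protecting only the AES key with RSA can be sketched as follows. This example assumes the third-party `cryptography` package is installed. The choice of AES-GCM mode, OAEP padding with SHA-256 and a 2048-bit RSA key, along with the helper names `hybrid_encrypt` and `hybrid_decrypt`, are assumptions for illustration and are not prescribed by the text above.

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# OAEP padding parameters used when wrapping and unwrapping the AES key.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def hybrid_encrypt(public_key, plaintext: bytes):
    """Encrypt the data with a fresh AES key, then wrap that key with RSA."""
    aes_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)  # must be unique for every message under a given key
    ciphertext = AESGCM(aes_key).encrypt(nonce, plaintext, None)
    wrapped_key = public_key.encrypt(aes_key, OAEP)
    return wrapped_key, nonce, ciphertext


def hybrid_decrypt(private_key, wrapped_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    """Unwrap the AES key with the RSA private key, then decrypt the data."""
    aes_key = private_key.decrypt(wrapped_key, OAEP)
    return AESGCM(aes_key).decrypt(nonce, ciphertext, None)


# The recipient generates the key pair and shares only the public key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
message = b"account=12345"

wrapped_key, nonce, ciphertext = hybrid_encrypt(private_key.public_key(), message)
recovered = hybrid_decrypt(private_key, wrapped_key, nonce, ciphertext)
```

Only the short AES key goes through the slower RSA operation, while the bulk of the data is handled by the faster symmetric cipher.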
Sensitive data should always remain encrypted, even when processed and analyzed. However, software developers and data scientists may find it difficult to work with encrypted data. Basic tasks can be difficult to perform; for example, you cannot filter users based on age if their birthdates are encrypted.
Data masking overcomes these problems as it keeps personally identifiable information (PII) private. It minimizes the use of and risks to real data by generating a characteristically accurate but fictitious version of a data set. Hackers can't reverse-engineer the masked data set or use it to identify individuals.
Data masking vs. data encryption
Two key differences between masking and encryption are the following:
- Masked data remains usable, but original values can't be recovered.
- Encrypted data is challenging to work with but can be recovered with the correct encryption key.
Encryption is ideal for storing or transferring sensitive data, while data masking enables organizations to use data sets without exposing the real data. Whichever method is used, it is essential that the encryption keys and the algorithms used to mask data are secured to prevent unauthorized access.
Many standards and regulations, including GDPR, HIPAA, PCI DSS and CCPA, require organizations to keep PII secure and private. While laws and standards covering the processing and protection of data are essential, they create a challenge for companies that want to extract value from and even share the data with others.
Both encryption and data masking enable enterprises to remain compliant as they reduce the risk of sensitive data being exposed. Many organizations now use privacy-enhancing technologies, which use cryptography and statistical techniques to obfuscate sensitive data and enable it to be safely shared with and analyzed by multiple parties.