6 data security predictions for 2023

New tools are proliferating to secure data wherever it lives. Six data security trends -- ranging from AI washing to new data security platforms -- are at the forefront for 2023.

The distribution of data throughout the cloud is stimulating the adoption of new tools, including data security posture management, data detection and response and data security platforms.

I wish I could blame any of the many recent data breaches or ransomware attacks for my tardiness in publishing my 2023 data security predictions. The good news is I haven't fallen victim -- yet. I have, however, been investigating ChatGPT, the AI chatbot. And that leads to the first data security trend for 2023.

1. AI washing

Our industry has long been plagued by vendors plastering AI, machine learning and deep learning across product descriptions and marketing materials like the breaking news chyron on CNN, as if AI automagically made a security tool invincible.

Unfortunately, ChatGPT and other AI engines are only as good as their developers and training data sets. Right now, based on my ChatGPT experience, AI leaves a lot to be desired. What AI engines do well is analyze data at scales and speeds that are unfathomable for humans. This makes AI suitable for specific tasks -- data classification, for example -- but doesn't impart superhuman characteristics to data security tools.

But AI sounds futuristic and cool, so we'll continue to see it used to promote security tools, even if the AI engine contributes little to function or value.

2. AI improves data classification

Historically, we used a combination of complex regular expression pattern matching and heuristics to classify data. Classification tools come with a large suite of classifiers for things such as phone numbers, national IDs, credit card numbers, dates and more. The accuracy of traditional classifiers is often suboptimal. A large number of false positives means we incorrectly restrict access to non-sensitive data, while a large number of false negatives means we risk exposing sensitive data.
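To make the false-positive problem concrete, here is a minimal Python sketch of a traditional pattern-based classifier. It assumes a deliberately simple pattern for payment card numbers; the pattern and function names are illustrative, not taken from any product. The regular expression finds candidate digit runs, and the Luhn checksum, which real payment card numbers satisfy, discards most digit strings that merely look like card numbers.

```python
import re

# Illustrative pattern: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text: str) -> list[str]:
    """Regex finds candidates; the Luhn check filters out many false positives."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits

print(find_card_numbers("card 4111 1111 1111 1111 on file"))  # ['4111111111111111']
print(find_card_numbers("order id 4111 1111 1111 1112"))      # []
```

Even with the checksum, roughly one in ten random digit strings of the right length passes Luhn, so a pattern-based classifier like this one still flags order IDs and other numbers that happen to qualify.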

This is where AI truly shines. Trained properly, AI engines can take context into account. For example, we might find this text snippet in our data:

[PERSON NAME] was born on [DATE OF BIRTH] and was admitted to the hospital on [DATE]. She was prescribed [FDA CODE] for a diagnosis of [MEDICAL TERM].

It can be hard to write a regular expression to match all medical terms. But because this type of wording is common in medical histories, a properly trained AI engine can more easily and accurately find and classify the sensitive data.
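As a toy illustration of context-based classification, the sketch below trains a multinomial naive Bayes classifier on a handful of made-up sentences. This is a classic statistical learning technique and far simpler than the models commercial tools use; the training examples, labels and class name are hypothetical. The point is that the decision rests on surrounding words such as "admitted" and "prescribed" rather than on a fixed pattern.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, examples: list[tuple[str, str]]) -> None:
        for text, label in examples:
            tokens = tokenize(text)
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def predict(self, text: str) -> str:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = "", -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for token in tokenize(text):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical labeled training sentences.
TRAINING = [
    ("she was admitted to the hospital and prescribed medication", "medical"),
    ("the patient was diagnosed and prescribed a treatment", "medical"),
    ("he was admitted to the clinic for a diagnosis", "medical"),
    ("the invoice was paid and the order shipped", "other"),
    ("the meeting was moved to the conference room", "other"),
    ("she booked a flight and a hotel for the trip", "other"),
]

model = NaiveBayes()
model.train(TRAINING)
print(model.predict("Jane was admitted to the hospital and prescribed a drug for a diagnosis"))  # medical
```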

3. Quantum FUD

As with AI, most people don't understand quantum computing. This gives data security vendors an opportunity to use fear, uncertainty and doubt (FUD). Don't fall for it.

Quantum computing could theoretically enable someone to break the encryption ciphers currently used to secure data in transit and at rest. Theoretically is the key word.

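So far, no one has used a quantum computer to break today's standard encryption protocols. Shor's algorithm shows that a large, error-corrected quantum computer could break widely used public-key algorithms such as RSA, but no such machine exists: the few quantum computers that do exist are big, expensive and don't work well. It's unlikely that a malicious actor will use quantum computers to decrypt your data in 2023.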

Meanwhile, some smart people have developed at least four proposed quantum-safe cryptographic algorithms, and quantum-safe data encryption tools are commercially available.

4. End-to-end encryption and the fight with law enforcement

With end-to-end encryption (E2EE), data is encrypted on the source device before being transmitted to the destination device, ensuring that intermediaries -- especially service providers -- cannot access the sensitive data.
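The sketch below illustrates that flow: each endpoint keeps its private key, the two derive a shared key with a Diffie-Hellman exchange, and the relay in the middle only ever handles ciphertext. It is a teaching toy, not secure cryptography: the 127-bit prime, the SHA-256 keystream and all function names are assumptions chosen to keep it within the Python standard library. A real E2EE system should use vetted primitives from an audited library.

```python
import hashlib
import hmac
import secrets

# TOY PARAMETERS, NOT SECURE. 2**127 - 1 is prime, which keeps the arithmetic readable.
P = 2**127 - 1
G = 3

def keypair() -> tuple[int, int]:
    """Private exponent (never leaves the device) and its public value."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)

def shared_key(my_private: int, their_public: int) -> bytes:
    """Diffie-Hellman: both endpoints compute the same secret without sending it."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    """Separate keys for encryption and for the integrity tag."""
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy stream cipher: SHA-256 in counter mode."""
    blocks = (hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
              for i in range(length // 32 + 1))
    return b"".join(blocks)[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return nonce + body + tag

def decrypt(key: bytes, message: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce, body, tag = message[:16], message[16:-32], message[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message was tampered with or the key is wrong")
    return bytes(a ^ b for a, b in zip(body, _keystream(enc_key, nonce, len(body))))

alice_private, alice_public = keypair()
bob_private, bob_public = keypair()
# Only the public values and the ciphertext ever pass through the service provider.
ciphertext = encrypt(shared_key(alice_private, bob_public), b"quarterly numbers attached")
print(decrypt(shared_key(bob_private, alice_public), ciphertext))  # b'quarterly numbers attached'
```

The sketch also skips authenticating the public keys, which real messaging apps do so that a relay cannot quietly substitute keys of its own.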

The key to E2EE is the encryption keys -- specifically, who owns and gets access to them. When only the data owner has the keys, no one else can access the data. As you might expect, law enforcement wants to snoop on your data, whether it's your personal chats with your family or your corporate data, and E2EE prevents that. As a result, law enforcement groups -- including the FBI -- are demanding that E2EE providers include a backdoor. As with quantum FUD, don't fall for it. Any backdoor that can be used by law enforcement can also be exploited by a malicious actor, rendering moot the security provided by encryption.

E2EE creates a key management challenge. Businesses have numerous keys, so they need to manage and audit access to them as well as routinely rotate (change) them. Because I anticipate renewed interest in data encryption and key management options, I'll be fielding research in the first half of 2023 to learn more.
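A minimal sketch of the bookkeeping this implies, assuming a hypothetical `KeyRing` class rather than any real key management service's API: each rotation adds a new key version, new encryption uses the newest version, older versions remain available to decrypt existing data, and every use or denied attempt lands in an audit log.

```python
import secrets
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class KeyRing:
    """Hypothetical versioned key store for one data owner; not a real KMS API."""
    owner: str
    authorized: set[str] = field(default_factory=set)
    versions: dict[int, bytes] = field(default_factory=dict)
    current: int = 0
    audit_log: list[tuple[str, str, str, int]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.authorized = set(self.authorized) | {self.owner}  # owner always allowed
        self.rotate(self.owner)  # start with key version 1

    def _check(self, actor: str, action: str, version: int) -> None:
        """Record every attempt, allowed or not, then enforce the access list."""
        allowed = actor in self.authorized
        stamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((stamp, actor, action if allowed else f"DENIED {action}", version))
        if not allowed:
            raise PermissionError(f"{actor} may not use keys owned by {self.owner}")

    def rotate(self, actor: str) -> int:
        """Create a new key version; older versions stay available for decryption."""
        self._check(actor, "rotate", self.current + 1)
        self.current += 1
        self.versions[self.current] = secrets.token_bytes(32)
        return self.current

    def key_for_encrypt(self, actor: str) -> tuple[int, bytes]:
        """New data is always encrypted with the newest key version."""
        self._check(actor, "encrypt", self.current)
        return self.current, self.versions[self.current]

    def key_for_decrypt(self, actor: str, version: int) -> bytes:
        """Existing data records which key version encrypted it."""
        self._check(actor, "decrypt", version)
        return self.versions[version]

ring = KeyRing(owner="data-team", authorized={"billing-app"})
version, key = ring.key_for_encrypt("billing-app")  # version 1
ring.rotate("data-team")                             # version 2 becomes current
print(ring.key_for_decrypt("billing-app", version) == key)  # True
```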

5. Shadow data is becoming important

At the start of the cloud era, many people found it was easy to go around their IT departments and use their credit cards to try out new services. These unsanctioned services are called shadow IT.

Likewise, it's easy for users to create new data stores in the cloud, including databases, file stores, object stores and VMs. These data stores are commonly created automatically during the application development process. Developers often make copies of live production data to test new features.

When security and IT teams don't know about the shadow data stores, they can't apply the appropriate controls. Shadow data is likely to be less protected than other data stores and more susceptible to being lost in data breaches, making it important for data security teams to continuously inventory their entire IT environment to identify and catalog every data store.
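At its simplest, the inventory comparison is a set difference between what a scan of the cloud accounts discovers and what the approved catalog lists. The sketch assumes the discovery step has already happened; the `DataStore` record and the example names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataStore:
    account: str
    kind: str  # e.g. "database", "object_store", "file_share", "vm_disk"
    name: str

def find_shadow_data(discovered: set[DataStore], catalog: set[DataStore]) -> set[DataStore]:
    """Stores found by a scan but absent from the approved catalog are shadow data."""
    return discovered - catalog

def find_stale_entries(discovered: set[DataStore], catalog: set[DataStore]) -> set[DataStore]:
    """Catalog entries the scan no longer sees (deleted, renamed or moved)."""
    return catalog - discovered

catalog = {
    DataStore("prod-account", "database", "orders"),
    DataStore("prod-account", "object_store", "invoices"),
}
# Hypothetical results from scanning every cloud account.
discovered = {
    DataStore("prod-account", "database", "orders"),
    DataStore("prod-account", "object_store", "invoices"),
    DataStore("dev-account", "database", "orders-copy-for-testing"),
}
for store in sorted(find_shadow_data(discovered, catalog), key=lambda s: s.name):
    print(f"shadow data store: {store.account}/{store.kind}/{store.name}")
```

Running the comparison on a schedule, rather than once, is what turns it into the continuous inventory described above.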

6. DSPM, DDR and the rise of the new data security platforms

I've written before about data security posture management, or DSPM. DSPM is to data what cloud security posture management is to cloud infrastructure: ensuring data has the correct security posture -- access controls, encryption, masking, etc. -- regardless of where that data lives. And in the cloud, data lives everywhere.
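A DSPM posture check can be pictured as a policy evaluated against every discovered data store. The sketch below assumes a hypothetical three-rule policy (encrypt sensitive data at rest, never expose it publicly, mask it outside production); the record fields and rule wording are illustrative, and real products evaluate many more attributes.

```python
from dataclasses import dataclass

@dataclass
class StorePosture:
    name: str
    environment: str           # "prod" or "nonprod"
    holds_sensitive_data: bool
    encrypted_at_rest: bool
    publicly_accessible: bool
    masked: bool               # sensitive fields masked or tokenized

def posture_findings(store: StorePosture) -> list[str]:
    """Evaluate one store against a hypothetical three-rule policy."""
    if not store.holds_sensitive_data:
        return []  # this sketch only enforces posture on sensitive data
    findings = []
    if not store.encrypted_at_rest:
        findings.append("sensitive data is not encrypted at rest")
    if store.publicly_accessible:
        findings.append("sensitive data is publicly accessible")
    if store.environment != "prod" and not store.masked:
        findings.append("unmasked sensitive data outside production")
    return findings

inventory = [
    StorePosture("orders", "prod", True, True, False, False),
    StorePosture("orders-copy-for-testing", "nonprod", True, False, True, False),
]
for store in inventory:
    for finding in posture_findings(store):
        print(f"{store.name}: {finding}")
```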

DSPM isn't the only new kid on the block. Another new tool is data detection and response (DDR). DDR applies the concepts of endpoint detection and response and extended detection and response to the realm of data: identifying, analyzing and responding to security incidents related to sensitive or critical data. DDR is centered on detecting potential data breaches or unauthorized access to sensitive data, analyzing the potential effect of the event and taking appropriate actions to remediate the issue and prevent similar incidents in the future.
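The detection half of DDR can be sketched as comparing each access to sensitive data against a per-user baseline. This is one simple statistical approach, a z-score on rows read, with a hypothetical threshold, data model and response step; real DDR tools combine many more signals.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class AccessEvent:
    user: str
    dataset: str
    rows_read: int

def detect_anomalies(
    history: dict[str, list[int]],
    events: list[AccessEvent],
    sensitive: set[str],
    z_threshold: float = 3.0,
) -> list[tuple[str, str]]:
    """Return (user, reason) alerts for unusual reads of sensitive datasets."""
    alerts = []
    for event in events:
        if event.dataset not in sensitive:
            continue  # this sketch only watches sensitive data
        past = history.get(event.user, [])
        if len(past) < 2:
            alerts.append((event.user, f"no baseline for reads of {event.dataset}"))
            continue
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            unusual = event.rows_read > mu
        else:
            unusual = (event.rows_read - mu) / sigma > z_threshold
        if unusual:
            alerts.append((event.user, f"read {event.rows_read} rows of {event.dataset}, baseline ~{mu:.0f}"))
    return alerts

def respond(alerts: list[tuple[str, str]]) -> set[str]:
    """Hypothetical response step: users whose sessions to suspend pending review."""
    return {user for user, _ in alerts}

history = {"ana": [100, 120, 110, 90]}  # rows read per past session
events = [
    AccessEvent("ana", "patients", 50_000),
    AccessEvent("ana", "patients", 105),
    AccessEvent("bob", "patients", 10),
]
for user, reason in detect_anomalies(history, events, sensitive={"patients"}):
    print(f"ALERT {user}: {reason}")
```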

To help in the effort, DDR often captures data lineage and tracks the flow of data, including when and where it was created, which users or applications can touch the data, where and when the data was modified, and data storage locations -- including copies and derivatives.
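Lineage tracking amounts to maintaining a graph of copies and derivatives. Given a hypothetical `LineageGraph` that records each copy event with who, how and when, a breadth-first traversal finds every downstream copy of a sensitive source, which is where the protections applied to the source also need to follow. The dataset names are illustrative.

```python
from collections import defaultdict, deque
from datetime import datetime, timezone

class LineageGraph:
    """Hypothetical lineage store: each edge points from a source to a copy or derivative."""

    def __init__(self) -> None:
        self.children: dict[str, set[str]] = defaultdict(set)
        self.events: list[tuple[str, str, str, str, str]] = []

    def record(self, source: str, derived: str, actor: str, action: str) -> None:
        """Log who created `derived` from `source`, how, and when."""
        self.children[source].add(derived)
        stamp = datetime.now(timezone.utc).isoformat()
        self.events.append((stamp, actor, action, source, derived))

    def downstream(self, source: str) -> set[str]:
        """Breadth-first search for every copy or derivative reachable from `source`."""
        seen: set[str] = set()
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for child in self.children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

lineage = LineageGraph()
lineage.record("prod.patients", "analytics.patients_extract", "etl-job", "export")
lineage.record("analytics.patients_extract", "dev.patients_sample", "dev-alice", "copy")
lineage.record("prod.orders", "analytics.orders_daily", "etl-job", "aggregate")
print(sorted(lineage.downstream("prod.patients")))
# ['analytics.patients_extract', 'dev.patients_sample']
```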

DDR and DSPM are joining forces with data cataloging, data classification and data access controls to form comprehensive data security platforms. By taking a data-centric approach to data security, these platforms can find and secure sensitive information that is distributed throughout cloud environments, assess privacy and security risks, and enforce policy to address those risks now and into the future. This is why data security platforms are garnering significant customer and investor interest.
