
How to ensure your data lake security

Your data lake is full of sensitive information, and securing that data is a top priority. These are the best practices to keep that information safe from hackers.

Data lakes are special-purpose platforms that store large volumes of structured and unstructured data from a wide variety of sources. Analysts can access information in the data lake directly using various tools or use it as a staging area to prepare information for loading into data warehouses.

In other words, data lakes hold a treasure trove of information about your business. And like all enterprise data stores, data lakes must be protected against unauthorized access.

Identify and classify existing and incoming data

If you're unaware that the data you're storing is sensitive, you won't take the necessary precautions to protect it. Most businesses have security classifications that group data elements into sensitivity levels. The levels are based on industry security standards, governmental regulations and the impact to the organization if that data were disclosed or modified without authorization.

These classifications allow administrators to deploy the appropriate level of baseline security mechanisms and procedural controls. To ensure proper classification, organizations need to evaluate existing data in the data lake and develop procedures to analyze incoming information.
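As a rough illustration of analyzing incoming information, the sketch below tags each field of an incoming record with a sensitivity level. The level names, the regular expressions and the functions `classify_value` and `classify_record` are all hypothetical; a real rule set would come from your organization's own classification policy and would need far more than pattern matching.

```python
import re
from enum import IntEnum


class Sensitivity(IntEnum):
    # Hypothetical levels; substitute the ones your organization defines.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Illustrative detection rules only: (pattern, level assigned on a match).
RULES = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), Sensitivity.RESTRICTED),       # SSN-like
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), Sensitivity.RESTRICTED),      # card-number-like
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), Sensitivity.CONFIDENTIAL),  # email-like
]


def classify_value(value: str) -> Sensitivity:
    """Return the highest level whose pattern matches; INTERNAL if none do."""
    level = Sensitivity.INTERNAL  # assumed default for unmatched data
    for pattern, rule_level in RULES:
        if pattern.search(value):
            level = max(level, rule_level)
    return level


def classify_record(record: dict) -> dict:
    """Map each field name in a record to the sensitivity of its value."""
    return {field: classify_value(str(value)) for field, value in record.items()}
```

Running `classify_record` over a sample of existing data and over each incoming batch gives administrators a first pass at which fields need stronger controls. Pattern matching of this kind misses plenty, so treat the result as a starting point for review.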

Secure input, output and work files

A common phrase when discussing database security is "no database is an island." The same principle applies to data lakes.

A common tactic for hackers is to gain access to input files that load the system, work files used during day-to-day processing and output files. The output you need to secure includes files used to transfer data to other applications, report files and data lake backups.
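One simple check for input, work and output files that sit on a POSIX file system is to look for files that any local user can read or write. The sketch below assumes those files live under a single directory tree; the function names are hypothetical, and object stores or distributed file systems need their own access checks.

```python
import stat
from pathlib import Path
from typing import List


def world_accessible(path: Path) -> bool:
    """Return True if 'other' users can read or write the file (POSIX mode bits)."""
    mode = path.stat().st_mode
    return bool(mode & (stat.S_IROTH | stat.S_IWOTH))


def audit_directory(root: Path) -> List[Path]:
    """List regular files under root that are world-readable or world-writable."""
    return sorted(
        p for p in root.rglob("*") if p.is_file() and world_accessible(p)
    )
```

Pointing `audit_directory` at staging, export and backup locations produces a list of files whose permissions deserve a closer look.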

Account management and access rights

There are numerous data lake platforms to choose from. Amazon, Oracle, Cloudera, Microsoft and Teradata all have popular data lake options. Although each platform may have different mechanisms and processes to create accounts and assign access rights, data lake security best practices are the same for each environment.

To properly secure your data lake, you should adhere to traditional industry recommendations that range from granting the minimum amount of security rights needed for users to perform their work to setting proper password complexity, aging and lockout settings.
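The password settings mentioned above can be pictured as a small policy object with three checks. This is a minimal sketch and not any platform's actual mechanism: the `PasswordPolicy` class, its thresholds and the helper functions are assumptions chosen for illustration, and in practice you would configure these rules in the data lake platform or the enterprise directory.

```python
import string
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class PasswordPolicy:
    # Illustrative thresholds only; take real values from your security policy.
    min_length: int = 12
    max_age_days: int = 90
    max_failed_attempts: int = 5


def is_complex(password: str, policy: PasswordPolicy) -> bool:
    """Complexity: minimum length plus lower, upper, digit and punctuation."""
    return (
        len(password) >= policy.min_length
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
        and any(c.isdigit() for c in password)
        and any(c in string.punctuation for c in password)
    )


def is_expired(last_changed: date, today: date, policy: PasswordPolicy) -> bool:
    """Aging: the password is older than the allowed number of days."""
    return today - last_changed > timedelta(days=policy.max_age_days)


def is_locked_out(failed_attempts: int, policy: PasswordPolicy) -> bool:
    """Lockout: consecutive failed logins have reached the limit."""
    return failed_attempts >= policy.max_failed_attempts
```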

Two-factor authentication, password vaults and enterprise authentication mechanisms should also be used to secure the platform. The data lake's administration guides are excellent resources. Most vendor manuals include detailed guidelines to help administrators secure their systems.

System protection best practices

Vendor manuals for the operating system and the data lake also provide information to help you properly install and configure their software to defend against unauthorized access. Keeping the software current and identifying, analyzing and applying security fixes are standard practices for all platforms, including data lakes. Once again, it's important to apply industry best practices, which include proper system configuration and patch management.
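Part of patch management is knowing which installed components are older than the release that contains a given security fix. The sketch below compares dotted version strings; the component names and version numbers in the usage note are invented, and it assumes purely numeric versions such as "3.2.1", which not every vendor uses.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def components_needing_patch(
    installed: Dict[str, str], minimum_fixed: Dict[str, str]
) -> List[str]:
    """Return names of components installed below their minimum fixed version."""
    return sorted(
        name
        for name, version in installed.items()
        if name in minimum_fixed
        and parse_version(version) < parse_version(minimum_fixed[name])
    )
```

For example, with a hypothetical `installed = {"storage-engine": "3.2.1", "query-gateway": "1.10.0"}` and `minimum_fixed = {"storage-engine": "3.2.4", "query-gateway": "1.9.2"}`, only `storage-engine` is flagged. Comparing the raw strings would wrongly flag `query-gateway` as well, because "1.10.0" sorts before "1.9.2" as text.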

Basic misconfiguration issues and lapses in best practices lead to security problems. According to an article in SiliconAngle, more hackers are exploiting basic security administration mistakes to wreak havoc on Hadoop systems, the leading platform for data lakes.

Ongoing security evaluations

Regularly scheduled penetration tests, vulnerability scans and audits are all essential elements of effective data lake security plans.

The common goal of these evaluations is to identify security vulnerabilities. All three identify vulnerabilities only as of a specific point in time, so they must be performed on a regular basis to maintain a high level of data lake security.

Penetration testing software allows security analysts to execute a series of processes that attempt to exploit known system vulnerabilities to gain access to the target platform. Vulnerability scanning software also identifies known system weaknesses but doesn't attempt to exploit them to gain access. Vulnerability scans are less intrusive and are typically run more frequently than penetration tests. Security audits review the performance of existing controls and evaluate administrative adherence to the organization's policies and procedures.

Organizations use the output generated by penetration tests, vulnerability scans and audits to identify security issues and implement the corrective actions necessary to remediate or mitigate their impact.


Learning how to secure your environment is like learning anything else. You need to commit time to learning various security best practices. There is an enormous amount of educational material, and there are security classes and certifications available on sites such as Udemy and Coursera. Operating system and product administration guides are excellent starting points.
