Microsoft Azure Data Lake is a highly scalable public cloud service that allows developers, scientists, business professionals and other Microsoft customers to gain insight from large, complex data sets. As with most data lake offerings, the service is composed of two parts: data storage and data analytics.
According to Microsoft, customers can provision Azure Data Lakes to store an unlimited amount of structured, semi-structured or unstructured data from a variety of sources. The service does not impose limits on account sizes, file sizes, or the amount of data that can be stored in a data lake.
On the analytics side, Azure Data Lake customers can write their own code to perform specific operational or transactional data transformation and analysis tasks. They can also use existing tools, such as Microsoft's Analytics Platform System or Azure Data Lake Analytics, to query data sets.
Azure Data Lake is based on the Apache Hadoop YARN (Yet Another Resource Negotiator) cluster management platform. It is intended to scale dynamically across SQL servers in Azure Data Lake, as well as servers in Azure SQL Database and Azure SQL Data Warehouse. A unified approach within the Hadoop ecosystem helps the service accommodate the needs of big data projects, which are compute-intensive and often have distributed data sources.
Pricing for Azure Data Lake depends on several variables, including storage capacity, the number of analytics units (AUs) per minute, the number of completed jobs and the cost of managed Hadoop and Spark clusters. As of this writing, the Azure Data Lake Store service is priced at $0.039 per GB per month on a pay-as-you-go basis, with capacity-based discounts of up to 33% for monthly commitments. The Azure Pricing Calculator can help customers estimate their data lake costs.
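To make the storage figures above concrete, the following is a minimal sketch of the arithmetic in Python. It covers only the Data Lake Store component and uses the rates quoted in this article ($0.039 per GB per month, discounts of up to 33%), which may no longer match current Azure pricing. The function name and its parameters are hypothetical and are not part of any Azure SDK; analytics units, job counts and cluster costs are deliberately left out.

```python
# Rates quoted in this article, not live Azure prices (assumption: they may be outdated).
PRICE_PER_GB_MONTH = 0.039       # USD per GB per month, pay-as-you-go
MAX_COMMITMENT_DISCOUNT = 0.33   # upper bound of the quoted capacity-based discount


def estimate_monthly_storage_cost(stored_gb, discount=0.0):
    """Estimate the monthly Data Lake Store bill in USD (hypothetical helper).

    stored_gb -- amount of data stored, in GB
    discount  -- commitment discount as a fraction, from 0.0 up to 0.33
    """
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    if not 0.0 <= discount <= MAX_COMMITMENT_DISCOUNT:
        raise ValueError("discount must be between 0.0 and 0.33")
    # Cost = volume x unit price, reduced by the commitment discount, rounded to cents.
    return round(stored_gb * PRICE_PER_GB_MONTH * (1.0 - discount), 2)


if __name__ == "__main__":
    # Compare pay-as-you-go with the largest quoted discount for 1,000 GB.
    print(estimate_monthly_storage_cost(1000))
    print(estimate_monthly_storage_cost(1000, discount=0.33))
```

A calculation like this is useful only for rough budgeting; the Azure Pricing Calculator remains the better source for a full estimate because it accounts for the other pricing variables as well.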