Date: 2023-10-17

Traditional data stacks lack the flexibility and scalability that cloud technology provides the modern data stack. However, on-premises data stacks can hold several benefits over their cloud counterparts.

A data stack is the set of platforms, tools and other technologies that enable organizations to collect, store and use their data. Traditionally, a data stack was on premises in a company's data center and made heavy use of relational databases or data warehouses.

The modern data stack uses cloud storage and advanced analytics tools to create a more flexible and scalable option that each organization can tailor to its specific needs.

While often associated with legacy technologies, on-premises stacks can add advanced analytics tools and should not be discounted simply because they do not run in the cloud. Organizations should evaluate their needs; some may find that an on-premises data stack suits them better than a modern one.

How a data stack works

A data stack is like a supply chain, but for data instead of physical goods. Just like a physical supply chain, a data stack can involve several specialized tools, technologies and frameworks, said Bob Parr, chief data officer at KPMG US.

For example, a data stack can include tools that assess and remediate data quality and normalize data with common codes. It can also include tools that structure the data properly for storage, aggregation and distribution for analytics, reporting, visualizations and insight generation. No single vendor or set of services can cover all of these functions, Parr said.

Almost every organization has some form of data stack. Today, most are cloud-enabled. A typical example of a modern data stack could look like this:

Azure Data Factory or AWS Glue for data ingestion.

Informatica's Intelligent Data Management Cloud or AWS Glue DataBrew for data quality.

Amazon Web Services (AWS) S3 buckets, MongoDB Atlas or Azure Data Lake for data storage.

Apache Hadoop, Apache Spark or Databricks for data processing or transformation.

The Python programming language and its libraries, such as pandas and NumPy, or Dataiku for data analysis.

Tableau or Power BI for data visualization.

Each of the above options offers a suite of cloud services to address most of an organization's needs, Parr said.

Cloud-based data stack provider benefits

When organizations have a deep relationship with a primary hyperscaler such as Microsoft, AWS or Google, they tend to align the rest of their data stack with that particular cloud provider, Parr said. Going with a single cloud provider often involves tradeoffs. For example, the cloud provider's tools might be simpler to integrate and have more predictable cost structures. However, they may not offer best-in-class functionality for every component.

Benefits of a cloud-based data stack include scalability, increased accessibility, integrated analytics, machine learning capabilities, and reduced infrastructure and maintenance costs, Parr said.

Commercial data sources can also augment an organization's own data to improve analytics. For example, some companies offer economic data, weather data, supply chain data, competitive benchmarks and more.
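To make the idea of augmenting internal data with a commercial source more concrete, here is a minimal Python sketch that uses only the standard library. Everything in it is an illustrative assumption rather than something described in the article: the record layouts, the field names (`units_sold`, `condition`), the sample values, and the helper functions `augment` and `average_units_by_condition` are all hypothetical. It joins an organization's own sales records to a third-party weather feed on date and region, then summarizes sales by weather condition.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical internal records: daily units sold per region (made-up values).
internal_sales = [
    {"date": "2023-07-01", "region": "north", "units_sold": 120},
    {"date": "2023-07-01", "region": "south", "units_sold": 95},
    {"date": "2023-07-02", "region": "north", "units_sold": 80},
    {"date": "2023-07-02", "region": "south", "units_sold": 130},
]

# Hypothetical third-party weather feed keyed by the same date and region.
# The south region has no entry for 2023-07-02, to show how gaps are handled.
external_weather = [
    {"date": "2023-07-01", "region": "north", "condition": "sunny"},
    {"date": "2023-07-01", "region": "south", "condition": "rainy"},
    {"date": "2023-07-02", "region": "north", "condition": "rainy"},
]


def augment(sales, weather):
    """Left-join sales rows to weather rows on (date, region).

    Sales rows without a matching weather row are kept and labeled "unknown".
    """
    lookup = {(w["date"], w["region"]): w["condition"] for w in weather}
    return [
        {**row, "condition": lookup.get((row["date"], row["region"]), "unknown")}
        for row in sales
    ]


def average_units_by_condition(rows):
    """Group augmented rows by weather condition and average the units sold."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["condition"]].append(row["units_sold"])
    return {condition: mean(units) for condition, units in grouped.items()}


augmented = augment(internal_sales, external_weather)
summary = average_units_by_condition(augmented)

for condition, avg_units in sorted(summary.items()):
    print(f"{condition}: {avg_units}")
```

The join is written as a left join so that no internal record is dropped when the external feed has a gap. In a real stack, this step would more likely run in the processing or analysis layer, for example in Spark or pandas, rather than in plain Python.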