Analytics platforms have made their way to the forefront of information-driven enterprises. Successful organizations know that a core competency in analytics requires a modern data analytics platform architecture, one that delivers insights at critical junctures in their data pipelines while minimizing cost, redundancy and complexity.
What is a data analytics platform?
A data analytics platform can be defined as everything it takes to draw meaningful and useful insights from data. For a general concept of an analytics platform, think in terms of data, analytics and insights.
Delivering an analytics platform requires a robust architecture that serves as a blueprint for delivering business unit and enterprise analytics, communicating architectural decisions, reducing individual project risk and ensuring enterprise consistency.
A traditional data analytics platform architecture is often not well positioned to support today's data-driven organizations. New business demands, enabling technologies and cost pressures are prompting organizations to modernize their analytics platforms in order to realize the full potential of data as a corporate asset.
Modernizing means rethinking a data analytics platform architecture, including these attributes:
- Agility at the speed of business
- Cost optimization
- Highly qualified personnel
- Process automation
- Best-in-class technology
- Handling of data at any speed, size and variety
- Seamless data integrations
- Timely insights throughout data pipelines
- Full spectrum of business intelligence capabilities
- Robust security architecture
- High-speed direct connect data fabric
- Loosely coupled technology ecosystem
- High-efficiency computing
- Strong governance controls and stewardship
- Rapid development and deployment
- Well-documented architecture and metadata
Data lake vs. data reservoir
A strong data analytics platform architecture will account for data lakes and data reservoirs. This coexistence is complementary as each repository addresses different data and analytical uses at different points in the pipeline.
The main differences between the two involve data latency and refinement. Both store structured and unstructured data, leveraging various data stores from simple object files to SQL and NoSQL database engines to big data stores.
Data lakes are raw data repositories located at the beginning of data pipelines, optimized for getting data into the analytics platform. As landing zones and sandboxes of independent data designed for ingestion and discovery, data lakes are native-format data stores that are open to private consumers for selective use. Analytics are generally limited to time-sensitive insights and exploratory inquiry by consumers who can tolerate the murky waters.
Data reservoirs are refined data repositories located at operational and back-end points of data pipelines, optimized for getting data out of the analytics platform. As sources of unified, harmonized and wrangled data designed for querying and analysis, data reservoirs are purpose-built data stores that are open to the public for general consumption. Analytics span a wide range of past, present and future insights for use by casual and sophisticated consumers, serving both tactical and strategic insights that run the business.
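The contrast between the two repositories can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the class names `DataLake` and `DataReservoir`, the `customer_id` and `amount` fields, and the refinement rules are hypothetical and do not represent any particular product or a prescribed design.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataLake:
    """Raw landing zone (hypothetical): keeps records in their native form."""
    records: List[Dict[str, Any]] = field(default_factory=list)

    def ingest(self, record: Dict[str, Any]) -> None:
        # No validation or reshaping: the lake is optimized for getting data in.
        self.records.append(record)


@dataclass
class DataReservoir:
    """Refined store (hypothetical): one harmonized row per customer."""
    rows: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def load_from(self, lake: DataLake) -> int:
        """Clean and conform lake records; return the number rejected."""
        rejected = 0
        for raw in lake.records:
            # Harmonize the key: trim whitespace and normalize case.
            customer = str(raw.get("customer_id", "")).strip().upper()
            amount = raw.get("amount")
            # Reject records with no key or a non-numeric amount.
            if not customer or not isinstance(amount, (int, float)):
                rejected += 1
                continue
            row = self.rows.setdefault(
                customer, {"customer_id": customer, "total": 0.0}
            )
            row["total"] += float(amount)
        return rejected


# Illustrative records only; the values are made up.
lake = DataLake()
for raw in [
    {"customer_id": " a1 ", "amount": 10},
    {"customer_id": "A1", "amount": 2.5},
    {"customer_id": "b2", "amount": "n/a"},
    {"amount": 3},
]:
    lake.ingest(raw)

reservoir = DataReservoir()
rejected = reservoir.load_from(lake)
```

In this sketch the lake keeps all four records exactly as they arrived, while the reservoir exposes a single conformed row and reports how many records failed refinement. A consumer who needs the newest data reads the lake and accepts the mess; a consumer who needs known quality reads the reservoir.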
Determining at what point in the pipeline data becomes meaningful for a particular use case is often tempered by time and quality.
On one hand, access to data early in the pipeline favors timeliness at the expense of harmonization, which suits use cases that require the most recent data. On the other hand, access to data later in the pipeline favors accuracy at the expense of latency, because curation takes time. That suits use cases that require data that has been cleaned, conformed and enriched, and that is of known quality.
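The time-versus-quality trade-off can be expressed as a simple selection rule: pick the earliest point in the pipeline that still meets a use case's quality floor and latency ceiling. The Python sketch below assumes a hypothetical `Stage` type with made-up `latency_minutes` and `quality_score` values; real pipelines would need their own measures of freshness and quality.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    latency_minutes: int   # assumed staleness of data at this point
    quality_score: float   # assumed scale: 0.0 (raw) to 1.0 (fully curated)


def earliest_suitable_stage(
    stages: List[Stage], max_latency_minutes: int, min_quality: float
) -> Optional[Stage]:
    """Return the freshest stage meeting both constraints, or None."""
    candidates = [
        s for s in stages
        if s.latency_minutes <= max_latency_minutes
        and s.quality_score >= min_quality
    ]
    # Among acceptable stages, prefer the one with the least latency.
    return min(candidates, key=lambda s: s.latency_minutes, default=None)


# Illustrative pipeline; the numbers are placeholders, not measurements.
PIPELINE = [
    Stage("lake", latency_minutes=5, quality_score=0.4),
    Stage("staging", latency_minutes=60, quality_score=0.7),
    Stage("reservoir", latency_minutes=240, quality_score=0.95),
]

fraud_alerts = earliest_suitable_stage(PIPELINE, max_latency_minutes=15, min_quality=0.3)
board_report = earliest_suitable_stage(PIPELINE, max_latency_minutes=1440, min_quality=0.9)
```

With these placeholder numbers, a time-sensitive use case lands on the lake and a quality-sensitive one lands on the reservoir. A request for both very fresh and highly curated data returns `None`, which is the trade-off made explicit.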
Public cloud or on premises
Choosing where to run your analytics platform is not an easy decision. Fortunately, public cloud and on-premises deployments aren't mutually exclusive. Smaller organizations typically gravitate to an entirely public cloud strategy, while midsize to large organizations often deploy a hybrid strategy or assume complete control with an all on-premises strategy.
Any decision on where to host a data analytics platform should minimally consider agility, scale, cost, security (particularly sensitive data protection), network latency and analytic capabilities.
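One way to keep all six considerations in view is a weighted scoring matrix. This is a common decision aid rather than a method the article prescribes, and the sketch below is only an assumption of how it might look in Python: the function name, the rating scale and the weights are hypothetical, and each organization would supply its own numbers.

```python
from typing import Dict

# The six considerations named in the text.
CRITERIA = (
    "agility",
    "scale",
    "cost",
    "security",
    "network_latency",
    "analytic_capabilities",
)


def weighted_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of ratings across all six criteria."""
    missing = [c for c in CRITERIA if c not in ratings or c not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(ratings[c] * weights[c] for c in CRITERIA) / total_weight


# Placeholder weights: this hypothetical organization cares most about security.
weights = {c: 1.0 for c in CRITERIA}
weights["security"] = 3.0

# Placeholder ratings on an assumed 1-to-5 scale (higher is better).
options = {
    "public_cloud": dict(zip(CRITERIA, (5, 5, 4, 3, 3, 5))),
    "on_premises": dict(zip(CRITERIA, (2, 3, 3, 5, 5, 3))),
}

scores = {name: weighted_score(r, weights) for name, r in options.items()}
```

The result is only as good as the ratings and weights fed into it, so the matrix is best treated as a way to structure the conversation rather than as an answer in itself.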
A big part of the hosting decision comes down to control. Organizations that are comfortable sharing control are likely to lean more toward a cloud presence. Organizations that feel comfortable owning the end-to-end platform will likely lean more toward an on-premises option.
Regardless of where you run your analytics platform, modernization should not simply be a lift-and-shift approach. You may not need a complete overhaul, but take the opportunity to refresh select components and remove technical debt across your platform.
Organizations that choose the public cloud for some or all of their data analytics platform architecture should take advantage of what the cloud does best. This means moving from IaaS to SaaS and PaaS models. Look to maximize managed services, migrate to native cloud services, automate elasticity, geo-disperse the analytics platform and move to consumption-based pricing whenever possible by using serverless technologies.
The importance of flexibility
Flexibility has become a necessary attribute of a modern data analytics platform architecture. An expanding demand for analytics is forcing analytics platforms to be more accessible, extensible and nimble while processing data at greater velocity, volume and variety.
One thing is for sure: Your data analytics platform architecture will change. A key measurement of a platform's flexibility is how well it adapts to business and technology innovation. Expect the business to demand an accelerated analytics lifecycle and greater autonomy via self-service capabilities. To keep pace with the business, look for technology advancements in automation and artificial intelligence as well as catalysts for augmented data management and analytics.