Building a strong data analytics platform architecture
Data analytics platforms are crucial for information-driven enterprises. With the right architecture, organizations can gain meaningful insights and a competitive edge.
Data analytics is a crucial differentiator in today's marketplace. Organizations that invest in their architecture can unlock data's full potential, giving them a competitive edge.
Analytics platforms are now the core of information-driven enterprises, helping turn raw data into actionable insights that support decision-making. By understanding how data moves through the pipeline and turns into value, organizations can shape their platforms to deliver trustworthy data and consistent results at scale.
What is data analytics?
Data analytics is the process of examining and interpreting data. It encompasses a broad range of BI and data science disciplines, including the following (the sketch after this list illustrates the first three):
- Descriptive analytics describe what happened.
- Diagnostic analytics explain why it happened.
- Predictive analytics estimate what is likely to happen.
- Prescriptive analytics describe what should be done.
- Cognitive analytics -- such as AI, machine learning and natural language processing -- simulate human intelligence.
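To make these categories concrete, here's a minimal sketch of descriptive, diagnostic and predictive analytics using pandas and NumPy. The monthly sales data, column names and naive trend model are invented for illustration:

```python
import pandas as pd
import numpy as np

# Hypothetical monthly sales data; figures are invented for illustration.
sales = pd.DataFrame({
    "month": ["2024-01", "2024-02", "2024-03"] * 2,
    "region": ["east", "east", "east", "west", "west", "west"],
    "revenue": [120, 95, 140, 80, 85, 60],
})

# Descriptive: what happened? Summarize revenue by region.
print(sales.groupby("region")["revenue"].sum())

# Diagnostic: why did it happen? Drill into the weaker region.
west = sales[sales["region"] == "west"]
print(west.sort_values("revenue"))

# Predictive: what is likely to happen? Fit a naive linear trend.
x = np.arange(len(west))
slope, intercept = np.polyfit(x, west["revenue"], 1)
print("next month estimate:", slope * len(west) + intercept)
```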
A clear architecture serves as the platform's blueprint for effective data management and information delivery and should take into account people, processes, technology and data across analytics and AI workloads.
This alignment helps teams produce reliable insights and measurable business outcomes. Investing in architecture can unlock data's full potential by improving quality, consistency and governance across the organization.
How data turns into value
At its core, an analytics architecture communicates a vision and plan for transforming data into business value. Realizing value from data requires a sequence of stages, each of which incrementally improves usefulness and reliability, enabling data-informed decision-making that delivers tangible business results. The data value chain flows as follows:
- Stage 1: Business strategy. Defines business goals, objectives and the decisions the organization needs to support.
- Stage 2: Data. Raw values, often with limited or no meaning tied to the strategy.
- Stage 3: Information. Data organized and contextualized so it can be understood and compared.
- Stage 4: Insight. Conclusions drawn from analysis to show patterns or relationships.
- Stage 5: Action. Applying insights to achieve business outcomes.
- Stage 6: Business outcome. The measurable effect a specific action has on organizational performance, such as revenue or customer experience.
What a modern analytics platform needs
A modern data analytics platform architecture supports all types of analytics and maximizes data value by describing a framework that maps how data flows through its stages. This architecture defines the people, processes, technologies and data organizations need to reduce cost, complexity and redundancy while delivering enterprise-wide analytics.
The purpose of the architecture is to support business unit and enterprise analytics, communicate architectural decisions, minimize risk and ensure data quality and consistency across the organization.
Modern data analytics platform architecture has a variety of useful attributes. Organizations should consider the following characteristics when designing new or modernizing existing architecture:
- Agility. Adapt to changing business requirements.
- Cost. Optimize expenses while maintaining performance.
- Talent. Train and maintain a skilled and data-literate workforce.
- Technology. Run analytics on platforms and managed services to meet budget and reliability requirements.
- Process. Streamline operations with automation and clear governance where possible.
- Scale. Handle data at any volume, velocity and variety.
- Capabilities. Provide a full analytical spectrum.
- Innovation. Evaluate emerging tools and trends and adopt them for continuous improvement.
- Self-service. Enable users to find, understand and use data without IT intervention.
- Insights. Support data-driven decision making.
- AI assistance. Use AI to automate analysis, improve productivity and enhance data quality.
Data lakes and data warehouses vs. data lakehouses
Modern architecture brings together structured, semi-structured and unstructured data. There are two prevailing strategies to do so:
- Data lake and data warehouse coexistence.
- Data lakehouse.
Strategy 1: Data lake and data warehouse coexistence
In this strategy, the two repositories are complementary. Each addresses different analytical uses at different points in the pipeline.
Data lakes lie at the front end of the pipeline and store raw data. They're optimized for getting data into the analytics platform. Teams use landing zones and independent data sandboxes for ingestion and discovery. These native-format data stores are open to select consumers for specific uses. Analytics are generally limited to time-sensitive insights and exploratory inquiry by consumers who can work with data that is not yet standardized.
Data warehouses reside at the back end of the pipeline and serve refined data for querying and analysis. Data warehouses are purpose-built data stores designed for use across the organization. Analytics span a wide range of insights for use by casual and sophisticated consumers, delivering tactical and strategic insights that run the business.
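To illustrate the coexistence pattern, here's a minimal sketch in Python. A local directory stands in for the lake's landing zone, and DuckDB stands in for the warehouse; the paths, table and event are hypothetical:

```python
import json
import pathlib
from datetime import datetime

import duckdb

# Front end: land a raw event in the lake exactly as received.
landing = pathlib.Path("lake/landing/orders")
landing.mkdir(parents=True, exist_ok=True)
raw_event = {"order_id": 17, "amount": "49.90", "ts": "2025-01-03T10:22:00"}
(landing / "event_000017.json").write_text(json.dumps(raw_event))

# Back end: the warehouse serves refined, typed, conformed data.
con = duckdb.connect("warehouse.db")
con.execute("""
    CREATE TABLE IF NOT EXISTS fact_orders (
        order_id INTEGER,
        amount   DECIMAL(10, 2),
        order_ts TIMESTAMP
    )
""")
con.execute(
    "INSERT INTO fact_orders VALUES (?, ?, ?)",
    [17, 49.90, datetime(2025, 1, 3, 10, 22)],
)
print(con.execute("SELECT sum(amount) FROM fact_orders").fetchone())
```

The point is the division of labor: the lake preserves events exactly as received for discovery and time-sensitive use, while the warehouse serves conformed records for querying across the organization.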
Strategy 2: Data lakehouse
This strategy blends data lakes and data warehouses into a unified platform. A single source of truth supports both BI and data science workloads, reducing duplication and simplifying data management.
Regardless of the data storage strategy, a medallion architecture -- a popular design for lakehouses -- provides a structured approach for organizing and processing data in incremental layers. Each layer further refines the data, as the sketch after this list illustrates:
- Bronze: Data is raw, unprocessed and in its original format.
- Silver: Data is cleaned, standardized, enriched and validated.
- Gold: Business-ready data that is integrated, modeled and governed.
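Here's a minimal sketch of medallion refinement using pandas. The input file, column names and business rules are hypothetical placeholders for an organization's own pipeline:

```python
import pandas as pd

# Bronze: raw, unprocessed data kept in its original form.
bronze = pd.read_csv("bronze/orders.csv", dtype=str)

# Silver: cleaned, standardized and validated.
silver = bronze.dropna(subset=["order_id", "amount", "order_ts"]).copy()
silver["amount"] = pd.to_numeric(silver["amount"], errors="coerce")
silver["order_ts"] = pd.to_datetime(silver["order_ts"], errors="coerce")
silver = silver.dropna(subset=["amount", "order_ts"])
silver = silver[silver["amount"] > 0]  # simple validation rule

# Gold: business-ready data, modeled for consumption.
gold = (
    silver.assign(order_month=silver["order_ts"].dt.to_period("M").astype(str))
          .groupby("order_month", as_index=False)["amount"]
          .sum()
          .rename(columns={"amount": "monthly_revenue"})
)
gold.to_parquet("gold/monthly_revenue.parquet")
```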
Determining the point in the pipeline at which data becomes meaningful is often a tradeoff between time and quality. On one hand, accessing data earlier in the pipeline favors time-sensitive insights over the suitability of non-standardized data, particularly for use cases requiring the most recent data. On the other hand, accessing data later in the pipeline favors accuracy over speed, accepting the latency that curation introduces. Use cases favoring this approach require clean, conformed and enriched data of known quality.
Why data quality matters for AI
As AI continues to reshape how data is managed and how insights are generated and consumed, the importance of an AI-ready foundation can't be overstated. AI depends on reliable, well-governed data. In turn, AI aids data and analytics workflows by automating routine tasks, improving productivity and speeding analysis.
It is important to remember that AI is only as good as the data behind it. After all, AI doesn't fix data problems; it amplifies them at scale. The saying "garbage in, garbage out" holds true. As AI adoption grows, governed high-quality data becomes essential.
The platform must ensure data is AI-ready. Before AI output can be considered trusted and defensible, organizations must first make certain their data foundation meets the following standards (the sketch after this list shows how some of these checks can be automated):
- Accountability.
- Standard definitions.
- Sufficient quality.
- Known lineage.
- Governed access.
- Data-literate workforce.
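Here's a minimal sketch of how a few of these standards -- standard definitions, sufficient quality and known lineage -- could be enforced as automated checks with pandas. The table, column names and thresholds are hypothetical placeholders a governance program would define:

```python
import pandas as pd

def check_ai_ready(df: pd.DataFrame) -> list[str]:
    """Return a list of reasons the table is not yet AI-ready."""
    # Standard definitions: the agreed schema must be present.
    expected = {"customer_id", "segment", "lifetime_value", "source_system"}
    missing = expected - set(df.columns)
    if missing:
        return [f"missing standard columns: {sorted(missing)}"]

    failures = []
    # Sufficient quality: uniqueness and completeness thresholds.
    if df["customer_id"].duplicated().any():
        failures.append("duplicate customer_id values")
    if df["lifetime_value"].isna().mean() > 0.02:  # allow at most 2% nulls
        failures.append("lifetime_value completeness below threshold")
    # Known lineage: every row must name its source system.
    if df["source_system"].isna().any():
        failures.append("rows with unknown lineage")
    return failures

problems = check_ai_ready(pd.read_parquet("gold/customers.parquet"))
print(problems or "data passes AI-readiness checks")
```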
Once organizations get the data right, AI can produce more reliable insights, support better decision-making and ultimately improve overall business outcomes.
Deployment options for enterprise analytics
Choosing where to run an analytics platform isn't an easy decision. Universal considerations include agility, scale, security and privacy, network latency, analytic capabilities and FinOps. However, a big part of the hosting decision comes down to control.
Organizations comfortable with sharing control are likely to lean toward the cloud; smaller organizations typically gravitate to an entirely public cloud strategy. Organizations that prefer owning the end-to-end platform will likely lean toward an on-premises option, which is more common in regulated industries and organizations with strict compliance requirements.
Fortunately, public cloud and on-premises deployments aren't mutually exclusive. Many organizations deploy a hybrid strategy that takes advantage of both cloud and on-premises benefits, such as flexibility and control.
Organizations that choose the public cloud for some or all of their data analytics platform architecture should take advantage of what it does best. This means using SaaS and PaaS cloud models, choosing managed and native cloud services, automating elasticity, implementing tiered storage, geo-dispersing the analytics platform and capitalizing on consumption-based pricing, including serverless technologies where possible.
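As one example, tiered storage can be automated with a lifecycle policy. Here's a minimal sketch using boto3 against Amazon S3; the bucket name, prefix and transition schedule are hypothetical choices an organization's FinOps policy would set:

```python
import boto3

s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="analytics-lake",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-raw-landing-data",
            "Status": "Enabled",
            "Filter": {"Prefix": "landing/"},
            "Transitions": [
                # Move aging raw data to cheaper tiers automatically.
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 180, "StorageClass": "GLACIER"},
            ],
        }]
    },
)
```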
Jeff McCormick is an enterprise data architect and IT principal who has extensive experience in data-related IT roles. He is also an inventor, patent holder, freelance writer and industry presenter.