What Enables Connected Decision Systems: Inside a Modern Analytics Platform
A practical breakdown of the layers inside a modern analytics platform and how they support decision systems.
A few years ago, building a serious data capability inside a company usually required a large specialist team: database administrators, ETL developers, BI developers, data scientists, and infrastructure engineers. The work was expensive, slow, and fragile. If the one person who understood a critical pipeline left, the whole reporting process could suffer.
Modern analytics platforms have changed that picture. They bring many of those capabilities into a shared environment and reduce the amount of custom engineering needed to get reliable data into the hands of decision-makers. To evaluate them properly, it helps to understand what they are doing under the hood.
This article breaks down the main layers of a modern analytics platform and explains why each one matters.
The Core Problem These Platforms Solve
Organisations have data coming from dozens or hundreds of sources: transactional databases, SaaS applications, cloud services, IoT devices, spreadsheets, APIs, and streaming events. These sources use different formats, update at different frequencies, follow different schemas, and vary widely in quality.
At the same time, people across the business need answers quickly. They need figures they can trust, rather than answers that change depending on which system they open.
A modern analytics platform sits between scattered source data and reliable decision-making. It collects, stores, transforms, governs, and serves data so people and systems can use it with confidence.
The Architecture of a Modern Analytics Platform
```mermaid
flowchart LR
subgraph Sources["Data sources"]
DB["Operational databases"]
SAAS["SaaS apps"]
STREAM["Events and IoT"]
FILES["Files and spreadsheets"]
end
subgraph Ingestion["Ingestion and integration"]
BATCH["Batch pipelines"]
CDC["Change Data Capture"]
API["API connectors"]
STREAMING["Streaming ingestion"]
end
subgraph Storage["Storage zones"]
BRONZE["Bronze raw zone"]
SILVER["Silver cleaned zone"]
GOLD["Gold business-ready zone"]
end
subgraph Modelling["Transformation and semantic layer"]
TRANSFORM["Business transformations"]
METRICS["Shared metrics and KPIs"]
CATALOG["Catalogue and lineage"]
end
subgraph Consumption["Consumption"]
BI["BI dashboards"]
SQL["SQL / notebooks"]
ML["ML models"]
EMBED["Embedded analytics"]
end
DB --> BATCH
SAAS --> API
STREAM --> STREAMING
DB --> CDC
FILES --> BATCH
BATCH --> BRONZE
CDC --> BRONZE
API --> BRONZE
STREAMING --> BRONZE
BRONZE -->|"validate and clean"| SILVER
SILVER -->|"business model"| GOLD
GOLD --> TRANSFORM --> METRICS
METRICS --> BI
METRICS --> SQL
METRICS --> ML
METRICS --> EMBED
GOV["Governance, security, audit"] -.-> Storage
GOV -.-> Modelling
GOV -.-> Consumption
```

The diagram is simplified, but it captures the core pattern. Each layer has a distinct job, and weaknesses in one layer often show up as confusion or mistrust in another.
Layer 1: Data Sources
This is where your data lives before the platform touches it. It includes:
- Operational databases: The transactional systems behind your applications
- SaaS platforms: Salesforce, SAP, HubSpot, Shopify, and the other tools teams use every day
- Streaming data: Real-time events from applications, IoT sensors, and clickstreams
- Files and flat data: CSVs, Excel files, and JSON exports that still appear in many business processes
The platform should not modify source systems directly. It reads from them and ingests a copy, and the source remains the operational system of record.
Layer 2: Ingestion and Integration
This layer is the plumbing. It moves data from source systems into the analytics platform reliably, at the right frequency, and without losing records.
There are two main patterns:
Batch ingestion pulls data on a schedule, such as hourly, nightly, or weekly. It is simpler to build and works well when fresh-to-the-second data is unnecessary. A monthly finance report does not need real-time ingestion.
Streaming ingestion captures data as it is generated, usually with latency measured in milliseconds or seconds. This is useful for fraud detection, live inventory tracking, customer-facing personalisation, and other time-sensitive workflows.
Most mature platforms support both. A key capability here is Change Data Capture (CDC), which detects and captures only the rows that changed in a source database. CDC avoids pulling an entire table every time and makes ingestion much more efficient at scale.
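To make the pattern concrete, here is a minimal sketch of an incremental batch pull driven by a high-water mark, the simplest relative of CDC. The table, columns, and parameter name are hypothetical, and the exact parameter syntax depends on your pipeline tool.

```sql
-- Incremental batch extract: pull only rows changed since the last successful run.
-- The orders table, its columns, and :last_run_at are illustrative placeholders.
SELECT
    order_id,
    customer_id,
    amount,
    status,
    updated_at
FROM source_db.orders
WHERE updated_at > :last_run_at;  -- high-water mark stored in pipeline state
```

Full CDC goes further by reading the database's transaction log, which also captures deletes that a timestamp filter would miss.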
Layer 3: Storage
Once data arrives, you need somewhere to store it. Modern platforms have moved away from putting everything directly into a traditional warehouse. Many now use a tiered model often called the Medallion Architecture.
Bronze layer (raw zone): Data lands here exactly as it came from the source. It has not been cleaned or transformed. This gives you a safety net if something breaks downstream.
Silver layer (cleaned zone): Data has been validated, deduplicated, standardised, and enriched. Raw customer records become clean customer records here, and much of the data engineering effort happens at this stage.
Gold layer (business-ready zone): Business-ready data is modelled for specific use cases. This may include sales reporting tables, marketing attribution models, or finance consolidation views. The gold layer is what most business users query.
This structure protects raw data, makes cleaning work explicit, and gives business users data models designed around their decisions.
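As a sketch of what the bronze-to-silver step can look like in SQL, assuming hypothetical bronze.raw_customers and silver.customers objects with illustrative columns:

```sql
-- Bronze to silver: validate, standardise, and deduplicate raw customer rows.
CREATE OR REPLACE VIEW silver.customers AS
SELECT customer_id, email, country
FROM (
    SELECT
        customer_id,
        TRIM(LOWER(email))           AS email,    -- standardise casing and whitespace
        COALESCE(country, 'UNKNOWN') AS country,  -- make missing values explicit
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY loaded_at DESC               -- keep the most recent copy per key
        ) AS rn
    FROM bronze.raw_customers
    WHERE customer_id IS NOT NULL                 -- basic validation: every row needs a key
) deduplicated
WHERE rn = 1;
```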
Layer 4: Transformation and Modelling
Raw data rarely reflects business reality on its own. A timestamp in a database is not necessarily the same as "the date a sale was recognised for revenue purposes." A user ID in a clickstream is not the same thing as a customer in a CRM.
Transformation applies business logic so raw data becomes meaningful data. Tools such as dbt (data build tool) are popular because they allow teams to define transformations as SQL code, version-control them, test them, and document them in one place.
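A dbt model, for instance, is just a SELECT statement saved as a file; dbt handles materialisation, dependencies, and testing around it. Here is a minimal sketch with hypothetical model and column names, encoding the revenue-recognition rule mentioned above:

```sql
-- models/gold/daily_revenue.sql
-- Revenue is grouped by the date it was recognised, not the raw transaction timestamp.
SELECT
    CAST(recognised_at AS DATE) AS revenue_date,
    SUM(amount)                 AS revenue
FROM {{ ref('silver_orders') }}  -- dbt resolves this reference to the silver table
WHERE status = 'completed'       -- illustrative business rule for recognised sales
GROUP BY 1
```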
For machine learning workloads, this layer may also include a feature store. A feature store is a central place for reusable model inputs, such as "customer average order value in the last 90 days." This avoids teams rebuilding the same logic in multiple models.
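The feature itself is often plain SQL as well. A sketch of the 90-day example, again with illustrative names (date arithmetic syntax varies by warehouse):

```sql
-- Feature: average order value per customer over the trailing 90 days.
SELECT
    customer_id,
    AVG(amount) AS avg_order_value_90d
FROM silver.orders
WHERE order_date >= CURRENT_DATE - INTERVAL '90' DAY
GROUP BY customer_id;
```

What the feature store adds on top is versioning, documentation, and consistent serving of that value to both model training and production scoring.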
Layer 5: The Semantic Layer
The semantic layer is one of the most important parts of a modern analytics platform, even though it is often overlooked.
It sits between the physical data and the people consuming it. Its job is to define what business terms mean. What is a "customer"? What counts as "revenue"? How is "conversion rate" calculated?
Without a semantic layer, each team builds its own version of these metrics. The result is the familiar argument about which number is correct.
A strong semantic layer also provides data lineage, which lets you trace a number back to its source. If a CEO asks where a revenue figure came from, the data team should be able to show the source tables, transformations, and calculations behind it. That visibility builds trust.
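In its simplest form, a semantic definition can be a governed view that encodes one agreed calculation, which every dashboard and notebook then queries. A sketch for "conversion rate", with hypothetical tables:

```sql
-- One shared definition of conversion rate: distinct orders divided by
-- distinct sessions on the same day. Table and column names are illustrative.
CREATE OR REPLACE VIEW semantic.daily_conversion_rate AS
SELECT
    s.session_date,
    COUNT(DISTINCT o.order_id) * 1.0
        / NULLIF(COUNT(DISTINCT s.session_id), 0) AS conversion_rate  -- avoid divide-by-zero
FROM gold.sessions s
LEFT JOIN gold.orders o
    ON o.session_id = s.session_id
GROUP BY s.session_date;
```

Dedicated semantic layers add documentation and lineage on top, but the principle is the same: the definition lives in one place.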
Layer 6: The Consumption Layer
This is where humans and machines use the data.
Business Intelligence tools such as Power BI and Tableau help non-technical users build reports, explore data, and monitor KPIs through dashboards.
Notebooks and SQL editors give analysts and data scientists the flexibility to run ad-hoc analysis, build models, and explore questions that dashboards cannot answer.
Machine learning models use prepared data from the platform for predictions such as churn risk, demand forecasting, and product recommendations.
Embedded analytics put insight directly inside the applications where decisions happen. A sales rep might see a churn risk score inside the CRM instead of opening a separate analytics tool.
The best platforms can serve all of these consumption patterns from the same governed data foundation.
Layer 7: Governance and Security
```mermaid
flowchart LR
subgraph Platform["Analytics platform"]
STORAGE["Storage zones"]
SEMANTIC["Semantic layer"]
REPORTS["Reports and models"]
end
subgraph Controls["Control plane"]
ACCESS["Access control"]
QUALITY["Data quality monitoring"]
AUDIT["Compliance and audit trail"]
end
STORAGE --> SEMANTIC --> REPORTS
ACCESS -.-> STORAGE
ACCESS -.-> REPORTS
QUALITY -.-> STORAGE
QUALITY -.-> SEMANTIC
AUDIT -.-> REPORTS
AUDIT -.-> SEMANTIC
```

Governance is not a single architectural layer in the strict sense. It applies across the platform. It is also often the difference between a platform people trust and a platform that creates compliance risk.
The key components are:
- Access control: Who can see what? Sensitive financial data should not be visible to everyone. Customer PII may need to be masked or restricted based on role.
- Data quality monitoring: Automated checks catch issues before they reach business users. If a pipeline breaks and revenue goes to zero overnight, you want to know before the CFO opens the dashboard. A sketch of such a check follows after this list.
- Compliance and audit trails: Regulated industries need to prove who accessed data, when, and why. GDPR, SOX, and HIPAA differ in detail, but all require disciplined handling of sensitive information.
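Data quality checks are often expressed as queries that return the rows violating a rule, in the style of a dbt test: an empty result means the check passes. A sketch against the hypothetical gold table from earlier:

```sql
-- Quality check: revenue should never be missing or negative.
-- Returns violating rows; zero rows back means the check passes.
SELECT
    revenue_date,
    revenue
FROM gold.daily_revenue
WHERE revenue IS NULL
   OR revenue < 0;
```

Scheduled alongside the pipeline, a non-empty result can block the load or raise an alert before the dashboard updates.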
What Makes a Platform "Modern"
```mermaid
flowchart LR
subgraph Criteria["Evaluation criteria"]
UNIFIED["Unified experience"]
OPEN["Open formats"]
LAKEHOUSE["Lakehouse capability"]
GOVERNED["Built-in governance"]
REALTIME["Real-time support"]
end
subgraph Decision["Platform decision"]
FIT["Architecture fit"]
RISK["Lock-in and compliance risk"]
VALUE["Decision value"]
end
UNIFIED --> FIT
OPEN --> RISK
LAKEHOUSE --> FIT
GOVERNED -.-> RISK
REALTIME --> VALUE
FIT --> VALUE
RISK -.-> VALUE
```

If I were evaluating a platform today, I would look for:
- Unified experience: Can data engineers, analysts, and scientists work in the same environment without constant handoffs?
- Open formats: Is the data stored in standards such as Delta Lake or Apache Parquet, or does the platform depend on proprietary formats?
- Lakehouse capability: Can the platform support analytics and AI/ML workloads on the same data without repeated movement between systems?
- Built-in governance: Are security and compliance native to the platform, or do they depend on extra tools bolted on later?
- Real-time support: Can the platform handle streaming data as well as batch processing?
Microsoft Fabric, Databricks, and Snowflake are all competing in this space. Each has strengths and trade-offs that matter in different operating environments.
The Main Lesson
A modern analytics platform is not a single product. It is an architecture made of layers that move data from source systems to decision-makers in a reliable, trustworthy, and timely way.
Understanding those layers helps you choose technology, but it also helps you diagnose why a current setup is failing. In practice, that understanding is often more valuable than any individual tool.
Next in this series: Building a simple decision-support workflow using Microsoft Fabric, from raw data to actionable insight in one platform.