Microsoft Fabric
Microsoft Fabric and the Lakehouse Pattern, A Practical Take for Power BI Teams
A practical explanation of the Fabric lakehouse pattern and what it changes for Power BI teams.
Microsoft Fabric arrived with a lot of marketing noise and a fairly simple core idea: put the storage layer, the engine layer, and the consumption layer into a single tenant-level service so that data engineers, data scientists, and Power BI developers all work against the same files. Whether the noise is justified depends on the size of the team and the existing investment in Azure data services. The core idea, the lakehouse, is the part that matters.
This article focuses on the practical implications of the lakehouse pattern for Power BI teams. It covers what changes for a typical analytics workload, where Fabric earns its keep, and where you should remain sceptical until the platform matures further.
What a Lakehouse Actually Is
A lakehouse is the marriage of a data lake and a data warehouse. The storage is open, columnar, and cheap: Delta tables, which are Parquet files plus a transaction log, sitting in object storage. The Delta format adds transactions, schema enforcement, and data skipping on top, so the same files can be queried by a SQL engine, a Spark notebook, or the VertiPaq engine that powers Power BI.
In Fabric, the storage layer is OneLake. The query engines include Spark, the Fabric SQL warehouse, and the Power BI engine running in Direct Lake mode. The point is that none of these engines copies data. They all read the same Parquet files in the same logical container.
For Power BI teams, the practical effect is that the boundary between the data warehouse and the BI layer becomes much thinner. The warehouse tables are the BI tables. The transformations that used to happen in Power Query can move upstream into Spark or T-SQL. The semantic model becomes a layer of business definitions on top of files that already exist.
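A small notebook experiment makes the "no copies" claim concrete. The workspace, lakehouse, and table names below are hypothetical placeholders, and the URI follows the OneLake ABFS pattern; the point is that the catalog name and the OneLake file path resolve to the same Delta files.

```python
# A sketch of the "one copy, many engines" idea from a Fabric notebook.
# Workspace, lakehouse, and table names are hypothetical placeholders.

# Read a gold table through the lakehouse catalog, as the notebooks below do.
by_name = spark.read.table("gold.fact_sales")

# Read the same Delta table through its OneLake ABFS path. Same files, no copy;
# the SQL warehouse and the Direct Lake model read these files too.
path = ("abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/"
        "SalesLakehouse.Lakehouse/Tables/gold/fact_sales")
by_path = spark.read.format("delta").load(path)

assert by_name.count() == by_path.count()  # two routes to the same storage
```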
The Architecture in Practice
```mermaid
flowchart LR
    subgraph Sources["Source Systems"]
        SAP[(SAP)]
        Sales[(Salesforce)]
        Files[(Files / SaaS APIs)]
    end
    subgraph OneLake["OneLake Storage"]
        Bronze["Bronze, Raw landed data"]
        Silver["Silver, Cleansed and conformed"]
        Gold["Gold, Star schema for analytics"]
    end
    subgraph Engines["Query Engines"]
        Spark["Spark Notebooks"]
        SQL["Fabric Warehouse"]
        DL["Direct Lake Semantic Model"]
    end
    Sources --> Bronze
    Bronze --> Spark
    Spark --> Silver
    Silver --> Gold
    Gold --> DL
    Gold --> SQL
    DL --> Power["Power BI Reports"]
    SQL --> Power
```

The medallion pattern of bronze, silver, and gold is not new. What is new is that all three layers live in the same store, the engines that read them are interoperable, and Power BI can consume the gold layer without Import or DirectQuery in the traditional sense.
Direct Lake, the Feature That Matters Most
Direct Lake is the storage mode that ties the lakehouse to Power BI. It is neither Import nor DirectQuery. The semantic model holds metadata and DAX, but the table data is read directly from Delta Parquet files at query time, then cached in memory for subsequent queries.
The first query against a table is slower because it loads the columnar data from OneLake. Subsequent queries hit the cache and feel like Import mode. When the underlying Parquet files change, the cached columns are invalidated and reloaded on the next query. There is no scheduled refresh in the traditional sense.
For BI teams, this changes a long-standing trade-off. You no longer have to choose between fresh data (DirectQuery) and fast queries (Import). Direct Lake gives you both, provided your data lives in OneLake.
The catch is that Direct Lake has a list of restrictions that has been shrinking with every release but still exists. Certain model features and capacity guardrails force a fallback to DirectQuery against the warehouse, which is slower. Calculated columns are limited. Data types must align with Parquet types. Most of these limitations are footnotes for a typical retail or finance reporting workload, but they matter for sophisticated models.
A Concrete Workflow
Suppose you want to build a sales analytics solution from scratch in Fabric. The flow looks like this.
Step 1, Land the Raw Data
Create a Data Pipeline that copies raw sales data from the source system into a bronze lakehouse table. No transformations. Schema mirrors the source.
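The pipeline copy activity is the usual tool here, but for teams that prefer code, the landing step can be sketched in a notebook. The source folder below is a hypothetical placeholder; the point is that nothing is transformed on the way in.

```python
# A sketch of landing raw data into bronze from a Fabric notebook.
# The source folder under Files/ is a hypothetical placeholder.
raw = (spark.read
    .option("header", "true")
    .csv("Files/landing/sales/"))  # extracts dropped here by an upstream copy

# No transformations: bronze mirrors the source schema as landed.
raw.write.mode("append").saveAsTable("bronze.sales_raw")
```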
Step 2, Cleanse in Spark
Open a Fabric notebook and read the bronze table.
```python
from pyspark.sql.functions import to_date, coalesce, lit

df = spark.read.table("bronze.sales_raw")

clean = (df
    .filter("sale_amount IS NOT NULL")                         # drop rows with no amount
    .withColumn("sale_date", to_date("sale_timestamp"))        # derive a date column
    .withColumn("currency", coalesce("currency", lit("USD")))  # default missing currency
    .dropDuplicates(["sale_id"]))                               # one row per sale

clean.write.mode("overwrite").saveAsTable("silver.sales")
```
The silver layer holds cleansed and conformed data. Schemas are stable. Quality issues that should not appear downstream are filtered out here.
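A lightweight way to enforce that contract is a handful of assertions at the end of the cleansing notebook. This is a minimal sketch using the column names from the step above, not a substitute for a proper data quality framework.

```python
# Minimal post-write quality checks for the silver table (a sketch, not a framework).
silver = spark.read.table("silver.sales")

null_amounts = silver.filter("sale_amount IS NULL").count()
dup_ids = silver.count() - silver.dropDuplicates(["sale_id"]).count()

# Fail the notebook run loudly so the pipeline surfaces the problem.
assert null_amounts == 0, f"{null_amounts} rows with NULL sale_amount"
assert dup_ids == 0, f"{dup_ids} duplicate sale_id values"
```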
Step 3, Model Into a Star Schema
Build the gold layer as a proper star schema with one fact and several dimensions, all written as Delta tables. This step can be done in Spark or in T-SQL using the Fabric warehouse; the choice usually depends on team preference and the complexity of the transformations. A T-SQL version follows, with a Spark sketch of the same fact build after it.
```sql
-- Gold fact table: one row per sale, with the additive measure precomputed.
CREATE TABLE gold.fact_sales AS
SELECT
    s.sale_id,
    s.sale_date AS date_key,
    s.customer_id,
    s.product_id,
    s.store_id,
    s.quantity,
    s.unit_price,
    s.discount,
    s.quantity * s.unit_price * (1 - s.discount) AS total_amount
FROM silver.sales s;

-- Customer dimension: one row per customer, deduplicated from silver.
CREATE TABLE gold.dim_customer AS
SELECT DISTINCT
    customer_id,
    customer_name,
    customer_city,
    customer_country
FROM silver.customers;
```
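For teams that prefer Spark for this step, the fact table build translates directly. This is a minimal sketch using the column names from the example above, not the only reasonable way to write it.

```python
from pyspark.sql.functions import col

sales = spark.read.table("silver.sales")

# Same fact build as the T-SQL version: derive the measure, rename the date key.
fact = (sales
    .withColumn("total_amount",
                col("quantity") * col("unit_price") * (1 - col("discount")))
    .select("sale_id",
            col("sale_date").alias("date_key"),
            "customer_id", "product_id", "store_id",
            "quantity", "unit_price", "discount", "total_amount"))

fact.write.mode("overwrite").saveAsTable("gold.fact_sales")
```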
Step 4, Create the Direct Lake Semantic Model
In the Fabric workspace, create a new semantic model on top of the gold lakehouse. Add the fact and dimension tables. Set up relationships, mark the date table, and write measures.
The model is now consumable by Power BI Desktop or directly in the service. Reports built against it feel as responsive as Import mode because the engine pulls columns from Parquet on demand and caches them.
Step 5, Schedule the Pipeline
Create a Data Pipeline that runs the bronze copy, the silver cleanse, and the gold rebuild on a schedule. The semantic model picks up the new data automatically because Direct Lake reads the latest Parquet files.
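The pipeline designer is the natural home for this orchestration. For what it is worth, the same chain can be sketched from a single orchestrator notebook using the mssparkutils helper available in Fabric notebooks; the notebook names below are hypothetical.

```python
# A sketch of chaining the steps from one orchestrator notebook.
# Notebook names are hypothetical placeholders; timeouts are in seconds.
from notebookutils import mssparkutils

mssparkutils.notebook.run("land_bronze_sales", 1200)
mssparkutils.notebook.run("cleanse_silver_sales", 1800)
mssparkutils.notebook.run("build_gold_star_schema", 1800)
# No semantic model refresh step: Direct Lake picks up the new files on its own.
```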
This entire flow can be built in a single Fabric workspace by a small team. There is no Azure subscription to provision, no separate Synapse workspace, no Power BI capacity to size separately. Everything happens inside Fabric.
What Changes in the BI Developer's Day
Three things shift quite noticeably for a Power BI developer working in Fabric.
The first is that Power Query becomes optional. Most transformations should move upstream into Spark or SQL. Power Query still has a role for last-mile shaping and for quick prototypes, but the bulk of ETL belongs to the lakehouse.
The second is that semantic models become much smaller. They hold relationships, measures, and metadata, but no row data of their own. A 50 GB semantic model from the old world becomes a 50 MB semantic model in Fabric, with the data living in OneLake and being read on demand.
The third is that collaboration with data engineering becomes much closer. The same lakehouse is the source of truth for both teams. Decisions about schema, granularity, and quality are joint decisions. Some teams find this energising. Others find it painful because the boundary they relied on is now porous.
Where Fabric Earns Its Keep
Fabric is the right answer in three specific situations.
When a team is starting fresh and needs to stand up an analytics platform without a heavy Azure infrastructure project, Fabric removes a lot of plumbing. The combination of OneLake, Spark, warehouse, and Power BI in one tenant eliminates several integration tasks that would otherwise consume weeks.
When an organisation already lives in the Microsoft 365 ecosystem and has Power BI licences, Fabric extends the surface area without a new procurement conversation. The capacity is bought in the same currency. The identity model is the same. Governance plugs into the same labels and policies.
When the data volume is genuinely large and the queries are varied, Direct Lake removes the storage-mode trade-off that has constrained Power BI for years. For sub-second queries against multi-billion-row datasets without a daily refresh window, the platform is hard to beat right now.
Where Scepticism Is Healthy
Three considerations should temper enthusiasm.
Capacity sizing is still a moving target. Fabric capacity units (CUs) are billed in a different shape from Premium capacity, and the burstable model means a heavy query can throttle other workloads. Test the workload at expected peak before committing to a capacity SKU.
Some features are still in preview. The platform has matured rapidly, but parts of the developer experience, especially around source control and CI/CD, are catching up to what the broader Azure data ecosystem has had for years. If your team needs full Git integration for every artefact and rigorous environment promotion today, audit the current state carefully.
The vendor lock-in question is real. OneLake stores data in the open Delta and Parquet formats, which mitigates the issue technically. Operationally, however, building a workload that uses Fabric pipelines, Fabric notebooks, Direct Lake, and Fabric warehousing creates strong gravity. Plan accordingly.
A Migration Note
Teams already running on Azure Synapse or Azure Databricks plus Power BI Premium can move into Fabric incrementally. The lakehouse can sit alongside the existing warehouse, and individual workloads can be migrated one at a time. Direct Lake can be enabled on a single semantic model without disturbing the others. The data does not need to physically move because OneLake supports shortcuts to existing ADLS Gen2 storage.
This incremental path is much safer than a big bang migration and tends to surface the real edges of the platform without putting critical workloads at risk.
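To make the shortcut mechanism concrete, here is a minimal sketch of reading shortcut data from a notebook once the shortcut has been created in the lakehouse UI. The shortcut name is a hypothetical placeholder.

```python
# A shortcut created in the lakehouse UI surfaces existing ADLS Gen2 data
# under Tables/ or Files/ without moving it. "legacy_sales" is hypothetical
# and assumes the shortcut points at a Delta table.
legacy = spark.read.format("delta").load("Tables/legacy_sales")

legacy.createOrReplaceTempView("legacy_sales")
spark.sql("SELECT COUNT(*) AS row_count FROM legacy_sales").show()
```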
A Closing View
Fabric is not a revolution. It is the gradual unification of services that have existed in Azure for years, with a few genuinely new capabilities, such as Direct Lake, bolted on. For Power BI teams, the meaningful changes are the lakehouse storage pattern and the removal of the Import versus DirectQuery trade-off.
If you build a new analytics workload today and there is no strong reason to choose otherwise, doing it in Fabric is a reasonable default. If you have an existing Synapse or Databricks investment that works well, there is no urgency to move. The platform will continue to mature for several more quarters, and an early move often means absorbing changes that later adopters avoid.
Either way, the lakehouse pattern itself is here to stay. It is worth understanding even if you never click a button in Fabric.
References and Further Reading
| # | Source | Type | Link |
|---|---|---|---|
| 1 | Microsoft Learn, Microsoft Fabric documentation | Free official documentation | https://learn.microsoft.com/en-us/fabric/ |
| 2 | Microsoft Learn, Direct Lake mode overview | Free official documentation | https://learn.microsoft.com/en-us/fabric/get-started/direct-lake-overview |
| 3 | Microsoft Learn, OneLake overview | Free official documentation | https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview |
| 4 | Delta Lake documentation | Open source documentation | https://docs.delta.io/latest/index.html |
| 5 | Apache Parquet documentation | Open source documentation | https://parquet.apache.org/docs/ |
| 6 | Microsoft Learn, Medallion lakehouse architecture | Free official guidance | https://learn.microsoft.com/en-us/azure/databricks/lakehouse/medallion |
| 7 | Microsoft Learn, Lakehouse and Delta tables in Fabric | Free official documentation | https://learn.microsoft.com/en-us/fabric/data-engineering/lakehouse-overview |
| 8 | Microsoft Learn, Fabric capacity and licensing | Free official documentation | https://learn.microsoft.com/en-us/fabric/enterprise/licenses |