AI and Automation
Building AI Agents That Generate Power BI Reports On Their Own
A component-by-component guide to building AI agents that can generate Power BI reports.
The most repetitive task in any analytics team is not the analysis. It is the production of the same report, with slightly different filters, for slightly different audiences, week after week. Regional sales reports for each region. Product performance decks for each product line. Daily KPI emails for each business unit. The work is mechanical, the inputs are well defined, and the outputs follow a template.
This is exactly the kind of work that agentic AI can handle. The technical pieces are now mature enough to build a system that takes a brief in plain English, picks the right semantic model, queries the data, builds the visuals, applies the brand template, and publishes the report. This article walks through how such a system is put together.
The Goal
Imagine a brief that arrives at 8am on Monday morning. "Build a one page summary for the EMEA leadership meeting tomorrow. Focus on revenue, margin, and headcount, year on year, broken down by country. Highlight the three most significant changes."
A human analyst spends two hours on this brief. An agent should spend two minutes. The output should be indistinguishable from what the analyst would have produced, both in correctness and in visual polish.
This article describes a working architecture for that kind of agent and walks through the pieces in enough detail to build a prototype.
Architecture
flowchart LR
Brief[User brief in plain English]
Plan[Planner agent]
Catalog[Semantic Model Catalogue]
DAX[DAX Generator]
Validator[Result Validator]
Layout[Layout Designer]
Theme[Brand Theme Service]
Report[Power BI Report Builder]
Publish[Workspace Publisher]
Output[Published Report or PDF]
Brief --> Plan
Plan --> Catalog
Plan --> DAX
DAX --> Validator
Validator --> DAX
Validator --> Layout
Layout --> Theme
Theme --> Report
Report --> Publish
Publish --> Output

There are six logical components. The planner decomposes the brief into measurable questions. The catalogue picks the right semantic model. The DAX generator writes queries. The validator checks the results for plausibility. The layout designer chooses visuals and arranges them on a page. The publisher assembles the report and ships it.
Component 1, The Planner
The planner converts a brief into a structured plan. It uses a reasoning capable language model with a system prompt that describes the available analytical primitives.
A simple plan for the EMEA brief looks like this.
{
  "title": "EMEA Leadership Summary",
  "audience": "Executive",
  "page_count": 1,
  "metrics": [
    {"name": "Revenue", "comparison": "YoY"},
    {"name": "Margin", "comparison": "YoY"},
    {"name": "Headcount", "comparison": "YoY"}
  ],
  "filters": {"Region": "EMEA"},
  "breakdowns": ["Country"],
  "highlights": "Top 3 movers by absolute change",
  "delivery": "PDF and workspace publish"
}
The planner does not write DAX. It produces a contract that downstream components understand.
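One way to keep that contract honest is to validate the planner's output against a schema before anything downstream runs. A minimal sketch, assuming Pydantic; the fields mirror the JSON plan above.

from pydantic import BaseModel

class Metric(BaseModel):
    name: str
    comparison: str  # e.g. "YoY"

class ReportPlan(BaseModel):
    title: str
    audience: str
    page_count: int
    metrics: list[Metric]
    filters: dict[str, str]
    breakdowns: list[str]
    highlights: str
    delivery: str

def parse_plan(planner_response: str) -> ReportPlan:
    # Raises a ValidationError if the planner emitted a malformed plan,
    # so bad plans fail loudly before any query is generated.
    return ReportPlan.model_validate_json(planner_response)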
Component 2, The Catalogue
The catalogue maintains metadata for every semantic model the agent is allowed to use. For each model it records the workspace ID, dataset ID, the schema, the available measures, and a short description of what the model is for.
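In code, a catalogue entry can be a small record. A sketch, with the schema serialised as text so it can be both embedded and pasted into prompts:

from dataclasses import dataclass

@dataclass
class CatalogEntry:
    workspace_id: str
    dataset_id: str
    schema: str            # tables, columns, and relationships as text
    measures: list[str]    # names of the predefined measures
    description: str       # what the model covers; used for matching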
The agent picks the best fit by embedding both the brief and each catalogue entry, then computing cosine similarity. The top match becomes the working model. If no match scores above a threshold, the agent reports the issue rather than guessing.
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def select_model(brief, catalog, embedder):
    # Entry vectors can be precomputed and cached; only the brief
    # needs embedding at request time.
    brief_vec = embedder.embed(brief)
    scored = [(entry, cosine(brief_vec, embedder.embed(entry.description)))
              for entry in catalog]
    best = max(scored, key=lambda pair: pair[1])
    if best[1] < 0.65:
        raise NoSuitableModelError("No semantic model matches the brief")
    return best[0]
Embeddings are cheap. A catalogue with a thousand semantic models still resolves in under a second.
Component 3, The DAX Generator
The DAX generator turns the plan into actual queries. For each metric in the plan, it produces one DAX EVALUATE statement that returns the data needed for the visual.
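For the EMEA plan, the revenue query might look like the following. The Geography and 'Date' table names and the [Revenue] measure are illustrative; the real names come from the model schema.

EVALUATE
SUMMARIZECOLUMNS (
    Geography[Country],
    TREATAS ( { "EMEA" }, Geography[Region] ),
    "Revenue", [Revenue],
    "Revenue PY", CALCULATE ( [Revenue], SAMEPERIODLASTYEAR ( 'Date'[Date] ) )
)
ORDER BY Geography[Country]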
A naive prompt with the schema and the requested metric usually produces correct DAX, but production grade generation needs guardrails.
def generate_dax(plan, model_schema, llm):
    system_prompt = f"""
    You are a senior Power BI developer. Write DAX EVALUATE queries that match the user request.
    The semantic model has the schema below.
    {model_schema}
    Rules:
    - Use existing measures wherever possible.
    - Use SUMMARIZECOLUMNS for aggregation queries.
    - Always include a column for the breakdown dimension.
    - Add explicit ORDER BY for deterministic output.
    """
    queries = []
    for metric in plan.metrics:
        prompt = f"Plan: {plan}. Generate DAX for metric {metric.name} {metric.comparison}."
        dax = llm.complete(system=system_prompt, user=prompt)
        queries.append(dax)
    return queries
The generated queries pass through the validator before they touch the dataset.
Component 4, The Validator
The validator runs the query against the model and checks the result. Three checks matter most.
The first check is structural. Did the query return rows? Did it return the expected columns? Are the data types sensible? A query that returns zero rows when the brief implied data should exist is suspicious.
The second check is plausibility. Are the numbers within the expected magnitude for the model. A revenue figure that is six orders of magnitude off the historical mean is almost certainly wrong, even if the query syntax is valid.
The third check is consistency. Do the totals reconcile across breakdowns. If revenue by country sums to a different number than total revenue for the region, something is amiss.
def validate(query, dataset_id, model_stats):
    result = run_dax(dataset_id, query)
    if not result.rows:
        return ValidationFailure("Query returned no rows")
    total = sum(row.value for row in result.rows)
    # Plausibility bounds come from historical statistics for this model.
    if not (model_stats.min_plausible <= total <= model_stats.max_plausible):
        return ValidationFailure(f"Total {total} outside plausible range")
    return ValidationSuccess(result)
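The validate function covers the first two checks. The consistency check compares results that have already passed; a sketch, reusing the same result types:

def reconcile(breakdown_rows, region_total, tolerance=0.005):
    # The per-country figures should sum to the regional total, within a
    # small relative tolerance that absorbs rounding.
    breakdown_sum = sum(row.value for row in breakdown_rows)
    if abs(breakdown_sum - region_total) > tolerance * abs(region_total):
        return ValidationFailure(
            f"Breakdown sums to {breakdown_sum}, expected {region_total}")
    return ValidationSuccess(breakdown_rows)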
When validation fails, the agent loops back to the DAX generator with the failure reason. Three retries is a sensible cap. After that, the agent escalates to a human.
Component 5, The Layout Designer
The layout designer picks visuals for each metric and arranges them on the page. The choice of visual depends on the shape of the data.
A single metric with a year on year comparison becomes a KPI tile with a sparkline. A breakdown by category with a comparison becomes a bar chart with side by side bars. A trend over time becomes a line chart. The three movers from the highlights become a small annotated table with directional arrows.
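These rules are mechanical enough for a plain lookup rather than a model call. A sketch, assuming each validated result carries a simple shape descriptor:

def choose_visual(metric, shape):
    # shape is assumed to expose row_count, has_time_axis, and category_count.
    if shape.row_count == 1 and metric.comparison:
        return "kpi_tile"
    if shape.has_time_axis:
        return "line_chart"
    if shape.category_count > 1:
        return "bar_chart"
    return "table"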
The designer also handles the page layout. For a one page executive summary, the standard pattern is a header strip with the title and date, three KPI tiles across the top, two charts below them, and a textual narrative panel along the right hand side. The exact dimensions are encoded in a template.
from datetime import date

def design_layout(plan, validated_data):
    layout = []
    layout.append({"type": "header", "title": plan.title,
                   "date": date.today().isoformat()})
    for metric, data in zip(plan.metrics, validated_data):
        layout.append({"type": "kpi_tile", "metric": metric, "data": data})
    # validated_data is assumed to expose the country breakdown directly.
    layout.append({"type": "bar_chart", "data": validated_data.by_country})
    layout.append({"type": "narrative", "highlights": find_movers(validated_data)})
    return layout
The narrative is generated by the language model from the same data, with a constraint that every claim must be backed by a number that appears in the visuals.
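The constraint can be enforced mechanically rather than trusted. Extract every figure from the draft and reject the narrative if any figure is missing from the page. A sketch; llm.complete and UngroundedClaimError are stand-ins:

import re

def grounded_narrative(llm, highlights, numbers_on_page):
    # numbers_on_page is the set of figures rendered in the visuals,
    # formatted exactly as they appear (e.g. "4.2", "12,500").
    draft = llm.complete(
        system="Write a three sentence executive narrative. Cite only "
               "figures from the data provided.",
        user=str(highlights))
    for figure in re.findall(r"\d[\d,.]*%?", draft):
        if figure not in numbers_on_page:
            raise UngroundedClaimError(figure)
    return draft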
Component 6, The Publisher
The publisher takes the layout and creates an actual Power BI report. There are two routes.
The first route is to use the Power BI REST API to clone a template report and rewrite the bookmarks, page text, and visual filters via the API. This is the fastest path but limited to what the API exposes.
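A sketch of the first route, using the Reports Clone endpoint; token acquisition and error recovery are omitted:

import requests

def clone_template(token, workspace_id, template_report_id, new_name):
    # Clones the template into the same workspace. Rebinding the clone
    # to another dataset is possible via targetModelId in the body.
    response = requests.post(
        f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
        f"/reports/{template_report_id}/Clone",
        headers={"Authorization": f"Bearer {token}"},
        json={"name": new_name},
    )
    response.raise_for_status()
    return response.json()["id"]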
The second route is to generate the report definition in PBIR format, the enhanced report format that stores the definition as JSON files. PBIR definitions can be authored programmatically and then imported into the workspace via the deployment API. This route gives much more control over visuals and layout.
import requests

def publish(layout, workspace_id, dataset_id):
    # Assumes `token` is an Azure AD access token with the Power BI
    # API scope, acquired elsewhere in the module.
    report = render_pbir(layout, dataset_id)
    response = requests.post(
        f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/imports",
        headers={"Authorization": f"Bearer {token}"},
        files={"file": ("report.pbir", report)},
    )
    response.raise_for_status()
    return response.json()["id"]
After publishing, the agent can also export the report to PDF using the export to file API and email it to the recipients listed in the brief.
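The export job is asynchronous, so the agent polls until the file is ready. A sketch against the export to file API:

import time
import requests

def export_pdf(token, workspace_id, report_id):
    base = (f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}"
            f"/reports/{report_id}")
    headers = {"Authorization": f"Bearer {token}"}
    job = requests.post(f"{base}/ExportTo", headers=headers,
                        json={"format": "PDF"}).json()
    # Poll the export job until it settles, then download the file.
    while True:
        status = requests.get(f"{base}/exports/{job['id']}",
                              headers=headers).json()
        if status["status"] in ("Succeeded", "Failed"):
            break
        time.sleep(2)
    if status["status"] == "Failed":
        raise RuntimeError("Export to PDF failed")
    return requests.get(f"{base}/exports/{job['id']}/file",
                        headers=headers).content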
Putting It Together
A complete agent loop, expressed in pseudocode, looks like this.
def run_agent(brief):
    plan = planner.create_plan(brief)
    model = catalog.select_model(plan)
    schema = model.fetch_schema()
    queries = dax_generator.generate(plan, schema)
    validated = []
    for q in queries:
        for attempt in range(3):
            # model.stats holds the historical plausibility bounds.
            result = validator.validate(q, model.dataset_id, model.stats)
            if result.ok:
                validated.append(result.data)
                break
            # Feed the failure reason back into the generator and retry.
            q = dax_generator.regenerate(q, result.failure_reason)
        else:
            # Three failed attempts: hand the brief to a human.
            return EscalateToHuman(brief, q, result.failure_reason)
    layout = layout_designer.design(plan, validated)
    report_id = publisher.publish(layout, model.workspace_id, model.dataset_id)
    return DeliverableReady(report_id, plan)
A prototype that follows this shape can be running in a few hundred lines. A production grade version adds auditing, evaluation harnesses, and access controls, but the core loop stays the same.
What Goes Wrong, and How to Catch It
Three failure modes show up most often.
The first is the language model writing plausible looking DAX that references measures or columns that do not exist in the model. The validator catches this when the query fails to execute. The fix is to give the DAX generator a tighter list of available primitives in the prompt, and to use function calling so that the model is forced to pick from a known set.
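A sketch of that constraint as an OpenAI style function schema. The known_measures and known_columns lists come from the catalogue entry, so the model cannot name anything outside them:

known_measures = ["Revenue", "Margin", "Headcount"]  # from the catalogue
known_columns = ["Country", "Region"]

dax_parts_tool = {
    "type": "function",
    "function": {
        "name": "emit_query_parts",
        "description": "Pick the measure and breakdown for a DAX query.",
        "parameters": {
            "type": "object",
            "properties": {
                "measure": {"type": "string", "enum": known_measures},
                "breakdown": {"type": "string", "enum": known_columns},
                "comparison": {"type": "string",
                               "enum": ["YoY", "MoM", "none"]},
            },
            "required": ["measure", "breakdown"],
        },
    },
}
# The agent assembles the final DAX from these validated parts.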
The second is the model misinterpreting the brief. A request for revenue by country might be answered with revenue by region because the planner conflated the two. This is caught by an explicit confirmation step, where the planner shows the structured plan back to the user before any work begins. The user can tweak the plan before the agent commits to executing it.
The third is layout failure. The agent picks a visual that does not fit the data shape, and the rendered report looks awkward. This is caught by a final review step that checks the rendered PNG of the report against a quality rubric. Reports below a quality threshold are escalated to a human reviewer rather than published automatically.
A Note on Trust
The agent will not be trusted overnight. The right rollout pattern is to run it in shadow mode for several weeks. The agent produces a report. A human reviews it side by side with what they would have built. Discrepancies are logged and the agent is improved. Once the discrepancy rate falls below an acceptable threshold, the agent moves to assisted mode, where it produces a draft and the human approves. After more time, it can move to autonomous mode for low risk reports.
This pattern matches how trust is built between a junior analyst and a senior reviewer. The agent earns autonomy the same way a junior would, by accumulating evidence of correctness over time.
A Final Reflection
The reports that benefit most from this kind of automation are the ones nobody enjoys producing. Routine summaries, mechanical updates, periodic decks. Freeing the team from this work does not eliminate analyst jobs. It pushes analysts toward the questions that are genuinely interesting, which is where the real value of analytics has always been.
If your team produces more than ten templated reports a week, building this agent is among the highest leverage projects you can invest in. The platform is ready. The pieces are well documented. The only thing missing is the team that decides to put them together.
References and Further Reading
| # | Source | Type | Link |
|---|---|---|---|
| 1 | Microsoft Learn, Power BI REST API reference | Free official documentation | https://learn.microsoft.com/en-us/rest/api/power-bi/ |
| 2 | Microsoft Learn, Power BI Embedded analytics | Free official documentation | https://learn.microsoft.com/en-us/power-bi/developer/embedded/ |
| 3 | Anthropic, Building Effective Agents | Free engineering article | https://www.anthropic.com/engineering/building-effective-agents |
| 4 | LangChain documentation | Open source framework documentation | https://python.langchain.com/docs/ |
| 5 | LangGraph documentation | Open source framework documentation | https://langchain-ai.github.io/langgraph/ |
| 6 | pbi-tools | Open source CLI for Power BI | https://github.com/pbi-tools/pbi-tools |
| 7 | Microsoft Learn, XMLA endpoints | Free official documentation | https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-connect-tools |
| 8 | Microsoft Semantic Kernel | Open source GitHub project | https://github.com/microsoft/semantic-kernel |