Power BI Transform Model
With Power BI Desktop you can connect to the world of data, create compelling and
foundational reports, and share your efforts with others – who can then build on your
work, and expand their business intelligence efforts.
Report view – You can use queries that you create to build compelling
visualizations, arranged as you want them to appear, and with multiple pages, that
you can share with others.
Data view – See the data in your report in data model format, where you can add
measures, create new columns, and manage relationships.
Model view – Get a graphical representation of the relationships that are
established in your data model, and manage or modify them as needed.
Access these views by selecting one of the three icons along the left side of Power BI
Desktop. In the following image, Report view is selected, indicated by the yellow band
beside the icon.
Power BI Desktop also comes with Power Query Editor. Use Power Query Editor to
connect to one or many data sources, shape and transform the data to meet your needs,
then load that model into Power BI Desktop.
This article provides an overview of working with data in Power Query Editor, but
there's more to learn. At the end of this article, you'll find links to detailed guidance
about supported data types. You'll also find guidance about connecting to data, shaping
data, creating relationships, and getting started.
But first, let's get acquainted with Power Query Editor.
With no data connections, Power Query Editor appears as a blank pane, ready for data.
After a query is loaded, Power Query Editor view becomes more interesting. If you
connect to the following Web data source using the New Source button in the top left,
Power Query Editor loads information about the data, which you can then begin to
shape:
https://www.bankrate.com/retirement/best-and-worst-states-for-retirement/
Here's how Power Query Editor appears after a data connection is established:
1. In the ribbon, many buttons are now active to interact with the data in the query.
2. In the left pane, queries are listed and available for selection, viewing, and shaping.
3. In the center pane, data from the selected query is displayed and available for
shaping.
4. The Query Settings pane appears, listing the query's properties and applied steps.
Each of these four areas will be explained later: the ribbon, the Queries pane, the Data
view, and the Query Settings pane.
To connect to data and begin the query building process, select New Source. A menu
appears, providing the most common data sources.
For more information about available data sources, see Data Sources. For information
about connecting to data, including examples and steps, see Connect to Data.
The Transform tab provides access to common data transformation tasks, such as adding
or removing columns, changing data types, splitting columns, and other data-driven tasks.
For more information about transforming data, including examples, see Tutorial: Shape
and combine data in Power BI Desktop.
The Add Column tab provides more tasks associated with adding a column, formatting
column data, and adding custom columns. The following image shows the Add Column
tab.
The View tab on the ribbon is used to toggle whether certain panes or windows are
displayed. It's also used to display the Advanced Editor. The following image shows the
View tab.
It's useful to know that many of the tasks available from the ribbon are also available by
right-clicking a column, or other data, in the center pane.
The following image shows the Web data connection established earlier. The Overall
score column is selected, and its header is right-clicked to show the available menu
items. Notice that many of these items in the right-click menu are the same as buttons
in the ribbon tabs.
When you select a right-click menu item (or a ribbon button), the query applies the step
to the data. It also saves the step as part of the query itself. The steps are recorded in the
Query Settings pane in sequential order, as described in the next section.
It's important to know that the underlying source data isn't changed. Rather, Power Query
Editor adjusts and shapes its view of the data. Any interaction with the underlying data
occurs based on Power Query Editor's shaped and modified view of that data.
In the Query Settings pane, you can rename steps, delete steps, or reorder the steps as
you see fit. To do so, right-click the step in the Applied Steps section, and choose from
the menu that appears. All query steps are carried out in the order they appear in the
Applied Steps pane.
Advanced Editor
The Advanced Editor lets you see the code that Power Query Editor is creating with
each step. It also lets you create your own code in the Power Query M formula
language. To launch the advanced editor, select View from the ribbon, then select
Advanced Editor. A window appears, showing the code generated for the selected
query.
You can directly edit the code in the Advanced Editor window. To close the window,
select the Done or Cancel button.
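The M code behind the Web query described earlier would look roughly like the following.
This is a hedged sketch: the table index ({0}) and the Overall score column reference
depend on how Power Query parses the page, so treat them as assumptions.
Power Query M
let
    // Connect to the web page and let Power Query parse the tables it contains.
    Source = Web.Page(Web.Contents("https://www.bankrate.com/retirement/best-and-worst-states-for-retirement/")),
    // Pick the first parsed table (assumed to be the state rankings table).
    Rankings = Source{0}[Data],
    // An applied step: set the Overall score column to a numeric type.
    ChangedType = Table.TransformColumnTypes(Rankings, {{"Overall score", type number}})
in
    ChangedType
Each transformation you apply in the editor appears as another step in this let expression,
in the same order as the Applied Steps list.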
When you're ready, Power BI Desktop can save your work in the form of a .pbix file.
To save your work, select File > Save (or File > Save As), as shown in the following
image.
Next steps
There are all sorts of things you can do with Power BI Desktop. For more information on
its capabilities, check out the following resources:
By using measures, you can create some of the most powerful data analysis solutions in
Power BI Desktop. Measures help you by performing calculations on your data as you
interact with your reports. This tutorial will guide you through understanding measures
and creating your own basic measures in Power BI Desktop.
Prerequisites
This tutorial is intended for Power BI users already familiar with using Power BI
Desktop to create more advanced models. You should already be familiar with
using Get Data and Power Query Editor to import data, work with multiple related
tables, and add fields to the report canvas. If you’re new to Power BI Desktop, be
sure to check out Get Started with Power BI Desktop.
This tutorial uses the Contoso Sales Sample for Power BI Desktop file, which
includes online sales data from the fictitious company, Contoso. Because this data
is imported from a database, you can't connect to the datasource or view it in
Power Query Editor. Download and extract the file on your computer.
Automatic measures
Power BI Desktop often creates measures for you automatically. To see how Power BI
Desktop creates a measure, follow these steps:
1. In Power BI Desktop, select File > Open, browse to the Contoso Sales Sample for
Power BI Desktop.pbix file, and then choose Open.
2. In the Fields pane, expand the Sales table. Then, either select the check box next to
the SalesAmount field or drag SalesAmount onto the report canvas.
A new column chart visualization appears, showing the sum total of all values in
the SalesAmount column of the Sales table.
Any field (column) in the Fields pane with a sigma icon is numeric, and its values
can be aggregated. Rather than display a table with many values (2,000,000 rows
for SalesAmount), Power BI Desktop automatically creates and calculates a
measure to aggregate the data if it detects a numeric data type. Sum is the default
aggregation for a numeric data type, but you can easily apply different
aggregations like average or count. Understanding aggregations is fundamental to
understanding measures, because every measure performs some type of
aggregation.
2. In the Values area of the Visualizations pane, select the down arrow to the right of
SalesAmount.
Values calculated from measures change in response to your interactions with your
report. For example, if you drag the RegionCountryName field from the Geography
table onto your existing SalesAmount chart, it changes to show the average sales
amounts for each country/region.
When the result of a measure changes because of an interaction with your report,
you've affected your measure’s context. Every time you interact with your report
visualizations, you're changing the context in which a measure calculates and displays its
results.
DAX formulas use many of the same functions, operators, and syntax as Excel formulas.
However, DAX functions are designed to work with relational data and perform more
dynamic calculations as you interact with your reports. There are over 200 DAX functions
that do everything from simple aggregations like sum and average to more complex
statistical and filtering functions. There are many resources to help you learn more
about DAX. After you've finished this tutorial, see DAX basics in Power BI Desktop.
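To make this concrete, here's a hedged sketch of two explicit DAX measures based on the
sample model. The first mirrors the automatic Sum aggregation described earlier; the
second uses CALCULATE, a filtering function with no direct Excel equivalent. The measure
names and the Calendar[Year] value are illustrative assumptions.
DAX
// Mirrors the automatic Sum aggregation on the SalesAmount column.
Total SalesAmount = SUM ( Sales[SalesAmount] )

// A filtering function: sums sales only for rows whose related
// Calendar year is 2009 (the year value is illustrative).
Sales 2009 =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    'Calendar'[Year] = 2009
)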
When you create your own measure, it's called a model measure, and it's added to the
Fields list for the table you select. Some advantages of model measures are that you can
name them whatever you want, making them more identifiable. You can use them as
arguments in other DAX expressions, and you can make them perform complex
calculations quickly.
Quick measures
Many common calculations are available as quick measures, which write the DAX
formulas for you based on your inputs in a window. These quick, powerful calculations
are also great for learning DAX or seeding your own customized measures.
From a table in the Fields pane, right-click or select More options (...), and then
choose New quick measure from the list.
Under Calculations in the Home tab of the Power BI Desktop ribbon, select New
Quick Measure.
For more information about creating and using quick measures, see Use quick
measures.
Create a measure
Suppose you want to analyze your net sales by subtracting discounts and returns from
total sales amounts. For the context that exists in your visualization, you need a measure
that subtracts the sum of DiscountAmount and ReturnAmount from the sum of
SalesAmount. There's no field for Net Sales in the Fields list, but you have the building
blocks to create your own measure to calculate net sales.
1. In the Fields pane, right-click the Sales table, or hover over the table and select
More options (...).
This action saves your new measure in the Sales table, where it's easy to find.
You can also create a new measure by selecting New Measure in the Calculations
group on the Home tab of the Power BI Desktop ribbon.
Tip
When you create a measure from the ribbon, you can create it in any of your
tables, but it's easier to find if you create it where you plan to use it. In this
case, select the Sales table first to make it active, and then choose New
measure.
The formula bar appears along the top of the report canvas, where you can
rename your measure and enter a DAX formula.
3. By default, each new measure is named Measure. If you don’t rename it, new
measures are named Measure 2, Measure 3, and so on. Because we want this
measure to be more identifiable, highlight Measure in the formula bar, and then
change it to Net Sales.
4. Begin entering your formula. After the equals sign, start to type Sum. As you type,
a drop-down suggestion list appears, showing all the DAX functions, beginning
with the letters you type. Scroll down, if necessary, to select SUM from the list, and
then press Enter.
The column name preceded by the table name is called the fully qualified name of
the column. Fully qualified column names make your formulas easier to read.
6. Select Sales[SalesAmount] from the list, and then enter a closing parenthesis.
a. After the closing parenthesis for the first expression, type a space, a minus
operator (-), and then another space.
b. Enter another SUM function, and start typing DiscountAmount until you can
choose the Sales[DiscountAmount] column as the argument. Add a closing
parenthesis.
The validated Net Sales measure is now ready to use in the Sales table in the
Fields pane.
9. If you run out of room for entering a formula or want it on separate lines, select
the down arrow on the right side of the formula bar to provide more space.
The down arrow turns into an up arrow and a large box appears.
10. Separate parts of your formula by pressing Alt + Enter for separate lines, or
pressing Tab to add tab spacing.
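Assembled from the preceding steps, and including ReturnAmount as described at the start
of this section (that step isn't shown above, so treat the exact column name as an
assumption), the finished measure looks roughly like this:
DAX
// Net sales: total sales minus discounts and returns.
Net Sales =
SUM ( Sales[SalesAmount] )
    - SUM ( Sales[DiscountAmount] )
    - SUM ( Sales[ReturnAmount] )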
1. Select the Net Sales measure from the Sales table, or drag it onto the report
canvas.
2. Select the RegionCountryName field from the Geography table, or drag it onto
the Net Sales chart.
3. To see the difference between net sales and total sales by country/region, select
the SalesAmount field or drag it onto the chart.
The chart now uses two measures: SalesAmount, which Power BI summed
automatically, and the Net Sales measure, which you manually created. Each
measure was calculated in the context of another field, RegionCountryName.
1. Select a blank area next to the chart. In the Visualizations pane, select the Table
visualization.
2. Drag the Year field from the Calendar table onto the new blank table visualization.
Because Year is a numeric field, Power BI Desktop sums up its values. This
summation doesn’t work well as an aggregation; we'll address that in the next step.
3. In the Values box in the Visualizations pane, select the down arrow next to Year,
and then choose Don't summarize from the list. The table now lists individual
years.
4. Select the Slicer icon in the Visualizations pane to convert the table to a slicer. If
the visualization displays a slider instead of a list, choose List from the down arrow
in the slider.
5. Select any value in the Year slicer to filter the Net Sales and Sales Amount by
RegionCountryName chart accordingly. The Net Sales and SalesAmount
measures recalculate and display results in the context of the selected Year field.
1. In the Fields pane, create a new measure named Net Sales per Unit in the Sales
table.
2. In the formula bar, begin typing Net Sales. The suggestion list shows what you can
add. Select [Net Sales].
3. You can also reference measures by just typing an opening bracket ([). The
suggestion list shows only measures to add to your formula.
4. Enter a space, a divide operator (/), another space, a SUM function, and then type
Quantity. The suggestion list shows all the columns with Quantity in the name.
Select Sales[SalesQuantity], type the closing parenthesis, and press Enter or
choose Commit (checkmark icon) to validate your formula. (The completed formula
appears after these steps.)
5. Select the Net Sales per Unit measure from the Sales table, or drag it onto a blank
area in the report canvas.
The chart shows the net sales amount per unit over all products sold. This chart
isn't informative; we'll address it in the next step.
6. For a different look, change the chart visualization type to Treemap.
7. Select the Product Category field, or drag it onto the treemap or the Group field
of the Visualizations pane. Now you have some good info!
8. Try removing the ProductCategory field, and dragging the ProductName field
onto the chart instead.
Ok, now we're just playing, but you have to admit that's cool! Experiment with
other ways to filter and format the visualization.
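For reference, the Net Sales per Unit formula assembled in steps 2 through 4 looks like
this:
DAX
// Net sales divided by the total quantity sold, in the current filter context.
Net Sales per Unit = [Net Sales] / SUM ( Sales[SalesQuantity] )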
If you want to take a deeper dive into DAX formulas and create some more
advanced measures, see Learn DAX basics in Power BI Desktop. This article focuses
on fundamental concepts in DAX, such as syntax, functions, and a more thorough
understanding of context.
Be sure to add the Data Analysis Expressions (DAX) Reference to your favorites.
This reference is where you'll find detailed info on DAX syntax, operators, and over
200 DAX functions.
Tutorial: Create calculated columns in
Power BI Desktop
Sometimes the data you’re analyzing doesn’t contain a particular field that you need to
get your desired results. Calculated columns are useful for this situation. Calculated
columns use Data Analysis Expressions (DAX) formulas to define a column’s values. This
tool is useful for anything from putting together text values from a couple of different
columns to calculating a numeric value from other values. For example, let’s say your
data has City and State fields, but you want a single Location field that has both, like
"Miami, FL".
Calculated columns are similar to measures in that both are based on DAX formulas, but
they differ in how they're used. You often use measures in a visualization's Values area,
to calculate results based on other fields. You use calculated columns as new Fields in
the rows, axes, legends, and group areas of visualizations.
This tutorial will guide you through understanding and creating some calculated
columns and using them in report visualizations in Power BI Desktop.
Prerequisites
This tutorial is intended for Power BI users already familiar with using Power BI
Desktop to create more advanced models. You should already know how to use
Get Data and the Power Query Editor to import data, work with multiple related
tables, and add fields to the Report canvas. If you’re new to Power BI Desktop, be
sure to check out Getting Started with Power BI Desktop.
The tutorial uses the Contoso Sales Sample for Power BI Desktop , the same
sample used for the Create your own measures in Power BI Desktop tutorial. This
sales data from the fictitious company Contoso, Inc. was imported from a
database. You won’t be able to connect to the data source or view it in the Power
Query Editor. Download and extract the file on your own computer, and then open
it in Power BI Desktop.
2. By default, a new calculated column is named Column. If you don’t rename it, new
columns will be named Column 2, Column 3, and so on. You want your column to
be more identifiable, so while the Column name is already highlighted in the
formula bar, rename it by typing ProductFullCategory, and then type an equals (=)
sign.
3. You want the values in your new column to start with the name in the
ProductCategory field. Because this column is in a different but related table, you
can use the RELATED function to help you get it.
After the equals sign, type r. A dropdown suggestion list shows all of the DAX
functions beginning with the letter R. Selecting each function shows a description
of its effect. As you type, the suggestion list scales closer to the function you need.
Select RELATED, and then press Enter.
An opening parenthesis appears, along with another suggestion list of the related
columns you can pass to the RELATED function, with descriptions and details of
expected parameters.
4. You want the ProductCategory column from the ProductCategory table. Select
ProductCategory[ProductCategory], press Enter, and then type a closing
parenthesis.
Tip
If you need more room, select the down chevron on the right side of the
formula bar to expand the formula editor. In the editor, press Alt + Enter to
move down a line, and Tab to move things over.
6. Enter an opening bracket ([), and then select the [ProductSubcategory] column to
finish the formula.
You didn’t need to use another RELATED function to call the ProductSubcategory
table in the second expression, because you're creating the calculated column in
this table. You can enter [ProductSubcategory] with the table name prefix (fully
qualified) or without (non-qualified).
7. Complete the formula by pressing Enter or selecting the checkmark in the formula
bar. The formula validates, and the ProductFullCategory column name appears in
the ProductSubcategory table in the Fields pane.
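Assembled from the preceding steps, the completed column formula looks roughly like this.
The " - " separator comes from a concatenation step that isn't shown in full above, so
treat it as an assumption:
DAX
// Combines the related category name with this table's subcategory name.
ProductFullCategory = RELATED ( ProductCategory[ProductCategory] ) & " - " & ProductSubcategory[ProductSubcategory]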
2. Select or drag the SalesAmount field from the Sales table into the table to show
the SalesAmount for each ProductFullCategory.
Create a calculated column that uses an IF
function
The Contoso Sales Sample contains sales data for both active and inactive stores. You
want to ensure that active store sales are clearly separated from inactive store sales in
your report by creating an Active StoreName field. In the new Active StoreName
calculated column, each active store will appear with the store's full name, while the
sales for inactive stores will be grouped together in one line item called Inactive.
Fortunately, the Stores table has a column named Status, with values of "On" for active
stores and "Off" for inactive stores, which we can use to create values for our new Active
StoreName column. Your DAX formula will use the logical IF function to test each store's
Status and return a particular value depending on the result. If a store's Status is "On",
the formula will return the store's name. If it’s "Off", the formula will assign an Active
StoreName of "Inactive".
1. Create a new calculated column in the Stores table and name it Active StoreName
in the formula bar.
2. After the = sign, begin typing IF. The suggestion list will show what you can add.
Select IF.
3. The first argument for IF is a logical test of whether a store's Status is "On". Type
an opening bracket [, which lists columns from the Stores table, and select [Status].
4. Right after [Status], type ="On", and then type a comma (,) to end the argument.
The tooltip suggests that you now need to add a value to return when the result is
TRUE.
5. If the store's status is "On", you want to show the store’s name. Type an opening
bracket ([) and select the [StoreName] column, and then type another comma. The
tooltip now indicates that you need to add a value to return when the result is
FALSE.
6. You want the value to be "Inactive", so type "Inactive", and then complete the
formula by pressing Enter or selecting the checkmark in the formula bar. The
formula validates, and the new column's name appears in the Stores table in the
Fields pane. (The completed formula appears after these steps.)
7. You can use your new Active StoreName column in visualizations just like any
other field. To show SalesAmounts by Active StoreName, select the Active
StoreName field or drag it onto the Report canvas, and then choose the
SalesAmount field or drag it into the table. In this table, active stores appear
individually by name, but inactive stores are grouped together at the end as
Inactive.
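As referenced in step 6, the completed formula assembled from the preceding steps is:
DAX
// Shows the store's name for active stores; groups inactive stores as "Inactive".
Active StoreName = IF ( [Status] = "On", [StoreName], "Inactive" )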
Next steps
If you want to take a deeper dive into DAX formulas and create calculated columns with
more advanced formulas, see DAX Basics in Power BI Desktop. This article focuses on
fundamental concepts in DAX, such as syntax, functions, and a more thorough
understanding of context.
Be sure to add the Data Analysis Expressions (DAX) Reference to your favorites. This
reference is where you'll find detailed info on DAX syntax, operators, and over 200 DAX
functions.
Edit data models in the Power BI service
(preview)
Power BI allows users to modify existing data models in the Power BI service by using
actions such as editing relationships, creating DAX measures, and managing row-level
security (RLS). In this experience, users can work and collaborate simultaneously on the
same data model.
1. In the Power BI service, select Settings for the workspace where you want to
enable the preview feature.
2. Select Advanced > Data model settings > Users can edit data models in the
Power BI service (preview).
3. Select Save to see the new experience for datasets in your workspace.
Note
Enabling the edit data models in the Power BI service preview doesn't apply to
editing a dataset through an API or an XMLA endpoint.
From the workspace content list, select More options (...) for the dataset and select
Open data model.
From the datahub content list, select More options (...) for the dataset and select
Open data model.
From edit mode for a report connected to the dataset, select Open data model to
open the corresponding data model in another tab.
Model data
When you open your data model you can see all the tables, columns, and relationships
in your model. You can now edit your data model, and any changes are automatically
saved.
Create measures
To create a measure (a formula that performs a calculation, such as a sum or an average,
over your model's data), select the table in the Data pane and then select the New
measure button from the ribbon, as shown in the following image.
Enter the measure into the formula bar and specify the table and the column to which it
applies. Similar to Power BI Desktop, the DAX editing experience in the Power BI service
presents a rich editor complete with autocomplete for formulas (intellisense).
You can expand the table to find the measure in the table.
Enter the calculated column into the formula bar and specify the table to which it
applies. Similar to Power BI Desktop, the DAX editing experience in the Power BI service
presents a rich editor complete with autocomplete for formulas (intellisense).
You can expand the table to find the calculated column in the table.
Enter the calculated table into the formula bar. Similar to Power BI Desktop, the DAX
editing experience in the Power BI service presents a rich editor complete with
autocomplete for formulas (intellisense). You can now see the newly created calculated
table in your model.
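As a hedged illustration, a calculated table definition entered in the formula bar might
look like the following simple date table; the table name and date range are assumptions:
DAX
// A calculated table with one row per date in the given range.
Dates = CALENDAR ( DATE ( 2020, 1, 1 ), DATE ( 2025, 12, 31 ) )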
Create relationships
To create a relationship, drag the column from one table to the column of the other
table to initiate the relationship. In the window that appears, configure the relationship
properties.
Select the Confirm button when your relationship is complete to save the relationship
information.
Set properties
You can change the properties for a given object using the Properties pane. You can set
common properties across multiple objects at once by holding down the Ctrl key and
selecting multiple objects either in the relationship diagram or Data pane. When
multiple objects are highlighted, changes applied in the Properties pane apply to all
selected objects.
For example, you could change the data type for multiple columns by holding down the
Ctrl key, selecting columns, then changing the data type setting in the Properties pane.
2. From the Manage roles window, select New to create a new role.
3. Under Roles, provide a name for the role and press Enter.
4. Under Select tables, select the table to which you want to apply a row-level
security filter.
5. Under Filter data, use the default editor to define your roles. The expressions
created return a true or false value.
Note
Not all row-level security filters supported in Power BI can be defined using
the default editor. Limitations include expressions that today can only be
defined using DAX, including dynamic rules such as username or
userprincipalname. To define roles using these filters, switch to use the DAX
editor.
6. Optionally select Switch to DAX editor to use the DAX editor to define your role.
You can switch back to the default editor by selecting Switch to default editor. All
changes made in either editor interface persist when switching interfaces when
possible.
If you define a role in the DAX editor that can't be represented in the default editor
and then attempt to switch back, you're prompted with a warning that switching editors
might result in some information being lost. To keep this information, select Cancel and
continue editing this role only in the DAX editor.
8. Once the role is saved, select Assign to add users to the role. Once assigned, select
Save to save the role assignments and close the RLS settings modal.
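As a hedged example, a row filter entered in the DAX editor is a boolean expression like
one of the following. The table and column names are hypothetical; USERPRINCIPALNAME is
the kind of dynamic rule that requires the DAX editor rather than the default editor:
DAX
// Static rule: the role sees only rows for the East region (names are hypothetical).
'Stores'[Region] = "East"

// Dynamic rule: each user sees only rows that match their own sign-in name.
'SalesRep'[Email] = USERPRINCIPALNAME ()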
Create layouts
You can create layouts of your model that contain only a subset of the tables in your
model. This reorganization can help provide a clearer view into the tables you want to
work with, and make working with complex datasets easier. To create a new layout with
only a subset of the tables, select the + button next to the All tables tab along the
bottom of the window.
You can then drag a table from the Data pane onto the new layout. Right-click the table,
and then select Add related tables from the menu that appears. Doing so adds any
table that's related to the original table to the layout.
Create reports
You can create a new report from the data model editing experience in the service by
selecting the New report button in the ribbon. This opens a new browser tab with the
report editing canvas for a new report that's built on the dataset.
When you save your new report, you're prompted to choose a workspace, provided you
have write permissions for that workspace. If you don't have write permissions, or if
you're a free user and the dataset resides in a Premium-capacity workspace, the new
report is saved in your My workspace.
Autosave
As you make changes to your data model, your changes are automatically saved.
Changes are permanent, with no option to undo them.
Permissions
A user must have write and build dataset permissions in order to open and edit the
corresponding data model in the Power BI service.
For more information on accessing your audit logs, see the Access your audit logs
article.
The following data model operations appear in the audit logs:
Web Modeling read: A data model read operation in the dataset web modeling user experience (Datasets, Interactive)
Web Modeling write: A data model write operation in the dataset web modeling user experience (Datasets, Interactive)
Unsupported datasets
The following scenarios don't support opening the data model for a dataset in the
service:
To see which limitation is preventing you from opening your data model, hover over the
Open data model button in the dataset details page. This displays a tooltip indicating
which limitation is causing the Open data model button to be disabled.
Limitations
There are still functional gaps between Model view in Power BI Desktop and Model view in
the Power BI service. Functionality that isn't yet supported in the service includes:
Next steps
This article provided information about the preview for editing data models in the Power
BI service. For more information on data modeling in Power BI, see the following
resources:
With Modeling view in Power BI Desktop, you can view and work with complex
datasets that contain many tables.
When you do, tables that are related to the original table are displayed in the new
diagram. The following image shows how related tables are displayed after selecting the
Add related tables menu option.
Note
You can also find the Add related tables option in the context menu on the
background of the model view. When selected, any table that has any relationship
to any table already included in the layout will be added to the layout.
For example, you could change the storage mode for multiple tables in your diagram
view by holding down the Ctrl key, selecting tables, then changing the storage mode
setting in the Properties pane.
Next steps
The following articles describe more about data models, and also describe DirectQuery
in detail.
Automatic aggregations
Use composite models in Power BI Desktop
Manage storage mode in Power BI Desktop
Many-to-many relationships in Power BI Desktop
DirectQuery articles:
DirectQuery in Power BI
Power BI data sources
Model relationships in Power BI Desktop
This article targets import data modelers working with Power BI Desktop. It's an
important model design topic that's essential to delivering intuitive, accurate, and
optimal models.
For a deeper discussion on optimal model design, including table roles and
relationships, see Understand star schema and the importance for Power BI.
Relationship purpose
A model relationship propagates filters applied on the column of one model table to a
different model table. Filters will propagate so long as there's a relationship path to
follow, which can involve propagation to multiple tables.
Relationship paths are deterministic, meaning that filters are always propagated in the
same way and without random variation. Relationships can, however, be disabled, or
have filter context modified by model calculations that use particular DAX functions. For
more information, see the Relevant DAX functions topic later in this article.
Important
Model relationships don't enforce data integrity. For more information, see the
Relationship evaluation topic later in this article, which explains how model
relationships behave when there are data integrity issues with your data.
A query, possibly generated by a Power BI card visual, requests the total sales quantity
for sales orders made for a single category, Cat-A, and for a single year, CY2018. It's why
you can see filters applied on the Category and Year tables. The filter on the Category
table propagates to the Product table to isolate two products that are assigned to the
category Cat-A. Then the Product table filters propagate to the Sales table to isolate just
two sales rows for these products. These two sales rows represent the sales of products
assigned to category Cat-A. Their combined quantity is 14 units. At the same time, the
Year table filter propagates to further filter the Sales table, resulting in just the one sales
row that is for products assigned to category Cat-A and that was ordered in year
CY2018. The quantity value returned by the query is 11 units. Note that when multiple
filters are applied to a table (like the Sales table in this example), it's always an AND
operation, requiring that all conditions must be true.
The following image is the model diagram of the Adventure Works sales analysis data
model. It shows a star schema design comprising a single fact table named Sales. The
other four tables are dimension tables that support the analysis of sales measures by
date, state, region, and product. Notice the model relationships connecting all tables.
These relationships propagate filters (directly or indirectly) to the Sales table.
Disconnected tables
It's unusual that a model table isn't related to another model table. Such a table in a
valid model design is described as a disconnected table. A disconnected table isn't
intended to propagate filters to other model tables. Instead, it accepts "user input"
(perhaps with a slicer visual), allowing model calculations to use the input value in a
meaningful way. For example, consider a disconnected table that's loaded with a range
of currency exchange rate values. As long as a filter is applied to filter by a single rate
value, a measure expression can use that value to convert sales values.
The Power BI Desktop what-if parameter is a feature that creates a disconnected table.
For more information, see Create and use a What if parameter to visualize variables in
Power BI Desktop.
Relationship properties
A model relationship relates one column in a table to one column in a different table.
(There's one specialized case where this requirement isn't true, and it applies only to
multi-column relationships in DirectQuery models. For more information, see the
COMBINEVALUES DAX function article.)
Note
It's not possible to relate a column to a different column in the same table. This
concept is sometimes confused with the ability to define a relational database
foreign key constraint that's table self-referencing. You can use this relational
database concept to store parent-child relationships (for example, each employee
record is related to a "reports to" employee). However, you can't use model
relationships to generate a model hierarchy based on this type of relationship. To
create a parent-child hierarchy, see Parent and Child functions.
Cardinality
Each model relationship is defined by a cardinality type. There are four cardinality type
options, representing the data characteristics of the "from" and "to" related columns.
The "one" side means the column contains unique values; the "many" side means the
column can contain duplicate values.
Note
If a data refresh operation attempts to load duplicate values into a "one" side
column, the entire data refresh will fail.
The four options, together with their shorthand notations, are described in the following
bulleted list:
One-to-many (1:*)
Many-to-one (*:1)
One-to-one (1:1)
Many-to-many (*:*)
When you create a relationship in Power BI Desktop, the designer automatically detects
and sets the cardinality type. Power BI Desktop queries the model to know which
columns contain unique values. For import models, it uses internal storage statistics; for
DirectQuery models it sends profiling queries to the data source. Sometimes, however,
Power BI Desktop can get it wrong. It can get it wrong when tables are yet to be loaded
with data, or because columns that you expect to contain duplicate values currently
contain unique values. In either case, you can update the cardinality type as long as any
"one" side columns contain unique values (or the table is yet to be loaded with rows of
data).
One-to-one cardinality
A one-to-one relationship means both columns contain unique values. This cardinality
type isn't common, and it likely represents a suboptimal model design because of the
storage of redundant data.
For more information on using this cardinality type, see One-to-one relationship
guidance.
Many-to-many cardinality
A many-to-many relationship means both columns can contain duplicate values. This
cardinality type is infrequently used. It's typically useful when designing complex model
requirements. You can use it to relate many-to-many facts or to relate higher grain facts.
For example, when sales target facts are stored at product category level and the
product dimension table is stored at product level.
For guidance on using this cardinality type, see Many-to-many relationship guidance.
Note
The Many-to-many cardinality type isn't currently supported for models developed
for Power BI Report Server.
Tip
In Power BI Desktop model view, you can interpret a relationship's cardinality type
by looking at the indicators (1 or *) on either side of the relationship line. To
determine which columns are related, you'll need to select, or hover the cursor
over, the relationship line to highlight the columns.
Cross filter direction
Each relationship also has a cross filter direction. The options available depend on the cardinality type:
One-to-many (or Many-to-one): Single or Both
One-to-one: Both
Many-to-many: Single (in either direction) or Both
Single cross filter direction means "single direction", and Both means "both directions". A
relationship that filters in both directions is commonly described as bi-directional.
For one-to-many relationships, the cross filter direction is always from the "one" side,
and optionally from the "many" side (bi-directional). For one-to-one relationships, the
cross filter direction is always from both tables. Lastly, for many-to-many relationships,
cross filter direction can be from either one of the tables, or from both tables. Notice
that when the cardinality type includes a "one" side, that filters will always propagate
from that side.
When the cross filter direction is set to Both, another property becomes available. It can
apply bi-directional filtering when Power BI enforces row-level security (RLS) rules. For
more information about RLS, see Row-level security (RLS) with Power BI Desktop.
You can modify the relationship cross filter direction, including the disabling of filter
propagation, by using a model calculation. It's achieved by using the CROSSFILTER DAX
function.
We recommend using bi-directional filtering only as needed. For more information, see
Bi-directional relationship guidance.
Tip
In Power BI Desktop model view, you can interpret a relationship's cross filter
direction by noticing the arrowhead(s) along the relationship line. A single
arrowhead represents a single-direction filter in the direction of the arrowhead; a
double arrowhead represents a bi-directional relationship.
In specific circumstances, however, you can define one or more inactive relationships for
a role-playing dimension table. You can consider this design when:
Tip
In Power BI Desktop model view, you can interpret a relationship's active vs inactive
status. An active relationship is represented by a solid line; an inactive relationship
is represented as a dashed line.
When the Assume referential integrity property is enabled, native queries sent to the data
source join the two tables by using an INNER JOIN rather than an OUTER JOIN. Generally, enabling this property
improves query performance, though it does depend on the specifics of the data source.
Always enable this property when a database foreign key constraint exists between the
two tables. Even when a foreign key constraint doesn't exist, consider enabling the
property as long as you're certain data integrity exists.
Important
If data integrity should become compromised, the inner join will eliminate
unmatched rows between the tables. For example, consider a model Sales table
with a ProductID column value that didn't exist in the related Product table. Filter
propagation from the Product table to the Sales table will eliminate sales rows for
unknown products. This would result in an understatement of the sales results.
For more information, see Assume referential integrity settings in Power BI
Desktop.
RELATED: Retrieves the value from "one" side of a relationship. It's useful when
involving calculations from different tables that are evaluated in row context.
RELATEDTABLE: Retrieves a table of rows from the "many" side of a relationship.
USERELATIONSHIP: Allows a calculation to use an inactive relationship. (Technically,
this function modifies the weight of a specific inactive model relationship helping
to influence its use.) It's useful when your model includes a role-playing dimension
table, and you choose to create inactive relationships from this table. You can also
use this function to resolve ambiguity in filter paths.
CROSSFILTER: Modifies the relationship cross filter direction (to one or both), or it
disables filter propagation (none). It's useful when you need to change or ignore
model relationships during the evaluation of a specific calculation.
COMBINEVALUES: Joins two or more text strings into one text string. The purpose
of this function is to support multi-column relationships in DirectQuery models
when tables belong to the same source group.
TREATAS: Applies the result of a table expression as filters to columns from an
unrelated table. It's helpful in advanced scenarios when you want to create a virtual
relationship during the evaluation of a specific calculation.
Parent and Child functions: A family of related functions that you can use to
generate calculated columns to naturalize a parent-child hierarchy. You can then
use these columns to create a fixed-level hierarchy.
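As a hedged example of the CROSSFILTER function described above, the following measure
disables filter propagation across the Sales-to-Product relationship for one calculation.
The measure name is illustrative; the column names match those used in the weight example
later in this article:
DAX
// Ignores the Sales-Product relationship while evaluating this sum.
All Product Sales =
CALCULATE (
    SUM ( Sales[SalesAmount] ),
    CROSSFILTER ( Sales[ProductID], Product[ProductID], NONE )
)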
Relationship evaluation
Model relationships, from an evaluation perspective, are classified as either regular or
limited. It's not a configurable relationship property. It's in fact inferred from the
cardinality type and the data source of the two related tables. It's important to
understand the evaluation type because there may be performance implications or
consequences should data integrity be compromised. These implications and integrity
consequences are described in this topic.
A composite model, however, can comprise tables using different storage modes
(import, DirectQuery or dual), or multiple DirectQuery sources. Each source, including
the Vertipaq cache of imported data, is considered to be a source group. Model
relationships can then be classified as intra source group or inter/cross source group. An
intra source group relationship relates two tables within a source group, while an
inter/cross source group relationship relates tables across two source groups. Note that
relationships in import or DirectQuery models are always intra source group.
In this example, the composite model consists of two source groups: a Vertipaq source
group and a DirectQuery source group. The Vertipaq source group contains three tables,
and the DirectQuery source group contains two tables. One cross source group
relationship exists to relate a table in the Vertipaq source group to a table in the
DirectQuery source group.
Regular relationships
A model relationship is regular when the query engine can determine the "one" side of
the relationship. It has confirmation that the "one" side column contains unique values. All
one-to-many intra source group relationships are regular relationships.
In the following example, there are two regular relationships, both marked as R.
Relationships include the one-to-many relationship contained within the Vertipaq source
group, and the one-to-many relationship contained within the DirectQuery source.
For import models, where all data is stored in the Vertipaq cache, Power BI creates a
data structure for each regular relationship at data refresh time. The data structures
consist of indexed mappings of all column-to-column values, and their purpose is to
accelerate joining tables at query time.
At query time, regular relationships permit table expansion to happen. Table expansion
results in the creation of a virtual table by including the native columns of the base table
and then expanding into related tables. For import tables, table expansion is done in the
query engine; for DirectQuery tables it's done in the native query that's sent to the
source database (as long as the Assume referential integrity property isn't enabled).
The query engine then acts upon the expanded table, applying filters and grouping by
the values in the expanded table columns.
Note
Inactive relationships are expanded also, even when the relationship isn't used by a
calculation. Bi-directional relationships have no impact on table expansion.
For one-to-many relationships, table expansion takes place from the "many" to the
"one" sides by using LEFT OUTER JOIN semantics. When a matching value from the
"many" to the "one" side doesn't exist, a blank virtual row is added to the "one" side
table. This behavior applies only to regular relationships, not to limited relationships.
Table expansion also occurs for one-to-one intra source group relationships, but by
using FULL OUTER JOIN semantics. This join type ensures that blank virtual rows are
added on either side, when necessary.
Blank virtual rows are effectively unknown members. Unknown members represent
referential integrity violations where the "many" side value has no corresponding "one"
side value. Ideally these blanks shouldn't exist. They can be eliminated by cleansing or
repairing the source data.
In this example, the model consists of three tables: Category, Product, and Sales. The
Category table relates to the Product table with a One-to-many relationship, and the
Product table relates to the Sales table with a One-to-many relationship. The Category
table contains two rows, the Product table contains three rows, and the Sales table
contains five rows. There are matching values on both sides of all relationships, meaning
that there are no referential integrity violations. A query-time expanded table is
revealed. The table consists of the columns from all three tables. It's effectively a
denormalized perspective of the data contained in the three tables. A new row is added
to the Sales table, and it has a product identifier value (9) that has no matching value
in the Product table. It's a referential integrity violation. In the expanded table, the new
row has (Blank) values for the Category and Product table columns.
Limited relationships
A model relationship is limited when there's no guaranteed "one" side. A limited
relationship can happen for two reasons:
The relationship uses a many-to-many cardinality type (even if one or both
columns contain unique values).
The relationship is cross source group (which can only ever be the case for
composite models).
In the following example, there are two limited relationships, both marked as L. The two
relationships include the many-to-many relationship contained within the Vertipaq
source group, and the one-to-many cross source group relationship.
For import models, data structures are never created for limited relationships. In that
case, Power BI resolves table joins at query time.
Table expansion never occurs for limited relationships. Table joins are achieved by using
INNER JOIN semantics, and for this reason, blank virtual rows aren't added to
compensate for referential integrity violations.
The RELATED DAX function can't be used to retrieve the "one" side column values.
Enforcing RLS has topology restrictions.
Tip
In Power BI Desktop model view, you can interpret a relationship as being limited. A
limited relationship is represented with parenthesis-like marks ( ) after the
cardinality indicators.
Resolve relationship path ambiguity
Bi-directional relationships can introduce multiple, and therefore ambiguous, filter
propagation paths between model tables. When evaluating ambiguity, Power BI chooses
the filter propagation path according to its priority and weight.
Priority
Priority tiers define a sequence of rules that Power BI uses to resolve relationship path
ambiguity. The first rule match determines the path Power BI will follow. Each rule below
describes how filters flow from a source table to a target table.
When a relationship is included in all available paths, it's removed from consideration
from all paths.
Weight
Each relationship in a path has a weight. By default, each relationship weight is equal
unless the USERELATIONSHIP function is used. The path weight is the maximum of all
relationship weights along the path. Power BI uses the path weights to resolve
ambiguity between multiple paths in the same priority tier. It won't choose a path with a
lower priority but it will choose the path with the higher weight. The number of
relationships in the path doesn't affect the weight.
You can influence the weight of a relationship by using the USERELATIONSHIP function.
The weight is determined by the nesting level of the call to this function, where the
innermost call receives the highest weight.
Consider the following example. The Product Sales measure assigns a higher weight to
the relationship between Sales[ProductID] and Product[ProductID], followed by the
relationship between Inventory[ProductID] and Product[ProductID].
DAX
Product Sales =
CALCULATE(
    CALCULATE(
        SUM(Sales[SalesAmount]),
        USERELATIONSHIP(Sales[ProductID], Product[ProductID])
    ),
    USERELATIONSHIP(Inventory[ProductID], Product[ProductID])
)
Note
If Power BI detects multiple paths that have the same priority and the same weight,
it will return an ambiguous path error. In this case, you must resolve the ambiguity
by influencing the relationship weights by using the USERELATIONSHIP function,
or by removing or modifying model relationships.
Performance preference
The following list orders filter propagation performance, from fastest to slowest
performance:
Next steps
For more information about this article, check out the following resources:
With relationships with a many-to-many cardinality in Power BI Desktop, you can join
tables that use a cardinality of many-to-many. You can more easily and intuitively create
data models that contain two or more data sources. Relationships with a many-to-many
cardinality are part of the larger composite models capabilities in Power BI Desktop. For
more information about composite models, see Use composite models in Power BI
Desktop
For example, two tables might have had a column labeled CountryRegion. The values of
CountryRegion weren't unique in either table, though. To join such tables, you had to
create a workaround. One workaround might be to introduce extra tables with the
needed unique values. With relationships with a many-to-many cardinality, you can join
such tables directly, if you use a relationship with a cardinality of many-to-many.
Now, imagine that the Product table displays just two rows, as shown:
Also imagine that the ProductSales table has just four rows, including a row for product C.
Because of a referential integrity error, the product C row doesn't exist in the Product
table.
The ProductName and Price (from the Product table), along with the total Qty for each
product (from the ProductSales table), would be displayed as shown:
As you can see in the preceding image, a blank ProductName row is associated with
sales for product C. This blank row accounts for the following considerations:
Any rows in the ProductSales table for which no corresponding row exists in the
Product table. There's a referential integrity issue, as we see for product C in this
example.
Any rows in the ProductSales table for which the foreign key column is null.
For these reasons, the blank row in both cases accounts for sales where the
ProductName and Price are unknown.
Sometimes the tables are joined by two columns, yet neither column is unique. For
example, consider these two tables:
The Sales table displays sales data by State, and each row contains the sales
amount for the type of sale in that state. The states include CA, WA, and TX.
The CityData table displays data on cities, including the population and state (such
as CA, WA, and New York).
A column for State is now in both tables. It's reasonable to want to report on both total
sales by state and total population of each state. However, a problem exists: the State
column isn't unique in either table.
Create a third table that contains only the unique State IDs. The table could be any
or all of:
A calculated table (defined by using Data Analysis Expressions [DAX]).
A table based on a query that's defined in Power Query Editor, which could
display the unique IDs drawn from one of the tables.
The combined full set.
Then relate the two original tables to that new table by using common Many-1
relationships.
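For the calculated-table option, a hedged sketch of that bridging table might look like
this; the table name States is an assumption, and the expression produces the combined
full set of distinct State values from both tables:
DAX
// Distinct State values drawn from both tables (the combined full set).
States =
DISTINCT (
    UNION (
        DISTINCT ( Sales[State] ),
        DISTINCT ( CityData[State] )
    )
)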
You could leave the workaround table visible. Or you might hide the workaround table,
so it doesn't appear in the Fields list. If you hide the table, the Many-1 relationships
would commonly be set to filter in both directions, and you could use the State field
from either table. The latter cross-filtering would propagate to the other table. That
approach is shown in the following image:
A visual that displays State (from the CityData table), along with total Population and
total Sales, would then appear as follows:
Note
Because the state from the CityData table is used in this workaround, only the
states in that table are listed, so TX is excluded. Also, unlike Many-1 relationships,
while the total row includes all Sales (including those of TX), the details don't
include a blank row covering such mismatched rows. Similarly, no blank row would
cover Sales for which there's a null value for the State.
Suppose you also add City to that visual. Although the population per City is known, the
Sales shown for City simply repeats the Sales for the corresponding State. This scenario
normally occurs when the column grouping is unrelated to some aggregate measure, as
shown here:
Let's say you define the new Sales table as the combination of all States here, and we
make it visible in the Fields list. The same visual would display State (on the new table),
the total Population, and total Sales:
As you can see, TX—with Sales data but unknown Population data—and New York—
with known Population data but no Sales data—would be included. This workaround
isn't optimal, and it has many issues. For relationships with a many-to-many cardinality,
the resulting issues are addressed, as described in the next section.
For example, when you create a relationship directly between CityData and Sales—
where filters should flow from CityData to Sales—Power BI Desktop displays the Edit
relationship dialog:
The resulting Relationship view would then display the direct, many-to-many
relationship between the two tables. The tables' appearance in the Fields list, and their
later behavior when the visuals are created, are similar to when we applied the
workaround. In the workaround, the extra table that displays the distinct State data isn't
made visible. As described earlier, a visual that shows State, Population, and Sales data
would be displayed:
The major differences between relationships with a many-to-many cardinality and the
more typical Many-1 relationships are as follows:
The values shown don't include a blank row that accounts for mismatched rows in
the other table. Also, the values don't account for rows where the column used in
the relationship in the other table is null.
You can't use the RELATED() function, because more than one row could be
related.
Using the ALL() function on a table doesn't remove filters that are applied to
other, related tables by a many-to-many relationship. In the preceding example, a
measure that's defined as shown here wouldn't remove filters on columns in the
related CityData table:
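The measure definition from the original example isn't reproduced here, but a measure of
roughly this shape illustrates the point. The name Sales total matches the visual
described next; the column name is an assumption:
DAX
// Removes filters on the Sales table itself, but filters arriving through the
// many-to-many relationship with CityData still apply.
Sales total =
CALCULATE (
    SUM ( Sales[Sales] ),
    ALL ( Sales )
)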
A visual showing State, Sales, and Sales total data would result in this graphic:
With the preceding differences in mind, make sure the calculations that use
ALL(<Table>), such as % of grand total, return the intended results.
Considerations and limitations
There are a few limitations for this release of relationships with a many-to-many
cardinality and composite models.
The following Live Connect (multidimensional) sources can't be used with composite
models:
SAP HANA
SAP Business Warehouse
SQL Server Analysis Services
Power BI datasets
Azure Analysis Services
When you connect to these multidimensional sources by using DirectQuery, you can't
connect to another DirectQuery source or combine it with imported data.
The existing limitations of using DirectQuery still apply when you use relationships with
a many-to-many cardinality. Many limitations are now per table, depending upon the
storage mode of the table. For example, a calculated column on an imported table can
refer to other tables, but a calculated column on a DirectQuery table can still refer only
to columns on the same table. Other limitations apply to the whole model if any tables
within the model are DirectQuery. For example, the QuickInsights and Q&A features are
unavailable on a model if any table within it has a storage mode of DirectQuery.
Next steps
For more information about composite models and DirectQuery, see the following
articles:
When filtering tables to create the appropriate view of data, report creators and data
modelers face challenges determining how to apply filters to a report. Previously, the
table's filter context was held on one side of the relationship, but not the other. This
arrangement often required a complex Data Analysis Expressions (DAX) formula to get
the wanted results.
With bidirectional cross-filtering, report creators and data modelers now have more
control over how they can apply filters when working with related tables. Bidirectional
cross-filtering enables them to apply filters on both sides of a table relationship. You can
apply the filters by propagating the filter context to a second related table on the other
side of a table relationship.
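As a hedged sketch of what this looks like at the DAX level, the CROSSFILTER function can force both-direction filtering for a single calculation. The table and column names below are assumptions borrowed from the earlier Sales and CityData example, and the sketch presumes a relationship already exists between the two State columns:
DAX
// Sketch: enable both-direction cross-filtering for this measure only,
// assuming an existing relationship between Sales[State] and CityData[State].
[Total Sales Both Directions] =
CALCULATE (
    SUM ( Sales[Sales] ),
    CROSSFILTER ( Sales[State], CityData[State], BOTH )
)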
For more information and for examples of how bidirectional cross-filtering works, see
the Bidirectional cross-filtering for Power BI Desktop whitepaper .
Use composite models in Power BI
Desktop
Article • 09/25/2023
Previously in Power BI Desktop, when you used a DirectQuery in a report, no other data
connections, whether DirectQuery or import, were allowed for that report. With
composite models, that restriction is removed. A report can seamlessly include data
connections from more than one DirectQuery or import data connection, in any
combination you choose.
The composite models capability in Power BI Desktop consists of three related features:
Composite models: Allows a report to have two or more data connections from
different source groups. These source groups can be one or more DirectQuery
connections and an import connection, two or more DirectQuery connections, or
any combination thereof. This article describes composite models in detail.
Storage mode: You can now specify which visuals query back-end data sources.
This feature helps improve performance and reduce back-end load. Previously,
even simple visuals, such as slicers, initiated queries to back-end sources. For more
information, see Manage storage mode in Power BI Desktop.
By importing data to Power BI, which is the most common way to get data.
By connecting directly to data in its original source repository by using
DirectQuery. To learn more about DirectQuery, see DirectQuery in Power BI.
When you use DirectQuery, composite models make it possible to create a Power BI
model, such as a single .pbix Power BI Desktop file that does either or both of the
following actions:
Combines data from one or more DirectQuery sources.
Combines data from DirectQuery sources and import data.
For example, by using composite models, you can build a model that combines the
following types of data:
A model that combines data from more than one DirectQuery source or that combines
DirectQuery with import data is called a composite model.
You can create relationships between tables as you always have, even when those tables
come from different sources. Any relationships that are cross-source are created with a
cardinality of many-to-many, regardless of their actual cardinality. You can change them
to one-to-many, many-to-one, or one-to-one. Whichever cardinality you set, cross-
source relationships have different behavior. You can't use Data Analysis Expressions
(DAX) functions to retrieve values on the one side from the many side. You might also
see a performance impact versus many-to-many relationships within the same source.
Note
Within the context of composite models, all imported tables are effectively a single
source, regardless of the actual underlying data sources.
But what if you have data in an Excel spreadsheet about the product manager who's
assigned to each product, along with the marketing priority? If you want to view Sales
Amount by Product Manager, it might not be possible to add this local data to the
corporate data warehouse. Or it might take months at best.
It might be possible to import that sales data from the data warehouse, instead of using
DirectQuery. And the sales data could then be combined with the data that you
imported from the spreadsheet. However, that approach is unreasonable, for the
reasons that led to using DirectQuery in the first place. The reasons could include:
Here's where composite models come in. Composite models let you connect to the data
warehouse by using DirectQuery and then use Get data for more sources. In this
example, we first establish the DirectQuery connection to the corporate data warehouse.
We use Get data, choose Excel, and then navigate to the spreadsheet that contains our
local data. Finally, we import the spreadsheet that contains the Product Names, the
assigned Sales Manager, and the Priority.
In the Fields list, you can see two tables: the original Bike table from SQL Server and a
new ProductManagers table. The new table contains the data that's imported from
Excel.
Similarly, in the Relationship view in Power BI Desktop, we now see another table called
ProductManagers.
We now need to relate these tables to the other tables in the model. As always, we
create a relationship between the Bike table from SQL Server and the imported
ProductManagers table. That is, the relationship is between Bike[ProductName] and
ProductManagers[ProductName]. As discussed earlier, all relationships that cross sources
default to many-to-many cardinality.
Now that we've established this relationship, it's displayed in the Relationship view in
Power BI Desktop, as we would expect.
We can now create visuals by using any of the fields in the Fields list. This approach
seamlessly blends data from multiple sources. For example, the total SalesAmount for
each Product Manager is displayed in the following image:
The following example displays a common case of a dimension table, such as Product or
Customer, that's extended with some extra data imported from somewhere else. It's also
possible to have tables use DirectQuery to connect to various sources. To continue with
our example, imagine that Sales Targets per Country and Period are stored in a
separate departmental database. As usual, you can use Get data to connect to that data,
as shown in the following image:
As we did earlier, we can create relationships between the new table and other tables in
the model. Then we can create visuals that combine the table data. Let's look again at
the Relationships view, where we've established the new relationships:
The next image is based on the new data and relationships we created. The visual at the
lower left shows total Sales Amount versus Target, and the variance calculation shows
the difference. The Sales Amount and Target data come from two different SQL Server
databases.
Set the storage mode
Each table in a composite model has a storage mode that indicates whether the table is
based on DirectQuery or import. The storage mode can be viewed and modified in the
Property pane. To display the storage mode, right-click a table in the Fields list, and
then select Properties. The following image shows the storage mode for the
SalesTargets table.
The storage mode can also be viewed on the tooltip for each table.
For any Power BI Desktop file (a .pbix file) that contains some tables from DirectQuery
and some import tables, the status bar displays a storage mode called Mixed. You can
select that term in the status bar and easily switch all tables to import.
For more information about storage mode, see Manage storage mode in Power BI
Desktop.
Note
You can use Mixed storage mode in Power BI Desktop and in the Power BI service.
Calculated tables
You can add calculated tables to a model that uses DirectQuery. The Data Analysis
Expressions (DAX) that define the calculated table can reference either imported or
DirectQuery tables or a combination of the two.
Calculated tables are always imported, and their data is refreshed when you refresh the
tables. If a calculated table refers to a DirectQuery table, visuals that refer to the
DirectQuery table always show the latest values in the underlying source. Alternatively,
visuals that refer to the calculated table show the values at the time when the calculated
table was last refreshed.
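As a minimal sketch of such a definition (assuming a DirectQuery Sales table with ProductName and SalesAmount columns; these names are assumptions), a calculated table that summarizes a DirectQuery table might look like this:
DAX
// Sketch: the calculated table itself is imported and only refreshes when the
// dataset is refreshed, even though the Sales table it references is DirectQuery.
Sales Summary =
SUMMARIZECOLUMNS (
    Sales[ProductName],
    "Total Sales Amount", SUM ( Sales[SalesAmount] )
)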
Security implications
Composite models have some security implications. A query sent to one data source can
include data values that have been retrieved from another source. In the earlier example,
the visual that shows (Sales Amount) by Product Manager sends an SQL query to the
Sales relational database. That SQL query might contain the names of Product Managers
and their associated Products.
So, information that's stored in the spreadsheet is now included in a query that's sent to
the relational database. If this information is confidential, you should consider the
security implications. In particular, consider the following points:
Any administrator of the database who can view traces or audit logs could view
this information, even without permissions to the data in its original source. In this
example, the administrator would need permissions to the Excel file.
The encryption settings for each source should be considered. You want to avoid
retrieving information from one source by an encrypted connection and then
inadvertently including it in a query that's sent to another source by an
unencrypted connection.
Additionally, if an author adds Table1 from Model A to a Composite Model (let's call it
Model C for reference), then a user viewing a report built on Model C could query any
table in Model A that isn't protected by row-level security (RLS).
For similar reasons, be careful when you open a Power BI Desktop file that's sent from
an untrusted source. If the file contains composite models, information that someone
retrieves from one source, by using the credentials of the user who opens the file, would
be sent to another data source as part of the query. The information could be viewed by
the malicious author of the Power BI Desktop file. When you initially open a Power BI
Desktop file that contains multiple sources, Power BI Desktop displays a warning. The
warning is similar to the one that's displayed when you open a file that contains native
SQL queries.
Performance implications
When you use DirectQuery, you should always consider performance, primarily to
ensure that the back-end source has sufficient resources to provide a good experience
for users. A good experience means that the visuals refresh in five seconds or less. For
more performance advice, see DirectQuery in Power BI.
Using composite models adds other performance considerations. A single visual can
result in sending queries to multiple sources, which often pass the results from one
query across to a second source. This situation can result in the following forms of
execution:
A source query that includes a large number of literal values: For example, a
visual that requests total Sales Amount for a set of selected Product Managers
would first need to find which Products were managed by those product
managers. This sequence must happen before the visual sends an SQL query that
includes all of the product IDs in a WHERE clause.
A source query that queries at a lower level of granularity, with the data later
being aggregated locally: As the number of Products that meet the filter criteria
on Product Manager grows large, it can become inefficient or unfeasible to
include all products in a WHERE clause. Instead, you can query the relational source
at the lower level of Products and then aggregate the results locally. If the
cardinality of Products exceeds a limit of 1 million, the query fails.
Multiple source queries, one per group by value: When the aggregation uses
DistinctCount and is grouped by a column from another source, and if the external
source doesn't support efficient passing of many literal values that define the
grouping, it's necessary to send one SQL query per group by value.
Each of these cases has its own implications on performance, and the exact details vary
for each data source. As long as the cardinality of the columns used in the relationship
that joins the two sources remains low (a few thousand), performance shouldn't be
affected. As this cardinality grows, you should pay more attention to the impact on the
resulting performance.
Additionally, the use of many-to-many relationships means that separate queries must
be sent to the underlying source for each total or subtotal level, rather than aggregating
the detailed values locally. A simple table visual with totals would send two source
queries, rather than one.
Source groups
A source group is a collection of items, such as tables and relationships, from a
DirectQuery source or all import sources involved in a data model. A composite model is
made of one or more source groups. Consider the following examples:
A composite model that connects to a Power BI Dataset called Sales and enriches
the dataset by adding a Sales YTD measure, which isn't available in the original
dataset (a sketch of such a measure appears after this list). This model consists of
one source group.
A composite model that combines data by importing a table from an Excel sheet
called Targets and a CSV file called Regions, and making a DirectQuery connection
to a Power BI Dataset called Sales. In this case, there are two source groups as
shown in the following image:
The first source group contains the tables from the Targets Excel sheet, and the
Regions CSV file.
The second source group contains the items from the Sales Power BI Dataset.
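A minimal sketch of the kind of enrichment measure mentioned in the first example, assuming the remote Sales dataset exposes a Sales[SalesAmount] column and a 'Date'[Date] column (both names are assumptions):
DAX
// Sketch: a year-to-date measure added on top of the remote Sales dataset.
[Sales YTD] = TOTALYTD ( SUM ( Sales[SalesAmount] ), 'Date'[Date] )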
Note
Importing data from another source will not add another source group, because all
items from all imported sources are in one source group.
Source groups and relationships
There are two types of relationships in a composite model:
Intra source group relationships. These relationships relate items within a source
group together. These relationships are always regular relationships unless they're
many-to-many, in which case they're limited.
Cross source group relationships. These relationships start in one source group
and end in a different source group. These relationships are always limited
relationships.
Read more about the distinction between regular and limited relationships and their
impact.
For example, in the following image we've added three cross source group relationships,
relating tables across the various source groups together:
DAX
[Average Inventory Count] = Average(Inventory[Inventory Count])
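The combined measure discussed below adds the two source measures together. A plausible sketch, assuming [Internet Sales] is defined in the local model and [Reseller Sales] comes from the remote model (this is an illustration, not the exact definition):
DAX
// Sketch: [Internet Sales] is assumed to be a local-model measure,
// [Reseller Sales] a measure from the remote model.
[Total Sales] = [Internet Sales] + [Reseller Sales]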
In this scenario, the Internet Sales measure isn't impacted by the calculation group
defined in the remote model because they aren't part of the same model. However, the
calculation group can change the result of the Reseller Sales measure, because they are
in the same model. This fact means that the results returned by the Total Sales measure
must be evaluated carefully. Imagine we use the calculation group in the remote model
to return year-to-date results. The result returned by Reseller Sales is now a year-to-
date value, while the result returned by Internet Sales is still an actual. The result of
Total Sales is now likely unexpected, as it adds an actual to a year-to-date result.
Composite models on Power BI datasets and
Analysis Services
Using composite models with Power BI datasets and Analysis Services, you can build a
composite model using a DirectQuery connection to connect to Power BI datasets,
Azure Analysis Services (AAS), and SQL Server 2022 Analysis Services. Using a composite
model, you can combine the data in these sources with other DirectQuery and imported
data. Report authors who want to combine the data from their enterprise semantic
model with other data they own, such as an Excel spreadsheet, or want to personalize or
enrich the metadata from their enterprise semantic model, will find this functionality
especially useful.
Allow XMLA Endpoints and Analyze in Excel with on-premises datasets. If this
switch is disabled, a DirectQuery connection to a Power BI dataset can't be made.
Users can work with Power BI datasets in Excel using a live connection. If this
switch is disabled, users can't make live connections to Power BI datasets so the
Make changes to this model button can't be reached.
Allow DirectQuery connection to Power BI datasets. See the following paragraphs
for more information on this switch and the effect of disabling it.
Additionally, for Premium capacities and Premium Per User the "XMLA endpoint" setting
should be enabled and set to either "Read Only" or "Read/Write".
This way you can still explore the dataset in your local Power BI Desktop environment
and create the composite model. However, you won't be able to publish the report to
the Service. When you publish the report and model you'll see the following error
message and publication will be blocked:
Note that live connections to Power BI datasets aren't influenced by the switch, nor are
live or DirectQuery connections to Analysis Services. These will continue to work
regardless of whether the switch has been turned off. Also, any published reports that leverage
a composite model on a Power BI dataset will continue to work even if the switch has
been turned off after they were published.
To see which connections are being used in your model, check the status bar in the
bottom right corner of Power BI Desktop. If you're only connected to an Analysis
Services source, you see a message like the following image:
If you're connected to a Power BI dataset, you see a message telling you which Power BI
dataset you're connected to:
If you want to customize the metadata of fields in your live connected dataset, select
Make changes to this model in the status bar. Alternatively, you can select the Make
changes to this model button in the ribbon, as shown in the following image. In Report
view, the Make changes to this model button is in the Modeling tab. In Model view, the
button is in the Home tab.
Selecting the button displays a dialog confirming addition of a local model. Select Add a
local model to enable creating new columns or modifying the metadata, for fields from
Power BI datasets or Analysis Services. The following image shows the dialog that's
displayed.
When you're connected live to an Analysis Services source, there's no local model. To
use DirectQuery for live connected sources, such as Power BI datasets and Analysis
Services, you must add a local model to your report. When you publish a report with a
local model to the Power BI service, a dataset for that local model is published as well.
Chaining
Datasets and the datasets and models on which they're based form a chain. This
process, called chaining, lets you publish a report and dataset based on other Power BI
datasets, a feature that previously wasn't possible.
For example, imagine your colleague publishes a Power BI dataset called Sales and
Budget that's based on an Analysis Services model called Sales, and combines it with an
Excel sheet called Budget.
When you publish a new report (and dataset) called Sales and Budget Europe that's
based on the Sales and Budget Power BI dataset published by your colleague, making
some further modifications or extensions as you do so, you're effectively adding a report
and dataset to a chain of length three, which started with the Sales Analysis Services
model, and ends with your Sales and Budget Europe Power BI dataset. The following
image visualizes this chaining process.
The chain in the previous image is of length three, which is the maximum length.
Extending beyond a chain length of three isn't supported and results in errors.
Note
Refer to this blogpost for important information about permissions required for
composite models on Power BI datasets and Analysis Services models .
If any dataset in the chain is in a Premium Per User workspace, the user accessing it
needs a Premium Per User license. If any dataset in the chain is in a Pro workspace, the
user accessing it needs a Pro license. If all the datasets in the chain are on Premium
capacities, a user can access it using a Free license.
Security warning
Using the DirectQuery for Power BI datasets and Analysis Services feature will present
you with a security warning dialog, shown in the following image.
Data may be pushed from one data source to another; this is the same security warning
that's displayed when you combine DirectQuery and import sources in a data model. To learn more
about this behavior, please see using composite models in Power BI Desktop.
Supported scenarios
You can build composite models using data from Power BI datasets or Analysis Services
models to service the following scenarios:
Connecting to data from various sources: Import (such as files), Power BI datasets,
Analysis Services models
Creating relationships between different data sources
Writing measures that use fields from different data sources
Creating new columns for tables from Power BI datasets or Analysis Services
models
Creating visuals that use columns from different data sources
You can remove a table from your model using the field list, to keep models as
concise and lean as possible (if you connect to a perspective, you can't remove
tables from the model)
You can specify which tables to load, rather than having to load all tables when you
only want a specific subset of tables. See Loading a subset of tables later in this
document.
You can specify whether to add any tables that are subsequently added to the
dataset after you make the connection in your model.
If you refresh your data sources, and there are errors with conflicting field or table
names, Power BI resolves the errors for you.
You cannot edit, delete or create new relationships in the same Power BI dataset or
Analysis Services source. If you have edit access to these sources, you can make
the changes directly in the data source instead.
You can't change data types of columns that are loaded from a Power BI dataset or
Analysis Services source. If you need to change the data type, either change it in
the source or use a calculated column.
Connections to a SQL Server 2022 and later Analysis Services server on-premises or
IaaS require an On-premises data gateway (Standard mode).
All connections to remote Power BI Datasets models are made using single sign-
on. Authenticating with a service principal isn't currently supported.
RLS rules will be applied on the source on which they're defined, but won't be
applied to any other datasets in the model. RLS defined in the report won't be
applied to remote sources, and RLS set on remote sources won't be applied to
other data sources. Also, you can't define RLS on a table from another source
group nor can you define RLS on a local table that has a relationship to another
source group.
KPIs, row level security, and translations won't be imported from the source.
You may see some unexpected behavior when using a date hierarchy. To resolve
this issue, use a date column instead. After adding a date hierarchy to a visual, you
can switch to a date column by clicking on the down arrow in the field name, and
then clicking on the name of that field instead of using Date Hierarchy:
For more information on using date columns versus date hierarchies, see apply
auto date or time in Power BI Desktop.
The maximum length of a chain of models is three. Extending beyond the chain
length of three isn't supported and results in errors.
A discourage chaining flag can be set on a model to prevent a chain from being
created or extended. See Manage DirectQuery connections to a published dataset
for more information.
The following limitations apply when working with DirectQuery for Power BI datasets
and Analysis Services:
There are a few other things to consider when working with DirectQuery for Power BI
datasets and Analysis Services:
Consider the following diagram. The numbered steps in the diagram are described in
paragraphs that follow.
In the diagram, Ash works with Contoso and is accessing data provided by Fabrikam.
Using Power BI Desktop, Ash creates a DirectQuery connection to an Analysis Services
model that is hosted in Fabrikam’s tenant.
To authenticate, Ash uses a B2B Guest user identity (step 1 in the diagram).
If the report is published to Contoso’s Power BI service (step 2), the dataset published in
the Contoso tenant cannot successfully authenticate against Fabrikam’s Analysis Services
model (step 3). As a result, the report won't work.
In this scenario, since the Analysis Services model used is hosted in Fabrikam’s tenant,
the report also must be published in Fabrikam's tenant. After successful publication in
Fabrikam’s tenant (step 4) the dataset can successfully access the Analysis Services
model (step 5) and the report will work properly.
Object-level security (OLS) enables model authors to hide objects that make up the
model schema (that is, tables, columns, metadata, etc.) from model consumers (for
example, a report builder or a composite model author). In configuring OLS for an
object, the model author creates a role, and then removes access to the object for users
who are assigned to that role. From the standpoint of those users, the hidden object
simply doesn't exist.
OLS is defined for and applied on the source model. It cannot be defined for a
composite model built on the source model.
Since the composite model isn't secured by OLS rules, the objects that consumers of the
composite model see are those that the composite model author could see in the
source model rather than what they themselves might have access to. This might result
in the following situations:
Someone looking at the composite model might see objects that are hidden from
them in the source model by OLS.
Conversely, they might NOT see an object in the composite model that they CAN
see in the source model, because that object was hidden from the composite
model author by the OLS rules controlling access to the source model.
An important point is that in spite of the case described in the first bullet, consumers of
the composite model will never see actual data they aren't supposed to see, because the
data isn't actually located in the composite model. Rather, because of DirectQuery, it's
retrieved as needed from the source dataset, where OLS blocks unauthorized access.
3. Marketing_user opens the Finance report. The visual that uses the Territory table is
displayed, but returns an error, because when the report is opened, DirectQuery
tries to retrieve the data from the source model using the credentials of the
Marketing_user, who is blocked from seeing the Territory table as per the OLS rules
set on the enterprise semantic model.
4. Marketing_user creates a new report called "Marketing Report" that uses the
Finance dataset as its source. The field list shows the tables and columns that
Finance_user has access to. Hence, the Territory table is shown in the fields list, but
the Customer table is not. However, when the Marketing_user tries to create a
visual that leverages a column from the Territory table, an error is returned,
because at that point DirectQuery tries to retrieve data from the source model
using Marketing_user's credentials, and OLS rules once again kick in and block
access. The same thing happens when Marketing_user creates a new dataset and
report that connect to the Finance dataset with a DirectQuery connection – they
see the Territory table in the fields list, since that is what Finance_user could see,
but when they try to create a visual that leverages that table, they'll be blocked by
the OLS rules on the enterprise semantic model.
5. Now let's say that Admin_user updates the OLS rules on the enterprise semantic
model to stop Finance from seeing the Territory table.
6. The updated OLS rules are only reflected in the Finance dataset when it's
refreshed. Thus, when the Finance_user refreshes the Finance dataset, the Territory
table will no longer be shown in the fields list, and the visual in the Finance report
that uses a column from the Territory table will return an error for Finance_user,
because they're now not allowed to access the Territory table.
To summarize:
Consumers of a composite model see the results of the OLS rules that were
applicable to the author of the composite model when they created the model.
Thus, when a new report is created based on the composite model, the field list
will show the tables that the author of the composite model had access to when
they created the model, regardless of what the current user has access to in the
source model.
OLS rules can't be defined on the composite model itself.
A consumer of a composite model will never see actual data they aren't supposed
to see, because relevant OLS rules on the source model will block them when
DirectQuery tries to retrieve the data using their credentials.
If the source model updates its OLS rules, those changes will only affect the
composite model when it's refreshed.
This dialog will only show if you add a DirectQuery connection to a Power BI
dataset or Analysis Services model to an existing model. You can also open this
dialog by changing the DirectQuery connection to the Power BI dataset or Analysis
Services model in the Data source settings after you created it.
Mixed-mode connections - When using a mixed mode connection that contains online
data (such as a Power BI dataset) and an on-premises dataset (such as an Excel
workbook), you must have gateway mapping established for visuals to properly appear.
SAP HANA
SAP Business Warehouse
SQL Server Analysis Services earlier than version 2022
Usage metrics (My workspace)
The existing limitations of DirectQuery still apply when you use composite models.
Many of these limitations are now per table, depending upon the storage mode of the
table. For example, a calculated column on an import table can refer to other tables, but
a calculated column on a DirectQuery table can still refer only to columns on the same
table. Other limitations apply to the model as a whole, if any of the tables within the
model are DirectQuery. For example, the QuickInsights feature isn't available on a model
if any of the tables within it has a storage mode of DirectQuery.
Next steps
For more information about composite models and DirectQuery, see the following
articles:
Aggregations in Power BI can improve query performance over very large DirectQuery
datasets. By using aggregations, you cache data at the aggregated level in-memory.
Aggregations in Power BI can be manually configured in the data model, as described in
this article, or for Premium subscriptions, automatically by enabling the Automatic
aggregations feature in dataset Settings.
Dimensional data sources, like data warehouses and data marts, can use relationship-
based aggregations. Hadoop-based big-data sources often base aggregations on
GroupBy columns. This article describes typical Power BI data modeling differences for
each type of data source.
Manage aggregations
In the Fields pane of any Power BI Desktop view, right-click the aggregations table, and
then select Manage aggregations.
The Manage aggregations dialog shows a row for each column in the table, where you
can specify the aggregation behavior. In the following example, queries to the Sales
detail table are internally redirected to the Sales Agg aggregation table.
In this relationship-based aggregation example, the GroupBy entries are optional.
Except for DISTINCTCOUNT, they don't affect aggregation behavior and are primarily for
readability. Without the GroupBy entries, the aggregations would still get hit, based on
the relationships. This is different from the big data example later in this article, where
the GroupBy entries are required.
Validations
The Manage aggregations dialog enforces validations:
The Detail Column must have the same datatype as the Aggregation Column,
except for the Count and Count table rows Summarization functions. Count and
Count table rows are only available for integer aggregation columns, and don't
require a matching datatype.
Chained aggregations covering three or more tables aren't allowed. For example,
aggregations on Table A can't refer to a Table B that has aggregations referring to
a Table C.
Duplicate aggregations, where two entries use the same Summarization function
and refer to the same Detail Table and Detail Column, aren't allowed.
The Detail Table must use DirectQuery storage mode, not Import.
Grouping by a foreign key column used by an inactive relationship, and relying on
the USERELATIONSHIP function for aggregation hits, isn't supported.
Aggregations based on GroupBy columns can leverage relationships between
aggregation tables but authoring relationships between aggregation tables is not
supported in Power BI Desktop. If necessary, you can create relationships between
aggregation tables by using a third-party tool or a scripting solution through
XMLA endpoints.
Most validations are enforced by disabling dropdown values and showing explanatory
text in the tooltip.
Aggregation tables are hidden
Users with read-only access to the dataset can't query aggregation tables. This avoids
security concerns when used with row-level security (RLS). Consumers and queries refer
to the detail table, not the aggregation table, and don't need to know about the
aggregation table.
For this reason, aggregation tables are hidden from Report view. If the table isn't
already hidden, the Manage aggregations dialog will set it to hidden when you select
Apply all.
Storage modes
The aggregation feature interacts with table-level storage modes. Power BI tables can
use DirectQuery, Import, or Dual storage modes. DirectQuery queries the backend
directly, while Import caches data in memory and sends queries to the cached data. All
Power BI Import and non-multidimensional DirectQuery data sources can work with
aggregations.
To set the storage mode of an aggregated table to Import to speed up queries, select
the aggregated table in Power BI Desktop Model view. In the Properties pane, expand
Advanced, drop down the selection under Storage mode, and select Import. Note that
this action is irreversible.
To learn more about table storage modes, see Manage storage mode in Power BI
Desktop.
In the following example, the RLS expression on the Geography table works for
aggregations, because Geography is on the filtering side of relationships to both the
Sales table and the Sales Agg table. Queries that hit the aggregation table and those
that don't will both have RLS successfully applied.
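A row-level security filter is just a DAX boolean expression defined for a role on a table. A minimal sketch of such a filter on the Geography table, assuming a Geography[Country] column (the column name and value are assumptions):
DAX
// Sketch: role filter on the Geography table. Because Geography filters both
// Sales and Sales Agg, the filter applies whether or not the aggregation table
// answers the query.
Geography[Country] = "United States"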
An RLS expression on the Product table filters only the detail Sales table, not the
aggregated Sales Agg table. Since the aggregation table is another representation of
the data in the detail table, it would be insecure to answer queries from the aggregation
table if the RLS filter can't be applied. Filtering only the detail table isn't recommended,
because user queries from this role won't benefit from aggregation hits.
An RLS expression that filters only the Sales Agg aggregation table and not the Sales
detail table isn't allowed.
For aggregations based on GroupBy columns, an RLS expression applied to the detail
table can be used to filter the aggregation table, because all the GroupBy columns in
the aggregation table are covered by the detail table. On the other hand, an RLS filter
on the aggregation table can't be applied to the detail table, so is disallowed.
In the following example, the model gets data from a single data source. Tables are
using DirectQuery storage mode. The Sales fact table contains billions of rows. Setting
the storage mode of Sales to Import for caching would consume considerable memory
and resource overhead.
Instead, create the Sales Agg aggregation table. In the Sales Agg table, the number of
rows equals the sum of SalesAmount grouped by CustomerKey, DateKey, and
ProductSubcategoryKey. The Sales Agg table is at a higher granularity than Sales, so
instead of billions, it might contain millions of rows, which are much easier to manage.
If the following dimension tables are the most commonly used for the queries with high
business value, they can filter Sales Agg, using one-to-many or many-to-one
relationships.
Geography
Customer
Date
Product Subcategory
Product Category
Note
The Sales Agg table, like any table, has the flexibility of being loaded in a variety of
ways. The aggregation can be performed in the source database using ETL/ELT
processes, or by the M expression for the table. The aggregated table can use
Import storage mode, with or without Incremental refresh for datasets, or it can
use DirectQuery and be optimized for fast queries using columnstore indexes. This
flexibility enables balanced architectures that can spread query load to avoid
bottlenecks.
Changing the storage mode of the aggregated Sales Agg table to Import opens a
dialog box saying that the related dimension tables can be set to storage mode Dual.
Setting the related dimension tables to Dual lets them act as either Import or
DirectQuery, depending on the subquery. In the example:
Queries that aggregate metrics from the Import-mode Sales Agg table, and group
by attribute(s) from the related Dual tables, can be returned from the in-memory
cache.
Queries that aggregate metrics from the DirectQuery Sales table, and group by
attribute(s) from the related Dual tables, can be returned in DirectQuery mode. The
query logic, including the GroupBy operation, is passed down to the source
database.
For more information about Dual storage mode, see Manage storage mode in Power BI
Desktop.
The only case where a cross-source relationship is considered regular is if both tables are
set to Import. Many-to-many relationships are always considered limited.
For cross-source aggregation hits that don't depend on relationships, see Aggregations
based on GroupBy columns.
The following query doesn't hit the aggregation. Despite requesting the sum of
SalesAmount, the query is performing a GroupBy operation on a column in the Product
table, which isn't at the granularity that can hit the aggregation. If you observe the
relationships in the model, a product subcategory can have multiple Product rows. The
query wouldn't be able to determine which product to aggregate to. In this case, the
query reverts to DirectQuery and submits a SQL query to the data source.
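A sketch of such a query; the Product[Color] column is an assumption, used only to illustrate grouping below the granularity of the aggregation table:
DAX
// Sketch: grouping by a Product column that Sales Agg doesn't cover,
// so the query reverts to DirectQuery against the detail Sales table.
EVALUATE
SUMMARIZECOLUMNS (
    Product[Color],
    "Total Sales", SUM ( Sales[SalesAmount] )
)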
Aggregations aren't just for simple calculations that perform a straightforward sum.
Complex calculations can also benefit. Conceptually, a complex calculation is broken
down into subqueries for each SUM, MIN, MAX, and COUNT, and each subquery is
evaluated to determine if it can hit the aggregation. This logic doesn't hold true in all
cases due to query-plan optimization, but in general it should apply. The following
example hits the aggregation:
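A hedged sketch of a calculation of this kind, assuming both SUM and COUNT aggregation entries are defined for Sales[SalesAmount]:
DAX
// Sketch: both the SUM subquery and the COUNT subquery can be answered
// from the aggregation table, so the whole expression hits it.
EVALUATE
{ DIVIDE ( SUM ( Sales[SalesAmount] ), COUNT ( Sales[SalesAmount] ) ) }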
The COUNTROWS function can benefit from aggregations. The following query hits the
aggregation because there is a Count table rows aggregation defined for the Sales
table.
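A minimal sketch of such a query (written as a DAX query for illustration):
DAX
// Sketch: answered by the Count table rows aggregation defined for Sales.
EVALUATE
{ COUNTROWS ( Sales ) }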
The AVERAGE function can benefit from aggregations. The following query hits the
aggregation because AVERAGE internally gets folded to a SUM divided by a COUNT.
Since the UnitPrice column has aggregations defined for both SUM and COUNT, the
aggregation is hit.
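A sketch of such a query, using the UnitPrice column mentioned above:
DAX
// Sketch: AVERAGE folds to SUM divided by COUNT, and both aggregations
// are defined for Sales[UnitPrice], so the aggregation table answers it.
EVALUATE
{ AVERAGE ( Sales[UnitPrice] ) }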
In some cases, the DISTINCTCOUNT function can benefit from aggregations. The
following query hits the aggregation because there is a GroupBy entry for CustomerKey,
which maintains the distinctness of CustomerKey in the aggregation table. This
technique might still hit the performance threshold where more than two to five million
distinct values can affect query performance. However, it can be useful in scenarios
where there are billions of rows in the detail table, but two to five million distinct values
in the column. In this case, the DISTINCTCOUNT can perform faster than scanning the
table with billions of rows, even if it were cached into memory.
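A sketch of such a query, using the CustomerKey column mentioned above:
DAX
// Sketch: the GroupBy entry for CustomerKey preserves distinctness in the
// aggregation table, so the distinct count can be answered from it.
EVALUATE
{ DISTINCTCOUNT ( Sales[CustomerKey] ) }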
DAX time-intelligence functions are aggregation aware. The following query hits the
aggregation because the DATESYTD function generates a table of CalendarDay values,
and the aggregation table is at a granularity that is covered for group-by columns in the
Date table. This is an example of a table-valued filter to the CALCULATE function, which
can work with aggregations.
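A hedged sketch of such a query, assuming the Date table's date column is named CalendarDay and the sales amount column is Sales[SalesAmount]:
DAX
// Sketch: DATESYTD produces a table of 'Date'[CalendarDay] values, a
// granularity covered by the Date group-by columns, so the aggregation is hit.
EVALUATE
{
    CALCULATE (
        SUM ( Sales[SalesAmount] ),
        DATESYTD ( 'Date'[CalendarDay] )
    )
}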
The following table contains the Movement numeric column to be aggregated. All the
other columns are attributes to group by. The table contains IoT data and a massive
number of rows. The storage mode is DirectQuery. Queries on the data source that
aggregate across the whole dataset are slow because of the sheer volume.
To enable interactive analysis on this dataset, you can add an aggregation table that
groups by most of the attributes, but excludes the high-cardinality attributes like
longitude and latitude. This dramatically reduces the number of rows, and the resulting
aggregation table is small enough to comfortably fit into an in-memory cache.
You define the aggregation mappings for the Driver Activity Agg table in the Manage
aggregations dialog.
In aggregations based on GroupBy columns, the GroupBy entries aren't optional.
Without them, the aggregations won't get hit. This is different from using aggregations
based on relationships, where the GroupBy entries are optional.
The following table shows the aggregations for the Driver Activity Agg table.
You can set the storage mode of the aggregated Driver Activity Agg table to Import.
Especially for models that contain filter attributes in fact tables, it's a good idea to use
Count table rows aggregations. Power BI may submit queries to the dataset using
COUNTROWS in cases where it is not explicitly requested by the user. For example, the
filter dialog shows the count of rows for each value.
For example, the following model replicates Month, Quarter, Semester, and Year in the
Sales Agg table. There is no relationship between Sales Agg and the Date table, but
there are relationships to Customer and Product Subcategory. The storage mode of
Sales Agg is Import.
The following table shows the entries set in the Manage aggregations dialog for the
Sales Agg table. The GroupBy entries where Date is the detail table are mandatory, to
hit aggregations for queries that group by the Date attributes. As in the previous
example, the GroupBy entries for CustomerKey and ProductSubcategoryKey don't
affect aggregation hits, except for DISTINCTCOUNT, because of the presence of
relationships.
Combined aggregation query examples
The following query hits the aggregation, because the aggregation table covers
CalendarMonth, and CategoryName is accessible via one-to-many relationships.
SalesAmount uses the SUM aggregation.
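A sketch of such a query; the 'Product Category' table name is an assumption based on the dimension tables listed earlier:
DAX
// Sketch: CalendarMonth is covered by the Sales Agg group-by entries, and
// CategoryName is reachable through one-to-many relationships.
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[CalendarMonth],
    'Product Category'[CategoryName],
    "Total Sales", SUM ( Sales[SalesAmount] )
)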
The following query doesn't hit the aggregation, because the aggregation table doesn't
cover CalendarDay.
The following time-intelligence query doesn't hit the aggregation, because the
DATESYTD function generates a table of CalendarDay values, and the aggregation table
doesn't cover CalendarDay.
Aggregation precedence
Aggregation precedence allows multiple aggregation tables to be considered by a
single subquery.
The Driver Activity DirectQuery table contains over a trillion rows of IoT data
sourced from a big-data system. It serves drillthrough queries to view individual
IoT readings in controlled filter contexts.
The Driver Activity Agg table is an intermediate aggregation table in DirectQuery
mode. It contains over a billion rows in Azure Synapse Analytics (formerly SQL Data
Warehouse) and is optimized at the source using columnstore indexes.
The Driver Activity Agg2 Import table is at a high granularity, because the group-
by attributes are few and low cardinality. The number of rows could be as low as
thousands, so it can easily fit into an in-memory cache. These attributes happen to
be used by a high-profile executive dashboard, so queries referring to them should
be as fast as possible.
Note
DirectQuery aggregation tables that use a different data source from the detail
table are only supported if the aggregation table is from a SQL Server, Azure SQL,
or Azure Synapse Analytics (formerly SQL Data Warehouse) source.
The memory footprint of this model is relatively small, but it unlocks a huge dataset. It
represents a balanced architecture because it spreads the query load across
components of the architecture, utilizing them based on their strengths.
The Manage aggregations dialog for Driver Activity Agg2 sets the Precedence field to
10, which is higher than for Driver Activity Agg. The higher precedence setting means
queries that use aggregations will consider Driver Activity Agg2 first. Subqueries that
aren't at the granularity that can be answered by Driver Activity Agg2 will consider
Driver Activity Agg instead. Detail queries that cannot be answered by either
aggregation table will be directed to Driver Activity.
The table specified in the Detail Table column is Driver Activity, not Driver Activity Agg,
because chained aggregations are not allowed.
The following table shows the aggregations for the Driver Activity Agg2 table.
The following JSON snippet shows an example of the output of the event when an
aggregation is used.
Community
Power BI has a vibrant community where MVPs, BI pros, and peers share expertise in
discussion groups, videos, blogs and more. When learning about aggregations, be sure
to check out these additional resources:
Power BI Community
Search "Power BI aggregations" on Bing
See also
Automatic aggregations
Composite models
Manage storage mode in Power BI
Desktop
Article • 06/19/2023
In Microsoft Power BI Desktop, you can specify the storage mode of a table. The storage
mode lets you control whether or not Power BI Desktop caches table data in-memory
for reports.
Setting the storage mode provides many advantages. You can set the storage mode for
each table individually in your model. This action enables a single dataset, which
provides the following benefits:
Large datasets: Tables that aren't cached don't consume memory for caching
purposes. You can enable interactive analysis over large datasets that are too large
or expensive to completely cache into memory. You can choose which tables are
worth caching, and which aren't.
Data refresh optimization: You don't need to refresh tables that aren't cached. You
can reduce refresh times by caching only the data that's necessary to meet your
service level agreements and your business requirements.
The storage mode setting in Power BI Desktop is one of three related features:
Storage mode: With storage mode, you can now specify which visuals require a
query to back-end data sources. Visuals that don't require a query are imported
even if they're based on DirectQuery. This feature helps improve performance and
reduce back-end load. Previously, even simple visuals, such as slicers, initiated
queries that were sent to back-end sources.
1. In Model view, select the table whose properties you want to view or set.
2. In the Properties pane, expand the Advanced section, and expand the Storage
mode drop-down.
You set the Storage mode property to one of these three values:
Import: Imported tables with this setting are cached. Queries submitted to the
Power BI dataset that return data from Import tables can be fulfilled only from
cached data.
DirectQuery: Tables with this setting aren't cached. Queries that you submit to the
Power BI dataset—for example, DAX queries—and that return data from
DirectQuery tables can be fulfilled only by executing on-demand queries to the
data source. Queries that you submit to the data source use the query language
for that data source, for example, SQL.
Dual: Tables with this setting can act as either cached or not cached, depending on
the context of the query that's submitted to the Power BI dataset. In some cases,
you fulfill queries from cached data. In other cases, you fulfill queries by executing
an on-demand query to the data source.
Changing the Storage mode of a table to Import is an irreversible operation. After this
property is set, it can't later be changed to either DirectQuery or Dual.
Note
You can use Dual storage mode in both Power BI Desktop and the Power BI service.
You can set the dimension tables (Customer, Geography, and Date) to Dual to reduce
the number of limited relationships in the dataset, and improve performance. Limited
relationships normally involve at least one DirectQuery table where join logic can't be
pushed to the source systems. Because Dual tables can act as either DirectQuery or
Import tables, this situation is avoided.
The propagation logic is designed to help with models that contain many tables.
Suppose you have a model with 50 tables and only certain fact (transactional) tables
need to be cached. The logic in Power BI Desktop calculates the minimum set of
dimension tables that must be set to Dual, so you don’t have to.
The propagation logic traverses only to the one side of one-to-many relationships.
Consider a model with the following tables, set to the following storage modes:
Sales: DirectQuery
SurveyResponse: Import
Date: Dual
Customer: Dual
Geography: Dual
Setting these storage mode properties results in the following behaviors, assuming that
the Sales table has significant data volume:
Power BI Desktop doesn't cache the Sales table. Power BI Desktop provides the
following results by not caching this table:
Data-refresh times are improved, and memory consumption is reduced.
Report queries that are based on the Sales table run in DirectQuery mode.
These queries might take longer but are closer to real time, because no caching
latency is introduced.
Report queries that are based on the SurveyResponse table are returned from the
in-memory cache, and are therefore relatively fast.
For each Query Begin event, check other events with the same ActivityID. For example, if
there isn't a DirectQuery Begin event, but there's a Vertipaq SE Query Begin event, the
query is answered from the cache.
Queries that refer to Dual tables return data from the cache, if possible; otherwise, they
revert to DirectQuery.
The following query continues from the previous table. It refers only to a column from
the Date table, which is in Dual mode. Therefore, the query should hit the cache:
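A sketch of such a query, assuming the Date table exposes a CalendarYear column:
DAX
// Sketch: only the Dual-mode Date table is referenced, so the query
// can be answered from the in-memory cache.
EVALUATE
SUMMARIZECOLUMNS ( 'Date'[CalendarYear] )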
The following query refers only to a column from the Sales table, which is in
DirectQuery mode. Therefore, it shouldn't hit the cache:
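A sketch of such a query, assuming a Sales[SalesAmount] column:
DAX
// Sketch: only the DirectQuery Sales table is referenced, so the query
// is sent to the source rather than answered from the cache.
EVALUATE
{ SUM ( Sales[SalesAmount] ) }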
The following query is interesting because it combines both columns. This query doesn't
hit the cache. You might initially expect it to retrieve CalendarYear values from the cache
and SalesAmount values from the source and then combine the results, but this
approach is less efficient than submitting the SUM/GROUP BY operation to the source
system. If the operation is pushed down to the source, the number of rows returned will
likely be far less:
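A sketch of the combined query, using the CalendarYear and SalesAmount columns referenced above:
DAX
// Sketch: the SUM/GROUP BY is pushed down to the source as a single query,
// rather than combining cached CalendarYear values with source data locally.
EVALUATE
SUMMARIZECOLUMNS (
    'Date'[CalendarYear],
    "Total Sales Amount", SUM ( Sales[SalesAmount] )
)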
Note
The Dual storage mode is a performance optimization. Use it only in ways that don't
compromise the ability to meet business requirements. For alternative behavior, consider
using the techniques described in Many-to-many relationships in Power BI Desktop.
Data view
If at least one table in the dataset has its storage mode set to either Import or Dual, the
Data view tab is available.
When you select Dual and Import tables in Data view, they show cached data.
DirectQuery tables don't show data, and a message is displayed that states that
DirectQuery tables can't be shown.
SAP HANA
SAP Business Warehouse
When you connect to those multidimensional sources using DirectQuery, you can't
connect to another DirectQuery source or combine it with imported data.
The existing limitations of using DirectQuery still apply when you use composite models.
Many of those limitations are now per table, depending upon the storage mode of the
table. For example, a calculated column on an imported table can refer to other tables,
but a calculated column on a DirectQuery table is still restricted to refer only to columns
on the same table. Other limitations apply to the model as a whole, if any of the tables
within the model are DirectQuery.
Next steps
For more information about composite models and DirectQuery, see the following
articles:
You can connect to multidimensional models in Power BI, and create reports that
visualize all sorts of data within the model. With multidimensional models, Power BI
applies rules to how it processes data, based on which column is defined as the default
member.
With multidimensional models, Power BI handles data from the model based on where
the column that contains the Default Member is used. The DefaultMember property
value for an attribute hierarchy is set in CSDL (Conceptual Schema Definition Language)
for a particular column in a multidimensional model. For more information about the
default member, see Attribute properties - Define a default member. When a data
analysis expression (DAX) query is executed, the default member specified in the model
is applied automatically.
This article describes how Power BI behaves under various circumstances when working
with multidimensional models, based on where the default member is found.
If the default member is removed, deselecting the value clears it for all visuals to which
the filter card applies, and the values displayed don't reflect the default member.
For example, imagine we have a Currency column and a default member set to USD:
In this example case, if we have a card that shows Total Sales, the value will have
the default member applied and the sales that correspond to USD.
If we drag Currency to the filter card pane, we see USD as the default value
selected. The value of Total Sales remains the same, since the default member is
applied.
However, if we deselect the USD value from the filter card, the default member for
Currency is cleared, and now Total Sales reflects all currencies.
When we select another value in the filter card (let's say we select EURO) along with
the default member, Total Sales reflects the filter Currency IN {USD, EURO}.
Group visuals
In Power BI, whenever you group a visual on a column that has a default member, Power
BI clears the default member for that column and its attribute relationship path. This
behavior ensures the visual displays all values, instead of just the default values.
Let's look at an example to clarify the behavior. Consider the following configuration of
ARPs:
Now let's imagine that the following default members are set for these columns:
Now let's examine what happens when each column is used in Power BI. When visuals
group on the following columns, here are the results:
City - Power BI displays all the cities by clearing all the default members for City,
State, Country/Region but preserves the default member for Population; Power BI
cleared the entire ARP for City.
Note
Population isn't in the ARP path of City; it's solely related to State, and thus
Power BI doesn't clear it.
State - Power BI displays all the States by clearing all default members for City,
State, Country/Region and Population.
Country/Region - Power BI displays all the countries/regions by clearing all default
members for City, State and Country/Region, but preserves the default member for
Population.
City and State - Power BI clears all default members for all columns.
Groups displayed in the visual have their entire ARP path cleared.
If a group isn't displayed in the visual, but is part of the ARP path of another grouped-
on column, the following applies:
When a slicer or filter card is loaded with data, Power BI groups on the column in
the visual, so the display behavior is the same as described in the previous section.
Since slicers and filter cards are often used to interact with other visuals, the logic of
clearing default members for the affected visuals occurs as explained in the following
table.
For this table, we use the same example data from earlier in this article:
The following rules apply to the way Power BI behaves in these circumstances.
The column has a filter card with default stated, and Power BI is grouping on a
column in its ARP.
The column is above another column in the ARP, and Power BI has a filter card for
that other column in default state.
Next steps
This article described the behavior of Power BI when working with default members in
multidimensional models. You might also be interested in the following articles:
Business users rely heavily on centrally governed data sources built by information
technology (IT) teams, but it can take months for an IT department to deliver a change
in a given data source. In response, users often resort to building their own data marts
with Access databases, local files, SharePoint sites and spreadsheets, resulting in a lack
of governance and proper oversight to ensure such data sources are supported and
have reasonable performance.
Datamarts help bridge the gap between business users and IT. Datamarts are self-
service analytics solutions, enabling users to store and explore data that is loaded in a
fully managed database. Datamarts provide a simple and optionally no-code experience
to ingest data from different data sources, extract, transform, and load (ETL) the data
using Power Query, then load it into an Azure SQL database that's fully managed and
requires no tuning or optimization.
Once data is loaded into a datamart, you can additionally define relationships and
policies for business intelligence and analysis. Datamarts automatically generate a
dataset or semantic model, which can be used to create Power BI reports and
dashboards. You can also query a datamart using a T-SQL endpoint or using a visual
experience.
Self-service users can easily perform relational database analytics, without the
need for a database administrator
Datamarts provide end-to-end data ingestion, preparation and exploration with
SQL, including no-code experiences
Enable building semantic models and reports within one holistic experience
Datamart features:
Relational database analytics with Power BI: Access a datamart’s data using
external SQL clients. Azure Synapse and other services/tools that use T-SQL can
also use datamarts in Power BI.
The following list describes these offerings and the best uses for each, including their
role with datamarts.
Datamarts - User-based data warehousing and SQL access to your data. Datamarts can
be used as sources for other datamarts or items, using the SQL endpoint:
External sharing
Sharing across departmental or organizational boundaries with security enabled
Dataflows - Reusable data prep (ETL) for datasets or marts. Datamarts use a single,
built-in dataflow for ETL. Dataflows can accentuate this, enabling:
Loading data to datamarts with different refresh schedules
Separating ETL and data prep steps from storage, so it can be reused by datasets
Datasets - Metrics and semantic layer for BI reporting. Datamarts provide an
auto-generated dataset for reporting, enabling:
Combining data from multiple sources
Selective sharing of the datamart tables for fine-grained reporting
Composite models - a dataset with data from the datamart and other data sources
outside of the datamart
Proxy models - a dataset that uses DirectQuery for the auto-generated model, using a
single source of truth
Dataflows provide reusable extract, transform and load (ETL). Tables can't be browsed,
queried, or explored without a dataset, but can be defined for reuse. The data is
exposed in Power BI or CDM format if you bring your own data lake. Dataflows are used
by Power BI to ingest data into your datamarts. You should use dataflows whenever you
want to reuse your ETL logic.
Build reusable and shareable data prep for items in Power BI.
Datamarts are fully managed databases that enable you to store and explore your
data in a relational and fully managed Azure SQL DB. Datamarts provide SQL support, a
no-code visual query designer, Row Level Security (RLS), and auto-generation of a
dataset for each datamart. You can perform ad-hoc analysis and create reports, all on
the web.
Next steps
This article provided an overview of datamarts and the many ways you can use them.
The following articles provide more information about datamarts and Power BI:
Understand datamarts
Get started with datamarts
Analyzing datamarts
Create reports with datamarts
Access control in datamarts
Datamart administration
Microsoft Fabric decision guide: data warehouse or lakehouse
For more information about dataflows and transforming data, see the following articles:
The default Power BI dataset created from a datamart eliminates the need to connect to
a separate dataset, set up refresh schedules, and manage multiple data elements.
Instead, you can build your business logic in a datamart and its data will be immediately
available in Power BI, enabling the following:
During preview, default dataset connectivity is available using DirectQuery only. The
following image shows how datamarts fit into the process continuum starting with
connecting to data, all the way through creating reports.
Default datasets are different from traditional Power BI datasets in the following ways:
The XMLA endpoint supports read-only operations and users can't edit the dataset
directly. With XMLA read-only permission you can query the data in a query
window.
The default datasets don't have data source settings and users don't need to enter
credentials. Rather, they use automatic single sign-on (SSO) for queries.
For refresh operations, datasets use the dataset author credentials to connect to
the managed datamart’s SQL endpoint.
With Power BI Desktop users can build composite models, enabling you to connect to
the datamart’s dataset and do the following:
Finally, if you don't want to use the default dataset directly, you can connect to the
datamart’s SQL endpoint. For more information, see Create reports using datamarts.
The background sync that includes objects (tables and views) waits for the
downstream dataset to not be in use before updating the dataset, honoring bounded
staleness. Users can always manually choose which tables to include in or exclude from
the dataset.
For most datamarts, incremental refresh will involve one or more tables that contain
transaction data that changes often and can grow exponentially, such as a fact table in a
relational or star database schema. If you use an incremental refresh policy to partition
the table, and refresh only the most recent import partitions, you can significantly
reduce the amount of data that must be refreshed.
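Conceptually, refreshing only the most recent partition means each refresh folds to a
date-bounded query against the source, along the lines of the following sketch. The
table, column, and date boundaries are illustrative, not generated by Power BI:
SQL
-- Illustrative only: the kind of date-bounded query a single refreshed
-- partition corresponds to at the source (names and dates are hypothetical)
SELECT *
FROM dbo.FactSales
WHERE OrderDate >= '20240101'
  AND OrderDate <  '20240201';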
Incremental refresh and real-time data for datamarts offer the following advantages:
Proactive caching works in the following way: after each refresh, the storage mode for
the default dataset is changed to DirectQuery. Proactive caching builds a side-by-side
import model asynchronously; it's managed by the datamart and doesn't affect the
availability or performance of the datamart. Queries coming in after the default dataset
is complete will use the import model.
Refreshes
New data sources
Schema changes:
New data sources
Updates to data preparation steps in Power Query Online
Any modeling updates, such as:
Measures
Hierarchies
Descriptions
Next steps
This article provided an overview of important datamart concepts to understand.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Get started with datamarts
Analyzing datamarts
Create reports using datamarts
Control access to datamarts
Administration of datamarts
For more information about dataflows and transforming data, see the following articles:
This article describes how to get started using datamarts, including various sample data
that can jump-start your experience. You'll learn about sample datasets you can use with
datamarts, how to create datamarts from scratch, how to rename or delete a datamart,
and other helpful information to get you acquainted and proficient with datamarts.
Sample data
You can use the following various types of sample data to explore datamarts. All of the
following resources contain free sample data:
Eight Departmental Samples in Excel workbook format, which are Excel versions of
the Power BI built-in samples containing the datasets from numerous use cases:
Customer profitability
IT spend analysis
Human resources
Opportunity analysis
Procurement analysis
Retail analysis
Sales and marketing
Supplier quality analysis
A financial data sample workbook, which is a simple flat table in an Excel file
available for download. It contains anonymous data with fictitious products
including sales divided by segments and region.
COVID 19 world data is based on data from Johns Hopkins University. Before
publishing this data, we recommend reviewing the disclaimers article.
Northwind Traders OData feed, data from a fictitious organization that manages
orders, products, customers, suppliers, and many other aspects of a small business.
You can also start using datamarts from any dataflow you currently have. Starting
from an existing dataflow copies data into your datamart, at which point you can
apply other transformations or just use it as a data source to explore datamarts.
Create a datamart
To create a datamart, navigate to your existing Power BI Premium or Premium Per User
(PPU) workspace. Datamarts require a Power BI Premium subscription. In your Premium
workspace, select + New and then select Datamart (Preview) to create a datamart.
If you choose to get data from another source, a data source selection window appears
where you can select from a multitude of data sources.
You can also drag and drop files from your computer to load data into your datamart,
such as Excel files. Some data sources may require parameters or connection strings to
properly connect.
Once connected, select the tables you want to load into your datamart. You can apply
transformations to your selected data and load the data into the datamart. Once the
data is loaded, the tables are imported into your datamart. You can monitor the
progress in the status bar.
For each table you select, a corresponding view is created in the datamart that appears
in the Object explorer in Data View.
Model data
To model your data, navigate to Model view by selecting the Model View icon at the
bottom of the window, as shown in the following image.
Adding or removing objects to the default dataset
In Power BI, a dataset is always required before any reports can be built, so the default
dataset enables quick reporting capabilities on top of the datamart. Within the
datamart, a user can add datamart objects (tables) to their default dataset. They can also
add additional semantic modeling properties, such as hierarchies and descriptions.
These properties are then used to create the Power BI dataset’s tables. Users can also
remove objects from the default dataset.
To add objects – tables or views – to the default dataset, a user has two options:
Automatically add objects to the dataset, which happens by default with no user
intervention needed
Manually add objects to the dataset
The auto detect experience detects any tables or views and opportunistically adds
them.
The manual detect option in the ribbon allows fine-grained control over which objects –
tables and/or views – should be added to the default dataset:
Select all
Filter for tables or views
Select specific objects
To remove objects, a user can use the manual select button in the ribbon and:
Un-select all
Filter for tables or views
Un-select specific objects
Create a measure
To create a measure (a measure is a collection of standardized metrics) select the table
in the Table Explorer and select the New Measure button in the ribbon, as shown in the
following image.
Enter the measure into the formula bar and specify the table and the column to which it
applies. The formula bar lets you enter your measure. Similar to Power BI Desktop, the
DAX editing experience in datamarts presents a rich editor complete with auto-
complete for formulas (intellisense). The DAX editor enables you to easily develop
measures right in the datamart, making it a more effective single source for business
logic, semantics, and business-critical calculations.
You can expand the table to find the measure in the table.
Create a relationship
To create a relationship in a datamart, select the Model view and select your datamart,
then drag the column from one table to the column on the other table to initiate the
relationship. In the window that appears, configure the relationship properties.
Select the Confirm button when your relationship is complete to save the relationship
information.
You can refresh a datamart in two ways:
1. From the datamart context menu, select Refresh now or select Scheduled refresh.
2. From the datamart settings page, select Scheduled refresh.
To set up incremental refresh for a datamart, select the table for which you want to set
up incremental refresh in the datamart editor. In the Table tools ribbon, select the
Incremental refresh icon, and a right pane appears enabling you to configure
incremental refresh for the selected table.
Datamarts and deployment pipelines
Datamarts are supported in deployment pipelines. Using deployment pipelines, you can
deploy updates to your datamart across a designated pipeline. You can also use rules to
connect to relevant data in each stage of the pipeline. To learn how to use deployment
pipelines, see Get started with deployment pipelines.
Rename a datamart
There are two ways to rename a datamart:
First, from within the Datamart editor, select the datamart name from the top of the
editor and edit the datamart name in the window that appears, as shown in the
following image. Select anywhere on the ribbon outside of the rename window to save the new
name.
Alternatively, you can change the datamart name from the workspace list view. Select
the more menu (...) next to the datamart name in the workspace view.
Delete a datamart
To delete a datamart, navigate to the workspace and find the datamart you want to
delete. Select the more menu (...) and select Delete from the menu that appears.
Datamart deletion is not immediate, and requires a few days to complete.
The following options are available from the datamart context menu:
Analyze in Excel - Uses the existing Analyze in Excel capability on the auto-generated
dataset. Learn more about Analyze in Excel.
Create report - Build a Power BI report in DirectQuery mode. Learn more about getting
started creating reports in the Power BI service.
Delete - Deletes the dataset from the workspace. A confirmation dialog notifies you of
the impact of the delete action. If the Delete action is confirmed, the datamart and
related downstream items are deleted.
Manage permissions - Enables users to add other recipients with specified permissions,
similar to allowing the sharing of an underlying dataset or allowing others to build
content with the data associated with the underlying dataset.
Refresh history - Provides the history of refresh activity, with the duration and status of
each activity.
Rename - Updates the datamart and auto-generated dataset with the new name.
Share - Lets users share the datamart to build content based on the underlying auto-
generated dataset and query the corresponding SQL endpoint. Shares the datamart
access (SQL - read only, and the autogenerated dataset) with other users in your
organization. Users receive an email with links to access the detail page, where they
can find the SQL Server URL and can access the auto-generated dataset to create
reports based on it.
View lineage - Shows the end-to-end lineage of datamarts from the data sources to the
datamart, the auto-generated dataset, and other datasets (if any) that were built on
top of the datamarts, all the way to reports, dashboards, and apps.
Datamart settings
Datamart settings are accessible from the context menu for datamarts. This section
describes the datamart settings options. The following image shows the datamart
settings menu.
Datamart description - Lets users add metadata details to provide descriptive
information about a datamart.
Server settings - The SQL endpoint connection string for a datamart. You can use the
connection string to create a connection to the datamart using various tools, such as
SSMS.
Data source credentials - Lets you get data source information and edit credentials.
Scheduled refresh - Data refresh information for the datamart, based on the schedule
defined by the user.
Sensitivity label - The sensitivity label applied on the datamart, which also gets
propagated to the downstream auto-generated dataset, reports, and so on.
The following settings apply to the auto-generated dataset. When these settings are
applied on an auto-generated dataset, they're also applied to the datamart:
Endorsement and discovery
Request access
Q&A
Query caching
You can only create one cloud connection of a particular path and type, for
example, you could only create one SQL plus server/database cloud connection.
You can create multiple gateway connections.
You cannot name or rename cloud data sources; you can name or rename gateway
connections.
Next steps
This article provided sample data and instructions on how to create and interact with
datamarts.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Understand datamarts
Analyzing datamarts
Create reports with datamarts
Access control in datamarts
Datamart administration
For more information about dataflows and transforming data, see the following articles:
You can analyze your datamarts with multiple tools, including the Datamart editor and
the SQL Query Editor among others. This article describes how to analyze your
datamarts with those tools, and suggestions on how best to see the information you
need.
Visual query
Once you've loaded data into your datamart, you can use the Datamart editor to create
queries to analyze your data. You can use the Visual Query editor for a no-code
experience to create your queries.
In the Data grid view, create a new query using the + New Query button on the ribbon,
as shown in the following image.
Alternatively you can use the Design view icon found along the bottom of the Datamart
editor window, shown in the following image.
To create a query, drag and drop tables from the Object explorer on the left on to the
canvas.
Once you drag one or more tables onto the canvas, you can use the visual experience to
design your queries. The datamart editor uses an experience similar to the Power Query
diagram view to enable you to easily query and analyze your data. Learn more about
Power Query diagram view.
As you work on your Visual query, the queries are automatically saved every few
seconds. A “saving indicator” that shows up in your query tab at the bottom indicates
that your query is being saved.
The following image shows a sample query created using the no-code Visual Query
editor to retrieve the Top customers by Orders.
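For comparison, a T-SQL query producing a similar "top customers by orders" result
might look like the following sketch; the table and column names are hypothetical and
depend on your own datamart:
SQL
-- Hypothetical T-SQL equivalent of a "top customers by orders" visual query
SELECT TOP (10)
    c.CustomerName,
    COUNT(o.OrderID) AS OrderCount
FROM [model].[Customer] AS c
JOIN [model].[Orders] AS o
    ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerName
ORDER BY OrderCount DESC;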
The SQL Query editor provides support for intellisense, code completion, syntax
highlighting, client-side parsing and validation. Once you’ve written the T-SQL query,
select Run to execute the query. As you work on your SQL query, the queries are
automatically saved every few seconds. A “saving indicator” that shows up in your query
tab at the bottom indicates that your query is being saved. The Results preview is
displayed in the Results section. The Download in Excel button opens the
corresponding T-SQL query in Excel and executes the query, enabling you to view the
results in Excel. The Visualize results button allows you to create reports from your
query results within the SQL query editor.
To connect to a datamart’s SQL endpoint with client tooling, navigate to the dataset
settings page by selecting the Datamarts (Preview) tab in Power BI. From there, expand
the Server settings section and copy the connection string, as shown in the following
image.
Once the Connect to Server window is open, paste the connection string copied from
the previous section of this article into the Server name box. Select Connect and
proceed with the appropriate credentials for authentication. Remember that only Azure
Active Directory - MFA authentication is supported.
When the connection is established, the object explorer displays the connected SQL
database from your datamart and its respective tables and views, all of which are ready
to be queried.
To easily preview the data within a table, right-click on a table and select Select Top
1000 Rows from the context menu that appears. An autogenerated query returns a
collection of results displaying the top 1,000 rows based on the primary key of the table.
The following image shows the results of such a query.
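The statement SSMS generates may vary, but conceptually it's a simple preview query
like the following sketch (the table name here is hypothetical):
SQL
-- Illustrative preview query similar to what "Select Top 1000 Rows" generates
SELECT TOP (1000) *
FROM [model].[Orders];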
To see the columns within a table, expand the table within Object explorer.
When you connect to a datamart using SSMS or other client tools, you can see views
created in the Model schema of the datamart. The default schema configuration on a
datamart is set to Model.
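If you prefer T-SQL over the object explorer, you can list the views in the Model schema
with a query like the following sketch, which uses standard SQL Server catalog views:
SQL
-- List the views exposed in the datamart's Model schema
SELECT s.name AS SchemaName, v.name AS ViewName
FROM sys.views AS v
JOIN sys.schemas AS s
    ON s.schema_id = v.schema_id
WHERE s.name = 'model';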
When connected using SSMS, a datamart shows two roles, admin and viewer, under
security. Users added to a workspace in the Admin, Member, or Contributor roles get
added to the admin role on the datamart. Users added to the Viewer role in the
workspace get added to the viewer role in the datamart.
Relationships metadata
The extended property isSaaSMetadata added in the datamart lets you know that this
metadata is used for the SaaS experience. You can query this extended property with a
query like the following sketch (the exact query may differ):
SQL
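-- One way to inspect the isSaaSMetadata extended property, using standard
-- SQL Server catalog views (a sketch; the original query may differ)
SELECT [name], [value]
FROM sys.extended_properties
WHERE [name] = 'isSaaSMetadata';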
Clients (such as the SQL connector) can read the relationships by querying the
table-valued function, as in the following example:
SQL
SELECT *
FROM [metadata].[fn_relationships]();
Notice there are views named relationships and relationshipColumns under the
metadata schema to maintain relationships in the datamart:
[metadata].[relationships]
[metadata].[relationshipColumns]
You can join these two views to get relationships added in the datamart. The following
query will join these views:
SQL
SELECT
R.RelationshipId
,R.[Name]
,R.[FromSchemaName]
,R.[FromObjectName]
,C.[FromColumnName]
,R.[ToSchemaName]
,R.[ToObjectName]
,C.[ToColumnName]
FROM [metadata].[relationships] AS R
JOIN [metadata].[relationshipColumns] AS C
ON R.RelationshipId=C.RelationshipId
Limitations
Visualize results currently does not support SQL queries with an ORDER BY clause.
Next steps
This article provided information about analyzing data in datamarts.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Understand datamarts
Get started with datamarts
Create reports with datamarts
Access control in datamarts
Datamart administration
For more information about dataflows and transforming data, see the following articles:
You can discover data through the data hub, and create reusable and autogenerated
datasets to create reports in various ways in Power BI. This article describes the various
ways you can discover datamarts.
In the data hub, when you select a datamart, you're taken to its information page where
you can see the datamart’s metadata, supported actions, lineage, and impact analysis
along with related reports on that datamart.
The autogenerated dataset from a datamart behaves the same as other datasets in
Power BI. For more information, see data discovery using the data hub.
A page displays information about the datamart and provides buttons to create a new
report, share the datamart, pull data into Excel, or view lineage. Related reports for the
selected datamart are also displayed, if any exist. You can also navigate to the datamart
editor, its settings, or manage permissions.
The page also shows the workspace where the datamart is located, its endorsement
status, its last refresh time, and any sensitivity settings that have been applied. It also
displays the datamart's SQL endpoint connection string and the datamart's description.
You can view the lineage of the datamart by selecting Lineage > Open lineage from the
ribbon menu. The window that appears displays the end-to-end lineage view describing
the flow of data from the data source to the datamart, the underlying autogenerated
dataset, and all downstream items such as reports, dashboards, or apps.
To view any dependent items of the selected datamart, select the Impact analysis menu,
which is displayed along the right side of the screen.
Data hub in Power BI Desktop
The data hub in Power BI Desktop lets you discover datamarts and datasets. Once the
datamart filter is selected, the list shows the datamarts to which you have access.
The following image shows how to select datamarts from the data hub Home ribbon
menu in Power BI Desktop.
The data hub appears in a window within Power BI Desktop, as the following screen
shows.
Selecting a datamart from the list enables the Connect button in the window. Selecting
Connect with a datamart selected loads the datamart's underlying and autogenerated
dataset, from which you can begin to build reports. By selecting Connect to SQL
endpoint, you make a live connection to the datamart’s SQL endpoint to read data and
build reports.
Next steps
This article provided information about discovering data and datamarts through the data hub.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Sharing datamarts and managing permissions
Understand datamarts
Get started with datamarts
Analyzing datamarts
Access control in datamarts
Datamart administration
For more information about dataflows and transforming data, see the following articles:
This article describes the ways you can share your datamarts and manage their
permissions to provide users with specific access.
There are a few ways to share a datamart, described in the following sections.
The following image shows selecting the context menu from within the data hub, and
selecting Share.
Share from datamart information page
To share a datamart from the information page in the data hub, select the Share button
from the ribbon at the top of the page.
The following image shows the Share button from the ribbon.
You can also select the Share datamart button from the information panel itself, within
the data hub. The following image highlights the Share button on the information panel.
You can choose whether recipients can reshare the datamart with others in the
organization, by selecting the checkbox next to Allow recipients to share this datamart.
There's also an option to allow users to create Power BI reports (from scratch,
autocreate, or paginated reports) on top of the default dataset that is connected to the
datamart, by selecting the checkbox next to Build reports on the default dataset. Both of
these options are selected by default.
You can also choose to send recipients a message to provide more context, by typing a
message into the Add a message (optional) field in the Grant people access window.
When recipients open the link or otherwise navigate to the shared datamart, its
information page shows the SQL connection string for connecting to the datamart.
Users can use client tools other than Power BI, such as SSMS, to query the datamart
using T-SQL.
The following image highlights the SQL connection string in a datamart information
window.
Users can build reports with the datamart or use Analyze in Excel, and can also connect
to the datamart or underlying dataset from Power BI Desktop.
The following image highlights the Create a report entry point in a datamart
information window.
Note
Sharing a datamart allows the recipient to access the datamart for downstream
consumption and not to collaborate on the datamart creation. To enable other
creators to collaborate on the datamart, you must provide Member, Admin or
Contributor access to the workspace where the datamart is created.
Manage permissions
The Manage permissions page shows the list of users who have been given access,
either through assignment to workspace roles or through item permissions (as described
earlier in this article).
If you're an Admin or Member, go to your workspace, select More options to show the
context menu, and then select Manage permissions.
For users who were provided workspace roles, it shows the corresponding user,
workspace role, and permissions. Admin and Members have Read, Write, and Reshare
access to datamarts in this workspace. Contributors have Read and Write permissions.
Viewers have Read permissions and can query all objects within the datamart. For users
with whom a datamart was shared, item permissions such as Read and Reshare are
provided by default.
You can choose to add or remove permissions using the Manage permissions
experience. Remove reshare removes the Reshare permissions. Remove access removes
all item permissions and stops sharing the datamart with the specified user.
Next steps
This article provided information about sharing datamarts and managing permissions.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Sharing and discovering data using datamarts
Understand datamarts
Get started with datamarts
Analyzing datamarts
Access control in datamarts
Datamart administration
For more information about dataflows and transforming data, see the following articles:
Datamarts let you create reusable and auto-generated datasets to create reports in
various ways in Power BI. This article describes the various ways you can use datamarts,
and their auto-generated datasets, to create reports.
For example, you can establish a live connection to a shared dataset in the Power BI
service and create many different reports from the same dataset. You can create your
perfect data model in Power BI Desktop and publish it to the Power BI service. Then you
and others can create multiple different reports in separate .pbix files from that common
data model and save them to different workspaces.
Advanced users can build reports from a datamart using a composite model or using
the SQL Endpoint.
Reports that use datamarts can be created in either of the following two tools:
The Power BI service
Power BI Desktop
Let's take a look at how datamarts can be used with each, in turn.
Selecting New report opens a browser tab to the report editing canvas with a new report
that is built on the dataset. When you save your new report, you're prompted to choose
a workspace, provided you have write permissions for that workspace. If you don't have
write permissions, or if you're a free user and the dataset resides in a Premium-capacity
workspace, the new report is saved in your My workspace.
Scenario 2: Using the auto-generated dataset and action menu in the workspace: In the
Power BI workspace, navigate to the auto-generated dataset and select the More menu
(...) to create a report in the Power BI service.
Selecting the More menu option opens the report editing canvas to a new report that is built on the
dataset. When you save your new report, it's saved in the workspace that contains the
dataset as long as you have write permissions on that workspace. If you don't have write
permissions, or if you're a free user and the dataset resides in a Premium-capacity
workspace, the new report is saved in your My workspace.
Scenario 3: Using the auto-generated dataset and dataset details page. In the Power BI
workspace list, select the auto-generated dataset's name to get to the Dataset details
page, where you can find details about the dataset and see related reports. You can also
create a report directly from this page. To learn more about creating a report in this
fashion, see Dataset details.
In the data hub, you'll see datamarts and their associated auto-generated datasets.
Select the datamart to navigate to the datamart's details page where you can see the
datamart’s metadata, supported actions, lineage and impact analysis, along with related
reports created from that datamart. Auto-generated datasets derived from datamarts
behave the same as any dataset.
To find the datamart, you begin with the data hub. To build a report on the datamart's
SQL endpoint in Power BI Desktop, follow these steps:
1. Navigate to the datamart settings in your workspace and copy the SQL endpoint
connection string.
2. In Power BI Desktop select the SQL Server connector from the ribbon or from Get
Data.
3. Paste the connection string into the connector dialog.
4. For authentication, select organizational account.
5. Authenticate using Azure Active Directory - MFA (the same way you would connect
to Power BI)
6. Select Connect.
7. Select the data items you want to include or not include in your dataset.
For more information, see connect to on-premises data in SQL Server. You don't need to
set up a gateway with datamarts to use them in Power BI.
Next steps
This article provided information about creating reports using datamarts.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Understand datamarts
Get started with datamarts
Analyzing datamarts
Access control in datamarts
Datamart administration
For more information about dataflows and transforming data, see the following articles:
This article describes access control to datamarts, including row level security, rules in
Power BI Desktop, and how datamarts might become inaccessible or unavailable.
Workspace roles
Assigning users to the various workspace roles provides the following capabilities with
respect to Datamarts:
Admin - Grants the user permissions to ingest data through a dataflow, write SQL and
visual queries, and update the model or dataset (create relationships, create measures,
and so on).
Member - Grants the user permissions to ingest data through a dataflow, write SQL and
visual queries, and update the model or dataset (create relationships, create measures,
and so on).
Contributor - Grants the user permissions to ingest data through a dataflow, write SQL
and visual queries, and update the model or dataset (create relationships, create
measures, and so on).
Viewer - Grants the user permissions to write SQL and visual queries and access the
“Model view” in read-only mode. For more information, see Viewer restrictions.
Viewer restrictions
The Viewer role is more limited than the other workspace roles. In addition to having
fewer SQL permissions, viewers are restricted from the following actions.
Settings - Viewers have read-only access, so they cannot rename the datamart, add a
description, or change its sensitivity label.
Run queries - Viewers do not have full DML/DDL capabilities unless granted specifically.
Viewers can read data using SELECT statements in the SQL query editor and use all
tools in the toolbar in the Visual query editor. Viewers can also read data from Power BI
Desktop and other SQL client tools.
Manually update dataset - Viewers cannot manually update the default dataset to
which the datamart is connected.
Lineage view - Viewers do not have access to the lineage view chart.
Create a report - Viewers do not have access to create content within the workspace
and therefore cannot build reports on top of the datamart.
You can configure RLS for datamarts in the Datamart editor. The configured RLS on
datamarts automatically gets applied to downstream items, including the auto-
generated datasets and reports.
Note
Datamarts use the enhanced row-level security editor, which means that not all
row-level security filters supported in Power BI can be defined. Limitations include
expressions that today can only be defined using DAX, including dynamic rules such
as USERNAME() or USERPRINCIPALNAME(). To define roles using these filters, switch
to the DAX editor.
2. Create new RLS roles using the Row security settings window. You can define a
combination of filters on tables and select Save to save the role.
3. Once the role is saved, select Assign to add users to the role. Once assigned, select
Save to save the role assignments and close the RLS settings modal.
2. Select the role to be validated by selecting the check box for the role, then select
OK.
3. The data view shows the access that the selected role has.
To revert to your access, select the View as button on the ribbon again, and select None.
How datamarts become unavailable
A datamart can get marked as an unavailable datamart when one of the following
situations occurs.
Situation 2: When a dataflow updates a datamart and its associated dataset, but due to
a system lock the datamart or dataset update is pending, the datamart becomes
unavailable. The Datamart editor isn't accessible when a datamart is in an unavailable
state. The Try again action, shown in the following image, enables users to trigger
synchronization between the dataflow, datamart, and dataset. It may take a few minutes
to complete the requested action, but downstream consumption can continue.
Next steps
This article provided information about controlling access to datamarts.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Understand datamarts
Get started with datamarts
Analyzing datamarts
Create reports with datamarts
Datamart administration
For more information about dataflows and transforming data, see the following articles:
You can administer the use and settings for datamarts just like you can administer other
aspects of Power BI. This article describes and explains how to administer your
datamarts, and where to find the settings.
Create
Rename
Update
Delete
Refresh
View
1. Sign in to the Power BI admin portal as the administrator and navigate to Audit
logs.
2. In the Audit logs section, select the button to go to Microsoft 365 Admin Center
Admin - Get Activity Events REST API (Power BI REST APIs)
Track user activities in Power BI
Some connectors aren't supported for datamarts (or dataflows) in Premium workspaces.
When using an unsupported connector, you may receive the following error:
Expression.Error: The import "<connector name>" matches no exports. Did you miss a
module reference?
The following connectors aren't supported for dataflows and datamarts in Premium
workspaces:
Linkar
Actian
AmazonAthena
AmazonOpenSearchService
BIConnector
DataVirtuality
DenodoForPowerBI
Exasol
Foundry
Indexima
IRIS
JethroODBC
Kyligence
MariaDB
MarkLogicODBC
OpenSearchProject
QubolePresto
SingleStoreODBC
StarburstPresto
TibcoTdv
The use of the previous list of connectors with dataflows or datamarts is only supported
in workspaces that aren't Premium.
Next steps
This article provided information about the administration of datamarts.
The following articles provide more information about datamarts and Power BI:
Introduction to datamarts
Understand datamarts
Get started with datamarts
Analyzing datamarts
Create reports with datamarts
Access control in datamarts
For more information about dataflows and transforming data, see the following articles:
Tip
You can also try Dataflow Gen2 in Data Factory in Microsoft Fabric, an all-in-one
analytics solution for enterprises. Microsoft Fabric covers everything from data
movement to data science, real-time analytics, business intelligence, and reporting.
Learn how to start a new trial for free.
As data volume continues to grow, so does the challenge of wrangling that data into
well-formed, actionable information. We want data that’s ready for analytics, to populate
visuals, reports, and dashboards, so we can quickly turn our volumes of data into
actionable insights. With self-service data prep for big data in Power BI, you can go from
data to Power BI insights with just a few actions.
Persist data in your own Azure Data Lake Gen 2 storage, enabling you to expose it
to other Azure services outside Power BI.
Create a single source of truth, curated from raw data using industry standard
definitions, which can work with other services and products in the Power Platform.
Encourage uptake by removing analysts' access to underlying data sources.
If you want to work with large data volumes and perform ETL at scale, dataflows
with Power BI Premium scale more efficiently and give you more flexibility.
Dataflows support a wide range of cloud and on-premises sources.
You can use Power BI Desktop and the Power BI service with dataflows to create
datasets, reports, dashboards, and apps that use the Common Data Model. From these
resources, you can gain deep insights into your business activities. Dataflow refresh
scheduling is managed directly from the workspace in which your dataflow was created,
just like your datasets.
Note
Dataflows may not be available in the Power BI service for all U.S. Government DoD
customers. For more information about which features are available, and which are
not, see Power BI feature availability for U.S. Government customers.
Next steps
This article provided an overview of self-service data prep for big data in Power BI, and
the many ways you can use it.
The following articles provide more information about dataflows and Power BI:
Creating a dataflow
Configure and consume a dataflow
Configuring Dataflow storage to use Azure Data Lake Gen 2
Premium features of dataflows
AI with dataflows
Dataflows considerations and limitations
Dataflows best practices
Power BI usage scenarios: Self-service data preparation
For more information about the Common Data Model, you can read its overview article:
A dataflow is a collection of tables that are created and managed in workspaces in the
Power BI service. A table is a set of columns that are used to store data, much like a
table within a database. You can add and edit tables in your dataflow, and manage data
refresh schedules, directly from the workspace in which your dataflow was created.
To create a dataflow, launch the Power BI service in a browser then select a workspace
(dataflows aren't available in My workspace in the Power BI service) from the nav pane
on the left, as shown in the following screen. You can also create a new workspace in
which to create your new dataflow.
The following sections explore each of these ways to create a dataflow in detail.
Note
Dataflows can be created by users in a Premium workspace, users with a Pro
license, and users with a Premium Per User (PPU) license.
When you select a data source, you're prompted to provide the connection settings,
including the account to use when connecting to the data source, as shown in the
following image.
Once connected, you can select which data to use for your table. When you choose data
and a source, Power BI reconnects to the data source. The reconnection keeps the data
in your dataflow refreshed at the frequency that you select later in the setup process.
After you select the data for use in the table, you can use dataflow editor to shape or
transform that data into the format necessary for use in your dataflow.
If you want to reuse a table across multiple dataflows, such as a date table or a
static lookup table, you should create a table once and then reference it across the
other dataflows.
If you want to avoid creating multiple refreshes to a data source, it's better to use
linked tables to store the data and act as a cache. Doing so allows every
subsequent consumer to use that table, reducing the load to the underlying data
source.
By selecting Enable load, you create a new table whose source is the referenced
table. The icon changes to the computed icon, as shown in the following
image.
Any transformation you perform on this newly created table is run on the data that
already resides in Power BI dataflow storage. That means that the query won't run
against the external data source from which the data was imported, like the data pulled
from the SQL database. Instead the query is performed on the data that resides in the
dataflow storage.
Consider the following example: you have an Account table that contains the raw data
for all the customers from your Dynamics 365 subscription. You also have ServiceCalls
raw data from the Service Center, with data from the support calls that were performed
for each account on each day of the year.
Imagine you want to enrich the Account table with data from the ServiceCalls. First you
would need to aggregate the data from the ServiceCalls to calculate the number of
support calls that were done for each account in the last year.
Next, you would want to merge the Account table with the ServiceCallsAggregated table
to calculate the enriched Account table.
And then you can see the results, shown as EnrichedAccount in the following image.
And that's it - the transformation is performed on the data in the dataflow that resides
in your Power BI Premium subscription, not on the source data.
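Although the computed table itself is built with Power Query, the enrichment logic is
easier to see expressed as SQL. The following sketch is illustrative only; the table and
column names are hypothetical:
SQL
-- Illustrative SQL expression of the enrichment logic described above:
-- aggregate last year's service calls per account, then join to Account
SELECT
    a.AccountId,
    a.AccountName,
    COALESCE(s.SupportCallCount, 0) AS SupportCallCount
FROM Account AS a
LEFT JOIN (
    SELECT AccountId, COUNT(*) AS SupportCallCount
    FROM ServiceCalls
    WHERE CallDate >= DATEADD(year, -1, GETDATE())
    GROUP BY AccountId
) AS s
    ON s.AccountId = a.AccountId;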
Note
Computed tables are a Premium-only feature.
There are a few requirements for creating dataflows from CDM folders, as the following
list describes:
The ADLS Gen 2 account must have the appropriate permissions set up in order for
Power BI to access the file.
The ADLS Gen 2 account must be accessible by the user trying to create the
dataflow.
The URL must be a direct file path to the JSON file and use the ADLS Gen 2
endpoint; blob.core isn't supported (example:
https://myaccount.dfs.core.windows.net/filesystem/path/model.json )
Create a dataflow by using import/export
Creating a dataflow by using import/export lets you import a dataflow from a file. This
tool is useful if you want to save a dataflow copy offline, or move a dataflow from one
workspace to another.
To export a dataflow, select the dataflow you created and select the More menu item
(the ellipsis) to expand the options, and then select Export .json. You're prompted to
begin the download of the dataflow represented in CDM format.
To import a dataflow, select the import box and upload the file. Power BI creates the
dataflow for you, and allows you to save the dataflow as is, or to perform other
transformations.
Next steps
By putting your data into a dataflow you can use Power BI Desktop and the Power BI
service to create datasets, reports, dashboards, and apps. These new resources can give
you insights into your business activities. The following articles go into more detail
about common usage scenarios for dataflows:
With dataflows, you can unify data from multiple sources and prepare that unified data
for modeling. Whenever you create a dataflow, you're prompted to refresh the data for
the dataflow. Refreshing a dataflow is required before it can be consumed in a dataset in
Power BI Desktop, or referenced as a linked or computed table.
Note
Dataflows are not available in the Power BI service for U.S. Government DoD
customers. For more information about which features are available, and which are
not, see Power BI feature availability for U.S. Government customers.
Configure a dataflow
To configure the refresh of a dataflow, select More options (the ellipsis) and choose
Settings.
The Settings options provide many options for your dataflow, as the following sections
describe.
Take ownership: If you're not the owner of the dataflow, many of these settings
are disabled. To take ownership of the dataflow, select Take over. You're prompted
to provide credentials to ensure you have the necessary access level.
Gateway Connection: In this section, you can choose whether the dataflow uses a
gateway, and select which gateway is used. If you specified a gateway when editing
the dataflow, you may need to update the credentials using the edit dataflow option
after taking ownership.
Data source credentials: In this section you choose which credentials are being
used, and can change how you authenticate to the data source.
Sensitivity label: Here you can define the sensitivity of the data in the dataflow. To
learn more about sensitivity labels, see How to apply sensitivity labels in Power BI.
Scheduled refresh: Here you can define the times of day the selected dataflow
refreshes. A dataflow can be refreshed at the same frequency as a dataset.
Enhanced compute engine settings: Here you can define whether the dataflow is
stored in the compute engine. The compute engine allows subsequent dataflows,
which reference this dataflow, to perform merges and joins and other
transformations faster than you would otherwise. It also allows DirectQuery to be
performed over the dataflow. Selecting On ensures the dataflow is always
supported in DirectQuery mode, and any references benefit from the engine.
Selecting Optimized means the engine is only used if there's a reference to this
dataflow. Selecting Off disables the compute engine and DirectQuery capability for
this dataflow.
Note
Users with a Pro license or a Premium Per User (PPU) can create a dataflow in a
Premium workspace.
Refresh a dataflow
Dataflows act as building blocks on top of one another. Suppose you have a dataflow
called Raw Data and a linked table called Transformed Data, which contains a linked
table to the Raw Data dataflow. When the schedule refresh for the Raw Data dataflow
triggers, it will trigger any dataflow that references it upon completion. This functionality
creates a chain effect of refreshes, allowing you to avoid having to schedule dataflows
manually. There are a few nuances to be aware of when dealing with linked tables
refreshes:
A linked table will be triggered by a refresh only if it exists in the same workspace.
A linked table will be locked for editing if a source table is being refreshed or the
refresh of the source table is being canceled. If any of the dataflows in a reference
chain fail to refresh, all the dataflows will roll back to the old data (dataflow
refreshes are transactional within a workspace).
Only referenced tables are refreshed when triggered by a source refresh
completion. To schedule all the tables, you should set a schedule refresh on the
linked table as well. Avoid setting a refresh schedule on linked dataflows to avoid
double refresh.
Cancel refresh
Dataflows support the ability to cancel a refresh, unlike datasets. If a refresh is running
for a long time, you can select More options (the ellipsis next to the dataflow) and then
select Cancel refresh.
Incremental refresh (Premium only)
Dataflows can also be set to refresh incrementally. To do so, select the dataflow you wish
to set up for incremental refresh, and then choose the Incremental Refresh icon.
Setting incremental refresh adds parameters to the dataflow to specify the date range.
For detailed information on how to set up incremental refresh, see Using incremental
refresh with dataflows.
There are some circumstances under which you shouldn't set incremental refresh:
Consume a dataflow
A dataflow can be consumed in the following three ways:
Create a linked table from the dataflow to allow another dataflow author to use
the data.
Create a dataset from the dataflow to allow a user to utilize the data to create
reports.
Create a connection from external tools that can read from the CDM (Common
Data Model) format.
Consume from Power BI Desktop
To consume a dataflow, open Power BI Desktop and select Power BI dataflows in the
Get Data dropdown.
Note
The Power BI dataflows connector uses a different set of credentials than the currently
logged-in user. This is by design, to support multi-tenant users.
Note
You can connect to any dataflow or table regardless of which workspace it resides
in, and whether or not it was defined in a Premium or non-Premium workspace.
If DirectQuery is available, you're prompted to choose whether you want to connect to
the tables through DirectQuery or Import.
In DirectQuery mode, you can quickly interrogate large-scale datasets locally. However,
you can't perform any more transformations.
Using Import brings the data into Power BI, and requires the dataset to be refreshed
independently of the dataflow.
Next steps
The following articles provide more information about dataflows and Power BI:
You can create dataflow workloads in your Power BI Premium subscription. Power BI
uses the concept of workloads to describe Premium content. Workloads include
datasets, paginated reports, dataflows, and AI. The dataflows workload lets you use
dataflows self-service data preparation to ingest, transform, integrate, and enrich data.
Power BI Premium dataflows are managed in the Admin portal.
The following sections describe how to enable dataflows in your organization, how to
refine their settings in your Premium capacity, and guidance for common usage.
After you enable the dataflows workload, it's configured with default settings. You
might want to tweak these settings as you see fit. Next, we'll describe where these
settings live, describe each of them, and help you understand when you might want to
change the values to optimize your dataflow performance.
Refining dataflow settings in Premium
Once dataflows are enabled, you can use the Admin portal to change, or refine, how
dataflows are created and how they use resources in your Power BI Premium
subscription. Power BI Premium doesn't require memory settings to be changed.
Memory in Power BI Premium is automatically managed by the underlying system. The
following steps show how to adjust your dataflow settings.
1. In the Admin portal, select Tenant settings to list all capacities that have been
created. Select a capacity to manage its settings.
2. Your Power BI Premium capacity reflects the resources available for your dataflows.
You can change your capacity's size by selecting the Change size button, as shown
in the following image.
Premium capacity SKUs - scale up the hardware
Power BI Premium workloads use v-cores to serve fast queries across the various
workload types. Capacities and SKUs includes a chart that illustrates the current
specifications across each of the available workload offerings. Capacities of A3 and
greater can take advantage of the compute engine, so when you want to use the
enhanced compute engine, start there.
1. A key factor in slow refresh times is the nature of your data preparation. Whenever
you can, optimize slow refresh times by having the data source do the preparation
and perform upfront query logic. Specifically, when using a relational database such
as SQL as your source, see if the initial query can be run on the source, and use that
source query for your initial extraction dataflow for the data source (see the sketch
after this list). If you cannot use a native query in the source system, perform
operations that the dataflows engine can fold to the data source.
2. Evaluate spreading out refresh times on the same capacity. Refresh operations are
a process that requires significant compute. Using our restaurant analogy,
spreading out refresh times is akin to limiting the number of guests in your
restaurant. Just as restaurants will schedule guests and plan for capacity, you also
want to consider refresh operations during times when usage is not at its full peak.
This can go a long way toward alleviating strain on the capacity.
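Here's the sketch referenced in step 1: instead of importing a large detail table and
aggregating it in the dataflow, a native source query can return the already-aggregated
result, so the dataflow only ingests the summary. The table and column names below
are hypothetical:
SQL
-- Hypothetical native source query: let the database aggregate the detail
-- rows so the dataflow only ingests the summarized result
SELECT
    CustomerID,
    CAST(OrderDate AS date) AS OrderDay,
    SUM(SalesAmount)        AS DailySales
FROM dbo.SalesDetail
GROUP BY CustomerID, CAST(OrderDate AS date);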
If the steps in this section don't provide the desired degree of parallelism, consider
upgrading your capacity to a higher SKU. Then follow the previous steps in this
sequence again.
1. For ingestion focus on getting the data into the storage as fast as possible, using
filters only if they reduce the overall dataset size. It's best practice to keep your
transformation logic separate from this step, and allow the engine to focus on the
initial gathering of ingredients. Next, separate your transformation and business
logic into a separate dataflow in the same workspace, using linked or computed
entities; doing so allows for the engine to activate and accelerate your
computations. Your logic needs to be prepared separately before it can take
advantage of the compute engine.
2. Ensure you perform the operations that fold, such as merges, joins, conversions,
and others.
2. When you perform your initial refresh with the compute engine turned on, data
gets written both to the lake and to the cache. This double write means these
refreshes will be slower.
3. If you have a dataflow linking to multiple dataflows, make sure you schedule
refreshes of the source dataflows so that they do not all refresh at the same time.
Next steps
The following articles provide more information about dataflows and Power BI:
Data used with Power BI is stored in internal storage provided by Power BI by default.
With the integration of dataflows and Azure Data Lake Storage Gen 2 (ADLS Gen2), you
can store your dataflows in your organization's Azure Data Lake Storage Gen2 account.
This feature essentially allows you to "bring your own storage" to Power BI dataflows,
and establish a connection at the tenant or workspace level.
There are two ways to configure which ADLS Gen 2 store to use: you can use a tenant-
assigned ADLS Gen 2 account, or you can bring your own ADLS Gen 2 store at a
workspace level.
Prerequisites
To bring your own ADLS Gen 2 account, you must have Owner permission at the
storage account layer. Permissions at the resource group or subscription level
won't work. If you're an administrator, you still must assign yourself the Owner
permission. ADLS Gen2 storage accounts behind a firewall are currently not
supported.
The storage account must be created with the Hierarchical Namespace (HNS)
enabled.
The storage account must be created in the same Azure Active Directory (Azure
AD) tenant as the Power BI tenant.
The user must have Storage Blob Data Owner role, Storage Blob Data Reader role,
and an Owner role at the storage account level (scope should be this resource and
not inherited). Any applied role changes might take a few minutes to sync, and
must sync before the following steps can be completed in the Power BI service.
The Power BI workspace tenant region should be the same as the storage account
region.
TLS (Transport Layer Security) version 1.2 (or higher) is required to secure your
endpoints. Web browsers and other client applications that use TLS versions earlier
than TLS 1.2 won't be able to connect.
Finally, you can connect to any ADLS Gen 2 from the Admin portal, but if you
connect directly to a workspace, you must first ensure there are no dataflows in the
workspace before connecting.
7 Note
Bring your own storage (Azure Data Lake Gen 2) is not available in the Power BI
service for U.S. Government GCC customers. For more information about which
features are available, and which are not, see Power BI feature availability for U.S.
Government customers.
The following table describes the permissions for ADLS and for Power BI required for
ADLS Gen 2 and Power BI:
The Use default Azure connection option is visible if your admin has already configured a tenant-assigned ADLS Gen 2 account. You have two options:
Use the tenant configured ADLS Gen 2 account by selecting the box called Use the
default Azure connection, or
Select Connect to Azure to point to a new Azure Storage account.
When you select Connect to Azure, Power BI retrieves a list of Azure subscriptions to
which you have access. Fill in the dropdowns. Then choose a valid Azure subscription,
resource group, and storage account that has the hierarchical namespace option
enabled, which is the ADLS Gen2 flag. The personal account used to connect to Azure is
only used once, to set the initial connection and grant the Power BI service account
rights to read and write data, after which the original user account is no longer needed
to keep the connection active.
After you make your selection, select Save. You've now successfully connected the workspace to your own ADLS Gen2 account. Power BI automatically configures the
storage account with the required permissions, and sets up the Power BI filesystem
where the data will be written. At this point, every dataflow’s data inside this workspace
will write directly to this filesystem, which can be used with other Azure services. You
now have a single source for all of your organizational or departmental data.
You can optionally configure tenant-level storage if you want to use a centralized data lake only, or if you want this storage to be the default option. We don't automatically start by using the default, so that you keep the flexibility to configure the workspaces that use this connection as you see fit. If you configure a tenant-assigned ADLS Gen 2 account, you still have to configure each workspace to use this default option.
The structure of the powerbi container looks like this: <workspace name>/<dataflow name>/model.json, and <workspace name>/<dataflow name>/model.json.snapshots/<all snapshots>.
The following example uses the Orders table of the Northwind Odata sample.
We only write to this storage account and don't currently delete data. Even after you detach, we don't delete anything from the ADLS account, so all of the files mentioned in the preceding list remain stored.
7 Note
The storage structure adheres to the Common Data Model format. Learn more about
the storage structure and CDM by visiting What is the storage structure for analytical
dataflows and Use the Common Data Model to optimize Azure Data Lake Storage Gen2.
After it's properly configured, the data and metadata are under your control. Many applications are aware of the CDM, and the data can be extended by using Azure, Power Apps, and Power Automate. You can also use third-party ecosystems either by conforming to the format or by reading the raw data.
1. Export a copy of the dataflow from Power BI. Or, copy the model.json file. The
model.json file is stored in ADLS.
3. Detach ADLS.
4. Recreate the dataflows by using import. Incremental refresh data (if applicable) will
need to be deleted prior to import. This action can be done by deleting the
relevant partitions in the model.json file.
Next steps
The following articles provide more information about dataflows and Power BI:
Dataflows are supported for Power BI Pro, Premium Per User (PPU), and Power BI Premium users. Some features are only available with a Power BI Premium subscription (either a Premium capacity or a PPU license). This article describes the PPU and Premium-only features and their uses.
The following features are available only with Power BI Premium (PPU or a Premium
capacity subscription):
7 Note
Dataflows are not available in the Power BI service for U.S. Government DoD
customers. For more information about which features are available, and which are
not, see Power BI feature availability for U.S. Government customers.
) Important
This article applies to the first generation of dataflows (Gen1), and does not apply
to the second generation (Gen2) of dataflows, which are available in Microsoft
Fabric (preview). For more information, see Getting from dataflows Generation 1 to
dataflows Generation 2.
Drastically reduces the refresh time required for long-running ETL (extract,
transform, load) steps over computed entities, such as performing joins, distinct,
filters, and group by.
Performs DirectQuery queries over entities.
7 Note
The validation and refresh processes inform dataflows of the model schema.
To set the schema of the tables yourself, use the Power Query Editor and set
data types.
This feature is available on all Power BI clusters except WABI-INDIA-CENTRAL-A-PRIMARY.
) Important
The enhanced compute engine works only for A3 or larger Power BI capacities.
In Power BI Premium, the enhanced compute engine is individually set for each
dataflow. There are three configurations to choose from:
Disabled
Optimized (the default)
On
To change the default setting and enable the enhanced compute engine, do the
following steps:
1. In your workspace, next to the dataflow you want to change the settings for, select More options.
2. From the menu, select Settings.
3. Expand Enhanced compute engine settings.
4. In the Enhanced compute engine settings, select On and then choose Apply.
Use the enhanced compute engine
After the enhanced compute engine is on, return to dataflows and you should see a
performance improvement in any computed table that performs complex operations,
such as joins or group by operations for dataflows created from existing linked entities
on the same capacity.
To make the best use of the compute engine, split the ETL stage into two separate
dataflows, in the following way:
Dataflow 1 - this dataflow should only ingest all of the required data from a data source.
Dataflow 2 - perform all ETL operations in this second dataflow, but ensure you're referencing Dataflow 1, which should be on the same capacity. Also ensure you perform the operations that can fold (filter, group by, distinct, join) first, before any other operation, to ensure the compute engine is utilized.
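As a rough sketch of what Dataflow 2 might look like, the following assumes Orders and Customers are linked entities that reference Dataflow 1 in the same Premium workspace; the column names are illustrative.

```powerquery-m
// Dataflow 2 (transformations): reference entities from Dataflow 1 and
// perform fold-friendly operations first so the compute engine can be used.
let
    FilteredOrders = Table.SelectRows(Orders, each [OrderDate] >= #date(2023, 1, 1)),
    Merged = Table.NestedJoin(
        FilteredOrders, {"CustomerId"},
        Customers, {"CustomerId"},
        "Customer", JoinKind.Inner),
    Expanded = Table.ExpandTableColumn(Merged, "Customer", {"Region"}),
    Grouped = Table.Group(Expanded, {"Region"},
        {{"TotalAmount", each List.Sum([Amount]), type number}})
in
    Grouped
```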
Question: I've enabled the enhanced compute engine, but my refreshes are slower. Why?
Answer: If you enable the enhanced compute engine, there are two possible explanations that could lead to slower refresh times; the most common causes are covered in the troubleshooting steps later in this article.
Question: Why isn't the enhanced compute engine available in my region?
Answer: The enhanced compute engine is being released in stages to regions around the world, but isn't yet available in every region.
Question: What are the supported data types for the compute engine?
Answer: The enhanced compute engine and dataflows currently support the following
data types. If your dataflow doesn't use one of the following data types, an error occurs
during refresh:
Date/time
Decimal number
Text
Whole number
Date/time/zone
True/false
Date
Time
Using DirectQuery with dataflows enables the following enhancements to your Power BI
and dataflows processes:
Filtering data - DirectQuery is useful for working on a filtered view of data inside a
dataflow. You can use DirectQuery with the compute engine to filter dataflow data
and work with the filtered subset you need. Filtering data lets you work with a
smaller and more manageable subset of the data in your dataflow.
To learn more about DirectQuery with dataflows, see Using DirectQuery with dataflows.
After you've applied that setting, refresh the dataflow for the optimization to take effect.
Composite/mixed models that have import and DirectQuery data sources are
currently not supported.
Large dataflows might have trouble with timeout issues when viewing
visualizations. Large dataflows that run into trouble with timeout issues should use
Import mode.
Under data source settings, the dataflow connector will show invalid credentials if
you're using DirectQuery. This warning doesn't affect the behavior, and the dataset
will work properly.
Computed entities
You can perform in-storage computations when using dataflows with a Power BI
Premium subscription. This feature lets you perform calculations on your existing
dataflows, and return results that enable you to focus on report creation and analytics.
To perform in-storage computations, you first must create the dataflow and bring data
into that Power BI dataflow storage. After you have a dataflow that contains data, you
can create computed entities, which are entities that perform in-storage computations.
As a best practice, when doing computations on data joined by on-premises and cloud
data, create a new dataflow for each source (one for on-premises and one for cloud)
and then create a third dataflow to merge/compute over these two data sources.
Linked entities
You can reference existing dataflows by using linked entities with a Power BI Premium
subscription, which lets you either perform calculations on these entities using
computed entities or allows you to create a "single source of the truth" table that you
can reuse within multiple dataflows.
Incremental refresh
Dataflows can be set to refresh incrementally to avoid having to pull all the data on
every refresh. To do so, select the dataflow then choose the Incremental Refresh icon.
Setting incremental refresh adds parameters to the dataflow to specify the date range.
For detailed information on how to set up incremental refresh, see Using incremental refresh with dataflows.
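As a minimal sketch, the filter that incremental refresh relies on looks something like the following. The parameter names follow the RangeStart/RangeEnd convention that Power BI uses for incremental refresh date ranges; the connection details, table, and column (assumed to be a datetime column) are placeholders, and the filter should fold to the source.

```powerquery-m
// Illustrative only: connection details and column names are placeholders.
let
    Source = Sql.Database("myserver.database.windows.net", "SalesDb"),
    Orders = Source{[Schema = "dbo", Item = "Orders"]}[Data],
    // Only rows inside the partition's date range are read on each refresh.
    Filtered = Table.SelectRows(Orders,
        each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
in
    Filtered
```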
Next steps
The following articles provide more information about dataflows and Power BI:
Creating a dataflow
AI with dataflows
This article shows how you can use artificial intelligence (AI) with dataflows. This article
describes:
Cognitive Services
Automated machine learning
Azure Machine Learning Integration
The services that are supported today are Sentiment Analysis, Key Phrase Extraction,
Language Detection, and Image Tagging. The transformations are executed on the
Power BI service and don't require an Azure Cognitive Services subscription. This feature
requires Power BI Premium.
Enable AI features
Cognitive services are supported for Premium capacity nodes EM2, A2, or P1 and other
nodes with more resources. Cognitive services are also available with a Premium Per
User (PPU) license. A separate AI workload on the capacity is used to run cognitive
services. Before you use cognitive services in Power BI, the AI workload needs to be
enabled in the Capacity settings of the Admin portal. You can turn on the AI workload
in the workloads section and define the maximum amount of memory you would like
this workload to consume. The recommended memory limit is 20%. Exceeding this limit
causes the query to slow down.
Get started with Cognitive Services in Power BI
Cognitive Services transforms are part of the Self-Service Data Prep for dataflows. To enrich your data with Cognitive Services, start by editing a dataflow.
Select the AI Insights button in the top ribbon of the Power Query Editor.
In the pop-up window, select the function you want to use and the data you want to
transform. This example scores the sentiment of a column that contains review text.
LanguageISOCode is an optional input to specify the language of the text. This input expects an ISO code. You can use a column as the input for LanguageISOCode, or you can use a static value. In this example, the language is specified as English (en) for the whole column. If you leave this input blank, Power BI automatically detects the language before applying the function. Next, select Invoke.
After you invoke the function, the result is added as a new column to the table. The
transformation is also added as an applied step in the query.
If the function returns multiple output values, invoking the function adds a new column that contains a record of those outputs.
Use the expand option to add one or both values as columns to your data.
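For reference, the applied step that the AI Insights dialog generates looks roughly like the following sketch. ScoreSentiment, Reviews, ReviewText, and the output field names are placeholders; the exact function name and output shape produced by AI Insights can differ.

```powerquery-m
// Illustrative only: ScoreSentiment is a placeholder for the generated function.
let
    WithSentiment = Table.AddColumn(Reviews, "Sentiment",
        each ScoreSentiment([ReviewText], "en")),
    // If the function returns several outputs as a record, expand them into columns.
    Expanded = Table.ExpandRecordColumn(WithSentiment, "Sentiment",
        {"Score", "Language"}, {"Sentiment score", "Detected language"})
in
    Expanded
```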
Available functions
This section describes the available functions in Cognitive Services in Power BI.
Detect Language
The language detection function evaluates text input and, for each column, returns the
language name and ISO identifier. This function is useful for data columns that collect
arbitrary text, where language is unknown. The function expects data in text format as
input.
Text Analytics recognizes up to 120 languages. For more information, see What is
language detection in Azure Cognitive Service for Language.
Key phrase extraction works best when you give it bigger chunks of text to work on, which is the opposite of sentiment analysis; sentiment analysis performs better on smaller blocks of text. To get the best results from both operations, consider restructuring the inputs accordingly.
Score Sentiment
The Score Sentiment function evaluates text input and returns a sentiment score for
each document, ranging from 0 (negative) to 1 (positive). This function is useful for
detecting positive and negative sentiment in social media, customer reviews, and
discussion forums.
Currently, Sentiment Analysis supports English, German, Spanish, and French. Other
languages are in preview. For more information, see What is language detection in
Azure Cognitive Service for Language.
Tag Images
The Tag Images function returns tags based on more than 2,000 recognizable objects,
living beings, scenery, and actions. When tags are ambiguous or not common
knowledge, the output provides 'hints' to clarify the meaning of the tag in context of a
known setting. Tags aren't organized as a taxonomy, and no inheritance hierarchies
exist. A collection of content tags forms the foundation for an image 'description'
displayed as human readable language formatted in complete sentences.
This function requires an image URL or a base-64 encoded image column as input. At this time, image tagging supports English, Spanish, Japanese, Portuguese, and Simplified Chinese. For more information, see ComputerVision Interface.
Automated machine learning is available for dataflows that are hosted on Power BI
Premium and Embedded capacities only.
Other pages of the generated report show the statistical summary of the model and the
training details. The statistical summary is of interest to users who would like to see the
standard data science measures of model performance. The training details summarize
all the iterations that were run to create your model, with the associated modeling
parameters. It also describes how each input was used to create the ML model.
You can then apply your ML model to your data for scoring. When the dataflow is
refreshed, your data is updated with predictions from your ML model. Power BI also
includes an individualized explanation for each specific prediction that the ML model
produces.
AutoML has specific data requirements for training a machine learning model. These
requirements are described in the following sections, based on respective model types.
To create an AutoML model, select the ML icon in the Actions column of the dataflow
table, and select Add a machine learning model.
A simplified experience launches, consisting of a wizard that guides you through the
process of creating the ML model. The wizard includes the following simple steps.
1. Select the table with the historical data, and choose the outcome column for which
you want a prediction
The outcome column identifies the label attribute for training the ML model, shown in
the following image.
2. Choose a model type
When you specify the outcome column, AutoML analyzes the label data to recommend
the most likely ML model type that can be trained. You can pick a different model type
as shown in the following image by clicking on Choose a model.
7 Note
Some model types might not be supported for the data that you have selected, and so they are disabled. In the previous example, Regression is disabled because a text column is selected as the outcome column.
3. Select the inputs you want the model to use as predictive signals
AutoML analyzes a sample of the selected table to suggest the inputs that can be used
for training the ML model. Explanations are provided next to columns that aren't
selected. If a particular column has too many distinct values or only one value, or low or
high correlation with the output column, it isn't recommended.
Any inputs that are dependent on the outcome column (or the label column) shouldn't
be used for training the ML model, since they affect its performance. Such columns are
flagged as having “suspiciously high correlation with output column”. Introducing these
columns into the training data causes label leakage, where the model performs well on
the validation or test data but can't match that performance when used in production
for scoring. Label leakage could be a possible concern in AutoML models when training
model performance is too good to be true.
This recommendation is based on a sample of the data, so you should review the inputs used. You can change the selections to include only the columns you want the model to study. You can also select all the columns by selecting the checkbox next to the table name.
In the final step, name the model and select Save, which begins training the ML model. You can choose to reduce the training time to see quick results, or increase the amount of time spent in training to get the best model.
ML model training
Training of AutoML models is a part of the dataflow refresh. AutoML first prepares your
data for training. AutoML splits the historical data you provide into training and testing
datasets. The test dataset is a holdout set that is used for validating the model
performance after training. These sets are realized as Training and Testing tables in the
dataflow. AutoML uses cross-validation for the model validation.
Next, each input column is analyzed and imputation is applied, which replaces any
missing values with substituted values. A couple of different imputation strategies are
used by AutoML. For input attributes treated as numeric features, the mean of the
column values is used for imputation. For input attributes treated as categorical features,
AutoML uses the mode of the column values for imputation. The AutoML framework
calculates the mean and mode of values used for imputation on the subsampled
training dataset.
Then, sampling and normalization are applied to your data as required. For classification
models, AutoML runs the input data through stratified sampling and balances the
classes to ensure the row counts are equal for all.
AutoML applies several transformations on each selected input column based on its
data type and statistical properties. AutoML uses these transformations to extract
features for use in training your ML model.
The training process for AutoML models consists of up to 50 iterations with different
modeling algorithms and hyperparameter settings to find the model with the best
performance. Training can end early, with fewer iterations, if AutoML observes no further performance improvement. AutoML assesses the performance of each of these models by validating with the holdout test dataset. During this training step, AutoML creates several pipelines for training and validation of these iterations. The process of assessing the performance of the models can take time, anywhere from several minutes to a couple of hours, up to the training time configured in the wizard.
The time taken depends on the size of your dataset and the capacity resources available.
In some cases, the final model generated might use ensemble learning, where multiple
models are used to deliver better predictive performance.
After the model has been trained, AutoML analyzes the relationship between the input
features and the model output. It assesses the magnitude of change to the model
output for the holdout test dataset for each input feature. This relationship is known as
the feature importance. This analysis happens as a part of the refresh after training is
complete. Hence your refresh might take longer than the training time configured in the
wizard.
The charts and measures used to describe the model performance in the report depend
on the model type. These performance charts and measures are described in the
following sections.
Other pages in the report might describe statistical measures about the model from a
data science perspective. For instance, the Binary Prediction report includes a gain chart
and the ROC curve for the model.
The reports also include a Training Details page that includes a description of how the
model was trained, and a chart describing the model performance over each of the
iterations run.
Another section on this page describes the detected type of the input column and
imputation method used for filling missing values. It also includes the parameters used
by the final model.
If the model produced uses ensemble learning, then the Training Details page also
includes a chart showing the weight of each constituent model in the ensemble and its
parameters.
Applying the ML model creates two new dataflow tables that contain the predictions
and individualized explanations for each row that it scores in the output table. For
instance, if you apply the PurchaseIntent model to the OnlineShoppers table, the output
generates the OnlineShoppers enriched PurchaseIntent and OnlineShoppers enriched
PurchaseIntent explanations tables. For each row in the enriched table, the explanation is broken down into multiple rows in the enriched explanations table, based on the input feature. An ExplanationIndex column helps map the rows from the enriched explanations table back to the corresponding row in the enriched table.
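If you want to bring the per-feature explanations alongside the scored rows, a minimal sketch is a join on ExplanationIndex, assuming both output tables have already been loaded as queries (the names follow the example above; expand only the explanation columns you actually need):

```powerquery-m
// Join each scored row to its explanation rows by using ExplanationIndex.
let
    Enriched = #"OnlineShoppers enriched PurchaseIntent",
    Explanations = #"OnlineShoppers enriched PurchaseIntent explanations",
    Joined = Table.NestedJoin(
        Enriched, {"ExplanationIndex"},
        Explanations, {"ExplanationIndex"},
        "ExplanationDetail", JoinKind.LeftOuter)
in
    Joined  // Expand the explanation columns you need from ExplanationDetail
```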
You can also apply any Power BI AutoML model to tables in any dataflow in the same
workspace by using AI Insights in the PQO function browser. This way, you can use
models created by others in the same workspace without necessarily being an owner of
the dataflow that has the model. Power Query discovers all the Power BI ML models in
the workspace and exposes them as dynamic Power Query functions. You can invoke
those functions by accessing them from the ribbon in Power Query Editor or by invoking
the M function directly. This functionality is currently only supported for Power BI
dataflows and for Power Query Online in the Power BI service. This process is different
from applying ML models within a dataflow using the AutoML wizard. There's no
explanations table created by using this method. Unless you're the owner of the
dataflow, you can't access model training reports or retrain the model. Also, if the source model is edited by adding or removing input columns, or if the model or source dataflow is deleted, then this dependent dataflow breaks.
After you apply the model, AutoML always keeps your predictions up-to-date whenever
the dataflow is refreshed.
To use the insights and predictions from the ML model in a Power BI report, you can
connect to the output table from Power BI Desktop by using the dataflows connector.
The output of a Binary Prediction model is a probability score, which identifies the
likelihood that the target outcome will be achieved.
Pre-requisites:
The process of creation for a Binary Prediction model follows the same steps as other
AutoML models, described in the previous section, Configure the ML model inputs. The
only difference is in the Choose a model step where you can select the target outcome
value that you’re most interested in. You can also provide friendly labels for the
outcomes to be used in the automatically generated report that summarizes the results
of the model validation.
The Binary Prediction model produces as an output a probability that a row will achieve
the target outcome. The report includes a slicer for the probability threshold, which
influences how the scores greater and less than the probability threshold are
interpreted.
The report describes the performance of the model in terms of True Positives, False
Positives, True Negatives, and False Negatives. True Positives and True Negatives are
correctly predicted outcomes for the two classes in the outcome data. False Positives are rows that were predicted to have the target outcome but actually didn't. Conversely, False Negatives are rows that actually had the target outcome but were predicted not to.
Measures, such as Precision and Recall, describe the effect of the probability threshold
on the predicted outcomes. You can use the probability threshold slicer to select a
threshold that achieves a balanced compromise between Precision and Recall.
The report also includes a Cost-Benefit analysis tool to help identify the subset of the
population that should be targeted to yield the highest profit. Given an estimated unit
cost of targeting and a unit benefit from achieving a target outcome, Cost-Benefit
analysis attempts to maximize profit. You can use this tool to pick your probability
threshold based on the maximum point in the graph to maximize profit. You can also
use the graph to compute the profit or cost for your choice of probability threshold.
The Accuracy Report page of the model report includes the Cumulative Gains chart and
the ROC curve for the model. This data provides statistical measures of the model
performance. The reports include descriptions of the charts shown.
Apply a Binary Prediction model
To apply a Binary Prediction model, you must specify the table with the data to which
you want to apply the predictions from the ML model. Other parameters include the
output column name prefix and the probability threshold for classifying the predicted
outcome.
When a Binary Prediction model is applied, it adds four output columns to the enriched
output table: Outcome, PredictionScore, PredictionExplanation, and ExplanationIndex.
The column names in the table have the prefix specified when the model is applied.
The Outcome column contains the predicted outcome label. Records with probabilities
exceeding the threshold are predicted as likely to achieve the target outcome and are
labeled as True. Records less than the threshold are predicted as unlikely to achieve the
outcome and are labeled as False.
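Conceptually, the Outcome label is just the PredictionScore compared against the threshold. The following sketch reproduces that logic for a custom threshold; the query name, the column name (which carries the prefix you chose when applying the model), and the score scale are assumptions to adapt.

```powerquery-m
// Illustrative only: derive a custom outcome label from the prediction score.
let
    Scored = #"OnlineShoppers enriched PurchaseIntent",
    Threshold = 0.5,  // use the same scale as your PredictionScore column
    WithCustomOutcome = Table.AddColumn(Scored, "CustomOutcome",
        each [PredictionScore] >= Threshold, type logical)
in
    WithCustomOutcome
```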
Classification models
Classification models are used to classify a dataset into multiple groups or classes.
They're used to predict events that can have one of the multiple possible outcomes. For
instance, whether a customer is likely to have a high, medium, or low Lifetime Value.
They can also predict whether the risk of default is high, moderate, low, and so on.
The output of a Classification model is a probability score, which identifies the likelihood
that a row will achieve the criteria for a given class.
The input table containing your training data for a classification model must have a
string or whole number column as the outcome column, which identifies the past known
outcomes.
Pre-requisites:
The process of creation for a classification model follows the same steps as other
AutoML models, described in the previous section, Configure the ML model inputs.
The model report includes a chart that shows the breakdown of the correctly and incorrectly classified rows for each known class.
A further class-specific drill-down action enables an analysis of how the predictions for a
known class are distributed. This analysis shows the other classes in which rows of that
known class are likely to be misclassified.
The model explanation in the report also includes the top predictors for each class.
The classification model report also includes a Training Details page similar to the pages
for other model types, as described earlier, in AutoML model report.
When a classification model is applied, it adds five output columns to the enriched
output table: ClassificationScore, ClassificationResult, ClassificationExplanation,
ClassProbabilities, and ExplanationIndex. The column names in the table have the
prefix specified when the model is applied.
The ClassProbabilities column contains the list of probability scores for the row for each
possible class.
The ClassificationScore is the percentage probability, which identifies the likelihood that
a row will achieve the criteria for a given class.
The ClassificationResult column contains the most likely predicted class for the row.
Regression models
Regression models are used to predict a numeric value and can be used in scenarios like
determining:
Pre-requisites:
The process of creation for a Regression model follows the same steps as other AutoML
models, described in the previous section, Configure the ML model inputs.
The model report includes a chart that compares the predicted values to the actual
values. In this chart, the distance from the diagonal indicates the error in the prediction.
The residual error chart shows the distribution of the percentage of average error for
different values in the holdout test dataset. The horizontal axis represents the mean of
the actual value for the group. The size of the bubble shows the frequency or count of
values in that range. The vertical axis is the average residual error.
The Regression model report also includes a Training Details page like the reports for
other model types, as described in the previous section, AutoML model report.
When a Regression model is applied, it adds three output columns to the enriched
output table: RegressionResult, RegressionExplanation, and ExplanationIndex. The
column names in the table have the prefix specified when the model is applied.
The RegressionResult column contains the predicted value for the row based on the
input columns. The RegressionExplanation column contains an explanation with the
specific influence that the input features had on the RegressionResult.
To use this capability, a data scientist can grant access to the Azure Machine Learning
model to the BI analyst by using the Azure portal. Then, at the start of each session,
Power Query discovers all the Azure Machine Learning models to which the user has
access and exposes them as dynamic Power Query functions. The user can then invoke
those functions by accessing them from the ribbon in Power Query Editor, or by
invoking the M function directly. Power BI also automatically batches the access
requests when invoking the Azure Machine Learning model for a set of rows to achieve
better performance.
This functionality is currently only supported for Power BI dataflows and for Power
Query online in the Power BI service.
To learn more about dataflows, see Introduction to dataflows and self-service data prep.
The steps in this article describe how to grant a Power BI user access to a model hosted
on the Azure Machine Learning service to access this model as a Power Query function.
For more information, see Assign Azure roles using the Azure portal.
2. Go to the Subscriptions page. You can find the Subscriptions page through the All
Services list in the nav pane menu of the Azure portal.
3. Select your subscription.
4. Select Access Control (IAM), and then choose the Add button.
5. Select Reader as the Role. Then choose the Power BI user to whom you wish to
grant access to the Azure Machine Learning model.
6. Select Save.
7. Repeat steps three through six to grant Reader access to the user for the specific
machine learning workspace hosting the model.
This schema file must be included in the deployed web service for machine learning
models. To automatically generate the schema for web service, you must provide a
sample of the input/output in the entry script for the deployed model. For more
information, see Deploy and score a machine learning model by using an online
endpoint. The link includes the example entry script with the statements for the schema
generation.
These instructions for schema generation by updating the entry script must also be
applied to models created by using automated machine learning experiments with the
Azure Machine Learning SDK.
7 Note
Models created by using the Azure Machine Learning visual interface don't
currently support schema generation but will in subsequent releases.
Selecting the Edit Table button opens the Power Query Editor for the tables in your
dataflow.
Select the AI Insights button in the ribbon, and then select the Azure Machine Learning
Models folder from the nav pane menu. All the Azure Machine Learning models to which
you have access are listed here as Power Query functions. Also, the input parameters for
the Azure Machine Learning model are automatically mapped as parameters of the
corresponding Power Query function.
To invoke an Azure Machine Learning model, you can specify any of the selected table's
columns as an input from the drop-down. You can also specify a constant value to be
used as an input by toggling the column icon to the left of the input dialog.
Select Invoke to view the preview of the Azure Machine Learning model's output as a
new column in the table. The model invocation shows up as an applied step for the
query.
If the model returns multiple output parameters, they're grouped together as a row in
the output column. You can expand the column to produce individual output
parameters in separate columns.
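Put together, the generated query resembles the following sketch. The function name is discovered dynamically from your Azure Machine Learning workspace, so AzureML.MyModel and the input and output names here are placeholders only.

```powerquery-m
// Illustrative only: AzureML.MyModel stands in for the dynamically discovered function.
let
    Scored = Table.AddColumn(Patients, "Prediction",
        each AzureML.MyModel([Age], [BloodPressure])),
    // Multiple output parameters come back as a record; expand the ones you need.
    Expanded = Table.ExpandRecordColumn(Scored, "Prediction",
        {"ScoredLabel", "ScoredProbability"})
in
    Expanded
```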
After you save your dataflow, the model is automatically invoked when the dataflow is
refreshed, for any new or updated rows in the table.
Next steps
This article provided an overview of Automated Machine Learning for Dataflows in the
Power BI service. The following articles might also be useful.
The following articles provide more information about dataflows and Power BI:
Topic | Guidance area | Link to article or content
--- | --- | ---
Power Query | Tips and tricks to get the most of your data wrangling experience | Best practices when working with Power Query
Using computed tables | Performance benefits for using computed tables in a dataflow | Computed tables scenarios
Reusing dataflows | Patterns, guidance, and use cases | Best practices for reusing dataflows across environments and workspaces
Large-scale implementations | Large-scale use and guidance to complement enterprise architecture | Best practices for creating a dimensional model using dataflows
Using enhanced compute | Potentially improve dataflow performance up to 25x | Using the compute engine to improve performance
Optimizing your workload settings | Get the most out of your dataflows infrastructure by understanding the levers you can pull to maximize performance | Configure Power BI Premium dataflow workloads
Query folding guidance | Speeding up transformations using the source system | Power Query query folding
Using data profiling | Understand column quality, distribution, and profile | Using the data profiling tools
Incremental refresh | Load the latest or changed data versus a full reload | Using incremental refresh with dataflows
Next steps
The following articles provide more information about dataflows and Power BI:
Power BI dataflows enable you to connect to, transform, combine, and distribute data
for downstream analytics. A key element in dataflows is the refresh process, which
applies the transformation steps you authored in the dataflows and updates the data in
the items themselves.
To understand run times, performance, and whether you're getting the most out of your
dataflow, you can download the refresh history after you refresh a dataflow.
Understand refreshes
There are two types of refreshes applicable to dataflows:
7 Note
To learn more about incremental refresh and how it works, see Using
incremental refresh with dataflows.
Incremental refresh enables large dataflows in Power BI with the following benefits:
Refreshes are faster after the first refresh, due to the following facts:
Power BI refreshes the last N partitions specified by the user (where partition is
day/week/month, and so on), or
Power BI refreshes only data that needs to be refreshed. For example, refreshing
only the last five days of a 10-year dataset.
Power BI only refreshes data that has changed, as long as you specify the
column you want to check for changes.
In any of these refresh scenarios, if a refresh fails, the data doesn't update. Your data
might be stale until the latest refresh completes, or you can refresh it manually and it
can then complete without error. Refresh occurs at a partition or entity level, so if an incremental refresh fails, or an entity has an error, then the entire refresh transaction doesn't occur. Said another way, if a partition (incremental refresh policy) or entity fails for a dataflow, the entire refresh operation fails, and no data gets updated.
The Refresh History provides an overview of refreshes, including the type – on demand
or scheduled, the duration, and the run status. To see details in the form of a CSV file,
select the download icon on the far right of the refresh description's row. The
downloaded CSV includes the attributes described in the following table. Premium
refreshes provide more information based on the extra compute and dataflows
capabilities, versus Pro based dataflows that reside on shared capacity. As such, some of
the following metrics are available only in Premium.
Item | Description | Pro | Premium
--- | --- | --- | ---
Requested on | Time refresh was scheduled or refresh now was clicked, in local time. | ✔ | ✔
Start time | In Premium, this item is the time the dataflow was queued up for processing for the entity or partition. This time can differ if dataflows have dependencies and need to wait for the result set of an upstream dataflow to begin processing. | ✔ | ✔
End time | End time is the time the dataflow entity or partition completed, if applicable. | ✔ | ✔
Max commit (KB) | Max Commit is the peak commit memory, useful for diagnosing out-of-memory failures when the M query isn't optimized. When you use a gateway on this particular dataflow, this information isn't provided. | | ✔
Wait time | For a given entity or partition, the time that an entity spent in wait status, based on workload on the Premium capacity. | | ✔
Orchestration
Using dataflows in the same workspace allows straightforward orchestration. As an
example, you might have dataflows A, B, and C in a single workspace, chained like A > B > C. If you refresh the source (A), the downstream entities also get refreshed. However, if you refresh C, you have to refresh the others independently. Also, if you add a new data source in dataflow B (which isn't included in A), that data isn't refreshed as a part of orchestration.
You might want to chain items together that don't fit the managed orchestration Power
BI performs. In these scenarios, you can use the APIs and/or use Power Automate. You
can refer to the API documentation and the PowerShell script for programmatic
refresh. There's a Power Automate connector that enables doing this procedure without
writing any code. You can see detailed samples, with specific walk-throughs for
sequential refreshes.
Monitoring
Using the enhanced refresh statistics described earlier in this article, you can get
detailed per-dataflow refresh information. But if you would like to see dataflows with
tenant-wide or workspace-wide overview of refreshes, perhaps to build a monitoring
dashboard, you can use the APIs or Power Automate templates. Similarly, for use cases such as sending simple or complex notifications, you can use the Power Automate connector or build your own custom application by using the APIs.
Timeout errors
Optimizing the time it takes to perform extract, transform, and load (ETL) scenarios is
ideal. In Power BI, the following cases apply:
Some connectors have explicit timeout settings you can configure. For more
information, see Connectors in Power Query.
Power BI dataflows, using Power BI Pro, can also experience timeouts for long
running queries within an entity or dataflows themselves. That limitation doesn't
exist in Power BI Premium workspaces.
Timeout guidance
For example, if you have a dataflow with three tables, no individual table can take more
than two hours and the entire dataflow times out if the duration exceeds three hours.
If you're experiencing timeouts, consider optimizing your dataflow queries, and consider
using query folding on your source systems.
Separately, consider upgrading to Premium Per User, which isn't subject to these time-
outs and offers increased performance due to many Power BI Premium Per User
features.
Long durations
Complex or large dataflows can take more time to refresh, as can poorly optimized
dataflows. The following sections provide guidance on how to mitigate long refresh
durations.
Use linked entities for data that can be used later in other transformations.
Use computed entities to cache data, reducing data loading and data ingestion
burden on source systems.
Split data into staging dataflows and transformation dataflows, separating the ETL
into different dataflows.
Optimize expanding table operations.
Follow guidance for complex dataflows.
Next, it can help to evaluate whether you can use incremental refresh.
Using incremental refresh can improve performance. It's important that the partition
filters are pushed to the source system when queries are submitted for refresh
operations. To push filtering down means the data source should support query folding,
or you can express business logic through a function or other means that can help
Power Query eliminate and filter files or folders. Most data sources that support SQL
queries support query folding, and some OData feeds can also support filtering.
However, data sources like flat files, blobs, and APIs typically don't support filtering. In
cases where the data source back-end doesn't support the filter, it can't be pushed
down. In such cases, the mash-up engine compensates and applies the filter locally,
which might require retrieving the full dataset from the data source. This operation can
cause incremental refresh to be slow, and the process can run out of resources either in
the Power BI service or in the on-premises data gateway, if used.
Given the various levels of query folding support for each data source, you should
perform verification to ensure the filter logic is included in the source queries. To make
this easier, Power BI attempts to perform this verification for you, with step folding
indicators for Power Query Online. Many of these optimizations are design-time
experiences, but after a refresh occurs, you have an opportunity to analyze and optimize
your refresh performance.
Finally, consider optimizing your environment. You can optimize the Power BI
environment by scaling up your capacity, right-sizing data gateways, and reducing
network latency with the following optimizations:
When using capacities available with Power BI Premium or Premium Per User, you
can increase performance by increasing your Premium instance, or assigning the
content to a different capacity.
A gateway is required whenever Power BI needs to access data that isn't available
directly over the Internet. You can install the on-premises data gateway on an on-
premises server, or on a virtual machine.
To understand gateway workloads and sizing recommendations, see On-
premises data gateway sizing.
Also evaluate bringing the data first into a staging dataflow, and referencing it
downstream by using linked and computed entities.
Network latency can affect refresh performance by increasing the time required for
requests to reach the Power BI service, and for responses to be delivered. Tenants
in Power BI are assigned to a specific region. To determine where your tenant is
located, see Find the default region for your organization. When users from a
tenant access the Power BI service, their requests always route to that region. As
requests reach the Power BI service, the service might then send extra requests, for
example, to the underlying data source, or a data gateway—which are also subject
to network latency.
Tools such as Azure Speed Test provide an indication of network latency
between the client and the Azure region. In general, to minimize the impact of
network latency, strive to keep data sources, gateways, and your Power BI
cluster as close as possible. Residing in the same region is preferable. If network
latency is an issue, try locating gateways and data sources closer to your Power
BI cluster by placing them inside cloud-hosted virtual machines.
First, use query folding within the data source itself, which should reduce the load on
the dataflow compute engine directly. Query folding within the data source allows the
source system to do most of the work. The dataflow can then pass through queries in
the native language of the source, rather than having to perform all the computations in
memory after the initial query.
Not all data sources can perform query folding, and even when query folding is possible
there might be dataflows that perform certain transformations that can't fold to the
source. In such cases, the enhanced compute engine is a capability introduced by Power
BI to potentially improve performance by up to 25 times, for transformations specifically.
The following sections provide guidance about using the compute engine, and its
statistics.
2 Warning
During design time the folding indicator in the editor might show that the query
does not fold when consuming data from another dataflow. Check the source
dataflow if enhanced compute is enabled to ensure folding on the source dataflow
is enabled.
Turning on the enhanced compute engine and understanding the various statuses is
helpful. Internally, the enhanced compute engine uses an SQL database to read and
store data. It's best to have your transformations execute against the query engine here.
The following paragraphs provide various situations, and guidance about what to do for
each.
NA - This status means that the compute engine wasn't used, either because the enhanced compute engine was turned off or because the entity isn't a linked or computed entity running on a Premium capacity.
If you're experiencing long durations and still get a status of NA, make sure that it's
turned on and not accidentally turned off. One recommended pattern is to use staging
dataflows to initially get your data into the Power BI service, then build dataflows on top
of this data, after it is in a staging dataflow. That pattern can reduce load on source
systems and, together with the compute engine, provide a speed boost for
transformations and improve performance.
Cached - If you see the cached status, the dataflow data was stored in the compute
engine and available to be referenced as part of another query. This situation is ideal if
you're using it as a linked entity, because the compute engine caches that data for use
downstream. The cached data doesn't need to be refreshed multiple times in the same
dataflow. This situation is also potentially ideal if you want to use it for DirectQuery.
When cached, the performance impact on initial ingestion pays off later, in the same
dataflow or in a different dataflow in the same workspace.
If you have a large duration for the entity, consider turning off the compute engine. To
cache the entity, Power BI writes it to storage and to SQL. If it's a single-use entity, the
performance benefit for users might not be worth the penalty of the double-ingestion.
Folded - Folded means that the dataflow was able to use SQL compute to read data.
The calculated entity used the table from SQL to read data, and the SQL used is related to the constructs of its query.
Folded status appears if, when you're using on-premises or cloud data sources, you first
loaded data into a staging dataflow and referenced that in this dataflow. This status
applies only to entities that reference another entity. It means your queries were run on
top of the SQL engine, and they have the potential to be improved with SQL compute.
To ensure the SQL engine processes your transformations, use transformations that
support SQL folding, such as merge (join), group by (aggregation), and append (union)
actions in the Query Editor.
Cached + Folded - When you see cached + folded, it's likely that the data refresh is
optimized, as you have an entity that both references another entity and is referred to
by another entity upstream. This operation also runs on top of the SQL and, as such,
also has the potential for improvement with SQL compute. To be sure you're getting the
best performance possible, use transformations that support SQL folding, like merge
(join), group by (aggregation), and append (union) actions in the Query Editor.
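As a short sketch of fold-friendly steps over entities that reference a staging dataflow, the following combines append (union), distinct, and group by; SalesEU, SalesUS, and the column names are illustrative.

```powerquery-m
// Append, distinct, and group by can all fold to the SQL-based compute engine.
let
    Combined = Table.Combine({SalesEU, SalesUS}),
    Deduplicated = Table.Distinct(Combined),
    ByProduct = Table.Group(Deduplicated, {"ProductId"},
        {{"UnitsSold", each List.Sum([Quantity]), type number}})
in
    ByProduct
```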
For ingestion, focus on getting the data into storage as fast as possible, using filters only if they reduce the overall dataset size. Keep your transformation logic separate
from this step. Next, separate your transformation and business logic into a separate
dataflow in the same workspace. Use linked or computed entities. Doing so allows for
the engine to activate and accelerate your computations. For a simple analogy, it's like
food preparation in a kitchen: food preparation is typically a separate and distinct step
from gathering your raw ingredients, and a pre-requisite for putting the food in the
oven. Similarly, you need to prepare your logic separately before it can take advantage
of the compute engine.
Ensure you perform the operations that fold, such as merges, joins, conversion, and
others.
Take the following steps when investigating scenarios where the compute engine is on,
but you're seeing poor performance:
Limit computed and linked entities that exist across the workspace.
If your initial refresh is with the compute engine turned on, data gets written in the
lake and in the cache. This double-write results in refreshes being slower.
If you have a dataflow linking to multiple dataflows, make sure you schedule
refreshes of the source dataflows so that they don't all refresh at the same time.
Using DirectQuery with Power BI dataflows lets you connect directly to a dataflow
without the need to import the data into a dataset. There are many reasons why using
DirectQuery with dataflows, rather than importing data, is useful and helpful. The
following are a few examples:
Configurations
To use DirectQuery with dataflows, you must explicitly toggle the enhanced compute
engine to On in dataflow settings. You must then refresh the dataflow before it can be
consumed in DirectQuery mode.
1. Navigate to the Premium dataflow, and set enhanced compute engine to On.
2. Refresh the dataflow.
After you complete the steps, the dataflow is accessible in Power BI Desktop with
DirectQuery mode.
Consumption
When DirectQuery is available for a dataflow, connecting to a dataflow by using the
Dataflows connector prompts you to choose whether to connect to tables through
DirectQuery or Import.
Dataflow entities that support DirectQuery display the View icon in Power BI Desktop,
rather than the Table icon. The View icon appears as two boxes overlaid on each other,
the Table icon is a single table with a grid.
The following image shows the View icon, indicating that the Orders table supports
DirectQuery:
This image shows the Table icon, indicating that the Query table only supports import:
In DirectQuery mode, you can quickly interrogate large-scale datasets locally. However,
you can't currently perform any other transformations.
Next steps
This article provided an overview of using DirectQuery with dataflows. The following
articles might also be useful:
DirectQuery in Power BI
DirectQuery model guidance in Power BI Desktop
The following articles provide more information about dataflows and Power BI:
There are a few dataflow limitations across authoring, refreshes, and capacity
management that users should keep in mind, as described in the following sections.
General limitations
Dataflows might not be available for all U.S. Government DoD customers. Feature
parity across government environments can be found in the Power BI feature
availability for government article.
Deleted datasources aren't removed from the dataflow datasource page, which is a
benign behavior and doesn't affect the refresh or editing of dataflows. In Lineage
View, deleted data sources appear as lineage for a dataflow.
Deleted datasources still appear in the Setting page in the gateway drop-down.
Depth equates to dataflows linked to other dataflows. The current maximum depth
is 32.
Breadth equates to entities within a dataflow.
There's no guidance or limit for the optimal number of entities in a dataflow; however, shared dataflows have a refresh limit of two hours per entity and three hours per dataflow. So if you have two entities, and each takes two hours, you shouldn't put them in the same dataflow.
For Power BI Premium, guidance and limits are based on individual use cases rather than specific requirements. The only limit for Power BI Premium is a 24-hour refresh limit per dataflow.
A Power BI Premium subscription is required in order to refresh more than 10
dataflows cross workspace.
PowerQuery limitations are found in the Power Query Online Limits article.
Power BI dataflows don't support use of global variables in a URL argument.
Multi-Geo is currently not supported unless configuring storage to use your own
Azure Data Lake Gen2 storage account.
Vnet support is achieved by using a gateway.
When you use Computed entities with gateway data sources, the data ingestion
should be performed in different data sources than the computations. The
computed entities should build upon entities that are only used for ingestion, and
not ingest data within their own mash-up steps.
In Power BI dataflows, you can use parameters but you can't edit them unless you
edit the entire dataflow. In this regard, parameters in dataflows behave similarly to declared constants.
Dataflow authoring
When you author dataflows, be mindful of the following considerations:
Authoring in dataflows is done in the Power Query Online (PQO) environment; see
the limitations described in Power Query limits. Because dataflows authoring is
done in the Power Query Online (PQO) environment, updates performed on the dataflows workload configurations only affect refreshes, and don't have an effect on the authoring experience.
Dataflows using gateway data sources don't support multiple credentials for the
same data source.
API considerations
More about supported dataflows REST APIs can be found in the REST API reference.
Here are some considerations to keep in mind:
Importing dataflows that contain linked tables doesn't update the existing
references within the dataflow (these queries should be updated manually before
importing the dataflow).
When you deploy a dataflow, you can use the conflict handlers
GenerateUniqueName and Abort parameters to either abort the operation when it
already exists or instruct the API to automatically create a unique name instead.
Dataflows can be overwritten with the CreateOrOverwrite parameter, if they have
initially been created using the import API.
When a dataflow is refreshed, timeouts in a shared capacity are 2 hours per table,
and 3 hours per dataflow.
Linked tables can't be created in shared dataflows, although they can exist within
the dataflow as long as the Load Enabled property on the query is disabled.
Computed tables can't be created in shared dataflows.
AutoML and Cognitive services aren't available in shared dataflows.
Incremental refresh doesn't work in shared dataflows.
Dataflows in Premium
Dataflows that exist in Premium have the following considerations and limitations.
When refreshing dataflows, timeouts are 24 hours (no distinction for tables and/or
dataflows).
When using a Premium Per User (PPU) license with dataflows, data is cleared when
moving data out of a PPU environment.
When a dataflow is refreshed in a Premium Per User (PPU) context, the data isn't
visible to non-PPU users.
Incremental refresh works with dataflows only when the enhanced compute engine
is enabled.
A linked table can't be joined with a regular table that gets its data from an on-
premises data source.
When a query (query A, for example) is used in the calculation of another query
(query B) in dataflows, query B becomes a calculated table. Calculated tables can't
refer to on-premises sources.
Compute Engine:
While using the compute engine, there's an approximate 10% to 20% initial increase in time for data ingestion.
This increase only applies to the first dataflow that is on the compute engine and reads data from the data source.
Subsequent dataflows that use the source dataflow don't incur the same penalty.
Only certain operations make use of the compute engine, and only when used
through a linked table or as a computed table. A full list of operations is available
in this blog post .
Capacity Management:
The approximate number of containers can be determined by dividing the total
memory allocated to the workload by the amount of memory allocated to a
container.
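For example, with a hypothetical 20 GB of memory allocated to the dataflows workload and 4 GB allocated per container, that works out to roughly 20 / 4 = 5 containers; the actual values depend on your capacity and workload settings.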
Failing to ensure those credentials are the same results in a Key not found
error upon dataset refresh.
Note
If the dataflow structure is changed, such as a new or renamed column, the dataset
will not show the change, and the change may also cause a data refresh to fail in
the Power BI service for the dataset, until refreshed in Power BI Desktop and re-
published.
You can only create one cloud connection of a particular path and type. For
example, you could only create one cloud connection for a given SQL Server server and database combination.
You can create multiple gateway connections.
You can't name or rename cloud data sources; you can name or rename gateway
connections.
ADLS limitations
ADLS isn't available in GCC, GCC High, or DOD environments. For more
information, see Power BI for US government customers.
You must be assigned as an owner of the resource, due to changes in the ADLS
Gen 2 APIs.
Azure subscription migration isn't supported, but there are two alternatives to do
so:
First approach: after migration, the user can detach workspaces and reattach
them. If using the tenant level account, you must detach all workspaces then
detach at the tenant level, and reattach. This can be undesirable for customers
who don't want to delete all of their dataflows, or have many workspaces.
Second approach: if the previous approach isn't feasible, submit a support
request to change the subscription ID in the database.
ADLS doesn't support most elements in the list in the Directories and file names
section of the article for workspace naming and dataflow naming, due to the
following limitations:
Power BI either returns an unhelpful error, or allows the process to happen but
the refresh will fail.
Cross tenant ADLS subscriptions aren't supported. The ADLS attached to Power BI
must be part of the same Azure tenant that Power BI uses for Azure Active
Directory (Azure AD).
Power Query data type → Dataflow data type
Time → Time
Date → Date
DateTime → DateTime
DateTimeZone → DateTimeOffset
Logical → Boolean
Text → String
Any → String
Currency → Decimal
Int8 → Int64
Int16 → Int64
Int32 → Int64
Int64 → Int64
Double → Double
Percentage → Double
Single → Double
Decimal → Double
Number → Double
Next steps
The following articles provide more information about dataflows and Power BI:
Organizations want to work with data as it comes in, not days or weeks later. The vision
of Power BI is simple: the distinctions between batch, real-time, and streaming should
disappear. Users should be able to work with all data as soon as it's available.
Important
Streaming dataflows has been retired, and is no longer available. Azure Stream
Analytics has merged the functionality of streaming dataflows. For more
information about the retirement of streaming dataflows, see the retirement
announcement .
Analysts usually need technical help to deal with streaming data sources, data
preparation, complex time-based operations, and real-time data visualization. IT
departments often rely on custom-built systems, and a combination of technologies
from various vendors, to perform timely analyses on the data. Without this complexity,
they can't provide decision makers with information in near real time.
Streaming dataflows allow authors to connect to, ingest, mash up, model, and build
reports based on streaming data in near real time, directly in the Power BI service. The
service enables drag-and-drop, no-code experiences.
You can mix and match streaming data with batch data if you need to through a user
interface (UI) that includes a diagram view for easy data mashup. The final item
produced is a dataflow, which can be consumed in real time to create highly interactive,
near real-time reporting. All of the data visualization capabilities in Power BI work with
streaming data, just as they do with batch data.
Users can perform data preparation operations like joins and filters. They can also
perform time-window aggregations (such as tumbling, hopping, and session windows)
for group-by operations.
Make confident decisions in near real time. Organizations can be more agile and
take meaningful actions based on the most up-to-date insights.
Democratize streaming data. Organizations can make data more accessible and
easier to interpret with a no-code solution, and this accessibility reduces IT
resources.
Accelerate time to insight by using an end-to-end streaming analytics solution
with integrated data storage and business intelligence.
Requirements
Before you create your first streaming dataflow, make sure that you meet all the
following requirements:
To create and run a streaming dataflow, you need a workspace that's part of a
Premium capacity or Premium Per User (PPU) license.
Important
If you're using a PPU license and you want other users to consume reports
created with streaming dataflows that are updated in real time, they'll also
need a PPU license. They can then consume the reports with the same refresh
frequency that you set up, if that refresh is faster than every 30 minutes.
Enable dataflows for your tenant. For more information, see Enabling dataflows in
Power BI Premium.
To make sure streaming dataflows work in your Premium capacity, the enhanced
compute engine needs to be turned on. The engine is turned on by default, but
Power BI capacity admins can turn it off. If it's off, contact your admin to turn it on.
To create reports that are updated in real time, make sure that your admin
(capacity or Power BI for PPU) has enabled automatic page refresh. Also make sure
that the admin has allowed a minimum refresh interval that matches your needs.
For more information, see Automatic page refresh in Power BI.
You can add and edit tables in your streaming dataflow directly from the workspace in
which your dataflow was created. The main difference with regular dataflows is that you
don't need to worry about refreshes or frequency. Because of the nature of streaming
data, there's a continuous stream coming in. The refresh is constant or infinite unless
you stop it.
Note
You can have only one type of dataflow per workspace. If you already have a
regular dataflow in your Premium workspace, you won't be able to create a
streaming dataflow (and vice versa).
2. Select the New dropdown menu, and then choose Streaming dataflow.
3. On the side pane that opens, you must name your streaming dataflow. Enter a
name in the Name box (1), and then select Create (2).
The empty diagram view for streaming dataflows appears.
The following screenshot shows a finished dataflow. It highlights all the sections
available to you for authoring in the streaming dataflow UI.
1. Ribbon: On the ribbon, sections follow the order of a "classic" analytics process:
inputs (also known as data sources), transformations (streaming ETL operations),
outputs, and a button to save your progress.
2. Diagram view: This view is a graphical representation of your dataflow, from inputs
to operations to outputs.
3. Side pane: Depending on which component you select in the diagram view, you
have settings to modify each input, transformation, or output.
4. Tabs for data preview, authoring errors, and runtime errors: For each card shown,
the data preview shows you results for that step (live for inputs and on-demand for
transformations and outputs).
This section also summarizes any authoring errors or warnings that you might have
in your dataflows. Selecting each error or warning selects that transform. In
addition, you have access to runtime errors after the dataflow is running, such as
dropped messages.
You can always minimize this section of streaming dataflows by selecting the arrow
in the upper-right corner.
The Azure Event Hubs and Azure IoT Hub services are built on a common architecture to
facilitate the fast and scalable ingestion and consumption of events. IoT Hub in
particular is tailored as a central message hub for communications in both directions
between an IoT application and its attached devices.
To configure an event hub as an input for streaming dataflows, select the Event Hub
icon. A card appears in the diagram view, including a side pane for its configuration.
You have the option of pasting the Event Hubs connection string. Streaming dataflows
fill out all the necessary information, including the optional consumer group (which by
default is $Default). If you want to enter all fields manually, you can turn on the manual-
entry toggle to show them. To learn more, see Get an Event Hubs connection string.
After you set up your Event Hubs credentials and select Connect, you can add fields
manually by using + Add field if you know the field names. Alternatively, to detect
fields and data types automatically based on a sample of the incoming messages, select
Autodetect fields. Selecting the gear icon allows you to edit the credentials if needed.
When streaming dataflows detect the fields, you can see them in the list. There's also a
live preview of the incoming messages in the Data Preview table under the diagram
view.
You can always edit the field names, or remove or change the data type, by selecting
more options (...) next to each field. You can also expand, select, and edit any nested
fields from the incoming messages, as shown in the following image.
Azure IoT Hub
IoT Hub is a managed service hosted in the cloud. It acts as a central message hub for
communications in both directions between an IoT application and its attached devices.
You can connect millions of devices and their back-end solutions reliably and securely.
Almost any device can be connected to an IoT hub.
IoT Hub configuration is similar to Event Hubs configuration because of their common
architecture. But there are some differences, including where to find the Event Hubs-
compatible connection string for the built-in endpoint. To learn more, see Read device-
to-cloud messages from the built-in endpoint.
After you paste the connection string for the built-in endpoint, all functionality for
selecting, adding, autodetecting, and editing fields coming in from IoT Hub is the same
as in Event Hubs. You can also edit the credentials by selecting the gear icon.
Tip
If you have access to Event Hubs or IoT Hub in your organization's Azure portal,
and you want to use it as an input for your streaming dataflow, you can find the
connection strings in the following locations:
1. In the Internet of Things section, select All Services > IoT Hubs.
2. Select the IoT hub that you want to connect to, and then select Built-in
endpoints.
3. Select Copy to clipboard next to the Event Hubs-compatible endpoint.
When you use streaming data from Event Hubs or IoT Hub, you have access to the
following metadata time fields in your streaming dataflow:
EventProcessedUtcTime: The date and time that the event was processed.
EventEnqueuedUtcTime: The date and time that the event was received.
Neither of these fields appears in the input preview. You need to add them manually.
Blob storage
Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is
optimized for storing massive amounts of unstructured data. Unstructured data is data
that doesn't adhere to a particular data model or definition, such as text or binary data.
You can use Azure Blobs as a streaming or reference input. Streaming blobs are checked
every second for updates. Unlike a streaming blob, a reference blob is only loaded at
the beginning of the refresh. It's static data that isn’t expected to change, and the
recommended limit for static data is 50 MB or less.
Power BI expects reference blobs to be used alongside streaming sources, for example,
through a JOIN. Hence, a streaming dataflow with a reference blob must also have a
streaming source.
The configuration for Azure Blobs is slightly different from that of an Azure Event Hubs
node. To find your Azure Blob storage connection string, see View account access keys.
After you enter the Blob connection string, you need to provide the name of your
container. You also need to enter the path pattern within your directory to access the
files you want to set as the source for your dataflow.
For streaming blobs, the directory path pattern is expected to be a dynamic value. The
date is required to be a part of the file path for the blob, referenced as {date}.
Furthermore, an asterisk (*) in the path pattern, like {date}/{time}/*.json, isn't
supported.
For example, if you have a container called ExampleContainer in which you're storing
nested .json files, where the first level is the date of creation and the second level is the
hour of creation (yyyy-mm-dd/hh), then your Container input would be "ExampleContainer".
The Directory path pattern would be "{date}/{time}", where you could modify the date
and time pattern.
After your blob is connected to the endpoint, all functionality for selecting, adding,
autodetecting, and editing fields coming in from Azure Blob is the same as in Event
Hubs. You can also edit the credentials by selecting the gear icon.
Often, when working with real-time data, the data is condensed, and identifiers are used
to represent the object. A possible use case for blobs is as reference data for your
streaming sources. Reference data allows you to join static data to streaming data to
enrich your streams for analysis. For example, suppose you install sensors at different
department stores to measure how many people enter each store at a given time.
Usually, the sensor ID needs to be joined to a static table that indicates which
department store, and which location, each sensor belongs to. With reference data, you
can join this data during the ingestion phase, making it easy to see which store has the
most visitors.
Note
A Streaming Dataflows job pulls data from Azure Blob storage or ADLS Gen2 input
every second if the blob file is available. If the blob file is unavailable, there is an
exponential backoff with a maximum time delay of 90 seconds.
Data types
The available data types for streaming dataflows fields include:
Important
The data types selected for a streaming input have important implications
downstream for your streaming dataflow. Select the data type as early as you can in
your dataflow to avoid having to stop it later for edits.
To add a streaming data transformation to your dataflow, select the transformation icon
on the ribbon for that transformation. The respective card appears in the diagram view.
After you select it, you'll see the side pane for that transformation to configure it.
Filter
Use the Filter transformation to filter events based on the value of a field in the input.
Depending on the data type (number or text), the transformation keeps the values that
match the selected condition.
Note
Inside every card, you'll see information about what else is needed for the
transformation to be ready. For example, when you're adding a new card, you'll see
a "Set-up required" message. If you're missing a node connector, you'll see either
an "Error" or a "Warning" message.
Manage fields
The Manage fields transformation allows you to add, remove, or rename fields coming
in from an input or another transformation. The settings on the side pane give you the
option of adding a new field by selecting Add field, or adding all fields at once.
Tip
After you configure a card, the diagram view gives you a glimpse of the settings
within the card itself. For example, in the Manage fields area of the preceding
image, you can see the first three fields being managed and the new names
assigned to them. Each card has information relevant to it.
Aggregate
You can use the Aggregate transformation to calculate an aggregation (Sum, Minimum,
Maximum, or Average) every time a new event occurs over a period of time. This
operation also allows you to filter or slice the aggregation based on other dimensions in
your data. You can have one or more aggregations in the same transformation.
To add an aggregation, select the transformation icon. Then connect an input, select the
aggregation, add any filter or slice dimensions, and choose the period of time when you
want to calculate the aggregation. This example calculates the sum of the toll value by
the state where the vehicle is from over the last 10 seconds.
To add another aggregation to the same transformation, select Add aggregate function.
Keep in mind that the filter or slice applies to all aggregations in the transformation.
Join
Use the Join transformation to combine events from two inputs based on the field pairs
that you select. If you don't select a field pair, the join is based on time by default. The
default is what makes this transformation different from a batch one.
As with regular joins, you have different options for your join logic:
Inner join: Include only records from both tables where the pair matches. In this
example, that's where the license plate matches both inputs.
Left outer join: Include all records from the left (first) table and only the records
from the second one that match the pair of fields. If there's no match, the fields
from the second input are set blank.
To select the type of join, select the icon for the preferred type on the side pane.
Finally, select over what period of time you want the join to be calculated. In this
example, the join looks at the last 10 seconds. Keep in mind that the longer the period
is, the less frequent the output is—and the more processing resources you use for the
transformation.
By default, all fields from both tables are included. Prefixes left (first node) and right
(second node) in the output help you differentiate the source.
Group by
Use the Group by transformation to calculate aggregations across all events within a
certain time window. You can group by the values in one or more fields. It's similar to
the Aggregate transformation but provides more options for aggregations. It also
includes more complex time-window options. Also similar to Aggregate, you can add
more than one aggregation per transformation.
To add another aggregation to the same transformation, select Add aggregate function.
Keep in mind that the Group by field and the windowing function apply to all
aggregations in the transformation.
A time stamp for the end of the time window is provided as part of the transformation
output for reference.
A section later in this article explains each type of time window available for this
transformation.
Union
Use the Union transformation to connect two or more inputs to add events with shared
fields (with the same name and data type) into one table. Fields that don't match are
dropped and not included in the output.
Set up time-window functions
Time windows are one of the most complex concepts in streaming data. This concept
sits at the core of streaming analytics.
With streaming dataflows, you can set up time windows when you're aggregating data
as an option for the Group by transformation.
Note
Keep in mind that all the output results for windowing operations are calculated at
the end of the time window. The output of the window will be a single event that's
based on the aggregate function. This event will have the time stamp of the end of
the window, and all window functions are defined with a fixed length.
There are five kinds of time windows to choose from: tumbling, hopping, sliding,
session, and snapshot.
Tumbling window
Tumbling is the most common type of time window. The key characteristics of tumbling
windows are that they repeat, have the same time length, and don't overlap. An event
can't belong to more than one tumbling window.
When you're setting up a tumbling window in streaming dataflows, you need to provide
the duration of the window (same for all windows in this case). You also can provide an
optional offset. By default, tumbling windows include the end of the window and
exclude the beginning. You can use this parameter to change this behavior and include
the events in the beginning of the window and exclude the ones in the end.
Hopping window
Hopping windows "hop" forward in time by a fixed period. You can think of them as
tumbling windows that can overlap and be emitted more often than the window size.
Events can belong to more than one result set for a hopping window. To make a
hopping window the same as a tumbling window, you can specify the hop size to be the
same as the window size.
When you're setting up a hopping window in streaming dataflows, you need to provide
the duration of the window (same as with tumbling windows). You also need to provide
the hop size, which tells streaming dataflows how often you want the aggregation to be
calculated for the defined duration.
The offset parameter is also available in hopping windows for the same reason as in
tumbling windows. It defines the logic for including and excluding events for the
beginning and end of the hopping window.
Sliding window
Sliding windows, unlike tumbling or hopping windows, calculate the aggregation only
for points in time when the content of the window actually changes. When an event
enters or exits the window, the aggregation is calculated. So, every window has at least
one event. Similar to hopping windows, events can belong to more than one sliding
window.
The only parameter that you need for a sliding window is the duration, because events
themselves define when the window starts. No offset logic is necessary.
Session window
Session windows are the most complex type. They group events that arrive at similar
times, filtering out periods of time where there's no data. For this window, it's necessary
to provide:
Snapshot window
Snapshot windows group events that have the same time stamp. Unlike other windows,
a snapshot doesn't require any parameters because it uses the time from the system.
Define outputs
After setting up inputs and transformations, it's time to define one or more outputs. As
of July 2021, streaming dataflows support Power BI tables as the only type of output.
This output is a dataflow table (that is, an entity) that you can use to create reports in
Power BI Desktop. You need to join the nodes of the previous step with the output that
you're creating to make it work. After that, name the table.
After you connect to your dataflow, this table will be available for you to create visuals
that are updated in real time for your reports.
As shown in the following screenshot, if you want to see or drill down into something
specific, you can pause the preview (1). Or you can start it again if you're done.
You can also see the details of a specific record (a "cell" in the table) by selecting it and
then selecting Show details or Hide details (2). The screenshot shows the detailed view
of a nested object in a record.
Static preview for transformations and outputs
After you add and set up any steps in the diagram view, you can test their behavior by
selecting the static data button.
After you do, streaming dataflows evaluate all transformations and outputs that are
configured correctly. Streaming dataflows then display the results in the static data
preview, as shown in the following image.
You can refresh the preview by selecting Refresh static preview (1). When you do this,
streaming dataflows take new data from the input and evaluate all transformations and
outputs again with any updates that you might have performed. The Show or Hide
details option is also available (2).
Authoring errors
If you have any authoring errors or warnings, the Authoring errors tab (1) lists them, as
shown in the following screenshot. The list includes details of the error or warning, the
type of card (input, transformation, or output), the error level, and a description of the
error or warning (2). When you select any of the errors or warnings, the respective card
is selected and the configuration side pane opens for you to make the needed changes.
Runtime errors
The last available tab in the preview is Runtime errors (1), as shown in the following
screenshot. This tab lists any errors in the process of ingesting and analyzing the
streaming dataflow after you start it. For example, you might get a runtime error if a
message came in corrupted, and the dataflow couldn't ingest it and perform the defined
transformations.
Because dataflows might run for a long period of time, this tab offers the option to filter
by time span and to download the list of errors and refresh it if needed (2).
Data source credentials: This setting shows the inputs that have been configured
for the specific streaming dataflow.
Retention duration: This setting is specific to streaming dataflows. Here you can
define how long you want to keep real-time data to visualize in reports. Historical
data is saved by default in Azure Blob Storage. This setting is specific to the real-
time side of your data (hot storage). The minimum value is 1 day or 24 hours.
Important
The amount of hot data stored for this retention duration directly influences
the performance of your real-time visuals when you're creating reports on top
of this data. The longer the retention period, the more your real-time visuals
in reports can be affected by slow performance. If you need to perform
historical analysis, you should use the cold storage provided for streaming
dataflows.
Note
It might take up to five minutes for data to start being ingested and for you to see
data coming in to create reports and dashboards in Power BI Desktop.
When you go into a running streaming dataflow, all edit options are disabled and a
message is displayed: "The dataflow can't be edited while it's running. Stop the dataflow
if you wish to continue." The data preview is disabled too.
To edit your streaming dataflow, you have to stop it. While the dataflow is stopped,
incoming data isn't ingested, which results in missing data.
The only experience available while a streaming dataflow runs is the Runtime errors tab,
where you can monitor the behavior of your dataflow for any dropped messages and
similar situations.
This experience is better shown with an example. The following screenshot shows the
message you get when you add a column to one table, change the name for a second
table, and leave a third table the same as it was before.
In this example, the data already saved in both tables that had schema and name
changes is deleted if you save the changes. For the table that stayed the same, you get
the option to delete any old data and start from scratch, or save it for later analysis
together with new data that comes in.
Keep these nuances in mind when editing your streaming dataflow, especially if you
need historical data available later for further analysis.
Hot storage (real-time analysis): As data comes into Power BI from streaming
dataflows, data is stored in a hot location for you to access with real-time visuals.
How much data is saved in this storage depends on the value that you defined for
Retention duration in the streaming dataflow settings. The default (and minimum)
is 24 hours.
Cold storage (historical analysis): Any time period that doesn't fall in the period
that you defined for Retention duration is saved in cold storage (blobs) in Power
BI for you to consume if needed.
Note
There is overlap between these two data storage locations. If you need to use both
locations in conjunction (for example, day-over-day percentage change), you might
have to deduplicate your records. It depends on the time intelligence calculations
that you're making and the retention policy.
1. Go to Get Data, select Power Platform, and then choose the Dataflows connector.
2. Sign in with your Power BI credentials.
3. Select workspaces. Look for the one that contains your streaming dataflow and
select that dataflow. (In this example, the streaming dataflow is called Toll.)
4. Notice that all your output tables appear twice: once for streaming data (hot) and
once for archived data (cold). You can differentiate them by the labels added after
the table names and by the icons.
5. Connect to the streaming data. (The archived data case is the same, except that it's
available only in import mode.) Select the tables that include the labels Streaming
and Hot, and then select Load.
6. When you're asked to choose a storage mode, select DirectQuery if your goal is to
create real-time visuals.
Now you can create visuals, measures, and more, by using the features available in
Power BI Desktop.
Note
The regular Power BI dataflow connector is still available and will work with
streaming dataflows with two caveats:
For more information about the feature, see Automatic page refresh in Power BI. That
article includes information about how to use it, how to set it up, and how to contact
your admin if you're having trouble. The following are the basics on how to set it up:
1. Go to the report page where you want the visuals to be updated in real time.
2. Clear any visual on the page. If possible, select the background of the page.
4. Set up your desired frequency (up to every second if your admin has allowed it).
5. To share a real-time report, first publish back to the Power BI service. Then you can
set up your dataflow credentials for the dataset and share.
Tip
If your report isn't updated as fast as you need it to be or in real time, check the
documentation for automatic page refresh. Follow the FAQs and troubleshooting
instructions to figure out why this problem might be happening.
Considerations and limitations
General limitations
A Power BI Premium subscription (capacity or PPU) is required for creating and
running streaming dataflows.
Only one type of dataflow is allowed per workspace.
Linking regular and streaming dataflows isn't possible.
Capacities smaller than A3 don't allow the use of streaming dataflows.
If dataflows or the enhanced compute engine isn't enabled in a tenant, you can't
create or run streaming dataflows.
Workspaces connected to a storage account aren't supported.
Each streaming dataflow can provide up to 1 MB per second of throughput.
Availability
The preview of streaming dataflows isn't available in the following regions:
Central India
Germany North
Norway East
Norway West
UAE Central
South Africa North
South Africa West
Switzerland North
Switzerland West
Brazil Southeast
Licensing
The number of streaming dataflows allowed per tenant depends on the license being
used:
For regular capacities, use the following formula to calculate the maximum number
of streaming dataflows allowed in a capacity:
Dataflow authoring
When you're authoring streaming dataflows, be mindful of the following considerations:
Only the owner of a streaming dataflow can make modifications, and only when
the dataflow isn't running.
Streaming dataflows aren't available in My Workspace.
Next steps
This article provided an overview of self-service streaming data preparation by using
streaming dataflows. The following articles provide information about how to test this
capability and how to use other streaming data features in Power BI:
For these reasons, we recommend that you use dataflows in a Premium capacity
whenever possible. Dataflows used with a Power BI Pro license are suited to simple,
small-scale use cases.
Solution
Getting access to these Premium features of dataflows is possible in two ways:
Designate a Premium capacity for a given workspace and bring your own Pro
license to author dataflows there.
Bring your own Premium per user (PPU) license, which requires other members of
the workspace to also possess a PPU license.
You can't consume PPU dataflows (or any other content) outside the PPU environment
(such as in Premium or other SKUs or licenses).
For Premium capacities, your consumers of dataflows in Power BI Desktop don't need
explicit licenses to consume and publish to Power BI. But to publish to a workspace or
share a resulting dataset, you'll need at least a Pro license.
For PPU, everyone who creates or consumes PPU content must have a PPU license. This
requirement varies from the rest of Power BI in that you need to explicitly license
everyone with PPU. You can't mix Free, Pro, or even Premium capacities with PPU
content unless you migrate the workspace to a Premium capacity.
Choosing a model typically depends on your organization's size and goals, but the
following guidelines apply.
>5,000 users: Premium capacity (authors bring Pro licenses)
<5,000 users: Premium Per User (PPU)
For small teams, PPU can bridge the gap between Free, Pro, and Premium per capacity.
If you have larger needs, using a Premium capacity with users who have Pro licenses is
the best approach.
Back-end workspaces where you develop dataflows and build out the business
logic.
Lineage for privileged individuals also shows the referenced workspace and allows users
to link back to fully understand the parent dataflow. For those users who aren't
privileged, privacy is still respected. Only the name of the workspace is shown.
The following diagram illustrates this setup. On the left is the architectural pattern. On
the right is an example that shows sales data split and secured by region.
Reduce refresh times for dataflows
Imagine you have a large dataflow, but you want to build datasets off of that dataflow
and decrease the time required to refresh it. Typically, refreshes take a long time to
complete from the data source to dataflows to the dataset. Lengthy refreshes are
difficult to manage or maintain.
Disabling load typically is appropriate only when the overhead of loading more queries
outweighs the benefit of the entity you're developing.
Disabling load means that Power BI doesn't evaluate that given query. It also means that
when the query is used as an ingredient, that is, referenced in other dataflows, Power BI
doesn't treat it as an existing table that it can point to and apply folding and query
optimizations on. In that case, performing transformations such as a join or merge is
merely a join or merge of two data source queries. Such operations can have a negative
effect on performance, because Power BI must fully reload the already computed logic
and then apply any more logic.
To simplify the query processing of your dataflow and ensure any engine optimizations
are taking place, enable load and ensure that the compute engine in Power BI Premium
dataflows is set at the default setting, which is Optimized.
Enabling load also enables you to keep the complete view of lineage, because Power BI
considers a non-enabled load dataflow as a new item. If lineage is important to you,
don't disable load for entities or dataflows connected to other dataflows.
Generally, using DirectQuery trades up-to-date data in your dataset for slower
report performance compared to import mode. Consider this approach only when:
Your use case requires low latency data coming from your dataflow.
The dataflow data is large.
An import would be too time consuming.
You're willing to trade cached performance for up-to-date data.
Solution: Use the dataflows connector to enable query
folding and incremental refresh for import
The unified Dataflows connector can significantly reduce evaluation time for steps
performed over computed entities, such as performing joins, distinct, filters, and group
by operations. There are two specific benefits:
To enable this feature for any Premium dataflow, make sure the compute engine is
explicitly set to On. Then use the Dataflows connector in Power BI Desktop. You must
use the August 2021 version of Power BI Desktop or later to take advantage of this
feature.
To use this feature for existing solutions, you must be on a Premium or Premium Per
User subscription. You might also need to make some changes to your dataflow as
described in Using the enhanced compute engine. You must update any existing Power
Query queries to use the new connector by replacing PowerBI.Dataflows in the Source
section with PowerPlatform.Dataflows .
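As a rough sketch of that change, the Source step in an existing query moves from the legacy function to the unified one; the rest of the navigation steps in your query stay as they are:

```powerquery-m
// Before: legacy Power BI dataflows connector
Source = PowerBI.Dataflows(null)

// After: unified Dataflows connector, which enables the enhanced compute
// engine benefits described above
Source = PowerPlatform.Dataflows(null)
```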
Type: Build your own business logic by using the APIs. Mechanism: REST API.
For more information about refresh, see Understanding and optimizing dataflows
refresh.
Multi-geo support
Many customers today need to meet data sovereignty and residency requirements. You
can manually configure your dataflows workspace to be multi-geo.
Scenario: Read virtual network data sources through an on-premises gateway.
Status: Supported through an on-premises gateway.
Scenario: Write data to a sensitivity label account behind a virtual network by using an
on-premises gateway. Status: Not yet supported.
Next steps
The following articles provide more information about dataflows and Power BI:
Power BI is integrating with Azure Log Analytics (LA) to enable administrators and
Premium workspace owners to configure a Log Analytics connection to their Power BI
subscription. This article describes how the integration between Log Analytics and
Power BI works, and provides examples of how you can use Azure Log Analytics in your
Power BI Premium subscription.
Azure Log Analytics (LA) is a service within Azure Monitor which Power BI uses to save
activity logs. The Azure Monitor suite lets you collect, analyze, and act on telemetry data
from your Azure and on-premises environments. It offers long-term storage, an ad-hoc
query interface and API access to allow data export and integration with other systems.
The Power BI integration with Log Analytics exposes events from the Analysis Services
engine. The events are derived from existing diagnostic logs available for Azure Analysis
Services.
Once connected to Power BI, data is sent continuously and is available in Log Analytics
in approximately 5 minutes. The following diagram shows how Azure Monitor operates,
with the path taken by Power BI highlighted.
The following sections describe the integration of Azure Log Analytics with Power BI, the
requirements necessary to connect Azure Log Analytics to Power BI, and considerations
to keep in mind.
The following section provides examples of how you might put logging to use in Power
BI.
In this example, only workspace logs from Workspace A are sent to a dedicated Log
Analytics workspace:
These examples highlight the various ways you can use Azure Log Analytics with Power
BI, and get the log information you need.
In another article, you can see how to configure Azure Log Analytics to work with Power
BI, with specific steps and requirements to get your logging working properly.
In this example, workspace logs from multiple Power BI workspaces are each sent to a
dedicated Log Analytics workspace:
These examples highlight the various ways you can use Azure Log Analytics with Power
BI, and get the log information you need.
In another article, you can see how to configure Azure Log Analytics to work with Power
BI, with specific steps and requirements to get your logging working properly.
Next steps
The following articles provide more information about Power BI and its many features:
Power BI is integrating with Azure Log Analytics to enable administrators and Premium
workspace owners to configure a Log Analytics connection to their Power BI
subscription. This article describes how the integration between Log Analytics and
Power BI works and how to configure it for your environment.
There are two elements to getting Azure Log Analytics working for Power BI:
1. Sign in to the Azure portal, select the subscription you want to use with Log
Analytics and that contains your Log Analytics workspaces. In the Settings section,
select Resource providers as shown in the following image.
Set permissions
1. Make sure the user configuring the Log Analytics integration has the Log Analytics
Contributor role on the Log Analytics workspace. When you select Access control
(IAM) for the subscription in the Azure portal, and then select Role assignments
from the top selections in the panel, there must be an entry for Log Analytics
Contributor for the user who configures Log Analytics:
After you complete those steps, the Azure Log Analytics configuration portion is
complete. The next section shows you how to continue and complete the configuration
in the Power BI Admin portal.
1. In the Power BI Admin portal, go to Tenant Settings > Audit and usage settings,
and expand Azure Log Analytics connections for workspace administrators. To
allow workspace admins to enable Log Analytics, set the slider to Enabled and
specify the needed security groups under Apply to, as shown in the following
image.
Configure logging in a Premium Workspace
1. In the Premium workspace, workspace admins can enable Log Analytics. To do so,
go to Settings as shown in the following image.
2. In the Settings pane, select Azure connections, then expand Log Analytics as
shown in the following image.
3. Select the Azure subscription, Resource group, and then the Log Analytics
workspace configured in the previous section. Then choose Save. When
successfully completed, the expanded Tenant-level Log Analytics section should
look similar to the following image.
Disconnect Azure Log Analytics
You can disconnect from Azure Log Analytics to stop sending logs to Azure. To
disconnect, in the Power BI Workspace Settings, go to the Log Analytics settings. Select
Disconnect from Azure. Then choose Save to disconnect.
Note
When you disconnect a Power BI workspace from Azure Log Analytics, logs are not
deleted. Your data remains and follows the storage and retention policies you set
there.
Usage scenarios
There are many ways that Azure Log Analytics and Power BI can help solve real-world
challenges for your organization. Consider the following:
Identify periods of high or unusual Analysis Services engine activity by capacity,
workspace, report, or user.
Analyze query performance and trends, including external DirectQuery operations.
Analyze dataset refresh duration, overlaps, and processing steps.
Analyze custom operations sent using the Premium XMLA endpoint.
Send us feedback in the Power BI Community for how you're using logging and how it
has helped your organization.
You don't have permission to write to the Log Analytics workspace (Error - cannot
proceed): You need write permissions on this Log Analytics workspace to connect it to
Power BI. Contact the person in your organization who manages Azure subscriptions to
fix this problem.
You don't have permission to write to the Log Analytics workspace account (Error -
cannot proceed): You need write permissions on this Log Analytics workspace to connect
it to Power BI.
You don't have access to any Azure subscriptions (Error - cannot proceed): You don't
have access to any Azure subscriptions. Ask the person who manages Azure subscriptions
in your organization to grant you contributor access or higher.
You don't have access to any Azure Log Analytics workspaces within that subscription
(Error - cannot proceed): You don't have access to an Azure Log Analytics workspace. Ask
the person who manages Azure subscriptions in your organization to add you to the Log
Analytics owner or contributor role.
Workspace-level Log Analytics disabled when trying to connect (Information): Ask your
tenant admin to grant workspace admins permission to connect Log Analytics
workspaces.
AggregateTableRewriteQuery
Command
Deadlock
DirectQuery
Discover
Error
ProgressReport
Query
Session Initialize
VertiPaqSEQuery
Notification
SQL
Next steps
The following articles can help you learn more about Power BI and about its integration
with Azure Log Analytics.
Power BI is integrating with Azure Log Analytics (LA) to enable administrators and
Premium workspace owners to configure a Log Analytics connection to their Power BI
subscription.
Answer: Dataset activity logs (such as Analysis Services engine traces) are currently
available.
Question: When should I use Log Analytics for the Analysis Services engine?
Answer: Engine logs are detailed and can be high volume and large, averaging 3-4 KB
each for complex datasets. Therefore, we recommend carefully considering when to use
logging for the Analysis Services engine. Typical use cases for logging are performance
investigations, scale/load testing, or pre-release validation.
Question: Which Analysis Services events are supported? What will the logs look like?
Answer: For information on events and logs see events and schema.
Question: I can't get Owner permissions for Azure Log Analytics in my organization, is
there a workaround?
Answer: Each log entry is marked with the corresponding Power BI workspace ID.
Question: How long does it take for logs to appear in Log Analytics?
Answer: Typically within 5 minutes of the activity being generated in Power BI. The logs
are sent continuously.
Question: What happens when I disconnect Log Analytics? Will I lose my data?
Answer: The default retention period is 31 days. You can adjust data retention period
within the Azure portal, which currently can be increased to 730 days (two years).
Answer: No new Log Analytics configurations can be made at the workspace-level if that
occurs. Any existing workspaces that have Log Analytics already configured will continue
to send logs.
Question: Do you support Blob Store and Event Hubs destinations in Log Analytics?
Answer: Blob Store and Event Hubs destinations aren't currently supported, but your
feedback is welcomed on how useful you would find those destinations.
Answer: Currently the Log Analytics configuration won't be deleted, but logs will stop
flowing when the dataset isn't in a Premium capacity. If you move it back to Premium
capacity, logs will begin to flow again.
Answer: Azure Log Analytics bills storage, ingestion, and analytical queries
independently. Cost also depends on the geographic region. It will vary depending on
how much activity is generated, how long you choose to store the data, and how often
you query it. An average Premium capacity generates about 35 GB of logs monthly, but
the storage size of logs can be higher for heavily utilized capacities. For more information,
see the pricing calculator .
Next steps
The following articles can help you learn more about Power BI, and about its integration
with Azure Log Analytics.
In the Power Query Editor window of Power BI Desktop, there are a handful of
commonly used tasks. This article demonstrates those common tasks and provides links
for additional information.
Connect to data
Shape and combine data
Group rows
Pivot columns
Create custom columns
Query formulas
You can use a couple of data connections to complete these tasks. The data is available for
you to download or connect to, in case you want to step through these tasks yourself.
The first data connection is an Excel workbook , which you can download and save
locally. The other is a Web resource:
https://www.bankrate.com/retirement/best-and-worst-states-for-retirement/
Common query tasks begin at the steps necessary to connect to both of those data
sources.
Connect to data
To connect to data in Power BI Desktop, select Home and then choose Get data. Power
BI Desktop presents a menu with the most common data sources. For a complete list of
data sources to which Power BI Desktop can connect, select More at the end of the
menu. For more information, see Data sources in Power BI Desktop.
To start, select Excel Workbook, specify the Excel workbook mentioned earlier, and then
choose Open. Power Query Editor inspects the workbook and presents the data it finds
in the Navigator dialog box, where selecting a table shows a preview of its data.
Select Transform Data to edit, adjust, or shape the data before you load it into Power BI
Desktop. Editing is especially useful when you work with large datasets that you want to
pare down before loading.
Select OK. Like before, Power BI Desktop inspects the webpage data and shows preview
options in the Navigator dialog box. When you select a table, it displays a preview of
the data.
Other data connections are similar. Power BI Desktop prompts you for the appropriate
credentials if it needs you to authenticate your connection.
This section and the following sections use the example Excel workbook mentioned
previously, which you can download and save locally. Load the data into Power Query
Editor by using the Transform data button on the Home tab. After you load the data,
select Table 1 from the available queries in the Queries pane, as shown here:
When you shape data, you transform a data source into the form and format that meets
your needs.
In Power Query Editor, you can find many commands in the ribbon, and in context
menus. For example, when you right-click a column, the context menu lets you remove
the column. Or select a column and then choose the Remove Columns button from the
Home tab in the ribbon.
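For reference, here's a sketch of the formula that appears in the formula bar when you remove a column this way. The step name #"Changed Type" and the column name are placeholders, not values from the example workbook:

```powerquery-m
// Removes the selected column from the previous step's table.
= Table.RemoveColumns(#"Changed Type", {"Column To Remove"})
```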
You can shape the data in many other ways in this query. You can remove any number
of rows from the top or bottom. Or add columns, split columns, replace values, and do
other shaping tasks. With these features, you can direct Power Query Editor to get the
data how you want it.
Group rows
In Power Query Editor, you can group the values from many rows into a single value.
This feature can be useful when summarizing the number of products offered, the total
sales, or the count of students.
In this example, you group rows in an education enrollment dataset. The data is from
the Excel workbook.
This example shows how many Agencies each state has. (Agencies can include school
districts, other education agencies such as regional service districts, and more.) Select
the State Abbr column, then select the Group By button in the Transform tab or the
Home tab of the ribbon. (Group By is available in both tabs.)
The Group By dialog box appears. When Power Query Editor groups rows, it creates a
new column into which it places the Group By results. You can adjust the Group By
operation in the following ways:
1. The unlabeled dropdown list specifies the column to be grouped. Power Query
Editor defaults this value to the selected column, but you can change it to be any
column in the table.
2. New column name: Power Query Editor suggests a name for the new column,
based on the operation it applies to the grouped column. You can name the new
column anything you want, though.
3. Operation: Choose the operation that Power Query Editor applies, such as Sum,
Median, or Count Distinct Rows. The default value is Count Rows.
4. Add grouping and Add aggregation: These buttons are available only if you select
the Advanced option. In a single operation, you can make grouping operations
(Group By actions) on many columns and create several aggregations by using
these buttons. Based on your selections in this dialog box, Power Query Editor
creates a new column that operates on multiple columns. (A sketch of the generated
step appears after this list.)
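For this example, grouping by State Abbr with the default Count Rows operation generates a step formula similar to the following. The step name #"Changed Type" and the new column name Count are typical defaults and stand in for whatever your query actually contains:

```powerquery-m
// Groups the table by "State Abbr" and counts the rows in each group,
// placing the result in a new "Count" column.
= Table.Group(#"Changed Type", {"State Abbr"}, {{"Count", each Table.RowCount(_), Int64.Type}})
```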
And with Power Query Editor, you can always remove the last shaping operation. In the
Query Settings pane, under Applied Steps, just select the X next to the step recently
completed. So go ahead and experiment. If you don’t like the results, redo the step until
Power Query Editor shapes your data the way you want.
Pivot columns
You can pivot columns and create a table that contains aggregated values for each
unique value in a column. For example, to find out how many different products are in
each product category, you can quickly create a table to do that.
To create a new table that shows a count of products for each category (based on the
CategoryName column), select the column, then select Transform > Pivot Column.
The Pivot Column dialog box appears, letting you know which column's values the
operation uses to create new columns. (If the column you want, CategoryName, isn't
shown, select it from the dropdown list.) When you expand Advanced options, you
can select the function that's applied to the aggregated values.
When you select OK, Power Query Editor displays the table according to the transform
instructions provided in the Pivot Column dialog box.
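As a sketch, pivoting on CategoryName with a count aggregation produces a step formula like the following. The values column ProductID and the step name #"Changed Type" are assumptions for illustration; substitute the columns in your own table:

```powerquery-m
// Creates one column per distinct CategoryName value and fills it with the
// count of ProductID values for that category.
= Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[CategoryName]), "CategoryName", "ProductID", List.Count)
```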
Create custom columns
In Power Query Editor, you can create custom formulas that operate on multiple
columns in your table. Then you can place the results of such formulas into a new
(custom) column. Power Query Editor makes it easy to create custom columns.
With the Excel workbook data in Power Query Editor, go to the Add Column tab on the
ribbon, and then select Custom Column.
The following dialog box appears. This example creates a custom column called Percent
ELL that calculates the percentage of total students that are English Language Learners
(ELL).
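A sketch of the resulting Added Custom step is shown below. The column names English Language Learners and Total Students are assumptions about the workbook; use the column names that exist in your own data:

```powerquery-m
// Adds a "Percent ELL" column computed from two existing columns.
= Table.AddColumn(#"Changed Type", "Percent ELL", each [English Language Learners] / [Total Students], type number)
```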
As with any other applied step in Power Query Editor, if the new custom column doesn’t
provide the data you’re looking for, you can delete the step. In the Query Settings pane,
under APPLIED STEPS, just select the X next to the Added Custom step.
Query formulas
You can edit the steps that Power Query Editor generates. You can also create custom
formulas, which let you connect to and shape your data more precisely. Whenever
Power Query Editor does an action on data, the formula associated with the action is
displayed in the formula bar. To view the formula bar, go to the View tab of the ribbon,
and then select Formula Bar.
Power Query Editor keeps all applied steps for each query as text that you can view or
modify. You can view or modify the text for any query by using the Advanced Editor.
Just select View and then Advanced Editor.
Here's a screenshot of the Advanced Editor, with the query steps associated with the
USA_StudentEnrollment query displayed. These steps are created in the Power Query
Formula Language, often referred to as M. For more information, see Create Power
Query formulas in Excel . To view the language specification itself, see Power Query M
language specification.
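If you haven't used the Advanced Editor before, a query's M text looks roughly like the following sketch. The file path, table name, and column names here are placeholders rather than the tutorial's actual values:

```powerquery-m
let
    // Connect to a local Excel workbook (placeholder path).
    Source = Excel.Workbook(File.Contents("C:\Data\StudentEnrollment.xlsx"), null, true),
    // Navigate to a specific table in the workbook.
    Table1 = Source{[Item = "Table 1", Kind = "Table"]}[Data],
    // Set column data types (placeholder column names).
    #"Changed Type" = Table.TransformColumnTypes(Table1, {{"State Abbr", type text}, {"Total Students", Int64.Type}})
in
    #"Changed Type"
```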
Power BI Desktop provides an extensive set of formula categories. For more information,
and a complete reference of all Power Query Editor formulas, see Power Query M
function reference.
Next steps
You can do all sorts of things with Power BI Desktop. For more information on its
capabilities, see the following resources:
When you have multiple tables, chances are you'll do some analysis using data from all
those tables. Relationships between those tables are necessary to accurately calculate
results and display the correct information in your reports. In most cases you won’t have
to do anything. The autodetect feature does it for you. However, sometimes you might
have to create relationships yourself, or need to make changes to a relationship. Either
way, it’s important to understand relationships in Power BI Desktop and how to create
and edit them.
2. In the Create relationship dialog box, in the first table drop-down list, select a
table. Select the column you want to use in the relationship.
3. In the second table drop-down list, select the other table you want in the
relationship. Select the other column you want to use, and then select OK.
By default, Power BI Desktop automatically configures the options Cardinality
(direction), Cross filter direction, and Make this relationship active for your new
relationship. However, you can change these settings if necessary. For more information,
see Understanding additional options.
If none of the tables selected for the relationship has unique values, you'll see the
following error: One of the columns must have unique values. At least one table in a
relationship must have a distinct, unique list of key values, which is a common
requirement for all relational database technologies.
If you encounter that error, there are a couple ways to fix the issue:
Use Remove Duplicates to create a column with unique values. The drawback to
this approach is that you might lose information when duplicate rows are removed.
Often a key (row) is duplicated for good reason.
Add an intermediary table made of the list of distinct key values to the model,
which will then be linked to both original columns in the relationship. (A sketch of
this approach follows this list.)
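One hedged way to build such an intermediary table is with a Power Query query that keeps only the distinct key values. The query name Sales and the column name ProductKey are assumptions for this sketch:

```powerquery-m
let
    // Keep only the key column from the source query, then remove duplicates
    // so the resulting table has a distinct, unique list of key values.
    KeyColumn = Table.SelectColumns(Sales, {"ProductKey"}),
    DistinctKeys = Table.Distinct(KeyColumn)
in
    DistinctKeys
```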
Alternatively, in the Model view diagram layouts, you can drag and drop a column from
one table to a column in another table to create a relationship.
Edit a relationship
There are two ways to edit a relationship in Power BI.
The first method is to edit a relationship by using the Properties pane in Model view:
select any line between two tables to see the relationship options in the Properties pane.
Be sure to expand the Properties pane to see the relationship options.
You can also see a video demonstration of editing relationships in the Properties
pane.
The other method of editing a relationship is using the Relationship editor dialog,
which you can open many ways from within Power BI Desktop. The following list shows
different ways you can open the Relationship editor dialog:
Select the Modeling ribbon > Manage relationships, then select the relationship
and select Edit.
Select a table in the Fields list then select the Table tools ribbon > Manage
relationships, then select the relationship and then select Edit.
From the Data view, select the Table tools ribbon > Manage relationships, then select
the relationship and then choose Edit.
Select the Home ribbon > Manage relationships, then choose the relationship and
then select Edit.
Double-click any line between two tables.
Right-click any line between two tables and then choose Properties.
Select any line between two tables, then choose Open relationship editor in the
Properties pane.
Finally, you can also edit a relationship from any view: right-click or select the ellipsis to
get to the context menu of any table, then select Manage relationships, select the
relationship, and then select Edit.
You can also multi-select relationships in the Model view diagram layouts by pressing
the Ctrl key and selecting more than one line to choose multiple relationships. Common
properties can be edited in the Properties pane and Apply changes will process the
changes in one transaction.
Cardinality
The Cardinality option can have one of the following settings:
Many to one (*:1): A many-to-one relationship is the most common, default type of
relationship. It means the column in a given table can have more than one instance of a
value, and the other related table, often known as the lookup table, has only one instance
of a value.
One to one (1:1): In a one-to-one relationship, the column in one table has only one
instance of a particular value, and the other related table has only one instance of a
particular value.
One to many (1:*): In a one-to-many relationship, the column in one table has only one
instance of a particular value, and the other related table can have more than one
instance of a value.
Many to many (*:*): With composite models, you can establish a many-to-many
relationship between tables, which removes requirements for unique values in tables. It
also removes previous workarounds, such as introducing new tables only to establish
relationships. For more information, see Relationships with a many-many cardinality.
For more information about when to change cardinality, see Understanding additional
options.
Cross filter direction
Both: For filtering purposes, both tables are treated as if they're a single table. The Both
setting works well with a single table that has many lookup tables that surround it. An
example is a sales actuals table with a lookup table for its department. This
configuration is often called a star schema configuration (a central table with several
lookup tables). However, if you have two or more tables that also have lookup tables
(with some in common) then you wouldn't want to use the Both setting. To continue the
previous example, in this case, you also have a budget sales table that records target
budget for each department. And, the department table is connected to both the sales
and the budget table. Avoid the Both setting for this kind of configuration.
Single: The most common, default direction, which means filtering choices in connected
tables work on the table where values are being aggregated. If you import a data model
from Power Pivot in Excel 2013 or earlier, all relationships will have a single direction.
For more information about when to change cross filter direction, see Understanding
additional options.
Understanding relationships
Once you've connected two tables together with a relationship, you can work with the
data in both tables as if they were a single table. You're then free from having to worry
about relationship details or flattening those tables into a single table before importing
them. In many situations, Power BI Desktop can automatically create relationships for
you. However, if Power BI Desktop can’t determine with a high-degree of certainty that a
relationship between two tables should exist, it doesn't automatically create the
relationship. In that case, you must do so.
Let’s go through a quick tutorial, to better show you how relationships work in Power BI
Desktop.
Tip
1. Copy the following ProjectHours table into an Excel worksheet (excluding the
title), select all of the cells, and then select Insert > Table.
2. In the Create Table dialog box, select OK.
3. Select any table cell, select Table Design > Table Name, and then enter
ProjectHours.
4. Do the same for the CompanyProject table.
5. Import the data by using Get Data in Power BI Desktop. Select the two tables
as a data source, and then select Load.
The first table, ProjectHours, is a record of work tickets that record the number of hours
a person has worked on a particular project.
ProjectHours
CompanyProject
ProjName Priority
Blue A
Red B
Green C
Yellow C
Purple B
Orange C
Notice that each table has a project column. Each is named slightly differently, but the
values look like they're the same. That difference is important, and we'll get back to it
soon.
Now that we have our two tables imported into a model, let’s create a report. The first
thing we want to get is the number of hours submitted by project priority, so we select
Priority and Hours from the Fields pane.
If you look at the table in the report canvas, you'll see the number of hours is 256 for
each project priority, which is also the total. Clearly this number isn't correct. Why? It's
because we can't calculate a sum total of values from one table (Hours in the
ProjectHours table), sliced by values in another table (Priority in the CompanyProject
table) without a relationship between these two tables.
Remember those columns we saw in both tables with a project name, but with values
that look alike? We'll use these two columns to create a relationship between our tables.
Why these columns? Well, if we look at the Project column in the ProjectHours table, we
see values like Blue, Red, Yellow, Orange, and so on. In fact, we see several rows that
have the same value. In effect, we have many color values for Project.
If we look at the ProjName column in the CompanyProject table, we see there’s only
one of each of the color values for the project name. Each color value in this table is
unique, and that’s important, because we can create a relationship between these two
tables. In this case, a many-to-one relationship. In a many-to-one relationship, at least
one column in one of the tables must contain unique values. There are some more
options for some relationships, which we'll look at later. For now, let’s create a
relationship between the project columns in each of our two tables.
3. In the first drop-down list, select ProjectHours as the first table, then select the
Project column. This side is the many side of our relationship.
5. Accept the defaults for the relationship options, and then select OK.
In the interest of full disclosure, you just created this relationship the hard way. You
could have selected Autodetect in the Manage relationships dialog box. In fact,
autodetect would have automatically created the relationship for you when you loaded
the data if both columns had the same name.
When we sum up hours by Priority, Power BI Desktop looks for every instance of the
unique color values in the CompanyProject lookup table, looks for every instance of
each of those values in the ProjectHours table, and then calculates a sum total for each
unique value.
Power BI typically sets these options automatically and you won’t need to adjust them.
But there are several situations where you might want to configure these options
yourself.
Import relationships from data sources on first load: This option is selected by
default. When it's selected, Power BI checks for relationships defined in your data
source, such as foreign key/primary key relationships in your data warehouse. If
such relationships exist, they're mirrored into the Power BI data model when you
initially load data. This option enables you to quickly begin working with your
model, rather than requiring you to find or define those relationships yourself.
The CompanyProjectPriority table is a list of all company projects and their priority. The
ProjectBudget table is the set of projects for which a budget has been approved.
CompanyProjectPriority
ProjName Priority
Blue A
Red B
Green C
Yellow C
Purple B
Orange C
ProjectBudget
The reason Power BI makes these settings is because, to Power BI Desktop, the best
combination of the two tables is as follows:
ProjName Priority BudgetAllocation AllocationDate
Yellow C
Purple B
Orange C
There's a one-to-one relationship between our two tables because there are no
repeating values in the combined table’s ProjName column. The ProjName column is
unique, because each value occurs only once; therefore, the rows from the two tables
can be combined directly without any duplication.
But, let’s say you know the data will change the next time you refresh it. A refreshed
version of the ProjectBudget table now has additional rows for the Blue and Red
projects:
ProjectBudget
These additional rows mean the best combination of the two tables now looks like this:
ProjName Priority BudgetAllocation AllocationDate
Yellow C
Purple B
Orange C
The Both setting enables Power BI Desktop to treat all aspects of connected tables as if
they're a single table. There are some situations, however, where Power BI Desktop can't
set a relationship’s cross filter direction to Both and also keep an unambiguous set of
defaults available for reporting purposes. If a relationship cross filter direction isn't set to
Both, then it’s usually because it would create ambiguity. If the default cross filter
setting isn’t working for you, try setting it to a particular table or to Both.
Single direction cross filtering works for many situations. In fact, if you’ve imported a
model from Power Pivot in Excel 2013 or earlier, all of the relationships will be set to
single direction. Single direction means that filtering choices in connected tables work
on the table where aggregation work is happening. Sometimes, understanding cross
filtering can be a little difficult, so let’s look at an example.
With single direction cross filtering, if you create a report that summarizes the project
hours, you can then choose to summarize (or filter) by the CompanyProject table and its
Priority column or the CompanyEmployee table and its City column. If, however, you
want to count the number of employees per project (a less common question), it won't
work. You'll get a column of values that are all the same. In the following example, both
relationships' cross filtering direction is set to a single direction: towards the
ProjectHours table. In the Values well, the Project field is set to Count:
Filter specification will flow from CompanyProject to ProjectHours (as shown in the
following image), but it won’t flow up to CompanyEmployee.
However, if you set the cross filtering direction to Both, it will work. The Both setting
allows the filter specification to flow up to CompanyEmployee.
With the cross filtering direction set to Both, our report now appears correct:
Cross filtering both directions works well for a pattern of table relationships such as the
pattern shown previously. This schema is most commonly called a star schema, like this:
Cross filtering both directions doesn't work well with a more general pattern often found in
databases, like in this diagram:
If you have a table pattern like this, with loops, then cross filtering can create an
ambiguous set of relationships. For instance, if you sum up a field from TableX and then
choose to filter by a field on TableY, then it’s not clear how the filter should travel,
through the top table or the bottom table. A common example of this kind of pattern is
with TableX as a sales table with actuals data and for TableY to be budget data. Then, the
tables in the middle are lookup tables that both tables use, such as division or region.
To ensure there’s a default relationship, Power BI Desktop allows only a single active
relationship between two tables at a given time. Therefore, you must first set the current
relationship as inactive and then set the relationship you want to be active.
Let’s look at an example. The first table is ProjectTickets, and the second table is
EmployeeRole.
ProjectTickets
EmployeeRole
Employee Role
If we add both relationships to the model (OpenedBy first), then the Manage
relationships dialog box shows that OpenedBy is active:
Now, if we create a report that uses Role and Employee fields from EmployeeRole, and
the Hours field from ProjectTickets in a table visualization in the report canvas, we see
only project sponsors because they’re the only ones that opened a project ticket.
We can change the active relationship and get SubmittedBy instead of OpenedBy. In
Manage relationships, uncheck the ProjectTickets(OpenedBy) to
EmployeeRole(Employee) relationship, and then check the EmployeeRole(Employee) to
ProjectTickets(SubmittedBy) relationship.
See all of your relationships in Relationship
view
Sometimes your model has multiple tables and complex relationships between them.
Relationship view in Power BI Desktop shows all of the relationships in your model, their
direction, and cardinality in an easy to understand and customizable diagram.
Troubleshooting
This section provides guidance and troubleshooting information when working with
relationships in Power BI.
Scenario 2: Traditional star schema and measure constraint provided. In the previous
example in Scenario 1, if the user provides a constraint in the form of a summarized
column (Sum/Average/Count of Purchase Qty, for example) or a model measure
(Distinct Count of VendID), Power BI can generate a query in the form of the following
example:
In such a case, Power BI attempts to return combinations that have meaningful values
for the constraint provided by the user (non-blank). Power BI doesn't need to also add
its own implicit constraint of CountRows(Purchases)>0, as was done in Scenario 1,
because the constraint provided by the user is sufficient.
Scenario 4: Non-star schema and measure constraint provided. If we take the example
from Scenario 3, and add a user provided constraint in the form of a summarized
column (Count of Product[ProdID], for example) or a model measure (Sales[Total Qty]),
Power BI can generate a query in the form of Correlate Purchase[VenID] and
Sales[CustID] where MeasureConstraint isn't blank.
In this case, Power BI respects the user's constraint as the sole constraint it
needs to apply, and returns the combinations that produce non-blank values for it. The
user has guided Power BI to the scenario it wants, and Power BI applies the guidance.
1. Check your model. Is it set up appropriately for the types of questions you want
answered from your analysis? Can you change some of the relationships between
tables? Can you avoid creating an indirect Many to Many?
Consider converting your reversed V shape schema to two tables, and use a direct
Many to Many relationship between them as described in apply many-many
relationships in Power BI Desktop.
2. Add a constraint to the visual in the form of a summarized column or a model
measure.
3. If a summarized column is added and there still is an error, consider using a model
measure.
Next steps
For more information about models and relationships, see the following articles:
With the Analytics pane in Power BI Desktop, you can add dynamic reference lines to
visuals, and provide focus for important trends or insights. The Analytics icon and pane
are found in the Visualizations area of Power BI Desktop.
Note
The Analytics pane only appears when you select a visual on the Power BI Desktop
canvas.
The following sections show how you can use the Analytics pane and dynamic reference
lines in your visualizations.
To view the available dynamic reference lines for a visual, follow these steps:
1. Select or create a visual, then select the Analytics icon from the Visualizations
section.
2. Select the type of line you want to create to expand its options. This example
shows Average line selected.
3. To create a new line, select + Add. Then you can name the line. Double-click the
text box and enter your name.
Now you have all sorts of options for your line. You can specify its Color,
Transparency percentage, Line style, and Position (compared to the visual's data
elements). You might also choose whether to include the Data label. To specify the
visual measure to base your line upon, select the Measure dropdown list, which is
automatically populated with data elements from the visual. This example selects
Culture as the measure, and labels it Average of Culture. You'll see how to
customize a few of the other options in the later steps.
4. If you want to have a data label appear, change Data label from Off to On. When
you do so, you get many more options for your data label.
5. Notice the number that appears next to the Average line item in the Analytics
pane. That tells you how many dynamic lines you currently have on your visual,
and of which type. If we add a Max line for Affordability, the Analytics pane shows
that we now also have a Max line dynamic reference line applied to this visual.
If the visual you've selected can't have dynamic reference lines applied to it (in this case,
a Map visual), you'll see the following message when you select the Analytics pane.
You can highlight many interesting insights by creating dynamic reference lines with the
Analytics pane.
More features and capabilities are being planned, including expanding which visuals can
have dynamic reference lines applied to them. Check back often to see what's new.
Apply forecasting
If you have time data in your data source, you can use the forecasting feature. Select a
visual, then expand the Forecast section of the Analytics pane. You might specify many
inputs to modify the forecast, such as the Forecast length or the Confidence interval.
The following image shows a basic line visual with forecasting applied. Use your
imagination (and play around with forecasting) to see how it might apply to your
models.
Note
For an example of how forecasting can be applied, see the (dated, but still relevant)
article about forecasting capabilities.
You might use x-axis constant line, y-axis constant line, and symmetry shading on the
following visual:
Scatter chart
Use of constant line, min line, max line, average line, median line, and percentile line is
available on these visuals:
Area chart
Clustered bar chart
Clustered column chart
Line chart
Scatter chart
The following visuals can use only a constant line from the Analytics pane:
The following visuals can use a trend line if there's time data:
Area chart
Clustered column chart
Line chart
Line and clustered column chart
Scatter chart
Dynamic reference lines aren't currently available on these visuals:
Funnel
Line and clustered column chart
Line and stacked column chart
Ribbon chart
Non-Cartesian visuals, such as Donut chart, Gauge, Matrix, Pie chart, and Table
The percentile line is only available when using imported data in Power BI Desktop or
when connected live to a model on a server that's running Analysis Services 2016 or later,
Azure Analysis Services, or a dataset on the Power BI service.
Next steps
You can do all sorts of things with Power BI Desktop. For more information on its
capabilities, see the following resources:
Data view helps you inspect, explore, and understand data in your Power BI Desktop
model. It's different from how you view tables, columns, and data in the Power Query
Editor. With Data view, you're looking at your data after it has been loaded into the
model.
Note
Since Data view shows data after it's loaded into the model, the Data view icon isn't
visible if all data sources are based on DirectQuery.
When you're modeling your data, sometimes you want to see what's actually in a table
or column without creating a visual on the report canvas. You might want to see right
down to the row level. This ability is especially useful when you're creating measures
and calculated columns, or you need to identify a data type or data category.
Let's take a closer look at some of the elements found in Data view.
2. Data Grid. This area shows the selected table and all columns and rows in it.
Columns hidden from the Report view are greyed out. You can right-click on a
column for options.
3. Formula bar. Enter Data Analysis Expressions (DAX) formulas for Measures and
Calculated columns.
You can filter individual values, or use advanced filtering based on the data in the
column.
Note
When a Power BI model is created in a different culture than your current user
interface, the search box doesn't appear in the Data view user interface for anything
other than text fields. For example, this behavior would apply for a model created
in US English that you view in Spanish.
Next steps
You can do all sorts of things with Power BI Desktop. For more information on its
capabilities, check out the following resources:
This article is for users new to Power BI Desktop. It gives you a quick and easy
introduction on how you can use Data Analysis Expressions (DAX) to solve many basic
calculations and data analysis problems. We’ll go over some conceptual information, a
series of tasks you can complete, and a knowledge check to test what you’ve learned.
After completing this article, you should have a good understanding of the most
important fundamental concepts in DAX.
What is DAX?
DAX is a collection of functions, operators, and constants that can be used in a formula,
or expression, to calculate and return one or more values. DAX helps you create new
information from data already in your model.
Prerequisites
You might already be familiar with creating formulas in Microsoft Excel, and that
knowledge will be helpful in understanding DAX. But even if you have no experience
with Excel formulas, the concepts described here will help you get started creating DAX
formulas and solving real-world BI problems right away.
Let's begin
We'll frame our understanding of DAX around three fundamental concepts: Syntax,
Functions, and Context. There are other important concepts in DAX, but understanding
these three concepts will provide the best foundation on which to build your DAX skills.
Syntax
Before you create your own formulas, let’s take a look at DAX formula syntax. Syntax
includes the various elements that make up a formula, or more simply, how the formula
is written. For example, here's a simple DAX formula for a measure:
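Based on the element-by-element breakdown that follows, the formula reads:
DAX
Total Sales = SUM(Sales[SalesAmount])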
A. The measure name, Total Sales.
B. The equals sign operator (=), which indicates the beginning of the formula. When
calculated, it will return a result.
C. The DAX function SUM, which adds up all of the numbers in the Sales[SalesAmount]
column. You'll learn more about functions later.
D. Parentheses (), which surround an expression that contains one or more arguments.
Most functions require at least one argument. An argument passes a value to a function.
E. The referenced table, Sales.
F. The referenced column, [SalesAmount], in the Sales table. With this argument, the
SUM function knows on which column to aggregate a SUM.
When trying to understand a DAX formula, it's often helpful to break down each of the
elements into a language you think and speak every day. For example, you can read this
formula as:
For the measure named Total Sales, calculate (=) the SUM of values in the
[SalesAmount] column in the Sales table.
When added to a report, this measure calculates and returns values by summing up
sales amounts for each of the other fields we include, for example, Cell Phones in the
USA.
You might be thinking, "Isn’t this measure doing the same thing as if I were to just add
the SalesAmount field to my report?" Well, yes. But, there’s a good reason to create our
own measure that sums up values from the SalesAmount field: We can use it as an
argument in other formulas. This solution might seem a little confusing now, but as your
DAX formula skills grow, knowing this measure will make your formulas and your model
more efficient. In fact, you’ll see the Total Sales measure showing up as an argument in
other formulas later on.
Let’s go over a few more things about this formula. In particular, we introduced a
function, SUM. Functions are pre-written formulas that make it easier to do complex
calculations and manipulations with numbers, dates, time, text, and more. You'll learn
more about functions later.
You also see that the column name [SalesAmount] was preceded by the Sales table in
which the column belongs. This name is known as a fully qualified column name in that
it includes the column name preceded by the table name. Columns referenced in the
same table don't require the table name be included in the formula, which can make
long formulas that reference many columns shorter and easier to read. However, it's a
good practice to include the table name in your measure formulas, even when in the
same table.
Let’s create an example formula. This task will help you further understand formula
syntax and how the suggestions feature in the formula bar can help you.
2. In Report view, in the field list, right-click the Sales table, and then select New
Measure.
3. In the formula bar, replace Measure by entering a new measure name, Previous
Quarter Sales.
4. After the equals sign, type the first few letters CAL, and then double-click the
function you want to use. In this formula, you want to use the CALCULATE
function.
You’ll use the CALCULATE function to filter the amounts we want to sum by an
argument we pass to the CALCULATE function. This type of function is referred to
as nesting functions. The CALCULATE function has at least two arguments. The first
is the expression to be evaluated, and the second is a filter.
5. After the opening parenthesis ( for the CALCULATE function, type SUM followed by
another opening parenthesis (.
This step creates the first expression argument for our CALCULATE function.
7. Type a comma (,) followed by a space to specify the first filter, and then type
PREVIOUSQUARTER.
You’ll use the PREVIOUSQUARTER time intelligence function to filter SUM results
by the previous quarter.
8. After the opening parenthesis ( for the PREVIOUSQUARTER function, type
Calendar[DateKey].
9. Close both the arguments being passed to the PREVIOUSQUARTER function and
the CALCULATE function by typing two closing parentheses )).
10. Select the checkmark in the formula bar or press Enter to validate the formula
and add it to the Sales table.
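Assembled from the preceding steps, the finished measure should look like the following; the column passed to SUM (Sales[SalesAmount]) is assumed from the surrounding example:
DAX
Previous Quarter Sales =
CALCULATE(
    SUM(Sales[SalesAmount]),
    PREVIOUSQUARTER('Calendar'[DateKey])
)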
You did it! You just created a complex measure by using DAX. What this formula will do
is calculate the total sales for the previous quarter, depending on the filters applied in a
report. For example, we can put SalesAmount and our new Previous Quarter Sales
measure from the Sales table into a Clustered column chart. Then, from the Calendar
table, add Year as a slicer and select 2011. After that, add QuarterOfYear as another
slicer and select 4, and we get a chart like this:
Keep in mind, the sample model contains only a small amount of sales data from
1/1/2011 to 1/19/2013. If you select a year or quarter where SalesAmount can't be
summed, or your new measure can't calculate sales data for the current or previous
quarter, no data for that period is shown. For example, if you select 2011 for Year and 1
for QuarterOfYear, no data is shown for Previous Quarter Sales because there's no data
for the fourth quarter of 2010.
DAX formulas can contain up to 64 nested functions. It’s unlikely a formula would
ever contain so many nested functions. In fact, such a formula would be difficult to
create and debug, and it probably wouldn’t be fast either.
In this formula, you also used filters. Filters narrow down what will be calculated. In
this case, you selected one filter as an argument, which is actually the result of
another function. You'll learn more about filters later.
You used the CALCULATE function. This function is one of the most powerful
functions in DAX. As you author models and create more complex formulas, you'll
likely use this function many times. Although further discussion about the
CALCULATE function is outside the scope of this article, as your knowledge of DAX
grows, pay special attention to it.
Syntax QuickQuiz
1. What does this button on the formula bar do?
Functions
Functions are predefined formulas that perform calculations by using specific values,
called arguments, in a particular order or structure. Arguments can be other functions,
another formula, expression, column references, numbers, text, logical values such as
TRUE or FALSE, or constants.
DAX includes the following categories of functions: Date and Time, Time Intelligence,
Information, Logical, Mathematical, Statistical, Text, Parent/Child, and Other functions. If
you’re familiar with functions in Excel formulas, many of the functions in DAX will appear
similar to you; however, DAX functions are unique in the following ways:
A DAX function always references a complete column or a table. If you want to use
only particular values from a table or column, you can add filters to the formula.
DAX includes many functions that return a table rather than a value. The table isn't
displayed, but is used to provide input to other functions. For example, you can
retrieve a table and then count the distinct values in it, or calculate dynamic sums
across filtered tables or columns.
DAX includes various time intelligence functions. These functions let you define or
select date ranges, and perform dynamic calculations based on them. For example,
you can compare sums across parallel periods.
Excel has a popular function, VLOOKUP. DAX functions don’t take a cell or cell
range as a reference like VLOOKUP does in Excel. DAX functions take a column or a
table as a reference. Keep in mind, in Power BI Desktop you’re working with a
relational data model. Looking up values in another table is easy, and in most
cases you don’t need to create any formulas at all.
As you can see, functions in DAX can help you create powerful formulas. We only
touched on the basics of functions. As your DAX skills grow, you'll create formulas
by using many different functions. One of the best places to learn details about
each of the DAX functions is in the DAX Function Reference.
Functions QuickQuiz
1. What does a function always reference?
2. Can a formula contain more than one function?
3. What category of functions would you use to concatenate two text strings into one
string?
Context
Context is one of the most important DAX concepts to understand. There are two types
of context in DAX: row context and filter context. We’ll first look at row context.
Row context
Row context is most easily thought of as the current row. It applies whenever a formula
has a function that applies filters to identify a single row in a table. The function will
inherently apply a row context for each row of the table over which it's filtering. This
type of row context most often applies to measures.
Filter context
Filter context is a little more difficult to understand than row context. You can most
easily think of filter context as: One or more filters applied in a calculation that
determines a result or value.
Filter context doesn’t exist in place of row context; rather, it applies in addition to row
context. For example, to further narrow down the values to include in a calculation, you
can apply a filter context, which not only specifies the row context, but also specifies a
particular value (filter) in that row context.
Filter context is easily seen in your reports. For example, when you add TotalCost to a
visualization, and then add Year and Region, you're defining a filter context that selects a
subset of data based on a given year and region.
Why is filter context so important to DAX? You've seen that filter context can be applied
by adding fields to a visualization. Filter context can also be applied in a DAX formula by
defining a filter with functions such as ALL, RELATED, FILTER, CALCULATE, by
relationships, and by other measures and columns. For example, let’s look at the
following formula in a measure named Store Sales:
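Based on the breakdown that follows, the Store Sales measure reads roughly:
DAX
Store Sales = CALCULATE([Total Sales], Channel[ChannelName] = "Store")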
To better understand this formula, we can break it down, much like with other formulas.
E. A measure [Total Sales] in the same table as an expression. The Total Sales measure
has the formula: =SUM(Sales[SalesAmount]).
F. A comma (,), which separates the first expression argument from the filter argument.
This formula ensures that the sales values defined by the Total Sales measure are
calculated only for rows in the Channel[ChannelName] column with the value Store used as a filter.
As you can imagine, being able to define filter context within a formula has immense
and powerful capabilities. The ability to reference only a particular value in a related
table is just one such example. Don’t worry if you don't completely understand context
right away. As you create your own formulas, you'll better understand context and why
it’s so important in DAX.
Context QuickQuiz
1. What are the two types of context?
2. What is filter context?
3. What is row context?
Summary
Now that you have a basic understanding of the most important concepts in DAX, you
can begin creating DAX formulas for measures on your own. DAX can indeed be a little
tricky to learn, but there are many resources available to you. After reading through this
article and experimenting with a few of your own formulas, you can learn more about
other DAX concepts and formulas that can help you solve your own business problems.
There are many DAX resources available to you; most important is the Data Analysis
Expressions (DAX) Reference.
Because DAX has been around for several years in other Microsoft BI tools such as
Power Pivot and Analysis Services Tabular models, there are many great sources of
information out there. You can find more information in books, whitepapers, and blogs
from both Microsoft and leading BI professionals. The DAX Resource Center Wiki on
TechNet is also a great place to start.
QuickQuiz answers
Syntax:
Functions:
Context:
Model view shows all of the tables, columns, and relationships in your model. This view
can be especially helpful when your model has complex relationships between many
tables.
Select the Model view icon near the side of the window to see a view of the existing
model. Hover your cursor over a relationship line to show the columns used.
In the image, the Connections table has a Seat ID column that’s related to the Unique
Seats table, which also has a seatId column. The two tables have a Many to One (*:1)
relationship. An arrow in the middle of the line shows the direction of the filter context
flow. Double arrows would mean the cross-filter direction is set to Both.
You can double-click a relationship to open it in the Edit relationship dialog box. For
more information about relationships, see Create and manage relationships in Power BI
Desktop.
The colors in the table card headers automatically match the colors you've selected in
any report theme you're using. If the color is too close to white, Model view doesn't use
it in the theme headers to avoid situations where it's difficult to differentiate tables in
dual mode. In the previous image the card headers are white; if the report theme was
using blue, the card headers in the Model view shown in the previous image would be
blue instead of white.
If your model has fewer than 75 tables, Model view shows all of your tables. If your
model has more than 75 tables, instead of showing all tables you see the following
image:
When your model has more than 75 tables, Power BI Desktop warns you that slowdowns
might occur. Create a custom layout (select the Create a custom layout button) to
reduce the significant CPU and memory used when Model view shows more than 75
tables.
Next steps
There are all sorts of things you can do with Power BI Desktop. For more information on
data sources, see the following resources:
Quick measure suggestions assist creation of DAX measures using natural language
instead of using templates or writing DAX from scratch.
This feature can be used to jump-start creation of common DAX measure scenarios
such as:
Here you can describe the measure you want to create and select Generate (or press the
Enter key) to get DAX measure suggestions:
You should always validate the DAX suggestions to make sure they meet your needs. If
you’re satisfied with a suggested measure, you can click the Add button to automatically
add the measure to your model.
Aggregated columns
Apply aggregations to a column to return a single value. Our supported aggregations
include sum, count, distinct count, distinct count no blanks, average, min, max, median,
variance, and standard deviation.
Examples:
Optional filters
For aggregated columns, you can also specify one or more filter conditions. If there are
multiple filter conditions, you can specify if you want an intersection (&&/AND) or union
(||/OR) of the filters.
Examples:
Count of rows
Count the number of records in the specified table. You don’t need to specify the table if
there is only one table.
Examples:
Count rows of sales table where Product is Word and Region is North
Count of sales table where Product is Word or Region is North
Count record of sales table filtered to Product is Word && Region is North
Get the row count of sales table for Product is Word || Region is North
Mathematical operations
Perform mathematical operations with numeric columns, measures, or aggregated
columns. For scenarios across columns within a table, you can either average
(AVERAGEX) or sum up (SUMX) the result in order to return a single value.
Examples:
Sales - Cogs
Sales minus Cogs
Sales divided by target revenue times 100
Sales / target revenue * 100
EU Sales + JP Sales + NA Sales
For each row in Sales table calculate Price * Units and sum up the result
For each row in Sales table sum up Price * Units
For each row in Sales table calculate Price * Discount and then get the average
For the Sales table get the average of Price * Discount
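As a rough illustration, the row-by-row descriptions above typically resolve to iterator functions such as SUMX or AVERAGEX; the table and column names in this sketch are hypothetical:
DAX
// "For each row in Sales table calculate Price * Units and sum up the result"
Sales Revenue = SUMX(Sales, Sales[Price] * Sales[Units])
// "For each row in Sales table calculate Price * Discount and then get the average"
Average Discounted Amount = AVERAGEX(Sales, Sales[Price] * Sales[Discount])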
Selected value
Get the selected value of a column. This is typically used when paired with a single-
select slicer or filter so that the measure will return a non-blank value.
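A suggestion for this scenario usually maps to the SELECTEDVALUE function; the column name and alternate result below are hypothetical:
DAX
// Returns the selected product name, or "All products" when no single value is selected
Selected Product = SELECTEDVALUE(Product[ProductName], "All products")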
Examples:
If condition
Return values based on conditions. If you are returning string values, you will need to
use double quotes. Conditions can use the following comparison operators: =, ==, <>,
<, >, <=, >=
Examples:
Text operations
Perform text operations with columns, measures, or aggregated columns. For scenarios
across columns within a table, we’ll merge (CONCATENATEX) the result in order to return
a single value.
Examples:
Time intelligence
These time intelligence scenarios require using a properly marked date table or auto
date/time hierarchy. For YTD scenarios you can specify "fiscal" or "fiscal calendar" to
base the calculation on the fiscal calendar (ends on June 30th).
Examples:
YTD sales
Sales fiscal YTD
Get the sales year to date
Sales MTD
Quarter to date sales
YTD sales for US and Canada
Change of sales from the previous year
Sales YoY change
Month over month change for sales
Sales QoQ Percent change
Sales for the same period last year
Sales for the same period last month
28 day rolling average sales
28 – day rolling avg sales
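As a hedged sketch, the YTD examples above typically resolve to time intelligence functions such as TOTALYTD; the table, column, and fiscal year-end value here are assumptions:
DAX
YTD Sales = TOTALYTD(SUM(Sales[SalesAmount]), 'Date'[Date])
// Fiscal calendar ending on June 30th
Fiscal YTD Sales = TOTALYTD(SUM(Sales[SalesAmount]), 'Date'[Date], "6/30")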
Information functions
Return system or user information such as the current date/time or the current user's
email, domain, or username.
Examples:
Today's date
Now
Return the current user email
Return the current domain name and username
Return the current user’s domain login
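These examples generally map to DAX date and information functions; a minimal sketch:
DAX
Todays Date = TODAY()
Current Time = NOW()
// User principal name, which is typically the user's email address
Current User Email = USERPRINCIPALNAME()
// Domain\username for the current user
Current Domain Login = USERNAME()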
Here's a powerful approach to importing data into Power BI Desktop: If you have
multiple files that have the same schema, combine them into a single logical table. This
popular technique has been made more convenient and more expansive.
To start the process of combining files from the same folder, select Get data, choose File
> Folder, and then select Connect.
Enter the folder path, select OK, and then choose Transform data to see the folder's files
in Power Query Editor.
The combine files transform analyzes each input file to determine the correct file
format to use, such as text, Excel workbook, or JSON file.
The transform allows you to select a specific object from the first file, such as an
Excel workbook, to extract.
The combine files transform then automatically takes these actions:
Creates an example query that performs all the required extraction steps in a
single file.
Creates a function query that parameterizes the file/binary input to the exemplar
query. The exemplar query and the function query are linked, so that changes to
the exemplar query are reflected in the function query.
Applies the function query to the original query with input binaries, such as the
Folder query. It applies the function query for binary inputs on each row, then
expands the resulting data extraction as top-level columns.
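As a conceptual sketch (not the exact code Power Query generates), the function query for a folder of CSV files might look something like the following; the option values are assumptions:
M
// Hypothetical "Transform Sample File" style function
(SampleFile as binary) as table =>
let
    // Parse the binary input as a CSV document
    Source = Csv.Document(SampleFile, [Delimiter = ",", Encoding = 65001]),
    // Promote the first row to column headers
    PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars = true])
in
    PromotedHeaders
The folder query then invokes this function once per file and expands the results as top-level columns, as described above.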
Note
The scope of your selection in an Excel workbook will affect the behavior of
combine binaries. For example, you can select a specific worksheet to combine that
worksheet, or choose the root to combine the full file. Selecting a folder combines
the files found in that folder.
With the behavior of combine files, you can easily combine all files within a given folder
if they have the same file type and structure (such as the same columns).
You can also easily apply more transformation or extraction steps by modifying the
automatically created exemplar query, without having to worry about modifying or
creating other function query steps. Any changes to the exemplar query are
automatically reflected in the linked function query.
Next steps
You can connect to all sorts of data by using Power BI Desktop. For more information on
data sources, see the following resources:
In Power BI, you can use AI Insights to gain access to a collection of pre-trained machine
learning models that enhance your data preparation efforts. You can access AI Insights
in the Power Query Editor. You can find its associated features and functions through
the Home and Add Column tabs in Power Query Editor.
This article describes the Text Analytics and Vision functions, both from
Azure Cognitive Services. Also in this article is a section that describes the custom
functions available in Power BI from Azure Machine Learning.
Sentiment Analysis
Key Phrase Extraction
Language Detection
Image Tagging.
The transformations are executed on the Power BI service and don't require an Azure
Cognitive Services subscription.
Available functions
This section describes the available functions in Cognitive Services in Power BI.
Detect language
The Detect language function evaluates text input, and for each field, returns the
language name and ISO identifier. This function is useful for data columns that collect
arbitrary text, where language is unknown. The function expects data in text format as
input.
Text Analytics recognizes up to 120 languages. For more information, see supported
languages.
Key phrase extraction works best when you give it bigger chunks of text to work on,
opposite from sentiment analysis. Sentiment analysis performs better on smaller blocks
of text. To get the best results from both operations, consider restructuring the inputs
accordingly.
Score sentiment
The Score sentiment function evaluates text input and returns a sentiment score for
each document, ranging from 0 (negative) to 1 (positive). Score sentiment also accepts
an optional input for a Language ISO code. This function is useful for detecting positive
and negative sentiment in social media, customer reviews, and discussion forums.
Text Analytics uses a machine learning classification algorithm to generate a sentiment
score between 0 and 1. Scores closer to 1 indicate positive sentiment. Scores closer to 0
indicate negative sentiment. The model is pre-trained with an extensive body of text
with sentiment associations. Currently, it's not possible to provide your own training
data. The model uses a combination of techniques during text analysis, including text
processing, part-of-speech analysis, word placement, and word associations. For more
information about the algorithm, see Introducing Text Analytics.
Currently, Sentiment Analysis supports English, German, Spanish, and French. Other
languages are in preview. For more information, see supported languages.
Tag images
The Tag Images function returns tags based on more than 2,000 recognizable objects,
living beings, scenery, and actions. When tags are ambiguous or not common
knowledge, the output provides hints to clarify the meaning of the tag in context of a
known setting. Tags aren't organized as a taxonomy, and no inheritance hierarchies
exist. A collection of content tags forms the foundation for an image description
displayed as human readable language formatted in complete sentences.
This function requires an image URL or a base-64 field as input. At this time, image
tagging supports English, Spanish, Japanese, Portuguese, and Simplified Chinese. For
more information, see supported languages.
Select the Text analytics button in the Home or Add Column ribbon. Then sign in when
you see the prompt.
After signing in, select the function you want to use and the data column you want to
transform in the pop-up window.
Power BI selects a Premium capacity to run the function on and sends the results back to
Power BI Desktop. The selected capacity is only used for Text Analytics and Vision
functions during application and refreshes in Power BI Desktop. Once Power BI publishes
the report, refreshes run on the Premium capacity of the workspace the report is
published to. You can change the capacity used for all Cognitive Services in the
dropdown in the lower left corner of the popup window.
Language ISO code is an optional input to specify the language of the text. You can use
a column as input, or a static field. In this example, the language is specified as English
(en) for the whole column. If you leave this field blank, Power BI automatically detects
the language before applying the function. Next, select Apply.
The first time you use AI Insights on a new data source, Power BI Desktop prompts you
to set the privacy level of your data.
Note
Refreshes of the dataset in Power BI will only work for data sources where the
privacy level is set to public or organizational.
After you invoke the function, the result is added as a new column to the table. The
transformation is also added as an applied step in the query.
In the cases of image tagging and key phrase extraction, the results can return multiple
values. Each individual result is returned on a duplicate of the original row.
Reports with applied Text Analytics and Vision functions should be published to a
workspace that is on a Premium capacity, otherwise refreshing the dataset fails.
Select a capacity
Report authors can select the Premium capacity on which to run AI Insights. By default,
Power BI selects the first created capacity to which the user has access.
Power Query has separate buttons for Text Analytics, Vision, and Azure Machine
Learning. In Power Query Online, these features are combined in one menu.
In Power Query, the report author can select the Premium capacity that's used to
run the functions. This choice isn't required in Power Query Online, since a
dataflow is already on a specific capacity.
Incremental refresh is supported but can cause performance issues when used on
queries with AI insights.
Direct Query isn't supported.
To use this capability, a data scientist can grant access to the Azure Machine Learning
model to the BI analyst using the Azure portal. Then, at the start of each session, Power
Query discovers all the Azure Machine Learning models to which the user has access
and exposes them as dynamic Power Query functions. The user can then invoke those
functions by accessing them from the ribbon in Power Query Editor, or by invoking the
M function directly. Power BI also automatically batches the access requests when
invoking the Azure Machine Learning model for a set of rows to achieve better
performance.
This functionality is supported in Power BI Desktop, Power BI dataflows, and for Power
Query Online in the Power BI service.
To learn more about dataflows, see Self-service data prep in Power BI.
To learn more about Azure Machine Learning, see the following articles:
The steps in this section describe how to grant a Power BI user access to a model hosted
on the Azure Machine Learning service. With this access, they can use this model as a
Power Query function. For more information, see Manage access using RBAC and the
Azure portal.
These instructions for schema generation, by updating the entry script, must also be
applied to models created using automated machine learning experiments with the
Azure Machine Learning SDK.
Note
Models created by using the Azure Machine Learning visual interface don't
currently support schema generation, but they will in subsequent releases.
All Azure Machine Learning models to which you have access are listed here as Power
Query functions. Also, the input parameters for the Azure Machine Learning model are
automatically mapped as parameters of the corresponding Power Query function.
To invoke an Azure Machine Learning model, you can specify any of the selected entity's
columns as an input from the drop-down. You can also specify a constant value to be
used as an input by toggling the column icon to the left of the input dialog.
Select OK to view the preview of the Azure Machine Learning model's output as a new
column in the entity table. The model invocation appears as an applied step for the
query.
If the model returns multiple output parameters, they're grouped together as a record in
the output column. You can expand the column to produce individual output
parameters in separate columns.
Models created by using the Azure Machine Learning visual interface don't
currently support schema generation. Support is anticipated in subsequent
releases.
Incremental refresh is supported but can cause performance issues when used on
queries with AI insights.
Direct Query isn't supported.
Users with a Premium Per User (PPU) only license cannot use AI Insights from
Power BI Desktop; you must use a non-PPU Premium license with its
corresponding Premium capacity. You can still use AI Insights with a PPU license
in the Power BI service.
Next steps
This article provided an overview of integrating Machine Learning into Power BI
Desktop. The following articles might also be interesting and useful.
You can use quick measures to quickly and easily perform common, powerful
calculations. A quick measure runs a set of Data Analysis Expressions (DAX) commands
behind the scenes, then presents the results for you to use in your report. You don't
have to write the DAX, it's done for you based on input you provide in a dialog box.
There are many available categories of calculations and ways to modify each calculation
to fit your needs. Best of all, you can see the DAX that's executed by the quick measure
and jump-start or expand your own DAX knowledge.
When you select New quick measure, the Quick measures window appears, letting you
choose the calculation you want and the fields to run the calculation against.
Choose the Select a calculation field to see a long list of available quick measures.
The five quick measure calculation types, with their calculations, are:
To submit your ideas about new quick measures you'd like to see, underlying DAX
formulas, or other quick measures ideas for consideration, check out the Power BI
Ideas page.
Note
When using SQL Server Analysis Services (SSAS) live connections, some quick
measures are available. Power BI Desktop displays only the quick measures that are
supported for the version of SSAS you're connecting to. If you're connected to a
SSAS live data source and don't see certain quick measures in the list, it's because
the SSAS version you're connected to doesn't support the DAX commands used to
implement those quick measures.
After you select the calculations and fields you want for your quick measure, choose OK.
The new quick measure appears in the Fields pane, and the underlying DAX formula
appears in the formula bar.
The following matrix visual shows a sales table for various products. It's a basic table
that includes the sales totals for each category.
With the matrix visual selected, choose the drop-down arrow next to TotalSales in the
Values well, and select New quick measure.
In the Quick measures window, under Calculation, select Average per category.
Drag Average Unit Price from the Fields pane into the Base value field. Leave Category
in the Category field, and select OK.
When you select OK, several interesting things happen.
1. The matrix visual has a new column that shows the calculated Average Unit Price
average per Category.
2. The DAX formula for the new quick measure appears in the formula bar. See the
next section for more about the DAX formula.
3. The new quick measure appears selected and highlighted in the Fields pane.
The new quick measure is available to any visual in the report, not just the visual you
created it for. The following image shows a quick column chart visual created by using
the new quick measure field.
The formula bar not only shows you the formula behind the measure, but more
importantly, lets you see how to create the DAX formulas underlying quick measures.
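For example, the Average per category measure created earlier is roughly of the following shape; the exact generated DAX can differ between Power BI Desktop versions, and the table name here is assumed:
DAX
Average Unit Price average per Category =
AVERAGEX(
    KEEPFILTERS(VALUES('Sales'[Category])),
    CALCULATE(AVERAGE('Sales'[Average Unit Price]))
)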
Imagine you need to do a year-over-year calculation, but you're not sure how to
structure the DAX formula, or you have no idea where to start. Instead of banging your
head on the desk, you can create a quick measure by using the Year-over-year change
calculation, and see how it appears in your visual and how the DAX formula works. Then
you can either make changes directly to the DAX formula, or create a similar measure
that meets your needs and expectations.
You can always delete quick measures from your model if you don't like them by right-
clicking or selecting the ... next to the measure and selecting Delete from model. You
can also rename a quick measure whatever you like by selecting Rename from the
menu.
You can use quick measures added to the Fields pane with any visual in the report.
You can always see the DAX associated with a quick measure by selecting the
measure in the Fields list and looking at the formula in the formula bar.
Quick measures are only available if you can modify the model. One exception is
the case when you're working with some Live connections. SSAS tabular live
connections are supported, as previously described.
You can't create time intelligence quick measures when working in DirectQuery
mode. The DAX functions used in these quick measures have performance
implications when translated into the T-SQL statements that are sent to your data
source.
Important
DAX statements for quick measures use only commas for argument separators. If
your version of Power BI Desktop is in a language that uses commas as decimal
separators, quick measures will not work properly.
You can create variables for your reports, interact with the variable as a slicer, and
visualize and quantify different key values in your reports.
Create a parameter on the Modeling tab in Power BI Desktop. When you select it, a
dialog box appears where you can configure the parameter.
Create a parameter
To create a parameter, select New parameter from the Modeling tab in Power BI
Desktop, and choose either Fields or Numeric range. The following examples use
Numeric range; similar procedures apply to using Fields. Name the example Discount
Percentage and set its Data type to Decimal number. The Minimum value is zero. The
Maximum is 0.50 (50 percent). Also set the Increment to 0.05, or five percent. The
increment determines how much the parameter will adjust when interacted with in a
report.
Note
For decimal numbers, make sure you precede the value with a zero, as in 0.50
versus just .50. Otherwise, the number won't validate and the OK button won't be
selectable.
For your convenience, the Add slicer to this page checkbox automatically puts a slicer
with your parameter onto the current report page.
In addition to creating the parameter, you also create a measure automatically in this
process, which you can use to visualize the current value of the parameter.
It's important and useful to note that after you create a parameter, both the parameter
and the measure become part of your model. So, they're available throughout the
report and can be used on other report pages. And, since they're part of the model, you
can delete the slicer from the report page. If you want it back, choose the parameter
from the Fields list and drag it onto the canvas, then change the visual to a slicer.
The new measure is going to be the total sales amount, with the discount rate applied.
You can create complex and interesting measures that let the consumers of your reports visualize the variable of your parameter. For example, you could create a report that lets sales people see their compensation if they meet certain sales goals or percentages, or see the effect of deeper discounts on sales.
Enter the measure formula into the formula bar, and name the formula Sales after
Discount.
DAX
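// Hedged sketch of the Sales after Discount measure described above; the Sales
// table and SalesAmount column names are assumed from the example, and the
// Discount Percentage Value measure comes from the parameter created earlier.
Sales after Discount =
    SUM ( Sales[SalesAmount] ) * ( 1 - [Discount Percentage Value] )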
Then, create a column visual with OrderDate on the axis, and both SalesAmount and
the just-created measure, Sales after Discount as the values.
Then, as you move the slider, you'll see that the Sales after Discount column reflects the
discounted sales amount.
This process is how you create parameters for any data you might want to work with.
You can use parameters in all sorts of situations. These parameters enable the
consumers of reports to interact with different scenarios that you create in your reports.
Parameters can only have 1,000 unique values. For parameters with more than
1,000 unique values, the parameter values will be evenly sampled.
Parameters are designed for measures within visuals, and might not calculate
properly when used in a dimension calculation.
Next steps
You might also be interested in the following articles:
In Power BI Desktop, you can specify the data category for a column so Power BI
Desktop knows how it should treat its values when in a visualization.
When Power BI Desktop imports data, it gets other information in addition to the data itself, like the table and column names, and whether the data is a primary key. With that
information, Power BI Desktop makes some assumptions about how to give you a good
default experience when creating a visualization. For example, when a column has
numeric values, you'll probably want to aggregate it in some way, so Power BI Desktop
places it in the Values area of the Visualizations pane. Or, for a column with date-time
values on a line chart, Power BI Desktop assumes you'll probably use it as a time
hierarchy axis.
But, there are some cases that are a bit more challenging, like geography. Consider the
following table from an Excel worksheet:
Should Power BI Desktop treat the codes in the GeoCode column as an abbreviation for
a Country/Region or a US State? The answer to that question isn't clear, because a code
like this can mean either one. For instance, AL can mean Alabama or Albania. AR can
mean Arkansas or Argentina. Or CA can mean California or Canada. It makes a
difference when we go to chart our GeoCode field on a map.
2. On the ribbon, in the Properties area of the Column tools tab, select the drop-
down arrow next to Data Category. This list shows the data categories you can
choose for your column. Some selections might be disabled if they won't work with
the current data type of your column. For example, if a column is a date or time
data type, Power BI Desktop won't let you choose geographic data categories.
You might also be interested in learning about geographic filtering for Power BI mobile
apps.
Tag barcode fields in Power BI Desktop
Article • 03/14/2023
In Power BI Desktop, you can categorize data in a column, so that Power BI Desktop
knows how to treat values in visuals in a report. You can also categorize a column as
Barcode. Then, you can let someone in your organization scan a barcode on a product
by using the Power BI mobile app on their iOS or Android device. This barcode lets them
see any report that includes it. When they open the report, it automatically filters to the
data related to that barcode.
2. Select the column that contains the barcode data. See the list of supported
barcode formats in the following section.
Do not categorize more than one column across all data tables in a report as
Barcode. The mobile apps support barcode filtering only for reports that have
only one barcode column across all report data tables. If a report has more
than one barcode column, no filtering takes place.
4. In Report view, add the barcode field to the visuals you want filtered by the
barcode.
Now when you open the scanner on the Power BI apps for iOS and Android devices, you
can scan a barcode. Then you can see this report in the list of reports that have
barcodes. When you open the report, it filters the visuals by the product barcode you
scanned.
UPCECode
Code39Code
A39Mod43Code
EAN13Code
EAN8Code
93Code
128Code
PDF417Code
Interleaved2of5Code
ITF14Code
Next steps
Scan barcodes from the mobile app to get filtered data
Issues with scanning a barcode
Specify data categories in Power BI Desktop
Questions? Ask the Power BI Community
Set geographic filters in Power BI
Desktop for use in the mobile app
Article • 03/13/2023
In Power BI Desktop, you can categorize geographical data for a column, so Power BI
Desktop knows how to treat values in visuals in a report. As an added benefit, when you
or your colleagues view that report in the Power BI mobile apps, Power BI automatically
provides geographical filters that match where you are.
For example, say you're a sales manager who travels to meet customers, and you want
to quickly filter the total sales and revenue for the specific customer you're planning to
visit. You want to break out the data for your current location, whether by state, city, or
an actual address. Later, if you have time left, you'd like to visit other customers located
nearby. You can filter the report by your location to find those customers.
7 Note
You can only filter by location in the mobile app if the geographic names in the
report are in English; for example, "New York City" or "Germany".
3. On the Column tools tab, select Data category, then the correct category, in this
example, City.
4. Continue setting geographic data categories for any other fields in the model.
7 Note
You can set multiple columns for each data category in a model, but if you do,
the model can't filter for geography in the Power BI mobile app. To use
geographic filtering in the mobile apps, set only one column for each data
category. For example, set only one City column, one State or Province
column, and one Country or Region column.
1. Switch to the Report view, and create visuals that use the geographic fields
in your data.
In this example, the model also contains a calculated column that brings city and
state together into one column. To learn more, see creating calculated columns in
Power BI Desktop.
2. If you're in a geographic location with data in the report, you can filter it
automatically to your location.
To learn more, see filtering a report by location in the Power BI mobile apps.
Next steps
Specify data categories in Power BI Desktop
Questions? Try asking the Power BI Community
Create calculated columns in Power BI
Desktop
Article • 01/27/2023
With calculated columns, you can add new data to a table already in your model. But
instead of querying and loading values into your new column from a data source, you
create a Data Analysis Expressions (DAX) formula that defines the column's values. In
Power BI Desktop, calculated columns are created by using the new column feature in
Report view, Data view, or Model view.
Unlike custom columns that are created as part of a query by using Add Custom
Column in Power Query Editor, calculated columns that are created in Report view, Data
view, or Model view are based on data you've already loaded into the model. For
example, you might choose to concatenate values from two different columns in two
different but related tables, do addition, or extract substrings.
Calculated columns you create appear in the Fields list just like any other field, but they'll have a special icon showing that their values are the result of a formula. You can name
your columns whatever you want, and add them to a report visualization just like other
fields.
Calculated columns calculate results by using DAX, a formula language meant to work
with relational data like in Power BI Desktop. DAX includes a library of over 200
functions, operators, and constructs. It provides immense flexibility in creating formulas
to calculate results for just about any data analysis need. To learn more about DAX, see
Learn DAX basics in Power BI Desktop.
DAX formulas are similar to Excel formulas. In fact, DAX has many of the same functions
as Excel. DAX functions, however, are meant to work over data interactively sliced or
filtered in a report, like in Power BI Desktop. In Excel, you can have a different formula
for each row in a table. In Power BI, when you create a DAX formula for a new column, it
will calculate a result for every row in the table. Column values are recalculated as
necessary, like when the underlying data is refreshed and values have changed.
But with a calculated column, Jeff can put together the cities from the City column with
the states from the State column.
Jeff right-clicks on the Geography table and then selects New Column. Jeff then enters
the following DAX formula into the formula bar:
DAX
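// Sketch of the formula described in the text; City and State are columns of
// the Geography table in Jeff's example.
CityState = Geography[City] & ", " & Geography[State]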
This formula creates a new column named CityState. For each row in the Geography
table, it takes values from the City column, adds a comma and a space, and then
concatenates values from the State column.
Next steps
This article provides only a quick introduction to calculated columns. For more
information, see the following resources:
To download a sample file and get step-by-step lessons on how to create more
columns, see Tutorial: Create calculated columns in Power BI Desktop.
To learn more about DAX, see Learn DAX basics in Power BI Desktop.
To learn more about columns you create as part of a query, see Create custom
columns.
Create calculated tables in Power BI
Desktop
Article • 01/13/2023
Most of the time, you create tables by importing data into your model from an external
data source. But calculated tables let you add new tables based on data you've already
loaded into the model. Instead of querying and loading values into your new table's
columns from a data source, you create a Data Analysis Expressions (DAX) formula to
define the table's values.
DAX is a formula language for working with relational data, like in Power BI Desktop.
DAX includes a library of over 200 functions, operators, and constructs, providing
immense flexibility in creating formulas to calculate results for just about any data
analysis need. Calculated tables are best for intermediate calculations and data you want
to store as part of the model, rather than calculating on the fly or as query results. For
example, you might choose to union or cross join two existing tables.
Just like other Power BI Desktop tables, calculated tables can have relationships with
other tables. Calculated table columns have data types and formatting, and can belong to a data category. You can name your columns whatever you want, and add them to report
visualizations just like other fields. Calculated tables are recalculated if any of the tables
they pull data from are refreshed or updated. If a calculated table pulls data from a DirectQuery table, it isn't refreshed immediately; it reflects the changes only after the dataset has been refreshed. If a table needs to use DirectQuery, it's best to have the calculated table in DirectQuery as well.
For example, imagine you're a personnel manager who has a table of Northwest
Employees and another table of Southwest Employees. You want to combine the two
tables into a single table called Western Region Employees.
Northwest Employees
Southwest Employees
In Report View, Data View, or Model View of Power BI Desktop, in the Calculations
group select New table. It's a bit easier to do in Table tools in the Data View, because
then you can immediately see your new calculated table.
DAX
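// Hedged sketch: combine the two employee tables with UNION. The table names
// are taken from the surrounding example.
Western Region Employees =
    UNION ( 'Northwest Employees', 'Southwest Employees' )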
A new table named Western Region Employees is created, and appears just like any
other table in the Fields pane. You can create relationships to other tables, add
measures and calculated columns, and add the fields to reports just like with any other
table.
Functions for calculated tables
You can define a calculated table by any DAX expression that returns a table, including a
simple reference to another table. For example:
DAX
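// Minimal sketch: a calculated table defined as a plain reference to another
// table; the new table name here is illustrative.
New Western Region Employees = 'Western Region Employees'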
This article provides only a quick introduction to calculated tables. You can use
calculated tables with DAX to solve many analytical problems. Here are some of the
more common DAX table functions you might use:
DISTINCT
VALUES
CROSSJOIN
UNION
NATURALINNERJOIN
NATURALLEFTOUTERJOIN
INTERSECT
CALENDAR
CALENDARAUTO
See the DAX Function Reference for these and other DAX functions that return tables.
Create measures for data analysis in
Power BI Desktop
Article • 09/18/2023
Power BI Desktop helps you create insights into your data with just a few actions. But
sometimes that data just doesn’t include everything you need to answer some of your
most important questions. Measures can help you get there.
Measures are used in some of the most common data analyses. Simple summarizations
such as sums, averages, minimum, maximum and counts can be set through the Fields
well. The calculated results of measures are always changing in response to your
interaction with your reports, allowing for fast and dynamic ad-hoc data exploration.
Let’s take a closer look. For more information, see Create measures.
Understanding measures
In Power BI Desktop, measures are created and displayed in Report View, Data View, or
Model View. Measures you create yourself appear in the Fields list with a calculator icon.
You can name measures whatever you want, and add them to a new or existing
visualization just like any other field.
7 Note
You might also be interested in quick measures, which are ready-made measures
you can select from dialog boxes. They're a good way to quickly create measures,
and also a good way to learn Data Analysis Expressions (DAX) syntax, since their
automatically created DAX formulas are available to review. For more information,
see quick measures.
To report the estimates, Janice imports last year's sales data into Power BI Desktop.
Janice finds the SalesAmount field in the Reseller Sales table. Because the imported
data only contains sales amounts for last year, Janice renames the SalesAmount field to
Last Years Sales. Janice then drags Last Years Sales onto the report canvas. It appears in
a chart visualization as a single value that is the sum of all reseller sales from last year.
Janice notices that even without specifying a calculation, one has been provided
automatically. Power BI Desktop created its own measure by summing up all of the
values in Last Years Sales.
But Janice needs a measure to calculate sales projections for the coming year, which will
be based on last year's sales multiplied by 1.06 to account for the expected 6 percent
increase in business. For this calculation, Janice will create a measure. Janice creates a
new measure by using the New Measure feature, then enters the following DAX formula:
DAX
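// Hedged sketch of the projection measure; the table name and the renamed
// Last Years Sales field are assumed from the example.
Projected Sales = SUM ( 'Reseller Sales'[Last Years Sales] ) * 1.06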
Janice then drags the new Projected Sales measure into the chart.
Quickly and with minimal effort, Janice now has a measure to calculate projected sales.
Janice can further analyze the projections by filtering on specific resellers or by adding
other fields to the report.
Among other things, data categories allow you to use measures to dynamically create
URLs, and mark the data category as a Web URL.
You could create tables that display the measures as Web URLs, and be able to select on
the URL that's created based on your selection. This approach is especially useful when
you want to link to other Power BI reports with URL filter parameters.
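As a hedged illustration of this pattern, a measure can assemble a report URL that embeds a URL filter parameter; the report address and the Sales[Region] column below are made-up placeholders, not part of the article's model:
DAX
Region Report Link =
    "https://app.powerbi.com/groups/me/reports/<report-id>/ReportSection"
        & "?filter=Sales/Region eq '" & SELECTEDVALUE ( Sales[Region] ) & "'"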
You can make a field appear in multiple folders by using a semicolon to separate the
folder names. For example, Products\Names;Departments results in the field appearing
in a Departments folder and a Names folder inside a Products folder.
You can create a special table that contains only measures. That table always appears at the top of the Fields pane. To do so, create a table with just one column. You can use Enter data to create that table. Then move your measures to that table. Finally, hide the column, but not the table, that you created. Select the arrow at the top of the Fields pane to close and reopen the fields list to see your changes.
Tip
Hidden measures are displayed and accessible in Power BI Desktop; however, you won't see hidden measures in Excel or the Power BI service, since Excel and the Power BI service are considered client tools.
Learn more
We’ve only provided you with a quick introduction to measures here. There’s a lot more
to help you learn how to create your own. For more information, see Tutorial: Create
your own measures in Power BI Desktop. You can download a sample file and get step-
by-step lessons on how to create more measures.
To dive a little deeper into DAX, see Learn DAX basics in Power BI Desktop. The Data
Analysis Expressions Reference provides detailed articles on each of the functions,
syntax, operators, and naming conventions. DAX has been around for several years in
Power Pivot in Excel and SQL Server Analysis Services. There are many other great
resources available, too. Be sure to check out the DAX Resource Center Wiki , where
influential members of the BI community share their knowledge of DAX.
Import and display KPIs in Power BI
Article • 01/12/2023
With Power BI Desktop, you can import and display KPIs in tables, matrices, and cards.
1. Start with an Excel workbook that has a Power Pivot model and KPIs.
2. Import the Excel workbook into Power BI, by using File -> Import -> Power Query,
Power Pivot, Power View. You can also learn how to import workbooks.
3. After import into Power BI, your KPI will appear in the Fields pane, marked with a KPI icon. To use a KPI in your report, be sure to expand its contents, exposing the Value, Goal, and Status fields.
4. Imported KPIs are best used in standard visualization types, such as the Table type.
Power BI also includes the KPI visualization type, which should only be used to
create new KPIs.
You can use KPIs to highlight trends, progress, or other important indicators.
Apply auto date/time in Power BI
Desktop
Article • 03/29/2023
This article targets data modelers developing Import or Composite models in Power BI
Desktop. It introduces and describes the Auto date/time option.
Auto date/time is a data load option in Power BI Desktop. The purpose of this
option is to support convenient time intelligence reporting based on date columns
loaded into a model. Specifically, it allows report authors using your data model to filter,
group, and drill down by using calendar time periods (years, quarters, months, and
days). What's important is that you don't need to explicitly develop these time
intelligence capabilities.
When the option is enabled, Power BI Desktop creates a hidden auto date/time table for
each date column, providing all of the following conditions are true:
How it works
Each auto date/time table is in fact a calculated table that generates rows of data by
using the DAX CALENDAR function. Each table also includes six calculated columns: Day,
MonthNo, Month, QuarterNo, Quarter, and Year.
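To make that structure concrete, the hidden table behaves roughly like the calculated table sketched below; the real table is generated internally, can't be authored this way, and its exact column expressions may differ:
DAX
Auto DateTime Sketch =
    ADDCOLUMNS (
        CALENDAR ( DATE ( 2016, 1, 1 ), DATE ( 2019, 12, 31 ) ),
        "Day", DAY ( [Date] ),
        "MonthNo", MONTH ( [Date] ),
        "Month", FORMAT ( [Date], "MMMM" ),
        "QuarterNo", QUARTER ( [Date] ),
        "Quarter", "Qtr " & QUARTER ( [Date] ),
        "Year", YEAR ( [Date] )
    )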
7 Note
Power BI translates and formats column names and values according to the model
language. For example, if the model was created by using English, it will still show
month names, and so on, in English, even if viewed with a Korean client.
Power BI Desktop also creates a relationship between the auto date/time table's Date
column and the model date column.
The auto date/time table contains full calendar years encompassing all date values
stored in the model date column. For example, if the earliest value in a date column is
March 20, 2016 and the latest value is October 23, 2019, the table will contain 1,461
rows. It represents one row for each date in the four calendar years 2016 to 2019. When
Power BI refreshes the model, each auto date/time table is also refreshed. This way, the
model always contains dates that encompass the date column values.
If it were possible to see the rows of an auto date/time table, they would look similar to
the following example. The example shows seven columns with 10 rows of data from
January 1, 2019 to January 10, 2019.
7 Note
Auto date/time tables are permanently hidden, even from modelers. They don’t appear in the Fields pane or the Model view diagram, and their rows don’t appear in Data view. Also, the tables and their columns can’t be directly referenced by DAX expressions.
Further, it's not possible to work with them when using Analyze in Excel, or
connecting to the model by using non-Power BI report designers.
The table also defines a hierarchy, providing visuals with a drill-down path through year,
quarter, month, and day levels.
If it were possible to see an auto date/time table in the Model view diagram, it would
look like the following tables with related columns highlighted:
Work with auto date/time
When an auto date/time table exists for a date column (and that column is visible),
report authors won't find that column as a field in the Fields pane. Instead, they find an
expandable object that has the name of the date column. You can easily identify it
because it's adorned with a calendar icon. When report authors expand the calendar
object, they find a hierarchy named Date Hierarchy. After they expand the hierarchy,
they find four levels: Year, Quarter, Month, and Day.
The auto date/time generated hierarchy can be used to configure a visual in exactly the
same way that regular hierarchies can be used. Visuals can be configured by using the
entire Date Hierarchy hierarchy, or specific levels of the hierarchy.
There is, however, one added capability not supported by regular hierarchies. When the
auto date/time hierarchy—or a level from the hierarchy—is added to a visual well, report
authors can toggle between using the hierarchy or the date column. This approach
makes sense for some visuals, when all they require is the date column, not the
hierarchy and its levels. They start by configuring the visual field (right-click the visual
field, or select the down-arrow), and then by using the context menu to switch between
the date column or the date hierarchy.
Lastly, model calculations, written in DAX, can reference a date column directly, or the
hidden auto date/time table columns indirectly.
Formulas written in Power BI Desktop can reference a date column in the usual way. The
auto date/time table columns, however, must be referenced by using a special extended
syntax. You start by first referencing the date column, and then following it by a period
(.). The formula bar auto complete will then allow you to select a column from the auto
date/time table.
In Power BI Desktop, a valid measure expression could read:
DAX
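// Hedged sketch of the extended syntax described above; the Sales table and
// OrderDate column are assumed example names.
Date Count = COUNT ( Sales[OrderDate].[Date] )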
7 Note
While this measure expression is valid in Power BI Desktop, it's not correct DAX
syntax. Internally, Power BI Desktop transposes your expression to reference the
true (hidden) auto date/time table column.
The current file option can be turned on or off at any time. When it's turned on, auto date/time tables are created. When it's turned off, any auto date/time tables are removed from the model.
U Caution
Take care when you turn the current file option off, because this will remove the
auto date/time tables. Be sure to fix any broken report filters or visuals that had
been configured to use them.
In Power BI Desktop, you select File > Options and settings > Options, and then select
either the Global or Current File page. On either page, the option exists in the Time
intelligence section.
Next steps
For more information related to this article, check out the following resources:
The External Tools ribbon provides easy access to external tools that are installed locally
and registered with Power BI Desktop. When launched from the External Tools ribbon,
Power BI Desktop passes the name and port number of its internal data model engine
instance and the current model name to the tool. The tool then automatically connects,
providing a seamless connection experience.
Semantic modeling - Open-source tools such as DAX Studio, ALM Toolkit, Tabular
Editor, and Metadata Translator extend Power BI Desktop functionality for specific data
modeling scenarios such as Data Analysis Expressions (DAX) query and expression
optimization, application lifecycle management (ALM), and metadata translation.
Data analysis - Tools for connecting to a model in read-only mode to query data and perform other analysis tasks. For example, a tool might launch Python, Excel, and Power BI
Report Builder. The tool connects the client application to the model in Power BI
Desktop for testing and analysis without having to first publish the Power BI Desktop
(pbix) file to the Power BI service. Tools to document a Power BI dataset also fall into this
category.
Miscellaneous - Some external tools don’t connect to a model at all, but instead extend Power BI Desktop to make helpful tips and content more readily accessible. For example, PBI.tips tutorials, DAX Guide from sqlbi.com, and the PowerBI.tips Product Business Ops community tool make it easier to install a large selection of external tools and to register them with Power BI Desktop, including DAX Studio, ALM Toolkit, Tabular Editor, and many others.
Custom - Integrate your own scripts and tools by adding a *.pbitool.json document to
the Power BI Desktop\External Tools folder.
Before installing external tools, keep the following notes in mind:
External tools aren't supported in Power BI Desktop for Power BI Report Server.
Tool Description
PowerBI.tips - Business Ops: An easy to use deployment tool for adding external tools extensions to Power BI Desktop. The Business Ops goal is to provide a one stop shop for installing all the latest versions of external tools. To learn more, go to PowerBI.tips - Business Ops.
Tabular Editor: Model creators can easily build, maintain, and manage tabular models by using an intuitive and lightweight editor. A hierarchical view shows all objects in your tabular model organized by display folders, with support for multi-select property editing and DAX syntax highlighting. To learn more, go to tabulareditor.com.
DAX Studio: A feature-rich tool for DAX authoring, diagnosis, performance tuning, and analysis. Features include object browsing, integrated tracing, query execution breakdowns with detailed statistics, and DAX syntax highlighting and formatting. To get the latest, go to DAX Studio on GitHub.
ALM Toolkit: A schema compare tool for Power BI models and datasets, used for application lifecycle management (ALM) scenarios. You can perform straightforward deployment across environments and retain incremental refresh historical data. You can diff and merge metadata files, branches, and repos. You can also reuse common definitions between datasets. To get the latest, go to alm-toolkit.com.
Metadata Translator: Streamlines localization of Power BI models and datasets. The tool can automatically translate captions, descriptions, and display folder names of tables, columns, measures, and hierarchies. The tool translates by using the machine translation technology of Azure Cognitive Services. You can also export and import translations via Comma Separated Values (.csv) files for convenient bulk editing in Excel or a localization tool. To get the latest, go to Metadata Translator on GitHub.
When Power BI Desktop launches Analysis Services as its analytical data engine, it
dynamically assigns a random port number. It also loads the model with a randomly
generated name in the form of a globally unique identifier (GUID). Because these
connection parameters change with every Power BI Desktop session, it's difficult for
external tools to discover on their own the correct Analysis Services instance and model
to connect to. External tools integration solves this problem by allowing Power BI
Desktop to send the Analysis Services server name, port number, and model name to
the tool as command-line parameters when starting the external tool from the External
Tools ribbon, as shown in the following diagram.
With the Analysis Services Server name, port number, and model name, the tool uses
Analysis Services client libraries to establish a connection to the model, retrieve
metadata, and execute DAX or MDX queries. Whenever an external data modeling tool
updates the metadata, Power BI Desktop synchronizes the changes so that the Power BI
Desktop user interface reflects the current state of the model accurately. Keep in mind
there are some limitations to the synchronization capabilities as described later.
Object Synchronization supported
Tables: No
Columns: Yes (1)
Relationships: Yes
Measures: Yes
Perspectives: Yes
Translations: Yes
Annotations: Yes
M expressions: No
1 - When using external tools to connect to the AS instance, changing a column's data
type is supported, however, renaming columns is not supported.
Power BI Desktop project files offer a broader scope of supported write operations.
Those objects and operations that don't support write operations by using external tools
to connect to Power BI Desktop's Analysis Services instance may be supported by
editing Power BI Desktop project files. To learn more, see Power BI Desktop projects -
Model authoring.
When a tool is registered by using a *.pbitool.json file that specifies an icon, the tool appears in the External Tools ribbon. Some tools, like ALM Toolkit and DAX Studio, create the registration file automatically when you install the tool. However, many tools, like SQL Profiler, typically don't, because their installer doesn't include creating a registration file for Power BI Desktop. Tools that don't automatically register with Power BI Desktop can be registered manually by creating a *.pbitool.json registration file.
A value of 1 (decimal) enables the External Tools ribbon, which is also the default value.
See also
Register an external tool
Register an external tool
Article • 01/12/2023
Some tools must be manually registered with Power BI Desktop. To register an external
tool, create a JSON file with the following example code:
JSON
{
"name": "<tool name>",
"description": "<tool description>",
"path": "<tool executable path>",
"arguments": "<optional command line arguments>",
"iconData": "image/png;base64,<encoded png icon data>"
}
name: Provide a name for the tool, which will appear as a button caption in the
External Tools ribbon within Power BI Desktop.
description: (optional) Provide a description, which will appear as a tooltip on the
External Tools ribbon button within Power BI Desktop.
path: Provide the fully qualified path to the tool executable.
arguments: (optional) Provide a string of command-line arguments that the tool
executable should be launched with. You can use any of the following
placeholders:
%server%: Replaced with the server name and port number of the local instance of Analysis Services Tabular for imported/DirectQuery data models.
%database%: Replaced with the database name of the model hosted in the
local instance of Analysis Services Tabular for imported/DirectQuery data
models.
iconData: Provide image data, which will be rendered as a button icon in the
External Tools ribbon within Power BI Desktop. The string should be formatted
according to the syntax for Data URIs without the "data:" prefix.
Name the file "<tool name>.pbitool.json" and place it in the following folder:
Example
The following *.pbitool.json file launches powershell.exe from the External Tools ribbon
and runs a script called pbiToolsDemo.ps1. The script passes the server name and port
number in the -Server parameter and the dataset name in the -Database parameter.
JSON
{
  "version": "1.0.0",
  "name": "External Tools Demo",
  "description": "Launches PowerShell and runs a script that outputs server and database parameters. (Requires elevated PowerShell permissions.)",
  "path": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
  "arguments": "C:\\pbiToolsDemo.ps1 -Server \"%server%\" -Database \"%database%\"",
  "iconData": "image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsEAAA7BAbiRa+0AAAANSURBVBhXY/jH9+8/AAciAwpql7QkAAAAAElFTkSuQmCC"
}
PowerShell
[CmdletBinding()]
param
(
[Parameter(Mandatory = $true)]
[string] $Server,
[Parameter(Mandatory = $true)]
[string] $Database
)
Write-Host ""
Write-Host "Analysis Services instance: " -NoNewline
Write-Host "$Server" -ForegroundColor Yellow
Write-Host "Dataset name: " -NoNewline
Write-Host "$Database" -ForegroundColor Green
Write-Host ""
Read-Host -Prompt 'Press [Enter] to close this window'
Icon data URIs
To include an icon in the External Tools ribbon, the pbitool.json registration file must
include an iconData element.
The iconData element takes a data URI without the data: prefix. For example, the data
URI of a one pixel magenta png image is:
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAAAXNSR0IArs4c6
QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsEAAA7BAbiRa+0AAAANSURBVBhXY/jH9+8/AAciAwpql7Qk
AAAAAElFTkSuQmCC
Be sure to remove the data: prefix, as shown in the preceding pbitool.json example.
To convert a .png or other image file type to a data URI, use an online tool or a custom
tool such as the one shown in the following C# code snippet:
c#
// This snippet converts a selected image file to a data URI string for the
// iconData element. It uses the WPF OpenFileDialog from Microsoft.Win32.
using System;
using System.IO;
using System.Text;
using Microsoft.Win32;

string ImageDataUri;
OpenFileDialog openFileDialog1 = new OpenFileDialog();
openFileDialog1.Filter = "PNG Files (.png)|*.png|All Files (*.*)|*.*";
openFileDialog1.FilterIndex = 1;
openFileDialog1.Multiselect = false;
openFileDialog1.CheckFileExists = true;
bool? userClickedOK = openFileDialog1.ShowDialog();
if (userClickedOK == true)
{
    var fileName = openFileDialog1.FileName;
    var sb = new StringBuilder();
    // Build "image/<extension>;base64,<encoded bytes>" without the "data:" prefix.
    sb.Append("image/")
      .Append((Path.GetExtension(fileName) ?? "png").Replace(".", ""))
      .Append(";base64,")
      .Append(Convert.ToBase64String(File.ReadAllBytes(fileName)));
    ImageDataUri = sb.ToString();
}
See also
External tools in Power BI Desktop
Analysis Services client libraries
Tabular Object Model (TOM)
Use the Field list in Power BI Desktop
Article • 08/11/2023
The lists in the Field pane, called the Data pane in current releases of Power BI Desktop,
are being unified across Model view, Data view and Report view in Power BI Desktop.
Unifying these views creates consistency for functionality and the user interface (UI)
across views, and addresses customer feedback.
Iconography
Search functionality
Context menu items
Similar drag-drop behavior
Tooltips
Accessibility improvements
The intent is to improve Power BI Desktop usability. The changes should have minimal impact on your typical data workflow. To view the Fields pane (or the Data pane in current releases of Power BI Desktop), add data to your model and select the pane from the area to the right of the canvas.
Icons and UI
Tooltips
Numeric calculated column: A new column you create with a DAX formula
that defines the column’s values. For more information, see Create calculated
columns in Power BI Desktop.
Measure: A measure has its own hard-coded formula. Report viewers can’t change the calculation; for example, if it’s a sum, it can only be a sum. The
values aren't stored in a column. They're calculated on the fly, depending
solely on their location in a visual. For more information, see Create
measures for data analysis in Power BI Desktop.
Measure group.
KPI: A visual cue that communicates the amount of progress made toward a
measurable goal. For more information, see Create key performance
indicator (KPI) visualizations.
Hierarchy of fields: Select the arrow to see the fields that make up the
hierarchy. For more information, see Creating and working with hierarchies in
Power BI (3-11g) on YouTube.
Geo data: These location fields can be used to create map visualizations.
Identity field: Fields with this icon are unique fields, set to show all values,
even if they have duplicates. For example, your data might have records for
two different people named 'Robin Smith', and each is treated as unique.
They aren't summed.
Parameter: Set parameters to make parts of your reports and data models
(such as a query filter, a data source reference, a measure definition, etc.)
depend on one or more parameter values. For more information, see Deep
Dive into Query Parameters and Power BI Templates .
Calculated table: A table created with a DAX formula based on data already loaded into the model. Calculated tables are best used for intermediate calculations and data you want to store as part of the model.
Warning: A calculated field with an error. For example, the syntax of the DAX
expression might be incorrect.
Group: Values in this column are based on grouping values from another
column by using the groups and bins feature. For more information, see Use
grouping and binning in Power BI Desktop.
Change detection measure (no original icon): When you configure a page for automatic page refresh, you can configure a change detection measure that is queried to determine if the rest of a page’s visuals should be updated.
Next steps
You might also be interested in the following articles:
The formula editor (often referred to as the DAX editor) includes robust editing and
shortcut enhancements to make authoring and editing formulas easy and intuitive.
Ctrl+G Go to line…
Next steps
The following articles provide more information about formulas and DAX in Power BI
Desktop.