Create your own selfservice datawarehouse with PowerBi.com

Topic: Business Intelligence

Håkan Alsén
18.08.2015

When Bill Gates released BASIC for MSDOS he was shocked that people started to use it for enterprise solutions. I get the same feeling when I start building a data warehouse/data mart with PowerBI.com

star

I have been building a lot of data warehouse/data mart or data preparation over the years(since 2000). I have used different tools to design it (MS Access, Informatica, SSIS, OWB, ODI, Qlik). These tools gave me good metadata, good performance, some error handling, help with data warehouse structure and a design studio with toolbox for aggregation, join, filter, dedup, pivot, unpivot, slowly changing dimension, lookups and so on.

Unfortunately, there is some important weakness with these tools. Some needs specific services on the server, some needs a lot of granting, some needs specific licenses, some have difficult setup, most of them also needs upgrade scripting when new versions arrive and all of them needs specific skills.

I know a couple of BI/DW developers that skip ETL-tools and design their own solution in the database to get rid of these problems.

PowerBI.com is a new solution for BI with selfservice data preparation and ETL embedded. It gives everyone possibilities to maintain data everywhere with any browser thanks to its cloud solution.

I´m so happy to be able to create and govern data marts by myself and my team anytime, anywhere with any device without the need of IT support.

PowerBI.com uses visualization of the language m very nicely. It differs from the old tools by showing the data on the run when creating which makes it more fun and much simpler to use. It also log all steps in the editor.

Maybe you ask if it´s really possibly to create data marts with PowerBi.com. My answer is YES it is and I will explain how. Of course there is some problems e.g performance, stability and Power BI Desktop isn’t directly build for team development with version handling, test and multi user environment.

But I liked it so much when I started to work with it that my mission was to create a dimensional data warehouse structure in it.

So this is the story of a selfservice finance data warehouse.

First I started to import my data needed for analysis and planning. I wanted to integrate data from a finance system(agresso) and put that data together with 25 different planning sheets from Excel. The goal was to be able to better act with these insights.

Importing data into PowerBI.com needs, a gateway(private) to connect to the source. Of course you need some privileges to handle that. You need this also for your scheduling and for your data reload.

To organize my solution for a data mart I created different folders for agresso, planning, staging and data mart. Folder was namned Agresso(finance system), Plan(Planning), Stage and Mart Finance(Finance datamart) for this solution.

folders

After importing I clean the data inside PowerBI. Everything I took care of directly helped me through the other steps. In my agresso source system there was a lot of null and blank dimension value. I took care of them by replacing them with -1.

minus1

This helped me preparing for the data warehouse keys that disconnect the solution from the source system. In the staging area I added a dimension row with values -1 for all dimensions to handle data warehouse data with unknown value.

nullvalue

To help me log when data was loaded I also added a load date column.  After that I created the new dimension key with an prebuilt index column function in PowerBI.com.

dwkey

Now I was able to build my dimensions with all my descriptions for specific subject. I changed some names and added data from different tables(outer join, inner join, full outer join) in query builder.

dimensions

Then I started the work with my fact table. First I loaded the different fact tables to the landing area named by the source system. Then in the stage area I referered to the landing table in PowerBI.com and cleaned it to get rid of all descriptions and other data that isnt allowed to be placed inside a fact table.

I also added all the new synthetic keys from the dimensions with nestedjoins.

outerjoin

Then I integrated the data into one fact table with decided granularity.

When this work was done I was able to create a star for the analysis based on dimension and measures.

star

This star solution helps all the users to build their own reports and analysis upon a well-structured solution.

I think it is possible to replace 50% of my old DW solutions with PowerBI.

It will give users better understanding of the data loading because the users can see all the steps by themselves and also correct them very easy. It doesnt cost you a lot to build and set up because you dont need to buy a new server.

This solution took me only 8 hours to build.

I think that selfservice ETL is the future of BI!

Contact me if you want a demo, a lesson or consulting rearding this!

Håkan Alsén

VP Data Analytics at Knowit

Business Intelligence

Bloggen för Business Intelligence.