Sunday 19 February 2017

DAX : Bidirectional problem with Product and Bill of Materials

Introduction

Suppose your customer wants to calculate the weight of the components that make up your product (the Bill of Materials, aka the BOM). This blogpost describes how to summarize and calculate the quantities and weights of products and sales.

For this blogpost I've used the explanation of Exceleratorbi.com.

The Data

So, I've created a small example of the problem. There are three tables in the sample workbook: a Material (Product) table, a BOM table and a Sales fact table. To make the example more interesting I've created some data like a car, a bicycle and others, together with the parts that make them up, like wheels, a frame and a steering wheel. Lastly, a fact table with the sales data is created, where a quantity tells how many items were sold on a day.


So I'm interested in knowing the weight of the materials and also the weight of the parts that make up the Material (Product).

The data model

In the diagram I've plotted the data model. The keys of the DimMaterial table are propagated in different directions, and not in the single direction we 'normally' use in dimensional modelling.


The DAX expressions

For the initial implementation of the problem in the workbook the following DAX expressions are used. Pay attention to the TotalMaterialWeight calculation. This DAX expression uses the SUMX function, which calculates the weight in a row-by-row manner. SUMX is an iterator and is less efficient than the SUM function. Normally I don't use the SUMX function very often, but in this case it is needed because you need to multiply per row first and then sum it up.

       
TotalQuantity:=SUM(FactSales[Quantity])

TotalBOMWeight:=SUM(DimBom[BOMWeight])

TotalMaterialWeight:=SUMX(DimMaterial;[TotalQuantity]*[TotalBOMWeight])

The first DAX expression calculates the quantity sold in FactSales, the TotalBOMWeight expression calculates the weight of the individual BOM parts, and TotalMaterialWeight sums up the multiplication of TotalQuantity and TotalBOMWeight per material.

The outcome

Below is a first impression of the pivot table based on the PowerPivot model, and here you can see that the TotalQuantity and the TotalMaterialWeight are calculated correctly. Next to the pivot table I've included a manual calculation and it is the same as the automated calculation.


Below is a screenshot of the weights of the BOM parts; that calculation is also correct.


How to improve the model

Now there is something wrong with the TotalBOMWeight. It is 1225 on every row and that doesn't seem right. The problem is that the filter is not propagated towards DimBom, because the relation only works in the standard single direction. In Power BI Desktop you can set the relation to be bidirectional in order to make it work. In PowerPivot, however, you have to add some black magic to the DAX expressions to change this behavior.

       
TotalSalesQuantity:=CALCULATE(SUM(FactSales[Quantity]);DimBom)

TotalBOMWeight:=CALCULATE(SUM(DimBom[BOMWeight]);FactSales)

TotalMaterialWeight:=SUMX(DimMaterial;[TotalSalesQuantity]*[TotalBOMWeight])
 

The SUM expressions of TotalSalesQuantity and TotalBOMWeight are now wrapped in a CALCULATE function with the related table as a filter argument, and this makes the TotalBOMWeight calculation behave correctly.
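For comparison, below is a rough sketch of what the bidirectional alternative could look like in DAX engines that support CROSSFILTER (such as Power BI Desktop, Excel 2016 and SSAS 2016 Tabular). It is a sketch only: the key column name MaterialId is an assumption, because the real column names of the workbook are not shown here.

TotalBOMWeight:=CALCULATE(
    SUM(DimBom[BOMWeight]);
    // Assumed relationship columns; open the filter in both directions
    CROSSFILTER(FactSales[MaterialId]; DimMaterial[MaterialId]; Both)
)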




For a full explanation of this solution, see the blogpost of Exceleratorbi.com.

Conclusion

This blogpost is about the calculation of measures that do not follow a standard pattern. The method described handles the calculation of the total weight of Materials (products) sold and their BOM parts (components). We also explored the problem of bidirectional relations.

Greetz,

Hennie

Friday 17 February 2017

SSAS : Where did my measures go?

Introduction

Currently I'm developing a Tabular Model in SSDT and somehow I managed to make my measures disappear. What? Yep! This has happened to me a couple of times now. This blogpost is about the problem and how to solve it.

Situation described

Now, it occurred when I was entering some measures in SSDT.


Even though I didn't see any errors in the formulas, a blue mark is shown on the tab. A bit annoying, because you can't see why it's giving the error: no measure with an error is shown.




Solution option 1

On the Microsoft forums someone suggested deleting the layout file in order to get the erroneous measures back. So I did...



When I do that, the erroneous measures are shown again, but all the measures are now shown on two rows and the measure grid is completely reformatted.


Not what I want! I have to manually copy and paste the formulas back into a layout that I appreciate. This is a bit labor intensive when you have a lot of measures.

Solution option 2

Now, after investigating the files more closely, I opened the .bim file. Always make a copy of the file first!


And I removed the following snippet (although the warning says that you shouldn't change it manually ;-)):

       
   <Command>
                  <Text>----------------------------------------------------------
-- PowerPivot measures command (do not modify manually) --
----------------------------------------------------------
CREATE MEASURE 'FactSales'[Copy of PYSalesNetRevenue]=CALCULATE([TotalSalesNetRevenue], FILTER(SAMEPERIODLASTYEAR(DimCalendar[Date]), DimCalendar[Date] &lt;= LASTNONBLANK(DimCalendar[Date], [TotalSalesNetRevenue])));
</Text>
                  <Annotations>
                    <Annotation>
                      <Name>FullName</Name>
                      <Value>Copy of PYSalesNetRevenue</Value>
                    </Annotation>
                    <Annotation>
                      <Name>Table</Name>
                      <Value>FactSales</Value>
                    </Annotation>
                  </Annotations>
                </Command>

That solved my problem. The layout of my measures stayed the same and the wrong measures were removed from the model.


Conclusion

It's possible to make measures disappear in SSDT, but by editing the .bim file you can remove the offending measures while keeping the layout of the measure grid intact.

Greetz,

Hennie

Sunday 12 February 2017

SSAS : Processing a Tabular Model

Introduction

This blogpost is about processing a tabular model in Analysis Services in SQL Server 2014. At first glance the number of processing options is a bit overwhelming, but once you have a certain understanding of the options you get the idea behind them. Although I've gathered this information for SQL Server 2014, most of it is applicable to other versions as well.

I've read a couple of blogposts about processing a tabular model and most of this blogpost is based on the following blogs:
All the credits go to the people who wrote these blogs.

Stages of loading data

There are two stages of loading data in a tabular model. First, the xVelocity engine takes each column in each table and builds a dictionary from it.
Secondly, data recalculation takes place, meaning that the engine populates any calculated columns, hierarchies, relationships and indexes for the database.

Processing options

As said before, the SSAS Tabular model comes with a couple of processing options that can be used differently depending on the object you want to process.


Process Add

With the processing option Process Add it is possible to implement incremental loading of the cube: you define the rows that are new and only those are loaded. As Marco Russo writes in his blog, when executing the Process Add option the SSAS engine will create a new partition, which is then merged with the existing partition.

An effect of Process Add is that it automatically recalculates the affected objects.

Process Clear

Process Clear drops all the data in a database, table, or partition. Is there more to say? ;-)


Process Data

Process Data loads data into a partition or a table. It reads the data and it builds dictionaries, but doesn't process any calculated columns, indexes, relationships or hierarchies. Therefore it's required to perform a Process Recalc after all tables' dictionaries have been updated using Process Data.


Process Default

Process Default loads data into unprocessed partitions or tables. Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated. It recalculates the affected objects (and only the affected objects).


Process Defrag

Process Defrag optimizes the table dictionary (an internal engine structure) for a given table or for all tables in the database. This operation removes all dictionary entries that no longer exist in the data set and rebuilds the partition data based on the new dictionaries.

If you often add or remove partitions from a table, or if you often reprocess individual partitions in a table, you should Process Defrag regularly but carefully. 

Process Defrag may be an expensive operation for a large table.

If you run Process Data at the table level, it is never necessary to do a Process Defrag.

Run Process Recalc after running Process Defrag.

Process Full

Process Full loads data into all selected partitions or tables. Any affected calculated columns, relationships, user hierarchies, or internal engine structures (except table dictionaries) are recalculated. In effect, Process Full runs a Process Data on the table or partition, followed by a Process Recalc.
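For reference, a Process Full can be executed as an XMLA command against the Tabular instance (for example from an XMLA query window in SSMS). The sketch below assumes a database with the ID MyTabularDB; script the real command from the processing dialog in SSMS to get the actual IDs of your model.

<!-- Sketch of a Process Full for a whole Tabular database.
     The DatabaseID "MyTabularDB" is an assumption. -->
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Type>ProcessFull</Type>
  <Object>
    <DatabaseID>MyTabularDB</DatabaseID>
  </Object>
</Process>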


Process Recalc

For all tables in the database, Process Recalc recalculates calculated columns, rebuilds relationships, rebuilds user hierarchies, and rebuilds other internal engine structures. Table dictionaries are not affected.

Process Recalc is always the last thing you do, and you can batch up as many processing operations as you want before issuing that last Process Recalc.

Keep in mind that you can only issue Process Recalc at the database level. This means that after a Process Data operation on a dimension, you may be stuck recalculating a large fact table when it was not impacted by the initial processing operation.


Processing scenarios

I've read a couple of interesting options for loading the tabular model. 


1) Full load
With Process Full you can perform a full load of the database, a partition or a table. It will drop all of the data and then it will process the object. This can come in handy for an initial load of the cube or in case of a structural change of an object in the cube or the cube itself. In that case you need to redeploy the cube and perform a Process Full of the cube.

2) Incremental processing
Two options:
  • 1) Create a new partition and then use Process Data to populate it. This new partition can then be merged with an existing partition if you like.
  • 2) Use Process Add to add data to the existing partition. If used make sure that you don’t add duplicate rows. 
          Two options:
    • Change the partition definition to the query that returns the set of rows you want returned by the incremental processing operation. 
    • If your partition definition is based on a view, make sure that the view on the original data source now returns just the set of rows you want returned by the incremental processing operation.

3) Semi-incremental processing
Yet another option is to create a partition for a year or a month in a table and process that with a Process Data. This limits the amount of data to load significantly; it's simple but effective.

No matter how you do incremental processing, you must Process Recalc on the database afterwards before moving on with your life. If you do this enough, you’ll want to Process Defrag too.
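A rough sketch of such a batch in XMLA is shown below: a Process Data on a single table, followed by one Process Recalc on the database. The IDs are assumptions (in a 2014 Tabular model the tables show up as dimensions in the XMLA metadata, and the real DimensionID usually contains a GUID), so script the commands from SSMS to get the correct values for your model.

<!-- Sketch: incremental load of one table, then a single Process Recalc.
     DatabaseID and DimensionID are assumptions. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Process>
    <Type>ProcessData</Type>
    <Object>
      <DatabaseID>MyTabularDB</DatabaseID>
      <DimensionID>FactSales</DimensionID>
    </Object>
  </Process>
  <Process>
    <Type>ProcessRecalc</Type>
    <Object>
      <DatabaseID>MyTabularDB</DatabaseID>
    </Object>
  </Process>
</Batch>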


Greetz,

Hennie


Friday 27 January 2017

Slowly Changing Dimensions (SCD) explained

Introduction

Handling changes in data (the Slowly Changing Dimension principle) is a key concept of a data warehouse. If you track the changes in a data warehouse, it is possible to go back in time and see what happened with the data. That is an advantage. The disadvantage is that complexity is added to the ETL process and to the reports.

In this blogpost I'll explain the fundamentals of Slowly Changing Dimensions with a small example in PowerPivot and Excel.

Data model

Below, the data model is shown with two dimensions and a fact. The dimensions are a product dimension, in which products are categorized, and a date dimension (granularity month). The fact table is a simple fact table that has keys to the dimensions DimCalendar and DimProduct. There is one measure, NetRevenue.



Example data

For this article I have created a small example in Excel where a product with ID 4, the hairdryer, draws some attention. This product is in the category Transportation. This may be accidental or it may have some other reason, but the most logical explanation is that the product is in the wrong category.


If we plot this in a graph with the categories Personal Care and Transportation together, it looks like this.
  



If we look at the category Personal Care only, it looks as follows.


Slicing the data with the category "Transportation" will show this graph.


Slowly Changing Dimension 1 (TYPE I)

Meanwhile, we found out (in this scenario) that the categories are not right. We see that the product Hairdryer falls into the wrong category, so we change its category from Transportation to Personal Care. This looks like this:




If we change the category (the hairdryer becomes Personal Care), this has an effect on the charts: they change not just from the moment of the change, but also retroactively. Even though we correct the data now, everything in the past changes as well. This may be desirable (most likely in this particular example), but it may also have undesirable effects in other situations.

For example, a change can also take place at the moment that a company changes its range of products. What happens then? New products and categories are created, products may have moved to another category, and products are made inactive or deleted in the source systems. In short, it depends on what the source of the change is. Is it an error, or is a strategic choice the reason to change the products and categories? It is difficult to guess the true reason for a change in the data.

Let's see what happens if a product changes category. Product 4, the hairdryer, has changed from the category Transportation to Personal Care. The following occurs: it seems that the change generated additional revenue for the category Personal Care in that year. Good news for the sales manager who is responsible for the category Personal Care.




However, there is also a change in the chart for the category Transportation. Unfortunately, that sales manager will have to do with a smaller bonus this year. The turnover has moved from the category Transportation to Personal Care.
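In the data warehouse itself a type 1 change is nothing more than an overwrite of the attribute. Below is a minimal T-SQL sketch; the table and column names are assumptions based on the example data.

-- SCD type 1: overwrite the category, no history is kept.
-- Table and column names are assumed for this example.
UPDATE dbo.DimProduct
SET    Category = 'Personal Care'
WHERE  ProductName = 'Hairdryer';  -- ProductID 4 in the example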



Slowly Changing Dimension 2 (TYPE II)

Maybe what I have described in the previous paragraph (a logical correction of the categories) is desirable, but it can also have undesirable effects, for instance when another series of products is introduced. Then you want to keep the figures as they were and calculate the new product series separately. How can we achieve this? Suppose we make the product dimension SCD2. What happens then? If there is a change in the source system, the old record is preserved and a new record is created. See below on the right side: a new record is created for the hairdryer, with ProductID 8. Suppose that a new sales record is loaded into the data warehouse (see this event on the left): at 201801 there will be a sales record with a revenue of 500.
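In T-SQL terms the type 2 change of this example could look like the sketch below: the old row (ProductID 4, Transportation) stays untouched, a new row with ProductID 8 gets the corrected category, and new fact rows point to the new ProductID. The table and column names are assumptions based on the screenshots.

-- SCD type 2 sketch: keep the old product row and add a new one.
-- Names are assumptions; many implementations also add ValidFrom/ValidTo columns.
INSERT INTO dbo.DimProduct (ProductID, ProductName, Category)
VALUES (8, 'Hairdryer', 'Personal Care');

-- New sales are registered against the new product row (period 201801, revenue 500).
INSERT INTO dbo.FactSales (CalendarID, ProductID, NetRevenue)
VALUES (201801, 8, 500);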




And here you can see that nothing has changed in 2017 for either category. Below is the composite chart with NetRevenue for both Personal Care and Transportation. In 2017 absolutely nothing has changed. The total of the two categories remained the same, but that was also the case with SCD1.




Let's look at the NetRevenue of the underlying products in each category. We also see no change for 2017. 


And the Personal Care chart looks as follows for 2017.



Slowly Changing Dimension 3 (TYPE III)

There is another variant that can be very useful, namely SCD3. This variant can be used when you want to compare an old and a new situation. Let us consider a change in the product hierarchy, using the example above with products and categories. The product hierarchy can change at a given time (rightly or wrongly). Suppose we now want to see the new data with the old product hierarchy, in other words as if there were no new product hierarchy: what would the numbers have looked like? And we also want to know what the figures in the past would have looked like with the new product hierarchy, as if the old product hierarchy had never existed. Two varieties:
  • With the old product hierarchy you want to see both the old and the new data.
  • With the new product hierarchy you want to see both the old and the new data.

In the data model you can fix this. I've added an extra column: the old one is renamed to CategoryBefore and the new product hierarchy we call CategoryAfter.
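A minimal T-SQL sketch of this type 3 change is shown below; the column names follow the text (CategoryBefore and CategoryAfter), while the table name and statements are assumptions for illustration.

-- SCD type 3 sketch: one extra column keeps the previous hierarchy.
EXEC sp_rename 'dbo.DimProduct.Category', 'CategoryBefore', 'COLUMN';
ALTER TABLE dbo.DimProduct ADD CategoryAfter nvarchar(50) NULL;
GO

-- Initialize the new column with the old hierarchy, then apply the change
-- only to CategoryAfter, so both views on the data stay available.
UPDATE dbo.DimProduct SET CategoryAfter = CategoryBefore;
UPDATE dbo.DimProduct SET CategoryAfter = 'Personal Care' WHERE ProductName = 'Hairdryer';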


Now we can make a distinction between 'before the change' and 'after the change'. In the example below there is a new slicer, namely CategoryAfter. With this we can switch between the new and the old situation and compare them to see what has changed.



Conclusion

In this article we discussed the impact of a change in dimension data. In addition, various options were put forward to deal with such changes in order to achieve a desired result. The suggestions made are not applicable to every situation: "it depends." It is necessary to assess each situation on its own.

Greetz,

Hennie

Tuesday 8 November 2016

SQL : ColumnStore (part III) : enforce uniqueness (SQL Server 2014)

Introduction

ColumnStore tables in SQL Server 2014 cannot enforce primary keys and you can't create a unique nonclustered index on them, in contrast with SQL Server 2016, where you are able to create a unique nonclustered index on a ColumnStore table. That's a pity, because I have to work with SQL Server 2014. Now, how can you enforce a semi primary key on a ColumnStore table? Well, there seems to be a way to do that with an indexed view together with a unique clustered index. In this blogpost I'll show you the way. For this blogpost I've used a blogpost on MSDN where you can find more information about this.

Enforce uniqueness with Indexed view

For this blogpost I've created a simple table.
       
/*
DROP TABLE testIndexedVIEW
GO
*/
CREATE TABLE testIndexedVIEW
(TestID int)
GO     
 

The next step is turning it into a ColumnStore table with the following statement:
       
CREATE CLUSTERED COLUMNSTORE INDEX CCI_testIndexedVIEW ON testIndexedVIEW;  
GO        
 

The next step is building the indexed view on this table (WITH SCHEMABINDING):
       
/* 
DROP VIEW dbo.testIndexedViewUniqueness
GO
*/

CREATE VIEW dbo.testIndexedViewUniqueness
WITH SCHEMABINDING
AS 
SELECT TestID
FROM dbo.testIndexedVIEW
GO      
 

And now create a Unique Clustered Index on this Indexed view:
       

CREATE UNIQUE CLUSTERED INDEX UNCI_testIndexedViewUniqueness ON testIndexedViewUniqueness(TestID)
GO
       
 

That's it. The ColumnStore index, the indexed view and the unique clustered index are in place, and now it's time to test this by inserting the same value twice into the column with the unique clustered index.
       
INSERT INTO [dbo].testIndexedVIEW
           (TestID)
     VALUES
           (1)

INSERT INTO [dbo].testIndexedVIEW
           (TestID)
     VALUES
           (1)     
 

Resulting in an error message.

Level 14, State 1, Line 21
Cannot insert duplicate key row in object 'dbo.testIndexedVIEW' with unique index 'UNCI_testIndexedViewUniqueness'. The duplicate key value is (1).
The statement has been terminated.

Conclusion

It's not possible to create a unique index on a ColumnStore table in SQL Server 2014. With an indexed view, together with a unique clustered index on that view, you can enforce the uniqueness of the key fields.

Greetz,

Hennie