Monday, December 8, 2014

Let's Disco!

Introduction

Disco is a process mining tool that lets you discover processes by analyzing event logs. It is well suited for analyzing the transactions that happen in a process; a sales process or an order process are typical examples. You can examine bottlenecks in your process, loops, times, durations, averages, fast/slow lanes, conformance issues, resource performance, etc.

So, process mining with a tool like ProM or Disco (I've even seen a demo of SAS Visual Analytics with a Sankey diagram) is very well suited for process model analysis, and different tools keep enhancing it. Process mining can be a great addition to Business Intelligence: where Business Intelligence is more like fishing with a rod in an organisation for KPIs and business metrics, process mining sits much more 'on' the process itself. Business Intelligence and process mining can work together to optimize processes.

A simple tour through Disco

In this blog post, I've included some screenshots of Disco. I've used a small file that I've borrowed from the Coursera course "Process Mining: Data Science in Action", lectured by Wil van der Aalst of TU/e. Below is the opening screen of Disco.


The first thing you have to do is load the data, in this case a CSV file.


Then you have to mark the columns that hold the case ID, the events (activities), the timestamps and the resources. This mapping is needed for the analysis of the process.
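To make that mapping concrete, here is a minimal sketch in plain Python (no Disco involved; the mini event log and its values are made up) of how such columns turn into one activity trace per case, which is the basic structure every process mining tool works from:

```python
from collections import defaultdict

# Hypothetical mini event log; each row mirrors the columns Disco asks you
# to map: (case ID, activity/event, timestamp, resource). Values are made up.
events = [
    ("1", "register request", "2014-12-01 09:00", "Pete"),
    ("1", "examine casually", "2014-12-01 10:10", "Mike"),
    ("1", "decide",           "2014-12-01 11:30", "Sara"),
    ("2", "register request", "2014-12-01 09:15", "Pete"),
    ("2", "decide",           "2014-12-01 12:00", "Sara"),
]

def traces(events):
    """Group events by case ID and order them by timestamp into activity sequences."""
    by_case = defaultdict(list)
    for case_id, activity, timestamp, resource in events:
        by_case[case_id].append((timestamp, activity))
    return {case: [act for _, act in sorted(rows)] for case, rows in by_case.items()}

print(traces(events))
# case "1" becomes the trace: register request -> examine casually -> decide
```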




The next step is importing the data and generating the process map. Below is an example showing the duration of the process steps.


Here is an example of the mean of the durations.
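Duration figures like these boil down to simple timestamp arithmetic. A minimal sketch, with made-up cases and timestamps, of per-case throughput time (first event to last event) and the mean over all cases:

```python
from datetime import datetime

# Hypothetical cases: for each case ID, the timestamps of its events.
cases = {
    "1": ["2014-12-01 09:00", "2014-12-01 10:10", "2014-12-01 11:30"],
    "2": ["2014-12-01 09:15", "2014-12-01 12:00"],
}

def duration_minutes(timestamps):
    """Elapsed time between a case's first and last event, in minutes."""
    parsed = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M") for t in timestamps)
    return (parsed[-1] - parsed[0]).total_seconds() / 60

durations = {case: duration_minutes(ts) for case, ts in cases.items()}
mean = sum(durations.values()) / len(durations)
print(durations, mean)  # case 1 runs 150 min, case 2 runs 165 min
```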


And below, an example of replaying the data on the model. The yellow and red dots are markers that flow through the model.



And below, some statistical analysis with Disco:


Some more information about the cases.


Conclusion

Disco is a great tool for analyzing processes. Process mining can be a great addition to Business Intelligence and is very helpful for process analysis: both analyze processes, but on different levels.

Greetz,
Hennie

Thursday, December 4, 2014

Process mining

Introduction

I'm currently following the process mining course on Coursera, which is a combination of data mining and process models. Much like Business Intelligence (sort of), process mining analyzes data about the processes in an organisation. I'm quite enthusiastic about this approach because it analyzes processes in a scientific manner with data mining. Data mining analyzes data, but not with the (direct) aim of looking at an organisation's processes; process mining does. There is also a relation with BPM (Business Process Modelling): in the course, BPM models are sometimes used alongside Petri nets.

Although I've just started the course, I want to share some interesting things I've come across so far. I'll describe them in this blog post.

Defining process mining

The definition of process mining according to Wikipedia:

"Process mining is a process management technique that allows for the analysis of business processes based on event logs.".

But in my opinion, 'event logs' is a bit of a narrow keyword; a broader definition could be applicable. Think about facts in a star schema, or satellite information in a Data Vault model, which are used very often in business intelligence and data warehousing. These are transactions that happen in the operations of an organisation, and they are also events: events that happened. Think about an order entry system with statuses. Sometimes customers told me how the order entry process worked, but when I studied the different statuses an order could have, I sometimes found out that other sequences through the order process were possible. With process mining you can identify undiscovered routes through your business process in an automated way. This is truly an addition to the fields of Business Intelligence, Lean Six Sigma, data mining and BPM.

Just a simple example: suppose the customer tells you that the model is this (orders with order statuses):



But when we study the transactions of the order entry system, the following is noticed (records are identified by a case ID, which groups the records, and the activity at a certain moment):

Here we see that order 4568 is reopened, which was not supposed to happen according to the designed model. After analyzing the events in a log (or perhaps a transactionally modelled star schema), the corrected model appears like this:


It could mean that in the operational process, order entry personnel reopened the order for some reason. If you want to optimize the process in order to reduce waste (Lean Six Sigma), then this is very interesting information. Process mining can do this for you.
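A deviation like the reopened order can be found mechanically by comparing the transitions observed in the log against the designed model. A minimal sketch, with made-up statuses and an invented allowed-transition set standing in for the designed model:

```python
# Hypothetical designed model as a set of allowed status transitions.
allowed = {
    ("created", "approved"),
    ("approved", "shipped"),
    ("shipped", "closed"),
}

# Made-up trace for an order that was reopened after being closed.
trace_4568 = ["created", "approved", "shipped", "closed",
              "reopened", "approved", "shipped", "closed"]

def deviations(trace, allowed):
    """Return consecutive transitions in the trace that the model does not permit."""
    return [(a, b) for a, b in zip(trace, trace[1:]) if (a, b) not in allowed]

print(deviations(trace_4568, allowed))
# the undocumented reopen path shows up as the two unexpected transitions
# ('closed', 'reopened') and ('reopened', 'approved')
```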

Conclusion

Although I've only just started studying process mining, it seems a very interesting approach for analyzing processes with data mining. It is also applicable to huge log files, and analyzing log files is one of the applications of Big Data analytics.

I hope you have read this blog post with pleasure.

Greetz,

Hennie

Monday, December 1, 2014

SSAS SSRS : Bad Performance SSRS parameters and SSAS

Introduction

SSRS report parameters and SSAS cubes... There is a lot to read on the internet about the bad performance of parameters that Reporting Services generates based on an SSAS cube. Today I ran into this problem too: I have a parameter with a dataset that queries a customer table of about 40,000 records. Peanuts for a simple SQL query, but not for an SSAS cube (pfffff). And I'm not the only one who has encountered this problem; you can read hundreds of blog posts, forum posts and more. This is one more blog post about this bad cooperation between SSRS and SSAS.

The lab situation is this: we have a main report with a customer parameter, and a subreport. I want to pass a customer list via the parameter to the subreport in order to show detail records. In my real-world example, a map is shown, and clicking on an area of the map shows the detail records that belong to that region.

Analyzing the parameter performance problem

The initial problem of a parameter with 40,000 records is simple: it doesn't perform! I will show you what kind of issues arise when you create a parameter with 40,000 records. First, create a dataset with a parameter "Customer" by checking the parameter option in the query window.




A parameter dataset is created (you can't see it because it's hidden; choose "Show hidden datasets"). If you run the report, you'll notice that the performance is bad... very bad. When you open Profiler to take a look at what is happening during cube querying, you see a lot of serialize events; they happen often (per 1000 rows) and they are sloooowwww. The Serialize Results Current event is a very slow step: processing the query is very fast, but serializing the result for the report is slow.



If you take a look at this diagram, you can see that serialization is done in the Compute Cell Data part, after which the result is transferred to the client application.


Now, I haven't found a silver bullet, unfortunately, but Chris Webb helped me a bit further. In his blog post, Chris Webb explains this problem in more depth and gives a solution (well, sort of; I'll explain later). Suppose you add a parameter to a dataset according to the following screenshot.



Then a parameter is automatically created with a dataset:



And, the query in this dataset looks like this:

WITH
  MEMBER [Measures].[ParameterCaption] AS [Customer].[Customer].CURRENTMEMBER.MEMBER_CAPTION
  MEMBER [Measures].[ParameterValue] AS [Customer].[Customer].CURRENTMEMBER.UNIQUENAME
  MEMBER [Measures].[ParameterLevel] AS [Customer].[Customer].CURRENTMEMBER.LEVEL.ORDINAL
SELECT
  {[Measures].[ParameterCaption],
   [Measures].[ParameterValue],
   [Measures].[ParameterLevel]} ON COLUMNS,
  [Customer].[Customer].ALLMEMBERS ON ROWS
FROM [Adventure Works]

According to Chris Webb, you can replace this query with an OLE DB query. This query performs much faster than the one generated by SSRS.

WITH
 MEMBER MEASURES.DUMMY AS NULL
SELECT
{MEASURES.DUMMY}
ON COLUMNS
, [Customer].[Customer].ALLMEMBERS
DIMENSION PROPERTIES UNIQUE_NAME, MEMBER_CAPTION, LEVEL_NUMBER
ON ROWS
FROM [Adventure Works]
CELL PROPERTIES VALUE

This speeds up loading the data into the parameter.

Invalid URI : The URI String is too long 

The next problem I had was an error when I called the subreport with the same parameter. It seems there is a limitation of 65,520 characters that you can pass in a parameter. This is the error:


A limitation on the number of characters that can be passed to a subreport; Teo Lachev explains why. There are a couple of options, and the best one is to limit the number of characters of the parameter that are passed to the subreport. This is not perfect: as the amount of data grows somewhere in the future, you'll run into the problem again, and then you won't be able to solve it this way anymore.

Okay, how do we limit the number of characters that are passed to the subreport? Well, Erika Bakse showed in her blog post a way to limit the number of members. In this example I've created a main report and a subreport; the main report has a Customer parameter that lists customers.

For this blog post I've created a simple report with a tablix. On the Textbox Properties I've created an Action with an expression.



This is the expression that is passed to the subreport:

=Join(Parameters!CustomerCustomerName.Label, "],")





This is the output of the expression:


This expression leaves out the full member names and is therefore compacted. In the parameter of the subreport it arrives as one large string:


With the following expression you can turn this separator-delimited string back into full member names:

="[Customer].[Customer Name].&[" + REPLACE(Parameters!Customer.Label(0) + "]", "],", "],[Customer].[Customer Name].&[")

Shown below:

The output of the expression is this:


And now we can pass this to a dataset and filter the records with this list.
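The round trip of these two SSRS expressions (Join to compact the labels, REPLACE to rebuild the unique names) can be mimicked in plain Python to see exactly what the subreport receives. The customer labels below are just example values, and the hierarchy name is the one from the expressions above:

```python
# Example labels standing in for Parameters!CustomerCustomerName.Label.
labels = ["Abbas, Syed", "Abel, Catherine", "Abercrombie, Kim"]

# Main report side: Join(Parameters!...Label, "],") compacts the selection.
compact = "],".join(labels)

# Subreport side: the REPLACE expression rebuilds the full MDX unique names,
# prefixing the first label and re-expanding every "]," separator.
full = ("[Customer].[Customer Name].&[" +
        (compact + "]").replace("],", "],[Customer].[Customer Name].&["))

print(full)
```

Note that the "]," separator is what makes this safe even though the labels themselves contain commas: a bare comma never triggers the re-expansion.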

Conclusion

This URI problem happens quite often. I encountered it in numerous environments with SQL Server 2008, SQL Server 2008 R2 and SQL Server 2012, in different situations. I've also seen different behaviour between SSRS, Report Manager and Report Builder.

Greetz

Hennie