maandag 8 december 2014

Let's Disco!


Disco is a process mining tool that let's you discover processes by analyzing event logs. It's suitable for analyzing transactions that happen in a process. For instance, a sales process or a order process are examples that can be analyzed. You can examine bottlenecks in your process, loops, time, durations, averages, fast/slow lanes, conformance issues, resource performance, etc.

So, processmining with a tool like ProM, Disco (and I've even seen a demo of SAS Visual Analytics with a Sankey Diagram) are very well suited for process model analysis. Different tooling are enhancing process analysis. Process analysis can be a great addition to Business Intelligence. Where Business intelligence is more like phishing with a fishing rod in an organisation for KPI's and Business Metrics, Process Mining is much more 'on' the process. Business Intelligence and Process Mining can work together to optimize processes.

A simple tour through Disco

In this blogpost, I've included some screenshots about Disco. I've used a small file that I've borrowed from the Coursera Course "Process mining:Data Science in Action" lectured by TUE, Wil van der Aalst. Below the opening screen of Disco.

The first thing that you have to is load the data. In this case a csv file.

Then you have to set the CaseID, Events, Timestamps and the resources. This is needed for analysis of process.

The next step is importing the data and generating the processmap. Below an example of the duration of the processsteps.

Here an example of the mean of durations

And below, an example when you play the data on the model. The yellow/red dots are markers that flow through the model

 And below some statistical analysis with Disco:

Some more information about the cases.


Disco is a great tool for analysis of processes. Process mining can be a great addition to Business Intelligence and very helpful for analysis of processes. Both analyses processes but on different levels.


donderdag 4 december 2014

Process mining


Currently following the course processmining on Coursera and this course is combination of data mining and processmodels. Like the same as with Business intelligence (sort of), processmining analyzes data about processes in an organisation. I'm quite enthusiastic about this approach because it analyzes processes on a scientific manner with data mining. Datamining analyzes data but not with the (direct) aim of looking at processes in an organisation. Processmining does. There is also a relation between BPM (Business Process modelling). In the course sometimes BPM models are used aside petrinets.

Although I've just started with the course, I want to share some interesting things, I've come along during the course. In this blog post I'll describe this.

Defining process mining

The definition of Processmining according to Wikipedia :

"Process mining is a process management technique that allows for the analysis of business processes based on event logs.".

But in my opinion event logs is a bit of a narrow keyword. A more broader definition could be applicable. Think about facts in a star schema or satellite information in a Datavault model that are very often used in business intelligence and data warehousing. These are transactions that happens in the operations of an organisation These are also events. Events that happened. Think about an order entry system with statuses. Sometimes, customers told me how the order entry process worked and when I studied the different statuses an order should have, I sometimes found out that different sequences of the order process were possible. With process mining you can identify undiscovered routes of your business process in automated way. This is truly an addition in the  field of Business intelligence, Lean Six Sigma,  datamining and BPM.

Just a simple example, suppose from the customer you hear that the model is this (orders with order statusses):

But when we study the transactions of the order entry system the following is noticed (records are identified by a case ID (the grouping of the records) and the activity at a certain moment):

Here we see that order 4568 is reopened and this should not have supposed to happen according to the designed model. After analyzing the events in a log or perhaps a transactional modelled star schema the model appears like this (corrected):

It could mean that in the operational process order entry personnel has reopened the order for some reason. If you want optimize the process in order to reduce the wastes (Lean Six sigma) than this is very interesting information. Process mining can do this for you.


Although I've just started with studying process mining, this seems a very interesting approach for analyzing processes with datamining. And, this is also applicable on huge log files and analyzing log files is one of the applications of Big data analytics. 

Hope you have read this blogpost with pleasure..



maandag 1 december 2014

SSAS SSRS : Bad Performance SSRS parameters and SSAS


SSRS Reporting parameters and SSAS cubes......There is a lot to read about on the internet about the bad performance of  parameters that are generated by Reporting Services based upon a SSAS Cube. Today, I've ran into this problem too. I've a parameter with a dataset that queries a customer table of about 40000 records. Peanuts for a simple SQL query, but not for a SSAS cube (pfffff). And, I'm not the only one who has encountered this problem. You can read hundreds of blogposts, forumposts and more. This is one blogpost more about this bad cooperation between SSRS and SSAS.

The lab situation is this: we have a main report with a customer parameter and a sub report. I want to pass a Customer list via the parameter to the subreport in order to show detailrecords. In my real word example a map is shown and clicking on a area on the map shows the detail records that belong to a region.

Analyzing the parameter performance problem

The initial problem of a parameter with 40000 records is simple. It doesn't perform! I will show you what kind of issues rise when you create a parameter with 40000 records.  First create a dataset with a parameter "Customer" by checking the parameter option in the query window.

A parameter dataset is created (you can't see that because it's hidden). Choose for "Show hidden datasets". If you run the report, you'll notice that the performance is bad..very bad. When you open profiler to take a look at what is happening during cube querying a lot of serialize events happen and these happen often (per 1000) and these are sloooowwww. The serialize results current is a very slow process. So processing the query is very fast but serializing the result for the Report is slow.

If you take a look at this diagram, you can see that the serialize is done from the Compute Cell data part and it's transferred to the client application.

Now, I haven't found a silver bullet unfortunately. But Chris Webb helped me a bit further. In his blogpost, Chris Web explains this problem a bit further and he gives a solution. Well, sort of but I'll explain this later. Suppose you add a parameter to a dataset according to the following screenshot. 

Then a parameter is automatically created with a dataset:

And, the query in this dataset looks like this:

  MEMBER [Measures].[ParameterCaption] AS [Customer].[Customer].CURRENTMEMBER.MEMBER_CAPTION 
  MEMBER [Measures].[ParameterValue] AS [Customer].[Customer].CURRENTMEMBER.UNIQUENAME 
  MEMBER [Measures].[ParameterLevel] AS [Customer].[Customer].CURRENTMEMBER.LEVEL.ORDINAL
  [Measures].[ParameterLevel]} ON COLUMNS , 
 [Customer].[Customer].ALLMEMBERS ON ROWS 
 FROM [Adventure Works]

According to Chris Webb  you can replace this query by an OLEDB query. This query performs much faster than the generated query by SSRS.

, [Customer].[Customer].ALLMEMBERS
FROM [Adventure Works]

This speeds up the loading the data into the parameter.

Invalid URI : The URI String is too long 

The next problem I had was that I got an error when I call a subreport with the same parameter. It seems that there is limitation of 65520 characters that you can pass with a parameter. This is the error:

A limitation of the number of characters that can be passed through a subreport. Theo Lachevs explains why. There are a couple of options. The best option is to limit the number of characters of the parameter that are passed to the subreport. This is not perfect because of growing the amount of the data somewhere in the future you'll run into problems again and you can't solve it anymore.

Okay, how do we limit the number of characters that are passed to the subreport? Well, Erika Bakse showed in het blogpost a way to limit the number of members. In this example I've created a main report and a subreport. The main report has a Customer parameter that lists customers

For this blogpost I've created a simple report with a Tablix. On the Textbox Properties I've created an Action with an expression.

This is the expression that is passed to the subreport:

=Join(Parameters!CustomerCustomerName.Label, "],")

This is the output of the expression:

And this expression is without the full membernames and therefore compacted. In the parameter of the subreport this is one large string:

With the following expression you can change these CSV separated string into full membernames:

="[Customer].[Customer Name].&[" + REPLACE(Parameters!Customer.Label(0) + "]", "],", "],[Customer].[Customer Name].&[")

Shown below:

The output of the expression is this:

And now we can pass this to a dataset and filter the records with this list.


This URI problem happens quite often. I encountered this problem on numerous environments with SQL Server 2008, SQL Server 2008 R2 and SQL Server 2012 in different situations. I've also encountered different behaviour between SSRS, Report manager and Reportbuilder.



zondag 16 november 2014

BI and the Anscombe quartet

The anscombe quartet is showing us something that is also related to Business intelligence. In Business Intelligence it's familiar to show KPI's or business metrics. A KPI as one number hides some times important information like the anscombe quartet is showing us.

And, they all have the same mean, variance, deviation, correlation and lineair regression!

What is this saying us? The statistician Francis Anscombe suggested this quartet to demonstrate the importance of graphical data analysis and the effect of outliers before they decide on a statistical analysis on the basis of their characteristics. The data sets show that the simple statistical characteristics are not always sufficient in order to describe the data.


Be very cautious showing numbers in General Business Metrics and KPI's in your report and your dashboards. Tell the whole story behind a KPI or Business metric and do not only show a number.

zaterdag 15 november 2014

SSRS : Multi Language Reporting


Years ago, I've worked in a team building a multiluanguage portal for an international company. We've built all kinds of functions for translating reports. Now, I've to built another multilingual portal for an international company again and google led me to the following blog on Codeproject. This is a neat solution for building multilingual reports. It stores labels, languages and translations. In this example a language parameter is used for parameterized translations. For my current project it seems that this is a cool solution.


In the example below I've created three tables: Label, Language and Translations. The Label contains the ID of the field that needs to be translated. I've modded the language table a bit so that the language field is not a descriptive field but contains the format string equal that of the regional settings on your windows system (eg. en-US). Now you can translate your report automatically with the regional settings of your windows system. I renamed the LabelText table to Transalation because that's what it actually does, in my opinion. This is the DDL:

IF  EXISTS (SELECT * FROM sys.objects 
   WHERE object_id = OBJECT_ID(N'[dbo].[Label]') AND type in (N'U'))
DROP TABLE [dbo].[Label]

IF  EXISTS (SELECT * FROM sys.objects 
   WHERE object_id = OBJECT_ID(N'[dbo].[LabelText]') AND type in (N'U'))
DROP TABLE [dbo].[LabelText]

IF  EXISTS (SELECT * FROM sys.objects 
   WHERE object_id = OBJECT_ID(N'[dbo].[Language]') AND type in (N'U'))
DROP TABLE [dbo].[Language]

CREATE TABLE [dbo].[Label](
 [LabelID] [int] IDENTITY(1,1) NOT NULL,
 [Description] [varchar](900) NOT NULL

CREATE TABLE [dbo].[Translation](
 [LanguageID] [int] NOT NULL,
 [LabelID] [int] NOT NULL,
 [LabelTranslation] [nvarchar](max) NOT NULL

CREATE TABLE [dbo].[Language](
 [LanguageID] [int] IDENTITY(1,1) NOT NULL,
 [LanguageCode] [varchar](100) NOT NULL,
 [Active] [bit] NOT NULL

Here is some content of the table:

INSERT INTO [dbo].[Label]  

INSERT INTO [dbo].[Language]  
VALUES ('en-US', 'English (United States)', 1), 
  ('nl-NL', 'Dutch (Netherlands)', 1),
  ('fr-FR','French (France)',0),
  ('de-DE','German (Germany)',0),
  ('it-IT','Italian (Italy)',0),
  ('ja-JP','Japanese (Japan)',0),
  ('es-ES','Spanish (Spain)',0),
  ('ar-SA','Arabic (Saudi Arabia)',0),
  ('zh-CN','Chinese (PRC)',0),
  ('zh-HK','Chinese (Hong Kong S.A.R.)',0),
  ('zh-TW','Chinese (Taiwan)',0),
  ('cs-CZ','Czech (Czech Republic)',0),
  ('da-DK','Danish (Denmark)',0),
  ('fi-FI','Finnish (Finland)',0),
  ('el-GR','Greek (Greece)',0),
  ('he-IL','Hebrew (Israel)',0),
  ('hu-HU','Hungarian (Hungary)',0),
  ('ko-KR','Korean (Korea)',0),
  ('nb-NO','Norwegian, Bokmål (Norway)',0),
  ('pl-PL','Polish (Poland)',0),
  ('pt-BR','Portuguese (Brazil)',0),
  ('pt-PT','Portuguese (Portugal)',0),
  ('ru-RU','Russian (Russia)',0),
  ('sv-SE','Swedish (Sweden)',0),
  ('tr-TR','Turkish (Turkey)',0),
  ('bg-BG','Bulgarian (Bulgaria)',0),
  ('hr-HR','Croatian (Croatia)',0),
  ('et-EE','Estonian (Estonia)',0),
  ('lv-LV','Latvian (Latvia)',0),
  ('lt-LT','Lithuanian (Lithuania)',0),
  ('ro-RO','Romanian (Romania)',0),
  ('sr-Latn-CS',' Serbian (Latin, Serbia)',0),
  ('sk-SK','Slovak (Slovakia)',0),
  ('sl-SI','Slovenian (Slovenia)',0),
  ('th-TH','Thai (Thailand)',0),
  ('uk-UA','Ukrainian (Ukraine)',0),
  ('af-ZA','Afrikaans (South Africa)',0),
  ('sq-AL','Albanian (Albania)',0),
  ('am-ET','Amharic (Ethiopia)',0),
  ('hy-AM','Armenian (Armenia)',0),
  ('as-IN','Assamese (India)',0),
  ('az-Latn-Z','Azeri (Latin, Azerbaijan)',0),
  ('eu-ES','Basque (Basque)',0),
  ('be-BY','Belarusian (Belarus)',0),
  ('bn-BD','Bengali (Bangladesh)',0),
  ('bn-IN','Bengali (India)',0),
  ('bs-Cyrl-BA','Bosnian (Cyrillic, Bosnia and Herzegovina)',0),
  ('bs-Latn-BA','Bosnian (Latin, Bosnia and Herzegovina)',0),
  ('ca-ES','Catalan (Catalan)',0),
  ('fil-PH','Filipino (Philippines)',0),
  ('gl-ES','Galician (Galician)',0),
  ('ka-GE','Georgian (Georgia)',0),
  ('gu-IN','Gujarati (India)',0),
  ('ha-Latn-NG','Hausa (Latin, Nigeria)',0),
  ('hi-IN','Hindi (India)',0),
  ('is-IS','Icelandic (Iceland)',0),
  ('ig-NG','Igbo (Nigeria)',0),
  ('id-ID','Indonesian (Indonesia)',0),
  ('iu-Latn-CA','Inuktitut (Latin, Canada)',0),
  ('ga-IE','Irish (Ireland)',0),
  ('xh-ZA','isiXhosa (South Africa)',0),
  ('zu-ZA','isiZulu (South Africa)',0),
  ('kn-IN','Kannada (India)',0),
  ('kk-KZ','Kazakh (Kazakhstan)',0),
  ('km-KH','Khmer (Cambodia)',0),
  ('rw-RW','Kinyarwanda (Rwanda)',0),
  ('sw-KE','Kiswahili (Kenya)',0),
  ('kok-IN','Konkani (India)',0),
  ('ky-KG','Kyrgyz (Kyrgyzstan)',0),
  ('lo-LA','Lao (Lao P.D.R.)',0),
  ('lb-LU','Luxembourgish (Luxembourg)',0),
  ('mk-MK','Macedonian (Former Yugoslav Republic of Macedonia)',0),
  ('ms-BN','Malay (Brunei Darussalam)',0),
  ('ms-MY','Malay (Malaysia)',0),
  ('ml-IN','Malayalam (India)',0),
  ('mt-MT','Maltese (Malta)',0),
  ('mi-NZ','Maori (New Zealand)',0),
  ('mr-IN','Marathi (India)',0),
  ('ne-NP','Nepali (Nepal)',0),
  ('nn-NO','Norwegian, Nynorsk (Norway)',0),
  ('or-IN','Oriya (India)',0),
  ('ps-AF','Pashto (Afghanistan)',0),
  ('pa-IN','Punjabi (India)',0),
  ('quz-PE','Quechua (Peru)',0),
  ('sr-Cyrl-CS','Serbian (Cyrillic, Serbia)',0),
  ('nso-ZA','Sesotho sa Leboa (South Africa)',0),
  ('tn-ZA','Setswana (South Africa)',0),
  ('si-LK','Sinhala (Sri Lanka)',0),
  ('ta-IN','Tamil (India)',0),
  ('tt-RU','Tatar (Russia)',0),
  ('te-IN','Telugu (India)',0),
  ('ur-PK','Urdu (Islamic Republic of Pakistan)',0),
  ('uz-Latn-UZ','Uzbek (Latin, Uzbekistan)',0),
  ('vi-VN','Vietnamese (Vietnam)',0),
  ('cy-GB','Welsh (United Kingdom)',0),
  ('wo-SN','Wolof (Senegal)',0),
  ('yo-NG','Yoruba (Nigeria)',0)

DECLARE @LanguageID   AS int = (SELECT LanguageID FROM [Language] WHERE   LanguageCode = 'en-US')     
DECLARE @LabelID   AS int = (SELECT LabelID FROM Label WHERE   Name = 'ReportName1')  

INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'ReportName!')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'PeriodFrom')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'PeriodFrom')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'PeriodUntil')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'PeriodUntil')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Currencies')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Currencies')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Total')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Total')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Time')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Time')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Source')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Source')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Location')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Location')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Page')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Page')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'TotalPages')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'TotalPages')


SET @LanguageID = (SELECT LanguageID FROM [Language] WHERE   LanguageCode = 'nl-NL')  
SET @LabelID  = (SELECT LabelID FROM Label WHERE   Name = 'ReportName1')  

INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'RaportNaam!')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'PeriodFrom')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Periode Van')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'PeriodUntil')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Periode Tot')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Currencies')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Valuta')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Total')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Totaal')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Time')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Tijd')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Source')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Bron')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Location')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Locatie')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'Page')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Pagina')

SET @LabelID = (SELECT LabelID FROM Label WHERE   Name = 'TotalPages')
INSERT INTO [dbo].[Translation] ([LanguageID],[LabelID],[LabelTranslation]) VALUES (@LanguageID, @LabelID, 'Totaal aantal paginas')

The above scripting can be improved be implementing stored procedures. I'll work on that in the future. The next thing to consider is to build a webapplication or .NET application to maintain the translations.


Next, in the report I created a dataset for the supported languages and made it visible, but when I'm almost finished I'll change this in a hidden parameter. Below I've printed some important screenshots of the solution.

This is the language dataset for filling the parameter:

The available values for the language parameter:

Then I've created the translation parameter:

Modded the properties accordingly to the following screenshot:

Then I added the code from the and tweaked it a bit.

Public Function GetTranslation(Parm as Parameter, LabelID as Integer, _
                Optional AddColon as Boolean=False) as String

Dim t as Integer

For t = 0 to Ubound(Parm.Value)
   If (Parm.Value(t) = LabelID) Then
      if AddColon = true then
         Return Parm.Label(t) & ":"
         Return Parm.Label(t)
      End if
   End if
Next t

Return ""

End Function

And I tested it with an expression in a textbox : Code.GetTranslation(Parameters!Translation, 2)

Depending on the language parameter the right translation is chosen form the translation table.


When you want to make your report truly multilanguage, then you have to go a step further. Then you have to set the language property in the report, for instance. This will format your fixed elements like a date (eg. 11/5/2014 or 5/11/2014).

Another important thing to consider is the data. I've blogged already about multi currency issues earlier (Part I, Part II and Part III). This solution will transform your amount fields in every currency you want. In a new blogpost I'm going to write about multicurrency (Part IV) I'll show how to use the formatstring and the Fields!Amount("FORMAT_STRING") option to format the currencies ccording to the settings of the SSAS cube. This way the currency symbols are inherited from the SSAS cube. Below an example of the report in English

Here an example in Dutch:

One thing to consider and to improve is to define multilingual values for contents of a table. For instance Bicycle, car and Bus for English and Fiets, Auto and Bus in dutch. In AdventureWorks this is solved by adding extra columns to a dimension: Englishname, FrenchName, etc. Perhaps I'll take a look into that....


It's possible to make your report multilingual. Standard, there is no support for a multilingual report in Reporting Services, unfortunately. In C# and .NET you have resource files that stores the translations. With aid of some logic in SSRS we can come quite along way..


maandag 27 oktober 2014

SQL Server : A tutorial for building a replication between 2 machines (Part I)


Replication is a set of technologies for copying and distributing database objects and data from one database to another database. After the intial configuration a synchronization process is running to maintain consistency.The distributor is kept in sync synchroneously with the publisher.

This blogpost is the first blogpost in a series of blogposts about replication on 2 machines. This series of blogpost is about a more complex configuration of replication than the blogpost I've written earlier.

These are the blogposts about this series about replication:
  • SQL Server : a tutorial for building a replication between 2 machines (Part I).
  • SQL Server : a tutorial for building a replication between 2 machines : Share (Part II).
  • SQL Server : a tutorial for building a replication between 2 machines : Distributor (Part III).
  • SQL Server : a tutorial for building a replication between 2 machines : Publisher (Part IV).
  • SQL Server : a tutorial for building a replication between 2 machines : Subscriber (Part V).
  • SQL Server : a tutorial for building a replication between 2 machines : Monitoring (Part VI).

The main parts

In this section I'll go into detail about replication. For replication, there are three main parts important, namely the publisher, the subscriber and distributor:
  • Publisher. The publisher has the data (publication) it can offer to other subscribers. There can be multiple subscribers.
  • Subscriber. The subscribers are the database servers that want to have the data from the publisher when the data is changed.  
  • Distributor. The distributor keeps track of the subscribers and to ensure that the subscribers are notified of changes.

Below another picture I've borrowed from a SQLServerCentral.

Green is read and red is write

Initial data set 

The initial dataset is the dataset that is initially captured on the publisher and stored on the subscriber. This is done by the snapshot agent. It is also possible to carry this out through a backup or SSIS.

Changed dataset 

The log reader agent is running at the distributor. The log reader identifies INSERT, UPDATE, and DELETE statements or other changes marked for replication in the transaction log. The agent will then copy transactions in batches to the distribution database on the distributor. The distribution database is a store-and-forward queue and the final step is storing the changed data set on the subscriber.

The agents

The transactional replication of SQL Server is implemented with a Snapshot Agent, Log Reader Agent, Queue Reader Agent and a Distribution Agent:
  • The Snapshot Agent makes snapshot files based on the schema and data from the publisher of the databaseobjects and data. The Snapshot Agent places a shared lock on all tables ready for publishing while the BCP files are generated. This ensures transactional consistency of the data, but it ensures that other requests that attempt to write simultaneously are blocked. Depending on the size of the table this may be substantial. This is only the first time. During normal operations there are much finer-grained and very short-locks required.
  • The log reader Agent monitors the transaction log of each database which is configured for replication.. The Log Reader agent is responsible for copying the transactional log records in the distribution database. This is done according to the ACID principles. The LogReader connects the Publisher, and looking through the log files for records marked with the replication flag and copies the information in the distribution database. Transactions can be as long as the LogReader records when tranaction can't be removed from the log file due to problems. Growth of the log file can be the result. A possible cause may lie in the distributor that is out. This is viewable in  the "log_reuse_wait_desc column column in the sys.database.
  • The distribution agent copies the initial snapshot files from the snapshot folder and the transactions in the distribution database to the subscriber. The distribution agent is responsible for the transfer of the distribution database to the subscription database. The distribution Agent connects with the distributor and reads the stored changes. Then, the changes will be processed in jacket order.
  • The Queue read agent is used together with transactional replication with updating. The agent runs on the distributor and move changes from the subscriber back to the publisher. 
Transactional replication is not performed by SQL Server database engine but with windows programs that connect to the servers involved in the replication process. The agents are implemented by the SQL Server Jobs.

Important choices 

For replication, there are several possible important choices, I've gathered so far:
  • Type of replication. Possible options: snapshot (eg 1 x daily.), Transactional Replication (almost realtime), merge replication (merge data from different sources, eg, salespeople). 
  • Configuration. It is possible to place the distributor (monitors and supervises the replication process) on a separate machine.
  • Push and pull configuration
    • When chosen for push the distributor gets the data from the Publisher and sends it to the subscriber. 
    • For pull the distributor gets the data from the publisher and the subscribers retrieve the data from the Distributor.


In this blogpost I've desribed the main parts of replication, the agents and the important choices of replication.


maandag 20 oktober 2014

SQL Server Replication : a tutorial for building a simple replication


In this blogpost I'll show you how to set up a basic replication in SQL Server (2008 R2). There are multiple sites, blogs and other stuff where you can read the ins and outs of replication. I'll not go into detail about replication.

The basic components

Let's explain some basics. There are a couple of different types of replication possible: transactional, snapshot, and mergereplication. In this blogpost I'll describe the transactional replication. And, there are a couple of keyplayers involved : the distributor, the publisher and the subscriber:
  • A distributor. The distributor identifies whether changes has happened on the publisher.
  • A publisher. The publisher is the SQL Server instance that publishes the articles.
  • A subscriber. SQL Server instance that subscribes to a publication.

There are also other components involved :
  • Articles : for each SQL Server object an article needs to be defined.
  • Publication : a collection of articles.
  • Publication database : a database that contains the articles that needs to be published.
  • Distribution database : the distribution database stores metadata and replicated data.
  • Subscription : a collection of articles.
  • Subscription database : the target database where the articles are stored.

And, there are a couple of agents that take care of the replication:
  • Snapshot agent. The snapshot agent creates the snapshot files with the schema and the data of the to be published tables and database objects
  • Log reader agent. The log reader Agent monitors the transaction log of every database that is configured for replication. And it copies the data into the distribution database.
  • Distribution agent. The distribution agent copies the initial snapshot files of the snapshot folder and the data from the distribution agent into the subscriber database.
  • The Queue Reader Agent is used with transactional replication with the queued updating option. 

The tutorial described in this blogpost is executed on one virtual environment. Currently setting up a testenvironment with a Domain controller, a publisher and a distributor in separate Virtual machines. I'll blog about this configuration later.

Setting up the distribution

In this blogpost I'll describe a simple replication setup. Don't bother (yet) about the right AD accounts. Just a plain and simple replication on one database instance (in a virtual environment on my laptop). Okay let's start the tutorial:

1. Start SQL Server Management Studio and connect to an instance.

2. In the Object Explorer, right click on the replication map and click on Configure Distribution.

3. Click on Next.

4. Choose the current instance for setting up the distributor. Don't change anything and press Next.

5. Although it's not recommend to choose Yes here, because of not following security best practices, choose Yes;-). This tutorial is about setting up a simple replication.

6. Now it's time to specify the snapshot folder. The snapshot folder is used for the initial synchronization of the transactional replication. In this tutorial I created the snapshot folder on the same machine. I can imagine that you want to create this snapshot folder on a share on the network.

7. After setting up the snapshot folder, the following window appears. This window specify the distribution database and the location of the database and log files. Click Next.

8. Now we are entering the Publisher window. Here you can specify the Publisher. It grants the server access to the Distributor.

Setting up the publisher

9. Now you have setup the Distributor it's time to setup the Publisher. In SQL Server Management Studio right click on Local Publications, and then New Publication.

10. Press Next.

11. Press Next.

12. Press Next.

13. Press Next.

14. Choose the database that database that should be replicated (in my case AdventureWorksLT). Press Next.

15. Choose transactional replication and press Next.

16. Choose the tables, stored procedures or even fields in the table you want to replicate. Press Next

17. Add filters if you want and press Next.

18. In the following window you can specify when you want to run the snapshot. Choose "Create a snapshot immediately and keep the snapshot available to initialize subscriptions".

19. Use the SQL Server Agent account for the snapshot agent.. Press Next.

20. Press Next..

21. Enter some properties depending on what you want. Press Next.

22. On the Complete the wizard window an overview is given of the settings. Press Next.

Settting up the subscription

23. Now it's time to setup the subscription. The subscription resides mostly on a different machine. In this simple tutorial I'll keep the subscriber on the same machine. Right click in the Object explorer, Replication, Local Subscription and New Subscription and press Next on the opening window of the wizard.

24. Choose the server and the Publisher database. Press Next.

25. Specify whether this is a pull or push subscription.

26. Choose server and database where the subscription database should be. Press Next.

27. Specify the security account for the distributor. Press ... .

28. Choose options like the one in the window below. Press OK.

29. Press Next.

30. Specify the synchronization schedule. Choose Run continuously and press Next.

31. Press Next.

32. Press Next.

33. Press Next.

34. And we are ready to execute. Press Next.

35. Wait and see how the subscriptions are generated.

Looking around

The next thing I did was browsing SQL Server and folders to see what has changed. Below the definitions of the replication process.

The snapshot folder:

The replication is executed by a number of replication agents (jobs) :

  • Snapshot agent. performs initial synchronization between the publication database and the subscription database.
  • Log reader agent. Reads the transaction log of the Publication database.
  • Distribution agent. Applies changes to the subscription database (from the recorded changes in the distribution database).
Below the jobs that controls the replication of articles.


Although it's a simple tutorial, some considerations are shown that can help you when you're implementing replication in a real world scenario. In this blogpost I showed a simple Next Next tutorial for building a replication process.

In future blogpost I'll implement replication on multiple machines and with better security options.