Robert J. Hatton

Business Intelligence Developer & Author

Speaking at SQL Saturday this weekend!

September 19, 2013 By Robert Hatton


I’ll be giving a presentation on “Designing for Analysis” at the SQL Saturday in San Diego this weekend. It’ll cover how to separate information into Facts and Dimensions and organize it into a Star Schema. Of course, that means I’ll have to explain what a Star Schema is and why it’s cool! The part I’m interested in seeing how people respond to is the description of how to organize the ETL (Extract / Transform / Load) to make everything work!

Here’s a PDF of the presentation: FactsAndDimensions

Incremental extracts

August 12, 2013 By Robert Hatton


This is going to be a long post, but I see folks working way harder than needed to extract data for BI systems. Given the large quantities of data available in source systems, it is not practical to transfer all of the data to the Business Intelligence solution on each extract. Incremental extracts are needed to cut the quantity of data transferred down to a manageable level.

An incremental extract is simply the output of a program that pulls data from a source system for a specified date range.

It might be tempting to create an extract program which simply extracts data from the prior day. In actual practice, this is often an undesirable approach. It is not unusual to encounter a situation where a given day’s or range of days’ worth of data needs to be re-extracted.

A much better way to implement an extract program is to provide the start and end dates at run time. This can be done by providing them as command line arguments when the extract program is invoked, or by having the extract program pick up the start and end dates from a database table or environment variables.
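For the table-driven variant, a minimal control table might look something like the sketch below (the table and column names are hypothetical, not from any particular product):

-- Hypothetical extract control table: the extract program reads its date range
-- from here instead of from command line arguments.
create table dbo.extract_control (
    extract_name varchar(50) not null primary key,
    start_date   date        not null,
    end_date     date        not null
)

-- The extract program would pick up its range like so:
select start_date, end_date
from dbo.extract_control
where extract_name = 'salesExtract'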

When extract programs are designed to use start and end dates that are provided at run time, it is possible to query the existing data repository for the most recent date loaded by that extract. This yields an extract facility which is, to a great degree, adaptive to the needs of the database. If a network failure prevents the data transfer for an entire day, no manual intervention is necessary. The next day, when the extract runs, it will extract data for two days instead of one. If you have to delete a week’s worth of data because of a problem with a source system, then the next execution of the extract will retrieve the appropriate amount of data.

While there are several ways to implement this type of extract, one approach that is commonly supported is to create the extract program so that it gets the start and end dates as command line arguments. Command line scripts can then be used to find the appropriate start and end dates and build the command line that starts the program.

Using a Windows / SQL Server environment as an example, the following SQL statement could be used to build the command line for an extract program that pulls whatever data from the source system’s sales_facts table is not yet in the data warehouse:

set nocount on

select 'salesExtract '
    + (select convert(char(10), max(timestamp), 101) from sales_facts) + ' '
    + (select convert(char(10), dateadd(d, -1, getdate()), 101))

The above query would create the following output:

salesExtract 08/07/2013 08/08/2013

Assuming the above query was saved in a file named “salesExtract.sql”, a batch file could be used to:

  1. Create a batch file which would start the extract program
  2. Start the extract program (which would create a text file in the landing area)
  3. Delete the batch file

The following script would be saved as a batch file named “salesExtract.bat”, and would perform the steps outlined above.

sqlcmd -h -1 -S "localhost" -d "biDB" -i "salesExtract.sql" -o "salesExtractCurrent.bat"

call salesExtractCurrent.bat

del salesExtractCurrent.bat

In actual use, a job scheduler would start the “salesExtract.bat” batch file, which would use the “salesExtract.sql” file to create a temporary batch file (“salesExtractCurrent.bat”) that it would then execute. Once the extract file was created in the landing zone, the temporary batch file would be deleted.

As previously mentioned, there are several ways to create an extract facility that will retrieve all of the data needed to bring the data repository current. The above example illustrates one approach that works in a variety of environments. However it is implemented, making the extract facility do more than simply retrieve the prior day’s activity will make the finished solution much more robust.

Someone in the government doing really cool work

July 12, 2013 By Robert Hatton


Say what you will about ‘Obamacare’, what’s going on behind the scenes (in terms of tech) is totally cool. It’s kind of a long article, but well worth the read.

http://www.infoworld.com/d/consumerization-of-it/the-savvy-tech-strategy-behind-obamacare-222251

Of course, it all depends on folks actually caring about health information. If we’re all just sheep, it won’t turn out well.

Finding stuff in SQL Server

July 8, 2013 By Robert Hatton


Starting at a new client usually means exploring databases to figure out how things work. The system tables can be used to find where a text string is used in stored procedures or column names. The following queries will find where a specific string is used. Just use the database of interest, set a value for the @searchStr variable, and run the query!

declare @searchStr as varchar(50)
set @searchStr = '%categoryID%'

--Find string in proc
select distinct OBJECT_NAME(id)
from syscomments
where OBJECTPROPERTY(id, 'IsProcedure') = 1
and [text] like @searchStr
order by OBJECT_NAME(id)

--Find table and column by name
SELECT t.name AS table_name,
       SCHEMA_NAME(schema_id) AS schema_name,
       c.name AS column_name
FROM sys.tables AS t
INNER JOIN sys.columns c ON t.OBJECT_ID = c.OBJECT_ID
WHERE c.name LIKE @searchStr
ORDER BY schema_name, table_name;
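On newer versions of SQL Server, syscomments is a deprecated compatibility view and splits long object definitions across multiple rows. A variation of the same search against sys.sql_modules (shown here as a sketch, reusing the @searchStr variable from above) avoids that:

--Find string in any module definition (procs, views, functions, triggers)
select object_schema_name(object_id) as schema_name,
       object_name(object_id)        as object_name
from sys.sql_modules
where definition like @searchStr
order by schema_name, object_name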

Reporting Services Subscriptions (without the Enterprise server)

May 3, 2013 By Robert Hatton


One of the truly potent features of the Enterprise version of Microsoft’s Reporting Services is the ability to run data driven subscriptions. This supports the ability, for example, to create and email a sales report to every store in a retail chain every morning without human intervention. For an organization that’s widely distributed, this kind of thing is a big deal! Unfortunately, this is a feature that is limited to the Enterprise version of SQL Server (which is a lot more expensive than the Standard version).

I worked with a client in the past that needed this data driven subscription capability but wasn’t prepared to pay for the Enterprise version on their database servers. So we (Andy Warren and I) built our own data driven subscriptions! Our client kindly agreed to let us share the code:

ReportSubscriptions

The link is to a zip file which contains some documentation, some T-SQL scripts, and a .NET project. The general idea is to set up a database which defines the subscriptions and then use .NET to drive SSRS to render and distribute each report. It worked great! In some ways, it works better than the Enterprise version (better error recovery).
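To give a rough idea of the shape of the approach (this is just an illustrative sketch, not the actual schema in the zip), the subscription-definition table might look something like this:

-- Illustrative subscription table; names and columns are hypothetical.
-- A .NET job reads the active rows, asks SSRS to render each report, and emails it.
create table dbo.ReportSubscription (
    SubscriptionID int identity(1,1) primary key,
    ReportPath     nvarchar(425) not null,  -- e.g. '/Sales/DailyStoreSales'
    RecipientEmail nvarchar(256) not null,
    ParameterName  nvarchar(128) null,      -- e.g. 'StoreID'
    ParameterValue nvarchar(128) null,
    RenderFormat   nvarchar(20)  not null default 'PDF',
    IsActive       bit           not null default 1
)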

Enjoy!

The server side of Reporting Services

May 3, 2013 By Robert Hatton


Nearly all of the presentations concerning SQL Server Reporting Services at the SQL Saturday conferences are about how to design reports. This one is a bit different: it’s about how to organize things on the server side.

The two most important points are to create a ‘shared data sources’ directory and a ‘linked reports’ directory. Make sure all developers use the same shared data sources and ALWAYS deploy into the linked reports directory using the same name the developer used. Doing so will help keep your SSRS server from becoming a mess.

The most interesting (though perhaps not most important) topic is the ReportServer database and how it can be used to create a Report of Reports. It’s a great way to untangle an SSRS server that’s a mess.
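As a sketch of the kind of query involved (the Catalog table and ExecutionLog3 view exist in a typical ReportServer database on SSRS 2008 R2 and later, but verify the names on your version), a basic Report of Reports might start like this:

-- List deployed reports with their folder path and most recent execution.
use ReportServer

select c.Path,
       c.Name,
       c.CreationDate,
       c.ModifiedDate,
       lastRun.LastExecuted
from dbo.Catalog as c
left join (select ItemPath, max(TimeStart) as LastExecuted
           from dbo.ExecutionLog3
           group by ItemPath) as lastRun
    on lastRun.ItemPath = c.Path
where c.Type = 2   -- 2 = report
order by c.Path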

ReportingServicesServer

Enjoy!

Something every B.I. developer needs: a calendar generator

May 3, 2013 By Robert Hatton


Nearly every B.I. project deals with data over time, and therefore needs a calendar dimension. Many newer B.I. tools include features to create a calendar, but they don’t always do it in the way you (or, more importantly, your customer) want.

This is especially true if you are working in a retail environment where the 454 calendar is needed. For those who aren’t familiar with the 454 calendar, it addresses a problem that the retail community has comparing month over month sales. In the conventional calendar, months can have different numbers of days. More significantly, they can have different numbers of weekend days. The 454 calendar addresses this issue by breaking each quarter into 3 months where the first month has 4 weeks, the second has 5 weeks and the third has 4 weeks. Each week always starts on a Sunday. This provides the consistency needed to make good month over month comparisons.
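To make the pattern concrete, here’s a rough T-SQL sketch of the week and month assignments (this is not the attached workbook’s VBA; the fiscal year start date is a made-up example, and the 53-week year rule that real 454 calendars need is left out):

-- Assign each date of a 52-week (364 day) fiscal year to its 454 week, quarter, and month.
-- Assumes the fiscal year starts on a Sunday; @fyStart below is just an example.
declare @fyStart date = '2013-02-03';

with days as (
    select dateadd(day, nums.n, @fyStart) as calendar_date,
           (nums.n / 7) + 1               as fiscal_week      -- weeks 1..52
    from (select top (364) row_number() over (order by (select null)) - 1 as n
          from sys.all_objects) as nums
)
select calendar_date,
       fiscal_week,
       ((fiscal_week - 1) / 13) + 1 as fiscal_quarter,        -- 13 weeks per quarter
       ((fiscal_week - 1) / 13) * 3
         + case                                               -- 4-5-4 split within each quarter
             when ((fiscal_week - 1) % 13) + 1 <= 4 then 1
             when ((fiscal_week - 1) % 13) + 1 <= 9 then 2
             else 3
           end as fiscal_month
from days
order by calendar_date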

The Excel workbook attached below creates a spreadsheet which has columns for the normal calendar, the fiscal calendar, and the 454 calendar. It’s all driven by VBA, which you can modify to suit your own needs. It’ll open on a worksheet that provides some basic documentation.

Dates454v2

Enjoy!

Designing for Analytics: Facts and Dimensions

May 3, 2013 By Robert Hatton


This is a presentation on how to design an analytical database. It’s primarily about how to divide data into facts and dimensions and then organize them into a star schema.

It’s one of my favorite presentations to do for the B.I. crowd because there is not always a clear set of rules that lets you know whether a particular data element is a fact or a dimension. On the surface, it’s usually obvious. The interesting part is when the discussion turns to concepts that many folks, even experienced ones, hadn’t considered.
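For readers who want a picture of the end state, here’s a minimal illustrative star schema (the table and column names are made up for this example, they’re not from the slides): a narrow fact table of measurements keyed to a handful of descriptive dimension tables.

-- Illustrative star schema: dimensions describe, the fact table measures.
create table dbo.dim_date    (date_key int primary key, calendar_date date, month_name varchar(20), year_number int)
create table dbo.dim_store   (store_key int primary key, store_name varchar(100), region varchar(50))
create table dbo.dim_product (product_key int primary key, product_name varchar(100), category varchar(50))

create table dbo.fact_sales (
    date_key     int   not null references dbo.dim_date(date_key),
    store_key    int   not null references dbo.dim_store(store_key),
    product_key  int   not null references dbo.dim_product(product_key),
    sales_amount money not null,
    quantity     int   not null
)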

Here’s a PDF version of the presentation.

FactsAndDimensions

Designing Analysis Services Cubes

May 2, 2013 By Robert Hatton


This is a PowerPoint presentation from 2010 on designing Analysis Services cubes. It gives an overview of the structure of multidimensional hypercubes. It also provides some tips about how to approach the task of creating the design.

 

Getting the blog rolling…

May 2, 2013 By Robert Hatton


I’ve been building Business Intelligence systems for decades. Over the years, I’ve accumulated a wealth of tips and tools. It seems like it’s time to collect them in one place before I forget them!



Database Spelunking

Spelunking, for those who don’t know, is exploring caves. In some ways, that’s what it’s like to find your way around in databases in some organizations. It’s especially difficult without a flashlight. This PowerPoint presentation will provide some tips, techniques, and several scripts that can help shed light on your data cavern. The attached ZIP […]

SQL Style presentation

Thanks to all who joined in the SQL Style presentation at the Tampa SQL Saturday. It was great to have folks involved during a presentation. It doesn’t look like I can attach the presentation, but it’s available on the SQL Saturday website. The queries are right here: --Example Queries use Fruit go select * from […]

