Robert J. Hatton - Business Intelligence Developer & Author

Database Spelunking

March 19, 2018 By robhatton314@gmail.com Leave a Comment

Spelunking, for those who don’t know is exploring caves. In some ways, that’s what it’s like to find your way around in databases in some organizations. It’s especially difficult without a flashlight.

This PowerPoint presentation will provide some tips, techniques, and several scripts that can help shed light on your data cavern.

The attached ZIP file has all of the scripts mentioned in the presentation.DatabaseSpelunking DatabaseSpelunkingScripts

SQL Style presentation

February 24, 2018 By robhatton314@gmail.com Leave a Comment

Thanks to all who joined in the SQL Style presentation at the Tampa SQL Saturday. It was great to have folks involved during a presentation. It doesn’t look like I can attach the presentation, but it’s available on the SQL Saturday website. The queries are right here:

–Example Queries
use Fruit
go

select * from colors
select * from fruit

–Use ‘where’ clause
select *
from fruit
where colorKey = 1

–Dealing with null
select *
from fruit
–where colorKey = null
where colorKey is null

–Wildcard
select *
from fruit
where fruitName like ‘%e%’
–where fruitName like ‘%e ‘
–where rtrim(fruitName) like ‘%e’

–‘In’ clause
select *
from fruit
where fruitName in (‘apple’,’grape’)

–In with a subquery
select *
from fruit
where fruitName in (select fruitName from fruit where rtrim(fruitName) like ‘%e’)

–Everything from inner join
select *
from fruit f
inner join colors c
on f.colorKey = c.colorKey
–Relevent columns from inner join
select f.fruitName, c.color
from fruit f
inner join colors c
on f.colorKey = c.colorKey

–Everything from left outer join
select f.fruitName, c.color
from fruit f
left outer join colors c
on f.colorKey = c.colorKey

–Everything from right outer join
select f.fruitName, c.color
from fruit f
right outer join colors c
on f.colorKey = c.colorKey

–Everything from Cross join (match everything with everything – no ‘on’ condition)
select f.fruitName, c.color
from fruit f
cross join colors c
–on f.colorKey = c.colorKey

–Everything from full join (combination of left & right outer joins)
select f.fruitName, c.color
from fruit f
full join colors c
on f.colorKey = c.colorKey

–Check for Exists
select fruitName
from fruit f
where exists
(select *
from colors c
where c.colorKey = f.colorKey)
–Alternate version of Exists
select fruitName
from fruit f
left outer join colors c
on c.colorKey = f.colorKey
where f.colorKey is not null

–Aggregations
select colorKey, count(*) as cnt
from fruit f
group by colorKey
having count(*) > 1

–Self join
use [AdventureWorksDW]
go
select c.AccountDescription, p.AccountDescription
from DimAccount c
left outer join DimAccount p
on p.ParentAccountKey = c.AccountKey
order by c.accountKey

Speaking at Tampa BI User Group

November 5, 2017 By robhatton314@gmail.com Leave a Comment

I’ll be speaking at the Tampa Bay BI User Group meeting on 11/6. The topic will be ‘Database spelunking’. Spelunking, for those who don’t know is exploring caves. In some ways, that’s what it’s like to find your way around in databases in some organizations. It’s especially difficult without a flashlight.

This PowerPoint presentation will provide some tips, techniques, and several scripts that can help shed light on your data cavern. DatabaseSpelunking

Here’s a zip file with the scripts that are mentioned in the presentation: DatabaseSpelunkingScripts

Finding SQL Server instances that have a specific table.

August 12, 2016 By robhatton314@gmail.com Leave a Comment

It’s not uncommon to have a commonly used table copied into a number of different databases, potentially on different servers. We had a requirement recently to change the schema of a commonly used table, but had no idea of where it’s used. Rather than manually inspect every single SQL Server in the organization, I whipped up a PowerShell script to ask each database on each server if it has a copy of the desired table.

It works great, but it’s a bit crude. I didn’t include much in the way of error handling, but given how rarely this will be used, it’s probably not worth doing. It does give you the location of every copy of the table that you specify at the beginning of the script! Don’t forget to update $serverList with the names of your servers.

Clear-Host
Write-Host “Searching SQL Servers….”

#Set up the variables
$tblName = “Put table you want here”
$serverList = ‘server1′,’server2\instance’,’server3′,’server4′
$SQLServer = “”
$SQLDBName = “”
$uid =””
$pwd = “”
$db = “”
$tbl = “”
$connectionString = “”

foreach($SQLServer in $serverList)
{
#create a connection to the database
$connectionString = “Server=$SQLServer;Database = Master;Integrated Security=SSPI;”

#open the connection
$dbConn = New-Object System.Data.SqlClient.SqlConnection
$dbConn.ConnectionString = $connectionString
$dbConn.Open()

Write-Host “Server: ” $SQLServer

#Compose a SQL statement to find every database
$qry = @”
select d.[name] as dbName
from sys.databases d
where d.[name] not in (‘master’,’tempdb’,’model’,’msdb’)
“@

#Add the query to the connection
$dbCmd = $dbConn.CreateCommand()
$dbCmd.CommandText = $qry

#Execute the query
$dbList = $dbCmd.ExecuteReader()

#Format and display the results
$dbs = new-object “System.Data.DataTable”
$dbs.Load($dbList)

#Check each database for desired table
foreach($db in $dbs)
{
$dbName = $db[0]
#Write-Host “Checking database: ” $dbName

#open the connection
$tblConn = New-Object System.Data.SqlClient.SqlConnection
$tblConn.ConnectionString = $connectionString
$tblConn.Open()

#Compose a SQL statement to find every table
$qry = @”
select s.[name] as schemaName
, tbl.[name] as TableName
from $dbName.sys.tables tbl
left outer join $dbName.sys.schemas s
on s.[schema_id] = tbl.[schema_id]
“@

#Add the query to the connection
$tblCmd = $tblConn.CreateCommand()
$tblCmd.CommandText = $qry
#Write-Host “Query: ” $tblCmd.CommandText “`n”

#Execute the query
try {
$tblList = $tblCmd.ExecuteReader()
}
catch
{
Write-Host ” -Error accessing database” $dbName
}
$tbls = new-object “System.Data.DataTable”
$tbls.Load($tblList)
#Write-Host ($tbls | Format-Table | Out-String)
foreach($tbl in $tbls)
{
if($tbl[1] -eq $tblName)
{
Write-Host “Found table in ” $dbName, $tbl[0], $tbl[1]
}
}

}

Find all of your SQL Servers

August 11, 2016 By robhatton314@gmail.com Leave a Comment

It’s surprising to find out how few organizations have a list of all of their SQL Server instances. In the past, building such a list was an onerous task. It turns out that there’s a great way to to find out about all of the SQL Server instances that a given server can see. Just ask SQL Browser!

As PowerShell can reach into the .Net object model, it’s a great platform for simple tasks like this. To use it, you need to remote to a desired SQL Server, start PowerShell, and paste in this command:

[System.Data.Sql.SqlDataSourceEnumerator]::Instance.GetDataSources()

This will produce a list of servers along with the instance name (if any) and the server version number.

It’s a simple solution to an annoying problem!

A fundamental truth for SQL Server’s optimizer

June 4, 2016 By Robert Hatton 1 Comment

Table joins vs. query plans

SQL Server uses an optimizer to figure out how to retrieve data and process a query. For every table, it evaluates roughly 350 different rules. It occurred to me that this would lead to exponential growth in the number of potential query plans. The following formula is one way to think about the problem of increasing the potential number of query plans as the number of tables in a query increases:

(t*350)^t

In the above query, ‘t’ represents the number of tables. It’s obvious that the optimizer would need to evaluate 350 rules per table (hence t*350). That’s actually not always true as the optimizer knows how to take some shortcuts, but it’s a reasonable number for this exercise. What’s less obvious is that the order that the optimizer evaluates rules for tables increases exponentially with the number of tables. This is expressed by the part of the formula that raises the result of the rules evaluation for each table to the power of the number of tables (^t). It turns out that this is an ‘NP complete’ problem.

What this means in the real world is that it quickly becomes less likely that SQL Server’s optimizer will find the optimal execution plan once you have more than 6 tables in a join. That doesn’t mean that performance will always be awful with more that 6 tables in a join. It just means that it becomes much more likely. Once you encounter performance problems with this many tables in a query, there may be no other way to improve performance than to reduce the number of tables joined.

Picture a database

June 4, 2016 By Robert Hatton Leave a Comment

When you walk into a new project, it can be a challenge to build a mental image of what’s going on quickly. I could spend an hour just talking about how to get organized quickly. Instead of blathering on, here’s a script to query the system tables of the database to document it’s structure and then sample the actual data to help you see what’s actually in there.

The script will output the following columns: TableName, ColumnName, recordCnt, distinctCnt, nonNullCnt, NullCnt, fldSample. The general idea is to show you a list of the tables and columns followed by some counts of how many distinct values each table has along with how many records are null (or not). The real value is in seeing a sample of the values in each column (it’s in the fldSample column) . The script is set to limit the sample data to 200 characters, but that’s easy to change.
In any event, here’s the script:

USE [Datawarehouse]

declare @tbl varchar(50)
declare @col varchar(50)
declare @sqlStmt varchar(700)
declare @val varchar(100)
declare @fldSample varchar(200)
set @val = ”
set @fldSample = ”

–Recreate the profile table
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[dbProfile]’) AND type in (N’U’))
DROP TABLE [dbo].[dbProfile]
CREATE TABLE [dbo].[dbProfile](
[TableName] [varchar](50) NOT NULL,
[ColumnName] [varchar](50) NOT NULL,
[recordCnt] [int] NULL,
[distinctCnt] [int] NULL,
[nonNullCnt] [int] NULL,
[NullCnt] [int] NULL,
[fldSample] varchar(300)
) ON [PRIMARY]

declare dbSchema cursor for
SELECT t.name AS table_name,
c.name AS column_name
FROM sys.tables AS t
INNER JOIN sys.columns c ON t.OBJECT_ID = c.OBJECT_ID
ORDER BY table_name

open dbSchema
fetch next from dbSchema into @tbl, @col
while @@FETCH_STATUS = 0
begin

set @sqlStmt = ‘declare fldList cursor for select distinct ‘ + @col + ‘ from ‘ + @tbl
EXEC (@sqlStmt)
set @fldSample = ”
open fldList

fetch next from fldList into @val

while @@FETCH_STATUS = 0
begin
set @fldSample = @fldSample + rtrim(ltrim(@val)) + ‘, ‘
fetch next from fldList into @val
end

close fldList
deallocate fldList

set @sqlStmt = ‘insert into dbProfile (TableName, ColumnName, recordCnt, distinctCnt, nonNullCnt, nullCnt, fldSample) ‘
set @sqlStmt = @sqlStmt +’select ”’+@tbl+”’ as TableName ,”’+@col+”’ as ColumnName, ‘
set @sqlStmt = @sqlStmt +’count(*) as recordCnt, ‘
set @sqlStmt = @sqlStmt +’count(distinct [‘+@col+’]) as distinctCnt, ‘
set @sqlStmt = @sqlStmt +’sum(case len(rtrim([‘+@col+’])) when 0 then 0 else 1 end) as nonNullCnt, ‘
set @sqlStmt = @sqlStmt +’sum(case len(rtrim([‘+@col+’])) when 0 then 1 else 0 end) as NullCnt, ‘
set @sqlStmt = @sqlStmt + ”” + left(@fldSample,200) + ””
set @sqlStmt = @sqlStmt +’ from ‘ + @tbl
–select @sqlStmt
begin try
EXEC (@sqlStmt)
end try
begin catch
print(@sqlStmt)
end catch

fetch next from dbSchema into @tbl, @col
end
close dbSchema
deallocate dbSchema
select * from dbProfile

Constellation Schema

November 18, 2013 By robhatton314@gmail.com Leave a Comment

One of the basic concepts that makes Business Intelligence work so well is the separation of data into facts (what is being analyzed) and dimensions (how it is analyzed) and organizing those facts and dimensions into a Star Schema. An illustration from my book ‘Business Intelligence Team Handbook’ shows how tables are organized into a star shape.

Star Schema

In this case, you can see how the Sales table is used as the fact table. I was explaining the concept of how dimension tables could be re-used in association with a variety of fact tables to build multiple star schemas. Many of the same dimensions referenced in the star schema above could be used with an Inventory fact table or perhaps with a Foreclosure fact table.

It occurred to me that the resulting design would be a ‘Constellation Schema’. This phrase creates a great mental image of how a Business Intelligence database is organized. It’s more complex than a Star Schema. It’s clearly NOT a Snowflake Schema (thank goodness). It’s a collection of stars.

This is one term that I’m adding to my vocabulary!

Crystal Reports

October 16, 2013 By Robert Hatton Leave a Comment

I just did a project for a client who has an accounting package that uses Crystal Reports. He was having problems with orphaned footers. Much to my surprise, it seems that this is a common problem with Crystal Reports. It was easy to find forums where people were wrestling with this issue. Folks were discussing all kinds of different solutions, but I didn’t see any mention of putting a simple BASIC statement in the ‘Page Before’ event of the Detail section to limit the number of rows:

shared numbervar nRow;
if (nRow < 30) then
(nRow := nRow + 1;
false)
else
(nRow := 0;
true)
It seems to work pretty well. I have to say, though, that I don’t regret not having used Crystal Reports for several years. It’s pretty clunky.

SQL Saturday was great!

September 23, 2013 By Robert Hatton Leave a Comment

SQL Saturday San Diego 2013

The SQL Saturday last weekend was a great meeting! As usual, there was a lot of great information to absorb. Paul Turley’s presentations on Tablular Models and Visualization Choices were particularly good. I hadn’t gone to any of the non-technical presentations in the past, but I went to the ’10 Things I Hate About Interviewing With You’ session and found it to be worthwhile.

My ‘Designing for Analysis’ presentation went great. It was standing room only and the audience was actively engaged the whole time. This was the second time I’ve presented on this topic. Thanks to feedback from the first presentation, I made some changes that made a huge difference this time. Feedback from this session leads me to believe that I should split this presentation into a beginner version and a deeper dive.

All in all, though, it was a wonderful weekend!