Tag Archive for T-SQL

Cursor Transformation

One of my favorite presentations to give is called, “Changing Your Habits to Improve the Performance of Your T-SQL.” After giving this presentation in Austin, TX this past January, one of my students contacted me. He had seen in one of my demonstrations how badly cursors can perform, and he wanted to see if his stored procedure could be rewritten using set theory.

Scenario

He has an application that can only accept 3 data types (NVARCHAR, NUMERIC, and DATETIME), and, as all normal database schemas do, his database has more than 3 data types. He needed to identify which columns had data types that could not be used and determine which of the 3 data types should be used for each of those columns. The actual conversion was handled by another process, so the information needed for the conversion was stored in a table.

The Cursor Approach

His approach was to use a cursor to cycle through all the columns in the provided table, analyze each column, determine the new data type, and store the information in a table variable. After the cursor completed, the data in the table variable was written to a permanent table for the next process to use.

This approach isn’t necessarily bad. If you are only running it infrequently and you need to write the stored procedure quickly, then it’s fine. But if this type of stored procedure needs to run frequently, then it should be rewritten.

The Reason For the Rewrite

Every SQL statement used in a stored procedure can get its own execution plan, depending on whether or not it is parameterized. Parameterized queries that are written identically (including case, spaces, and line breaks) can reuse an execution plan. Those that are not will each receive their own execution plan. Each unique execution plan is stored in the plan cache. When there are multiple similar execution plans stored in the cache, it’s called “cache bloat.”

Note: This does not apply to servers with the “optimize for ad hoc workloads” setting turned on, but that would be a different blog post.

What does this have to do with cursors? Each time the cursor loops (and this includes WHILE loops), each SELECT statement is executed independently and receives its own execution plan. If the query is parameterized, then the execution plan can be reused. You can see this by using Extended Events or Profiler.

Note: If you try this out, don’t do it in production. You’ll be adding load to your SQL Server.
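
If you’d rather not set up an Extended Events session or a Profiler trace, one lightweight alternative is to peek at the plan cache through the execution DMVs. This is only a sketch; the LIKE filter is an assumption you would adjust to match your own statements.

/* A hedged sketch: look for near-duplicate, low-use plans in the cache.
   Adjust the text filter to match the statements you are investigating. */
SELECT
    cp.usecounts
   ,cp.cacheobjtype
   ,cp.objtype
   ,st.text
FROM
    sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE
    st.text LIKE '%SalesHeader%' -- assumption: replace with text from your own queries
ORDER BY
    cp.usecounts DESC;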

The Cursor

Below is a query similar to the one I was sent. This particular solution looks very complicated. There are table variables: one holds the approved data types, one holds the column metadata for the provided table, and one holds the information needed for the conversion. There are also several variables to be used by the cursor. Finally, the table variable with the needed information is written to the permanent table.


USE DemoProgramming
GO
SET NOCOUNT ON;

/* These are the parameters that would be used in the stored procedure.*/
DECLARE
@schema AS NVARCHAR(200) = 'dbo'
,@tableName AS NVARCHAR(200) = 'SalesHeader'
/* table variable with the approved data types.*/
DECLARE @legalDataTypes TABLE
(
Data_Type NVARCHAR(500)
);

INSERT INTO @legalDataTypes
(Data_Type)
VALUES
('nvarchar'),
('numeric'),
('datetime');
/* use information_schema.columns to discover information about column types of input table */
DECLARE @ColumnInformation TABLE
(
ColumnName NVARCHAR(100) NOT NULL
PRIMARY KEY
,ColumnDataType NVARCHAR(20) NOT NULL
,ColumnLength NVARCHAR(10)
,NumericPrecision INT
,NumericScale INT
,OrdinalPosition INT
);

INSERT INTO @ColumnInformation
(ColumnName
,ColumnDataType
,ColumnLength
,NumericPrecision
,NumericScale
,OrdinalPosition
)
SELECT
COLUMN_NAME
,DATA_TYPE
,CHARACTER_MAXIMUM_LENGTH
,NUMERIC_PRECISION
,NUMERIC_SCALE
,ORDINAL_POSITION
FROM
INFORMATION_SCHEMA.COLUMNS
WHERE
TABLE_NAME = @tableName
AND TABLE_SCHEMA = @schema;

/* The table will be populated iteratively and eventually inserted into the meta data table */
DECLARE @resultSet TABLE
(
ColumnName NVARCHAR(100)
,ColumnDataType NVARCHAR(20)
,ColumnLength NVARCHAR(10)
,Position INT
);

/* These will be set each loop */
DECLARE @columnLength AS NVARCHAR(10)
DECLARE @newDataType AS NVARCHAR(20);

/*setup cursor*/
DECLARE @columnName NVARCHAR(100);
DECLARE @dataType NVARCHAR(100);
DECLARE @characterMaximumLength NVARCHAR(10);
DECLARE @ordinalPosition INT;
DECLARE @numericPrecision INT;
DECLARE @numericScale INT;

DECLARE allColumnsCursor CURSOR
FOR
SELECT
ColumnName
,ColumnDataType
,ColumnLength
,NumericPrecision
,NumericScale
,OrdinalPosition
FROM
@ColumnInformation;
OPEN allColumnsCursor;
FETCH NEXT FROM allColumnsCursor INTO @columnName, @dataType, @characterMaximumLength, @numericPrecision, @numericScale, @ordinalPosition;
WHILE @@FETCH_STATUS = 0
BEGIN
IF @dataType NOT IN (SELECT
Data_Type
FROM
@legalDataTypes)
/* illegal data types*/
BEGIN
IF @dataType IN ('varchar', 'char', 'nchar')
BEGIN
SET @newDataType = N'nvarchar';
SET @columnLength = CAST(@characterMaximumLength AS NVARCHAR(10));
END;
ELSE
IF @dataType IN ('decimal', 'float', 'real', 'money', 'smallmoney')
BEGIN
SET @newDataType = N'numeric';
SET @columnLength = '(19,8)';
END;
ELSE
IF @dataType IN ('bigint', 'smallint', 'tinyint', 'binary', 'varbinary', 'int')
BEGIN
SET @newDataType = N'numeric';
SET @columnLength = '(19,0)';
END;
ELSE
IF @dataType IN ('bit')
BEGIN
SET @newDataType = N'numeric';
SET @columnLength = '(1,0)';
END;
ELSE
IF @dataType IN ('smalldatetime', 'date', 'time', 'datetimeoffset', 'datetime2', 'timestamp')
BEGIN
SET @newDataType = N'datetime';
SET @columnLength = NULL;
END;
ELSE
BEGIN
DECLARE @ret_string VARCHAR(255);
EXEC sys.xp_sprintf @ret_string OUTPUT, '@columnName = %s has unrecognized @dataType = %s', @columnName, @dataType;
RAISERROR(@ret_string,16,1);
RETURN;
END;
END;
ELSE
BEGIN
/* legal data types, don't change datatype but capture correct columnLength */
-- VALUES ('nvarchar'),('numeric'),('datetime');
SET @newDataType = @dataType;
IF @dataType = 'nvarchar'
BEGIN
SET @columnLength = CAST(@characterMaximumLength AS NVARCHAR(10));
END;
ELSE
IF @dataType = 'numeric'
BEGIN
SET @columnLength = '(' + CAST(@numericPrecision AS NVARCHAR(10)) + ',' + CAST(@numericScale AS NVARCHAR(10)) + ')';
END;
ELSE
BEGIN
SET @columnLength = NULL;
END;
END;
INSERT INTO @resultSet
(ColumnName
,ColumnDataType
,ColumnLength
,Position
)
VALUES
(@columnName
,@newDataType
,@columnLength
,@ordinalPosition
);

FETCH NEXT FROM allColumnsCursor INTO @columnName, @dataType, @characterMaximumLength, @numericPrecision, @numericScale, @ordinalPosition;

END;
CLOSE allColumnsCursor;
DEALLOCATE allColumnsCursor;
/* populate two meta data tables*/
INSERT INTO dbo.DataTypeConversion_Cursor
(TableName
,ColumnName
,ColumnDataType
,ColumnLength
,OrdinalPosition
)
SELECT
@tableName
,ColumnName
,ColumnDataType
,ColumnLength
,Position
FROM
@resultSet;

The New Solution

The new solution consists of a permanent table that contains the conversion information for each data type. That allows a join between the conversion information and the metadata found in the system view INFORMATION_SCHEMA.COLUMNS. After the join, the new information can be inserted directly into the permanent table for the next process to consume.


/*Create this table one time*/
CREATE TABLE dbo.DataTypeConversion
(
OldDataType NVARCHAR(20)
,NewDataType NVARCHAR(20)
,columnLength NVARCHAR(20)
)

/*Values for the Data Types that will be converted.*/
INSERT INTO dbo.DataTypeConversion
(OldDataType, NewDataType, columnLength)
VALUES
('decimal', N'numeric', '(19,8)')
,('Float', N'numeric', '(19,8)')
,('real', N'numeric', '(19,8)')
,('money', N'numeric', '(19,8)')
,('smallmoney', N'numeric', '(19,8)')
,('varchar', N'nvarchar', '-1')
,('char', N'nvarchar', '-1')
,('nchar', N'nvarchar', '-1')
,('bigint', N'numeric', '(19,0)')
,('smallint', N'numeric', '(19,0)')
,('tinyint', N'numeric', '(19,0)')
,('binary', N'numeric', '(19,0)')
,('varbinary', N'numeric', '(19,0)')
,('int', N'numeric', '(19,0)')
,('bit', N'numeric', '(1,0)')
,('smalldatetime', N'datetime', NULL)
,('date', N'datetime', NULL)
,('time', N'datetime', NULL)
,('datetimeoffset', N'datetime', NULL)
,('datetime2', N'datetime', NULL)
,('timestamp', N'datetime', NULL)
,('numeric', N'numeric', NULL)
,('nvarchar', N'nvarchar', NULL)

/*Parameters that would be used with the stored procedure.*/
DECLARE
@Tablename AS VARCHAR(100) = 'SalesHeader'
,@schema AS VARCHAR(20) = 'dbo'

INSERT INTO dbo.DataTypeConversion_SetTheory
(TableName
,ColumnName
,ColumnDataType
,ColumnLength
,OrdinalPosition
)
SELECT
c.TABLE_NAME
,c.COLUMN_NAME
,d.NewDataType

/*Conversion for data types*/
,CASE WHEN d.NewDataType = 'nvarchar' THEN CAST(c.CHARACTER_MAXIMUM_LENGTH AS NVARCHAR(10))
WHEN d.OldDataType = 'numeric' THEN '(' + CAST(c.NUMERIC_PRECISION AS NVARCHAR(10)) + ',' + CAST(c.NUMERIC_SCALE AS NVARCHAR(10)) + ')'
ELSE d.columnLength
END AS NewColumnLength
,c.ORDINAL_POSITION
FROM
INFORMATION_SCHEMA.COLUMNS AS c
JOIN dbo.DataTypeConversion AS d ON c.DATA_TYPE = d.OldDataType
WHERE
c.TABLE_NAME = @Tablename
AND c.TABLE_SCHEMA = @schema;

Giving thanks

I want to give thanks to my student, Mark Lai, and the company he works for, for allowing me to write about this cursor transformation. As the geek I am, I thoroughly enjoyed the challenge of thinking outside the box to rewrite the stored procedure.

T-SQL Tuesday #72 – Bad Decisions Made With Surrogate Keys

This is my second time hosting the T-SQL Tuesday blog party. The party was started by Adam Machanic (b|t) in December of 2009.

This month’s invitation topic is on Data Modeling Gone Wrong. Being a Database Developer, I deal with bad database design decisions daily. One of my app-dev teammates loves to tell me that the bad decisions were made because I didn’t work there yet. (That makes me laugh.)

Surrogate Keys vs Natural Keys

The point of surrogate keys is to represent complicated natural keys as the primary key of the table. Both the surrogate key and natural key will yield a unique key for the row. Sometimes that unique natural key is the entire row. When possible, it is better to use the natural key since it is the true representation of the row. Unfortunately, this is not always practical. Let’s look at some examples.

TSQLTuesday72 Image1

In the employee table it would take four fields to make a primary key from the natural key (first name, last name, social security number, and birthdate). Note: This is assuming this table is only used in the US and the employees have social security numbers. The reason the birthdate is also needed is due to the fact that social security numbers can be reused after someone has passed away. For the employee table it makes sense to have a surrogate key since it would be cumbersome to use all four fields as foreign keys in other tables.
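
Here is a minimal sketch of that pattern (the table and constraint names are illustrative, not from a real schema): a surrogate key serves as the primary key, while a unique constraint preserves the four-field natural key.

CREATE TABLE dbo.Employee
(
    EmployeeID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_Employee PRIMARY KEY
   ,FirstName NVARCHAR(50) NOT NULL
   ,LastName NVARCHAR(50) NOT NULL
   ,SSN CHAR(11) NOT NULL
   ,BirthDate DATE NOT NULL
    /* The natural key still gets a unique constraint. */
   ,CONSTRAINT UQ_Employee_NaturalKey UNIQUE (FirstName, LastName, SSN, BirthDate)
);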

The StateList (representing each state in the United States) is a good example of using the natural key as the primary key. Each state only needs two characters to represent it, so CHAR(2) can be used for both the natural key and the primary key. This provides the added benefit of not needing to join back to the StateList to get the two-character State abbreviation…unless additional information about the state is needed. So what is the point of this table? Well, by having it, you are guaranteed referential integrity on the StateCode field through a foreign key back to the StateList table. You don’t have to worry that someone puts ZZ as a StateCode.
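
A minimal sketch of that design (again, the names are illustrative):

CREATE TABLE dbo.StateList
(
    StateCode CHAR(2) NOT NULL
        CONSTRAINT PK_StateList PRIMARY KEY
   ,StateName NVARCHAR(50) NOT NULL
);

CREATE TABLE dbo.ClientAddress
(
    ClientAddressID INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_ClientAddress PRIMARY KEY
   ,StreetAddress NVARCHAR(200) NOT NULL
   ,City NVARCHAR(100) NOT NULL
    /* Referential integrity on StateCode: nobody can sneak in ZZ. */
   ,StateCode CHAR(2) NOT NULL
        CONSTRAINT FK_ClientAddress_StateList
            FOREIGN KEY REFERENCES dbo.StateList (StateCode)
);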

Danger, Will Robinson!

One of the problems I’ve seen with careless use of surrogate keys is the duplication of natural keys. Quite often it’s overlooked that the natural key still needs to have a unique constraint. Without it, the reporting team ends up having to use MAX or DISTINCT to get the latest instance of the natural key, or SSIS packages are needed to clean up the duplicates. This can be compounded with many-to-many tables.

Many-to-many tables allow rows in two tables to be related to each other multiple times. An example can be seen in the car insurance industry. If you have multiple people on the same insurance policy and they are registered to drive multiple cars, then a many-to-many table would be created to capture that data.

If a surrogate key is used on the many-to-many table to provide uniqueness, and the natural key does not have a unique constraint, then duplicate natural key combinations can occur. This can be obscured if there is additional information in the table. Maybe the amount the car is insured for is also maintained in this table. Let’s take Victoria’s insurance as an example. If Victoria is in the table with her 1971 Corvette listed twice with two different insurance amounts, which one is the current one? The better pattern in this case would be to use the natural key.

TSQLTuesday72 Image2
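
A minimal sketch of the natural-key pattern for that linking table (illustrative names): the composite natural key is the primary key, so duplicate combinations simply can’t happen.

CREATE TABLE dbo.InsuredDriverCar
(
    DriverID INT NOT NULL
   ,CarID INT NOT NULL
   ,InsuredAmount MONEY NOT NULL
    /* The natural key is the primary key, so Victoria's Corvette can only appear once. */
   ,CONSTRAINT PK_InsuredDriverCar PRIMARY KEY (DriverID, CarID)
);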

Conclusion

Surrogate keys are very useful, but it should not be assumed that they belong on every table. The natural key should always be considered first. If the natural key is too complicated to be used as a foreign key in other tables, then a surrogate key is a good choice. Just remember to ALSO put a unique constraint on the natural key.

Thanks for all the fish

I had several people tell me on Twitter that they were going to write their first blog post for this T-SQL Tuesday blog party. I want to thank them ahead of time for taking the leap into the blogging world to share their experiences and expertise in their fields.

T-SQL Tuesday #72 Invitation – Data Modeling Gone Wrong

This month marks the 72nd T-SQL Tuesday. Adam Machanic (b|t) started the T-SQL Tuesday blog party in December of 2009. Each month an invitation is sent out on the first Tuesday of the month, inviting bloggers to participate in a common topic. On the second Tuesday of the month all the bloggers post their contributions to the event for everyone to read. The host sums up all the participants’ entries at the end of the week. This month I’m the host and the topic is …

Data Modeling Gone Wrong

The purpose of SQL Server is to keep databases safe and running as optimally as possible. The problem is, if the data model is flawed or not maintained, then no matter how well SQL Server is configured, the database won’t be able to function efficiently.

I would like to invite you to share some data modeling practices that should be avoided, and how to fix them when they do occur.

Rules for T-SQL Tuesday Blog Party

Rule 1: Make sure that you include the T-SQL Tuesday image at the top of the post, which will help identify your post as a T-SQL Tuesday blog post. Please include a link back to this invitation, too.

Rule 2: Publish your post sometime next Tuesday using GMT. Here’s a link to a GMT time converter. For example, in California, that would cover 5 pm Monday to 5 pm (PDT) Tuesday.

Rule 3: Come back here and post a link in the comments so that I can find all the posts for the round up.

Rule 4: Don’t get yourself fired. Make sure that you either generalize your post or get permission to blog about anything from work.

Rule 5: If you roam the Twitterverse, then don’t forget to Tweet about your blog post with the hashtag #tsql2sday.

Rule 6: Go read someone else’s blog on the subject!

Final Rule: Have fun!

T-SQL Tuesday #51- Don’t Crap Out While Betting On Table Functions

My good friend Jason Brimhall (b|t) is hosting this month’s T-SQL Tuesday blog party. The party was started by Adam Machanic (b|t) in December of 2009. As a complement to the upcoming debut of the Las Vegas SQL Saturday, Jason has taken up a betting theme. He wants to know our stories of when we bet it all on a risky solution and won or lost.

Instead of telling you about the past, I want to help you win big at the table today. I really don’t want you to crap out while betting on the wrong table functions.

Snake Eyes

There are two types of table functions: Multi-line Table Functions and In-Line Table Functions. There is a huge difference between the two of them.

Multi-line table functions sound great. You write as much code as you need in them and they return all the data in a table variable. This is where the weighted dice roll snake eyes every single time. You see, the statistics for a table variable always, always say there is only one row in the table being returned. It doesn’t matter if there are a hundred, a thousand, or a million rows. The statistics will say one. That means the optimizer has a good chance of losing when it picks the execution plan for that query.
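
To make the comparison concrete, here is a hedged sketch of the multi-line form, assuming a dbo.Tally table with an integer column N (the names are illustrative, not the exact ones from my demo):

CREATE FUNCTION dbo.fn_GetTally_MultiLine (@MaxValue INT)
RETURNS @result TABLE (N INT NOT NULL)
AS
BEGIN
    /* The rows are materialized into a table variable, so the optimizer
       estimates 1 row no matter how many actually come back. */
    INSERT INTO @result (N)
    SELECT N
    FROM dbo.Tally
    WHERE N <= @MaxValue;

    RETURN;
END;
GO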

Let’s Take a Look at the Bets

For my example, I have a simple query that returns 43 rows out of a Tally table. Notice that the estimated number of rows is 43, which is great, because that is exactly on the money!

TSQLTuesday51_1

 

If we put that same query inside of a multi-line table function, we get an estimated number of rows of 1 (snake eyes!).

 

TSQLTuesday51_2

 

TSQLTuesday51_3

 

Double Down

An in-line table function will return the same result set, but there are some limitations on its construction. The entire query within the in-line table function must be written as a single statement.

Note: You can get very creative with Common Table Expressions (CTE) if need be.

There are two benefits to using an in-line table function. One is that the Estimated Number of Rows will be accurate (or as accurate as the statistics on the table), and two, the “inside” of the in-line table function is not masked in the Execution Plan. It is plopped right into the middle of the calling query. (Yes, “plopped” is a technical term.)
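
Here is the same logic as an in-line table function, again assuming the illustrative dbo.Tally table from the sketch above:

CREATE FUNCTION dbo.fn_GetTally_InLine (@MaxValue INT)
RETURNS TABLE
AS
RETURN
(
    /* A single SELECT, so the optimizer can use the Tally table's
       statistics and inline the query into the calling plan. */
    SELECT N
    FROM dbo.Tally
    WHERE N <= @MaxValue
);
GO

/* Usage is the same for both forms. */
SELECT N FROM dbo.fn_GetTally_InLine(43);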

 

TSQLTuesday51_4

 

TSQLTuesday51_5 

Last Call

So remember to double down on in-line table functions, and don’t crap out on the snake eyes of the multi-line table function.

Thanks for all the fish

Thanks go out to Jason Brimhall for hosting this month’s T-SQL Tuesday blog party. Please visit his website at http://jasonbrimhall.info/, or better yet come to Las Vegas for their SQL Saturday and thank him in person.

SQL Bacon Bits No. 2 – Dropping Temporary SQL Objects

This is my second SQL Tidbit, and I have already decided to change the name to SQL Bacon Bits. Why? Because it’s my blog and I can. These quick posts are supposed to be simple and yummy, just like bacon.

Today’s featured product is ER/Studio Data Architect.

The Need

Whenever a change script is created from an ER/Studio Data Architect model, the original objects are kept with a date time stamp as part of their name. These original copies are then left for the developer to determine if they are still needed.

When I first came across these objects in my database, I was flummoxed. “Why would they leave such a mess?” I asked myself. But they didn’t leave a mess. They were actually being considerate. Sometimes the SQL objects have issues because of the changes being brought into the database. Since the original objects are kept as backup copies, you can go look at them before deleting them.

So how do you get rid of these objects once you have confirmed they are no longer needed?

Well, you create a script. This script is based on a version that my co-worker Chris Henry created. I’ve added to it and generalized it.

Let’s break it down

I found that ER/Studio Data Architect uses UTC for the timestamp on the temporary objects. (That drove me nuts until I figured it out!) By using the GETUTCDATE() function, you’ll be able to programmatically build the date pattern that filters the SQL objects you want to get rid of.

DECLARE @Date AS varchar(10)

SELECT
    @date = '%' + CONVERT(varchar(2), MONTH(GETUTCDATE())) 
				+ ( CASE 
						WHEN DAY(GETUTCDATE()) < 10 THEN '0'
                        ELSE ''
                        END ) 
				+ CONVERT(varchar(2), DAY(GETUTCDATE())) 
				+ CONVERT(varchar(4), YEAR(GETUTCDATE())) + '%'

 

I then use UNION ALL to join the different types of objects with the proper DROP syntax.

SELECT
    'ALTER TABLE [' + s.name + '].[' + p.name + '] DROP CONSTRAINT  [' + o.name + ']' AS ItemToDrop
	,o.type

FROM
    sys.objects AS o
	JOIN sys.objects AS p ON o.parent_object_id = p.object_id
	JOIN sys.schemas AS s ON p.schema_id = s.schema_id
WHERE
    o.name LIKE @date
    AND o.type = 'F'

UNION ALL

SELECT
    'Drop Trigger [' + s.name + '].[' + o.name + ']' AS ItemToDrop
	,o.type
FROM
    sys.objects AS o
	JOIN sys.objects AS p ON o.parent_object_id = p.object_id
	JOIN sys.schemas AS s ON p.schema_id = s.schema_id
WHERE
    o.name LIKE @date
    AND o.type = 'TR'

UNION ALL

SELECT
    'drop table [' + s.name + '].[' + o.name + ']' AS ItemToDrop
	,o.type
FROM
    sys.objects AS o
	JOIN sys.schemas AS s ON o.schema_id = s.schema_id
WHERE
    o.name LIKE @date
    AND o.type  ='U'

UNION ALL

SELECT
    'drop proc [' + s.name + '].[' + o.name + ']' AS ItemToDrop
	,o.type
FROM
    sys.objects AS o
	JOIN sys.schemas AS s ON o.schema_id = s.schema_id
WHERE
    o.name LIKE @date
    AND o.type  ='P'
ORDER BY
	o.type

 

After it executes, I copy the statements out of the results pane and execute them.

Bonus Tip

Since this is a script you’ll use again and again…and again, you’ll want to turn it into a template. Two of the many ways to do that are as follows.

  • SSMS comes with a Template Browser. You can save your script in the Template Browser for future use. Each time you need it, simply double-click on the script in the Template Browser and a copy will be created for you. By adding the following code at the top of the script, you can use Ctrl+M to pick which database you want to use.
USE <DatabaseName, string,>
GO

 

  • If you are a SQL Prompt addict like me, then you can add your script as a snippet. Instead of adding the code I just showed you, add the following code and a default database name will appear when you use Ctrl+M.
USE <DatabaseName, string,$DBNAME$>
GO

You can download the full script from here.

SQL Bacon Bits

SQL Tidbits: No.1– Outputting from ER/Studio Data Architect Directly to SQL Server Management Studio

 

Dev Connections – Demos and Slides Are Available

Thanks to all the Dev Connection attendees who came to my class. I have posted the slides, demos, and demo database on my Resources page.

T-SQL Tuesday #43: Give Me a Key Lookup Operator for $1200, Please

My good friend Rob Farley (b|t) is hosting this month’s T-SQL Tuesday blog party. The party was started by Adam Machanic (b|t) in December of 2009. The topic this month is on Plan Operators that are used in execution plans to tell us how the Optimizer is going to run our query statements.

What is a Key Lookup Operator?

The Optimizer uses indexes to retrieve the fields needed from a table. If fields are missing from the index being used, then the Optimizer has to go back to the Clustered Index to get the other fields. This has to be done for every row returned by the index. This action is noted in the execution plan by the Key Lookup Operator. The RID Lookup Operator appears instead of the Key Lookup Operator when the table is a Heap rather than having a Clustered Index.

Show Me the Money

For my example I used the AdventureWorks2008R2 database. I ran the following query and looked at the execution plan.

SELECT
	c.CustomerID
   ,c.PersonID
   ,c.StoreID
   ,c.TerritoryID
   ,c.AccountNumber
   ,s.BusinessEntityID AS StoreID
   ,s.Name AS StoreName
   ,s.SalesPersonID
   ,st.Name AS TerritoryName
   ,st.CountryRegionCode
   ,st.[Group]
   ,st.SalesYTD
   ,st.SalesLastYear
   ,st.CostYTD
   ,st.CostLastYear
FROM
	Sales.Customer AS c
	JOIN Sales.Store AS s ON c.StoreID = s.BusinessEntityID
	JOIN Sales.SalesTerritory AS st ON c.TerritoryID = st.TerritoryID
WHERE
	c.StoreID IN (1284, 994, 1356, 1282, 992, 1358, 1280)

 

TSQLTuesday43 - KeyLookup

Two indexes were used to retrieve all the fields needed from the Sales.Customer table. The ix_Customer_StoreID index was missing the TerritoryID field, so the Optimizer had to go to the PK_Customer_CustomerID Clustered Index to retrieve it. If you add the cost of both operators, then 66% of the cost of the query was used to retrieve fields from the Sales.Customer table.

For reference I removed the Clustered Index to show you what the execution plan would look like when a Heap is involved.

TSQLTuesday43 - RIDLookup

 

Since there was already a good index for the Customer table, I added the TerritoryID to the INCLUDE part of the index script. This turned the index into a covering index. A covering index is an index that contains all the fields from a table that are needed by a query statement. The INCLUDE part of an index allows extra fields to be part of the index without the overhead of the data being sorted. Any fields that are part of predicates or filters should be part of the index key; all other fields from the table that the query needs should go in the INCLUDE. Be cautious, though: don’t throw the whole kitchen sink in there. Those fields still take up space.
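
The actual index script is in the screenshot below; as a hedged sketch, the change looks roughly like this (the INCLUDE column list here is an assumption based on the query above):

CREATE NONCLUSTERED INDEX ix_Customer_StoreID
ON Sales.Customer (StoreID)
/* TerritoryID added so the index covers the query above. */
INCLUDE (PersonID, AccountNumber, TerritoryID)
WITH (DROP_EXISTING = ON);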

TSQLTuesday43 - Index

When I ran the execution plan again, I saw that the Key Lookup Operator was removed and the total cost to retrieve the fields from the Customer table was now reduced to 12%. This would also be true if the table was using a Heap instead of a Clustered Index.

TSQLTuesday43 - CoveringIndex

The Bottom Line

When you see a Key Lookup Operator or a RID Lookup Operator, look to see if it makes sense to modify the corresponding index to be a covering index.

Thanks for all the fish

Thanks go out to Rob Farley for hosting this month’s T-SQL Tuesday blog party. Please visit Rob’s blog at http://sqlblog.com/blogs/rob_farley/default.aspx.

Shameless Plug for Grant Fritchey’s (b|t) Book: SQL Server Execution Plans, Second Edition

Yup, this is a shameless plug for one of my FAVORITE books. Grant did an outstanding job on this book. He goes through the majority of the execution plan operators, discussing what they represent. He also goes into how best to read an execution plan and how changing your query can affect the operators that are shown in the execution plan. I highly recommend adding this book to your collection if you don’t already have it.

This book is available from Red Gate. You can download a FREE eBook, buy a hard copy from Amazon, or if you are lucky you can pick up a copy from a Red Gate event.

 

Using Set Theory Instead of ISNULL To Filter Data

I have written dozens upon dozens of reports for my users. I love the challenge of finding the data and wrangling it into a report in a way that doesn’t take a hundred years to run when they decide to return a year’s worth of data. One of the common requests that I get is to provide a parameter that will allow them to choose a single client or all clients. At first blush this is a very simple request and can be accomplished by using the ISNULL function.

ISNULL - ISNULL Code

Unfortunately, there are performance implications to using the ISNULL function. The ISNULL function is non-sargable. This means that when the function is used as part of a predicate, the query can’t use the underlying index in an optimal way to filter the data. In this example, the Execution Plan shows that a full index scan was used to return one row. This equated to 80 Logical Reads for my dataset. It is the same number of Logical Reads whether one row is returned or all rows are returned. Eighty Logical Reads may not be that big a deal, but what about on a table that has hundreds of thousands of rows?
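
The actual report query is shown in the screenshots; here is a representative sketch of the pattern with illustrative table and column names. Because the predicate wraps the parameter and falls back to the column itself, the optimizer builds one plan that must work for both cases and ends up scanning the index.

DECLARE @ClientID INT = 7890; -- NULL means "all clients"

SELECT
    OrderID
   ,ClientID
   ,OrderDate
FROM
    dbo.Orders
WHERE
    ClientID = ISNULL(@ClientID, ClientID); -- non-sargable: scans the whole index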

ISNULL - ISNULL Execution Plan
Execution Plan

ISNULL - ISNULL Reads 1
Statistics on the Table I/O Tab in SQL Sentry Plan Explorer Pro

 

An Alternate Universe

There is an alternate way of accomplishing this same request, and it uses set theory to solve the problem. I can accomplish the same request by separating the ISNULL logic into two different SQL statements and using UNION ALL to combine the two result sets into one. Even though I’m using two different SQL statements, only one will return data during the execution of the logic.

ISNULL - Union All Code

The first SELECT statement returns rows when the @ClientID variable has a value other than NULL. The second SELECT statement returns rows when @ClientID is NULL. By using UNION ALL instead of UNION we forgo the task of checking if there are duplicate rows in the second SELECT statement.
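
Using the same illustrative names as the sketch above, the set-based version looks roughly like this; each branch has a simple, sargable predicate, and only one branch returns rows for any given @ClientID.

DECLARE @ClientID INT = 7890; -- NULL means "all clients"

SELECT OrderID, ClientID, OrderDate
FROM dbo.Orders
WHERE ClientID = @ClientID -- returns rows only when @ClientID has a value

UNION ALL

SELECT OrderID, ClientID, OrderDate
FROM dbo.Orders
WHERE @ClientID IS NULL; -- returns rows only when @ClientID is NULL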

The Execution Plan now contains the ability to handle both cases, and the Logical Reads change based on the value of @ClientID. There are still 80 Logical Reads when @ClientID is NULL, because the whole table is being returned, but the Logical Reads are reduced to two when @ClientID is set to a single client number.

 

ISNULL - Union All Execution Plan 1
Execution Plan when @ClientID =7890

ISNULL - Union All Reads 1
Statistics when @ClientID = 7890

ISNULL - Union All Execution Plan 2
Execution plan when @ClientID IS NULL

ISNULL - Union All Reads 2
Statistics when @ClientID IS NULL

 

In this example I was able to use UNION ALL to filter on one client or all clients, but it isn’t always the best solution. If the table being filtered fits on one page, then a table scan will occur regardless of how the predicate is set up. In that case, the ISNULL statement is easier to maintain, so it would be the better solution.

Conclusion

It is always best to take a look at the Execution Plan of a SQL statement to see if there are any Index Scans on filtered data. If there are, you should take a look to see if there is another, set-based approach to solve the problem.

T-SQL Tuesday #37 – RIGHT JOIN, LEFT JOIN, raw, raw, raw

This month’s T-SQL Tuesday blog party is being hosted by Sebastian Meine, PhD (blog). The topic is on JOINS. I’ve chosen to blog about how to rewrite a RIGHT JOIN I found this week in a stored procedure.

The query was simple, but it was difficult to read without creating a data model and understanding the relationships of the tables. I decided to rewrite it to make it easier to maintain in the future. I have created a sample data model to demonstrate the problem and the solution. The data model has a table for Sales Reps, a table for Regions which the Sales Reps belong to (optionally), a list of Clients, and a linking table which links the Clients and the Sales Reps.

The query that was created returns all the Clients and any associated Sales Reps with their regions, even if the Sales Rep is not assigned to a region. The original programmer used a RIGHT JOIN to join the SalesRep table to the ClientSalesRep table. They also put the predicate for the SalesRegion table after the SalesRep table. While the Optimizer has no problem reading this query, I had to stand on my head to figure it out.

SELECT
   c.ClientID
   ,c.ClientName
   ,sr.Region
   ,srep.FirstName
   ,srep.LastName
FROM
   dbo.Client AS c
   LEFT JOIN dbo.ClientSalesRep AS cus ON c.ClientID = cus.ClientID
   LEFT JOIN dbo.SalesRegion AS sr
   RIGHT JOIN dbo.SalesRep AS srep ON srep.SalesRegionID = sr.SalesRegionID
										ON cus.SalesRepID = srep.SalesRepID
GO

I rewrote the query using only LEFT JOINs, and each table has its own predicate right next to it. I found the LEFT JOINs made the query easier to read and didn’t give me a headache.

SELECT
   c.ClientID
   ,c.ClientName
   ,sr.Region
   ,srep.FirstName
   ,srep.LastName
FROM
   dbo.Client AS c
   LEFT JOIN dbo.ClientSalesRep AS cus ON c.ClientID = cus.ClientID
   LEFT JOIN dbo.SalesRep AS srep ON cus.SalesRepID = srep.SalesRepID
   LEFT JOIN dbo.SalesRegion AS sr ON srep.SalesRegionID = sr.SalesRegionID

GO

I populated the tables with a million rows to see if the Optimizer would treat these queries differently. It didn’t. They had the same query plan, the same number of reads, and the same statistics, but the rewritten query was easier to read.
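
If you want to run the same comparison yourself, a simple (hedged) way to do it is to capture the actual execution plans along with I/O and time statistics for both versions:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run the original RIGHT JOIN query here, then the LEFT JOIN rewrite,
-- and compare the logical reads and elapsed times in the Messages tab,
-- along with the actual execution plans.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;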

Thanks go to Sebastian for hosting T-SQL Tuesday this month. Check out Sebastian’s blog, because he is blogging about JOINS all month.
