Mickey's T-SQL Ponderings

Sharing my tidbits of knowledge on T-SQL, SSRS, and whatever catches my fancy.

Using Set Theory Instead of ISNULL To Filter Data


I have written dozens upon dozens of reports for my users. I love the challenge of finding the data and wrangling it into a report in a way that doesn’t take a hundred years to run when they decide to return a year’s worth of data. One of the common requests that I get is to provide a parameter that will allow them to choose a single client or all clients. At first blush this is a very simple request and can be accomplished by using the ISNULL function.

ISNULL - ISNULL Code
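In sketch form, the ISNULL pattern being described looks like this (dbo.Client and its columns are illustrative stand-ins, not the tables from the original report):

```sql
DECLARE @ClientID AS int = 7890; --NULL would return all clients

SELECT
     ClientID
    ,ClientName
FROM
    dbo.Client --illustrative table name
WHERE
    ClientID = ISNULL(@ClientID, ClientID);
```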

Unfortunately, there are performance implications to using the ISNULL function: it is non-SARGable. This means that when the function is used as part of a predicate, the optimizer can't use the available index in an optimal way to filter the data. In this example, the Execution Plan shows that a full index scan was used to return one row. That equated to 80 Logical Reads for my dataset, and the number is the same whether one row is returned or all rows are returned. Eighty Logical Reads may not be that big a deal, but what about on a table that has hundreds of thousands of rows?

ISNULL - ISNULL Execution Plan
Execution Plan

ISNULL - ISNULL Reads 1
Statistics on the Table I/O Tab in SQL Sentry Plan Explorer Pro

 

An Alternate Universe

There is an alternate way of accomplishing this same request, and it uses Set Theory to solve the problem. I can separate the ISNULL logic into two different SQL statements and use UNION ALL to combine the two result sets into one. Even though there are two SELECT statements, only one of them will return data during any given execution.

ISNULL - Union All Code
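Using the same illustrative names as above, the UNION ALL rewrite looks roughly like this:

```sql
DECLARE @ClientID AS int = 7890;

--Returns rows only when a specific client was requested.
SELECT
     ClientID
    ,ClientName
FROM
    dbo.Client
WHERE
    @ClientID IS NOT NULL
    AND ClientID = @ClientID

UNION ALL

--Returns rows only when no client was specified.
SELECT
     ClientID
    ,ClientName
FROM
    dbo.Client
WHERE
    @ClientID IS NULL;
```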

The first SELECT statement returns rows when the @ClientID variable has a value other than NULL. The second SELECT statement returns rows when @ClientID is NULL. By using UNION ALL instead of UNION, we forgo the work of checking for duplicate rows between the two result sets.

The Execution Plan can now handle both cases, and the number of Logical Reads changes based on the value of @ClientID. There are still 80 Logical Reads when @ClientID is NULL, because the whole table is being returned, but the Logical Reads drop to two when @ClientID is set to a single client number.

 

ISNULL - Union All Execution Plan 1
Execution Plan when @ClientID = 7890

ISNULL - Union All Reads 1
Statistics when @ClientID = 7890

ISNULL - Union All Execution Plan 2
Execution plan when @ClientID IS NULL

ISNULL - Union All Reads 2
Statistics when @ClientID IS NULL

 

In this example I was able to use UNION ALL to filter on one client or all clients, but it isn't always the best solution. If the table being filtered fits on one page, then a table scan will occur regardless of how the predicate is set up. In that case, the ISNULL statement is easier to maintain, so it would be the better solution.

Conclusion

It is always best to take a look at the Execution Plan of a SQL statement to see if there are any Index Scans on filtered data. If there are, take a look to see whether there is another, set-based approach to solve the problem.

Author: Mickey Stuewe

I love all things T-SQL. I'm a database developer, BI developer, and a craftswoman. I love watching action movies and reading fantasy books. I'm a wife and a mother of two beautiful daughters. I love laughing and I laugh often. Just ask my family.

7 Comments

  1. Great post Mickey! I’m a fan of breaking up certain queries using the UNION ALL, especially when ORs in the WHERE clause cause scans instead of seeks. This is another great example of that!

  2. I have a reporting project where the users can choose to filter on multiple field parameters. The filter can be a specific field, or “ANY” to not filter on that particular column.

    In a nutshell, my T-SQL to pull data is simplified down to what’s pasted below. In reality, there are multiple joins involved and the filter fields aren’t necessarily in the same tables.

    My question is whether the following T-SQL might be considered SARGable, or if there’s a preferred best practice that might work better in this case. Some of these queries are in fairly long stored procedures so dynamic SQL isn’t necessarily a good option.

    Thanks for posting your blog,

    Andre Ranieri

    SELECT somefielda, somefieldb, somefieldc, somefieldd
    FROM SomeTable
    WHERE somefielda = (CASE WHEN @filtera = 'ANY' THEN somefielda ELSE @filtera END) AND
    somefieldb = (CASE WHEN @filterb = 'ANY' THEN somefieldb ELSE @filterb END) AND
    somefieldc = (CASE WHEN @filterc = 'ANY' THEN somefieldc ELSE @filterc END) AND
    somefieldd = (CASE WHEN @filterd = 'ANY' THEN somefieldd ELSE @filterd END)

    • hi Andre and Tahir,

      You both have great questions and I want to address them at the same time.

      The OR statement is a non-SARGable operation, which means the Optimizer will not be able to make the best use of the available indexes. The CASE statements are SARGable and performed as well as the ISNULL pattern in my tests. The reason the UNION ALL pattern I wrote about provides a better execution plan is that it handles both scenarios (one or all). What we are really solving for is a form of Parameter Sniffing. Parameter Sniffing occurs when the Optimizer tries to find the best execution plan for a query by looking at the parameters to see what kind of value was passed in, and then selecting the indexes it wants to use. This solution targets the pattern of filtering on a particular value or bringing all values back, and each of those cases usually uses an index differently. When all the data is brought back, the Optimizer will use an Index Scan, because everything is coming back. When one value is requested, the Optimizer tries to do an Index Seek, which targets only part of the index. Note: this depends on the distribution of the data in the statistics for the table. An Index Scan might still be used.

      With this in mind, Adam Machanic sent me a tweet regarding my post, suggesting that OPTION (RECOMPILE) be added to the SQL statement. This will cause only the one SQL statement to be recompiled each time the query is run. Be cautious. OPTION (RECOMPILE) is not to be used on every query. Make sure it meets your needs.
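
      Appended to the ISNULL form of the query (illustrative names again), the hint looks like this:

      ```sql
      SELECT
           ClientID
          ,ClientName
      FROM
          dbo.Client
      WHERE
          ClientID = ISNULL(@ClientID, ClientID)
      OPTION (RECOMPILE);
      ```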

      The other question that you both had, was around stored procedures that have several parameters. This is where you need to do some testing. Sometimes the best thing to do is to use the ISNULL pattern or the CASE pattern. You don’t need to add complexity if the table being filtered fits on a page because the Optimizer will perform an Index Scan regardless. Other times multiple CTEs will be beneficial.

      Below I created an example from the AdventureWorks2008R2 database (note: I added a new column called RandomNumber and populated it). The example has 3 parameters on a single table. I was going to show more, but the other fields in this table were NULLable, and that introduces a whole other discussion. What I did was break up the SQL into multiple UNION ALL statements, each with its own parameter. This allowed me to stay away from all the permutations the parameters could be passed in as, and it allowed the Optimizer to pick the best execution plan. If you look at the execution plan for this query, you'll find that multiple indexes are used, most of which are Index Seeks. The other patterns (OR, ISNULL, and CASE) were using Index Scans.

      DECLARE
           @CustomerID AS int --= 25984
          ,@AccountNumber AS varchar(10)
          ,@RandomNumber AS int = 18490022

      ;WITH cte_customer
      AS
      (
          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              Sales.Customer
          WHERE
              @CustomerID IS NOT NULL
              AND CustomerID = @CustomerID

          UNION ALL

          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              Sales.Customer
          WHERE
              @CustomerID IS NULL
      )
      ,cte_AccountNumber
      AS
      (
          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              cte_customer
          WHERE
              @AccountNumber IS NOT NULL
              AND AccountNumber = @AccountNumber

          UNION ALL

          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              cte_customer
          WHERE
              @AccountNumber IS NULL
      )
      SELECT
           CustomerID
          ,AccountNumber
          ,RandomNumber
      FROM
          cte_AccountNumber
      WHERE
          @RandomNumber IS NOT NULL
          AND RandomNumber = @RandomNumber

      UNION ALL

      SELECT
           CustomerID
          ,AccountNumber
          ,RandomNumber
      FROM
          cte_AccountNumber
      WHERE
          @RandomNumber IS NULL

      –Mickey

  3. Thanks Mickey,

    I have been using in many queries the syntax like this:

    WHERE (@param IS NULL OR @param = columnA)

    There are 5 or 6 parameters, sometimes more than that. What do you suggest? Shall we break our query up with UNION ALL?

  4. I have been testing with dynamic SQL inside stored procedures where there are multiple parameters which can be either NULL (no filter) or a set filter value. I’ve had success in this situation because the column’s not even included in the WHERE clause when there’s no filter value for the parameter.
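
     A minimal sketch of that approach (all of the names here are illustrative), building only the predicates that are needed and keeping the statement parameterized through sp_executesql:

     ```sql
     DECLARE @sql AS nvarchar(max) = N'SELECT ClientID, ClientName FROM dbo.Client WHERE 1 = 1';

     --Each parameter only adds its predicate when a filter value was supplied.
     IF @ClientID IS NOT NULL
         SET @sql += N' AND ClientID = @ClientID';
     IF @Region IS NOT NULL
         SET @sql += N' AND Region = @Region';

     EXEC sp_executesql
          @sql
         ,N'@ClientID int, @Region varchar(10)'
         ,@ClientID = @ClientID
         ,@Region = @Region;
     ```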

    • Yes, that is another alternative method. The downside is that there will be a separate plan in the cache for every dynamic SQL statement generated. Plan reuse will go down, and the number of plans in cache for that stored procedure will go up. That is not necessarily a bad thing, but it can affect the life of the plans, since plans are removed from cache when there are too many plans in memory. (When there is plenty of room for the plans in memory, they age out over time.)

Leave a Reply