Using Set Theory Instead of ISNULL To Filter Data

I have written dozens upon dozens of reports for my users. I love the challenge of finding the data and wrangling it into a report in a way that doesn’t take a hundred years to run when they decide to return a year’s worth of data. One of the common requests I get is to provide a parameter that allows them to choose a single client or all clients. At first blush, this is a very simple request, and it can be accomplished by using the ISNULL function.

[Image: query using the ISNULL function]
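A minimal sketch of that ISNULL approach, assuming a hypothetical dbo.Clients table and an @ClientID report parameter (the object names are placeholders for illustration, not from the original screenshot):

DECLARE @ClientID AS int = 7890;   -- NULL would mean "return all clients"

SELECT
     ClientID
    ,ClientName
FROM
    dbo.Clients   -- placeholder table name
WHERE
    ClientID = ISNULL(@ClientID, ClientID);

When @ClientID is NULL, the predicate compares each ClientID to itself, so every row qualifies.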

Unfortunately, there are performance implications to using the ISNULL function this way. The ISNULL function is non-SARGable, which means that when it is used as part of a predicate, the Optimizer can’t use the relevant index in an optimal way to filter the data. In this example, the Execution Plan shows that a full index scan was used to return one row. That equated to 80 Logical Reads for my dataset, and it is the same number of Logical Reads whether one row is returned or all rows are returned. Eighty Logical Reads may not be that big a deal, but what about on a table that has hundreds of thousands of rows?

[Image: Execution Plan]

[Image: Statistics on the Table I/O tab in SQL Sentry Plan Explorer Pro]
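The screenshots above come from SQL Sentry Plan Explorer Pro; if you want to reproduce the Logical Reads counts yourself, STATISTICS IO in Management Studio reports comparable numbers (again using the placeholder table from the sketch above):

SET STATISTICS IO ON;

DECLARE @ClientID AS int = 7890;

SELECT
     ClientID
    ,ClientName
FROM
    dbo.Clients   -- placeholder table name
WHERE
    ClientID = ISNULL(@ClientID, ClientID);

SET STATISTICS IO OFF;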

 

An Alternate Universe

There is an alternate way of accomplishing this same request, and it uses Set Theory to solve the problem. I can separate the ISNULL logic into two different SQL statements and use UNION ALL to combine the two result sets into one. Even though I’m using two different SQL statements, only one of them will return data during any given execution.

[Image: query rewritten with UNION ALL]
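A hedged sketch of the UNION ALL rewrite, using the same placeholder table and parameter as before:

DECLARE @ClientID AS int = 7890;   -- NULL would mean "return all clients"

SELECT
     ClientID
    ,ClientName
FROM
    dbo.Clients   -- placeholder table name
WHERE
    @ClientID IS NOT NULL
    AND ClientID = @ClientID

UNION ALL

SELECT
     ClientID
    ,ClientName
FROM
    dbo.Clients
WHERE
    @ClientID IS NULL;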

The first SELECT statement returns rows when the @ClientID variable has a value other than NULL. The second SELECT statement returns rows when @ClientID is NULL. By using UNION ALL instead of UNION, we skip the extra work of checking the combined result set for duplicate rows.

The Execution Plan now handles both cases, and the number of Logical Reads changes based on the value of @ClientID. There are still 80 Logical Reads when @ClientID is NULL, because the whole table is being returned, but the Logical Reads drop to two when @ClientID is set to a single client number.

 

[Image: Execution Plan when @ClientID = 7890]

[Image: Statistics when @ClientID = 7890]

[Image: Execution Plan when @ClientID IS NULL]

[Image: Statistics when @ClientID IS NULL]

 

In this example I was able to use UNION ALL to filter on one client or all clients, but it isn’t always the best solution. If the table being filtered fits on one page, then a table scan will occur regardless of how the predicate is set up. In that case, the ISNULL statement is easier to maintain, so it would be the better solution.

Conclusion

It is always best to look at the Execution Plan of a SQL statement to see if there are any Index Scans on filtered data. If there are, check whether a different, set-based approach can solve the problem.

7 comments

  1. Andre Ranieri says:

    I have been testing with dynamic SQL inside stored procedures where there are multiple parameters which can be either NULL (no filter) or a set filter value. I’ve had success in this situation because the column’s not even included in the WHERE clause when there’s no filter value for the parameter.
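    A rough sketch of the kind of conditional dynamic SQL being described here, with hypothetical object and parameter names (dbo.Clients, @ClientID, and the rest are placeholders for illustration):

    DECLARE @ClientID AS int = 7890;   -- NULL would mean no client filter
    DECLARE @sql AS nvarchar(max) =
        N'SELECT ClientID, ClientName FROM dbo.Clients WHERE 1 = 1';

    -- The predicate is only appended when the parameter actually has a value,
    -- so the column never appears in the WHERE clause for unused filters.
    IF @ClientID IS NOT NULL
        SET @sql += N' AND ClientID = @ClientID';

    EXEC sys.sp_executesql
         @sql
        ,N'@ClientID int'
        ,@ClientID = @ClientID;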

    • SQLMickey says:

      Yes, that is another alternative method. The downside is that there will be a separate plan in the cache for every dynamic SQL statement that is generated. Plan reuse will go down, and the number of plans in cache for that stored procedure will go up. That is not necessarily a bad thing, but it can affect the life of each plan, since plans are removed from cache when there are too many plans in memory. (When there is plenty of room for the plans in memory, they age out over time.)
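      One rough way to watch that happen is to count the cached plans whose text references the procedure (usp_ClientReport is a placeholder name):

      SELECT
          COUNT(*) AS CachedPlanCount
      FROM
          sys.dm_exec_cached_plans AS cp
          CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
      WHERE
          st.text LIKE N'%usp_ClientReport%';   -- placeholder procedure name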

  2. Tahir says:

    Thanks Mickey,

    I have been using syntax like this in many queries:

    WHERE (@param IS NULL OR @param = columnA)

    There are 5 or 6 parameters, sometimes more than that. What do you suggest? Should we break our query into UNION ALL statements?

  3. Andre Ranieri says:

    I have a reporting project where the users can choose to filter on multiple field parameters. The filter can be a specific field, or “ANY” to not filter on that particular column.

    In a nutshell, my T-SQL to pull data is simplified down to what’s pasted below. In reality, there are multiple joins involved and the filter fields aren’t necessarily in the same tables.

    My question is whether the following T-SQL might be considered SARGable, or if there’s a preferred best practice that might work better in this case. Some of these queries are in fairly long stored procedures so dynamic SQL isn’t necessarily a good option.

    Thanks for posting your blog,

    Andre Ranieri

    SELECT somefielda, somefieldb, somefieldc, somefieldd
    FROM SomeTable
    WHERE somefielda = (CASE WHEN @filtera = 'ANY' THEN somefielda ELSE @filtera END) AND
    somefieldb = (CASE WHEN @filterb = 'ANY' THEN somefieldb ELSE @filterb END) AND
    somefieldc = (CASE WHEN @filterc = 'ANY' THEN somefieldc ELSE @filterc END) AND
    somefieldd = (CASE WHEN @filterd = 'ANY' THEN somefieldd ELSE @filterd END)

    • SQLMickey says:

      Hi Andre and Tahir,

      You both have great questions and I want to address them at the same time.

      The OR statement is a non-SARGable operation, which means the Optimizer will not be able to make the best use of the available indexes. The CASE statements are SARGable and performed as well as the ISNULL pattern in my tests. The reason the UNION ALL pattern I wrote about provides a better execution plan is that it handles both scenarios (one or all). What we are really solving for is a form of Parameter Sniffing. Parameter Sniffing occurs when the Optimizer tries to find the best execution plan for the query by looking at the parameters to see what kind of value was passed in, and then selecting the indexes it wants to use.

      This solution targets the pattern of filtering on one particular value or bringing all values back, and those two cases usually use an index differently. When all the data is brought back, the Optimizer will use an Index Scan, because everything is coming back. When one value is requested, the Optimizer tries to do an Index Seek, which targets only part of the index. Note: this depends on the distribution of the data in the statistics for the table, so an Index Scan might still be used.

      With this in mind, Adam Machanic sent me a tweet regarding my post, suggesting that OPTION (RECOMPILE) be added to the SQL statement. This causes just that one SQL statement to be recompiled each time the query is run. Be cautious, though: OPTION (RECOMPILE) is not meant to be used on every query, so make sure it meets your needs.
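      As a rough illustration only, using the same placeholder table as the sketches in the post, the hint goes at the end of the statement:

      DECLARE @ClientID AS int = 7890;

      SELECT
           ClientID
          ,ClientName
      FROM
          dbo.Clients   -- placeholder table name
      WHERE
          ClientID = ISNULL(@ClientID, ClientID)
      OPTION (RECOMPILE);   -- the statement is recompiled with the actual parameter value on each run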

      The other question that you both had was around stored procedures that have several parameters. This is where you need to do some testing. Sometimes the best thing to do is to use the ISNULL pattern or the CASE pattern; you don’t need to add complexity if the table being filtered fits on a page, because the Optimizer will perform an Index Scan regardless. Other times, multiple CTEs will be beneficial.

      Below I created an example from the AdventureWorks2008R2 database. (Note: I added a new column called RandomNumber and populated it.) The example has three parameters on a single table. I was going to show more, but the other fields in this table are NULLABLE, and that introduces a whole other discussion. What I did was break the SQL up into multiple UNION ALL statements, each with its own parameter. This let me avoid handling every permutation of how the parameters could be passed in, and it allowed the Optimizer to pick the best execution plan. If you look at the execution plan for this query, you’ll find that multiple indexes are used, most of them with Index Seeks. The other patterns (OR, ISNULL, and CASE) were using Index Scans.

      -- Each CTE applies one optional filter: the first branch of the UNION ALL
      -- runs when the parameter has a value, the second branch runs when it is NULL.
      DECLARE
           @CustomerID    AS int          --= 25984
          ,@AccountNumber AS varchar(10)
          ,@RandomNumber  AS int = 18490022;

      WITH cte_customer
      AS
      (
          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              sales.customer
          WHERE
              @CustomerID IS NOT NULL
              AND CustomerID = @CustomerID

          UNION ALL

          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              sales.customer
          WHERE
              @CustomerID IS NULL
      )
      ,cte_AccountNumber
      AS
      (
          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              cte_customer
          WHERE
              @AccountNumber IS NOT NULL
              AND AccountNumber = @AccountNumber

          UNION ALL

          SELECT
               CustomerID
              ,AccountNumber
              ,RandomNumber
          FROM
              cte_customer
          WHERE
              @AccountNumber IS NULL
      )
      -- The final SELECT reads from the last CTE so that all three filters are applied.
      SELECT
           CustomerID
          ,AccountNumber
          ,RandomNumber
      FROM
          cte_AccountNumber
      WHERE
          @RandomNumber IS NOT NULL
          AND RandomNumber = @RandomNumber

      UNION ALL

      SELECT
           CustomerID
          ,AccountNumber
          ,RandomNumber
      FROM
          cte_AccountNumber
      WHERE
          @RandomNumber IS NULL;

      –Mickey

  4. SQL3D says:

    Great post Mickey! I’m a fan of breaking up certain queries using UNION ALL, especially when ORs in the WHERE clause cause scans instead of seeks. This is another great example of that!
