Functions | Mickey's T-SQL Ponderings

Tag Archive for Functions

SQLMickey | April 14, 2014

A Date At The End of The Month

Four SQL professionals gathered from the four corners of the world to share their SQL knowledge with each other and with their readers: Mickey Stuewe from California, USA, Chris Yates from Kentucky, USA, Julie Koesmarno from Canberra, Australia, and Jeffrey Verheul from Rotterdam, The Netherlands. They invite you to join them on their quest as they ask each other questions and seek out the answers in this collaborative blog series. Along the way, they will also include other SQL professionals to join in the collaboration.

The Second Mission

This month our team is tackling windows functions. My particular take on the subject is near and dear to my reporting heart. How to get date ranges based on full months.

End of the Month

Dates tend to be a critical component of most metrics. They define the boundaries of the data. They create groups of data that can help with trending patterns and forecasting. The downside has always been the slicing and dicing of the date. For instance, if you need to group all the data by the last day of the month. You need to create an equation like this to determine the last day of the month:

DATEADD(day, -1, CONVERT(date, CONVERT(varchar(2), DATEPART(MONTH, DATEADD(month, 1, SomeDate))) + ‘-01-‘ + CONVERT(varchar(4), DATEPART(year, DATEADD(month, 1, SomeDate)))))

Or this:

DATEADD(month,1,DATEADD(DAY,-DATEPART(DAY,GETDATE()) – 1,GETDATE()))

In SQL2012 the EOMONTH() windows function was introduced. Now you just need to type this:

EOMONTH(SomeDate)

Performance Perks

Not only are you able to write a simpler equation that is easier to maintain and understand, but you get some performance perks too.

I ran a query using both the “old way” and the “windows function way” against a table I created with a million rows of data. The data set returns 561 rows of aggregated data. I aggregated all the data by using the last day of the month. I then looked at the execution plans and the statistics for these queries.

SELECT
     DATEADD(day, -1, CONVERT(date, CONVERT(varchar(2), DATEPART(MONTH, DATEADD(month, 1, SomeDate))) + '-01-' + 
          CONVERT(varchar(4), DATEPART(year, DATEADD(month, 1, SomeDate)))))
     ,COUNT(SomeDate)
FROM
     DemoProgramming.dbo.TestData AS td
GROUP BY
     DATEADD(day, -1, CONVERT(date, CONVERT(varchar(2), DATEPART(MONTH, DATEADD(month, 1, SomeDate))) + '-01-' + 
          CONVERT(varchar(4), DATEPART(year, DATEADD(month, 1, SomeDate)))))
ORDER BY
     DATEADD(day, -1, CONVERT(date, CONVERT(varchar(2), DATEPART(MONTH, DATEADD(month, 1, SomeDate))) + '-01-' + 
          CONVERT(varchar(4), DATEPART(year, DATEADD(month, 1, SomeDate)))))

SELECT
     EOMONTH(SomeDate)
     ,COUNT(SomeDate)
FROM
     DemoProgramming.dbo.TestData AS td
GROUP BY
     EOMONTH(SomeDate)
ORDER BY
     EOMONTH(SomeDate)

Findings

1. When breaking up a date to reconstruct it, you need to go back and forth between data types. The optimizer isn’t very fond of that. In fact you get a warning about the possibility of your cardinality being questionable. The EOMONTH() windows function doesn’t have this problem.

2. The EOMONTH() windows function takes less CPU and can return values faster. For this particular execution, I received a 5.25 times faster CPU time and 3.5 times faster execution time.

I used the following code to turn on Statistics:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

But Wait! There’s More

This particular function has an additional (optional) parameter. The parameter will allow you to add/subtract months from the end of the month. Which means you can easily find the beginning of the month using this equation:

SELECT DATEADD(DAY, 1, EOMONTH(GETDATE(), -1)) as boMonth

This also allows you to easily create a one year range for the last 12 full months using these two equations:
SELECT
DATEADD(DAY, 1, EOMONTH(GETDATE(), -13)) AS StartDate
,EOMONTH(GETDATE(),-1) AS EndDate

Outside the Box

You are probably telling yourself that you can do everything I just showed you by grouping data by the Month() function. Yes, that is true, but you need to think outside the box for the application. What about those wonderful companies who think Months are not contstrained by Day 1 and Day 31? (You know who you are. They have business rules like, “the 5th day of the month”. Ah! Now we got something we can work with.

SELECT DATEADD(DAY, 5, EOMONTH(GETDATE(), -1)) as FifthDayOfMonth

Don’t Stop Yet

If you want to read more about Windows Functions, don’t forget to check out these blog posts:

Chris Yates: Windows Functions; Who Knew?
Julie Koesmarno: ABC Classification With SQL Server Window Function
Jeffrey Verheul: Write readable and high-performance queries with Window Functions

To follow our quest for SQL knowledge through this collaborative project, follow the #SQLCoOp tag on Twitter.

See you next month!!

Original Post: On a SQL Collaboration Quest

Category: SQL CoOp | Tags: #SQLCoOp, Chris Yates, Functions, Jeffrey Verheul, Julie Koesmarno

SQLMickey | February 11, 2014

4 comments

T-SQL Tuesday #51- Don’t Crap Out While Betting On Table Functions

My good friend Jason Brimhall (b|t) is hosting this month’s T-SQL Tuesday blog party. The party was started by Adam Machanic (b|t) in December of 2009. As a compliment to the upcoming debut of the Las Vegas SQL Saturday, Jason has taken up a betting theme. He wants to know our stories of when we bet it all on a risky solution and won or lost.

Instead of telling you about the past, I want to help you win big at the table today. I really don’t want you to crap out while betting on the wrong table functions.

Snake Eyes

There are two types of table functions Multi-line Table Functions and In-Line Table Functions. There is a huge difference between the two of them.

Multi-line table functions sound great. You write as much code as you need in them and they will return all the data in a table variable. This is where the weighted dice rolls snake eyes every single time. You see, the statistics for a table variable always, always says there is only one row in the table being returned. It doesn’t matter if there are a hundred, a thousand, or a million rows. The statistics will say one. Which means the optimizer has a good chance of loosing when it picks the execution plan for that query.

Let’s Take a Look at the Bets

For my example, I have a simple query that returns 43 rows out of a Tally table. Notice that the index estimates 43 rows will be returned, which is great, because that is exactly on the money!

If we put that same query inside of a multi-line table function, we get an estimated number of rows of 1 (snake eyes!).

Double Down

An in-line table function will return the same result set, but there are some limitations on its construction. The entire query within the in-line table function needs to be done in only one statement.

Note: You can get very creative with Common Table Expressions (CTE) if need be.

There are two benefits to using an in-line Table Function. One, is that the Estimated Number of Rows will be accurate (or as accurate as the statistics on the table), and two, the “inside” of the in-line table function is not masked in the Execution Plan. It is plopped right into the middle of the calling query. (Yes, “plopped” is a technical term. )

Last Call

So…Remember to double down on in-line table functions and don’t crap out on the snake eyes of the multi-line table function.

Thanks for all the fish

Thanks go out to Jason Brimhall for hosting this month’s T-SQL Tuesday blog party. Please visit his website at http://jasonbrimhall.info/, or better yet come to Las Vegas for their SQL Saturday and thank him in person.

Category: Execution Plans, Performance, SQL Functions, T-SQL, T-SQL Tuesday | Tags: #TSQL2sdays, Better Habbits, Execution Plan, Functions, SQL Community, SQL Saturday, T-SQL, T-SQL Tuesday

SQLMickey | May 28, 2013

7 comments

Using Set Theory Instead of ISNULL To Filter Data

I have written dozens upon dozens of reports for my users. I love the challenge of finding the data and wrangling it into a report in a way that doesn’t take a hundred years to run when they decide to return a year’s worth of data. One of the common requests that I get is to provide a parameter that will allow them to choose a single client or all clients. At first blush this is a very simple request and can be accomplished by using the ISNULL function.

Unfortunately there are performance implications using the ISNULL function. The ISNULL function is non-sargable. This means that when the function is used as part of a predicate, it can’t utilize the necessary index in an optimal way to filter the data. In this example, the Execution Plan shows that a full index scan was used to return one row. This equated to 80 Logical Reads for my dataset. This is the same number of Logical Reads whether one row was returned or all rows were returned. Eighty Logical Reads may not be that big a deal, but what about on a table that has hundreds of thousands of rows?

Execution Plan

Statistics on the Table I/O Tab in SQL Sentry Plan Explorer Pro

An Alternate Universe

There is an alternate way of accomplishing this same request and it uses Set Theory to solve the problem. I can accomplish the same request by separating out the ISNULL logic into two different SQL statements and using UNION ALL to join the two result sets into one result set. Even though I’m using two different SQL statements, only one will return data during the execution of the logic.

The first SELECT statement returns rows when the @ClientID variable has a value other than NULL. The second SELECT statement returns rows when @ClientID is NULL. By using UNION ALL instead of UNION we forgo the task of checking if there are duplicate rows in the second SELECT statement.

The Execution Plan now contains the ability to handle both cases, and the Logical Reads changes based on the value of @ClientID. There are still 80 Logical Reads when @ClientID is NULL because the whole table is being returned, but the Logical Reads are reduced to two when @ClientID is set to a single client number.

Execution Plan when @ClientID =7890

Statistics when @ClientID = 7890

Execution plan when @ClientID IS NULL

Statistics when @ClientID IS NULL

In this example I was able to use UNION ALL to filter on one or all clients, but it isn’t always the best solution. If the table being filtered fits on one page, then a table scan will occur regardless of how the predicate is setup. In that case, the ISNULL statement is easier to maintain so it would be the better solution.

Conclusion

It is always best to take a look at the Execution Plan of a SQL statement to see if there are any Index Scans on filtered data. If there are, you should take a look to see if there is another, set based approach to solve the problem.

Category: Set Theory, SQL Functions | Tags: Functions, ISNULL, Set Theory, SQL, T-SQL, UNION ALL

SQLMickey | October 28, 2012

Comments Off

The ROW_NUMBER Function As An Alternate To The MAX Function

It has taken a month to fuss over my new blog, but I finally made my first SQL entry. Since I’m excited about the upcoming SQL Pass conference, I thought I would show a fictitious problem about employees and their interests in SQL Conferences.

Problem: You are given two tables. The first table contains employees. The second table contains all the SQL Conferences each employee has been interested in along with the date they showed interest in the conference and whether or not they are still interested. You are asked to find the last SQL Conference that was added for each employee. Only conferences the employees are still interested in should be included, and only one conference per employee should be listed. The returned data should be ordered by the employee’s last name and first name.

The first solution that came to mind, was to use the MAX function on the InterestAddDate field to find the last added interest. There are two issues with this approach though.

1. In order to get the activity field returned, the Interest table has to be joined a second time on the MAX(InerestAddDate).
2. Multiple rows will be returned if the employee had an interest in two SQL conferences on the same date. While this could be a valid result set, in this case only one activity should be returned.

WITH CTE_InterestsByMax
AS
(
      SELECT
         EmployeeID
         ,MAX(InterestAddDate) AS LastInterestAdDate
      FROM
            dbo.Interest AS i
      WHERE
            isActive = 1
      GROUP BY
            EmployeeID
)
SELECT
      e.FirstName + ‘ ‘ + e.LastName AS EmployeeName
      ,i.InterestAddDate
      ,i.Activity
FROM
      CTE_InterestsByMax AS im
      JOIN dbo.Interest AS i ON im.LastInterestAdDate = i.InterestAddDate
                                                AND im.EmployeeID = i.EmployeeID
      JOIN dbo.Employee AS e ON im.EmployeeID = e.EmployeeID
ORDER BY
      e.LastName
      ,e.FirstName

Solution: To address these two issues, I used a Common Table Express (CTE) and the ROW_NUMBER function. This function will number each row with a unique sequential number based on the OVER clause. Inside the OVER clause, I will order the data by the InterestAddDate field in descending order. Since I want to find the last SQL Conference of interest for each employee, I’m going to add the PARTITION statement on the EmployeeID field to the OVER clause. This will cause the ROW_NUMBER function to start over for each EmployeeID. Since I’m not using an aggregate function, I can return all the data from the Interest table that I need.

In the next part of the query , I join the CTE to the Employee table and add a WHERE clause. Since I ordered each partition in descending order, I know that the first row of each partition will have a rowindex of 1. I can now filter my data by rowindex = 1.

WITH CTE_InterestsByRow_Number
AS
(
      SELECT
         i.EmployeeID
         ,i.InterestAddDate
         ,i.Activity
         ,ROW_NUMBER() OVER (PARTITION BY i.EmployeeID ORDER BY i.InterestAddDate DESC) AS RowIndex
      FROM
         dbo.Interest AS i
      WHERE
         i.IsActive = 1
)
SELECT
      e.FirstName + ‘ ‘ + e.LastName AS EmployeeName
      ,i.InterestAddDate
      ,i.Activity
FROM
      CTE_InterestsByRow_Number AS i
      JOIN dbo.Employee AS e ON i.EmployeeID = e.EmployeeID
WHERE
      rowindex = 1
ORDER BY
      e.LastName
      ,e.FirstName

When I looked at the logical reads for these two separate queries, the query using the MAX function had twice as many logical reads as the query with the ROW_NUMBER function. When I looked at the Execution Plan for both queries, I found the query using the MAX function had a higher Query Cost relative to the batch. My first run with the data, I used 20 Employees and 40 Interests. For the second run, I used 1000 employees and 4000 interests. I found that the Query Cost for the query using the MAX function increased with the larger datasets.

Category: SQL Functions | Tags: Functions, MAX, OVER, PARTITION, ROW_NUMER, SQL

Mickey's T-SQL Ponderings

Tag Archive for Functions

A Date At The End of The Month

The Second Mission

End of the Month

Performance Perks

Like this:

T-SQL Tuesday #51- Don’t Crap Out While Betting On Table Functions

Snake Eyes

Let’s Take a Look at the Bets

Last Call

Thanks for all the fish

Like this:

Using Set Theory Instead of ISNULL To Filter Data

Like this:

The ROW_NUMBER Function As An Alternate To The MAX Function

Like this:

Searching…

Subscribe to Blog via Email. Resistance is futile.

Top Posts & Pages

The Tombs

These are not the links you are looking for. Move along. Move along.

Mickey's T-SQL Ponderings

Tag Archive for Functions

A Date At The End of The Month

The Second Mission

End of the Month

Performance Perks

Share this:

Like this:

T-SQL Tuesday #51- Don’t Crap Out While Betting On Table Functions

Snake Eyes

Let’s Take a Look at the Bets

Last Call

Thanks for all the fish

Share this:

Like this:

Using Set Theory Instead of ISNULL To Filter Data

Share this:

Like this:

The ROW_NUMBER Function As An Alternate To The MAX Function

Share this:

Like this:

Searching…

Subscribe to Blog via Email. Resistance is futile.

Top Posts & Pages

The Tombs

Tags

These are not the links you are looking for. Move along. Move along.