Tag Archive for ROW_NUMER

Questions Answered From Presentation: Changing Your Habits to Improve the Performance of Your T-SQL

QuestionMark_127880048This past Tuesday, Oct 8th, 2014, I had the privilege of speaking for the DBA Fundamentals Virtual PASS Chapter. It was my biggest audience to date and the fourth biggest audience for the user group to date. There were 374 people listening in, and they weren’t all from the US. I was thrilled to see that I had at least one person listening in from the UK.

Since there were so many people on line, I couldn’t answer all the question…thus my post today. You can download the slide deck, demos, and my sample database. (All my presentations eventually find there way to my resources page, here.)

I will post the link to the presentation on this page when it becomes available.

Questions Answered

1.    What is the name of the Red Gate tool for source control you mentioned?

Red Gate’s tool is called SQL Source Control. It’s like a bridge between your source control repository and SSMS or Visual Studio. They have a trial version that you can download here

2.    What to do when you come into a team and there was no SQL coding standards before and we do not want to spend time re-writing everything?

I would first get buy in on why standards are important to have and outline the standards you want to use. Then I would slowly change the code. What I mean by this is, update the code as you make changes to it. For instance, you need to modify stored procedure XYZ, I would add the standards to only that stored procedure or even only the part of the stored procedure that you change.

3.    What version of SSMS are you using?

In the presentation I was using SQL 2012, but all the concepts I covered applied to SQL 2005, 2008 and 2008 R2. Even the Template Browser and Template Parameters I covered are available in those versions (Talk about a well kept secret!). Note: If you need the sample database in SQL 2008 or SQL 2008R2, please let me know and I’ll see what I can do.

4.    Can you give an example of commenting within a stored procedure?

Sure. You can comment using two different ways. You can comment at the end of a line using two dashes. You can also comment any where in the code using /* and */ to encompass the comment. This allows for comments to spend multiple lines.

When I’m adding comments about my changes, I will include the date, my name and brief comment. If I’m lucky to have a work item or bug tracker application to keep track of my work, then I’ll reference that number as well. Here is an example.


USE AWMonkey
GO
/* ***********************************************************************************
Purpose:    Returns list of Territory Groups.
Notes:
Sample:
EXEC List.GetTerritory
Author:     THEZOO\mstuewe
Date:       9/16/2014

Revision History
(Change Date)    (Author)        (Description of Change)
-----------------------------------------------------------------
10/10/2014        Mickey Stuewe    #2345 Added All to the list of values as requested
By end user.
************************************************************************************ */
ALTER PROCEDURE List.GetTerritory
AS
BEGIN TRY

SET NOCOUNT ON;

SELECT
st.TerritoryID
,[Group] AS CountryRegion
,st.Name AS Territory
FROM
AdventureWorks2008R2.Sales.SalesTerritory AS st
UNION ALL
SELECT
-1
,' All'
,' All'
ORDER BY
CountryRegion; --Added the ALL to the list.
END TRY
BEGIN CATCH

DECLARE
@ErrorMessage AS nvarchar(3000)
,@ErrorSeverity AS int;

SET @ErrorMessage = ISNULL(DB_NAME(DB_ID()) + N'.' + SCHEMA_NAME(SCHEMA_ID()) + N'.'

+ OBJECT_NAME(@@PROCID, DB_ID()), N'SQL Object Name Not Available')

+ N': Error: ' + CONVERT(nvarchar(10), ERROR_NUMBER()) + N' Line: ' + CONVERT(nvarchar(5), ERROR_LINE()) + N' - ' + ERROR_MESSAGE();

SET @ErrorSeverity = ERROR_SEVERITY();
RAISERROR(@ErrorMessage, @ErrorSeverity, 1);
END CATCH;

SET NOCOUNT OFF;

5.    Is there any configuration for the Template Browser? Can it integrate with any source control?

No configuration for the Template Browser is needed. Here are the generic steps to integrate your Template Browser folder with your source control repository. Each Source Control repository will be different, but these are the general steps.

1. Open the location of the Template Browser. Since I’m using SQL 2012, I found mine here:

C:\Users\<Your User Name>\AppData\Roaming\Microsoft\SQL Server Management Studio\11.0\Templates\Sql\<My Templates>

2. Import your templates into your source control repository.

3. Set your templates folder as the “working folder” location.

4. Test by getting the latest from your repository.

5. Connect the template browsers on your team’s computers the same way.

6. Let them know when you make changes to the templates so that they can do a “get latest” to download the changes to there computers.

6.    Is there any kind of Freeware repository tool?

Yes, Subversion is an open source product and widely used repository tool. It has two parts. The first part is the repository which resides in a common location, usually on the network so that everyone can use it. Then each user needs to have a subversion client tool to connect to the repository. There are many different clients to pick from. I use Tortoise. Why? It was suggested to me and recommended by Red Gate. Since I was going to use SQL Source Control by Red Gate to connect to subversion, I decided to stick with there recommendation.

7.    So, Row_Number() is non-deterministic?

Yes, that is correct.

8.    Is there any difference or improvements in using a CAST vs CONVERT or vice versa?

There isn’t a different in using the two functions, in terms of performance. As Jeff Joy kindly pointed out, the difference between the two comes down to compliance.

Jeff Joy – CAST is ANSI – compliant while CONVERT is Microsoft-specific casting function that builds in additional formatting functionality. If you may ever need to convert to another platform you need to write code that is cross-platform compliant, use CAST.

The downside of writing all of your code to ANSI compliant standards, is you miss out on quite a few rich features that Microsoft has added to the SQL language. These rich features will help SQL statements perform faster or make your life easier. 

9.    Isn’t CTEs much better than subqueries? This is in regards to readability?

Yes CTEs (Common Table Expressions) are much easier to read. There are even patterns that can be accomplished with CTEs that can’t be accomplished with sub queries, but CTEs aren’t always faster. I have had a few queries I’ve had to re-write with sub queries because I needed every millisecond of performance.

10.    Can I see the trace options selected?

Yes, here are the events I captured. I also isolated the database, to the demo database.

 

QuestionsAnswered20141010_Image1

11.    Is there a standards document about T-SQL Formatting?

I don’t have one I can give you. I can create a template for you if you would like. Just send me an email.

12.    Can you e-mail that URL to us?

I’m not sure which URLs this is referring to, so I’ve listed the URLs that were mentioned in my presentation.

13.    Is there better performance between CTE and Apply?

Comparing CTE and APPLY is like comparing apples and oranges. They are two different things. A CTE provides you the ability to create a query to be used within another query. APPLY is an operator like INNER JOIN, but it is specifically used to join table functions (in line and multi line) as well as sub queries and CTEs.

14.    On the comparison of temporary, table variables, and materialized tables…Why was the second execution plan eligible for auto-parameterization and the others were not?

This will take a little bit of investigation since I don’t know off the top of my head. I will post this one in the next week or so.

15.    What is the proper way to use DISTINCT with lots of columns in the SELECT without duplicate records? (Per our SDLC, avoiding using DISTINCT with lots of columns.)

The DISTINCT operator causes a SORT in the execution plan which is usually a very costly operator. Unfortunately sometimes you can’t avoid it. One thing that you can do, is see if there are other ways to filter the data so that there aren’t duplicate records. For example maybe there is a column that marks the latest record as active. You could filter on that field to provide uniqueness across the rest of the fields. Another example would be joining to another table that will help reduce the records, but not alter the data you need to return.

IMHO, your SDLC is trying to avoid unnecessary DISTINCT operators in the code. I had added that same requirements to my SDLC document when DISTINCT was being abused. I later added the comment, “If you can prove the need for the DISTINCT operator, then the use of DISINCT will be approved.”

16.    About predicate and especially LIKE, we know that to put predicate more efficient, we shouldn’t start with a wildcard, but if we need to return data that end by defining string, it exists a way to improve that?

See # 17 below.

17.    REVERSE string?

I wanted to answer 16 and 17 together. #18 is related, but I answered it separately. While all the string functions are deterministic, the particular pattern that needs to be implemented (looking at the end of string), does not allow it’s self to an efficient execution plan. This has to do with how indexes are built. The keys in the index are stored sorted. This means that finding records can only be efficient when looking from left to right, not right to left when comparing strings. You can see this by running the following sample on my demo database I provided.

I created an index on the field we are looking at, then I tried various techniques to get the last three characters. All three execution plans show an Index Scan.


USE DemoProgramming
GO

IF  EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'dbo.TestData') AND name = N'IX_TestData_SomeDateText')
DROP INDEX IX_TestData_SomeDateText ON dbo.TestData WITH ( ONLINE = OFF )
GO
CREATE NONCLUSTERED INDEX IX_TestData_SomeDateText ON dbo.TestData
(
SomeDateText ASC
)

SELECT
SomeDateText
FROM
dbo.TestData AS td
WHERE
td.SomeDateText LIKE '%-09'

SELECT
SomeDateText
FROM
dbo.TestData AS td
WHERE
REVERSE(SomeDateText) LIKE '90-%'

SELECT
SomeDateText
FROM
dbo.TestData AS td
WHERE
LEFT(REVERSE(SomeDateText),3) = '90-'

When you run this sample, you do get an Index seek.


SELECT
SomeDateText
FROM
dbo.TestData AS td
WHERE
SomeDateText LIKE '2004%'

With all that said, if a particular column is constantly being split into separate parts to be searched, then I would consider storing those separate parts in fields in the database. That way they can be indexed and improve performance.

18.    Full Text index vs Like?

I have not had the opportunity to use or even investigate Full Text indexes, so I can’t comment on this at this time. (sorry)

19.    Is it ok to manually create tables in the temp DB? Would that be the same as creating materialized tables, but will be wiped out after a DB server restart?

Temp tables and global temp tables are tables with in the TempDB. As far as creating materialized or permanent tables within in TempDB, that is not a good idea. TempDB can already be burdened by all the other queries that use temp tables, table variables, and worktables. It would be better to create materialized tables outside of the TempDB.

20.    For the Template explorer…how do you share your templates to other users in SQL because when we create templates they seem to only be available to the user specifically that created them?

See Question #1.

21.    What’s the best hosting solution for MSSQL server standard/enterprise?

This is outside the scope of this session and an area I don’t have a lot of experience with.

22.    Does including Actual Execution plan also puts an overhead on the server?

No. The Actual Execution plan is what is created when a query is ran. The query itself can cause overhead, but not the plan. In fact, the engine allows a specific amount of time to find the best plan. If it can’t go through all the different possibilities, then it will do the best it can with the possibilities it ran through . You can see this in the execution plan. Click on the SELECT icon and look at the properties. For this query, it told me it found a “good enough plan”.

 

QuestionsAnswered20141010_Image2

 

If you want to get the last execution plan that was generated for a query, then you can get it using sys.dm_exec_query_plan. See how to use this DMV here.

Last but not least…

23.    What’s your favorite whisky?

Well in the immortal words of Grant Fritchey…that depends. It depends on where I am, what I’m eating, and how many people I’m with. (I know, that is a weird one. But I find I can handle the smokier Scotchs when I’m around a group a friends, but not when I’m home relaxing.) Also, I drink all my whiskey neat.

So, since I don’t have one favorite, I’ll answer the question by country in the order of my preference.

  • Ireland – I don’t think I’ve met an Irish whisky I didn’t like, but my favorite (right now) is Redbreast.
  • USA – I love 100% Rye whiskey! Mitcher’s Rye and Highwest Rendevous Rye are at the top of my list.
  • Scotland – I like the Fruity and Spicy Scotch’s from Speyside, with Oban being at the top of list.
But wait there is more

Mark Finch mentioned a free formatting tool. I wanted to include it in my summary since I had never heard of it. You can find it at poorsql.com.

Thanks for all the fish

I want to thank everyone for attending my presentation, as well as for all the encouraging feed back I received.

Going Beyond The INSERT Statement

The Seventh Mission

In this installment of SQLCoOp, we are writing about CRUD. No, this is not about the stuff you scrape off the bottom of your shoes after hiking. This is Create, Read, Update, and Delete. I’ll be focusing on create, which is done with the INSERT Statement, but I’m going to go beyond the basic INSERT statement.

MCJJ_ALL

Multiple Rows

Let’s start with a new feature of the basic INSERTVALUES statement that we all know and love. In SQL Server 2012, Microsoft improved the INSERTVALUES statement, by allowing you to include multiple lists of values in the same query. This allows you to insert multiple rows of data with one INSERT statement. I find this syntax to to be a wonderful feature when I’m creating small sets of data using Excel.

Here is an example of what the syntax looks like. The values for each row are enclosed in parenthesis and separated by commas. In this example, I’m entering student scores for tests the students took at the end of September. After this one query runs, there will be 12 rows in the table.


CREATE TABLE StudentScore
(
     StudentScoresID int IDENTITY PRIMARY KEY
     ,StudentID int
     ,TestingRoomID int
     ,Score tinyint
     ,TestDateTime datetime
);

 

INSERT INTO StudentScore
     (StudentID,TestingRoomID, Score, TestDateTime)
VALUES
     (35, 1, 110, '2014-09-30 17:00:00.00')
     ,(36, 1, 87, '2014-09-30 17:00:00.00')
     ,(42, 1, 94, '2014-09-30 17:00:00.00')
     ,(10, 1, 99, '2014-09-29 12:00:00.00')
     ,(35, 10, 90, '2014-09-29 17:00:00.00')
     ,(36, 10, 100, '2014-09-29 17:00:00.00')
     ,(42, 10, 105, '2014-09-29 17:00:00.00')
     ,(10, 10, 99, '2014-09-29 17:00:00.00')
     ,(35, 5, 115, '2014-09-28 17:00:00.00')
     ,(36, 5, 70, '2014-09-28 17:00:00.00')
     ,(42, 5, 90, '2014-09-28 17:00:00.00')
     ,(10, 5, 67, '2014-09-28 17:00:00.00');
Common Table Expressions (CTEs)

I love CTEs, sometimes a little too much. The reason I think they are so great is two fold. First, they can make complicated queries easier to read and maintain. Second, they can perform complicated functionality such as recursion without breaking a query up into multiple statements. An added benefit, is they can be used with all CRUD statements, including the INSERT statement.

Let’s take a look at an example.

I have a group of students that were given the same test multiple times in different classrooms on different dates. I want to only insert the row representing  the highest score a student had. I also want to make sure that I keep the first time they received this score and any other data that came with that record.

Below I’ve highlighted the rows that I want to insert into the new table.

SQLCoOp7_Image1

The first step is building the SELECT statement for the CTE that numbers the scores for each student. This is done by using the ROW_NUMBER function in conjunction with the OVER clause. The OVER clause will have two parts. The first part will use the PARTION BY statement. This is used to restart the numbering for each partition. In this case we will partition the data by StudentID. The second part will use the ORDER BY statement. This tells us how to order the data so that it will consistently come out in the same row number. Since we want the highest score first, we will order the data by Score in a descending order. Then we will order the duplicates by TestDateTime so that the first date the score was achieved will appear first in the order.

SELECT
     ss.StudentScoresID
     ,ss.StudentID
     ,ss.TestingRoomID
     ,ss.Score
     ,ss.TestDateTime
     ,ROW_NUMBER() OVER (PARTITION BY ss.StudentID
               ORDER BY ss.Score DESC, ss.TestDateTime) AS RowID
FROM
     dbo.StudentScore AS ss

This SQL statement above will be used in our CTE. We’ll then use all the rows that have a RowID of 1 for the INSERT. The final INSERT statement will look like this.

CREATE TABLE dbo.StudentScoreHigh
(
     StudentScoresID int PRIMARY KEY
     ,StudentID int
     ,TestingRoomID int
     ,Score tinyint
     ,TestDateTime datetime
);

 

WITH cte_Scores
AS
(
     SELECT
          ss.StudentScoresID
          ,ss.StudentID
          ,ss.TestingRoomID
          ,ss.Score
          ,ss.TestDateTime
          ,ROW_NUMBER() OVER (PARTITION BY ss.StudentID
               ORDER BY ss.Score DESC, ss.TestDateTime) AS RowID
     FROM
          dbo.StudentScore AS ss
)
INSERT INTO dbo.StudentScoreHigh
(
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
)
SELECT
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
FROM
     cte_Scores
WHERE
     RowID = 1;
MERGE

A different way to approach this same problem, would be to use the MERGE statement. The MERGE statement, is like CRUD on steroids. It will look at all the data you specify that is coming in and compare to all the existing data. Then you can determine what will happen. Do you want to insert, update, delete, or ignore the data based on what does and does not match.

For this example, we’ll assume we are constantly updating the StudentScoreHigh table with the students highest score. We’ll compare what is already in the StudentScoreHigh table (the target) with what is in the StudentScore table (the Source) each night. If there is a record already in the table and the student achieved a higher score today, than the existing record will be updated. If the student does not exist in the table, then the student will be inserted into the table. Note: for this example, we’ll assume that the student could only take the test once in a day.

The MERGE statement will look like this.


MERGE dbo.StudentScoreHigh AS target
USING
(
     SELECT
          StudentScoresID
          ,StudentID
          ,TestingRoomID
          ,Score
          ,TestDateTime
     FROM
          dbo.StudentScore AS ss
     WHERE
          TestDateTime >= DATEADD(d, -1, GETDATE())
) AS Source ON Target.StudentID = Source.StudentID
WHEN MATCHED AND Source.Score > Target.Score THEN
UPDATE SET
     StudentScoresID = Source.StudentScoresID
     ,StudentID = Source.StudentID
     ,TestingRoomID = Source.TestingRoomID
     ,Score = Source.Score
     ,TestDateTime = Source.TestDateTime
WHEN NOT MATCHED THEN
INSERT
(
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
)
VALUES
(
     Source.StudentScoresID
     ,Source.StudentID
     ,Source.TestingRoomID
     ,Source.Score
     ,Source.TestDateTime
);
BONUS

If you have the latest Red Gate’s SQL Prompt installed (ver 6.4), then you will have an even easier time writing your favorite INSERT statement. They added a new feature that will highlight the field of the value you are modifying and vice versa.  Below you can see that my cursor is on line 61 and line 54 is highlighted. If I were to put my cursor on line 54, then line 61 would be highlighted.

 

SQLCoOp7_Image2

Don’t Stop Yet

Don’t forget to check out these blog posts by the rest of the SQL CoOp team on the subject of CRUD:

To follow our quest for SQL knowledge through this collaborative project, follow the #SQLCoOp tag on Twitter.

See you next time!!

On a SQL Collaboration Quest

Four SQL professionals gathered from the four corners of the world to share their SQL knowledge with each other and with their readers: Mickey Stuewe from California, USA, Chris Yates from Kentucky, USA, Julie Koesmarno from Canberra, Australia, and Jeffrey Verheul from Rotterdam, The Netherlands. They invite you to join them on their quest as they ask each other questions and seek out the answers in this collaborative blog series. Along the way, they will also include other SQL professionals to join in the collaboration.

Original Post: On a SQL Collaboration Quest

The ROW_NUMBER Function As An Alternate To The MAX Function

It has taken a month to fuss over my new blog, but I finally made my first SQL entry. Since I’m excited about the upcoming SQL Pass conference, I thought I would show a fictitious problem about employees and their interests in SQL Conferences.

Problem: You are given two tables. The first table contains employees. The second table contains all the SQL Conferences each employee has been interested in along with the date they showed interest in the conference and whether or not they are still interested. You are asked to find the last SQL Conference that was added for each employee. Only conferences the employees are still interested in should be included, and only one conference per employee should be listed. The returned data should be ordered by the employee’s last name and first name.

Employee and Interest Data Model

The first solution that came to mind, was to use the MAX function on the InterestAddDate field to find the last added interest. There are two issues with this approach though.

1. In order to get the activity field returned, the Interest table has to be joined a second time on the MAX(InerestAddDate).
2. Multiple rows will be returned if the employee had an interest in two SQL conferences on the same date. While this could be a valid result set, in this case only one activity should be returned.


WITH CTE_InterestsByMax
AS
(
      SELECT
         EmployeeID
         ,MAX(InterestAddDate) AS LastInterestAdDate
      FROM
            dbo.Interest AS i
      WHERE
            isActive = 1
      GROUP BY
            EmployeeID
)
SELECT
      e.FirstName + ‘ ‘ + e.LastName AS EmployeeName
      ,i.InterestAddDate
      ,i.Activity
FROM
      CTE_InterestsByMax AS im
      JOIN dbo.Interest AS i ON im.LastInterestAdDate = i.InterestAddDate
                                                AND im.EmployeeID = i.EmployeeID
      JOIN dbo.Employee AS e ON im.EmployeeID = e.EmployeeID
ORDER BY
      e.LastName
      ,e.FirstName

Solution: To address these two issues, I used a Common Table Express (CTE)  and the ROW_NUMBER function. This function will number each row with a unique sequential number based on the OVER clause. Inside the OVER clause, I will order the data by the InterestAddDate field in descending order. Since I want to find the last SQL Conference of interest for each employee, I’m going to add the PARTITION statement on the EmployeeID field to the OVER clause. This will cause the ROW_NUMBER function to start over for each EmployeeID. Since I’m not using an aggregate function, I can return all the data from the Interest table that I need.

In the next part of the query , I join the CTE to the Employee table and add a WHERE clause. Since I ordered each partition in descending order, I know that the first row of each partition will have a rowindex of 1. I can now filter my data by rowindex = 1.

WITH CTE_InterestsByRow_Number
AS
(
      SELECT
         i.EmployeeID
         ,i.InterestAddDate
         ,i.Activity
         ,ROW_NUMBER() OVER (PARTITION BY i.EmployeeID ORDER BY i.InterestAddDate DESC) AS RowIndex
      FROM
         dbo.Interest AS i
      WHERE
         i.IsActive = 1
 )
SELECT
      e.FirstName + ‘ ‘ + e.LastName AS EmployeeName
      ,i.InterestAddDate
      ,i.Activity
FROM
      CTE_InterestsByRow_Number AS i
      JOIN dbo.Employee AS e ON i.EmployeeID = e.EmployeeID
WHERE
      rowindex = 1
ORDER BY
      e.LastName
      ,e.FirstName

When I looked at the logical reads for these two separate queries, the query using the MAX function had twice as many logical reads as the query with the ROW_NUMBER function. When I looked at the Execution Plan for both queries, I found the query using the MAX function had a higher Query Cost relative to the batch. My first run with the data, I used 20 Employees and 40 Interests. For the second run, I used 1000 employees and 4000 interests. I found that the Query Cost for the query using the MAX function increased with the larger datasets.

Execution Plan

%d bloggers like this: