Tag Archive for MERGE

Going Beyond The INSERT Statement

The Seventh Mission

In this installment of SQLCoOp, we are writing about CRUD. No, this is not about the stuff you scrape off the bottom of your shoes after hiking. This is Create, Read, Update, and Delete. I’ll be focusing on create, which is done with the INSERT Statement, but I’m going to go beyond the basic INSERT statement.

MCJJ_ALL

Multiple Rows

Let’s start with a new feature of the basic INSERTVALUES statement that we all know and love. In SQL Server 2012, Microsoft improved the INSERTVALUES statement, by allowing you to include multiple lists of values in the same query. This allows you to insert multiple rows of data with one INSERT statement. I find this syntax to to be a wonderful feature when I’m creating small sets of data using Excel.

Here is an example of what the syntax looks like. The values for each row are enclosed in parenthesis and separated by commas. In this example, I’m entering student scores for tests the students took at the end of September. After this one query runs, there will be 12 rows in the table.


CREATE TABLE StudentScore
(
     StudentScoresID int IDENTITY PRIMARY KEY
     ,StudentID int
     ,TestingRoomID int
     ,Score tinyint
     ,TestDateTime datetime
);

 

INSERT INTO StudentScore
     (StudentID,TestingRoomID, Score, TestDateTime)
VALUES
     (35, 1, 110, '2014-09-30 17:00:00.00')
     ,(36, 1, 87, '2014-09-30 17:00:00.00')
     ,(42, 1, 94, '2014-09-30 17:00:00.00')
     ,(10, 1, 99, '2014-09-29 12:00:00.00')
     ,(35, 10, 90, '2014-09-29 17:00:00.00')
     ,(36, 10, 100, '2014-09-29 17:00:00.00')
     ,(42, 10, 105, '2014-09-29 17:00:00.00')
     ,(10, 10, 99, '2014-09-29 17:00:00.00')
     ,(35, 5, 115, '2014-09-28 17:00:00.00')
     ,(36, 5, 70, '2014-09-28 17:00:00.00')
     ,(42, 5, 90, '2014-09-28 17:00:00.00')
     ,(10, 5, 67, '2014-09-28 17:00:00.00');
Common Table Expressions (CTEs)

I love CTEs, sometimes a little too much. The reason I think they are so great is two fold. First, they can make complicated queries easier to read and maintain. Second, they can perform complicated functionality such as recursion without breaking a query up into multiple statements. An added benefit, is they can be used with all CRUD statements, including the INSERT statement.

Let’s take a look at an example.

I have a group of students that were given the same test multiple times in different classrooms on different dates. I want to only insert the row representing  the highest score a student had. I also want to make sure that I keep the first time they received this score and any other data that came with that record.

Below I’ve highlighted the rows that I want to insert into the new table.

SQLCoOp7_Image1

The first step is building the SELECT statement for the CTE that numbers the scores for each student. This is done by using the ROW_NUMBER function in conjunction with the OVER clause. The OVER clause will have two parts. The first part will use the PARTION BY statement. This is used to restart the numbering for each partition. In this case we will partition the data by StudentID. The second part will use the ORDER BY statement. This tells us how to order the data so that it will consistently come out in the same row number. Since we want the highest score first, we will order the data by Score in a descending order. Then we will order the duplicates by TestDateTime so that the first date the score was achieved will appear first in the order.

SELECT
     ss.StudentScoresID
     ,ss.StudentID
     ,ss.TestingRoomID
     ,ss.Score
     ,ss.TestDateTime
     ,ROW_NUMBER() OVER (PARTITION BY ss.StudentID
               ORDER BY ss.Score DESC, ss.TestDateTime) AS RowID
FROM
     dbo.StudentScore AS ss

This SQL statement above will be used in our CTE. We’ll then use all the rows that have a RowID of 1 for the INSERT. The final INSERT statement will look like this.

CREATE TABLE dbo.StudentScoreHigh
(
     StudentScoresID int PRIMARY KEY
     ,StudentID int
     ,TestingRoomID int
     ,Score tinyint
     ,TestDateTime datetime
);

 

WITH cte_Scores
AS
(
     SELECT
          ss.StudentScoresID
          ,ss.StudentID
          ,ss.TestingRoomID
          ,ss.Score
          ,ss.TestDateTime
          ,ROW_NUMBER() OVER (PARTITION BY ss.StudentID
               ORDER BY ss.Score DESC, ss.TestDateTime) AS RowID
     FROM
          dbo.StudentScore AS ss
)
INSERT INTO dbo.StudentScoreHigh
(
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
)
SELECT
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
FROM
     cte_Scores
WHERE
     RowID = 1;
MERGE

A different way to approach this same problem, would be to use the MERGE statement. The MERGE statement, is like CRUD on steroids. It will look at all the data you specify that is coming in and compare to all the existing data. Then you can determine what will happen. Do you want to insert, update, delete, or ignore the data based on what does and does not match.

For this example, we’ll assume we are constantly updating the StudentScoreHigh table with the students highest score. We’ll compare what is already in the StudentScoreHigh table (the target) with what is in the StudentScore table (the Source) each night. If there is a record already in the table and the student achieved a higher score today, than the existing record will be updated. If the student does not exist in the table, then the student will be inserted into the table. Note: for this example, we’ll assume that the student could only take the test once in a day.

The MERGE statement will look like this.


MERGE dbo.StudentScoreHigh AS target
USING
(
     SELECT
          StudentScoresID
          ,StudentID
          ,TestingRoomID
          ,Score
          ,TestDateTime
     FROM
          dbo.StudentScore AS ss
     WHERE
          TestDateTime >= DATEADD(d, -1, GETDATE())
) AS Source ON Target.StudentID = Source.StudentID
WHEN MATCHED AND Source.Score > Target.Score THEN
UPDATE SET
     StudentScoresID = Source.StudentScoresID
     ,StudentID = Source.StudentID
     ,TestingRoomID = Source.TestingRoomID
     ,Score = Source.Score
     ,TestDateTime = Source.TestDateTime
WHEN NOT MATCHED THEN
INSERT
(
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
)
VALUES
(
     Source.StudentScoresID
     ,Source.StudentID
     ,Source.TestingRoomID
     ,Source.Score
     ,Source.TestDateTime
);
BONUS

If you have the latest Red Gate’s SQL Prompt installed (ver 6.4), then you will have an even easier time writing your favorite INSERT statement. They added a new feature that will highlight the field of the value you are modifying and vice versa.  Below you can see that my cursor is on line 61 and line 54 is highlighted. If I were to put my cursor on line 54, then line 61 would be highlighted.

 

SQLCoOp7_Image2

Don’t Stop Yet

Don’t forget to check out these blog posts by the rest of the SQL CoOp team on the subject of CRUD:

To follow our quest for SQL knowledge through this collaborative project, follow the #SQLCoOp tag on Twitter.

See you next time!!

On a SQL Collaboration Quest

Four SQL professionals gathered from the four corners of the world to share their SQL knowledge with each other and with their readers: Mickey Stuewe from California, USA, Chris Yates from Kentucky, USA, Julie Koesmarno from Canberra, Australia, and Jeffrey Verheul from Rotterdam, The Netherlands. They invite you to join them on their quest as they ask each other questions and seek out the answers in this collaborative blog series. Along the way, they will also include other SQL professionals to join in the collaboration.

Original Post: On a SQL Collaboration Quest

%d bloggers like this: