Tag Archive for SQL

Going Beyond The INSERT Statement

The Seventh Mission

In this installment of SQLCoOp, we are writing about CRUD. No, this is not about the stuff you scrape off the bottom of your shoes after hiking. This is Create, Read, Update, and Delete. I’ll be focusing on create, which is done with the INSERT statement, but I’m going to go beyond the basics.


Multiple Rows

Let’s start with a feature of the basic INSERT…VALUES statement that we all know and love. In SQL Server 2008, Microsoft improved the INSERT…VALUES statement by allowing you to include multiple lists of values in the same query. This allows you to insert multiple rows of data with one INSERT statement. I find this syntax to be a wonderful feature when I’m creating small sets of data using Excel.

Here is an example of what the syntax looks like. The values for each row are enclosed in parentheses and separated by commas (you can list up to 1,000 rows in a single statement). In this example, I’m entering student scores for tests the students took at the end of September. After this one query runs, there will be 12 rows in the table.


CREATE TABLE StudentScore
(
     StudentScoresID int IDENTITY PRIMARY KEY
     ,StudentID int
     ,TestingRoomID int
     ,Score tinyint
     ,TestDateTime datetime
);

 

INSERT INTO StudentScore
     (StudentID, TestingRoomID, Score, TestDateTime)
VALUES
     (35, 1, 110, '2014-09-30 17:00:00.00')
     ,(36, 1, 87, '2014-09-30 17:00:00.00')
     ,(42, 1, 94, '2014-09-30 17:00:00.00')
     ,(10, 1, 99, '2014-09-29 12:00:00.00')
     ,(35, 10, 90, '2014-09-29 17:00:00.00')
     ,(36, 10, 100, '2014-09-29 17:00:00.00')
     ,(42, 10, 105, '2014-09-29 17:00:00.00')
     ,(10, 10, 99, '2014-09-29 17:00:00.00')
     ,(35, 5, 115, '2014-09-28 17:00:00.00')
     ,(36, 5, 70, '2014-09-28 17:00:00.00')
     ,(42, 5, 90, '2014-09-28 17:00:00.00')
     ,(10, 5, 67, '2014-09-28 17:00:00.00');
Common Table Expressions (CTEs)

I love CTEs, sometimes a little too much. The reason I think they are so great is twofold. First, they can make complicated queries easier to read and maintain. Second, they can perform complicated functionality, such as recursion, without breaking a query up into multiple statements. An added benefit is that they can be used with all CRUD statements, including the INSERT statement.
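As a quick aside, here is a minimal sketch of a recursive CTE to illustrate that second point. It simply generates the numbers 1 through 10; the CTE and column names are made up for this illustration.

WITH cte_Numbers (Number)
AS
(
     SELECT 1                 -- anchor member
     UNION ALL
     SELECT Number + 1        -- recursive member references the CTE itself
     FROM cte_Numbers
     WHERE Number < 10        -- stop once we reach 10
)
SELECT Number
FROM cte_Numbers;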

Now let’s take a look at an example that uses a CTE with an INSERT statement.

I have a group of students that were given the same test multiple times in different classrooms on different dates. I want to insert only the row representing the highest score each student achieved. I also want to make sure that I keep the first time they received this score and any other data that came with that record.

Below I’ve highlighted the rows that I want to insert into the new table.

SQLCoOp7_Image1

The first step is building the SELECT statement for the CTE that numbers the scores for each student. This is done by using the ROW_NUMBER function in conjunction with the OVER clause. The OVER clause has two parts. The first part is the PARTITION BY clause, which restarts the numbering for each partition. In this case we will partition the data by StudentID. The second part is the ORDER BY clause, which determines the order of the rows within each partition so the numbering comes out consistently. Since we want the highest score first, we will order the data by Score in descending order. Then we will order any ties by TestDateTime so that the first date the score was achieved appears first in the order.

SELECT
     ss.StudentScoresID
     ,ss.StudentID
     ,ss.TestingRoomID
     ,ss.Score
     ,ss.TestDateTime
     ,ROW_NUMBER() OVER (PARTITION BY ss.StudentID
               ORDER BY ss.Score DESC, ss.TestDateTime) AS RowID
FROM
     dbo.StudentScore AS ss

This SQL statement above will be used in our CTE. We’ll then use all the rows that have a RowID of 1 for the INSERT. The final INSERT statement will look like this.

CREATE TABLE dbo.StudentScoreHigh
(
     StudentScoresID int PRIMARY KEY
     ,StudentID int
     ,TestingRoomID int
     ,Score tinyint
     ,TestDateTime datetime
);

 

WITH cte_Scores
AS
(
     SELECT
          ss.StudentScoresID
          ,ss.StudentID
          ,ss.TestingRoomID
          ,ss.Score
          ,ss.TestDateTime
          ,ROW_NUMBER() OVER (PARTITION BY ss.StudentID
               ORDER BY ss.Score DESC, ss.TestDateTime) AS RowID
     FROM
          dbo.StudentScore AS ss
)
INSERT INTO dbo.StudentScoreHigh
(
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
)
SELECT
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
FROM
     cte_Scores
WHERE
     RowID = 1;
MERGE

A different way to approach this same problem would be to use the MERGE statement. The MERGE statement is like CRUD on steroids. It looks at all the incoming data you specify and compares it to the existing data. Then you decide what happens: do you want to insert, update, delete, or ignore the data based on what does and does not match?

For this example, we’ll assume we are constantly updating the StudentScoreHigh table with each student’s highest score. We’ll compare what is already in the StudentScoreHigh table (the target) with what is in the StudentScore table (the source) each night. If there is a record already in the table and the student achieved a higher score today, then the existing record will be updated. If the student does not exist in the table, then the student will be inserted into the table. Note: for this example, we’ll assume that a student can only take the test once in a day.

The MERGE statement will look like this.


MERGE dbo.StudentScoreHigh AS target
USING
(
     SELECT
          StudentScoresID
          ,StudentID
          ,TestingRoomID
          ,Score
          ,TestDateTime
     FROM
          dbo.StudentScore AS ss
     WHERE
          TestDateTime >= DATEADD(d, -1, GETDATE())
) AS Source ON Target.StudentID = Source.StudentID
WHEN MATCHED AND Source.Score > Target.Score THEN
UPDATE SET
     StudentScoresID = Source.StudentScoresID
     ,StudentID = Source.StudentID
     ,TestingRoomID = Source.TestingRoomID
     ,Score = Source.Score
     ,TestDateTime = Source.TestDateTime
WHEN NOT MATCHED THEN
INSERT
(
     StudentScoresID
     ,StudentID
     ,TestingRoomID
     ,Score
     ,TestDateTime
)
VALUES
(
     Source.StudentScoresID
     ,Source.StudentID
     ,Source.TestingRoomID
     ,Source.Score
     ,Source.TestDateTime
);
BONUS

If you have the latest version of Red Gate’s SQL Prompt installed (version 6.4), then you will have an even easier time writing your favorite INSERT statement. They added a new feature that highlights the column for the value you are modifying, and vice versa. Below you can see that my cursor is on line 61 and line 54 is highlighted. If I were to put my cursor on line 54, then line 61 would be highlighted.

 

SQLCoOp7_Image2

Don’t Stop Yet

Don’t forget to check out these blog posts by the rest of the SQL CoOp team on the subject of CRUD:

To follow our quest for SQL knowledge through this collaborative project, follow the #SQLCoOp tag on Twitter.

See you next time!!

On a SQL Collaboration Quest

Four SQL professionals gathered from the four corners of the world to share their SQL knowledge with each other and with their readers: Mickey Stuewe from California, USA, Chris Yates from Kentucky, USA, Julie Koesmarno from Canberra, Australia, and Jeffrey Verheul from Rotterdam, The Netherlands. They invite you to join them on their quest as they ask each other questions and seek out the answers in this collaborative blog series. Along the way, they will also include other SQL professionals to join in the collaboration.

Original Post: On a SQL Collaboration Quest

SQL Server Data Transferred to a SQLite Database Using SSIS

The Sixth Mission

In this installment of SQLCoOp, we are writing about SSIS. For me, this ties closely with the June SQLCoOp post I wrote, called Data Models, SQL Server, SQLite, and PowerShell. In that post I walked through how to create a data model in ER/Studio Data Architect and have it end up in a SQLite database. In this post I’ll show how to use SSIS to get master data from a SQL Server database and insert it into a newly created SQLite database that can then be deployed with client applications.


The Project

I needed a consistent way of pre-populating an empty SQLite database to be used in new builds of our application. So I turned the development SSIS package that moved test data from SQL Server into a SQLite database into a production SSIS package that pre-populates master data into a SQLite database.

Before you get started trying this example out, make sure that you have an ODBC driver installed for the flavor of SQLite that you are using. Here is a great resource for ODBC drivers. After the installation is finished, set up an ODBC connection to the SQLite database that you’ll be populating.
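If you would rather not create a named DSN, an ODBC connection can also be defined with a DSN-less connection string. The sketch below is an assumption based on the commonly used SQLite3 ODBC driver; the driver name and file path depend on which driver you installed and where your database file lives.

Driver={SQLite3 ODBC Driver};Database=C:\Data\MasterData.db;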

Step One

The first step was to run a DELETE statement for each of the tables that I’ll be populating. This allowed me to not worry about data left in the destination database from the last time I ran the package. I used an Execute SQL Task for each table. I set the Connection property to the ODBC connection I created, and I set the SQLStatement property to the corresponding DELETE statement.
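As a rough sketch, the SQLStatement for each task is just a plain DELETE against the destination table. The AddressType table name below is an assumption, borrowed from the stored procedure mentioned later in this post.

-- Clear the destination table so the package can be re-run safely.
DELETE FROM AddressType;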

SQLCoop6_Image1

 

SQLCoOp6_Image2

Step Two

For the next step I created a Data Flow task for each table. Inside each Data Flow task I created an ADO.Net Source and an ADO.Net Destination task.

For each of the tables I wanted to populate, I created a unique stored procedure in a utility database called SSISMigration. This allowed me to customize each stored procedure without worrying about the needs of the applications using similar stored procedures in the source database.
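Here is a minimal sketch of what one of these utility stored procedures might look like. The table and column names are assumptions for illustration (the MSSL schema comes from the EXEC statement below); the real procedures would return whatever master data each destination table needs.

CREATE PROCEDURE MSSL.GetAddressType
AS
BEGIN
     SET NOCOUNT ON;

     -- Return only the master data columns the SQLite destination table expects.
     SELECT
          AddressTypeID
          ,AddressTypeName
     FROM
          dbo.AddressType;
END;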

I set the ADO.Net Source connection to the SSISMigration database. I set the Data Access Mode to “SQL Command” and the SQL Command text to an execute statement such as “EXEC MSSL.GetAddressType”.

Side Note: If you want to pass a parameter to the stored procedure in a generic form, then check out the Expressions property of the parent Data Flow task.

 

SQLCoOp6_Image3

For the ADO.Net Destination task, set the Connection Manager property and select the table that you would like to have populated. Then click on the Mappings menu item to verify that the columns are mapped correctly between the Source and Destination tables.

SQLCoOp6_Image4

Step Three

Now I can connect each of my Execute SQL Tasks with their corresponding Data Flow tasks. This will ensure that the delete process occurs before the insert process.

SQLCoOp6_Image5

Step Four

The first time I built this process, I had all the tasks connected in one long chain, so everything ran one step at a time. I then learned that I could have multiple processes running at the same time. But what if I have one last process that needs to occur after all the other processes are finished? This is where the awesome Sequence Container comes into play. By placing all my delete/insert processes into the Sequence Container, I can have them run in parallel and still have a final process occur after they have all completed.

Step Five

Finally, I need to clean up after myself. Since I did some deleting and inserting, I want to make sure that my brand new SQLite database is as small as possible. SQLite has a command called VACUUM. This command rebuilds the database to remove any fragmentation caused by CUD operations. To run the VACUUM command, I use another Execute SQL Task. I set it up like I did in step one, with the SQL Statement set to “VACUUM;”.

Now I have three processes that run at the same time and one process that happens after the first three are completed.

SQLCoOp6_Image6

Step Six

Run the package. If I’m only making this “new” master copy of the database for development, then I might just run the package from Visual Studio on an as-needed basis. If I want this to be an automated process, then I can set the package up on SQL Server to run on a schedule.

But Wait, There’s More

My example here was for populating a new SQLite database with master data from SQL Server. I could use this same process for creating a subset of data for testing needs or for creating new SQL Server databases that need to be deployed in new environments.

Don’t Stop Yet

Don’t forget to check out these blog posts by the rest of the SQL CoOp team on the subject of SSIS:

To follow our quest for SQL knowledge through this collaborative project, follow the #SQLCoOp tag on Twitter.

See you next time!!

On a SQL Collaboration Quest

Four SQL professionals gathered from the four corners of the world to share their SQL knowledge with each other and with their readers: Mickey Stuewe from California, USA, Chris Yates from Kentucky, USA, Julie Koesmarno from Canberra, Australia, and Jeffrey Verheul from Rotterdam, The Netherlands. They invite you to join them on their quest as they ask each other questions and seek out the answers in this collaborative blog series. Along the way, they will also include other SQL professionals to join in the collaboration.

Original Post: On a SQL Collaboration Quest

Fall Speaking Engagements

A year ago I had no intention of speaking this year. I didn’t think I was mentally ready for public speaking once again. Boy, was I wrong. Not only was I ready, but I’ve become addicted to speaking at various SQL events. I even found a SQL Saturday to speak at during my family vacation in Louisville, Kentucky. Between September and October I will be speaking at 5 venues and moderating two Women in Technology (WIT) panels. That will be a total of 8 sessions. Here is a bit about each of them.

SQL Saturday #249 takes place in San Diego, CA on Saturday, September 20th. This year, instead of a pre-con the day before, Red Gate will be hosting a half-day event called SQL in the City Seminar.

Dev Connections takes place in Las Vegas, NV between September 30th and October 4th at Mandalay Bay. I’m very excited about my two brand new presentations.

SQL in the City takes place in three different cities in October. I will be speaking at two of them. I’ll be in Pasadena, CA on October 9th and I’ll be in Charlotte, NC on October 14th the week of the PASS Summit.

PASS Summit takes place in Charlotte, NC between October 15th and 18th. This is the most amazing conference for the SQL community. While I’m not speaking at it this year, I am participating in other ways including being a buddy to a new attendee or two.

SQL Saturday #237 also takes place in Charlotte, NC on Saturday October 19th.

This is my last SQL Saturday of the year and it is the “BI Edition”, so I had to do my session, Scalable SSRS Reports Achieved Through the Powerful Tablix.

I’m looking forward to these sessions and to meeting new people as well as reconnecting with my old friends. I hope to find you in my sessions and that you have the time to introduce yourself. I really enjoy meeting new people and sharing the knowledge that I have.

SQL on my friends.

Using Set Theory Instead of ISNULL To Filter Data

I have written dozens upon dozens of reports for my users. I love the challenge of finding the data and wrangling it into a report in a way that doesn’t take a hundred years to run when they decide to return a year’s worth of data. One of the common requests that I get is to provide a parameter that allows them to choose a single client or all clients. At first blush this is a very simple request, and it can be accomplished by using the ISNULL function.

ISNULL - ISNULL Code
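The original code is shown in the image above. As a sketch of the pattern (the table, column, and parameter names here are assumptions), the predicate looks something like this:

DECLARE @ClientID int = NULL;    -- NULL means "return all clients"

SELECT
     ClientID
     ,ClientName
FROM
     dbo.Client
WHERE
     ClientID = ISNULL(@ClientID, ClientID);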

Unfortunately, there are performance implications to using the ISNULL function this way. A predicate built on the ISNULL function is non-sargable. This means that when the function is used as part of a predicate, the optimizer can’t use the index on that column in an optimal way to filter the data. In this example, the Execution Plan shows that a full index scan was used to return one row. This equated to 80 logical reads for my dataset, and it is the same number of logical reads whether one row or all rows were returned. Eighty logical reads may not be that big a deal, but what about on a table that has hundreds of thousands of rows?

ISNULL - ISNULL Execution Plan
Execution Plan

ISNULL - ISNULL Reads 1
Statistics on the Table I/O Tab in SQL Sentry Plan Explorer Pro

 

An Alternate Universe

There is an alternate way of accomplishing this same request, and it uses set theory to solve the problem. I can separate the ISNULL logic into two different SQL statements and use UNION ALL to join the two result sets into one. Even though I’m using two different SQL statements, only one will return data during the execution of the logic.

ISNULL - Union All Code
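Again, the original code is in the image above. Here is a sketch of the same pattern, using the assumed names from the previous sketch:

DECLARE @ClientID int = 7890;    -- set to NULL to return all clients

SELECT
     ClientID
     ,ClientName
FROM
     dbo.Client
WHERE
     ClientID = @ClientID         -- returns rows only when @ClientID has a value

UNION ALL

SELECT
     ClientID
     ,ClientName
FROM
     dbo.Client
WHERE
     @ClientID IS NULL;           -- returns all rows only when @ClientID is NULL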

The first SELECT statement returns rows when the @ClientID variable has a value other than NULL. The second SELECT statement returns rows when @ClientID is NULL. By using UNION ALL instead of UNION, we skip the work of checking the combined result set for duplicate rows.

The Execution Plan now handles both cases, and the logical reads change based on the value of @ClientID. There are still 80 logical reads when @ClientID is NULL, because the whole table is being returned, but the logical reads are reduced to two when @ClientID is set to a single client number.

 

ISNULL - Union All Execution Plan 1
Execution Plan when @ClientID = 7890

ISNULL - Union All Reads 1
Statistics when @ClientID = 7890

ISNULL - Union All Execution Plan 2
Execution plan when @ClientID IS NULL

ISNULL - Union All Reads 2
Statistics when @ClientID IS NULL

 

In this example I was able to use UNION ALL to filter on one client or all clients, but it isn’t always the best solution. If the table being filtered fits on one page, then a table scan will occur regardless of how the predicate is set up. In that case, the ISNULL approach is easier to maintain, so it would be the better solution.

Conclusion

It is always best to take a look at the Execution Plan of a SQL statement to see if there are any index scans on filtered data. If there are, you should look to see whether there is another, set-based approach to solve the problem.

T-SQL Tuesday #37 – RIGHT JOIN, LEFT JOIN, raw, raw, raw

This month’s T-SQL Tuesday blog party is being hosted by Sebastian Meine, PhD (blog). The topic is JOINs. I’ve chosen to blog about how to rewrite a RIGHT JOIN I found this week in a stored procedure. The query was simple, but it was difficult to read without creating a data model and understanding the relationships of the tables, so I decided to rewrite it to make it easier to maintain in the future. I have created a sample data model to demonstrate the problem and the solution. The data model has a table for Sales Reps, a table for Regions which the Sales Reps optionally belong to, a list of Clients, and a linking table which links the Clients and the Sales Reps.

The query that was created returns all the Clients and any associated Sales Reps with their regions, even if the Sales Rep is not assigned to a region. The original programmer used a RIGHT JOIN to join the SalesRep table to the ClientSalesRep table. They also put the join predicate for the ClientSalesRep table after the SalesRep/SalesRegion join, creating a nested join. While the optimizer has no problem reading this query, I had to stand on my head to figure it out.

SELECT
   c.ClientID
   ,c.ClientName
   ,sr.Region
   ,srep.FirstName
   ,srep.LastName
FROM
   dbo.Client AS c
   LEFT JOIN dbo.ClientSalesRep AS cus ON c.ClientID = cus.ClientID
   LEFT JOIN dbo.SalesRegion AS sr
   RIGHT JOIN dbo.SalesRep AS srep ON srep.SalesRegionID = sr.SalesRegionID
										ON cus.SalesRepID = srep.SalesRepID
GO

I rewrote the query using only LEFT JOINs, with each join having its own predicate right next to it. I found the LEFT JOINs made the query easier to read and didn’t give me a headache.

SELECT
   c.ClientID
   ,c.ClientName
   ,sr.Region
   ,srep.FirstName
   ,srep.LastName
FROM
   dbo.Client AS c
   LEFT JOIN dbo.ClientSalesRep AS cus ON c.ClientID = cus.ClientID
   LEFT JOIN dbo.SalesRep AS srep ON cus.SalesRepID = srep.SalesRepID
   LEFT JOIN dbo.SalesRegion AS sr ON srep.SalesRegionID = sr.SalesRegionID

GO

I populated the tables with a million rows to see if the optimizer would treat these queries differently. It didn’t. They had the same query plan, the same number of reads, and the same statistics, but the rewritten query was easier to read.

Thanks go to Sebastian for hosting T-SQL Tuesday this month. Check out Sebastian’s blog, because he is blogging about JOINS all month.

The ROW_NUMBER Function As An Alternate To The MAX Function

It has taken a month of fussing over my new blog, but I finally made my first SQL entry. Since I’m excited about the upcoming SQL PASS conference, I thought I would show a fictitious problem about employees and their interest in SQL conferences.

Problem: You are given two tables. The first table contains employees. The second table contains all the SQL Conferences each employee has been interested in along with the date they showed interest in the conference and whether or not they are still interested. You are asked to find the last SQL Conference that was added for each employee. Only conferences the employees are still interested in should be included, and only one conference per employee should be listed. The returned data should be ordered by the employee’s last name and first name.

Employee and Interest Data Model

The first solution that came to mind was to use the MAX function on the InterestAddDate field to find the last added interest. There are two issues with this approach, though.

1. In order to get the Activity field returned, the Interest table has to be joined a second time on the MAX(InterestAddDate).
2. Multiple rows will be returned if the employee showed interest in two SQL conferences on the same date. While this could be a valid result set, in this case only one activity should be returned.


WITH CTE_InterestsByMax
AS
(
      SELECT
         EmployeeID
         ,MAX(InterestAddDate) AS LastInterestAddDate
      FROM
            dbo.Interest AS i
      WHERE
            isActive = 1
      GROUP BY
            EmployeeID
)
SELECT
      e.FirstName + ' ' + e.LastName AS EmployeeName
      ,i.InterestAddDate
      ,i.Activity
FROM
      CTE_InterestsByMax AS im
      JOIN dbo.Interest AS i ON im.LastInterestAddDate = i.InterestAddDate
                                                AND im.EmployeeID = i.EmployeeID
      JOIN dbo.Employee AS e ON im.EmployeeID = e.EmployeeID
ORDER BY
      e.LastName
      ,e.FirstName

Solution: To address these two issues, I used a Common Table Expression (CTE) and the ROW_NUMBER function. This function numbers each row with a unique sequential number based on the OVER clause. Inside the OVER clause, I order the data by the InterestAddDate field in descending order. Since I want to find the last SQL conference of interest for each employee, I add PARTITION BY EmployeeID to the OVER clause. This causes the ROW_NUMBER function to start over for each EmployeeID. Since I’m not using an aggregate function, I can return all the data from the Interest table that I need.

In the next part of the query, I join the CTE to the Employee table and add a WHERE clause. Since I ordered each partition in descending order, I know that the first row of each partition will have a RowIndex of 1. I can now filter my data by RowIndex = 1.

WITH CTE_InterestsByRow_Number
AS
(
      SELECT
         i.EmployeeID
         ,i.InterestAddDate
         ,i.Activity
         ,ROW_NUMBER() OVER (PARTITION BY i.EmployeeID ORDER BY i.InterestAddDate DESC) AS RowIndex
      FROM
         dbo.Interest AS i
      WHERE
         i.IsActive = 1
 )
SELECT
      e.FirstName + ' ' + e.LastName AS EmployeeName
      ,i.InterestAddDate
      ,i.Activity
FROM
      CTE_InterestsByRow_Number AS i
      JOIN dbo.Employee AS e ON i.EmployeeID = e.EmployeeID
WHERE
      rowindex = 1
ORDER BY
      e.LastName
      ,e.FirstName

When I looked at the logical reads for these two queries, the query using the MAX function had twice as many logical reads as the query with the ROW_NUMBER function. When I looked at the Execution Plan for both queries, I found the query using the MAX function had a higher query cost relative to the batch. For my first run, I used 20 employees and 40 interests. For the second run, I used 1,000 employees and 4,000 interests. I found that the query cost for the query using the MAX function increased with the larger dataset.

Execution Plan
