
SQL Convert Date functions and formats

In this article, we will explore various SQL Convert Date formats to use in writing SQL queries.
We often need to work with date-type data in SQL, and it can be a complicated thing to deal with, at times, for SQL Server developers. Suppose you have a Product table with a timestamp column that records a timestamp for each customer order. You might face the following issues with it:
 You fail to insert data into the Product table because the application tries to insert data in a different date format
 Suppose you have data in a table in the format YYYY-MM-DD hh:mm:ss. You have a daily Sales report in which you want data grouped by date, and you want the dates in the report in the format YYYY-MM-DD
We face many such scenarios in which we do not have a date format that matches our requirement. We cannot change table properties to satisfy each requirement. In these cases, we need to use the built-in functions in SQL Server to produce the required date format.

Data Types for Date and Time


We have the following date and time data types in SQL Server.
Date type Format

Time hh:mm:ss[.nnnnnnn]

Date YYYY-MM-DD

SmallDateTime YYYY-MM-DD hh:mm:ss

DateTime YYYY-MM-DD hh:mm:ss[.nnn]

DateTime2 YYYY-MM-DD hh:mm:ss[.nnnnnnn]

DateTimeOffset YYYY-MM-DD hh:mm:ss[.nnnnnnn] [+|-]hh:mm
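
These types can be declared and populated directly from the current server time; a minimal sketch (the variable names and the choice of GETDATE()/SYSDATETIME()/SYSDATETIMEOFFSET() as sources are illustrative, not from the original article):

DECLARE @d date = GETDATE()
DECLARE @t time = GETDATE()
DECLARE @sdt smalldatetime = GETDATE()
DECLARE @dt2 datetime2 = SYSDATETIME()
DECLARE @dto datetimeoffset = SYSDATETIMEOFFSET()
-- Each value is rendered in the format shown in the table above
SELECT @d AS [Date], @t AS [Time], @sdt AS [SmallDateTime], @dt2 AS [DateTime2], @dto AS [DateTimeOffset]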


In SQL Server, we can use built-in functions such as GETDATE() and GETUTCDATE() to retrieve the server date and time in various formats.
 SYSDATETIME(): Returns the server's date and time
 SYSDATETIMEOffset(): Returns the server's date and time, along with the UTC offset
 GETUTCDATE(): Returns the date and GMT (Greenwich Mean Time) time
 GETDATE(): Returns the server date and time
Execute the following queries to get output in respective formats.
A.
Select SYSDATETIME() as [SYSDATETIME]
B.
Select SYSDATETIMEOffset() as [SYSDATETIMEOffset]
C.
Select GETUTCDATE() as [GETUTCDATE]
D.
Select GETDATE() as [GETDATE]
SQL Convert Date Formats
As highlighted earlier, we might need to format a date in different formats as per our requirements. We
can use the SQL CONVERT() function in SQL Server to format DateTime in various formats.
Syntax for the SQL CONVERT() function is as follows.
SELECT CONVERT(data_type(length), Date, DateFormatCode)
 Data_Type: We need to define the data type along with its length. For date conversions, we use the Varchar(length) data type
 Date: We need to specify the date that we want to convert
 DateFormatCode: We need to specify the DateFormatCode to convert a date to the appropriate form. We will explore more on this in the upcoming section
Let us explore various date formats using SQL convert date functions.
First, we declare a variable to hold current DateTime using the SQL GETDATE() function with the
following query.
declare @Existingdate datetime
Set @Existingdate=GETDATE()
Print @Existingdate

We can see various date formats in the following table. You can keep this table handy as a reference for formatting date and time columns.

Each row of the reference table below uses the same query pattern: declare a datetime variable, assign it the current date and time, and convert it with a specific format code. For example, for format code 1 (U.S.A. standard, MM/DD/YY):

declare @Existingdate datetime
Set @Existingdate=GETDATE()
Select CONVERT(varchar,@Existingdate,1) as [MM/DD/YY]

Replacing the format code in the CONVERT call produces the following formats.

Format code   Date and time format                 Standard
1             MM/DD/YY                             U.S.A.
2             YY.MM.DD                             ANSI
3             DD/MM/YY                             British/French
4             DD.MM.YY                             German
5             DD-MM-YY                             Italian
6             DD MMM YY                            Shortened month name
7             MMM DD, YY                           Shortened month name
8             hh:mm:ss                             24-hour time
9             MMM DD YYYY hh:mm:ss:mmm(AM/PM)      Default + milliseconds
10            MM-DD-YY                             USA
11            YY/MM/DD                             Japan
12            YYMMDD                               ISO
13            DD MMM YYYY HH:MM:SS:MMM             Europe default + milliseconds
14            HH:MM:SS:MMM                         24-hour time with milliseconds
20            YYYY-MM-DD HH:MM:SS                  ODBC canonical
21            YYYY-MM-DD HH:MM:SS.mmm              ODBC canonical with milliseconds
22            MM/DD/YY hh:mm:ss (AM/PM)            USA with time AM/PM
23            YYYY-MM-DD                           (not specified)
24            hh:mm:ss                             (not specified)
27            MM-DD-YYYY hh:mm:ss.mmm              (not specified)
100           MMM DD YYYY hh:mm(AM/PM)             Default
101           MM/DD/YYYY                           USA
102           YYYY.MM.DD                           ANSI
103           DD/MM/YYYY                           British/French
104           DD.MM.YYYY                           German
105           DD-MM-YYYY                           Italian
106           DD MMM YYYY                          Shortened month name
107           MMM DD, YYYY                         Shortened month name
108           HH:MM:SS                             24-hour time
109           MMM DD YYYY hh:mm:ss:mmm(AM/PM)      Default + milliseconds
110           MM-DD-YYYY                           USA
111           YYYY/MM/DD                           Japan
112           YYYYMMDD                             ISO
113           DD MMM YYYY HH:MM:SS:MMM             Europe default + milliseconds
114           HH:MM:SS:MMM                         24-hour time with milliseconds
120           YYYY-MM-DD HH:MM:SS                  ODBC canonical
121           YYYY-MM-DD HH:MM:SS.mmm              ODBC canonical with milliseconds
126           yyyy-mm-ddThh:mi:ss.mmm              ISO8601
130           dd mon yyyy hh:mi:ss:mmm(AM/PM)      Islamic/Hijri date

The table above shows the various formats available for SQL convert date as per your requirements. In the following table, you can see several of these date format options together with sample output for the same underlying value.
Date format option   SQL convert date output

0                    Dec 30 2006 12:38AM

1                    12/30/06

2                    06.12.30

3                    30/12/06

4                    30.12.06

5                    30-12-06

6                    30 Dec 06
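As a quick check, several style codes can be compared side by side in a single query; a minimal sketch (the varchar length of 30 is an arbitrary choice large enough for each format):

SELECT CONVERT(varchar(30), GETDATE(), 1)   AS [Style 1],
       CONVERT(varchar(30), GETDATE(), 103) AS [Style 103],
       CONVERT(varchar(30), GETDATE(), 112) AS [Style 112],
       CONVERT(varchar(30), GETDATE(), 121) AS [Style 121]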
Let us next explore a function that is useful for SQL convert date.

DATEADD
We can use the SQL DATEADD function to add a particular period to a date. Suppose we have a requirement to add 1 month to the current date. We can use the SQL DATEADD function to do this task.
The syntax for the SQL DATEADD function is as follows:
DATEADD(interval, number, date)
Interval: We specify the interval that needs to be added to the specified date. We can have values such as year, quarter, month, day, week, hour, minute, etc.
Number: It specifies the number of intervals to add. For example, if we have specified the interval as month and the number as 2, it means 2 months need to be added to the date.
Date: It specifies the date to which the interval is added.
In the following query, we want to add 2 months to the current date.
SELECT GETDATE() as Currentdate

SELECT DATEADD(month, 2, GETDATE()) AS NewDate;
You can see the output in the following screenshot.

Similarly, let us add 1 year to the current date using the following query.
select GETDATE() as Currentdate

SELECT DATEADD(Year, 1, GETDATE()) AS NewDate;
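DATEADD also accepts a negative number, which subtracts the interval instead of adding it; a small sketch (not from the original article):

SELECT DATEADD(month, -2, GETDATE()) AS TwoMonthsAgo;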

We can combine the SQL DATEADD and CONVERT functions to get output in the desired DateTime format. Suppose, in the previous example, we want a date format of MMM DD, YYYY. We can use the format code 107 to get output in this format.
Execute the following code to get the NewDate and ConvertedDate output.
SELECT
DATEADD(YEAR,1,GETDATE()) AS [NewDate]
,CONVERT(varchar(110),DATEADD(YEAR,1,GETDATE()),107) AS [ConvertedDate]

Conclusion
In this article, we explored various SQL convert date formats. The CONVERT function makes it easy to get a date in the required format. You can use this article as a reference for the date formats to use in your queries.

SQL Variables: Basics and usage


November 18, 2019 by Esat Erkec
In this article, we will learn the notions and usage details of SQL variables. In SQL Server, local variables are used to store data during the batch execution period. Local variables can be created for different data types and can also be assigned values. Additionally, the assigned value of a variable can be changed during the execution period. The life cycle of a variable starts at the point where it is declared and ends at the end of the batch. On the other hand, if a variable is used in a stored procedure, its scope is limited to that stored procedure. In the next sections, we will reinforce this theoretical information with various examples.
Note: In this article's examples, the sample AdventureWorks database is used.
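For instance, a variable declared inside a stored procedure cannot be referenced after the procedure ends; a minimal sketch (the procedure and variable names are made up for illustration):

CREATE PROCEDURE TestVariableScope
AS
BEGIN
    -- @ProcVariable exists only while the procedure executes
    DECLARE @ProcVariable AS VARCHAR(100) = 'Visible only inside the procedure'
    PRINT @ProcVariable
END
GO
EXEC TestVariableScope
-- Referencing @ProcVariable here would raise a "Must declare the scalar variable" error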

SQL Variable declaration


The following syntax defines how to declare a variable:
DECLARE { @LOCAL_VARIABLE data_type [ = value ] }
Now, let's interpret the above syntax.
Firstly, if we want to use a variable in SQL Server, we have to declare it. The DECLARE statement is used to declare a variable in SQL Server. In the second step, we have to specify the name of the variable. Local variable names have to start with an at (@) sign; this is a syntax requirement. Finally, we define the data type of the variable. The value argument indicated in the syntax is an optional parameter that helps to assign an initial value to the variable during the declaration. We can also assign or replace the value of the variable in later steps of the batch. If we don't assign an initial value to a variable, it is initialized as NULL.
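A quick way to verify this NULL initialization; a minimal sketch (the variable name is illustrative):

DECLARE @UninitializedVar AS INT
-- Returns NULL because no initial value was assigned
SELECT @UninitializedVar AS VarValue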
The following example will declare a variable named @TestVariable with the varchar data type. At the same time, we will assign an initial value, which is 'Save Our Planet':
DECLARE @TestVariable AS VARCHAR(100)='Save Our Planet'
PRINT @TestVariable

Assigning a value to SQL Variable


SQL Server offers two different methods to assign values to variables, in addition to the initial value assignment. The first option is to use the SET statement and the second one is to use the SELECT statement. In the following example, we will declare a variable and then assign a value with the help of the SET statement:
DECLARE @TestVariable AS VARCHAR(100)
SET @TestVariable = 'One Planet One Life'
PRINT @TestVariable

In the following example, we will use the SELECT statement in order to assign a value to a variable:
DECLARE @TestVariable AS VARCHAR(100)
SELECT @TestVariable = 'Save the Nature'
PRINT @TestVariable

Additionally, the SELECT statement can be used to assign a value to a variable from a table, view, or scalar-valued function. Now, we will take a glance at this usage concept through the following example:
DECLARE @PurchaseName AS NVARCHAR(50)
SELECT @PurchaseName = [Name]
FROM [Purchasing].[Vendor]
WHERE BusinessEntityID = 1492
PRINT @PurchaseName

As can be seen, the @PurchaseName value has been assigned from the Vendor table.
Now, we will assign a value to a variable from a scalar-valued function:
DECLARE @StockVal AS INT
SELECT @StockVal=dbo.ufnGetStock(1)
SELECT @StockVal AS [VariableVal]

Multiple SQL Variables


For different cases, we may need to declare more than one variable. We can do this by declaring each variable individually and assigning a value to each one:
DECLARE @Variable1 AS VARCHAR(100)
DECLARE @Variable2 AS UNIQUEIDENTIFIER
SET @Variable1 = 'Save Water Save Life'
SET @Variable2= '6D8446DE-68DA-4169-A2C5-4C0995C00CC1'
PRINT @Variable1
PRINT @Variable2

This way is tedious and inconvenient. However, we have a more efficient way to declare multiple
variables in one statement. We can use the DECLARE statement in the following form so that we can
assign values to these variables in one SELECT statement:
DECLARE @Variable1 AS VARCHAR(100), @Variable2 AS UNIQUEIDENTIFIER
SELECT @Variable1 = 'Save Water Save Life' , @Variable2= '6D8446DE-68DA-4169-A2C5-4C0995C00CC1'
PRINT @Variable1
PRINT @Variable2

Also, we can use a SELECT statement in order to assign values from tables to multiple variables:
DECLARE @VarAccountNumber AS NVARCHAR(15)
,@VariableName AS NVARCHAR(50)
SELECT @VarAccountNumber=AccountNumber , @VariableName=Name
FROM [Purchasing].[Vendor]
WHERE BusinessEntityID = 1492
PRINT @VarAccountNumber
PRINT @VariableName
Useful tips about the SQL Variables
Tip 1: As we mentioned before, the local variable scope expires at the end of the batch. Now, we will
analyze the following example of this issue:
DECLARE @TestVariable AS VARCHAR(100)
SET @TestVariable = 'Think Green'
GO
PRINT @TestVariable

The above script generates an error because of the GO statement. The GO statement marks the end of a batch in SQL Server, so the @TestVariable life cycle ends at the GO statement line. A variable declared above the GO statement line cannot be accessed below it. However, we can overcome this issue by carrying the variable value with the help of a temporary table:
IF OBJECT_ID('tempdb..#TempTbl') IS NOT NULL DROP TABLE #TempTbl
DECLARE @TestVariable AS VARCHAR(100)
SET @TestVariable = 'Hello World'
SELECT @TestVariable AS VarVal INTO #TempTbl
GO
DECLARE @TestVariable AS VARCHAR(100)
SELECT @TestVariable = VarVal FROM #TempTbl
PRINT @TestVariable

Tip 2: Assume that we assigned a value from a table to a variable and the result set of the SELECT statement returns more than one row. The question at this point is which row's value is assigned to the variable. In this circumstance, the value assigned to the variable comes from the last row of the result set. In the following example, the last row of the result set will be assigned to the variable:
SELECT AccountNumber
FROM [Purchasing].[Vendor]
ORDER BY BusinessEntityID

DECLARE @VarAccountNumber AS NVARCHAR(15)
SELECT @VarAccountNumber=AccountNumber
FROM [Purchasing].[Vendor]
order by BusinessEntityID
SELECT @VarAccountNumber AS VarValue
Tip 3: If the declared data type of the variable and the data type of the assigned value do not match, SQL Server makes an implicit conversion in the value assignment process, if possible. The lower-precedence data type is converted to the higher-precedence data type by SQL Server, but this operation may lead to data loss. In the following example, we will assign a float value to a variable whose data type has been declared as an integer:
DECLARE @FloatVar AS FLOAT = 12312.1232
DECLARE @IntVar AS INT
SET @IntVar=@FloatVar
PRINT @IntVar

Conclusion
In this article, we have explored the concept of SQL variables from different perspectives, and we also
learned how to define a variable and how to assign a value(s) to it.

SQL PARTITION BY Clause overview
April 9, 2019 by Rajendra Gupta

This article will cover the SQL PARTITION BY clause and, in particular, the difference with GROUP BY in a
select statement. We will also explore various use cases of SQL PARTITION BY.
We use SQL PARTITION BY to divide the result set into partitions and perform computation on each
subset of partitioned data.

Preparing Sample Data


Let us create an Orders table in my sample database SQLShackDemo and insert records to write further
queries.
Use SQLShackDemo
Go
CREATE TABLE [dbo].[Orders]
(
    [orderid] INT,
    [Orderdate] DATE,
    [CustomerName] VARCHAR(100),
    [Customercity] VARCHAR(100),
    [Orderamount] MONEY
)
I used ApexSQL Generate to insert sample data for this article. Right-click on the Orders table and select Generate test data.

It launches ApexSQL Generate. I generated a script to insert data into the Orders table. Execute this script to insert 100 records in the Orders table.
USE [SQLShackDemo]
GO
INSERT [dbo].[Orders]  VALUES (216090, CAST(N'1826-12-19' AS Date), N'Edward', N'Phoenix', 4713.8900)
GO
INSERT [dbo].[Orders]  VALUES (508220, CAST(N'1826-12-09' AS Date), N'Aria', N'San Francisco', 9832.7200)
GO
…
Once we execute insert statements, we can see the data in the Orders table in the following image.
We use SQL GROUP BY clause to group results by specified column and use aggregate functions such as
Avg(), Min(), Max() to calculate required values.
Group By function syntax
SELECT expression, aggregate function ()
FROM tables
WHERE conditions
GROUP BY expression
Suppose we want to find the following values in the Orders table
 Minimum order value in a city
 Maximum order value in a city
 Average order value in a city
Execute the following query with GROUP BY clause to calculate these values.
SELECT Customercity,
       AVG(Orderamount) AS AvgOrderAmount,
       MIN(OrderAmount) AS MinOrderAmount,
       SUM(Orderamount) TotalOrderAmount
FROM [dbo].[Orders]
GROUP BY Customercity;
In the following screenshot, we can see Average, Minimum and maximum values grouped by
CustomerCity.

Now, we want to add the CustomerName and OrderAmount columns as well in the output. Let's add these columns in the select statement and execute the following code.
SELECT Customercity, CustomerName ,OrderAmount,
       AVG(Orderamount) AS AvgOrderAmount,
       MIN(OrderAmount) AS MinOrderAmount,
       SUM(Orderamount) TotalOrderAmount
FROM [dbo].[Orders]
GROUP BY Customercity;
Once we execute this query, we get an error message. In the SQL GROUP BY clause, we can use a column in the select statement only if it is used in the GROUP BY clause as well. It does not allow any column in the select clause that is not part of the GROUP BY clause.

We can use the SQL PARTITION BY clause to resolve this issue. Let us explore it further in the next
section.
SQL PARTITION BY
We can use the SQL PARTITION BY clause with the OVER clause to specify the column on which we
need to perform aggregation. In the previous example, we used Group By with CustomerCity column and
calculated average, minimum and maximum values.
Let us rerun this scenario with the SQL PARTITION BY clause using the following query.
SELECT Customercity,
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];
In the output, we get aggregated values similar to a GROUP By clause. You might notice a difference in
output of the SQL PARTITION BY and GROUP BY clause output.

Group By:
 We get a limited number of records using the Group By clause.
 It gives one row per group in the result set. For example, we get one result for each group of CustomerCity in the GROUP BY clause.

SQL PARTITION BY:
 We get all records in a table using the PARTITION BY clause.
 It gives aggregated columns with each record in the specified table.
 We have 15 records in the Orders table. In the query output of SQL PARTITION BY, we also get 15 rows along with Min, Max and average values.
In the previous example, we get an error message if we try to add a column that is not a part of the
GROUP BY clause.
We can add required columns in a select statement with the SQL PARTITION BY clause. Let us add
CustomerName and OrderAmount columns and execute the following query.
SELECT Customercity,
       CustomerName,
       OrderAmount,
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];
We get CustomerName and OrderAmount column along with the output of the aggregated function.
We also get all rows available in the Orders table.

In the following screenshot, you can see that for CustomerCity Chicago, it performs the aggregations (Avg, Min and Max) and gives the values in the respective columns.

Similarly, we can use other aggregate functions, such as COUNT, to find out the total number of orders in a particular city with the SQL PARTITION BY clause.
SELECT Customercity,
       CustomerName,
       OrderAmount,
       COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders,
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];
We can see order counts for a particular city. For example, we have two orders from Austin city; therefore, it shows a value of 2 in the CountofOrders column.
PARTITION BY clause with ROW_NUMBER()
We can use the SQL PARTITION BY clause with ROW_NUMBER() function to have a row number of each
row. We define the following parameters to use ROW_NUMBER with the SQL PARTITION BY clause.
 PARTITION BY column – In this example, we want to partition data on CustomerCity column
 Order By: In the ORDER BY column, we define a column or condition that defines row number. In
this example, we want to sort data on the OrderAmount column
SELECT Customercity,
       CustomerName,
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number",
       OrderAmount,
       COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders,
       AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
       MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
       SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
FROM [dbo].[Orders];
In the following screenshot, we can see that for CustomerCity Chicago, row number 1 is the order with the highest amount, 7577.90. It assigns row numbers in descending order of OrderAmount.

PARTITION BY clause with Cumulative total value


Suppose we want to get a cumulative total for the orders in a partition. Cumulative total should be of the
current row and the following row in the partition.
For example, in the Chicago city, we have four orders.
CustomerCity CustomerName Rank OrderAmount Cumulative Total Rows Cumulative Total

Chicago Marvin 1 7577.9 Rank 1+2 14777.51

Chicago Lawrence 2 7199.61 Rank 2+3 14047.21

Chicago Alex 3 6847.66 Rank 3+4 8691.49

Chicago Jerome 4 1843.83 Rank 4 1843.83


In the following query, we use the ROWS clause to select the current row (using CURRENT ROW) and the next row (using 1 FOLLOWING). It then calculates the sum of those rows using SUM(Orderamount) with a partition on CustomerCity (using OVER(PARTITION BY Customercity ORDER BY OrderAmount DESC)).
SELECT Customercity,
       CustomerName,
       OrderAmount,
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number",
       CONVERT(VARCHAR(20), SUM(orderamount) OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1) AS CumulativeTotal
FROM [dbo].[Orders];

Similarly, we can calculate the cumulative average using the following query with the SQL PARTITION
BY clause.
SELECT Customercity,
       CustomerName,
       OrderAmount,
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number",
       CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1) AS CumulativeAVG
FROM [dbo].[Orders];
ROWS UNBOUNDED PRECEDING with the PARTITION BY clause
We can use ROWS UNBOUNDED PRECEDING with the SQL PARTITION BY clause to extend the window frame from the first row of the partition to the current row. Because the rows are ordered by OrderAmount in descending order, the frame for each row covers the current row and every higher-value row in the partition.
In the following table, we can see that row 1 does not have any row with a higher value in its partition. Therefore, the cumulative average value for row 1 is the same as its OrderAmount.
For row 2, it looks at the current row value (7199.61) and the higher value of row 1 (7577.9). It calculates the average of these two amounts.
For row 3, it looks at the current value (6847.66) and the higher values 7199.61 and 7577.90. It calculates the average of these and returns it.

CustomerCity CustomerName Rank OrderAmount Cumulative Average Rows Cumulative Average

Chicago Marvin 1 7577.9 Rank 1 7577.90

Chicago Lawrence 2 7199.61 Rank 1+2 7388.76

Chicago Alex 3 6847.66 Rank 1+2+3 7208.39

Chicago Jerome 4 1843.83 Rank 1+2+3+4 5867.25


Execute the following query to get this result with our sample data.
SELECT Customercity,
       CustomerName,
       OrderAmount,
       ROW_NUMBER() OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC) AS "Row Number",
       CONVERT(VARCHAR(20), AVG(orderamount) OVER(PARTITION BY Customercity
       ORDER BY OrderAmount DESC ROWS UNBOUNDED PRECEDING), 1) AS CumulativeAvg
FROM [dbo].[Orders];
Conclusion
In this article, we explored the SQL PARTITION BY clause and its comparison with the GROUP BY clause. We also learned its usage with a few examples. I hope you find this article useful and feel free to ask any questions in the comments below.

Different ways to SQL delete duplicate rows from a SQL Table
August 30, 2019 by Rajendra Gupta

This article explains the process of performing SQL delete activity for duplicate rows from a SQL table.

Introduction
We should follow certain best practices while designing objects in SQL Server. For example, a table should have primary keys, identity columns, clustered and non-clustered indexes, and constraints to ensure data integrity and performance. Even when we follow these best practices, we might face issues such as duplicate rows. We might also get duplicate data in intermediate tables during data import, and we want to remove the duplicate rows before actually inserting them into the production tables.
Suppose your SQL table contains duplicate rows and you want to remove them. We face these issues many times. It is a best practice to use the relevant keys and constraints to eliminate the possibility of duplicate rows; however, if we already have duplicate rows in the table, we need to follow specific methods to clean up the duplicate data. This article explores the different methods to remove duplicate data from a SQL table.
Let’s create a sample Employee table and insert a few records in it.
CREATE TABLE Employee
    (
    [ID] INT identity(1,1),
    [FirstName] Varchar(100),
    [LastName] Varchar(100),
    [Country] Varchar(100)
    )
    GO

    Insert into Employee ([FirstName],[LastName],[Country] )values('Raj','Gupta','India'),
                                ('Raj','Gupta','India'),
                                ('Mohan','Kumar','USA'),
                                ('James','Barry','UK'),
                                ('James','Barry','UK'),
                                ('James','Barry','UK')
In the table, we have a few duplicate records, and we need to remove them.
SQL delete duplicate Rows using Group By and having clause
In this method, we use the SQL GROUP BY clause to identify the duplicate rows. The Group By clause
groups data as per the defined columns and we can use the COUNT function to check the occurrence of
a row.
For example, execute the following query, and we get those records having occurrence greater than 1 in
the Employee table.
SELECT [FirstName],
    [LastName],
    [Country],
    COUNT(*) AS CNT
FROM [SampleDB].[dbo].[Employee]
GROUP BY [FirstName],
      [LastName],
      [Country]
HAVING COUNT(*) > 1;

In the output above, we have duplicate records for IDs 1 and 3.
 Emp ID 1 has two occurrences in the Employee table
 Emp ID 3 has three occurrences in the Employee table
We need to keep a single row and remove only the duplicate rows from the table. For example, EmpID 1 appears two times in the table; we want to remove only one occurrence of it.
We use the SQL MAX function to calculate the max id of each data row.
SELECT *
    FROM [SampleDB].[dbo].[Employee]
    WHERE ID NOT IN
    (
        SELECT MAX(ID)
        FROM [SampleDB].[dbo].[Employee]
        GROUP BY [FirstName],
                 [LastName],
                 [Country]
    );
In the following screenshot, we can see that the above Select statement excludes the max ID of each duplicate row and we get only the minimum ID values.
To remove this data, replace the first Select with the SQL delete statement as per the following query.
DELETE FROM [SampleDB].[dbo].[Employee]
    WHERE ID NOT IN
    (
        SELECT MAX(ID) AS MaxRecordID
        FROM [SampleDB].[dbo].[Employee]
        GROUP BY [FirstName],
                 [LastName],
                 [Country]
    );
Once you execute the delete statement, perform a select on an Employee table, and we get the following
records that do not contain duplicate rows.

SQL delete duplicate Rows using Common Table Expressions (CTE)
We can use Common Table Expressions, commonly known as CTE, to remove duplicate rows in SQL Server. It is available starting from SQL Server 2005.
We use the SQL ROW_NUMBER function, which adds a unique sequential row number to each row.
In the following CTE, it partitions the data using the PARTITION BY clause on the [Firstname], [Lastname] and [Country] columns and generates a row number for each row.
WITH CTE([firstname],
    [lastname],
    [country],
    duplicatecount)
AS (SELECT [firstname],
           [lastname],
           [country],
           ROW_NUMBER() OVER(PARTITION BY [firstname],
                                          [lastname],
                                          [country]
           ORDER BY id) AS DuplicateCount
    FROM [SampleDB].[dbo].[employee])
SELECT *
FROM CTE;
In the output, if any row has the value of [DuplicateCount] column greater than 1, it shows that it is a
duplicate row.

We can remove the duplicate rows using the following CTE.


WITH CTE([FirstName],
    [LastName],
    [Country],
    DuplicateCount)
AS (SELECT [FirstName],
           [LastName],
           [Country],
           ROW_NUMBER() OVER(PARTITION BY [FirstName],
                                          [LastName],
                                          [Country]
           ORDER BY ID) AS DuplicateCount
    FROM [SampleDB].[dbo].[Employee])
DELETE FROM CTE
WHERE DuplicateCount > 1;
It removes the rows having a [DuplicateCount] value greater than 1.

RANK function to SQL delete duplicate rows


We can use the SQL RANK function to remove the duplicate rows as well. Because we order by the unique ID column, the RANK function gives a unique rank to each row within its partition, even for duplicate rows.
In the following query, we use a RANK function with the PARTITION BY clause. The PARTITION BY clause
prepares a subset of data for the specified columns and gives rank for that partition.
SELECT E.ID,
    E.firstname,
    E.lastname,
    E.country,
    T.rank
FROM [SampleDB].[dbo].[Employee] E
  INNER JOIN
(
SELECT *,
        RANK() OVER(PARTITION BY firstname,
                                 lastname,
                                 country
        ORDER BY id) rank
FROM [SampleDB].[dbo].[Employee]
) T ON E.ID = t.ID;

In the screenshot, you can note that we need to remove the rows having a rank greater than one. Let's remove those rows using the following query.
DELETE E
    FROM [SampleDB].[dbo].[Employee] E
         INNER JOIN
    (
        SELECT *,
               RANK() OVER(PARTITION BY firstname,
                                        lastname,
                                        country
               ORDER BY id) rank
        FROM [SampleDB].[dbo].[Employee]
    ) T ON E.ID = t.ID
    WHERE rank > 1;

Use SSIS package to SQL delete duplicate rows


SQL Server Integration Services provides various transformations and operators that help both administrators and developers reduce manual effort and optimize tasks. An SSIS package can remove the duplicate rows from a SQL table as well.
Use the Sort operator in an SSIS package for removing duplicate rows
We can use a Sort operator to sort the values in a SQL table. You might ask how data sorting can remove duplicate rows.
Let's create the SSIS package to show this task.
Let’s create the SSIS package to show this task.
 In SQL Server Data Tools, create a new Integration Services package. In the new package, add an OLE DB source connection
 Open the OLE DB source editor, configure the source connection, and select the source table

 Click on Preview data, and you can see we still have duplicate data in the source table

 Add a Sort operator from the SSIS toolbox for the SQL delete operation and join it with the source data
To configure the Sort operator, double-click on it and select the columns that contain duplicate values. In our case, the duplicate values are in the [FirstName], [LastName], [Country] columns.
We can also use ascending or descending sorting types for the columns. The default sort method is ascending. In the sort order, we can choose the column sort order; sort order 1 shows the column which will be sorted first.

On the bottom left side, notice the checkbox Remove rows with duplicate sort values.
It will do the task of removing duplicate rows from the source data for us. Let's put a tick in this checkbox and click OK; it performs the SQL delete activity in the SSIS package.
Once we click OK, it returns to the data flow tab, and we can see the following SSIS package.

We could add a SQL Server destination to store the data after removing the duplicate rows, but here we only want to check whether the Sort operator is doing the task for us.
Add a SQL Multicast Transformation from the SSIS toolbox as shown below.

To view the distinct data, right-click on the connector between Sort and Multicast. Click on Enable Data
Viewer.
The overall SSIS package looks like below.

Execute the package to perform the SQL delete operation. It opens the Sort output data viewer at the Data Flow task. In this data viewer, you can see the distinct data after removing the duplicate values.
Close this, and the SSIS package shows as successfully executed.

Conclusion
In this article, we explored the process of SQL delete duplicate rows using various ways, such as T-SQL, CTE, and the SSIS package. You can use whichever method you feel comfortable with. However, I would suggest not implementing these procedures and packages on production data directly; you should test them in a lower environment first.

How to UPDATE from a SELECT statement in SQL Server
April 29, 2020 by Esat Erkec

In this article, we will learn different methods that are used to update the data in a table with the data of other tables. The UPDATE from SELECT query structure is the main technique for performing these updates.
An UPDATE query is used to change an existing row or rows in the database. UPDATE queries can change all of a table's rows, or we can limit the rows the update statement affects with the help of the WHERE clause. Mostly, we use constant values to change the data, as in the following structures.
The full update statement is used to change the whole table data with the same value.
UPDATE table
SET col1 = constant_value1 , col2 =  constant_value2 , colN = constant_valueN
The conditional update statement is used to change the data that satisfies the WHERE condition.
UPDATE table
SET col1 = constant_value1 , col2 =  constant_value2 , colN = constant_valueN
WHERE col = val
However, for different scenarios, this constant value usage may not be enough for us, and we need to use other tables' data in order to update our table. This type of update statement is a bit more complicated than the usual structures. In the following sections, we will learn how to write this type of update query with different methods, but first, we have to prepare our sample data. So let's do this.

Preparing the sample data


With the help of the following query, we will create Persons and AddressList tables and populate them
with some synthetic data. These two tables have a relationship through the PersonId column, meaning
that, in these two tables, the PersonId column value represents the same person.
CREATE TABLE dbo.Persons
( PersonId       INT
  PRIMARY KEY IDENTITY(1, 1) NOT NULL,
  PersonName     VARCHAR(100) NULL,
  PersonLastName VARCHAR(100) NULL,
  PersonPostCode VARCHAR(100) NULL,
  PersonCityName VARCHAR(100) NULL)

GO

CREATE TABLE  AddressList(
  [AddressId] [int]  PRIMARY KEY IDENTITY(1,1) NOT NULL,
  [PersonId] [int] NULL,
  [PostCode] [varchar](100) NULL,
  [City] [varchar](100) NULL)

GO

INSERT INTO Persons
(PersonName, PersonLastName )
VALUES
(N'Salvador', N'Williams'),
(N'Lawrence', N'Brown'),
( N'Gilbert', N'Jones'),
( N'Ernest', N'Smith'),
( N'Jorge', N'Johnson')

GO
INSERT INTO AddressList
(PersonId, PostCode, City)
VALUES
(1, N'07145', N'Philadelphia'),
(2, N'68443', N'New York'),
(3, N'50675', N'Phoenix'),
(4, N'96573', N'Chicago')

SELECT * FROM Persons
SELECT * FROM AddressList

UPDATE from SELECT: Join Method


In this method, the table to be updated is joined with the reference (secondary) table that contains the new row values. This way, we can access the matched data of the reference table based on the specified join type. Lastly, the columns to be updated are matched with the referenced columns, and the update process changes these column values.
In the following example, we will update the PersonCityName and PersonPostCode columns with the City and PostCode column data of the AddressList table.
UPDATE Persons
SET
Persons.PersonCityName=AddressList.City,
Persons.PersonPostCode=AddressList.PostCode
FROM Persons
INNER JOIN
AddressList
ON Persons.PersonId = AddressList.PersonId

After the execution of the update from a select query, the output of the Persons table will be as shown below:
SELECT * FROM Persons

Let's try to understand the above code:
We typed the name of the table to be updated after the UPDATE statement. After the SET keyword, we specified the column names to be updated, and we matched them with the referenced table columns. After the FROM clause, we retyped the name of the table to be updated. After the INNER JOIN clause, we specified the referenced table and joined it to the table being updated. In addition to this, we can specify a WHERE clause and filter any columns of the referenced or updated table. We can also rewrite the query by using aliases for tables:
UPDATE Per
SET
    Per.PersonCityName=Addr.City,
    Per.PersonPostCode=Addr.PostCode
FROM Persons Per
INNER JOIN
AddressList Addr
ON Per.PersonId = Addr.PersonId

Performance Tip:
Indexes are very helpful database objects for improving query performance in SQL Server. Particularly if we are working on the performance of an update query, we should take this into account. The following execution plan illustrates the execution plan of the previous query; the only difference is that this query updated 3,000,000 rows of the Persons table. This query completed within 68 seconds.
We then added a non-clustered index on the Persons table before the update, with the PersonCityName and PersonPostCode columns as the index key.
The following execution plan demonstrates the execution plan of the same query, but this one completed within 130 seconds because of the added index, unlike the first one.

The Index Update and Sort operators consume 74% of the cost of the execution plan. We have seen this obvious performance difference between the same query because of index usage on the updated columns. As a result, if the updated columns are used by indexes, as in this example, query performance might be affected negatively. In particular, we should consider this problem if we will update a large number of rows. To overcome this issue, we can disable or remove the index before executing the update query.
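For example, a non-clustered index can be disabled before the update and rebuilt afterwards; a sketch assuming the index is named IX_Persons_CityPostCode (the name is hypothetical):

-- Disable the index so the update does not have to maintain it
ALTER INDEX IX_Persons_CityPostCode ON Persons DISABLE;

UPDATE Per
SET Per.PersonCityName = Addr.City,
    Per.PersonPostCode = Addr.PostCode
FROM Persons Per
INNER JOIN AddressList Addr ON Per.PersonId = Addr.PersonId;

-- Rebuild the index so it can be used by queries again
ALTER INDEX IX_Persons_CityPostCode ON Persons REBUILD;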
On the other hand, a warning sign is shown on the Sort operator, indicating that something is not going well for this operator. When we hover the mouse over this operator, we can see the warning details. During the execution of the query, the query optimizer calculates the required memory for the query based on the estimated number of rows and the row size. However, this estimation can be wrong for a variety of reasons, and if the query requires more memory than estimated, it uses tempdb. This mechanism is called a tempdb spill and causes performance loss, because memory is always faster than the tempdb database, which uses disk resources.
You can see the SQL Server 2017: SQL Sort, Spill, Memory and Adaptive Memory Grant Feedback article for more details about the tempdb spill issue.

UPDATE from SELECT: The MERGE statement


The MERGE statement is used to manipulate (INSERT, UPDATE, DELETE) a target table by referencing a
source table for the matched and unmatched rows. The MERGE statement can be very useful for
synchronizing the table from any source table.
Now, coming back to our case, the MERGE statement can be used as an alternative method for updating data in a table with the data of another table. In this method, the reference table can be thought of as the source table, and the target table is the table to be updated. The following query is an example of this usage method.
MERGE Persons AS Per
USING(SELECT * FROM AddressList) AS Addr
ON Addr.PersonID=Per.PersonID
WHEN MATCHED THEN
UPDATE SET
Per.PersonPostCode=Addr.PostCode ,
Per.PersonCityName = Addr.City;

SELECT * FROM Persons

Now let's tackle the previous update from a select query line by line.
MERGE Persons AS Per
We typed the Persons table after the MERGE statement because it is our target table, the one we want to update, and we gave it the Per alias in order to use it in the rest of the query.
USING(SELECT * FROM AddressList) AS Addr
After the USING statement, we have specified the source table.
ON Addr.PersonID=Per.PersonID
With the help of this syntax, the join condition is defined between the target and source table.
WHEN MATCHED THEN
UPDATE SET Per.PersonPostCode=Addr.PostCode;
In this last part of the query, we chose the manipulation method for the matched rows. Specifically for this query, we selected the UPDATE method for the matched rows of the target table. Finally, we added the semicolon (;) sign because MERGE statements must end with a semicolon.

UPDATE from SELECT: Subquery Method


A subquery is an inner query that can be used inside of the DML (SELECT, INSERT, UPDATE and DELETE) statements. The major characteristic of a subquery is that it can only be executed together with the external query.
The subquery method is the most basic and easy method to update existing data with other tables' data. It can be a convenient way to update one column for tables that have a small number of rows. Now we will execute the following query and then analyze it.
UPDATE Persons
SET  Persons.PersonCityName=(SELECT AddressList.City
                            FROM AddressList
                            WHERE AddressList.PersonId = Persons.PersonId)
After the execution of the update from a select statement, the output of the table will be as below:
SELECT * FROM Persons

As we can see, the PersonCityName column data of the Persons table has been updated with the City column data of the AddressList table for the records matched on the PersonId column.
Regarding this method, we should underline the following significant points:
 If the subquery does not find any matched row, the updated value is changed to NULL (see the sketch after this list for a way to guard against this)
 If the subquery finds more than one matched row, the update query returns an error, as shown below:

 Many times, the subquery update method may not offer satisfying performance
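
One way to guard against the NULL overwrite described in the first point is to restrict the update to rows that actually have a match; a minimal sketch, not from the original article:

UPDATE Persons
SET Persons.PersonCityName = (SELECT AddressList.City
                              FROM AddressList
                              WHERE AddressList.PersonId = Persons.PersonId)
-- Only update rows that have a matching address, leaving the other rows untouched
WHERE EXISTS (SELECT 1
              FROM AddressList
              WHERE AddressList.PersonId = Persons.PersonId)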

Conclusion
In this article, we learned the UPDATE from SELECT structure and the different methods (join, MERGE, and subquery) for updating the data in a table with the data of other tables.
How to backup and restore MySQL databases using the mysqldump command
May 12, 2020 by Nisarg Upadhyay

In this article, I am going to explain different ways to generate backups in the MySQL database server.
As we know, data is a valuable asset to the organization. As database administrators, it is our primary and crucial job to keep the data available and safe. In case of a system or data center failure, database corruption, or data loss, we must be able to recover the data within the defined SLA.
Different database platforms provide various methods to generate backups and restore databases. Many vendors provide state-of-the-art software and hardware solutions that can help back up a database and restore it within the defined RTO and RPO.
Here, we are not going to discuss any third-party vendor’s backup solutions. I am going to explain the
native methods that are used to generate the backup of the database. We can generate the backup of
the MySQL database using any of the following methods:
1. Generate the backup using mysqldump utility
2. Generate Incremental backups using Binary Log
3. Generate backups using the Replication of Slaves
In this article, I am going to explain how we can use mysqldump to generate the backup of the MySQL
database.

Generate backup using mysqldump utility


Mysqldump is a command-line utility that is used to generate the logical backup of a MySQL database. It produces SQL statements that can be used to recreate the database objects and data. The command can also be used to generate the output in XML, delimited text, or CSV format.
This command is easy to use; the only drawback occurs while restoring the database. As mentioned, when we generate a backup of a MySQL database, it creates a backup file that contains the SQL commands necessary to rebuild or restore the database. When we restore the database, the command executes all the SQL statements to create the tables and insert the data. If you have a large database, the restoration process can take a long time to complete.
Note: By default, the mysqldump command does not dump the information_schema database, performance_schema, and the MySQL Cluster ndbinfo database.
If you want to include the information_schema tables, you must explicitly specify the name of the database in the mysqldump command and also include the --skip-lock-tables option.
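For example, a command along these lines (the file path and name are illustrative):

mysqldump -u root -p --skip-lock-tables information_schema > C:\MySQLBackup\information_schema_20200424.sql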
There are lots of options and features that can be used with mysqldump. You can view the complete list of options here. I am going to cover some of the basic features. Following is the syntax of the mysqldump utility.
mysqldump -u [user name] -p [password] [options] [database_name] [tablename] > [dumpfilename.sql]
The parameters are as follows:
1. -u [user_name]: The username to connect to the MySQL server. To generate the backup using mysqldump, the user needs the 'Select' privilege to dump the tables, 'Show View' for views, and 'Trigger' for the triggers. If you are not using the --single-transaction option, then the 'Lock Tables' privilege must also be granted to the user
2. -p [password]: The valid password of the MySQL user
3. [options]: The configuration options to customize the backup
4. [database_name]: Name of the database that you want to back up
5. [table name]: This is an optional parameter. If you want to back up specific tables, you can specify their names in the command
6. "<" OR ">": This character indicates whether we are generating a backup of the database or restoring the database. You use ">" to generate a backup and "<" to restore it
7. [dumpfilename.sql]: Path and name of the backup file. As mentioned, we can generate the backup as XML, delimited text, or a SQL file, so we provide the extension of the file accordingly
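The two directions can be illustrated with a minimal pair of commands (the paths are illustrative; the same commands appear in context below):

mysqldump -u root -p sakila > C:\MySQLBackup\sakila_20200424.sql
mysql -u root -p sakila < C:\MySQLBackup\sakila_20200424.sql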

Generate the backup of a single database


For example, to generate the backup of a single database, run the following command. The command will generate the backup of the sakila database with structure and data in the sakila_20200424.sql file.
mysqldump -u root -p sakila > C:\MySQLBackup\sakila_20200424.sql
When you run this command, it prompts for the password. Provide the appropriate password. See the following image:
Once the backup is generated successfully, let us open the backup file to view its content. Open the backup location and double-click on the "sakila_20200424.sql" file.
As you can see in the above image, the backup file contains the various SQL statements that can be used to re-create the objects.

Generate the backup of multiple databases or all the databases
For example, if you want to generate a backup of more than one database, you must add the --databases option to the mysqldump command. The following command will generate the backup of the "sakila" and "employees" databases with structure and data.
mysqldump -u root -p --databases sakila employees > C:\MySQLBackup\sakila_employees_20200424.sql
See the following image:
Similarly, if you want to generate the backup of all the databases, you must use the --all-databases option in the mysqldump command. The following command will generate the backup of all databases within the MySQL Server.
mysqldump -u root -p --all-databases > C:\MySQLBackup\all_databases_20200424.sql
See the following image:

Generate the backup of the database structure

If you want to generate the backup of only the database structure, then you must use the --no-data option in the mysqldump command. The following command generates the backup of the database structure of the sakila database.
mysqldump -u root -p --no-data sakila > C:\MySQLBackup\sakila_objects_definition_20200424.sql
See the following image:
Generate the backup of specific tables
If you want to generate the backup of specific tables, then you must specify the names of the tables after the name of the database, separated by spaces. The following command generates the backup of the actor and payment tables of the sakila database.
mysqldump -u root -p sakila actor payment > C:\MySQLBackup\actor_payment_table_20200424.sql

Generate the backup of database data

If you want to generate the backup of the data without the database structure, then you must use the --no-create-info option in the mysqldump command. The following command generates the backup of the data of the sakila database.
mysqldump -u root -p sakila --no-create-info > C:\MySQLBackup\sakila_data_only_20200424.sql
See the following image:
Let us view the content of the backup file.

As you can see in the above screenshot, the backup file contains the various SQL statements that can be used to insert data into the tables.

Restore the MySQL Database


Restoring a MySQL database using mysqldump is simple. To restore the database, you must create an
empty database. First, let us drop and recreate the sakila database by executing the following command.
1 mysql> drop database sakila;
2 Query OK, 24 rows affected (0.35 sec)
3 mysql> create database sakila;
4 Query OK, 1 row affected (0.01 sec)
5 mysql>
To restore the database, you must use the mysql client instead of mysqldump; the mysqldump utility
only generates backups, it cannot restore the schema and the data. Execute the following command to restore
the sakila database:
1 mysql -u root -p sakila < C:\MySQLBackup\sakila_20200424.sql
Once the command executes successfully, execute the following command to verify that all objects have
been created in the sakila database.
mysql> use sakila;
Database changed
mysql> show tables;
See the following image:

Restore a specific table in the database


For instance, someone dropped a table from the database. Instead of restoring the entire database, we
can restore the dropped table from the available backup. To demonstrate, drop the actor table from the
sakila database by executing the following command on the MySQL command-line tool.
1 mysql> use sakila;
2 Database changed
3 mysql> drop table actor;
To restore the actor table, perform the following step by step process.
Step 1:
Create a dummy database named sakila_dummy and restore the backup of the sakila database on it.
Following is the command.
1 mysql> create database sakila_dummy;
2 mysql> use sakila_dummy;
3 mysql> source C:\MySQLBackup\sakila_20200424.sql
Step 2:
Back up the actor table to the sakila_dummy_actor_20200424.sql file. Following is the command:
1 C:\Users\Nisarg> mysqldump -u root -p sakila_dummy actor > C:\MySQLBackup\sakila_dummy_actor_20200424.sql
Step 3:
Restore the actor table from the “sakila_dummy_actor_20200424.sql” file. Following is the command
on the MySQL command-line tool.
1 mysql> source C:\MySQLBackup\sakila_dummy_actor_20200424.sql
Execute the following command to verify that the table has been restored successfully.
1 mysql> use sakila;
2 Database changed
3 mysql> show tables;
See the following image:
Summary
In this article, I have explained how we can use the mysqldump command-line utility to:
1. Generate the backup of a MySQL database, a table, or the structure of the database
2. Restore the MySQL database or table from the backup

SQL WHILE loop with simple examples


October 25, 2019 by Esat Erkec

The SQL WHILE loop provides us with the ability to execute SQL statement(s) repeatedly until the
specified condition evaluates to false.
In the following sections of this article, we will use flowcharts in order to explain the notions and
examples. For this reason, we will first briefly explain what a flowchart is. A flowchart is a diagram of
geometric symbols that helps to explain algorithms visually; it is used to design and
document algorithms simply. In a flowchart, each geometric symbol has a different meaning.
The following flowchart explains the essential structure of the WHILE loop in SQL:
As you can see, in each iteration of the loop, the defined condition is checked, and then, according to the
result of the condition, the code flow is determined. If the result of the condition is true, the SQL
statement will be executed. Otherwise, the code flow will exit the loop. If any SQL statement exists
outside the loop, it will be executed.

SQL WHILE loop syntax and example


The syntax of the WHILE loop in SQL looks like as follows:
1 WHILE condition
2 BEGIN
3    {...statements...}
4 END
After these explanations, we will give a very simple example of a WHILE loop in SQL. In the example
given below, the WHILE loop example will write a value of the variable ten times, and then the loop will
be completed:
1 DECLARE @Counter INT
2 SET @Counter=1
3 WHILE ( @Counter <= 10)
4 BEGIN
5     PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
6     SET @Counter  = @Counter  + 1
7 END

Now, we will handle the WHILE loop example line by line and examine it with details.
In this part of the code, we declare a variable, and we assign an initializing value to it:
1 DECLARE @Counter INT
2 SET @Counter=1
This part of the code specifies the condition: as long as the variable value is less than or equal to 10, the loop
continues and executes the PRINT statement. Otherwise, the condition is no longer satisfied, and the loop
will end:
1 WHILE ( @Counter <= 10)
In this last part of the code, we executed the SQL statement, and then we incremented the value of the
variable:
1 BEGIN
2     PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
3     SET @Counter  = @Counter  + 1
4 END
The following flowchart illustrates this WHILE loop example visually:
Infinite SQL WHILE loop
In an infinite loop, AKA endless loop, the condition result will never be false, so the loop never ends and
can run forever. Imagine that we have a WHILE loop and we don’t increment the value of the variable.
In this scenario, the loop runs endlessly and never ends. Now, we will realize this scenario with the help of
the following example. One thing to take into account is that we should not forget to cancel the
execution of the query manually:
1 DECLARE @Counter INT
2 SET @Counter=1
3 WHILE ( @Counter <= 10)
4 BEGIN
5     PRINT 'Somebody stops me!'
6   
7 END
In the following flowchart, it is obvious that the value of the variable never changes; therefore, the loop
never ends. The reason for this issue is that the variable is always equal to 1 so the condition returns true
for each iteration of the loop:
BREAK statement
The BREAK statement is used in the SQL WHILE loop in order to exit the loop
immediately when certain conditions occur. Generally, an IF…ELSE statement is used to check whether
the condition has occurred or not. Refer to the SQL IF Statement introduction and overview article for
more details about the IF…ELSE statement.
The following example shows the usage of the BREAK statement in the WHILE loop:
1 DECLARE @Counter INT
2 SET @Counter=1
3 WHILE ( @Counter <= 10)
4 BEGIN
5   PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
6   IF @Counter >=7
7   BEGIN
8   BREAK
9   END
10     SET @Counter  = @Counter  + 1
11 END

In this example, we checked the value of the variable, and when the value was equal to or greater than 7,
the code entered the IF…ELSE block and executed the BREAK statement, and so it exited the loop
immediately. For this reason, the message shows the values of the variable only up to 7. If the condition of the
IF…ELSE statement is not met, the loop will run until the condition evaluates to false. The following
flowchart explains the working logic of the BREAK statement example visually:
CONTINUE statement
The CONTINUE statement is used in the SQL WHILE loop in order to stop the current iteration of the loop
when certain conditions occur, and then start a new iteration from the beginning of the loop. Assume
that we want to print only even numbers in a WHILE loop. In order to achieve this, we can use
the CONTINUE statement. In the following example, we will check whether the variable value is odd or
even. If the variable value is odd, the code enters the IF…ELSE block, increments the value
of the variable, executes the CONTINUE statement, and starts a new iteration:
1 DECLARE @Counter INT
2 SET @Counter=1
3 WHILE ( @Counter <= 20)
4 BEGIN
5  
6   IF @Counter % 2 =1
7   BEGIN
8   SET @Counter  = @Counter  + 1
9   CONTINUE
10   END
11     PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
12     SET @Counter  = @Counter  + 1
13 END
The following flowchart explains the working logic of the CONTINUE statement example visually:

Reading table records through the WHILE loop


In the following example, we will read table data, row by row. Firstly we will create a sample table:
1 USE tempdb
2 GO
3 DROP TABLE IF EXISTS SampleTable
4 CREATE TABLE SampleTable
5 (Id INT, CountryName NVARCHAR(100), ReadStatus TINYINT)
6 GO
7 INSERT INTO SampleTable ( Id, CountryName, ReadStatus)
8 Values (1, 'Germany', 0),
9         (2, 'France', 0),
10         (3, 'Italy', 0),
11     (4, 'Netherlands', 0) ,
12        (5, 'Poland', 0)
13  
14  
15  
16   SELECT * FROM SampleTable
In this step, we will read all data row by row with the help of the WHILE loop:
1 USE tempdb
2 GO
3  
4 DECLARE @Counter INT , @MaxId INT,
5         @CountryName NVARCHAR(100)
6 SELECT @Counter = min(Id) , @MaxId = max(Id)
7 FROM SampleTable
8
9 WHILE(@Counter IS NOT NULL
10       AND @Counter <= @MaxId)
11 BEGIN
12    SELECT @CountryName = CountryName
13    FROM SampleTable WHERE Id = @Counter
14     
15    PRINT CONVERT(VARCHAR,@Counter) + '. country name is ' + @CountryName  
16    SET @Counter  = @Counter  + 1        
17 END

In this example, we read the table rows via the WHILE loop. We can also develop more sophisticated and
advanced loops based on our needs.
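For instance, the ReadStatus column created in the sample table above was never used. The following is a minimal sketch (an assumed variation, not part of the original example) that treats ReadStatus as a processed-row flag and loops until every row has been read:
DECLARE @Id INT, @CountryName NVARCHAR(100)

WHILE EXISTS (SELECT 1 FROM SampleTable WHERE ReadStatus = 0)
BEGIN
   -- Pick the next unprocessed row
   SELECT TOP (1) @Id = Id, @CountryName = CountryName
   FROM SampleTable
   WHERE ReadStatus = 0
   ORDER BY Id

   PRINT CONVERT(VARCHAR, @Id) + '. country name is ' + @CountryName

   -- Flag the row as read so the loop can terminate
   UPDATE SampleTable SET ReadStatus = 1 WHERE Id = @Id
END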

SQL Server functions for converting a String to a Date
February 6, 2020 by Hadi Fadlallah

While working with raw data, you may frequently face date values stored as text. Converting these values
to a date data type is very important since dates may be more valuable during analysis. In SQL Server,
converting a string to date can be achieved in different approaches.
In general, there are two types of data type conversions:
1. Implicit where conversions are not visible to the user; data type is changed while loading data
without using any function
2. Explicit where conversions are visible to the user and they are performed using CAST or
CONVERT functions or other tools
In this article, we will explain how a string to date conversion can be achieved implicitly, or explicitly in
SQL Server using built-in functions such as CAST(), TRY_CAST(), CONVERT(), TRY_CONVERT() and
TRY_PARSE().
 Note: Before we start, please note that some of the SQL statements used are meaningless from
the data context perspective and are just used to explain the concept.

SQL Server: convert string to date implicitly


As mentioned above, converting a data type implicitly is not visible to the user, as an example when you
are comparing two fields or values having different data types:
1 SELECT * FROM information_schema.columns where '1' = 1
In SQL Server, converting a string to a date implicitly depends on the string date format and the default
language settings (regional settings). If the date stored within a string is in an ISO
format (yyyyMMdd or yyyy-MM-ddTHH:mm:ss(.mmm)), it can be converted regardless of the regional
settings; otherwise, the date must have a supported format or it will throw an exception. As an example, while
working under the regional settings “EN-US”, if we try to convert a string in the dd/MM/yyyy format it will
fail, since it tries to interpret the string in the supported MM/dd/yyyy format.
1 SELECT * FROM information_schema.columns where GETDATE() > '13/12/2019'
Will throw the following exception:
Msg 242, Level 16, State 3, Line 1
The conversion of a varchar data type to a datetime data type resulted in an
out-of-range value.
Screenshot:

But, if we switch the day and month parts, it will succeed:


1 SELECT * FROM information_schema.columns where GETDATE() > '12/13/2019'
Screenshot:

You can check out this official documentation here to learn more about how to change SQL Server
language settings.
Additionally, you can read more about implicitly converting date types in SQL Server, by referring to this
article: Implicit conversion in SQL Server.

SQL Server: Convert string to date explicitly


The second approach for converting data types is the explicit conversion, which is done by using some
functions or tools. In SQL Server, converting a string to a date explicitly can be achieved using the CONVERT(),
CAST() and PARSE() functions.
CAST()
CAST() is the most basic conversion function provided by SQL Server. This function tries to convert a given
value to a specified data type (only the data type length can be specified).
Example:
1 SELECT CAST('12/01/2019' as date) as StringToDate , CAST(GETDATE() as VARCHAR(50)) as DateToString
Result:

Note that in SQL Server, converting a string to a date using the CAST() function depends on the language
settings, similar to implicit conversion, as we mentioned in the previous section; so, you can only convert
ISO formats or formats supported by the current language settings.
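To illustrate this dependency, the following sketch changes the session language (assuming a session where doing so is acceptable) and runs the same CAST under a dmy-based language and under us_english:
SET LANGUAGE British;
SELECT CAST('13/12/2019' AS DATE) AS BritishResult;    -- succeeds: dd/MM/yyyy is supported

SET LANGUAGE us_english;
SELECT CAST('13/12/2019' AS DATE) AS UsEnglishResult;  -- fails with a conversion error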
CONVERT()
CONVERT() function is more advanced than CAST() since the conversion style can be specified. This
function takes 3 arguments: (1) the desired data type, (2) the input value and (3) the style number
(optional).
If the style number is not passed to the function, it acts like the CAST() function. But, if the style argument
is passed, it will try to convert the value based on that style. As an example, if we try to convert
“13/12/2019” value to date without specifying the style number, it will fail since it is not supported by the
current language setting:
1 SELECT CONVERT(DATETIME,'13/12/2019')
Result:

But, if we pass 103 as the style number (103 corresponds to the dd/MM/yyyy date format), it will succeed:
1 SELECT CONVERT(DATETIME,'13/12/2019',103)
Result:

For more information about CONVERT() function and date style numbers, you can refer to the following
articles:
 SQL Convert Date functions and formats
 How to convert from string to datetime?

PARSE()
PARSE() is a SQL CLR function that uses the .NET Framework Parse() function. The PARSE() syntax is as follows:
PARSE(<value> AS <data type> [USING <culture>])
If the culture info is not specified, PARSE() acts similarly to the CAST() function, but when the culture is passed
within the expression, the function tries to convert the value to the desired data type using this culture.
As an example, if we try to parse the 13/12/2019 value without passing the culture information, it will fail
since “dd/MM/yyyy” is not supported by the default language settings.

But, if we pass “AR-LB” as culture (Arabic – Lebanon), where “dd/MM/yyyy” is supported, the conversion
succeeds:
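For reference, the two PARSE() calls shown in the screenshots look like the following sketch (the failing statement is listed last so the sketch can be run as one batch):
SELECT PARSE('13/12/2019' AS DATE USING 'ar-LB') AS WithCulture; -- succeeds: dd/MM/yyyy is supported by this culture
SELECT PARSE('13/12/2019' AS DATE) AS NoCulture;                 -- fails under the default US English settings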

TRY_CAST(), TRY_CONVERT() and TRY_PARSE()


One of the main issues with the data type conversion functions is that they cannot handle erroneous
values. As an example, many times you may face bad date values such as “01/01/0000”; these values
cannot be converted and will throw a data conversion exception.
To solve this issue, you can use the TRY_CAST(), TRY_CONVERT() or TRY_PARSE() functions to check whether the
value can be converted or not; if so, the function will return the conversion result, else it will return a
NULL value.
Example:
1 SELECT TRY_CAST('01/01/2000' as date), TRY_CAST('01/01/0000' as date)
Result:
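The TRY_CONVERT() and TRY_PARSE() counterparts behave the same way, returning NULL instead of raising an error; a quick sketch:
SELECT TRY_CONVERT(DATE, '13/12/2019', 103)          AS StyledConvert, -- 2019-12-13
       TRY_CONVERT(DATE, '13/12/2019')               AS NoStyle,       -- NULL under US English settings
       TRY_PARSE('01/01/0000' AS DATE USING 'en-US') AS BadValue;      -- NULL: year 0000 is invalid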
CAST() vs CONVERT() vs PARSE()
To understand the differences among these conversion functions and also to decide which function to
use in which scenario, you can refer to this site.

Conclusion
In this article, we explained data conversion approaches in general. Then we showed how, while using
SQL Server, converting a string to date can be achieved using these approaches. We explained the
system functions provided by SQL Server by giving some examples and external links that provide more
details.

See more
SELECT INTO TEMP TABLE statement in SQL Server
June 21, 2021 by Esat Erkec

In this article, we will explore the SELECT INTO TEMP TABLE statement, its syntax and usage details and
also will give some simple basic examples to reinforce the learnings.

Introduction
SELECT INTO statement is one of the easy ways to create a new table and then copy the source table
data into this newly created table. In other words, the SELECT INTO statement performs a combo task:
 Creates a clone table of the source table with exactly the same column names and data types
 Reads data from the source table
 Inserts data into the newly created table
We can use the SELECT INTO TEMP TABLE statement to perform the above tasks in one statement for the
temporary tables. In this way, we can copy the source table data into the temporary tables in a quick
manner.

SELECT INTO TEMP TABLE statement syntax


1 SELECT * | Column1,Column2...ColumnN
2 INTO #TempDestinationTable
3 FROM Source_Table
4 WHERE Condition

Arguments of the SELECT INTO TEMP TABLE


 Column List: We can use the asterisk (*) to create a full temporary copy of the source table or
can select the particular columns of the source table
 Destination Table: This refers to the name of the temporary table that we will create and
insert the data into. We can specify the destination table as a local or global temporary table: for a
local temporary table, we use a single hash (#) sign, and for a global temporary table, we use a
double hash (##) sign (see the short sketch after this list)
 Source Table: The source is a table from which we want to read data
 Where Clause: We can use a where clause to apply a filter to the source table data
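As a quick sketch of the destination-table naming, the following creates a local and a global temporary copy of the same source table (assuming the Production.Location table used in the examples below):
SELECT * INTO #LocalCopy   FROM Production.Location  -- local: visible to the current session only
SELECT * INTO ##GlobalCopy FROM Production.Location  -- global: visible to all sessions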
In the following example, we will insert the Location table data into the #TempLocation table. In other
words, we will create a temporary clone of the Location table.
1 SELECT * INTO #TempLocation FROM Production.Location
2 GO
3 SELECT * FROM #TempLocation
As we can see, the SELECT INTO statement creates the #TempLocation table and then inserts the Location
table data into it.
When we want to insert particular columns of the Location table into a temporary table, we can use the
following query:
1 SELECT LocationID,Name,ModifiedDate INTO #TempLocationCol FROM Production.Location
2 GO
3 SELECT * FROM #TempLocationCol

One point to notice here is that the temporary table and source table column names are the same. In order to
change the column names of the temporary table, we can give aliases to the source table columns in the
select query.
1 SELECT LocationID AS [TempLocationID],
2 Name AS [TempLocationName] ,ModifiedDate  AS [TempModifiedDate]
3 INTO #TempLocationCol FROM Production.Location
4 GO
5 SELECT * FROM #TempLocationCol
At the same time, we can filter some rows of the Location table and then insert the result set into a temporary
table. The following query filters the rows in which the Name column starts with the “F” character and
then inserts the resulting rows into the temporary table.
1 SELECT LocationID,Name,ModifiedDate INTO #TempLocationCon FROM Production.Location
2 WHERE Name LIKE 'F%'
3 GO
4 SELECT * FROM #TempLocationCon

INSERT INTO SELECT vs SELECT INTO TEMP TABLE


The INSERT INTO SELECT statement reads data from one table and inserts it into an existing table. For example, if
we want to copy the Location table data into a temp table using the INSERT INTO SELECT statement, we
have to specify the temporary table explicitly and then insert the data.
1 ---Declare the temporary table---
2 CREATE TABLE #CopyLocation(
3     LocationID smallint  NOT NULL,
4     Name nvarchar(50) NOT NULL,
5     CostRate smallmoney NOT NULL,
6     Availability decimal (8, 2) NOT NULL,
7     ModifiedDate datetime NOT NULL)
8  
9 ---Copy data into the temporary table---
10     INSERT INTO #CopyLocation
11     SELECT * FROM Production.Location
12 ---Select data from the temporary table---
13     SELECT * FROM #CopyLocation
In fact, these two statements accomplish the same task in different ways. However, they have some
differences in their usage scenarios:
 INSERT INTO SELECT: Requires declaring the destination temporary table explicitly, so it allows the
flexibility to change the column data types and to create indexes; due to this flexibility, it also allows
transferring data from different tables
 SELECT INTO: Creates the destination temporary table automatically and can create a backup copy
of a table with easy syntax

SELECT INTO TEMP TABLE performance


The SELECT INTO TEMP TABLE statement performs two main tasks in the context of the performance and
these are:
 Reading data from the source data
 Inserting data into the temp table
The performance of the data reading operation depends on the performance of the select query, so we need to
evaluate the data reading process within this scope. On the other hand, the configuration of the
tempdb database will have an impact on the performance of the insert statement. Starting with SQL Server 2014,
SELECT … INTO statements can run in parallel, so they show better performance. Now, let’s
analyze the following query execution plan.
1 SELECT SalesOrderID,CarrierTrackingNumber,ModifiedDate
2 INTO #TempsSalesDetail FROM Sales.SalesOrderDetail
3 ORDER BY SalesOrderID
1- The Clustered Index Scan operator reads all data from the primary key of the SalesOrderDetail table
and passes all data to the table insert operator.
2- The Table Insert operator adds new data into the temporary table and performs this operation in a
parallel manner. This situation can be seen in the Actual Number of Rows attribute. Thread 0 does not
show any values because it is the coordinator thread.

The Gather Streams operator merges several parallel operations into a single operation. Although we used
the ORDER BY clause in this query, we cannot see any Sort operator in the
execution plan; at the same time, the Clustered Index Scan operator does not return the rows in a sorted manner.
The reason is that there is no guarantee for the order in which the rows are inserted into the table.
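Since the insertion order is not guaranteed, a sorted output should be requested at read time instead; for example:
SELECT SalesOrderID, CarrierTrackingNumber, ModifiedDate
FROM #TempsSalesDetail
ORDER BY SalesOrderID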
Conclusion
In this article, we have learned the syntax and usage details of the SELECT INTO TEMP TABLE statement.
This statement is very practical to insert table data or query data into the temporary tables.

Understanding the SQL MERGE statement


July 27, 2020 by Aveek Das

In this article, I am going to give a detailed explanation of how to use the SQL MERGE statement in SQL
Server. The MERGE statement in SQL is a very popular clause that can handle inserts, updates, and
deletes all in a single transaction without having to write separate logic for each of these. You can specify
conditions on which you expect the MERGE statement to insert, update, or delete, etc.
Using the MERGE statement in SQL gives you better flexibility in customizing your complex SQL scripts
and also enhances the readability of your scripts. The MERGE statement basically modifies an existing
table based on the result of comparison between the key fields with another table in the context.
Figure 1 – MERGE Illustration
The above illustration depicts how a SQL MERGE statement basically works. As you can see, there are two
circles that represent two tables, which can be considered as a Source and a Target. The MERGE statement
compares the source table with the target table based on a key field and then performs the required
processing. The MERGE statement actually combines the INSERT, UPDATE, and DELETE operations
altogether. Although the MERGE statement is a little more complex than a simple INSERT or UPDATE, once
you are able to master the underlying concept, you can easily use SQL MERGE more often than
the individual INSERTs or UPDATEs.

Applications of the SQL MERGE statement


In a typical SQL Data warehouse solution, it is often essential to maintain a history of data in the
warehouse with a reference to the source data that is being fed to the ETL tool. A most common use case
is while trying to maintain Slowly Changing Dimensions (SCD) in the data warehouse. In such cases, you
need to insert new records into the data warehouse, remove or flag records from the warehouse which
are not in the source anymore, and update the values of those in the warehouse which have been
updated in the source.
The SQL MERGE statement was introduced in the SQL Server 2008 edition which allowed great flexibility
to the database programmers to simplify their messy code around the INSERT, UPDATE and DELETE
statements while applying the logic to implement SCD in ETL.

Optimizing the performance of the SQL MERGE statement
There are a few aspects through which you can optimize the performance of your MERGE statements.
First of all, the MERGE statement lets you write all your DML statements (INSERT, UPDATE, and DELETE)
combined in a single statement. From a data processing perspective, this is quite helpful as it reduces the
disk I/O operations that each of the three statements would otherwise perform individually; the data is read
from the source only once.
Also, the performance of the MERGE statement greatly depends on the proper indexes being used to
match both the source and the target tables. Apart from indexes, it is also essential that the join
conditions are optimized as well. We should also try to filter the source table so that only necessary
records are being fetched by the statement to do the necessary operations.
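As a sketch of that last point, the source can be filtered inline in the USING clause. The example below assumes the SourceProducts and TargetProducts tables created in the next section, with an arbitrary Price filter:
MERGE TargetProducts AS Target
USING (SELECT ProductID, ProductName, Price
       FROM SourceProducts
       WHERE Price >= 50) AS Source        -- fetch only the necessary records
ON Source.ProductID = Target.ProductID
WHEN MATCHED THEN UPDATE SET
    Target.ProductName = Source.ProductName,
    Target.Price = Source.Price;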

Hands-on with the MERGE statement


Now that we have gathered enough information regarding how the MERGE statement works, let us go
ahead and try to implement the same practically. For the purpose of this tutorial, I am going to create a
simple table and insert a few records in it. You can use the following SQL script to create the database
and tables on your machine.
1 CREATE DATABASE SqlShackMergeDemo
2 GO
3     
4 USE SqlShackMergeDemo
5 GO
6     
7 CREATE TABLE SourceProducts(
8     ProductID INT,
9     ProductName VARCHAR(50),
10     Price DECIMAL(9,2)
11 )
12 GO
13     
14 INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(1,'Table',100)
15 INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(2,'Desk',80)
16 INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(3,'Chair',50)
17 INSERT INTO SourceProducts(ProductID,ProductName, Price) VALUES(4,'Computer',300)
18 GO
19     
20 CREATE TABLE TargetProducts(
21     ProductID INT,
22     ProductName VARCHAR(50),
23     Price DECIMAL(9,2)
24 )
25 GO
26     
27 INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(1,'Table',100)
28 INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(2,'Desk',180)
29 INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(5,'Bed',50)
30 INSERT INTO TargetProducts(ProductID,ProductName, Price) VALUES(6,'Cupboard',300)
31 GO
32     
33     
34 SELECT * FROM SourceProducts
35 SELECT * FROM TargetProducts

Figure 2 – Sample Data inserted


Now that the database is ready, the next step I am going to perform is to apply the MERGE statement
and try to get both the tables to synchronize with each other. The first operation that we are trying to see
is how to manage the INSERTs. You can copy and paste the below SQL code to merge the new data from
the source to the target table.
1 USE SqlShackMergeDemo
2 GO
3     
4 MERGE TargetProducts AS Target
5 USING SourceProducts AS Source
6 ON Source.ProductID = Target.ProductID
7 WHEN NOT MATCHED BY Target THEN
8     INSERT (ProductID,ProductName, Price)
9     VALUES (Source.ProductID,Source.ProductName, Source.Price);
Figure 3 – MERGE operation performed on the source and target tables
As you can see, the two records with ProductID 3 and 4, which were not present in the target table are
now inserted. This operation is done by matching the source and the target tables based on
the ProductID field.
Now that we have learned how to insert records using the SQL MERGE statement, let us learn how to
update the values in the same statement. In order to update the values, the ProductID field must have a
common value in both the source and the target tables. Only then will the database engine be able to
match the records, and the update operation can be performed on the columns that have been specified.
1 USE SqlShackMergeDemo
2     GO
3     
4     MERGE TargetProducts AS Target
5     USING SourceProducts AS Source
6     ON Source.ProductID = Target.ProductID
7     
8     -- For Inserts
9     WHEN NOT MATCHED BY Target THEN
10         INSERT (ProductID,ProductName, Price)
11         VALUES (Source.ProductID,Source.ProductName, Source.Price)
12     
13     -- For Updates
14     WHEN MATCHED THEN UPDATE SET
15         Target.ProductName = Source.ProductName,
16         Target.Price = Source.Price;

Figure 4 – Record updated using the MERGE statement


As you can see in the figure above, the initial value for the product “Desk” in the target table was
mentioned as “180.00”. When the SQL MERGE statement was executed, it updated the values for all the
matched records that had an entry in the source. Also, if you notice the SQL script now, you can see that I
have just added the update script after the insert statement, and that means all the inserts and the
updates are being executed in the same script itself.
Let us now see how to delete or remove records from the target table in the same script itself.
1 USE SqlShackMergeDemo
2 GO
3     
4 MERGE TargetProducts AS Target
5 USING SourceProducts AS Source
6 ON Source.ProductID = Target.ProductID
7     
8 -- For Inserts
9 WHEN NOT MATCHED BY Target THEN
10     INSERT (ProductID,ProductName, Price)
11     VALUES (Source.ProductID,Source.ProductName, Source.Price)
12     
13 -- For Updates
14 WHEN MATCHED THEN UPDATE SET
15     Target.ProductName = Source.ProductName,
16     Target.Price = Source.Price
17     
18 -- For Deletes
19 WHEN NOT MATCHED BY Source THEN
20     DELETE;

Figure 5 – Records deleted using the MERGE statement


Now, as you can see, the records with ProductID 5 and 6 have been deleted from the target table since these
records are not available in the source. In this way, you can implement a SQL MERGE statement in a very
simple yet powerful way and can handle complex business requirements.
If you would like to see a summary of all the actions that have been performed by the MERGE statement,
then you may modify your existing script and include the following output actions. It will return a list
of the records on which we have performed the merge and the operation that has been executed on each
particular record.
1 USE SqlShackMergeDemo
2 GO
3     
4 MERGE TargetProducts AS Target
5 USING SourceProducts AS Source
6 ON Source.ProductID = Target.ProductID
7     
8 -- For Inserts
9 WHEN NOT MATCHED BY Target THEN
10     INSERT (ProductID,ProductName, Price)
11     VALUES (Source.ProductID,Source.ProductName, Source.Price)
12     
13 -- For Updates
14 WHEN MATCHED THEN UPDATE SET
15     Target.ProductName = Source.ProductName,
16     Target.Price = Source.Price
17     
18 -- For Deletes
19 WHEN NOT MATCHED BY Source THEN
20     DELETE
21         
22 -- Checking the actions by MERGE statement
23 OUTPUT $action,
24 DELETED.ProductID AS TargetProductID,
25 DELETED.ProductName AS TargetProductName,
26 DELETED.Price AS TargetPrice,
27 INSERTED.ProductID AS SourceProductID,
28 INSERTED.ProductName AS SourceProductName,
29 INSERTED.Price AS SourcePrice;

Figure 6 – Checking output actions by the merge statement

Important things to remember while implementing SQL MERGE
Although we have now understood how to write the MERGE statement from scratch and how to modify
the script to include logic for handling inserts, updates and deletes, there are also some other key
important points that we should keep in mind while preparing the scripts.
1. Every MERGE statement must end with a semi-colon. If a semi-colon is not present at the end of
the MERGE statement, then an error will be thrown
2. You can use SELECT @@ROWCOUNT after writing the MERGE statement, which will return the
number of records that have been modified by the transaction (see the sketch after this list)
3. It is mandatory that one of the MATCHED clauses is provided in order for the MERGE statement
to operate
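A minimal sketch of the second point, reusing the tables from the earlier examples:
MERGE TargetProducts AS Target
USING SourceProducts AS Source
ON Source.ProductID = Target.ProductID
WHEN MATCHED THEN UPDATE SET
    Target.Price = Source.Price;   -- note the mandatory terminating semi-colon

SELECT @@ROWCOUNT AS ModifiedRecords;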

Conclusion
In this article, I have explained in detail the SQL MERGE statement. The MERGE statement was
introduced in SQL Server 2008 and brought a great improvement in writing simpler and
more maintainable code in SQL. The MERGE statement takes two tables – a source and a target –
compares the records based on a key column, often the index column, and then performs an operation
on them. Being a database developer, I would definitely advise all young programmers to start using the SQL
MERGE statement more frequently when writing complex stored procedures in SQL.

SQL Server table hints – WITH (NOLOCK) best practices
February 14, 2018 by Ahmad Yaseen

SQL Server table hints are a special type of explicit command that is used to override the default
behavior of the SQL Server query optimizer during T-SQL query execution. This is accomplished by
enforcing a specific locking method, a specific index, or a query processing operation, such as an index seek or
table scan, to be used by the SQL Server query optimizer when building the query execution plan. The table
hints can be added to the FROM clause of the T-SQL query, affecting only the table or the view that is
referenced in the FROM clause.
One of the more heavily used table hints in SELECT T-SQL statements is the WITH (NOLOCK) hint.
The default transaction isolation level in SQL Server is the READ COMMITTED isolation level, in which
retrieving changing data will be blocked until these changes are committed. The WITH (NOLOCK)
table hint is used to override the default transaction isolation level of the table, or the tables within the
view, in a specific query, by allowing the user to retrieve the data without being affected by the locks on
the requested data due to another process that is changing it. In this way, the query will consume less
memory holding locks against that data. In addition to that, no deadlock will occur against the queries
that are requesting the same data from that table, allowing a higher level of concurrency due to a lower
footprint. In other words, the WITH (NOLOCK) table hint retrieves the rows without waiting for the other
queries, that are reading or modifying the same data, to finish their processing. This is similar to the READ
UNCOMMITTED transaction isolation level, which allows the query to see the data changes before
the transaction that is changing them is committed. The transaction isolation level can be set globally at the
connection level using the SET TRANSACTION ISOLATION LEVEL T-SQL command, as we will see later in this
article.
Although the NOLOCK table hint, similar to all other table hints, can be used without using the WITH
keyword, Microsoft announced that omitting the WITH keyword is a deprecated feature and will be
removed from future Microsoft SQL Server versions. With that said, it is better to include the WITH
keyword when specifying the table hints. One benefit of using the WITH keyword is that you can specify
multiple table hints using the WITH keyword against the same table.
In general, using explicit table hints frequently is considered a bad practice that you should
avoid. For the NOLOCK table hint specifically, reading uncommitted data that could be rolled back after
you have read it can lead to a Dirty read, which can occur when reading data that is being modified
or deleted during the uncommitted read, so that the data you read could be different, or may never even
have existed.
The WITH (NOLOCK) table hint also leads to Nonrepeatable reads; this read occurs when it is required
to read the same data multiple times and the data changes during these readings. In this case, you will
read multiple versions of the same row.
Phantom reads can also be a result of using the WITH (NOLOCK) table hint, in which you will get more
records when the transaction that is inserting new records is rolled back, or fewer records when the
transaction that is deleting existing data is rolled back. Another problem may occur when other
transactions move data you have not read yet to a location that you have already scanned, or
add new pages to a location that you have already scanned; in this case, you will miss these records and
will not see them in the returned result. If another transaction moves data that you have already scanned
to a new location that you have not read yet, you will read the data twice. Also, as the requested data
could be moved or deleted during your reading process, you could face the error below:
Msg 601, Level 12, State 1
Could not continue scan with NOLOCK due to data movement.
The WITH (NOLOCK) table hint is a good idea when the system uses explicit transactions heavily, which
blocks data reads very frequently. The WITH (NOLOCK) table hint is also used when working with
systems that accept out-of-sync data, such as reporting systems.
To understand the usage of the WITH (NOLOCK) table hint practically, let us create a new table using the
CREATE TABLE T-SQL statement below:
1 USE SQLShackDemo
2 GO
3 CREATE TABLE LockTestDemo
4 ( ID INT IDENTITY (1,1) PRIMARY KEY,
5   EmpName NVARCHAR(50),
6   EmpAddress NVARCHAR(4000),
7   PhoneNumber VARCHAR(50)
8 )
After creating the table, we will fill it with 100K rows for testing purposes, using ApexSQL Generate, SQL
test data generator, as shown in the snapshot below:

Once the table is ready, we will simulate a blocking scenario, in which an UPDATE statement will be
executed within a transaction that is begun but never committed or rolled back. The below BEGIN TRAN
T-SQL statement will start the transaction and run the following UPDATE statement on the
LockTestDemo table under SQL session number 53, without closing the transaction by committing or
rolling it back:
1 BEGIN TRAN
2 UPDATE LockTestDemo SET EmpAddress = 'AMM' WHERE   ID <100
With the table’s data locked by the transaction, we will run another SELECT statement, under SQL session
number 54, that retrieves data from the LockTestDemo table, using the SELECT statement below:
1 SELECT * FROM LockTestDemo
You will see that the previous SELECT statement takes a long time without retrieving any records.
Check what is blocking that SELECT query using the sp_who2 command with the session numbers of both
the SELECT and the UPDATE statements:
1 sp_who2 53
2 GO
3 sp_who2 54
The result will show you that the previously opened transaction is not performing any action, as the
UPDATE statement executed successfully. But due to the fact that the transaction is not committed or
rolled back yet, it is still blocking other queries that are trying to get data from that table. The SELECT
statement that is running under session 54 is blocked by the transaction that is running under session
53, as shown in the result below:

The previous SELECT statement will keep waiting for the transaction to be killed, committed or rolled
back in order to get the requested rows from that table. You can stop the transaction that is running
under session 53 from blocking other queries by killing that session using the KILL command below:
1 KILL 53
Or simply committing or rolling back that transaction, by running the COMMIT or ROLLBACK command
under the same session of the transaction, if applicable, as shown below:
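For reference, either of the following commands, run in the transaction's own session (53), releases the locks:
COMMIT TRAN     -- keep the changes made by the UPDATE
-- or
ROLLBACK TRAN   -- discard the changes made by the UPDATE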

Once the locking is released, you will see that the requested rows will be retrieved from the SELECT
statement directly as shown in the results below:

The previous solution is not always preferable or applicable, for example, when the transaction that is
blocking our queries is critical and not easy to kill or roll back, or when you don’t have control
over others’ transactions within the database. In such cases, the WITH (NOLOCK) table hint is useful, if
you can tolerate the risk of dirty reads or data inconsistency. As mentioned previously, the WITH
(NOLOCK) table hint allows you to read the data that has been changed, but not committed to the
database yet. If you run the same SELECT statement without killing, committing or rolling back the
UPDATE transaction, but this time adding the WITH (NOLOCK) table hint to the table name in the SELECT
statement as shown below:
1 SELECT * FROM LockTestDemo WITH (NOLOCK)
Then check the SELECT statement status using the sp_who2 command. You will see that the query runs
without waiting for the UPDATE transaction to complete and release the lock
on the table, as shown in the snapshot below:

The WITH (NOLOCK) table hint works the same as the READUNCOMMITTED table hint, allowing us to
retrieve the data that is changed but not committed yet. The same SELECT statement can be modified to
use the READUNCOMMITTED table hint as shown below:
1 SELECT * FROM LockTestDemo WITH (READUNCOMMITTED)
This retrieves the requested data directly, without waiting for the UPDATE statement to release the lock it
holds on the table, returning the same result as shown in the result set below:

Take into consideration that the WITH (NOLOCK) and READUNCOMMITTED table hints can only be used
with SELECT statements. If you try to use the WITH (NOLOCK) table hint in a DELETE statement, you
will get an error showing that both the WITH (NOLOCK) and READUNCOMMITTED table hints are not
allowed with the UPDATE, INSERT, DELETE or MERGE T-SQL statements, as shown below:

Rather than allowing a dirty read at the query level using the WITH (NOLOCK) and READUNCOMMITTED
table hints, you can change the transaction isolation level at the connection level to be READ
UNCOMMITTED using the SET TRANSACTION ISOLATION LEVEL T-SQL statement below:
1 SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
2 SELECT * FROM LockTestDemo
This query will also retrieve the same data directly, without using any table hint and without waiting for
the UPDATE statement to release the lock it performed on the table, as shown in the result set below:
From the previous results, you may think that this is the perfect solution for such scenarios, where you
will get the requested data faster, without waiting for other operations to be committed, while taking the risk
of getting inaccurate data. But will the SELECT query that is using the WITH (NOLOCK) table hint
negatively affect other processes on the SQL Server? To get the answer, let us first check what types of
locks the WITH (NOLOCK) table hint will be granted during its execution. This can be achieved by simply
running the sp_lock command with the session number of the running query, while the query is running,
as shown below:
1 sp_lock 54
You will see from the result that the query that is using the WITH (NOLOCK) table hint will be
granted S and Sch-S locking types, as shown in the result below:

From the previous result, you will see that the WITH (NOLOCK) table hint will be granted a shared access
(S) lock at the database level. The shared access (S) lock is used for read operations, allowing
concurrent transactions to read data under pessimistic concurrency control while preventing other
transactions from modifying the locked resource while shared (S) locks exist on it; the lock is
released as soon as the read operation completes.
The second kind of lock that is granted to the query using the WITH (NOLOCK) table hint is
the schema stability (Sch-S) lock. This lock will not prevent any other transaction from accessing the
resources, except for concurrent DDL operations, and concurrent DML operations that acquire
schema modification (Sch-M) locks on the same table, which will be blocked while the query is executing.
This really makes sense, as you would not want to start reading data from the table and then have another
transaction change the structure of that table during your data retrieval process. The SQL Server Database Engine uses
the schema modification (Sch-M) locks while processing the data definition language (DDL) commands,
such as adding a new column, dropping an existing column, dropping or rebuilding indexes, to prevent
concurrent access to the table, until the lock is released.

My NOLOCK query is blocking!


This means that the NOLOCK naming is not always 100% accurate. Use of the WITH (NOLOCK) table hint,
which holds a schema stability (Sch-S) lock, can block other queries that attempt to acquire a schema
modification (Sch-M) lock on that table. This is a critical issue that you should take into consideration: if
there are lots of users executing their SELECT queries using the WITH (NOLOCK) table hint, they can prevent
you from making any changes to the table schema or performing maintenance on the table indexes, as those
operations will be blocked by the schema stability (Sch-S) lock.
Assume that we need to run the below SELECT statement, that is using the WITH (NOLOCK) table hint,
under session number 53:
1 SELECT * FROM LockTestDemo WITH (NOLOCK) WHERE EmpAddress LIKE '%Da%' and EmpName like '%And%'
At the same time, we will run the below query, that is dropping an index on the same table and create it
again, under session number 58:
1 USE [SQLShackDemo]
2 GO
3  
4 DROP INDEX [IX_LockTestDemo_EmpName] ON [dbo].[LockTestDemo]
5 GO
6  
7 CREATE NONCLUSTERED INDEX [IX_LockTestDemo_EmpName] ON [dbo].[LockTestDemo]
8 (
9 [EmpName] ASC
10 )
11 GO
Then, checking the status of both queries using the sp_who2 command, you will see from the result that
the SELECT statement that is using the WITH (NOLOCK) table hint and running under session number 53 is
blocking the DROP/CREATE INDEX process running under session number 58, as shown clearly below:

If we check the locks that are performed by each query, using the sys.dm_tran_locks system object as in
the query below:
1 SELECT *
2 FROM sys.dm_tran_locks
3 WHERE resource_type = 'OBJECT'
You will see that the DROP/CREATE INDEX process running under session number 58 is waiting to
acquire a schema modification (Sch-M) lock. This occurs because the schema modification
(Sch-M) lock cannot be acquired while the schema stability (Sch-S) lock that is already granted to the
SELECT statement running under session number 53 exists, as shown in the snapshot below:

My NOLOCK query is blocked!


Conversely, since the WITH (NOLOCK) table hint acquires schema stability (Sch-S) lock type, the SELECT
statement that is using the WITH (NOLOCK) table hint will be blocked if a schema modification is
performed on that table. Assume that we run the below ALTER TABLE T-SQL statement to change the
size of the EmpAddress column on the LockTestDemo table, under session number 53:
1 ALTER TABLE LockTestDemo ALTER COLUMN [EmpAddress] VARCHAR (5000)
At the same time, the below SELECT statement that is using the WITH (NOLOCK) table hint will be
running under session number 54:
1 SELECT * FROM LockTestDemo WITH (NOLOCK)
Checking the status of both queries using the sp_who2 commands below:
1 sp_who2 53
2 GO
3 sp_who2 54
You will see that the SELECT statement running under session 54 is blocked by the ALTER TABLE
statement running under session 53, as shown below:
Then checking the locks that are performed by each query, using the sys.dm_tran_locks system object as
in the query below:
1 SELECT *
2 FROM sys.dm_tran_locks
3 WHERE resource_type = 'OBJECT'
4 AND request_session_id in (53, 54)
It will be clear from the returned result that the SELECT statement that is using the WITH (NOLOCK) table
hint and running under session number 54 is waiting to acquire a schema stability (Sch-S) lock, due to
the fact that the schema stability (Sch-S) lock cannot be acquired while the schema modification (Sch-M)
lock that is already granted to the ALTER statement running under session number 53 exists, as
shown in the snapshot below:

You can imagine the situation when you are scheduling a huge number of reports at night that use
the WITH (NOLOCK) table hint just to be safe while, at the same time, maintenance jobs are also
scheduled to rebuild heavily fragmented indexes on the same table!
There are a number of best practices and suggestions that you can follow in order to avoid the problems
that you may face when using the WITH (NOLOCK) table hint. Such suggestions include:
 Include only the columns that are really required in your SELECT query
 Make sure that your transaction is short, by separating different operations from each other. For
example, do not include a huge SELECT statement between two UPDATE operations
 Try to find an alternative to the cursors
 Take care to utilize and benefit from the newly defined WAIT_AT_LOW_PRIORITY option to do an
online rebuild for the indexes
 Study reporting vs maintenances schedules well
 Take care to utilize and benefit from the different SQL Server high availability solutions for
reporting purposes, such as:
o Configure the Always On Availability Groups secondary replicas to be readable and use it for
reporting
o Create database snapshots when using the SQL Server Database Mirroring and use it for
reporting
o Use the SQL Server Replication subscriber database for reporting
o Use the secondary database of the SQL Server Log Shipping for reporting

Overview of SQL RANK functions


July 3, 2019 by Rajendra Gupta

We perform calculations on data using various aggregate functions such as MAX, MIN, and AVG, and we get
a single output row from these functions. SQL Server provides SQL RANK functions to specify a rank for
individual fields as per the categorizations; these functions return an aggregated value for each participating row. SQL
RANK functions are also known as window functions.
 Note: The term “window” here does not relate to the Microsoft Windows operating system; it refers
to the set of rows that a SQL RANK function operates on.
We have the following rank functions.
 ROW_NUMBER()
 RANK()
 DENSE_RANK()
 NTILE()
In the SQL RANK functions, we use the OVER() clause to define a set of rows in the result set. We can also
use SQL PARTITION BY clause to define a subset of data in a partition. You can also use Order by clause
to sort the results in a descending or ascending order.
Before we explore these SQL RANK functions, let’s prepare sample data. In this sample data, we have
exam results for three students in Maths, Science and English subjects.
1 CREATE TABLE ExamResult
2 (StudentName VARCHAR(70),
3 Subject     VARCHAR(20),
4 Marks       INT
5 );
6 INSERT INTO ExamResult
7 VALUES
8 ('Lily',
9 'Maths',
10 65
11 );
12 INSERT INTO ExamResult
13 VALUES
14 ('Lily',
15 'Science',
16 80
17 );
18 INSERT INTO ExamResult
19 VALUES
20 ('Lily',
21 'english',
22 70
23 );
24 INSERT INTO ExamResult
25 VALUES
26 ('Isabella',
27 'Maths',
28 50
29 );
30 INSERT INTO ExamResult
31 VALUES
32 ('Isabella',
33 'Science',
34 70
35 );
36 INSERT INTO ExamResult
37 VALUES
38 ('Isabella',
39 'english',
40 90
41 );
42 INSERT INTO ExamResult
43 VALUES
44 ('Olivia',
45 'Maths',
46 55
47 );
48 INSERT INTO ExamResult
49 VALUES
50 ('Olivia',
51 'Science',
52 60
53 );
54 INSERT INTO ExamResult
55 VALUES
56 ('Olivia',
57 'english',
58 89
59 );
We have the following sample data in the ExamResult table.

Let’s use each SQL Rank Functions in upcoming examples.

ROW_Number() SQL RANK function


We use the ROW_NUMBER() SQL RANK function to get a unique sequential number for each row in the
specified data. It gives rank one to the first row and then increments the value by one for each subsequent row.
We get different ranks even for rows having similar values.
Execute the following query to get a rank for students as per their marks.
1 SELECT Studentname,
2        Subject,
3        Marks,
4        ROW_NUMBER() OVER(ORDER BY Marks) RowNumber
5 FROM ExamResult;

By default, it sorts the data in ascending order and starts assigning ranks for each row. In the above
screenshot, we get ROW number 1 for marks 50.
We can specify descending order with Order By clause, and it changes the RANK accordingly.
1 SELECT Studentname,
2        Subject,
3        Marks,
4        ROW_NUMBER() OVER(ORDER BY Marks desc) RowNumber
5 FROM ExamResult;
RANK() SQL RANK Function
We use RANK() SQL Rank function to specify rank for each row in the result set. We have student results
for three subjects. We want to rank the result of students as per their marks in the subjects. For example,
in the following screenshot, student Isabella got the highest marks in English subject and lowest marks in
Maths subject. As per the marks, Isabella gets the first rank in English and 3rd place in Maths subject.

Execute the following query to get this result set. In this query, you can note the following things:
 We use PARTITION BY Studentname clause to perform calculations on each student group
 Each subset should get rank as per their Marks in descending order
 The result set uses Order By clause to sort results on Studentname and their rank
1 SELECT Studentname,
2        Subject,
3        Marks,
4        RANK() OVER(PARTITION BY Studentname ORDER BY Marks DESC) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7          Rank;
Let’s execute the following query of SQL Rank function and look at the result set. In this query, we did
not specify SQL PARTITION By clause to divide the data into a smaller subset. We use SQL Rank function
with over clause on Marks clause ( in descending order) to get ranks for respective rows.
1 SELECT Studentname,
2        Subject,
3        Marks,
4        RANK() OVER(ORDER BY Marks DESC) Rank
5 FROM ExamResult
6 ORDER BY Rank;
In the output, we can see that each student gets a rank as per their marks, irrespective of the specific subject. For
example, the highest and lowest marks in the complete result set are 90 and 50, respectively. In the result
set, the highest mark gets RANK 1, and the lowest mark gets RANK 9.
If two students get the same marks (in our example, ROW numbers 4 and 5), their ranks are also the
same.

DENSE_RANK() SQL RANK function


We use the DENSE_RANK() function to specify a unique rank number within the partition as per the specified
column value. It is similar to the RANK function, with a small difference: while ROW_NUMBER() assigns
different numbers to rows having duplicate values, and RANK() skips the subsequent rank number(s) after a
tie, DENSE_RANK() gives the same rank to duplicate or similar values and continues with the next
consecutive number, leaving no gaps.
Let’s execute the following query with the DENSE_RANK() function.
1 SELECT Studentname,
2        Subject,
3        Marks,
4        DENSE_RANK() OVER(ORDER BY Marks DESC) Rank
5 FROM ExamResult
6 ORDER BY Rank;
In the output, you can see we have the same rank for both Lily and Isabella who scored 70 marks.

Let’s use DENSE_RANK function in combination with the SQL PARTITION BY clause.
1 SELECT Studentname,
2        Subject,
3        Marks,
4        DENSE_RANK() OVER(PARTITION BY Subject ORDER BY Marks DESC) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7          Rank;
We do not have two students with similar marks; therefore, the result set is similar to that of the RANK function
in this case.

Let’s update the student mark with the following query and rerun the query.
1 Update Examresult set Marks=70 where Studentname='Isabella' and Subject='Maths'
We can see that in the student group, Isabella got similar marks in Maths and Science subjects. Rank is
also the same for both subjects in this case.
Let’s see the difference between RANK() and DENSE_RANK() SQL Rank function with the following query.
 Query 1
1 SELECT Studentname,
2        Subject,
3        Marks,
4        RANK() OVER(PARTITION BY StudentName ORDER BY Marks ) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7          Rank;
 Query 2
1 SELECT Studentname,
2        Subject,
3        Marks,
4        DENSE_RANK() OVER(PARTITION BY StudentName ORDER BY Marks ) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7          Rank;
In the output, you can see a gap in the RANK function output within a partition, while we do not have any gap
in the DENSE_RANK function output.
In the following screenshot, you can see that Isabella has similar marks in two subjects. The RANK
function assigns rank 1 to both of the similar values; however, it internally skips rank two, and the next row gets rank
three.
The DENSE_RANK function maintains the sequence and does not leave any gap between the values.

NTILE(N) SQL RANK function


We use the NTILE(N) function to distribute the rows into the specified (N) number of groups.
Each group of rows gets its rank as per the specified condition; we need to specify the value for the desired
number of groups.
In my example, we have nine records in the ExamResult table, and NTILE(2) distributes them into two
groups.
1 SELECT *,
2        NTILE(2) OVER(
3        ORDER BY Marks DESC) Rank
4 FROM ExamResult
5 ORDER BY rank;
In the output, we can see two groups. Group 1 contains five rows, and Group 2 contains four rows.

Similarly, NTILE(3) divides the rows into three groups, with three records in each group.
1 SELECT *,
2        NTILE(3) OVER(
3        ORDER BY Marks DESC) Rank
4 FROM ExamResult
5 ORDER BY rank;

We can use SQL PARTITION BY clause to have more than one partition. In the following query, each
partition on subjects is divided into two groups.
1 SELECT *,
2        NTILE(2) OVER(PARTITION  BY subject ORDER BY Marks DESC) Rank
3 FROM ExamResult
4 ORDER BY subject, rank;

Practical usage of SQL RANK functions


We can use SQL RANK functions to fetch specific rows from the data. Suppose we want to get the data of
the students with ranks 1 to 3. In the following query, we use a common table expression (CTE) to get
the data using the ROW_NUMBER() function and later filter the result from the CTE to satisfy our condition.
1 WITH StudentRanks AS
2 (
3   SELECT *, ROW_NUMBER() OVER( ORDER BY Marks) AS Ranks
4   FROM ExamResult
5 )
6
7 SELECT StudentName , Marks
8 FROM StudentRanks
9 WHERE Ranks >= 1 and Ranks <=3
10 ORDER BY Ranks

We can use the OFFSET FETCH command starting from SQL Server 2012 to fetch a specific number of
records.
1 WITH StudentRanks AS
2 (
3   SELECT *, ROW_NUMBER() OVER( ORDER BY Marks) AS Ranks
4   FROM ExamResult
5 )
6
7 SELECT StudentName , Marks
8 FROM StudentRanks
9 ORDER BY Ranks OFFSET 1 ROWS FETCH NEXT 3 ROWS ONLY;

A quick summary of SQL RANK Functions


ROW_NUMBER It assigns a sequential number to each unique record, even for tied values.

RANK It assigns a rank number to each row in a partition. It skips rank numbers after similar (tied) values.

DENSE_RANK It assigns a rank number to each row in a partition. It does not skip rank numbers after similar (tied) values.

NTILE(N) It distributes the rows into the specified number (N) of groups and assigns a group number to each row.
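To see all four functions side by side, we can compute them in a single query over the same ExamResult table used throughout this section; a minimal sketch (here NTILE(3) splits the nine rows into three groups):

SELECT Studentname,
       Subject,
       Marks,
       ROW_NUMBER() OVER(ORDER BY Marks DESC) RowNumber,  -- unique number per row, even for ties
       RANK() OVER(ORDER BY Marks DESC) RankWithGaps,     -- ties share a rank, the next rank is skipped
       DENSE_RANK() OVER(ORDER BY Marks DESC) RankNoGaps, -- ties share a rank, no gaps
       NTILE(3) OVER(ORDER BY Marks DESC) GroupNumber     -- distributes the nine rows into 3 groups
FROM ExamResult
ORDER BY Marks DESC;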

Conclusion
In this article, we explored SQL RANK functions and the differences between them. It is helpful for
SQL developers to be familiar with these functions to explore and manage their data well. If you have any
comments or questions, feel free to leave them in the comments below.

The Table Variable in SQL Server


December 3, 2019 by Esat Erkec

In this article, we will explore the table variable in SQL Server with various examples and we will also
discuss some useful tips about the table variables.

Definition
The table variable is a special type of local variable that helps to store data temporarily, similar to the
temp table in SQL Server. In fact, the table variable provides all the properties of a local variable, but,
unlike temp or regular tables, it also comes with some limitations.

Syntax
The following syntax describes how to declare a table variable:
1 DECLARE @LOCAL_TABLEVARIABLE TABLE
2 (column_1 DATATYPE,
3 column_2 DATATYPE,
4 column_N DATATYPE
5)
To declare a table variable, we start with the DECLARE statement, just as for local variables. The name of
the table variable must start with an at (@) sign. The TABLE keyword specifies that this variable is a table
variable. After the TABLE keyword, we define the column names and data types of the table variable in
SQL Server.
In the following example, we will declare a table variable and insert the days of the week and their
abbreviations to the table variable:
1 DECLARE @ListOWeekDays TABLE(DyNumber INT,DayAbb VARCHAR(40) , WeekName VARCHAR(40))
2  
3 INSERT INTO @ListOWeekDays
4 VALUES
5 (1,'Mon','Monday')  ,
6 (2,'Tue','Tuesday') ,
7 (3,'Wed','Wednesday') ,
8 (4,'Thu','Thursday'),
9 (5,'Fri','Friday'),
10 (6,'Sat','Saturday'),
11 (7,'Sun','Sunday')
12 SELECT * FROM @ListOWeekDays

At the same time, we can update and delete the data contained in table variables. The following
query deletes and updates rows:
1 DECLARE @ListOWeekDays TABLE(DyNumber INT,DayAbb VARCHAR(40) , WeekName VARCHAR(40))
2  
3 INSERT INTO @ListOWeekDays
4 VALUES
5 (1,'Mon','Monday')  ,
6 (2,'Tue','Tuesday') ,
7 (3,'Wed','Wednesday') ,
8 (4,'Thu','Thursday'),
9 (5,'Fri','Friday'),
10 (6,'Sat','Saturday'),
11 (7,'Sun','Sunday')
12 DELETE @ListOWeekDays WHERE DyNumber=1
13 UPDATE @ListOWeekDays SET WeekName='Saturday is holiday'  WHERE DyNumber=6
14 SELECT * FROM @ListOWeekDays

What is the storage location of the table variables?


The answer to this question is that table variables are stored in the tempdb database. We underline this
because it is sometimes claimed that table variables are stored in memory, which is wrong. Before proving
the answer to this question, we should clarify one issue about table variables: their lifecycle starts at the
declaration point and ends at the end of the batch. As a result, the table variable in SQL Server is
automatically dropped at the end of the batch:
1 DECLARE @ExperiementTable TABLE
2 (
3 TestColumn_1 INT, TestColumn_2 VARCHAR(40), TestColumn_3 VARCHAR(40)
4 );
5 SELECT TABLE_CATALOG, TABLE_SCHEMA, COLUMN_NAME, DATA_TYPE
6 FROM tempdb.INFORMATION_SCHEMA.COLUMNS
7 WHERE COLUMN_NAME LIKE 'TestColumn%';
8     
9 GO
10 SELECT TABLE_CATALOG, TABLE_SCHEMA, COLUMN_NAME, DATA_TYPE
11 FROM tempdb.INFORMATION_SCHEMA.COLUMNS
12 WHERE COLUMN_NAME LIKE 'TestColumn%';

As you can see, the previous query returns two result sets. ResultSet-1 contains the column names and
data types of the declared table variable, while ResultSet-2 does not contain any data. The reason is that
the first INFORMATION_SCHEMA.COLUMNS query executes in the same batch as the table variable
declaration, so it can read the information about the @ExperiementTable table variable from the tempdb
database. The second query cannot return any data about @ExperiementTable because the GO statement
ends the batch, so the life-cycle of the @ExperiementTable table variable is terminated. In this section, we
proved the storage location of the table variable in SQL Server.

How can we use constraints with the table variables?


Constraints are database objects that ensure data integrity. Table variables allow us to create the
following constraints:
 Primary Key
 Unique
 Not Null
 Check
In the following example, we will successfully use all types of constraints on the table variable seamlessly:
1 DECLARE @TestTable TABLE
2 (ID INT PRIMARY KEY,
3 Col1 VARCHAR(40) UNIQUE,
4 Col2 VARCHAR(40) NOT NULL,
5 Col3 int CHECK (Col3>=18))
6     
7 INSERT INTO @TestTable
8 VALUES(1,'Value1',12 , 20)
9     
10 SELECT * FROM @TestTable
On the other hand, Foreign Key constraints cannot be used with table variables. The other restriction is that
we have to define the constraints when we declare the table variable; otherwise, we experience an
error. For example, the following query will return an error because of this restriction: we cannot alter the
table structure after the declaration of the table variable:
1 DECLARE @TestTable TABLE
2 (ID INT NOT NULL  )
3     
4 ALTER TABLE @TestTable
5 ADD CONSTRAINT PK_ID PRIMARY KEY (ID)

Transactions and table variable in SQL Server


Transactions are the smallest logical units of work that help to manage CRUD (insert, select, update, and
delete) operations in SQL Server. Explicit transactions are started with the BEGIN TRAN statement and
are completed with COMMIT or ROLLBACK statements. Now we will execute the following query
and then analyze the result:
1 DECLARE @TestTable TABLE
2 (ID INT PRIMARY KEY,
3 Col1 VARCHAR(40) UNIQUE,
4 Col2 VARCHAR(40) NOT NULL,
5 Col3 int CHECK (Col3>=18))
6 BEGIN TRAN
7 INSERT INTO @TestTable
8 VALUES(1,'Value1',12 , 20)
9     
10 ROLLBACK TRAN
11     
12 SELECT * FROM @TestTable

Table variable CRUD operations are not managed by explicit transactions. As a result, ROLLBACK TRAN
cannot erase the modified data in the table variable: the SELECT still returns the inserted row.
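For contrast, the same experiment with a temporary table behaves differently, because temp tables do participate in explicit transactions; a minimal sketch, using a hypothetical #RollbackTest table:

CREATE TABLE #RollbackTest (ID INT PRIMARY KEY);

BEGIN TRAN
INSERT INTO #RollbackTest VALUES (1);
ROLLBACK TRAN

-- Returns an empty result set: the ROLLBACK erased the inserted row,
-- unlike the table variable example above
SELECT * FROM #RollbackTest;

DROP TABLE #RollbackTest;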

Some useful tips for the table variables


TRUNCATE statement does not work for table variables
The TRUNCATE statement helps to delete all rows in the tables very quickly. However, this statement
cannot be used for table variables. For example, the following query will return an error:
1 DECLARE @TestTable TABLE
2 (ID INT PRIMARY KEY,
3 Col1 VARCHAR(40) UNIQUE,
4 Col2 VARCHAR(40) NOT NULL,
5 Col3 int CHECK (Col3>=18))
6     
7 INSERT INTO @TestTable
8 VALUES(1,'Value1',12 , 20)
9     
10     
11 TRUNCATE TABLE @TestTable
The table variable structure cannot be changed after it has been declared
According to this tip, the following query returns an error:
1 DECLARE @TestTable TABLE
2 (ID INT PRIMARY KEY,
3 Col1 VARCHAR(40) UNIQUE,
4 Col2 VARCHAR(40) NOT NULL)
5         
6 ALTER TABLE @TestTable
7 ADD Col4 INT

The table variable in SQL Server should use an alias with the
join statements
If we want to join two or more table variables with each other or with regular tables, we have to use aliases
for the table names. The usage looks like this:
1 DECLARE @Department TABLE
2 (DepartmentID INT PRIMARY KEY,
3 DepName VARCHAR(40) UNIQUE)
4         
5 INSERT INTO @Department VALUES(1,'Marketing')
6 INSERT INTO @Department VALUES(2,'Finance')
7 INSERT INTO @Department VALUES(3,'Operations ')
8         
9 DECLARE @Employee TABLE
10 (EmployeeID INT PRIMARY KEY IDENTITY(1,1),
11 EmployeeName VARCHAR(40),
12 DepartmentID VARCHAR(40))
13         
14 INSERT INTO @Employee VALUES('Jodie Holloway','1')
15 INSERT INTO @Employee VALUES('Victoria Lyons','2')
16 INSERT INTO @Employee VALUES('Callum Lee','3')
17         
18 select * from @Department Dep inner join @Employee Emp
19 on Dep.DepartmentID = Emp.DepartmentID
The table variable does not allow creating an explicit index
Indexes help to improve the performance of queries, but the CREATE INDEX statement cannot be
used to create an index for table variables. For example, the following query will return an error:
1 DECLARE @TestTable TABLE
2 (ID INT PRIMARY KEY,
3 Col1 VARCHAR(40) UNIQUE,
4 Col2 VARCHAR(40) NOT NULL)
5         
6         
7 CREATE NONCLUSTERED INDEX test_index
8 ON @TestTable(Col1)

However, we can overcome this issue with the help of implicit index definitions: the PRIMARY KEY and
UNIQUE constraint definitions automatically create an index, and we can also use inline INDEX definitions
to create single-column or composite non-clustered indexes. When we execute the
following query, we can see the indexes that belong to @TestTable:
1 DECLARE @TestTable TABLE
2 (
3     Col1 INT NOT NULL PRIMARY KEY ,
4     Col2 INT NOT NULL INDEX Cluster_I1 (Col1,Col2),
5     Col3 INT NOT NULL UNIQUE
6 )
7         
8         
9 SELECT
10 ind.name,type_desc
11 FROM
12      tempdb.sys.indexes ind
13         
14 where ind.object_id=(
15 SELECT OBJECT_ID FROM tempdb.sys.objects obj WHERE obj.name  IN (
16 SELECT TABLE_NAME FROM tempdb.INFORMATION_SCHEMA.COLUMNS
17 WHERE  (COLUMN_NAME = 'Col1' OR COLUMN_NAME='Col2' OR COLUMN_NAME='Col3')
18 ))

Conclusion
In this article, we explored the table variable in SQL Server in detail with various examples. We also
mentioned the features and limitations of table variables.

How to drop temp tables in SQL Server


March 23, 2020 by Esat Erkec

Temporary tables, also known as temp tables, are widely used by database administrators and
developers. However, it may be necessary to drop a temp table before creating it, so it is a common
practice to check whether the temporary table already exists. In this way, we can eliminate the “There is
already an object named ‘#temptablename’ in the database” error during temporary table creation.

Temporary Tables
Temporary tables are used to store data for a period of time in SQL Server. Many features of
temporary tables are similar to those of persisted tables: for example, we can create indexes, statistics, and
constraints for these tables, as we do for persisted tables.
The type of a temporary table affects its life-cycle. Now, we will take a glance at
the types.

Types of the Temporary Tables


Local Temporary Tables: The name of this type of temporary table starts with a single “#” hashtag
symbol, and they are visible only in the session that created them. If the session which created the local
temporary table is closed, the temporary table is dropped automatically by SQL Server.
The following query will create a local temporary table:
1 CREATE TABLE #LocalCustomer
2 (
3 CustomerId int,
4 CustomerName varchar(50),
5 CustomerAdress varchar(150)
6 )
7 GO
8 INSERT INTO #LocalCustomer VALUES(1,'Katelyn Montropx' ,'30  Crescent Avenue DRUMMUIR CASTLE')
9 GO
10 SELECT * FROM #LocalCustomer

Global Temporary Tables: The name of this type of temporary table starts with a double “##” hashtag
symbol, and these tables can be accessed from all other connections. This is the major difference between
the local and global temporary tables. If the session where the global temporary table was created is
closed, the global temporary table is dropped automatically.
The following query will create a global temporary table:
1 CREATE TABLE ##GlobalCustomer
2 (
3 CustomerId int,
4 CustomerName varchar(50),
5 CustomerAdress varchar(150)
6 )
7 GO
8 INSERT INTO ##GlobalCustomer VALUES(1,'Adam Tottropx' ,'30  Mztom Street LONDON')
9 GO
10 SELECT * FROM ##GlobalCustomer
The following table expresses the main differences between global and local temporary tables:
Local Temporary Tables Global Temporary Tables

Names start with a single “#” hashtag symbol. Names start with a double “##” hashtag symbol.

Tables can be accessed only from the session where the table was created. Tables can be accessed from all other sessions.

Cannot be dropped by the other connections. Can be dropped by the other connections.

Where are the Temporary Tables stored?


When we create a temporary table, it is created in the tempdb database. After creating a local
temporary table, if we check the temporary tables folder in tempdb, we will see a strange table name. On
the other hand, global temporary tables are created with their original names.

SQL Server adds random numbers at the end of local temporary table names. The idea behind this
logic is pretty simple: more than one connection can create local temporary tables with the
same name, so SQL Server automatically adds a random suffix at the end of this type of temporary
table name. In this way, SQL Server avoids name conflicts.
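We can see the generated name with a simple query against tempdb; a quick sketch (the exact suffix will differ on every system):

-- Shows the padded name SQL Server generated for the local temp table,
-- e.g. #LocalCustomer______...______00000000001A
SELECT name
FROM tempdb.sys.tables
WHERE name LIKE '#LocalCustomer%';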

There is no doubt that, given all this, if we want to drop any temp table, we should work in
the tempdb database.

How to drop Temp Tables?


As a best practice, before creating a temporary table, we should check whether it already exists. In this
way, we don’t experience the “There is already an object named…” error mentioned above.
To achieve this check, we can use several different techniques. Let us learn these techniques:
Using OBJECT_ID function to check temporary table existence
The OBJECT_ID function is used to obtain the identification number of a database
object. It can take the object’s name as a parameter, so we can use this function to
check the existence of any object in a particular database.
The following query will check the #LocalCustomer table existence in the tempdb database, and if it
exists, it will be dropped.
For the local temporary tables:
1 IF OBJECT_ID(N'tempdb..#LocalCustomer') IS NOT NULL
2 BEGIN
3 DROP TABLE #LocalCustomer
4 END
5 GO
6  
7 CREATE TABLE #LocalCustomer
8 (
9 CustomerId int,
10 CustomerName varchar(50),
11 CustomerAdress varchar(150)
12 )
For the global temporary tables:
1 IF OBJECT_ID(N'tempdb..##GlobalCustomer') IS NOT NULL
2 BEGIN
3 DROP TABLE ##GlobalCustomer
4 END
5 GO
6  
7 CREATE TABLE ##GlobalCustomer
8 (
9 CustomerId int,
10 CustomerName varchar(50),
11 CustomerAdress varchar(150)
12 )

Using sys.tables table to check temporary table existence


In this method, we will check the existence of the temporary table with the help of
the sys.tables because this table returns user tables in the relevant database.
For the local temporary tables:
1 IF EXISTS(SELECT [name] FROM tempdb.sys.tables WHERE [name] like '#LocalCustomer%')
2 BEGIN
3    DROP TABLE #LocalCustomer;
4 END;
5  
6 CREATE TABLE #LocalCustomer
7 (
8 CustomerId int,
9 CustomerName varchar(50),
10 CustomerAdress varchar(150)
11 )
For the global temporary tables:
1 IF EXISTS(SELECT [name] FROM tempdb.sys.tables WHERE [name] like '##GlobalCustomer%')
2 BEGIN
3    DROP TABLE ##GlobalCustomer ;
4 END;
5  
6 CREATE TABLE ##GlobalCustomer
7 (
8 CustomerId int,
9 CustomerName varchar(50),
10 CustomerAdress varchar(150)
11 )
As we can see, we check the existence of the #LocalCustomer table in the tempdb database, and if it
exists, we drop it. At this point, we need to underline one issue: the table name is searched with
the LIKE operator, and we also added the wildcard character at the end of the temp table name. As we
stated, local temp tables are created with a random suffix, so we cannot know their exact name.

Using DROP TABLE IF EXISTS statement


This is the last technique for dropping a temp table that we will learn. The DROP TABLE
IF EXISTS statement checks the existence of the table and, if the table exists, drops it. We have to
underline one point about this statement: it works on SQL Server 2016 and higher versions. In the
following query, using the DROP TABLE IF EXISTS statement, we check
the #LocalCustomer table's existence, and if it exists, it is dropped.
For the local temporary tables:
1 DROP TABLE IF EXISTS  #LocalCustomer
2 GO
3 CREATE TABLE #LocalCustomer
4 (
5 CustomerId int,
6 CustomerName varchar(50),
7 CustomerAdress varchar(150)
8 )
For the global temporary tables:
1 DROP TABLE IF EXISTS  ##GlobalCustomer
2 GO
3 CREATE TABLE ##GlobalCustomer
4 (
5 CustomerId int,
6 CustomerName varchar(50),
7 CustomerAdress varchar(150)
8 )
In the following table, we can see all the methods that we have mentioned briefly:
How to drop temporary tables

Using OBJECT_ID function:

IF OBJECT_ID(N'tempdb..#TempTableName') IS NOT NULL
BEGIN
DROP TABLE #TempTableName
END
GO

CREATE TABLE #TempTableName
(
Col1 VARCHAR(100)
)

Using sys.tables:

IF EXISTS(SELECT [name] FROM tempdb.sys.tables WHERE [name] like '#TempTableName%')
BEGIN
   DROP TABLE #TempTableName;
END;

CREATE TABLE #TempTableName
(
Col1 VARCHAR(100)
)

Using DROP TABLE IF EXISTS statement:

DROP TABLE IF EXISTS #TempTableName
GO
CREATE TABLE #TempTableName
(
Col1 VARCHAR(100)
)

Conclusion
In this article, we learned the basics of temporary tables, and we discussed techniques for dropping
temp tables in SQL Server. In my opinion, the best approach is to use the DROP TABLE IF
EXISTS statement, but the alternative methods can be used just as easily.

Database table partitioning in SQL Server
April 4, 2014 by Milica Medic

What is database table partitioning?


Partitioning is the database process where very large tables are divided into multiple smaller parts. By
splitting a large table into smaller, individual tables, queries that access only a fraction of the data can
run faster because there is less data to scan. The main goal of partitioning is to aid in the maintenance of
large tables and to reduce the overall response time to read and load data for particular SQL operations.

Vertical Partitioning on SQL Server tables


Vertical table partitioning is mostly used to increase SQL Server performance, especially in cases where a
query retrieves all columns from a table that contains a number of very wide text or BLOB columns. In
this case, to reduce access times, the BLOB columns can be split into their own table. Another example is to
restrict access to sensitive data, e.g. passwords, salary information, etc. Vertical partitioning splits a table
into two or more tables containing different columns:

An example of vertical partitioning


An example of vertical partitioning is a large table with reports for employees, containing basic
information such as report name, ID, report number, and a large column with the report description.
Assume that ~95% of users search on parts of the report name, number, etc., and that only
~5% of requests open the report description field. Assume also that all those searches lead to
clustered index scans; since an index scan reads all rows in the table, the cost of the query is proportional
to the total number of rows in the table, and our goal is to minimize the number of IO operations and
reduce the cost of the search.
Let’s see the example on the EmployeeReports table:
1 CREATE TABLE EmployeeReports
2 (
3 ReportID int IDENTITY (1,1) NOT NULL,
4 ReportName varchar (100),
5 ReportNumber varchar (20),
6 ReportDescription varchar (max),
7 CONSTRAINT EReport_PK PRIMARY KEY CLUSTERED (ReportID)
8 )
9  
10 DECLARE @i int
11 SET @i = 1
12  
13 BEGIN TRAN
14 WHILE @i<100000
15 BEGIN
16 INSERT INTO EmployeeReports
17 (
18 ReportName,
19 ReportNumber,
20 ReportDescription
21 )
22 VALUES
23 (
24 'ReportName',
25 CONVERT (varchar (20), @i),
26 REPLICATE ('Report', 1000)
27 )
28 SET @i=@i+1
29 END
30 COMMIT TRAN
31 GO
If we run a SQL query to pull ReportID, ReportName, and ReportNumber data from
the EmployeeReports table, the result shows a scan count of 5, which represents the number of times that
the table was accessed during the query, and 113,288 logical reads, which represent the total
number of page accesses needed to process the query:
1 SET STATISTICS IO ON
2 SET STATISTICS TIME ON
3 SELECT er.ReportID, er.ReportName, er.ReportNumber
4 FROM dbo.EmployeeReports er
5 WHERE er.ReportNumber LIKE '%33%'
6 SET STATISTICS IO OFF
7 SET STATISTICS TIME OFF

As indicated, every page is read from the data cache, whether or not it was necessary to bring that page
from disk into the cache for any given read. To reduce the cost of the query we will change the SQL
Server database schema and split the EmployeeReports table vertically.
Next we’ll create the ReportsDesc table and move the large ReportDescription column, and
the ReportsData table and move all data from the EmployeeReports table except
the ReportDescription column:
1 CREATE TABLE ReportsDesc
2 ( ReportID int FOREIGN KEY REFERENCES EmployeeReports (ReportID),
3   ReportDescription varchar(max),
4   CONSTRAINT PK_ReportDesc PRIMARY KEY CLUSTERED (ReportID)
5 )
6  
7 CREATE TABLE ReportsData
8 (
9 ReportID int NOT NULL,
10 ReportName varchar (100),
11 ReportNumber varchar (20),
12  
13 CONSTRAINT DReport_PK PRIMARY KEY CLUSTERED (ReportID)
14 )
15 INSERT INTO dbo.ReportsData
16 (
17     ReportID,
18     ReportName,
19     ReportNumber
20 )
21 SELECT er.ReportID,
22 er.ReportName,
23 er.ReportNumber
24 FROM dbo.EmployeeReports er
The same search query will now give different results:
1 SET STATISTICS IO ON
2 SET STATISTICS TIME ON
3 SELECT er.ReportID, er.ReportName, er.ReportNumber
4 FROM ReportsData er
5 WHERE er.ReportNumber LIKE '%33%'
6 SET STATISTICS IO OFF
7 SET STATISTICS TIME OFF

Vertical partitioning on SQL Server tables may not be the right method in every case. However, if you
have, for example, a table with a lot of data that is not accessed equally, tables with data you want to
restrict access to, or scans that return a lot of data, vertical partitioning can help.
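For the minority of requests that do need the description, the two tables can simply be joined back together on the shared key; a minimal sketch, assuming the ReportsDesc table has been populated with the descriptions:

-- Only the ~5% of requests that open the description pay the cost of reading it
SELECT rd.ReportID,
       rd.ReportName,
       rd.ReportNumber,
       des.ReportDescription
FROM ReportsData rd
INNER JOIN ReportsDesc des ON des.ReportID = rd.ReportID
WHERE rd.ReportID = 33;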

Horizontal Partitioning on SQL Server tables


Horizontal partitioning divides a table into multiple tables that contain the same number of columns, but
fewer rows. For example, if a table contains a large number of rows that represent monthly reports it
could be partitioned horizontally into tables by years, with each table representing all monthly reports for
a specific year. This way queries requiring data for a specific year will only reference the appropriate
table. Tables should be partitioned in a way that queries reference as few tables as possible.

Tables are horizontally partitioned based on a column which will be used for partitioning and the ranges
associated to each partition. Partitioning column is usually a datetime column but all data types that are
valid for use as index columns can be used as a partitioning column, except a timestamp column. The
ntext, text, image, xml, varchar(max), nvarchar(max), or varbinary(max), Microsoft .NET Framework
common language runtime (CLR) user-defined type, and alias data type columns cannot be specified.
There are two different approaches we could use to accomplish table partitioning. The first is to create a
new partitioned table and then simply copy the data from your existing table into the new table and do a
table rename. The second approach is to partition an existing table by rebuilding or creating a clustered
index on the table.
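The second approach can be sketched as follows, assuming a hypothetical MonthlyReports heap (a table without a clustered index) and the PartitionBymonth partition scheme that is created later in this section:

-- Creating the clustered index on the partition scheme physically moves
-- the existing rows into the corresponding partitions
CREATE CLUSTERED INDEX IX_MonthlyReports_ReportDate
ON MonthlyReports (ReportDate)
ON PartitionBymonth (ReportDate);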

An example of horizontal partitioning with creating a new partitioned table
SQL Server 2005 introduced a built-in partitioning feature to horizontally partition a table, with up to
1,000 partitions supported in SQL Server 2008 and 15,000 partitions in SQL Server 2012, and the data
placement is handled automatically by SQL Server. This feature is available only in the Enterprise Edition
of SQL Server.
To create a partitioned table for storing monthly reports, we will first create additional filegroups. A
filegroup is a logical storage unit. Every database has a primary filegroup that contains the primary data
file (.mdf). Additional, user-defined filegroups can be created to contain secondary files (.ndf). We will
create 12 filegroups, one for every month:
1 ALTER DATABASE PartitioningDB
2 ADD FILEGROUP January
3 GO
4 ALTER DATABASE PartitioningDB
5 ADD FILEGROUP February
6 GO
7 ALTER DATABASE PartitioningDB
8 ADD FILEGROUP March
9 GO
10 ALTER DATABASE PartitioningDB
11 ADD FILEGROUP April
12 GO
13 ALTER DATABASE PartitioningDB
14 ADD FILEGROUP May
15 GO
16 ALTER DATABASE PartitioningDB
17 ADD FILEGROUP June
18 GO
19 ALTER DATABASE PartitioningDB
20 ADD FILEGROUP July
21 GO
22 ALTER DATABASE PartitioningDB
23 ADD FILEGROUP August
24 GO
25 ALTER DATABASE PartitioningDB
26 ADD FILEGROUP September
27 GO
28 ALTER DATABASE PartitioningDB
29 ADD FILEGROUP October
30 GO
31 ALTER DATABASE PartitioningDB
32 ADD FILEGROUP November
33 GO
34 ALTER DATABASE PartitioningDB
35 ADD FILEGROUP December
36 GO
To check the created and available filegroups in the current database, run the following query:
1 SELECT name AS AvailableFilegroups
2 FROM sys.filegroups
3 WHERE type = 'FG'

When the filegroups are created, we will add an .ndf file to every filegroup:
1 ALTER DATABASE [PartitioningDB]
2     ADD FILE
3     (
4     NAME = [PartJan],
5     FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.LENOVO\MSSQL\DATA\PartitioningDB.ndf',
6         SIZE = 3072 KB,
7         MAXSIZE = UNLIMITED,
8         FILEGROWTH = 1024 KB
9     ) TO FILEGROUP [January]
10  
In the same way, we add files to all the created filegroups, specifying the logical name of the file and the
operating system (physical) file name for each filegroup, e.g.:
1 ALTER DATABASE [PartitioningDB]
2     ADD FILE
3     (
4     NAME = [PartFeb],
5     FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.LENOVO\MSSQL\DATA\PartitioningDB2.ndf',
6         SIZE = 3072 KB,
7         MAXSIZE = UNLIMITED,
8         FILEGROWTH = 1024 KB
9     ) TO FILEGROUP [February]
To check the files added to the filegroups, run the following query:
1 SELECT
2 name as [FileName],
3 physical_name as [FilePath]
4 FROM sys.database_files
5 where type_desc = 'ROWS'
6 GO

After creating additional filegroups for storing data we’ll create a partition function. A partition function
is a function that maps the rows of a partitioned table into partitions based on the values of a
partitioning column. In this example we will create a partitioning function that partitions a table into 12
partitions, one for each month of a year’s worth of values in a datetime column:
1 CREATE PARTITION FUNCTION [PartitioningByMonth] (datetime)
2 AS RANGE RIGHT FOR VALUES ('20140201', '20140301', '20140401',
3                '20140501', '20140601', '20140701', '20140801',
4                '20140901', '20141001', '20141101', '20141201');
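Once the partition function exists, we can check which partition a given date maps to with the $PARTITION function; a quick sanity check:

-- RANGE RIGHT: dates before '20140201' fall into partition 1,
-- so these return 1 (January) and 7 (July), respectively
SELECT $PARTITION.PartitioningByMonth('20140115') AS JanuaryPartition,
       $PARTITION.PartitioningByMonth('20140715') AS JulyPartition;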
To map the partitions of a partitioned table to filegroups and determine the number and domain of the
partitions of a partitioned table we will create a partition scheme:
1 CREATE PARTITION SCHEME PartitionBymonth
2 AS PARTITION PartitioningBymonth
3 TO (January, February, March,
4     April, May, June, July,
5     August, September, October,
6     November, December);
Now we’re going to create the table using the PartitionBymonth partition scheme, and fill it with the
test data:
1 CREATE TABLE Reports
2 (ReportDate datetime PRIMARY KEY,
3 MonthlyReport varchar(max))
4 ON PartitionBymonth (ReportDate);
5 GO
6  
7 INSERT INTO Reports (ReportDate,MonthlyReport)
8 SELECT '20140105', 'ReportJanuary' UNION ALL
9 SELECT '20140205', 'ReportFebruary' UNION ALL
10 SELECT '20140308', 'ReportMarch' UNION ALL
11 SELECT '20140409', 'ReportApril' UNION ALL
12 SELECT '20140509', 'ReportMay' UNION ALL
13 SELECT '20140609', 'ReportJune' UNION ALL
14 SELECT '20140709', 'ReportJuly' UNION ALL
15 SELECT '20140809', 'ReportAugust' UNION ALL
16 SELECT '20140909', 'ReportSeptember' UNION ALL
17 SELECT '20141009', 'ReportOctober' UNION ALL
18 SELECT '20141109', 'ReportNovember' UNION ALL
19 SELECT '20141209', 'ReportDecember'
We will now verify the rows in the different partitions:
1 SELECT
2 p.partition_number AS PartitionNumber,
3 f.name AS PartitionFilegroup,
4 p.rows AS NumberOfRows
5 FROM sys.partitions p
6 JOIN sys.destination_data_spaces dds ON p.partition_number = dds.destination_id
7 JOIN sys.filegroups f ON dds.data_space_id = f.data_space_id
8 WHERE OBJECT_NAME(OBJECT_ID) = 'Reports'

Now just copy the data from your existing table into the partitioned table and rename the tables.

Partitioning a table using the SQL Server Management Studio Partitioning wizard
SQL Server 2008 introduced a table partitioning wizard in SQL Server Management Studio.
Right click on a table in the Object Explorer pane and in the Storage context menu choose the Create
Partition command:

In the Select a Partitioning Column window, select a column which will be used to partition a table
from available partitioning columns:
Other options in the Create Partition Wizard dialog include the Collocate this table to the selected
partition table option, used to display related data to join with the partitioned column, and the Storage
Align Non Unique Indexes and Unique Indexes with an Indexed Partition Column option, which aligns
all indexes of the partitioned table with the same partition scheme.
After selecting a column for partitioning click the Next button. In the Select a Partition
Function window enter the name of a partition function to map the rows of the table or index into
partitions based on the values of the ReportDate column, or choose the existing partition function:

Click the Next button and in the Select a Partition Scheme window create the partition scheme to map
the partitions of the MonthlyReport table to different filegroups:
Click the Next button, and in the Map Partitions window choose the range of partitioning and select the
available filegroups and the range boundary. The Left boundary is based on Value <= Boundary and the
Right boundary is based on Value < Boundary.

By clicking the Set boundaries button you can customize the date range and set the start and the end
date for each partition:
The Estimate storage option determines the Rowcount, Required space, and Available space
columns, which display an estimate of the required space and the available space based on the number of
records in the table.
The next screen of the wizard offers the choice of whether to execute the script immediately
to create the objects and the partitioned table, or to create the script and save it. A schedule for
executing the script to perform the operations automatically can also be specified:

The next screen of the wizard shows a review of selections made in the wizard:
Click the Finish button to complete the process:

References
 Partitioning
 Partitioned Tables and Indexes
 Files and Filegroups Architecture


SQL Server stored procedures for beginners


July 29, 2019 by Ranga Babu

In this article, we will learn how to create stored procedures in SQL Server with different examples.
A SQL Server stored procedure is a batch of statements grouped as a logical unit and stored in the
database. A stored procedure accepts parameters, executes the T-SQL statements in the
procedure, and returns a result set, if any.
To understand differences between functions and stored procedures in SQL Server, you can refer to this
article, Functions vs stored procedures in SQL Server and to learn about Partial stored procedures in SQL
Server, click Partial stored procedures in SQL Server.

Benefits of using a stored procedure


It can be easily modified: We can easily modify the code inside a stored procedure without needing
to restart or redeploy the application. For example, if T-SQL queries are written in the application
and we need to change the logic, we must change the code in the application and re-deploy it. SQL
Server stored procedures eliminate such challenges by storing the code in the database. So, when we
want to change the logic inside the procedure, we can simply do it with an ALTER PROCEDURE statement.
Reduced network traffic: When we use stored procedures instead of writing T-SQL queries at the
application level, only the procedure name is passed over the network instead of the whole T-SQL code.
Reusable: Stored procedures can be executed by multiple users or multiple client applications without
the need of writing the code again.
Security: Stored procedures reduce the threat surface by eliminating direct access to the tables. We can also
encrypt stored procedures while creating them so that the source code inside the stored procedure is not
visible. Third-party tools like ApexSQL Decrypt can decrypt encrypted stored procedures.
Performance: When a SQL Server stored procedure is executed for the first time, it creates an execution
plan and stores it in the cache, so the plan can be reused the next time the procedure executes.
I am creating sample tables that will be used in the examples in this article.
1 CREATE TABLE Product
2 (ProductID INT, ProductName VARCHAR(100) )
3 GO
4  
5 CREATE TABLE ProductDescription
6 (ProductID INT, ProductDescription VARCHAR(800) )
7 GO
8  
9 INSERT INTO Product VALUES (680,'HL Road Frame - Black, 58')
10 ,(706,'HL Road Frame - Red, 58')
11 ,(707,'Sport-100 Helmet, Red')
12 GO
13  
14 INSERT INTO ProductDescription VALUES (680,'Replacement mountain wheel for entry-level rider.')
15 ,(706,'Sturdy alloy features a quick-release hub.')
16 ,(707,'Aerodynamic rims for smooth riding.')
17 GO

Creating a simple stored procedure


We will create a simple stored procedure that joins two tables and returns the result set as shown in the
following example.
1 CREATE PROCEDURE GetProductDesc
2 AS
3 BEGIN
4 SET NOCOUNT ON
5  
6 SELECT P.ProductID,P.ProductName,PD.ProductDescription  FROM
7 Product P
8 INNER JOIN ProductDescription PD ON P.ProductID=PD.ProductID
9  
10 END
We can use ‘EXEC ProcedureName’ to execute stored procedures. When we execute the procedure
GetProductDesc, the result set looks like below.
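For example, the procedure created above is executed as follows:

EXEC GetProductDesc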

Creating a stored procedure with parameters


Let us create a SQL Server stored procedure that accepts an input parameter and processes the records
based on it.
The following is an example of a stored procedure that accepts a parameter.
1 CREATE PROCEDURE GetProductDesc_withparameters
2 (@PID INT)
3 AS
4 BEGIN
5 SET NOCOUNT ON
6  
7 SELECT P.ProductID,P.ProductName,PD.ProductDescription  FROM
8 Product P
9 INNER JOIN ProductDescription PD ON P.ProductID=PD.ProductID
10 WHERE P.ProductID=@PID
11  
12 END
1 EXEC GetProductDesc_withparameters 706
While executing the stored procedure we need to pass the input parameter. Please refer to the below
image for the result set.

Creating a stored procedure with default parameter values
Following is the example of a stored procedure with default parameter values.
1 CREATE PROCEDURE GetProductDesc_withDefaultparameters
2 (@PID INT =706)
3 AS
4 BEGIN
5 SET NOCOUNT ON
6  
7 SELECT P.ProductID,P.ProductName,PD.ProductDescription  FROM
8 Product P
9 INNER JOIN ProductDescription PD ON P.ProductID=PD.ProductID
10 WHERE P.ProductID=@PID
11  
12 END
When we execute the above procedure without passing a parameter value, the default value 706 is
used. But when it is executed with a value, the default is ignored and the passed value is used as the
parameter.
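For instance, both of the following calls work; the first relies on the default, while the second overrides it:

EXEC GetProductDesc_withDefaultparameters      -- uses the default @PID = 706
EXEC GetProductDesc_withDefaultparameters 680  -- the passed value 680 is used instead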
Creating a stored procedure with an output parameter
Below is an example of a stored procedure with an output parameter. It retrieves
the EmpID, which is an auto-identity column, when a new employee is inserted.
1 CREATE TABLE Employee (EmpID int identity(1,1),EmpName varchar(500))
1 CREATE PROCEDURE ins_NewEmp_with_outputparamaters
2 (@Ename varchar(50),
3 @EId int output)
4 AS
5 BEGIN
6 SET NOCOUNT ON
7  
8 INSERT INTO Employee (EmpName) VALUES (@Ename)
9  
10 SELECT @EId= SCOPE_IDENTITY()
11  
12 END
Executing stored procedures with output parameters is a bit different. We must declare a variable to
store the value returned by the output parameter.
1 declare @EmpID INT
2  
3 EXEC ins_NewEmp_with_outputparamaters 'Andrew', @EmpID OUTPUT
4  
5 SELECT @EmpID
Creating an encrypted stored procedure
We can hide the source code in the stored procedure by creating the procedure with the “ENCRYPTION”
option.
Following is the example of an encrypted stored procedure.
1 CREATE PROCEDURE GetEmployees
2 WITH ENCRYPTION
3 AS
4 BEGIN
5 SET NOCOUNT ON
6  
7 SELECT EmpID,EmpName from Employee
8 END
When we try to view the code of the SQL Server stored procedure using sp_helptext, it returns “The text
for object ‘GetEmployees’ is encrypted.”
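The check itself looks like this:

EXEC sp_helptext 'GetEmployees'
-- Returns: The text for object 'GetEmployees' is encrypted.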

When you try to script the encrypted stored procedure from SQL Server Management Studio, it throws an
error as below.

Creating a temporary procedure


Like temporary tables, we can create temporary procedures as well. There are two types of temporary
procedures: local temporary stored procedures and global temporary stored procedures.
These procedures are created in the tempdb database.
Local temporary SQL Server stored procedures: These are created with a # prefix and can be accessed
only in the session where they were created. The procedure is automatically dropped when the connection is
closed.
Following is the example of creating a local temporary procedure.
1 CREATE PROCEDURE #Temp
2 AS
3 BEGIN
4 PRINT 'Local temp procedure'
5 END
Global temporary SQL Server stored procedures: These procedures are created with a ## prefix and
can be accessed from other sessions as well. The procedure is automatically dropped when the
connection that was used to create it is closed.
Below is the example of creating a global temporary procedure.
1 CREATE PROCEDURE ##TEMP
2 AS
3 BEGIN
4 PRINT 'Global temp procedure'
5 END

Modifying the stored procedure


Use the ALTER PROCEDURE statement to modify the existing stored procedure. Following is the
example of modifying the existing procedure.
1 ALTER PROCEDURE GetProductDesc
2 AS
3 BEGIN
4 SET NOCOUNT ON
5  
6 SELECT P.ProductID,P.ProductName,PD.ProductDescription  FROM
7 Product P
8 INNER JOIN ProductDescription PD ON P.ProductID=PD.ProductID
9  
10 END

Renaming the stored procedure


To rename a stored procedure using T-SQL, use system stored procedure sp_rename. Following is the
example that renames the procedure “GetProductDesc” to a new name “GetProductDesc_new”.
1 sp_rename 'GetProductDesc','GetProductDesc_new'

Conclusion
In this article, we explored SQL Server stored procedures with different examples. In case you have any
questions, please feel free to ask in the comment section below.

How to implement array-like functionality in SQL Server
January 16, 2018 by Daniel Calbimonte

Introduction
I was training some Oracle DBAs in T-SQL and they asked me how to create arrays in SQL Server.
I told them that SQL Server has no arrays like the ones we have in Oracle (varray). They were
disappointed and asked me how this problem was handled.
Some developers asked me the same thing. Where are the arrays in SQL Server?
The short answer is that we use temporary tables or TVPs (table-valued parameters) instead of arrays, or
we use other functions to replace the use of arrays.
The use of temporary tables, TVPs and table variables is explained in another article:
 The tempdb database, introduction and recommendations
In this article, we will show:
 How to use a table variable instead of an array
 The STRING_SPLIT function, which will help us to replace the array functionality
 How to work with older versions of SQL Server to handle a list of values separated by commas
Requirements
1. SQL Server 2016 or later with SSMS installed
2. The Adventureworks database installed

Getting started
How to use a table variable instead of an array
In the first demo, we will show how to use a table variable instead of an array.
We will create a table variable using T-SQL:
DECLARE @myTableVariable TABLE (id INT, name varchar(20))
insert into @myTableVariable values(1,'Roberto'),(2,'Gail'),(3,'Dylan')
select * from @myTableVariable
We created a table variable named myTableVariable, inserted 3 rows into it and then selected from the
table variable.
The select will show the following values:
Now, we will show the information from the Person.Person table of the Adventureworks database that
matches the names in the table variable:
DECLARE @myTableVariable TABLE (id INT, name varchar(20))
insert into @myTableVariable values(1,'Roberto'),(2,'Gail'),(3,'Dylan')

SELECT [BusinessEntityID]
      ,[PersonType]
      ,[NameStyle]
      ,[Title]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
  FROM [Adventureworks].[Person].[Person] where
  FirstName
  IN (Select name from @myTableVariable)
The results will display the names and information of the table Person.person with the names of Roberto,
Gail and Dylan:

Note that in SQL Server it is better to use set-based SQL statements to compare values; it is more
efficient. We generally do not use loops (WHILE) because they are slower and less efficient.
You can use the id to retrieve values from a specific row. For example, for Roberto the id is 1, for Dylan
the id is 3 and for Gail the id is 2.
In C#, for example, if you want to access the second member of an array, you would run something like this:
Array[1];
You use the brackets, and the index 1 refers to the second member of the array (the first one is 0).
In a table variable, you can use the id. If you want to list the second member (id=2) of the table variable,
you can do something like this:
DECLARE @myTableVariable TABLE (id INT, name varchar(20))
insert into @myTableVariable values(1,'Roberto'),(2,'Gail'),(3,'Dylan')
select * from @myTableVariable where id=2
In other words, you can use the id to get a specific member of the table variable.

The problem with table variables is that you need to insert values, and it requires more code to have a
simple table with a few rows.
In C# for example, to create an array, you only need to write the elements and you do not need to insert
data into the table:
string[] names = new string[] {"Gail","Roberto","Dylan"};
It is just a single line of code to have the array with elements. Can we do something similar in SQL
Server?
The next solution will help us determine this.
The STRING_SPLIT function
Another solution is to replace arrays with the new STRING_SPLIT function. This function is
available in SQL Server 2016 or later versions, and also applies to Azure SQL.
If you use the function in an old Adventureworks database or in SQL Server 2014 or older, you may
receive an error message. The following example will try to split 3 names separated by commas:
SELECT value FROM STRING_SPLIT('Roberto,Gail,Dylan', ',');
A typical error message would be the following:
Msg 208, Level 16, State 1, Line 8
Invalid object name ‘STRING_SPLIT’
If you receive this error in SQL Server 2016, check your database compatibility level:
SELECT compatibility_level
FROM sys.databases WHERE name = 'AdventureWorks';
GO
If your compatibility level is lower than 130, use this T-SQL sentence to change the compatibility level:
ALTER DATABASE [Adventureworks] SET COMPATIBILITY_LEVEL = 130
If you do not like T-SQL, you can right-click the database in SSMS, go to Options and change the
compatibility level:

The T-SQL sentence will convert the values separated by commas in rows:
SELECT value FROM STRING_SPLIT('Roberto,Gail,Dylan', ',');
The values will be converted to rows:
In the STRING_SPLIT function, you need to specify the separator.
The following query will show the information of people in the person.person table that matches the
names used in the STRING_SPLIT function:
SELECT [BusinessEntityID]
      ,[PersonType]
      ,[NameStyle]
      ,[Title]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
  FROM [Adventureworks].[Person].[Person] where
  FirstName
  IN (SELECT value FROM STRING_SPLIT('Roberto,Gail,Dylan', ','));
The query will show information about the people with the names equal to Roberto or Gail or Dylan:

If you want to retrieve a specific member of the string, you can assign a row number to each member row of
the STRING_SPLIT output. The following code shows how to retrieve the information:
WITH fakearray AS
(
SELECT
  ROW_NUMBER() OVER(ORDER BY value DESC) AS ID, value FROM STRING_SPLIT('Roberto,Gail,Dylan', ',')
)
SELECT ID, value
FROM fakearray
WHERE ID = 3
ROW_NUMBER is used to add an id to each name. For example, Roberto has the id 1, Gail the id 2 and
Dylan the id 3.
Once you have the query in a CTE expression, you can do a select statement and use the WHERE clause to
specify an ID. In this example, the query will show Dylan's information (ID=3). As you can see, retrieving
the value of a specific member of the fake array is not hard, but it requires more code than a programming
language that supports arrays.
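If you need the value in a scalar variable rather than in a result set, you can assign it from the same CTE. A minimal sketch reusing the query above (the variable name is just an example):

DECLARE @third varchar(20);
WITH fakearray AS
(
SELECT ROW_NUMBER() OVER(ORDER BY value DESC) AS ID, value
FROM STRING_SPLIT('Roberto,Gail,Dylan', ',')
)
SELECT @third = value FROM fakearray WHERE ID = 3;  -- @third now holds 'Dylan'
SELECT @third AS ThirdMember;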
How to work with older versions of SQL Server
STRING_SPLIT is pretty helpful, but how was it handled in earlier versions?
There are many ways to solve this, but we will use the XML solution. The following example will show
how to display the values that match the contents of a fake array:
DECLARE @oldfakearray VARCHAR(100) = 'Roberto,Gail,Dylan';
DECLARE @param XML;

SELECT @param = CAST('<i>' + REPLACE(@oldfakearray,',','</i><i>') + '</i>' AS XML)

SELECT [BusinessEntityID]
      ,[PersonType]
      ,[NameStyle]
      ,[Title]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
  FROM [Adventureworks].[Person].[Person]
  WHERE FirstName IN
  (SELECT x.i.value('.','NVARCHAR(100)') FROM @param.nodes('//i') x(i))
The code does the same as the STRING_SPLIT or the table variable solution:

In the first line, we just create a new fake array named oldfakearray and assign the names to the variable:
DECLARE @oldfakearray VARCHAR(100) = 'Roberto,Gail,Dylan';
In the second line, we are declaring an XML variable:
DECLARE @param XML;
In the next line, we replace the commas to create an XML value from the contents of the oldfakearray:
SELECT @param = CAST('<i>' + REPLACE(@oldfakearray,',','</i><i>') + '</i>' AS XML)
Finally, we are doing a select from the table Person.Person in the Adventureworks database where the
firstname is in the @param variable:
SELECT [BusinessEntityID]
      ,[PersonType]
      ,[NameStyle]
      ,[Title]
      ,[FirstName]
      ,[MiddleName]
      ,[LastName]
  FROM [Adventureworks].[Person].[Person]
  WHERE FirstName IN
  (SELECT x.i.value('.','NVARCHAR(100)') FROM @param.nodes('//i') x(i))
As you can see, it is not an array, but it helps to compare a list of values with a table.

Conclusion
As you can see, SQL Server does not include arrays, but we can use table variables, temporary tables or
the STRING_SPLIT function. However, the STRING_SPLIT function is newer and can be used only on SQL
Server 2016 or later versions.
If you do not have SQL Server 2016, there are older methods to split strings separated by commas. We
showed one such method using XML.

SQL varchar data type deep dive
May 29, 2019 by Gauri Mahajan

In this article we’ll review the SQL varchar data type including a basic definition and overview, differences
from varchar(n), UTF-8 support, Collation, performance considerations and more.
Data plays a crucial part in any organization and an attribute by which it is defined is called its data type.
In simple words, data type states what kind of data any object, variable or expression can store. As a SQL
developer, while creating a SQL table, we have to understand and decide what type of data will be
contained by each and every column in a table. Like any other programming language, SQL also supports
a gamut of data types that can hold integer data, date and time data, character data etc. and allows you
to define data types of your own as well. SQL varchar is one of the best-known and most-used data types
among the lot. In this article, we will walk through different facets of the SQL Server varchar in the SQL
server.
Below is the outline that we will cover in this article.
1. Introduction to the SQL Server varchar data type in SQL Server
2. Use of varchar for large blocks of text
3. What is new in SQL Server 2019 preview for varchar datatype?
4. Influence of collation on varchar SQL in SQL Server
5. UTF-8 support with varchar in SQL Server 2019 CTP
6. SQL Server varchar for data conversions and data display
7. Storage and performance considerations using SQL Server varchar
8. Impact on string length of SQL varchar with CAST and CONVERT functions
Let’s move ahead and see the aforementioned in action.

So what is varchar in SQL?
As the name suggests, varchar means character data that is varying. Also known as Variable Character, it
is an indeterminate length string data type. It can hold numbers, letters and special characters. Microsoft
SQL Server 2008 (and above) can store up to 8000 characters as the maximum length of the string using
varchar data type. SQL varchar usually holds 1 byte per character and 2 more bytes for the length
information. It is recommended to use varchar as the data type when columns have variable length and
the actual data is way less than the given capacity. Let’s switch to SSMS and see how varchar works.
The following example creates three variables (name, gender and age) with varchar as the data type and
different values being assigned to them. As evident from the result sets shown below, by default, the
string length of a SQL varchar column is 1, and it returns only the first character of the variable (the
rest of the string being truncated) when no string length is passed for the varchar data type. The LEN()
function is used to determine the number of characters stored in the varchar column.
DECLARE @name AS varchar = 'john parker d''souza';
DECLARE @gender AS varchar = 'M'
DECLARE @age AS varchar = '23'

SELECT @name Name, @gender Gender, @age Age
SELECT len(@name) namelen, len(@gender) genderlen, len(@age) agelen

How is SQL varchar(max) different from varchar(n)?
There are times when SQL developers (including myself) define the varchar data type without a
length and subsequently fail to insert string records into the SQL table; this is because SQL Server
allocates 1 character space as the default value for a varchar column that is defined without any length.
In practical scenarios, varchar(n) is used to store a variable-length value as a string; here 'n' denotes the
string length in bytes and it can go up to 8000 characters. Now, let's proceed further and see how we can
store SQL varchar data with a string length into the column of a SQL table. The below script creates the table
Demovarchar with some data in it. And the result screen shows the records of 7 employees with their
departments, ages, etc.
CREATE TABLE Demovarchar
(
Id int NOT NULL IDENTITY(1,1),
LastName varchar(10),
FirstName varchar(10),
Gender varchar,
DepartmentName varchar(20),
Age int
)
INSERT INTO Demovarchar VALUES('Gilbert', 'Kevin','M','Tool Design',33)
INSERT INTO Demovarchar VALUES('Tamburello', 'Andrea','F','Marketing',45)
INSERT INTO Demovarchar VALUES('Johnson', 'David','M','Engineering',66)
INSERT INTO Demovarchar VALUES('Sharma', 'Bradley','M','Production',27)
INSERT INTO Demovarchar VALUES('Rapier', 'Abigail','F', 'Human Resources',38)
INSERT INTO Demovarchar VALUES('Martin', 'Kelly','F','Information Services',54)
INSERT INTO Demovarchar VALUES('Poland', 'Carole','F','Production Control',29)
SELECT * FROM Demovarchar

Suppose there is a new addition of an employee in the organization and we, as SQL data developers,
have to insert this new record into the above table using an INSERT SQL statement. One such example
is shown below.
INSERT INTO Demovarchar VALUES('Newton Hamilton', 'Isaac','M','Design Head',69)

Oops, SQL Server encountered an error and terminated the statement, saying string or binary data would
be truncated. This occurred because the column LastName varchar(10) can hold up to 10 characters, and
here we are attempting to insert a new record whose string length ('Newton Hamilton') is clearly
greater than 10 characters. As a quick fix, we can alter the table and increase the size of the SQL
varchar column, say to varchar(50), to insert the new row. Execute the below script to ALTER and INSERT a
new record into the table. Additionally, you can use the LEN() and DATALENGTH() functions to determine the
number of characters and the storage size in bytes, respectively, that are stored in the varchar column.
ALTER TABLE Demovarchar
ALTER COLUMN LastName varchar(50)
INSERT INTO Demovarchar VALUES('Newton Hamilton', 'Isaac','M','Design Head',69)
SELECT * FROM Demovarchar
We observed above how we can set or alter the string length of a SQL varchar column to meet the
business needs. However, consider a scenario where we are unsure of the size of the data that is going to be
loaded into our SQL tables; in such circumstances, inspecting and altering the data type size for each and
every column is not a viable choice. One of the options to handle this is to set the string length
on the higher bar in the SQL Server varchar column (provided you have a rough estimate of what the
length of the string column would approximately be).
An important point to keep in mind: we can use a string length up to varchar(8000) only, as this is
the maximum number of characters that the SQL varchar(n) data type can hold. So in cases where there are
chances that the string length of the varchar column might exceed 8000 bytes, using varchar(8001) or
anything higher will result in an error. One short example demonstrating this fact is shown below.
DECLARE @name AS varchar(8001) = 'john parker d''souza';
SELECT @name Name

SQL Server 2005 got around this limitation of 8KB storage size and provided a workaround with
varchar(max). It is a non-Unicode large variable-length character data type and can store a maximum of
2^31-1 bytes (2 GB) of non-Unicode characters.
When I was first introduced to the concepts of varchar(n) and SQL varchar, the common question I had, like
any other beginner, was: why can't we simply declare a column of data type varchar(8500) or higher,
since we have varchar(max) that takes care of storage up to 2GB, and why are we supposed to either use
varchar(<=8000) or varchar(max)? A little research got me my answers: SQL Server uses pages to
store data, and the size of each page is 8KB (excluding the page header and row offset sizes). If the data to be
stored is less than or equal to 8000 bytes, varchar(n) or varchar(max) stores it in-row. However, if the data
exceeds 8000 bytes, it is treated as a Large Object (LOB); such values are not stored in-row but in
separate LOB pages (LOB_DATA). The row in such a case will only have a pointer to the LOB data page where
the actual data is present, and SQL Server automatically assigns an over-flow indicator to the page to
manipulate data rows. In a nutshell, if you know the data might exceed 8000 bytes, it is a better option to
use varchar(max) as the data type.
We can refer to the DMV sys.dm_db_index_physical_stats to see what kind of page allocation
(IN_ROW_DATA / LOB_DATA / ROW_OVERFLOW_DATA) is performed. You can also check out this
link in case you want a detailed explanation of how SQL Server exercises row and page limits with both
the varchar(n) and varchar(max) data types.
Let’s quickly jump over to SSMS and see how we can use varchar(max). Execute the following script to
insert 1 record where the StringCol column value is a string of 15,000 'B' characters (i.e. 15,000 bytes).
CREATE TABLE Demovarcharmax
    (
      ID INT IDENTITY(1, 1) ,
      StringCol VARCHAR(MAX)
    )
INSERT INTO Demovarcharmax(StringCol) VALUES(REPLICATE(CAST('B' AS VARCHAR(MAX)), 15000))
SELECT Id, StringCol, len(StringCol) AS LengthOfString FROM Demovarcharmax

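To confirm that this 15,000-byte value was pushed to LOB pages, we can query the DMV mentioned earlier. A minimal sketch (the column list is trimmed, and the 'DETAILED' mode is needed to populate the record counts):

SELECT alloc_unit_type_desc, page_count, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Demovarcharmax'), NULL, NULL, 'DETAILED');
-- Expect an IN_ROW_DATA allocation unit plus a LOB_DATA unit holding the 15,000-byte string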
One limitation of using varchar(max) is that we cannot create an index that has a varchar(max) as a key
column; instead, it is advisable to create a full-text index on that column.
A quick note to make – From here to the last leg of this article, we will mention varchar in place of
varchar(n). Do NOT consider it as the varchar with default value = 1.
To learn some more interesting differences between varchar(n) and varchar(max) in SQL Server, consider
going through this article,  Comparing VARCHAR(max) vs VARCHAR(n) data types in SQL Server.

UTF-8 support with SQL Server 2019 CTP
Before we dig into what the SQL Server 2019 preview has to offer for SQL varchar, let’s quickly look at
one more interesting data type – ‘nvarchar’. Like SQL Server varchar [(n|max)], we have SQL nvarchar
[(n|max)]; the prefix ‘n’ in nvarchar denotes Unicode, i.e. it stores both Unicode and non-Unicode data. The
key difference between varchar and nvarchar is the way they are stored: varchar is stored as regular 8-bit
data (1 byte per character) and nvarchar stores data at 2 bytes per character. Due to this reason, nvarchar
can hold up to 4000 characters and it takes double the space of SQL varchar. You can go through this
link to learn more about nvarchar in SQL Server.
With the public preview of SQL Server 2019, Microsoft has announced the support for UTF-8 character
encoding to the existing data types (varchar and char). For those, who are not aware of UTF-8, it stands
for Unicode Transformation Format and is a Unicode-based encoding that supports many languages. The
8 in UTF-8 means it uses 1 byte (8-bits) to represent a character in memory. Likewise, UTF-16 uses 16 bits
(2 bytes) to represent a character. We will limit the scope of this new SQL Server 2019 CTP enhancement
to ‘SQL varchar’ only in this article.
This enhancement has the following impact in SQL Server:
1. Improves Data compatibility
Until SQL Server 2019 CTP, SQL varchar data type had the capacity to store only Non-Unicode
data and with this preview, we can now create a varchar column to store Unicode data under
UTF-8 enabled collations (_UTF8). UTF-8 is allowed in the varchar datatypes and is enabled when
creating or changing an object’s collation to a collation with the UTF8 suffix. This helps in
minimizing character conversion issues.
2. Reduction in storage and performance improvements
UTF-8 support for the varchar data type provides substantial storage savings depending on the
character set in use. For example, using a UTF-8 enabled collation, changing the column data type
from nvarchar(20) to varchar(20) offers a significant drop in storage requirements, since
nvarchar(20) requires 40 bytes for storage and varchar(20) needs 20 bytes for the same Unicode
string.
Important side note – Since this enhancement is still in preview, we can expect more progressions on this
front in the near future. However, existing Unicode (UTF-16) data types (nchar, nvarchar and ntext)
remain unchanged in SQL Server 2019 preview.
Collation with SQL varchar in SQL Server 2019 CTP
Collation in SQL Server defines configurations that determine various rules like case sensitivity, accent
sensitivity, sorting, character types and width, etc. Understanding all these properties and how they
work with your data becomes very important. Collation can be set at the server, database, expression or
column level. UTF-8 supports database-level or column-level collation in SQL Server 2019 CTP and is
enabled when you create or change a database or column collation to a collation with the UTF8 suffix.
If you execute the below query against SQL Server 2019 CTP, you will be able to see all the UTF-8
supported collations on your instance of SQL Server using function (fn_helpcollations()).
SELECT Name, Description
FROM fn_helpcollations()
WHERE Name like '%UTF8';

With SQL Server 2019 preview version, we can assign Unicode collations (UTF-8 supported) as well for
SQL varchar columns using the COLLATE clause while declaring the varchar column. This way, specific
collation is applied to the particular column’s data without impacting the rest of the database.
Since we are dealing with the SQL Server varchar data type in this post, let's see how column collation with
the SQL varchar data type works. Execute the code below to alter the SQL Server varchar column collation
from one collation type to one with the _UTF8 suffix. You can read more on database collation from here.
CREATE TABLE demovarcharcollate
  (ID   int PRIMARY KEY,
   Description varchar(50) COLLATE LATIN1_GENERAL_100_CI_AS_SC NOT NULL
  );
ALTER TABLE demovarcharcollate
ALTER COLUMN Description varchar(50) COLLATE LATIN1_GENERAL_100_CI_AS_SC_UTF8 NOT NULL;

Role of SQL varchar in data conversions and data display
SQL Server varchar is widely used for displaying data in the desired formats using the CONVERT and CAST
functions in SQL Server. Real data deals with a mix of data types, and values have to be compatible with
each other (i.e. belong to the same data type) before we can make comparisons between them. SQL Server
supports both implicit and explicit conversions.
 Note:  Check out  SQL CAST and SQL CONVERT function overview  to get more information on how
we can perform these conversions for data compatibility.
With the constant need to format and display data in the required output, SQL varchar comes in
really handy. As a SQL developer myself, I find it extremely straightforward to use CONVERT/CAST with the
varchar data type to make assignments or transformations on data, especially for date fields.
I am using table FactInternetSales from Sample DB AdventureWorksDW2017 to show how this feature
works. You can refer to any table with some datetime and money/float fields for the practice purpose.
The following script converts two datetime columns to SQL varchar types with style 102 and 107 to
display the data in the format yyyy.mm.dd and Mon dd, yyyy respectively. Also, the SalesAmount column
with Money as a data type is converted to varchar and style 3 is applied to display the amount with
commas as shown in the screenshot below. Additionally, say, we would want to see data for the orders
placed in the year 2010 only, using the CAST function to convert datetime column to varchar data, the
string comparison is performed in the WHERE clause. You can also go over  SQL convert date  to find more
information on date conversion formats and styles.
SELECT OrderDate, CONVERT(varchar, OrderDate, 102) AS FormattedOrderDate,
ShipDate, CONVERT(varchar(12), ShipDate, 107) AS FormattedShipDate,
SalesAmount, convert(varchar, SalesAmount, 3) AS FormattedAmount
FROM FactInternetSales
WHERE CAST(OrderDate AS varchar) LIKE '%2010%'

Impact on string length of SQL varchar with CAST and CONVERT functions
SQL Server stores long string data in the commonly used varchar data type and it becomes helpful to
know the expected and maximum lengths of the strings to display the results in the UI. Copy and execute
the below code, where we are passing a long string in an unspecified length varchar variable
(@demovarchar) and also in another variable with a defined varchar length (@demovarcharwithcast).
SQL Server takes 30 as the default length for SQL varchar (with unspecified varchar length) when it
is used with the CAST and CONVERT functions. In our case, even though the length of the string was 52,
it returned 30 as the length, as shown in the last result output.
One important point to note here is that when an unspecified length varchar field is created, the default
length of such field is 1 (shown in red color below). When varchar length is unspecified and is used with
CAST or CONVERT functions, the CAST or CONVERT returns n=30 as the default string length of this
conversion (marked in blue color below).
DECLARE @demovarchar varchar = 'We are learning SQL varchar in this SQLShack article'
DECLARE @demovarcharwithcast AS varchar(60) = 'We are learning SQL varchar in this SQLShack article'
SELECT DATALENGTH('We are learning SQL varchar in this SQLShack article') AS 'LenOFStringPassed'

SELECT DATALENGTH(@demovarchar) AS 'DefaultVarcharLength'

SELECT DATALENGTH(CAST(@demovarcharwithcast AS varchar(60))) AS 'VarcharLengthSpecifiedWithCast'
SELECT DATALENGTH(CAST(@demovarcharwithcast AS varchar)) AS 'DefaultVarcharLengthWithCast'
Storage and performance considerations using SQL varchar
Data types like varchar, char and nvarchar are all used to store string data in SQL Server. SQL varchar
stores variable string length whereas SQL char stores fixed string length. This means SQL Server varchar
holds only the characters we assign to it and char holds the maximum column space regardless of the
string it holds.
Because of fixed field lengths, data is pulled straight from the column without doing any data
manipulation, and index lookups against varchar columns are slower than those against char fields. CHAR is
better than VARCHAR performance-wise; however, it takes unnecessary storage space when the data does
not have a fixed length. So in cases where disk size is not an issue, it is recommended to use CHAR.
In simple words, say we have a column with varchar(150) = ‘SQLShack’ – this will take 8 bytes ('SQLShack') +
2 bytes for the length information = 10 bytes in total, while a column with char(150) = ‘SQLShack’ will
consume the whole 150 bytes on disk, regardless of what we pass as a string. The below example shows
how CHAR uses the maximum allotted space (150) to fit in the string passed and how the varchar column
uses only the needed space.
DECLARE @demochar CHAR(150) = 'This is the char value'
DECLARE @demovarchar VARCHAR(150) = 'This is the varchar value'

SELECT 'Starting ' + @demochar + ' finishing' AS 'CHAR DATA'
SELECT 'Starting ' + @demovarchar + ' finishing' AS 'VARCHAR DATA'

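You can make the storage difference visible with DATALENGTH; a small sketch reusing the same variables (the byte counts assume a single-byte collation):

DECLARE @demochar CHAR(150) = 'This is the char value'
DECLARE @demovarchar VARCHAR(150) = 'This is the varchar value'

-- CHAR is padded to its full declared length; VARCHAR stores only the characters assigned
SELECT DATALENGTH(@demochar) AS CharBytes,       -- returns 150
       DATALENGTH(@demovarchar) AS VarcharBytes  -- returns 25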
Bottom line is to use the data type that fits our need. You can use SQL varchar when the sizes of the
column vary considerably, use varchar(max) when there are chances that string length might exceed 8000
bytes, use char when the sizes of the column are fixed and use nvarchar if there is a requirement to store
Unicode or multilingual data.

Conclusion
Data types play a fundamental role in database design, but they are often overlooked. A good
understanding and accurate use of data types ensures that the correct kind and length of data is populated in
the tables. The intention of this tip is to help you gain an understanding of the basic characteristics and
features of SQL Server varchar along with its performance and storage aspects in SQL Server. We also
covered the recent advancements of SQL varchar in the SQL Server 2019 preview.

How to identify slow running queries in SQL Server
May 31, 2017 by Musab Umair

Overview
Slow running queries are one of the most common problems in every organization dealing with huge
amounts of data. And the most challenging problem, with almost all the clients I work with, is finding
the queries that run slow and figuring out the actual cause behind the performance problem.
Thankfully, the solution, in most cases, is simple.
I always suggest spending most of the time on figuring out the actual cause behind the problem, not on
thinking about the potential solutions which might exist.
Fortunately, there are some tools and techniques which a Developer or DBA should always use (at least)
to have a fair idea about the queries running slow.
Before going into the details, I would like to mention here that the tools and techniques I will mention
here will be for SQL Developers who do not have expert knowledge of database administration and for
Database Administrators who are at the start of their career.
Note: I will be using SQL Server 2016 for my test cases in this article. If you have any prior version, then the
Query Store is not available for you but all the other tools will still work.

Tools & techniques
Like every kind of work in the world, there are special tools required. The process of identifying slow
running queries is no different. In this article, I will only mention tools that already exist on your system
or you can download them free. I would not say that commercial tools do not help, in fact, in large
organizations, I would strongly recommend those to save time performing deep dives into each server
for highlighting slow running queries. But for the purposes of this article, we want to roll up our sleeves
and learn the fundamentals using the existing tools we already have at our disposal.
The first tool I will mention here is a built-in tool of SQL Server Management Studio: “Activity
Monitor”. You can view it by right-clicking the instance name in SQL Server Management Studio and
selecting “Activity Monitor”.
Activity monitor tells you what the current and recent activities are in your SQL Server Instance.
The above screenshot displays an overview window for the Activity Monitor. This screen will show you
the graphs for Processor Time, Waiting Tasks, and Batch Requests. Generally, the lower the counts,
the better the performance. In large organizations with a huge load, there might be a huge number of
batch requests with high processor times, but that does not necessarily indicate a performance problem.
After the overview, you need to focus on the Processes pane, which gives you access to view all the processes
running on your instance and take a deeper look at how many processes are waiting, blocking or
blocked. This way you can get an idea of whether you have queries running slow because of any specific wait,
or whether the queries taking time are being blocked by other processes. In this view, you can right-click any
process and click Details to view the actual T-SQL running for that session.
The queries which are being blocked are those which are actually suspended because some other
process is working on the resources the process depends upon. So, if you find queries which are being
blocked by other processes, then simply check for the root blocker which is causing all the blocking by
looking at the Blocked By column. Try to consider just that query, not all the processes which are blocked.
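If you prefer a query over the Activity Monitor grid, the same blocking information can be pulled from a DMV; a minimal sketch (not part of the original article):

-- Sessions currently blocked, with the session that is blocking them
SELECT session_id, blocking_session_id, wait_type, wait_time, status
FROM sys.dm_exec_requests
WHERE blocking_session_id <> 0;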
And for the queries which are waiting on a specific resource, the view gives you information about the Wait
Resource, so you can check the Wait Type and try to figure out the solution for that problem. Some of
the most common wait stats are listed in SQL Shack's comprehensive SQL Server wait types section. Go
through that in detail for further actions.
The Active Expensive and Recent Expensive Queries panes will give you information about the queries
which have high CPU, logical reads or high elapsed time.
You can go to each section for Current or Recent expensive queries. Sort them by Elapsed time, Logical
Read and CPU Time one by one and check the execution plan. In the execution plan you will be able to
find out why these expensive queries were taking an inordinate amount of time so that you can take
appropriate actions to resolve them. I will let you know how to go through the SQL Server Query
Execution Plan later in this article so stay tuned.

The next tool is the “Query Store”. This is helpful and could save your life in situations where you are
called in the middle of the night to check why SQL Server was slow an hour earlier.
Generally, prior to SQL Server 2016, without any third-party application or custom solution, you were not
able to look at the history of query execution. So, the Query Store provides a great deal of value-added
functionality in this regard. Ed Pollack wrote about Query Store here, so do check this article as it's a great
resource to deep dive into the Query Store.
If you have SQL Server 2016 or higher, you first need to enable the Query Store in your database properties.
After enabling the Query Store, the properties of your database will look as shown in the screenshot
below:
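If you prefer T-SQL to the database Properties dialog, the Query Store can also be enabled with an ALTER DATABASE statement; a minimal sketch (the database name here is just an example):

ALTER DATABASE [WideWorldImporters] SET QUERY_STORE = ON;
ALTER DATABASE [WideWorldImporters] SET QUERY_STORE (OPERATION_MODE = READ_WRITE);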
After you have enabled the Query Store you can expand the database objects and go to the “Top
Resource Consuming Queries” as shown in the below screenshot:
Note: Give the Query Store a day or two to capture the production load so that you can easily work on it
with real load.

Right-click on Top Resource Consuming Queries and select “View Top Resource Consuming Queries”; you
will be taken to the window showing these high resource consuming queries. You can customize the view
by selecting an appropriate “metric” like Duration, CPU Time, Logical Reads or Memory Consumption. The
second thing you need to change is the “Statistic”. You can change it to Min, Max or Avg. I would
recommend using the Average statistic with all the metrics mentioned above.
The next step is to highlight the queries which are consuming high resources. After highlighting the
graph value in the left-hand side window (as highlighted in the screenshot below) you will get the query
execution plan in the bottom window.
You can click on the mentioned below highlighted button in the Query Store window to get the actual
Query Text for further analysis.
So, as of now, you have multiple ways to get High Resource usage queries. Now we will see how we can
check why the queries are running slow and which part of the query needs to be fixed (if required).
So, here I will take the example of a query used in Microsoft sample database “WideWorldImporters”.
The TSQL executes a stored procedure “[Integration].[GetOrderUpdates]”.
The call of this stored procedure takes around a second, and I will not be optimizing it. This is just to give
you an example of how we may know how this second was spent. We also want to know which part of the
query takes the most time, as well as which table we must focus on.
Below is the stored procedure call and results.
So now we have the call and we will dig deeper into this.
First, we need to enable query statistics for this session. We will enable CPU and IO statistics for this
query session by issuing the T-SQL “SET STATISTICS TIME, IO ON”.
After executing the above T-SQL to enable statistics, we will get the IO for each table and the total
CPU cost for the queries running inside the stored procedure in the Messages tab, as shown in the below
screenshot.
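As a rough sketch of that workflow (the parameter names and date values for [Integration].[GetOrderUpdates] are assumptions based on the WideWorldImporters sample, not taken from the article):

SET STATISTICS TIME, IO ON;

-- assumed signature: @LastCutoff and @NewCutoff datetime2 parameters
EXEC [Integration].[GetOrderUpdates]
     @LastCutoff = '2013-01-01',
     @NewCutoff  = '2016-05-31';

SET STATISTICS TIME, IO OFF;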
In the above screenshot, we can see that the most IO is taken by the OrderLines table, and there is only one
query executing inside the stored procedure, which takes 672 ms of CPU time (1650 ms elapsed
time).
Note: There might be multiple queries running inside a stored procedure, so keep in mind that statistics will
give you the time for each query as well as the total for all the queries at the end. So, in the case of stored
procedures, for the total CPU time consider only the last CPU time; for each individual query, consider its
own CPU time and exclude the last one, as it is only a total for all.
As of now, we know that the OrderLines table is taking most of the Logical Reads.
Next, we will enable the Actual Execution Plan for the query by clicking the corresponding icon (Ctrl+M)
in SQL Server Management Studio, and we will try to answer the question of why this table was taking
this IO and which component of the execution plan is taking most of the time.
After including the Actual Execution Plan, we will re-execute the query and view the execution plan.
Though we can obtain detailed information about the query execution plan inside SQL Server
Management Studio, there is another great tool on the web which can be used to explore the query
execution plan in a much more intuitive way: ApexSQL Plan.
After installing this tool, you might need to restart SQL Server Management Studio, so install it and
then re-execute the query to get an execution plan. A screenshot tour of this tool is provided here. After
executing the query, right-click on the execution plan and you will have the option “View with
ApexSQL Plan”.

After viewing the Execution Plan in ApexSQL Plan, you can see the highlighted items in mentioned below
screenshot which will be opened in the ApexSQL Plan.

A couple of points and tips to look at here
 If the table is using a Key Lookup, then try to remove that lookup by adding the looked-up columns to the
index which is used by the table.
 If the number of rows returned by the query is way off compared to the number of rows returned by the
table operators (as highlighted in the lower section of the query plan), then try to re-write the
query to filter the data using more columns and reduce the number of rows.
 If the estimated and actual rows differ hugely, then try to update the statistics of the underlying
tables.
 If there are missing indexes indicated by the query, then try to evaluate the index, and if that
index helps your query, then add it to the corresponding table.
Finally, if your queries perform well in a single execution and only have problems while running under
production load, you can easily fake the production load for a single query by using Adam Machanic's
SQL stress tool, which is available to download from the link here. Then you can capture that slow
running query using the techniques mentioned above and tune it accordingly.

SQL replace: How to replace ASCII special characters in SQL Server
August 7, 2017 by Sifiso Ndlovu

One of the important steps in an ETL process involves the transformation of source data. This could
involve looking up foreign keys, converting values from one data type into another, or simply conducting
data clean-ups by removing trailing and leading spaces. One aspect of transforming source data that
could get complicated relates to the removal of ASCII special characters such as new line characters and
the horizontal tab. In this article, we take a look at some of the issues you are likely to encounter when
cleaning up source data that contains ASCII special characters and we also look at the user-defined
function that could be applied to successfully remove such characters.

Replacing ASCII Printable Characters
The American Standard Code for Information Interchange (ASCII) is one of the generally accepted
standardized numeric codes for representing character data in a computer. For instance, the ASCII
numeric code associated with the backslash (\) character is 92. Many software vendors abide by
ASCII and thus represent character codes according to the ASCII standard. Likewise, SQL Server, which
uses ANSI – an improved version of ASCII – ships with a built-in CHAR function that can be used to
convert an ASCII numerical code back to its original character code (or symbol). Script 1 shows us an
example of how the ASCII numeric code 92 can be converted back into a backslash character, as shown
in Figure 1.
SELECT CHAR(92);
Script 1

Figure 1
The backslash character falls into a category of ASCII characters that is known as ASCII Printable
Characters – which basically refers to characters visible to the human eye. Table 1 shows a top 5 sample
of ASCII Printable Characters.
Numeric Code   Character   Description
33             !           Exclamation Mark
35             #           Number
36             $           Dollar
37             %           Percent
38             &           Ampersand
Table 1: ASCII Printable Characters (Source: RapidTables.com)
When it comes to addressing data quality issues in SQL Server, it’s easy to clean most of the ASCII
Printable Characters by simply applying the REPLACE function. Say for instance that source data contains
an email address for John Doe that has several invalid special characters as shown in Script 2.
DECLARE @email VARCHAR(55) = 'johndoe@a!b#c.com$';
Script 2
We could eliminate such characters by applying the REPLACE T-SQL function as shown in Script 3.
SELECT REPLACE(REPLACE(REPLACE(@email, '!', ''), '#', ''), '$', '');
Script 3
Execution of Script 3 results into a correctly formatted email address that is shown in Figure 2.

Figure 2
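As a side note, each extra character to strip adds another level of REPLACE nesting. On SQL Server 2017 and later, you could flatten this with TRANSLATE, which maps each listed character one-for-one to a sentinel; a hedged sketch (not part of the original article):

DECLARE @email VARCHAR(55) = 'johndoe@a!b#c.com$';
-- TRANSLATE maps '!', '#' and '$' each to '~'; a single REPLACE then removes the sentinel
SELECT REPLACE(TRANSLATE(@email, '!#$', '~~~'), '~', '');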

Replacing ASCII Control Characters
In addition to ASCII Printable Characters, the ASCII standard further defines a list of special characters
collectively known as ASCII Control Characters. Such characters typically are not easy to detect (to the
human eye) and thus not easily replaceable using the REPLACE T-SQL function. Table 2 shows a sample
list of the ASCII Control Characters.
Numeric Code   Character   Description
0              NUL         null
1              SOH         start of header
2              STX         start of text
3              ETX         end of text
4              EOT         end of transmission
Table 2: Top 5 ASCII control characters (Source: RapidTables.com)
To demonstrate the challenge of cleaning up ASCII Control Characters, I have written a C# Console
application shown in Script 4 that generates an output.txt text file that contains different variations of
John Doe’s email address (only the first line has John Doe’s email address in the correct format).
using (StreamWriter writer = new StreamWriter(@"C:\temp\output.txt"))
{
    string vd = "johndoe@abc.com";
    writer.WriteLine(vd);
    writer.WriteLine((char)1 + vd);
    writer.WriteLine((char)9 + vd + (char)1);
    writer.WriteLine((char)9 + vd);
}
Script 4
A preview of the output.txt text file populated by Script 4 is shown using the Windows Notepad.exe
program in Figure 3.

Figure 3
As it can be seen, there seem to be spaces in email address 2-4 but it’s difficult to tell whether these
spaces are created by the Tab character or the Space bar character. Furthermore, if you go back to Script
4, you will recall that for the 3rd email address, I included the start of header character at the end of the
email address, but looking at the data in Figure 3, the start of header character is not easily visible at the
end of that 3rd email address. In fact, it looks like the email address 3 and 4 have the same amount of
characters – which is not true. Only using advanced text editors such as Notepad++ are we then able to
visualize the special characters in the data, as shown in Figure 4.
Figure 4
When it comes to SQL Server, the cleaning and removal of ASCII Control Characters are a bit tricky. For
instance, say we have successfully imported data from the output.txt text file into a SQL Server database
table. If we were to run the REPLACE T-SQL function against the data as we did in Script 3, we can already
see in Figure 5 that the REPLACE function was unsuccessful, as the length of data in the original column is
exactly the same as the length calculated after having applied both the REPLACE and TRIM functions.
SELECT [id],
       [Column 0],
       LEN([Column 0]) OriginalLength,
       LEN(REPLACE(REPLACE(LTRIM(LTRIM([Column 0])), ' ', ''), '  ', '')) NewLength
FROM [SQLShack].[dbo].[OLE DB Destination];
Script 5

Figure 5
So how do we replace what we cannot see?
1. Replace String using Character Codes
The simplest way to replace what we cannot see is, instead of hardcoding the string to
replace into our REPLACE function, to supply the string to be replaced by hardcoding
its ASCII numerical code within the CHAR function. Thus, instead of providing an exclamation
mark as the string to replace, we can hardcode the ASCII numerical code for the exclamation mark –
which is 33 – and convert that numeric code back to a character using the CHAR function. Thus
our script changes from:
DECLARE @email VARCHAR(55)= 'johndoe@a!bc.com';
SELECT REPLACE(@email, '!', '');
To using:
DECLARE @email VARCHAR(55)= 'johndoe@a!bc.com';
SELECT REPLACE(@email, CHAR(33), '');
Script 6
Now going back to cleaning email address data out of the output.txt text file, we can rewrite our
script to what is shown in Script 7.
SELECT [id],
       [Column 0],
       LEN([Column 0]) OriginalLength,
       LEN(REPLACE(REPLACE([Column 0], CHAR(1), ''), CHAR(9), '')) NewLength
FROM [SQLShack].[dbo].[OLE DB Destination];
Script 7
After executing Script 7, we can see in Figure 6 that the length of all email address rows matches
the length of row 1 – which was originally the correct email address. Thus, we have
successfully managed to remove “invisible” special characters.

Figure 6
2. Dynamically Detect and Replace ASCII Characters
One noticeable limitation of Script 7 is that we have hard-coded the list of ASCII numerical values.
This means if the email address data contained special characters with ASCII numerical value 8
then we wouldn’t have removed them as we had hardcoded our script to specifically look
for CHAR(1) and CHAR(9). Therefore, there is a need for a mechanism that allows us to
automatically detect ASCII Control Characters contained in a given string and then automatically
replace them. Script 8 provides such a mechanism in the form of a WHILE loop within a user-defined
function that iteratively searches through a given string to identify and replace ASCII Control
Characters.
CREATE FUNCTION [dbo].[ReplaceASCII](@inputString VARCHAR(8000))
RETURNS VARCHAR(55)
AS
    BEGIN
        DECLARE @badStrings VARCHAR(100);
        DECLARE @increment INT= 1;
        WHILE @increment <= DATALENGTH(@inputString)
            BEGIN
                IF(ASCII(SUBSTRING(@inputString, @increment, 1)) < 33)
                    BEGIN
                        SET @badStrings = CHAR(ASCII(SUBSTRING(@inputString, @increment, 1)));
                        SET @inputString = REPLACE(@inputString, @badStrings, '');
                    END;
                SET @increment = @increment + 1;
            END;
        RETURN @inputString;
    END;
GO
Script 8
The application of the function is shown in Script 9.
SELECT [id],
       [Column 0],
       LEN([Column 0]) OriginalLength,
       LEN([SQLShack].[dbo].[ReplaceASCII]([Column 0])) NewLength
FROM [SQLShack].[dbo].[OLE DB Destination];
Script 9

Conclusion
Every now and then T-SQL developers are faced with cleaning the data they have imported by usually
applying the REPLACE T-SQL function. However, when it comes to removing special characters, removal
of ASCII Control Characters can be tricky and frustrating. Fortunately, SQL Server ships with additional
built-in functions such as CHAR and ASCII that can assist in automatically detecting and replacing ASCII
Control Characters.

Manage Unicode Characters in Data Using T-SQL
November 7, 2019 by Jignesh Raiyani

In this article, I’ll provide some useful information to help you understand how to use Unicode in SQL
Server and address various compilation problems that arise from the Unicode characters’ text with the
help of T-SQL.

What is Unicode?
The American Standard Code for Information Interchange (ASCII) was the first extensive character
encoding format. Originally developed in the US, and intended for English, ASCII could only
accommodate encoding for 128 characters. Character encoding simply means assigning a unique
number to every character being used. As an example, the letters ‘A’, ‘a’, ‘1’ and the symbol ‘+’
become numbers, as shown in the table:
ASCII(‘A’)   ASCII(‘a’)   ASCII(‘1’)   ASCII(‘+’)
65           97           49           43
The T-SQL statement below can help us find the character from the ASCII value and vice-versa:
SELECT CHAR(193) as Character
Here is the result set of ASCII value to char:
SELECT ASCII('Á') as ASCII_
Here is the result set of char to ASCII value:

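For characters beyond the ASCII range, the Unicode counterparts of these functions are UNICODE() and NCHAR(); a small sketch (not from the original article):

-- UNICODE returns the code point of the first character; NCHAR converts a code point back
SELECT UNICODE(N'ã') AS CodePoint, NCHAR(227) AS Character;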
While ASCII encoding was acceptable for most common English language characters, numbers and
punctuation, it was constraining for the rest of the world’s dialects. As a result, other languages required
different encoding schemes and character definitions changed according to the language. Having
encoding schemes of different lengths required programs to figure out which one to apply depending on
the language being used.
Here is where international standards become critical. When the entire world practices the same
character encoding scheme, every computer can display the same characters. This is where the Unicode
Standard comes in.
Encoding is always related to a charset, so the encoding process encodes characters to bytes and
decodes bytes to characters. There are several Unicode formats: UTF-8, UTF-16 and UTF-32.
 UTF-8 uses 1 byte to encode an English character. It uses between 1 and 4 bytes per character
and it has no concept of byte-order. All European languages are encoded in two bytes or less per
character
 UTF-16 uses 2 bytes to encode an English character and it is widely used with either 2 or 4 bytes
per character
 UTF-32 uses 4 bytes to encode an English character. It is best for random access by character
offset into a byte-array
Special characters are often problematic. When working with different source frameworks, it would be
preferable if every framework agreed on which characters are acceptable. Quite often, developers
take missteps while identifying or troubleshooting an issue when the issue is actually related to odd
characters in the data that caused the error.

Unicode data types in SQL Server
Microsoft SQL Server supports the below Unicode data types:
 nchar
 nvarchar
 ntext
Unicode literals are expressed with the prefix “N”, originating from the SQL-92 standard. The usage of the
nchar, nvarchar and ntext data types is equivalent to that of char, varchar and text. Unicode supports a
broad scope of characters, and more space is needed to store Unicode characters. The maximum
size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar. For
example:
N’Mãrk sÿmónds’
All Unicode data uses the same Unicode code page. Collations do not control the code page used for
Unicode columns; they control only attributes such as comparison rules and case sensitivity.
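As a quick, hedged illustration of why the N prefix matters (the result assumes a default Latin1
collation, whose code page contains no Cyrillic characters):
-- Without the N prefix, the literal is first interpreted as varchar,
-- so characters outside the code page are replaced with '?'
DECLARE @no_prefix NVARCHAR(20) = 'Здраво';
DECLARE @with_prefix NVARCHAR(20) = N'Здраво';
SELECT @no_prefix AS without_N, @with_prefix AS with_N;
-- without_N: ??????    with_N: Здраво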
This T-SQL statement prints the characters for the ASCII values in the 193-200 range:
SELECT CHAR(193), CHAR(194), CHAR(195), CHAR(196), CHAR(197), CHAR(198), CHAR(199), CHAR(200)
CHAR(193)  CHAR(194)  CHAR(195)  CHAR(196)  CHAR(197)  CHAR(198)  CHAR(199)  CHAR(200)
Á          Â          Ã          Ä          Å          Æ          Ç          È
Get a list of special characters in SQL Server
Some Unicode characters can be represented in a single-byte coding scheme, while other character sets
require multi-byte encoding. The function below returns each special character in a string, along with its
position, with the help of T-SQL statements:
Function:
CREATE FUNCTION [dbo].[Find_Unicode]
(
    @in_string nvarchar(max)
)
RETURNS @unicode_char TABLE(id INT IDENTITY(1,1), Char_ NVARCHAR(4), position BIGINT)
AS
BEGIN
    DECLARE @character nvarchar(1)
    DECLARE @index int

    SET @index = 1
    WHILE @index <= LEN(@in_string)
    BEGIN
        SET @character = SUBSTRING(@in_string, @index, 1)
        IF((UNICODE(@character) NOT BETWEEN 32 AND 127) AND UNICODE(@character) NOT IN (10,11))
        BEGIN
            INSERT INTO @unicode_char(Char_, position)
            VALUES(@character, @index)
        END
        SET @index = @index + 1
    END
    RETURN
END
GO
Execution:
SELECT *
FROM [Find_Unicode](N'Mãrk sÿmónds')
Here is the result set: the special characters ã, ÿ and ó are returned, at positions 2, 7 and 9.
Remove special characters from string in SQL Server
In the code below, we are defining logic to remove special characters from a string. We know that the
basic ASCII values are 32 – 127. This includes capital letters in order from 65 to 90 and lower case letters
in order from 97 to 122. Each character corresponds to its ASCII value using T-SQL. The
“RemoveNonASCII” function excludes all the special characters from the string, replacing each of them
with an empty string:
CREATE FUNCTION [dbo].[RemoveNonASCII]
(
    @in_string nvarchar(max)
)
RETURNS nvarchar(MAX)
AS
BEGIN
    DECLARE @Result nvarchar(MAX)
    SET @Result = ''

    DECLARE @character nvarchar(1)
    DECLARE @index int

    SET @index = 1
    WHILE @index <= LEN(@in_string)
    BEGIN
        SET @character = SUBSTRING(@in_string, @index, 1)

        IF (UNICODE(@character) between 32 and 127) or UNICODE(@character) in (10,11)
            SET @Result = @Result + @character
        SET @index = @index + 1
    END

    RETURN @Result
END
Execution:
SELECT dbo.[RemoveNonASCII](N'Mãrk sÿmónds')
The result is the input string with the special characters stripped out: ‘Mrk smnds’.
These SQL functions can be very useful if you’re working with large international character sets.
How to create and configure a linked server in SQL Server Management Studio
June 9, 2017 by Marko Zivkovic
Linked servers allow submitting a T-SQL statement on a SQL Server instance, which returns data from
other SQL Server instances. A linked server allows joining data from several SQL Server instances using a
single T-SQL statement when data exists on multiple databases on different SQL instances. By using a
linked server to retrieve data from several SQL instances, the only thing that should be done is to
connect to one SQL instance.
There are two ways of configuring a linked server: by using the sp_addlinkedserver system stored
procedure or by using the SQL Server Management Studio (SSMS) GUI.
This article explains how to configure a linked server using a SQL Server data source. More
information about other data sources can be found on this link.
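For reference, the same configuration that the GUI performs can be scripted with system stored
procedures. The sketch below is hedged in that the server name matches this article’s examples and the
remote login and password are placeholders:
-- Create a linked server pointing at a remote SQL Server instance
EXEC master.dbo.sp_addlinkedserver
    @server = N'WSERVER2012\SQLEXPRESS',
    @srvproduct = N'SQL Server';

-- Map all local logins (@locallogin = NULL) to one remote SQL login
EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname = N'WSERVER2012\SQLEXPRESS',
    @useself = N'False',
    @locallogin = NULL,
    @rmtuser = N'Nenad',
    @rmtpassword = N'<password>';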
To see all created linked servers in SSMS, under Object Explorer, choose the Server Objects folder and
expand the Linked Servers folder:
To create a linked server in SSMS, right click on the Linked Servers folder and from the context menu
select the New Linked Server option:
The New Linked Server dialog appears:
In this dialog, the name of a linked server and the server type must be identified. Linked servers can be
defined for different kinds of data sources if the Other data source radio button is chosen. For the
purpose of this article, the SQL Server radio button under the Server type section will be chosen and in
the Linked server text box, a name of the server will be entered:
If the SQL Server type is chosen to configure a SQL Server linked server, the name specified in the Linked
server text box must be the name of the remote SQL Server.
If a SQL Server instance is a default instance, type the name of the computer that hosts the instance of SQL
Server (e.g. WSERVER2012). If the SQL Server is a named instance, type the name of the computer and
the name of the instance separated by a backslash (e.g. WSERVER2012\SQLEXPRESS).
Otherwise the following error may occur when the OK button is pressed:
To set how a user will authenticate to the WSERVER2012\SQLEXPRESS instance, under the Select a
page section in the upper left corner of the New Linked Server dialog, select the Security item:
Here, different ways to authenticate the linked server can be set.
Under the Local server login to remote server login mappings section, two ways of mapping a local
login to a remote login can be set. One way is to associate a local login with a remote login, and the
other way is to impersonate it.
Local Login
The Local Login field lists all the local logins. The local login can be a SQL Server
Authentication login:
Or a Windows Authentication login:
Now, when clicking the OK button on the New Linked Server dialog, the following error message will
appear:
The login mappings should either be impersonate or have a remote login name.
This happens because the Impersonate check box isn’t checked.
Impersonate
When the Impersonate check box is checked, the local login credentials are passed to the linked server.
For SQL Server Authentication, the same login with the exact credentials must exist on the linked server,
otherwise when connected to the server with SQL Server Authentication, the list of the databases
under the Catalogs folder may look like this:
For Windows logins, the login must be a valid login on the linked server. In order to use impersonation,
the delegation between the local server and the linked server must be set.
Let’s create a linked server using the local Windows login. From the Local Login combo box, choose the
local Windows login, check the Impersonate checkbox and press the OK button:
Under the Catalogs folder, all databases that are located on the linked server will be listed:
Remote User
The remote user option allows users from the local SQL server to connect to the linked SQL server even
though their credentials aren’t present on the remote server by using the credentials from the user that
exists on the remote server. Basically, it allows local logins to connect to a remote server as a different
login that must exist on a remote server.
Remote Password
Specify the password of the remote user.
From the Local Login drop down list, choose a local login which should map to a remote login. In
the Remote User field, enter the name of the remote user that exists on the remote server and in
the Remote Password field, enter the password of that remote user. Then, press the OK button:
Now, when connected to the local server using SQL Server Authentication, with Miki or Zivko credentials,
under the Catalogs folder, all databases that are available on a remote server for the Nenad remote login
will be listed:
Additionally, on the Linked Server dialog, it can be defined how logins that are not set in the Local
server login to remote server login mappings list will connect to the linked server. Four options can be
used for this, located under the For a login not defined in the list above, connections will section:
Not be made
If this radio button is chosen, any users that aren’t identified in the Local server login to remote server
login mappings list cannot establish a connection to the linked server.
For example, when logged in with a different account (e.g. Ben) that is not set in the login mapping list,
the list of the databases under the Catalogs folder will look like this:
If you attempt to execute a linked server query:
SELECT * FROM [WSERVER2012\SQLEXPRESS].AdventureWorks2014.HumanResources.Employee e
The following result will appear:
Msg 7416, Level 16, State 1, Line 1
Access to the remote server is denied because no login-mapping exists.
Be made without using a security context
The Be made without using a security context option is used for connecting to data sources that do
not require any authentication, for example a text file. When this option is selected for connecting to a
linked server, it will have the same effect as selecting the “Not be made” option.
If you attempt to execute a linked server query:
SELECT * FROM [WSERVER2012\SQLEXPRESS].AdventureWorks2014.HumanResources.Employee e
The following message may appear:
OLE DB provider “SQLNCLI11” for linked server “WSERVER2012\SQLEXPRESS”
returned message “Invalid authorization specification”.
Msg 7399, Level 16, State 1, Line 1
The OLE DB provider “SQLNCLI11” for linked server “WSERVER2012\SQLEXPRESS”
reported an error. Authentication failed.
Msg 7303, Level 16, State 1, Line 1
Cannot initialize the data source object of OLE DB provider “SQLNCLI11” for
linked server “WSERVER2012\SQLEXPRESS”.
Be made using the login’s current security context
If this option is chosen, the current security context of the local login will be passed to the remote login.
If Windows Authentication is used, the Windows credentials will be used to connect to the remote SQL
Server. If SQL Server Authentication is used, then the local login credentials will be passed to the remote
SQL Server. Note that, to establish the connection to the remote server successfully, a user with the exact
same credentials must exist on the remote server, otherwise when executing a linked server query:
SELECT * FROM [WSERVER2012\SQLEXPRESS].AdventureWorks2014.HumanResources.Employee e
The following message will appear:
Msg 18456, Level 14, State 1, Line 1
Login failed for user ‘Ben’.
Be made using this security context
The fourth option under the For a login not defined in the list above, connections will section is Be
made using this security context. In the Remote login and With password fields, enter the credentials
of a SQL Server Authentication login that exists on the remote server, otherwise the following error may
occur:
The last item under the Select a page menu is the Server Options item. When selecting this option, the
following window will be shown:
Here, additional options for the linked server can be seen or set.
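These options can also be changed without the GUI, through the sp_serveroption system stored
procedure; a small sketch, reusing the server name from the earlier examples:
-- Enable RPC Out so stored procedures can be executed through this linked server
EXEC sp_serveroption
    @server = N'WSERVER2012\SQLEXPRESS',
    @optname = 'rpc out',
    @optvalue = 'true';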
Collation Compatible
The first option is the Collation Compatible option. This option is used to indicate whether the linked
server has the same collation as the local server. It should be set to True only if it is known that the linked
server has the same collation as the local one; otherwise it should be set to False (default).
Data Access
This option is used to allow or deny access to the linked server data. If this option is set to False, access
to the remote server will be denied. This is useful for temporarily disabling access to a remote server. The
following message will appear when executing a linked server query while this option is set to False:
Msg 7411, Level 16, State 1, Line 1
Server ‘WSERVER2012\SQLEXPRESS’ is not configured for DATA ACCESS.
By default, the option is set to True.
RPC and RPC Out
RPC (Remote Procedure Call) is used to enable remote procedures to be called from the linked
server or to be called on the linked server.
If these options are set to False, the following error will appear when a procedure on the linked
server is called:
Msg 7411, Level 16, State 1, Line 4
Server ‘WSERVER2012\SQLEXPRESS’ is not configured for RPC.
By default, the RPC and RPC Out options are set to False.
Use Remote Collation
When this option is set to True, the collation of the remote columns will be used, and the collation
specified in the Collation Name field will be used for data sources that are not SQL Server data sources.
If the option is set to False, the collation of the local server will be used. By default, it is set to False.
Collation Name
If the Use Remote Collation field is set to True, this option is used to specify the collation name of the
linked server for a data source that is not a SQL Server data source. The chosen collation name must be a
collation that SQL Server supports.
Connection Timeout
This option sets the maximum time, in seconds, the local server should wait to get a connection to the
linked server SQL Server instance. If 0 (zero) is set, then the server option remote login timeout is used.
By default, 10 seconds is set for this option. Note, the default value for SQL Server 2008 is 20 seconds.
Query Timeout
This option sets how long, in seconds, a remote process can take before it times out. The default
value is 600 seconds (10 minutes). To disable the query timeout, put 0 (zero) in this field and the query
will wait until it is completed.
Distributor
This option specifies whether the linked server is participating in replication as a Distributor.
The Distributor is a database instance that acts as a store for replication-specific data associated with one
or more Publishers.
Publisher
This option sets whether the linked server is a replication Publisher or not. If True, the linked server is a
Publisher; otherwise, it is not.
The Publisher is a database instance that makes data available to other locations through replication.
Subscriber
This option specifies whether the linked server is a replication Subscriber or not.
A Subscriber is a database instance that receives replicated data.
More information about the Distributor, Publisher and Subscriber can be found on the Replication
Publishing Model Overview page.
Lazy schema validation
This option controls checking for schema changes that have occurred in the remote tables since
compilation. If this option is set to False (the default), SQL Server checks for changes before the execution
of a query and, if there are changes, recompiles the query. If Lazy schema validation is set to True, SQL
Server delays the schema check of the remote tables until query execution.
Enable Promotion of Distributed Transactions
This option is used to protect the actions of a server-to-server procedure through a Microsoft Distributed
Transaction Coordinator (MS DTC) transaction. If this option is set to True, calling a remote stored
procedure starts a distributed transaction and enlists the transaction with MS DTC.
Now, when everything is set, click the OK button on the New Linked Server dialog. A newly created linked
server will appear under the Linked Servers folder.
To test whether the linked server works properly, right-click on it and choose Test Connection:
If a connection with the linked server is established successfully, the following info message box will appear:
Otherwise, an error message will be displayed showing the problem that prevents the connection from
being successfully established:
Querying data using a linked server
Querying data using a linked server is a little bit different than querying data from the local SQL Server.
In normal queries, two-part notation, [Schema].[ObjectName], is usually used, for example
HumanResources.Employee:
SELECT * FROM HumanResources.Employee e
When querying a table from a linked server, four-part notation is
used: LinkedServer.Database.Schema.ObjectName. To get data from the Employee table which is
located in a database on the linked server, the querying code will look like this:
SELECT * FROM [ZIVKO\SQLEXPRESS2016].[AdventureWorks2014].[HumanResources].[Employee]
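As an aside, the same remote table can also be read with the OPENQUERY function, which sends
pass-through text to the linked server so that filtering happens remotely; a hedged sketch against the
same linked server:
SELECT *
FROM OPENQUERY([ZIVKO\SQLEXPRESS2016],
    'SELECT BusinessEntityID, JobTitle
     FROM AdventureWorks2014.HumanResources.Employee');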
Deleting a linked server
To delete a linked server, under the Linked Servers folder, right click on the linked server and from the
context menu choose the Delete command:
This will open the Delete Object dialog:
Click the OK button and from the message box, choose the Yes button:
If everything goes well, the linked server will be removed from the Linked Servers folder.
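The removal can be scripted as well; sp_dropserver with the droplogins argument removes the linked
server definition along with any login mappings created for it:
EXEC master.dbo.sp_dropserver
    @server = N'WSERVER2012\SQLEXPRESS',
    @droplogins = 'droplogins';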
SQL Query Optimization Techniques in SQL Server: Parameter Sniffing
September 4, 2018 by Ed Pollack
Description
Of the many ways in which query performance can go awry, few are as misunderstood as parameter
sniffing. Search the internet for solutions to a plan reuse problem, and many suggestions will be
misleading, incomplete, or just plain wrong.
This is an area where design, architecture, and understanding one’s own code are extremely important,
and quick fixes should be saved as emergency last resorts.
Understanding parameter sniffing requires comfort with plan reuse, the query plan cache, and
parameterization. This topic is so important and has influenced me so much that I am devoting an entire
article to it, in which we will define, discuss, and provide solutions to parameter sniffing challenges.
Review of Execution Plan Reuse
SQL query optimization is both a resource and time intensive process. An execution plan provides SQL
Server with instructions on how to efficiently execute a query; it is the product of the query optimization
process and must be available before a query can be executed.
Because it takes significant resources to generate an execution plan, SQL Server caches plans in memory
in the query plan cache for later use. If the same query is executed multiple times, then the cached plan
can be reused over and over, without the need to generate a new plan. This saves time and resources,
especially for common queries that are executed frequently.
Execution plans are cached based on the exact text of the query. Any differences, even those as minor as
a comment or capital letter, will result in a separate plan being generated and cached. Consider the
following two queries:
SELECT
    SalesOrderHeader.SalesOrderID,
    SalesOrderHeader.DueDate,
    SalesOrderHeader.ShipDate
FROM Sales.SalesOrderHeader
WHERE SalesOrderHeader.OrderDate = '2011-05-30 00:00:00.000';

SELECT
    SalesOrderHeader.SalesOrderID,
    SalesOrderHeader.DueDate,
    SalesOrderHeader.ShipDate
FROM Sales.SalesOrderHeader
WHERE SalesOrderHeader.OrderDate = '2011-05-31 00:00:00.000';
While the queries are very similar and will likely require the same execution plan, SQL Server will create a
separate plan for each. This is because the filter is different, with the OrderDate being May 30th in the first
query and May 31st in the second query. As a result, hard-coded literals in queries will result in different
execution plans for each different value that is used in the query. If I ran the query above once for every
day in the year 2011, then the result would be 365 queries and 365 different cached execution plans.
If the queries above are executed very often, then SQL Server will be forced to generate new plans
frequently for all possible values of OrderDate. If OrderDate is a DATETIME and can (and will) have lots of
distinct values, then we’ll see a very large number of execution plans getting created at a rapid pace.
The plan cache is stored in memory and its size is limited by available memory. Therefore, if excessive
numbers of plans are generated over a short period of time, the plan cache could fill up. When this
occurs, older plans are removed from cache in favor of newer ones. If memory pressure becomes
significant, then the older plans being removed may end up being useful ones that we will need soon.
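This churn can be observed directly: the plan cache is exposed through dynamic management views. A
minimal sketch (the LIKE filter is only there to narrow the output for this example):
-- List cached plans for queries that touch SalesOrderHeader,
-- along with how many times each plan has been reused
SELECT
    cp.usecounts,
    cp.objtype,
    st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%SalesOrderHeader%';
Running this after executing the hard-coded queries above will show one cached plan per distinct literal
value, each with a usecounts of 1.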

Parameterization
The solution to memory pressure in the plan cache is parameterization. For our query above, the
DATETIME literal can be replaced with a parameter:
CREATE PROCEDURE dbo.get_order_date_metrics
    @order_date DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        SalesOrderHeader.SalesOrderID,
        SalesOrderHeader.DueDate,
        SalesOrderHeader.ShipDate
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderHeader.OrderDate = @order_date;
END
When executed for the first time, an execution plan will be generated for this stored procedure that uses
the parameter @order_date. All subsequent executions will use the same execution plan, resulting in the
need for only a single plan, even if the proc is executed millions of times per day.
Parameterization greatly reduces churn in the plan cache and speeds up query execution as we can often
skip the expensive optimization process that is needed to generate an execution plan.
What is Parameter Sniffing?
Plan reuse is an important part of how execution plans are managed. The process of optimizing a query
and assigning a plan to it is one of the most CPU-intensive processes in SQL Server. Since it is also a
time-sensitive process, slowing down is not an option.
This is a good feature and one that saves immense server resources. A query that executes a million
times a day can now be optimized once and the plan reused 999,999 times for free. While this feature is
almost always good, there are times it can cause unexpected performance problems. This primarily
occurs when the set of parameters that the execution plan was optimized for ends up being drastically
different than the parameters that are being passed in right now. Maybe the initial optimization called for
an index seek, but the current parameters suggest a scan is better. Maybe a MERGE join made sense the
first time, but NESTED LOOPS is the right way to go now.
The following is an example of parameter sniffing:
1 CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
2 @sales_person_id INT
3 AS
4 BEGIN
5 SET NOCOUNT ON;
6  
7 SELECT
8 SalesOrderHeader.SalesOrderID,
9 SalesOrderHeader.DueDate,
10 SalesOrderHeader.ShipDate
11 FROM Sales.SalesOrderHeader
12 WHERE ISNULL(SalesOrderHeader.SalesPersonID, 0) = @sales_person_id;
13 END
This stored procedure searches SalesOrderHeader based on the ID of the sales person, including a catch-
all for NULL IDs. When we execute it for a specific sales person (285), we get the following IO and
execution plan:
Table ‘SalesOrderHeader’. Scan count 1, logical reads 105, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
We can see that SQL Server used a scan on a nonclustered index, as well as a key lookup to return the
data we were looking for. If we were to clear the execution plan cache and rerun this for a parameter
value of 0, then we would get a different plan:
Table ‘SalesOrderHeader’. Scan count 1, logical reads 698, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
Because so many rows were being returned by the query, SQL Server found it more efficient to scan the
table and return everything, rather than methodically seek through an index to return 95% of the table. In
each of these examples, the execution plan chosen was the best plan for the parameter value passed in.
How will performance look if we were to execute the stored procedure for a parameter value of 285 and
not clear the plan cache?
Table ‘SalesOrderHeader’. Scan count 1, logical reads 698, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
The correct execution plan involved a scan of a nonclustered index with a key lookup, but since we
reused our most recently generated execution plan, we got a clustered index scan instead. This plan cost
us six times more reads, pulling significantly more data from storage than was needed to process the
query and return our results.
The behavior above is a side-effect of plan reuse and is the poster-child for what this article is all about.
For our purposes, parameter sniffing will be defined as undesired execution plan reuse.
Finding and Resolving Parameter Sniffing
How do we diagnose parameter sniffing? Once we know that performance is suboptimal, there are a
handful of giveaways that help us understand when this is occurring:
 A stored procedure executes efficiently sometimes, but inefficiently at other times.
 A good query begins performing poorly when no changes are made to database schema.
 A stored procedure has many parameters and/or complex business logic enumerated within it.
 A stored procedure uses extensive branching logic.
 Playing around with the TSQL, or applying quick hacks, appears to fix it temporarily.
Of the many areas of SQL Server where performance problems rear their head, few are handled as poorly
as parameter sniffing. There often is not an obvious or clear fix, and as a result we implement hacks or
poor choices to resolve the latency and allow us to move on with life as quickly as possible. An immense
percentage of the content available online, in publications, and in presentations on this topic is
misleading, and encourages the administrator to take shortcuts that do not truly fix a problem. There are
definitive ways to resolve parameter sniffing, so let’s look at many of the possible solutions (and how
effective they are).
I am not going to go into excruciating detail here. MSDN documents the use of different hints/mechanics
well. Links are included at the end of the article to help with this, if needed.
Redeclaring Parameters Locally
Rating: It’s a trap!
This is a complete cop-out, plain and simple. Call it a cheat, a poor hack, or a bandage as that is all it is.
Because the value of local variables is not known until runtime, the query optimizer needs to make a very
rough estimate of row counts prior to execution. This estimate is all we get, and statistics on the index
will not be effectively used to determine the best execution plan. This estimate will sometimes be good
enough to resolve a parameter sniffing issue and give the illusion of a job well done.
The effect of using local variables is to hide the value from SQL Server. It’s essentially applying the hint
“OPTIMIZE FOR UNKNOWN” to any query component that references them. The rough estimate that SQL
Server uses to optimize the query and generate an execution plan will be right sometimes, and wrong
other times. Typically the way this is implemented is as follows:
1. Performance problem is identified.
2. Parameter sniffing is determined to be the cause.
3. Redeclaring parameters locally is a solution found on the internet.
4. Try redeclaring parameters locally and the performance problem resolves itself.
5. Implement the fix permanently.
6. 3 months later, the problem resurfaces and the cause is less obvious.
What we are really doing is fixing a problem temporarily and leaving behind a time bomb that will create
problems in the future. The estimate by the optimizer may work adequately for now, but eventually will
not be adequate and we’ll have resumed performance problems. This solution works because oftentimes
a poor estimate performs better than badly timed parameter sniffing, but only at that time. This is a game
of chance in which a low probability event (parameter sniffing) is crossed with a high probability event (a
poor estimate happening to be good enough) to generate a reasonable illusion of a fix.
To demo this behavior, we’ll redeclare a parameter locally in our stored procedure from earlier:
IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
BEGIN
    DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
END
GO

CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
    @sales_person_id INT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sales_person_id_local INT = @sales_person_id;

    SELECT
        SalesOrderHeader.SalesOrderID,
        SalesOrderHeader.DueDate,
        SalesOrderHeader.ShipDate
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderHeader.SalesPersonID = @sales_person_id_local;
END
When we execute this for different values, we get the same plan each time. Clearing the proc cache has
no effect either:
EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 285;
DBCC FREEPROCCACHE;
EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 0;
EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 285;
DBCC FREEPROCCACHE;
EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 285;
EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 0;
For each execution, the result is:
Table ‘SalesOrderHeader’. Scan count 1, logical reads 698, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
When we hover over the results, we can see that the estimated number of rows was 1748, but the actual
rows returned by the query was 16. Seeing a huge disparity between actual and estimated rows is an
immediate indication that something is wrong. While that could be indicative of stale statistics, seeing
local variables in the query should be a warning sign that they are related. In this example, the local
variable forced the same mediocre execution plan for all runs of the query, regardless of details. This may
sometimes give an illusion of adequate performance, but will rarely do so for long.
To summarize: declaring local variables, assigning parameter values to them, and using the local variables
in subsequent queries is a very bad idea and we should never, ever do this! If a short-term hack is
needed, there are far better ones to use than this.
OPTION (RECOMPILE)
Rating: Potentially useful
When this query hint is applied, a new execution plan will be generated for the current parameter values
supplied. This automatically curtails parameter sniffing as there will be no plan reuse when this hint is
used. The cost of this hint is the resources required to generate a new execution plan. By creating a new
plan with each execution, we pay the price of the optimization process with each and every
execution.
This option is also an easy way out and should not be blindly used. This hint is only useful on queries or
stored procedures that execute infrequently as the cost to generate a new execution plan will not be
incurred often. For important OLTP queries that are being executed all day long, this is a harmful option
and would be best avoided as we would sacrifice valuable resources on an ongoing basis to avoid
parameter sniffing.
OPTION RECOMPILE works best on reporting queries, infrequent or edge-case queries, and in scenarios
where all other optimization techniques have failed. For highly unpredictable workloads, it can be a
reliable way to ensure that a good plan is generated with each execution, regardless of parameter values.
Here is a quick example of OPTION (RECOMPILE) from above:
IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
BEGIN
    DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
END
GO

CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
    @sales_person_id INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        SalesOrderHeader.SalesOrderID,
        SalesOrderHeader.DueDate,
        SalesOrderHeader.ShipDate
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderHeader.SalesPersonID = @sales_person_id
    OPTION (RECOMPILE);
END
GO
The results of this change are that the stored procedure runs with an excellent execution plan each time:
Table ‘SalesOrderHeader’. Scan count 1, logical reads 50, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
It is important to note that if this query hint is utilized and the query later begins to be used more often,
you will want to consider removing the hint to prevent excessive resource consumption by the query
optimizer as it constantly generates new plans. OPTION RECOMPILE is useful in a specific set of
circumstances and should be applied carefully, only when needed, and only when the query is not
executed often. To review, OPTION (RECOMPILE) is best used when:
 A query is executed infrequently.
 Unpredictable parameter values result in optimal execution plans that vary greatly with each
execution.
 Other optimization solutions were unavailable or unsuccessful.
As with all hints, use it with caution, and only when absolutely needed.
Dynamic SQL
Rating: Potentially useful
While dynamic SQL can be an extremely useful tool, this is a somewhat awkward place to use it. By
wrapping a troublesome TSQL statement in dynamic SQL, we remove it from the scope of the stored
procedure and another execution plan will be generated exclusively for the dynamic SQL. Since execution
plans are generated for specific TSQL text, a dynamic SQL statement with any variations in text will
generate a new plan.
For all intents and purposes, using dynamic SQL to resolve parameter sniffing is very similar to using a
RECOMPILE hint. We are going to generate more execution plans with greater granularity in an effort to
sidestep the effects of parameter sniffing. All of the caveats of recompilation apply here as well. We do
not want to generate excessive quantities of execution plans as the resource cost to do so will be high.
One benefit of this solution is that we will not create a new plan with each execution, but only when the
parameter values change. If the parameter values don’t change often, then we will be able to reuse plans
frequently and avoid the heavy repeated costs of optimization.
A downside to this solution is that it is confusing. To a developer, it is not immediately obvious why
dynamic SQL was used, so additional documentation would be needed to explain its purpose. While
using dynamic SQL can sometimes be a good solution, it is the sort that should be implemented very
carefully and only when we are certain we have a complete grasp of the code and business logic
involved. As with RECOMPILE, if the newly created dynamic SQL suddenly begins to be executed often,
then the cost to generate new execution plans may become a burden on resource consumption. Lastly,
remember to cleanse inputs and ensure that string values cannot be broken or modified by apostrophes,
percent signs, brackets, or other special characters.
Here is an example of this usage:
IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
BEGIN
    DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
END
GO

CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
    @sales_person_id INT
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @sql_command NVARCHAR(MAX);

    SELECT @sql_command = '
    SELECT
        SalesOrderHeader.SalesOrderID,
        SalesOrderHeader.DueDate,
        SalesOrderHeader.ShipDate
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderHeader.SalesPersonID = ' + CAST(@sales_person_id AS VARCHAR(MAX)) + ';
    ';
    EXEC sp_executesql @sql_command;
END
GO
Our results here are similar to using OPTION (RECOMPILE), as we will get good IO and execution plans
generated each time. To review, wrapping a TSQL statement in dynamic SQL and hard-coding parameters
into that statement can be useful when:
 A query is executed infrequently OR parameter values are not very diverse.
 Different parameter values result in wildly different execution plans.
 Other optimization solutions were unavailable or unsuccessful.
 OPTION (RECOMPILE) resulted in too many recompilations.
OPTIMIZE FOR
Rating: Potentially useful, if you really know your code!
When we utilize this hint, we explicitly tell the query optimizer what parameter value to optimize for. This
should be used like a scalpel, and only when we have complete knowledge of and control over the code
in question. To tell SQL Server that we should optimize a query for any specific value requires that we
know that all values used will be similar to the one we choose.
This requires knowledge of both the business logic behind the poorly performing query and any of the
TSQL in and around the query. It also requires that we can see the future with a high level of accuracy
and know that parameter values will not shift in the future, resulting in our estimates being wrong.
One excellent use of this query hint is to assign optimization values for local variables. This can allow you
to curtail the rough estimates that would otherwise be used. As with parameters, you need to know what
you are doing for this to be effective, but there is at least a higher probability of improvement when our
starting point is “blind guess”.
Note that OPTIMIZE FOR UNKNOWN has the same effect as using a local variable. The result will typically
behave as if a rough statistical estimate were used and will not always be adequate for efficient
execution. Here’s how its usage looks:
IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
BEGIN
    DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
END
GO

CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
    @sales_person_id INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        SalesOrderHeader.SalesOrderID,
        SalesOrderHeader.DueDate,
        SalesOrderHeader.ShipDate
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderHeader.SalesPersonID = @sales_person_id
    OPTION (OPTIMIZE FOR (@sales_person_id = 285));
END
With this hint in place, all executions will utilize the same execution plan based on the
parameter @sales_person_id having a value of 285. OPTIMIZE FOR is most useful in these scenarios:
 A query executes very similarly all the time.
 We know our code very well and understand the performance of it thoroughly.
 Row counts processed and returned are consistently similar.
 We have a high level of confidence that these facts will not change in the future.
OPTIMIZE FOR can be a useful way to control variables and parameters to ensure optimal performance,
but it requires knowledge and confidence in how a query or stored procedure operates so that we do not
introduce a future performance problem when things change. As with all hints, use it with caution, and
only when absolutely needed.
Create a Temporary Stored Procedure
Rating: Potentially useful, in very specific scenarios
One creative approach towards parameter sniffing is to create a temporary stored procedure that
encapsulates the unstable database logic. The temporary proc can be executed as needed and dropped
when complete. This isolates execution patterns and limits the lifespan of its execution plan in the cache.
A temporary stored procedure can be created and dropped similarly to a standard stored procedure,
though it will persist throughout a session, even if a batch or scope is ended:
CREATE PROCEDURE #get_order_metrics_by_sales_person
    @sales_person_id INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        SalesOrderHeader.SalesOrderID,
        SalesOrderHeader.DueDate,
        SalesOrderHeader.ShipDate
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderHeader.SalesPersonID = @sales_person_id
    OPTION (OPTIMIZE FOR (@sales_person_id = 285));
END
GO

EXEC #get_order_metrics_by_sales_person @sales_person_id = 285;
EXEC #get_order_metrics_by_sales_person @sales_person_id = 0;
GO
DROP PROCEDURE #get_order_metrics_by_sales_person;
GO
When executed, the performance will mirror a newly created stored procedure, with no extraneous
history to draw on for an execution plan. This is great for short-term tasks, temporary needs, releases, or
scenarios in which data patterns change on a mid-term basis and can be controlled. Temporary stored
procs can be declared as global as well by using “##” in front of the name, instead of “#”. This is ill-
advised for the same reason that global temporary tables are discouraged, as they possess no security,
and maintainability becomes a hassle across many databases or the entire server.
The benefits of temporary stored procedures are:
 Can control stored proc and plan existence easily.
 Facilitates accurate execution plans for data that is consistent in the short term, but varies long-
term.
 Documents the need/existence for temporary business logic.
This is a little-known feature and few take advantage of it, but it can provide a useful way to guarantee
good execution plans without the need to hack apart code too much in doing so.
Disable Parameter Sniffing (Trace Flag 4136)
Rating: Occasionally useful, but typically a bad idea!
This trace flag disables parameter sniffing itself: the optimizer no longer uses the supplied parameter
values when compiling a plan. It may be implemented on a server-wide basis or as a query hint option.
The result is similar to adding OPTIMIZE FOR UNKNOWN to any query affected. Specific query hints
override this, though, such as OPTION (RECOMPILE) or OPTIMIZE FOR.
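For completeness, here is how the trace flag can be applied at the query level or server-wide. Treat this
as a sketch of the syntax, not a recommendation:
-- Per query, via the QUERYTRACEON hint (requires appropriate permissions)
SELECT SalesOrderID, DueDate, ShipDate
FROM Sales.SalesOrderHeader
WHERE SalesPersonID = 285
OPTION (QUERYTRACEON 4136);

-- Server-wide until restart; use with extreme caution
DBCC TRACEON (4136, -1);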
Like query hints, trace flags should be applied with extreme caution. Adjusting the optimizer’s behavior is
rarely a good thing and will typically cause more harm than good. OPTIMIZE FOR UNKNOWN, like using
local variables, will result in generic execution plans that do not use accurate statistics to make their
decisions.
Making rough estimates with limited data is already dangerous. Applying that tactic to an entire server or
set of queries is likely to be more dangerous. This trace flag can be useful when:
 Your SQL Server has a unique use-case that you fully understand.
 Plan reuse is undesired server-wide.
 Usage patterns will not change in the foreseeable future.
While there are a few legitimate uses of this trace flag in SQL Server, parameter sniffing is not the
problem we want to try and solve with it. It is highly unlikely that this will provide a quality, long-term
solution to a parameter-related optimization problem.
Improve Business Logic
Rating: Very, very good!
Suboptimal parameter sniffing is often seen as an anomaly. The optimizer makes bad choices or solar
flares somehow intersect with your query’s execution or some other bad thing happens that warrants
quick & reckless actions on our part. More often than not, though, parameter sniffing is the result of how
we wrote a stored procedure, and not bad luck. Here are a few common query patterns that can increase
the chances of performance problems caused by parameter sniffing:
Too Many Code Paths
When we add code branching logic to procedural TSQL, such as by using IF…THEN…ELSE, GOTO, or CASE
statements, we create code paths that are not always followed, except when specific conditions are met.
Since an execution plan is generated ahead of time, prior to knowing which code path will be chosen, it
needs to guess as to what the most probable and optimal execution plan will be, regardless of how
conditionals are met.
Code paths are sometimes implemented using “switch” parameters that indicate a specific type of report
or request to be made. Switch parameters may indicate if a report should return detailed data or
summary data. They may determine which type of entity to process. They may decide what style of data
to return at the end. Regardless of form, these parameters contribute heavily to poor parameter sniffing
as the execution plan will not change when parameters do change. Use switch parameters cautiously,
knowing that if many different values are passed in frequently, the execution plan will not change.
This is in no way to suggest that conditional code is bad, but that a stored procedure with too many
code paths will be more susceptible to suboptimal parameter sniffing, especially if those code paths are
vastly different in content and purpose. Here are a few suggestions for reducing the impact of this
problem:
 Consider breaking out large conditional sections of TSQL into new stored procedures (see the
sketch after this list). A large block of important code may very well be more appropriate as its own
stored procedure, especially if it can be reused elsewhere.
 Move business branching logic into code. Instead of making a stored procedure decide what data
to return or how to return it, have the application decide and let SQL Server manage what it does
best: reading and writing of data! The purpose of a database is to store and retrieve data, not to
make important business decisions or beautify the data.
 Avoid unnecessary conditionals, especially GOTO. This causes execution to jump around and is
not only confusing for developers to understand, but makes optimization challenging for SQL
Server.
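To illustrate the first suggestion above, here is a minimal, hypothetical sketch: instead of one procedure
that branches on a @summary_only switch parameter, each code path becomes its own procedure with
its own cached plan (all names are invented for this example):
-- Each procedure now gets an execution plan optimized for its own query
CREATE PROCEDURE dbo.get_orders_detail
AS
BEGIN
    SET NOCOUNT ON;
    SELECT SalesOrderID, DueDate, ShipDate
    FROM Sales.SalesOrderHeader;
END
GO

CREATE PROCEDURE dbo.get_orders_summary
AS
BEGIN
    SET NOCOUNT ON;
    SELECT COUNT(*) AS order_count
    FROM Sales.SalesOrderHeader;
END
GO
The application, or a thin wrapper procedure, then decides which one to call.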
Too Many Parameters
Each parameter adds another level of complexity to the job of the query optimizer. Similar to how a
query becomes more complex with each table added, a stored procedure will become more challenging
to optimize a plan for with each parameter that is added.
An execution plan will be created for the first set of parameters and reused for all subsequent sets, even
if the values change significantly. As when too many code paths exist in a stored procedure, it becomes
challenging to pick an execution plan that will also happen to be good for all possible future executions.
A stored procedure with ten or twenty or thirty parameters may be trying to accomplish too many things
at once. Consider splitting the proc into smaller, more portable ones. Also, consider removing parameters
that are not necessary. Sometimes a parameter will always have the same value, not get used, or have its
value overridden later in the proc. Sometimes the logic imposed by a specific parameter is no longer
needed and it can be removed with no negative impact.
Reducing the number of parameters in a stored procedure will automatically reduce the potential for
parameter sniffing to become problematic. It may not always be an easy solution, but it’s the simplest
way to solve this problem without having to resort to hacks or trickery.
Overly Large Stored Procedure. AKA: Too Much Business Logic in the Database
Even if the parameter list is short, a very long stored procedure will have more decisions to make and
more potential for an execution plan to not be the one-size-fits-all solution. If you’re running into a
performance problem on line 16,359, you may want to consider dividing up the stored procedure into
smaller ones. Alternatively, a rewrite that reduces the amount of necessary code can also help.
Oftentimes new features in SQL Server allow for code to be written more succinctly. For example, MERGE,
OUTPUT, or common-table expressions can take long and complex TSQL statements and make them
shorter and simpler.
If a stored proc is not long due to having many code paths or too many parameters, it may be due to
using the database as a presentation tool. SQL Server, like all RDBMS, is optimized for the quick storage
and retrieval of data. Formatting, layout, and other presentation considerations can be made in SQL
Server, but it simply isn’t what it is best at. While the query optimizer generally has no trouble managing
queries that adjust formatting, color, and layout as they are relatively simple in nature, we still
incrementally add more complexity to a stored proc when we let it manage these aspects of data
presentation.
Another reason why a stored procedure can become too long is because it contains too much business
logic. Decision-making, presentation, and branching all are costly and difficult to optimize a universal
execution plan for. Reporting applications are excellent at managing parameters and decision-making
processes. Application code is built for branching, looping, and making complex decisions. Pushing
business logic from stored procedures, functions, views, and triggers into applications will greatly simplify
database schema and improve performance.
Reducing the size of a stored procedure will improve the chances that the execution plan generated for it
is more likely to be good for all possible parameter values. Removing code paths and parameters helps
with this, as does removing presentation-layer decisions that are made within the proc.
Conclusion
To wrap up our discussion of parameter sniffing, it is important to be reminded that this is a feature and
not a bug. We should not be automatically seeking workarounds, hacks, or cheats to make the problem
go away. Many quick fixes exist that will resolve a problem for now and allow us to move on to other
priorities. Before adding query hints, trace flags, or otherwise hobbling the query optimizer, consider
every alternate way to improve performance. Local variables, dynamic SQL, RECOMPILE, and OPTIMIZE
FOR are too often cited as the best solutions, when in fact they are typically misused.
Query optimization techniques in SQL Server: Database Design and Architecture
Description
One of the best ways to optimize performance in a database is to design it right the first time! Making
design and architecture decisions based on facts and best practices will reduce technical debt and the
number of fixes that you need to implement in the future.
While there are often ways to tweak queries, indexes, and server settings to make things faster, there are
limits to what we can accomplish in a cost-efficient and timely manner. Those limits are dictated by
database architecture and application design. It may often seem that prioritizing good design up-front
will cost more than the ability to roll features out quickly, but in the long-run, the price paid for hasty or
poorly-thought-out decisions will be higher.
This article is a fast-paced dive into the various design considerations that we can make when building
new databases, tables, or procedural TSQL. Consider these as ways to improve performance in the long-
term by making careful, well thought-out decisions now. Predicting the future of an application may not
be easy when we are still nursing it through infancy, but we can consider a variety of tips that will give us
a better shot at doing it right the first time!
“Measure twice, cut once” applies to application development in that regard. We need to accept that
doing it right the first time will be significantly cheaper and less time-consuming than needing to
rearchitect in the future.
Understand the application
Learn the business need behind the application and its database. In addition to a basic understanding of
what an app will do, consider the following questions:
 What is it used for? What is the app and its purpose? What kind of data will be involved?
 Who will access it? Will it be end-users on the web or internal employees? Will it be 5 people,
5,000 people or 5 million people?
 How will they access it? Will it be a web page? Mobile app? Local software on a computer or
private cloud? A reporting interface?
 Are there specific times of day when usage is heavier? Do we need to accommodate busy times
with extra resources? Will quiet times allow for planned maintenance or downtime? What sort of
uptime is expected?
Getting a basic idea of the purpose of a database will allow you to better forecast its future and avoid
making any major mistakes in the beginning. If we know about application quirks, such as busy or quiet
times, the use of ORMs, or the use of an API, we can better understand how it will interact with the
database and make decisions to accommodate that usage.
Often, a good understanding of the app allows us to make simplifying assumptions and cut through a lot
of the decisions we need to make. This conversation may involve both technical folks (architects,
developers, engineers, and other DBAs) or it may involve business reps that have a strong understanding
of the purpose of the app, even if they may not know much about the technical implementation of it.
Here are some more details on the questions and considerations we should make when designing new
database objects:

Scalability
How will the application and its data grow over time? The ways we build, maintain, and query data
change when we know the volume will be huge. Very often we build and test code in very controlled
dev/QA environments in which the data flow does not mirror a real production environment. Even
without this, we should be able to estimate how much an app will be used and what the most common
needs will be.
We can then infer metrics such as database size, memory needs, CPU, and throughput. We want the
hardware that we put our databases on to be able to perform adequately and this requires us to allocate
enough computing resources to make this happen. A 10TB database will likely not perform well (or at all)
on a server with 2GB of RAM available to it. Similarly, a high-traffic application may require faster
throughput on the network and storage infrastructure, in addition to speedy SSDs. A database will only
perform as quickly as its slowest component, and it is up to us to make sure that the slowest component
is fast enough for our app.
How will data size grow over time? Can we expand storage and memory easily when needed? If
downtime is not an option, then we will need to consider hardware configurations that will either provide
a ton of extra overhead to start or allow for seamless expansions later on. If we are not certain of data
growth, do we expect the user or customer count to grow? If so, we may be able to infer data or usage
growth based on this.
Licensing matters, too, as licensing database software isn’t cheap. We should consider which edition of
SQL Server the application will function on and what the least expensive edition is that we are allowed to
use. A completely internal server with no customer-facing access may be able to benefit from using
Developer edition. Alternatively, the choice between Enterprise and Standard may be decided by features
(such as AlwaysOn) or capacity (memory limitations, for example). A link is provided at the end of this
article with extensive comparisons between editions of SQL Server.
High availability and disaster recovery are very important considerations early-on that often are not
visited until it is too late. What is the expected up-time of the app? How quickly are we expected to
recover from an outage (recovery time objective/RTO)? In addition, how much data loss is tolerated in
the event of an outage or disaster (recovery point objective/RPO)? These are tough questions as
businesses will often ask for a guarantee of zero downtime and no data loss, but will back off when they
realize the cost to do so is astronomically high. This discussion is very important to have prior to an
application being released as it ensures that contracts, terms of service, and other documentation
accurately reflect the technical capabilities of the systems it resides on. It also allows you to plan ahead
with disaster recovery plans and avoid the panic often associated with unexpected outages.
Data types
One of the most basic decisions that we can make when designing a database is to choose the right data
types. Good choices can improve performance and maintainability. Poor choices will make work for us in
the future.
Choose natural data types that fit the data being stored. A date should be a date, not a string. A bit
should be a bit and not an integer or string. Many of these decisions are holdovers from years ago when
data types were more limited and developers had to be creative in order to generate the data they
wanted.
Choose length, precision, and size that fits the use case. Extra precision may seem like a useful add-on,
but can be confusing to developers who need to understand why a DECIMAL(18,4) contains data with
only two digits of decimal detail. Similarly, using a DATETIME to store a DATE or TIME can also be
confusing and lead to bad data.
When in doubt, consider using a standard, such as ISO5218 for gender, ISO3166 for country, or ISO4217
for currency. These allow you to quickly refer anyone to universal documentation on what data should
look like, what is valid, and how it should be interpreted.
Avoid storing HTML, XML, JSON, or other markup languages in the database. Storing, retrieving, and
displaying this data is expensive. Let the app manage data presentation, not the database. A database
exists to store and retrieve data, not to generate pretty documents or web pages.
Dates and times should be consistent across all tables. If time zones or locations will matter, consider
using UTC time or DATETIMEOFFSET to model them. Upgrading a database in the future to add time
zone support is much harder than using these conventions in the beginning. Dates, times, and durations
are different. Label them so that it is easy to understand what they mean. Duration should be stored in a
one-dimensional scalar unit, such as seconds or minutes. Storing duration in the format
“HH:MM:SS.mmm” is confusing and difficult to manipulate when mathematical operations are needed.
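For example, a duration stored as a scalar number of seconds aggregates naturally, while a formatted
string does not; a small sketch (the table and column names are invented):
CREATE TABLE dbo.job_run
(
    job_run_id INT IDENTITY(1,1) PRIMARY KEY,
    duration_in_seconds INT NOT NULL -- scalar unit, trivial to sum or average
);

INSERT INTO dbo.job_run (duration_in_seconds)
VALUES (90), (45), (600);

SELECT AVG(duration_in_seconds) AS avg_duration_in_seconds
FROM dbo.job_run;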
NULLs
Use NULL when non-existence of data needs to be modelled in a meaningful fashion. Do not use made-
up data to fill in NOT NULL columns, such as “1/1/1900” for dates, “-1” for integers, “00:00:00” for times,
or “N/A” for strings. NOT NULL should mean that a column is required by an application and should
always be populated with meaningful data.
NULL should have meaning and that meaning should be defined when the database is being designed.
For example, “request_complete_date = NULL” could mean that a request is not yet complete. “Parent_id
= NULL“ could indicate an entity with no parent.
NULL can be eliminated by additional normalization. For example, a parent-child table could be created
that models all hierarchical relationships for an entity. This may be beneficial if these relationships form a
critical component of how an app operates. Reserve the removal of NULLable columns via normalization
for those that are important to an app or that may require additional supporting schema to function well.
As always, normalization for the sake of normalization is probably not a good thing!
Beware NULL behavior. ORDER BY, GROUP BY, equalities, inequalities, and aggregate functions will all
treat NULL differently. Always SET ANSI_NULLS ON. When performing operations on NULLable columns,
be sure to check for NULL whenever needed. Here is a simple example from AdventureWorks:
SELECT *
FROM Person.Person
WHERE Title = NULL;

SELECT *
FROM Person.Person
WHERE Title IS NULL;
These queries look similar but will return different results. The first query will return 0 rows, whereas the
second will return 18,963 rows:

The reason is that NULL is not a value and cannot be treated like a number or string. When checking for
NULL or working with NULLable columns, always check and validate if you wish to include or exclude
NULL values, and always use IS NOT NULL or IS NULL, instead of =, <, >, etc…
SET ANSI_NULLS ON is the default in SQL Server and should be left as the default. Adjusting this setting will change how the above behavior works and will go against ANSI standards. Building code to handle NULL effectively is a far more scalable approach than adjusting this setting.
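For illustration only (a sketch against AdventureWorks; SET ANSI_NULLS OFF is deprecated and should not be used in new code), the setting changes how the first query above behaves:
SET ANSI_NULLS OFF;
-- With ANSI_NULLS OFF, "= NULL" behaves like "IS NULL" and returns rows.
SELECT COUNT(*) FROM Person.Person WHERE Title = NULL;

SET ANSI_NULLS ON;
-- With ANSI_NULLS ON (the default), "= NULL" is never true and returns 0.
SELECT COUNT(*) FROM Person.Person WHERE Title = NULL;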
Object names
Naming things is hard! Choosing descriptive, useful names for objects will greatly improve readability
and the ability for developers to easily use those objects for their work and not make unintended
mistakes.
Name an object for what it is. Include units in the name if they are not absurdly obvious.
“duration_in_seconds” is much more useful than “duration”. “Length_inches” is easier to understand than
“length”. Bit columns should be named in the positive and match the business use case: “is_active”,
“is_flagged_for_deletion”, “has_seventeen_pizzas”. Negative columns are usually confusing:
“is_not_taxable”, “has_no_pizzas”, “is_not_active” will lead to mistakes and confusion as they are not
intuitive. Database schema should not require puzzle-solving skills to understand.
Other things to avoid:
 Abbreviations & shorthand. These are almost always confusing. If typing speed is a concern for slower typists, consider the many tools available that provide IntelliSense or similar auto-completion features.
 Spaces & special characters. They will break maintenance processes, confuse developers, and be
a nuisance to type correctly when needed. Stick to numbers, letters, and underscores.
 Reserved words. If it’s blue, white, or pink in SSMS, don’t use it! This only causes confusion and
increases the chances of logical coding errors.
Consistency is valuable and creating effective naming schemes early will pay dividends later when there
is no need to “fix” standards to not be awful. As for the debate between capitalization and whether you
should use no capitals, camel case, pascal case, etc…, this is completely arbitrary and up to a
development team. In databases with lots of objects, prefixes can be used to allow objects of specific
types, origins, or purposes to be easily searchable. Alternatively, different schemas can be used to divide
up objects of different types.
Good object naming reduces mistakes and errors while speeding up app development. While nothing is
truly self-documenting, quality object names reduce the need to find additional resources (docs or
people) to determine what something is for or what it means.
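As a brief sketch (all names here are hypothetical), compare a table that ignores these guidelines with one that follows them:
-- Unclear: no units, a negative bit column, and a reserved word as a column name.
CREATE TABLE dbo.ord
(    id INT,
    len DECIMAL(10,2),
    is_not_active BIT,
    [desc] VARCHAR(100));

-- Clear: units in names, a positive bit column, and descriptive names.
CREATE TABLE dbo.customer_order
(    customer_order_id INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    package_length_inches DECIMAL(10,2) NOT NULL,
    is_active BIT NOT NULL,
    order_description VARCHAR(100) NOT NULL);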
Old Data
Whenever data is created, ask the question, “How long should it exist for?”. Forever is a long time and
most data does not need to live forever. Find out or create a data retention policy for all data and write
code to enforce it. Many businesses have compliance or privacy rules to follow that may dictate how long
data needs to be retained for.
Limiting data size is a great way to improve performance and reduce data footprint! This is true for any
table that stores historical data. A smaller table means smaller indexes, less storage use, less memory use,
and less network bandwidth usage. All scans and seeks will be faster as the indexes are more compact
and quicker to search.
There are many ways to deal with old data. Here are a few examples:
 Delete it. Forever. If allowed, this is an easy and quick solution.
 Archive it. Copy it to a secondary location (different database, server, partition, etc…) and then
delete it.
 Soft-delete it. Have a flag that indicates that it is no longer relevant and can be ignored in normal
processes. This is a good solution when you can leverage different storage partitions, filtered
indexes, or ways to segregate data as soon as it is flagged as old.
 Nothing. Some data truly is needed forever. If so, consider how to make the underlying structures
scalable so that they perform well in the future. Consider how large the tables can grow.
Data retention doesn’t only involve production OLTP tables, but may also include backup files, reporting
data, or data copies. Be sure to apply your retention policies to everything!
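As a sketch of the delete option (the table, column, and one-year retention window are hypothetical), removing old rows in small batches keeps transactions short and limits blocking:
DECLARE @cutoff_date DATETIME = DATEADD(YEAR, -1, GETUTCDATE());
DECLARE @rows_affected INT = 1;

-- Loop until no rows older than the retention window remain.
WHILE @rows_affected > 0
BEGIN
    DELETE TOP (1000)
    FROM dbo.sales_order_history
    WHERE order_date_utc < @cutoff_date;

    SET @rows_affected = @@ROWCOUNT;
END;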

Cartesian Products (Cross Joins/No Join Predicate)


All joins occur between some data set and a new data set. In bringing them together, we are connecting
them via some set of keys. When we join data sets without any matching criteria, we are performing a
CROSS JOIN (or cartesian product). While this can be a useful way to generate data needed by an
application, it can also be a performance and data quality issue when done unintentionally.
There are a variety of ways to generate CROSS JOIN conditions:
 Use the CROSS JOIN operator
 Enter incorrect join criteria
 Unintentionally omit a join predicate
 Forget a WHERE clause
The following query is an example of the second possibility:
SELECT
    Product.Name,
    Product.ProductNumber,
    ProductModel.Name AS Product_Model_Name
FROM Production.Product
INNER JOIN Production.ProductModel
ON ProductModel.ProductModelID = ProductModel.ProductModelID
WHERE Product.ProductID = 777;
What we expect is a single row returned with some product data. What we get instead are 128 rows, one
for each product model:

We have two hints that something has gone wrong: An overly large result set, and an unexpected index
scan in the execution plan:

Upon closer inspection of our query, it becomes obvious that I fat-fingered the INNER JOIN and did not
enter the correct table names:
INNER JOIN Production.ProductModel
ON ProductModel.ProductModelID = ProductModel.ProductModelID
By entering ProductModel on both sides of the join, I inadvertently told SQL Server to not
join Product to ProductModel, but instead join Product to the entirety of ProductModel. This occurs
because ProductModel.ProductModelID will always equal itself. I could have entered “ON 1 = 1” for the join
criteria and seen the same results.
The correction here is simple, adjust the join criteria to connect Product to ProductModel, as was
intended:
INNER JOIN Production.ProductModel
ON Product.ProductModelID = ProductModel.ProductModelID
Once fixed, the query returns a single row and utilizes an index seek on ProductModel.
Situations in which a join predicate is missing or wrong can be difficult to detect. SQL Server does not
always warn you of this situation, and you may not see an error message or show-stopping bad
performance that gets your immediate attention. Here are some tips on catching bad joins before they
cause production headaches:
 Make sure that each join correlates an existing data set with the new table. CROSS JOINs should only be used when needed (and intentionally) to inflate the size/depth of a data set.
 An execution plan may indicate a “No Join Predicate” warning on a specific join in the execution
plan. If so, then you’ll know exactly where to begin your research.
 Check the size of the result set. Is it too large? Are any tables being cross joined across an entire
data set, resulting in extra rows of legit data with extraneous data tacked onto the end of it?
 Do you see any unusual index scans in the execution plan? Are they for tables where you expect
to only seek a few rows, such as in a lookup table?
For reference, here is an example of what a “No Join Predicate” warning looks like:

We’ll follow the standard rule that yellow and red exclamation marks will always warrant further
investigation. In doing so, we can see that this specific join is flagged as having no join predicate. In a
short query, this is easy to spot, but in a larger query against many tables, it is easy for these problems to
get buried in a larger execution plan.

Iteration
SQL Server is optimized for set-based operations and performs best when you read and write data in
batches, rather than row-by-row. Applications are not constrained in this fashion and often use iteration
as a method to parse data sets.
While it may anecdotally seem that collecting 100 rows from a table one-at-a-time or all at once would
take the same effort overall, the reality is that the effort to connect to storage and read pages into
memory takes a distinct amount of overhead. As a result, one hundred index seeks of one row each will
take far more time and resources than one seek of a hundred rows:
DECLARE @id INT = (SELECT MIN(BusinessEntityID) FROM HumanResources.Employee);
WHILE @id <= 100
BEGIN
    UPDATE HumanResources.Employee
    SET VacationHours = VacationHours + 4
    WHERE BusinessEntityID = @id
    AND VacationHours < 200;

    SET @id = @id + 1;
END
This example is simple: iterate through a loop, update an employee record, increment a counter and
repeat 99 times. The performance is slow and the execution plan/IO cost abysmal:
At first glance, things seem good: Lots of index seeks and each read operation is inexpensive. When we
look more closely, we realize that while 2 reads may seem cheap, we need to multiply that by 100. The
same is true for the 100 execution plans that were generated for all of the update operations.
Let’s say we rewrite this to update all 100 rows in a single operation:
UPDATE HumanResources.Employee
SET VacationHours = VacationHours + 4
WHERE VacationHours < 200
AND BusinessEntityID <= 100;

Instead of 200 reads, we only need 5, and instead of 100 execution plans, we only need 1.
Data in SQL Server is stored in 8kb pages. When we read rows of data from disk or memory, we are
reading 8kb pages, regardless of the data size. In our iterative example above, each read operation did
not simply read a few numeric values from disk and update one, but had to read all of the necessary 8kb
pages needed to service the entire query.
Iteration is often hidden from view because each operation is fast and inexpensive, making it difficult to
locate it when reviewing extended events or trace data. Watching out for CURSOR use, WHILE loops, and
GOTO can help us catch it, even when there is no single poor-performing operation.
There are other tools available that can help us avoid iteration. For example, a common need when
inserting new rows into a table is to immediately return the IDENTITY value for that new row. This can be
accomplished by using @@IDENTITY or SCOPE_IDENTITY(), but these are not set-based functions. To use
them, we must iterate through insert operations one-at-a-time and retrieve/process the new identity
values after each loop. For row counts greater than 2 or 3, we will begin to see the same inefficiencies
introduced above.
The following code is a short example of how to use OUTPUT INSERTED to retrieve IDENTITY values in
bulk, without the need for iteration:
CREATE TABLE #color
(    color_id SMALLINT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    color_name VARCHAR(50) NOT NULL,
    datetime_added_utc DATETIME);
CREATE TABLE #id_values
(    color_id SMALLINT NOT NULL PRIMARY KEY CLUSTERED,
    color_name VARCHAR(50) NOT NULL);

INSERT INTO #color
    (color_name, datetime_added_utc)
OUTPUT INSERTED.color_id, INSERTED.color_name
INTO #id_values
VALUES
    ('Red', GETUTCDATE()),
    ('Blue', GETUTCDATE()),
    ('Yellow', GETUTCDATE()),
    ('Brown', GETUTCDATE()),
    ('Pink', GETUTCDATE());

SELECT * FROM #id_values;

DROP TABLE #color;
DROP TABLE #id_values;
In this script, we insert new rows into #color in a set-based fashion, and pull the newly inserted IDs, as
well as color_name, into a temp table. Once in the temp table, we can use those new values for whatever
additional operations are required, without the need to iterate through each INSERT operation one-at-a-
time.
Window functions are also very useful for minimizing the need to iterate. Using them, we can pull row
counts, sums, min/max values, and more without executing additional queries or iterating through data
windows manually:
SELECT
    SalesOrderHeader.SalesOrderID,
    SalesOrderDetail.SalesOrderDetailID,
    SalesOrderHeader.SalesPersonID,
    ROW_NUMBER() OVER (PARTITION BY SalesOrderHeader.SalesPersonID
        ORDER BY SalesOrderDetail.SalesOrderDetailID ASC) AS SalesPersonRowNum,
    SUM(SalesOrderHeader.SubTotal) OVER (PARTITION BY SalesOrderHeader.SalesPersonID
        ORDER BY SalesOrderDetail.SalesOrderDetailID ASC) AS SalesPersonSales
FROM Sales.SalesOrderHeader
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
WHERE SalesOrderHeader.SalesPersonID IS NOT NULL
AND SalesOrderHeader.Status = 5;
The results of this query show us not only a row per detail line, but include a running count of orders per
sales person and a running total of sales:
Window functions are not inherently efficient: The above query required some hefty sort operations to generate the results. Despite the cost, this is far more efficient than iterating through all sales people, orders, or some other iterative operation over a large data set:

In addition to avoiding iteration, we also avoid the need for aggregation within our query, allowing us to
freely select whatever columns we’d like without the typical constraints of GROUP BY/HAVING queries.
Iteration is not always a bad thing. Sometimes we need to query all databases on a server or all servers in
a list. Other times we need to call a stored procedure, send emails, or perform other operations that are
either inefficient or impossible to do in a set-based fashion. In these scenarios, make sure that
performance is adequate and that the number of times that a loop needs to be repeated is limited to
prevent unexpected long-running jobs.
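As a sketch of acceptable iteration (DBCC CHECKDB is just one example of a per-database maintenance task), the loop below is naturally bounded by the number of user databases on the server:
DECLARE @database_name SYSNAME;
DECLARE @sql_command NVARCHAR(MAX);

DECLARE database_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE database_id > 4; -- Skip the system databases.

OPEN database_cursor;
FETCH NEXT FROM database_cursor INTO @database_name;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql_command = N'DBCC CHECKDB(' + QUOTENAME(@database_name) + N') WITH NO_INFOMSGS;';
    EXEC sp_executesql @sql_command;
    FETCH NEXT FROM database_cursor INTO @database_name;
END;
CLOSE database_cursor;
DEALLOCATE database_cursor;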

Encapsulation
When writing application code, encapsulation is used as a way to reuse code and simplify complex
interfaces. By packaging code into functions, stored procedures, and views, we can very easily offload
important business logic or reusable code to a common place, where it can be called by any other code.
While this sounds like a very good thing, when overused it can very quickly introduce performance bottlenecks as chains of objects linked together by other encapsulated objects grow longer. For example: a
stored procedure that calls a stored procedure that uses a function that calls a view that calls a view that
calls a view. This may sound absurd but is a very common outcome when views and nested stored
procedures are relied on heavily.
How does this cause performance problems? Here are a few common ways:
 Unnecessary joins, filters, and subqueries are applied, but not needed.
 Columns are returned that are not needed for a given application.
 INNER JOINs, CROSS JOINs, or filters force reads against tables that are not needed for a given
operation.
 Query size (# of tables referenced in query) results in a poor execution plan.
 Logical mistakes are made due to obfuscated query logic not being fully understood.
Here is an example of an AdventureWorks query in which simple intentions have complex results:
SELECT
    BusinessEntityID,
    Title,
    FirstName,
    LastName
FROM HumanResources.vEmployee
WHERE FirstName LIKE 'E%';
At first glance, this query is pulling only 4 columns from the employee view. The results are what we
expect, but it runs a bit longer than we’d want (over 1 second). Checking the execution plan and IO stats
reveals:

What we discover is that there was quite a bit going on behind-the-scenes that we were not aware of.
Tables were accessed that we didn’t need, and excess reads performed as a result. This leads us to ask:
What is in vEmployee anyway!? Here is the definition of this view:
CREATE VIEW [HumanResources].[vEmployee]
AS
SELECT
    e.[BusinessEntityID]
    ,p.[Title]
    ,p.[FirstName]
    ,p.[MiddleName]
    ,p.[LastName]
    ,p.[Suffix]
    ,e.[JobTitle]
    ,pp.[PhoneNumber]
    ,pnt.[Name] AS [PhoneNumberType]
    ,ea.[EmailAddress]
    ,p.[EmailPromotion]
    ,a.[AddressLine1]
    ,a.[AddressLine2]
    ,a.[City]
    ,sp.[Name] AS [StateProvinceName]
    ,a.[PostalCode]
    ,cr.[Name] AS [CountryRegionName]
    ,p.[AdditionalContactInfo]
FROM [HumanResources].[Employee] e
INNER JOIN [Person].[Person] p
ON p.[BusinessEntityID] = e.[BusinessEntityID]
INNER JOIN [Person].[BusinessEntityAddress] bea
ON bea.[BusinessEntityID] = e.[BusinessEntityID]
INNER JOIN [Person].[Address] a
ON a.[AddressID] = bea.[AddressID]
INNER JOIN [Person].[StateProvince] sp
ON sp.[StateProvinceID] = a.[StateProvinceID]
INNER JOIN [Person].[CountryRegion] cr
ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
LEFT OUTER JOIN [Person].[PersonPhone] pp
ON pp.BusinessEntityID = p.[BusinessEntityID]
LEFT OUTER JOIN [Person].[PhoneNumberType] pnt
ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
LEFT OUTER JOIN [Person].[EmailAddress] ea
ON p.[BusinessEntityID] = ea.[BusinessEntityID];
This view does not only contain basic Employee data, but also joins many other tables that we have no need for in our query. While the performance we experienced might be acceptable under some
circumstances, it’s important to understand the contents of any objects we use to the extent that we can
use them effectively. If performance were a key issue here, we could rewrite our query as follows:
SELECT
    e.BusinessEntityID,
    p.Title,
    p.FirstName,
    p.LastName
FROM HumanResources.Employee e
INNER JOIN Person.Person p
ON p.BusinessEntityID = e.BusinessEntityID
WHERE FirstName LIKE 'E%';
This version only accesses the tables we need, thereby generating half the reads and a much simpler
execution plan:

It is important to note that encapsulation is in no way a bad thing, but in the world of data, there are
dangers to over-encapsulating business logic within the database. Here are some basic guidelines to
help in avoiding performance problems resulting from the nesting of database objects:
 When possible, avoid nesting views within views. This improves visibility into code and reduces
the chances of misunderstanding the contents of a given view.
 Avoid nesting functions if possible. This can be confusing and lead to challenging performance
problems.
 Avoid triggers that call stored procedures or that perform too much business logic. Nested
triggers are equally dangerous. Use caution when operations within triggers can fire more
triggers.
 Understand the functionality of any defined objects (functions, triggers, views, stored procedures)
prior to use. This will avoid misunderstandings of their purpose.
Storing important and frequently used TSQL in stored procedures, views, or functions can be a great way
to increase maintainability via code reuse. Exercise caution and ensure that the complexity of
encapsulated objects does not become too high. Performance can be inadvertently impacted when
objects are nested many layers deep. When troubleshooting a problem query, always research the
objects involved so that you have full exposure to any views, functions, stored procedures, or triggers
that may also be involved in its execution.

OLTP vs. OLAP


Data is typically accessed either for transactional needs or analytical (reporting) needs. A database can be
effectively optimized to handle either of these scenarios very well. The ways in which we performance
tune for each is very different and needs some consideration when designing database elements.
OLTP
Online transaction processing refers to workloads in which data is written to and read for common
interactive business usage. OLTP workloads are typically characterized by the following patterns:
 More writes, such as adding new data, updating rows, or deleting rows.
 More interactive operations in which people are logging into apps or web sites and directly
viewing or modifying data. This comprises common business tasks.
 Operations on smaller row counts, such as updating an order, adding a new contact, or viewing
recent transactions in a store. These operations often operate on current or recent data only.
 More tables and joins involved in queries.
 Timeliness is important. Since users are waiting for results, latency is not tolerated.
 High transaction counts, but typically small transaction size.
OLTP environments tend to be more relational, with indexes targeted at common updates, searches, or
operations that are the core of an application. OLTP processes generally ensure, and rely on, data integrity. This may necessitate the use of foreign keys, check constraints, default constraints, or triggers to assist in guaranteeing real-time data integrity.
OLAP
Online analytical processing generally refers to reporting or search environments. These are used for
crunching large volumes of data, such as in reporting, data mining, or analytics. Common features of
OLAP workloads are:
 Typical workloads are read-only, with writes only occurring during designated load/update times.
 Many operations are automated or scheduled to run and be delivered to users at a later time.
These processes are often used to gain insight into a business and to assist in decision making
processes.
 Operations can run on very large quantities of data. This can be crunching data year-over-year,
trending spending over the past quarter, or any other task that may require pulling a longer
history to complete.
 Tables tend to be wider and fewer, allowing for reports to be generated with fewer joins or lookups.
 Users may not be waiting for results, which can be delivered via email, file, or some other
asynchronous means. If they are, there may be an expectation of delay due to the size of data.
For reports where timeliness is important, the data can be crunched and staged ahead of time to
assist in speedy results when requested.
 Low transaction count, but typically large transaction sizes.
OLAP environments are usually flatter and less relational. Data is created in OLTP applications and then passed on to OLAP environments where analytics can take place. As a result, we can often assume that data integrity has already been established, and constraints, keys, and other similar checks can often be omitted.
If data is crunched or transformed, we can validate it afterwards, rather than real-time as with OLTP
workloads. Quite a bit of creativity can be exercised in OLAP data, depending on how current data needs
to be, how quickly results are requested, and the volume of history required to service requests.
Keeping them separated
Due to their vastly different needs, it behooves us to separate transactional and analytical systems as
much as possible. One of the most common reasons that applications become slow and we resort to
NOLOCK hints is when we try to run huge search queries or bulky reports against our transactional
production application. As transaction counts become higher and data volume increases, the clash
between transactional operations and analytical ones will increase. The common results are:
 Locking, blocking, and deadlocks when a large search/report runs and users are trying to update
data.
 Over-indexing of tables in an effort to service all kinds of different queries.
 The removal of foreign keys and constraints to speed up writes.
 Application latency.
 Use of query hints as workarounds to performance problems.
 Throwing hardware at the database server in an effort to improve performance.
The optimal solution is to recognize the difference between OLAP and OLTP workloads when designing
an application, and separate these environments on day 1. This often doesn’t happen due to time, cost,
or personnel constraints.
Regardless of the severity of the problem or how long it has persisted, separating operations based on
their type is the solution. Creating a new and separate environment to store OLAP data is the first step.
This may be developed using AlwaysOn, log shipping, ETL processes, storage-level data copying, or many
other solutions to make a copy of the OLTP data.
Once available, offloading operations can be a process that occurs over time. Easing into it allows for
more QA and caution as a business grows familiar with new tools. As more operations are moved to a separate data store, you’ll be able to remove reporting indexes from the OLTP data source and further optimize it for what it does best (servicing OLTP workloads). Similarly, the new OLAP data store can
be optimized for analytics, allowing you to flatten tables, remove constraints and OLTP indexes, and
make it faster for the operations that it services.
The more separated processes become, the easier it is to optimize each environment for its core uses.
This results not only in far better performance, but also ease of development of new features. Tables built
solely for reporting are far easier to write queries against than transactional tables. Similarly, being able
to update application code with the knowledge that large reports won’t be running against the database
removes many of the performance worries that typically are associated with a mixed environment.

Triggers
Triggers themselves are not bad, but overuse of them can certainly be a performance headache. Triggers are placed on tables and can fire instead of, or after, inserts, updates, and/or deletes.
They become performance problems when there are too many of them. When updating a table results in inserts, updates, or deletes against 10 other tables, tracking performance can become very challenging, as determining the specific code responsible can take time and lots of searching.
Triggers often are used to implement business/application logic, but this is not what a relational
database is built or optimized for. In general, applications should manage as much of this as possible.
When not possible, consider using stored procedures as opposed to triggers.
The danger of triggers is that they become a part of the calling transaction. A single write operation can
easily become many and result in waits on other processes until all triggers have fired successfully.
To summarize some best practices:
 Use triggers only when needed, and not as a convenience or time-saver.
 Avoid triggers that call more triggers. These can lead to crippling amounts of IO or complex
query paths that are frustrating to debug.
 Server trigger recursion should be turned off. This is the default. Allowing triggers to call
themselves, directly or indirectly, can lead to unstable situations or infinite loops.
 Keep triggers simple and have them execute a single purpose.
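As a minimal sketch of these guidelines (the tables here are hypothetical), a single-purpose AFTER trigger that only writes an audit row looks like this; the database option shown controls direct trigger recursion and is OFF by default:
CREATE TRIGGER dbo.trigger_customer_order_update
ON dbo.customer_order
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- Single purpose: record which orders changed and when.
    INSERT INTO dbo.customer_order_audit
        (customer_order_id, modified_utc)
    SELECT customer_order_id, GETUTCDATE()
    FROM inserted;
END;

-- Direct trigger recursion is a per-database setting, OFF by default:
ALTER DATABASE AdventureWorks SET RECURSIVE_TRIGGERS OFF;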

Conclusion
Troubleshooting performance can be challenging, time-consuming, and frustrating. One of the best ways
to avoid these troubles is to build a database intelligently up-front and avoid the need to have to fix
things later.
By gathering information about an application and how it is used, we can make smart architecture
decisions that will make our database more scalable and perform better over time. The result will be
better performance and less need to waste time on troubleshooting broken things.

Query optimization techniques in SQL Server: the basics
May 30, 2018 by Ed Pollack

Description
Fixing and preventing performance problems is critical to the success of any application. We will use a
variety of tools and best practices to provide a set of techniques that can be used to analyze and speed
up any performance problem!
This is one of my personal favorite areas of research and discussion as it is inherently satisfying. Taking a
performance nightmare and turning it into something fast and sleek feels great and will undoubtedly
make others happy.
I often view optimization as a detective mystery. Something terrible has happened and you need to
follow clues to locate and apprehend the culprit! This series of articles is all about these clues, how to
identify them, and how to use them in order to find the root cause of a performance problem.
 For more information about Query optimization, see the SQL Query Optimization — How to
Determine When and If It’s Needed article

Defining Optimization
What is “optimal”? The answer to this will also determine when we are done with a problem and can
move onto the next one. Often, a query can be sped up through many different means, each of which
has an associated time and resource cost.
We usually cannot spend the resources needed to make a script run as fast as possible, nor should we
want to. For the sake of simplicity, we will define “optimal” as the point at which a query performs
acceptably and will continue to do so for a reasonable amount of time in the future. This is as much a business definition as it is a technical definition. With infinite money, time, and computing resources,
anything is possible, but we do not have the luxury of unlimited resources, and therefore must define
what “done” is whenever we chase any performance problem.
This provides us with several useful checkpoints that will force us to re-evaluate our progress as we
optimize:
1. The query now performs adequately.
2. The resources needed to optimize further are very expensive.
3. We have reached a point of diminishing returns for any further optimization.
4. A completely different solution is discovered that renders this unneeded.
Over-optimization sounds good, but in the context of resource management is generally wasteful. A
giant (but unnecessary) covering index will cost us computing resources whenever we write to a table for
the rest of eternity (a long time). A project to rewrite code that was already acceptable might cost days or
weeks of development and QA time. Trying to further tweak an already good query may net a gain of
3%, but take a week of sweating to get there.
Our goal is to solve a problem and not over-solve it.

What Does the Query Do?


Question #1 that we must always answer is: What is the purpose of a query?
 What is its purpose?
 What should the result set look like?
 What sort of code, report, or UI is generating the query?
It is first-nature for us to want to dive in with a sword in hand and slay the dragon as quickly as humanly possible. We have a trace running, execution plans in hand, and a pile of IO and timing statistics collected before realizing that we have no idea what we are doing.


Step #1 is to step back and understand the query. Some helpful questions that can aid in optimization:
 How large is the result set? Should we brace ourselves for a million rows returned, or just a few?
 Are there any parameters that have limited values? Will a given parameter always have the same value, or are there other limitations on values that can simplify our work by eliminating avenues of research?
 How often is the query executed? Something that occurs once a day will be treated very
differently than one that is run every second.
 Are there any invalid or unusual input values that are indicative of an application
problem? Is one input set to NULL, but never should be NULL? Are any other inputs set to values
that make no sense, are contradictory, or otherwise go against the use-case of the query?
 Are there any obvious logical, syntactical, or optimization problems staring us in the
face? Do we see any immediate performance bombs that will always perform poorly, regardless
of parameter values or other variables? More on these later when we discuss optimization
techniques.
 What is acceptable query performance? How fast must the query be for its consumers to be
happy? If server performance is poor, how much do we need to decrease resource consumption
for it to be acceptable? Lastly, what is the current performance of the query? This will provide us
with a baseline so we know how much improvement is needed.
By stopping and asking these questions prior to optimizing a query, we avoid the uncomfortable
situation in which we spend hours collecting data about a query only to not fully understand how to use
it. In many ways, query optimization and database design force us to ask many of the same questions.
The results of this additional foresight will often lead us to more innovative solutions. Maybe a new index
isn’t needed and we can break a big query into a few smaller ones. Maybe one parameter value is
incorrect and there is a problem in code or the UI that needs to be resolved. Maybe a report is run once
a week, so we can pre-cache the data set and send the results to an email, dashboard, or file, rather than force a user to wait 10 minutes for it interactively.

Tools
To keep things simple, we’ll use only a handful of tools in this article:
Execution Plans
An execution plan provides a graphical representation of how the query optimizer chose to execute a
query:
The execution plan shows us which tables were accessed, how they were accessed, how they were joined
together, and any other operations that occurred along the way. Included are query costs, which are
estimates of the overall expense of any query component. A treasure trove of data is also included, such
as row size, CPU cost, I/O cost, and details on which indexes were utilized.
In general, what we are looking for are scenarios in which large numbers of rows are being processed by
any given operation within the execution plan. Once we have found a high cost component, we can
zoom in on what the cause is and how to resolve it.
STATISTICS IO
This allows us to see how many logical and physical reads are made when a query is executed and may
be turned on interactively in SQL Server Management Studio by running the following TSQL:
SET STATISTICS IO ON;
Once on, we will see additional data included in the Messages pane:

Logical reads tell us how many reads were made from the buffer cache. This is the number that we will
refer to whenever we talk about how many reads a query is responsible for, or how much IO it is causing.
Physical reads tell us how much data was read from a storage device as it was not yet present in memory.
This can be a useful indication of buffer cache/memory capacity problems if data is very frequently being
read from storage devices, rather than memory.
In general, IO will be the primary cause of latency and bottlenecks when analyzing slow queries. The unit of measurement of STATISTICS IO is the read: 1 read = a single 8kb page = 8,192 bytes.
Query Duration
Typically, the #1 reason we will research a slow query is because someone has complained and told us
that it is too slow. The time it takes a query to execute is going to often be the smoking gun that leads us
to a performance problem in need of a solution.
For our work here, we will measure duration manually using the timer found in the lower-right hand
corner of SSMS:

There are other ways to accurately measure query duration, such as turning on STATISTICS TIME, but we’ll focus on queries that are slow enough that such a level of accuracy will not be necessary. We can easily
observe when a 30 second query is improved to run in sub-second time. This also reinforces the role of
the user as a constant source of feedback as we try to improve the speed of an application.
Our Eyes
Many performance problems are the result of common query patterns that we will become familiar with
below. This pattern recognition allows us to short-circuit a great deal of research when we see something
that is clearly poorly written.
As we optimize more and more queries, quickly identifying these indicators becomes more second-
nature and we’ll get the pleasure of being able to fix a problem quickly, without the need for very time-
consuming research.
In addition to common query mistakes, we will also look out for any business logic hints that may tell us
if there is an application problem, parameter issue, or some other flaw in how the query was generated
that may require involvement from others aside from us.

What Does the Query Optimizer Do?


Every query follows the same basic process from TSQL to completing execution on a SQL Server:
Parsing is the process by which query syntax is checked. Are keywords valid, and are the rules of the TSQL language being followed correctly? If you made a spelling error, named a column using a reserved
word, or forgot a semicolon before a common table expression, this is where you’ll get error messages
informing you of those problems.
Binding checks all objects referenced in your TSQL against the system catalogs and any temporary objects
defined within your code to determine if they are both valid and referenced correctly. Information about
these objects is retrieved, such as data types, constraints, and if a column allows NULL or not. The result
of this step is a query tree that is composed of a basic list of the processes needed to execute the query.
This provides basic instructions, but does not yet include specifics, such as which indexes or joins to use.
Optimization is the process that we will reference most often here. The optimizer operates similarly to a
chess (or any gaming) computer. It needs to consider an immense number of possible moves as quickly
as possible, remove the poor choices, and finish with the best possible move. At any point in time, there
may be millions of combinations of moves available for the computer to consider, of which only a
handful will be the best possible moves. Anyone that has played chess against a computer knows that
the less time the computer has, the more likely it is to make an error.
In the world of SQL Server, we will talk about execution plans instead of chess moves. The execution plan
is the set of specific steps that the execution engine will follow to process a query. Every query has many
choices to make to arrive at that execution plan and must do so in a very short span of time.
These choices include questions such as:
 What order should tables be joined?
 What joins should be applied to tables?
 Which indexes should be used?
 Should a seek or scan be used against a given table?
 Is there a benefit in caching data in a worktable or spooling data for future use?
Any execution plan that is considered by the optimizer must return the same results, but the
performance of each plan may differ due to those questions above (and many more!).
Query optimization is a CPU-intensive operation. The process to sift through plans requires significant
computing resources and to find the best plan may require more time than is available. As a result, a
balance must be maintained between the resources needed to optimize the query, the resources
required to execute the query, and the time we must wait for the entire process to complete. Therefore,
the optimizer is not built to select the best execution plan, but instead to search and find the best
possible plan after a set amount of time passes. It may not be the perfect execution plan, but we accept
that as a limitation of how a process with so many possibilities must operate.
The metric used to judge execution plans and decide which to consider or not is query cost. The cost has
no unit and is a relative measure of the resources required to execute each step of an execution plan. The
overall query cost is the sum of the costs of each step within a query. You can view these costs in any
execution plan:
Subtree costs for each component of a query are calculated and used to either:
1. Remove a high-cost execution plan and any similar ones from the pool of available plans.
2. Rank the remaining plans based on how low their cost is.
While query cost is a useful metric to understand how SQL Server has optimized a particular query, it is
important to remember that its primary purpose is to aid the query optimizer in choosing good
execution plans. It is not a direct measure of IO, CPU, memory, duration, or any other metric that matters
to an application user waiting for query execution to complete. A low query cost may not indicate a fast
query or the best plan. Alternatively, a high query cost may sometimes be acceptable. As a result, it’s best
to not rely heavily on query cost as a metric of performance.
As the query optimizer churns through candidate execution plans, it will rank them from lowest cost to
highest cost. Eventually, the optimizer will reach one of the following conclusions:
 Every execution plan has been evaluated and the best one chosen.
 There isn’t enough time to evaluate every plan, and the best one thus far is chosen.
Once an execution plan is chosen, the query optimizer’s job is complete and we can move to the final
step of query processing.
Execution is the final step. SQL Server takes the execution plan that was identified in the optimization
step and follows those instructions in order to execute the query.
A note on plan reuse: Because optimizing is an inherently expensive process, SQL Server maintains an
execution plan cache that stores details about each query executed on a server and the plan that was
chosen for it. Typically, databases experience the same queries executed over and over again, such as a
web search, order placement, or social media post. Reuse allows us to avoid the expensive optimization
process and rely on the work we have previously done to optimize a query.
When a query is executed that already has a valid plan in cache, that plan will be chosen, rather than
going through the process of building a new one. This saves computing resources and speeds up query
execution immensely. We’ll discuss plan reuse more in a future article when we tackle parameter sniffing.

Common Themes in Query Optimization


With the introduction out of the way, let’s dive into optimization! The following is a list of the most
common metrics that will assist in optimization. Once the basics are out of the way, we can use these
basic processes to identify tricks, tips, and patterns in query structure that can be indicative of poor
performance.

Index Scans
Data may be accessed from an index via either a scan or a seek. A seek is a targeted selection of rows
from the table based on a (typically) narrow filter. A scan is when an entire index is searched to return the
requested data. If a table contains a million rows, then a scan will need to traverse all million rows to
service the query. A seek of the same table can traverse the index’s binary tree quickly to return only the
data needed, without the need to inspect the entire table.
If there is a legitimate need to return a great deal of data from a table, then an index scan may be the
correct operation. If we needed to return 950,000 rows from a million row table, then an index scan
makes sense. If we only need to return 10 rows, then a seek would be far more efficient.
Index scans are easy to spot in execution plans:
SELECT *
FROM Sales.OrderTracking
INNER JOIN Sales.SalesOrderHeader
ON SalesOrderHeader.SalesOrderID = OrderTracking.SalesOrderID
INNER JOIN Sales.SalesOrderDetail
ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
WHERE OrderTracking.EventDateTime = '2014-05-29 00:00:00';

We can quickly spot the index scan in the top-right corner of the execution plan. Consuming 90% of the
resources of the query, and being labeled as a clustered index scan quickly lets us know what is going on
here. STATISTICS IO also shows us a large number of reads against the OrderTracking table:

Many solutions are available when we have identified an undesired index scan. Here is a quick list of
some thoughts to consider when resolving an index scan problem:
 Is there any index that can handle the filter in the query?
o In this example, is there an index on EventDateTime?
 If no index is available, should we create one to improve performance on the query? (A sketch of such an index follows this list.)
o Is this query executed often enough to warrant this change? Indexes improve read speeds on
queries, but will reduce write speeds, so we should add them with caution.
 Is this a valid filter? Is this column one that no one should ever filter on?
o Should we discuss this with those responsible for the app to determine a better way to search for
this data?
 Is there some other query pattern that is causing the index scan that we can resolve? We will
attempt to more thoroughly answer this question below. If there is an index on the filter column (EventDateTime in this example), then there may be some other shenanigans here that require our attention!
 Is the query one for which there is no way to avoid a scan?
o Some query filters are all-inclusive and need to search the entire table. In our demo above, if EventDateTime happens to equal “5-29-2014” in every row in Sales.OrderTracking, then a scan is expected. Similarly, if we were performing a fuzzy string search, an index scan would be difficult to avoid without implementing a Full-Text Index, or some similar feature.
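As referenced above, if the EventDateTime filter is valid and frequent enough to justify the write overhead, a supporting index would allow the scan to become a seek. This is only a sketch: the index name is arbitrary, and no such index ships with the sample database:
CREATE NONCLUSTERED INDEX IX_OrderTracking_EventDateTime
ON Sales.OrderTracking (EventDateTime);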
As we walk through more examples, we’ll find a wide variety of other ways to identify and resolve
undesired index scans.

Functions Wrapped Around Joins and WHERE Clauses


A theme in optimization is a constant focus on joins and the WHERE clause. Since IO is generally our
biggest cost, and these are the query components that can limit IO the most, we’ll often find our worst
offenders here. The faster we can slice down our data set to only the rows we need, the more efficient
query execution will be!
When evaluating a WHERE clause, any expressions involved need to be resolved prior to returning our
data. If a column contains functions around it, such as DATEPART, SUBSTRING, or CONVERT, then these
functions will also need to be resolved. If the function must be evaluated prior to execution to determine
a result set, then the entirety of the data set will need to be scanned to complete that evaluation.
Consider the following query:
SELECT
    Person.BusinessEntityID,
    Person.FirstName,
    Person.LastName,
    Person.MiddleName
FROM Person.Person
WHERE LEFT(Person.LastName, 3) = 'For';
This will return any rows from Person.Person that have a last name beginning in “For”. Here is how the
query performs:

Despite only returning 4 rows, the entire index was scanned to return our data. The reason for this
behavior is the use of LEFT on Person.LastName. While our query is logically correct and will return the
data we want, SQL Server will need to evaluate LEFT against every row in the table before being able to
determine which rows fit the filter. This forces an index scan, but luckily one that can be avoided!
When faced with functions in the WHERE clause or in a join, consider ways to move the function onto the
scalar variable instead. Also think of ways to rewrite the query in such a way that the table columns can
be left clean (that is: no functions attached to them!)
The query above can be rewritten to do just this:
SELECT
    Person.BusinessEntityID,
    Person.FirstName,
    Person.LastName,
    Person.MiddleName
FROM Person.Person
WHERE Person.LastName LIKE 'For%';
By using LIKE and shifting the wildcard logic into the string literal, we have cleaned up
the LastName column, which will allow SQL Server full access to seek indexes against it. Here is the
performance we see on the rewritten version:
The relatively minor query tweak we made allowed the query optimizer to utilize an index seek and pull
the data we wanted with only 2 logical reads, instead of 117.
The theme of this optimization technique is to ensure that columns are left clean! When writing queries,
feel free to put complex string/date/numeric logic onto scalar variables or parameters, but not on
columns. If you are troubleshooting a poorly performing query and notice functions (system or user-
defined) wrapped around column names, then begin thinking of ways to push those functions off into
other scalar parts of the query. This will allow SQL Server to seek indexes, rather than scan, and therefore
make the most efficient decisions possible when executing the query!
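The same principle applies to date logic. As a sketch against AdventureWorks, wrapping YEAR and MONTH around a column forces every row to be evaluated, while an equivalent range predicate leaves the column clean and seekable:
-- Functions on the column: evaluated against every row.
SELECT SalesOrderID
FROM Sales.SalesOrderHeader
WHERE YEAR(OrderDate) = 2014
AND MONTH(OrderDate) = 5;

-- Column left clean: an index on OrderDate can be used for a seek.
SELECT SalesOrderID
FROM Sales.SalesOrderHeader
WHERE OrderDate >= '2014-05-01'
AND OrderDate < '2014-06-01';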

Implicit Conversions
Earlier, we demonstrated how wrapping functions around columns can result in unintended table scans,
reducing query performance and increasing latency. Implicit conversions behave the exact same way but
are far more hidden from plain sight.
When SQL Server compares any values, it needs to reconcile data types. All data types are assigned a
precedence in SQL Server and whichever is of the lower precedence will be automatically converted to the data type of higher precedence. For more info on data type precedence, see the link at the end of this article containing the complete list.
Some conversions can occur seamlessly, without any performance impact. For example, a VARCHAR(50) and a VARCHAR(MAX) can be compared without a problem, as can a TINYINT and BIGINT, DATE and DATETIME, or TIME and a VARCHAR representation of a TIME type. Not all data types can be compared automatically, though.
Consider the following SELECT query, which is filtered against an indexed column:
SELECT
    EMP.BusinessEntityID,
    EMP.LoginID,
    EMP.JobTitle
FROM HumanResources.Employee EMP
WHERE EMP.NationalIDNumber = 658797903;
A quick glance and we assume that this query will result in an index seek and return data to us quite
efficiently. Here is the resulting performance:
Despite only looking for a single row against an indexed column, we got a table scan for our efforts.
What happened? We get a hint from the execution plan in the yellow exclamation mark over the SELECT
operation:

Hovering over the operator reveals a CONVERT_IMPLICIT warning. Whenever we see this, it is an
indication that we are comparing two data types that are different enough from each other that they
cannot be automatically converted. Instead, SQL Server converts every single value in the table prior to
applying the filter.

When we hover over the NationalIDNumber column in SSMS, we can confirm that it is in fact an NVARCHAR(15). The value we are comparing it to is a numeric. The solution to this problem is very
similar to when we had a function on a column: Move the conversion over to the scalar value, instead of
the column. In this case, we would change the scalar value 658797903 to the string representation,
‘658797903’:
SELECT
    EMP.BusinessEntityID,
    EMP.LoginID,
    EMP.JobTitle
FROM HumanResources.Employee EMP
WHERE EMP.NationalIDNumber = '658797903';
This simple change will completely alter how the query optimizer handles the query:
The result is an index seek instead of a scan, less IO, and the implicit conversion warning is gone from
our execution plan.
Implicit conversions are easy to spot as you’ll get a prominent warning from SQL Server in the execution
plan whenever it happens. Once you’ve been tipped off to this problem, you can check the data types of
the columns indicated in the warning and resolve the issue.

Conclusion
Query optimization is a huge topic that can easily become overwhelming without a good dose of focus.
The best way to approach a performance problem is to find specific areas of focus that are most likely
the cause of latency. A stored procedure could be 10,000 lines long, but only a single line needs to be
addressed to resolve the problem. In these scenarios, finding the suspicious, high-cost, high resource-consuming parts of a script can quickly narrow down the search and allow us to solve a problem rather than hunt for it.

Methods to avoid the SQL divide by zero error


October 21, 2019 by Rajendra Gupta

This article explores the SQL divide by zero error and various methods for eliminating this.

Introduction
We all know that in math, it is not possible to divide a number by zero. It leads to infinity:

Source: www.1dividedby0.com
If you try to do this in a calculator, you get the error message – Cannot Divide by zero:

We perform data calculations in SQL Server for various purposes. Suppose we use the arithmetic division operator for calculating a ratio of products in a store. Usually, the division works fine, and we get the ratio:
DECLARE @Product1 INT;
DECLARE @Product2 INT;
SET @Product1 = 50;
SET @Product2 = 10;
SELECT @Product1 / @Product2 ProductRatio;
Someday, the product2 quantity goes out of stock and that means we do not have any quantity for
product2. Let’s see how the SQL Server query behaves in this case:
DECLARE @Product1 INT;
DECLARE @Product2 INT;
SET @Product1 = 50;
SET @Product2 = 0;
SELECT @Product1 / @Product2 ProductRatio;
We get SQL divide by zero error messages (message id 8134, level 16):

We do not want our code to fail due to these errors. It is a best practice to write code in such a way that it does not produce a divide by zero message. It should have a mechanism to deal proactively with such conditions.
SQL Server provides multiple methods for avoiding this error message. Let’s explore it in the next section.

Method 1: SQL NULLIF Function


We use the NULLIF function to avoid the divide by zero error message.
The syntax of the NULLIF function:
NULLIF(expression1, expression2)
It accepts two arguments.
 If both the arguments are equal, it returns a null value
For example, let’s say both arguments value is 10:
SELECT NULLIF(10, 10) result;
In the screenshot, we can see that the output is null:

 If both the arguments are not equal, it returns the value of the first argument
In this example, both argument values differ. It returns the value of the first argument, 10:
SELECT NULLIF(10, 5) result;

Let’s modify our initial query using the SQL NULLIF statement. We place the following logic using NULLIF
function for eliminating SQL divide by zero error:
 Use NULLIF function in the denominator with second argument value zero
 If the value of the first argument is also zero, this function returns a null value. In SQL Server, if we divide a number by null, the output is null as well
 If the value of the first argument is not zero, it returns the first argument value and the division takes place as with standard values
DECLARE @Product1 INT;
DECLARE @Product2 INT;
SET @Product1 = 50;
SET @Product2 = 0;
SELECT @Product1 / NULLIF(@Product2,0) ProductRatio;
Execute this modified query. We can see the output NULL because the denominator contains the value zero.

Do we want null value in the output? Is there any method to avoid null and display a definite value?
Yes, we can use the SQL ISNULL function to avoid null values in the output. This function replaces a null value in expression1 and returns the expression2 value as output.
Let’s explore the following query with a combination of SQL NULLIF and SQL ISNULL function:
 The first argument ((@Product1 / NULLIF(@Product2,0)) returns null
 We use the ISNULL function and specify the second argument value zero. As the first argument is null, the output of the overall query is zero (the second argument value)
DECLARE @Product1 INT;
DECLARE @Product2 INT;
SET @Product1 = 50;
SET @Product2 = 0;
SELECT ISNULL(@Product1 / NULLIF(@Product2,0),0) ProductRatio;

Method 2: Using a CASE statement to avoid divide by zero error
We can use a CASE statement in SQL to return values based on specific conditions. Look at the following query (sketched below), in which the CASE statement checks the value of the @Product2 parameter:
 If the @Product2 value is zero, it returns null
 If the above condition is not satisfied, it does the arithmetic operation (@Product1/@Product2)
and returns the output
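As a minimal sketch of that logic (reusing the variables from the earlier examples):
DECLARE @Product1 INT;
DECLARE @Product2 INT;
SET @Product1 = 50;
SET @Product2 = 0;
SELECT CASE
           WHEN @Product2 = 0 THEN NULL
           ELSE @Product1 / @Product2
       END ProductRatio;
-- Returns NULL instead of raising the divide by zero error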

Method 3: SET ARITHABORT OFF


We can use SET options to control query behavior. By default, SET ARITHABORT is ON in SQL Server, and
with this default behavior we get the SQL divide by zero error in the output.
The T-SQL syntax for controlling the ARITHABORT option is shown below:
1 SET ARITHABORT { ON | OFF }
 Using ARITHABORT ON, the query will terminate with divide by zero message. It is the default
behavior. For this demo, let’s enable it using the SET ARITHABORT ON statement:
1 SET ARITHABORT ON   -- Default
2         SET ANSI_WARNINGS ON
3         DECLARE @Product1 INT;
4         DECLARE @Product2 INT;
5         SET @Product1 = 50;
6         SET @Product2 = 0;
7         SELECT @Product1 / @Product2 ProductRatio;
 We get the SQL divide by zero error messages:


 Using ARITHABORT OFF, the query does not terminate; instead, it returns a null value. We need to use
ARITHABORT OFF in combination with SET ANSI_WARNINGS OFF to avoid the error message:
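A sketch of that combination, again reusing the variables from the earlier examples:
SET ARITHABORT OFF
SET ANSI_WARNINGS OFF
DECLARE @Product1 INT;
DECLARE @Product2 INT;
SET @Product1 = 50;
SET @Product2 = 0;
SELECT @Product1 / @Product2 ProductRatio;
-- Returns NULL instead of raising error 8134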
We can use the following query to check the current setting for the ARITHABORT parameter:
1 DECLARE @ARITHABORT VARCHAR(3) = 'OFF';  
2     IF ( (64 & @@OPTIONS) = 64 ) SET @ARITHABORT = 'ON';  
3     SELECT @ARITHABORT AS ARITHABORT;
The default ARITHABORT setting for SSMS is ON. We can view it using SSMS Tools properties.
Navigate to Tools -> Options -> Advanced:

Many client applications or drivers set ARITHABORT to OFF by default. The differing values might
force SQL Server to produce a different execution plan, and that might create performance issues.
You should also match the client application's setting while troubleshooting performance issues.
Note: You should not modify the value of ARITHABORT unless required, as it might create
performance issues as well. I would suggest using the alternative methods (as described earlier) for
avoiding the SQL divide by zero error.

Working with the SQL Server command line (sqlcmd)


Getting Started
1. Working with sqlcmd interactive mode
In interactive mode, you can write the input and interact using the command line.
a. How to connect to SQL Server using sqlcmd
To connect to your local machine, specify the SQL Instance name and the credentials:
sqlcmd -S DESKTOP-5K4TURF\SQLEXPRESS -E
The -S value is to specify the SQL Server name of the instance and -E is to specify a trusted
connection. If you do not specify the SQL Server name, it will try to connect to the local machine.
When you connect, you will see the number 1>:
The number 1> means that it is connected and ready to receive sentences to execute.
If you enabled SQL Server Authentication, you will need to specify a user name and a user
password (I am assuming that the user is already created). Note that you will need to EXIT
sqlcmd to log in with these credentials.
sqlcmd -S DESKTOP-5K4TURF\SQLEXPRESS -U jsmith
The command line will ask you for the password. You can optionally specify the password inline (not
recommended, but sometimes it is the only way to work):
sqlcmd -S DESKTOP-5K4TURF\SQLEXPRESS -U jsmith -P
Mypwd$%1234565
b. How to check the current database in sqlcmd
When a SQL Server Login is created, you can define the default database you want to log in to. If it
is not specified, the master database is the default.
1 select DB_NAME()
2 GO

c. How to list the databases in sqlcmd


The following sentences will list the databases in the SQL Instance:
1 select name from sys.databases
2 go
In the sys.databases table, you have all the database information:

You can also use the sp_databases stored procedure:


1 Sp_databases
2 Go
d. How to check if the SQL Server is case sensitive in sqlcmd
The following T-SQL Sentences are used to detect the collation information including if the
machine is case sensitive or not:
1 SELECT SERVERPROPERTY('COLLATION')
2 GO
The information displayed will be as follows:

Modern_spanish is the collation, CI means case insensitive and CS is case sensitive. AS means
Accent Sensitive and AI is Accent Insensitive.
You can also check the information, with the sp_helpsort procedure:
1 sp_helpsort
2 go
The information displayed is the following:
Modern-Spanish, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
e. How to check the SQL Server edition in SQL
You can check the SQL Server Edition, using the following T-SQL sentences:
1 SELECT SERVERPROPERTY('EDITION')
2 GO
The result is the following:

f. How to check the SQL Server Authentication in sqlcmd


Before Azure, there were two options to Authenticate to SQL Server:
i. Windows Authentication where you can use an Active directory account or a local Windows
account.
ii. Windows Authentication and SQL Authentication where you can also authenticate using an
account created in SQL Server.
To detect the authentication, you can use the following sentences:
1 SELECT SERVERPROPERTY('IsIntegratedSecurityOnly')
2 GO
The result displayed is the following:

If the result is 0, it means that both authentications are enabled. If it is 1, only Windows
Authentication is enabled.
g. How to list the variables set
In order to list all the variables set, run the following command in sqlcmd:
:ListVar
It will show all the variables set:
2. Running sqlcmd in command mode
You can also run sqlcmd in command mode, executing commands and scripts non-interactively.
a. How to run a T-SQL script and receive the output in a file in sqlcmd
In the next example, we will show how to run a script using sqlcmd and show the results in
another file.
We will first create a script file named columns.sql with the following sentences:
select * from adventureworks2014.information_schema.columns
In the cmd, run the following command to invoke sqlcmd:
sqlcmd -S DESKTOP-5K4TURF\SQLEXPRESS -E -i c:\sql\columns.sql -o c:\sql\exit.txt
-i is used to specify the input. You specify the script file with the queries.
-o is used to show the results of the input in a file.
The exit.txt file will be created:

If we open the file, we will see the output results:


b. How to back up in sqlcmd
We will first create a script to back up the database named backup.sql:
1 BACKUP DATABASE [AdventureWorks2014] TO DISK = N'C:\SQL\backup.bak'
2 GO
In the cmd run the following command:
sqlcmd -S DESKTOP-5K4TURF\SQLEXPRESS -E -i c:\sql\backup.sql -o
c:\sql\output.txt
The output will be similar to this one:

The commands will create a backup in a file named backup.bak in the c:\sql folder:

c. How to work with variables in sqlcmd


You can work with variables in sqlcmd. The following example will set the variable
DATABASENAME with the value adventureworks2014 and then we change the context to the
database specified:
1 :SETVAR DATABASENAME "adventureworks2014"
2 USE $(DATABASENAME);
3 GO
The result displayed is the following:
As you can see, SETVAR is used to specify the value of the variable. Then you need to use $() for
the variable.
Another example is to set the variable CONTACTTYPEID to 3 and use it in the where clause to
find a contact type ID according to the value of the variable:
1 :SETVAR CONTACTTYPEID 3
2 SELECT [ContactTypeID]
3       ,[Name]
4       ,[ModifiedDate]
5   FROM [Person].[ContactType]
6 where contacttypeid=$(CONTACTTYPEID)
7 GO
The result displayed is the following:

d. How to list the table names of a database in sqlcmd


You can list the tables of the database using the information_schema.tables view. We will first
create a script named tables.sql. This script lists the tables and views:
1 --Script name: tables.sql
2 select table_name from adventureworks2014.information_schema.tables
3 GO
Next, we will invoke sqlcmd to execute the script.
sqlcmd -E -i c:\sql\tables.sql -o c:\sql\output.txt -S DESKTOP-
5K4TURF\SQLEXPRESS
The result displayed are the following in the output.txt file:

e. How to list the column names of a database in sqlcmd


The following sentences will list the table names and the column names of a database in a script
named columns.sql:
1 --Filename columns.sql
2 select table_name, column_name from adventureworks2014.information_schema.columns
3 GO
In the cmd run this command:
sqlcmd -E -i c:\sql\columns.sql -o c:\sql\output.txt -S DESKTOP-
5K4TURF\SQLEXPRESS
The result of the output.txt is the following:

f. How to check all the commands


You can check all the sqlcmd commands using this command:
Sqlcmd -?
This command will list all the commands available:

g. How to exit if the command fails


The following command will exit if it fails, using the -b parameter:
sqlcmd -E -q "create table adventureworks" -b -S DESKTOP-5K4TURF\SQLEXPRESS
The command will exit if there is an error:
h. How to display error messages according to the error level
If there is an error, the error is displayed. However, you can suppress messages below a given
severity level using the -m option.
Here it is an example about this:
The following command shows an error message:

However, if you add -m 16, the error will no longer be displayed, because the error has a severity
level of 15:
sqlcmd -E -q "create table adventureworks" -m 16 -S DESKTOP-5K4TURF\SQLEXPRESS
-m 16 will show only errors with a severity level of 16 or higher. As you can see, the error message is
no longer displayed.
i. How to accept user input
The following example will run a SQL script with one variable. The example will create a database
specified by the user.
We will first create a script named createdb.sql with the following content:
1 --file createdb.sql
2 CREATE DATABASE $(DATABASENAME);
3 GO
Next, in the cmd we will run the script, specifying the database name:
sqlcmd -E -v DATABASENAME="Userinput" -i c:\sql\createdb.sql
The command will create a database named Userinput.
In sqlcmd you can run the sp_databases stored procedure:
1 Sp_databases
2 GO
And you will be able to see the database created:

3. Working in SSMS in sqlcmd mode


a. How to run sqlcmd in SSMS
In SSMS, open a query window and select Query > SQLCMD Mode:
The following example will create a database named sales in SSMS.
1 :SETVAR DATABASENAME "sales"
2 create database $(DATABASENAME);
3 GO
If everything is OK, a database named sales will be created:

b. How to set the sqlcmd mode by default in SSMS


To do this, go to Tools > Options in SSMS and check By default, open new queries in SQLCMD
mode.
4. Working with PowerShell
a. How to invoke sqlcmd using PowerShell
PowerShell can be used to invoke sqlcmd. To open PowerShell for SQL Server, go to the Windows
Search and write sqlps:
In sqlps, write these cmdlets to run the sp_who stored procedure:
invoke-sqlcmd -query "sp_who"

Note that if you have SSMS 17 or later, SQL PowerShell is installed separately. For more
information about installing SQL PowerShell, refer to our link:
 What is new in SSMS 17; PowerShell and DAX
b. How to run scripts in SQL PowerShell (check table fragmentation)
It is possible to run SQL Server scripts with PowerShell. The following example will show the
fragmentation of the table Person.Address in the Adventureworks database.
We will first create a script named fragmentation.sql:
1 DECLARE @db_id SMALLINT=DB_ID('AdventureWorks');
2 DECLARE @object_id INT=OBJECT_ID(N'AdventureWorks.Person.Address');
3 SELECT * FROM sys.dm_db_index_physical_stats(@db_id,
4 @object_id, NULL, NULL , 'LIMITED');
5 GO
In PowerShell for SQL Server, run this script:
Invoke-Sqlcmd -InputFile "c:\sql\fragmentation.sql" | Out-File -FilePath "C:\sql\outps.txt"
The output of the outps.txt file will be the following:

c. How to use verbose output


Verbose is used to display information that is not displayed by default. For example, the output of the
PRINT command is not displayed by default. Let's take a look at an example.
In sqlps, run this cmdlet:
Invoke-Sqlcmd -Query "PRINT 'HELLO SQLSHACK'"
The cmdlet will not return any value. However, if you run with the parameter verbose, the output
can be displayed:
Invoke-Sqlcmd -Query "PRINT 'HELLO SQLSHACK'" -Verbose

5. DAC
a. How to work with a Dedicated Administrator Connection (DAC) in sqlcmd
If you cannot connect to SQL Server in SSMS or other tools, it is possible to try a DAC connection. This
connection allows you to diagnose and verify problems with the database server. When SQL Server is in
a bad state and it is not possible to connect to it normally, a DAC connection usually works.
The following example shows how to connect to a SQL Server database:
sqlcmd -S DESKTOP-5K4TURF -E -A -d master
-A is used to specify a DAC connection and -d is used to specify the database to connect.
A DAC connection to a named instance requires the SQL Browser service to be started and enabled.
If the service is disabled, you can enable it with the following command:
sc config sqlbrowser start= demand
If it is enabled, the message will be the following:
To start the service, you can use the following commands:
net start sqlbrowser

6. When to use sqlcmd mode, interactive mode, DAC, SSMS, PowerShell


Use interactive mode when you need to run multiple queries and administrative tasks. The
sqlcmd command-line mode is used when you have a specific task in mind, like a backup. DAC is
used for disaster recovery (for example, when the master database is damaged and you cannot
access SQL Server using SSMS or other conventional tools). SSMS in sqlcmd mode can be used to
create scripts; it is great for debugging and programming large scripts to be used later in
command-line mode.
Use PowerShell if you have other PowerShell scripts and you need to integrate some sqlcmd
invocations into them.

Conclusion
Sqlcmd is a very powerful feature that can help us to automate tasks in SQL Server. You can run
scripts and save the results of your queries in a text file.
How to implement error handling in SQL Server


June 15, 2018 by Bojan Petrovic

Error handling overview


Error handling in SQL Server gives us control over the Transact-SQL code. For example, when things go
wrong, we get a chance to do something about it and possibly make it right again. SQL Server error
handling can be as simple as just logging that something happened, or it could be us trying to fix an
error. It can even be translating the error into plain language, because we all know how technical SQL Server
error messages can get, often making no sense and being hard to understand. Luckily, we have a chance to
translate those messages into something more meaningful to pass on to the users, developers, etc.
In this article, we’ll take a closer look at the TRY… CATCH statement: the syntax, how it looks, how it
works and what can be done when an error occurs. Furthermore, the method will be explained in a SQL
Server case using a group of T-SQL statements/blocks, which is basically SQL Server's way of handling
errors. This is a very simple yet structured way of doing it and once you get the hang of it, it can be quite
helpful in many cases.
On top of that, there is a RAISERROR function that can be used to generate our own custom error
messages which is a great way to translate confusing error messages into something a little bit more
meaningful that people would understand.

Handling errors using TRY…CATCH


Here's what the syntax looks like. It's pretty simple to get the hang of. We have two blocks of code:
1 BEGIN TRY  
2      --code to try
3 END TRY  
4 BEGIN CATCH  
5      --code to run if an error occurs
6 --is generated in try
7 END CATCH
Anything between the BEGIN TRY and END TRY is the code that we want to monitor for an error. So, if an
error happens inside this TRY block, control is immediately transferred to the CATCH block, which then
starts executing code line by line.
Now, inside the CATCH statement, we can try to fix the error, report the error or even log the error, so we
know when it happened, who did it by logging the username, all the useful stuff. We even have access to
some special data only available inside the CATCH statement:
 ERROR_NUMBER – Returns the internal number of the error
 ERROR_STATE – Returns the state number of the error
 ERROR_SEVERITY – Returns the severity level, which indicates anything from informational messages to
errors a user or DBA can fix, etc.
 ERROR_LINE – Returns the line number at which the error occurred
 ERROR_PROCEDURE – Returns the name of the stored procedure or function where the error occurred
 ERROR_MESSAGE – Returns the most essential information, and that is the message text of the
error
That’s all that is needed when it comes to SQL Server error handling. Everything can be done with a
simple TRY and CATCH statement and the only part when it can be tricky is when we’re dealing with
transactions. Why? Because if there’s a BEGIN TRANSACTION, it always must end with a COMMIT or
ROLLBACK transaction. The problem is if an error occurs after we begin but before we commit or
rollback. In this particular case, there is a special function that can be used in the CATCH statement that
allows checking whether a transaction is in a committable state or not, which then allows us to make a
decision to rollback or to commit it.
Let’s head over to SQL Server Management Studio (SSMS) and start with basics of how to handle SQL
Server errors. The AdventureWorks 2014 sample database is used throughout the article. The script
below is as simple as it gets:
1 USE AdventureWorks2014
2 GO
3 -- Basic example of TRY...CATCH
4  
5 BEGIN TRY
6 -- Generate a divide-by-zero error  
7   SELECT
8     1 / 0 AS Error;
9 END TRY
10 BEGIN CATCH
11   SELECT
12     ERROR_NUMBER() AS ErrorNumber,
13     ERROR_STATE() AS ErrorState,
14     ERROR_SEVERITY() AS ErrorSeverity,
15     ERROR_PROCEDURE() AS ErrorProcedure,
16     ERROR_LINE() AS ErrorLine,
17     ERROR_MESSAGE() AS ErrorMessage;
18 END CATCH;
19 GO
This is an example of how it looks and how it works. The only thing we’re doing in the BEGIN TRY is
dividing 1 by 0, which, of course, will cause an error. So, as soon as that block of code is hit, it’s going to
transfer control into the CATCH block and then it’s going to select all of the properties using the built-in
functions that we mentioned earlier. If we execute the script from above, this is what we get:
We got two result grids because of two SELECT statements: the first one is 1 divided by 0, which causes
the error and the second one is the transferred control that actually gave us some results. From left to
right, we got ErrorNumber, ErrorState, ErrorSeverity; there is no procedure in this case (NULL), ErrorLine,
and ErrorMessage.
Now, let’s do something a little more meaningful. It’s a clever idea to track these errors. Things that are
error-prone should be captured anyway and, at the very least, logged. You can also put triggers on these
logged tables and even set up an email account and get a bit creative in the way of notifying people
when an error occurs.
If you’re unfamiliar with database email, check out this article for more information on the emailing
system: How to configure database mail in SQL Server
The script below creates a table called DB_Errors, which can be used to store tracking data:
1 -- Table to record errors
2  
3 CREATE TABLE DB_Errors
4          (ErrorID        INT IDENTITY(1, 1),
5           UserName       VARCHAR(100),
6           ErrorNumber    INT,
7           ErrorState     INT,
8           ErrorSeverity  INT,
9           ErrorLine      INT,
10           ErrorProcedure VARCHAR(MAX),
11           ErrorMessage   VARCHAR(MAX),
12           ErrorDateTime  DATETIME)
13 GO
Here we have a simple identity column, followed by username, so we know who generated the error and
the rest is simply the exact information from the built-in functions we listed earlier.
Now, let’s modify a custom stored procedure from the database and put an error handler in there:
1 ALTER PROCEDURE dbo.AddSale @employeeid INT,
2                    @productid  INT,
3                    @quantity   SMALLINT,
4                    @saleid     UNIQUEIDENTIFIER OUTPUT
5 AS
6 SET @saleid = NEWID()
7   BEGIN TRY
8     INSERT INTO Sales.Sales
9          SELECT
10            @saleid,
11            @productid,
12            @employeeid,
13            @quantity
14   END TRY
15   BEGIN CATCH
16     INSERT INTO dbo.DB_Errors
17     VALUES
18   (SUSER_SNAME(),
19    ERROR_NUMBER(),
20    ERROR_STATE(),
21    ERROR_SEVERITY(),
22    ERROR_LINE(),
23    ERROR_PROCEDURE(),
24    ERROR_MESSAGE(),
25    GETDATE());
26   END CATCH
27 GO
Altering this stored procedure simply wraps error handling in this case around the only statement inside
the stored procedure. If we call this stored procedure and pass some valid data, here’s what happens:
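The call itself appears only in the screenshot; as a minimal sketch, it might look like this (the parameter values, including the employee ID of 20, are illustrative assumptions):
DECLARE @saleid UNIQUEIDENTIFIER;
EXEC dbo.AddSale @employeeid = 20,
                 @productid = 1,
                 @quantity = 5,
                 @saleid = @saleid OUTPUT;
-- The procedure returns the generated sale ID through the OUTPUT parameter
SELECT @saleid AS SaleID;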

A quick Select statement indicates that the record has been successfully inserted:
However, if we call the above-stored procedure one more time, passing the same parameters, the results
grid will be populated differently:

This time, we got two indicators in the results grid:


0 rows affected – this line indicates that nothing actually went into the Sales table
1 row affected – this line indicates that something went into our newly created logging table
So, what we can do here is look at the errors table and see what happened. A simple Select statement
will do the job:
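For example, against the logging table we created above:
SELECT * FROM dbo.DB_Errors
ORDER BY ErrorDateTime DESC; -- most recent errors first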

Here we have all the information we set previously to be logged, only this time we also got the
procedure field filled out and of course the SQL Server “friendly” technical message that we have a
violation:
Violation of PRIMARY KEY constraint 'PK_Sales_1'. Cannot insert duplicate key in object 'Sales.Sales'. The
duplicate key value is (20).
Now, this was a very artificial example, but the point is that in the real world, passing invalid data is
very common. For example, passing an employee ID that doesn't exist in a case when we have a foreign
key set up between the Sales table and the Employee table, meaning the Employee must exist in order to
create a new record in the Sales table. This use case will cause a foreign key constraint violation.
The general idea behind this is not to get the error fizzled out. We at least want to report to an individual
that something went wrong and then also log it under the hood. In the real world, if there was an
application relying on a stored procedure, developers would probably have SQL Server error handling
coded somewhere as well because they would have known when an error occurred. This is also where it
would be a clever idea to raise an error back to the user/application. This can be done by adding the
RAISERROR function so we can throw our own version of the error.
For example, if we know that entering an employee ID that doesn’t exist is more likely to occur, then we
can do a lookup. This lookup can check if the employee ID exists and if it doesn’t, then throw the exact
error that occurred. Or in the worst-case scenario, if we had an unexpected error that we had no idea
what it was, then we can just pass back what it was.

Advanced SQL error handling


We only briefly mentioned the tricky part with transactions, so here's a simple example of how to deal with
them. We can use the same procedure as before, only this time let’s wrap a transaction around the Insert
statement:
1 ALTER PROCEDURE dbo.AddSale @employeeid INT,
2                    @productid  INT,
3                    @quantity   SMALLINT,
4                    @saleid     UNIQUEIDENTIFIER OUTPUT
5 AS
6 SET @saleid = NEWID()
7   BEGIN TRY
8     BEGIN TRANSACTION
9     INSERT INTO Sales.Sales
10          SELECT
11            @saleid,
12            @productid,
13            @employeeid,
14            @quantity
15     COMMIT TRANSACTION
16   END TRY
17   BEGIN CATCH
18     INSERT INTO dbo.DB_Errors
19     VALUES
20   (SUSER_SNAME(),
21    ERROR_NUMBER(),
22    ERROR_STATE(),
23    ERROR_SEVERITY(),
24    ERROR_LINE(),
25    ERROR_PROCEDURE(),
26    ERROR_MESSAGE(),
27    GETDATE());
28  
29 -- Transaction uncommittable
30     IF (XACT_STATE()) = -1
31       ROLLBACK TRANSACTION
32  
33 -- Transaction committable
34     IF (XACT_STATE()) = 1
35       COMMIT TRANSACTION
36   END CATCH
37 GO
So, if everything executes successfully inside the Begin transaction, it will insert a record into Sales, and
then it will commit it. But if something goes wrong before the commit takes place and it transfers control
down to our Catch – the question is: How do we know if we commit or rollback the whole thing?
If the error isn’t serious, and it is in the committable state, we can still commit the transaction. But if
something went wrong and is in an uncommittable state, then we can roll back the transaction. This can
be done by simply running and analyzing the XACT_STATE function that reports transaction state.
This function returns one of the following three values:
  1 – the transaction is committable
-1 – the transaction is uncommittable and should be rolled back
  0 – there are no pending transactions
The only catch here is to remember to actually do this inside the catch statement because you don’t want
to start transactions and then not commit or roll them back:

Now, if we execute the same stored procedure providing, e.g., an invalid EmployeeID, we'll get the same
errors as before generated inside our table:
The way we can tell that this wasn’t inserted is by executing a simple Select query, selecting everything
from the Sales table where EmployeeID is 20:
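As a sketch (assuming the column in Sales.Sales is named EmployeeID):
SELECT * FROM Sales.Sales
WHERE EmployeeID = 20;
-- Returns no rows, confirming the insert was rolled back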

Generating custom raise error SQL message


Let’s wrap things up by looking at how we can create our own custom error messages. These are good
when we know that there’s a possible situation that might occur. As we mentioned earlier, it’s possible
that someone will pass an invalid employee ID. In this particular case, we can do a check before then and
sure enough, when this happens, we can raise our own custom message like saying employee ID does
not exist. This can be easily done by altering our stored procedure one more time and adding the lookup
in our TRY block:
1 ALTER PROCEDURE dbo.AddSale @employeeid INT,
2                    @productid  INT,
3                    @quantity   SMALLINT,
4                    @saleid     UNIQUEIDENTIFIER OUTPUT
5 AS
6 SET @saleid = NEWID()
7   BEGIN TRY
8   IF (SELECT COUNT(*) FROM HumanResources.Employee e WHERE employeeid = @employeeid) = 0
9       RAISERROR ('EmployeeID does not exist.', 11, 1)
10     
11     INSERT INTO Sales.Sales
12          SELECT
13            @saleid,
14            @productid,
15            @employeeid,
16            @quantity
17   END TRY
18   BEGIN CATCH
19     INSERT INTO dbo.DB_Errors
20     VALUES
21   (SUSER_SNAME(),
22    ERROR_NUMBER(),
23    ERROR_STATE(),
24    ERROR_SEVERITY(),
25    ERROR_LINE(),
26    ERROR_PROCEDURE(),
27    ERROR_MESSAGE(),
28    GETDATE());
29  
30    DECLARE @Message varchar(MAX) = ERROR_MESSAGE(),
31         @Severity int = ERROR_SEVERITY(),
32         @State smallint = ERROR_STATE()
33  
34    RAISERROR (@Message, @Severity, @State)
35   END CATCH
36 GO
If this count comes back as zero, that means the employee with that ID doesn’t exist. Then we can call the
RAISERROR where we define a user-defined message, and furthermore our custom severity and state. So,
that would be a lot easier for someone using this stored procedure to understand what the problem is
rather than seeing the very technical error message that SQL throws, in this case, about the foreign key
validation.
With the last changes in our stored procedure, there is also another RAISERROR in the Catch block. If
another error occurs, rather than having it slip by unnoticed, we can again call RAISERROR and pass back
exactly what happened. That's why we have declared all the variables and the results of all the functions.
This way, it will not only get logged but also report back to the application or user.
And now if we execute the same code from before, it will both get logged and it will also indicate that
the employee ID does not exist:

Another thing worth mentioning is that we can actually predefine this error message code, severity, and
state. There is a stored procedure called sp_addmessage that is used to add our own error messages.
This is useful when we need to call the message in multiple places; we can just use RAISERROR and pass
the message number rather than retyping the text all over again. By executing the selected code from
below, we then added this error into SQL Server:
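The code from the screenshot is not reproduced here; as a sketch, predefining and then raising a message might look like this (the message number 50005 and its exact text are illustrative assumptions):
-- Add a user-defined message (user-defined message numbers start at 50001)
EXEC sp_addmessage @msgnum = 50005,
                   @severity = 11,
                   @msgtext = N'EmployeeID does not exist.';
GO
-- Raise it by number instead of retyping the message text
RAISERROR (50005, 11, 1);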

This means that now rather than doing it the way we did previously, we can just call the RAISERROR and
pass in the error number and here’s what it looks like:
The sp_dropmessage is, of course, used to drop a specified user-defined error message. We can also view
all the messages in SQL Server by executing the query from below:
1 SELECT * FROM master.dbo.sysmessages

There’s a lot of them and you can see our custom raise error SQL message at the very top.
I hope this article has been informative for you and I thank you for reading.

Six different methods to copy tables between databases in SQL Server


November 16, 2018 by Prashanth Jayaram

In this article, you'll learn the key skills that you need to copy tables between SQL Server instances,
including both on-premises and cloud SQL databases. I'll walk through several ways of
copying a table(s) between SQL databases, helping you to see the benefits and trade-offs of each option.

Introduction
Before we begin, though, let's go over the objectives of the article and an overview of each method. In
this guide, we briefly discuss several of SQL Server's available built-in options, and also show how a few
PowerShell and third-party tools can be used to copy SQL tables between databases, and between
instances as well. Each method is covered in its own module, with enough background information given
at the start of each one.

Objectives:
1. Introduction
2. Discuss various methods to copy tables
 Using .Net class library to copy tables with PowerShell
 Using Import-and-Export Wizard
 Using sqlpackage.exe – Extract and Publish method
 Using Generate Scripts wizard in SSMS ( SQL Server Management Studio)
 Using INSERT INTO SQL statement
3. And more…

Get started
In SQL Server, copying tables between databases on the same SQL instance is relatively easier than
copying data between remote servers. To minimize the workload on the production database, it
is always recommended to restore the database from the backup to the new database and then use the
best method to copy the data to the target database. Again, this depends on the number of tables, their
size, and the available space. If the size of the table(s) is more than 50% of the total size of the database,
then the backup-and-restore method is the recommended option.
In some cases, you might have to copy a few very large table(s), and then you may end up
moving the table(s) to separate filegroups and performing a partial backup-and-restore to copy
the data. You can refer to the article Database Filegroup(s) and Piecemeal restores in SQL Server for more
information.
You can also use third-party tools to perform an object level restore from a backup file.

SqlBulkCopy object class for data copy with PowerShell


PowerShell is always my first choice for any administrative task. .NET provides the SqlBulkCopy class
to bulk load the table(s) into the database.
You can refer to the article  6 methods to write PowerShell output to a SQL Server table  to get more
information about .Net class libraries.

PowerShell script
The following PoSH script creates a function named Get-SQLTable. The function has several mandatory
parameters.
function Get-SQLTable
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true)]
        [string] $SourceSQLInstance,

        [Parameter(Mandatory=$true)]
        [string] $SourceDatabase,

        [Parameter(Mandatory=$true)]
        [string] $TargetSQLInstance,

        [Parameter(Mandatory=$true)]
        [string] $TargetDatabase,

        [Parameter(Mandatory=$true)]
        [string[]] $Tables,

        [Parameter(Mandatory=$false)]
        [int] $BulkCopyBatchSize = 10000,

        [Parameter(Mandatory=$false)]
        [int] $BulkCopyTimeout = 600
    )

    $sourceConnStr = "Data Source=$SourceSQLInstance;Initial Catalog=$SourceDatabase;Integrated Security=True;"
    $TargetConnStr = "Data Source=$TargetSQLInstance;Initial Catalog=$TargetDatabase;Integrated Security=True;"

    try
    {
        Import-Module -Name SQLServer
        write-host 'module loaded'
        $sourceSQLServer = New-Object Microsoft.SqlServer.Management.Smo.Server $SourceSQLInstance
        $sourceDB = $sourceSQLServer.Databases[$SourceDatabase]
        $sourceConn = New-Object System.Data.SqlClient.SQLConnection($sourceConnStr)
        $sourceConn.Open()

        foreach($table in $sourceDB.Tables)
        {
            $tableName = $table.Name
            $schemaName = $table.Schema
            $tableAndSchema = "$schemaName.$tableName"

            if ($Tables.Contains($tableAndSchema))
            {
                # Script out the table definition and create it on the target
                $Tablescript = ($table.Script() | Out-String)
                $Tablescript

                Invoke-Sqlcmd `
                    -ServerInstance $TargetSQLInstance `
                    -Database $TargetDatabase `
                    -Query $Tablescript

                # Stream the source rows into the target table using SqlBulkCopy
                $sql = "SELECT * FROM $tableAndSchema"
                $sqlCommand = New-Object system.Data.SqlClient.SqlCommand($sql, $sourceConn)
                [System.Data.SqlClient.SqlDataReader] $sqlReader = $sqlCommand.ExecuteReader()
                $bulkCopy = New-Object Data.SqlClient.SqlBulkCopy($TargetConnStr, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity)
                $bulkCopy.DestinationTableName = $tableAndSchema
                $bulkCopy.BulkCopyTimeOut = $BulkCopyTimeout
                $bulkCopy.BatchSize = $BulkCopyBatchSize
                $bulkCopy.WriteToServer($sqlReader)
                $sqlReader.Close()
                $bulkCopy.Close()
            }
        }

        $sourceConn.Close()
    }
    catch
    {
        [Exception]$ex = $_.Exception
        write-host $ex.Message
    }
    finally
    {
        #Return value if any
    }
}
The $tables array variable is used to assign the list of the table(s) to be copied to the target database
1 [string[]] $tables = @('dbo.OPERATION','dbo.OPERATION_DETAIL')
Let us invoke the Get-SQLTable function with the below-mentioned parameters to copy the tables from
the AdventureWorks2016 database on HQDBT01 to the AdventureWorks2012 database on the
hqdbt01\sql2017 instance.
Get-SQLTable -SourceSQLInstance hqdbt01 -SourceDatabase AdventureWorks2016 -TargetSQLInstance hqdbt01\sql2017 -TargetDatabase AdventureWorks2012 -Tables $tables -BulkCopyBatchSize 5000

The output shows the tables OPERATION and OPERATION_DETAIL copied to the target instance.

SSMS Import-and-Export Wizard


Let's take a look at the Import-and-Export Wizard. The interface is very similar to all other wizards: it
allows you to easily step through the process and execute the data copy with very little or no code. For
straightforward importing and exporting of data from one source into another, this is really an excellent
tool. If you want to do almost any kind of transformation, however, this is not the right tool; you would
need to use SQL Server Data Tools (SSDT) and build a data flow.
So let's get started. The first thing is to open Microsoft SQL Server Management Studio (SSMS). We're
going to use the AdventureWorks2016 database, and we're going to move data over to another instance
of SQL Server.
 Open the Object Explorer, locate the database, right-click and select Tasks and choose Export
Data option.
 Now the data source, if I pull this down, you’ll see the different sources that we can use. We’re
going to use SQL Native Client 11.0, the SQL provider.
 Next is the Server name; it is a recommended practice to pick the server name and database in the
Import and Export Wizard from the selection drop-down lists.

 Now the destination selection: again pick the SQL provider, Server name and Database from the
drop-down lists rather than typing them. And we'll go Next
 In Select Source Tables and Views, select the objects to copy to the destination or you could write
a query. But here we’re just going to copy the data. In this case, let’s bring in the dbo.Cities and
Person.Address.
 Click Next

 We’re ready to run the copy job. Let us choose Run immediately and Click Next

 We can see a summary of the action that we are going to perform using the wizard
 Click Finish to execute the job steps.
 After successful execution of the job, we can validate and review the output.

Using sqlpackage.exe – Extract and Publish method


SqlPackage is a command-line utility that automates the schema-and-data extraction process and
publishes the generated file into a target database. The SqlPackage.exe command-line utility is an
in-house component of SQL Server Data Tools (SSDT).
You can refer to the articles  Continuous Database Delivery (CD) using SQL Server Tools
SqlPackage.exe  and  SqlPackage.exe – Automate SQL Server Database Restoration using bacpac with
PowerShell or Batch techniques  for more information.
Using SqlPackage.exe, it's possible to extract the schema and data, and then publish only the listed
tables' data. In the process, objects such as stored procedures, functions, etc. are extracted into
the .dacpac file, but they're excluded when the content is published into the target database.
To specify individual tables, you first set /p:ExtractAllTableData=False and then add a /p:TableData
property for each table in the form Schema.Table.
The following example uses the /p:TableData property for three tables. You can see that the tables are
referred to in the form Schema.Table, for example dbo.Orders.
/p:TableData=dbo.Orders
/p:TableData=Orders.Orders
/p:TableData=Person.Address
SqlPackage /Action:Extract /SourceDatabaseName:Adventureworks2016
/SourceServerName:HQDBT01 /TargetFile:F:\PowerSQL\smartbackup\AdventureWorks2016.dacpac
/p:IgnoreExtendedProperties=True /p:ExtractAllTableData=FALSE /p:TableData=dbo.Cities
/p:TableData=dbo.citiesDemo
Let’s prepare the script to automate the extract-and-publish process
1. Set the ENVIRONMENT variable. The file may be found in other directories depending on the SSDT
installation. In this case, SqlPackage.exe is found in the C:\Program Files (x86)\Microsoft SQL
Server\140\DAC\bin\ folder
2. Prepare the Input values
a. Backup Directory
b. Source Database
c. Source SQL Server instance
d. Target database
e. Target SQL Server instance
3. Run the SqlPackage.exe with an extract action on the source SQL instance
4. Run T-SQL to find the existence of the target database
5. Run the SqlPackage.exe with a publish action on the target SQL instance
# Environment PATH variable
$Variables=$env:PATH

#Check the path existence of the SqlPackage.exe and print its status
IF (-not $Variables.Contains( "C:\Program Files (x86)\Microsoft SQL Server\140\DAC\bin"))
{
write-host "SQLPackage.exe path is not found, Update the environment variable"
$ENV:PATH = $ENV:PATH + ";C:\Program Files (x86)\Microsoft SQL Server\140\DAC\bin;"
}

#the input parameters
$BackupDirectory="F:\PowerSQL\smartbackup\"
$DatabaseName="AdventureWorks2012"
#Source SQL Server Instance
$SourceServerName="HQDBT01"
#target SQL Instance
$TargetserverName="HQDBT01\SQL2017"

#Prepare the target filename
$dirName = [io.path]::GetDirectoryName($BackupDirectory)

#set the filename, the database should be a part of the filename
$filename = "AdventureWorks2012_Rpt"
#extension must be dacpac
$ext = "dacpac"
# Prepare FULL PATH
$TargetFilePath = "$dirName\$filename-$(get-date -f yyyyMMddhhmmss).$ext"
#print FULL file path
$TargetFilePath

#Run the SqlPackage tool to extract the data
SqlPackage /Action:Extract /SourceDatabaseName:$DatabaseName /SourceServerName:$SourceServerName /TargetFile:$TargetFilePath /p:IgnoreExtendedProperties=True /p:ExtractAllTableData=FALSE /p:TableData=dbo.Orders /p:TableData=dbo.Address

#Get the latest file in the given directory
$NewestBacPacFile = Get-ChildItem -Path $dirName\$filename*.$ext | Sort-Object LastAccessTime -Descending | Select-Object -First 1
#print the latest dacpac file name depending on the name of the database
$file="$NewestBacPacFile"
$file

#If the target database exists, then drop it
$dropTSQL=
@"
IF EXISTS (SELECT * FROM [sys].[databases] WHERE [name] = '$DatabaseName') DROP DATABASE $DatabaseName
"@
#Using sqlcmd, execute the DropTSQL on the target instance.
SQLCMD -S $TargetserverName -U SA -P Api1401$ -Q $dropTSQL

#Publish the data in the target database using sqlpackage.exe
SqlPackage.exe /a:publish /sf:$file /tsn:$TargetserverName /tdn:$DatabaseName /tu:SA /tp:Api1401$
Output:
In the output, you can see that the dbo.Orders and dbo.Address tables are processed.

You can refer to the article  SqlPackage.exe – Automate SQL Server Database Restoration using bacpac with
PowerShell or Batch techniques  for more information

Generate Scripts using SQL Server Management Studio


In this section, we’ll discuss another way to generate “schema and data” for SQL Server databases
objects.
Let’s see the steps to generate a SQL Script that includes both “Schema and Data”
1. Connect the SQL Server instance
2. Open the Object Explorer and locate the database
3. Right-click the database, select Tasks, and then click on Generate Scripts…. After that, the Script
Wizard opens. Click on “Next”.
4. On Choose Object page, enable the Select specific database objects option. Select the
intended objects and Click Next.

5. In Set Scripting Options, select the output type and click the Advanced button. In this case, the
output is redirected to a query window.
6. In the Advanced Scripting Options, select “Schema and Data” from the drop-down list and Click
OK.

7. Next, the Summary page details the outlines of the entire process. Click Next
8. Now, Save or Publish Scripts page shows the progress of the entire process. You can monitor
the status of the entire schema and data generation process.

INSERT INTO SQL


This is also an option to clone a table from one database to another.
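As a minimal sketch (the SourceDB/TargetDB database names and the dbo.Orders table are illustrative, not from this article):
-- Create and populate a new table in the target database in one step
SELECT *
INTO TargetDB.dbo.Orders
FROM SourceDB.dbo.Orders;

-- Or append rows into an existing, pre-created table
INSERT INTO TargetDB.dbo.Orders
SELECT *
FROM SourceDB.dbo.Orders;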
You can refer to the article  Overview of the SQL Insert  for more information.

Summary
So far, we've discussed various methods to copy tables across SQL Server databases. It is evident that
restoring a couple of tables from a backup can be a time- and space-consuming process. It is up to your
environment which of the aforementioned approaches you follow to copy tables in SQL Server. There is
no standard/recommended way to copy a table between databases, but there are many possible
approaches that you can use to fit your needs.

SQL Server Transaction Log Backup, Truncate and Shrink Operations


In this article, we will cover SQL Server Transaction Log backup, truncate and shrink operations, with an
overview and examples covering everything discussed.
If this article is your first visit to the SQL Server Transaction Log series, I recommend you to check the
previous articles (see the TOC below), in which we described the internal structure of the SQL Server
Transaction Log, the vital role that the Transaction Log plays in keeping the database in a consistent state
and recovering the corrupted database or mistakenly modified table to a specific point in time. We
also discussed in this series the three recovery models, Full, Simple and Bulk-Logged, which control how
the transactions will be written to the SQL Server Transaction Log file and finally how to manage and
monitor the SQL Server Transaction Log growth.
Building on the basic information from the previous articles, we are now ready to discuss in depth the
difference between the SQL Server Transaction Log backup, truncate and shrink concepts and how to
perform these operations.

Transaction Log Backup


When configuring your database with the Simple recovery model, the SQL Server Transaction Log will be
marked as inactive and truncated automatically after committing the active transaction. This is not the
case with the Full and Bulk-Logged database recovery models. When the database is configured with Full
recovery model, the SQL Server Transaction Log in the Transaction Log file will be marked as inactive
after committing the transaction, without being truncated automatically, as it will be waiting for a
Transaction Log backup to be performed. Recall that only the Transaction Log backup, but NOT the
database Full backup, will truncate the Transaction Logs from the Transaction Log file and makes it
available for reuse. If no Transaction Log backup is taken from the database, the Transaction Log file will
grow continuously, without truncation, until it runs out of free space.
The SQL Server Transaction Log backup can be taken only from the database when the recovery model
of that database is Full or Bulk-Logged. The recovery model of the database can be checked from
the Options tab of the Database Properties window, as below:

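As a sketch, the same information can also be retrieved with T-SQL (the TSQL database name follows the backup examples below):
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'TSQL';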
If you try to take Transaction Log backup for a database that is configured with the Simple recovery
model, the backup operation will fail with the error message below:

In addition, the Transaction Log backup requires that at least one Full backup is taken from that database
as a start point for the new backup chain. If you try to take a Transaction Log backup from a database
with no Full backup taken previously, the backup operation will fail with the error message below:

Let’s take a Full backup for the database to be able to take Transaction Log backup for that database. We
will use the BACKUP DATABASE T-SQL command to perform the database Full backup operation in our
example here. For more information about the different ways and options for performing database
backups in SQL Server, check the SQL Server Backup and Restore Series. The Full backup of the database
can be taken using the T-SQL script below:
1 BACKUP DATABASE [TSQL]
2 TO DISK = N'C:\Ahmad Yaseen\TSQL.bak' WITH NOFORMAT, NOINIT,  
3 NAME = N'TSQL-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
4 GO
Once the database Full backup is performed, we will start taking the Transaction Log backups for the
database. The first Transaction Log backup will take a backup for all the transactions that occurred in the
database since the last Full backup. The Transaction Log backup can be taken using the BACKUP LOG T-
SQL command below:
1 BACKUP LOG [TSQL]
2 TO DISK = N'C:\Ahmad Yaseen\TSQL_2.TRN' WITH NOFORMAT, NOINIT,  
3 NAME = N'TSQL-TRN Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
4 GO
On the other hand, the Transaction Log backups that follow the first Transaction Log backup will take a
backup of all transactions that occurred in the database since the point where the last Transaction Log
backup stopped. The Full backup and all the following Transaction Log backups, until a new Full backup is
taken, are called a backup chain. This backup chain is important for recovering the database to a specific point
in time, in the case of any mistakenly performed change or database corruption. The frequency of the
Transaction Log backup depends on how important your data is, the size of the database and what type
of workload this database serves. In heavily transactional databases, it is recommended to increase
the frequency of the Transaction Log backup, in order to minimize data loss and truncate the
Transaction Logs, making the space available for reuse.
If the database is damaged, it is recommended to create a tail-log backup to enable you to restore the
database to the current point in time. A tail-log backup is used to capture all log records that have not
yet been backed up. This will help in preventing any data loss and to keep the log chain complete.
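As a sketch, a tail-log backup of a damaged database uses the NO_TRUNCATE option; the file path here is an illustrative assumption following the earlier examples:
BACKUP LOG [TSQL]
TO DISK = N'C:\Ahmad Yaseen\TSQL_tail.TRN'
WITH NO_TRUNCATE
GO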
Assume that you have executed the below DELETE statement by mistake without providing the WHERE
clause. This means that all table records will be deleted:

If you have designed a proper backup solution, the data can be easily recovered by restoring the
database back to the specific point in time before executing the DELETE statement. From the Restore
Database window, the SQL Server will return the complete backup chain that is taken from that database.
If you know the exact file that is taken directly before the data deletion, you can stop at that specific file,
as shown below:

But if you are aware of the exact time of executing the DELETE statement, you can restore the database
back to that specific point in time before the DELETE statement execution, without the need to know
which Transaction Log file contains that point in time. This can be achieved by clicking on
the Timeline option, and specify the time, as shown below:
Transaction Log Truncate
SQL Server Transaction Log truncation is the process in which all VLFs that are marked as inactive are
freed in the SQL Server Transaction Log file and become available for reuse. If there is a
single active log record in a VLF, the overall VLF will be considered as active log and cannot be
truncated.
The SQL Server Transaction Log, for the database that is configured with the Simple recovery model, can
be truncated automatically if:
 A Checkpoint operation is triggered
 The database transaction is committed
The SQL Server Transaction Log, for the database that is configured with the Full or Bulk-
Logged recovery model, can be truncated automatically:
 After performing a Transaction Log backup process, and the Transaction Log is not waiting for an
active transaction or any high availability feature, such as Mirroring, Replication or Always On
Availability Group
 After changing the database recovery model to Simple
For example, if we change the recovery model of the below database to Simple and perform a
Checkpoint directly, the Transaction log will be truncated automatically and will be available for
reuse as shown below:
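A minimal sketch of that sequence (using the same TSQL database from the backup examples above):
ALTER DATABASE [TSQL] SET RECOVERY SIMPLE;
GO
CHECKPOINT;
GO
-- The inactive portion of the Transaction Log is now truncated and available for reuse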

 Using the TRUNCATE_ONLY Transaction Log backup option, which breaks the database backup chain
and truncates the available Transaction Logs (available only prior to SQL Server 2008)
If you try to truncate the Transaction Log of the database using the TRUNCATE_ONLY option in a
SQL Server instance on version 2008 and later, the statement will fail with the error message
below:

Transaction Log Shrink


When the database Transaction Log file is truncated, the truncated space will be freed up and become
available for reuse. But the Transaction Log file size will not be decreased, as the truncated space will not
be deallocated. On the other hand, the process of recovering the Transaction Log space by deallocating
the free VLFs and returning it back to the operating system is called a Transaction Log Shrink operation.
The Transaction Log file shrink operation can be performed only if there is free space on the Transaction
Log file, that can be available most of the time after truncating the inactive part of the Transaction Log. A
shrink operation will be useful after performing an operation that creates a large number of Transaction
Logs.
The Transaction Log file of a database can be shrunk by right-clicking on the database and choose the
Shrink -> Files option from the Tasks menu, as shown below:

In the Shrink File page, change the File Type to Log, and choose the Transaction Log file that you
want to shrink. On this page, you have three options:
 Release unused space in the Transaction Log file to the operating system and shrinks the file to
the last allocated extent. This reduces the file size without moving any data
 Release unused space in the Transaction Log file to the operating system and tries to relocate
rows to unallocated pages. Here, a value should be specified
 Moves all data from the specified file to other files in the same filegroup, in order to delete the
empty file later

The same Transaction Log file can be shrunk using the DBCC SHRINKFILE T-SQL statement below:
1 USE [AdventureWorks2016CTP3]
2 GO
3 DBCC SHRINKFILE (N'AdventureWorks2016CTP3_Log' , 0, TRUNCATEONLY)
4 GO
Shrinking the Transaction Log file to a size smaller than the size of the Virtual Log File is not possible,
even if this space is not used. This is due to the fact that the Transaction Log file can be shrunk only to
the boundary of the VLF. In this case, the SQL Server Database Engine will free as much space as possible,
and then issues an informational message, as shown below:

In the next article of this series, we will discuss the best practices that should be applied to the
transaction log in order to get the optimal performance from it. Stay tuned!
SQL Lag function overview and examples
In the article SQL Server Lead function overview and examples, we explored Lead function for performing
computational operations on data. This article gives an overview of the SQL Lag function and its
comparison with the SQL Lead function.

Overview of SQL Lag function


We use the Lag() function to access previous rows' data as per a defined offset value. It is a window function
available from SQL Server 2012 onwards. It works similarly to the Lead function: in the Lead function, we
access subsequent rows, but in the Lag function, we access previous rows. It is a useful function for
comparing the current row value with the previous row value.
Syntax of Lag function
1 LAG (scalar_expression [,offset] [,default])  
2 OVER ( [ partition_by_clause ] order_by_clause )
It uses following arguments.
 Scalar_expression: We define a column name or expression in this argument. The lag function
does calculations on this column. It is a mandatory argument, and we cannot execute the lag
function without this
 Offset: We define an integer number in this argument. The lag function uses this argument to go
behind by the specified number of rows (the offset). The default value for this argument is one. It is an
optional argument
 Default: Suppose we define an offset value that does not lie within the boundary of the data. For
example, we specify offset value 3 for the first row. A lag function cannot go three rows behind.
It displays the default value if specified. If we do not specify any value for this, the lag function
displays NULL in the output for out-of-range values
 PARTITION BY: It creates a logical boundary of data. Suppose we have an extensive data set and
we require calculations on a smaller set of data; we can define partitions for it. For example, sales
data for an organization might contain data for several years. We can create a partition quarterly
and do the computation. It is an optional argument as well
 ORDER BY: We can sort data in ascending or descending order using ORDER by clause. By
default, it uses ascending order to sort data
We will use data from the previous article for demonstration of SQL Server Lag function as well:
1 DECLARE   @Employee TABLE
2   (
3        EmpCode VARCHAR(10),
4        EmpName   VARCHAR(10),
5        JoiningDate  DATE
6     )
7 INSERT INTO @Employee VALUES ('1', 'Rajendra', '1-Sep-2018')
8 INSERT INTO @Employee VALUES ('2', 'Manoj', '1-Oct-2018')
9 INSERT INTO @Employee VALUES ('3', 'Sonu', '10-Mar-2018')
10 INSERT INTO @Employee VALUES ('4', 'Kashish', '25-Oct-2018')
11 INSERT INTO @Employee VALUES ('5', 'Tim', '1-Dec-2018')
12 INSERT INTO @Employee VALUES ('6', 'Akshita', '1-Nov-2018')
13 GO
14 SELECT * FROM   @Employee;
We have the following data in the Employee table:
Example 1: SQL Lag function without a default value
Execute the following query to use the Lag function on the JoiningDate column with offset one. We did
not specify any default value in this query. Note that we need to run the complete batch, including the
table variable declaration and its inserts, together with the query:
1 SELECT *,
2        Lag(JoiningDate, 1) OVER(
3        ORDER BY JoiningDate ASC) AS EndDate
4 FROM @Employee;
In the output, we can note the following:
 The first row shows a NULL value for the EndDate column because it does not have any previous
rows
 The second row contains the previous row's value in the EndDate column. It takes the value from
the previous row due to the offset value of 1

Example 2: SQL Lag function with a default value


In the previous example, we got a NULL value as the default value. Let’s use a default end date in the lag
function. This example also uses offset value 1 in the lag function:
1 SELECT *,
2        Lag(JoiningDate, 1,'1999-09-01') OVER(
3        ORDER BY JoiningDate ASC) AS EndDate
4 FROM @Employee;
In the output, we can see a default value instead of NULL in the first row:

We can use any compatible data type for the default value. If we use an incompatible data type, we get
an error message like the following:
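For instance, the following assumed sketch (an illustration, not reproduced from the original screenshots) passes a character string as the default for the DATE column, which raises a conversion error along the lines of “Conversion failed when converting date and/or time from character string”:

-- Incompatible default: 'N/A' cannot be converted to DATE
SELECT *,
       Lag(JoiningDate, 1, 'N/A') OVER(
       ORDER BY JoiningDate ASC) AS EndDate
FROM @Employee;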

Example 3: SQL Lag function with OFFSET value 2


Previously, we used the default offset 1 in the Lag function, and it takes the value from the previous row. In this
example, we use offset value 2. In the output, you can see the default value for rows 1 and 2. In row
3, it takes the value from row 1:
1 SELECT *,
2        Lag(JoiningDate, 2,'1999-09-01') OVER(
3        ORDER BY JoiningDate ASC) AS EndDate
4 FROM @Employee;
Example 4: SQL Lag function with PARTITION BY clause
As discussed earlier, we use the PARTITION BY clause to create a logical subset of data. Let’s use the
PARTITION BY clause on the ProductSales table. You can refer to the SQL Server Lead function article to create
this table:

In the following query, we use SQL Server Lag function and view the output:
1 SELECT [Year],
2        [Quarter],
3        Sales,
4        LAG(Sales, 1, 0) OVER(
5        ORDER BY [Year],
6                 [Quarter] ASC) AS [NextQuarterSales]
7 FROM dbo.ProductSales;
In the output, the lag function considers all rows as a single data set and applies Lag function:

In the ProductSales table, we have data for the years 2017, 2018 and 2019. We want to use the lag
function on a yearly basis. We use the PARTITION BY clause on the Year column to define the logical
subset of data on a yearly basis, and the ORDER BY clause on the year and quarter columns to sort the data
first by year and then by quarter:
1 SELECT [Year],
2        [Quarter],
3        Sales,
4        LAG(Sales, 1, 0) OVER(PARTITION BY [Year]
5        ORDER BY [Year],
6                 [Quarter] ASC) AS [NextQuarterSales]
7 FROM dbo.ProductSales;
In the following screenshot, we can see three partitions of data, one for each of the years 2017, 2018 and 2019. The Lag
function works on each partition individually and calculates the required data:

Conclusion
In this article, we learned the SQL Lag function and its usage for retrieving values from previous rows. Here
is a quick summary of the Lag function:

DELETE CASCADE and UPDATE CASCADE in SQL Server foreign key
In this article, we will review the DELETE CASCADE and UPDATE CASCADE rules in SQL Server foreign keys
with different examples.
DELETE CASCADE: When we create a foreign key using this option, it deletes the referencing rows in the
child table when the referenced row is deleted in the parent table which has a primary key.
UPDATE CASCADE: When we create a foreign key using UPDATE CASCADE, the referencing rows are
updated in the child table when the referenced row is updated in the parent table which has a primary
key.
We will be discussing the following topics in this article:
1. Creating DELETE and UPDATE CASCADE rule in a foreign key using SQL Server management
studio
2. Creating DELETE CASCADE and UPDATE CASCADE rule in a foreign key using T-SQL script
3. Triggers on a table with DELETE or UPDATE cascading foreign key
Let us see how to create a foreign key with DELETE and UPDATE CASCADE rules along with a few examples.

Creating a foreign key with DELETE and UPDATE CASCADE rules
Using the SQL Server Management Studio GUI:
Log in to SQL Server using SQL Server Management Studio and navigate to the Keys folder in the child
table. Right-click on the Keys folder and select New Foreign Key.
Edit the table and columns specification by clicking … as shown in the below image.
Select the parent table and the primary key column in the parent table, then select the foreign key column in
the child table. Click OK. Please refer to the below sample image.

In the INSERT and UPDATE specifications, select Cascade for the delete rule.

Click on Close and save the table in the designer. Click Yes in the warning message window.
Once you click Yes, a foreign key with the DELETE CASCADE rule is created. Similarly, we can create a foreign key
with the UPDATE CASCADE rule by selecting CASCADE as the action for the update rule in the INSERT and
UPDATE specifications.

Using T-SQL:
Please refer to the below T-SQL script, which creates a parent table, a child table, and a foreign key on the child
table with the DELETE CASCADE rule.
1 CREATE TABLE Countries
2  
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6  
7  
8 CREATE TABLE States
9  
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14  
15  
16  
17 ALTER TABLE [dbo].[States]  WITH CHECK ADD  CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
18 REFERENCES [dbo].[Countries] ([CountryID])
19 ON DELETE CASCADE
20 GO
21  
22 ALTER TABLE [dbo].[States] CHECK CONSTRAINT [FK_States_Countries]
23 GO
Insert some sample data using the below T-SQL script.
1 INSERT INTO Countries VALUES (1,'United States','USA')
2  
3 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
4  
5 INSERT INTO States VALUES (1,'Texas','TX',1)
6 INSERT INTO States VALUES (2,'Arizona','AZ',1)
Now delete the row in the parent table with CountryID = 1; this also deletes the rows in the child table
that have CountryID = 1, as the sketch below shows.
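A minimal sketch of that delete and a verification query (the exact statements are not shown in the original screenshots):

DELETE FROM Countries WHERE CountryID = 1;

-- Verify the cascade: the referencing rows are gone from the child table
SELECT * FROM States WHERE CountryID = 1;   -- returns no rows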

Please refer to the below T-SQL script to create a foreign key with UPDATE CASCADE rule.
1 CREATE TABLE Countries
2  
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6  
7  
8 CREATE TABLE States
9  
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14  
15 GO
16  
17 INSERT INTO Countries VALUES (1,'United States','USA')
18  
19 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
20  
21 INSERT INTO States VALUES (1,'Texas','TX',1)
22 INSERT INTO States VALUES (2,'Arizona','AZ',1)
23  
24 GO
25  
26  
27 ALTER TABLE [dbo].[States]  WITH CHECK ADD  CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
28 REFERENCES [dbo].[Countries] ([CountryID])
29 ON UPDATE CASCADE
30 GO
31  
32 ALTER TABLE [dbo].[States] CHECK CONSTRAINT [FK_States_Countries]
33 GO
Now update the CountryID of a row in the Countries table; this also updates the referencing rows in the child
table States.
1 UPDATE Countries SET CountryID =3 where CountryID=1

Following is the T-SQL script which creates a foreign key with cascade as UPDATE and DELETE rules.
1 ALTER TABLE [dbo].[States]  WITH CHECK ADD  CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
2 REFERENCES [dbo].[Countries] ([CountryID])
3 ON UPDATE CASCADE
4 ON DELETE CASCADE
5 GO
6  
7 ALTER TABLE [dbo].[States] CHECK CONSTRAINT [FK_States_Countries]
8 GO
To know the update and delete actions defined on a foreign key, query the sys.foreign_keys view. Replace the
constraint name in the script.
SELECT name, delete_referential_action, delete_referential_action_desc,
       update_referential_action, update_referential_action_desc
FROM sys.foreign_keys WHERE name = 'FK_States_Countries'
The below image shows the delete and update referential actions defined on the foreign
key.
Let’s move forward and check the behavior of the delete and update rules when the foreign key is on a child table
which acts as a parent table to another child table. The below example demonstrates this scenario.
In this case, “Countries” is the parent table of the “States” table and the “States” table is the parent table
of Cities table.
We will now create a foreign key with the DELETE CASCADE rule on the States table, which references
CountryID in the parent table Countries.
1 CREATE TABLE Countries
2  
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6  
7  
8 CREATE TABLE States
9  
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14  
15 GO
16  
17  
18 CREATE TABLE Cities
19 (CityID INT,
20 CityName varchar(50),
21 StateID INT)
22 GO
23  
24 INSERT INTO Countries VALUES (1,'United States','USA')
25  
26 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
27  
28 INSERT INTO States VALUES (1,'Texas','TX',1)
29 INSERT INTO States VALUES (2,'Arizona','AZ',1)
30  
31 INSERT INTO Cities VALUES(1,'Texas City',1)
32 INSERT INTO Cities values (1,'Phoenix',2)
33  
34 GO
35  
36  
37 ALTER TABLE [dbo].[States]  WITH CHECK ADD  CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
38 REFERENCES [dbo].[Countries] ([CountryID])
39 ON DELETE CASCADE
40 GO
Now on the Cities table, create a foreign key without a DELETE CASCADE rule.
1 ALTER TABLE [dbo].[Cities]  WITH CHECK ADD  CONSTRAINT [FK_Cities_States] FOREIGN KEY([StateID])
2 REFERENCES [dbo].[States] ([StateID])
3 GO
If we try to delete a record with CountryID = 1, it throws an error: the delete on the parent table Countries
tries to delete the referencing rows in the child table States, but the Cities table has a foreign key
constraint with no action defined for delete, and the referenced values still exist in that table.
1 DELETE FROM Countries where CountryID =1
The delete fails at the second foreign key.
When we also create the second foreign key with the DELETE CASCADE rule, the above delete command runs
successfully: it deletes records in the child table States, which in turn deletes records in the second
child table Cities.
1 CREATE TABLE Countries
2  
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6  
7  
8 CREATE TABLE States
9  
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14  
15 GO
16  
17  
18 CREATE TABLE Cities
19 (CityID INT,
20 CityName varchar(50),
21 StateID INT)
22 GO
23  
24 INSERT INTO Countries VALUES (1,'United States','USA')
25  
26 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
27  
28 INSERT INTO States VALUES (1,'Texas','TX',1)
29 INSERT INTO States VALUES (2,'Arizona','AZ',1)
30  
31 INSERT INTO Cities VALUES(1,'Texas City',1)
32 INSERT INTO Cities values (1,'Phoenix',2)
33  
34 GO
35  
36  
37 ALTER TABLE [dbo].[States]  WITH CHECK ADD  CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
38 REFERENCES [dbo].[Countries] ([CountryID])
39 ON DELETE CASCADE
40 GO
41  
42  
43 ALTER TABLE [dbo].[Cities]  WITH CHECK ADD  CONSTRAINT [FK_Cities_States] FOREIGN KEY([StateID])
44 REFERENCES [dbo].[States] ([StateID])
45 ON DELETE CASCADE
46 GO
47  
48 DELETE FROM Countries where CountryID =1

Triggers on a table with DELETE CASCADE or UPDATE CASCADE foreign key
An INSTEAD OF UPDATE trigger cannot be created on a table if a foreign key with UPDATE
CASCADE already exists on it. It throws the error “Cannot create INSTEAD OF DELETE or INSTEAD
OF UPDATE TRIGGER ‘trigger name’ on table ‘table name’. This is because the table has a FOREIGN KEY
with cascading DELETE or UPDATE.”

Similarly, we cannot create an INSTEAD OF DELETE trigger on a table when a foreign key with the DELETE
CASCADE rule already exists on it.
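For illustration, here is a minimal assumed sketch (the trigger name and body are hypothetical): attempting to create this trigger on the States table from the earlier examples, which carries an ON UPDATE CASCADE foreign key, fails at CREATE time with the error quoted above.

-- Fails: dbo.States has a foreign key with ON UPDATE CASCADE
CREATE TRIGGER trg_States_InsteadOfUpdate ON dbo.States
INSTEAD OF UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    -- custom update handling would go here
END
GO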

INSERT INTO SELECT statement overview and examples
April 12, 2019 by Rajendra Gupta

This article covers the SQL INSERT INTO SELECT statement along with its syntax, examples, and use cases.
In my earlier article SQL SELECT INTO Statement, we explored the following tasks.
 Create a SQL table on the fly while inserting records with appropriate data types
 Use SQL SELECT INTO to insert records in a particular FileGroup
 We cannot use it to insert data in an existing table

The INSERT INTO SELECT statement
Inserting records is a regular database activity. We can insert data directly using client tools such
as SSMS or Azure Data Studio, or directly from an application. In SQL, we use the INSERT INTO
statement to insert records.
The syntax of the INSERT INTO
To insert data into a table, we can use the following syntax for the SQL INSERT INTO statement.
1 INSERT INTO table_name (Column1, Column 2....)
2 VALUES (value1, value2, ...);
If we have specified all column values as per table column orders, we do not need to specify column
names. We can directly insert records into the table.
1 INSERT INTO table_name
2 VALUES (value1, value2, ...);
Let us create a sample table and insert data into it.
1 CREATE TABLE Employees
2 (ID   INT,
3 Name VARCHAR(20)
4 );
We can insert data using the following queries. Both queries are valid for data insertion.
1 Insert into Employees (ID, Name) values (1,'raj')
2 Insert into Employees values (2,'raj')
We cannot insert data without specifying column names if the number of supplied values does not match the
table definition, or if the order of the values does not match the column order. In such cases, we can get the
following error messages (the sketch after this list shows statements that can trigger them).
 Msg 213, Level 16, State 1, Line 6
Column name or number of supplied values does not match table
definition.
 Msg 245, Level 16, State 1, Line 6
Conversion failed when converting the varchar value ‘raj’ to data type
int.
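As an assumed illustration (these exact statements are not in the original article), the following inserts against the Employees table would raise those two errors, respectively:

INSERT INTO Employees VALUES (3)          -- Msg 213: the value count does not match the table definition
INSERT INTO Employees VALUES ('raj', 3)   -- Msg 245: 'raj' cannot be converted to INT for the ID column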
In the examples above, we used the SQL INSERT INTO statement by supplying values directly in the statement.
Suppose we want to insert data from another table. We can still use the SQL INSERT INTO statement,
combined with a SELECT statement. Let’s explore this in the next section.
INSERT INTO SELECT Statement Syntax
We can insert data from other SQL tables into a table with the following INSERT INTO SELECT statement.
1 INSERT INTO table1 (col1, col2, col3, …)
2 SELECT col1, col2, col3, …
3 FROM table2
This query performs the following tasks:
 It first selects records from a table (the SELECT statement)
 Next, it inserts them into the table specified with INSERT INTO
 Note: The column structure must match between the columns returned by the SELECT statement and the destination table
INSERT INTO SELECT examples
Example 1: Insert data from all columns of the source table into the destination table
We have the following records in the existing Employees table.

Let us create another table Customers with the following query.


1 CREATE TABLE Customers
2 (ID   INT,
3 Name VARCHAR(20)
4 );
We want to insert all records from the Employees table to the Customers table. We can use the SQL
INSERT INTO SELECT statement to do this.
1 INSERT INTO Customers
2        SELECT *
3        FROM Employees;
It inserts all records into the Customers table. We can verify that the records in the Customers table match
those in the Employees table, for example with the sketch below.
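As an assumed verification (not part of the original article), a set difference works here because both tables have the same column structure; an empty result means every row was copied:

-- Rows in Employees that are missing from Customers
SELECT * FROM Employees
EXCEPT
SELECT * FROM Customers;   -- no rows returned means the tables match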

In this example, we inserted records for all columns to the Customers table.
Example 2: Insert rows from source to destination table by specifying column
names
Let’s drop the existing Customers table before we move forward. Now, we want to create a table with
one additional IDENTITY column. An IDENTITY column automatically generates identity values for the table. We
also add a City column that allows NULL values.
1 CREATE TABLE Customers
2 (ID     INT IDENTITY(1, 1),
3 Emp_ID INT,
4 Name   VARCHAR(20),
5 City   VARCHAR(20) NULL
6 );
We cannot use the INSERT INTO SELECT statement in the same way as in the above example. If we try to run this
code, we get an error message.
1 INSERT INTO Customers
2        SELECT *
3        FROM Employees;

In this case, we need to specify the column names with the INSERT INTO statement.
1 INSERT INTO Customers (Emp_ID ,Name)
2        SELECT *
3        FROM Employees;
In the Customers table, we have an additional column that allows NULL values. Let’s run a SELECT on the
Customers table. In the following screenshot, we can see NULL values in the City column.

Suppose you have a different column in the source table. You can still insert records into the destination
table by specifying column names in the INSERT INTO SELECT statement. The data types must be
compatible; for example, you cannot insert varchar data into an INT column.
Add a new column in Employees table using ALTER TABLE statement.
1 ALTER TABLE Employees
2 ADD Country varchar(50);
Update the table records with country value India.
1 Update Employees set Country='India'
Now, rerun the INSERT INTO SELECT statement. You can notice that we are using SELECT * instead of
specifying column names.
1 INSERT INTO Customers (Emp_ID ,Name)
2        SELECT *
3        FROM Employees;
We get the following error message because of the column mismatch between the source table and the
destination table.

We can map the column between the source and destination table using the following query.
1 INSERT INTO Customers
2 (Emp_ID,
3 Name
4 )
5        SELECT ID,Name
6        FROM Employees;
Example 3: Insert top rows using the INSERT INTO SELECT statement
Suppose we want to insert the top N rows from the source table into the destination table. We can use the TOP
clause in the INSERT INTO SELECT statement. The following query inserts the top 1 row from the
Employees table into the Customers table.
1 INSERT TOP(1) INTO Customers
2 (Emp_ID,
3 Name
4 )
5        SELECT ID,Name
6        FROM Employees;
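One caveat worth noting (our observation, not a statement from the original article): INSERT TOP(n) without an ORDER BY gives no guarantee about which rows are picked. To control which row is inserted, a common sketch is to move TOP with ORDER BY into the SELECT:

INSERT INTO Customers (Emp_ID, Name)
       SELECT TOP (1) ID, Name
       FROM Employees
       ORDER BY ID;   -- deterministically picks the row with the lowest ID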
Example 4: Insert using both columns and defined values in the SQL INSERT INTO
SELECT Statement
In previous examples, we either specified explicit values in the INSERT INTO statement or used INSERT
INTO SELECT to get records from the source table and insert them into the destination table.
We can combine both columns and defined values in the SQL INSERT INTO SELECT statement.
We have the following columns in the Customers and Employees tables. Previously, we did not insert any
values for the City column, and we do not have the required values in the Employees table either. We need to
specify an explicit value for the City column.
In the following query, we specified a value for the City column while the rest of the values we inserted
from the Employees table.
1 INSERT TOP(1) INTO Customers (Emp_ID,  Name, City)
2        SELECT ID, Name,'Delhi' FROM Employees;
In the following query, we can see it inserts one row (due to Top (1) clause) along with value for the City
column.

Example 5: INSERT INTO SELECT statement with Join clause to get data from
multiple tables
We can use a JOIN clause to get data from multiple tables. These tables are joined with conditions
specified with the ON clause. Suppose we want to get data from multiple tables and insert into a table.
In this example, I am using AdventureWorks2017 database. First, create a new table with appropriate
data types.
1 CREATE TABLE [HumanResources].[EmployeeData](
2   [FirstName] [dbo].[Name] NOT NULL,
3   [MiddleName] [dbo].[Name] NULL,
4   [LastName] [dbo].[Name] NOT NULL,
5   [Suffix] [nvarchar](10) NULL,
6   [JobTitle] [nvarchar](50) NOT NULL,
7   [PhoneNumber] [dbo].[Phone] NULL,
8   [PhoneNumberType] [dbo].[Name] NULL,
9   [EmailAddress] [nvarchar](50) NULL,
10   [City] [nvarchar](30) NOT NULL,
11   [StateProvinceName] [dbo].[Name] NOT NULL,
12   [PostalCode] [nvarchar](15) NOT NULL,
13   [CountryRegionName] [dbo].[Name] NOT NULL
14 ) ON [PRIMARY]
15 GO
This table should contain records from the output of a multiple table join query. Execute the following
query to insert data into HumanResources.EmployeeData table.
1 INSERT INTO HumanResources.EmployeeData
2 SELECT p.[FirstName],
3        p.[MiddleName],
4        p.[LastName],
5        p.[Suffix],
6        e.[JobTitle],
7        pp.[PhoneNumber],
8        pnt.[Name] AS [PhoneNumberType],
9        ea.[EmailAddress],
10        a.[City],
11        sp.[Name] AS [StateProvinceName],
12        a.[PostalCode],
13        cr.[Name] AS [CountryRegionName]
14 FROM [HumanResources].[Employee] e
15      INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = e.[BusinessEntityID]
16      INNER JOIN [Person].[BusinessEntityAddress] bea ON bea.[BusinessEntityID] = e.[BusinessEntityID]
17      INNER JOIN [Person].[Address] a ON a.[AddressID] = bea.[AddressID]
18      INNER JOIN [Person].[StateProvince] sp ON sp.[StateProvinceID] = a.[StateProvinceID]
19      INNER JOIN [Person].[CountryRegion] cr ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
20      LEFT OUTER JOIN [Person].[PersonPhone] pp ON pp.BusinessEntityID = p.[BusinessEntityID]
21      LEFT OUTER JOIN [Person].[PhoneNumberType] pnt ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
22      LEFT OUTER JOIN [Person].[EmailAddress] ea ON p.[BusinessEntityID] = ea.[BusinessEntityID];
23 GO
Example 6: INSERT INTO SELECT statement with common table expression
We use Common Table Expressions (CTE) to simplify a complex join from multiple tables. In the previous
example, we used JOINs in a SELECT statement for inserting data into a SQL table. In this part, we will
rewrite the query with a CTE.
A CTE query can be divided into two parts.
1. We define the CTE with a WITH clause before the SELECT, INSERT, UPDATE, or DELETE statement
2. Once we define the CTE, we can reference it similarly to a relational SQL table
Execute the following code to insert data using a CTE.
1 WITH EmployeeData_Temp([FirstName],
2                        [MiddleName],
3                        [LastName],
4                        [Suffix],
5                        [JobTitle],
6                        [PhoneNumber],
7                        [PhoneNumberType],
8                        [EmailAddress],
9                        [City],
10                        [StateProvinceName],
11                        [PostalCode],
12                        [CountryRegionName])
13      AS (
14  
15      SELECT p.[FirstName],
16             p.[MiddleName],
17             p.[LastName],
18             p.[Suffix],
19             e.[JobTitle],
20             pp.[PhoneNumber],
21             pnt.[Name] AS [PhoneNumberType],
22             ea.[EmailAddress],
23             a.[City],
24             sp.[Name] AS [StateProvinceName],
25             a.[PostalCode],
26             cr.[Name] AS [CountryRegionName]
27      FROM [HumanResources].[Employee] e
28           INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = e.[BusinessEntityID]
29           INNER JOIN [Person].[BusinessEntityAddress] bea ON bea.[BusinessEntityID] = e.[BusinessEntityID]
30           INNER JOIN [Person].[Address] a ON a.[AddressID] = bea.[AddressID]
31           INNER JOIN [Person].[StateProvince] sp ON sp.[StateProvinceID] = a.[StateProvinceID]
32           INNER JOIN [Person].[CountryRegion] cr ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
33           LEFT OUTER JOIN [Person].[PersonPhone] pp ON pp.BusinessEntityID = p.[BusinessEntityID]
34           LEFT OUTER JOIN [Person].[PhoneNumberType] pnt ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
35           LEFT OUTER JOIN [Person].[EmailAddress] ea ON p.[BusinessEntityID] = ea.[BusinessEntityID])
36  
37 INSERT INTO HumanResources.EmployeeData
38             SELECT *
39             FROM EmployeeData_Temp;
40 GO
Example 7: INSERT INTO SELECT statement with a Table variable
We use table variables similarly to a temporary table. We can declare them using the table data type.
A table variable can be used to perform activities in SQL Server where we do not require a permanent table.
You can divide the following query into three parts.
1. Create a SQL Table variable with appropriate column data types. We need to use data type TABLE
for table variable
2. Execute a INSERT INTO SELECT statement to insert data into a table variable
3. View the table variable result set
1 DECLARE @TableVar table(  
2     [JobTitle] [nvarchar](50) NOT NULL,
3   [BirthDate] [date] NOT NULL,
4   [MaritalStatus] [nchar](1) NOT NULL,
5   [Gender] [nchar](1) NOT NULL,
6   [HireDate] [date] NOT NULL,
7   [SalariedFlag] [dbo].[Flag] NOT NULL,
8   [VacationHours] [smallint] NOT NULL,
9   [SickLeaveHours] [smallint] NOT NULL
10   )
11   
12 -- Insert values into the table variable.  
13 INSERT INTO @TableVar
14     SELECT  
15    [JobTitle]
16       ,[BirthDate]
17       ,[MaritalStatus]
18       ,[Gender]
19       ,[HireDate]
20       ,[SalariedFlag]
21       ,[VacationHours]
22       ,[SickLeaveHours]
23     FROM [AdventureWorks2017].[HumanResources].[Employee]
24   
25 -- View the table variable result set.  
26 SELECT * FROM @TableVar;  
27 GO

Understanding the SQL Decimal data type
This article aims to walk you through the SQL Decimal data type and its usage with various examples. We
will also see how we can exercise this data type in SQL Server to help make the SQL developer’s job easier.

Introduction
Organizations deal with decimals on a day-to-day basis, and these decimal values can be seen
everywhere in different sectors, be it banks, the medical industry, biometrics, gas stations, financial
reports, or sports. Using whole numbers (by rounding decimal numbers) definitely makes
one’s job easier, but it often leads to inaccurate outputs, especially when we are dealing with a large
number of values and crucial data. In such scenarios, it is ideal to use the SQL Decimal data type in SQL Server
to deliver correct results with perfect precision.
It is essential for SQL developers to choose the correct data types in the table structure while
designing and modeling SQL databases. Let’s move forward and explore the Decimal data type in SQL
Server.

Pre-requisite
The SQL Decimal data type has been part of SQL Server since its early versions. You can use any installed SQL Server
version (2000 or above) to follow along with this data type. We will be using SQL Server 2017 in this
article for the demo purposes. If you don’t have any version installed on your system and wish to practice
against the 2017 version, download it from here.

The basic syntax of Decimal data type in SQL Server
Let’s take a look at the basic syntax of the SQL Decimal data type first. It is denoted as below:
 decimal [(p [,s])]
Where,
 p stands for Precision, the total number of digits in the value, i.e. on both sides of the decimal
point
 s stands for Scale, number of digits after the decimal point
The default value of p is 18 and of s is 0. Precision can range from 1 to 38, and scale from 0 up to the
specified precision.
In short, by defining parameters in the SQL Decimal data type, we are estimating how many digits a
column or a variable will have and also the number of digits to the right of the decimal point.
For instance, decimal (4,2) indicates that the number can have 4 digits in total, with 2 digits after the decimal
point and hence at most 2 digits before it, something like ##.##.
One important thing to note here is that the parameter s (Scale) can only be specified if p (Precision) is
specified. The scale must always be less than or equal to the precision.

Defining SQL Decimal Data type
Let’s work with a very popular mathematical constant – π, aka Pi, which has a value equal to 3.14159 (approximately 22/7
as a fraction). Copy and paste the below query in a new query window and execute it.
DECLARE @PiWithNoDecimal DECIMAL(6,0) = 3.14159
DECLARE @Piupto5Decimal DECIMAL(6,5) = 3.14159
DECLARE @Piupto1Decimal DECIMAL(3,1) = 3.14159
SELECT @PiWithNoDecimal AS PiWithNoDecimal, @Piupto5Decimal AS Piupto5Decimal, @Piupto1Decimal AS Piupto1Decimal

The above result set shows how SQL Server treats each combination of precision and scale as a different
data type. Here, decimal (6, 0) behaves differently from decimal (6,5) and decimal (3,1); they are
considered three different data types. This way, we can tweak the parameters of the SQL Decimal
type to achieve the desired results.
Now that we know how to create this Decimal data type in SQL Server, let’s explore it with numerous
examples.

Using SQL Decimal in the tables
Let’s quickly create a new table, named Patients, that makes use of the decimal data type for the Height
and Weight columns. We will insert a few rows using an INSERT clause, as shown below, for demo purposes.
1 CREATE TABLE dbo.Patients
2 ( Name varchar(10),
3   Gender varchar(2),
4   Height decimal (3,2),
5   Weight decimal (5,2)
6 )
7 INSERT INTO PATIENTS VALUES('John','M',6.1,80.4)
8 INSERT INTO PATIENTS VALUES('Bred','M',5.8,73.7)
9 INSERT INTO PATIENTS VALUES('Leslie','F',5.3,66.9)
10 INSERT INTO PATIENTS VALUES('Rebecca','F',5.7,50.2)
11 INSERT INTO PATIENTS VALUES('Shermas','M',6.5,190.6)
Once the data is populated in the table, we can query this data using SELECT statement as shown below.
The decimal values can be seen in the height and weight attributes.
1 SELECT * FROM dbo.PATIENTS
Let’s figure out what happens if we try to insert values that exceed the precision or scale specified
for the Height and Weight columns. For this demo, we will insert 2 more rows into this table
(shown below).
1.
1 INSERT INTO PATIENTS VALUES('Largest','M', '10.9', 88.5)
2.
1 INSERT INTO PATIENTS VALUES('Hulk','M', '9.9', 1000.45)
Each statement encounters an arithmetic overflow error, and SQL Server terminates the
statement.

Let’s get to the root of this issue:
 Height Decimal (3, 2) means the value can have 3 digits overall and 2 digits to the right of the
decimal point. In the first line of code above, the value 10.9 (considered as 10.90 = 4 digits
overall) exceeds the specified range (3, 2) and causes the overflow
 Weight Decimal (5,2) means the total number of digits cannot exceed 5 and 2 digits can be
placed to the right of the decimal. However, the value 1000.45 in the second line of code above
exceeds the specified range of (5, 2) since it means 6 digits in total and throws an overflow error
 Quick note – In the above error message, if you noticed, “data type numeric” is stated instead of data type decimal. The reason is that the Decimal and Numeric data types are exactly the same: both are fixed-precision data types and can be used interchangeably (see the short sketch after this list)
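As a small assumed sketch of that interchangeability (not from the original article), assignment and comparison between the two types work seamlessly:

DECLARE @d DECIMAL(5, 2) = 123.45;
DECLARE @n NUMERIC(5, 2) = @d;   -- assigning decimal to numeric works without conversion
SELECT @n AS NumericCopy,
       IIF(@d = @n, 'equal', 'different') AS Comparison;   -- returns 'equal'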
Resolving the error
One of the easiest workarounds is to increase the precision level of the columns to store bigger numbers.
We can alter the data type of the columns without dropping the table or column with the below code.
1 ALTER TABLE dbo.Patients ALTER COLUMN Height decimal(4,2)
2 ALTER TABLE dbo.Patients ALTER COLUMN Weight decimal (6,2)
Once altered, execute the Insert queries to insert these rows into the table.

We can see the rows being added to the table.


Storage considerations with Decimal data type in SQL Server
The SQL Decimal data type requires the following storage bytes for the specified precision, as documented by
Microsoft:
Precision    Storage (Bytes)
1 – 9        5
10 – 19      9
20 – 28      13
29 – 38      17
The space consumption of the SQL Decimal data type is based on the column definition and not on the size
of the value assigned to it. For example, Decimal (12, 4) with a value of 888.888 takes 9 bytes on disk, and
Decimal (22, 2) with a value of 9999.99 consumes 13 bytes on disk. This is why this data type falls under the
fixed-length columns.
As a SQL developer myself, I always try to use the smallest SQL Decimal precision that fits the data, such as
decimal (9, 2), which consumes the least storage (5 bytes on disk) and offers better performance. A quick check
appears below.
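As an assumed quick check (not in the original article), DATALENGTH reports storage bytes that match the table above:

DECLARE @small DECIMAL(9, 2)  = 888.88;
DECLARE @large DECIMAL(19, 4) = 888.888;
SELECT DATALENGTH(@small) AS BytesForPrecision9,    -- 5 bytes
       DATALENGTH(@large) AS BytesForPrecision19;   -- 9 bytes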

Conclusion
I hope this article provides a comprehensible approach to using the SQL Decimal data type. Always
ensure that the precision of the decimal or numeric variable is enough to accommodate the values
assigned to it. Additionally, we observed how selecting the right data type helps SQL developers
save disk storage.
In case of any questions, please feel free to ask in the comments section below.
To continue your journey with SQL Server and data types used in it, I would recommend going through
the below links.
 Spatial SQL data types in SQL Server
 SQL Server Data Type Conversion Methods and performance comparison
 Understanding the GUID data type in SQL Server
 A step-by-step walkthrough of SQL Inner Join

SQL multiple joins for beginners with examples
October 16, 2019 by Esat Erkec
In this article, we will learn the SQL multiple joins concept and reinforce our learning with pretty simple
examples, which are explained with illustrations. In relational databases, data is stored in tables. Without
a doubt, most of the time we need a result set that is formed by combining data from several tables.
Joins allow us to combine data from two or more tables so that we can easily retrieve data from multiple
tables. You might ask yourself how many different types of join exist in SQL Server. The answer is that there
are four main types of joins in SQL Server. First of all, we will briefly describe them using Venn diagram
illustrations:
 Inner join returns the rows that match in both tables

 Left join returns all rows from the left table

 Right join returns all rows from the right table

 Full join returns all rows from both tables

If you lack knowledge about the SQL join concept in SQL Server, you can see the SQL Join types
overview and tutorial article.
After this short explanation of the SQL join types, we will go through multiple joins.

What are SQL multiple joins?
A multiple-join query contains the same or different join types used more than once. Thus, we gain the
ability to combine multiple tables of data in order to overcome relational database issues.

Example scenario
The Green-Tree company launched a new campaign for the New Year and made different offers to its online
customers. As a result of their campaign, they succeeded in converting some offers into sales. In the
following examples, we will uncover the New Year campaign data details of the Green-Tree company.
The company stores these campaign data details in the following tables. Now, we will create these tables
through the following query and populate them with some dummy data:
DROP TABLE IF EXISTS sales
GO
DROP TABLE IF EXISTS orders
GO
DROP TABLE IF EXISTS onlinecustomers
GO
CREATE TABLE onlinecustomers (customerid INT PRIMARY KEY IDENTITY(1,1), CustomerName VARCHAR(100), CustomerCity VARCHAR(100), Customermail VARCHAR(100))
GO
CREATE TABLE orders (orderId INT PRIMARY KEY IDENTITY(1,1), customerid INT, ordertotal FLOAT, discountrate FLOAT, orderdate DATETIME)
GO
CREATE TABLE sales (salesId INT PRIMARY KEY IDENTITY(1,1), orderId INT, salestotal FLOAT)
GO
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Salvador',N'Philadelphia',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Gilbert',N'San Diego',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Ernest',N'New York',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Stella',N'Phoenix',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Jorge',N'Los Angeles',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Jerome',N'San Antonio',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Edward',N'Chicago',N'[email protected]')
GO
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (3,1910.64,5.49,CAST('03-Dec-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (4,150.89,15.33,CAST('11-Jun-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (5,912.55,13.74,CAST('15-Sep-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (7,418.24,14.53,CAST('28-May-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (55,512.55,13.74,CAST('15-Jun-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (57,118.24,14.53,CAST('28-Dec-2019' AS DATETIME))
GO
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (3,370.95)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (4,882.13)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (12,370.95)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (13,882.13)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (55,170.95)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (57,382.13)

How do SQL multiple joins work?
Business problem: Which customers were interested in this New Year campaign?
In order to answer this question, we need to find the matched rows across all the tables, because some
customers did not receive an email offer, and some offers could not be converted into a sale. The
following Venn diagram will help us figure out the matched rows we need. In short, the result of
this query should be the intersecting rows of all the tables in the query. The grey-colored area specifies these
rows in the Venn diagram:
The SQL multiple joins approach will help us to join the onlinecustomers, orders, and sales tables. As
shown in the Venn diagram, we need the matched rows of all tables. For this reason, we will combine all
tables with an inner join clause. The following query returns the desired result set and
answers the question:
1 SELECT customerName, customercity, customermail, salestotal
2 FROM onlinecustomers AS oc
3    INNER JOIN
4    orders AS o
5    ON oc.customerid = o.customerid
6    INNER JOIN
7    sales AS s
8    ON o.orderId = s.orderId
At first, we will analyze the query. The inner join clause
between the onlinecustomers and orders tables derives the matched rows between these two tables. The
second inner join clause, which brings in the sales table, derives the matched rows from the previous result
set. The following colored tables illustration will help us understand how the joined tables' data is matched in
the query. The yellow-colored rows specify matched data between onlinecustomers and orders. On the
other hand, only the blue-colored rows exist in the sales table, so the query result will be the blue-colored
rows:

The result of the query will look like this:

Different join types usage in SQL multiple joins
Business problem: Which offers could not be converted into a sale?
We can use different types of joins in a single query so that we can overcome different relational
database issues. In this example, we need all rows of the orders table that are matched
to the onlinecustomers table but do not exist in the sales table. The following
Venn diagram will help us figure out the matched rows we need. The grey-colored area
indicates the rows which will be the output of the query:
In the first step, we combine the onlinecustomers and orders tables through the inner
join clause, because the inner join returns all the matched rows between the onlinecustomers and orders tables.
In the second step, we combine the orders table with the sales table through the left join and then
filter on null values, because we need to eliminate the rows that do exist in the sales table:
1 SELECT customerName, customercity, customermail, ordertotal,salestotal
2 FROM onlinecustomers AS c
3    INNER JOIN
4    orders AS o
5    ON c.customerid = o.customerid
6    LEFT JOIN
7    sales AS s
8    ON o.orderId = s.orderId
9    WHERE s.salesId IS NULL
The result of the query will look like this:

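As a side note, the same anti-join can also be expressed with NOT EXISTS (an assumed alternative, not from the original article), which makes the intent of eliminating sold orders explicit:

-- Orders matched to customers but absent from sales
SELECT customerName, customercity, customermail, ordertotal
FROM onlinecustomers AS c
   INNER JOIN orders AS o ON c.customerid = o.customerid
WHERE NOT EXISTS (SELECT 1 FROM sales AS s
                  WHERE s.orderId = o.orderId);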
Quiz
Question: Please generate the proper query according to the below Venn diagram.

Answer: As we learned, the full join allows us to return all rows from the combined tables. The query will
be as follows:
1 SELECT customerName, customercity, customermail, ordertotal,salestotal
2 FROM onlinecustomers AS c
3    FULL JOIN
4    orders AS o
5    ON c.customerid = o.customerid
6    FULL JOIN
7    sales AS s
8    ON o.orderId = s.orderId
Conclusion
In this article, we learned the SQL multiple joins concept and saw how to combine different join types in a single query to answer various business questions.
