SQL Convert Date Functions and Formats
In this article, we will explore various SQL Convert Date formats to use in writing SQL queries.
We often need to work with date type data in SQL, and it can be complicated to deal with at times for SQL
Server developers. Suppose you have a Product table with a timestamp column that records a timestamp
for each customer order. You might face the following issues with it:
You fail to insert data into the Product table because the application tries to insert data in a
different date format
Suppose you have data in a table in the format YYYY-MM-DD hh:mm:ss. You have a daily Sales
report, and in that, you want data grouped by date, i.e. in the format YYYY-MM-DD
We face many such scenarios when we do not have a date format matching our requirement. We cannot
change table properties to satisfy each requirement. In this case, we need to use the built-in functions in
SQL Server to get the date into the required format.
Time: hh:mm:ss[.nnnnnnn]
Date: YYYY-MM-DD
We can see the various date formats in the following table. You can keep this table handy as a reference
for formatting date and time columns; it lists the style options for SQL convert date that you can use as
per your requirements.
Date format option   SQL convert date output
1                    12/30/06    (mm/dd/yy)
2                    06.12.30    (yy.mm.dd)
3                    30/12/06    (dd/mm/yy)
4                    30.12.06    (dd.mm.yy)
5                    30-12-06    (dd-mm-yy)
6                    30 Dec 06   (dd mon yy)
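As a quick illustration of how these style numbers are used with the CONVERT function (a minimal sketch;
the exact output depends on the current date), the following query formats GETDATE() with a few of the
styles from the table:

SELECT CONVERT(VARCHAR(20), GETDATE(), 1) AS [Style 1],  -- mm/dd/yy
       CONVERT(VARCHAR(20), GETDATE(), 3) AS [Style 3],  -- dd/mm/yy
       CONVERT(VARCHAR(20), GETDATE(), 6) AS [Style 6];  -- dd mon yy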
Let us next explore a function that is useful for SQL convert date.
DATEADD
We can use the SQL DATEADD function to add a particular period to a date. Suppose we have a
requirement to add 1 month to the current date. We can use the SQL DATEADD function to do this task.
The syntax for the SQL DATEADD function is as follows:
1 DATEADD(interval, number, date)
Interval: We can specify the interval that needs to be added to the specified date. We can have values
such as year, quarter, month, day, week, hour, minute, etc.
Number: It specifies the number of intervals to add. For example, if we specify the interval as month and
the number as 2, it means 2 months need to be added to the date.
Date: It specifies the date to which the interval is added.
In the following query, we want to add 2 months to the current date.
1 SELECT GETDATE() as Currentdate
2
3 SELECT DATEADD(month, 2, GETDATE()) AS NewDate;
You can see the output in the following screenshot.
Similarly, let us add 1 year to the current date using the following query.
1 select GETDATE() as Currentdate
2
3 SELECT DATEADD(Year, 1, GETDATE()) AS NewDate;
4
Conclusion
In this article, we explored various SQL convert date formats. The CONVERT function makes it easy to get
a date in the required format. You can use this article as a reference for all date formats to use in
your queries.
In the following example, we will use the SELECT statement in order to assign a value to a variable:
1 DECLARE @TestVariable AS VARCHAR(100)
2 SELECT @TestVariable = 'Save the Nature'
3 PRINT @TestVariable
Additionally, the SELECT statement can be used to assign a value to a variable from a table, view, or
scalar-valued function. Now, we will take a glance at this usage through the following example:
1 DECLARE @PurchaseName AS NVARCHAR(50)
2 SELECT @PurchaseName = [Name]
3 FROM [Purchasing].[Vendor]
4 WHERE BusinessEntityID = 1492
5 PRINT @PurchaseName
As can be seen, the @PurchaseName value has been assigned from the Vendor table.
Now, we will assign a value to variable from a scalar-valued function:
1 DECLARE @StockVal AS INT
2 SELECT @StockVal=dbo.ufnGetStock(1)
3 SELECT @StockVal AS [VariableVal]
Declaring and assigning variables one at a time in this way is tedious and inconvenient. However, we have
a more efficient way: we can declare multiple variables in one statement. We can use the DECLARE
statement in the following form so that we can assign values to these variables in one SELECT statement:
1 DECLARE @Variable1 AS VARCHAR(100), @Variable2 AS UNIQUEIDENTIFIER
2 SELECT @Variable1 = 'Save Water Save Life' , @Variable2= '6D8446DE-68DA-4169-A2C5-4C0995C00CC1'
3 PRINT @Variable1
4 PRINT @Variable2
Also, we can use a SELECT statement in order to assign values from tables to multiple variables:
1 DECLARE @VarAccountNumber AS NVARCHAR(15)
2 ,@VariableName AS NVARCHAR(50)
3 SELECT @VarAccountNumber=AccountNumber , @VariableName=Name
4 FROM [Purchasing].[Vendor]
5 WHERE BusinessEntityID = 1492
6 PRINT @VarAccountNumber
7 PRINT @VariableName
Useful tips about the SQL Variables
Tip 1: As we mentioned before, the local variable scope expires at the end of the batch. Now, we will
analyze the following example of this issue:
1 DECLARE @TestVariable AS VARCHAR(100)
2 SET @TestVariable = 'Think Green'
3 GO
4 PRINT @TestVariable
The above script generates an error because of the GO statement. The GO statement marks the end of a
batch in SQL Server, so the @TestVariable lifecycle ends at the GO statement line. A variable that is
declared above the GO statement line cannot be accessed below it. However, we can
overcome this issue by carrying the variable value with the help of a temporary table:
1 IF OBJECT_ID('tempdb..#TempTbl') IS NOT NULL DROP TABLE #TempTbl
2 DECLARE @TestVariable AS VARCHAR(100)
3 SET @TestVariable = 'Hello World'
4 SELECT @TestVariable AS VarVal INTO #TempTbl
5 GO
6 DECLARE @TestVariable AS VARCHAR(100)
7 SELECT @TestVariable = VarVal FROM #TempTbl
8 PRINT @TestVariable
Tip 2: Assume that we assigned a value from a table to a variable and the result set of the SELECT
statement returns more than one row. The main question at this point is which row's value is assigned to
the variable. In this circumstance, the value assigned to the variable will be from the last row of the result
set. In the following example, the last row of the result set will be assigned to the variable:
1 SELECT AccountNumber
2 FROM [Purchasing].[Vendor]
3 ORDER BY BusinessEntityID
4
5 DECLARE @VarAccountNumber AS NVARCHAR(15)
6 SELECT @VarAccountNumber=AccountNumber
7 FROM [Purchasing].[Vendor]
8 order by BusinessEntityID
9 SELECT @VarAccountNumber AS VarValue
Tip 3: If the declared data type of the variable and the data type of the assigned value do not match, SQL
Server makes an implicit conversion in the value assignment process, if possible. The lower precedence
data type is converted to the higher precedence data type by SQL Server, but this operation may lead to
data loss. In the following example, we will assign a float value to a variable whose data type has been
declared as an integer:
1 DECLARE @FloatVar AS FLOAT = 12312.1232
2 DECLARE @IntVar AS INT
3 SET @IntVar=@FloatVar
4 PRINT @IntVar
Conclusion
In this article, we have explored the concept of SQL variables from different perspectives, and we also
learned how to define a variable and how to assign a value(s) to it.
See more
SQL PARTITION BY Clause overview
April 9, 2019 by Rajendra Gupta
This article will cover the SQL PARTITION BY clause and, in particular, the difference with GROUP BY in a
select statement. We will also explore various use cases of SQL PARTITION BY.
We use SQL PARTITION BY to divide the result set into partitions and perform computation on each
subset of partitioned data.
It launches ApexSQL Generate. I generated a script to insert data into the Orders table. Execute this
script to insert 100 records into the Orders table.
1 USE [SQLShackDemo]
2 GO
3 INSERT [dbo].[Orders] VALUES (216090, CAST(N'1826-12-19' AS Date), N'Edward', N'Phoenix', 4713.8900)
4 GO
5 INSERT [dbo].[Orders] VALUES (508220, CAST(N'1826-12-09' AS Date), N'Aria', N'San Francisco', 9832.7200)
6 GO
7 …
Once we execute the insert statements, we can see the data in the Orders table in the following image.
We use the SQL GROUP BY clause to group results by a specified column and use aggregate functions
such as Avg(), Min(), Max() to calculate the required values.
Group By function syntax
1 SELECT expression, aggregate function ()
2 FROM tables
3 WHERE conditions
4 GROUP BY expression
Suppose we want to find the following values in the Orders table
Minimum order value in a city
Maximum order value in a city
Average order value in a city
Execute the following query with GROUP BY clause to calculate these values.
1 SELECT Customercity,
2 AVG(Orderamount) AS AvgOrderAmount,
3 MIN(OrderAmount) AS MinOrderAmount,
4 SUM(Orderamount) TotalOrderAmount
5 FROM [dbo].[Orders]
6 GROUP BY Customercity;
In the following screenshot, we can see the average, minimum and maximum values grouped by
CustomerCity. However, the GROUP BY clause collapses the result set to one row per city, so we lose the
individual order rows. We can use the SQL PARTITION BY clause to resolve this issue. Let us explore it
further in the next section.
SQL PARTITION BY
We can use the SQL PARTITION BY clause with the OVER clause to specify the column on which we
need to perform aggregation. In the previous example, we used Group By with CustomerCity column and
calculated average, minimum and maximum values.
Let us rerun this scenario with the SQL PARTITION BY clause using the following query.
1 SELECT Customercity,
2 AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
3 MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
4 SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
5 FROM [dbo].[Orders];
In the output, we get aggregated values similar to a GROUP BY clause, but you might notice a difference
between the output of the SQL PARTITION BY and GROUP BY clauses: we get a limited number of records
(one row per group) using the GROUP BY clause, whereas we get all records in the table using the
PARTITION BY clause.
In the following screenshot, you can see that for CustomerCity Chicago, it performs the aggregations
(Avg, Min and Max) and gives the values in the respective columns.
Similarly, we can use other aggregate functions, such as COUNT, to find out the total number of orders in
a particular city with the SQL PARTITION BY clause.
1 SELECT Customercity,
2 CustomerName,
3 OrderAmount,
4 COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders,
5 AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
6 MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
7 SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
8 FROM [dbo].[Orders];
We can see the order counts for a particular city. For example, we have two orders from Austin; therefore,
it shows the value 2 in the CountOfOrders column.
PARTITION BY clause with ROW_NUMBER()
We can use the SQL PARTITION BY clause with the ROW_NUMBER() function to have a row number for
each row. We define the following parameters to use ROW_NUMBER with the SQL PARTITION BY clause:
PARTITION BY column – In this example, we want to partition data on the CustomerCity column
ORDER BY – In the ORDER BY clause, we define a column or condition that determines the row number. In
this example, we want to sort data on the OrderAmount column
1 SELECT Customercity,
2 CustomerName,
3 ROW_NUMBER() OVER(PARTITION BY Customercity
4 ORDER BY OrderAmount DESC) AS "Row Number",
5 OrderAmount,
6 COUNT(OrderID) OVER(PARTITION BY Customercity) AS CountOfOrders,
7 AVG(Orderamount) OVER(PARTITION BY Customercity) AS AvgOrderAmount,
8 MIN(OrderAmount) OVER(PARTITION BY Customercity) AS MinOrderAmount,
9 SUM(Orderamount) OVER(PARTITION BY Customercity) TotalOrderAmount
10 FROM [dbo].[Orders];
In the following screenshot, we can see that for CustomerCity Chicago, we have row number 1 for the
order with the highest amount, 7577.90; ROW_NUMBER assigns the row numbers in descending order of
OrderAmount. Similarly, we can calculate the cumulative average using the following query with the SQL
PARTITION BY clause.
SELECT Customercity,
       CustomerName,
       OrderAmount,
       ROW_NUMBER() OVER(PARTITION BY Customercity
                         ORDER BY OrderAmount DESC) AS "Row Number",
       CONVERT(VARCHAR(20), AVG(Orderamount) OVER(PARTITION BY Customercity
                         ORDER BY OrderAmount DESC ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING), 1) AS CumulativeAVG
FROM [dbo].[Orders];
ROWS UNBOUNDED PRECEDING with the PARTITION BY clause
We can use ROWS UNBOUNDED PRECEDING with the SQL PARTITION BY clause to define a window that
starts at the first row of the partition and ends at the current row, so the aggregate for each row covers
the current row and all preceding rows in its partition.
In the following table, we can see that for row 1, there is no preceding row with a higher value in its
partition; therefore, the cumulative average is the same as the OrderAmount of row 1.
For row 2, it looks at the current row value (7199.61) and the highest-value row 1 (7577.90) and calculates
the average of these two amounts.
For row 3, it looks at the current value (6847.66) and the higher amounts 7199.61 and 7577.90, calculates
the average of these, and returns it.
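The query that produces this cumulative average is not shown above; a sketch of it, assuming the same
Orders table and column names used throughout this article, uses ROWS UNBOUNDED PRECEDING so that
each row averages itself with all preceding (higher-amount) rows in its partition:

SELECT Customercity,
       CustomerName,
       OrderAmount,
       CONVERT(VARCHAR(20), AVG(Orderamount) OVER(PARTITION BY Customercity
               ORDER BY OrderAmount DESC ROWS UNBOUNDED PRECEDING), 1) AS CumulativeAvg
FROM [dbo].[Orders];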
This article explains the process of performing SQL delete activity for duplicate rows from a SQL table.
Introduction
We should follow certain best practices while designing objects in SQL Server. For example, a table
should have primary keys, identity columns, clustered and non-clustered indexes, and constraints to
ensure data integrity and performance. Even if we follow these best practices, we might still face issues
such as duplicate rows. We might also get duplicate data in intermediate tables during data import, and
we want to remove the duplicate rows before actually inserting them into the production tables.
Suppose your SQL table contains duplicate rows and you want to remove them. We face these issues
many times. It is a best practice to use the relevant keys and constraints to eliminate the possibility of
duplicate rows; however, if we already have duplicate rows in the table, we need to follow specific
methods to clean them up. This article explores the different methods to remove duplicate data from a
SQL table.
Let’s create a sample Employee table and insert a few records in it.
1 CREATE TABLE Employee
2 (
3 [ID] INT identity(1,1),
4 [FirstName] Varchar(100),
5 [LastName] Varchar(100),
6 [Country] Varchar(100)
7 )
8 GO
9
10 Insert into Employee ([FirstName],[LastName],[Country] )values('Raj','Gupta','India'),
11 ('Raj','Gupta','India'),
12 ('Mohan','Kumar','USA'),
13 ('James','Barry','UK'),
14 ('James','Barry','UK'),
15 ('James','Barry','UK')
In the table, we have a few duplicate records, and we need to remove them.
SQL delete duplicate Rows using Group By and
having clause
In this method, we use the SQL GROUP BY clause to identify the duplicate rows. The GROUP BY clause
groups data as per the defined columns, and we can use the COUNT function to check the occurrence of
a row.
For example, execute the following query, and we get those records having an occurrence greater than 1
in the Employee table.
1 SELECT [FirstName],
2 [LastName],
3 [Country],
4 COUNT(*) AS CNT
5 FROM [SampleDB].[dbo].[Employee]
6 GROUP BY [FirstName],
7 [LastName],
8 [Country]
9 HAVING COUNT(*) > 1;
To remove the duplicate data, replace the first SELECT with a SQL DELETE statement that keeps only the
row with the maximum ID in each group, as per the following query.
1 DELETE FROM [SampleDB].[dbo].[Employee]
2 WHERE ID NOT IN
3 (
4 SELECT MAX(ID) AS MaxRecordID
5 FROM [SampleDB].[dbo].[Employee]
6 GROUP BY [FirstName],
7 [LastName],
8 [Country]
9 );
Once you execute the delete statement, perform a select on the Employee table, and we get the
remaining records, which do not contain duplicate rows.
In the screenshot, you can note that we need to remove the rows having a rank greater than one. Let's
remove those rows using the following query.
1 DELETE E
2 FROM [SampleDB].[dbo].[Employee] E
3 INNER JOIN
4 (
5 SELECT *,
6 RANK() OVER(PARTITION BY firstname,
7 lastname,
8 country
9 ORDER BY id) rank
10 FROM [SampleDB].[dbo].[Employee]
11 ) T ON E.ID = t.ID
12 WHERE rank > 1;
Click on Preview data, and you can see we still have duplicate data in the source table.
Add a Sort operator from the SSIS toolbox for the SQL delete operation and join it with the source
data.
To configure the Sort operator, double-click on it and select the columns that contain duplicate values. In
our case, the duplicate values are in the [FirstName], [LastName] and [Country] columns.
We can also use the ascending or descending sorting types for the columns. The default sort method is
ascending. In the sort order, we can choose the column sort order. Sort order 1 shows the column which
will be sorted first.
On the bottom left side, notice the checkbox Remove rows with duplicate sort values.
It will do the task of removing duplicate rows from the source data for us. Let's put a tick in this checkbox
and click OK. It performs the SQL delete activity in the SSIS package.
Once we click OK, it returns to the data flow tab, and we can see the following SSIS package.
We can add SQL Server destinations to store the data after removing duplicate rows. Here we only want
to check whether the Sort operator is doing the task for us or not.
Add a SQL Multicast Transformation from the SSIS toolbox as shown below.
To view the distinct data, right-click on the connector between Sort and Multicast. Click on Enable Data
Viewer.
The overall SSIS package looks like below.
Execute the package to perform SQL delete operation. It opens the Sort output data viewer at the Data
flow task. In this data viewer, you can see distinct data after removing the duplicate values.
Close this, and the SSIS package shows that it executed successfully.
Conclusion
In this article, we explored the process of SQL delete of duplicate rows using various approaches such as
T-SQL, CTE, and the SSIS package. You can use the method with which you feel most comfortable.
However, I would suggest not implementing these procedures and packages on production data directly;
you should test them in a lower environment first.
In this article, we will learn different methods that are used to update the data in a table with the data of
other tables. The UPDATE from SELECT query structure is the main technique for performing these
updates.
An UPDATE query is used to change an existing row or rows in the database. UPDATE queries can
change all of a table's rows, or we can limit the update to certain rows with the help of
the WHERE clause. Mostly, we use constant values to change the data, as in the following structures.
1 UPDATE table
2 SET col1 = constant_value1 , col2 = constant_value2 , colN = constant_valueN
The conditional update statement is used to change the data that satisfies the WHERE condition.
1 UPDATE table
2 SET col1 = constant_value1 , col2 = constant_value2 , colN = constant_valueN
3 WHERE col = val
However, for different scenarios, using constant values may not be enough for us, and we need to
use other tables' data in order to update our table. This type of update statement is a bit more
complicated than the usual structures. In the following sections, we will learn how to write this type of
update query with different methods, but first, we have to prepare our sample data. So let's do this.
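The sample-data script itself is not shown here, but the general shape of an update from a select, joining
the target table to a source table, is worth sketching. The Persons and AddressList names below are taken
from the MERGE example later in this article and stand in for whatever source and target tables you have:

UPDATE Per
SET Per.PersonPostCode = Addr.PostCode
FROM Persons AS Per
INNER JOIN AddressList AS Addr
    ON Addr.PersonID = Per.PersonID;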
After the execution of the update from a select query, the output of the Persons table will be as shown
below:
1 SELECT * FROM Persons
Performance Tip:
Indexes are very helpful database objects for improving query performance in SQL Server. In particular, if
we are working on the performance of an update query, we should take this into account. The following
execution plan illustrates the execution plan of the previous query; the only difference is that this query
updated 3,000,000 rows of the Persons table. This query completed within 68 seconds.
We then added a non-clustered index on the Persons table before the update; the added index involves
the PersonCityName and PersonPostCode columns as the index key.
The following execution plan demonstrates the execution plan of the same query, but this time it
completed within 130 seconds because of the added index, unlike the first one.
The Index Update and Sort operators consume 74% of the cost of the execution plan. We have seen this
obvious performance difference between the same queries because of index usage on the updated
columns. As a result, if the updated columns are used by indexes, like this, the query performance might
be affected negatively. In particular, we should consider this problem if we will update a large number of
rows. To overcome this issue, we can disable or remove the index before executing the update query.
On the other hand, a warning sign is shown on the Sort operator, and it indicates that something did not
go well for this operator. When we hover the mouse over this operator, we can see the warning details.
During the execution of the query, the query optimizer calculates the required memory consumption for
the query based on the estimated row count and row size. However, this estimation can be wrong for a
variety of reasons, and if the query requires more memory than estimated, it uses tempdb. This
mechanism is called a tempdb spill and causes performance loss, because memory is always faster than
the tempdb database, which uses disk resources.
You can read the SQL Server 2017: SQL Sort, Spill, Memory and Adaptive Memory Grant
Feedback article for more details about the tempdb spill issue.
Now let’s tackle the previous update from a select query line by line.
1 MERGE Persons AS Per
We have typed the Persons table after the MERGE statement because it is our target table, which we
want to update, and we gave it the alias Per in order to use it in the rest of the query.
1 USING(SELECT * FROM AddressList) AS Addr
After the USING statement, we have specified the source table.
1 ON Addr.PersonID=Per.PersonID
With the help of this syntax, the join condition is defined between the target and source table.
1 WHEN MATCHED THEN
2 UPDATE SET Per.PersonPostCode=Addr.PostCode;
In this last line of the query, we chose the manipulation method for the matched rows. Specifically, for
this query, we selected the UPDATE method for the matched rows of the target table. Finally, we
added the semicolon (;) because MERGE statements must end with a semicolon.
Many times, the subquery update method may not offer satisfying performance, as sketched below.
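For reference, a hedged sketch of that subquery update method, reusing the same Persons and
AddressList names, looks like the following; a correlated subquery runs against AddressList for every row
of Persons, which is often why it performs worse than the join or MERGE forms:

UPDATE Persons
SET PersonPostCode = (SELECT Addr.PostCode
                      FROM AddressList AS Addr
                      WHERE Addr.PersonID = Persons.PersonID);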
Conclusion
How to backup and restore MySQL databases
using the mysqldump command
May 12, 2020 by Nisarg Upadhyay
In this article, I am going to explain different ways to generate a backup in the MySQL database server.
As we know, data is a valuable asset to the organization. As database administrators, it is our primary and
crucial job to keep the data available and safe. In the event of a system or data center failure, database
corruption, or data loss, we must be able to recover the data within the defined SLA.
Different database platforms provide various methods to generate a backup and restore the database.
Many vendors provide state-of-the-art software and hardware solutions that can help back up the
database and restore it within the defined RTO and RPO.
Here, we are not going to discuss any third-party vendor’s backup solutions. I am going to explain the
native methods that are used to generate the backup of the database. We can generate the backup of
the MySQL database using any of the following methods:
1. Generate the backup using mysqldump utility
2. Generate Incremental backups using Binary Log
3. Generate backups using the Replication of Slaves
In this article, I am going to explain how we can use mysqldump to generate the backup of the MySQL
database.
As you can see in the above screenshot, the backup file contains the various SQL statements that can
be used to insert data into the tables.
The SQL WHILE loop provides us with the ability to execute SQL statement(s) repeatedly until the
specified condition turns out to be false.
In the following sections of this article, we will use flowcharts in order to explain the notions and
examples. For this reason, we will first briefly explain what a flowchart is. A flowchart is a visual diagram
made up of geometric symbols that helps to explain algorithms visually. It is used to design and
document algorithms simply; each geometric symbol in the flowchart has a different meaning.
The following flowchart explains the essential structure of the WHILE loop in SQL:
As you can see, in each iteration of the loop, the defined condition is checked, and then, according to the
result of the condition, the code flow is determined. If the result of the condition is true, the SQL
statement will be executed. Otherwise, the code flow will exit the loop. If any SQL statement exists
outside the loop, it will be executed.
Now, we will handle the WHILE loop example line by line and examine it with details.
In this part of the code, we declare a variable, and we assign an initializing value to it:
1 DECLARE @Counter INT
2 SET @Counter=1
This part of the code has the specified condition: as long as the variable value is less than or equal to 10,
the loop continues and executes the PRINT statement. Otherwise, the WHILE condition is no longer
satisfied, and the loop ends:
1 WHILE ( @Counter <= 10)
In this last part of the code, we executed the SQL statement, and then we incremented the value of the
variable:
1 BEGIN
2 PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
3 SET @Counter = @Counter + 1
4 END
The following flowchart illustrates this WHILE loop example visually:
Infinite SQL WHILE loop
In an infinite loop, also known as an endless loop, the condition result never becomes false, so the loop
never ends and can run forever. Imagine that we have a WHILE loop and we don't increment the value of
the variable. In this scenario, the loop runs endlessly and never ends. We will now demonstrate this
scenario with the following example. One thing to keep in mind is that we should not forget to cancel the
execution of the query manually:
1 DECLARE @Counter INT
2 SET @Counter=1
3 WHILE ( @Counter <= 10)
4 BEGIN
5 PRINT 'Somebody stops me!'
6
7 END
In the following flowchart, it is obvious that the value of the variable never changes; therefore, the loop
never ends. The reason for this issue is that the variable is always equal to 1 so the condition returns true
for each iteration of the loop:
BREAK statement
The BREAK statement is used in the SQL WHILE loop in order to exit the current iteration of the loop
immediately when certain conditions occur. Generally, an IF…ELSE statement is used to check whether
the condition has occurred or not. Refer to the SQL IF Statement introduction and overview article for
more details about the IF…ELSE statement.
The following example shows the usage of the BREAK statement in the WHILE loop:
1 DECLARE @Counter INT
2 SET @Counter=1
3 WHILE ( @Counter <= 10)
4 BEGIN
5 PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
6 IF @Counter >=7
7 BEGIN
8 BREAK
9 END
10 SET @Counter = @Counter + 1
11 END
In this example, we checked the value of the variable, and when the value was equal to or greater than 7,
the code entered the IF…ELSE block and executed the BREAK statement, so it exited the loop
immediately. For this reason, the message shows the values of the variable only up to 7. If the condition
of the IF…ELSE statement is not met, the loop runs until the condition result is false. The following
flowchart explains the working logic of the BREAK statement example visually:
CONTINUE statement
The CONTINUE statement is used in the SQL WHILE loop in order to stop the current iteration of the loop
when certain conditions occur, and then it starts a new iteration from the beginning of the loop. Assume
that we want to print only even numbers in a WHILE loop. We can use the CONTINUE statement to
achieve this. In the following example, we check whether the variable value is odd or even. If the variable
value is odd, the code enters the IF…ELSE block, increments the value of the variable, executes the
CONTINUE statement, and starts a new iteration:
1 DECLARE @Counter INT
2 SET @Counter=1
3 WHILE ( @Counter <= 20)
4 BEGIN
5
6 IF @Counter % 2 =1
7 BEGIN
8 SET @Counter = @Counter + 1
9 CONTINUE
10 END
11 PRINT 'The counter value is = ' + CONVERT(VARCHAR,@Counter)
12 SET @Counter = @Counter + 1
13 END
The following flowchart explains the working logic of the CONTINUE statement example visually:
In this example, we read the table rows via the WHILE loop, as sketched below. We can also develop more
sophisticated and advanced loops based on our needs.
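The table-reading loop referred to above is not included in this excerpt; a minimal sketch of the pattern,
reusing the Employee sample table created earlier in this article and assuming its ID values are roughly
sequential, could look like this:

DECLARE @CurrentId INT, @MaxId INT
SELECT @CurrentId = MIN(ID), @MaxId = MAX(ID) FROM Employee
WHILE (@CurrentId <= @MaxId)
BEGIN
    -- Process one row per iteration
    SELECT FirstName, LastName, Country FROM Employee WHERE ID = @CurrentId
    SET @CurrentId = @CurrentId + 1
END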
While working with raw data, you may frequently face date values stored as text. Converting these values
to a date data type is very important, since dates may be more valuable during analysis. In SQL Server,
converting a string to a date can be achieved using different approaches.
In general, there are two types of data type conversions:
1. Implicit, where conversions are not visible to the user; the data type is changed while loading data
without using any function (see the short sketch after this list)
2. Explicit, where conversions are visible to the user and are performed using the CAST or
CONVERT functions or other tools
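As a small illustration of the implicit case (the variable name below is just an assumption for
demonstration), assigning a string literal to a date-typed variable converts it without calling any function:

DECLARE @ImplicitDate DATE = '2019-12-13'  -- the string literal is implicitly converted to DATE
SELECT @ImplicitDate AS ConvertedValue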
In this article, we will explain how a string to date conversion can be achieved implicitly, or explicitly in
SQL Server using built-in functions such as CAST(), TRY_CAST(), CONVERT(), TRY_CONVERT() and
TRY_PARSE().
Note: Before we start, please note that some of the SQL statements used are meaningless from
the data context perspective and are just used to explain the concept.
You can check out this official documentation here to learn more about how to change SQL Server
language settings.
Additionally, you can read more about implicitly converting date types in SQL Server, by referring to this
article: Implicit conversion in SQL Server.
Note that in SQL Server, converting a string to a date using the CAST() function depends on the language
settings, similar to implicit conversion, as we mentioned in the previous section, so you can only convert
ISO formats or formats supported by the current language settings.
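For example, a hedged sketch under us_english language settings: the ISO format converts with CAST(),
while the dd/MM/yyyy format would fail:

SELECT CAST('2019-12-13' AS DATE) AS UsingCast;   -- ISO format, succeeds regardless of language
-- SELECT CAST('13/12/2019' AS DATETIME);         -- would fail under us_english language settings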
CONVERT()
The CONVERT() function is more advanced than CAST() since the conversion style can be specified. This
function takes 3 arguments: (1) the desired data type, (2) the input value and (3) the style number
(optional).
If the style number is not passed to the function, it acts like the CAST() function. But if the style argument
is passed, it will try to convert the value based on that style. As an example, if we try to convert the
"13/12/2019" value to a date without specifying the style number, it will fail since it is not supported by
the current language setting:
1 SELECT CONVERT(DATETIME,'13/12/2019')
Result:
But, if we pass 103 as the style number (103 corresponds to the dd/MM/yyyy date format), it will succeed:
1 SELECT CONVERT(DATETIME,'13/12/2019',103)
Result:
For more information about CONVERT() function and date style numbers, you can refer to the following
articles:
SQL Convert Date functions and formats
How to convert from string to datetime?
PARSE()
PARSE() is a SQL CLR function that uses the .NET Framework Parse() function. The PARSE() syntax is as
follows:
PARSE(<value> AS <data type> [USING <culture>])
If the culture info is not specified, PARSE() acts similar to the CAST() function, but when the culture is
passed within the expression, the function tries to convert the value to the desired data type using this
culture. As an example, if we try to parse the 13/12/2019 value without passing the culture information,
it will fail since "dd/MM/yyyy" is not supported by the default language settings.
But, if we pass "ar-LB" as the culture (Arabic – Lebanon), where "dd/MM/yyyy" is supported, the
conversion succeeds:
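A sketch of the two calls described above (using the culture name mentioned in the text) would be:

-- Fails under the default (us_english) language settings
-- SELECT PARSE('13/12/2019' AS DATETIME) AS UsingParse;

-- Succeeds, because the ar-LB culture supports dd/MM/yyyy
SELECT PARSE('13/12/2019' AS DATETIME USING 'ar-LB') AS UsingParse;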
Conclusion
In this article, we explained data conversion approaches in general. Then we showed how, while using
SQL Server, converting a string to date can be achieved using these approaches. We explained the
system functions provided by SQL Server by giving some examples and external links that provide more
details.
See more
SELECT INTO TEMP TABLE statement in SQL
Server
June 21, 2021 by Esat Erkec
In this article, we will explore the SELECT INTO TEMP TABLE statement, its syntax and usage details, and
also give some simple examples to reinforce the learning.
Introduction
The SELECT INTO statement is one of the easy ways to create a new table and then copy the source table
data into this newly created table. In other words, the SELECT INTO statement performs a combination of
tasks:
Creates a clone of the source table with exactly the same column names and data types
Reads data from the source table
Inserts data into the newly created table
We can use the SELECT INTO TEMP TABLE statement to perform the above tasks in one statement for
temporary tables. In this way, we can copy the source table data into a temporary table quickly.
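A minimal sketch, assuming the AdventureWorks Production.Location table used in the examples below,
copies the table's data into a temporary table in one statement:

SELECT LocationID, Name, ModifiedDate
INTO #TempLocation
FROM Production.Location
GO
SELECT * FROM #TempLocation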
One point to notice here is that the temporary table and source table column names are the same. In
order to change the column names of the temporary table, we can give aliases to the source table
columns in the select query.
1 SELECT LocationID AS [TempLocationID],
2 Name AS [TempLocationName] ,ModifiedDate AS [TempModifiedDate]
3 INTO #TempLocationCol FROM Production.Location
4 GO
5 SELECT * FROM #TempLocationCol
At the same time, we can filter some rows of the Location table and then insert the result set into a
temporary table. The following query filters the rows in which the Name column starts with the "F"
character and then inserts the result set into the temporary table.
1 SELECT LocationID,Name,ModifiedDate INTO #TempLocationCon FROM Production.Location
2 WHERE Name LIKE 'F%'
3 GO
4 SELECT * FROM #TempLocationCon
The INSERT INTO ... SELECT approach requires the destination temporary table to be declared explicitly,
which allows the flexibility to change column data types and to create indexes; the SELECT INTO
statement, by contrast, creates the destination temporary table automatically.
The Gather Streams operator merges several parallel operations into a single operation. In this query
execution plan, we have used the ORDER BY clause, but we cannot see any Sort operator in the
execution plan. At the same time, the Clustered Index Scan operator does not return rows in a sorted
manner. The reason for this is that there is no guarantee of the order in which rows are inserted into the
table.
Conclusion
In this article, we have learned the syntax and usage details of the SELECT INTO TEMP TABLE statement.
This statement is very practical to insert table data or query data into the temporary tables.
In this article, I am going to give a detailed explanation of how to use the SQL MERGE statement in SQL
Server. The MERGE statement in SQL is a very popular clause that can handle inserts, updates, and
deletes all in a single transaction without having to write separate logic for each of these. You can specify
the conditions on which you expect the MERGE statement to insert, update, or delete.
Using the MERGE statement in SQL gives you better flexibility in customizing your complex SQL scripts
and also enhances the readability of your scripts. The MERGE statement basically modifies an existing
table based on the result of a comparison between its key fields and those of another table in the context.
Figure 1 – MERGE Illustration
The above illustration depicts how a SQL MERGE statement basically works. As you can see, there are two
circles that represent two tables and can be considered as the Source and the Target. The MERGE
statement tries to compare the source table with the target table based on a key field and then does
some of the processing. The MERGE statement actually combines the INSERT, UPDATE, and DELETE
operations altogether. Although the MERGE statement is a little more complex than simple INSERTs or
UPDATEs, once you are able to master the underlying concept, you can easily use SQL MERGE more often
than the individual INSERTs or UPDATEs.
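A hedged sketch of a complete MERGE statement, using hypothetical TargetProducts and SourceProducts
tables (not taken from this article), shows the three operations handled in a single statement:

MERGE TargetProducts AS Tgt
USING SourceProducts AS Src
    ON Tgt.ProductID = Src.ProductID
WHEN MATCHED THEN
    UPDATE SET Tgt.Price = Src.Price
WHEN NOT MATCHED BY TARGET THEN
    INSERT (ProductID, ProductName, Price)
    VALUES (Src.ProductID, Src.ProductName, Src.Price)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;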
Conclusion
In this article, I have explained the SQL MERGE statement in detail. The MERGE statement was introduced
in SQL Server 2008 and brought a great improvement in writing simpler and more maintainable code in
SQL. The MERGE statement takes two tables – a source and a target – compares the records based on a
key column, often the index column, and then performs an operation on them. Being a database
developer, I would definitely advise all young programmers to start using the SQL MERGE statement more
frequently in complex stored procedures in SQL.
SQL Server table hints are a special type of explicit command that is used to override the default
behavior of the SQL Server query optimizer during T-SQL query execution. This is accomplished by
enforcing a specific locking method, a specific index, or a query processing operation, such as an index
seek or table scan, to be used by the SQL Server query optimizer when building the query execution plan.
The table hints can be added to the FROM clause of the T-SQL query, affecting only the table or the view
that is referenced in the FROM clause.
One of the more heavily used table hints in the SELECT T-SQL statements is the WITH (NOLOCK) hint.
The default transaction isolation level in SQL Server is the READ COMMITTED isolation level, in which
retrieving data that is being changed is blocked until these changes are committed. The WITH (NOLOCK)
table hint is used to override the default transaction isolation level of the table, or the tables within the
view, in a specific query, by allowing the user to retrieve the data without being affected by the locks on
the requested data due to another process that is changing it. In this way, the query will consume less
memory in holding locks against that data. In addition, no deadlock will occur between queries that are
requesting the same data from that table, allowing a higher level of concurrency due to a lower
footprint. In other words, the WITH (NOLOCK) table hint retrieves the rows without waiting for the other
queries, that are reading or modifying the same data, to finish processing. This is similar to the READ
UNCOMMITTED transaction isolation level, which allows the query to see data changes before the
transaction that is changing them is committed. The transaction isolation level can be set globally at the
connection level using the SET TRANSACTION ISOLATION LEVEL T-SQL command, as we will see later in
this article.
Although the NOLOCK table hint, similar to all other table hints, can be used without using the WITH
keyword, Microsoft announced that omitting the WITH keyword is a deprecated feature and will be
removed from future Microsoft SQL Server versions. With that said, it is better to include the WITH
keyword when specifying the table hints. One benefit of using the WITH keyword is that you can specify
multiple table hints using the WITH keyword against the same table.
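As a small hedged illustration (the hint combination below is chosen only for demonstration, against the
LockTestDemo table created later in this article), two hints can share one WITH clause:

SELECT * FROM LockTestDemo WITH (NOLOCK, INDEX(1))  -- a locking hint and an index hint in a single WITH clause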
In general, frequently using explicit table hints is considered a bad practice that you should generally
avoid. For the NOLOCK table hint specifically, reading uncommitted data that could be rolled back after
you have read it can lead to a dirty read, which can occur when reading data that is being modified
or deleted during the uncommitted read, so that the data you read could be different, or may never even
have existed.
The WITH (NOLOCK) table hint also leads to Nonrepeatable reads; this read occurs when it is required
to read the same data multiple times and the data changes during these readings. In this case, you will
read multiple versions of the same row.
Phantom reads can also be a result of using the WITH (NOLOCK) table hint, in which you will get more
records when the transaction that is inserting new records is rolled back, or fewer records when the
transaction that is deleting existing data is rolled back. Another problem may occur when other
transactions move data you have not read yet to a location that you have already scanned, or add
new pages to a location that you have already scanned. In this case, you will miss these records and
will not see them in the returned result. If another transaction moves data that you have already scanned
to a new location that you have not read yet, you will read the data twice. Also, as the requested data
could be moved or deleted during your reading process, you could face the error below:
Msg 601, Level 12, State 1
Could not continue scan with NOLOCK due to data movement.
The WITH (NOLOCK) table hint can be a good idea when the system uses explicit transactions heavily,
which blocks data reads very frequently. The WITH (NOLOCK) table hint is used when working with
systems that accept out-of-sync data, such as reporting systems.
To understand the usage of the WITH (NOLOCK) table hint practically, let us create a new table using the
CREATE TABLE T-SQL statement below:
1 USE SQLShackDemo
2 GO
3 CREATE TABLE LockTestDemo
4 ( ID INT IDENTITY (1,1) PRIMARY KEY,
5 EmpName NVARCHAR(50),
6 EmpAddress NVARCHAR(4000),
7 PhoneNumber VARCHAR(50)
8 )
After creating the table, we will fill it with 100K rows for testing purposes, using ApexSQL Generate, SQL
test data generator, as shown in the snapshot below:
Once the table is ready, we will simulate a blocking scenario, in which an UPDATE statement is executed
within a transaction that is begun but not committed or rolled back. The BEGIN TRAN
T-SQL statement below starts the transaction and runs the following UPDATE statement on the
LockTestDemo table under SQL session number 53, without closing the transaction by committing or
rolling it back:
1 BEGIN TRAN
2 UPDATE LockTestDemo SET EmpAddress = 'AMM' WHERE ID <100
With the table’s data locked by the transaction, we will run another SELECT statement, under SQL session
number 54, that retrieves data from the LockTestDemo table, using the SELECT statement below:
1 SELECT * FROM LockTestDemo
You will see that the previous SELECT statement takes a long time without retrieving any records.
Let us check what is blocking that SELECT query using the sp_who2 command with the session numbers
of both the SELECT and the UPDATE statements:
1 sp_who2 53
2 GO
3 sp_who2 54
The result will show you that the previously opened transaction is not performing any action, as the
UPDATE statement executed successfully. But due to the fact that the transaction is not committed or
rolled back yet, it still blocks other queries that are trying to get data from that table. The SELECT
statement that is running under session 54 is blocked by the transaction that is running under session
53, as shown in the result below:
The previous SELECT statement will keep waiting for the transaction to be killed, committed or rolled
back in order to get the requested rows from that table. You can stop the transaction that is running
under session 53 from blocking other queries by killing that session using the KILL command below:
1 KILL 53
Or simply committing or rolling back that transaction, by running the COMMIT or ROLLBACK command
under the same session of the transaction, if applicable, as shown below:
Once the locking is released, you will see that the requested rows will be retrieved from the SELECT
statement directly as shown in the results below:
The previous solution is not always preferable or applicable, for example, when the transaction that is
blocking our queries is critical and not easy to be killed or rolled back, or when you don’t have control
over other’s transactions within the database. In this case, the WITH (NOLOCK) table hint is useful here, if
you can tolerate the risk of dirty reads or data inconsistency. As mentioned previously, the WITH
(NOLOCK) table hint allows you to read the data that has been changed, but not committed to the
database yet. If you run the same SELECT statement without killing, committing or rolling back the
UPDATE transaction, but this time adding the WITH (NOLOCK) table hint to the table name in the SELECT
statement as shown below:
1 SELECT * FROM LockTestDemo WITH (NOLOCK)
Then check the SELECT statement status using the sp_who2 command. You will see that the query is
running without waiting for the UPDATE transaction to complete and release the lock
on the table, as shown in the snapshot below:
The WITH (NOLOCK) table hint works the same as the READUNCOMMITTED table hint, allowing us to
retrieve the data that is changed but not committed yet. The same SELECT statement can be modified to
use the READUNCOMMITTED table hint as shown below:
1 SELECT * FROM LockTestDemo WITH (READUNCOMMITTED)
It retrieves the requested data directly, without waiting for the UPDATE statement to release the lock it
holds on the table, returning the same result as shown in the result set below:
Take into consideration that the WITH (NOLOCK) and READUNCOMMITTED table hints can only be used
with SELECT statements. If you try to use the WITH (NOLOCK) table hint in a DELETE statement, you
will get an error showing that the WITH (NOLOCK) and READUNCOMMITTED table hints are not
allowed with UPDATE, INSERT, DELETE or MERGE T-SQL statements, as shown below:
Rather than allowing a dirty read at the query level using the WITH (NOLOCK) and READUNCOMMITTED
table hints, you can change the transaction isolation level at the connection level to be READ
UNCOMMITTED using the SET TRANSACTION ISOLATION LEVEL T-SQL statement below:
1 SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
2 SELECT * FROM LockTestDemo
This query will also retrieve the same data directly, without using any table hint and without waiting for
the UPDATE statement to release the lock it performed on the table, as shown in the result set below:
From the previous results, you may think that this is the perfect solution for such scenarios, where you
will get the requested data faster, without waiting for other operations to be committed, at the risk
of getting inaccurate data. But will the SELECT query that is using the WITH (NOLOCK) table hint
negatively affect other processes on the SQL Server? To get the answer, let us first check what types of
locks the WITH (NOLOCK) table hint is granted during its execution. This can be achieved by simply
running the sp_lock command with the session number of the running query, while the query is running,
as shown below:
1 sp_lock 54
You will see from the result that the query that is using the WITH (NOLOCK) table hint will be
granted S and Sch-S locking types, as shown in the result below:
From the previous result, you will see that the WITH (NOLOCK) table hint will be granted shared access
(S) lock at the database level. The shared access (S) lock is used for reading operation, allowing
concurrent transactions to read data under pessimistic concurrency control, preventing other
transactions from modifying the locked resource while shared (S) locks exist on that resource, until that
locking is released as soon as the read operation completes.
The second kind of locking that is granted to the query using the WITH (NOLOCK) table hint is
the schema stability (Sch-S) lock. This lock will not prevent any other transaction from accessing the
resources except for the concurrent DDL operations, and concurrent DML operations that acquire
schema modification (Sch-M) locks on the same table, that will be blocked while the query is executing.
This really makes sense, as you do not want to start reading data from the table and then have another
transaction change the structure of that table during your data retrieval process. The SQL Server
Database Engine uses schema modification (Sch-M) locks while processing data definition language (DDL)
commands, such as adding a new column, dropping an existing column, or dropping or rebuilding indexes,
to prevent concurrent access to the table until the lock is released.
If we check the locks that are performed by each query, using the sys.dm_tran_locks system object as in
the query below:
1 SELECT *
2 FROM sys.dm_tran_locks
3 WHERE resource_type = 'OBJECT'
You will see that the DROP/CREATE INDEX process running under session number 58 is waiting to
acquire a schema modification (Sch-M) lock. This occurs because the schema modification
(Sch-M) lock cannot be acquired while the schema stability (Sch-S) lock that was granted to the
SELECT statement running under session number 53 still exists, as shown in the snapshot below:
You can imagine the situation when you are scheduling a huge number of reports at night that use
the WITH (NOLOCK) table hint just to be safe, while at the same time maintenance jobs are
scheduled to rebuild heavily fragmented indexes on the same table!
There are a number of best practices and suggestions that you can follow in order to avoid the problems
that you may face when using the WITH (NOLOCK) table hint. These suggestions include:
Include only the columns that are really required in your SELECT query
Make sure that your transaction is short, by separating different operations from each other. For
example, do not include a huge SELECT statement between two UPDATE operations
Try to find an alternative to the cursors
Take care to utilize and benefit from the newly defined WAIT_AT_LOW_PRIORITY option to do an
online rebuild for the indexes
Study reporting vs maintenances schedules well
Take care to utilize and benefit from the different SQL Server high availability solutions for
reporting purposes, such as:
o Configure the Always On Availability Groups secondary replicas to be readable and use it for
reporting
o Create database snapshots when using the SQL Server Database Mirroring and use it for
reporting
o Use the SQL Server Replication subscriber database for reporting
o Use the secondary database of the SQL Server Log Shipping for reporting
We perform calculations on data using various aggregate functions such as MAX, MIN, and AVG, and we
get a single output row from these functions. SQL Server provides SQL RANK functions to specify a rank
for individual rows as per their categorization. They return an aggregated value for each participating
row. SQL RANK functions are also known as window functions.
Note: The term "window" here does not relate to the Microsoft Windows operating system. These are
SQL RANK functions.
We have the following rank functions.
ROW_NUMBER()
RANK()
DENSE_RANK()
NTILE()
In the SQL RANK functions, we use the OVER() clause to define a set of rows in the result set. We can also
use SQL PARTITION BY clause to define a subset of data in a partition. You can also use Order by clause
to sort the results in a descending or ascending order.
Before we explore these SQL RANK functions, let’s prepare sample data. In this sample data, we have
exam results for three students in Maths, Science and English subjects.
1 CREATE TABLE ExamResult
2 (StudentName VARCHAR(70),
3 Subject VARCHAR(20),
4 Marks INT
5 );
6 INSERT INTO ExamResult
7 VALUES
8 ('Lily',
9 'Maths',
10 65
11 );
12 INSERT INTO ExamResult
13 VALUES
14 ('Lily',
15 'Science',
16 80
17 );
18 INSERT INTO ExamResult
19 VALUES
20 ('Lily',
21 'english',
22 70
23 );
24 INSERT INTO ExamResult
25 VALUES
26 ('Isabella',
27 'Maths',
28 50
29 );
30 INSERT INTO ExamResult
31 VALUES
32 ('Isabella',
33 'Science',
34 70
35 );
36 INSERT INTO ExamResult
37 VALUES
38 ('Isabella',
39 'english',
40 90
41 );
42 INSERT INTO ExamResult
43 VALUES
44 ('Olivia',
45 'Maths',
46 55
47 );
48 INSERT INTO ExamResult
49 VALUES
50 ('Olivia',
51 'Science',
52 60
53 );
54 INSERT INTO ExamResult
55 VALUES
56 ('Olivia',
57 'english',
58 89
59 );
We have the following sample data in the ExamResult table.
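The query behind the screenshot described next is not included here; presumably it is the default
(ascending) ROW_NUMBER form, along these lines:

SELECT Studentname,
       Subject,
       Marks,
       ROW_NUMBER() OVER(ORDER BY Marks) RowNumber
FROM ExamResult;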
By default, ROW_NUMBER sorts the data in ascending order and starts assigning a row number to each
row. In the above screenshot, we get row number 1 for marks of 50.
We can specify descending order with the ORDER BY clause, and it changes the row numbers accordingly.
1 SELECT Studentname,
2 Subject,
3 Marks,
4 ROW_NUMBER() OVER(ORDER BY Marks desc) RowNumber
5 FROM ExamResult;
RANK() SQL RANK Function
We use RANK() SQL Rank function to specify rank for each row in the result set. We have student results
for three subjects. We want to rank the result of students as per their marks in the subjects. For example,
in the following screenshot, student Isabella got the highest marks in English subject and lowest marks in
Maths subject. As per the marks, Isabella gets the first rank in English and 3rd place in Maths subject.
Execute the following query to get this result set. In this query, you can note the following things:
We use PARTITION BY Studentname clause to perform calculations on each student group
Each subset should get rank as per their Marks in descending order
The result set uses Order By clause to sort results on Studentname and their rank
1 SELECT Studentname,
2 Subject,
3 Marks,
4 RANK() OVER(PARTITION BY Studentname ORDER BY Marks DESC) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7 Rank;
Let's execute the following query of the SQL RANK function and look at the result set. In this query, we
did not specify the SQL PARTITION BY clause to divide the data into smaller subsets. We use the SQL
RANK function with the OVER clause on the Marks column (in descending order) to get ranks for the
respective rows.
1 SELECT Studentname,
2 Subject,
3 Marks,
4 RANK() OVER(ORDER BY Marks DESC) Rank
5 FROM ExamResult
6 ORDER BY Rank;
In the output, we can see that each student gets a rank as per their marks, irrespective of the specific
subject. For example, the highest and lowest marks in the complete result set are 90 and 50, respectively.
In the result set, the highest mark gets RANK 1, and the lowest mark gets RANK 9.
If two students get the same marks (in our example, ROW numbers 4 and 5), their ranks are also the
same.
Let’s use DENSE_RANK function in combination with the SQL PARTITION BY clause.
1 SELECT Studentname,
2 Subject,
3 Marks,
4 DENSE_RANK() OVER(PARTITION BY Subject ORDER BY Marks DESC) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7 Rank;
We do not have two students with similar marks; therefore, the result set is similar to the RANK function
in this case.
Let’s update the student mark with the following query and rerun the query.
1 Update Examresult set Marks=70 where Studentname='Isabella' and Subject='Maths'
We can see that in the student group, Isabella got the same marks in the Maths and Science subjects. The
rank is also the same for both subjects in this case.
Let’s see the difference between RANK() and DENSE_RANK() SQL Rank function with the following query.
Query 1
1 SELECT Studentname,
2 Subject,
3 Marks,
4 RANK() OVER(PARTITION BY StudentName ORDER BY Marks ) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7 Rank;
Query 2
1 SELECT Studentname,
2 Subject,
3 Marks,
4 DENSE_RANK() OVER(PARTITION BY StudentName ORDER BY Marks ) Rank
5 FROM ExamResult
6 ORDER BY Studentname,
7 Rank;
In the output, you can see a gap in the RANK function output within a partition, whereas we do not have
any gap in the DENSE_RANK function output.
In the following screenshot, you can see that Isabella has the same marks in two subjects. The RANK
function assigns rank 1 to both of the similar values, internally skips rank two, and the next row gets rank
three.
The DENSE_RANK function maintains the rank and does not leave any gap between the values.
Similarly, NTILE(3) divides the rows into three groups, with three records in each group.
1 SELECT *,
2 NTILE(3) OVER(
3 ORDER BY Marks DESC) Rank
4 FROM ExamResult
5 ORDER BY rank;
We can use SQL PARTITION BY clause to have more than one partition. In the following query, each
partition on subjects is divided into two groups.
1 SELECT *,
2 NTILE(2) OVER(PARTITION BY subject ORDER BY Marks DESC) Rank
3 FROM ExamResult
4 ORDER BY subject, rank;
We can use the OFFSET FETCH command starting from SQL Server 2012 to fetch a specific number of
records.
1 WITH StudentRanks AS
2 (
3 SELECT *, ROW_NUMBER() OVER( ORDER BY Marks) AS Ranks
4 FROM ExamResult
5 )
6
7 SELECT StudentName , Marks
8 FROM StudentRanks
9 ORDER BY Ranks OFFSET 1 ROWS FETCH NEXT 3 ROWS ONLY;
RANK: It assigns the rank number to each row in a partition. It skips the number for similar values.
DENSE_RANK: It assigns the rank number to each row in a partition. It does not skip the number for similar values.
NTILE(N): It divides the rows into the specified number of groups and assigns a group number to each row in the partition.
Conclusion
In this article, we explored SQL RANK functions and the differences between them. It is helpful for
SQL developers to be familiar with these functions in order to explore and manage their data well. If you
have any comments or questions, feel free to leave them in the comments below.
In this article, we will explore the table variable in SQL Server with various examples and we will also
discuss some useful tips about the table variables.
Definition
The table variable is a special type of local variable that helps to store data temporarily, similar to a
temp table in SQL Server. In fact, the table variable provides all the properties of a local variable, but it
also has some limitations compared to temp or regular tables.
Syntax
The following syntax describes how to declare a table variable:
1 DECLARE @LOCAL_TABLEVARIABLE TABLE
2 (column_1 DATATYPE,
3 column_2 DATATYPE,
4 column_N DATATYPE
5)
If we want to declare a table variable, we have to start with the DECLARE statement, which is similar to
declaring local variables. The name of the table variable must start with the at (@) sign. The TABLE
keyword specifies that this variable is a table variable. After the TABLE keyword, we have to define the
column names and data types of the table variable in SQL Server.
In the following example, we will declare a table variable and insert the days of the week and their
abbreviations to the table variable:
1 DECLARE @ListOWeekDays TABLE(DyNumber INT,DayAbb VARCHAR(40) , WeekName VARCHAR(40))
2
3 INSERT INTO @ListOWeekDays
4 VALUES
5 (1,'Mon','Monday') ,
6 (2,'Tue','Tuesday') ,
7 (3,'Wed','Wednesday') ,
8 (4,'Thu','Thursday'),
9 (5,'Fri','Friday'),
10 (6,'Sat','Saturday'),
11 (7,'Sun','Sunday')
12 SELECT * FROM @ListOWeekDays
At the same time, we can update and delete the data contained in the table variables. The following
query delete and update rows:
1 DECLARE @ListOWeekDays TABLE(DyNumber INT,DayAbb VARCHAR(40) , WeekName VARCHAR(40))
2
3 INSERT INTO @ListOWeekDays
4 VALUES
5 (1,'Mon','Monday') ,
6 (2,'Tue','Tuesday') ,
7 (3,'Wed','Wednesday') ,
8 (4,'Thu','Thursday'),
9 (5,'Fri','Friday'),
10 (6,'Sat','Saturday'),
11 (7,'Sun','Sunday')
12 DELETE @ListOWeekDays WHERE DyNumber=1
13 UPDATE @ListOWeekDays SET WeekName='Saturday is holiday' WHERE DyNumber=6
14 SELECT * FROM @ListOWeekDays
As you can see, the previous query returns two result sets. The ResultSet-1 contains column names and
data types of the declared table variable and the ResultSet-2 does not contain any data. The reason for
this case is, the first INFORMATION_SCHEMA.COLUMNS view, and table variable executed in the same
batch so we can get the information of the @ExperiementTable table variable from the tempdb
database. The second query could not return any data about the @ExperiementTable because the GO
statement ends the batch so the life-cycle of the @ExperiementTable table variable is terminated. In this
section, we proved the storage location of the table variable in SQL Server.
Table variable CRUD operations do not manage by explicit transactions. As a result, ROLLBACK TRAN
cannot erase the modified data for the table variables.
The table variable in SQL Server should use an alias with the
join statements
If we want to join two or more table variables with each other or regular tables, we have to use an alias
for the table names. The usage of this looks like this:
1 DECLARE @Department TABLE
2 (DepartmentID INT PRIMARY KEY,
3 DepName VARCHAR(40) UNIQUE)
4
5 INSERT INTO @Department VALUES(1,'Marketing')
6 INSERT INTO @Department VALUES(2,'Finance')
7 INSERT INTO @Department VALUES(3,'Operations ')
8
9 DECLARE @Employee TABLE
10 (EmployeeID INT PRIMARY KEY IDENTITY(1,1),
11 EmployeeName VARCHAR(40),
12 DepartmentID VARCHAR(40))
13
14 INSERT INTO @Employee VALUES('Jodie Holloway','1')
15 INSERT INTO @Employee VALUES('Victoria Lyons','2')
16 INSERT INTO @Employee VALUES('Callum Lee','3')
17
18 select * from @Department Dep inner join @Employee Emp
19 on Dep.DepartmentID = Emp.DepartmentID
The table variable does not allow to create an explicit index
Indexes help to improve the performance of the queries but the CREATE INDEX statement cannot be
used to create an index for the table variables. For example, the following query will return an error:
1 DECLARE @TestTable TABLE
2 (ID INT PRIMARY KEY,
3 Col1 VARCHAR(40) UNIQUE,
4 Col2 VARCHAR(40) NOT NULL)
5
6
7 CREATE NONCLUSTERED INDEX test_index
8 ON @TestTable(Col1)
However, we can overcome this issue with the help of the implicit index definitions because the PRIMARY
KEY constraint or UNIQUE constraints definitions automatically create an index and we can use these
INDEX statements in order to create single or composite non-clustered indexes. When we execute the
following query, we can figure out the created index which belongs to @TestTable:
1 DECLARE @TestTable TABLE
2 (
3 Col1 INT NOT NULL PRIMARY KEY ,
4 Col2 INT NOT NULL INDEX Cluster_I1 (Col1,Col2),
5 Col3 INT NOT NULL UNIQUE
6 )
7
8
9 SELECT
10 ind.name,type_desc
11 FROM
12 tempdb.sys.indexes ind
13
14 where ind.object_id=(
15 SELECT OBJECT_ID FROM tempdb.sys.objects obj WHERE obj.name IN (
16 SELECT TABLE_NAME FROM tempdb.INFORMATION_SCHEMA.COLUMNS
17 WHERE (COLUMN_NAME = 'Col1' OR COLUMN_NAME='Col2' OR COLUMN_NAME='Col3')
18 ))
Conclusion
In this article, we explored the table variable in SQL Server details with various examples. Also, we
mentioned the features and limitations of the table variables.
In this article, we will explore the table variable in SQL Server with various examples and we will also
discuss some useful tips about the table variables.
Definition
The table variable is a special type of the local variable that helps to store data temporarily, similar to the
temp table in SQL Server. In fact, the table variable provides all the properties of the local variable, but
the local variables have some limitations, unlike temp or regular tables.
Syntax
The following syntax describes how to declare a table variable:
1 DECLARE @LOCAL_TABLEVARIABLE TABLE
2 (column_1 DATATYPE,
3 column_2 DATATYPE,
4 column_N DATATYPE
5 )
If we want to declare a table variable, we have to start the DECLARE statement which is similar to local
variables. The name of the local variable must start with at(@) sign. The TABLE keyword specifies that this
variable is a table variable. After the TABLE keyword, we have to define column names and datatypes of
the table variable in SQL Server.
In the following example, we will declare a table variable and insert the days of the week and their
abbreviations to the table variable:
1 DECLARE @ListOWeekDays TABLE(DyNumber INT,DayAbb VARCHAR(40) , WeekName VARCHAR(40))
2
3 INSERT INTO @ListOWeekDays
4 VALUES
5 (1,'Mon','Monday') ,
6 (2,'Tue','Tuesday') ,
7 (3,'Wed','Wednesday') ,
8 (4,'Thu','Thursday'),
9 (5,'Fri','Friday'),
10 (6,'Sat','Saturday'),
11 (7,'Sun','Sunday')
12 SELECT * FROM @ListOWeekDays
At the same time, we can update and delete the data contained in the table variables. The following
query delete and update rows:
1 DECLARE @ListOWeekDays TABLE(DyNumber INT,DayAbb VARCHAR(40) , WeekName VARCHAR(40))
2
3 INSERT INTO @ListOWeekDays
4 VALUES
5 (1,'Mon','Monday') ,
6 (2,'Tue','Tuesday') ,
7 (3,'Wed','Wednesday') ,
8 (4,'Thu','Thursday'),
9 (5,'Fri','Friday'),
10 (6,'Sat','Saturday'),
11 (7,'Sun','Sunday')
12 DELETE @ListOWeekDays WHERE DyNumber=1
13 UPDATE @ListOWeekDays SET WeekName='Saturday is holiday' WHERE DyNumber=6
14 SELECT * FROM @ListOWeekDays
What is the storage location of the table variables?
The answer to this question is that table variables are stored in the tempdb database. We underline this because a common answer is that the table variable is stored in memory, which is simply wrong. Before proving this, we should clarify one point about table variables: their lifecycle starts at the declaration point and ends at the end of the batch. As a result, the table variable in SQL Server is automatically dropped at the end of the batch:
1 DECLARE @ExperiementTable TABLE
2 (
3 TestColumn_1 INT, TestColumn_2 VARCHAR(40), TestColumn_3 VARCHAR(40)
4 );
5 SELECT TABLE_CATALOG, TABLE_SCHEMA, COLUMN_NAME, DATA_TYPE
6 FROM tempdb.INFORMATION_SCHEMA.COLUMNS
7 WHERE COLUMN_NAME LIKE 'TestColumn%';
8
9 GO
10 SELECT TABLE_CATALOG, TABLE_SCHEMA, COLUMN_NAME, DATA_TYPE
11 FROM tempdb.INFORMATION_SCHEMA.COLUMNS
12 WHERE COLUMN_NAME LIKE 'TestColumn%';
As you can see, the previous query returns two result sets. ResultSet-1 contains the column names and data types of the declared table variable, and ResultSet-2 does not contain any data. The reason is that the first INFORMATION_SCHEMA.COLUMNS query and the table variable were executed in the same batch, so we could get the information about the @ExperiementTable table variable from the tempdb database. The second query does not return any data about @ExperiementTable because the GO statement ends the batch, so the lifecycle of the @ExperiementTable table variable is terminated. In this section, we proved the storage location of the table variable in SQL Server.
On the other hand, FOREIGN KEY constraints cannot be used for table variables. The other restriction is that we have to define the constraints when we declare the table variable; otherwise, we get an error. For example, the following query will return an error because of this restriction, as we cannot alter the table structure after the declaration of the table variable:
1 DECLARE @TestTable TABLE
2 (ID INT NOT NULL )
3
4 ALTER TABLE @TestTable
5 ADD CONSTRAINT PK_ID PRIMARY KEY (ID)
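Instead, the constraint has to be part of the DECLARE statement itself. A minimal sketch of the correct form:

DECLARE @TestTable TABLE
(
    ID INT NOT NULL PRIMARY KEY,      -- constraint defined at declaration time
    Col1 VARCHAR(40) NOT NULL UNIQUE
);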
Table variable CRUD operations are not managed by explicit transactions. As a result, ROLLBACK TRAN cannot erase the modified data of a table variable.
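A small sketch demonstrating this behavior (the table and column names are made up for the example):

DECLARE @Rows TABLE (ID INT);

BEGIN TRAN;
    INSERT INTO @Rows VALUES (1), (2);
ROLLBACK TRAN;

-- The rollback does not remove the rows from the table variable
SELECT COUNT(*) AS RowsAfterRollback FROM @Rows;  -- returns 2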
The table variable in SQL Server should use an alias with the
join statements
If we want to join two or more table variables with each other or regular tables, we have to use an alias
for the table names. The usage of this looks like this:
1 DECLARE @Department TABLE
2 (DepartmentID INT PRIMARY KEY,
3 DepName VARCHAR(40) UNIQUE)
4
5 INSERT INTO @Department VALUES(1,'Marketing')
6 INSERT INTO @Department VALUES(2,'Finance')
7 INSERT INTO @Department VALUES(3,'Operations ')
8
9 DECLARE @Employee TABLE
10 (EmployeeID INT PRIMARY KEY IDENTITY(1,1),
11 EmployeeName VARCHAR(40),
12 DepartmentID VARCHAR(40))
13
14 INSERT INTO @Employee VALUES('Jodie Holloway','1')
15 INSERT INTO @Employee VALUES('Victoria Lyons','2')
16 INSERT INTO @Employee VALUES('Callum Lee','3')
17
18 select * from @Department Dep inner join @Employee Emp
19 on Dep.DepartmentID = Emp.DepartmentID
The table variable does not allow creating an explicit index
Indexes help to improve query performance, but the CREATE INDEX statement cannot be used to create an index for a table variable. For example, the following query will return an error:
1 DECLARE @TestTable TABLE
2 (ID INT PRIMARY KEY,
3 Col1 VARCHAR(40) UNIQUE,
4 Col2 VARCHAR(40) NOT NULL)
5
6
7 CREATE NONCLUSTERED INDEX test_index
8 ON @TestTable(Col1)
However, we can overcome this issue with the help of implicit index definitions, because the PRIMARY KEY and UNIQUE constraint definitions automatically create an index, and we can also use inline INDEX definitions to create single-column or composite non-clustered indexes. When we execute the following query, we can see the indexes that belong to @TestTable:
1 DECLARE @TestTable TABLE
2 (
3 Col1 INT NOT NULL PRIMARY KEY ,
4 Col2 INT NOT NULL INDEX Cluster_I1 (Col1,Col2),
5 Col3 INT NOT NULL UNIQUE
6 )
7
8
9 SELECT
10 ind.name,type_desc
11 FROM
12 tempdb.sys.indexes ind
13
14 where ind.object_id=(
15 SELECT OBJECT_ID FROM tempdb.sys.objects obj WHERE obj.name IN (
16 SELECT TABLE_NAME FROM tempdb.INFORMATION_SCHEMA.COLUMNS
17 WHERE (COLUMN_NAME = 'Col1' OR COLUMN_NAME='Col2' OR COLUMN_NAME='Col3')
18 ))
Conclusion
In this article, we explored the table variable in SQL Server in detail with various examples. We also mentioned the features and limitations of table variables.
Temporary tables, also known as temp tables, are widely used by database administrators and developers. However, it may be necessary to drop the temp table before creating it. It is a common practice to check whether the temporary table exists or not, so we can eliminate the “There is already an object named ‘#temptablename’ in the database” error during temporary table creation.
Temporary Tables
Temporary tables are used to store data for a period of time in SQL Server. Many features of temporary tables are similar to those of persisted tables: for example, we can create indexes, statistics, and constraints for these tables just as we do for persisted tables.
The types of temporary tables affect the life-cycle of the temporary tables. Now, we will take a glance at
them.
Global Temporary Tables: The name of this type of temporary table starts with a double “##” hashtag
symbol and can be accessed from all other connections. This is the major difference between the local
and global temporary tables. If the session where the global temporary table was created is closed, the
global temporary table will be dropped automatically.
The following query will create a global temporary table:
1 CREATE TABLE ##GlobalCustomer
2 (
3 CustomerId int,
4 CustomerName varchar(50),
5 CustomerAdress varchar(150)
6 )
7 GO
8 INSERT INTO ##GlobalCustomer VALUES(1,'Adam Tottropx' ,'30 Mztom Street LONDON')
9 GO
10 SELECT * FROM ##GlobalCustomer
The following table expresses the main differences between global and local temporary tables:
Local Temporary Tables Global Temporary Tables
Names start with a single “#” hashtag symbol. Names start with a double “##” hashtag symbol.
Tables can be accessed only from the session where the table was created. Tables can be accessed from all other sessions.
Cannot be dropped by the other connections. Can be dropped by the other connections.
SQL Server adds random numbers at the end of local temporary table names. The idea behind this logic is pretty simple: more than one connection can create local temporary tables with the same name, so SQL Server automatically appends a random number to this type of temporary table name. In this way, SQL Server avoids name conflicts.
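A quick way to observe this is to create a local temporary table and query tempdb.sys.objects for its padded name (a sketch):

CREATE TABLE #LocalCustomer (CustomerId int);

-- The stored name is padded with underscores and a random hex suffix
SELECT name
FROM tempdb.sys.objects
WHERE name LIKE '#LocalCustomer%';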
After these observations, there is no doubt that if we want to drop any temp table, we should work in the tempdb database.
To achieve this check, we can use different techniques. Let us learn these techniques:
Using OBJECT_ID function to check temporary table existence
OBJECT_ID function is used to obtain the identification number of the database
object. OBJECT_ID function can take the object’s name as a parameter so we can use this function to
check the existence of any object in the particular database.
The following query will check the #LocalCustomer table existence in the tempdb database, and if it
exists, it will be dropped.
For the local temporary tables:
1 IF OBJECT_ID(N'tempdb..#LocalCustomer') IS NOT NULL
2 BEGIN
3 DROP TABLE #LocalCustomer
4 END
5 GO
6
7 CREATE TABLE #LocalCustomer
8 (
9 CustomerId int,
10 CustomerName varchar(50),
11 CustomerAdress varchar(150)
12 )
For the global temporary tables:
1 IF OBJECT_ID(N'tempdb..##GlobalCustomer') IS NOT NULL
2 BEGIN
3 DROP TABLE ##GlobalCustomer
4 END
5 GO
6
7 CREATE TABLE ##GlobalCustomer
8 (
9 CustomerId int,
10 CustomerName varchar(50),
11 CustomerAdress varchar(150)
12 )
1 IF OBJECT_ID(N'tempdb..#TempTableName') IS NOT NULL
2 BEGIN
3 DROP TABLE #TempTableName
4 END
5 GO
6
7 CREATE TABLE #TempTableName
8 (
9 Col1 VARCHAR(100)
10 )
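On SQL Server 2016 and later, the existence check and the drop can also be combined into a single statement with DROP TABLE IF EXISTS, for example:

DROP TABLE IF EXISTS #TempTableName;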
As indicated, every page is read from the data cache, whether or not it was necessary to bring that page
from disk into the cache for any given read. To reduce the cost of the query we will change the SQL
Server database schema and split the EmployeeReports table vertically.
Next, we’ll create the ReportsDesc table to hold the large ReportDescription column, and the ReportsData table to hold all the data from the EmployeeReports table except the ReportDescription column:
1 CREATE TABLE ReportsDesc
2 ( ReportID int FOREIGN KEY REFERENCES EmployeeReports (ReportID),
3 ReportDescription varchar(max)
4 CONSTRAINT PK_ReportDesc PRIMARY KEY CLUSTERED (ReportID)
5 )
6
7 CREATE TABLE ReportsData
8 (
9 ReportID int NOT NULL,
10 ReportName varchar (100),
11 ReportNumber varchar (20),
12
13 CONSTRAINT DReport_PK PRIMARY KEY CLUSTERED (ReportID)
14 )
15 INSERT INTO dbo.ReportsData
16 (
17 ReportID,
18 ReportName,
19 ReportNumber
20 )
21 SELECT er.ReportID,
22 er.ReportName,
23 er.ReportNumber
24 FROM dbo.EmployeeReports er
The same search query will now give different results:
1 SET STATISTICS IO ON
2 SET STATISTICS TIME ON
3 SELECT er.ReportID, er.ReportName, er.ReportNumber
4 FROM ReportsData er
5 WHERE er.ReportNumber LIKE '%33%'
6 SET STATISTICS IO OFF
7 SET STATISTICS TIME OFF
Vertical partitioning on SQL Server tables may not be the right method in every case. However, if you
have, for example, a table with a lot of data that is not accessed equally, tables with data you want to
restrict access to, or scans that return a lot of data, vertical partitioning can help.
Tables are horizontally partitioned based on a column that will be used for partitioning and the ranges associated with each partition. The partitioning column is usually a datetime column, but all data types that are valid for use as index columns can be used as a partitioning column, except a timestamp column. The ntext, text, image, xml, varchar(max), nvarchar(max), and varbinary(max) types, Microsoft .NET Framework common language runtime (CLR) user-defined types, and alias data type columns cannot be specified.
There are two different approaches we could use to accomplish table partitioning. The first is to create a
new partitioned table and then simply copy the data from your existing table into the new table and do a
table rename. The second approach is to partition an existing table by rebuilding or creating a clustered
index on the table.
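The examples below assume that the monthly filegroups (January, February, and so on) already exist in the PartitioningDB database; if they do not, they can be created along these lines (a sketch for the first two months):

ALTER DATABASE [PartitioningDB] ADD FILEGROUP [January];
ALTER DATABASE [PartitioningDB] ADD FILEGROUP [February];
-- ...repeat for the remaining months, March through December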
When the filegroups are created, we will add an .ndf file to every filegroup:
1 ALTER DATABASE [PartitioningDB]
2 ADD FILE
3 (
4 NAME = [PartJan],
5 FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.LENOVO\MSSQL\DATA\PartitioningDB.ndf',
6 SIZE = 3072 KB,
7 MAXSIZE = UNLIMITED,
8 FILEGROWTH = 1024 KB
9 ) TO FILEGROUP [January]
10
In the same way, add files to all of the created filegroups, specifying the logical name of the file and the operating system (physical) file name for each filegroup, e.g.:
1 ALTER DATABASE [PartitioningDB]
2 ADD FILE
3 (
4 NAME = [PartFeb],
5 FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL11.LENOVO\MSSQL\DATA\PartitioningDB2.ndf',
6 SIZE = 3072 KB,
7 MAXSIZE = UNLIMITED,
8 FILEGROWTH = 1024 KB
9 ) TO FILEGROUP [February]
To check that the files were created and added to the filegroups, run the following query:
1 SELECT
2 name as [FileName],
3 physical_name as [FilePath]
4 FROM sys.database_files
5 where type_desc = 'ROWS'
6 GO
After creating additional filegroups for storing data we’ll create a partition function. A partition function
is a function that maps the rows of a partitioned table into partitions based on the values of a
partitioning column. In this example we will create a partitioning function that partitions a table into 12
partitions, one for each month of a year’s worth of values in a datetime column:
1 CREATE PARTITION FUNCTION [PartitioningByMonth] (datetime)
2 AS RANGE RIGHT FOR VALUES ('20140201', '20140301', '20140401',
3 '20140501', '20140601', '20140701', '20140801',
4 '20140901', '20141001', '20141101', '20141201');
To map the partitions of a partitioned table to filegroups and determine the number and domain of the
partitions of a partitioned table we will create a partition scheme:
1 CREATE PARTITION SCHEME PartitionBymonth
2 AS PARTITION PartitioningBymonth
3 TO (January, February, March,
4 April, May, June, July,
5 August, September, October,
6 November, December);
Now we’re going to create the table using the PartitionBymonth partition scheme, and fill it with the
test data:
1 CREATE TABLE Reports
2 (ReportDate datetime PRIMARY KEY,
3 MonthlyReport varchar(max))
4 ON PartitionBymonth (ReportDate);
5 GO
6
7 INSERT INTO Reports (ReportDate,MonthlyReport)
8 SELECT '20140105', 'ReportJanuary' UNION ALL
9 SELECT '20140205', 'ReportFebruary' UNION ALL
10 SELECT '20140308', 'ReportMarch' UNION ALL
11 SELECT '20140409', 'ReportApril' UNION ALL
12 SELECT '20140509', 'ReportMay' UNION ALL
13 SELECT '20140609', 'ReportJune' UNION ALL
14 SELECT '20140709', 'ReportJuly' UNION ALL
15 SELECT '20140809', 'ReportAugust' UNION ALL
16 SELECT '20140909', 'ReportSeptember' UNION ALL
17 SELECT '20141009', 'ReportOctober' UNION ALL
18 SELECT '20141109', 'ReportNovember' UNION ALL
19 SELECT '20141209', 'ReportDecember'
We will now verify the rows in the different partitions:
1 SELECT
2 p.partition_number AS PartitionNumber,
3 f.name AS PartitionFilegroup,
4 p.rows AS NumberOfRows
5 FROM sys.partitions p
6 JOIN sys.destination_data_spaces dds ON p.partition_number = dds.destination_id
7 JOIN sys.filegroups f ON dds.data_space_id = f.data_space_id
8 WHERE OBJECT_NAME(OBJECT_ID) = 'Reports'
Now just copy the data from your existing table and rename the partitioned table.
In the Select a Partitioning Column window, select a column which will be used to partition a table
from available partitioning columns:
Other options in the Create Partition Wizard dialog include the Collocate this table to the selected
partition table option used to display related data to join with the partitioned column and the Storage
Align Non Unique Indexes and Unique Indexes with an Indexed Partition Column option that aligns
all indexes of the partitioned table with the same partition scheme.
After selecting a column for partitioning click the Next button. In the Select a Partition
Function window enter the name of a partition function to map the rows of the table or index into
partitions based on the values of the ReportDate column, or choose the existing partition function:
Click the Next button and in the Select a Partition Scheme window create the partition scheme to map
the partitions of the MonthlyReport table to different filegroups:
Click the Next button and in the Map Partitions window choose the range of partitioning and select the available filegroups and the range boundary. The Left boundary is based on Value <= Boundary and the Right boundary is based on Value < Boundary.
By clicking the Set boundaries button you can customize the date range and set the start and the end
date for each partition:
The Estimate storage option populates the Rowcount, the Required space, and the Available space columns, which display an estimate of the required and available space based on the number of records in the table.
The next screen of the wizard offers the choice of whether to execute the script immediately to create the objects and the partitioned table, or to generate the script and save it. A schedule for executing the script to perform the operations automatically can also be specified:
The next screen of the wizard shows a review of selections made in the wizard:
Click the Finish button to complete the process:
In this article, we will learn how to create stored procedures in SQL Server with different examples.
A SQL Server stored procedure is a batch of statements grouped as a logical unit and stored in the database. The stored procedure accepts parameters, executes the T-SQL statements in the procedure, and returns a result set, if any.
To understand differences between functions and stored procedures in SQL Server, you can refer to this
article, Functions vs stored procedures in SQL Server and to learn about Partial stored procedures in SQL
Server, click Partial stored procedures in SQL Server.
When you try to script an encrypted stored procedure from SQL Server Management Studio, it throws an error as below.
Creating a temporary procedure
Like temporary tables, we can create temporary procedures as well. There are two types of temporary procedures: the local temporary stored procedure and the global temporary stored procedure. These procedures are created in the tempdb database.
Local temporary SQL Server stored procedures: These are created with # as a prefix and can be accessed only in the session where they were created. The procedure is automatically dropped when the connection is closed.
Following is the example of creating a local temporary procedure.
1 CREATE PROCEDURE #Temp
2 AS
3 BEGIN
4 PRINT 'Local temp procedure'
5 END
Global temporary SQL Server stored procedures: These procedures are created with ## as a prefix and can be accessed from other sessions as well. The procedure is automatically dropped when the connection that was used to create it is closed.
Below is the example of creating a global temporary procedure.
1 CREATE PROCEDURE ##TEMP
2 AS
3 BEGIN
4 PRINT 'Global temp procedure'
5 END
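A quick usage sketch; the local procedure can be called only from the session that created it, while the global one can be called from other sessions until the creating connection is closed:

EXEC #Temp;              -- works only in the creating session
EXEC ##TEMP;             -- can also be called from other sessions
DROP PROCEDURE ##TEMP;   -- optional explicit cleanup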
Introduction
I was training some Oracle DBAs in T-SQL and they asked me how to create arrays in SQL Server.
I told them that there were no arrays in SQL Server like the ones that we have in Oracle (varray). They were disappointed and asked me how this problem was handled.
Some developers asked me the same thing. Where are the arrays in SQL Server?
The short answer is that we use temporary tables or TVPs (table-valued parameters) instead of arrays, or we use other functions to replace the use of arrays.
The use of temporary tables, TVPs and table variables is explained in another article:
The tempdb database, introduction and recommendations
In this article, we will show:
How to use a table variable instead of an array
The STRING_SPLIT function, which will help us to replace the array functionality
How to work with older versions of SQL Server to handle a list of values separated by commas
Requirements
1. SQL Server 2016 or later with SSMS installed
2. The Adventureworks database installed
Getting started
How to use a table variable instead of an array
In the first demo, we will show how to use a table variable instead of an array.
We will create a table variable using T-SQL:
1 DECLARE @myTableVariable TABLE (id INT, name varchar(20))
2 insert into @myTableVariable values(1,'Roberto'),(2,'Gail'),(3,'Dylan')
3 select * from @myTableVariable
We created a table variable named myTableVariable and we inserted 3 rows and then we did a select in
the table variable.
The select will show the following values:
Now, we will show the information from the Person.Person table of the Adventureworks database that matches the names in the table variable:
1 DECLARE @myTableVariable TABLE (id INT, name varchar(20))
2 insert into @myTableVariable values(1,'Roberto'),(2,'Gail'),(3,'Dylan')
3
4 SELECT [BusinessEntityID]
5 ,[PersonType]
6 ,[NameStyle]
7 ,[Title]
8 ,[FirstName]
9 ,[MiddleName]
10 ,[LastName]
11
12 FROM [Adventureworks].[Person].[Person] where
13 FirstName
14 IN (Select name from @myTableVariable)
The results will display the names and information of the table Person.person with the names of Roberto,
Gail and Dylan:
Note that in SQL Server, it is better to use set-based SQL statements to compare values, because they are more efficient. In general, we do not use loops (WHILE) because they are slower and less efficient.
You can use the id to retrieve values from a specific row. For example, for Roberto, the id is 1 for Dylan
the id is 3 and for Gail the id is 2.
In C#, for example, if you want to get the second member of an array, you would run something like this:
1 Array[1];
You use the brackets, and the index 1 returns the second element of the array (the first one is index 0).
In a table variable, you can use the id. If you want to list the second member (id=2) of the table variable,
you can do something like this:
1 DECLARE @myTableVariable TABLE (id INT, name varchar(20))
2 insert into @myTableVariable values(1,'Roberto'),(2,'Gail'),(3,'Dylan')
3 select * from @myTableVariable where id=2
In other words, you can use the id to get a specific member of the table variable.
The problem with table variables is that you need to insert values, and it requires more code to create a simple table with a few rows.
In C# for example, to create an array, you only need to write the elements and you do not need to insert
data into the table:
1 string[] names = new string[] {"Gail","Roberto","Dylan"};
It is just a single line of code to create the array with its elements. Can we do something similar in SQL Server?
The next solution will help us find out.
The STRING_SPLIT function
Another solution is to replace arrays with the new STRING_SPLIT function. This function is available in SQL Server 2016 and later versions, and in Azure SQL.
If you use the function in an old Adventureworks database or in SQL Server 2014 or older, you may receive an error message. The following example will try to split 3 names separated by commas:
1 SELECT value FROM STRING_SPLIT('Roberto,Gail,Dylan', ',');
A typical error message would be the following:
Msg 208, Level 16, State 1, Line 8
Invalid object name ‘STRING_SPLIT’
If you receive this error in SQL Server 2016, check your database compatibility level:
1 SELECT compatibility_level
2 FROM sys.databases WHERE name = 'AdventureWorks';
3 GO
If your compatibility level is lower than 130, use this T-SQL sentence to change the compatibility level:
1 ALTER DATABASE [Adventureworks] SET COMPATIBILITY_LEVEL = 130
If you do not like T-SQL, you can right click the database in SSMS and go to options and change the
compatibility level:
The T-SQL sentence will convert the values separated by commas in rows:
1 SELECT value FROM STRING_SPLIT('Roberto,Gail,Dylan', ',');
The values will be converted to rows:
In the STRING_SPLIT function, you need to specify the separator.
The following query will show the information of people in the person.person table that matches the
names used in the STRING_SPLIT function:
1 SELECT [BusinessEntityID]
2 ,[PersonType]
3 ,[NameStyle]
4 ,[Title]
5 ,[FirstName]
6 ,[MiddleName]
7 ,[LastName]
8
9 FROM [Adventureworks].[Person].[Person] where
10 FirstName
11 IN (SELECT value FROM STRING_SPLIT('Roberto,Gail,Dylan', ','));
The query will show information about the people with the names equal to Roberto or Gail or Dylan:
If you want to retrieve a specific member of the string, you can assign a row number to each row returned by STRING_SPLIT. The following code shows how to retrieve the information:
1 WITH fakearray AS
2 (
3 SELECT
4 ROW_NUMBER() OVER(ORDER BY value DESC) AS ID,value FROM STRING_SPLIT('Roberto,Gail,Dylan', ',')
5 )
6 SELECT ID, value
7 FROM fakearray
8 WHERE ID =3
ROW_NUMBER is used to add an id to each name. For example, Roberto has the id =1, Gail id=2 and
Dylan 3.
Once you have the query in a CTE expression, you can write a SELECT statement and use the WHERE clause to specify an ID. In this example, the query will show Dylan’s information (ID = 3). As you can see, retrieving the value of a specific member of the fake array is not hard, but it requires more code than a programming language that supports arrays.
How to work with older versions of SQL Server
STRING_SPLIT is pretty helpful, but how was it handled in earlier versions?
There are many ways to solve this, but we will use the XML solution. The following example will show how to display the values that match the contents of a fake array:
1 DECLARE @oldfakearray VARCHAR(100) = 'Roberto,Gail,Dylan';
2 DECLARE @param XML;
3
4 SELECT @param = CAST('<i>' + REPLACE(@oldfakearray,',','</i><i>') + '</i>' AS XML)
5
6
7 SELECT [BusinessEntityID]
8 ,[PersonType]
9 ,[NameStyle]
10 ,[Title]
11 ,[FirstName]
12 ,[MiddleName]
13 ,[LastName]
14
15 FROM [Adventureworks].[Person].[Person]
16 WHERE FirstName IN
17 (SELECT x.i.value('.','NVARCHAR(100)') FROM @param.nodes('//i') x(i))
The code does the same as the STRING_SPLIT or table variable solutions:
In the first line, we just create a new fake array named oldfakearray and assign the names in the variable:
1 DECLARE @oldfakearray VARCHAR(100) = 'Roberto,Gail,Dylan';
In the second line, we are declaring an XML variable:
1 DECLARE @param XML;
In the next line, we are replacing the commas and creating an XML document with the values of the oldfakearray:
1 SELECT @param = CAST('<i>' + REPLACE(@oldfakearray,',','</i><i>') + '</i>' AS XML)
Finally, we are doing a select from the table Person.Person in the Adventureworks database where the
firstname is in the @param variable:
1 SELECT [BusinessEntityID]
2 ,[PersonType]
3 ,[NameStyle]
4 ,[Title]
5 ,[FirstName]
6 ,[MiddleName]
7 ,[LastName]
8
9 FROM [Adventureworks].[Person].[Person]
10 WHERE FirstName IN
11 (SELECT x.i.value('.','NVARCHAR(100)') FROM @param.nodes('//i') x(i))
As you can see, it is not an array, but it helps to compare a list of values with a table.
Conclusion
As you can see, SQL Server does not include arrays, but we can use table variables, temporary tables or the STRING_SPLIT function. However, the STRING_SPLIT function is new and can be used only on SQL Server 2016 or later versions.
If you do not have SQL Server 2016 or later, there are older methods to split strings separated by commas; we showed the method that uses XML.
In this article we’ll review the SQL varchar data type including a basic definition and overview, differences
from varchar(n), UTF-8 support, Collation, performance considerations and more.
Data plays a crucial part in any organization and an attribute by which it is defined is called its data type.
In simple words, data type states what kind of data any object, variable or expression can store. As a SQL
developer, while creating a SQL table, we have to understand and decide what type of data will be
contained by each and every column in a table. Like any other programming language, SQL also supports
a gamut of data types that can hold integer data, date and time data, character data etc. and allows you
to define data types of your own as well. SQL varchar is one of the best-known and most-used data types
among the lot. In this article, we will walk through different facets of the varchar data type in SQL Server.
Below is the outline that we will cover in this block.
1. Introduction to the varchar data type in SQL Server
2. Use of varchar for large blocks of text
3. What is new in SQL Server 2019 preview for varchar datatype?
4. Influence of collation on varchar SQL in SQL Server
5. UTF-8 support with varchar in SQL Server 2019 CTP
6. SQL Server varchar for data conversions and data display
7. Storage and performance considerations using SQL Server varchar
8. Impact on string length of SQL varchar with CAST and CONVERT functions
Let’s move ahead and see the aforementioned in action.
Suppose, there is a new addition of an employee in the organization and we, as SQL data developers,
would have to insert this new record into the above table using INSERT SQL Statement. Below is one
such example shown.
1 INSERT INTO Demovarchar VALUES('Newton Hamilton', 'Isaac','M','Design Head',69)
Oops, SQL Server encountered an error and terminated the statement, saying string or binary data would be truncated. This occurred because the column LastName varchar(10) can hold up to 10 characters, and here we are attempting to insert a new record whose LastName value (‘Newton Hamilton’) is clearly longer than 10 characters. As a quick fix, we can alter the table and increase the size of the SQL varchar column, say to varchar(50), to insert the new row. Execute the script below to ALTER the table and INSERT the new record. Additionally, you can use the LEN() and DATALENGTH() functions to determine the number of characters and the storage size in bytes, respectively, stored in the varchar column.
1 ALTER TABLE Demovarchar
2 ALTER COLUMN LastName varchar(50)
3 INSERT INTO Demovarchar VALUES('Newton Hamilton', 'Isaac','M','Design Head',69)
4 SELECT * FROM Demovarchar
We observed above how we can set or alter the string length of the SQL varchar column to meet business needs. However, consider a scenario where we are unsure of the size of the data that is going to be loaded into our SQL tables; in such circumstances, inspecting and altering the data type size for each and every column is not a viable choice. One of the options to handle this is to set the string length of the SQL Server varchar column toward the higher end (provided you have a rough estimate of approximately how long the string column will be).
An important point to keep in mind is that we can use a string length of up to varchar(8000) only, as this is the maximum number of characters that the SQL varchar(n) data type can hold. So, in cases where the string length of the varchar column might exceed 8000 bytes, using varchar(8001) or anything higher will result in an error. A short example demonstrating this fact is shown below.
1 DECLARE @name AS varchar(8001) = 'john parker d''souza';
2 SELECT @name Name
SQL Server 2005 got around this limitation of 8KB storage size and provided a workaround with
varchar(max). It is a non-Unicode large variable-length character data type and can store a maximum of
2^31-1 bytes (2 GB) of non-Unicode characters.
When I was first introduced to the concepts of varchar(n) and SQL varchar, the common question I had, like any other beginner, was why we can’t simply declare a column of data type varchar(8500) or higher, since we have varchar(max) that takes care of storage up to 2 GB, and why we are supposed to use either varchar(<=8000) or varchar(max). A little research gave me the answer: SQL Server uses pages to store data, and the size of each page is 8 KB (excluding the page header and row offsets). If the data to be stored is less than or equal to 8000 bytes, varchar(n) or varchar(max) stores it in-row. However, if the data exceeds 8000 bytes, it is treated as a Large Object (LOB) and is not stored in-row but in separate LOB pages (LOB_DATA). The row in such a case only has a pointer to the LOB data page where the actual data is present, and SQL Server automatically assigns an overflow indicator to the page to manipulate the data rows. In a nutshell, if you know the data might exceed 8000 bytes, it is a better option to use varchar(max) as the data type.
We can refer to the DMV sys.dm_db_index_physical_stats to see what kind of page allocation (IN_ROW_DATA / LOB_DATA / ROW_OVERFLOW_DATA) is performed. You can also check out this link in case you want a detailed explanation of how SQL Server exercises row and page limits with both the varchar(n) and varchar(max) data types.
Let’s quickly jump over to SSMS and see how we can use varchar(max). Execute the following script to insert one record where the StringCol column value is a string of 15,000 ‘B’ characters (i.e. 15,000 bytes).
1 CREATE TABLE Demovarcharmax
2 (
3 ID INT IDENTITY(1, 1) ,
4 StringCol VARCHAR(MAX)
5 )
6 INSERT INTO Demovarcharmax(StringCol) VALUES(REPLICATE(CAST('B' AS VARCHAR(MAX)), 15000))
7 SELECT Id, StringCol,len(StringCol) AS LengthOfString FROM Demovarcharmax
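As a quick check of the page allocation for the table created above, the DMV mentioned earlier can be queried along these lines (a sketch; the DETAILED mode is used so that LOB allocation units are reported):

SELECT index_id,
       alloc_unit_type_desc,   -- IN_ROW_DATA, LOB_DATA or ROW_OVERFLOW_DATA
       page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Demovarcharmax'), NULL, NULL, 'DETAILED');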
One limitation of using varchar(max) is we cannot create an index that has a varchar(max) as a key
column, instead, it is advisable to do a Full-text index on that column.
A quick note to make – From here to the last leg of this article, we will mention varchar in place of
varchar(n). Do NOT consider it as the varchar with default value = 1.
To learn some more interesting differences between varchar(n) and varchar(max) in SQL Server, consider
going through this article, Comparing VARCHAR(max) vs VARCHAR(n) data types in SQL Server.
With SQL Server 2019 preview version, we can assign Unicode collations (UTF-8 supported) as well for
SQL varchar columns using the COLLATE clause while declaring the varchar column. This way, specific
collation is applied to the particular column’s data without impacting the rest of the database.
Since we are dealing with SQL Server varchar data type in this post, let’s see how Column Collation with
SQL varchar datatype works. Execute the code below to alter the SQL Server varchar Column Collation
from one collation type to _UTF8 suffix. You can read more on Database Collation from here.
1 CREATE TABLE demovarcharcollate
2 (ID int PRIMARY KEY,
3 Description varchar(50) COLLATE LATIN1_GENERAL_100_CI_AS_SC NOT NULL
4 );
5
6 ALTER TABLE demovarcharcollate
7 ALTER COLUMN Description varchar(50) COLLATE LATIN1_GENERAL_100_CI_AS_SC_UTF8 NOT NULL;
Bottom line is to use the data type that fits our need. You can use SQL varchar when the sizes of the
column vary considerably, use varchar(max) when there are chances that string length might exceed 8000
bytes, use char when the sizes of the column are fixed and use nvarchar if there is a requirement to store
Unicode or multilingual data.
Conclusion
Data types play a fundamental role in database design but they are often overlooked. A good
understanding and accurate use of data types ensure correct nature and length of data is populated in
the tables. The intention of this tip is to help you gain an understanding of basic characteristics and
features of SQL Server varchar along with its performance and storage aspects in SQL Server. We also
covered recent advancements in SQL varchar in the SQL Server 2019 Preview.
Overview
Slow running queries are one of the most common problems in every organization dealing with huge amounts of data. The most challenging problem with almost all the clients I work with is finding the queries that are running slow and figuring out the actual cause behind the performance problem. Thankfully, the solution, in most cases, is simple.
I always suggest spending most of the time on figuring out the actual cause behind the problem, not on
thinking about the potential solutions which might exist.
Fortunately, there are some tools and techniques which a Developer or DBA should always use (at least)
to have a fair idea about the queries running slow.
Before going into the details, I would like to mention here that the tools and techniques I will mention
here will be for SQL Developers who do not have expert knowledge of database administration and for
Database Administrators who are at the start of their career.
Note: I will be using SQL Server 2016 for my test cases in this article. If you have any prior version, then the
Query Store is not available for you but all the other tools will still work.
The next tool is the “Query Store”. This is helpful and could save your life in a situation where you are called in the middle of the night to check why SQL Server was slow an hour earlier.
Generally, prior to SQL Server 2016, without any third-party application or custom solutions, you are not
able to look at the history of query execution. So, the Query Store provides a great deal of value added
functionality in this regard. Ed Pollack wrote about Query Store here so do check this article as it’s a great
resource to deep dive into query store.
If you have SQL Server 2016 or higher, you first need to enable the Query Store in your database properties. After enabling the Query Store, the database properties will look as shown in the screenshot below.
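If you prefer T-SQL, the Query Store can also be enabled with an ALTER DATABASE statement; a minimal sketch, assuming the WideWorldImporters sample database used later in this article:

ALTER DATABASE [WideWorldImporters]
SET QUERY_STORE = ON (OPERATION_MODE = READ_WRITE);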
After you have enabled the Query Store you can expand the database objects and go to the “Top
Resource Consuming Queries” as shown in the below screenshot:
Note: Give the Query Store a day or two to capture the production load so that you can easily work on it
with real load.
Right-click on Top Resource Consuming Queries and select “View Top Resource Consuming Queries”; you will be taken to the window showing these high resource consuming queries. You can customize the view by selecting an appropriate “metric”, such as Duration, CPU Time, Logical Reads or Memory Consumption. The second thing you need to change is the “Statistic”; you can change it to Min, Max or Avg. I would recommend using the Avg statistic with all the metrics mentioned above.
The next step is to highlight the queries which are consuming high resources. After highlighting the
graph value in the left-hand side window (as highlighted in the screenshot below) you will get the query
execution plan in the bottom window.
You can click on the highlighted button shown below in the Query Store window to get the actual query text for further analysis.
So, as of now, you have multiple ways to get High Resource usage queries. Now we will see how we can
check why the queries are running slow and which part of the query needs to be fixed (if required).
So, here I will take the example of a query used in Microsoft sample database “WideWorldImporters”.
The TSQL executes a stored procedure “[Integration].[GetOrderUpdates]”.
The call of this stored procedure takes around a second, and I will not be optimizing it. This is just to give you an example of how you may find out how this second was spent. We also want to know which part of the query is taking the most time, as well as which table we must focus on.
Below is the stored procedure call and results.
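The screenshot is not reproduced here; the call itself looks roughly like the following (the cutoff parameters and their values are shown only for illustration):

EXEC [Integration].[GetOrderUpdates]
     @LastCutoff = '2013-01-01',   -- illustrative values
     @NewCutoff  = '2016-05-31';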
So now we have the call and we will dig deeper into this.
First, we need to enable the Query Statistics for this session. We will enable CPU and IO statistics for this
query session by issuing the TSQL “SET STATISTICS TIME, IO ON”.
After executing the above T-SQL to enable statistics, we will get the IO for each table and the total CPU cost for the queries running inside the stored procedure in the Messages tab, as shown in the screenshot below.
In the above screenshot, we can see that the most IO is consumed by the OrderLines table and that there is only one query executing inside the stored procedure, which takes 672 ms of CPU time (1650 ms elapsed time).
Note: There might be multiple queries running inside a stored procedure, so keep in mind that statistics will give you the time for each query as well as the total for all the queries at the end. So, in the case of stored procedures, consider only the last CPU time as the total CPU time, and for each individual query consider its own CPU time, excluding the last CPU time since it is only the total.
As of now, we know that the OrderLines table is taking most of the Logical Reads.
Next, we will enable the Actual Execution Plan for the query by clicking the icon (Ctrl +M) in SQL
Server Management Studio and will try to answer the question of why this table was taking this IO and
which component of the execution plan is taking most of the time.
After including the Actual Execution Plan, we will re-execute the query and view the execution plan.
Though we can obtain fairly detailed information about the query execution plan inside SQL Server Management Studio, there is another great tool on the web, ApexSQL Plan, which can be used to explore the execution plan in a much more intuitive way.
After installing this tool, you might need to restart SQL Server Management Studio, so install it and then re-execute the query to get an execution plan. A screenshot tour for this tool is provided here. After executing the query, right-click on the execution plan and you will have the option of “View with ApexSQL Plan”.
After viewing the execution plan in ApexSQL Plan, you can see the highlighted items in the screenshot below, which is opened in ApexSQL Plan.
One of the important steps in an ETL process involves the transformation of source data. This could
involve looking up foreign keys, converting values from one data type into another, or simply conducting
data clean-ups by removing trailing and leading spaces. One aspect of transforming source data that
could get complicated relates to the removal of ASCII special characters such as new line characters and
the horizontal tab. In this article, we take a look at some of the issues you are likely to encounter when
cleaning up source data that contains ASCII special characters and we also look at the user-defined
function that could be applied to successfully remove such characters.
Figure 1
The backslash character falls into a category of ASCII characters that is known as ASCII Printable
Characters – which basically refers to characters visible to the human eye. Table 1 shows a top 5 sample
of ASCII Printable Characters.
Numeric Code Character Description
33 ! Exclamation Mark
35 # Number
36 $ Dollar
37 % Percent
38 & Ampersand
Table 1: ASCII Printable Characters (Source: RapidTables.com)
When it comes to addressing data quality issues in SQL Server, it’s easy to clean most of the ASCII
Printable Characters by simply applying the REPLACE function. Say for instance that source data contains
an email address for John Doe that has several invalid special characters as shown in Script 2.
1 DECLARE @email VARCHAR(55) = 'johndoe@a!b#c.com$';
Script 2
We could eliminate such characters by applying the REPLACE T-SQL function as shown in Script 3.
1 SELECT REPLACE(REPLACE(REPLACE(@email, '!', ''), '#', ''), '$', '');
Script 3
Execution of Script 3 results into a correctly formatted email address that is shown in Figure 2.
Figure 2
Figure 3
As can be seen, there seem to be spaces in email addresses 2-4, but it is difficult to tell whether these spaces were created by the Tab character or the Space bar. Furthermore, if you go back to Script 4, you will recall that for the 3rd email address, I included the start of header character at the end of the email address, but looking at the data in Figure 3, the start of header character is not easily visible at the end of that 3rd email address. In fact, it looks like email addresses 3 and 4 have the same number of characters – which is not true. Only by using advanced text editors such as Notepad++ are we able to visualize the special characters in the data, as shown in Figure 4.
Figure 4
When it comes to SQL Server, the cleaning and removal of ASCII Control Characters are a bit tricky. For instance, say we have successfully imported data from the output.txt text file into a SQL Server database table. If we run the REPLACE T-SQL function against the data as we did in Script 3, we can already see in Figure 5 that the REPLACE function was unsuccessful, as the length of the data in the original column is exactly the same as the length calculated after applying both the REPLACE and LTRIM functions.
1 SELECT [id],
2 [Column 0],
3 LEN([Column 0]) OriginalLength,
4 LEN(REPLACE(REPLACE(LTRIM(LTRIM([Column 0])), ' ', ''), ' ', '')) NewLength
5 FROM [SQLShack].[dbo].[OLE DB Destination];
Script 5
Figure 5
So how do we replace what we cannot see?
1. Replace String using Character Codes
The simplest way to replace what we cannot see is, instead of hardcoding the string to replace into our REPLACE function, to hardcode its ASCII numerical code within the CHAR function. Thus, instead of providing an exclamation mark as the string to replace, we can supply the ASCII numerical code for the exclamation mark – which is 33 – and convert that numeric code back to a character using the CHAR function. Thus our script changes from:
1 DECLARE @email VARCHAR(55)= 'johndoe@a!bc.com';
2 SELECT REPLACE(@email, '!', '');
To using:
1 DECLARE @email VARCHAR(55)= 'johndoe@a!bc.com';
2 SELECT REPLACE(@email, CHAR(33), '');
Script 6
Now going back to cleaning email address data out of the output.txt text file, we can rewrite our
script to what is shown in Script 7.
1 SELECT [id],
2 [Column 0],
3 LEN([Column 0]) OriginalLength,
4 LEN(REPLACE(REPLACE([Column 0], CHAR(1), ''), CHAR(9), '')) NewLength
5 FROM [SQLShack].[dbo].[OLE DB Destination];
Script 7
After executing Script 7, we can see in Figure 6 that the length of all email address rows matches the length of row 1 – which was originally the correct email address. Thus, we have successfully managed to remove the “invisible” special characters.
Figure 6
2. Dynamically Detect and Replace ASCII Characters
One noticeable limitation of Script 7 is that we have hard-coded the list of ASCII numerical values.
This means if the email address data contained special characters with ASCII numerical value 8
then we wouldn’t have removed them as we had hardcoded our script to specifically look
for CHAR(1) and CHAR(9). Therefore, there is a need for a mechanism that allows us to
automatically detect ASCII Control Characters contained in a given string and then automatically
replace them. Script 8 provides such a mechanism in a form of a While loop within a user-defined
function that iteratively searches through a given string to identify and replace ASCII Control
Characters.
1 CREATE FUNCTION [dbo].[ReplaceASCII](@inputString VARCHAR(8000))
2 RETURNS VARCHAR(55)
3 AS
4 BEGIN
5 DECLARE @badStrings VARCHAR(100);
6 DECLARE @increment INT= 1;
7 WHILE @increment <= DATALENGTH(@inputString)
8 BEGIN
9 IF(ASCII(SUBSTRING(@inputString, @increment, 1)) < 33)
10 BEGIN
11 SET @badStrings = CHAR(ASCII(SUBSTRING(@inputString, @increment, 1)));
12 SET @inputString = REPLACE(@inputString, @badStrings, '');
13 END;
14 SET @increment = @increment + 1;
15 END;
16 RETURN @inputString;
17 END;
18 GO
Script 8
The application of the function is shown in Script 9.
1 SELECT [id],
2 [Column 0],
3 LEN([Column 0]) OriginalLength,
4 LEN([SQLShack].[dbo].[ReplaceASCII]([Column 0])) NewLength
5 FROM [SQLShack].[dbo].[OLE DB Destination];
Script 9
Conclusion
Every now and then T-SQL developers are faced with cleaning the data they have imported by usually
applying the REPLACE T-SQL function. However, when it comes to removing special characters, removal
of ASCII Control Characters can be tricky and frustrating. Fortunately, SQL Server ships with additional
built-in functions such as CHAR and ASCII that can assist in automatically detecting and replacing ASCII
Control Characters.
Manage Unicode Characters in Data Using T-SQL
In this article, I’ll provide some useful information to help you understand how to use Unicode in SQL
Server and address various compilation problems that arise from the Unicode characters’ text with the
help of T-SQL.
What is Unicode?
The American Standard Code for Information Interchange (ASCII) was the first extensive character
encoding format. Originally developed in the US, and intended for English, ASCII could only
accommodate encoding for 128 characters. Character encoding simply means assigning a unique
number to every character being used. As an example, the letters ‘A’, ‘a’, ‘1’ and the symbol ‘+’
become numbers, as shown in the table:
ASCII(‘A’) ASCII(‘a’) ASCII(‘1’) ASCII(‘+’)
65 97 49 43
The T-SQL statement below can help us find the character from the ASCII value and vice-versa:
1 SELECT CHAR(193) as Character
Here is the result set of ASCII value to char:
1 SELECT ASCII('Á') as ASCII_
Here is the result set of char to ASCII value:
While ASCII encoding was acceptable for most common English language characters, numbers and
punctuation, it was constraining for the rest of the world’s dialects. As a result, other languages required
different encoding schemes and character definitions changed according to the language. Having
encoding schemes of different lengths required programs to figure out which one to apply depending on
the language being used.
Here is where international standards become critical. When the entire world practices the same
character encoding scheme, every computer can display the same characters. This is where the Unicode
Standard comes in.
Encoding is always related to a charset, so the encoding process encodes characters to bytes and
decodes bytes to characters. There are several Unicode formats: UTF-8, UTF-16 and UTF-32.
UTF-8 uses 1 byte to encode an English character. It uses between 1 and 4 bytes per character
and it has no concept of byte-order. All European languages are encoded in two bytes or less per
character
UTF-16 uses 2 bytes to encode an English character and it is widely used with either 2 or 4 bytes
per character
UTF-32 uses 4 bytes to encode an English character. It is best for random access by character
offset into a byte-array
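In SQL Server, the practical impact of these encodings shows up in the difference between VARCHAR (single-byte code pages) and NVARCHAR (UTF-16). The short sketch below is illustrative only; the sample strings are arbitrary:
-- Compare single-byte vs. Unicode storage for similar text.
-- DATALENGTH returns bytes used, not character count.
DECLARE @ascii_text VARCHAR(20) = 'Mark';
DECLARE @unicode_text NVARCHAR(20) = N'Mãrk';

SELECT
    DATALENGTH(@ascii_text)   AS ascii_bytes,    -- 4 bytes (1 byte per character)
    DATALENGTH(@unicode_text) AS unicode_bytes,  -- 8 bytes (UTF-16: 2 bytes per character)
    LEN(@unicode_text)        AS unicode_characters;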
Special characters are often problematic. When working with different source frameworks, it would be
preferable if every framework agreed on which characters are acceptable. Developers frequently take
missteps while identifying or troubleshooting such issues, only to find that the root cause is odd
characters hidden in the data, such as:
Á Â Ã Ä Å Æ Ç È
Execution:
1 SELECT *
2 FROM [Find_Unicode](N'Mãrk sÿmónds')
Here is the result set:
Execution:
1 SELECT dbo.[RemoveNonASCII](N'Mãrk sÿmónds')
These SQL functions can be very useful if you’re working with large international character sets.
Linked servers allow submitting a T-SQL statement on a SQL Server instance, which returns data from
other SQL Server instances. A linked server allows joining data from several SQL Server instances using a
single T-SQL statement when data exists on multiple databases on different SQL instances. By using a
linked server to retrieve data from several SQL instances, the only thing that should be done is to
connect to one SQL instance.
There are two ways of configuring linked server in SSMS. One way is by using sp_addlinkedserver system
stored procedure and another is by using SQL Server Management Studio (SSMS) GUI interface.
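For the first approach, a minimal sketch of sp_addlinkedserver is shown below; the instance name is only a placeholder and should be replaced with the actual name of the remote SQL Server instance:
-- Create a linked server to another SQL Server instance.
-- For the 'SQL Server' product, the @server value must be the remote instance's network name.
EXEC master.dbo.sp_addlinkedserver
    @server     = N'WSERVER2012\SQLEXPRESS',
    @srvproduct = N'SQL Server';

-- Verify that the linked server was registered:
SELECT name, product, provider, data_source
FROM sys.servers
WHERE is_linked = 1;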
This article explains how to configure a linked server using a SQL Server data source. More
information about other data sources can be found on this link.
To see all created linked servers in SSMS, under Object Explorer, chose the Server Objects folder and
expand the Linked Servers folder:
To create a linked server in SSMS, right click on the Linked Servers folder and from the context menu
select the New Linked Server option:
Local Login
In the Local Login field, all of the local logins will be listed. The local login can be a SQL Server
Authentication local login:
Now, when clicking the OK button on the New Linked Server dialog, the following error message will
appear:
The login mappings should either be impersonate or have a remote login name.
See the image below:
Impersonate
The Impersonate check box, when checked, passes the local login credentials to the linked server. For
SQL Server Authentication, the same login with the exact credentials must exist on the linked server;
otherwise, when connecting to the server with SQL Server Authentication, the list of the databases
under the Catalogs folder may look like this:
For Windows logins, the login must be a valid login on the linked server. In order to use impersonation,
the delegation between the local server and the linked server must be set.
Let’s create a linked server using the local Windows login. From the Local Login combo box, choose the
local Windows login and check the Impersonate checkbox and press the OK button:
Under the Catalogs folder, all databases that are located on the linked server will be listed:
Remote User
The Remote User option allows logins from the local SQL Server to connect to the linked SQL Server even
though their credentials aren’t present on the remote server, by using the credentials of a user that does
exist on the remote server. In other words, it allows local logins to connect to a remote server as a
different login that must exist on that remote server.
Remote Password
Specify the password of the remote user.
From the Local Login drop-down list, choose the local login that should map to a remote login. In
the Remote User field, enter the name of the remote user that exists on the remote server and in
the Remote Password field, enter that remote user’s password. Then, press the OK button:
Now, when connected to the local server using SQL Server Authentication, with Miki or Zivko credentials,
under the Catalogs folder, all databases that are available on a remote server for the Nenad remote login
will be listed:
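The same kind of mapping can also be created with the sp_addlinkedsrvlogin system stored procedure. The sketch below mirrors the example above; the server name and password are placeholders:
-- Map a local login to a remote login on the linked server.
EXEC master.dbo.sp_addlinkedsrvlogin
    @rmtsrvname  = N'WSERVER2012\SQLEXPRESS',  -- linked server name
    @useself     = N'FALSE',                   -- do not impersonate; use the remote credentials below
    @locallogin  = N'Miki',                    -- local login being mapped
    @rmtuser     = N'Nenad',                   -- remote login that exists on the linked server
    @rmtpassword = N'StrongPassword!1';        -- remote login's password (placeholder)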
Additionally, on the Linked Server dialog, it can be specified how logins that are not set in the Local
server login to remote server login mappings list will connect to the linked server. For that, there are
four options that can be used, and they are located under the For a login not defined in the list above,
connections will section:
Not be made
If this radio button is chosen, any logins that aren’t identified in the Local server login to remote server
login mappings list cannot establish a connection to the linked server.
For example, when logging in with a different account (e.g. Ben) that is not set in the login mapping list,
the list of the databases under the Catalogs folder will look like this:
The last item under the Select a page menu is the Server Options item. When selecting this option, the
following window will be shown:
Here, additional options for linked server can be seen or set.
Collation Compatible
The first option is the Collation Compatible option. This option is used to identify whether the linked
server has the same collation as the local server. It should be set to True only if it is known that the
linked server has the same collation as the local one; otherwise, it should be left as False (the default).
Data Access
This option is used to allow or deny access to the linked server data. If this option is set to False, access
to the remote server is denied, which is useful for disabling access to a remote server temporarily. The
following message will appear when executing a linked server query while this option is set to False:
Msg 7411, Level 16, State 1, Line 1
Server ‘WSERVER2012\SQLEXPRESS’ is not configured for DATA ACCESS.
By default, the option is set to True.
Collation Name
If the Use Remote Collation field is set to True, this option is used to specify the collation name of the
linked server for data sources that are not SQL Server data sources. When choosing a collation name, it
must be a collation that SQL Server supports.
Connection Timeout
This option is used to set the maximum time, in seconds, the local server should wait to get a connection
to the linked SQL Server instance. If 0 (zero) is set, then the server option remote login timeout is used.
By default, this option is set to 10 seconds. Note, the default value for SQL Server 2008 is 20 seconds.
Query Timeout
This option is used to set how long, in seconds, a remote process can take before it times out. The default
value is 600 seconds (10 minutes). To disable the query timeout, put 0 (zero) in this field and the query
will wait until it is completed.
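All of these server options can also be set with the sp_serveroption system stored procedure rather than through the dialog. A brief sketch, using a placeholder linked server name:
-- Set linked server options in T-SQL; values mirror the dialog defaults discussed above.
EXEC sp_serveroption N'WSERVER2012\SQLEXPRESS', 'collation compatible', 'false';
EXEC sp_serveroption N'WSERVER2012\SQLEXPRESS', 'data access',          'true';
EXEC sp_serveroption N'WSERVER2012\SQLEXPRESS', 'connect timeout',      '10';
EXEC sp_serveroption N'WSERVER2012\SQLEXPRESS', 'query timeout',        '600';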
Distributor
In this option, it can be specified whether the linked server is participating in replication as a Distributor.
The Distributor is a database instance that acts as a store for replication-specific data associated with one
or more Publishers.
Publisher
In this option, it can be set whether the linked server is a replication Publisher or not. If True, the
linked server is a publisher; otherwise, it is not.
The Publisher is a database instance that makes data available to other locations through replication.
Subscriber
In this option, it can be specified whether the linked server is a replication Subscriber or not.
A Subscriber is a database instance that receives replicated data.
More information about Distributor, Publisher, Subscriber can be found on the Replication Publishing
Model Overview page.
Otherwise, an error message will be displayed that shows the problem that prevents the connection from
being successfully established:
If everything goes well, the linked server will be removed from the Linked Servers folder.
Description
Of the many ways in which query performance can go awry, few are as misunderstood as parameter
sniffing. Search the internet for solutions to a plan reuse problem, and many suggestions will be
misleading, incomplete, or just plain wrong.
This is an area where design, architecture, and understanding one’s own code are extremely important,
and quick fixes should be saved as emergency last resorts.
Understanding parameter sniffing requires comfort with plan reuse, the query plan cache, and
parameterization. This topic is so important and has influenced me so much that I am devoting an entire
article to it, in which we will define, discuss, and provide solutions to parameter sniffing challenges.
Parameterization
The solution to memory pressure in the plan cache is parameterization. For our query above, the
DATETIME literal can be replaced with a parameter:
1 CREATE PROCEDURE dbo.get_order_date_metrics
2 @order_date DATETIME
3 AS
4 BEGIN
5 SET NOCOUNT ON;
6
7 SELECT
8 SalesOrderHeader.SalesOrderID,
9 SalesOrderHeader.DueDate,
10 SalesOrderHeader.ShipDate
11 FROM Sales.SalesOrderHeader
12 WHERE SalesOrderHeader.OrderDate = @order_date;
13 END
When executed for the first time, an execution plan will be generated for this stored procedure that uses
the parameter @order_date. All subsequent executions will use the same execution plan, resulting in the
need for only a single plan, even if the proc is executed millions of times per day.
Parameterization greatly reduces churn in the plan cache and speeds up query execution as we can often
skip the expensive optimization process that is needed to generate an execution plan.
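One way to confirm this reuse is to execute the procedure with different parameter values and then inspect the plan cache. The sketch below assumes the AdventureWorks sample database; the dates are arbitrary, and note that the diagnostic query itself will also appear in the cache as an ad hoc plan:
-- Execute the procedure twice with different parameter values.
EXEC dbo.get_order_date_metrics @order_date = '2011-05-31';
EXEC dbo.get_order_date_metrics @order_date = '2014-05-29';

-- Confirm that a single cached plan exists for the procedure and has been reused.
SELECT
    cached_plans.usecounts,   -- how many times the plan has been reused
    cached_plans.objtype,     -- 'Proc' for a stored procedure plan
    sql_text.text
FROM sys.dm_exec_cached_plans AS cached_plans
CROSS APPLY sys.dm_exec_sql_text(cached_plans.plan_handle) AS sql_text
WHERE sql_text.text LIKE '%get_order_date_metrics%';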
We can see that SQL Server used a scan on a nonclustered index, as well as a key lookup to return the
data we were looking for. If we were to clear the execution plan cache and rerun this for a parameter
value of 0, then we would get a different plan:
Table ‘SalesOrderHeader’. Scan count 1, logical reads 698, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
Because so many rows were being returned by the query, SQL Server found it more efficient to scan the
table and return everything, rather than methodically seek through an index to return 95% of the table. In
each of these examples, the execution plan chosen was the best plan for the parameter value passed in.
How will performance look if we were to execute the stored procedure for a parameter value of 285 and
not clear the plan cache?
Table ‘SalesOrderHeader’. Scan count 1, logical reads 698, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
The correct execution plan involved a scan of a nonclustered index with a key lookup, but since we
reused our most recently generated execution plan, we got a clustered index scan instead. This plan cost
us six times more reads, pulling significantly more data from storage than was needed to process the
query and return our results.
The behavior above is a side-effect of plan reuse and is the poster-child for what this article is all about.
For our purposes, parameter sniffing will be defined as undesired execution plan reuse.
Finding and Resolving Parameter Sniffing
How do we diagnose parameter sniffing? Once we know that performance is suboptimal, there are a
handful of giveaways that help us understand when this is occurring:
A stored procedure executes efficiently sometimes, but inefficiently at other times.
A good query begins performing poorly when no changes are made to database schema.
A stored procedure has many parameters and/or complex business logic enumerated within it.
A stored procedure uses extensive branching logic.
Playing around with the TSQL appears to fix it temporarily.
Hacks fix it temporarily
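As a starting point for the first symptom, procedures that swing between fast and slow executions can be surfaced from the plan cache. The sketch below uses sys.dm_exec_procedure_stats; the 10x threshold is arbitrary and only meant to flag candidates for further investigation:
-- Procedures whose best and worst durations differ widely are candidates for
-- a parameter sniffing investigation. Elapsed times are in microseconds.
SELECT
    OBJECT_NAME(procedure_stats.object_id, procedure_stats.database_id) AS procedure_name,
    procedure_stats.execution_count,
    procedure_stats.min_elapsed_time,
    procedure_stats.max_elapsed_time,
    procedure_stats.total_elapsed_time / procedure_stats.execution_count AS avg_elapsed_time
FROM sys.dm_exec_procedure_stats AS procedure_stats
WHERE procedure_stats.max_elapsed_time > procedure_stats.min_elapsed_time * 10
ORDER BY procedure_stats.max_elapsed_time DESC;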
Of the many areas of SQL Server where performance problems rear their head, few are handled as poorly
as parameter sniffing. There often is not an obvious or clear fix, and as a result we implement hacks or
poor choices to resolve the latency and allow us to move on with life as quickly as possible. An immense
percentage of the content available online, in publications, and in presentations on this topic is
misleading, and encourages the administrator to take shortcuts that do not truly fix a problem. There are
definitive ways to resolve parameter sniffing, so let’s look at many of the possible solutions (and how
effective they are).
I am not going to go into excruciating detail here. MSDN documents the use of different hints/mechanics
well. Links are included at the end of the article to help with this, if needed.
Redeclaring Parameters Locally
Rating: It’s a trap!
This is a complete cop-out, plain and simple. Call it a cheat, a poor hack, or a bandage as that is all it is.
Because the value of local variables is not known until runtime, the query optimizer needs to make a very
rough estimate of row counts prior to execution. This estimate is all we get, and statistics on the index
will not be effectively used to determine the best execution plan. This estimate will sometimes be good
enough to resolve a parameter sniffing issue and give the illusion of a job well done.
The effect of using local variables is to hide the value from SQL Server. It’s essentially applying the hint
“OPTIMIZE FOR UNKNOWN” to any query component that references them. The rough estimate that SQL
Server uses to optimize the query and generate an execution plan will be right sometimes, and wrong
other times. Typically the way this is implemented is as follows:
1. Performance problem is identified.
2. Parameter sniffing is determined to be the cause.
3. Redeclaring parameters locally is a solution found on the internet.
4. Try redeclaring parameters locally and the performance problem resolves itself.
5. Implement the fix permanently.
6. 3 months later, the problem resurfaces and the cause is less obvious.
What we are really doing is fixing a problem temporarily and leaving behind a time bomb that will create
problems in the future. The estimate by the optimizer may work adequately for now, but eventually will
not be adequate and we’ll have resumed performance problems. This solution works because oftentimes
a poor estimate performs better than badly timed parameter sniffing, but only at that time. This is a game
of chance in which a low probability event (parameter sniffing) is crossed with a high probability event (a
poor estimate happening to be good enough) to generate a reasonable illusion of a fix.
To demo this behavior, we’ll redeclare a parameter locally in our stored procedure from earlier:
1 IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
2 BEGIN
3 DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
4 END
5 GO
6
7 CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
8 @sales_person_id INT
9 AS
10 BEGIN
11 SET NOCOUNT ON;
12
13 DECLARE @sales_person_id_local INT = @sales_person_id;
14
15 SELECT
16 SalesOrderHeader.SalesOrderID,
17 SalesOrderHeader.DueDate,
18 SalesOrderHeader.ShipDate
19 FROM Sales.SalesOrderHeader
20 WHERE SalesOrderHeader.SalesPersonID = @sales_person_id_local;
21 END
When we execute this for different values, we get the same plan each time. Clearing the proc cache has
no effect either:
1 EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 285;
2 DBCC FREEPROCCACHE
3 EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 0;
4 EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 285;
5 DBCC FREEPROCCACHE
6 EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 285;
7 EXEC dbo.get_order_metrics_by_sales_person @sales_person_id = 0;
For each execution, the result is:
Table ‘SalesOrderHeader’. Scan count 1, logical reads 698, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
When we hover over the results, we can see that the estimated number of rows was 1748, but the actual
rows returned by the query was 16. Seeing a huge disparity between actual and estimated rows is an
immediate indication that something is wrong. While that could be indicative of stale statistics, seeing
local variables in the query should be a warning sign that they are related. In this example, the local
variable forced the same mediocre execution plan for all runs of the query, regardless of details. This may
sometimes give an illusion of adequate performance, but will rarely do so for long.
To summarize: declaring local variables, assigning parameter values to them, and using the local variables
in subsequent queries is a very bad idea and we should never, ever do this! If a short-term hack is needed, the other options discussed below are better choices.
It is important to note that if this query hint is utilized and the query later begins to be used more often,
you will want to consider removing the hint to prevent excessive resource consumption by the query
optimizer as it constantly generates new plans. OPTION RECOMPILE is useful in a specific set of
circumstances and should be applied carefully, only when needed, and only when the query is not
executed often. To review, OPTION (RECOMPILE) is best used when:
A query is executed infrequently.
Unpredictable parameter values result in optimal execution plans that vary greatly with each
execution.
Other optimization solutions were unavailable or unsuccessful.
As with all hints, use it with caution, and only when absolutely needed.
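For reference, here is a minimal sketch of how the hint could be attached to the sales-person procedure used throughout this section; the DROP/CREATE pattern mirrors the earlier examples:
IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
BEGIN
    DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
END
GO

CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
    @sales_person_id INT
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        SalesOrderHeader.SalesOrderID,
        SalesOrderHeader.DueDate,
        SalesOrderHeader.ShipDate
    FROM Sales.SalesOrderHeader
    WHERE SalesOrderHeader.SalesPersonID = @sales_person_id
    OPTION (RECOMPILE); -- forces a fresh plan for each execution of this statement
END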
Dynamic SQL
Rating: Potentially useful
While dynamic SQL can be an extremely useful tool, this is a somewhat awkward place to use it. By
wrapping a troublesome TSQL statement in dynamic SQL, we remove it from the scope of the stored
procedure and another execution plan will be generated exclusively for the dynamic SQL. Since execution
plans are generated for specific TSQL text, a dynamic SQL statement with any variations in text will
generate a new plan.
For all intents and purposes, using dynamic SQL to resolve parameter sniffing is very similar to using a
RECOMPILE hint. We are going to generate more execution plans with greater granularity in an effort to
sidestep the effects of parameter sniffing. All of the caveats of recompilation apply here as well. We do
not want to generate excessive quantities of execution plans as the resource cost to do so will be high.
One benefit of this solution is that we will not create a new plan with each execution, but only when the
parameter values change. If the parameter values don’t change often, then we will be able to reuse plans
frequently and avoid the heavy repeated costs of optimization.
A downside to this solution is that it is confusing. To a developer, it is not immediately obvious why
dynamic SQL was used, so additional documentation would be needed to explain its purpose. While
using dynamic SQL can sometimes be a good solution, it is the sort that should be implemented very
carefully and only when we are certain we have a complete grasp of the code and business logic
involved. As with RECOMPILE, if the newly created dynamic SQL suddenly begins to be executed often,
then the cost to generate new execution plans may become a burden on resource consumption. Lastly,
remember to cleanse inputs and ensure that string values cannot be broken or modified by apostrophes,
percent signs, brackets, or other special characters.
Here is an example of this usage:
1 IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
2 BEGIN
3 DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
4 END
5 GO
6
7 CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
8 @sales_person_id INT
9 AS
10 BEGIN
11 SET NOCOUNT ON;
12
13 DECLARE @sql_command NVARCHAR(MAX);
14
15 SELECT @sql_command = '
16 SELECT
17 SalesOrderHeader.SalesOrderID,
18 SalesOrderHeader.DueDate,
19 SalesOrderHeader.ShipDate
20 FROM Sales.SalesOrderHeader
21 WHERE SalesOrderHeader.SalesPersonID = ' + CAST(@sales_person_id AS VARCHAR(MAX)) + ';
22 ';
23 EXEC sp_executesql @sql_command;
24 END
25 GO
Our results here are similar to using OPTION (RECOMPILE), as we will get good IO and execution plans
generated each time. To review, wrapping a TSQL statement in dynamic SQL and hard-coding parameters
into that statement can be useful when:
A query is executed infrequently OR parameter values are not very diverse.
Different parameter values result in wildly different execution plans.
Other optimization solutions were unavailable or unsuccessful.
OPTION (RECOMPILE) resulted in too many recompilations.
OPTIMIZE FOR
Rating: Potentially useful, if you really know your code!
When we utilize this hint, we explicitly tell the query optimizer what parameter value to optimize for. This
should be used like a scalpel, and only when we have complete knowledge of and control over the code
in question. To tell SQL Server that we should optimize a query for any specific value requires that we
know that all values used will be similar to the one we choose.
This requires knowledge of both the business logic behind the poorly performing query and any of the
TSQL in and around the query. It also requires that we can see the future with a high level of accuracy
and know that parameter values will not shift in the future, resulting in our estimates being wrong.
One excellent use of this query hint is to assign optimization values for local variables. This can allow you
to curtail the rough estimates that would otherwise be used. As with parameters, you need to know what
you are doing for this to be effective, but there is at least a higher probability of improvement when our
starting point is “blind guess”.
Note that OPTIMIZE FOR UNKNOWN has the same effect as using a local variable. The result will typically
behave as if a rough statistical estimate were used and will not always be adequate for efficient
execution. Here’s how its usage looks:
1 IF EXISTS (SELECT * FROM sys.procedures WHERE procedures.name = 'get_order_metrics_by_sales_person')
2 BEGIN
3 DROP PROCEDURE dbo.get_order_metrics_by_sales_person;
4 END
5 GO
6
7 CREATE PROCEDURE dbo.get_order_metrics_by_sales_person
8 @sales_person_id INT
9 AS
10 BEGIN
11 SET NOCOUNT ON;
12
13 SELECT
14 SalesOrderHeader.SalesOrderID,
15 SalesOrderHeader.DueDate,
16 SalesOrderHeader.ShipDate
17 FROM Sales.SalesOrderHeader
18 WHERE SalesOrderHeader.SalesPersonID = @sales_person_id
19 OPTION (OPTIMIZE FOR (@sales_person_id = 285));
20 END
With this hint in place, all executions will utilize the same execution plan based on the
parameter @sales_person_id having a value of 285. OPTIMIZE FOR is most useful in these scenarios:
A query executes very similarly all the time.
We know our code very well and understand the performance of it thoroughly.
Row counts processed and returned are consistently similar.
We have a high level of confidence that these facts will not change in the future.
OPTIMIZE FOR can be a useful way to control variables and parameters to ensure optimal performance,
but it requires knowledge and confidence in how a query or stored procedure operates so that we do not
introduce a future performance problem when things change. As with all hints, use it with caution, and
only when absolutely needed.
Conclusion
To wrap up our discussion of parameter sniffing, it is important to be reminded that this is a feature and
not a bug. We should not be automatically seeking workarounds, hacks, or cheats to make the problem
go away. Many quick fixes exist that will resolve a problem for now and allow us to move on to other
priorities. Before adding query hints, trace flags, or otherwise hobbling the query optimizer, consider
every alternate way to improve performance. Local variables, dynamic SQL, RECOMPILE, and OPTIMIZE
FOR are too often cited as the best solutions, when in fact they are typically misused.
Scalability
How will the application and its data grow over time? The ways we build, maintain, and query data
change when we know the volume will be huge. Very often we build and test code in very controlled
dev/QA environments in which the data flow does not mirror a real production environment. Even
without this, we should be able to estimate how much an app will be used and what the most common
needs will be.
We can then infer metrics such as database size, memory needs, CPU, and throughput. We want the
hardware that we put our databases on to be able to perform adequately and this requires us to allocate
enough computing resources to make this happen. A 10TB database will likely not perform well (or at all)
on a server with 2GB of RAM available to it. Similarly, a high-traffic application may require faster
throughput on the network and storage infrastructure, in addition to speedy SSDs. A database will only
perform as quickly as its slowest component, and it is up to us to make sure that the slowest component
is fast enough for our app.
How will data size grow over time? Can we easily expand storage and memory when needed? If
downtime is not an option, then we will need to consider hardware configurations that will either provide
a ton of extra overhead to start or allow for seamless expansions later on. If we are not certain of data
growth, do we expect the user or customer count to grow? If so, we may be able to infer data or usage
growth based on this.
Licensing matters, too, as licensing database software isn’t cheap. We should consider which edition of
SQL Server the application will function on and what the least expensive edition is that we are allowed to
use. A completely internal server with no customer-facing access may be able to benefit from using
Developer edition. Alternatively, the choice between Enterprise and Standard may be decided by features
(such as AlwaysOn) or capacity (memory limitations, for example). A link is provided at the end of this
article with extensive comparisons between editions of SQL Server.
High availability and disaster recovery are very important considerations early-on that often are not
visited until it is too late. What is the expected up-time of the app? How quickly are we expected to
recover from an outage (recovery time objective/RTO)? In addition, how much data loss is tolerated in
the event of an outage or disaster (recovery point objective/RPO)? These are tough questions as
businesses will often ask for a guarantee of zero downtime and no data loss, but will back off when they
realize the cost to do so is astronomically high. This discussion is very important to have prior to an
application being released as it ensures that contracts, terms of service, and other documentation
accurately reflect the technical capabilities of the systems it resides on. It also allows you to plan ahead
with disaster recovery plans and avoid the panic often associated with unexpected outages.
Data types
One of the most basic decisions that we can make when designing a database is to choose the right data
types. Good choices can improve performance and maintainability. Poor choices will make work for us in
the future.
Choose natural data types that fit the data being stored. A date should be a date, not a string. A bit
should be a bit and not an integer or string. Many of these decisions are holdovers from years ago when
data types were more limited and developers had to be creative in order to generate the data they
wanted.
Choose length, precision, and size that fits the use case. Extra precision may seem like a useful add-on,
but can be confusing to developers who need to understand why a DECIMAL(18,4) contains data with
only two digits of decimal detail. Similarly, using a DATETIME to store a DATE or TIME can also be
confusing and lead to bad data.
When in doubt, consider using a standard, such as ISO5218 for gender, ISO3166 for country, or ISO4217
for currency. These allow you to quickly refer anyone to universal documentation on what data should
look like, what is valid, and how it should be interpreted.
Avoid storing HTML, XML, JSON, or other markup languages in the database. Storing, retrieving, and
displaying this data is expensive. Let the app manage data presentation, not the database. A database
exists to store and retrieve data, not to generate pretty documents or web pages.
Dates and times should be consistent across all tables. If time zones or locations will matter, consider
using UTC time or DATETIMEOFFSET to model them. Upgrading a database in the future to add time
zone support is much harder than using these conventions in the beginning. Dates, times, and durations
are different. Label them so that it is easy to understand what they mean. Duration should be stored in a
one-dimensional scalar unit, such as seconds or minutes. Storing duration in the format
“HH:MM:SS.mmm” is confusing and difficult to manipulate when mathematical operations are needed.
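As a sketch of these conventions, the hypothetical table below stores points in time both in UTC and with their original offset, and keeps duration as a plain scalar. The AT TIME ZONE conversion requires SQL Server 2016 or later, and the table, column names, and time zone are illustrative only:
-- Store point-in-time values with their offset and durations as a scalar unit.
CREATE TABLE dbo.job_run_history
(
    job_run_id          INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    start_time_utc      DATETIME2(3) NOT NULL,       -- stored in UTC
    start_time_local    DATETIMEOFFSET(3) NOT NULL,  -- retains the original offset
    duration_in_seconds INT NOT NULL                 -- one-dimensional scalar duration
);

-- Converting a UTC value for display can be done at query time:
SELECT
    job_run_id,
    start_time_utc AT TIME ZONE 'UTC' AT TIME ZONE 'Eastern Standard Time' AS start_time_eastern,
    duration_in_seconds / 60.0 AS duration_in_minutes
FROM dbo.job_run_history;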
NULLs
Use NULL when non-existence of data needs to be modelled in a meaningful fashion. Do not use made-
up data to fill in NOT NULL columns, such as “1/1/1900” for dates, “-1” for integers, “00:00:00” for times,
or “N/A” for strings. NOT NULL should mean that a column is required by an application and should
always be populated with meaningful data.
NULL should have meaning and that meaning should be defined when the database is being designed.
For example, “request_complete_date = NULL” could mean that a request is not yet complete. “Parent_id
= NULL“ could indicate an entity with no parent.
NULL can be eliminated by additional normalization. For example, a parent-child table could be created
that models all hierarchical relationships for an entity. This may be beneficial if these relationships form a
critical component of how an app operates. Reserve the removal of NULLable columns via normalization
for those that are important to an app or that may require additional supporting schema to function well.
As always, normalization for the sake of normalization is probably not a good thing!
Beware NULL behavior. ORDER BY, GROUP BY, equalities, inequalities, and aggregate functions will all
treat NULL differently. Always SET ANSI_NULLS ON. When performing operations on NULLable columns,
be sure to check for NULL whenever needed. Here is a simple example from Adventureworks:
1 SELECT
2 *
3 FROM Person.Person
4 WHERE Title = NULL
5
6 SELECT
7 *
8 FROM Person.Person
9 WHERE Title IS NULL
These queries look similar but will return different results. The first query will return 0 rows, whereas the
second will return 18,963 rows:
The reason is that NULL is not a value and cannot be treated like a number or string. When checking for
NULL or working with NULLable columns, always check and validate if you wish to include or exclude
NULL values, and always use IS NOT NULL or IS NULL, instead of =, <, >, etc…
SET ANSI_NULLS ON is a default in SQL Server and should be left as a default. Adjusting this will change
how the above behavior works and will go against ANSI standards. Building code to handle NULL
effectively is a far more scalable approach than adjusting this setting.
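A quick way to see the aggregate behavior is to compare COUNT(*) with COUNT(column) on a NULLable column; this small example assumes the AdventureWorks Person.Person table used above:
-- COUNT(*) counts rows; COUNT(column) silently skips NULLs, which is easy to miss.
SELECT
    COUNT(*)                AS total_rows,
    COUNT(Title)            AS rows_with_title,      -- NULL titles are excluded
    COUNT(*) - COUNT(Title) AS rows_with_null_title
FROM Person.Person;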
Object names
Naming things is hard! Choosing descriptive, useful names for objects will greatly improve readability
and the ability for developers to easily use those objects for their work and not make unintended
mistakes.
Name an object for what it is. Include units in the name if they are not absurdly obvious.
“duration_in_seconds” is much more useful than “duration”. “Length_inches” is easier to understand than
“length”. Bit columns should be named in the positive and match the business use case: “is_active”,
“is_flagged_for_deletion”, “has_seventeen_pizzas”. Negative columns are usually confusing:
“is_not_taxable”, “has_no_pizzas”, “is_not_active” will lead to mistakes and confusion as they are not
intuitive. Database schema should not require puzzle-solving skills to understand!
Other things to avoid:
Abbreviations & shorthand. These are rarely anything but confusing. If typing speed is a concern for slower
typists, consider the many tools available that provide Intellisense or similar auto-completion
features.
Spaces & special characters. They will break maintenance processes, confuse developers, and be
a nuisance to type correctly when needed. Stick to numbers, letters, and underscores.
Reserved words. If it’s blue, white, or pink in SSMS, don’t use it! This only causes confusion and
increases the chances of logical coding errors.
Consistency is valuable and creating effective naming schemes early will pay dividends later when there
is no need to “fix” standards to not be awful. As for the debate between capitalization and whether you
should use no capitals, camel case, pascal case, etc…, this is completely arbitrary and up to a
development team. In databases with lots of objects, prefixes can be used to allow objects of specific
types, origins, or purposes to be easily searchable. Alternatively, different schemas can be used to divide
up objects of different types.
Good object naming reduces mistakes and errors while speeding up app development. While nothing is
truly self-documenting, quality object names reduce the need to find additional resources (docs or
people) to determine what something is for or what it means.
Old Data
Whenever data is created, ask the question, “How long should it exist for?”. Forever is a long time and
most data does not need to live forever. Find out or create a data retention policy for all data and write
code to enforce it. Many businesses have compliance or privacy rules to follow that may dictate how long
data needs to be retained for.
Limiting data size is a great way to improve performance and reduce data footprint! This is true for any
table that stores historical data. A smaller table means smaller indexes, less storage use, less memory use,
and less network bandwidth usage. All scans and seeks will be faster as the indexes are more compact
and quicker to search.
There are many ways to deal with old data. Here are a few examples:
Delete it. Forever. If allowed, this is an easy and quick solution.
Archive it. Copy it to a secondary location (different database, server, partition, etc…) and then
delete it.
Soft-delete it. Have a flag that indicates that it is no longer relevant and can be ignored in normal
processes. This is a good solution when you can leverage different storage partitions, filtered
indexes, or ways to segregate data as soon as it is flagged as old.
Nothing. Some data truly is needed forever. If so, consider how to make the underlying structures
scalable so that they perform well in the future. Consider how large the tables can grow.
Data retention doesn’t only involve production OLTP tables, but may also include backup files, reporting
data, or data copies. Be sure to apply your retention policies to everything!
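When a retention policy is enforced in T-SQL, deleting in small batches keeps locking and log growth manageable. The sketch below is illustrative only; the table name and the seven-year policy are hypothetical:
-- Delete old rows in small batches so locks and the transaction log stay manageable.
DECLARE @rows_deleted INT = 1;

WHILE @rows_deleted > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.audit_log
    WHERE created_date_utc < DATEADD(YEAR, -7, GETUTCDATE());

    SET @rows_deleted = @@ROWCOUNT;
END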
We have two hints that something has gone wrong: An overly large result set, and an unexpected index
scan in the execution plan:
Upon closer inspection of our query, it becomes obvious that I fat-fingered the INNER JOIN and did not
enter the correct table names:
1 INNER JOIN Production.ProductModel
2 ON ProductModel.ProductModelID = ProductModel.ProductModelID
By entering ProductModel on both sides of the join, I inadvertently told SQL Server to not
join Product to ProductModel, but instead join Product to the entirety of ProductModel. This occurs
because ProductModel.ProductModelID will always equal itself. I could have entered “ON 1 = 1” for the join
criteria and seen the same results.
The correction here is simple, adjust the join criteria to connect Product to ProductModel, as was
intended:
1 INNER JOIN Production.ProductModel
2 ON Product.ProductModelID = ProductModel.ProductModelID
Once fixed, the query returns a single row and utilizes an index seek on ProductModel.
Situations in which a join predicate is missing or wrong can be difficult to detect. SQL Server does not
always warn you of this situation, and you may not see an error message or show-stopping bad
performance that gets your immediate attention. Here are some tips on catching bad joins before they
cause production headaches:
Make sure that each join correlates an existing data set with the new table. CROSS JOINs should
only be used when needed (and intentionally) to inflate the size/depth of a data set.
An execution plan may indicate a “No Join Predicate” warning on a specific join in the execution
plan. If so, then you’ll know exactly where to begin your research.
Check the size of the result set. Is it too large? Are any tables being cross joined across an entire
data set, resulting in extra rows of legit data with extraneous data tacked onto the end of it?
Do you see any unusual index scans in the execution plan? Are they for tables where you expect
to only seek a few rows, such as in a lookup table?
For reference, here is an example of what a “No Join Predicate” warning looks like:
We’ll follow the standard rule that yellow and red exclamation marks will always warrant further
investigation. In doing so, we can see that this specific join is flagged as having no join predicate. In a
short query, this is easy to spot, but in a larger query against many tables, it is easy for these problems to
get buried in a larger execution plan.
Iteration
SQL Server is optimized for set-based operations and performs best when you read and write data in
batches, rather than row-by-row. Applications are not constrained in this fashion and often use iteration
as a method to parse data sets.
While it may anecdotally seem that collecting 100 rows from a table one-at-a-time or all at once would
take the same effort overall, the reality is that the effort to connect to storage and read pages into
memory takes a distinct amount of overhead. As a result, one hundred index seeks of one row each will
take far more time and resources than one seek of a hundred rows:
1 DECLARE @id INT = (SELECT MIN(BusinessEntityID) FROM HumanResources.Employee)
2 WHILE @id <= 100
3 BEGIN
4 UPDATE HumanResources.Employee
5 SET VacationHours = VacationHours + 4
6 WHERE BusinessEntityID = @id
7 AND VacationHours < 200;
8
9 SET @id = @id + 1;
10 END
This example is simple: iterate through a loop, update an employee record, increment a counter and
repeat 99 times. The performance is slow and the execution plan/IO cost abysmal:
At first glance, things seem good: Lots of index seeks and each read operation is inexpensive. When we
look more closely, we realize that while 2 reads may seem cheap, we need to multiply that by 100. The
same is true for the 100 execution plans that were generated for all of the update operations.
Let’s say we rewrite this to update all 100 rows in a single operation:
1 UPDATE HumanResources.Employee
2 SET VacationHours = VacationHours + 4
3 WHERE VacationHours < 200
4 AND BusinessEntityID <= 100;
Instead of 200 reads, we only need 5, and instead of 100 execution plans, we only need 1.
Data in SQL Server is stored in 8kb pages. When we read rows of data from disk or memory, we are
reading 8kb pages, regardless of the data size. In our iterative example above, each read operation did
not simply read a few numeric values from disk and update one, but had to read all of the necessary 8kb
pages needed to service the entire query.
Iteration is often hidden from view because each operation is fast and inexpensive, making it difficult to
locate it when reviewing extended events or trace data. Watching out for CURSOR use, WHILE loops, and
GOTO can help us catch it, even when there is no single poor-performing operation.
There are other tools available that can help us avoid iteration. For example, a common need when
inserting new rows into a table is to immediately return the IDENTITY value for that new row. This can be
accomplished by using @@IDENTITY or SCOPE_IDENTITY(), but these are not set-based functions. To use
them, we must iterate through insert operations one-at-a-time and retrieve/process the new identity
values after each loop. For row counts greater than 2 or 3, we will begin to see the same inefficiencies
introduced above.
The following code is a short example of how to use OUTPUT INSERTED to retrieve IDENTITY values in
bulk, without the need for iteration:
1 CREATE TABLE #color
2 (color_id SMALLINT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, color_name VARCHAR(50) NOT NULL, datetime_added_utc DATETIME);
3 CREATE TABLE #id_values
4 (color_id SMALLINT NOT NULL PRIMARY KEY CLUSTERED, color_name VARCHAR(50) NOT NULL);
5
6 INSERT INTO #color
7 (color_name, datetime_added_utc)
8 OUTPUT INSERTED.color_id, INSERTED.color_name
9 INTO #id_values
10 VALUES
11 ('Red', GETUTCDATE()),
12 ('Blue', GETUTCDATE()),
13 ('Yellow', GETUTCDATE()),
14 ('Brown', GETUTCDATE()),
15 ('Pink', GETUTCDATE());
16
17 SELECT * FROM #id_values;
18
19 DROP TABLE #color;
20 DROP TABLE #id_values;
In this script, we insert new rows into #color in a set-based fashion, and pull the newly inserted IDs, as
well as color_name, into a temp table. Once in the temp table, we can use those new values for whatever
additional operations are required, without the need to iterate through each INSERT operation one-at-a-
time.
Window functions are also very useful for minimizing the need to iterate. Using them, we can pull row
counts, sums, min/max values, and more without executing additional queries or iterating through data
windows manually:
1 SELECT
2 SalesOrderHeader.SalesOrderID,
3 SalesOrderDetail.SalesOrderDetailID,
4 SalesOrderHeader.SalesPersonID,
5 ROW_NUMBER() OVER (PARTITION BY SalesOrderHeader.SalesPersonID ORDER BY SalesOrderDetail.SalesOrderDetailID ASC) AS SalesPersonRowNum,
6 SUM(SalesOrderHeader.SubTotal) OVER (PARTITION BY SalesOrderHeader.SalesPersonID ORDER BY SalesOrderDetail.SalesOrderDetailID ASC) AS SalesPersonSales
7 FROM Sales.SalesOrderHeader
8 INNER JOIN Sales.SalesOrderDetail
9 ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
10 WHERE SalesOrderHeader.SalesPersonID IS NOT NULL
11 AND SalesOrderHeader.Status = 5;
The results of this query show us not only a row per detail line, but include a running count of orders per
sales person and a running total of sales:
Window functions are not inherently efficient: The above query required some hefty sort operations to
generate the results. Despite the cost, this is far more efficient than iterating through all sales people,
orders, or some other iterative operation over a large data set:
In addition to avoiding iteration, we also avoid the need for aggregation within our query, allowing us to
freely select whatever columns we’d like without the typical constraints of GROUP BY/HAVING queries.
Iteration is not always a bad thing. Sometimes we need to query all databases on a server or all servers in
a list. Other times we need to call a stored procedure, send emails, or perform other operations that are
either inefficient or impossible to do in a set-based fashion. In these scenarios, make sure that
performance is adequate and that the number of times that a loop needs to be repeated is limited to
prevent unexpected long-running jobs.
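For those scenarios, a bounded loop is perfectly reasonable. The sketch below iterates over sys.databases to collect a file-size summary from each database; the temporary table and the query inside the dynamic SQL are illustrative only:
-- Acceptable iteration: run a small diagnostic query against every online database.
CREATE TABLE #db_sizes (database_name SYSNAME NOT NULL, total_size_mb DECIMAL(18,2) NOT NULL);

DECLARE @database_name SYSNAME;
DECLARE @sql_command NVARCHAR(MAX);

DECLARE database_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE state_desc = 'ONLINE';

OPEN database_cursor;
FETCH NEXT FROM database_cursor INTO @database_name;
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @sql_command = N'
        SELECT ' + QUOTENAME(@database_name, '''') + N', SUM(size) * 8.0 / 1024
        FROM ' + QUOTENAME(@database_name) + N'.sys.database_files;';

    INSERT INTO #db_sizes (database_name, total_size_mb)
    EXEC sp_executesql @sql_command;

    FETCH NEXT FROM database_cursor INTO @database_name;
END
CLOSE database_cursor;
DEALLOCATE database_cursor;

SELECT * FROM #db_sizes;
DROP TABLE #db_sizes;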
Encapsulation
When writing application code, encapsulation is used as a way to reuse code and simplify complex
interfaces. By packaging code into functions, stored procedures, and views, we can very easily offload
important business logic or reusable code to a common place, where it can be called by any other code.
While this sounds like a very good thing, when overused it can very quickly introduce performance
bottlenecks as chains of objects linked together by other encapsulated objects increases. For example: a
stored procedure that calls a stored procedure that uses a function that calls a view that calls a view that
calls a view. This may sound absurd but is a very common outcome when views and nested stored
procedures are relied on heavily.
How does this cause performance problems? Here are a few common ways:
Unnecessary joins, filters, and subqueries are applied, but not needed.
Columns are returned that are not needed for a given application.
INNER JOINs, CROSS JOINs, or filters force reads against tables that are not needed for a given
operation.
Query size (# of tables referenced in query) results in a poor execution plan.
Logical mistakes are made due to obfuscated query logic not being fully understood.
Here is an example of an AdventureWorks query in which simple intentions have complex results:
1 SELECT
2 BusinessEntityID,
3 Title,
4 FirstName,
5 LastName
6 FROM HumanResources.vEmployee
7 WHERE FirstName LIKE 'E%'
At first glance, this query is pulling only 4 columns from the employee view. The results are what we
expect, but it runs a bit longer than we’d want (over 1 second). Checking the execution plan and IO stats
reveals:
What we discover is that there was quite a bit going on behind-the-scenes that we were not aware of.
Tables were accessed that we didn’t need, and excess reads performed as a result. This leads us to ask:
What is in vEmployee anyway!? Here is the definition of this view:
1 CREATE VIEW [HumanResources].[vEmployee]
2 AS
3 SELECT
4 e.[BusinessEntityID]
5 ,p.[Title]
6 ,p.[FirstName]
7 ,p.[MiddleName]
8 ,p.[LastName]
9 ,p.[Suffix]
10 ,e.[JobTitle]
11 ,pp.[PhoneNumber]
12 ,pnt.[Name] AS [PhoneNumberType]
13 ,ea.[EmailAddress]
14 ,p.[EmailPromotion]
15 ,a.[AddressLine1]
16 ,a.[AddressLine2]
17 ,a.[City]
18 ,sp.[Name] AS [StateProvinceName]
19 ,a.[PostalCode]
20 ,cr.[Name] AS [CountryRegionName]
21 ,p.[AdditionalContactInfo]
22 FROM [HumanResources].[Employee] e
23 INNER JOIN [Person].[Person] p
24 ON p.[BusinessEntityID] = e.[BusinessEntityID]
25 INNER JOIN [Person].[BusinessEntityAddress] bea
26 ON bea.[BusinessEntityID] = e.[BusinessEntityID]
27 INNER JOIN [Person].[Address] a
28 ON a.[AddressID] = bea.[AddressID]
29 INNER JOIN [Person].[StateProvince] sp
30 ON sp.[StateProvinceID] = a.[StateProvinceID]
31 INNER JOIN [Person].[CountryRegion] cr
32 ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
33 LEFT OUTER JOIN [Person].[PersonPhone] pp
34 ON pp.BusinessEntityID = p.[BusinessEntityID]
35 LEFT OUTER JOIN [Person].[PhoneNumberType] pnt
36 ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
37 LEFT OUTER JOIN [Person].[EmailAddress] ea
38 ON p.[BusinessEntityID] = ea.[BusinessEntityID];
This view does not contain only basic Employee data; it also joins many other tables that we have no
need for in our query. While the performance we experienced might be acceptable under some
circumstances, it’s important to understand the contents of any objects we use to the extent that we can
use them effectively. If performance were a key issue here, we could rewrite our query as follows:
1 SELECT
2 e.BusinessEntityID,
3 p.Title,
4 p.FirstName,
5 p.LastName
6 FROM HumanResources.Employee e
7 INNER JOIN Person.Person p
8 ON p.BusinessEntityID = e.BusinessEntityID
9 WHERE FirstName LIKE 'E%'
This version only accesses the tables we need, thereby generating half the reads and a much simpler
execution plan:
It is important to note that encapsulation is in no way a bad thing, but in the world of data, there are
dangers to over-encapsulating business logic within the database. Here are some basic guidelines to
help in avoiding performance problems resulting from the nesting of database objects:
When possible, avoid nesting views within views. This improves visibility into code and reduces
the chances of misunderstanding the contents of a given view.
Avoid nesting functions if possible. This can be confusing and lead to challenging performance
problems.
Avoid triggers that call stored procedures or that perform too much business logic. Nested
triggers are equally dangerous. Use caution when operations within triggers can fire more
triggers.
Understand the functionality of any defined objects (functions, triggers, views, stored procedures)
prior to use. This will avoid misunderstandings of their purpose.
Storing important and frequently used TSQL in stored procedures, views, or functions can be a great way
to increase maintainability via code reuse. Exercise caution and ensure that the complexity of
encapsulated objects does not become too high. Performance can be inadvertently impacted when
objects are nested many layers deep. When troubleshooting a problem query, always research the
objects involved so that you have full exposure to any views, functions, stored procedures, or triggers
that may also be involved in its execution.
Triggers
Triggers themselves are not bad, but overuse of them can certainly be a performance headache. Triggers
are placed on tables and can fire instead of, or after, inserts, updates, and/or deletes.
They become performance problems when there are too many of them. When
updating a table results in inserts, updates, or deletes against 10 other tables, tracking performance can
become very challenging as determining the specific code responsible can take time and lots of
searching.
Triggers often are used to implement business/application logic, but this is not what a relational
database is built or optimized for. In general, applications should manage as much of this as possible.
When not possible, consider using stored procedures as opposed to triggers.
The danger of triggers is that they become a part of the calling transaction. A single write operation can
easily become many and result in waits on other processes until all triggers have fired successfully.
To summarize some best practices:
Use triggers only when needed, and not as a convenience or time-saver.
Avoid triggers that call more triggers. These can lead to crippling amounts of IO or complex
query paths that are frustrating to debug.
Server trigger recursion should be turned off. This is the default. Allowing triggers to call
themselves, directly or indirectly, can lead to unstable situations or infinite loops.
Keep triggers simple and have them execute a single purpose.
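Recursive trigger behavior is a database-level setting (and OFF is its default), so it can be reviewed and set with T-SQL. The database name below is a placeholder:
-- Recursive (self-calling) triggers are controlled per database; OFF is the default.
ALTER DATABASE AdventureWorks SET RECURSIVE_TRIGGERS OFF;

-- Review the current setting for every database on the instance:
SELECT name, is_recursive_triggers_on
FROM sys.databases;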
Conclusion
Troubleshooting performance can be challenging, time-consuming, and frustrating. One of the best ways
to avoid these troubles is to build a database intelligently up-front and avoid the need to fix
things later.
By gathering information about an application and how it is used, we can make smart architecture
decisions that will make our database more scalable and perform better over time. The result will be
better performance and less need to waste time on troubleshooting broken things.
Query optimization techniques in SQL Server: the
basics
May 30, 2018 by Ed Pollack
Description
Fixing and preventing performance problems is critical to the success of any application. We will use a
variety of tools and best practices to provide a set of techniques that can be used to analyze and speed
up any performance problem!
This is one of my personal favorite areas of research and discussion as it is inherently satisfying. Taking a
performance nightmare and tuning it into something fast and sleek feels great and will undoubtedly
make others happy.
I often view optimization as a detective mystery. Something terrible has happened and you need to
follow clues to locate and apprehend the culprit! This series of articles is all about these clues, how to
identify them, and how to use them in order to find the root cause of a performance problem.
For more information about Query optimization, see the SQL Query Optimization — How to
Determine When and If It’s Needed article
Defining Optimization
What is “optimal”? The answer to this will also determine when we are done with a problem and can
move onto the next one. Often, a query can be sped up through many different means, each of which
has an associated time and resource cost.
We usually cannot spend the resources needed to make a script run as fast as possible, nor should we
want to. For the sake of simplicity, we will define “optimal” as the point at which a query performs
acceptably and will continue to do so for a reasonable amount of time in the future. This is as much a
business definition as it is a technical definition. With infinite money, time, and computing resources,
anything is possible, but we do not have the luxury of unlimited resources, and therefore must define
what “done” is whenever we chase any performance problem.
This provides us with several useful checkpoints that will force us to re-evaluate our progress as we
optimize:
1. The query now performs adequately.
2. The resources needed to optimize further are very expensive.
3. We have reached a point of diminishing returns for any further optimization.
4. A completely different solution is discovered that renders this unneeded.
Over-optimization sounds good, but in the context of resource management is generally wasteful. A
giant (but unnecessary) covering index will cost us computing resources whenever we write to a table for
the rest of eternity (a long time). A project to rewrite code that was already acceptable might cost days or
weeks of development and QA time. Trying to further tweak an already good query may net a gain of
3%, but take a week of sweating to get there.
Our goal is to solve a problem and not over-solve it.
Tools
To keep things simple, we’ll use only a handful of tools in this article:
Execution Plans
An execution plan provides a graphical representation of how the query optimizer chose to execute a
query:
The execution plan shows us which tables were accessed, how they were accessed, how they were joined
together, and any other operations that occurred along the way. Included are query costs, which are
estimates of the overall expense of any query component. A treasure trove of data is also included, such
as row size, CPU cost, I/O cost, and details on which indexes were utilized.
In general, what we are looking for are scenarios in which large numbers of rows are being processed by
any given operation within the execution plan. Once we have found a high cost component, we can
zoom in on what the cause is and how to resolve it.
STATISTICS IO
This allows us to see how many logical and physical reads are made when a query is executed and may
be turned on interactively in SQL Server Management Studio by running the following TSQL:
SET STATISTICS IO ON;
Once on, we will see additional data included in the Messages pane:
Logical reads tell us how many reads were made from the buffer cache. This is the number that we will
refer to whenever we talk about how many reads a query is responsible for, or how much IO it is causing.
Physical reads tell us how much data was read from a storage device as it was not yet present in memory.
This can be a useful indication of buffer cache/memory capacity problems if data is very frequently being
read from storage devices, rather than memory.
In general, IO will be the primary cause of latency and bottlenecks when analyzing slow queries. The unit
of measurement in STATISTICS IO is the read: 1 read = a single 8KB page = 8192 bytes.
Query Duration
Typically, the #1 reason we will research a slow query is because someone has complained and told us
that it is too slow. The time it takes a query to execute is often going to be the smoking gun that leads us
to a performance problem in need of a solution.
For our work here, we will measure duration manually using the timer found in the lower-right hand
corner of SSMS:
There are other ways to accurately measure query duration, such as turning on STATISTICS TIME, but we’ll
focus on queries that are slow enough that such a level of accuracy will not be necessary. We can easily
observe when a 30 second query is improved to run in sub-second time. This also reinforces the role of
the user as a constant source of feedback as we try to improve the speed of an application.
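If more precise timing is ever needed, here is a minimal sketch of the STATISTICS TIME approach mentioned above; the query shown is only a placeholder (it reuses a table that appears later in this article):

SET STATISTICS TIME ON;
-- Run the query being measured; CPU time and elapsed time appear in the Messages pane
SELECT COUNT(*) FROM Sales.SalesOrderHeader;
SET STATISTICS TIME OFF;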
Our Eyes
Many performance problems are the result of common query patterns that we will become familiar with
below. This pattern recognition allows us to short-circuit a great deal of research when we see something
that is clearly poorly written.
As we optimize more and more queries, quickly identifying these indicators becomes second nature, and
we’ll get the pleasure of being able to fix a problem quickly, without the need for very time-
consuming research.
In addition to common query mistakes, we will also look out for any business logic hints that may tell us
if there is an application problem, parameter issue, or some other flaw in how the query was generated
that may require involvement from others aside from us.
Index Scans
Data may be accessed from an index via either a scan or a seek. A seek is a targeted selection of rows
from the table based on a (typically) narrow filter. A scan is when an entire index is searched to return the
requested data. If a table contains a million rows, then a scan will need to traverse all million rows to
service the query. A seek of the same table can traverse the index’s binary tree quickly to return only the
data needed, without the need to inspect the entire table.
If there is a legitimate need to return a great deal of data from a table, then an index scan may be the
correct operation. If we needed to return 950,000 rows from a million row table, then an index scan
makes sense. If we only need to return 10 rows, then a seek would be far more efficient.
Index scans are easy to spot in execution plans:
1 SELECT
2 *
3 FROM Sales.OrderTracking
4 INNER JOIN Sales.SalesOrderHeader
5 ON SalesOrderHeader.SalesOrderID = OrderTracking.SalesOrderID
6 INNER JOIN Sales.SalesOrderDetail
7 ON SalesOrderDetail.SalesOrderID = SalesOrderHeader.SalesOrderID
8 WHERE OrderTracking.EventDateTime = '2014-05-29 00:00:00';
We can quickly spot the index scan in the top-right corner of the execution plan. Consuming 90% of the
resources of the query, and being labeled as a clustered index scan quickly lets us know what is going on
here. STATISTICS IO also shows us a large number of reads against the OrderTracking table:
Many solutions are available when we have identified an undesired index scan. Here is a quick list of
some thoughts to consider when resolving an index scan problem:
Is there any index that can handle the filter in the query?
o In this example, is there an index on EventDateTime?
If no index is available, should we create one to improve performance on the query?
o Is this query executed often enough to warrant this change? Indexes improve read speeds on
queries, but will reduce write speeds, so we should add them with caution.
Is this a valid filter? Is this column one that no one should ever filter on?
o Should we discuss this with those responsible for the app to determine a better way to search for
this data?
Is there some other query pattern that is causing the index scan that we can resolve? We will
attempt to more thoroughly answer this question below. If there is an index on the filter column
(EventDateTime in this example), then there may be some other shenanigans here that require
our attention!
Is the query one for which there is no way to avoid a scan?
o Some query filters are all-inclusive and need to search the entire table. In our demo above,
if EventDateTime happens to equal '2014-05-29' in every row in Sales.OrderTracking, then a scan is
expected. Similarly, if we were performing a fuzzy string search, an index scan would be difficult
to avoid without implementing a Full-Text Index, or some similar feature.
As we walk through more examples, we’ll find a wide variety of other ways to identify and resolve
undesired index scans.
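The next example looks at a filter that wraps the LEFT function around Person.LastName. As a hedged sketch of that anti-pattern (assuming a filter for last names starting with 'For', to match the rewrite shown shortly), the query looks something like this:

SELECT
    Person.BusinessEntityID,
    Person.FirstName,
    Person.LastName,
    Person.MiddleName
FROM Person.Person
WHERE LEFT(Person.LastName, 3) = 'For';  -- the function on the column forces a scan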
Despite only returning 4 rows, the entire index was scanned to return our data. The reason for this
behavior is the use of LEFT on Person.LastName. While our query is logically correct and will return the
data we want, SQL Server will need to evaluate LEFT against every row in the table before being able to
determine which rows fit the filter. This forces an index scan, but luckily one that can be avoided!
When faced with functions in the WHERE clause or in a join, consider ways to move the function onto the
scalar variable instead. Also think of ways to rewrite the query in such a way that the table columns can
be left clean (that is: no functions attached to them!)
The query above can be rewritten to do just this:
1 SELECT
2 Person.BusinessEntityID,
3 Person.FirstName,
4 Person.LastName,
5 Person.MiddleName
6 FROM Person.Person
7 WHERE Person.LastName LIKE 'For%';
By using LIKE and shifting the wildcard logic into the string literal, we have cleaned up
the LastName column, which will allow SQL Server full access to seek indexes against it. Here is the
performance we see on the rewritten version:
The relatively minor query tweak we made allowed the query optimizer to utilize an index seek and pull
the data we wanted with only 2 logical reads, instead of 117.
The theme of this optimization technique is to ensure that columns are left clean! When writing queries,
feel free to put complex string/date/numeric logic onto scalar variables or parameters, but not on
columns. If you are troubleshooting a poorly performing query and notice functions (system or user-
defined) wrapped around column names, then begin thinking of ways to push those functions off into
other scalar parts of the query. This will allow SQL Server to seek indexes, rather than scan, and therefore
make the most efficient decisions possible when executing the query!
Implicit Conversions
Earlier, we demonstrated how wrapping functions around columns can result in unintended table scans,
reducing query performance and increasing latency. Implicit conversions behave the exact same way but
are far more hidden from plain sight.
When SQL Server compares any values, it needs to reconcile data types. All data types are assigned a
precedence in SQL Server and whichever is of the lower precedence will be automatically converted to
the data type of higher precedence. For more info on data type precedence, see the link at the end of this
article containing the complete list.
Some conversions can occur seamlessly, without any performance impact. For example, a VARCHAR(50)
and a VARCHAR(MAX) can be compared without a problem, as can a TINYINT and a BIGINT, a DATE and a
DATETIME, or a TIME and a VARCHAR representation of a TIME value. Not all data types can be compared
automatically, though.
Consider the following SELECT query, which is filtered against an indexed column:
1 SELECT
2 EMP.BusinessEntityID,
3 EMP.LoginID,
4 EMP.JobTitle
5 FROM HumanResources.Employee EMP
6 WHERE EMP.NationalIDNumber = 658797903;
A quick glance and we assume that this query will result in an index seek and return data to us quite
efficiently. Here is the resulting performance:
Despite only looking for a single row against an indexed column, we got a table scan for our efforts.
What happened? We get a hint from the execution plan in the yellow exclamation mark over the SELECT
operation:
Hovering over the operator reveals a CONVERT_IMPLICIT warning. Whenever we see this, it is an
indication that we are comparing two data types that are different enough from each other that they
cannot be compared as-is. Instead, SQL Server implicitly converts every single value in the table column
prior to applying the filter.
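A minimal sketch of the fix, assuming NationalIDNumber is stored as an NVARCHAR column: compare it against a string literal so that any conversion happens on the literal rather than on the column:

SELECT
    EMP.BusinessEntityID,
    EMP.LoginID,
    EMP.JobTitle
FROM HumanResources.Employee EMP
WHERE EMP.NationalIDNumber = N'658797903';  -- string literal: no CONVERT_IMPLICIT on the column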
Conclusion
Query optimization is a huge topic that can easily become overwhelming without a good dose of focus.
The best way to approach a performance problem is to find specific areas of focus that are most likely
the cause of latency. A stored procedure could be 10,000 lines long, but only a single line needs to be
addressed to resolve the problem. In these scenarios, finding the suspicious, high-cost, high resource-
consuming parts of a script can quickly narrow down the search and allow us to solve a problem rather
than hunt for it.
This article explores the SQL divide by zero error and various methods for eliminating this.
Introduction
We all know that, in math, it is not possible to divide a number by zero. It leads to infinity:
Source: www.1dividedby0.com
If you try it on a calculator, you also get an error message – Cannot divide by zero:
We perform data calculations in SQL Server for various considerations. Suppose we perform an
arithmetic division operator for calculating a ratio of products in a store. Usually, the division works fine,
and we get the ratio:
1 DECLARE @Product1 INT;
2 DECLARE @Product2 INT;
3 SET @Product1 = 50;
4 SET @Product2 = 10;
5 SELECT @Product1 / @Product2 ProductRatio;
Someday, product2 goes out of stock, which means we do not have any quantity for it. Let’s see how the
SQL Server query behaves in this case:
1 DECLARE @Product1 INT;
2 DECLARE @Product2 INT;
3 SET @Product1 = 50;
4 SET @Product2 = 0;
5 SELECT @Product1 / @Product2 ProductRatio;
We get SQL divide by zero error messages (message id 8134, level 16):
We do not want our code to fail due to these errors. It is a best practice to write code in such a way that
it does not give divide by zero message. It should have a mechanism to deal proactively with such
conditions.
SQL Server provides multiple methods for avoiding this error message. Let’s explore it in the next section.
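The first of these methods is the NULLIF function. As a quick, hedged sketch of its basic behavior: when both arguments are equal, it returns NULL:

-- NULLIF(expression1, expression2) returns NULL when both arguments are equal
SELECT NULLIF(10, 10) result;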
If the two arguments are not equal, it returns the value of the first argument.
In this example, the argument values differ, so it returns the value of the first argument, 10:
1 SELECT NULLIF(10, 5) result;
Let’s modify our initial query using the SQL NULLIF statement. We place the following logic using NULLIF
function for eliminating SQL divide by zero error:
Use NULLIF function in the denominator with second argument value zero
If the value of the first argument is also zero, this function returns a null value. In SQL Server, if
we divide a number with null, the output is null as well
If the value of the first argument is not zero, it returns the first argument value and division takes
place as standard values
1 DECLARE @Product1 INT;
2 DECLARE @Product2 INT;
3 SET @Product1 = 50;
4 SET @Product2 = 0;
5 SELECT @Product1 / NULLIF(@Product2,0) ProductRatio;
Execute this modified query. We can see that the output is NULL because the denominator contains the value zero.
Do we want null value in the output? Is there any method to avoid null and display a definite value?
Yes, we can use SQL ISNULL function to avoid null values in the output. This function replaces the null
value in the expression1 and returns expression2 value as output.
Let’s explore the following query with a combination of SQL NULLIF and SQL ISNULL function:
The first argument (@Product1 / NULLIF(@Product2, 0)) returns null
We use the ISNULL function and specify zero as the second argument. As the first argument is null,
the output of the overall query is zero (the second argument value)
1 DECLARE @Product1 INT;
2 DECLARE @Product2 INT;
3 SET @Product1 = 50;
4 SET @Product2 = 0;
5 SELECT ISNULL(@Product1 / NULLIF(@Product2,0),0) ProductRatio;
With ARITHABORT OFF, the batch does not terminate; instead, the division returns a null value. We need to use
ARITHABORT OFF in combination with SET ANSI_WARNINGS OFF to avoid the error message:
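A minimal sketch of that combination, reusing the product variables from the earlier examples:

SET ARITHABORT OFF;
SET ANSI_WARNINGS OFF;

DECLARE @Product1 INT = 50;
DECLARE @Product2 INT = 0;
-- With both settings OFF, the division returns NULL instead of raising error 8134
SELECT @Product1 / @Product2 AS ProductRatio;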
We can use the following query to check the current setting for the ARITHABORT parameter:
1 DECLARE @ARITHABORT VARCHAR(3) = 'OFF';
2 IF ( (64 & @@OPTIONS) = 64 ) SET @ARITHABORT = 'ON';
3 SELECT @ARITHABORT AS ARITHABORT;
The default ARITHABORT setting for SSMS is ON. We can view it using SSMS Tools properties.
Navigate to Tools -> Options -> Advanced:
Many client applications or drivers set ARITHABORT to OFF by default. The different
values might force SQL Server to produce a different execution plan, and that might create
performance issues. You should also match the setting of the client application while
troubleshooting performance issues.
Note: You should not modify the value of ARITHABORT unless required. It might create
performance issues, as well. I would suggest using alternative methods (as described earlier) for
avoiding the SQL divide by zero error.
Modern_Spanish is the collation; CI means case-insensitive and CS means case-sensitive, while AS means
accent-sensitive and AI means accent-insensitive.
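One way to check the server collation (an alternative to the sp_helpsort output shown next) is the SERVERPROPERTY function:

SELECT SERVERPROPERTY('Collation') AS ServerCollation;
GO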
You can also check the information, with the sp_helpsort procedure:
sp_helpsort
GO
The information displayed is the following:
Modern-Spanish, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive
e. How to check the SQL Server edition in SQL
You can check the SQL Server Edition, using the following T-SQL sentences:
SELECT SERVERPROPERTY('EDITION')
GO
The result is the following:
If the result is 0, it means that both authentications are enabled. If it is 1, only Windows
Authentication is enabled.
g. How to list the variables set
In order to list all the variables set, run the following command in sqlcmd:
:ListVar
It will show all the variables set:
2. Running sqlcmd in command mode
You can run sqlcmd as commands. You can run scripts in command mode.
a. How to run a T-SQL script and receive the output in a file in sqlcmd
In the next example, we will show how to run a script using sqlcmd and show the results in
another file.
We will first create a script file named columns.sql with the following sentences:
select * from adventureworks2014.information_schema.columns
In the cmd, run the following command to invoke sqlcmd:
sqlcmd -S DESKTOP-5K4TURF\SQLEXPRESS -E -i c:\sql\columns.sql -o c:\sql\exit.txt
-i is used to specify the input. You specify the script file with the queries.
-o is used to show the results of the input in a file.
The exit.txt file will be created:
The commands will create a backup in a file named backup.sql in the c:\sql folder:
However, if you add -m 16, the error will no longer be displayed because the error has a severity
level of 15:
sqlcmd -E -q "create table adventureworks" -m 16 -S DESKTOP-5K4TURF\SQLEXPRESS
-m 16 shows only errors with a severity level of 16 or higher. As you can see, the error message is no longer
displayed.
i. How to accept user input
The following example will run a SQL script with one variable. The example will create a database
specified by the user.
We will first create a script named createdb.sql with the following content:
--file createdb.sql
CREATE DATABASE $(DATABASENAME);
GO
Next, in the cmd we will run the database specifying the database name:
sqlcmd -E -v DATABASENAME="Userinput" -i c:\sql\createdb.sql
The command will create a database named Userinput.
In sqlcmd you can run the sp_databases stored procedure:
sp_databases
GO
And you will be able to see the database created:
Note that if you have SSMS 17 or later, SQL PowerShell is installed separately. For more
information about installing SQL PowerShell, refer to our link:
What is new in SSMS 17; PowerShell and DAX
b. How to run scripts in SQL PowerShell (check table fragmentation)
It is possible to run SQL Server scripts with PowerShell. The following example will show the
fragmentation of the table Person.Address in the AdventureWorks database.
We will first create a script named fragmentation.sql:
DECLARE @db_id SMALLINT = DB_ID('AdventureWorks');
DECLARE @object_id INT = OBJECT_ID(N'AdventureWorks.Person.Address');

SELECT * FROM sys.dm_db_index_physical_stats(@db_id, @object_id, NULL, NULL, 'LIMITED');
GO
In PowerShell for SQL Server, run this script:
Invoke-Sqlcmd -InputFile "c:\sql\fragmentation.sql" | Out-File -FilePath "C:\sql\outps.txt"
The output of the outps.txt file will be the following:
5. DAC
a. How to work with a Dedicated Administrator Connection (DAC) in sqlcmd
If SQL Server fails to connect in SSMS or other tools, it is possible to try a DAC connection. This
connection allows you to diagnose and verify problems on the database server. When SQL Server is
corrupt and it is not possible to connect to it normally, a DAC connection usually works.
The following example shows how to connect to a SQL Server database:
sqlcmd -S DESKTOP-5K4TURF -E -A -d master
-A is used to specify a DAC connection and -d is used to specify the database to connect.
A DAC connection requires the SQL Browser service to be started and enabled. To enable the SQL
Browser service, if it is disabled, you can use the following commands:
sc config sqlbrowser start= demand
If it is enabled, the message will be the following:
To start the service, you can use the following commands:
net start sqlbrowser
Conclusion
Sqlcmd is a very powerful feature that can help us to automate tasks in SQL Server. You can run
scripts and save the results of your queries in a text file.
A quick Select statement indicates that the record has been successfully inserted:
However, if we call the above-stored procedure one more time, passing the same parameters, the results
grid will be populated differently:
Here we have all the information we set previously to be logged, only this time we also got the
procedure field filled out and of course the SQL Server “friendly” technical message that we have a
violation:
Violation of PRIMARY KEY constraint 'PK_Sales_1'. Cannot insert duplicate key in object 'Sales.Sales'. The
duplicate key value is (20).
Now, this was a very artificial example, but the point is that in the real world, passing invalid data is
very common. For example, passing an employee ID that doesn’t exist when we have a foreign
key set up between the Sales table and the Employee table, meaning the Employee must exist in order to
create a new record in the Sales table. This use case will cause a foreign key constraint violation.
The general idea behind this is not to let the error fizzle out. We at least want to report to an individual
that something went wrong and then also log it under the hood. In the real world, if there was an
application relying on a stored procedure, developers would probably have SQL Server error handling
coded somewhere as well because they would have known when an error occurred. This is also where it
would be a clever idea to raise an error back to the user/application. This can be done by adding the
RAISERROR function so we can throw our own version of the error.
For example, if we know that entering an employee ID that doesn’t exist is more likely to occur, then we
can do a lookup. This lookup can check if the employee ID exists and if it doesn’t, then throw the exact
error that occurred. Or in the worst-case scenario, if we had an unexpected error that we had no idea
what it was, then we can just pass back what it was.
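A hedged sketch of that kind of lookup inside the stored procedure (the table and variable names here are assumptions, not the article's exact code):

IF NOT EXISTS (SELECT 1 FROM dbo.Employee WHERE EmployeeID = @EmployeeID)
BEGIN
    -- Throw our own, friendlier version of the error before the INSERT is attempted
    RAISERROR('Employee with ID %d does not exist.', 16, 1, @EmployeeID);
    RETURN;
END;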
Now, if we execute the same stored procedure providing, e.g., an invalid EmployeeID, we’ll get the same
errors as before generated inside our table:
The way we can tell that this wasn’t inserted is by executing a simple Select query, selecting everything
from the Sales table where EmployeeID is 20:
Another thing worth mentioning is that we can actually predefine this error message code, severity, and
state. There is a stored procedure called sp_addmessage that is used to add our own error messages.
This is useful when we need to call the message on multiple places; we can just use RAISERROR and pass
the message number rather than retyping the stuff all over again. By executing the selected code from
below, we then added this error into SQL Server:
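A sketch of what such an sp_addmessage call might look like; the message number and text here are hypothetical, not the exact ones used in the article:

EXEC sp_addmessage
     @msgnum   = 50001,   -- user-defined message numbers start at 50001
     @severity = 16,
     @msgtext  = N'Employee with ID %d does not exist in the Employee table.';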
This means that now rather than doing it the way we did previously, we can just call the RAISERROR and
pass in the error number and here’s what it looks like:
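With the message registered, the call can be reduced to something like the following (again, the message number is hypothetical):

RAISERROR(50001, 16, 1, @EmployeeID);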
The sp_dropmessage is, of course, used to drop a specified user-defined error message. We can also view
all the messages in SQL Server by executing the query from below:
1 SELECT * FROM master.dbo.sysmessages
There are a lot of them, and you can see our custom RAISERROR message at the very top.
I hope this article has been informative for you and I thank you for reading.
In this article, you’ll learn the key skills that you need to copy tables between SQL Server instances
including both on-premises and cloud SQL databases. In this article, I’ll walk-through several ways of
copying a table(s) between SQL databases, helping you to see the benefits and trade-offs of each option.
Introduction
Before we begin the article, though, let’s go over its objectives. We then move on to an
overview of each method. In this guide, we briefly discuss several of SQL Server’s
available built-in options, and also show you a few PowerShell and 3rd party tools that can be used to copy
SQL tables between databases and between instances as well. At the beginning of each method,
I’ve given you enough information to follow the modules that come after it. We follow this introduction with
several modules, each of which is dedicated to a specific method.
Objectives:
1. Introduction
2. Discuss various methods to copy tables
Using .Net class library to copy tables with PowerShell
Using Import-and-Export Wizard
Using sqlpackage.exe – Extract and Publish method
Using the Generate Scripts wizard in SSMS (SQL Server Management Studio)
Using INSERT INTO SQL statement
3. And more…
Get started
In SQL Server, copying tables between databases of the same SQL instance is relatively easier than
copying data between remote servers. To minimize the workload on the production database, it
is always recommended to restore the database from a backup to a new database and then use the
best method to copy the data to the target database. Again, this depends on the number of tables, their size,
and the available space. If the size of the table(s) is more than 50% of the total size of the database, then the
backup-and-restore method is the recommended option.
In some cases, you might have to copy a few very large tables, and you may then end up moving them
to separate filegroups and performing a partial backup-and-restore method to copy
the data. You can refer to the article Database Filegroup(s) and Piecemeal restores in SQL Server for more
information.
You can also use third-party tools to perform an object level restore from a backup file.
PowerShell script
The following PoSH script creates a function named Get-SQLTable. The function has several mandatory
parameters.
function Get-SQLTable
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true)]
        [string] $SourceSQLInstance,

        [Parameter(Mandatory=$true)]
        [string] $SourceDatabase,

        [Parameter(Mandatory=$true)]
        [string] $TargetSQLInstance,

        [Parameter(Mandatory=$true)]
        [string] $TargetDatabase,

        [Parameter(Mandatory=$true)]
        [string[]] $Tables,

        [Parameter(Mandatory=$false)]
        [int] $BulkCopyBatchSize = 10000,

        [Parameter(Mandatory=$false)]
        [int] $BulkCopyTimeout = 600
    )

    $sourceConnStr = "Data Source=$SourceSQLInstance;Initial Catalog=$SourceDatabase;Integrated Security=True;"
    $TargetConnStr = "Data Source=$TargetSQLInstance;Initial Catalog=$TargetDatabase;Integrated Security=True;"

    try
    {
        Import-Module -Name SQLServer
        write-host 'module loaded'
        $sourceSQLServer = New-Object Microsoft.SqlServer.Management.Smo.Server $SourceSQLInstance
        $sourceDB = $sourceSQLServer.Databases[$SourceDatabase]
        $sourceConn = New-Object System.Data.SqlClient.SQLConnection($sourceConnStr)
        $sourceConn.Open()

        foreach($table in $sourceDB.Tables)
        {
            $tableName = $table.Name
            $schemaName = $table.Schema
            $tableAndSchema = "$schemaName.$tableName"

            if ($Tables.Contains($tableAndSchema))
            {
                # Script out the table definition and create it on the target database
                $Tablescript = ($table.Script() | Out-String)
                $Tablescript

                Invoke-Sqlcmd `
                    -ServerInstance $TargetSQLInstance `
                    -Database $TargetDatabase `
                    -Query $Tablescript

                # Stream the rows from the source table into the target table using SqlBulkCopy
                $sql = "SELECT * FROM $tableAndSchema"
                $sqlCommand = New-Object System.Data.SqlClient.SqlCommand($sql, $sourceConn)
                [System.Data.SqlClient.SqlDataReader] $sqlReader = $sqlCommand.ExecuteReader()
                $bulkCopy = New-Object Data.SqlClient.SqlBulkCopy($TargetConnStr, [System.Data.SqlClient.SqlBulkCopyOptions]::KeepIdentity)
                $bulkCopy.DestinationTableName = $tableAndSchema
                $bulkCopy.BulkCopyTimeout = $BulkCopyTimeout
                $bulkCopy.BatchSize = $BulkCopyBatchSize
                $bulkCopy.WriteToServer($sqlReader)
                $sqlReader.Close()
                $bulkCopy.Close()
            }
        }

        $sourceConn.Close()
    }
    catch
    {
        [Exception]$ex = $_.Exception
        write-host $ex.Message
    }
    finally
    {
        #Return value if any
    }
}
The $tables array variable is used to assign the list of the table(s) to be copied to the target database:
1 [string[]] $tables = @('dbo.OPERATION','dbo.OPERATION_DETAIL')
Let us invoke the Get-SQLTable function with the below mentioned parameters to copy the tables from
the AdventureWorks2016 database on the hqdbt01 instance to the AdventureWorks2012 database on the
hqdbt01\sql2017 instance.
Get-SQLTable -SourceSQLInstance hqdbt01 -SourceDatabase AdventureWorks2016 -TargetSQLInstance hqdbt01\sql2017 -TargetDatabase AdventureWorks2012 -Tables $tables -BulkCopyBatchSize 5000
The output shows the tables OPERATION and OPERATION_DETAIL copied to the target instance.
Now, for the destination selection, again pick the SQL provider, Server name and Database from the
drop-down lists rather than typing them, and we’ll click Next
In Select Source Tables and Views, select the objects to copy to the destination or you could write
a query. But here we’re just going to copy the data. In this case, let’s bring in the dbo.Cities and
Person.Address.
Click Next
We’re ready to run the copy job. Let us choose Run immediately and Click Next
We can see a summary of the action that we are going to perform using the wizard
Click Finish to execute the job steps.
After successful execution of the job, we can validate and review the output.
You can refer to the article SqlPackage.exe – Automate SQL Server Database Restoration using bacpac with
PowerShell or Batch techniques for more information
5. In Set Scripting Options, select the output type and click the Advanced button. In this case, the
output type is redirected to a new query window.
6. In the Advanced Scripting Options, select “Schema and Data” from the drop-down list and Click
OK.
7. Next, the Summary page details the outlines of the entire process. Click Next
8. Now, Save or Publish Scripts page shows the progress of the entire process. You can monitor
the status of the entire schema and data generation process.
Summary
So far, we’ve discussed various methods to copy tables across SQL Server databases. It is evident that
restoring a couple of tables from a backup can be a time- and space-consuming process. It is up to your
environment which of the aforementioned steps you follow to copy tables in SQL Server. There is no
standard or recommended way to copy a table between databases, but there are many possible
approaches that you can use to fit your needs.
If you try to take Transaction Log backup for a database that is configured with the Simple recovery
model, the backup operation will fail with the error message below:
In addition, the Transaction Log backup requires that at least one Full backup is taken from that database
as a start point for the new backup chain. If you try to take a Transaction Log backup from a database
with no Full backup taken previously, the backup operation will fail with the error message below:
Let’s take a Full backup for the database to be able to take Transaction Log backup for that database. We
will use the BACKUP DATABASE T-SQL command to perform the database Full backup operation in our
example here. For more information about the different ways and options for performing database
backups in SQL Server, check the SQL Server Backup and Restore Series. The Full backup of the database
can be taken using the T-SQL script below:
1 BACKUP DATABASE [TSQL]
2 TO DISK = N'C:\Ahmad Yaseen\TSQL.bak' WITH NOFORMAT, NOINIT,
3 NAME = N'TSQL-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
4 GO
Once the database Full backup is performed, we will start taking the Transaction Log backups for the
database. The first Transaction Log backup will take a backup for all the transactions that occurred in the
database since the last Full backup. The Transaction Log backup can be taken using the BACKUP LOG T-
SQL command below:
1 BACKUP LOG [TSQL]
2 TO DISK = N'C:\Ahmad Yaseen\TSQL_2.TRN' WITH NOFORMAT, NOINIT,
3 NAME = N'TSQL-TRN Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION, STATS = 10
4 GO
On the other hand, the Transaction Log backups that follow the first Transaction Log backup will back up
all transactions that occurred in the database since the point where the last Transaction Log backup
stopped. The Full backup and all following Transaction Log backups, until a new Full backup is
taken, are called the backup chain. This backup chain is important to recover the database to a specific point
in time, in the case of any mistakenly performed change or database corruption. The frequency of the
Transaction Log backup depends on how important your data is, the size of the database and what type
of workload this database serves. In the heavily transactional databases, it is recommended to increase
the frequency of the Transaction Log backup, in order to minimize the data loss and truncate the
Transaction Logs to make it available for reuse.
If the database is damaged, it is recommended to create a tail-log backup to enable you to restore the
database to the current point in time. A tail-log backup is used to capture all log records that have not
yet been backed up. This will help in preventing any data loss and to keep the log chain complete.
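A hedged sketch of such a tail-log backup, reusing the database and folder from the earlier backup examples (the file name is illustrative):

BACKUP LOG [TSQL]
TO DISK = N'C:\Ahmad Yaseen\TSQL_tail.TRN'
WITH NO_TRUNCATE,    -- allows the backup even if the database is damaged
     NORECOVERY      -- leaves the database in the RESTORING state, ready for a point-in-time restore
GO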
Assume that you have executed the below DELETE statement by mistake without providing the WHERE
clause. This means that all table records will be deleted:
If you have designed a proper backup solution, the data can be easily recovered by restoring the
database back to the specific point in time before executing the DELETE statement. From the Restore
Database window, the SQL Server will return the complete backup chain that is taken from that database.
If you know the exact file that is taken directly before the data deletion, you can stop at that specific file,
as shown below:
But if you are aware of the exact time of executing the DELETE statement, you can restore the database
back to that specific point in time before the DELETE statement execution, without the need to know
which Transaction Log file contains that point in time. This can be achieved by clicking on
the Timeline option, and specify the time, as shown below:
Transaction Log Truncate
SQL Server Transaction Log truncation is the process in which all VLFs that are marked as inactive will be
deleted from the SQL Server Transaction Log file and become available for reuse. If there is a
single active log record in a VLF, the overall VLF will be considered as active log and cannot be
truncated.
The SQL Server Transaction Log, for the database that is configured with the Simple recovery model, can
be truncated automatically if:
A Checkpoint operation is triggered
The database transaction is committed
The SQL Server Transaction Log, for the database that is configured with the Full or Bulk-
Logged recovery model, can be truncated automatically:
After a Transaction Log backup is performed, provided the Transaction Log is not waiting on an
active transaction or on any high availability feature, such as Mirroring, Replication or an Always On
Availability Group
Change the database recovery model to Simple
For example, if we change the recovery model of the below database to Simple and perform a
Checkpoint directly, the Transaction log will be truncated automatically and will be available for
reuse as shown below:
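A minimal sketch of those two steps (the database name here is an assumption, borrowed from the shrink example later in this article):

USE [AdventureWorks2016CTP3];
GO
ALTER DATABASE [AdventureWorks2016CTP3] SET RECOVERY SIMPLE;
GO
CHECKPOINT;  -- under the Simple recovery model, the checkpoint allows the inactive log to be truncated
GO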
The TRUNCATE_ONLY Transaction Log backup option, which breaks the database backup chain and
truncates the available Transaction Logs (available only prior to SQL Server 2008).
If you try to truncate the Transaction Log of the database using the TRUNCATE_ONLY option in a
SQL Server instance on version 2008 and later, the statement will fail with the error message
below:
In the Shrink File page, change the File Type to Log, and choose the Transaction Log file that you
want to shrink. On this page, you have three options:
Release unused space in the Transaction Log file to the operating system and shrinks the file to
the last allocated extent. This reduces the file size without moving any data
Release unused space in the Transaction Log file to the operating system and tries to relocate
rows to unallocated pages. Here, a value should be specified
Moves all data from the specified file to other files in the same filegroup, in order to delete the
empty file later
The same Transaction Log file can be shrunk using the DBCC SHRINKFILE T-SQL statement below:
1 USE [AdventureWorks2016CTP3]
2 GO
3 DBCC SHRINKFILE (N'AdventureWorks2016CTP3_Log' , 0, TRUNCATEONLY)
4 GO
Shrinking the Transaction Log file to a size smaller than the size of the Virtual Log File is not possible,
even if this space is not used. This is due to the fact that the Transaction Log file can be shrunk only to
the boundary of the VLF. In this case, the SQL Server Database Engine will free as much space as possible,
and then issues an informational message, as shown below:
In the next article of this series, we will discuss the best practices that should be applied to the
transaction log in order to get the optimal performance from it. Stay tuned!
SQL Lag function overview and examples
In the article SQL Server Lead function overview and examples, we explored Lead function for performing
computational operations on data. This article gives an overview of the SQL Lag function and its
comparison with the SQL Lead function.
We can use a compatible data type for the default value argument. If we use an incompatible data type, we get
the following error message:
In the following query, we use SQL Server Lag function and view the output:
1 SELECT [Year],
2 [Quarter],
3 Sales,
4 LAG(Sales, 1, 0) OVER(
5 ORDER BY [Year],
6 [Quarter] ASC) AS [NextQuarterSales]
7 FROM dbo.ProductSales;
In the output, the lag function considers all rows as a single data set and applies Lag function:
In the ProductSales table, we have data for the years of 2017, 2018 and 2019. We want to use a lag
function on a yearly basis. We use the PARTITION BY clause on the Year column and define the logical
subset of data on a yearly basis. We use the ORDER BY clause on the year and quarter columns to sort data
first by year and then by quarter:
1 SELECT [Year],
2 [Quarter],
3 Sales,
4 LAG(Sales, 1, 0) OVER(PARTITION BY [Year]
5 ORDER BY [Year],
6 [Quarter] ASC) AS [NextQuarterSales]
7 FROM dbo.ProductSales;
In the following screenshot, we can see three partitions of data, one for each of the years 2017, 2018 and 2019.
The Lag function works on each partition individually and calculates the required data:
Conclusion
In this article, we learned the SQL Lag function and its usage to retrieve a value from previous rows. Here
is the quick summary of the lag function:
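As a brief, hedged recap of the general form used in the examples above:

LAG(scalar_expression, offset, default)
    OVER ( [ PARTITION BY partition_expression ] ORDER BY sort_expression )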
Click on Close and save the table in the designer. Click Yes in the warning message window.
Once you click on Yes, a foreign key with delete rule is created. Similarly, we can create a foreign key
with UPDATE CASCADE rule by selecting CASCADE as an action for the update rule in INSERT and
UPDATE specifications.
Using T-SQL:
Please refer to the below T-SQL script which creates a parent, child table and a foreign key on the child
table with DELETE CASCADE rule.
1 CREATE TABLE Countries
2
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6
7
8 CREATE TABLE States
9
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14
15
16
17 ALTER TABLE [dbo].[States] WITH CHECK ADD CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
18 REFERENCES [dbo].[Countries] ([CountryID])
19 ON DELETE CASCADE
20 GO
21
22 ALTER TABLE [dbo].[States] CHECK CONSTRAINT [FK_States_Countries]
23 GO
Insert some sample data using below T-SQL script.
1 INSERT INTO Countries VALUES (1,'United States','USA')
2
3 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
4
5 INSERT INTO States VALUES (1,'Texas','TX',1)
6 INSERT INTO States VALUES (2,'Arizona','AZ',1)
Now I delete a row in the parent table with CountryID = 1, which also deletes the rows in the child table
that have CountryID = 1.
Please refer to the below T-SQL script to create a foreign key with UPDATE CASCADE rule.
1 CREATE TABLE Countries
2
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6
7
8 CREATE TABLE States
9
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14
15 GO
16
17 INSERT INTO Countries VALUES (1,'United States','USA')
18
19 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
20
21 INSERT INTO States VALUES (1,'Texas','TX',1)
22 INSERT INTO States VALUES (2,'Arizona','AZ',1)
23
24 GO
25
26
27 ALTER TABLE [dbo].[States] WITH CHECK ADD CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
28 REFERENCES [dbo].[Countries] ([CountryID])
29 ON UPDATE CASCADE
30 GO
31
32 ALTER TABLE [dbo].[States] CHECK CONSTRAINT [FK_States_Countries]
33 GO
Now update CountryID in the Countries for a row which also updates the referencing rows in the child
table States.
1 UPDATE Countries SET CountryID =3 where CountryID=1
Following is the T-SQL script which creates a foreign key with cascade as UPDATE and DELETE rules.
1 ALTER TABLE [dbo].[States] WITH CHECK ADD CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
2 REFERENCES [dbo].[Countries] ([CountryID])
3 ON UPDATE CASCADE
4 ON DELETE CASCADE
5 GO
6
7 ALTER TABLE [dbo].[States] CHECK CONSTRAINT [FK_States_Countries]
8 GO
To know the update and delete actions in the foreign key, query sys.foreign_keys view. Replace the
constraint name in the script.
SELECT name, delete_referential_action, delete_referential_action_desc, update_referential_action, update_referential_action_desc
FROM sys.foreign_keys WHERE name = 'FK_States_Countries'
The below image shows that a DELETE CASCADE action and no UPDATE action is defined on the foreign
key.
Let’s move forward and check the behavior of the delete and update rules of foreign keys on a child table
which acts as a parent table to another child table. The below example demonstrates this scenario.
In this case, “Countries” is the parent table of the “States” table and the “States” table is the parent table
of Cities table.
We will create a foreign key now with cascade as delete rule on States table which references to
CountryID in parent table Countries.
1 CREATE TABLE Countries
2
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6
7
8 CREATE TABLE States
9
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14
15 GO
16
17
18 CREATE TABLE Cities
19 (CityID INT,
20 CityName varchar(50),
21 StateID INT)
22 GO
23
24 INSERT INTO Countries VALUES (1,'United States','USA')
25
26 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
27
28 INSERT INTO States VALUES (1,'Texas','TX',1)
29 INSERT INTO States VALUES (2,'Arizona','AZ',1)
30
31 INSERT INTO Cities VALUES(1,'Texas City',1)
32 INSERT INTO Cities values (1,'Phoenix',2)
33
34 GO
35
36
37 ALTER TABLE [dbo].[States] WITH CHECK ADD CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
38 REFERENCES [dbo].[Countries] ([CountryID])
39 ON DELETE CASCADE
40 GO
Now on the Cities table, create a foreign key without a DELETE CASCADE rule.
1 ALTER TABLE [dbo].[Cities] WITH CHECK ADD CONSTRAINT [FK_Cities_States] FOREIGN KEY([StateID])
2 REFERENCES [dbo].[States] ([StateID])
3 GO
If we try to delete a record with CountryID = 1, it will throw an error, as the delete on the parent table “Countries”
tries to delete the referencing rows in the child table States. But on the Cities table, we have a foreign key
constraint with no action for delete, and the referenced value still exists in that table.
1 DELETE FROM Countries where CountryID =1
The delete fails at the second foreign key.
When we create the second foreign key with CASCADE as the delete rule, the above delete command runs
successfully, deleting records in the child table “States”, which in turn deletes records in the second
child table “Cities”.
1 CREATE TABLE Countries
2
3 (CountryID INT PRIMARY KEY,
4 CountryName VARCHAR(50),
5 CountryCode VARCHAR(3))
6
7
8 CREATE TABLE States
9
10 (StateID INT PRIMARY KEY,
11 StateName VARCHAR(50),
12 StateCode VARCHAR(3),
13 CountryID INT)
14
15 GO
16
17
18 CREATE TABLE Cities
19 (CityID INT,
20 CityName varchar(50),
21 StateID INT)
22 GO
23
24 INSERT INTO Countries VALUES (1,'United States','USA')
25
26 INSERT INTO Countries VALUES (2,'United Kingdom','UK')
27
28 INSERT INTO States VALUES (1,'Texas','TX',1)
29 INSERT INTO States VALUES (2,'Arizona','AZ',1)
30
31 INSERT INTO Cities VALUES(1,'Texas City',1)
32 INSERT INTO Cities values (1,'Phoenix',2)
33
34 GO
35
36
37 ALTER TABLE [dbo].[States] WITH CHECK ADD CONSTRAINT [FK_States_Countries] FOREIGN KEY([CountryID])
38 REFERENCES [dbo].[Countries] ([CountryID])
39 ON DELETE CASCADE
40 GO
41
42
43 ALTER TABLE [dbo].[Cities] WITH CHECK ADD CONSTRAINT [FK_Cities_States] FOREIGN KEY([StateID])
44 REFERENCES [dbo].[States] ([StateID])
45 ON DELETE CASCADE
46 GO
47
48 DELETE FROM Countries where CountryID =1
Similarly, we cannot create INSTEAD OF DELETE trigger on the table when a foreign key CASCADE
DELETE rule already exists on the table.
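A hedged illustration of that restriction, using the States table from the examples above (the trigger name is hypothetical): attempting the following while FK_States_Countries has an ON DELETE CASCADE rule fails with an error stating that the table has a foreign key with cascading DELETE.

-- Fails while dbo.States is referenced by a foreign key with ON DELETE CASCADE
CREATE TRIGGER TR_States_InsteadOfDelete ON dbo.States
INSTEAD OF DELETE
AS
BEGIN
    PRINT 'Delete intercepted';
END
GO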
This article covers the SQL INSERT INTO SELECT statement along with its syntax, examples, and use cases.
In my earlier article SQL SELECT INTO Statement, we explored the following tasks.
Create a SQL table on the fly while inserting records with appropriate data types
Use SQL SELECT INTO to insert records in a particular FileGroup
We cannot use it to insert data in an existing table
In this example, we inserted records for all columns to the Customers table.
Example 2: Insert rows from source to destination table by specifying column
names
Let’s drop the existing Customers table before we move forward. Now, we want to create a table with
one additional IDENTITY column. The IDENTITY column automatically inserts identity values in a table. We
also added a City column that allows NULL values:
1 CREATE TABLE Customers
2 (ID INT IDENTITY(1, 1),
3 Emp_ID INT,
4 Name VARCHAR(20),
5 City VARCHAR(20) NULL,
6 );
We cannot use the INSERT INTO SELECT statement in the same way as in the above example. If we try to run the
following code, we get an error message:
1 INSERT INTO Customers
2 SELECT *
3 FROM Employees;
In this case, we need to specify the column names with the INSERT INTO statement.
1 INSERT INTO Customers (Emp_ID ,Name)
2 SELECT *
3 FROM Employees;
In the Customers table, we have an additional column with allows NULL values. Let’s run a Select on
Customers table. In the following screenshot, we can see NULL values in the City column.
Suppose you have a different set of columns in the source table. You can still insert records into the destination
table by specifying column names in the INSERT INTO SELECT statement. We should have
appropriate data types to insert the data; for example, you cannot insert varchar column data into an INT column.
Add a new column in Employees table using ALTER TABLE statement.
1 ALTER TABLE Employees
2 ADD Country varchar(50);
Update the table records with country value India.
1 Update Employees set Country='India'
Now, rerun the INSERT INTO SELECT statement. You can notice that we are using SELECT * instead of
specifying column names.
1 INSERT INTO Customers (Emp_ID ,Name)
2 SELECT *
3 FROM Employees;
We get the following error message. This error comes because of the column mismatch between the
source table and destination table.
We can map the column between the source and destination table using the following query.
1 INSERT INTO Customers
2 (Emp_ID,
3 Name
4 )
5 SELECT ID,Name
6 FROM Employees;
Example 3: Insert top rows using the INSERT INTO SELECT statement
Suppose we want to insert Top N rows from the source table to the destination table. We can use Top
clause in the INSERT INTO SELECT statement. In the following query, it inserts the top 1 row from the
Employees table to the Customers table.
1 INSERT TOP(1) INTO Customers
2 (Emp_ID,
3 Name
4 )
5 SELECT ID,Name
6 FROM Employees;
Example 4: Insert using both columns and defined values in the SQL INSERT INTO
SELECT Statement
In previous examples, we either specified specific values in the INSERT INTO statement or used INSERT
INTO SELECT to get records from the source table and insert it into the destination table.
We can combine both columns and defined values in the SQL INSERT INTO SELECT statement.
We have the following columns in the Customers and Employees table. Previously, we did not insert any
values for the City column. We do not have the required values in the Employees table either. We need to
specify an explicit value for the City column.
In the following query, we specified a value for the City column while the rest of the values we inserted
from the Employees table.
1 INSERT TOP(1) INTO Customers (Emp_ID, Name, City)
2 SELECT ID, Name,'Delhi' FROM Employees;
In the following query, we can see it inserts one row (due to Top (1) clause) along with value for the City
column.
Example 5: INSERT INTO SELECT statement with Join clause to get data from
multiple tables
We can use a JOIN clause to get data from multiple tables. These tables are joined with conditions
specified with the ON clause. Suppose we want to get data from multiple tables and insert into a table.
In this example, I am using AdventureWorks2017 database. First, create a new table with appropriate
data types.
1 CREATE TABLE [HumanResources].[EmployeeData](
2 [FirstName] [dbo].[Name] NOT NULL,
3 [MiddleName] [dbo].[Name] NULL,
4 [LastName] [dbo].[Name] NOT NULL,
5 [Suffix] [nvarchar](10) NULL,
6 [JobTitle] [nvarchar](50) NOT NULL,
7 [PhoneNumber] [dbo].[Phone] NULL,
8 [PhoneNumberType] [dbo].[Name] NULL,
9 [EmailAddress] [nvarchar](50) NULL,
10 [City] [nvarchar](30) NOT NULL,
11 [StateProvinceName] [dbo].[Name] NOT NULL,
12 [PostalCode] [nvarchar](15) NOT NULL,
13 [CountryRegionName] [dbo].[Name] NOT NULL
14 ) ON [PRIMARY]
15 GO
This table should contain records from the output of a multiple table join query. Execute the following
query to insert data into HumanResources.EmployeeData table.
1 INSERT INTO HumanResources.EmployeeData
2 SELECT p.[FirstName],
3 p.[MiddleName],
4 p.[LastName],
5 p.[Suffix],
6 e.[JobTitle],
7 pp.[PhoneNumber],
8 pnt.[Name] AS [PhoneNumberType],
9 ea.[EmailAddress],
10 a.[City],
11 sp.[Name] AS [StateProvinceName],
12 a.[PostalCode],
13 cr.[Name] AS [CountryRegionName]
14 FROM [HumanResources].[Employee] e
15 INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = e.[BusinessEntityID]
16 INNER JOIN [Person].[BusinessEntityAddress] bea ON bea.[BusinessEntityID] = e.[BusinessEntityID]
17 INNER JOIN [Person].[Address] a ON a.[AddressID] = bea.[AddressID]
18 INNER JOIN [Person].[StateProvince] sp ON sp.[StateProvinceID] = a.[StateProvinceID]
19 INNER JOIN [Person].[CountryRegion] cr ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
20 LEFT OUTER JOIN [Person].[PersonPhone] pp ON pp.BusinessEntityID = p.[BusinessEntityID]
21 LEFT OUTER JOIN [Person].[PhoneNumberType] pnt ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
22 LEFT OUTER JOIN [Person].[EmailAddress] ea ON p.[BusinessEntityID] = ea.[BusinessEntityID];
23 GO
Example 6: INSERT INTO SELECT statement with common table expression
We use Common Table Expressions (CTEs) to simplify complex joins across multiple tables. In the previous
example, we used JOINS in a Select statement for inserting data into a SQL table. In this part, we will
rewrite the query with CTE.
In a CTE, we can divide code into two parts.
1. We define the CTE with a WITH clause before a SELECT, INSERT, UPDATE, or DELETE statement
2. Once we define the CTE, we can reference it similarly to a relational SQL table
Execute the following code to insert data using a CTE.
1 WITH EmployeeData_Temp([FirstName],
2 [MiddleName],
3 [LastName],
4 [Suffix],
5 [JobTitle],
6 [PhoneNumber],
7 [PhoneNumberType],
8 [EmailAddress],
9 [City],
10 [StateProvinceName],
11 [PostalCode],
12 [CountryRegionName])
13 AS (
14
15 SELECT p.[FirstName],
16 p.[MiddleName],
17 p.[LastName],
18 p.[Suffix],
19 e.[JobTitle],
20 pp.[PhoneNumber],
21 pnt.[Name] AS [PhoneNumberType],
22 ea.[EmailAddress],
23 a.[City],
24 sp.[Name] AS [StateProvinceName],
25 a.[PostalCode],
26 cr.[Name] AS [CountryRegionName]
27 FROM [HumanResources].[Employee] e
28 INNER JOIN [Person].[Person] p ON p.[BusinessEntityID] = e.[BusinessEntityID]
29 INNER JOIN [Person].[BusinessEntityAddress] bea ON bea.[BusinessEntityID] = e.[BusinessEntityID]
30 INNER JOIN [Person].[Address] a ON a.[AddressID] = bea.[AddressID]
31 INNER JOIN [Person].[StateProvince] sp ON sp.[StateProvinceID] = a.[StateProvinceID]
32 INNER JOIN [Person].[CountryRegion] cr ON cr.[CountryRegionCode] = sp.[CountryRegionCode]
33 LEFT OUTER JOIN [Person].[PersonPhone] pp ON pp.BusinessEntityID = p.[BusinessEntityID]
34 LEFT OUTER JOIN [Person].[PhoneNumberType] pnt ON pp.[PhoneNumberTypeID] = pnt.[PhoneNumberTypeID]
35 LEFT OUTER JOIN [Person].[EmailAddress] ea ON p.[BusinessEntityID] = ea.[BusinessEntityID])
36
37 INSERT INTO HumanResources.EmployeeData
38 SELECT *
39 FROM EmployeeData_Temp;
40 GO
Example 7: INSERT INTO SELECT statement with a Table variable
We use Table variables similarly to a temporary table. We can declare them using the table data type.
This table can be used to perform activities in SQL Server where we do not require a permanent table.
You can divide the following query into three parts.
1. Create a SQL Table variable with appropriate column data types. We need to use data type TABLE
for table variable
2. Execute an INSERT INTO SELECT statement to insert data into the table variable
3. View the table variable result set
1 DECLARE @TableVar table(
2 [JobTitle] [nvarchar](50) NOT NULL,
3 [BirthDate] [date] NOT NULL,
4 [MaritalStatus] [nchar](1) NOT NULL,
5 [Gender] [nchar](1) NOT NULL,
6 [HireDate] [date] NOT NULL,
7 [SalariedFlag] [dbo].[Flag] NOT NULL,
8 [VacationHours] [smallint] NOT NULL,
9 [SickLeaveHours] [smallint] NOT NULL
10 )
11
12 -- Insert values into the table variable.
13 INSERT INTO @TableVar
14 SELECT
15 [JobTitle]
16 ,[BirthDate]
17 ,[MaritalStatus]
18 ,[Gender]
19 ,[HireDate]
20 ,[SalariedFlag]
21 ,[VacationHours]
22 ,[SickLeaveHours]
23 FROM [AdventureWorks2017].[HumanResources].[Employee]
24
25 -- View the table variable result set.
26 SELECT * FROM @TableVar;
27 GO
Introduction
Organizations deal with decimals on a day-to-day basis, and these decimal values can be seen
everywhere in different sectors, be it in banks, the medical industry, biometrics, gas stations, financial
reports, sports, and whatnot. Using whole numbers (by rounding decimal numbers) definitely makes
one’s job easier but it often leads to inaccurate outputs, especially when we are dealing with a large
number of values and crucial data. In such scenarios, it is ideal to use the SQL Decimal data type in SQL Server
to deliver correct results with perfect precision.
It becomes very essential for SQL developers to choose the correct data types in the table structure while
designing and modeling SQL databases. Let’s move forward and explore Decimal data type in SQL
Server.
Pre-requisite
The SQL Decimal data type has been used in SQL Server since forever. You can use any installed SQL Server
version (2000 or above) to understand this data type. We will be using SQL Server 2017 in this
article for the demo purposes. If you don’t have any version installed on your system and wish to practice
against the 2017 version, download it from here.
The above result set shows how SQL Server treats each combination of precision and scale as a different
data type. Here, decimal (6, 0) behaves differently from decimal (6, 5) and decimal (3, 1), and the three are
considered different types. This way, we can tweak the parameters of the SQL Decimal
type to achieve the desired results.
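A small hedged sketch illustrating that behavior (the variable names are illustrative):

DECLARE @d1 DECIMAL(6, 0) = 1234.56;   -- scale 0: the fractional part is rounded away, stored as 1235
DECLARE @d2 DECIMAL(6, 5) = 1.23456;   -- 6 total digits, 5 of them after the decimal point
DECLARE @d3 DECIMAL(3, 1) = 12.3;      -- 3 total digits, 1 after the decimal point
SELECT @d1 AS d1, @d2 AS d2, @d3 AS d3;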
Now that we know how to create this Decimal data type in SQL Server, let’s explore it with numerous
examples.
Precision Storage (bytes)
1 – 9 5
10 – 19 9
20 – 28 13
29 – 38 17
The space consumption of SQL Decimal data type is based on the column definition and not on the size
of the value being assigned to it. For example, a Decimal (12, 4) column with a value of 888.888 takes 9 bytes
on disk, and a Decimal (22, 2) column with a value of 9999.99 consumes 13 bytes on disk. This is why this data
type falls under fixed-length columns.
As a SQL developer myself, I always try to use the SQL Decimal data type as decimal (9, 2), which consumes
the least storage (5 bytes on disk) and offers better performance.
Conclusion
I hope this article provides a comprehensible approach on how to use SQL Decimal data type. Always
ensure the precision of the decimal or numeric variable specified is enough to accommodate the values
assigned to it. Additionally, we observed, how selecting the right kind of data type helps SQL developers
to save disk storage.
In case of any questions, please feel free to ask in the comments section below.
To continue your journey with SQL Server and data types used in it, I would recommend going through
the below links.
Spatial SQL data types in SQL Server
SQL Server Data Type Conversion Methods and performance comparison
Understanding the GUID data type in SQL Server
A step-by-step walkthrough of SQL Inner Join
If you lack knowledge about the SQL join concept in SQL Server, you can see the SQL Join types
overview and tutorial article.
After this short explanation of the SQL join types, we will go through multiple joins.
Example scenario
Green-Tree company launched a new campaign for the New Year and made different offers to its online
customers. As a result of their campaign, they succeeded in converting some offers to sales. In the
following examples, we will uncover the new year campaign data details of the Green-Tree company.
The company stores these campaign data details in the following tables. Now, we will create these tables
through the following query and populate them with some dummy data:
DROP TABLE IF EXISTS sales
GO
DROP TABLE IF EXISTS orders
GO
DROP TABLE IF EXISTS onlinecustomers
GO
CREATE TABLE onlinecustomers (customerid INT PRIMARY KEY IDENTITY(1,1), CustomerName VARCHAR(100), CustomerCity VARCHAR(100), Customermail VARCHAR(100))
GO
CREATE TABLE orders (orderId INT PRIMARY KEY IDENTITY(1,1), customerid INT, ordertotal FLOAT, discountrate FLOAT, orderdate DATETIME)
GO
CREATE TABLE sales (salesId INT PRIMARY KEY IDENTITY(1,1), orderId INT, salestotal FLOAT)
GO

INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Salvador',N'Philadelphia',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Gilbert',N'San Diego',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Ernest',N'New York',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Stella',N'Phoenix',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Jorge',N'Los Angeles',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Jerome',N'San Antonio',N'[email protected]')
INSERT INTO [dbo].[onlinecustomers]([CustomerName],[CustomerCity],[Customermail]) VALUES (N'Edward',N'Chicago',N'[email protected]')
GO

INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (3,1910.64,5.49,CAST('03-Dec-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (4,150.89,15.33,CAST('11-Jun-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (5,912.55,13.74,CAST('15-Sep-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (7,418.24,14.53,CAST('28-May-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (55,512.55,13.74,CAST('15-Jun-2019' AS DATETIME))
INSERT INTO [dbo].[orders]([customerid],[ordertotal],[discountrate],[orderdate]) VALUES (57,118.24,14.53,CAST('28-Dec-2019' AS DATETIME))
GO
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (3,370.95)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (4,882.13)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (12,370.95)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (13,882.13)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (55,170.95)
INSERT INTO [dbo].[sales]([orderId],[salestotal]) VALUES (57,382.13)
Quiz
Question: Please generate the proper query according to the below Venn diagram.
Answer: As we learned, the full join allows us to return all rows from the combined tables. The answered
query will be like the following:
1 SELECT customerName, customercity, customermail, ordertotal,salestotal
2 FROM onlinecustomers AS c
3 FULL JOIN
4 orders AS o
5 ON c.customerid = o.customerid
6 FULL JOIN
7 sales AS s
8 ON o.orderId = s.orderId
Conclusion