Module 13 14 15 16

Uploaded by

saadehsan.17
Copyright
© © All Rights Reserved

Module 13
Implementing Managed Code in SQL Server
Contents:
Module Overview
Lesson 1: Introduction to CLR Integration in SQL Server
Lesson 2: Implementing and Publishing CLR Assemblies
Lab: Implementing Managed Code in SQL Server
Module Review and Takeaways

Module Overview
As a SQL Server® professional, you are likely to be asked to create databases that meet business needs.
Most requirements can be met using Transact-SQL. However, occasionally you may need additional
capabilities that can only be met by using common language runtime (CLR) code.
As functionality is added to SQL Server with each new release, the necessity to use managed code
decreases. However, there are times when you might need to create aggregates, stored procedures,
triggers, user-defined functions, or user-defined types. You can use any .NET Framework language to
develop these objects.

In this module, you will learn how to use CLR managed code to create user-defined database objects for
SQL Server.

Objectives
After completing this module, you will be able to:

• Explain the importance of CLR integration in SQL Server.

• Implement and publish CLR assemblies using SQL Server Data Tools (SSDT).

Lesson 1
Introduction to CLR Integration in SQL Server
Occasionally, you might want to extend the built-in functionality of SQL Server; for example, adding a new
aggregate to the existing list of aggregates supplied by SQL Server.

CLR integration is one method for extending SQL Server functionality. This lesson introduces CLR
integration in SQL Server, and its appropriate use cases.

Lesson Objectives
After completing this lesson, you will be able to:

• Explain the ways in which you can extend SQL Server.

• Describe the .NET Framework.

• Describe the .NET CLR environment.

• Explain the need for managed code in SQL Server.

• Explain the situations when the use of Transact-SQL is inappropriate.

• Choose appropriate use cases for managed code in SQL Server.

Options for Extending SQL Server Functionality


You can extend the functionality of Microsoft® SQL Server
in several ways:

CLR Managed Code


Managed code is normally written in Microsoft
Visual C#® or Microsoft Visual Basic®, and is
executed under the management of the .NET
Framework CLR. Managed code is safer than
unmanaged code—it is designed to be more
reliable and robust. You can extend the
functionality of SQL Server by using managed
code to create user-defined types, aggregates,
mathematical functions, and other functionality.

SQL Server Components


In addition to the database engine, SQL Server now includes Analysis Services (SSAS), Reporting Services
(SSRS), and Integration Services (SSIS). These components are also extensible. For example, you can extend
SSRS with rendering extensions, security extensions, data processing extensions, delivery extensions,
custom code, and external assemblies.

Extended Stored Procedures


Previous versions of SQL Server used extended stored procedures to extend the database’s functionality.
These were written in C++ and executed directly within the address space of the SQL Server engine. This
led to memory leaks and other performance issues because minor errors could cause instabilities.
Microsoft does not recommend the use of extended stored procedures, and their use is effectively
deprecated. CLR managed code should be used in place of extended stored procedures.

This module focuses on using CLR managed code to extend SQL Server functionality.
Developing SQL Databases 13-3

Introduction to the .NET Framework


The .NET Framework is the foundation for
developing Windows® applications and services,
including extended functionality for SQL Server.
The .NET Framework provides development tools
that make application and service development
easier.

Win32 and Win64 APIs


The Windows operating system has evolved over
many years. Win32 and Win64 application
programming interfaces (APIs) are the
programming interfaces to the operating system.
These interfaces are complex and often
inconsistent because they have evolved over time rather than being designed with a single set of
guidelines.

.NET Framework
The .NET Framework is a layer of software that sits above the Win32 and Win64 APIs, and provides a layer
of abstraction above the underlying complexity. The .NET Framework is object-oriented and written in a
consistent fashion to a tightly defined set of design guidelines. Many people describe it as appearing to
have been “written by one brain.” It is not specific to any one programming language, and contains
thousands of prebuilt and pretested objects. These objects are collectively referred to as the .NET
Framework class libraries.
The .NET Framework is generally well regarded amongst developers, making it a good choice for building
code to extend SQL Server.

.NET Common Language Runtime


The .NET CLR is the layer in the .NET Framework
that runs code and provides services that simplify
development and deployment. With the CLR, you
can create programs and procedures in any .NET
language to produce managed code. Managed
code executes under the management of the
common language runtime virtual machine.

By using the CLR integration feature within SQL Server, you can use .NET assemblies to extend SQL
Server functionality. The .NET CLR offers:

• Better memory management.

• Access to existing managed code.

• Security features to ensure that managed code will not compromise the server.

• The ability to create new resources by using .NET Framework languages such as Microsoft Visual C#
and Microsoft Visual Basic .NET.

Memory Management
Managing memory allocation was a problem when developing directly to the Win32 and Win64 APIs.
Component Object Model (COM) programming preceded the .NET Framework, using reference counting
to release memory that was no longer needed. COM would work something like this:

1. Object A creates object B.

2. When Object B is created, it notes that it has one reference.

3. Object C acquires a reference to object B. Object B then notes that it has two references.

4. Object C releases its reference. Object B then notes that it has one reference.
5. Object A releases its reference to Object B. Object B then notes that it has no references, so it is
destroyed.

The problem was that it was easy to create situations where memory could be lost. Consider a circular
reference: if two objects hold references to each other, and nothing else refers to either of them, neither
reference count ever reaches zero, so both objects continue to consume memory even though they can no
longer be used. This causes a leak, or loss, of the memory. Over time, such code can exhaust all available
memory, resulting in instability and crashes. This is obviously highly undesirable when integrating code
into SQL Server.

The .NET Framework includes a sophisticated memory management system, known as garbage
collection, to avoid memory leaks. There is no reference counting. Instead, the CLR periodically checks
which objects are “reachable” and disposes of the other objects.

Type Safety
Type-safe code accesses memory in a properly structured way.
Type safety is a problem with Win32 and Win64 code. When a function or procedure is called, all that is
known to the caller is the function’s address in memory. The caller assembles a list of required parameters,
places them in an area called the stack, and jumps to the memory address of the function. Problems can
arise when the design of the function changes, but the calling code is not updated. The calling code can
then refer to memory locations that do not exist.

The .NET CLR is designed to avoid such problems. Objects are isolated from one another and can only
access the memory allocations for which they have permissions. In addition to providing address details of
a function, the CLR also provides the function’s signature. The signature specifies the data types of each of
the parameters, and their order. The CLR will not permit a function to be called with the wrong number or
types of parameters.

Hosting the CLR


The CLR is designed to be hostable, meaning that it can be operated from within other programming
environments, such as SQL Server. SQL Server acts as the operating system, controlling CLR memory
management, stability, and performance. For example, SQL Server manages the memory that a CLR
assembly can address so that code cannot affect database engine processes.
For more information about the CLR hosted environment within SQL Server, see Microsoft Docs:

CLR Hosted Environment
http://aka.ms/K9pk7y

Common Language Specification


The common language specification (CLS) specifies the rules to which languages must conform. A simple
example is that C# is case sensitive, and can recognize two methods, named SayHello and Sayhello, as
being distinct from each other. These two methods could not be called from a case-insensitive language
because they would appear to be the same. The CLS avoids interoperability problems like this by not
permitting you to name these two methods with the same name, regardless of case.

Why Use Managed Code with SQL Server?


Transact-SQL is the primary tool to work with
relational data in SQL Server; it works efficiently
with the SQL Server Database Engine, and covers
most eventualities. So it is relatively rare that
managed code is required. Occasionally, however,
you might have to do something that is outside
the scope of Transact-SQL.

.NET Framework Classes


The .NET Framework offers a set of libraries, each
of which contains a large set of prewritten and
pretested objects—typically referred to as classes.
For example, the regular expression (Regex) class
provides powerful string pattern matching that you can use within SQL Server by using the CLR
integration feature.
The inclusion of managed code in SQL Server also makes access to external resources easier, and in some
cases, provides higher performance.
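
To see why a regular expression class is attractive, it helps to look at what Transact-SQL offers natively. The following is a minimal sketch that is not taken from the course files; the sample string is invented for illustration. LIKE and PATINDEX support only the %, _, [ ] and [^ ] wildcards, so a pattern such as "three digits, a hyphen, four digits" has to spell out every character position:

```sql
-- Illustrative only: the sample text is invented.
DECLARE @text nvarchar(100) = N'Call 555-0100 today';

-- PATINDEX returns the 1-based start position of the first match, or 0 if there is none.
-- There is no quantifier syntax, so each digit is written out as its own [0-9] class.
SELECT PATINDEX(N'%[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @text) AS match_position;
```

Quantifiers, alternation, and capture groups have no equivalent in these patterns, which is the kind of gap that a CLR function wrapping the .NET regular expression classes can fill.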

Alternative to Transact-SQL Objects


Many objects that you can create in Transact-SQL can also be created in managed code, including:

• Scalar user-defined functions.

• Table-valued user-defined functions.

• Stored procedures.

• Data manipulation language (DML) triggers.

• Data definition language (DDL) triggers.

New Object Types


In managed code, you can also construct types of objects that you cannot construct in Transact-SQL.
These include the following:

• User-defined data types.

• User-defined aggregates.

Although you can create objects using managed code, it does not necessarily mean that you should.
Transact-SQL should be used most of the time, with managed code used only when necessary.

Considerations When Using Managed Code


When you are considering whether to use
Transact-SQL or managed code, there are a
number of considerations.

Portability
Upgrading a database system that includes
managed code can be more complicated. From
time to time, SQL Server must be upgraded when
old versions come to the end of their life, and
managed code may or may not work with newer
versions, depending on functionality. This can also
be an issue with Transact-SQL, but Transact-SQL is
more likely to have an upgrade path. CLR
managed code is also dependent on the .NET Framework version installed on the server.

Maintainability
Database administrators (DBAs) generally have a good knowledge of Transact-SQL, but little or no
knowledge of C# or Visual Basic. Adding managed code to a database system means that additional
expertise may be required to maintain the system. For larger organizations that already employ
developers, this may not be a problem. Organizations that rely on DBAs to support their SQL Server
databases may find that adding managed code creates a split in expertise that, over time, causes
problems.

Three-Tier Architecture
Transact-SQL is designed as an efficient language to work with relational database tables. If you have an
extensive need for managed code, consider the three-tier architecture for your system. Each tier is
constructed separately, possibly by different teams with different skills. There is a boundary between each
tier, so that each one can be properly and independently tested. Each tier is built using the development
tools best suited to its needs. This separation of concerns creates systems that are more maintainable, and
faster to develop.
A typical three-tier architecture might be composed of a:

• Database tier. The tables, views, stored procedures and other database objects.

• Mid tier or business tier. The data access objects and other code that manage the business logic. As
the name suggests, the mid tier (or business tier) sits between the database tier and the presentation
tier.

• Presentation tier. This is the user interface tier, which might include forms for data input, reports,
and other content.

Transact-SQL
Transact-SQL is the primary method for manipulating data within databases. It is designed for direct data
access and has many built-in functions. However, Transact-SQL is not a fully fledged, high-level
programming language. It is not object-oriented so, for example, you cannot create a stored procedure
that takes a parameter of an animal data type and pass a parameter of a cat data type to it. Also,
Transact-SQL is not designed for tasks such as intensive calculations or string handling.

Managed Code
Managed code provides full object-oriented capabilities, although this only applies within the managed
code itself. Managed code works well within SQL Server when used sparingly; otherwise you should
consider using a mid tier.

General Rules
Two good general rules apply when you are choosing between using Transact-SQL and managed code:

• Data-oriented requirements should almost always be handled using Transact-SQL.

• Some specialist calculations, string handling, or external access might require managed code.

Appropriate Use of Managed Code


In the last topic, we discussed some considerations
when planning to use managed code within your
database. The following list describes areas that
might be suitable for managed code, if needed:

Scalar UDFs
Some scalar user-defined functions (UDFs) that are
written in Transact-SQL cause performance
problems. Managed code can provide an
alternative way of implementing scalar UDFs,
particularly when the function does not depend
on data access.

Table-Valued UDFs
Data-related table-valued UDFs are generally best implemented using Transact-SQL. However, table-
valued UDFs that have to access external resources, such as the file system, environment variables, or the
registry, might be candidates for managed code. Consider whether this functionality properly sits within
the database layer, or whether it should be handled outside of SQL Server.

Stored Procedures
With few exceptions, stored procedures should be written in Transact-SQL. The exceptions to this are
stored procedures that have to access external resources or perform complex calculations. However, you
should consider whether code that performs these tasks should be implemented within SQL Server at all—
it might be better implemented in a mid tier.

DML Triggers
Almost all DML triggers are heavily oriented toward data access and should be written in Transact-SQL.
There are very few valid use cases for implementing DML triggers in managed code.

DDL Triggers
DDL triggers are also data-oriented. However, some DDL triggers have to do extensive XML processing,
particularly based on the XML EVENTDATA structure that SQL Server passes to these triggers. The more
that extensive XML processing is required, the more likely it is that the DDL trigger would be best
implemented in managed code. Managed code would also be a better option if the DDL trigger needed
to access external resources—but this is rarely a good idea within any form of trigger. Again, for any but
the lightest use, consider implementing a mid tier.
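
For reference, a light use of EVENTDATA in a Transact-SQL DDL trigger might look like the following sketch. The trigger name is hypothetical and the example is not part of the course files; it only illustrates the kind of XML extraction described above, which stays manageable in Transact-SQL while only a few values are needed:

```sql
-- Illustrative sketch: the trigger name is hypothetical.
CREATE TRIGGER trg_ReportTableDdl
ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
    SET NOCOUNT ON;

    -- EVENTDATA() returns an xml value describing the event that fired the trigger.
    DECLARE @event xml = EVENTDATA();

    -- Pull individual values out of the XML with the xml data type's value() method.
    DECLARE @eventType  nvarchar(128) = @event.value('(/EVENT_INSTANCE/EventType)[1]',  'nvarchar(128)');
    DECLARE @objectName nvarchar(256) = @event.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(256)');

    PRINT CONCAT(@eventType, N' on ', @objectName);
END;
GO
```

Each additional value needs another XQuery expression, so triggers that walk large parts of the event document are where managed code starts to look more practical.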

User-Defined Aggregates
Transact-SQL has no concept of user-defined aggregates. You have to implement these in managed code.
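
Although the aggregate logic itself must be written in managed code, the aggregate is registered and called from Transact-SQL. A minimal sketch, assuming a hypothetical assembly named ClrDemo that is already cataloged in the database and contains a hypothetical aggregate class named Concatenate:

```sql
-- ClrDemo and Concatenate are hypothetical names used for illustration.
CREATE AGGREGATE dbo.Concatenate (@value nvarchar(4000))
RETURNS nvarchar(max)
EXTERNAL NAME ClrDemo.Concatenate;
GO

-- Once registered, the aggregate is used like a built-in aggregate.
-- dbo.Orders and its columns are placeholder names, not objects from the course databases.
SELECT CustomerID, dbo.Concatenate(ProductName) AS products
FROM dbo.Orders
GROUP BY CustomerID;
```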

User-Defined Data Types


Transact-SQL offers the ability to create alias data types, but these are not really new data types; they are
based on existing built-in data types. Managed code offers the ability to create entirely new data types,
defining both what data is stored and how the data type behaves.
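
The difference shows up in the two forms of CREATE TYPE. The following sketch uses hypothetical names throughout (dbo.PostalCode, dbo.Point, and a ClrDemo assembly containing a Point class) and assumes the assembly has already been cataloged:

```sql
-- Alias type: a named shorthand for an existing built-in type.
-- dbo.PostalCode is a hypothetical name.
CREATE TYPE dbo.PostalCode FROM nvarchar(10) NOT NULL;
GO

-- CLR user-defined type: storage and behavior come from a managed class.
-- ClrDemo and Point are hypothetical assembly and class names.
CREATE TYPE dbo.Point
EXTERNAL NAME ClrDemo.[Point];
GO
```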

Check Your Knowledge

Question

Why might you include managed code in your SQL Server database?

Select the correct answer.

To create a new data type that is used widely in your database.

To replace some Transact-SQL code that is running slowly.

To create a trigger that alters code in another application.

To back up SQL Server at a certain time on Monday morning.

Lesson 2
Implementing and Publishing CLR Assemblies
There are two ways to deploy a CLR assembly to a computer running SQL Server—either with Transact-
SQL scripts or using SQL Server Data Tools (SSDT). This lesson focuses on using SSDT to develop and
deploy CLR assemblies. Code examples have been written using C#.
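
For comparison, the Transact-SQL route can be sketched as follows. This is an illustration under stated assumptions rather than a script from the course files: the file path is hypothetical, the names ClrDemo and HelloWorld mirror the demonstration later in this lesson, and UserDefinedFunctions is assumed to be the class that contains the method. On instances where CLR strict security is enabled, the assembly would also have to be signed or trusted before CREATE ASSEMBLY succeeds.

```sql
-- The file path is a hypothetical example.
CREATE ASSEMBLY ClrDemo
FROM 'D:\Demofiles\Mod13\ClrDemo\bin\Debug\ClrDemo.dll';
GO

-- Bind a Transact-SQL function to a static method in the assembly.
-- EXTERNAL NAME takes the form assembly.class.method.
CREATE FUNCTION dbo.HelloWorld()
RETURNS nvarchar(4000)
AS EXTERNAL NAME ClrDemo.UserDefinedFunctions.HelloWorld;
GO

SELECT dbo.HelloWorld();
```

Publishing from SSDT generates and runs statements of this kind for you, which is one reason this lesson concentrates on the SSDT workflow.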

Lesson Objectives
After completing this lesson, you will:

• Be able to explain what an assembly is.

• Understand assembly permissions.

• Use SSDT to create a CLR assembly.

• Use SSDT to publish a CLR assembly.

What Is an Assembly?
Managed code is deployed in SQL Server within an
assembly—a .dll file that contains the executable
code and a manifest. The manifest describes the
contents of the assembly, and the interfaces to the
assembly. SQL Server and other code can then
interrogate what the assembly contains and what
it can do.
Assemblies can contain other resources such as
icons, which are also listed in the manifest. In
general terms, assemblies can be either .exe files
or .dll files; however, SQL Server only works with
.dll files.

Deployment and Security


Assemblies are created outside of SQL Server and they provide the functionality to deploy and version
managed code. After creating an assembly, you can share it between SQL Server instances and business
applications.

Security is applied at the assembly level.

In this lesson, you will see how to use SSDT to create assemblies and publish them to SQL Server.

Assembly Permission Sets


The CLR offers several levels of trust that you can
set within policies for the host machine on which
the assembly runs. SQL Server provides three
permission sets with which the administrator can
control the server’s exposure: SAFE,
EXTERNAL_ACCESS, and UNSAFE.

Regardless of what the code in an assembly attempts to do, the permission set determines the
permitted actions.

SAFE
SAFE assemblies have a limited permission set, and
can only access the SQL Server database in which they are cataloged. SAFE is the default permission
set; it is the most restrictive and secure. Assemblies with SAFE permissions cannot access the external
system; for example, network files, the registry, or other files external to SQL Server.

EXTERNAL_ACCESS
EXTERNAL_ACCESS is the permission set that is required to access local and network resources such as
environment variables and the registry. Assemblies with EXTERNAL_ACCESS permissions cannot be used
within a contained database.

UNSAFE
UNSAFE is the unrestricted permission set that should rarely, if ever, be used in a production environment.
UNSAFE is required for code that calls external unmanaged code, or code that holds state information
across function calls. UNSAFE assemblies cannot be used in a contained database.

Setup for EXTERNAL_ACCESS and UNSAFE


The EXTERNAL_ACCESS and UNSAFE permission sets require a trust level to be set up before you can use
them. There are two ways to do this:

• You can flag the database as TRUSTWORTHY by using the ALTER DATABASE <database name> SET
TRUSTWORTHY ON statement. This is not recommended because, under certain circumstances, it could
provide access for malicious assemblies.

• Create an asymmetric key in the master database from the assembly file, and then
create a login mapped to that key. Finally, grant the login the EXTERNAL ACCESS ASSEMBLY permission
(or the UNSAFE ASSEMBLY permission for the UNSAFE permission set). This is the recommended method
of granting permission to use the EXTERNAL_ACCESS or UNSAFE permission sets.
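
The recommended route can be sketched as follows. The key name, login name, and file path are hypothetical, and the sketch assumes that the assembly was signed with a strong name key when it was built, because the asymmetric key is extracted from that signature:

```sql
USE master;
GO

-- Key, login, and file path names are hypothetical.
CREATE ASYMMETRIC KEY ClrDemoKey
FROM EXECUTABLE FILE = 'D:\Demofiles\Mod13\ClrDemo\bin\Debug\ClrDemo.dll';
GO

-- The login exists only to carry the permission; nobody connects with it.
CREATE LOGIN ClrDemoLogin FROM ASYMMETRIC KEY ClrDemoKey;
GO

-- Server-level permission; grant UNSAFE ASSEMBLY instead for the UNSAFE permission set.
GRANT EXTERNAL ACCESS ASSEMBLY TO ClrDemoLogin;
GO
```

Because trust is tied to the signing key rather than to the database, only assemblies signed with that key gain the extra permission, which is why this approach is preferred over TRUSTWORTHY.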

Setting Permissions
When you create an assembly using SSDT, the default permission set is SAFE. Alternatively, you can specify
the permission set in Transact-SQL by adding the WITH PERMISSION_SET = { SAFE | EXTERNAL_ACCESS |
UNSAFE } clause to the CREATE ASSEMBLY statement.
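
The permission set of an assembly that is already cataloged can also be inspected and changed from Transact-SQL. A minimal sketch, assuming a hypothetical assembly named ClrDemo exists in the current database and that the trust setup for EXTERNAL_ACCESS is already in place:

```sql
-- ClrDemo is a hypothetical assembly name.
-- permission_set_desc reports the permission set recorded for each assembly.
SELECT name, permission_set_desc
FROM sys.assemblies
WHERE name = N'ClrDemo';

-- Change the permission set without dropping and re-creating the assembly.
ALTER ASSEMBLY ClrDemo
WITH PERMISSION_SET = EXTERNAL_ACCESS;
```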

To change the permission set using SSDT, right-click the assembly name, click Properties, and then use
the Properties tab to set the Permission level.

For more information about permissions, see Microsoft Docs:

Creating an Assembly
http://aka.ms/Ijs5b1

SP_CONFIGURE
For security reasons, SQL Server does not allow CLR integration by default. To enable CLR integration, you
must set the clr enabled option to 1. This is set at the instance level using the sp_configure stored
procedure.

This example code first displays the current sp_configure settings, enables the advanced options to be
displayed, and then sets the clr enabled option to 1. This allows CLR managed code to run within the
SQL Server instance:

Using sp_configure to Enable CLR Integration


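-- Display the current configuration settings.
EXEC sp_configure;
GO

-- Make advanced options visible.
EXEC sp_configure 'show advanced options', 1;
GO
RECONFIGURE;
GO

-- Enable CLR integration for the instance.
EXEC sp_configure 'clr enabled', 1;
GO
RECONFIGURE;
GO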
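
To confirm that the change has taken effect, you can query the sys.configurations catalog view. This check is a small addition for illustration and is not part of the course script:

```sql
-- value is the configured setting; value_in_use is what the instance is currently running with.
-- After RECONFIGURE, both should show 1 for 'clr enabled'.
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name = N'clr enabled';
```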

CLR Strict Security


CLR Strict Security is a setting in sp_configure that overrides the SAFE and EXTERNAL_ACCESS permission sets.
When CLR Strict Security is set to 1, the PERMISSION_SET information is ignored, and all assemblies are
treated as UNSAFE.
CLR Strict Security is set to 1 by default. You can disable it by setting it to 0, although this is not
recommended. Instead, consider signing assemblies with a certificate or an asymmetric key that has a
corresponding login with the UNSAFE ASSEMBLY permission.
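
To check how an instance is configured, you can display the option with sp_configure. The sketch below assumes a version of SQL Server that includes the option (it was introduced in SQL Server 2017) and only reads the setting:

```sql
-- 'clr strict security' is an advanced option, so advanced options must be visible first.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO

-- With only the option name supplied, sp_configure reports the setting without changing it.
EXEC sp_configure 'clr strict security';
GO
```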

sys.sp_add_trusted_assembly
Trusted assemblies can be added to a whitelist by using sys.sp_add_trusted_assembly. Use
sys.sp_add_trusted_assembly, sys.sp_drop_trusted_assembly, and sys.trusted_assemblies to manage your
whitelist.
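
A minimal sketch of managing the whitelist follows. The procedure identifies an assembly by the SHA-512 hash of its binary; the file path and description here are hypothetical, and computing the hash with OPENROWSET and HASHBYTES is one possible approach rather than a required one:

```sql
-- The file path and description are hypothetical.
DECLARE @hash varbinary(64);

-- Read the assembly file as a single binary value and hash it.
SELECT @hash = HASHBYTES('SHA2_512', assembly_file.BulkColumn)
FROM OPENROWSET(BULK 'D:\Demofiles\Mod13\ClrDemo\bin\Debug\ClrDemo.dll', SINGLE_BLOB) AS assembly_file;

EXEC sys.sp_add_trusted_assembly
    @hash = @hash,
    @description = N'ClrDemo demonstration assembly';

-- List the assemblies that are currently trusted.
SELECT hash, description, create_date, created_by
FROM sys.trusted_assemblies;

-- To remove the entry later: EXEC sys.sp_drop_trusted_assembly @hash = @hash;
```

Because the entry is keyed on the hash, rebuilding the assembly produces a different hash and the new build has to be added again.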

For more information about CLR strict security, see Microsoft Docs:

CLR strict security
https://aka.ms/C3kl7u

For more information about sys.sp_add_trusted_assembly, see Microsoft Docs:

sys.sp_add_trusted_assembly (Transact-SQL)
https://aka.ms/Jggb1s

SQL Server Data Tools


SQL Server Data Tools (SSDT) was introduced with
SQL Server 2012 to provide a rich development
environment for SQL Server. SSDT is integrated
into Visual Studio and so is familiar to many
developers—although perhaps less familiar to
DBAs. You can use SSDT to develop, debug, and
refactor database code, in addition to developing
Transact-SQL and CLR managed code.

CLR managed code must be created for the
specific .NET Framework version that is deployed
on the target machine. With SSDT, you can specify
the .NET Framework version that you are targeting.
SSDT also provides a number of templates for different SQL Server objects including:

• Aggregates

• Stored procedures

• Triggers

• User-defined functions

• User-defined types

Develop a CLR Assembly Using SSDT


You can develop a CLR managed code assembly using Visual Studio 2015 with SSDT installed, using the
following steps:
1. Determine which version of the .NET Framework is installed on the target SQL Server computer.

2. In Visual Studio, on the File menu, point to New, and then click Project.

3. In the list of templates, click SQL Server and then click SQL Server Database Project. Select the .NET
Framework of your target SQL Server computer, enter a name and directory for your project and then
click OK.
4. To create a CLR object, add a new item to the project using the template for the language of the SQL
CLR object you want to create. The examples in this course are written in C#, so you would select SQL
CLR C#.

5. Select the type of CLR object you want to create. The available choices include aggregates, stored
procedures, triggers, user-defined functions, and data types.

6. In the new template code window, write your code.

7. When you have completed your CLR code, build the solution, correcting any errors that might occur.
8. Publish the CLR assembly, specifying the target database and connection information. You can then
use the CLR assembly within your SQL Server database.

Set Permission Level


The permission level is set at the assembly level in the Properties dialog box for that assembly. You can
set the permission level by selecting the appropriate option from the drop-down list. SAFE is the default
option, and any assemblies you create will automatically be set to SAFE if you do not amend the
permission level.

For a good introduction to SQL Server Data Tools, see MSDN:

SQL Server Data Tools
http://aka.ms/Koihog

Publishing a CLR Assembly


Visual Studio with SSDT is an ideal environment for
developing CLR managed code, and most C#
developers will be familiar with Visual Studio.

SSDT includes a number of preinstalled code templates, including:

• SQL CLR C# aggregate.

• SQL CLR C# stored procedure.

• SQL CLR C# user-defined function.

• SQL CLR C# user-defined type.

The templates include generated code that you can customize with your own code. You then publish CLR
managed code assemblies to on-premises SQL Server instances.

Set Properties for the Assembly


After creating the assembly, you have to configure its properties. Set the permission level that the code
requires, and set the target platform to the version of SQL Server to which you are deploying your assembly.
Properties and permissions are both set at the assembly level. The permission level default is SAFE and the
default target platform is SQL Server 2016.

Using SSDT, open the CLR assembly project, and in Solution Explorer, right-click on the project name, and
then click Properties. On the Project Settings page, amend the target platform. On the SQLCLR page,
set the permission level. Save the assembly before closing the Properties dialog box.

Rebuild and Publish the Assembly


After configuring the properties, rebuild the solution, and then publish it to your target database.

Using the Assembly


After the assembly is published, you can see it in SSMS. In Object Explorer, expand your
database, expand Programmability, and then drill down to the appropriate type of object. In addition,
you can call it directly from your Transact-SQL code.
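
You can also list published assemblies and the objects bound to them by querying the catalog views. The following query is a sketch added for illustration; it assumes only that at least one user-defined assembly has been published to the current database:

```sql
-- List each user-defined assembly together with the CLR objects that are bound to it.
SELECT a.name                            AS assembly_name,
       a.permission_set_desc,
       OBJECT_SCHEMA_NAME(m.object_id)   AS schema_name,
       OBJECT_NAME(m.object_id)          AS object_name,
       m.assembly_class,
       m.assembly_method
FROM sys.assemblies AS a
JOIN sys.assembly_modules AS m
    ON m.assembly_id = a.assembly_id
WHERE a.is_user_defined = 1;
```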

Demonstration: Creating a User-Defined Function


In this demonstration, you will see how to:

• Develop a simple function using CLR C# managed code.

• Publish an assembly.

Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Navigate to D:\Demofiles\Mod13, right-click Setup.cmd, and then click Run as administrator.

3. In the User Account Control dialog box, click Yes, and then wait for the script to finish.

4. On the Start screen, type SQL Server Data Tools 2015, and then click SQL Server Data Tools 2015.

5. On the File menu, point to New, and then click Project.

6. In the New Project dialog box, expand Templates, and then click SQL Server.
7. In the middle pane, click SQL Server Database Project, in the top pane, in the .NET Framework list,
click .NET Framework 4.6.

8. In the Name box, type ClrDemo, in the Location box, type D:\Demofiles\Mod13, and then click
OK.
9. In Solution Explorer, right-click ClrDemo, point to Add, and then click New Item.

10. In the Add New Item dialog box, in the Installed list, under SQL Server, click SQL CLR C#, in the
middle pane, click SQL CLR C# User Defined Function, in the Name box, type HelloWorld.cs, and
then click Add.

11. Locate the return statement, which is immediately below the comment // Put your code here.

12. Amend the function to read:

return new SqlString("Hello World!");

13. On the File menu, click Save All.

14. In Solution Explorer, right-click ClrDemo, and then click Properties.

15. On the Project Settings page, in the Target platform list, select SQL Server 2017.
16. On the SQLCLR page, in the Permission level list, note that SAFE is selected, then click Signing.

17. In the Signing dialog, select Sign the assembly. In the Choose a strong name key file box, select
New.

18. In the Create Strong Name Key dialog, in the Key file name box type ClrDemo. In the Enter
password box type Pa55w.rd then in the Confirm password box type Pa55w.rd then click OK, then
click OK.

19. On the File menu, click Save All.

20. On the Build menu, click Build Solution.

21. In Windows Explorer, navigate to D:\Demofiles\Mod13, then double-click create_login.sql. When
SQL Server Management Studio starts, in the Connect to Database Engine dialog, confirm that the
Server name box has the value MIA-SQL, then click Connect. If the Connect to Database Engine
dialog is displayed a second time, click Connect.

22. Review the script in the create_login.sql pane. Observe that an asymmetric key is created from the
ClrDemo.dll that you just compiled, and that a login is created based on the imported asymmetric
key, then click Execute. When the script completes, close SQL Server Management Studio.

23. In Solution Explorer, right-click ClrDemo, and then click Publish.

24. In the Publish Database dialog box, click Edit.

25. In the Connect dialog box, on the Browse tab, expand Local, and then click MIA-SQL.

26. In the Database Name list, click AdventureWorks2014, and then click Test Connection.

27. In the Connect message box, click OK.


28. In the Connect dialog box, click OK.

29. In the Publish Database dialog box, click Publish.

30. After a few seconds, a message will be displayed to say it has published successfully.

31. On the taskbar, click Microsoft SQL Server Management Studio.

32. In the Connect to Server dialog box, in the Server name box, type MIA-SQL, and then click
Connect.

33. In Object Explorer, expand Databases, expand AdventureWorks2014, expand Programmability,
expand Functions, and then expand Scalar-valued Functions. Note that dbo.HelloWorld appears
in the list.
34. In Object Explorer, right-click AdventureWorks2014, and click New Query.

35. In the new query window, type the following code, and then click Execute.

SELECT dbo.HelloWorld();

36. Close SSMS without saving any changes.


37. Close Visual Studio.

Question: Do you use managed code in your SQL Server databases? Who maintains the
code when it needs amending?

Sequencing Activity
Put the following steps in order by numbering each to indicate the correct order:

Steps

Check which version of the .NET Framework is installed on the machine hosting your SQL Server.

Open Visual Studio and check that SSDT is installed.

Create a new project.

Add a new item to the project.

Amend the template with your new code.

Build the solution.

Publish the solution.

Open SSMS and check the new function appears.

Create a Transact-SQL query using your new managed code function.

Lab: Implementing Managed Code in SQL Server


Scenario
You work for the rapidly expanding Adventure Works Bicycle Company Inc. A new developer has joined
the database team and has decided to implement almost all of the logic in SQL CLR assemblies. You will
determine if this is appropriate. You will also implement and test a supplied .NET assembly.

Objectives
After completing this lab, you will be able to:

 Assess whether proposed functionality should be implemented in Transact-SQL or CLR managed
code.

 Implement a CLR assembly.

 Create a table-valued CLR function.

Estimated Time: 45 minutes

Virtual machine: 20762C-MIA-SQL


User name: ADVENTUREWORKS\Student

Password: Pa55w.rd

Exercise 1: Assessing Proposed CLR Code


Scenario
You first have to assess a list of proposed functionality for your company database. You will determine
which functions should be implemented using Transact-SQL and which should be implemented using CLR
managed code integrated into SQL Server.

The main tasks for this exercise are as follows:


1. Prepare the Environment

2. Review the List of Proposed Functionality

 Task 1: Prepare the Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run Setup.cmd in the D:\Labfiles\Lab13\Starter folder as Administrator.

 Task 2: Review the List of Proposed Functionality


1. Open TSQL_or_Managed_Code.docx from the D:\Labfiles\Lab13\Starter folder.
2. Review the proposed functionality and for each row, enter your recommended choice of
implementation (SQL CLR or Transact-SQL) in column B and your reasons for that choice in column C.
3. Compare your choices with the answers in TSQL_or_Managed_Code_Solution.docx, in the
D:\Labfiles\Lab13\Solution folder.

4. Close any open WordPad windows when you have finished.

Results: After completing this exercise, you will have determined which type of code to use for each new
feature.

Exercise 2: Creating a Scalar-Valued CLR Function


Scenario
You have been asked to create a scalar-valued function using CLR managed code. The function takes two
parameters: the text to be searched and a regular expression pattern. It returns 1 if the text matches
the pattern; otherwise, it returns 0. You can use the function in the WHERE clause of a SELECT
statement.
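
As a rough illustration of how such a function might be called after it is published, the following sketch filters rows with it. The name dbo.IsRegexMatch and the parameter order (text first, pattern second) are assumptions based on the scenario; the actual signature is defined in the C# source supplied with the lab.

```sql
-- Hypothetical usage: assumes the published function is dbo.IsRegexMatch(text, pattern)
-- and that it returns 1 for a match and 0 otherwise.
SELECT p.FirstName, p.LastName
FROM Person.Person AS p
WHERE dbo.IsRegexMatch(p.LastName, N'^Mc[A-Z]') = 1;  -- last names that start with "Mc"
```

Because the function is evaluated for every candidate row, a filter of this kind is likely to scan the table rather than use an index seek.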

The main tasks for this exercise are as follows:

1. Create a Scalar-Valued Function

2. Publish the Scalar-Valued Function

3. Create an Asymmetric Key and a Login in the Database


4. Sign and Publish the Assembly

5. Test the Scalar-Valued Function

 Task 1: Create a Scalar-Valued Function


1. Start SQL Server Management Studio, and then connect to the MIA-SQL Server instance by using
Windows authentication.

2. Start Visual Studio.

3. Open ClrPractice.sln from the D:\Labfiles\Lab13\Starter\ClrPractice folder.


4. Open IsRegexMatch.cs and examine the code.

5. Build the solution, and check that the solution builds without errors.

 Task 2: Publish the Scalar-Valued Function


 Attempt to publish the assembly to the MIA-SQL Server instance and the AdventureWorks2014
database.

Publishing fails with an error message. What is the problem?

 Task 3: Create an Asymmetric Key and a Login in the Database


 Open Create_asymmetric_key.sql from the D:\Labfiles\Lab13\Starter folder. Edit the script in the
file to create an asymmetric key from strong_name.snk in the D:\Labfiles\Lab13\Starter folder, and
then to create a login called sign_assemblies from the asymmetric key. Grant the new login the UNSAFE
ASSEMBLY permission. Execute the script when you have finished editing.
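
The edited script might look something like the following sketch. The key name assembly_key is a placeholder chosen for illustration; the file path, login name, and permission come from the task description above.

```sql
-- Sketch only: assembly_key is a hypothetical name, not one taken from the lab files.
USE master;
GO

-- Import the public key from the strong-name key file used to sign the assembly.
CREATE ASYMMETRIC KEY assembly_key
FROM FILE = 'D:\Labfiles\Lab13\Starter\strong_name.snk';
GO

-- Create a login that is mapped to the key, and allow assemblies signed with it to load.
CREATE LOGIN sign_assemblies FROM ASYMMETRIC KEY assembly_key;
GO

GRANT UNSAFE ASSEMBLY TO sign_assemblies;
GO
```

The login is never used to connect; it exists only so that a server-level permission can be attached to the signing key.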

 Task 4: Sign and Publish the Assembly


 In Visual Studio, change the properties of the ClrPractice project to sign the assembly with
strong_name.snk in the D:\Labfiles\Lab13\Starter folder. Rebuild the project, then publish the
assembly to the MIA-SQL Server instance and the AdventureWorks2014 database.

 Task 5: Test the Scalar-Valued Function


1. In SSMS, verify that the ClrPractice assembly is available in the AdventureWorks2014 database.

2. Verify that the IsRegExMatch function is available in the AdventureWorks2014 database.

3. Open RegExMatch.sql from the D:\Labfiles\Lab13\Starter folder.

4. Review and execute the query to verify that the function works as expected.

5. Close the file, but keep SSMS and Visual Studio open for the next exercise.

Results: After completing this exercise, you will have a scalar-valued CLR function available in SQL Server
Management Studio.

Exercise 3: Creating a Table-Valued CLR Function


Scenario
You have also been asked to create a table-valued function using CLR managed code. The function takes
two parameters: the text to be searched and a regular expression pattern. It returns a list of string
values, one for each match found in the text.
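
A call to such a function might look like the following sketch. The name dbo.RegexMatches matches the function referred to later in this exercise, but the parameter order and the shape of the returned table are assumptions, so the query selects every column rather than naming them.

```sql
-- Hypothetical usage: assumes dbo.RegexMatches(text, pattern) returns one row per match.
SELECT m.*
FROM dbo.RegexMatches(N'Order 42 shipped with 3 items', N'\d+') AS m;
```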

The main tasks for this exercise are as follows:

1. Create a Table-Valued Function

2. Publish and Test the Table-Valued Function

 Task 1: Create a Table-Valued Function


1. In Visual Studio, open RegexMatchesTV.cs, and examine the code.

2. Build the solution, and check that the solution builds without errors.

 Task 2: Publish and Test the Table-Valued Function


1. Publish the assembly to the MIA-SQL server instance and the AdventureWorks2014 database.
2. In SSMS, verify that the RegexMatches function is available in the AdventureWorks2014 database.

3. Open TestRegExMatchex.sql from the D:\Labfiles\Lab13\Starter folder.

4. Review and execute the queries to verify that the function works as expected.

5. If time permits, you could test the StringAggregate function by using Test_StringAggregate.sql.

6. Close SSMS without saving any changes.

7. Close Visual Studio without saving any changes.

Results: After completing this exercise, you will have a table-valued CLR function available in SQL Server
Management Studio.

Question: After publishing managed code to a database, what do you think the issues are
with using it?

Module Review and Takeaways


In this module, you have learned how to use CLR managed code to create user-defined database objects
for SQL Server.

You should now be able to:

 Explain the importance of CLR integration in SQL Server.

 Implement and publish CLR assemblies using SQL Server Data Tools (SSDT).

Review Question(s)
Question: This module has reviewed the pros and cons of using managed code within a SQL
Server database. You have integrated some prewritten C# functions into a database and
tested them in some queries.

How might you use managed code in your own SQL Server environment? How do you assess
the pros and cons for your specific situation?

Module 14
Storing and Querying XML Data in SQL Server
Contents:
Module Overview 14-1
Lesson 1: Introduction to XML and XML Schemas 14-2

Lesson 2: Storing XML Data and Schemas in SQL Server 14-11

Lesson 3: Implementing the XML Data Type 14-18


Lesson 4: Using the Transact-SQL FOR XML Statement 14-22

Lesson 5: Getting Started with XQuery 14-34

Lesson 6: Shredding XML 14-42

Lab: Storing and Querying XML Data in SQL Server 14-52

Module Review and Takeaways 14-57

Module Overview
XML provides rules for encoding documents in a machine-readable form. It has become a widely adopted
standard for representing data structures, rather than sending unstructured documents. Servers that are
running Microsoft® SQL Server® data management software often need to use XML to interchange data
with other systems; many SQL Server tools provide an XML-based interface.
SQL Server offers extensive handling of XML, both for storage and querying. This module introduces XML,
shows how to store XML data within SQL Server, and shows how to query the XML data.

The ability to query XML data directly avoids the need to extract data into a relational format before
executing Structured Query Language (SQL) queries. To effectively process XML, you need to be able to
query XML data in several ways: returning existing relational data as XML, and querying data that is
already XML.

Objectives
After completing this module, you will be able to:

 Describe XML and XML schemas.

 Store XML data and associated XML schemas in SQL Server.

 Implement XML indexes within SQL Server.

 Use the Transact-SQL FOR XML statement.

 Work with basic XQuery queries.



Lesson 1
Introduction to XML and XML Schemas
This lesson introduces XML and how it is used outside SQL Server, as preparation for working with XML
in SQL Server. You will learn some core XML-related terminology, along with how you can use
schemas to validate and enforce the structure of XML.

This lesson also explores the appropriate uses for XML when you are working with SQL Server.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain core XML concepts.

 Explain the difference between fragments and documents.

 Describe the role of XML namespaces.

 Describe the role of XML schemas.

 Determine appropriate use cases for XML data storage in SQL Server.

Question: Do you currently work with applications that use XML? If your application does
use XML, have you considered storing and processing that XML data on SQL Server?

Core XML Concepts


XML is a plain-text, Unicode-based metalanguage;
that is, a language used to describe language. You
can use it to hold both structured and
semistructured data. It is not tied to any particular
vendor, language, or operating system. It provides
access to a wide range of technologies for
manipulating, structuring, transforming, and
querying data.

Data Interchange
XML came to prominence as a format for
interchanging data between systems. It follows the
same basic structure rules as other markup
languages (such as HTML) and is used as a self-describing language.

Consider the following XML document:

XML Document
<?xml version="1.0" ?>
<?xml-stylesheet href="orders.xsl" type="text/xsl"?>
<orders>
  <order id="ord123456">
    <customer id="cust0921">
      <first-name>Dare</first-name>
      <last-name>Obasanjo</last-name>
      <address>
        <street>One Microsoft Way</street>
        <city>Redmond</city>
        <state>WA</state>
        <zip>98052</zip>
      </address>
    </customer>
  </order>
  <order id="ord123457">
    <customer id="cust0067">
      <first-name>Shai</first-name>
      <last-name>Bassli</last-name>
      <address>
        <street>567 3rd Ave</street>
        <city>Saginaw</city>
        <state>MI</state>
        <zip>53900</zip>
      </address>
    </customer>
  </order>
</orders>

Without any additional context, you can determine that this document holds the details of customer
orders, the customers who placed the orders, and each customer’s name and address. This
explains why XML is defined as a self-describing language. In formal terminology, this is described as
“deriving a schema” from a document.

XML Specifics
The lines in the example document that start with “<?” are referred to as processing instructions. These
instructions are not part of the data, but determine the details of encoding. The first line in the preceding
example is the XML declaration, often called the prolog, and shows that version “1.0” of the XML
specification is being used. The second line is a processing instruction that indicates the use of the
extensible style sheet “orders.xsl” to format the document for display, if displaying the document
becomes necessary.
The third line of the example is the first tag of the document and defines the “orders” element. Note that
the document data starts with an opening orders element and finishes with a closing orders element
shown as “</orders>”. XML allows for repeating data, so the above example contains two orders for
different customers.

Note: XML elements are case-sensitive. For example, <street> is not the same as <Street>.

Element-Centric vs. Attribute-Centric XML


There are two ways to encode data in XML. The following example shows element-centric XML:

Element-Centric XML
<Supplier>
  <Name>Tailspin Toys</Name>
  <Rating>12</Rating>
</Supplier>

The following example shows the equivalent data in attribute-centric XML:

Attribute-Centric XML
<Supplier Name="Tailspin Toys" Rating="12">
</Supplier>

Note that, if all data for an element is contained in attributes, a shortcut form of element is available.

As an example, the following two XML elements are equivalent:

Attribute-Centric Shortcut
<Supplier Name="Tailspin Toys" Rating="12"></Supplier>
<Supplier Name="Tailspin Toys" Rating="12"/>

Using the above as an example, the most obvious benefit of the attribute-centric approach is the reduced
size of the data. The element-centric approach needs about 65 characters, versus about 41 characters for
the attribute-centric XML, a saving of roughly 37 percent. However, element-centric XML is a better
option in some circumstances, because it can describe more complex data; it can define an element as
nullable; and the data can be parsed more quickly, because only the elements need to be processed.

SQL Server can output XML encoded in either way; by using the FOR XML statement, you can produce
XML that combines both these approaches.
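
As a brief preview of the FOR XML lesson, the following sketch shows one way to produce each encoding for the supplier row used above. The inline VALUES row and the alias s are illustrative only; they do not come from the course databases.

```sql
-- Attribute-centric: RAW mode emits each column as an attribute of the row element.
SELECT s.Name, s.Rating
FROM (VALUES (N'Tailspin Toys', 12)) AS s(Name, Rating)
FOR XML RAW('Supplier');

-- Element-centric: the ELEMENTS directive emits each column as a child element.
SELECT s.Name, s.Rating
FROM (VALUES (N'Tailspin Toys', 12)) AS s(Name, Rating)
FOR XML RAW('Supplier'), ELEMENTS;

-- Mixed: in PATH mode, a column alias that starts with @ becomes an attribute,
-- and the remaining columns become child elements.
SELECT s.Name AS [@Name], s.Rating
FROM (VALUES (N'Tailspin Toys', 12)) AS s(Name, Rating)
FOR XML PATH('Supplier');
```

In PATH mode, attribute columns must be listed before element columns at the same level.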

Fragments vs. Documents


Well-formed XML has only one top-level, or root,
element and element tags are correctly nested
within each other. Text that has multiple top-level
elements is considered a fragment, not a
document.
Consider the following XML document:

XML Document
<order id="ord123456">
<customer id="cust0921" />
</order>

This code provides the details for a single order and would be considered to be an XML document.

Now consider the following XML code:

XML Fragment
<order id="ord123456">
<customer id="cust0921" />
</order>
<order id="ord123457">
<customer id="cust0925" />
</order>

This text contains the details of multiple orders. Although it is perfectly reasonable XML, it is considered to
be a fragment of XML rather than a document.

To be called a document, the XML needs to have a single root element, as shown in the following
example:

Well-formed XML Document


<?xml version="1.0" ?>
<orders>
<order id="ord123456">
<customer id="cust0921" />
</order>
<order id="ord123457">
<customer id="cust0925" />
</order>
</orders>

A well-formed XML document usually also begins with a prolog that declares which version of the XML
specification is being used, although the XML 1.0 specification makes this declaration optional.
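
The distinction can be checked in SQL Server itself. The following sketch looks ahead to the xml data type covered in the next lesson; it assumes SQL Server 2012 or later, because it uses TRY_CAST.

```sql
-- An untyped xml variable accepts a fragment with more than one top-level element.
DECLARE @fragment xml = N'<order id="ord123456"><customer id="cust0921" /></order>
<order id="ord123457"><customer id="cust0925" /></order>';

-- Count the top-level order elements held in the variable.
SELECT @fragment.value('count(/order)', 'int') AS TopLevelOrders;

-- Incorrectly nested tags are not well-formed, so the conversion fails.
-- TRY_CAST returns NULL instead of raising an error.
SELECT TRY_CAST(N'<order><customer></order>' AS xml) AS NotWellFormed;
```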

XML Namespaces
An XML namespace is a collection of names that
you can use as element or attribute names.
Namespaces are primarily used to avoid name
conflicts between elements in XML documents.

Name Conflicts
This XML defines a table in HTML:

HTML Table
<table>
<tr>

<td>Chicago</td>
<td>New York</td>
<td>London</td>
<td>Paris</td>
</tr>
</table>

This XML defines a piece of furniture:

Table Furniture
<table>
<name>Side Table</name>
<length>180</length>
<width>80</width>
<height>100</height>
<legs>4</legs>
</table>

There would be a name conflict if an application required both these XML fragments to be contained
within one XML document. XML has a mechanism to resolve these name conflicts with prefixes.

The previous two examples could be rewritten as:

XML Using Prefixes


<?xml version="1.0" ?>
<data>
<html:table>
<html:tr>
<html:td>Chicago</html:td>
<html:td>New York</html:td>
<html:td>London</html:td>
<html:td>Paris</html:td>
</html:tr>
</html:table>
<furniture:table>
<furniture:name>Side Table</furniture:name>
<furniture:length>180</furniture:length>
<furniture:width>80</furniture:width>
<furniture:height>100</furniture:height>
<furniture:legs>4</furniture:legs>
</furniture:table>
</data>

The above XML has resolved the name conflict, but a namespace-aware parser will reject it until each
prefix is bound to a namespace.

XML Namespace
An XML namespace is defined by using the special attribute xmlns. The value of the attribute must be a
valid Uniform Resource Identifier (URI), which can be a URL or a Uniform Resource Name (URN). The
namespace URI is most commonly a URL, but it is used only as a unique name: the location does not
need to exist or link directly to an XML schema. You will see how XML schemas are related to
namespaces in the next topic.

The following code provides examples of an XML namespace attribute:

XML Namespace Attributes


xmlns="http://schemas.microsoft.com/sqlserver/profiles/gml"
xmlns:h="http://www.w3.org/TR/xhtml2/"

Namespace attributes can be declared once on the root element of the XML document, or they can be
repeated on each element that requires them.

Best Practice: Industry best practice is to include namespace attributes in the top level
node to reduce unnecessary duplication throughout the document.

To make the previous XML document well-formed, the prefixes need to have namespaces associated with
them:

Using Namespaces
<?xml version="1.0" ?>
<data
xmlns:html="http://www.w3.org/TR/html4/"
xmlns:furniture="http://www.nopanet.org/?page=OFDAXmlCatalog">
<html:table>
<html:tr>
<html:td>Chicago</html:td>
<html:td>New York</html:td>
<html:td>London</html:td>
<html:td>Paris</html:td>
</html:tr>
</html:table>
<furniture:table>
<furniture:name>Side Table</furniture:name>
<furniture:length>180</furniture:length>
<furniture:width>80</furniture:width>
<furniture:height>100</furniture:height>
<furniture:legs>4</furniture:legs>
</furniture:table>
</data>

If a prefix isn’t specified in the namespace attribute, that namespace will be used by default in any XML
elements without a prefix.
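
When you query namespaced XML in SQL Server, the query must bind its own prefix to the same URI. The following sketch uses a shortened copy of the document above; the prefix f is arbitrary and does not need to match the prefix used inside the document.

```sql
DECLARE @data xml = N'<data
    xmlns:html="http://www.w3.org/TR/html4/"
    xmlns:furniture="http://www.nopanet.org/?page=OFDAXmlCatalog">
  <html:table><html:tr><html:td>Chicago</html:td></html:tr></html:table>
  <furniture:table><furniture:name>Side Table</furniture:name></furniture:table>
</data>';

-- Bind the prefix f to the furniture namespace URI for this statement only.
WITH XMLNAMESPACES ('http://www.nopanet.org/?page=OFDAXmlCatalog' AS f)
SELECT @data.value('(/data/f:table/f:name)[1]', 'nvarchar(50)') AS FurnitureName;
```

Only the furniture table is matched, because the HTML table belongs to a different namespace even though its local name is the same.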

XML Schemas
XML schemas are used to define the specific
elements, attributes, and layout permitted within
an XML document. A well-formed XML document
is one that fulfills the criteria specified in the
Fragments vs. Documents topic. If a well-formed
XML document also validates successfully against
an XML schema, the document is said to be valid
as well as well-formed.

The World Wide Web Consortium (W3C) defined XML schemas to be able to describe:

 Elements that can or must appear in a document.

 Attributes that can or must appear in a document.

 Which elements are child elements.

 The order of child elements.

 The number of child elements.

 Whether an element is empty or can include text.

 Data types for elements and attributes.

 Default and fixed values for elements and attributes.

XML schemas are often referred to as XML Schema Definitions (XSDs). XSD is also the default file
extension that most products use when they are storing XML schemas in files.

Example XML Schema


The first thing to consider is that an XSD is written in XML, and therefore can have a namespace and XSD
to describe it. Put another way, an XSD can have an XSD that can be used to validate it.

This example XSD has a namespace that links to the W3C definition of XML Schemas:

Example XSD
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="manual">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="published" type="xsd:date"/>
        <xsd:element name="version" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

The above XML schema could be used to validate the following XML:

XML to be Validated
<manual>
<title>How to use XML and SQL Server</title>
<author>Stephen Jiang</author>
<published>2016-05-07</published>
<version>1.07</version>
</manual>
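
To see the validation happen, you can load the schema into SQL Server and assign the XML to a typed variable. The following sketch anticipates the schema collections covered in the next lesson. The name ManualSchemaCollection is a placeholder, and the example assumes SQL Server 2008 or later, where xsd:date values do not need a time zone.

```sql
CREATE XML SCHEMA COLLECTION ManualSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="manual">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="published" type="xsd:date"/>
        <xsd:element name="version" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>';
GO

-- Valid: every element is present, in order, with a value of the declared type.
DECLARE @manual xml (ManualSchemaCollection) =
    N'<manual><title>How to use XML and SQL Server</title><author>Stephen Jiang</author>
      <published>2016-05-07</published><version>1.07</version></manual>';
SELECT @manual AS ValidManual;

-- Invalid: the version element does not contain a decimal value.
DECLARE @candidate nvarchar(max) =
    N'<manual><title>Draft</title><author>Stephen Jiang</author>
      <published>2016-05-07</published><version>one</version></manual>';
BEGIN TRY
    DECLARE @rejected xml (ManualSchemaCollection) = @candidate;
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS ValidationError;
END CATCH;
GO

DROP XML SCHEMA COLLECTION ManualSchemaCollection;
```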

Appropriate Use of XML Data Storage in SQL Server


Given how widely XML has come to be used in
application development in higher application
tiers, there is a tendency to overuse XML within
the database. It is important to consider when it is
and is not appropriate to use XML within SQL
Server.

XML vs. Objects


Higher-level programming languages that are
used for constructing application programs often
represent entities such as customers and orders as
objects. Many developers see SQL Server as a
simple repository for objects; that is, an object-
persistence layer.

Consider the following table definition:

Table to Store XML Objects


CREATE TABLE ObjectStore (
ObjectID uniqueidentifier PRIMARY KEY,
PersistedData xml
);

There is no suggestion that this would make for a good database design, but note that you could use this
table design to store all objects from an application—customers, orders, payments, and so on—in a single
table. Compare this to how tables have been traditionally designed in relational databases.

SQL Server gives the developer a wide range of choices, from a simple XML design at one end of the
spectrum, to fully normalized relational tables at the other end. Recognize that there is no generic answer
for how a SQL Server database should be designed; instead, there’s a range of options.

Example Use Cases


There are several reasons for storing XML data within SQL Server:
 You may be dealing with data that is already in XML, such as an order that you are receiving
electronically from a customer. You may want to share, query, and modify the XML data in an
efficient and transacted way.

 You may need to achieve a level of interoperability between your relational and XML data. Imagine
that you have to join a customer table with a list of customer IDs that are being sent to you as XML.

 You might need to use XML formats to achieve cross-domain applications and to have maximum
portability for your data. Other systems that you are communicating with may be based on entirely
different technologies, and might not represent data in the same way as your database server.

 You might not know the structure of your data in advance. It is common to have a mixture of
structured and semistructured data. A table might hold some standard relational columns, but also
hold some less structured data in XML columns. For an example, see the
HumanResources.JobCandidate table in the AdventureWorks database.
 You might need to preserve a sequence within your data. For example, you might need to retain
order detail lines in a specific sequence. Relational tables and views have no implicit sequence. XML
documents can exhibit a predictable sequence.
 You may want to have SQL Server validate that your XML data meets a particular XML schema before
processing it.

 You might want to store transferred XML data for historical reasons, or archival purposes.

 You may want to create indexes on your XML data to support faster business intelligence (BI) queries.
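
The interoperability case in the list above, joining a customer table with customer IDs that arrive as XML, can be sketched as follows. The table variable, its rows, and the shape of the incoming XML are invented for illustration.

```sql
-- Hypothetical relational data.
DECLARE @Customer TABLE (CustomerID int PRIMARY KEY, CustomerName nvarchar(50));
INSERT INTO @Customer (CustomerID, CustomerName)
VALUES (1, N'Tailspin Toys'), (2, N'Contoso'), (3, N'Fabrikam');

-- Hypothetical list of customer IDs received as XML.
DECLARE @CustomerIDs xml = N'<customers><customer id="1"/><customer id="3"/></customers>';

-- nodes() turns each customer element into a row; value() extracts the id attribute.
SELECT c.CustomerID, c.CustomerName
FROM @Customer AS c
JOIN @CustomerIDs.nodes('/customers/customer') AS x(n)
    ON c.CustomerID = x.n.value('@id', 'int');
```

The nodes() and value() methods used here are covered in detail in the lesson on shredding XML.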

Demonstration: Introduction to XML and XML Schemas


In this demonstration, you will see how to structure XML and structure XML schemas.

Demonstration Steps
Structure XML and Structure XML Schemas

1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running, and then log
on to 20762C-MIA-SQL as AdventureWorks\Student with the password Pa55w.rd.

2. Navigate to D:\Demofiles\Mod14, right-click Setup.cmd, and then click Run as administrator.

3. In the User Account Control dialog box, click Yes, and then wait until the script finishes.

4. On the taskbar, click Microsoft SQL Server Management Studio.

5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL, and then click Connect.

6. On the File menu, point to Open, and then click Project/Solution.

7. In the Open Project dialog box, navigate to D:\Demofiles\Mod14, click Demo14.ssmssln, and then
click Open.

8. In Solution Explorer, under Miscellaneous, double-click XML_Sample_1.xml.

9. Note the warning line under the second <Production.Product> tag. Position the cursor on the tag
to display the warning.

10. Note that the XML editor in SSMS understands XML and formats it appropriately.

11. Note that the Color attribute is missing from elements where the data is NULL.

12. In Solution Explorer, under Miscellaneous, double-click XML_Sample_2.xml.


13. Note that this document contains a root element, so there is no longer a warning on the second
<Production.Product> tag.

14. In Solution Explorer, under Miscellaneous, double-click XML_Sample_3.xml.


15. Note that this file contains an XML schema followed by the XML data.

16. Leave SSMS open for the next demonstration.



Lesson 2
Storing XML Data and Schemas in SQL Server
Now that you have learned about XML, schemas, and the surrounding terminology, you can consider how
to store XML data and schemas in SQL Server. This is the first step in learning how to process XML
effectively within SQL Server.

You need to see how the XML data type is used, how to define schema collections that contain XML
schemas, how to declare both typed and untyped variables and database columns, and how to specify
how well-formed and valid the XML data needs to be before it can be stored.

Lesson Objectives
After completing this lesson, you will be able to:

 Use the XML data type.

 Create XML schema collections.

 Declare variables and database columns as either untyped or typed XML.

 Choose whether XML fragments can be stored, rather than entire XML documents.

XML Data
SQL Server has a native data type for storing XML
data. You can use it for variables, parameters, and
columns in databases. SQL Server also exposes
several methods that you can use for querying or
modifying the stored XML data.
xml is a built-in data type for SQL Server. It is an
intrinsic data type, which means that it is not
implemented separately through managed code.
The xml data type is limited to a maximum size of
2 GB. You can declare variables, parameters, and
database columns by using the xml data type.

You can see a variable that has been declared by using the xml data type in the following code example:

XML Variable
DECLARE @Orders xml;
SET @Orders = '<Customer Name="Terry"><Order ID="231310" ProductID="12124"/></Customer>';
SELECT @Orders;
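
The methods mentioned above can be applied to the same variable. The following sketch shows value(), exist(), query(), and modify(); the extra order added at the end is invented for illustration.

```sql
DECLARE @Orders xml;
SET @Orders = '<Customer Name="Terry"><Order ID="231310" ProductID="12124"/></Customer>';

SELECT
    -- value() extracts a single scalar value as a SQL Server type.
    @Orders.value('(/Customer/@Name)[1]', 'nvarchar(50)') AS CustomerName,
    -- exist() returns 1 if the XQuery expression finds at least one node.
    @Orders.exist('/Customer/Order[@ProductID="12124"]') AS HasProduct12124,
    -- query() returns the matching nodes as xml.
    @Orders.query('/Customer/Order') AS OrderElements;

-- modify() changes the stored XML in place by using XML DML.
SET @Orders.modify('insert <Order ID="231311" ProductID="12125"/> as last into (/Customer)[1]');
SELECT @Orders AS UpdatedOrders;
```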

Canonical Form
Internally, SQL Server stores XML data in a format that makes it easy to process. It does not store the XML
data in the same format as it was received in.

For example, consider the following code:

Canonical Form
DECLARE @Settings xml;
SET @Settings = '<Setup><Application Name="StartUpCleanup" State="On"></Application>' +
                '<Application Name="Shredder" State="Off">Keeps Spaces</Application></Setup>';
SELECT @Settings;

When the previous code is executed, the resulting XML is:

<Setup>
<Application Name="StartUpCleanup" State="On" />
<Application Name="Shredder" State="Off">Keeps Spaces</Application>
</Setup>

Note that the output that is returned is logically equivalent to the input, but the output is not exactly the
same as the input. For example, the first closing “</Application>” has been removed and replaced by a
closing “/>”. Semantically, the two pieces of XML are identical; the returned XML is said to be in a
canonical, or logically equivalent, form.

If an exact copy of the XML has to be stored and retrieved from the database, consider storing the XML as
a string in, for example, an nvarchar(max) column. However, with this approach you cannot create XML
indexes on the data or call the xml data type methods on it directly.
For more information about xml, see Microsoft Docs:

xml (Transact-SQL)
https://aka.ms/Yqkpcw

XML Schema Collections


Although the xml data type will only store well-
formed XML, you can further constrain the stored
values by associating the data type with an XML
schema collection.

In the first lesson, you learned how you can use XML schemas to constrain what you can store in
an XML document. SQL Server does not store XML schemas as database objects. SQL Server has an
XML SCHEMA COLLECTION object that holds a collection of XML schemas.

When you associate an XML SCHEMA COLLECTION object with an XML variable,
parameter, or database column, the XML to be stored in that location needs to conform to at least one of
the schemas that is contained in the schema collection.

XML Schemas
XML schemas are legible to humans at some level, but they are designed to be processed by computer
systems. Even simple schemas tend to have quite a high level of complexity. Fortunately, you do not need
to be able to read (or worse, write!) such schemas. Tools and utilities generally create XML schemas, and
SQL Server can create them, too. You will see an example of this in a later lesson.

For example, consider the following XML schema:

XML Schema
<xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet1"
            xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
            elementFormDefault="qualified">
  <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
              schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
  <xsd:element name="Production.Product">
    <xsd:complexType>
      <xsd:attribute name="ProductID" type="sqltypes:int" use="required" />
      <xsd:attribute name="Name" use="required">
        <xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks].[dbo].[Name]">
          <xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
                           sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth">
            <xsd:maxLength value="50" />
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
      <xsd:attribute name="Size">
        <xsd:simpleType>
          <xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
                           sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth">
            <xsd:maxLength value="5" />
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
      <xsd:attribute name="Color">
        <xsd:simpleType>
          <xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
                           sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth">
            <xsd:maxLength value="15" />
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:attribute>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Creating an XML Schema Collection


An XML schema collection holds one or more schemas. The data that is being validated must match at
least one of the schemas within the collection.

You create an XML schema collection by using the CREATE XML SCHEMA COLLECTION syntax that is
shown in the following code snippet:

CREATE XML SCHEMA COLLECTION


CREATE XML SCHEMA COLLECTION SettingsSchemaCollection
AS
N'<?xml version="1.0" ?>
<xsd:schema
...
</xsd:schema>';

Altering Schema Collections


You can only modify a schema collection in a limited way using Transact-SQL. You can add new schema
components to an existing schema collection by using the ALTER XML SCHEMA COLLECTION statement.
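
Adding a component might look like the following sketch. The collection DemoSchemaCollection and the element names are invented for illustration, and the collection is created and dropped here only so that the example is self-contained.

```sql
-- Create a small collection that defines a single element.
CREATE XML SCHEMA COLLECTION DemoSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Setup" type="xsd:string"/>
</xsd:schema>';
GO

-- Add a second top-level element to the existing collection.
ALTER XML SCHEMA COLLECTION DemoSchemaCollection ADD
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Theme" type="xsd:string"/>
</xsd:schema>';
GO

-- Reconstruct the schema as SQL Server now stores it, then clean up.
SELECT XML_SCHEMA_NAMESPACE(N'dbo', N'DemoSchemaCollection') AS StoredSchema;
DROP XML SCHEMA COLLECTION DemoSchemaCollection;
```

Existing components cannot be changed or removed in this way; to do that, you must drop and re-create the collection.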

System Views
You can see the details of the existing XML schema collections by querying the
sys.xml_schema_collections system view. You can see the details of the namespaces that are referenced
by XML schema collections by querying the sys.xml_schema_namespaces system view. Like XML, XML
schema collections are not stored in the format that you use to enter them. They are shredded into an
internal format.
You can get an idea of how XML schema collections are stored by querying the
sys.xml_schema_components system view, as shown in the following code example:

Storage of XML Schema Collections


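SELECT sn.name AS [namespace], scp.*
FROM sys.xml_schema_components AS scp
JOIN sys.xml_schema_collections AS sc ON scp.xml_collection_id = sc.xml_collection_id
-- Match on the namespace as well, so each component is paired only with its own namespace.
JOIN sys.xml_schema_namespaces AS sn ON scp.xml_collection_id = sn.xml_collection_id
AND scp.xml_namespace_id = sn.xml_namespace_id;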

Untyped vs. Typed XML


When you are storing XML data, you can choose
to enable any XML to be stored, or you can
choose to constrain the available values by
associating the XML with an XML schema
collection.

Untyped XML
You may choose to store any well-formed XML.
One reason is that you might not have a schema
for the XML data. Another reason is that you
might want to avoid the processing overhead that
is involved in validating the XML against the XML
schema collection. For complex schemas,
validating the XML can involve substantial work.

The following example shows the creation of a table that has an untyped XML column:

Untyped XML
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml
);

You can store any well-formed XML in the WindowSettings column, up to the maximum size, which is
currently 2 GB.

Typed XML
You may want to have SQL Server validate your data against a schema. You might want to take advantage
of storage and query optimizations, based on the type information, or want to take advantage of this type
information during the compilation of your queries.

The following example shows the same table being created, but this time, it has a typed XML column:

Typed XML
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml (SettingsSchemaCollection)
);

In this case, a schema collection called SettingsSchemaCollection has been defined. SQL Server will not
allow data to be stored in the WindowSettings column if it does not meet the requirements of at least
one of the XML schemas in SettingsSchemaCollection.

CONTENT vs. DOCUMENT


While you are specifying typed XML, you can also
specify whether it is necessary to provide entire
XML documents or whether you can store XML
fragments.

Using the CONTENT Keyword


The following definition is equivalent to the previous
one, because the CONTENT keyword is the default
value for typed XML declarations.

CONTENT Keyword
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml (CONTENT SettingsSchemaCollection)
);

By default, SQL Server assumes that typed XML data may be a fragment rather than a complete document.
The preceding definition therefore has the same effect as the Typed XML example in the previous topic.

Using the DOCUMENT Keyword


The alternative to the default value of CONTENT is to specify the keyword DOCUMENT.

The DOCUMENT keyword is shown in the following code:

DOCUMENT Keyword
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml (DOCUMENT SettingsSchemaCollection)
);

In this case, XML fragments cannot be stored in the WindowSettings column. Only well-formed XML
documents with a single root element are allowed. For example, a column that is intended to store a
customer order can then be presumed to actually hold a customer order, and not some other type of XML
document.
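
The difference between the two keywords can be sketched with typed variables. The DemoOrderSchema collection and its Order element are hypothetical and exist only for this illustration; the final assignment is expected to raise a validation error rather than succeed.

```sql
-- Hypothetical schema collection, used only for this illustration.
CREATE XML SCHEMA COLLECTION DemoOrderSchema AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Order" type="xsd:string" />
  </xsd:schema>';
GO

-- CONTENT (the default): a fragment with two top-level elements is accepted.
DECLARE @fragment xml (CONTENT DemoOrderSchema) =
    N'<Order>First</Order><Order>Second</Order>';

-- DOCUMENT: exactly one top-level element is required.
DECLARE @document xml (DOCUMENT DemoOrderSchema) =
    N'<Order>Only one</Order>';

-- Expected to fail: two top-level elements form a fragment, not a document.
SET @document = N'<Order>First</Order><Order>Second</Order>';
GO

-- Clean up the illustration object.
DROP XML SCHEMA COLLECTION DemoOrderSchema;
```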

Demonstration: Working with Typed vs. Untyped XML


In this demonstration, you will see how to work with typed and untyped XML.

Demonstration Steps
Work with Typed and Untyped XML

1. Ensure that you have completed the previous demonstration.


2. In SSMS, in Solution Explorer, expand Queries, and then double-click Demonstration 2.sql.

3. Select the code under Step 1, and then click Execute.

4. Select the code under Step 2, and then click Execute to create an XML schema collection.

5. Select the code under Step 3, and then click Execute to create a table with a column that uses the
collection.
6. Select the code under Step 4, and then click Execute to try to insert malformed XML. Note that the
INSERT statement fails.

7. Select the code under Step 5, and then click Execute to try to insert well-formed XML that does not
conform to the schema. Note that this INSERT statement fails too.
8. Select the code under Step 6, and then click Execute to insert a single row fragment.

9. Select the code under Step 7, and then click Execute to insert a multirow fragment.

10. Select the code under Step 8, and then click Execute to view the added XML data in the table.
11. Leave SSMS open for the next demonstration.

Check Your Knowledge


Question

Which of the following is not true about the xml data type?

Select the correct answer.

It can store up to 2 GB of data.

It can be stored with, or without, an associated XSD.

It can only store well-formed XML documents.

The CONTENT keyword is the default value for typed XML.

It can be used for variables, parameters, and columns.

Lesson 3
Implementing the XML Data Type
Indexes on XML columns are critical for achieving the high performance of XML-based queries. There are
four types of XML index: a primary index and three types of secondary index. This lesson discusses how
you can use each of them to achieve the maximum performance gain for your queries.

Lesson Objectives
After completing this lesson, you will be able to:

• Describe the need for XML indexes.

• Explain how to use each of the four types of XML index.

What Are XML Indexes?


Indexes are used in SQL Server to improve the
performance of queries. XML indexes are used to
improve the performance of XQuery-based
queries. (XQuery will be discussed later in this
module.)

Many systems query XML data directly as text. This can be very slow, particularly if the XML data is
large. You saw earlier how XML data is not directly stored in a text format in SQL Server. For ease of
querying, it is broken into a form of object tree that makes it easier to navigate in memory.

Rather than having to create these object trees as required for queries, which is also a relatively slow
process, you can define XML indexes. An XML index is similar to a copy of an XML object tree that is
saved into the database for rapid reuse.

It is important to note that XML indexes can be quite large, compared to the underlying XML data.
Relational indexes are often much smaller than the tables on which they are built, but it is not uncommon
to see XML indexes that are larger than the underlying data.

You should also consider alternatives to XML indexes. Promoting a value that is stored within the XML to
a persisted computed column makes it possible to use a standard relational index to locate the value
quickly.
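
The promotion approach can be sketched as follows. Computed column definitions cannot call xml data type methods directly, so the value is extracted by a schema-bound scalar function. The function name, the column name, the index name, and the /WindowSettings/@Color path are all assumptions made for illustration; the real path depends on the shape of your XML and on any namespaces it uses.

```sql
-- Hypothetical helper: extracts one value from the XML.
-- The path (/WindowSettings/@Color) is an assumption about the document shape.
CREATE FUNCTION App.GetWindowColor (@settings xml)
RETURNS nvarchar(15)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @settings.value('(/WindowSettings/@Color)[1]', 'nvarchar(15)');
END;
GO

-- Promote the value to a persisted computed column...
ALTER TABLE App.Settings
ADD WindowColor AS App.GetWindowColor(WindowSettings) PERSISTED;
GO

-- ...and index it with an ordinary relational index.
CREATE INDEX IX_Settings_WindowColor ON App.Settings (WindowColor);
```

Queries that filter on WindowColor can then use the relational index instead of navigating the XML for each row.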

Types of XML Indexes


SQL Server supports four types of XML index: a
primary XML index and three types of secondary
XML index.

Primary XML Index


The primary XML index provides a persisted object
tree in an internal format. The tree has been
formed from the structure of the XML, is used to
speed up access to elements and attributes within
the XML, and avoids the need to read the entire
XML document for every query. Before you can
create a primary XML index on a table, the table
must have a clustered primary key. If a primary
index isn’t created, the database engine will need to create this object tree every time any query has to
use the xml data.

Based on the App.Settings table that was used as an example earlier, you could create a primary XML
index by executing the following code:

Primary XML Index


CREATE PRIMARY XML INDEX PXML_Settings_WindowSettings
ON App.Settings (WindowSettings);

Secondary XML Indexes


Most of the querying benefit comes from primary XML indexes, but SQL Server also enables the creation
of three types of secondary XML index, each designed to speed up a particular type of query: PATH,
VALUE, and PROPERTY.

• A PATH index helps to decide whether a particular path to an element or attribute is valid. It is
typically used with the exist() XQuery method. (XQuery is discussed later in this module.)

• A VALUE index helps to obtain the value of an element or attribute.

• A PROPERTY index is used when retrieving multiple values through PATH expressions.

You can only create a secondary XML index after a primary XML index has been established.

When you are creating the secondary XML index, you need to reference the primary XML index.

Secondary XML Indexes


CREATE XML INDEX IXML_Settings_WindowSettings_Path
ON App.Settings (WindowSettings)
USING XML INDEX PXML_Settings_WindowSettings FOR PATH;

CREATE XML INDEX IXML_Settings_WindowSettings_Value
ON App.Settings (WindowSettings)
USING XML INDEX PXML_Settings_WindowSettings FOR VALUE;

CREATE XML INDEX IXML_Settings_WindowSettings_Property
ON App.Settings (WindowSettings)
USING XML INDEX PXML_Settings_WindowSettings FOR PROPERTY;
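
To check which XML indexes exist on a table and how the secondary indexes relate to the primary one, you can query the sys.xml_indexes system view. The following is a minimal sketch that assumes the App.Settings table and the indexes created above.

```sql
-- List the XML indexes on App.Settings and how they relate to each other.
SELECT name,
       type_desc,            -- reported as XML for the rows in this view
       secondary_type_desc,  -- PATH, VALUE, or PROPERTY; NULL for the primary XML index
       using_xml_index_id    -- index_id of the primary XML index; NULL for the primary itself
FROM sys.xml_indexes
WHERE object_id = OBJECT_ID(N'App.Settings');
```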

XML Tooling Support


Note that you can create both primary and secondary XML indexes in SQL Server Management Studio
(SSMS), or by using Transact-SQL commands.

Demonstration: Implementing XML Indexes


In this demonstration, you will see how to implement XML indexes.

Demonstration Steps
1. Ensure that you have completed the previous demonstration.

2. In SSMS, in Solution Explorer, double-click Demonstration 3.sql.

3. Select the code under Step 1, and then click Execute.

4. Select the code under Step 2, and then click Execute to create a primary XML index.
5. Select the code under Step 3, and then click Execute to create a secondary VALUE index.

6. Select the code under Step 4, and then click Execute to query the sys.xml_indexes system view.

7. Select the code under Step 5, and then click Execute to drop and recreate the table without a
primary key.

8. Select the code under Step 6, and then click Execute to try to add the primary xml index again.
Note that this will fail.

9. Leave SSMS open for the next demonstration.

Categorize Activity
Categorize each statement against the appropriate index. Indicate your answer by writing the category
number to the right of each statement.

Items

1 Requires a clustered primary key to already exist on the table.

2 Requires a primary XML index to already exist.

3 Only one type.

4 Three different types.

5 Will provide the most performance improvement.

6 Could help improve XQueries on Values.

7 Avoids the need to parse the whole XML.

8 Could help improve XQueries on Properties.

9 Could help improve XQueries checking the existence of a node.

Category 1         Category 2

PRIMARY Index      SECONDARY Index

Lesson 4
Using the Transact-SQL FOR XML Statement
You have seen how SQL Server can store XML in its different formats. In this lesson, you will see how to
retrieve XML from data stored in traditional tables and rows.

We often need to return data as XML documents, even though it is stored in relational database columns.
Typically, this requirement relates to the exchange of data with other systems, including those from other
organizations. When you add the FOR XML clause to a traditional Transact-SQL SELECT statement, it
causes the output to be returned as XML instead of as a relational rowset.

SQL Server provides several modes for the FOR XML clause to enable the production of many styles of
XML document. You will be learning about each of these modes and their related options.

Lesson Objectives
After completing this lesson, you will be able to:
• Explain the role of the FOR XML clause.

• Use RAW mode queries.

• Use AUTO mode queries.

• Use EXPLICIT mode queries.

• Use PATH mode queries.

• Retrieve nested XML.

Introduction to the FOR XML Clause


You can use the FOR XML clause to enable the
Transact-SQL SELECT statement to return data in
an XML format. You can configure it to return the
attributes, elements, and/or schema that are
required by client applications.

The FOR XML clause can be used in one of four modes:

1. RAW mode generates a single element per row in the rowset that the SELECT statement returns.

2. AUTO mode generates nesting in the resulting XML, based on the way in which the SELECT statement
is specified. You have minimal control over the shape of the XML that is generated. If you need to
produce nested XML, AUTO mode is a better choice than RAW mode.

3. EXPLICIT mode gives you more control over the shape of the XML. You can use it when other modes
do not provide enough flexibility, but this is at the cost of greater complexity. In deciding the shape
of the XML, you can mix attributes and elements as you like.

4. PATH mode, together with the nested FOR XML query capability, provides much of the flexibility of
the EXPLICIT mode in a simpler manner.

Each of these modes is covered in more detail in the following topics.

For more information about FOR XML, see Microsoft Docs:

FOR XML (SQL Server)
https://aka.ms/A74x8t

Using RAW Mode Queries


RAW mode is the simplest mode to work with in
the FOR XML clause. It returns a simple XML
representation of the rowset and can optionally
specify a row element name and a root element.

Consider this simple Transact-SQL query:

Traditional SELECT Statement


SELECT TOP 5 FirstName, LastName,
PersonType
FROM Person.Person
ORDER BY FirstName, LastName;

FirstName   LastName    PersonType

A.          Leonetti    SC
A.          Wright      GC
A. Scott    Wright      EM
Aaron       Adams       IN
Aaron       Alexander   IN

Now consider the modified statement after adding the FOR XML clause:

Using RAW Mode in the FOR XML Clause


SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW;

<row FirstName="A." LastName="Leonetti" PersonType="SC" />
<row FirstName="A." LastName="Wright" PersonType="GC" />
<row FirstName="A. Scott" LastName="Wright" PersonType="EM" />
<row FirstName="Aaron" LastName="Adams" PersonType="IN" />
<row FirstName="Aaron" LastName="Alexander" PersonType="IN" />

Note that one XML <row> element is returned for each row from the rowset, the element has a generic
name of row, and all columns are returned as attributes. The returned order is based on the ORDER BY
clause.

You can override the default row element name in the XML and add a root element:

Making a Well-formed XML Document


SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW('Person'),
ROOT('People');

This generates the following results:

<People>
<Person FirstName="A." LastName="Leonetti" PersonType="SC" />
<Person FirstName="A." LastName="Wright" PersonType="GC" />
<Person FirstName="A. Scott" LastName="Wright" PersonType="EM" />
<Person FirstName="Aaron" LastName="Adams" PersonType="IN" />
<Person FirstName="Aaron" LastName="Alexander" PersonType="IN" />
</People>

Element-Centric XML
You will notice that, in the previous examples, the columns from the rowset have been returned as
attribute-centric XML. You can modify this behavior to produce element-centric XML by adding the
ELEMENTS keyword to the FOR XML clause.
You can see this in the following query:

Element-Centric XML
SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW('Person'),
ROOT('People'),
ELEMENTS;

When this query is executed, it returns the following output:

<People>
<Person>
<FirstName>A.</FirstName>
<LastName>Leonetti</LastName>
<PersonType>SC</PersonType>
</Person>
<Person>
<FirstName>A.</FirstName>
<LastName>Wright</LastName>
<PersonType>GC</PersonType>
</Person>
<Person>
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<PersonType>EM</PersonType>
</Person>
<Person>
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<PersonType>IN</PersonType>
</Person>
<Person>
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<PersonType>IN</PersonType>
</Person>
</People>

Note that all the columns have now been returned as elements.
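
By default, the ELEMENTS keyword omits any column whose value is NULL. The following minimal sketch shows the XSINIL option, which instead emits such columns as empty elements marked with xsi:nil="true". It assumes that the MiddleName column is nullable, as it is in the AdventureWorks sample database.

```sql
-- ELEMENTS XSINIL keeps NULL columns in the output as empty elements
-- carrying xsi:nil="true", instead of leaving them out.
SELECT TOP 5 FirstName, MiddleName, LastName
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW('Person'),
ROOT('People'),
ELEMENTS XSINIL;
```

This can be useful when the consumer of the XML expects every element to be present in every row.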

Adding an XML Schema


Another option that can be added to a RAW mode statement is the inclusion of an XML Schema:

Include XML Schema


SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW('Person'),
ROOT('People'),
ELEMENTS,
XMLSCHEMA('urn:schema_example.com');

The above Transact-SQL will result in the following XML being generated:

<People>
<xsd:schema targetNamespace="urn:schema_example.com"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
<xsd:element name="Person">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="FirstName">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
sqltypes:sqlSortId="52">
<xsd:maxLength value="50" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="LastName">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
sqltypes:sqlSortId="52">
<xsd:maxLength value="50" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="PersonType">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nchar" sqltypes:localeId="1033"
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
sqltypes:sqlSortId="52">
<xsd:maxLength value="2" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
<Person xmlns="urn:schema_example.com">
<FirstName>A.</FirstName>
<LastName>Leonetti</LastName>
<PersonType>SC</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>A.</FirstName>
<LastName>Wright</LastName>
<PersonType>GC</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<PersonType>EM</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<PersonType>IN</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<PersonType>IN</PersonType>
</Person>
</People>

Using Auto Mode Queries


By default, when using AUTO mode, each row in
the result set is represented as an XML element
that is named after the table (or alias) from which
it was selected. AUTO mode generates nesting in
the resulting XML, based on the way in which the
SELECT statement is specified. You have minimal
control over the shape of the XML that is
generated. AUTO mode queries are more capable
of dealing with nested XML, compared to the RAW
mode.

FOR XML AUTO


AUTO mode queries are useful if you want to
generate simple hierarchies, but they provide limited control of the resultant XML. If you need more
control over the resultant XML than AUTO mode queries provide, you need to consider using the PATH or
EXPLICIT modes instead. These are both covered in the next topics.

Let’s look at the previous SQL example using the AUTO mode:

Using AUTO Mode in the FOR XML Clause


SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML AUTO;

Each table in the FROM clause, from which at least one column is listed in the SELECT clause, is
represented as an XML element. The columns that are listed in the SELECT clause are mapped to
attributes. You can see the output of the previous query below:

<Person.Person FirstName="A." LastName="Leonetti" PersonType="SC" />
<Person.Person FirstName="A." LastName="Wright" PersonType="GC" />
<Person.Person FirstName="A. Scott" LastName="Wright" PersonType="EM" />
<Person.Person FirstName="Aaron" LastName="Adams" PersonType="IN" />
<Person.Person FirstName="Aaron" LastName="Alexander" PersonType="IN" />

Note how the name of the table is directly used as the element name.

For this reason, it is common to provide an alias for the table, as shown in the following code:

Using a Table Alias


SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person AS Person
ORDER BY FirstName, LastName
FOR XML AUTO;

This results in the following:

<Person FirstName="A." LastName="Leonetti" PersonType="SC" />
<Person FirstName="A." LastName="Wright" PersonType="GC" />
<Person FirstName="A. Scott" LastName="Wright" PersonType="EM" />
<Person FirstName="Aaron" LastName="Adams" PersonType="IN" />
<Person FirstName="Aaron" LastName="Alexander" PersonType="IN" />

To generate the well-formed XML that was produced in the previous topic, you can add the
ROOT('People') option to the FOR XML AUTO statement.

Creating Nested XML


The benefit of using AUTO mode rather than RAW mode is that SQL Server returns XML that is nested
according to heuristics that it applies to the SELECT statement.

Generating More Complex XML


SELECT TOP 3 FirstName, LastName, City
FROM Person.Person AS Employee
INNER JOIN Person.BusinessEntityAddress AS BA
ON BA.BusinessEntityID = Employee.BusinessEntityID
INNER JOIN Person.Address AS Address
ON BA.AddressID = Address.AddressID
ORDER BY FirstName, LastName
FOR XML AUTO, ROOT ('Employees'), ELEMENTS;

Executing the above will result in the following XML being produced:

<Employees>
<Employee>
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<Address>
<City>Newport Hills</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<Address>
<City>Downey</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<Address>
<City>Kirkland</City>
</Address>
</Employee>
</Employees>

Notice how the City value is nested inside the Address element, which in turn is nested inside the
Employee element.
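
In AUTO mode, the order of the columns in the SELECT list is one of the things that drives the nesting. As a minimal sketch, the variation below lists City first, so Address would be expected to become the outer element with Employee nested inside it. The Addresses root name is chosen only for this illustration.

```sql
-- Same joins as the previous query, but City is listed first, so Address is
-- expected to become the outer element and Employee the nested one.
SELECT TOP 3 City, FirstName, LastName
FROM Person.Person AS Employee
INNER JOIN Person.BusinessEntityAddress AS BA
ON BA.BusinessEntityID = Employee.BusinessEntityID
INNER JOIN Person.Address AS Address
ON BA.AddressID = Address.AddressID
ORDER BY City, FirstName, LastName
FOR XML AUTO, ROOT ('Addresses'), ELEMENTS;
```

Ordering by City first keeps rows for the same address together, which helps AUTO mode group employees under a shared Address element.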

Using Explicit Mode Queries


EXPLICIT mode gives you the greatest control over
the resulting XML, but at the price of query
complexity. Many common queries that required
EXPLICIT mode in earlier versions of SQL Server
can be implemented by using PATH mode. (PATH
mode was introduced in SQL Server 2005 and will
be explained in the next topic.)

FOR XML EXPLICIT


EXPLICIT mode queries define XML fragments as a
universal table, which consists of a column for
each piece of data that you require, and two
additional columns. The additional columns are
used to define the metadata for the XML fragment. The Tag column uniquely identifies the XML tag that
will be used to represent each row in the results, and the Parent column is used to control the nesting of
elements. Each row of data in the universal table represents an element in the resulting XML document.
The power of EXPLICIT mode is to mix attributes and elements at will, create wrappers and nested
complex properties, create space-separated values (for example, the OrderID attribute may have a list of
order ID values), and create mixed contents.

PATH mode, together with the nesting of FOR XML queries and the TYPE clause, gives enough power to
replace most of the EXPLICIT mode queries in a simpler, more maintainable way. EXPLICIT mode is rarely
needed now.

To produce XML in a similar format to the previous topic, the following SQL is required:

EXPLICIT Mode Example


SELECT 1 AS Tag,
NULL AS Parent,
Employee.FirstName AS [Employee!1!FirstName!ELEMENT],
Employee.LastName AS [Employee!1!LastName!ELEMENT],
NULL AS [Address!2!City!ELEMENT]
FROM Person.Person AS Employee
INNER JOIN Person.BusinessEntityAddress AS BA
ON BA.BusinessEntityID = Employee.BusinessEntityID
INNER JOIN Person.Address AS Address
ON BA.AddressID = Address.AddressID
UNION ALL
SELECT 2 AS Tag,
1 AS Parent,
Employee.FirstName,
Employee.LastName,
Address.City
FROM Person.Person AS Employee
INNER JOIN Person.BusinessEntityAddress AS BA
ON BA.BusinessEntityID = Employee.BusinessEntityID
INNER JOIN Person.Address AS Address
ON BA.AddressID = Address.AddressID
ORDER BY [Employee!1!FirstName!ELEMENT], [Employee!1!LastName!ELEMENT]
FOR XML EXPLICIT, ROOT('Employees');

Using Path Mode Queries


PATH mode provides a simpler way to mix
elements and attributes. You can use it in many
situations as an easier way to write queries than by
using EXPLICIT mode.

FOR XML PATH


PATH mode is a simpler way to introduce
additional nesting for representing complex
properties. In PATH mode, column names or
column aliases are treated as XML Path Language
(XPath) expressions. (More detail on XPath will be
provided later in this module.) These expressions
indicate how the values are being mapped to
XML. Each XPath expression is a relative XPath that provides the item type, such as the attribute, element,
and scalar value, and the name and hierarchy of the node that will be generated relative to the row
element.

You can use FOR XML EXPLICIT mode queries to construct such XML from a rowset, but PATH mode
provides a simpler alternative to the potentially time-consuming EXPLICIT mode queries.
You can use PATH mode, together with the ability to write nested FOR XML queries and the TYPE
directive to return xml data type instances, to write less complex queries. This gives enough power to
replace most of the EXPLICIT mode queries in a simpler, more maintainable way.

Here is the SQL to produce the XML in the previous topic:

PATH Mode Example


SELECT TOP 3 FirstName, LastName, City AS "Address/City"
FROM Person.Person AS Employee
INNER JOIN Person.BusinessEntityAddress AS BA
ON BA.BusinessEntityID = Employee.BusinessEntityID
INNER JOIN Person.Address AS Address
ON BA.AddressID = Address.AddressID
ORDER BY FirstName, LastName
FOR XML PATH ('Employee'), ROOT ('Employees'), ELEMENTS;

The XPath expressions can be used to control the structure of the XML. You can modify the default PATH
behavior by using the “at” (@) symbol to define attributes or the forward slash (/) to define the hierarchy.

Let’s add the address ID as an attribute in the previous example:

Using the @ Symbol


SELECT TOP 3 FirstName, LastName, Address.AddressID AS 'Address/@AddressID', City AS
"Address/City"
FROM Person.Person AS Employee
INNER JOIN Person.BusinessEntityAddress AS BA
ON BA.BusinessEntityID = Employee.BusinessEntityID
INNER JOIN Person.Address AS Address
ON BA.AddressID = Address.AddressID
ORDER BY FirstName, LastName
FOR XML PATH ('Employee'), ROOT ('Employees'), ELEMENTS;

This results in the following XML being produced:

<Employees>
<Employee>
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<Address AddressID="250">
<City>Newport Hills</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<Address AddressID="25953">
<City>Downey</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<Address AddressID="14543">
<City>Kirkland</City>
</Address>
</Employee>
</Employees>

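Note the use of the “/” to define the structure of the returned XML, and that both the “@” and “/” can be
combined in aliases. If the alias for Address.AddressID were simply '@AddressID', without the “Address/”
prefix, the AddressID attribute would be added to the Employee element instead. (In PATH mode, attribute
columns must be listed before element columns at the same level, so the column would also need to be
moved to the start of the select list.) For example: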

<Employee AddressID="250">
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<Address>
<City>Newport Hills</City>
</Address>
</Employee>
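
A query along the following lines would be expected to produce that shape. It is a sketch based on the earlier PATH mode example, with the attribute column placed first in the select list so that it precedes the element columns of the Employee row element.

```sql
-- '@AddressID' has no path prefix, so it becomes an attribute of the row
-- element (Employee). It is listed first because attribute columns must
-- come before element columns at the same level.
SELECT TOP 3 Address.AddressID AS '@AddressID',
FirstName,
LastName,
City AS 'Address/City'
FROM Person.Person AS Employee
INNER JOIN Person.BusinessEntityAddress AS BA
ON BA.BusinessEntityID = Employee.BusinessEntityID
INNER JOIN Person.Address AS Address
ON BA.AddressID = Address.AddressID
ORDER BY FirstName, LastName
FOR XML PATH ('Employee'), ROOT ('Employees');
```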

Retrieving Nested XML


You can use the TYPE keyword to return FOR XML
subqueries as xml data types, rather than as
nvarchar data types.

TYPE Keyword
In the previous topics in this lesson, you have seen
how FOR XML AUTO queries can return attribute-
centric or element-centric XML. If this data is
returned from a subquery, it needs to be returned
as a specific data type.

The FOR XML clause was introduced in SQL Server 2000. That version of SQL Server did not have an
xml data type. For that reason, subqueries that had FOR XML clauses had no way to return the xml data
type. FOR XML subqueries in SQL Server 2000 returned the nvarchar data type instead.

SQL Server 2005 introduced the xml data type, but for backward compatibility, the data type for return
values from FOR XML subqueries was not changed to xml. However, a new keyword, TYPE, was introduced
that changes the return data type of FOR XML subqueries to xml.

You can nest a FOR XML query inside another FOR XML query to build nested XML.

Combining AUTO with TYPE Nested in a PATH Query


SELECT TOP 3 BusinessEntityID AS '@ID', LoginID AS '@Login',
(SELECT FirstName, LastName
FROM Person.Person AS EmployeeName
WHERE EmployeeName.BusinessEntityID = Employee.BusinessEntityID
ORDER BY FirstName, LastName
FOR XML AUTO, TYPE, ELEMENTS)
FROM HumanResources.Employee
FOR XML PATH ('Employee'), ROOT ('Employees'), ELEMENTS;

You can use any combination of RAW, AUTO and PATH queries. The previous SQL example will produce
this output:

<Employees>
<Employee ID="225" Login="adventure-works\alan0">
<EmployeeName>
<FirstName>Alan</FirstName>
<LastName>Brewer</LastName>
</EmployeeName>
</Employee>
<Employee ID="193" Login="adventure-works\alejandro0">
<EmployeeName>
<FirstName>Alejandro</FirstName>
<LastName>McGuel</LastName>
</EmployeeName>
</Employee>
<Employee ID="163" Login="adventure-works\alex0">
<EmployeeName>
<FirstName>Alex</FirstName>
<LastName>Nayberg</LastName>
</EmployeeName>
</Employee>
</Employees>

Another type of nested XML query might be where XML is required in the results as an actual column
containing XML alongside non-XML columns.

XML and Non-XML Results


SELECT Customer.CustomerID, Customer.TerritoryID,
(SELECT SalesOrderID, [Status]
FROM Sales.SalesOrderHeader AS soh
WHERE Customer.CustomerID = soh.CustomerID
FOR XML AUTO, TYPE) as Orders
FROM Sales.Customer as Customer
WHERE EXISTS
(SELECT 1 FROM Sales.SalesOrderHeader AS soh
WHERE soh.CustomerID = Customer.CustomerID)
ORDER BY Customer.CustomerID;

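Executing the previous Transact-SQL returns a table of results, with the last column containing the
required XML data. Because the TYPE keyword is used, the column is returned as the xml data type, which
SSMS displays as a hyperlink; without TYPE, the value would be returned as nvarchar and would appear as
plain text.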

CustomerID  TerritoryID  Orders

11000       9            <soh SalesOrderID="43793" Status="5" /><soh SalesOrderID="51522" Status="5" /><soh SalesOrderID="57418" Status="5" />

11001       9            <soh SalesOrderID="43767" Status="5" /><soh SalesOrderID="51493" Status="5" /><soh SalesOrderID="72773" Status="5" />

11002       9            <soh SalesOrderID="43736" Status="5" /><soh SalesOrderID="51238" Status="5" /><soh SalesOrderID="53237" Status="5" />

Demonstration: FOR XML Queries


In this demonstration, you will see how to use FOR XML queries.

Demonstration Steps
1. Ensure that you have completed the previous demonstration.

2. In SSMS, in Solution Explorer, double-click Demonstration 4.sql.

3. Select the code under Step 1, and then click Execute.


4. Select the code under Step 2, click Execute to execute RAW mode queries, and then review the
results.

5. Select the code under Step 3, click Execute to execute an AUTO mode query, and then review the
results.

6. Select the code under Step 4, click Execute to execute an EXPLICIT mode query, and then review the
results.
7. Select the code under Step 5, click Execute to execute PATH mode queries, and then review the
results.

8. Select the code under Step 6, click Execute to execute a query using TYPE, and then review the
results.

9. Select the code under Step 7, click Execute to run the same query without using the TYPE keyword,
and then compare the results with those obtained in the previous step.

10. Leave SSMS open for the next demonstration.

Check Your Knowledge


Question

Which of the following is not true about RAW mode queries?

Select the correct answer.

By default, they produce XML fragments.

They produce nested XML based on SELECT heuristics.

A root node can be added with the ROOT option.

XSINIL will result in NULL columns being added to the results.

Lesson 5
Getting Started with XQuery
XQuery allows you to query XML data. Sometimes data is already in XML and you need to query it
directly. You might want to extract part of the XML into another XML document; you might want to
retrieve the value of an element or attribute; you might want to check whether an element or attribute
exists; and finally, you might want to directly modify the XML. XQuery methods make it possible to
perform these tasks.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain the role of XQuery.

 Use the query() method.

 Use the value() method.

 Use the exist() method.

 Use the modify() method.

What is XQuery?
XQuery is a query language that is designed to
query XML documents. It also includes elements of
other programming languages, such as looping
constructs.
XQuery was developed by a working group within
the World Wide Web Consortium. It was
developed in conjunction with other work in the
W3C, in particular, the definition of Extensible
Stylesheet Language Transformations (XSLT). XSLT
makes use of a subset of XQuery that is known as
XPath.

XPath is the syntax that is used to provide an address for specific attributes and elements within an XML
document. You saw basic examples of this when you were considering FOR XML PATH mode queries in the
previous lesson.
Look at the following XPath expression:

XPath Expression
/SalesHistory/Sale[@InvoiceNo=635]

This XPath expression starts at the SalesHistory node, which must be the root element because the
expression begins with a slash mark (/). It then steps to the Sale subelements (note that there may be
more than one of these) and filters them on the InvoiceNo attribute. Every Sale element whose InvoiceNo
attribute is equal to 635 is returned.

Although there is unlikely to be more than one invoice with the number 635, nothing about XML syntax
(without a schema) enforces this. One thing that can be hard to get used to with the XPath syntax is that
you constantly need to specify that you want the first entry of a particular type—even though logically
you may think that it should be obvious that there would only be one. You indicate the first entry in a list
by the expression [1].

To return the first sales record if there are more than one with an invoice number equal to 635:

Find the First Element in an XML Document


/SalesHistory/Sale[@InvoiceNo=635][1]

In XPath, you indicate attributes by using the “at” (@) prefix. The content of the element itself is referred
to by the token text().
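
The following sketch shows these path elements applied to a small untyped xml variable. The sample document, the Customer element, and the column aliases are assumptions made for illustration; the paths themselves are the ones discussed above.

```sql
-- Hypothetical sample document with two sales that share invoice number 635.
DECLARE @sales xml = N'
<SalesHistory>
  <Sale InvoiceNo="635"><Customer>Contoso</Customer></Sale>
  <Sale InvoiceNo="635"><Customer>Fabrikam</Customer></Sale>
  <Sale InvoiceNo="700"><Customer>Tailspin</Customer></Sale>
</SalesHistory>';

SELECT
    -- Every Sale element whose InvoiceNo attribute equals 635.
    @sales.query('/SalesHistory/Sale[@InvoiceNo=635]') AS AllMatches,
    -- Only the first of those matches.
    @sales.query('/SalesHistory/Sale[@InvoiceNo=635][1]') AS FirstMatch,
    -- The text content of the Customer element inside the first match.
    @sales.query('/SalesHistory/Sale[@InvoiceNo=635][1]/Customer/text()') AS FirstCustomerText;
```

With this sample data, AllMatches should contain two Sale elements, whereas FirstMatch should contain only the first one.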

FLWOR Expressions
In addition to basic path traversal, XQuery supports an iterative expression language that is known as
FLWOR and commonly pronounced “flower.” FLWOR stands for “for, let, where, order by, and return,” which
are the basic operations in a FLWOR query.

An example of a FLWOR expression is shown in the following XQuery query() method:

FLWOR Expression
SELECT @xmlDoc.query
('
<OrderedItems>
{
for $i in /InvoiceList/Invoice/Items/Item
return $i
}
</OrderedItems>
');

This query supplies OrderedItems as an element. Then, within that element, it locates all items on all
invoices that are contained in the XML document and displays them as subelements of the OrderedItems
element. An example of what the output may look like from this query is shown here:

<OrderedItems>
<Item Product="1" Price="1.99" Quantity="2" />
<Item Product="3" Price="2.49" Quantity="1" />
<Item Product="1" Price="1.99" Quantity="2" />
</OrderedItems>
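
The FLWOR example above uses only the for and return clauses. As a hedged sketch of the where and order by clauses, the following batch declares a hypothetical @xmlDoc variable (the invoice data is invented for illustration) and returns only the items with a quantity greater than 1, sorted by their Product attribute.

```sql
-- Hypothetical invoice list; the structure mirrors the path used in the FLWOR example.
DECLARE @xmlDoc xml = N'
<InvoiceList>
  <Invoice InvoiceNo="1000">
    <Items>
      <Item Product="3" Price="2.49" Quantity="4" />
      <Item Product="1" Price="1.99" Quantity="1" />
    </Items>
  </Invoice>
  <Invoice InvoiceNo="1001">
    <Items>
      <Item Product="2" Price="5.00" Quantity="2" />
    </Items>
  </Invoice>
</InvoiceList>';

SELECT @xmlDoc.query('
  <OrderedItems>
  {
    for $i in /InvoiceList/Invoice/Items/Item
    where $i/@Quantity > 1        (: keep only items ordered more than once :)
    order by $i/@Product          (: untyped attribute values sort as strings :)
    return $i
  }
  </OrderedItems>
');
```

Because the sample XML is untyped, the order by clause compares the Product values as strings; a numeric sort would need typed XML or an explicit cast.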

Note that becoming proficient at XQuery is an advanced topic that is beyond the scope of this course. The
aim of this lesson is to make you aware of what is possible when you are using XQuery methods. The
available XQuery methods are shown in the following table:

Method Description

query() This method returns untyped XML; the XML is selected by an XQuery expression.

value() This method returns a scalar value; it takes XQuery and a SQL Type as its
parameters.

exist() This method returns a bit value; 1 if a node is found to exist; 0 if a node isn’t found
for the specific XQuery expression.

modify() This method modifies the contents of an XML document based on the XML DML
expression.

nodes() This method can be used to shred XML into relational data.

Shredding XML is covered in more detail in the next lesson, as is the nodes() method.

Advantages of XQuery:

 It is easy to learn if you know SQL and XPath.

 When queries are written in XQuery, they require less code, compared to queries written in XSLT.
 XQuery can be used as a strongly typed language when the XML data is typed; this can improve the
performance of the query by avoiding implicit type casts and provide type assurances that can be
used when performing query optimization.

 XQuery can be used as a weakly typed language for untyped data to provide high usability. SQL
Server implements static type inferencing with support for both strong and weak type relationships.

 XQuery 3.0 became a W3C recommendation on April 8, 2014, and will be supported by major
database vendors. SQL Server currently supports a subset of XQuery 1.0.
For more information about XQuery, see Microsoft Docs:

XQuery Language Reference (SQL Server)
https://aka.ms/Bhjwdn

The query() Method


You can use the query() method to extract XML from an existing XML document. This document could be
stored in an XML variable or database column.

The XML that is generated can be a subset of the original XML document. Alternatively, you could
generate entirely new XML based on the values that are contained in the original XML document.

You can use the query() method to return untyped XML. The query() method takes an XQuery expression
that evaluates to a list of XML nodes and enables users to create output XML, based in some way on the
fragments that it extracts from the input XML.

An XQuery expression in SQL Server consists of two sections: a prolog and a body. The prolog can contain
a namespace declaration. You will see how to do this later in this module. The body of an XQuery
expression contains query expressions that define the result of the query. Both the input and output of a
query() method are XML.

Note that, if NULL is passed to a query() method, the result that the method returns is also NULL.
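
A minimal sketch of these three behaviors, using xml variables instead of a table column, might look like the following. The variable names and the sample document are assumptions for illustration, and the XML is untyped with no namespace.

```sql
-- Hypothetical untyped sample document and a NULL xml variable.
DECLARE @survey xml = N'
<StoreSurvey>
  <AnnualRevenue>80000</AnnualRevenue>
  <NumberEmployees>13</NumberEmployees>
</StoreSurvey>';
DECLARE @noSurvey xml = NULL;

SELECT
    -- A subset of the original document.
    @survey.query('/StoreSurvey/AnnualRevenue') AS Subset,
    -- Entirely new XML constructed from a value in the original document.
    @survey.query('<Summary><Staff>{ /StoreSurvey/NumberEmployees/text() }</Staff></Summary>') AS NewXml,
    -- query() on a NULL instance returns NULL.
    @noSurvey.query('/StoreSurvey/AnnualRevenue') AS FromNull;
```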

Example of a query() Method


A query to return the revenue and number of staff from the AdventureWorks example database:

query() Method
WITH XMLNAMESPACES ( DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey' )
SELECT TOP 3 BusinessEntityID,
    Demographics.query('(/StoreSurvey/AnnualRevenue)') AS Revenue,
    Demographics.query('(/StoreSurvey/NumberEmployees)') AS Staff
FROM Sales.Store;

This query tells SQL Server to return the business entity ID, the annual revenue, and the number of staff for
the first three rows that it reads from the Sales.Store table. Do not be too concerned with the namespace
declaration in this example; because the XML document in this column has a defined namespace, the
query() method needs to be made aware of it. Running the previous example will return rows in the
following format:

BusinessEntityID  Revenue  Staff

292  <p1:AnnualRevenue xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">80000</p1:AnnualRevenue>  <p1:NumberEmployees xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">13</p1:NumberEmployees>

294  <p1:AnnualRevenue xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">80000</p1:AnnualRevenue>  <p1:NumberEmployees xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">14</p1:NumberEmployees>

296  <p1:AnnualRevenue xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">80000</p1:AnnualRevenue>  <p1:NumberEmployees xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">15</p1:NumberEmployees>

The value() Method


The value() method is useful for extracting scalar values from XML documents as a relational value.
This method takes an XQuery expression that identifies a single node, and the desired SQL type to be
returned. The value of the XML node is returned cast to the specified SQL type.

Example of a value() method

Using the previous example as a starting point, the value() method can be used to return the contained
values as SQL types.

value() Method
WITH XMLNAMESPACES ( DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey' )
SELECT TOP 3 BusinessEntityID,
    Demographics.value('(/StoreSurvey/AnnualRevenue/text())[1]', 'decimal') AS Revenue,
    Demographics.value('(/StoreSurvey/NumberEmployees/text())[1]', 'int') AS Staff
FROM Sales.Store;

The previous Transact-SQL makes a few amendments to the XQuery expression. The main one is the addition
of [1], which ensures that only the first matching node is passed to the value() method. The value()
method takes a second parameter that specifies the SQL type to be returned. Some SQL types cannot be
returned: the xml data type, a common language runtime (CLR) user-defined type, image, text, ntext, and
sql_variant. In the previous example, the value() method returns a decimal value for the revenue and an
integer for the number of staff. The result of executing this code is:

BusinessEntityID Revenue Staff

292 80000 13

294 80000 14

296 80000 15
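
To experiment with value() without the AdventureWorks database, a sketch such as the following can be run against an untyped xml variable. The variable name, the sample document, and the Missing element are assumptions for illustration.

```sql
-- Hypothetical untyped sample document (no namespace, no schema).
DECLARE @survey xml = N'
<StoreSurvey>
  <AnnualRevenue>80000</AnnualRevenue>
  <NumberEmployees>13</NumberEmployees>
</StoreSurvey>';

SELECT
    -- [1] guarantees a single node; the text is cast to the requested SQL type.
    @survey.value('(/StoreSurvey/AnnualRevenue/text())[1]', 'decimal(18,2)') AS Revenue,
    @survey.value('(/StoreSurvey/NumberEmployees/text())[1]', 'int') AS Staff,
    -- A path that matches nothing yields NULL rather than an error.
    @survey.value('(/StoreSurvey/Missing/text())[1]', 'int') AS MissingNode;
```

On untyped XML, omitting [1] from these paths is expected to fail, because SQL Server cannot verify statically that the expression returns a single value.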

The exist() Method


Use the exist() method to check for the existence of a specified value. The exist() method enables the
user to perform checks on XML documents to determine whether the result of an XQuery expression is
empty or nonempty. The result of this method is:

    1, if the XQuery expression returns a nonempty result.

    0, if the result is empty.

    NULL, if the XML instance itself is NULL.

Whenever possible, use the exist() method on the xml data type instead of the value() method. The
exist() method will perform better and is most helpful when used in a SQL WHERE clause. The query will
utilize XML indexes more effectively than the value() method.
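
The three possible results can be seen in a short sketch against xml variables. The variable names and the sample document are assumptions for illustration.

```sql
-- Hypothetical untyped sample document and a NULL xml instance.
DECLARE @survey xml = N'<StoreSurvey><NumberEmployees>14</NumberEmployees></StoreSurvey>';
DECLARE @noSurvey xml = NULL;

SELECT
    @survey.exist('/StoreSurvey[NumberEmployees=14]') AS Has14Staff,   -- nonempty result: 1
    @survey.exist('/StoreSurvey[NumberEmployees=99]') AS Has99Staff,   -- empty result: 0
    @noSurvey.exist('/StoreSurvey') AS NullInstance;                   -- NULL instance: NULL
```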

Example of an exist() Method


Amend the Transact-SQL code to return only the stores that have a specific number of employees.

exist() Method
WITH XMLNAMESPACES ( DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey' )
SELECT TOP 3 BusinessEntityID,
    Demographics.value('(/StoreSurvey/AnnualRevenue)[1]', 'decimal') AS Revenue,
    Demographics.value('(/StoreSurvey/NumberEmployees)[1]', 'int') AS Staff
FROM Sales.Store
WHERE Demographics.exist('/StoreSurvey[NumberEmployees=14]') = 1;

The previous example returns the first three stores in the Sales.Store table that have exactly 14 members
of staff. The exist() method in the WHERE clause can take any valid XQuery expression. Running the above
will result in the following:

BusinessEntityID Revenue Staff

294 80000 14

344 80000 14

372 80000 14

The modify() Method


You can perform data manipulation operations on an XML instance by using the modify() method. The
modify() method changes the contents of an XML document.

You can use the modify() method to alter the content of an xml type variable or column. This method
takes an XML data manipulation language (DML) statement to insert, update, or delete nodes from the
XML data. For a column, you can only use the modify() method of the xml data type in the SET clause of
an UPDATE statement; for a variable, you use it in a SET statement. You can insert, delete, and update
one or more nodes by using the insert, delete, and replace value of keywords, respectively.

Note that, unlike the previous methods, an error is returned if NULL is passed to the modify() method.
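
As a sketch of the three XML DML keywords, the following batch applies modify() to an xml variable through SET statements. The variable name and the sample document are assumptions for illustration. Because the sample XML is untyped, the replace value of statement targets the text() node of the element; the AdventureWorks examples in this topic use typed XML and target the element directly.

```sql
-- Hypothetical untyped sample document.
DECLARE @survey xml = N'
<StoreSurvey>
  <BankName>International Bank</BankName>
  <NumberEmployees>14</NumberEmployees>
</StoreSurvey>';

-- insert: add a Comments element after NumberEmployees.
SET @survey.modify('
    insert <Comments>Problem with staff levels</Comments>
    after (/StoreSurvey/NumberEmployees)[1]');

-- replace value of: change the text of BankName.
SET @survey.modify('
    replace value of (/StoreSurvey/BankName/text())[1]
    with "High Value Stores Banking"');

-- delete: remove the Comments element again.
SET @survey.modify('delete /StoreSurvey/Comments');

SELECT @survey AS ModifiedSurvey;
```

After the three statements run, the document should contain the new bank name and no Comments element.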

Example of an Insert modify() Method


In the following example, all stores that have a staff level of 14 have an extra <Comments> element
added after NumberEmployees:

Insert modify() Method


WITH XMLNAMESPACES ( DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey' )
UPDATE Sales.Store
SET Demographics.modify('
    insert (<Comments>Problem with staff levels</Comments>)
    after (/StoreSurvey/NumberEmployees)[1]
')
WHERE Demographics.exist('/StoreSurvey[NumberEmployees=14]') = 1;

The previous example changes all the matching rows to have the following XML structure; note the added
Comments element at the bottom of the XML document:

<StoreSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">
<AnnualSales>800000</AnnualSales>
<AnnualRevenue>80000</AnnualRevenue>
<BankName>International Bank</BankName>
<BusinessType>BM</BusinessType>
<YearOpened>1991</YearOpened>
<Specialty>Touring</Specialty>
<SquareFeet>18000</SquareFeet>
<Brands>4+</Brands>
<Internet>T1</Internet>
<NumberEmployees>14</NumberEmployees>
<p1:Comments xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">Problem with staff levels</p1:Comments>
</StoreSurvey>

Example of a Delete modify() Method


This example Transact-SQL will delete the previously inserted comment elements from the XML column
Demographics:

Delete modify() Method


WITH XMLNAMESPACES ( DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey' )
UPDATE Sales.Store
SET Demographics.modify('
    delete (/StoreSurvey/Comments)[1]
')
WHERE Demographics.exist('/StoreSurvey[NumberEmployees=14]') = 1;

Example of a Replace modify() Method


In this example, the banking operation for any store where the number of employees is greater than 99
needs to be changed to a different bank.

Replace modify() Method


WITH XMLNAMESPACES ( DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey' )
UPDATE Sales.Store
SET Demographics.modify('
    replace value of (/StoreSurvey/BankName)[1]
    with "High Value Stores Banking"
')
WHERE Demographics.exist('/StoreSurvey[NumberEmployees>99]') = 1;

Demonstration: XQuery Methods in a DDL Trigger


In this demonstration, you will see how to use XQuery in DDL triggers.

Demonstration Steps
1. Ensure that you have completed the previous demonstration.
2. In SSMS, in Solution Explorer, double-click Demonstration 5.sql.

3. Select the code under Step 1, and then click Execute.

4. Select the code under Step 2, and then click Execute to create the trigger.

5. Select the code under Step 3, and then click Execute to test the trigger.

6. Select the code under Step 4, and then click Execute to drop the trigger.
7. Select the code under Step 5, and then click Execute to create a trigger to enforce naming
conventions.

8. Select the code under Step 6, and then click Execute to test the trigger. Note that the code to create
a stored procedure named sp_GetVersion fails, due to the trigger.
9. Select the code under Step 7, and then click Execute to create a trigger to enforce tables to have
primary keys.
10. Select the code under Step 8, and then click Execute to test the trigger. Note that the CREATE TABLE
statement will fail because there is no primary key defined.

11. Select the code under Step 9, and then click Execute to clean up the database.

12. Leave SSMS open for the next demonstration.



Categorize Activity
Categorize each item against its correct XQuery method. Indicate your answer by writing the method
number to the right of each item.

Items

1 Returns a scalar SQL Type.

2 Has insert, delete and replace value of as options.

3 Returns 1, 0 or NULL.

4 Need to specify [1] to select a single XML element.

5 Normally used in an UPDATE statement.

6 Should be used in preference to a value() method.

7 Can be used in the SELECT or WHERE clause.

8 Normally used in the WHERE clause.

Category 1        Category 2         Category 3

value() Method    modify() Method    exist() Method

Lesson 6
Shredding XML
Another scenario is the need to extract relational data from an XML document. For example, you might
receive a purchase order from a customer in XML format. You then parse the XML to retrieve the details
of the items that you need to supply.

The extraction of relational data from within XML documents is referred to as “shredding” the XML
documents. There are two ways to do this. SQL Server 2000 introduced the creation of an in-memory tree
that you could then query by using an OPENXML function. Although that is still supported, SQL Server
2005 introduced the XQuery nodes() method; in many cases, this will be an easier way to shred XML data.
In addition to covering these areas in this lesson, you will also see how Transact-SQL provides a way of
simplifying how namespaces are referred to in queries.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe how to shred XML data.


 Use system stored procedures for creating and managing in-memory node trees that have been
extracted from XML documents.

 Use the OPENXML function.

 Work with XML namespaces.

 Use the nodes() method.

Overview of Shredding XML Data


There are two approaches for shredding XML data.
The first is to query an in-memory tree that
represents the XML. You can use the
sp_xml_preparedocument system stored
procedure to create an in-memory node tree from
an XML document that will make querying the
XML data possible. You can then obtain relational
data from within an XML document.

Shredding XML with OPENXML


Steps for shredding XML with OPENXML could be:

1. Receive an XML document.

2. Call sp_xml_preparedocument to create an in-memory node tree, based on the input XML.

3. Use the OPENXML table-valued function to query the in-memory node tree and extract the relational
data.
4. Process the retrieved relational data with other relational data as part of standard Transact-SQL
queries.

5. Call sp_xml_removedocument to remove the node tree from memory.



Shredding XML with the nodes() Method


Using the nodes() method of the xml data type is the second approach to shredding. It is typically used
alongside other XQuery methods, such as value() and query(), to extract the required data.

For example:

nodes() Method
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume' AS ns)
SELECT candidate.JobCandidateID,
    Employer.value('ns:Emp.StartDate', 'date') AS StartDate,
    Employer.value('ns:Emp.EndDate', 'date') AS EndDate,
    Employer.value('ns:Emp.OrgName', 'nvarchar(4000)') AS CompanyName
FROM HumanResources.JobCandidate AS candidate
CROSS APPLY candidate.Resume.nodes('/ns:Resume/ns:Employment') AS Resume(Employer)
WHERE JobCandidateID IN (1,2,3);

The previous example processes the resumés of three candidates in the HumanResources.JobCandidate
table. It uses the nodes() method to select each ns:Employment XML element and obtain information
about the candidates' employers. Executing the previous example returns the following results:

JobCandidateID  StartDate   EndDate     CompanyName

1               2000-06-01  2002-09-30  Wingtip Toys

1               1996-11-15  2000-05-01  Blue Yonder Airlines

1               1994-06-10  1996-07-22  City Power and Light

2               1994-06-15  NULL        Wingtip Toys

3               1998-08-31  2002-12-28  Trey Research

3               1995-06-15  1998-08-01  Contoso Pharmaceuticals

3               1993-05-10  1995-06-01  Southridge Video

Both methods of shredding will be discussed further in the next topics.



Stored Procedures for Managing In-Memory Node Trees


Before you can use the OPENXML functionality to navigate XML documents, you must create an
in-memory node tree. This is done by using the sp_xml_preparedocument system stored procedure.

sp_xml_preparedocument
sp_xml_preparedocument is a system stored
procedure that takes XML either as the untyped
xml data type or as XML stored in the nvarchar
data type; creates an in-memory node tree from
the XML (to make it easier to navigate); and
returns a handle to that node tree.
sp_xml_preparedocument reads the XML text that was provided as input, parses the text by using the
Microsoft XML Core Services (MSXML) parser (Msxmlsql.dll), and provides the parsed document in a state
that is ready for consumption. This parsed document is a tree representation of the various nodes in the
XML document, such as elements, attributes, text, and comments.

Before you call sp_xml_preparedocument, you need to declare an integer variable to be passed as an
output parameter to the procedure call. When the call returns, the variable will then be holding a handle
to the node tree.

It is important to realize that the node tree must stay available and unmoved in visible memory because
the handle is basically a pointer that needs to remain valid. This means that, on 32-bit systems, the node
tree cannot be stored in Address Windowing Extensions (AWE) memory.

sp_xml_removedocument
sp_xml_removedocument is a system stored procedure that frees the memory that a node tree occupies
and invalidates the handle.

In SQL Server 2000, sp_xml_preparedocument created a node tree that was session-scoped; that is, the
node tree remained in memory until the session ended or until sp_xml_removedocument was called. A
common coding error was to forget to call sp_xml_removedocument. Leaving too many node trees in
memory was known to cause a severe shortage of available low-address memory on 32-bit systems.
Therefore, a change was made in SQL Server 2005 that made the node trees created by
sp_xml_preparedocument batch-scoped rather than session-scoped. Even though the tree will
be removed at the end of the batch, it is good practice to explicitly call sp_xml_removedocument to
minimize the use of low-address memory as much as possible.
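
One way to make sure the handle is always released is to pair the two procedures around a TRY...CATCH block. The following is a sketch under stated assumptions rather than a prescribed pattern: the sample document and variable names are invented for illustration, and THROW requires SQL Server 2012 or later.

```sql
DECLARE @hdoc int;   -- receives the node tree handle
DECLARE @doc nvarchar(max) = N'<Root><Row Id="1" /><Row Id="2" /></Root>';  -- hypothetical input

-- Parse the document and obtain a handle to the in-memory node tree.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc;

BEGIN TRY
    -- Use the handle while the node tree is in memory.
    SELECT Id
    FROM OPENXML(@hdoc, '/Root/Row', 1)
    WITH (Id int);
END TRY
BEGIN CATCH
    -- Release the node tree before re-raising the error.
    EXEC sp_xml_removedocument @hdoc;
    THROW;
END CATCH;

-- Normal path: release the node tree explicitly instead of waiting for cleanup.
EXEC sp_xml_removedocument @hdoc;
```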

Note that 64-bit systems generally do not have the same memory limitations as 32-bit systems.

OPENXML Function
The OPENXML function provides a rowset over in-memory XML documents, which is similar to a table or a
view. OPENXML gives access to the XML data as though it is a relational rowset. It does this by providing
a rowset view of the internal representation of an XML document.

After you have created an in-memory node tree of an XML document by using sp_xml_preparedocument,
you can use OPENXML to write queries against the document. For example, you might have to extract a
list of products that you need to supply to a customer from an XML-based order that the customer sent
to you. OPENXML provides a rowset view of the document, based on the parameters that are passed to it.

The parameters that are passed to OPENXML are: the XML document handle; a rowpattern, which is an
XPath expression that maps the nodes of XML data to rows; and a flag that indicates whether to use
attributes rather than elements by default. Associated with the OPENXML clause is a WITH clause that
provides a mapping between the rowset columns and the XML nodes.

Example Using the OPENXML Function


The following example uses OPENXML to shred data from a resumé:

OPENXML Function
DECLARE @xmldoc AS int, @xml AS xml;
SELECT @xml = Resume FROM HumanResources.JobCandidate WHERE JobCandidateID = 1;
EXEC sp_xml_preparedocument @xmldoc OUTPUT,
    @xml,
    '<root xmlns:ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume"/>';

SELECT * FROM OPENXML(@xmldoc, '/ns:Resume/ns:Employment', 2)
WITH (
    [ns:Emp.StartDate] DATETIME
  , [ns:Emp.EndDate] DATETIME
  , [ns:Emp.OrgName] NVARCHAR(1000)
);

EXEC sp_xml_removedocument @xmldoc;

In the above example, sp_xml_preparedocument is passed an @xml variable that contains a single XML
document, representing a resumé from the HumanResources.JobCandidate table, and returns the handle
@xmldoc, which is then passed to the OPENXML function. The XPath expression
“/ns:Resume/ns:Employment” selects the Employment nodes from the document. Finally, the optional flag
of 2 indicates to the OPENXML function that the columns in the WITH clause map to elements instead of
attributes. Executing the previous Transact-SQL produces these results:

ns:Emp.StartDate ns:Emp.EndDate ns:Emp.OrgName

2000-06-01 00:00:00.000 2002-09-30 00:00:00.000 Wingtip Toys

1996-11-15 00:00:00.000 2000-05-01 00:00:00.000 Blue Yonder Airlines

1994-06-10 00:00:00.000 1996-07-22 00:00:00.000 City Power and Light



The optional flag is a byte value, and therefore can be a combination of the following options:

Byte value  Description

0 Defaults to attribute-centric mapping. This is the default if no value is provided.

1 Use the attribute-centric mapping.

2 Use the element-centric mapping.

8 Can be combined (logical OR) with the previous values. In the context of retrieval, this
flag indicates that the consumed data should not be copied to the overflow property
@mp:xmltext.
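
The effect of the flag can be seen in a small sketch that shreds the same document with flag values 1 and 2. The order document, the handle variable, and the column names are assumptions for illustration; the last query also shows that an explicit column pattern in the WITH clause takes precedence over the flag for that column.

```sql
DECLARE @hdoc int;
-- Hypothetical order: ProductID is an attribute, Quantity is a child element.
DECLARE @order nvarchar(max) = N'
<Order OrderID="1">
  <Item ProductID="10"><Quantity>2</Quantity></Item>
  <Item ProductID="20"><Quantity>1</Quantity></Item>
</Order>';

EXEC sp_xml_preparedocument @hdoc OUTPUT, @order;

-- Flag 1 (attribute-centric): ProductID is populated, Quantity comes back as NULL.
SELECT * FROM OPENXML(@hdoc, '/Order/Item', 1)
WITH (ProductID int, Quantity int);

-- Flag 2 (element-centric): Quantity is populated, ProductID comes back as NULL.
SELECT * FROM OPENXML(@hdoc, '/Order/Item', 2)
WITH (ProductID int, Quantity int);

-- Explicit column patterns map each column individually, whatever the flag says.
SELECT * FROM OPENXML(@hdoc, '/Order/Item', 2)
WITH (OrderID   int '../@OrderID',
      ProductID int '@ProductID',
      Quantity  int 'Quantity');

EXEC sp_xml_removedocument @hdoc;
```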

Creating an Edge Table


The OPENXML function, along with shredding data, can also be used to create an edge table. An edge
table is a relational table view of the XML document. Each XML entity equates to a row in the edge table.
You can think of each row in the edge table as a node in the logical representation of the XML document.
Each row of the edge table includes a set of columns that describe the entity.

To create an edge table, simply call OPENXML without a WITH clause.

Create an Edge Table


DECLARE @xmldoc AS int, @xml AS xml;
SELECT @xml = Resume FROM HumanResources.JobCandidate WHERE JobCandidateID = 1;
EXEC sp_xml_preparedocument @xmldoc OUTPUT,
    @xml,
    '<root xmlns:ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume"/>';

SELECT * FROM OPENXML(@xmldoc, '/ns:Resume/ns:Employment', 2);

EXEC sp_xml_removedocument @xmldoc;

The first few rows returned from executing the preceding code are:

id  parentid  nodetype  localname      prefix  namespaceuri                                                            datatype  prev  text

10  0         1         Employment     ns      http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume  NULL      9     NULL

11  10        1         Emp.StartDate  ns      http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume  NULL      NULL  NULL

86  11        3         #text          NULL    NULL                                                                    NULL      NULL  2000-06-01Z

12  10        1         Emp.EndDate    ns      http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume  NULL      11    NULL

87  12        3         #text          NULL    NULL                                                                    NULL      NULL  2002-09-30Z

13  10        1         Emp.OrgName    ns      http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume  NULL      12    NULL

88  13        3         #text          NULL    NULL                                                                    NULL      NULL  Wingtip Toys

14  10        1         Emp.JobTitle   ns      http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume  NULL      13    NULL

89  14        3         #text          NULL    NULL                                                                    NULL      NULL  Lead Machinist

Working with XML Namespaces


Earlier in this module, you saw how an XML namespace is a collection of names that you can use as
element or attribute names in an XML document. The namespace qualifies names uniquely to avoid
naming conflicts with other elements that have the same name.

Namespaces can be specified in several ways

When used with sp_xml_preparedocument, the namespace is declared as the last parameter:

EXEC sp_xml_preparedocument @xmldoc OUTPUT, @xml,
    '<root xmlns:ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume"/>';

Namespaces can be used in one of two ways when working with the xml data methods and FOR XML
statements. The first way requires that they are repeated for every method call where they are required.
Taking a previous query() example and rewriting it:

SELECT TOP 3 BusinessEntityID,
    Demographics.query('
        declare default element namespace
            "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";
        /StoreSurvey/AnnualRevenue
    ') AS Revenue,
    Demographics.query('
        declare default element namespace
            "http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey";
        /StoreSurvey/NumberEmployees
    ') AS Staff
FROM Sales.Store;

The preferred method for referencing an XML namespace is to use the WITH XMLNAMESPACES clause. The
benefit is that the namespace only has to be declared once, at the beginning of the query. WITH
XMLNAMESPACES can be used for both FOR XML statements and xml data method calls. For example:

WITH XMLNAMESPACES (DEFAULT
    'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey')
SELECT TOP 3 BusinessEntityID,
    Demographics.query('(/StoreSurvey/AnnualRevenue)') AS Revenue,
    Demographics.query('(/StoreSurvey/NumberEmployees)') AS Staff
FROM Sales.Store;
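
To see why the declaration matters, the following sketch queries a namespaced xml variable three ways: with no declaration, with a DEFAULT namespace, and with a prefix. The namespace URI urn:example:survey, the prefix s, and the sample document are assumptions for illustration.

```sql
-- Hypothetical document whose elements all belong to one namespace.
DECLARE @doc xml = N'
<StoreSurvey xmlns="urn:example:survey">
  <NumberEmployees>14</NumberEmployees>
</StoreSurvey>';

-- No namespace declared: the path matches nothing, so value() returns NULL.
SELECT @doc.value('(/StoreSurvey/NumberEmployees/text())[1]', 'int') AS NoNamespace;

-- DEFAULT namespace: unprefixed element names in the path now match.
WITH XMLNAMESPACES (DEFAULT 'urn:example:survey')
SELECT @doc.value('(/StoreSurvey/NumberEmployees/text())[1]', 'int') AS DefaultNamespace;

-- Prefixed namespace: each element name in the path carries the prefix.
WITH XMLNAMESPACES ('urn:example:survey' AS s)
SELECT @doc.value('(/s:StoreSurvey/s:NumberEmployees/text())[1]', 'int') AS PrefixedNamespace;
```

The first query returns NULL, whereas the second and third both return 14.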

nodes() Method
The nodes() method provides an easier way to
shred XML into relational data than OPENXML and
its associated system stored procedures.

nodes() Method
The nodes() method is an XQuery method that is
useful when you want to shred an xml data type
instance into relational data. It is a table-valued
function that enables you to identify nodes that
will be mapped into a new relational data row.

Every xml data type instance has an implicitly provided context node. For the XML instance that is
stored in a column or a variable, this is the document node. The document node is the implicit node at
the top of every xml data type instance.
The result of the nodes() method is a rowset that contains logical copies of the original XML instances. In
these logical copies, the context node of every row instance is set to one of the nodes that is identified
with the query expression. This means that subsequent queries can navigate, relative to these context
nodes.
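
As a minimal sketch of how these context nodes behave, the following query shreds a small XML variable. The @Orders variable, its element and attribute names, and the aliases o and n are hypothetical and are used only for illustration:

```sql
-- Hypothetical sample document; it is not part of the AdventureWorks database.
DECLARE @Orders xml = N'
<Orders>
  <Order id="1"><Item>Bike</Item></Order>
  <Order id="2"><Item>Helmet</Item></Order>
</Orders>';

-- nodes() returns one row for each /Orders/Order element.
-- Each value() call is evaluated relative to the context node of its row.
SELECT o.n.value('@id', 'int') AS OrderID,
       o.n.value('(Item)[1]', 'nvarchar(50)') AS Item
FROM @Orders.nodes('/Orders/Order') AS o(n);
```

Because each row carries its own context node, the paths passed to value() are relative (@id and Item) rather than starting again from the document node.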

You should be careful about the query plans that are generated when you use the nodes() method. In
particular, no cardinality estimates are available when you use this method. This has the potential to lead
to poor query plans. In some cases, the cardinality is simply estimated to be a fixed value of 10,000 rows.
This might cause an inappropriate query plan to be generated if your XML document contains only a
handful of nodes.

CROSS APPLY and Table-Valued Functions


The nodes() method is a table-valued function that is normally called by using the CROSS APPLY or
OUTER APPLY operations.

APPLY operations cause table-valued functions to be called for each row in the left table of the query.

The following example searches for any telephone numbers that have been captured by support staff and
recorded as additional contact information:

CROSS APPLY and Table-Valued Functions


WITH XMLNAMESPACES (
'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes' AS act,
DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo'
)
SELECT person.BusinessEntityID,
person.LastName,
contact.value('act:number','nvarchar(4000)') AS ChangeInContactNumber
FROM Person.Person AS person
CROSS APPLY
person.AdditionalContactInfo.nodes('/AdditionalContactInfo/act:telephoneNumber') AS helpdesk(contact)
WHERE person.AdditionalContactInfo IS NOT NULL;

In this query, for every row in the Person.Person table, where the AdditionalContactInfo column isn’t
NULL, the nodes() method is called. When table-valued functions are used in queries like this, you must
provide an alias for both the derived table and the columns that it contains. In this case, the alias provided
to the derived table is helpdesk, and the alias provided to the extracted column is contact.
One output row is returned for each node that matches the XPath expression
/AdditionalContactInfo/act:telephoneNumber. From the returned XML column (contact), the
ChangeInContactNumber column is generated by calling the value() method. Executing the previous
query returns these results:

BusinessEntityID  LastName     ChangeInContactNumber
291               Achong       425-555-1112
293               Abel         206-555-2222
293               Abel         206-555-1234
295               Abercrombie  605-555-9877
303               Smith        206-555-2222
307               Adams        253-555-4689

Note that, in these results, two telephone numbers have been returned for Abel. Looking at the XML in
the AdditionalContactInfo column for that row, there are three “act:number” XML nodes:

 206-555-2222

 206-555-1234

 206-555-1244

The last number is not contained in the results because the XPath to it is /AdditionalContactInfo/act:pager
instead of /AdditionalContactInfo/act:telephoneNumber.

<AdditionalContactInfo
xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo"
xmlns:crm="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactRecord"
xmlns:act="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes">
These are additional phone and pager numbers for the customer.
<act:telephoneNumber><act:number>206-555-2222</act:number>
<act:SpecialInstructions>On weekends, contact the manager at this number.
</act:SpecialInstructions></act:telephoneNumber>
<act:telephoneNumber><act:number>206-555-1234</act:number> </act:telephoneNumber>
<act:pager><act:number>206-555-1244</act:number><act:SpecialInstructions>Do not page
between 9:00 a.m. and 5:00 p.m.</act:SpecialInstructions></act:pager>
Customer provided this additional home address…

Demonstration: Shredding XML


In this demonstration, you will see how to:

 Shred XML data by using the nodes() method.

 Shred XML using the OPENXML method.

Demonstration Steps
1. Ensure that you have completed the previous demonstration.

2. In SSMS, in Solution Explorer, double-click Demonstration 6.sql.


3. Select the code under Step 1, and then click Execute.

4. Select the code under Step 2, and then click Execute to select the contents of the dbo.DatabaseLog
table.

5. In the results pane, in the XmlEvent column, click the first entry to view the format of the XML. Note
that this is the EVENTDATA structure returned by the DDL and LOGON triggers.

6. Switch back to the Demonstration 6.sql pane, select the code under Step 4, and then click Execute.
Compare the first row in the results with the data shown in the XmlEvent1.xml pane.

7. Select the code under Step 5, and then click Execute.

8. Select the code under Step 6, and then click Execute to show the same results obtained by using
OPENXML.

9. Close SSMS without saving any changes.
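
For comparison with the nodes() method, the OPENXML pattern used in the final steps can be sketched as follows. This is an illustrative outline rather than the demonstration script itself; the @doc document, its attribute names, and the @handle variable are hypothetical:

```sql
-- Hypothetical sample document; the demonstration script uses different data.
DECLARE @doc nvarchar(max) = N'
<Orders>
  <Order id="1" item="Bike"/>
  <Order id="2" item="Helmet"/>
</Orders>';
DECLARE @handle int;

-- Build the in-memory node tree and obtain a handle to it.
EXEC sp_xml_preparedocument @handle OUTPUT, @doc;

-- Shred one row per /Orders/Order element (flag 1 = attribute-centric mapping).
SELECT OrderID, Item
FROM OPENXML(@handle, '/Orders/Order', 1)
WITH (OrderID int '@id',
      Item nvarchar(50) '@item');

-- Release the memory that the node tree is using.
EXEC sp_xml_removedocument @handle;
```

The prepare and remove calls are the extra steps that the nodes() method avoids.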



Check Your Knowledge


Question

Which of the following statements about the OPENXML statement is false?

Select the correct answer.

It can only be used to shred XML.

You must call the sp_xml_preparedocument to create an in-memory node tree before using it.

You should call the sp_xml_removedocument after using it.

In most situations, it will perform worse than the xml nodes() method.

Lab: Storing and Querying XML Data in SQL Server


Scenario
A new developer in your organization has discovered that SQL Server can store XML directly. He is keen to
use this mechanism extensively. In this lab, you will decide on the appropriate usage of XML in the
documented application.

You also have an upcoming project that will require the use of XML data in SQL Server. No members of
your current team have experience working with XML data in SQL Server. You need to learn how to
process XML data within SQL Server and you have been given some sample queries to assist with this
learning. Finally, you will use what you have learned to write a stored procedure for the marketing system
that returns XML data.

Objectives
After completing this lab, you will be able to:

 Determine appropriate use cases for storing XML in SQL Server.


 Test XML storage in variables.

 Retrieve information about XML schema collections.

 Query SQL Server data as XML.

 Write a stored procedure that returns XML.

Estimated Time: 45 minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student

Password: Pa55w.rd

Exercise 1: Determining When to Use XML


Scenario
In this exercise, you should read the list of scenarios that your new developer has provided. Determine
which are appropriate for XML storage in SQL Server, and which are not. Write “Yes” or “No” next to each
scenario.

Scenarios

Scenario Requirements

Existing XML data that is stored, but not processed.

Storing attributes for a customer.

Relational data that is being passed through a system, but not processed in it.

Storing attributes that are nested (that is, attributes stored within attributes).

The main tasks for this exercise are as follows:

1. Prepare the Lab Environment

2. Review the List of Scenario Requirements



 Task 1: Prepare the Lab Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run Setup.cmd in the D:\Labfiles\Lab14\Starter folder as Administrator.

 Task 2: Review the List of Scenario Requirements


1. Review the list of requirements.

2. For each requirement, determine whether it is suitable for XML storage.

Results: After completing this exercise, you will have determined the appropriate use cases for XML
storage.

Exercise 2: Testing XML Data Storage in Variables


Scenario
In this exercise, you will explore how XML data is stored in variables. From a set of sample XML queries,
you will review the effect of executing each query.

The main tasks for this exercise are as follows:

1. Review, Execute, and Review the Results of the XML Queries

 Task 1: Review, Execute, and Review the Results of the XML Queries
1. Open SQL Server Management Studio and connect to the MIA-SQL instance of SQL Server using
Windows authentication.

2. Open D:\Labfiles\Lab14\Starter\InvestigateStorage.sql.
3. For each query in the file, review the code, execute the code, and determine how the output results
relate to the queries.

4. Close InvestigateStorage.sql without saving any changes.

5. Leave SSMS open for the next exercise.

Results: After this exercise, you will have seen how XML data is stored in variables.

Exercise 3: Using XML Schemas


Scenario
For some of the XML processing that you will perform in your upcoming project, you need to validate
XML data by using XML schemas. In SQL Server, XML schemas are stored in XML schema collections. You
need to investigate how these schemas are used. You have been given a set of sample queries to assist
with this. In this exercise, you will review the effect of executing these queries.

The main tasks for this exercise are as follows:

1. Review, Execute, and Review the Results of the XML Queries

 Task 1: Review, Execute, and Review the Results of the XML Queries
1. Open D:\Labfiles\Lab14\Starter\XMLSchema.sql.

2. For each query in the file, review the code, execute the code, and determine how the output results
relate to the queries.

3. Leave SSMS open for the next exercise.

Results: After this exercise, you will have seen how to create XML schema collections.

Exercise 4: Using FOR XML Queries


Scenario
For some of the XML processing that you must perform in your upcoming project, you will need to return
XML data. You should investigate how to do this. You have been given a set of sample queries to assist
with this. In this exercise, you will review the effect of executing these queries.
The main tasks for this exercise are as follows:

1. Review, Execute, and Review the Results of the FOR XML Queries

 Task 1: Review, Execute, and Review the Results of the FOR XML Queries
1. Open D:\Labfiles\Lab14\Starter\XMLQuery.sql.
2. For each query in the file, review the code, execute the code, and determine how the output results
relate to the queries.

3. Leave SSMS open for the next exercise.

Results: After this exercise, you will have seen how to use FOR XML.

Exercise 5: Creating a Stored Procedure to Return XML


Scenario
A new web service is being added to the marketing system. In this exercise, you need to create a stored
procedure that will query data from a table and return it as an XML value.

Supporting Documentation

Stored Procedure Specifications


Stored Procedure: Production.GetAvailableModelsAsXML

Input Parameters: None.

Output Parameters: None.

Returned Rows: One XML document with attribute-centric XML.
Root element is AvailableModels.
Row element is AvailableModel.
Row contains ProductID, ProductName, ListPrice, Color and SellStartDate (from
Marketing.Product) and ProductModelID and ProductModel (from
Marketing.ProductModel) for rows where there is a SellStartDate but not yet a
SellEndDate.

Output Order: Rows within the XML should be in order of SellStartDate ascending, and then
ProductName ascending. That is, sort by SellStartDate first, and then ProductName
within SellStartDate.

Stored Procedure: Sales.UpdateSalesTerritoriesByXML

Input Parameters: @SalespersonMods xml.

Output Parameters: None.

Returned Rows: None.

Actions: Update the SalesTerritoryID column in the Sales.Salesperson table, based on the
SalesTerritoryID values extracted from the input parameter.

Incoming XML Object Format


This is an example of the incoming XML:


Incoming XML Object Format


<SalespersonMods>
<SalespersonMod BusinessEntityID="274">
<Mods>
<Mod SalesTerritoryID="3"/>
</Mods>
</SalespersonMod>
<SalespersonMod BusinessEntityID="278">
<Mods>
<Mod SalesTerritoryID="4"/>
</Mods>
</SalespersonMod>
</SalespersonMods>

The main tasks for this exercise are as follows:

1. Review the Requirements


2. Create a Stored Procedure to Retrieve Available Models

3. Test the Stored Procedure

4. If Time Permits: Create a Stored Procedure to Update the Sales Territories Table

 Task 1: Review the Requirements


 Review the stored procedure specification for Production.GetAvailableModelsAsXML in the exercise
scenario.

 Task 2: Create a Stored Procedure to Retrieve Available Models


 Create and implement the Production.GetAvailableModelsAsXML stored procedure based on the
specifications that are provided.

 Task 3: Test the Stored Procedure


1. Test the stored procedure by executing it.

2. Review the returned data.

 Task 4: If Time Permits: Create a Stored Procedure to Update the Sales Territories
Table
1. If time permits, implement the Sales.UpdateSalesTerritoriesByXML stored procedure.

2. Test the created stored procedure with the example incoming XML.

3. Close SSMS without saving any changes.

Results: After this exercise, you will have a new stored procedure that returns XML in the AdventureWorks
database.

Module Review and Takeaways


Best Practice: This module has considered a variety of different aspects of using XML data
within SQL Server, including storing XML data, querying XML data, performance and XML
indexes, and shredding XML data.

Review Question(s)
Question: Which XML query mode did you use for implementing the
Production.GetAvailableModelsAsXML stored procedure?

Module 15
Storing and Querying Spatial Data in SQL Server
Contents:
Module Overview
Lesson 1: Introduction to Spatial Data
Lesson 2: Working with SQL Server Spatial Data Types
Lesson 3: Using Spatial Data in Applications
Lab: Working with SQL Server Spatial Data
Module Review and Takeaways

Module Overview
This module describes spatial data and how this data can be implemented within SQL Server®.

Objectives
After completing this module, you will be able to:

 Describe how spatial data can be stored in SQL Server.

 Use basic methods of the GEOMETRY and GEOGRAPHY data types.


 Query databases containing spatial data.

Lesson 1
Introduction to Spatial Data
Many business applications work with addresses or locations, so it is helpful to understand the different
spatial data types, and where they are typically used. SQL Server can process both planar and geodetic
data. In this lesson, we will also consider how the SQL Server data types relate to industry standard
measurement systems.

Lesson Objectives
After completing this lesson, you will be able to:

 Explain how spatial data is useful in a wide variety of business applications.

 Describe the different types of spatial data.

 Describe the difference between planar and geodetic data types.

 Explain the relationship between the spatial data support in SQL Server and the industry standards.

 Work with spatial reference identifiers to provide measurement systems.

Target Applications
There is a perception that spatial data is not useful
in mainstream applications. However, this
perception is invalid: most business applications
can benefit from the use of spatial data.

Business Applications
Although mapping provides an interesting
visualization in some cases, business applications
can make good use of spatial data for more
routine tasks. Almost all business applications
involve the storage of addresses or locations.
Customers or clients have street addresses, mailing
addresses, and delivery addresses. The same is true
for stores, offices, suppliers, and many other business-related entities.

Business Intelligence Applications


Business intelligence applications make particularly strong use of spatial data. These applications often
deal with results that are best visualized rather than being presented as tables of numbers. Spatial
capabilities make it possible to provide very rich forms of visualization.

Common Business Questions


Consider a pet accessories supply company that has stores all over the country. They know where their
stores are, and they know where their customers live. The owner suspects that the company’s customers
are not buying from their nearest store, but has no firm facts to back this up.

It could be true that customers really do purchase from their local store and the owner was misled by a
small sample of data. Or perhaps customers really do travel to a store other than their local branch
because the products they require are not stocked locally.

These sorts of questions are normal in most businesses, and you can answer them quite easily if you
process spatial data in a database.
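
As a rough sketch of how such a question could be answered, the following query finds the store that is nearest to one customer. The @Stores table variable, the store names, and all coordinates are hypothetical; a real solution would read locations from the company's own tables:

```sql
-- Hypothetical stores; geography::Point takes latitude, longitude, and SRID.
DECLARE @Stores TABLE (StoreName nvarchar(50), Location geography);

INSERT INTO @Stores (StoreName, Location)
VALUES (N'North Store', geography::Point(47.70, -122.33, 4326)),
       (N'South Store', geography::Point(47.50, -122.33, 4326));

-- Hypothetical customer location.
DECLARE @Customer geography = geography::Point(47.68, -122.30, 4326);

-- STDistance returns meters for SRID 4326, so the nearest store sorts first.
SELECT TOP (1) StoreName,
       Location.STDistance(@Customer) AS DistanceInMeters
FROM @Stores
ORDER BY DistanceInMeters;
```

Comparing the nearest store with the store where each customer actually buys would then show whether the owner's suspicion is correct.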

Types of Spatial Data

SQL Server works with vector-based, two-dimensional (2-D) data, but has some storage
options for three-dimensional (3-D) values.

Vector vs. Raster Data


You can store spatial data either as a series of line
segments that together form an overall shape
(vector storage) or as a series of dots or pixels that
are formed by dividing a shape into smaller pieces
(raster storage).
Vector storage is the method on which spatial
data in SQL Server is based. One advantage of vector-based storage is the way that it can scale. Imagine
storing the details of a line. You could divide the line into a series of dots that make up the line. However,
if you then zoomed in to an image of the line, the individual dots would become visible, along with the
gaps between the dots. This is how raster-based storage works. Alternatively, if the line was stored as the
coordinates of the start and end points of the line, it would not matter how much you zoomed in or out,
the line would still look complete. This is because it would be redrawn at each level of magnification. This
is how vector-based storage works.

To store raster data in SQL Server, you could use the varbinary data type, although you would not be able
to directly process the data.

2-D, 3-D, and 4-D


You are probably familiar with seeing two-dimensional drawings or maps on paper. A third dimension
would represent the elevation of a point on the map. Four-dimensional (4-D) systems usually incorporate
changes in a shape over time.

Spatial data in SQL Server is based on 2-D technology. In some of the objects and properties that it
provides, spatial data in SQL Server supports the storage and retrieval of 3-D and 4-D values, but it is
important to realize that the third and fourth dimensions are ignored during calculations. This means that
if you calculate the distance between, say, a point and a building, the calculated distance is the same,
regardless of the floor or level in the building where the point is located.
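
The following sketch illustrates this behavior with two hypothetical points that differ in elevation. The variable names and coordinates are assumptions made for the example:

```sql
-- Two hypothetical points: the third coordinate is the Z (elevation) value.
DECLARE @GroundFloor geometry = geometry::STGeomFromText('POINT (0 0 0)', 0);
DECLARE @TenthFloor geometry = geometry::STGeomFromText('POINT (3 4 30)', 0);

-- Only X and Y take part in the calculation, so the result is the
-- planar distance of 5 and the 30-unit difference in elevation is ignored.
SELECT @GroundFloor.STDistance(@TenthFloor) AS Distance;
```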
For more information about the various types of spatial data, see Microsoft Docs:

Spatial Data Types Overview
http://aka.ms/irtf1a

Planar vs. Geodetic


Planar systems represent the Earth as a flat surface.
Geodetic systems represent the Earth more like its
actual shape.

Planar Systems
Before the advent of computer systems, it was very
difficult to perform calculations on round models
of the Earth. For convenience, mapping tended to
be two-dimensional. Most people are familiar with
traditional flat maps of the world.

However, as soon as larger distances are involved, flat maps introduce significant distortion,
particularly as you move away from the center of the map. When most of the standard maps from atlases were
first drawn, they were oriented around where the people who were drawing the maps lived. That meant
that the least distortion occurred where the people who were using the maps were based.

Geodetic Systems
Geodetic systems represent the Earth as a round shape. Some systems use simple spheres, but in fact the
Earth is not spherical. Spatial data in SQL Server offers several systems for representing the shape of the
Earth. Most systems model the Earth as an ellipsoid rather than as a sphere.

OGC Object Hierarchy


The Open Geospatial Consortium (OGC) is the
industry body that provides specifications for how
processing of spatial data should occur in systems
that are based on Structured Query Language
(SQL).

SQL Specification
One of the two data types that SQL Server
provides is the geometry data type. It conforms to
the OGC Simple Features for SQL Specification
version 1.1.0, and is used for planar spatial data. In
addition to defining how to store the data, the
specification details common properties and
methods to be applied to the data.

The OGC defines a series of data types that form an object tree. Curved arc support was added in SQL
Server 2012.

Extensions
SQL Server also extends the standards in several ways—it provides a round-earth data type called
geography, along with several additional useful properties and methods.

Methods and properties that are related to the OGC standard have been defined by using an ST prefix
(such as STDistance). Those without an ST prefix are Microsoft® extensions to the standard (such as
MakeValid).

Spatial Reference Identifiers


SQL Server supports many measurement systems
directly. When you specify a spatial data type in
SQL Server, you also specify the measurement
system used. You do this by associating a spatial
reference ID with the data. A spatial reference ID
of zero indicates the lack of a measurement
system, and is common when there is no need for
a specific measurement system.

Spatial Reference Systems


Any model of the Earth is an approximation, but
some models are closer to reality than others. SQL
Server supports many different Earth models by
using a series of spatial reference identifiers (SRIDs). Each SRID defines the shape of the Earth model, the
authority that is responsible for maintaining it, the unit of measure that is used, and a multiplier that
determines how to convert the unit of measurement to meters.
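
These properties can be inspected in the sys.spatial_reference_systems catalog view. The following query is a minimal sketch that returns them for a single SRID; the choice of 4326 is only an example:

```sql
-- Show the Earth model, maintaining authority, unit of measure, and
-- conversion factor to meters for one spatial reference system.
SELECT spatial_reference_id,
       authority_name,
       authorized_spatial_reference_id,
       well_known_text,
       unit_of_measure,
       unit_conversion_factor
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326;
```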

SRID 4326
The World Geodetic System (WGS) is commonly used in cartography, geodetics, and navigation. The latest
standard is WGS 1984 (WGS 84) and is best known to most people through the Global Positioning System
(GPS). GPS is often used in navigation systems and uses WGS 84 as its coordinate system.

In spatial data in SQL Server, SRID 4326 provides support for WGS 84.

If you query the list of SRIDs in SQL Server, the entry for SRID 4326 has the following name. This is
formally called the Well-Known Text (WKT) that is associated with the ID:


WGS 84
GEOGCS["WGS 84", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84", 6378137,
298.257223563]], PRIMEM["Greenwich", 0], UNIT["Degree", 0.0174532925199433]]

WGS 84 models the Earth as an ellipsoid (you can imagine it as a squashed sphere), with its major radius
of 6,378,137 meters at the equator, a flattening of 1/298.257223563 (or about 21 kilometers) at the poles,
a prime meridian (that is, a starting point for measurement) at Greenwich, and a measurement that is
based on degrees. The starting point at Greenwich is specifically based at the Royal Observatory. The units
are shown as degrees and the size of a degree is specified in the final value in the definition. Most
geographic data today would be represented by SRID 4326.
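
The figure of about 21 kilometers follows from the two ELLIPSOID values in the WKT definition. As a worked check, the arithmetic can be run directly in Transact-SQL (the column aliases are chosen here only for readability):

```sql
-- Major radius (meters) and inverse flattening, as listed in the WKT definition.
-- The major radius divided by the inverse flattening gives the difference
-- between the equatorial and polar radii: roughly 21,385 meters.
SELECT 6378137.0 / 298.257223563 AS FlatteningInMeters,
       6378137.0 - (6378137.0 / 298.257223563) AS PolarRadiusInMeters;
```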

For more information about spatial reference identifiers, see Microsoft Docs:
Spatial Reference Identifiers (SRIDs)
http://aka.ms/bd0j1v

Demonstration: Spatial Reference Systems


In this demonstration, you will see how to:

 View the available spatial reference systems.

Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running, and then log
on to 20762C-MIA-SQL as AdventureWorks\Student with the password Pa55w.rd.

2. Run D:\Demofiles\Mod15\Setup.cmd as an administrator.

3. In the User Account Control dialog box, click Yes. When the script completes, press any key.

4. Start SQL Server Management Studio, and connect to the MIA-SQL instance using Windows
authentication.
5. On the File menu, point to Open, and then click Project/Solution.

6. In the Open Project dialog box, navigate to D:\Demofiles\Mod15, click 20762_15.ssmssln, and
then click Open.
7. In Solution Explorer, expand Queries, and then double-click the 11 - Demonstration 1A.sql script.

8. Highlight the Transact-SQL under the comment Step 1 - Switch to the tempdb database, and click
Execute.
9. Highlight the Transact-SQL under the comment Step 2 - Query the sys.spatial_reference_systems
system view, and click Execute.

10. Highlight the Transact-SQL under the comment Step 3 - Drill into the value for srid 4326, and click
Execute.

11. Highlight the Transact-SQL under the comment Step 4 - Query the available measurement
systems, and click Execute.

12. On the File menu, click Close.

Check Your Knowledge


Question

Which existing SQL Server data type could you use to store, but not directly
process, raster data?

Select the correct answer.

varchar

varbinary

int

string

Lesson 2
Working with SQL Server Spatial Data Types
SQL Server supports two spatial data types, geometry and geography. They are both system common
language runtime (CLR) data types. This lesson introduces each of these data types, and shows how to
interchange data by using industry-standard formats.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe the support that spatial data in SQL Server provides.

 Explain how system CLR types differ from user defined CLR types.

 Use the geometry data type.

 Use the geography data type.

 Work with standard spatial data formats.


 Use OGC methods and properties on spatial data.

 Use Microsoft extensions to the OGC standard when working with spatial data.

SQL Server Spatial Data


SQL Server provides rich support for spatial data. It
provides two data types: the geometry data type,
which is suited to flat-earth (planar) models, and
the geography data type, which is suited to
round-earth (geodetic) models.

geometry Data Type


The geometry data type is the SQL Server
implementation of the OGC Geometry data type.
It supports most of the methods and properties of
the OGC type plus extensions to the OGC type.
You use the geometry data type when you are
modeling flat-earth models such as two-
dimensional diagrams. The geometry data type offers a coordinate system based on X and Y.

geography Data Type


The geography data type is a Microsoft extension to the OGC standard that is suitable when you are
working with round-earth models such as GPS data. The geography data type works with a coordinate
system based on longitude and latitude.

Note: Although “latitude and longitude” is a commonly used phrase, the
geographical community uses the terminology in the reverse order. When you specify geography
values as Well-Known Text in SQL Server, the longitude value precedes the latitude value.
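
The ordering can be checked with a short sketch. The coordinates and variable names here are hypothetical. Note that the geography::Point method, unlike Well-Known Text input, takes the latitude first:

```sql
-- Well-Known Text input: longitude first, then latitude.
DECLARE @FromText geography =
    geography::STGeomFromText('POINT (-122.349 47.651)', 4326);

-- geography::Point takes latitude first, then longitude, then the SRID.
DECLARE @FromPoint geography = geography::Point(47.651, -122.349, 4326);

-- Both instances report the same Lat and Long property values.
SELECT @FromText.Lat AS TextLatitude,
       @FromText.Long AS TextLongitude,
       @FromPoint.Lat AS PointLatitude,
       @FromPoint.Long AS PointLongitude;
```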

Additional Support
The Microsoft Bing® Maps software development kit (SDK) integrates closely with spatial data in SQL
Server. SQL Server Reporting Services includes a map control that you can use to render spatial data and a
wizard to help to configure the map control. The map control is available for reports built using Business
Intelligence Development Studio or Report Builder.

An application that stores or retrieves spatial data from a database in SQL Server needs to be able to work
with that data as a spatial data type. To make this possible, a separate installer (MSI) file is provided as
part of the SQL Server Feature Pack, so that client applications can use the spatial data types in SQL
Server. Installing the feature pack on client systems causes an application on the client to “rehydrate” a
geography object that has been read from a SQL Server database into a SqlGeography object within
.NET managed code.

ST Prefix
An ST prefix has been added to the properties and methods that are implementations of the OGC
standards. For example, the X and Y coordinates of a geometry object are provided by STX and STY
properties, and the distance calculation is provided by the STDistance method.

Microsoft extensions to the OGC standards have no prefix added to the name of the methods or
properties. You should take care when referring to properties and methods because they are case-
sensitive, even on servers configured for case-insensitivity.
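
The naming convention can be seen in a short sketch that mixes both kinds of members. The @Line variable and its coordinates are hypothetical:

```sql
-- Parse has no ST prefix: it is a Microsoft extension to the OGC standard.
DECLARE @Line geometry = geometry::Parse('LINESTRING (0 0, 3 4)');

SELECT @Line.STLength() AS LengthOfLine,   -- OGC method (ST prefix)
       @Line.ToString() AS WellKnownText;  -- Microsoft extension (no prefix)
```

Writing @Line.stlength() instead of @Line.STLength() would fail, because the method names are case-sensitive.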

System vs. User SQL CLR Types


Initially user-defined CLR data types were limited
to data that could be serialized into 8 KB of
storage.
SQL Server 2008 introduced larger CLR object
types, increasing the limit to 2 GB. SQL Server
2008 also introduced the concept of “system” CLR
types and all previous types are now considered
“user” types.
There is a column in sys.assemblies that indicates
whether an assembly is a system or user assembly.
System CLR types are turned on regardless of the
CLR-enabled configuration setting. The setting
only affects code in user types. Due to this change, the spatial data types will work even when the CLR-
enabled configuration setting is off.

For more information about the CLR-enabled configuration setting, see Microsoft Docs:
clr enabled Server Configuration Option
http://aka.ms/oq8rnt

The geometry and geography data types are implemented as CLR types by using managed code. They
are defined as system CLR types and work even when CLR integration is not switched on at the SQL Server
instance level.

You can see the currently installed assemblies, and whether they are user-defined, by executing the
following query:

Currently Installed Assemblies


SELECT name,
assembly_id,
permission_set_desc,
is_user_defined
FROM sys.assemblies;

Accessing Properties and Methods


You can access a property of an instance of a spatial data type by referring to it as Instance.Property.

As an example of this, look at the following code that is accessing the STX property of a variable called
@Location:

Accessing Properties
SELECT @Location.STX;

You can call methods that are defined on the data types (geometry and geography) rather than on
instances (that is, columns or variables) of those types. This is an important distinction.
As an example of this, look at the following code that is calling the STGeomFromText method of the
geometry data type:

Calling Methods Defined on Data Types


SELECT @Location = geometry::STGeomFromText('POINT (12 15)',0);

Note: You are not calling the method on a column or variable of the geometry
data type, but on the geometry data type itself. In .NET terminology, this is referred to as calling
a public static method on the geometry class. The methods and properties of the
spatial data types are case-sensitive, even on servers that are configured with case-insensitive
default collations.
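
Putting the two forms together, the following minimal sketch declares the variable, assigns it by calling a method on the data type, and then reads properties from the instance. It reuses the point from the previous examples; the column aliases are chosen only for readability:

```sql
DECLARE @Location geometry;

-- Method defined on the data type: called with the :: syntax.
SET @Location = geometry::STGeomFromText('POINT (12 15)', 0);

-- Properties of the instance: accessed with the . syntax.
SELECT @Location.STX AS X,
       @Location.STY AS Y;
```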

geometry Data Type


The geometry data type is used for flat-earth
(planar) data storage and calculations, and is
implemented as a system CLR type. It provides
comprehensive coverage of the OGC standard.

The geometry data type is a two-dimensional data type based on an X and Y coordinate system.
In the definition of the type, there is provision for
Z (elevation) and M (measure) in addition to the X
and Y coordinates. You can enter and retrieve the
Z and M values in the geometry data type, but it
ignores these values when it performs calculations.

You can see the input and output of X, Y, Z, and M in the following code:

geometry Data Type


DECLARE @Location geometry;

SELECT @Location = geometry::STGeomFromText('POINT (12 15 2 9)',0);


SELECT @Location.STAsText();
SELECT @Location.AsTextZM();

The SQL Server geometry data type provides comprehensive coverage of the OGC Geometry data type,
and has X and Y coordinates represented by STX and STY properties.
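
To illustrate that Z values are stored but ignored in calculations, the following sketch measures the distance between two points whose elevations differ. The variable names and coordinates are illustrative only.

```sql
DECLARE @a geometry = geometry::STGeomFromText('POINT (0 0 0)', 0);
DECLARE @b geometry = geometry::STGeomFromText('POINT (3 4 100)', 0);

-- The Z value is stored and can be read back through the Z property...
SELECT @b.Z AS StoredZ;

-- ...but the distance is calculated in the X-Y plane only,
-- so the 100-unit difference in elevation does not affect the result.
SELECT @a.STDistance(@b) AS PlanarDistance;
```

Because only X and Y take part, the points form a 3-4-5 triangle and the planar distance is 5.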

SRID and Geometry


When you are working with geometric data, the measurement system is not directly relevant. For
example, the area of a shape that measures 3 × 2 is still 6, regardless of whether 3 and 2 are in meters or
in inches. For this reason, there is no need to specify an SRID when you are working with geometric data.
When you are entering data, the SRID value is typically left as zero.
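
The 3 × 2 example can be reproduced directly. This is a small sketch that assumes a rectangle with one corner at the origin; the SRID argument is still required by STGeomFromText, but it is simply passed as zero.

```sql
-- A 3 x 2 rectangle; the ring must end at the point where it started.
DECLARE @Shape geometry =
    geometry::STGeomFromText('POLYGON ((0 0, 3 0, 3 2, 0 2, 0 0))', 0);

SELECT @Shape.STArea() AS Area,   -- 6, in whatever unit the coordinates represent
       @Shape.STSrid   AS Srid;   -- 0
```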

Spatial Results Viewer


When a SQL Server result includes columns of the geometry or geography data types, a special spatial
results viewer is provided for you to visualize the spatial results.

For more information on the geometry data type, see Microsoft Docs:
geometry (Transact-SQL)
http://aka.ms/hl20zm

geography Data Type


The geography data type is implemented as a
system CLR type and used for round-earth values,
typically involving actual positions or locations on
the Earth. The geography data type is an
extension to the OGC standard.

The geography data type is based on a latitude
and longitude coordinate system. Latitude and
longitude values are represented by Lat and Long
properties. Unlike the geometry data type, where
the X and Y coordinates can be any valid number,
the Lat and Long properties must relate to valid
latitudes and longitudes for the selected spatial
reference system. SRID 4326 (or WGS 84) is the most commonly used spatial reference system for working
with the geography data type. The geography data type can also store, but not process, Z and M values.
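
As a brief sketch of the Lat and Long properties, the following code creates a geography point by using SRID 4326. The location (approximate coordinates for Seattle) and the variable name are chosen only for illustration.

```sql
-- geography::Point takes latitude first, then longitude, then the SRID.
DECLARE @Seattle geography = geography::Point(47.6062, -122.3321, 4326);

SELECT @Seattle.Lat        AS Latitude,
       @Seattle.Long       AS Longitude,
       @Seattle.STAsText() AS WellKnownText;  -- WKT lists longitude first

-- A latitude outside the range -90 to 90 is rejected, for example:
-- SELECT geography::Point(95, 0, 4326);   -- raises an error
```

Note that the argument order of the Point method (latitude, longitude) is the reverse of the coordinate order used in WKT (longitude, latitude), which is a common source of mistakes.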

Result Size Limitations


The implementation of the geography data type in SQL Server 2008 required any geography value to be
contained within a single hemisphere. This did not mean any specific hemisphere, such as the northern or
southern hemispheres, but just that no two points could be more than half the Earth apart if they were
contained in the same instance of the geography data type. This limitation was removed in SQL Server
2012.

Point Order in Polygons


When you are defining the shape of a polygon by using a series of points, the order in which the points
are provided is significant. Imagine the set of points that define a postal code region. The same set of
points actually defines two regions: all of the points inside the postal code region, and all of the points
outside the postal code region.

To enclose points, they should be listed in counterclockwise order. As you draw a shape, points to the left
of the line that you draw will be enclosed by the shape. The points on the line are also included.

If you draw a postal code region in a clockwise direction, you are defining all points outside the region. In
versions of SQL Server before 2012, this would have resulted in an error because results were not
permitted to span more than a single hemisphere.
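
The effect of point order can be seen by defining the same square twice with opposite orientations. This sketch assumes SQL Server 2012 or later (the second polygon spans more than a hemisphere, and ReorientObject was introduced in that release); the coordinates are arbitrary.

```sql
-- WKT coordinates for geography are written as longitude latitude.
-- Counterclockwise ring: encloses the small square.
DECLARE @Inside geography =
    geography::STGeomFromText('POLYGON ((0 0, 1 0, 1 1, 0 1, 0 0))', 4326);

-- Clockwise ring: the same points, but enclosing everything outside the square.
DECLARE @Outside geography =
    geography::STGeomFromText('POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))', 4326);

SELECT @Inside.STArea()                   AS InsideAreaSqMeters,
       @Outside.STArea()                  AS OutsideAreaSqMeters,   -- nearly the whole globe
       @Outside.ReorientObject().STArea() AS ReorientedAreaSqMeters; -- matches the first value
```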

Spatial Results Viewer


A spatial results viewer is provided whenever a result set is displayed in SQL Server Management Studio,
with results that include either geometry or geography data.

For geography, the viewer is quite configurable. You can set which column to display, the geographic
projection to use for display, such as Mercator or Bonne, and you can choose to display another column
as a label over the relevant displayed region.

The spatial results viewer in SQL Server Management Studio is limited to displaying the first 5,000 objects
from the result set.

For more details on the geography data type, see Microsoft Docs:

geography (Transact-SQL)
http://aka.ms/f4ys44

Spatial Data Formats


In most cases, the internal binary format of any
CLR data type is not directly used for input and
output of the data type. You have to
accommodate string-based representations of the
data.
CLR data types (including the geometry and
geography system CLR data types) are stored in a
binary format that the designer of the data type
determines. Although it is possible to both enter
values and generate output for instances of the
data type by using a binary string, this is not
typically very helpful—you would have to have a
detailed understanding of the internal binary format.
The OGC and other organizations that work with spatial data define several formats that you can use for
interchanging spatial data. Some of the formats that SQL Server supports are:

 Well-Known Text (WKT). This is the most common string format and is readable by humans.
 Well-Known Binary (WKB). This is a more compact binary representation that is useful for
interchange between computers.

 Geography Markup Language (GML). This is the XML-based representation for spatial data.

All CLR data types must implement two string-related methods. The Parse method is used to convert a
string to the data type and the ToString method is used to convert the data type back to a string. Both of
these methods are implemented in the spatial types and both assume a WKT format.

Several variations of these methods are used for input and output. For example, the STAsText method
provides a specific WKT format as output and the AsTextZM method is a Microsoft extension that
provides the Z and M values, in addition to the two-dimensional coordinates.
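
The input and output methods can be compared on a single value. The following sketch uses an arbitrary point; note that the GML output method is spelled AsGml, which matters because method names are case-sensitive.

```sql
-- Parse accepts WKT and, for geometry, assumes an SRID of 0.
DECLARE @g geometry = geometry::Parse('POINT (12 15 2 9)');

SELECT @g.ToString()   AS ToStringOutput,  -- WKT, including any Z and M values
       @g.STAsText()   AS OgcWkt,          -- OGC WKT, X and Y only
       @g.AsTextZM()   AS WktWithZM,       -- WKT with Z and M values
       @g.STAsBinary() AS Wkb,             -- Well-Known Binary
       @g.AsGml()      AS Gml;             -- GML, returned as xml
```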

For more information on the geometry data type, see Microsoft Docs:

geometry (Transact-SQL)
http://aka.ms/Oul99w

OGC Methods and Properties


A wide variety of OGC methods and properties are
provided in SQL Server, along with a number of
OGC-defined collections. Several of the common
methods and properties are described here, but
many more exist.

Common Methods
Common OGC methods include:
 The STDistance method, which returns the
distance between two spatial objects. Note
that this does not only apply to points. You
can also calculate the distance between two
polygons. The result is returned as the minimum distance between any two points on the polygons.

 The STIntersects method, which returns 1 when two objects intersect and otherwise returns 0.

 The STArea method, which returns the total surface area of a geometry instance.

 The STLength method, which returns the total length of the objects in a geometry instance. For
example, for a polygon, STLength returns the total length of all line segments that make up the
polygon.

 The STUnion method, which returns a new object that is formed by uniting all points from two
objects.

 The STBuffer method, which returns an object whose points are within a certain distance of an
instance of a geometry object.
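
The methods listed above can be tried on two overlapping squares. This is a sketch with made-up shapes: @a covers 0 to 4 on both axes, and @b covers 2 to 6.

```sql
DECLARE @a geometry = geometry::STGeomFromText('POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))', 0);
DECLARE @b geometry = geometry::STGeomFromText('POLYGON ((2 2, 6 2, 6 6, 2 6, 2 2))', 0);

SELECT @a.STIntersects(@b)     AS Intersects,    -- 1, because the squares overlap
       @a.STDistance(@b)       AS Distance,      -- 0, because they share points
       @a.STArea()             AS AreaOfA,       -- 16
       @a.STLength()           AS PerimeterOfA,  -- 16
       @a.STUnion(@b).STArea() AS UnionArea,     -- 28 (16 + 16 minus the 4-unit overlap)
       @a.STBuffer(1).STArea() AS BufferedArea;  -- larger than 16; corners are approximated
```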

Common Collection Properties


SQL Server provides support for several collections defined in the OGC specifications. A
GeometryCollection instance can hold several geometry objects, including other nested collections.
Methods such as STPointN and STGeometryN provide access to the members of these
collections.
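
As a short sketch of navigating a collection, the following code builds a GeometryCollection from an arbitrary point and line, and then retrieves individual members. The indexes passed to STGeometryN and STPointN start at 1.

```sql
DECLARE @c geometry = geometry::STGeomFromText(
    'GEOMETRYCOLLECTION (POINT (1 1), LINESTRING (0 0, 2 2, 4 0))', 0);

SELECT @c.STNumGeometries()                    AS MemberCount,       -- 2
       @c.STGeometryN(2).STAsText()            AS SecondMember,      -- the LINESTRING
       @c.STGeometryN(2).STPointN(3).STAsText() AS ThirdPointOfLine;  -- the point at 4 0
```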
For more information on OGC methods and properties on geometry instances, see Microsoft Docs:

OGC Methods on Geometry Instances


http://aka.ms/Fxvhxc

Microsoft Extensions
In addition to the OGC properties and methods,
Microsoft has provided several useful extensions
to the standards. Several of these extensions are
described in this topic, but many more exist.

Common Extensions
Although the coverage that the OGC specifications
provide is good, Microsoft has enhanced the data
types by adding properties and methods that
extend the standards. Note that the extended
methods and properties do not have the ST prefix.

The MakeValid method takes an arbitrary shape
and returns another shape that is valid for storage in a geometry data type. SQL Server produces only
valid geometry instances, but you can store and retrieve invalid instances. You can retrieve a valid instance
that represents the same point set of any invalid instance by using the MakeValid method.
You can use the Reduce method to reduce the complexity of an object while attempting to maintain the
overall shape of the object.

The IsNull method returns 1 if an instance of a spatial type is NULL; otherwise it returns 0.

The AsGml method returns the object encoded as GML.


An example of GML is shown here:

GML
<Point xmlns="http://www.opengis.net/gml">
<pos>12 15</pos>
</Point>

GML is excellent for information interchange but the representation of objects in XML can quickly become
very large.
The BufferWithTolerance method returns a buffer around an object, but uses a tolerance value to allow
for minor rounding errors.
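
The following sketch exercises MakeValid, Reduce, and BufferWithTolerance. The shapes and the tolerance values are arbitrary and are chosen only to make the effect visible.

```sql
-- A line that doubles back over itself is stored, but is not valid.
DECLARE @Invalid geometry =
    geometry::STGeomFromText('LINESTRING (0 2, 1 1, 1 0, 1 1, 2 2)', 0);

SELECT @Invalid.STIsValid()             AS IsValidBefore,  -- 0
       @Invalid.MakeValid().STIsValid() AS IsValidAfter,   -- 1
       @Invalid.MakeValid().STAsText()  AS ValidShape;

-- A nearly straight line: Reduce removes points that deviate
-- from the overall shape by less than the given tolerance.
DECLARE @Line geometry =
    geometry::STGeomFromText('LINESTRING (0 0, 1 0.01, 2 0, 3 0.01, 4 0)', 0);

SELECT @Line.Reduce(0.1).STAsText() AS Simplified;

-- BufferWithTolerance(distance, tolerance, relative): a 1-unit buffer,
-- allowing an absolute error of up to 0.1 units (relative = 0).
SELECT @Line.BufferWithTolerance(1, 0.1, 0).STArea() AS BufferArea;
```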

For more information about Microsoft extensions, see Microsoft Docs:

geometry Data Type Method Reference


http://aka.ms/tfcabe

Demonstration: Spatial Data Types


In this demonstration, you will see how to:

 Work with spatial data types in SQL Server.

Demonstration Steps
1. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click the 21 - Demonstration
2A.sql script file.

2. Highlight the Transact-SQL under the comment Step 1 - Switch to the AdventureWorks database,
and click Execute.

3. Highlight the Transact-SQL under the comment Step 2 - Draw a shape using geometry, and click
Execute.

4. Click the Spatial results tab.

5. Highlight the Transact-SQL under the comment Step 3 - Draw two shapes, and click Execute.

6. Click the Spatial results tab.


7. Highlight the Transact-SQL under the comment Step 4 - Show what happens if you perform a
UNION rather than a UNION ALL. This will fail, as spatial types are not comparable, and click
Execute. Note the error message.

8. Highlight the Transact-SQL under the comment Step 5 - Join the two shapes together, and click
Execute.

9. Click the Spatial results tab.


10. Highlight the Transact-SQL under the comment Step 6 - How far is it from New York to Los
Angeles in meters? and click Execute.

11. Highlight the Transact-SQL under the comment Step 7 - Draw the Pentagon, and click Execute.

12. Click the Spatial results tab.

13. Highlight the Transact-SQL under the comment Step 8 - Call the ToString method to observe the
use of the Z and M values that are stored but not processed, and click Execute.
14. Highlight the Transact-SQL under the comment Step 9 - Use GML for input, and click Execute.

15. Click the Spatial results tab.

16. Highlight the Transact-SQL under the comment Step 10 - Output GML from a location (start and
end points of the Panama Canal only – not the full shape), and click Execute.

17. Click the Spatial results tab.


18. Highlight the Transact-SQL under the comment Step 11 - Show how collections can include
different types of objects, and click Execute.

19. Click the Spatial results tab.

20. On the File menu, click Close.


Question: You have used a web service to calculate the coordinates of an address. What is
this process commonly called, and what services are available?

Lesson 3
Using Spatial Data in Applications
Having learned how spatial data is stored and accessed in SQL Server, you now have to understand the
implementation issues that can arise when you are building applications that use spatial data.

Lesson Objectives
After completing this lesson, you will be able to:

 Understand the performance issues with spatial queries.

 Describe the different types of spatial indexes.

 Explain the basic tessellation process used within spatial indexes in SQL Server.
 Implement spatial indexes.

 Explain which geometry methods can benefit from spatial indexes.

 Explain which geography methods can benefit from spatial indexes.


 Describe options for extending spatial data support in SQL Server.

Performance Issues in Spatial Queries


Spatial queries can often involve a large number
of data points. Executing methods such as
STIntersects for a large number of points is slow.
Spatial indexes help to avoid unnecessary
calculations by reducing the number of rows on which
complex geometric calculations must be performed.

Spatial Indexes
Spatial indexes in SQL Server are based on b-tree
structures, but unlike standard relational indexes,
which directly locate the specific rows required to
answer a query, spatial indexes work in a two-
phase manner.
The first phase, known as the primary filter,
obtains a list of rows that are of interest. The
returned rows are referred to as candidate rows
and may include false positives; that is, rows that
are not required to answer the query.

In the second phase, a secondary filter checks each
individual candidate row to locate the exact rows
required to answer the query. The secondary filter executes methods in the WHERE clause of the query on
the filtered set of candidate rows, greatly reducing the number of calculations that SQL Server has to
make, provided that the spatial index has been effective.

You can check the effectiveness of a primary filter in SQL Server using the Filter method. The Filter
method only applies the primary filter, so you can compare the number of rows that the Filter method
returns to the total number of rows.
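
One way to make this comparison is sketched below. It assumes a table named Marketing.ProspectLocation with an indexed geography column named Location (as built in the lab for this module); the search area and its 50 km radius are arbitrary.

```sql
-- An arbitrary search area: a 50,000 meter buffer around a point near New York.
DECLARE @Area geography = geography::Point(40.7128, -74.0060, 4326).STBuffer(50000);

-- Primary filter only: fast, but may include false positives.
SELECT COUNT(*) AS CandidateRows
FROM Marketing.ProspectLocation
WHERE Location.Filter(@Area) = 1;

-- Primary and secondary filters: the exact answer.
SELECT COUNT(*) AS ExactRows
FROM Marketing.ProspectLocation
WHERE Location.STIntersects(@Area) = 1;

-- Total rows, for reference.
SELECT COUNT(*) AS TotalRows
FROM Marketing.ProspectLocation;
```

If CandidateRows is close to ExactRows and much smaller than TotalRows, the primary filter is doing most of the work. If no spatial index is available, Filter returns the same result as STIntersects.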

Tessellation Process
SQL Server spatial indexes use tessellation to
minimize the number of calculations that have to
be performed. The tessellation process quickly
reduces the overall number of rows to a list that
might potentially be of interest.

Tessellation Process
SQL Server breaks the problem space into relevant
areas by using a four-level hierarchical grid. Each
object is broken down and fitted into the grid
hierarchy based on which cells it touches.

Three rules are applied recursively on each grid
level to set the depth of the tessellation process, and decide which cells to record in the index. The three
rules are:

  Covering rule: any cell covered completely by an object is a covered cell and is not tessellated.

  Cells-per-object rule: sets the maximum number of cells that can be counted for any object.
  Deepest-cell rule: records the bottommost tessellated cells for an object.

Tessellation Scheme
SQL Server uses a different tessellation scheme depending on the data type of the column. Geometry grid
tessellation is used for columns of the geometry data type and Geography grid tessellation is used for
columns of the geography data type. The view sys.spatial_index_tessellations returns the tessellation rules
of a spatial index.
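
A query along the following lines lists the tessellation settings of every spatial index in the current database. It is a sketch that joins the catalog view mentioned above to sys.spatial_indexes; the bounding box columns are NULL for geography indexes.

```sql
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id)        AS TableName,
       i.name                          AS IndexName,
       t.tessellation_scheme,
       t.bounding_box_xmin,
       t.bounding_box_ymin,
       t.bounding_box_xmax,
       t.bounding_box_ymax,
       t.level_1_grid_desc,
       t.level_2_grid_desc,
       t.level_3_grid_desc,
       t.level_4_grid_desc,
       t.cells_per_object
FROM sys.spatial_indexes AS i
JOIN sys.spatial_index_tessellations AS t
    ON  t.object_id = i.object_id
    AND t.index_id  = i.index_id;
```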

For more information on the spatial index tessellation process, see Microsoft Docs:

Spatial Indexes Overview


http://aka.ms/rs5bjy

Implementing Spatial Indexes


You execute the CREATE SPATIAL INDEX
statement, providing a name for the index, the
table on which the index is to be created, and the
spatial data column that needs to be indexed. The
table must have a clustered primary key before
you can build a spatial index on it; indexes on the
geometry data type should specify a
BOUNDING_BOX setting.

Index Bounds
Unlike traditional types of index, a spatial index is
most useful when it knows the overall area that
the spatial data covers. Spatial indexes that are
created on the geography data type do not have to specify a bounding box because the Earth itself
naturally limits the data type.
Spatial indexes on the geometry data type specify a BOUNDING_BOX setting. This provides the
coordinates of a rectangle that would contain all possible points or shapes of interest to the index. The
geometry data type has no natural boundaries so, by specifying a bounding box, SQL Server can produce
a more useful index. If values arise outside the bounding box coordinates, the primary filter would have to
return the rows in which they are contained.

Grid Density
With SQL Server, you can specify grid densities when you are creating spatial indexes. You can specify a
value for the number of cells in each grid for each grid level in the index:

 A value of LOW indicates 16 cells in each grid or a 4 × 4 cell grid.

 A value of MEDIUM indicates 64 cells in each grid or an 8 × 8 cell grid.


 A value of HIGH indicates 256 cells in each grid or a 16 × 16 cell grid.

Spatial indexes differ from other types of index because it might make sense to create multiple spatial
indexes on the same table and column. Indexes that have one set of grid densities might be more useful
than a similar index that has a different set of grid densities for locating data in a specific query.

To make spatial indexes easier to configure, SQL Server has automatic grid density and level selections:
GEOMETRY_AUTO_GRID and GEOGRAPHY_AUTO_GRID. The automated grid configuration defaults to an
eight-level grid.
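
The options described in this topic can be combined as in the following sketch. The table dbo.FloorPlan, its columns, the index names, and the bounding box values are all hypothetical; the second index shows that more than one spatial index can exist on the same column.

```sql
-- A spatial index requires a clustered primary key on the table.
CREATE TABLE dbo.FloorPlan
(
    FloorPlanID int      NOT NULL PRIMARY KEY CLUSTERED,
    Outline     geometry NOT NULL
);
GO

-- Manual grid densities, with a bounding box that covers the expected data.
CREATE SPATIAL INDEX SIX_FloorPlan_Outline_Manual
ON dbo.FloorPlan (Outline)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 500, ymax = 200),
    GRIDS = (LEVEL_1 = LOW, LEVEL_2 = MEDIUM, LEVEL_3 = MEDIUM, LEVEL_4 = HIGH),
    CELLS_PER_OBJECT = 16
);
GO

-- Automatic grid selection; a bounding box is still needed for geometry.
CREATE SPATIAL INDEX SIX_FloorPlan_Outline_Auto
ON dbo.FloorPlan (Outline)
USING GEOMETRY_AUTO_GRID
WITH (BOUNDING_BOX = (0, 0, 500, 200));
GO
```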

Limitations
Spatial indexes do not support the use of ONLINE build operations, which are available for other types of
index in SQL Server Enterprise.

For more information on creating spatial indexes, see Microsoft Docs:

CREATE SPATIAL INDEX (Transact-SQL)


http://aka.ms/xh7lj0

geometry Methods Supported by Spatial Indexes


Not all geometry methods and predicate forms
can benefit from the presence of spatial indexes.
The following list details predicates that can
potentially make use of a spatial index as a
primary filter:

  geometry1.STContains(geometry2) = 1

  geometry1.STDistance(geometry2) < number

  geometry1.STDistance(geometry2) <= number

  geometry1.STIntersects(geometry2) = 1

  geometry1.STOverlaps(geometry2) = 1

  geometry1.STTouches(geometry2) = 1

  geometry1.STWithin(geometry2) = 1

If the predicate in your query is not in one of these forms, spatial indexes that you create will be ignored,
potentially resulting in slower queries.
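
The difference between a supported and an unsupported predicate form can be subtle. The following sketch assumes a hypothetical table dbo.FloorPlan with a spatially indexed geometry column named Outline; both queries return the same rows, but only the first matches one of the listed forms.

```sql
DECLARE @Probe geometry = geometry::STGeomFromText('POINT (100 50)', 0);

-- Supported form: geometry1.STDistance(geometry2) <= number
SELECT FloorPlanID
FROM dbo.FloorPlan
WHERE Outline.STDistance(@Probe) <= 25;

-- Logically equivalent, but the method result is wrapped in an expression,
-- so the predicate no longer matches a listed form and the spatial index
-- is unlikely to be considered.
SELECT FloorPlanID
FROM dbo.FloorPlan
WHERE Outline.STDistance(@Probe) * 2 <= 50;
```

Comparing the execution plans of the two queries is a simple way to confirm whether the spatial index is actually being used.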
For more information on geometry methods supported by spatial indexes, see MSDN:

Geometry Methods Supported by Spatial Indexes


http://aka.ms/klss7k

geography Methods Supported by Spatial Indexes


In a similar way to the geometry data type, not all
geography methods and predicate forms can
benefit from the presence of spatial indexes. The
following list shows the specific predicates that
can potentially make use of a spatial index as a
primary filter:

  geography1.STIntersects(geography2) = 1

  geography1.STEquals(geography2) = 1

  geography1.STDistance(geography2) < number

  geography1.STDistance(geography2) <= number

Unless the predicate in your query is in one of these forms, spatial indexes that you create will be ignored
and query performance might be affected.

For more information on geography methods supported by spatial indexes, see MSDN:

Geography Methods Supported by Spatial Indexes


http://aka.ms/wv2g76

Demonstration: Spatial Data in Applications


In this demonstration, you will see how to:

 Use spatial data in SQL Server to solve some business questions.

Demonstration Steps
1. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click the 31 - Demonstration
3A.sql script file.

2. Highlight the Transact-SQL under the comment Step 1 - Open a new query window to the
AdventureWorks database, and click Execute.

3. Highlight the Transact-SQL under the comment Step 2 - Which salesperson is closest to New
York? and click Execute.

4. Highlight the Transact-SQL under the comment Step 3 - Which two salespeople live the closest
together? and click Execute.

Note: this will take a few minutes to run.

5. Close SQL Server Management Studio without saving any changes.

Question: Where might spatial data prove useful in your organization?



Lab: Working with SQL Server Spatial Data


Scenario
Your organization has recently started to acquire spatial data within its databases. The new Marketing
database was designed before the company started to implement spatial data. One of the developers has
provided a table of the locations where prospects live, called Marketing.ProspectLocation. A second
developer has added columns to it for Latitude and Longitude and geocoded the addresses. You will
make some changes to the system to help support the need for spatial data.

Objectives
After completing this lab you will be able to:

 Use the GEOMETRY data type.

 Use the GEOGRAPHY data type.

 View the results of spatial queries in SSMS.

Estimated Time: 45 Minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student

Password: Pa55w.rd

Exercise 1: Become Familiar with the geometry Data Type


Scenario
You have decided to learn to write queries using the geometry data type in SQL Server. You will review
and execute scripts that demonstrate querying techniques.

The main tasks for this exercise are as follows:


1. Prepare the Lab Environment

2. Review and Execute Queries

 Task 1: Prepare the Lab Environment


1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. In the D:\Labfiles\Lab15\Starter folder, run Setup.cmd as Administrator.

 Task 2: Review and Execute Queries


1. Start SQL Server Management Studio and connect to the MIA-SQL database instance using Windows
authentication.

2. Open the SQL Server solution file 20762_15.ssmssln located in D:\Demofiles\Mod15\Starter.

3. Open the query 51 - Lab Exercise 1.sql.


4. Execute each section of the script individually and review the results.

Exercise 2: Add Spatial Data to an Existing Table


Scenario
In this exercise, you have to modify an existing table, Marketing.ProspectLocation, to replace the existing
Latitude and Longitude columns with a new Location column of type geography. You must migrate the
data to the new Location column before you delete the existing Latitude and Longitude columns.

The main tasks for this exercise are as follows:

1. Add a Location Column to the Marketing.ProspectLocation Table


2. Write Code to Assign Values Based on Existing Latitude and Longitude Columns

3. Drop the Existing Latitude and Longitude Columns

 Task 1: Add a Location Column to the Marketing.ProspectLocation Table


 Add a Location column to the Marketing.ProspectLocation table in the MarketDev database.

 Task 2: Write Code to Assign Values Based on Existing Latitude and Longitude
Columns
 Write code to assign values to the Location column, based on the existing Latitude and Longitude
columns.

 Task 3: Drop the Existing Latitude and Longitude Columns


 When you are sure that the new column has the correct data, drop the existing Latitude and
Longitude columns.

Results: After this exercise, you should have replaced the existing Longitude and Latitude columns with
a new Location column.
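
One possible approach to the three tasks is sketched below. It is not the official lab solution: it assumes that the table has a ProspectID column, that the existing Latitude and Longitude columns hold numeric values, and that SRID 4326 is appropriate for the geocoded data.

```sql
USE MarketDev;
GO

-- Task 1: add the new column. It must allow NULL until it has been populated.
ALTER TABLE Marketing.ProspectLocation
    ADD Location geography NULL;
GO

-- Task 2: populate it. geography::Point takes latitude, longitude, SRID.
UPDATE Marketing.ProspectLocation
SET Location = geography::Point(Latitude, Longitude, 4326)
WHERE Latitude IS NOT NULL
  AND Longitude IS NOT NULL;
GO

-- Check a sample of rows before removing the old columns.
SELECT TOP (10)
       ProspectID,
       Latitude,  Location.Lat  AS NewLatitude,
       Longitude, Location.Long AS NewLongitude
FROM Marketing.ProspectLocation;
GO

-- Task 3: drop the old columns only after the check succeeds.
ALTER TABLE Marketing.ProspectLocation
    DROP COLUMN Latitude, Longitude;
GO
```

The GO separators matter here: the UPDATE must be in a later batch than the ALTER TABLE that adds the column, otherwise the batch fails to compile because Location does not yet exist.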

Exercise 3: Find Nearby Locations


Scenario
Your salespeople are keen to visit prospects at their own locations, rather than just talk to them on the
phone. To minimize effort, they are keen to simultaneously see other prospects in the same area. You will
write a stored procedure that provides details of other prospects in the area. To ensure it performs
quickly, you will create a spatial index on the table.

The main tasks for this exercise are as follows:

1. Review the Requirements

2. Create a Spatial Index on the Marketing.ProspectLocation Table

3. Design and Implement the Stored Procedure

4. Test the Procedure

 Task 1: Review the Requirements


 Read the Requirements.docx located in the folder D:\Labfiles\Lab15\Starter.

 Task 2: Create a Spatial Index on the Marketing.ProspectLocation Table


 Create a spatial index on the Location column of the Marketing.ProspectLocation table.

 Task 3: Design and Implement the Stored Procedure


  Write a stored procedure that will return the details of all prospects within a given distance of a
specified prospect. The stored procedure should have input parameters for ProspectID and Distance
in kilometers and should output customer details in ascending distance order.

 Task 4: Test the Procedure


1. Execute the stored procedure to find all prospects within 50 km of ProspectID 2 and check the
results.

2. Close SSMS without saving anything.

Results: After completing this lab, you will have created a spatial index and written a stored procedure
that will return the prospects within a given distance from a chosen prospect.
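
A sketch of one way to approach this exercise follows. It is not the official lab solution: the index name, the procedure name Marketing.GetNearbyProspects, and the parameter names are hypothetical, and the code assumes that the table has a clustered primary key, a ProspectID column, and a geography column named Location that uses SRID 4326 (so that STDistance returns meters).

```sql
CREATE SPATIAL INDEX IX_ProspectLocation_Location
ON Marketing.ProspectLocation (Location)
USING GEOGRAPHY_AUTO_GRID;
GO

CREATE PROCEDURE Marketing.GetNearbyProspects
    @ProspectID    int,
    @DistanceInKms int
AS
BEGIN
    SET NOCOUNT ON;

    -- Look up the location of the prospect to search around.
    DECLARE @Origin geography;

    SELECT @Origin = Location
    FROM Marketing.ProspectLocation
    WHERE ProspectID = @ProspectID;

    -- STDistance(...) <= number is a form that can use the spatial index.
    SELECT pl.ProspectID,
           pl.Location.STDistance(@Origin) / 1000.0 AS DistanceInKms
    FROM Marketing.ProspectLocation AS pl
    WHERE pl.Location.STDistance(@Origin) <= @DistanceInKms * 1000
      AND pl.ProspectID <> @ProspectID
    ORDER BY DistanceInKms;
END;
GO

-- Test: all prospects within 50 km of prospect 2.
EXEC Marketing.GetNearbyProspects @ProspectID = 2, @DistanceInKms = 50;
```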

Module Review and Takeaways


Best Practice: This module described spatial data and how this data can be implemented
within SQL Server.

Common Issues and Troubleshooting Tips


Common Issue                                        Troubleshooting Tip

Choose the correct data type for your needs.

Consider the performance of your application.

Consider security options for tables that hold
geography and geometry data types.
16-1

Module 16
Storing and Querying BLOBs and Text Documents in SQL
Server
Contents:
Module Overview 16-1

Lesson 1: Considerations for BLOB Data 16-2

Lesson 2: Working with FILESTREAM 16-9


Lesson 3: Using Full-Text Search 16-16

Lab: Storing and Querying BLOBs and Text Documents in SQL Server 16-26

Module Review and Takeaways 16-30

Module Overview
Traditionally, databases have been used to store information in the form of simple values—such as
integers, dates, and strings—that contrast with more complex data formats, such as documents,
spreadsheets, image files, and video files. As the systems that databases support have become more
complex, administrators have found it necessary to integrate this more complex file data with the
structured data in database tables. For example, in a product database, it can be helpful to associate a
product record with the service manual or instructional videos for that product. SQL Server provides
several ways to integrate these files—that are often known as Binary Large Objects (BLOBs)—and enable
their content to be indexed and included in search results. In this module, you will learn how to design
and optimize a database that includes BLOBs.

Objectives
After completing this module, you will be able to:

 Describe the considerations for designing databases that incorporate BLOB data.
 Describe the benefits and design considerations for using FILESTREAM to store BLOB data on a
Windows file system.

 Describe the benefits of using full-text indexing and Semantic Search, and explain how to use these
features to search SQL Server data, including unstructured data.

Lesson 1
Considerations for BLOB Data
There are several ways you can store BLOBs and integrate them with the tabular data in a database. Each
approach has different advantages and disadvantages. You should consider how a chosen approach
affects performance, security, availability, and any other requirements. In this lesson, you will see the
principal features of the different approaches.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe how BLOBs differ from structured data and list the technical challenges they present.

  Describe how BLOBs can be stored in database files by using columns with the varbinary(max) data
type.

 Describe how database administrators can store BLOBs outside the database and link to them from
database tables.

 Describe the FILESTREAM feature and explain the advantages of using it.

 Describe how FileTables extend the FILESTREAM feature and provide access for desktop applications.

What Are BLOBs?


In database terminology, a BLOB is a large amount
of unstructured data. Often, but not always, a
BLOB represents the contents of a file, such as a
document, image, or video. The data in such files
is considered to be unstructured because it does
not conform to a strict schema of columns and
data types like a database table. Although a Word
document, for example, has its own internal
structure, this is not understood by the database
engine and is treated as a simple stream of ones
and zeros.

If a BLOB represents a file then its file type,
indicated by its file extension, is usually associated with one or more applications for viewing and editing.
For example, Word documents with the .docx extension are associated with Microsoft Word. Such
applications use Windows APIs to open the file and save changes. They cannot use database APIs to
access such files and are not aware of database transactions.

Technical Challenges Presented by BLOBs


If you integrate BLOBs with your database, the way data is handled changes significantly. For example:
 Storage and management costs. If you have many large BLOBs, they can consume large amounts of
disk space and backup media space. They also increase backup and restore times.

 Security. If BLOBs are stored in the database, the authorization to access them can be controlled by
using roles, logins and users, as for the authorization to access tables, views, and so on. However, if
BLOBs are stored on the file system, access must be controlled by using Windows accounts, groups,
and NTFS permissions.

 Indexing and searching text data. BLOBs such as Word files contain large quantities of unstructured
text. Users might like to search this text for specific words but standard SQL Server indexes do not
support it.

 Referential and transactional integrity. You must consider how create, read, update, and delete
(CRUD) operations on database rows might affect corresponding BLOBs. For example, if a product is
deleted from the product catalog, should the corresponding product manual also be deleted?

BLOB Storage Strategies


In SQL Server, you can choose from the following strategies for BLOB storage and integration:

 Storing BLOBs in the database. In this approach, you add a column to a database table with the
varbinary(MAX) datatype. The file is stored as binary information within the database file.

 Storing BLOBs on the file system. In this approach, you save BLOBs to a shared folder on a file
server. You can link files to database rows by adding a column with the varchar() or nvarchar()
datatype to the relevant tables, and using it to store a path to the file.
 FILESTREAM. If you use the FILESTREAM feature, BLOBs are stored in the file system, although they
are accessed through the database as varbinary(MAX) columns. To applications, BLOBs appear to be
stored in database tables, but the use of the file system can increase performance and simplify the
challenges of referential and transactional integrity.
 FileTables. This feature is an extension of FILESTREAM. BLOBs are stored on the file system but can
be accessed through the database. In addition, applications can access and modify BLOBs by using
the Windows APIs they would use for accessing nondatabase files.

 Remote BLOB Storage (RBS). SQL Server RBS is an optional add-on that you can use to place BLOBs
outside the database in commodity storage solutions. RBS is extensible and comes with several
different providers. Each provider supports a different kind of remote storage solution. Several third-
party vendors have created providers for their proprietary storage technologies; you can also develop
custom providers.

Storing BLOBs in the Database


In this approach, you create a new column with
the data type varbinary(max) and use it to store
BLOBs within the database itself.

Suitable Data Types


The varbinary(max) data type has a maximum
size of approximately 2 GB (2,147,483,647 bytes).
This constitutes the maximum size of file that you
can store within the database. If you expect to
store larger files than this limit, you must use
another approach to BLOBs.
SQL Server takes a flexible approach to storage for
varbinary(max) columns. If a BLOB is smaller than 8,000 bytes, it is stored in the same data page as the
other columns in the row. This maximizes performance, because the database engine does not have to
incur extra page reads to access the BLOB. However, if the BLOB is larger than 8,000 bytes, it is stored in
separate data pages with a pointer to those pages stored with the rest of the row. These data pages are in
a dedicated balanced tree (a B-tree).
16-4 Storing and Querying BLOBs and Text Documents in SQL Server

Note: You can control this storage behavior by using the large value types out of row
option in the sp_tableoption stored procedure. If this option is set to 0, the behavior is as
described above. If the option is set to 1, then BLOBs are always stored in separate pages, even if
they are under the 8,000 byte limit.
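
The in-database approach and the storage option described above can be sketched as follows. This is a minimal illustration, not part of the course files: the dbo.ProductManuals table and the C:\Manuals path are hypothetical, and the file must be readable by the SQL Server service account.

```sql
-- Hypothetical table: product manuals stored in the database as varbinary(max).
CREATE TABLE dbo.ProductManuals
(
    ProductID  INT            NOT NULL PRIMARY KEY,
    ManualName NVARCHAR(260)  NOT NULL,
    Manual     VARBINARY(MAX) NULL
);
GO

-- Load a file from disk into the BLOB column (the path is an assumption).
INSERT INTO dbo.ProductManuals (ProductID, ManualName, Manual)
SELECT 1, N'RearShifterManual.doc', doc.BulkColumn
FROM OPENROWSET(BULK N'C:\Manuals\RearShifterManual.doc', SINGLE_BLOB) AS doc;
GO

-- Store all large values out of row, even those under 8,000 bytes.
EXEC sp_tableoption N'dbo.ProductManuals', 'large value types out of row', 1;
GO

-- Check the size of each stored BLOB in bytes.
SELECT ProductID, ManualName, DATALENGTH(Manual) AS ManualBytes
FROM dbo.ProductManuals;
GO
```

Changing the option is not expected to move values that are already stored; existing values keep their current location until they are next updated.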

Note: Early versions of SQL Server included the image data type, which was intended for
BLOB storage. This data type is still available in SQL Server but is deprecated and should not be
used.

Advantages and Disadvantages


By storing BLOBs in the database, you closely integrate them with associated data stored in other
columns. It is possible, for example, for a product manual to be stored with an associated product,
because the manual is part of the product’s table row. This has the following advantages:

 All data is stored in the database files; there is no requirement to maintain and back up a separate set
of folders on the file system where BLOBs are located.

 Restore operations are simplified, because only the database needs to be restored.

 The transactional and referential integrity of BLOB data is automatically maintained by SQL Server.

 Administrators do not need to secure a separate set of folders on the file system.
 Developers writing applications that use your database can access BLOBs by using Transact-SQL and
do not have to call separate I/O APIs.

 Full integration with Full-Text Search and Semantic Search for textual data.
However, the following issues may be considered disadvantages of this approach:

 BLOBs can only be accessed through Transact-SQL. Word, for example, cannot open Word
documents that are stored directly as database BLOBs.

 Large BLOBs may reduce read performance, because a large number of pages from the b-tree may
have to be retrieved from the disk for each BLOB.

 Although restores are simpler, they often take longer, because BLOBs typically add considerable size
to database files.

Storing BLOBs on the File System


In this approach, BLOBs are stored outside the
database on a dedicated store, such as a shared
folder on a file server. You associate a database
row with the external BLOB by storing a path to
that BLOB.
Developing SQL Databases 16-5

For example, a Products table in the database may have a column called “ManualPath” of the varchar()
data type. In the row for the “Rear Derailleur Shifter”, the ManualPath column may store the path
“\\DocumentServer\Manuals\RearShifterManual.doc”.

Note: In this example, the stored path is a Server Message Block (SMB) path to a file on a
file server. Depending on your file store, paths may be in other forms, such as URLs.
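
A minimal sketch of this approach follows. The table definition is hypothetical and uses nvarchar(260) for the path column as one reasonable choice; the database stores only the path and never reads the file.

```sql
-- Hypothetical Products table that links each row to an external file by path.
CREATE TABLE dbo.Products
(
    ProductID  INT           NOT NULL PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL,
    ManualPath NVARCHAR(260) NULL   -- path only; SQL Server does not manage the file
);
GO

INSERT INTO dbo.Products (ProductID, Name, ManualPath)
VALUES (1, N'Rear Derailleur Shifter',
        N'\\DocumentServer\Manuals\RearShifterManual.doc');
GO

-- The application reads the path, then opens the file itself through the file system.
SELECT Name, ManualPath
FROM dbo.Products
WHERE ProductID = 1;
GO
```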

Maintaining Two Stores


The challenges presented by this approach arise from the necessity to run two storage locations—the
database and the file store.

Atomicity is a major concern. For any operation that alters one of the storage locations, you must consider
how the entry in the other location will be affected. For example, if a product is deleted from the catalog,
should the corresponding product manual be deleted from the file server? If a product’s part number is
altered in the database, does this change need to be propagated to the BLOB, and how should it be
updated?
You must also plan how to secure both locations: using logins, users, and permissions for the database, and
using Windows accounts and permissions (or some other security system) for the file store.

Advantages and Disadvantages


Originally, a database was intended to store tabular structured data. Therefore, removing BLOBs from the
database has the following advantages:

 It requires little configuration in the database.


 It avoids database files growing to large volumes. This makes it easier to manage the database. For
example, backup and restore operations for the database itself become shorter.

 Read performance for large BLOBs is typically faster than it would be for BLOBs stored in the
database.
 BLOBs are less likely to become fragmented on the file system. It is easier to reduce fragmentation on
the file system and this can ensure better performance.
 Because BLOBs are stored in a shared folder, applications can access them without going through SQL
Server. For example, a user with the correct path can open a manual in Word.

The disadvantages of this approach arise from the less close integration between the BLOBs and their
corresponding database rows:

 There is no mechanism for maintaining transactional and referential integrity. For example, if a user
moves a BLOB in the file store, the path in the database row will not automatically be updated and
will be broken.
 There is another location to back up and restore.

 Security administration must be done twice—once for the database and once for the file store.

 Developers must use two mechanisms to access data—Transact-SQL for access to the database and a
Windows API for access to the BLOBs. This adds complexity and increases development time.

 BLOBs are not available for Full-Text Search or Semantic Search.



FILESTREAM
You use the FILESTREAM feature, which was
introduced in SQL Server 2008, to store BLOBs on
the file system, closely integrated with their
corresponding rows. It combines the extra
performance you can achieve for large BLOBs
served from the file system, with the advantages
of storing BLOBs in the database.

FILESTREAM Implementation
FILESTREAM is an attribute of the varbinary(max)
data type. If you enable this attribute on a
varbinary(max) column, SQL Server stores BLOBs in
a folder on the NTFS file system. You always access
these BLOBs through the database server but you can choose to use either Transact-SQL or Win32 I/O
APIs, which have better performance for large files. You can also store BLOBs that exceed the 2
GB size limit for BLOBs stored in the database.

To use FILESTREAM, you must create at least one FILESTREAM filegroup in your database. This is a
dedicated kind of filegroup that contains file system directories, called “data containers”, instead of the
actual BLOBs.

Benefits
When you use FILESTREAM, BLOBs are stored outside the database on the file system, but from the point
of view of applications, they appear to be within the database. This combines the high performance of
external BLOBs with the close integration of BLOBs stored within the database.

Advantages
 When BLOBs are larger than about 1 MB, read performance will be greater with FILESTREAM than for
BLOBs stored within the database.

 BLOBs are fully integrated with the database for management and security.

 SQL Server automatically maintains referential and transactional integrity.


 BLOBs are available for full-text searches and semantic searches.

 There is no upper limit on the size of BLOBs stored in FILESTREAM columns.

 Developers access all BLOBs through the database server and can use either Transact-SQL or Win32
APIs.

Disadvantages
 Applications cannot access BLOBs directly; instead, developers must write code that reads and writes
BLOB data.

 BLOBs must be stored on a hard drive installed on the database server itself; you cannot use a shared
folder on another file server.

Note: You can use a Storage Area Network (SAN) to store FILESTREAM BLOBs, because
these appear as local hard drives to the database server.

FileTables
FileTables were introduced in SQL Server 2012 as an
extension to the FILESTREAM feature that solves
some of that feature’s limitations.

Benefits and Implementation


When you use FILESTREAM columns, desktop
applications such as Word cannot open BLOBs as
they would normal files, because you can only
access FILESTREAM BLOBs programmatically
through Transact-SQL or Win32 APIs. With
FileTables, you can open and manipulate the
contents of BLOBs from such applications.

Also, FILESTREAM BLOBs must be stored on a hard drive that is local to the database server. With
FileTables, the storage location can be a shared folder on a file server that is remote to the database
server.
A FileTable is a database table that has a specific schema. It includes a varbinary(max) column with the
FILESTREAM attribute enabled. It also includes a set of metadata columns that describe the BLOBs. These
columns include the file size, the creation time, the last write time, and so on.

Advantages and Disadvantages


Because FileTables are an extension of the FILESTREAM feature, they realize many of the advantages
associated with FILESTREAM. However, they also have extra advantages:
 Depending on your configuration, applications such as Word and Excel® can access files in the
shared file system location. They can read, write, create and delete files, and their changes will be
automatically propagated into the FileTable in the database.
 The file system location need not be on a hard drive installed on the database server. Instead, it could
be a shared folder on a file server.

You should also consider the following disadvantage:

 Because a FileTable is a separate database table with a fixed schema, you cannot integrate the BLOBs
as columns in another table. For example, product manuals must be stored as rows in a separate
FileTable, and not as a column in the Products table. Instead you must use a foreign key relationship
to associate a product with its manual. This may have implications for referential and transactional
integrity.

Demonstration: BLOBs in the Adventure Works Database


Demonstration Steps
Investigate BLOB Storage in the Adventure Works Database

1. On the taskbar, click Microsoft SQL Server Management Studio.


2. In the Connect to Server dialog box, in Server name, type MIA-SQL, and then click Connect.

3. On the File menu, point to Open, and then click Project/Solution.

4. In the Open Project dialog box, navigate to D:\Demofiles\Mod16\Demo, click demo.ssmssln, and
then click Open.

5. In Solution Explorer, double-click the 1 - AdventureWorks BLOBs.sql script file.



6. Select the code under the Step 1 comment, and then click Execute.

7. Select the code under the Step 2 comment, and then click Execute.

8. Select the code under the Step 3 comment, and then click Execute.

9. Select the code under the Step 4 comment, and then click Execute. Note that no results are returned.

10. Close Microsoft SQL Server Management Studio, without saving any changes.

Check Your Knowledge


Question

You want to store résumés, which are Word documents, in the Human Resources database. You want to
store each document in a column of the Employees table. A typical résumé is around 50 KB in size and
you want to maximize performance, and ensure referential and transactional integrity. Which of the
following approaches should you use?

Select the correct answer.

 Store documents as BLOBs within the database.

 Store documents on a separate file server. In the database, create a varchar() column that links an
employee record to the correct résumé.

 Use the FILESTREAM feature.

 Use a FileTable.

Lesson 2
Working with FILESTREAM
The FILESTREAM feature, together with FileTables, enables database administrators to combine the
performance advantages of BLOBs stored on the file system, with the close integration of data when
BLOBs are stored in the database. Now that you understand when to use FILESTREAM and FileTables, this
lesson discusses their prerequisites and how to implement them in your database.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe the requirements and limitations of FILESTREAM.

 Enable FILESTREAM for a SQL Server instance, a database, and an individual table.

 Describe the requirements and limitations of FileTable.

 Create FileTables in a database.

 Write queries that determine BLOB locations for FileTables and FILESTREAM columns.

 Configure and use FILESTREAM and FileTables.

Considerations for Using FILESTREAM


The advantages of the FILESTREAM technology
make it an attractive option to store large BLOBs
that are tightly integrated with database rows.
Before implementing this technology, however,
consider the following issues:
 For each table with FILESTREAM columns, you
must also create a uniqueidentifier ROWGUIDCOL
column.
 FILESTREAM requires at least one FILESTREAM
filegroup. This special kind of filegroup
contains file system directories instead of the
BLOBs themselves. These directories are called
data containers. Create a filegroup before you create a FILESTREAM column.

 FILESTREAM filegroups should be placed on a separate volume from the operating system, page files,
the database, the transaction logs, and the tempdb for optimal performance.

 BLOBs in FILESTREAM columns are automatically included in database backups and restores, so you
do not need a separate maintenance regime for FILESTREAM data.

 If you expect BLOBs in a column to be smaller than 1 MB, you might obtain better performance by
using a varbinary(max) column without FILESTREAM. This is because small BLOBs can be stored in
the same data page as the rest of the row.

 FILESTREAM is not compatible with database mirroring.

 If you are using transparent data encryption (TDE) for a database, BLOBs stored in FILESTREAM
columns are not encrypted.

 FILESTREAM is not supported by Always Encrypted.



Enabling FILESTREAM
FILESTREAM is not enabled by default in SQL
Server. To use it, you must complete three
configuration tasks:

1. Enable FILESTREAM for the SQL Server instance.

2. Set up a FILESTREAM filegroup in the database.

3. Create a FILESTREAM column in a table.

Enabling FILESTREAM for the SQL Server Instance
First, use SQL Server Configuration Manager to enable FILESTREAM:

1. Start SQL Server Configuration Manager.


2. In the list of services on the left, click SQL Server Services.

3. Locate the SQL Server instance you want to configure and double-click it.

4. In the Properties dialog for the SQL Server instance, click the FILESTREAM tab.

5. Select Enable FILESTREAM for Transact-SQL access.

6. If you want to use Win32 APIs to access BLOBs, select Enable FILESTREAM for file I/O access, and
then click OK.
You must also configure the FILESTREAM access level by using the sp_configure stored procedure. There
are three possible access levels:

 0. This value disables FILESTREAM for the instance.

 1. This value enables FILESTREAM access for Transact-SQL clients.

 2. This value enables FILESTREAM access for Transact-SQL and Win32 streaming clients.

Use the following code to configure the FILESTREAM access level:

Setting the Access Level by Using sp_configure


EXEC sp_configure filestream_access_level, 2
RECONFIGURE;
GO

After completing these configuration steps, restart the SQL Server instance.
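
After the restart, you may want to confirm that the settings took effect. The following sketch uses only built-in server properties and the same sp_configure option; compare the configured level with the effective level to see whether the change is active.

```sql
-- Compare the configured and effective FILESTREAM levels for the instance.
SELECT
    SERVERPROPERTY('FilestreamConfiguredLevel') AS ConfiguredLevel,
    SERVERPROPERTY('FilestreamEffectiveLevel')  AS EffectiveLevel,
    SERVERPROPERTY('FilestreamShareName')       AS ShareName;
GO

-- Show the configured (config_value) and running (run_value) access level.
EXEC sp_configure 'filestream access level';
GO
```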

Creating a FILESTREAM Filegroup


Your database must include at least one FILESTREAM filegroup. To create such a database, specify the
CONTAINS FILESTREAM clause as in the following example:

This code example creates a new database with one filegroup that supports FILESTREAM:

Creating a Database with a FILESTREAM Filegroup


CREATE DATABASE HumanResources
ON
PRIMARY ( NAME = HR1,
FILENAME = 'C:\databases\HRDB1.mdf'),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM( NAME = HR3,
FILENAME = 'D:\databases\filestream1')
LOG ON ( NAME = HRlog1,
FILENAME = 'E:\TransLogs\HRlog1.ldf')
GO
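
If the database already exists, a FILESTREAM filegroup can be added with ALTER DATABASE instead. In this sketch, the filegroup name FileStreamGroup2, the logical name HR4, and the folder path are assumptions chosen to match the example above.

```sql
-- Add a FILESTREAM filegroup to an existing database.
ALTER DATABASE HumanResources
ADD FILEGROUP FileStreamGroup2 CONTAINS FILESTREAM;
GO

-- The parent folder (D:\databases) must already exist; the final folder (filestream2) must not.
ALTER DATABASE HumanResources
ADD FILE ( NAME = HR4,
           FILENAME = 'D:\databases\filestream2' )
TO FILEGROUP FileStreamGroup2;
GO

-- List the FILESTREAM filegroups in the database (type 'FD' = FILESTREAM data).
USE HumanResources;
GO
SELECT name, type, type_desc
FROM sys.filegroups
WHERE type = 'FD';
GO
```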

Creating a FILESTREAM Column


Now that you have an instance and a database that support FILESTREAM, you can create tables with
FILESTREAM columns. These columns must use the varbinary(max) data type, and you must specify the
FILESTREAM attribute.
The following code creates a new table named “Employees” with a FILESTREAM column named “Resume”:

Creating a FILESTREAM-Enabled Table


CREATE TABLE HumanResources.dbo.Employees
(
[Id] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE,
[EmployeeNumber] INTEGER UNIQUE,
[FirstName] NVARCHAR(50) NOT NULL,
[LastName] NVARCHAR(50) NOT NULL,
[Resume] VARBINARY(MAX) FILESTREAM NULL
);
GO
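
Once the table exists, you work with the FILESTREAM column through Transact-SQL as you would with any varbinary(max) column. The following sketch assumes the Employees table from the previous example; the employee values are made up, and a short literal stands in for a real document.

```sql
USE HumanResources;
GO

-- Insert a row; the literal is cast to varbinary(max) as a stand-in for a real file.
INSERT INTO dbo.Employees (Id, EmployeeNumber, FirstName, LastName, [Resume])
VALUES (NEWID(), 1001, N'Ana', N'Smith',
        CAST('Sample resume text' AS VARBINARY(MAX)));
GO

-- Read the BLOB back; SQL Server fetches it from the FILESTREAM data container.
SELECT EmployeeNumber, LastName, DATALENGTH([Resume]) AS ResumeBytes
FROM dbo.Employees
WHERE EmployeeNumber = 1001;
GO
```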

Considerations for Using FileTables


FileTables are an extension to the FILESTREAM
technology. Therefore, some of the prerequisites
for FileTables are the same as for FILESTREAM—
for example, a FILESTREAM filegroup is required.

The prerequisites for FileTables include:

 FILESTREAM must be enabled at the instance level. See the previous topic for how to
configure the prerequisite.

 A FILESTREAM filegroup must be enabled at the database level. See the previous topic for
how to configure the prerequisite.

 Nontransactional access must be configured at the database level. A key advantage of FileTables is
that they enable applications to access BLOBs without going through the transactional system of SQL
Server. You must configure this access and choose the access level. You can choose to enable full
access, read-only access, or disable the access. See the next topic for how to enable nontransactional
access.

 A directory for FileTables must be configured at the database level. When you create a FileTable, a
folder that stores its BLOBs is created as a child of this directory. See the next topic for how to
configure this directory.

FileTables do not support all of the SQL Server features that other tables support, and some SQL Server
features are partially supported.

 The following SQL Server features cannot be used with FileTables:

o Table partitioning

o Database replication

o Transactional rollbacks and point-in-time recovery

 The following SQL Server features can be used with some limitations:

o If a database includes a FileTable, failover works differently for AlwaysOn availability groups. If
failover occurs, you have full access to the FileTable on the primary replica but no access on
readable secondary replicas.

o FileTables do not support INSTEAD OF triggers for Data Manipulation Language (DML)
operations. AFTER triggers for DML operations are supported, as are Data Definition Language
(DDL) triggers.

o You cannot include FileTables in indexed views.

Enabling FileTables
Before you can use FileTables, FILESTREAM must
be enabled at the instance level, and a
FILESTREAM filegroup created as described
previously in this lesson. In addition, you must
complete the tasks described in the following
sections.

Enabling Nontransactional Access


Windows applications such as Word and Excel can
obtain a file handle to BLOBs in the FileTable
without supporting transactions. This means these
applications can open BLOBs as they would open
files in a typical Windows shared folder. This is
called nontransactional access and is enabled at the database level.
Use the following query to check whether nontransactional access has been enabled for all the databases
in an instance:

Checking for Nontransactional Access


SELECT DB_NAME(database_id), non_transacted_access, non_transacted_access_desc
FROM sys.database_filestream_options;
GO

The available access levels for nontransactional access are DISABLED, READ_ONLY, and FULL. To enable
FileTables and set the access level, use the FILESTREAM option when you create or alter a database.

In this example, full nontransactional access is enabled for a new database called HumanResources:

Enabling Nontransactional Access for a New Database


CREATE DATABASE HumanResources
WITH FILESTREAM ( NON_TRANSACTED_ACCESS = FULL );
GO

Configuring a Directory for FileTables


You configure a directory for FileTables by using the DIRECTORY_NAME attribute in the FILESTREAM
option when you create or alter a database.

In the following code, a FileTable directory is configured for a pre-existing database named
HumanResources:

Configuring a Directory
ALTER DATABASE HumanResources
SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL,
DIRECTORY_NAME = N'FileTableDirectory' );
GO

Note: You can also enable and configure FileTable prerequisites by using SSMS. These
options appear on the Options tab of the database Properties dialog box.
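
With the prerequisites in place, a FileTable can be created with the AS FILETABLE clause. The following is a minimal sketch; the table name DocumentStore and its directory name are assumptions for illustration.

```sql
USE HumanResources;
GO

-- Create a FileTable; the schema is fixed, so no column list is given.
-- FILETABLE_DIRECTORY names the folder created under the database-level directory.
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = 'DocumentStore',
    FILETABLE_COLLATE_FILENAME = database_default
);
GO

-- Files saved to the folder appear as rows; the metadata columns are predefined.
SELECT name, file_type, cached_file_size, creation_time, last_write_time, is_directory
FROM dbo.DocumentStore;
GO
```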

FILESTREAM Access
You can use Transact-SQL built-in system
functions to determine the file system paths and
IDs for folders that store FILESTREAM and
FileTable BLOBs.

The PathName Function


To access a BLOB stored in a FILESTREAM table,
SELECT the column in a query. In an application,
you must use further code to make use of the
data, such as formatting an image BLOB for
display on a webpage. Occasionally, you may want
to determine the path to the BLOB on the file
system; in this case, use the PathName() function.

Use the @option argument to control the format of the returned path. Possible values are:
 0. The path is returned in NetBIOS format. This is the default.

 1. The path is returned as stored without any conversion.

 2. The path is returned as a Universal Naming Convention (UNC) path.

The following query uses the PathName() function to locate a BLOB on the file system:

Determining BLOB Paths


SELECT FirstName, LastName, Resume.PathName() AS ResumePath
FROM dbo.Employees
WHERE LastName = 'Smith'
ORDER BY FirstName;
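
The @option argument and the use of paths by Win32 clients can be sketched as follows. This example assumes the Employees table created earlier in this lesson, with its Resume column and a row for employee number 1001 (a made-up value). The client-side call named in the comment is not shown.

```sql
USE HumanResources;
GO

-- Pass 2 as @option to control the format of the returned path.
SELECT LastName, [Resume].PathName(2) AS ResumePath
FROM dbo.Employees
WHERE [Resume] IS NOT NULL;
GO

-- Win32 streaming clients need both the path and a transaction context token.
BEGIN TRANSACTION;

SELECT [Resume].PathName() AS ResumePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TransactionContext
FROM dbo.Employees
WHERE EmployeeNumber = 1001;

-- The client opens the file (for example, with OpenSqlFilestream) before the commit.
COMMIT TRANSACTION;
GO
```

The function name PathName is case-sensitive, so write it exactly as shown.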

The FileTableRootPath Function


You can use this function to locate the root folder for a specific FileTable or for the current database. This
is useful information, because FileTable allows direct access to BLOBs on the file system—you can pass this
path to applications so they can edit files.

The following example returns the FileTable root path for the HumanResources database:

FileTableRootPath Function
USE HumanResources;
GO
SELECT FileTableRootPath();

The GetFileNamespacePath Function


This function returns the UNC path to a specific BLOB or directory in a FileTable. The is_full_path
argument specifies whether the returned value should be a full or a relative path. The @option argument
works in the same way as it does for the PathName() function.

The following query returns the relative paths to all BLOBs in a FileTable named Images:

GetFileNamespacePath Function
USE HumanResources;
GO
SELECT file_stream.GetFileNamespacePath() AS [Relative Path] FROM Images;
GO
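
To obtain full paths rather than relative ones, you can either pass 1 for is_full_path or combine the two functions. The following sketch assumes the same Images FileTable in the dbo schema; both queries should return the same paths.

```sql
USE HumanResources;
GO

-- Full UNC path: pass 1 for is_full_path.
SELECT name,
       file_stream.GetFileNamespacePath(1) AS [Full Path]
FROM Images;
GO

-- The same result, built from the database root path plus the relative path.
SELECT FileTableRootPath() + file_stream.GetFileNamespacePath() AS [Full Path]
FROM Images;
GO

-- Root path for one specific FileTable rather than for the whole database.
SELECT FileTableRootPath(N'dbo.Images') AS [Images Root];
GO
```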

Demonstration: Configuring FILESTREAM and FileTables


In this demonstration, you will see how to enable and create FILESTREAM columns and FileTables.

Demonstration Steps
Enable FILESTREAM at the Instance Level
1. On the Start screen, type SQL Server 2016 Configuration Manager, and then click SQL Server 2016
Configuration Manager.

2. In the User Account Control dialog box, click Yes.

3. In the left pane, click SQL Server Services.

4. In the right pane, right-click SQL Server (MSSQLSERVER), and then click Properties.
5. In the SQL Server (MSSQLSERVER) Properties dialog box, on the FILESTREAM tab, check that the
Enable FILESTREAM for Transact-SQL access check box is selected, and then click OK.

6. Close SQL Server Configuration Manager.

Configure FILESTREAM and FileTables in the AdventureWorks Database


1. On the taskbar, click Microsoft SQL Server Management Studio.

2. In the Connect to Server dialog box, in Server name, type MIA-SQL, and then click Connect.

3. On the File menu, point to Open, and then click Project/Solution.

4. In the Open Project dialog box, navigate to D:\Demofiles\Mod16\Demo, click demo.ssmssln, and
then click Open.

5. In Solution Explorer, double-click the 2 - Configuring FILESTREAM and FileTables.sql script file.
6. Select the code under the Step 1 comment, and then click Execute.

7. Select the code under the Step 2 comment, and then click Execute.

8. Select the code under the Step 3 comment, and then click Execute.

9. Select the code under the Step 4 comment, and then click Execute.

10. Select the code under the Step 5 comment, and then click Execute.

11. Select the code under the Step 6 comment, and then click Execute.

12. Select the code under the Step 7 comment, and then click Execute.

13. Select the code under the Step 8 comment, and then click Execute.

14. Select the code under the Step 9 comment, and then click Execute.
Keep Microsoft SQL Server Management Studio open for the next demonstration.

Sequencing Activity
You have no FILESTREAM or FileTable prerequisites configured. You want to create a FileTable for BLOB
storage. Put the following steps in order by numbering each to indicate the correct order.

Steps

 Enable FILESTREAM at the instance level.

 Create a FILESTREAM filegroup at the database level.

 Configure nontransactional access at the database level.

 Configure a directory for FileTables at the database level.

 Start creating FileTables.

Lesson 3
Using Full-Text Search
SQL Server has industry-leading indexing and querying performance optimized to handle structured,
tabular data. If you use large varchar() columns to store long text fragments, or use varbinary(max)
columns to store BLOBs, SELECT queries that use predicates such as LIKE might not perform well, or might
not return the rows you intended. The Full-Text Search feature of SQL Server helps with these issues: it
can analyze and index long text fragments in large varchar() columns and BLOBs in a way that is aware of
language-specific linguistic rules, such as word boundaries and inflection.

In this lesson, you will see how to configure and query Full-Text Search and Semantic Search.

Lesson Objectives
After completing this lesson, you will be able to:

 Describe how Full-Text Search enables users to analyze text-based data in ways that are not possible
with standard Transact-SQL queries.

 List the components of the Full-Text Search architecture and describe their role in index operations
and queries.
 Configure Full-Text Search and create full-text indexes.

 Use predicates and functions to query a full-text index.

 Describe how Semantic Search enables users to analyze text-based data in ways that are not possible
with Full-Text Search.

What is Full-Text Search?


Full-Text Search is an optional component of the
SQL Server Database Engine that enables you to
execute language-aware queries against
character-based data. Each language, such as
English, Japanese, or Thai, has different grammar
and rules concerning the form of words and the
boundaries between words. For example, in
English, “running” and “ran” are both forms of the
verb “run”. It is difficult to create a LIKE predicate
that would return rows that contain all forms of
the verb “run”. However, if you execute a full-text
search for “run”, items that contain “running” and
“ran” are returned because full-text searches are aware of the rules of English grammar.

Types of Full-Text Search


If you have a full-text index on a table, you can execute the following types of searches against the
character-based data in the index:

 Simple term search. This type of search matches one or more specific words or phrases.
 Prefix term search. This type of search matches words that start with the character string you
specify.

 Generation term search. This type of search matches inflectional forms of the words you specify.

 Proximity term search. This type of search matches an item when a specified word or phrase
appears close to another specified word or phrase.

 Thesaurus search. This type of search matches words that are synonymous with the words you
specify. For example, if you search for “run”, a thesaurus search might match “jog”.
 Weighted term search. This type of search matches the words or phrases you specify, and orders
them so that some word matches appear higher in the list than others.

Full-text searches are not case-sensitive.
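
The search types listed above map onto the CONTAINS predicate and the CONTAINSTABLE function. The following sketch assumes a hypothetical dbo.Documents table (DocumentID, Title, Content) that has a full-text index on the Content column, with DocumentID as the full-text key. The search terms are illustrative only.

```sql
-- Simple term: a specific word or phrase.
SELECT DocumentID, Title FROM dbo.Documents
WHERE CONTAINS(Content, '"rear derailleur"');

-- Prefix term: words that start with "run".
SELECT DocumentID, Title FROM dbo.Documents
WHERE CONTAINS(Content, '"run*"');

-- Generation term: inflectional forms such as "running" and "ran".
SELECT DocumentID, Title FROM dbo.Documents
WHERE CONTAINS(Content, 'FORMSOF(INFLECTIONAL, run)');

-- Proximity term: "gear" close to "shifter" (maximum distance of 5).
SELECT DocumentID, Title FROM dbo.Documents
WHERE CONTAINS(Content, 'NEAR((gear, shifter), 5)');

-- Thesaurus: synonyms defined in the thesaurus file for the language.
SELECT DocumentID, Title FROM dbo.Documents
WHERE CONTAINS(Content, 'FORMSOF(THESAURUS, run)');

-- Weighted term: rank rows, giving "brake" more weight than "pedal".
SELECT d.DocumentID, d.Title, k.[RANK]
FROM dbo.Documents AS d
JOIN CONTAINSTABLE(dbo.Documents, Content,
        'ISABOUT(brake WEIGHT(0.8), pedal WEIGHT(0.2))') AS k
    ON d.DocumentID = k.[KEY]
ORDER BY k.[RANK] DESC;
GO
```

CONTAINSTABLE is used for the weighted search because it returns a RANK value for each matching row, which CONTAINS does not.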

Performance
Full-text searches deliver much higher performance than LIKE predicates when executed against large
blocks of text, such as varchar(max) columns. In addition, you cannot use LIKE predicates to search text
in BLOBs, such as varbinary(max) columns. This restriction applies whether or not you are using
FILESTREAM or FileTables.

Property-scoped Searches
Full-text search can index the properties of a file stored as a BLOB, in addition to its text. For example,
Word supports an Author property for each Word document. You can use a full-text search to locate
documents by a given author, even if you have not separately stored the author in a column, in a SQL
Server table.
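
A property-scoped search requires a search property list that is attached to the full-text index. The following is a sketch under stated assumptions: the dbo.Documents table and the list name DocumentProperties are hypothetical, the table already has a full-text index on its Content column, and the installed filter for the file type can extract properties. The GUID and integer ID identify the standard Author property.

```sql
-- Register the Author property in a search property list.
CREATE SEARCH PROPERTY LIST DocumentProperties;
GO
ALTER SEARCH PROPERTY LIST DocumentProperties
ADD 'Author'
WITH ( PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
       PROPERTY_INT_ID = 4,
       PROPERTY_DESCRIPTION = 'Author of the document' );
GO

-- Attach the list to the full-text index so that the property is indexed.
ALTER FULLTEXT INDEX ON dbo.Documents
SET SEARCH PROPERTY LIST DocumentProperties;
GO

-- Find documents whose Author property contains "Smith".
SELECT DocumentID, Title
FROM dbo.Documents
WHERE CONTAINS(PROPERTY(Content, 'Author'), 'Smith');
GO
```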

Language Support
Full-text search supports around 50 languages and distinguishes between dialects of the same language,
such as American English and British English. For each language, the following components are used to
analyze and index text:

 Word breakers and stemmers. A word breaker separates text into individual words by locating word
boundaries, such as spaces and periods. A stemmer conjugates verbs to ensure that different forms of
the same word match.

 Stoplists. A stop word or noise word is one that does not help the search. A stoplist is a list of stop
words for a given language. For example, in English, no one would search for the word “the” so it is
removed from the index.

 Thesaurus files. Thesaurus files list synonyms to ensure that thesaurus searches match words that
mean the same thing as the search term.

 Filters. Filters are components that understand the structure of a particular file type, such as a Word
document or an Excel spreadsheet. Filters enable property-scoped searches by enabling SQL Server to
index the properties of those file types.

You can use the sys.fulltext_languages catalog view to determine which languages are supported by full-
text searches on your database server. For full details of this catalog view, see Microsoft Docs:

sys.fulltext_languages (Transact-SQL)
http://aka.ms/Hvzcir
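
The language components described in this topic can be inspected with catalog views and a dynamic management function. The following sketch uses English (LCID 1033) as an example; the output depends on the languages, filters, and thesaurus files installed on your server.

```sql
-- List the languages available to full-text search on this instance.
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;
GO

-- List the file types that have a filter installed.
SELECT document_type, path
FROM sys.fulltext_document_types
ORDER BY document_type;
GO

-- List the stop words in the system stoplist for English.
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033;
GO

-- Show how the English word breaker and stemmer expand the word "run".
-- Arguments: query string, LCID, stoplist ID (0 = system stoplist), accent sensitivity (0 = off).
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser('FORMSOF(INFLECTIONAL, run)', 1033, 0, 0);
GO
```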

Components and Architecture of Full-Text Search

Full-Text Search Components


Multiple components work together to implement
Full-Text Search. Some of these components are
involved in creating the index, and some in
executing queries.

Most of the components of SQL Server Full-Text
Search run as part of the SQL Server process
(sqlservr.exe). However, for security reasons,
protocol handlers, filters, and word breakers are
isolated in a separate process called the filter
daemon host (fdhost.exe).

The components that run in the SQL Server process include:

 Full-Text Gatherer. This component runs the crawl threads that process content.

 Full-Text Engine. This component is part of the SQL Server query processor. It receives full-text
queries and communicates with the index to locate matching items.
 Thesaurus. This component stores the language-dependent lists of synonyms that enable thesaurus
search.

 Stoplist. This component defines stop words and removes them from queries and full-text indexes.
 Indexer. This component compiles the full-text index in a format that optimizes query delivery.

The components that run in the filter daemon host include:


 Protocol Handlers. These components communicate with the location where content is stored.
Often, this location is a table in the database, but it can also be a file share and other types of
location.

 Filters. These components analyze file structures, and locate file properties and body text.
 Word breakers. These components look for word boundaries in a specific language, such as spaces,
commas, and periods.
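
To see some of these components on a running instance, you can query the full-text catalog views and dynamic management views. The following is a sketch only; the rows returned depend on which filters are installed and on whether a filter daemon host is currently running:

```sql
-- Filters registered with the instance, one row per supported document type
SELECT document_type, path
FROM sys.fulltext_document_types
ORDER BY document_type;

-- Filter daemon host processes that are currently running
SELECT fdhost_name, fdhost_process_id
FROM sys.dm_fts_fdhosts;
```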

Indexing Process
A full-text population operation is also called a crawl and can be initiated by a change to one of the
indexed columns or on a schedule. When a crawl is initiated, these are the steps that are followed:

1. The full-text engine notifies the filter daemon host that a crawl is underway.

2. The full-text gatherer initiates crawl threads, each of which begins to crawl content and pass it to
different components for processing.

3. The full-text engine reads a large quantity of data into memory in the form of user tables. These
tables are the raw content of the character-based data in the columns of the index. Depending on the
storage location, different protocol handlers are used to obtain this text.

4. Crawl threads pass BLOBs to filters. These analyze the content of the file and return text from the
body and metadata fields.

5. Crawl threads pass text to word breakers that split long strings into words.

6. Individual word lists are passed to the indexer.

7. The indexer calls the stoplist to remove noise words from the index.

8. The indexer creates an inverted list of words and their locations in the columns, and stores this list in
the full-text index.
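
The inverted list produced by these steps can be inspected after a crawl completes. The following sketch assumes that the current database contains a full-text indexed table named HumanResources.Employees, which is the example table used in this lesson:

```sql
-- Show the most common indexed terms and how many rows contain each one
SELECT TOP (20) display_term, column_id, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID(N'HumanResources.Employees'))
ORDER BY document_count DESC;
```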

Query Process
When a user executes a full-text query, the SQL Server Query Processor passes the request to the Full-Text
Engine. The Full-Text Engine takes different steps to compile the query, depending on the type of search
that was requested. For example:

 If the query is a generation term search, the Full-Text Engine performs stemming to identify alternate
forms of the search terms.

 If the query is a thesaurus search, the Full-Text Engine calls the thesaurus to identify synonyms.

 If the query includes phrases, the Full-Text Engine calls word breakers.

 The Full-Text Engine removes noise words by calling the stoplist.


 The Full-Text Engine represents the query in the form of SQL operations—primarily as Streaming
Table-Valued Functions (STVFs).
Once query compilation has been completed, the SQL operations are executed against the full-text index
to retrieve results, which are returned to the client.
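
You can observe part of this compilation with the sys.dm_fts_parser function, which returns the terms that the word breaker, stemmer, and stoplist produce for a search condition. This sketch assumes English (LCID 1033), the system stoplist (stoplist ID 0), and accent-insensitive matching; note that calling the function requires membership in the sysadmin role:

```sql
-- Show the inflectional forms generated for a generation term search
SELECT display_term, expansion_type, source_term
FROM sys.dm_fts_parser(N'FORMSOF(INFLECTIONAL, analyze)', 1033, 0, 0);
```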

For more information about full-text search architecture, see Microsoft TechNet:

Full-Text Search Architecture


https://aka.ms/Ulbrhm

Configuring Full-Text Search


To implement Full-Text Search with optimal performance, consider the following issues when you
configure the full-text components:

 Supported data types. Full-text indexes can only include columns with the following data types:
char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, and varbinary(max), including
varbinary(max) with FILESTREAM enabled.

 Unique column. A full-text index requires a unique, non-null column as a key index. For the best
performance, this column should be of the integer data type.

 Table support. You can only create one full-text index for each database table, but the index can
include multiple columns from that table.

 Language support. A single full-text index can include text in multiple languages. You specify a
single language for each column in the index.
 Filegroup placement. Full-text crawls are disk-intensive operations, so you should consider creating
a dedicated filegroup for full-text indexes. For maximized performance, separate this filegroup onto
its own physical disk.

 Managing updates. By default, the Full-Text Engine is configured to update the index continuously
as changes are made to the underlying column data. This ensures that the index is always up to date.
However, you may wish to schedule crawls to take place during off-peak hours or to manually initiate
crawls. Schedules use the SQL Server Agent service to initiate crawls. Remember that the index may
fall out of synchronization with the column data if a crawl has not taken place recently.
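
As a sketch of manual update management, the following statements turn off automatic propagation of changes and then start a crawl on demand. The table name HumanResources.Employees is the example table used in this lesson and is assumed to have a full-text index already:

```sql
-- Track changes, but do not apply them to the index automatically
ALTER FULLTEXT INDEX ON HumanResources.Employees
    SET CHANGE_TRACKING MANUAL;
GO

-- Apply the tracked changes now (for example, from a SQL Server Agent job)
ALTER FULLTEXT INDEX ON HumanResources.Employees
    START UPDATE POPULATION;
GO

-- Check progress; a value of 0 means that the population is idle
SELECT OBJECTPROPERTYEX(OBJECT_ID(N'HumanResources.Employees'),
                        'TableFulltextPopulateStatus') AS populate_status;
```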

Creating a Full-Text Catalog


The first step you must take to configure Full-Text Search is to create a full-text catalog. This is a logical
grouping for full-text indexes in the database.

The following code creates a full-text catalog in the HumanResources database:

Creating a Full-Text Catalog


USE HumanResources;
GO
CREATE FULLTEXT CATALOG HRCatalog;
GO

Creating a Full-Text Index


To create a full-text index, use the CREATE FULLTEXT INDEX statement. A full-text index has no name of
its own; it is identified by the table on which it is created. You must supply the following information:
 The name of the table on which to create the index.

 The names of the columns to include in the index.

 The name of the unique index that will act as the key index.

 Optionally, a language for each column in the index.


The following code creates a full-text index in the HRCatalog. It uses the unique index on the EmployeeID
column as the key index (the KEY INDEX clause takes the name of that index, which in this example is also
EmployeeID) and includes the FullName, EmailAddress, Skills, and Resume columns. The document type for
Resume files is specified in the file_type column:

Creating a Full-Text Index


CREATE FULLTEXT INDEX ON HumanResources.Employees
(
    FullName LANGUAGE 1033,
    EmailAddress LANGUAGE 1033,
    Skills LANGUAGE 1033,
    Resume TYPE COLUMN [file_type] LANGUAGE 1033
)
KEY INDEX EmployeeID
ON HRCatalog;
GO
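
The key index must already exist as a unique, single-column, non-nullable index before the full-text index is created. The following sketch shows how such an index might be created, and how to confirm afterward which columns were indexed. The index name UX_Employees_EmployeeID is hypothetical and would replace EmployeeID in the KEY INDEX clause above:

```sql
-- Hypothetical unique index to act as the full-text key index
CREATE UNIQUE INDEX UX_Employees_EmployeeID
    ON HumanResources.Employees (EmployeeID);
GO

-- After the full-text index exists, list its columns and their languages
SELECT COL_NAME(ic.object_id, ic.column_id) AS column_name,
       ic.language_id
FROM sys.fulltext_index_columns AS ic
WHERE ic.object_id = OBJECT_ID(N'HumanResources.Employees');
```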

Querying a Full-Text Index


To execute a query that makes use of a full-text
index, you must use the CONTAINS or FREETEXT
predicates, often in a WHERE clause. Alternatively,
you can use the CONTAINSTABLE or
FREETEXTTABLE functions, which return tables of
rows that match your search and can be used in
FROM clauses.

CONTAINS and FREETEXT Predicates


The CONTAINS and FREETEXT predicates return
TRUE or FALSE. You can use them in WHERE
clauses or HAVING clauses of a SELECT statement
to ensure that only matching rows are included in
the result set. Other predicates, such as LIKE or BETWEEN, can be used in the same clause if required.
When you use CONTAINS or FREETEXT, you can specify a single column in the index, a list of columns, or
that all columns in the index should be searched.
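
The three ways of naming columns can be sketched as follows. The search term SQL is chosen only for illustration, and the columns are those of the example full-text index on HumanResources.Employees:

```sql
-- Search a single column
SELECT EmployeeID FROM HumanResources.Employees
WHERE CONTAINS(Skills, 'SQL');

-- Search a list of columns (the list is enclosed in parentheses)
SELECT EmployeeID FROM HumanResources.Employees
WHERE CONTAINS((Skills, Resume), 'SQL');

-- Search every column in the full-text index
SELECT EmployeeID FROM HumanResources.Employees
WHERE CONTAINS(*, 'SQL');
```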

Use CONTAINS to locate precise or fuzzy matches to words and phrases. You can also use CONTAINS to
perform proximity term searches or weighted term searches.
In the following code, the query returns all Employees in the Sales department that have the phrase
“Team Management” in the Skills column. This is an example of a simple term search:

Simple Term Search


SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE Department = 'Sales'
AND CONTAINS(Skills, '"Team Management"');
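
Inside a CONTAINS search condition, a phrase of more than one word must be enclosed in double quotation marks. The same quoting is used for prefix terms, and simple terms can be combined with AND, OR, and AND NOT. The following sketch uses illustrative search terms that are not part of the course data:

```sql
-- Prefix term: matches words that begin with "manag", such as manager or managing
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE CONTAINS(Skills, '"manag*"');

-- Boolean combination of a phrase and a simple term
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE CONTAINS(Skills, '"Team Management" AND Budgeting');
```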

In the following example, the query returns all Employees with forms of the verb “analyze” in their résumé.
Results would include employees with the words “analyzing” and “analyzed” in their résumés. The Resume
column may be a BLOB column that uses the varbinary(max) data type, either with FILESTREAM enabled
or with BLOBs stored in the database.

Generation Term Search


SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE CONTAINS(Resume, ' FORMSOF (INFLECTIONAL, analyze) ');

Use FREETEXT to match the meaning, rather than the exact wording of single words, phrases, or
sentences. FREETEXT searches use the thesaurus to match meaning.

The following example uses the FREETEXT predicate to perform a thesaurus search:

Thesaurus Search
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE FREETEXT (Resume, 'Project Management' );
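
If you need thesaurus matching for a specific term while keeping the precision of CONTAINS, the FORMSOF clause also accepts the THESAURUS keyword. This is a sketch only; the synonyms it matches depend on the contents of the thesaurus file for the query language:

```sql
-- Match "manager" and any synonyms defined for it in the thesaurus file
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE CONTAINS(Resume, 'FORMSOF (THESAURUS, manager)');
```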

CONTAINSTABLE and FREETEXTTABLE Functions


These functions return tables of rows that match the full-text query you specify. You can reference these
tables in the FROM clause of a SELECT statement. The tables returned always include the following
columns:

 KEY. This column returns the unique value of the key index column of the full-text index.

 RANK. This column contains a rank value that describes how well the row matched the query. The
higher the rank value, the better the match.

In the following example, a weighted term search is performed by using the CONTAINSTABLE function.
The results are joined to the original table to return the rank value for each product:

Using CONTAINSTABLE for a Weighted Term Search


SELECT FT_TBL.Name, KEY_TBL.RANK
FROM Production.Product AS FT_TBL
INNER JOIN CONTAINSTABLE(Production.Product, Name,
'ISABOUT (frame WEIGHT (.8),
wheel WEIGHT (.4), tire WEIGHT (.2) )' ) AS KEY_TBL
ON FT_TBL.ProductID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;

In the following example, the FREETEXTTABLE function is called to perform a thesaurus search. The results
are joined with the original table to display the rank value with the search column:

Using the FREETEXTTABLE Function for a Thesaurus Search


SELECT KEY_TBL.RANK, FT_TBL.Description
FROM Production.ProductDescription AS FT_TBL
INNER JOIN
FREETEXTTABLE(Production.ProductDescription, Description,
'perfect all-around bike') AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;
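
Both functions also accept an optional final argument that limits the result to the highest-ranked matches, which can reduce the work done by the join. The following sketch returns only the five best matches; the search terms are illustrative:

```sql
-- Return only the five highest-ranked matches (top_n_by_rank = 5)
SELECT FT_TBL.Name, KEY_TBL.RANK
FROM Production.Product AS FT_TBL
INNER JOIN CONTAINSTABLE(Production.Product, Name,
    'frame OR wheel', 5) AS KEY_TBL
    ON FT_TBL.ProductID = KEY_TBL.[KEY]
ORDER BY KEY_TBL.RANK DESC;
```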

For more information about CONTAINS, FREETEXT, CONTAINSTABLE, and FREETEXTTABLE, see Microsoft
Docs:
Query with Full-Text Search
http://aka.ms/Ai7pzu

Using a NEAR Proximity Term


When you execute queries that include CONTAINS or CONTAINSTABLE, you can search for rows that
contain two words or phrases close to each other in the full-text index. To do this, you use the NEAR
keyword, also known as the custom proximity term.

When you use the custom proximity term, you can specify the maximum number of nonsearch terms that
separate your search terms. This is known as the maximum distance between search terms. You can also
define whether the returned result must contain the search terms in the specified order.

In the following example, the query will return employees if the words “Project” and “Management”
appear with five or fewer terms separating them in the Resume column:

Using the Custom Proximity Predicate


SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE CONTAINS(Resume, 'NEAR((Project, Management), 5)');

For more information on the custom proximity term, see Microsoft Docs:

Search for Words Close to Another Word with NEAR


http://aka.ms/Abw20y
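
To require that the search terms appear in the specified order, add TRUE as the third argument of the custom proximity term. The following sketch is a variation of the previous query that matches only when “Project” appears before “Management”:

```sql
-- Match only when "Project" precedes "Management" within five terms
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE CONTAINS(Resume, 'NEAR((Project, Management), 5, TRUE)');
```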

Demonstration: Configuring and Using Full-Text Search


In this demonstration, you will see how to create a full-text index and use it for generation term searches.

Demonstration Steps
Create and Use a Full-Text Index

1. In Solution Explorer, double-click the 3 - Configuring and Using Full-Text Search.sql script file.

2. Select the code under the Step 1 comment, and then click Execute.

3. Select the code under the Step 2 comment, and then click Execute.
4. Select the code under the Step 3 comment, and then click Execute.

5. Select the code under the Step 4 comment, and then click Execute.

6. Select the code under the Step 5 comment, and then click Execute.

7. Select the code under the Step 6 comment, and then click Execute.

8. Close Microsoft SQL Server Management Studio, without saving any changes.

Enabling and Using Semantic Search


Semantic Search provides deep insight into
character-based data, including the data stored in
large BLOBs, by extracting and indexing
statistically relevant key phrases. You can use
Semantic Search to identify documents that are
similar or related, based on the meaning of their
content. It is this emphasis on meaning, rather
than individual search terms, which separates
Semantic Search from Full-Text Search.

What is Semantic Search?


Semantic Search extends the capabilities of Full-
Text Search so you can identify documents that
are similar or related in some way. For example, you could use Semantic Search to identify the résumés in
a FileTable that relate to a specific job role. Although a standard full-text query will reveal résumés that
contain the keywords or phrases in the search term, it may miss relevant résumés whose authors did not
use those keywords. By identifying deeper patterns of meaning, Semantic Search can provide a result set
that more accurately matches the search query.

Semantic Search uses a database named the Semantic Language Statistics database, which contains the
statistical models that are used to perform semantic searches.

Note: Semantic Search does not support as many languages as a full-text index. To view
the list of supported languages for Semantic Search, query the sys.fulltext_semantic_languages
catalog view.
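
As a brief sketch, the following queries list the languages that Semantic Search supports and show whether a Semantic Language Statistics database has been registered on the instance. The second query returns no rows if the database has not yet been installed and registered:

```sql
-- Languages supported by Semantic Search
SELECT lcid, name
FROM sys.fulltext_semantic_languages
ORDER BY name;

-- The registered Semantic Language Statistics database, if any
SELECT database_id, register_date
FROM sys.fulltext_semantic_language_statistics_database;
```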

Enabling Semantic Search Functionality


Semantic Search is an extension to Full-Text Search and uses full-text indexes. In addition, you must install
the Semantic Language Statistics database, which contains the information SQL Server uses to analyze
meaning in text. You must also modify a full-text index to support Semantic Search.

For more information on how to install the Semantic Language Statistics database, see Microsoft Docs:

Install and Configure Semantic Search


http://aka.ms/Wuhba4

After the Semantic Language Statistics database is configured, you can use the CREATE FULLTEXT INDEX
statement or the ALTER FULLTEXT INDEX statement to create a full-text index that includes Semantic
Search.

The following code example adds Semantic Search to an existing full-text index on the Document table in
the AdventureWorks database:

Using ALTER FULLTEXT INDEX to Add Semantic Search


ALTER FULLTEXT INDEX ON Production.Document
ALTER COLUMN Document
ADD Statistical_Semantics;
GO
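
The following sketch shows how you might confirm that semantic indexing is enabled for a column, and how the same option is written in a CREATE FULLTEXT INDEX statement. The names dbo.Contracts, Body, file_type, PK_Contracts, and ContractsCatalog are hypothetical and are used only to illustrate the syntax:

```sql
-- Check which indexed columns of Production.Document use semantic indexing
SELECT COL_NAME(object_id, column_id) AS column_name,
       statistical_semantics
FROM sys.fulltext_index_columns
WHERE object_id = OBJECT_ID(N'Production.Document');
GO

-- Enable semantic indexing when the index is first created (hypothetical names)
CREATE FULLTEXT INDEX ON dbo.Contracts
(
    Body TYPE COLUMN file_type LANGUAGE 1033 STATISTICAL_SEMANTICS
)
KEY INDEX PK_Contracts
ON ContractsCatalog;
GO
```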

Using Semantic Search Functions in Queries


Once you have configured Semantic Search, you can call three functions in queries. These
functions identify key phrases in a document, and documents that are similar to each other
because they share these key phrases. The functions are as follows:
 SEMANTICKEYPHRASETABLE. This function returns a table of key phrases from the column or
columns that you specify. These columns can include BLOBs in varbinary(max) columns, including
FILESTREAM columns and document columns in FileTables. The returned table includes the following
columns:

o Document_key. This is the key index value for the returned document in the underlying full-text
index.
o Keyphrase. This is the phrase that the search has identified as key to the meaning of the
document.
o Score. This is a weighting value that indicates the importance of the phrase in the document. The
value is between 0 and 1.

 SEMANTICSIMILARITYTABLE. This function returns a table of documents that are semantically
related to a specified document in the same full-text indexed table. You specify the table to search,
the column or columns to search, and the key index of the document to compare. The returned table
includes the following columns:

o Matched_document_key. This is the key index value for the matched document.

o Score. This is a weighting value that indicates the closeness of the match. The value is between 0
and 1.

 SEMANTICSIMILARITYDETAILSTABLE. This function returns the key phrases that make two
documents similar. Having used SEMANTICSIMILARITYTABLE to find similar documents, you can use this
function to determine the phrases that the similar documents share. The returned table includes the
following columns:
o Keyphrase. This is the phrase that is shared between the two documents you have specified.

o Score. This is a weighting value that indicates how important this key phrase is in its similarity
between the two documents.

The following example uses the SEMANTICKEYPHRASETABLE function to return the top 10 key phrases
from a specific document in the Resume column of the Employees table. The document is specified by
using the @EmployeeId parameter, which is the key index of a row in the Employees table.

SEMANTICKEYPHRASETABLE
SELECT TOP(10) KeyPhraseTable.keyphrase
FROM SEMANTICKEYPHRASETABLE
(
HumanResources.Employees,
Resume,
@EmployeeId
) AS KeyPhraseTable
ORDER BY KeyPhraseTable.score DESC;
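
The other two functions follow the same pattern. The following sketch finds the résumés most similar to one employee's résumé, and then lists the key phrases that two résumés share. The key values 1 and 2 are hypothetical and stand for EmployeeID values in the Employees table:

```sql
DECLARE @EmployeeId int = 1;        -- hypothetical source document key
DECLARE @OtherEmployeeId int = 2;   -- hypothetical matched document key

-- The five résumés most similar to the source résumé
SELECT TOP (5) s.matched_document_key, s.score
FROM SEMANTICSIMILARITYTABLE
(
    HumanResources.Employees,
    Resume,
    @EmployeeId
) AS s
ORDER BY s.score DESC;

-- The key phrases that explain the similarity between two résumés
SELECT TOP (5) d.keyphrase, d.score
FROM SEMANTICSIMILARITYDETAILSTABLE
(
    HumanResources.Employees,
    Resume, @EmployeeId,
    Resume, @OtherEmployeeId
) AS d
ORDER BY d.score DESC;
```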

Check Your Knowledge


Question

You have a full-text index set up on the HumanResources.Employees table that includes the Resume
column. You want to locate employees who have management skills. You want to search the Resume
column for the word “manage” and you want résumés with the words “manager”, “managed”, and
“managing” to be included in the results. What kind of search should you use?

Select the correct answer.

A simple term search.

A generation term search.

A proximity term search.

A thesaurus search.

A weighted term search.



Lab: Storing and Querying BLOBs and Text Documents in SQL Server
Scenario
Your manager has asked you to evaluate and optimize the performance of queries against the LargePhoto
column in the Production.ProductPhoto table. You have decided that, because BLOBs in this column
are often larger than 1 MB, it will be advantageous to create a FILESTREAM column and move the existing
data into the new column.

You have also been asked to create a FileTable, with a corresponding shared folder, so users can store
documents by using Word and other desktop applications. These files will be accessible through the file
share and database queries.

Finally, you have also been asked to create a full-text index on the Description column in the
Production.ProductDescription table so that generation term queries and thesaurus queries can be
used.

Objectives
At the end of this lab, you will be able to:
 Enable FILESTREAM and move BLOB data into a FILESTREAM column.

 Enable FileTables and create a FileTable.

 Create and query a full-text index.

Estimated Time: 45 minutes

Virtual machine: 20762C-MIA-SQL

User name: ADVENTUREWORKS\Student


Password: Pa55w.rd

Exercise 1: Enabling and Using FILESTREAM Columns


Scenario
Having decided to move BLOB data into a FILESTREAM column, you will now implement that strategy.

The main tasks for this exercise are as follows:


1. Prepare the Environment

2. Enable FILESTREAM for the SQL Server Instance

3. Enable FILESTREAM for the Database

4. Create a FILESTREAM Column

5. Move Data into the FILESTREAM Column

 Task 1: Prepare the Environment


Prepare the Lab Environment
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.

2. Run Setup.cmd in the D:\Labfiles\Lab16\Starter folder as Administrator.



 Task 2: Enable FILESTREAM for the SQL Server Instance


1. Using SQL Server Configuration Manager, enable FILESTREAM for the SQL Server (MSSQLSERVER)
instance. Enable FILESTREAM for file I/O access and give all remote clients access to FILESTREAM
data.

2. Restart the SQL Server (MSSQLSERVER) service.


3. Open the project file D:\Labfiles\Lab16\Starter\Project\Project.ssmssln and the T-SQL script Lab
Exercise 1.sql. Ensure that you are connected to the AdventureWorks database.

4. Write and execute a script that uses the sp_configure stored procedure to set the FILESTREAM access
level to 2.
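
One possible form of the script for step 4 is sketched below. An access level of 2 enables FILESTREAM for both Transact-SQL and Win32 streaming access:

```sql
-- Set the instance-level FILESTREAM access level
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO
```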

 Task 3: Enable FILESTREAM for the Database


1. In SQL Server Management Studio, write a query to add a FILESTREAM filegroup called
AdWorksFilestreamGroup to the AdventureWorks2016 database.

2. Write a query to add a file named D:\FilestreamData to the AdventureWorks database.
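
The two steps in this task might be written as follows. This is a sketch that assumes the database is named AdventureWorks2016; the logical file name FilestreamData is hypothetical. For a FILESTREAM filegroup, FILENAME refers to a folder that SQL Server creates, so D:\FilestreamData must not already exist:

```sql
-- Step 1: add a filegroup that can hold FILESTREAM data
ALTER DATABASE AdventureWorks2016
ADD FILEGROUP AdWorksFilestreamGroup CONTAINS FILESTREAM;
GO

-- Step 2: add the FILESTREAM data container to the filegroup
ALTER DATABASE AdventureWorks2016
ADD FILE (NAME = N'FilestreamData', FILENAME = N'D:\FilestreamData')
TO FILEGROUP AdWorksFilestreamGroup;
GO
```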

 Task 4: Create a FILESTREAM Column


1. In SQL Server Management Studio, write a query to add a new UNIQUEIDENTIFIER, non-nullable row
GUID column called PhotoGuid to the Production.ProductPhoto table.
2. Write a query to enable FILESTREAM for the Production.ProductPhoto table, and add BLOBs to the
AdWorksFilestreamGroup.

3. Write a query to add a new column called NewLargePhoto to the Production.ProductPhoto table.
Ensure the new column has FILESTREAM enabled and is a nullable varbinary(max) column.
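
The three steps in this task could be sketched as follows. The UNIQUE constraint and the NEWID() default on the row GUID column are assumptions; a table that stores FILESTREAM data needs a unique ROWGUIDCOL column, and the default populates the existing rows:

```sql
-- Step 1: add a non-nullable, unique row GUID column
ALTER TABLE Production.ProductPhoto
ADD PhotoGuid UNIQUEIDENTIFIER NOT NULL ROWGUIDCOL UNIQUE DEFAULT NEWID();
GO

-- Step 2: direct the table's FILESTREAM data to the new filegroup
ALTER TABLE Production.ProductPhoto
SET (FILESTREAM_ON = AdWorksFilestreamGroup);
GO

-- Step 3: add the nullable FILESTREAM column
ALTER TABLE Production.ProductPhoto
ADD NewLargePhoto VARBINARY(MAX) FILESTREAM NULL;
GO
```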

 Task 5: Move Data into the FILESTREAM Column


1. In SQL Server Management Studio, write a query to copy all data from the LargePhoto column into
the NewLargePhoto column.

2. Write a query to drop the LargePhoto column from the Production.ProductPhoto table.
3. Write a query that uses the sp_rename stored procedure to change the name of the
NewLargePhoto column to LargePhoto.

4. Close the Lab Exercise 1.sql script.
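
A minimal sketch of steps 1 to 3 follows. It assumes that no constraints or indexes depend on the original LargePhoto column; if any do, they must be dropped before the column can be removed:

```sql
-- Step 1: copy the existing BLOBs into the FILESTREAM column
UPDATE Production.ProductPhoto
SET NewLargePhoto = LargePhoto;
GO

-- Step 2: remove the original column
ALTER TABLE Production.ProductPhoto
DROP COLUMN LargePhoto;
GO

-- Step 3: give the FILESTREAM column the original name
EXEC sp_rename 'Production.ProductPhoto.NewLargePhoto', 'LargePhoto', 'COLUMN';
GO
```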

Results: At the end of this exercise, you will have:

Enabled FILESTREAM on the SQL Server instance.

Enabled FILESTREAM on a database.

Moved data into the FILESTREAM column.



Exercise 2: Enabling and Using FileTables


Scenario
You have decided to allow users to be able to store files in the database, and will now implement that
strategy with FileTables.

The main tasks for this exercise are as follows:

1. Enable Nontransactional Access

2. Create a FileTable

3. Add a File to the FileTable

 Task 1: Enable Nontransactional Access


1. In SQL Server Management Studio, open the T-SQL script Lab Exercise 2.sql.

2. Write a query that uses the sys.database_filestream_options system view to display whether
nontransacted access is enabled for each database in the instance.

3. Write a query that enables nontransacted access for the AdventureWorks2016 database. Set the
nontransacted access level to full and the directory name to “FileTablesDirectory”.
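
Steps 2 and 3 might be written as shown in the following sketch, which assumes the database name AdventureWorks2016:

```sql
-- Step 2: show the nontransacted access setting for each database
SELECT DB_NAME(database_id) AS database_name,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options;
GO

-- Step 3: enable full nontransacted access and set the database directory
ALTER DATABASE AdventureWorks2016
SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
                DIRECTORY_NAME = N'FileTablesDirectory');
GO
```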

 Task 2: Create a FileTable


1. In SQL Server Management Studio, write a query to create a new FileTable in the
AdventureWorks2016 database. Name the FileTable “DocumentStore” and use a FileTable
directory named “DocumentStore”.
2. Write a query that uses the sys.database_filestream_options system view to display whether
nontransacted access is enabled for each database in the instance.
3. Write a query that uses the sys.filetables system view to list the FileTables in the AdventureWorks
database.
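
The following sketch covers steps 1 and 3 of this task. Creating the FileTable in the dbo schema is an assumption. The sys.filetables view does not include a table name column, so the sketch derives the name from object_id:

```sql
-- Step 1: create the FileTable and its directory
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'DocumentStore');
GO

-- Step 3: list the FileTables in the current database
SELECT OBJECT_NAME(object_id) AS filetable_name,
       directory_name,
       is_enabled
FROM sys.filetables;
GO
```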

 Task 3: Add a File to the FileTable


1. In SQL Server Management Studio, write a query that uses the FileTableRootPath() function to find
the path to the file share for the DocumentStore filetable.

2. Copy and paste the path you determined into the address bar of a new File Explorer window.

3. Create a new text document called DocumentStoreTest in the file table shared folder.

4. In SQL Server Management Studio, write a query that displays all rows in the DocumentStore FileTable.
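
Steps 1 and 4 could be sketched as follows, assuming the FileTable was created as dbo.DocumentStore. The columns selected in the second query are a subset of the fixed schema that every FileTable has:

```sql
-- Step 1: return the UNC path of the FileTable's shared folder
SELECT FileTableRootPath(N'dbo.DocumentStore') AS root_path;

-- Step 4: display the files stored in the FileTable
SELECT name, file_type, cached_file_size, creation_time
FROM dbo.DocumentStore;
```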

Results: At the end of this exercise, you will have:

Enabled nontransactional access.

Created a FileTable.

Added a file to the FileTable.



Exercise 3: Using a Full-Text Index


Scenario
You will now create a full-text index on the Description column in the Production.ProductDescription
table so that generation term queries and thesaurus queries can be used.

The main tasks for this exercise are as follows:

1. Create a Full-Text Index

2. Using a Full-Text Index

 Task 1: Create a Full-Text Index


1. In SQL Server Management Studio, open the T-SQL script Lab Exercise 3.sql.

2. Execute the first SELECT query in the Lab Exercise 3.sql script file, which lists the tables that have a
full-text index in the AdventureWorks2016 database.

3. Write a query that creates a new full-text catalog in the AdventureWorks2016 database with the
name ProductFullTextCatalog.
4. Write a query that creates a new unique index called ui_ProductDescriptionID and indexes the
ProductDescriptionID column in the Production.ProductDescription table.

5. Write a query that creates a new full-text index on the Description column of the
Production.ProductDescription table. Use the ui_ProductDescriptionID unique index and the
ProductFullTextCatalog.
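
Steps 3 to 5 might be written as shown in this sketch. No LANGUAGE clause is specified, so the column uses the default full-text language of the instance:

```sql
-- Step 3: create the full-text catalog
CREATE FULLTEXT CATALOG ProductFullTextCatalog;
GO

-- Step 4: create the unique index that will act as the key index
CREATE UNIQUE INDEX ui_ProductDescriptionID
    ON Production.ProductDescription (ProductDescriptionID);
GO

-- Step 5: create the full-text index on the Description column
CREATE FULLTEXT INDEX ON Production.ProductDescription
(
    Description
)
KEY INDEX ui_ProductDescriptionID
ON ProductFullTextCatalog;
GO
```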

 Task 2: Using a Full-Text Index


1. In SQL Server Management Studio, write a script that executes a simple term query against the
Description column in the Production.ProductDescription table. Locate rows that contain the word
“Bike”. Make a note of the number of rows returned.

2. Write a script that executes a generation term query against the Description column in the
Production.ProductDescription table. Locate rows that contain the word “Bike”. Make a note of the
number of rows returned.

3. Write a script that returns rows from the previous generation term query but not rows from the
previous simple term query. Examine the Description text for these results.

4. Close Microsoft SQL Server Management Studio, without saving any changes.
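
One way to approach steps 1 to 3 is sketched below. Using EXCEPT for the third query is an assumption about the intended technique; it returns the rows matched by the generation term query that the simple term query did not match:

```sql
-- Step 1: simple term query
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, 'Bike');

-- Step 2: generation term query
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, 'FORMSOF(INFLECTIONAL, Bike)');

-- Step 3: rows found only by the generation term query
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, 'FORMSOF(INFLECTIONAL, Bike)')
EXCEPT
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, 'Bike');
```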

Results: At the end of this exercise, you will have created a full-text index.

Question: How did the results of the simple term query you executed in Exercise 3, Task 2
differ from the results of the generation term query?

Question: What did you notice about the results of the third query you ran against the full-
text index?

Module Review and Takeaways


In this module, you have seen the different approaches database administrators can take to the storage of
BLOBs, such as images, documents, and videos. You have learned how to enable and use the advanced
FILESTREAM technology and FileTables for BLOBs. You have also learned how to configure and use
Full-Text Search.
