Module 13
Implementing Managed Code in SQL Server
Contents:
Module Overview
Lesson 1: Introduction to CLR Integration in SQL Server
Module Overview
As a SQL Server® professional, you are likely to be asked to create databases that meet business needs.
Most requirements can be met using Transact-SQL. However, occasionally you may need additional
capabilities that can only be met by using common language runtime (CLR) code.
As functionality is added to SQL Server with each new release, the necessity to use managed code
decreases. However, there are times when you might need to create aggregates, stored procedures,
triggers, user-defined functions, or user-defined types. You can use any .NET Framework language to
develop these objects.
In this module, you will learn how to use CLR managed code to create user-defined database objects for
SQL Server.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to CLR Integration in SQL Server
Occasionally, you might want to extend the built-in functionality of SQL Server; for example, adding a new
aggregate to the existing list of aggregates supplied by SQL Server.
CLR integration is one method for extending SQL Server functionality. This lesson introduces CLR
integration in SQL Server, and its appropriate use cases.
Lesson Objectives
After completing this lesson, you will be able to:
This module focuses on using CLR managed code to extend SQL Server functionality.
Developing SQL Databases 13-3
.NET Framework
The .NET Framework is a layer of software that sits above the Win32 and Win64 APIs, and provides a layer
of abstraction above the underlying complexity. The .NET Framework is object-oriented and written in a
consistent fashion to a tightly defined set of design guidelines. Many people describe it as appearing to
have been “written by one brain.” It is not specific to any one programming language, and contains
thousands of prebuilt and pretested objects. These objects are collectively referred to as the .NET
Framework class libraries.
The .NET Framework is generally well regarded amongst developers, making it a good choice for building
code to extend SQL Server.
Security features to ensure that managed code will not compromise the server.
The ability to create new resources by using .NET Framework languages such as Microsoft Visual C#
and Microsoft Visual Basic .NET.
Memory Management
Managing memory allocation was a problem when developing directly to the Win32 and Win64 APIs.
Component Object Model (COM) programming preceded the .NET Framework, using reference counting
to release memory that was no longer needed. COM would work something like this:
3. Object C acquires a reference to Object B. Object B then notes that it has two references.
4. Object C releases its reference. Object B then notes that it has one reference.
5. Object A releases its reference to Object B. Object B then notes that it has no references, so it is
destroyed.
The problem was that it was easy to create situations where memory could be lost. Consider a circular
reference: if two objects hold references to each other, and nothing else references either of them, neither
reference count ever reaches zero, so both objects continue to consume memory even though they can no
longer be used. This causes a leak, or loss, of the memory. Over time, code that leaks in this way can
exhaust the available memory, resulting in instability and crashes. This is obviously highly undesirable
when integrating code into SQL Server.
The .NET Framework includes a sophisticated memory management system, known as garbage
collection, to avoid memory leaks. There is no reference counting; instead, the CLR periodically checks
which objects are “reachable” and disposes of the other objects.
Type Safety
Type-safe code accesses memory in a properly structured way.
Type safety is a problem with Win32 and Win64 code. When a function or procedure is called, all that is
known to the caller is the function’s address in memory. The caller assembles a list of required parameters,
places them in an area called the stack, and jumps to the memory address of the function. Problems can
arise when the design of the function changes, but the calling code is not updated. The calling code can
then refer to memory locations that do not exist.
The .NET CLR is designed to avoid such problems. Objects are isolated from one another and can only
access the memory allocations for which they have permissions. In addition to providing address details of
a function, the CLR also provides the function’s signature. The signature specifies the data types of each of
the parameters, and their order. The CLR will not permit a function to be called with the wrong number or
types of parameters.
Stored procedures.
User-defined aggregates.
Although you can create objects using managed code, it does not necessarily mean that you should.
Transact-SQL should be used most of the time, with managed code used only when necessary.
Portability
Upgrading a database system that includes
managed code can be more complicated. From
time to time, SQL Server must be upgraded when
old versions come to the end of their life, and
managed code may or may not work with newer
versions, depending on functionality. This can also
be an issue with Transact-SQL, but Transact-SQL is
more likely to have an upgrade path. CLR
managed code is also dependent on the .NET Framework version installed on the server.
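Before an upgrade, it can help to confirm which CLR version the instance is hosting and whether CLR integration is enabled. The following is a minimal sketch, assuming you have the VIEW SERVER STATE permission; it only reads metadata and changes nothing:

```sql
-- Report properties of the CLR hosted by this instance
-- (the rows typically include the CLR directory, version, and state).
SELECT name, value
FROM sys.dm_clr_properties;

-- Check whether CLR integration is currently enabled on the instance.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'clr enabled';
```

Comparing these results between the old and new servers is one simple way to spot a .NET Framework mismatch before managed code is redeployed.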
Maintainability
Database administrators (DBAs) generally have a good knowledge of Transact-SQL, but little or no
knowledge of C# or Visual Basic. Adding managed code to a database system means that additional
expertise may be required to maintain the system. For larger organizations that already employ
developers, this may not be a problem. Organizations that rely on DBAs to support their SQL Server
databases may find that adding managed code creates a split in expertise that, over time, causes
problems.
Three-Tier Architecture
Transact-SQL is designed as an efficient language to work with relational database tables. If you have an
extensive need for managed code, consider the three-tier architecture for your system. Each tier is
constructed separately, possibly by different teams with different skills. There is a boundary between each
tier, so that each one can be properly and independently tested. Each tier is built using the development
tools best suited to its needs. This separation of concerns creates systems that are more maintainable, and
faster to develop.
A typical three-tier architecture might be composed of a:
Database tier. The tables, views, stored procedures and other database objects.
Mid tier or business tier. The data access objects and other code that manage the business logic. As
the name suggests, the mid tier (or business tier) sits between the database tier and the presentation
tier.
Presentation tier. This is the user interface tier, which might include forms for data input, reports,
and other content.
Transact-SQL
Transact-SQL is the primary method for manipulating data within databases. It is designed for direct data
access and has many built-in functions. However, Transact-SQL is not a fully fledged, high-level
programming language. It is not object-oriented so, for example, you cannot create a stored procedure
that takes a parameter of an animal data type and pass a parameter of a cat data type to it. Also,
Transact-SQL is not designed for tasks such as intensive calculations or string handling.
Managed Code
Managed code provides full object-oriented capabilities, although this only applies within the managed
code itself. Managed code works well within SQL Server when used sparingly; otherwise you should
consider using a mid tier.
General Rules
Two good general rules apply when you are choosing between using Transact-SQL and managed code:
Some specialist calculations, strings, or external access might require managed code.
Scalar UDFs
Some scalar user-defined functions (UDFs) that are
written in Transact-SQL cause performance
problems. Managed code can provide an
alternative way of implementing scalar UDFs,
particularly when the function does not depend
on data access.
Table-Valued UDFs
Data-related table-valued UDFs are generally best implemented using Transact-SQL. However, table-
valued UDFs that have to access external resources, such as the file system, environment variables, or the
registry, might be candidates for managed code. Consider whether this functionality properly sits within
the database layer, or whether it should be handled outside of SQL Server.
Stored Procedures
With few exceptions, stored procedures should be written in Transact-SQL. The exceptions to this are
stored procedures that have to access external resources or perform complex calculations. However, you
should consider whether code that performs these tasks should be implemented within SQL Server at all—
it might be better implemented in a mid tier.
DML Triggers
Almost all DML triggers are heavily oriented toward data access and should be written in Transact-SQL.
There are very few valid use cases for implementing DML triggers in managed code.
DDL Triggers
DDL triggers are also data-oriented. However, some DDL triggers have to do extensive XML processing,
particularly based on the XML EVENTDATA structure that SQL Server passes to these triggers. The more
that extensive XML processing is required, the more likely it is that the DDL trigger would be best
implemented in managed code. Managed code would also be a better option if the DDL trigger needed
to access external resources—but this is rarely a good idea within any form of trigger. Again, for any but
the lightest use, consider implementing a mid tier.
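For light use, Transact-SQL can read the EVENTDATA structure directly with the xml data type methods. The following sketch assumes a hypothetical trigger name, trg_ReportTableChanges, and simply prints the event type and object name; a trigger that needed far more XML processing than this is the kind that might justify managed code:

```sql
-- Hypothetical database-level DDL trigger, for illustration only.
CREATE TRIGGER trg_ReportTableChanges
ON DATABASE
FOR CREATE_TABLE, ALTER_TABLE, DROP_TABLE
AS
BEGIN
    -- EVENTDATA() returns an xml value describing the DDL event.
    DECLARE @Event xml = EVENTDATA();

    -- Extract two values from the EVENT_INSTANCE document and report them.
    PRINT CONCAT(
        @Event.value('(/EVENT_INSTANCE/EventType)[1]', 'nvarchar(100)'),
        N' on ',
        @Event.value('(/EVENT_INSTANCE/ObjectName)[1]', 'nvarchar(256)'));
END;
GO

-- To remove the sketch afterwards:
-- DROP TRIGGER trg_ReportTableChanges ON DATABASE;
```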
User-Defined Aggregates
Transact-SQL has no concept of user-defined aggregates. You have to implement these in managed code.
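Although the aggregate logic itself must be written in managed code, the aggregate is registered and called from Transact-SQL. The following sketch assumes that an assembly named ClrDemo has already been cataloged in the database and that it contains an aggregate class named StringAggregate; both names, and the parameter and return types, are assumptions for illustration:

```sql
-- Bind a Transact-SQL aggregate name to a class in a cataloged assembly.
-- ClrDemo and StringAggregate are hypothetical names.
CREATE AGGREGATE dbo.StringAggregate (@value nvarchar(4000))
RETURNS nvarchar(max)
EXTERNAL NAME ClrDemo.StringAggregate;
GO

-- Once created, the aggregate is used like a built-in aggregate.
SELECT dbo.StringAggregate(name) AS AllSchemaNames
FROM sys.schemas;
```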
Lesson 2
Implementing and Publishing CLR Assemblies
There are two ways to deploy a CLR assembly to a computer running SQL Server—either with Transact-
SQL scripts or using SQL Server Data Tools (SSDT). This lesson focuses on using SSDT to develop and
deploy CLR assemblies. Code examples have been written using C#.
Lesson Objectives
After completing this lesson, you will:
What Is an Assembly?
Managed code is deployed in SQL Server within an
assembly—a .dll file that contains the executable
code and a manifest. The manifest describes the
contents of the assembly, and the interfaces to the
assembly. SQL Server and other code can then
interrogate what the assembly contains and what
it can do.
Assemblies can contain other resources such as
icons, which are also listed in the manifest. In
general terms, assemblies can be either .exe files
or .dll files; however, SQL Server only works with
.dll files.
In this lesson, you will see how to use SSDT to create assemblies and publish them to SQL Server.
SAFE
SAFE assemblies have a limited permission set, and
only provide access to the SQL Server database in which they are cataloged. SAFE is the default permission
set; it is the most restrictive and secure. Assemblies with SAFE permissions cannot access the external
system; for example, network files, the registry, or other files external to SQL Server.
EXTERNAL_ACCESS
EXTERNAL_ACCESS is the permission set that is required to access local and network resources such as
environment variables and the registry. Assemblies with EXTERNAL_ACCESS permissions cannot be used
within a contained database.
UNSAFE
UNSAFE is the unrestricted permission set that should rarely, if ever, be used in a production environment.
UNSAFE is required for code that calls external unmanaged code, or code that holds state information
across function calls. UNSAFE assemblies cannot be used in a contained database.
You can flag the database as TRUSTWORTHY by using the ALTER DATABASE SET TRUSTWORTHY ON
statement. This is not recommended as, under certain circumstances, it could provide access for
malicious assemblies.
In the master database, create an asymmetric key from the assembly file, and then
create a login mapped to that key. Finally, grant the login the EXTERNAL ACCESS ASSEMBLY or UNSAFE
ASSEMBLY permission, as appropriate. This is the recommended method of granting permission to use the
EXTERNAL_ACCESS or UNSAFE permission sets.
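The asymmetric key approach might be scripted as follows. This is a sketch only: the key name, login name, and file path are assumptions, and the assembly must have been signed with a strong name key before it was built:

```sql
-- Sketch: ClrDemoKey, ClrDemoLogin, and the file path are hypothetical.
USE master;
GO

-- Import the public key from the signed assembly file.
CREATE ASYMMETRIC KEY ClrDemoKey
FROM EXECUTABLE FILE = N'D:\Demofiles\Mod13\ClrDemo\bin\Debug\ClrDemo.dll';
GO

-- Create a login that is mapped to the key.
-- It exists only to hold permissions and cannot be used to connect.
CREATE LOGIN ClrDemoLogin FROM ASYMMETRIC KEY ClrDemoKey;
GO

-- Grant the server-level permission that matches the required permission set.
GRANT EXTERNAL ACCESS ASSEMBLY TO ClrDemoLogin;
-- For the UNSAFE permission set, grant this instead:
-- GRANT UNSAFE ASSEMBLY TO ClrDemoLogin;
GO
```

Because the permission is tied to the signing key rather than to the database, the database does not need to be marked as TRUSTWORTHY.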
Setting Permissions
When you create an assembly using SSDT, the default permission set is SAFE. Alternatively, you can use
the Transact-SQL CREATE ASSEMBLY statement with the clause WITH PERMISSION_SET = {SAFE |
EXTERNAL_ACCESS | UNSAFE}.
To change the permission set using SSDT, use the Properties tab to set the Permission level. Right-click
the assembly name, and then click Properties.
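In Transact-SQL, the permission set can be stated when the assembly is cataloged and changed later. The following sketch assumes a hypothetical assembly name and file path:

```sql
-- Catalog an assembly with an explicit permission set.
-- The assembly name and file path are hypothetical.
CREATE ASSEMBLY ClrDemo
FROM N'D:\Demofiles\Mod13\ClrDemo\bin\Debug\ClrDemo.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Change the permission set of an assembly that is already cataloged.
ALTER ASSEMBLY ClrDemo
WITH PERMISSION_SET = EXTERNAL_ACCESS;
GO

-- Review the permission set of each user-defined assembly in the database.
SELECT name, permission_set_desc
FROM sys.assemblies
WHERE is_user_defined = 1;
```

Raising the permission set above SAFE succeeds only when the database or the assembly has been trusted by one of the methods described earlier.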
Creating an Assembly
http://aka.ms/Ijs5b1
SP_CONFIGURE
For security reasons, SQL Server does not allow CLR integration by default. To enable CLR integration, you
must set the clr enabled option to 1. This is set at the instance level using the sp_configure stored
procedure.
This example code first displays the current settings by using sp_configure, enables the display of
advanced options, and then sets the clr enabled option to 1. This allows CLR managed code to run within the
SQL Server instance:
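A minimal sketch of the sequence just described might look like the following. It assumes you are connected with permission to change instance-level configuration:

```sql
-- Display the current configuration settings.
EXEC sp_configure;
GO

-- Allow advanced options to be displayed.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
GO

-- Enable CLR integration for the instance.
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO
```

RECONFIGURE applies each change to the running instance; without it, the new value is stored but not yet in use.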
sys.sp_add_trusted_assembly
Trusted assemblies can be added to a “white list” using sp_add_trusted_assembly. Use
sp_add_trusted_assembly, sp_drop_trusted_assembly, and sys.trusted_assemblies to manage your
whitelist.
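The three objects can be used together as sketched below. The file path and description are assumptions for illustration; the hash is the SHA2_512 hash of the assembly binary, calculated here directly from the file:

```sql
-- Sketch: the file path and description are hypothetical.
DECLARE @Hash varbinary(64);

-- Calculate the SHA2_512 hash of the compiled assembly file.
SELECT @Hash = HASHBYTES('SHA2_512', BulkColumn)
FROM OPENROWSET(BULK N'D:\Demofiles\Mod13\ClrDemo\bin\Debug\ClrDemo.dll',
                SINGLE_BLOB) AS AssemblyFile;

-- Add the assembly to the list of trusted assemblies.
EXEC sys.sp_add_trusted_assembly
    @hash = @Hash,
    @description = N'ClrDemo';

-- Review the current list.
SELECT hash, description
FROM sys.trusted_assemblies;

-- Remove the assembly from the list when it is no longer required.
EXEC sys.sp_drop_trusted_assembly @hash = @Hash;
```

Because trust is based on the hash, any rebuild of the assembly produces a different hash and has to be added again.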
For more information about CLR strict security, see Microsoft Docs:
Aggregates
Stored procedures
Triggers
User-defined functions
User-defined types
2. In Visual Studio, on the File menu, point to New, and then click Project.
3. In the list of templates, click SQL Server and then click SQL Server Database Project. Select the .NET
Framework of your target SQL Server computer, enter a name and directory for your project and then
click OK.
4. To create a CLR object, add a new item to the project using the template for the language of the SQL
CLR object you want to create. The examples in this course are written in C#, so you would select SQL
CLR C#.
5. Select the type of CLR object you want to create. The available choices include aggregates, stored
procedures, triggers, user-defined functions, and data types.
7. When you have completed your CLR code, build the solution, correcting any errors that might occur.
8. Publish the CLR assembly, specifying the target database and connection information. You can then
use the CLR assembly within your SQL Server database.
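Publishing from SSDT generates the Transact-SQL that binds each CLR object to a database object. As a rough sketch of the equivalent script for a scalar function, assume that an assembly named ClrDemo has already been cataloged and that it contains a class named UserDefinedFunctions with a static HelloWorld method returning a string; these names are assumptions, not output copied from SSDT:

```sql
-- Bind a Transact-SQL function name to a method in a cataloged assembly.
-- The EXTERNAL NAME format is assembly.class.method; all three are assumed.
CREATE FUNCTION dbo.HelloWorld()
RETURNS nvarchar(4000)
AS EXTERNAL NAME ClrDemo.UserDefinedFunctions.HelloWorld;
GO

-- Call the function like any other scalar function.
SELECT dbo.HelloWorld();

-- List the database objects that are implemented by CLR assemblies.
SELECT a.name AS AssemblyName,
       OBJECT_NAME(am.object_id) AS ObjectName,
       am.assembly_class,
       am.assembly_method
FROM sys.assembly_modules AS am
JOIN sys.assemblies AS a
    ON a.assembly_id = am.assembly_id;
```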
Using SSDT, open the CLR assembly project, and in Solution Explorer, right-click on the project name, and
then click Properties. On the Project Settings page, amend the target platform. On the SQLCLR page,
set the permission level. Save the assembly before closing the Properties dialog box.
Publish an assembly.
Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are both running, and then
log on to 20762C-MIA-SQL as ADVENTUREWORKS\Student with the password Pa55w.rd.
3. In the User Account Control dialog box, click Yes, and then wait for the script to finish.
4. On the Start screen, type SQL Server Data Tools 2015, and then click SQL Server Data Tools 2015.
6. In the New Project dialog box, expand Templates, and then click SQL Server.
7. In the middle pane, click SQL Server Database Project, in the top pane, in the .NET Framework list,
click .NET Framework 4.6.
8. In the Name box, type ClrDemo, in the Location box, type D:\Demofiles\Mod13, and then click
OK.
9. In Solution Explorer, right-click ClrDemo, point to Add, and then click New Item.
10. In the Add New Item dialog box, in the Installed list, under SQL Server, click SQL CLR C#, in the
middle pane, click SQL CLR C# User Defined Function, in the Name box, type HelloWorld.cs, and
then click Add.
11. Locate the return statement, which is immediately below the comment // Put your code here.
15. On the Project Settings page, in the Target platform list, select SQL Server 2017.
16. On the SQLCLR page, in the Permission level list, note that SAFE is selected, then click Signing.
17. In the Signing dialog, select Sign the assembly. In the Choose a strong name key file box, select
New.
18. In the Create Strong Name Key dialog, in the Key file name box type ClrDemo. In the Enter
password box type Pa55w.rd then in the Confirm password box type Pa55w.rd then click OK, then
click OK.
22. Review the script in the create_login.sql pane. Observe that an asymmetric key is created from the
ClrDemo.dll that you just compiled, and that a login is created based on the imported asymmetric
key, then click Execute. When the script completes, close SQL Server Management Studio.
25. In the Connect dialog box, on the Browse tab, expand Local, and then click MIA-SQL.
26. In the Database Name list, click AdventureWorks2014, and then click Test Connection.
30. After a few seconds, a message will be displayed to say it has published successfully.
32. In the Connect to Server dialog box, in the Server name box, type MIA-SQL, and then click
Connect.
35. In the new query window, type the following code, and then click Execute.
SELECT dbo.HelloWorld();
Question: Do you use managed code in your SQL Server databases? Who maintains the
code when it needs amending?
Sequencing Activity
Put the following steps in order by numbering each to indicate the correct order:
Steps
Check which version of the .NET Framework is installed on the machine hosting your SQL Server.
Open Visual Studio and check that SSDT is installed.
Create a new project.
Amend the template with your new code.
Build the solution.
Publish the solution.
Create a Transact-SQL query using your new managed code function.
Objectives
After completing this lab you will be able to:
Password: Pa55w.rd
Results: After completing this lab, you will have determined which type of code to use for each new
feature.
5. Build the solution, and check that the solution builds without errors.
4. Review and execute the query to verify that the function works as expected.
5. Close the file, but keep SSMS and Visual Studio open for the next exercise.
Results: After completing this exercise, you will have a scalar-valued CLR function available in SQL Server
Management Studio.
2. Build the solution, and check that the solution builds without errors.
4. Review and execute the queries to verify that the function works as expected.
5. If time permits, you could test the StringAggregate function by using Test_StringAggregate.sql.
Results: After completing this exercise, you will have a table-valued CLR function available in SQL Server
Management Studio.
Question: After publishing managed code to a database, what do you think the issues are
with using it?
Implement and publish CLR assemblies using SQL Server Data Tools (SSDT).
Review Question(s)
Question: This module has reviewed the pros and cons of using managed code within a SQL
Server database. You have integrated some prewritten C# functions into a database and
tested them in some queries.
How might you use managed code in your own SQL Server environment? How do you assess
the pros and cons for your specific situation?
Module 14
Storing and Querying XML Data in SQL Server
Contents:
Module Overview
Lesson 1: Introduction to XML and XML Schemas
Module Overview
XML provides rules for encoding documents in a machine-readable form. It has become a widely adopted
standard for representing data structures, rather than sending unstructured documents. Servers that are
running Microsoft® SQL Server® data management software often need to use XML to interchange data
with other systems; many SQL Server tools provide an XML-based interface.
SQL Server offers extensive handling of XML, both for storage and querying. This module introduces XML,
shows how to store XML data within SQL Server, and shows how to query the XML data.
The ability to query XML data directly avoids the need to extract data into a relational format before
executing Structured Query Language (SQL) queries. To effectively process XML, you need to be able to
query XML data in several ways: returning existing relational data as XML, and querying data that is
already XML.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to XML and XML Schemas
Before you work with XML in SQL Server, this lesson provides an introduction to XML and how it is used
outside SQL Server. You will learn some core XML-related terminology, along with how you can use
schemas to validate and enforce the structure of XML.
This lesson also explores the appropriate uses for XML when you are working with SQL Server.
Lesson Objectives
After completing this lesson, you will be able to:
Determine appropriate use cases for XML data storage in SQL Server.
Question: Do you currently work with applications that use XML? If your application does
use XML, have you considered storing and processing that XML data on SQL Server?
Data Interchange
XML came to prominence as a format for
interchanging data between systems. It follows the
same basic structure rules as other markup
languages (such as HTML) and is used as a self-describing language.
XML Document
<?xml version="1.0" ?>
<?xml-stylesheet href="orders.xsl" type="text/xsl"?>
<orders>
  <order id="ord123456">
    <customer id="cust0921">
      <first-name>Dare</first-name>
      <last-name>Obasanjo</last-name>
      <address>
        <street>One Microsoft Way</street>
        <city>Redmond</city>
        <state>WA</state>
        <zip>98052</zip>
      </address>
    </customer>
  </order>
  <order id="ord123457">
    <customer id="cust0067">
      <first-name>Shai</first-name>
      <last-name>Bassli</last-name>
      <address>
        <street>567 3rd Ave</street>
        <city>Saginaw</city>
        <state>MI</state>
        <zip>53900</zip>
      </address>
    </customer>
  </order>
</orders>
Without any context and information, you can determine that this document holds the details about
customer orders, the customers who placed the order, and the customer’s name and address details. This
explains why XML is defined as a self-describing language. In formal terminology, this is described as
“deriving a schema” from a document.
XML Specifics
The lines in the example document that start with “<?” are referred to as processing instructions. These
instructions are not part of the data, but determine the details of encoding. The first line in the preceding
example is known as the prolog, and shows that version “1.0” of the XML specification is being used. The
second line is a processing instruction that indicates the use of the extensible style sheet “orders.xsl” to
format the document for display, if displaying the document becomes necessary.
The third line of the example is the first tag of the document and defines the “orders” element. Note that
the document data starts with an opening orders element and finishes with a closing orders element
shown as “</orders>”. XML allows for repeating data, so the above example contains two orders for
different customers.
Note: XML elements are case-sensitive. For example, <street> is not the same as <Street>.
Element-Centric XML
<Supplier>
<Name>Tailspin Toys</Name>
<Rating>12</Rating>
</Supplier>
Attribute-Centric XML
<Supplier Name="Tailspin Toys" Rating="12">
</Supplier>
Note that, if all data for an element is contained in attributes, a shortcut form of element is available.
Attribute-Centric Shortcut
<Supplier Name="Tailspin Toys" Rating="12"></Supplier>
<Supplier Name="Tailspin Toys" Rating="12"/>
Using the above as an example, the most obvious benefit of the attribute-centric approach is the reduced
size of the data. Ignoring white space between elements, the element-centric form needs 66 characters, versus
44 characters for the attribute-centric shortcut form: a saving of about one third. However, element-centric
XML is a better option in some circumstances, because it can describe more complex data; it can define an
element as nullable; and the data can be parsed more quickly, because only the elements need to be processed.
SQL Server can output XML encoded in either way; by using the FOR XML statement, you can produce
XML that combines both these approaches.
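As a brief illustration of the FOR XML options, the following sketch builds the Supplier example from an inline row rather than from a real table. The derived table and its alias are assumptions for illustration:

```sql
-- Attribute-centric output:
-- <Supplier Name="Tailspin Toys" Rating="12" />
SELECT Name, Rating
FROM (VALUES (N'Tailspin Toys', 12)) AS Supplier (Name, Rating)
FOR XML RAW('Supplier');

-- Element-centric output:
-- <Supplier><Name>Tailspin Toys</Name><Rating>12</Rating></Supplier>
SELECT Name, Rating
FROM (VALUES (N'Tailspin Toys', 12)) AS Supplier (Name, Rating)
FOR XML RAW('Supplier'), ELEMENTS;

-- Combined output: Name as an attribute, Rating as an element.
-- <Supplier Name="Tailspin Toys"><Rating>12</Rating></Supplier>
SELECT Name AS [@Name], Rating
FROM (VALUES (N'Tailspin Toys', 12)) AS Supplier (Name, Rating)
FOR XML PATH('Supplier');
```

In PATH mode, a column alias that starts with @ becomes an attribute of the row element, which is what allows the two styles to be mixed in one result.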
XML Document
<order id="ord123456">
<customer id="cust0921" />
</order>
This code provides the details for a single order and would be considered to be an XML document.
XML Fragment
<order id="ord123456">
<customer id="cust0921" />
</order>
<order id="ord123457">
<customer id="cust0925" />
</order>
This text contains the details of multiple orders. Although it is perfectly reasonable XML, it is considered to
be a fragment of XML rather than a document.
To be called a document, the XML needs to have a single root element, as shown in the following
example:
Well-formed XML will also include at least a prolog defining which version of the XML specification is
being used.
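The difference between a fragment and a document can be observed from Transact-SQL, because an untyped xml variable accepts either. The following sketch reuses the two orders shown earlier and counts the top-level elements in each value; the variable names and the orders root element are chosen for illustration:

```sql
-- A fragment: two top-level order elements and no single root.
DECLARE @Fragment xml = N'
<order id="ord123456"><customer id="cust0921" /></order>
<order id="ord123457"><customer id="cust0925" /></order>';

-- A document: the same orders wrapped in a single root element.
DECLARE @Document xml = N'
<orders>
  <order id="ord123456"><customer id="cust0921" /></order>
  <order id="ord123457"><customer id="cust0925" /></order>
</orders>';

-- Count the top-level elements in each value.
SELECT @Fragment.value('count(/*)', 'int') AS FragmentTopLevelElements,  -- 2
       @Document.value('count(/*)', 'int') AS DocumentTopLevelElements;  -- 1
```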
XML Namespaces
An XML namespace is a collection of names that
you can use as element or attribute names. They
are primarily used to avoid name conflicts on
elements in XML documents.
Name Conflicts
This XML defines a table in HTML:
HTML Table
<table>
<tr>
<td>Chicago</td>
<td>New York</td>
<td>London</td>
<td>Paris</td>
</tr>
</table>
Table Furniture
<table>
<name>Side Table</name>
<length>80</length>
<width>80</width>
<height>100</height>
<legs>4</legs>
</table>
There would be a name conflict if an application required both these XML fragments to be contained
within one XML document. XML has a mechanism to resolve these name conflicts with prefixes.
The above XML has resolved the name conflict, but isn’t valid XML until the prefixes are defined in a
namespace.
XML Namespace
An XML namespace is defined by using the special attribute xmlns. The value of the attribute must be a
valid Uniform Resource Identifier (URI), such as a URL or a Uniform Resource Name (URN). This namespace
URI is most commonly a URL that points to a location on the Internet. This location does not need to link
directly to an XML schema. You will see how XML schemas are related to namespaces in the next topic.
The namespace attributes can be added at the root element in the XML document, or they can be
declared at each element that requires them.
Best Practice: Industry best practice is to include namespace attributes in the top level
node to reduce unnecessary duplication throughout the document.
To make the previous XML document well-formed, the prefixes need to have namespaces associated with
them:
Using Namespaces
<?xml version="1.0" ?>
<data
xmlns:html="http://www.w3.org/TR/html4/"
xmlns:furniture="http://www.nopanet.org/?page=OFDAXmlCatalog">
<html:table>
<html:tr>
<html:td>Chicago</html:td>
<html:td>New York</html:td>
<html:td>London</html:td>
<html:td>Paris</html:td>
</html:tr>
</html:table>
<furniture:table>
<furniture:name>Side Table</furniture:name>
<furniture:length>180</furniture:length>
<furniture:width>80</furniture:width>
<furniture:height>100</furniture:height>
<furniture:legs>4</furniture:legs>
</furniture:table>
</data>
If a prefix isn’t specified in the namespace attribute, that namespace will be used by default in any XML
elements without a prefix.
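When namespaced XML is queried in SQL Server, the query must declare the same namespace URIs, although the prefixes it uses do not have to match those in the document. The following sketch uses a shortened version of the document above; the variable name and the prefixes h and f are arbitrary choices:

```sql
DECLARE @Data xml = N'
<data
    xmlns:html="http://www.w3.org/TR/html4/"
    xmlns:furniture="http://www.nopanet.org/?page=OFDAXmlCatalog">
  <html:table>
    <html:tr>
      <html:td>Chicago</html:td>
      <html:td>New York</html:td>
    </html:tr>
  </html:table>
  <furniture:table>
    <furniture:name>Side Table</furniture:name>
    <furniture:legs>4</furniture:legs>
  </furniture:table>
</data>';

-- Map each namespace URI to a prefix for use inside the XQuery expressions.
WITH XMLNAMESPACES ('http://www.w3.org/TR/html4/' AS h,
                    'http://www.nopanet.org/?page=OFDAXmlCatalog' AS f)
SELECT @Data.value('(/data/f:table/f:name)[1]', 'nvarchar(50)') AS FurnitureName,
       @Data.value('count(/data/h:table/h:tr/h:td)', 'int') AS HtmlCellCount;
```

The two table elements are matched separately because the names are compared by namespace URI, not by prefix.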
XML Schemas
XML schemas are used to define the specific
elements, attributes, and layout permitted within
an XML document. A well-formed XML document
is one that fulfills the criteria specified in the
Fragments vs. Documents topic. If a well-formed
XML document is validated against an XML
schema, the document is said to be a valid and
well-formed XML document.
XML schemas are often referred to as XML Schema Definitions (XSDs). XSD is also the default file
extension that most products use when they are storing XML schemas in files.
This example XSD has a namespace that links to the W3C definition of XML Schemas:
Example XSD
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="manual">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="title"
type="xsd:string"/>
<xsd:element name="author"
type="xsd:string"/>
<xsd:element name="published"
type="xsd:date"/>
<xsd:element name="version"
type="xsd:decimal"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
The above XML schema could be used to validate the following XML:
XML to be Validated
<manual>
<title>How to use XML and SQL Server</title>
<author>Stephen Jiang</author>
<published>2016-05-07</published>
<version>1.07</version>
</manual>
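In SQL Server, this kind of validation is performed by registering the schema in an XML schema collection and then declaring a typed xml variable or column. The following is a minimal sketch; the collection name dbo.ManualSchemaCollection is an assumption for illustration:

```sql
-- Register the schema in a schema collection (the name is hypothetical).
CREATE XML SCHEMA COLLECTION dbo.ManualSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="manual">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="author" type="xsd:string"/>
        <xsd:element name="published" type="xsd:date"/>
        <xsd:element name="version" type="xsd:decimal"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>';
GO

-- A typed xml variable is validated against the collection on assignment.
-- XML that does not match the schema, such as a manual element with no
-- author, would raise a validation error at this point.
DECLARE @Manual xml(dbo.ManualSchemaCollection) = N'
<manual>
  <title>How to use XML and SQL Server</title>
  <author>Stephen Jiang</author>
  <published>2016-05-07</published>
  <version>1.07</version>
</manual>';

SELECT @Manual.value('(/manual/version)[1]', 'decimal(5,2)') AS ManualVersion;
GO

-- To remove the sketch afterwards:
-- DROP XML SCHEMA COLLECTION dbo.ManualSchemaCollection;
```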
There is no suggestion that this would make for a good database design, but note that you could use this
table design to store all objects from an application—customers, orders, payments, and so on—in a single
table. Compare this to how tables have been traditionally designed in relational databases.
SQL Server gives the developer a wide range of choices, from a simple XML design at one end of the
spectrum, to fully normalized relational tables at the other end. Recognize that there is no generic answer
for how a SQL Server database should be designed; instead, there’s a range of options.
You may need to achieve a level of interoperability between your relational and XML data. Imagine
that you have to join a customer table with a list of customer IDs that are being sent to you as XML.
You might need to use XML formats to achieve cross-domain applications and to have maximum
portability for your data. Other systems that you are communicating with may be based on entirely
different technologies, and might not represent data in the same way as your database server.
You might not know the structure of your data in advance. It is common to have a mixture of
structured and semistructured data. A table might hold some standard relational columns, but also
hold some less structured data in XML columns. For an example, see the
HumanResources.JobCandidate table in the AdventureWorks database.
You might need to preserve a sequence within your data. For example, you might need to retain
order detail lines in a specific sequence. Relational tables and views have no implicit sequence. XML
documents can exhibit a predictable sequence.
You may want to have SQL Server validate that your XML data meets a particular XML schema before
processing it.
You might want to store transferred XML data for historical reasons, or archival purposes.
You may want to create indexes on your XML data to support faster business intelligence (BI) queries.
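The first scenario in the list above, joining a customer table to a list of customer IDs received as XML, might be sketched as follows. The table variable, its columns, and the shape of the incoming XML are all assumptions for illustration:

```sql
-- Hypothetical customer data held in a table variable.
DECLARE @Customers TABLE
(
    CustomerID int PRIMARY KEY,
    CustomerName nvarchar(50) NOT NULL
);

INSERT INTO @Customers (CustomerID, CustomerName)
VALUES (1, N'Tailspin Toys'),
       (2, N'Contoso'),
       (3, N'Fabrikam');

-- A list of customer IDs that arrives as XML (the shape is assumed).
DECLARE @CustomerList xml = N'
<customers>
  <customer id="2" />
  <customer id="3" />
</customers>';

-- nodes() turns each customer element into a row; value() reads its id.
SELECT c.CustomerID, c.CustomerName
FROM @CustomerList.nodes('/customers/customer') AS List (Item)
JOIN @Customers AS c
    ON c.CustomerID = List.Item.value('@id', 'int');
```

This keeps the relational data relational and treats the XML only as a parameter, which is often the simplest form of interoperability between the two.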
Demonstration Steps
Structure XML and Structure XML Schemas
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running, and then log
on to 20762C-MIA-SQL as AdventureWorks\Student with the password Pa55w.rd.
3. In the User Account Control dialog box, click Yes, and then wait until the script finishes.
5. In the Connect to Server dialog box, in the Server name box, type MIA-SQL, and then click Connect.
7. In the Open Project dialog box, navigate to D:\Demofiles\Mod14, click Demo14.ssmssln, and then
click Open.
9. Note the warning line under the second <Production.Product> tag. Position the cursor on the tag
to display the warning.
10. Note that the XML editor in SSMS understands XML and formats it appropriately.
11. Note that the Color attribute is missing from elements where the data is NULL.
Lesson 2
Storing XML Data and Schemas in SQL Server
Now that you have learned about XML, schemas, and the surrounding terminology, you can consider how
to store XML data and schemas in SQL Server. This is the first step in learning how to process XML
effectively within SQL Server.
You need to see how the XML data type is used, how to define schema collections that contain XML
schemas, how to declare both typed and untyped variables and database columns, and how to specify
how well-formed and valid the XML data needs to be before it can be stored.
Lesson Objectives
After completing this lesson, you will be able to:
Choose whether XML fragments can be stored, rather than entire XML documents.
XML Data
SQL Server has a native data type for storing XML
data. You can use it for variables, parameters, and
columns in databases. SQL Server also exposes
several methods that you can use for querying or
modifying the stored XML data.
xml is a built-in data type for SQL Server. It is an
intrinsic data type, which means that it is not
implemented separately through managed code.
The xml data type is limited to a maximum size of
2 GB. You can declare variables, parameters, and
database columns by using the xml data type.
XML Variable
DECLARE @Orders xml;
SET @Orders = '<Customer Name="Terry"><Order ID="231310" ProductID="12124"/></Customer>';
SELECT @Orders;
Canonical Form
Internally, SQL Server stores XML data in a format that makes it easy to process. It does not store the XML
data in the same format as it was received in.
Canonical Form
DECLARE @Settings xml;
SET @Settings = '<Setup><Application Name="StartUpCleanup" State="On"></Application>'
    + '<Application Name="Shredder" State="Off">Keeps Spaces</Application></Setup>';
SELECT @Settings;
<Setup>
<Application Name="StartUpCleanup" State="On" />
<Application Name="Shredder" State="Off">Keeps Spaces</Application>
</Setup>
Note that the output that is returned is logically equivalent to the input, but it is not exactly the
same as the input. For example, the first closing “</Application>” has been removed and replaced by a
closing “/>”. Semantically, the two pieces of XML are identical; the returned XML is referred to as having
been returned in a canonical, or logically equivalent, form.
If an exact copy of the XML has to be stored and retrieved from the database, consider storing the XML as
a string in, for example, an nvarchar(max) column. However, with this approach you cannot create XML
indexes on the data or call the xml data type methods on it directly.
For more information about xml, see Microsoft Docs:
xml (Transact-SQL)
https://aka.ms/Yqkpcw
XML Schemas
XML schemas are legible to humans at some level, but they are designed to be processed by computer
systems. Even simple schemas tend to have quite a high level of complexity. Fortunately, you do not need
to be able to read (or worse, write!) such schemas. Tools and utilities generally create XML schemas, and
SQL Server can create them, too. You will see an example of this in a later lesson.
XML Schema
<xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet1"
    xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
    elementFormDefault="qualified">
  <xsd:import
      namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
      schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
  <xsd:element name="Production.Product">
    <xsd:complexType>
      <xsd:attribute name="ProductID" type="sqltypes:int" use="required" />
      <xsd:attribute name="Name" use="required">
        <xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks].[dbo].[Name]">
          <xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
              sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth">
You create an XML schema collection by using the CREATE XML SCHEMA COLLECTION syntax that is
shown in the following code snippet:
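A minimal sketch of the statement is shown here. The collection name SettingsSchemaCollection matches the name used in the typed XML examples later in this lesson, but the schema itself (a Settings element with Theme and Width attributes) is a hypothetical one chosen for illustration.

```sql
-- Hypothetical schema: a <Settings> element with two attributes.
CREATE XML SCHEMA COLLECTION SettingsSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Settings">
      <xsd:complexType>
        <xsd:attribute name="Theme" type="xsd:string" use="required" />
        <xsd:attribute name="Width" type="xsd:int" />
      </xsd:complexType>
    </xsd:element>
  </xsd:schema>';
```

A collection can hold more than one schema; the sketch registers only one to keep the example short.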
System Views
You can see the details of the existing XML schema collections by querying the
sys.xml_schema_collections system view. You can see the details of the namespaces that are referenced
by XML schema collections by querying the sys.xml_schema_namespaces system view. Like XML, XML
schema collections are not stored in the format that you use to enter them. They are broken down into an
internal format.
You can get an idea of how XML schema collections are stored by querying the
sys.xml_schema_components system view, as shown in the following code example:
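One way to write such a query is sketched below. It joins the components view to sys.xml_schema_collections so that each component is listed with its collection; the filter on SettingsSchemaCollection assumes a collection of that name exists in the current database.

```sql
-- List the components that make up one schema collection.
SELECT xsc.name AS CollectionName,
       cmp.name AS ComponentName,
       cmp.kind_desc,
       cmp.symbol_space_desc
FROM sys.xml_schema_collections AS xsc
JOIN sys.xml_schema_components AS cmp
    ON cmp.xml_collection_id = xsc.xml_collection_id
WHERE xsc.name = N'SettingsSchemaCollection'  -- assumed collection name
ORDER BY cmp.xml_component_id;
```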
Untyped XML
You may choose to store any well-formed XML.
One reason is that you might not have a schema
for the XML data. Another reason is that you
might want to avoid the processing overhead that
is involved in validating the XML against the XML
schema collection. For complex schemas,
validating the XML can involve substantial work.
The following example shows the creation of a table that has an untyped XML column:
Untyped XML
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml
);
You can store any well-formed XML in the WindowSettings column, up to the maximum size, which is
currently 2 GB.
Typed XML
You may want to have SQL Server validate your data against a schema. You might want to take advantage
of storage and query optimizations, based on the type information, or want to take advantage of this type
information during the compilation of your queries.
The following example shows the same table being created, but this time, it has a typed XML column:
Typed XML
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml (SettingsSchemaCollection)
);
In this case, a schema collection called SettingsSchemaCollection has been defined. SQL Server will not
allow data to be stored in the WindowSettings column if it does not meet the requirements of at least
one of the XML schemas in SettingsSchemaCollection.
CONTENT Keyword
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml (CONTENT SettingsSchemaCollection)
);
By default, SQL Server assumes that XML data will be in the form of fragments, as opposed to documents.
This means that the preceding definition has the same effect as the Typed XML example in the previous
topic.
DOCUMENT Keyword
CREATE TABLE App.Settings (
SessionID int PRIMARY KEY,
WindowSettings xml (DOCUMENT SettingsSchemaCollection)
);
In this case, XML fragments cannot be stored in the WindowSettings column. Only well-formed XML
documents that have a single root element are allowed. For example, a column that is intended to store a
customer order can then be presumed to actually hold a customer order, and not some other type of XML
document.
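The difference between the two keywords can be sketched with typed variables. The schema collection DemoOrderSchemaCollection and its Order element are hypothetical and exist only for this illustration; GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical collection used only for this sketch.
CREATE XML SCHEMA COLLECTION DemoOrderSchemaCollection AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Order" type="xsd:string" />
  </xsd:schema>';
GO
-- CONTENT (the default) accepts a fragment with several top-level elements.
DECLARE @Fragment xml (CONTENT DemoOrderSchemaCollection);
SET @Fragment = N'<Order>A</Order><Order>B</Order>';

-- DOCUMENT accepts only a single root element.
DECLARE @Document xml (DOCUMENT DemoOrderSchemaCollection);
SET @Document = N'<Order>A</Order>';
-- The next assignment is expected to fail, because the value is a fragment:
-- SET @Document = N'<Order>A</Order><Order>B</Order>';
GO
DROP XML SCHEMA COLLECTION DemoOrderSchemaCollection;
```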
Demonstration Steps
Work with Typed and Untyped XML
4. Select the code under Step 2, and then click Execute to create an XML schema collection.
5. Select the code under Step 3, and then click Execute to create a table with a column that uses the
collection.
6. Select the code under Step 4, and then click Execute to try to insert malformed XML. Note that the
INSERT statement fails.
7. Select the code under Step 5, and then click Execute to try to insert well-formed XML that does not
conform to the schema. Note that this INSERT statement fails too.
8. Select the code under Step 6, and then click Execute to insert a single row fragment.
9. Select the code under Step 7, and then click Execute to insert a multirow fragment.
10. Select the code under Step 8, and then click Execute to view the added XML data in the table.
11. Leave SSMS open for the next demonstration.
Lesson 3
Implementing the XML Data Type
Indexes on XML columns are critical for achieving the high performance of XML-based queries. There are
four types of XML index: a primary index and three types of secondary index. This lesson discusses how
you can use each of them to achieve the maximum performance gain for your queries.
Lesson Objectives
After completing this lesson, you will be able to:
It is important to note that XML indexes can be quite large, compared to the underlying XML data.
Relational indexes are often much smaller than the tables on which they are built, but it is not uncommon
to see XML indexes that are larger than the underlying data.
You should also consider alternatives to XML indexes. Promoting a value that is stored within the XML to
a persisted computed column would make it possible to use a standard relational index to quickly locate
the value.
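A sketch of this promotion technique follows. SQL Server does not allow xml data type methods directly in a computed column definition, so the value is extracted in a schema-bound scalar function. The function name, the Theme column, and the /Settings/@Theme path are all hypothetical.

```sql
-- Hypothetical function that extracts one value from the XML.
CREATE FUNCTION App.GetSettingsTheme (@Settings xml)
RETURNS nvarchar(50)
WITH SCHEMABINDING
AS
BEGIN
    RETURN @Settings.value('(/Settings/@Theme)[1]', 'nvarchar(50)');
END;
GO
-- Promote the value to a persisted computed column.
ALTER TABLE App.Settings
    ADD Theme AS App.GetSettingsTheme(WindowSettings) PERSISTED;
GO
-- A standard relational index can now locate the value.
CREATE INDEX IX_Settings_Theme ON App.Settings (Theme);
```

This approach suits cases where only one or two values are searched frequently; for ad hoc queries over many paths, an XML index is likely to be the better fit.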
Based on the App.Settings table that was used as an example earlier, you could create a primary XML
index by executing the following code:
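A minimal sketch of that statement is shown here; the index name is a hypothetical choice. The table must have a clustered primary key, which App.Settings has on SessionID.

```sql
-- Primary XML index on the WindowSettings column (index name is illustrative).
CREATE PRIMARY XML INDEX PXML_Settings_WindowSettings
ON App.Settings (WindowSettings);
```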
A PATH index helps to decide whether a particular path to an element or attribute is valid. It is
typically used with the exist() XQuery method. (XQuery is discussed later in this module.)
A VALUE index helps to obtain the value of an element or attribute.
A PROPERTY index is used when retrieving multiple values through PATH expressions.
You can only create a secondary XML index after a primary XML index has been established.
When you are creating the secondary XML index, you need to reference the primary XML index.
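The three secondary index types can be sketched as follows. The statements assume that a primary XML index named PXML_Settings_WindowSettings already exists on App.Settings; that name, like the secondary index names, is hypothetical.

```sql
-- Each secondary index references the primary XML index by name.
CREATE XML INDEX IXML_Settings_Path
ON App.Settings (WindowSettings)
USING XML INDEX PXML_Settings_WindowSettings FOR PATH;

CREATE XML INDEX IXML_Settings_Value
ON App.Settings (WindowSettings)
USING XML INDEX PXML_Settings_WindowSettings FOR VALUE;

CREATE XML INDEX IXML_Settings_Property
ON App.Settings (WindowSettings)
USING XML INDEX PXML_Settings_WindowSettings FOR PROPERTY;
```

In practice you would create only the secondary indexes that match your query patterns, because each one adds storage and maintenance cost.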
Demonstration Steps
1. Ensure that you have completed the previous demonstration.
4. Select the code under Step 2, and then click Execute to create a primary XML index.
5. Select the code under Step 3, and then click Execute to create a secondary VALUE index.
6. Select the code under Step 4, and then click Execute to query the sys.xml_indexes system view.
7. Select the code under Step 5, and then click Execute to drop and recreate the table without a
primary key.
8. Select the code under Step 6, and then click Execute to try to add the primary xml index again.
Note that this will fail.
Categorize Activity
Categorize each statement against the appropriate index. Indicate your answer by writing the category
number to the right of each statement.
Items
Category 1       Category 2
PRIMARY Index    SECONDARY Index
Lesson 4
Using the Transact-SQL FOR XML Statement
You have seen how SQL Server can store XML in its different formats. In this lesson, you will see how to
retrieve XML from data stored in traditional tables and rows.
We often need to return data as XML documents, even though it is stored in relational database columns.
Typically, this requirement relates to the exchange of data with other systems, including those from other
organizations. When you add the FOR XML clause to a traditional Transact-SQL SELECT statement, it
causes the output to be returned as XML instead of as a relational rowset.
SQL Server provides several modes for the FOR XML clause to enable the production of many styles of
XML document. You will be learning about each of these modes and their related options.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the role of the FOR XML clause.
3. EXPLICIT mode gives you more control over the shape of the XML. You can use it when other modes
do not provide enough flexibility, but this is at the cost of greater complexity. In deciding the shape
of the XML, you can mix attributes and elements as you like.
4. PATH mode, together with the nested FOR XML query capability, provides much of the flexibility of
the EXPLICIT mode in a simpler manner.
Each of these modes is covered in more detail in the next topics.
A. Leonetti SC
A. Wright GC
A. Scott Wright EM
Aaron Adams IN
Aaron Alexander IN
Now consider the modified statement after adding the FOR XML clause:
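A sketch of such a statement is shown here. It assumes that the base query is the same TOP 5 selection from Person.Person that is used in the element-centric example later in this lesson.

```sql
SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW;
-- Each row is returned as an element of the form:
-- <row FirstName="..." LastName="..." PersonType="..." />
```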
Note that one XML <row> element is returned for each row from the rowset, the element has a generic
name of row, and all columns are returned as attributes. The returned order is based on the ORDER BY
clause.
You can override the default row name in the XML and add a root element:
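A sketch of a query that does this, assuming the same TOP 5 selection from Person.Person that is used elsewhere in this lesson:

```sql
SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW('Person'),   -- replaces the generic <row> element name
        ROOT('People');  -- wraps the rows in a single root element
```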
<People>
<Person FirstName="A." LastName="Leonetti" PersonType="SC" />
<Person FirstName="A." LastName="Wright" PersonType="GC" />
<Person FirstName="A. Scott" LastName="Wright" PersonType="EM" />
<Person FirstName="Aaron" LastName="Adams" PersonType="IN" />
<Person FirstName="Aaron" LastName="Alexander" PersonType="IN" />
</People>
Element-Centric XML
You will notice that, in the previous examples, the columns from the rowset have been returned as
attribute-centric XML. You can modify this behavior to produce element-centric XML by adding the
ELEMENTS keyword to the FOR XML clause.
You can see this in the following query:
Element-Centric XML
SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW('Person'),
ROOT('People'),
ELEMENTS;
<People>
<Person>
<FirstName>A.</FirstName>
<LastName>Leonetti</LastName>
<PersonType>SC</PersonType>
</Person>
<Person>
<FirstName>A.</FirstName>
<LastName>Wright</LastName>
<PersonType>GC</PersonType>
</Person>
<Person>
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<PersonType>EM</PersonType>
</Person>
<Person>
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<PersonType>IN</PersonType>
</Person>
<Person>
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<PersonType>IN</PersonType>
</Person>
</People>
Note that all the columns have now been returned as elements.
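FOR XML can also return an inline XSD schema ahead of the data when you add the XMLSCHEMA option. The following sketch assumes the same query as before; the target namespace urn:schema_example.com is taken from the output that follows.

```sql
SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML RAW('Person'),
        ROOT('People'),
        XMLSCHEMA('urn:schema_example.com'),  -- emit an inline schema with this namespace
        ELEMENTS;
```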
The above Transact-SQL will result in the following XML being generated:
<People>
<xsd:schema targetNamespace="urn:schema_example.com"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
elementFormDefault="qualified">
<xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes"
schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd" />
<xsd:element name="Person">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="FirstName">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
sqltypes:sqlSortId="52">
<xsd:maxLength value="50" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="LastName">
<xsd:simpleType sqltypes:sqlTypeAlias="[AdventureWorks].[dbo].[Name]">
<xsd:restriction base="sqltypes:nvarchar" sqltypes:localeId="1033"
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
sqltypes:sqlSortId="52">
<xsd:maxLength value="50" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
<xsd:element name="PersonType">
<xsd:simpleType>
<xsd:restriction base="sqltypes:nchar" sqltypes:localeId="1033"
sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth"
sqltypes:sqlSortId="52">
<xsd:maxLength value="2" />
</xsd:restriction>
</xsd:simpleType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
<Person xmlns="urn:schema_example.com">
<FirstName>A.</FirstName>
<LastName>Leonetti</LastName>
<PersonType>SC</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>A.</FirstName>
<LastName>Wright</LastName>
<PersonType>GC</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<PersonType>EM</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<PersonType>IN</PersonType>
</Person>
<Person xmlns="urn:schema_example.com">
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<PersonType>IN</PersonType>
</Person>
</People>
Let’s look at the previous SQL example using the AUTO mode:
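A sketch of the AUTO mode version, assuming the same TOP 5 selection from Person.Person that is used in the RAW mode examples:

```sql
SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML AUTO;
-- Each row is returned as an element named after the table:
-- <Person.Person FirstName="..." LastName="..." PersonType="..." />
```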
Each table in the FROM clause, from which at least one column is listed in the SELECT clause, is
represented as an XML element. The columns that are listed in the SELECT clause are mapped to
attributes. You can see the output of the previous query below:
Note how the name of the table is directly used as the element name.
For this reason, it is common to provide an alias for the table, as shown in the following code:
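A sketch of the aliased query, again assuming the TOP 5 selection from Person.Person; the alias Person is an illustrative choice:

```sql
SELECT TOP 5 FirstName, LastName, PersonType
FROM Person.Person AS Person   -- the alias becomes the element name
ORDER BY FirstName, LastName
FOR XML AUTO;
-- Each row is now returned as <Person ... /> instead of <Person.Person ... />.
```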
To generate the well-formed XML that was produced in the previous topic, you can add the
ROOT('People') option to the FOR XML AUTO statement.
Executing the above will result in the following XML being produced:
<Employees>
<Employee>
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<Address>
<City>Newport Hills</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<Address>
<City>Downey</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<Address>
<City>Kirkland</City>
</Address>
</Employee>
</Employees>
Notice how the City value is nested inside the Address element, which in turn is nested inside the
Employee element.
PATH mode, together with the nesting of FOR XML queries and the TYPE clause, gives enough power to
replace most of the EXPLICIT mode queries in a simpler, more maintainable way. EXPLICIT mode is rarely
needed now.
To produce XML in a similar format to the previous topic, the following SQL is required:
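A minimal sketch of an EXPLICIT mode query is shown here. It assumes the same TOP 5 selection from Person.Person as the earlier examples. The Tag and Parent columns and the ElementName!TagNumber!AttributeName column naming convention are required by EXPLICIT mode.

```sql
SELECT TOP 5
    1    AS Tag,      -- nesting level of this row's element
    NULL AS Parent,   -- top-level elements have no parent tag
    FirstName  AS [Person!1!FirstName],
    LastName   AS [Person!1!LastName],
    PersonType AS [Person!1!PersonType]
FROM Person.Person
ORDER BY FirstName, LastName
FOR XML EXPLICIT, ROOT('People');
```

Even for this flat shape, the query carries more ceremony than the RAW or AUTO equivalents, which is why EXPLICIT mode is usually reserved for cases that the other modes cannot express.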
You can use FOR XML EXPLICIT mode queries to construct such XML from a rowset, but PATH mode
provides a simpler alternative to the potentially time-consuming EXPLICIT mode queries.
Combined with nested FOR XML queries and the TYPE directive, which returns xml data type instances,
PATH mode lets you write less complex queries. This gives enough power to replace most EXPLICIT mode
queries in a simpler, more maintainable way.
The XPath expressions can be used to control the structure of the XML. You can modify the default PATH
behavior by using the “at” (@) symbol to define attributes or the forward slash (/) to define the hierarchy.
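A sketch of a PATH mode query that could produce XML of the shape shown next. The joins through Person.BusinessEntityAddress and Person.Address are assumptions about how the rows were selected, so the rows returned in your database may differ.

```sql
SELECT TOP 3
    p.FirstName,
    p.LastName,
    a.AddressID AS "Address/@AddressID",  -- attribute of the nested Address element
    a.City      AS "Address/City"         -- child element of Address
FROM Person.Person AS p
JOIN Person.BusinessEntityAddress AS bea
    ON bea.BusinessEntityID = p.BusinessEntityID
JOIN Person.Address AS a
    ON a.AddressID = bea.AddressID
ORDER BY p.FirstName, p.LastName
FOR XML PATH('Employee'), ROOT('Employees');
```

Attribute columns must be listed before the child elements of the same parent, which is why the AddressID alias comes before the City alias.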
<Employees>
<Employee>
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<Address AddressID="250">
<City>Newport Hills</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Adams</LastName>
<Address AddressID="25953">
<City>Downey</City>
</Address>
</Employee>
<Employee>
<FirstName>Aaron</FirstName>
<LastName>Alexander</LastName>
<Address AddressID="14543">
<City>Kirkland</City>
</Address>
</Employee>
</Employees>
Note the use of the “/” to define the structure of the returned XML, and that both the “@” and “/” can be
combined in aliases. If the alias for the Address.AddressID did not include the “/”, the AddressID attribute
would have been added to the Employee element. For example:
<Employee AddressID="250">
<FirstName>A. Scott</FirstName>
<LastName>Wright</LastName>
<Address>
<City>Newport Hills</City>
</Address>
</Employee>
TYPE Keyword
In the previous topics in this lesson, you have seen
how FOR XML AUTO queries can return
attribute-centric or element-centric XML. If this data is
returned from a subquery, it needs to be returned
as the xml data type rather than as a string; the
TYPE keyword does this.
You can use a nested FOR XML query, in a FOR XML query, to build nested XML.
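A sketch of a nested query that could produce output of the shape shown next. The join between HumanResources.Employee and Person.Person and the ordering by name are assumptions, so the rows returned may differ.

```sql
SELECT TOP 3
    e.BusinessEntityID AS "@ID",
    e.LoginID          AS "@Login",
    -- Inner FOR XML query; TYPE returns it as xml so that it nests as elements.
    (SELECT p.FirstName, p.LastName
     FOR XML PATH('EmployeeName'), TYPE)
FROM HumanResources.Employee AS e
JOIN Person.Person AS p
    ON p.BusinessEntityID = e.BusinessEntityID
ORDER BY p.FirstName, p.LastName
FOR XML PATH('Employee'), ROOT('Employees');
```

Without TYPE, the inner result would be returned as a string, and its angle brackets would be escaped as text inside the outer XML.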
You can use any combination of RAW, AUTO and PATH queries. The previous SQL example will produce
this output:
<Employees>
<Employee ID="225" Login="adventure-works\alan0">
<EmployeeName>
<FirstName>Alan</FirstName>
<LastName>Brewer</LastName>
</EmployeeName>
</Employee>
<Employee ID="193" Login="adventure-works\alejandro0">
<EmployeeName>
<FirstName>Alejandro</FirstName>
<LastName>McGuel</LastName>
</EmployeeName>
</Employee>
<Employee ID="163" Login="adventure-works\alex0">
<EmployeeName>
<FirstName>Alex</FirstName>
<LastName>Nayberg</LastName>
</EmployeeName>
</Employee>
</Employees>
Another type of nested XML query might be where XML is required in the results as an actual column
containing XML alongside non-XML columns.
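A sketch of such a query is shown here. The choice of Sales.Customer and Sales.SalesOrderHeader and the column alias Orders are assumptions based on the results that follow; the rows returned depend on your data.

```sql
SELECT TOP 3
    c.CustomerID,
    c.TerritoryID,
    -- One xml value per customer, holding that customer's orders.
    (SELECT soh.SalesOrderID, soh.Status
     FROM Sales.SalesOrderHeader AS soh
     WHERE soh.CustomerID = c.CustomerID
     FOR XML AUTO, TYPE) AS Orders
FROM Sales.Customer AS c
WHERE c.CustomerID >= 11000   -- assumed filter, to start at the first customer shown
ORDER BY c.CustomerID;
```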
Executing the previous Transact-SQL will return a table of results, with the last column containing the
required XML data. In SSMS, this column is displayed as a hyperlink because it is of the xml data type; if
it were a varchar, the results would appear as plain text.
11000 9 <soh SalesOrderID="43793" Status="5" /><soh SalesOrderID="51522" Status="5" /><soh SalesOrderID="57418" Status="5" />
11001 9 <soh SalesOrderID="43767" Status="5" /><soh SalesOrderID="51493" Status="5" /><soh SalesOrderID="72773" Status="5" />
11002 9 <soh SalesOrderID="43736" Status="5" /><soh SalesOrderID="51238" Status="5" /><soh SalesOrderID="53237" Status="5" />
Demonstration Steps
1. Ensure that you have completed the previous demonstration.
5. Select the code under Step 3, click Execute to execute an AUTO mode query, and then review the
results.
6. Select the code under Step 4, click Execute to execute an EXPLICIT mode query, and then review the
results.
7. Select the code under Step 5, click Execute to execute PATH mode queries, and then review the
results.
8. Select the code under Step 6, click Execute to execute a query using TYPE, and then review the
results.
9. Select the code under Step 7, click Execute to run the same query without using the TYPE keyword,
and then compare the results with those obtained in the previous step.
Lesson 5
Getting Started with XQuery
XQuery allows you to query XML data. Sometimes data is already in XML and you need to query it
directly. You might want to extract part of the XML into another XML document; you might want to
retrieve the value of an element or attribute; you might want to check whether an element or attribute
exists; and finally, you might want to directly modify the XML. XQuery methods make it possible to
perform these tasks.
Lesson Objectives
After completing this lesson, you will be able to:
What is XQuery?
XQuery is a query language that is designed to
query XML documents. It also includes elements of
other programming languages, such as looping
constructs.
XQuery was developed by a working group within
the World Wide Web Consortium. It was
developed in conjunction with other work in the
W3C, in particular, the definition of Extensible
Stylesheet Language Transformations (XSLT). XSLT
makes use of a subset of XQuery that is known as
XPath.
XPath Expression
/SalesHistory/Sale[@InvoiceNo=635]
This XPath expression specifies a need to traverse the SalesHistory node—that is the root element
because the expression starts with a slash mark (/)—then traverse the Sale subelements (note that there
may be more than one of these), and then to access the InvoiceNo attribute. All invoices that have an
invoice number attribute equal to 635 are returned.
Although there is unlikely to be more than one invoice with the number 635, nothing about XML syntax
(without a schema) enforces this. One thing that can be hard to get used to with the XPath syntax is that
you constantly need to specify that you want the first entry of a particular type—even though logically
you may think that it should be obvious that there would only be one. You indicate the first entry in a list
by the expression [1].
To return the first sales record if there are more than one with an invoice number equal to 635:
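The expression wraps the path in parentheses and applies [1] to the result. The sketch below runs it against a small, hypothetical SalesHistory document held in a variable.

```sql
-- Hypothetical sample data with two sales that share invoice number 635.
DECLARE @Sales xml = N'<SalesHistory>
  <Sale InvoiceNo="635" Amount="10.00" />
  <Sale InvoiceNo="635" Amount="12.50" />
  <Sale InvoiceNo="636" Amount="7.25" />
</SalesHistory>';

-- Returns only the first matching Sale element.
SELECT @Sales.query('(/SalesHistory/Sale[@InvoiceNo=635])[1]') AS FirstMatch;

-- The same [1] pattern supplies a single node to value().
SELECT @Sales.value('(/SalesHistory/Sale[@InvoiceNo=635]/@Amount)[1]',
                    'decimal(10,2)') AS FirstAmount;
```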
In XPath, you indicate attributes by using the “at” (@) prefix. The content of the element itself is referred
to by the token text().
FLWOR Expressions
In addition to basic path traversal, XQuery supports an iterative expression language that is known as
FLWOR and commonly pronounced “flower.” FLWOR stands for “for, let, where, order by, and return,” which
are the basic operations in a FLWOR query.
FLWOR Expression
SELECT @xmlDoc.query
('
<OrderedItems>
{
for $i in /InvoiceList/Invoice/Items/Item
return $i
}
</OrderedItems>
');
This query supplies OrderedItems as an element. Then, within that element, it locates all items on all
invoices that are contained in the XML document and displays them as subelements of the OrderedItems
element. An example of what the output may look like from this query is shown here:
<OrderedItems>
<Item Product="1" Price="1.99" Quantity="2" />
<Item Product="3" Price="2.49" Quantity="1" />
<Item Product="1" Price="1.99" Quantity="2" />
</OrderedItems>
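A FLWOR expression can also filter the nodes that it iterates over. The following sketch declares a small, hypothetical invoice document in @xmlDoc and adds a where clause; the element name ExpensiveItems and the price threshold are illustrative.

```sql
-- Hypothetical sample document for the FLWOR query.
DECLARE @xmlDoc xml = N'<InvoiceList>
  <Invoice InvoiceNo="1000">
    <Items>
      <Item Product="1" Price="1.99" Quantity="2" />
      <Item Product="3" Price="2.49" Quantity="1" />
    </Items>
  </Invoice>
</InvoiceList>';

SELECT @xmlDoc.query('
<ExpensiveItems>
{
  for $i in /InvoiceList/Invoice/Items/Item
  where $i/@Price > 2
  return $i
}
</ExpensiveItems>
');
```

Only the item priced at 2.49 satisfies the where clause, so it is the only Item returned inside ExpensiveItems.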
Note that becoming proficient at XQuery is an advanced topic that is beyond the scope of this course. The
aim of this lesson is to make you aware of what is possible when you are using XQuery methods. The
available XQuery methods are shown in the following table:
Method     Description
query()    This method returns untyped XML; the XML is selected by an XQuery expression.
value()    This method returns a scalar value; it takes XQuery and a SQL Type as its parameters.
exist()    This method returns a bit value; 1 if a node is found to exist; 0 if a node isn’t found for the specific XQuery expression.
modify()   This method modifies the contents of an XML document based on the XML DML expression.
nodes()    This method can be used to shred XML into relational data.
Shredding XML is covered in more detail in the next lesson, as is the nodes() method.
Advantages of XQuery:
When queries are written in XQuery, they require less code, compared to queries written in XSLT.
XQuery can be used as a strongly typed language when the XML data is typed; this can improve the
performance of the query by avoiding implicit type casts and provide type assurances that can be
used when performing query optimization.
XQuery can be used as a weakly typed language for untyped data to provide high usability. SQL
Server implements static type inferencing with support for both strong and weak type relationships.
XQuery 3.0 became a W3C recommendation on April 8, 2014, and will be supported by major
database vendors. SQL Server currently supports the W3C version XQuery 1.0.
For more information about XQuery, see Microsoft Docs:
An XQuery expression in SQL Server consists of two sections: a prolog and a body. The prolog can contain
a namespace declaration. You will see how to do this later in this module. The body of an XQuery
expression contains query expressions that define the result of the query. Both the input and output of a
query() method are XML.
Note that, if NULL is passed to a query() method, the result that the method returns is also NULL.
query() Method
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey')
SELECT TOP 3 BusinessEntityID,
    Demographics.query('(/StoreSurvey/AnnualRevenue)') AS Revenue,
    Demographics.query('(/StoreSurvey/NumberEmployees)') AS Staff
FROM Sales.Store;
This query tells SQL Server to return the business entity ID, the annual revenue, and the number of staff for
three rows from the Sales.Store table. Do not be too concerned with the namespace declaration in this
example; because the XML document in this column has a defined namespace, the query() method needs
to be made aware of it. Running the previous example will return rows in the following format:
BusinessEntityID Revenue Staff
value() Method
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey')
SELECT TOP 3 BusinessEntityID,
    Demographics.value('(/StoreSurvey/AnnualRevenue/text())[1]', 'decimal') AS Revenue,
    Demographics.value('(/StoreSurvey/NumberEmployees/text())[1]', 'int') AS Staff
FROM Sales.Store;
The previous Transact-SQL makes a few amendments to the XQuery, the main one being the addition of
[1] so that a single node is passed to the value() method. The value() method takes a second parameter,
which is the SQL type to be returned. Some SQL types cannot be returned: the xml data type, a common
language runtime (CLR) user-defined type, image, text, ntext, and sql_variant. In the previous example, the
value() method returns a decimal value for the revenue and an integer for the number of staff. The result
of executing this code is:
292 80000 13
294 80000 14
296 80000 15
exist() Method
WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey')
SELECT TOP 3 BusinessEntityID,
    Demographics.value('(/StoreSurvey/AnnualRevenue)[1]', 'decimal') AS Revenue,
    Demographics.value('(/StoreSurvey/NumberEmployees)[1]', 'int') AS Staff
FROM Sales.Store
WHERE Demographics.exist('/StoreSurvey[NumberEmployees=14]') = 1;
The previous example will return up to three stores from the Sales.Store table that have exactly 14
members of staff. The exist() method in the WHERE clause can take any valid XQuery expression. Running
the above will result in the following:
294 80000 14
344 80000 14
372 80000 14
Note that, unlike the previous methods, an error is returned if NULL is passed to the modify() method.
The previous example will change all the rows to have the following XML structure; note the added
comments element at the bottom of the XML document:
<StoreSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">
<AnnualSales>800000</AnnualSales>
<AnnualRevenue>80000</AnnualRevenue>
<BankName>International Bank</BankName>
<BusinessType>BM</BusinessType>
<YearOpened>1991</YearOpened>
<Specialty>Touring</Specialty>
<SquareFeet>18000</SquareFeet>
<Brands>4+</Brands>
<Internet>T1</Internet>
<NumberEmployees>14</NumberEmployees>
<p1:Comments xmlns:p1="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/StoreSurvey">Problem with staff levels</p1:Comments>
</StoreSurvey>
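The same kind of change can be sketched on an untyped xml variable, which avoids updating the sample database. The cut-down StoreSurvey document here is hypothetical and has no namespace.

```sql
-- Hypothetical, simplified survey document.
DECLARE @Survey xml = N'<StoreSurvey>
  <AnnualRevenue>80000</AnnualRevenue>
  <NumberEmployees>14</NumberEmployees>
</StoreSurvey>';

-- XML DML: insert a Comments element as the last child of the first StoreSurvey.
SET @Survey.modify('
  insert <Comments>Problem with staff levels</Comments>
  as last into (/StoreSurvey)[1]
');

SELECT @Survey;
```

The target of an insert must be a single node, which is why the path is written as (/StoreSurvey)[1].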
Demonstration Steps
1. Ensure that you have completed the previous demonstration.
2. In SSMS, in Solution Explorer, double-click Demonstration 5.sql.
4. Select the code under Step 2, and then click Execute to create the trigger.
5. Select the code under Step 3, and then click Execute to test the trigger.
6. Select the code under Step 4, and then click Execute to drop the trigger.
7. Select the code under Step 5, and then click Execute to create a trigger to enforce naming
conventions.
8. Select the code under Step 6, and then click Execute to test the trigger. Note that the code to create
a stored procedure named sp_GetVersion fails, due to the trigger.
9. Select the code under Step 7, and then click Execute to create a trigger to enforce tables to have
primary keys.
10. Select the code under Step 8, and then click Execute to test the trigger. Note that the CREATE TABLE
statement will fail because there is no primary key defined.
11. Select the code under Step 9, and then click Execute to clean up the database.
Categorize Activity
Categorize each item against its correct XQuery method. Indicate your answer by writing the method
number to the right of each item.
Items
3 Returns 1, 0 or NULL.
Lesson 6
Shredding XML
Another scenario is the need to extract relational data from an XML document. For example, you might
receive a purchase order from a customer in XML format. You then parse the XML to retrieve the details
of the items that you need to supply.
The extraction of relational data from within XML documents is referred to as “shredding” the XML
documents. There are two ways to do this. SQL Server 2000 introduced the creation of an in-memory tree
that you could then query by using an OPENXML function. Although that is still supported, SQL Server
2005 introduced the XQuery nodes() method; in many cases, this will be an easier way to shred XML data.
In addition to covering these areas in this lesson, you will also see how Transact-SQL provides a way of
simplifying how namespaces are referred to in queries.
Lesson Objectives
After completing this lesson, you will be able to:
2. Call sp_xml_preparedocument to create an in-memory node tree, based on the input XML.
3. Use the OPENXML table-valued function to query the in-memory node tree and extract the relational
data.
4. Process the retrieved relational data with other relational data as part of standard Transact-SQL
queries.
For example:
nodes() Method
WITH XMLNAMESPACES ('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume' AS ns)
SELECT candidate.JobCandidateID,
-- The [1] predicate guarantees the single node that value() requires.
Employer.value('(ns:Emp.StartDate)[1]','date') AS StartDate,
Employer.value('(ns:Emp.EndDate)[1]','date') AS EndDate,
Employer.value('(ns:Emp.OrgName)[1]','nvarchar(4000)') AS CompanyName
FROM HumanResources.JobCandidate AS candidate
-- nodes() returns one row per Employment element in each resumé.
CROSS APPLY candidate.Resume.nodes('/ns:Resume/ns:Employment') AS Resume(Employer)
WHERE candidate.JobCandidateID IN (1,2,3);
The previous example processes the resumés of candidates in the HumanResources.JobCandidate
table. It uses the nodes() method to select each ns:Employment XML element and obtain information
about the candidates' employers. Executing the previous example returns the following results:
sp_xml_preparedocument
sp_xml_preparedocument is a system stored procedure that takes XML either as the untyped xml data
type or as XML stored in the nvarchar data type; creates an in-memory node tree from the XML (to make it
easier to navigate); and returns a handle to that node tree.
sp_xml_preparedocument reads the XML text that was provided as input, parses the text by using the
Microsoft XML Core Services (MSXML) parser (Msxmlsql.dll), and provides the parsed document in a state
that is ready for consumption. This parsed document is a tree representation of the various nodes in the
XML document, such as elements, attributes, text, and comments.
Before you call sp_xml_preparedocument, you need to declare an integer variable to be passed as an
output parameter to the procedure call. When the call returns, the variable will then be holding a handle
to the node tree.
It is important to realize that the node tree must stay available and unmoved in visible memory because
the handle is basically a pointer that needs to remain valid. This means that, on 32-bit systems, the node
tree cannot be stored in Address Windowing Extensions (AWE) memory.
sp_xml_removedocument
sp_xml_removedocument is a system stored procedure that frees the memory that a node tree occupies
and invalidates the handle.
In SQL Server 2000, sp_xml_preparedocument created a node tree that was session-scoped; that is, the
node tree remained in memory until the session ended or until sp_xml_removedocument was called. A
common coding error was to forget to call sp_xml_removedocument. Leaving too many node trees in
memory was known to cause a severe lack of available low-address memory on 32-bit systems.
Therefore, a change was made in SQL Server 2005 that made the node trees created by
sp_xml_preparedocument become batch-scoped rather than session-scoped. Even though the tree will
be removed at the end of the batch, it is good practice to explicitly call sp_xml_removedocument to
minimize the use of low-address memory as much as possible.
Note that 64-bit systems generally do not have the same memory limitations as 32-bit systems.
OPENXML Function
The OPENXML function provides a rowset over in-memory XML documents, which is similar to a table or a
view. OPENXML gives access to the XML data as though it is a relational rowset. It does this by providing
a rowset view of the internal representation of an XML document.
The parameters that are passed to OPENXML are: the XML document handle; a rowpattern, which is an
XPath expression that maps the nodes of XML data to rows; and a flag that indicates whether to use
attributes rather than elements by default. Associated with the OPENXML clause is a WITH clause that
provides a mapping between the rowset columns and the XML nodes.
OPENXML Function
DECLARE @xmldoc AS int, @xml AS xml;
SELECT @xml=Resume FROM HumanResources.JobCandidate WHERE JobCandidateID=1;
EXEC sp_xml_preparedocument @xmldoc OUTPUT,
@xml,
'<root xmlns:ns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/Resume"/>';
In the above example, sp_xml_preparedocument is passed an @xml variable that contains a single XML
document, representing a resumé from the HumanResources.JobCandidate table, and returns a handle that
the OPENXML function can query. The XPath expression “/ns:Resume/ns:Employment” selects the
Employment nodes from the document. Finally, the optional flag of 2 tells the OPENXML function that the
column mapping in the WITH clause is element-centric rather than attribute-centric. Executing the
previous Transact-SQL produces these results:
The optional flag is a byte value, and therefore can be a combination of the following options:
Byte value    Description
8             Can be combined (logical OR) with the previous values. In the context of retrieval, this
              flag indicates that the consumed data should not be copied to the overflow property
              @mp:xmltext.
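The prepare, query, and remove sequence described in this lesson can be sketched end to end as follows.
This is a minimal illustration rather than the course listing: the purchase order document, its element
names, and the column names are hypothetical, and everything runs in a single batch so that the handle
stays valid.

```sql
-- Hypothetical purchase order; element and column names are illustrative only.
DECLARE @doc nvarchar(max) = N'<Order OrderID="42">
  <Item><Sku>BK-100</Sku><Qty>2</Qty></Item>
  <Item><Sku>HL-200</Sku><Qty>1</Qty></Item>
</Order>';
DECLARE @hdoc int;

-- Parse the text into an in-memory node tree and receive its handle.
EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc;

-- Flag 2 requests element-centric mapping, so Sku and Qty map to child elements.
-- The explicit column pattern '../@OrderID' reaches up to the parent's attribute.
SELECT OrderID, Sku, Qty
FROM OPENXML(@hdoc, '/Order/Item', 2)
WITH (OrderID int '../@OrderID',
      Sku     nvarchar(20),
      Qty     int);

-- Release the node tree and invalidate the handle.
EXEC sp_xml_removedocument @hdoc;
```

The row pattern /Order/Item produces one row per Item element. Changing the flag to 1 would make the
unqualified Sku and Qty columns look for attributes instead, and they would come back as NULL for this
document.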
The first few rows returned from executing the preceding code are:
Namespaces can be used in one of two ways when working with the xml data methods and FOR XML
statements. The first way requires that they are repeated for every method call where they are required.
Taking a previous query() example and rewriting it:
The preferred method for referencing an XML namespace is to use the WITH XMLNAMESPACES clause. The
benefit is that each namespace only has to be declared once, at the beginning of the query. WITH
XMLNAMESPACES can be used for both FOR XML statements and xml data type method calls. For example:
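The course listings for the two approaches are not reproduced here. As a rough sketch of the difference,
assuming a hypothetical document and the made-up namespace urn:example:orders, the same value can be
read either way:

```sql
-- Hypothetical document; the namespace URI is illustrative only.
DECLARE @doc xml = N'<o:Order xmlns:o="urn:example:orders"><o:Item>Widget</o:Item></o:Order>';

-- First way: repeat the namespace declaration inside every method call.
SELECT @doc.value('declare namespace o="urn:example:orders"; (/o:Order/o:Item)[1]',
                  'nvarchar(50)') AS Item;

-- Preferred way: declare the prefix once for the whole statement.
WITH XMLNAMESPACES ('urn:example:orders' AS o)
SELECT @doc.value('(/o:Order/o:Item)[1]', 'nvarchar(50)') AS Item,
       @doc.exist('/o:Order/o:Item')                      AS HasItem;
```

Note that the statement preceding WITH XMLNAMESPACES must be terminated with a semicolon.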
nodes() Method
The nodes() method provides an easier way to shred XML into relational data than OPENXML and its
associated system stored procedures.
The nodes() method is an XQuery method that is useful when you want to shred an xml data type instance
into relational data. It is a table-valued function that enables you to identify nodes that will be
mapped into a new relational data row.
You should be careful about the query plans that are generated when you use the nodes() method. In
particular, no cardinality estimates are available when you use this method. This has the potential to lead
to poor query plans. In some cases, the cardinality is simply estimated to be a fixed value of 10,000 rows.
This might cause an inappropriate query plan to be generated if your XML document contained only a
handful of nodes.
APPLY operations cause table-valued functions to be called for each row in the left table of the query.
The following example searches for any telephone numbers that have been captured by support staff and
recorded as additional contact information:
In this query, for every row in the Person.Person table, where the AdditionalContactInfo column isn’t
NULL, the nodes() method is called. When table-valued functions are used in queries like this, you must
provide an alias for both the derived table and the columns that it contains. In this case, the alias provided
to the derived table is helpdesk, and the alias provided to the extracted column is contact.
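A query of this shape can be sketched without the AdventureWorks database. The example below is an
assumption-laden stand-in for the course query: the table variable, its rows, and the urn:example
namespaces are hypothetical, while the helpdesk and contact aliases and the ChangeInContactNumber
column follow the description above.

```sql
-- Hypothetical stand-in for Person.Person; the namespaces are illustrative only.
DECLARE @Person TABLE (FirstName nvarchar(50), AdditionalContactInfo xml NULL);
INSERT INTO @Person (FirstName, AdditionalContactInfo) VALUES
(N'Abel', N'<AdditionalContactInfo xmlns="urn:example:ContactInfo" xmlns:act="urn:example:ContactTypes">
  <act:telephoneNumber><act:number>206-555-2222</act:number></act:telephoneNumber>
  <act:telephoneNumber><act:number>206-555-1234</act:number></act:telephoneNumber>
  <act:pager><act:number>206-555-1244</act:number></act:pager>
</AdditionalContactInfo>'),
(N'Kim', NULL);

WITH XMLNAMESPACES ('urn:example:ContactTypes' AS act, DEFAULT 'urn:example:ContactInfo')
SELECT p.FirstName,
       -- contact is the xml column exposed by nodes(); value() reads one number from it.
       contact.value('(act:number)[1]', 'nvarchar(25)') AS ChangeInContactNumber
FROM @Person AS p
-- nodes() is called once per row of @Person and returns one row per telephoneNumber node.
CROSS APPLY p.AdditionalContactInfo.nodes('/AdditionalContactInfo/act:telephoneNumber')
    AS helpdesk(contact)
WHERE p.AdditionalContactInfo IS NOT NULL;
```

With this sample data, only the two telephoneNumber nodes for Abel produce rows; the pager number is
not matched by the path.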
One output row is being returned for each node at the level of the XPath expression
/AdditionalContactInfo/act:telephoneNumber. From the returned XML column (contact), the
ChangeInContactNumber column is generated by calling the value() method. Executing the previous
query returns these results:
Note that, in these results, two telephone numbers have been returned for Abel. Looking at the XML in
the AdditionalContactInfo row, there are three “act:number” XML nodes:
206-555-2222
206-555-1234
206-555-1244
The last number is not contained in the results because the XPath to it is /AdditionalContactInfo/act:pager
instead of /AdditionalContactInfo/act:telephoneNumber.
<AdditionalContactInfo
xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactInfo"
xmlns:crm="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactRecord"
xmlns:act="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/ContactTypes">
These are additional phone and pager numbers for the customer.
<act:telephoneNumber><act:number>206-555-2222</act:number>
<act:SpecialInstructions>On weekends, contact the manager at this number.
</act:SpecialInstructions></act:telephoneNumber>
<act:telephoneNumber><act:number>206-555-1234</act:number> </act:telephoneNumber>
<act:pager><act:number>206-555-1244</act:number><act:SpecialInstructions>Do not page
between 9:00 a.m. and 5:00 p.m.</act:SpecialInstructions></act:pager>
Customer provided this additional home address…
Demonstration Steps
1. Ensure that you have completed the previous demonstration.
4. Select the code under Step 2, and then click Execute to select the contents of the dbo.DatabaseLog
table.
5. In the results pane, in the XmlEvent column, click the first entry to view the format of the XML. Note
that this is the EVENTDATA structure returned by the DDL and LOGON triggers.
6. Switch back to the Demonstration 6.sql pane, select the code under Step 4, and then click Execute.
Compare the first row in the results with the data shown in the XmlEvent1.xml pane.
8. Select the code under Step 6, and then click Execute to show the same results obtained by using
OPENXML.
You also have an upcoming project that will require the use of XML data in SQL Server. No members of
your current team have experience working with XML data in SQL Server. You need to learn how to
process XML data within SQL Server and you have been given some sample queries to assist with this
learning. Finally, you will use what you have learned to write a stored procedure for the marketing system
that returns XML data.
Objectives
After completing this lab, you will be able to:
Password: Pa55w.rd
Scenarios
Scenario Requirements
Results: After completing this exercise, you will have determined the appropriate use cases for XML
storage.
Task 1: Review, Execute, and Review the Results of the XML Queries
1. Open SQL Server Management Studio and connect to the MIA-SQL instance of SQL Server using
Windows authentication.
2. Open D:\Labfiles\Lab14\Starter\InvestigateStorage.sql.
3. For each query in the file, review the code, execute the code, and determine how the output results
relate to the queries.
Results: After this exercise, you will have seen how XML data is stored in variables.
Task 1: Review, Execute, and Review the Results of the XML Queries
1. Open D:\Labfiles\Lab14\Starter\XMLSchema.sql.
2. For each query in the file, review the code, execute the code, and determine how the output results
relate to the queries.
Results: After this exercise, you will have seen how to create XML schema collections.
1. Review, Execute, and Review the Results of the FOR XML Queries
Task 1: Review, Execute, and Review the Results of the FOR XML Queries
1. Open D:\Labfiles\Lab14\Starter\XMLQuery.sql.
2. For each query in the file, review the code, execute the code, and determine how the output results
relate to the queries.
Results: After this exercise, you will have seen how to use FOR XML.
Supporting Documentation
Input Parameters: None.
Output Parameters: None.
Output Order: Rows within the XML should be in order of SellStartDate ascending, and then
ProductName ascending. That is, sort by SellStartDate first, and then ProductName within
SellStartDate.
Stored Procedure: Sales.UpdateSalesTerritoriesByXML
Output Parameters: None.
Returned Rows: None.
Actions: Update the SalesTerritoryID column in the Sales.Salesperson table, based on the
SalesTerritoryID values extracted from the input parameter.
4. If Time Permits: Create a Stored Procedure to Update the Sales Territories Table
Task 4: If Time Permits: Create a Stored Procedure to Update the Sales Territories Table
1. If time permits, implement the Sales.UpdateSalesTerritoriesByXML stored procedure.
2. Test the created stored procedure with the example incoming XML.
Results: After this exercise, you will have a new stored procedure that returns XML in the AdventureWorks
database.
Review Question(s)
Question: Which XML query mode did you use for implementing the
WebStock.GetAvailableModelsAsXML stored procedure?
Module 15
Storing and Querying Spatial Data in SQL Server
Contents:
Module Overview
Lesson 1: Introduction to Spatial Data
Module Overview
This module describes spatial data and how this data can be implemented within SQL Server®.
Objectives
After completing this module, you will be able to:
Lesson 1
Introduction to Spatial Data
Many business applications work with addresses or locations, so it is helpful to understand the different
spatial data types, and where they are typically used. SQL Server can process both planar and geodetic
data. In this lesson, we will also consider how the SQL Server data types relate to industry standard
measurement systems.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the relationship between the spatial data support in SQL Server and the industry standards.
Target Applications
There is a perception that spatial data is not useful
in mainstream applications. However, this
perception is invalid: most business applications
can benefit from the use of spatial data.
Business Applications
Although mapping provides an interesting
visualization in some cases, business applications
can make good use of spatial data for more
routine tasks. Almost all business applications
involve the storage of addresses or locations.
Customers or clients have street addresses, mailing
addresses, and delivery addresses. The same is true
for stores, offices, suppliers, and many other business-related entities.
It could be true that customers really do purchase from their local store and the owner was misled by a
small sample of data. Or perhaps customers really do travel to a store other than their local branch
because the products they require are not stocked locally.
These sorts of questions are normal in most businesses, and you can answer them quite easily if you
process spatial data in a database.
To store raster data in SQL Server, you could use the varbinary data type, although you would not be able
to directly process the data.
Spatial data in SQL Server is based on 2-D technology. In some of the objects and properties that it
provides, spatial data in SQL Server supports the storage and retrieval of 3-D and 4-D values, but it is
important to realize that the third and fourth dimensions are ignored during calculations. This means that
if you calculate the distance between, say, a point and a building, the calculated distance is the same,
regardless of the floor or level in the building where the point is located.
For more information about the various types of spatial data, see Microsoft Docs:
Planar Systems
Before the advent of computer systems, it was very
difficult to perform calculations on round models
of the Earth. For convenience, mapping tended to
be two-dimensional. Most people are familiar with
traditional flat maps of the world.
Geodetic Systems
Geodetic systems represent the Earth as a round shape. Some systems use simple spheres, but in fact the
Earth is not spherical. Spatial data in SQL Server offers several systems for representing the shape of the
Earth. Most systems model the Earth as an ellipsoid rather than as a sphere.
SQL Specification
One of the two data types that SQL Server
provides is the geometry data type. It conforms to
the OGC Simple Features for SQL Specification
version 1.1.0, and is used for planar spatial data. In
addition to defining how to store the data, the
specification details common properties and
methods to be applied to the data.
The OGC defines a series of data types that form an object tree. Curved arc support was added in SQL
Server 2012.
Extensions
SQL Server also extends the standards in several ways—it provides a round-earth data type called
geography, along with several additional useful properties and methods.
Methods and properties that are related to the OGC standard have been defined by using an ST prefix
(such as STDistance). Those without an ST prefix are Microsoft® extensions to the standard (such as
MakeValid).
SRID 4326
The World Geodetic System (WGS) is commonly used in cartography, geodetics, and navigation. The latest
standard is WGS 1984 (WGS 84) and is best known to most people through the Global Positioning System
(GPS). GPS is often used in navigation systems and uses WGS 84 as its coordinate system.
In spatial data in SQL Server, SRID 4326 provides support for WGS 84.
If you query the list of SRIDs in SQL Server, the entry for SRID 4326 has the following name. This is
formally called the Well-Known Text (WKT) that is associated with the ID:
WGS 84
GEOGCS["WGS 84", DATUM["World Geodetic System 1984", ELLIPSOID["WGS 84", 6378137,
298.257223563]], PRIMEM["Greenwich", 0], UNIT["Degree", 0.0174532925199433]]
WGS 84 models the Earth as an ellipsoid (you can imagine it as a squashed sphere), with its major radius
of 6,378,137 meters at the equator, a flattening of 1/298.257223563 (or about 21 kilometers) at the poles,
a prime meridian (that is, a starting point for measurement) at Greenwich, and a measurement that is
based on degrees. The starting point at Greenwich is specifically based at the Royal Observatory. The units
are shown as degrees and the size of a degree, in radians, is specified in the final value in the
definition. Most geographic data today would be represented by SRID 4326.
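As a rough check of these figures, the catalog entry can be queried and the ellipsoid arithmetic
reproduced in Transact-SQL. This is a sketch only: the variable names are illustrative, and the two
constants are taken from the WKT shown above.

```sql
-- Look up the catalog entry for SRID 4326.
SELECT spatial_reference_id, well_known_text, unit_of_measure, unit_conversion_factor
FROM sys.spatial_reference_systems
WHERE spatial_reference_id = 4326;

-- Derive the polar radius from the ellipsoid parameters in the WKT.
DECLARE @MajorRadius float = 6378137,            -- equatorial radius in meters
        @InverseFlattening float = 298.257223563;

SELECT @MajorRadius * (1 - 1 / @InverseFlattening) AS PolarRadiusMeters,       -- roughly 6,356,752
       @MajorRadius / @InverseFlattening           AS EquatorMinusPoleMeters;  -- roughly 21,385
```

The second value is where the figure of about 21 kilometers comes from: the polar radius is shorter
than the equatorial radius by the major radius multiplied by the flattening.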
For more information about spatial reference identifiers, see Microsoft Docs:
Spatial Reference Identifiers (SRIDs)
http://aka.ms/bd0j1v
Demonstration Steps
1. Ensure that the 20762C-MIA-DC and 20762C-MIA-SQL virtual machines are running, and then log
on to 20762C-MIA-SQL as AdventureWorks\Student with the password Pa55w.rd.
3. In the User Account Control dialog box, click Yes. When the script completes, press any key.
4. Start SQL Server Management Studio, and connect to the MIA-SQL instance using Windows
authentication.
5. On the File menu, point to Open, and then click Project/Solution.
6. In the Open Project dialog box, navigate to D:\Demofiles\Mod15, click 20762_15.ssmssln, and
then click Open.
7. In Solution Explorer, expand Queries, and then double-click the 11 - Demonstration 1A.sql script.
8. Highlight the Transact-SQL under the comment Step 1 - Switch to the tempdb database, and click
Execute.
9. Highlight the Transact-SQL under the comment Step 2 - Query the sys.spatial_reference_systems
system view, and click Execute.
10. Highlight the Transact-SQL under the comment Step 3 - Drill into the value for srid 4326, and click
Execute.
11. Highlight the Transact-SQL under the comment Step 4 - Query the available measurement
systems, and click Execute.
varchar
varbinary
int
string
Lesson 2
Working with SQL Server Spatial Data Types
SQL Server supports two spatial data types, geometry and geography. They are both system common
language runtime (CLR) data types. This lesson introduces each of these data types, and shows how to
interchange data by using industry-standard formats.
Lesson Objectives
After completing this lesson, you will be able to:
Explain how system CLR types differ from user defined CLR types.
Use Microsoft extensions to the OGC standard when working with spatial data.
Note: Although “latitude and longitude” is a commonly used phrase, the geographical community
uses the terminology in the reverse order. When you are specifying inputs for geographic data in
SQL Server by using Well-Known Text, the longitude value precedes the latitude value. The
geography::Point method is an exception and takes the latitude first.
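The ordering can be confirmed with a small sketch. The coordinates below are arbitrary sample values,
not data from the course files:

```sql
-- WKT input lists longitude first, then latitude.
DECLARE @FromWkt geography = geography::STGeomFromText('POINT(-122.349 47.651)', 4326);

-- The Point method takes latitude first, then longitude, then the SRID.
DECLARE @FromPoint geography = geography::Point(47.651, -122.349, 4326);

SELECT @FromWkt.Lat                 AS Latitude,    -- 47.651
       @FromWkt.Long                AS Longitude,   -- -122.349
       @FromWkt.STEquals(@FromPoint) AS SamePoint;  -- 1
```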
Additional Support
The Microsoft Bing® Maps software development kit (SDK) integrates closely with spatial data in SQL
Server. SQL Server Reporting Services includes a map control that you can use to render spatial data and a
wizard to help to configure the map control. The map control is available for reports built using Business
Intelligence Development Studio or Report Builder.
An application that stores or retrieves spatial data from a database in SQL Server needs to be able to work
with that data as a spatial data type. To make this possible, a separate installer (MSI) file is provided as
part of the SQL Server Feature Pack, so that client applications can use the spatial data types in SQL
Server. Installing the feature pack on client systems causes an application on the client to “rehydrate” a
geography object that has been read from a SQL Server database into a SqlGeography object within
.NET managed code.
ST Prefix
An ST prefix has been added to the properties and methods that are implementations of the OGC
standards. For example, the X and Y coordinates of a geometry object are provided by STX and STY
properties, and the distance calculation is provided by the STDistance method.
Microsoft extensions to the OGC standards have no prefix added to the name of the methods or
properties. You should take care when referring to properties and methods because they are case-
sensitive, even on servers configured for case-insensitivity.
For more information about the CLR-enabled configuration setting, see Microsoft Docs:
clr enabled Server Configuration Option
http://aka.ms/oq8rnt
The geometry and geography data types are implemented as CLR types by using managed code. They
are defined as system CLR types and work even when CLR integration is not switched on at the SQL Server
instance level.
You can see the currently installed assemblies, and whether they are user-defined, by executing the
following query:
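The course listing is not reproduced here. A minimal sketch of such a query, assuming that the
sys.assemblies catalog view is the intended source, might be:

```sql
-- is_user_defined is 0 for system assemblies, such as the one that hosts the spatial types.
SELECT name, clr_name, permission_set_desc, is_user_defined
FROM sys.assemblies;
```

The view is scoped to the current database, so the list can differ from one database to another.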
As an example of this, look at the following code that is accessing the STX property of a variable called
@Location:
Accessing Properties
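DECLARE @Location geometry = geometry::STGeomFromText('POINT (12 15)', 0); -- sample value, assumed
SELECT @Location.STX;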
You can call methods that are defined on the data types (geometry and geography) rather than on
instances (that is, columns or variables) of those types. This is an important distinction.
As an example of this, look at the following code that is calling the STGeomFromText method of the
geometry data type:
Note: You are not calling the method on a column or variable of the geometry data type, but on
the geometry data type itself. In .NET terminology, this is referred to as calling a public static
method on the geometry class. The methods and properties of the spatial data types are
case-sensitive, even on servers that are configured with case-insensitive default collations.
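The distinction between the two kinds of call can be sketched as follows. The shape and the variable
name are illustrative only:

```sql
-- Static method: called on the data type itself with the :: syntax.
DECLARE @Shape geometry = geometry::STGeomFromText('LINESTRING (0 0, 3 4)', 0);

-- Instance methods: called on a variable (or column) with the dot syntax.
SELECT @Shape.STLength() AS Length,   -- 5
       @Shape.STAsText() AS Wkt;
```

Writing @Shape.stlength() instead would fail, because the method names are case-sensitive.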
You can see the input and output of X, Y, Z, and M in the following code:
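The course listing is not reproduced here. A minimal sketch, assuming an arbitrary sample point with a
Z value of 2 and an M value of 9, could look like this:

```sql
-- WKT input may carry Z (elevation) and M (measure) after the X and Y coordinates.
DECLARE @Point geometry = geometry::Parse('POINT (12 15 2 9)');

SELECT @Point.STX        AS X,
       @Point.STY        AS Y,
       @Point.Z          AS Z,
       @Point.M          AS M,
       @Point.STAsText() AS OgcText,     -- two-dimensional output only
       @Point.AsTextZM() AS ExtendedText; -- includes the Z and M values
```

Z and M have no ST prefix because they are Microsoft extensions rather than OGC properties.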
The SQL Server geometry data type provides comprehensive coverage of the OGC Geometry data type,
and has X and Y coordinates represented by STX and STY properties.
For more information on the geometry data type, see Microsoft Docs:
geometry (Transact-SQL)
http://aka.ms/hl20zm
To enclose points, they should be listed in counterclockwise order. As you draw a shape, points to the left
of the line that you draw will be enclosed by the shape. The points on the line are also included.
If you draw a postal code region in a clockwise direction, you are defining all points outside the region. In
versions of SQL Server before 2012, this would have resulted in an error because results were not
permitted to span more than a single hemisphere.
For geography, the viewer is quite configurable. You can set which column to display, the geographic
projection to use for display, such as Mercator or Bonne, and you can choose to display another column
as a label over the relevant displayed region.
The spatial results viewer in SQL Server Management Studio is limited to displaying the first 5,000 objects
from the result set.
For more details on the geography data type, see Microsoft Docs:
geography (Transact-SQL)
http://aka.ms/f4ys44
Well-Known Text (WKT). This is the most common string format and is readable by humans.
Well-Known Binary (WKB). This is a more compact binary representation that is useful for
interchange between computers.
Geography Markup Language (GML). This is the XML-based representation for spatial data.
All CLR data types must implement two string-related methods. The Parse method is used to convert a
string to the data type and the ToString method is used to convert the data type back to a string. Both of
these methods are implemented in the spatial types and both assume a WKT format.
Several variations of these methods are used for input and output. For example, the STAsText method
provides a specific WKT format as output and the AsTextZM method is a Microsoft extension that
provides the Z and M values, in addition to the two-dimensional coordinates.
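The three interchange formats can be compared side by side with a short sketch. The point used here is
an arbitrary sample value:

```sql
-- Parse assumes WKT input and an SRID of 0 for geometry.
DECLARE @Point geometry = geometry::Parse('POINT (12 15)');

SELECT @Point.ToString()   AS Wkt,   -- human-readable text
       @Point.STAsBinary() AS Wkb,   -- compact varbinary form
       @Point.AsGml()      AS Gml;   -- XML representation

-- Round trip through WKB to show that the binary form carries the same shape.
DECLARE @Copy geometry = geometry::STGeomFromWKB(@Point.STAsBinary(), 0);
SELECT @Copy.STEquals(@Point) AS RoundTripMatches;   -- 1
```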
For more information on the geometry data type, see Microsoft Docs:
geometry (Transact-SQL)
http://aka.ms/Oul99w
Common Methods
Common OGC methods include:
The STDistance method, which returns the distance between two spatial objects. Note that this does
not only apply to points. You can also calculate the distance between two polygons. The result is
returned as the minimum distance between any two points on the polygons.
The STIntersects method, which returns 1 when two objects intersect and otherwise returns 0.
The STArea method, which returns the total surface area of a geometry instance.
The STLength method, which returns the total length of the objects in a geometry instance. For
example, for a polygon, STLength returns the total length of all line segments that make up the
polygon.
The STUnion method, which returns a new object that is formed by uniting all points from two
objects.
The STBuffer method, which returns an object whose points are within a certain distance of an
instance of a geometry object.
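The methods listed above can be tried together on two simple shapes. This is a sketch with made-up
squares, chosen so that the results are easy to check by hand:

```sql
-- A 4 x 4 square and a 2 x 2 square that sits 2 units to its right.
DECLARE @Big   geometry = geometry::STGeomFromText('POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))', 0);
DECLARE @Small geometry = geometry::STGeomFromText('POLYGON ((6 0, 8 0, 8 2, 6 2, 6 0))', 0);

SELECT @Big.STDistance(@Small)      AS Distance,     -- 2, the gap between the nearest edges
       @Big.STIntersects(@Small)    AS Intersects,   -- 0, the squares do not touch
       @Big.STArea()                AS Area,         -- 16
       @Big.STLength()              AS Perimeter,    -- 16
       @Big.STUnion(@Small).STArea() AS UnionArea,   -- 20, because the squares are disjoint
       @Big.STBuffer(1).STArea()    AS BufferedArea; -- about 35, corners are approximated arcs
```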
Microsoft Extensions
In addition to the OGC properties and methods,
Microsoft has provided several useful extensions
to the standards. Several of these extensions are
described in this topic, but many more exist.
Common Extensions
Although the coverage that the OGC specifications
provide is good, Microsoft has enhanced the data
types by adding properties and methods that
extend the standards. Note that the extended
methods and properties do not have the ST prefix.
The IsNull method returns 1 if an instance of a spatial type is NULL; otherwise it returns 0.
GML
<Point xmlns="http://www.opengis.net/gml">
<pos>12 15</pos>
</Point>
GML is excellent for information interchange but the representation of objects in XML can quickly become
very large.
The BufferWithTolerance method returns a buffer around an object, but uses a tolerance value to allow
for minor rounding errors.
Demonstration Steps
1. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click the 21 - Demonstration
2A.sql script file.
2. Highlight the Transact-SQL under the comment Step 1 - Switch to the AdventureWorks database,
and click Execute.
3. Highlight the Transact-SQL under the comment Step 2 - Draw a shape using geometry, and click
Execute.
5. Highlight the Transact-SQL under the comment Step 3 - Draw two shapes, and click Execute.
8. Highlight the Transact-SQL under the comment Step 5 - Join the two shapes together, and click
Execute.
11. Highlight the Transact-SQL under the comment Step 7 - Draw the Pentagon, and click Execute.
13. Highlight the Transact-SQL under the comment Step 8 - Call the ToString method to observe the
use of the Z and M values that are stored but not processed, and click Execute.
14. Highlight the Transact-SQL under the comment Step 9 - Use GML for input, and click Execute.
16. Highlight the Transact-SQL under the comment Step 10 - Output GML from a location (start and
end points of the Panama Canal only – not the full shape), and click Execute.
Lesson 3
Using Spatial Data in Applications
Having learned how spatial data is stored and accessed in SQL Server, you now have to understand the
implementation issues that can arise when you are building applications that use spatial data.
Lesson Objectives
After completing this lesson, you will be able to:
Explain the basic tessellation process used within spatial indexes in SQL Server.
Implement spatial indexes.
Spatial Indexes
Spatial indexes in SQL Server are based on B-tree structures, but unlike standard relational indexes,
which directly locate the specific rows required to answer a query, spatial indexes work in a
two-phase manner.
The first phase, known as the primary filter,
obtains a list of rows that are of interest. The
returned rows are referred to as candidate rows
and may include false positives; that is, rows that
are not required to answer the query.
You can check the effectiveness of a primary filter in SQL Server using the Filter method. The Filter
method only applies the primary filter, so you can compare the number of rows that the Filter method
returns to the total number of rows.
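The comparison can be sketched as follows. The table variable and its points are hypothetical, and a
table variable cannot carry a spatial index; in that situation Filter is expected to return the same
rows as STIntersects, so the sketch shows only the shape of the comparison. Against an indexed table,
the middle count can be larger than the exact count because of false positives.

```sql
-- Hypothetical data; names and coordinates are illustrative only.
DECLARE @Stores TABLE (StoreID int PRIMARY KEY, Location geometry NOT NULL);
INSERT INTO @Stores (StoreID, Location) VALUES
    (1, geometry::Point(1, 1, 0)),
    (2, geometry::Point(5, 5, 0)),
    (3, geometry::Point(40, 40, 0));

DECLARE @SearchArea geometry =
    geometry::STGeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))', 0);

SELECT
    (SELECT COUNT(*) FROM @Stores)                                        AS TotalRows,
    (SELECT COUNT(*) FROM @Stores WHERE Location.Filter(@SearchArea) = 1) AS PrimaryFilterRows,
    (SELECT COUNT(*) FROM @Stores
     WHERE Location.STIntersects(@SearchArea) = 1)                        AS ExactRows;
```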
Tessellation Process
SQL Server spatial indexes use tessellation to minimize the number of calculations that have to be
performed. The tessellation process quickly reduces the overall number of rows to a list that might
potentially be of interest.
SQL Server breaks the problem space into relevant areas by using a four-level hierarchical grid. Each
object is broken down and fitted into the grid hierarchy based on which cells it touches.
Covering rule—any cell covered completely by an object is a covered cell and is not tessellated.
Cells-per-object rule—sets the maximum number of cells that can be counted for any object.
Deepest-cell rule—records the bottom most tessellated cells for an object.
Tessellation Scheme
SQL Server uses a different tessellation scheme depending on the data type of the column. Geometry grid
tessellation is used for columns of the geometry data type and Geography grid tessellation is used for
columns of the geography data type. The view sys.spatial_index_tessellations returns the tessellation rules
of a spatial index.
For more information on the spatial index tessellation process, see Microsoft Docs:
Index Bounds
Unlike traditional types of index, a spatial index is
most useful when it knows the overall area that
the spatial data covers. Spatial indexes that are
created on the geography data type do not have to specify a bounding box because the Earth itself
naturally limits the data type.
Spatial indexes on the geometry data type specify a BOUNDING_BOX setting. This provides the
coordinates of a rectangle that would contain all possible points or shapes of interest to the index. The
geometry data type has no natural boundaries so, by specifying a bounding box, SQL Server can produce
a more useful index. If values arise outside the bounding box coordinates, the primary filter would have to
return the rows in which they are contained.
Grid Density
With SQL Server, you can specify grid densities when you are creating spatial indexes. You can specify a
value for the number of cells in each grid for each grid level in the index:
Spatial indexes differ from other types of index because it might make sense to create multiple spatial
indexes on the same table and column. Indexes that have one set of grid densities might be more useful
than a similar index that has a different set of grid densities for locating data in a specific query.
To make spatial indexes easier to configure, SQL Server has automatic grid density and level selections:
GEOMETRY_AUTO_GRID and GEOGRAPHY_AUTO_GRID. The automated grid configuration defaults to an
eight-level grid.
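The bounding box and grid density options can be sketched as follows. The dbo.Sites table, its Shape column, the index names, and the coordinate values are assumptions made for illustration; the table is assumed to have a clustered primary key, which spatial indexes require:

```sql
-- Manual grid: explicit bounding box and a density for each of the four levels.
CREATE SPATIAL INDEX SIdx_Sites_Shape_Manual
ON dbo.Sites (Shape)
USING GEOMETRY_GRID
WITH (
    BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 500, ymax = 200),
    GRIDS = (LEVEL_1 = MEDIUM, LEVEL_2 = MEDIUM, LEVEL_3 = HIGH, LEVEL_4 = HIGH),
    CELLS_PER_OBJECT = 64
);
GO

-- Automatic grid: only the bounding box is specified; SQL Server chooses the grid.
CREATE SPATIAL INDEX SIdx_Sites_Shape_Auto
ON dbo.Sites (Shape)
USING GEOMETRY_AUTO_GRID
WITH (BOUNDING_BOX = (xmin = 0, ymin = 0, xmax = 500, ymax = 200));
GO
```

Creating both indexes on the same column, as shown here, is only useful when different queries benefit from different grid settings; otherwise one index is normally enough.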
Limitations
Spatial indexes do not support the use of ONLINE build operations, which are available for other types of
index in SQL Server Enterprise.
geometry1.STContains(geometry2) = 1
geometry1.STDistance(geometry2) <= number
geometry1.STIntersects(geometry2) = 1
geometry1.STOverlaps(geometry2) = 1
geometry1.STTouches(geometry2) = 1
geometry1.STWithin(geometry2) = 1
If the predicate in your query is not in one of these forms, spatial indexes that you create will be ignored,
potentially resulting in slower queries.
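As an illustration of a predicate in a supported form, the following sketch filters a hypothetical dbo.Sites table by a rectangular region. The table, column, and coordinates are assumptions:

```sql
-- A rectangular search region in the same coordinate space as the Shape column.
DECLARE @region geometry =
    geometry::STGeomFromText('POLYGON((0 0, 100 0, 100 100, 0 100, 0 0))', 0);

-- The predicate has the form geometry1.STIntersects(geometry2) = 1,
-- so the optimizer can consider a spatial index on Shape.
SELECT SiteID
FROM dbo.Sites
WHERE Shape.STIntersects(@region) = 1;
```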
For more information on geometry methods supported by spatial indexes, see MSDN:
geography1.STIntersects(geography2) = 1
geography1.STEquals(geography2) = 1
geography1.STDistance(geography2) < number
geography1.STDistance(geography2) <= number
Unless the predicate in your query is in one of these forms, spatial indexes that you create will be ignored
and query performance might be affected.
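A distance query in one of these forms might look like the following sketch. The dbo.Prospects table, its Location column, and the origin point are hypothetical:

```sql
-- Find prospects within 10 km of a given point (distances are in meters for SRID 4326).
DECLARE @origin geography = geography::Point(47.6062, -122.3321, 4326);

SELECT ProspectID
FROM dbo.Prospects
WHERE Location.STDistance(@origin) <= 10000;
```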
For more information on geography methods supported by spatial indexes, see MSDN:
Demonstration Steps
1. In SQL Server Management Studio, in Solution Explorer, under Queries, double-click the 31 - Demonstration
3A.sql script file.
2. Highlight the Transact-SQL under the comment Step 1 - Open a new query window to the
AdventureWorks database, and click Execute.
3. Highlight the Transact-SQL under the comment Step 2 - Which salesperson is closest to New
York? and click Execute.
4. Highlight the Transact-SQL under the comment Step 3 - Which two salespeople live the closest
together? and click Execute.
Objectives
After completing this lab you will be able to:
Password: Pa55w.rd
Task 2: Write Code to Assign Values Based on Existing Latitude and Longitude
Columns
Write code to assign values to the Location column, based on the existing Latitude and Longitude
columns.
Results: After this exercise, you should have replaced the existing Longitude and Latitude columns with
a new Location column.
Results: After completing this lab, you will have created a spatial index and written a stored procedure
that will return the prospects within a given distance from a chosen prospect.
Module 16
Storing and Querying BLOBs and Text Documents in SQL
Server
Contents:
Module Overview
Lab: Storing and Querying BLOBs and Text Documents in SQL Server
Module Overview
Traditionally, databases have been used to store information in the form of simple values—such as
integers, dates, and strings—that contrast with more complex data formats, such as documents,
spreadsheets, image files, and video files. As the systems that databases support have become more
complex, administrators have found it necessary to integrate this more complex file data with the
structured data in database tables. For example, in a product database, it can be helpful to associate a
product record with the service manual or instructional videos for that product. SQL Server provides
several ways to integrate these files—that are often known as Binary Large Objects (BLOBs)—and enable
their content to be indexed and included in search results. In this module, you will learn how to design
and optimize a database that includes BLOBs.
Objectives
After completing this module, you will be able to:
Describe the considerations for designing databases that incorporate BLOB data.
Describe the benefits and design considerations for using FILESTREAM to store BLOB data on a
Windows file system.
Describe the benefits of using full-text indexing and Semantic Search, and explain how to use these
features to search SQL Server data, including unstructured data.
Lesson 1
Considerations for BLOB Data
There are several ways you can store BLOBs and integrate them with the tabular data in a database. Each
approach has different advantages and disadvantages. You should consider how a chosen approach
affects performance, security, availability, and any other requirements. In this lesson, you will see the
principal features of the different approaches.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how BLOBs differ from structured data and list the technical challenges they present.
Describe how BLOBs can be stored in database files by using columns with the varbinary(max) data
type.
Describe how database administrators can store BLOBs outside the database and link to them from
database tables.
Describe the FILESTREAM feature and explain the advantages of using it.
Describe how FileTables extend the FILESTREAM feature and provide access for desktop applications.
Security. If BLOBs are stored in the database, the authorization to access them can be controlled by
using roles, logins and users, as for the authorization to access tables, views, and so on. However, if
BLOBs are stored on the file system, access must be controlled by using Windows accounts, groups,
and NTFS permissions.
Indexing and searching text data. BLOBs such as Word files contain large quantities of unstructured
text. Users might like to search this text for specific words but standard SQL Server indexes do not
support it.
Referential and transactional integrity. You must consider how create, read, update, and delete
(CRUD) operations on database rows might affect corresponding BLOBs. For example, if a product is
deleted from the product catalog, should the corresponding product manual also be deleted?
Storing BLOBs in the database. In this approach, you add a column to a database table with the
varbinary(MAX) datatype. The file is stored as binary information within the database file.
Storing BLOBs on the file system. In this approach, you save BLOBs to a shared folder on a file
server. You can link files to database rows by adding a column with the varchar() or nvarchar()
datatype to the relevant tables, and using it to store a path to the file.
FILESTREAM. If you use the FILESTREAM feature, BLOBs are stored in the file system, although they
are accessed through the database as varbinary(MAX) columns. To applications, BLOBs appear to be
stored in database tables, but the use of the file system can increase performance and simplify the
challenges of referential and transactional integrity.
FileTables. This feature is an extension of FILESTREAM. BLOBs are stored on the file system but can
be accessed through the database. In addition, applications can access and modify BLOBs by using
the Windows APIs they would use for accessing nondatabase files.
Remote BLOB Storage (RBS). SQL Server RBS is an optional add-on that you can use to place BLOBs
outside the database in commodity storage solutions. RBS is extensible and comes with several
different providers. Each provider supports a different kind of remote storage solution. Several third-
party vendors have created providers for their proprietary storage technologies; you can also develop
custom providers.
Note: You can control this storage behavior by using the large value types out of row
option in the sp_tableoption stored procedure. If this option is set to 0, the behavior is as
described above. If the option is set to 1, then BLOBs are always stored in separate pages, even if
they are under the 8,000 byte limit.
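The option can be set and checked as in the following sketch. The dbo.ProductManuals table name is hypothetical:

```sql
-- Always store large value types (such as varbinary(max)) outside the data row.
EXEC sp_tableoption N'dbo.ProductManuals', 'large value types out of row', 1;

-- Confirm the current setting for the table.
SELECT name, large_value_types_out_of_row
FROM sys.tables
WHERE name = N'ProductManuals';
```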
Note: Early versions of SQL Server included the image data type, which was intended for
BLOB storage. This data type is still available in SQL Server but is deprecated and should not be
used.
All data is stored in the database files; there is no requirement to maintain and back up a separate set
of folders on the file system where BLOBs are located.
Restore operations are simplified, because only the database needs to be restored.
The transactional and referential integrity of BLOB data is automatically maintained by SQL Server.
Administrators do not need to secure a separate set of folders on the file system.
Developers writing applications that use your database can access BLOBs by using Transact-SQL and
do not have to call separate I/O APIs.
Full integration with Full-Text Search and Semantic Search for textual data.
However, the following issues may be considered disadvantages of this approach:
BLOBs can only be accessed through Transact-SQL. Word, for example, cannot open Word
documents that are stored directly as database BLOBs.
Large BLOBs may reduce read performance, because a large number of pages from the b-tree may
have to be retrieved from the disk for each BLOB.
Although restores are simpler, they often take longer, because BLOBs typically add considerable size
to database files.
For example, a Products table in the database may have a column called “ManualPath” of the varchar()
data type. In the row for the “Rear Derailleur Shifter”, the ManualPath column may store the path
“\\DocumentServer\Manuals\RearShifterManual.doc”.
Note: In this example, the stored path is a Server Message Block (SMB) path to a file on a
file server. Depending on your file store, paths may be in other forms, such as URLs.
Atomicity is a major concern. For any operation that alters one of the storage locations, you must consider
how the entry in the other location will be affected. For example, if a product is deleted from the catalog,
should the corresponding product manual be deleted from the file server? If a product’s part number is
altered in the database, does this change need to be propagated to the BLOB, and if so, how should it be
updated?
You must also plan how to secure both locations: use logins, users, and permissions for the database, and
Windows accounts and permissions (or some other security system) for the file store.
Read performance for large BLOBs is typically faster than it would be for BLOBs stored in the
database.
BLOBs are less likely to become fragmented on the file system. It is easier to reduce fragmentation on
the file system and this can ensure better performance.
Because BLOBs are stored in a shared folder, applications can access them without going through SQL
Server. For example, a user with the correct path can open a manual in Word.
The disadvantages of this approach arise from the looser integration between the BLOBs and their
corresponding database rows:
There is no mechanism for maintaining transactional and referential integrity. For example, if a user
moves a BLOB in the file store, the path in the database row will not automatically be updated and
will be broken.
There is another location to back up and restore.
Security administration must be done twice—once for the database and once for the file store.
Developers must use two mechanisms to access data—Transact-SQL for access to the database and a
Windows API for access to the BLOBs. This adds complexity and increases development time.
FILESTREAM
You use the FILESTREAM feature, which was
introduced in SQL Server 2008, to store BLOBs on
the file system, closely integrated with their
corresponding rows. It combines the extra
performance you can achieve for large BLOBs
served from the file system, with the advantages
of storing BLOBs in the database.
FILESTREAM Implementation
FILESTREAM is an attribute of the varbinary(max)
data type. If you enable this attribute on a
varbinary(max) column, SQL Server stores BLOBs in
a folder on the NTFS file system. You always access
these BLOBs through the database server but you can choose to use either Transact-SQL or Win32 I/O
APIs, which have better performance for large files. You can also store BLOBs that have more than the 2
GB size limit for BLOBs stored in the database.
To use FILESTREAM, you must create at least one FILESTREAM filegroup in your database. This is a
dedicated kind of filegroup that contains file system directories, called “data containers”, instead of the
actual BLOBs.
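A table that uses the attribute might be defined as in the following sketch. The table and column names are hypothetical, and the statement assumes that the database already has a FILESTREAM filegroup. A table with a FILESTREAM column must also have a unique ROWGUIDCOL column:

```sql
CREATE TABLE dbo.ProductManuals
(
    -- Required for FILESTREAM: a unique, non-null uniqueidentifier ROWGUIDCOL column.
    ManualID  UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    ProductID INT NOT NULL,
    -- The BLOB itself is stored in the FILESTREAM data container on the file system.
    Manual    VARBINARY(MAX) FILESTREAM NULL
);
GO
```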
Benefits
When you use FILESTREAM, BLOBs are stored outside the database on the file system, but from the point
of view of applications, they appear to be within the database. This combines the high performance of
externally stored BLOBs with the close integration of BLOBs stored within the database.
Advantages
When BLOBs are larger than about 1 MB, read performance will be greater with FILESTREAM than for
BLOBs stored within the database.
BLOBs are fully integrated with the database for management and security.
Developers access all BLOBs through the database server and can use either Transact-SQL or Win32
APIs.
Disadvantages
Applications cannot access BLOBs directly; instead, developers must write code that reads and writes
BLOB data.
BLOBs must be stored on a hard drive installed on the database server itself; you cannot use a shared
folder on another file server.
Note: You can use a Storage Area Network (SAN) to store FILESTREAM BLOBs, because
these appear as local hard drives to the database server.
FileTables
FileTables were introduced in SQL Server 2012 as an
extension to the FILESTREAM feature that solves
some of that feature’s limitations.
Also, FILESTREAM BLOBs must be stored on a hard drive that is local to the database server. With
FileTables, the storage location can be a shared folder on a file server that is remote to the database
server.
A FileTable is a database table that has a specific schema. It includes a varbinary(max) column with the
FILESTREAM attribute enabled. It also includes a set of metadata columns that describe the BLOBs. These
columns include the file size, the creation time, the last write time, and so on.
Because a FileTable is a separate database table with a fixed schema, you cannot integrate the BLOBs
as columns in another table. For example, product manuals must be stored as rows in a separate
FileTable, and not as a column in the Products table. Instead you must use a foreign key relationship
to associate a product with its manual. This may have implications for referential and transactional
integrity.
4. In the Open Project dialog box, navigate to D:\Demofiles\Mod16\Demo, click demo.ssmssln, and
then click Open.
6. Select the code under the Step 1 comment, and then click Execute.
7. Select the code under the Step 2 comment, and then click Execute.
8. Select the code under the Step 3 comment, and then click Execute.
9. Select the code under the Step 4 comment, and then click Execute. Note that no results are returned.
10. Close Microsoft SQL Server Management Studio, without saving any changes.
Use a FileTable.
Lesson 2
Working with FILESTREAM
The FILESTREAM feature, together with FileTables, enables database administrators to combine the
performance advantages of BLOBs stored on the file system, with the close integration of data when
BLOBs are stored in the database. Now that you understand when to use FILESTREAM and FileTables, this
lesson discusses their prerequisites and how to implement them in your database.
Lesson Objectives
After completing this lesson, you will be able to:
Enable FILESTREAM for a SQL Server instance, a database, and an individual table.
Write queries that determine BLOB locations for FileTables and FILESTREAM columns.
FILESTREAM filegroups should be placed on a separate volume from the operating system, page files,
the database, the transaction logs, and the tempdb for optimal performance.
BLOBs in FILESTREAM columns are automatically included in database backups and restores, so you
do not need a separate maintenance regime for FILESTREAM data.
If you expect BLOBs in a column to be smaller than 1 MB, you might obtain better performance by
using a varbinary(max) column without FILESTREAM. This is because small BLOBs can be stored in
the same data page as the rest of the row.
If you are using Transparent Data Encryption (TDE) for a database, BLOBs stored in FILESTREAM
columns are not encrypted.
Enabling FILESTREAM
FILESTREAM is not enabled by default in SQL
Server. To use it, you must complete three
configuration tasks:
3. Locate the SQL Server instance you want to configure and double-click it.
4. In the Properties dialog for the SQL Server instance, click the FILESTREAM tab.
6. If you want to use Win32 APIs to access BLOBs, select Enable FILESTREAM for file I/O access, and
then click OK.
You must also configure the FILESTREAM access level by using the sp_configure stored procedure. There
are three possible access levels:
2. This value enables FILESTREAM access for Transact-SQL and Win32 streaming clients.
After completing these configuration steps, restart the SQL Server instance.
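Setting the access level with sp_configure can be sketched as follows. The value 2 is used here as an example, and the SERVERPROPERTY call is included only as one way to confirm the result:

```sql
-- Enable FILESTREAM for Transact-SQL and Win32 streaming access.
EXEC sp_configure filestream_access_level, 2;
RECONFIGURE;
GO

-- Check the access level that is currently in effect for the instance.
SELECT SERVERPROPERTY('FilestreamEffectiveLevel') AS FilestreamEffectiveLevel;
```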
This code example creates a new database with one filegroup that supports FILESTREAM:
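A minimal sketch of such a statement follows. The database name, logical file names, and paths are hypothetical. The parent folder D:\Data is assumed to exist, and the final FILESTREAM folder (ArchiveFS) must not exist before the statement runs:

```sql
CREATE DATABASE Archive
ON PRIMARY
    (NAME = Archive_Data, FILENAME = 'D:\Data\Archive.mdf'),
-- A FILESTREAM filegroup points to a folder (data container), not to a data file.
FILEGROUP ArchiveFS CONTAINS FILESTREAM
    (NAME = Archive_FS, FILENAME = 'D:\Data\ArchiveFS')
LOG ON
    (NAME = Archive_Log, FILENAME = 'D:\Data\Archive.ldf');
GO
```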
Nontransactional access must be configured at the database level. A key advantage of FileTables is
that they enable applications to access BLOBs without going through the transactional system of SQL
Server. You must configure this access and choose the access level. You can choose to enable full
access, read-only access, or disable the access. See the next topic for how to enable nontransactional
access.
A directory for FileTables must be configured at the database level. When you create a FileTable, a
folder is created as a child of the directory that will store BLOBs. See the next topic for how to
configure this directory.
FileTables do not support all of the SQL Server features that other tables support, and some SQL Server
features are partially supported.
o Table partitioning
o Database replication
The following SQL Server features can be used with some limitations:
o If a database includes a FileTable, failover works differently for Always On availability groups. If
failover occurs, you have full access to the FileTable on the primary replica but no access on
readable secondary replicas.
o FileTables do not support INSTEAD OF triggers for Data Manipulation Language (DML)
operations. AFTER triggers for DML operations are supported, and both AFTER and INSTEAD OF
triggers are supported for Data Definition Language (DDL) operations.
Enabling FileTables
Before you can use FileTables, FILESTREAM must
be enabled at the instance level, and a
FILESTREAM filegroup created as described
previously in this lesson. In addition, you must
complete the tasks described in the following
sections.
The available access levels for nontransactional access are DISABLED, READ_ONLY, and FULL. To enable
FileTables and set the access level, use the FILESTREAM option when you create or alter a database.
In this example, full nontransactional access is enabled for a new database called HumanResources:
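A sketch of such a statement is shown below, followed by a query that is one way to confirm the setting. File and filegroup definitions are omitted for brevity, so SQL Server uses its defaults:

```sql
-- Create the database with full nontransactional access enabled.
CREATE DATABASE HumanResources
WITH FILESTREAM (NON_TRANSACTED_ACCESS = FULL);
GO

-- Review the nontransactional access level and FileTable directory of each database.
SELECT DB_NAME(database_id) AS DatabaseName,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options;
```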
In the following code, a FileTable directory is configured for a pre-existing database named
HumanResources:
Configuring a Directory
ALTER DATABASE HumanResources
SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL,
DIRECTORY_NAME = N'FileTableDirectory' );
GO
Note: You can also enable and configure FileTable prerequisites by using SSMS. These
options appear on the Options tab of the database Properties dialog box.
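After these prerequisites are in place, a FileTable can be created. The following is a minimal sketch; the table name and directory name are hypothetical. No column list is supplied because a FileTable has a fixed schema:

```sql
USE HumanResources;
GO
-- Create a FileTable; its folder appears beneath the database-level FileTable directory.
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = N'DocumentStore',
    FILETABLE_COLLATE_FILENAME = database_default
);
GO
```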
FILESTREAM Access
You can use Transact-SQL built-in system
functions to determine the file system paths and
IDs for folders that store FILESTREAM and
FileTable BLOBs.
Use the @option argument to control the format of the returned path. Possible values are:
0. The path is returned in NetBIOS format. This is the default.
The following query uses the PathName() function to locate a BLOB on the filesystem:
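A query of this kind might look like the following sketch. The dbo.ProductManuals table and its Manual column are hypothetical, and Manual is assumed to be a varbinary(max) column with the FILESTREAM attribute. Note that the method name PathName is case-sensitive:

```sql
-- Return the logical path that identifies each BLOB; called without arguments,
-- PathName uses the default @option value of 0.
SELECT ManualID,
       Manual.PathName() AS ManualPath
FROM dbo.ProductManuals;
```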
The following example returns the FileTable root path for the HumanResources database:
FileTableRootPath Function
USE HumanResources;
GO
SELECT FileTableRootPath();
The following query returns the relative paths to all BLOBs in a FileTable named Images:
GetFileNamespacePath Function
USE HumanResources;
GO
SELECT file_stream.GetFileNamespacePath() AS [Relative Path] FROM Images;
GO
Demonstration Steps
Enable FILESTREAM at the Instance Level
1. On the Start screen, type SQL Server 2016 Configuration Manager, and then click SQL Server 2016
Configuration Manager.
4. In the right pane, right-click SQL Server (MSSQLSERVER), and then click Properties.
5. In the SQL Server (MSSQLSERVER) Properties dialog box, on the FILESTREAM tab, check that the
Enable FILESTREAM for Transact-SQL access check box is selected, and then click OK.
2. In the Connect to Server dialog box, in Server name, type MIA-SQL, and then click Connect.
4. In the Open Project dialog box, navigate to D:\Demofiles\Mod16\Demo, click demo.ssmssln, and
then click Open.
5. In Solution Explorer, double-click the 2 - Configuring FILESTREAM and FileTables.sql script file.
6. Select the code under the Step 1 comment, and then click Execute.
7. Select the code under the Step 2 comment, and then click Execute.
8. Select the code under the Step 3 comment, and then click Execute.
9. Select the code under the Step 4 comment, and then click Execute.
10. Select the code under the Step 5 comment, and then click Execute.
11. Select the code under the Step 6 comment, and then click Execute.
12. Select the code under the Step 7 comment, and then click Execute.
13. Select the code under the Step 8 comment, and then click Execute.
14. Select the code under the Step 9 comment, and then click Execute.
Keep Microsoft SQL Server Management Studio open for the next demonstration.
Sequencing Activity
You have no FILESTREAM or FileTable prerequisites configured. You want to create a FileTable for BLOB
storage. Put the following steps in order by numbering each to indicate the correct order.
Steps
Enable FILESTREAM at the instance level.
Create a FILESTREAM filegroup at the database level.
Configure nontransactional access at the database level.
Configure a directory for FileTables at the database level.
Start creating FileTables.
Lesson 3
Using Full-Text Search
SQL Server has industry-leading indexing and querying performance optimized to handle structured,
tabular data. If you use large varchar() columns to store long text fragments, or use varbinary(max)
columns to store BLOBs, SELECT queries that use predicates such as LIKE might not perform well, or might
not return the rows you intended. The Full-Text Search feature of SQL Server helps with these issues—it
can analyze and index long text fragments in large varchar() columns and BLOBs in a way that is aware of
language-specific linguistic rules, such as word boundaries and inflection.
In this lesson, you will see how to configure and query Full-Text Search and Semantic Search.
Lesson Objectives
After completing this lesson, you will be able to:
Describe how Full-Text Search enables users to analyze text-based data in ways that are not possible
with standard Transact-SQL queries.
List the components of the Full-Text Search architecture and describe their role in index operations
and queries.
Configure Full-Text Search and create full-text indexes.
Describe how Semantic Search enables users to analyze text-based data in ways that are not possible
with Full-Text Search.
Simple term search. This type of search matches one or more specific words or phrases.
Prefix term search. This type of search matches words that start with the character string you
specify.
Generation term search. This type of search matches inflectional forms of the words you specify.
Proximity term search. This type of search matches an item when a specified word or phrase
appears close to another specified word or phrase.
Thesaurus search. This type of search matches words that are synonymous with the words you
specify. For example, if you search for “run”, a thesaurus search might match “jog”.
Weighted term search. This type of search matches the words or phrases you specify, and orders
them so that some word matches appear higher in the list than others.
Performance
Full-text searches deliver much higher performance than LIKE predicates when executed against large
blocks of text, such as varchar(max) columns. In addition, you cannot use LIKE predicates to search text
in BLOBs, such as varbinary(max) columns. This restriction applies whether or not you are using
FILESTREAM or FileTables.
Property-scoped Searches
Full-text search can index the properties of a file stored as a BLOB, in addition to its text. For example,
Word supports an Author property for each Word document. You can use a full-text search to locate
documents by a given author, even if you have not separately stored the author in a column, in a SQL
Server table.
Language Support
Full-text search supports around 50 languages and distinguishes between dialects of the same language,
such as American English and British English. For each language, the following components are used to
analyze and index text:
Word breakers and stemmers. A word breaker separates text into individual words by locating word
boundaries, such as spaces and periods. A stemmer conjugates verbs to ensure that different forms of
the same word match.
Stoplists. A stop word or noise word is one that does not help the search. A stoplist is a list of stop
words for a given language. For example, in English, no one would search for the word “the” so it is
removed from the index.
Thesaurus files. Thesaurus files list synonyms to ensure that thesaurus searches match words that
mean the same thing as the search term.
Filters. Filters are components that understand the structure of a particular file type, such as a Word
document or an Excel spreadsheet. Filters enable property-scoped searches by enabling SQL Server to
index the properties of those file types.
You can use the sys.fulltext_languages catalog view to determine which languages are supported by full-
text searches on your database server. For full details of this catalog view, see Microsoft Docs:
sys.fulltext_languages (Transact-SQL)
http://aka.ms/Hvzcir
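For example, the following query lists the locale identifier (LCID) and name of each language that is registered for full-text search on the instance:

```sql
SELECT lcid, name
FROM sys.fulltext_languages
ORDER BY name;
```

The lcid values returned here are the ones you can supply in the LANGUAGE clause when you add a column to a full-text index.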
Full-Text Gatherer. This component runs the crawl threads that process content.
Full-Text Engine. This component is part of the SQL Server query processor. It receives full-text
queries and communicates with the index to locate matching items.
Thesaurus. This component stores the language-dependent lists of synonyms that enable thesaurus
search.
Stoplist. This component defines stop words and removes them from queries and full-text indexes.
Indexer. This component compiles the full-text index in a format that optimizes query delivery.
Filters. These components analyze file structures, and locate file properties and body text.
Word breakers. These components look for word boundaries in a specific language, such as spaces,
commas, and periods.
Indexing Process
A full-text population operation is also called a crawl and can be initiated by a change to one of the
indexed columns or on a schedule. When a crawl is initiated, these are the steps that are followed:
1. The full-text engine notifies the filter daemon host that a crawl is underway.
2. The full-text gatherer initiates crawl threads, each of which begins to crawl content and pass it to
different components for processing.
3. The full-text engine reads a large quantity of data into memory in the form of user tables. These
tables are the raw content of the character-based data in the columns of the index. Depending on the
storage location, different protocol handlers are used to obtain this text.
4. Crawl threads pass BLOBs to filters. These analyze the content of the file and return text from the
body and metadata fields.
5. Crawl threads pass text to word breakers that split long strings into words.
7. The indexer calls the stoplist to remove noise words from the index.
8. The indexer creates an inverted list of words and their locations in the columns, and stores this list in
the full-text index.
Query Process
When a user executes a full-text query, the SQL Server Query Processor passes the request to the Full-Text
Engine. The Full-Text Engine takes different steps to compile the query, depending on the type of search
that was requested. For example:
If the query is a generation term search, the Full-Text Engine performs stemming to identify alternate
forms of the search terms.
If the query is a thesaurus search, the Full-Text Engine calls the thesaurus to identify synonyms.
If the query includes phrases, the Full-Text Engine calls word breakers.
For more information about full-text search architecture, see Microsoft TechNet:
Table support. You can only create one full-text index for each database table, but the index can
include multiple columns from that table.
Language support. A single full-text index can include text in multiple languages. You specify a
single language for each column in the index.
Filegroup placement. Full-text crawls are disk-intensive operations, so you should consider creating
a dedicated filegroup for full-text indexes. For maximized performance, separate this filegroup onto
its own physical disk.
Managing updates. By default, the Full-Text Engine is configured to update the index continuously
as changes are made to the underlying column data. This ensures that the index is always up to date.
However, you may wish to schedule crawls to take place during off-peak hours or to manually initiate
crawls. Schedules use the SQL Server Agent service to initiate crawls. Remember that the index may
fall out of synchronization with the column data if a crawl has not taken place recently.
The name of the column that will act as the key index column.
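The following sketch shows how these choices might come together when a full-text catalog and index are created. The catalog name, the PK_Employees unique index, and the ResumeFileType column (which holds the file extension of each BLOB, such as .docx) are assumptions. In the Transact-SQL syntax, the key is identified by the name of a unique, single-column, non-nullable index:

```sql
CREATE FULLTEXT CATALOG HumanResourcesCatalog;
GO

CREATE FULLTEXT INDEX ON HumanResources.Employees
(
    Skills LANGUAGE 1033,                              -- character column, English
    Resume TYPE COLUMN ResumeFileType LANGUAGE 1033    -- varbinary(max) BLOB column
)
KEY INDEX PK_Employees          -- unique index that identifies each row
ON HumanResourcesCatalog
WITH CHANGE_TRACKING AUTO;      -- keep the index updated as data changes
GO
```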
Use CONTAINS to locate precise or fuzzy matches to words and phrases. You can also use CONTAINS to
perform proximity term searches or weighted term searches.
In the following code, the query returns all Employees in the Sales department that have the phrase
“Team Management” in the Skills column. This is an example of a simple term search:
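A query of this kind might be written as in the following sketch. The Department column is an assumption about the table design, and a full-text index is assumed to exist on the Skills column:

```sql
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE Department = N'Sales'
  AND CONTAINS(Skills, N'"Team Management"');  -- double quotes mark an exact phrase
```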
In the following example, the query returns all Employees with forms of the verb “analyze” in their résumé.
Results would include employees with the words “analyzing” and “analyzed” in their résumés. The Resume
column may be a BLOB column that uses the varbinary(max) data type, either with FILESTREAM enabled
or with BLOBs stored in the database.
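Such a generation term search can be sketched with the FORMSOF keyword, assuming a full-text index exists on the Resume column:

```sql
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
-- INFLECTIONAL matches different forms of the word, such as "analyzing" and "analyzed".
WHERE CONTAINS(Resume, N'FORMSOF(INFLECTIONAL, analyze)');
```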
Use FREETEXT to match the meaning, rather than the exact wording of single words, phrases, or
sentences. FREETEXT searches use the thesaurus to match meaning.
The following example uses the FREETEXT predicate to perform a thesaurus search:
Thesaurus Search
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE FREETEXT (Resume, 'Project Management' );
KEY. This column returns the unique value of the key index column of the full-text index.
RANK. This column contains a rank value that describes how well the row matched the query. The
higher the rank value, the better the match.
In the following example, a weighted term search is performed by using the CONTAINSTABLE function.
The results are joined to the original table to return the rank value for each product:
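A weighted term search of this kind might look like the following sketch. The Production.Products table, its columns, the search terms, and the weights are all hypothetical:

```sql
SELECT p.ProductID, p.Name, ft.[RANK]
FROM Production.Products AS p
INNER JOIN CONTAINSTABLE
(
    Production.Products,
    Description,
    -- Matches on "lightweight" contribute more to the rank than matches on "durable".
    N'ISABOUT (lightweight WEIGHT (0.8), durable WEIGHT (0.4))'
) AS ft
    ON p.ProductID = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```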
In the following example, the FREETEXTTABLE function is called to perform a thesaurus search. The results
are joined with the original table to display the rank value with the search column:
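A FREETEXTTABLE query of this shape could look like the sketch below. It assumes the HumanResources.Employees table used elsewhere in this lesson, with EmployeeID as the full-text key column and Skills as the indexed column.

```sql
-- Thesaurus search that also returns the rank of each match.
-- Assumes EmployeeID is the key index column of the full-text index.
SELECT e.EmployeeID, e.Skills, ft.[RANK]
FROM HumanResources.Employees AS e
INNER JOIN FREETEXTTABLE(
    HumanResources.Employees,
    Skills,
    N'Project Management'
) AS ft
    ON e.EmployeeID = ft.[KEY]
ORDER BY ft.[RANK] DESC;
```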
For more information about CONTAINS, FREETEXT, CONTAINSTABLE, and FREETEXTTABLE, see Microsoft
Docs:
Query with Full-Text Search
http://aka.ms/Ai7pzu
When you use the custom proximity term, you can specify the maximum number of nonsearch terms that
separate your search terms. This is known as the maximum distance between search terms. You can also
define whether the returned result must contain the search terms in the specified order.
In the following example, the query will return employees if the words “Project” and “Management”
appear with five or fewer terms separating them in the Resume column:
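A custom proximity search of this kind can be sketched with the NEAR keyword. The table and column names follow the other examples in this lesson and are assumptions.

```sql
-- Custom proximity term: "Project" and "Management" separated by
-- at most five non-search terms, in either order.
SELECT EmployeeID, FirstName, LastName
FROM HumanResources.Employees
WHERE CONTAINS(Resume, N'NEAR((Project, Management), 5)');
```

To require the terms to appear in the specified order, add TRUE as a third argument: NEAR((Project, Management), 5, TRUE).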
For more information on the custom proximity term, see Microsoft Docs:
Demonstration Steps
Create and Use a Full-Text Index
1. In Solution Explorer, double-click the 3 - Configuring and Using Full-Text Search.sql script file.
2. Select the code under the Step 1 comment, and then click Execute.
3. Select the code under the Step 2 comment, and then click Execute.
4. Select the code under the Step 3 comment, and then click Execute.
5. Select the code under the Step 4 comment, and then click Execute.
6. Select the code under the Step 5 comment, and then click Execute.
7. Select the code under the Step 6 comment, and then click Execute.
8. Close Microsoft SQL Server Management Studio, without saving any changes.
Semantic Search uses a database named the Semantic Language Statistics database, which contains the
statistical models that are used to perform semantic searches.
Note: Semantic Search does not support as many languages as a full-text index. To view
the list of supported languages for Semantic Search, query the sys.fulltext_semantic_languages
catalog view.
For more information on how to install the Semantic Language Statistics database, see Microsoft Docs:
After the Semantic Language Statistics database is configured, you can use the CREATE FULLTEXT INDEX
statement or the ALTER FULLTEXT INDEX statement to create a full-text index that includes Semantic
Search.
The following code example adds Semantic Search to an existing full-text index on the Document table in
the AdventureWorks database:
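A statement along these lines adds Semantic Search to an indexed column. The Production schema and the Document column name are assumptions based on the AdventureWorks sample database.

```sql
-- Add statistical semantics to a column that is already full-text indexed.
-- Production.Document and the Document column are assumed names.
ALTER FULLTEXT INDEX ON Production.Document
    ALTER COLUMN Document
        ADD STATISTICAL_SEMANTICS
    WITH NO POPULATION;   -- defer population until it is started explicitly
```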
o Document_key. This is the key index value for the returned document in the underlying full-text
index.
o Keyphrase. This is the phrase that the search has identified as key to the meaning of the
document.
o Score. This is a weighting value that indicates the importance of the phrase in the document. The
value is between 0 and 1.
o Matched_document_key. This is the key index value for the matched document.
o Score. This is a weighting value that indicates the closeness of the match. The value is between 0
and 1.
SEMANTICSIMILARITYDETAILSTABLE. This function returns the key phrases that make two
documents similar. After using SEMANTICSIMILARITYTABLE to find similar documents, you can use this
function to determine the phrases that the similar documents share. The returned table includes the
following columns:
o Keyphrase. This is the phrase that is shared between the two documents you have specified.
o Score. This is a weighting value that indicates how much this key phrase contributes to the
similarity between the two documents.
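The two similarity functions are typically used together, as in the following sketch. The HumanResources.Employees table, the Resume column, and both key values are assumptions for illustration.

```sql
-- Hypothetical key values; substitute keys that exist in your table.
DECLARE @SourceId int = 1;
DECLARE @MatchedId int = 2;

-- Find the documents most similar to the source document.
SELECT TOP (10) sst.matched_document_key, sst.score
FROM SEMANTICSIMILARITYTABLE(
    HumanResources.Employees, Resume, @SourceId
) AS sst
ORDER BY sst.score DESC;

-- List the key phrases that the source and one matched document share.
SELECT TOP (5) ssd.keyphrase, ssd.score
FROM SEMANTICSIMILARITYDETAILSTABLE(
    HumanResources.Employees, Resume, @SourceId,
    Resume, @MatchedId
) AS ssd
ORDER BY ssd.score DESC;
```

In practice, @MatchedId would be taken from a matched_document_key value returned by the first query.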
The following example uses the SEMANTICKEYPHRASETABLE function to return the top 10 key phrases
from a specific document in the Resume column of the Employees table. The document is specified by
using the @EmployeeId parameter, which is the key index of a row in the Employees table.
SEMANTICKEYPHRASETABLE
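-- Hypothetical key value for illustration; supply the key of the row to analyze.
DECLARE @EmployeeId int = 1;
-- Return the 10 highest-scoring key phrases for that employee's résumé.
SELECT TOP(10) KeyPhraseTable.keyphrase
FROM SEMANTICKEYPHRASETABLE
(
HumanResources.Employees,
Resume,
@EmployeeId
) AS KeyPhraseTable
ORDER BY KeyPhraseTable.score DESC;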
A thesaurus search.
You have also been asked to create a FileTable, with a corresponding shared folder, so users can store
documents by using Word and other desktop applications. These files will be accessible through the file
share and database queries.
Finally, you have also been asked to create a full-text index on the Description column in the
Production.ProductDescription table so that generation term queries and thesaurus queries can be
used.
Objectives
At the end of this lab, you will be able to:
Enable FILESTREAM and move BLOB data into a FILESTREAM column.
4. Write and execute a script that uses the sp_configure stored procedure to set the FILESTREAM access
level to 2.
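One possible form of this script is sketched below. Access level 2 enables FILESTREAM for both Transact-SQL and Win32 streaming access.

```sql
-- Set the instance-level FILESTREAM access level, then apply the change.
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
```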
3. Write a query to add a new column called NewLargePhoto to the Production.ProductPhoto table.
Ensure the new column has FILESTREAM enabled and is a nullable varbinary(max) column.
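A query for this step might look like the following sketch. It assumes that the database already has a FILESTREAM filegroup and that the table has a unique ROWGUIDCOL column, both of which FILESTREAM columns require.

```sql
-- Add a nullable varbinary(max) column that stores its data as FILESTREAM.
-- Assumes a FILESTREAM filegroup and a unique ROWGUIDCOL column already exist.
ALTER TABLE Production.ProductPhoto
ADD NewLargePhoto varbinary(max) FILESTREAM NULL;
```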
2. Write a query to drop the LargePhoto column from the Production.ProductPhoto table.
3. Write a query that uses the sp_rename stored procedure to change the name of the
NewLargePhoto column to LargePhoto.
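The drop and rename steps can be sketched as follows, assuming the data has already been copied into the NewLargePhoto column.

```sql
-- Remove the original column. Copy its data to NewLargePhoto first.
ALTER TABLE Production.ProductPhoto
DROP COLUMN LargePhoto;

-- Rename the FILESTREAM column so that existing queries keep working.
EXEC sp_rename 'Production.ProductPhoto.NewLargePhoto', 'LargePhoto', 'COLUMN';
```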
2. Create a FileTable
2. Write a query that uses the sys.database_filestream_options system view to display whether
nontransacted access is enabled for each database in the instance.
3. Write a query that enables nontransacted access for the AdventureWorks2016 database. Set the
nontransacted access level to full and the directory name to “FileTablesDirectory”.
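These two steps might be written as in the sketch below. The column aliases are illustrative.

```sql
-- Show the nontransacted access setting for every database in the instance.
SELECT DB_NAME(database_id) AS DatabaseName,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options;

-- Enable full nontransacted access and set the database-level directory name.
ALTER DATABASE AdventureWorks2016
SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
                DIRECTORY_NAME = N'FileTablesDirectory');
```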
2. Copy and paste the path you determined into the address bar of a new File Explorer window.
3. Create a new text document called DocumentStoreTest in the file table shared folder.
4. In SQL Server Management Studio, write a query that displays all rows in the DocumentStore FileTable.
Created a FileTable.
2. Execute the first SELECT query in the Lab Exercise 2.sql script file, which lists the tables that have a
full-text index in the AdventureWorks2016 database.
3. Write a query that creates a new full-text catalog in the AdventureWorks2016 database with the
name ProductFullTextCatalog.
4. Write a query that creates a new unique index called ui_ProductDescriptionID and indexes the
ProductDescriptionID column in the Production.ProductDescription table.
5. Write a query that creates a new full-text index on the Description column of the
Production.ProductDescription table. Use the ui_ProductDescriptionID unique index and the
ProductFullTextCatalog.
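Steps 3 to 5 could be sketched as follows, run in the context of the AdventureWorks2016 database. The index name follows step 4.

```sql
-- Step 3: create the catalog that will hold the full-text index.
CREATE FULLTEXT CATALOG ProductFullTextCatalog;

-- Step 4: a full-text index needs a unique, single-column, non-nullable key index.
CREATE UNIQUE INDEX ui_ProductDescriptionID
ON Production.ProductDescription (ProductDescriptionID);

-- Step 5: create the full-text index on the Description column.
CREATE FULLTEXT INDEX ON Production.ProductDescription (Description)
    KEY INDEX ui_ProductDescriptionID
    ON ProductFullTextCatalog;
```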
2. Write a script that executes a generation term query against the Description column in the
Production.ProductDescription table. Locate rows that contain the word “Bike”. Make a note of the
number of rows returned.
3. Write a script that returns rows from the previous generation term query but not rows from the
previous simple term query. Examine the Description text for these results.
4. Close Microsoft SQL Server Management Studio, without saving any changes.
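One way to approach the comparison in steps 2 and 3 is sketched below. It uses EXCEPT to keep only the rows that the generation term query finds and the simple term query does not. Other approaches are possible.

```sql
-- Rows that match any inflectional form of "Bike" ...
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'FORMSOF(INFLECTIONAL, Bike)')
EXCEPT
-- ... minus rows that match the exact word "Bike".
SELECT ProductDescriptionID, Description
FROM Production.ProductDescription
WHERE CONTAINS(Description, N'Bike');
```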
Results: At the end of this exercise, you will have created a full-text index.
Question: How did the results of the simple term query you executed in Exercise 3, Task 2
differ from the results of the generation term query?
Question: What did you notice about the results of the third query you ran against the
full-text index?