The Guru's Guide to SQL Server Architecture and Internals
By Ken Henderson

"I can pretty much guarantee that anyone who uses SQL Server on a regular basis
(even those located in Redmond working on SQL Server) can learn something new
from reading this book."
-David Campbell, Product Unit Manager, Relational Server Team, Microsoft Corporation

Table of Contents
Copyright
List of Exercises
Foreword
Historical Perspective
Preface
Acknowledgments
Introduction
About Books Online
About WinDbg
About the Fundamentals
About the "How-To"
About the Breadth of Topics
About C++
About Visual C++ 6.0
About the Terms and Knowledge Measures
About SQL Server Versions
About Master Programming
About the Author
Part I. Foundations
Chapter 1. Overview
Chapter Overview
Chapter Pairs
About the Code
Chapter 2. Windows Fundamentals
Processes
Threads
Thread Scheduling
Thread Synchronization
Chapter 4. Memory Fundamentals
Memory Basics
Virtual Memory
Heaps
Shared Memory
Chapter 5. I/O Fundamentals
I/O Basics
Asynchronous and Nonbuffered I/O
Scatter-Gather I/O
I/O Completion Ports
Memory-Mapped File I/O
Chapter 6. Networking Fundamentals
Overview
Named Pipes
Windows Sockets
Remote Procedure Call
Recap
Knowledge Measure
Chapter 7. COM
Overview
Before COM
The Dawn of COM
Basic Architecture
COM at Work
Threading Models
COM and SQL Server
Recap
Knowledge Measure
Chapter 8. XML
Overview
Simplicity Comes at a Price
A Brief History of XML
XML vs. HTML: An Example
Document Type Definitions
XML Schemas
Converting XML to HTML Using a Style Sheet
The Document Object Model
Processing XML with MSXML
Resources
Recap
Knowledge Measure
Part II. Subsystems, Components, and Technologies
Chapter 9.
Memory Regions
Sizing
The BPool
Primitive Allocations
AWE
The Lazywriter
The Memory Managers
Pulling It All Together
Recap
Knowledge Measure
Chapter 12. Query Processor
Transactions
Cursors
Overview
On Cursors and ISAM Databases
Types of Cursors
Appropriate Cursor Use
Transact-SQL Cursor Syntax
Configuring Cursors
Updating Cursors
Cursor Variables
Cursor Stored Procedures
Optimizing Cursor Performance
Recap
Knowledge Measure
Chapter 15. ODSOLE
Overview
The sp_OA Procedures
Automating with ODSOLE
Automating SQL-DMO by Using ODSOLE
Using ODSOLE to Automate Custom Objects
Recap
Knowledge Measure
Chapter 16. Full-Text Search
Overview
Architectural Details
Setting Up Full-Text Indexes
Full-Text Predicates
Rowset Functions
Recap
Knowledge Measure
Part III. Data Services
Chapter 17. Server Federations
Partitioned Views
Recap
Knowledge Measure
Chapter 18. SQLXML
Overview
MSXML
FOR XML
Using FOR XML
OPENXML
Using OPENXML
Accessing SQL Server over HTTP
URL Queries
Using URL Queries
Template Queries
Mapping Schemas
Updategrams
XML Bulk Load
Managed Classes
SQLXML Web Service (SOAP) Support
SQLXML Limitations
Recap
Knowledge Measure
Chapter 19. Notification Services
How It Works
Building Your Own Notification Application
Recap
Knowledge Measure
Chapter 20. Data Transformation Services
Overview
Packages
The Multiphase Data Pump
The Bulk Insert Task
The Data Driven Query Task
ActiveX Transformations
Other Types of Transformations
Lookup Queries
Workflow Properties
DTS and Transactions
Controlling Package Workflow through Scripting
Parameterized DTS Packages
The DSO Rowset Provider
Using DTS to Transform Replication Subscriptions
Custom Tasks
Controlling DTS through Automation
Recap
Knowledge Measure
Chapter 21. Snapshot Replication
Overview
The Snapshot Agent
Duties of the Snapshot and Distribution Agents
Updatable Subscriptions
Remote Agent Activation
Replication Cleanup
Recap
Knowledge Measure
Chapter 22. Transactional Replication
Overview
The MSrepl_commands Table
The sp_replcmds Procedure
The sp_repldone Procedure
Update Stored Procedures
Concurrent Snapshot Processing
Updatable Subscriptions
Validating Replicated Data
Skipping Errors
Cleanup
Recap
Knowledge Measure
Chapter 23. Merge Replication
Overview
Conflict Resolution
Generations
Filtering
Identity Range Management
Recap
Knowledge Measure
Part IV.
Chapter 24.
DTSDIAG
Part V. Essays
Copyright
Many of the designations used by manufacturers and sellers to distinguish their
products are claimed as trademarks. Where those designations appear in this book,
and Addison-Wesley was aware of a trademark claim, the designations have been
printed with initial capital letters or in all capitals.
The author and publisher have taken care in the preparation of this book, but make
no expressed or implied warranty of any kind and assume no responsibility for errors
or omissions. No liability is assumed for incidental or consequential damages in
connection with or arising out of the use of the information or programs contained
herein.
The publisher offers discounts on this book when ordered in quantity for bulk
purchases and special sales. For more information, please contact:
International Sales
(317) 581-3793
[email protected]
Henderson, Ken.
The Guru's guide to SQL server architecture and internals / Ken Henderson.
p. cm.
ISBN 0-201-70047-6 (Paperback : alk. paper)
1. SQL server. 2. Client/server computing. 3. Debugging in computer
science--Computer programs.
I. Title.
QA76.9.C55H44 2003
005.75'85--dc22
2003015828
For information on obtaining permission for use of material from this work, please
submit a written request to:
Pearson Education, Inc.
Rights and Contracts Department
75 Arlington Street, Suite 300
Boston, MA 02116
Fax: (617) 848-7047
1 2 3 4 5 6 7 8 9 10--CRS--0706050403
Dedication
For D
List of Exercises
Exercise 3.1 Monitoring Process CPU Usage
Exercise 3.2 Monitoring Thread Creation in SQL Server
Exercise 3.3 Listing Modules and Processes within SQL Server
Exercise 3.5 Displaying Thread Information Using a Debugger
Exercise 3.8 Viewing Thread Priorities, Affinities, and Other Useful Information
Exercise 3.10 Implementing a Kernel Mode Spinlock by Using a Mutex
Exercise 4.2 An Obscured NULL Pointer Reference
Exercise 4.3 A NULL Pointer Reference Due to a Memory Overwrite
Exercise 4.5 Exploring the Process of Reserving Virtual Memory
Exercise 4.6 Reserving, Committing, and Releasing Virtual Memory
Exercise 4.9 Overloading New and Delete to Allocate Memory from a Custom
Heap
Exercise 4.11 Using Shared Memory to Share Data between Processes
Exercise 5.2 A Utility That Converts a UNICODE Text File by Using Asynchronous
I/O
Exercise 5.3 A String Search Utility That Uses Nonbuffered Asynchronous I/O
Exercise 5.4 A File Search Utility That Uses Scatter-Gather I/O
Exercise 5.5 A Multithreaded, Multifile Search Utility That Uses an I/O
Completion Port
Exercise 5.6 A File Search Utility That Uses an I/O Completion Port for Both
Input and Output
Exercise 5.8 A Multithreaded File Scanner That Uses Mapped File I/O
Exercise 6.1 A Find String Utility That Uses Named Pipes
Exercise 6.3 A Winsock Server App That Uses Win32 I/O Functions to Interact
with a Client
Exercise 6.4 An fstring Variant That Works with Both Sockets and Pipes
Exercise 9.1 Inspecting SQL Server's Use of Windows Networking API Functions
Exercise 18.1 Determining How MSXML Computes Its Memory Ceiling
Exercise 18.3 Determining Where OPENXML Is Implemented
Exercise 18.4 Building and Consuming a SQLXML Web Service
Exercise 20.2 Creating a Custom DTS Task in Visual Basic
Exercise 21.1 Running a Replication Agent from the Command Line
Ken Henderson has invested tremendous effort in unlocking the secrets of Microsoft
SQL Server, having recently written The Guru's Guide to Transact-SQL (Addison-
Wesley, 2000) and The Guru's Guide to SQL Server Stored Procedures, XML, and
HTML (Addison-Wesley, 2002). Ken is one of those people who seek deep knowledge.
He isn't satisfied in knowing how to operate something; he needs to know how
something operates. In his research for The Guru's Guide to SQL Server Architecture
and Internals, Ken learned how SQL Server operates, from the ground up, and that
is how he presents it.
Part II, Subsystems, Components, and Technologies, delves into the architecture of
SQL Server's core relational engine. Ken discusses the User Mode Scheduler (UMS), a
base component of SQL Server that allows it to efficiently scale to thousands of
users and process tens of thousands of transactions per second. A highlight of this
section is the chapter on the query processor. Ken offers an incredibly clear
description of how a query is transformed from SQL text (supplied by an application
or user) through parsing, normalization, optimization, and ultimate conversion into a
series of physical operators that the execution engine runs to solve the query.
Throughout this section Ken describes many of the optimizer's inputs, plan choices,
and execution strategies. He also offers a number of "under the cover" tips for how
you can peer into this complex portion of the product to determine why the
optimizer may have chosen a particular plan to solve your complex query.
Part III, Data Services, discusses the various ways you can interact with SQL Server's
core data engine. In this section Ken describes SQL Server's XML facilities and how
Data Transformation Services (DTS) can be used to transform and move data into
and out of SQL Server. The recently released Notification Services, which can be
used to create highly scaled event- and subscription-based data services, is also
covered. Finally, Ken describes the various replication technologies supported by
SQL Server in great detail.
One obligation in writing a foreword for a book is to provide guidance about who
should read it. In this instance, the recommendation is clear: This book is for anyone
interested in furthering their existing knowledge of SQL Server into deep knowledge.
Some people are naturally inclined to explore any new subject in a deep way. If you
are someone who is not satisfied in knowing that something works, but rather you
need to know how and why it works, Ken's book will quench your thirst. Perhaps you
find that your knowledge of SQL Server, built over a period of time by using the
product, no longer offers you explanations to complex issues and questions you
face. Ken's book will take you beyond Books Online and into the inner workings of
the product. Maybe you simply want to know how SQL Server, a complex and high-
performance application, was designed and constructed. Again, The Guru's Guide to
SQL Server Architecture and Internals will meet your needs.
I can pretty much guarantee that anyone who uses SQL Server on a regular basis
(even those located in Redmond working on SQL Server) can learn something new
by reading this book.
David Campbell
June 2003
David Campbell joined Microsoft in 1994 as a developer on the core storage engine
of Microsoft SQL Server. He has been with the SQL Server team since then and is
currently the Product Unit Manager of the Relational Server team.
Historical Perspective
It is hard to believe that it has been ten years since I was offered a job to work at
Microsoft to support its new SQL Server Windows NT product. I never thought I would
find myself or our product in the position we are today. My previous experience was
as a C programmer and database developer on UNIX systems mainly working with
Oracle and Ingres. My perception of Microsoft was purely as a desktop company. The
only database I had ever seen on a PC was dBase. As I contemplated the job offer, I
was naturally skeptical. How could Microsoft even create a product that could
compete with the biggest names in the database industry? Fortunately for me,
Andrea Stoppani, the director of SQL Support for Microsoft in 1993, convinced me
that not only would Microsoft and SQL Server be successful but also that I would find
a rewarding career with Microsoft because I would have an opportunity like never
before: to train, learn, debug, and dig into the internal "nuts and bolts" of a
relational database engine.
Ten years later, that promise has held true. In my role as an escalation engineer at
Microsoft, I've had to train, learn, debug, and dig into the internal mechanics of the
engine that drives SQL Server. Because of that gained knowledge, I've been asked to
advise and provide feedback and insight to the development team with each new
release. Through these years, I've witnessed an evolution and revolution with this
product, from the early years of supporting SQL Server 4.20 for OS/2 when
customers on single processor machines were limited to 16MB of RAM to the
enterprise-ready, TPC record-breaking SQL Server 2000 Enterprise Server running on
a machine with 64 CPUs and 512GB of RAM.
The engine itself has clearly evolved from its early origin. The storage engine and
query processor for SQL Server 4.20 through SQL Server 6.5 were all based on the
original architectural design that came from the port of the Sybase engine on OS/2.
During these years, remarkable changes and additions were performed to make the
engine run faster and become a reliable, affordable platform for many users looking
to deploy database systems. However, these efforts ultimately reached their limits.
This is why SQL Server 7.0 was so significant. Microsoft attracted some of the
leading developers in the database industry to design and implement a new
architecture for the engine, a foundation to build on for years to come.
The changes and evolution have not just been with the SQL Server engine. In fact, in
my early years of supporting SQL Server, the engine was the primary focus of the
job because the product was pretty much the engine, sqlservr.exe. The development
community back then focused its efforts on Visual Basic or C applications using DB-
Library communicating over named pipes or IPX/SPX. ODBC was just an idea on Kyle
Geiger's computer. Today, it is more common for Microsoft support engineers to deal
with multitiered Web-based applications supporting online business retail
applications with thousands of users all communicating over TCP/IP. The mind-set of
supporting "just the engine" no longer applies. The SQL Server product has
expanded to include a rich framework of data services including Multi-Server Job
Scheduling, Data Replication, XML, Data Transformation Services, and Notification
Services.
As I reflect on the changes to the engine and the core additions that have made it
such a popular product, I think about the common questions I get from customers
and other Microsoft employees: "How can I learn more about what makes SQL Server
so powerful? How can I gain expert knowledge of some of the internals of the SQL
Server engine in order to maximize the usage of the product?" My answer is always,
"Think like a programmer." To be more specific, "Think like a Windows programmer."
SQL Server has grown as a technology and as a force in the database industry. The
number of high-quality books on this product alone is a leading indicator. When I
started at Microsoft in 1993, there were no books on Microsoft SQL Server (and only
one was produced within the next year). Today you can search the Web or go to your
local bookstore and find dozens of books dedicated to this product ranging from
topics on performance tuning to database administration to XML. The development
of this book is a testament to the product's success. The book seeks to expand the
knowledge of important topics about SQL Server in order to broaden the level of
expertise worldwide. With knowledge there is power, and this book is about
empowering SQL Server users, developers, and administrators to get the most out of
the product.
Bob Ward
June 2003
Bob Ward joined Microsoft in 1993 as a support engineer for Microsoft SQL Server. He
is currently an escalation engineer in SQL Server Support.
Preface
I grew up on a farm in America's heartland. From the time I was eight years old until
I left home for college, I lived in a small wood-frame house in rural Oklahoma with
my parents and sisters. I experienced life as a bona fide country boy with all its
attendant wholesomeness, adventure, and isolation.
I came up in a time when running water and electricity were already commonplace,
even in rural Oklahoma, so I have no horror stories to relate about the lack of basic
accoutrements or outhouses or dirt floors. I did, however, milk five cows every day
before I went to school; I baled hay in the summer and cut firewood in the fall; and
my sisters and I helped our parents plant and harvest a large truck garden every
spring and summer. I fed chickens, hogs, and various other creatures, and I
delivered my share of baby calves and slaughtered perhaps more than my share of
feeder steers.
My parents' motivation for moving to the country was never quite clear to me. My
dad's work as a government engineer afforded us a comfortable life in suburbia that
didn't seem to be in need of such a major overhaul. Nevertheless, during my ninth
year on earth, my parents uprooted us and took us to a life that we city dwellers had
never even dreamt of. Prior to that time, I'd never seen a live cow except on
television, nor had I ever ridden a horse. We pulled up stakes and went to the
country, and all that changed.
I still remember my mother sitting us down the day before we moved and telling us
that leaving the city was a chance to learn some wonderful new aspects of life, to
gain perspective, to see things through different eyes than most people ever had the
chance to. She countered our litany of complaints and misgivings with enthusiasm
and reassurance that not only would everything work out, it would actually be for
the best. She believed that every experience was a chance to learn something. Like
Thoreau, she wanted to suck the very marrow out of life. She decided early on that
we would get the most out of our time on the farm, and she did everything in her
power to make sure that happened. I didn't really understand the import of all she
said back then (I did not want to move), but I understand now.
Without a doubt, moving to the country was a wonderful opportunity to learn life's
lessons. They were all right there in nature: in the rivers, in the trees, and in the
cycle of living and dying so evident all around us. For a boy of eight, there was no
better place to learn. Exploring the woods, rafting down the creek, fishing in the
pond, pulling fresh fruit from a tree and eating it unwashed; every day was an
adventure, a time of exploration to learn more about the observable world. I learned
what life had to teach in ways I never could have had we stayed in the city, and I'll
always be thankful for that.
My mother went to great lengths to make sure our education did not suffer as a
result of our being transplanted to the sticks. She started a personal library for each
of us and tried to infuse in us all the same love for reading that she'd had her whole
life. When the county wouldn't open a library anywhere near us, she convinced the
library to start up a summer bookmobile program. Bookmobile Day, as it came to be
known, was a joyous occasion, a time when a rambunctious pack of little kids raced
each other up the quarter-mile jaunt to the old country church where the mobile
library parked. Inside the converted RV, the walls were lined with books, and the air-
conditioned coolness was a wonderful respite from the hot Oklahoma sun. We would
stay until they kicked us out, each time leaving with an armload of books to be
returned on the next Bookmobile Day.
It was during this time that I first began to explore the mysteries of life itself. I
wanted to know where it all came from, how it all worked. I read voraciously, my
eight-year-old mind gobbling up every science book and every electronics book I
could get my hands on. I wanted to know the secret of it all; I wanted to know what
the basic essence of everything was. I wanted to know how life, how the world, how
everything worked. I wanted to understand what literally made the world go 'round.
It was in those days that I came across my first physics books and realized that I was
on to something. I had found a trail that might lead me to the understanding I
sought. I read about gravity and magnetism, about strong and weak particles. I
formed a mind model of how the universe worked. I gained an
understanding, however imperfect it might have been, of how everything
interoperated, how it was designed, and how reality as I understood it came down to
just a handful of fundamental concepts that I could readily see at work in the natural
world around me. I came to know a basic "system" of life, a framework that could
explain pretty much everything that existed. Suddenly, the country, nature, and the
world as I knew it began to make sense.
Since that time, I have approached almost everything I've learned with the same raw
curiosity. I want to know how it works; I want to understand it holistically. I work hard
not to settle for cursory explanations or shallow understanding. I am driven to know
precisely how something is put together and how its component parts interoperate
and interrelate. I believe this is the only real way to master something, to truly grasp
its raison d'être.
That philosophy was the genesis of this book. I wrote it to pass on what I have
learned about how SQL Server and its fundamental technologies are designed, how
they work, and how they interoperate. I wrote it because I enjoy exploring SQL
Server. I have covered how to use and program SQL Server in previous books; I
wrote this book to detail how SQL Server is put together from an architectural
standpoint. By doing so, it's my hope that I can pass on to you the same
wonderment, the same love for technology and for all things SQL Server that I have.
It's my belief that the road to true mastery of SQL Server or any other technology
begins with exploring its design. Knowing how to put a technology to practical use is
certainly important, but that begins with understanding how it works and how it was
intended to be used. Being intimately familiar with how SQL Server is designed will
make you a better SQL Server practitioner. It will take you to heights that otherwise
would have been unreachable.
I said goodbye to that sandy-haired boy running barefoot through the backwoods of
rural Oklahoma long ago. I live in the city now, but the country lives in me still. My
mind often drifts back to moonlit walks in the field, the open sky, the wonderment of
all that was and all that could be. I still recall the smell of fresh alfalfa on the evening
breeze, the unfettered joy of rolling headlong down golden hills, the abandonment of
all of life's cares for that one rapturous moment. I miss the adventure and the
oneness with life that I came to know back then. I grow wistful for the echo of the
crow in the distance; I miss twilight in the forest. I miss the tire swing over the pond
and the taste of fresh corn pulled ripe from the stalk.
As I've said, although I left the country, the country never really left me. The same is
true of my insatiable desire to explore and understand all that I can about the world
around me and the things that pique my interest. Although I've moved around a bit
and changed jobs from time to time, the sense of adventure that drove me to
explore the strange new place I found myself in at the age of eight is with me still. I
have spent my life since those days on one journey after another, exploring one new
world after another in hopes of learning all that I possibly can. I am still on my quest
to learn all that I can about SQL Server. I've been working with the technology since
1990, and still there's plenty left to be discovered. Here's hoping that you'll join me
as I retrace my path through the technology, exploring new places and discovering
sights yet unseen. And here's hoping that you will enjoy the trip as much as I have.
Ken Henderson
March 11, 2003
Acknowledgments
Every book I've written has literally had an entire army of people behind it that
helped make it come to pass. This book is no exception. I will begin by thanking the
many SQL Server MVPs and industry experts who reviewed this book and gave me a
wealth of constructive feedback on it. I'm especially grateful to Greg Linwood, Tony
Rogerson, Allan Mitchell, Darren Green, Aaron Bertrand, T. K. Dinesh, and Wayne
Snyder. This wouldn't be the book that it is without your contributions.
I would also like to thank my friends at Microsoft who reviewed the manuscript and
helped me turn out a better book than I otherwise could have, particularly Bart
Duncan, Robert Dorr, Keith Elmore, Bob Ward, Diane Larsen, Christa Carpentiere,
Dick Dievendorff, and those on the SQL Server development team who reviewed the
text. I'd also like to thank Vern Ameen of Microsoft for first believing in me and the
value I would bring to the company. Thanks, Vern; your confidence in me will not be
forgotten. I'd like to thank Ron Soukup, also formerly of Microsoft and the original
"father" of SQL Server, for his belief in me and his support of my work. Ron, the talk
we had shortly after you left the company made all the difference; thank you again
for it.
From a personal standpoint, I'd like to thank my friends John Cochran and Lorraine
Beaudette, who reviewed several portions of the manuscript and encouraged me to
keep the edge on it when I was sorely tempted to round it off. Your support of my
work and your personal friendship over the years are worth more to me than you will
ever know.
My friend Neil Coy has, as always, been both a source of inspiration and a steadfast
mentor throughout this project. I don't think anyone who ever met Neil wasn't
influenced by him in some positive way, and I am certainly in his debt. This book has
benefited from his influence, as have all my books. There are a few constants in my
life that I've come to depend on. One of them is my close kinship with Neil Coy. Neil,
you're a coder's coder. Thanks for all you taught me all those years ago and for all
you continue to teach.
I would be remiss if I didn't also pause to thank the wonderful people at Addison-
Wesley who helped make this book possible. My editor, Karen Gettman, has been a
real trooper throughout the whole project. Thanks for keeping me on track, Karen.
Emily Frey, Karin Hansen, and Curt Johnson have also been a tremendous help in
making this project come to pass. The production team, Elizabeth Ryan of Addison-
Wesley, Kim Arney Mulcahy, Monica Groth Farrar, and Chrysta Meadowbrooke, have
also been wonderful to work with. I will single out Chrysta here for special
recognition: Chrysta, you're the best copyeditor I've ever worked with; keep up the
excellent work. And I'd like to also thank Carter Shanklin, who originally brought me
to Addison-Wesley, and Michael Slaughter and Mary O'Brien, my former editors, who
are largely responsible for the success we've had thus far.
Lastly, I'd like to thank my wife. Simply put, she is a saint. She has made this and
every other book I've written possible. Her superhuman abilities as a mother help
me get past the natural guilt that comes from being away from my kids for months
at a time while I struggle along in my office cranking out my latest book. She makes
the whip that Capote talks about a little more bearable. And she makes all the late
nights and bleary-eyed mornings somehow worth it.
Introduction
One day I started writing, not knowing that I had chained myself for life to a
noble but merciless master. When God hands you a gift, he hands you a whip;
and the whip is intended solely for self-flagellation. . . . I'm here alone in my
dark madness, all by myself with my deck of cards -- and, of course, the whip
God gave me.
--Truman Capote[1]
[1] Capote, Truman. Music for Chameleons (reprint edition). New York: Vintage Books, 1994, pp. xi and xix.
I wrote this book to get inside SQL Server. I wanted to see what we could learn about
the product and the technologies on which it's based through the use of a freely
downloadable debugger, a few well-placed xprocs, and a lot of tenacity. The book
you're reading is the result of that experiment.
In my two previous SQL Server books, I focused more on the pragmatic aspects of
SQL Server: how to program it and how to make practical use of its many features.
As the title suggests, this book focuses more on the architectural design of the
product. Here, we dwell on the technical underpinnings of the product more than on
how to use it. It's my belief that understanding how the product works will make you
a better SQL Server practitioner. You will use the product better and leverage its
many features more successfully in your work because you will have a deeper
understanding of how those features work and how they were intended to be used.
About Books Online
As with my previous books, one of the design goals of this book was to avoid
needlessly repeating the information in Books Online. This necessitated omitting
certain subjects that you might expect to find in a book like this. For example, I had
originally planned to include an overview chapter that covered the architectural
layout of the product from a high-level point of view. I had also planned to have a
chapter on the architecture of the storage engine. However, on rereading the
coverage of these subjects in Books Online (see the topic SQL Server Architecture
Overview and the subtopics it links) and in other sources, I didn't feel I could
improve on it substantially.
My purpose isn't to fill these pages with information that is already readily available
to you; it is to pick up where the product documentation (and other books and
whitepapers) leave off and take the discussion to the next level. As such, in this book
I assume that you've read through Books Online and that you understand the basic
concepts it relates.
About WinDbg
This book features a good deal of work with WinDbg, Microsoft's freely downloadable
symbolic debugger. You may be wondering why we need a debugger to explore SQL
Server in the first place. After all, we obviously aren't going to "debug" SQL Server,
and we certainly don't have source code for it, so we won't be stepping through code
as is typically the case with a debugger.
The reason we use a debugger is that it gives us the ability to look under the hood of
a running process in ways no other tool can. A debugger lets us see the threads
currently running inside the process, their current call stacks, the state of virtual
memory and heaps within the process, and various other important process-wide
and thread-specific data. It lets us set breakpoints, view registers, and see when
DLLs are loaded by the process or rebased by Windows. It lets us pause execution,
dump memory regions, and save and restore the complete process state. In short, a
debugger provides a kind of "X-ray" facility: a tool that lets us peer inside a process
and see what's really going on within it. In this case, the object of our interest is SQL
Server, but the basic debugging skills you'll learn in this book could be used to
investigate any Win32 application. One of the chief goals of this book is to equip you
with some basic coding and debugging skills so that you can continue the
exploration of SQL Server on your own.
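To make this concrete, here is a minimal sketch of the kind of WinDbg command-window
session we'll be running throughout the book (against a test server, of course; a process
stops cold while the debugger has it broken in). The symbol path points at Microsoft's
public symbol server, and the breakpoint shown is just an arbitrary example, not a step
from any particular exercise.

    windbg -pn sqlservr.exe                   (attach to a running SQL Server by name)
    0:000> .sympath srv*c:\symbols*http://msdl.microsoft.com/download/symbols
    0:000> .reload                            (load public symbols for the mapped modules)
    0:000> lm                                 (list the DLLs and EXEs loaded by the process)
    0:000> ~                                  (list the threads running inside the process)
    0:000> ~*kb                               (dump every thread's current call stack)
    0:000> bp kernel32!CreateFileW            (break the next time the server opens a file)
    0:000> g                                  (let the server run until the breakpoint fires)
    0:000> .detach                            (detach and leave the server running)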
If we are to truly get inside the product and understand how it works, using a
debugger is a must. Trying to understand the internal workings of a technology by
merely reading about it in books or whitepapers is like trying to learn about a foreign
country without actually visiting it; there's no substitute for just going there.
Given that WinDbg is freely downloadable from the Microsoft Web site, has the
features we need, and is relatively easy to use, it seems the obvious choice. A
symbolic debugger, it can use the symbols that ship with SQL Server and that are
publicly available over the Internet, so it's a suitable choice for exploring the inner
workings and architectural design of the product.
About the Fundamentals
You'll notice an emphasis in this book on understanding the technologies behind SQL
Server in order to understand how it works. I spend several chapters going through
the fundamentals of processes and threads, memory management, Windows I/O,
networking, and several other topics. To the uninitiated, these topics may seem only
tangentially related at best. After all, why do you need to know about asynchronous
I/O to understand SQL Server? You need to know something about it and the other
fundamental technologies on which SQL Server is based in order to have a proper
frame of reference and to gain a deep understanding of how the product itself works.
You need to understand the fundamental Windows concepts on which SQL Server, a
complex Windows application, is based for the same reason that a medical student
needs to understand basic biology in order to get into medical school: Without this
fundamental knowledge, you lack the perspective and foundation necessary to
properly root and ground the more advanced concepts you will be attempting to
learn. Humans learn by association: by associating new data with knowledge
already acquired. Without a solid grounding in the fundamentals of Windows
application design, you lack the basic knowledge required to systematically
associate the details of how a complex Windows application such as SQL Server
works.
To be sure, you can gain a superficial idea of how SQL Server works (for example, by
reading that it makes use of scatter-gather I/O) without really understanding what
the details mean. If you really want to master the product, if you really want to
know it literally inside out, you have to have some understanding of the
technologies from which it's composed. Knowing how scatter-gather I/O works will
give you immediate insight into why SQL Server uses it and why it enhances
performance. The same is true for virtual memory, thread synchronization,
networking, and the many other foundational topics we explore in this book. Not
only are they relevant; having a basic understanding of them is essential to truly
understanding SQL Server. Without a basic understanding of the fundamental
technologies on which SQL Server is based (Win32 processes and threads, virtual
memory, asynchronous I/O, COM, Windows networking, and various others), you
have neither the tools nor the frame of reference to truly grasp how the product
works or to master how to use it.
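As a small taste of what I mean, here's a bare-bones C++ sketch (not one of the book's
listings) of a scatter-gather read: a single ReadFileScatter call fills several separate,
page-aligned buffers from one contiguous stretch of a file, which is essentially what a
server does when it pulls a run of database pages into noncontiguous buffer pool pages.
Error handling is pared down to keep the idea visible.

    #define _WIN32_WINNT 0x0400   // ReadFileScatter requires the NT 4.0+ declarations
    #include <windows.h>
    #include <stdio.h>

    #define NUMPAGES 4            // number of separate destination buffers

    int main(int argc, char* argv[])
    {
        HANDLE hFile;
        BYTE* pages[NUMPAGES];
        FILE_SEGMENT_ELEMENT segs[NUMPAGES + 1];
        OVERLAPPED ov = {0};
        SYSTEM_INFO si;
        DWORD cbRead = 0;
        int i;

        if (argc < 2) { printf("Usage: sgread <file>\n"); return 1; }

        GetSystemInfo(&si);       // dwPageSize gives the required buffer size and alignment

        // Scatter-gather I/O requires a handle opened for unbuffered, overlapped I/O
        hFile = CreateFile(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING,
                           FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, NULL);
        if (INVALID_HANDLE_VALUE == hFile) return 1;

        // VirtualAlloc hands back page-aligned memory, which ReadFileScatter insists on;
        // the segment array must be terminated by a NULL entry
        ZeroMemory(segs, sizeof(segs));
        for (i = 0; i < NUMPAGES; i++) {
            pages[i] = (BYTE*)VirtualAlloc(NULL, si.dwPageSize, MEM_COMMIT, PAGE_READWRITE);
            segs[i].Buffer = PtrToPtr64(pages[i]);
        }

        // One call scatters a contiguous region of the file across all of the buffers
        ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
        if (!ReadFileScatter(hFile, segs, NUMPAGES * si.dwPageSize, NULL, &ov) &&
            ERROR_IO_PENDING != GetLastError()) return 1;
        GetOverlappedResult(hFile, &ov, &cbRead, TRUE);   // wait for the read to complete

        printf("Read %u bytes into %d separate buffers\n", cbRead, NUMPAGES);

        for (i = 0; i < NUMPAGES; i++) VirtualFree(pages[i], 0, MEM_RELEASE);
        CloseHandle(ov.hEvent);
        CloseHandle(hFile);
        return 0;
    }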
I fully realize that not every reader will be interested in the Windows technologies
and APIs behind SQL Server's functionality. That's okay. If the nitty-gritty details of
the Win32 APIs, how to use them, and how applications such as SQL Server typically
employ them don't interest you, feel free to skip the Foundations section (Part I) of
this book. There's still plenty of useful information in the rest of the book, and you
don't have to understand every detail of every API to benefit from it.
About the "How-To"
I've tried very hard to provide the architectural details behind how the various
components of SQL Server work without neglecting the discussion of how to apply
them in practical use. I am still a coder at heart, and there is still plenty of "how-to"
information in this book. At last count, there were some 900 source code files slated
for inclusion on the book's CD. That's more than either of my last two books, both of
which were very focused on putting SQL Server to practical use, as I've said.
In terms of the central topic of all three of my SQL Server books (namely, getting
the most out of the product), I've attempted to elevate the discussion to an
exploration of the architectural design behind the product without leaving behind my
core reader base. Regardless of whether you came to this book expecting the
mother lode of code and practical use information that you typically find in my books
or you agree with me that understanding how the product works is key to using it
effectively, I hope you won't be disappointed with what you find here.
About the Breadth of Topics
You will notice that this book covers a wide range of product features and
technologies. It is not limited merely to the functionality provided within
sqlservr.exe; it tries to cover the entire product. It's my opinion that a book that
purports to discuss the internal workings and architectural design of a complex
product such as SQL Server should cover the whole product, not just the
functionality that resides within the core executable or product features that have
been in place for many years. The world of SQL Server is a lot bigger than just a
single executable. Prior to the 7.0 release of the product, I suppose you could get
away with just covering the functionality provided by the main executable, but that's
no longer the case and hasn't been for years. The product has matured and has
broadened substantially with each new release.
This book isn't titled The Guru's Guide to sqlservr.exe; it's about all of SQL Server
and how its many component pieces work and fit together. So, you'll see coverage in
this book of what might seem like fringe SQL Server technologies such as Full Text
Search, Notification Services, and SQLXML. We'll explore replication, DTS, and a host
of other SQL Server technologies that are not implemented in the main SQL Server
executable. Of necessity, I can't cover every feature in the product or even as many
as I'd like. The book would take ten years to write and would be 5,000 pages long.
However, I've tried to strike a balance between covering topics in the depth that
people have come to expect from my books and exploring a sufficient breadth of
features and technologies such that you can get a good feel for the overall design
and architecture of SQL Server as a product.
About C++
I'm fully aware that many SQL Server people are more comfortable in Visual Basic
than in any C or C++ dialect. I used C and C++ to cover Windows programming
fundamentals and elsewhere in the book for a couple of reasons.
First, the Win32 API itself is written in C. Although whole books have been written on
accessing the Win32 API from VB, it has been my experience that this ranges from
clunky to outright impossible in some circumstances, depending on the API function
in question. The Win32 API was originally written in C, and therefore C and C++ are
the purest and most direct methods of accessing it. Any other approach, be it from
VB, Delphi, C#, or some other language or tool, adds a layer of indirection that can
cloud the discussion.
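For instance, the following few lines (a throwaway sketch, not code from the book's CD)
call the Win32 API directly from C++; there is no wrapper, type library, or declaration
file standing between the code and the operating system:

    #include <windows.h>
    #include <stdio.h>

    int main()
    {
        SYSTEM_INFO si;
        MEMORYSTATUS ms;

        // Two direct Win32 API calls -- exactly the functions the OS exports
        GetSystemInfo(&si);
        ms.dwLength = sizeof(ms);
        GlobalMemoryStatus(&ms);

        printf("Processors:                 %u\n", si.dwNumberOfProcessors);
        printf("System page size:           %u bytes\n", si.dwPageSize);
        printf("Physical memory:            %u MB\n", ms.dwTotalPhys / (1024 * 1024));
        printf("User virtual address space: %u MB\n", ms.dwTotalVirtual / (1024 * 1024));
        return 0;
    }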
Second, I used C++ because I happen to believe that the language is not that hard
to learn and that most VB people are more than capable of developing basic C++
programming skills and effectively reading C++ code, regardless of whether they
believe that themselves. There seems to be a natural aversion to, or fear of, all things
C++ among those in the VB community. It's my belief that these concerns are
largely unfounded and that they needlessly limit people's ability to really understand
Windows and complex Windows apps such as SQL Server. My advice: Even if you
don't know C++ and feel you're out of your depth when reading through C++ code,
don't be afraid of it. Work through the examples in this book, follow the instructions I
provide, and see where your exploration leads you. Pick up an introductory book on
the language if it suits you. You may find that the language isn't nearly as hard to
get around in as you thought, and you may benefit, perhaps immensely, from the
experience.
All that said, C++ is far from the only language used in this book. I know that no one
language is used by everyone, so I've tried to keep the book balanced in terms of the
language tools used. A good deal of the example code used throughout the book is
some flavor of Visual Basic: VB6, VBScript, or VB.NET. In the ODSOLE chapter, for
example, I show you how to build COM objects in VB6. In the SQLXML chapter, I
show you how to access SQLXML using VBScript. And in the Notification Services
chapter, I show you how to implement a subscription management application using
VB.NET. There's also a healthy helping of C#, Delphi, CMD files, and even a
discussion or two of assembly language. And, of course, there's a wealth of Transact-
SQL code throughout the book. Regardless of your preferred language(s), you should
find code of interest to you in this book.
About Visual C++ 6.0
Some of you may question the decision to use Visual C++ 6.0 for most of the C++
code examples in this book. I chose VC6 over Visual Studio .NET for two reasons: (1)
having been around considerably longer, VC6 is much more pervasive, and (2) Visual
Studio .NET (both the 2002 and the 2003 releases) will automatically upgrade VC6
projects when they are first opened. So, regardless of whether you have Visual
Studio 6 or Visual Studio .NET, the C++ projects on the CD accompanying this book
should open just fine for you. You should be able to compile and run them without
incident. Also, when teaching basic Windows concepts such as thread
synchronization and memory management, I do not use any version-specific
features, so there's no advantage to using Visual Studio .NET over VC6.
About the Terms and Knowledge Measures
Readers of my previous books may notice a significant amount of "supplementary"
material in several of the chapters. You'll likely notice the term definitions that
precede some of the chapter discussions and the knowledge measures at the end of
each discussion. Don't worry: I still hate filler material and have gone to great
lengths to avoid unnecessary screen shots, summaries, and other devices commonly
used to lengthen technical books.
Though I personally don't like putting together term definition tables, knowledge
measures, and the like and have avoided them in previous books, a growing number
of readers have asked for additions such as these in order to make my books more
suitable for classroom use. Several of my previous books are regularly used in
classroom settings even though, admittedly, those books do not lend themselves
well to it. Therefore, I've finally decided to try to do something about that. If you do
not find these sections particularly useful, feel free to skip over them. All of the data
contained in the term definitions is also in the chapter text; you won't miss anything
by skipping them. That said, you may find that having a basic understanding of
some of the terms and concepts before we get into them in depth may be useful to
you. It really comes down to your individual preferences.
I have intentionally not included the answers to the questions in the knowledge
measure sections in order to get a feel for how much they are used. Again, this is an
adaptation intended to make the book more usable in classroom scenarios. I may or
may not continue it in future books, depending on how useful it proves to be. If you
want the answers to the knowledge measure questions, e-mail me at
[email protected], and I'll provide them.
About SQL Server Versions
This book targets the latest release of SQL Server currently available, SQL Server
2000. Throughout the book, when you see a reference to SQL Server, you can
assume that it definitely applies to SQL Server 2000 and probably to other releases
as well. I rarely mention SQL Server's version number because I've found it to be a
little cumbersome. That said, when in doubt, assume what you read in this book
applies to SQL Server 2000.
About Master Programming
With the sheer volume of code and code-related discussions in this book, it might
appear to some that I'm trying to turn you into a master programmer rather than a
master SQL Server practitioner. Nothing could be further from the truth. In order to
really address that concern, let's first define what a master programmer is.
To begin with, a master programmer is someone who likely codes for a living. You
cannot develop expert-level coding skills and keep them sharp by merely studying
other people's code or reading programming books. You have to get in there and get
your hands dirty, and you have to keep doing it. Technology changes and software
engineering evolves quickly enough that there's simply no substitute for coding
every day.
Second, a master programmer is someone who doesn't just know how to churn out
source code. A person I worked with once suggested that the defining characteristic
of an expert coder is great typing skill! I laughed out loud at that assertion because
being an expert coder has nothing to do with typing; I know expert coders who
don't type well at all. That notion reminds me of what Truman Capote said when
asked about Jack Kerouac's work: "That isn't writing at all, it's typing."[2] Just as good
writing amounts to a lot more than typing, so does expert-level coding. Cranking out
reams of source code does not a master programmer make. In fact, there's a paucity
and efficacy about the code of the programming masters that often accomplishes an
astonishing amount of work with a surprisingly small amount of code. The idea isn't
to write lots of code; it is to write good code. It's a question of quality versus
quantity.
Sixth, a master programmer is well read. He knows who Martin Fowler is. He reads
Kent Beck, and he's well versed in Erich Gamma's work. He reads both technology-
specific books and those related to software engineering as a discipline. He
reads Steve McConnell, and he also reads Donald Knuth. He knows who Jon Bentley
is, and he also knows Brian Kernighan's work. He is well versed in Grady Booch's
work and also reads Charles Petzold. In a day and age in which technology and the
engineering required to master and put it to practical use seem to evolve at the
speed of light, one can't read too much or stay too current with the latest
developments in the industry. A master programmer knows this and dedicates
himself to a lifelong course of continuing education.
So, with this in mind, I hope it's obvious that I'm not trying to turn anyone into a
master programmer. This book isn't about software development; it's about SQL
Server. To the extent that I delve into subjects seemingly more related to coding
than to SQL Server, there is a method to the madness: I am trying to help develop
basic coding and debugging skills in those who may lack them so that they can
better understand how and why SQL Server is designed the way it is and so that
they can continue the exploration of SQL Server on their own. The whole thrust of
this book is about gaining as deep an understanding of SQL Server as possible so
that we can put it to better use in the real world.
About the Author
Ken Henderson is a husband and father who lives in Dallas, Texas. He is the author of
seven previous books and a long-time software developer, consultant, and
conference speaker. He is also an avid Mavericks fan and spends his spare time
playing games with his kids, watching sports, and playing music. Henderson's Web
site is http://www.khen.com, and he may be reached via e-mail at [email protected].
Part I: Foundations
Chapter 1. Overview
That which is borne of loneliness and from the heart cannot be defended
against the judgment of a committee of sycophants.
--Raymond Chandler[1]
I'll begin by touching on each of the major subjects covered in this book and give a
brief overview of each chapter. This will give you a high-level view of what the book
itself is about and what we'll be talking about in the chapters ahead.
Chapter Overview
Chapter 2: Windows Fundamentals
In this chapter, we'll talk about how Windows works from an architectural standpoint.
We'll discuss the various components of Windows and how Windows applications are
constructed. You'll learn about DLLs, virtual memory, CPU nuances, and a variety of
other Windows elements that affect how complex Windows applications such as SQL
Server behave.
Chapter 3: Processes and Threads
This chapter covers Windows' process and threading architecture. You'll learn what a
process is, how it differs from a thread, and how SQL Server behaves itself as a
process. You'll learn about Windows' scheduler and how threads are scheduled for
execution, and you'll explore thread synchronization in depth. You'll work through
several C++ applications and xprocs that demonstrate how processes and threads
work under Windows and how this applies to SQL Server. Starting in this chapter and
continuing throughout the Foundations section, we'll build a series of applications
that search text files for strings. We'll create several different versions of a text
search application, each one employing different Windows foundational
concepts, so that you can readily see how the various Windows technologies are
typically used in real applications. We'll pick up where this chapter leaves off when
we discuss SQL Server's User Mode Scheduler (UMS) later in the book (Chapter 10).
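In case you've never written multithreaded Win32 code, here's a throwaway sketch (not
one of the chapter's listings) of the basic moves the chapter builds on: creating a few
worker threads and synchronizing with them by waiting on their handles.

    #include <windows.h>
    #include <stdio.h>

    // Every Win32 thread begins life in an entry-point function like this one
    DWORD WINAPI Worker(LPVOID lpParam)
    {
        printf("Worker %d running on thread ID %u\n",
               (int)(INT_PTR)lpParam, GetCurrentThreadId());
        Sleep(1000);                                   // stand in for real work
        return 0;
    }

    int main()
    {
        HANDLE hThreads[4];
        int i;

        for (i = 0; i < 4; i++)
            hThreads[i] = CreateThread(NULL, 0, Worker, (LPVOID)(INT_PTR)i, 0, NULL);

        // Block until every worker has exited -- the simplest form of thread synchronization
        WaitForMultipleObjects(4, hThreads, TRUE, INFINITE);

        for (i = 0; i < 4; i++)
            CloseHandle(hThreads[i]);

        printf("All workers finished\n");
        return 0;
    }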
Chapter 4: Memory Fundamentals
In this chapter, you'll learn about Windows' memory management. You'll learn about
the difference between virtual memory and physical memory, as well as the
difference between virtual memory and heaps. You'll learn how Windows apportions
memory to applications and how it translates virtual memory addresses into physical
memory addresses. We'll continue the text file search theme started in the previous
chapter and build several applications that illustrate how Windows applications can
make use of the memory management facilities provided by the operating system.
You'll gain insight into how SQL Server makes use of some of these facilities by
building apps that feature them. We'll build on the concepts taught in this chapter
when we get to SQL Server Memory Management (Chapter 11) later in the book.
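As a preview of the reserve-versus-commit distinction the chapter spends a good deal of
time on, here's a minimal sketch (not from the book's CD) that walks a region of virtual
memory through its life cycle with VirtualAlloc and VirtualFree:

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main()
    {
        // Reserve 1MB of virtual address space -- no physical storage is consumed yet
        char* pRegion = (char*)VirtualAlloc(NULL, 1024 * 1024, MEM_RESERVE, PAGE_NOACCESS);
        printf("Reserved 1MB of address space at 0x%p\n", pRegion);

        // Commit just the first page (4KB on x86) so it is backed by storage and usable
        char* pPage = (char*)VirtualAlloc(pRegion, 4096, MEM_COMMIT, PAGE_READWRITE);
        strcpy(pPage, "Hello from a committed page");
        printf("%s\n", pPage);

        // Release the entire region -- committed pages are decommitted automatically
        VirtualFree(pRegion, 0, MEM_RELEASE);
        return 0;
    }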
Chapter 5: I/O Fundamentals
We'll take a tour of Windows' foundational I/O facilities in this chapter. We'll continue
the text file search theme and build applications that make use of synchronous I/O,
asynchronous I/O, nonbuffered file I/O, scatter-gather I/O, and file I/O using memory-
mapped files. You'll gain insight into how SQL Server makes use of Windows' I/O
facilities by seeing them at work in real applications. In SQL Server as a Server
(Chapter 9), we'll discuss how SQL Server makes use of these concepts.
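To give you an early feel for the last of those, here's a pared-down sketch (again, not
one of the chapter's utilities) that maps a file into memory and scans it through an
ordinary pointer; the operating system's memory manager performs the actual I/O on demand.

    #include <windows.h>
    #include <stdio.h>

    int main(int argc, char* argv[])
    {
        HANDLE hFile, hMapping;
        char* pData;
        DWORD cb, lines = 0, i;

        if (argc < 2) { printf("Usage: mmscan <file>\n"); return 1; }

        hFile = CreateFile(argv[1], GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (INVALID_HANDLE_VALUE == hFile) return 1;

        // Map the whole file into the process's virtual address space
        hMapping = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        pData = (char*)MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);

        // The file's bytes are now readable through a plain pointer; the memory
        // manager pages them in as they are touched
        cb = GetFileSize(hFile, NULL);
        for (i = 0; i < cb; i++)
            if ('\n' == pData[i]) lines++;
        printf("%s contains %u lines\n", argv[1], lines);

        UnmapViewOfFile(pData);
        CloseHandle(hMapping);
        CloseHandle(hFile);
        return 0;
    }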
Chapter 6: Networking
Chapter 7: COM
We'll explore the basics of Microsoft's Component Object Model (COM) technology
and discuss how COM is used by SQL Server. You'll learn about threading models,
interfaces, marshaling, reference counting, and many other COM concepts. We'll talk
about how Windows applications typically make use of COM, and we'll talk about
some of the ways in which SQL Server uses it. This chapter will provide the
background you'll need to work through the ODSOLE chapter later in the book.
Chapter 8: XML
In this chapter, you'll learn about the eXtensible Markup Language (XML). You'll learn
how to construct your own XML documents and how HTML and XML fundamentally
differ. You'll learn about attributes, elements, and schemas, and you'll learn to apply
XML style sheets to transform your data. This chapter will provide the background
and foundational information you'll need to work through Chapter 18 (SQLXML) later
in the book.
Chapter 9: SQL Server as a Server
We'll discuss how SQL Server behaves as a Windows server application in this
chapter. We'll pull together several of the concepts discussed earlier in the book and
show how they're employed by SQL Server. For example, we'll show how SQL Server
uses the Windows networking APIs to listen for new connections and schedule them
for processing via UMS. We'll talk about the DLLs imported by SQL Server and what
they're used for, and we'll talk about where SQL Server fits in the general taxonomy
of Windows applications.
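To give you a rough idea of what "listening for new connections" looks like at the API
level, here's a stripped-down Winsock sketch. The port number is arbitrary, and this is
not how SQL Server's network libraries are actually structured; Chapter 6 and Chapter 9
cover the real mechanics.

    #include <winsock2.h>
    #include <stdio.h>
    #include <string.h>
    #pragma comment(lib, "ws2_32.lib")

    int main()
    {
        WSADATA wsa;
        SOCKET listener, client;
        sockaddr_in addr;
        const char* msg = "hello from the listener\r\n";

        WSAStartup(MAKEWORD(2, 2), &wsa);

        // Create a TCP socket and bind it to a well-known port, as any server does
        listener = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        ZeroMemory(&addr, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = INADDR_ANY;
        addr.sin_port = htons(14330);              // arbitrary port for this sketch
        bind(listener, (sockaddr*)&addr, sizeof(addr));
        listen(listener, SOMAXCONN);
        printf("Listening on port 14330...\n");

        // Accept a single connection, greet the client, and shut down
        client = accept(listener, NULL, NULL);
        send(client, msg, (int)strlen(msg), 0);
        closesocket(client);
        closesocket(listener);
        WSACleanup();
        return 0;
    }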
Chapter 12: Query Processor
In this chapter, we'll document how the SQL Server query processor works internally
and how it processes and optimizes queries. You'll learn about the four major stages
of query optimization and how each one affects the overall optimization process.
You'll learn how indexes, statistics, and constraints are used by the optimizer to
generate efficient execution plans, and you'll learn how to structure queries for
maximum performance.
Chapter 13: Transactions
The Transactions chapter examines SQL Server transactions in depth. We'll write
several Transact-SQL queries that make use of transactions, then explore how they
work and how SQL Server's transaction management constructs work in general.
You'll learn how to avoid common errors and how to properly use SQL Server's
transaction management facilities in your own applications.
Chapter 14: Cursors
In this chapter, we'll explore how SQL Server cursors work. You'll learn about the
different types of cursors, how to use them, and how to avoid common mistakes.
We'll talk about how transactions and cursors interrelate and how you can avoid
common concurrency and performance issues caused by cursor misuse.
Chapter 15: ODSOLE
We'll explore how you can make use of COM objects from within Transact-SQL via
SQL Server's Open Data Services Object Linking and Embedding (ODSOLE) facility.
You'll learn about the sp_OA extended procedures, how to use them, and when not to
use them. You'll build several interesting COM objects for use within Transact-SQL
including a bevy of financial functions, array functions, and string manipulation
functions. You'll likely find much of the sample code in this chapter to be very useful.
Chapter 16: Full-Text Search
In this chapter, we'll explore SQL Server's Full-Text Search (FTS) facility. You'll learn
how it works and how it is designed. You'll explore FTS queries and how to make use
of FTS in your own code.
Chapter 17: Server Federations
You'll learn about distributed partitioned views and how they relate to server
federations in this chapter. You'll dig into a few execution plans and delve into how
partitioned views affect performance and scalability.
Chapter 18: SQLXML
In this chapter, you'll explore the many aspects of SQL Server's XML technology.
You'll learn about FOR XML, OPENXML(), sp_xml_preparedocument, updategrams,
templates, the SQLXML managed classes, and so on. We'll build on the XML
fundamentals we explored in Chapter 8 and explore how SQL Server exposes a
powerful collection of XML-enabled features and how to make use of those features
in your own applications.
Chapter 19: Notification Services
In this chapter, you'll get to explore SQL Server's Notification Services technology.
You'll learn about how it is designed and how the typical Notification Services
application is architected. You'll see how easy the Notification Services platform
makes building and deploying notification-oriented applications that are fully
functional, very powerful, and scalable. You'll finish up the chapter by building a
Notification Services application of your own and a subscription management
application using VB.NET.
Chapter 20: Data Transformation Services
You'll explore SQL Server's Data Transformation Services (DTS) in this chapter. We'll
talk about how DTS is designed, the fundamental components that make up DTS
packages, and the ways you can leverage them to build data transformation
applications that are powerful and flexible. We'll build a number of packages that
explore the many features and facilities of DTS. You'll finish up with a project that
teaches you how to access and control DTS packages via Automation.
Chapter 21: Snapshot Replication
In this chapter, you'll explore snapshot replication. You'll learn how data typically
moves between a publisher and a subscriber in snapshot replication and how you
can track exactly what is moved between them. You'll learn about the distribution
database and how the Snapshot Agent facilitates the replication process.
Chapter 22: Transactional Replication
As with the previous chapter, this chapter is dedicated to replication. In this case,
we'll discuss transactional replication and the ways it differs from snapshot
replication and merge replication. You'll learn how the Log Reader Agent reads the
transaction log and sends changes to subscribers by way of the distribution
database. You'll learn about immediate and queued updating subscribers, and you'll
see how they work internally.
Chapter 23: Merge Replication
As the title suggests, you'll explore merge replication in this chapter. You'll set up
several subscriptions, then watch as they participate in a generic merge replication
scenario. You'll learn about generations and conflict resolvers, and you'll see how
merge provides a flexible (yet complex) data replication facility.
In this chapter, you'll learn how to track down undocumented features. Rather than
provide a smorgasbord of undocumented features as I've done in previous books,
this chapter shows how to find these hidden goodies yourself. You'll learn how to use
Profiler to find undocumented features and commands, as well as how to search the
text of system procedures for undocumented DBCC commands and trace flags. With
the skills and data you acquire in this chapter, you should be able to dig up
undocumented features and commands on your own.
In this chapter, I'll introduce you to a utility implemented via a collection of DTS
packages that can collect diagnostic data from SQL Server. Using this tool, you can
concurrently collect Profiler traces, blocking script output, SQLDIAG reports, Perfmon
logs, event logs, and a number of other useful diagnostics. This tool demonstrates
several useful DTS techniques including how to automate a DTS package from Visual
Basic, how to modularize a DTS application by breaking it up into separate
packages, and how to use a DTS package as a workflow manager for other tasks.
Chapter Pairs
It's probably pretty self-evident, but I guess I should point out my intent to construct
much of the book using "chapter pairs." A chapter pair is a pair of chapters where
the second chapter builds directly on the information shared in the first one. You'll
see these throughout the book. For example, rather than assume that the typical
DBA has an intimate understanding of the Windows scheduler (and can, therefore,
explain why SQL Server implements its own scheduler), I provide a chapter that
explains in detail how the Windows scheduler works. The book is aimed at the
database professional who may or may not have an in-depth understanding of
Windows internals. I felt an obligation to explain some of these foundational
concepts rather than flippantly assuming most of my readers already understood
them well. I could have constantly referred to external books and other sources, but I
felt that would be taking the easy way out, and I also felt I had something to add to
the discussion of Windows application fundamentals that has not been said before.
I build on this foundational information throughout the rest of the book as I delve
into the various parts of SQL Server. For example, in the User Mode Scheduler
chapter, I leverage the discussion of the Windows scheduler from earlier in the book.
The same is true of the Memory Fundamentals chapter and the SQL Server Memory
Management chapter: The first chapter lays the groundwork for the second one.
Ditto for the Networking chapter and the SQL Server as a Server chapter: one plays
off the other. Each of the main chapters in the Foundations section sets up one or
more chapters later in the book that cover specific SQL Server technologies or
components. Because of this, you will likely want to read through the Foundations
section before you dive into the rest of the book. If it doesn't make complete sense
the first time through or its applicability isn't immediately apparent, don't worry; it
will become clearer as you work through the remainder of the book.
If you already have a good understanding of Windows internals, COM, XML, and so
on, you may be able to skip these foundational chapters. As I said in the
Introduction, I'm well aware that such details may not be for everyone. If you have
doubts about the value of some of this information to you or your work, my
suggestion would be to start with the SQL Server-specific coverage (e.g., Chapter
9, SQL Server as a Server) and see whether you understand it reasonably well
without first reading through the earlier material (e.g., Chapter 3, Processes and
Threads). If so, good for you. If not, you have a thorough tutorial on many of the
foundational concepts you'll need right here in this book. Either way you go about it,
you should find all you need in this book to learn a good deal about SQL Server
architecture and internals.
About the Code
With the sheer volume of code in this book and the emphasis on exploring SQL
Server as a program, it might at first appear that you'd need to be a seasoned coder
to really understand the book. That's not my intent. I wanted to take a fresh
approach to the discussion of SQL Server's architecture and the way that its various
components are constructed and fit together, so this book explores the product from
the perspective of the professional developer. I think this is appropriate given that
SQL Server is, after all, an application. It was obviously built by developers. There is
no greater insight into the way an application works, and no higher level of
mastery, than to understand the application in the same way the people who built it
do. By exploring SQL Server as an application, we attempt to get inside the heads of
the people who built it, to understand what they were thinking when they designed
it. Of course, there are limits to how far we can go: we didn't write the app and
we don't have source code for it. But there is a great deal we can learn just by
approaching SQL Server as we would any other complex Windows application. By
using tools such as WinDbg, Perfmon, and others, we can look under the hood, so to
speak, and gain a deep understanding of much of the architecture and internals of
the product. In my opinion, there is no better way to master SQL Server or any other
third-party application.
So, do you need to be a coder to master SQL Server or glean everything this book
has to teach? No, but it certainly wouldn't hurt. If you don't consider yourself a
coder, my suggestion would be not to fret. Read through the text and examples in
this book, retrace my steps where you can, and work at your own pace. The fact that
the book is an in-depth study of a particular program doesn't mean that it's only for
programmers. It's my hope that many readers who don't consider themselves coders
will discover their inner programmer and, in taking their coding skill to the next
level, come to understand SQL Server in ways that would otherwise not be possible
and as they never have before. And it's my hope that they will then be sufficiently
equipped to continue the exploration of SQL Server on their own. Rather than just
divulging a mother lode of technical data as is so often the case with even the best
technical books, I wanted to teach readers how to investigate complex Windows
applications such as SQL Server. The investigatory skills you pick up in this book
should be applicable regardless of the product or program you're studying and
should allow you to continue your exploration of SQL Server for years to come.
Chapter 2. Windows Fundamentals
You have two ears and one mouth. If you use them in those proportions, we'll
get along just fine.
- Neil Coy
The programming interface slated for use with the original version of 32-bit Windows
was not the Win32 API. It was the OS/2 Presentation Manager API. Midway through
the development of what was to become the first version of Windows NT, Windows
3.0 was released, and adoption of Windows as a development platform exploded.
Microsoft then decided that the 32-bit programming API should be compatible with
the 16-bit API in order to make porting apps from Windows 3.x easier. This was the
genesis of the Win32 API and is why it may seem a bit incongruous or uneven at
times: it was designed to be as compatible as possible with the old 16-bit Windows
API.
Given that SQL Server is a complex Windows application, it of course makes heavy
use of the Win32 API. In the first section of this book, we will explore many of the
Win32 API functions that SQL Server uses. You'll learn how they work and how to use
them, and you'll gain some insight into how SQL Server makes use of them. You'll
see the relationship between certain key SQL Server features and the Win32 API
functions they rely upon.
You can download the Platform SDK, which contains the C header files and libraries,
as well as the online documentation for the Win32 API, directly from the Microsoft
Web site. The Platform SDK also ships with several of Microsoft's development tools
and is included with the MSDN Library.
User Mode vs. Kernel Mode
In order to keep misbehaving application code from destabilizing the system,
Windows uses two processor modes: user mode and kernel mode. User application
code runs in user mode; OS code and device drivers run in kernel mode. Kernel
mode has a higher hardware privilege level than user mode and provides access to
all system memory and all CPU instructions. By running at a higher privilege than
application software, Windows can keep a misbehaving application from directly
destabilizing the system.
The Intel x86 family of processors actually supports four operating modes (also
known as rings). These are numbered 0 through 3, and each is isolated from the
others by the hardware: a crash in a less privileged ring will not destabilize a more
privileged one. Because it was originally designed to support chips such as the
Compaq Alpha and the Silicon Graphics MIPS that provide only two processor modes,
Windows uses only two of these modes (rings 0 and 3, for kernel and user modes,
respectively). Now that these chips are no longer supported, it would probably make
sense for Windows to use at least one additional operating mode on the Intel x86
processor family. Doing so would allow device drivers, for example, to run at a lower
privilege level than the OS and would prevent an errant driver from being able to
bring down the entire system.
Processes and Threads
An instance of a running application is known as a process. Actually, that's a
misnomer. Processes don't actually run�threads do. Every process has at least one
thread (the main thread) but can have many. Each thread represents an independent
execution mechanism. Any code that runs within an application runs via a thread.
Each process is allotted its own virtual memory address space. All threads within the
process share this virtual memory space. Multiple threads that modify the same
resource must synchronize access to the resource in order to prevent erratic
behavior and possible access violations. A process that correctly serializes access to
resources shared by multiple threads is said to be thread-safe.
Each thread in a process gets its own set of volatile registers. A volatile register is
the software equivalent of a CPU register. In order to allow a thread to maintain a
context that is independent of other threads, each thread gets its own set of volatile
(software) registers that are used to save and restore hardware registers. These
volatile registers are copied to/from the CPU registers every time the thread is
scheduled/unscheduled to run by Windows. The process by which this happens is
known as a context switch.
Each 32-bit process's 4GB virtual address space is divided into two partitions: the
user mode partition and the kernel mode partition. By default, each of these is sized
at 2GB, though you can change this split through BOOT.INI switches on the Windows
NT family of operating systems.
(Windows NT, Windows 2000, Windows XP, and Windows Server 2003 are members
of the Windows NT family; Windows 9x and Windows ME are not.)
Although each process receives its own virtual memory address space, OS code and
device driver code share a single private address space. Each virtual memory page
is associated with a particular processor mode. In order for the page to be accessed,
the processor must be in the required mode. This means that user applications
cannot access kernel mode virtual memory directly; the system must switch into
kernel mode in order for kernel mode memory to be accessible.
We'll talk more about virtual memory and how Windows manages it in Chapter 4,
Memory Fundamentals. For now, just understand that virtual memory does not
necessarily correlate to physical memory. It is a service provided by Windows that
allows applications (and Windows itself) to allocate and use more primary storage
(memory) than physically exists in a machine without having to handle paging data
to and from secondary storage (disk drives).
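To make the reserve-versus-commit distinction concrete before we get to Chapter 4, here's a minimal sketch that reserves a region of address space and then commits a portion of it via the Win32 VirtualAlloc API. The sizes are arbitrary; the point is that reserving costs no physical storage, while committing does.

#include <windows.h>
#include <stdio.h>

int main()
{
    // Reserve 1MB of address space. No physical storage (RAM or paging
    // file) is committed yet; the range is merely set aside.
    SIZE_T regionSize = 1024 * 1024;
    void *region = VirtualAlloc(NULL, regionSize, MEM_RESERVE, PAGE_NOACCESS);
    if (!region) return 1;

    // Commit the first 64KB so that it is backed by physical storage
    // and can actually be read and written.
    void *pages = VirtualAlloc(region, 64 * 1024, MEM_COMMIT, PAGE_READWRITE);
    if (pages)
    {
        ZeroMemory(pages, 64 * 1024);   // Touching the memory is now legal.
        printf("Committed 64KB at %p\n", pages);
    }

    // Release the entire reservation (committed pages included).
    VirtualFree(region, 0, MEM_RELEASE);
    return 0;
}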
Subsystems
Windows ships with three environment subsystems: Win32, OS/2, and POSIX. Each of
these provides a different environment or personality for the OS. Of these, Win32 is
preeminent because it's not optional (it must always be running, regardless of the
environment subsystem chosen and regardless of whether anyone is logged in)
and because it provides the most direct and most complete access to Windows itself.
The other environment subsystems aren't used much and don't provide the same
level of functionality as the Win32 subsystem. Applications that run on Windows are
compiled and linked to run under a particular environment subsystem. Obviously,
most of these, including SQL Server, are Win32 applications.
The Win32 environment subsystem can be broken down into the following major
components.
Win32k.sys, the kernel mode device driver, includes two facilities: (1) the
window manager, the facility responsible for collecting input from the keyboard
and mouse, for managing screen output, and for passing messages to
applications; and (2) the Graphics Device Interface (GDI), the facility
responsible for output to graphics devices.
Windows applications interact with the OS kernel via the subsystem DLLs. These
DLLs hide the actual native OS calls (which are undocumented) from the application.
The purpose of these DLLs is to translate Win32 API calls into OS service calls. These
calls may or may not involve sending a message to the environment subsystem
process hosting the application.
Given my statement that user mode code cannot access kernel mode memory, you
may be wondering how user mode code can invoke code and access data that is
obviously in the kernel; after all, the operating system's core functionality is
implemented in the kernel, hence the name. The way this works is that user mode
applications make Win32 API calls to functions exported from subsystem DLLs. These
DLLs then make calls to undocumented native API functions in NTDLL.DLL. The
functions in NTDLL.DLL then invoke the platform-specific instructions to switch the
processor chip into kernel mode and invoke the appropriate code in the OS kernel.
This code may reside in the kernel executable, NTOSKRNL.EXE, or in the kernel mode
device driver, Win32k.sys. (Purists may quibble that it's a little more complicated
than that, but this gives you a basic picture of how a user mode app interacts with
the Windows kernel.)
Dynamic-Link Libraries
A DLL is a binary file that serves as a shared library of routines that can be
dynamically loaded and unloaded at runtime by applications that use the routines.
Runtime libraries and class libraries for language products such as Visual C++ and
Delphi can take the form of DLLs. The user mode portion of the Win32 API is
ensconced in DLLs such as Kernel32.DLL and User32.DLL. One advantage of a DLL
over a static library is that multiple applications can share a single DLL. Windows
ensures that only one copy of a DLL's code is mapped into memory regardless of the
number of applications that reference it.
DLLs make the functions they contain visible to the outside world by exporting them.
A DLL's export table can be viewed using external tools such as the Depends tool
that comes with Visual Studio or the dumpbin utility that comes with several
Microsoft products. A DLL routine can be exported by name or ordinal or both. DLLs
(and executables) also have import tables: tables listing the DLLs they depend on
and the functions they statically import. I'll talk more about static importing in just a
moment.
A process can load DLLs through one of two means: either implicitly when it starts or
explicitly via a call to the LoadLibrary(Ex) API. The manner in which it loads a
particular DLL is determined by the way in which its executable references the
functions exported (made visible to the outside world) by the DLL. There are two
ways these references can occur. When an executable is compiled and linked, it can
statically import the functions exported by a DLL by importing the DLL's .LIB file.
(.LIB files are not universally required by all compilers and linkers for static linking
but are most prominent with C and C++ products; e.g., neither Visual Basic nor
Delphi requires or uses .LIB files.) A static import causes the DLL to load automatically
when the executable is started. If the DLL can't be located, the executable won't
start. Static importing is usually the way Windows apps load DLLs. It requires less
code and is managed mostly by the OS. All Windows apps statically import at least
Kernel32.DLL, and most also import User32.DLL because these DLLs contain the
lion's share of the Win32 API.
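Here's a minimal sketch of both sides of the arrangement. The DLL name (mylib.dll) and its exported function are hypothetical; the first fragment is the DLL's source, and the second shows an application loading it explicitly at runtime rather than via static import.

// mylib.cpp - built as a DLL; exports one function by name.
extern "C" __declspec(dllexport) int __stdcall AddNumbers(int a, int b)
{
    return a + b;
}

// consumer.cpp - loads the DLL explicitly at runtime.
#include <windows.h>
#include <stdio.h>

typedef int (__stdcall *ADDNUMBERSPROC)(int, int);

int main()
{
    // LoadLibrary maps the DLL into this process's address space
    // (or bumps its load count if it's already mapped).
    HMODULE hMod = LoadLibraryA("mylib.dll");
    if (!hMod)
    {
        printf("LoadLibrary failed: %lu\n", GetLastError());
        return 1;
    }

    // Look up the exported function by name in the DLL's export table.
    ADDNUMBERSPROC pfnAdd = (ADDNUMBERSPROC)GetProcAddress(hMod, "AddNumbers");
    if (pfnAdd)
        printf("2 + 3 = %d\n", pfnAdd(2, 3));

    FreeLibrary(hMod);   // Decrement the DLL's load count.
    return 0;
}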
So, when a DLL is loaded into a process's address space, it becomes code the
process can call. Each DLL has a default load address within the 4GB process
address space that is specified when it is linked. If nothing occupies the address
range where the DLL was configured to load in the calling process, it will load at that
address. If something else is already there, it will have to be "rebased," which
involves reading the entire image and updating all fix-ups, debugging information,
checksums, and time-stamp values to use a different base address. Because this
amounts to updating pages contained in the DLL image, Windows must load each
page that must be modified into virtual memory and make the changes there.
As I said earlier and as Chapter 4 details, the normal mode of operation for
allocations within a process's address space is that those allocations are backed by
the paging file (or physical memory). An exception is made in the case of binary
code, though, because it is normally read-only and copying it to the paging file is
wasteful. Instead, the EXE or DLL file itself "backs" (provides physical storage for)
the range of virtual memory addresses within a process's address space set aside for
the executable or DLL. So, when the process makes a call to a part of the executable
or DLL that is not in physical memory, it will not go to the paging file to get that
page of the EXE or DLL file. Instead, it will go to the appropriate binary file and load
the page into physical memory directly from it. In that sense, EXE and DLL files
become read-only extensions of the paging file.
Normally, only one copy of the pages that make up a DLL or EXE file is maintained in
memory regardless of the number of processes using it. An exception to this is when
a process makes a change to a global or static variable in the DLL or EXE. When this
occurs, Windows makes a copy of the page that's local to the process, carries out
the change, then alters the process to reference the new version of the page going
forward. The mechanism by which this happens is known as copy-on-write memory.
The ability to use a DLL or EXE as the physical storage for the virtual memory
address range it occupies relies on Windows' memory-mapped file facility, which can
actually be used with any type of file, not just binaries. A memory-mapped file
serves as the physical storage for the virtual memory it occupies. Rather than
copying the file to the system paging file, Windows uses the file as though it were a
paging file itself and automatically saves/loads pages to/from this file as the virtual
memory into which it has been mapped is referenced. Virtual memory is just that:
virtual. It doesn't represent actual memory until physical storage is committed to it.
Windows' memory-mapped file facility provides applications the ability to treat files
as though they were memory, with Windows handling all the I/O behind the scenes.
Windows itself uses this facility when it accesses EXEs and DLLs. This is all explained
in great detail in Chapter 4.
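Here's a minimal sketch of the memory-mapped file APIs in action; the file name is a placeholder, and error handling is pared down to the essentials. Chapter 4 covers the facility in depth.

#include <windows.h>
#include <stdio.h>

int main()
{
    // Open an existing (nonempty) file for read access; the path is hypothetical.
    HANDLE hFile = CreateFileA("C:\\temp\\sample.dat", GENERIC_READ,
                               FILE_SHARE_READ, NULL, OPEN_EXISTING,
                               FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE) return 1;

    // Create a file mapping object; the file itself will back the memory.
    HANDLE hMap = CreateFileMappingA(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMap)
    {
        // Map a view of the file into this process's virtual address space.
        const unsigned char *view =
            (const unsigned char *)MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
        if (view)
        {
            // The file's contents can now be read as ordinary memory;
            // Windows pages the data in from the file on demand.
            printf("First byte of file: 0x%02X\n", view[0]);
            UnmapViewOfFile(view);
        }
        CloseHandle(hMap);
    }
    CloseHandle(hFile);
    return 0;
}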
Tools
In this section, I'll touch on a few of the tools we'll use throughout the book to
explore SQL Server. These tools come from a variety of sources, and you certainly
don't need all of them to work through the book. I mention them in various places in
order to point out their usefulness and to give you some tips on how to leverage
them to solve a particular problem or examine a particular piece of data. That said,
you can work through most of the examples in the book with little more than the
tools that come with Windows and Microsoft's freely downloadable WinDbg
debugger.
TList
This tool comes on the Windows Support Tools CD. You can use it to list the running
processes and to list the modules loaded within each process. A variety of other
process-specific information can be returned as well.
Pviewer
Pviewer also comes with the Windows Support Tools. It allows you to view
information about running processes and threads. You can also kill processes and
change process priority classes. One really handy aspect of it is that you can use it
to view processes on your local machine as well as those across a network on a
remote machine.
Pview
This tool comes with the Platform SDK and is essentially the same tool as Pviewer. It
offers the same functionality and displays the same type of information, though the
features it sports vary based on the release of the SDK and how recent a build you
have.
Perfmon
Windows' Performance Monitor is probably the single most valuable tool included
with the product for looking under the hood to see what's happening within the OS.
Perfmon (or Sysmon, as it's now known) provides the ability to monitor several key
statistics about running processes and threads, memory and CPU utilization, disk
use, and a bevy of other interesting objects and diagnostics.
You use Perfmon by adding counters to a data collection, then allowing the tool to
sample them over time as the system runs. You can view these counters as lines on
a chart, as literal values, or as bars on a histogram. You can save logs created by
Perfmon as binary files, as delimited text files, and as SQL Server tables. Throughout
the book, I'll mention Perfmon counters that are useful in exploring a particular
technology or subsystem within Windows or SQL Server.
WinDbg
A version of WinDbg comes with Windows, and you can also download it from the
Microsoft public Web site (as of this writing, you can find it at
http://www.microsoft.com/ddk/debugging/default.asp). For working through the
examples in this book, I suggest you get WinDbg from the Microsoft Web site so that
you can be sure to have the latest version.
NOTE: Microsoft also includes a command line debugger, cdb.exe, in its Debugging
Tools for Windows package that you may find preferable to WinDbg if you prefer
command lines to GUIs. This debugger uses the same debugger "engine" as WinDbg,
and the debugger commands presented in this book will work equally well with it.
Probably the single most important thing to remember when using WinDbg or any
symbolic debugger is that, in order to successfully debug much of anything, you
must have debugging symbols and the debugger's symbol path must be set so that
it can find them. Debugging symbols are generated by your compiler/linker product.
For Microsoft products, the standard symbol file format is the Program Database
(PDB) format and is automatically produced for debug builds in Visual C++ and for
any executable in VB whose Create Symbolic Debug Info project property is set.
Typically, if you have debug symbols, you will find a PDB file corresponding to your
EXE or DLL name in the same folder with the executable.
An exception to this is SQL Server. SQL Server's symbol files are located in the exe
and dll subfolders under the main SQL Server Binn folder. (The symbols for
sqlservr.exe, the main SQL Server executable, are in the exe folder; the symbols for
the key DLLs it uses are in the dll folder.) These symbols are retail debug
symbols: symbol files that have been stripped of many things essential to real-world
debugging such as parameter types, local variables, and the like. These symbols
aren't suitable for true debugging, but they're just right for our purposes. Retail
symbols work fine for looking under the hood of an application and exploring a
running process.
As I've mentioned, in order to successfully debug anything with WinDbg, you'll need
to correctly set the symbol path. You can set the symbol path in WinDbg by pressing
Ctrl+S or by choosing File | Symbol File Path from the menu system.
The retail symbols for much of Windows (and for many other products) are available
over the Internet via Microsoft's symbols server. You don't have to download
these; you simply point the debugger at the symbol server and it takes care of
downloading (and caching) symbols files as it needs them. As of this writing, you can
use http://msdl.microsoft.com/download/symbols in your symbol path to reference
Microsoft's publicly available symbol server over the Internet. Microsoft has a great
Web page explaining exactly how this works and how to use it at
http://www.microsoft.com/ddk/debugging/symbols.asp. Read this page and set your
symbol path accordingly in WinDbg. Currently, my WinDbg symbol path is set to
SRV*c:\temp\symbols*http://msdl.microsoft.com/download/symbols.
When debugging a component or program to which you have source code, it's also
important to set the WinDbg source path correctly. You can do this by pressing
Ctrl+P or by selecting File | Source File Path in the WinDbg menu system. Setting the
source path allows WinDbg to find the source code to your app while debugging so
that you can step through it, set breakpoints, and so on.
NOTE: I should point out here that there's no guarantee that SQL Server will
continue to ship symbol files of any kind. For now, they're included with the product,
but they may not be at some point in the future. If that ever happens, hopefully
Microsoft will make them available via its public symbols server as it has for some of
its other products.
Recap
Windows is a sophisticated, robust operating system composed of several
subsystems. Its Win32 API provides a mechanism that allows user applications to
(indirectly) make calls into the system's kernel code.
Windows provides an architecture that protects the system from being destabilized
by errant user applications. It provides a virtual memory facility to alleviate the need
for apps to implement their own virtual memory managers. And it provides memory-
mapped files and copy-on-write support for making advanced memory management
easy and efficient.
Knowledge Measure
1. How much virtual memory does the user mode portion of a process have by
default?
2. How many operating modes (rings) does the Intel x86 processor family
support?
3. True or false: No code actually runs via a process itself; a thread is required to
execute an application's code.
5. True or false: You must download symbolic debug information from Microsoft
and copy it to a local symbols server in order to use WinDbg to debug a
Microsoft product.
6. True or false: Windows uses all the operating modes supported by whatever
processor chip it is running on.
10. What's the single most important thing you must do in order to ensure a
symbolic debugger is able to debug a process?
Chapter 3. Processes and Threads
It is necessary to the happiness of man that he be mentally faithful to himself.
Infidelity does not consist in believing, or in disbelieving; it consists in
professing to believe what he does not believe.
- Thomas Paine[1]
[1] Paine, Thomas. The Age of Reason, ed. Philip S. Foner. New York: Citadel Press, 1974, p. 50.
In this chapter, we'll explore processes and threads within Windows. We'll discuss
how processes and threads differ and how they're similar, and we'll talk about the
unique role each plays within the Windows operating system.
We'll also explore in detail how the Windows thread scheduler works and how
threads are scheduled on and off of processors. We'll talk about thread
synchronization and how multithreaded apps such as SQL Server use
synchronization objects to serialize access to shared resources and ensure thread
safety.
Processes
Entry-point function: the function address at which a thread begins
executing. For the main thread, this is the entry point of the application (often a
function named main() or something similar); for all other threads, the entry-
point function is specified when the thread is created and is basically a simple
callback routine.
Overview
Each process consists mainly of two components, a process kernel object and a
virtual address space. The operating system uses the kernel object to manage the
process and to provide a means for applications to interact with the process. The
virtual address space contains the executable module's code and data, the code and
data of the DLLs it loads, and dynamic memory allocations such as thread stacks and
heap allocations.
Because they provide the virtual address space for the application, processes
require far more system resources than threads. Creating a process's virtual
address space consumes a significant amount of system resources, and the
tracking and management of this space is handled by the operating system's
memory manager using virtual address descriptors (VADs), overhead that threads
don't have to concern themselves with.
I should mention here that the executable code (and data) that's contained within the
process's virtual address space isn't actually loaded until needed. It is mapped within
the virtual address space, which merely reserves certain address ranges for it within
the space, while the physical storage for the region remains the EXE or DLL file itself.
It is loaded or unloaded as needed by the operating system in page-sized chunks (4K
for Win32 on x86). Every executable or DLL file mapped into a process's address
space is assigned a unique instance handle.
Beyond the kernel object and virtual address space, a process also encapsulates:
An access token that defines the process's security context and identifies the
user, privileges, and security groups associated with the process
Several basic diagnostic tools report process-level information such as the process
ID, image name, priority class, handle count, elapsed time, command line, and the
percentage of CPU, user, and privileged time consumed. These tools include
Perfmon, Pstat, Pviewer, Qslice, TaskMgr, and TList.
Perfmon is an indispensable tool when working with processes. Table 3.3 lists some of
the more important process-related Perfmon counters.
Counter                      Description
Process: % User Time         The time the process has spent in user mode
Process: % Processor Time    The total processor time for the process; should be the sum of
                             the % User Time and % Privileged Time counters and may exceed
                             100% on a multiprocessor machine
Process: Elapsed Time        The number of seconds that have passed since the process was
                             created
Process Internals
Each thread runs within the context of the process that owns it. The process provides
the resources and environment in which the thread executes its code. By virtue of
Windows' process separation, a thread can't access the address space of other
processes without a shared memory section or the use of the ReadProcessMemory or
WriteProcessMemory functions.
Once the kernel object is created, Windows creates the process's virtual address
space and maps the code and data for the executable module (and any other
required modules) into this space.
Note that CreateProcess can return TRUE before the process has been completely
initialized, even before the operating system loader has located all the DLLs required
by the executable. If a required module can't be located or fails to initialize, Windows
terminates the new process. Because CreateProcess may have returned TRUE before
this was known, the creating process is not aware of the problem and may attempt to
erroneously use the process or thread handle returned by CreateProcess. One way to
avoid this is to check the process status via the GetExitCodeProcess API function
before attempting to use its process or main thread handle. GetExitCodeProcess will
return FALSE if you pass it an invalid handle. Also, you can wrap calls to CloseHandle
in exception blocks such that they trap the INVALID_HANDLE exception that's raised
when CloseHandle is passed a bad handle.
Once a process is created, the system automatically creates its primary or main
thread. Every process has at least one thread represented by an executive thread
(ETHREAD) block in system memory.
After you've created a new process using CreateProcess, you should close the handle
to its main thread in order to allow the corresponding thread object to be freed by the
system when it is no longer needed. Closing this handle doesn't terminate the thread;
it just releases the caller's reference to it. There is no advantage to keeping a handle
to the spawned process's primary thread unless you intend to manipulate that thread
directly through API calls. You can pass the handle of the process itself into a wait
function if you want to wait on the process to finish executing before proceeding.
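Here's a minimal sketch of that sequence: create a process, sanity-check it with GetExitCodeProcess, close the primary thread handle right away, and wait on the process handle. Notepad stands in for whatever executable you'd actually launch.

#include <windows.h>
#include <stdio.h>

int main()
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    char cmdLine[] = "notepad.exe";   // CreateProcess may modify this buffer.

    if (!CreateProcessA(NULL, cmdLine, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi))
    {
        printf("CreateProcess failed: %lu\n", GetLastError());
        return 1;
    }

    // Verify the new process is still alive; it may have failed to finish
    // initializing even though CreateProcess returned TRUE.
    DWORD exitCode;
    if (GetExitCodeProcess(pi.hProcess, &exitCode) && exitCode == STILL_ACTIVE)
        printf("Process %lu is running\n", pi.dwProcessId);

    // We don't need to manipulate the primary thread, so release our
    // reference to its kernel object right away.
    CloseHandle(pi.hThread);

    // Wait for the process to exit, then clean up the process handle.
    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hProcess);
    return 0;
}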
During the initialization of a new process, the instruction pointer for the process's
primary thread is set to an undocumented and unexported function called
BaseProcessStart. This is where all new processes begin executing in Win32.
The PEB, EPROCESS block, and related structures aren't the only internal structures
Windows uses to track processes. The Win32 subsystem process (CSRSS) maintains a
parallel structure for each process that executes a Win32 program. Additionally, the
kernel mode piece of the Win32 subsystem (Win32k.sys) maintains a per-process
data structure that is created the first time a thread in the process calls a Win32
USER or GDI function that is implemented in kernel mode.
Process Termination
1. The primary thread's entry-point function returns. This is the preferred way for
a process to shut down. When this happens:
a. Any C++ objects created by the thread will have their destructors called
and the runtime library (RTL) will be allowed to run its cleanup code.
b. Windows will immediately release the memory used by the thread's stack.
c. The process's exit code will be set to the return value of the thread's
entry-point.
2. A thread in the process calls ExitProcess. You want to avoid this because it
prevents C++ object destructors and RTL cleanup code from being called.
4. All the threads in the process terminate on their own. This is fairly rare.
Once terminated, a process will leak absolutely nothing. Windows ensures that all
resources allocated by the process are freed when it exits.
GetExitCodeProcess will return the exit value of the process rather than
STILL_ACTIVE.
SetErrorMode
Each process has a set of flags associated with it that tells the system how the
process should respond to serious errors such as unhandled exceptions, media
failures, and file open failures. You can set these flags via the SetErrorMode API call.
Table 3.4 summarizes the options available to you.
SQL Server calls this function on startup, and, as a rule, you should never call it from
code that runs within the SQL Server process (e.g., from an extended procedure or in-
process COM object) as this could interfere with the server's ability to handle errors.
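For code that runs in its own process, a call might look like the following minimal sketch; SEM_FAILCRITICALERRORS and SEM_NOOPENFILEERRORBOX are two of the standard flags. Again, don't do this from code loaded into SQL Server's process.

#include <windows.h>

int main()
{
    // Ask Windows not to display message boxes for critical errors or
    // failed file-open operations; the process sees error codes instead.
    UINT oldMode = SetErrorMode(SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX);

    // ... work that might touch missing media or files ...

    SetErrorMode(oldMode);   // Restore the previous error mode.
    return 0;
}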
In these exercises, we'll monitor various aspects of the SQL Server process. We'll
check out its CPU usage, the threads it creates, and the modules it loads. You can use
these same techniques to monitor other types of Win32 processes.
In this exercise, we'll monitor a CPU spike in the SQL Server process. You'll learn how
to monitor processor use via Task Manager. To complete the exercise, follow these
steps.
1. Start SQL Server if it isn't already started (this should not be a production
system) and connect to it with Query Analyzer. Ideally, you should be the only
user on the system.
3. Click the Processes tab, then click the CPU column to sort the list so that
processes with high CPU usage appear at the top of the list.
4. Minimize Task Manager, then switch back to Query Analyzer and run the
following query:
declare @var int
set @var=1
while @var<100000 begin
    set @var=@var+1
end
5. Switch back to Task Manager while the query runs. In the Processes list, you
should see that sqlservr.exe has moved near the top of the list.
The effect will be more dramatic on uniprocessor machines and, of course, on slower
processors, but you should see some type of spike in the CPU use by the SQL Server
process.
If you'd like to see a graphical depiction of this spike, run the query again and switch
to the Performance tab in Task Manager. On a uniprocessor machine, you will see a
chart like the one in Figure 3.1.
1. Stop and restart SQL Server. This should not be a production server and you
should be its only user for the duration of this test.
4. In the instances list, find sqlservr and click it. (This may have #1 or #2 or some
other number appended to it if you have multiple instances of SQL Server
running; make sure you know which instance you select.)
6. Click the Add button. You should now see a line chart in Perfmon indicating the
current number of threads running within the SQL Server process. Make a note
of how many threads are currently running (the Last field will tell you the exact
count from the last sample interval). Figure 3.2 illustrates how to add the
counter.
7. Open a command window, change to a folder into which you can copy some
files and run some tests, and copy the files STRESS.CMD and STRESS.SQL from
the CH03 folder on the CD accompanying this book. STRESS.CMD runs a
specified T-SQL script or scripts using a specified number of connections (it calls
osql.exe). You can use it to simulate multiple users connecting to your server
and running a given query or queries. Run it without parameters to see usage
help. STRESS.SQL simply dumps the pubs..authors table, then pauses for 15
seconds via the T-SQL WAITFOR DELAY. Start STRESS.CMD with this command
line: STRESS STRESS.SQL 15 N normal Y YourServerName\Instance
You should see 15 command windows open, each of them running STRESS.SQL.
8. Now switch back to Perfmon. You should see a noticeable increase in the
number of threads within the SQL Server process. This is because your new
connections have forced the server to create new worker threads to service
them. Note that there's no strict ratio between worker threads and connections.
SQL Server can often effectively service thousands of connections with only a
few hundred worker threads.
9. About 15 seconds after you started STRESS.CMD, you should see the command
windows it opened automatically close. However, you won't see SQL Server's
worker thread count dip immediately. The server will not immediately destroy
its newly created worker threads even though they're currently sitting idle
because it may need them when the next wave of work comes rolling in.
Caching idle worker threads allows SQL Server to provide stable performance in
environments where the number of connections and queries being sent into the
server varies significantly over time. At the same time, the server doesn't
continue to use system resources that aren't needed: after 15 minutes, SQL
Server will time out an idle worker thread and destroy it.
Figure 3.1. A spike in CPU use by SQL Server can spike the CPU use
for the entire machine.
Note that a process's thread count can also be viewed by numerous other tools; you
don't have to use Perfmon. One tool I'm particularly fond of is Pview (from the
Platform SDK tools). Figure 3.3 illustrates viewing the thread count for SQL Server
using Pview.
Exercise 3.3 Listing Modules and Processes within SQL Server
In this last exercise, you'll use a custom extended procedure, xp_modlist, to list the
currently loaded modules and processes under SQL Server. You'll see all DLLs within
the SQL Server process space, as well as any executables it has spawned. While
Windows does not provide an official mechanism for establishing parent-child
relationships between processes, there are undocumented functions that a routine
can use to get this information. To run xp_modlist under SQL Server, follow these
steps.
sp_addextendedproc 'xp_modlist','xp_sysinfo.dll'
4. Run the following command to instantiate a child process under SQL Server:
xp_cmdshell 'notepad'
Note: You will not see Notepad started unless you are running SQL Server as a
console application. Don't worry about that; we're only starting it in order to
have a subprocess that never completes.
5. Your xp_cmdshell call will appear to hang. Open a new Query Analyzer window
and run the following command: xp_modlist
You should see a list of processes that looks something like this:
Note the three different ParentProcessId values. When you call xp_cmdshell, it
calls the command interpreter, Cmd.exe, which in turn calls Notepad.exe, the
command you passed into xp_cmdshell. By calling your process using cmd.exe,
xp_cmdshell allows it to use command shell services such as piping and
redirection.
6. If you are running SQL Server as a service, you may have to cycle your server
machine in order to get rid of the Notepad instance you just started, depending
on the user account you're logged in with and its permissions.
Although we tend to think of processes as objects that carry out the work of an
application, a process is really just a context provider for threads. In Windows,
threads carry out the work of the application; processes provide an environment in
which to do that work.
SQL Server is a process that can run as either a service or a console mode
application. As with all other Windows processes, it consists of threads, a virtual
address space, kernel objects, and so forth.
6. Of the 4GB set aside for a 32-bit process's address space, how much is reserved
for the kernel and how much is reserved for user mode space by default?
7. True or false: It's possible to create a new process that does not have a main
thread, but this is not recommended since the app won't be able to do
anything.
8. What is the largest size an application's user mode space can be set to under
32-bit Windows?
9. What file contains the kernel mode piece of the Win32 subsystem?
10. True or false: It is safe to call SetErrorMode from an extended stored procedure
so long as you wrap the call in an exception handling block.
11. What is the signal status of a process that is terminated using Task Manager?
12. What BOOT.INI switch is used to configure Windows to provide applications with
a larger-than-normal user mode address space?
13. True or false: Because of the way that Windows protects processes from each
other, it is impossible for one process to alter memory allocated within another
process's address space.
14. True or false: CreateProcess can return TRUE even when the new process
cannot be created because DLLs on which it depends cannot be located.
Threads
A thread is the facility by which program code is executed in Windows. It is the only
Windows object capable of running code. A thread is always created within the
context of a process and lives its entire life within that process.
A thread is the only means by which a process carries out work. Without threads,
processes can't do anything.
Thread: the Windows object by which application code is executed. Every
thread runs within the context of an application or the operating system.
Fiber: a lightweight, thread-like user mode construct that runs within the
context of a thread. Fiber mode imposes a number of restrictions on SQL
Server that you may find unacceptable (e.g., you can't use SQLXML when
running fiber mode). For this reason, fiber mode is generally not
recommended, but it is sometimes useful for achieving greater scalability by
reducing context switching.
TLS: thread local storage, a mechanism by which threads can store data
that is unique to each thread. TLS values are accessed by index. You use the
Win32 API function TlsAlloc to allocate a TLS index and the functions
TlsSetValue and TlsGetValue to set and get values, respectively.
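Here's a minimal sketch of the TLS APIs: one index is allocated for the process, and each thread stores and retrieves its own private value through it. The worker function and the values passed to it are arbitrary, and CreateThread is used only to keep the sketch short (_beginthreadex, discussed later in this chapter, is the better choice in CRT-based code).

#include <windows.h>
#include <stdio.h>

DWORD g_tlsIndex;   // One index shared by all threads; each thread gets its own slot.

DWORD WINAPI Worker(LPVOID param)
{
    // Store a per-thread value in this thread's TLS slot.
    TlsSetValue(g_tlsIndex, param);

    // ... later, possibly deep in other functions on this thread ...
    int myValue = (int)(INT_PTR)TlsGetValue(g_tlsIndex);
    printf("Thread %lu sees %d\n", GetCurrentThreadId(), myValue);
    return 0;
}

int main()
{
    g_tlsIndex = TlsAlloc();            // Reserve a TLS index for the process.
    if (g_tlsIndex == TLS_OUT_OF_INDEXES) return 1;

    HANDLE threads[2];
    threads[0] = CreateThread(NULL, 0, Worker, (LPVOID)(INT_PTR)1, 0, NULL);
    threads[1] = CreateThread(NULL, 0, Worker, (LPVOID)(INT_PTR)2, 0, NULL);

    WaitForMultipleObjects(2, threads, TRUE, INFINITE);
    CloseHandle(threads[0]);
    CloseHandle(threads[1]);
    TlsFree(g_tlsIndex);                // Release the index.
    return 0;
}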
TIP: Note that you can pass a pseudohandle into a Win32 API function that requires
a process or thread handle. This causes the function to perform its action on the
calling process or thread.
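For example, the following minimal sketch adjusts the calling thread's priority and the calling process's priority class without ever obtaining a real handle to either.

#include <windows.h>

int main()
{
    // GetCurrentThread returns a pseudohandle (a sentinel value, not a real
    // handle), but APIs that take a thread handle accept it and operate on
    // the calling thread.
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);

    // GetCurrentProcess works the same way for APIs that take a process handle.
    SetPriorityClass(GetCurrentProcess(), NORMAL_PRIORITY_CLASS);
    return 0;
}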
Table 3.6. Basic Thread Diagnostic Tools and the Information They
Return
The tools covered are Perfmon, Pstat, Pviewer, Qslice, and TList. The information
they return includes the thread ID, start address, number of context switches,
thread state, last error, the percentage of CPU, user, and privileged time consumed,
and the reason for the thread's last wait state.
Counter                      Description
Thread: % Privileged Time    The percentage of time the thread has spent in kernel mode
Thread: % User Time          The percentage of time the thread has spent in user mode
Thread: % Processor Time     The percentage of time the thread has used the CPU; should be
                             the sum of the % Privileged Time and % User Time counters
Thread: Thread State         The current thread state, an integer from 0 to 7 (see Table 3.11
                             on page 64)
Thread Internals
A set of volatile CPU registers that represent the state of the processor.
A TLS area.
A unique identifier known as a thread ID. (As with processes, this is also
internally called a client ID; process and thread IDs never overlap because
they are generated from the same namespace.)
An optional security context. (By default, a thread inherits the security context
of its parent process. Multithreaded apps will sometimes obtain a separate
access token for individual threads in order to impersonate the security context
of the clients they serve.)
The TLS area, registers, and thread stacks are collectively known as a thread's
context. Data about them is stored in the thread's CONTEXT structure. CONTEXT is
the only processor-dependent structure in the Win32 API. The structure itself is
contained in the thread's kernel object.
Of the items stored in the CONTEXT structure, the thread's instruction pointer and
stack pointer registers are probably the two most important. When a thread's kernel
object is initialized, the stack pointer is set to the address of the location on the
thread's stack where the entry-point function's address was stored. The instruction pointer is set to
the undocumented BaseThreadStart function.
All threads in a process share the process's virtual memory address space and
handle table. By default, they also inherit the process's security access token,
though they can obtain their own for impersonation if necessary, as mentioned
above. And, as I mentioned in the Processes section, the system prevents threads
from accessing the address space of other processes without a shared memory
section or the use of the ReadProcessMemory or WriteProcessMemory API functions.
As with processes, the Win32 subsystem (CSRSS) maintains a parallel structure for
each thread created in a Win32 process. And for threads that have called a Win32
subsystem USER or GDI function, Win32k.sys, the kernel mode portion of the Win32
subsystem, maintains a parallel data structure (a W32THREAD struct) that the
ETHREAD block references.
When a process finishes its work and is ready to shut down, it should signal any
worker threads it has created that they need to return from their entry-point
functions, and then simply return from its own main thread's entry-point function. Returning from
the primary thread's entry-point function ensures that:
Any objects created by the thread will have their destructors called so that
they can be destroyed properly.
Windows will immediately release the memory used by the thread's stack.
The process's exit code will be set to the entry-point function's return value.
The system will decrement the usage count of the process kernel object for the
thread's owning process.
Given that threads use fewer system resources than processes, it makes sense to try
to solve your programming problems using threads rather than processes when
possible. The overhead of managing the virtual address space is not insignificant;
creating too many processes on a machine can run the system out of virtual
memory and bring performance to a standstill.
That said, don't assume that every problem is better solved with multiple threads
rather than multiple processes. Some designs are better implemented using
separate processes. My advice is to educate yourself as to the trade-offs with each
approach and weigh them against one another before deciding which way to go.
Beyond the obvious efficiencies with respect to system resources, multithreading an
application also allows its interface to be simplified. If certain tasks that you
normally trigger by clicking menu options or buttons can be performed automatically
in the background by separate threads, you may be able to eliminate those user
interface elements altogether. Keep in mind, though, that in most apps, a single
thread should handle all user interface updates. This is because, unlike other types
of objects, the window handles to user interface components such as buttons and
text boxes are actually owned by individual threads, not the parent process.
Synchronizing multiple user interface threads such that the app displays and works
correctly is usually more trouble than it's worth.
Multithreading also allows an application to scale. If your machine has multiple CPUs,
you can truly run multiple tasks simultaneously by creating multiple threads, each of
which might get its own CPU.
For all its benefits, keep in mind that multithreading is not the best way to solve
every problem. Performing tasks over multiple threads introduces complexities into
an app that would otherwise not be there; there are always trade-offs. Some
developers believe the first thing you do with a complex task is break it up into
multiple threads. This is a questionable design practice that will get you into trouble
as an application builder if you follow it.
When CreateThread is called, the system creates a thread kernel object to manage
the thread. This object's initial usage count is 2. The thread's kernel object will not
be destroyed until the thread terminates and the handle returned by CreateThread is
closed.
Creating a thread initializes its stack. This stack is allocated from the process's
virtual address space since threads don't have an address space of their own. Once
the stack is allocated, the system writes two values to the upper end of it (thread
stacks always build from high memory addresses downward): the entry-point
function address that was supplied to CreateThread and the value of the user-
defined parameter that was passed in along with it.
NOTE: Usually, developers aren't concerned with the thread ID of a newly created
thread; they only need the object handle in order to interact with the thread.
Therefore, CreateThread allows you to pass NULL for its final parameter, lpThreadId,
and this is a common practice among experienced NT developers. This works fine on
the Windows NT family of operating systems (which includes Windows NT, Windows
2000, and all subsequent versions of Windows) but will cause an access violation on
Windows 9x. If you want your code to run on Win9x, you must pass a value for this
parameter whether you actually intend to do anything with it or not.
Thread Termination
You can terminate a thread using one of the following four methods.
1. Return from the thread's entry-point function. This is the cleanest and best way
to shut down a thread. It ensures that C++ object destructors are called, RTL
cleanup code runs, buffers are flushed to disk, and so on.
2. Have the thread commit suicide by calling ExitThread. This approach prevents
C++ object destructors and RTL cleanup code from running, so you should
avoid it.
3. Call TerminateThread from the same or another process. You shouldn't use
TerminateThread to kill a thread for three reasons.
a. The thread doesn't receive any notification that it is dying. Naturally, this
can present problems with running cleanup code.
c. The memory in which the thread's stack is stored is not freed up until the
process terminates. TerminateThread was implemented this way so that
other threads that might still be running and accessing variables that
were on the terminated thread's stack can continue to run unaffected.
Windows frees all user object handles owned by the thread. As I said earlier,
usually a process owns the objects created by its threads. However, there are
two exceptions: windows and hooks. When a thread terminates, any window or
hook handles it has open are freed.
Windows changes the thread's exit code from STILL_ACTIVE to the return value
of the entry-point function or the value passed to ExitThread or
TerminateThread.
Windows considers the process terminated if the thread is the last active
thread in the process.
NOTE: For obvious reasons, many of the thread-specific Win32 API functions are not
available from Visual C++ if you link with the single-threaded runtime library. You'll
get "unresolved external" errors for functions such as CreateThread and
_beginthreadex if you attempt to call these functions while linking with the single-
threaded RTL. The single-threaded version of the RTL is the default when you build a
non-MFC application, but the multithreaded RTL is the default when you build an
MFC app or use the extended stored procedure wizard to build a SQL Server
extended procedure. If you're going to write code that is to run in a multithreaded
environment you must link with the multithreaded version of the RTL because, in
addition to multithreading-specific functions not even being present in the single-
threaded RTL, several base RTL functions are not thread-safe in the single-threaded
version of the library. Examples include errno, _strerror, tmpnam, strtok, and many,
many others. And, in case you're wondering, there are separate single-threaded and
multithreaded RTLs for C because the original RTL was created around 1970 before
threads were available on operating systems.
Even though CreateThread and ExitThread are the standard Win32 API functions for
creating and ending threads, Visual C++ developers should consider using
_beginthreadex and _endthreadex instead. There are three main reasons for this.
Given that CreateThread is the only way to create a new thread in Windows,
_beginthreadex does ultimately call it. However, it makes a few changes en route
that make your code more robust and protect it from thread-safety issues in the RTL.
It pulls this off by substituting its own thread entry-point function for yours and
storing the address of your function, along with the user-defined parameter you
originally supplied, in the tiddata structure it allocates from the RTL heap for your
thread. It then calls CreateThread and passes its thread entry-point function
(_threadstartex) as the entry-point function and the address of the tiddata block as
the user-defined parameter. This _threadstartex function sets up the SEH frame I
mentioned earlier, does some other initialization work, then retrieves the address of
your entry-point function from the tiddata block passed into it and calls it, passing it
the user-defined parameter you originally passed to _beginthreadex. The end result
is a far safer and more robust thread creation mechanism. SQL Server uses
_beginthreadex and _beginthread to instantiate new threads.
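Here's a minimal sketch of creating a worker thread with _beginthreadex; the thread function and parameter are arbitrary, and the code must be linked with the multithreaded RTL, as the earlier note explains.

#include <windows.h>
#include <process.h>    // _beginthreadex
#include <stdio.h>

// Thread entry-point function. Returning from it (rather than calling
// ExitThread) lets C++ destructors and RTL cleanup code run.
unsigned __stdcall Worker(void *param)
{
    printf("Worker started with parameter %d\n", (int)(INT_PTR)param);
    return 0;    // Becomes the thread's exit code.
}

int main()
{
    unsigned threadId;
    // _beginthreadex wraps CreateThread: it allocates a tiddata block for the
    // RTL, then starts the thread at its own stub, which in turn calls Worker.
    HANDLE hThread = (HANDLE)_beginthreadex(NULL, 0, Worker,
                                            (void *)(INT_PTR)42, 0, &threadId);
    if (!hThread) return 1;

    WaitForSingleObject(hThread, INFINITE);  // Wait for the worker to finish.
    CloseHandle(hThread);                    // Release our reference to the
                                             // thread's kernel object.
    return 0;
}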
Thread Functions
A thread function is the entry point for the thread. It's where execution begins when
the thread starts. A few points about these types of functions appear below.
You don't have the ANSI/Unicode issues that you have with the primary thread
and its main/wmain and WinMain/wWinMain entry points.
You should try to use function parameters and local variables as much as
possible; using static and global variables makes your code inherently
unthread-safe.
In order to ensure that system cleanup code runs properly, you want to write code
such that it returns from your entry-point function rather than calling
ExitThread/_endthreadex. Returning properly from your thread function ensures the
following.
The destructors of your C++ objects will be called so that the objects can be
disposed of properly.
Windows will release the memory used by the thread's stack immediately
(rather than waiting until process shutdown).
The thread's exit code (stored in the thread's kernel object) will be set to your
thread function's return value.
Windows has built-in support for structured exception handling (SEH). This allows
you to write code that separates the task at hand from what it is supposed to do if
an error occurs.
When an operating system exception occurs that is not handled by a thread, the
system's default handler is triggered (unless the application has installed its own), a
dialog box is displayed, and the process is shut down. (What actually happens when
an unhandled exception occurs can be changed with SetErrorMode, as mentioned
earlier.)
Of course, this has important implications for code you write that runs inside the SQL
Server process. What happens when you call an xproc from T-SQL and the xproc
causes an exception to be raised? The answer depends on whether the exception
was raised within the context of the SQL Server worker thread or a thread created by
the xproc.
If the exception was raised by the SQL Server worker thread running the xproc, the
thread's default exception handler catches the exception and kills the connection
responsible for causing it. If the exception was raised by a thread created by the
xproc (and the thread has no exception handling code of its own), SQL Server will
generate a symptom dump file, then write a message to the error log indicating that
it is terminating, and exit. This means that an unhandled exception in a thread
created by an xproc could cause your server to crash, which is not a pleasant scenario.
How do you protect against this? Always wrap the thread entry-point function for any
threads you create in an xproc (or in an in-process COM object) with SEH code.
Listing 3.1 provides an example.
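Listing 3.1 isn't reproduced here, but the shape of such a wrapper is simple enough to sketch. The following is my own illustration (not the book's listing); it reuses the StartThrdHandled name that appears in Listing 3.2, and DoThreadWork is a made-up placeholder for the real work.

#include <windows.h>

// Placeholder for the thread's real work; anything in here that raises a
// Win32 exception (an access violation, for instance) is caught below.
static DWORD DoThreadWork(LPVOID pParam)
{
    // ... real work goes here ...
    return 0;
}

// Thread entry point wrapped in SEH so that an unhandled Win32 exception
// in this thread cannot take down the host (SQL Server) process.
DWORD WINAPI StartThrdHandled(LPVOID pParam)
{
    DWORD dwResult = (DWORD)-1;
    __try
    {
        dwResult = DoThreadWork(pParam);
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
        // Record GetExceptionCode() somewhere useful and fail gracefully
        // instead of letting the system's default handler end the process.
        dwResult = GetExceptionCode();
    }
    return dwResult;
}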
GetLastError
Every thread stores the result of the last Win32 API call, commonly referred to as last
error. Though there's no guarantee that a Win32 function will set the last error value
on an error condition, it's very unusual for this not to be the case. You can retrieve
this value using the GetLastError API function. In addition to setting last error on an
error condition, some functions also set it when no error occurs; in other words,
they reset it. You can set the value of last error via the SetLastError API function.
You should call GetLastError immediately after API function calls from which you wish
to obtain error information because, as I've mentioned, some functions reset this
value when they complete without an error. Once a last error value has been lost
because a function has reset it, there's no way to retrieve it.
You can use bitwise operators to retrieve important information from last error
values. For example, bit 29 indicates that the error is a user-defined error. No system
error will have this bit set. If you're defining your own error codes, be sure to set this
bit in order to ensure that it does not conflict with a system-defined error code.
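As a quick illustration of the bit-29 convention (my own sketch, not from the book; the MYAPP_ERR_NO_WIDGET code is made up), winerror.h exposes the bit as APPLICATION_ERROR_MASK:

#include <windows.h>
#include <stdio.h>

// A user-defined error code: setting bit 29 guarantees no collision with
// system-defined Win32 error codes.
#define MYAPP_ERR_NO_WIDGET (APPLICATION_ERROR_MASK | 0x0001)

int main(void)
{
    SetLastError(MYAPP_ERR_NO_WIDGET);
    DWORD dwErr = GetLastError();

    if (dwErr & APPLICATION_ERROR_MASK)   // bit 29 set: application-defined
        printf("0x%08lX is an application-defined error code\n", dwErr);
    else
        printf("0x%08lX is a system-defined error code\n", dwErr);
    return 0;
}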
The file winerror.h in the Platform SDK documents the exact format of Win32 error
codes and includes #defines for the most common ones. You can also use NET
HELPMSG nnnn from the command prompt to display a brief description of the most
common Win32 error codes. For example, typing this at a command prompt:
NET HELPMSG 5
displays the following:
Access is denied.
This means that Win32 error code 5 indicates an access denied error condition.
The Win32 API FormatMessage allows you to translate an error code into its textual
description. You can use this function to display a message when an error occurs or
to return an error string to a caller. For example, you could use FormatMessage to
translate a Win32 error encountered in an extended procedure into a string that you
can return to SQL Server. SQL Server will then pass this message on to the client.
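Here is a small sketch of that usage (mine, not the book's); the Win32ErrorText helper name is made up, and a real xproc would hand the resulting string to srv_sendmsg rather than printf.

#include <windows.h>
#include <stdio.h>

// Translate a Win32 error code into its system-supplied description.
// Returns the number of characters written to pszBuf (0 on failure).
static DWORD Win32ErrorText(DWORD dwErr, char *pszBuf, DWORD cchBuf)
{
    return FormatMessageA(
        FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
        NULL,                                       // use the system message tables
        dwErr,                                      // the code to translate
        MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),  // default language
        pszBuf, cchBuf, NULL);
}

int main(void)
{
    char szMsg[256];
    if (Win32ErrorText(ERROR_ACCESS_DENIED, szMsg, sizeof(szMsg)))
        printf("Error %lu: %s", (unsigned long)ERROR_ACCESS_DENIED, szMsg);
    return 0;
}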
Fibers
Fibers are often thought of as lightweight threads. They were added to Windows to
make porting UNIX server applications easier. UNIX doesn't implement threading in
the same sense that Windows does. If we view them from the perspective of the
Windows threading model, UNIX apps are single-threaded but can serve multiple
clients. This basically means that UNIX developers have created their own threading
library that they use to simulate the type of pure threading offered by Windows. This
package does many of the same things Windows does when managing threads and
thread context: it saves/restores certain CPU registers, maintains multiple stacks,
and switches between these thread contexts in order to service client requests.
The chief difference between threads and fibers in Windows is that threads are
kernel objects, while fibers are user objects. The operating system knows threads
intimately and has highly tuned and tunable methods of scheduling them,
synchronizing them, and managing them to maximize system performance and
concurrency. Fibers, by contrast, are invisible to the operating system kernel. They
are implemented in user mode code of which the kernel knows nothing.
SQL Server can be configured to use Windows' fiber APIs. This is something Microsoft
generally recommends against, and you should do so only when instructed to by
Microsoft or a Microsoft partner. There are numerous facilities in SQL Server that
either do not work at all or that cannot work correctly when the system is in fiber
mode. For example, the SQLXML facilities in SQL Server (e.g.,
sp_xml_preparedocument) are not available when the system is in fiber mode. The
same is true of SQLMAIL: you can't use it while in fiber mode. Seemingly innocuous
activities, such as initializing COM from an xproc, that in thread mode would have
affected at most one SQL Server worker could conceivably affect many workers
when the server is running in fiber mode, since a single thread can own multiple
fibers. A word to the wise is sufficient: Avoid fiber mode if you can.
Exercises
In these exercises, we'll investigate SQL Server's ability to handle exceptions and do
some general research into how it manages threads. We'll begin in Exercise 3.4 by
creating an extended procedure whose whole purpose is to raise different types of
exceptions within the SQL Server process to see how it handles them. Working
through this example will give you a greater understanding of the methods Win32
processes in general (and SQL Server in particular) typically use to deal with critical
errors.
Exercise 3.5 takes you through attaching to SQL Server with a debugger so that we
can look under the hood a bit and see how to inspect thread-relevant information for
a running process. You'll use Microsoft's standard WinDbg debugger to attach to SQL
Server, list some thread stacks, and do some other basic tasks having to do with
threading.
In this exercise, we'll explore what can happen when an extended procedure causes
an exception within the SQL Server process. This exercise will cause your SQL Server
to stop, so be sure to run the exercise against a test or development server. Ideally,
you should be the only user on the server. Let's begin by setting up an xproc called
xp_exception in the master database by following these steps.
3. Start Query Analyzer and add the xproc to the master database using
sp_addextendedproc, like this:
For curious readers, Listing 3.2 provides the main routine in xp_exception.
// Note: the #include lines, the forward declarations, and the xp_exception
// signature below are reconstructed; the original listing begins with the
// parameter-count check.
#include <windows.h>
#include <srv.h>    // ODS header for extended stored procedures

// Thread entry points defined elsewhere in the xproc source (signatures assumed)
DWORD WINAPI StartThrdHandled(LPVOID);
DWORD WINAPI StartThrdLanguageException(LPVOID);
DWORD WINAPI StartThrdUnhandled(LPVOID);

SRVRETCODE __declspec(dllexport) xp_exception(SRV_PROC *srvproc)
{
    int iParams=srv_rpcparams(srvproc);
    BYTE bType;
    ULONG cbMaxLen;
    ULONG cbActualLen;
    BYTE bCrashType;
    BOOL bNull;
    DWORD dwThreadID;

    if (0==iParams) bCrashType=0;
    else srv_paraminfo(srvproc,1,&bType,&cbMaxLen,&cbActualLen,
                       &bCrashType,&bNull);

    switch (bCrashType)
    {
    case 0: { //Crash the worker thread
        srv_sendmsg(srvproc,SRV_MSG_INFO,0,(DBTINYINT)0,(DBTINYINT)0,
                    NULL,0,0,
                    "Generating an access violation on the worker thread",
                    SRV_NULLTERM);
        CHAR *pCh=NULL;
        *pCh='x';    //null pointer ref
        break;
    }
    case 1: { //Crash a new thread with exception handling
        srv_sendmsg(srvproc,SRV_MSG_INFO,0,(DBTINYINT)0,(DBTINYINT)0,
                    NULL,0,0,
                    "Generating an access violation on a new thread with "
                    "exception handling",SRV_NULLTERM);
        CreateThread(NULL,0,
                     (LPTHREAD_START_ROUTINE)StartThrdHandled,
                     NULL,0,(LPDWORD)&dwThreadID);
        break;
    }
    case 2: { //Throw a language exception
        srv_sendmsg(srvproc,SRV_MSG_INFO,0,(DBTINYINT)0,(DBTINYINT)0,
                    NULL,0,0,
                    "Generating a language exception on a new thread without "
                    "exception handling",SRV_NULLTERM);
        CreateThread(NULL,0,
                     (LPTHREAD_START_ROUTINE)StartThrdLanguageException,
                     NULL,0,(LPDWORD)&dwThreadID);
        break;
    }
    case 3: { //Crash a new thread WITHOUT exception handling
        srv_sendmsg(srvproc,SRV_MSG_INFO,0,(DBTINYINT)0,(DBTINYINT)0,
                    NULL,0,0,
                    "Generating an access violation on a new thread WITHOUT "
                    "exception handling -- your server should crash",
                    SRV_NULLTERM);
        CreateThread(NULL,0,
                     (LPTHREAD_START_ROUTINE)StartThrdUnhandled,
                     NULL,0,(LPDWORD)&dwThreadID);
        break;
    }
    }
    return XP_NOERROR;
}
As you can see, the xproc takes a single parameter (an integer), then fails in
different ways based on the parameter passed. Table 3.8 lists the supported
parameter values and what each one causes to happen.
Value Action
0 Generates an access violation on the SQL Server worker thread running the xproc
1 Generates an access violation on a new thread whose entry point is wrapped in SEH
2 Throws a language exception on a new thread that has no exception handling code
3 Generates an access violation on a new thread with no exception handling (this stops the server)
4. Now, let's run the xproc. Let's start by passing it a parameter value of 0:
exec xp_exception 0
Connection Broken
The SEH block that SQL Server sets up for its worker threads caught the
exception generated by the xproc and killed the corresponding connection.
5. Let's see what happens when we generate an exception on a new thread that
has an SEH wrapper around its entry-point function:
exec xp_exception 1
All you should see in Query Analyzer is a message returned by the xproc; SQL
Server is unaffected by the exception because of the SEH code.
6. Next, let's throw a language exception on a new thread:
exec xp_exception 2
Again, all you see in Query Analyzer is the message returned by the xproc
itself; SQL Server is unaffected by the language exception.
As I said earlier in the chapter, language exceptions and Win32 exceptions are
two different things. A language exception won't bring down the server, but a
Win32 exception can if not handled properly.
7. Now let's try an access violation on a new thread without an SEH wrapper
around its entry-point function. (Warning: This will cause your server to stop.)
exec xp_exception 3
If you then check your server status in SQL Server Service Manager, you
should find that it has stopped. The unhandled exception generated by
xp_exception caused the process to crash. This should reinforce how important
it is, especially in multithreaded xprocs, to handle any exceptions your code
might raise. Failing to do so can quickly cause a server crash.
Note that I don't necessarily recommend that you handle all exceptions in an
xproc's main thread (the SQL Server worker thread from which the xproc was
called). If you want the calling connection killed in the event of a catastrophic
failure in your xproc, you can simply allow SQL Server's default SEH handling to
take care of this for you; there's no need to set up your own version of it.
That said, you should definitely handle all exceptions in any new threads you create
in an xproc. Failing to do so will bring down your server when an exception occurs.
If you check the LOG folder under your SQL Server installation, you should find two
new files: an exception log file and a stack dump file. Both are text files that you can
view with Notepad. Feel free to open each of these and inspect them for additional
data about the exception that was raised. The dump file in particular contains
interesting details about DLLs loaded within the SQL Server process address space
and the call stacks of the worker threads when the exception was raised.
In this next exercise, we'll attach to SQL Server using WinDbg and inspect some
thread-related information exposed by the debugger. To take a peek under the SQL
Server hood, follow these steps.
1. If your server is still stopped from the previous exercise, restart it. As before,
this should be a test or development server, and, ideally, you should be the
only user on it (since attaching to it with a debugger will also stop it
momentarily).
3. Press F6 to attach to SQL Server. Find the sqlservr.exe instance in the list of
processes and select it. If you have more than one instance running there may
be more than one sqlservr.exe in the list. If so, expand the tree node of each
sqlservr.exe process to view its command line; you should be able to identify
each instance from its command line. If you just restarted your SQL Server, the
correct instance should be near the bottom of the list of processes.
5. Find the command window and dump the thread stacks using this command:
~*kv
You should see the call stacks of every thread in the SQL Server process listed.
You'll notice that most of the threads are in Win32's WaitForSingleObject API
function. This is because they are waiting on a synchronization object of some
type to be signaled; we'll talk more about WaitForSingleObject and
synchronization objects later in the chapter.
6. Dump the thread environment block (TEB) for the current thread with this command:
!teb
The current thread is indicated
by the prompt to the left of the edit box in the command window. (Given that
we haven't changed it, this should be the last worker thread in the SQL Server
process space.) Your TEB will likely look something like this:
TEB at 7FF99000
ExceptionList: 26caffdc
Stack Base: 26cb0000
Stack Limit: 26cae000
SubSystemTib: 0
FiberData: 1e00
ArbitraryUser: 0
Self: 7ff99000
EnvironmentPtr: 0
ClientId: bcc.cf8
Real ClientId: bcc.cf8
RpcHandle: 0
Tls Storage: f3b38
PEB Address: 7ffdf000
LastErrorValue: 0
LastStatusValue: 0
Count Owned Locks:0
HardErrorsMode: 0
Note the inclusion of the LastErrorValue in the output. This is the value you
would see were the thread to call the Win32 API GetLastError at this point.
7. You can also retrieve the last error value for the current thread via this
command:
!gle
This dumps the last error value as well as the last status value for the thread.
8. You can dump the process environment block (PEB) for the process with this command:
!peb
9. You can now exit the debugger. If you are running on a version of Windows
prior to Windows XP, you will have to restart SQL Server because exiting the
debugger leaves it stopped.
Thread Recap
4. True or false: You can use the TList utility to list the percentage of CPU
utilization for a given process.
5. True or false: When a new process is created, the system will automatically
create its first thread regardless of the parameters passed into CreateProcess.
6. What happens if you attempt to compile and link a Visual C++ program that
calls the CreateThread Win32 API function but links with the single-threaded
version of the runtime library?
8. True or false: When designing most types of complex applications, the first
thing an application architect should do is divide the work the app must
perform into threaded tasks and design threads to carry them out.
9. Name the main reason creating a process is slower and more resource
intensive than creating a thread.
10. What WinDbg command do we use to list the PEB for a thread?
13. What WinDbg command can we use to list only the last error value and last
status value for the current thread?
14. True or false: It's possible for thread and process IDs to overlap because they
are generated from different namespaces.
17. I have received a last error value of 6. What command line sequence can I use
to display the textual description for this error code?
18. What does the Thread:% Privileged Time Perfmon counter indicate?
19. What WinDbg command do we use to list only the TEB for a thread?
20. True or false: The LastErrorValue field included in WinDbg's TEB output is the
same information as that returned by the Win32 GetLastError API function.
Thread Scheduling
A preemptive, multitasking operating system must use some type of formal process to determine which
threads should run and for how long. The algorithms Windows uses to determine when a thread gets
scheduled aren't always well documented, but Microsoft has designed them to be as fair and as generally
applicable as possible. Below, I'll document how a few of these algorithms work; just understand that future
versions of Windows may alter them significantly.
Before we go any further, I should stop and point out that SQL Server does not use the Windows scheduler and
the scheduling APIs as you might expect. That's because it handles most of its thread scheduling needs itself
via its UMS component. One noteworthy side effect of this is that, to the operating system, only one SQL
Server thread generally appears to be active at a time per processor. So, even though the server may have
hundreds of worker threads at any given time, only one of them for each processor on the server appears to
Windows to be actually doing anything.
I'll talk more about the reasons for this in Chapter 10, but just understand for the time being that SQL Server
makes use of the Windows scheduler and scheduling APIs in different ways than you might expect. We'll plumb
the depths of those differences and the reasons behind them later.
Context switch: what happens when Windows saves off the contextual information for one thread
and loads that of another so that the other thread can run. This consists of saving/loading volatile
register values and other elements pertinent to the runtime environment of the thread.
Quantum: the time slice a thread is given to run by the Windows scheduler.
Preemption: what happens when Windows stops one thread before its quantum has expired so that
another thread can run.
Clock interval: the frequency at which the CPU's clock interrupt fires.
Thread state: the current execution status of a thread, represented by an integer value between 0
and 7.
Process priority: the execution priority of a process, ranging from Idle through Real-Time.
Thread priority: the process-relative execution priority of a particular thread, ranging from Idle
through Time-Critical.
Processor affinity: a bitmap indicating the processors on which a process or thread can run.
Ideal processor: the processor considered the best host for a particular thread. Windows gives
preference to this processor when scheduling the thread to run but will allow the thread to run on
another processor if its ideal processor is busy.
Thread starvation: what happens when a lower-priority thread is continuously preempted by higher-
priority threads and not allowed to run.
Overview
Windows schedules work using a priority-driven, preemptive scheduling system. This means that the highest-
priority thread that is ready to run (runnable) preempts lower-priority threads (within the limits of processor
affinity, which we'll discuss more in a moment). When a lower-priority thread is preempted, it is interrupted
(perhaps just momentarily) while a higher-priority thread is allowed to run. Thread starvation occurs when a
thread is continually preempted and not allowed to run for an extended period of time.
Windows schedules work at thread granularity. Given that processes don't run but only provide an
environment in which threads can run, this makes sense.
Windows creates the illusion that all threads run concurrently by partitioning the time each thread is allowed
to run into time slices called quantums. It hands quantums to threads in a round-robin fashion.
There is no single routine within Windows that performs all task scheduling. Within the Windows kernel, there
are numerous routines and modules involved in the scheduling of work. We refer to them collectively as the
kernel's dispatcher or scheduler.
Sleep/SleepEx: suspends execution of the current thread for a specified amount of time.
Get/SetThreadPriorityBoost: gets/sets the system's ability to temporarily boost the priority of a thread.
Get/SetProcessPriorityBoost: gets/sets the system's ability to temporarily boost the priority of a process.
SwitchToThread: allows a thread to give up the remainder of its quantum so that other threads can run.
Perfmon is, again, a key diagnostic tool here. Pview and Pviewer are also very valuable for monitoring what's
happening with specific processes. Given that much of the scheduling code actually resides in the kernel,
there is only so much information that user mode tools can provide (Table 3.10).
Table 3.10 compares TaskMgr, Perfmon, Pstat, and Pviewer on whether each can display the process base
priority, process priority class, thread base priority, and thread current priority.
Quantums
As I've mentioned, a quantum refers to the length of time a thread is allowed to run before the system
interrupts it. The quantum value isn't actually a time length; it's an integer value that represents what are
commonly referred to as quantum units. Each time a thread is scheduled, it starts with a certain amount of
quantum units. Each time the processor's clock interrupt fires, a given number of these units are deducted
from this amount. When this count reaches 0, the thread is interrupted and another thread is allowed to run.
Of course, this assumes that no higher-priority threads are waiting for the thread's processor. If a higher-
priority thread needs to run and has affinity with the processor on which a lower-priority thread is currently
running, all bets are off: the higher-priority thread preempts the lower-priority thread and runs regardless of
whether the latter has finished its quantum.
The frequency of the clock interval varies from platform to platform. Clock interrupt frequency is dictated by
Windows' hardware abstraction layer (HAL), not the kernel. On most x86 uniprocessor systems, the clock
interval is 10 milliseconds (ms); on most x86 multiprocessor systems, it's 15 ms.
The number of quantum units deducted for each clock interrupt is 3. On Windows 2000 Professional and
Windows XP, the default quantum amount for a thread is 6. On Windows 2000 Server and Windows Server
2003, it's 36. Given a uniprocessor clock interval of 10 ms, this means that a thread quantum can span 2 clock
interrupts (approximately 20 ms) on Windows XP, or 12 clock interrupts (120 ms) on Windows Server 2003.
And on a multiprocessor system with a clock interval of 15 ms, a thread's quantum can last for a maximum of
30 ms on Windows XP, and 180 ms on Windows Server 2003.
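Those durations are just arithmetic on the numbers above. A tiny sketch (mine, not the book's) makes the relationship explicit:

#include <stdio.h>

int main(void)
{
    // Values from the text: 3 quantum units are charged per clock tick;
    // the workstation default quantum is 6 units and the server default is 36;
    // the clock interval is 10 ms on most x86 UP systems and 15 ms on MP systems.
    const int unitsPerTick = 3;
    const int wsQuantum = 6, srvQuantum = 36;
    const int upMs = 10, mpMs = 15;

    printf("Workstation, UP: %d ms\n", wsQuantum  / unitsPerTick * upMs); //  20
    printf("Server,      UP: %d ms\n", srvQuantum / unitsPerTick * upMs); // 120
    printf("Workstation, MP: %d ms\n", wsQuantum  / unitsPerTick * mpMs); //  30
    printf("Server,      MP: %d ms\n", srvQuantum / unitsPerTick * mpMs); // 180
    return 0;
}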
As I've mentioned, a thread might not get to complete its quantum. The numbers above are maximums: if a
higher-priority thread becomes schedulable for a given processor, it will preempt a running lower-priority
thread.
You might be wondering why a quantum is expressed as a multiple of 3 per clock tick rather than the simpler
1:1 ratio. The reason for this is to allow for partial quantum decay when a thread comes out of a wait state.
When a thread whose base priority is less than 14 executes a wait function (e.g., WaitForSingleObject), its
quantum is reduced by 1. (Threads with a base priority of 14 or higher have their quantums reset after coming
out of a wait state.) So, instead of possibly having its entire 6-unit quantum remaining when it comes out of its
wait state, it will have, at most, 5 remaining quantum units. This addresses the situation where a thread
continuously runs and goes to sleep between clock ticks. Were it not for the system enforcing quantum decay
on the thread each time it came out of its wait state, it would have what amounted to an infinite quantum so
long as it did not happen to be running when the clock interrupt fired.
Thread States
If you start Perfmon and display the explanation for the Thread State counter for the Thread object, you'll see
that there are eight potential thread states (Table 3.11).
Value State
0 Initialized
1 Ready
2 Running
3 Standby
4 Terminated
5 Waiting
6 Transition
7 Unknown
You will probably be surprised to learn that most threads spend most of their time waiting. In almost any given
application, threads spend the majority of their time waiting for some event (keyboard input, a mouse click,
I/O operations, and so on) to occur or complete. This is certainly the case with SQL Server, as we discovered
earlier in the chapter when we dumped the SQL Server thread stacks using WinDbg.
Thread Priorities
Windows supports 32 priority levels, ranging from 0 to 31, with 31 being the highest. Threads begin life inheriting
the base priority of their process. This priority can be set when the process is first created with CreateProcess
and can be changed afterward by using SetPriorityClass or externally by using Task Manager or a similar tool.
Table 3.12 summarizes the process priorities supported by Windows.
Once created, a thread's priority can be changed using SetThreadPriority. You never set an exact thread
priority value but instead use one of the predefined constants provided by the Win32 API (e.g.,
THREAD_PRIORITY_NORMAL, THREAD_PRIORITY_ABOVE_NORMAL, and so on) to set the priority of a thread. The precise
numeric values of these constants are subject to change (and have changed) between releases of Windows.
Table 3.13 lists the currently supported thread priority levels.
As I've mentioned, higher-priority threads preempt lower-priority ones. This means that, all other things being
equal, as long as a higher-priority thread is runnable, lower-priority threads will not get time on the
processor; they will starve indefinitely until the higher-priority thread either terminates or enters a wait state.
Table 3.12. Process Priority Classes
Priority Description
Idle The threads in this process run only when the system is idle.
Below normal On Windows 2000 and later, threads in a Below normal priority process run at a lower priority than Normal but at a higher one than Idle.
Above normal On Windows 2000 and later, process threads in the Above normal class run at a higher priority than Normal but at a lower one than High.
High This class is used for time-critical tasks. Task Manager runs at this class so that it can kill processes that are CPU-intensive.
Real-Time This class is used for time-critical tasks. Threads in a time-critical process must respond immediately to events. Note that tasks with this base priority compete with the operating system for processor time and can adversely affect overall system performance. You should not use this priority class unless absolutely necessary.
Table 3.13. Thread Priority Levels
Priority Description
Idle The thread runs at a priority level of 1 for Above normal, Below normal, High, Idle, and Normal priority processes, and a priority level of 16 for Real-Time priority processes.
Lowest The thread runs at 2 points below the Normal priority for the base priority class.
Below normal The thread runs at 1 point below the Normal priority for the base priority class.
Normal The thread runs at Normal priority for the priority class.
Above normal The thread runs at 1 point above the Normal priority for the base priority class.
Highest The thread runs at 2 points above Normal priority for the base priority class.
Time-critical The thread runs at a base priority level of 15 for Above normal, Below normal, High, Idle, and Normal priority processes, and a base priority level of 31 for Real-Time priority processes.
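To make the API usage concrete, here is a minimal sketch (mine, not the book's) that raises the current process to the High priority class and bumps the calling thread one point above normal:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Raise the whole process to the High priority class...
    if (!SetPriorityClass(GetCurrentProcess(), HIGH_PRIORITY_CLASS))
        printf("SetPriorityClass failed: %lu\n", GetLastError());

    // ...and give the current thread a process-relative boost of one point.
    if (!SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL))
        printf("SetThreadPriority failed: %lu\n", GetLastError());

    // Read the values back to see what the scheduler now thinks.
    printf("Priority class: 0x%lX, thread priority: %d\n",
           GetPriorityClass(GetCurrentProcess()),
           GetThreadPriority(GetCurrentThread()));
    return 0;
}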
All priorities are not equally available to applications. Priorities 17 through 21 and 27 through 30 are not available to user
mode applications (but they are available to device drivers). Also, only one thread in the system can run at
priority 0: Windows' zero page thread. The zero page thread's job is to zero free pages in RAM when there are
no other runnable threads. It runs only when the system is otherwise completely idle. The system always tries
to keep the CPU busy and sits idle only when there is absolutely no work to do.
If the scheduler sees that a thread has completed its quantum and there are no other threads at its priority, it
will reschedule the thread to run for another quantum. This is why it's so easy for lower-priority threads to
become starved; the fact that a thread has just completed its quantum has no bearing on whether it will be
allowed to run again.
That said, when the scheduler detects that a thread has been starved for about 3 to 4 seconds, it boosts the
thread's priority in hopes of allowing it to run. It also doubles its quantum length. Once the thread runs, its
priority is reset to its former level and its quantum is restored to its default length.
TIP: You can change the base priority of an interactive (nonservice) application by using Task Manager. Simply
right-click the process in the Processes list and select Set Priority from the menu. You can also start a process
using a nondefault priority using the Windows start command and one of the priority command line switches
such as /abovenormal, /high, or /realtime.
Using Windows' Performance Options dialog (My Computer | Properties | Advanced in Windows 2000), you can
opt to have Windows boost the performance of foreground applications. In this dialog, you have two
application prioritization options: Applications and Background services. If you select Background services, no
boost occurs. But if you select Applications, Windows will increase the quantum length of whatever process is
currently in the foreground whenever the foreground application changes. This will generally make the
application that currently has focus more responsive but will do so at the expense of background services such
as SQL Server. On Windows 2000 Professional and Windows XP, the default is Applications. On Windows 2000
Server and Windows Server 2003, the default is Background services.
Real-Time Threads
A real-time thread is a thread whose priority is 16 or higher. Although you can create a process with a base
priority of real-time, keep in mind that real-time threads can block system threads, which may cause Windows
to behave erratically. Important system functions such as disk writes and memory allocations may be delayed
by your real-time thread. The mode in which a thread is running (user mode versus kernel mode) has no
bearing on preemption: A thread running in user mode can preempt a kernel mode thread and vice versa. This
means, of course, that as long as a priority 31 thread remains schedulable on a given processor, no other
thread, kernel mode or otherwise, can run on that processor.
An important difference between real-time threads and threads in other priority classes is that real-time thread
quantums are reset when they're preempted. So, whereas a normal thread would retain whatever was
remaining of its quantum when it was preempted and, once rescheduled, would run for the remainder of that
quantum or until again preempted, a real-time thread gets a brand new quantum once it's rescheduled after
preemption. This gives real-time threads yet another advantage over regular threads in terms of scheduling
and means that, in general, your high-priority threads should spend most of their time in a wait state (that is,
not schedulable) if you want even system throughput.
Note that "real-time" within the world of Windows does not mean real-time in the traditional sense of the term.
Windows doesn't provide conventional real-time operating system features such as guaranteed interrupt
latency or a way for a thread to obtain a guaranteed execution time. Even real-time threads can be
preempted by other, higher-priority real-time threads.
Scheduling Queues
The first thing to understand about the scheduling queue is that it is really a series of queues, one for each
scheduling priority. Each of the 32 thread priority levels has its own scheduling queue. When Windows begins
looking for a thread to run, it simply starts with the highest-priority thread queue (31) and works downward
through the other queues until it finds a ready thread.
In order to avoid having to physically walk all 32 thread queues every time it needs to find a ready thread,
Windows maintains a 32-bit bitmap called the ready summary. Each bit in the bitmap indicates that one or
more threads for the corresponding priority level are ready to run. This allows Windows to quickly detect which
thread queues to search for ready threads without having to iterate through all of them.
Windows also maintains a bitmap that indicates which processors are idle. This allows it to quickly find an idle
processor when it needs to schedule a thread to run.
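The ready-summary idea is easy to sketch in user mode code. This is purely illustrative (not Windows' or SQL Server's actual implementation): one bit per priority level, scanned from 31 downward with a bit-scan instruction.

#include <windows.h>
#include <intrin.h>   // _BitScanReverse (MSVC intrinsic)
#include <stdio.h>

int main(void)
{
    // Pretend the ready queues for priorities 8 and 13 each hold a thread.
    unsigned long ulReadySummary = (1UL << 8) | (1UL << 13);

    unsigned long ulHighest;
    if (_BitScanReverse(&ulHighest, ulReadySummary))  // index of highest set bit
        printf("Search the priority-%lu ready queue first\n", ulHighest);  // 13
    else
        printf("No ready threads anywhere\n");
    return 0;
}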
Approximately every 20 ms, Windows examines all the thread kernel objects that have been created. Usually,
most of these are not runnable because they are waiting on something else to occur (an event, I/O, and so on).
Some, however, will be schedulable, so Windows will select one, load the CPU register values using the ones
last saved to the thread's CONTEXT structure, and schedule it to run. This process is called a context switch.
Once scheduled, a thread executes code and manipulates data in the process's address space until it is either
preempted or its quantum expires. When the system switches to another thread, it saves the CPU registers
back to the currently running thread's CONTEXT structure, restores those saved with the thread about to be
scheduled, and schedules the new thread. This process of switching between threads begins at system startup
and continues until system shutdown. It is Windows' main loop, if you will.
Context Switching
As I mentioned earlier, a context switch involves swapping the values in the CPU registers with those in each
thread's CONTEXT structure. Specifically, a context switch saves and reloads the volatile register values and
the other elements pertinent to the thread's runtime environment, such as its instruction pointer and stack pointers.
Windows actually keeps track of how many times a thread gets scheduled, as you can see by inspecting the
thread with Spy++ or by checking the Thread:Context Switches/sec Perfmon counter.
By default, a process's threads can execute on any of the processors in the host machine. If a process has
multiple threads and the host machine has multiple processors, Windows will attempt to distribute the threads
such that the processors are kept as busy as possible.
It's usually preferable to allow Windows to decide which processor a thread can run on. Windows always
strives to keep the processors in the host machine as active as possible, and it goes to great lengths to ensure
that the selection and scheduling process is fair.
There are situations when it's advantageous to limit the processors on which a thread may run. For example,
you may have a high-priority task for which you want to dedicate a CPU. You can set the thread-processor
affinity for the other threads in the process such that they will not be scheduled on a particular CPU. This
leaves the CPU for use by your high-priority task (and by other processes), whose thread you can then
affinitize to it. This technique of partitioning an application such that certain threads run only on certain
processors is fairly common in high-end, high-performance software.
Sometimes you'll find that a combination of the two approaches is best: you affinitize some threads but let
Windows decide what processors the others can run on. For example, in the scenario above, it would likely be
better to leave the high-priority task's thread unaffinitized. All other things being equal, Windows will not
attempt to schedule the thread on processors that are busy doing other things if there is an idle processor in
the system. Furthermore, since Windows defaults to running a thread on the processor on which it last ran,
once the thread is scheduled on the dedicated processor, it will likely stay there (because the other threads
cannot be scheduled on that processor due to their affinity masks). In that particular scenario, the hybrid
approach is probably more scalable overall because it allows for the possibility that you might add another
concurrent high-priority task thread at some point. In that case, you'd likely leave the new thread unaffinitized
as well. And even though you've set aside a single processor on which the two threads can run, by not
affinitizing them, you allow for the possibility that another processor may become available and permit one of
the high-priority threads to run without requiring it to wait on the other high-priority task to complete. In other
words, even though you've set aside one processor to serve the needs of your high-priority task threads, you
allow them to run on a different processor if the dedicated processor is busy. Since other threads cannot use
that processor, the only time when this will occur is when the dedicated CPU is busy with one of the high-
priority tasks and the other one needs to run. By not affinitizing them, you avoid wasting unused CPU
resources in the machine and you keep from having to set up a dedicated processor for every high-priority
task your application needs to carry out.
Of course, moving threads between processors is expensive because the likelihood that each processor's
secondary cache will be optimally used is much lower than if a thread always runs on the same processor. So,
there are certainly situations where it makes sense to set up strict affinities for the threads in a process rather
than allowing Windows to decide what processor each thread runs on. There is no right answer for all
applications. My advice is to allow Windows to manage processor-thread affinity unless there is simply no
other way to get the performance and scalability you need. You can usually find a way via thread priorities and
synchronization to get the performance you need without resorting to hard processor affinity. Once you reach
the point in application tuning where you are considering setting aside dedicated CPUs for certain tasks,
trial and error will probably be your best approach: you will have to experiment a little, and experience will be
your guide.
Types of Affinity
Threads in a Windows process can have one of three basic types of processor affinity: last processor (or soft)
affinity, ideal affinity, and hard affinity. Hard affinity is the type of affinity we were just discussing and is what
typically comes to mind when we talk about processor affinity: A thread is assigned to a given set of
processors and can run on no others. Once a thread has been affinitized to a particular processor or
processors, Windows ensures that it runs only on those processors. Even though other processors may be
sitting idle while the thread waits for its processor(s) to become available, the system will force the thread to
wait. You can set the hard affinity for an individual thread by calling the SetThreadAffinityMask Win32 API
function.
You can also set processor affinity in the header of an executable, but there's no linker switch for it. You can,
however, edit the executable header with ImageCfg.exe to change a processor affinity for a process.
TIP: The Windows Task Manager allows you to set processor affinity for a process via the Set Affinity menu
option. Find the process in the Processes list, right-click it, and select Set Affinity from the menu. Note that this
option is available only on multiprocessor computers and only on the Windows NT family. (Windows 9x offers
no special support for multiprocessor machines.)
As I've mentioned, from a performance standpoint, hard affinity is often not the optimal approach. You
might limit a thread to a particular processor while other processors on the machine sit idle. In that scenario,
ideal affinity may be a better solution. Setting a thread's ideal affinity tells the system which processor you'd
prefer that the thread ran on but allows it to schedule the thread on other processors if the preferred processor
is busy. You can set the ideal processor for a thread using the Win32 API call SetThreadIdealProcessor.
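A minimal sketch of both calls (mine, not the book's); the two-CPU mask and the processor number are arbitrary examples:

#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE hThread = GetCurrentThread();

    // Hard affinity: restrict this thread to CPUs 0 and 1 (bitmask 0x3).
    if (SetThreadAffinityMask(hThread, 0x3) == 0)
        printf("SetThreadAffinityMask failed: %lu\n", GetLastError());

    // Ideal affinity: prefer CPU 1, but let the scheduler use other CPUs
    // in the affinity mask if CPU 1 is busy.
    if (SetThreadIdealProcessor(hThread, 1) == (DWORD)-1)
        printf("SetThreadIdealProcessor failed: %lu\n", GetLastError());

    return 0;
}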
The third type of thread-processor affinity model supported by Windows is last processor affinity. By default,
Windows employs this type of affinity: all other things being equal, Windows attempts to schedule a thread on
the processor on which it last ran. This helps maximize the use of the secondary cache on the processor (the
hope is that some of the thread's data may still be in the cache when the thread gets a new quantum).
Windows knows which processor a thread last ran on because it tracks this information in the thread's kernel
block. In fact, Windows maintains two CPU numbers for each thread: the last processor on which the thread
ran and its ideal processor.
When a thread is ready to be scheduled, Windows will first check to see whether it has an ideal processor. If it
does, Windows will attempt to schedule it on that processor. If that processor is busy or the thread does not
have an ideal processor, Windows will attempt to schedule it on the currently executing processor (the CPU on
which the scheduler itself is currently running). If that processor is not idle, the system will select the first idle
processor it finds by scanning the idle processor mask from highest- to lowest-numbered CPU.
If there are no idle processors, Windows checks the priority of the thread against the priority of the thread
currently executing on its ideal processor. If the thread has a higher priority than the one running on its ideal
processor, it preempts the other thread and is scheduled on that processor.
If it has a lower priority than the thread running on its ideal processor (or doesn't have an ideal processor),
Windows checks the processor on which it last ran. If it has a higher thread priority than the thread currently
running on that processor, it preempts that thread and the system sets it up to run.
If the thread cannot preempt either the thread running on its ideal processor or the one running on its last
processor, the system checks the other processors on the system to see whether it can preempt any of their
running threads. The CPUs are checked beginning with the highest processor in the active affinity mask down
to the lowest one. If a thread with a lower priority is found executing on one of these processors, the system
will preempt it in favor of the ready thread.
If a thread has a hard processor affinity mask associated with it, this obviously limits the processors visible to
the preceding search process. For example, even though a thread may have an ideal processor associated
with it, if that processor is not in its processor affinity mask, the thread will never run on its ideal processor.
If no processors are idle and the thread cannot preempt any other running threads, the thread will be placed in
the ready queue and reevaluated for scheduling on the next go-round. Note that Windows never moves
threads to make room for affinitized threads. If Thread A has affinity to CPU 0 and CPU 0 is busy, but Thread B
has affinity to both CPU 0 and CPU 1 (which is idle), the system will not move Thread B from CPU 0 to CPU 1 in
order to allow Thread A to run. Thread A will simply have to wait its turn on CPU 0.
Thread Selection
When the scheduler needs to find a new thread to run on a CPU that is currently executing code (e.g., when
the currently executing thread goes into a wait state, lowers its priority, changes its affinity, and so on),
Windows uses a simple algorithm for picking which thread gets to run. On a single processor system, it picks
the first ready thread in the list of ready queues, starting with the highest-priority ready queue and working
downward. For a multiprocessor machine, it picks a thread that meets one of the following conditions:
Windows allows threads to be suspended and resumed using the SuspendThread and ResumeThread API calls,
respectively. A suspended thread uses no CPU time and will not be scheduled until you resume it.
The system maintains a suspend count for each thread in its kernel object. The suspend count for a thread
indicates how many times the thread has been suspended. As long as this is not 0, the thread will not be
scheduled. When a thread's suspend count reaches 0, it is schedulable unless it is waiting for some other
event (e.g., I/O to complete).
When an application first creates a thread, Windows sets its suspend count to 1 to prevent it from being
scheduled while it is being initialized. Once the thread is initialized, the system checks to see whether the
thread was created with the CREATE_SUSPENDED flag. If it was, the suspend count is left unchanged;
otherwise it is reset to 0. When a thread is created in a suspended state, an application must call
ResumeThread in order to allow it to run.
When ResumeThread is successful, it returns the thread's previous suspend count. If it's not successful, it
returns 0xFFFFFFFF (-1). This is useful information to have given that you must resume as many times as you
suspend in order for a thread to be schedulable. For example, you could use ResumeThread's return value as
the control variable for a loop so that you could be sure a thread's suspend count was set to 0 when you
wanted it to run.
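A sketch of that idea (mine, not the book's): keep calling ResumeThread until the previous suspend count it reports shows the thread is no longer suspended.

#include <windows.h>

// Drive a thread's suspend count down to 0 so it can be scheduled.
// Returns TRUE if the thread is no longer suspended.
BOOL ForceResume(HANDLE hThread)
{
    for (;;)
    {
        DWORD dwPrev = ResumeThread(hThread);  // returns the previous count
        if (dwPrev == (DWORD)-1) return FALSE; // the call failed
        if (dwPrev <= 1) return TRUE;          // count is now 0: schedulable
        // dwPrev > 1: it was suspended more than once, so resume again.
    }
}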
It should be self-evident that a thread can suspend itself, but it can't resume itself. Once a thread is
suspended, another thread will have to resume it in order for it to run.
You have to be careful about suspending running threads because you don't necessarily have a good idea of
what a thread is up to when you suspend it. It's usually preferable to code your thread functions so that a
thread enters a wait state (e.g., because it is waiting on a synchronization object) when you want to
momentarily suspend it than to force it to pause by calling SuspendThread when you may not know exactly
what it is doing. For example, if you suspend a thread that owns a mutex or has entered a critical section, you
may inadvertently block other threads that are waiting for the object to be released.
As with Win32's wait functions, you can pass in the constant INFINITE to Sleep/SleepEx to cause a thread to
sleep indefinitely, but I can't think of a practical application of this. If you're using Sleep, you won't be able to
awaken the thread in order to make further use of it, so you would be better off destroying it in order to free
up the system resources associated with it. If you're using SleepEx, an INFINITE delay can be terminated only
by an I/O completion callback or an asynchronous procedure call (APC).
Note that you can pass a delay of 0 into Sleep/SleepEx to cause the thread to allow other threads of at least
the same priority to run. On versions of Windows prior to Windows Server 2003, if you want lower-priority
threads to be allowed to run, you'll have to put the thread to sleep for a longer duration (even sleeping for a
single millisecond will allow lower-priority threads to run). Prior to Windows Server 2003, sleeping with a
duration of 0 will cause the calling thread to be rescheduled immediately even if lower-priority threads are
being starved.
An interesting alternative to using Sleep to yield to other runnable threads is the SwitchToThread function.
SwitchToThread exists for the very purpose of allowing a thread to surrender the remainder of its time slice
and allow other threads to run, regardless of whether they have a lower priority than the current thread.
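A quick sketch of the difference between the two approaches (mine, not the book's):

#include <windows.h>

void YieldExamples(void)
{
    // Gives up the rest of the quantum, but (prior to Windows Server 2003)
    // only threads of equal or higher priority get a chance to run.
    Sleep(0);

    // Sleeping for even a single millisecond lets lower-priority threads run too.
    Sleep(1);

    // Yields to any other runnable thread, regardless of its priority.
    // Returns FALSE if no other thread was ready to run.
    if (!SwitchToThread())
    {
        // Nothing else was ready; we simply keep going.
    }
}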
Exercises
In these exercises, you'll experiment with SQL Server and thread/process priorities. You'll learn to start SQL
Server as a real-time process and how to view process priority using Task Manager. You'll learn how SQL Server
"sleeps" while it waits for T-SQL commands such as WAITFOR DELAY to complete. And you'll use an xproc to
look under the hood a bit and display thread priorities, affinity masks, and other useful information for SQL
Server's worker threads.
You're probably aware that you can configure SQL Server to run at the High process priority. You do this in
Enterprise Manager via the Processor tab in the Properties dialog for your server.
When you change this setting and restart your server, Task Manager will show that it's running at High priority.
Microsoft doesn't normally recommend that customers change this setting, but it's not uncommon to find
those who have.
You can take this a step farther and actually run SQL Server as a real-time process. Again, this isn't
recommended, but, from a technical standpoint, it is possible. As I mentioned earlier, running a process at
Real-Time priority will cause its threads to contend with operating system threads for processor time, possibly
slowing down or even blocking key operating system functions such as disk I/O and memory allocations. So, I offer the
following to you merely as an experiment. You should not run SQL Server at the Real-Time priority in
production, nor should you conduct the following tests on anything but a test or development machine as the
OS itself may become unresponsive.
1. Stop your SQL Server via the SQL Server Service Manager. Again, this should be a test or development
instance, and, ideally, you should be its only user.
2. Open a command prompt and switch to the folder that contains sqlservr.exe, the SQL Server executable.
This should be the binn subfolder under your SQL Server main installation folder.
Replace YourInstanceName with the name of the instance you're starting. Omit the -s parameter
altogether if you're starting a default instance.
4. You should see SQL Server start as a console application in a separate window. Switch to Task Manager
and check the base process priority column. (If the Base Pri column isn't visible in Task Manager, select
the View | Select Columns menu option and select it from the list of available columns.) The base process
priority should now be Real-Time.
5. Let's do a quick test to see whether running at Real-Time priority helps CPU-intensive queries finish more
quickly. Connect to your SQL Server instance using Query Analyzer and run the following query.
declare @var int
set @var=1
while @var<100000 begin
set @var=@var+1
end
6. Note the amount of time it takes to run. Depending on your processor, this shouldn't be more than a few
seconds.
7. Now stop your SQL Server (go to the console window and press Ctrl+C) and restart it using SQL Server
Service Manager. This should put it back at its default process priority.
8. Now run the query again from Query Analyzer and check how much time it takes to run.
On my system, the execution times of the two runs are the same; running SQL Server at Real-Time priority
didn't speed up my CPU-intensive query. Normally, I wouldn't expect running SQL Server at a higher priority to
make much difference unless there were other things running on the machine at the same time and they had
a high enough base priority that they would often preempt SQL Server.
My point is this: You shouldn't expect running SQL Server at a higher priority to automatically turbocharge your
system all by itself. Setting a process's priority affects who wins when there's contention for processor
resources; it does not automatically speed anything up in and of itself.
For this reason, and because SQL Server has not been certified to be safe to run at Real-Time priority, I
recommend that you leave SQL Server at its default priority.
In this exercise, you'll learn what SQL Server does when you tell it to put a connection to sleep with the
Transact-SQL WAITFOR DELAY command. Having just learned about the Win32 API Sleep and SleepEx
functions, it might seem obvious that SQL Server calls one of these functions to put a thread to sleep when
you execute WAITFOR DELAY in Transact-SQL. Understanding how the server actually handles this scenario
and how it handles language events versus remote procedure call (RPC) events will give us some insight into
how it works internally. Let's take a peek under the hood by following these steps.
1. Start an instance of SQL Server to which you can attach a debugger. This should not be a production
machine, and, ideally, you should be its only user. You may find that starting the server as a console
application is preferable to starting it as a service because doing so prevents SQL Server Agent (which
depends on the service) from running.
2. Start WinDbg and make sure that your symbol path is set correctly, as outlined in Chapter 2.
3. Attach to your SQL Server using the debugger (press F6). Find sqlservr.exe in WinDbg's list of running
processes and double-click it. If you just started the SQL Server instance, sqlservr.exe should be near the
bottom of the list.
4. When the Disassembly window opens, close it. We won't need it for this exercise. If it reopens at any
time during the exercise, feel free to close it then as well.
5. Type g in the command prompt window and hit Enter. This will cause the SQL Server process to continue
running.
6. Open a Query Analyzer connection to the server and run the following command:
7. Return to the debugger now, and hit Ctrl+Break to stop the SQL Server process. Type the following
command to see what each thread is doing:
~*kv
This will list off the call stack for each thread. We'll be able to tell what each thread is up to during our
WAITFOR call by examining these stacks.
8. If you look closely at each call stack (and provided that you are the server's lone user), you should see
that nearly all of the threads are performing identical work except one. Skip past the first few threads, as
these are system threads that don't correspond to user connections. Scan down through the list of call
stacks, and you should find one thread with a list of function calls that differs markedly from the others.
Its stack features a call to a function named language_exec. See if you can find it. Note that the topmost
function on this call stack is not Sleep or SleepEx. If this is the thread servicing our WAITFOR DELAY
command, it obviously isn't using the Sleep or SleepEx functions to do it. Under my instance of WinDbg,
the stack that contains the call to language_exec (set in boldface type below) looks like this:
The topmost function on this stack is a call to the NtWaitForSingleObject function, the native API function
that calls into the Windows kernel to actually carry out the wait requested by WaitForSingleObject. So, as
I've said, if this thread is executing our WAITFOR DELAY call (we'll answer that question definitively in a
moment), we can say with certainty that WAITFOR DELAY does not result in a call to the Win32 Sleep
function. (If you think about it, this makes sense without even knowing anything about UMS: WAITFOR
also supports waiting until an absolute date/time and supports being canceled, which Sleep obviously
could not service.)
9. We might infer from language_exec's name that it's what gets executed when we submit a T-SQL
language batch to the server. To verify that, let's set a breakpoint on it and see what happens when we
run our command again. Type this command into the command window:
bp language_exec
You can type bl in the command window to verify that your breakpoint is set. You should see output like
this from bl:
It's important that the second column of the output is set to e, indicating that the breakpoint is enabled.
10. Type g in the command prompt window and hit Enter. This will cause the SQL Server process to continue
running.
11. Return to Query Analyzer and click the stop button to cancel your query, then run it again.
12. Return to WinDbg, and you should see the debugger stopped at your breakpoint. Your output should look
something like this:
Breakpoint 0 hit
eax=00000000 ebx=0097fb00 ecx=0000003c edx=00000000
esi=42d9a090 edi=00000001
eip=004597ef esp=260afa68 ebp=260afefc iopl=0
nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000
efl=00000246
sqlservr!language_exec:
004597ef b8ff619300 mov eax,0x9361ff
This tells us that we hit the breakpoint we set earlier, but, beyond the inference we're drawing from its
name, how do we know that language_exec is the function called when a T-SQL language event is
received by the server? Let's test sending an RPC event to the server to see whether the language_exec
breakpoint is tripped in that situation.
13. Type bd 0 in the command window and press Enter. This will disable the breakpoint we set up earlier. We
need to disable this breakpoint for now so that we can create a procedure for use in our RPC event.
14. Type g in the command prompt window and hit Enter. This will cause the SQL Server process to continue
running.
15. Return to Query Analyzer and stop your query if it is still running. Open a new Query Analyzer window
and create a new procedure in the pubs database using this command:
USE pubs
GO
CREATE PROC waiter as WAITFOR DELAY '00:00:30'
16. Now open a new Query Analyzer window and type the following into it:
{CALL waiter}
This syntax will submit a call to the waiter stored procedure as an RPC event. Do not run it yet.
17. Return to WinDbg and press Ctrl+Break to stop the SQL Server process.
18. Now reenable your breakpoint by typing this command in the command window and pressing Enter:
be 0
19. Type g in the command prompt window and hit Enter. This will cause the SQL Server process to continue
running.
20. Return to Query Analyzer and run the waiter procedure via the RPC syntax you typed in earlier.
21. Switch back to WinDbg and see whether your breakpoint was hit. It should not have been. This gives us
pretty conclusive evidence that language_exec is the internal function used for T-SQL language events in
SQL Server. Feel free to try some other queries to see whether they trip the breakpoint you set up on
language_exec. Unless you submit the query using the RPC syntax outlined above, each batch you
submit to the server should trip the breakpoint at least once. Those with GOs embedded will trip it
multiple times as these are submitted separately to the server by Query Analyzer.
22. You may be wondering what internal function is called due to an RPC event. Let's find out. Hit Ctrl+Break
to stop the SQL Server process, then type ~*kv to list the call stacks of the process's threads.
23. As before, the call stacks of most of the nonsystem threads are virtually identical except for one that
features a call to a routine named execute_rpc. Let's set a breakpoint on execute_rpc to see whether it
gets tripped when we submit our RPC event. Type the following into the WinDbg command window:
bp execute_rpc
24. Type g in the command prompt window and hit Enter. This will cause the SQL Server process to continue
running.
25. Return to Query Analyzer and stop your procedure call if it is still running, then run it again.
26. Return to WinDbg and you should see that your breakpoint has been hit. Your output should look
something like this:
Breakpoint 1 hit
eax=00000000 ebx=0097fb00 ecx=00000018 edx=42dd30a8
esi=42dd6090 edi=00000003
eip=0043b7d2 esp=2619fa68 ebp=2619fefc iopl=0
nv up ei pl zr na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000
efl=00000246
sqlservr!execute_rpc:
0043b7d2 55 push ebp
So, we can reasonably conclude that language_exec is called when a language event comes into the
server, and execute_rpc is called when an RPC event is received. Feel free to experiment with other
language and RPC events to see how this works.
27. Type q to quit the Debugger. You will then need to restart your SQL Server instance with SQL Server
Service Manager.
Exercise 3.8 Viewing Thread Priorities, Affinities, and Other Useful Information
In this last exercise, you'll run an extended procedure to iterate through SQL Server's currently active worker
threads and list important information for each one. Follow these steps.
1. Copy the file xp_sysinfo.dll from the CH03\xp_sysinfo\release subfolder on the CD accompanying this
book to the binn folder under your SQL Server installation path. (This DLL may already be installed from
a previous exercise in this chapter.)
2. Register the extended procedure with this command (run it from the master database):
sp_addextendedproc 'xp_threadlist','xp_sysinfo.dll'
3. Open the xp_threadlist.sql T-SQL script from CH03\xp_sysinfo subfolder in Query Analyzer. Do not run it
yet.
4. Create a scratch folder on your hard drive and copy to it the files STRESS.CMD and STRESS.SQL from the
CH03 subfolder on the CD.
5. Run STRESS.CMD from your scratch folder, supplying your server name on its command line.
6. You should see 15 windows open on your desktop, all of them using osql.exe to run a select against a
table in pubs, then issuing a WAITFOR DELAY command.
7. While the queries run, switch back to Query Analyzer and run the script you loaded previously. You
should see output like this:
8. On a multiprocessor box, you can experiment with SQL Server's processor affinity to cause different
values to show up in the AffinityMask column. In the above example, each thread has an affinity to
processors 1 and 2 in the box, resulting in a bitmask of 3.
You can also use linked server queries to set up impersonation for a particular worker thread. The
threads in the example output above have all inherited SQL Server's security context and do not have
access tokens of their own.
Note that you may see different values for the BasePriority column depending on when you run the xproc
and what the system is doing. I've personally seen a fair amount of fluctuation across the worker
threads.
Note also that the threads listed are the currently active worker threads; inactive or idle threads are not
listed. If you want to see all the threads currently instantiated within the SQL Server process, use a tool
like Perfmon or Pview.
9. The 15 windows should close on their own and your system should now be back to normal.
Windows implements a preemptive, priority-based scheduler for scheduling and running application code via
threads. The system is designed to keep a single application from taking over the system and to provide for
even performance across the system.
There are 32 priority levels at which a thread can run. Threads running at higher priorities preempt those
running at lower priorities.
Each thread gets a schedule time slice called a quantum in which to run. The exact length of the quantum
varies between versions of Windows and between multiprocessor and uniprocessor machines. A thread is not
guaranteed a full quantum because it may be preempted by a higher-priority thread.
SQL Server does not use the Windows scheduler and scheduling APIs in the same way that most multithreaded
applications do. This is because it handles most of its scheduling needs using its UMS facility.
1. What is a quantum?
4. True or false: One of the eight thread states (numbered 0 through 7) is Suspended, indicating that the
thread has been suspended through a call to SuspendThread.
5. True or false: Task Manager cannot display the base priority for individual threads in a process.
9. Name the term for what happens when a lower-priority thread is continuously preempted by higher-
priority threads and not allowed to run.
10. What does Windows do when it sees that a thread has not been allowed to run for 3 to 4 seconds? What
happens after the thread gets to run?
11. True or false: Windows implements a cooperative multitasking system wherein processes must be careful
to yield to one another in order to keep the system running smoothly.
12. What Win32 API is used to set the processor affinity for an individual thread?
14. For each clock interrupt, how many quantum units are deducted from a thread's quantum?
15. True or false: Windows automatically takes into account the fact that a thread has just completed its
quantum when it decides which thread to schedule next and will automatically allow lower-priority
threads to run before allowing the high-priority thread to run again.
16. Name the term that describes associating a thread or process with a given set of CPUs.
17. True or false: When a thread with a priority lower than 14 successfully waits on a kernel object using
WaitForSingleObject, the system automatically deducts 1 unit from its quantum.
18. True or false: The function responsible for performing the lion's share of scheduling within the Windows
kernel is named ScheduleThread.
19. When you issue a Sleep(0) call, are lower-priority threads allowed to run?
20. True or false: Windows keeps a separate ready list for each thread priority and maintains a bitmap to
make accessing that list faster.
21. What happens to the quantum of a thread with a priority of 14 or higher that has just successfully waited
on a kernel object via WaitForSingleObject?
22. What internal function within SQL Server is responsible for processing language events? How about RPC
events?
23. True or false: SQL Server does not use the Win32 Sleep or SleepEx API functions in order to service the
Transact-SQL WAITFOR DELAY command.
24. True or false: Running SQL Server with a Real-Time process priority will speed up CPU-intensive queries
on an otherwise idle system.
25. True or false: You can change the base priority for a process using the Windows Task Manager.
26. True or false: The Spy++ utility included with the Platform SDK and recent versions of Visual Studio can
display the number of context switches for a given thread.
27. What is the default quantum length on Windows 2000 Professional and Windows XP?
30. True or false: If a thread's ideal CPU is not in its affinity mask, it can still be scheduled on that CPU
because Windows will ignore the mask when the two conflict.
Thread Synchronization
When you get beyond the simplicity of single-threaded applications and begin to
explore the world of multithreaded programming, one of the first things you discover
is the need to synchronize the activities of the threads in your application. Allowing
one thread to modify a global variable while another is using it to control flow logic is
a recipe for erratic application behavior and possibly even an access violation. The
fact that multiple threads can truly execute simultaneously on multiprocessor
machines does not mean that you literally want all of them running all of the time.
There are times when you need one thread to wait on others to finish what they're
doing before proceeding. There are times when you need to synchronize them.
Deadlock: what happens when two or more threads wait indefinitely on resources owned by each other.
Wait function: one of the Win32 functions designed to put a thread to sleep until a resource becomes available or a timeout period expires.
Signaled: the state of an object when it is available for use by a thread or when a Win32 wait function should not wait on it.
Thread-safe code: code that has been designed such that multiple threads accessing the same resources do so in a safe and predictable manner.
Atomic access: ensuring that a thread retrieves, modifies, and returns a value or resource as a single operation without having to be concerned about another thread modifying the value simultaneously.
EnterCriticalSection: Denotes a section of code that only one thread can access at a time
Given that most synchronization objects are kernel mode objects, there's a limit to
how much information a user mode tool can give us about thread synchronization. A
kernel mode debugger is unfortunately the best option here. That said, the tools in
Table 3.15 provide several useful pieces of synchronization-related data.
Table 3.15 maps each of these tools (Perfmon, Pview, and Spy++) to the synchronization-related data it can display: handle count, context switches, kernel object count by type, thread priority, thread state, CPU times, and thread security context.
Simply put, a spinlock is a loop that iterates until a resource becomes available.
Listing 3.3 shows a simple example in C++.
void CSpinLock::GetLock() {
    // Loop until the previous value of g_bLocked was FALSE (the lock was free);
    // yield the rest of the time slice to another thread on each failed attempt
    while (TRUE == InterlockedExchange(&g_bLocked, TRUE))
        SwitchToThread();
}
InterlockedExchange, which we'll talk about more in a moment, assigns the value of
its second parameter to its first parameter in an atomic fashion and returns the
original value of the first parameter. The spinlock code above simply loops while the
original value of g_bLocked is TRUE (in other words, while the global resource in question is locked). It continuously assigns TRUE to g_bLocked until InterlockedExchange no longer returns TRUE. When InterlockedExchange returns FALSE, meaning that the resource was not locked when the function was called, the loop exits. Since InterlockedExchange has already set g_bLocked to TRUE, other threads calling the GetLock method will stop at the while loop until g_bLocked is set to FALSE by the thread that just acquired the spinlock.
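The listing shows only the acquisition side. A minimal release method might look like the following sketch; the FreeLock name is an assumption, not something taken from the listing.

void CSpinLock::FreeLock() {
    // Atomically reset the flag that waiting threads poll in GetLock
    InterlockedExchange(&g_bLocked, FALSE);
}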
As you can see, a spinlock isn't a separate type of user mode object; rather, it is
implemented by using user mode objects and code (in this case, a global variable and one of the interlocked functions). Though it is often wrapped in a class of some
type in object-oriented languages such as C++, as far as Windows is concerned, a
spinlock is really more of a programming construct than a distinct type of
synchronization object.
Even though the above example tries to be as CPU efficient as possible by calling
SwitchToThread in its spin loop, it's still using some amount of CPU time while it
waits on the resource. This is, unfortunately, unavoidable without the assistance of a
scheduling facility (such as the one provided by Windows) that can maintain lists of
waiting threads and the resources they need independent of the threads
themselves, basically putting them to sleep until the resources they need become
available.
This is the main reason that kernel mode objects such as mutexes and semaphores
have a distinct advantage over spinlocks in terms of CPU usage. Because Windows
can act on behalf of a waiting thread and allow the thread to consume no resources
while the resources it needs are unavailable, waiting on a kernel object is often more
CPU efficient than using a spinlock to wait on a resource even though a thread must
transition from user mode to kernel mode in order to wait on a kernel object.
Because they are not coordinated by the operating system, spinlocks like the one
above must make a few assumptions.
1. Spinlocks assume that all threads run at the same thread priority level. (You
might want to disable thread priority boosting for this very reason.)
2. Spinlocks assume that the lock variable and the data to which the lock
provides access are maintained in different CPU cache lines. If they are on the
same cache line, you'll see contention between the processor using the
resource and any CPUs waiting on it. In other words, the continual assignment
of the control variable by the other CPUs will contend with the code accessing
the protected resource.
Generally speaking, you should avoid techniques and design elements that
continuously poll for resource availability. Windows provides a rich set of tools for
waiting on resources with minimal CPU usage. It makes sense to use what you get
for free in the OS box.
Often you'll find that a hybrid approach is the best fit for a particular scenario: use
spinlocks and critical sections to protect some types of resources; use kernel objects
for others. Or, use a spinlock to wait a fixed number of iterations, then transition to a
kernel object if it appears that the thread might have to wait for an extended period
of time. This is, in fact, how critical sections themselves are implemented. A critical
section starts off using a spinlock that iterates a specified number of times, then
transitions to kernel mode where it waits on the resource in question.
You'll note that there's no interlocked function for reading a value. That's because
none is necessary. If a thread attempts to read a value that is always modified using
an interlocked function, it can depend on getting a good value. That is, it can
assume that it will see either the value before it was changed or the value
afterward; the system guarantees that it will be one or the other.
Critical Sections
A critical section is a user mode object you can use to synchronize threads and
serialize access to shared resources. A critical section denotes a piece of code that
you want executed by only one thread at a time. The process of using a critical
section goes something like this.
1. Before any thread uses it, initialize a CRITICAL_SECTION structure by calling InitializeCriticalSection (or InitializeCriticalSectionAndSpinCount).
2. On entrance to a routine that you want only one thread to execute at a time, call EnterCriticalSection, passing in the previously initialized critical section structure. Once you've done this, any other thread that attempts to enter this routine will be put to sleep until the critical section is exited.
3. On exit from the routine, call LeaveCriticalSection so that the next waiting thread, if any, can enter.
4. On program shutdown (or some other similar termination event that occurs after the critical section is no longer needed), call DeleteCriticalSection to free up the system resources used by the object.
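Here's a minimal sketch of that sequence protecting a shared counter. The names are placeholders and error handling is omitted; it simply illustrates the four steps above.

#include <windows.h>

CRITICAL_SECTION g_cs;      // one critical section per shared resource
long g_nCount = 0;          // the shared resource

void Startup()   { InitializeCriticalSection(&g_cs); }   // step 1
void Increment()
{
    EnterCriticalSection(&g_cs);                         // step 2
    g_nCount++;               // only one thread at a time executes this
    LeaveCriticalSection(&g_cs);                         // step 3
}
void Shutdown()  { DeleteCriticalSection(&g_cs); }       // step 4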
Because you can't specify the amount of time to wait before giving up on a critical
section, it's entirely possible for a thread to wait indefinitely for a resource. This
happens, for example, when a critical section has been orphaned due to an
exception. The waiting threads have no way of knowing that the thread that
previously acquired the critical section never released it, so they wait indefinitely on
a resource that will never be available.
When a thread calls EnterCriticalSection for a critical section object that is already
owned by another thread, Windows puts the thread in a wait state. This means that
it must transition from user mode to kernel mode, costing approximately 1,000 CPU
cycles. This transition is usually cheaper than using a spinlock of some type to
continuously poll a resource to see whether it is available.
You can integrate the concept of a spinlock with a critical section by using the
InitializeCriticalSectionAndSpinCount function. This function allows you to specify a
spin count for entrance into the critical section. If the critical section is not available,
the function will spin for the specified number of iterations before going into a wait
state. For short-duration waits, this may save you the expense of transitioning to
kernel mode unnecessarily.
Note that it makes sense to specify a spin count only on a multiprocessor machine.
The thread owning the critical section can't relinquish it if another thread is spinning,
so InitializeCriticalSectionAndSpinCount ignores a nonzero spin count specification
on uniprocessor machines and immediately enters a wait state if the critical section
is not available.
You can set the spin count for a specific critical section using the Win32 API function
SetCriticalSectionSpinCount. The optimal value will vary from situation to situation,
but the fact that the critical section that's used to synchronize access to a process's
heap has a spin count of 4000 can serve as a guide to you.
As a rule, use one critical section variable per shared resource. Don't try to conserve
system resources by sharing critical sections across different resources. Critical
sections don't consume that much memory in the first place, and attempting to
share them unnecessarily can introduce complexities and deadlock potential into
your code that don't need to be there.
As I've mentioned, while your thread is waiting on a resource, the system acts as an
agent on its behalf. The thread itself consumes no CPU resources while it waits. The
system puts the thread in a wait state and automatically awakens it when the
resource(s) it has been waiting for becomes available.
If you check the states of the threads across all processes on the system, you'll
discover that most are in a wait state of some type most of the time. It is normal for
most of the threads in a process to spend most of their time waiting on some event
to occur (e.g., keyboard or mouse input). This is why it's particularly important for
the operating system to provide mechanisms for waiting on resources that are as
CPU efficient as possible.
Thread Deadlocks
A thread deadlock occurs when two threads each wait on resources the other has
locked. If each is set up to wait indefinitely, the threads are for all intents and
purposes dead and will never be scheduled again. Unlike SQL Server, Windows does
not automatically detect deadlocks. Once threads are caught in a deadly embrace,
the only resolution is to terminate them. Listing 3.4 presents some C++ code that
illustrates a common deadlock scenario.
// deadlock.cpp
//
#include "stdafx.h"
#include "windows.h"

CRITICAL_SECTION g_CS1;
CRITICAL_SECTION g_CS2;

int g_HiTemps[100];
int g_LoTemps[100];

// Enters the critical sections in numerical order: g_CS1, then g_CS2.
// (The thread function bodies are reconstructed from the surrounding
// description; the array-filling loops are merely illustrative.)
DWORD WINAPI ThreadFunc1(LPVOID lpParam)
{
    EnterCriticalSection(&g_CS1);
    Sleep(1000);                  // let ThreadFunc2 start up and take g_CS2
    EnterCriticalSection(&g_CS2); // blocks forever: ThreadFunc2 owns g_CS2
    for (int i=0; i<100; i++) g_HiTemps[i]=i;
    printf("Exiting ThreadFunc1\n");
    LeaveCriticalSection(&g_CS1);
    LeaveCriticalSection(&g_CS2);
    return(0);
}

// Enters the critical sections in the opposite order: g_CS2, then g_CS1.
DWORD WINAPI ThreadFunc2(LPVOID lpParam)
{
    EnterCriticalSection(&g_CS2);
    EnterCriticalSection(&g_CS1); // blocks forever: ThreadFunc1 owns g_CS1
    for (int i=0; i<100; i++) g_LoTemps[i]=i;
    printf("Exiting ThreadFunc2\n");
    LeaveCriticalSection(&g_CS2);
    LeaveCriticalSection(&g_CS1);
    return(0);
}

int main(int argc, char* argv[])
{
    HANDLE hThreads[2];
    DWORD dwThreadId;
    InitializeCriticalSection(&g_CS1);
    InitializeCriticalSection(&g_CS2);
    hThreads[0]=CreateThread(NULL,0,ThreadFunc1,NULL,0,&dwThreadId);
    hThreads[1]=CreateThread(NULL,0,ThreadFunc2,NULL,0,&dwThreadId);
    WaitForMultipleObjects(2,hThreads,TRUE,INFINITE);
    DeleteCriticalSection(&g_CS1);
    DeleteCriticalSection(&g_CS2);
    return 0;
}
The problem with this code is that the two worker threads access resources in
different orders: ThreadFunc1 enters the critical sections in numerical order;
ThreadFunc2 does not. I placed the call to Sleep after the first critical section is
entered in order to allow the second thread function to start up and allocate the
second critical section before Thread 1 can enter it. Once Sleep expires, Thread 1
then tries to enter the second critical section but is blocked by Thread 2. Thread 2,
on the other hand, is waiting on critical section 1, which Thread 1 already owns. So,
each thread waits indefinitely on resources the other has locked, constituting a
classic deadlock scenario.
The moral of the story is this: Always request resources in a consistent order in
multithreaded applications. This is a good design practice regardless of whether you
are working with user mode or kernel objects.
As I've mentioned, synchronizing threads via user mode objects is faster than
synchronizing them using kernel mode objects, but there are trade-offs. User mode
objects can't be used to synchronize multiple processes, nor can you use the Win32
wait functions to wait on them. Because you can't specify a timeout value when
waiting on user mode objects such as critical sections, it's easier to block other
threads and to get into deadlock situations. On the other hand, each transition to
kernel mode costs you about a thousand x86 clock cycles (and this doesn't include
the actual execution of the kernel mode code that implements the function you're
calling), so there are definitely performance considerations when trying to decide
whether to perform thread synchronization using kernel objects. As with many
things, the key here is to use the right tool for the job.
Signaling
Kernel mode objects can typically be in one of two states: signaled or unsignaled.
You can think of this as a flag that gets raised when the object is signaled and
lowered when it isn't. Signaling an object allows you to notify other threads (in the
current process or outside it) that you are ready for them to do something or that
you have finished a task. For example, you might have a background thread signal
an object when it has finished making a backup copy of the file being currently
edited in your custom programmer's editor. Your background thread saves the file,
then "raises a flag" (it signals an object) to let your foreground thread know that it's
done.
Kernel mode objects such as events, mutexes, waitable timers, and semaphores
exist to be used for thread synchronization and resource protection through
signaling. In themselves, they do nothing. They exist solely to assist with resource
protection and management, especially in multithreaded applications, by being
signalable in different ways. Processes, threads, jobs, events, semaphores, mutexes,
waitable timers, files, console input, and file change notifications can all be signaled
or unsignaled.
Some kernel objects, such as event objects, can be reset to an unsignaled state after
having been signaled, but some can't. For example, neither a process nor a thread
object can be unsignaled once it has been signaled. This is because for either of
these objects to be signaled, they have to terminate. You cannot resume a
terminated process or thread.
Wait Functions
I've mentioned the Win32 wait functions in some of the examples we've looked at
thus far, and now we'll delve into them a bit. The wait functions allow a thread to
suspend itself until another object (or objects) becomes signaled. The most popular
Win32 wait function is WaitForSingleObject. It takes two parameters: the handle of
the object on which to wait, and the number of milliseconds to wait. You can pass in
the INFINITE constant (0xFFFFFFFF, or -1) to wait indefinitely.
As the name suggests, WaitForMultipleObjects can wait for multiple objects (up to
64) to be signaled. Rather than passing in a single handle for its first parameter, you
pass in an array containing the handles of the objects to wait for.
WaitForMultipleObjects can wait for all objects to be signaled or just one of them.
When a single object causes WaitForMultipleObjects to return, its return value
indicates which object was signaled. This value will be somewhere between
WAIT_OBJECT_0 and WAIT_OBJECT_0 + NumberOfHandles - 1. If you wish to call the
function again with the same handle array, you'll need to first remove the signaled
object or the function will return immediately without waiting.
If a wait function times out while waiting on an object, it will return WAIT_TIMEOUT.
You can use this ability to implement a type of spinlock that waits for a short period
of time, times out, carries out some work, then waits again on the desired resource.
This keeps the thread from being completely unschedulable while it waits on a
required resource and gives you a finer granularity of control over the blocking
behavior of your threads.
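A sketch of that pattern follows. The hResource handle and the DoBackgroundWork routine are placeholders of my own, and the 100-millisecond timeout is arbitrary.

void DoBackgroundWork();    // placeholder for whatever work fills the gaps

void WaitWithWork(HANDLE hResource)
{
    for (;;)
    {
        DWORD dw = WaitForSingleObject(hResource, 100);  // wait up to 100ms
        if (WAIT_OBJECT_0 == dw)
            break;                    // the resource is now ours
        if (WAIT_TIMEOUT == dw)
            DoBackgroundWork();       // do something useful, then wait again
    }
}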
There are several other wait functions (e.g., MsgWaitForSingleObject,
MsgWaitForMultipleObjects, MsgWaitForMultipleObjectsEx, WaitForMultipleObjectsEx,
WaitForSingleObjectEx, SignalObjectAndWait, and so on) that I won't go into here.
You can consult the Windows Platform SDK reference for details about these
functions. They are mostly variations of either WaitForSingleObject or
WaitForMultipleObjects.
Events
Event objects are exactly what they sound like: objects that allow threads to signal
to one another that something has occurred. They are commonly used to perform
work in steps. One thread performs the first step or two of a task and then signals
another thread via an event to carry out the remainder of the task.
Events come in two varieties: manual-reset events and auto-reset events. You call
SetEvent to signal an event and ResetEvent to unsignal it. Auto-reset events are
automatically unsignaled as soon as a single thread successfully waits on them; a
manual-reset event must be reset through a call to ResetEvent. When multiple
threads are waiting on an auto-reset event that gets signaled, only one of the
threads gets scheduled. When multiple threads are waiting on a manual-reset event
that gets signaled, all waiters become schedulable.
You call the CreateEvent API function to create a new event. Other threads can
access the event by calling CreateEvent, DuplicateHandle, or OpenEvent.
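As a quick sketch of the two-step pattern described above (the names are placeholders and error handling is omitted):

#include <windows.h>

HANDLE g_hStageDone;                              // auto-reset event

DWORD WINAPI Worker(LPVOID)
{
    // ...perform the first step or two of the task...
    SetEvent(g_hStageDone);                       // signal that this stage is done
    return 0;
}

int main()
{
    g_hStageDone = CreateEvent(NULL, FALSE, FALSE, NULL);  // auto-reset, unsignaled
    DWORD dwId;
    HANDLE hThread = CreateThread(NULL, 0, Worker, NULL, 0, &dwId);
    WaitForSingleObject(g_hStageDone, INFINITE);  // wait for the worker's signal
    // ...carry out the remainder of the task...
    WaitForSingleObject(hThread, INFINITE);
    CloseHandle(hThread);
    CloseHandle(g_hStageDone);
    return 0;
}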
Waitable Timers
Waitable timers are objects that signal themselves at a specified time or at regular
intervals. You create a waitable timer with CreateWaitableTimer and configure it with
SetWaitableTimer. You can pass in an absolute or relative time (pass a negative
DueTime parameter to indicate a relative time in the future) or an interval at which
the timer is supposed to signal. Once the interval or time passes, the timer signals
and optionally queues an APC routine. If you created the timer as a manual-reset
timer, it remains signaled until you call SetWaitableTimer again. If you created it as
an auto-reset timer, it resets as soon as a thread successfully waits on it.
As the name suggests, you can cancel a waitable timer using the
CancelWaitableTimer function. Once a manual-reset timer is signaled, there's no
need to cancel it; you can simply close its handle.
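A minimal sketch follows; the five-second due time and one-second period are arbitrary choices, not values from the text.

#include <windows.h>

int main()
{
    HANDLE hTimer = CreateWaitableTimer(NULL, FALSE, NULL);     // auto-reset timer
    LARGE_INTEGER liDue;
    liDue.QuadPart = -50000000LL;    // negative = relative time; 5 seconds in 100ns units
    SetWaitableTimer(hTimer, &liDue, 1000, NULL, NULL, FALSE);  // then once per second
    for (int i = 0; i < 3; i++)
        WaitForSingleObject(hTimer, INFINITE);                  // wakes on each signal
    CancelWaitableTimer(hTimer);
    CloseHandle(hTimer);
    return 0;
}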
As I mentioned, a waitable timer can optionally queue an APC routine. You won't
likely use this facility much because you can always just wait on the timer to be
signaled, then execute whatever code you want. In the event that you do decide to
use an APC routine, keep in mind that it's pointless for a thread to wait on a timer's
handle and wait on the timer alertably at the same time. Once the timer becomes
signaled, the thread wakes (which takes it out of the alertable state), causing the
APC routine not to be called.
If you've built many apps for Windows, you're probably familiar with Windows' user
timer object. This is a different beast than a kernel mode waitable timer. The biggest
difference between them is that user timers require a user interface harness in your
app, making them relatively resource consumptive. Also, as with the other kernel
mode objects we've been discussing, waitable timers can be shared by multiple
threads and can be secured.
Windows' user timer object generates WM_TIMER messages that come back either to
the thread that called SetTimer or to the thread that created the window. This means
that only one thread is notified when the timer goes off. Conversely, a waitable timer
can signal multiple threads at once, even across processes.
The decision of whether to use a waitable timer object or a user timer should come
down to whether you're doing very much user interface manipulation in response to
the timer. If so, a user timer is probably a better choice since you will have to wait on
both the object and window messages that might occur if you use a waitable timer. If
you end up using a waitable timer in a GUI app and need the thread that's waiting
on the timer to respond to messages while it waits, you can use
MsgWaitForMultipleObjects to wait on the object and messages simultaneously.
Semaphores
Typically, a kernel semaphore object is used to limit the number of threads that may
access a resource at once. While a mutex, by definition, allows just one thread at a
time to access a protected resource, a semaphore can be used to allow multiple
threads to access a resource simultaneously and to set the maximum number of
simultaneous accessors. You specify the maximum number of simultaneous
accessors when you create the semaphore, and Windows ensures that this limit is
enforced.
When you first create a semaphore object, you specify not only the maximum value
for the semaphore but also its starting value. As long as its value remains greater
than 0, the semaphore is signaled, and any thread that attempts to wait on it will
return immediately, decrementing the semaphore as it returns. Say, for example,
that you want a maximum of five threads (out of a pool of ten) to access a particular
resource simultaneously. You would create the semaphore with a maximum value of
5, then as each thread needed access to the resource, it would call one of the wait
functions to wait on the semaphore. The first five would return immediately from the
wait function, and each successful wait would decrement the semaphore's value by
1. When the sixth thread began waiting on the semaphore, it would be blocked until
one of the first five released the semaphore. If, say, Thread 5 then called
ReleaseSemaphore, Thread 6 would return immediately from its wait state. All the
while, the number of threads with simultaneous access to the resource would never
exceed five.
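In code, the five-at-a-time scenario might look something like this sketch (names are placeholders; the worker threads themselves aren't shown):

#include <windows.h>

HANDLE g_hSem;

void UseResource()
{
    WaitForSingleObject(g_hSem, INFINITE);  // blocks once five threads are inside
    // ...access the shared resource...
    ReleaseSemaphore(g_hSem, 1, NULL);      // give the slot back
}

int main()
{
    g_hSem = CreateSemaphore(NULL, 5, 5, NULL);  // initial count 5, maximum 5
    // ...create the pool of ten worker threads, each of which calls UseResource...
    CloseHandle(g_hSem);
    return 0;
}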
Mutexes
Mutexes are among the most useful and widely used of the Windows kernel objects.
They have many practical uses�from serializing access to critical resources to
making code thread-safe�and they are often found in abundance in sophisticated
multithreaded Windows apps.
Mutexes are very similar to critical sections except, of course, that they're kernel
objects. This means that accessing them can be slower, but they are generally more
functional than critical sections because they can be shared across processes and
because you can use the wait functions to wait on them with a specified timeout.
Mutexes are unusual in that they are the only kernel object that supports the notion
of thread ownership. Each mutex kernel object has a field that contains the thread ID
of the owning thread. If the thread ID is 0, the mutex is not owned and is signaled. If
the thread ID is other than 0, a thread owns the mutex, and the mutex is unsignaled.
Unlike all other kernel objects, a thread that owns a mutex can wait on it multiple
times without releasing it and without blocking. Each time a thread successfully waits
on a mutex, it becomes its owner (its thread ID is stored in the mutex's kernel
object), and the object's recursion counter is incremented. This means that you must
release the mutex (using ReleaseMutex) the same number of times that you waited
on it in order for it to become signaled, thus allowing other threads to take
ownership of it.
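Here's a short sketch of that recursive-acquisition behavior; the handle name is a placeholder.

#include <windows.h>

int main()
{
    HANDLE hMutex = CreateMutex(NULL, FALSE, NULL);
    WaitForSingleObject(hMutex, INFINITE);  // this thread becomes the owner
    WaitForSingleObject(hMutex, INFINITE);  // the owner waits again; returns immediately
    // ...work with the protected resource...
    ReleaseMutex(hMutex);                   // one release per successful wait...
    ReleaseMutex(hMutex);                   // ...and now the mutex is signaled again
    CloseHandle(hMutex);
    return 0;
}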
Windows also makes sure that a mutex isn't abandoned by a terminated thread,
potentially blocking other threads infinitely. If a thread that owns a mutex terminates
without releasing it, Windows automatically resets the mutex object's thread ID and
recursion counter to 0 (which signals the object). If another thread is waiting on the
mutex, the system gives it ownership of the mutex by setting the mutex's thread ID
to reference the waiting thread and sets its recursion counter to 0. The wait function
itself will return WAIT_ABANDONED so that the waiter can determine how it came to
own the mutex.
Note that you can use I/O completion ports to synchronize operations besides those
involving asynchronous I/O. Using the PostQueuedCompletionStatus API function in
tandem with GetQueuedCompletionStatus, you can create a multithread signaling
mechanism that is more scalable than the SetEvent/WaitForMultipleObjects
approach. This is due to the fact that WaitForMultipleObjects is limited to waiting on
a maximum of 64 worker threads. Using an I/O completion port, you can create a
synchronization system that can wait on as many threads as the process can create.
Here's an example of how you could implement a mechanism that could wait on
more than 64 threads simultaneously.
1. The main thread of your process creates an I/O completion port that is not associated with a particular file or files by passing INVALID_HANDLE_VALUE for CreateIoCompletionPort's FileHandle parameter (and NULL for its ExistingCompletionPort parameter).
2. The main thread creates as many threads as your process requires; let's say it
starts with a pool of 100 worker threads.
3. Once all the threads are created, the main thread calls
GetQueuedCompletionStatus to wait on the I/O completion port.
4. Whenever a worker thread has finished its work and wants to signal the main
thread, it calls PostQueuedCompletionStatus to post an I/O completion packet
to the port. For example, it might do this before returning from its entry-point
function or before going to sleep while it waits on, say, a global event
associated with the main thread. In order to let the main thread know which
thread completed its work, the worker thread could pass its thread ID into
PostQueuedCompletionStatus.
5. When a completion packet arrives, the main thread returns from GetQueuedCompletionStatus and can use the completion key (the worker's thread ID) to note which worker has finished.
6. Because the main thread knows how many threads it created, it calls GetQueuedCompletionStatus again, looping repeatedly until all the worker threads have indicated that they have completed their work.
Because a thread need not terminate to be signaled and because the main thread
(or any thread) can wait on as many other threads as it wants, an I/O completion
port provides a nicely scalable alternative to waiting on multiple threads using
WaitForMultipleObjects or a similar API call.
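Here's a compressed sketch of the steps above. The worker count and names are placeholders, and error handling is omitted.

#include <windows.h>
#include <stdio.h>

#define WORKERS 100

HANDLE g_hPort;

DWORD WINAPI Worker(LPVOID)
{
    // ...do the thread's work...
    // Post a completion packet carrying this thread's ID as the completion key
    PostQueuedCompletionStatus(g_hPort, 0, (ULONG_PTR)GetCurrentThreadId(), NULL);
    return 0;
}

int main()
{
    // A completion port not associated with any file handle
    g_hPort = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    DWORD dwId;
    for (int i = 0; i < WORKERS; i++)
        CloseHandle(CreateThread(NULL, 0, Worker, NULL, 0, &dwId));
    for (int i = 0; i < WORKERS; i++)
    {
        DWORD dwBytes; ULONG_PTR key; LPOVERLAPPED pOvl;
        GetQueuedCompletionStatus(g_hPort, &dwBytes, &key, &pOvl, INFINITE);
        printf("Thread %u finished\n", (unsigned)key);
    }
    CloseHandle(g_hPort);
    return 0;
}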
Exercises
In this next exercise, you'll experiment with an application that is intentionally not
thread-safe. You'll get to see firsthand what happens when an application does not
synchronize access to shared resources. This should give you a greater appreciation
for the lengths SQL Server must go to in order to ensure that its worker threads can
access shared resources in a manner that is both safe and fast.
In the final exercise in this chapter, you'll learn to implement a spinlock based on a
kernel mutex object. SQL Server uses spinlocks to guard access to shared resources
within the server; understanding how they work will give you greater insight into
how SQL Server works.
The following C++ application demonstrates three methods for accessing a shared
resource (in this case, a global variable) from multiple threads. You can find it in
the CH03\thread_sync subfolder on the CD accompanying this book. Load the
application into the Visual C++ development environment and compile and run it in
order to work through the exercise.
The app creates 50 threads that check the value of a global variable and, if it's less
than 50, increment it. The end result of this should be a value of 50 in the global
variable once all the threads have terminated, but that's not always the case. Let's
look at the code (Listing 3.5).
Listing 3.5 Thread Synchronization Options
#include "stdafx.h"
#include "windows.h"
#define MAXTHREADS 50
//#define THREADSAFE
//#define CRITSEC
//#define MUTEX
long g_ifoo=0;
HANDLE g_hStartEvent;
#ifdef CRITSEC
CRITICAL_SECTION g_cs;
#endif
#ifdef MUTEX
HANDLE hMutex;
#endif
printf("g_ifoo=%d\n",g_ifoo);
#ifdef CRITSEC
DeleteCriticalSection(&g_cs);
#endif
#ifdef MUTEX
CloseHandle(hMutex);
#endif
return 0;
}
Three #define constants control how (and whether) the program synchronizes
access to the global variable. By default, THREADSAFE, CRITSEC, and MUTEX are
undefined, so access to the global variable is not synchronized. If you run the
program on a multiprocessor machine enough times, you will eventually see a
situation where g_ifoo does not end up with a value of 50. This is because access to
the variable was not synchronized, and there was an overlap between the time one
thread retrieved the value and another incremented it, as illustrated by the scenario
outlined in Table 3.17.
Because of the overlap, two threads set g_ifoo to the same value, causing g_ifoo to
end up with a value less than 50 because there are only 50 worker threads.
If you then uncomment the //#define THREADSAFE line and recompile, this overlap
should be impossible. This is because the code then uses InterlockedIncrement to
ensure atomicity of the increment operation. In the scenario shown in Table 3.17,
this means that steps 3, 5, and 7 are performed as a single operation, as are steps 4,
6, and 8. Since Thread 10 completes its increment operation before Thread 11 is
allowed to do so, Thread 11 sees 11, not 10, as the current value of g_ifoo when it
performs its increment.
You can take this a step further by commenting out the #define for THREADSAFE
and uncommenting CRITSEC. Access to the global variable is then synchronized with
a critical section.
Table 3.17 walks through such an overlap step by step (Step/Action): Threads 10 and 11 each read the same value of g_ifoo before either one writes back its increment.
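To make the three options just described concrete, here's a sketch of what the worker thread might look like. The ThreadFunc name, the use of g_hStartEvent as a start gate, and the exact branch structure are assumptions on my part rather than the listing's actual code.

DWORD WINAPI ThreadFunc(LPVOID lpParam)
{
    WaitForSingleObject(g_hStartEvent, INFINITE);   // release all workers together
#if defined(THREADSAFE)
    if (g_ifoo < MAXTHREADS)
        InterlockedIncrement(&g_ifoo);              // atomic read-modify-write
#elif defined(CRITSEC)
    EnterCriticalSection(&g_cs);
    if (g_ifoo < MAXTHREADS) g_ifoo++;              // serialized by the critical section
    LeaveCriticalSection(&g_cs);
#elif defined(MUTEX)
    WaitForSingleObject(hMutex, INFINITE);
    if (g_ifoo < MAXTHREADS) g_ifoo++;              // serialized by the mutex
    ReleaseMutex(hMutex);
#else
    if (g_ifoo < MAXTHREADS) g_ifoo++;              // unsynchronized: updates can be lost
#endif
    return 0;
}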
The example below demonstrates a kernel object-based spinlock. You can find it in
the CH03\kernel_spinlock subfolder on the CD accompanying this book. Load it into
the Visual C++ development environment, then compile and run it. Listing 3.6
shows the code.
#include "stdafx.h"
#include "windows.h"
#define MAXTHREADS 2
#define SPINWAIT 1000
HANDLE g_hWorkEvent;
class CSpinLock {
public:
static void GetLock(HANDLE hEvent);
};
g_hWorkEvent=CreateEvent(NULL,false,true,NULL);
return 0;
}
In this example, the spinlock is implemented in its own class and exposed via a
static method. (The static method allows you to avoid having to create an instance
of the CSpinLock class in order to use it.) You could pass any kernel object into the
spinlock; in this example we use an event object.
The main function creates two threads that acquire the spinlock, sleep for five
seconds, then release the spinlock by signaling the event. Since only one of the
threads can acquire the spinlock at a time, the second thread to start waits on the
first one to complete by spinning for as many one-second durations as it takes until
the spinlock is released (i.e., the event is signaled).
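Based on that description, the GetLock method presumably looks something like the following sketch; the per-iteration printf is just a placeholder for whatever the real code does while it spins.

void CSpinLock::GetLock(HANDLE hEvent)
{
    // Spin in SPINWAIT-millisecond waits until the kernel object is signaled
    while (WAIT_TIMEOUT == WaitForSingleObject(hEvent, SPINWAIT))
        printf("Spinning...\n");
}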
Compile and run the code, experimenting with different sleep times for the thread
function and different numbers of threads, and observe how the output changes from run to run.
SQL Server makes use of both types of synchronization. Its UMS component spends
a fair amount of time waiting on kernel mode synchronization objects, but it also
implements a variety of spinlocks and does its best to avoid switching a thread into
kernel mode unless absolutely necessary.
6. What is the only kernel object that supports the concept of a thread owner?
8. If you are building a Windows GUI and want to update the GUI at certain
regular intervals, should you use a waitable timer object or a Windows user
timer?
9. What API function is used to set the signal frequency for a waitable timer
object?
10. True or false: A spinlock is a kernel mode object that spins (loops) until it
acquires a lock on a resource.
12. True or false: You cannot specify a timeout value when waiting on a critical
section object.
13. What action does the system take when a thread that owns a mutex
terminates?
14. What is the maximum number of objects that WaitForMultipleObjects can wait
on simultaneously?
15. What type of message does a Windows user timer object produce?
16. True or false: Generally speaking, you should avoid techniques and design
elements that continuously poll for resource availability.
17. True or false: Windows detects thread deadlocks, selects one of the
participating threads as the deadlock victim, and terminates it.
18. What API routine does a thread use to acquire a critical section object?
19. What API routine does a thread use to release a critical section object?
20. Name a mechanism discussed in this chapter for waiting on more objects than
the maximum supported by WaitForMultipleObjects.
22. True or false: A process object is the only type of kernel object that cannot be
signaled.
23. True or false: The order in which you access kernel resources has no bearing on
thread deadlocks because they are kernel resources.
24. True or false: Synchronizing threads using kernel mode objects is generally
faster than synchronizing them via user objects.
-Thomas Jefferson[1]
[1] Jefferson, Thomas. Letter to nephew, Peter Carr, from Paris. August 10, 1787; Reprinted in Thomas Jefferson: Writings, ed. Merrill D. Peterson. New York: Library of America, 1994, pp. 900-906.
Page size: the memory page size that a given processor architecture requires. On the x86, this is 4K. All Windows memory allocations must occur in multiples of the system page size.
Allocation granularity: the boundary at which virtual memory reservations must be made under Windows. On all current versions of Windows, this is 64K, so user mode virtual memory reservations must be made at 64K boundaries within the process address space.
System paging file: the file (or files) that Windows uses to provide physical storage for virtual memory. Windows uses the paging file to swap physical memory pages to and from disk in a manner that is transparent to the application. The total physical memory storage on a given machine is equal to the size of the physical memory plus the size of all the paging files combined.
NULL pointer assignment partition: the first 64K of the user mode address space; it's marked off limits in order to make NULL pointer references easier to detect.
The best all around tool for monitoring Windows memory statistics and performance
is Perfmon. Task Manager is also surprisingly helpful. Keep in mind that Task
Manager's Mem Usage column lists each process's working set size, not its total
virtual memory usage. Since this column includes shared pages, you can't total it to
get the total physical memory used by all processes. Also, Task Manager's VM Size
column actually lists a process's private bytes (its private committed pages), not its
total virtual memory size.
The accompanying table maps each tool (Perfmon, Pstat, Pview, pmon, TaskMgr, and TList) to the memory statistics it reports, including reserved and virtual memory, paging file size, page faults, working set size, and paged and nonpaged pool usage.
Counter Description
Memory:Committed Bytes The committed private address space (in both the paging file and physical memory)
Process:Virtual Bytes The total size of the process address space (shared and private pages)
Process:Page File Bytes Peak The peak value of the Process:Page File Bytes counter
Addresses
Because Windows is a 32-bit operating system, all user processes have a flat 4GB
address space. This space is limited to 4GB because a 32-bit pointer can have one of
4,294,967,296 (2^32) values. This means that pointer values in Windows applications
can range from 0x00000000 to 0xFFFFFFFF.
On 64-bit Windows, processes have a flat 16EB (exabyte) address space. A 64-bit
pointer can have one of 18,446,744,073,709,551,616 (2^64) values, ranging from
0x0000000000000000 to 0xFFFFFFFFFFFFFFFF.
The fact that user processes are limited to 4GB of address space on 32-bit Windows
doesn't mean that apps can't access more than 4GB of physical memory. As you're
probably aware, it's not unusual for server machines to have more than 4GB of RAM
installed. Windows' AWE facility allows applications to fully utilize the physical
memory available in their host machines. We'll discuss AWE in more detail later in
the chapter. For now, just keep in mind that it allows an application to access
physical memory beyond 4GB. Windows 2000 Professional and Windows 2000 Server
both support up to 4GB of physical memory. Windows 2000 Advanced Server
supports up to 8GB, and Windows 2000 Data Center supports up to 64GB. Through
AWE, an application can make use of as much physical memory as the operating
system supports.
Keep in mind that the 4GB that a 32-bit process has to work with is virtual address
space, not physical storage. By virtual, I mean that the address space is simply a
range of memory addresses. Physical storage must be mapped to portions of this
space before an application can make use of it without causing an access violation.
The memory manager has two primary responsibilities:
1. Mapping (translating) a process's virtual address space into physical memory so that when its threads read and write virtual addresses, the correct physical addresses are referenced
2. Paging memory to and from disk when process threads attempt to use more physical memory than is currently available
Beyond the virtual memory management services it provides, the memory manager
also provides core services to Windows' environment subsystems. These include the
following:
Memory-mapped files
Copy-on-write memory
Granularities
All processor chips define a fixed page size for working with memory. The page size
on the x86 family of processors is 4K. Any allocation request an application makes is
rounded up to the nearest page boundary. This means, for example, that a 5K
allocation request will actually require 8K of memory.
Like most operating systems, 32-bit Windows has a fixed allocation granularity�a
boundary on which all application memory reservations must occur. The boundary
will always be a multiple of the system page size. In the case of 32-bit Windows, this
boundary is 64K, so when an application requests a memory reservation, that
reservation must begin on a 64K boundary in the process address space. Though
many apps let Windows decide the precise location of the buffers they allocate,
some make allocations at specific addresses. For those that do, they must pass a
starting reservation address into Windows that aligns with a 64K boundary in the
process address space. Windows will round down any starting reservation address
that does not correctly align with the allocation granularity.
An app that's not mindful of the system's 64K allocation granularity can cause
address space to be wasted. If an application reserves a virtual memory region less
than 64K in size, the remainder of the 64K region is unusable by the application
thanks to the system-enforced allocation granularity. Because an app cannot then
specify a reservation that occupies the remainder of the region without having the
system automatically round it down to the start of the 64K region, the unused
address space is essentially wasted. So, it's possible to exhaust the address space
for a process without actually reserving or allocating much memory. We'll talk more
about memory reservation and commitment in the Virtual Memory section below.
You can retrieve both the system allocation granularity and the system page size via
the Win32 GetSystemInfo API function. It's conceivable that both of these could vary
in future versions of Windows, so it's wise not to hard-code references to them. See
Exercise 4.4 later in the chapter for an example of how to use GetSystemInfo in a
SQL Server extended procedure.
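Here's a minimal console sketch of that call; the extended procedure in Exercise 4.4 follows the same pattern.

#include <windows.h>
#include <stdio.h>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("Page size:              %u bytes\n", (unsigned)si.dwPageSize);
    printf("Allocation granularity: %u bytes\n", (unsigned)si.dwAllocationGranularity);
    return 0;
}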
Windows isolates processes from one another such that no user process can corrupt
the address space of another process or of the OS itself. This makes Windows more
robust and protects applications from one another. There are four fundamental
aspects of this protection.
4. Shared memory sections have standard Access Control Lists (ACLs) that are
checked when processes access them.
These four aspects of the Windows memory management architecture make the
operating system far more robust than it otherwise would be. They help prevent
intentional and unintentional corruption of one process's address space by another,
and they help make Windows itself resilient in the face of catastrophic application
errors.
NOTE: As I've mentioned earlier, Windows does provide API functions such as
ReadProcessMemory and WriteProcessMemory that allow one process to access
another's address space. That said, using these functions requires specific access
rights; you cannot accidentally read or modify memory belonging to another
process. Typically (but not always), these functions are used by a debugger to
access the memory of a process being debugged. Also note that, by default, when
one process spawns another via a call to CreateProcess, the parent process has the
access permissions required to access the child process's virtual memory. Again, this
is typically used to facilitate debugging.
Partitions
At a high level, the 4GB process address space is organized as shown in Table 4.4.
Within the user mode portion, there are several smaller partitions (Table 4.5). The
following subsections briefly discuss these partitions.
Have you ever wondered why NULL (address 0x00000000) can't be used by an
application? After all, isn't it just another address within the process address space
(the first address, in fact) just like any other address? No, it isn't. And the reason it
isn't is because, in the interest of helping programmers catch NULL pointer
assignments, Windows has marked the first 64K of the process address space as off
limits.
The NULL pointer assignment partition is a very simple yet surprisingly useful
feature in the operating system that helps programs catch failed allocations. For
example, consider the following C code.
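A representative fragment (the buffer size and string are arbitrary placeholders) might be:

#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *pBuf = (char *)malloc(1024);   /* no check for a NULL return */
    strcpy(pBuf, "hello");               /* access violation here if malloc failed */
    free(pBuf);
    return 0;
}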
This code performs no error checking. If malloc is unable to allocate a buffer of the
requested size, it returns NULL. Because Windows has marked the entirety of the
first 64K of the process's address space as off limits (including address
0x00000000, or NULL), any attempt to access a NULL pointer will result in an access violation. In the code above, if the call to malloc returns NULL, the call to strcpy will cause an access violation to be raised. This isn't because Windows checks every pointer reference to make sure that it doesn't equal NULL; it's because no address within the first 64K of the user mode space, 0x00000000 or otherwise, may be
used.
Does this mean that the operating system wastes 64K of the memory in your
system? No, not at all. Remember: A process's address space is virtual; those sections marked off limits by the operating system are not backed by physical memory. For such a useful feature as the NULL pointer assignment partition, you give up only a 64K range of memory addresses; no physical memory is wasted.
Why is the NULL assignment partition 64K in size? Why not just make the NULL
address, 0x00000000, off limits, or, at most, a single 4K page? Windows makes the
entire 64K off limits for two reasons.
2. NULL pointer references are often buried in pointer arithmetic where a NULL
memory address is not actually referenced, but one based on NULL plus an
offset of some kind is. This means that your NULL pointer reference may
actually end up causing your app to reference a memory location other than
0x00000000. Marking the entire 64K region off limits helps catch many of
these situations.
This is best explained by way of example. Exercises 4.1 through 4.3 later in this
chapter walk you through building a few test applications that demonstrate NULL
pointer references and how Windows helps you detect them.
The kernel mode partition is where the code for file system support, thread
management, memory management, networking support, and all device drivers
resides. Everything residing in the kernel mode partition is shared among all
processes.
You may be wondering whether the kernel really needs the top half of the process
address space. Unfortunately, the answer is yes, it does. The kernel needs this space
for OS code, device I/O cache buffers, process page tables, device driver code, and
so forth. To be sure, the kernel could really make good use of much more space. It
finally gets all the space it needs in 64-bit Windows.
One thing to keep in mind about kernel mode space: If you boot with the /3GB option
(discussed below), the kernel space is reduced to just 1GB. This, in turn, limits the
sizes of some of the data structures typically stored in the kernel mode space. For
example, when /3GB is enabled, you may access only 16GB of total system memory
because the size of the process page table is constricted by the limited kernel mode
space.
The PEB and TEB areas aren't regions that you'll make direct use of much, but it's
instructive to know about them and what they are. As I mentioned in Chapter 3,
each process has a process environment block (PEB) that's allocated in the user
mode space. As Table 4.5 indicates, the precise address of a process's PEB is
0x7FFDF000. This means that you can dump this region of memory from under a
debugger in order to view the PEB for a process. WinDbg has a special command for
doing exactly this, !peb. The next time you attach to SQL Server with WinDbg, try
the !peb command. You'll see that it returns a number of interesting pieces of data
including the modules currently loaded within the process, the command line passed
into the process, the address of the default heap, and many others.
As shown in Table 4.5, the address of the TEB for a process's main thread is at
0x7FFDE000. You can list the contents of a TEB using the WinDbg !teb command. If
you execute !teb without any parameters, you get the TEB for the current thread. If
you pass an address into !teb, you'll get the TEB at that address if there is one.
TEBs for the worker threads in a multithreaded application are stored on the page at
address 0x7FFDD000 and the pages immediately preceding it in memory (e.g.,
0x7FFDC000, 0x7FFDB000, and so on).
Shared User Data Page
The memory page at 0x7FFE0000 is known as the shared user data page. It contains
global items such as the clock tick count, the system time, the version number, and
various other system-level data elements. It is read-only and is backed by a memory
page that actually resides in the kernel address space. It exists in the user mode
space in order to allow API routines to access key system data without having to
switch to kernel mode.
Boundary Partitions
The last two regions of the user mode address space are off limits to applications.
The first is the remainder of the 64K region containing the shared user data page.
This 60K region is marked off limits by the operating system; any attempt to access
it will result in an access violation. The fact that the remainder of the 64K region
containing the shared user data page is marked off limits doesn't really affect user
mode applications because that region would be inaccessible to them anyway given
that user mode reservations must begin on an allocation granularity boundary.
The second region is the last 64K of the user mode address space. Windows marks it
off limits in order to prevent an application from accessing a region of virtual
memory that straddles the boundary between user mode and kernel mode. Because
routines such as WriteProcessMemory are actually validated by kernel mode code,
they can access address regions normally off limits to user mode code. By marking
the last 64K of user mode space off limits, Windows protects against memory access
that starts in the user mode space and extends into the kernel mode space.
From an application standpoint, the system paging file increases the amount of
memory available for use. It makes the system appear to have much more physical
memory than it actually does. This is why a machine with, say, 1GB of physical
memory can run many apps simultaneously, each having a 4GB process address
space that is, perhaps, 50% backed by physical storage.
Conceptually, it's helpful to think of the physical storage behind virtual memory as
the system paging file. Even though pages are constantly being copied in and out of
physical RAM, the vast majority of the physical storage behind the virtual memory in
the system is typically in the system paging file.
Although it is possible to run Windows without a paging file, this isn't usually
recommended. In a typical configuration, the system paging file is considerably
larger than the physical memory in the machine and provides apps with an efficient
mechanism for accessing more memory than the machine actually has.
The paging file size is the most important variable affecting how much storage is
available to an application. The amount of RAM has very little impact on the physical
storage available to an app, but it does, of course, affect performance very
significantly. When physical RAM is too low, the system will constantly copy data
pages to and from the paging file (a condition known as thrashing), and, of course,
performance will suffer commensurately.
Windows' AWE facility exists to allow applications to access more than 4GB of
physical memory. As I mentioned earlier, a 32-bit pointer is an integer that is limited
to storing values of 0xFFFFFFFF or less, that is, to references within a 4GB memory
address space. AWE allows an application to circumvent this limitation and access all
the memory supported by the operating system.
Typically, mechanisms that allow a pointer to access memory at locations beyond its
direct reach (i.e., at addresses too large to store in the pointer itself) pull off their
magic by providing a window or region within the accessible address space that is
used to transfer memory to and from the inaccessible region. This is how AWE
works: You provide a region in the process address space (a window) to serve as a
kind of staging area for transfers to and from memory above the 4GB mark.
1. Allocate the physical memory you want to use via the AllocateUserPhysicalPages Win32 API function. (The calling process's account must hold the Lock Pages in Memory privilege.)
2. Create a region in the process address space to serve as a window for mapping views of this physical memory using the VirtualAlloc API function. We'll discuss VirtualAlloc further in just a moment.
3. Map a view of the physical memory into the virtual memory window using the MapUserPhysicalPages or MapUserPhysicalPagesScatter Win32 API functions.
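A condensed sketch of those three steps follows. The 64MB size is arbitrary, error handling is omitted, and the calling account is assumed to hold the Lock Pages in Memory privilege.

#include <windows.h>
#include <stdlib.h>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    ULONG_PTR nPages = (64 * 1024 * 1024) / si.dwPageSize;        // 64MB of physical pages
    ULONG_PTR *aPFNs = (ULONG_PTR *)malloc(nPages * sizeof(ULONG_PTR));

    // Step 1: allocate the physical memory
    AllocateUserPhysicalPages(GetCurrentProcess(), &nPages, aPFNs);

    // Step 2: reserve a window in the virtual address space (must be PAGE_READWRITE)
    PVOID pWindow = VirtualAlloc(NULL, nPages * si.dwPageSize,
                                 MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);

    // Step 3: map a view of the physical pages into the window
    MapUserPhysicalPages(pWindow, nPages, aPFNs);
    // ...use pWindow like ordinary memory...

    MapUserPhysicalPages(pWindow, nPages, NULL);                  // unmap the view
    FreeUserPhysicalPages(GetCurrentProcess(), &nPages, aPFNs);
    VirtualFree(pWindow, 0, MEM_RELEASE);
    free(aPFNs);
    return 0;
}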
While AWE exists on all editions of Windows 2000 and later and can be used even on
systems with less than 2GB of physical RAM, it's most typically used on systems with
2GB or more of memory because it's the only way a 32-bit process can access
memory beyond 3GB, as I mentioned earlier in the chapter. If you enable AWE
support in SQL Server on a system with less than 3GB of physical memory, the
system ignores the option and uses conventional virtual memory management
instead.
One interesting characteristic of AWE memory is that it is never swapped to disk.
You'll notice that the AWE-specific API routines refer to the memory they access as
physical memory. This is exactly what AWE memory is: physical memory outside the
control of the Windows virtual memory manager.
The virtual memory window used to buffer the physical memory provided by AWE
requires read-write access. Hence, the only protection attribute that can be passed
into VirtualAlloc when you set up this window is PAGE_READWRITE. Not surprisingly,
this also means that you can't use VirtualProtect to protect pages within this region
from modification or access.
The /3GB boot option is available on the Advanced Server and Data Center editions
of Windows 2000 (and later). It allows a process's user mode address space to be
expanded from 2GB to 3GB at the expense of the kernel mode address space (which
is reduced from 2GB to 1GB). In Windows parlance, this facility is known as
application memory tuning or 4GB tuning (4GT).
You enable application memory tuning by adding "/3GB" (without the quotes) to the
appropriate line in the [operating systems] section of your BOOT.INI. It's common for
people to configure their systems to be bootable with and without /3GB by setting
up the entries in the [operating systems] section of BOOT.INI such that they can
choose either option at startup.
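For example, a BOOT.INI [operating systems] section that offers both configurations might look something like the following (the ARC path is illustrative; yours will differ):
[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows 2000 Advanced Server" /fastdetect
multi(0)disk(0)rdisk(0)partition(1)\WINNT="Windows 2000 Advanced Server /3GB" /fastdetect /3GB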
WARNING: You can also boot Windows 2000 Professional, Windows 2000 Server,
and Windows XP with the /3GB switch. However, this has the negative consequence
of reducing kernel mode space to 1GB without increasing user mode space. In other
words, you gain nothing for the kernel mode space you give up.
NOTE: Windows Server 2003 introduced a new boot option to set the user mode
process space, /USERVA. You add /USERVA to your BOOT.INI just as you would /3GB.
The advantage of /USERVA over /3GB is that it gives you a finer level of control over
exactly how much address space to set aside for user mode use versus kernel mode
use. For example, /USERVA=2560 configures 2.5GB for user mode space and leaves
the remaining 1.5GB for the kernel. The caveats that apply to the /3GB switch apply
here as well.
Large-Address-Aware Executables
Before support for /3GB was added to Windows, an application could never access a
pointer with the high bit set. Only addresses that could be represented by the first
31 bits of a 32-bit pointer could be accessed by user mode applications. This left 1
bit unused, so some developers, being the clever coders they were and not wanting
to waste so much as a bit in the process address space, made use of it for other
purposes (e.g., to flag a pointer as referencing a particular type of application-
specific allocation). This caused a conundrum when /3GB was introduced because
these types of apps would not be able to easily distinguish a legitimate pointer that
happened to reference memory above the 2GB boundary from a pointer that
referenced memory below 2GB but had its high bit set for other reasons. Basically,
booting a machine with /3GB would likely have broken such apps.
To deal with this, Microsoft added support for a new bit flag in the Characteristics
field of the Win32 Portable Executable (PE) file format (the format that defines the
layout of executable files, EXEs and DLLs, under Windows) that indicates whether
an application is large address aware. When this flag
(IMAGE_FILE_LARGE_ADDRESS_AWARE) is enabled, bit 32 in the Characteristics field
in an executable file's header will be set. By having this flag set in its executable
header, an application indicates to Windows that it can correctly handle pointers
with the high bit set, that is, that it doesn't do anything exotic with this bit. When this flag
is set and the appropriate version of Windows has been booted with the /3GB option,
the system will provide the process with a 3GB private user mode address space.
You can check whether an executable has this flag enabled by using utilities such as
DumpBin and ImageCfg that can dump the header of an executable file.
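If you'd rather check the flag programmatically, a small tool along the lines of the following sketch reads the Characteristics field directly (it uses the ImageHlp library, so link with imagehlp.lib). Recall, too, that it's the VC++ linker's /LARGEADDRESSAWARE switch that sets the flag in the first place.
#include <windows.h>
#include <imagehlp.h>
#include <stdio.h>
int main(int argc, char* argv[])
{
    LOADED_IMAGE li;
    if (argc > 1 && MapAndLoad(argv[1], NULL, &li, FALSE, TRUE)) {
        // IMAGE_FILE_LARGE_ADDRESS_AWARE lives in the PE header's Characteristics field
        if (li.FileHeader->FileHeader.Characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)
            printf("%s is large address aware.\n", argv[1]);
        else
            printf("%s is not large address aware.\n", argv[1]);
        UnMapAndLoad(&li);
    }
    return 0;
}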
NOTE: The IMAGE_FILE_LARGE_ADDRESS_AWARE flag is checked at process startup
and is ignored for DLLs. DLLs must always behave appropriately when presented
with a pointer whose high bit is set.
/3GB vs. AWE
The ability to increase the private process address space by 50% is certainly a handy
and welcome enhancement to Windows' memory management facilities; however,
Windows' AWE facility is far more flexible and scalable. As I said earlier, when you
increase the private process address space by a gigabyte, that gigabyte comes from
the kernel mode address space, which shrinks from 2GB to 1GB. Since the kernel
mode code is already cramped for space even when it has the full 2GB to work with,
shrinking this space means that certain internal kernel structures must also shrink.
Chief among these is the table Windows uses to manage the physical memory in the
machine. When you shrink the kernel mode partition to 1GB, you limit the size of this
table such that it can manage a maximum of only 16GB of physical memory. For
example, if you're running under Windows 2000 Data Center on a machine with
64GB of physical memory and you boot with the /3GB option, you'll be able to
access only 25% of the machine's RAM; the remaining 48GB will not be usable by
the operating system or applications.
AWE also allows you to access far more memory than /3GB does. Obviously, you get
just one additional gigabyte of private process space via /3GB. This additional space
is made available to apps that are large address aware automatically and
transparently, but it is limited to just 1GB. AWE, by contrast, can make the entirety
of the physical RAM that's available to the operating system available to an
application provided it has been coded to make use of the AWE Win32 API functions.
So, while AWE is more trouble to use and access, it's far more flexible and open
ended.
Address Translation
When a thread references a virtual address, one of three things will happen.
1. The address will be valid and the page will already reside in physical memory.
2. The address will be valid and the page will be stored in the system paging file.
In this case, the data will be paged into physical memory so that it can be
accessed. This is known as a page fault. (You can track the page faults for a
process via Perfmon's Process:Page Faults/sec counter and via Task Manager's
Page Faults column.)
3. The address will be invalid and the system will raise an access violation
exception (user mode) or blue screen (kernel mode).
Virtual addresses aren't mapped directly to physical addresses. Instead, each virtual
address is composed of three elements: the page directory index, the page table
index, and the byte index. These elements establish the mapping between the
virtual address and the physical RAM it references.
For each process, the Windows memory manager creates a page directory that it
uses to map all the page tables for the process. Windows stores the physical address
of this page directory in each process's KPROCESS block (the kernel process block
stored within the EPROCESS block mentioned in Chapter 3) and maps it to address
0xC0300000 in the process address space.
The CPU keeps track of the address of a process's page directory table via a special
register (CR3, or Control Register 3, on x86; the PDR, or Page Directory Register, on
Alpha). Each time a context switch occurs wherein a thread from a different process
is scheduled on the CPU, this register is loaded from the KPROCESS block so that the
CPU's MMU can determine where the page directory table resides. Context switches
among threads in the same process do not require the register to be reloaded
because all threads in a process share the same address space.
This special register serves as a bootstrap for the system's memory management
facilities. Without it, a process's page directory cannot be located. Without the page
directory, the process address space itself cannot be accessed. The register provides
the entry point for the CPU's memory management hardware to access an individual
process's address space.
Each page directory consists of a series of page directory entries. The first 10 bits of
a 32-bit virtual address store a page directory entry (PDE) index that tells Windows
which page table to use to locate the physical memory associated with the address.
Each page table consists of a series of page table entries. The second 10 bits of a 32-
bit virtual address provide an index into this table and indicate which page table
entry (PTE) contains the address of the page in physical memory to which the virtual
address is mapped.
On x86 processors, the last 12 bits of a 32-bit virtual address contain the byte offset
on the physical memory page to which the virtual address refers. The system page
size determines the number of bits required to store the offset. Since the system
page size on x86 processors is 4K, 12 bits are required to store a page offset (4,096
= 2^12).
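The arithmetic behind this split is easy to see in code. The following snippet is purely illustrative (applications never need to perform this decomposition themselves); it breaks an arbitrary example address into the three indexes:
#include <stdio.h>
int main(void)
{
    unsigned long va = 0x7FFDF004;               /* an example virtual address */
    unsigned long pdeIndex = va >> 22;           /* first (top) 10 bits        */
    unsigned long pteIndex = (va >> 12) & 0x3FF; /* second 10 bits             */
    unsigned long byteIndex = va & 0xFFF;        /* last 12 bits: page offset  */
    printf("PDE index=%lu, PTE index=%lu, byte index=0x%03lX\n",
        pdeIndex, pteIndex, byteIndex);
    return 0;
}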
The translation itself proceeds in four steps.
1. The CPU's MMU locates the page directory for the process using the special
register mentioned above.
2. The page directory index (from the first 10 bits of the virtual address) is used
to locate the PDE that identifies the page table needed to map the virtual
address to a physical one.
3. The page table index (from the second 10 bits of the virtual address) is used to
locate the PTE that maps the physical location of the virtual memory page
referenced by the address.
4. The PTE is used to locate the physical page. If the virtual page is mapped to a
page that is already in physical memory, the PTE will contain the page frame
number (PFN) of the page in physical memory that contains the data in
question. (Processors reference memory locations by PFN.) If the page is not in
physical memory, the MMU raises a page fault, and the Windows page
fault-handling code attempts to locate the page in the system paging file. If
the page can be located, it is loaded into physical memory, and the PTE is
updated to reflect its location. If it cannot be located and the translation is a
user mode translation, an access violation occurs because the virtual address
references an invalid physical address. If the page cannot be located and the
translation is occurring in kernel mode, a bug check (also called a blue screen)
occurs.
The four-step process required to resolve a virtual address to a physical one may
seem inefficient at first glance. It may seem that it would be far simpler and more
efficient to compose a virtual address of two basic components: (1) a PTE that stores
the reference to the page in physical storage to which the virtual address maps and
(2) a page offset that pinpoints the precise location of the data block
referenced by the address. However, the x86 and Alpha processors take the four-
step approach they do in order to conserve memory. If we simplify this process into a
basic one-step translation where each virtual address is composed of only two
components as I've just described, we end up consuming far more memory to
manage this table than we do in the four-step process, especially on systems where
the majority of the address space is unallocated. We would need 1,048,576 PTEs to
map a 4GB address space (4GB ÷ 4K page size = 1,048,576). With each PTE
requiring a 32-bit pointer, we would need 4MB of physical memory to map the
address space for each process (1,048,576 x 4 bytes = 4MB). Using the four-step
process that x86 and Alpha processors employ, only the page directory must be fully
defined up front; memory for the individual page tables can be allocated as necessary. Given that
the address space for many processes is mostly unallocated, the physical memory
this approach saves is significant.
That said, if this process occurred with every memory access, performance would
likely be very poor, so the x86 and Alpha processors cache virtual-to-physical
address translation pairs. The cache memory set aside for storing these address
pairs is known as a Translation Buffer (TB) or Translation Look-aside Buffer (TLB).
When the MMU is presented with a virtual address, it takes the virtual page number
and compares it with the virtual page number of every entry in the cache. If it finds
a match, it bypasses the four-step process and simply locates the PFN in physical
memory from the cache entry. A downside of the Windows scheduler switching from
one process to another is that the cache entries associated with the process being
switched out must be flushed. The four-step process then gradually refills the cache
with entries for the new process.
Intel processors starting with the Pentium Pro and later include support for a
memory-mapping model called Physical Address Extension (PAE). PAE can provide
access for up to 64GB of physical memory. In PAE mode, the MMU still implements
page directories and page tables, but a new level exists above them: the page
directory pointer table. Also, in PAE mode, PDEs and PTEs are 64 bits wide (rather
than the standard 32 bits). The system can address more memory than the standard
translation because PDEs and PTEs are twice their standard width, not because of
the page directory pointer table. The page directory pointer table is needed to
manage these high-capacity tables and the indexes into them.
A special version of the Windows kernel is required to use PAE mode. This kernel
ships with every version of Windows 2000 and later and resides in Ntkrnlpa.exe for
uniprocessor machines and in Ntkrnlpamp for multiprocessor machines. You enable
PAE use by adding the /PAE switch to your BOOT.INI file, just as you might add /3GB
or /USERVA.
Exercises
Earlier in the chapter we discussed NULL pointer references and how Windows helps
applications detect them (though it cannot completely prevent them). The next
three exercises take you through some sample code that exhibits different types of
NULL pointer references and shows how Windows handles each type.
1. Create a console app based on Listing 4.1 by loading and compiling the Visual
Studio project in the CH04\memexamp00 subfolder on the CD accompanying
this book. I'm assuming that you're working with Visual C++ (VC++)
version 6.0 or later in the steps that follow.
#include "stdafx.h"
#include "stdlib.h"
#include "string.h"
2. Set a breakpoint on the line containing the strcpy call, then run the app under
the debugger.
3. When the app stops at the strcpy, place your mouse over pszLastName in the
VC++ editor window. A tool-tip hint should display indicating that pszLastName
has a value of 0x00000000. Why is this? The pointer is NULL because we
requested a larger memory allocation (2GB) than Windows could satisfy.
Because the code does no error checking, strcpy will attempt to copy the string
"Smith" into this invalid address.
4. Hit F10 to execute the strcpy line. You should now see an access violation.
Windows has intercepted the attempted access of memory address
0x00000000 (NULL) and raised the error you see. Press Shift+F5 to stop
debugging.
Now let's modify the app to cause a NULL pointer reference that is not so obvious.
1. Change your code to look like Listing 4.2 (or load memexamp01 from the CD).
#include "stdafx.h"
#include "stdlib.h"
#include "string.h"
2. In this code, we use strncpy rather than strcpy to fill the address referenced by
pszLastName with data. strncpy is often preferred over strcpy because it helps
prevent buffer overruns; you can control the number of characters copied.
Because we've used strncpy, we have to take care of terminating the string
referenced by pszLastName, so we begin by placing an ASCII 0 character at the
end of the target buffer for szLastName. To compute the target address for the
string terminator, we simply take the string length of szLastName, add it to the
address contained in pszLastName, and add 1.
3. Unfortunately, this code also assumes that the malloc call won't fail. When
malloc fails, it returns NULL into pszLastName. This address is then used when
we compute where to put the string terminator. Since it's 0, we're effectively
attempting to place an ASCII 0 at a memory address that's equivalent to the
length of the string referenced by szLastName plus 1. So, rather than a plain
NULL reference, we are referring to address 0x00000006, that is, 5 (the length of
"Smith") + 1.
b. Register eax contains the return value from malloc. Because we know the
call will fail, we know that this value is NULL or 0x00000000. This value is
moved into pszLastName immediately before our attempt to set up the
string terminator.
c. Line 14 computes the string length of szLastName, adds that value to the
previous value stored in pszLastName plus 1, and attempts to treat this
new value as an address (to dereference it) so that it can assign the
string terminator. The actual dereference (and the cause of the ensuing
access violation) appears in bold type in Listing 4.2.
5. Because address 0x00000006 is within the first 64K of the process address
space, an access violation is raised when we attempt to dereference it.
In the next exercise, we'll cause a NULL pointer reference by overwriting a pointer
value. This is a common problem in applications, especially those written in
languages that feature pointers prominently, such as C and C++.
Here's a fairly contrived example that demonstrates, once again, the usefulness of
the NULL pointer access partition.
1. Load the app shown in Listing 4.3 from the CD (CH04\memexamp02) and
compile it.
#include "stdafx.h"
#include "stdlib.h"
#include "string.h"
#define MAX_FIRST_NAME_SIZE 10
#define MAX_LAST_NAME_SIZE 30
#pragma pack(1)
struct NAME
{
char szFirstName[MAX_FIRST_NAME_SIZE];
char *pszLastName;
} nmEmployee;
int main(int argc, char* argv[])
{
int dwFirstNameLen=__min(strlen(argv[1]),MAX_FIRST_NAME_SIZE);
int dwLastNameLen=__min(strlen(argv[2]),MAX_LAST_NAME_SIZE);
nmEmployee.pszLastName=(char *)malloc(dwLastNameLen);
strncpy(nmEmployee.szFirstName,argv[1],dwFirstNameLen);
strncpy(nmEmployee.pszLastName,argv[2],dwLastNameLen);
nmEmployee.szFirstName[dwFirstNameLen+2]='\0';
nmEmployee.pszLastName[dwLastNameLen+2]='\0';
strupr(nmEmployee.pszLastName);
return 0;
}
This code will work fine so long as the first argument passed into it is 8
characters or less. Thanks to the faulty pointer arithmetic used throughout the
app, but especially when the name strings are terminated, a first name that's
longer than 8 characters will cause the pszLastName pointer to be overwritten
with an ASCII 0.
2. To see how this works, set the command line parameters (Alt+F7 | Debug |
Program arguments) to "Wolfgangus Mozart" (without quotes).
3. Set a breakpoint at the line that assigns the string terminator for szFirstName:
nmEmployee.szFirstName[dwFirstNameLen+2]='\0';
4. Now, run the app from inside the VC++ IDE. When the debugger stops at your
breakpoint, add nmEmployee to your Watch window, then expand it so that
you can see its members as you step through the code.
5. Press F10 to step over the breakpoint line. You should notice in the Watch
window that not only was szFirstName changed by the line just executed but
pszLastName was changed as well (both members should appear red in the
Watch window). This is because the ASCII 0 assigned to the end of szFirstName
was actually written 3 bytes past the end of the string. Because szFirstName is
10 characters wide and because arrays in C++ are always zero-based, the
valid indexes for szFirstName are 0 through 9. However, dwFirstNameLen equals 10.
Assigning ASCII 0 to szFirstName[dwFirstNameLen] would have also
overwritten pszLastName but would have gotten only the first byte of the four-
byte pointer. Adding 2 to this offset pushes us into the third byte of the
pszLastName pointer. By zeroing this byte, we change the address to one that
happens to be in the first 64K of the process address space.
6. Now attempt to step over the next line. Because the previous line corrupted
the pszLastName pointer, you should see an access violation. The specific
reason for the access violation is that you are referencing an address in the
first 64K of memory, and Windows' NULL pointer access partition protection
has caught that invalid reference.
I mentioned earlier in the chapter that you can retrieve the system's page size and
allocation granularity through a call to the GetSystemInfo Win32 API function. In this
next exercise, you'll build and run a SQL Server extended procedure that returns this
same information.
1. Copy the xp_sysinfo project from the CH04\xp_sysinfo subfolder on the book's
CD onto your hard drive and load it into Visual C++. For curious readers,
Listing 4.4 shows the complete source code of the xp_sysinfo extended
procedure.
DBCHAR colname[MAXCOLNAME];
DBCHAR szProcType[MAX_PATH];
DBCHAR szMinAddress[MAXCOLNAME];
DBCHAR szMaxAddress[MAXCOLNAME];
DBCHAR szAffinityMask[MAXCOLNAME];
SYSTEM_INFO si;
GetSystemInfo(&si);
wsprintf(colname, "PageSize");
srv_describe(srvproc, 1, colname, SRV_NULLTERM, SRVINT4,
sizeof(DBINT), SRVINT4, sizeof(DBINT),
&si.dwPageSize);
wsprintf(colname, "AllocationGranularity");
srv_describe(srvproc, 2, colname, SRV_NULLTERM, SRVINT4,
sizeof(DBINT), SRVINT4, sizeof(DBINT),
&si.dwAllocationGranularity);
wsprintf(colname, "NumberOfProcessors");
srv_describe(srvproc, 3, colname, SRV_NULLTERM, SRVINT4,
sizeof(DBINT), SRVINT4, sizeof(DBINT),
&si.dwNumberOfProcessors);
wsprintf(colname, "ProcessorType");
switch (si.wProcessorArchitecture)
{
case PROCESSOR_ARCHITECTURE_INTEL :
{
strcpy(szProcType,"Intel ");
switch (si.wProcessorLevel)
{
case 3 :
{
strcat(szProcType,"386");
break;
}
case 4 :
{
strcat(szProcType,"486");
break;
}
case 5 :
{
strcat(szProcType,"Pentium");
break;
}
case 6 :
{
strcat(szProcType,"Pentium II or Pentium Pro or later");
break;
}
case 7 :
{
strcat(szProcType,"Pentium III");
break;
}
case 8 :
{
strcat(szProcType,"Pentium 4");
break;
}
default :
{
strcat(szProcType,"Unknown");
break;
}
}
break;
}
case PROCESSOR_ARCHITECTURE_MIPS :
{
strcpy(szProcType,"MIPS ");
switch (si.wProcessorLevel)
{
case 4:
{
strcat(szProcType,"R4000");
break;
}
default:
{
strcat(szProcType,"Unknown");
break;
}
}
break;
}
case PROCESSOR_ARCHITECTURE_ALPHA :
{
strcpy(szProcType,"Alpha ");
switch (si.wProcessorLevel)
{
case 21064:
{
strcat(szProcType,"21064");
break;
}
case 21066:
{
strcat(szProcType,"21066");
break;
}
case 21164:
{
strcat(szProcType,"21164");
break;
}
default:
{
strcat(szProcType,"Unknown");
break;
}
}
break;
}
case PROCESSOR_ARCHITECTURE_PPC :
{
strcpy(szProcType,"PPC ");
switch (si.wProcessorLevel)
{
case 1:
{
strcat(szProcType, "601");
break;
}
case 3:
{
strcat(szProcType, "603");
break;
}
case 4:
{
strcat(szProcType, "604");
break;
}
case 6:
{
strcat(szProcType, "603+");
break;
}
case 9:
{
strcat(szProcType, "604+");
break;
}
case 20:
{
strcat(szProcType, "620");
break;
}
default:
{
strcat(szProcType,"Unknown");
break;
}
}
break;
}
default :
{
strcpy(szProcType,"Unknown ");
break;
}
}
srv_describe(srvproc, 4, colname, SRV_NULLTERM, SRVCHAR,
strlen(szProcType), SRVCHAR, strlen(szProcType),
&szProcType);
wsprintf(colname, "ProcessorAffinityMask");
wsprintf(szAffinityMask,"0x%08X",si.dwActiveProcessorMask);
srv_describe(srvproc, 5, colname, SRV_NULLTERM, SRVCHAR,
strlen(szAffinityMask), SRVCHAR, strlen(szAffinityMask),
&szAffinityMask);
wsprintf(colname, "MinimumAppAddress");
wsprintf(szMinAddress,"0x%08X",si.lpMinimumApplicationAddress);
srv_describe(srvproc, 6, colname, SRV_NULLTERM, SRVCHAR,
strlen(szMinAddress), SRVCHAR, strlen(szMinAddress),
&szMinAddress);
wsprintf(colname, "MaximumAppAddress");
wsprintf(szMaxAddress,"0x%08X",si.lpMaximumApplicationAddress);
srv_describe(srvproc, 7, colname, SRV_NULLTERM, SRVCHAR,
strlen(szMaxAddress), SRVCHAR, strlen(szMaxAddress),
&szMaxAddress);
wsprintf(colname, "UserModeAddressSpace");
DWORD dwUserModeSpace = ((DWORD)si.lpMaximumApplicationAddress -
(DWORD)si.lpMinimumApplicationAddress);
srv_describe(srvproc, 8, colname, SRV_NULLTERM, SRVINT4,
sizeof(DBINT), SRVINT4, sizeof(DBINT), &dwUserModeSpace);
srv_sendrow(srvproc);
return XP_NOERROR;
2. Compile the project. This should produce a DLL named xp_sysinfo.dll in the
Release subfolder under your root xp_sysinfo folder.
3. Copy xp_sysinfo.dll to the binn folder under your SQL Server installation's root
folder. If you've worked through the exercises in previous chapters, you may be
asked whether to replace the existing xp_sysinfo. Answer Yes to this prompt.
4. Register the extended procedure in the master database by running this
command in Query Analyzer:
sp_addextendedproc 'xp_sysinfo','xp_sysinfo.dll'
5. Run xp_sysinfo from Query Analyzer. You should see output something like this
(results abridged):
As you can see, the system page size is 4K and the allocation granularity is
64K. Note that these numbers may differ on other processors or in future
versions of Windows.
Note also the UserModeAddressSpace column. On this machine, the maximum user
mode space is roughly 2GB. This tells us that the /3GB boot option was not successfully
enabled. Since SQL Server is a large-address-aware application, it would reflect a
user mode address space of roughly 3GB if it were running on an appropriate version
of Windows and the system had been booted with /3GB.
The x86 family of processors has a memory page size of 4K. This means that all
memory allocations under Windows are actually carried out in multiples of 4K. For
example, a 5K allocation request actually requires 8K of memory.
AWE and /3GB provide applications mechanisms for accessing memory beyond the
standard 2GB user mode partition. The /3GB option actually limits the total amount
of physical memory that Windows can manage, so it is generally not recommended.
AWE is the more flexible of the two and can make all the physical memory that's
visible to the operating system available to applications.
3. True or false: A page fault causes an exception to be raised that will crash an
application if the application does not trap it with structured exception-
handling (SEH) code.
4. If you enable the /3GB option on Windows 2000 Professional, how much user
mode address space will SQL Server be allotted when it starts up?
5. True or false: Address translation refers to the two-step process in which the
two components of a virtual address, the page table index and the page offset,
are used to translate a virtual address into a physical one.
6. True or false: Thrashing is the condition in which physical memory pages are
continually swapped to and from the system paging file, often preventing
applications from running in a timely fashion.
7. What address region is set aside by Windows to help applications detect NULL
pointer assignments?
8. How large is the default user mode space in a 32-bit Windows process?
9. How much total physical memory can Windows 2000 Data Center manage?
10. True or false: Using the AWE functions causes the kernel mode space to be so
compressed that only 16GB of total physical memory can be accessed by
Windows.
11. What VC++ linker switch enables an executable to be large address aware?
12. Before support for the /3GB boot option was added to Windows, how many bits
in a virtual address could a user mode application use to reference virtual
memory directly?
13. True or false: The system paging file can actually consist of several physical
files that may reside on different disk drives.
14. What does Task Manager's Mem Usage column indicate for a process?
16. What Windows API function covered in this chapter will return both the system
page size and the system allocation granularity?
17. True or false: The PEB is not allocated at a specific address in a process's
virtual address space, and its location will almost always vary between
processes.
18. True or false: All processor chips supported by Windows have some form of
built-in memory protection.
19. What's the typical difference between Perfmon's Process:Private Bytes and
Process:Page File Bytes counters?
20. True or false: Because Windows is a 32-bit operating system, all user processes
have a flat 4GB address space.
22. What does Task Manager's VM Size column indicate for a process?
23. What Win32 API function covered in this chapter can you use to deduce
whether a process has an oversized user mode address space?
24. True or false: Because the largest integer a 32-bit pointer can store is 2^32, the
maximum memory that a user mode application may access is 4GB.
25. True or false: The majority of the physical storage used to implement virtual
memory comes from the physical RAM installed in the machine.
26. What special-purpose register is used to store the location of the page
directory on x86 processors?
27. True or false: If a process needs more than the standard 2GB of virtual memory
space, AWE is generally preferred over the /3GB option.
29. True or false: The shared user data page is actually backed by a page in kernel
mode memory.
30. True or false: Although an application can specify the size of a memory
allocation it wants to make, it cannot specify the precise location for the
allocation.
Virtual Memory
Windows offers three distinct types of memory to applications: virtual memory,
heaps, and shared memory. Virtual memory is best used for managing large arrays
or collections of objects or structures of varying sizes. It is the primary mechanism
by which SQL Server allocates memory and is the focus of this section.
You allocate virtual memory using the VirtualAlloc and VirtualAllocEx API functions.
VirtualAlloc allocates memory only in the calling process's address space;
VirtualAllocEx can allocate memory in another process's address space. VirtualAlloc
is by far the more commonly used of the two, and it's the one we'll use throughout
this chapter.
Pages in virtual memory are always in one of three states: free, reserved, or
committed. You use VirtualAlloc to reserve and/or commit virtual memory, and you
use VirtualFree to decommit and/or release allocated memory. Released memory is
not reserved or committed; it's free.
Page size: the memory page size that a given processor architecture
requires. On the x86, this is 4K. All Windows memory allocations must occur in
multiples of the system page size.
Reserved memory: a region of virtual memory addresses that has been set
aside for use by a process. A reserved region does not require physical storage.
Memory reservations should always be made on allocation granularity
boundaries. Reserved memory cannot be accessed until it is committed.
Guard page: a page that has been flagged with the PAGE_GUARD page
protection attribute. The first time a process attempts to access the guard
page, Windows fails the operation and either raises a STATUS_GUARD_PAGE
exception or returns a last error code of STATUS_GUARD_PAGE_VIOLATION. This
also resets the page's guard status, so the next attempt to access it succeeds.
NOTE: Execute-only access is not supported by the x86 family of processors. As far
as the x86 family is concerned, if a page is readable, it is also executable. This
means, for example, that the PAGE_EXECUTE and PAGE_READONLY protection
attributes are functionally equivalent when Windows is running on an x86 processor.
It's not uncommon for memory allocation routines to make use of the PAGE_GUARD
protection attribute to detect buffer overruns. The way this works is that, for every
allocation it makes, the routine will allocate a page with the PAGE_GUARD attribute
set (a guard page) just after the newly allocated region. If a memory access then
attempts to write past the end of the allocated region, a STATUS_GUARD_PAGE
exception is raised and the access fails.
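A routine that uses this technique might look something like the following sketch (the function name is hypothetical, and error handling is omitted):
#include <windows.h>
// Allocate cb bytes followed by one PAGE_GUARD page so that a write past the
// end of the buffer trips a STATUS_GUARD_PAGE exception
void *AllocWithGuard(SIZE_T cb)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    SIZE_T cbPages = ((cb + si.dwPageSize - 1) / si.dwPageSize) * si.dwPageSize;
    char *p = (char *)VirtualAlloc(NULL, cbPages + si.dwPageSize,
        MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (p) {
        DWORD dwOld;
        VirtualProtect(p + cbPages, si.dwPageSize,
            PAGE_READWRITE | PAGE_GUARD, &dwOld);
    }
    return p;
}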
When a process is started, Windows protects the executable's code pages with the
PAGE_EXECUTE_READ attribute. This allows multiple copies of the same executable
to share the same physical storage. For example, if you run multiple instances of
Explorer, only one copy of explorer.exe is physically mapped into memory.
Copy-On-Write
Knowing that multiple instances of an application share the same physical storage
for the executable's code pages, you may be wondering about its data pages. If all
instances of an executable are mapped to the same physical storage, how can one
instance change, say, a global variable without affecting the others? The answer lies
in understanding the PAGE_WRITECOPY page protection attribute. When a process
starts, Windows protects its data pages with the PAGE_WRITECOPY attribute. When a
process modifies one of its data pages, it gets a private copy of that page. This
functionality is known as copy-on-write and conserves physical storage while still
allowing instances of an executable to make changes to global data without
affecting other instances.
The Windows NT family of operating systems (Windows NT, Windows 2000, Windows
XP, and Windows Server 2003) has always supported copy-on-write functionality.
Most flavors of UNIX also support some type of copy-on-write functionality. However,
some operating systems (such as Windows 9x and OpenVMS) do not. In operating
systems that do not offer copy-on-write functionality, the standard practice is to
make a private copy of all of an executable's data pages when a process first starts.
Obviously, the approach taken in the Windows NT family is much more efficient.
In the Windows NT family, the system always allocates space in the system paging
file to accommodate an executable's data pages. However, the storage set aside for
each data page is not actually used until the page is written to. This conserves
physical storage while still guaranteeing that an application will be able to write to
its data pages when it needs to.
WARNING: Don't pass the PAGE_WRITECOPY or PAGE_EXECUTE_WRITECOPY
attributes when calling VirtualAlloc to reserve or commit memory. If you do, the
allocation will fail with an ERROR_INVALID_PARAMETER error. These page protection
attributes are reserved for system use only.
Reserving Memory
Windows allows a process to reserve memory address space without actually
consuming any committed pages or affecting the process's page file quota (which is
not necessarily actual page file space but rather a limit on the number of committed
pages a process can consume). A virtual memory reservation simply sets aside a contiguous
block of process address space; it does not actually make any new memory available
to the application for use. The application must commit the memory in order to use
it.
Most developers aren't accustomed to being able to set the exact address where a
region of memory will be reserved. Generally, memory allocation facilities such as
malloc and the C++ new operator do not permit an application to specify where
memory will be allocated; they only allow the application to control the size of the
allocation. Windows, however, gives the developer control over both aspects of an
allocation, which can have important implications for how an application is coded, as
you'll see in just a moment.
A thread's stack is a good example: when a thread is created, Windows reserves a
contiguous region of the process address space (1MB by default) for its stack.
Even though it has reserved the full stack space for the thread, Windows waits to
commit pages within that region until they're needed. It begins by committing just
one page in the reserved region and flags the page just beyond it as a guard page.
When the system attempts to expand a thread's stack into the guard page, Windows
traps the STATUS_GUARD_PAGE exception that results and expands the stack by
committing the guard page (the page's guard status was already reset by the
attempted access). It then flags the next page following the newly committed page
as a guard page, and the thread is allowed to continue to execute. This process
continues until the end of the region originally reserved for the thread's stack is
reached. In this way, we're able to ensure that the address area used by the thread
stack is contiguous, but we don't use physical storage until we absolutely have to.
As I've mentioned, the allocation granularity on 32-bit Windows is 64K. You should
always reserve memory in allocation granularity-sized chunks because the
unreserved address space that's left when you reserve only part of a 64K region is
inaccessible to user mode allocation requests. Given that the starting address of
each reservation request you make is rounded down to the nearest allocation
boundary, there's no way for an application to force Windows to reserve the
orphaned address space. Over time, this can lead to the process running out of
address space even if there's plenty of physical storage available, which can cause
catastrophic problems for an application. For example, getting down to less than
1MB of contiguous address space will prevent most applications from creating new
threads since the default thread stack size is 1MB. (For SQL Server, the default
thread stack size has been reduced to 0.5MB, but it is still quite possible to exhaust
the virtual memory address space to the point that new worker threads cannot be
created.)
Committing Memory
You must commit virtual memory before you can use it. Attempting to access
memory that has only been reserved will cause an access violation. Committing a
region of virtual memory is as simple as calling VirtualAlloc with the MEM_COMMIT
flag. You can commit at the same time you reserve, or you can use a separate call to
VirtualAlloc. If you want to reserve and commit simultaneously, you use a bitwise OR
operator to combine the MEM_RESERVE and MEM_COMMIT flags, like this:
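For example, a single call along these lines reserves and commits a 64K block in one shot (the size and the NULL starting address, which lets Windows pick where to put the region, are illustrative values):
void *pv = VirtualAlloc(NULL, 0x10000, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);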
If you commit the pages in a reserved region in a separate operation, you're not
required to commit all of them at once. You can select individual pages within the
region to commit. This allows you to easily set up sparsely populated data structures
that combine the benefits of contiguous address blocks with the efficiency of
allocating physical storage only when needed. SQL Server's buffer pool is a good
example of this type of sparse data structure. It is reserved in its entirety at process
startup, and individual pages are committed within it as needed.
The size of any commit request you issue will be rounded up to the nearest page
boundary. For example, if you attempt to commit a 10K region, your request will be
rounded up to 12K. Here's an example of a virtual memory commit request.
...
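Assuming pv points to the base of a previously reserved region, a request like the following commits 10K, which Windows rounds up to three 4K pages:
VirtualAlloc(pv, 10240, MEM_COMMIT, PAGE_READWRITE);   // 10K request -> 12K committed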
If a committed page is private and has never been accessed, it is created the first
time it's accessed as a zero-initialized page (also known as a demand zero page).
This means that each virtual memory page starts out filled with zeros.
Windows automatically writes a private committed page that has been modified to
disk as demands on system memory resources require. Windows writes committed
pages to disk through the normal modified page writing process, which moves pages
from the system working set to the modified list and then to disk. You can cause
mapped file pages to be written immediately to disk by calling the Win32
FlushViewOfFile API function. Of course, Windows will automatically reload a page
from disk into physical memory as necessary.
If you've modified a page but want Windows to treat it as though it was not modified
so that it will not be paged to disk when system memory demands dictate, you can
call VirtualAlloc with the MEM_RESET flag. This tells Windows that you don't want the
data on the page preserved if the system determines that it needs the physical
memory the page occupies to satisfy other memory requests. If you reset the
contents of a page currently in the system paging file, the page will be discarded. If
the page is in physical memory, it will be marked as not modified so that the system
can simply overwrite it if it needs that particular memory page. The next time the
page is accessed, it will be filled with zeros, as it was when it was first accessed.
Properly structured, your code can use the knowledge that a page is zeroed when
reset to detect when it needs to reload the data for the page.
Note that VirtualAlloc rounds the base address and allocation size differently when
you pass the MEM_RESET flag. Normally, it rounds the base address down to the
nearest page or allocation granularity boundary and rounds the allocation size up to
the nearest integral number of pages. However, when you pass MEM_RESET into
VirtualAlloc, it rounds the base address up to the nearest page boundary and rounds
the allocation size down to the nearest page boundary. This is done in order to keep
you from resetting a page by accident. If you want to reset a page, it must be
completely encompassed within the region you supply to VirtualAlloc; pages not
wholly contained within the region will not be reset.
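A reset request might look something like this sketch, assuming pv points to a committed, page-aligned page whose contents you no longer need preserved (the protection parameter must still be a valid value, but it is not applied when MEM_RESET is specified):
VirtualAlloc(pv, 4096, MEM_RESET, PAGE_READWRITE);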
Freeing Memory
Committed pages are either private (not shareable) or mapped to shared memory.
Once a page has been committed, an application is free to access it. It can call
VirtualProtect to change the protection attributes on the page, VirtualLock to lock
the page in physical memory, and VirtualFree to release the page.
You can call VirtualFree to release the storage that's been committed for an address
block without releasing its reservation. This is called decommitting. When you
decommit, you can specify how much of the committed region to free up. You can
also free up the address range associated with an allocation. This is called releasing.
When you release a block of virtual memory addresses, you may not specify how
much of the region to release; either the entire region is released or none of it is.
Here are some examples.
...
...
...
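Assuming pv is a char pointer to the base of a 64K reserved region whose second page has been committed, the calls might look something like this:
VirtualFree(pv + 4096, 4096, MEM_DECOMMIT);  // decommit one page; its addresses stay reserved
VirtualFree(pv, 0, MEM_RELEASE);             // release the entire region; the size must be 0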
Note that pages locked into physical memory using VirtualLock can still be paged to
disk in some circumstances. If all the threads in a process are in a wait state,
Windows is free to prune such pages from the system working set, which would
ultimately result in their being written to disk if they have been modified.
Exercises
In this next exercise, you'll explore what transpires when an application reserves
virtual memory. You'll learn about the way in which VirtualAlloc rounds allocation
requests to page boundaries, and you'll see how the system allocation granularity
affects virtual memory reservations.
#include "stdafx.h"
#include "stdio.h"
#include "windows.h"
int main(int argc, char* argv[])
{
void *pv=VirtualAlloc(
(void *)0x7FF01000,
4096,
MEM_RESERVE,
PAGE_READWRITE);
if (pv) {
MEMORY_BASIC_INFORMATION mbi;
DWORD dwLen=sizeof(MEMORY_BASIC_INFORMATION);
VirtualQueryEx(GetCurrentProcess(),pv,&mbi,dwLen);
printf("%d bytes reserved at 0x%08X.\n",
mbi.RegionSize,pv);
}
else printf("Error reserving mem %d.\n",
GetLastError());
return 0;
}
4. The app next calls VirtualQueryEx to retrieve the size of the newly reserved
region, which it then displays. We use VirtualQueryEx rather than VirtualQuery
because there are situations where VirtualQuery can return inaccurate
information on systems with huge amounts of memory.
5. Run the app and compare its output to the original VirtualAlloc request. Two
elements in the output should stand out. Your output should look something
like the following:
6. First, notice that we reserved 8192 bytes rather than 4096. Why is that? It's
because the starting address and allocation size we specified caused the
reservation to span two pages in the virtual memory address space. As I
mentioned earlier, if you specify a starting address for a reservation, it is
always rounded down to the nearest allocation granularity boundary. If the size
of the reservation causes it to span a page, the reservation is rounded up to
the next page boundary. Hence, in this case, we end up reserving two pages
instead of one.
7. Second, notice the starting address of the reservation. It differs from the one
we specified. As I've mentioned, the Windows memory manager will round
down a specified starting address so that it properly aligns with the system
page size and allocation granularity. Since this is a reservation, the starting
address is rounded down to the nearest allocation granularity. If this were
instead a commit request, it would be rounded down to the nearest page
boundary.
8. Last, take note of the region size returned by VirtualQueryEx. Even though the
system allocation granularity is 64K, we reserve only 8K. This means that the
remaining virtual memory addresses between the end of the reservation and
the next allocation granularity boundary (the 56K of address space between
0x7FF02000 and 0x7FF0FFFF) are wasted. They're not accessible because any
attempted reservation within this area will be rounded down to start at
0x7FF00000. As a rule, you should never reserve less than 64K of virtual
memory space. Doing so wastes address space without really providing any
upside. It doesn't conserve memory because you're not actually allocating
memory when you reserve it; you're merely flagging a range of addresses as
in use by the application. You can always commit individual pages within a
reserved range, so it's not as though reserving 64K means you have to also
commit 64K worth of physical storage. And keep in mind that if you run a
process out of address space (regardless of the amount of available physical
storage), your app will likely go down in flames�any new reservation request
for your process (including system-initiated reservations, such as one to
reserve a new thread stack) will fail.
In this next exercise, you'll walk through the process of reserving a region of
memory, then committing and releasing individual pages within that region. After
each step, we'll print out the status of each page in the region to verify that what we
think is happening actually is.
2. The app begins by allocating a 64K region of address space. It then commits
the second page in this space. It then decommits this page and finishes up by
releasing the entire 64K region.
#include "stdafx.h"
#include "conio.h"
#include "windows.h"
MEMORY_BASIC_INFORMATION mbi;
DWORD dwLen=sizeof(mbi);
char * pCur=pV;
while ((DWORD)pCur < ((DWORD)pV + dwRegionSize)) {
VirtualQueryEx(GetCurrentProcess(),pCur,&mbi,dwLen);
if (pv) {
DumpRegionMemoryStatus("Memory status after reserving the
region",pv,REGIONSIZE);
VirtualAlloc((void *)(pv+4096),4096,
MEM_COMMIT,PAGE_READWRITE);
DumpRegionMemoryStatus("Memory status after committing a
page",pv,REGIONSIZE);
VirtualFree((void *)(pv+4096),4096,MEM_DECOMMIT);
DumpRegionMemoryStatus("Memory status after decommitting a
page",pv,REGIONSIZE);
VirtualFree((void *)pv,0,MEM_RELEASE);
DumpRegionMemoryStatus("Memory status after releasing the
region",pv,REGIONSIZE);
}
else printf("Error reserving mem %d.\n",GetLastError());
return 0;
}
4. Run the app and study the output. Your output should look something like this:
As you can see, committing an individual page within a reserved region is fairly
trivial. Decommitting is equally simple, as is releasing the entire region.
In this next exercise, you'll learn how the PAGE_GUARD page protection attribute
works. You'll allocate a memory buffer that's initially guarded, then turn off the guard
attribute for one of its pages by attempting to lock it in memory.
1. Let's start with the source code to vm_guard, the sample app we'll use to
investigate how PAGE_GUARD works. Take a quick look at the code in Listing
4.7 and see whether you can understand how it works on first glance. I'll go
through it step-by-step in just a moment.
#include "stdafx.h"
#include "conio.h"
#include "windows.h"
if (pv) {
VirtualFree((void *)pv,0,MEM_RELEASE);
}
else printf("Error reserving/committing memory. Last error=
%d.\n",GetLastError());
return 0;
}
2. Load this code from the CH04\vm_guard subfolder on the CD and compile and
run it. Your output should look like this:
3. This code begins by allocating a virtual memory block that it protects with the
PAGE_GUARD attribute. The entirety of the block is off limits to access because
of PAGE_GUARD.
4. It then attempts to lock the first page of the block into physical memory using
VirtualLock. (I've hard-coded the page size for simplicity's sake; you should
always use GetSystemInfo to retrieve the system page size at runtime in your
own code.)
5. This first VirtualLock call has two results: it fails, and it resets the PAGE_GUARD
attribute for the first page in the region.
6. You'll note that the GetLastError output from the failed call is 0x80000001,
which is equivalent to the STATUS_GUARD_PAGE_VIOLATION return code I
mentioned earlier in the chapter.
7. Because it resets the PAGE_GUARD status for the first page of the region, the
second attempt to lock this page succeeds. This is how PAGE_GUARD works:
You get a one-shot failure mechanism that can help you detect invalid page
accesses.
NOTE: Note that VirtualQuery(Ex) always reports the page protection attributes of a
page as it was originally allocated; neither changes made with VirtualProtect nor
those made as a result of the PAGE_GUARD attribute being reset are reflected in the
output from VirtualQuery(Ex). I was surprised when I initially discovered this, but it is
consistent with the Platform SDK documentation.
Let's begin with the source code to the xproc. Based on what you've learned thus far
about virtual memory, take a quick look through the code in Listing 4.8 and see if
you can figure out how it works. I'll go through it step-by-step in just a moment.
#include <stdafx.h>
#define XP_NOERROR 0
#define XP_ERROR 1
#define MAXADDRLEN 12
#define MAXSIZELEN 12
#define MAXPROTLEN 128
#define MAXSTATELEN 20
#define MAXTYPELEN 20
#ifdef __cplusplus
extern "C" {
#endif
#ifdef __cplusplus
}
#endif
bool bByPage=false;
DWORD dwParams=srv_rpcparams(srvproc);
if (1==dwParams) {
BYTE bType;
ULONG cbMaxLen;
ULONG cbActualLen;
char szByPage[2];
BOOL fNull;
wsprintf(szColName, "Address");
srv_describe(srvproc, 1, szColName, SRV_NULLTERM, SRVCHAR,
MAXADDRLEN, SRVCHAR, 0, NULL);
wsprintf(szColName, "Size");
srv_describe(srvproc, 2, szColName, SRV_NULLTERM, SRVCHAR,
MAXSIZELEN, SRVCHAR, 0, NULL);
wsprintf(szColName, "Protection");
srv_describe(srvproc, 3, szColName, SRV_NULLTERM, SRVCHAR,
MAXPROTLEN, SRVCHAR, 0, NULL);
wsprintf(szColName, "State");
srv_describe(srvproc, 4, szColName, SRV_NULLTERM, SRVCHAR,
MAXSTATELEN, SRVCHAR, 0, NULL);
wsprintf(szColName, "Type");
srv_describe(srvproc, 5, szColName, SRV_NULLTERM, SRVCHAR,
MAXTYPELEN, SRVCHAR, 0, NULL);
char szProt[256];
char szState[256];
char szType[256];
char szBase[12];
char szSize[12];
MEMORY_BASIC_INFORMATION mbi;
int i=0;
while ((pszStart) &&
(pszStart<si.lpMaximumApplicationAddress)) {
if (szType[0]) szType[strlen(szType)-1]='\0';
else strcpy(szType,"Unknown");
return XP_NOERROR ;
}
1. Copy the binary for this xproc from CH04\xp_vmquery on the CD to the binn
folder under your SQL Server installation.
2. Install it into the master database by running this command in Query Analyzer:
sp_addextendedproc 'xp_vmquery','xp_vmquery.dll'
4. Run xp_vmquery from Query Analyzer to view the allocation information for
each region in the SQL Server process:
xp_vmquery
6. Use OSQL to run xp_vmquery with its 'P' option (page mode) in order to view
the allocation information for every page in the SQL Server process (as
opposed to each region, as in step 4). I suggest you run this via OSQL in order
to avoid running out of virtual memory in Query Analyzer as xp_vmquery will
return hundreds of thousands of rows when executed in page mode.
Every page in virtual memory is in one of three states: reserved, committed, or free.
An application can allocate specific pages in memory or can allow Windows to
choose the precise location of the pages allocated to fulfill an allocation request.
The physical storage behind virtual memory is often the system paging file, though,
of course, some pages will be backed by physical memory. Virtual memory can also
be backed by a file on disk. Windows uses this ability to share an application's
executable code between multiple instances of it. The physical storage behind the
virtual memory set aside in each process to store the application's code and data is
the EXE or DLL file itself. When an application attempts to change one of its data
pages, Windows' copy-on-write facility makes a copy of the page and instructs the
modifying process to use it instead of the original. This way, multiple instances share
as many pages as possible of the underlying executable's code and data as long as
possible.
1. True or false: Even if I reserve only a 32K virtual memory address range,
Windows still reserves a 64K range because of the system allocation
granularity.
2. What page protection attribute does Windows use to mark the end of the
committed range of a thread stack?
3. True or false: Even though a page has been locked in physical memory via a
call to VirtualLock, it can still be paged to disk if memory demands dictate.
4. What happens when a process attempts to modify a page that has been
flagged with the PAGE_WRITECOPY protection attribute?
5. True or false: SQL Server makes the majority of its memory allocations via the
system heap.
8. True or false: All virtual memory requests, regardless of whether they are user
mode or kernel mode requests, are subject to the system allocation
granularity.
11. What Win32 API function is used to release a reserved region of virtual memory
addresses?
12. True or false: The PAGE_EXECUTE and PAGE_READONLY attributes are
functionally equivalent on x86 processors.
13. True or false: An application can specify the size of a virtual memory allocation
but not the exact location; the Windows memory manager decides the precise
memory location of an allocation.
14. What flag can you pass into VirtualAlloc to tell it to reset a region of committed
memory in order to prevent that memory from being swapped to disk if
memory demands dictate?
15. True or false: You call the VirtualReserve function to reserve a region of virtual
memory addresses that you do not yet wish to commit.
16. True or false: The VirtualRelease function can release a region of reserved
virtual memory without first decommitting it.
17. What VC++ linker switch can an application developer use to adjust the
default stack size?
18. True or false: You can cause mapped file pages to be written immediately to
disk by calling the Win32 FlushViewOfFile API function.
19. True or false: Attempting to access a page with the PAGE_GUARD protection
attribute causes a STATUS_GUARD_PAGE exception to be raised and resets the
guard page protection.
20. True or false: Virtual memory that has been decommitted is still reserved until
you release it.
21. What Windows API function does SQL Server call when the set working set size
option has been enabled?
22. Is it possible to alter the amount of virtual memory reserved for a new thread's
stack via the call to CreateThread?
23. By default, how many virtual memory pages can a process lock in physical
memory?
24. True or false: The VirtualDecommit API function can be used to decommit
previously committed virtual memory pages that span allocation granularity
boundaries.
25. True or false: When committing a region of reserved memory, Windows will
round the commit request to the nearest allocation granularity boundary.
Heaps
A heap is a memory region consisting of one or more pages of reserved space that
can be suballocated into smaller pieces by the heap manager. Heaps are most useful
for allocating large numbers of similarly sized, relatively small objects and
structures. You should not use heaps for blocks of 1MB or more; use VirtualAlloc and
company for large allocations such as this.
On the plus side, heaps allow you to ignore the system's allocation granularity and
page size boundaries. On the negative side, heaps are a bit slower to access and
don't provide the same level of control that the virtual memory APIs do. For example,
you can't reserve a heap region without also committing it; VirtualAlloc is the only
Win32 allocation function that separates these two operations.
The exact algorithms used by the heap manager to commit and decommit physical
storage for heaps are undocumented and have changed between releases of
Windows. If you need precise information about and/or control over the process the
heap manager uses to manage the physical storage behind heaps, don't use heaps
in the first place�use virtual memory instead.
Default heap: the built-in heap that Windows provides every process by
default. The default process heap has a base size of 1MB, but this can be
changed via a linker switch.
Private (or custom) heap: a heap created by a process for its own private
use that is separate from the default process heap.
Key Heap APIs
Windows provides every process with a default heap. Applications use the default
heap to service allocation facilities such as malloc and the C++ new operator.
Several Win32 API functions also make use of the default heap, including the
LocalAlloc and GlobalAlloc functions carried over from 16-bit Windows.
A process's heap is 1MB in size by default but can be changed via the /HEAP linker
switch (in Visual C++; most other compilers provide a similar option) or with utilities
that can edit an executable's file header. This small size is why heaps aren't ideal for
large allocations. You should leave those to virtual memory.
By default, Windows serializes access to the process heap. This means that only one
thread at a time can access the default heap. This prevents multithreaded heap
synchronization errors and protects the heap from corruption. Note that you can
disable this synchronization for individual allocations from the heap, but this is not
generally recommended.
You cannot destroy the default process heap with a call to HeapDestroy. If you pass
the handle of the default heap into HeapDestroy, the system ignores the call. If you
want to limit the physical size of the default heap, use the /HEAP linker option.
In order to allocate memory from a heap, HeapAlloc must perform the following
steps.
1. Scan the linked list of allocated and freed blocks for the first free block that is
large enough to satisfy the request.
2. If no sufficiently large free block exists, commit additional pages of physical
storage to the heap to create one, then mark the block as allocated.
3. Add the new block to the linked list managed by the heap manager.
As its name suggests, the HEAP_ZERO_MEMORY flag causes each block of allocated
memory to be zero-filled just as virtual memory pages are zero-filled on their first
access. This can be handy for tracking down uninitialized buffer errors.
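To make the call pattern concrete, here is a minimal sketch (my own, not from the book's CD) that allocates a zero-filled block from the default process heap:
#include "windows.h"
#include "stdio.h"

int main()
{
    // Get a handle to the default process heap.
    HANDLE hHeap=GetProcessHeap();

    // Allocate a 256-byte block and ask the heap manager to zero-fill it.
    BYTE *pBuf=(BYTE *)HeapAlloc(hHeap, HEAP_ZERO_MEMORY, 256);
    if (NULL==pBuf) {
        printf("HeapAlloc failed\n");
        return 1;
    }
    printf("First byte=%d (guaranteed to be 0)\n", pBuf[0]);

    // Return the block to the heap; the default process heap itself is never destroyed.
    HeapFree(hHeap, 0, pBuf);
    return 0;
}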
Custom Heaps
An application can create a custom heap by calling the HeapCreate function. There
are several good reasons for creating your own custom heap, including the following.
Proximity allocations - by allocating things close to each other, you lower the
possibility that the system will thrash when iterating through a list of memory
objects.
When you create a custom heap, you can specify the HEAP_NO_SERIALIZE or
HEAP_GENERATE_EXCEPTIONS flags or a combination of the two. As I mentioned
earlier, HEAP_NO_SERIALIZE disables serialized access to the heap. Thread
serialization is enabled by default. I'll cover this more below, but, generally
speaking, you should not use this option unless you're absolutely sure you do not
need thread synchronization when accessing your heap.
The third parameter to HeapCreate specifies the heap's maximum size. Specify 0 if
you want to create a heap with no fixed size limit.
You can use HeapDestroy to destroy a custom heap. If you fail to destroy a custom
heap, it remains in memory until the process terminates.
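Here is a minimal sketch (mine, not code from the CD) that creates a growable private heap, suballocates from it, and then destroys it:
#include "windows.h"
#include "stdio.h"

int main()
{
    // Create a private heap: 64K initial size, no maximum (0 = growable),
    // with the default serialized access.
    HANDLE hHeap=HeapCreate(0, 0x10000, 0);
    if (NULL==hHeap) return 1;

    // Suballocate a small array of integers from the private heap.
    int *pValues=(int *)HeapAlloc(hHeap, HEAP_ZERO_MEMORY, 100*sizeof(int));
    if (NULL!=pValues) {
        pValues[0]=42;
        printf("First value=%d\n", pValues[0]);
        HeapFree(hHeap, 0, pValues);
    }

    // Destroying the heap releases all of its allocations at once.
    HeapDestroy(hHeap);
    return 0;
}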
Heap Serialization
When you create a custom heap, the HEAP_NO_SERIALIZE flag controls whether
access to the heap is automatically serialized. When an app has more than one
thread accessing a heap whose synchronization has been disabled, multiple threads
can simultaneously grab the same memory block, and you have a veritable time
bomb waiting to go off at the most inopportune moment. The fact that you've got
such a serious bug may not be immediately evident. For example, you may not see a
problem until your app is executed on a multiprocessor machine or on a machine
with a much faster processor than your development machine. Thread
synchronization errors are notoriously difficult to track down because they are
almost always timing related. The very act of stepping through your code under a
debugger can make them seem to go away.
For example, one thread might free a block that other threads are still using. These
threads then overwrite unallocated memory, which, in turn, corrupts the heap.
It is generally safe to disable heap serialization only in situations like the following.
Your process has multiple threads, but only one of them accesses the heap.
Your process is multithreaded and multiple threads access the heap, but your
app handles serializing their heap access itself.
Exercises
You can override the C++ new and delete operators in order to allocate objects from
custom heaps. When a C++ compiler encounters a call to new, it checks to see
whether the class has overloaded the new operator. If it has, the compiler generates
a call to that function rather than generating code that allocates the object on the
default heap. You can use operator overloading to cause new to use any memory
allocation facility you choose. The code in the exercise below overloads new and
delete to allocate objects on a custom heap.
1. Load the code shown in Listing 4.9 from the CH04\heapnew subfolder on the
book's CD or type it into the VC++ environment, then compile and run it.
#include "stdafx.h"
#include "windows.h"
class CMemObj {
public:
static HANDLE s_hPrivateHeap;
static DWORD s_dwBlocks;
void* operator new (size_t size);
void operator delete(void *p);
};
HANDLE CMemObj::s_hPrivateHeap=NULL;
DWORD CMemObj::s_dwBlocks=0;
//Delete the first object from the heap (this causes the heap
//to be released)
delete pMO;
printf("Custom heap after second delete: %d block(s),
handle=0x%08x\n",CMemObj::s_dwBlocks,
CMemObj::s_hPrivateHeap);
return 1;
}
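The listing as reproduced here elides the bodies of the overloaded operators. A sketch of what they might look like, assuming the private heap is created lazily on the first allocation and destroyed when the last block is freed (my reconstruction, not the code from the CD), follows:
void* CMemObj::operator new(size_t size)
{
    // Create the shared private heap on the first allocation.
    if (NULL==CMemObj::s_hPrivateHeap)
        CMemObj::s_hPrivateHeap=HeapCreate(0, 0x1000, 0);
    void *p=HeapAlloc(CMemObj::s_hPrivateHeap, HEAP_ZERO_MEMORY, size);
    if (NULL!=p) CMemObj::s_dwBlocks++;
    return p;
}

void CMemObj::operator delete(void *p)
{
    if (HeapFree(CMemObj::s_hPrivateHeap, 0, p)) CMemObj::s_dwBlocks--;
    // Release the heap itself once the last block has been freed.
    if (0==CMemObj::s_dwBlocks) {
        HeapDestroy(CMemObj::s_hPrivateHeap);
        CMemObj::s_hPrivateHeap=NULL;
    }
}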
2. Run this application and observe its output. Your output should look something
like this:
5. You may be wondering why the code uses static members for the private heap
handle and the block counter. The reason for this is that we want all instances
of CMemObj to share the same private heap. If these members were not
declared as static, each CMemObj would get its own private heap - not only
wasteful but also illogical. The whole point of the design is to allocate
CMemObj instances from a common private heap. Because all the objects will
be the same size and are relatively small, this is an efficient and logical use of
a heap.
6. You may also be wondering why the code uses a separate member variable to
track the number of allocations from the private heap. After all, couldn't we get
this same information from the HeapWalk API function without requiring a
separate member variable? Yes, we certainly could; however, this would be
terribly inefficient. For every allocation or deallocation, we'd have to walk the
entirety of the heap and count up the blocks that make it up, being careful to
skip those allocations that are for maintenance of the heap itself (the heap's
overhead). In an app that made a large number of allocations, this would
negatively affect performance and would likely have a detrimental impact on
CPU use.
In this next exercise, you'll use an extended procedure to allocate custom heaps
within SQL Server. You'll store some data in these heaps, then return it via a query.
Because you will be attaching with a debugger, you should work through this
exercise only on a test or development machine, and, ideally, you should be its only
user.
3. Attach to SQL Server with WinDbg. When the WinDbg command prompt
displays, type !heap to display a list of the heaps currently allocated by the
process. You will likely see quite a few entries in this list, perhaps as many as
20 or 30. Take note of the exact number, then type g and hit Enter to allow SQL
Server to continue running.
4. Now load arrays.sql from the CD's CH04\xp_array subfolder into Query
Analyzer and run it. This will install a number of user-defined functions that
make calling the xprocs you've just installed very easy.
5. Next, load leapheap.sql from CH04\xp_array and run it. leapheap.sql will create
a heap-based array using the xprocs you installed earlier and will then load
into it a couple of columns from the Northwind..Orders table.
6. Return to the debugger and press Ctrl+Break to stop execution, then type
!heap at the command prompt to again list the heaps that have been allocated
within the SQL Server process. This list should match the one you saw earlier
because, by default, the xp_array code makes its allocations from the default
process heap - it doesn't create a private heap.
8. Edit leapheap.sql and change the fn_createarray call so that it passes 1 as its
second parameter:
SET @hdl=fn_createarray(1000, 1)
This will cause the xp_array code to allocate its own heap when fn_createarray
is first called. Note: You should never use this option when calling the xp_array
code from multiple connections. Always use the default system heap when
there is a possibility that multiple worker threads may call into xp_array
simultaneously.
9. Use the mouse to select the entirety of the script text in Query Analyzer up to
(but not including) the call to fn_destroyarray, then press F5 to run it. By not
including the call to fn_destroyarray in the call to the server, we'll leave the
array and its heap in memory for now. Congratulations, you've just created
your own private heap within the SQL Server process space!
10. Return to the debugger, press Ctrl+Break, and run !heap again in the
command window. You should see that a new heap has shown up in the list.
11. Next, run !heap 0 to list segment information for each heap. A heap can
consist of up to 64 separate segments. Each time Windows needs to grow a
heap, it will allocate a new segment for it. The last heap in the list (the one
you just created) should have just one segment.
12. To verify that this is our heap, let's search for some of its data. The segment
information for your new heap should list its starting and ending addresses. It
should look something like this:
23: 10010000
Segment at 10010000 to 10020000 (00008000 bytes committed)
The two numbers shown (10010000 and 10020000 in this example) are the segment's starting and ending addresses.
13. Use the starting and ending addresses to search for the entry in the array
corresponding to Northwind CustomerId "TOMSP", like this:
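Using the example addresses above, the search command looks something like this (your addresses will differ):
s -a 10010000 10020000 'TOMSP'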
The WinDbg s command searches a region of memory for a data value. Its -a
parameter tells it to search for an ANSI string. The two memory addresses
indicate the beginning and ending of the search range, and, of course, the
character string in single quotes specifies the value we want to search for.
Once you run this, you should see displayed the memory address at which this
value resides in your heap. Though this isn't conclusive, it's a pretty good
indication that this is the heap our xprocs used. Type g and press Enter to allow
SQL Server to continue to run.
14. Return to Query Analyzer and load leakheaps.sql from the CH04\xp_array
subfolder on the CD and run it. This will cause 128 new heaps to be created
within the SQL Server process.
15. Return to WinDbg and press Ctrl+Break to stop the SQL Server process. Run
!heap again at the command prompt. You should see your new heaps in the
heap list.
There is no functional limit to the number of heaps you can allocate within a
process. I can't think of a practical reason to allocate as many as we've
allocated in this exercise, but be aware that it is technically possible. When you
need to dig into what heaps have been allocated within a process and what
they contain, WinDbg's !heap is a good way to start.
16. Type q and press Enter to stop debugging SQL Server, then exit the debugger.
Heap Recap
A heap is a block of memory that's made up of one or more pages of reserved space
that is suballocated by the heap manager. A heap is most useful for allocating
similarly sized, relatively small objects and structures. A heap should not be used to
allocate blocks of 1MB or more in size; virtual memory functions such as VirtualAlloc
should be used instead.
An application can create custom heaps as necessary. One key decision in creating a
custom heap is whether to have the heap manager serialize access to the heap. If
an app has multiple threads and these threads make their own allocations from the
heap, access to it must be serialized in order to prevent the heap from becoming
corrupted. If an app is single-threaded or provides its own thread synchronization
mechanisms, it may be able to safely disable heap serialization on custom heaps.
2. True or false: Before destroying a heap with HeapDestroy, a process should use
HeapFree to free any allocations it has made from it.
3. Are allocations from a heap subject to the limitations imposed by the system
page size and the system allocation granularity?
7. True or false: Because pointers may already reference a block of heap memory
allocated via HeapAlloc, the heap manager will not move a heap block once it
has been allocated.
8. What maximum size should you specify to HeapCreate if you want to create a
heap that automatically grows as allocations are made from it?
9. What Win32 API function returns a handle to the default process heap?
10. True or false: Several Win32 API functions make use of the default process
heap.
11. What flag should you pass into HeapAlloc if you want it to zero-fill a newly
allocated page?
12. True or false: If you fail to destroy a custom heap, it remains in memory until
the process terminates.
14. True or false: Attempting to destroy the default process heap will cause an
access violation.
16. True or false: By default, Windows does not serialize access to the default
process heap, but you can create a private heap and enable serialization if
your app requires it.
18. Can the Perfmon tool monitor the system paging file size?
19. True or false: STATUS_NO_MEMORY is an exception that the heap manager may
raise in certain circumstances.
Shared Memory
In this section, we'll discuss Windows' shared memory facilities. Shared memory is
memory that is visible to multiple processes or that is present in the virtual address
space of multiple processes. It's the tool of choice when you need to rapidly
exchange data between multiple processes. Because data is exchanged via shared
virtual memory pages, each process that accesses it must already know how to
interpret it and how to work with it. Unlike data that comes in over a TCP/IP socket
or, say, through a Windows message, data accessed via shared memory consists of
committed pages within a process's address space. The process must know
something about what these pages should contain in order to make use of their
data.
Memory-mapped file - a file on disk that has been mapped into virtual
memory such that it serves as the physical storage for the virtual memory.
MapViewOfFile - Maps a view of a file into memory such that the file serves as the
physical storage for the memory. The file can be a file on disk or
the system paging file.
Shared memory is used in a number of places within SQL Server. A prime example of
this is the shared memory Net-Library. When a client application resides on the same
machine as SQL Server, it can connect to the server using the shared memory Net-
Library, for example, by specifying the server name as . (a period) or (local) or by
prefixing the server\instance name with lpc:. This means that, rather than
communicating using a protocol such as TCP/IP or Named Pipes and the full network
stack, the client and server use a simple shared memory buffer to exchange data.
Because both processes are running on the same machine, this is not only sensible
but also far more efficient than using the network stack.
You may be wondering how access to this shared memory area is coordinated - that
is, how can we keep the server from reading client-side data before it's ready and
vice versa? This is handled using a named event object. Think back to our discussion
of event objects in Chapter 3. In order to synchronize access to the memory area
used by the shared memory Net-Library, the client and server signal an event object
to tell the other party when it can safely access the buffer. So, the server enters a
wait state by calling WaitForSingleObject on this event object, and the client signals
the event when it's ready for the server to access the buffer. The client then calls
WaitForSingleObject and waits on the event object itself. When the server finishes its
work in the buffer, it signals the event, and the client takes over again. This process
continues as long as the client remains connected to the server.
Section Objects
You can connect a section object to an open disk file to create a mapped file or to
committed virtual memory in order to set up shared memory. A section object that is
committed to virtual memory is considered "paging file backed" because its pages
can be written to the system paging file as necessary. Keep in mind that because
Windows can run without a paging file, these pages might instead be backed by
physical memory.
Memory-Mapped Files
Windows provides a set of API functions that support mapping a file to a region of
virtual addresses. An application can use these functions to conveniently perform file
I/O by making a file appear in the virtual address space as memory. To manipulate
the file, the application simply reads and writes the virtual memory associated with
it.
3. Call MapViewOfFile to map the file into the process's address space. The
pointer returned by MapViewOfFile is the starting address of the memory
region that's mapped to the file.
If the application doesn't want to map a file on disk in order to perform I/O on it but
is interested only in setting up a shared memory region, the app follows just two
required steps.
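As an illustration of those two steps, here's a minimal sketch (my own; the object name is arbitrary) that creates a paging file-backed shared memory region and maps a view of it:
#include "windows.h"
#include "stdio.h"

int main()
{
    // Step 1: create a named file-mapping object backed by the system paging
    // file (pass INVALID_HANDLE_VALUE instead of a real file handle), 4K in size.
    HANDLE hMap=CreateFileMapping(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
        0, 4096, TEXT("MySharedRegion"));
    if (NULL==hMap) return 1;

    // Step 2: map a view of the object into this process's address space.
    LPVOID pView=MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, 0);
    if (NULL==pView) { CloseHandle(hMap); return 1; }

    // Any other process that opens "MySharedRegion" and maps a view sees these bytes.
    *(DWORD *)pView=0xCAFE;
    printf("Wrote %x to the shared memory region\n", *(DWORD *)pView);

    UnmapViewOfFile(pView);
    CloseHandle(hMap);
    return 0;
}
A second process would call CreateFileMapping (or OpenFileMapping) with the same name, then call MapViewOfFile to see the same pages.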
An executable image (an EXE or DLL file) that serves as the physical storage for a
region of virtual addresses is a type of memory-mapped file. Just as an application
can connect a region of addresses with a file on disk, Windows uses its own file-
mapping facility to make executable images easy to load and process.
Note that some types of media require Windows to copy the entirety of an
executable image into virtual memory rather than allowing it to reside on disk. An
image that's loaded from a floppy disk will be copied in its entirety into virtual
memory. This is done so that setup programs loaded from floppy can continue to run
even after the floppy has been swapped for another during the setup process.
Loading an image from other types of media, such as a CD-ROM or a network drive,
does not cause Windows to copy the image into virtual memory unless it was linked
with the /SWAPRUN:CD or /SWAPRUN:NET switch, respectively.
Exercises
In the next exercise, we'll use shared memory to share data between multiple
processes, and we'll use a synchronization object to make access to this data thread-
safe. You'll create a single application, then spawn multiple instances of it to see how
processes can share data between them by using shared memory.
3. Type Y a few times to allow the app to continue its modification loop. You'll see
that it retrieves an integer from shared memory, then increments and assigns
it back to shared memory.
4. After a few iterations of this, leave sharedmem_client.exe running and return to
Explorer. Double-click sharedmem_client.exe a second time in Explorer to start
a second instance of it. You'll notice that it appears to hang.
5. Return to the first instance of the app and type Y to allow it to continue. It will
now appear to hang.
6. Return to the second instance and you'll see that it finally began to run. What's
happening here is that the app uses a named event object to synchronize
access to the shared memory area. This means that only one instance of the
app can modify it at a time. When one app has control of the shared memory
area, the other must wait for it to complete before continuing. You'll notice that
each execution of the modification loop, regardless of which process it is,
increments the integer by 1.
For curious readers, Listing 4.10 shows the code for sharedmem_client.exe.
#include "stdafx.h"
#include "windows.h"
#include "conio.h"
if (NULL==hEvent) return 0;  // CreateEvent returns NULL on failure, not INVALID_HANDLE_VALUE
LPVOID lpSharedMemory;
DWORD dwValue;
HANDLE hMapFile;
if (NULL == hMapFile)
{
    printf("Could not create file mapping object. Last error=%d\n",
        GetLastError());
    return 1;
}
if (NULL == lpSharedMemory)
{
    printf("Could not map view of file. Last error=%d\n",
        GetLastError());
    return 1;
}
__try
{
char ch='N';
do {
//Wait on the object to be signaled
//Since it's an auto-reset event, this also resets it
WaitForSingleObject(hEvent,INFINITE);
printf("Continue? ");
ch=getche();
return 0;
}
10. We next create a named file-mapping object that's backed by the system
paging file. As with the event object, every process that attempts to create this
named object will get a handle to the same kernel object if it already exists.
This means that multiple instances of this executable will refer to the same file-
mapping object and will thus use the same shared memory area.
11. Once we've created the file-mapping object, we set up the shared memory
itself through the call to MapViewOfFile. The pointer returned by this function is
the start of the shared memory area.
12. We then call WaitForSingleObject to wait on the event. Since the event was
created in a signaled state (see the third parameter to CreateEvent),
WaitForSingleObject will return immediately when it is called by the first
instance of sharedmem_client.exe. Because the event was created as an auto-
reset event (see CreateEvent's second parameter), the event will immediately
return to a nonsignaled state as soon as WaitForSingleObject successfully waits
on it. This keeps multiple instances of the process from gaining access to the
shared memory region simultaneously. Each instance must wait until the first
one to successfully wait on the event signals it.
13. We next retrieve the value from shared memory by simply dereferencing and
casting the first DWORD in the buffer. We then increment the private copy of
the integer and assign it back to the shared memory buffer.
14. We finish up by asking the user whether we should continue in the update
loop. If the user types Y, we attempt to go through another round of
incrementing and printing the first DWORD in the shared memory area. Once
the user responds, we immediately signal the event, regardless of the
response. If another instance of the app is waiting on the event, it will be
allowed to run before we can cycle back through the loop. We will then wait
until it finishes with the shared memory area and signals the event. This
continues perpetually until all client instances have been terminated.
NOTE: I wouldn't normally recommend that you code apps so that they hang on to a
kernel object while they wait on user input because this could block other apps
needing access to the object indefinitely. I've coded this example to do so in order to
make the progression of events easier to follow.
In this last exercise, we'll create a shared memory object that will be accessed by a
couple of processes, then we'll view this object using WinObj, a tool from the
Platform SDK.
The app we'll be using to create a shared memory object we can inspect with WinObj
is called SuperRecorder. It is an app I originally wrote about ten years ago and have
enhanced a few times since then. To understand how and why it uses shared
memory, let me give you some of its history and discuss how it evolved over time.
Some of you old-timers may remember the Windows 3.x Recorder accessory.
Recorder allowed you to record mouse and keyboard events as a macro and play
back that macro in any app with a keystroke combination. The ability to record both
keyboard and mouse events was unusual at the time, and, being the keyboard-
centric guy that I am, I used the tool very heavily and had numerous macros defined
on my Win 3.x systems.
I had built several programmer's editors in the 1980s that included macro
functionality of varying levels of sophistication (e.g., Cheetah, TurboEdit, TEdit, and
so on), so I had a pretty good idea of what I wanted to do and how I would handle
the mechanics of storing lists of input events. The question was how to do it within
the Windows environment.
I began researching this and found that recording and playing back Windows
messages, as I'd originally thought I might do, would not work reliably. Posting a
WM_KEYDOWN for VK_SHIFT followed by VK_A didn't necessarily result in a capital A
being typed into the current app. This was before tools like VB's SendKeys function
existed (and that's a keyboard-only mechanism anyway), so I was left to find another
way.
I finally discovered how Recorder itself had pulled off its magic - it was using a
combination of global system hooks (to trap keyboard and mouse events) and the
journal record/playback API functions. The global system hooks allowed it to capture
keyboard and mouse events in a manner similar to an interrupt service routine in
DOS that some of my programmer's editors had used. The journaling API functions
allowed Recorder to reliably record and play back both mouse and keyboard events.
I decided to take this same approach in my component. I began by building a DLL
(system hooks require a DLL to host their callback function), then wrapped the
functions exposed by the DLL in a component.
I called this library/component combo WinMacro and sold it for a few years over
CompuServe and other venues before the World Wide Web really took off. It allowed
a Windows developer to drop a component into an app and instantly have all the
functionality of the Recorder accessory programmatically available in the app. The
app could start or stop recording, record mouse as well as keyboard events, and play
them back based on a mouse/keyboard event or through an API call. I used the
facility in several of my own apps (e.g., the DB-Library and ODBC versions of my
Sequin SQL Editor for Windows app) and even gave a talk at a developer conference
on how I'd built the library and how it worked internally. Life was good.
Then came 32-bit Windows. On testing the WinMacro demo app I'd built,
SuperRecorder, on the first version of Windows NT (Windows NT 3.1), I discovered
that, while macros recorded in a process would play back in that process, other
processes were completely unaware of them. One of the niftier features of WinMacro
was that you could record a macro in, say, Notepad, then go to Word and play it
back. This allowed you to create global, system-wide macros that had the same
hotkeys and played back the same regardless of the app. The same functionality had
been available in the original Recorder accessory. However, try as I might, I could
not get it to work in my initial tests on the pubescent Windows NT.
I looked into this a bit and discovered why WinMacro macros recorded in one process
would not play back in another. Win32's process isolation was preventing the linked
list of macros I'd recorded in one process from being visible to other processes. I had
designed the original WinMacro to depend on the inherent process memory sharing
in Windows 3.x, which was, by design, no longer there in 32-bit Windows.
(I also discovered that the Recorder accessory had vanished in 32-bit Windows - it
was nowhere to be found. I have always suspected that this was because it worked
the same way SuperRecorder did and would have required a rewrite in order to run
on 32-bit Windows.)
Clearly, the design I'd used for 16-bit Windows wasn't going to work on Win32, so, in
about 1994, I set out to rewrite WinMacro for Win32. I called the new version
WinMac32.
For WinMac32, the fundamental problem I had to solve was how to make linked lists
that may have been allocated in one process visible to other processes. I soon
discovered Win32's shared memory facilities and designed WinMac32 to use shared
memory to store its linked list of macros and keyboard/mouse events. This, coupled
with system hooks and DLL injection, allowed me to provide the same functionality
to Win32 apps that I'd originally provided Win 3.x apps. WinMac32 has shipped on
the CD with a couple of my other books, and the full source to it (along with the old
Win 3.x source) is included on the CD accompanying this book. The library's
SendKeys and AppActivate routines, which do not actually use the engine itself, have
been shipped with Borland's Delphi product since version 4.0.
For purposes of this exercise, we'll start the WinMac32 demo app, SuperRecorder,
and record a macro, then we'll go to Notepad and play that macro back. While the
two apps are running, we'll check for the named WinMac32 shared memory object
using WinObj.
3. Type anything you want in the scratch area. Feel free to use the mouse to
select some of the text you type, then cut, copy, and paste it.
5. Now, start Notepad and press the backquote key. You should see your keyboard
and mouse events played back.
6. Start WinObj (from the Platform SDK) and click the node in the tree labeled
BaseNamedObjects.
7. Scroll the list of named objects to the right until you find an object named
WinMac32SharedData. Double-click this object to display its properties. You'll
see that, among other things, WinObj knows that it's a section (shared
memory) object. This is the shared memory object that WinMac32 uses to store
its shared data structures and is the means by which memory allocations made
by one process (macro recordings) can be accessed by another.
8. While we're at it, you can use the TList utility to check out the DLL injection
technique that WinMac32 uses to insert itself into processes as you type or use
the mouse. Given that it has to set system hooks (using the
SetWindowsHookEx API) to grab both the keyboard and mouse events before a
process sees them, WinMac32 will actually cause itself to be loaded into every
process in which you type or use the mouse. While SuperRecorder and
Notepad are still running, run TList from the command prompt to list the
process IDs for all processes.
9. Take the IDs returned for SuperRecorder and Notepad and pass each of them
into TList separately to list the modules loaded into each process. You'll notice
that WinMac32 is loaded into the SuperRecorder process space. This is no big
surprise given that SuperRecorder is a demo app for the library. But you'll also
notice that WinMac32 is loaded into Notepad's process space. Now, we know
that Notepad didn't explicitly load WinMac32 or reference it via an implicit
import, so how did WinMac32 get loaded into the process? Through DLL
injection. Because the hooks that WinMac32 sets are system-wide and because
the callback code to service those hooks lives in WinMac32.DLL itself, Windows
loads the DLL into every process where the hook code will need to
execute - that is, into every process where you type or use the mouse. In fact,
if you use TList to check the module list for the WinObj tool you were just
running, you'll find WinMac32.DLL loaded there as well because you've typed
or used the mouse in it since SuperRecorder was started.
10. If you now close the SuperRecorder application, you'll see that not only does
this behavior stop (you won't see WinMac32.DLL injected into any new
processes) but WinMac32.DLL is also unloaded from all existing processes. I
coded it this way so that you could easily turn off the macro facility by closing
the recorder app. Although, technically, you can record in any app and play
back in any other, I coded the demo app so that closing it turns off the macro
engine and unloads it across the system. If I hadn't done this, you'd have had
to close every process in which WinMac32.DLL had been injected in order to
unload it completely from memory. DLL injection has its pros and cons. One of
the cons is that it can spread like a plague throughout a system, and I didn't
believe that users would want to have to forcibly eradicate it from all running
processes in order to disable it.
WinMac32 makes use of several Win32 API functions that you may want to explore
further. It sets four system-wide hooks using SetWindowsHookEx (a keyboard hook, a
mouse hook, a journal record hook, and a journal playback hook). It installs a
message handler for interprocess communication (with CreateWindow) and creates a
shared memory area backed by the system paging file using the CreateFileMapping
and MapViewOfFile API functions.
The journaling API functions are how we trap keyboard and mouse events in a
process and play them back reliably. They necessarily interact at a very low level
with the Windows input subsystems, and I've seen them cause blue screens on
systems with faulty keyboard or mouse drivers.
Other than the occasional recompile, I haven't upgraded WinMac32 in years, and I
noticed that it became somewhat unstable with the advent of Service Pack 3 for
Windows NT 4. I haven't taken the time to investigate what changed in this service
pack and may not do so anytime soon given that I no longer sell WinMac32. One of
the advantages of giving away software free of charge is that you can decide how
much time you have to spend supporting it. I haven't really had time to support any
of my freeware in years, and I doubt that will change anytime in the near future.
If you want to compile WinMac32 yourself, you'll need Delphi 2.0 or later. To compile
WinMacro (the Windows 3.x version), you'll need Delphi 1.0. For the embedded
assembly language, you'll need an assembler compatible with TASM 1.5 or later
(MASM will probably do). For more recent versions of Delphi, you can just use the
embedded assembler - no need for separate assembly.
Note that you can also view section objects using WinDbg's !handle command. To do
so, follow these steps.
1. Execute !handle at the WinDbg command prompt to get a list of all the
process-local handles in the process's handle table. Running !handle without
parameters causes it to list every handle (and its object type) in the process's
handle table.
2. Find the handles in the list identified as section objects. Run !handle again, this
time specifying the handle number of each section object handle you want to
view. You can also pass an option mask to !handle that tells it what information
you'd like listed for the handle. For example, say that you want to list
information for handle number a3c, a section object. You could do something
like this:
!handle a3c 4
In this example, a3c is the handle number, and 4 is the option mask. Passing 4 to
!handle tells it to list the name of the object if there is one.
Windows provides a rich set of facilities for allocating and working with shared
memory. Shared memory and memory-mapped files are used throughout the
operating system itself; using them is as easy as using any other type of memory.
Sharing data between processes or mapping a file into virtual memory is relatively
painless and quite powerful.
2. True or false: The system paging file can be used as the physical storage for
shared memory.
3. Does Windows zero-fill shared memory pages on first access in the same way
that it zero-fills private committed pages?
4. When a client connects to SQL Server using shared memory, what type of
kernel object do SQL Server and the client use to coordinate access to the
shared memory area?
5. What Win32 API function is used to write the modified pages in a mapped file
immediately to disk?
7. True or false: When Windows loads an executable into virtual memory from a
hard drive, it uses the system paging file to provide the physical storage for
the executable's code and data.
8. What Windows API function actually provides the pointer to a shared memory
area?
9. True or false: Though shared memory is slower than exchanging data using the
network stack, it is still the tool of choice when you want to exchange data
reliably.
10. True or false: Although it is used to implement shared memory, a section object
does not necessarily equate to shared memory because it can also be used by
a single process to map a file into virtual memory.
11. True or false: In order to initialize a shared memory area for use, an application
must first call the Win32 API function AllocateUserPhysicalPages, then call
MapUserPhysicalPages to map a view of the shared memory into the virtual
address space.
12. What kernel object is responsible for implementing shared memory and
memory-mapped files?
13. What WinDbg command can you use to display information about a kernel
object that corresponds to a shared memory region?
14. Assume that I'm connecting to a SQL Server named khen\ss2k_sp4 via an
instance of Query Analyzer that's running on the same machine as the server.
What four-character prefix can I use with my server name to force Query
Analyzer to attempt to connect using the shared memory Net-Library?
Chapter 5. I/O Fundamentals
Free thought is the repudiation of all coercion of authority or tradition in
philosophy, theology, and ideology. It is the commitment to the theory that the
power of cultural institutions can be morally exercised only when that power is
limited by guaranteed and protected civil liberties possessed equally by all
citizens.
-Richard Bozarth[1]
I have always believed that the best way to learn about a technology is to use it to
build things. Every computer technology I've ever learned - from hardware to
operating systems to programming languages to applications and tools - I've learned
by getting my hands dirty. There is no substitute for practical experience; there's no
better way to get your mind around how something works than to see it for yourself.
I still remember the first time I experienced the wonderment that comes from
experimenting with computing technology and seeing it come to life before your
very eyes. It was a Saturday afternoon, and I had sat down at a computer terminal
with the goal of learning my first programming language. After about five minutes of
reading through syntax diagrams and some exceedingly dull prose in the language
manual, I decided just to build an application myself and see what would happen. It
would be my very first program.
Most people build something along the lines of a "Hello World" app their first time
out with a new language. Not me. For me, the power of the computer was in its
ability to do repetitive tasks extremely fast, not in its ability to produce output.
Humans can produce output - carbon dioxide, cave drawings, the Mona Lisa, this
book - and they don't need computers to do it. For me, the most intriguing aspect of
the computer was its ability to carry out logic-based tasks faster and more precisely
than I or any other human could ever hope to. For me, the most useful innovation in
computing was the concept of the logical loop - it was to computing what the
invention of the wheel had been to mankind. I could see all sorts of seemingly
complex problems suddenly becoming readily solvable because of the computer's
ability to precisely repeat a given task over and over until a logical condition became
true or some type of resolution was reached.
So, my first program was an app that accepted as input an integer and produced the
factors of the number as output. The eureka feeling I experienced when I hit the
submit key and watched the output of my app appear magically on the CRT is almost
beyond description, and it is still vivid in my mind to this day. This has been my
standard approach to learning new technologies for over two decades now, and I
never grow tired of the profound satisfaction that comes from seeing my handiwork
take flight with the press of a key or the click of a mouse.
In this chapter, we'll investigate Windows' I/O facilities using this hands-on approach.
We'll talk about conceptual definitions and how things work architecturally, then
we'll dive in and write some code. Before we're done, you'll have a good grasp of
how Windows I/O facilities work and how applications such as SQL Server can make
use of them.
I/O Basics
We'll begin our exploration of Windows I/O with the basics. We'll talk about key I/O
terms and concepts, then delve into how Windows I/O works from an architectural
standpoint.
File system - the overall structure in which files are named, stored, and
organized in an operating system. File systems consist of drives, files,
directories, and the metadata needed to locate and access them. A file system
is not only the logical representation of the machine's secondary storage but
also the part of the operating system responsible for translating application file
operation requests into low-level, sector-oriented calls into the device drivers
that control the disk drives.
A file system's format defines the way in which file data is stored and directly
affects the file system's features. For example, a format that doesn't allow files
to be larger than 2GB obviously limits the file sizes the file system can support.
The Windows NT family supports several file systems (e.g., FAT16, FAT32, NTFS,
and so on), but NTFS is its native file system. NTFS provides more features and
performs more efficiently than any other file system Windows supports. Cluster
indexes are 64 bits wide in NTFS, so it can theoretically address up to 16EB (16
exabytes, or 16 billion GB) of disk space. However, since Windows limits NTFS
volume sizes so that they remain addressable with 32-bit cluster indexes, the
largest an NTFS volume can be is 256TB (using 64KB clusters).
File object - the kernel object used to access files and devices under
Windows. Like other types of kernel objects, file objects are system resources
that multiple processes can share, that can be named, that support
synchronization (i.e., the notion of being signaled or unsignaled), and that can
be protected by object-based security.
Synchronous I/O - causes a thread to pause until a pending I/O operation
completes.
CreateFile - Creates or opens a file; can also be used to create or open other types
of objects
CloseHandle - Closes the handle associated with a kernel object (e.g., a file)
Perfmon
pmon
TaskMgr
Perfmon is the undisputed king of I/O-related information under Windows. There are
too many counters to go into here - just understand that you can retrieve a veritable
treasure trove of I/O-related information from Perfmon. Table 5.3 lists a few of the
more useful I/O-related performance counters.
Physical Disk: % Disk Read Time - The percentage of elapsed time spent servicing
read requests
Physical Disk: % Disk Write Time - The percentage of elapsed time spent servicing
write requests
Physical Disk: Avg. Disk Queue Length - The average number of read and write
requests queued to the disk during a sample interval
Cache: Lazy Write Flushes/sec - The rate at which the lazywriter thread is flushing
data to disk
Cache: Lazy Write Pages/sec - The rate at which the lazywriter thread is writing
pages to disk
Overview
In Windows, you perform most I/O operations via kernel file objects. File objects are
unusual in that they are not strictly memory constructs. They are memory objects
that provide access to the actual resource, which is on disk. Unlike events,
semaphores, and other types of kernel objects, file objects do not manage resources
that reside only in memory - they provide a means of interacting with resources that
reside primarily outside of memory.
If you've done any Windows programming, you may have noticed that the SDK
documentation recommends that you use CreateFile rather than OpenFile when
opening a file. (OpenFile has been deprecated and is for backward compatibility with
16-bit apps only.) Why is this? Does it not seem a little counterintuitive to have to
create something when you only want to open it? What if the file already exists?
Would this overwrite it? No, it wouldn't�not if you call it properly. Here's why: You
use CreateFile rather than OpenFile to open a file because you are creating a kernel
object, not because you are (necessarily) creating a file on disk. What you're
creating is a kernel object that will allow you to read from or write to the file using
other Win32 API functions. CreateFile creates a kernel file object and returns a
process-local handle to it. Whether it attempts to create a physical file on disk
depends entirely on the parameters you pass into it. It can open an existing file, or it
can create a new one - it all depends on how you call it.
Because it's not the actual resource it manages but only an in-memory
representation of it, a file object contains only data that is specific to a particular
object handle. The file itself contains the shareable data. When a thread opens a file
handle, Windows creates a new file object with its own set of handle-based
attributes. So, even though multiple file objects may reference the same file, each
one contains data that is specific to its handle (e.g., the current offset in the file
where the next I/O operation will occur).
Each file object is process-unique unless a process duplicates a file handle in another
process via a call to the DuplicateHandle API function. In other words, even though,
as with other kernel objects, file objects can obviously have names, two processes
that open a file object with the same name will get two different objects. This differs
from other types of kernel objects and is the result of the resource itself residing on
disk rather than in memory.
Even though a file handle is process-local, the file itself isn't; therefore, threads must
synchronize their access to it, just as they must for any other shared resource. We
wouldn't want one thread writing to a file while another is trying to read it, for
example. A thread that intends to write to a file should either open the file with
exclusive write access or use the LockFile Win32 API function to block other threads
from accessing it while the writes are occurring.
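A quick sketch of the LockFile approach (my example; the file name and sizes are arbitrary):
#include "windows.h"

int main()
{
    HANDLE hFile=CreateFile("shared.dat", GENERIC_READ | GENERIC_WRITE,
        FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_ALWAYS,
        FILE_ATTRIBUTE_NORMAL, NULL);
    if (INVALID_HANDLE_VALUE==hFile) return 1;

    char szData[512]="some data";
    DWORD dwWritten=0;

    // Lock the first 512 bytes so no one else can touch them while we write;
    // unlock as soon as the write completes.
    if (LockFile(hFile, 0, 0, 512, 0)) {
        WriteFile(hFile, szData, sizeof(szData), &dwWritten, NULL);
        UnlockFile(hFile, 0, 0, 512, 0);
    }

    CloseHandle(hFile);
    return 0;
}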
Synchronous I/O
Most application I/O operations are synchronous. This means that the calling thread
waits on the operation to complete before proceeding. Windows' synchronous I/O
facility mirrors the I/O facilities found in other operating systems and programming
environments, and it's semantically familiar to most developers.
Applications perform synchronous I/O by calling the basic Win32 I/O functions and
waiting on them to complete. If a file isn't explicitly opened for asynchronous I/O,
threads will wait on I/O operations against it to complete before proceeding. Let's
look at some code.
Exercise
The app below dumps the end of a text file to the console. There are various
versions of tail utilities out there; this is just a simple one to demonstrate
synchronous file I/O using the Win32 I/O functions. You can specify both the file to
dump and an optional offset from the end of the file at which to begin listing file
contents. If you don't specify an offset, the utility will either dump the last 1K of the
file or its last one-fourth, whichever is smaller.
1. Load and compile the app shown in Listing 5.1 from the CH05\tail subfolder on
the book's CD.
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
if (INVALID_HANDLE_VALUE==hFile) {
printf("Unable to open file %s. Last error=%d\n",
argv[1],GetLastError());
return 1;
}
DWORD dwFileOfs=1024;
if (argc>=3)
dwFileOfs=atoi(argv[2]);
DWORD dwFileSizeHigh;
DWORD dwFileSizeLow;
dwFileSizeLow=GetFileSize(hFile,&dwFileSizeHigh);
if ((-1==dwFileSizeLow) &&
(NO_ERROR!=(dwError=GetLastError()))) {
printf("Unable to get the size of file %s. Last error=%d\n",
argv[1],GetLastError());
return 1;
}
HeapFree(GetProcessHeap(),0,pszTail);
CloseHandle(hFile);
return 0;
}
2. Run the app either from the VC++ development environment or from the
command prompt and pass in the name of a text file for which you want to list
the final 1K.
3. As you can see, this app takes a file name as an input and lists the last n bytes
of it. This is handy for large files where you're only interested in the end of the
file and don't want to have to list or search the entire file to reach the end.
4. The app begins by calling CreateFile. The CreateFile call stipulates that the file
must exist (OPEN_EXISTING) or the function will fail. By not passing the
FILE_FLAG_OVERLAPPED flag into CreateFile, the app is indicating that it wants
to perform I/O against the file in a synchronous manner.
5. Once the file is open, we compute the offset at which to begin listing file
contents. This defaults to the last 1K of the file but can also be specified on the
command line to the utility.
6. We next call SetFilePointer, the Win32 API responsible for adjusting the current
file offset. SetFilePointer moves the file pointer to a relative position based on
the beginning of the file, the current file position, or the end of the file.
Because we pass in FILE_END, we're specifying that we want to move relative
to the end of the file. In order to move the file pointer backward, you must
specify a negative file offset, so we multiply the previously computed file offset
by -1 as we pass it into SetFilePointer. Given that we opened the file in
synchronous I/O mode, the process's main thread will block while the file
pointer is moved.
SetFilePointer can fail, but since we have computed the offset in a manner that
should be foolproof, I've omitted error-checking code for simplicity's sake.
7. We next allocate a buffer from the process heap to hold the section of the file
that we'll read and display. We allocate the buffer using the
HEAP_ZERO_MEMORY switch. Given that we are going to overwrite all but the
last character in the buffer when we read the file, it would actually have been
more efficient to have set only the final character in our buffer to 0, but I am
lazy, so I'll let the heap manager zero the entire buffer for me. Notice that we
set the size of this buffer to be one character larger than the size of the region
we'll read from the end of the file. We do this in order to ensure that the buffer
will end with an ASCII 0 so that we can safely write the buffer to the console
using printf.
8. We next call the Win32 ReadFile API function in order to read the end of the file
into memory. ReadFile can also fail and can return fewer bytes than were
requested. However, for simplicity's sake, I've omitted any error-handling code
and assumed that we'll get a valid buffer back from ReadFile.
9. Because the file is opened for synchronous I/O, we pass NULL into ReadFile for
the OVERLAPPED structure parameter. For asynchronous I/O, we'd pass a
pointer to a valid structure. Given that we're doing synchronous I/O, ReadFile
will block until the read operation completes.
10. We finish up by writing the buffer we've just read to the console, then we free
the buffer and close the file handle. You call the Win32 API function
CloseHandle to close a file when you're done processing it.
So, that's synchronous I/O in a nutshell. It's very similar to the I/O mechanisms you'll
find in other operating systems and programming environments: You open a file, you
read or write to it, then you close it - not terribly complicated.
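Since the listing above is abbreviated, here's a compact, self-contained version of the same synchronous pattern (my sketch; the file name is hard-coded and error handling is minimal):
#include "windows.h"
#include "stdio.h"

int main()
{
    HANDLE hFile=CreateFile("test.txt", GENERIC_READ, FILE_SHARE_READ, NULL,
        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (INVALID_HANDLE_VALUE==hFile) return 1;

    // Seek 1K back from the end of the file. (The book's utility clamps this
    // offset to the file size; this sketch assumes the file is at least 1K.)
    SetFilePointer(hFile, -1024, NULL, FILE_END);

    char szBuf[1025]={0};   // the extra byte guarantees a terminating 0
    DWORD dwRead=0;

    // A NULL OVERLAPPED pointer means a synchronous read; the thread blocks here.
    if (ReadFile(hFile, szBuf, 1024, &dwRead, NULL))
        printf("%s", szBuf);

    CloseHandle(hFile);
    return 0;
}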
I/O Basics Recap
Windows provides a rich set of I/O-related facilities. We've touched on a few of them
here; we'll explore the rest in the remainder of the chapter.
Most I/O in Windows occurs via a file handle. You use CreateFile rather than OpenFile
to open an existing file because you're creating a kernel file object. Whether the
system creates a new disk file for you or opens an existing one depends on the
parameters you pass into the CreateFile call.
A kernel file object is different from other kernel objects in that it does not actually
contain the resource it manages. The resource resides on disk�it's the file itself. If
multiple threads need to make changes to the same file simultaneously, you'll
obviously have to synchronize their access to it, just as you would any other shared
resource.
Synchronous I/O is probably the most common type of I/O used by applications. A
synchronous I/O operation blocks the calling thread until it completes. This type of
I/O is common in applications and operating systems and is the most basic type of
file I/O supported by Windows.
Windows provides a standard set of API functions that make performing synchronous
I/O as easy as possible in a Windows app. You open a file for synchronous I/O just as
you do for any other type of I/O: by calling CreateFile. You read the file with ReadFile
and write to it using WriteFile. When you're finished, you close your file handle by
calling the CloseHandle API function.
1. When opening an existing file, which Win32 API should you call, OpenFile or
CreateFile?
3. True or false: When you're finished with a file you've opened, you should
always close it via the CloseFile Win32 API function.
4. Which Windows API function can you call to move the current file pointer?
5. If you instruct Windows to open an existing file but the file does not actually
exist, what handle value is returned to your application?
8. True or false: A thread can call the LockFile API function to lock part of a file
and prevent other threads from accessing it while it writes to the file.
9. Can the SetFilePointer Win32 API function fail?
10. In contrast to other types of kernel objects, where does the actual resource
managed by a file object reside?
Asynchronous and Nonbuffered I/O
In this section, we'll continue the discussion of Windows' I/O facilities and delve into
asynchronous and nonbuffered I/O. If you haven't yet read the first part of this
chapter, you should probably take a quick read through it before continuing.
Sector - a hardware-addressable block on a storage medium such as a hard
disk. The sector size for hard disks on x86 computers is almost always 512
bytes. This means that if Windows wants to write to byte 11 or 27, it must
write to the first sector of the disk; if it wants to write to byte 1961, it must
write to the fourth sector of the disk. You can retrieve the sector size for a disk
via the GetDiskFreeSpace Win32 API function.
Cluster - an addressable block of sectors that many file systems use. A
cluster is the smallest unit of storage for a file. A cluster is usually larger than a
sector and is always a multiple of the sector size. Clusters are used by file
systems to manage disk space more efficiently than would be possible with
individual sectors. By encompassing multiple sectors, a cluster helps divide a
disk into more manageable pieces. The downsides are that large cluster sizes
may waste disk space or result in fragmentation since file sizes aren't usually
exact multiples of the cluster size. You can retrieve the cluster size of a disk via
the GetDiskFreeSpace Win32 API function.
Overlapped I/O - a synonym for asynchronous I/O.
Nonbuffered I/O - allows Windows to open a file without intermediate
buffering or caching. When combined with asynchronous I/O, nonbuffered I/O
gives the best overall asynchronous performance because operations are not
slowed down by the synchronous operations of the memory manager. That
said, some operations will actually be slower because data cannot be loaded
from the cache. SQL Server uses nonbuffered I/O to eliminate the latency
between logical disk writes and the data being physically written to disk.
APC - asynchronous procedure call, a special type of callback function that
executes in an asynchronous fashion in the context of a given thread. Each
thread has its own APC queue. There are two types of APCs - kernel mode APCs
and user mode APCs. When a kernel mode APC is queued to a thread, the APC
will execute the next time the thread is scheduled. When a user mode APC is
queued to a thread, the APC will execute the next time the thread enters an
alertable state. A thread enters an alertable state when it calls a wait function
that supports alerting. Examples of alertable wait functions include
WaitForSingleObjectEx, WaitForMultipleObjectsEx, and SleepEx.
The ReadFileEx and WriteFileEx asynchronous I/O routines will cause a user
mode APC to be queued when an asynchronous operation completes. An
application can also queue its own APC via the Win32 QueueUserAPC function.
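To make the alertable-wait idea concrete, here's a tiny sketch (mine) that queues a user mode APC to the current thread and then enters an alertable wait so the APC can run:
#include "windows.h"
#include "stdio.h"

// A user mode APC routine; it runs in the context of the thread it was queued to.
VOID CALLBACK MyApc(ULONG_PTR dwParam)
{
    printf("APC ran with parameter %d\n", (int)dwParam);
}

int main()
{
    // Queue the APC to the current thread.
    QueueUserAPC(MyApc, GetCurrentThread(), 42);

    // SleepEx with bAlertable=TRUE is an alertable wait; it returns
    // WAIT_IO_COMPLETION when an APC interrupts it.
    DWORD dwResult=SleepEx(5000, TRUE);
    printf("SleepEx returned %d\n", dwResult);
    return 0;
}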
Overview
For the most part, you use the same basic Win32 file I/O API functions to carry out
asynchronous I/O that you use to perform synchronous I/O; you just pass different
parameters. (Although ReadFileEx and WriteFileEx are used exclusively for
asynchronous I/O operations, ReadFile and WriteFile can be used for either type.) In
order to set up a file for asynchronous I/O processing, you must pass the
FILE_FLAG_OVERLAPPED switch into CreateFile. When you then call
ReadFile/ReadFileEx or WriteFile/WriteFileEx, you pass a pointer to an OVERLAPPED
structure that specifies the starting position for the operation (and is also used by
Windows for managing the asynchronous operation).
You can check the status of a pending asynchronous operation via the
HasOverlappedIoCompleted and GetOverlappedResult Win32 API functions. If you
want to wait on an asynchronous operation to complete before proceeding, you have
several options.
You can call one of the Win32 wait functions (e.g., WaitForSingleObject) and
pass in either the optional event you associated with the OVERLAPPED
structure or the handle of the file object itself. If specified, this event should be
a manual-reset event, not an auto-reset event. Windows will signal the event
associated with the OVERLAPPED structure once an asynchronous operation
that was initiated with ReadFile or WriteFile (but not ReadFileEx or WriteFileEx)
has completed. It will also signal the file object on which the operation has
completed; however, if you have several concurrent asynchronous operations
on the file in progress, you won't be able to tell from this alone which one of
them has completed, so you have to be careful if you decide to wait on the file
object itself.
You can tell GetOverlappedResult to wait until the operation completes before
returning by specifying TRUE for its bWait parameter. If you want to use
GetOverlappedResult to wait on an asynchronous operation to complete and
multiple asynchronous operations are occurring simultaneously on the
specified file, you must create an event object and assign it to the hEvent
member of the OVERLAPPED structure passed into ReadFile/WriteFile and
GetOverlappedResult. If you don't associate an event object with the
OVERLAPPED structure and you specify a value of TRUE for
GetOverlappedResult's bWait parameter, the function will wait on the specified
file object to be signaled. As I mentioned in the previous bullet point, this isn't
reliable when multiple asynchronous operations are occurring concurrently, so
you must provide an event via the OVERLAPPED structure instead. When the
OVERLAPPED structure contains a reference to an event object,
GetOverlappedResult waits on it, rather than the file, to be signaled. Since
each asynchronous operation must have its own OVERLAPPED structure, this is
a reliable way to determine the status of a pending I/O request. See the fstring
sample application later in the chapter for an example of this technique.
You can wait on some other object using an alertable wait function and allow
the APC function passed into ReadFileEx or WriteFileEx to cause the wait
function to return when an asynchronous operation completes. See the
unicode_convert sample application later in this chapter for an example of this
technique.
You can use an I/O completion port to manage the process of waiting on
asynchronous I/O. See the I/O Completion Ports section later in the chapter for
details.
You will likely need to synchronize access to files opened for asynchronous writes. If
multiple asynchronous write operations are taking place concurrently, you will
naturally need to ensure that the file does not become corrupted due to thread
synchronization issues. You can use GetOverlappedResult and the Win32 wait
functions to help ensure that only one asynchronous write operation transpires at a
time.
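As a concrete illustration of that kind of synchronization, the sketch below (with a placeholder file name and buffers of my own) issues two overlapped writes one at a time, using the event in the OVERLAPPED structure and GetOverlappedResult with its bWait parameter set to TRUE so that a second write is never started while the first is still pending.
#include <windows.h>
#include <string.h>
int main()
{
    //"out.dat" is a placeholder name
    HANDLE hFile=CreateFile("out.dat",GENERIC_WRITE,0,NULL,
        CREATE_ALWAYS,FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED,NULL);
    if (INVALID_HANDLE_VALUE==hFile) return 1;
    OVERLAPPED ol;
    ZeroMemory(&ol,sizeof(ol));
    ol.hEvent=CreateEvent(NULL,TRUE,FALSE,NULL);  //manual-reset event
    const char *bufs[2]={ "first buffer\n","second buffer\n" };
    DWORD dwWritten=0;
    for (int i=0; i<2; i++) {
        WriteFile(hFile,bufs[i],(DWORD)strlen(bufs[i]),NULL,&ol);
        //bWait=TRUE blocks until this operation completes; because an event
        //is associated with the OVERLAPPED structure, the wait is reliable
        //even if other I/O were pending on the same file
        GetOverlappedResult(hFile,&ol,&dwWritten,TRUE);
        //Advance the file offset past the bytes just written before reusing
        //the OVERLAPPED structure for the next write
        ol.Offset+=dwWritten;
        ResetEvent(ol.hEvent);
    }
    CloseHandle(ol.hEvent);
    CloseHandle(hFile);
    return 0;
}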
I should stop here and point out something that may not be immediately obvious.
You can create a file object with the FILE_FLAG_OVERLAPPED switch and pass a valid
OVERLAPPED structure pointer into ReadFile or WriteFile and still not initiate an
asynchronous I/O operation. With ReadFile and WriteFile, Windows makes the final
decision as to whether the operation is carried out synchronously or asynchronously.
Although you may call these routines with the expectation that an operation will be
carried out asynchronously, you have to code for the possibility that it might not be.
ReadFile will return TRUE when it carries out an operation synchronously. It will
return FALSE, and GetLastError will return ERROR_IO_PENDING, when it has queued an
operation to be carried out asynchronously. (The short sketch after the following
list shows how to code for both outcomes.) Examples of when a ReadFile/WriteFile
operation that was initiated as an asynchronous operation runs synchronously
include the following.
The file being processed is compressed with NTFS compression. This is yet
another good reason not to compress files that SQL Server makes use of,
especially data and log files.
The requested operation is increasing the size of the file (e.g., a WriteFile
operation to the end of a file that causes the file to grow in size).
The operation is against a buffered (cached) file, but the cache manager and
memory manager are saturated. This is more likely if an application makes a
large number of I/O requests for data that is not in the cache. By making use of
nonbuffered I/O on its data and log files, SQL Server avoids this issue
altogether. We'll talk more about nonbuffered I/O later in the chapter.
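The sketch below shows one way to code for that possibility. It assumes a file handle opened with FILE_FLAG_OVERLAPPED as shown earlier; the helper name and parameters are my own.
#include <windows.h>
#include <stdio.h>
//Issue one overlapped read at the given offset and report whether Windows
//carried it out synchronously or queued it for asynchronous completion.
//Returns the number of bytes read (0 on failure).
DWORD ReadChunk(HANDLE hFile, void *buf, DWORD cb, DWORD dwOffset)
{
    OVERLAPPED ol;
    ZeroMemory(&ol,sizeof(ol));
    ol.Offset=dwOffset;
    ol.hEvent=CreateEvent(NULL,TRUE,FALSE,NULL);
    DWORD dwBytesRead=0;
    if (ReadFile(hFile,buf,cb,&dwBytesRead,&ol)) {
        //TRUE: the operation was carried out synchronously and
        //dwBytesRead is already valid
        printf("Read completed synchronously\n");
    }
    else if (ERROR_IO_PENDING==GetLastError()) {
        //FALSE + ERROR_IO_PENDING: the read was queued; wait for it
        //to finish before touching the buffer
        printf("Read is pending; waiting...\n");
        GetOverlappedResult(hFile,&ol,&dwBytesRead,TRUE);
    }
    else {
        //FALSE + any other error code: the read failed outright
        printf("ReadFile failed. Last error=%d\n",GetLastError());
    }
    CloseHandle(ol.hEvent);
    return dwBytesRead;
}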
The best way to ensure that an I/O operation is carried out asynchronously is to use
ReadFileEx and WriteFileEx rather than ReadFile and WriteFile. ReadFileEx and
WriteFileEx always run asynchronously, regardless of other activity on the system. In
fact, unlike ReadFile and WriteFile, ReadFileEx and WriteFileEx don't even accept a
counter to return the number of bytes processed as a parameter because they are
not designed to be used by themselves to process file I/O; you retrieve the bytes
processed from the APC function they cause to be queued when an I/O operation
completes.
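Here's a minimal sketch of that arrangement, again with a placeholder file name of my own: the byte count for a ReadFileEx operation arrives in the completion routine, which runs once the thread performs an alertable wait.
#include <windows.h>
#include <stdio.h>
static DWORD g_dwBytesRead=0;
//Completion routine queued by ReadFileEx; the byte count arrives here
//rather than through an output parameter of ReadFileEx itself
VOID CALLBACK ReadCompleted(DWORD dwErrorCode,
    DWORD dwNumberOfBytesTransfered, LPOVERLAPPED lpOverlapped)
{
    g_dwBytesRead=dwNumberOfBytesTransfered;
}
int main()
{
    //"sample.dat" is a placeholder name
    HANDLE hFile=CreateFile("sample.dat",GENERIC_READ,FILE_SHARE_READ,
        NULL,OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_OVERLAPPED,NULL);
    if (INVALID_HANDLE_VALUE==hFile) return 1;
    OVERLAPPED ol;
    ZeroMemory(&ol,sizeof(ol));
    char buf[4096];
    //ReadFileEx always runs asynchronously; it takes a completion routine
    //instead of a bytes-read output parameter
    if (!ReadFileEx(hFile,buf,sizeof(buf),&ol,ReadCompleted)) return 1;
    //The completion routine runs only when this thread enters an
    //alertable state
    SleepEx(INFINITE,TRUE);
    printf("Read %lu bytes\n",g_dwBytesRead);
    CloseHandle(hFile);
    return 0;
}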
Exercise
Let's take a look at some code. In this sample application, we'll explore a utility that
converts the text in a UNICODE file to a different code page (e.g., ANSI). The tool
works similarly to Notepad's Save As command in that it allows you to write a
UNICODE text file in a different file format.
Exercise 5.2 A Utility That Converts a UNICODE Text File by Using
Asynchronous I/O
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
DWORD dwBytesWritten=0;
DWORD dwTotalBytesWritten=0;
DWORD dwCodePage=CP_ACP;
if (4==argc)
dwCodePage=atoi(argv[3]);
HANDLE hInputFile=CreateFile(argv[1],
GENERIC_READ,
FILE_SHARE_READ,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
if (INVALID_HANDLE_VALUE==hInputFile) {
printf("Unable to open file %s. Last error=%d\n",argv[1],
GetLastError());
return 1;
}
HANDLE hOutputFile=CreateFile(argv[2],
GENERIC_WRITE,
0,
NULL,
CREATE_ALWAYS,
FILE_ATTRIBUTE_NORMAL |
FILE_FLAG_OVERLAPPED,
NULL);
if (INVALID_HANDLE_VALUE==hOutputFile) {
printf("Unable to open file %s. Last error=%d\n",argv[2],
GetLastError());
return 1;
}
wchar_t *pwszBuffer = (wchar_t *)HeapAlloc(GetProcessHeap(),
HEAP_ZERO_MEMORY,
INPUT_BUFFER_SIZE * sizeof(wchar_t));
char *pszBuffer = (char *)HeapAlloc(GetProcessHeap(),
HEAP_ZERO_MEMORY,
OUTPUT_BUFFER_SIZE);
OVERLAPPED olIO;
ZeroMemory(&olIO,sizeof(olIO));
DWORD dwBytesRead;
DWORD dwTotalBytesRead=0;
while ((ReadFile(hInputFile,
pwszBuffer,
INPUT_BUFFER_SIZE * sizeof(wchar_t),
&dwBytesRead,NULL))
&& (dwBytesRead)) {
if (dwTotalBytesRead) {
WaitForSingleObjectEx(GetCurrentProcess(),INFINITE,true);
if ((MAXDWORD - olIO.Offset) < dwBytesWritten) {
olIO.OffsetHigh++;
olIO.Offset=(MAXDWORD - olIO.Offset);
}
else olIO.Offset+=dwBytesWritten;
}
DWORD dwBytesConverted=
WideCharToMultiByte(dwCodePage,
0,
pwszBuffer,
dwBytesRead / sizeof(wchar_t),
pszBuffer,
OUTPUT_BUFFER_SIZE,
NULL,
NULL);
if (!dwBytesConverted) {
printf("Error converting file %s near offset %d. Last
error=%d\n",argv[1],dwTotalBytesRead,GetLastError());
return 1;
}
WriteFileEx(hOutputFile,
pszBuffer,
dwBytesConverted,
&olIO,&WriteCompleted);
dwTotalBytesRead+=dwBytesRead;
}
//Wait for the final asynchronous write to complete before cleaning up
WaitForSingleObjectEx(GetCurrentProcess(),INFINITE,true);
HeapFree(GetProcessHeap(),0,pwszBuffer);
HeapFree(GetProcessHeap(),0,pszBuffer);
CloseHandle(hInputFile);
CloseHandle(hOutputFile);
return 0;
}
2. Run the utility from MSDEV, passing in the name of a UNICODE file as the first
parameter and the target name for the output file to create as the second
parameter. (Press Alt+F7 in Visual Studio 6.0 and select the Debug tab to set
the command line parameters.) If you don't have a UNICODE file handy, there's
one on the CD named "UNICODE.TXT."
3. You can specify an optional code page number as the utility's third parameter.
(See Table 5.5 for a list of common code pages and their corresponding integer
values.) If you don't specify a third parameter, the ANSI code page is used by
default.
4. The unicode_convert app begins by opening both files. It opens the input file in
synchronous mode. Since we are converting the contents of the input file,
there isn't a lot we can do until each I/O request against it has completed.
There's therefore no reason to attempt to read it asynchronously.
5. We do, however, open the output file in asynchronous mode. The thinking is
this: By not waiting on a write to the output file to complete before reading the
next buffer from the input file, we allow the application to read the input and
write the output simultaneously. As each new buffer is read from the input file,
we are writing the previous buffer to the output file.
Table 5.5. Common Code Pages and Their Integer Values
Code Page Integer Value
ANSI 0
OEM 1
MAC 2
Symbol 42
UTF-7 65000
UTF-8 65001
7. The actual work of converting the input text from one code page to another is
done by the Win32 WideCharToMultiByte function. We loop through the input
file, convert each buffer we read using WideCharToMultiByte, then call
WriteFileEx to write each converted buffer to the output file.
9. In order to ensure that we've written each converted text buffer to disk before
altering it through another call to WideCharToMultiByte, we use
WaitForSingleObjectEx to pause execution of the calling thread until the
previously initiated asynchronous write operation has completed. When we call
WaitForSingleObjectEx, we pass in the handle to the current process. This
object is actually not used by WriteFileEx and will not be signaled by the
asynchronous operation (in fact, it won't be signaled until the process exits).
We use it here as a kind of placeholder object so that we can be notified when
the write operation begun by WriteFileEx completes. In order for the APC
routine we passed into WriteFileEx to be called, we have to enter an alertable
state. We do that by calling one of the Win32 wait functions that support being
alerted while they wait and passing TRUE for its bAlertable parameter. So,
while WaitForSingleObjectEx waits indefinitely on our process object, the
asynchronous write operation we initiated via the WriteFileEx call will complete
and cause the wait function to return because we have specified that it is
alertable.
You may be wondering why we don't just wait on the file object itself using
WaitForSingleObjectEx rather than intentionally waiting on an object that will
never be signaled as long as the process is running. The reason we do this is
that an alertable wait on an object for which an asynchronous I/O operation is
under way will not allow the specified APC function to run once the operation
completes. Windows will signal the object and cause the wait function to return
immediately without executing the APC. If you initiate an alertable wait on a
different object, however, Windows will interrupt it with an alert when the
asynchronous operation completes and will cause the APC function to execute
within the context of the thread that initiated the operation.
Another way to cause thread execution to pause until the write operation
completes is to call the GetOverlappedResult Win32 API function and set its
bWait parameter to TRUE. This would cause the calling thread to wait until the
pending asynchronous I/O on the specified file (and referenced by the supplied
OVERLAPPED structure) had completed.
10. In addition to using it to tally the total number of bytes read, we use the
dwTotalBytesRead counter as a flag to allow us to determine whether we're in
the first iteration of the loop. The counter will be 0 the first time through the
loop because we have not yet reached its increment instruction at the bottom
of the loop. The reason we don't want to wait on the write operation the first
time through the loop is that we've not yet initiated it. If we call
WaitForSingleObjectEx and begin waiting to be alerted before the
asynchronous operation has even been started, we'll effectively hang the
calling thread.
The reason we pause the calling thread until the previously initiated
asynchronous operation has completed is twofold: (1) we don't want to alter
the buffer WriteFileEx is writing to disk by initiating another call to the
WideCharToMultiByte function before it completes, and (2) we don't want to
initiate another asynchronous write operation before the one pending has
completed. This is what I was referring to when I said that an app must provide
for thread synchronization when performing asynchronous writes. In this case,
we keep things pretty basic and simply prevent multiple asynchronous writes
from occurring simultaneously. In a more complex app, you would likely have
several overlapped I/O operations occurring at once, and you might use one of
the multiobject wait functions (e.g., WaitForMultipleObjectsEx) to wait on all of
them at once (or use a more complex I/O mechanism such as an I/O
completion port).
11. We loop until we've processed the entire input file. Once ReadFile either
returns FALSE or we see 0 bytes read from the input file, we exit the loop. Note
the final call to WaitForSingleObjectEx. This is necessary to ensure that the last
file write request has completed before we print our conversion tallies and
close the files. Again, we block on this call until the asynchronous operation
has completed.
12. It would probably be instructive to step through the app under the Visual C++
debugger. Begin by stepping through the main processing loop and the APC
function. Pay special attention to the dwBytesRead and dwBytesWritten
counters; they're the best indicators of how far along ReadFile and WriteFileEx
are at any given point in time.
So, that's how asynchronous I/O works in Windows. You create the file object using
the FILE_FLAG_OVERLAPPED flag, then pass a pointer to an OVERLAPPED structure
into ReadFile/ReadFileEx or WriteFile/WriteFileEx to initiate the asynchronous
operation. You then use either GetOverlappedResult or one of the Win32 wait
functions to synchronize access to the file and check the progress of the
asynchronous operation.
Nonbuffered I/O
One thing worth mentioning about nonbuffered I/O is that it alleviates the problem I
mentioned earlier in which Windows may decide to carry out an asynchronous I/O
request synchronously because of memory manager or cache manager saturation.
By bypassing the system cache, you avoid that possibility altogether.
SQL Server uses nonbuffered I/O extensively. By circumventing the system cache
(and performing its own internal cache management), SQL Server has greater
control over whether operations are carried out asynchronously or synchronously
and can ensure better data integrity because it does not have to be concerned with
disk writes appearing to be complete but not actually being written to the physical
media until the cache manager decides to write them.
Nonbuffered I/O imposes a few requirements on an application:
1. Access to the file must begin at byte offsets that are evenly divisible by the
disk's sector size.
2. File reads and writes must be for numbers of bytes that are evenly divisible by
the disk's sector size. Assuming a default sector size of 512 bytes, an
application can read and write buffers of 1,024 and 8,192 bytes, but not 1,025
or 10,000 bytes.
3. The buffers used for reads and writes must be aligned on memory addresses
that are evenly divisible by the disk's sector size. This means that 0x7FF01000
is a valid buffer start address, but 0x7FF01001 is not.
A good way to ensure that the last requirement is met is to allocate the memory
used for nonbuffered I/O using VirtualAlloc. As I mentioned in Chapter 4, VirtualAlloc
allocates memory on system page size boundaries. Since the system page size and
a disk's sector size are both expressed as powers of 2, allocating a buffer with
VirtualAlloc ensures that it will be aligned on sector size boundaries.
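The fragment below sketches how these requirements might be met in practice; the file name and the choice of 16 sectors per read are placeholders of my own. GetDiskFreeSpace supplies the sector size, VirtualAlloc supplies a sector-aligned buffer, and the read length is an even multiple of the sector size.
#include <windows.h>
#include <stdio.h>
int main()
{
    //Determine the sector size of the current drive so that our offsets,
    //transfer sizes, and buffer can all be sized and aligned to it
    DWORD dwSectorsPerCluster,dwBytesPerSector,dwFreeClusters,dwTotalClusters;
    GetDiskFreeSpace(NULL,&dwSectorsPerCluster,&dwBytesPerSector,
        &dwFreeClusters,&dwTotalClusters);
    //VirtualAlloc returns page-aligned memory; because the page size and
    //sector size are both powers of 2, the buffer is sector-aligned as well
    DWORD dwBufSize=dwBytesPerSector*16;    //an even multiple of the sector size
    char *pBuf=(char *)VirtualAlloc(NULL,dwBufSize,
        MEM_RESERVE | MEM_COMMIT,PAGE_READWRITE);
    //Open the file for nonbuffered I/O ("sample.dat" is a placeholder name)
    HANDLE hFile=CreateFile("sample.dat",GENERIC_READ,FILE_SHARE_READ,
        NULL,OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING,NULL);
    if (INVALID_HANDLE_VALUE==hFile) return 1;
    //Both the read size and the file offset (implicitly 0 here) are even
    //multiples of the sector size, as nonbuffered I/O requires
    DWORD dwBytesRead;
    ReadFile(hFile,pBuf,dwBufSize,&dwBytesRead,NULL);
    printf("Read %lu bytes\n",dwBytesRead);
    CloseHandle(hFile);
    VirtualFree(pBuf,0,MEM_RELEASE);
    return 0;
}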
The next exercise is the first of several in this book that will take you through the
process of building a sample application that searches a text file for a specified
string. Each sample app uses a different type of Windows file I/O or uses it in a
different way than the others. By exploring each type of I/O using a common
metaphor, you should be able to compare and contrast the types of I/O facilities that
Windows offers user mode applications and ascertain the strengths and weaknesses
of each one compared with the others. We'll start with simple apps, then gradually
increase their complexity as we go, the hope being that you'll add an understanding
of the unique characteristics of each new sample app to what you learned about the
previous ones.
Exercise
Exercise 5.3 takes you through an application that makes use of nonbuffered I/O. It
opens each file that matches a given file mask in both nonbuffered and overlapped
(asynchronous) mode and searches it for a string. It uses multiple worker threads to
search through the file in parallel, keeps a tally of the number of matches it finds,
and prints out each line containing a match.
1. Load the fstring sample app from the CH05\fstring subfolder on the book's CD
into the Visual C++ development environment. Compile and run it, specifying
a text file and search string to search for as parameters. If you don't have a file
handy that you'd like to search, the CD includes a file named INPUT.TXT. It
contains a few instances of the string "ABCDEF" that you can search for.
2. Find the CreateFile call for the input file in fstring.cpp. You'll find that it passes
both the FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED switches.
This causes the file to be processed without using the system cache and in
asynchronous mode when possible.
3. The fstring app consists of two main source code modules: fstring.cpp and
bufsrch.cpp. fstring.cpp contains the program's entry point function and the
parts of the code that invoke a search against each file matching the specified
file mask. bufsrch.cpp implements the code necessary to search a given buffer
for a specified string, count the number of matches, and output matches to the
console. This is best explored by going through the code itself. Let's start with
fstring.cpp (Listing 5.3).
Listing 5.3 fstring.cpp, the Main Source Code Module for the
fstring Utility
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
#include "process.h"
#include "bufsrch.h"
#define IO_STREAMS_PER_PROCESSOR 6
return ((CBufSearch*)lpParameter)->Search();
strcpy(szFullPathName,szPath);
strcat(szFullPathName,szFileName);
if (INVALID_HANDLE_VALUE==hFile) {
printf("Error opening file. Last error=%d\n",
GetLastError());
return 1;
}
DWORD dwFileSizeHigh;
DWORD dwFileSizeLow=GetFileSize(hFile,&dwFileSizeHigh);
DWORD dwlFileSize=(dwFileSizeHigh*MAXDWORD)+dwFileSizeLow;
hEvents=(HANDLE *)HeapAlloc(hPrivHeap,
HEAP_ZERO_MEMORY,
dwNumThreads*sizeof(HANDLE));
if (NULL==hEvents) {
printf("Error allocating event array. Aborting.\n");
return 1;
}
pbFirst=new CBufSearch(pbFirst,
pcsOutput,
szFileName,
hFile,
dwClusterSize,
szSearchStr,
hEvents[i]);
hThreads[i]= (HANDLE)_beginthreadex(NULL,
0,
&StartSearch,
pbFirst,
0,
&uThreadId);
if (!hThreads[i]) {
printf("Error creating thread. Aborting.\n");
return -1;
}
}
DWORDLONG dwlFilePos=0;
do {
for (CBufSearch *pbCurrent=pbFirst;
NULL!=pbCurrent;
pbCurrent=pbCurrent->m_pbNext) {
pbCurrent->m_OverlappedIO.Offset=
(DWORD)(dwlFilePos / MAXDWORD);
pbCurrent->m_OverlappedIO.Offset=
(DWORD)(dwlFilePos % MAXDWORD);
DWORD dwLastErr=GetLastError();
if (ERROR_IO_PENDING!=dwLastErr) {
dwlFilePos+=dwClusterSize;
}
} while (dwlFilePos<dwlFileSize);
char szPath[MAX_PATH+1];
if (INVALID_HANDLE_VALUE == hFind) {
printf("No files match the specified mask\n");
return false;
}
CRITICAL_SECTION csOutput;
InitializeCriticalSection(&csOutput);
DWORD dwFindCount=0;
do {
dwFindCount+=SearchFile(dwClusterSize,
si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR,
&csOutput,
szPath,fdFiles.cFileName,
szSearchStr);
} while (FindNextFile(hFind,&fdFiles));
FindClose(hFind);
DeleteCriticalSection(&csOutput);
try
{
return (!SearchFiles(argv[1], argv[2]));
}
catch (...)
{
printf("Error reading file\n");
return 1;
}
}
4. Let's start with the main function. It takes the parameters passed into it and
calls the global SearchFiles function. SearchFiles accepts a file mask and a
search string, then locates each file matching the mask using the FindFirstFile
and FindNextFile Win32 API functions.
5. SearchFiles creates a critical section that will be used to synchronize access
to the console and passes it into SearchFile, the routine responsible for searching
each file. I'll explain why we use a critical section in a moment.
6. SearchFiles also computes the cluster size for the current drive by calling the
GetDiskFreeSpace function. We need to compute the cluster size on the drive
because we intend to process files using nonbuffered I/O and Windows requires
that I/O requests against nonbuffered files be aligned on disk sector
boundaries. Since a disk's cluster size is always a multiple of its sector size,
setting our read buffer to match the cluster size is a reasonable way to meet
Windows' requirements.
8. SearchFiles then calls the global SearchFile function to search each file for the
string.
9. SearchFile begins by opening the specified file using CreateFile and passing in
the FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED switches. It then
computes the size of the file and makes certain that it does not have more I/O
streams than clusters in the file since we are reading the input file in cluster-
sized chunks.
10. SearchFile next creates a private heap that it will use to store the two arrays of
handles that it will need, the worker thread handle array and the event
synchronization array. It then uses this private heap to allocate these two
arrays. Because these are both allocated from a private heap, SearchFile can
easily free them by simply destroying the heap prior to exiting. You may recall
that I mentioned this technique in Chapter 4.
11. SearchFile next enters a loop wherein it allocates a CBufSearch object for each
I/O stream and creates a worker thread for each CBufSearch instance. The
entry point for each worker thread is the global function StartSearch, which
casts the user parameter that was passed into _beginthreadex as a CBufSearch
pointer and uses it to call the CBufSearch::Search method. We'll talk more
about CBufSearch in just a moment.
12. Once the search objects and worker threads have been created, SearchFile
waits on the worker threads to signal it (via the event synchronization array)
that they're ready to begin processing data.
13. Once the worker threads signal that they're ready, SearchFile enters a loop
wherein it iterates through the CBufSearch objects and reads a cluster from the
file asynchronously for each one. Once it has read a cluster for each
CBufSearch object, it signals the object to begin processing the buffer. After a
search has been queued for each of the CBufSearch objects, SearchFile waits
on them all to finish before queuing more requests. This process continues
until the entire file has been searched.
14. SearchFile checks the return value of ReadFile so that it can provide for the
possibility that Windows might decide to process the read synchronously even
though we've opened the file with the FILE_FLAG_OVERLAPPED switch and
passed in a valid OVERLAPPED structure. As I mentioned earlier, there are
situations in which Windows will process an asynchronous I/O request
synchronously, and you have to code for that possibility. Here, we set a
member in the CBufSearch object (m_bOverlapped) to indicate whether an
asynchronous I/O operation was successfully initiated. CBufSearch needs to
know whether the read was initiated asynchronously so that it can determine
whether to call GetOverlappedResult to wait on the pending operation before
attempting to search the read buffer.
15. To see how this works, set a breakpoint in SearchFile on each line that assigns
m_bOverlapped, then run fstring under the Visual C++ debugger. If you pass in
INPUT.TXT as the file mask, you should see that it reads this file
asynchronously. Now, stop the debugger, open Explorer, bring up the file
properties for INPUT.TXT and flag it as a compressed file, then rerun your test.
You should see that the file is now read synchronously. As I mentioned earlier,
one sure way to defeat Windows' ability to read or write a file asynchronously
is to compress it with NTFS file compression.
16. Once the file is completely read and searched, SearchFile closes the input file,
frees up the resources it allocated, and returns a match count tally to
SearchFiles.
17. The real work of searching each file is done by the CBufSearch class. Let's have
a closer look at it (Listing 5.4).
#include "bufsrch.h"
//Ctor
CBufSearch::CBufSearch(CBufSearch *pbNext, LPCRITICAL_SECTION
pcsOutput, char *szFileName, HANDLE hFile, DWORD dwClusterSize,
char *szSearchStr, HANDLE hSearchEvent)
{
//Initialize the OVERLAPPED structure
ZeroMemory(&m_OverlappedIO, sizeof(m_OverlappedIO));
m_OverlappedIO.hEvent=CreateEvent(NULL,true,false,NULL);
//Dtor
CBufSearch::~CBufSearch()
{
//Close the event handles we created
//in the constructor
CloseHandle(m_OverlappedIO.hEvent);
CloseHandle(m_hMainEvent);
__try
{
EnterCriticalSection(m_pcsOutput);
#if(_DEBUG)
printf("Thread %08d: Offset: %010I64d %s ",
GetCurrentThreadId(),dwlFilePos+
(szStringPos-m_szBuf),m_szFileName);
#else
printf("Offset: %010I64d %s ",dwlFilePos+
(szStringPos-m_szBuf),m_szFileName);
#endif
//Build format string that limits output to
//current line
strcpy(szFmt,"%.");
sprintf(szFmt+2,"%ds\n",dwNumChars);
LeaveCriticalSection(m_pcsOutput);
bRes=true;
}
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
//Eat the exception -- should never get here
//given that the read buffer is guaranteed to
//be null-terminated.
#if(_DEBUG)
19. The CBufSearch constructor next creates an event that the main thread will
use to signal a worker thread that it can process the read buffer.
Synchronization between the main thread and the worker threads is
accomplished by using two event objects for each worker thread. The first
event, the one created by CBufSearch's constructor, is used to signal the
worker thread that it's safe to process the read buffer. The second, the one
created by the main thread, is used to signal the main thread that the worker
thread is done processing its buffer and is ready for another.
20. The CBufSearch constructor next allocates the memory for the read buffer. It
uses VirtualAlloc to do this so that we can guarantee that the buffer will be
aligned on a sector size boundary in memory, a requirement of Windows'
nonbuffered I/O facility. As I mentioned earlier, because VirtualAlloc always
allocates memory on page boundaries, and because the system page size and
the sector size of a disk are always expressed as powers of 2, we can be
certain that any buffer allocated with VirtualAlloc will be properly aligned for
use with nonbuffered I/O.
21. Note the fact that we size the VirtualAlloc allocation to 1 byte greater than the
cluster size. This is so that we can be sure that the search buffer will be null-
terminated. We always zero-fill the read buffer between reads, so the buffer
will always end with a 0, even after a read has completed, because the read
buffer is larger than the read size specified in the ReadFile call. We need to
ensure that the buffer is always null-terminated because we use C/C++ RTL
functions such as strstr and strchr to search the buffer for matching strings and
characters. There are no memstr or strnstr functions in the RTL, so we rely on
null-termination to mark the end of the search buffer. Another way to handle
this situation is to allocate a guard or no access page beyond the read buffer
and simply trap the exception that will be generated when strstr attempts to
traverse memory beyond our read buffer. You'll see a variation on this
technique employed in the Memory-Mapped File I/O section later in the
chapter.
22. The actual work of searching each read buffer is carried out by the
CBufSearch::Search method. Once a worker thread is started, it calls this
method and never exits it until the end of the input file is reached.
23. CBufSearch::Search begins by signaling the main thread that it has started up
and is ready for a buffer to search. Just after creating the worker threads, the
main thread passes the event array into WaitForMultipleObjects in order to wait
on all the worker threads to start up and enter CBufSearch::Search before it
begins to read the input file. Once each of them signals that it's ready, the
main thread begins processing the file.
25. Next, Search checks to see whether we have a pending asynchronous I/O
operation by inspecting the m_bOverlapped member. If an asynchronous I/O is
pending, Search calls GetOverlappedResult to wait on it to complete. This
causes GetOverlappedResult to wait on the event we associated with the
OVERLAPPED structure to be signaled. Once it is, execution continues.
26. After acquiring a valid read buffer, Search loops through it, scanning for each
occurrence of the search string. When it finds a match, it outputs the line on
which the match occurs and moves to the next line to continue searching. This
means that each line in the input file will register at most one match,
regardless of how many times the search string occurs on the line.
27. When it finds a match and prepares to output a line to the console, Search
enters the critical section originally created in the SearchFiles routine in order
to prevent other worker threads from writing to the console simultaneously.
This is necessary because we use two printfs to write the output to the
console: one to indicate the file name and the offset at which we found the
string, another to output the matching line itself. If not for the critical section,
another thread could send output to the console in between the two printfs,
making the output difficult to interpret. We use two printfs so that we can
avoid copying the matching line into a second buffer before writing it to the
console. Instead, we compute the start and end of the line in the read buffer
itself, then use a printf format string to print the string starting at the matching
line's beginning and continuing for the number of characters between the
beginning of the line and the end of the line. This allows us to avoid first
copying the matching line to a secondary buffer because we output directly
from the read buffer itself. As you'll see in the sample apps later in the chapter,
there's a simple modification we could make to the code here that would
alleviate the need for the two separate printf calls and the use of the critical
section. I took the approach I did in this app to demonstrate the conventional
use of a critical section: to prevent multiple threads from executing a block of
code simultaneously.
28. Once Search has processed its read buffer, it signals the main thread that it's
ready for another buffer and waits on the main thread to signal it that a new
read buffer is ready for processing. If the end of the input file is reached while
Search waits on the main thread, CBufSearch's m_bTerminated member will be
set to TRUE when Search exits WaitForSingleObject and will cause it to
immediately exit its main loop. This, in turn, will cause the thread to exit. As I
mentioned in Chapter 3, it's always preferable to allow a thread to shut down
normally rather than forcing it to terminate by calling TerminateThread or
ExitThread.
So, that's the fstring sample app from start to finish. As you can see, nonbuffered
and asynchronous I/O can be combined to carry out some very useful tasks in an
efficient manner.
You may have noticed that fstring can't detect search string matches that straddle
buffer boundaries. Assuming we have a cluster size of, say, 4K, a string match that
straddles the 4K boundary will not be detected by fstring. This is a limitation of
page-oriented searching algorithms, which we will address in the Memory-Mapped
File I/O section later in the chapter. As designed, fstring is only intended to
demonstrate how asynchronous, nonbuffered I/O can be used by a multithreaded
program to scan a file in parallel; it's not intended to be a general-purpose text
search tool. There are, however, plenty of uses for page-oriented search tools. As
long as your data is organized such that it does not span buffer boundaries (as it
might be in a database program, for example), a page-oriented search algorithm can
be used to search it.
Another point worth making about fstring is that, because it is multithreaded, the
string matches it reports may not be listed in order. There's no guarantee that
matches found early in a file will be listed before those found later. Multiple threads
are being used to scan the input file(s) simultaneously, so the exact timing of when
a particular match is found and written to the console is not predictable and will
likely vary from run to run of the application. You could resolve this by writing the
matches to memory and sorting them before writing them to the console, but this
might require huge amounts of virtual memory and slow down the app considerably.
You could change the search algorithm so that you don't search individual files in
parallel but instead align the worker threads along file boundaries so that each file is
searched by its own thread. This would resolve the issue, but it might not fully utilize
your system resources (especially if you have a multiprocessor machine) if you were
searching only a single file. If you were searching only one file, regardless of how
large the file was, you'd have to wait while a single thread scanned through it in a
synchronous fashion. A better solution is to use the SORT filter. Windows provides
several filters you can make use of to filter or process the output from console
applications and OS commands. You can use SORT to order fstring's output such that
all the match lines for each file are written to the console in offset order. Here's the
syntax:
I intentionally formatted the output lines so that they could be reordered using SORT.
That's why the offset is zero-padded and the text on each match line has a uniform
length up to the point where the file name starts.
Nonbuffered I/O allows a thread to circumvent the system cache as it performs I/O
operations on a file. It has a set of requirements that a thread must meet to make
use of it. In order for a file to be processed with nonbuffered I/O, the file object must
be created with the FILE_FLAG_NO_BUFFERING switch. Access to the file must begin
on even multiples of the disk's sector size. File reads and writes must be for a
number of bytes that is also an even multiple of the disk's sector size. Finally, the
buffer used in a nonbuffered read or write operation must be aligned on a memory
address that is an even multiple of the disk's sector size. One sure way to guarantee
this is to use VirtualAlloc to allocate the buffer. Since VirtualAlloc always allocates
memory on system page size boundaries, and since disk sector sizes and the system
page size are always expressed as a power of 2, a buffer allocated using VirtualAlloc
will always be aligned on a sector size boundary.
Using nonbuffered I/O helps ensure the best performance when using asynchronous
I/O. It is also a good way to help direct Windows to process an asynchronous I/O
operation asynchronously since it bypasses the system cache, a potential cause of
synchronous processing of asynchronous operations.
SQL Server uses nonbuffered and asynchronous I/O extensively. All writes to SQL
Server data or log files are nonbuffered and asynchronous.
1. Is it true that a sector can actually be larger than a cluster on a hard disk?
4. What memory allocation function can you use to ensure that the read or write
buffer for a nonbuffered I/O operation is aligned on a disk sector boundary?
5. What Win32 API function returns the sector and cluster size of a disk?
7. What type of callback routine do the ReadFileEx and WriteFileEx routines cause
to be queued when an asynchronous I/O operation completes?
9. In what situation is the event handle that can be optionally associated with an
OVERLAPPED structure required in order for GetOverlappedResult to function
properly?
10. If an application is executing a loop that reads through a file using ReadFileEx,
is it necessary to adjust the OVERLAPPED structure that is passed into
ReadFileEx between calls to the function?
11. True or false: You can retrieve the number of bytes processed by an
asynchronous I/O operation initiated by WriteFileEx by passing in a DWORD by
reference for WriteFileEx's dwBytesTransferred parameter.
16. Assuming an event object has been associated with the OVERLAPPED structure
you're using for asynchronous I/O, when an operation initiated with ReadFileEx
completes, will the event be signaled?
17. Explain the use of the Offset and OffsetHigh members of the OVERLAPPED
structure in relation to asynchronous I/O.
19. Explain why you should not initiate an alertable wait against a file object that
has a pending asynchronous I/O operation initiated by WriteFileEx.
20. True or false: SQL Server avoids using nonbuffered I/O because it leverages the
Windows lazywriter facility in order to achieve maximum I/O performance.
22. When ReadFile successfully initiates an asynchronous I/O operation, what value
does GetLastError return?
23. True or false: Compressing a file with NTFS compression will prevent it from
being processed asynchronously by ReadFile and WriteFile.
25. True or false: One of the requirements for initiating a nonbuffered I/O operation
against a file is that access to the file must begin at byte offsets that are
evenly divisible by the disk's sector size.
// bufsrch.cpp -- a utility class that we
#include "bufsrch.h"
//Ctor
CBufSearch::CBufSearch(CBufSearch *pbNext, char *szFileName, HANDLE hFile,
  DWORD dwClusterSize, int iIndex, char *szBuf, char *szSearchStr,
  HANDLE hSearchEvent, OVERLAPPED *pOverlappedIO) {
  m_szFileName=szFileName;
  m_hFile=hFile;
  m_szSearchStr=szSearchStr;
  m_dwClusterSize=dwClusterSize;
  m_hSearchEvent=hSearchEvent;
  m_pOverlappedIO=pOverlappedIO;
  m_iIndex=iIndex;
  m_szBuf=szBuf;
  m_bOverlapped=true;
  m_dwFindCount=0;
}
//Dtor
CBufSearch::~CBufSearch()
  CloseHandle(m_hMainEvent);
char *szStart;
for (szStart=szStartPos;
  ((szStart>m_szBuf) && (cLINE_DELIM!=*(szStart-1)));
  szStart--);
return szStart;
char *CBufSearch::FindLineEnd(char *szStartPos) {
  return strchr(szStartPos,cLINE_DELIM); }
bool CBufSearch::Search()
{
  char *szBol;
  char *szEol;
  char *szStringPos;
  DWORD dwNumChars;
  char *szStartPos;
  bool bRes=false;
  char szFmt[32];
  char szOffsetOutput[255];
  DWORDLONG dwlFilePos;
  SetEvent(m_hSearchEvent);
  //Main thread sets m_bTerminated
  //to false at EOF or in the case
  //of an error reading the file
  while (!m_bTerminated) {
    szStartPos=m_szBuf;
    dwlFilePos=(m_pOverlappedIO->OffsetHigh*MAXDWORD)+
      m_pOverlappedIO->Offset+(m_iIndex*m_dwClusterSize);
    if ((!GetOverlappedResult(m_hFile,m_pOverlappedIO,
      &m_dwBytesRead, true)) || (!m_dwBytesRead)) {
      break;
    }
    __try
      strstr(szStartPos,m_szSearchStr)))) {
      //Compute the line start and end so that
      //we can write it to the console
      szBol=FindLineStart(szStringPos);
      szEol=FindLineEnd(szStringPos);
      if (szEol) {
        dwNumChars=szEol-szBol;
        if (szEol<(m_szBuf+m_dwBytesRead)-1)
          szStartPos=szEol+1;
        else szStartPos=NULL;
      else {
        dwNumChars=MAXLINE_LEN;
        szStartPos=NULL;
#if(_DEBUG)
      (szStringPos-m_szBuf),m_szFileName);
#else
      sprintf(szOffsetOutput, "Offset: %010I64d %s ",
        dwlFilePos+(szStringPos-m_szBuf),m_szFileName);
#endif
      sprintf(szFmt+5,"%ds\n",dwNumChars);
      printf(szFmt,szOffsetOutput,szBol);
      bRes=true;
    }
    __except(EXCEPTION_EXECUTE_HANDLER) {
#if(_DEBUG)
#endif
    //Signal to the main thread that we're
    //done with this buffer
    SetEvent(m_hSearchEvent); }
  return bRes;
//
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
#include "process.h"
#include "bufsrch.h"
#define IO_STREAMS_PER_PROCESSOR 6
  DWORD dwNumStreams,
  char *szPath,
  char *szFileName,
  char *szSearchStr)
char szFullPathName[MAX_PATH+1];
DWORD dwNumThreads;
HANDLE hPrivHeap;
HANDLE *hThreads;
HANDLE *hEvents;
FILE_SEGMENT_ELEMENT *pSegments;
strcpy(szFullPathName,szPath);
strcat(szFullPathName,szFileName);
HANDLE hFile=CreateFile(szFullPathName,
  GENERIC_READ,FILE_SHARE_READ, NULL,
  OPEN_EXISTING,
  FILE_ATTRIBUTE_NORMAL
  | FILE_FLAG_OVERLAPPED
  | FILE_FLAG_NO_BUFFERING
  ,NULL);
if (INVALID_HANDLE_VALUE==hFile) {
  return 1;
DWORD dwFileSizeHigh;
DWORD dwFileSizeLow=GetFileSize(hFile,&dwFileSizeHigh);
DWORD dwlFileSize=(dwFileSizeHigh*MAXDWORD)+dwFileSizeLow;
DWORD dwNumClusts=dwlFileSize / dwClusterSize;
if (dwNumClusts<1) dwNumClusts=1;
//If file is less than 4GB and we have more requested
//streams (IO threads) than clusters, set the # of
//threads = to the # of clusters
if ((dwlFileSize<0xFFFFFFFF) && (dwNumStreams>dwNumClusts))
  dwNumThreads=dwNumClusts;
else dwNumThreads=dwNumStreams;
hPrivHeap=HeapCreate(0,0,0);
  dwNumThreads*sizeof(HANDLE));
if (NULL==hThreads) {
}
hEvents=(HANDLE *)HeapAlloc(hPrivHeap,
  HEAP_ZERO_MEMORY,
  dwNumThreads*sizeof(HANDLE));
if (NULL==hEvents) {
pSegments=(FILE_SEGMENT_ELEMENT *)HeapAlloc(hPrivHeap,
  HEAP_ZERO_MEMORY,
  (dwNumThreads+1)*sizeof(FILE_SEGMENT_ELEMENT));
if (NULL==pSegments) {
OVERLAPPED OverlappedIO;
ZeroMemory(&OverlappedIO,sizeof(OverlappedIO));
OverlappedIO.hEvent=CreateEvent(NULL,true,false,NULL);
hEvents[i]=CreateEvent(NULL,false,false,NULL);
//Allocate one more byte than the cluster size
//(which will result in an additional page of
//virtual memory being committed and reserved)
//so that we don't have to worry about strstr
//running off the end of our buffer looking
//for a null terminator.
pSegments[i].Buffer=(PVOID64)VirtualAlloc(NULL,dwClusterSize+1,
  MEM_RESERVE | MEM_COMMIT,
  PAGE_READWRITE);
pbFirst=new CBufSearch(pbFirst,
  szFileName,
  hFile,
  dwClusterSize,
  i,
  hEvents[i],
  &OverlappedIO);
hThreads[i]=(HANDLE)_beginthreadex(NULL,
  0,
  &StartSearch,
  pbFirst,
  0,
  &uThreadId);
if (!hThreads[i]) {
//Main loop -- loop through the file, reading it in
//chunks of dwClusterSize * dwNumThreads size. Each
//time we fill a set of scatter buffers, signal the
//worker threads to search them
DWORDLONG dwlFilePos=0;
do {
  bOverlapped=true;
  OverlappedIO.OffsetHigh=(DWORD)(dwlFilePos / MAXDWORD);
  OverlappedIO.Offset=(DWORD)(dwlFilePos % MAXDWORD);
  &OverlappedIO)) {
    DWORD dwLastErr=GetLastError();
    if (ERROR_IO_PENDING!=dwLastErr) {
      //including EOF
      bTerminated=true;
      if (ERROR_HANDLE_EOF!=dwLastErr) {
        printf("Error reading file. Last error=%d",
          dwLastErr);
        return -1;
      else {
    else {
      bOverlapped=false;
    }
  pbCurrent=pbCurrent->m_pbNext) {
    pbCurrent->m_bTerminated=bTerminated;
    pbCurrent->m_bOverlapped=bOverlapped;
  WaitForMultipleObjects(dwNumThreads,hEvents,true,INFINITE);
  dwlFilePos+=dwClusterSize*dwNumThreads;
} while (dwlFilePos<dwlFileSize);
CBufSearch *pbNext;
pbFirst->m_bTerminated=true;
dwFindCount+=pbFirst->m_dwFindCount;
pbNext=pbFirst->m_pbNext;
delete pbFirst;
CloseHandle(hThreads[i]);
CloseHandle(hEvents[i]); }
CloseHandle(hFile);
CloseHandle(OverlappedIO.hEvent);
char szPath[MAX_PATH+1];
strncpy(szPath,szFileMask,(p-szFileMask)+1);
szPath[(p-szFileMask)+1]='\0'; }
else
  GetCurrentDirectory(MAX_PATH,szPath);
printf("Searching for %s in %s\n\n",szSearchStr,szFileMask);
//Loop through all the files matching the mask
//and search each one for the string
WIN32_FIND_DATA fdFiles;
HANDLE hFind=FindFirstFile(szFileMask,&fdFiles);
if (INVALID_HANDLE_VALUE == hFind) {
GetSystemInfo(&si);
  &dwNumberOfFreeClusters,
  &dwTotalNumberOfClusters);
DWORD dwClusterSize=(dwSectorsPerCluster * dwBytesPerSector);
DWORD dwFindCount=0;
do {
  dwFindCount+=SearchFile(dwClusterSize,
    si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR,
    szPath,
    fdFiles.cFileName,
    szSearchStr);
} while (FindNextFile(hFind,&fdFiles));
FindClose(hFind);
return true;
if (argc<3) {
try
catch (...)
//
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
#include "process.h"
#include "bufsrch.h"
#include "iobuf.h"
#define IO_STREAMS_PER_PROCESSOR 2
return ((CBufSearch*)lpParameter)->Search();
{
char *pszMsg=(char *)dwParam;
printf(pszMsg);
  DWORD dwNumStreams,
  char *szPath,
  char *szFileName,
  char *szSearchStr,
  HANDLE hMainThread)
char szFullPathName[MAX_PATH+1];
DWORD dwNumThreads;
HANDLE hPrivHeap;
HANDLE *hThreads;
strcpy(szFullPathName,szPath);
strcat(szFullPathName,szFileName);
HANDLE hFile=CreateFile(szFullPathName,
  GENERIC_READ,FILE_SHARE_READ, NULL,
  OPEN_EXISTING,
  FILE_ATTRIBUTE_NORMAL
  | FILE_FLAG_OVERLAPPED
  | FILE_FLAG_NO_BUFFERING
  ,NULL);
if (INVALID_HANDLE_VALUE==hFile) {
  return -1;
DWORD dwFileSizeHigh;
DWORD dwFileSizeLow=GetFileSize(hFile,&dwFileSizeHigh);
DWORD dwlFileSize=(dwFileSizeHigh*MAXDWORD)+dwFileSizeLow;
DWORD dwNumClusts=dwlFileSize / dwClusterSize;
if (dwNumClusts<1) dwNumClusts=1;
dwNumThreads=dwNumStreams;
#if(_DEBUG)
printf("Using %d threads\n\n",dwNumThreads);
#endif
hPrivHeap=HeapCreate(0,0,0);
//Create the thread array
hThreads=(HANDLE *)HeapAlloc(hPrivHeap,
  HEAP_ZERO_MEMORY,
  dwNumThreads*sizeof(HANDLE));
if (NULL==hThreads) {
  printf(
  return -1;
}
unsigned uThreadId;
pIoFirst=new CIoBuf(pIoFirst,hPort,dwClusterSize+1);
pbFirst=new CBufSearch(pbFirst,
  szFileName,
  szSearchStr,
  &DisplayOutput,
  hMainThread);
hThreads[i]=(HANDLE)_beginthreadex(NULL,
  0,
  &StartSearch,
  pbFirst,
  CREATE_SUSPENDED,
  &uThreadId);
if (!hThreads[i]) {
pIoFirst->s_pIoFirst=pIoFirst;
pIoFirst->s_bTerminated=false;
pIoFirst->s_bOverlapped=true;
DWORDLONG dwlFilePos=0;
do {
  for (CBufSearch *pbCurrent=pbFirst;
    NULL!=pbCurrent;
    pbCurrent=pbCurrent->m_pbNext) {
    CIoBuf *pIoBuf=
      pIoFirst->SpinToFindBuf(BUF_STATE_INACTIVE,
        BUF_STATE_READING);
    (DWORD)(dwlFilePos / MAXDWORD);
    pIoBuf->m_OverlappedIO.Offset=(DWORD)(dwlFilePos % MAXDWORD);
    &pIoBuf->m_dwBytesRead, &pIoBuf->m_OverlappedIO)) {
      DWORD dwLastErr=GetLastError();
      if (ERROR_IO_PENDING!=dwLastErr) {
        //including EOF
        InterlockedExchange(
          (LPLONG)&pIoBuf->s_bTerminated,
          (long)true);
        if (ERROR_HANDLE_EOF!=dwLastErr) {
          printf(
          return -1;
        break;
      else {
        (LPLONG)&pIoBuf->s_bOverlapped,
        (long)true);
      }
    else {
      InterlockedExchange(
        (LPLONG)&pIoBuf->s_bOverlapped,
        (long)false);
      pIoBuf->SetState(BUF_STATE_READY);
    dwlFilePos+=dwClusterSize; }
  (LPLONG)&pIoFirst->s_bTerminated,
  (long)true);
  INFINITE);
while (WAIT_IO_COMPLETION==SleepEx(0,true));
CBufSearch *pbNext;
for (; NULL!=pbFirst; pbFirst=pbNext) {
  dwFindCount+=pbFirst->m_dwFindCount;
  pbNext=pbFirst->m_pbNext;
  delete pbFirst;
pIoNext=pIoFirst->m_pIoBufNext;
delete pIoFirst;
//Close the thread handles
for (i=0; i<dwNumThreads; i++) {
  CloseHandle(hThreads[i]); }
CloseHandle(hFile);
char szPath[MAX_PATH+1];
strncpy(szPath,szFileMask,(p-szFileMask)+1);
szPath[(p-szFileMask)+1]='\0'; }
else
  GetCurrentDirectory(MAX_PATH,szPath);
printf("Searching for %s in %s\n\n",szSearchStr, szFileMask);
HANDLE hMainThread=OpenThread(THREAD_ALL_ACCESS,
  0,
  GetCurrentThreadId());
if (INVALID_HANDLE_VALUE == hFind) {
DWORD dwNumberOfFreeClusters;
DWORD dwTotalNumberOfClusters;
GetDiskFreeSpace(NULL,&dwSectorsPerCluster,
  &dwBytesPerSector,
  &dwNumberOfFreeClusters,
  &dwTotalNumberOfClusters);
DWORD dwClusterSize=(dwSectorsPerCluster * dwBytesPerSector);
DWORD dwFindCount=0;
do {
  dwFindCount+=SearchFile(dwClusterSize,
    si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR,
    szPath,
    fdFiles.cFileName,
    szSearchStr,
    hMainThread);
} while (FindNextFile(hFind,&fdFiles));
FindClose(hFind);
CloseHandle(hMainThread);
return true;
}
if (argc<3) {
printf(
try
catch (...)
{
printf("Error reading file. Last error=%d\n", GetLastError());
return 1;
#include "bufsrch.h"
CIoBuf *CBufSearch::s_pIoFirst=NULL;
//Ctor
CBufSearch::CBufSearch(CBufSearch
*pbNext,
char *szFileName,
char *szSearchStr,
m_szFileName=szFileName;
m_szSearchStr=szSearchStr;
m_pOutputCallback=pOutputCallback;
m_hMainThread=hMainThread;
CBufSearch::~CBufSearch()
char *szStart;
for (szStart=szStartPos;
((szStart>m_pIoCurrent->m_szBuf) &&
(cLINE_DELIM!=*(szStart-1))); szStart--);
return szStart;
}
char *CBufSearch::FindLineEnd(char
*szStartPos) {
return strchr(szStartPos,cLINE_DELIM); }
bool CBufSearch::Search()
char *szBol;
char *szEol;
char *szStringPos;
DWORD dwNumChars;
char *szStartPos;
bool bRes=false;
char szFmt[32];
char szMsg[1024];
while (1) {
__try
//Spin until we find a buffer to search if
((NULL==(m_pIoCurrent=
s_pIoFirst-
>SpinToFindBuf(BUF_STATE_READY,
BUF_STATE_SEARCHING))) && (s_pIoFirst-
>s_bTerminated)) break;
szStartPos=m_pIoCurrent->m_szBuf;
m_pIoCurrent->m_dwBytesRead)-1) &&
(NULL!=(szStringPos=
strstr(szStartPos,m_szSearchStr)))) {
if (szEol) {
dwNumChars=szEol-szBol;
if (szEol<(m_pIoCurrent->m_szBuf+
m_pIoCurrent->m_dwBytesRead)-1)
szStartPos=szEol+1;
else szStartPos=NULL;
else {
dwNumChars=MAXLINE_LEN;
szStartPos=NULL;
#if(_DEBUG)
sprintf(szOffsetOutput,
dwlFilePos+
(szStringPos-m_pIoCurrent->m_szBuf),
m_szFileName);
#else
sprintf(szOffsetOutput,
dwlFilePos+
(szStringPos-m_pIoCurrent->m_szBuf),
m_szFileName);
#endif
sprintf(szFmt+5,"%ds\n",dwNumChars);
sprintf(szMsg,szFmt,
szOffsetOutput,
szBol);
char *pszMsg=(char
*)HeapAlloc(m_hOutputHeap,
0,
strlen(szMsg)+1);
strcpy(pszMsg,szMsg);
if (!QueueUserAPC(m_pOutputCallback,
m_hMainThread,
(DWORD)pszMsg))
bRes=true;
__except(EXCEPTION_EXECUTE_HANDLER
){
//Eat the exception -- should never get
here //given that the read buffer is
guaranteed to //be null-terminated.
#if(_DEBUG)
#endif
m_pIoCurrent-
>SetState(BUF_STATE_INACTIVE);
return bRes;
//file reads
#include "iobuf.h"
bool CIoBuf::s_bOverlapped=true;
bool CIoBuf::s_bTerminated=false;
CIoBuf *CIoBuf::s_pIoFirst=NULL;
//Ctor
CIoBuf::CIoBuf(CIoBuf *pIoBufNext,
  HANDLE hPort,
  DWORD dwBufSize)
  m_dwState=BUF_STATE_INACTIVE;
  m_dwBufSize=dwBufSize;
  m_szBuf=(char *)VirtualAlloc(NULL,
    m_dwBufSize,
    MEM_RESERVE | MEM_COMMIT,
    PAGE_READWRITE);
  m_pIoBufNext=pIoBufNext;
  m_hPort=hPort;
  ZeroMemory(&m_OverlappedIO,sizeof(m_OverlappedIO));
  m_OverlappedIO.hEvent=CreateEvent(NULL,true,false,NULL); }
//Dtor
CIoBuf::~CIoBuf()
  VirtualFree(m_szBuf,0,MEM_RELEASE);
  CloseHandle(m_OverlappedIO.hEvent); }
CIoBuf *CIoBuf::SpinToFindBuf(DWORD dwOldState, DWORD dwNewState)
{
  bool bWasTerminated;
  do {
    pIoCurrent=pIoCurrent->m_pIoBufNext) {
      if (dwOldState==
        (DWORD)InterlockedCompareExchange(
        return pIoCurrent;
void CIoBuf::SetState(DWORD dwNewState)
  InterlockedExchange((long *)&m_dwState,dwNewState); }
DWORD CIoBuf::GetState()
  return m_dwState;
DWORDLONG CIoBuf::FilePos()
  return (m_OverlappedIO.OffsetHigh*MAXDWORD)+m_OverlappedIO.Offset;
}
void CIoBuf::CheckForIoPacketAndSetState(DWORD dwNewState) {
  if (!s_pIoFirst->s_bOverlapped) return;
  DWORD dwKey;
  DWORD dwBytesRead;
  LPOVERLAPPED pOverlappedIO;
  if (GetQueuedCompletionStatus(m_hPort,
    &dwBytesRead,
    &dwKey,
    &pOverlappedIO,
    1))
    pIoCurr=pIoCurr->m_pIoBufNext) {
      if (&pIoCurr->m_OverlappedIO==pOverlappedIO) {
        InterlockedExchange(
          (long *)&pIoCurr->m_dwBytesRead,
          dwBytesRead);
        pIoCurr->SetState(dwNewState);
        return;
      }
  assert(false);
#include "bufsrch.h"
CIoBuf *CBufSearch::s_pIoFirst=NULL;
//Ctor
CBufSearch::CBufSearch(CBufSearch
*pbNext,
char *szFileName,
char *szSearchStr)
m_szFileName=szFileName;
m_szSearchStr=szSearchStr;
m_hOutputIoCompletionPort=
CreateIoCompletionPort(INVALID_HANDL
E_VALUE,NULL,0,0);
CBufSearch::~CBufSearch()
DWORD dwLineCount=0;
DWORD dwBytesWritten;
DWORD dwKey;
OUTPUT_OVERLAPPED
*pOutputOverlapped; while
(dwLineCount<m_dwFindCount) {
GetQueuedCompletionStatus(m_hOutputI
oCompletionPort, &dwBytesWritten,
&dwKey,
(OVERLAPPED **)
&pOutputOverlapped,INFINITE);
printf(pOutputOverlapped->pszMsg);
dwLineCount++;
//From an offset in a buffer, find the start
of the line char
*CBufSearch::FindLineStart(char
*szStartPos) {
char *szStart;
for (szStart=szStartPos;
((szStart>m_pIoCurrent->m_szBuf) &&
(cLINE_DELIM!=*(szStart-1))); szStart--);
return szStart;
char *CBufSearch::FindLineEnd(char
*szStartPos) {
return strchr(szStartPos,cLINE_DELIM); }
bool CBufSearch::Search()
char *szBol;
char *szEol;
char *szStringPos;
DWORD dwNumChars;
char *szStartPos;
bool bRes=false;
char szFmt[32];
char szMsg[1024];
while (1) {
__try
s_pIoFirst-
>SpinToFindBuf(BUF_STATE_READY,
BUF_STATE_SEARCHING))) && (s_pIoFirst-
>s_bTerminated)) break;
szStartPos=m_pIoCurrent->m_szBuf;
m_pIoCurrent->m_dwBytesRead)-1) &&
(NULL!=(szStringPos=
strstr(szStartPos,m_szSearchStr)))) {
if (szEol) {
dwNumChars=szEol-szBol;
if (szEol<(m_pIoCurrent->m_szBuf+
m_pIoCurrent->m_dwBytesRead)-1)
szStartPos=szEol+1;
else szStartPos=NULL;
else {
dwNumChars=MAXLINE_LEN;
szStartPos=NULL;
}
#if(_DEBUG)
sprintf(szOffsetOutput,"Thread %08d:
Offset: %010I64d %s ",
GetCurrentThreadId(),
dwlFilePos+(szStringPos-m_pIoCurrent-
>m_szBuf), m_szFileName);
#else
sprintf(szOffsetOutput,"Offset: %010I64d
%s ", dwlFilePos+(szStringPos-
m_pIoCurrent->m_szBuf), m_szFileName);
#endif
sprintf(szFmt+5,"%ds\n",dwNumChars);
sprintf(szMsg,szFmt,
szOffsetOutput,
szBol);
(OUTPUT_OVERLAPPED *)
HeapAlloc(m_hOutputHeap,
0,
sizeof(OUTPUT_OVERLAPPED));
(char *)HeapAlloc(m_hOutputHeap,
0,
strlen(szMsg)+1);
PostQueuedCompletionStatus(
m_hOutputIoCompletionPort,
0,
0,(OVERLAPPED *)pOutputOverlapped);
bRes=true;
__except(EXCEPTION_EXECUTE_HANDLER
){
#if(_DEBUG)
#endif
m_pIoCurrent-
>SetState(BUF_STATE_INACTIVE);
return bRes;
struct OUTPUT_OVERLAPPED :
OVERLAPPED
char *pszMsg;
};
OUTPUT_OVERLAPPED encapsulates an output packet. It allows us to queue
CBufSearch output while a worker thread is searching and defer our console
I/O until later.
Once we've allocated the OUTPUT_OVERLAPPED structure and copied the output
line to it, we post it to our output completion port using
PostQueuedCompletionStatus. You may recall that PostQueuedCompletionStatus
can be used by an application to post its own special-purpose I/O completion
packets. That's exactly what we're doing here. We're posting output to the
completion port that we will later retrieve using GetQueuedCompletionStatus.
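The following stand-alone sketch shows the same idea stripped down to its essentials; the structure and the message string are placeholders of my own rather than code from the book's CD. A custom packet derived from OVERLAPPED is posted to a completion port with PostQueuedCompletionStatus and then retrieved with GetQueuedCompletionStatus.
#include <windows.h>
#include <stdio.h>
#include <string.h>
//A custom completion packet: an OVERLAPPED extended with our own payload,
//patterned after the OUTPUT_OVERLAPPED structure discussed above
struct MSG_OVERLAPPED : OVERLAPPED
{
    char *pszMsg;
};
int main()
{
    //A completion port not associated with any file handle; we use it
    //purely as a thread-safe queue for our own packets
    HANDLE hPort=CreateIoCompletionPort(INVALID_HANDLE_VALUE,NULL,0,0);
    //Build and post a packet
    const char *szText="Hello from a posted completion packet\n";
    MSG_OVERLAPPED *pPacket=new MSG_OVERLAPPED;
    ZeroMemory(pPacket,sizeof(*pPacket));
    pPacket->pszMsg=new char[strlen(szText)+1];
    strcpy(pPacket->pszMsg,szText);
    PostQueuedCompletionStatus(hPort,0,0,(OVERLAPPED *)pPacket);
    //Retrieve the packet (normally this would happen on another thread)
    DWORD dwBytes;
    ULONG_PTR ulKey;
    OVERLAPPED *pOverlapped;
    if (GetQueuedCompletionStatus(hPort,&dwBytes,&ulKey,&pOverlapped,INFINITE)) {
        MSG_OVERLAPPED *pReceived=(MSG_OVERLAPPED *)pOverlapped;
        printf("%s",pReceived->pszMsg);
        delete [] pReceived->pszMsg;
        delete pReceived;
    }
    CloseHandle(hPort);
    return 0;
}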
19. True or false: Once the concurrency value for an I/O completion port has
been set, the system ensures that the number of threads actively processing
I/O completion packets never exceeds the value specified.
24. Describe a potential fallacy of the software design approach that sets a
hard limit for the number of worker threads in an app equal to the number of
processors in the system.
Memory-mapped file: a file on disk that has been mapped into virtual
memory such that it serves as the physical storage for the virtual memory.
Table 5.9. Key Win32 APIs for Working with Memory-Mapped Files
Function Description
MapViewOfFile Maps a view of a file into memory such that the file serves as the
physical storage for the memory. The file can be a file on disk or
the system paging file.
Overview
Threads that access the file simply access memory as though it were one large,
contiguous array. As the memory is accessed, the Windows memory manager
handles paging the file in and out of physical memory behind the scenes. If a thread
makes changes to this memory, the memory manager writes the changes to the file
as part of the normal paging process.
Windows' mapped file I/O facility is produced jointly by the I/O system and the
memory manager. It's an important part of the I/O subsystem and is used
throughout the OS. The system cache manager, for example, uses mapped file I/O to
map files into virtual memory and provide better response time for I/O-bound
applications. While most caching systems allocate a fixed amount of memory for
caching files, Windows' use of mapped file I/O for this purpose means that the
amount of physical memory set aside for caching files can vary based on what else
is going on in the system. If a lot of physical memory is being consumed, the buffer
shrinks to accommodate this consumption. If physical memory is relatively unused,
the cache can be quite large and can provide excellent performance even with very
large files.
Another way in which Windows makes use of its own mapped file I/O facility is with
image file activation. When an executable or DLL is brought into a process's address
space, it is loaded as a mapped file. As Windows needs to access a particular code
or data page within the binary, it's automatically loaded into physical memory via
the normal paging process. In this case, the range of virtual memory addresses
occupied by the binary is backed by the executable or DLL file itself rather than the
system paging file.
Once a file has been mapped into virtual memory, it can be accessed as though it
had actually been copied from disk into memory. The advantage of mapping the file
into memory versus actually copying it from disk is that because the file is never
copied from its original location into the system paging file, the "load" is extremely
fast and doesn't waste physical storage that might be used for other purposes.
Note that because a mapped file resides in the virtual memory address space, it's
subject to the same limitations as any other virtual memory allocation. If there's
insufficient contiguous address space to create the specified mapping, it will fail, just
as a virtual memory reservation that's too large might. Also, given that the entire user
mode address space is at most 3GB, you can't entirely map a file that's larger than
3GB into memory. You might be able to map a smaller segment of it, but you won't be
able to access the entire file as one large sequential memory buffer.
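Before walking through the fndstr sample, here is a minimal sketch of the basic mapping pattern; the file name and search string are placeholders of my own and the error handling is deliberately thin. The file is opened normally, a file mapping object is created over it, and MapViewOfFile returns a pointer through which the file can be read as ordinary memory.
#include <windows.h>
#include <stdio.h>
#include <string.h>
//Map a text file into memory and search it with strstr
//("sample.txt" and "ABCDEF" are placeholders)
int main()
{
    HANDLE hFile=CreateFile("sample.txt",GENERIC_READ,FILE_SHARE_READ,
        NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL,NULL);
    if (INVALID_HANDLE_VALUE==hFile) return 1;
    //Create a file mapping object backed by the file itself (not the
    //paging file), then map a view of the entire file
    HANDLE hMapping=CreateFileMapping(hFile,NULL,PAGE_READONLY,0,0,NULL);
    if (!hMapping) return 1;
    char *pData=(char *)MapViewOfFile(hMapping,FILE_MAP_READ,0,0,0);
    if (!pData) return 1;   //e.g., not enough contiguous address space
    //The file now appears to be one large in-memory buffer; note that a
    //real search would need to bound itself by the file size because the
    //mapped view is not guaranteed to be null-terminated
    if (strstr(pData,"ABCDEF"))
        printf("Found a match\n");
    UnmapViewOfFile(pData);
    CloseHandle(hMapping);
    CloseHandle(hFile);
    return 0;
}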
Exercises
Let's examine memory-mapped file I/O up close by building an app that uses it. The
following exercise presents an app that uses mapped file I/O to search a text file.
This exercise takes you through a sample app that uses memory-mapped file I/O to
search the files matching a given mask for a specified string. It is a variation on the
sample app presented earlier in this chapter that used nonbuffered, asynchronous
I/O to perform a similar search.
1. Load the fndstr sample app from the CH05\fndstr subfolder on the book's CD
into Visual Studio and compile it.
2. Run the app under the VC++ debugger, passing it a text file to search and a
string to locate. If you don't have a text file handy, the CD includes a file
named INPUT.TXT that you can use for testing. It contains several instances of
the string "ABCDEF."
3. If you step through the code, you'll notice that the actual process of searching
a given file is carried out by a single call to the strstr C/C++ RTL function.
Because the entirety of the file appears to be loaded into a contiguous memory
buffer, we can easily search it using strstr. There's no need to process the file
in buffer-sized chunks, nor do we need to be concerned with search string
matches that happen to span a buffer boundary. Unlike some of the other I/O
sample apps, fndstr will locate every occurrence of the search string within a
file, regardless of where it physically resides.
4. Note that, because we're mapping the file into virtual memory, we are subject
to the limitations of the virtual memory address space. To begin with, if the
system is unable to find a contiguous region of virtual memory addresses
that's large enough to map the entire file, the code below will fail. This could
occur if the user mode address space is fragmented by other allocations or file
mappings. Moreover, if the file is larger than the user mode space (either 2GB
or 3GB on 32-bit Windows), the mapping will also fail. So, this technique isn't
suitable for processing extremely large files or for processing even moderate-
sized files in situations where virtual memory may be heavily fragmented.
5. As with most of the other sample applications in this book, the best way to
understand how they work is to walk through the code itself. Listing 5.11 shows
fndstr.cpp, the main source code file for the fndstr sample app.
Listing 5.11 fndstr.cpp, the Main Source Code Module for the
fndstr Utility
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
dwFindCount++;
szBol=FindLineStart(szStringPos, szStart);
szEol=FindLineEnd(szStringPos);
if (szEol) {
dwNumChars=szEol-szBol;
if (szEol<szEnd) szStartPos=szEol+1;
else szStartPos=NULL;
}
else {
dwNumChars=MAXLINE_LEN;
szStartPos=NULL;
}
strcpy(szFmt,"%.");
sprintf(szFmt+2,"%ds\n",dwNumChars);
}
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
//Eat the exception
#if(_DEBUG)
printf("Scanned past end of buffer\n");
#endif
}
if (!dwFindCount)
printf("Not found\n");
return dwFindCount;
}
//Scan a single file for the search string
DWORD SearchFile(char *szPath, char *szFileName,
char *szSearchStr)
{
char *szFileData;
char szFullPathName[MAX_PATH+1];
DWORD dwFindCount;
strcpy(szFullPathName,szPath);
strcat(szFullPathName,szFileName);
return dwFindCount;
}
char szPath[MAX_PATH+1];
char *p=strrchr(szFileMask,'\\');
if (p) {
strncpy(szPath,szFileMask,(p-szFileMask)+1);
szPath[(p-szFileMask)+1]='\0';
}
else
GetCurrentDirectory(MAX_PATH,szPath);
if ('\\'!=szPath[strlen(szPath)-1])
strcat(szPath,"\\");
WIN32_FIND_DATA fdFiles;
HANDLE hFind=FindFirstFile(szFileMask,&fdFiles);
if (INVALID_HANDLE_VALUE == hFind) {
printf("No files match the specified mask\n");
return false;
}
DWORD dwFindCount=0;
do {
dwFindCount+=
SearchFile(szPath,fdFiles.cFileName,szSearchStr);
} while (FindNextFile(hFind,&fdFiles));
FindClose(hFind);
6. The basic plumbing for iterating through the files matching a particular mask
and calling a search function is the same in this and the other file search
sample apps in this book, so I won't put you through the tedium of walking
back through it. If you want specifics on how the SearchFiles routine works, see
the Asynchronous and Nonbuffered I/O section earlier in the chapter where we
originally introduced the SearchFiles function and examined the routine in
detail. For the adventurous, it wouldn't be a lot of work to take the search
algorithms in the I/O samples in this book (implemented via the SearchFile
function in each sample app) and encapsulate them such that they
implemented the Strategy design pattern (as outlined in the book Design
Patterns by Erich Gamma and company[2]) and were interchangeable. Time
and topical constraints do not permit me to do so here, but it would be an
interesting exercise for the curious.
[2] Gamma, Erich, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented
Software. Reading, MA: Addison-Wesley, 1995.
7. Let's begin by examining the global Search function. It's fairly simple in
construction. It receives a starting pointer and an ending pointer and finds
every occurrence of the search string between them. The mechanics of
actually locating each string occurrence are handled by the strstr C/C++ RTL
function. Once we find a match, we output the line on which it occurs,
reposition the search start just beyond the end of the line, and continue
looking. Once we've searched the whole buffer, we return a find count to the
caller.
8. Note the exception-handling code we use to trap situations where strstr may
scan past the end of our buffer in search of a null-terminator. Because we can't
pad the file as it appears in virtual memory without physically changing it, it's
possible that we'll read past the end of the buffer if the file happens to end on
an exact system page boundary. So, assuming a system page size of 4K, if a
file is exactly 4K in size and does not happen to end with a null-terminator,
strstr will scan past the end of it while looking for the end of the string. Given
that we can't commit an extra page past the end of the mapped region, this
possibility is, unfortunately, unavoidable with the memory-mapped file
technique. If the end of the file does not fall on a page boundary, we can rest
assured that the remainder of its final page will be zero-filled on its first access,
so strstr will find its null-terminator regardless of the file contents. However, if
that's not the case and strstr attempts to access uncommitted address space,
an access violation will be raised. When that happens, our structured
exception-handling code will discard the exception and allow the program to
continue searching other files. If compiled as a debug build, fndstr will note
that the end of the buffer was likely passed by printing a message to the
console. The fndstr app ignores the exception based on the assumption that an
access violation due to strstr going past the end of the mapped file memory is
the only type of exception we should see in our main search loop. Although
other types of obscure exceptions could be raised, this is a fairly safe
assumption.
9. Now let's have a look at the SearchFile routine itself. It begins by opening the
file and creating a file-mapping object for it, a requirement of memory-mapped
file I/O. It next calls the MapViewOfFile Win32 function to map the file into a
contiguous range of virtual memory addresses and return a pointer to the start
of this range. We use the pointer returned by MapViewOfFile as the access
point into the file. Our search routines will use it as their starting address.
10. SearchFile next calls VirtualQueryEx to retrieve the size of the region set aside
for the file mapping. This should be the file size rounded up to the next page
boundary. We use this to compute the end of the search buffer. Since the
system zero-fills a committed virtual memory page the first time it's accessed,
we can be sure that search routines based on strstr will not find false matches
past the end of the file due to data remnants that may have been left in
memory from previous operations.
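Boiled down to its essentials, the sequence of calls described in the last two steps looks something like the sketch below. This isn't the SearchFile code from the listing, just an abbreviated illustration of the same flow: open the file, create a mapping, map a view, size the region, and scan it with strstr under a structured exception handler.

#include <windows.h>
#include <stdio.h>
#include <string.h>

// Simplified sketch (not the book's SearchFile implementation): map a file,
// size the mapped region with VirtualQuery, and count occurrences of a
// search string with strstr.
DWORD CountMatches(const char *szFileName, const char *szSearchStr)
{
    DWORD dwFinds = 0;
    HANDLE hFile = CreateFile(szFileName, GENERIC_READ, FILE_SHARE_READ, NULL,
                              OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (INVALID_HANDLE_VALUE == hFile) return 0;

    HANDLE hMapping = CreateFileMapping(hFile, NULL, PAGE_READONLY, 0, 0, NULL);
    if (hMapping) {
        char *szData = (char *)MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
        if (szData) {
            // The region size is the file size rounded up to a page boundary
            MEMORY_BASIC_INFORMATION mbi;
            VirtualQuery(szData, &mbi, sizeof(mbi));
            char *szEnd = szData + mbi.RegionSize;

            __try {
                // The tail of the final page is zero-filled, so strstr will
                // normally find a terminator; if the file ends exactly on a
                // page boundary, strstr can run off the end of the region,
                // so we eat the resulting access violation.
                for (char *p = szData;
                     p && p < szEnd && (p = strstr(p, szSearchStr)) != NULL;
                     p += strlen(szSearchStr))
                    dwFinds++;
            }
            __except (EXCEPTION_EXECUTE_HANDLER) {
                // Scanned past the end of the mapped region
            }
            UnmapViewOfFile(szData);
        }
        CloseHandle(hMapping);
    }
    CloseHandle(hFile);
    return dwFinds;
}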
So, that's the fndstr sample app from beginning to end. Thanks to memory-mapped
file I/O, the app itself is fairly simple and doesn't have to be concerned much with
searching multiple buffers and the idiosyncrasies of searching for a string that may
straddle a buffer boundary.
Given that the nonbuffered, asynchronous sample app fstring (introduced earlier in
the chapter) was multithreaded, you may be wondering why we don't scan the
mapped file in parallel. After all, searching the file is simply a matter of scanning
memory, and we can do that for the most part without having to be concerned about
synchronizing simultaneous access by multiple threads because we are reading, not
writing, the memory.
In the next sample app, we'll explore that very possibility. Because we will be
logically dividing the file into multiple pieces in order to scan it with multiple threads,
we will again face the possibility that a match string could straddle a buffer
boundary. However, because the entirety of the file has been mapped into memory
and appears as one contiguous buffer of address space, we can solve the problem in
a novel way without giving up the ability to scan the file with multiple threads.
1. Load the findstring sample app from the CH05\findstring subfolder on the CD
into Visual Studio and compile it.
3. The SearchFile function is responsible for searching each file. It opens each one
with CreateFile, creates a file-mapping object for it, then maps it into memory
using MapViewOfFile. As with the previous exercise, we use the pointer
returned by MapViewOfFile as the starting point for the search operation.
4. One of the parameters passed into SearchFile is the number of processors on
the system. The SearchFiles routine computes this at program startup and
passes it into SearchFile. SearchFile then creates a worker thread for each
processor on the system. If you have a two-processor system, you'll see two
worker threads created. If you have a four-processor system, you'll see four
worker threads created. Because the search is done entirely in virtual memory,
there's little benefit in creating more threads than processors.
5. Note the way in which SearchFile checks the number of pages in the file and
lowers the number of threads it will use if the file has fewer pages than there
are processors on the system. This way we can be sure that each thread gets
at least one memory page to search.
7. Each CRangeSearch object is passed a starting and ending offset to search. So,
by virtue of the fact that the file is mapped into a range of contiguous virtual
memory addresses, we only need to compute offset pairs in order to scan it
with multiple threads. If you have two worker threads, each thread will scan
approximately half of the file.
8. Once all the worker threads have been created, SearchFile starts them running
by calling ResumeThread. We initially create the worker threads in a suspended
state so that each CRangeSearch object can recompute the end of its search
range before the actual search process begins. As I mentioned earlier, because
we are dividing the virtual memory region into which the file has been mapped
into multiple logical pieces so that we can scan them in parallel, we again face
the situation where a search match may span a buffer boundary. To handle
this, as we create each CRangeSearch object, we adjust its ending scan offset
such that it coincides with the last complete text line in the region. In other
words, if the last character in the buffer is not an end-of-line marker, we move
the end of the buffer backward until we find the last end-of-line marker in the
buffer. This prevents a search match from spanning a buffer boundary. It also
necessitates that the next CRangeSearch object start its search just after this
final end-of-line marker in the buffer, so CRangeSearch's RecalcEnd method
returns the new starting offset, and this is passed into the next CRangeSearch
object's constructor so that it can set its starting position accordingly.
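Here's a compressed sketch of that end-of-range adjustment before we look at the listing. It is illustrative only, not the CRangeSearch implementation: it divides a mapped buffer into one range per thread and pulls each range's end back to the last end-of-line marker so that no match can straddle two ranges.

#include <string.h>

// Sketch of the range-adjustment idea described in step 8 (names are made up)
struct Range { char *szStart; char *szEnd; };

// szData/szEnd bound the mapped file; ranges[] receives nThreads entries
void SplitIntoRanges(char *szData, char *szEnd, Range *ranges, int nThreads)
{
    size_t cbChunk = (size_t)(szEnd - szData) / nThreads;
    char *szStart = szData;
    for (int i = 0; i < nThreads; i++) {
        char *szRangeEnd = (i == nThreads - 1) ? szEnd : szStart + cbChunk;
        if (i < nThreads - 1) {
            // Walk backward to the last newline so the range ends on a
            // complete text line (if no newline is found, the range ends
            // up empty and the next thread takes the whole chunk)
            while (szRangeEnd > szStart && *(szRangeEnd - 1) != '\n')
                szRangeEnd--;
        }
        ranges[i].szStart = szStart;
        ranges[i].szEnd = szRangeEnd;
        // The next range picks up just past this range's final newline
        szStart = szRangeEnd;
    }
}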
10. This procedure is best understood by looking at the code itself. Listing 5.12
shows findstring.cpp.
Listing 5.12 findstring.cpp, the Main Source Code Module for
the findstring Utility
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
#include "process.h"
#include "rngsrch.h"
strcpy(szFullPathName,szPath);
strcat(szFullPathName,szFileName);
if (dwNumProcessors>dwNumPages)
dwNumThreads=dwNumPages;
else
dwNumThreads=dwNumProcessors;
CRangeSearch *prsFirst=NULL;
char *szNextStartOfs=szFileData;
char *szEndOfs=szFileData;
unsigned uThreadId;
if (i<dwNumThreads-1) {
szEndOfs+=(dwPagesPerThread*dwPageSize)-1;
}
else szEndOfs=szFileData+mbi.RegionSize-1;
if (i<dwNumThreads-1)
szNextStartOfs=prsFirst->RecalcEnd()+1;
hThreads[i]=
(HANDLE)_beginthreadex(NULL,
0,
&StartSearch,
prsFirst,
CREATE_SUSPENDED,
&uThreadId);
return dwFindCount;
}
if ('\\'!=szPath[strlen(szPath)-1])
strcat(szPath,"\\");
WIN32_FIND_DATA fdFiles;
HANDLE hFind=FindFirstFile(szFileMask,&fdFiles);
if (INVALID_HANDLE_VALUE == hFind) {
printf("No files match the specified mask\n");
return false;
}
SYSTEM_INFO si;
GetSystemInfo(&si);
DWORD dwFindCount=0;
do {
dwFindCount+=
SearchFile(si.dwPageSize,
si.dwNumberOfProcessors,
szPath,
fdFiles.cFileName,
szSearchStr);
} while (FindNextFile(hFind,&fdFiles));
FindClose(hFind);
#include "rngsrch.h"
//Ctor
CRangeSearch::CRangeSearch(CRangeSearch *prsNext,
char *szFileName, char *szFileData, char *szStart, char *szEnd,
char* szSearchStr)
{
//Cache ctor params for later use
m_prsNext=prsNext;
m_szFileName=szFileName;
m_szFileData=szFileData;
m_szStart=szStart;
m_szEnd=szEnd;
m_szSearchStr=szSearchStr;
m_dwFindCount=0;
}
// -- assumes null-termination
char *CRangeSearch::FindLineEnd(char *szStartPos)
{
return strchr(szStartPos,cLINE_DELIM);
}
__try
{
while ((szStartPos) &&
(szStartPos<m_szEnd) &&
(NULL!=(szStringPos=strstr(szStartPos,m_szSearchStr)))) {
m_dwFindCount++;
szBol=FindLineStart(szStringPos);
szEol=FindLineEnd(szStringPos);
if (szEol) {
dwNumChars=szEol-szBol;
if (szEol<m_szEnd) szStartPos=szEol+1;
else szStartPos=NULL;
}
else {
dwNumChars=MAXLINE_LEN;
szStartPos=NULL;
}
#if(_DEBUG)
sprintf(szOffsetMsg,"Thread %08d: Offset: %010d %s ",
GetCurrentThreadId(),
szStringPos-m_szFileData,m_szFileName);
#else
sprintf(szOffsetMsg,"Offset: %010d %s ",
szStringPos-m_szFileData,m_szFileName);
#endif
}
}
__except(EXCEPTION_EXECUTE_HANDLER)
{
//Eat the exception
#if(_DEBUG)
printf("Thread %08d reached end of buffer\n",
GetCurrentThreadId());
#endif
}
if (!bRes)
printf("Not found\n");
return bRes;
}
12. The key method in the CRangeSearch class is its Search method. As I
mentioned earlier, once a worker thread enters this method, it never exits until
the thread has finished scanning the range of memory for which it's
responsible.
13. The Search method uses a simple loop based on calls to the strstr C/C++ RTL
function to find the matches in its buffer. For each match, it outputs the line
containing the match, then repeats the scan beginning with the end of the
current line. When it runs out of matches, the thread is done and exits
normally.
14. As with the previous sample app, the possibility remains that strstr could scan
past the end of the virtual address range into which the file has been mapped
while looking for a null-terminator. This is much more likely if the file happens
to end on an exact page boundary. If strstr runs past the end of the mapped
region, an access violation may be raised, so we trap and eat any exceptions
raised by the search loop. Again, we're doing this based on the assumption
that an access violation caused by strstr running off the end of the mapped file
area is the only likely cause of an exception from this routine even though
there remains the possibility that some other obscure condition could raise an
exception within the loop. The code isn't intended to demonstrate exhaustive
or even robust exception handling; the idea is to keep it as simple as possible
while remaining functional enough that you get a sense of some of the
practical uses of memory-mapped file I/O.
You should now have an understanding of some of the things you can do with
mapped file I/O and multithreading. SQL Server uses memory-mapped files in
several places itself, so understanding the basics of how mapped file I/O can be put
to work by an application will give you some insight into how SQL Server makes use
of it.
1. Which Win32 API function is responsible for mapping a file into memory and
returning a pointer to its starting address?
2. True or false: Windows will relocate DLLs and other images that have been
mapped into a process's address space to make room for a file you are
attempting to map into memory.
3. What's the simplest way for an application to change the contents of a file
that's been mapped into memory?
4. True or false: In order for a file to be mapped into memory, its file object must
have been created with the FILE_FLAG_MAPPED switch.
7. True or false: Windows' mapped file I/O facility is produced jointly by the I/O
system and the memory manager.
8. When a file is mapped into virtual memory, at what point does Windows copy it
to the system paging file?
9. What Win32 API can a thread call to flush the modified pages in a memory-
mapped file immediately to disk?
11. What Win32 API function did we use in this chapter to get the exact size of the
memory region into which a file is mapped?
12. True or false: Although a file that has been mapped into virtual memory serves
as the physical storage for the memory, it takes longer to map a file into
memory than to allocate a memory buffer and copy the file from disk because
the Windows memory manager almost always processes I/O synchronously.
14. True or false: The address range set aside for a memory-mapped file comes
from the default system heap.
15. True or false: One way in which Windows makes use of its own mapped file I/O
facility is with image file activation.
Chapter 6. Networking Fundamentals
It is impossible to calculate the moral mischief, if I may so express it, that
mental lying has produced in society. When a man has so far corrupted and
prostituted the chastity of his mind as to subscribe his professional belief to
things he does not believe, he has prepared himself for the commission of
every other crime.
-Thomas Paine[1]
[1] Paine, Thomas. The Age of Reason, ed. Philip S. Foner. New York: Citadel Press, 1974, p. 50.
The purpose of network software is to take a client request for a resource, execute
the request on the remote machine containing the requested resource, and return
the results to the client. Before networking functionality was built into operating
systems, this was a nontrivial proposition. All sorts of ill-fitting and interim-type
solutions were used to provide basic connectivity between machines and to make it
as seamless as possible (e.g., TSRs, topology-dependent utilities, and so on). With
the advent of OS-integrated network support, interconnectivity is not only trivial and
commonplace, it is expected. Today, we've moved beyond basic connectivity to
things like WiFi and Gigabit Ethernet. Intermachine networking has become as
humdrum as running water and electricity.
Overview
Connection-oriented Winsock application: an application that makes use of
a stream or reliable connection to communicate using the Winsock API.
Name resolution: the process of translating a machine name into a network
address.
Successful networking requires that a machine be able to figure out how to get to a
machine containing a remote resource and what communications protocols that
machine can use. The process of translating a machine name into a network address
is known as name resolution. The process of negotiating a compatible set of
communications protocols is known as negotiation.
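For example, a Winsock application can resolve a machine name with a call like the one in the sketch below. This is a bare-bones illustration using the same gethostbyname call the chapter's samples use; it assumes WSAStartup has already been called successfully and that you link with ws2_32.lib.

#include <winsock2.h>
#include <string.h>
#include <stdio.h>

// Resolve a host name to its first IPv4 address and print it
void PrintAddress(const char *szHostName)
{
    HOSTENT *heHost = gethostbyname(szHostName);
    if (!heHost) {
        printf("Unable to resolve %s\n", szHostName);
        return;
    }
    // h_addr_list[0] holds the first address, in network byte order
    struct in_addr addr;
    memcpy(&addr, heHost->h_addr_list[0], heHost->h_length);
    printf("%s resolves to %s\n", szHostName, inet_ntoa(addr));
}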
Once name resolution and protocol negotiation have been successfully completed, a
network request must be altered for transmission across the network by dividing it
into packets that the physical medium can transport. When the request reaches the
destination, it must be reassembled from its packets, checked for completeness,
decoded, and passed on to the appropriate OS component for fulfillment. Once the
OS component fulfills the request, the process must be reversed in order to send the
results back to the client.
Networking software can be classified into four basic categories: services, APIs,
protocols, and network adapter device drivers. Each type is layered on top of the
next to form what's commonly referred to as the network stack. The components in
Windows that implement the network stack correspond loosely to the Open Systems
Interconnection (OSI) reference model, first introduced in 1974 by the International
Organization for Standardization (ISO). You can download the OSI reference model
from the ISO Web site at http://www.iso.org. Table 6.2 details the seven layers of the
model.
Keep in mind that this is more of a conceptual model than something that vendors
implement exactly. It's often used to discuss networking in the abstract, with
implementation details left to vendors. The important thing is to glean its key tenets
and understand how Windows implements them.
Table 6.2 column headings: Layer Number | Layer | Description
Each layer in the model exists to provide services to the higher layers and to provide
an abstract interface to the services provided by lower layers. It's helpful to think of
each layer built on top of the Physical layer as another level of indirection removed
from the actual process of transmitting bits over the network cabling. As we travel
up the stack, we get further and further away from the physical transmission of data
until we get to the Application layer, which is completely unaware of how data is
physically moved between machines.
Consider how the layers on one machine communicate with those on another in a
network conversation. Conceptually, each layer on the first machine talks to the
same layer on the other machine, and both layers use the same protocol. For
example, a Windows Socket application on one machine will behave as though it is
communicating directly with a Windows Socket application on the other machine.
Physically, however, data must travel down each machine's network stack to the
physical medium, then across the medium to the other machine and back up the
other machine's network stack. So, although the Transport layers on each machine
involved in a network conversation may logically appear to be talking to one
another, in actuality, they are communicating with their own Network, Data-link, and
Physical layers, and it is the Physical layers of the two machines that actually
communicate directly with one another over the network cabling.
By convention, the seven layers of the model are further divided into two broader
tiers. The bottom four layers are commonly referred to as the transport, and the top
three are referred to as clients of the transport or users of the transport. The OSI
Transport layer is where we start to get into the physical logistics of getting from one
machine to another, so dividing the seven layers along these lines is useful from a
conceptual standpoint.
As with other operating systems, Windows doesn't implement the OSI reference
model precisely. It has some layers that the OSI model doesn't have, and a few of its
layers span more than one OSI layer. Table 6.3 lays out the OSI-to-Windows network
component mapping.
OSI Layer Number | Windows Networking Component | OSI Layer | Comments
7 | Network application | Application |
5 | Network API driver | Session | Kernel mode drivers responsible for the kernel mode portion of a network API's implementation.
4, 3 | Protocol driver (TCP/IP, IPX, etc.) | Transport, Network |
2 | NDIS library and miniport | Data-link | The NDIS library encapsulates the kernel mode environment for adapter drivers. NDIS miniport drivers are kernel mode drivers that interface a TDI transport with a specific network adapter.
  | Hardware abstraction layer | |
1 | Ethernet, IrDA, etc. | Physical |
As with any technology decision, which network API you should use in a particular
application comes down to how well the API meets your application's needs. What a
network API can offer an application can vary a great deal in terms of the network
protocols it can use, what types of communication it supports (reliable versus
unreliable, bidirectional versus unidirectional, and so on), and its portability to other
versions of Windows or to other operating systems. There's no single networking API that is better than every
other networking API in every situation.
Named Pipes
Windows Sockets
NetBIOS
Of these, we'll discuss RPC, Named Pipes, and Windows Sockets because SQL Server
offers network libraries for each of them (the multiprotocol Net-Library uses
Windows' RPC facility). Common Internet File System is the mechanism by which
files are shared on a Windows network. NetBIOS is mostly a legacy API that predates
the emergence of TCP/IP and Sockets as the prevalent internetworking technology
for computers.
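To give you a feel for the plumbing before we dig into the sample code, here's a bare-bones byte-mode named pipe server. It isn't one of the book's samples; the pipe name and buffer sizes are arbitrary. It creates the pipe, waits for a client with ConnectNamedPipe, then reads whatever the client writes.

#include <windows.h>
#include <stdio.h>

int main()
{
    // Create a byte-mode, inbound pipe named \\.\pipe\demo
    HANDLE hPipe = CreateNamedPipe("\\\\.\\pipe\\demo",
                                   PIPE_ACCESS_INBOUND,
                                   PIPE_TYPE_BYTE | PIPE_READMODE_BYTE | PIPE_WAIT,
                                   PIPE_UNLIMITED_INSTANCES,
                                   4096, 4096, 0, NULL);
    if (INVALID_HANDLE_VALUE == hPipe) return 1;

    // Blocks until a client opens \\.\pipe\demo with CreateFile
    if (ConnectNamedPipe(hPipe, NULL) || GetLastError() == ERROR_PIPE_CONNECTED) {
        char szBuf[4096];
        DWORD dwRead;
        while (ReadFile(hPipe, szBuf, sizeof(szBuf) - 1, &dwRead, NULL) && dwRead) {
            szBuf[dwRead] = '\0';   // make sure the buffer is null-terminated
            printf("%s", szBuf);
        }
        DisconnectNamedPipe(hPipe);
    }
    CloseHandle(hPipe);
    return 0;
}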
// fstring_pipe.cpp : Multithreaded file
//
#include "stdafx.h"
#include "windows.h"
#include "stdlib.h"
#include "process.h"
#include "bufsrch.h"
#include "iobuf.h"
#define IO_STREAMS_PER_PROCESSOR 2
//Entry-point routine for the worker threads
unsigned __stdcall StartSearch(LPVOID lpParameter) {
    return ((CBufSearch*)lpParameter)->Search();
char *szFileName,
HANDLE hOutputFile, char *szSearchStr,
HANDLE hInputFile=INVALID_HANDLE_VALUE
char szFullPathName[MAX_PATH+1];
DWORD dwNumThreads;
HANDLE hPrivHeap;
HANDLE *hThreads;
bool bPipe;
char szMsg[1024];
DWORD dwOutput;
strcpy(szFullPathName,szPath);
strcat(szFullPathName,szFileName);
DWORD dwFileSizeHigh;
DWORD dwFileSizeLow;
DWORD dwlFileSize;
bPipe=(INVALID_HANDLE_VALUE!=hInputFile);
if (!bPipe) {
    hInputFile=CreateFile(szFullPathName,
        GENERIC_READ,FILE_SHARE_READ, NULL,
        OPEN_EXISTING,
        FILE_ATTRIBUTE_NORMAL
        | FILE_FLAG_OVERLAPPED
        | FILE_FLAG_NO_BUFFERING,
        NULL);
    if (INVALID_HANDLE_VALUE==hInputFile) {
        return -1;
sprintf(szMsg,"Searching for %s in %s\n\n",szSearchStr, szFileName);
WriteFile(hOutputFile,szMsg,strlen(szMsg),&dwOutput,NULL);
DWORD dwRetries=0;
do {
    dwFileSizeLow=GetFileSize(hInputFile,&dwFileSizeHigh);
    dwlFileSize=(dwFileSizeHigh*MAXDWORD)+dwFileSizeLow;
    DWORD dwNumClusts=dwlFileSize / dwClusterSize;
    if (dwNumClusts<1) dwNumClusts=1;
    //If file is less than 4GB and we have more requested
    //streams (IO threads) than clusters, set the # of
    //threads = to the # of clusters
    if ((dwlFileSize<0xFFFFFFFF) && (dwNumStreams>dwNumClusts))
        dwNumThreads=dwNumClusts;
    else
        dwNumThreads=dwNumStreams;
#if(_DEBUG)
    sprintf(szMsg,"Using %d threads\n\n",dwNumThreads);
    WriteFile(hOutputFile,szMsg,strlen(szMsg),&dwOutput,NULL);
#endif
return -1;
pIoFirst=new CIoBuf(pIoFirst,hPort,dwClusterSize+1);
pbFirst=new CBufSearch(pbFirst,
    szFileName,
    szSearchStr,
    hOutputFile);
hThreads[i]=
    (HANDLE)_beginthreadex(NULL,
        0,
        &StartSearch,
        pbFirst,
        CREATE_SUSPENDED,
        &uThreadId);
if (!hThreads[i]) {
pIoFirst->s_pIoFirst=pIoFirst;
do {
    CIoBuf *pIoBuf=
        pIoFirst->SpinToFindBuf(BUF_STATE_INACTIVE, BUF_STATE_READING);
    (DWORD)(dwlFilePos / MAXDWORD);
    pIoBuf->m_OverlappedIO.Offset=(DWORD)(dwlFilePos % MAXDWORD);
    &pIoBuf->m_dwBytesRead, &pIoBuf->m_OverlappedIO)) {
        DWORD dwLastErr=GetLastError();
        if (ERROR_IO_PENDING!=dwLastErr) {
            //including EOF
            InterlockedExchange(
                (LPLONG)&pIoBuf->s_bTerminated,
                (long)true);
            printf(
            return -1;
        else
            bEof=true;
        break;
        else {
            (LPLONG)&pIoBuf->s_bOverlapped,
            (long)true);
        }
    else {
        InterlockedExchange(
            (LPLONG)&pIoBuf->s_bOverlapped,
            (long)false);
        pIoBuf->SetState(BUF_STATE_READY);
    dwlFilePos+=dwClusterSize;
}
INFINITE);
dwFindCount+=pbFirst->m_dwFindCount;
pbNext=pbFirst->m_pbNext;
delete pbFirst;
}
pIoNext=pIoFirst->m_pIoBufNext;
delete pIoFirst;
CloseHandle(hThreads[i]);
}
if (!bPipe)
    CloseHandle(hInputFile);
WriteFile(hOutputFile,szMsg,strlen(szMsg),&dwOutput,NULL);
HANDLE hOutputFile;
if (strcmp(szOutput,"CONOUT$")) {
    do {
        hOutputFile=
            CreateFile(szOutput, GENERIC_WRITE,
                FILE_SHARE_READ,
                NULL,
                CREATE_ALWAYS,
                FILE_ATTRIBUTE_NORMAL, NULL);
        if (INVALID_HANDLE_VALUE==hOutputFile) {
            printf(
    } while ((INVALID_HANDLE_VALUE==hOutputFile) && (!SleepEx(5000,false)));
}
else
    hOutputFile=GetStdHandle(STD_OUTPUT_HANDLE);
return hOutputFile;
}
DWORD dwOutput;
HANDLE hOutputFile;
//Extract the file path from the specified mask
char *p=strrchr(szFileMask,'\\');
if (p) {
    strncpy(szPath,szFileMask,(p-szFileMask)+1);
    szPath[(p-szFileMask)+1]='\0';
}
else
    GetCurrentDirectory(MAX_PATH,szPath);
printf("Searching for %s in %s\n\n",szSearchStr, szFileMask);
GetSystemInfo(&si);
DWORD dwClusterSize=(dwSectorsPerCluster * dwBytesPerSector);
DWORD dwFindCount=0;
HANDLE hInputPipe=INVALID_HANDLE_VALUE;
strupr(szFileMask);
char *pipestr=strstr(szFileMask,"\\PIPE\\");
if (pipestr) {
    while (1) {
        hOutputFile=OpenOutputFile(szOutput);
        hInputPipe=CreateNamedPipe(szFileMask,
            PIPE_ACCESS_INBOUND,
            PIPE_TYPE_BYTE,
            PIPE_UNLIMITED_INSTANCES,
            si.dwPageSize,
            si.dwPageSize,
            INFINITE,
            NULL);
        if (INVALID_HANDLE_VALUE==hInputPipe) {
            sprintf(szMsg,
            WriteFile(hOutputFile, szMsg,
                strlen(szMsg),
                &dwOutput,
                NULL);
            return false;
        }
        printf(
        ConnectNamedPipe(hInputPipe, NULL);
        dwFindCount+=SearchFile(dwClusterSize,
            si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR,
            szFileMask,
            "",
            hOutputFile,
            szSearchStr,
            hInputPipe);
        DisconnectNamedPipe(hInputPipe);
        CloseHandle(hInputPipe);
        if (GetStdHandle(STD_OUTPUT_HANDLE)!=hOutputFile)
            CloseHandle(hOutputFile);
    };
else {
    do {
        DWORD dwpFindCount=0; //Count for period
        do {
            hOutputFile=OpenOutputFile(szOutput);
            dwpFindCount+=SearchFile(dwClusterSize,
                si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR, szPath,
                fdFiles.cFileName, hOutputFile,
                szSearchStr,
                hInputPipe);
            dwFindCount+=dwpFindCount;
            if (GetStdHandle(STD_OUTPUT_HANDLE)!=hOutputFile)
                CloseHandle(hOutputFile);
        } while (FindNextFile(hFind,&fdFiles));
        FindClose(hFind);
        if (dwPeriod)
            printf(
            dwpFindCount);
return true;
if (argc<3) {
    printf("Usage is: fstring_pipe filemask|pipe searchstring outputfilename|pipe polling_interval_secs \n");
    return 1;
try
    strncpy(szOutpath,argv[3],MAX_PATH);
else
    strcpy(szOutpath,"CONOUT$");
dwPeriod=atol(argv[4])*1000;
catch (...)
    return 1;
hInputPipe=CreateNamedPipe(szFileMask,
    PIPE_ACCESS_INBOUND,
    PIPE_TYPE_BYTE,
    PIPE_UNLIMITED_INSTANCES,
    si.dwPageSize,
    si.dwPageSize,
    INFINITE,
    NULL);
ConnectNamedPipe(hInputPipe, NULL);
#include "windows.h"
#include "stdafx.h"
#include "winsock2.h"
DWORD dwError;
char szBuf[BUFF_SIZE+1];
dwError=WSAStartup(wVersionRequeste
d,&wsaData); if (dwError!= 0 ) {
if (LOBYTE(wsaData.wVersion) != 2 ||
HIBYTE(wsaData.wVersion) != 0 ) {
WSACleanup( );
WSASocket(AF_INET, SOCK_STREAM,
0,
NULL,
0,
0);
SOCKET hClientSocket;
sockaddr_in soServerAddress;
ZeroMemory(&soServerAddress,sizeof(soS
erverAddress));
soServerAddress.sin_family=AF_INET;
soServerAddress.sin_addr.s_addr=htonl(IN
ADDR_ANY);
soServerAddress.sin_port=htons(1234);
printf("Binding socket.\n");
printf("Listening...\n");
printf("Client connected\n");
HANDLE
hStdOut=GetStdHandle(STD_OUTPUT_HA
NDLE);
do {
int
iBytesRead=recv(hClientSocket,szBuf,BUF
F_SIZE,0); if
(SOCKET_ERROR!=iBytesRead) {
if (iBytesRead) {
FOREGROUND_INTENSITY); printf(
"Received %d bytes.
Contents=\n",iBytesRead);
else {
dwError=WSAGetLastError(); printf(
} while (!dwError);
return 0;
#include "windows.h"
#include "stdafx.h"
#include "winsock2.h"
#include "stdlib.h"
if (argc<2) {
printf(
WORD wVersionRequested; WSADATA
wsaData;
int iErr;
wVersionRequested = MAKEWORD(2,0);
iErr =
WSAStartup(wVersionRequested,&wsaDat
a); if (iErr != 0) {
return -1;
if (LOBYTE(wsaData.wVersion) != 2 ||
HIBYTE(wsaData.wVersion) != 0 ) {
WSACleanup( );
return -1;
}
strncpy(szHostName,argv[1],p-argv[1]);
szHostName[p-argv[1]]='\0';
p++;
int iPort=atoi(p);
HOSTENT
*heServer=gethostbyname(szHostName);
if (!heServer) {
printf("Unknown host
%s\n",szHostName); return -1;
}
SOCKET soServer=WSASocket(AF_INET,
SOCK_STREAM,
0,
NULL,
0,
0
);
saServerAddress.sin_family = AF_INET;
memcpy(&(saServerAddress.sin_addr),
heServer->h_addr_list[0], heServer-
>h_length);
saServerAddress.sin_port = htons(iPort);
printf("Connection failed
%d\n",WSAGetLastError()); }
DWORD dwLastErr=0;
do {
send(soServer, (char*)(&szBuffer),
BUFF_SIZE, 0)) {
dwLastErr=WSAGetLastError();
printf("Send error. %d\n", dwLastErr); }
return 0;
localhost:1234 L
Received 4096 bytes. Contents=
dwError=0;
do {
DWORD dwBytesRead; if
(!ReadFile((HANDLE)hClientSocket, szBuf,
BUFF_SIZE,
&dwBytesRead, NULL)) {
dwError=GetLastError(); if
(ERROR_HANDLE_EOF==dwError) break;
else {
if (dwBytesRead) {
//Make sure buff is null-terminated
szBuf[dwBytesRead]='\0';
FOREGROUND_INTENSITY); printf(
else break;
} while (!dwError);
//Search the files matching a given mask for a
//specified string
DWORD dwPeriod=0) {
HANDLE hOutputFile;
strncpy(szPath,szFileMask,(p-szFileMask)+1);
szPath[(p-szFileMask)+1]='\0';
}
else
    //If no path was specified, use the current
    //folder
    GetCurrentDirectory(MAX_PATH,szPath);
printf("Searching for %s in %s\n\n",szSearchStr, szFileMask);
GetSystemInfo(&si);
DWORD dwClusterSize=(dwSectorsPerCluster * dwBytesPerSector);
DWORD dwFindCount=0;
HANDLE hInputPipe=INVALID_HANDLE_VALUE;
DWORD dwInputType=INPUT_TYPE_FILE;
strupr(szFileMask);
char *pszPipe=strstr(szFileMask,"\\PIPE\\");
char *pszPort=PortString(szFileMask);
if (pszPipe) dwInputType=INPUT_TYPE_PIPE;
else {
    if (pszPort)
        dwInputType=INPUT_TYPE_SOCKET;
switch (dwInputType) {
case INPUT_TYPE_PIPE : {
    while (1) {
        hOutputFile=OpenOutputFile(szOutput);
        hInputPipe=CreateNamedPipe(szFileMask,
            PIPE_ACCESS_INBOUND,
            PIPE_TYPE_BYTE,
            PIPE_UNLIMITED_INSTANCES,
            si.dwPageSize,
            si.dwPageSize,
            INFINITE,
            NULL);
        if (INVALID_HANDLE_VALUE==hInputPipe) {
            sprintf(szMsg,
            WriteFile(hOutputFile, szMsg,
                strlen(szMsg),
                &dwOutput,
                NULL);
            return false;
        printf(
        ConnectNamedPipe(hInputPipe, NULL);
        dwFindCount+=SearchFile(dwClusterSize,
            si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR,
            szFileMask,
            "",
            hOutputFile,
            szSearchStr,
            hInputPipe);
        DisconnectNamedPipe(hInputPipe);
        CloseHandle(hInputPipe);
        CloseOutputFile(szOutput,hOutputFile);
    };
    break;
case INPUT_TYPE_SOCKET : {
    while (1) {
        if (!InitializeWSA()) {
            return false;
        hOutputFile=OpenOutputFile(szOutput);
        printf("Opening socket for %s\n",szFileMask);
        //Get a socket for the server
        SOCKET hServerSocket=
            WSASocket(AF_INET, SOCK_STREAM,
                0,
                NULL,
                0,
                0);
        SOCKET hClientSocket;
        sockaddr_in soServerAddress;
        ZeroMemory(&soServerAddress, sizeof(soServerAddress));
        u_short usPort=atoi(pszPort);
        soServerAddress.sin_family=AF_INET;
        soServerAddress.sin_addr.s_addr=htonl(INADDR_ANY);
        soServerAddress.sin_port=htons(usPort);
        szFileMask);
        hClientSocket = accept(hServerSocket,
            (sockaddr*)(&soClientAddress),
            &iAddrSize);
        dwFindCount+=SearchFile(dwClusterSize,
            si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR,
            szFileMask,
            "",
            hOutputFile,
            szSearchStr,
            (HANDLE)hClientSocket);
        CloseOutputFile(szOutput, hOutputFile);
    break;
case INPUT_TYPE_FILE: {
    do {
        DWORD dwpFindCount=0; //Count for period
        FindFirstFile(szFileMask,&fdFiles);
        if (INVALID_HANDLE_VALUE == hFind) {
        do {
            hOutputFile=OpenOutputFile(szOutput);
            dwpFindCount+=SearchFile(dwClusterSize,
                si.dwNumberOfProcessors*IO_STREAMS_PER_PROCESSOR, szPath,
                fdFiles.cFileName, hOutputFile,
                szSearchStr,
                hInputPipe);
            dwFindCount+=dwpFindCount;
            CloseOutputFile(szOutput,hOutputFile);
        } while ((FindNextFile(hFind,&fdFiles)));
        FindClose(hFind);
        if (dwPeriod)
            printf(
            szFileMask,
            dwpFindCount);
printf("\nTotal hits for %s in %s:\t%d\n",
    szSearchStr,szFileMask,dwFindCount);
return true;
HANDLE hOutputFile;
if (stricmp(szOutput,"CONOUT$")) {
    //Socket
    char *pszPort=PortString(szOutput);
    if (pszPort) {
        if (!InitializeWSA()) {
            return INVALID_HANDLE_VALUE;
        }
        char szHostName[MAX_PATH+1];
        strncpy(szHostName,szOutput,pszPort-szOutput-1);
        szHostName[pszPort-szOutput-1]='\0';
        HOSTENT *heServer=gethostbyname(szHostName);
        if (!heServer) {
            printf("Unknown host %s\n",szHostName);
            return INVALID_HANDLE_VALUE;
        }
        hOutputFile=
            (HANDLE)WSASocket(AF_INET,
                SOCK_STREAM,
                0,
                NULL,
                0,
                0);
        u_short usPort=atoi(pszPort);
        saOutput.sin_family = AF_INET;
        memcpy(&(saOutput.sin_addr), heServer->h_addr_list[0], heServer->h_length);
        saOutput.sin_port = htons(usPort);
        do {
        } while ((connect((SOCKET)hOutputFile,
            (sockaddr *) &saOutput, sizeof(saOutput)))
            && (printf(
    else {
        //File or pipe
        do {
            hOutputFile=
                CreateFile(szOutput, GENERIC_WRITE,
                    FILE_SHARE_READ,
                    NULL,
                    CREATE_ALWAYS,
                    FILE_ATTRIBUTE_NORMAL, NULL);
        } while ((INVALID_HANDLE_VALUE==hOutputFile)
            && (printf("Waiting on output file/pipe. Last error=%d\n", GetLastError()))
            && (!SleepEx(5000,false)));
    }
else
    hOutputFile=GetStdHandle(STD_OUTPUT_HANDLE);
return hOutputFile;
}
RPC makes use of other network APIs (e.g., Named Pipes, Message Queuing, or
Winsock) in order to provide a programming model that hides most of the details of
communicating over a network from the developer. Given that RPC actually rides on
top of other APIs, it can use any network transport that those APIs support; it's
compatible with any transport on the system.
An RPC app is composed of a series of procedures; some are local, some reside on
other machines. To the app, they're all local. Procedures that actually reside on other
machines have stub procedures on the local machine that match their calling
conventions and parameter lists exactly. The application developer simply calls
these routines, not necessarily cognizant of where they physically reside. For basic
apps, these stub routines are typically linked statically into the application. For more
complex apps, they often reside in separate DLLs. With Distributed COM (DCOM),
which uses RPC as its means of executing code on other machines, the stub routines
are usually in separate DLLs.
Marshaling
When a stub routine is called, the parameters passed into it must be marshaled for
transport across the network to the actual routine. You can think of marshaling as a
traffic cop or escort that takes care of the work necessary to route data between
different execution contexts. In the case of marshaling between machines, you're
obviously routing data from one process to another, so one thing that marshaling
must take care of is dereferencing any pointers that are passed into the stub routine,
encapsulating the data they reference, and sending it over to the other machine. It
has to do this because a pointer in one process isn't going to be useful to another
process, particularly if the data it references isn't even there. I like to think of
marshaling as the process of taking the data represented by the parameters passed
into a routine by the hand and making sure it gets to its destination in a usable form.
When a stub is called, it invokes RPC runtime routines that carry out the work of
communicating with the destination computer, negotiating a compatible set of
protocols, and sending the request to the other machine. When the destination
machine gets the request, it unmarshals the parameters (by placing the
encapsulated data back into memory allocations that can then be referenced by the
original pointers and fixing up those pointers so that they reference the correct
locations) and calls the original procedure with the correct parameter values. All the
while, the whole process is transparent to the client-side application code; it simply
called a procedure. Once the call completes on the remote system, the whole
process is reversed in order to return the results to the caller.
Asynchronous RPC
Windows supports asynchronous RPC as well as synchronous RPC. When an
asynchronous RPC is made, the calling app continues to execute. When the call
completes, Windows' RPC facility signals an event that was originally associated with
the call. The caller can detect this through the traditional means of checking the
signal state of a kernel object such as calling WaitForSingleObject.
The RPC runtime resides in Rpcrt4.dll. You can use TList to verify that this DLL is
loaded into the SQL Server process space. This DLL is always loaded, regardless of
whether SQL Server is configured to listen on the multiprotocol Net-Library, which
uses the RPC API. This is because SQL Server's executable directly imports this DLL
by name. (You can check this out yourself by using the Depends or DumpBin tools
that ship with Visual Studio.)
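For example, assuming the Visual Studio command-line tools are on your path, a command along the following lines will show whether an executable imports the RPC runtime (this is just an illustration; the exact output format varies from version to version):

dumpbin /imports sqlservr.exe | find /i "rpcrt4"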
Recap
SQL Server supports a variety of network transports and protocols. It's important to
have a basic understanding of how these technologies work and how applications
typically make use of them. Understanding how to use them in your own
applications will help you better understand how they are employed by SQL Server.
From an architectural point of view, SQL Server is just another application on the
network. When it communicates over the network with a client, it does so using
network APIs just as any other application would. When a client connects to SQL
Server over the network, it does so via the network stack, just as it would were it
communicating with some other type of server.
Knowledge Measure
1. True or false: The Winsock listen API function causes the caller to block until a
client connects.
2. What Win32 API function does a named pipe server call to wait on a client
connection?
3. Can a socket handle be used with basic Win32 I/O functions such as ReadFile?
7. True or false: Each layer in the OSI model exists to provide services to the
higher layers and to provide an abstract interface to the services provided by
lower layers.
8. True or false: Windows' networking model matches the OSI model precisely.
9. What is the difference between byte mode and message mode on a named
pipe?
10. What must the second portion of a named pipe's name consist of?
11. What's the shorthand representation for referencing the current machine in a
named pipe's name?
12. True or false: Although Windows' named pipes facility can make use of other
OS facilities such as network transports, it cannot make use of Windows'
security facilities because it is a port of code from OS/2 LAN Manager.
13. True or false: The purpose of network software is to take a client request for a
resource, execute the request on the remote machine containing the requested
resource, and return the results to the client.
14. True or false: A network redirector acts as a type of remote file system.
16. True or false: Windows' named pipes facility runs exclusively over the NetBEUI
protocol.
17. What is the special file name assigned by Windows to the console output
device?
18. Name two functions mentioned in this chapter for use with connectionless
Winsock clients and servers.
19. What byte order must be used to pass in IP addresses and port numbers to
Winsock?
20. What are the bottom four layers of the OSI reference model commonly referred
to as?
21. What underlying mechanism does DCOM use to run code across a network?
22. In what system DLL is most of the Named Pipes API implemented?
23. On what legacy API from UNIX is Windows' Winsock API based?
25. True or false: One advantage of the Named Pipes API over Winsock is that
Named Pipes can be used with asynchronous I/O, whereas Winsock supports
only synchronous I/O.
26. Name the three network protocols over which Windows supports layering
Winsock.
29. What flag must an application supply when creating a named pipe in order for
the pipe to participate in asynchronous operations?
30. What DLL contains the runtime for Windows' RPC facility?
Chapter 7. COM
Having the right don't make it right.
-Kenneth E. Routen
Given COM's ubiquity both inside and outside of SQL Server, no section on
application fundamentals would be complete without some discussion of it. I don't
have the time or space to give COM the treatment I'd like to in this book, so I
suggest you consult books such as Dale Rogerson's Inside COM (Redmond, WA:
Microsoft Press, 1997) and Don Box's Essential COM (Reading, MA: Addison-Wesley,
1998) to get the details of how COM works and how applications can make use of it.
In this chapter, I will update the coverage of COM from my previous books and cover
COM from a high-level standpoint. We'll also talk about how SQL Server exposes
some of its functionality via COM and how it makes use of external COM
components. See Chapter 15 on ODSOLE for more information on accessing COM
objects from Transact-SQL.
Overview
If you've built many Windows applications, you probably have at least a passing
familiarity with COM, OLE, and ActiveX. OLE originally stood for Object Linking and
Embedding and represented the first generation of cross-application object access
and manipulation in Windows. The idea was to have a document-centric view of the
world where an object from one application could happily reside in and interact with
another. OLE 1.0 used Dynamic Data Exchange (DDE) to facilitate communication
between objects. DDE is a message-based interprocess communication mechanism
based on the Windows' messaging architecture. DDE has a number of shortcomings
(it's slow, inflexible, difficult to program, and so on), so the second version of OLE
was moved off of it.
The second iteration of OLE was rewritten to depend entirely on COM. And even
though COM is more efficient and faster than DDE, OLE is still a bit of a bear to deal
with. Why? Because it was the first-ever implementation of COM. We've learned a lot
since then. That said, regardless of its implementation, OLE provides functionality
that's very powerful and very rich. It may be big, slow, and hard to code to, but
that's not COM's fault; that has to do with how OLE itself was built.
ActiveX is also built on COM. The original and still primary focus of ActiveX is on
Internet-enabled components. ActiveX is a set of technologies whose primary
mission is to enable interactive content (hence the "Active" designation) on Web
pages. Formerly known as OLE controls or OCX controls, ActiveX controls are
components you can insert into a Web page or Windows application to make use of
packaged functionality provided by a third party.
COM is the foundation on which OLE and ActiveX controls are built. Through COM, an
object can expose its functionality to other components and applications. In addition
to defining an object's life cycle and how the object exposes itself to the outside
world, COM also defines how this exposure works across processes and networks.
That all changed with the introduction of DLLs. Almost overnight, it became quite
common for third-party vendors to ship only header files and binaries. Instead of
being able to deploy a single executable, the developer would end up distributing a
sometimes sizeable collection of DLLs with his or her application. At runtime, it was
up to the application to load, either implicitly or explicitly, the DLLs provided by the
third-party vendor. As applications became more complex, it was not uncommon to
see executables that required dozens of DLLs with complex interdependencies
between them.
NOTE: This is, in fact, how Windows itself works: Windows is an executable with a
large collection of dynamically linked libraries. Windows apps make calls to the
functions exposed by these DLLs.
This approach worked reasonably well, but it had several drawbacks. One of the
main ones was that the interfaces to these DLLs weren't object-oriented and
therefore were difficult to extend and susceptible to being broken by even minor
changes to an exposed function. If a vendor added a new parameter to a function in
its third-party library, the change might well break the code of everyone currently
using that library. The approach most vendors took to address this was simply to
create a new version of the function (often with an "Ex" suffix or something similar)
that included the new parameter. The end result was call-level interfaces that
became unmanageable very quickly. It was common for third-party libraries (and
even Windows itself) to include multiple versions of the same function call in an
attempt to be compatible with every version of the library that had ever existed. The
situation quickly grew out of control, exacerbated by the fact that there was no easy,
direct method for users of these libraries to know which of the many versions of a
given function should be used. Coding to these interfaces became a trial-and-error
exercise that involved lots of scouring of API manuals and guesswork.
Another big problem with this approach was the proliferation of multiple copies of
the same DLL across a user's computer. Hard drive space was once much more
expensive than it is now, so having multiple copies of a library in different places on
an end user's system was something vendors sought to avoid. Unfortunately, their
solution to the problem wasn't really very well thought out. Their answer was to put
the DLLs their apps needed in the Windows system directory. This addressed the
problem of having multiple copies of the same DLL, but it introduced a whole host of
other issues.
Chief among these was the inherent problems with conflicting versions of the same
DLL. If Vendor A and Vendor B depended on different versions of a DLL produced by
Vendor C, there was a strong likelihood that one of their products would be broken
by the other's version of the DLL. If the interface to the DLL changed even slightly
between versions, it was quite likely that at least one of the apps would misbehave
(if it worked at all) when presented with a version of the DLL it wasn't expecting.
Another problem with centralizing DLLs was the trouble that arose from centralized
yet unmanaged configuration information. In the days before the Windows registry,
it was common to have a separate configuration file (usually with a .INI extension)
for every application (and even multiple configuration files for some applications).
These configuration files might include paths to DLLs that the application made use
of, further complicating the task of unraveling DLL versioning problems. Because
these configuration files were not managed by Windows itself, there was nothing to
stop an application from completely wiping out a needed configuration file, putting
entries into it that might break other applications, or completely ignoring it. These
.INI files were simply text files that an application could use or not use as it saw fit.
The progression used by Windows to locate DLLs was logical and well documented;
however, the fact that an application might use Windows' LoadLibrary function and
grab a DLL from anywhere it pleased on a user's hard drive might not mean
anything in terms of knowing what code an application actually depended on. The
app might pick up a load path from a configuration file that no one else even knew
about, or it might just search the hard drive and load what it thought was the best
version of the library. It was common for applications to have subtle
interdependencies that made the applications themselves rather brittle. We had
come full circle from the days of bloated executables and little or no code
sharing�now everyone depended on everyone else, with the installation of one app
frequently breaking another.
The Dawn of COM
Microsoft's answer to these problems was COM. Simply put, COM provides an
interface to third-party code libraries that is
Object-oriented
Centralized
Versioned
Language-neutral
Since COM uses the system registry, the days of unmanaged or improperly used
configuration information are gone. When an application instantiates a COM object
(usually through a call to CreateObject), Windows checks the system registry to find
the object's location on disk and loads it. There's no guesswork, and multiple copies
of the same object aren't allowed; each COM object lives in exactly one place on the
system.
NOTE: Microsoft has recently introduced the concept of COM redirection and side-
by-side deployment. This allows multiple versions of the same COM object to reside
happily on the same system. This functionality has all the hallmarks of an
afterthought and applies only in limited circumstances. (You can't, for example, use
COM redirection to load different copies of an object into different Web applications
on an IIS implementation; though the Web pages may seem like different apps to
users, there's actually just one application, IIS, in the scenario, and COM still limits
a given app to just one copy of a particular object version.) The vast majority of COM
applications still abide by the standard COM versioning constraints.
This isn't to say that you can't have multiple versions of an object on a system. COM
handles this through multiple interfaces: Each new version of an object has its own
interface and might as well be a completely separate object as far as its users are
concerned. There may or may not be code sharing between the versions of the
object. As an application developer, you try not to worry about this; you just code to
the interface.
COM has its limitations (most of which are addressed in the .NET Framework), but it
is ubiquitous and fairly standardized. The world has embraced COM, so SQL Server
includes a mechanism for working with COM objects from Transact-SQL.
Basic Architecture
The fundamental elements of COM are the following:
Reference counting
Marshaling
Aggregation
Interfaces
Each interface has an interface ID (IID), a GUID that identifies it uniquely. This makes
it easy to support interface versioning. A new version of a COM interface is actually a
separate interface with its own IID. The IIDs for the standard ActiveX, OLE, and COM
interfaces are predefined.
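To make that concrete, here's roughly what a custom interface declaration looks like in C++. The interface name and GUID below are made up for illustration; in practice you'd generate the GUID with a tool such as guidgen or uuidgen.

#include <objbase.h>

// A made-up IID for a made-up interface
// {6C9D9F10-88A1-4F6A-9B2E-0A1B2C3D4E5F}
static const IID IID_IStringFinder =
    { 0x6c9d9f10, 0x88a1, 0x4f6a,
      { 0x9b, 0x2e, 0x0a, 0x1b, 0x2c, 0x3d, 0x4e, 0x5f } };

// A COM interface is just a table of pure virtual methods derived from IUnknown
struct IStringFinder : public IUnknown
{
    // COM methods conventionally return HRESULTs
    virtual HRESULT __stdcall FindString(const wchar_t *pwszText,
                                         const wchar_t *pwszPattern,
                                         long *plCount) = 0;
};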
Reference Counting
Unlike .NET and the Java Runtime, COM does not perform automatic garbage
collection. Disposing of objects that are no longer needed is left to the developer.
You use an object's reference count to determine whether the object can be
destroyed.
The IUnknown methods AddRef and Release manage the reference count of
interfaces on a COM object. When a client receives a pointer to a COM interface (a
descendent of IUnknown), AddRef must be called on the interface. When the client
has finished using the interface, it must call Release.
In its most primitive form, each AddRef call increments a counter variable inside its
object and each Release call decrements it. When this count reaches zero, the
interface no longer has any clients and can be destroyed.
You can also implement reference counting such that each reference to an object (as
opposed to an interface implemented by the object) is counted. In this scenario,
calls to AddRef and Release are delegated to a central reference count
implementation. Release frees the whole object when its reference count reaches
zero.
QueryInterface queries an object using the IID of the interface the caller wants a
pointer to. If the object implements the specified interface, QueryInterface retrieves
a pointer to it and also calls AddRef. If the object does not implement the interface,
QueryInterface returns the E_NOINTERFACE error code.
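Here's a stripped-down sketch that ties these mechanics together: an object that implements nothing but IUnknown, with an object-level reference count. It's for illustration only, not production-quality COM code; the class name is made up.

#include <windows.h>
#include <objbase.h>

class CMyObject : public IUnknown
{
    LONG m_cRef;
public:
    CMyObject() : m_cRef(1) {}
    virtual ~CMyObject() {}

    STDMETHODIMP_(ULONG) AddRef()
    {
        return InterlockedIncrement(&m_cRef);
    }
    STDMETHODIMP_(ULONG) Release()
    {
        ULONG cRef = InterlockedDecrement(&m_cRef);
        if (0 == cRef)
            delete this;          // no clients remain -- destroy the object
        return cRef;
    }
    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        if (IsEqualIID(riid, IID_IUnknown)) {
            *ppv = static_cast<IUnknown *>(this);
            AddRef();             // hand out a counted reference
            return S_OK;
        }
        *ppv = NULL;
        return E_NOINTERFACE;     // we don't implement the requested interface
    }
};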
Marshaling
I like to think of marshaling as a kind of traffic cop or escort for data from one
process to another (marshaling sometimes occurs between threads within a process
as well). Marshaling enables the COM interfaces exposed by an object in one process
to be accessed by another process. If a structure contains a pointer to a piece of
data within a process's address space, that pointer is meaningless to other
processes. Marshaling it involves copying the reference and the data into a format
that can be sent to the other process. Once the other process receives the
marshaled data, the marshaling is reversed (or unmarshaled). The data is copied
somewhere within the new process's address space and a reference to it is again
established.
Through marshaling, COM either provides code or uses code provided by the
implementer of the interface to pack a method's parameters into a format that can
be shipped across processes or across the network to other machines and to unpack
those parameters during the call. When the call returns, the process is reversed.
Aggregation
For those situations when an object's implementer wants to make use of the services
offered by another (e.g., third-party) object and wants this second object to function
as a natural part of the first one, COM supports the concepts of aggregation and
containment.
By aggregation, I mean that the containing object creates the contained object as
part of its construction process and exposes the interfaces of the contained object
within its own interface. Some objects can be aggregated, some can't. An object
must follow a specific set of rules to participate in aggregation.
COM at Work
Practically speaking, COM objects are used through two basic means: early binding
and late binding. When an application makes object references that are resolvable at
compile-time, the object is early bound. To early bind an object in Visual Basic, you
add a reference to the library containing the object to your project, then Dim specific
instances of it. To early bind an object in tools like Visual C++ and Delphi, you
import the object's type library and work with the interfaces it provides. In either
case, you code directly to the interfaces exposed by the object as though they were
interfaces you created yourself. The object itself may live on a completely separate
machine and be accessed via Distributed COM (DCOM) or be marshaled by a
transaction manager such as Microsoft Transaction Server or Component Services.
Generally speaking, you don't care; you just code to the interface.
When references to an object aren't known until runtime, the object is late bound. In
Visual Basic, you instantiate a late-bound object via a call to CreateObject and store
the object instance in a variant. In Visual C++, you obtain a pointer to the object's
IDispatch interface (all automation objects implement IDispatch), then call
GetIDsOfNames and Invoke to call the object's methods and get and set its
properties.
Since the compiler didn't know what object you were referencing at compile-time,
you may encounter bad method calls or nonexistent properties at runtime. That's
the trade-off with late binding: It's more flexible in that you can decide at runtime
what objects to create and can even instantiate objects that didn't exist on the
development system, but it's more error prone; it's easy to make mistakes when
you late bind objects because your development environment can't provide the
same level of assistance it can when it knows the objects you're dealing with.
Accessing COM objects via late binding is also slower than doing so via early binding,
sometimes dramatically so. ProgIDs must be translated into IIDs, and the dispatch
IDs of exposed methods and properties must be looked up at runtime in order to be
callable. This takes time, and the difference in execution speed is often quite
noticeable.
Once you have an instance of an object, you call methods and access properties on
it like any other object. COM supports the notion of events (though they're a bit
more trouble to use than they should be), so you can subscribe and respond to
events on COM objects as well.
Threading Models
COM objects support two primary threading models: single-threaded apartment
(STA, also known as "apartment-threaded") and multithreaded apartment (MTA, also
known as "free-threaded"). Don't let the "apartment" concept confuse you or scare
you away. The term helps define a conceptual framework that describes the
relationships among threads, objects, and processes. It establishes an analogy
wherein a process is the equivalent of a building, and the logical container in which
COM threads and objects exist within a process is an apartment�that is, a set of
rooms within the building. Thus an apartment is simply a logical container within a
process. As the analogy suggests, an apartment might contain multiple threads
and/or objects, depending on its type, and a single process might have many
individual apartments.
Each object and each thread can belong to only one apartment. Only the threads
within an apartment can access the objects in that apartment directly; all other
threads go through COM proxies of some sort. A thread establishes residence in an
apartment (and optionally specifies the threading model it wants to use) through a
call to a COM initialization function such as CoInitialize, CoInitializeEx, or
OleInitialize.
In the STA model, an apartment has a single thread and can contain multiple
objects. In the MTA model, an apartment can have multiple threads and multiple
objects. A process can have multiple STAs but only one MTA. This one MTA can
coexist with multiple STAs in the same process.
The threading model for an in-process COM server (a DLL) is not specified via a call
to CoInitializeEx. Instead, it's specified via a registry key, like this:
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\InprocServer32\
ThreadingModel
The value of this key can be Apartment, Free, or Both. If the key is not found, STA is
assumed.
As I've said, only the threads residing in the apartment in which a COM object was
created can directly access the object. Other threads access the object through
proxy objects. Given that only one thread resides in a given STA apartment, COM
takes responsibility for synchronizing object access by other threads. It accomplishes
this via Windows' messaging facilities. It creates a hidden window for each
apartment and sets up proxies to post messages to the apartment owning an object
in order to invoke it. The serialization of access to the object is managed through
Windows' normal message queuing facilities. Methods on the object are called in
response to messages posted to the apartment's hidden window. As messages are
pulled out of the message queue (via PeekMessage and GetMessage) and
dispatched (via DispatchMessage), the window procedure for the thread, which is
implemented by COM, invokes the appropriate methods on the object. The process is
reversed when the method call completes and results need to be returned to the
calling thread. Messages are posted to the hidden window for the calling thread's
apartment to provide the result(s) and indicate function completion. These
messages are picked up by the calling thread, and it, in turn, returns through the
proxy object method call, thus completing the method call on the object in the other
apartment.
When a component is configured for the STA model, the objects it exposes are
created on the Win32 thread that created it. Other threads can't access these object
instances.
When a component is configured for the MTA model, COM automatically starts a host
MTA and instantiates the objects in it. When a component's threading model has
been configured as both STA and MTA compatible, the object is created in the calling
STA.
The first thread to initialize COM using the STA threading model becomes the main
STA. This STA is required to remain alive until all COM work is completed within a
process because some in-process servers are always created in the context of the
main STA.
When you use DTS or the ActiveX replication objects, you are again making use of
COM. ActiveX scripts in SQL Server Agent as well as those in DTS packages use the
ActiveX script COM interfaces for defining scripting languages and executing user
code.
When you interact with SQLXML using the SQLXMLOLEDB provider or using the
sp_xml_preparedocument procedure, you're interacting with COM. As the name
suggests, SQLXMLOLEDB is an OLE DB provider. The sp_xml_preparedocument
procedure (and its counterpart, sp_xml_removedocument) makes use of MSXML,
Microsoft's XML parser, which is exposed to applications such as SQL Server via COM
interfaces.
And, of course, when you instantiate COM objects from T-SQL using the sp_OA stored
procedures, you're working with COM. As Chapter 15 details, the sp_OA functionality
is based on COM's IDispatch interface and interacts with it in the same way that
simple automation tools such as VBScript do.
Various other facilities within SQL Server also either use COM or expose their
functionality via COM. COM is pervasive throughout SQL Server, just as it is
throughout many complex Windows applications.
Recap
COM offers a language-independent mechanism for exposing the functionality in a
DLL or executable to the outside world. It is based on interfaces and the concept of
binding�an application that makes use of COM is said to be either early bound or
late bound to the COM interfaces it uses.
SQL Server both exposes its own functionality via COM and makes use of
functionality exposed by external COM components. Some examples of SQL Server
functionality exposed via COM include its use of OLE DB and its own native OLE DB
provider, SQLOLEDB, and SQL-DMO, the COM objects on which Enterprise Manager is
based; and SQLXMLOLEDB, the OLE DB provider that exposes the client-side SQLXML
functionality. Examples of external COM interfaces used by SQL Server include
MSXML, the ActiveX scripting interfaces, and OLE DB providers accessed via linked
server queries.
Knowledge Measure
1. What interface does an application interact with when accessing a COM object
via late binding?
4. Describe what happens when data is marshaled from one process to another.
7. What mechanism is used with COM objects to keep track of how many
references are currently pending for a given object?
10. True or false: When accessing an object via late binding using its ProgID, the
ProgID must be translated into an interface ID before the interfaces it
implements can be accessed.
Chapter 8. XML
The most formidable weapon against errors of every kind is reason. I have
never used any other, and I trust I never shall.
�Thomas Paine[1]
[1] Paine, Thomas. The Age of Reason, ed. Philip S. Foner. New York: Citadel Press, 1974, p. 49.
I've included a chapter on introductory XML in this book for four reasons. First, I
wanted to update the coverage of XML from my last book, The Guru's Guide to SQL
Server Stored Procedures, XML, and HTML (Boston, MA: Addison-Wesley, 2002), and
felt this was a good time to do that. The language has evolved since I first wrote that
book, and I've wanted to update what I said about XML for some time.
Second, I believe that the pervasiveness and ubiquity of XML justifies its acceptance
as a core technology. It is strongly supported and relied upon by SQL Server's
SQLXML technologies in the same way that COM, shared memory, and Windows
sockets are used and leveraged in other parts of the product. The next release of
SQL Server promises to provide even more XML support and features. XML has come
into its own in the last few years, and I think it's high time we acknowledged that
and added it to the list of foundational technologies that one must know something
about in order to master modern, complex applications such as SQL Server. The day
is soon approaching when you won't be able to claim to be a practicing technologist
without at least a rudimentary knowledge of XML.
Third, since XML support is now a core component of SQL Server, I think it deserves
to be covered in books that cover SQL Server. Given that I cover topics such as
virtual memory management and thread synchronization in this book because I
believe they are foundational to the architecture and design of SQL Server and
because I believe understanding them helps us know SQL Server better, I feel
compelled to cover XML as well�it is also a key foundational element in the SQL
Server architecture. Understanding XML helps us better understand how SQL Server
uses it and provides insight into how the product is designed. Important parts of SQL
Server are based on XML, and that will only become truer as time goes by. Inasmuch
as Windows sockets and shared memory deserve to be covered in a book focused on
the architectural design of SQL Server, so does XML.
Fourth, many SQL Server practitioners are new to XML. I believe that teaching XML
in a book that is principally about SQL Server gives those who haven't worked much
with it a chance to gain valuable knowledge and deepen their skills. Knowing XML
will make you a better SQL Server practitioner. By understanding a core tenet of the
SQL Server architecture, you will understand the product and its technologies more
viscerally. Understanding how XML works is fundamental to understanding not only
the design and implementation of SQL Server's XML support but also the current and
future course of the product.
Unfortunately, there isn't time or space to discuss XML as completely as I'd like to.
Whole books (many of them, in fact) have been written on the subject of XML and
the XML family of technologies. I'll try to hit the high points of what you need to
know about XML to be able to use SQL Server's XML-related features. That said, you
should probably supplement the coverage here with research of your own. (See the
Resources section later in this chapter for some recommendations.)
Overview
Thanks to the World Wide Web, HTML has taken over the world. And yet, despite its
popularity, HTML has always had a number of serious limitations. You don't have to
build Web applications for very long before you run into some of them. HTML works
reasonably well for formatting informal documents but not so well for more complex
tasks. It was never intended to describe the structure of data, but business needs
have caused it to be used to do just that. The fact that HTML is being used to do
things it was never intended to do has highlighted many of its shortcomings. This
has created the need for a more powerful markup language, one that's data-centric
rather than display-centric, one that doesn't just know how to format data but can
also give the data contextual meaning.
XML is the answer to many of the problems with HTML and with building extensible
applications in general. XML is easy for anyone who understands HTML to learn but
is overwhelmingly more powerful. XML is more than just a markup language�it's a
metalanguage�a language that can be used to define new languages. With XML,
you can create a language that's tailored to your particular application or business
domain and use it to exchange data with your vendors, your trading partners, your
customers, and anyone else that can speak XML.
Rather than replacing HTML, XML complements it. Beyond merely providing a means
of formatting data, XML gives it context. Once data has contextual meaning,
displaying it is the easy part. But displaying it is just one of the many things you can
do with the data once it has context. By correctly separating the presentation of the
data from its storage and management, we open up an almost infinite number of
opportunities for using the data and exchanging it with other parties.
In this chapter, we'll explore the history of markup languages and how XML came
into existence. We'll look at how data is presented in HTML and compare that with
how XML improves on it. We'll touch on the basics of XML notation and how XML can
be displayed through translation to HTML via XML style sheets. We'll talk about
document validation using both Document Type Definitions (DTDs) and XML
schemas, and we'll discuss some of the nuances of each. We'll finish up by touching
on the Document Object Model (DOM) and how it's used to manipulate XML
documents as objects.
Simplicity Comes at a Price
HTML's purpose is to format documents. It specifies display elements�titles,
headings, fonts, captions, and so on. It's very presentation oriented. It's pretty good
at laying out data. It's not good at describing that data or making it generally
accessible.
Web site designers have worked around HTML's many shortcomings in some
astonishingly novel ways. Still, HTML has serious flaws that make it ill suited for
building complex, open information systems. Here are a few of them.
HTML isn't extensible. Each browser supports a fixed set of tags, and you may
not add your own.
Once generated, HTML is static and not easily refreshable. Dynamic HTML
(DHTML) and other technologies help alleviate this, but HTML, in its most basic
essence, was never intended to serve up live data.
If you've been around for awhile, you may be familiar with Standard Generalized
Markup Language (SGML) and may be thinking that it would address many of HTML's
shortcomings. Though SGML doesn't have the weaknesses that HTML does, its vast
flexibility makes it extremely complex. Document Style Semantics and Specification
Language (DSSSL), the language used to format SGML, is powerful and flexible, but
this power comes at a price�it's extremely difficult to use. What we need is a
language that's similar to HTML in terms of ease of use but features the flexibility of
SGML.
A Brief History of XML
With the explosion of the Web and the massive amount of HTML development that
consequently resulted, people began running into HTML's many shortcomings very
quickly. At the same time, SGML proponents, who'd been working in relative
obscurity for many years, began looking for a way to use SGML itself on the Web,
instead of just one application of it (HTML). They realized that SGML itself was too
complex for the task�most people couldn't or wouldn't use it�so they needed an
alternative. Again, they were looking for something that blended the best aspects of
HTML and SGML.
In mid-1996, Jon Bosak of Sun Microsystems approached the World Wide Web
Consortium (W3C) about forming a committee on using SGML on the Web. The effort
was given the green light by the W3C's Dan Connolly and, though organized, led,
and underwritten by Sun, the actual work was shared among Bosak and people from
outside Sun, including Tim Bray, C. M. Sperberg-McQueen, and Jean Paoli of
Microsoft. By November 1996, the committee had the beginnings of a simplified
form of SGML that was no more difficult to learn and use than HTML but that
retained many of the best features of SGML. This was the birth of XML as we know it.
XML vs. HTML: An Example
You can create your own tags in XML. This is such a powerful, vital part of XML that it bears
further discussion. If you're used to working in HTML, this concept is probably very foreign to
you since HTML does not allow you to define your own tags. Though various browser vendors
have extended HTML with their own custom tags, you still can't create your own�you have to
use the tags provided to you by your browser.
So how do you define a new tag in XML? The simplest answer is: you don't have to. You just use
it. You can control what tags are valid in an XML document by using DTD documents and XML
schemas (we'll talk about each of these later), but, the bottom line is this: You simply use a tag
to define it in XML. There is no typedef or similar construct.
To compare and contrast how HTML and XML represent data, let's look at the same data
represented using each language. Listing 8.1 shows some sample HTML that displays a recipe.
If you read through the HTML in Listing 8.1, you'll no doubt notice that the recipe ingredients
are stored in an HTML table. Figure 8.1 shows how it looks in a browser.
It's readable�if you look hard enough, you can tell what data the HTML contains.
However, there's a really big negative aspect that outweighs the others insofar as data markup
goes�there's nothing in the code to indicate the meaning of any of its elements. The data
contained in the document has no context. A program could scan the document and pick out
the items in the table, but it wouldn't know what they were. And while you could hard-code
assumptions about the data (column 1 is Qty, column 2 is Units, and so on), if the format of the
page were changed, your app would break.
The problem is further exacerbated by attempting to extract the data and store it in a
database. Because the semantic information about the data was stripped out when it was
translated into HTML, we have to resupply this information in order to store it meaningfully in a
database. In other words, we have to translate the data back out of HTML because HTML is not
a suitable storage medium for semantic information.
Now let's take a look at the same data represented as XML. You'll notice that the markup has
nothing to do with displaying the data�it is all about describing content. Listing 8.2 shows the
code.
See the difference? The tags in Listing 8.2 relate to recipes, not formatting. The file remains
readable, so it retains the simplicity of the HTML format, but the data now has context. A
program that parses this file will know exactly what a Jalapeño is�it's an Item in an Ingredient
in a Recipe.
And, regarding ease of use, I think you'll find that XML is actually more human readable than
HTML. XML accomplishes the goal of being at least as simple to use as HTML, yet it is orders of
magnitude more powerful. It explains the information in a recipe in terms of recipes, not in
terms of how to display recipes. We leave the display formatting for later and for tools better
suited to it.
Notational Nuances
It's important to get some of the nomenclature straight before we get too far into our
discussion of XML. Let's reexamine part of our XML document.
<Item optional="1">Tequila</Item>
1. Item is the tag name. As in HTML, tags mark the start of an element in XML. Elements are
a key piece of the XML puzzle. XML documents consist mostly of elements and attributes.
3. "1" is the value of the optional attribute and the portion from optional through "1"
comprises the attribute.
Here, Qty is the element name, and unit is its only attribute. The forward slash at the end of
the text indicates that the element itself is empty and therefore does not require a closing tag.
It's shorthand for this:
<Qty unit="dash"></Qty>
In addition to these basic structure rules, XML documents require stricter formatting than
HTML. XML documents must be well formed in order for an XML parser to be able to process
them. In mathematics, equations have particular forms they must follow in order to be logical;
the ones that don't aren't well formed and aren't terribly useful for anything. XML has a similar
requirement. In order for a parser to be able to parse an XML document, the document must
meet certain rules. The most important of these are the following.
Every document must have a root element that envelops the rest of the document. It
need not be named root. In our earlier example, Recipe is the root element.
All tags must have closing tags, either in the form of an end tag or via the empty tag
symbol described above. HTML often doesn't enforce this rule�browsers typically try to
guess where a closing tag should go if it's missing.
All tags must be properly nested. If Qty is contained within Ingredient, you must close Qty
before you close Ingredient. This is, again, not something that's rigorously enforced by
HTML, but an XML parser will not parse tags that are improperly nested.
Unlike element text, attribute values must always be enclosed in single or double quotes.
The characters <, >, and " cannot be represented literally; you must use character
entities instead. A character entity is a string that begins with an ampersand (&), ends
with a semicolon (;), and takes the place of a special symbol in order to avoid confusing
the parser. Since <, >, and " all have special meaning in XML, you must represent them
using the special character entities <, >, and ", respectively. There are two
other predefined special character entities you may use when necessary: & and
'. The & entity takes the place of an ampersand. Since ampersands typically
denote character entities in an XML document, using them in your data can confuse the
parser. Similarly, ' represents a single quote�an apostrophe. Since attribute values
can be enclosed in single quotes, a stray apostrophe can confuse the parser.
Unlike HTML, if you wish to use character entities other than the predefined five we just
talked about, you must first declare them in a DTD. We'll discuss DTDs shortly.
Element and attribute names may not begin with the letters "XML" in any casing. XML
reserves these for its own use.
Is it well formed? Yes. Is it valid? Perhaps not. Most cars don't have two engines. Consider the
following modified excerpt from our earlier example document.
<Ingredient>
<Qty unit="each">12</Qty>
<Qty unit="each">10</Qty>
<Item>Jalapeno peppers</Item>
</Ingredient>
Does it make sense for an ingredient to include two Qty specifications? No, probably not. While
the document is well formed, it's most likely invalid.
How do you establish the validity rules for a document? Through DTDs and XML schemas. We'll
discuss each of these in the sections that follow.
Document Type Definitions
There are two types of XML parsers: validating and nonvalidating. A nonvalidating parser
checks an XML document to be sure that it's well formed and returns it to you as a tree of
objects. A validating parser, on the other hand, ensures the document is well formed, then
checks it against its DTD or schema to determine whether the document is valid. In this
section, we'll discuss the first of these validation methods, the DTD.
A DTD is a somewhat antiquated though still widely used method of validating documents.
DTDs have a peculiar and rather limited syntax, but they are still found in lots of XML
implementations. Over time, it's likely that XML schemas will become the tool of choice for
setting up data validation. That said, there's still plenty of DTD code out there (and there are a
few things that DTDs can do that XML schemas can't), so DTDs are still worth knowing about.
A DTD can formalize and codify the tags used in a particular type of document. Since XML
itself allows you to use virtually any tags you want so long as the document itself is well
formed, a facility is needed to bring structure to documents, to ensure that they make sense.
DTDs were the first attempt at doing this. And since DTDs define what tags can and cannot be
used in a document, as well as certain characteristics of those tags, DTDs are also used to
define new XML dialects, formalized subsets of XML tags and validation rules. Originally, DTDs
put the X in XML�they were the means by which new applications of XML were designed.
Let's have a look at a DTD for our earlier recipe example. Listing 8.3 shows how it might look.
This DTD defines several characteristics of the document that are worth discussing. First, note
the topmost noncomment line in the file (bolded). It indicates the elements that can be
represented by a document that uses this DTD. A question mark after an element indicates
that it's optional.
Second, notice the #PCDATA flags. They indicate that the element or attribute can contain
character data and nothing else.
Third, take note of the #REQUIRED flag. This indicates that the unit attribute of the Qty
element is required. Documents that use this DTD may not omit that attribute.
Fourth, note the default value supplied for the Item element's optional attribute. Rather than
being required, this attribute can be omitted, as its name suggests. Moreover, for elements
that omit the attribute, it defaults to "0."
From the listing, you can see that DTD syntax is not an XML dialect, nor is it terribly intuitive.
That's why people are increasingly using schemas instead. We'll discuss XML schemas shortly.
You link a DTD and a document together by using a document type declaration element at the
top of the document (immediately after the <?xml…> line). The document type declaration
can contain either an inline copy of the DTD or a reference to its file name using a Uniform
Resource Identifier (URI). The one for recipe.xml looks like this:
Here's the document again with the DTD line included (Listing 8.4).
Listing 8.4 The recipe.xml Document with the DTD Reference Included
Validating the data against the DTD can be done through a number of means. If you're using
Internet Explorer 5.0 or later, you can use Microsoft's built-in DTD validator simply by loading
an XML document into the browser, right-clicking it, and selecting Validate. A number of GUI
and command line tools exist to do the same thing. Several of them are listed on the W3C
site, http://www.w3c.org.
XML Schemas
I mentioned earlier that DTDs were somewhat old-fashioned. The reason for this is
that there's a newer, better technology for validating XML documents. It's called XML
Schema. Unlike DTDs, you build XML Schema documents using XML. They consist of
elements and attributes just like the XML documents they validate. They have a
number of other advantages over DTDs, including the following.
DTDs cannot control what kind of information a given element or attribute can
contain. Merely being able to specify that an element stores text is not precise
enough for most business needs. We might want to specify what format the
text should have or whether the text is a date or a number. XML Schema has
extensive support for data domain control.
DTDs feature only 10 stock data types. XML Schema features over 44 base
data types, plus you can create your own.
All declarations in a DTD are global. This means that you can't define multiple
elements with the same name, even if they exist in completely different
contexts.
A complete discussion on XML Schema is outside the scope of this book, but we
should still touch on a few of the high points. Listing 8.5 presents a validation
schema for the recipe.xml document we built earlier.
Look a little daunting? It's a bit longer than the DTD we looked at earlier, isn't it?
However, it's not as bad as it might seem. Most of the document consists of opening
and closing tags�the schema itself is not that complex.
The first thing you should notice is that each of the elements and attributes in the
XML document is assigned a data type. When the document is validated with this
schema, each piece of data in the document is checked to see whether it's valid for
its assigned data type. If it isn't, the document fails the validation test.
Next, take a look at the maxOccurs element. Via a schema, you can specify a
number of ancillary properties for elements including how many (or how few) times
an element can appear in a document. The default for both minOccurs and
maxOccurs is 1. You can make an element optional by setting its minOccurs attribute
to 0.
Next, notice the xsd:enumeration elements under the unit attribute. In an XML
schema, you can specify a list of valid values for an element or attribute. If an
element or attribute attempts to store a value not in the list, the document fails
validation.
Finally, notice the new data type for the Item element's optional attribute. I've
changed it from an integer to a Boolean value, one of the stock data types
supported by XML Schema. I point this out because I want you to realize the very
rich data type set that XML Schema offers. Also understand that you can create new
types by extending the existing ones. Furthermore, you can create complex
types�elements that contain other elements and attributes. In the schema listed
above, the Qty data type is a complex data type, as are the Ingredients and
Instructions types. Any schema element that contains other elements or attributes
is, by definition, a complex data type.
You might be wondering how you associate a schema with an XML document. You do
so by adding a couple of attributes to the document's root element. For example, the
root element in our recipe.xml document now looks like this:
Recipe xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="D:\Ch08\code\recipe.xsd">
The first attribute makes the elements in the xsi (the XML Schema Instance)
namespace available to the document. A namespace is a collection of names
identified by a URI reference. You can define your own, or you can do as we've done
here and refer to a namespace defined on the W3C Web site. As in many
programming disciplines, an XML namespace provides name scoping to an
application so that names from different sources do not collide with one another.
Unlike traditional namespaces, the names within an XML namespace do not have to
be unique. Without getting into why that is, for now just understand that a
namespace gives scope to the names you use in XML. In this particular case, it
provides access to the names in the xsi namespace, which is where XML Schema
Instance elements reside. By referring to the namespace in this way, we can use
XML Schema Instance elements in the document by prefixing them with xsi:.
The second attribute describes the location of the XML Schema document. This is
the document listed above. It contains the schema information for our document.
Once these attributes are in place, XML Schema�aware tools will validate the
document using the schema identified by the attribute.
Converting XML to HTML Using a Style Sheet
In the same way that cascading style sheets are commonly used to transform HTML
documents, Extensible Stylesheet Language Transformations (XSLT) transforms XML
documents. It can transform XML documents from one document format to another,
into other XML dialects, or into completely different file formats such as PostScript,
RTF, and TeX.
The best part about XSLT is that it's XML. An XSLT document is a regular XML
document. "How can that be?" you may ask. "Wouldn't you have issues with circular
references?" No�XSLT is just another XML dialect. Modern XML parsers are
intelligent enough to know how to use the instructions encoded in an XSLT document
(which are just ordinary XML tags and attributes and the like) to transform or provide
structure to another document.
An XSLT style sheet is an XML document that's made up of a series of rules, called
templates, that are applied to another XML document to produce a third document.
These templates are written in XML using specific tags with defined meanings. Each
time a template matches something in the source XML document, a new structure is
produced in the output. This is often HTML, as the example we're about to examine
demonstrates, but it does not have to be.
Listing 8.6 shows an XSLT style sheet that transforms our recipe.xml document into
HTML that closely resembles the HTML we built by hand earlier in the chapter
(Listing 8.1).
<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<HEAD>
<TITLE>Henderson's Hotter-than-Hell Habanero Sauce</TITLE>
</HEAD>
<body>
<H3>Henderson's Hotter-than-Hell Habanero Sauce</H3>
Homegrown from stuff in my garden
(you don't want to know exactly what).
<H4>Ingredients</H4>
<table border="2">
<tr BGCOLOR="#00FF00">
<TH>Qty</TH>
<TH>Units</TH>
<TH>Item</TH>
</tr>
<xsl:for-each select="Recipe/Ingredients/Ingredient">
<tr>
<td><xsl:value-of select="Qty"/></td>
<td><xsl:value-of select="Qty/@unit"/></td>
<td><xsl:value-of select="Item"/></td>
</tr>
</xsl:for-each>
</table>
<P/>
<H4>Instructions</H4>
<OL>
<xsl:for-each select="Recipe/Instructions">
<LI><xsl:value-of select="Step"/></LI>
</xsl:for-each>
</OL>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
This style sheet does several interesting things. First, note the xsl:template
match="/" element. As I've said, XSLT transformations occur by applying templates
to specific parts of the XML document. The match attribute of this element specifies,
via what's known as XML Path (XPath) syntax, what part of the document the
template should apply to. In this case, it's the root element. So, the style sheet is
saying, "Locate the root element of the document, and when you find it, insert the
following text into the output document." What follows are several lines of standard
HTML that set up the header of the Web page.
Note the xsl: prefix on the template element. It refers to the xsl namespace. The xsl
namespace is where the template element and the other xsl:-prefixed names are
defined. Adding the namespace reference makes the xsl: prefix available to the
document so that it can reference those names. The URI reference is at the top of
the style sheet. It has the following form:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Next, notice the HTML table header information that's generated by the style sheet.
It contains three sets of HTML <TH> tags that set up the column headers for the
table. This section of the code matches that of the original HTML document we
created earlier.
The most interesting part of the document is the looping it does. This is where the
real power of XSLT lies. Notice the first xsl:for-each loop (bolded). An XSLT for-each
loop does exactly what it sounds like: It iterates through a collection of nodes at the
same level in a document. The base node from which it works is identified by its
select attribute. In this case, that's the Recipe/Ingredients/Ingredient node. As with
the earlier match attribute, this is an XPath to the node we want to access. What this
means is that we're going to loop through the ingredients for the recipe. For each
one we find, we'll generate a new row in the table.
Note the way in which the nodes within each Ingredient element are referenced. We
use the xsl:value-of element to insert the value of each field in each ingredient as
we come to it. To access the unit attribute of the Qty element, we use the XPath
attribute syntax, /@name, where name is the attribute we want to access.
Note the paragraph tag <P/> that follows the looping code. Traditional HTML would
permit this tag to be specified without a matching closing tag, but not XML. And this
brings up an important point: When you provide HTML code for a style sheet to
generate, it must be well formed. That is, it must comply with the rules that dictate
whether an XML document is well formed. Remember: A style sheet is an XML
document in every sense of the word. It must be well formed or it cannot be parsed.
The code finishes up with another for-each loop. This one lists the Step elements in
each Instructions element. Note the use of the HTML Ordered List (<OL>) and List
Item (<LI>) tags. These work just like they do in standard HTML�they produce a
numbered list.
You have several options for using this style sheet to transform the recipe.xml
document. You could use Microsoft's standalone XSLT transformer, you could use a
third-party XSLT transformer, or you could use the one that's built into your browser,
if your browser supports direct XSLT transformations. See the Tools subsection below
for more information, but, in my case, I'm using Internet Explorer's built-in XSLT
transformer. This requires the addition of an <?xml-stylesheet> element to the XML
document itself, just beneath the <?xml version> tag. Here's the complete element:
As you can see, the element contains an href attribute that references the style
sheet using a URI. Now, every time I view the XML document in Internet Explorer,
the style sheet will automatically be applied in order to transform it. Listing 8.7
shows the HTML code that's generated by using the style sheet.
<html>
<HEAD>
<TITLE>Henderson's Hotter-than-Hell Habanero Sauce</TITLE>
</HEAD>
<body>
<H3>Henderson's Hotter-than-Hell Habanero Sauce</H3>
Homegrown from stuff in my garden
(you don't want to know exactly what).
<H4>Ingredients</H4>
<table border="2">
<tr BGCOLOR="#00FF00">
<TH>Qty</TH>
<TH>Units</TH>
<TH>Item</TH>
</tr>
<tr>
<td>6</td>
<td>each</td>
<td>Habanero peppers</td>
</tr>
<tr>
<td>12</td>
<td>each</td>
<td>Cowhorn peppers</td>
</tr>
<tr>
<td>12</td>
<td>each</td>
<td>Jalapeno peppers</td>
</tr>
<tr>
<td></td>
<td>dash</td>
<td>Tequila</td>
</tr>
</table>
<P />
<H4>Instructions</H4>
<OL>
<LI>Chop up peppers, removing their stems, then grind to a liquid.</LI>
</OL>
</body>
</html>
Figure 8.2 shows how Listing 8.7 looks when viewed from a browser.
While it's nifty to be able to translate the XML document into well-formed HTML that
matches our original example, what does that really buy us? Wouldn't it have been
easier just to create the document using HTML in the first place?
Perhaps it would have been easier to create this one document in HTML without
using XML and a style sheet. However, by separating the storage of the data from its
presentation, we can radically alter its formatting without affecting the data. That's
not true of HTML. To understand this, look over the style sheet shown in Listing 8.8.
<?xml version='1.0'?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<HEAD>
<TITLE>Henderson's Hotter-than-Hell Habanero Sauce</TITLE>
</HEAD>
<body>
<H3>Henderson's Hotter-than-Hell Habanero Sauce</H3>
Homegrown from stuff in my garden
(you don't want to know exactly what).
<H4>Ingredients</H4>
<UL>
<xsl:for-each select="Recipe/Ingredients/Ingredient">
<LI>
<xsl:value-of select="Qty"/>	<xsl:value-of
select="Qty/@unit"/> of <xsl:value-of select="Item"/>
</LI>
</xsl:for-each>
</UL>
<P/>
<H4>Instructions</H4>
<table border="2">
<tr BGCOLOR="#00FF00">
<TH>#</TH>
<TH>Step</TH>
</tr>
<xsl:for-each select="Recipe/Instructions">
<tr>
<td><xsl:value-of select="position()"/></td>
<td><xsl:value-of select="Step"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
We can use this style sheet to transform the XML document into a completely
different HTML layout than the first one (you can specify a new style sheet for a
document by changing the document's <?xml-stylesheet> element or by overriding
it in your XSLT transformation tool). Figure 8.3 shows how the new Web page looks in
a browser.
As you can see, the page formatting is completely different. The ingredients table is
gone, replaced by a bulleted list. Conversely, the Instructions steps have been
moved from an ordered list into a table. The formatting has changed completely, but
the data is the same. The XML document didn't change at all.
Since the data now has context, we can access it directly. There's no need to hard-
code table column or table row references to the HTML and translate the data out of
HTML into a usable data format�the data is already in such a format. And,
regardless of how we decide to transform or format the data, that will always be
true. Because it's stored in XML, the data can be manipulated in virtually any way
we see fit.
The xsl:for-each element in our style sheets gave us a glimpse of some of XSLT's
power. Like most languages, much of its utility can be found in its ability to perform
a task repetitively. XSLT defines a number of constructs that are similarly powerful,
among them:
xsl:if
xsl:choose
xsl:sort
xsl:attribute
You can check the XSLT specification itself for the full list, but suffice it to say�XSLT
brings to bear some of the real power and extensibility of XML. It's an example of
what I like to refer to as the "programmable data" aspect of XML. Via XSLT, we have
the ability not only to specify how data is formatted but also to programmatically
change it from within the data itself. That's powerful stuff indeed.
Because we've been carrying out formatting-related tasks with XSLT and XML, it
might appear that XML is just a content management technology. That's not the
case�it's far more than that. Certainly, from the perspective of Web masters, the
XML family of technologies offers huge advancements over HTML. However, XML is
about more than just formatting data or managing content. It is about data and
giving that data sufficient context to be useful in a wide variety of situations. There's
a whole world of applications outside the realm of browsers and Web pages. To add
the power of XML to those types of applications, we use the DOM.
The Document Object Model
Thus far, we've explored XML from the standpoint of generalizing document formats.
But XML's real power comes into play when it is used to structure information.
All XML documents consist of nested sets of elements. Every document is wrapped
in a root element, which in turn houses other elements. The structure is a natural
tree�a tree of elements�objects that represent the content of the document. The
DOM goes beyond the simple text stream approach and provides a language-neutral
means of working with an XML document as a tree of objects.
This object-oriented access to XML documents opens the door to a whole host of
other uses for XML. It makes it trivial to incorporate XML as an interprocess or
interapplication data interchange mechanism because all you deal with are objects
in your programming language of choice. It's doesn't matter whether that language
is Visual Basic, Java, or C#; you can read, manipulate, and generally process XML
documents by calling methods on objects and accessing their properties.
Think of all the possibilities this brings with it. For example, imagine a database
system where the entire database was represented as an XML document. Need a
schema of the database? No problem�extract the XML schema from the DOM, run it
through an XSLT transformation, and you've got yourself a browsable database
schema that's always current. Want to write a unified tool that can administer
objects on SQL Server, Oracle, DB2, and all the other big players in the DBMS space
without having to code to each of the administrative APIs separately? Have them
expose their database schemas as DOM trees, and you should be able to build a
single tool that works with all of them.
Already, vendors are putting DOM and XML to use in scenarios like the ones I've just
described. SQL Server has certainly done its fair share of this, as we'll discuss later
in the book.
Private Sub Command1_Click()
"</Songs>"
End If
Else
Dim oNodes As IXMLDOMNodeList Dim
oNode As IXMLDOMNode
End If
Set oNodes =
xmlDoc.selectNodes(Text2.Text)
sData = oNode.xml
Next
End If
End Sub
Text1.text = ""
ErrorTrap:
& Err.Description
End Sub
Implements IVBSAXContentHandler
Private Sub
IVBSAXContentHandler_startElement(strNa
mespaceURI As String, strLocalName As
String, strQName As String, ByVal
attributes As MSXML2.IVBSAXAttributes)
Dim i As Integer
For i = 0 To (attributes.length - 1)
Form1.Text1.text = Form1.Text1.text & " "
& attributes.getLocalName(i) & "=""" &
attributes.getValue(i) & """"
Next
Form1.Text1.text = Form1.Text1.text &
">" & vbCrLf
End Sub
Private Sub
IVBSAXContentHandler_endElement(strNa
mespaceURI As String, strLocalName As
String, strQName As String)
Private Sub
IVBSAXContentHandler_characters(text As
String) text = Replace(text, vbLf, vbCrLf)
Form1.Text1.text = Form1.Text1.text &
"__CHARACTERS__" & vbCrLf & text &
vbCrLf End Sub
Private Sub
IVBSAXContentHandler_endDocument()
Private Sub
IVBSAXContentHandler_endPrefixMapping(
strPrefix As String)
Private Sub
IVBSAXContentHandler_ignorableWhitespa
ce(strChars As String)
Private Sub
IVBSAXContentHandler_processingInstructi
on(target As String, data As String)
Form1.Text1.text = Form1.Text1.text &
"__PROCESSING
Private Sub
IVBSAXContentHandler_skippedEntity(strN
ame As String) Form1.Text1.text =
Form1.Text1.text & "__SKIPPED ENTITY__"
& vbCrLf & strName & vbCrLf End Sub
Private Sub
IVBSAXContentHandler_startDocument()
Form1.Text1.text = Form1.Text1.text &
"__DOCUMENT START__" & vbCrLf
End Sub
Private Sub
IVBSAXContentHandler_startPrefixMapping
(strPrefix As String, strURI As String)
Form1.Text1.text = Form1.Text1.text &
"__START PREFIX
MAPPING__" & strPrefix & " " & strURI & "
" & vbCrLf End Sub
Implements IVBSAXErrorHandler
Private Sub
IVBSAXErrorHandler_error(ByVal lctr As
IVBSAXLocator, msg As String, ByVal
errCode As Long) Form1.Text1.text =
Form1.Text1.text & "Error: " & msg & "
Code: " & errCode
End Sub
Private Sub
IVBSAXErrorHandler_ignorableWarning
(ByVal oLocator As
MSXML2.IVBSAXLocator, strErrorMessage
As String, ByVal nErrorCode As Long)
End Sub
We begin by instantiating a
SAXXMLReader object. This object will
process an XML document we pass it and
raise events as appropriate as it reads
through the document. The code in the
ContentHandler and ErrorHandler classes
will respond to these events and write
descriptive text to the main form.
Resources
Further Reading
I've found Liz Castro's book XML for the World Wide Web: Visual QuickStart
Guide (Berkeley, CA: Peachpit Press, 2000) to be a concise yet thorough
treatment of the subject. Liz writes good books, and I've found this one
particularly useful.
Steve Holzner's Inside XML (Indianapolis, IN: New Riders, 2000) is also a good
read. It's comprehensive and covers many key XML subjects in great detail.
Erik T. Ray's Learning XML (Sebastopol, CA: O'Reilly, 2001) is another good
introductory text. It contains a nice introduction to the many XML parsers out
there and delves into a few topics (e.g., XML Schema) omitted by some of the
other books.
Most of the major software vendors have a large XML portal of some type
available from their sites. I've found the Microsoft and Sun sites to be the most
informative.
Tools
You should begin by getting yourself a good XML/XSLT/XSD editor. I like XML
Spy (http://www.xmlspy.com), but there are several good ones out there. Don't
let anyone tell you that GUIs are for wimps. Using Notepad to spend hours
doing what a GUI tool will do for you in seconds simply makes no sense. You
end up wasting lots of time trudging around in clunky tools that could have
been spent mastering the technology.
You'll also want an XML Schema/DTD validator. I use the one that's freely
downloadable from the Microsoft Web site, but there are several out there.
Depending on your other tools, you may need a separate XSLT transform tool. I
use Microsoft's XSLT tools, as well as James Clark's XT tool. Again, there are
several freebies out there.
If you run on Windows, get the latest version of the MSXML parser�it's the
best one on the market.
Michael Kay's SAXON tool is worth having even if you have other XSLT
processors. It's a nice piece of software written by a master of the technology.
The MSXML SDK is also worth having if you're on Windows. It contains some
good sample code and documentation that will come in handy if you build
applications using the MSXML APIs.
Recap
XML is programmable, hierarchical data. It is designed to be similar to HTML in terms
of ease of use and similar to SGML in terms of power, extensibility, and
expressiveness.
You can translate XML documents into other formats by using XSLT. Often the target
format is HTML, but it does not have to be. Translating between divergent document
formats is also quite common.
DTDs and XML schemas help ensure that an XML document is not only well formed
but also valid. A document can be well formed and still have invalid data in it. It's up
to a validating XML parser to check the document against its DTD or XML schema to
ensure that it contains valid data.
DOM is a popular API for processing XML documents as objects. Put succinctly, DOM
loads an XML document into a tree object, which you may then manipulate by
getting and setting the object's properties and by calling its methods.
SAX is also an increasingly popular XML processing API. Unlike DOM, it does not load
an entire XML document into memory. Instead, it reads through the document,
raising events as it goes. It's up to the calling application to respond to those events.
XML is not just one technology�it's a whole family of technologies, and those
technologies continue to evolve and continue to be adopted by more people around
the world with each passing day. It's crucial to learn as much as you can about XML
now so that you can make the best use of SQL Server's XML-related features�both
those it has today and those coming in the future. Here's a bold prediction for you:
The day will come when XML will be at least as important to SQL Server application
development as Transact-SQL is today. It's definitely time to dive in.
Knowledge Measure
1. True or false: ou can add your own tags to HTML, but you cannot add custom
attributes.
4. What's the maximum number of root nodes an XML document can have?
5. True or false: You use an XML schema to translate an XML document from one
format into another.
6. In the code fragment <foo "bar"/>, what type of document node is "bar"?
7. True or false: It's possible for a document to be valid but not well formed.
8. What classic data structure does an XML document that's been loaded into
memory via DOM most closely resemble?
12. Is an XML element that contains attributes but no other elements or data
considered empty?
14. True or false: XML parsers are typically more tolerant of tag nesting
mismatches than HTML parsers.
15. Describe the functional difference, if there is one, between this XML: <foo>
</foo>
<foo/>
Part II: Subsystems, Components, and
Technologies
Chapter 9. SQL Server as a Server
We want to stand upon our own feet and look fair and square at the world�its
good facts, its bad facts, its beauties, and its ugliness; see the world as it is
and be not afraid of it. Conquer the world by intelligence and not merely by
being slavishly subdued by the terror that comes from it.
�Bertrand Russell[1]
[1] Russell, Bertrand. "What We Must Do." Little Blue Book No. 1372. Girard, KS: Haldeman-Julius Company, 1929, p. 20.
In this chapter, we'll talk about SQL Server as a Windows server application. Earlier
in the book, we discussed the Win32 networking and I/O API functions that Windows
servers call to carry out their work. We talked about the process and threading APIs,
thread scheduling and synchronization, memory management, and COM. Here we'll
talk about how some of these are used by SQL Server itself and where it fits in the
general taxonomy of Windows server applications.
NOTE: In this chapter I assume that you've already read Chapters 5 and 6, I/O
Fundamentals and Networking, respectively. If you haven't yet read through those
chapters, you will probably want to before proceeding.
SQL Server and Networking
You'll recall that in the Networking chapter I said that SQL Server uses standard
networking API calls to accept and process connections. You're probably already
aware that SQL Server calls into its network library code (which resides in separate
DLLs) to accept and process user connections. What you may not be aware of is that
these DLLs, in turn, call standard Windows networking API functions to carry out
their work.
When listening for connections over TCP/IP, the Windows socket APIs that we
discussed in the Networking chapter are used heavily. The SQL Server Net-Library
code calls accept and WSAAccept to accept new connections and the various other
socket and Win32 I/O API functions to process client requests and return data. As we
saw in the Networking chapter, socket handles returned by Windows'
implementation of the socket API can be used with standard Win32 file I/O functions
such as ReadFile and WriteFile.
When listening for connections over named pipes, the Net-Library code uses the
standard Win32 file I/O API functions to process new connection requests and
process and return results from client requests. As you learned in the Networking
chapter, Windows applications interact with named (and anonymous) pipes using
the same Win32 I/O functions that are used when interacting with disk files.
SQL Server supports client connectivity using the multiprotocol Net-Library via the
Win32 RPC API functions. Win32 applications communicate with one another over
the RPC API via stub (or proxy) functions that the client calls. These stub functions,
in turn, make RPC API calls to marshal each function call and its parameter data to
the destination server. On the destination server, the call is unmarshaled and the
actual server-side function to which the proxy function corresponded is called. In this
way, the RPC API provides a call-level interface to network communications, rather
than the packet-, byte-, or message-oriented interfaces normally presented by
Windows networking API libraries. When a SQL Server client connects over the
multiprotocol Net-Library to SQL Server, it essentially makes procedure calls into the
server-side version of the library that are marshaled and sent across the network to
the target SQL Server. The server-side Net-Library then processes these calls and
feeds the client requests into the normal network I/O processing code line used by
the other Net-Libraries.
SQL Server uses an I/O completion port to allow connection I/O to be processed
asynchronously. As I mentioned in the I/O Fundamentals chapter, an I/O completion
port that's associated with a file (or socket) object receives a new completion packet
each time an asynchronous I/O operation that was initiated for the associated object
completes. This design allows SQL Server to support a high number of concurrent
client connections with a minimum number of dedicated network-related worker
threads.
Each Net-Library receives a separate worker thread that it uses to listen for
connections and process network-related I/O. If your server is listening on TCP/IP
sockets and named pipes, each associated Net-Library uses its own worker thread.
You can see how SQL Server's Net-Library code makes use of the API functions we
discussed in Chapters 5 and 6 by using WinDbg. Work through Exercise 9.1 to get a
good feel for the types and frequency of the networking API calls that SQL Server
makes.
1. Stop your development or test SQL Server instance if it is running. For this
exercise, you should be the server's only user.
2. Start WinDbg and be sure your symbol paths are set correctly as described in
Chapter 2.
3. From the WinDbg File menu, select Open Executable and locate your SQL
Server executable (sqlservr.exe). Set the command line parameters to:
-c -sYourInstanceName
where YourInstanceName is the name of your SQL Server instance. If you are
using the default instance, omit the -s parameter altogether.
4. Click the OK button to start SQL Server under WinDbg. Once you see the
WinDbg command prompt, add the following breakpoints:
bp WS2_32!WSAAccept
bp WS2_32!accept
bp WS2_32!listen
5. Now, type g in the WinDbg command window and press Enter to allow SQL
Server to start up.
6. Start Query Analyzer and attempt to connect to your server. You should see
some of your breakpoints tripped immediately. Add the following additional
breakpoints:
bp kernel32!GetQueuedCompletionStatus
bp kernel32!ReadFile
7. Type g and press Enter again to allow SQL Server to run. Your new connection
should succeed in Query Analyzer. Now open a new connection. Again, you
should see some of your breakpoints tripped, including some of the new ones
you just set. Type g and press Enter to bump through these. You should see
that some of them are hit repeatedly as your new connection is processed and
the default ODBC connection options are processed by the server.
8. At this point, you're done. Press Shift+F5 to stop debugging, then close
WinDbg. You will need to restart your server to continue working with it as it
should be stopped after your debugging session completes.
If you work through the preceding exercise, you'll see that the networking API
functions we explored in Chapter 6 and many of the I/O functions we investigated in
Chapter 5 are used very heavily by SQL Server's Net-Library code. Understanding
how these work and what they're typically used for will give you good insight into
how SQL Server itself works.
The SQL Server Executable
SQL Server's executable is named sqlservr.exe and resides in the binn folder under
your main SQL Server installation. The folder is named binn because earlier releases
of SQL Server shipped 16-bit client-side executables and libraries and these were
stored in a folder named bin. The binn folder was reserved for 32-bit executables
and libraries (the extra "n" signified "NT"). SQL Server no longer ships any 16-bit
binaries, but the folder name for 32-bit binaries has remained unchanged.
sqlservr.exe is linked with the LARGEADDRESSAWARE linker switch. This means that
it can take advantage of user mode address space above 2GB. As I mentioned in
Chapter 4, when a member of the Windows NT Server family (e.g., Windows 2000
Server, Windows Server 2003, and so on) is booted with the /3GB switch (or /USERVA
switch on Windows Server 2003), Windows increases the user mode portion of a
process's virtual address space at the expense of the kernel mode portion. SQL
Server is linked and coded such that it can take advantage of this and use more than
2GB of user mode virtual memory address space when the operating system makes
it available.
SQL Server's DLLs
SQL Server's main executable, sqlservr.exe, statically imports nine different DLLs.
Table 9.1 lists these and indicates the main purpose of each one.
Each of these must be present on the host system in order for SQL Server to start.
Furthermore, any DLLs they require (Gdi32.dll, Ntdll.dll, and so on) must also be
present in order for the server to successfully start. SQL Server also loads numerous
DLLs dynamically on startup depending on the options selected (e.g., which Net-
Libraries the server is configured to listen on) or when they're needed based on
client requests and other activities on the server (e.g., OPENXML queries, linked
server queries, BULK INSERT operations, and so on).
The three DLLs set in bold type in Table 9.1 ship with SQL Server and are, generally
speaking, a matched set. Typically (but not always), these are updated together
when a new service pack or hotfix is released for the product. Each of them includes
a Windows version resource, so you can easily check their version strings by right-
clicking them in Windows Explorer, selecting Properties, and selecting the Version
tab in the Properties dialog.
SQL Server I/O
When running on a member of the Windows NT family, SQL Server performs as much
file I/O as possible asynchronously. You'll recall from Chapter 5 that the standard
Win32 file I/O functions can execute asynchronously when the file object they're working with has been opened with the FILE_FLAG_OVERLAPPED flag and a pointer to a valid OVERLAPPED structure is passed to the functions that require it. For example, if you call CreateFile and pass in FILE_FLAG_OVERLAPPED, then call ReadFileEx with the returned file handle and a pointer to a valid OVERLAPPED structure, Windows will attempt to process your request asynchronously. SQL Server takes advantage of this facility to avoid blocking on I/O whenever possible.
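As a refresher on the pattern just described, here's a minimal sketch; the file path is hypothetical, ReadFile is used instead of ReadFileEx for simplicity, and error handling is trimmed to the essentials.

#include <windows.h>
#include <stdio.h>

int main()
{
    // Open the file for overlapped (asynchronous) access.
    HANDLE h = CreateFile("C:\\temp\\test.dat", GENERIC_READ,
                          FILE_SHARE_READ, NULL, OPEN_EXISTING,
                          FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    char buf[8192];
    OVERLAPPED ovl = {0};
    ovl.Offset = 0;                       // read from the start of the file
    ovl.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    // ReadFile returns immediately; the I/O may still be in flight.
    if (!ReadFile(h, buf, sizeof(buf), NULL, &ovl) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    // Do other work here, then wait for the read to finish.
    DWORD bytesRead;
    GetOverlappedResult(h, &ovl, &bytesRead, TRUE);
    printf("Read %lu bytes\n", bytesRead);

    CloseHandle(ovl.hEvent);
    CloseHandle(h);
    return 0;
}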
There are situations, of course, that prevent SQL Server from processing I/O
asynchronously. One obvious example is running it on a member of the Win9x family.
Because Win9x doesn't support asynchronous file I/O, all SQL Server file I/O on
Win9x is performed synchronously. SQL Server's UMS component is responsible for
handling the scheduling of I/O requests and has special code to detect Win9x and
perform I/O synchronously. See Chapter 10 for more information on UMS and its
processing of asynchronous I/O.
Another example of a situation in which SQL Server can't use asynchronous I/O is
when the MDF and LDF files that make up a database have been compressed using
NTFS file compression. As I mentioned in Chapter 5, Windows prohibits the use of
asynchronous file I/O against compressed files and will either turn any asynchronous
request against such files into synchronous operations for those APIs that support
synchronous I/O (e.g., ReadFile) or return an error for those that don't (e.g.,
ReadFileEx). This is one reason that Microsoft doesn't support the compression of
database files.
SQL Server makes use of scatter-gather I/O in order to quickly load a contiguous disk
file region into a set of buffers that may or may not be contiguous in memory. As we
saw in Chapter 5, the ReadFileScatter and WriteFileGather Win32 API functions take
a pointer to an array of I/O buffers, then either load data from disk into them or write
data from them to disk. By supporting noncontiguous source and destination
memory buffers, these API functions save SQL Server from having to allocate a contiguous intermediary buffer that matches the size of the disk file region being read or written and then copy data between that buffer and a series of noncontiguous memory buffers. First introduced in a service pack for Windows NT 4.0, scatter-gather I/O allows SQL Server to achieve better scalability and greater performance than would otherwise be possible. It allows the BPool and MemToLeave memory managers to do what they do best (manage large numbers of noncontiguous buffers without having to be concerned with placing any of them in particular proximity to the others based on I/O requirements) while still allowing the storage engine to process I/O as quickly as possible. See Chapter 5 for more information on scatter-gather I/O.
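The following sketch shows the general shape of a scatter read, assuming a hypothetical file path and four destination pages; note that ReadFileScatter requires a handle opened with both FILE_FLAG_NO_BUFFERING and FILE_FLAG_OVERLAPPED and page-aligned buffers, which is why VirtualAlloc is used to allocate them.

#include <windows.h>

#define PAGES 4   // read four noncontiguous memory pages in one call (illustrative)

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    DWORD pageSize = si.dwPageSize;       // typically 4KB on x86

    // ReadFileScatter requires nonbuffered, overlapped file handles.
    HANDLE h = CreateFile("C:\\temp\\test.dat", GENERIC_READ, FILE_SHARE_READ,
                          NULL, OPEN_EXISTING,
                          FILE_FLAG_NO_BUFFERING | FILE_FLAG_OVERLAPPED, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    // Each destination buffer must be a separate, page-aligned page;
    // VirtualAlloc gives us that alignment for free.
    void *bufs[PAGES];
    FILE_SEGMENT_ELEMENT seg[PAGES + 1];  // the array is terminated by a zero entry
    for (int i = 0; i < PAGES; i++)
    {
        bufs[i] = VirtualAlloc(NULL, pageSize, MEM_COMMIT, PAGE_READWRITE);
        seg[i].Alignment = (ULONGLONG)(ULONG_PTR)bufs[i];
    }
    seg[PAGES].Alignment = 0;

    OVERLAPPED ovl = {0};
    ovl.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);

    // One call loads a contiguous file region into the noncontiguous pages.
    if (!ReadFileScatter(h, seg, PAGES * pageSize, NULL, &ovl) &&
        GetLastError() != ERROR_IO_PENDING)
        return 1;

    DWORD bytesRead;
    GetOverlappedResult(h, &ovl, &bytesRead, TRUE);

    for (int i = 0; i < PAGES; i++)
        VirtualFree(bufs[i], 0, MEM_RELEASE);
    CloseHandle(ovl.hEvent);
    CloseHandle(h);
    return 0;
}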
SQL Server Components
I will end this chapter by talking in broad terms about the various SQL Server
components involved in processing the typical client request. Several of these are
discussed in more detail in other chapters; we'll just hit the high points here. A
typical client request is one that queries the server for data that resides in a
database. Understanding how these components interoperate and the workflow
between them will give you some good insight into how the server works internally.
As I've mentioned, client requests come into the server via SQL Server's Net-
Libraries. These requests are then scheduled for processing via UMS. The language
processing and execution (LPE) component within the server then takes each
request and passes it to the query processor (QP) for optimization. LPE and QP are
members of the relational engine. Once a client query is optimized and an execution
plan is produced for it, LPE executes it via calls from the relational engine to the
storage engine (SE). The storage engine performs the physical I/O, table and index traversal, data retrieval, and so on necessary to carry out the request from the relational engine. The communication between the LPE and SE components occurs
via COM using calls to OLE DB interfaces. See Chapter 7 for more information on
interfaces and COM in general.
So, the key components and technologies involved in the processing of a typical
client request include the Net-Libraries, UMS, LPE, QP, SE, and OLE DB. All of these
are touched every time a client submits a query against a SQL Server database for
processing. All along the way, SQL Server's various memory managers are accessed
each time a component within the server requires a memory allocation. See Chapter
12 for more information on the sequence of events that occurs when a query is
processed. See Chapter 11 for more information on how SQL Server manages
memory.
Recap
SQL Server is a complex Windows app. There's nothing magical about it that makes
it different from other Windows apps; it simply makes heavy use of the Win32 API
functions, C/C++ runtime library functions, COM interfaces, and so on that any
Windows app can use. It presents a sophisticated, multithreaded Windows
application that clients can connect to and to which they can submit requests for
data from databases, requests related to the storage of that data, and other
administrative commands in the form of Transact-SQL. As a request is processed, its
results�if any�are sent back to the requesting client via the same mechanisms that
delivered the request in the first place.
Knowledge Measure
1. What component within SQL Server is responsible for scheduling I/O
operations?
4. SQL Server's LPE and QP components are members of which larger component
within the server?
6. What is the purpose of the Sqlsort.dll file that ships with SQL Server?
9. True or false: Although SQL Server statically links several DLLs, it also explicitly
loads several of them either at startup or as it runs, depending on the way it's
configured and the commands executed by users.
10. In addition to /3GB, what other switch can be used with Windows Server 2003
to increase the size of the user mode virtual address space available to
applications that are large address aware such as SQL Server?
Chapter 10. User Mode Scheduler
To those searching for the truth (not the truth of dogma and darkness but the truth brought by reason, search, examination, and inquiry), discipline is required. For faith, as well intentioned as it may be, must be built on facts, not fiction; faith in fiction is a damnable false hope.
-Thomas Edison[1]
[1] Edison, Thomas. As quoted in The Book Your Church Does Not Want You to Read, ed. Tim C. Leedom. San Diego, CA: The
Truth Seeker Company, 1993, p. 4.
Up through version 6.5, SQL Server used Windows' scheduling facilities to schedule
worker threads, switch between threads, and generally handle the work of
multitasking. This worked reasonably well and allowed SQL Server to leverage
Windows' hard-learned lessons regarding scalability and efficient processor use.
Between versions 6.5 and 7.0, however, it became evident that SQL Server was
beginning to hit a "scalability ceiling." Its ability to handle thousands of concurrent
users and efficiently scale on systems with more than four processors was hampered
by the fact that the Windows scheduler treated SQL Server like any other
application. Contrary to what some people believed at the time, SQL Server 6.5
made use of no hidden APIs to reach the scalability levels it achieved. It used the
basic thread and thread synchronization primitives we discussed earlier in this book,
and Windows scheduled SQL Server worker threads on and off the processor(s) just
as it did any other process. Clearly, this one-size-fits-all approach was not the optimal solution for a high-performance app like SQL Server, so the SQL Server
development team began looking at ways to optimize the scheduling process.
UMS Design Goals
Several goals were established at the outset of this research. The scheduling facility
needed to:
• Support fibers, a feature new in Windows NT 4.0, and abstract working with them so that the core engine would not need separate code lines for thread mode and fiber mode
• Support asynchronous I/O and abstract working with it so that the core engine would not need a separate code line for versions of Windows that do not support asynchronous file I/O (e.g., Windows 9x and Windows ME)
Ultimately, it was decided that SQL Server 7.0 should handle its own scheduling.
From that decision, the User Mode Scheduler (UMS) component was born.
UMS acts as a thin layer between the server and the operating system. It resides in a
file named UMS.DLL and is designed to provide a programming model that's very
similar to the Win32 thread scheduling and asynchronous I/O model. Programmers
familiar with one would instantly be at home in the other.
Its primary function is to keep as much of the SQL Server scheduling process as
possible in user mode. This means that UMS necessarily tries to avoid context
switching because that involves the kernel. As I mentioned in Chapter 3, context
switches can be expensive and can limit scalability. In pathological situations, a
process can spend more time switching thread contexts than actually working.
User Mode vs. Kernel Mode Scheduling
You may be wondering what the advantage of moving the scheduling management
inside the SQL Server process is. Wouldn't SQL Server just end up duplicating
functionality already provided by Windows? With all the smart people Microsoft must
have working on Windows and its scheduler, how likely is it that the SQL Server
team could come up with something that much more scalable?
I'll address this in detail below, but the short answer is that SQL Server knows its
own scheduling needs better than Windows or any other code base outside the
product could hope to. UMS doesn't duplicate the complete functionality of the
Windows scheduler, anyway; it implements only the basic features related to task
scheduling, timers, and asynchronous I/O and, in fact, relies on Windows' own thread
scheduling and synchronization primitives. Several Windows scheduling concepts
(e.g., thread priority) have no direct counterparts in UMS.
Preemptive vs. Cooperative Tasking
An important difference (in fact, probably the most important difference) between
Windows' scheduler and SQL Server's UMS is that Windows' scheduler is a
preemptive scheduler, while UMS implements a cooperative model. What does this
mean? It means that Windows prevents a single thread from monopolizing a
processor. As we discussed in Chapter 3, each thread gets a specified time slice in
which to run, after which Windows automatically schedules it off the processor and
allows another thread to run if one is ready to do so. UMS, by contrast, relies on
threads to voluntarily yield. If a SQL Server worker thread does not voluntarily yield,
it will likely prevent other threads from running.
You may be wondering why UMS would take this approach. If you're an old-timer like
me, you might recall that Windows 3.x worked exactly the same way: it made use of
a cooperative scheduler, and it wasn't difficult for a misbehaving app to take over
the system. This was, in fact, why Windows NT was designed from the ground up to
make use of a preemptive scheduler. As long as a single app could bring down the
system, you could never have anything even approaching a robust operating
system.
UMS takes the approach it does in order to keep from involving the Windows kernel
any more than absolutely necessary. In a system where worker threads can be
counted on to yield when they should, a cooperative scheduler can actually be more
efficient than a preemptive one because the scheduling process can be tailored to
the specific needs of the application. As I said earlier, UMS knows SQL Server's
scheduling needs better than the operating system can be expected to.
How UMS Takes Over Scheduling
If UMS is to handle SQL Server's scheduling needs rather than allowing Windows to
do so, UMS must somehow prevent the OS from doing what it does with every other
process: schedule threads on and off the system's processor(s) as it sees fit. How do
you do that in a preemptive OS? UMS pulls this off through some clever tricks with
Windows event objects. Each thread under UMS has an associated event object. For
purposes of scheduling, Windows ignores threads it does not consider
viable, that is, threads that cannot run because they are in an infinite wait state. Knowing
this, UMS puts threads to sleep that it does not want to be scheduled by having
them call WaitForSingleObject on their corresponding event object and passing
INFINITE for the timeout value. As you'll recall from Chapter 3, when a thread calls
WaitForSingleObject to wait on an object and passes INFINITE for the timeout value,
the only way to awaken the thread is for the object to be signaled. When UMS wants
a given thread to run, it signals the thread's corresponding event object. This allows
the thread to come out of its wait state and permits Windows to schedule it to run on
a processor.
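Here's a stripped-down sketch of that event trick (not UMS itself): a worker sleeps on its own auto-reset event and runs only when something else signals it. The Worker structure and WorkerProc names are invented for the example.

#include <windows.h>

// A minimal sketch of the technique: each worker sleeps on its own event,
// and whoever wants it to run signals that event.
struct Worker
{
    HANDLE hEvent;    // auto-reset event owned by this worker
    HANDLE hThread;
};

DWORD WINAPI WorkerProc(LPVOID param)
{
    Worker *me = (Worker *)param;
    for (int i = 0; i < 3; i++)
    {
        // The worker is invisible to the Windows scheduler while it waits here.
        WaitForSingleObject(me->hEvent, INFINITE);
        // ... run until it's time to yield, then signal the next runnable
        // worker's event (omitted) and loop back to wait on our own ...
    }
    return 0;
}

int main()
{
    Worker w;
    w.hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);  // auto-reset, unsignaled
    w.hThread = CreateThread(NULL, 0, WorkerProc, &w, 0, NULL);

    for (int i = 0; i < 3; i++)
    {
        Sleep(100);           // pretend we're deciding whom to run next
        SetEvent(w.hEvent);   // "schedule" the worker: it wakes and runs
    }

    WaitForSingleObject(w.hThread, INFINITE);
    CloseHandle(w.hThread);
    CloseHandle(w.hEvent);
    return 0;
}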
Because workers are divided evenly among the UMS schedulers on the server, the
more UMS schedulers you have, the fewer ill-behaved connections it takes to cause
concurrency issues and other problems on the server.
For example, on an eight-processor machine with the default max worker threads setting of 255, each UMS scheduler can host approximately 32 workers. If a spid associated with a particular scheduler holds
locks on resources such that it causes a blocking chain that's 32 processes deep
(certainly not unheard of) and these spids happen to also be associated with the
same scheduler, the scheduler can become unresponsive and unable to process new
work requests. The processing of new work requests on the scheduler would
effectively stop until the blocking issue was resolved.
The UMS Scheduler Lists
Each UMS scheduler maintains five lists that support the work of scheduling threads:
a worker list, a runnable list, a waiter list, an I/O list, and a timer list. Each of these
plays a different role, and nodes are frequently moved between lists.
The worker list is the list of available UMS workers. A UMS worker is an abstraction of
the thread/fiber concept and allows either to be used without the rest of the code
being aware of which is actually being used under the covers. As I said, one of the
design goals of UMS was to provide support for fibers in such a way as not to require
the core engine code to be concerned with whether the system was using fibers or
threads. A UMS worker encapsulates a thread or fiber that will carry out tasks within
the server and abstracts it such that the server does not have to be concerned (for
the most part) with whether it is in thread mode or fiber mode. Throughout this
chapter, I'll refer to UMS workers instead of threads or fibers.
If your SQL Server is in thread mode (the default), a UMS worker encapsulates a
Windows thread object. If your server is in fiber mode, a UMS worker encapsulates a
Windows fiber, the handling of which is actually implemented outside of the
Windows kernel, as I mentioned in Chapter 3.
When a client connects to SQL Server, it is assigned to a specific UMS scheduler. The
selection heuristics are very basic: whichever scheduler has the fewest
associated connections gets the new connection. Once a connection is associated
with a scheduler, it never leaves that scheduler.
Even if its associated scheduler is busy and there are idle schedulers on the system, UMS will not move a spid between schedulers. This
means that it's possible to design scenarios where SQL Server's support for
symmetric multiprocessing is effectively thwarted because an application opens
multiple persistent connections that do not perform a similar amount of work.
Say, for example, that you have a two-processor machine, and a SQL Server client
application opens four persistent connections into the server, with two of those
connections performing 90% of the work of the application. If those two connections
end up on the same scheduler, you may see one CPU consistently pegged while the
other remains relatively idle. The solution in this situation is to balance the load
evenly across the connections and not to keep persistent connections when the
workload is unbalanced. Disconnecting and reconnecting is the only way to move a
spid from one scheduler to another. (This movement isn't guaranteed; a spid that
disconnects and reconnects may end up on the same scheduler depending on the
number of users on the other schedulers.)
Once a spid is assigned to a scheduler, what happens next depends on the status of
the worker list and whether SQL Server's max worker threads configuration value
has been reached. If a worker is available in the worker list, it picks up the
connection request and processes it. If no worker is available and the max worker
threads threshold has not been reached, a new worker is created, and it processes
the request. If no workers are available and max worker threads has been reached,
the connection request is placed on the waiter list and will be processed in FIFO
order as workers become available.
Client connections are treated within UMS as logical (rather than physical) users. It is
normal and desirable for there to be a high ratio of logical users to UMS workers. As
I've mentioned before, this is what allows a SQL Server with a max worker threads
setting of 255 to service hundreds or even thousands of users.
Work Requests
UMS processes work requests atomically. This means that a worker processes an
entire work request (a T-SQL batch execution, for example) before it is considered
idle. It also means that there's no concept of context switching during the execution
of a work request within UMS. While executing a given T-SQL batch, for example, a
worker will not be switched away to process a different batch. A worker will begin processing another work request only when it has completed its current one. It may yield and execute, for example, I/O completion routines originally queued by another worker, but it is not considered idle until it has processed its complete work request. Once that happens, the worker
either activates another worker and returns itself to the worker list or enters an idle
loop code line if there are no other runnable workers and no remaining work
requests, as we'll discuss in just a moment.
This atomicity is the reason it's possible to drive up the worker thread count within
SQL Server by simply executing a number of simultaneous WAITFOR queries as we
did using the STRESS.CMD tool in Chapter 3. While each WAITFOR query runs, the
worker that is servicing it is considered busy by SQL Server, so any new requests
that come into the server require a different worker. If enough of these types of
queries are initiated, max worker threads can be quickly reached, and, once that
happens, no new connections will be accepted until a worker is freed up.
When the server is in thread mode and a worker has been idle for 15 minutes, SQL
Server destroys it, provided doing so will not reduce the number of workers below a
predefined threshold. This frees the virtual memory associated with an idle worker's
thread stack (.5MB) and allows that virtual memory space to be used elsewhere in
the server.
The runnable list is the list of UMS workers ready to execute an existing work
request. Each worker on this list remains in an infinite wait state until its event
object is signaled. Being on the runnable list does not by itself make the worker schedulable by Windows; the worker becomes schedulable only when its event object is signaled according to the algorithms within UMS.
Given that UMS implements a cooperative scheduler, you may be wondering who is
actually responsible for signaling the event of a worker on the runnable list so that it
can run. The answer is that it can be any UMS worker. There are calls throughout the
SQL Server code base to yield control to UMS so that a given operation does not
monopolize its host scheduler. UMS provides multiple types of yield functions that
workers can call. As I've mentioned, in a cooperative tasking environment, threads
must voluntarily yield to one another in order for the system to run smoothly. SQL
Server is designed so that it yields as often as necessary and in the appropriate
places to keep the system humming along.
When a UMS worker yields, either because it has finished the task at hand (e.g.,
processing a T-SQL batch or executing an RPC) or because it has executed code with
an explicit call to one of the UMS yield functions, it is responsible for checking the
scheduler's runnable list for a ready worker and signaling that worker's event so that
it can run. The yield routine itself makes this check. So, in the process of calling one
of the UMS yield functions, a worker actually performs UMS's work for it; there's no
thread set aside within the scheduler for managing it. If there were, that thread
would have to be scheduled by Windows each time something needed to happen in
the scheduler. We'd likely be no better off than we were with Windows handling all of
the scheduling. In fact, we might even be worse off due to contention for the
scheduler thread and because of the additional overhead of the UMS code. By
allowing any worker to handle the maintenance of the scheduler, we allow the
thread already running on the processor to continue running as long as there is work
for it to do, a fundamental design requirement for a scheduling mechanism
intended to minimize context switches. As I said in Chapter 5 in the discussion of the
I/O completion port-based scheduler that we built, a scheduler intended to
minimize context switching must decouple the work queue from the workers that
carry it out. In an ideal situation, any thread can process any work request. This
allows a thread that is already scheduled by the operating system to remain
scheduled and continue running as long as there is work for it to do. It eliminates the
wastefulness of scheduling another thread to do work the thread that's already
running could do.
The waiter list maintains a list of workers waiting on a resource. When a UMS worker
requests a resource owned by another worker, it puts itself on the waiter list for the
resource and for its scheduler and enters an infinite wait state for its associated
event object. When the worker that owns the resource is ready to release it, it is
responsible for scanning the list of workers waiting on the resource and moving
them to the runnable list as appropriate. And when it hits a yield point, it is
responsible for setting the event of the first worker on the runnable list so that the
worker can run. This means that when a worker frees up a resource, it may well
undertake the entirety of the task of moving those workers that were waiting on the
resource from the waiter list to the runnable list and signaling one of them to run.
The I/O list maintains a list of outstanding asynchronous I/O requests. These
requests are encapsulated in UMS I/O request objects. When SQL Server initiates a
UMS I/O request, UMS goes down one of two code paths, depending on which
version of Windows it's running on. If running on Windows 9x or Windows ME, it
initiates a synchronous I/O operation (Windows 9x and ME do not support
asynchronous file I/O). If running on the Windows NT family, it initiates an
asynchronous I/O operation.
You'll recall from our discussion of Windows I/O earlier in the book that when a
thread wants to perform an I/O operation asynchronously, it supplies an
OVERLAPPED structure to the ReadFile/ReadFileEx or WriteFile/WriteFileEx function
calls. Initially, Windows sets the Internal member of this structure to
STATUS_PENDING to indicate that the operation is in progress. As long as the operation is still in progress, HasOverlappedIoCompleted will return FALSE. (HasOverlappedIoCompleted is actually a Win32 macro that simply checks OVERLAPPED.Internal to see whether it is still set to STATUS_PENDING.)
In order to initiate an asynchronous I/O request via UMS, SQL Server instantiates a
UMS I/O request object and passes it into a method that's semantically similar to
ReadFile/ReadFileScatter or WriteFile/WriteFileGather, depending on whether it's
doing a read or a write and depending on whether it's doing scatter-gather I/O. A
UMS I/O request is a structure that encapsulates an asynchronous I/O request and
contains, as one of its members, an OVERLAPPED structure. The UMS asynchronous
I/O method called by the server passes this OVERLAPPED structure into the
appropriate Win32 asynchronous I/O function (e.g., ReadFile) for use with the
asynchronous operation. The UMS I/O request structure is then put on the I/O list for
the host scheduler.
Once an I/O request is added to the I/O list, it is the job of any worker that yields to check this list to see whether any asynchronous I/O operations have completed. To do
this, it simply walks the I/O list and calls HasOverlappedIoCompleted for each one,
passing the I/O request's OVERLAPPED member into the macro. When it finds a
request that has completed, it removes it from the I/O list, then calls its I/O
completion routine. This I/O completion routine was specified when the UMS I/O
request was originally created.
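The compilable fragment below sketches what such a sweep might look like; the IoRequest structure and SweepIoList function are invented stand-ins for UMS's internal I/O request objects and are not SQL Server's actual code.

#include <windows.h>
#include <list>

// Illustrative stand-in for a UMS I/O request: an OVERLAPPED plus a
// completion routine to call when the request finishes.
struct IoRequest
{
    HANDLE hFile;
    OVERLAPPED ovl;
    void (*completionRoutine)(IoRequest *req, DWORD bytes);
};

// What a yielding worker might do: sweep the outstanding-I/O list and run
// the completion routine of anything that has finished.
void SweepIoList(std::list<IoRequest *> &ioList)
{
    std::list<IoRequest *>::iterator it = ioList.begin();
    while (it != ioList.end())
    {
        IoRequest *req = *it;
        if (HasOverlappedIoCompleted(&req->ovl))   // Internal != STATUS_PENDING
        {
            DWORD bytes = 0;
            GetOverlappedResult(req->hFile, &req->ovl, &bytes, FALSE);
            it = ioList.erase(it);                 // off the I/O list...
            req->completionRoutine(req, bytes);    // ...then run its completion routine
        }
        else
        {
            ++it;
        }
    }
}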
You'll recall from our discussion of asynchronous I/O that when an asynchronous
operation completes, Windows can optionally queue an I/O completion APC to the
original calling thread. As I said earlier, one of the design goals of UMS was to
provide much of the same scheduling and asynchronous I/O functionality found in
the OS kernel without requiring a switch into kernel mode. UMS's support for I/O
completion routines is another example of this design philosophy. A big difference
between the way Windows executes I/O completion routines and the way UMS does
is that, in UMS, the I/O completion routine executes within the context of whatever
worker is attempting to yield (and, therefore, checking the I/O list for completed I/O
operations) rather than always in the context of the thread that originally initiated
the asynchronous operation. The benefit of this is that a context switch is not
required to execute the I/O completion routine. The worker that is already running
and about to yield takes care of calling it before it goes to sleep. Because of this, no
interaction with the Windows kernel is necessary.
The timer list maintains a list of UMS timer requests. A timer request encapsulates a
timed work request. For example, if a worker needs to wait on a resource for a
specific amount of time before timing out, it is added to the timer list. When a
worker yields, it checks for expired timers on the timer list after checking for
completed I/O requests. If it finds an expired timer request, it removes it from the
timer list and moves its associated worker to the runnable list. If the runnable list is
empty when it does this (that is, if no other workers are ready to run), it also signals
the worker's associated event so that it can be scheduled by Windows to run.
If, after checking for completed I/O requests and expired timers, a worker finds that
the runnable list is empty, it enters a type of idle loop. It scans the timer list for the
next timer expiration, then enters a WaitForSingleObject call on an event object
that's associated with the scheduler itself using a timeout value equal to the next
timer expiration. You'll recall from our discussion of asynchronous I/O that the Win32
OVERLAPPED structure contains an event member that can store a reference to a
Windows event object. When a UMS scheduler is created, an event object is created
and associated with the scheduler itself. When an asynchronous I/O request is
initiated by the scheduler, this event is stored in the hEvent member of the I/O
request object's OVERLAPPED structure. This causes the completion of the
asynchronous I/O to signal the scheduler's event object. By waiting on this event
with a timeout set to the next timer expiration, a worker is waiting for either an I/O
request to complete or a timer to expire, whichever comes first. Since it's doing this
via a call to WaitForSingleObject, there's no polling involved and it doesn't use any
CPU resources until one of these two things occurs.
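The following minimal sketch shows the shape of that wait, with the scheduler event and the next timer expiration supplied as assumed inputs; it is illustrative only.

#include <windows.h>

// Illustrative idle loop: wait on the scheduler's event (signaled when an
// asynchronous I/O completes) but never longer than the next timer expiry.
// schedulerEvent and nextTimerDueMs are assumptions for the sketch.
void IdleLoop(HANDLE schedulerEvent, DWORD nextTimerDueMs)
{
    DWORD reason = WaitForSingleObject(schedulerEvent, nextTimerDueMs);
    if (reason == WAIT_OBJECT_0)
    {
        // An async I/O completed: sweep the I/O list and wake a worker.
    }
    else if (reason == WAIT_TIMEOUT)
    {
        // A timer expired: move its worker to the runnable list and wake it.
    }
}

int main()
{
    HANDLE schedEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
    IdleLoop(schedEvent, 250);   // pretend the next timer expires in 250ms
    CloseHandle(schedEvent);
    return 0;
}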
Going Preemptive
Certain operations within SQL Server require that a worker "go preemptive," that is,
that it be taken off of the scheduler. An example is a call to an extended procedure.
As I've said, because UMS is a cooperative multitasking environment, it relies on
workers to yield at regular points within the code in order to keep the server running
smoothly. Obviously, it has no idea of whether an xproc can or will yield at any sort
of regular interval, and, in fact, there's no documented ODS API function that an
xproc could call to do so. So, the scheduler assumes that an xproc requires its own thread on which to run. Therefore, before a worker executes an xproc, it removes the next runnable worker from the runnable list and signals that worker's event so that the scheduler will continue to have a worker to process work requests. Meanwhile, the
original worker executes the xproc and is basically ignored by the scheduler until it
returns. Once the xproc returns, the worker continues processing its work request
(e.g., the remainder of the T-SQL batch in which the xproc was called), then returns
itself to the worker list once it becomes idle, as I mentioned earlier.
The salient point here is that because certain operations within the server require
their own workers, it's possible for there to momentarily be multiple threads active
for a single scheduler (and, by extension, for a single CPU within the machine since
these logical schedulers will often find themselves on their own CPUs). This means
that Windows will schedule these threads preemptively as it usually does, and you
will likely see context switches between them. It also means that since executing an
xproc effectively commandeers a UMS worker, executing a high number of xprocs
can have a very negative effect on scalability and concurrency. Each xproc executed
reduces UMS's ability to service a high number of logical users with a relatively low
number of workers.
Besides xprocs, there are several other activities that can cause a worker to need to
go preemptive. Examples include sp_OA calls, linked server queries, distributed
queries, server-to-server RPCs, T-SQL debugging, and a handful of others. Obviously,
you want to avoid these when you can if scalability and efficient resource use are primary concerns.
Fiber Mode
When the server is in fiber mode, things work a little differently. As I mentioned in
Chapter 3, a fiber is a user mode concept; the kernel knows nothing of it. Since a thread is actually Windows' only code execution mechanism, code that is run via a fiber still has to be executed by a thread at some point. The way this is handled is that Windows' fiber management APIs associate a group of fibers with a single thread object. When one of the fibers runs a piece of code, the code is actually executed via its host thread. Afterward, user code is responsible for switching to another fiber so that it can run, a concept not unlike the cooperative tasking offered by UMS.
Given that when SQL Server is in fiber mode, multiple workers could be sharing a
single Windows thread, the process that's followed when taking a worker thread
preemptive won't work when we need to switch to preemptive mode with a worker
fiber. Because the execution mechanism within Windows is still a thread, the thread
that hosted the fiber would have to be taken off of the scheduler, and this would, in
turn, take all the other fiber workers hosted by the same thread off of the scheduler
as well�not a desirable situation.
The upshot of this is that executing things like xprocs and linked server queries can
be extremely inefficient in fiber mode. In fact, there are a number of components
within the server that aren't even supported in fiber mode
(sp_xml_preparedocument and ODSOLE, for example). If you need to run lots of
xprocs, linked server queries, distributed transactions, and the like, fiber mode may
not be your best option.
Hidden Schedulers
The server creates hidden schedulers for other uses as well. Other processes within
the server require the same type of latch, resource management, and scheduling
services that UMS provides for work request scheduling, so the server creates
hidden schedulers that allow those processes to make use of this functionality
without having to implement it themselves. An example of such a facility is SQL
Server's backup/restore facility. Many backup devices do not support asynchronous I/O, and performing a large amount of synchronous I/O on a regular UMS scheduler would hurt the concurrency of the entire scheduler because a single blocking synchronous I/O call could monopolize a worker (not unlike calling external code). For that reason, SQL Server puts backup/restore operations on their own hidden scheduler. This allows them to contend with one another for processor time and permits Windows to preemptively schedule them alongside the other schedulers.
DBCC SQLPERF(umsstats)
I mentioned DBCC SQLPERF(umsstats) earlier, and you may already be aware of it
given that, although it's undocumented, it's mentioned in the public Microsoft
Knowledge Base. DBCC SQLPERF(umsstats) allows you to return a result set listing
statistics for the visible UMS schedulers on the system. It can list the total number of
users and workers for the scheduler, the number of workers on the runnable list, the
number of idle workers, the number of outstanding work requests, and so on. It's
very handy when you suspect you're experiencing some type of issue with a
scheduler and need to know what's going on behind the scenes. For example, you
should be able to quickly tell from this output whether a scheduler has reached its
maximum number of workers and whether they're currently busy. Table 10.1 details
the result set returned by DBCC SQLPERF(umsstats).
Table 10.1. DBCC SQLPERF(umsstats) Statistics
Statistic              Meaning
num runnable           The number of workers on the scheduler's runnable list (workers ready to run)
num workers            The number of UMS workers associated with the scheduler
idle workers           The number of idle workers associated with the scheduler
cntxt switches         The number of context switches between workers on the scheduler
cntxt switches(idle)   The number of switches into the scheduler's idle loop
Total Work             The total number of work items processed by all schedulers
Recap
In order to increase scalability and support Windows fibers, SQL Server has managed
its own scheduling since version 7.0 via UMS. UMS serves as a thin layer between
the server and the operating system that provides much of the same functionality
offered by the Win32 thread and scheduling primitives, but it does so without
requiring as many transitions into kernel mode or as many context switches.
A key difference between UMS and Windows' scheduler is that UMS is a cooperative
scheduler. It relies on workers to voluntarily yield often enough to keep the system
running smoothly. By putting control of when a thread is scheduled under the
direction of the server, a much greater responsibility is placed on SQL Server
developers to write code that runs efficiently and yields often enough and in the
appropriate places. However, this also provides a much finer granularity of control
and allows the server to scale better than it could ever hope to using Windows' one-
size-fits-all scheduling approach because SQL Server knows its own scheduling
needs best.
Knowledge Measure
1. What mechanism within UMS encapsulates the scheduling facility for a single
logical processor?
2. What list within UMS tracks the workers that are ready to run?
3. What kernel object does UMS use to put workers to sleep that it does not want
to run at a particular point in time?
4. True or false: UMS provides a special thread for each processor that handles
scheduling workers to run, processing expired timers, and so on.
6. What list within UMS is responsible for tracking outstanding I/O requests?
7. What part of UMS ultimately owns the event object that gets signaled when an
asynchronous I/O request completes?
8. Describe what happens within UMS when SQL Server is in fiber mode and an
xproc is executed.
9. Assuming the server is in thread mode, name a facility within the server that
makes use of a hidden instance of the UMS scheduling facility.
10. True or false: All UMS I/O under Windows 9x is processed synchronously by SQL
Server.
11. When a worker yields, whose responsibility is it to check the timer list to see
whether any timers have expired?
13. True or false: An original UMS design goal was to provide support for using
Windows fibers within SQL Server.
14. True or false: A key difference between UMS scheduling and Windows
scheduling is that Windows provides a cooperative scheduling facility, while
UMS provides a preemptive scheduling facility.
16. Explain what is meant by the statement that a thread in an infinite wait state is
not considered "viable" by the Windows scheduler.
18. When a worker is on the list UMS uses to track workers that are ready to run,
what must occur in order for the worker to actually be schedulable by
Windows?
19. True or false: A linked server query will cause a UMS worker to have to switch
into preemptive mode.
20. True or false: Because SQL Server's max worker threads setting specifies the
maximum number of workers for each instance of the UMS scheduling facility,
a machine with two processors can have a total maximum of 510 worker
threads by default.
Chapter 11. SQL Server Memory
Management
Accustom a people to believe that priests, or any other class of men, can
forgive sins, and you will have sins in abundance.
-Thomas Paine[1]
[1] Paine, Thomas. "Worship and Church Bells." In The Complete Writings of Thomas Paine, Vol. II, ed. Philip S. Foner. New York:
Citadel Press, 1945, p. 726.
In this chapter, we'll explore SQL Server's memory management architecture. The
way that an application manages critical resources such as memory tells us a lot
about how it is designed. It tells us what priority the application designers placed on
efficient resource utilization and on maximizing the performance of the application.
As you will see in the discussion that follows, efficient memory management and
maximum system performance were both of paramount importance to the designers
of SQL Server. A considerable portion of the complex code within the product is
dedicated to managing memory efficiently and effectively. There are always trade-offs with memory management. Use too little memory and your app is frugal but slow. Use too much and your app may be fast, but it may not play well with others and may become a resource hog in general. As you'll see, SQL Server
attempts to strike a balance between getting the most out of the resources available
to it and running well alongside other applications on the system.
Memory Regions
SQL Server organizes the memory it allocates into two distinct regions: the BPool
(buffer pool) and MemToLeave (memory to leave) regions. If you make use of AWE
memory, there's actually a third region: the physical memory above 3GB made
available by Windows' AWE support. (Refer to Chapter 4 for details on AWE.) The
BPool is the preeminent region of the three. It is SQL Server's primary allocation
pool. MemToLeave consists of the virtual memory space within the user mode
address space that is not used by the BPool. The AWE memory above 3GB functions
as an extension of the BPool and provides additional space for caching data and
index pages.
Sizing
When the server starts, it begins by computing the upper limit of the BPool. This
upper limit is the maximum size to which the server will allow the BPool to grow. On
a non-AWE system, this size will be set equal to the amount of physical memory in
the machine or to the size of the user mode address space minus the size of the
MemToLeave region, whichever is less. (If the sp_configure max server memory
setting has been changed from the default and is less than or equal to the amount of
physical memory in the machine, it will override this computation.) So, if the system
has 1GB of physical memory installed, the BPool will be sized to 1GB, provided max
server memory has not been adjusted.
On an AWE system, the BPool upper limit will be set to either the size of the total
physical memory in the machine or to the max server memory setting, whichever is
less. When AWE is used, the BPool isn't constrained by the size of the user mode
address space or the size of the MemToLeave region.
By default, the MemToLeave region is sized at 384MB. Of this, 128MB is reserved for worker thread stacks (the default max worker threads setting of 255 multiplied by the .5MB thread stack size), and 256MB is reserved for allocations outside of the BPool. Examples of the types of
memory allocations that come from MemToLeave include OLE DB provider
allocations, in-process COM object allocations, and any memory allocation by the
server code itself that is larger than 8KB. This last item is important because it
means that large procedure or execution plans can be allocated from the
MemToLeave region.
Although all allocations for a contiguous memory block larger than 8KB come from
the MemToLeave region, the reverse is not always true. Because the server will
attempt to use the uncommitted portion of a reserved block for other allocation
requests, it's possible that allocations smaller than 8KB could end up coming from
the MemToLeave region, depending on the version and service pack of SQL Server
you're using. In order to make an allocation request, a memory consumer within the
server first allocates a memory allocation object, which it then uses to request the
allocation (this object implements the COM IMalloc interface, as we'll discuss later in
the chapter). If multiple requests are made via a given allocation object, it's possible
that some of them will be fulfilled using MemToLeave memory, even if they request
less than 8KB. For example, if a consumer within the server requests 10KB of
contiguous storage, the allocation object will allocate this from the MemToLeave
region. If necessary, it will first reserve a region sized to match the system's
allocation granularity (64KB on 32-bit Windows), then commit two 8KB pages to
satisfy the request. If the memory consumer then uses the same allocation object to
request, say, 4KB of additional space, the system will see the unallocated 6KB of
space at the end of the two-page region it just allocated and will fulfill the new
request from this space. Thus, it's possible for an allocation that's less than 8KB in
size to be satisfied outside of the BPool.
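The worked example above can be mimicked with a few VirtualAlloc calls. The sketch below, which is illustrative rather than SQL Server's allocator, reserves one allocation-granularity region, commits two 8KB pages for a 10KB request, and then reuses the leftover committed space for a smaller request.

#include <windows.h>
#include <string.h>
#include <stdio.h>

int main()
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    printf("Allocation granularity: %lu bytes\n",
           si.dwAllocationGranularity);   // 64KB on 32-bit Windows

    // Reserve one allocation-granularity-sized region (no physical storage yet).
    BYTE *region = (BYTE *)VirtualAlloc(NULL, si.dwAllocationGranularity,
                                        MEM_RESERVE, PAGE_READWRITE);

    // Commit two 8KB pages at the front to satisfy a 10KB request...
    VirtualAlloc(region, 2 * 8192, MEM_COMMIT, PAGE_READWRITE);

    // ...a later 4KB request can be carved from the 6KB left over in those
    // committed pages without touching the BPool at all.
    BYTE *leftover = region + 10 * 1024;
    memset(leftover, 0, 4 * 1024);

    VirtualFree(region, 0, MEM_RELEASE);
    return 0;
}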
Because OLE DB providers, COM objects, and other external consumers that may
reside within the SQL Server process will know nothing of SQL Server's BPool or its
memory management facilities, it's essential that the server leave some amount of
virtual memory free within the user mode space. That's why MemToLeave exists; it
is basically unused memory within the SQL Server process space. If an in-process
COM object or other external consumer calls VirtualAlloc or HeapAlloc itself, it will
require virtual memory address space in order to satisfy the allocation request. If the
BPool were to take up all of this user mode address space itself, allocation requests
of this type would always fail.
The size of the MemToLeave region can be adjusted using the -g command line
parameter. The total memory required for the worker thread stacks cannot be
changed without changing the max worker threads sp_configure value or modifying
the default thread stack size by hacking the SQL Server executable, which you
should definitely not do. However, you can increase or decrease the 256MB set aside
for allocations outside of the BPool by passing a different value for the -g parameter.
This parameter can be handy in situations where you have a lot of linked server
queries, in-process COM objects, or other memory consumers contending for space
in the MemToLeave region. By making it larger, you give them a bigger sandbox in
which to play. Conversely, shrinking it can provide more virtual memory space for
the BPool and may improve performance in some situations.
Even though the BPool can be sized based on the amount of physical memory in the
machine, it is still based entirely in virtual memory, by default. The exception to this
is when you make use of AWE memory. In that situation, part of the BPool is in virtual
memory (but backed by pages locked in physical memory), and the rest is in
physical memory above 3GB. Since the maximum size of the user mode virtual address space is 3GB (even with /3GB or /USERVA enabled), AWE is the only means by which a user mode process can access memory above the 3GB boundary. It just so happens that AWE
memory is physical rather than virtual memory and must be mapped into the user
mode space in order to be accessed.
The BPool
The vast majority of SQL Server's memory allocations come from the BPool. The
BPool consists of up to 32 separate memory regions organized into 8KB pages. You'll
recall from Chapter 4 our discussion of VirtualAlloc and the way it can be used to
reserve contiguous blocks of memory. After the server has computed the maximum
size of the BPool, it reserves the MemToLeave region in an attempt to ensure that
this region will be a contiguous address range. If possible, it will reserve this using a
single call to VirtualAlloc. If that's not possible (highly unlikely), the server will make
multiple calls to VirtualAlloc to reserve the MemToLeave region. The server then
attempts to reserve the BPool region from the user mode address space. (Put aside
how AWE affects this for now; we'll return to it in just a moment.) With DLLs,
memory-mapped files, and other allocations already in the user mode space, it's
very unlikely that the entire BPool will be able to be reserved with a single call to
VirtualAlloc. Instead, the BPool will likely have to consist of several fragments spread
across the user mode space. The server will call VirtualAlloc repeatedly, each time
with a smaller reservation request size until it succeeds. It will do this up to 32 times
in order to reserve as much of the BPool's maximum size as possible. Once this
completes, the server releases the MemToLeave region so that it will be available to
external consumers. Because it was reserved before the BPool reservations were
made and then released afterward, the MemToLeave region normally starts out as a
single, contiguous block of free virtual address space.
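Here's a minimal sketch of that reserve-in-fragments approach; the 1.5GB target, the halving strategy, and the 32-fragment cap are illustrative assumptions, not the server's actual sizing logic.

#include <windows.h>
#include <stdio.h>

#define MAX_FRAGMENTS 32

int main()
{
    // Try to reserve (not commit) roughly 1.5GB of address space in up to
    // 32 fragments, shrinking the request size whenever a reservation fails.
    SIZE_T target = (SIZE_T)1536 * 1024 * 1024;
    SIZE_T chunk  = target;
    SIZE_T reserved = 0;
    void *fragments[MAX_FRAGMENTS];
    int count = 0;

    while (reserved < target && count < MAX_FRAGMENTS && chunk >= 64 * 1024)
    {
        SIZE_T want = (target - reserved < chunk) ? target - reserved : chunk;
        void *p = VirtualAlloc(NULL, want, MEM_RESERVE, PAGE_READWRITE);
        if (p != NULL)
        {
            fragments[count++] = p;
            reserved += want;
        }
        else
        {
            chunk /= 2;   // the address space is fragmented; ask for less
        }
    }

    printf("Reserved %lu MB in %d fragment(s)\n",
           (unsigned long)(reserved / (1024 * 1024)), count);

    for (int i = 0; i < count; i++)
        VirtualFree(fragments[i], 0, MEM_RELEASE);
    return 0;
}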
When AWE is involved, the algorithm is slightly different. The repetitive calls to
VirtualAlloc for the user mode portion of the BPool still occur. However, Windows
does not support reserving and committing AWE memory using separate operations.
As we discussed in Chapter 4, when using AWE, either the memory is allocated or it
isn't. AllocateUserPhysicalPages does not support the concept of reserving, but not
allocating, physical memory; memory is reserved and committed in one fell swoop.
Once it is, it must be mapped into a window in the user mode space so that a 32-bit
pointer can access it.
So, in this scenario, all of the memory set aside for the BPool is locked into physical
memory. This applies to both the user mode portion as well as the AWE portion.
Because physical memory is being locked, this can cause serious performance
problems for other applications running on the same machine, including other
instances of SQL Server. You typically set up a SQL Server this way when it is the
only significant app running on a machine.
When AWE support is enabled in SQL Server, the user account under which the
server is running must have the lock pages in memory privilege. SQL Server's setup
program will automatically grant this privilege to the startup account you choose for
the server. If you start the server from the command line or change the startup
account, you have to take care of this yourself.
Hashing
In order to make locating particular pages faster, SQL Server hashes pages within
the BPool. This basically amounts to creating a hash table over the pages in the
BPool such that, given the database ID, file number, and page number of a data
page, the server can quickly determine whether the BPool contains the data page
and where it is located if it does.
When the server needs to access a particular data page, it hashes the page's
database ID, file number, and page number such that the page maps to a particular
bucket within the hash table. That bucket consists of a linked list of pointers to BPool pages. The server scans this list to see whether the page in question is present. If it is, the page can be quickly accessed in memory. If it is not, the page must first be loaded from disk.
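A toy version of such a hash table looks like the sketch below; the bucket count, the hash function, and the HashedPage structure are all invented for illustration, since the server's actual hashing scheme is internal.

#include <windows.h>
#include <stdio.h>

#define BUCKETS 1024   // toy value; the real table is sized to the BPool

// Minimal stand-in for a buffer header that hangs off a hash bucket.
struct HashedPage
{
    DWORD dbId, fileNo, pageNo;
    void *pageInBPool;
    HashedPage *next;     // bucket chains are linked lists
};

HashedPage *table[BUCKETS];

// Toy hash over the page identity; the real function is undocumented.
DWORD Bucket(DWORD dbId, DWORD fileNo, DWORD pageNo)
{
    return (dbId * 31 + fileNo * 17 + pageNo) % BUCKETS;
}

HashedPage *FindPage(DWORD dbId, DWORD fileNo, DWORD pageNo)
{
    // Walk the bucket's chain; a hit means the page is already in memory.
    for (HashedPage *p = table[Bucket(dbId, fileNo, pageNo)]; p; p = p->next)
        if (p->dbId == dbId && p->fileNo == fileNo && p->pageNo == pageNo)
            return p;
    return NULL;          // miss: the page must be read from disk
}

int main()
{
    HashedPage pg = {5, 1, 42, NULL, NULL};
    table[Bucket(5, 1, 42)] = &pg;
    printf("Page 5:1:42 %s\n", FindPage(5, 1, 42) ? "cached" : "not cached");
    return 0;
}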
Primitive Allocations
Before any pages can be allocated from the BPool, SQL Server must allocate the
support structures required to manage it. The first of these that we'll talk about is a
global variable to hold a reference to an instance of the class that defines the BPool.
Because this variable has global scope, you can see it yourself using WinDbg and
the public symbols that ship with SQL Server. Exercise 11.1 takes you through
locating both the global variable and its host data type.
a. Due to its very nature and ubiquity within the server, the reference to the
BPool is likely stored in a global variable or similar construct.
.reload -f sqlservr.exe
x sqlservr!*
This will list all of the public symbols included in sqlservr.pdb, the program
database (symbol) file for SQL Server.
5. Scroll to the top of the command window and click above the start of the
output from the x command. Press Ctrl+F, type BPool, and press Enter.
6. You should find the first occurrence of a reference to the BPool class. This is
actually a reference to one of its methods. We can deduce from this that SQL
Server has a class named BPool. It's a fair guess that this is the data type of
the object that stores the SQL Server buffer pool, but we'll establish that with a
good degree of certainty in just a moment.
7. Now let's look for the global variable that we suspect stores the reference to
the buffer pool. Scroll to the top of the output and repeat your search, this time
specifying "bPool" for the search string (no quotes). Be sure to specify a case-
sensitive search in the dialog. Because C++ is a case-sensitive language, it's
common for developers to name an instance of a variable after its type using
different case. We'll start by checking for that.
8. If your search was case-sensitive, it should not have found any symbols named
"bPool," so we need to keep looking. Another common tactic is to spell out a
type name but abbreviate the names of instances of it or vice versa. Since we
already know the type name is BPool, let's look for it spelled out as BufferPool.
Repeat your search, this time specifying BufferPool as the search criteria.
9. You should find a symbol named sqlservr!BufferPool. Notice that this isn't
prefixed by a class name and a pair of colons (::) as the BPool reference you
found earlier was. This means that it's a global of some type. Based on the
name alone, we can deduce that it's probably not a global function. Thus, it's
likely a global variable and probably the one that stores the reference to the
SQL Server buffer pool.
10. At this point, you could dump the contents of the BufferPool variable to the
console using the dd command or something similar. The value of the variable
itself isn't terribly useful to us at this point; I just wanted you to see that it was
indeed a global. All of SQL Server's BPool functionality is wrapped up in the
BPool class and in BufferPool, the single, global instance of it.
11. Scroll back to the top of the command window and repeat your search, this
time using "DropCleanBuffers" as your search string. You should find an entry
for BPool::DropCleanBuffers in the symbol list.
12. Readers of my previous books will recall the discussions of the DBCC
DROPCLEANBUFFERS command. This command was once undocumented but is
handy for releasing the clean buffers from the BPool in order to test a query
with a cold cache without cycling SQL Server. Given that the BPool class has a
member named DropCleanBuffers, might this be what the DBCC command
calls? Let's set a breakpoint and find out.
bp sqlservr!BPool::DropCleanBuffers
g
14. Now, switch over to Query Analyzer, connect to your server, and run DBCC
DROPCLEANBUFFERS in the editor. Switch back to WinDbg. You should see that
execution has stopped at your breakpoint. This tells us definitively that DBCC
DROPCLEANBUFFERS is implemented via a method off of the BPool class
named DropCleanBuffers. It also reinforces our assertion that the BPool class
we see in the debugger is the actual data type of SQL Server's global buffer
pool instance.
15. Type q in the WinDbg command window and press Enter to stop debugging.
You'll need to restart your SQL Server.
Page Arrays
BUF Array
SQL Server uses a special BUF structure to manage each page in the BPool. Before
reserving the BPool, SQL Server calls VirtualAlloc to allocate an array of BUF structures from the MemToLeave region, with one element for each page it will reserve for the BPool (including physical AWE pages). Each page in the BPool will
have a corresponding BUF structure, as will each page of AWE memory allocated by
the server, regardless of whether it has been mapped into virtual memory. Since
each BUF structure is 64 bytes in size, this array isn't usually very large unless the
server is using a significant amount of AWE memory.
Each page's BUF structure functions as a type of header for it. It stores information
such as a pointer to the actual page in the BPool, the reference count for the page,
the page's latch, and status bits that indicate whether the page is dirty, has I/O
pending, is pinned in memory, and so on.
When the lazywriter traverses the pool looking for pages to free, this array of BUF
structures is what it actually sweeps. We'll discuss the lazywriter further in just a
moment.
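The actual BUF layout is internal and undocumented, but conceptually it resembles the following sketch, which simply models the fields just described; every field name here is an assumption.

#include <windows.h>

// Conceptual sketch only: the real BUF layout is internal to SQL Server and
// undocumented. This simply models the kinds of fields described in the text.
struct BufSketch
{
    void      *pPage;        // pointer to the 8KB page in the BPool
    LONG       refCount;     // reference count used by the lazywriter sweep
    LONG       latch;        // placeholder for the page latch
    DWORD      dbId;
    DWORD      pageNo;
    DWORD      statusBits;   // dirty, I/O pending, pinned, and so on
};

int main()
{
    // One BUF-style header per BPool page; 64 bytes each in the real server.
    BufSketch *bufArray = (BufSketch *)VirtualAlloc(
        NULL, 1000 * sizeof(BufSketch), MEM_COMMIT, PAGE_READWRITE);
    bufArray[0].statusBits |= 0x1;   // e.g., mark page 0 dirty
    VirtualFree(bufArray, 0, MEM_RELEASE);
    return 0;
}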
Commit Bitmap
On startup, SQL Server allocates a bitmap from the default process heap that it uses
to track committed pages in the BPool. The original reservations tracked by the page
arrays are just that�reservations. As we discussed in Chapter 4, it's possible to
reserve virtual memory address space without committing any physical storage to it.
As each page in the BPool is committed, its corresponding bit in the commit bitmap
is set.
AWE
Although the BPool keeps track of which pages in AWE memory it makes use of, it
cannot access them directly due to the way that Windows' AWE facility works. It
accesses them by mapping physical pages in and out of the user mode address
space, as I mentioned in Chapter 4.
Note that if your server has less than 3GB of physical RAM and you enable AWE use
via the sp_configure awe enabled option, SQL Server will ignore it. You must have
3GB or more of physical memory in order to use Windows' AWE facility with SQL
Server, as we discussed earlier in the book.
The Lazywriter
The purpose of the lazywriter is twofold: (1) to keep a specified number of BPool
buffers free so they can be allocated for use by the server and (2) to monitor and
adjust the committed memory usage by the BPool so that enough physical memory
remains free on the system to prevent Windows from paging (provided dynamic
memory management is enabled so that the lazywriter can adjust the size of the
BPool as necessary). SQL Server estimates the number of BPool buffers to keep free
based on the system load and the number of stalls occurring (the number of times a
memory consumer within the server has to wait on a free buffer page).
The amount of physical memory the lazywriter attempts to keep free usually varies
between 4MB and 10MB. It is partially based on the computed page life expectancy
for the pool (the number of seconds a page will stay in the buffer pool without being
referenced). You can track the Buffer Manager:Page life expectancy Perfmon counter
to see what this value is for a particular instance of SQL Server. As the page life
expectancy increases (i.e., as the instance of SQL Server is less memory-pressured
and, therefore, pages stay in the cache longer even when not being referenced), the
amount of physical memory reserved for the OS climbs nearer to 10MB. As the page
life expectancy decreases, the physical memory set aside for the OS continues to fall
until it reaches approximately 4MB.
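If Perfmon isn't handy, the same counter is also exposed through the sysperfinfo virtual table, so you can check it from Query Analyzer. This is just a sketch; the object_name prefix varies with the instance name (it's SQLServer:Buffer Manager on a default instance):

-- Current page life expectancy, in seconds
SELECT counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE object_name LIKE '%Buffer Manager%'
  AND counter_name LIKE 'Page life expectancy%'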
Keeping a minimum amount of physical memory available for use by the OS helps
ensure that it doesn't page unnecessarily and helps keep it and the other processes
on the SQL Server machine running smoothly. If the system begins to be pressured
for physical memory (even if that pressure is not coming from SQL Server), the
BPool will adjust its committed memory downward so that more physical memory
is available.
The manner in which the BPool determines the amount of available physical memory
varies based on the OS. On Windows 2000, the Win32 API function
GlobalMemoryStatusEx is called. If you're running SQL Server on Windows 2000, you
can see this for yourself by attaching to SQL Server with WinDbg and setting a
breakpoint in kernel32!GlobalMemoryStatusEx. If you list the call stack once your
breakpoint is tripped, you'll see that GlobalMemoryStatusEx is being called by either
BPool::AvailablePagingFile or BPool::AvailablePhysicalMemory. Obviously, the latter
one is the call the BPool is using to ensure that physical memory stays at or above
the required threshold.
As I mentioned earlier, the BUF array contains an element for each page in the
BPool. Each BUF structure functions as a kind of BPool page header and contains a
reference count for its corresponding page. Each time a page is referenced, this
reference count is incremented. Some more expensive pages, such as those that
contain execution plans, begin with a higher reference count than other kinds of
pages. This helps keep them in memory longer and helps the server avoid incurring
the cost of recreating them or reloading them from disk unnecessarily.
Periodically, this BUF array is scanned. The reference count in each BUF structure is
divided by four and the remainder is discarded. When the reference count for a page
reaches zero, the page is checked to see whether it's dirty; if so, a write is scheduled
via UMS to flush the dirty page to disk. (If a page's reference count reaches zero and
the page is not dirty, it is simply freed, i.e., moved to the free list, without writing
anything to disk.) Because SQL Server uses a write-ahead log, this flush to disk of
the dirty page will be blocked while the transaction log is written to. Once a dirty
page has been successfully written to disk, it is unhashed (removed from the hash
table) and added to the free list. A dirty page is not usually decommitted in this
scenario; it is merely moved on to a list of free buffers so that it can be reused.
Avoiding unnecessary decommit/commit operations speeds up SQL Server's memory
operations significantly and is a key design tenet of an effective memory cache.
The size of the free list is calculated internally by SQL Server based on the size of
the BPool. The fact that SQL Server uses separate physical structures for a page and
its header allows a page to be flushed to disk and to "move" from list to list without
anything on the page actually changing. When a dirty page is flushed to disk and
freed, for example, information in the page's header (its BUF structure) changes, but
the page itself does not have to be modified. Once flushed, its contents are
considered disposable and can be overwritten when the page is reused. When a
page needs to move from one list to another (e.g., when it's freed), the much
smaller 64-byte BUF structure is moved from list to list rather than the 8KB page.
The reason the size of the structure matters here has to do with the use of the
processor's local cache. The smaller the structure, the more instances of it can be
stored in the chip's onboard cache, and the more efficient the process of moving
nodes between lists is. Leveraging the local cache on the CPU is another key tenet of
an effective memory cache.
On the Windows NT family, this process of aging pages, flushing them to disk, and
moving them to the free list is usually done by individual UMS workers. The
dedicated lazywriter thread usually finds that it has very little to do. On Windows 9x
and ME, though, the lazywriter thread plays a much bigger role. Because Windows
9x and ME do not support asynchronous file I/O, UMS is forced to perform all file I/O
synchronously (see Chapter 10). This limits the ability of a UMS worker to perform
lazywriter work; therefore, the dedicated lazywriter thread is much more active on
Windows 9x/ME than it is on the Windows NT family.
The lazywriter checks the free buffer count and the available physical memory once
per second or when signaled. You can see this for yourself by attaching to SQL
Server with WinDbg and setting a breakpoint in the BPool::AvailablePhysicalMemory
method that we discovered earlier, as shown below. (Note that due to page width
limitations this code appears printed on two lines, but you must type it all together
as one line.)
bp sqlservr!BPool::AvailablePhysicalMemory
".echo checking availphys; g"
Once you restart WinDbg, you'll see the message following .echo displayed once a
second until you stop the debugger.
I should point out here that, until the BPool upper limit size is reached, the lazywriter
doesn't free up buffers by flushing them to disk and putting them on the free list.
Instead, it merely commits more reserved pages in the BPool each time the free list
falls below the built-in threshold and changes the corresponding bits in the commit
bitmap accordingly.
The lazywriter checks 16 BUF structures at a time. It keeps track of where it left off
with each iteration and picks back up there on the next run (a second later or when
signaled, as I've said). When the lazywriter reaches the end of the BUF array, it
wraps back around to the start and sweeps continuously through the BUF array,
much as the hands of a clock sweep indefinitely.
Checkpoint
The checkpoint process also scans the BPool and flushes dirty pages to disk. It does
not, however, move pages to the free list. Freeing pages is the job of the UMS
workers and the dedicated lazywriter thread. Checkpoint's job is to shorten the
amount of time required to recover the system by keeping the number of dirty pages
to a minimum. It's common for a checkpoint not to find many pages to flush to disk
because dirty pages have already been flushed by the lazywriter thread or individual
UMS workers.
Partitions
In order to provide for better scalability, SQL Server partitions the BPool free list by
CPU. As we discussed in Chapter 10, a given UMS user is assigned to a particular
scheduler when it first connects and remains with that scheduler until it disconnects.
Each UMS scheduler is typically associated with a particular CPU. When a UMS user
needs a free page, the partition associated with its CPU is checked first, followed by
the other partitions if necessary. By partitioning the BPool free list by CPU, we make
better use of the processor's local cache and improve the scalability of the server.
The Memory Managers
Rather than embed all of the complexities of memory management into the BPool
itself, SQL Server spreads this work across five major memory manager classes. This allows
different memory consumers to allocate and manage memory independently of one
another while still using a shared, managed memory pool.
The five managers are the Connection, Query Plan, Optimizer, Utility, and General
memory managers. Each is responsible for a different class of memory management
within the server. Let's discuss each of these separately.
Connection
Query Plan
SQL Server uses the Query Plan memory manager mainly for allocations related to
compiled plans and execution plans produced by the query optimizer (you can see
these in the syscacheobjects system table). This memory manager also handles
allocations related to cursors, RPC parameters, index creation, and certain DBCC
commands that deal with indexes on computed columns.
Note that the Query Plan memory manager is the only memory manager that the
lazywriter can cause to free up memory. A page whose BUF header has a status bit
indicating that the page is related to a query plan can be aged out of the cache by
the lazywriter when its reference count reaches zero.
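You can get a feel for what the Query Plan memory manager is caching by querying the syscacheobjects table mentioned above. A minimal sketch, trimmed to the most interesting columns:

-- Cached compiled plans and execution contexts, largest consumers first
SELECT cacheobjtype, objtype, usecounts, pagesused, sql
FROM master.dbo.syscacheobjects
ORDER BY pagesused DESC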
Optimizer
SQL Server uses the Optimizer memory manager to allocate and manage metadata
and tree structures related to query optimization. The server limits this manager to
80% of the server's total memory.
Utility
SQL Server has a special memory manager for use by utility functions. As the name
suggests, it uses the Utility memory manager for various utility-related allocations,
including buffers used by the server-side trace facility, log manager initialization,
cluster- and log shipping-related functions, bitmap comparisons, hash table
searches, and so on.
General
This memory manager is for allocations that don't fit any of the above categories.
Allocations for several different types of internal storage engine structures, including
locks, are made using the General memory manager.
OS Memory Allocations
As a rule, the five major memory managers attempt to allocate memory from the
BPool unless the request is larger than 8KB. For allocations larger than 8KB of
contiguous memory, the request is usually passed on to the memory manager
responsible for allocating memory from the MemToLeave region. This memory
manager is commonly known as the OS or Reserved memory manager. It is not a
separate manager class in the same sense as the five major ones we just discussed;
instead, it provides functionality that all of them can make use of for large
allocations.
SQL Server provides a special emergency condition memory manager that is used in
rare circumstances when the normal memory managers have failed and the server
must have some memory in order to continue without severe consequences (e.g.,
corruption) during such things as logging or recovery. The normal managers do not
automatically use this memory manager when allocations they attempt fail; it is
used only by certain very critical code paths within the server.
When SQL Server starts, this memory manager commits a 64KB region of virtual
memory. Only one consumer can use this memory at a time. Each new consumer
must wait until the others have released the memory before the new consumer can
use it.
When this memory manager is used, it writes a warning message to the SQL Server
error log:
Warning: Due to low virtual memory, special reserved memory used
%d times since startup. Increase virtual memory on server.
Due to its very nature, you'll normally see other errors or warnings accompanying
this message when it occurs.
IMalloc
Internally, SQL Server uses a custom implementation of the COM IMalloc interface to
handle individual memory allocations. As I mentioned earlier in the chapter, when a
consumer needs to make a memory request, it first allocates a memory allocation
object. It then uses this allocation object to make the request. Multiple requests can
be made via a single allocation object. This allocation object implements the
standard COM IMalloc interface.
IMalloc includes all the features you'd expect to find in a standard memory allocator:
allocation, deallocation, resizing, functions to determine allocation block sizes, and
so on. COM provides a standard implementation of this interface that uses the Win32
heap facilities to carry out allocation requests. Since SQL Server performs its own
memory management, it doesn't use the standard COM implementation; it uses an
internal implementation that allocates requests from the BPool or MemToLeave
regions as appropriate.
Pulling It All Together
Thus far, we've explored the individual components of SQL Server's memory
architecture in some detail. In this section, we'll pull these elements together so you
can better understand how SQL Server manages memory overall and why it behaves
the way it does in certain circumstances.
When you start SQL Server, the BPool's upper limit is computed based on the
physical memory in the machine, the max server memory sp_configure value, and
the size of the MemToLeave region. Once this size is computed, the MemToLeave
region is set aside (reserved) so that it will not be fragmented by the BPool
reservations that are to follow. The BPool region is then set aside, using as many as
32 separate reservations in order to allocate around the DLLs and other allocations
that may already be taking up virtual address space within the SQL Server process
by the time the BPool is reserved.
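As a rough illustration of the computation (assuming the commonly cited default MemToLeave size of about 384MB, that is, 256MB plus 255 worker thread stacks of 0.5MB each): on a machine with 2GB of physical RAM, a 2GB user mode address space, and no max server memory setting, the BPool's upper limit works out to roughly 2GB minus 384MB, or about 1.6GB, since that figure is smaller than the physical memory in the box.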
Once the BPool is reserved, the MemToLeave region is released. SQL Server does not
hold on to the MemToLeave region because this region is intended for "external"
(i.e., outside the core SQL Server code) consumption. It is used for internal SQL
Server allocations that exceed 8KB of contiguous space and for allocations made by
external consumers such as OLE DB providers, in-process COM objects, and the like.
As I've said, SQL Server reserves the entire MemToLeave region at startup, then
releases it after the BPool reservations are made in order to keep it from being
fragmented by BPool reservations.
So, once the server has started, the BPool has been reserved, but not committed,
and the MemToLeave region is essentially free space within the virtual memory
address space of the process. If you view the Virtual Bytes Perfmon counter for the
SQL Server process just after SQL Server has started, you'll see that it reflects the
BPool reservation. I've seen people become alarmed because this number is often so
high; after all, it usually reflects either the total physical memory in the machine or
the maximum user mode address space minus the size of the MemToLeave region.
This is nothing to worry about, however, because it is only reserved, not committed,
space. As I said in Chapter 4, reserved space is just address space; it does not have
physical storage behind it until it is committed.
Over time, the amount of memory committed to the BPool will increase until it
reaches the upper limit computed when the server was originally started. You can
track this via the SQL Server:Buffer Manager\Target Pages Perfmon counter. As
different parts of the server need memory, the BPool commits the 8KB pages it
originally reserved until this committed size reaches the computed target. Since this
is virtual space as opposed to physical space, and since the majority of virtual
memory is backed by the system paging file, not by physical RAM, this does not
necessarily equate to more physical memory usage. You can track the BPool's use of
committed virtual memory via the SQL Server:Buffer Manager\Total Pages Perfmon
counter. You can track the server's overall use of committed virtual memory via the
Private Bytes counter for the SQL Server process.
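As with the other Buffer Manager counters, these can also be read from T-SQL via sysperfinfo when Perfmon isn't convenient; a quick sketch (counter names appear as they do in Perfmon):

SELECT counter_name, cntr_value
FROM master.dbo.sysperfinfo
WHERE object_name LIKE '%Buffer Manager%'
  AND (counter_name LIKE 'Target pages%' OR counter_name LIKE 'Total pages%')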
Because most of SQL Server's virtual memory usage comes from the BPool, these
two counters will, generally speaking, increase or level off in tandem. If the Total
Pages counter levels off but the Private Bytes counter continues to climb, this
usually indicates continued allocations from the MemToLeave region. These
allocations could be completely normal (for example, allocations related to thread
stacks as additional worker threads are created within the server), or they could
indicate a leak by an external consumer such as an in-process
COM object or an xproc. If the process runs out of virtual memory address space
because the MemToLeave region is exhausted due to a leak or overconsumption (or
if the maximum free block within the MemToLeave region falls below the default
thread stack size of .5MB), the server will be unable to create new worker threads,
even if the sp_configure max worker threads value has not been reached. In this
situation, if the server needs to create a new worker thread in order to carry out a
work request (for example, to process a new connection request), this work request
will be delayed until the server can create the thread or another worker becomes
available for it to use. This can prevent a user from connecting to the server because
the connection may time out before a sufficient amount of MemToLeave space is
freed or another worker becomes available to process the connection request.
A memory consumer within the server initiates a memory allocation by first creating
a memory object to manage the request. This memory object is an implementation
of the standard COM IMalloc interface. When the object allocates the request, it calls
on the appropriate memory manager within the server to fulfill the request from
either the BPool or the MemToLeave region. For requests of 8KB or less, the request
is usually filled using memory from the BPool. For requests of more than 8KB of
contiguous space, the request is usually filled using memory from the MemToLeave
region. Because a single memory object may be used to carry out multiple
allocations, it's actually possible for an allocation of less than 8KB to be allocated
from the MemToLeave region, as I mentioned earlier.
Consumers of memory within the SQL Server process space are usually internal
consumers, that is, consumers or objects within the SQL Server code itself that need
memory to carry out a task, but they do not have to be. They can also be external
consumers, as I've said. External consumers include OLE DB providers, xprocs, in-
process COM objects, and so on. Usually, these external consumers use normal
Win32 memory API functions to allocate and manage memory and, therefore,
allocate space from the MemToLeave space since it is the only region within the SQL
Server process that appears to be available. However, xprocs are a special
exception. When an xproc calls the ODS srv_alloc API function, it is treated just like
any other consumer within the server. Generally speaking, srv_alloc requests for 8KB
of memory or less are allocated from the BPool. Larger allocations come from the
MemToLeave space.
As the server runs, the lazywriter checks to make sure that a given amount of
physical memory remains available on the server so that Windows and other apps
on the server continue to run smoothly. This amount can vary between 4MB and
10MB (it trends closer to 10MB on Windows Server 2003) and is based on the system
load and the page life expectancy for the BPool. If the physical memory on the
server begins to dip below this threshold, the server decommits BPool pages in order
to shrink its physical storage usage (assuming that dynamic memory configuration is
enabled).
The lazywriter also ensures that a given number of pages remain free at any given
point in time so that, as new allocation requests come in, they will not have to wait
for memory to be allocated. By "free" I mean that the page is committed but not
used. Unused committed BPool pages are tracked via a free list. As pages are used
from this list, the lazywriter commits more pages from the BPool reservation until the
entire reservation has been committed. You will see the Process:Private Bytes
Perfmon counter increase gradually (and usually linearly) due to this activity.
There is a separate free list for each CPU on the system. When a free page is needed
to satisfy an allocation request, the free list associated with the UMS worker
requesting the allocation is checked first, followed by the lists for the other CPUs on
the system. This is done to improve scalability by making better use of the local
cache for each processor on a multiprocessor system. You can monitor a specific
BPool partition via the SQL Server:Buffer Partition Perfmon object. You can monitor
the free list for all partitions via the SQL Server:Buffer Manager\Free Pages Perfmon
counter.
So, throughout the time that it runs, SQL Server's lazywriter process (whether run
from the lazywriter thread or via a UMS worker) monitors the memory status of the
system to be sure that a reasonable amount of physical memory remains available
to the rest of the system and that a healthy number of free pages remains available
for use by new memory allocation requests.
Of necessity, some of this changes when AWE memory is used by the server.
Because Windows' AWE facility does not support the concept of reserving but not
committing memory (i.e., dynamic memory commitment/decommitment), SQL
Server doesn't support dynamic memory management when using AWE memory.
The BPool begins by acquiring and locking physical memory on the machine. The
amount of memory it locks varies based on whether max server memory has been
set. If it has, the BPool attempts to lock the amount specified by max server
memory. If it has not, the BPool locks all of the physical memory on the machine
except for approximately 128MB, which it leaves available for other processes. The
BPool then uses the physical memory above 3GB (the AWE memory) as a kind of
paging file for data and index pages. It maps physical pages from this region into the
virtual memory address space as necessary so they can be referenced via 32-bit
pointers.
Recap
So there you have it: SQL Server memory management in a nutshell. Understanding
how an application allocates and manages memory is essential to understanding
how the application itself works. Memory is such an important resource and its
efficient use is such an integral element of sound application design that
understanding how memory is managed by an application gives you great insight
into the overall design of the application.
Knowledge Measure
1. The BPool is the pool from which most memory allocations are made within
SQL Server. What region is set aside for external consumers such as OLE DB
providers that run in-process with the server?
4. What Perfmon counter reflects the total amount of committed virtual memory
within a process?
5. Describe, in general terms, the two main functions of the lazywriter process.
6. True or false: On all releases of SQL Server 2000 and later, allocations of 8KB
or less always come from the BPool.
7. True or false: Allocations made by an xproc via the ODS srv_alloc API function
are always allocated from the MemToLeave region, regardless of size.
8. What standard COM interface is used within SQL Server to manage individual
memory allocations?
9. Which of the five memory managers is the only one the lazywriter can cause to
free allocated pages?
10. On a system with 512MB of physical memory and a SQL Server startup
parameter of -g768, what will be the maximum size to which the BPool can
grow?
11. What Win32 API function is used by the lazywriter process on Windows 2000 to
check the available physical memory in the machine?
12. True or false: Because UMS workers are often allowed to perform the work of
the lazywriter process, the dedicated lazywriter often finds little to do when it
runs on an instance of SQL Server installed on the Windows NT family of the
Windows operating system.
13. Why is the virtual memory address space for the MemToLeave region reserved
at system startup?
14. On a four-way SMP system, how many BPool free list partitions will SQL Server
create?
15. What command line parameter can you pass into SQL Server to adjust the size
of the MemToLeave region?
16. What operating system privilege must the SQL Server startup account have in
order to make use of AWE memory?
17. True or false: When you configure SQL Server to use AWE memory on a system
with less than 3GB of physical memory, the server will refuse to start and log
an error in the system error log.
18. What mechanism does SQL Server use to track committed pages in the BPool?
19. What mechanism does SQL Server use to locate a data page in the BPool using
its database ID, file number, and page number?
20. How many total reservations can be made at system startup to reserve the
BPool?
21. On a SQL Server with 2GB of physical memory, a startup parameter of -g512,
and a max server memory setting of 384, what will be the maximum size to
which the BPool can grow?
22. By default, how much of the MemToLeave pool is set aside for thread stacks?
23. True or false: Because memory allocations by in-process COM objects that are
8KB or less come from the BPool, they are automatically freed when the object
is destroyed.
24. True or false: When AWE memory is being used by SQL Server, the physical
memory behind the BPool is locked and will not be available to other
processes.
25. True or false: By default, the lazywriter process tries to keep at least 384MB
available for the MemToLeave region within the SQL Server process.
Chapter 12. Query Processor
I believe in the equality of man; and I believe that religious duties consist in
doing justice, loving mercy and endeavoring to make our fellow creatures
happy.
-Thomas Paine[1]
[1] Paine, Thomas. The Age of Reason, ed. Philip S. Foner. New York: Citadel Press, 1974, p. 50.
Query processing and performance is so important to optimal SQL Server usage that
I've included a chapter on it in each of my SQL Server books. In this book, I'll update
the coverage from my last book, The Guru's Guide to SQL Server Stored Procedures,
XML, and HTML, and continue the discussion of query optimization internals begun in
that book. While providing a boatload of individual performance tips would certainly
have some short-term benefits, it occurs to me that understanding the reasoning
behind the how-to is actually more important and will benefit you more in the long
run. As I said in the Introduction, understanding the design behind a technology is
more important than merely learning how to use it. Your ability to tune Transact-SQL
queries is a product of your knowledge and understanding of SQL Server itself: how
it works, what it does when it processes a query, what resources it needs to
formulate an efficient plan, and so on. So, in this chapter, I'll balance the coverage of
practical user information with a discussion of some of the internals of SQL Server
query processing. I'll talk about the various query stages and what happens at each
one. As you come to better understand how SQL Server query processing works,
you'll develop your own methods of speeding up T-SQL code, and you'll use
techniques that are sensible and safe because you'll have a good understanding of
how SQL Server was designed to work.
Key Terms and Concepts
Cardinality: strictly speaking, refers to the number of unique values in a
table. Because SQL Server allows tables to have duplicate rows and indexes to
have duplicate key values, cardinality often more generally refers to the
number of rows in a table or the number of rows returned by a query plan
operator when discussed from a SQL Server perspective. Cardinality can also
refer to the number of unique values in an index.
SQL Server attempts to avoid redundantly parsing and optimizing queries by hashing
the text of each newly submitted query and caching it in memory, along with the
original query text and a link to the execution context and the execution plan that
the original query resulted in. It checks the hashed query text of each newly
submitted query against those already in memory. In order to allow for the possibility
of hash collisions, if it finds a match, it then compares the original text of the new
query with the one in memory to see whether they match exactly. If they do, the
new query doesn't need to be reparsed and reoptimized; the original execution
context and/or execution plan can be reused.
The process within SQL Server that handles parsing queries submitted via language
events and RPCs into relational operator trees is known as language processing and
execution (LPE). LPE is responsible for handing a parsed relational operator tree to
the optimizer, then taking the optimized execution plan produced by the optimizer's
conversion stage and executing it. LPE is not a single component or code line within
the server but actually spans many modules and involves multiple facilities within
the server.
Optimization Stages
Figure 12.1 outlines the query optimization process. Once the optimizer receives a
relational operator tree into which the initial parsed query has been placed, it begins
optimizing it, searching for a means of carrying out the work it requires with the
least amount of cost and changing it as necessary to accomplish that. Generally
speaking, each successive phase or stage of the optimization process allows more
sophisticated (and often more expensive) options to be considered for transforming
the initial relational operator tree presented by the parser into an optimized
execution plan. If a suitably inexpensive plan is found during this process, it can be
returned without requiring further optimization or evaluation of potential
transformations. There are four distinct stages in the optimization process: trivial
plan optimization, simplification, full optimization, and conversion. I'll describe each
of these separately.
With certain queries, the most efficient execution plan is evident based on the query
itself and doesn't require any cost estimation or comparisons between plans. For
example, if a query uses an equality predicate to filter a query on a column with a
unique index, there's no need to estimate the costs of various plans and compare
them; a seek using the unique index is obviously the best choice. The optimizer also
has trivial optimizations for certain types of covered queries, joins between tables
with constraint relationships, DML statements, and a few others. (Note, however,
that a query that contains a subquery is not eligible for a trivial plan.) When the
optimizer detects that a trivial plan is the cheapest plan it can hope to obtain, it
sends the plan on to the conversion stage so that it can be returned for execution.
You can detect when a trivial plan has been chosen by enabling the 8759 trace flag.
When this flag is enabled, the optimizer will write the first portion of a query for
which a trivial plan is created to the error log.
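A query along the following lines illustrates the point. (This is a sketch; the Orders table and its uniquely indexed OrderNo column are hypothetical stand-ins for the discussion that follows.)

DECLARE @OrderNo int
SET @OrderNo = 10248

SELECT *
FROM Orders
WHERE OrderNo = @OrderNo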
Because the only predicate used is an equality comparison predicate and there's a
unique index on the Orders.OrderNo column, the optimizer will select a trivial plan
that features an index seek to service this query. Note that the fact we're using a
variable here to filter the query doesn't impair the optimizer's ability to select a
trivial plan. Regardless of the value the variable ends up with at runtime, a simple
equality operator is being used to compare the variable with a uniquely indexed
column, and the optimizer intrinsically knows that a trivial plan is the best option.
The trivial optimization and full optimization stages are the only two optimization
stages from which a plan can be generated and the optimization process can end. If
a query cannot be optimized via a trivial optimization, it is passed to the
simplification stage and then on to the full optimization stage.
Simplification
During the simplification stage, certain heuristics are applied that allow for easier
and more effective optimization later. As the name suggests, the purpose of this
stage is to simplify the query expression and facilitate better index and statistics
usage during the full optimization process. Operators can be relocated in the tree,
changed out for other operators, and generally simplified to make the ensuing full
optimization process simpler, quicker, and more effective. For example, subqueries
used in a predicate can be flattened into semi-joins and inner joins, filters that apply
to just one table in a multitable query can be moved above the join in the tree, star
schema relationships can be detected, certain operators can be replaced by more
efficient equivalents, and so on. Here's an example of a query containing a subquery
that is flattened into a semi-join by the optimizer during the simplification stage:
SELECT a.au_id
FROM authors a
WHERE au_id IN (SELECT au_id FROM titleauthor ta WHERE ta.au_id = a.au_id)
If you view the graphical showplan for this query in Query Analyzer, you'll see that
the optimizer produces a plan featuring a Nested Loops/Left Semi Join operator.
Although expressed as a subquery in the T-SQL code, the query is really asking for
the rows in the authors table that have matches in the titleauthor table. Thus, rather
than run the subquery for each row in the authors table, the optimizer is smart
enough to normalize the query into a simple semi-join between the two tables.
Full Optimization
If the optimizer is unable to find a suitable trivial plan, the plan is simplified, then
passed on to the full optimization stage. The full optimization stage attempts to
project the cost of various alternate ways of returning the data the query requests
and selects the one that is the least expensive. It does not go through this process
indefinitely; a timeout governs how long the optimizer thrashes through the various
ways of processing a query before it chooses one. If the optimizer hasn't found the
most efficient execution plan by the time the timeout expires, it chooses the least
expensive plan it has found up to that point and passes it on to the conversion stage
so that it can be returned for execution.
Transaction Processing
During the transaction processing step, the optimizer applies a subset of the
potential transformations available to it during full optimization. The purpose of this
step is to find most of the transaction processing-oriented query plans that were
not found during the trivial plan stage as quickly as possible. For the most part, only
transformations involving a join with an indexed lookup are allowed during this step.
When an operator does not support an indexed lookup, a transformation involving a
hash may be permitted.
Quick Plan
The quick plan step permits all transformation rules supported by the optimizer. It
allows transformations involving join reordering (of the bottom four tables of the
relational operator tree) and will select the first plan it finds with an estimated cost
of less than approximately one second.
Parallel Optimization
During the parallel optimization step, if the least expensive plan found up to that
point exceeds the cost threshold for parallelism sp_configure setting (and SQL Server
is running on a multiprocessor computer), the optimizer evaluates operators and
transformations that could exploit the multiprocessing capabilities of the machine.
These operators will be more fully explored during the full optimization step.
Full Optimization
If all other attempts at finding a suitably inexpensive execution plan fail, the full
optimization step of the full optimization stage is entered. During this step, the
optimizer recursively walks the relational operator tree as it exists after having
passed through the other optimization stages and steps. This tree consists of
relational AND/OR nodes and operators. The OR nodes represent a set of mutually
exclusive operators (e.g., nested loop vs. hash join vs. merge join) for a particular
plan step. The optimizer compares these with one another and selects the least
expensive. AND nodes represent operators in which each child node needs to be
optimized. The optimizer walks these child nodes and evaluates them separately.
The cost of an OR node is equal to the cost of its cheapest child node. The cost of an
AND node is equal to the sum of the cost of the child nodes, plus some operator
cost.
The recursive algorithm followed during this step is basically the same as that
followed during earlier stages; it's just that a larger set of the potential
transformations is considered. For example, given that the full optimization step
comes after the parallel optimization step, parallel operators can be evaluated for
cost during this step. Different types of joins can be compared with one another, and
reformatting (on-the-fly index creation) and other advanced optimization strategies
can be evaluated for cost and compared with one another.
Conversion
Regardless of the stage that actually produces the execution plan, the plan must
first go through a conversion in order to be passed to the LPE process to be carried
out. This is handled by the conversion stage of the optimization process. Each
optimized query plan is processed by this stage before being returned for execution.
Optimization Limits
The optimizer uses multiple techniques to ensure that the process of comparing the
cost of operators and plans does not continue indefinitely or even exorbitantly long.
At some point during the optimization process, the optimizer can begin to reach a
point of diminishing returns. After all, if the time required by the evaluation process
far exceeds the execution time of even the most expensive plans, the optimization
process may not be able to save that much processing time overall. In a pathological
situation, it may be even more expensive than not optimizing at all. The optimizer is
designed to avoid situations like this and, to the extent possible, ensure that there is
a genuine performance benefit associated with its cost.
The optimizer also implements the notion of a timeout value. This timeout value is
based on a set number of transformations the optimizer believes it can perform
before it needs to time out. Throughout the optimization process, the optimizer
checks to see whether it has exceeded this number of transformations and times out
if so. Because the number of transformations used to compute the timeout is based
on an assumption that the optimizer can perform a given number of transformations
per second, this timeout has only a rough correlation to actual elapsed time. It is a
semi-linear function that starts at around 10% of the initial cost estimate and
increases more or less linearly until it levels off at approximately 60 seconds of
elapsed time. Note that this timeout is only as accurate as the transformation time
estimates made. There is no strictly enforced concept of a time-based timeout value.
Because the number of transformations that the optimizer can apply per second can
vary widely, the exact amount of time that may elapse before a query times out may
change from query to query.
You can use trace flag 8675 to determine when the optimizer times out while
generating a query plan. This flag will also tell you when the optimizer reaches the
memory limit to which SQL Server restricts it (about 80% of the BPool). You can also
often infer that an optimization timeout has occurred via the compile time returned
by SET STATISTICS TIME ON. If this is around 60 seconds, you may indeed be seeing
an optimization timeout, particularly in situations where you know that the optimizer
has inexplicably failed to choose the best plan for a given query.
Parameter Sniffing
Prior to compiling an execution plan for a stored procedure, SQL Server attempts to
"sniff" (i.e., discern) the values of the parameters being passed into it and use those
values when compiling the plan. When these values are being used to filter a query
(i.e., as part of a WHERE or HAVING predicate), this allows the optimizer to produce a
more precise execution plan tailored to the values being passed into the stored
procedure (using the statistics histograms for the columns they're used to filter)
rather than based solely on the average density of the relevant table columns.
Generally speaking, this is a good thing and results in improved performance over
older versions of SQL Server.
You can get into trouble, however, when an atypical parameter value is passed in
when a plan is first compiled for a procedure. The plan that's cached in memory may
be suboptimal for the majority of the values that will be supplied for the parameter.
When more typical values are supplied, they may reuse the old plan and may take
longer to execute than they would have if the plan had been tailored to them
instead.
You have a few options in this situation. You can mark the procedure for automatic
recompilation using the WITH RECOMPILE option. This will cause the procedure's
plan to be rebuilt each time it's executed. If the compilation time is negligible, this
can be an easy solution for dealing with parameter values that vary a great deal in
terms of their distribution across a table column.
You can also execute a procedure using the WITH RECOMPILE option�again causing
the plan to be rebuilt. Similarly, you can use the sp_recompile procedure to cause a
procedure's plan to be rebuilt the next time it's executed.
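Syntactically, these options look like the following (the procedure and table names are hypothetical):

-- Recompile the plan on every execution
CREATE PROC ListSales @Country char(2)
WITH RECOMPILE
AS
SELECT * FROM Sales WHERE Country = @Country
GO

-- Recompile just this one execution
EXEC ListSales 'US' WITH RECOMPILE

-- Flag the procedure so its plan is rebuilt the next time it runs
EXEC sp_recompile 'ListSales'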
You can also "disable" parameter sniffing by filtering your query using local variables
to which you've copied the parameter values. Using local variables instead of
procedure parameters to filter a query is generally a bad idea because it inhibits the
use of an index's statistics histogram in computing selectivity, but there are
exceptions to the rule. When the optimizer can't use the statistics histogram to
compute the number of rows that may be returned by a particular filter criterion, it
uses magic numbers: hard-coded estimates of the percentage of rows that will be
returned based on the comparison operator used. In some rare cases, these
estimates may be more accurate than using the histogram itself. One such case is
when the value used to scan the histogram is atypical and results in a skewed
estimate of the number of rows that will normally match the supplied parameter.
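Here's a sketch of the local variable technique, again with hypothetical names. Because the WHERE clause references @c rather than the parameter itself, the optimizer can't sniff the value at compile time and falls back on its density information and magic numbers:

CREATE PROC ListSalesNoSniff @Country char(2)
AS
DECLARE @c char(2)
SET @c = @Country                         -- copy the parameter to a local
SELECT * FROM Sales WHERE Country = @c    -- filter on the local, not the parameter
GO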
You can sometimes also reorganize a query that's affected by errant parameter
sniffing such that it runs in a distinct execution context that receives its own
execution plan. This leverages the fact that when a procedure executes a T-SQL
block dynamically or calls another procedure, each gets its own execution plan.
Methods of using this technique to deal with parameter sniffing idiosyncrasies
include using sp_executesql, EXEC(), and breaking a procedure into multiple
procedures. With the sp_executesql and multiple procedure approach, you can still
benefit from plan reuse. With EXEC(), you are not likely to; your plan will probably
have to be recompiled with each execution. Depending on what you're doing and
how long compilation takes, this may or may not be desirable.
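For instance, the sp_executesql approach might look like this (hypothetical names again); the dynamically executed statement gets its own plan, separate from the outer procedure's, and that plan can still be reused across calls:

CREATE PROC ListSalesByCountry @Country char(2)
AS
EXEC sp_executesql
    N'SELECT * FROM Sales WHERE Country = @c',
    N'@c char(2)',
    @c = @Country
GO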
You can also explicitly clear the procedure cache with DBCC FREEPROCCACHE. This
will cause all compiled plans to be tossed from memory and force them to be
recompiled the next time each procedure is executed. This is a fairly drastic measure
but can apply in some circumstances. For example, you might use it before and after
running a nightly job that executes a number of procedures with atypical parameter
values that usually run during the day with more typical parameters. This would help
make sure that you don't reuse the plans from earlier in the day and that the plans
created for your nightly job run aren't reused the next day.
Another situation in which you can run into trouble with parameter sniffing is when
an execution plan in the cache reflects typical parameter values that you pass, but
you need to pass an atypical parameter into a procedure and you need it to execute
as quickly as possible. In this case, the problem isn't that you have a suboptimal
plan in the cache; for the majority of your queries, the plan is optimal. The problem
is that these two queries shouldn't be sharing an execution plan. Say, for example,
that you have a stored procedure to which you pass a country code so that it can
return national sales figures. Most of the time, you pass in "US" because your
business is based in the United States and most of your sales are in that country.
Due to the fact that U.S.-based sales records make up most of your sales table, the
optimizer generates an execution plan that uses a table scan. A table scan is more
efficient in this situation than an index seek because most of the rows are being
returned anyway. Sometimes, however, you pass in a different country code, a code
for a country for which there may be only a few sales. You expect this query to
return relatively quickly based on the handful of rows it will eventually yield, but it
doesn't. It also results in a table scan because it reuses the plan originally compiled
when you passed in "US." One potential solution would be to reorganize the
procedure into multiple procedures: one for U.S.-based sales and one for all other
countries. Given that your sales to countries outside the United States are relatively
few and fairly evenly distributed, you will likely see an index seek (provided that an
appropriate index exists) when querying for sales from these countries. Conversely,
your queries for U.S.-based sales will continue to use table scans because that is the
most efficient way to service them. Especially if a plan takes a while to compile (and,
hence, is not an ideal candidate for automatic recompilation with each execution),
using multiple procedures in this manner may be a viable solution for you.
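A sketch of that multiple-procedure arrangement (all names hypothetical): a thin wrapper dispatches to one of two procedures, so the U.S. queries and the non-U.S. queries each get a plan compiled for their own data distribution.

CREATE PROC GetSalesUS
AS
SELECT * FROM Sales WHERE Country = 'US'
GO

CREATE PROC GetSalesOther @Country char(2)
AS
SELECT * FROM Sales WHERE Country = @Country
GO

CREATE PROC GetSales @Country char(2)
AS
IF @Country = 'US'
    EXEC GetSalesUS
ELSE
    EXEC GetSalesOther @Country
GO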
Auto-Parameterization
In order to increase the likelihood of plan reuse, SQL Server attempts to
automatically parameterize ad hoc queries. By "auto-parameterize," I mean that the
server can replace a constant value in an ad hoc query with a parameter marker so
that the plan can be reused with different values for the constant. Consider the
following query, for example:
SELECT * FROM Orders WHERE OrderId=10248
In this query, the value 10248 is a constant. If the server were unable to auto-
parameterize the query, a value other than 10248 would result in a separate
execution plan being generated. Instead, the server is intelligent enough to replace
10248 with a parameter marker and supply 10248 as the value of the parameter for
a given execution context. Other executions of the same query with different
parameters will each have their own execution contexts but will use the same
execution plan.
Rather than relying on the server to correctly guess the parameters for a given
query, you can specify them directly via sp_executesql or by using parameter
markers in your query text. When you do this, there's no ambiguity, and you have
more control over the data types used for each parameter. SQL Server infers each
auto-parameter's data type based on the constant passed in, but it can occasionally
guess wrong, causing a data type mismatch between the parameter and the column
to which it is being compared. Given that this can possibly inhibit index usage, it's
important to be aware of it. For example, in the query above, SQL Server guesses
that the order ID parameter should be a smallint based on the value of 10248, which
fits comfortably within the range of SQL Server's two-byte smallint data type.
However, the OrderId column in the Orders table is actually a four-byte int, so the
execution plan must include a CONVERT operator in order to reconcile the difference
between the two.
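For instance, supplying the parameter yourself lets you match the column's type exactly and avoid the implicit conversion just described (this sketch assumes the Northwind-style Orders table with an int OrderId column):

EXEC sp_executesql
    N'SELECT * FROM Orders WHERE OrderId = @OrderId',
    N'@OrderId int',     -- declared as int to match the column, not smallint
    @OrderId = 10248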
The textual or graphical showplan for an auto-parameterized query will indicate that
it has been parameterized. When an ad hoc query has been auto-parameterized,
you'll see parameter placeholders such as @1 in place of at least some of the
constant values in your query text, and you'll see that these placeholders have been
used to filter your query.
You can also turn on trace flag 8759 to detect when a query has been auto-
parameterized. When trace flag 8759 is enabled, the first part of an auto-
parameterized query is written to the SQL Server error log, as shown below.
Without a useful index, SQL Server has little choice but to scan the entire table or tables to find the
data you need. If you're joining two or more tables, SQL Server may have to scan some of them
multiple times to find all the data needed to satisfy the query. Indexes can dramatically speed up
the process of finding data as well as the process of joining tables together.
Storage
The sysindexes system table stores system-level information about SQL Server indexes. Every
index has a row in sysindexes and is identified by its indid column, a 1-based integer indicating the
order in which it was created (the clustered index is always indid 1). If a table has no clustered
index, sysindexes will contain a row for the table itself with an indid value of 0.
SQL Server tracks the extents that belong to a table or index using Index Allocation Map (IAM)
pages. A heap or index will have at least one IAM for each file on which it has allocated extents. An
IAM is a bitmap that maps extents to objects; each bit indicates whether the corresponding extent
belongs to the object that owns the IAM. Each IAM bitmap covers a range of 512,000 pages. The
first IAM page for an index is stored in sysindexes' FirstIAM column. IAM pages are allocated
randomly in a database file and linked together in a chain. Even though the IAM permits SQL
Server to efficiently prefetch a table's extents, individual rows must still be examined; the IAM just
serves as an access method to the pages themselves.
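You can poke around in this metadata yourself by querying sysindexes directly; here's a minimal sketch against the Northwind Customers table used later in the chapter:

USE Northwind
SELECT name, indid, root, FirstIAM, dpages, rowcnt
FROM sysindexes
WHERE id = OBJECT_ID('Customers')
ORDER BY indid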
Index Types
SQL Server supports two types of indexes: clustered and nonclustered. Both types have a number
of features in common. Both consist of pages stored in B-trees (balanced trees). The node levels of each
type contain pointers to pages at the next level, while the leaf level contains the key values.
B-Trees
As I've said, SQL Server indexes are stored physically as B-trees. B-trees support the notion of
searching through data by using a binary search-type algorithm. B-tree indexes store keys with
similar values close together, with the tree itself being continually rebalanced in order to ensure
that a given value can be reached with a minimum of page traversal. Because B-trees are
balanced, the cost of finding a row is fairly constant, regardless of which row it is.
The first node in a B-tree index is the root node. A pointer to each index's root node is stored in
sysindexes' root column. When searching for data using an index, SQL Server begins at the root
node, then traverses any intermediate levels that might exist, finally either finding or not finding
the data in the bottom-level leaf nodes of the index. The number of intermediate levels will vary
based on the size of the table, the size of the index key, and the number of columns in the key.
Obviously the more data there is or the larger each key is, the more pages you need.
Index pages above the leaf level are known as node pages. Each row in a node page contains a key
or keys and a pointer to a page at the next level whose first key row matches it. This is the general
structure of a B-tree. SQL Server navigates these linkages until it locates the data it's searching for
or reaches the end of the linkage in a leaf-level node. The leaf level of a B-tree contains key values,
and, in the case of nonclustered indexes, bookmarks to the underlying clustered index or heap.
These key values are stored sequentially and can be sorted in either ascending or descending
order on SQL Server 2000 and later.
Unlike nonclustered indexes, the leaf node of a clustered index actually stores the data itself. There
is no bookmark, nor is there a need for one. When a clustered index is present, the data itself lives
in the leaf level of the index.
The data pages in a table are stored in a page chain, a doubly-linked list of pages. When a
clustered index is present, the order of the rows on each page and the order of the pages within
the chain are determined by the index key. Given that the clustered index key causes the data to
be sorted, it's important to choose it wisely. The key should be selected with several considerations
in mind, including the following.
• The key should be as small as possible since it will serve as the bookmark in every
nonclustered index.
• The key should be chosen such that it aligns well with common ORDER BY and GROUP BY
queries.
• It should match reasonably well with common range queries (queries where a range of rows
is requested based on the values in a column or columns).
• It should be a set of columns that is not updated extremely frequently because updating a
table's clustered index key can require relocating a row and updating the bookmark in every
nonclustered index on the table.
Beginning with SQL Server 7.0, all clustered indexes have unique keys. If a clustered index is
created that is not unique (e.g., via CREATE INDEX without the UNIQUE keyword), SQL Server
forces the index to be unique by appending a 4-byte value called a uniqueifier to key values as
necessary to differentiate identical key values from one another.
Leaf-level pages in a nonclustered index contain index keys and bookmarks to the underlying
clustered index or heap. A bookmark can take one of two forms. When a clustered index exists on
the table, the bookmark is the clustered index's key. If the clustered index and the nonclustered
index share a common key column, it's stored just once. When a clustered index isn't present, the
bookmark consists of a row identifier (RID) made up of the file number, the page number, and the
slot number of the row referenced by the nonclustered key value.
The fact that a heap (a table without a clustered index) forces nonclustered indexes to reference it
using physical location information is a good enough reason alone to create a clustered index on
every table you build. Without it, changes to the table that cause page splits will have a ripple
effect on the table's nonclustered indexes since the physical location of the rows they reference
will change, perhaps quite often. This was, in fact, one of the major disadvantages of SQL Server
indexing prior to version 7.0: nonclustered indexes always stored physical row locator information
rather than the clustered key value and were thus susceptible to physical row location changes in
the underlying table.
Nonclustered indexes are best at singleton selects, queries that return a single row. Once the
nonclustered B-tree is navigated, the actual data can be accessed with just one page I/O, that is,
the read of the page from the underlying table.
Covering Indexes
A nonclustered index is said to "cover" a query when it contains all the columns requested by the
query. This allows it to skip the bookmark lookup step and simply return the data the query seeks
from its own B-tree. When a clustered index is present, a query can be covered using a
combination of nonclustered and clustered key columns since the clustered key is the nonclustered
index's bookmark. That is, if the nonclustered index is built on the LastName and FirstName
columns and the clustered index key is built on CustomerID, a query that requests the CustomerID
and LastName columns can be covered by the nonclustered index. A covering nonclustered index
is the next best thing to having multiple clustered indexes on the same table.
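Using the column names from the example above, the covering arrangement might be set up like this (the table name is hypothetical):

-- Clustered index (here, the primary key) on CustomerID
CREATE TABLE CustomerList (
    CustomerID int         NOT NULL PRIMARY KEY CLUSTERED,
    LastName   varchar(40) NOT NULL,
    FirstName  varchar(40) NOT NULL
)
GO
CREATE NONCLUSTERED INDEX IX_CustomerList_Name
    ON CustomerList (LastName, FirstName)
GO
-- Covered: LastName comes from the index key, CustomerID from the bookmark,
-- so the underlying table is never touched
SELECT CustomerID, LastName
FROM CustomerList
WHERE LastName = 'Fuller'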
Performance Issues
Generally speaking, keep your index keys as narrow as possible. Wider keys cause more I/O and
permit fewer key rows to fit on each B-tree page. This results in the index requiring a larger
number of pages than it otherwise would and causes it to take up more disk space. In practice,
you'll likely tailor your indexing strategy to meet specific business requirements. For example, if
you have a query that takes an extremely long time to return because it needs an index with key
columns none of your current indexes have, you may indeed want to widen an existing index or
create a new one.
Naturally, there's a trade-off with adding indexes or index columns, namely, DML
performance. Since the indexes on a table have to be maintained and updated as you add or
change data, each new index you add brings with it a certain amount of overhead. The more
indexes you add, the slower updates, inserts, and deletes against the underlying tables become, so
it's important to keep your indexes as compact and narrow as possible while still meeting the
business needs your system was designed to address.
Index Intersection
Prior to version 7.0, the SQL Server query optimizer would use just one index per table to resolve a
query. SQL Server 7.0 and later can use multiple indexes per table and can intersect their sets of
bookmarks before incurring the expense of retrieving data from the underlying table. This has
some implications for index design and key selection, as I'll discuss in a moment.
Index Fragmentation
You can control the amount of fragmentation in an index through its fillfactor setting and through
regular defrag operations. An index's fillfactor affects performance in several ways. First, creating
an index with a relatively low fillfactor helps avoid page splits during inserts. Obviously, with pages
only partially full, the potential for needing to split one of them in order to insert new rows is lower
than it would be with completely full pages. Second, a high fillfactor can help compact pages so
that less I/O is required to service a query. This is a common technique with data warehouses.
Retrieving pages that are only partially full wastes I/O bandwidth.
An index's fillfactor setting affects only the leaf-level pages in the index. SQL Server normally
reserves enough empty space on intermediate index pages to store at least one row of the index's
maximum size. If you want your fillfactor specification applied to intermediate as well as leaf-level
pages, supply the PAD_INDEX option of the CREATE INDEX statement. PAD_INDEX instructs SQL
Server to apply the fillfactor to the intermediate-level pages of the index. If your fillfactor setting is
so high that there isn't room on the intermediate pages for even a single row (e.g., a fillfactor of
100%), SQL Server will override the percentage so that at least one row fits. If your fillfactor setting
is so low that the intermediate pages cannot store at least two rows, SQL Server will override the
fillfactor percentage on the intermediate pages so that at least two rows fit on each page.
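For example (the index name here is illustrative), a fillfactor of 80 combined with PAD_INDEX leaves roughly 20% free space on the intermediate-level pages as well as the leaf-level pages:

-- Apply the 80% fillfactor to intermediate-level pages too via PAD_INDEX.
CREATE INDEX NC_Orders_OrderDate
ON dbo.Orders (OrderDate)
WITH PAD_INDEX, FILLFACTOR = 80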
Understand that an index's fillfactor setting isn't maintained over time. It's applied when the index
is first created but is not enforced afterward. DBCC SHOWCONTIG is the tool of choice for
determining how full the pages in a table and/or index really are. The key indicators you want to
examine are Logical Scan Fragmentation and Avg. Page Density. DBCC SHOWCONTIG shows three
types of fragmentation: extent scan fragmentation, logical scan fragmentation, and scan density.
Use DBCC INDEXDEFRAG to fix logical scan fragmentation; rebuild indexes to defrag the table
and/or index completely.
Listing 12.1 shows some sample DBCC SHOWCONTIG output from the Northwind Customers table.
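The command behind the listing would be along these lines (a representative invocation; the book's exact listing may differ):

USE Northwind
DBCC SHOWCONTIG ('Customers')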
(Results)
As you can see, the Customers table is a bit fragmented. Logical Scan Fragmentation is sitting at
40% and Avg. Page Density is at 61.76%. In other words, the pages in the table are, on average,
roughly 40% empty. Let's defrag the table's clustered index and see whether things improve.
Listing 12.2 shows the resulting output.
DBCC INDEXDEFRAG(Northwind,Customers,1)
(Results)
(Results)
As you can see, DBCC INDEXDEFRAG helped considerably. Logical Scan Fragmentation has dropped
to 25%, and Avg. Page Density is now at a little over 77%, an improvement of roughly 15 percentage points.
By default, DBCC SHOWCONTIG reports leaf-level information only. To scan the other levels of the
table/index, specify the ALL_LEVELS option (Listing 12.3).
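A representative form of such a scan (in SQL Server 2000, ALL_LEVELS is specified together with TABLERESULTS, which returns the output as a rowset with one row per index level):

DBCC SHOWCONTIG ('Customers') WITH TABLERESULTS, ALL_LEVELS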
(Results abridged)
Table 12.1 lists the key data elements reported by DBCC SHOWCONTIG and what they mean.
I use the Logical Scan Fragmentation and Avg. Page Density fields to determine overall table/index
fragmentation. You should see them change in tandem as fragmentation increases or decreases
over time.
Defragmenting
As you just saw, DBCC INDEXDEFRAG is a handy way to defragment an index. It's an online
operation, so the index is still usable while it works. That said, it reorganizes the index only at the
leaf level, performing a kind of bubble sort on the leaf-level pages. To fully defrag an index, you
must rebuild it. You have several ways to do this. First, you could simply drop and recreate the
index by using DROP/CREATE INDEX. The drawback to this, though, is that you have to take the
index offline while you rebuild it, and you aren't allowed to drop indexes that support constraints.
You could use DBCC DBREINDEX or the DROP_EXISTING clause of CREATE INDEX, but, again, the
index is unavailable until it's rebuilt. The one upside is that the index can be created in parallel if
you're running on the Enterprise Edition of SQL Server. Since SQL Server's parallel index creation
scales almost linearly on multiple processors, the length of time that an index is offline while being
recreated can be significantly less on an SMP system than on a single-processor system that is
otherwise identical in terms of hardware resources.
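For reference, the two rebuild techniques just mentioned look something like this (the fillfactor and the choice of Northwind's Orders table and its OrderDate index are illustrative):

-- Rebuild every index on the table (name a specific index in the second
-- parameter to rebuild just one), optionally supplying a new fillfactor.
DBCC DBREINDEX ('Northwind.dbo.Orders', '', 90)

-- Rebuild a single nonclustered index in place.
CREATE INDEX OrderDate
ON dbo.Orders (OrderDate)
WITH DROP_EXISTING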
From Table 12.1, two of the values DBCC SHOWCONTIG reports:
Avg. Bytes Free per Page - the average number of bytes free on each page
Out of order pages (not displayed, but used to compute Logical Scan Fragmentation) - the number of times a page had a lower page number than the previous page in the scan
Generally speaking, DBCC INDEXDEFRAG is the best tool for the job unless you find widespread
fragmentation in the non-leaf levels of the index and you feel this fragmentation is unacceptably
affecting query performance. As I mentioned, you can check the fragmentation of the other levels
of an index by passing the ALL_LEVELS option to DBCC SHOWCONTIG.
In addition to defragmenting the leaf-level pages, DBCC INDEXDEFRAG also features a compaction
phase wherein it compacts the index's pages using the original fillfactor as its target. It attempts to
leave enough space for at least one row on each page when it finishes. If it can't obtain a lock on a
particular page during compaction, it skips the page. It removes any pages that end up completely
empty as a result of the compaction.
Indexed Views and Indexes on Computed Columns
Creating an index on a view or a computed column persists data that would otherwise exist only in
a logical sense. Normally, the data returned by a view exists only in the tables the view queries.
When you query the view, your query is combined with the one comprising the view and the data
is retrieved from the underlying objects. The same is true for computed columns. Normally, the
data returned by a computed column does not actually exist independently of the columns or
expressions it references. Every time you request it from its host table, the expression that
comprises it is reevaluated and its data is generated on-the-fly.
When you begin building indexes on a view, you must start with a unique clustered index. This is
where the real data persistence happens. Just as with tables, a clustered index created on a view
actually stores the data itself in its leaf-level nodes. Once the clustered index exists, you're free to
create nonclustered indexes on the view as well.
This differs from computed columns in tables. With computed columns, you're not required to first
create a clustered index in order to build nonclustered indexes. Since the column will serve merely
as an index key value, a nonclustered index works just fine.
Prerequisites
SQL Server requires that seven SET options have specific values in order to create an index on a
view or a computed column. Table 12.2 lists the settings and their required values. As you can see
from the table, all settings except NUMERIC_ROUNDABORT must be set to ON.
Table 12.2 Required SET Option Values for Indexed Views and Indexes on Computed Columns
ARITHABORT ON
CONCAT_NULL_YIELDS_NULL ON
QUOTED_IDENTIFIER ON
ANSI_NULLS ON
ANSI_PADDING ON
ANSI_WARNINGS ON
NUMERIC_ROUNDABORT OFF
Only deterministic expressions can be used with indexed views and indexes on computed columns.
A deterministic expression is one that, when supplied a given input, always returns the same
output. The expression SUBSTRING('He who loves money more than truth will end up poor',23,7) is
a deterministic expression; GETDATE isn't.
You can check to see whether a view or column is indexable using Transact-SQL's OBJECTPROPERTY
and COLUMNPROPERTY functions (Listing 12.4).
USE Northwind
SELECT OBJECTPROPERTY (OBJECT_ID('Invoices'), 'IsIndexable')
SELECT COLUMNPROPERTY (OBJECT_ID('syscomments'), 'text' , 'IsIndexable')
SELECT COLUMNPROPERTY (OBJECT_ID('syscomments'), 'text' , 'IsDeterministic')
(Results)
-----------
0
-----------
0
-----------
0
One final prerequisite applies to views: a view is indexable only if it was created with the
SCHEMABINDING option. Creating a view with SCHEMABINDING causes SQL Server to prevent the
objects it references from being dropped unless the view is first dropped or changed so that the
SCHEMABINDING option is removed. Also, ALTER TABLE statements on tables referenced by the
view will fail if they affect the view definition.
In the example above, the Invoices view is not indexable because it was not created with
SCHEMABINDING. Here's a version of it that was, along with a subsequent check of IsIndexable
(Listing 12.5).
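The full Invoices2 definition isn't reproduced here; a stripped-down sketch of the pattern (hypothetical column list, same two-part naming and SCHEMABINDING requirements) looks like this:

CREATE VIEW dbo.Invoices2
WITH SCHEMABINDING -- required before the view can be indexed
AS
SELECT o.OrderID, od.ProductID, od.UnitPrice, od.Quantity
FROM dbo.Orders o
JOIN dbo.[Order Details] od ON o.OrderID = od.OrderID
GO
SELECT OBJECTPROPERTY(OBJECT_ID('Invoices2'), 'IsIndexable')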
(Results)
-----------
1
Note that all the object references now use two-part names (they don't in the original Invoices
view). Creating a view with SCHEMABINDING requires all object references to use two-part names.
Once a view has been indexed, the optimizer can make use of the index when the view is queried.
In fact, on the Enterprise Edition of SQL Server, the optimizer will even use the index to service a
query on the view's underlying tables if it thinks that would yield the least cost in terms of
execution time.
Normally, indexed views are not used at all by the optimizer unless you're running on Enterprise
Edition. For example, consider the following index and query (Listing 12.6).
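Listing 12.6 isn't reproduced here. Judging from the plan output below, it creates a unique clustered index (named inv) on the view and then queries the view directly; a sketch consistent with that plan might be:

-- Persist the view's data by giving it a unique clustered index.
CREATE UNIQUE CLUSTERED INDEX inv ON dbo.Invoices2 (OrderID, ProductID)
GO
SET SHOWPLAN_TEXT ON
GO
SELECT * FROM Invoices2 WHERE OrderID = 10844 AND ProductID = 22
GO
SET SHOWPLAN_TEXT OFF
GO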
(Results abridged)
StmtText
------------------------------------------------------------------------------
SELECT * FROM [invoices2] WHERE [orderid]=@1 AND [productid]=@2
|--Nested Loops(Inner Join)
|--Nested Loops(Inner Join)
| |--Nested Loops(Inner Join, OUTER REFERENCES:([Orders].[ShipVia]))
| | |--Nested Loops(Inner Join, OUTER REFERENCES:([Orders].[Employ
| | | |--Nested Loops(Inner Join, OUTER REFERENCES:([Orders].[C
| | | | |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[O
| | | | |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[C
| | | |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Employ
| | |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Shippers].[
| |--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Order Details].[
|--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Products].[PK_Product
Though the plan text is clipped on the right, you can tell that the view index obviously isn't being
used even though it contains both of the columns the query filters on. On versions of SQL Server
other than Enterprise Edition, this is completely expected. Out of the box, only the Enterprise
Edition of SQL Server will consider view indexes when formulating an execution plan. There is,
however, a workaround. You can use the NOEXPAND query hint on non-EE versions of SQL Server
to force the consideration of a view's index. Here's the query again, this time with the NOEXPAND
keyword and the resultant query plan (Listing 12.8).
(Results)
StmtText
------------------------------------------------------------------------------
SELECT * FROM invoices2 (NOEXPAND) WHERE orderid=10844 AND productid=22
|--Clustered Index Seek(OBJECT:([Northwind].[dbo].[Invoices2].[inv]), SEEK:([
Notice that the index is now used. Using the NOEXPAND keyword forces the optimizer to use a
view's index, even if doing so yields a suboptimal plan. You should regard NOEXPAND with the
same skepticism that you do other query hints. It's best to let the optimizer do its job and override
it only when you have no other choice.
One telltale sign that a table lacks a clustered index is when you notice that it has RID locks taken
out on it. SQL Server will never take out RID locks on a table with a clustered index; it will always
take out key locks instead.
Generally speaking, you should let SQL Server control locking of all types, including locking with
indexes. Normally, it makes good decisions and will do the best job of managing its own resources.
You can use the sp_indexoption system procedure to manually control what types of locks are
allowed on an indexed table. You can use it to disable row and/or page locks, but I don't
recommend that you normally do this. As with query hints, it's generally best to let the server
decide what type of lock should be taken out on a resource.
Note that sp_indexoption applies only to indexes, so you can't use it to control locking on the pages
of a heap. When a table has a clustered index, however, its data pages are part of that index and are
therefore affected by the settings specified via sp_indexoption.
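If you do need to restrict the lock types used with a specific index, the call looks something like this (SQL Server 2000 syntax; the index is illustrative, and you should verify the option names against Books Online):

-- Disallow row locks on one index; SQL Server will use page or table locks instead.
EXEC sp_indexoption 'Orders.OrderDate', 'AllowRowLocks', FALSE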
Statistics
You've probably heard the term "statistics" bandied about in discussions of SQL
Server query performance. Statistics are metadata that SQL Server maintains about
index keys and, optionally, nonindexed column values. SQL Server uses statistics to
determine whether using an index could speed up a query. In conjunction with
indexes, statistics are the single most important source of data for helping the
optimizer develop optimum execution plans. When statistics are missing or out-of-
date, the optimizer's ability to formulate the best execution plan for a query is
seriously impaired.
Let's cover a few basic statistics-related terms before we discuss statistics in more
depth.
Cardinality
The cardinality of data refers to how many unique values exist in the data. In strict
relational database theory, duplicate rows (tuples) are not permitted within a
relation (a table), so cardinality would refer to the total number of tuples. That said,
SQL Server does permit duplicate rows to exist in a table, so for our purposes, the
term "cardinality" refers to the number of unique values within a data set.
Density
Density refers to the uniqueness of values within a data set. An index's density is
computed by dividing the number of rows that would correspond to a given key
value by the number of rows in the table. For a unique index, this amounts to
dividing 1 by the table's total row count. Density values range from 0 through 1;
lower densities are better.
Selectivity
Selectivity refers to the fraction of a table's rows that a query or predicate returns. A highly
selective predicate returns relatively few rows; the more selective the criteria that reference an
index's keys, the more useful that index is likely to be to the optimizer.
Performance Issues
Indexes with high densities will likely be ignored by the optimizer. The most useful
indexes to the optimizer have density values of 0.10 or lower. Let's take the example
of a table called VoterRegistration with 10,000 rows, no clustered index, and a
nonclustered index on its PartyAffiliation column. If there are three political parties
registered in the voting precinct and they each have about the same representation
across the voter base, PartyAffiliation will likely contain only three unique values.
This means that a given key value in the index could identify as many as 3,333 rows
in the table, perhaps more. This gives the index a density of 0.33 (3,333 ÷ 10,000)
and virtually ensures that the optimizer will not use the index when formulating an
execution plan for queries that require columns not covered by the index.
To better understand this, let's compare the cost of using the index versus not using
it to satisfy a simple query. If we wanted to list all the voters in the precinct affiliated
with the Democratic Party, we'd be talking about hitting approximately a third of the
table, or 3,333 rows. If we use the PartyAffiliation index to access those rows, we're
faced with 3,333 separate logical page reads from the underlying table. In other
words, as we found each key value in the index, we'd have to look up its bookmark
in the underlying table in order to get the columns not contained in the index, and
each time we did this, we'd incur the overhead of a logical (and possibly physical)
page I/O. All told, we might incur as much as 26MB of page I/O overhead to look up
these bookmark values (3,333 keys x 8K/page). Now consider the cost of simply
scanning the table sequentially. If an average of 50 rows fit on each data page and
we have to read the entire table to find all of the ones affiliated with the Democratic
Party, we're still looking at only about 200 logical page I/Os (10,000 rows ÷ 50
rows/page = 200 pages). That's a big difference and is the chief reason you'll see
high-density nonclustered indexes ignored in favor of table/clustered index scans.
Storage
SQL Server stores statistics for an index key or a column in the statblob column of
sysindexes. statblob is an image data type that stores a histogram containing a
sampling of the values in the index key or column. For composite indexes, only the
first column is sampled, but density values are maintained for the other columns.
During the index selection phase of query optimization, the optimizer decides
whether an index matches up with the columns in filter criteria, determines index
selectivity as it relates to that criteria, and estimates the cost of accessing the data
the query seeks.
If an index has only one column, its statistics consist of one histogram and one
density value. If an index has multiple columns, a single histogram is maintained, as
well as density values for each prefix (left-to-right) combination of key columns. The
optimizer uses this combination of an index's histogram and densities (its
statistics) to determine how useful the index is in resolving a particular query.
The fact that a histogram is stored only for the first column of a composite index is
one of the reasons you should position the most selective columns in a multicolumn
index first: the histogram will be more useful to the optimizer. Moreover, this is also
the reason that splitting up composite indexes into multiple single-column indexes is
sometimes advisable. Since the server can intersect and join multiple indexes on a
single table, you retain the benefits of having the columns indexed, and you get the
added benefit of having a histogram for each column (column statistics can help out
here, as well). This isn't a blanket statement (don't run out and drop all your
composite indexes); just keep in mind that breaking down composite indexes is
sometimes a viable performance tuning option.
Column Statistics
Besides index statistics, SQL Server can also create statistics on nonindexed
columns. (This happens automatically when you query a nonindexed column while
AUTO_CREATE_STATISTICS is enabled for the database.) Being able to determine the
likelihood that a given value might occur in a column gives the optimizer valuable
information in determining how best to service a query. It allows the optimizer to
estimate the number of rows that will qualify from a given table involved in a join,
allowing it to more accurately select join order. Also, the optimizer can use column
statistics to provide histogram-type information for the other columns in a
multicolumn index. Basically, the more information you can give the optimizer about
your data, the better.
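Column statistics can also be created explicitly rather than waiting for auto-create to kick in. For example (the statistics name is arbitrary):

-- Build column statistics on a nonindexed column using a full scan.
CREATE STATISTICS st_Orders_Freight
ON dbo.Orders (Freight)
WITH FULLSCAN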
Listing Statistics
SQL Server uses statistics to track the distribution of key values across a table. The
histogram that's stored as part of an index's statistics contains a sampling of up to
200 values for the index's first key column. Besides the histogram, the statblob
column also stores density information for the index's key columns.
The range of key values between each of the 200 histogram sample values is called
a step. Each sample value denotes the end of a step, and each step stores three
values:
1. EQ_ROWS, the number of rows with a key value matching the sample value
2. RANGE_ROWS, the number of rows whose key values fall inside the step's range
3. RANGE_DENSITY, the density of the key values within the step's range
DBCC SHOW_STATISTICS lists the EQ_ROWS and RANGE_ROWS values verbatim and
uses RANGE_DENSITY to compute the DISTINCT_RANGE_ROWS and
AVG_RANGE_ROWS for the step. It computes DISTINCT_RANGE_ROWS (the total
number of distinct rows within the step's range) by dividing 1 by RANGE_DENSITY
and computes AVG_RANGE_ROWS (the average number of rows per distinct key
value) by multiplying RANGE_ROWS by RANGE_DENSITY.
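You can inspect an index's histogram steps and density information with DBCC SHOW_STATISTICS. For example, for the OrderDate index on Northwind's Orders table:

DBCC SHOW_STATISTICS ('Orders', 'OrderDate')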
Updating Statistics
Statistics can be updated in a couple of ways. The first and most obvious is through
the AUTO_UPDATE_STATISTICS database option. (You can turn this on via ALTER
DATABASE, sp_dboption, or Enterprise Manager.) When statistics are generated
automatically for a table of any size, SQL Server uses sampling (as opposed to
scanning the entire table) to speed up the process. This works in the vast majority of
cases but can sometimes lead to statistics that are less useful than they could be.
SQL Server provides a few stored procedures to make creating and updating
statistics easier. The sp_updatestats procedure runs UPDATE STATISTICS against all
user-defined tables in the current database. Unlike the UPDATE STATISTICS command
itself, though, sp_updatestats cannot issue a full scan of a table to build statistics; it
always uses sampling. If you want full scan statistics, you have to use UPDATE
STATISTICS.
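For example, to refresh one table's statistics with a full scan versus sampling across the whole database:

-- Full-scan statistics for a single table (all of its indexes and column stats).
UPDATE STATISTICS Orders WITH FULLSCAN

-- Sampled statistics for every user table in the current database.
EXEC sp_updatestats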
The sp_createstats procedure can be similarly handy. It can automate the creation of
column statistics for all eligible columns in all eligible tables in a database. Eligible
columns include noncomputed columns with data types other than text, ntext, or
image that do not already have column or first-column index statistics. Eligible
tables include all user (nonsystem) tables. I'm not recommending that you run out
and execute sp_createstats in each of your databases; it's unlikely that you'd need
statistics on every column in a table. However, in the event that you do,
sp_createstats can be a real timesaver.
The sp_autostats procedure allows you to control automatic statistics updating at
the table and index levels. Rather than simply relying on the
AUTO_UPDATE_STATISTICS database option, you can enable/disable auto stats
generation at a more granular level. For example, if you run a nightly job on a large
table to update its statistics using a full scan, you may want to disable automatic
statistics updates for the table. Using sp_autostats, you can disable automatic
statistics updates on this one table while leaving it enabled for the rest of the
database (you can also use UPDATE STATISTICS…WITH NORECOMPUTE). Statistics
updates on large tables, even those that use sampling, can take a while to run and
can use significant CPU and I/O resources.
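For example, to take one table out of the automatic update process because a nightly job (assumed here, not shown) refreshes its statistics with a full scan:

-- Disable auto-update of statistics for all indexes on Orders...
EXEC sp_autostats 'Orders', 'OFF'

-- ...and refresh them manually, telling SQL Server not to recompute them automatically.
UPDATE STATISTICS Orders WITH FULLSCAN, NORECOMPUTE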
Keep in mind that the negative impact on performance of not having statistics or
having out-of-date statistics almost always outweighs the performance benefits of
avoiding automatic statistics updates/creation. You should disable auto
update/create stats only when thorough testing has shown that there's no other way
to achieve the performance or scalability you require.
Listing 12.9 shows a stored procedure that you can use to stay on top of statistics
updates. It shows the statistics type, the last time it was updated, and a wealth of
other information that you may find useful in managing index and column statistics.
(Results abridged)
Selectivity Estimation
To correctly determine the relative cost of a query plan, the optimizer needs to be
able to precisely estimate the number of rows returned by the query. As I mentioned
earlier, this is known as selectivity, and it's crucial for the optimizer to be able to
accurately estimate it.
Selectivity estimates come from comparing query criteria with the statistics for the
index key or column they reference. Selectivity tells us whether to expect a single
row or 10,000 rows for a given key value. It gives us an idea of how many rows
correspond to the key or column value we're searching for. This, in turn, helps us
determine what type of approach would be the most efficient for accessing those
rows. Obviously, you take a different approach to access one row than you would for
10,000.
If the optimizer discovers that you don't have index or column statistics for a column
in the query's filter criteria, it may automatically create column-level statistics for
the column if you have the AUTO_CREATE_STATISTICS database option enabled.
You'll incur the cost of creating the statistics the first time around, but subsequent
queries should be sped up by the new stats.
Indexable Expressions
In other books and materials on SQL Server, you often see the discussion of index
use in query plans couched in terms of "SARGs": search arguments in a query that
the optimizer can translate into a comparison between an index key and a value.
There is, however, no such concept within the optimizer code itself. The term "SARG"
is an anachronism from the old Sybase code base and Sybase training materials. The
optimizer in all recent versions of SQL Server is far more sophisticated in terms of its
ability to use an index to speed up a query than it was in older versions of the
product. The optimizer's ability to use indexes is no longer constrained to simple
expressions, nor is the decision of whether or not to use an index necessarily
dependent on the form an expression may have. In other words, the actual
expression used in a query predicate may be irrelevant depending on the indexes
present, the optimizer's ability to ferret out column references within it, and the cost
estimates produced by the optimizer. The term "SARG" has become less and less
applicable, to the point that I think it's ready to be retired in favor of a more accurate
and more generally encompassing term for "index friendly" query predicate
expressions. A far more precise and accurate term (and the term used within the
optimizer itself) is "indexable expression": an expression that can be serviced with
an index to provide faster access to the data returned by a query. That's the
term I'll use in this book.
For our purposes, an indexable expression is a clause in a query that the optimizer
can potentially use in conjunction with an index to limit the results returned by the
query. The optimizer attempts to identify indexable expressions in a query's criteria
so that it can determine the best indexes to use to service the query.
An indexable expression generally takes the form
Column op Constant/Variable
(the terms can be reversed), where Column is a table column; op is one of the
following operators: =, >=, <=, >, <, <>, !=, !>, !<, BETWEEN, and LIKE (some
LIKE clauses can be translated into indexable expressions, and some can't); and
Constant/Variable is a constant value, variable reference, or one of a handful of
functions.
Operator Translation
Even though some of the operators mentioned above do not lend themselves to use
with indexed lookups, the optimizer can translate them into expressions that do. For
example, consider this query:
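The query itself is omitted here; given the plan excerpt that follows and the discussion of the != operator below, it was evidently a pubs query along these lines:

SELECT * FROM authors WHERE au_lname != 'Greene'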
Is the WHERE clause indexable? Yes. Look at the translation that occurs in this
excerpt from the query plan:
SEEK:([authors].[au_lname] < 'Greene' OR
[authors].[au_lname] > 'Greene')
The optimizer is smart enough to know that x != @parm is the same as x < @parm
OR x > @parm and translates it accordingly. Since the two branches of the OR
clause can be executed in parallel and merged, this affords the use of an index to
service the WHERE criteria.
Is it indexable? Yes. Once again, the optimizer translates the WHERE clause criteria
to something it finds more palatable:
Here's another:
Can the optimizer use an index to satisfy the query? Yes, it can. Here's an excerpt
from the plan:
Indexable expressions can be joined together with AND to form compound clauses.
The rule of thumb for identifying an indexable expression is that a clause can be
serviced by an index if the optimizer can detect that it's a comparison between an
index key value and a constant or variable. A common beginner's error is to involve
a column in an expression when comparing it to a constant or variable. For the most
part, this prevents the clause from being serviced by an index because the optimizer
doesn't know what the expression actually evaluates to; it's not known until
runtime. (There are exceptions to this; see the discussion below on folding for more
information.) The trick, then, is to isolate the column in these types of expressions so
that it's not modified or encapsulated in any way. Use commonsense algebra to put
the modifiers in the clause on the value/constant side of the expression and leave
the column itself unmodified.
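A hypothetical illustration of the point (not one of the book's listings): the first WHERE clause buries the column inside an expression, while the second isolates the column and, assuming an index on ytd_sales exists, remains indexable:

-- Not indexable as written: the optimizer can't relate ytd_sales * 2 to an index key.
SELECT title_id FROM titles WHERE ytd_sales * 2 > 10000

-- Indexable: the column stands alone on one side of the comparison.
SELECT title_id FROM titles WHERE ytd_sales > 10000 / 2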
If the optimizer is not able to identify an expression as an indexable expression, it is
not able to use statistics to estimate the number of rows returned by the associated
operator and must instead hazard a guess. It uses hard-coded "magic" numbers for
these estimates, based on the comparison operator used. Table 12.3 summarizes the
magic numbers the optimizer uses for each operator.
Folding
In some cases, the optimizer can fold an expression that wraps a column into an equivalent
form it can service with an index. Consider this query:
SELECT *
FROM Orders
WHERE ISNULL(OrderDate,GETDATE())>'2003-04-06 19:55:00.000'
If the optimizer were unable to fold the expression involving the OrderDate column
and the ISNULL function, it would be unable to use the Orders table's OrderDate
index (whose key is the OrderDate column) to service this query. As it is, if you view
the graphical showplan for this query in Query Analyzer, you'll see that the index is
indeed used.
The rule of thumb here is still to avoid wrapping columns in expressions when you
can. However, just be aware that there are situations when the optimizer can
identify indexable expressions anyway.
Join Order and Type Selection
In addition to choosing indexes and indexable expressions, the optimizer also selects a join
order and picks a join strategy for operators that require it. The selection of indexes and
join strategy go hand in hand: indexes influence the types of join strategies that are
viable, and the join strategy influences the types of indexes the optimizer needs to
produce an efficient plan.
(The estimates in Table 12.3 are expressed as a percentage of the table's rows: = 10, > 30,
< 30, BETWEEN 10.)
SQL Server supports three join strategies: nested loop, merge, and hash. As rules of thumb:
1. Nested loop works well with a smaller outer table and an index on the inner table.
2. Merge works well when both inputs are sorted on the joining column. (The optimizer
can sort one of the inputs if necessary.)
3. Hash performs well in situations where there are no usable indexes. Usually, creating
an index (so that a different join strategy can be chosen) will provide better
performance.
The optimizer determines the join strategy to use to service a query. It evaluates the cost
of each strategy and selects the one it thinks will cost the least. It reserves the right to
reorder tables in the FROM clause if doing so will improve the performance of the query.
You can always tell whether this has happened by inspecting the execution plan. The order
of the tables in the execution plan is the order the optimizer thought would perform best.
You can override the optimizer's ability to determine join order by using the OPTION
(FORCE ORDER) clause for a query, the SET FORCEPLAN ON session option, and join hints
(e.g., INNER LOOP JOIN). Each of these forces the optimizer to join tables in the order
specified by the FROM clause.
Note that forcing join order may have the side effect of also forcing a particular join
strategy. For example, consider the query and its execution plan shown in Listing 12.10.
Listing 12.10 A Query with a Right-Outer Join and Its Execution Plan
(Query)
(Execution plan)
StmtText
----------------------------------------------------------------------------
SELECT o.OrderId, p.ProductId
FROM [Order Details] o RIGHT JOIN Products p
ON (o.ProductId=p.ProductId)
|--Nested Loops(Left Outer Join, OUTER REFERENCES:(p.ProductID))
|--Index Scan(OBJECT:(Northwind.dbo.Products.SuppliersProducts AS p))
|--Index Seek(OBJECT:(Northwind.dbo.[Order Details].ProductID AS o),
SEEK:(o.ProductID=p.ProductID) ORDERED FORWARD)
Notice that the optimizer uses a nested loop join and that it reorders the tables (Products
is listed first in the plan, even though Order Details is listed first in the FROM clause). Now
let's force the join order by using the FORCE ORDER query hint and see what happens to
the query plan (Listing 12.11).
Listing 12.11 Forcing the Join Order with the FORCE ORDER Hint
(Query)
(Query plan)
StmtText
---------------------------------------------------------------------------
SELECT o.OrderId, p.ProductId
FROM [Order Details] o RIGHT JOIN Products p ON (o.ProductId=p.ProductId)
OPTION(FORCE ORDER)
|--Merge Join(Right Outer Join, MANY-TO-MANY MERGE:(o.ProductID)=
(p.ProductID), RESIDUAL:(o.ProductID=p.ProductID))
|--Index Scan(OBJECT:(Northwind.dbo.[Order Details].
ProductsOrder_Details AS o), ORDERED FORWARD)
|--Clustered Index Scan(OBJECT:(Northwind.dbo.Products.
PK_Products AS p), ORDERED FORWARD)
Since it can't reorder the tables, the optimizer switches to a merge join strategy. This is
less efficient than letting the optimizer order the tables as it sees fit and join the tables
using a nested loop.
Nested loop joins consist of a loop within a loop. A nested loop join designates one table in
the join as the outer loop and the other as the inner loop. For each iteration of the outer
loop, the entire inner loop is traversed. This works fine for small to medium-sized tables,
but as the loops grow larger, this strategy becomes increasingly inefficient. The general
process is as follows.
1. Find a row in the first table that matches the search criteria.
2. Use values from that row to find a row in the second table.
3. Repeat the process until there are no more rows in the first table that match the
search criteria.
The optimizer evaluates at least four join combinations even if those combinations are not
specified in the join predicate. It balances the cost of evaluating additional combinations
with the need to keep down the overall cost of producing the query plan.
Nested loop joins perform much better than merge joins and hash joins when working with
small to medium-sized amounts of data. The query optimizer uses nested loop joins if the
outer input is quite small and the inner input is indexed and quite large. It orders the
tables so that the smaller input is the outer table and requires a useful index on the inner
table. The optimizer always uses the nested loop strategy with theta (nonequality) joins.
Merge Joins
Merge joins perform much more efficiently with large data sets than nested loop joins do.
Both tables must be sorted on the merge column in order for the join to work. The
optimizer usually chooses a merge join when working with large data sets that are already
sorted on the join columns. The optimizer can use index trees to provide the sorted inputs
and can also leverage the sort operations of GROUP BY, CUBE, and ORDER BY; the sorting
needs to occur only once. If an input is not already sorted, the optimizer may opt to first
sort it so that a merge join can be performed if it thinks a merge join is more efficient than
a nested loop join. This happens very rarely and is denoted by the Sort operator in the
query plan.
1. Get a row from each input.
2. Compare them.
3. If the values are equal, return the joined rows.
4. If the values are not equal, toss the lower value and use the next input value from
that table for the next comparison.
5. Repeat the process until all the rows from one of the tables have been processed.
The optimizer makes only one pass per table. The operation terminates after all the input
values from one of the tables have been evaluated. Any values remaining in the other
table are not processed.
The optimizer can perform a merge join operation for every type of relational join
operation except CROSS JOIN and FULL JOIN. Merge operations can also be used to UNION
tables together (since they must be sorted to eliminate duplicates).
Hash Joins
Hash joins are also more efficient with large data sets than nested loop joins are.
Additionally, they work well with tables that are not sorted on the join column(s). The
optimizer typically opts for a hash join when dealing with large inputs and when no index
exists to join them or an index exists but is unusable.
SQL Server performs hash joins by hashing the rows from the smaller of the two tables
(designated the build table) and inserting them into a hash table, then processing the
larger table (the probe table) a row at a time and searching the hash table for matches.
Because the smaller of the two tables supplies the values in the hash table, the table size
is kept to a minimum, and because hashed values rather than real values are used,
comparisons can be made between the tables very quickly.
Hash joins are a variation on the concept of hashed indexes that have been available in a
handful of advanced DBMS products for several years. With hashed indexes, the hash
table is stored permanently; it is the index. Data is hashed into slots that have the same
hashing value. If the index has a unique contiguous key, what is known as a minimal
perfect hashing function exists: every value hashes to its own slot and there are no gaps
between slots in the index. If the index is unique but noncontiguous, the next best
thing, a perfect hashing function, can exist wherein every value hashes to its own slot,
but there are potentially gaps between them.
If the build and probe inputs are chosen incorrectly (e.g., because of inaccurate density
estimates), the optimizer reverses them dynamically using a process called role reversal.
Hash join operations can service every type of relational join (including UNION and
DIFFERENCE operations) except CROSS JOINs. Hashing can also be used to group data and
remove duplicates (e.g., with vector aggregates such as SUM(Quantity) GROUP BY ProductId).
When it uses hashing in this fashion, the optimizer uses the same table for both the build
and probe roles.
When join inputs are large and of similar size, a hash join performs comparably to a merge
join. When join inputs are large but differ significantly in size, a hash join usually
outperforms a merge join by a fair margin.
Subqueries
A subquery is a query nested within a larger one. Typically, a subquery supplies values for
predicate operators (such as IN, ANY, and EXISTS) or a single value for a derived column or
variable assignment. Subqueries can be used in many places, including a query's WHERE
and HAVING clauses.
Understand that joins are not inherently better than subqueries. Often the optimizer will
normalize a subquery into a join, but this doesn't mean that the subquery was an
inefficient coding choice.
For simple to moderately complex queries, derived tables can provide a similar benefit in
that they can allow you to serialize the order of query processing to some extent. Derived
tables can work like parentheses in expression evaluation (and are, in fact, delimited with
parentheses): they can establish an order of evaluation. When you demote part of a
complex SELECT to a derived table, against which you then apply the remainder of the
SELECT, you're in effect saying, "Do this first, then hand its results to the outer SELECT."
The optimizer isn't required to respect the order you supply (it can rearrange the tables in
derived table expressions and can even discard the nesting in favor of a more efficient
plan), but at least you have a syntactical mechanism for specifying the order you prefer.
You will have to try the technique in specific situations to determine whether it provides
the performance you require.
Logical and Physical Operators
Logical operators describe the relational operations used to process a query. They must
be translated into physical operators in order to be executed. Physical operators
describe what SQL Server must do to carry out the work (e.g., return the data)
requested by the query. Often a logical operator maps to multiple physical operators.
Execution plans consist of physical operators because these operators are what tell the
LPE component within the server what it needs to do to carry out the work of the query.
These physical operators are translated by the LPE component into OLE DB calls from
the relational engine to the storage engine to perform the work requested by the query.
This is best understood by way of example. Consider the following query, which
requests a relational inner join:
SELECT *
FROM Orders o JOIN [Order Details] d ON (o.OrderID = d.OrderID)
The optimizer chooses a merge join for it, though the query itself obviously doesn't ask
for one. Relationally, the query is performing an inner join between the two tables.
Here's an excerpt from its textual showplan:
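The excerpt itself isn't shown; you can produce the same kind of output with SET SHOWPLAN_ALL, whose rowset includes the LogicalOp and PhysicalOp columns discussed next:

SET SHOWPLAN_ALL ON
GO
-- The plan comes back as a rowset instead of the query executing; for this join,
-- LogicalOp shows Inner Join while PhysicalOp shows Merge Join.
SELECT *
FROM Orders o JOIN [Order Details] d ON (o.OrderID = d.OrderID)
GO
SET SHOWPLAN_ALL OFF
GO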
Notice that the PhysicalOp column lists Merge Join as the operator. That's what happens
behind the scenes to service the operation spelled out in the LogicalOp column: Inner
Join. An inner join is what we requested via our query. The server chose a Merge Join as
the physical operator to carry out our request.
Besides deciding which index to use and what join strategy to apply, the optimizer has
additional decisions to make regarding other types of operations. The subsections below
describe a few of them.
DISTINCT
The optimizer can remove the duplicates a DISTINCT query asks it to eliminate either by
sorting the data or by hashing it, so the physical operator you see in the plan may be a
Sort, a Hash Match, or a Stream Aggregate even though, logically, all you requested was
duplicate removal.
GROUP BY
The optimizer can service GROUP BY queries by using plain sorts or by hashing. Again,
the physical operator may be Hash or Sort, but the logical operator remains Aggregate
or something similar. Also, as mentioned, Stream Aggregate is a popular choice for
GROUP BY operations.
Because the optimizer may choose to perform a hash operation to group the data, the
result set may not come back in sorted order. You can't rely on GROUP BY to
automatically sort data. If you want the result set sorted, include an ORDER BY clause.
ORDER BY
Even with ORDER BY, the optimizer has a decision to make. Assuming no clustered
index exists that has already sorted the data, the optimizer has to come up with a way
to return the result set in the requested order. It can sort the data, as we'd naturally
expect, or it can traverse the leaf level of an appropriately keyed nonclustered index.
Which option the optimizer chooses depends on a number of factors. The biggest one is
selectivity: how many rows will be returned by the query? Equally relevant is index
covering: can the nonclustered index cover the query? If the number of rows is
relatively low, it may be cheaper to use the nonclustered index than to sort the entire
table. Likewise, if the index can cover the query, you've got the next best thing to
having a second clustered index, and the optimizer will likely use it.
Spooling
The Spooling operator in a query plan indicates that the optimizer is saving the results
from an intermediate query in a table for further processing. With a lazy spool, the work
table is populated as needed. With an eager spool, the table is populated in one step.
The optimizer favors lazy spools over eager spools because of the possibility that it may
be able to avoid having to fill the work table completely based on logic deeper in the
query plan.
There are cases where eager spools are necessary (for example, to protect against the
Halloween problem), but, generally, the optimizer prefers lazy spools because of their
reduced overhead.
Spool operations can be performed on tables as well as indexes. The optimizer uses a
rowcount spool when all it needs to know is whether a row exists.
Recap
Using indexes, statistics, and the submitted query text as input, the SQL Server
query processor produces optimized execution plans that the server then carries out
to access and return the requested data. The server can make use of clustered and
nonclustered indexes as well as auto-generated and manually created statistics. It
can use multiple indexes per table within a given query plan and can intersect and
join indexes.
The query processor picks the plan with the lowest cost. Usually, this is driven by the
estimated I/O for the plan, but there are other cost factors as well. The estimated I/O
for each step in a plan is generally based on the estimated number of rows it will
return for each execution and the estimated number of executions. Discrepancies
between the estimated and actual row counts or executions usually indicate
inaccurate estimates, typically caused by statistics that are out-of-date or that were
sampled at a rate too low to accurately represent the base data.
The key to efficient query processing is providing the optimizer enough information
to make informed decisions. Indexes, statistics, indexable expressions, constraints,
and reusable query plans are essential for good query performance.
SQL Server provides several tools for helping you tune your queries. A wonderful tool
for evaluating whether you have optimal indexes and statistics is the Index Tuning
Wizard. You can supply it a trace load or an individual query, and it will suggest
indexing and statistics changes that could make your code run more quickly. SQL
Server's Profiler tool can show procedure recompilations and cache use information,
as can Perfmon. Query Analyzer's graphical showplan can show specific information
about plan estimates and step inputs.
Knowledge Measure
1. True or false: If you create a clustered index with a nonunique key, SQL Server
automatically "uniquefies" it.
2. Provided an index exists on col, can the optimizer use it to service the WHERE
clause predicate col <> 1?
5. True or false: When an index is created, its fillfactor setting affects the number
of rows stored on every page in the index's B-tree except the root page.
8. What DBCC command can you use to list the fragmentation for an index?
9. What stored procedure can you use to affect the types of locks SQL Server will
use for a particular index?
10. What DBCC command can you use to list the statistics histogram for an index?
12. What sp_configure setting determines whether the query optimizer attempts to
create a parallel execution plan on an SMP server?
14. Explain what the term "density" refers to in terms of query optimization.
15. When we see an RID lock in the output of sp_lock, what does that immediately
tell us about the referenced table's indexes?
16. Provided an index exists on col, can the WHERE clause predicate
ISNULL(col,'')='foo' be serviced with an index?
Chapter 13. Transactions
I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use.
�Galileo Galilei[1]
[1] Galilei, Galileo. "Letter to the Grand Duchess Christina of Tuscany, 1615." In Discoveries and Opinions of Galileo: Including
the Starry Messenger, trans. Stillman Drake. Baltimore: Anchor Books, 1957, pp. 177, 183.
In this chapter, I'll update the coverage of SQL Server transactions that first
appeared in my book The Guru's Guide to Transact-SQL. We'll build on the coverage
that appeared in that book and update it for the current release of the product.
SQL Server's transaction management facilities help ensure the integrity and
recoverability of the data stored in its databases. A transaction is a set of one or
more database operations that are treated as a single unit: either they all occur or
none of them do. As such, a transaction is a database's basic operational metric, its
fundamental unit of work.
SQL Server transactions ensure data recoverability and consistency in spite of any
hardware, operating system, application, or SQL Server errors that may occur. They
ensure that multiple commands performed within a transaction are performed either
completely or not at all, and that a single command that alters multiple rows
changes either all of them or none of them.
The ACID Test
SQL Server transactions are often described as "having the ACID properties" or
"passing the ACID test," where ACID is an acronym for atomic, consistent, isolated,
and durable. Transactional adherence to the ACID tenets is commonplace in modern
DBMSs and is a prerequisite for ensuring the safety and reliability of data.
Atomicity
A transaction is all or nothing: either every operation in it completes or none of them do.
Consistency
A transaction takes the database from one consistent state to another; it never leaves rules
such as constraints half enforced.
Isolation
A transaction does not see the intermediate, uncommitted changes of other concurrent
transactions; the degree of isolation is governed by the transaction isolation level.
Durability
Once a transaction commits, its changes persist even if the hardware, operating system, or
SQL Server subsequently fails.
Anytime a data modification occurs, SQL Server writes a record of the change to the
transaction log before the change itself is performed. This is the reason SQL Server
is described as having a write-ahead log: log records are written ahead of their
corresponding data changes. If a change reached disk before its log record and the
server then failed, the change could not be rolled back during recovery.
Modifications are never made directly to disk. Instead, SQL Server reads data pages
into a buffer area as they're needed and changes them in memory. Before it changes
a page in memory, the server ensures that the change is recorded in the transaction
log. Since the transaction log is also cached, these changes are initially made in
memory as well. Write-ahead logging ensures that the lazywriter process does not
write modified data pages ("dirty" pages) to disk before their corresponding log
records.
In terms of transactions, triggers behave as though they were nested one level
deep. If a transaction that contains a trigger is rolled back, so is the trigger. If the
trigger is rolled back, so is any transaction that encompasses it.
SELECT TOP 5 title_id, stor_id FROM sales ORDER BY title_id, stor_id
BEGIN TRAN
DELETE sales
SELECT TOP 5 title_id, stor_id FROM sales ORDER BY title_id, stor_id -- returns no rows
GO
-- The transaction begun in the previous batch is still open, so it can be rolled back here
ROLLBACK TRAN
SELECT TOP 5 title_id, stor_id FROM sales ORDER BY title_id, stor_id -- the deleted rows are back
title_id stor_id
-------- -------
BU1032 6380
BU1032 8042
BU1032 8042
BU1111 8042
BU2075 7896
(5 row(s) affected)
title_id stor_id
-------- -------
(0 row(s) affected)
title_id stor_id
-------- -------
BU1032 6380
BU1032 8042
BU1032 8042
BU1111 8042
BU2075 7896
(5 row(s) affected)
Distributed Transactions
The CREATE INDEX, BULK INSERT, TRUNCATE TABLE, SELECT… INTO, and
WRITETEXT/UPDATETEXT commands minimize transaction logging by causing only
page operations to be logged. (BULK INSERT can, depending on the circumstances,
create regular detail log records.) Contrary to a popular misconception, these
operations are logged; it's just that they don't generate detail transaction log
information. That's why Books Online refers to them as nonlogged
operations: they're nonlogged in that they don't generate row-level log records. I
often refer to them as minimally logged operations.
Nonlogged operations tend to be much faster than fully logged operations. And since
they generate page allocation log records, they can be rolled back (but not forward)
just like other operations. The price you pay for using them is transaction log
recovery. Nonlogged operations reduce the granularity of the information written to
the transaction log, so they also impact the granularity of the recovery process. This
is often quite acceptable, but it's something you should be aware of.
Naturally, the current recovery model affects transactions and transaction log
management. The Simple recovery model effectively truncates the transaction log at
each system-generated checkpoint. The Bulk-Logged recovery model fully logs all
operations except the nonlogged (minimally logged) operations described above. The
Full recovery model logs all operations, including those that would otherwise be
minimally logged.
One obvious way to avoid logging as well as resource blocks and deadlocks in a
database is by making the database read-only. Naturally, if the database can't be
changed, there's no need for transaction logging or resource blocks. Making the
database single-user even alleviates the need for read locks, avoiding the possibility
of an application blocking itself.
SET XACT_ABORT ON
SELECT TOP 5 au_lname, au_fname FROM authors ORDER BY au_lname, au_fname
BEGIN TRAN
DELETE authors -- fails: authors is referenced by titleauthor's foreign key
DELETE sales
SELECT TOP 5 au_lname, au_fname FROM authors ORDER BY au_lname, au_fname
ROLLBACK TRAN
PRINT 'End of batch -- never makes it here'
GO
SELECT TOP 5 au_lname, au_fname FROM authors ORDER BY au_lname, au_fname
au_lname                                 au_fname
---------------------------------------- --------------------
Bennet                                   Abraham
Blotchet-Halls                           Reginald
Carson                                   Cheryl
DeFrance                                 Michel
del Castillo                             Innes

(5 row(s) affected)
'FK__titleauth__au_id__164452B1'. The
conflict occurred in
au_lname                                 au_fname
---------------------------------------- --------------------
Bennet                                   Abraham
Blotchet-Halls                           Reginald
Carson                                   Cheryl
DeFrance                                 Michel
del Castillo                             Innes

(5 row(s) affected)
READ UNCOMMITTED
Specifying READ UNCOMMITTED is essentially the same as using the NOLOCK hint
with every table referenced in a transaction. It is the least restrictive of SQL Server's
four TILs. It permits dirty reads (reads of uncommitted changes by other
transactions) and nonrepeatable reads (data that changes between reads during a
transaction). To see how READ UNCOMMITTED permits dirty and nonrepeatable
reads, run the queries shown in Listing 13.3 simultaneously.
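The side of Listing 13.3 that makes the change isn't visible below; it would follow this general pattern (a sketch, not the book's exact statements):

-- Query 1: modify some rows, keep the transaction open for five seconds,
-- then undo the change.
BEGIN TRAN
UPDATE sales SET qty = 0 WHERE qty > 20
WAITFOR DELAY '00:00:05'
ROLLBACK TRAN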
-- Query 1
-- Query 2
IF @@ROWCOUNT>0 BEGIN
WAITFOR DELAY '00:00:05'
(5 row(s) affected)
(0 row(s) affected)
While the first query is running (you have five seconds), fire off the second one and
you'll see that it's able to access the uncommitted data modifications of the first
query. It then waits for the first transaction to finish, then attempts to read the same
data again. Since the modifications were rolled back, the data has vanished, leaving
the second query with a nonrepeatable read.
READ COMMITTED
READ COMMITTED is SQL Server's default TIL, so if you don't specify otherwise, you'll
get READ COMMITTED. READ COMMITTED avoids dirty reads by acquiring shared locks
on the data it reads but permits changes to underlying data during the transaction,
possibly resulting in nonrepeatable reads and/or phantom data. To see how this
works, run the queries shown in Listing 13.4 simultaneously.
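The reading side of Listing 13.4 follows this general pattern (a sketch; the UPDATE shown below is the concurrent modifier):

-- Query 1: read the sales rows under READ COMMITTED, wait, then read them again.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN
SELECT TOP 5 title_id, qty FROM sales
WAITFOR DELAY '00:00:05'
-- The second read can return different qty values: a nonrepeatable read.
SELECT TOP 5 title_id, qty FROM sales
COMMIT TRAN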
-- Query 1
-- Query 2
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
UPDATE sales SET qty=6 WHERE qty=5
As in the previous example, start the first query, then quickly run the second one
simultaneously (you have five seconds).
In this example, the value of the qty column in the first row of the sales table
changes between reads during the first query�a classic nonrepeatable read.
REPEATABLE READ
REPEATABLE READ initiates locks to prevent other users from changing the data a
transaction accesses but doesn't prevent new rows from being inserted, possibly
resulting in phantom rows appearing between reads during the transaction. Listing
13.5 provides an example. (As with the other examples, start the first query, then
run the second one simultaneously�you have five seconds to start the second
query.)
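The reading side of Listing 13.5 looks roughly like this (a sketch; the INSERT below is the concurrent statement, and the result sets that follow suggest the book's query is a TOP 5 ordered by qty):

-- Query 1: read the rows twice under REPEATABLE READ. Rows already read can't
-- change between the reads, but newly inserted (phantom) rows can appear.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
BEGIN TRAN
SELECT TOP 5 title_id, qty FROM sales ORDER BY qty
WAITFOR DELAY '00:00:05'
SELECT TOP 5 title_id, qty FROM sales ORDER BY qty
COMMIT TRAN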
-- Query 1
-- Query 2
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
INSERT sales VALUES
(6380,9999999,GETDATE(),2,'USG-Whenever','PS2091')
Nothing up my sleeve...
title_id qty
-------- ------
PS2091 3
BU1032 5
PS2091 10
MC2222 10
BU1032 10
As you can see, a new row appears between the first and second reads of the sales
table, even though REPEATABLE READ has been specified. Though REPEATABLE
READ prevents changes to data it has already accessed, it doesn't prevent the
addition of new data, thus introducing the possibility of phantom rows.
SERIALIZABLE
SERIALIZABLE prevents dirty reads and phantom rows by placing a range lock on the
data it accesses. It is the most restrictive of SQL Server's four TILs. It's equivalent to
using the HOLDLOCK hint with every table a transaction references. Listing 13.6
gives an example. (Delete the row you added in the previous example before
running this code.)
-- Query 1
-- Query 2
BEGIN TRAN
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
-- This INSERT will be delayed until the
first transaction completes
INSERT sales VALUES
(6380,9999999,GETDATE(),2,'USG-Whenever','PS2091')
ROLLBACK TRAN
Nothing up my sleeve...
title_id qty
-------- ------
PS2091 3
BU1032 5
PS2091 10
MC2222 10
BU1032 10
...or in my hat
title_id qty
-------- ------
PS2091 3
BU1032 5
PS2091 10
MC2222 10
BU1032 10
In this example, the locks initiated by the SERIALIZABLE isolation level prevent the
second query from running until after the first one finishes. While this provides
airtight data consistency, it does so at a cost of greatly reduced concurrency.
BEGIN TRAN[SACTION] [name | @TranNameVar]
COMMIT TRAN[SACTION] [name | @TranNameVar]
ROLLBACK TRAN[SACTION] [name | @TranNameVar]
BEGIN TRAN
au_id
-----------
213-46-8915
409-56-7008
267-41-2394
724-80-9391
213-46-8915
SELECT 'Before BEGIN TRAN',@@TRANCOUNT
BEGIN TRAN
IF @@TRANCOUNT>0 BEGIN
au_id
-----------
213-46-8915
409-56-7008
267-41-2394
724-80-9391
213-46-8915
Server: Msg 6401, Level 16, State 1, Line 10
IF @@TRANCOUNT>0 BEGIN
ROLLBACK TRAN -- Never makes it here because of the earlier ROLLBACK
END
au_id
-----------
213-46-8915
409-56-7008
267-41-2394
724-80-9391
213-46-8915
SELECT 'Before BEGIN TRAN main',@@TRANCOUNT
-- @@TRANCOUNT is unchanged
ROLLBACK TRAN sales
SELECT 'After ROLLBACK TRAN sales',@@TRANCOUNT
-- @@TRANCOUNT is unchanged
SELECT TOP 5 au_id FROM titleauthor
IF @@TRANCOUNT>0 BEGIN
ROLLBACK TRAN
END
au_id
-----------
213-46-8915
409-56-7008
267-41-2394
724-80-9391
213-46-8915
au_id
-----------
213-46-8915
409-56-7008
267-41-2394
724-80-9391
213-46-8915
SELECT 'Before BEGIN TRAN',@@TRANCOUNT
BEGIN TRAN
END
SELECT TOP 5 au_id FROM titleauthor
ROLLBACK TRAN -- This is an error -- there's no transaction to roll back
SELECT 'After ROLLBACK TRAN',@@TRANCOUNT
au_id
-----------
213-46-8915
409-56-7008
267-41-2394
724-80-9391
213-46-8915
au_id
-----------
213-46-8915
409-56-7008
267-41-2394
724-80-9391
213-46-8915
IF @@TRANCOUNT>0 BEGIN
ROLLBACK TRAN
END
INSERT #logrecs
EXEC('DBCC LOG(''pubs'')')
00000035:00000144:0001 LOP_BEGIN_CKPT   LCX_NULL           0000:00000000
00000035:00000145:0001 LOP_END_CKPT     LCX_NULL           0000:00000000
00000035:00000146:0001 LOP_MODIFY_ROW   LCX_SCHEMA_VERSION 0000:00000000
00000035:00000146:0002 LOP_BEGIN_XACT   LCX_NULL           0000:000020e0
00000035:00000146:0003 LOP_MARK_DDL     LCX_NULL           0000:000020e0
00000035:00000146:0004 LOP_COMMIT_XACT  LCX_NULL           0000:000020e0
00000035:00000147:0001 LOP_MODIFY_ROW   LCX_SCHEMA_VERSION 0000:00000000
00000035:00000147:0002 LOP_BEGIN_XACT   LCX_NULL           0000:000020e1
00000035:00000147:0003 LOP_MARK_DDL     LCX_NULL           0000:000020e1
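A sketch of the kind of work table the INSERT...EXEC shown above requires (DBCC LOG is undocumented, so the column list and widths here are assumptions based on the output above):

-- Column layout is an assumption; adjust it if your build returns different columns
CREATE TABLE #logrecs
(CurrentLSN    varchar(30),
 Operation     varchar(30),
 Context       varchar(30),
 TransactionID varchar(30))

INSERT #logrecs
EXEC('DBCC LOG(''pubs'')')

SELECT * FROM #logrecs
DROP TABLE #logrecs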
Don't require user input during a transaction. Doing so could allow a slow user
to tie up server resources indefinitely. It could also cause the transaction log to
fill prematurely since active transactions cannot be cleared out of it.
Try to use optimistic concurrency control when possible. That is, rather than explicitly locking every object your application may change, allow the server to determine when a row has been changed by another user. You may find that this happens so rarely in practice (perhaps the app is naturally partitioned, or rows are rarely updated once entered, and so on) that the risk is worth taking in order to improve concurrency. The sketch following this list shows one common way to do this.
Try to use lower (less restrictive) TILs when possible. READ COMMITTED, the
default, is suitable for most applications and will provide better concurrency
than REPEATABLE READ or SERIALIZABLE.
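A minimal sketch of the optimistic approach mentioned above, using a timestamp column as the change detector (the table and column names are illustrative):

CREATE TABLE #orders (order_id int PRIMARY KEY, qty int, rv timestamp)
INSERT #orders (order_id, qty) VALUES (1, 10)

-- Read the row and remember its row version; no locks are held afterward
DECLARE @qty int, @rv timestamp
SELECT @qty=qty, @rv=rv FROM #orders WHERE order_id=1

-- ...later: the update succeeds only if the row hasn't changed in the meantime
UPDATE #orders
SET qty=@qty+1
WHERE order_id=1 AND rv=@rv

IF @@ROWCOUNT=0
  PRINT 'Row was changed by another user; refetch and retry'

DROP TABLE #orders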
The current TIL governs transaction isolation. You set the current TIL via the SET
TRANSACTION ISOLATION LEVEL command. Each TIL represents a trade-off between
concurrency and consistency.
In this chapter, you became acquainted with SQL Server transactions and explored
the various Transact-SQL commands that relate to transaction management. You
learned about auto-commit and implicit transactions as well as user-defined and
distributed transactions. You also explored some common transaction-related pitfalls
and you learned methods for avoiding them.
Knowledge Measure
1. True or false: The Simple recovery model fully logs all operations except
nonlogged operations such as BULK INSERT.
4. True or false: SQL Server automatically rolls back a transaction that was
initiated from an aborted Transact-SQL batch.
7. What DBCC command reports the oldest active transaction for a database and
the spid that initiated it?
[1] Ingersoll, Robert Green. "The Ghosts." In Best of Robert Ingersoll: Selections from His Writings and Speeches, ed. Roger E.
Greeley. Amherst, NY: Prometheus Books, 1983, p. 34.
Chapter 14. Cursors
In this chapter, we'll update the coverage of cursors that first appeared in my book
The Guru's Guide to Transact-SQL. We'll continue the discussion begun in that book
regarding cursors and how to use them in SQL Server applications, and we'll update
the information to cover the current release of the product.
Overview
A cursor is a mechanism for accessing the rows in a table or result set on a piecemeal basis, one at a time. Cursors run counter to SQL Server's normal set-oriented way of doing things by parceling result sets into individual rows; fetching a row from a cursor is analogous to returning a single row via a SELECT statement. Unlike a
traditional result set, a cursor keeps track of its position automatically and provides
a wealth of facilities for scrolling around in the underlying result set. Cursors also
provide a handy means of updating the underlying result set in a positional fashion
and of returning result set pointers via variables.
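The basic life cycle looks like this (a minimal sketch against the pubs authors table):

DECLARE c CURSOR
FOR SELECT au_lname, au_fname FROM pubs..authors

OPEN c
FETCH c            -- returns a single row, much like a one-row SELECT
FETCH NEXT FROM c  -- equivalent; NEXT is the default fetch direction
CLOSE c
DEALLOCATE c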
The advice I usually give people who are thinking about using cursors is not to. If you
can solve a problem using Transact-SQL's many set-oriented tools, do so. It's rare
(but not impossible) for a cursor-based solution to outperform a set-based approach.
SQL Server's standard result sets (also known as "firehose" cursors) have been used to solve a myriad of computing problems for years; there aren't many conventional database challenges that actually require a cursor, though some are certainly more suited to cursors than to set handling.
On Cursors and ISAM Databases
People porting ISAM or local database applications to SQL Server are often tempted
to perform shallow ports, that is, to make no more changes than absolutely necessary to
get the app working on the new DBMS. This usually involves shortcuts like replacing
ISAM record navigation (e.g., ADO's Recordset.MoveNext) with Transact-SQL cursor
loops. ISAM records and SQL Server cursors aren't synonymous, and any effort to
treat an RDBMS like an ISAM product is likely to go down in flames.
Some time ago, I had the misfortune of assuming the task of porting an ISAM
database application to a full-blown SQL Server app. I was trying to get the company
to move to client/server RDBMS technology and after months of ambivalence they
finally decided that they wanted to convert their flagship application from an ISAM
product to SQL Server as a kind of proof-of-concept. Since, in spite of my best
efforts, the intrinsic benefits of RDBMSs weren't apparent to them, I was inclined to
accept the challenge in order to prove the viability of the technology. This was
despite the fact that I would much rather have started with a new app than with an
existing, vitally important product.
With my guardian angel in silent verbal assault, and without having investigated the
code much, I accepted the task, naively believing that the developers had built the
app in a reasonably relational and logical manner. Having nothing to suggest
otherwise, I assumed that they were processing records in sets where possible in
order to save time and code; even the puny local DBMS on which the app was built
supported a fair amount of set-oriented access (including its own basic SQL dialect).
Of course, I didn't expect the code to be perfect, but I guess I assumed they'd used
their tools more or less as they were intended to be used. In talking with the app's
authors, that's certainly the impression they gave me, and I quickly rushed in where
angels fear to tread.
After two to three weeks of wading through some of the worst application code I'd
ever seen, of watching the application block itself from server resources due to its
dreadful design, and of having one bowling ball after another roll out of the top of
the proverbial closet and hit me in the head, I finally pulled the plug on the SQL
Server conversion.
The app broke virtually every basic tenet of sensible database application design. It
used application code to loop through tables rather than processing rows in sets.
What minimal relational and data integrity it had was implemented in a hodge-
podge of application code and database constraints and was far from airtight. It used
a fatuous table-versioning scheme that had never been finished or fully
implemented and gave no thought to consistent naming conventions or name
casing, so database objects had arcane names that were impossible to remember
and incongruous with one another. The same attribute in multiple tables often had
different names, and different attributes among multiple tables often had the same
name. Tables were denormalized throughout the database, not for performance but
because the developers didn't know any better. There'd been no attempt to provide
for concurrency, and the app was by design (or by the lack of it) strictly a single-user
contrivance. In short, it was a complete disaster from an architectural standpoint,
and the fact that it had ever worked at all, even on the ISAM product, was more a
testament to the developers' tenacity than to the robustness of the app.
So, shortly after this experience, I began rewriting the application. Of course, I could
have taken the "easy" way out and merely performed a shallow port of the app to
SQL Server, essentially turning the server into a glorified ISAM database server. I
could have reused as much of the existing code as possible, regardless of how
poorly designed it was. Every row-by-row access in the app could have been
translated to an equivalent cursor operation on SQL Server. I could have used SQL
Server in ways it was never intended to be used, and I could have refrained from
fixing the many relational and other problems in the app, madly bolting the various
disparate pieces together into a misshapen, software-borne Frankenstein. I could
have done that; it certainly would have been faster in the short run and would have
made management happier�but I just couldn't bring myself to do it. It's been my
experience that there's usually an optimal way to build software, and all my
instincts, training, and knowledge told me that this wasn't it.
Instead, it was apparent to me that the app would have to be redesigned from the
ground up if it was to have a prayer of working properly on SQL Server or any other
RDBMS. The acute need for a rewrite was as much due to the radical differences
between ISAM products and RDBMSs as it was to poor design and coding in the
application to begin with. The fact that software appears to work properly doesn't
mean that it has been constructed properly any more than the fact that a house
appears to be sound means that it won't collapse the first time you try to
build onto it. There is more to application design than whether the app meets
immediate customer requirements. Making customers happy is paramount, but it
should not result in the complete neglect of long-term concerns such as extensibility,
interoperability, performance, scalability, concurrency, and supportability.
These may seem like technology-centric concerns, but customers care about these
things, too, whether they know it or not. They're certainly affected by them
indirectly, if not directly. A feature request that might seem trivial to the typical
user (converting a single-user app to a multiuser app, for example) can be difficult
if not impossible if the app was designed incorrectly to begin with. If the app's
designer gave no thought to concurrency when building it, the app will likely have to
be rewritten in order to accommodate multiple users. This rewrite translates into
delayed releases and users having to wait on the features they need. Application
design affects real people in real ways. Beauty is not in the eye of the beholder; it's
in the eye of the designer.
The really ironic thing about the whole experience was that many of the problem
application's design decisions didn't make any more sense on the ISAM database
platform than they would have on SQL Server. It's just that SQL Server would have
exposed many of these defects to the light of day. It would have forced the app to
clean up its act or go elsewhere. Because of their emphasis on robustness and
performance, RDBMSs tend to be less forgiving of application misbehavior than ISAM
products. I don't lament this; I think it's a good thing. Developers shouldn't build
shoddy applications regardless of the backend.
Porting an ISAM application to SQL Server is not a menial task, even for a properly
designed application. Quickly performing a shallow port by doing things like
replacing ISAM access with SQL Server cursors is almost never the right approach. It
takes a good amount of moral fortitude and a stiff spine to say, "This port is going to
take some work; the app will have to be redesigned or rewritten," but that's often
the best approach. Reinventing the wheel is fine, even necessary, if the wheel
you're "reinventing" was a square one to begin with. Do deep ports when moving
applications to SQL Server; think of it as the foundation on which your applications
should stand, not as just another service they use. Shallow ports are for those who,
as Ron Soukup says, "believe that there's never time to do the port right but there's
always time to do it over."[2]
[2] Soukup, Ron. Inside SQL Server 6.5. Redmond, WA: Microsoft Press, 1998, p. 533.
Types of Cursors
There are four types of cursors supported by Transact-SQL: FORWARD_ONLY, DYNAMIC, STATIC, and KEYSET. The primary differences between these types are in
the ability to detect changes to their underlying data while the cursor is being
traversed and in the resources (locks, tempdb space, and so on) they use.
Depending on the type of cursor you create, changes made to its underlying data
may or may not be shown while traversing the cursor. In addition to new column
values, these changes can affect which rows are returned by the cursor
(membership) as well as the ordering of those rows. Also, opening the cursor may
cause the entirety of its result set (or its keys) to be placed in a temporary table,
possibly causing resource contention problems in tempdb. Table 14.1 summarizes
the different cursor types and their attributes.
Forward-Only Cursors
A forward-only cursor (the default) returns rows sequentially from the database. It
does not require space in tempdb, and changes made to the underlying data are
visible as soon as they're reached. Listing 14.1 shows an example.
OPEN c
FETCH c
UPDATE #temp
SET c1=2
WHERE k1=3
FETCH c
FETCH c
CLOSE c
DEALLOCATE c
GO
DROP TABLE #temp
k1 c1
----------- -----------
1 NULL
k1 c1
----------- -----------
2 NULL
k1 c1
----------- -----------
3 2
k1 c1
----------- -----------
1 NULL
2 NULL
3 2
4 NULL
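A self-contained sketch of the same behavior (the work-table setup is an assumption that matches the output shown):

CREATE TABLE #temp (k1 int identity, c1 int NULL)
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES

DECLARE c CURSOR FORWARD_ONLY
FOR SELECT k1, c1 FROM #temp

OPEN c
FETCH c                              -- row 1

UPDATE #temp SET c1=2 WHERE k1=3     -- change a row the cursor hasn't reached

FETCH c                              -- row 2
FETCH c                              -- row 3: the change is visible here

CLOSE c
DEALLOCATE c
SELECT * FROM #temp
DROP TABLE #temp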
Dynamic Cursors
OPEN c
FETCH c
UPDATE #temp
SET c1=2
WHERE k1=1
FETCH c
FETCH PRIOR FROM c
CLOSE c
DEALLOCATE c
GO
DROP TABLE #temp
k1 c1
----------- -----------
1 NULL
k1 c1
----------- -----------
2 NULL
k1 c1
----------- -----------
1 2
k1 c1
----------- -----------
1 2
2 NULL
3 NULL
4 NULL
Here, we fetch a row, then update it, fetch another, and then refetch the first row.
When we fetch the first row for the second time, we see the change made via the
UPDATE, even though the UPDATE didn't use the cursor to make its change.
Static Cursors
A static cursor returns a read-only result set that's impervious to changes to the
underlying data. It's the opposite of a dynamic cursor, though it's still completely
scrollable. Once a static cursor is opened, changes made to its source data are not
reflected by the cursor. This is because the entirety of its result set is copied to
tempdb when it's first opened. Static cursors are sometimes called snapshot or
insensitive cursors because they aren't sensitive to changes made to their source
data. Listing 14.3 shows an example.
UPDATE #temp
SET c1=2
WHERE k1=1
CLOSE c
DEALLOCATE c
GO
DROP TABLE #temp
k1 c1
----------- -----------
1 NULL
k1 c1
----------- -----------
1 2
2 NULL
3 NULL
4 NULL
Here, we open the cursor and immediately make a change to the first row in its
underlying table. This change isn't reflected when we fetch that row from the cursor
because the row is actually coming from tempdb. A subsequent SELECT from the
underlying table shows the change to be intact even though it's not reflected by the
cursor.
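A self-contained sketch of the static behavior (work-table setup assumed):

CREATE TABLE #temp (k1 int identity, c1 int NULL)
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES

DECLARE c CURSOR STATIC
FOR SELECT k1, c1 FROM #temp

OPEN c                               -- the result set is copied to tempdb here

UPDATE #temp SET c1=2 WHERE k1=1     -- change the underlying table

FETCH c                              -- row 1 still shows the original NULL

CLOSE c
DEALLOCATE c
SELECT * FROM #temp                  -- the change itself is intact
DROP TABLE #temp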
Keyset Cursors
Opening a keyset cursor returns a fully scrollable result set whose membership and
order are fixed. Like forward-only and static cursors, changes to the values in its
underlying data (except for keyset columns) are reflected when they're accessed;
however, new row insertions are not reflected by the cursor. Similarly to a static
cursor, the set of unique key values for the cursor's rows are copied to a table in
tempdb (hence the term "keyset") when the cursor is opened. That's why
membership in the cursor is fixed. If the underlying table doesn't have a primary or
unique key, the entire set of candidate key columns is copied to the keyset table.
Since changes to keyset columns aren't reflected by the cursor, failing to define a
unique key of some type for the underlying data results in a keyset that doesn't
reflect changes to any of its candidate key columns. Listing 14.4 shows a simple
keyset example.
CREATE TABLE #temp (k1 int identity PRIMARY KEY, c1 int NULL)
UPDATE #temp
SET c1=2
WHERE k1=1
CLOSE c
DEALLOCATE c
GO
DROP TABLE #temp
k1 c1
----------- -----------
1 2
k1 c1
----------- -----------
4 NULL
k1 c1
----------- -----------
1 2
2 NULL
3 NULL
4 NULL
5 3
Here, once the keyset cursor is opened, a change is made to its first row before the
row is fetched from the cursor. Another row is then inserted into the underlying
table. Once the routine begins fetching rows from the cursor, the first change we
made shows up, but the new row doesn't. This is because membership in a keyset
cursor doesn't change once it's opened.
Note the inclusion of a PRIMARY KEY constraint in the work table. Without it, changes
to the table's c1 column aren't visible to the cursor, even though the cursor has an
identity column. Why? Because, in and of themselves, identity columns aren't
guaranteed to be unique. You could always use SET IDENTITY_INSERT to add
duplicate identity values, or reset the identity seed to have the server add them for
you. To ensure uniqueness, a PRIMARY KEY or UNIQUE KEY constraint is required.
Without a unique key, the server copies the entirety of the candidate keys for each
row to the keyset cursor's temporary table.
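A self-contained sketch matching the keyset example (work-table setup assumed):

CREATE TABLE #temp (k1 int identity PRIMARY KEY, c1 int NULL)
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES

DECLARE c CURSOR KEYSET
FOR SELECT k1, c1 FROM #temp

OPEN c                               -- the keyset is built in tempdb here

UPDATE #temp SET c1=2 WHERE k1=1     -- non-key change: visible to the cursor
INSERT #temp (c1) VALUES (3)         -- new row: not a member of the cursor

FETCH c                              -- row 1 shows c1=2
FETCH LAST FROM c                    -- row 4, not the newly inserted row

CLOSE c
DEALLOCATE c
SELECT * FROM #temp                  -- the new row is in the table
DROP TABLE #temp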
CREATE TABLE #series
(key1 int,
DECLARE s CURSOR
FOR
SELECT DISTINCT key2 FROM #series
ORDER BY key2
OPEN s
SET @sql=''
EXEC(@sql)
CLOSE s
DEALLOCATE s
key1 1 2 3 4 5 6 7
key1 1 2 3 4 5 6 7
USE pubs
OPEN objects
CLOSE objects
DEALLOCATE objects
Scrollable Forms
FOR select
DECLARE name CURSOR
[LOCAL | GLOBAL]
[FORWARD_ONLY | SCROLL]
[READ_ONLY | SCROLL_LOCKS | OPTIMISTIC]
[TYPE_WARNING]
FOR select
DECLARE c CURSOR
OPEN c
FETCH c
UPDATE #temp
SET c1=2
WHERE CURRENT OF c
CLOSE c
DEALLOCATE c
GO
k1 c1
----------- -----------
1 NULL
k1 c1
----------- -----------
1 2
2 NULL
3 NULL
4 NULL
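A self-contained sketch of a positioned update (setup assumed, as in the earlier examples):

CREATE TABLE #temp (k1 int identity, c1 int NULL)
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES
INSERT #temp DEFAULT VALUES

DECLARE c CURSOR
FOR SELECT k1, c1 FROM #temp

OPEN c
FETCH c                  -- positions the cursor on row 1

UPDATE #temp
SET c1=2
WHERE CURRENT OF c       -- changes only the row the cursor is positioned on

CLOSE c
DEALLOCATE c
SELECT * FROM #temp
DROP TABLE #temp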
CREATE TABLE #temp (k1 int identity, c1 int NULL, c2 int NULL)
DECLARE c CURSOR
FOR UPDATE OF c1
OPEN c
FETCH c
UPDATE #temp
SET c2=2
WHERE CURRENT OF c
k1 c1 c2
----------- ----------- -----------
1 NULL NULL
DEALLOCATE c
GO
CREATE TABLE #temp (k1 int identity, c1 int NULL)
DECLARE c CURSOR
SET @k1=3 -- Need to move this before the DECLARE CURSOR
OPEN c
FETCH c
UPDATE #temp
SET c1=2
WHERE CURRENT OF c
CLOSE c
DEALLOCATE c
GO
k1 c1
----------- -----------
k1 c1
----------- -----------
1 NULL
2 NULL
3 NULL
4 NULL
DECLARE Darryl CURSOR -- My brother Darryl
LOCAL
GLOBAL
OPEN Darryl
FETCH Darryl
CLOSE Darryl
DEALLOCATE Darryl
au_lname au_fname
---------------------------------------- --------------------
White Johnson
stor_id title_id qty
GLOBAL
DECLARE LocalCursor CURSOR STATIC -- Declare a LOCAL cursor
LOCAL
SELECT @@CURSOR_ROWS AS NumberOfGLOBALCursorRows
OPEN LocalCursor
SELECT @@CURSOR_ROWS AS NumberOfLOCALCursorRows
DEALLOCATE LocalCursor
GO
NumberOfGLOBALCursorRows
------------------------
NumberOfLOCALCursorRows
-----------------------
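A sketch of the LOCAL/GLOBAL distinction and of @@CURSOR_ROWS, which reports on the most recently opened cursor for the connection (the cursor names here are illustrative):

DECLARE GlobalCursor CURSOR GLOBAL STATIC
FOR SELECT au_id FROM pubs..authors

DECLARE LocalCursor CURSOR LOCAL STATIC
FOR SELECT au_id FROM pubs..authors

OPEN GlobalCursor
SELECT @@CURSOR_ROWS AS NumberOfGLOBALCursorRows  -- rows in the global cursor

OPEN LocalCursor
SELECT @@CURSOR_ROWS AS NumberOfLOCALCursorRows   -- now reports the local cursor

CLOSE GlobalCursor
DEALLOCATE GlobalCursor
CLOSE LocalCursor
DEALLOCATE LocalCursor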
SET NOCOUNT ON
OPEN c
CLOSE c
DEALLOCATE c
GO
-----------
k1
-----------
k1
-----------
k1
-----------
10
k1
-----------
SET NOCOUNT ON
DECLARE @k int
OPEN c
FETCH c INTO @k
SELECT @k
FETCH c INTO @k
END
CLOSE c
DEALLOCATE c
GO
-----------
-----------
-----------
-----------
-----------
-----------
-----------
7
-----------
-----------
-----------
10
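A complete version of the standard fetch loop, driven by @@FETCH_STATUS (zero means the last FETCH returned a row):

SET NOCOUNT ON
DECLARE @id varchar(11)

DECLARE c CURSOR
FOR SELECT au_id FROM pubs..authors

OPEN c
FETCH c INTO @id
WHILE (@@FETCH_STATUS=0) BEGIN
  SELECT @id
  FETCH c INTO @id
END
CLOSE c
DEALLOCATE c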
USE pubs
SET NOCOUNT ON
DECLARE c CURSOR SCROLL
OPEN c
FETCH c
UPDATE sales
SET qty=4
WHERE qty=3 -- We happen to know that only one row qualifies,
CLOSE c
DEALLOCATE c
Before image
title_id qty
-------- ------
PS2091 3
After image
title_id qty
-------- ------
PS2091 4
CLOSE
-- can be configured
USE northwind
SET @start=getdate()
CLOSE c
SET @start=getdate()
CLOSE c
DEALLOCATE c
GO
OPEN c
SET CURSOR_CLOSE_ON_COMMIT ON
BEGIN TRAN
UPDATE #temp
SET c1=2
WHERE k1=1
COMMIT TRAN
FETCH c
FETCH LAST FROM c
CLOSE c
DEALLOCATE c
GO
USE pubs
SET CURSOR_CLOSE_ON_COMMIT ON
BEGIN TRAN
OPEN c
FETCH c
UPDATE sales SET qty=qty+1 WHERE CURRENT OF c
ROLLBACK TRAN
FETCH c
CLOSE c
DEALLOCATE c
GO
SET CURSOR_CLOSE_ON_COMMIT OFF
qty
------
5
BEGIN TRAN
OPEN c
FETCH c
UPDATE sales
SET qty=qty+1
WHERE CURRENT OF c
ROLLBACK TRAN
CLOSE c
DEALLOCATE c
qty
------
5
qty
------
3
qty
------
30
Despite the fact that a transaction is rolled back while our dynamic cursor is open, the cursor is unaffected. This contradicts the way the server is documented to behave.
SET NOCOUNT ON
DECLARE C CURSOR DYNAMIC
OPEN c
FETCH c
-- A positioned UPDATE
FETCH c
-- A positioned DELETE
CLOSE c
DEALLOCATE c
stor_id ord_num  ord_date                 qty   payterms title_id
------- -------- ------------------------ ----- -------- --------
OPEN c
FETCH @sc
SET NOCOUNT ON
FETCH @mycursor
FETCH @mycursor
END
END
CLOSE @mycursor
DEALLOCATE @mycursor
Each of these returns its result via a cursor output parameter, so you'll need to
supply a local cursor variable in order to process them.
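A sketch of the general pattern: a procedure that hands a cursor back through an output parameter, and a caller that walks it (the procedure here is illustrative, not one of the system procedures):

USE pubs
GO
CREATE PROC ListAuthors @c CURSOR VARYING OUTPUT
AS
SET @c = CURSOR FOR SELECT au_lname FROM authors
OPEN @c
GO
DECLARE @mycursor CURSOR, @name varchar(40)
EXEC ListAuthors @c=@mycursor OUTPUT
FETCH @mycursor INTO @name
WHILE (@@FETCH_STATUS=0) BEGIN
  PRINT @name
  FETCH @mycursor INTO @name
END
CLOSE @mycursor
DEALLOCATE @mycursor
GO
DROP PROC ListAuthors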
Don't use static/insensitive cursors unless you need them. Opening a static
cursor causes all of its rows to be copied to a temporary table. That's why it's
insensitive to changes; it's actually referencing a copy of the table in tempdb.
Naturally, the larger the result set, the more likely declaring a static cursor
over it will cause resource contention issues in tempdb.
Don't use keyset cursors unless you really need them. As with static cursors,
opening a keyset cursor creates a temporary table. Though this table contains
only key values from the underlying table (unless no unique key exists), it can
still be quite substantial when dealing with large result sets.
Define read-only cursors using the READ_ONLY keyword. This prevents you
from making accidental changes and lets the server know that the cursor will
not alter the rows it traverses.
Be careful with modifying large numbers of rows via a cursor loop that's
contained within a transaction. Depending on the transaction isolation level,
those rows may remain locked until the transaction is committed or rolled
back, possibly causing resource contention on the server.
Consider using asynchronous cursors with large result sets in order to return
control to the caller as quickly as possible. Asynchronous cursors are especially
useful when returning a sizeable result set to a scrollable form because they
allow the application to begin displaying rows almost immediately.
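Asynchronous population of keyset and static cursors is governed by the cursor threshold configuration option; a sketch of enabling it (the threshold value here is arbitrary):

-- -1 (the default) populates every cursor synchronously; 0 populates all of them
-- asynchronously; any other value populates asynchronously only when the
-- estimated row count exceeds the threshold.
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'cursor threshold', 1000
RECONFIGURE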
OPEN c
FETCH c
CLOSE c
DEALLOCATE c
GO
DROP TABLE #temp
Recap
Cursors are not the recommended way to solve most data access or update problems, and they can cause serious performance headaches when used improperly. When you're contemplating how to solve a problem with Transact-SQL, your first thought should be to align your code with the way SQL Server was designed to work, that is, to access data in sets if at all possible. Resort to using cursors only after you've explored as many set-based alternatives as possible.
Knowledge Measure
1. True or false: When porting an ISAM application to SQL Server, you should
attempt to change as little about the app as possible, particularly with respect
to the way that data is accessed.
4. True or false: To return a cursor from a stored procedure, you must pass the cursor via the procedure's return statement.
7. What mechanism does SQL Server use to store the data returned by a static
cursor?
8. What automatic variable is typically used to control a loop that iterates through
a cursor using FETCH?
10. What function can you use to check the status of a cursor?
Chapter 15. ODSOLE
The surest way to corrupt a youth is to instruct him to hold in higher esteem
those who think alike than those who think differently.
-Friedrich Nietzsche[1]
[1] Nietzsche, Friedrich. "The Dawn." In The Portable Nietzsche, ed. Walter Kaufmann. New York: The Viking Press, 1954, p. 91.
In this chapter, we'll talk about automating (i.e., controlling) COM components using
SQL Server's Open Data Services Object Linking and Embedding (ODSOLE) facility.
ODSOLE is implemented via Transact-SQL's sp_OA extended procedures (e.g.,
sp_OACreate, sp_OAMethod, and so on). We'll talk about how Automation works in
general, then we'll explore several examples of it using the sp_OA procs.
You can also create your own COM objects and access them from T-SQL using the
ODSOLE facility. You can wrap functionality not available from T-SQL in a COM
component and call it from within your T-SQL batches and stored procedures.
Before we get into working with COM objects from Transact-SQL via ODSOLE, let's
discuss a few basics regarding COM threading models and concurrency.
Understanding how threading works with respect to ODSOLE and how object
concurrency is managed will help us better understand how to use ODSOLE
effectively and safely.
Each object and each thread can belong to only one apartment. Only the threads
within an apartment can access the objects in that apartment directly; all other
threads go through COM proxies of some sort. A thread establishes residence in an
apartment (and optionally specifies the threading model it wants to use) through a
call to a COM initialization function such as CoInitialize, CoInitializeEx, or
OleInitialize.
In the STA model, an apartment has a single thread and can contain multiple
objects. In the MTA model, an apartment can have multiple threads and multiple
objects. A process can have multiple STAs but only one MTA. This one MTA can
coexist with multiple STAs in the same process.
ODSOLE is implemented using the STA model. If you attach to SQL Server with a
debugger and set a breakpoint on OleInitialize (in OLE32.DLL) before making your
first sp_OA call, you'll see that the ODSOLE code calls OleInitialize. OleInitialize is
hard-coded to use the STA model; hence, we can deduce that ODSOLE uses the STA
model. (Once a thread has been initialized for a particular COM threading model it
cannot be changed to a different one without first uninitializing COM.)
The threading model for an in-process COM server (a DLL) is not specified via a call
to CoInitializeEx. Instead, it's specified via a registry key, like this:
HKEY_LOCAL_MACHINE\SOFTWARE\Classes\CLSID\<class ID>\InprocServer32\ThreadingModel
The value of this key can be Apartment, Free, or Both. If the key is not found, STA is
assumed.
As I've said, only the threads residing in the apartment in which a COM object was
created can directly access the object. Other threads access the object through
proxy objects. Given that only one thread resides in a given STA apartment, COM
takes responsibility for synchronizing object access by other threads. It accomplishes
this via Windows' messaging facilities. It creates a hidden window for each
apartment and sets up proxies to post messages to the apartment owning an object
in order to invoke it. The serialization of access to the object is managed through
Windows' normal windows message queuing facilities. Methods on the object are
called in response to messages posted to the apartment's hidden window. As
messages are pulled out of the message queue (via PeekMessage and GetMessage)
and dispatched (via DispatchMessage), the window procedure for the thread, which
is implemented by COM, invokes the appropriate methods on the object. The process
is reversed when the method call completes and results need to be returned to the
calling thread. Messages are posted to the hidden window for the calling thread's
apartment to provide the result(s) and indicate function completion. These
messages are picked up by the calling thread, and it, in turn, returns through the
proxy object method call, thus completing the method call on the object in the other
apartment.
When a component is configured for the STA model, the objects it exposes are
created on the SQL Server worker thread that called sp_OACreate. Other worker
threads can't access these object instances. Given that sp_OA frees all created
objects when a batch exits anyway (thus ensuring that all calls to a given object
occur on the same physical thread), the use of STA by ODSOLE works out well.
When a component is configured for the MTA model, COM automatically starts a host
MTA and instantiates the objects in it. When a component's threading model has
been configured as both STA and MTA compatible, the object is created in the calling
STA.
The first thread to initialize COM using the STA threading model becomes the main
STA. This STA is required to remain alive until all COM work is completed because
some in-process servers are always created in the context of the main STA.
Once created, the main STA is all but ignored by ODSOLE. Each worker thread that
services sp_OA calls makes its own call to OleInitialize and increments the reference
count on the main STA message thread. The main STA remains in existence until SQL
Server is shut down or sp_OAStop is called.
An application can make use of COM objects through two basic means: through early
binding or through late binding. When an application makes object references that
are resolvable at compile-time, the object is considered early bound. To early bind an
object in Visual Basic, you add a reference to the library containing the object during
development, then Dim specific instances of it. To early bind an object in tools like
Visual C++ and Delphi, you import the object's type library and work with the
interfaces it provides. In either case, you code directly to the interfaces exposed by
the object as though they were interfaces you created yourself. The object itself may
live on a completely separate machine and be accessed via Distributed COM (DCOM)
or be marshaled by a transaction manager such as Microsoft Transaction Server or
Component Services. Generally speaking, you don't care; you just code to the
interface.
When references to an object aren't known until runtime, the object is late bound.
You normally instantiate it via a call to CreateObject and store the object reference in
a variant. Since the compiler didn't know what object you were referencing at
compile-time, you may encounter bad method calls or nonexistent properties at
runtime. That's the trade-off with late binding. It's more flexible in that you can
decide at runtime what objects to create and can even instantiate objects that didn't
exist on the development system, but it's more error prone; it's easy to make
mistakes when you late bind objects because your development environment can't
provide the same level of assistance it can when it knows the objects you're dealing
with. Accessing COM objects via late binding is also slower than doing so via early
binding, sometimes dramatically so. That said, late binding is all we have available
to us from ODSOLE, so that's what we'll focus on in this chapter.
The sp_OA Procedures
Transact-SQL's Automation stored procedures are named using the convention
sp_OAFunction, where Function indicates what the procedure does (e.g.,
sp_OACreate creates COM objects, sp_OAMethod calls a method, sp_OAGetProperty
and sp_OASetProperty get and set object properties, respectively, and so on). Each
of the sp_OA procs except sp_OACreate expects an integer parameter containing a
pointer to the previously created object. The sp_OACreate procedure, of course,
creates the object, and so it expects an integer variable to be passed in as an output
parameter to receive the reference to the object it creates. This integer actually
references an internal wrapper object created by ODSOLE to encapsulate the COM
object. This internal object contains a reference to the COM object as well as other
housekeeping information.
Some sp_OA procs may support returning an output parameter from a COM method
or property retrieval (e.g., sp_OAMethod or sp_OAGetProperty). If this parameter is
not supplied, a single-column, single-row result set is returned. If the call returns an
array, the output parameter is set to NULL if it is supplied, and the array is
translated into a result set. If the array is a single-dimensional array, a single row
with the array elements as columns is returned. If the array is a two-dimensional
array, it will be returned as a multirow result set. If the array has more than two
dimensions, an error is returned.
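A quick sketch of the two ways a property value can come back, using the VBScript.RegExp object that we'll meet later in the chapter:

DECLARE @obj int, @hr int, @pattern varchar(255)
EXEC @hr = sp_OACreate 'VBScript.RegExp', @obj OUT
EXEC @hr = sp_OASetProperty @obj, 'Pattern', '[0-9]+'

-- With an output parameter, the value is returned in the variable
EXEC @hr = sp_OAGetProperty @obj, 'Pattern', @pattern OUT
SELECT @pattern AS Pattern

-- Without one, ODSOLE returns a single-column, single-row result set instead
EXEC @hr = sp_OAGetProperty @obj, 'Pattern'

EXEC sp_OADestroy @obj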
sp_OACreate
As I've mentioned, you use sp_OACreate to instantiate a COM object. The call to
sp_OACreate returns a pointer to an internal ODSOLE object that encapsulates the
reference to the underlying COM object. When sp_OACreate is called, the following
events occur.
1. The main STA is created if it does not already exist. You can see this from
WinDbg by trapping calls to OleInitialize. If it already exists, the main STA
thread's reference count is incremented.
2. TLS storage for the current worker thread is initialized. This storage is used for,
among other things, tracking the objects created during the batch so they can
be automatically released when the batch terminates.
4. Two ODS event handlers, one for language events and one for RPC events, are
set up to run when the batch completes. These handlers take care of
automatically releasing created objects, calling CoUninitialize for the current
worker thread, decrementing the reference count on the main STA, and
performing other housekeeping work when the batch terminates.
5. The supplied object name is passed to the OLE API function CLSIDFromProgID
in order to translate it from a ProgID to a COM class ID that can be instantiated
via CoCreateInstance. A ProgID, or programmatic identifier, is a string that
identifies a COM object so that applications can access it by name. A ProgID
can't be instantiated directly by COM, so in order to instantiate an object using
its ProgID, we must translate its ProgID into its COM class ID using
CLSIDFromProgID. If CLSIDFromProgID fails, ODSOLE assumes the string passed
in is already a class ID and passes it to CLSIDFromString. If CLSIDFromString
fails, an error is returned and the call to sp_OACreate fails.
6. The class ID derived from the supplied object name is passed into
CoCreateInstance in order to create an instance of the COM object.
7. The QueryInterface COM method is called on the newly created object in order
to return a reference to its implementation of the IDispatch interface. As I've
mentioned, IDispatch is how late-binding clients interact with COM
components. You could say that they early bind to the IDispatch interface.
8. At this stage, the object is ready for use by the other sp_OA procs, so the
object reference is wrapped in an internal object, and a pointer to this internal
object is returned in the sp_OACreate output parameter.
You can pass a context parameter into sp_OACreate. This parameter becomes the
dwClsContext parameter to CoCreateInstance and determines the context in which
an object is created. An object can be instantiated as an in-process server (in which
case it runs in the same process as the caller, SQL Server, hence the term) or as an
out-of-process server (in which case it runs in its own process), or the context
parameter can be specified so as to support either context, with the actual context
used varying based on whether the COM component resides in a DLL or EXE file.
Creating a COM object out-of-process helps ensure that it cannot corrupt the SQL
Server process or cause other types of stability problems.
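For example, forcing out-of-process creation looks like this (the context values map to CoCreateInstance's CLSCTX flags: 1 = in-process server, 4 = local out-of-process server, 5 = either, the default):

DECLARE @object int, @hr int
-- Word is an EXE server, so force the out-of-process context explicitly
EXEC @hr = sp_OACreate 'Word.Application', @object OUT, 4
IF (@hr = 0)
  EXEC sp_OADestroy @object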
sp_OAMethod
When an RPC or language event calls sp_OAMethod, the following events occur.
1. The COM API function GetIDsOfNames is called to get the dispatch ID of the
method being called. If this fails, sp_OAMethod fails.
3. The IDispatch::Invoke method is called to invoke the method. If this fails, its
HRESULT is returned as the sp_OAMethod result.
sp_OASetProperty/sp_OAGetProperty
These procedures are very similar to sp_OAMethod. Setting/getting a property as
opposed to calling a method is essentially the same operation when using late
binding, so ODSOLE treats them very much the same. In fact, it has been my
experience that you can usually use sp_OASetProperty or sp_OAGetProperty
interchangeably with sp_OAMethod. GetIDsOfNames is called, as is
IDispatch::Invoke. A bitmap parameter to IDispatch::Invoke indicates whether a
property is being accessed or a method is being called; ODSOLE passes a mask that
includes both switches because they are virtually indistinguishable when accessing
an object via late binding.
sp_OAGetErrorInfo
This procedure returns the error information for the supplied object pointer or for the
current worker thread. Typically, you'll check for a nonzero return from one of the
other sp_OA calls, then call sp_OAGetErrorInfo to retrieve additional error information
as appropriate.
sp_OADestroy
The procedure releases the reference to an object created via sp_OACreate. Once an
object's reference has been released, it cannot be used in any further ODSOLE calls.
sp_OAStop
This shuts down OLE processing and stops the main STA thread. No additional sp_OA
calls can be made until sp_OACreate is called again to create an object and restart
the main STA thread.
The use of dot notation in the code above alleviates the need to first retrieve a
pointer to the Databases collection, then make a separate call to get a specific item
from it. The notation can be as deep as you need it to be; ODSOLE will traverse the
method or property name and navigate to the leaf term as appropriate.
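A sketch of dot notation against SQL-DMO (assumes SQL-DMO is installed and that a trusted connection will succeed):

DECLARE @srv int, @hr int, @cnt int, @server sysname
SET @server = @@SERVERNAME

EXEC @hr = sp_OACreate 'SQLDMO.SQLServer', @srv OUT
EXEC @hr = sp_OASetProperty @srv, 'LoginSecure', 1
EXEC @hr = sp_OAMethod @srv, 'Connect', NULL, @server

-- One call traverses Databases("pubs").Tables and reads its Count property
EXEC @hr = sp_OAGetProperty @srv, 'Databases("pubs").Tables.Count', @cnt OUT
SELECT @cnt AS TableCount

EXEC @hr = sp_OAMethod @srv, 'DisConnect'
EXEC sp_OADestroy @srv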
Named Parameters
As with Visual Basic and other Automation controllers, ODSOLE supports the notion
of named parameters. These can be specified in method and property names (via
dot notation) and can also be specified on the command line to an sp_OA extended
proc. For the sp_OA procs, named parameters must come after the third parameter
to the proc (named parameters before the fourth parameter are ignored) and will
have their leading @ prefix stripped in the process. ODSOLE named parameters
must follow the normal rules for Automation named parameters: Unnamed
parameters must be specified before named parameters, and named parameters
can be specified in any order.
Automating with ODSOLE
In the next few sections, we'll look at several examples that show how to automate
COM objects using the sp_OA procs and ODSOLE. We'll walk through some basic
examples that show how to access COM functionality that is probably already on
your machine, then we'll explore automating SQL Server's Distributed Management
Objects (SQL-DMO) from ODSOLE. We'll finish up with some esoteric coverage of
implementing arrays in T-SQL using COM objects, using COM Interop and ODSOLE to
access objects in the .NET Framework, and a few other odds and ends.
sp_checkspelling
Listing 15.1 illustrates a simple procedure that uses the sp_OA procedures to
automate a COM object. The procedure instantiates the Microsoft Word Application
object and calls its CheckSpelling method to check the spelling of a word you pass to
the procedure.
USE master
GO
IF (OBJECT_ID('sp_checkspelling') IS NOT NULL)
DROP PROC sp_checkspelling
GO
CREATE PROC sp_checkspelling
@word varchar(30), -- Word to check
@correct bit OUT -- Returns whether word is correctly spelled
/*
Object: sp_checkspelling
Description: Checks the spelling of a word using the Microsoft Word
Application Automation object
Usage: sp_checkspelling
@word varchar(128), -- Word to check
@correct bit OUT -- Returns whether word is correctly spelled
Returns: (None)
$Author: Ken Henderson $. Email: [email protected]
Example: EXEC sp_checkspelling 'asdf', @correct OUT
Created: 2000-10-14. $Modtime: 2001-01-13 $.
*/
AS
IF (@word='/?') GOTO Help
DECLARE @object int, -- Work variable for instantiating
-- COM objects
@hr int -- Contains HRESULT returned by COM
-- Destroy it
EXEC @hr = sp_OADestroy @object
IF (@hr <> 0) BEGIN
EXEC sp_displayoaerrorinfo @object, @hr
RETURN @hr
END
RETURN 0
Help:
(Results)
----
0
There are three key elements of this procedure: the creation of the COM object, the
method call, and the disposal of the object. Let's begin with the call to sp_OACreate.
Calling sp_OACreate instantiates a COM object. Word.Application is a ProgID
associated with Microsoft Word. How do we know to specify Word.Application here?
Several ways: first, we could check the Word object interface as documented in
MSDN. Second, we could fire up Visual Basic and add a Reference to the Microsoft
Word Object Library to a project, then allow Visual Studio's Intellisense technology to
show us the objects and methods available from Word. (You can do the same thing
via Visual C++'s #import directive or Delphi's Project | Import Type Library option.)
Third, we could simply check the system registry and scan for all the interfaces
involving Microsoft Word. The registry, for example, tells us that Word.Application is
Word's VersionIndependentProgID string. This means that instantiating
Word.Application should work regardless of the version of Word that's installed.
We store the object handle that's returned by sp_OACreate in @object. This handle is
then passed into sp_OAMethod when we call methods on the Word.Application
interface. In this case, we call just one method, CheckSpelling, and pass @word as
the word to check spelling for and @correct to receive the 1 or 0 returned by the
method.
When we're finished with the object, we destroy it through a call to sp_OADestroy.
Again, we pass in the @object handle we received earlier from sp_OACreate.
This is what it's like to work with COM objects in Transact-SQL. As with many
languages and technologies, you create the object, do some things with it, then
clean up after yourself when you're done.
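Stripped of its error handling and help logic, the heart of the procedure amounts to the following (requires Microsoft Word on the server):

DECLARE @object int, @hr int, @correct bit
EXEC @hr = sp_OACreate 'Word.Application', @object OUT                 -- create the object
EXEC @hr = sp_OAMethod @object, 'CheckSpelling', @correct OUT, 'asdf'  -- call a method
SELECT @correct AS WordIsSpelledCorrectly
EXEC @hr = sp_OADestroy @object                                        -- clean up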
sp_vbscript_reg_ex
This next example we'll look at adds regular expression support to Transact-SQL.
Regular expressions allow for wildcard and other types of string match tests. They
are a common feature in programmers' editors (e.g., the Sequin SQL programming
editor included on the CD accompanying this book supports regular expressions) and
are exposed in a variety of APIs and languages. One facility that provides a nice
regular expression evaluator is Microsoft's ActiveX scripting engine. It provides an
object named RegExp that encapsulates basic regular expression pattern matching
functionality. Listing 15.2 uses this object to perform a regular expression match
from a stored procedure via Automation and ODSOLE.
USE master
GO
IF OBJECT_ID('dbo.sp_vbscript_reg_ex','P') IS NOT NULL
DROP PROC dbo.sp_vbscript_reg_ex
GO
CREATE PROC dbo.sp_vbscript_reg_ex @pattern varchar(255),
@matchstring varchar(8000)
AS
declare @obj int
declare @res int
declare @match bit
set @match=0
exec @res=sp_OACreate 'VBScript.RegExp',@obj OUT
IF (@res <> 0) BEGIN
PRINT 'VBScript.RegExp Create failed'
EXEC sp_DisplayOAErrorInfo @obj, @res
RETURN
END
exec @res=sp_OASetProperty @obj, 'Pattern', @pattern
IF (@res <> 0) BEGIN
PRINT 'Set Pattern failed'
EXEC sp_DisplayOAErrorInfo @obj, @res
RETURN
END
exec @res=sp_OASetProperty @obj, 'IgnoreCase', 1
IF (@res <> 0) BEGIN
PRINT 'Set IgnoreCase failed'
EXEC sp_DisplayOAErrorInfo @obj, @res
RETURN
END
exec @res=sp_OAMethod @obj, 'Test',@match OUT, @matchstring
IF (@res <> 0) BEGIN
PRINT 'Test call failed'
EXEC sp_DisplayOAErrorInfo @obj, @res
RETURN
END
exec @res=sp_OADestroy @obj
return @match
As you can see, there isn't much code to this routine. The procedure takes the
following basic approach.
1. Create an instance of the VBScript.RegExp object via sp_OACreate.
2. Set the Pattern property of the RegExp object. This establishes the regular
expression we intend to use.
3. Set the IgnoreCase property to true on the RegExp object. This provides for
case-insensitive searches. Comment this out or control it via a parameter to
the stored procedure if you want to perform case-sensitive matches.
4. Call the Test method on the RegExp object. Test checks a supplied string
against the previously specified pattern to see whether they match and returns
a Boolean indicating the result.
5. Destroy the object. Given that, as I've mentioned, ODSOLE frees allocated
objects automatically, this isn't technically necessary, but it's still a good
practice.
(Results)
-----------
1
-----------
0
-----------
1
-----------
1
-----------
1
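The procedure's return value carries the match result, so calls look like this (the patterns here are just examples):

DECLARE @match int
EXEC @match = sp_vbscript_reg_ex '^\d{5}(-\d{4})?$', '37919-6205'  -- US ZIP+4
SELECT @match          -- 1: it matches
EXEC @match = sp_vbscript_reg_ex '^\d{5}(-\d{4})?$', 'K1A 0B1'
SELECT @match          -- 0: it doesn't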
I've mainly focused on pattern matches that would be difficult if not impossible to do
using standard T-SQL LIKE and PATINDEX wildcards. RegExp supports many other
regular expression search terms. See the VBScript documentation or MSDN for
additional details.
Another facility that provides a full-featured regular expression search engine is the
.NET Framework. Given that the Framework also supports wrapping managed
classes such that they can be accessed via COM, you can create managed objects
that are accessible from T-SQL via ODSOLE.
Note that calling managed code from within SQL Server 2000 and earlier is
unsupported by Microsoft. The ODSOLE facility and the .NET Framework have not
been tested for interoperability with one another, so you may encounter problems
that you won't be able to take to Microsoft Product Support Services.
That said, the vast functionality provided by the .NET Framework is hard to pass up,
especially when it's so easy to get at using ODSOLE. In the sample code that follows,
we'll create a managed class in C# that encapsulates the .NET Framework regular
expression facility, then we'll publish and register this for use with COM via the .NET
Framework's COM Interop technology so that we can access it from T-SQL via the
sp_OA procs.
Let's begin with Listing 15.4, the source code to our managed class. (You can find
the complete source to this example in the SQLRegExLib subfolder in the CH15
folder on the CD accompanying this book.)
using System;
using System.Text.RegularExpressions;
namespace SQLRegExLib
{
public interface IRegEx
{
bool IsMatch(string Expression, string MatchString);
}
/// <summary>
/// Summary description for Class1.
/// </summary>
public class SQLRegEx : IRegEx
{
public SQLRegEx()
{
}
public bool IsMatch(string Expression, string MatchString)
{
Regex regex = new Regex(Expression,RegexOptions.Compiled |
RegexOptions.IgnoreCase);
if (null!=regex) return regex.IsMatch(MatchString);
else throw new Exception("Unable to create Regex object");
}
}
}
This class exposes a single method, IsMatch, that takes two parameters: the pattern
string and the string to search for a match. Inside the method, IsMatch creates an
instance of the .NET Framework's Regex class, then calls its IsMatch method to
determine whether the pattern and string match.
The key requirement that the class must meet in order to be accessible from COM is
that it must be a public class. It must also implement a default (parameterless
constructor). Additionally, though not required, this particular module also defines a
public interface which the class then implements. Coding an explicit interface and
implementing it in your public classes makes accessing those classes easier from
COM.
To make the class available to COM and to ODSOLE, the basic steps are these:
1. Create a new Windows Class Library project and add this class to it.
2. Build the project to produce the class library DLL.
3. Copy the DLL to the binn folder under the SQL Server startup folder. Since we
are not going to sign the assembly (the DLL) with a strong name and install it
into the Global Assembly Cache, it must reside in the startup folder of its caller.
Since we will be calling it via ODSOLE from SQL Server, the startup folder is the
folder in which sqlservr.exe resides.
4. Register the assembly and its type library via the regasm.exe command line
tool that comes with the .NET Framework. Registering a component and
registering its type library are two distinct operations that you must complete
separately. (See the regasm.exe help for details.) This will make the necessary
entries in the system registry so that the class can be accessed from COM via
its ProgID or class ID.
5. Call the sp_OA procedures to instantiate and manipulate the object, just like
any other COM object.
When you register a managed class via regasm.exe, the main .NET Framework DLL,
mscoree.DLL, is set up as the server for the COM object. This differs from
unmanaged code in which the DLL that hosts the COM object functions as the server.
With managed classes, the main .NET Framework DLL itself functions as the COM
server, and the managed code assembly is referenced via the Assembly key under
the object's entry in the system registry.
Listing 15.5 presents a stored procedure that wraps the calls to our managed code
component once it's registered for use by COM.
This procedure works similarly to the procedure we created for the RegExp object we
explored earlier and supports the same types of regular expressions.
Note that the virtual memory footprint of this version of the regular expression code
is likely to be much larger than the RegExp-based code we built earlier. That's
because you're not just loading the code for the SQLRegEx component into the SQL
Server process space; you're loading the .NET Framework, which the component
requires, into that process space as well. As with loading any other type of large DLL,
this could cause virtual memory address contention and fragmentation issues, as we
discussed in Chapter 4.
Once again, this technique has not been tested by Microsoft and is not supported. I
provide it here for instructional value only.
While the ability to call a COM object via a stored procedure is certainly handy, I'm
sure some of you are wondering whether you could wrap COM object functionality in
a user-defined function for use in T-SQL queries. Wouldn't it be nice to be able to use
a regular expression in a WHERE clause to filter a SELECT statement? Of course it
would. Here's a function that demonstrates how to do that (Listing 15.6).
This code does several interesting things. First, note the use of the
system_function_schema pseudo-user to create a system function. A system
function is a function that's available from any database context without requiring a
fully qualified name. As I documented in my book The Guru's Guide to SQL Server
Stored Procedures, XML, and HTML, two steps are required in order to make a
function a system function: It must be created in the master database with an owner
of system_function_schema while allow updates is enabled, and its name must begin
with fn_. I'm creating our regular expression function as a system function because
it's naturally something that would be useful system-wide. It deserves to be a
system function by virtue of its usefulness alone.
Second, note the fact that we call the sp_OA procs directly from our function. If
you've done much UDF coding, you're probably aware of the fact that you can't call
regular stored procedures from a UDF. Fortunately for us, although the sp_OA procs
are prefixed with sp_, they're actually extended procedures, which you can call from
a UDF. Equally fortunate is the fact that they aren't "spec procs" (extended procedures implemented internally by the server); their entry points are in
ODSOLE70.DLL, so they're callable from a UDF just like any other regular xproc.
The code in this function closely mirrors that of the stored proc we created earlier to
access the VBScript RegExp object. We create the object, set some properties, then
call the Test method to see whether we have a match.
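A sketch of such a function, built along the same lines as the procedure above (the exact body of Listing 15.6 may differ; allow updates must be on while the function is created):

USE master
GO
EXEC sp_configure 'allow updates', 1
RECONFIGURE WITH OVERRIDE
GO
CREATE FUNCTION system_function_schema.fn_regex
  (@pattern varchar(255), @matchstring varchar(8000))
RETURNS int
AS
BEGIN
  DECLARE @obj int, @res int, @match bit
  SET @match = 0
  EXEC @res = sp_OACreate 'VBScript.RegExp', @obj OUT
  IF (@res = 0) EXEC @res = sp_OASetProperty @obj, 'Pattern', @pattern
  IF (@res = 0) EXEC @res = sp_OASetProperty @obj, 'IgnoreCase', 1
  IF (@res = 0) EXEC @res = sp_OAMethod @obj, 'Test', @match OUT, @matchstring
  IF (@obj IS NOT NULL) EXEC sp_OADestroy @obj
  RETURN @match
END
GO
EXEC sp_configure 'allow updates', 0
RECONFIGURE WITH OVERRIDE
GO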
As Listing 15.7 illustrates, once we've wrapped our regular expression functionality
in a UDF, we can use it to filter a query.
use pubs
go
SELECT *
FROM authors
WHERE fn_regex('G.*',au_lname)<>0
(Results abridged)
I'm sure you can think of other COM objects you might like to wrap in a UDF for use
in queries. As you can see, it's not difficult to set up a system function to make the
functionality in a COM object available across SQL Server.
USE master
GO
IF (OBJECT_ID('sp_exporttable') IS NOT NULL)
  DROP PROC sp_exporttable
GO
Object: sp_exporttable
Description: Exports a table in a manner similar to BULK INSERT
*/
AS
END
IF (RIGHT(@outputpath,1)<>'\') SET @outputpath=@outputpath+'\' -- Append a "\" if necessary
SET @logname=@outputpath+@table+'.LOG' -- Construct log file name
SET @errname=@outputpath+@table+'.ERR' -- Construct error file name
IF (@outputname IS NULL)
  SET @outputname=@outputpath+@table+'.BCP' -- Construct output name
ELSE
  IF (CHARINDEX('\',@outputname)=0) SET @outputname=@outputpath+@outputname
EXEC @hr=sp_OACreate 'SQLDMO.SQLServer', @srvobject OUTPUT
IF (@hr <> 0) GOTO ServerError
EXEC @hr=sp_OACreate 'SQLDMO.BulkCopy', @bcobject OUTPUT
-- Set BulkCopy's LogFilePath property to the log file name
EXEC @hr = sp_OASetProperty @bcobject, 'LogFilePath', @logname
IF (@hr <> 0) GOTO BCPError
IF (@trustedconnection=1) BEGIN
-- Get a pointer from the Database object's Tables collection for the table
IF (OBJECTPROPERTY(OBJECT_ID(@table),'IsTable')=1) BEGIN
END
-- Call the object's ExportData method to export the table/view using BulkCopy
Error:
GOTO ErrorCleanUp
BCPError:
GOTO ErrorCleanUp
ServerError:
GOTO ErrorCleanUp
ErrorCleanUp:
RETURN -2
Help:
EXEC sp_usage @objectname='sp_exporttable',
  @desc='Exports a table in a manner similar to BULK INSERT', @parameters='
  @author='Ken Henderson', @email='[email protected]',
  @datecreated='19990614', @datelastchanged='20001201',
  @example='EXEC sp_exporttable ''authors'', ''C:\TEMP\''',
  @returns='Number of rows exported'
RETURN -1
GO
EXEC @rc=pubs..sp_exporttable @table='pubs..authors', @outputpath='c:\temp\'
SELECT RowsExported=@rc
RowsExported
------------
23
USE pubs
GO
EXEC @rc=sp_exporttable @table='pubs..authors', @outputpath='c:\temp\'
GO
SELECT RowsExported=@rc
USE master
GO
IF OBJECT_ID('sp_generate_script','P') IS NOT NULL
[, @password='password'][, @trustedconnection=1]
Returns: (None)
$Author: Ken Henderson $. Email: [email protected] $Revision: 8.0 $
Example: sp_generate_script @objectname='authors', @outputname='authors.sql'
*/
GO
-- SQLDMO_SCRIPT_TYPE vars
DECLARE @SQLDMOScript_ObjectPermissions int
DECLARE @SQLDMOScript_PrimaryObject int
DECLARE @SQLDMOScript_ClusteredIndexes int
DECLARE @SQLDMOScript_Triggers int
DECLARE @SQLDMOScript_DatabasePermissions int
DECLARE @SQLDMOScript_Permissions int
DECLARE @SQLDMOScript_ToFileOnly int
DECLARE @SQLDMOScript_Bindings int
DECLARE @SQLDMOScript_AppendToFile int
DECLARE @SQLDMOScript_NoDRI int
DECLARE @SQLDMOScript_UDDTsToBaseType int
DECLARE @SQLDMOScript_IncludeIfNotExists int
DECLARE @SQLDMOScript_NonClusteredIndexes int
DECLARE @SQLDMOScript_Indexes int
DECLARE @SQLDMOScript_NoCommandTerm int
DECLARE @SQLDMOScript_DRIIndexes int
DECLARE @SQLDMOScript_IncludeHeaders int
DECLARE @SQLDMOScript_OwnerQualify int
DECLARE @SQLDMOScript_TimestampToBinary int
DECLARE @SQLDMOScript_SortedData int
DECLARE @SQLDMOScript_SortedDataReorg int
DECLARE @SQLDMOScript_TransferDefault int
DECLARE @SQLDMOScript_DRI_NonClustered int
DECLARE @SQLDMOScript_DRI_Clustered int
DECLARE @SQLDMOScript_DRI_Checks int
DECLARE @SQLDMOScript_DRI_Defaults int
DECLARE @SQLDMOScript_DRI_UniqueKeys int
DECLARE @SQLDMOScript_DRI_ForeignKeys int
DECLARE @SQLDMOScript_DRI_PrimaryKey int
DECLARE @SQLDMOScript_DRI_AllKeys int
DECLARE @SQLDMOScript_DRI_AllConstraints int
DECLARE @SQLDMOScript_DRI_All int
DECLARE @SQLDMOScript_DRIWithNoCheck int
DECLARE @SQLDMOScript_NoIdentity int
DECLARE @SQLDMOScript_UseQuotedIdentifiers int
-- SQLDMO_SCRIPT2_TYPE vars
DECLARE @SQLDMOScript2_Default int
DECLARE @SQLDMOScript2_AnsiPadding int
DECLARE @SQLDMOScript2_AnsiFile int
DECLARE @SQLDMOScript2_UnicodeFile int
DECLARE @SQLDMOScript2_NonStop int
DECLARE @SQLDMOScript2_MarkTriggers int
DECLARE @SQLDMOScript2_OnlyUserTriggers int
DECLARE @SQLDMOScript2_EncryptPWD int
DECLARE @SQLDMOScript2_SeparateXPs int
-- SQLDMO_SCRIPT_TYPE values
SET @SQLDMOScript_Default = 4
SET @SQLDMOScript_Drops = 1
SET @SQLDMOScript_ObjectPermissions = 2
SET @SQLDMOScript_PrimaryObject = 4
SET @SQLDMOScript_ClusteredIndexes = 8
SET @SQLDMOScript_Triggers = 16
SET @SQLDMOScript_DatabasePermissions = 32
SET @SQLDMOScript_Permissions = 34
SET @SQLDMOScript_ToFileOnly = 64
SET @SQLDMOScript_UDDTsToBaseType = 1024
SET @SQLDMOScript_IncludeIfNotExists = 4096
SET @SQLDMOScript_NonClusteredIndexes = 8192
SET @SQLDMOScript_NoCommandTerm = 32768
SET @SQLDMOScript_IncludeHeaders = 131072
SET @SQLDMOScript_OwnerQualify = 262144
SET @SQLDMOScript_TimestampToBinary = 524288
SET @SQLDMOScript_SortedData = 1048576
SET @SQLDMOScript_SortedDataReorg = 2097152
SET @SQLDMOScript_TransferDefault = 422143
SET @SQLDMOScript_DRI_NonClustered = 4194304
SET @SQLDMOScript_DRI_Clustered = 8388608
SET @SQLDMOScript_DRI_Checks = 16777216
SET @SQLDMOScript_DRI_Defaults = 33554432
SET @SQLDMOScript_DRI_UniqueKeys = 67108864
SET @SQLDMOScript_DRI_ForeignKeys = 134217728
SET @SQLDMOScript_DRI_PrimaryKey = 268435456
SET @SQLDMOScript_DRI_AllKeys = 469762048
SET @SQLDMOScript_DRI_AllConstraints = 520093696
SET @SQLDMOScript_DRI_All = 532676608
SET @SQLDMOScript_DRIWithNoCheck = 536870912
SET @SQLDMOScript_NoIdentity = 1073741824
SET @SQLDMOScript_UseQuotedIdentifiers = -1
-- SQLDMO_SCRIPT2_TYPE values
SET @SQLDMOScript2_Default = 0
SET @SQLDMOScript2_AnsiPadding = 1
SET @SQLDMOScript2_AnsiFile = 2
SET @SQLDMOScript2_UnicodeFile = 4
SET @SQLDMOScript2_NonStop = 8
SET @SQLDMOScript2_NoFG = 16
SET @SQLDMOScript2_MarkTriggers = 32
SET @SQLDMOScript2_OnlyUserTriggers =
64
SET @res=0
SET
@dbname=ISNULL(PARSENAME(@objectn
ame,3),DB_NAME()) -- Extract the DB
name; default to current
SET
@objectname=PARSENAME(@objectname,
1) -- Remove extraneous -- stuff from table
name IF (@objectname IS NULL) BEGIN
END
END
EXEC @hr=sp_OACreate
'SQLDMO.SQLServer', @srvobject OUTPUT
END
IF (@trustedconnection=1) BEGIN
EXEC @hr = sp_OASetProperty
@srvobject, 'LoginSecure', 1
ELSE BEGIN
ELSE BEGIN
END
END
EXEC @hr=sp_OACreate
'SQLDMO.Transfer', @tfobject OUTPUT
SET
@scriptoptions=@SQLDMOScript_OwnerQ
ualify |
@SQLDMOScript_Default |
@SQLDMOScript_Triggers |
@SQLDMOScript_Bindings |
@SQLDMOScript_Permissions |
@SQLDMOScript_Indexes |
@SQLDMOScript_DRI_Defaults --|
@SQLDMOScript_NoDRI IF
@includeheaders=1 SET
@scriptoptions=@scriptoptions |
@SQLDMOScript_IncludeHeaders END
ELSE BEGIN
OPEN ObjectList
SET @obcode=POWER(2,
(CHARINDEX(@obtype+'
',@OBJECT_TYPES)/3))
@SQLDMOScript_IncludeHeaders END --
ELSE IF (@objectname IS NULL)
SET
@itemname='Databases.Item("'+@dbnam
e+'")'
SET
@scriptoptions=@SQLDMOScript_NoDRI |
@SQLDMOScript_DRI_PrimaryKey |
@SQLDMOScript_DRI_UniqueKeys |
@SQLDMOScript_AppendToFile |
@SQLDMOScript_OwnerQualify IF
@includeheaders=1 SET
@scriptoptions=@scriptoptions |
@SQLDMOScript_IncludeHeaders
EXEC @hr = sp_OAMethod @object,
'ScriptTransfer',NULL, @tfobject,
2,@outputname IF @hr <> 0 BEGIN
@SQLDMOScript_DRI_ForeignKeys |
@SQLDMOScript_DRI_Checks |
@SQLDMOScript_DRI_Defaults |
@SQLDMOScript_AppendToFile |
@SQLDMOScript_OwnerQualify IF
@includeheaders=1 SET
@scriptoptions=@scriptoptions |
@SQLDMOScript_IncludeHeaders
IF (@resultset=1) BEGIN
GOTO FreeAll
ServerError:
SET @res=-1
FreeAll:
FreeSrv:
RETURN @res
GO
USE Northwind
GO
EXEC sp_generate_script 'Customers',
@server='khenmp\ss2000'
Column1
-----------------------------------------------------------
-------
GO
[DF__Customers__rowgu__0EF836A4]
DEFAULT (newid()), CONSTRAINT
[PK_Customers] PRIMARY KEY CLUSTERED
(
[CustomerID]
) ON [PRIMARY]
) ON [PRIMARY]
GO
(1 row(s) affected)
<span class="docEmphStrong">NOTE:
Ignore the code displayed above. It's a
remnant of the</span> <span
class="docEmphStrong">SQL-DMO
method used to produce the script
file</span> line
-----------------------------------------------------------
-------
[DF__Customers__rowgu__0EF836A4]
DEFAULT (newid()), CONSTRAINT
[PK_Customers] PRIMARY KEY CLUSTERED
[CustomerID]
) ON [PRIMARY]
) ON [PRIMARY]
GO
In the results section of the listing,
everything above the PRINT message
(bolded) is spurious output that you can
safely ignore. The output below the
message is the actual script. Here, I've
scripted the Customers table from the
Northwind database. I could just as easily
have scripted the entire database or
supplied a mask to generate a script or
several at once.
'String functions
Public Function VBFormat(vExpr As
Variant, strFormat As String) As String
VBFormat = Format(vExpr, strFormat) End
Function
Next
Else
VBScriptRegEx = 0
End If
End Function
Public Function
VBScriptRegExTest(strPattern As String,
strMatch As String) As Boolean Dim regEx,
Match, Matches Set regEx =
CreateObject("VBScript.RegExp")
regEx.Pattern = strPattern
regEx.IgnoreCase = True
VBScriptRegExTest = regEx.Test(strMatch)
End Function
' Misc
Public Function VBPmt(nRate As Double,
nPer As Double, nPV
' Routines
Public Sub
VBAppActivateAndSendKeys(strTitle As
String, strKeys As String, Optional bWait
As Variant) AppActivate strTitle,
IIf(IsMissing(bWait), False, bWait)
SendKeys strKeys, IIf(IsMissing(bWait),
False, bWait) End Sub
exec @hr=sp_OACreate
'VBODSOLE.VBODSOLELib', @obj OUT
END
END
select @pos
END
-----------
46
Else
lGlobalArraySize = lGlobalArraySize + 1
End If
ReDim Preserve
GlobalArray(lGlobalArraySize)
GlobalArray(lGlobalArraySize) = vArray()
VBCreateArray = lGlobalArraySize End
Function
Else
lGlobalArraySize = lGlobalArraySize + 1
End If
ReDim Preserve
GlobalArray(lGlobalArraySize)
GlobalArray(lGlobalArraySize) = Split(strIn,
IIf(IsMissing(strDelim), " ", strDelim))
VBCreateArraySplit = lGlobalArraySize End
Function
exec @hr=sp_oacreate
'VBODSOLE.VBODSOLELib', @obj OUT
END
exec @hr=sp_oamethod @obj,
'VBSetArray', NULL, @arr, 3, 'foo'
SELECT @val
SELECT @len
Cleanup:
------------------------------
foo
-----------
10
GO
GO
GO
DROP FUNCTION
system_function_schema.fn_createobject,
system_function_schema.fn_destroyobject
, system_function_schema.fn_createarray,
system_function_schema.fn_setarray,
system_function_schema.fn_getarray,
system_function_schema.fn_destroyarray,
system_function_schema.fn_arraylen,
system_function_schema.fn_listarray GO
CREATE FUNCTION
system_function_schema.fn_createobject()
RETURNS int
AS
BEGIN
RETURN(@obj) END
GO
CREATE FUNCTION
system_function_schema.fn_destroyobject
(@obj int) RETURNS int
AS
BEGIN
DECLARE @hr int exec
@hr=sp_OADestroy @obj RETURN(@hr)
END
GO
CREATE FUNCTION
system_function_schema.fn_createarray(
@obj int, @size int) RETURNS int
AS
BEGIN
RETURN(@hdl) END
GO
CREATE FUNCTION
system_function_schema.fn_destroyarray(
@obj int, @hdl int) RETURNS int
AS
BEGIN
RETURN 0
END
GO
CREATE FUNCTION
system_function_schema.fn_setarray(@obj
int, @hdl int, @index int, @value
sql_variant) RETURNS int
AS
BEGIN
DECLARE @hr int
RETURN 0
END
GO
CREATE FUNCTION
system_function_schema.fn_getarray(@ob
j int, @hdl int, @index int) RETURNS
sql_variant
AS
BEGIN
RETURN(@valuestr)
END
GO
CREATE FUNCTION
system_function_schema.fn_arraylen(@obj
int, @hdl int) RETURNS int
AS
BEGIN
RETURN @len
END
GO
CREATE FUNCTION
system_function_schema.fn_listarray(@obj
int, @hdl int) RETURNS @array TABLE (idx
int, value sql_variant) AS
BEGIN
END
RETURN
END
GO
GO
GO
SET @hdl=fn_createarray(@obj,@siz)
SELECT @hdl, fn_arraylen(@obj,@hdl)
SELECT
@res=fn_setarray(@obj,@hdl,1,'test1'),
@res=fn_setarray(@obj,@hdl,10,'test10'),
@res=fn_setarray(@obj,@hdl,998,'test998
'),
@res=fn_setarray(@obj,@hdl,1000,'test10
00')
-- Get element 10
SELECT fn_getarray(@obj,@hdl,10)
SELECT fn_getarray(@obj,@hdl,998)
SET @res=fn_destroyarray(@obj,@hdl)
SET @res=fn_destroyobject(@obj)
----------- -----------
1 1000
-----------------------------------------------------------
-------
test10
-----------------------------------------------------------
-------
test998
idx value
----------- -----------------------------------------------
-------
1 test1
10 test10
998 test998
1000 test1000
DECLARE @o int, @h int, @res int,
@arraybase int
SELECT @h=fn_createarray(@o,1000),
@arraybase=10247
WHERE idx=10249-@arraybase
SET @res=fn_destroyobject(@o)
OrderId OrderDate
----------- -----------------------------------------------
-------
10249 NULL
Here, we load the OrderDate column from
the Northwind Orders table into our array
using a SELECT statement and our
fn_setarray function. Notice how we're
able to load the entire table with a single
SELECT statement. We then query the
array like a table using the fn_listarray
table-value function and filter the query
using the array index.
SQL Server's ODSOLE facility uses late binding to interact with COM objects. This
means that it makes calls to the IDispatch COM interface, similarly to traditional
scripting languages such as VBScript and JScript.
ODSOLE makes use of the STA threading model. When first initialized, it creates the
main STA if it has not already been created. In the STA model, access from other
apartments is coordinated via Windows messages.
You create COM object instances from Transact-SQL using sp_OACreate, and you
destroy them via sp_OADestroy. Any objects that aren't destroyed when the batch
completes are automatically released by one of the two batch termination handlers
ODSOLE sets up when an sp_OA proc is first called on a worker thread.
Knowledge Measure
1. What is the current preferred term for OLE Automation?
2. True or false: The sp_OA family of extended procedures are actually "spec
procs"�their entry points do not reside in an external DLL.
3. What well-known COM interface does sp_OA use to invoke a method on a COM
object?
4. How does ODSOLE react when an array value is returned from an sp_OA call?
7. True or false: On SQL Server 2000 and earlier, using the sp_OA procedures to
instantiate managed code classes that have been published as COM objects is
not supported by Microsoft.
10. What must a T-SQL coder do in order to force a given COM component to
attempt to start out-of-process?
11. True or false: >The pointer returned by sp_OACreate is the address of the
newly created COM object.
12. What object does the following dot notation refer to:
'Databases.Items("pubs")'?
13. What is the maximum number of MTAs that a single process can support?
14. What type of connection must a stored procedure establish if it wants to make
most SQL-DMO calls against the server on which it's running?
15. When a managed class is compiled into a DLL assembly and registered for use
with COM via the regasm.exe tool, what DLL is actually the host insofar as COM
is concerned?
18. How is the threading model for an in-process COM server set by default?
19.
True or false: Unlike Visual Basic, SQL Server's COM Automation facility does
not support named parameters.
20. Is it possible to change the COM threading model of an existing thread without
first uninitializing COM on that thread?
21. Describe how ODSOLE reacts when an output value is returned from an
sp_OAMethod call but no output parameter is specified.
22. How would ODSOLE handle an array of structs returned from an sp_OAMethod
call?
23. What well-known COM interface method does ODSOLE call in order to invoke a
method on a COM object that has been late bound?
24. True or false: Given that their names start with sp_, the Transact-SQL parser
does not allow the sp_OA procs to be called from UDFs because it mistakes
them for regular stored procedures.
25. True or false: In order to be used by a COM client, a COM object created by
exposing and registering a public managed class and interface must be
installed into the Global Assembly Cache.
Chapter 16. Full-Text Search
I have never imputed to Nature a purpose or a goal, or anything that could be
understood as anthropomorphic. What I see in Nature is a magnificent
structure that we can comprehend only very imperfectly, and that must fill a
thinking person with a feeling of humility. This is a genuinely religious feeling
that has nothing to do with mysticism.
�Albert Einstein[1]
[1] Einstein, Albert, as quoted by Helen Dukas in Albert Einstein, the Human Side. Princeton, NJ: Princeton University Press,
1981, p. 39.
In this chapter, we'll talk about SQL Server's Full-Text Search (FTS) facility and the
engine behind it, the Microsoft Search service. We'll talk about how FTS works and
explore how you can use full-text queries to retrieve data using sophisticated search
criteria.
Support for full-text searching was first added in SQL Server 7.0 and hasn't changed
much since then. For data in SQL Server tables, FTS provides much of the
functionality typically found in standalone indexing products such as the Microsoft
Indexing Service (an operating system file-based search engine). It allows these
tables to be searched using more sophisticated terms than is possible with standard
Transact-SQL.
This chapter updates the coverage of FTS in my book The Guru's Guide to Transact-
SQL. As with other chapters in this book where I've updated what I said on a
particular topic in the past, I've tried to strike a balance between describing the
architectural layout of the technology and providing up-to-date instruction on how to
put it to practical use. Understanding how a technology works is the key to getting
the most out of it. That said, throughout this book I've tried to get beyond the
conceptual and show how to put what we know about a technology's design to work.
I don't think there's a better way to solidify new learning than to put it to practical
use.
Overview
The ability to search character and text fields is nothing new in the world of SQL
databases. For years, DBMSs have provided facilities for searching character strings
and fields for other strings. However, these facilities are usually rudimentary at best.
Prior to the advent of full-text search support, SQL Server's built-in text searching
tools were more of the garden-variety type�just beyond ANSI-compliance, but
nothing to write home about. You could perform equality tests using character
strings (as with all data types), and you could search for a pattern within a string
(using LIKE and PATINDEX), but you couldn't do anything sophisticated such as
search by word proximity or inflectional usage.
The addition of native full-text indexing support changed this. Traditionally, database
architects who wanted advanced text searching had to rely on database gateways,
operating system files, and technologies external to SQL Server. That's no longer the
case. The Microsoft Search service provides the functionality of a full-blown text
search engine such as Microsoft Indexing Service within the SQL Server
environment. It's used to build the metadata necessary to support full-text searching
and to process full-text search queries. The service itself runs only on the server
version of the Windows NT operating system family (Windows NT Server and
Advanced Server, Windows 2000 Server and Advanced Server, Windows Server
2003, and so on) and can be accessed by SQL Server clients on Windows 9x,
Windows ME, Windows 2000 Professional, and Windows XP.
Architectural Details
The data maintained by Microsoft Search�the full-text indexes and catalog
information it uses to service queries�is not stored in regular system tables and
can't be accessed directly from SQL Server. It's stored in operating system files and
is accessible only by the service itself and by NT administrators. By default, these
files are located in the FTDATA folder under your root SQL Server installation path.
These files are not backed up by regular database backups, so you must back them
up separately (and synchronize these backups with the backup of their
corresponding metadata within SQL Server) if you want to protect them from
catastrophic loss.
It's helpful to think of Microsoft Search as a text server in the same way that SQL
Server is a SQL or database server�it receives queries and in structions related to
full-text searching and returns results appropriately. Its one client is SQL Server,
which is how you access it.
Communication between SQL Server and Microsoft Search occurs via a full-text
provider. This provider resides in SQLFTQRY.DLL in the binn folder under your default
SQL Server installation. SQLFTQRY provides both administration and full-text query
services to SQL Server. The sp_fulltext_... system procedures interact with it via the
undocumented DBCC CALLFULLTEXT command to carry out administrative tasks
related to full-text indexes and Microsoft Search. The call interface to DBCC
CALLFULLTEXT looks like this: DBCC CALLFULLTEXT(funcid[,catid][,objid][,sub])
DBCC CALLFULLTEXT requires one parameter and supports three additional optional
ones: funcid specifies what function to perform and what parameters are valid, catid
is the full-text catalog ID, objid is the object ID of the affected object, and sub is the
subfunction ID if there is one. Note that CALLFULLTEXT is only valid within a system
stored procedure. This procedure must have its system bit set (using the
undocumented procedure sp_MS_marksystemobject) and its name must begin with
sp_fulltext_. Table 16.1 lists the supported functions.
USE master
GO
IF OBJECT_ID('sp_fulltext_resource') IS NOT NULL
DROP PROC sp_fulltext_resource
GO
CREATE PROC sp_fulltext_resource @value int -- value for
-- 'resource_usage'
AS
DBCC CALLFULLTEXT(13,@value) -- FTSetResource (@value)
IF (@@error<>0) RETURN 1
-- SUCCESS --
RETURN 0 -- sp_fulltext_resource
GO
15 Sets the data (query) timeout for FT Data (query) timeout value in
requests to SQL Server seconds (1�32767)
As a rule, you shouldn't call DBCC CALLFULLTEXT in your own code. The function IDs
and parameters listed above could change between releases (this actually happened
between the 7.0 and 2000 releases of SQL Server), and you could do serious
damage to your full-text installation by calling a destructive function by accident.
There's no good reason not to use Enterprise Manager and the sp_fulltext_...
procedures to manage the Microsoft Search service and your full-text indexes. I
document the above so that you can understand how SQL Server's full-text
searching facility works. I don't recommend that you make use of DBCC
CALLFULLTEXT in production code.
Microsoft Search can search only SQL Server data. The full-text indexes you build
cover SQL Server data exclusively�you can't use them to search operating system
files. You can, however, make use of the Microsoft Indexing Service and the OLE DB
provider it supplies for performing searches against operating system files. And, via
a linked server or distributed query, you can access this provider from T-SQL and
even use it in joins with regular full-text search queries against SQL Server objects.
Using SQL Server's full-text search rowset and predicate functions coupled with
linked server or OPENQUERY/OPENROWSET references that make use of the
Microsoft OLE DB Provider for Indexing Service, you can combine the results from
full-text queries within SQL Server with full-text searches against files outside of SQL
Server.
Full-Text Searches on Binary Data
Because they can contain characters that are invalid in SQL Server character data
types such as text, char, and nchar, data files from products such as Microsoft Word
and Microsoft Excel cannot be stored in regular SQL Server character columns.
Instead, they must be stored in image columns if you want to be able to store any
byte value a file might contain and want to be able to store files larger than 8K
(binary and varbinary columns can be used for files smaller than 8K).
SQL Server can create a full-text index over an image column and automatically
recognize certain types of external data. It does this via file extension�based
"filters." A filter simply identifies a particular type of binary data using a file
extension. Normally, this file extension would be part of an external file name.
However, within a SQL Server table, you associate a file extension with a particular
image column value by setting up a column in the same table to contain the file
extension for each row's binary data. You then supply this column name as the
Document type column when you set up the table for full-text indexing in the Full-
Text Indexing Wizard. (You can also specify this column via sp_fulltext_column's
@type_colname parameter if you are creating the full-text index via stored
procedures.) Currently, SQL Server supports five filter types: .DOC (Microsoft Word),
.XLS (Microsoft Excel), .PPT (Microsoft PowerPoint), .TXT (text files), and .HTM
(Hypertext Markup Language). A table that stores binary file data using this
technique can store a different format in each row. For example, you could have a
Word document in one row, followed by an Excel document in another, and an HTML
file in yet another. Each row's file extension column would identify the type of data it
contained and direct Microsoft Search to use the appropriate filter.
Once an image type with the appropriate filter has been properly indexed, you can
execute full-text queries over it just as you would any other type of column. The full-
text indexes created over these types of columns are the same as those created
over more traditional character columns.
TIP: Using a DTS package and a Read File transformation is an excellent way to load
data from external sources such as Word documents and Excel spreadsheets into a
SQL Server database so that it can be full-text indexed. A Read File transformation
can load the contents of an operating system file into a column in a table. You set up
a table that lists the file names to load, then configure the transformation to read
the files listed in the table and post their contents to a column in the destination
table. It's trivial to set up and allows you to easily load large numbers of files into a
database table. See Chapter 20 on DTS for more information on Read File
transformations.
USE pubs
SET @tablename='pub_info'
SET @catalogname='pubsCatalog'
SET @indexname='UPKCL_pubinfo'
SET @columnname='pr_info'
-- STEP 3: Create a full-text index for the
table EXEC sp_fulltext_table
@tablename,'create',@catalogname,@ind
exname
USE master
GO
IF OBJECT_ID('sp_enable_fulltext') IS NOT
NULL
/*
Object: sp_enable_fulltext
*/
AS
SET NOCOUNT ON
IF (@tablename='/?') OR (@columnname
IS NULL) OR
IF
(FULLTEXTSERVICEPROPERTY('IsFulltextIns
talled')=0) BEGIN -- Search engine's not
installed RAISERROR('The Microsoft Search
service is not installed on server
%s',16,10,@@SERVERNAME) RETURN -1
END
IF (UPPER(@startserver)='YES')
IF (@catalogname IS NULL)
SET
@catalogname=DB_NAME()+'Catalog'
INSERT #indexes
IF
(DATABASEPROPERTY(DB_NAME(),'IsFulltex
tEnabled')<>1) -- Enable FTS for the
database EXEC sp_fulltext_database
'enable'
SET
@catalogstatus=FULLTEXTCATALOGPROPE
RTY(@catalogname, 'PopulateStatus')
IF
(OBJECTPROPERTY(OBJECT_ID(@tablenam
e), 'TableHasActiveFullTextIndex')=0) --
Create full-text index -- if not already
present EXEC sp_fulltext_table
@tablename,'create',@catalogname,
@indexname
ELSE
EXEC sp_fulltext_table
@tablename,'deactivate' -- Deactivate it --
so we can make changes to it
IF
(COLUMNPROPERTY(OBJECT_ID(@tablena
me),@columnname,
'IsFulltextIndexed')=0) BEGIN -- Add the
column to the index EXEC
sp_fulltext_column @tablename,
@columnname, 'add'
'.'+@columnname+' in database
'+DB_NAME()
END ELSE
PRINT 'Column '+@columnname+' in
table '+DB_NAME()+'.'+
EXEC sp_fulltext_table
@tablename,'activate'
RETURN 0
Help:
EXEC sp_usage
@objectname='sp_enable_fulltext',@desc
='Enables full-text indexing for a specified
column',
@parameters='@tablename=name of
host table, @columnname=column to set
up,
RETURN �1
sp_enable_fulltext 'pub_info','pr_info'
Successfully added a full-text index for
pub_info.pr_info in database pubs
Before we begin exploring these functions via code, let's enable full-text searching
on the Employees table in the Northwind database. Employees includes a Notes text
column that's ideal for full-text searching. You can use the sp_enable_fulltext
procedure discussed above to set it up, like so:
This should create the necessary metadata and indexing information to allow the
full-text search functions to work properly.
CONTAINS locates rows that contain a word or words or variations of them. It can
perform exact and inexact word locations, word proximity searches, and inflectional
searches. You can think of it as the LIKE predicate on steroids. Listing 16.4 presents
an example that uses CONTAINS to find all the people in the Employees table whose
Notes fields mention the word "English."
(Results abridged)
Note that since we're searching all the full-text index columns in Employees (there's
only one), we could have substituted * for the column name and achieved the same
result, like this:
SELECT LastName, FirstName, Notes
FROM EMPLOYEES
WHERE CONTAINS(*,'English')
CONTAINS supports word proximity searches as well. Listing 16.5 shows a refinement
of the last example that narrows the employees listed to those whose Notes fields
contain the word "degree" located near the word "English."
(Results abridged)
This time, only two rows are listed because Margaret Peacock's Notes field doesn't
contain the word "degree" at all. Note that the tilde character (~) is synonymous
with NEAR, so we could rewrite the beginning of Listing 16.5 like this:
The search condition string also supports Boolean expressions, as shown in Listing
16.6.
(Results abridged)
This query returns those rows containing the words "English" or "German." The exact
or relative positions of the words are unimportant�if either of the words appear
anywhere in the Notes column, the row is returned.
In this use, CONTAINS behaves similarly to LIKE, but there's one important
difference�CONTAINS is sensitive to word boundaries; LIKE isn't. For example, here's
the query rewritten to use LIKE:
It looks similar, but this query doesn't really ask the same question as the CONTAINS
query. It will find matches with variations of the search words and even with words
that happen to contain them (e.g., Germantown, Englishman, Germanic, Burgerman,
and so on). The CONTAINS query, by contrast, is word-savvy�it knows the difference
between English and Englishman and is smart enough to return only what you ask
for.
(Result abridged)
This query locates all rows with Notes fields containing words that begin with "psy"
or "chem." Note that neither single-character wildcards nor wildcards that appear at
the start of a search term are supported.
Quotes are used within the condition string to delineate search strings from one
another. When wildcards and multiple terms are present in the search criteria string,
quotes are required; omitting them will cause the query to fail.
A really powerful aspect of CONTAINS is its support for inflectional searches. The
ability to search based on word forms is a potent and often very useful addition to
the Transact-SQL repertoire. Listing 16.8 illustrates how to search for the forms of a
word.
(Results abridged)
You can use the FORMSOF clause to locate the different tenses of a verb as well as
the singular and plural forms of a noun. In this case, the code finds five rows that
contain forms of the word "complete" including "completed" and "completing."
FREETEXT is useful for locating rows containing words that have the same basic
meaning as those in a search string. Unlike CONTAINS, FREETEXT allows you to
specify a series of terms that are then weighted internally and matched with values
in the full-text column(s). Listing 16.9 presents an example that locates employees
with college degrees, especially bachelor's degrees.
(Results abridged)
Here, any row containing any of the terms or similar words are returned. As with
CONTAINS, * is used to signify all full-text indexed columns in the table.
Rowset Functions
Transact-SQL defines a special class of functions called rowset functions that can be
used in place of tables in the FROM clauses of queries. Rowset functions return result
sets in a fashion similar to a derived table and can be joined with real tables,
summarized, grouped, and so on. There are two rowset functions related to full-text
searching: CONTAINSTABLE and FREETEXTTABLE. These are rowset versions of the
predicates discussed earlier in the chapter. Rather than being used in the WHERE
clause, these functions typically appear in the FROM clause of a SELECT statement.
They return a result set consisting of index key values and row rankings.
Despite the fact that it's a rowset function rather than a predicate, the
CONTAINSTABLE function works very similarly to CONTAINS, as its name would
suggest. It supports the same search string criteria as CONTAINS and requires one
parameter�the name of the underlying table�in addition to those required by the
predicate. Listing 16.10 presents an example that uses CONTAINSTABLE to produce a
list of key values and search rankings.
SELECT *
FROM CONTAINSTABLE(Employees,*,'English OR French OR Italian
OR German OR Flemish')
ORDER BY RANK DESC
(Results)
KEY RANK
----------- -----------
8 64
2 64
4 48
7 48
9 48
6 32
5 32
CONTAINSTABLE returns two columns: the key value of the row from the underlying
table and a ranking of each row. In this example, we use the RANK column to
sequence the rows logically such that higher rankings are listed first. The key value
can be used to join back to the original table in order to translate the key into
something a bit more meaningful, as you'll see in a moment.
The rankings returned by the RANK column can be tailored to your needs by using
the ISABOUT function of the search criteria string, as shown in Listing 16.11.
Listing 16.11 Using ISABOUT
SELECT *
FROM CONTAINSTABLE(Employees,*,'ISABOUT(English weight(.8),
French weight(.1), Italian weight(.2), German weight(.4),
Flemish weight(0.0))')
ORDER BY RANK DESC
(Results)
KEY RANK
----------- -----------
9 85
2 54
4 47
7 47
8 7
6 3
5 3
In this example, weights are assigned for each language skill specifically indicated
by an employee's Notes entry, ranging from 0.0 for Flemish to 0.8 for English. Valid
weights range from 0.0 to 1.0. As in the previous example, we use the RANK column
to sequence the rows such that higher rankings are listed first. ISABOUT is also
available with the CONTAINS predicate but has no effect since its only purpose is to
alter the RANK column, which is not used by the predicate.
To generate results that are truly meaningful, you need to join the result set returned
by CONTAINSTABLE with its underlying table. The key values and rankings returned
by the function itself aren't terribly useful without some correlation to the original
data. Listing 16.12 presents an example.
Listing 16.12 Joining the Result Set with Its Underlying Table
(Results abridged)
As with the earlier examples, this query sequences its result set using the RANK
column returned by CONTAINSTABLE. Note the use of brackets ([]) around the
reference to the KEY column returned by CONTAINSTABLE. Inexplicably, SQL Server
uses KEY as a column name for CONTAINSTABLE even though it's a reserved word.
This necessitates surrounding it with brackets (or double quotes if the
QUOTED_IDENTIFIER setting is enabled) any time you reference it directly.
To see the effect of the rank weighting, let's revise the query to use the default
ranking returned by Microsoft Search (Listing 16.13).
(Results abridged)
As you can see, the custom weighting we supplied makes a noticeable difference. It
changes much of the order in which the rows are listed.
As with its predicate cousin, FREETEXTTABLE locates rows containing words with the
same basic meaning as those specified in the search criteria. The format of its
search criteria string is open-ended ("free") and has no specific syntax. The search
engine extracts each word from the string and assigns it a weight, then locates rows
accordingly. Listing 16.14 performs the earlier search that locates employees with
bachelor's degrees but is rewritten to use FREETEXTTABLE.
(Results)
With such a broad criteria string, the query returns all but one row in the Employees
table. Each of these has some form of one of the words listed in the search criteria
string.
Recap
SQL Server's FTS facility is a potent tool that provides most of the functionality of
standalone file-based search engines. Enabling columns for text searching is
nontrivial and you should use Enterprise Manager or the sp_enable_fulltext stored
procedure (included in this chapter) to set them up. Once a column has been set up
for full-text searches, the CONTAINS and FREETEXT predicates, as well as the
CONTAINSTABLE and FREETEXTTABLE rowset functions, become available for use.
They offer a powerful alternative to commonplace search implements such as LIKE
and PATINDEX.
Knowledge Measure
1. What's the name of the DLL that SQL Server uses to interact with the Microsoft
Search engine?
3. True or false: SQL Server's FTS facility can be installed on Windows 2000
Server but not on Windows 2000 Professional.
4. Name the two new predicate functions you can use on tables that have been
full-text indexed.
5. What's the maximum number of full-text indexes a single table can have?
6. Is it possible to query both full-text indexed data and files that have been
indexed by the Microsoft Indexing Service from the same T-SQL query?
7. Name the two new rowset functions you can use with tables that have been
full-text indexed.
8. What T-SQL function can you call to determine whether SQL Server's FTS
facility has been installed?
10. Is it possible to configure the amount of time the Microsoft Search service will
wait on a connection to SQL Server to complete before timing out?
Part III: Data Services
Chapter 17. Server Federations
I am convinced that a vivid consciousness of the primary importance of moral
principles for the betterment and ennoblement of life does not need the idea of
a lawgiver, especially a lawgiver who works on the basis of reward and
punishment.
�Albert Einstein[1]
[1] Einstein, Albert. Letter to M. Berkowitz, October 25, 1950. Reprinted in The Expanded Quotable Einstein, ed. Alice
Calaprice. Princeton, NJ: Princeton University Press, 2000, p. 216.
A SQL Server federation is a group of SQL Servers that have had a distributed
partitioned view spread horizontally across them. Each server in the federation
stores only part of the view's underlying data. Each has the full view definition and
can use its metadata to determine the actual server that stores the physical data a
query against the partitioned view seeks. In this way, the group of servers acts as a
loose federation of SQL Servers. In terms of scalability, it allows an organization to
"scale out" horizontally rather than (or in addition to) scaling up to a more powerful
server machine.
You create a federation of SQL Servers by creating a distributed partitioned view that
spans them. In this chapter, we'll talk about partitioned views (both local and
distributed), then discuss how to create distributed partitioned views. We'll explore
performance issues associated with partitioned views and delve into a few execution
plans. This updates the coverage of partitioned views in my last book, The Guru's
Guide to SQL Server Stored Procedures, XML, and HTML.
CREATE TABLE CustomersUS (
GO
StmtText
-----------------------------------------------------------
-------
SELECT CompanyName=CompanyName
FROM dbo.CustomersV WHERE
Country=@
|--Compute Scalar(DEFINE:(<span
class="docEmphStrong">CustomersUS</s
pan>.CompanyName=CustomersUS.Co |--
Clustered Index Scan(OBJECT:
(Northwind.dbo.CustomersUS.P
)
GO
)
GO
)
GO
-----------------------------------------------------------
-------
Executes StmtText
----------- -----------------------------------------------
-------
1 |--Concatenation
1 |--Filter(WHERE:(STARTUP
EXPR(Convert([@1])=199
StmtText
-----------------------------------------------------------
-------
|-Compute Scalar(DEFINE:
(Orders1997.OrderID=Orders1997.OrderID
, Or |-Clustered Index Scan(OBJECT:
(Northwind.dbo.Orders1997.PK_Order
CONSTRAINT PK_Orders1996
CONSTRAINT PK_Orders1997
CONSTRAINT PK_Orders1998
-----------------------------------------------------------
-------
Executes StmtText
----------- -----------------------------------------------
-------
1 |--Filter(WHERE:(STARTUP
EXPR(Convert([@1])=199
StmtText
-----------------------------------------------------------
-------
GO
Executes StmtText
--------- -------------------------------------------------
-------
1 |--Concatenation
1 |--Filter(WHERE:(STARTUP
EXPR(Convert([@1])<='US'
1 |--Filter(WHERE:(STARTUP
EXPR(Convert([@1])<='UK'
1 | |--Clustered Index Seek(OBJECT:
([Northwind].
1 |--Filter(WHERE:(STARTUP
EXPR(Convert([@1])<='Fra 0 |--Clustered
Index Seek(OBJECT:([Northwind].
OrderID=1000
(Results abridged)
StmtText
-----------------------------------------------------------
--------------------
|-Compute Scalar(DEFINE:
(HOMER.Northwind.dbo.Orders1997.Custo
merID=HOMER.nort |-Remote
Query(SOURCE:(HOMER),QUERY:(SELECT
Col1024 FROM (SELECT Tbl1003.
"OrderID" Col1023,Tbl1003."CustomerID"
Col1024,Tbl1003."OrderYear" Col1027
FROM "northwind"."dbo"."Orders1997"
Tbl1003) Qry1031 <span
class="docEmphStrong">WHERE
Col1023=(1000)</span>))
SQL Server imposes a number of restrictions on partitioned views, but once those
restrictions are met, the query optimizer on each server in the federation handles
eliminating unnecessary partitions (either at compile-time or at runtime) so that
these partitions are not needlessly scanned for matching rows.
Knowledge Measure
1. True or false: If you view the execution plan for a query against a partitioned
view and see unnecessary partitions listed in the plan, you have probably
discovered a bug and should call Microsoft.
2. What does an Executes column of 0 for a given query plan step indicate?
5. True or false: The SQL Server query optimizer can eliminate unneeded
partitions from a query against a partitioned view at compile-time as well as at
runtime.
7. True or false: To set up a DPV, you must disable the lazy schema validation
option.
Chapter 18. SQLXML
The key to everything is happiness. Do what you can to be happy in this world.
Life is short�too short to do otherwise. The deferred gratification you mention
so often is more deferred than gratifying.
�H. W. Kenton
NOTE: This chapter assumes that you're running, at a minimum, SQL Server 2000
with SQLXML 3.0. The SQLXML Web releases have changed and enhanced SQL
Server's XML functionality significantly. For the sake of staying current with the
technology, I'm covering the latest version of SQLXML rather than the version that
shipped with the original release of SQL Server 2000.
This chapter updates the coverage of SQLXML in my last book, The Guru's Guide to
SQL Server Stored Procedures, XML, and HTML. That book was written before Web
Release 1 (the update to SQL Server 2000's original SQLXML functionality) had
shipped. As of this writing, SQLXML 3.0 (which would be the equivalent of Web
Release 3 had Microsoft not changed the naming scheme) has shipped, and Yukon,
the next version of SQL Server, is about to go into beta test.
This chapter will also get more into how the SQLXML technologies are designed and
how they fit together from an architectural standpoint. As with the rest of the book,
my intent here is to get beyond the "how to" and into the "why" behind how SQL
Server's technologies work.
I must confess that I was conflicted when I sat down to write this chapter. I wrestled
with whether to update the SQLXML coverage in my last book, which was more
focused on the practical application of SQLXML but which I felt really needed
updating, or to write something completely new on just the architectural aspects of
SQLXML, with little or no discussion of how to apply them in practice. Ultimately, I
decided to do both things. In keeping with the chief purpose of this book, I decided
to cover the architectural aspects of SQLXML, and, in order to stay up with the
current state of SQL Server's XML family of technologies, I decided to update the
coverage of SQLXML in my last book from the standpoint of practical use. So, this
chapter updates what I had to say previously about SQLXML and also delves into the
SQLXML architecture in ways I've not done before.
Overview
With the popularity and ubiquity of XML, it's no surprise that SQL Server has
extensive support for working with it. Like most modern DBMSs, SQL Server regularly
needs to work with and store data that may have originated in XML. Without this
built-in support, getting XML to and from SQL Server would require the application
developer to translate XML data before sending it to SQL Server and again after
receiving it back. Obviously, this could quickly become very tedious given the
pervasiveness of the language.
SQL Server is an XML-enabled DBMS. This means that it can read and write XML
data. It can return data from databases in XML format, and it can read and update
data stored in XML documents. As Table 18.1 illustrates, out of the box, SQL Server's
XML features can be broken down into eight general categories.
Feature Purpose
FOR XML An extension to the SELECT command that allows result sets to be
returned as XML
XPath queries Allows SQL Server databases to be queried using XPath syntax
Schemas Supports XSD and XDR mapping schemas and XPath queries against
them
SOAP support Allows clients to access SQL Server's functionality as a Web service
Managed Classes that expose the functionality of SQLXML inside the .NET
classes Framework
XML Bulk A high-speed facility for loading XML data into a SQL Server database
Load
We'll explore each of these in this chapter and discuss how they work and how they
interoperate.
MSXML
SQL Server uses Microsoft's XML parser, MSXML, to load XML data, so we'll begin our
discussion there. There are two basic ways to parse XML data using MSXML: using
the Document Object Model (DOM) or using the Simple API for XML (SAX). Both DOM
and SAX are W3C standards. The DOM method involves parsing the XML document
and loading it into a tree structure in memory. The entire document is materialized
and stored in memory when processed this way. An XML document parsed via DOM
is known as a DOM document (or just "DOM" for short). XML parsers provide a
variety of ways to manipulate DOM documents. Listing 18.1 shows a short Visual
Basic app that demonstrates parsing an XML document via DOM and querying it for
a particular node set. (You can find the source code to this app in the
CH18\msxmltest subfolder on the CD accompanying this book.)
Listing 18.1
If Len(Text1.Text) = 0 Then
Text1.Text = bstrDoc
End If
If Len(Text2.Text) = 0 Then
Text2.Text = "//Song"
End If
Set oNodes = xmlDoc.selectNodes(Text2.Text)
SAX, by contrast, is an event-driven API. You process an XML document via SAX by
configuring your application to respond to SAX events. As the SAX processor reads
through an XML document, it raises events each time it encounters something the
calling application should know about, such as an element starting or ending, an
attribute starting or ending, and so on. It passes the relevant data about the event
to the application's handler for the event. The application can then decide what to
do in response�it could store the event data in some type of tree structure, as is the
case with DOM processing; it could ignore the event; it could search the event data
for something in particular; or it could take some other action. Once the event is
handled, the SAX processor continues reading the document. At no point does it
persist the document in memory as DOM does. It's really just a parsing mechanism
to which an application can attach its own functionality. In fact, SAX is the underlying
parsing mechanism for MSXML's DOM processor. Microsoft's DOM implementation
sets up SAX event handlers that simply store the data handed to them by the SAX
engine in a DOM tree.
As you've probably surmised by now, SAX consumes far less memory than DOM
does. That said, it's also much more trouble to set up and use. By persisting
documents in memory, the DOM API makes working with XML documents as easy as
working with any other kind of object.
SQL Server uses MSXML and the DOM to process documents you load via
sp_xml_preparedocument. It restricts the virtual memory MSXML can use for DOM
processing to one-eighth of the physical memory on the machine or 500MB,
whichever is less. In actual practice, it's highly unlikely that MSXML would be able to
access 500MB of virtual memory, even on a machine with 4GB of physical memory.
The reason for this is that, by default, SQL Server reserves most of the user mode
address space for use by its buffer pool. You'll recall that we talked about the
MemToLeave space in Chapter 11 and noted that the non�thread stack portion
defaults to 256MB on SQL Server 2000. This means that, by default, MSXML won't be
able to use more than 256MB of memory�and probably considerably less given that
other things are also allocated from this region�regardless of the amount of
physical memory on the machine.
The reason MSXML is limited to no more than 500MB of virtual memory use
regardless of the amount of memory on the machine is that SQL Server calls the
GlobalMemoryStatus Win32 API function to determine the amount of available
physical memory. GlobalMemoryStatus populates a MEMORYSTATUS structure with
information about the status of memory use on the machine. On machines with
more than 4GB of physical memory, GlobalMemoryStatus can return incorrect
information, so Windows returns a -1 to indicate an overflow. The Win32 API function
GlobalMemoryStatusEx exists to address this shortcoming, but SQLXML does not call
it. You can see this for yourself by working through the following exercise.
1. Restart your SQL Server, preferably from a console since we will be attaching
to it with WinDbg. This should be a test or development system, and, ideally,
you should be its only user.
3. Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe from the
list of running tasks; if you have multiple instances, be sure to select the right
one.)
bp kernel32!GlobalMemoryStatus
5. Once the breakpoint is added, type g and hit Enter to allow SQL Server to run.
9. You'll recall from Chapter 3 that we discovered that the entry point for T-SQL
batch execution within SQL Server is language_exec. You can see the call to
language_exec at the bottom of this stack�this was called when you
submitted the T-SQL batch to the server to run. Working upward from the
bottom, we can see the call to SpXmlPrepareDocument, the internal "spec
proc" (an extended procedure implemented internally by the server rather
than in an external DLL) responsible for implementing the
sp_xml_preparedocument xproc. We can see from there that
SpXmlPrepareDocument calls LoadXMLDocument, LoadXMLDocument calls a
method named Load, Load calls a method named DoLoad, and DoLoad calls
GlobalMemoryStatus. So, that's how we know how MSXML computes the
amount of physical memory in the machine, and, knowing the limitations of
this function, that's how we know the maximum amount of virtual memory
MSXML can use.
10. Type q and hit Enter to quit WinDbg. You will have to restart your SQL Server.
FOR XML
Despite MSXML's power and ease of use, SQL Server doesn't leverage MSXML in all
of its XML features. It doesn't use it to implement server-side FOR XML queries, for
example, even though it's trivial to construct a DOM document programmatically
and return it as text. MSXML has facilities that make this quite easy. For example,
Listing 18.2 presents a Visual Basic app that executes a query via ADO and
constructs a DOM document on-the-fly based on the results it returns.
Listing 18.2
oConn.Open (Text3.Text)
oComm.ActiveConnection = oConn
oComm.CommandText = Text1.Text
Set oRs = oComm.Execute
oConn.Close
Text2.Text = xmlDoc.xml
Set xmlDoc = Nothing
Set oRs = Nothing
Set oComm = Nothing
Set oConn = Nothing
End Sub
As you can see, translating a result set to XML doesn't require much code. The ADO
Recordset object even supports being streamed directly to an XML document (via its
Save method), so if you don't need complete control over the conversion process,
you might be able to get away with even less code than in my example.
As I've said, SQL Server doesn't use MSXML or build a DOM document in order to
return a result set as XML. Why is that? And how do we know that it doesn't use
MSXML to process server-side FOR XML queries? I'll answer both questions in just a
moment.
The answer to the first question should be pretty obvious. Building a DOM from a
result set before returning it as text would require SQL Server to persist the entire
result set in memory. Given that the memory footprint of the DOM version of an XML
document is roughly three to five times as large as the document itself, this doesn't
paint a pretty resource usage picture. If they had to first be persisted entirely in
memory before being returned to the client, even moderately large FOR XML result
sets could use huge amounts of virtual memory (or run into the MSXML memory
ceiling and therefore be too large to generate).
To answer the second question, let's again have a look at SQL Server under a
debugger.
1. Restart your SQL Server, preferably from a console since we will be attaching
to it with WinDbg. This should be a test or development system, and, ideally,
you should be its only user.
3. Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe from the
list of running tasks; if you have multiple instances, be sure to select the right
one.) Once the WinDbg command prompt appears, type g and press Enter so
that SQL Server can continue to run.
SELECT * FROM (
SELECT 'Summer Dream' as Song
UNION
SELECT 'Summer Snow'
UNION
SELECT 'Crazy For You'
) s FOR XML AUTO
This query unions some SELECT statements together, then queries the union
as a derived table using a FOR XML clause.
5. After you run the query, switch back to WinDbg. You will likely see some
ModLoad messages in the WinDbg command window. WinDbg displays a
ModLoad message whenever a module is loaded into the process being
debugged. If MSXMLn.DLL were being used to service your FOR XML query,
you'd see a ModLoad message for it. As you've noticed, there isn't one. MSXML
isn't used to service FOR XML queries.
6. If you've done much debugging, you may be speculating that perhaps the
MSXML DLL is already loaded; hence, we wouldn't see a ModLoad message for
it when we ran our FOR XML query. That's easy enough to check. Hit
Ctrl+Break in the debugger, then type lm in the command window and hit
Enter. The lm command lists the modules currently loaded into the process
space. Do you see MSXMLn.DLL in the list? Unless you've been interacting with
SQL Server's other XML features since you recycled your server, it should not
be there. Type g in the command window and press Enter so that SQL Server
can continue to run.
8. Hit Ctrl+Break again to stop WinDbg, then type q and hit Enter to stop
debugging. You will need to restart your SQL Server.
So, based on all this, we can conclude that SQL Server generates its own XML when
it processes a server-side FOR XML query. There is no memory-efficient mechanism
in MSXML to assist with this, so it is not used.
Using FOR XML
As you saw in Exercise 18.2, you can append FOR XML AUTO to the end of a SELECT
statement in order to cause the result to be returned as an XML document fragment.
Transact-SQL's FOR XML syntax is much richer than this, though�it supports several
options that extend its usefulness in numerous ways. In this section, we'll discuss a
few of these and work through examples that illustrate them.
As I'm sure you've already surmised, you can retrieve XML data from SQL Server by
using the FOR XML option of the SELECT command. FOR XML causes SELECT to
return query results as an XML stream rather than a traditional rowset. On the
server-side, this stream can have one of three formats: RAW, AUTO, or EXPLICIT. The
basic FOR XML syntax looks like this:
RAW returns column values as attributes and wraps each row in a generic row
element. AUTO returns column values as attributes and wraps each row in an
element named after the table from which it came.[1] EXPLICIT lets you completely
control the format of the XML returned by a query.
[1] There's actually more to this than simply naming each row after the table, view, or UDF that produced it. SQL Server uses a set of
heuristics to decide what the actual element names are with FOR XML AUTO.
I'll discuss these options in more detail in just a moment. Also note that there are
client-side specific options available with FOR XML queries that aren't available in
server-side queries. We'll talk about those in just a moment, too.
RAW Mode
RAW mode is the simplest of the three basic FOR XML modes. It performs a very
basic translation of the result set into XML. Listing 18.3 shows an example.
Listing 18.3
SELECT CustomerId, CompanyName
FROM Customers FOR XML RAW
(Results abridged)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<row CustomerId="ALFKI" CompanyName="Alfreds Futterkiste"/><row Cu
CompanyName="Ana Trujillo Emparedados y helados"/><row CustomerId=
CompanyName="Antonio Moreno Taquer'a"/><row CustomerId="AROUT" Com
Horn"/><row CustomerId="BERGS" CompanyName="Berglunds snabbköp"/><
CustomerId="BLAUS" CompanyName="Blauer See Delikatessen"/><row Cus
CompanyName="Blondesddsl p_re et fils"/><row CustomerId="WELLI"
CompanyName="Wellington Importadora"/><row CustomerId="WHITC" Comp
Clover Markets"/><row CustomerId="WILMK" CompanyName="Wilman Kala"
CustomerId="WOLZA"
CompanyName="Wolski Zajazd"/>
Each column becomes an attribute in the result set, and each row becomes an
element with the generic name of row.
As I've mentioned before, the XML that's returned by FOR XML is not well formed
because it lacks a root element. It's technically an XML fragment and must include a
root element in order to be usable by an XML parser. From the client side, you can
set an ADO Command object's xml root property in order to automatically generate
a root node when you execute a FOR XML query.
AUTO Mode
FOR XML AUTO gives you more control than RAW mode over the XML fragment that's
produced. To begin with, each row in the result set is named after the table, view, or
table-valued UDF that produced it. For example, Listing 18.4 shows a basic FOR XML
AUTO query.
Listing 18.4
(Results abridged)
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI" CompanyName="Alfreds Futterkiste"/><
CustomerId="ANATR" CompanyName="Ana Trujillo Emparedados y helados
CustomerId="ANTON" CompanyName="Antonio Moreno Taquer'a"/><Custome
CustomerId="AROUT" CompanyName="Around the Horn"/><Customers Custo
CompanyName="Vins et alcools Chevalier"/><Customers CustomerId="WA
CompanyName="Wartian Herkku"/><Customers CustomerId="WELLI" Compan
Importadora"/><Customers CustomerId="WHITC" CompanyName="White Clo
Markets"/><Customers CustomerId="WILMK" CompanyName="Wilman Kala"/
CustomerId="WOLZA"
CompanyName="Wolski Zajazd"/>
Notice that each row is named after the table from whence it came: Customers. For
results with more than one row, this amounts to having more than one top-level
(root) element in the fragment, which isn't allowed in XML.
One big difference between AUTO and RAW mode is the way in which joins are
handled. In RAW mode, a simple one-to-one translation occurs between columns in
the result set and attributes in the XML fragment. Each row becomes an element in
the fragment named row. These elements are technically empty themselves�they
contain no values or subelements, only attributes. Think of attributes as specifying
characteristics of an element, while data and subelements compose its contents. In
AUTO mode, each row is named after the source from which it came, and the rows
from joined tables are nested within one another. Listing 18.5 presents an example.
Listing 18.5
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerID="ALFKI" CompanyName="Alfreds Futterkiste">
<Orders OrderId="10643"/><Orders OrderId="10692"/>
<Orders OrderId="10702"/><Orders OrderId="10835"/>
<Orders OrderId="10952"/><Orders OrderId="11011"/>
</Customers>
<Customers CustomerID="ANATR" CompanyName="Ana Trujillo Emparedado
<Orders OrderId="10308"/><Orders OrderId="10625"/>
<Orders OrderId="10759"/><Orders OrderId="10926"/></Customers>
<Customers CustomerID="FRANR" CompanyName="France restauration">
<Orders OrderId="10671"/><Orders OrderId="10860"/>
<Orders OrderId="10971"/>
</Customers>
I've formatted the XML fragment to make it easier to read�if you run the query
yourself from Query Analyzer, you'll see an unformatted stream of XML text.
Note the way in which the Orders for each customer are contained within each
Customer element. As I said, AUTO mode nests the rows returned by joins. Note my
use of the full table name in the join criterion. Why didn't I use a table alias?
Because AUTO mode uses the table aliases you specify to name the elements it
returns. If you use shortened monikers for a table, its elements will have that name
in the resulting XML fragment. While useful in traditional Transact-SQL, this makes
the fragment difficult to read if the alias isn't sufficiently descriptive.
ELEMENTS Option
The ELEMENTS option of the FOR XML AUTO clause causes AUTO mode to return
nested elements instead of attributes. Depending on your business needs, element-
centric mapping may be preferable to the default attribute-centric mapping. Listing
18.6 gives an example of a FOR XML query that returns elements instead of
attributes.
Listing 18.6
SELECT CustomerID, CompanyName FROM Customers FOR XML AUTO, ELEMENTS

(Results abridged)

XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers>
<CustomerID>ALFKI</CustomerID>
<CompanyName>Alfreds Futterkiste</CompanyName>
</Customers>
<Customers>
<CustomerID>ANATR</CustomerID>
<CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
</Customers>
<Customers>
<CustomerID>ANTON</CustomerID>
<CompanyName>Antonio Moreno Taquería</CompanyName>
</Customers>
<Customers>
<CustomerID>AROUT</CustomerID>
<CompanyName>Around the Horn</CompanyName>
</Customers>
<Customers>
<CustomerID>WILMK</CustomerID>
<CompanyName>Wilman Kala</CompanyName>
</Customers>
<Customers>
<CustomerID>WOLZA</CustomerID>
<CompanyName>Wolski Zajazd</CompanyName>
</Customers>
Notice that the ELEMENTS option has caused what were being returned as attributes
of the Customers element to be returned as subelements instead. Each column value
is now enclosed in a pair of element tags rather than expressed as an attribute.
NOTE: Currently, AUTO mode does not support GROUP BY or aggregate functions.
The heuristics it uses to determine element names are incompatible with these
constructs, so you cannot use them in AUTO mode queries. Additionally, FOR XML
itself is incompatible with COMPUTE, so you can't use it in FOR XML queries of any
kind.
EXPLICIT Mode
If you need more control over the XML that FOR XML produces, EXPLICIT mode is
more flexible (and therefore more complicated to use) than either RAW mode or
AUTO mode. EXPLICIT mode queries define XML documents in terms of a "universal
table": a mechanism for returning a result set from SQL Server that describes what
you want the document to look like rather than composing the document itself. A
universal table is just a SQL Server result set with special column headings that tell
the server how to produce an XML document from your data. Think of it as a set-
oriented method of making an API call and passing parameters to it. You use the
facilities available in Transact-SQL to make the call and pass it parameters.
A universal table consists of one column for each table column that you want to
return in the XML fragment, plus two additional columns: Tag and Parent. Tag is a
positive integer that uniquely identifies each tag to be returned in the
document; Parent establishes parent-child relationships between tags.
The other columns in a universal table, the ones that correspond to the data you
want to include in the XML fragment, have special names that actually consist of
multiple segments delimited by exclamation points (!). These special column names
pass muster with SQL Server's parser and provide specific instructions regarding the
XML fragment to produce. They have the following format:
Element!Tag!Attribute!Directive
The first thing you need to do to build an EXPLICIT mode query is to determine the
layout of the XML document you want to end up with. Once you know this, you can
work backward from there to build a universal table that will produce the desired
format. For example, let's say we want a simple customer list based on the
Northwind Customers table that returns the customer ID as an attribute and the
company name as an element. The XML fragment we're after might look like this:
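<Customers CustomerId="ALFKI">Alfreds Futterkiste</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados</Customers>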
Listing 18.7 shows a Transact-SQL query that returns a universal table that specifies
this layout.
Listing 18.7
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1]
FROM Customers
(Results abridged)
The first two columns are the extra columns I mentioned earlier. Tag specifies an
identifier for the tag we want to produce. Since we want to produce only one
element per row, we hard-code this to 1. The same is true of Parent: there's only
one element and a top-level element doesn't have a parent, so we return NULL for
Parent in every row.
By itself, this table accomplishes nothing. We have to add FOR XML EXPLICIT to the
end of it in order for the odd column names to have any special meaning. Add FOR
XML EXPLICIT to the query and run it from Query Analyzer. Listing 18.8 shows what
you should see.
Listing 18.8
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1]
FROM Customers
FOR XML EXPLICIT
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados
</Customers>
<Customers CustomerId="WHITC">White Clover Markets</Customers>
<Customers CustomerId="WILMK">Wilman Kala</Customers>
<Customers CustomerId="WOLZA">Wolski Zajazd</Customers>
As you can see, each CustomerId value is returned as an attribute, and each
CompanyName is returned as the element data for the Customers element, just as
we specified.
Directives
The fourth part of the multivalued column headings supported by EXPLICIT mode
queries is the directive segment. You use it to further control how data is
represented in the resulting XML fragment. As Table 18.2 illustrates, the directive
segment supports eight values. Listing 18.9, for example, uses the element directive
to return the ContactName column as a nested subelement.
Listing 18.9
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
ContactName AS [Customers!1!ContactName!element]
FROM Customers
FOR XML EXPLICIT
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste
<ContactName>Maria Anders</ContactName>
</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y
<ContactName>Ana Trujillo</ContactName>
</Customers>
<Customers CustomerId="ANTON">Antonio Moreno Taquer'a
<ContactName>Antonio Moreno</ContactName>
</Customers>
<Customers CustomerId="AROUT">Around the Horn
<ContactName>Thomas Hardy</ContactName>
</Customers>
<Customers CustomerId="BERGS">Berglunds snabbköp
<ContactName>Christina Berglund</ContactName>
</Customers>
<Customers CustomerId="WILMK">Wilman Kala
<ContactName>Matti Karttunen</ContactName>
</Customers>
<Customers CustomerId="WOLZA">Wolski Zajazd
<ContactName>Zbyszek Piestrzeniewicz</ContactName>
</Customers>
Listing 18.10 shows the same query with the xml directive in place of the element
directive.
Listing 18.10
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
ContactName AS [Customers!1!ContactName!xml]
FROM Customers
FOR XML EXPLICIT
The xml directive (bolded) causes the column to be returned without encoding any
special characters it contains.
Thus far, we've been listing the data from a single table, so our EXPLICIT queries
haven't been terribly complex. That would still be true even if we queried multiple
tables as long as we didn't mind repeating the data from each table in each top-level
element in the XML fragment. Just as the column values from joined tables are often
repeated in the result sets of Transact-SQL queries, we could create an XML
fragment that contained data from multiple tables repeated in each element.
However, that wouldn't be the most efficient way to represent the data in XML.
Remember: XML supports hierarchical relationships between elements. You can
establish these hierarchies by using EXPLICIT mode queries and T-SQL UNIONs.
Listing 18.11 provides an example.
Listing 18.11
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
NULL AS [Orders!2!OrderId],
NULL AS [Orders!2!OrderDate!element]
FROM Customers
UNION
SELECT 2 AS Tag,
1 AS Parent,
CustomerId,
NULL,
OrderId,
OrderDate
FROM Orders
ORDER BY [Customers!1!CustomerId], [Orders!2!OrderDate!element]
FOR XML EXPLICIT
This query does several interesting things. First, it links the Customers and Orders
tables using the CustomerId column they share. Notice the third column in each
SELECT statement: it returns the CustomerId column from each table. The Tag and
Parent columns establish the details of the relationship between the two tables. The
Tag and Parent values in the second query link it to the first. They establish that
Order records are children of Customer records. Lastly, note the ORDER BY clause. It
arranges the elements in the table in a sensible fashion: first by CustomerId and
second by the OrderDate of each Order. Listing 18.12 shows the result set.
Listing 18.12
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste
<Orders OrderId="10643">
<OrderDate>1997-08-25T00:00:00</OrderDate>
</Orders>
<Orders OrderId="10692">
<OrderDate>1997-10-03T00:00:00</OrderDate>
</Orders>
<Orders OrderId="10702">
<OrderDate>1997-10-13T00:00:00</OrderDate>
</Orders>
<Orders OrderId="10835">
<OrderDate>1998-01-15T00:00:00</OrderDate>
</Orders>
<Orders OrderId="10952">
<OrderDate>1998-03-16T00:00:00</OrderDate>
</Orders>
<Orders OrderId="11011">
<OrderDate>1998-04-09T00:00:00</OrderDate>
</Orders>
</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados
<Orders OrderId="10308">
<OrderDate>1996-09-18T00:00:00</OrderDate>
</Orders>
<Orders OrderId="10625">
<OrderDate>1997-08-08T00:00:00</OrderDate>
</Orders>
<Orders OrderId="10759">
<OrderDate>1997-11-28T00:00:00</OrderDate>
</Orders>
<Orders OrderId="10926">
<OrderDate>1998-03-04T00:00:00</OrderDate>
</Orders>
</Customers>
As you can see, each customer's orders are nested within its element.
The hide directive omits a column you've included in the universal table from the
resulting XML document. One use of this functionality is to order the result by a
column that you don't want to include in the XML fragment. When you aren't using
UNION to merge tables, this isn't a problem because you can order by any column
you choose. However, the presence of UNION in a query requires order by columns
to exist in the result set. The hide directive gives you a way to satisfy this
requirement without being forced to return data you don't want to. Listing 18.13
shows an example.
Listing 18.13
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
PostalCode AS [Customers!1!PostalCode!hide],
NULL AS [Orders!2!OrderId],
NULL AS [Orders!2!OrderDate!element]
FROM Customers
UNION
SELECT 2 AS Tag,
1 AS Parent,
CustomerId,
NULL,
NULL,
OrderId,
OrderDate
FROM Orders
ORDER BY [Customers!1!CustomerId], [Orders!2!OrderDate!element],
[Customers!1!PostalCode!hide]
FOR XML EXPLICIT
Notice the hide directive (bolded) that's included in the column 5 heading. It allows
the column to be specified in the ORDER BY clause without actually appearing in the
resulting XML fragment.
CDATA sections may appear anywhere in an XML document that character data may
appear. A CDATA section is used to escape characters that would otherwise be
recognized as markup (e.g., <, >, /, and so on). Thus CDATA sections allow you to
include sections in an XML document that might otherwise confuse the parser. To
render a CDATA section from an EXPLICIT mode query, include the cdata directive, as
demonstrated in Listing 18.14.
Listing 18.14
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId],
CompanyName AS [Customers!1],
Fax AS [Customers!1!!cdata]
FROM Customers
FOR XML EXPLICIT
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Customers CustomerId="ALFKI">Alfreds Futterkiste
<![CDATA[030-0076545]]>
</Customers>
<Customers CustomerId="ANATR">Ana Trujillo Emparedados y helados
<![CDATA[(5) 555-3745]]>
</Customers>
<Customers CustomerId="ANTON">Antonio Moreno Taquer'a
</Customers>
<Customers CustomerId="AROUT">Around the Horn
<![CDATA[(171) 555-6750]]>
</Customers>
<Customers CustomerId="BERGS">Berglunds snabbköp
<![CDATA[0921-12 34 67]]>
</Customers>
As you can see, each value in the Fax column is returned as a CDATA section in the
XML fragment. Note the omission of the attribute name in the cdata column heading
(bolded). This is because attribute names aren't allowed for CDATA sections. Again,
they represent escaped document segments, so the XML parser doesn't process any
attribute or element names they may contain.
The ID, IDREF, and IDREFS data types can be used to represent relational data in an
XML document. Set up in a DTD or XML-Data schema, they establish relationships
between elements. They're handy in situations where you need to exchange
complex data and want to minimize the amount of data duplication in the document.
EXPLICIT mode queries can use the id, idref, and idrefs directives to specify
relational fields in an XML document. Naturally, this approach works only if a schema
is used to define the document and identify the columns used to establish links
between entities. FOR XML's XMLDATA option provides a means of generating an
inline schema for its XML fragment. In conjunction with the id directives, it can
identify relational fields in the XML fragment. Listing 18.15 gives an example.
Listing 18.15
SELECT 1 AS Tag,
NULL AS Parent,
CustomerId AS [Customers!1!CustomerId!id],
CompanyName AS [Customers!1!CompanyName],
NULL AS [Orders!2!OrderID],
NULL AS [Orders!2!CustomerId!idref]
FROM Customers
UNION
SELECT 2,
NULL,
NULL,
NULL,
OrderID,
CustomerId
FROM Orders
ORDER BY [Orders!2!OrderID]
FOR XML EXPLICIT, XMLDATA
XML_F52E2B61-18A1-11d1-B105-00805F49916B
------------------------------------------------------------------
<Schema name="Schema2" xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ElementType name="Customers" content="mixed" model="open">
<AttributeType name="CustomerId" dt:type="id"/>
<AttributeType name="CompanyName" dt:type="string"/>
<attribute type="CustomerId"/>
<attribute type="CompanyName"/>
</ElementType>
<ElementType name="Orders" content="mixed" model="open">
<AttributeType name="OrderID" dt:type="i4"/>
<AttributeType name="CustomerId" dt:type="idref"/>
<attribute type="OrderID"/>
<attribute type="CustomerId"/>
</ElementType>
</Schema>
<Customers xmlns="x-schema:#Schema2" CustomerId="ALFKI"
CompanyName="Alfreds Futterkiste"/>
<Customers xmlns="x-schema:#Schema2" CustomerId="ANATR"
CompanyName="Ana Trujillo Emparedados y helados"/>
<Customers xmlns="x-schema:#Schema2" CustomerId="ANTON"
CompanyName="Antonio Moreno Taquer'a"/>
<Customers xmlns="x-schema:#Schema2" CustomerId="AROUT"
CompanyName="Around the Horn"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10248"
CustomerId="VINET"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10249"
CustomerId="TOMSP"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10250"
CustomerId="HANAR"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10251"
CustomerId="VICTE"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10252"
CustomerId="SUPRD"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10253"
CustomerId="HANAR"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10254"
CustomerId="CHOPS"/>
<Orders xmlns="x-schema:#Schema2" OrderID="10255"
CustomerId="RICSU"/>
Note the use of the id and idref directives in the CustomerId columns of the
Customers and Orders tables (bolded). These directives link the two tables by using
the CustomerId column they share.
If you examine the XML fragment returned by the query, you'll see that it starts off
with the XML-Data schema that the XMLDATA option generated. This schema is then
referenced in the XML fragment that follows.
SQLXML also supports the notion of offloading to the client the work of translating a
result set into XML. This functionality is accessible via the SQLXML managed classes,
XML templates, a virtual directory configuration switch, and the SQLXMLOLEDB
provider. Because it requires the least amount of setup, I'll cover client-side FOR XML
using SQLXMLOLEDB here. The underlying technology is the same regardless of the
mechanism used.
SQLXMLOLEDB serves as a layer between a client (or middle-tier) app and SQL
Server's native SQLOLEDB provider. The Data Source property of the SQLXMLOLEDB
provider specifies the OLE DB provider through which it executes queries; currently
only SQLOLEDB is allowed.
SQLXMLOLEDB is not a rowset provider. In order to use it from ADO, you must access
it via ADO's stream mode. I'll show you some code in just a minute that illustrates
this.
You perform client-side FOR XML processing using SQLXMLOLEDB by following these
general steps.
1. Connect via the SQLXMLOLEDB provider and create an ADO Command object on
the connection.
2. Set the Command object's ClientSideXML property to True.
3. Create and open an ADO stream object and associate it with your Command
object's Output Stream property.
4. Execute a FOR XML EXPLICIT, FOR XML RAW, or FOR XML NESTED Transact-SQL
query via your Command object, specifying the adExecuteStream option in
your call to Execute.
Listing 18.16 illustrates. (You can find the source code for this app in the
CH18\forxml_clientside subfolder on this book's CD.)
Listing 18.16
' Declarations assumed by this excerpt (the form supplies Text1, Text2, and Text3)
Dim oConn As New ADODB.Connection
Dim oComm As New ADODB.Command
Dim stOutput As New ADODB.Stream

stOutput.Open
oConn.Open (Text3.Text)
oComm.ActiveConnection = oConn
oComm.Properties("ClientSideXML") = "True"
If Len(Text1.Text) = 0 Then
Text1.Text = _
"select * from pubs..authors FOR XML NESTED"
End If
oComm.CommandText = Text1.Text
oComm.Properties("Output Stream") = stOutput
oComm.Properties("xml root") = "Root"
oComm.Execute , , adExecuteStream
Text2.Text = stOutput.ReadText(adReadAll)
stOutput.Close
oConn.Close
As you can see, most of the action here revolves around the ADO Command object.
We set its ClientSideXML property to True and its Output Stream property to an ADO
stream object we created before calling its Execute method.
Note the use of the FOR XML NESTED clause. The NESTED option is specific to client-
side FOR XML processing; you can't use it in server-side queries. It's very much like
FOR XML AUTO but has some minor differences. For example, when a FOR XML
NESTED query references a view, the names of the view's underlying base tables are
used in the generated XML. The same is true for table aliases�their base names are
used in the XML that's produced. Using FOR XML AUTO in a client-side FOR XML
query causes the query to be processed on the server rather than the client, so use
NESTED when you want similar functionality to FOR XML AUTO on the client.
Given our previous investigation into whether MSXML is involved in the production of
server-side XML (Exercise 18.2), you might be wondering whether it's used by
SQLXML's client-side FOR XML processing. It isn't. Again, you can attach a debugger
(in this case, to the forxml_clientside app) to see this for yourself. You will see
SQLXMLn.DLL loaded into the app's process space the first time you run the query.
This DLL is where the SQLXMLOLEDB provider resides and is where SQLXML's client-
side FOR XML processing occurs.
OPENXML
OPENXML is a built-in Transact-SQL function that can return an XML document as a
rowset. In conjunction with sp_xml_preparedocument and sp_xml_removedocument,
OPENXML allows you to break down (or shred) nonrelational XML documents into
relational pieces that can be inserted into tables.
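The basic calling pattern looks like this (a minimal sketch; the document, element
names, and column definitions are illustrative rather than taken from the chapter's
listings):

DECLARE @hDoc int

-- Parse the document and get a handle to its in-memory representation
EXEC sp_xml_preparedocument @hDoc OUTPUT,
'<customers><customer id="ALFKI" name="Alfreds Futterkiste"/></customers>'

-- Shred the document into a rowset (flag 1 = attribute-centric mapping)
SELECT *
FROM OPENXML(@hDoc, '/customers/customer', 1)
WITH (id   varchar(10)  '@id',
      name nvarchar(40) '@name')

-- Release the memory held by the parsed document
EXEC sp_xml_removedocument @hDoc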
The most expedient way to determine where OPENXML is implemented is to run SQL
Server under a debugger, stop it in the middle of an OPENXML call, and inspect the
call stack. That would tell us the module in which it resides. Since we don't know the names of the classes
or functions that implement OPENXML, we can't easily set a breakpoint to
accomplish this. Instead, we will have to just be quick and/or lucky enough to stop
the debugger in the right place if we want to use this approach to find out the
module in which OPENXML is implemented. This is really easier said than done. Even
with complicated documents, OPENXML returns fairly quickly, so breaking in with a
debugger while it's in progress could prove pretty elusive.
Another way to accomplish the same thing would be to force OPENXML to error and
have a breakpoint set up in advance to stop in SQL Server's standard error reporting
routine. From years of working with the product and seeing my share of access
violations and stack dumps, I know that ex_raise is a central error-reporting routine
for the server. Not all errors go through ex_raise, but many of them do, so it's worth
setting a breakpoint in ex_raise and forcing OPENXML to error to see whether we can
get a call stack and ascertain where OPENXML is implemented. Exercise 18.3 will
take you through the process of doing exactly that.
1. Restart your SQL Server, preferably from a console since we will be attaching
to it with WinDbg. This should be a test or development system, and, ideally,
you should be its only user.
3. Attach to SQL Server using WinDbg. (Press F6 and select sqlservr.exe from the
list of running tasks; if you have multiple instances, be sure to select the right
one.)
4. Set a breakpoint in ex_raise by typing the following at the WinDbg command
prompt and pressing Enter:
bp sqlservr!ex_raise
5. Type g and press Enter so that SQL Server can continue to run.
6. From Query Analyzer, run a query that forces OPENXML to fail (for example, by
passing it an invalid document handle).
7. Query Analyzer should appear to hang because the breakpoint you set in
WinDbg has been hit. Switch back to WinDbg and type kv at the command
prompt and press Enter. This will dump the call stack. Your stack should look
something like this (I've removed everything but the function names):
sqlservr!ex_raise
sqlservr!CXMLDocsList::XMLMapFromHandle+0x3f
sqlservr!COpenXMLRange::GetRowset+0x14d
sqlservr!CQScanRmtScan::OpenConnection+0x141
sqlservr!CQScanRmtBase::Open+0x18
sqlservr!CQueryScan::Startup+0x10d
sqlservr!CStmtQuery::ErsqExecuteQuery+0x26b
sqlservr!CStmtSelect::XretExecute+0x229
sqlservr!CMsqlExecContext::ExecuteStmts+0x3b9
sqlservr!CMsqlExecContext::Execute+0x1b6
sqlservr!CSQLSource::Execute+0x357
sqlservr!language_exec+0x3e1
sqlservr!process_commands+0x10e
UMS!ProcessWorkRequests+0x272
UMS!ThreadStartRoutine+0x98 (FPO: [EBP 0x00bd6878] [1,0,4])
MSVCRT!_beginthread+0xce
KERNEL32!BaseThreadStart+0x52 (FPO: [Non-Fpo])
8. This call stack tells us a couple of things. First, it tells us that OPENXML is
implemented directly by the server itself. It resides in sqlservr.exe, SQL
Server's executable. Second, it tells us that a class named COpenXMLRange is
responsible for producing the rowset that the T-SQL OPENXML function returns.
9. Type q and hit Enter to stop debugging. You will need to restart your SQL
Server.
By reviewing this call stack, we can deduce how OPENXML works. It comes into the
server via a language or RPC event (our code obviously came into the server as a
language event; note the language_exec entry in the call stack) and eventually
results in a call to the GetRowset method of the COpenXMLRange class. We can
assume that GetRowset accesses the DOM document previously created via the call
to sp_xml_preparedocument and turns it into a two-dimensional matrix that can be
returned as a rowset, thus finishing up the work of the OPENXML function.
Now that we know the name of the class and method behind OPENXML, we could set
a new breakpoint in COpenXMLRange::GetRowset, pass a valid document handle
into OPENXML, and step through the disassembly for the method when the
breakpoint is hit. However, we've got a pretty good idea of how OPENXML works;
there's little to be learned about OPENXML's architecture from stepping through the
disassembly at this point.
DECLARE @hDoc int
EXEC sp_xml_preparedocument @hDoc OUTPUT,
'<songs>
<song><name>Somebody to Love</name></song>
<song><name>These Are the Days of Our Lives</name></song>
<song><name>Bicycle Race</name></song>
<song><name>Who Wants to Live Forever</name></song>
<song><name>I Want to Break Free</name></song>
<song><name>Friends Will Be Friends</name></song>
</songs>'
-- The SELECT below is an assumed reconstruction (flag 2 = element-centric mapping);
-- it produces the single-column rowset shown in the results
SELECT name
FROM OPENXML(@hDoc, '/songs/song', 2)
WITH (name varchar(50))
EXEC sp_xml_removedocument @hDoc

(Results abridged)

name
-----------------------------------------------------------
Somebody to Love
Bicycle Race
DECLARE @hDoc int
EXEC sp_xml_preparedocument @hDoc OUTPUT,
'<songs>
<artist>
<name>Johnny Hartman</name>
<song><name>It Was Almost Like a Song</name></song>
<song><name>I See Your Face Before Me</name></song>
<song><name>For All We Know</name></song>
<song><name>Easy Living</name></song>
</artist>
<artist>
<name>Harry Connick, Jr.</name>
<song><name>Sonny Cried</name></song>
<song><name>A Nightingale Sang in Berkeley Square</name></song>
<song><name>Heavenly</name></song>
<song><name>You Didn''t Know Me When</name></song>
</artist>
</songs>'

(Results abridged)

artist                      song
--------------------------- --------------------------------------
(Results abridged)

id   parentid   nodetype   localname
---- ---------- ---------- ----------
4    2          1          song
5    4          1          name
22   5          3          #text
6    2          1          song
7    6          1          name
23   7          3          #text
8    2          1          song
9    8          1          name
24   9          3          #text
10   2          1          song
11   10         1          name
25   11         3          #text
14   12         1          song
15   14         1          name
26   15         3          #text
16   12         1          song
17   16         1          name
27   17         3          #text
18   12         1          song
19   18         1          name
28   19         3          #text
20   12         1          song
21   20         1          name
29   21         3          #text
USE tempdb
GO
CREATE TABLE Artists
(ArtistId varchar(5),
 Name varchar(30))
GO
CREATE TABLE Songs
(ArtistId varchar(5),
 SongId int,
 Name varchar(50))
GO
SELECT id, name
FROM OPENXML(@hdoc, '/songs/artist/song', 1)
WITH (artistid varchar(5) '../@id',
      id int '@id',
      name varchar(50) '@name')
EXEC sp_xml_removedocument @hDoc
GO

(Results abridged)

ArtistId Name
-------- ------------------------------

INSERT Artists
GO
SELECT a.ArtistId, s.Name
FROM Artists a JOIN Songs s
  ON (a.ArtistId=s.ArtistId)
GO
DROP TABLE Artists, Songs
GO

(Results abridged)

ArtistId Name
-------- ------------------------------
Configuring a virtual directory allows you to work with SQL Server's XML features via
HTTP. You use a virtual directory to establish a link between a SQL Server database
and a segment of a URL. It provides a navigation path from the root directory on
your Web server to a database on your SQL Server.
SQL Server's ability to publish data over HTTP is made possible through SQLISAPI, an
Internet Server API (ISAPI) extension that ships with the product. SQLISAPI uses
SQLOLEDB, SQL Server's native OLE DB provider, to access the database associated
with a virtual directory and return results to the client.
Client applications have four methods of requesting data from SQL Server over HTTP.
These can be broken down into two broad types: those more suitable for private
intranet access because of security concerns, and those safe to use on the public
Internet.
Private Intranet
1. URL queries that pass Transact-SQL directly via the sql parameter
2. XML templates posted to the server via the template parameter
Public Internet
3. Server-side XML templates referenced through virtual names
4. XPath queries against annotated mapping schemas
Due to their open-ended nature, methods 1 and 2 could pose security risks over the
public Internet but are perfectly valid on corporate or private intranets. Normally,
Web applications use server-side schemas and query templates to make XML data
accessible to the outside world in a controlled fashion.
Load the Configure IIS Support utility in the SQLXML folder under Start | Programs.
You should see the IIS servers configured on the current machine. Click the plus sign
to the left of your server name to expand it. (If your server isn't listed, for example,
if it's a remote server, right-click the IIS Virtual Directory Manager node and select
Connect to connect to your server.) To add a new virtual directory, right-click the
Default Web Site node and select New | Virtual Directory. You should then see the
New Virtual Directory Properties dialog.
Specifying a Virtual Directory Name and Path
The Virtual Directory Name entry box is where you specify the name of the new
virtual directory. This is the name that users will include in a URL to access the data
exposed by the virtual directory, so it's important to make it descriptive. A common
convention is to name virtual directories after the databases they reference. To work
through the rest of the examples in the chapter, specify Northwind as the name of
the new virtual directory.
Though Local Path will sometimes go unused, it's required nonetheless. In a normal
ASP or HTML application, this would be the path where the source files themselves
reside. In SQLISAPI applications, this folder doesn't necessarily need to contain
anything, but it must exist. On NTFS partitions, you must also make sure that users
have at least read access to this folder in order to use the virtual directory. You
configure which user account will be used to access the application (and thus will
need access to the folder) on the dialog's Security page.
Click the Security tab to select the authentication mode you'd like to use. You can
use a specific user account, Windows Integrated Authentication, or Basic (clear text)
Authentication. Select the option that matches your usage scenario most closely;
Windows Integrated Authentication will likely be the best choice for working through
the demos in this chapter.
Next, click the Data Source page tab. This is where you set the SQL Server and the
database that the virtual directory references. Select your SQL Server from the list
and specify Northwind as the database name.
Go to the Virtual Names table and set up two virtual names, templates and schemas.
Create two folders under Northwind named Templates and Schemas so that each of
these virtual names can have its own local folder. Set the type for schemas to
schema and the type for templates to template. Each of these provides a navigation
path from a URL to the files in its local folder. We'll use them later.
The last dialog page we're concerned with is the Settings page. Click it, then make
sure every checkbox on it is checked. We want to allow all of these options so that
we may test them later in the chapter. The subsections below provide brief
descriptions of each of the options on the Settings page.
Allow URL Queries
When this option is enabled, you can execute queries posted to a URL (via an HTTP
GET or POST command) as sql= or template= parameters. URL queries allow users
to specify a complete Transact-SQL query via a URL. Special characters are replaced
with placeholders, but, essentially, the query is sent to the server as is, and its
results are returned over HTTP. Note that this option allows users to execute
arbitrary queries against the virtual root and database, so you shouldn't enable it for
anything but intranet use. Go ahead and enable it for now so that we can try it out
later.
Selecting this option also disables the Allow template=… containing updategrams only
option because you can always post XML templates containing updategrams when it
is selected. The Allow template=… containing updategrams only option permits only
XML templates that contain updategrams to be posted to a URL. Because it keeps
SQL and XPath queries out of posted templates, it provides some limited security.
Allow Template Queries
Template queries are by far the most popular method of retrieving XML data from
SQL Server over HTTP. XML documents that store query templates (generic
parameterized queries with placeholders for parameters) reside on the server and
provide controlled access to the underlying data. The results from template queries
are returned over HTTP to the user.
Allow XPath
When Allow XPath is enabled, users can use a subset of the XPath language to
retrieve data from SQL Server based on an annotated schema. Annotated schemas
are stored on a Web server as XML documents and map XML elements and
attributes to the data in the database referenced by a virtual directory. XPath queries
let the user select which of the data defined in an annotated schema to return.
Allow POST
HTTP supports the notion of sending data to a Web server via its POST command.
When Allow POST is enabled, you can post a query template (usually implemented
as a hidden form field on a Web page) to a Web server via HTTP. This causes the
query to be executed, and the results are returned to the client.
As I mentioned earlier, the open-endedness of this usually limits its use to private
intranets. Malicious users could form their own templates and post them over HTTP
to retrieve data to which they aren't supposed to have access or, worse yet, make
changes to it.
Run on the Client
This option specifies that XML formatting (e.g., FOR XML) is to be done on the client
side. Enabling it allows you to offload to the client the work of translating a rowset
into XML for HTTP queries.
Expose Runtime Errors as HTTP Error
This option controls whether query errors in an XML template are returned in the
HTTP header or as part of the generated XML document. When this option is enabled
and a query in a template fails, HTTP error 512 is returned and error descriptions are
returned in the HTTP header. When it's disabled and a template query fails, the HTTP
success code, 200, is returned, and the error descriptions are returned as processing
instructions inside the XML document.
Enable all the options on the Settings page except the last two described above and
click OK to create your new virtual directory.
TIP: A handy option on the Advanced tab is Disable caching of mapping schemas.
Normally, mapping schemas are cached in memory the first time they're used and
accessed from the cache thereafter. While developing a mapping schema, you'll
likely want to disable this so that the schema will be reloaded each time you test it.
URL Queries
The facility that permits SQL Server to be queried via HTTP resides in SQLXML's ISAPI
extension DLL, SQLISn.DLL, commonly referred to as SQLISAPI. Although the
Configure IIS Support tool provides a default, you can configure the exact extension
DLL that's used when you set up a virtual directory for use by HTTP queries.
If you attach to IIS (the executable name is inetinfo.exe) with WinDbg prior to
running any HTTP queries, you'll see ModLoad messages for SQLISn.DLL as well as
one or two other DLLs. An ISAPI extension DLL is not loaded until the first time it's
called.
Architecturally, here's what happens when you execute a basic URL query.
1. You submit a URL containing the query from your browser.
2. It travels from your browser to the Web server as an HTTP GET request.
3. The virtual directory specified in your query indicates which extension DLL
should be called to process the URL. IIS loads the appropriate extension and
passes your query to it.
4. The SQLISAPI extension passes the query to SQL Server via the SQLOLEDB
provider associated with the virtual directory.
5. SQL Server executes the query and returns the results to the extension.
6. The ISAPI extension returns the result data to the Web server, which then, in
turn, sends it to the client browser that requested it. Thus, the original HTTP
GET request is completed.
Using URL Queries
The easiest way to test the virtual directory you built earlier is to submit a URL query that uses it from an XML-enabled browser such as Internet
Explorer. URL queries take this form:
http://localhost/Northwind?sql=SELECT+*+FROM+Customers+FOR+XML+AUTO&root=Customers
NOTE: As with all URLs, the URL listed above should be typed on one line. Page width restrictions may force some of the URLs listed here
to span multiple lines, but a URL should always be typed on a single line.
Here, localhost is the name of the Web server. It could just as easily be a fully qualified DNS domain name such as http://www.khen.com.
Northwind is the virtual directory name we created earlier.
A question mark separates the URL from its parameters. Multiple parameters are separated by ampersands. The first parameter we specify is
named sql. It specifies the query to run. The second parameter specifies the name of the root element for the XML document that will be
returned. By definition, you get just one of these per document. Failure to specify a root element results in an error if your query returns more
than one top-level element.
To see how this works, submit the URL shown in Listing 18.25 from your Web browser. (Be sure to change localhost to the correct name of your
Web server if it resides on a different machine.)
Listing 18.25
http://localhost/Northwind?sql=SELECT+*+FROM+Customers+WHERE
+CustomerId='ALFKI'+FOR+XML+AUTO
(Results)
Notice that we left off the root element specification. Look at what happens when we bring back more than one row (Listing 18.26).
Listing 18.26
http://localhost/Northwind?sql=SELECT+*+FROM+Customers+
WHERE+CustomerId='ALFKI'+OR+CustomerId='ANATR'+FOR+XML+AUTO
(Results abridged)
Since we're returning multiple top-level elements (two, to be exact), our XML document has two root elements named Customers, which,
of course, isn't allowed since it isn't well-formed XML. To remedy the situation, we need to specify a root element. This element can be named
almost anything; it serves only to wrap the rows returned by FOR XML so that we have a well-formed document. Listing 18.27 shows an example.
Listing 18.27
http://localhost/Northwind?sql=SELECT+*+FROM+Customers+WHERE
+CustomerId='ALFKI'+OR+CustomerId='ANATR'+FOR+XML+AUTO
&root=CustomerList
(Results)
You can also supply the root element yourself as part of the sql parameter, as shown in Listing 18.28.
Listing 18.28
http://localhost/Northwind?sql=SELECT+'<CustomerList>';
SELECT+*+FROM+Customers+WHERE+CustomerId='ALFKI'+OR
+CustomerId='ANATR'+FOR+XML+AUTO;
SELECT+'</CustomerList>';
(Results formatted)
<CustomerList>
<Customers CustomerID="ALFKI" CompanyName="Alfreds Futterkiste"
ContactName="Maria Anders" ContactTitle="Sales Representative"
Address="Obere Str. 57" City="Berlin" PostalCode="12209"
Country="Germany" Phone="030-0074321" Fax="030-0076545" />
<Customers CustomerID="ANATR" CompanyName=
"Ana Trujillo Emparedados y helados" ContactName="Ana Trujillo"
ContactTitle="Owner" Address="Avda. de la Constituci�n 2222"
City="México D.F." PostalCode="05021" Country="Mexico"
Phone="(5) 555-4729" Fax="(5) 555-3745" />
</CustomerList>
The sql parameter of this URL actually contains three queries. The first one generates an opening tag for the root element. The second is the
query itself, and the third generates a closing tag for the root element. We separate the individual queries with semicolons.
As you can see, FOR XML returns XML document fragments, so you'll need to provide a root element in order to produce a well-formed document.
Special Characters
Certain characters that are perfectly valid in Transact-SQL can cause problems in URL queries because they have special meanings in a URL.
You've already noticed that we're using the plus symbol (+) to signify a space character. Obviously, this precludes the direct use of the plus
symbol itself. Instead, you must encode characters that have special meaning within a URL query so that SQLISAPI can properly translate them
before passing the query on to SQL Server. Encoding a special character amounts to specifying a percent sign (%) followed by the character's
ASCII value in hexadecimal. Table 18.3 lists the special characters recognized by SQLISAPI and their corresponding values. For example,
consider this query:
http://localhost/Northwind?sql=SELECT+'<CustomerList>';SELECT+*+FROM+Customers+
WHERE+CustomerId+LIKE+'A%25'+FOR+XML+AUTO;SELECT+'</CustomerList>';
This query specifies a LIKE predicate that includes an encoded percent sign (%), Transact-SQL's wildcard symbol. Hexadecimal 25 (decimal 37) is
the ASCII value of the percent sign, so we encode it as %25.
Table 18.3. Special characters recognized by SQLISAPI and their hexadecimal values

Character   Value
---------   -----
+           2B
&           26
?           3F
%           25
/           2F
#           23
Style Sheets
In addition to the sql and root parameters, a URL query can also include the xsl parameter, which specifies an XSL style sheet to use to
translate the XML document that's returned by the query into a different format. The most common use of this feature is to translate the
document into HTML. This allows you to view the document using browsers that aren't XML aware and gives you more control over how it's
rendered in those that are. Here's a URL query that includes the xsl parameter:
http://localhost/Northwind?
sql=SELECT+CustomerId,+CompanyName+FROM+Customers+FOR+XML+AUTO&root=CustomerList&xsl=CustomerList.xsl
Listing 18.29 shows the XSL style sheet it references and the output produced.
Listing 18.29
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<HTML>
<BODY>
<TABLE border="1">
<TR>
<TD><B>Customer ID</B></TD>
<TD><B>Company Name</B></TD>
</TR>
<xsl:for-each select="CustomerList/Customers">
<TR>
<TD>
<xsl:value-of select="@CustomerId"/>
</TD>
<TD>
<xsl:value-of select="@CompanyName"/>
</TD>
</TR>
</xsl:for-each>
</TABLE>
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
(Results abridged)
By default, SQLISAPI returns the results of a URL query with the appropriate content type specified in the HTTP header so that a browser can
render them properly. When FOR XML is used in the query, this is text/xml unless the xsl parameter specifies a style sheet that translates the
XML document into HTML. In that case, text/html is returned.
You can force the content type using the contenttype URL query parameter, like this:
http://localhost/Northwind?
sql=SELECT+CustomerId,+CompanyName+FROM+Customers+FOR+XML+AUTO&root=CustomerList&xsl=CustomerList.xsl&contenttype=text/xml
Here, we've specified the style sheet from the previous example in order to cause the content type to default to text/html. Then we override that
default by specifying a contenttype parameter of text/xml. The result is an XML document containing the translated result set, as shown in Listing
18.30.
Listing 18.30
<HTML>
<BODY>
<TABLE border="1">
<TR>
<TD>
<B>Customer ID</B>
</TD>
<TD>
<B>Company Name</B>
</TD>
</TR>
<TR>
<TD>ALFKI</TD>
<TD>Alfreds Futterkiste</TD>
</TR>
<TR>
<TD>ANATR</TD>
<TD>Ana Trujillo Emparedados y helados</TD>
</TR>
<TR>
<TD>WILMK</TD>
<TD>Wilman Kala</TD>
</TR>
<TR>
<TD>WOLZA</TD>
<TD>Wolski Zajazd</TD>
</TR> </TABLE> </BODY>
</HTML>
So, even though the document consists of well-formed HTML, it's rendered as an XML document because we've forced the content type to text/xml.
Non-XML Results
Being able to specify the content type comes in particularly handy when working with XML fragments in an XML-aware browser. As I mentioned
earlier, executing a FOR XML query with no root element results in an error. You can, however, work around this by forcing the content type,
like this:
http://localhost/Northwind?
sql=SELECT+*+FROM+Customers+WHERE+CustomerId='ALFKI'+OR+CustomerId='ANATR'+FOR+XML+AUTO&contenttype=text/html
If you load this URL in a browser, you'll probably see a blank page because most browsers ignore tags that they don't understand. You can,
however, view the source of the Web page, and you'll see an XML fragment returned as you'd expect. This is handy in situations where you're
communicating with SQLISAPI over HTTP from outside of a browser, from an application of some sort. You could return the XML fragment to the
client, then use client-side logic to apply a root element and/or process the XML further.
SQLISAPI also allows you to omit the FOR XML clause in order to return a single column from a table, view, or table-valued function as a text
stream, as shown in Listing 18.31.
Listing 18.31
http://localhost/Northwind?sql=SELECT+CAST(CustomerId+AS+
char(10))+AS+CustomerId+FROM+Customers+ORDER+BY+CustomerId
&contenttype=text/html
(Results)
ALFKI ANATR ANTON AROUT BERGS BLAUS BLONP BOLID BONAP BOTTM BSBEV
CACTU CENTC CHOPS COMMI CONSH DRACD DUMON EASTC ERNSH FAMIA FISSA
FOLIG FOLKO FRANK FRANR FRANS FURIB GALED GODOS GOURL GREAL GROSR
HANAR HILAA HUNGC HUNGO ISLAT KOENE LACOR LAMAI LAUGB LAZYK LEHMS
LETSS LILAS LINOD LONEP MAGAA MAISD MEREP MORGK NORTS OCEAN OLDWO
OTTIK PARIS PERIC PICCO PRINI QUEDE QUEEN QUICK RANCH RATTC REGGC
RICAR RICSU ROMEY SANTG SAVEA SEVES SIMOB SPECD SPLIR SUPRD THEBI
THECR TOMSP TORTU TRADH TRAIH VAFFE VICTE VINET WANDK WARTH WELLI
WHITC WILMK WOLZA
Note that SQLISAPI doesn't support returning multicolumn results this way. That said, this is still a handy way to quickly return a simple,
single-column result.
Stored Procedures
You can execute stored procedures via URL queries just as you can other types of Transact-SQL queries. Of course, the procedure needs to return
its result using FOR XML if you intend to process it as XML in the browser or on the client side. The stored procedure in Listing 18.32 does exactly
that.
Listing 18.32
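A procedure along these lines fits the bill (a minimal sketch; the column list, parameter types, and defaults are assumptions based on the URL
query in Listing 18.33):

CREATE PROC ListCustomersXML
  @CustomerId nvarchar(10) = '%',
  @CompanyName nvarchar(40) = '%'
AS
SELECT CustomerId, CompanyName
FROM Customers
WHERE CustomerId LIKE @CustomerId
  AND CompanyName LIKE @CompanyName
FOR XML AUTO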
Once your procedure correctly returns its results in XML format, you can call it from a URL query using the Transact-SQL EXEC command. Listing
18.33 shows an example of a URL query that calls a stored procedure using EXEC.
Listing 18.33
http://localhost/Northwind?sql=EXEC+ListCustomersXML
+@CustomerId='A%25',@CompanyName='An%25'&root=CustomerList
(Results)
Notice that we specify the Transact-SQL wildcard character "%" by using its encoded equivalent, %25. This is necessary, as I said earlier, because
% has special meaning in a URL query.
TIP: You can also use the ODBC CALL syntax to call a stored procedure from a URL query. This executes the procedure via an RPC call to the
server, which is generally faster and more efficient than normal T-SQL language events. On high-volume Web sites, the small difference in
performance this makes can add up quickly.
Here are a couple of URL queries that use the ODBC CALL syntax:
http://localhost/Northwind?sql={CALL+ListCustomersXML}+&root=CustomerList
http://localhost/Northwind?sql={CALL+ListCustomersXML('ALFKI')}+&root=CustomerList
If you submit one of these URLs from your Web browser while you have a Profiler trace running that includes the RPC:Starting event, you'll
see an RPC:Starting event for the procedure. This indicates that the procedure is being called via the more efficient RPC mechanism rather than
via a language event.
See the Template Queries section below for more information on making RPCs from SQLXML.
Template Queries
A safer and more widely used technique for retrieving data over HTTP is to use
server-side XML templates that encapsulate Transact-SQL queries. Because these
templates are stored on the Web server and referenced via a virtual name, the end
users never see the source code. The templates are XML documents based on the
XML-SQL namespace and function as a mechanism for translating a URL into a query
that SQL Server can process. As with plain URL queries, results from template
queries are returned as either XML or HTML.
Listing 18.34
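A template that matches this description looks something like the following sketch (the root element name and column list are assumptions;
note the xml-sql namespace reference on the second line):

<?xml version="1.0"?>
<CustomerList xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <sql:query>
    SELECT CustomerId, CompanyName FROM Customers FOR XML AUTO
  </sql:query>
</CustomerList>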
Note the use of the sql namespace prefix with the query itself. This is made possible
by the namespace reference on the second line of the template (bolded).
Here we're merely returning two columns from the Northwind Customers table, as
we've done several times in this chapter. We include FOR XML AUTO to return the
data as XML. The URL shown in Listing 18.35 uses the template, along with the data
it returns.
Listing 18.35
http://localhost/Northwind/templates/CustomerList.XML
(Results abridged)
Notice that we're using the templates virtual name that we created under the
Northwind virtual directory earlier.
Parameterized Templates
You can also create parameterized XML query templates that permit the user to
supply parameters to the query when it's executed. You define parameters in the
header of the template, which is contained in its sql:header element. Each
parameter is defined using the sql:param tag and can include an optional default
value. Listing 18.36 presents an example.
Listing 18.36
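A sketch of such a parameterized template follows (the root element name and column list are assumptions; the parameter definition mirrors
the client-side template in Listing 18.43):

<?xml version="1.0"?>
<CustomerList xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <sql:header>
    <sql:param name="CustomerId">%</sql:param>
  </sql:header>
  <sql:query>
    SELECT CustomerId, CompanyName
    FROM Customers
    WHERE CustomerId LIKE @CustomerId
    FOR XML AUTO
  </sql:query>
</CustomerList>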
Note the use of sql:param to define the parameter. Here, we give the parameter a
default value of % since we're using it in a LIKE predicate in the query. This means
that we list all customers if no value is specified for the parameter.
Note that SQLISAPI is smart enough to submit a template query to the server as an
RPC when you define query parameters. It binds the parameters you specify in the
template as RPC parameters and sends the query to SQL Server using RPC API calls.
This is more efficient than using T-SQL language events and should result in better
performance, particularly on systems with high throughput.
Listing 18.37 shows a URL that supplies a value for the template's CustomerId
parameter, along with the results it returns.
Listing 18.37
http://localhost/Northwind/Templates/CustomerList2.XML?
CustomerId=A%25
(Results)
Style Sheets
As with regular URL queries, you can specify a style sheet to apply to a template
query. You can do this in the template itself or in the URL that accesses it. Here's an
example of a URL that applies a style sheet to a template query:
http://localhost/Northwind/Templates/CustomerList3.XML?
xsl=Templates/CustomerList3.xsl&contenttype=text/html
Note the use of the contenttype parameter to force the output to be treated as HTML
(bolded). We do this because we know that the style sheet we're applying translates
the XML returned by SQL Server into an HTML table.
We include the relative path from the virtual directory to the style sheet because it's
not automatically located in the Templates folder even though the XML document is
located there. The path specifications for a template query and its parameters are
separate from one another.
As I've mentioned, the XML-SQL namespace also supports specifying the style sheet
in the template itself. Listing 18.38 shows a template that specifies a style sheet.
Listing 18.38
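A sketch of such a template uses the sql:xsl attribute on the root element to name the style sheet (the file name and query are assumptions):

<?xml version="1.0"?>
<CustomerList xmlns:sql="urn:schemas-microsoft-com:xml-sql"
              sql:xsl="CustomerList4.xsl">
  <sql:query>
    SELECT CustomerId, CompanyName FROM Customers FOR XML AUTO
  </sql:query>
</CustomerList>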
Listing 18.39
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/">
<HTML>
<BODY>
<TABLE border="1">
<TR>
<TD><I>Customer ID</I></TD>
<TD><I>Company Name</I></TD>
</TR>
<xsl:for-each select="CustomerList/Customers">
<TR>
<TD><B>
<xsl:value-of select="@CustomerId"/>
</B></TD>
<TD>
<xsl:value-of select="@CompanyName"/>
</TD>
</TR>
</xsl:for-each>
</TABLE>
</BODY>
</HTML>
</xsl:template>
</xsl:stylesheet>
Listing 18.40 shows a URL that uses the template and the style sheet shown in the
previous two listings, along with the results it produces.
Listing 18.40
http://localhost/Northwind/Templates/CustomerList4.XML?
contenttype=text/html
(Results abridged)
Note that, once again, we specify the contenttype parameter in order to force the
output to be treated as HTML. This is necessary because XML-aware browsers such
as Internet Explorer automatically treat the output returned by XML templates as
text/xml. Since the HTML we're returning is also well-formed XML, the browser
doesn't know to render it as HTML unless we tell it to. That's what the contenttype
specification is for; it causes the browser to render the output of the template query
as it would any other HTML document.
TIP: While developing XML templates and similar documents that you then test in a
Web browser, you may run into problems with the browser caching old versions of
documents, even when you click the Refresh button or hit the Refresh key (F5). In
Internet Explorer, you can press Ctrl+F5 to cause a document to be completely
reloaded, even if the browser doesn't think it needs to be. Usually, this resolves
problems with an old version persisting in memory after you've changed the one on
disk.
You can also disable the caching of templates for a given virtual directory by
selecting the Disable caching of templates option on the Advanced page of the
Properties dialog for the virtual directory. I almost always disable all caching while
developing templates and other XML documents.
Applying Style Sheets on the Client
If the client is XML-enabled, you can also apply style sheets to template queries on
the client side. This offloads a bit of the work of the server but requires a separate
roundtrip to download the style sheet to the client. If the client is not XML-enabled,
the style sheet will be ignored, making this approach more suitable to situations
where you know for certain whether your clients are XML-enabled, such as with
private intranet or corporate applications.
Listing 18.41
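A sketch of such a template adds an xml-stylesheet processing instruction at the top of the document (the href value and query are assumptions):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="CustomerList5.xsl"?>
<CustomerList xmlns:sql="urn:schemas-microsoft-com:xml-sql">
  <sql:query>
    SELECT CustomerId, CompanyName FROM Customers FOR XML AUTO
  </sql:query>
</CustomerList>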
Note the xml-stylesheet specification at the top of the document (bolded). This tells
the client-side XML processor to download the style sheet specified in the href
attribute and apply it to the XML document rendered by the template. Listing 18.42
shows the URL and results.
Listing 18.42
http://localhost/Northwind/Templates/CustomerList5.XML?
contenttype=text/html
(Results abridged)
Client-Side Templates
As I mentioned earlier, it's far more popular (and safer) to store templates on your
Web server and route users to them via virtual names. That said, there are times
when letting the user specify templates on the client side is very useful. Specifying
client-side templates in HTML or in an application eliminates the need to set up the
templates, or the virtual names that reference them, in advance. While this is
certainly easier from an administration standpoint, it's potentially unsafe on the
public Internet because it allows clients to specify the code they run against your
SQL Server. Use of this technique should probably be limited to private intranets and
corporate networks.
Listing 18.43
<HTML>
<HEAD>
<TITLE>Customer List</TITLE>
</HEAD>
<BODY>
<FORM action='http://localhost/Northwind' method='POST'>
<B>Customer ID Number</B>
<INPUT type=text name=CustomerId value='AAAAA'>
<INPUT type=hidden name=xsl value=Templates/CustomerList2.xsl>
<INPUT type=hidden name=template value='
<CustomerList xmlns:sql="urn:schemas-microsoft-com:xml-sql">
<sql:header>
<sql:param name="CustomerId">%</sql:param>
</sql:header>
<sql:query>
SELECT CompanyName, ContactName
FROM Customers
WHERE CustomerId LIKE @CustomerId
FOR XML AUTO
</sql:query>
</CustomerList>
'>
<P><input type='submit'>
</FORM>
</BODY>
</HTML>
The client-side template (bolded) is embedded as a hidden field in the Web page. If
you open this page in a Web browser, you should see an entry box for a Customer ID
and a submit button. Entering a customer ID or mask and clicking Submit Query will
post the template to the Web server. SQLISAPI will then extract the query contained
in the template and run it against SQL Server's Northwind database (because of the
template's virtual directory reference). The CustomerList2.xsl style sheet will then be
applied to translate the XML document that SQL Server returns into HTML, and the
result will be returned to the client. Listing 18.44 shows an example.
Listing 18.44
Customer ID Number
(Results)
As with server-side templates, client-side templates are sent to SQL Server using an
RPC.
<?xml version="1.0"?>
<Schema name="NorthwindProducts"
xmlns="urn:schemas-microsoft-com:xml-
data"
xmlns:dt="<span
class="docEmphStrong">urn:schemas-
microsoft-com:datatypes</span>">
<ElementType name="Description"
dt:type="string"/> <ElementType
name="Price" dt:type="fixed.19.4"/>
<ElementType name="Product"
model="closed"> <AttributeType
name="ProductCode" dt:type="string"/>
<attribute type="ProductCode"
required="yes"/> <element
type="Description" minOccurs="1"
maxOccurs="1"/> <element type="Price"
minOccurs="1" maxOccurs="1"/>
</ElementType>
<ElementType name="Category"
model="closed"> <AttributeType
name="CategoryID" dt:type="string"/>
<AttributeType name="CategoryName"
dt:type="string"/> <attribute
type="CategoryID" required="yes"/>
<attribute type="CategoryName"
required="yes"/> <element
type="Product" minOccurs="1"
maxOccurs="*"/> </ElementType>
<ElementType name="Catalog"
model="closed"> <element
type="Category" minOccurs="1"
maxOccurs="1"/> </ElementType>
</Schema>
<?xml version="1.0"?>
<Catalog xmlns=
"x-
schema:http://localhost/ProductsCat.xdr">
<Category CategoryID="1"
CategoryName="Beverages"> <Product
ProductCode="1">
<Description>Chai</Description>
<Price>18</Price> </Product>
<Product ProductCode="2">
<Description>Chang</Description>
<Price>19</Price> </Product>
</Category>
<Category CategoryID="2"
CategoryName="Condiments"> <Product
ProductCode="3"> <Description>Aniseed
Syrup</Description> <Price>10</Price>
</Product>
</Category>
</Catalog>
<Product ProductCode="<span
class="docEmphStrong">2</span>">
<Description><span
class="docEmphStrong">Chang</span>
</Description> <Price><span
class="docEmphStrong">19</span>
</Price> </Product>
</Category>
<Category CategoryID="<span
class="docEmphStrong">2</span>"
CategoryName="<span
class="docEmphStrong">Condiments</sp
an>"> <Product ProductCode="<span
class="docEmphStrong">3</span>">
<Description><span
class="docEmphStrong">Aniseed
Syrup</span></Description> <Price>
<span
class="docEmphStrong">10</span>
</Price> </Product>
</Category>
</Catalog>
<?xml version="1.0"?>
<Schema name="customers"
xmlns="urn:schemas-microsoft-com:xml-
data"> <ElementType
name="Customers"> <AttributeType
name="CustomerId"/> <AttributeType
name="CompanyName"/>
</ElementType>
</Schema>
<?xml version="1.0"?>
<Schema name="customers"
xmlns="urn:schemas-microsoft-com:xml-
data"> <ElementType
name="Customers"> <ElementType
name="CustomerId"
content="textOnly"/> <ElementType
name="CompanyName"
content="textOnly"/> </ElementType>
</Schema>
<?xml version="1.0"?>
<Schema name="customers"
xmlns="urn:schemas-microsoft-com:xml-
data"> xmlns:sql="urn:schemas-
microsoft-com:xml-sql"> <ElementType
name="Customer"
sql:relation="Customers"> <AttributeType
name="CustomerNumber"
sql:field="CustomerId"/> <AttributeType
name="Name"
sql:field="CompanyName"/>
</ElementType>
</Schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
  <xsd:element name="Customers" >
    <xsd:complexType>
      <xsd:attribute name="CustomerID" type="xsd:string" />
      <xsd:attribute name="CompanyName" type="xsd:string" />
      <xsd:attribute name="ContactName" type="xsd:string" />
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<ROOT xmlns:sql="urn:schemas-microsoft-com:xml-sql">
   <sql:xpath-query mapping-schema="Customers.xsd">
      /Customers
   </sql:xpath-query>
</ROOT>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="Cust" sql:relation="Customers" >
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="CustNo" sql:field="CustomerId"
                         type="xsd:integer" />
            <xsd:element name="Contact" sql:field="ContactName"
                         type="xsd:string" />
            <xsd:element name="Company" sql:field="CompanyName"
                         type="xsd:string" />
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>
</xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="Cust" sql:relation="Customers" >
      <xsd:complexType>
         <xsd:attribute name="CustNo" sql:field="CustomerId"
                        type="xsd:integer" />
         <xsd:attribute name="Contact" sql:field="ContactName"
                        type="xsd:string" />
         <xsd:attribute name="Company" sql:field="CompanyName"
                        type="xsd:string" />
      </xsd:complexType>
   </xsd:element>
</xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="Employee" sql:relation="Employees"
                type="EmployeeType" />
   <xsd:complexType name="EmployeeType" >
      <xsd:sequence>
         <xsd:element name="Order" sql:relation="Orders">
            <xsd:annotation>
               <xsd:appinfo>
                  <sql:relationship parent="Employees"
                                    parent-key="EmployeeID"
                                    child="Orders"
                                    child-key="EmployeeID" />
               </xsd:appinfo>
            </xsd:annotation>
            <xsd:complexType>
               <xsd:attribute name="OrderID" type="xsd:integer" />
               <xsd:attribute name="EmployeeID" type="xsd:integer" />
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="EmployeeID" type="xsd:integer" />
      <xsd:attribute name="LastName" type="xsd:string" />
   </xsd:complexType>
</xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="OrderDetails" sql:relation="[Order Details]"
                type="OrderDetailsType" />
   <xsd:complexType name="OrderDetailsType" >
      <xsd:sequence>
         <xsd:element name="Order" sql:relation="Orders">
            <xsd:annotation>
               <xsd:appinfo>
                  <sql:relationship parent="[Order Details]"
                                    parent-key="OrderID"
                                    child="Orders"
                                    child-key="OrderID"
                                    inverse="true" />
               </xsd:appinfo>
            </xsd:annotation>
            <xsd:complexType>
               <xsd:attribute name="OrderID" type="xsd:integer" />
               <xsd:attribute name="EmployeeID" type="xsd:integer" />
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="ProductID" type="xsd:integer" />
      <xsd:attribute name="Qty" sql:field="Quantity" type="xsd:integer" />
   </xsd:complexType>
</xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="Employee" sql:relation="Employees"
                type="EmployeeType" />
   <xsd:complexType name="EmployeeType" >
      <xsd:sequence>
         <xsd:element name="Order" sql:relation="Orders">
            <xsd:annotation>
               <xsd:appinfo>
                  <sql:relationship parent="Employees"
                                    parent-key="EmployeeID"
                                    child="Orders"
                                    child-key="EmployeeID" />
               </xsd:appinfo>
            </xsd:annotation>
            <xsd:complexType>
               <xsd:attribute name="OrderID" type="xsd:integer" />
               <xsd:attribute name="EmployeeID" type="xsd:integer" />
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="EmployeeID" type="xsd:integer" />
      <xsd:attribute name="LastName" type="xsd:string" />
      <xsd:attribute name="Level" type="xsd:integer"
                     sql:mapped="0" />
   </xsd:complexType>
</xsd:schema>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="Employee" sql:relation="Employees"
                type="EmployeeType" />
   <xsd:complexType name="EmployeeType" >
      <xsd:sequence>
         <xsd:element name="Order" sql:relation="Orders">
            <xsd:annotation>
               <xsd:appinfo>
                  <sql:relationship parent="Employees"
                                    parent-key="EmployeeID"
                                    child="Orders"
                                    child-key="EmployeeID" />
               </xsd:appinfo>
            </xsd:annotation>
            <xsd:complexType>
               <xsd:attribute name="OrderID" type="xsd:integer" />
               <xsd:attribute name="EmployeeID" type="xsd:integer" />
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="EmployeeID" type="xsd:integer"
                     sql:limit-field="EmployeeID"
                     sql:limit-value="3"/>
      <xsd:attribute name="LastName" type="xsd:string" />
   </xsd:complexType>
</xsd:schema>
<ROOT xmlns:sql="urn:schemas-microsoft-com:xml-sql">
   <sql:xpath-query mapping-schema="EmpOrders_Filtered.XSD">
      /Employee
   </sql:xpath-query>
</ROOT>
<ROOT xmlns:sql="urn:schemas-microsoft-com:xml-sql">
   <Employee EmployeeID="3" LastName="Leverling">
      <Order EmployeeID="3" OrderID="10251" />
      <Order EmployeeID="3" OrderID="10253" />
      <Order EmployeeID="3" OrderID="10256" />
      <Order EmployeeID="3" OrderID="10266" />
      <Order EmployeeID="3" OrderID="10273" />
      <Order EmployeeID="3" OrderID="10283" />
      <Order EmployeeID="3" OrderID="10309" />
      <Order EmployeeID="3" OrderID="10321" />
      <Order EmployeeID="3" OrderID="10330" />
      <Order EmployeeID="3" OrderID="10332" />
      <Order EmployeeID="3" OrderID="10346" />
      <Order EmployeeID="3" OrderID="10352" />
      ...
</ROOT>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
   <xsd:element name="Employee" sql:relation="Employees"
                type="EmployeeType" sql:key-fields="EmployeeID"/>
   <xsd:complexType name="EmployeeType" >
      <xsd:sequence>
         <xsd:element name="Order" sql:relation="Orders">
            <xsd:annotation>
               <xsd:appinfo>
                  <sql:relationship parent="Employees"
                                    parent-key="EmployeeID"
                                    child="Orders"
                                    child-key="EmployeeID" />
               </xsd:appinfo>
            </xsd:annotation>
            <xsd:complexType>
               <xsd:attribute name="OrderID" type="xsd:integer" />
               <xsd:attribute name="EmployeeID" type="xsd:integer" />
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="LastName" type="xsd:string" />
      <xsd:attribute name="FirstName" type="xsd:string" />
   </xsd:complexType>
</xsd:schema>
For row deletions, an updategram will have a before image but no after image. For
insertions, it will have an after image but no before image. And, of course, for
updates, an updategram will have both a before image and an after image. Listing
18.58 provides an example.
Listing 18.58
<?xml version="1.0"?>
<employeeupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync>
<updg:before>
<Employees EmployeeID="4"/>
</updg:before>
<updg:after>
<Employees City="Scotts Valley" Region="CA"/>
</updg:after>
</updg:sync>
</employeeupdate>
In this example, we change the City and Region columns for Employee 4 in the
Northwind Employees table. The EmployeeID attribute in the before element
identifies the row to change, and the City and Region attributes in the after element
identify which columns to change and what values to assign them.
Each batch of updates within a sync element is considered a transaction. Either all
the updates in the sync element succeed or none of them do. You can include
multiple sync elements to break updates into multiple transactions.
Mapping Data
Of course, in sending data to the server for updates, deletions, and insertions via
XML, we need a means of linking values in the XML document to columns in the
target database table. SQL Server sports two facilities for doing this: default
mapping and mapping schemas.
Default Mapping
Naturally, the easiest way to map data in an updategram to columns in the target
table is to use the default mapping (also known as intrinsic mapping). With default
mapping, a before or after element's top-level tag is assumed to refer to the target
database table, and each subelement or attribute it contains refers to a column of
the same name in the table.
Here's an example that shows how to map the OrderID column in the Orders table:
<Orders OrderID="10248"/>
This example maps XML attributes to table columns. You could also map
subelements to table columns, like this:
<Orders>
<OrderID>10248</OrderID>
</Orders>
You need not select either attribute-centric or element-centric mapping. You can
freely mix them within a given before or after element, as shown below:
<Orders OrderID="10248">
<ShipCity>Reims</ShipCity>
</Orders>
Use the four-digit hexadecimal UCS-2 code for characters in table names that are
illegal in XML elements (e.g., spaces). For example, to reference the Northwind
Order Details table, do this:
<Order_x0020_Details OrderID="10248"/>
Mapping Schemas
You can also use XDR and XSD mapping schemas to map data in an updategram to
tables and columns in a database. You use a sync's updg:mapping-schema attribute
to specify the mapping schema for an updategram. Listing 18.59 shows an example
that specifies an updategram for the Orders table.
Listing 18.59
<?xml version="1.0"?>
<orderupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync updg:mapping-schema="OrderSchema.xml">
<updg:before>
<Order OID="10248"/>
</updg:before>
<updg:after>
<Order City="Reims"/>
</updg:after>
</updg:sync>
</orderupdate>
Listing 18.60 shows the OrderSchema.xml mapping schema in XDR form, and Listing 18.61 shows its XSD equivalent.
Listing 18.60
<?xml version="1.0"?>
<Schema xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:sql="urn:schemas-microsoft-com:xml-sql">
<ElementType name="Order" sql:relation="Orders">
<AttributeType name="OID"/>
<AttributeType name="City"/>
<attribute type="OID" sql:field="OrderID"/>
<attribute type="City" sql:field="ShipCity"/>
</ElementType>
</Schema>
Listing 18.61
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:mapping-schema">
<xsd:element name="Order" sql:relation="Orders" >
<xsd:complexType>
<xsd:attribute name="OID" sql:field="OrderId"
type="xsd:integer" />
<xsd:attribute name="City" sql:field="ShipCity"
type="xsd:string" />
</xsd:complexType>
</xsd:element>
</xsd:schema>
As you can see, a mapping schema maps the layout of the XML document to the
Northwind Orders table. See the Mapping Schemas section earlier in the chapter for
more information on building XML mapping schemas.
NULLs
To assign a NULL value to a column via an updategram, you first establish a placeholder for NULL by setting the sync element's updg:nullvalue attribute, then assign that placeholder to the column in the after element. Listing 18.62 shows an example.
Listing 18.62
<?xml version="1.0"?>
<employeeupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync updg:nullvalue="NONE">
<updg:before>
<Orders OrderID="10248"/>
</updg:before>
<updg:after>
<Orders ShipCity="Reims" ShipRegion="NONE"
ShipName="NONE"/>
</updg:after>
</updg:sync>
</employeeupdate>
As you can see, we define a placeholder for NULL named NONE. We then use this
placeholder to assign a NULL value to the ShipRegion and ShipName columns.
Parameters
Updategrams support parameters. You declare them with updg:param elements inside an updg:header element and reference them in the body of the updategram by prefixing the parameter name with a dollar sign, as shown in Listing 18.63.
Listing 18.63
<?xml version="1.0"?>
<orderupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:header>
<updg:param name="OrderID"/>
<updg:param name="ShipCity"/>
</updg:header>
<updg:sync>
<updg:before>
<Orders OrderID="$OrderID"/>
</updg:before>
<updg:after>
<Orders ShipCity="$ShipCity"/>
</updg:after>
</updg:sync>
</orderupdate>
Because the dollar sign denotes a parameter reference, this nuance has interesting
implications for passing currency values as parameters. To pass a currency value to a
table column (e.g., the Freight column in the Orders table), you must map the data
using a mapping schema.
NULL Parameters
To pass NULL as a parameter value, you establish a NULL placeholder via the nullvalue attribute in the updategram's header element, then pass that placeholder as the parameter value. Listing 18.64 shows an example.
Listing 18.64
<?xml version="1.0"?>
<orderupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:header nullvalue="NONE">
<updg:param name="OrderID"/>
<updg:param name="ShipCity"/>
</updg:header>
<updg:sync>
<updg:before>
<Orders OrderID="$OrderID"/>
</updg:before>
<updg:after>
<Orders ShipCity="$ShipCity"/>
</updg:after>
</updg:sync>
</orderupdate>
This updategram accepts two parameters. Passing a value of NONE will cause the
ShipCity column to be set to NULL for the specified order.
Note that we don't include the xml-updategram (updg:) qualifier when specifying the
nullvalue placeholder for parameters in the updategram's header.
Multiple Rows
I mentioned earlier that each before element can identify at most one row. This
means that to update multiple rows, you must include an element for each row you
wish to change.
The id Attribute
When you specify multiple subelements within your before and after elements, SQL
Server requires that you provide a means of matching each before element with its
corresponding after element. One way to do this is through the id attribute. The id
attribute allows you to specify a unique string value that you can use to match a
before element with an after element. Listing 18.65 gives an example.
Listing 18.65
<?xml version="1.0"?>
<orderupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync>
<updg:before>
<Orders updg:id="ID1" OrderID="10248"/>
<Orders updg:id="ID2" OrderID="10249"/>
</updg:before>
<updg:after>
<Orders updg:id="ID2" ShipCity="Munster"/>
<Orders updg:id="ID1" ShipCity="Reims"/>
</updg:after>
</updg:sync>
</orderupdate>
Here, we use the updg:id attribute to match up subelements in the before and after
elements. Even though these subelements are specified out of sequence, SQL Server
is able to apply the updates to the correct rows.
Another way to do this is to specify multiple before and after elements rather than
multiple subelements. For each row you want to change, you specify a separate
before/after element pair, as demonstrated in Listing 18.66.
Listing 18.66
<?xml version="1.0"?>
<orderupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync>
<updg:before>
<Orders OrderID="10248"/>
</updg:before>
<updg:after>
<Orders ShipCity="Reims"/>
</updg:after>
<updg:before>
<Orders OrderID="10249"/>
</updg:before>
<updg:after>
<Orders ShipCity="Munster"/>
</updg:after>
</updg:sync>
</orderupdate>
As you can see, this updategram updates two rows. It includes a separate
before/after element pair for each update.
Results
When an updategram executes successfully, SQLISAPI returns a document containing little more than the updategram's root element, like this:
<?xml version="1.0"?>
<orderupdate xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
</orderupdate>
Any errors that occur during updategram execution are returned as <?MSSQLError>
elements within the updategram's root element.
In real applications, you often need to be able to retrieve an identity value that's
generated by SQL Server for one table and insert it into another. This is especially
true when you need to insert data into a table whose primary key is an identity
column and a table that references this primary key via a foreign key constraint.
Take the example of inserting orders in the Northwind Orders and Order Details
tables. As its name suggests, Order Details stores detail information for the orders in
the Orders table. Part of Order Details' primary key is the Orders table's OrderID
column. When we insert a new row into the Orders table, we need to be able to
retrieve that value and insert it into the Order Details table.
From Transact-SQL, we'd usually handle this situation with an INSTEAD OF insert
trigger or a stored procedure. To handle it with an updategram, we use the at-
identity attribute. Similarly to the id attribute, at-identity serves as a
placeholder: everywhere we use its value in the updategram, SQL Server supplies
the identity value for the corresponding table. (Each table can have just one identity
column.) Listing 18.67 shows an example.
Listing 18.67
<?xml version="1.0"?>
<orderinsert xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync>
<updg:before>
</updg:before>
<updg:after>
<Orders updg:at-identity="ID" ShipCity="Reims"/>
<Order_x0020_Details OrderID="ID" ProductID="11"
UnitPrice="$16.00" Quantity="12"/>
<Order_x0020_Details OrderID="ID" ProductID="42"
UnitPrice="$9.80" Quantity="10"/>
</updg:after>
</updg:sync>
</orderinsert>
Here, we use the string "ID" to signify the identity column in the Orders table. Once
the string is assigned, we can use it in the insertions for the Order Details table.
You can also have the identity value generated by an updategram returned to the client by using the updg:returnid attribute, as Listing 18.68 demonstrates.
Listing 18.68
<?xml version="1.0"?>
<orderinsert xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync>
<updg:before>
</updg:before>
<updg:after updg:returnid="ID">
<Orders updg:at-identity="ID" ShipCity="Reims"/>
<Order_x0020_Details OrderID="ID" ProductID="11"
UnitPrice="$16.00" Quantity="12"/>
<Order_x0020_Details OrderID="ID" ProductID="42"
UnitPrice="$9.80" Quantity="10"/>
</updg:after>
</updg:sync>
</orderinsert>
Executing this updategram will return an XML document that looks like this:
<?xml version="1.0"?>
<orderinsert xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<returnid>
<ID>10248</ID>
</returnid>
</orderinsert>
It's not unusual to see Globally Unique Identifiers (GUIDs) used as key values across
a partitioned view or other distributed system. (These are stored in columns of type
uniqueidentifier.) Normally, you use the Transact-SQL NEWID() function to generate
new uniqueidentifiers. The updategram equivalent of NEWID() is the guid attribute.
You can specify the guid attribute to generate a GUID for use elsewhere in a sync
element. As with id, nullvalue, and the other attributes presented in this section, the
guid attribute establishes a placeholder that you can then supply to other elements
and attributes in the updategram in order to use the generated GUID. Listing 18.69
presents an example.
Listing 18.69
<orderinsert xmlns:updg=
"urn:schemas-microsoft-com:xml-updategram">
<updg:sync>
<updg:before>
</updg:before>
<updg:after>
<Orders updg:guid="GUID">
<OrderID>GUID</OrderID>
<ShipCity>Reims</ShipCity>
</Orders>
<Order_x0020_Details OrderID="GUID" ProductID="11"
UnitPrice="$16.00" Quantity="12"/>
<Order_x0020_Details OrderID="GUID" ProductID="42"
UnitPrice="$9.80" Quantity="10"/>
</updg:after>
</updg:sync>
</orderinsert>
XML Bulk Load
As we saw in the earlier discussions of updategrams and OPENXML, inserting XML
data into a SQL Server database is relatively easy. However, both of these methods
of loading data have one serious drawback: They're not suitable for loading large
amounts of data. In the same way that using the Transact-SQL INSERT statement is
suboptimal for loading large numbers of rows, using updategrams and OPENXML to
load large volumes of XML data into SQL Server is slow and resource intensive.
SQLXML provides a facility intended specifically to address this problem. Called the
XML Bulk Load component, it is a COM component you can call from OLE
Automation-capable languages and tools such as Visual Basic, Delphi, and even
Transact-SQL. It presents an object-oriented interface to loading XML data in bulk in
a manner similar to the Transact-SQL BULK INSERT command.
The first step in using the XML Bulk Load component is to define a mapping schema
that maps the XML data you're importing to tables and columns in your database.
When the component loads your XML data, it will read it as a stream and use the
mapping schema to decide where the data goes in the database.
The mapping schema determines the scope of each row added by the Bulk Load
component. As the closing tag for each row is read, its corresponding data is written
to the database.
You access the Bulk Load component itself via the SQLXMLBulkLoad interface on the
SQLXMLBulkLoad COM object. The first step in using it is to connect to the database
using an OLE DB connection string or by setting its ConnectionCommand property to
an existing ADO Command object. The second step is to call its Execute method. The
VBScript code in Listing 18.70 illustrates.
Listing 18.70
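A minimal VBScript sketch of the basic call pattern looks something like this (the server name and file paths are placeholders):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
' Placeholder connection string; point it at your own server and database
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Execute takes the mapping schema first, then the XML data file
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing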
You can also specify an XML stream (rather than a file) to load, making cross-DBMS
data transfers (from platforms that feature XML support) fairly easy.
XML Fragments
Setting the XMLFragment property to True allows the Bulk Load component to load
data from an XML fragment (an XML document with no root element, similar to the
type returned by Transact-SQL's FOR XML extension). Listing 18.71 shows an
example.
Listing 18.71
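A sketch along the same lines, with the fragment flag set (server and paths are again placeholders):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' The input file contains a rootless XML fragment
objBulkLoad.XMLFragment = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersFragment.xml"
Set objBulkLoad = Nothing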
Enforcing Constraints
By default, the XML Bulk Load component does not enforce check and referential
integrity constraints. Enforcing constraints as data is loaded slows down the process
significantly, so the component doesn't enforce them unless you tell it to. For
example, you might want to do that when you're loading data directly into
production tables and you want to ensure that the integrity of your data is not
compromised. To cause the component to enforce your constraints as it loads data,
set the CheckConstraints property to True, as shown in Listing 18.72.
Listing 18.72
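For instance, a sketch like this (placeholder connection and paths) turns constraint checking on:

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Enforce check and referential integrity constraints during the load
objBulkLoad.CheckConstraints = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing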
Normally you'd want to stop a bulk load process when you encounter a duplicate
key. Usually this means you've got unexpected data values or data corruption of
some type and you need to look at the source data before proceeding. There are,
however, exceptions. Say, for example, that you get a daily data feed from an
external source that contains the entirety of a table. Each day, a few new rows show
up, but, for the most part, the data in the XML document already exists in your table.
Your interest is in loading the new rows, but the external source that provides you
the data may not know which rows you have and which ones you don't. They may
provide data to lots of companies; what your particular database contains may be
unknown to them.
In this situation, you can set the IgnoreDuplicateKeys property before the load, and
the component will ignore the duplicate key values it encounters. The bulk load
won't halt when it encounters a duplicate key; it will simply ignore the row
containing the duplicate key, and the rows with nonduplicate keys will be loaded as
you'd expect. Listing 18.73 shows an example.
Listing 18.73
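A sketch of the property in use (server and paths are placeholders):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Skip rows whose keys already exist instead of halting the load
objBulkLoad.IgnoreDuplicateKeys = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing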
When IgnoreDuplicateKeys is set to True, inserts that would cause a duplicate key
will still fail, but the bulk load process will not halt. The remainder of the rows will be
processed as though no error occurred.
IDENTITY Columns
There are a couple of caveats regarding the KeepIdentity property. First, when
KeepIdentity is set to True, SQL Server uses SET IDENTITY_INSERT to enable identity
value insertion into the target table. SET IDENTITY_INSERT has specific permissions
requirements: execute permission defaults to the sysadmin role, the db_owner and
db_ddladmin fixed database roles, and the table owner. This means that a user who
does not own the target table and who also is not a sysadmin, db_owner, or DDL
administrator will likely have trouble loading data with the XML Bulk Load
component. Merely having bulkadmin rights is not enough.
Another caveat is that you would normally want to preserve identity values when
bulk loading data into a table with dependent tables. Allowing these values to be
regenerated by the server could be disastrous: you could break parent-child
relationships between tables with no hope of reconstructing them. If a parent table's
primary key is its identity column and KeepIdentity is set to False when you load it,
you may not be able to resynchronize it with the data you load for its child table.
Fortunately, KeepIdentity is enabled by default, so normally this isn't a concern, but
be sure you know what you're doing if you choose to set it to False.
Listing 18.74
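A sketch that makes the default explicit (placeholder server and paths):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' KeepIdentity is True by default; shown here explicitly
objBulkLoad.KeepIdentity = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing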
Another thing to keep in mind is that KeepIdentity is a very binary option: either it's
on or it's not. The value you give it affects every object into which XML Bulk Load
inserts rows within a given bulk load. You can't retain identity values for some tables
and allow SQL Server to generate them for others.
NULL Values
For a column not mapped in the schema, the column's default value is inserted. If
the column doesn't have a default, NULL is inserted. If the column doesn't allow
NULLs, the bulk load halts with an error message.
The KeepNulls property allows you to tell the bulk load facility to insert a NULL value
rather than a column's default when the column is not mapped in the schema.
Listing 18.75 demonstrates.
Listing 18.75
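A sketch (placeholder server and paths):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Insert NULL rather than the column default for columns not in the schema
objBulkLoad.KeepNulls = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing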
As with SQL Server's other bulk load facilities, you can configure SQLXMLBulkLoad to
lock the target table before it begins loading data into it. This is more efficient and
faster than using more granular locks but has the disadvantage of preventing other
users from accessing the table while the bulk load runs. To force a table lock during
an XML bulk load, set the ForceTableLock property to True, as shown in Listing 18.76.
Listing 18.76
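Something like this (placeholder server and paths):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Take a table lock on the target table for the duration of the load
objBulkLoad.ForceTableLock = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing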
Transactions
By default, XML bulk load operations are not transactional; that is, if an error occurs
during the load process, the rows loaded up to that point will remain in the
database. This is the fastest way to do things, but it has the disadvantage of
possibly leaving a table in a partially loaded state. To force a bulk load operation to
be handled as a single transaction, set SQLXMLBulkLoad's Transaction property to
True before calling Execute.
When Transaction is True, all inserts are cached in a temporary file before being
loaded onto SQL Server. You can control where this file is written by setting the
TempFilePath property. TempFilePath has no meaning unless Transaction is True. If
TempFilePath is not otherwise set, it defaults to the folder specified by the TEMP
environmental variable on the server.
I should point out that bulk loading data within a transaction is much slower than
loading it outside of one. That's why the component doesn't load data within a
transaction by default. Also note that you can't bulk load binary XML data from
within a transaction.
Listing 18.77
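A sketch using the component's own OLE DB connection (server, paths, and the temp folder are placeholders):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Treat the entire load as a single transaction
objBulkLoad.Transaction = True
' Optional: where the temporary cache file is written
objBulkLoad.TempFilePath = "d:\xml\temp"
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing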
In this example, SQLXMLBulkLoad establishes its own connection to the server over
OLE DB, so it operates within its own transaction context. If an error occurs during
the bulk load, the component rolls back its own transaction.
Listing 18.78
Errors
The XML Bulk Load component supports logging error messages to a file via its
ErrorLogFile property. This file is an XML document itself that lists any errors that
occurred during the bulk load. Listing 18.79 demonstrates how to use this property.
Listing 18.79
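A sketch (the server, paths, and log file name are placeholders):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Write any load errors to an XML log file
objBulkLoad.ErrorLogFile = "d:\xml\error.log"
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing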
The file you specify will contain a Record element for each error that occurred during
the last bulk load. The most recent error message will be listed first.
In addition to loading data into existing tables, the XML Bulk Load component can
also create target tables for you if they do not already exist, or drop and recreate
them if they do exist. To create nonexistent tables, set the component's SchemaGen
property to True, as shown in Listing 18.80.
Listing 18.80
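A sketch (placeholder server and paths):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Create any tables in the mapping schema that don't already exist
objBulkLoad.SchemaGen = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing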
Since SchemaGen is set to True, any tables in the schema that don't already exist
will be created when the bulk load starts. For tables that already exist, data is simply
loaded into them as it would normally be.
If you set the BulkLoad property of the component to False, no data is loaded. So, if
SchemaGen is set to True but BulkLoad is False, you'll get empty tables for those in
the mapping schema that did not already exist in the database, but you'll get no
data. Listing 18.81 presents an example.
Listing 18.81
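A sketch (placeholder server and paths):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
' Generate missing tables from the mapping schema...
objBulkLoad.SchemaGen = True
' ...but don't actually load any data
objBulkLoad.BulkLoad = False
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing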
Listing 18.82
Listing 18.83 shows some VBScript code that sets the SGUseID property so that a
primary key will automatically be defined for the table that's created on the server.
Listing 18.83
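A sketch of the idea (placeholder server and paths):

Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
   "provider=SQLOLEDB;data source=(local);database=Northwind;" & _
   "Integrated Security=SSPI;"
objBulkLoad.SchemaGen = True
' Use the column identified as the ID in the mapping schema as the primary key
objBulkLoad.SGUseID = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml", "d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing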
When the bulk load executes, the CREATE TABLE statement it generates includes a PRIMARY KEY constraint on the column designated as the ID in the mapping schema.
In addition to being able to create new tables from those in the mapping schema,
SQLXMLBulkLoad can also drop and recreate tables. Set the SGDropTables property
to True to cause the component to drop and recreate the tables mapped in the
schema, as shown in Listing 18.84.
Listing 18.84
Set objBulkLoad = CreateObject("SQLXMLBulkLoad.SQLXMLBulkLoad")
objBulkLoad.ConnectionString = _
"provider=SQLOLEDB;data source=KUFNATHE;database=Northwind;" & _
"Integrated Security=SSPI;"
objBulkLoad.SchemaGen = True
objBulkLoad.SGDropTables = True
objBulkLoad.Execute "d:\xml\OrdersSchema.xml",
"d:\xml\OrdersData.xml"
Set objBulkLoad = Nothing
USE Northwind
GO
DROP PROC ListCustomers
GO
CREATE PROC ListCustomers
@CustomerID nvarchar(10)='%'
AS
PRINT '@CustomerID = ' +@CustomerID
SELECT *
FROM Customers
WHERE CustomerID LIKE @CustomerID  -- filter on the CustomerID mask
GO
using System;
using Microsoft.Data.SqlXml;
using System.IO;
using System.Xml;

class CmdExample
{
   static int Main(string[] args)
   {
      CmdExampleWriteXML();
      return 0;
   }
   // (Code abridged)
}
The advantage of this, of course, is that you don't need SQL Server client software to
run queries and access SQL Server objects. This means that applications on client
platforms not directly supported by SQL Server (e.g., Linux) can submit queries and
retrieve results from SQL Server via SQLXML and its SOAP facility.
You set up SQL Server to masquerade as a Web service by configuring a SOAP virtual
name in the IIS Virtual Directory Management tool. (You can find this under the
SQLXML | Configure IIS menu option under Start | Programs.) A SOAP virtual name is
simply a folder associated with an IIS virtual directory name whose type has been
set to soap. You can specify whatever service name you like in the Web Service
Name text box; the conventional name is soap. Once this virtual name is set up, you
configure specific SQL Server objects to be exposed by the Web service by clicking
the Configure button on the Virtual Names tab and selecting the object name, the
format of the XML to produce on the middle tier (via SQLISAPI), and the manner in
which to expose the object: as a collection of XML elements, as a single Dataset
object, or as a collection of Datasets. As the exercise we'll go through in just a
moment illustrates, you can expose a given server object multiple times and in
multiple ways, providing client applications with a wealth of ways to communicate
with SQL Server over SOAP.
In this next exercise, we'll walk through exposing SQL Server as a Web service and
then consuming that service in a C# application. We'll set up the SOAP virtual name,
then we'll configure a SQL Server procedure object to be exposed as a collection of
Web service methods. Finally, we'll build a small application to consume the service
and demonstrate how to interact with it.
1. Create a new folder on your Web server to serve as the physical directory for
the SOAP virtual name (you'll reference it in step 3).
2. Start the IIS Virtual Directory Management for SQLXML tool that you used to
configure the Northwind virtual folder earlier.
3. Go to the Virtual Names tab and add a new virtual name with a Name, Type,
and Web Service Name of soap. Set the path to the folder you created in step
1.
4. Save the virtual name configuration. At this point, the Configure button should
be enabled. Click it to begin exposing specific procedural objects and
templates via the Web service.
5. Click the ellipsis button to the right of the SP/Template text box and select the
ListCustomers stored procedure from the list.
6. Name the method ListCustomers and set its row format to Raw and its output
format to XML objects, then click OK.
7. Repeat the process and name the new method ListCustomersAsDataset (you
will be referencing the ListCustomers stored procedure). Set its output type to
Single dataset, then click OK.
9. Start a new C# Windows application project in Visual Studio .NET. The app we'll
build will allow you to invoke the SQLXML Web service facility to execute the
ListCustomers stored proc using a specified CustomerID mask.
10. Add a single TextBox control to the upper-left corner of the default form to
serve as the entry box for the CustomerID mask.
11. Add a Button control to the right of the TextBox control to be used to execute
the Web service method.
12. Add three RadioButton controls to the right of the button to specify which Web
method we want to execute. Name the first rbXMLElements, the second
rbDataset, and the third rbDatasetObjects. Set the Text property of each
control to a brief description of its corresponding Web method (e.g., the Text
property for rbXMLElements should be something like "XML Elements").
13. Add a ListBox control below the other controls on the form. This will be used to
display the output from the Web service methods we call. Dock the ListBox
control to the bottom of the form and be sure it is sized to occupy most of the
form.
14. Make sure your instance of IIS is running and accessible. As with the other
Web-oriented examples in this chapter, I'm assuming that you have your own
instance of IIS and that it's running on the local machine.
15. Right-click your solution in the Solution Explorer and select Add Web
Reference. In the URL for the Web reference, type the following:
http://localhost/Northwind/soap?wsdl
This URL refers by name to the virtual directory you created earlier, then to the
soap virtual name you created under it, and finally to the Web Services
Description Language (WSDL) functionality provided by SQLISAPI. As I
mentioned earlier, a question mark in a URL denotes the start of the URL's
parameters, so wsdl is being passed as a parameter into the SQLISAPI
extension DLL. Like XML and SOAP, WSDL is its own W3C standard and
describes, in XML, Web services as a set of end points operating on messages
containing either procedural or document-oriented information. You can learn
more about WSDL by visiting this link on the W3C Web site:
http://www.w3.org/TR/wsdl.
16. Once you've added the Web reference, the localhost Web service will be
available for use within your application. A proxy class is created under your
application folder that knows how to communicate with the Web service you
referenced. To your code, this proxy class looks identical to the actual Web
service. When you make calls to this class, they are transparently marshaled to
the Web service itself, which might reside on some other machine located
elsewhere on the local intranet or on the public Internet. You'll recall from
Chapter 6 that I described Windows' RPC facility as working the very same way.
Web services are really just an extension of this concept. You work and
interoperate with local classes and methods; the plumbing behind the scenes
handles getting data to and from the actual implementation of the service
without your app even being aware of the fact that it is dealing with any sort of
remote resource.
17. Double-click the Button control you added earlier and add to it the code in
Listing 18.87.
Listing 18.87
private void button1_Click(object sender, System.EventArgs e)
{
   // (Abridged: the handler signature, declarations, and the rbXMLElements
   // branch structure are reconstructed; adjust names to match your form)
   int iReturn = 0;
   object result;
   object[] results;
   System.Xml.XmlElement resultElement;
   System.Data.DataSet resultDS;
   localhost.SqlMessage errorMessage;

   localhost.soap proxy = new localhost.soap();
   proxy.Credentials = System.Net.CredentialCache.DefaultCredentials;

   // Return ListCustomers as XML elements
   if (rbXMLElements.Checked)
   {
      listBox1.Items.Add("Executing ListCustomers...");
      listBox1.Items.Add("");
      results = proxy.ListCustomers(textBox1.Text);
      for (int j = 0; j < results.Length; j++)
      {
         result = results[j];
         if (result.GetType().IsPrimitive)
         {
            listBox1.Items.Add(
              string.Format("ListCustomers return value: {0}", result));
         }
         if (result is System.Xml.XmlElement)
         {
            resultElement = (System.Xml.XmlElement) results[j];
            listBox1.Items.Add("XML " +resultElement.OuterXml);
         }
         else if (result is System.Data.DataSet)
         {
            resultDS = (System.Data.DataSet) results[j];
            listBox1.Items.Add("DataSet " +resultDS.GetXml());
         }
         else if (result is localhost.SqlMessage)
         {
            errorMessage = (localhost.SqlMessage) results[j];
            listBox1.Items.Add("Message " +errorMessage.Message);
            listBox1.Items.Add(errorMessage.Source);
         }
      }
      listBox1.Items.Add("");
   }
   // Return ListCustomers as Dataset
   else if (rbDataset.Checked)
   {
      listBox1.Items.Add("Executing ListCustomersAsDataset...");
      listBox1.Items.Add("");
      resultDS = proxy.ListCustomersAsDataset(textBox1.Text,
        out iReturn);
      listBox1.Items.Add(resultDS.GetXml());
      listBox1.Items.Add(
        string.Format("ListCustomers return value: {0}", iReturn));
      listBox1.Items.Add("");
   }
   // (Third branch for rbDatasetObjects omitted here)
}
18. This code can be divided into three major routines, one each for the three Web
service methods we call. Study the code for each type of output format and
compare and contrast their similarities and differences. Note the use of
reflection in the code to determine what type of object we receive back from
Web service calls in situations where multiple types are possible.
19. Compile and run the app. Try all three output formats and try different
CustomerID masks. Each time you click your Button control, the following
things happen.
a. Your code makes a method call to a proxy class Visual Studio .NET added
to your project when you added the Web reference to the SQLXML SOAP
Web service you set up for Northwind.
b. The .NET Web service code translates your method call into a SOAP call
and passes it across the network to the specified host. In this case, your
Web service host probably resides on the same machine, but the
architecture allows it to reside anywhere on the local intranet or public
Internet.
c. The SQLXML ISAPI extension receives your SOAP call and translates it into
a call to the ListCustomers stored procedure in the database referenced
by your IIS virtual directory, Northwind.
d. SQL Server runs the procedure and returns its results as a rowset to
SQLISAPI.
e. SQLISAPI translates the rowset to the appropriate XML format and object
based on the way the Web service method you called was configured,
then returns it via SOAP to the .NET Framework Web service code running
on your client machine.
f. The .NET Framework Web services code translates the SOAP it receives
into the appropriate objects and result codes and returns them to your
application.
g. Your app then uses additional method calls to extract the returned
information as text and writes that text to the ListBox control.
So, there you have it, a basic run-through of how to use SQLXML's SOAP facilities to
access SQL Server via SOAP. As I've said, an obvious application of this technology is
to permit SQL Server to play in the Web service space: to interoperate with other
Web services without requiring the installation of proprietary client software or the
use of supported operating systems. Thanks to SQLXML's Web service facility,
anyone who can speak SOAP can access SQL Server. SQLXML's Web service support
is a welcome and very powerful addition to the SQL Server technology family.
USE master
GO
IF OBJECT_ID('sp_xml_concat','P') IS NOT NULL
...
EXEC('
DECLARE ...
SELECT @c = CONVERT(nvarchar(4000),
       SUBSTRING('+@column+', 1+@cnt*4000, 4000))
FROM '+@table+'
END
''EXEC(
EXEC sp_xml_preparedocument @hdl_doc OUT, ''+@concat+''
')
OPEN hdlcursor
...
DEALLOCATE hdlcursor
GO
(Code abridged)
USE Northwind
GO
      "1996-07-04T00:00:00">
      <OrderDetail OrderID="10248" ProductID="11" Quantity="12"/>
      <OrderDetail OrderID="10248" ProductID="42" Quantity="10"/>
      // More code lines here...
    </Order>
  </Customer>
  <Customer CustomerID="LILAS" ContactName="Carlos González">
    <Order CustomerID="LILAS" EmployeeID="3"
           OrderDate="1996-08-16T00:00:00">
      <OrderDetail OrderID="10283" ProductID="72" Quantity="3"/>
    </Order>
  </Customer>
</Customers>')
(CustomerID nvarchar(50))

CustomerID
--------------------------------------------------
VINET
LILAS

-----------
36061
USE master
GO
IF OBJECT_ID('sp_run_xml_proc','P') IS NOT NULL
...
EXEC @hr=sp_OACreate 'SQLDMO.SQLServer', @sqlobject OUT
...
SET @dbname=DB_NAME()
...
DECLARE @rows int, @cols int, @x int, @y int,
        @col varchar(8000), @row varchar(8000)
...
SET @y=1
SET @x=1
SET @row=''
...
SET @x=@x+1
...
RETURN 0
Help:
RETURN -1
GO
(Code abridged)
USE pubs
GO
GO
EXEC [TUK\PHRIP].pubs.dbo.sp_run_xml_proc 'testxml'

a message here

XMLText
------------------------------------------------------------------
<pubs..authors au_id="172-32-1176" au_lname="White" au_fname="John
<pubs..authors au_id="672-71-3249" au_lname="Yokomoto" au_fname="A
SET NOCOUNT ON
GO
USE pubs
GO
...
(XMLText varchar(8000))
GO
INSERT #XMLText1
...
-- Put the document in a variable
-- and add a root element
SET @doc=''
SET @doc='<root>'+@doc+'</root>'
GO
au_lname
----------------------------------------
Bennet
Blotchet-Halls
Carson
DeFrance
...
Ringer
Ringer
Smith
Straight
Stringer
White
Yokomoto
Knowledge Measure
1. What XML parser does SQL Server's XML features use?
2. True or false: The NESTED option can be used only in client-side FOR XML.
3. What extended stored procedure is used to prepare an XML document for use
by OPENXML?
4. What's the theoretical maximum amount of memory that SQLXML will allow
MSXML to use from the SQL Server process space?
5. True or false: There is currently no way to disable template caching for a given
SQLISAPI virtual directory.
8. What XML support file must you first define before bulk loading an XML
document into a SQL Server database?
10. Is it possible to change the name of the ISAPI extension DLL associated with a
given virtual directory, or must all SQLISAPI-configured virtual directories use
the same ISAPI extension?
11. Explain the way that URL queries are handled by SQLXML.
12. True or false: You can return traditional rowsets from SQLXMLOLEDB just as you
can from any other OLE DB provider.
13. What Win32 API does SQLXML call in order to compute the amount of physical
memory in the machine?
14. Name the two major APIs that MSXML provides for parsing XML documents.
15. Approximately how much larger in memory is a DOM document than the
underlying XML document?
18. What two properties must be set on the ADO Command object in order to allow
for client-side FOR XML processing?
19. What method of the ADO Recordset object can persist a recordset as XML?
20. What does the acronym "SAX" stand for in XML parlance?
21. When a standard Transact-SQL query is executed via a URL query, what type of
event does it come into SQL Server as?
22. What's the name of the OLE DB provider that implements client-side FOR XML
functionality and in what DLL does it reside?
23. Does SQLXML use MSXML to return XML results from server-side FOR XML
queries?
25. What component should you use to load XML data into SQL Server in the
fastest possible manner?
26. True or false: SQLISAPI does not support returning non-XML data from SQL
Server.
27. Is it possible to configure a virtual directory such that FOR XML queries are
processed on the client side by default?
28. Approximately how much larger than the actual document is the in-memory
representation of an XML document stored by SQLXML for use with OPENXML?
29. True or false: SQLXML does not support inserting new data via OPENXML
because OPENXML returns a read-only rowset.
30. What mapping-schema notational attribute should you use with the
xsd:relationship attribute if you are using a mapping schema with an
updategram and the mapping schema relates two tables in reverse order?
31. Name the central SQL Server error-reporting routine in which we set a
breakpoint in this chapter.
32. Describe a scenario in which it would make sense to use a mapping schema
with an updategram.
33. What lone value can SQLXMLOLEDB's Data Source parameter have?
34. True or false: The SAX parser is built around the notion of persisting a
document in memory in a tree structure so that it is readily accessible to the
rest of the application.
Chapter 19. Notification Services
You can't lead from the middle of the pack.
-Kenneth E. Routen
Unless you've been living in a cave, you've likely noticed the trend toward
notification-based data delivery. Even now, you can go to Web sites and subscribe to
weather notifications, traffic reports, sports scores, and a host of other notification-
based data. Once subscribed, the data can "follow" you around, being
simultaneously pushed to you via e-mail, pager messages, cell phone text
messages, instant messaging, and so on. These types of apps are taking over the
world and are a major focus of Microsoft's .NET initiative.
Unlike SQL Server proper, you don't install Notification Services, then build
notification applications that connect to it with client software. Instead, you design a
notification application using the tools provided by Notification Services, and they in
turn generate your application. So, the term "Notification Services" can be a bit
nebulous, especially if you think of it in terms of how SQL Server works. I think the
most accurate way to view Notification Services is as a collection of services and
tools designed to help you build rich, scalable notification applications.
A Notification Services application does not run inside the SQL Server process, nor
does it even have to run on the same machine. Technically, it's a SQL Server client,
just like any other application. Notification Services applications use SQL Server as a
data store in much the same way that products like Systems Management Server
and Microsoft Exchange can.
NSControl
The instance database will always end with a suffix of NSMain. An application
database will always be both prefixed and suffixed with the name you specified for
the application in the application configuration file. We'll talk more about the
application configuration file in just a moment.
These databases contain a bevy of stored procedures, system tables, and other
support objects. With only two exceptions that I know of, the names of these objects
will always be prefixed with NS. The subscriptions view in the application database
has the name AppNameSubscriptions, where AppName is the name of the
application, and the notification UDF in the instance database is named
ClassNameNotify, where ClassName is the name of the notification class to which
the UDF corresponds. All other Notification Services objects in the instance and
application databases are prefixed with NS.
There are two configuration files used to define a Notification Services application:
the instance configuration file and the application definition file. The instance
configuration file defines the instance and references the applications it hosts. The
application definition file defines an individual application. If an instance hosts
multiple applications, you will have just one instance configuration file and multiple
application definition files.
Understand that these configuration files are used only by NSControl. They provide
configuration details for specific NSControl tasks, such as creating the instance and
application databases or updating an instance's configuration. They are not
referenced by the instance service once it has been created. Once passed into
NSControl, the configuration information provided by these files is materialized into
objects and table entries in the instance and application databases; they are not
referenced thereafter by the event collection process, by the generator, or during
the distribution process. You should not delete them, of course, because you may
decide to change a configuration detail after an app has been created and registered
(in which case, you'd pass the appropriate configuration file into NSControl Update).
However, be aware that these files just serve as inputs to the NSControl
utility�nothing more.
NSService.exe
Note that you don't have to run NSService.exe as a service. Similarly to SQL Server's
replication agents, NSService.exe can also be executed as a command line utility.
There's no good reason for doing this; I mention it only for completeness and to give
you a sense of how all the pieces fit together. You can run NSService.exe from the
command line like this:
nsservice InstanceName -a
where InstanceName is the name of your instance. Of course, the executable will
either need to be on your path or you'll need to be in the folder where it resides
when you run this command. (The default location is \Program Files\Microsoft SQL
Server Notification Services\v2.0.2114.0\Bin, where v2.0.2114.0 is the version of
Notification Services you have installed.)
WARNING: Again, I mention how to run NSService.exe from the command line only
for completeness. I don't suggest that you run it this way in any sort of production
scenario. I could foresee problems if you were to run NSService.exe from the
command line at the same time it was already running as a service, with both
processes referencing the same Notification Services instance.
The main purpose of NSService.exe is to match event data with subscriptions and
produce notifications. NSService.exe provides a notification engine: a facility
capable of collecting events, matching those events with subscriptions, and turning
those matches into notifications to subscribers. You hook your own code into this
engine through a handful of well-defined mechanisms.
Notification Services provides an extensible foundation onto which you can build a
rich set of custom functionality. It provides the engine and mechanisms for turning
events and subscriptions into notifications. You turn this generic notification
generation engine into a custom application by hooking your own functionality into
the appropriate places using well-defined interfaces and standard APIs.
Three key components carry out the work of a Notification Services application:
event providers, the generator, and the distributor. Event providers handle collecting
events. The generator matches those events with subscriptions and generates raw
notifications. The distribution process takes these raw notifications and turns them
into notifications suitable for delivery to subscribers.
Event Providers
1. For XML data, an independent provider can create an EventLoader object and
write events from the XML data into the application database.
4. A provider can use COM objects and methods to submit events. Notification
Services uses COM Interop to expose its managed code classes as COM
interfaces.
The Generator
In contrast to event providers, you can have just one generator per application. The
generator matches subscriptions with events and produces notifications. The rules
by which a match between a subscription and an event is found are called match
rules and consist of Transact-SQL statements that you define in the application
definition file.
The process by which matches are detected and notifications are generated can be
more complex than simply matching subscriptions to events. Notifications can be
generated based on a schedule defined by a subscription and can also make use of
historical data.
The generator produces "raw" notifications that must be put through the distribution
process in order to be turned into something suitable for delivery to subscribers. I've
often wished that a term other than "notification" was used in the Notification
Services docs for these pubescent notification items. Perhaps "prenotification" or
"raw notification" or even "message" would be better. Whatever the case, just
understand that the output from the generation process must still be processed by
the distributor before it can be sent to subscribers.
Match Rules
As I've said, the events that the generator translates into raw notifications are
matched with subscriptions through the use of match rules. Match rules are T-SQL
SELECT statements specified in the application definition file. The generator
supports three rule types.
The Quantum
The quantum time period you specify in the QuantumDuration node of the
application definition file controls how often the generator wakes up and fires rules.
This time period is specified in standard ISO 8601 XML Schema time format and can
range from a few seconds to as long as necessary. Obviously, a shorter quantum
period causes the generator to fire more often and the requisite load on the system
to be heavier. A longer quantum period lightens the load on the system but causes
notifications to take longer to be generated and, hence, to be delivered to
subscribers. If unspecified, the quantum period defaults to one minute.
Because columns from the match rule's T-SQL SELECT statement are passed
into the UDF, the UDF is called for each row in the result set. (When a UDF used in a
SELECT statement is not passed any columns, it is called just once for the entire
statement.) The interaction between the UDF and notification generation is best
understood by way of example. We'll discuss match rules in more depth later in the
chapter, but here's a sample rule that demonstrates the use of the UDF to produce
notifications (Listing 19.1).
Listing 19.1
SELECT dbo.BNSInfoNotify(s.SubscriberId,
s.DeviceName,
s.SubscriberLocale,
p.ID,
p.Product,
p.OpenedBy)
FROM BNSEvents e, BNSSubscriptions s, BNS..Bugs p
WHERE e.ID = p.ID
AND p.Product = s.Product
Note the use of the BNSInfoNotify function. It calls the notification xproc. The SELECT
statement isn't as interested in the return value of the function as it is in the
function's side effect. Because the UDF calls an xproc, it's able to modify other
tables on the server within the context of the SELECT statement. Normally, this isn't
allowed in a UDF. Basically, the UDF allows the match rule to avoid having to open a
cursor on the SELECT statement that matches events with subscriptions and call
NSInsertNotificationN for each one separately. By taking this approach, Notification
Services lessens the match rule coding burden on the developer.
This is the same approach taken by the xp_exec extended procedure in my last
book, The Guru's Guide to SQL Server Stored Procedures, XML, and HTML. As I
pointed out in that book, because of the numerous restrictions on what you can do
from within a SQL Server UDF, calling an xproc that uses a loop-back connection to
run custom T-SQL code is about the only means of doing a number of useful things
from within a UDF, including using a SELECT statement to drive parameterized calls
to a stored procedure, which is what the notification UDF does. The notification UDF
takes the same approach demonstrated by the xp_exec example code: It uses an
xproc to open a loop-back connection into the server and runs code that would
normally not be allowed from within a UDF (stored procedure calls) based on values
passed in from the SELECT statement.
This technique is not unlike the facility provided by DTS's Data Driven Query (DDQ)
task. A DDQ task executes parameterized queries based on data fed to it by another
DTS task (which may well be a SELECT statement or stored procedure call).
Unlike xp_exec, the notification xproc does not support the notion of joining the
caller's transactional context. Why is that? Shouldn't the xproc join the caller's
transactional context in order to guarantee that it will not be blocked by it? The
reason the notification xproc doesn't do this is simple: SQL Server doesn't support it.
As I pointed out in the discussion of xp_exec in my last book, an xproc cannot join its
caller's transactional context when called from a UDF. Given that the xp_NSNotify
xproc wasn't intended to be called from anywhere except a UDF, it wouldn't have
made sense for it to provide any support for binding to the calling spid's
transactional space.
Despite the fact that it's about the only way to do what the Notification Services
developers were obviously trying to accomplish, the fact that a separate connection
is initiated from within a UDF raises some interesting issues. For one thing, given
that separate spids could theoretically be going after the same resources, scenarios
could exist where the notification xproc could be blocked by its caller. Given that the
transactional context in which match rules run is controlled exclusively by
NSService.exe, this is unlikely. The user would probably have to code some UDF
tricks of his or her own into the match rule in order to change the transactional
environment to the extent that the xproc could be blocked. It would have to find a
way to take out and hold locks on the same tables NSInsertNotificationN accesses to
insert new notifications. I can't imagine this happening except in a very contrived
situation.
A far more likely scenario is that the loop-back connection could fail. I've seen this
actually occur with xp_exec and the undocumented xp_execresultset xproc that
ships with SQL Server. (As of SQL Server 2000 Service Pack 3, this is no longer an
xproc.) When a client uses a trusted connection to connect to SQL Server, if the
domain controller is unreachable and sufficient credentials information isn't cached
on the local machine, the connection will receive the infamous "cannot generate
SSPI context" error. SSPI is the Security Support Provider Interface, and the error
indicates that it can't complete the necessary security operations to successfully
delegate the client's security token to SQL Server, often because it can't reach the
domain controller. Thanks to cached credentials, you can usually resolve this (at
least until the next reboot) by reconnecting to the network or otherwise making the
domain controller accessible.
If access to the domain controller is tenuous, calls to the notification xproc could fail
even though NSService.exe is connected to SQL Server and continues to run without
problems. NSService.exe makes use of connection pooling, so it's not as likely to
need to initiate a new connection to the server once it's running. The notification
xproc, on the other hand, may indeed initiate a new connection and may fail if the
domain controller is unreachable. A common scenario in which I might expect to see
this is when running Notification Services on a laptop. If the laptop normally logs into
a domain controller and is rebooted after being disconnected from the network, you
may run into problems with generating notifications due to SSPI context generation
failures. If so, the system event log will indicate the failed call and the reason for it.
Although the UDF/xproc technique used by Notification Services lessens the coding
burden on the person writing the match rules for a notification application, I'm not
sure that it makes it more scalable than, say, using a cursor over the match rule
result. As I mentioned in Chapter 10, invoking an xproc of any kind, regardless of
whether it initiates a loop-back connection, takes the worker thread associated with
the calling spid "off" of the scheduler. In other words, the thread becomes unusable
to the scheduler for processing work requests by other spids. Because the scheduler
doesn't know whether the xproc will yield often enough, it must assume that it will
not and must ignore its host worker thread in terms of carrying out work requests
until the xproc completes.
If you use the STRESS.CMD utility discussed earlier in this book to instantiate, say,
255 connections to execute an xproc that simply calls Windows' Sleep API function in
order to pause the calling thread for 30 seconds, you won't be able to establish a
new connection to your server while the xprocs run. Moreover, any other
connections currently executing queries may find themselves starved because they
have yielded a worker thread that an xproc has taken over. So, right off the bat,
calling an xproc has a negative impact on SQL Server scalability because it forces a
worker thread to be dedicated to a given spid rather than being shared by perhaps
many of them as it normally would be.
Add to this the fact that, for each row in the match result set, the xproc initiates a
new SQL Server connection, and you could well encounter scalability issues with this
approach. In a system that may process thousands or even millions of notifications
using a single match rule invocation, you might see thousands or millions of new
SQL Server connections being made and dropped during one instance of the
generator firing. Worse yet, if the system is extremely busy or has a limited number
of available worker threads (e.g., due to xprocs taking them over), the notification
function's xproc may not be able to immediately establish a loop-back connection
and may block while it waits on one. Obviously, the system impact caused by these
loop-back connections could be very significant, depending on the number and
frequency of them.
The INSTEAD OF trigger approach I mentioned earlier doesn't have this caveat, nor
would simply opening a cursor over the match result set and invoking the
NSInsertNotificationN stored procedure for each row. Of course, cursors can have
their own performance and scalability issues, but they will, at least, not cause a spid
to commandeer a worker thread, nor will they cause connection thrashing.
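To make the comparison concrete, here's a rough sketch of the cursor-based alternative. The event and subscription table names and columns are hypothetical, and I'm guessing at the NSInsertNotificationN parameter list for illustration; the real signature depends on the notification class you've defined.

-- A rough sketch of the cursor alternative (table, column, and
-- NSInsertNotificationN parameter names are assumed for illustration only)
DECLARE @SubscriberId nvarchar(255), @DeviceName nvarchar(255), @Locale nvarchar(10)
DECLARE MatchRows CURSOR FAST_FORWARD FOR
    SELECT s.SubscriberId, s.DeviceName, s.SubscriberLocale
    FROM MyEvents e
    JOIN MySubscriptions s ON e.StockSymbol = s.StockSymbol
OPEN MatchRows
FETCH NEXT FROM MatchRows INTO @SubscriberId, @DeviceName, @Locale
WHILE @@FETCH_STATUS = 0
BEGIN
    -- One raw notification per matching row, over the caller's own connection
    EXEC NSInsertNotificationN @SubscriberId, @DeviceName, @Locale
    FETCH NEXT FROM MatchRows INTO @SubscriberId, @DeviceName, @Locale
END
CLOSE MatchRows
DEALLOCATE MatchRows

No loop-back connection is required here, but, as the chapter points out, the cursor brings its own performance trade-offs.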
Distributors
As I said earlier, notifications produced by the generator are in a raw
format and must be formatted and packaged before they can be distributed as end
user notifications. That's where the distributor comes in. The distributor takes the
raw notifications produced by the generator and translates them into formatted
notifications that can be sent out over a delivery protocol. XML style sheets are
commonly used to translate raw notifications into files suitable for delivery to
subscribers.
Once the generator finishes creating a batch of raw notifications, the distributor
reads the subscriber data in the batch and determines the type of formatting
required. The distributor then formats each raw notification and sends it to a
delivery service (e.g., SMTP) using a delivery channel as defined in the instance
configuration file.
Formatting Messages
When you set up a notification class in an application definition file, you specify how
the class is to be formatted. As I've mentioned, a common method of transforming
raw notifications into something that can be sent to subscribers is to use an XML
style sheet. To use Notification Services' built-in XSL-based formatter, specify
XsltFormatter as the ContentFormatter node's ClassName in your application
definition file. Of course, you must also create the style sheet and specify the
location and name of the file in the ContentFormatter node.
You must set up a content formatter for every combination of delivery device and
locale that the application supports.
Notification Services provides one content formatter for you out of the box, the
XsltFormatter content formatter. Given the flexibility and power of the XML style
sheet language, this will be the only content formatter that many apps will ever
need. As we saw in Chapter 8, an XML style sheet can produce a wide variety of
output from a single XML data source. You have looping, conditional logic, and user-
defined functions at your disposal, so there isn't much you can't do.
That said, you may run into situations where you need or want to construct a custom
content formatter. If so, Notification Services allows you to build custom content
formatters by implementing the IContentFormatter interface. You specify this
content formatter and any parameters it requires in the NotificationClasses section
of your application definition file.
Delivering Notifications
NSService.exe does not handle the ultimate delivery of notifications; it leaves that
to delivery services such as SMTP. It delivers notifications to these services via
delivery channels, which you define in the instance configuration file. A delivery
channel packages the notifications it receives into a protocol packet and sends them
on to the specified delivery service. For example, for SMTP, a delivery channel would
package notifications using the SMTP delivery protocol and deliver them to the
specified SMTP or Exchange server, and the server would, in turn, deliver them to
subscribers. Notification Services supports the following standard delivery protocols.
The SMTP protocol, which supports sending notifications to SMTP or Exchange servers.
Note that you can also create your own custom delivery protocols. To create a
custom delivery protocol, a class must implement either the IDeliveryProtocol
interface or the IHttpProtocolProvider interface. IDeliveryProtocol is the base custom
delivery protocol interface. Notification Services provides the IHttpProtocolProvider
interface to make it easier to build custom HTTP-based delivery protocols. Most of
the plumbing required to deal with HTTP is provided for you; you simply provide
code to format the envelope and process the response. You can use the
IHttpProtocolProvider to provide notifications via HTTP-based protocols such as
SOAP, SMS, and .NET Alerts.
Notification Services ships with a set of sample applications that you can copy and
customize for use with your own notification applications. You can use the
CopySample utility to copy a sample app to a new application, complete with its own
IIS virtual directory and project-specific environment variables.
These sample apps demonstrate how to use a Makefile project in Visual Studio .NET
to integrate a custom build process into the Visual Studio IDE. Each sample solution
consists of two projects. One is the subscription management application, which is a
regular ASP.NET application; the other is a VC++ Makefile project named
AppDefinition, which encapsulates the collection of configuration files, CMD files,
XSLT style sheets, and other support files used to create and register the sample
notification application.
A VC++ Makefile project allows you to wire in your own custom commands for the
Build, Clean, and Rebuild commands in the Visual Studio IDE. The Notification
Services sample projects leverage this to call custom CMD files that handle each of
these operations for the AppDefinition project. The result is a custom build solution
that's fairly well integrated with the Visual Studio IDE. When you right-click the
AppDefinition project in the Visual Studio Solution Explorer and select Build, you're
actually calling a CMD file that runs NSControl. Its output is collected in the output
window, and it is, for all intents and purposes, a part of the IDE itself.
While I'm on the subject of CMD files and the sample apps, I should mention that the
CMD files that ship with the Notification Services samples are a great way to learn
about NSControl and about the Notification Services application creation process in
general. Several of these CMD files feature clever tricks to get the most out of
Windows' CMD file syntax. I think these are worth mentioning in the interest of
understanding how the samples work and, indirectly, how NSControl works. If you
can read the sample CMD files that ship with the product, you are well on your way
to knowing how to use NSControl to create and manage a Notification Services
application. Table 19.1 lists some of the more interesting CMD file syntax I've
discovered in these files and what it does.
Building Your Own Notification Application
In this section, we'll build a Notification Services application from scratch. The app
we'll build will be a Bug Notification Service (BNS); it will notify subscribers when
new product bugs are entered into a SQL Server table and when changes are made
to them. Events will be generated via a trigger on the Bugs table in a custom
database (outside of Notification Services), and notifications will be delivered over
SMTP. We'll build all the pieces of the application, including the subscription
management app. I'm assuming for the purposes of this exercise that you have SQL
Server, Notification Services, and an SMTP server (the Windows NT product family
ships with an SMTP server) all installed on the same machine. If your SMTP server is
running, stop it for the time being so that no actual e-mails are sent out as a result
of our example application.
The typical Notification Services application development cycle goes like this.
1. Create the instance configuration file and the application definition file.
2. Create the XML style sheets and other support files the application requires.
3. Use NSControl Create to create the instance and application databases.
4. Use NSControl Register to register the new instance and its corresponding
Windows service.
5. Use NSControl Enable to enable the instance.
6. Use NET START or the Services application to start the new service. At this
point, you have a running instance of the server-side portion of your app.
Let's start by building the instance configuration file. We'll use a bare-bones
configuration to keep the concepts from being lost in the details. Listing 19.2 shows
the one we'll use for the BNS app. (You can find this code in the instConfig.XML file in
the CH19\bns\svc subfolder on the CD accompanying this book.)
Listing 19.2
I've bolded the two values in the file that you'll need to customize to your specific
configuration. You can specify values for them by setting up environment variables
before calling NSControl or by passing them on the NSControl command line. We'll
pass them on the command line when we call NSControl in just a moment.
The first parameter, the SqlServerSystem node, is the machine name and SQL
Server instance (separated by a backslash) that will host your notification
application. The second entry, BaseDirectoryPath, is the base path for the
configuration files, XML style sheets, and other support files that will be used to
create the application.
This configuration file references a single application, BNS, and a single delivery
channel, the EmailChannel, which uses the built-in SMTP delivery protocol. Note the
Parameters node within the BNS Application node. This defines the parameters to
pass into the application definition file. Unlike the instance configuration file, the
application definition file cannot retrieve parameters from the environment or from
the NSControl command line. In this case, we simply pass on the parameters
originally passed into the instance configuration file (along with the COMPUTERNAME
environment variable). In other scenarios, you might have more sophisticated
parameter needs.
Let's now turn to the application definition file, appADF.XML, shown in Listing 19.3.
(You can find this file in the CH19\bns\svc subfolder on the CD accompanying this
book.)
Listing 19.3
<SubscriptionClasses>
<SubscriptionClass>
<SubscriptionClassName>BNSSubscriptions
</SubscriptionClassName>
<Schema>
<Field>
<FieldName>DeviceName</FieldName>
<FieldType>nvarchar(255)</FieldType>
<FieldTypeMods>not null</FieldTypeMods>
</Field>
<Field>
<FieldName>SubscriberLocale</FieldName>
<FieldType>nvarchar(10)</FieldType>
<FieldTypeMods>not null</FieldTypeMods>
</Field>
<Field>
<FieldName>Product</FieldName>
<FieldType>nvarchar(30)</FieldType>
<FieldTypeMods>not null</FieldTypeMods>
</Field>
<Field>
<FieldName>ID</FieldName>
<FieldType>nvarchar(15)</FieldType>
<FieldTypeMods>not null</FieldTypeMods>
</Field>
<Field>
<FieldName>OpenedBy</FieldName>
<FieldType>nvarchar(30)</FieldType>
<FieldTypeMods>not null</FieldTypeMods>
</Field>
<Field>
<FieldName>AssignedTo</FieldName>
<FieldType>nvarchar(30)</FieldType>
<FieldTypeMods>not null</FieldTypeMods>
</Field>
</Schema>
<EventRules>
<EventRule>
<RuleName>BNSSubscriptionsRule</RuleName>
<Action>
SELECT dbo.BNSInfoNotify(s.SubscriberId,
s.DeviceName,
s.SubscriberLocale,
p.ID,
p.Product,
p.OpenedBy,
p.AssignedTo,
p.Description,
p.DateChanged,
p.Pri,
p.Sev)
FROM BNSEvents e, BNSSubscriptions s, BNS..Bugs p
WHERE e.ID = p.ID
AND p.Product = s.Product
AND p.OpenedBy LIKE s.OpenedBy
AND p.AssignedTo LIKE s.AssignedTo
AND p.ID LIKE s.ID
</Action>
<EventClassName>BNSEvents</EventClassName>
</EventRule>
</EventRules>
</SubscriptionClass>
</SubscriptionClasses>
<NotificationClasses>
<NotificationClass>
<NotificationClassName>BNSInfo</NotificationClassName>
<Schema>
<Fields>
<Field>
<FieldName>ID</FieldName>
<FieldType>int</FieldType>
</Field>
<Field>
<FieldName>Product</FieldName>
<FieldType>nvarchar(30)</FieldType>
</Field>
<Field>
<FieldName>OpenedBy</FieldName>
<FieldType>nvarchar(30)</FieldType>
</Field>
<Field>
<FieldName>AssignedTo</FieldName>
<FieldType>nvarchar(30)</FieldType>
</Field>
<Field>
<FieldName>Description</FieldName>
<FieldType>nvarchar(80)</FieldType>
</Field>
<Field>
<FieldName>DateChanged</FieldName>
<FieldType>datetime</FieldType>
</Field>
<Field>
<FieldName>Pri</FieldName>
<FieldType>int</FieldType>
</Field>
<Field>
<FieldName>Sev</FieldName>
<FieldType>integer</FieldType>
</Field>
</Fields>
</Schema>
<ContentFormatter>
<ClassName>XsltFormatter</ClassName>
<Arguments>
<Argument>
<Name>XsltBaseDirectoryPath</Name>
<Value>%BasePath%</Value>
</Argument>
<Argument>
<Name>XsltFileName</Name>
<Value>BNSInfo.xslt</Value>
</Argument>
</Arguments>
</ContentFormatter>
<Protocols>
<Protocol>
<ProtocolName>SMTP</ProtocolName>
<Fields>
<Field>
<FieldName>Subject</FieldName>
<SqlExpression>'Bug Change Notification'
</SqlExpression>
</Field>
<Field>
<FieldName>BodyFormat</FieldName>
<SqlExpression>'html'</SqlExpression>
</Field>
<Field>
<FieldName>From</FieldName>
<SqlExpression>'[email protected]'</SqlExpression>
</Field>
<Field>
<FieldName>Priority</FieldName>
<SqlExpression>'Normal'</SqlExpression>
</Field>
<Field>
<FieldName>To</FieldName>
<SqlExpression>DeviceAddress</SqlExpression>
</Field>
</Fields>
</Protocol>
</Protocols>
</NotificationClass>
</NotificationClasses>
<Providers>
<NonHostedProvider>
<ProviderName>SQLTriggerEventProvider</ProviderName>
</NonHostedProvider>
</Providers>
<Generator>
<SystemName>%SystemName%</SystemName>
</Generator>
<Distributors>
<Distributor>
<SystemName>%SystemName%</SystemName>
<QuantumDuration>PT5S</QuantumDuration>
</Distributor>
</Distributors>
<ApplicationExecutionSettings>
<QuantumDuration>PT5S</QuantumDuration>
<Vacuum>
<RetentionAge>P3DT00H00M00S</RetentionAge>
<VacuumSchedule>
<Schedule>
<StartTime>23:00:00</StartTime>
<Duration>P0DT02H00M00S</Duration>
</Schedule>
<Schedule>
<StartTime>03:00:00</StartTime>
<Duration>P0DT02H00M00S</Duration>
</Schedule>
</VacuumSchedule>
</Vacuum>
</ApplicationExecutionSettings>
</Application>
I've bolded the parameters that are passed in from the instance configuration file. As
I mentioned earlier, you never pass the application definition file directly to
NSControl. When NSControl requires a configuration file, you pass the instance
configuration file, and it, in turn, references the application definition files for the
applications it hosts.
The SystemName node in the Generator and Distributor sections refers to the
machine name that's hosting your notification application. The
XsltBaseDirectoryPath node refers to the path that contains the XML style sheets
that the XSLT content formatter is to use.
WARNING: The sample applications that ship with Notification Services do not
include vacuum sections in their application definition files, so if you use the
CopySample utility to copy one of them so that you can customize it for your own
use, your new app won't have a vacuum section in its application definition file by
default and won't vacuum obsolete data. You can add a vacuum section to the
application definition file before creating your application with NSControl Create, or
you can add it afterward and use NSControl Update to apply the change. You can
also invoke vacuuming manually by running the NSVacuum stored procedure. Users
who vacuum old data must belong to the NSVacuum role.
Note the way that the match rule is constructed in the EventRules section. Its join
clause looks like this:
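FROM BNSEvents e, BNSSubscriptions s, BNS..Bugs p
WHERE e.ID = p.ID
AND p.Product = s.Product
AND p.OpenedBy LIKE s.OpenedBy
AND p.AssignedTo LIKE s.AssignedTo
AND p.ID LIKE s.ID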
The use of LIKE in the query predicates allows wildcards to be used in place of actual
values in the subscription definition. So, the OpenedBy, AssignedTo, and ID (the bug
ID) columns are effectively optional. If they are left unspecified in the subscription
management application, the GUI can plug in a value of "%" for them so that they are
ignored for purposes of filtering the query. This
probably isn't the best strategy in terms of effective index use, but it does allow the
match rule to be somewhat dynamic. I'll show you an alternate way of constructing
dynamic matching rules later in the chapter.
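To see why the "%" value works as an "ignore this column" switch, consider a simplified version of one of the predicates. The literal value here is hypothetical:

-- With a real value stored in the subscription, the predicate filters normally:
SELECT * FROM BNS..Bugs p WHERE p.OpenedBy LIKE 'khen'
-- With the GUI-supplied '%' wildcard, the same predicate matches every row:
SELECT * FROM BNS..Bugs p WHERE p.OpenedBy LIKE '%'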
The only other thing that we need to do before running NSControl to create, register,
and enable the notification application is create the custom database that stores the
Bugs table. As I mentioned earlier, this table will have a trigger on it that will
generate a notification every time a new bug is added or changed. Listing 19.4
presents a T-SQL script to create the BNS database. (You can find this code in the
CreateBNS.SQL script in the \CH19\bns\svc subfolder on the CD accompanying this
book.)
Listing 19.4
USE master
GO
IF (DB_ID('BNS') IS NOT NULL)
DROP DATABASE BNS
GO
USE BNS
GO
OPEN NewBugCursor
CLOSE NewBugCursor
DEALLOCATE NewBugCursor
END
GO
This script creates a new database, BNS, then creates a Bugs table in it for storing
bug entries. It then creates a trigger over the table that fires on an INSERT or
UPDATE operation against the table and generates notifications accordingly. Run the
script to create the BNS database, then we'll get started building the notification
application. (If you see warning messages regarding missing objects and the inability
to set up dependency information in sysdepends, ignore them.)
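Although I won't reproduce the trigger from CreateBNS.SQL here, a rough sketch of its general shape may help you follow what happens when a bug row changes. This sketch is assumption-laden, not the CD's code: I'm assuming the application database is named BNSInstanceBNS, and I'm guessing at the batch ID type and the parameter lists of the NSEventBeginBatch/NSEventWrite/NSEventFlushBatch procedures that Notification Services generates for the BNSEvents event class.

-- Assumption-laden sketch only: database name, batch ID type, and the
-- NSEventWriteBNSEvents parameter list are guesses made for illustration.
CREATE TRIGGER BugChange ON Bugs FOR INSERT, UPDATE AS
BEGIN
    DECLARE @BatchId bigint, @ID int
    -- Open an event batch for the SQLTriggerEventProvider named in the ADF
    EXEC BNSInstanceBNS.dbo.NSEventBeginBatchBNSEvents N'SQLTriggerEventProvider',
         @BatchId OUTPUT
    -- Write one event per inserted/updated bug row
    DECLARE NewBugCursor CURSOR FAST_FORWARD FOR SELECT ID FROM inserted
    OPEN NewBugCursor
    FETCH NEXT FROM NewBugCursor INTO @ID
    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC BNSInstanceBNS.dbo.NSEventWriteBNSEvents @BatchId, @ID
        FETCH NEXT FROM NewBugCursor INTO @ID
    END
    CLOSE NewBugCursor
    DEALLOCATE NewBugCursor
    -- Flush the batch so the generator can pick the events up
    EXEC BNSInstanceBNS.dbo.NSEventFlushBatchBNSEvents @BatchId
END

This also explains the warning messages mentioned above: the trigger references objects that don't exist until NSControl Create has run.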
Once you've built the BNS database, add the Notification Services bin folder to the
system path so that we can call NSControl without fully qualifying its location. This
folder should be at \Program Files\Microsoft SQL Server Notification
Services\vN.N.N.N\bin on the drive on which you installed Notification Services.
After you've added the bin folder to the system path, open a command window and
change to the CH19\bns\svc subfolder on the CD accompanying this book. Once
there, run NSControl to create the instance and application databases using this
command line (type this on a single line):
Server_Instance and BasePath are parameters being passed into the instance
configuration file. As I mentioned earlier, Server_Instance is your machine name and
SQL Server instance name, separated by a backslash. BasePath is the folder where
the configuration files, XML style sheets, and other supporting files reside for the
instance. Replace each of these values with the appropriate values for your system
when you run NSControl.
Once the instance and application databases are created, you're ready to register
the instance. Although you can register the instance without also registering its
Windows service, there's no point in doing that in our scenario, so we'll register both
at the same time. Use NSControl to register the instance like this (type this on a
single line):
Replace YourServer\YourSSInstance with the name of the machine and SQL Server
instance that is hosting your notification application. For purposes of this exercise,
I'm assuming that your SQL Server and Notification Services app will reside on the
same machine. Replace YourSvcUser and YourSvcPwd with the user name and
password, respectively, of the NT account that you want NSService.exe to run under.
It's common for users to create a special account just for this purpose. If you do that,
be sure that it has the proper access rights by following the instructions laid out in
the "NS$instance_name Service Account Security" topic in Notification Services
Books Online.
Once an instance has been registered, you must enable it in order for it to begin
generating notifications. You can selectively enable and disable parts of an
application; for example, you could enable event collection but not the generator. In
our case, we want to enable the entire application, so we'll simply pass the instance
name on the command line to NSControl:
You can enable the instance before or after you start the Windows service. If you
enable it before you've started the service (as we just have), you'll see that some of
the components of the application show a pending state after NSControl Enable
runs. That's nothing to worry about. They'll come online once the service is started,
which we'll do next.
You can start your new notification service using Windows' Services application or
via NET START. Given that I'm an old-fashioned sort of guy, I usually use NET START,
like this:
Once the service is started, it's officially ready to begin delivering notifications. Now
all we have to do is create a subscription management application and begin
entering subscriptions and events to test out our new notification application.
As I mentioned earlier, you can use the CopySample utility to duplicate one of the
sample apps that ships with Notification Services so that you can customize it for
your own use. We're not going to do that in this exercise. Instead, we'll build a new
subscription management app from scratch in Visual Basic .NET. The app will be a
traditional Windows app rather than the more common ASP.NET application because
I want to illustrate what's involved with interacting with the Notification Services
object model with as little clutter as possible. Building a subscription management
app as a traditional Windows app isn't the most practical approach because, by
default, the app needs to run on the same machine as your notification application.
When the subscription app is an ASP.NET app, this is not a big problem; regardless
of where the users of the app reside, the app itself runs on the IIS server, which can
easily share its host machine with a notification application. However, when the
subscription management application is a Windows app, the app runs in the context
of whatever user machine executes it. By default, a user would have to run the app
on the same machine that hosts your notification application in order for it to work
properly.
It's possible to access the Notification Services object model remotely using COM
Interop and DCOM, but, as a rule, you'll probably want to build your subscription
management apps as ASP.NET apps to keep things simple and to keep from mixing
COM and managed code classes unless absolutely necessary. The app presented
here isn't intended for production use.
As I've said, I didn't build the subscription management app we'll use in this example
as an ASP.NET app because I wanted to keep the example as simple as possible.
There are nuances and idiosyncrasies associated with connectionless (Web-based)
apps that you don't have with traditional Windows apps, and my purpose here is not
to provide a subscription app that you can immediately drop into place in a
production system but rather to demonstrate how easy it is to interact with and use
the Notification Services object model.
I should also mention that I used Visual Basic .NET here rather than C# because I
think a fair number of SQL Server practitioners are Visual Basic people rather than C,
C++, or C# coders. Personally, I much prefer C# to Visual Basic of any flavor and
would much rather use it. However, given that all the Notification Services sample
apps except the Flight sample are written in C# and given that, as I've said, most
SQL Server people are likely to prefer Visual Basic, I've built the subscription app
we'll look at in Visual Basic .NET. Most of the code in this app could easily be reused
in an ASP.NET application written in Visual Basic .NET with little or no changes. All of
the sample subscription management apps that ship with Notification Services are
ASP.NET apps, so you can reference them for ideas on how to implement your own
ASP.NET-based subscription management app.
The Notification Services sample apps make use of a common utility class that's
stored in NSUtility.cs. This class makes interacting with the Notification Services
object model even easier than it already is and is used extensively by the sample
apps. Given that our subscription app is written in Visual Basic and given that you
can't use a C# module in a Visual Basic .NET project without first compiling it to an
assembly, I've translated a fair number of the functions in NSUtility.cs into Visual
Basic .NET. You're welcome to extract these functions from the source code we're
about to explore and use them in your own Visual Basic .NET-based subscription
management applications.
The first thing you must do in order to allow a managed code app to interact with
the Notification Services object model is add a reference to the
Microsoft.SqlServer.NotificationServices assembly to your project. If you load the
BNSSubscribe solution into Visual Studio .NET, you'll see that it does indeed
reference this assembly.
So, without further ado, let's look at the source code for BNSSubscribe (Listing 19.5),
the subscription management application for the BNS. Given that much of the
source for the app was generated by Visual Studio .NET, I won't take you through all
of it. We'll just explore the code that I wrote to implement the app and hit the high
points of how it works. (You can find this code in the Form1.vb file in the
\CH19\bns\BNSSubscribe subfolder on the CD accompanying this book. Due to VB's
prehistoric predilection for line-oriented coding, some of this code doesn't format
well on the printed page; see the source code file itself for a much more readable
listing.)
Listing 19.5
Dim dr As DataRow
dt.Columns.Add(New DataColumn("SubscriptionId",
System.Type.GetType("System.Int32")))
dt.Columns.Add(New DataColumn("Product", System.Type.GetType
("System.String")))
dt.Columns.Add(New DataColumn("ID", System.Type.GetType
("System.String")))
dt.Columns.Add(New DataColumn("OpenedBy", System.Type.GetType
("System.String")))
dt.Columns.Add(New DataColumn("AssignedTo",
System.Type.GetType("System.String")))
Dim i As Integer
Dim name As String
For i = 0 To dt.Columns.Count - 1
name = dt.Columns(i).ColumnName
Return ds
End Function
dgSubscriptions.SetDataBinding(CreateDataSource
(subscriptionEnumeration), "Subscriptions")
End Sub
Return subscriberDeviceName
End Function
subscriber.SubscriberId = subscriberId
subscriber.Add()
subscriberDevice.SubscriberId = subscriberId
subscriberDevice.DeviceTypeName = protocolName
subscriberDevice.DeviceName = "myDevice"
subscriberDevice.DeviceAddress = emailaddress
subscriberDevice.DeliveryChannelName = deliveryChannelName
subscriberDevice.Add()
Catch ex As NSException
If (NSEventEnum.DuplicateSubscriber <> ex.ErrorCode) Then
Throw (ex)
End If
End Try
End Sub
If subscription.HasTimedRule Then
subscription.ScheduleStart = dateTimeStart
subscription.ScheduleRecurrence = recurrence
End If
Return subscription.Add()
End Function
subscription.Delete()
End Sub
Try
Try
If 0 <> OpenedBy.Length Then openedbymask =
tbOpenedBy.Text
If 0 <> AssignedTo.Length Then assignedtomask =
tbAssignedTo.Text
If 0 <> Bug.Length Then bugmask =
Int32.Parse(tbBug.Text).ToString()
Catch ex As Exception
Throw New Exception("Invalid Bug ID specified.", ex)
End Try
subscriberDeviceName = GetSubscriberDeviceName(userName)
subscriptionFields.Add("DeviceName", subscriberDeviceName)
subscriptionFields.Add("SubscriberLocale", "en-US")
subscriptionFields.Add("Product", Product)
subscriptionFields.Add("ID", bugmask)
subscriptionFields.Add("OpenedBy", openedbymask)
subscriptionFields.Add("AssignedTo", assignedtomask)
subscriptionId = AddSubscription(userName,
subscriptionClassName, subscriptionFields,
Nothing, Nothing)
UpdateGrid(userName)
Try
DeleteSubscription(userName, subscriptionClassName,
subscriptionIdString)
UpdateGrid(userName)
Catch ex As Exception
sbMsg.Text = "Cannot delete the subscription: " +
ex.Message
End Try
End Sub
End Class
Let's walk through the main elements of this application. First, in the form class's
constructor, I instantiate NSInstance and NSApplication objects. I've not included the
constructor code here because most of it was generated by Visual Studio .NET, but
here are the specific lines that instantiate these objects:
The instance and application variables are members of the form class, so they are
accessible to all of its methods. Once created, we can use them when we make
other calls to the Notification Services object model.
Once these objects are created, we retrieve the subscriptions for the current user
and update the grid at the bottom of the form that lists them. The UpdateGrid
method is responsible for carrying this work out. As you can see from the source
code listing, it creates a new SubscriptionEnumeration object, then passes it to a
method named CreateDataSource to create a data source (a DataSet object, in this
case) that's suitable for use with a DataGrid control. We pass in the current user
name (which we retrieved at application startup via
System.Security.Principal.WindowsIdentity.GetCurrent.Name) to UpdateGrid, and it
uses this to filter the subscriptions it enumerates.
The next code of interest is the btAdd_Click and AddSub methods. This code runs
when the user clicks the Add button in the GUI to add a subscription. The
btAdd_Click method calls AddSub, and AddSub carries out the real work of adding
the subscription. It begins by adding the subscriber (in case he or she hasn't already
been added), then retrieves the subscriber's delivery device and adds the
subscription. As I mentioned earlier, the GUI supplies wildcard values for
subscription fields that are left blank. This allows them to be effectively optional
when the match rule runs.
The last segment of code that we'll look at is the btDelete_Click method. Obviously,
this code runs when the user selects a subscription definition in the DataGrid and
clicks the Delete button to delete it. It calls the DeleteSubscription method, which
I've translated from the NSUtility.cs C# source file that ships with the Notification
Services sample apps.
The rest of the code in this source module is support code for the app's two main
functions: adding and deleting subscriptions. A subscription management app could
obviously provide much more functionality than this, but you need at least these two
functions in any subscription management app you build.
Testing Notifications
Let's use BNSSubscribe to add a subscription so that we can test our notification
application. Run the app now from the \CH19\bns\BNSSubscribe\bin subfolder and
add a subscription for bugs filed against the Sequin product. (If we were going to
allow notifications to actually be delivered, we'd need to change the e-mail address
field to a valid address; don't worry about that for now.) Leave the other entry fields
blank and click the Add button to add the subscription. Once the subscription adds
successfully, start Query Analyzer and run the following query:
USE BNS
GO
INSERT INTO Bugs
VALUES(1, 'Sequin', '','','', getdate(), 1, 1, 1, '','')
Once you've executed this query, open Windows Explorer and navigate to the pickup
folder for your SMTP server. On the Windows NT family, this is located at
\Inetpub\mailroot\pickup. After a few seconds, you should see a file with an
extension of .eml show up in this folder. If your SMTP server is running, it will pick up
this file and attempt to send it to the recipient. Congratulations, you've just
generated your first notification with Notification Services!
Possible Improvements
As you might guess, there are a number of ways we could improve this application
and make it more suitable for production use. For one thing, as I've mentioned, the
subscription management app should probably be a Web app if the software is going
to be used in a production scenario. For another, you could enable digest
notifications in order to keep subscribers from being pummeled by notifications in
the event that lots of bug entries are suddenly added or changed. Setting up digest
notifications would group all of the notifications in a notification batch for each
subscriber into a single e-mail, perhaps substantially reducing the number of
notification e-mails they receive.
Another useful improvement to this application would be to open up the match rule
specification so that a user could specify more sophisticated subscription criteria. As
I mentioned earlier, the use of wildcards to make columns optional in a match rule
query is not the most efficient approach in terms of index usage. Using a plain "%"
wildcard virtually guarantees a table scan. (You'll see a clustered index scan if a
clustered index exists on the table, but they are semantically and functionally
equivalent.) A better approach would be to completely eliminate columns the user
doesn't want to filter the subscription by from the query. However, given that the
same match rule is used for all subscribers, this isn't an easy task.
One method for doing this would be to set up a dynamic match rule using some T-
SQL tricks I've demonstrated in previous books. Basically, the dynamic match rule
technique works like this.
1. The user specifies custom subscription criteria in the subscription management
application's GUI.
2. When the user clicks the Add button in the GUI to add the subscription to the
application database, the GUI creates a SQL Server view object behind the
scenes that encapsulates the specified subscription criteria. Each view returns
all the columns in the underlying table for all rows that match the custom
criteria supplied by the user. The GUI then adds the subscription to the
application database using this view name as its lone criterion.
3. When the match rule runs, it generates a dynamic T-SQL query that takes the
names of the views previously built by the GUI and concatenates them into a
minimum number of large UNION ALL queries. Each of these dynamic queries
is less than 8,000 bytes in length due to the limitations of the varchar data
type. Each one is used to generate notifications just as a normal match rule
SELECT is. The end result is that the custom subscription criteria specified in
each subscription is used to drive the notification process without requiring the
use of a cursor or other inefficient mechanism.
This is best understood by way of example, such as Listing 19.6. (You can find this
code in the dynamic_filter.sql script in the \CH19\bns\svc subfolder on the CD
accompanying this book.)
Listing 19.6
SET @nViewQryLen=@nViewNameLen+
DATALENGTH(@strSelectFrag)+DATALENGTH(@strUnionFrag)
SET @strQry=''
SELECT @nNumViews=COUNT(*)
FROM sysobjects
WHERE type='V'
AND name LIKE 'V%'
SET @strLastView=''
SELECT @strQry=@strQry+
@strSelectFrag+
cast(name as varchar(128))+
@strUnionFrag,
@strLastView=cast(name as varchar(128))
FROM sysobjects
WHERE type='V'
AND name LIKE 'V%'
AND name > @strLastView
ORDER BY name
SET ROWCOUNT 0
INSERT matchtable
EXEC(@strQry)
SET ROWCOUNT @nMaxViewsPerQuery
SET @nNumViews=@nNumViews-@nMaxViewsPerQuery
SET @strQry=''
END
SET ROWCOUNT 0
The salient code here is within the WHILE loop. It uses a couple of interesting
techniques to avoid using a cursor and to query as many of the subscription criteria
views at once as possible. First, it uses the concatenation trick I first demonstrated in
my book The Guru's Guide to Transact-SQL to build a dynamic T-SQL string that
SELECTs from as many criteria views at once as possible. Second, it sets
@strLastView to the final view name it processes with each iteration so that it can
keep track of which view names remain to be processed. Given that our dynamic
query string is limited to 8,000 characters, we calculate the number of criteria views
we can process with each iteration, then use SET ROWCOUNT to limit the number of
view names we see within the loop to this number. That's the purpose of the
@nMaxViewsPerQuery variable. Because we know that a variable that's assigned the
value from a column in a result set with multiple rows retains the value from the last
row, we can assign @strLastView in the same SELECT statement that we use to build
our dynamic query string, then use it in subsequent iterations of the code to pick up
where we left off.
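If you haven't seen the concatenation trick before, here's a stripped-down illustration of the core idea behind Listing 19.6. It omits the 8,000-byte batching and the rest of the listing's machinery:

-- Minimal illustration of the concatenation trick: a single SELECT builds one
-- UNION ALL query string from many view names, no cursor required.
DECLARE @strQry varchar(8000)
SET @strQry = ''
SELECT @strQry = @strQry + 'SELECT * FROM ' + name + ' UNION ALL '
FROM sysobjects
WHERE type = 'V' AND name LIKE 'V%'
IF DATALENGTH(@strQry) > 0
BEGIN
    -- Strip the trailing UNION ALL, then run the assembled query
    SET @strQry = LEFT(@strQry, DATALENGTH(@strQry) - DATALENGTH(' UNION ALL '))
    EXEC(@strQry)
END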
The INSERT statement above is a placeholder for the SELECT statement that's
normally used in a matching rule. I insert the results from the dynamic filter into this
table to make the code easy to test. Obviously, we'd call the notification function to
generate raw notifications using the dynamic filter views if we were going to make
use of this in our application. For example, we might change our match rule query to
look something like this:
EXEC('SELECT dbo.BNSInfoNotify(b.SubscriberId,
s.DeviceName,
s.SubscriberLocale,
b.ID,
b.Product,
b.OpenedBy,
b.AssignedTo,
b.Description,
b.DateChanged,
b.Pri,
b.Sev)
FROM BNSSubscriptions e JOIN ('+@strQry+') b ON
(e.SubscriberId=b.SubscriberId)')
We'd then execute it for each iteration of the WHILE loop. The result would be
notification generations based on the criteria specified in the views.
This is just a prototype of how you could implement dynamic match rules with
Notification Services. In my own testing, it's quite fast and seems fully functional.
You can use the create_views.sql and dynamic_filter.sql scripts in the CH19\bns\svc
subfolder on the CD accompanying this book to experiment with this technique.
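For reference, the kind of criteria view create_views.sql builds might look something like the following sketch. The view name, the SubscriberId value, and the filter values are purely hypothetical; note that the name starts with "V" so that the dynamic filter's LIKE 'V%' test picks it up.

-- Hypothetical per-subscription criteria view: it returns the owning subscriber
-- plus the bug rows that satisfy the subscriber's custom criteria.
CREATE VIEW VBugCriteria_42 AS
SELECT N'subscriber-42' AS SubscriberId, b.*
FROM BNS..Bugs b
WHERE b.Product = 'Sequin'
  AND b.OpenedBy = 'khen'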
Of course, this type of mechanism isn't really justified when the underlying data has
only a few columns in the first place, as is the case with our Bugs table. However,
envision a situation where you have hundreds of columns in the underlying data.
Using the wildcard trick to allow columns to be omitted from the subscription criteria
may not be suitable from a performance standpoint. It may also be too restrictive in
terms of search criteria; users may want more flexibility than merely specifying
simple search criteria for a fixed set of fields. In that scenario, a dynamic match rule
mechanism like the one just demonstrated would likely be not only more efficient
than the wildcard approach but also much more flexible. You could even create a
supplemental table that allows users to associate names or descriptions with their
criteria views so that they can reference them again later. In other
words, you'd be allowing them to create named, reusable queries that were
encapsulated as view objects and capable of generating notifications through a
modified version of the standard Notification Services match rule, as demonstrated
above.
Recap
Notification Services provides a flexible, extensible, and scalable platform for
creating notification applications. It provides most of the plumbing necessary to
build these types of applications; you supply custom code as necessary to
implement the functionality you need.
There are three main components in any Notification Services application: the
Windows service, the SQL Server data store, and the subscription management
application. Notification Services provides the tools to automatically produce the SQL
Server data store and the Windows service and provides sample applications that
you can copy and customize to create your own subscription management
applications.
Within the Windows service, NSService.exe, three main tasks are carried out to
produce subscriber notifications. The event collector process is responsible for
collecting events of interest to the application. The generator is responsible for
matching these events with subscriptions and generating raw notifications. The
distributor is responsible for transforming these raw notifications into formatted messages and
delivering them to delivery channels to be sent to subscribers.
Notification Services provides a rich object framework with which you can construct
notification applications. Interfaces are provided for implementing custom event
providers, delivery channels, and content formatters. Objects are provided for
managing subscribers, subscriber devices, subscriptions, and various other
Notification Services elements. Between this model and the XML configuration files
and style sheets supported by the product, you have a fully programmable solution
that can be customized to meet your needs.
Knowledge Measure
1. What element in a notification application is responsible for translating raw
notifications into notification messages suitable for sending to subscribers?
4. What is the name of the executable that serves as the Windows service within
a Notification Services application?
6. Which task usually comes first in the Notification Services development cycle:
registering the Windows service or creating the instance and application
databases?
8. Name the lone content formatter that ships with Notification Services.
9. When the notification function is called from within a matching rule, what
happens behind the scenes?
11. True or false: Once an instance has been registered, its configuration file is not
used directly by the instance's Windows service.
12. True or false: You use regsvr32.exe to register a new notification application
instance.
13. Which XML configuration file (the instance configuration file or the application
definition file) lists the delivery channels available for application use?
16. Name the process within a Notification Services application that's used to get
rid of obsolete event and notification data.
17. Name the three types of Notification Services components for which you can
develop custom code by implementing predefined interfaces.
18. What does the quantum period specified in the application definition file control?
DTS showcases key Microsoft technologies such as COM, ATL, OLE DB, and, of
course, SQL Server. The DTS Designer is based on an extensible object model, which
allows it to be extended with COM components. The DTS object model itself is
accessible via COM Automation, so you can programmatically control DTS packages
and transformations in any language that is capable of COM Automation (e.g., Visual
Basic).
It should come as no surprise that DTS is able to transfer data between a source and
a destination. Given this and the fact that its underlying data architecture is based
on OLE DB, it shouldn't be surprising that you can transfer data from one OLE DB
provider to another. The source and destination can be flat files, client/server
databases, mainframe databases, and so on. The whole point of DTS is to make
moving data from Point A to Point B as easy as possible, so it's fitting that it offers a
myriad of options for doing exactly that.
I'm not going to bore you with step-by-step instructions for building lots of different
types of DTS packages or for transferring data between providers. The Import/Export
Wizard and the DTS Designer are intuitive enough that you shouldn't need a book for
that. You drop some connections onto the DTS Designer workspace, link these
connections with transformation tasks and precedence constraints, and away you
go. Building a basic package is so simple that you shouldn't need to consult Books
Online to figure out how to do it, let alone a third-party book. The DTS Designer is
best experienced by simply diving in and building a few packages of your own. An
excellent way to get a jump start on this is to use the DTS Import/Export Wizard and
allow it to build a package for you, then load the resulting package into the DTS
Designer.
I'm also not going to fill these pages with screenshot after screenshot of the DTS
Designer. I once made the mistake of buying a SQL Server programming book only
to discover that the coverage of many topics, DTS included, consisted mostly of
screenshots. Frequently, the author put two screenshots on one page, with just a line
or two of explanatory text between them. I don't think you need a book to see what
the DTS Designer looks like�you can fire it up for yourself right from Enterprise
Manager. And although it might help make this book seem a bit heftier, I don't think
you bought it to look at screen prints of something you can readily view on your PC.
So we'll talk about the basic elements of DTS packages, we'll delve into how DTS is
put together from an architectural standpoint, we'll discuss a few of the fringe
elements of DTS programming and applications, and then we'll walk through a
sample DTS application. You'll learn about some of DTS's strengths and weaknesses,
and you'll hopefully gain a fresh perspective into some of the innovative ways in
which you can put DTS to use.
Overview
Architecturally, DTS is composed of the following pieces:
OLE DB providers. (Technically speaking, these aren't part of DTS, but they are
integral to using it. OLE DB is the means by which DTS retrieves and stores
data.)
A multiphase data pump that you can use to set up sophisticated data
transformations.
NOTE: As of this writing, if you want to enable the ability to save packages to the
repository, you must right-click the Data Transformation Services node in Enterprise
Manager, select the Properties option, and check the Enable Save To Meta Data
Services checkbox. Until this option is enabled, Meta Data Services will not appear
as an option in the package save dialog box.
When you choose the SQL Server storage option, the DTS Designer calls an
undocumented stored procedure named sp_add_dtspackage to store the package in
msdb. The package is passed as an image data type to the procedure and stored in
a table named sysdtspackages. You can view the list of packages saved to
sysdtspackages via the Local Packages node located under the Data Transformation
Services node in Enterprise Manager.
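If you want to poke around for yourself, a simple query against sysdtspackages shows what's been saved. Because the table is undocumented, treat the column list here as an educated guess that's subject to change:

-- List locally saved DTS packages and their versions (sysdtspackages is
-- undocumented, so the column names may vary between releases)
SELECT name, id, versionid, createdate, owner
FROM msdb.dbo.sysdtspackages
ORDER BY name, createdate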
COM structured storage files resemble file systems in many ways. They provide a
means of persisting COM objects to disk. Storing a DTS package on disk as a COM
structured storage file allows you to easily transport it via other mediums such as e-
mail and CD-ROM.
Connections
The connections supported by DTS consist of OLE DB data sources. Because an OLE
DB provider exists that supports ODBC data sources, you can also use ODBC data
sources with DTS. Examples of the kinds of data you can access with DTS through
OLE DB include the following.
SQL Server databases (via SQLOLEDB, the native SQL Server OLE DB provider)
HTML
Tasks
A task is an atomic work item. Each task in a DTS package defines a part of the data
transformation process and is executed as a single step. Out of the box, DTS
provides a number of task objects you can use to build complex data transformation
applications. Examples of the types of things you can do include the following.
Data copying between disparate OLE DB providers. You can load data from one
type of data source into another type of destination. For example, you can
copy data from an Oracle database into a SQL Server database or vice versa.
You can transfer nonrelational as well as relational data, and you can use DTS's
Bulk Load functionality to set up high-speed data loads from text files into SQL
Server databases.
Complex data transformations. You can map columns from a data source to a
data destination and specify how the data is to be transformed as it is copied.
You can use ActiveX scripts to manipulate the data in transit, and you can
specify one-to-many, many-to-one, and other unusual relationships between
source and destination data.
Nested packages. Using the Execute Package task, you can set up one package
to call another, adjust its global variables, and so on.
Database object duplication. You can transfer tables, views, stored procedures,
user-defined functions, defaults, rules, and user-defined data types from one
SQL Server database to another. DTS can even optionally script the entire
operation for you as a series of T-SQL scripts and BCP data files.
The Multiphase Data Pump
Much of the real functionality of DTS is encapsulated in its multiphase data pump.
It's the engine behind the Transform Data task, the Data Driven Query task, and the
Parallel Data Pump task. As you might expect, the main purpose of the pump is to
move and transform data between data sources.
The data pump goes through six basic steps or phases during the transformation of
data from one source to another. These phases can be exposed as events to which
you can attach scripting code in order to customize the behavior of the data pump.
Note that the row transformation phase is the only data pump phase to which you
can attach code by default. In order to make the other phases visible to your
package and scripting code, you must enable multiphase pump display in the DTS
Designer by right-clicking the Data Transformation Services node in Enterprise
Manager and checking the Show multi-phase pump in DTS Designer option. Once
the multiphase pump has been fully exposed in the Designer, you can attach ActiveX
script code to phase events to customize the behavior of the pump. As I've said, the
data pump goes through six basic phases when transforming data.
1. Presource phase: this step occurs before any rows are actually read from the
source data. This is a good place to create header rows or carry out other
preparatory work prior to the start of the row transformation process.
2. Row transform phase: this is the default data pump phase and is the step
where each row is read from the source data and optionally transformed.
3. Post-row transform phase: this step occurs after the row transform phase
has completed and is itself composed of three subphases.
4. Batch complete phase: this phase occurs whenever the number of rows
inserted into the destination rowset equals the batch size specified in the
Options page for the transformation. It also occurs when all the rows in the
source data have been processed and there is at least one row in the
destination rowset. When this phase occurs, the pump writes the rows in the
destination rowset to the destination table. Depending on whether you've
configured the package to use transactions, an error during this phase may
result in only some of the rows being written to the destination. This phase
executes on completion of a batch, regardless of whether it's successfully
written to the destination. Note that it may not execute in some circumstances,
such as when the insertion of each source row into the destination rowset fails.
In that scenario, there's no batch to write to the destination because the
destination rowset is empty.
5. Postsource phase: this is the corollary to the presource phase and allows you
to hook up script code that performs some task after all rows have been
transformed. You can still access the destination data at this point, so you
could use this phase to write out summary rows or further interact with the
destination data in some way.
6. Pump complete phase: this phase occurs after all processing is complete, just
before the pump shuts down. Although you no longer have access to either
source or destination data at this point, you still have the full ActiveX scripting
environment at your disposal to do things such as write audit or log records,
interact with the file system, and so on.
You attach script code to the other phases in a data pump transformation just as you
would normally attach it to the row transform phase: You click the Transformations
tab, then select the phase to which you want to attach code in the Phases filter
combo box and click the New button to set it up. Select the ActiveX script option in
the Create New Transformation dialog, then click the Properties button in the
Transformation Options dialog in order to add your scripting code. Configure the
source and destination columns as appropriate and save the transformation. When
you run the package, the code you attached to the specified pump phase will
execute as appropriate.
NOTE: All of the example packages mentioned in this chapter use "(local)" when
referring to a SQL Server. If the server on which you'll be testing these packages is a
named instance, you can create a client configuration alias named "(local)" in order
to avoid having to modify the example packages before running them.
If you bring up the options dialog for each of the ActiveX transformations, you'll see
a Phases tab that allows you to specify which phase(s) the code is to be associated
with. By default, this is the same as the phase selected in the Phase filter combo box
when the transformation was first created. You'll note that the last two items in the
dialog are reversed in terms of when they execute: the pump complete event
actually occurs after the postsource event.
The Bulk Insert Task
Aside from the multiphase data pump, the other key DTS mechanism for loading
data is the Bulk Insert task. The Bulk Insert task basically exposes the functionality
of the T-SQL BULK INSERT command via a graphical interface and internally calls it to
load data into the server. The fact that BULK INSERT is the actual mechanism used to
load data into the server brings with it a couple of important considerations. First,
the path to the source file must be specified in relation to the target SQL Server. So,
if you're using a Bulk Insert task to copy a file from a client machine to a remote SQL
Server, you must specify a UNC path to the file, and the account under which SQL
Server is running must have access to the file (e.g., LocalSystem won't work
because LocalSystem has no network rights). Second, you'll see much better
performance if the file you're loading resides on the same machine as the SQL
Server instance. Internally, the T-SQL BULK INSERT command is implemented by a
COM object that runs inside SQL Server. If the file being loaded resides on the SQL
Server machine, the COM object simply opens it and reads it as a local file. If the file
resides elsewhere on the network, you may find that network bandwidth limitations
throttle the speed of your bulk load, just as they would with any other bulk load
facility such as bcp.exe or the BULK INSERT command itself.
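To give you a feel for what the task ultimately executes, here's roughly the kind of BULK INSERT statement involved. The table name, file path, and options are hypothetical; just remember that the path is resolved by the server, not by the client.

-- Roughly the kind of T-SQL the Bulk Insert task issues; the names, path, and
-- options shown here are hypothetical.
BULK INSERT Northwind.dbo.CustomersStage
FROM '\\FileServer\Loads\customers.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', BATCHSIZE = 10000, TABLOCK)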
Note that you cannot use a Bulk Insert task to load data directly from one SQL
Server database table into another. The source for a Bulk Insert task must be an
operating system file. That said, you can enable high-speed bulk copy processing
with normal Transform Data tasks via the task option Use fast load (which is enabled
by default).
Also note that Bulk Insert tasks do not support the types of ActiveX script
transformations supported by the more ubiquitous Transform Data task. Because
Bulk Insert calls the T-SQL BULK INSERT command, its functionality is more limited
than that of the Transform Data task.
You can load the sample package BulkInsertExample.DTS from the CH20 folder on
the CD accompanying this book to experiment with Bulk Insert tasks. I encourage
you to start a Profiler trace before executing the package so that you can see the T-
SQL code being sent to the server.
The Data Driven Query Task
In a simpler world, basic DML would be sufficient to transfer data between two
sources. Data transformation would consist mostly of one-to-one column copies, and
you'd rarely need to employ any sort of complex logic to determine how and
whether data should be copied from one place to another. In the real world,
however, complex logic is often behind the methods used to move data from place
to place, and stored procedures and custom queries are often necessary to get the
job done. The Data Driven Query task exists for these types of situations. If you have
a data transformation need wherein you require more functionality than simply
inserting rows into a destination table, a Data Driven Query task may fit the bill. For
garden-variety data load operations, the Transform Data task and the Bulk Insert
task are preferable to a Data Driven Query because they are highly optimized for
inserts. You should use a Data Driven Query task instead of one of these only when
your needs exceed their capabilities.
A Data Driven Query task works by allowing you to set up alternate insert, update,
and delete queries to be executed for each row in the source data. These queries
can be simple SQL queries or they can be stored procedure calls or complex SQL
batches. Each query can have replaceable parameters, to which the Data Driven
Query task can assign the values from the source data as they are read for each row.
When you set up a Data Driven Query task, you configure a binding table. It's
important to understand the purpose of the binding table and how it's used by the
task. Basically, the binding table is a placeholder: a mechanism for defining the
destination rowset into which to place data from the source rowset. The queries you
set up determine where the data actually goes. They can reference the binding table
or a different table or tables altogether. Understand that these queries are
placeholders only; you can execute an UPDATE from your insert query, a DELETE
from your update query, and so forth. Listing 20.1 shows an example of a typical
update query.
Listing 20.1
UPDATE authors_new
SET au_lname=?,
au_fname =?,
phone=?,
address=?,
city=?,
state=?,
zip=?,
contract=1
WHERE au_id=?
Here, we're updating the columns in a copy of the pubs..authors table with values
from the source data with the exception of the contract column, which we force to 1
for updates.
Once you've configured your insert, update, and delete queries, you set up an
ActiveX script to determine which of these gets called as data is processed. The
return value from the script determines which query gets executed. You can check
for the existence/nonexistence of rows, use a switch statement to return a different
script result based on a column in the source data, branch based on global variables,
and so on; basically, you have full control over which of your queries gets executed.
Listing 20.2 demonstrates how a Data Driven Query ActiveX script might look.
Listing 20.2
Function Main()
DTSDestination("au_id") = DTSSource("au_id")
DTSDestination("au_lname") = DTSSource("au_lname")
DTSDestination("au_fname") = DTSSource("au_fname")
DTSDestination("phone") = DTSSource("phone")
DTSDestination("address") = DTSSource("address")
DTSDestination("city") = DTSSource("city")
DTSDestination("state") = DTSSource("state")
DTSDestination("zip") = DTSSource("zip")
DTSDestination("contract") = DTSSource("contract")
Select Case DTSSource("state")
Case "CA": Main = DTSTransformStat_InsertQuery
Case "OR": Main = DTSTransformStat_UpdateQuery
Case "KS": Main = DTSTransformStat_DeleteQuery
Case Else: Main = DTSTransformStat_SkipRow
End Select
End Function
Here, we branch based on the value of the state column in the source data. For
California customers, we always do an insert. For Oregon customers, we always do
an update. For Kansas customers, we always do a delete. And for everyone else, we
do nothing; we skip the source row. As I said, the insert, update, delete, and select
query placeholders are just that: placeholders. The actual queries you execute could
do anything you please; for example, they might all do inserts into the destination,
albeit in different ways. You can use the four selections you have inside the Data
Driven Query task to divide the queries you need to run into up to four groups. And,
of course, the SQL code or stored procedure you call can branch further based on
values it receives from the source data.
You can explore the Data Driven Query task by loading the
DataDrivenQueryExample.DTS package from the CH20 folder on the CD
accompanying this book. Again, collecting a Profiler trace while the package runs
can be a handy way to see what's going on behind the scenes on the server as each
row in the source data is transformed.
ActiveX Transformations
Earlier, I mentioned that DTS provides several mechanisms for employing ActiveX
script code to enhance package functionality. One of the ways you can use ActiveX
code to enhance the functionality of a DTS package is through ActiveX
transformations. ActiveX transformations give you complete control over the
mapping of source columns to destination columns in a data transformation. You can
concatenate multiple source columns into a single destination column, and you can
break a single source column into multiple destination columns. You can assign a
destination column using no source columns at all (usually the source data comes
from global variables or lookups in this scenario); you can transform a source column
without using a destination column (usually a global variable is the target of the
transformation in this case); and you can perform transformations where you have
neither source nor destination columns (e.g., to manipulate global variables or
external objects via the FileSystemObject or ADO).
This is best understood by way of example. The sample ActiveX script in Listing 20.3
combines the address, city, state, and zip columns from the pubs..authors table into
a single destination column.
Listing 20.3
Function Main()
DTSDestination("address") = DTSSource("address") & " " _
& DTSSource("city") _
& ", "+DTSSource("state") & " " &DTSSource("zip")
Main = DTSTransformStat_OK
End Function
Note the use of the built-in DTSSource and DTSDestination objects. Since an ActiveX
transformation script is executed for each row in the source data, DTSSource will
always refer to whatever the current row in the source is, and DTSDestination will
always refer to the current row in the destination. You can find the source to this
package in the file ActiveXTransformationExample.DTS in the CH20 folder on the CD
accompanying this book.
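As a complementary sketch of the "no destination column" case mentioned above, a transformation script can simply accumulate state in a global variable as rows pass through the data pump. The global variable name nRowsSeen here is hypothetical; it would need to exist in the package before the transformation runs.

' Hypothetical sketch: count source rows in a global variable rather than
' assigning a destination column
Function Main()
    DTSGlobalVariables("nRowsSeen").Value = _
        DTSGlobalVariables("nRowsSeen").Value + 1
    Main = DTSTransformStat_OK
End Function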
Other Types of Transformations
I said at the beginning of the chapter that I wouldn't bore you with every single DTS
feature and usage detail. You can glean most of these yourself by reading through
Books Online and by building a few packages. That said, a few of the more esoteric
features bear mentioning. One is the WriteFile transformation. A WriteFile
transformation allows you to take two source columns, one supplying a file name
and the other supplying the file's data, and transform them into an external text file
for each row in the source table. The text file can be written in ANSI, UNICODE, or
OEM format. The WriteFileExample.DTS package in the CH20 folder on the CD
accompanying this book demonstrates how to use a WriteFile task to transform the
pubs..pub_info table into a series of text files.
Similarly, a Read File task can be used to load the contents of a series of text files on
disk into a destination column. In this scenario, you have just one source and one
destination column. The source column specifies the name of the file to load, and
the destination column specifies the target column for the file's contents. The
ReadFileExample.DTS package in the CH20 folder on the CD accompanying this book
demonstrates how to use a Read File task to load the text files created by the
WriteFileExample.DTS package into a copy of the pubs..pub_info table.
In both examples, we use the pub_id column of the pub_info table to name the input
or output file. Since pub_id is the primary key for the table, this ensures that we
don't run into file name collisions in the file system.
Lookup Queries
Although I don't recommend you use lookup queries extensively, I feel obligated to
cover them and explain how they work. Essentially, a lookup query is a
parameterized subquery that you can set up to be called to look up data values. (A
lookup query can actually do more than look up data values; it can perform other
types of work for each value in the source data.) Using a lookup query is similar to
opening a cursor on a table in T-SQL and calling a stored procedure or executing a
subquery for each row in the cursor. You usually execute a lookup query during a
transformation of some type. A lookup query can be a stored procedure call or a
plain SQL query. A typical lookup query might resemble the following:
SELECT phone
FROM authors
WHERE (au_id = ?)
You set up lookup queries on the Lookups tab of your transformation task. If the task
supports lookup queries, you'll see a Lookups tab in the dialog. Each lookup query
has a name, a source connection, a cache setting, and a query associated with it.
The cache setting allows you to configure the number of values returned by the
lookup that are cached for reuse. This is especially useful when you are transforming
a relatively large number of rows and the number of rows in the lookup table is
relatively small.
You reference lookup queries by using the DTSLookups global function. A typical
reference to a lookup query in an ActiveX transformation might look like this:
DTSDestination("phone") =
DTSLookups("phone").Execute(DTSSource("au_id"))
Here, we pass the DTSLookups function the name of the lookup we want, then call
the Execute method on that lookup object, passing it the parameter required by its
parameterized query.
It's possible for a lookup query to return zero rows. When that happens, the Execute
method will return an empty variant. You can test for this in your script using the
VBScript IsEmpty function, as shown in Listing 20.4.
Listing 20.4
Dim Phone
Phone = DTSLookups("phone").Execute(DTSSource("au_id"))
If IsEmpty(Phone) Then
DTSDestination("phone")="None"
Else
DTSDestination("phone")=Phone
End If
It's also possible for a lookup query to return multiple rows. While you only have
access to the first row returned, you can use an ORDER BY clause in your lookup
query to ensure that the first row returned is the one you want. You can also detect
when multiple rows are returned by inspecting the LastRowCount property of the
lookup object returned by DTSLookups, as shown in Listing 20.5.
Listing 20.5
Dim c
c=DTSLookups("phone").LastRowCount
If c > 1 Then
MsgBox "Warning: " & c & " lookup matches found"
End If
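Putting the last two fragments together, a complete transformation function that uses the phone lookup might look something like the sketch below. This is just one way to combine them; the warning message is purely illustrative.

' Sketch: use the "phone" lookup inside a full transformation, handling
' both the empty-result and the multiple-row cases
Function Main()
    Dim Phone
    Phone = DTSLookups("phone").Execute(DTSSource("au_id"))
    If IsEmpty(Phone) Then
        DTSDestination("phone") = "None"
    Else
        If DTSLookups("phone").LastRowCount > 1 Then
            MsgBox "Warning: multiple phone rows for " & DTSSource("au_id")
        End If
        DTSDestination("phone") = Phone
    End If
    DTSDestination("au_id") = DTSSource("au_id")
    Main = DTSTransformStat_OK
End Function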
You can explore lookup queries further by loading into the DTS Designer the
LookupQueryExample.DTS package in the CH20 folder on the CD accompanying this
book. Double-click the Transform Data task and check out its Lookups page. In this
example, we copy the authors listed in the pubs..titleauthor table to a new table in
the Northwind database. We take the au_id column from titleauthor and pass it to a
series of lookup queries to retrieve various columns from the authors table. No
columns from titleauthor are actually copied to the destination except for the au_id
column.
Workflow Properties
You can control the behavior of each part of a package's workflow by adjusting its
workflow properties. I won't go through all of these, but some of them are very
handy to know about, especially when building more complex packages. You bring
up the Workflow Properties dialog by right-clicking a package step and selecting
Workflow Properties from the popup menu.
The Close connection on completion option instructs DTS to close the connection
associated with a task after it has completed. This is useful in the following
circumstances.
• There are a limited number of connections available on a data source and you need to be conservative with them.
• You need to free the locks an open connection is holding on resources (e.g., local disk files).
• You need to refresh the metadata the connection caches when it's opened.
By default, DTS is multithreaded and will execute tasks in parallel. You can control
the maximum number of tasks executed in parallel via the Package Properties
dialog. Some COM components, however, are not free-threaded and do not support
parallel execution of tasks (e.g., custom tasks created with VB are exclusively
apartment-threaded, as are some OLE DB providers). Although the DTS Designer will
automatically set this option for you for tasks that it detects require it, you must set
it yourself for associated tasks: tasks that have a dependency on the apartment-
threaded task must also run on the main package thread (e.g., an ActiveX Script
task or a Dynamic Properties task that makes changes to the apartment-threaded
task, or an Execute Package task that executes a package containing an apartment-
threaded task). If you run into a situation where you must use such a component in a
DTS package, you can enable the Execute on main package thread workflow option
to ensure that you can access the component safely.
Each thread DTS creates to execute tasks has a Normal thread priority by default.
You can adjust this for a particular task in order to increase or lessen its priority in
relation to the other tasks in the package. This causes DTS to call the Win32 API
SetThreadPriority to adjust the thread's priority in relation to the other worker
threads. (Recall that we discussed SetThreadPriority in Chapter 3.) Normally, you
shouldn't need to adjust a task's priority, but this ability can come in handy when
you have CPU-intensive tasks that you need to execute as quickly as possible, even
at the expense of other work in the package.
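The options in the Workflow Properties dialog also surface as properties of the Step objects in the DTS object model, so they can be adjusted from script, for example from an ActiveX Script task that runs before the step in question. The sketch below is speculative: it assumes the property names CloseConnection, ExecuteInMainThread, and RelativePriority (and the DTSStepRelativePriority_Highest constant), and the step name is hypothetical.

' Speculative sketch: set workflow options on a later step from an ActiveX
' Script task instead of the Workflow Properties dialog
Function Main()
    Dim oStep
    Set oStep = DTSGlobalVariables.Parent.Steps("DTSStep_DTSDataPumpTask_1")
    oStep.CloseConnection = True        ' Close connection on completion
    oStep.ExecuteInMainThread = True    ' Execute on main package thread
    oStep.RelativePriority = DTSStepRelativePriority_Highest
    Main = DTSTaskExecResult_Success
End Function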
DTS and Transactions
Normally, each modification within a DTS package is committed as it's made. The
Use transactions package property controls whether the package initiates a
transaction when executed. And although this property defaults to true, you must
also enable the Join transaction if present workflow property for a step in order to
actually queue changes to a transaction. Since this workflow property is false by
default, changes made within a package are normally committed as soon as they're
made.
In other words, queuing changes to a package transaction requires two things:
• Enabling the Use transactions package property (which is true by default)
• Enabling the Join transaction if present workflow property for each step that you want to participate in the transaction
In order for the transaction to start successfully, the following conditions must be
true.
• The data sources involved must support transactions (e.g., SQL Server and Oracle support transactions; a dBase file does not).
• The steps you are attempting to enlist in the transaction must be supported task types (e.g., the Bulk Insert task is supported; the Execute Process task is not).
Note that because configuring a package step to join a transaction causes its
associated connections to be enlisted in the transaction, any other tasks that use
those same connections will also be enlisted in the transaction, even if they do not
have the Join transaction if present option enabled. To avoid this, use separate
connections for tasks that you want to enlist in transactions and those that you
don't.
You can control whether a package step commits the current transaction on success
or rolls it back on failure through the Commit transaction on successful completion of
this step and Rollback transaction on failure workflow options, respectively. You can
also control whether successfully executing the package as a whole commits the
open transaction via the Commit on successful package completion package
property.
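These transaction-related options are likewise exposed as Step properties in the object model. Here's a hedged sketch that assumes the property names JoinTransactionIfPresent, CommitSuccess, and RollbackFailure correspond to the dialog options just described; the step name is again hypothetical.

' Speculative sketch: enlist a later step in the package transaction from
' an ActiveX Script task
Function Main()
    Dim oStep
    Set oStep = DTSGlobalVariables.Parent.Steps("DTSStep_DTSExecuteSQLTask_1")
    oStep.JoinTransactionIfPresent = True   ' Join transaction if present
    oStep.CommitSuccess = True              ' Commit on successful completion
    oStep.RollbackFailure = True            ' Rollback on failure
    Main = DTSTaskExecResult_Success
End Function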
A package you execute from within another package via an Execute Package task
inherits its calling package's transactional context if the Execute Package task that
called it has joined its parent package's transaction. This changes the transactional
semantics within the child package considerably. For one thing, no package
transaction is initiated, even if the child package has enabled the Use transactions
option and has steps with the Join transaction if present option enabled. For another,
the Commit transaction on successful completion of this step and Commit on
successful package completion options are ignored. No commit is performed within
the child package, regardless of these settings. Note, however, that a rollback within
a child package will roll back the parent package's transaction.
Controlling Package Workflow through Scripting
The easiest way to implement looping behavior within a DTS package is to put the
steps that you want to execute repetitively in a separate package and use a simple
ActiveX script to execute it as many times as necessary. Listing 20.6 shows an
example of an ActiveX Script task that loops a set number of times, executing a
second package with each iteration. (You can find this code in the outer.dts sample
package in the CH20 folder on the CD accompanying this book.)
Listing 20.6
Function Main()
Dim oPkg
Set oPkg=CreateObject("DTS.Package")
oPkg.LoadFromStorageFile "inner.dts", ""
For x=1 TO 5
oPkg.Execute
Next
Main = DTSTaskExecResult_Success
End Function
Note that because this code doesn't rely on the DTS runtime environment, you could
run it as a standalone VBScript (minus the function result, of course). It simply
creates a DTS package object, loads a package from a structured storage file, then
executes it five times. We don't explicitly free the object (by setting oPkg to Nothing)
because the script runtime environment will do that for us. See the VBScript file,
loop.vbs, in the CH20 folder on the CD accompanying this book for an example of a
standalone VBScript that executes a package repetitively.
Another way to control package workflow from within an ActiveX script and to effect
a type of looping behavior is to set the ExecutionStatus of an already executed step
back to DTSStepExecStat_Waiting. This will cause the step to execute again,
producing a type of repetitive loop within the package. Since you can decide
whether to change the ExecutionStatus property based on the logical condition of
your choosing, you have all the tools you need to construct a logical loop, just as you
might typically build in a more traditional programming environment.
There are a couple of distinct approaches for using ExecutionStatus to implement a
loop within a DTS package. The easiest and most flexible is to associate an ActiveX
script with a package step via the Workflow Properties dialog. We use the Steps
collection of the current package to locate the step to repeat, then set its
ExecutionStatus to DTSStepExecStat_Waiting.
This approach is better than the approach that we'll discuss in a moment because it
doesn't require ActiveX Script tasks to be set aside for the sole purpose of
implementing looping behavior. By piggybacking an ActiveX script onto an already
existing package step through a Workflow Properties association, we alleviate the
need for separate components just to set up our loop, and we allow any step to be
the loop initializer, worker, or controller.
Another advantage of this approach is that we can control whether the step to which
we've associated our ActiveX script executes based on the looping condition. For
example, we might not want the step to execute until the loop completes. Because
we're associating an ActiveX script with the step's workflow, we can return
DTSStepScriptResult_DontExecuteTask from the ActiveX script to prevent the step
from executing.
Listing 20.7 offers an example of a DTS package loop implemented via ActiveX script
workflow associations. (You can find this on the CD accompanying this book in the
file CH20\LoopExample.dts; see the Loop1 example code.)
Listing 20.7
Step 1:
'*****************************************************************
' Initialize the loop control variable
'*****************************************************************
Function Main()
DTSGlobalVariables("foo").Value=0
MsgBox "Loop1: Initialize"
Main = DTSTaskExecResult_Success
End Function
Step 2:
'*****************************************************************
' Perform loop work
'*****************************************************************
Function Main()
MsgBox "Loop1: Work"
Main = DTSTaskExecResult_Success
End Function
Step 3:
'*****************************************************************
' Check the loop variable and repeat the step as appropriate
'*****************************************************************
Function Main()
Dim oPkg
DTSGlobalVariables("foo").Value = _
DTSGlobalVariables("foo").Value + 1
If DTSGlobalVariables("foo").Value < 5 Then
Set oPkg = DTSGlobalVariables.Parent
oPkg.Steps("DTSStep_DTSActiveScriptTask_6").ExecutionStatus = _
DTSStepExecStat_Waiting
Main = DTSStepScriptResult_DontExecuteTask
Else
Main = DTSStepScriptResult_ExecuteTask
End If
End Function
As I've said, there are multiple ways to use ExecutionStatus to construct a looping
mechanism in a DTS package. Another method for doing this makes use of ActiveX
Script tasks to implement the looping code. This technique works as described
below.
1. You set up a package with a step that you want to execute repeatedly.
2. You link an ActiveX Script task step to this step using a precedence constraint.
This constraint can be any of the three stock workflow constraints: On Success,
On Completion, or On Failure, depending on what you're actually trying to
accomplish. You'd likely use the first two when implementing a basic loop; you
might use an On Failure constraint when implementing retry logic.
3. In the ActiveX script code, you check a logical condition that determines
whether to repeat the initial step (assuming you don't want to loop indefinitely)
and set the ExecutionStatus of the step to DTSStepExecStat_Waiting if so.
This is best understood by seeing the code itself, so Listing 20.8 presents the
ActiveX source code from a package that demonstrates this technique. (You can find
this on the CD accompanying this book in the file CH20\LoopExample.dts; see the
Loop2 example code.)
Listing 20.8
Step 1:
'*****************************************************************
' Initialize the loop control variable
'*****************************************************************
Function Main()
DTSGlobalVariables("bar").Value=0
MsgBox "Loop2: Initialize"
Main = DTSTaskExecResult_Success
End Function
Step 2:
'*****************************************************************
' Perform loop work
'*****************************************************************
Function Main()
MsgBox "Loop2: Work"
Main = DTSTaskExecResult_Success
End Function
Step 3:
'*****************************************************************
' Check the loop variable and repeat the step as appropriate
'*****************************************************************
Function Main()
DTSGlobalVariables("bar").Value=DTSGlobalVariables("bar").Value+1
If DTSGlobalVariables("bar").Value<5 Then
MsgBox "Loop2: " & DTSGlobalVariables("bar").Value
Dim oPkg
Set oPkg = DTSGlobalVariables.Parent
oPkg.Steps("DTSStep_DTSActiveScriptTask_1").ExecutionStatus _
= DTSStepExecStat_Waiting
End If
Main = DTSTaskExecResult_Success
End Function
Here, we have a package comprised of three steps. In the first step, we use an
ActiveX script to initialize our loop control variable, a global variable named bar. In
step 2, we carry out the work that we want to execute repeatedly. In step 3, we
increment our control variable and check to see whether it's less than 5, and, if so,
change the ExecutionStatus of the step 2 ActiveX task to DTSStepExecStat_Waiting.
This will cause the second step to execute again, which will lead back to step 3, at
which time we'll again increment and evaluate the loop control variable to determine
whether to continue the loop or proceed with the rest of the package.
Some people prefer this approach to the first one I presented because the ActiveX
code that implements the loop is more readily accessible in the DTS Designer. With
the first approach, this code is somewhat hidden through a workflow properties
association.
As you've probably already surmised, you can also use ActiveX script associations to
implement conditional execution. In the first ExecutionStatus looping example
above, we return DTSStepScriptResult_DontExecuteTask from the ActiveX script
associated with a step's workflow when we don't want the step to execute and
DTSStepScriptResult_ExecuteTask when we do. You can use this same technique to
conditionally execute a step based on any criteria you can evaluate from an ActiveX
script.
Note that you can also use ActiveX script associations to implement retry logic. In
addition to returning DTSStepScriptResult_ExecuteTask and
DTSStepScriptResult_DontExecuteTask from a workflow ActiveX script, you can also
return DTSStepScriptResult_RetryLater in order to cause the step to be reexecuted at
some later point during package execution.
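For instance, a workflow script associated with a step might poll for a prerequisite and ask DTS to come back to the step later if it isn't satisfied yet. The file name in this sketch is hypothetical.

' Hypothetical sketch: retry the step later if an expected input file
' hasn't arrived yet
Function Main()
    Dim oFSO
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    If oFSO.FileExists("D:\incoming\daily_feed.txt") Then
        Main = DTSStepScriptResult_ExecuteTask
    Else
        Main = DTSStepScriptResult_RetryLater
    End If
End Function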
You can use a variation of the ActiveX Script task looping technique to implement
conditional execution of package steps. Say, for example, that when a given
condition is true, you want to execute a specified series of package steps, otherwise
you want to skip them. One way to do this is to check for the condition in an ActiveX
task that precedes the series in question and set the ExecutionStatus of the first
step in the series to DTSStepExecStat_Inactive if you do not want to execute it.
Deactivating a step has the effect of causing it and the steps that follow it not to
execute. Listing 20.9 presents some VBScript code to illustrate.
Listing 20.9
Function Main()
Dim oPkg
Set oPkg = DTSGlobalVariables.Parent
If DTSGlobalVariables("bFoo") Then
oPkg.Steps("DTSStep_DTSCreateProcessTask_1") _
.ExecutionStatus = DTSStepExecStat_Waiting
Else
oPkg.Steps("DTSStep_DTSCreateProcessTask_1") _
.ExecutionStatus = DTSStepExecStat_Inactive
End If
Main = DTSTaskExecResult_Success
End Function
Another way to conditionally execute a series of steps is to link them to an ActiveX Script task via an On Success precedence constraint and have that task succeed or fail based on the condition you're checking, as in Listing 20.10.
Listing 20.10
Function Main()
If DTSGlobalVariables("bFoo") Then
Main = DTSTaskExecResult_Success
Else
Main = DTSTaskExecResult_Failure
End If
End Function
Here, we check a global variable and return a task result based on its value. This
technique isn't as clean as the other techniques because it can cause errors to be
reported by the DTS package execution engine. Of course, you could ignore these
error messages, but by using one of the other techniques instead you avoid the error
messages altogether.
Parameterized DTS Packages
A natural thing to want to do with a DTS package is to parameterize it so that, for
example, it can work with different data sources/destinations than the ones it
referenced when it was originally built. Things like global variable values, connection
properties, object names, and the like are natural targets for parameterization.
There are a couple of ways to parameterize DTS packages. The first is to use a
Dynamic Properties task. A Dynamic Properties task allows you to set the value of
any property in a package from one of six sources:
1. An .ini file
2. A global variable
3. An environment variable
4. A T-SQL query (only the first column of the first row in the query result set is
used)
5. A data file
6. A constant
Given that you can supply values for global variables when you execute a package
(e.g., using the dtsrunui utility or an ExecutePackage task within another package), a
common practice is to assign global variables to package properties using a
Dynamic Properties task, then assign those variables' values at runtime. Using
global variables in this way provides a layer of abstraction between the mechanism
supplying the parameter values and the dynamic method used to assign property
values within the package. This allows you to easily change the method of executing
the package without losing the dynamic nature of the package.
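For example, a small standalone VBScript, in the spirit of the loop.vbs example mentioned earlier, could assign the global variables from Automation code before executing the package. The variable names below are hypothetical placeholders for whatever the package actually defines.

' Sketch: assign package global variables from Automation code, then execute
Dim oPkg
Set oPkg = CreateObject("DTS.Package")
oPkg.LoadFromStorageFile "ParamExample.dts", ""
oPkg.GlobalVariables("SourceServer").Value = "MYSERVER"   ' hypothetical names
oPkg.GlobalVariables("SourceDatabase").Value = "pubs"
oPkg.Execute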
Another way to parameterize a DTS package is via ActiveX script code. It's often less
trouble to modify package properties using script code than it is to set up a Dynamic
Properties task. Also, understand that you can't insert global variables into the
middle of a property using a Dynamic Properties task: either you assign the whole
property or you don't assign any of it. Using script code or Automation code from
outside the package, you have more control over changing property values at
runtime and can use global variables to help you do so in any way you see fit.
You can see the Dynamic Properties approach at work in the ParamExample.DTS package in the CH20 folder on the CD accompanying this book.
1. Load ParamExample.DTS into the DTS Designer and double-click its Dynamic Properties task to display the Dynamic Properties dialog.
2. Double-click each of the entries in the Change list to see what properties they're being used to assign. What you should see is that global variables are being used to assign the key connection properties for the TransferObject task.
3. Exit the Dynamic Properties dialog and right-click the DTS Designer canvas.
Select Package Properties from the context menu.
4. Click the Global Variables tab in the property dialog. There you should see six
global variables defined: one each for the source server, database, and table
and the destination server, database, and table.
5. Start the dtsrunui utility. In the DTS Run dialog, change the Location field to
Structured Storage File, then select the ParamExample.DTS package that you
previously viewed in the DTS Designer by clicking the ellipsis button to the
right of the File name: text box.
6. Key ParamExample for the package name, then click the Advanced button to
display a dialog that will allow you to set global variable values for the package
before executing it.
7. Configure values for the source server, database, and table as well as the
destination server, database, and table. The pubs and Northwind databases
are fine for use here.
8. Click OK to close the Advanced DTS Run dialog, then click the Run button to
run the package with your parameterized values. You should see the package
run and copy the specified object from the source to the destination you
specified.
The DSO Rowset Provider
Another one of the cooler things you can do with a DTS package is query it with an
OLE DB provider. You can flag a step of a package as a DSO rowset provider, then
query the package from T-SQL using OPENROWSET and the DTSPackageDSO OLE DB
provider. This allows you, for example, to expose the result of a transformation as a
rowset that can be queried via T-SQL. This means, of course, that one package can
serve as the data source for another package because you can obviously set up T-
SQL queries as the data source in a transformation within a package. It also means
that you can offload complex data processing to a DTS package, then invoke the
transformation from within a T-SQL query or stored procedure. Here's an example of
a T-SQL query that retrieves the results of a transformation task as a result set:
SELECT *
FROM OPENROWSET('DTSPackageDSO', '/FD:\CH20\DSORowsetExample.dts',
'SELECT * ')
The two parameters the DTSPackageDSO provider supports are the dtsrun-like
command line parameters and the query text. You can specify dtsrun command line
parameters to identify the source and location of the package. In the example
above, I'm referencing a package stored as a COM structured storage file.
Note the absence of anything resembling a table name in the query text passed into
OPENROWSET. This is because the referenced package has just one step that's been
flagged as a DSO rowset provider. If multiple steps are flagged this way, specify the
step name in the query text to identify the one that you want to query, like this:
SELECT *
FROM OPENROWSET('DTSPackageDSO',
'/FD:\CH20\DSORowsetExample.dts',
'SELECT * FROM DTSStep_DTSDataPumpTask_1')
Note that a step that's been flagged as a DSO rowset provider won't actually execute
when you run the package. It's reserved strictly for providing data and is ignored by
the package execution engine. You can experiment with a DSO rowset provider task
by querying the DSORowsetExample.DTS package (in the CH20 folder on the CD
accompanying this book) from Query Analyzer using the sample T-SQL script,
DSORowsetExample.SQL. You will likely want to first load the package into the DTS
Designer so that you can configure connection properties and so forth.
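You normally flag a step as a DSO rowset provider from the Workflow Properties dialog, but if, as I'm assuming here, that option maps to a Step property named IsPackageDSORowset, you could also toggle it from Automation code, as in this rough sketch.

' Speculative sketch: flag a step as a DSO rowset provider and save the package.
' Assumes the IsPackageDSORowset step property exists; the file and step names
' come from the DSORowsetExample package discussed above.
Dim oPkg
Set oPkg = CreateObject("DTS.Package")
oPkg.LoadFromStorageFile "D:\CH20\DSORowsetExample.dts", ""
oPkg.Steps("DTSStep_DTSDataPumpTask_1").IsPackageDSORowset = True
oPkg.SaveToStorageFile "D:\CH20\DSORowsetExample.dts"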
Using DTS to Transform Replication Subscriptions
We'll discuss SQL Server's replication facilities in more detail in Chapters 21 through
23. For now, let's explore how you can use a DTS package to transform published
data as it is provided to subscribers. You're probably familiar with the basic concept
of a replication publisher: a server or machine that provides data to subscribers, the
data consumers in a replication scenario. In addition to being able to use T-SQL to
manipulate data as it's being published to subscribers, you can also create DTS
packages that perform sophisticated transformations of the data en route. The
package used to transform a subscription can reside at the distributor or at
individual subscribers and can be used to partition as well as transform published
data.
3. Click Yes on the Transform Published Data page in the Create Publication
wizard.
4. Finish creating the publication by selecting the articles, snapshot options, and
so forth in the Create Publication wizard.
2. You will be prompted for the name of the publication to transform. Accept the
default and click Next.
3. You'll next select a destination for the transformation. If you intend to have
more than one subscriber for the publication, configure a destination that is
representative of your intended subscribers as a whole, then click Next.
4. Define the transformations for your data in the next dialog. Click the ellipsis
button to the right of each article to display the Column Mappings and
Transformations dialog. Here, you can make simple one-to-one column
mapping assignments, or you can define sophisticated ActiveX
transformations.
5. You'll next configure where the DTS package will be physically located. You can
locate the package at the distributor or on the subscriber.
6. You finish by naming the package. The package will be stored in
msdb..sysdtspackages on the server you chose in the previous step. You can't
store subscription transformation packages in COM Structured Storage File
format or in the Meta Data Services repository.
When you subscribe to a transformable subscription, you will be prompted for the
name of the package to use to transform the subscription. Note that you can have
multiple transform packages for each publication. You can use this ability to
transform and partition data differently for each subscriber.
How It Works
The DTS package created by the Transform Published Data wizard consists of at least
four objects: for each article, a Connection object, an Execute SQL task, and a Data
Driven Query task; plus a Connection object used by all the articles to provide data
to the subscriber. The Execute SQL task selects the rows from the source article in
order to provide data to the Data Driven Query task. A Data Driven Query task is
always used in lieu of a Transform Data or Bulk Insert task when creating
transformable subscriptions. It can perform straight column-to-column copies or
more sophisticated transformations using ActiveX script code that you supply, as we
just discussed.
You can open transformation packages in the DTS Designer just as you would any
other local SQL Server package. To do so, connect to the distributor or subscriber
(depending on where you stored the package) from Enterprise Manager, then click
the Local Packages node under Data Transformation Services. In the list on the right,
double-click the package you defined in the Transform Published Data wizard in
order to open it.
In order to retrieve data from the publication, a DTS transformation package will
contain a Connection object that references the SQL Server Replication OLE DB
Provider for DTS for each article in the publication. If you display the Properties
dialog for one of these Connection objects, then click the Properties button, you'll
find the list of columns being published by the article on the All tab of the Data Link
Properties dialog. This provider is designed for the exclusive use of transformable
packages and is not available from the default DTS Designer Connection palette.
Custom Tasks
As I mentioned earlier, DTS is based on an extensible COM model. One of the benefits of this
is that you can create your own custom tasks as COM objects and install them into the DTS
Designer. There are a couple of ways to go about this: You can create a custom task object
from scratch in a language capable of implementing COM interfaces, or you can customize
one of the sample custom task components included with SQL Server. Depending on your
needs, either of these approaches can be a viable method of extending DTS, so I'll show you
how to do it both ways.
All DTS task objects implement the DTS CustomTask interface. Any custom task that you build
will also need to implement this interface. In terms of interface implementation, there's no
difference between built-in tasks and custom tasks that you create; DTS's built-in tasks are
just COM components that, at a minimum, implement the CustomTask interface. A custom
task can also implement other interfaces, such as CustomTaskUI (for components that define
their own user interfaces), but we'll focus on the CustomTask interface in our discussions here
because it is what allows a COM component to serve as a custom DTS task.
In this next exercise, we'll create a custom task in Visual Basic that can execute T-SQL scripts.
You may be wondering why we'd need such a component given the existence of the built-in
Execute SQL task. The reason is simple: While Execute SQL can execute T-SQL scripts, it has
no mechanism for handling large amounts of variable script output. That is, if you execute a
script that returns multiple result sets interlaced with PRINT and RAISERROR statement
output, you have no way to retrieve that output from an Execute SQL task as you can with
tools like Query Analyzer and osql.
In actuality, our new custom task will merely call a script execution utility that is specified via
a property. osql is a good choice here, but you could just as well call ISQL or some other script
execution utility. Let's go ahead and build the component, then we'll talk some more about
what it can do and how it does it.
1. Start Visual Basic (the steps that follow assume VB6; VB.NET should work as well,
though the steps will likely differ some) and start a new ActiveX DLL project from the
New Project dialog box. Although the DTS runtime environment (e.g., dtsrun) can make
use of custom tasks implemented as out-of-process components (i.e., EXEs), a custom
task must be defined as an in-process component if you want to use it with the DTS
Designer.
2. From the Project menu, display the Project Properties dialog box and change the Project
Name to ExecuteSQLScript. Make sure Startup Object is set to "(None)," Threading
Model is set to "Apartment Threaded," and Project Compatibility is selected on the
Component tab, then click OK.
3. VB names your new class Class1 by default. In the Properties Window, change this to
clsExecuteSQLScript. Also make sure that the Instancing property is set to
"5�Multiuse."
4. Click the References option on the Project menu and select Microsoft DTSPackage Object
Library from the list of available references. This will import the DTSPackage COM type
library into your project and make the interfaces it exposes available to you.
5. In the code window, make sure that "(General)" and "(Declarations)" are selected in the
two combo boxes, then type the following into the editor:
Implements DTS.CustomTask
This tells VB that your new component will implement the DTS CustomTask interface
(which is defined in the DTSPackage Object Library). The VB IDE will then make the
methods exposed by the CustomTask interface available to you in the code editor.
6. Change the left combo box in the code editor to reference the CustomTask interface.
The editor should immediately insert an empty CustomTask_Properties Get method.
Although you can implement your own property editor for your new component, setting
this property to Nothing instructs DTS to provide a default property editor for you.
8. In the right combo box, select each of the other methods exposed by the CustomTask
interface. The VB editor will insert empty placeholder methods for each one.
9. Add the following two lines at the top of your source code file:
Private m_bstrName As String
Private m_bstrDescription As String
We'll use these two member variables to cache the new component's name and description.
10. Add the following line to the Let function for the CustomTask_Description property:
m_bstrDescription = RHS
This assigns the RHS parameter (which is passed into the function when the Description
property is assigned) to the member variable we set up earlier.
11. Add the following line to the Get function for the CustomTask_Description property:
CustomTask_Description = m_bstrDescription
12. Set up the Name property similarly to the Description property�code its Let and Get
functions so that they assign and retrieve the m_bstrName member variable we defined
earlier.
13. You'll notice that the property Let and Get methods defined by the CustomTask interface
are private by default. This is because they refer to properties of the base DTS custom
task, not to your custom task. You'll need to define public properties in order for them to
be visible to a DTS package. Create two new methods (shown in Listing 20.11) to
expose the Description property from your custom task.
Listing 20.11
Public Property Get Description() As String
Description = m_bstrDescription
End Property

Public Property Let Description(ByVal NewValue As String)
m_bstrDescription = NewValue
End Property
Don't create similar methods for the Name property because we don't want to allow it to
be changed in the DTS Designer. The Designer will auto-generate a name for the task
when you drop it onto the design sheet. In keeping with the way the Designer was
intended to work, you should not allow the name to be changed, so there's no point in
exposing it via a public property.
14. At this point, you have a complete custom DTS task object and could compile it to a DLL
and register it in the DTS Designer if you liked. The component doesn't yet do anything,
so we won't do that just yet.
15. Add the member variables shown in Listing 20.12 to the top of your source code file.
Listing 20.12
Private m_bstrScriptUtility As String
Private m_bstrServerInstance As String
Private m_bstrAuthString As String
Private m_bstrScriptToExecute As String
Private m_bstrOutputFileName As String
The function of each of these should be pretty obvious, but I'll provide a brief
description for most of them anyway. The m_bstrScriptUtility member will store the
command line to the script utility. As you'll see in just a moment, it supports the use of
replaceable parameters that are actually defined by other properties. The
m_bstrServerInstance member stores the SQL Server name and instance (separated by
a backslash) to which we want to connect. The m_bstrAuthString member stores the
authentication string we want to pass to the script utility. In the case of osql, this might
be "-E" (use a trusted connection) or "-U user -P password" (for SQL Server
authentication). The m_bstrScriptToExecute member stores the name of the script we
want to execute. Note that this could also be a SQL query string if the script utility
supports having a query string passed directly on its command line (as osql does). The
m_bstrOutputFileName member stores the name of the output file the script utility is to
create. If the utility directly supports trapping its output in a file and having the name of
that file passed on the command line, you can use this member to supply it. If the utility
does not directly support an output file name but writes its output to the console, you
can change the script utility to execute the command processor (e.g., CMD.EXE on the
Windows NT family of operating systems) and use redirection to route the output to this
file. You have to execute the script utility via the command processor in this situation
because the command processor is the facility by which console redirection occurs. See
the CustomTaskVB_TimeoutExample.DTS package in the CH20\CustomTaskVB subfolder
on the CD accompanying this book for an example of this technique.
16. Set up public Let and Get functions for each of these, being careful to use the proper
data types as you do. Feel free to copy and paste some of the existing Let and Get
methods in order to speed this up.
17. In this next step, we'll define the custom task's Execute method and the plumbing it
requires. Because we need to be able to shell to a script utility and pause execution
until it either completes or times out, we can't use the VB Shell function (which executes
asynchronously) and must instead call the Win32 CreateProcess function. Recall that we
discussed CreateProcess in Chapter 3; it's the Win32 API by which one process can start
another. You use Declare Function to import a function that resides in a DLL (as all
Win32 API functions do) into a VB program. We need to import several of these, so we'll
be adding several Declare Function statements in the code editor. Add the code shown
in Listing 20.13 to the top of your source code module.
Listing 20.13
18. Now that we've imported the Win32 functions we need, we're ready to code the Execute
method itself. Add the code shown in Listing 20.14 to the CustomTask_Execute method.
Listing 20.14
Dim pi As PROCESS_INFORMATION
Dim si As STARTUPINFO
Dim lRes As Long
lRes = CreateProcess(bstrNull, _
bstrCmdLine, _
ByVal 0&, _
ByVal 0&, _
0&, _
CREATE_DEFAULT_ERROR_MODE, _
ByVal 0&, _
bstrNull, _
si, _
pi)
If lRes <> 0 Then
Call CloseHandle(pi.hThread)
Call CloseHandle(pi.hProcess)
pTaskResult = DTSTaskExecResult_Success
Else
Dim lLastError As Long
lLastError = Err.LastDllError
pTaskResult = DTSTaskExecResult_Failure
End If
19. This code performs the work of actually executing the script utility. It begins by dimming
the PROCESS_INFORMATION and STARTUPINFO structures required by CreateProcess.
STARTUPINFO supplies parameters to the process creation, and PROCESS_INFORMATION
will be populated with handles and process/thread IDs pertinent to the new process
once it's created.
20. The code next replaces several hard-coded tokens in the script utility command line. In
order to allow you to easily configure the command line parameters for the script
execution utility using DTS global variables, Dynamic Property tasks, Automation code,
and the like, the component is set up to store the major parts of the script utility
command line in separate properties, which it then uses to build the command line it
executes by replacing tokens you specify with their appropriate values when the
Execute method is called. For example, assume you have the following script utility
command line (shown here as a representative osql invocation):
osql %auth_string% -S %server_instance% -i %script% -o %output%
When you execute the step, the Execute method will replace %auth_string% with the
value of m_bstrAuthString, %server_instance% with m_bstrServerInstance, %script%
with m_bstrScriptToExecute, and %output% with m_bstrOutputFileName. Setting up
Execute to work this way allows you to place the parameters wherever you'd like in the
command line and allows them to be easily changed via global variables, Dynamic
Property tasks, and similar mechanisms without forcing you to rebuild the command line
string for the script utility from within the host package (e.g., via an ActiveX script).
21. Execute next calls CreateProcess to start the script utility. It then calls
WaitForSingleObject to wait on the script utility to complete. If the Timeout property is
nonzero, Execute passes a timeout value into WaitForSingleObject so that the wait can
time out; otherwise, Execute waits indefinitely on the script utility to complete.
22. If the TerminateOnTimeout property is set to true and WaitForSingleObject times out
while waiting on the script utility, Execute calls TerminateProcess to terminate the utility.
23. Note the call to the WriteTaskRecord method of the PackageLog object. PackageLog is a
member of the DTS object model and provides a means of writing to a package's log. In
this case, we silently record information that should not cause the task to fail and does
not need to be immediately reported to the user.
24. When CreateProcess fails, it returns 0. We check for this and call the OnError package
event if it occurs. Note the use of the Me identifier to refer to the current task object.
25. Once CreateProcess has been called and has either succeeded or failed, we return the
appropriate step result from the custom task. This allows DTS to detect whether we
successfully ran the script utility.
26. Now that you've finished coding the ExecuteSQLScript task, let's install your new
component into the DTS Designer so that you can use it in a package. Begin by
compiling the component into a DLL file via the File | Make menu option.
27. Next, create a new DTS package by right-clicking the Data Transformation Services node
in Enterprise Manager and selecting New Package. Select the Task | Register Custom
Task menu option from within the DTS Designer. Click the ellipsis next to the Task
location text box and locate your custom task's DLL file.
28. Supply a meaningful description for your component in the Task description text box.
The DTS Designer appends ": undefined" to whatever you supply here when it creates a
default description for a new instance of your task object in a package, so use
something that easily identifies the component. I'm using "Execute SQL Script Task."
29. Click OK to close the Register Custom Task dialog and register your new component with
the DTS Designer. You should see it appear in the task palette.
30. You can now use your new component in a DTS package, so go ahead and drag it onto
the design sheet. You should see displayed the default property dialog that DTS
constructs for you based on the public properties exposed by your component.
31. You can experiment further with your new custom task by loading the
CustomTaskVBExample.DTS and CustomTaskVB_TimeoutExample.DTS packages from
the CH20\CustomTaskVB subfolder on the CD accompanying this book.
CustomTaskVBExample executes a simple script via OSQL.EXE using an
ExecuteSQLScript custom task. CustomTaskVB_TimeoutExample executes a query
string (rather than a script) and times out if the query takes longer than five seconds to
execute. It also executes the script utility via the CMD.EXE command processor and
uses redirection to route the output to the output file.
Another method for creating a custom DTS task is to customize one of the sample tasks that
ship with SQL Server. In the next exercise, I'll walk you through customizing the
DTSSampleTask custom task that's included in the DTS example code that comes with SQL
Server. You can find the code for this task in the …
\80\Tools\DevTools\Samples\dts\CustomTasks\DTSTask subfolder under your SQL Server
installation path. (This is the default location; you may have to unzip the DTS samples in
order to see this folder.) This sample task implements the same essential functionality as the
Execute Process task: You provide a process command line to execute, and the task calls the
Win32 CreateProcess API to execute it, optionally waiting on it to complete and/or timing out
and terminating it as appropriate.
NOTE: DTSSampleTask is a COM component written in C++. Customizing it involves
modifying Interface Definition Language (IDL) as well as C++. If you aren't pretty comfortable
with both of these languages, you should probably skip this exercise.
1. Load the dtstask project (from the DTSTask subfolder, mentioned above) into the Visual
C++ IDE. (The instructions that follow assume VC6, but VC7 should work as well, though
the steps will likely differ some.) You may want to copy the subfolder to a different
location in order to avoid modifying the original sample task code.
2. We'll begin with the dtstask.idl file. This file contains the IDL code that defines the
custom task component's COM interface. The MIDL compiler uses this file to produce
other files that are ultimately compiled along with the component's other source files to
build the custom task DLL.
3. You should see property attribute specifications for the ProcessCommandLine property
near the middle of the file. Delete the four lines that define the ProcessCommandLine
property.
4. Insert the lines shown in Listing 20.15 in place of the ones you deleted and renumber
the properties that follow them accordingly.
Listing 20.15
5. Change the custom task's help string at the bottom of dtstask.idl, which currently reads
"DTS Sample Task: Create process," to "DTS ExecuteScript Task."
6. The next file we'll change is the task.h header file. This file defines the interface to the
class that implements our custom task object. Remove all references to
m_bstrCommandLine and ProcessCommandLine in this file. Be sure to remove the
entirety of the get and put method declarations in IDTSSampleTask for
ProcessCommandLine.
7. In the class constructor, replace the initialization of m_bstrCommandLine with initializers for the three new member variables:
m_bstrScriptExecutionUtility = SysAllocString(L"");
m_bstrScriptToExecute = SysAllocString(L"");
m_bstrOutputFileToCreate = SysAllocString(L"");
8. In the destructor, replace the code that frees m_bstrCommandLine with code that frees the new members:
if (m_bstrScriptExecutionUtility)
SysFreeString(m_bstrScriptExecutionUtility);
if (m_bstrScriptToExecute)
SysFreeString(m_bstrScriptToExecute);
if (m_bstrOutputFileToCreate)
SysFreeString(m_bstrOutputFileToCreate);
9. Add the get and put method declarations shown in Listing 20.16 for the three new properties.
Listing 20.16
STDMETHOD(get_ScriptExecutionUtility)(
/* [retval][out] */ BSTR *pRetVal);
STDMETHOD(put_ScriptExecutionUtility)(
/* [in] */ BSTR NewValue);
STDMETHOD(get_ScriptToExecute)(
/* [retval][out] */ BSTR *pRetVal);
STDMETHOD(put_ScriptToExecute)(
/* [in] */ BSTR NewValue);
STDMETHOD(get_OutputFileToCreate)(
/* [retval][out] */ BSTR *pRetVal);
STDMETHOD(put_OutputFileToCreate)(
/* [in] */ BSTR NewValue);
10. Change the private BSTR member declarations in IDTSSampleTask to look like this:
BSTR m_bstrScriptExecutionUtility;
BSTR m_bstrScriptToExecute;
BSTR m_bstrOutputFileToCreate;
This will define the private member variables we'll use to cache the values of our new properties.
11. The last file we'll modify is task.cpp, the module responsible for implementing our
custom task. Find the get and put methods for ProcessCommandLine and remove them.
In their place, add the code shown in Listing 20.17.
Listing 20.17
STDMETHODIMP CTask::get_ScriptExecutionUtility(
/* [retval][out] */ BSTR *pRetVal)
{
if (!pRetVal)
return E_POINTER;
*pRetVal = SysAllocString(m_bstrScriptExecutionUtility);
if (!*pRetVal)
return E_OUTOFMEMORY;
return NOERROR;
}
STDMETHODIMP CTask::put_ScriptExecutionUtility(
/* [in] */ BSTR NewValue)
{
if (m_bstrScriptExecutionUtility)
SysFreeString(m_bstrScriptExecutionUtility);
m_bstrScriptExecutionUtility = SysAllocString(NewValue);
if (!m_bstrScriptExecutionUtility)
return E_OUTOFMEMORY;
return NOERROR;
}
STDMETHODIMP CTask::get_ScriptToExecute(
/* [retval][out] */ BSTR *pRetVal)
{
if (!pRetVal)
return E_POINTER;
*pRetVal = SysAllocString(m_bstrScriptToExecute);
if (!*pRetVal)
return E_OUTOFMEMORY;
return NOERROR;
}
STDMETHODIMP CTask::put_ScriptToExecute(
/* [in] */ BSTR NewValue)
{
if (m_bstrScriptToExecute)
SysFreeString(m_bstrScriptToExecute);
m_bstrScriptToExecute = SysAllocString(NewValue);
if (!m_bstrScriptToExecute)
return E_OUTOFMEMORY;
return NOERROR;
}
STDMETHODIMP CTask::get_OutputFileToCreate(
/* [retval][out] */ BSTR *pRetVal)
{
if (!pRetVal)
return E_POINTER;
*pRetVal = SysAllocString(m_bstrOutputFileToCreate);
if (!*pRetVal)
return E_OUTOFMEMORY;
return NOERROR;
}
STDMETHODIMP CTask::put_OutputFileToCreate(
/* [in] */ BSTR NewValue)
{
if (m_bstrOutputFileToCreate)
SysFreeString(m_bstrOutputFileToCreate);
m_bstrOutputFileToCreate = SysAllocString(NewValue);
if (!m_bstrOutputFileToCreate)
return E_OUTOFMEMORY;
return NOERROR;
}
These methods will provide the plumbing necessary to set and get values for our new
properties.
12. The last thing we need to do to customize this task for our own use is modify the
Execute method to call our script execution utility. Without belaboring this, I'll just
present the code here, and you can read through it to see how it works. The main
difference between it and the original Execute method is that it replaces predefined
string tokens in the script utility command line with the values of m_bstrScriptToExecute
and m_bstrOutputFileToCreate as appropriate. As with the CustomTaskVB example, this
provides flexibility in setting up the script utility command line and allows you to supply
the name of the script to execute and the output file name to create using global
variables, Dynamic Properties tasks, and similar package facilities. It also fixes a bug in
the original sample task code that ships with SQL Server that caused two handles (a
thread handle and a process handle) to be leaked with each call to Execute. Replace the
original Execute method with the one provided in Listing 20.18.
Listing 20.18
STDMETHODIMP CTask::Execute(
/* [in] */ IDispatch *pPackage,
/* [in] */ IDispatch *pPackageEvents,
/* [in] */ IDispatch *pPackageLog,
/* [out][in] */ LONG *pTaskResult)
{
//**************************************************************
//NOTE: This sample does not properly SetErrorInfo. You should
//implement ISupportErrorInfo
//and properly call SetErrorInfo on failures.
//**************************************************************
USES_CONVERSION;
HRESULT hr = NOERROR;
PROCESS_INFORMATION procInfo;
STARTUPINFO startupInfo;
DWORD dwTimeout;
LPTSTR szScriptExecutionUtility;
LPTSTR szScriptToExecute;
LPTSTR szOutputFileToCreate;
TCHAR szCmd[0x1000];
TCHAR szCmd2[0x1000];
const TCHAR *SCRIPT = _T("%script%");
const TCHAR *OUTPUT = _T("%output%");
memset(&startupInfo, 0, sizeof(startupInfo));
memset(&procInfo, 0, sizeof(procInfo));
startupInfo.cb = sizeof(STARTUPINFO);
szScriptExecutionUtility = OLE2T(m_bstrScriptExecutionUtility);
szScriptToExecute = OLE2T(m_bstrScriptToExecute);
szOutputFileToCreate = OLE2T(m_bstrOutputFileToCreate);
_tcscpy(szCmd,szCmd2);
}
else if ((q) && (p) && (p>q)) { //Got script and output,
//output first
//Replace output token
_tcsncpy(szCmd,szScriptExecutionUtility,
q-szScriptExecutionUtility);
szCmd[q-szScriptExecutionUtility]=_T('\0');
_tcscat(szCmd,szOutputFileToCreate);
_tcscat(szCmd,q+_tcslen(OUTPUT));
_tcscpy(szCmd,szCmd2);
}
else if ((q) && (!p)) { //Got output and no script
//Replace output token
_tcsncpy(szCmd,szScriptExecutionUtility,q-szScriptExecutionUtility);
szCmd[q-szScriptExecutionUtility]=_T('\0');
_tcscat(szCmd,szOutputFileToCreate);
_tcscat(szCmd,q+_tcslen(OUTPUT));
}
else if ((p) && (!q)) { //Got script and no output
//Replace script token
_tcsncpy(szCmd,szScriptExecutionUtility,
p-szScriptExecutionUtility);
szCmd[p-szScriptExecutionUtility]=_T('\0');
_tcscat(szCmd,szScriptToExecute);
_tcscat(szCmd,p+_tcslen(SCRIPT));
}
else
_tcscpy(szCmd,szScriptExecutionUtility);
//Launch the script utility and wait for it to finish. The timeout setup and
//CreateProcess call below are an assumed reconstruction; m_lTimeout is an
//assumed member name for the task's Timeout property.
dwTimeout = m_lTimeout ? (DWORD)m_lTimeout * 1000 : INFINITE;
if (!CreateProcess(NULL, szCmd, NULL, NULL, FALSE, 0, NULL, NULL,
&startupInfo, &procInfo))
return HRESULT_FROM_WIN32(GetLastError());
*pTaskResult = DTSTaskExecResult_Failure; //Assume failure
if (WAIT_TIMEOUT != WaitForSingleObject
(procInfo.hProcess, dwTimeout))
{
//Process terminated
DWORD dwExitCode;
if (!GetExitCodeProcess(procInfo.hProcess, &dwExitCode))
return E_UNEXPECTED;
if (dwExitCode == (DWORD)m_lSuccessReturnCode) //If exit code
// matches the desired value it is a success
*pTaskResult = DTSTaskExecResult_Success;
}
else
{
if (m_bTerminateProcessAfterTimeout) //Blow the process if
//that is what the user wants. *Not* the default.
TerminateProcess(procInfo.hProcess, (UINT)-1);
if (m_bFailPackageOnTimeout) { //We error this ::Execute if
//timeout instead of just failing task if that
//is what the user wants.
CloseHandle(procInfo.hProcess);
CloseHandle(procInfo.hThread);
return HRESULT_FROM_WIN32(ERROR_TIMEOUT); //This is like
//a cancel.
}
}
CloseHandle(procInfo.hProcess);
CloseHandle(procInfo.hThread);
return hr;
}
13. You're now ready to compile this component into a DLL and register it in the DTS
Designer. Hit F7 to build the project. Once dtstask.DLL is created, register it in the DTS
Designer via the Task | Register Custom Task menu option, just as you registered the
ExecuteSQLScript.DLL custom task DLL (dtstask.DLL is created in the
ReleaseUMinDependency folder by default). As before, you can supply a description for
the custom task when you register it. I'm using "Execute Script Task."
14. Once the new component is registered in the DTS Designer, you can drag it onto the
design sheet and configure it for use in the package. As with the Execute SQL Script
Task that we created in the previous exercise, you can use token strings in the script
utility command line in order to specify where you want the ScriptToExecute and
OutputFileToCreate properties to be inserted. For example, you might set the
ScriptExecutionUtility property to something like this:
When the task is executed and its Execute method is called, %script% will be replaced
with the current value of the ScriptToExecute property, and %output% will be replaced
with the current value of the OutputFileToCreate property. Since these properties can be
assigned using global variables, Dynamic Properties tasks, and similar mechanisms, this
capability allows you to dynamically replace parts of the script utility command line
without having to rebuild the command line in its entirety through script code or
something similar.
15. You can experiment further with the ExecuteScript custom task by loading the
CustomTaskExample.DTS package from the CH20\CustomTask subfolder on the CD
accompanying this book into the DTS Designer and running and modifying it.
Debugging a custom task component is much like debugging any other type of DLL (again, I
recommend you build custom task components as DLLs so they can be used in the DTS
Designer). I've seen newsgroup postings where people were jumping through lots of hoops
(often unnecessarily) in order to debug a custom DTS task, so here are some general
guidelines that should make the process more intuitive if you ever need to do it.
1. Keep in mind that you don't debug a DLL directly, you debug its host executable. I've
seen people advising the uninitiated to attempt to debug directly from Visual Basic or
Visual C++ by setting up Enterprise Manager as the "host application" insofar as the IDE
debugger is concerned. While this can be made to work, it's overly complicated. There's
an easier way: Simply start your host process as you normally would, then attach your
debugger to that process. In the case of Enterprise Manager, this will be MMC.EXE, the
application under which SEM runs. In the case of dtsrun, this, of course, is dtsrun.exe.
And in the case of a standalone application that is accessing the DTS object model via
COM Automation, you would attach your debugger to the application itself. Both the
VC++ debugger and WinDbg, the standalone debugger we've been using throughout
this book, support the ability to attach to a running process.
2. Use a debug build of your component if possible. Debugging with optimized code can be
difficult if not impossible because optimizing compilers can rearrange and even
eliminate the machine code that corresponds to individual source code lines. The
custom task we just created supports two different debug build configurations. You can
access them from the VC6 IDE via the Build | Set Active Configuration menu item.
3. Be sure to generate debug symbols for your component. This is the default for debug
builds in VC++, and you can enable it for VB projects by checking the Create Symbolic
Debug Info checkbox in the Project Properties | Compile dialog.
4. Before you attach to the host app with your debugger, be sure your symbol and source
paths are set correctly in the debugger. You'll probably need to add the path containing
the PDB generated by VB or VC++ to your symbol path, and you may need to add the
path containing your component's source code to the debugger's source path.
5. Once you've attached to your host application, check to see whether your component's
DLL is already loaded and whether the debugger has located its symbol file. In WinDbg,
you can do this using the lm command, which lists the currently loaded modules, along
with the symbol path for each one if the symbol file has been located and loaded. If your
component hasn't yet been loaded, do whatever is necessary in the host application to
cause it to load; you won't be able to reference its symbols (e.g., to set breakpoints)
until it has been loaded. If you see your component DLL loaded, but its symbols haven't
yet been loaded, make sure the symbol file is on the debugger's symbol path, then
instruct the debugger to attempt to reload the symbols for your component. (You won't
be able to fully debug your component until its symbols load successfully.) Different
debuggers have different commands for doing this; you use the .reload -f command in
WinDbg to force a symbol reload.
Provided that your symbol and source paths are set correctly, you should be able to debug
your component just as you would any other DLL. Let's walk through a simple exercise in
which we use the WinDbg standalone debugger to debug a custom task you've created.
1. Compile your custom task to a DLL file and make sure symbolic debugging information
is created so that a PDB file is generated. You enable this in Visual Basic by selecting the
Create Symbolic Debug Info option on the Compile tab in the Project Properties dialog.
2. Start the DTS Designer and install your custom task into it if you've not already done so.
3. Start WinDbg and set its symbol and source paths (on the File menu) to the folder(s)
containing the PDB file corresponding to your DLL and its source code, respectively.
4. From WinDbg, attach to the Enterprise Manager's host app, MMC.EXE. (Press F6 to see
the list of running processes.)
5. Set a breakpoint in the Execute method of your custom task's class. For example, if your
class is named clsExecuteSQLScript and your Execute method is named
CustomTask_Execute, you might enter this command in WinDbg to set the breakpoint:
bp clsExecuteSQLScript::CustomTask_Execute
This breakpoint will cause execution to stop when the task is executed.
7. Switch back to the DTS Designer and drop an instance of your custom task onto the
design sheet.
8. Right-click your custom task object and select Execute. This should trigger your
breakpoint in WinDbg and stop execution. (You'll have to use Alt+Tab to get back to
WinDbg; the Designer will probably appear to hang.)
9. You should now see the source code module containing your custom task's Execute
method loaded in WinDbg, and you should be able to step through it by pressing F10.
You can set up watches, examine the call stack and registers, view local variables, and
so on as you step through your code. This can be immensely helpful in tracking down
obscure bugs in custom DTS tasks.
10. When you're ready for the Designer to resume running, type g and press Enter in the
WinDbg command window. When you detach from MMC.EXE, you'll likely notice that it
disappears altogether. Unless you're on Windows XP or later, detaching from a process
to which you've attached "invasively" (invasive attachment is the default and is
necessary to set breakpoints) automatically terminates the process that was being
debugged.
Another technique for debugging custom tasks is to create a simple VB app that
programmatically creates a package containing your custom task (or opens a package
containing a reference to it) and executes it. You can then code the app to touch specific parts
of the package, execute specified steps that you want to check out, and so on. You can step
through the app itself under a debugger and interactively view package properties, step into
your custom task code, and so on. Without ever leaving VB, you can do many of the things
you could do using a standalone debugger and attaching to MMC or dtsrun.
Dim g_oPkg As DTS.Package2
Dim g_bInited
End If
Next
End Function
If (g_bInited) Then
g_oPkg.UnInitialize
g_bInited = False
End If
lbGlobals.Clear
Dim oGlobal As DTS.GlobalVariable2
lbSteps.Clear
lbTasks.Clear
g_bInited = True
End Sub
Set oStep = g_oPkg.Steps(lbSteps.List(lbSteps.ListIndex))
oStep.Execute
End If
End Sub
g_bInited = False
End Sub
DTS.Package2Class pkg;
private void btOpen_Click(object sender, System.EventArgs e) {
if (DialogResult.OK!=od_dts.ShowDialog()) return;
tbFileName.Text=od_dts.FileName;
pkg = new DTS.Package2Class();
object dummy=new object();
pkg.LoadFromStorageFile(tbFileName.Text, "", null, null, null, ref dummy);
TreeNode noderoot;
TreeNode nodeparent;
TreeNode nodechild;
btAddChange.Enabled=false;
btSave.Enabled=false;
btSend.Enabled=false;
tvItems.BeginUpdate();
lvProperties.BeginUpdate();
lvChanges.BeginUpdate();
try
tvItems.Nodes.Clear();
lvProperties.Items.Clear();
lvChanges.Items.Clear();
noderoot=tvItems.Nodes.Add(pkg.Name);
noderoot.Tag=pkg;
nodeparent=noderoot.Nodes.Add("Connections");
foreach (DTS.Connection conn in pkg.Connections) {
nodechild=nodeparent.Nodes.Add(conn.Name);
nodechild.Tag=conn;
nodeparent=noderoot.Nodes.Add("Tasks");
foreach (DTS.Task task in pkg.Tasks) {
nodechild=nodeparent.Nodes.Add(task.Name);
nodechild.Tag=task;
}
nodeparent=noderoot.Nodes.Add("Steps");
foreach (DTS.Step step in pkg.Steps) {
nodechild=nodeparent.Nodes.Add(step.Name);
nodechild.Tag=step;
nodeparent=noderoot.Nodes.Add("Global Variables");
foreach (DTS.GlobalVariable var in pkg.GlobalVariables) {
nodechild=nodeparent.Nodes.Add(var.Name);
nodechild.Tag=var;
tvItems.ExpandAll();
btSave.Enabled=true;
finally
{
tvItems.EndUpdate();
lvProperties.EndUpdate();
lvChanges.EndUpdate();
}
private void tvItems_AfterSelect(object sender,
System.Windows.Forms.TreeViewEventArgs e) {
tbChange.Text="";
btAddChange.Enabled=false;
lvProperties.BeginUpdate();
try
lvProperties.Items.Clear();
if ((null==pkg) ||
(null==tvItems.SelectedNode) ||
(null==tvItems.SelectedNode.Tag))
return;
DTS.Properties props=null;
if (tvItems.SelectedNode.Tag is DTS.Package)
props = (tvItems.SelectedNode.Tag as DTS.Package).Properties;
else if (tvItems.SelectedNode.Tag is DTS.Connection)
props = (tvItems.SelectedNode.Tag as DTS.Connection).Properties;
else if (tvItems.SelectedNode.Tag is DTS.Task)
props = (tvItems.SelectedNode.Tag as DTS.Task).Properties;
else if (tvItems.SelectedNode.Tag is DTS.Step)
props = (tvItems.SelectedNode.Tag as DTS.Step).Properties;
else if (tvItems.SelectedNode.Tag is DTS.GlobalVariable)
props = (tvItems.SelectedNode.Tag as DTS.GlobalVariable).Properties;
ListViewItem lvitem = lvProperties.Items.Add(prop.Name);
lvitem.SubItems.Add(prop.Value as string);
lvitem.Tag=tvItems.SelectedNode.Tag;
if (!prop.Set)
lvitem.ImageIndex=1; //readonly property
}
}
finally
lvProperties.EndUpdate();
}
private void lvProperties_SelectedIndexChanged(object sender,
System.EventArgs e) {
if ((null==lvProperties.SelectedItems) ||
(0==lvProperties.SelectedItems.Count))
return;
if (1!=lvProperties.SelectedItems[0].ImageIndex) {
btAddChange.Enabled=true;
tbChange.ReadOnly=false;
}
else
btAddChange.Enabled=false;
tbChange.ReadOnly=true;
}
if (lvitem.Tag.Equals(lvProperties.SelectedItems[0])) {
tbChange.Text=lvitem.SubItems[3].Text;
return;
tbChange.Text=lvProperties.SelectedItems[0].SubItems[1].Text;
}
private void btAddChange_Click(object sender,
System.EventArgs e) {
if ((null==lvProperties.SelectedItems) ||
(0==lvProperties.SelectedItems.Count))
return;
if (lvi.Tag.Equals(lvProperties.SelectedItems[0])) {
lvChanges.Items.Remove(lvi);
break;
}
ListViewItem lvitem = lvChanges.Items.Add(tvItems.SelectedNode.Text);
lvitem.SubItems.Add(lvProperties.SelectedItems[0].SubItems[0].Text);
lvitem.SubItems.Add(lvProperties.SelectedItems[0].SubItems[1].Text);
lvitem.SubItems.Add(tbChange.Text);
lvitem.Tag=lvProperties.SelectedItems[0];
btSend.Enabled=true;
// Clear changes
lvChanges.Items.Clear();
btSend.Enabled=false;
}
Replace("%","{%}").Replace("^","{^}").Replace("+","{+}");
return instring.Replace("\\{","{{}").Replace("\\}","{}}");
}
private void btSend_Click(object sender, System.EventArgs e) {
lvitem.Checked=false;
}
lvChanges.BeginUpdate();
try
string Keys;
Keys="%pd";
SendKeys.SendWait(Keys);
foreach (ListViewItem lvitem in lvChanges.Items) {
Keys="";
if ((lvitem.Tag as ListViewItem).Tag is DTS.Connection) Keys+="c";
Keys+="g";
SendKeys.SendWait(Keys);
if (0!=Keys.Length) //Not package properties
{
Keys="{RIGHT}{DOWN}";
SendKeys.SendWait(Keys);
Keys=lvitem.Text; //item name
SendKeys.SendWait(Keys);
}
Keys="{TAB}{DOWN}";
SendKeys.SendWait(Keys);
Keys=lvitem.SubItems[1].Text; //prop name
SendKeys.SendWait(Keys);
Keys="{ENTER}";
SendKeys.SendWait(Keys);
Keys="{TAB}";
SendKeys.SendWait(Keys);
Keys=ReplaceSpecialKeys(lvitem.SubItems[3].Text);
SendKeys.SendWait(Keys);
Keys="{ENTER}";
SendKeys.SendWait(Keys);
Keys="+{TAB}{HOME}";
SendKeys.SendWait(Keys);
lvitem.Checked=true;
finally
lvChanges.EndUpdate();
}
}
TreeNode FindNode(TreeNodeCollection nodes, string searchstring)
if (0!=tvitem.Nodes.Count)
result=FindNode(tvitem.Nodes,searchstring);
if (null==result)
if (tvitem.Text.ToLower()==searchstring.ToLower()) {
result = tvitem;
break;
else break;
return result;
Replace("\r","\\r").Replace("\t","\\t");
}
Replace("\\r","\r").Replace("\\t","\t");
}
private void menuItem2_Click(object sender, System.EventArgs e) {
if (DialogResult.OK!=od_TXT.ShowDialog()) return;
lvChanges.Items.Clear();
StreamReader f = File.OpenText(od_TXT.FileName);
tvItems.BeginUpdate();
lvProperties.BeginUpdate();
lvChanges.BeginUpdate();
try
string linein;
string[] args=linein.Split('\t');
ListViewItem lvitem = new ListViewItem();
//Break up the input line on tab
foreach (string arg in args) {
if (0==lvitem.Text.Length)
lvitem.Text=arg;
else
lvitem.SubItems.Add(ReplaceEscapeChars(arg));
}
Text);
if (null==parentnode) {
lvitem.SubItems[0].Text);
return;
tvItems.SelectedNode=parentnode;
if (lvi.Text.ToLower()==lvitem.SubItems[1].Text.ToLower())
propitem = lvi;
break;
if (null==propitem)
MessageBox.Show("Unable to locate property " + lvitem.SubItems[1].Text);
return;
}
lvitem.Tag=propitem;
lvChanges.Items.Add(lvitem);
btSend.Enabled=true;
finally
f.Close();
tvItems.EndUpdate();
lvProperties.EndUpdate();
lvChanges.EndUpdate();
}
if (DialogResult.OK!=sd_TXT.ShowDialog()) return;
StreamWriter f = File.CreateText(sd_TXT.FileName);
try
f.WriteLine(";ItemName\tPropertyName\tOldValue\tNewValue");
foreach (ListViewItem lvitem in lvChanges.Items) {
f.WriteLine("{0}\t{1}\t{2}\t{3}", lvitem.Text, lvitem.SubItems[1].Text,
lvitem.SubItems[2].Text, lvitem.SubItems[3].Text);
}
od_TXT.FileName=sd_TXT.FileName;
}
finally
f.Close();
private void btSave_Click(object sender, System.EventArgs e) {
if (DialogResult.OK!=sd_TXT.ShowDialog()) return;
StreamWriter f = File.CreateText(sd_TXT.FileName);
try
{
f.WriteLine(";ItemName\tPropertyName\tOldValue\tNewValue");
foreach (DTS.Property prop in pkg.Properties) {
f.WriteLine("{0}{1}\t{2}\t{3}\t{4}", prop.Set?"":";",
pkg.Name, prop.Name, NoEscapeChars(prop.Value), NoEscapeChars(prop.Value));
}
f.WriteLine("{0}{1}\t{2}\t{3}\t{4}", prop.Set?"":";",
conn.Name, prop.Name, NoEscapeChars(prop.Value), NoEscapeChars(prop.Value));
}
f.WriteLine("{0}{1}\t{2}\t{3}\t{4}", prop.Set?"":";",
step.Name, prop.Name, NoEscapeChars(prop.Value), NoEscapeChars(prop.Value));
}
f.WriteLine("{0}{1}\t{2}\t{3}\t{4}", prop.Set?"":";",
var.Name, prop.Name, NoEscapeChars(prop.Value), NoEscapeChars(prop.Value));
}
finally
f.Close();
Recap
The primary means of data transport within a DTS package is the multiphase data
pump. This component allows data to be moved from one OLE DB provider to
another. It's the engine behind the Transform Data task, the Data Driven Query task,
and the Parallel Data Pump task. You can also use Bulk Insert tasks to move data
from a text file into SQL Server. DTS provides a number of mechanisms for moving
data around flexibly and quickly.
Knowledge Measure
2. What single DTS component is the engine behind the Transform Data task, the
Data Driven Query task, and the Parallel Data Pump task?
4. What DTS task can you use to retrieve values from an INI file and assign them
to global variables?
8. How can you detect that a lookup query has returned multiple rows?
10. Describe a scenario in which a Data Driven Query task would be preferable to
either a Transform Data task or a Bulk Insert task.
11. How many stock queries can you associate with a single Data Driven Query
task?
12. When a nested package is joined to the transaction context of its caller, what
happens when the child package commits the transaction?
13. Can you set up ActiveX transformations for a Bulk Insert task?
14. What step must you first take in order to be able to save DTS packages in the
Meta Data Services repository?
15. When a DTS package is stored locally on a SQL Server, into what database and
table is it saved?
16. What OLE DB provider is used to query a DTS package from Transact-SQL?
17. What OLE DB provider is used to query publication data for a transformable
subscription?
18. Which data pump phase occurs first, the postsource phase or the pump
complete phase?
19. True or false: In order to use the high-speed bulk copy facilities provided by an
OLE DB data source, you must enable the Use fast load option in a Transform
Data task; it is not enabled by default.
20. True or false: While an Execute SQL task can execute regular T-SQL scripts with
embedded batch terminators, it does not provide a mechanism for retrieving
irregular or multirowset output.
21. Why is using the Disconnected Edit facility to change property values in a DTS
package potentially dangerous?
23. What option must you enable in order to show multiphase data pump options
in the DTS Designer?
24. What annoying side effect occurs when you save a DTS package using
Automation code?
25. True or false: DTS supports direct connections to HTML data sources.
26. Explain the purpose of the Close connection on completion workflow option.
27. What method of the DTS Package object can an application call in order to load
a package directly from a structured storage file?
28. True or false: Because the DTS Package object library is a COM library, you
cannot use the foreach construct in managed code languages such as C# or
VB.NET to iterate through a collection exposed by the library.
29. True or false: Even if there is only one Connection object in the package, the
Microsoft Distributed Transaction Coordinator must be available in order for a
DTS package to queue data modifications in a transaction.
30. True or false: Although you may have other ActiveX script languages installed
on the machine on which the DTS Designer is running, you may use only
VBScript and JScript when coding ActiveX scripts for DTS packages.
31. Once you've created a new project, what must you do to use the DTS Package
object library in a managed code application?
33. When a child package that has its Use transactions property enabled is
executed via an Execute Package task from a package that has already started
a transaction and the Execute Package task is enlisted in that transaction, does
the child package initiate a nested transaction?
34. True or false: Due to the lack of IDispatch support in the COM Interop facility
provided by the .NET Framework, it is impossible to open a DTS package in a
managed code app using the DTS Package object library.
Chapter 21. Snapshot Replication
-H. W. Kenton
In this chapter and the next two, we'll delve into how SQL Server replication works.
This book isn't about replication per se, so we don't have the time or space to get
into the subject in the breadth or depth we might like. The chapters on replication in
this book will not walk you through setting up replication or administering it; you
won't find any screenshots of the Enterprise Manager Replication wizards in these
pages. The Replication wizards are simple enough to use that you shouldn't need the
help of a book for that, anyway. Moreover, Books Online covers the basic setup and
management of replication quite well; what you can't figure out from the Enterprise
Manager wizards is covered pretty well there.
This book will also not take you through the many idiosyncrasies of replication or
caveats and exceptions that seem to have attached themselves to every single
feature within it. Again, Books Online does a pretty fair job of that, and the aim of
this book isn't to merely regurgitate Books Online.
This chapter assumes that you've read what Books Online has to say about
replication and that you're familiar with basic replication terms such as distributor,
publisher, agent, snapshot, article, and so forth. If you haven't yet read through the
coverage of replication in Books Online, now would be a good time to do that.
As with the other topics covered in this book, our mission here is to explore
replication from an architectural standpoint and answer the fundamental question,
How does it work? How does it do what it does? You may have a basic understanding
of what snapshot replication does, but do you know how it does it? That's the subject
of this chapter.
Overview
Snapshot replication consists of taking a copy of an object on the publisher and
transferring it to a subscriber. The transmittal of the replicated data is an all-or-
nothing proposition: Although you can filter a snapshot horizontally as well as
vertically to limit the data it captures, there's no support in snapshot replication for
propagating incremental changes. You use transactional or merge replication for
that.
SQL Server implements snapshot replication via the Snapshot Agent and the
Distribution Agent. The Snapshot Agent takes care of creating the data and schema
snapshot that will be propagated to subscribers. The Distribution Agent handles
taking the snapshot and applying it to subscribers.
The Snapshot Agent
As with all replication agents, the Snapshot Agent is a console mode application that
uses ODBC to communicate with SQL Server. The executables for SQL Server's
replication agents reside in the 80\COM subfolder under the Microsoft SQL Server
folder. When you set up a snapshot publication, SQL Server creates a SQL Server
Agent job that runs the Snapshot Agent. You can view this job by opening the
appropriate entry under the Agents\Snapshot Agents node in the Replication Monitor
on the distributor (or via the Jobs node under Management\SQL Server Agent) in
Enterprise Manager. Each snapshot job consists of several steps, one of which is to
run snapshot.exe, the executable for the Snapshot Agent. The job step that actually
runs the Snapshot Agent is configured as being of type Replication Snapshot, which
tells SQL Server Agent to run snapshot.exe when it executes the step.
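If you prefer a query to clicking through Enterprise Manager, something along these
lines surfaces the same job step command lines (a minimal sketch; it assumes the
standard msdb job tables and the 'Snapshot' subsystem name used for Replication
Snapshot steps):
USE msdb
GO
SELECT j.name AS job_name, s.step_name, s.command
FROM sysjobs j
JOIN sysjobsteps s ON s.job_id = j.job_id
WHERE s.subsystem = N'Snapshot'   -- Replication Snapshot job steps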
You can explore the Replication Snapshot job step to see what parameters are passed
to snapshot.exe by default. Depending on how you've set up your snapshot
publication, the parameters you see should look something like the following:
Note that you can run snapshot.exe independently of SQL Server Agent. Exercise
21.1 takes you through doing just that.
Exercise 21.1
1. Start Enterprise Manager from your distributor machine and use the Create
Publication wizard to create a snapshot publication if you haven't already done
so. When you set up a snapshot publication via the Create Publication wizard,
Enterprise Manager calls the sp_addpublication and
sp_addpublication_snapshot stored procedures to create it, as shown in Listing
21.1.
Listing 21.1
2. Open the Snapshot Agent entry corresponding to the publication under the
Agents\Snapshot Agents node in Replication Monitor in Enterprise Manager.
4. Select the text in the Command window and copy it to the clipboard (Ctrl+C).
5. Close the job step editor and the agent properties dialog in Enterprise Manager.
7. Type snapshot followed by a space on the command line, then paste the
clipboard contents onto the command line (e.g., Alt+Space, E, P).
8. Hit Enter to run the agent from the command line. You should see output like
that shown in Listing 21.2.
Listing 21.2
9. The snapshot has been generated from the command line just as it normally is
from SQL Server Agent. This is a handy thing to do when you're having trouble
getting a snapshot to generate properly. You can run it from the command line
to see its output appear in a console window rather than having to view system
tables in the distribution database.
10. Note that you can direct the agent's output to a text file via the -Output
filename parameter. Simply add -Output followed by the name of the output file
you want to create to the agent command line and restart it to try this out. If
the file you specify already exists, it will be appended to; if it does not exist, it
will be created.
The Snapshot Agent creates files containing the data and schema for a publication's
articles under the snapshot folder. By default, this folder is located under the
ReplData folder under the root folder of your distributor's SQL Server installation but
can be changed via the Publication Properties dialog in Enterprise Manager. Each
publication gets its own snapshot folder. This folder name is composed of the name of
the server, instance, and database from which the publication is publishing data as
well as the name of the publication itself. Under each publication snapshot folder,
there are two subfolders: unc and FTP. The unc path is the container for UNC path-
based publication snapshots. The FTP subfolder is the container for FTP-based
publication snapshots. Unless you explicitly enable a publication to be distributed
over FTP, you should find its snapshots in a subfolder under the unc path.
Each generated snapshot gets its own subfolder under either unc or FTP. The name of
this subfolder consists of the current date and time when the snapshot was
generated. This subfolder contains the actual snapshot files themselves.
The snapshot files themselves are a mix of BCP data and Transact-SQL scripts. The
data is saved in BCP format and has a file extension of .bcp. If you configured the
publication for distribution only to other SQL Servers and did not allow it to be
transformed via DTS, this file is written in BCP's native format (because this is
generally faster); otherwise, character format is used. The other files (e.g., .sch, .dri,
.trg, .idx, and so on) contain the Transact-SQL necessary to create each article's
object, its declarative referential integrity, its triggers, and its indexes, as
appropriate.
Besides storing the data for a table, the BCP files created by the Snapshot Agent also
store the data returned by view objects. Just as you can use the BCP utility to write
the results of a query against a view to a file, the Snapshot Agent will create a
snapshot of the data returned by a view as though the view contained the data itself.
When the Snapshot Agent runs, it determines whether any new subscriptions have
been added since the last time it ran. If no new subscriptions have been created, the
agent doesn't need to create new scripts or data files. However, if a publication was
created with the option to generate its snapshot immediately, the Snapshot Agent
will create a new snapshot for it each time the agent runs.
Note that the Snapshot Agent is used to generate snapshots not only for snapshot
replication but also for transactional and merge replication. Regardless of the type of
replication, an initial snapshot is needed in order to seed a subscriber with data.
Once a snapshot has been generated for a publication, either the Distribution Agent
(for snapshot and transactional replication) or the Merge Agent (for merge
replication) picks it up and distributes it to subscribers. You can also take the files
from the snapshot folder and transfer them to the subscriber manually. This is often
done when setting up a remote subscriber for the first time that has a low-bandwidth
connection to the distributor. By putting the initial snapshot on, say, a CD, you allow
the site to be set up more quickly than if it had to wait on the snapshot to be
transferred over a slow WAN link.
Duties of the Snapshot and Distribution Agents
As I've mentioned, the work of snapshot replication is divided between the Snapshot
Agent and the Distribution Agent. In this next section, we'll cover the work each of
these carries out separately.
The Snapshot Agent begins by connecting from the distributor to the publisher and
setting a share lock on each of the tables included in a publication's articles. These
locks help guarantee a consistent view of the data by preventing changes to the
data until they are released. Naturally, this means that you'd normally want to run
the Snapshot Agent during times when not being able to change the data in
published articles isn't a problem for your users.
As it works, the Snapshot Agent keeps a record of what it's doing in the
MSsnapshot_history table. If you list the MSsnapshot_history table, you'll see log
records similar to those we saw when we ran the agent from the command prompt.
You can adjust the verbosity of the agent via its OutputVerboseLevel parameter. This
affects the level of detail written to MSsnapshot_history (and to the console if you
are running the agent from the command line). MSsnapshot_history is a good place
to start if you run into problems with a snapshot and want to narrow down where the
problem is occurring. You can access the information in MSsnapshot_history by
simply querying the table, or from Enterprise Manager by selecting the publication
under the Replication Monitor node, then double-clicking its Snapshot Agent (every
snapshot publication will have its own Snapshot Agent) and selecting Session Details
from the Snapshot Agent History dialog. Listing 21.3 shows an example of what you
might find in the MSsnapshot_history table.
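If you'd rather pull the history yourself, a minimal query along these lines works
(the column names are assumptions based on the distribution database's
MSsnapshot_history table; adjust them as necessary for your build):
USE distribution
GO
SELECT agent_id, runstatus, start_time, comments
FROM MSsnapshot_history
ORDER BY start_time DESC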
After locking the objects it intends to take a snapshot of, the agent next connects
over the network from the publisher to the distributor (if they are on separate
machines) and saves the schema for each article in the publication to its own file in
the snapshot folder on the distributor (the location of the snapshot folder can be
changed, as I mentioned earlier). Each schema file will have the extension .sch and
will contain the T-SQL statements necessary to create an article's object. If you
opted to include indexes, triggers, or referential integrity when you added an article
to the publication (clustered indexes are included by default in the Create
Publication wizard), the Snapshot Agent will save the T-SQL commands necessary to
create these objects in a separate file for each article, each with its own file
extension.
The Snapshot Agent next writes the data itself to the snapshot folder. As I said
earlier, each data file is written in BCP format (either native or character mode,
depending on whether the publication allows heterogeneous subscribers and
whether it is being transformed via DTS) and has the extension .bcp. The data and
schema files represent a synchronization set (files that record the state of an object
as it existed at a given point in time) for each article in the publication.
The agent next adds rows to the distribution database's MSrepl_commands table
indicating the location of the synchronization set and specifying any pre- or
postapplication scripts.
When you set up a snapshot publication, you can specify custom T-SQL scripts to run
before and/or after the snapshot is applied at a subscriber. When you specify one of
these scripts, the Snapshot Agent copies it to the snapshot folder so that the
Distribution Agent can run it when applying the snapshot and adds a row to
MSrepl_commands for it.
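For illustration, here's a hedged sketch of supplying such scripts when the
publication is created via sp_addpublication (the publication name and script paths
are placeholders; in practice the Create Publication wizard passes many more
parameters):
EXEC sp_addpublication
    @publication = N'NorthwindSnap',                        -- placeholder name
    @repl_freq = N'snapshot',
    @pre_snapshot_script = N'\\DISTRIB1\scripts\pre.sql',   -- placeholder path
    @post_snapshot_script = N'\\DISTRIB1\scripts\post.sql'  -- placeholder path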
Name                   Type
publisher_database_id  int
xact_seqno             varbinary(16)
type                   int
article_id             int
originator_id          int
command_id             int
partial_command        bit
command                varbinary(1024)
Listing 21.4
USE distribution
GO
SELECT
publisher_database_id,
xact_seqno,
type,
article_id,
originator_id,
command_id,
partial_command,
CAST(command AS nvarchar(512)) AS command
FROM msrepl_commands
(Results abridged)
The Snapshot Agent also adds rows to the MSrepl_transactions table. These entries
reference the subscriber synchronization task.
Once it has completed generating the snapshot and updating the appropriate
replication system tables, the Snapshot Agent releases the share locks it took out
earlier and adds its final log entries to the MSsnapshot_history table.
The Distribution Agent handles the task of moving the schema and data files from a
snapshot to a subscriber. It begins this task by connecting from the server where it is
running to the distributor. For push subscriptions, the Distribution Agent usually runs
on the distributor; for pull subscriptions, it usually runs on the subscriber.
Finally, the Distribution Agent applies the snapshot to the subscriber by creating the
necessary objects and loading the data contained in each synchronization set in the
snapshot. It handles data type conversions as necessary for non-SQL Server and
down-level subscribers. It synchronizes all the articles in the publication and
preserves the transactional and referential integrity of the affected objects in the
subscription database, provided that the subscriber has the capability to do so.
The Distribution Agent can apply a snapshot when a subscription is first created or
based on a schedule you define when you create a publication. If you set up a
snapshot to be applied on a schedule, keep in mind that the schedule is based on
the system time on the machine on which the Distribution Agent is running. If it's
running on the distributor, the schedule will be based on the system time of the
distributor machine. If it's running on a subscriber (e.g., as it would be with a pull
subscription), the schedule is based on the system time of the subscriber machine.
Updatable Subscriptions
By default, data replicated via snapshot replication is not updatable at the
subscriber; that is, you can't make changes at the subscriber and have them
propagate to the publisher or to other subscribers. However, you can enable a
snapshot publication such that it is updatable at the subscriber. You have three
options for doing so: immediate updating subscribers, queued updating subscribers,
and immediate updating subscribers with queued updating as a fallback.
Immediate Updating
Of course, this option requires that the publisher be available to the subscriber when
the change needs to be made. The 2PC to the publisher occurs automatically (via a
trigger on the replicated table), so the subscriber makes changes to the table
without making any special provision for the fact that a distributed transaction is
actually being initiated behind the scenes.
When you publish a table via snapshot replication and allow it to be updated, SQL
Server adds a uniqueidentifier column to it named msrepl_tran_version. This column
is used in the filter criteria for subsequent updates and deletes. Because this column
is added to the table (and you won't normally be inserting values into it), inserts on
the publisher or subscriber must include a column list.
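For example, an insert against a published table has to spell out its columns;
something like this (the table and columns here are hypothetical):
-- msrepl_tran_version is appended to the table by SQL Server, so the
-- INSERT must name its columns explicitly.
INSERT INTO Customers (CustomerID, CompanyName)
VALUES ('ALFKI', N'Alfreds Futterkiste')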
SQL Server restricts the tables you can publish via snapshot replication and update
on the subscriber to those that already have at least one unique key. Although you
can publish a table via snapshot replication that does not have a primary key, it
cannot be a part of a publication that allows the subscriber to update it.
On the subscriber, updatable tables received via snapshot replication have three
triggers automatically created for them, one each for insert, update, and delete
operations. Each of these triggers requires that it be the first trigger to execute for
its specified operation. They enforce this through use of the undocumented
TriggerInsertOrder, TriggerUpdateOrder, and TriggerDeleteOrder OBJECTPROPERTY
strings. I'm not sure why they don't use the documented ExecIsFirstInsertTrigger,
ExecIsFirstUpdateTrigger, and ExecIsFirstDeleteTrigger properties instead, but these
undocumented properties appear to work similarly.
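If you want to check the firing order yourself, the documented properties can be
queried directly; a quick sketch (the trigger name is hypothetical):
SELECT OBJECTPROPERTY(OBJECT_ID('trg_MSsync_ins_Customers'),
       'ExecIsFirstInsertTrigger') AS IsFirstInsertTrigger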
Once each trigger has ensured that it is executing in the proper sequence, it calls a
stored procedure to carry out the update on the publisher. It passes this procedure
the values for each column in the table and a bitmap indicating which ones were
actually changed. (It derives this bitmap from the COLUMNS_UPDATED function.) If
multiple rows are being changed by the DML operation that fired the trigger, it opens
a cursor on the appropriate deleted and/or inserted table and calls the stored
procedure once for each row being altered.
This stored procedure uses the table's primary key and the GUID stored in the
uniqueidentifier column to ensure that it updates the correct row. It also makes use
of the supplied bitmap to determine which column(s) to change. To keep things
simple, it assigns a column's value back to itself if the column was not changed by
the original DML operation on the subscriber. Unfortunately, this will cause other
triggers on the table that use the COLUMNS_UPDATED or IF UPDATE syntax to think
that each column in the table has changed regardless of whether it actually has.
The data modification is sent to the publisher using SQL Server's normal linked
server facility and invokes the Microsoft Distributed Transaction Coordinator as
necessary. If the change succeeds on the publisher, the trigger allows the change to
proceed on the subscriber; otherwise it is rolled back.
Subscriber applications need to allow for the possibility that an update may fail on
the publisher. This could happen for several reasons, including a possible conflict with
updates from other subscribers or the publisher itself. Often, the correct course of
action is simply to wait a few seconds and retry the update.
Queued Updating
A publication enabled for queued updating works much the same as one enabled for
immediate updating: triggers on the subscriber enable changes to be propagated to
the publisher, which then propagates them to the other subscribers. The difference,
of course, is that these changes are stored in a queue until they can be sent to the
publisher. By default, this queue is a SQL Server table creatively named
MSreplication_queue, but the queue can also be implemented via Microsoft Message
Queuing (MSMQ). You can change a publication initially set up to use
MSreplication_queue to use MSMQ via the Updatable tab in the Publication Properties dialog in
Enterprise Manager.
The Queue Reader Agent reads the queue and applies the stored changes to the
publisher. If MSreplication_queue is being used, the agent reads the queued changes
directly from the table. If MSMQ is being used, the agent reads the changes from the
queue on the distributor.
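When the SQL Server-based queue is in use, you can peek at the pending changes
on the subscriber with a simple query such as:
-- Run in the subscription database; rows disappear as the
-- Queue Reader Agent applies them to the publisher.
SELECT TOP 10 * FROM MSreplication_queue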
If conflicts are detected, they're resolved according to the conflict resolution policy
established when the publication was first created. Consequently, compensating
commands may be generated to roll back a transaction to a subscriber, but they will
be sent only to the originating subscriber, not to all subscribers of the publication.
Contrary to what you might infer from the loquacious name, configuring a
publication to allow immediate updating on the subscriber with queued updating as
a failover does not cause queued updating to be enabled automatically when
immediate updating fails (e.g., because the subscriber cannot connect to the
publisher). You must enable the failover manually, and, once you do, you cannot
switch back to immediate updating mode until the subscriber and publisher can
communicate and the Queue Reader Agent has applied all queued updates.
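The manual switch is made at the subscriber with sp_setreplfailovermode; a hedged
sketch (all of the parameter values below are placeholders):
EXEC sp_setreplfailovermode
    @publisher = N'PUBSRV1',
    @publisher_db = N'Northwind',
    @publication = N'NorthwindSnap',
    @failover_mode = N'queued'    -- use N'immediate' to switch back later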
Queued updating is not automatically enabled with this option because there may
be easy resolutions to communications difficulties between a subscriber and the
publisher. It may be preferable to resolve those issues rather than to have updates
automatically begin queuing to a table or MSMQ store.
Remote Agent Activation
I've mentioned that the Distribution Agent can be run on a server other than the
default if you choose. You can do this via remote agent activation. Running the
Distribution Agent remotely amounts to either running it on the subscriber for push
subscriptions or running it on the distributor for pull subscriptions. Normally, the
Distribution Agent runs on the distributor for push subscriptions and on the
subscriber for pull subscriptions. You can change its default location when you set up
a subscription. This is something to consider when you want to offload some of the
work of the distributor to subscribers when processing push subscriptions, especially
large numbers of them. Given that the agent runs on the subscriber by default with
pull subscriptions, this isn't an issue unless you have push subscriptions. If so, you
may want to configure the agent to run on subscriber machines in order to lessen
the load on the distributor.
Remote agent activation uses DCOM to run an agent on another machine. You must
have DCOM permissions properly configured in order to use remote agent activation.
Failing to do so could cause the synchronization between the distributor and a
subscriber to fail. Note that you can't use remote agent activation with Win9x
subscribers.
Replication Cleanup
One of the tasks you have to provide for in an environment intended for long-term
use is cleanup. An architecture that does not perform system cleanup and handle as
much of its own system maintenance as possible can present administration
headaches down the road. When replication is first enabled, SQL Server creates five
canned SQL Server Agent jobs to help keep replication humming along and to aid it
in cleaning up after itself. Table 21.2 summarizes these jobs.
Each of these maintenance tasks helps replication continue to run well over the
long term and reduces the administrative burden on those managing the replication
installation. For example, the distribution cleanup job deletes the files associated
with a snapshot once the snapshot has been applied to all subscribers. Of course, if
a snapshot publication supports anonymous subscribers or was defined with the
option to create the first snapshot immediately, at least one copy of the snapshot
files must be retained.
Job                          Purpose
Agent history cleanup        Ages and clears out replication agent history from the distribution database
Replication agents checkup   Detects replication agents that have "gone silent" (that are not actively logging history)
Recap
Snapshot replication is used for copying whole objects from a publisher to a
subscriber. Although a snapshot can be filtered, snapshot replication has no facility
for sending incremental changes to subscribers.
SQL Server implements snapshot replication via the Snapshot and Distribution
Agents. The Snapshot Agent handles creating the snapshot, which consists of BCP
data files and T-SQL scripts. The Distribution Agent takes the snapshot and applies it
to subscribers. All along the way, both agents access and update replication system
tables in the distribution database.
Knowledge Measure
1. In what format does the Snapshot Agent save data from tables published as
articles in a snapshot publication?
2. True or false: Although similar to SQL Server Agent, the Snapshot Agent is
installed as its own Windows service when you enable replication and does not
make use of SQL Server Agent.
3. What agent is responsible for delivering the snapshots created by the Snapshot
Agent to subscribers?
7. Name one of the five tasks that SQL Server sets up when replication is first
enabled in order to help the system maintain itself.
8. Identify two scenarios in which the Snapshot Agent will automatically write
data files in BCP character mode format rather than native format.
11. What type of column is added to snapshot replication tables that are part of
publications enabled for immediate updating?
12. What is the name of the table in which the Snapshot Agent records its progress
as it creates a snapshot?
14. What's the creative name of the table used for queuing updates when SQL
Server provides the store for queued updates?
15. True or false: The Snapshot Agent is also used in transactional replication to
create an initial snapshot of data that is published for transactional replication.
16. What agent command line parameter can you specify to change the verbosity
level of the Snapshot Agent?
17. When MSMQ provides the queue for queued updates, where does the queue
reside?
18. True or false: Every snapshot publication gets its own instance of the Snapshot
Agent.
20. True or false: Triggers are used to initiate 2PC operations when the immediate
updating option is used with a snapshot publication.
Chapter 22. Transactional Replication
Whenever someone tells me that adopting their beliefs could give me the
same type of life they have, I tell them that that's exactly what I'm afraid of.
-H. W. Kenton
Transactional replication is by far the most widely used form of SQL Server
replication. It combines a good deal of functionality with reasonably simple
configuration and administration. It's more full-featured than snapshot replication
but much easier to set up and manage than merge replication.
One mistake I often see people make when deciding which type of replication to go
with is not realizing that both snapshot replication and transactional replication offer
immediate and queued updating subscriptions. People somehow get the idea that
merge replication is the only replication type that supports bidirectional data
replication. That's not the case, as should have been obvious from our discussion in
the last chapter regarding updating subscribers and snapshot replication.
Transactional replication offers the same updatability that snapshot replication
provides. Additionally, it offers a publication model that is practical to use in
situations where you need to regularly apply incremental updates from a publisher
to subscribers.
Overview
SQL Server implements transactional replication via the Snapshot Agent, the Log
Reader Agent, and the Distribution Agent. As with snapshot replication, the Snapshot
Agent prepares the initial snapshot of a transactional publication. The Log Reader
Agent scans the transaction log on the publisher and detects the changes made to
the data after the snapshot has been taken and records them in the distribution
database. (With concurrent snapshot processing, the Log Reader Agent can actually
log changes made during the snapshot generation, as we'll discuss below.) The
Distribution Agent reads the changes recorded in the distribution database and
applies them to subscribers.
The MSrepl_commands Table
Each modification to a published table causes the Log Reader Agent to write at least
one row to MSrepl_commands. Unlike snapshot replication, you can't simply cast the
command column in MSrepl_commands as an nvarchar(512) and return human-
readable text for transactional replication commands, so you'll want to use
sp_browsereplcmds instead.
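For example, running the procedure with no arguments from the distribution
database returns the pending commands in readable form:
USE distribution
GO
EXEC sp_browsereplcmds
It also accepts parameters (such as a starting and ending transaction sequence
number) if you need to narrow the output.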
Understand that, just as a single T-SQL command may cause multiple entries to be
written to the transaction log, so can a single T-SQL command cause the Log Reader
Agent to write multiple entries to MSrepl_commands. If you have, say, a T-SQL
UPDATE statement that affects 10,000 rows, you'll get at least 10,000 entries in
MSrepl_commands. The same is true for DELETE commands: if a single T-SQL
DELETE command deletes 10,000 rows, the Log Reader Agent will add at least
10,000 rows to MSrepl_commands. For each row in a transactional publication table
article that's modified by a T-SQL command, you'll see at least one entry in
MSrepl_commands.
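A rough way to see this for yourself is to count the command rows per article in the
distribution database:
USE distribution
GO
SELECT article_id, COUNT(*) AS command_rows
FROM MSrepl_commands
GROUP BY article_id
ORDER BY command_rows DESC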
Because of this, a single DML statement against a table that has been published
with transactional replication may cause significant log activity not only in the
original database but also in the distribution database and in the destination
database on subscribers. For this reason, transactional replication probably isn't your
best choice when you want to replicate the entirety of a database and all the activity
on it to another machine. You'd likely be better off using something like log shipping
in that scenario.
NOTE: You can configure log shipping and replication to work together. Specifically,
you can configure transactional replication such that it interoperates with log
shipping to provide a warm standby server if the publisher fails.
You have two options for integrating log shipping and transactional replication:
synchronous mode and semisynchronous mode. To enable synchronous mode, you
set the sync with backup option on the publishing database via the
sp_replicationdboption stored procedure. Once in synchronous mode, the Log Reader
Agent will ignore change records in the transaction log until they have been backed
up. This ensures that no subscriber can get ahead of the distributor. This way, if the
publisher fails, we can be certain that no subscriber will have data not on the
standby server. Naturally, this increases the latency between the time a change is
made on the publisher and the time it is propagated to subscribers from what is
usually a few seconds to what might be several minutes.
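Enabling synchronous mode is a one-line call against the publishing database; for
example (the database name is a placeholder):
EXEC sp_replicationdboption
    @dbname = N'Northwind',
    @optname = N'sync with backup',
    @value = N'true'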
To enable semisynchronous mode you simply set up log shipping as you normally
would and allow transactional replication to behave as it does by default. In this
mode, it is possible for the standby server and subscribers to be out of sync, but the
latency between changes on the publisher and their propagation to subscribers is
not tied to the log shipping interval, usually somewhere between two and ten
minutes.
The sp_replcmds Procedure
The Log Reader Agent uses the extended procedure sp_replcmds (implemented
internally by SQL Server) to retrieve the log records produced by DML statements in
the published database's transaction log. Each publisher database participating in
transactional replication will have just one Log Reader Agent regardless of how many
transactional publications it contains.
The first client that calls sp_replcmds for a database is considered the log reader for
that database until it disconnects. Other clients attempting to run sp_replcmds
before the first client disconnects will receive an error stating that "Another log
reader is replicating the database."
One reason that only one Log Reader Agent is permitted for each database is that
scanning the log for changes can impact performance. Each time the Log Reader
Agent invokes sp_replcmds, it causes log reader code within the SQL Server process
to scan the published database's transaction log for changes that need to be
replicated. When it does this, the Log Reader Agent changes the typical sequential
method SQL Server uses to access the log into something more random. While the
server is writing new entries to the end of the transaction log as changes are made
to the database, the Log Reader Agent may be reading a different section of the log
in order to write replication commands to MSrepl_commands and
MSrepl_transactions. This can cause resource contention for the transaction log and
impact the performance of the server as a whole.
SQL Server maintains a global cache of article metadata known as the article cache.
This cache stores metadata from sysarticles and syscolumns for each replicated
article. When the log reader code within the server needs the metadata for a
particular article, it consults this cache. If the article is already in the cache, the log
reader code retrieves the required information from the cache. If the article isn't in
the cache, it accesses sysarticles and syscolumns directly and retrieves the
information it needs, then adds that information to the cache.
You can check for the existence of the article cache via the undocumented DBCC
RESOURCE command. See my previous books, The Guru's Guide to Transact-SQL and
The Guru's Guide to SQL Server Stored Procedures, XML, and HTML for more
information on DBCC RESOURCE. This command will return the address within the
server process of the article cache. To see this for yourself, run the following from
Query Analyzer:
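A minimal sketch of that command sequence, assuming trace flag 3604 to route
DBCC output to your session (the usual prerequisite for undocumented DBCC
commands):
DBCC TRACEON(3604)   -- route DBCC output to the client
GO
DBCC RESOURCE
GO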
Once DBCC RESOURCE returns, search its output for the string article_cache. This
member of the global resource structure contains the address of the replication
article cache.
The article_cache member of the global resource structure contains a pointer to a
linked list of replicated database objects. A member in each list item indicates which
SRV_PROC structure is the current log reader for the corresponding database. When
a connection calls sp_replcmds, the server assigns its SRV_PROC pointer to the
appropriate item in the database object list (provided another connection has not
already done so). As long as this SRV_PROC pointer is nonzero, another connection
cannot successfully run sp_replcmds for that particular database. When the
connection currently assigned as a database's log reader disconnects, an ODS
predisconnect handler resets the SRV_PROC pointer in the appropriate database
object so that another connection can then assume the log reader role.
sp_replcmds Parameters
1. Start Enterprise Manager and expand the Replication Monitor node on your
replication distributor.
3. Find your Log Reader Agent in the list on the right and double-click it.
5. Give the new profile a name, then change its ReadBatchSize to the value you'd
like passed in for @maxtrans. Setting this value to 1 is a common
troubleshooting step. (If unspecified, @maxtrans actually defaults to 1.)
6. Back in the Agent Profile dialog, click the radio button next to your new profile
in order to select it, then click OK.
Note that although Books Online documents just one parameter for sp_replcmds, the
procedure actually supports several parameters. If you capture a Profiler trace while
the Log Reader Agent is running, you'll find that besides the documented
@maxtrans parameter, two additional integer parameters are regularly passed in. In
my cursory tests, their values appear never to change (they're always 0 and -1), but
it's worth noting that there's more here than meets the eye.
Another point worth mentioning is that the parameters you see in Profiler differ from
those shown in the Log Reader Agent's output. To see this for yourself, create a
transactional replication publication and enable an output file for the Log Reader
Agent. (Specify the -Output filename command line parameter to set up an agent
output file, as we discussed in Chapter 21.) You should see output file entries like
this for the calls to sp_replcmds:
Note that you shouldn't execute sp_repldone manually. Doing so can invalidate the
order and consistency of replicated transactions. If you run into an emergency
situation where your transaction log is overflowing because replicated transactions
refuse to be purged (and you're certain they've been sent to the distributor), calling
sp_repldone manually might be an appropriate step, but you should do so only when
instructed to by Microsoft or a Microsoft support partner. Instead, let the Log Reader
Agent decide when to execute sp_repldone. I mention it here so that you'll know why
you may occasionally see it in Log Reader Agent output and Profiler traces.
Update Stored Procedures
The format of the actual commands the Log Reader Agent places in
MSrepl_commands varies based on how a particular article is set up. You can customize the
way that DML commands are passed to subscribers for each table article in a
publication. To do this, click the ellipsis button next to the table article in the Articles
tab of the Publication Properties dialog. Select the Commands tab in the Table Article
Properties dialog to configure the format of the commands sent to the subscriber.
You have four options.
1. You can clear the Replace… checkboxes in order to cause the Log Reader to
write plain INSERT, UPDATE, or DELETE statements to MSrepl_commands. If
you are publishing for SQL Server subscribers only, it's better and more
efficient to use stored procedures to carry out these operations, so SQL Server
gives you three options for doing so.
2. If your publication is limited to SQL Server subscribers, you'll notice that the
checkboxes to use stored procedures instead of INSERT, UPDATE, and DELETE
commands are checked by default. You have three options for specifying how
the Distribution Agent will call these procedures on the subscriber: CALL,
MCALL, and XCALL syntax. Each calling convention provides a slightly different
mechanism for calling a DML stored procedure and affords flexibility to those
designing replication topologies. By default, the insert and delete procedures
are called using CALL syntax, and the update procedure is invoked using
MCALL syntax. For an insert, the CALL syntax specifies that the procedure
accepts values for each of the table's columns as parameters. For a delete,
CALL specifies that the procedure accepts values for the table's primary key
column(s) as parameters. For an update, MCALL specifies that the procedure
must accept new values for all the article's columns, followed by the original
values of its primary key column(s).
4. The last option for calling these routines is to use XCALL syntax. XCALL syntax
specifies that an update procedure will be passed the original value of every
column in the article followed by the new value for every column in it. Having
the original column values allows you to more easily implement optimistic
concurrency controls that detect changes to an article on the subscriber by
other users.
NOTE: One thing to watch out for here is malformed procedure names in the stored
proc calls generated by the Log Reader Agent. As of this writing, allowing the Log
Reader Agent to generate CALL, MCALL, or XCALL syntax for a table article whose
name contains a space produces malformed commands in MSrepl_commands. This
is because the names of the update procedures are based on the name of the table
article, and the Log Reader Agent doesn't properly wrap the procedure name in
either quotes or square brackets, so syntax like this gets into MSrepl_commands:
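For example (an illustrative sketch only; it assumes the Northwind Order Details
table is the published article, and (...) stands in for the parameter list):
{CALL sp_MSupd_Order Details (...)}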
Note the space between sp_MSupd_Order and Details. The proper syntax is:
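Again a sketch, this time with the procedure name delimited:
{CALL [sp_MSupd_Order Details] (...)}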
You can work around this by placing square brackets or quotes around the stored
proc names in the Table Article Properties dialog when you specify the calling
convention to use.
Regardless of whether you use CALL, MCALL, or XCALL syntax, the Distribution Agent
sends the appropriate stored proc calls to each subscriber as an RPC event. The
MCALL and XCALL conventions are really just variations on the ODBC CALL syntax,
which allows you to invoke stored procedures using SQL Server's RPC facility. (You'll
recall that we discussed RPC in Chapter 6.) MCALL and XCALL just specify what types
of parameters to pass into a DML procedure; they both ultimately result in the
procedure being invoked using ODBC's RPC CALL syntax.
Concurrent Snapshot Processing
By default, SQL Server places shared locks on the table articles in a transactional
publication while the initial snapshot is being created. This ensures transactional
consistency for the snapshot but prevents updates from being made to the tables
while the snapshot is being generated.
When concurrent snapshot processing is enabled, the Snapshot Agent instead marks
the start of snapshot generation in the transaction log and lets updates to the
published tables continue while the snapshot is built. Once the snapshot generation
completes, the Snapshot Agent writes a second
record to the transaction log indicating that it has finished. The Log Reader Agent
then reads the transactions from the log that occurred between the time the
snapshot was started and when it was completed and writes the appropriate
commands to the distribution database to reconcile the generated snapshot with the
present state of the table.
Naturally, this requires the Log Reader Agent to participate in completing the
snapshot process since it handles collecting the changes that occur during the
snapshot generation and writing them to the distribution database. In order for the
snapshot to be consistent from a transactional standpoint, the Log Reader Agent
must detect the changes that occurred between the start of the snapshot generation
and its conclusion and write them to the distribution database. In fact, if the Log
Reader Agent is unable to run, the Distribution Agent will be unable to apply the
snapshot to subscribers and will return an error indicating that the snapshot is not
available.
Note that you can't use UPDATETEXT on a column in a table for which a concurrent
snapshot is being generated. If you attempt to do so, you'll receive a 7137 error,
"UPDATETEXT is not allowed because the column is being processed by a concurrent
snapshot. . . ." Once the snapshot completes, you can again execute UPDATETEXT
statements against the column.
Note that replication can fail if you enable concurrent snapshot processing for a
publication containing a table with a primary key or unique constraint not contained
in the clustered index and modifications to the clustering key occur during snapshot
processing. To prevent this, don't enable concurrent snapshot processing on
publications containing a table with a primary key or unique constraint not contained
in its clustered index unless you can ensure that the clustered index columns will not
be modified during snapshot processing.
When publishing to subscribers running on SQL Server 7.0, the distributor must be
running on SQL Server 2000 or later, and using concurrent snapshots requires using
push subscriptions to push the data out to them. Using a push subscription causes
the Distribution Agent to run at the distributor, whereas a pull subscription would
cause it to run at the SQL Server 7.0 subscriber where concurrent snapshot
processing is not available.
Due to the many restrictions and caveats associated with concurrent snapshot
processing, it's not enabled by default. However, you can edit the properties for a
transactional publication after it has been created and enable it via the Snapshot tab
of the Publication Properties dialog.
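If you prefer a script to the dialog, you can also request it when the publication is
created. A sketch (the publication name is hypothetical; it's the @sync_method value
of 'concurrent' that selects concurrent snapshot processing):
EXEC sp_addpublication
    @publication = N'pubs_concurrent',
    @repl_freq = N'continuous',
    @status = N'active',
    @sync_method = N'concurrent'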
Updatable Subscriptions
By default, data replicated via transactional replication is not updatable at the
subscriber; that is, you can't make changes at the subscriber and have them
propagate to the publisher or to other subscribers. However, you can enable a
transactional publication such that it is updatable at the subscriber. You have three
options for doing so: immediate updating subscribers, queued updating subscribers,
and immediate updating subscribers with queued updating as a fallback.
Immediate Updating
With immediate updating, a change made at the subscriber is applied to the
publisher as part of a two-phase commit (2PC) transaction at the moment it is made.
Of course, this option requires the publisher to be available to the subscriber when
the change needs to be made. The 2PC to the publisher occurs automatically (via a
trigger on the replicated table), so the subscriber makes changes to the table
without making any special provision for the fact that a distributed transaction is
actually being initiated behind the scenes.
When you publish a table via transactional replication and allow it to be updated,
SQL Server adds a uniqueidentifier column to it named msrepl_tran_version. This
column is used in subsequent updates and deletes to locate the correct row(s) to
change. Because this column is added to the table (and you won't normally be
inserting values into it), inserts on the publisher or subscriber must include a column
list.
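For example, given a hypothetical published Customers table to which replication
has added msrepl_tran_version, the first insert below fails because the number of
supplied values no longer matches the table definition, while the second succeeds:
-- Fails once msrepl_tran_version has been added to the table
INSERT Customers VALUES (1, N'Acme Ltd.')

-- Works: the added column is simply omitted from the column list
INSERT Customers (CustomerID, CompanyName) VALUES (1, N'Acme Ltd.')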
On the subscriber, updatable tables received via transactional replication have three
triggers automatically created for them, one each for insert, update, and delete
operations. Each of these triggers requires that it be the first trigger to execute for
its specified operation. They enforce this through use of the undocumented
TriggerInsertOrder, TriggerUpdateOrder, and TriggerDeleteOrder OBJECTPROPERTY
strings. I'm not sure why they don't use the documented ExecIsFirstInsertTrigger,
ExecIsFirstUpdateTrigger, and ExecIsFirstDeleteTrigger properties instead, but these
undocumented properties appear to work similarly.
Once each trigger has ensured that it is executing in the proper sequence, it calls a
stored procedure to carry out the update on the publisher. It passes this procedure
the values for each column in the table and a bitmap indicating which ones were
actually changed. (It derives this bitmap from the COLUMNS_UPDATED function.) If
multiple rows are being changed by the DML operation that fired the trigger, it opens
a cursor on the appropriate deleted and/or inserted table and calls the stored
procedure once for each row being altered.
This stored procedure uses the table's primary key and the GUID stored in the
uniqueidentifier column to ensure that it updates the correct row. It also makes use
of the supplied bitmap to determine which column(s) to change. To keep things
simple, it assigns a column's value back to itself if the column was not changed by
the original DML operation on the subscriber. Unfortunately, this will cause other
triggers on the table that use the COLUMNS_UPDATED or IF UPDATE syntax to think
that each column in the table has changed regardless of whether it actually has.
The data modification is sent to the publisher using SQL Server's normal linked
server facility and invokes the Microsoft Distributed Transaction Coordinator as
necessary. If the change succeeds on the publisher, the trigger allows the change to
proceed on the subscriber; otherwise, it is rolled back.
Subscriber applications need to allow for the possibility that an update may fail on
the publisher. This could be due to several reasons including a possible conflict with
updates from other subscribers or the publisher itself. Often, the correct course of
action is simply to wait a few seconds and retry the update.
Queued Updating
A publication enabled for queued updating works much the same as one enabled for
immediate updating: triggers on the subscriber enable changes to be propagated to
the publisher, which then propagates them to the other subscribers. The difference,
of course, is that these changes are stored in a queue until they can be sent to the
publisher. By default, this queue is a SQL Server table creatively named
MSreplication_queue, but the queue can also be implemented via Microsoft Message
Queuing (MSMQ). You can change a publication initially set up to use
MSreplication_queue to use MSMQ via the Updatable tab in the Publication
Properties dialog in Enterprise Manager.
The Queue Reader Agent reads the queue and applies the stored changes to the
publisher. If MSreplication_queue is being used, the agent reads the queued changes
directly from the table. If MSMQ is being used, the agent reads the changes from the
queue on the distributor.
If conflicts are detected, they're resolved according to the conflict resolution policy
established when the publication was first created. Consequently, compensating
commands may be generated to roll back a transaction to a subscriber, but they will
be sent only to the originating subscriber, not to all subscribers of the publication.
Immediate Updating with Queued Updating as
Failover
Contrary to what you might infer from the loquacious name, configuring a
publication to allow immediate updating on the subscriber with queued updating as
a failover does not cause queued updating to be enabled automatically when
immediate updating fails (e.g., because the subscriber cannot connect to the
publisher). You must enable the failover manually, and, once you do, you cannot
switch back to immediate updating mode until the subscriber and publisher can
communicate and the Queue Reader Agent has applied all queued updates.
Queued updating is not automatically enabled with this option because there may
be easy resolutions to communications difficulties between a subscriber and the
publisher. It may be preferable to resolve those issues rather than to have updates
automatically begin queuing to a table or MSMQ store.
Validating Replicated Data
As with snapshot and merge replication, SQL Server can validate data replicated via
a transactional replication publication. You can choose to verify merely that the
publication tables on the subscriber and the publisher have the same number of
rows, or you can verify checksums or binary checksums of the data in the tables.
When you use checksum validation, the server compares a 32-bit cyclic redundancy
check (CRC) for each table article on the publisher and subscriber. Because this CRC
is computed on a column-by-column basis, the precise column order for an article
can vary between the publisher and subscriber without affecting the comparison.
Data from text and image columns is not included in the comparison.
2. In the list of publications on the right, find the publication you want to validate
and right-click it. Select Validate Subscriptions.
4. Click the Validate Options button to specify the type of validation to do. You
can choose to compute a "fast" row count (based on cached information) or to
retrieve an actual row count for each table via a T-SQL query. You can opt to
compare checksums or, if both the publisher and subscriber are running SQL
Server 2000 or later, binary checksums.
5. Click OK once you've chosen your validation options, then click OK in the
Validate Subscriptions dialog to validate your data.
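If you'd rather use T-SQL than Enterprise Manager, you can request validation with
the sp_publication_validation procedure in the publication database. A minimal
sketch (the publication name is hypothetical; the procedure's optional parameters
control whether row counts alone or checksums are compared):
EXEC sp_publication_validation @publication = N'pubs_tran'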
To see this firsthand, let's work through a simple exercise to force a data validation
failure.
4. Update one of the columns in the Customers table in pubs such that it differs
from the publisher's version of the table.
6. Right-click the publication in the list on the right and select Validate
Subscriptions.
7. Leave the Validate all subscriptions radio button selected and click the
Validation Options button.
8. Click the Compare checksums to verify row data option and the This subscriber
is a server running SQL Server 2000, use a binary checksum option, then click
OK.
10. After the Distribution Agent next runs, double-click it in the Agents\Distribution
Agents list under Replication Monitor on the distributor to display its agent
history. Click the Session Details button to list details for the sessions listed in
the agent history.
11. You should see an entry indicating that data validation may have failed for the
table article in question. You can click the Error Details button to view
additional information about the failure. Click Close to exit the Session Details
dialog, then click Close again to exit the Distribution Agent History dialog.
12. You can also view the errors directly from the tables in the distribution
database. To do this, execute queries like the two sketched below from Query
Analyzer, where distribution is the name of your distribution database. You should
see entries in both tables indicating potential data validation problems.
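A sketch of the sort of queries I mean; it assumes the agent history and error details
are recorded in the MSdistribution_history and MSrepl_errors tables, respectively:
SELECT * FROM distribution..MSdistribution_history
SELECT * FROM distribution..MSrepl_errors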
You can configure replication alerts to notify you when a data validation fails, and
you can also reinitialize the affected subscription(s) automatically. To set up either
one of these, follow the steps below.
1. Click the Replication Alerts node under the Replication Monitor on the
distributor.
2. Right-click the Subscriber has failed validation entry in the list on the right and
select Properties.
3. In the General tab in the Properties dialog, click the Enabled checkbox to
enable the alert. You can configure specific operators on the Response tab.
Note that immediate updating subscribers can cause data validation to fail between
the time a change is made on the subscriber and the time it is propagated to the
publisher. Naturally, during the period of time the two copies of the published article
do not match, a data validation will fail. The only sure way to avoid this is not to
make changes on the subscriber during the validation process.
Note that checksum validations are not supported with publications where DTS is
used to transform the data because transformation implies different data values at
the publisher and subscriber, so the checksum values would not likely match.
Also, row count validation is not supported for articles configured as DTS horizontal
partitions because the filter criteria for the partition are saved with the DTS package;
they're not in a view on the publisher as is the case with regular replication filters.
Cleanup
One of the stock SQL Server Agent jobs added when replication is installed is the
Distribution clean up task. Once all subscribers have received replicated
transactions, the sp_MSdistribution_cleanup stored procedure is called to remove
delivered transactions from the distribution database. The amount of time delivered
transactions remain in the distribution database is known as the retention period
and defaults to 72 hours for transactional replication. You can change the retention
period for a distributor by following these steps.
2. On the Distributor tab, select the distribution database and click the Properties
button.
3. You can set the minimum and maximum retention in hours or days via the
dialog's Transaction retention section.
If you coordinate the retention period you specify with the backup interval of your
subscriber destination databases, you can ensure that the data required to recover a
subscriber destination database is available within the distribution database. This
can make recovering a failed subscriber database fairly trivial because the
subscriber can simply reload its last backup, then synchronize with the publisher and
be brought immediately up to date.
Recap
Transactional replication strikes a nice balance between enhanced functionality and
straightforward administration. It provides more flexible and full-featured replication
options than snapshot replication affords but is much easier to set up and manage
than merge replication.
Knowledge Measure
4. True or false: Although the Snapshot Agent places shared locks on table
articles when it begins creating the snapshot for a transactional publication,
the publication can be configured to allow changes to these objects while the
snapshot is being created.
7. What table in the distribution database stores history information for the
Distribution Agent?
11. What stored procedure must one call in order to set the sync with backup
option for a database?
12. Is it possible for multiple Log Reader Agents to execute concurrently for the
same published database?
13. What internal extended procedure can the Log Reader Agent call to indicate
that it has completed processing a set of change records from the transaction
log?
14. One of the three calling conventions you can use to call update procedures
used with transactional replication is the CALL syntax. Name the other two and
explain how they differ.
16. True or false: Log shipping and transactional replication are mutually
exclusive; because they both read the transaction log of a published
database, you cannot use them together.
17. True or false: In order for an immediate updating transactional replication
subscriber to fail over to queued mode, you must manually enable the failover.
18. Is it possible for a SQL Server 7.0 subscriber to initiate a pull subscription for a
concurrent snapshot?
19. Assume that I issue a T-SQL UPDATE statement against a published table
article that changes 100 of its rows. At a minimum, what number of new
entries should I see in MSrepl_commands on the distributor?
20. What's the simplest way to change the retention period for a distributor?
Chapter 23. Merge Replication
I've never liked the word "tolerate." Saying you tolerate someone else's culture
or beliefs is like saying you put up with the guy who keeps a messy yard
because the government says you have to, but you'd gladly evict him if the
opportunity presented itself. We shouldn't just tolerate other cultures; we
should celebrate them. Inasmuch as we want our own culture to be respected
and well thought of, we should respect and enjoy the cultures of other people.
-H. W. Kenton
Although both transactional replication and snapshot replication support the notion
of bidirectional data updates, they do not offer as much power and flexibility as
merge replication offers. While they both support subscriber-side updates as well as
queued updating, they were not designed from the ground up for these tasks; they
are essentially data transfer mechanisms that also provide a nice set of data update
functionality. Snapshot and transactional replication were not designed to be used in
environments where subscribers are frequently disconnected or where subscribers
update data as much as or more than the publisher. They also cannot publish data
directly to SQL Server CE subscribers. That said, merge replication is much more
difficult to set up and manage, does not guarantee transactional consistency, and
may not perform as well as transactional replication in simple bidirectional
scenarios. It is also a relative newcomer, having first appeared in SQL
Server 7.0.
As I said in Chapter 21, I'm not going to try to cover every single idiosyncrasy of
merge replication or how to administer it in this chapter. This chapter is about how
merge replication works. Books Online provides good coverage of the many nuances
and details of how to set up and manage merge replication, just as it does the other
types of replication.
Overview
SQL Server implements merge replication via the Snapshot Agent and the Merge
Agent. As with snapshot and transactional replication, the Snapshot Agent is
responsible for creating the initial snapshot of the data that will be distributed by the
publisher. Once it creates the initial snapshot, the Snapshot Agent creates
synchronization jobs on the publisher and builds replication-specific system tables,
stored procedures, and triggers. Subscriber-side system objects are created when
the snapshot is first applied to the subscriber.
Note that a merge subscription table can subscribe to only one publication at a time.
If you attempt to subscribe from a single subscriber table to an article in two
different publications, one of the Merge Agents will fail when the initial snapshot is
applied.
The Distribution Agent is not used in merge replication. The functions normally
performed by the Distribution Agent in snapshot and transactional replication are
handled by the Merge Agent in merge replication. The distribution database is also
not used much in merge replication. Its main function is to support agent logging
and history; no data is stored in it for forwarding to and from subscribers. Each
publisher/subscriber database maintains its own store-and-forward tables. Because
the role of the distributor is so limited with merge replication, it's not unusual for the
distributor to reside on the same server as the publisher.
The Merge Agent is responsible for taking the initial snapshot and applying it to
subscribers. Once the snapshot is applied, the Merge Agent is responsible for
collecting the incremental changes made after the snapshot was taken, applying
them to subscribers, and uploading changes made at subscribers to the publisher.
This process is known as synchronization. For push subscriptions, the Merge Agent
runs on the distributor. For pull subscriptions, it runs on the subscriber. As with the
Distribution Agent used for snapshot and transactional replication, you can change
the machine on which the Merge Agent runs via remote agent activation.
As with the other replication agents, the Merge Agent is an ODBC console
application. Normally it runs via SQL Server Agent, but you can also run it from the
command line. Its executable file is replmerge.exe. If you check its thread count in
Perfmon or attach a debugger and dump its thread stacks, you'll see that it's a
multithreaded application.
The Merge Agent applies changes to a publisher or subscriber by calling the stored
procedures generated by the Snapshot Agent. There are separate stored procedures
for inserting, updating, and deleting rows. Typically, these procedures are named
sp_ins_GUID, sp_upd_GUID, and sp_del_GUID, where GUID is the uniqueidentifier
article ID (from the artid column in sysmergearticles) for the corresponding article.
These are invoked via RPC.
The Merge Agent is able to detect changed data on the publisher and subscribers
because special triggers record data changes to published tables in merge system
tables on the publisher or subscriber as they occur. Each time a row is inserted or
updated, the change is recorded in MSmerge_contents. Each time a row is deleted, a
trigger records it in MSmerge_tombstone. Together, these tables serve the same
purpose in merge replication that the MSrepl_commands table serves in snapshot
and transactional replication. You can use the rowguid column in each of them to join
back to the original publishing table. By examining these tables, along with the
MSmerge_genhistory and MSmerge_replinfo tables, the Merge Agent is able to
determine which rows to send to the other party in the synchronization operation.
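For example, to see the pending inserts and updates for a hypothetical published
Customers table (assuming merge replication added its standard rowguid column to
the article), you could run something like this on the publisher or a subscriber:
-- Changed rows, joined back to the article via the rowguid column
SELECT mc.generation, c.*
FROM MSmerge_contents mc
JOIN dbo.Customers c ON mc.rowguid = c.rowguid

-- Deleted rows are tracked separately
SELECT * FROM MSmerge_tombstone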
It's worth mentioning here that the publisher and subscribers in a merge replication
scenario are pretty much equal partners. Unlike snapshot and transactional
replication, where the publisher is clearly preeminent, because merge replication is
designed for bidirectional replication, most of the key system tables that exist on the
publisher also exist on subscribers. For example, a merge subscriber logs changes to
published articles in its own MSmerge_contents and MSmerge_tombstone tables just
as the publisher does. It has an MSmerge_genhistory table that is structured
identically to the one on the publisher. It tracks sent and received generations in its
own MSmerge_replinfo table exactly as the publisher does. The one exception to this
symmetry is the set of conflict tables. Every published article has its own conflict
table. When a conflict is resolved, the details of the resolution are logged in the
conflict table. Since conflict resolution always occurs from the vantage point of the
publisher, only the publisher maintains conflict tables. Other than this, the system
tables and processes for transmitting changed data are virtually the same on the
subscriber as on the publisher in a merge replication implementation.
Conflict Resolution
Merge replication provides an elaborate and extensible conflict resolution system. To
begin with, you can specify what actually constitutes a conflict: a change made to
the same article row by two different parties in the merge replication scenario, or a
change made to the same article column in the same row by two different parties. If
you employ column-based conflict detection, two parties in the replication scenario
can change different columns in the same row without causing a conflict. If you
employ row-based conflict detection, changes made by two different parties to the
same row cause a conflict that must be resolved.
The system provides a default conflict resolver (the publisher wins all conflicts with a
subscriber; higher-priority subscribers win conflicts with lower-priority subscribers)
as well as several others you can use depending on your business needs. For
example, you can stipulate that the first or last publisher/subscriber to make a
change wins. You can specify that the minimum or maximum of two conflicting
values wins. You can also create your own conflict resolvers either as COM objects or
as stored procedures. You can list the currently installed resolvers using the
sp_enumcustomresolvers stored procedure.
When a publisher and subscriber are synchronized, the changes made on the
subscriber are uploaded to the publisher first, then the changes on the publisher are
downloaded to the subscriber. This allows for early conflict detection since conflict
resolution always occurs from the vantage point of the publisher and the default
conflict resolver stipulates that the publisher wins all conflicts.
During the synchronization process, deletes are processed first, followed by inserts
and updates. This means that changes recorded in the subscriber's
MSmerge_tombstone table are uploaded to the publisher when the synchronization
process first begins. After deletes are uploaded, inserts and updates are then
applied. As I said above, these are logged in MSmerge_contents on the publisher
and subscribers.
Each time a publisher is synchronized with one of its subscribers, it effectively takes
ownership of the changes made by the subscriber to the published data. If the
subscriber made changes that did not conflict with those on the publisher (or if a
conflict resolver is being used that allows the subscriber to win), the changes are
applied to the publisher. When the publisher then synchronizes with other
subscribers, these changes are applied to the subscribers provided there are no
conflicts. If there are conflicts, they are resolved, and changes are made to the
publisher or subscriber accordingly.
If a publisher synchronizes with a lower-priority subscriber and receives a change
from it that is later reversed when the publisher synchronizes with a higher-priority
subscriber, the data on the lower-priority subscriber will differ from both the
publisher and the higher-priority subscriber until the Merge Agent runs to
synchronize it again. This means that it's entirely possible for different versions of
the same row to exist at different places in a replication scenario temporarily, even
after all subscribers have been synchronized once. Eventually, all parties will end up
with the same data, but it may take multiple synchronizations for that to occur.
Generations
Each row tracked in MSmerge_contents and MSmerge_tombstone is assigned a
generation number. A generation is a simple integer column that functions as a sort
of logical clock that enables the Merge Agent to determine when a change was
made to a row and how the time of that change relates to changes to the same row
made by other parties in the replication scenario. An integer rather than a datetime
value is used because it avoids having any sort of dependency on synchronized
clocks between sites and is more resilient to common intersite issues such as time
zone differences.
Only one row is maintained in MSmerge_contents for each row inserted or updated
in a table. Each time a row is updated, its generation number is updated in
MSmerge_contents with the current generation number as specified in
MSmerge_genhistory. Each time the Merge Agent synchronizes the article with the
publisher/subscriber, it updates the MSmerge_replinfo table to indicate the last
generation that was sent and received (via calls to the sp_MSsetlastsentgen and
sp_MSsetlastrecgen procedures, respectively), then calls sp_MSupdategenhistory to
update MSmerge_genhistory to the next generation number for each article for
which it retrieves changes.
To understand how merge generations work, let's walk through an example. Let's
say that TableA and TableB are published in two separate merge publications and
that they are the only two tables published from a given database. A couple of rows
in TableA are updated, and these rows are recorded in MSmerge_contents with the
current (maximum) generation for TableA found in MSmerge_genhistory, which is 2.
The Merge Agent then runs, retrieves the updated rows from MSmerge_contents,
and updates MSmerge_genhistory's maximum generation number for TableA to 4
because 3 is the maximum generation value used by any article in the table. Later, a
row in TableB is updated. It records its modified rows in MSmerge_contents using
the current generation for TableB, which is generation 3. The Merge Agent runs
again, picks up the new modifications, and updates MSmerge_genhistory to use
generation number 5 for TableB because 4 is the maximum generation value on file
in the table. At this point, the current generation for TableA is 4 and the current
generation for TableB is 5. If we then update another row in TableA, that change will
be recorded in MSmerge_contents using generation 4. When the Merge Agent runs
and retrieves the changes made to TableA, it will set the current generation value for
TableA to 6 because 5 is the highest generation value currently on file. So changes
to TableA can be found under generations 2, 4, and 6; changes to TableB will be
found under generations 3 and 5. Each time the Merge Agent sets the current value
for a given article, it also sets the high-water mark for the next generation number
that must be produced, regardless of the article.
When you define a column filter, a view is created that includes only the requested
columns. This view will have a name of the form publication_article_VIEW and will
reside in the publication database. The object ID of this view, rather than that of the
base object, will be stored in the sysmergearticles sync_objid column and will be
queried for new rows by the Merge Agent.
When you create a row filter, a similar process occurs. A view is created in the
publication database with a name of the form publication_article_VIEW. It selects
rows from the associated table article that match the filter expression (WHERE
clause criteria) you specified when you defined the filter. Here's what a typical
horizontal filter view looks like:
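A rough sketch of the shape such a view takes (the object names, the id list, and the
GUID are placeholders):
CREATE VIEW MyPub_MyArticle_VIEW
AS
SELECT *
FROM dbo.MyArticle
WHERE (id IN (1, 2, 3))
AND ({ fn ISPALUSER('00000000-0000-0000-0000-000000000000') } = 1)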
In this case, the filter is based on the id column in the table, so a WHERE clause is
generated that restricts the rows returned to those matching a particular set of id
values. Additionally, the view invokes the ISPALUSER function (using ODBC
escape syntax) to ensure that the user accessing the view is on the publication
access list for the publication. (The GUID passed into ISPALUSER corresponds to the
pubid value for the publication stored in sysmergearticles.)
As with a vertical filter view, the object ID for a horizontal filter view is stored in the
sync_objid column in the sysmergearticles table. On the subscriber, the object ID of
the table article itself is used.
If the subscriber inserts a row that violates a horizontal filter, the Merge Agent will
delete it the next time the subscriber synchronizes with the publisher. It will place
the deleted row in the MSmerge_tombstone table on the subscriber and will set
MSmerge_tombstone.reason to "System delete."
Optimizing Synchronization
SQL Server supports a special optimization for publications with horizontal filters
that allows the publisher to keep track of which rows go into and out of a filter
partition on a published table. If you have a horizontally filtered publication where
the filter column or columns are frequently changed, you may want to enable this
optimization (available via the Optimize Synchronization option in the Create
Publication Wizard) because it can drastically reduce the amount of data sent to
subscribers. Consider the situation where you horizontally filter a table based on the
ZipCode column. When the zip code is changed for a particular row in the table, the
partition the row belongs to changes as well. By default, the Merge Agent has
no way to know which partition the newly changed row once belonged in, so it
doesn't know which subscribers to send the updated row to. It must therefore send
the update to all of them, wasting bandwidth and network resources along the way.
By enabling the Optimize Synchronization option, you provide the Merge Agent a
means of determining the previous values for the filter columns. This allows it to
determine which subscriber(s) need the updated row.
Keep in mind that this requires extra storage on the publisher. The exact amount of
additional space will vary based on the size of the filter columns. If the filter columns
are 50 bytes in size and you have 100,000 rows, you'll need 5MB of additional space
to use the Optimize Synchronization option. Given the tremendous bandwidth
savings this can provide, this is usually a good trade-off.
Dynamic Filters
When you create a publication containing a dynamic filter, you have the option of
having the system validate that the value of the nondeterministic function(s) you're
using to restrict the rows sent to a subscriber doesn't change between
synchronizations. This helps ensure that data is partitioned consistently with each
synchronization. For example, if you used the HOST_NAME function to set up a
dynamic filter and a subscriber machine's computer name changed at some point
after rows had been delivered to the subscriber by the Merge Agent, the subscription
would need to be reinitialized because the rows currently on the subscriber would
fall outside the dynamic filter. Due to the existence of the filter, it would be
impossible for the Merge Agent to update those same rows with changes from the
publisher.
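A dynamic filter is simply a row filter whose expression references a
nondeterministic function such as SUSER_SNAME or HOST_NAME. For a hypothetical
SalesOrders article, the filter expression you supply might look like one of these:
-- Each subscriber receives only its own sales rep's rows
rep_login = SUSER_SNAME()

-- Or partition by the subscriber's machine name instead
territory_host = HOST_NAME()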
Join Filters
A join filter allows a row filter to be extended from one published table to another.
After supplying a horizontal filter for one table in the publication, you can extend the
filter to another table in the publication by setting up a join filter (basically
consisting of a T-SQL JOIN clause) that relates the first table to the second one. This
will cause the second table to be filtered by the join condition you've set up; only
rows that match the join condition will be supplied to subscribers.
You can relate multiple tables in this fashion, essentially filtering all the tables based
on the horizontal filter criteria for the first table combined with the join conditions
you specify for the other tables. The relationships you establish between these
tables are enforced during synchronization and allow you to set up complex
relationships between published articles.
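For example, if a hypothetical Customers article carries the row filter, a join filter
along these lines extends it to an Orders article so that only orders belonging to the
filtered customers are replicated:
Customers.CustomerID = Orders.CustomerID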
Dynamic Snapshots
By default, when you set up a dynamic filter, the Merge Agent applies the initial
snapshot to the subscriber one row at a time in order to make sure that each row
matches the filter criteria (the filter criteria aren't resolved until synchronization).
Naturally, this can take awhile with large amounts of data. You can greatly speed up
the application of the initial snapshot by creating a dynamic snapshot. A dynamic
snapshot allows you to prepare a set of BCP data files that are filtered in advance by
values for SUSER_SNAME and HOST_NAME that you specify. In other words, if you
have a subscriber whose Merge Agent will be connecting as a given user in order to
synchronize a dynamically filtered publication, you can create the snapshot for this
particular subscriber in advance and store it in its own folder. Then, when the
subscriber's Merge Agent connects to synchronize with the publisher, it can insert
data at normal bulk copy speed rather than inserting rows one at a time.
2. Create a dynamic snapshot job via the Create Dynamic Snapshot Job wizard.
(You can access this wizard by right-clicking a publication under the
Databases\YourDatabase\Publications node in Enterprise Manager.)
3. Create a pull subscription and specify the snapshot location that you supplied
to the Create Dynamic Snapshot Job wizard. Check the checkbox indicating
that this is a dynamic snapshot.
Keep in mind that dynamic snapshot jobs cannot be administered via Replication
Monitor. Although all replication jobs are managed by SQL Server Agent, most of
them can also be administered by Replication Monitor. Dynamic snapshot jobs are an
exception. Instead, you'll want to use Management\SQL Server Agent\Jobs to
manage dynamic snapshot jobs.
Identity Range Management
One of the common problems you often face with typical replication scenarios is managing identity
ranges on the publisher and subscriber effectively. If multiple parties can insert new rows, how do we
keep their identity values from colliding?
There are several solutions to this problem. One that I've used in the past is to set up odd-even/negative-
positive identity values. For example, say that you have a simple implementation with a publisher and a
single subscriber. You could seed the publisher's identity value at 1 and increment it by 1 with each row
insertion, and you could set the subscriber's seed value to -1 (remember, SQL Server's int data type is
signed) and increment it by -1 with each new row. Now let's say you have four machines in your
implementation: a publisher and three subscribers. Here, you could seed the first machine at 1 and
increment it by 2 with each new row (resulting in odd identity values); the second machine could be
seeded at 2 and incremented by 2 (resulting in even values); the third machine could be seeded at -1
and incremented by -2 with each new row (resulting in negative odd values); and the fourth machine
could be seeded at -2 and incremented by -2 (resulting in negative even values). You can expand this
to more machines by breaking up each identity range into smaller pieces. You get the basic idea: you
can come up with numerous creative ways to keep identity values from colliding between publishers and
subscribers just by using a few tricks with the identity column's seed and increment values.
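A sketch of the four-machine scheme just described, using a hypothetical Orders table (only the seed
and increment differ from site to site):
-- Publisher: odd positive values (1, 3, 5, ...)
CREATE TABLE Orders (OrderID int IDENTITY(1, 2) PRIMARY KEY, OrderDate datetime)

-- Subscriber 1: even positive values (2, 4, 6, ...)
CREATE TABLE Orders (OrderID int IDENTITY(2, 2) PRIMARY KEY, OrderDate datetime)

-- Subscriber 2: odd negative values (-1, -3, -5, ...)
CREATE TABLE Orders (OrderID int IDENTITY(-1, -2) PRIMARY KEY, OrderDate datetime)

-- Subscriber 3: even negative values (-2, -4, -6, ...)
CREATE TABLE Orders (OrderID int IDENTITY(-2, -2) PRIMARY KEY, OrderDate datetime)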
Another way to manage identity values in replication scenarios is to use the NOT FOR REPLICATION
option when you define your identity column and either set up compatible identity ranges on the
publisher and subscribers or set identity values programmatically. Normally, when an identity value is
inserted directly via an INSERT statement while SET IDENTITY_INSERT is enabled, the identity seed is
reset. Enabling NOT FOR REPLICATION gives replication agents a waiver in regard to identity
reseeding; that is, when a replication agent inserts a value into an identity column that was created with
the NOT FOR REPLICATION option, the identity value is not reseeded. This allows replication to do what it
must do in order to ship data from place to place within the topology (that is, to insert, update, and
delete data) without inhibiting the normal use of identity columns either at the publisher or at
subscribers.
If you go the NOT FOR REPLICATION route, you should also create a CHECK constraint on your articles in
order to ensure that the identity values on each participant in the replication scenario do not overlap.
Because you are assuming the responsibility of managing these values yourself, you must ensure that
they're assigned and managed logically and consistently.
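A sketch of that arrangement for the publisher's copy of a hypothetical Orders table (the range
boundaries are made up; each site would get its own nonoverlapping CHECK range):
CREATE TABLE Orders (
    OrderID int IDENTITY(1, 1) NOT FOR REPLICATION PRIMARY KEY,
    OrderDate datetime NOT NULL,
    CONSTRAINT CK_Orders_IdentityRange
        CHECK NOT FOR REPLICATION (OrderID BETWEEN 1 AND 1000000)
)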
Still another option for avoiding problems with identity value collisions in replication scenarios is not to
use an identity column at all (or avoid publishing it). If an identity column is not actually needed at the
subscriber, you may be able to use a vertical filter to omit it from the publication. If the column is part of
the table's primary key, you cannot omit it using a filter, so you may want to look at identifying a
different column or columns as your primary key if that's possible and reasonable in your scenario. For
example, the rowguidcol added by merge replication makes a decent primary key (although, at 16 bytes
of storage, it's a little larger than ideal). Or you may be able to use some other column or combination of
columns. The bottom line is that if you can avoid publishing the identity value in the first place, you
avoid the possibility of identity value collisions between the publisher and subscribers and can still
potentially take advantage of SQL Server's ability to automatically generate identity values on the
publisher if you decide to simply filter it out of the publication rather than drop it from the table.
The best way to manage identity values in replication scenarios is to let SQL Server do it for you. For
merge publications (where bidirectional modification is assumed) and transactional and snapshot
publications with queued updating subscribers enabled, SQL Server can manage identity ranges on the
publisher and subscribers. When you add an article to the publication, you can click the ellipsis button on
the Specify Articles page of the Create Publication Wizard to display the Table Article Properties dialog.
You can then click the Identity Range tab to set up SQL Server's automatic identity range management.
You can set three key values in this dialog. The first is the identity range size at the publisher. The second
is the identity range size at subscribers. Both of these default to 100; you can set them to something
higher or lower if you wish. The last value is the threshold at which to assign a new range. Via the
appropriate replication agent, SQL Server will automatically assign a compatible range of identity values
on the publisher and on each subscriber. No party in the replication scenario will start off with
overlapping ranges. Then, as rows are inserted into the table at each site and changes are synchronized
between the publisher and subscribers, the identity range will be checked to see how full it is. When it
reaches the threshold specified in the Table Article Properties dialog, a new range will be created for it,
and its identity seed will be set to the start of this new range.
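The same three values can be supplied in script when the article is added. A rough sketch for a merge
article (object names hypothetical; the identity range parameters shown belong to sp_addmergearticle):
EXEC sp_addmergearticle
    @publication = N'MergePub',
    @article = N'Orders',
    @source_object = N'Orders',
    @identityrangemanagementoption = N'auto',
    @pub_identity_range = 100, -- range size at the publisher
    @identity_range = 100,     -- range size at subscribers
    @threshold = 80            -- percent full at which a new range is assigned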
We use a threshold rather than the end value of the range because synchronization may not be
scheduled often enough to keep the table from running out of values in the range. Therefore, we try to
leave a reasonably sized buffer at the end of the range so that we can create a new range before we run
out of values, provided that we synchronize often enough. If you run out of values in a managed identity
value range, you'll receive an error message telling you so.
As the message says, you can call sp_adjustpublisheridentityrange (or just allow synchronization to
occur) to alleviate this problem. Note that you call sp_adjustpublisheridentityrange regardless of whether
the problem actually exists on the publisher or on one of the subscribers.
You should be able to adjust the range sizes, threshold, and synchronization schedule to prevent this
error from occurring. You can check the details of a managed identity range by inspecting the
MSrepl_identity_range table.
Recap
Merge replication is considerably more difficult to set up and manage than
transactional or snapshot replication, but it offers several features that they don't
offer. It's not necessarily the best choice in every scenario where you need
bidirectional data replication (transactional replication is often quite adequate), but
merge replication offers more flexibility than snapshot or transactional replication,
especially for disconnected users and SQL Server CE subscribers.
The system offers a bevy of mechanisms for detecting and dealing with conflicts. You
can control what is considered a conflict in the first place and who wins when a
conflict is detected.
Knowledge Measure
1. True or false: You can manage dynamic snapshot jobs from the Replication
Monitor in Enterprise Manager.
7. In what table does merge replication track the last sent and received
generations?
8. True or false: Merge replication is the only type of replication that can publish
to heterogeneous subscribers such as Microsoft Access.
9. True or false: Merge replication is the only type of replication that can publish
to SQL Server CE subscribers.
10. When applying an update to a subscriber, does the Merge Agent call a stored
procedure or invoke the T-SQL UPDATE command?
12. What is the name of the uniqueidentifier column that merge replication will
automatically add to a table article if it does not already have one?
14. True or false: The Distribution Agent is responsible for transferring updates
from a merge subscriber to the publisher.
15. True or false: In merge replication, a trigger updates the GUID column in a row
each time it's changed.
16. What type of filter is the SUSER_SNAME T-SQL function typically used with?
18. When NOT FOR REPLICATION is not specified for an identity column and a
connection inserts a value into the column (while SET IDENTITY_INSERT is
enabled), what happens to the identity seed?
19. What's the best way to avoid identity value collisions between a publisher and
subscribers or between multiple subscribers?
20. In what release of SQL Server did merge replication first appear?
22. True or false: During the synchronization process, deletes are processed first,
followed by inserts and updates.
23. True or false: During the synchronization process, changes from the publisher
are downloaded to the subscribers first, then their changes are uploaded in
sequence to the publisher.
24. What role does the distribution database play in merge replication?
-Steve McConnell[1]
[1] McConnell, Steve. After the Gold Rush. Redmond, WA: Microsoft Press, 1999, p. 27.
It's my hope that by pointing out these features, particularly those in Transact-SQL
(the facility users go through to reach the server), I will encourage these
features either to be documented or to be made completely inaccessible to users.
It's my belief that SQL Server has an inordinate number of these hidden goodies. It
appears to have far more of these types of things than similar products. Perhaps
that's because it has so many more features than other similar products; perhaps
that's because it relies too much on undocumented functionality.
Many of these undocumented features are robust enough for important parts of the
product such as replication or the Index Tuning Wizard to depend on them, so the
argument that they aren't suitable for use by end users just doesn't ring true. Some
are used to implement fringe features of the product and very well might not be
suitable for more general use, so I can understand why they're undocumented.
Some trace flags can be downright dangerous if used improperly, for example, so
I've always been wary of listing many of them. Generally, I list only those
undocumented features I consider to be safe to use or so generally applicable that I
feel compelled to mention them.
I think some undocumented features are undocumented simply because they fell
through the cracks. There are so many features in SQL Server in general and in
Transact-SQL in particular that I wonder if perhaps some of them were inadvertently
left out of the product documentation. Take the TriggerInsertOrder,
TriggerDeleteOrder, and TriggerUpdateOrder OBJECTPROPERTY property names, for
example. I don't know why they aren't documented. For that matter, I don't know
why they exist. Replication triggers use them to ensure that a trigger is the first one
executed for a particular type of DML operation. The documented
ExecIsFirstInsertTrigger, ExecIsFirstUpdateTrigger, and ExecIsFirstDeleteTrigger
property names could be used to return the same information; there's really no
need for these undocumented property names in the first place. And, even if there
were, what would be the harm in documenting them so long as users understood
that, other than the first or last trigger for a particular DML operation, trigger order
execution is undocumented and can't be depended on in any fashion?
So, all that said, let me repeat the disclaimer from my previous books: Use
undocumented features at your own risk. By leaving product features
undocumented, a vendor reserves the right to change them at any time. A hot fix
release, security patch, service pack, or new product version could change or
eliminate undocumented functionality that you or your code has come to depend on.
Moreover, an undocumented feature you make use of may not have been tested as
thoroughly as the rest of the product (it's unusual for test teams to develop test
suites for undocumented features) and may not work reliably in all situations. Many
times, using an undocumented feature isn't even the best tool for the job and may
even be completely superfluous (e.g., TriggerInsertOrder). The euphoria that comes
from discovering an undocumented xproc and putting it to immediate use in
production code quickly subsides when it crashes your server with an access
violation. And from a product support perspective, assume undocumented means
unsupported; you can't expect a vendor to support functionality it has intentionally
left out of a product's documented feature set. The bottom line is: It's best not to use
undocumented features unless absolutely necessary.
Now that I've got that out of my system, let's proceed with the discussion of how to
find undocumented SQL Server features. Finding these is surprisingly easy; you will
likely be astonished at just how much hidden functionality is right under your nose.
NOTE
I have it on good authority that Microsoft intends to remove much of the access to
undocumented features in the next release of SQL Server. That's probably a good
move; however, for now, those features are still there and readily accessible, so we'll
explore how to get at them in order to better understand how the product works.
The syscomments Gold Mine
A bountiful source of undocumented features, commands, functions, and syntax can
be found in the syscomments system table. The syscomments table is where the
source code to every procedural object (every stored procedure, trigger, view, user-
defined function, default, and rule object) in a database is stored. Each database
has its own copy of this table. I'm especially fond of the syscomments tables in
master, msdb, and the replication distribution databases. I've also found
undocumented features hiding in replication-generated stored procedures, views,
and triggers in publication or subscription databases. For example, to find references
to DBCC commands in the procedural objects in master, you might run a query like
this:
SELECT OBJECT_NAME(id),
SUBSTRING(text,PATINDEX('%dbcc%',text),50) as text
FROM master..syscomments
WHERE PATINDEX('%dbcc%',text)<>0
To find just the DBCC TRACEON calls (and thereby find references to undocumented
trace flags), we might run this query:
SELECT OBJECT_NAME(id),
SUBSTRING(text,PATINDEX('%traceon%',text),50) as text
FROM master..syscomments
WHERE PATINDEX('%traceon%',text)<>0
Some of the procedural objects in master, msdb, and distribution helpfully divulge
whether they make use of undocumented features by including the word
"undocumented" or the words "DO NOT DOCUMENT" somewhere in their text. I've
found these to be especially useful. You can find them by using a query like this:
SELECT OBJECT_NAME(id),
SUBSTRING(text,PATINDEX('%document%',text),50) as text
FROM master..syscomments
WHERE PATINDEX('%document%',text)<>0
Goodies in sysobjects
The sysobjects system table exists in every database and contains a row for every
object in the database. If you have some idea of the name or type of an
undocumented object, you can check for its existence in Enterprise Manager's
navigation tree and in Query Analyzer's Object Browser. You can also query
sysobjects directly to get this information, which is where Enterprise Manager and
Query Analyzer ultimately get it. The code below queries sysobjects directly for
simplicity's sake.
Extended procedures must reside in the master database and are usually prefixed
with xp_. Here's a sysobjects query that returns the name of the extended
procedures registered in the master database:
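Something along these lines does the job (type 'X' marks an extended procedure in
sysobjects):
SELECT name
FROM master..sysobjects
WHERE type = 'X'
ORDER BY name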
Note that some of these procedures have prefixes of sp_ rather than the more
traditional xp_. Also, some are implemented internally by the server rather than in
external DLLs. Extended procedures implemented internally by SQL Server are
known as special procedures (spec procs for short). You can list the spec procs
registered in master via a query like this:
SELECT OBJECT_NAME(c.id)
FROM master..syscomments c JOIN master..sysobjects o ON
(c.id=o.id)
WHERE o.type='X'
AND (c.text NOT LIKE '%.dll%' OR c.text IS NULL)
The name of the DLL containing a regular xproc will be listed in the xproc's text
column in syscomments. Since a spec proc doesn't reside in a DLL, its entry in
syscomments will contain something besides a DLL name. The query above lists the
xprocs entries in syscomments where this is the case.
Knowing about the existence of an extended procedure is one thing; knowing how to
use it is another. Many xprocs are called by regular procedural objects such as
stored procedures and user-defined functions. Here's a T-SQL query that will list all
the extended procedures whose names begin with xp_ that are called by objects in
master:
SELECT OBJECT_NAME(id),
SUBSTRING(text,PATINDEX('%xp_%',text),50) as text
FROM master..syscomments
WHERE text LIKE '%xp\_%' ESCAPE '\'
(I've used an ESCAPE character here to help eliminate false hits due to the fact that
"_" is a wildcard character.)
Undocumented Functions
Of course, these queries (and all the others in this chapter that work similarly) can yield
false hits due to extraneous string matches in the text in syscomments. Obviously,
you'll have to examine the results you get from these queries and eliminate the false
matches they return.
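The same catalog-query approach works for functions. As a sketch (this assumes SQL Server 2000's convention of creating its system UDFs in master under the special system_function_schema owner), the following lists them, documented and undocumented alike:
SELECT o.name, o.type
FROM master..sysobjects o JOIN master..sysusers u ON (o.uid=u.uid)
WHERE u.name='system_function_schema'
ORDER BY o.name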
Scripting Undocumented and System Objects
Enterprise Manager disables the Generate SQL Script menu item for objects marked
as system objects (those whose system bit has been set via a call to
sp_MS_marksystemobject). You can still double-click a system object in Enterprise
Manager to display its source, but you can't edit it. The obvious intent here is to
prevent people from accidentally modifying important internal objects. The
Properties dialog for stored procedures and similar objects is a pain to use, and I
rarely subject myself to trying to view or edit the source to procedural objects with
it. It's a modal dialog (and, therefore, doesn't have a maximize button), and it's also
missing a resize grip in its lower right corner, something every self-respecting
resizable form should have. So rather than deal with the inadequacies of Enterprise
Manager, I usually prefer to script stored procedures to a file, then use another tool
to view or edit them. Unfortunately, Enterprise Manager's attempt to protect me
from myself hampers this a bit when dealing with system procedures. If I can't easily
generate a script and I don't want to use Enterprise Manager's limited Stored
Procedure Properties dialog, how can I quickly view or edit the text of a system
stored procedure? Thankfully, Query Analyzer doesn't have the same concern for my
well-being that Enterprise Manager does. Its Object Browser allows me to generate a
script for any unencrypted object on my server, including system objects. So, when I
need to view or edit procedural objects (especially system objects) using a capable
editor, I use Query Analyzer.
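Incidentally, the system bit that sp_MS_marksystemobject sets surfaces through the OBJECTPROPERTY function's IsMSShipped property, so a quick check along these lines (run here in master against a procedure we know is a system object) tells you how Enterprise Manager will treat a given object:
USE master
SELECT OBJECTPROPERTY(OBJECT_ID('sp_helptext'),'IsMSShipped') AS IsSystemObject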
Of course, another way to list the source for a system procedural object is to use the
sp_helptext procedure. You can use sp_helptext to return the source for any
unencrypted procedural object in a database.
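For example, to dump the source of a system procedure without leaving Query Analyzer, something as simple as this works:
EXEC master..sp_helptext 'sp_who'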
The Profiler Treasure Trove
An excellent way to learn about undocumented feature use in SQL Server is to use
Profiler to record the execution of stored procedures and batches on the server,
especially by tools included with SQL Server. A couple of my favorites are Enterprise
Manager and the Index Tuning Wizard. The replication agents also occasionally
reveal some hidden goodies.
Examining a Profiler trace captured while these tools run can yield interesting details
about the way the server works. For example, while watching a Profiler trace during
an Index Tuning Wizard run, I discovered that the wizard creates what are known as
"hypothetical" indexes�statistics-only indexes used to determine whether a
different indexing strategy would improve the performance of a given query or
workload. (You can check to see whether an index is a hypothetical index via the
IsHypothetical INDEXPROPERTY property name.) The Index Tuning Wizard builds
these by calling CREATE INDEX using an undocumented WITH option: statistics_only.
This tells us some things about how the server works. For example, if I didn't already
know it, this would have told me that the optimizer doesn't look at data when it
decides whether to use an index; it looks only at the statistics for an index when
optimizing a query plan. It tells us how the wizard can evaluate so many different
kinds of indexes in such a (relatively) short period of time: It creates test indexes
without any data. So we learn something about the Index Tuning Wizard as well as
SQL Server from the discovery of this undocumented feature.
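As a quick check of the IsHypothetical property mentioned above (this sketch uses the pubs sample database, whose authors table ships with an index named aunmind), a regular index reports 0, while an index the wizard created with statistics_only reports 1:
USE pubs
SELECT INDEXPROPERTY(OBJECT_ID('authors'),'aunmind','IsHypothetical') AS IsHypothetical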
Profiler also comes in handy in figuring out how undocumented xprocs and functions
are used. SQL-DMO and Enterprise Manager use undocumented xprocs and functions
extensively; capturing a Profiler trace while you run Enterprise Manager through its
paces can turn up all sorts of hidden features.
One peculiar thing I've discovered about Profiler traces, though, is that the tool is
apparently hard-coded to hide lines that contain the text "-- sp_password." Given
that this text would be displayed for SP:StmtStarting/StmtCompleted events for the
sp_password system procedure, the intent seems to be to keep Profiler traces from
recording password changes. That said, the rudimentary way in which this has been
implemented would allow a hacker to cover her or his tracks by appending "--
sp_password" to any line in a T-SQL batch or stored procedure she or he did not want
to show up in a trace.
Snooping around in the Installation Scripts
Another good source of undocumented information is the set of installation scripts
that ship with SQL Server. You can find these scripts in the Install folder under your
root SQL Server installation. You can conduct the same sorts of searches against
these scripts that we used with syscomments to find undocumented trace flags,
DBCC commands, stored procedures, user-defined functions, and the like. For
example, I discovered how to create my own INFORMATION_SCHEMA views (and
documented the process in my last book) by examining ansiview.sql. I figured out
how to create my own system functions by examining instdist.sql. There's a wealth
of undocumented information to be had by examining the scripts that ship with the
product, especially with respect to creating objects. We know that the server is
somehow creating many of these via Transact-SQL. If we want to use the same
functionality or just want to better understand how the server works, the installation
scripts are good places to start.
DLL Imports
A final place to check for undocumented features is the SQL Server executable's
import table. Every Windows executable has an import table. For each of the DLLs
that the executable requires to load, this table lists the functions that it has explicitly
referenced. By examining this table, you can learn about undocumented functions
the executable may be calling. Once you have a function name, you can use a
debugger to set a breakpoint and see when it's called.
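Any PE dump utility will show you this table; for example, with the Visual C++ command line tools installed, a command along these lines (the path shown is the default SQL Server 2000 installation location) lists every function sqlservr.exe imports, grouped by DLL:
dumpbin /imports "C:\Program Files\Microsoft SQL Server\MSSQL\Binn\sqlservr.exe"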
A good example of this is the OPENDS60.DLL that ships with SQL Server. This DLL
implements the Open Data Services API that the server uses internally and that is
used to construct extended procedures. If you look at the import table for
sqlservr.exe, the SQL Server executable, you'll see that it imports a large number of
functions from OPENDS60.DLL, many of them undocumented. If you were to attach a
debugger to sqlservr.exe and set breakpoints for these functions, you could likely
determine in what circumstances they were used. This might help narrow down
exactly what they do and how and why the server calls them.
Recap
Don't use undocumented features unless there's absolutely no other way. Your best
strategy here is not to use them at all unless instructed to by Microsoft. Use your
investigation into the undocumented features in SQL Server as a means of getting to
know the product better, not a treasure hunt for nifty routines you can drop untested
onto production systems.
Knowledge Measure
1. Cite three reasons you shouldn't use undocumented SQL Server features.
Chapter 25. DTSDIAG
A few times over the years I've faced dire situations that tempted me to return
to my former faith. Times of trial and despair are hard on anyone, and my
former faith provided just the right crutch to avoid facing reality and ascribe
something that was truly unjust or wrong to some higher purpose. But each
time I've faced this down, each time I've withstood the temptation, I've found
myself stronger and better able to handle the storms of life than before.
Freeing oneself from a mental dependence on errant faith is a lot like giving up
an addiction: there are powerful temptations to lapse back into the former
habits, but the momentary dulling of the senses that comes from falling off the
wagon is never worth the high cost.
-H. W. Kenton
I will close out this book by introducing you to a diagnostic application that you may
find useful in your own work. It's based on SQL Server's DTS technology and makes
use of the DTS object model. It demonstrates the kind of power an application can
wield by bringing together the technologies on which SQL Server is based. If you
haven't yet read Chapter 20 on DTS, you might want to before proceeding.
The name of the application is DTSDIAG. Its purpose in life is to collect diagnostic
data from SQL Server. It can simultaneously collect Perfmon/Sysmon counters; a
SQLDIAG report; the application, system, and security event logs; a Profiler trace;
and the output of a blocking detection script (as defined in Microsoft Knowledge
Base articles 251004, "INF: How to Monitor SQL Server 7.0 Blocking," and 271509,
"INF: How to Monitor SQL Server 2000 Blocking").
DTSDIAG consists of a standalone Visual Basic application, four DTS packages, and
some miscellaneous command line tools and scripts that it executes to gather the
desired diagnostic data. The VB app allows you to specify the version of SQL Server
to connect to, as well as the authentication information to use. Once the collection
process has been started by clicking the Start button in the app, you can stop it by
clicking the Stop button.
I've often found the need for a tool such as this when diagnosing SQL Server issues.
Many times, expecting someone to collect Perfmon, Profiler, and the other types of
diagnostics that I typically like to look at when investigating an issue turns out to be
too much to ask. Often, the person I'm trying to assist simply can't get all the
diagnostic collections going at once. Sometimes they can collect the right
diagnostics, but they collect them at the wrong times or at different times. DTSDIAG
alleviates this by allowing me to configure which diagnostics I need before sending
the tool out to a target machine. I set up the types of data I want to collect in an INI
file, then have the DTSDIAG executable and support files copied onto the target
machine and executed. The only data supplied at the collection site is the name of
the server/instance (and version) to connect to and the supporting authentication
information. This makes the diagnostic collection process as foolproof as possible
while still allowing it to be configured as necessary.
So, now that you know what the app does, let's have a look at its source code. I've
already mentioned that diagnostic collection is started/stopped via the Start/Stop
button in the DTSDIAG application. Here's the VB code attached to that button
(Listing 25.1).
Listing 25.1
Else
btStartStop.Enabled = False
ExecutePackage "dtsdiag_shutdown_template.dts",
"dtsdiag_shutdown.dts", App.Path + "\dtsdiag_shutdown.log"
ExecutePackage "dtsdiag_cleanup_template.dts",
"dtsdiag_cleanup.dts", App.Path + "\dtsdiag_cleanup.log"
bRunning = False
btStartStop.Caption = "Start"
btStartStop.Enabled = True
End If
End Sub
We use the same button for starting and stopping collection and merely change the
button's caption based on what state we're in. When we start collection, we call a
subroutine named ExecutePackage in order to run the dtsdiag_template.dts
package. ExecutePackage saves dtsdiag_template.dts as dtsdiag.dts (I'll explain why
in a moment) and runs it.
Certain diagnostics such as the SQLDIAG report and the system event logs can be
collected when DTSDIAG starts up, when it shuts down, or at both times.
Whether and when these diagnostics are collected is specified in the DTSDIAG.INI
file. The dtsdiag_shutdown_template.dts package exists to collect diagnostics that
have been configured for collection during shutdown. The
dtsdiag_cleanup_template.dts package exists to remove the stored procedures and
other remnants from the collection process once DTSDIAG is stopped. It also checks
for the existence of KILL.EXE, a utility from the Windows NT 4/2000 Resource Kit that
can terminate other processes, and attempts to kill instances of osql, the utility
DTSDIAG uses to collect much of its diagnostic data.
Listing 25.2
[DTSDIAG]
SQLDiag=1
SQLDiagStartup=0
SQLDiagShutdown=1
EventLogs=1
EventLogsStartup=0
EventLogsShutdown=1
Profiler=1
ProfilerEvents=76,75,92,94,93,95,16,22,21,33,67,55,79,80,61,69,25,
59,60,27,58,14,15,81,17,10,11,35,36,37,19,50,12,13
Perfmon=1
BlockingScript=1
BlockerLatch=0
BlockerFast=1
MaxTraceFileSize=100
MaxPerfmonLogSize=256
PerfmonPollingInterval=5
ProfilerPollingInterval=5
BlockingPollingInterval=120
Counter0=\MSSQL$%s:Buffer Manager\Buffer cache hit ratio
Counter1=\MSSQL$%s:Buffer Manager\Buffer cache hit ratio base
Counter2=\MSSQL$%s:Buffer Manager\Page lookups/sec
...
The format of the file should be pretty self-explanatory. Each type of diagnostic has
its own Boolean switch. For example, if the Profiler value is set to 1, we attempt to
collect a Profiler trace; otherwise, we don't.
Some of the settings in the file serve as options for the collection process. For
example, ProfilerEvents contains a comma-delimited list of events (see
sp_trace_setevent in Books Online for the master event number list) to capture in
the Profiler trace. The CounterN entries contain the list of Perfmon/Sysmon counters
to collect. The BlockerLatch and BlockerFast options contain parameter switches for
the blocking detection script (again, as outlined in Knowledge Base articles 251004
and 271509).
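For context, here's roughly what those event numbers mean to the server. This is only a sketch of the kind of trace definition they end up driving; the file path and the single event/column pair shown here are illustrative, not DTSDIAG's actual list:
DECLARE @TraceID int, @maxsize bigint, @on bit
SET @maxsize = 100   -- MB, mirrors MaxTraceFileSize above
SET @on = 1
EXEC sp_trace_create @TraceID OUTPUT, 0, N'C:\dtsdiag\OUTPUT\dtsdiag_trace', @maxsize, NULL
EXEC sp_trace_setevent @TraceID, 10, 1, @on   -- event 10 = RPC:Completed, column 1 = TextData
EXEC sp_trace_setstatus @TraceID, 1           -- 1 = start the trace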
The key routine in DTSDIAG.EXE is the ExecutePackage method. Let's look at the
code (Listing 25.3), then I'll walk you through what it does and how it does it.
Listing 25.3
Dim oFs
Set oFs = CreateObject("Scripting.FileSystemObject")
If oFs.FileExists(TargName) Then
Kill TargName 'Delete in advance so the file won't grow
'ad infinitum
End If
oPkg.SaveToStorageFile TargName
oPkg.Execute
oPkg.UnInitialize
Set oPkg = Nothing
End Sub
The routine begins by instantiating a DTS Package object. Although Package2 is the
newer interface (introduced with SQL Server 2000), coding to the Package interface
allows us to run on SQL Server 7.0.
Once the Package object is created, we load the specified package from its
structured storage file. Each of the packages DTSDIAG uses is stored in COM's
Structured Storage File format.
We next iterate through the tasks defined in the package and locate each Execute
Process task by searching for CreateProcess in the task's name. We access each
Execute Process task by assigning the CustomTask property of the generic task
object in the Package.Tasks collection to the previously dimmed
DTS.CreateProcessTask variable.
In case you're wondering, we iterate through the Execute Process tasks in each
package in order to translate certain placeholders in the ProcessCommandLine
property before executing the package. Because we need to execute complex scripts
and retrieve their variable output in order to collect diagnostic data via DTSDIAG, we
can't use a typical Execute SQL task to run much of the T-SQL DTSDIAG runs.
Instead, we must shell to OSQL.EXE. Obviously, we want our calls to osql to be
configurable: for example, we want to be able to specify the server and instance to
connect to, the options for some of the diagnostic stored procedures we run, and so
on. We could have used one of the custom task objects we built earlier in the book to
make this a snap, but that would have required the custom task to be installed on
the target machine when packages that contained it were executed. Because I didn't
want to require COM objects to be registered before diagnostics could be collected,
DTSDIAG doesn't use any custom tasks. Instead, it uses regular Execute Process
tasks and placeholders in the ProcessCommandLine property in a manner similar to
the ExecuteSQLScript and ExecuteScript custom tasks we built earlier in the book.
Our VB code iterates through these tasks and replaces the placeholders with their
appropriate values prior to executing a package.
Note the call to the TranslateVars function. TranslateVars is responsible for
translating the variables in each ProcessCommandLine into their appropriate values.
It's actually more complex than ExecutePackage, and we'll tour it in just a moment.
Once the ProcessCommandLine property for each Execute Process task has been
properly translated, we write the translated package to the target package name
and execute it. When the package finishes executing, we clean up the package
object and return.
Some of the values we need to translate come from the DTSDIAG.INI configuration
file. Therefore, our code contains a Declare Function DLL import for the
GetPrivateProfileString API function, which is the Win32 function used to retrieve
values from an INI file. Listing 25.4 shows the source code for TranslateVars and the
GetPrivateProfileString import.
Listing 25.4
strWork = Space(BUFFSIZE)
' Events
Res = GetPrivateProfileString("DTSDIAG", "ProfilerEvents", "", _
strWork, BUFFSIZE, App.Path + "\dtsdiag.ini")
' MaxTraceFileSize
strWork = Space(BUFFSIZE)
Res = GetPrivateProfileString("DTSDIAG", "MaxTraceFileSize", "", _
strWork, BUFFSIZE, App.Path + "\dtsdiag.ini")
' BlockerLatch
strWork = Space(BUFFSIZE)
Res = GetPrivateProfileString("DTSDIAG", "BlockerLatch", "", _
strWork, BUFFSIZE, App.Path + "\dtsdiag.ini")
' BlockerFast
strWork = Space(BUFFSIZE)
Res = GetPrivateProfileString("DTSDIAG", "BlockerFast", "", _
strWork, BUFFSIZE, App.Path + "\dtsdiag.ini")
' BlockingPollingInterval
strWork = Space(BUFFSIZE)
Res = GetPrivateProfileString("DTSDIAG", _
"BlockingPollingInterval", "", _
strWork, BUFFSIZE, App.Path + "\dtsdiag.ini")
TranslateVars = CmdLine
End Function
Once all the required configuration values are retrieved from DTSDIAG.INI,
TranslateVars uses the VB Replace function to translate each token into its
appropriate value. It finishes by returning the translated process command line as its
function result.
You may be wondering why we don't just use a Dynamic Properties task inside the
relevant DTS packages since these INI values ultimately end up inside packages.
After all, a Dynamic Properties task can retrieve values directly from an INI file
without requiring any type of Automation code. Rather than coding an external app
that modifies packages on the fly using COM Automation, wouldn't it be simpler just
to use a Dynamic Properties task inside each package where we need to read INI
configuration values? The answer is that we do use one when possible. However,
many of the configuration values we need to supply must be inserted into the
middle of task property values, so they can't readily be supplied by a Dynamic
Properties task. Using a Dynamic Properties task to assign an INI configuration value
to a property is tenable only when you are assigning the entire property. Assigning
only a portion of the property requires a script or external Automation code.
So, now that we've toured the VB source code for DTSDIAG, let's talk about the DTS
packages it uses. Open dtsdiag_template.dts (in the CH25\dtsdiag subfolder on the
CD accompanying this book) in the DTS Designer so that we can discuss a few of its
high points.
The package begins by creating a folder under the startup folder named OUTPUT. If
the folder already exists, it is deleted and recreated. This folder will contain all the
files collected by DTSDIAG. Output files from tasks we execute to get set up for the
collection process (e.g., creating stored procedures) will have ## prefixed to their
names. This allows them to be easily distinguished from the actual diagnostic files
we're interested in. Normally you can delete these ## files after the collection
process is complete. You'll need them only if there is some problem with DTSDIAG.
Note the use of a Dynamic Properties task to load configuration values from
DTSDIAG.INI. As I mentioned earlier, we load as many configuration values as we
can using a Dynamic Properties task. Each type of diagnostic has a global variable
associated with it that controls whether it gets collected. For example, the global
variable sqldiag controls whether SQLDIAG.EXE is executed. The Dynamic Properties
task sets the sqldiag global variable by reading DTSDIAG.INI and retrieving the value
of the SQLDiag key.
The Blocker, Profiler, and SQLDIAG processes within the package begin by calling
osql to create the stored procedures they will call to collect the required data. The
blocker process creates two stored procedures: one named sp_code_runner, a stored
procedure capable of running other procedures or T-SQL code on a schedule or until
a logical condition becomes true, and one named either sp_blocker_pss70 or
sp_blocker_pss80 (depending on the version of SQL Server you're connecting to), the
blocking detection stored procedures provided in the Knowledge Base articles I
mentioned earlier. Because the sp_blockerXXXX procedures belong to Microsoft, I
have not included them on the CD accompanying this book. You will have to access
the aforementioned Knowledge Base articles at http://www.microsoft.com and
download them yourself if you want to use DTSDIAG to run them. Alternatively, you
can supply your own blocking detection procedure(s); there's nothing requiring the
use of the Microsoft stored procedures in DTSDIAG.
Note that we don't execute SQLDIAG.EXE directly from our DTS package.
SQLDIAG.EXE must be run on its host SQL Server; running it directly from the
package would require that the package be run on the server, something you might
not want to do. Instead, we call a stored procedure that shells to SQLDIAG.EXE on
the server via xp_cmdshell. This allows you to collect a SQLDIAG report without
physically being on the SQL Server machine. Note that this technique requires
additional steps on a SQL Server 2000 cluster (see Knowledge Base article 233332).
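A stripped-down sketch of that idea (the procedure name is hypothetical, and DTSDIAG's real procedure does considerably more) would look something like this:
-- hypothetical wrapper: shell to SQLDIAG.EXE on the server so the report is
-- produced server-side rather than on the machine running the package
CREATE PROC usp_run_sqldiag
AS
    -- sqldiag.exe lives in the instance's Binn folder; qualify the path and add
    -- authentication/output switches as appropriate for your installation
    EXEC master.dbo.xp_cmdshell 'sqldiag'
GO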
The Perfmon task executes PMC, a custom utility I've written in C++ (also included
on the CD) that collects a specified set of Perfmon counters and writes them to a
Perfmon BLG-format log. PMC is similar to the LogMan utility included with Windows XP and
later (see Knowledge Base article 303133) but works on Windows 9x and later as
well. Note that, because of a header file change Microsoft made with the
introduction of Windows XP, you will need the version of PDH.DLL (the Performance
Data Helper library, the engine behind Perfmon/Sysmon) that ships with Windows
2000 in order to use PMC on Windows XP or later. For your convenience, I've
included this file in the dtsdiag folder on the CD accompanying this book. If you
decide to run PMC on Windows XP or later (as opposed to running LogMan), I
recommend that you use the version of the PDH.DLL I've included with DTSDIAG. You
shouldn't replace the version of PDH.DLL that comes with the operating system with
the one I've included. Just leave it in the DTSDIAG startup folder, and PMC will find it
when it starts.
PMC reads the INI file name passed into it as a parameter (DTSDIAG.INI, in this
case), locates INI values named CounterN, and adds each one to a Perfmon BLG log.
If it finds the string "%s" in a counter name, it translates this to the name of the
specified SQL Server instance (optionally passed on its command line) before adding
it to the Perfmon log. If no instance name is specified, but PMC encounters "%s" in a
counter name, it assumes the default SQL Server instance is being specified and
replaces the entire "MSSQL$%s" string with "SQLServer" in order to add the counter
for the default instance.
The event logs are collected using the elogdmp.exe utility included with the
Windows 2000 Resource Kit. Again, since this utility belongs to Microsoft, I haven't
included it on the CD accompanying this book. You can actually use any event log
dumper utility you want (e.g., dumpel.exe from the Windows NT 4 Resource Kit will
also work); you just need to configure the event log Execute Process tasks
accordingly.
Note that the event logs are collected via an Execute Package task, which starts a
separate package that collects all three of them in parallel. This is done because
event logs are one of those diagnostics that can be collected at startup, at shutdown, or
both. So, in order to allow for event log collection from dtsdiag_template.dts as well
as dtsdiag_shutdown_template.dts, we've put the event log collection tasks off in
their own package, which we execute as appropriate during startup or shutdown.
A final aspect of DTSDIAG that's worth exploring is the way we use ActiveX script
workflow associations to enable/disable certain execution paths within packages.
You'll recall that we discussed this technique earlier in the book. In DTSDIAG we use
it, for example, to disable the Profiler task path when the DTSDIAG.INI Profiler value
is set to 0 (false). Listing 25.5 presents the ActiveX script that's associated with the
Create Profiler Proc Execute Process task.
Listing 25.5
Function Main()
If DTSGlobalVariables("profiler") Then
Main = DTSStepScriptResult_ExecuteTask
Else
Main = DTSStepScriptResult_DontExecuteTask
End If
End Function
The global variable profiler is assigned by the Dynamic Properties task at the start of
package processing. If this variable is nonzero, we execute the Profiler task path; otherwise, we skip it.
For tasks that can be executed at startup, shutdown, or both, we have to check a
second global variable to determine whether to execute them. Listing 25.6 shows
the ActiveX script associated with the SQLDIAG task line.
Listing 25.6
Function Main()
If (DTSGlobalVariables("sqldiag")) And _
(DTSGlobalVariables("sqldiagstartup")) Then
Main = DTSStepScriptResult_ExecuteTask
Else
Main = DTSStepScriptResult_DontExecuteTask
End If
End Function
So, we check not only the global variable sqldiag but also sqldiagstartup (or
sqldiagshutdown) to be sure that we're supposed to collect the SQLDIAG report when
this particular step is executed. In the dtsdiag_template.dts, we check
sqldiagstartup; in dtsdiag_shutdown_template.dts, we check sqldiagshutdown.
That's DTSDIAG in a nutshell. You can run the utility to experiment with it further and
load its various packages into the DTS Designer to see how they're constructed. You
can play with the VB code to explore controlling DTS packages via Automation. The
source code and support files for DTSDIAG are located in the CH25\dtsdiag subfolder
on the CD accompanying this book.
A natural evolution to the DTSDIAG concept is the notion of loading the collected
data into SQL Server for analysis. I will leave that as a reader exercise but will
provide a few hints for the adventurous. The event log and SQLDIAG reports are
plain text files and, with some massaging, can be easily imported into SQL Server
tables. The blocking script output can also be processed as text and imported into a
set of SQL Server tables, although it's a little more challenging because of the
variability in the output format. A Profiler trace can be read as a rowset using the
fn_trace_gettable T-SQL function, so importing it into a table is a snap. A Perfmon
BLG log can be converted to a CSV format using the Relog tool included with
Windows XP and later (see Knowledge Base article 303133), which can then be
imported into SQL Server using DTS. Once you have all the data in a SQL Server
database, you can dream up all sorts of sophisticated analysis for it. The trick is in
coalescing the data in the first place.
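For example, something like this (the file name is whatever your collection produced) pulls the entire trace into a table you can then index and query at will:
SELECT * INTO dbo.dtsdiag_trace
FROM ::fn_trace_gettable('C:\dtsdiag\OUTPUT\profiler.trc', default)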
Part V: Essays
Why I Really, Really Don't Like Fish!
Get people to have a positive attitude and like those they work with so that they'll
be better workers? Gee, wish I'd thought of that. What a novel concept! You mean I
can boost my effectiveness as a manager just by pumping people up with hokum?
And to think all this time I've labored under the delusion that my people actually
wanted substance from me. Now, after all these years, I learn that they just want me
to stroke their egos, to blow smoke up the proverbial derrière.
Recently, I had the misfortune of being subjected to the latest surfacing of what has
become a cyclic management fad: the positive-attitude triteness known as the Pike
Place Market Fish! story. It's a fish story, alright.
I originally stumbled across this frivolity a few years ago in video form. While
designing a new human resources system for the CIA, I arrived early for a meeting
one day to discover the entire team I worked with crowded around a television set
engrossed in the story of the Seattle fish market and how wonderful everything was
simply because the employees had decided that it would be.
I had a bit of a revelation as I watched the poor saps eagerly eating up the stale
Fish! claptrap: I'm in the wrong business. To paraphrase one of the best conmen of
them all, if a man really wants to make a million dollars, all he needs to do is come
up with a clichéd, banal self-help philosophy that has no substance or quantifiable
means of measuring its effectiveness. Drum up a few anecdotal success stories, put
the whole thing on video or turn it into a book, and you've got everything you need
to make your company a happy place, regardless of what it does for anyone else.
Making a million dollars would make almost anyone happy, even a cynic like me.
The reason that Fish! and similar fads are so vacuous is that many of the real
problems of management and employee morale cannot be solved by simply
changing one's attitude. If someone is grossly underpaid, all the positive thinking in
the world won't put more money in her bank account each month. If an employee is
working the graveyard shift, all the toy fish in the sea won't make him feel any
better about being away from his family every night. And if management is truly
inept, eagerly jumping from one fad to another on what seems like a monthly basis,
no self-help video is going to fix that for the long term. In that situation, the
employees' frame of mind isn't the problem; management is. Given that it is the root
cause of the problem, only management can effect a long-term solution.
You might view Fish! and its ilk as harmless drivel. After all, what's wrong with
getting someone to work at being happy, to try to keep a positive attitude regardless
of the situation? There's no harm in that, if only it ended there. The problem with
Fish! and similar nonsense is that it gives management an out, a way to shirk its
responsibility to keep employee morale at a respectable level. It puts the onus of
keeping the employees happy on the employees themselves, which works out nicely
for management (I'm sure it improves their morale), but not so well for the
employees, who are often powerless to address the real issues affecting their
happiness: compensation, opportunities, job fulfillment, job security, and so on.
Subscribing to the Fish! mentality allows management to say, "So, you find stapling
reports all day, every day boring? Well, just 'choose your attitude'! 'Learn to play'!
Throw that stapler across the office every now and then!" When an employee says,
"I haven't had a raise in five years, and my daughter starts college next month, and I
can't help her financially," management can respond, "Learn to 'be present' for her.
Learn to 'make her day.' She'll appreciate that more than any financial help you
could give her." Fish! provides a convenient mechanism for substituting clichés and
platitudes for substance. It allows management to address real-life problems with
glib nonsolutions in a Dilbert-esque fashion. And it's a great cost-cutting move for
the business; after all, talk is cheap.
Beyond giving management a way out of doing what it should instead want to
do (namely, keeping its most valuable resource, its employees, happy), the thing
that really peeves me about Fish! is that it encourages management to treat
employees like children. In place of having a substantive discussion about real
issues, Fish! makes it okay to hand people a child's toy and brand them as
malcontents if they don't find that satisfactory. It makes it okay to twist someone's
arm into going home at the end of a twelve-hour day and making cookies to bring in
the next day so that he or she can remain a "team player." Beyond the inanity and
emptiness of the hollow promises and nonsolutions Fish! encourages management
to proffer, it actually adds insult to injury by making it acceptable to humiliate adults
by treating them as though they had the emotional awareness of a two-year-old and
expecting them to be pacified by trinkets, manager-speak, and token gestures.
Amalgams of self-help management gobbledygook such as Fish! are demeaning by
their very existence because they presume that those who hear their pitch are naive
enough and emotionally shallow enough to fall for it. Unfortunately, those subjected
to their ploys often have little choice in the matter.
So, I'll offer you the same words of wisdom Ernest Hemingway gave Stanley Karnow
in Friendly Advice: Develop a built-in baloney detector (only Hemingway didn't say
"baloney"). Don't fall for such fluff. Insist on substantive discussions with
management. Don't be a malcontent, and do work at being happy, but, to
paraphrase Billy Crystal in the movie Mr. Saturday Night, don't let anyone defecate
in your hat and then say, "Thank you, it fits much better now." When we acquiesce
to lame management overtures, we make things worse for everyone. We give those
in management who don't realize the value of their employees no reason to change
their ways in the future. Stand up for your rights, your fellow employees, and your
dignity. No job is worth losing your self-respect.
Pseudo-Techie Tactics 101
How to Make Yourself Appear to Be an
Expert via Newsgroup Postings
1. Answer a different question than what was asked, with lots of superfluous
detail, particularly if the detail is obscure or hard to verify.
2. Avoid answering questions directly. If a poster asks a direct question such as,
"Will this work?" or "Will technique x perform better than y?" avoid actually
answering the question at all costs. Never give a definitive answer to a
question and never make definitive statements if you can avoid it. If you are
forced to make a definitive statement, keep it safe: something along the lines
of, "SQL Server is a Microsoft product" or "XML stands for eXtensible Markup
Language" is a good place to start.
6. Whenever possible, work into a response extracurricular work you have done
or plan to do that may or may not be related to the question at hand. For
example, if the question is about how one might optimize a stored procedure
they've written, refer them to the Web cast you're working on regarding .NET
technologies.
8. Troll FAQs and newsgroup archives for canned responses, then, when a newbie
or other uninformed person asks a question, regurgitate what you found, even
if it's not entirely on point. You get special bonus points for using responses
from technical experts you want to impress. So, for example, if you want to
impress Joe Celko, when you come across a response in an archive that he
posted, use the response almost verbatim when posting a response to a similar
question. Joe may have forgotten his original post and may infer that you think
like he does, and the recipient will infer that you're cut from the same cloth as
people like Joe, that you have similar technical depth. You get to appear to be
an expert, while someone else did all the work; what a deal! Give credit for
such nuggets only on a rare occasion, usually when you have some reason to
throw the original poster a bone. If the original poster is someone you don't
care about impressing or schmoozing with, never, ever give credit for the
original post. If someone notices the similarity between your post and the
original, feign ignorance, an area in which you are actually likely to have
expert skills.
10. Embed as much code as possible in your postings, even if it's unnecessary or
doesn't add materially to the point you're making, especially on newsgroups
that have high vendor visibility. You get special bonus points for dropping into
C++ or assembly language, even if you excerpted the code straight from
source code owned by other people or a debugger and don't have a clue as to
what it actually does. You get an additional bonus point if you drop into a
lengthy code discussion and use the explanations by the actual experts on it as
though you came up with them yourself. Good sources of information here are
bug database entries, MVP mailing lists, books, vendor whitepapers, and
private exchanges with developers or product support experts. Be careful to
never, ever give credit to the original source of your information. People need
to believe that you are an expert coder and that you did all the analysis
yourself.
11. Along these lines, do as much as you can to portray yourself as a "bit head"
even though you may not actually know what a bit is. One of my personal
favorites is using Courier fonts for all your postings (the inference being that
your postings typically consist of so much code and alignment-sensitive data
such as stack dumps and the like that you're better off just using a fixed-pitch
font in every post). Low-level details that are beyond the technical depth of the
average manager are especially impressive; divulge these, even if they're not
relevant to the discussion at hand and even if you don't understand them
yourself, whenever possible.
13. Along these lines, never, ever be the first respondent when a new issue in a
technology in which you want to appear to have expertise arises. Always let
someone else make the first post. It's far better for another poster to stick his
or her neck out while you hang back in the shadows and wait for someone else
to come up with the right answer.
14. Avoid doing research to answer a posted question. If the question requires you
to think, run a repro scenario, or otherwise do much of anything except get
credit for other people's work, avoid it. Let someone else do the dirty work.
15. Attempt to answer as many questions as possible with other questions. Take
the conversation off into irrelevant areas using such questions whenever
possible.
16. Post responses to the entire newsgroup whenever it benefits you to do so,
regardless of whether the ongoing conversation would be beneficial to the
other members of the newsgroup and especially if it embarrasses the poster.
This is particularly important if the newsgroup has high vendor visibility or if it
has a large number of members. Never offer to take a conversation offline that
would benefit you personally to discuss in a public forum (e.g., when
discussing with a newbie the fine points of how to read a stack dump or how
SQLXML is put together from an architectural standpoint).
17. Regularly forward information you receive on one newsgroup to another, even
though you know most of the people on the second newsgroup are also on the
first and have already received the information. So, for example, if an MVP
posts something of interest to the SQL Server programming newsgroup,
forward it to the SQL Server administration newsgroup, even though everyone
on the administration group is very likely to already be on the programming
newsgroup. This makes you appear to be the custodian of such information
and links you by association with it, even though you had nothing to do with its
creation.
18. Make a point of asking esoteric questions involving fringe technology elements
on newsgroups with high vendor visibility and/or PYNTI (People You Need To
Impress) membership. Attempt to work in enough techno-babble that people
will infer that you're knee-deep in something of historical complexity. Be sure
you ask these questions on newsgroups where they are least likely to be
actually answered and most likely to impress the people you want to impress.
For example, you might ask questions having to do with reading symbol files
directly on a SQL Server newsgroup. A question like, "Has anyone implemented
a fast Fourier transform algorithm in assembly language?" would be a big hit
on a Visual Basic newsgroup. Similarly, "Has anyone successfully replaced the
.NET Framework XML parser with MSXML using Pinvoke?" would be a great one
for one of the Office newsgroups. Never actually say what you're doing or
why; after all, you're just trying to attract attention to yourself, not actually
get an answer. Never let anyone know that you barely understand enough
about the fringe technology you're referencing to ask a question about it, let
alone build anything with it.
19. Along these lines, attempt to work into your esoteric newsgroup questions
superfluous detail that makes you look good. For example, if your question
asks whether the order of the page frame array returned by
AllocateUserPhysicalPages has changed in a particular release of Windows and
possibly caused some code you're reviewing that sorts the array to take much
longer to run, include detail on sorting algorithms in general and a long
discourse about how the code you're reviewing uses the array. Provide a
tutorial for the gullible on shell sorts, bubble sorts, memory access algorithms,
and so on. Completely ignore more obvious potential causes of your issue,
such as that you don't know what you're talking about and the length of the
sort hasn't increased at all. Also, gloss right over the fact that a simple "Does
anyone know whether the page frame array order for
AllocateUserPhysicalPages changed for Windows nnn?" would have sufficed.
Remember: You're not actually after a resolution to your issue; you are
attempting to impress people.
20. Work as many buzzwords into your newsgroup posts as possible. Especially if
these are esoteric or fringe terms, use them as though everyone already
knows what you mean. Terms like "undetected distributed deadlock," "thread
priority inversion," "scribbler," and "merge nonconvergence" are good
candidates, especially on general-purpose newsgroups where lots of people are
likely not to know what the terms mean. Throw these terms around like you
were born knowing what they mean (heck, convince people that you actually
coined the terms, if you can) and that people who don't know what you're
talking about must be profoundly ignorant.
21. Regularly criticize technologies you don't understand, especially those you fear
or those that people you consider rivals hold in esteem. Be sure to keep your
criticisms general and above any sort of reason-based debate. Remember: Your
job is to spread FUD: Fear, Uncertainty, and Doubt. Don't let anyone know that
you've not actually used a technology you're criticizing to build anything of any
significance and that you only "know" what you read. If dissing a language, be
sure to keep secret the fact that you haven't written a line of production code
with it. Those are all details that people who listen to you don't need to know.
This type of criticism will establish you as having skills and knowledge that you
don't in the eyes of some of the folks you want to impress. For example, if you
say, "C# isn't a real language," some gullible people might actually believe
you've written production code in C# and that you have enough experience
with languages in general to say which ones are and are not "real." This allows
you to demean your rivals for espousing such poor technology and allows you
to feign knowledge you don't have: two for the price of one!
22. Let as many people as possible know about anything of any significance that
you do. For example, if you are forced to conduct what seems to you to be
significant research into a particular issue, make sure lots of people know
about your in-depth investigation. You get bonus points if you come to
conclusions that actually required several hours to reach but you understate
them and pretend to have arrived at an answer in a matter of minutes. This
can make you look like a wizard to the gullible and can conceal how long it
actually takes you to figure out even the simplest of problems. An effective
means of doing this is the ubiquitous newsgroup "sanity check" post. It works
like this: You identify an area in which you'd like to be perceived as having
expertise (e.g., SQL Server LPE nuances). You then conduct some research into
a particular issue within it (e.g., how determinism affects UDFs). Let's say the
research takes, oh, twelve hours. Post a message to a newsgroup with PYNTI
members outlining your research and asking for a "sanity check" of it. Don't
actually ask any specific questions; that might betray that you actually don't
know enough about the subject even to ask an intelligent question. Remember:
You aren't actually wanting any sort of "check"; the purpose of your post is to
inform the poor saps on the newsgroup of your latest grand achievement.
Casually mention that you spent only about 15 minutes looking into the issue.
This covers you in case your research is off in left field and also makes you look
like a genius to the dimwitted on the newsgroup; again, another two-for-one
bargain.
Editorial Department
Addison-Wesley Professional
Pearson Technology Group
75 Arlington Street, Suite 300
Boston, MA 02116
Email: [email protected]
http://www.awprofessional.com/