Jim Gray
Microsoft Research
March 2004
Technical Report
MSR-TR-2004-31
Microsoft Research
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
The Revolution in Database Architecture
Jim Gray
Microsoft
455 Market St. #1650
San Francisco, CA, 94105 USA
http://research.microsoft.com/~Gray
[email protected]
ABSTRACT
Database system architectures are undergoing revolutionary changes. Most importantly, algorithms and data are being unified by integrating programming languages with the database system. This gives an extensible object-relational system in which non-procedural relational operators manipulate object sets. Coupled with this, each DBMS is now a web service, which has huge implications for how we structure applications. DBMSs are now object containers. Queues are the first objects to be added; these queues are the basis for transaction processing and workflow applications, and future workflow systems are likely to be built on this core. Data cubes and online analytic processing are now baked into most DBMSs. Beyond that, DBMSs have a framework for data mining and machine learning algorithms: decision trees, Bayes nets, clustering, and time-series analysis are built in, and new algorithms can be added. There is a rebirth of column stores for sparse tables and to optimize bandwidth. Text, temporal, and spatial data access methods, along with their probabilistic reasoning, have been added to database systems; allowing approximate and probabilistic answers is essential for many applications. Many believe that XML and XQuery will be the main data structure and access pattern, and database systems must accommodate that perspective. External data increasingly arrives as streams to be compared to historical data, so stream-processing operators are being added to the DBMS. Publish-subscribe systems invert the data-query ratio: incoming data is compared against millions of queries rather than queries searching millions of records. Meanwhile, disk and memory capacities are growing much faster than their bandwidth and latency, so database systems increasingly use huge main memories and sequential disk access. These changes mandate a much more dynamic query optimization strategy – one that adapts to current conditions and selectivities rather than following a static plan. Intelligence is moving to the periphery of the network: each disk and each sensor will be a competent database machine, and relational algebra is a convenient way to program these systems. Database systems are now expected to be self-managing, self-healing, and always up. We researchers and developers have our work cut out for us in delivering all these features.

1. INTRODUCTION
This is an extended abstract for a SIGMOD 2004 keynote address. It argues that databases are emerging from a period of relative stasis in which the agenda was "implement SQL better." Database architectures are now in the punctuated stage of punctuated equilibrium. They have become the vehicles to deliver an integrated application development environment, to be data-rich nodes of the Internet, to do data discovery, and to be self-managing. They are also our main hope for dealing with the information avalanche hitting individuals, organizations, and all aspects of human organization. It is an exciting time! There are many exciting new research problems and many challenging implementation problems. This talk highlights some of them.

2. THE REVOLUTIONS
2.1 Object Relational Arrives
We be data. But you cannot separate data and algorithms. Unfortunately, Cobol had a data division and a procedure division, and so it had separate committees to define each one. The database community inherited that artificial division from the Cobol Data Base Task Group (DBTG). We were separated from our procedural twin at birth, and we have been trying to reunite with it for 40 years now. In the mid-eighties stored procedures were added to SQL (thank you, Sybase), and there was a proliferation of object-relational database systems. In the mid-nineties many SQL vendors added objects to their own systems. Although these were each good efforts, they were fundamentally flawed, because de novo language designs are very high risk.

The object-oriented language community has been refining its ideas since Simula67. There are now several good OO languages with excellent implementations and development environments (Java and C#, for example), and there is a common language runtime that supports nearly all languages with good performance.

The big news now is the marriage of databases and these languages. The runtimes are being added to the database engine, so one can now write database stored procedures (modules) in these languages and can define database objects as classes in these languages. Database data can be encapsulated in classes, and the language development environment allows you to program and debug seamlessly, mixing Java or C# with SQL, doing version control on the programs, and generally providing a very productive programming environment. SQLJ is a very nice integration of SQL and Java, but there are even better ideas in the pipeline.

This integration of languages with databases eliminates the inside-the-database versus outside-the-database dichotomy that we have lived with for the last 40 years. Now fields are objects (values or references); records are vectors of objects (fields); tables are sequences of record objects; and databases are collections of tables. This objectified view of database systems has huge leverage – it enables most of the other revolutions. It is a way for us to structure and modularize our systems.

A clean object-oriented programming model also makes database triggers much more powerful and much easier to construct and debug. Triggers are the database equivalent of rule-based programming; as such, they have proponents and opponents. Having a good language foundation will probably not sway the active-database opponents, but it will certainly make it easier to build systems.

The database integration with language runtimes is only possible because database system architecture has been modularized and rationalized. This modularity enables the other architectural revolutions, which are done as extensions to the core data manager.
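To make this concrete, here is a minimal sketch of what the language integration looks like at the SQL level, in the style of a CLR-enabled engine. The assembly, class, and method names are hypothetical.

    -- Register a compiled C# assembly with the database engine
    -- (hypothetical names; DDL in the style of CLR integration).
    CREATE ASSEMBLY TaxLib FROM 'c:\libs\TaxLib.dll';

    -- Expose a static method of a class in that assembly as a SQL function.
    CREATE FUNCTION dbo.SalesTax(@amount MONEY, @state CHAR(2))
    RETURNS MONEY
    EXTERNAL NAME TaxLib.[Tax.Calculator].SalesTax;

    -- The C# logic now runs inside ordinary declarative SQL.
    SELECT order_id, dbo.SalesTax(total, ship_state) AS tax
    FROM   orders;

The point is that the procedural logic lives inside the engine, next to the data, yet is written, versioned, and debugged in a mainstream language.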
2.2 Databases are Web Services – TPlite
Databases are encapsulated by business logic. Before the advent of stored procedures, all the business logic ran in the transaction processing monitor, which was the middle tier of the classic three-tier presentation-application-data architecture. With stored procedures, the TP monitors were disintermediated by two-tiered client/server architectures. The emergence of web servers and HTTP brought three-tier architectures back to center stage, in part as protocol converters between HTTP and the database client/server protocol, and in part by moving the presentation services (HTML) back to the web server.

As eCommerce evolves, most web clients are application programs rather than browsers blindly displaying whatever the server delivers. Today, most eCommerce clients screen-scrape to get data from web pages, but there is increasing use of XML web services as a way of delivering data to fat-client applications. Most web services are delivered by classic web servers today (Apache, Microsoft IIS), but database systems are starting to listen to port 80 and to publish web services. In this new world, one can take a class or a stored procedure implemented inside the database system and publish it on the Internet as a web service (the WSDL interface definition, DISCO discovery, UDDI registration, and SOAP call stubs are all generated automatically). So the TPlite client-server model is back, if you want it.

Designers still have the option of three-tier or n-tier application designs, but they now have the two-tier option again. The simplicity of two-tier client/server is attractive, but security issues (databases have huge attack surfaces) may cause many designers to want three-tier architectures with the web server in the demilitarized zone (DMZ).

It is likely that web services will be the way we federate heterogeneous database systems. This is an active research area. What is the right object model for a database? What is the right way to represent information on the wire? How do schemas work in the Internet? How does schema evolution work? How do you find data and databases? We do not have good answers to any of these questions. Much of my time is devoted to trying to answer these questions for the federation of astronomy databases we call the World-Wide Telescope.
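As an illustration, the sketch below publishes a stored procedure as a SOAP web service directly from the database engine. The endpoint, procedure, and database names are invented, and the DDL follows the general style of the HTTP-endpoint syntax that engines of this era were introducing; the WSDL and SOAP stubs are generated automatically.

    -- Hypothetical sketch: expose stored procedure GetOrderStatus
    -- as a SOAP web method served by the database engine itself.
    CREATE ENDPOINT OrderService
        STATE = STARTED
        AS HTTP (PATH = '/orders',
                 PORTS = (CLEAR),
                 AUTHENTICATION = (INTEGRATED))
        FOR SOAP (WEBMETHOD 'GetOrderStatus'
                      (NAME = 'Sales.dbo.GetOrderStatus'),
                  WSDL = DEFAULT,
                  DATABASE = 'Sales');

With this, any SOAP client can call the procedure with no middle tier at all – the TPlite model described above.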
2.3 Queues, Transactions, Workflows
The Internet is a loosely coupled federation of computer servers and clients. Clients are sometimes disconnected, yet they need to be able to continue functioning. Rather than being built as tightly-coupled RPC-based applications, Internet-scale applications must be constructed as asynchronous tasks structured as workflows involving multiple autonomous agents. eMail gives an intuitive understanding of these design issues: you want to be able to read and send mail even though you are not connected to the network.

All the major database systems now include a queuing system that makes it easy to define queues, queue and dequeue messages, attach triggers to queues, and dispatch tasks driven by the queues. A good programming environment within the database system and the simplicity of the transaction model make it easy and natural to use queues. Being able to publish queues as web services is also a big advantage. But queues are almost immediately used to go beyond simple ACID transactions and implement publish-subscribe and workflow systems, which are built as applications atop the basic queuing system. There is a lot of innovation and controversy over exactly how workflows and notifications should work – it is an area of ferment and fruitful experimentation.

The research question here is how to structure workflows. Frankly, solutions to this problem have eluded us for several decades. But the immediacy of the problem is likely to create enough systems that some design patterns will emerge. The research challenge is to characterize these design patterns.
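A minimal sketch of the queuing pattern, using plain SQL over an ordinary table as the queue (the schema is an assumption; real engines provide dedicated queue objects with richer semantics):

    -- A queue is, at heart, a table of pending messages.
    CREATE TABLE work_queue (
        msg_id    INTEGER PRIMARY KEY,
        enqueued  TIMESTAMP,
        payload   VARCHAR(4000)
    );

    -- Enqueue inside the producer's transaction: the message
    -- becomes visible only if the transaction commits.
    INSERT INTO work_queue (msg_id, enqueued, payload)
    VALUES (1, CURRENT_TIMESTAMP, 'ship order 42');

    -- Dequeue: a worker claims the oldest message; if the worker's
    -- transaction aborts, the message reappears on the queue.
    DELETE FROM work_queue
    WHERE  msg_id = (SELECT MIN(msg_id) FROM work_queue);

Because enqueue and dequeue are transactional, a workflow step either completes and consumes its message, or fails and leaves the message for a retry – exactly the behavior disconnected applications need.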
2.4 Cubes and Online Analytic Processing
Early relational systems used indices as table replicas that allowed vertical partitioning, associative search, and convenient data ordering. Database optimizers and executors use semi-joins on these structures to run common queries on covering indices; these query strategies give huge speedups.

These early ideas evolved into materialized views (often maintained by triggers) that went far beyond simple covering indices and provided fast access to star and snowflake schemas. In the 1990s we discovered the fairly common OLAP pattern of data cubes that aggregate data along many dimensions. The research community extended the cube-dimension concepts and developed algorithms to automate cube design and implementation. There are very elegant and efficient ways to maintain cubes; usable cubes that aggregate multi-terabyte fact tables can be represented in a few gigabytes. These algorithms are now key parts of the major database engines. This is an area of intense research and rapid innovation – much of the work now focuses on better ways to query and visualize cubes.
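The cube pattern is now visible in the SQL standard itself: the CUBE grouping operator asks for aggregates over every combination of the listed dimensions. A small example over a hypothetical sales fact table:

    -- One query produces totals by (region, product), by region
    -- alone, by product alone, and the grand total: 2^2 groupings.
    SELECT   region, product, SUM(amount) AS total_sales
    FROM     sales
    GROUP BY CUBE (region, product);

A materialized cube is essentially this result, precomputed and incrementally maintained, so interactive tools can roll up and drill down without rescanning the fact table.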
2.5 Data Mining
We are slowly climbing the value chain from data to information to knowledge to wisdom. Data mining is our first step into the knowledge domain. The database community has found a very elegant way to embrace and extend machine learning technologies like clustering, decision trees, Bayes nets, neural nets, and time-series analysis. The key idea is to create a learning table T, telling the system to learn columns x, y, z from attributes a, b, c (or to cluster attributes a, b, c, or to treat a as the time stamp for b). Then one inserts training data into the table T, and the data mining algorithm builds a decision tree, Bayes net, or time-series model for the data. The training phase uses SQL's well-understood Create/Insert metaphor. At any point, one can ask the system to display the model as an XML document that, in turn, can be rendered in intuitive graphical formats.

After the training phase, the table T can be used to generate synthetic data: given a key a, b, c, it can return the likely x, y, z values of that key along with their probabilities. Equivalently, T can evaluate the probability that some value is correct. The neat thing is that the framework lets you add your own machine-learning algorithms, which gives the machine-learning community a vehicle to make its technology accessible to a broad user base.

Given this framework, the research challenge now is to develop better mining algorithms. There is also the related problem of probabilistic and approximate answers, which is elaborated later.
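A sketch of the Create/Insert metaphor for mining. The syntax here is invented for illustration (it loosely follows the OLE DB for Data Mining dialect), and the model and column names are hypothetical.

    -- Create a learning table: learn 'risk' from age, income, zip.
    CREATE MINING MODEL credit_risk (
        cust_id  INTEGER KEY,
        age      INTEGER,
        income   MONEY,
        zip      CHAR(5),
        risk     CHAR(8) PREDICT     -- the column to be learned
    ) USING Decision_Trees;

    -- Train by inserting rows, exactly like an ordinary table.
    INSERT INTO credit_risk (cust_id, age, income, zip, risk)
    SELECT cust_id, age, income, zip, risk FROM training_data;

    -- Predict: probe the trained model with new cases.
    SELECT t.cust_id, Predict(m.risk)
    FROM   credit_risk m PREDICTION JOIN new_customers t
           ON m.age = t.age AND m.income = t.income AND m.zip = t.zip;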
2.6 Column Stores
It is increasingly common to find tables with thousands of columns; they arise when a particular object has thousands of measured attributes. Not infrequently, many of the values are null. For example, an LDAP object has 7 required and a thousand optional attributes. It is convenient to think of each object as a row of a table, but representing it that way is very inefficient, both in space and in bandwidth. Classical relational systems represent each row as a vector of values and often materialize rows even if the values are null (not all systems do that, but most do). This row-store representation makes for very large tables and very sparse information.

Storing sparse data column-wise as ternary relations (key, attribute, value) allows extraordinary compression, often as a bitmap. Querying such bitmaps can reduce query times by orders of magnitude and enable whole new optimization strategies. Adabas and Model204 pioneered these ideas, but they are now having a rebirth. The research challenge is to develop automatic algorithms that do column-store physical design and to develop efficient algorithms for updating and searching column stores.
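A sketch of the ternary representation (hypothetical schema). Instead of one row with a thousand mostly-null columns, each non-null attribute becomes a (key, attribute, value) triple:

    -- Row store: one very wide, very sparse row per object, e.g.
    --   CREATE TABLE person (id INT, attr1 ..., ..., attr1000 ...);

    -- Triple store: only the non-null values are materialized.
    CREATE TABLE person_attrs (
        id    INTEGER,         -- object key
        attr  VARCHAR(64),     -- attribute name
        val   VARCHAR(256),    -- attribute value
        PRIMARY KEY (id, attr)
    );

    -- "Who has a pager?" touches only that one attribute.
    SELECT id, val
    FROM   person_attrs
    WHERE  attr = 'pager';

Behind such a schema, the engine can encode each attribute as a compressed bitmap over the object keys, which is where the order-of-magnitude query speedups come from.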
2.7 Text, Temporal, and Spatial Data Access
The database community has insulated itself from the information retrieval community and has largely eschewed dealing with messy data types like time and space (not everyone has, just most of us). We had our hands full dealing with the "simple stuff" of numbers, strings, and relational operators on them. But real applications have massive amounts of text data, have temporal properties, and have spatial properties.

The DBMS extensibility offered by integrating languages with the DBMS makes it relatively easy to add data types and libraries for text, spatial, and temporal indexing and access. Indeed, the SQL standard has been extended in all these areas. But all three of these data types, and especially text retrieval, require that the database deal with approximate answers and with probabilistic reasoning. This has been a stretch for traditional database systems. It is fair to say that much more research is needed to seamlessly integrate these important data types with our current frameworks. Both data mining and these complex data types depend on approximate reasoning – but we do not have a clear algebra for it.
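The flavor of the problem shows up in even a simple full-text query. The syntax below follows the CONTAINS/score style used by several text extensions; the schema is hypothetical. Unlike a relational predicate, each match carries a relevance score, so the answer is inherently ranked and approximate:

    -- Find documents about comet discoveries, best matches first.
    SELECT doc_id, title, score(1) AS relevance
    FROM   documents
    WHERE  CONTAINS(body, 'comet AND discovery', 1) > 0
    ORDER  BY relevance DESC;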
2.8 Semi-Structured Data
Not all data fits into the relational model. Jennifer Widom observes that we all start with the schema <stuff/> and then add structure and constraints. Even the best-designed database leaves out some constraints and leaves some relationships unspecified.

A huge battle is raging in the database community. The radicals believe cyberspace is just one big XML document that should be manipulated with xQuery++. The reactionaries believe that structure is your friend and that semi-structured data is a mess to be avoided. Both camps are well represented within the database community, often stratified by age. It is easy to say that the truth lies somewhere in between, but it is hard at this point to say how this movie will end.

One especially interesting development is the integration of database systems with file systems. Individuals have hundreds of thousands of files (mails, documents, photos, ...). Corporations have billions of files. Folder hierarchies and traditional filing systems are inadequate – you just can't find things by location (folder) or grep (string search). A fully indexed semi-structured database of the objects is needed for decent precision and recall on search. It is paradoxical, but file systems are evolving into database systems. These modern file systems are a good example of the semi-structured data challenge, and indeed are challenging some of the best data management architects.
2.9 Stream Processing
Data is increasingly generated by instruments that monitor the environment: telescopes looking at the heavens, DNA sequencers decoding molecules, bar-code readers watching passing freight cars, patient monitors watching the life signs of a person in the emergency room, cell-phone and credit-card systems looking for fraud, RFID scanners watching products flow through the supply chain, and smart dust sensing its environment.

In each of these cases, one wants to compare the incoming data with the history of an object. The data structures, query operators, and execution environments for such stream processing systems are qualitatively different from classic DBMS architectures. In essence, the arriving data items each represent a fairly complex query against the existing database. Researchers have been building stream processing systems, and their stream-processing ideas have started appearing in mainstream products.
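A sketch of a standing stream query, in the style of research stream languages such as CQL (the stream name and window clause are illustrative assumptions). The query runs continuously, re-evaluated over a sliding window as tuples arrive, rather than once over a stored table:

    -- Continuously report each patient's average heart rate
    -- over the last five minutes of arriving readings.
    SELECT   s.patient_id, AVG(s.heart_rate) AS recent_rate
    FROM     VitalSigns s [RANGE 5 MINUTES]
    GROUP BY s.patient_id;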
2.10 Publish-Subscribe and Replication
Enterprise database architects have adopted a wholesale-retail data model in which data warehouses collect vast data archives and publish subsets to many data marts, each of which serves some special-interest group. This bulk publish-distribute-subscribe model is widely used and employs just about every replication scheme you can imagine. There is a trend to install custom subscriptions at the warehouse: application designers are adding thousands, sometimes millions, of subscriptions. In addition, they are asking that the subscriptions have real-time notification; that is, when new data arrives, if it affects the subscription, the change is immediately propagated to the subscriber. For example, finance applications want to be notified of price fluctuations, inventory applications want to be notified of stock-level changes, and information retrieval applications want to be notified when new content is posted.

Pub-sub and stream processing systems have a similar structure. The millions of standing queries are compiled into a dataflow graph. As new data arrives, the dataflow graph is incrementally evaluated to see which subscriptions are affected, and the new data triggers updates to those subscriptions. This technology relies heavily on the active-database work of the 1990s and is still evolving. The research challenge is to support more sophisticated standing queries and to provide better optimization techniques that handle the vast number of queries and the vast data volumes.
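The inversion of data and queries can be seen in a toy sketch (hypothetical schema): the subscriptions are stored and indexed as data, and each arriving tuple is a probe against them.

    -- Millions of standing queries, stored as rows and indexed.
    CREATE TABLE subscriptions (
        sub_id    INTEGER PRIMARY KEY,
        symbol    CHAR(8),         -- instrument the user watches
        hi_limit  DECIMAL(10,2),   -- notify if price rises above
        lo_limit  DECIMAL(10,2)    -- notify if price falls below
    );
    CREATE INDEX sub_by_symbol ON subscriptions (symbol);

    -- Each arriving price tick queries the queries: here the
    -- new datum is (symbol 'ABC', price 101.50).
    SELECT sub_id
    FROM   subscriptions
    WHERE  symbol = 'ABC'
      AND  (101.50 > hi_limit OR 101.50 < lo_limit);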
2.11 Late Binding in Query Plans
All these changes have a huge impact on the way the database query optimizer works. Having user-defined functions deep inside query plans makes cost estimation problematic. Real data with high skew has always been problematic, but in this new world the relational operators are just the outer loop of a non-procedural program that should be executed with the least cost and in parallel.

Cost-based static-plan optimizers continue to be the mainstay for simple queries that run in seconds. But for complex queries, the query optimizer must adapt to current workloads, adapt to data skew and statistics, and plan in a much more dynamic way, changing plans as the system load and data statistics change. For petabyte-scale databases it seems the only solution is to run continuous data scans and let queries piggyback on the scans. Teradata pioneered that mechanism, and it is likely to become more common in the future.
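A small example of why user-defined functions defeat static costing (the function and table are hypothetical):

    -- contains_face() is opaque to the optimizer: it has no statistics
    -- for the predicate's selectivity and no model of its per-call CPU
    -- cost, so any static plan is a guess. A dynamic optimizer can
    -- start executing, observe the actual selectivity, and re-plan.
    SELECT photo_id
    FROM   photos
    WHERE  contains_face(image, 'Alice') = 1;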
2.12 Massive Memory, Massive Latency
To make life even more interesting, disk and memory capacities continue to grow faster than latency and bandwidth improve. It used to take less than a second to read all of RAM and less than 20 minutes to read everything on a disk. Now multi-terabyte RAM scans take minutes, and terabyte-disk scans take hours. Random access is a hundred times slower than sequential access. These changing ratios require new algorithms that intelligently use multiprocessors sharing a massive main memory, and that intelligently use precious disk bandwidth. The database engines need to overhaul their algorithms to deal with the fact that main memories are huge (billions of pages, trillions of bytes). The era of main-memory databases has finally arrived.
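Some representative circa-2004 numbers (assumed here for illustration) show why these ratios force sequential access:

    1 TB disk, ~50 MB/s sequential:  10^12 B / (5x10^7 B/s) = 20,000 s,
                                     about 5.5 hours per scan.
    Random 8 KB reads at ~5 ms each: effective rate ~1.6 MB/s, so the
                                     same "scan" done randomly takes
                                     about a week, roughly 30x to 100x
                                     slower than sequential.
    1 TB of RAM at ~4 GB/s:          ~250 s, minutes just to touch
                                     memory once.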
2.13 Smart Objects: Databases Everywhere
At the other extreme, each disk controller now has tens of megabytes of storage and a very capable processor. It is quite feasible to have intelligent disks that offer database access (SQL or some other non-procedural language) or even web service access. Moving from a block-oriented disk interface to a file interface, and then to a set or service interface, has been the goal of database-machine advocates for three decades. In the past they needed special-purpose hardware; now disks have fast general-purpose processors as a consequence of Moore's law. So it seems likely that database machines will have a rebirth.

In a related development, people building sensor networks have discovered that if you view each sensor as a row of a table, where the sensor values are fields of the row, then it is very easy to write programs to query the sensors. What's more, distributed query technology, augmented with some new algorithms, gives very efficient programs for these sensor networks, minimizing bandwidth and making them easy to program and debug. So tiny database systems are appearing in smart dust – a surprising and exciting development.
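A sketch of the sensors-as-a-table idea, in the style of research sensor-network query systems such as TinyDB (the table, columns, and sample clause are illustrative):

    -- The whole network is one virtual table; each mote contributes
    -- a row. The query is compiled and pushed into the network, and
    -- aggregation happens in-network to conserve radio bandwidth.
    SELECT   room, AVG(temperature)
    FROM     sensors
    GROUP BY room
    SAMPLE PERIOD 30s;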
2.14 Self-Managing and Always Up
If every file system, every disk, and every piece of smart dust has a database inside, database systems will have to be self-managing, self-organizing, and self-healing. The database community is rightly proud of the advances it has made in automating design and operation – most people are unaware that their eMail system is a simple database, that their file system is a simple database, and that many other applications they use and manage are in fact simple database systems. But, as you can see from the feature list enumerated here, database systems are becoming much more sophisticated. Much work remains to make the distributed data stores so robust that they never lose data and always answer questions efficiently.

3. CONCLUDING REMARKS
The theme of this talk is that we live in a time of extreme change. It is an exciting time; essentially all design assumptions are being re-evaluated. There are research challenges everywhere, and there are no small challenges in this list of revolutions. Yet I think our biggest challenge is the unification of approximate and exact reasoning. Most of us come from the exact-reasoning world, but most of our clients are asking questions with approximate or probabilistic answers.

The restructuring of database systems to be web services and to integrate with language runtimes has created a modularity that enables these revolutions. The reunification of code and data is pivotal; almost all the other changes depend on it. The extension framework allows researchers and entrepreneurs to add new algorithms and whole new subsystems to the DBMS. Databases are evolving from SQL engines to data integrators and mediators that provide transactional and non-procedural access to data in many forms. Database systems are becoming database operating systems, into which one can plug subsystems and applications.

The database community has a healthy interplay between research and development. Virtually all the people and most innovations in database systems can be traced to research prototypes first described in research papers. Product groups watch research prototypes with great interest, academics frequently take sabbaticals in industry, and there are many startups. These collaborations are world-wide, largely fostered by the international focus of SIGMOD and the VLDB Foundation. This ecosystem compensates for the haphazard government funding of database research. Data and databases are central to all aspects of science and industry, and researchers and industry recognize that, even if funding agencies do not.

Going forward, the information avalanche shows no sign of slowing. This guarantees a full menu of challenges for the database research community, challenges far beyond the ones mentioned here. But I believe the low-hanging fruit is clustered around the topics outlined here.

4. ACKNOWLEDGMENTS
Talks by David DeWitt, Mike Stonebraker, and Jennifer Widom at CIDR inspired much of this presentation.