Best Practice in Database Development For Performance
Best Practices in Database Development
For the _____ IS/IT Department
_____
8/17/2009
• Applications come and applications go. The data, however, lives essentially forever.
In the long term, the goal is not about building applications; it is really about using
(and protecting) the data that sits underneath, or behind, those applications.
employee table to the primary key on a department table. Not only does this
enforce the integrity of your data, it will also have a knock-on effect on the
performance of your queries, by giving the database optimizer the correct
information to determine the most effective query paths. Likewise, “A Purchase
Order detail MUST reference a valid Purchase Order” should be instantiated as a
mandatory foreign key on the Purchase Order Detail table referencing the primary
key on the Purchase Order table.
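The Purchase Order rule above can be sketched in a few lines. This is a minimal illustration, not the document's own code, using Python's built-in sqlite3 module (so it needs no database server); the table and column names are invented for the example, and note that SQLite only enforces foreign keys when asked to:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when switched on

conn.execute("CREATE TABLE purchase_order (po_id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE purchase_order_detail (
        line_id INTEGER PRIMARY KEY,
        po_id   INTEGER NOT NULL REFERENCES purchase_order(po_id)
    )""")

conn.execute("INSERT INTO purchase_order (po_id) VALUES (1)")
# A detail row referencing a valid Purchase Order is accepted
conn.execute("INSERT INTO purchase_order_detail (line_id, po_id) VALUES (10, 1)")

try:
    # A detail row pointing at a non-existent Purchase Order is rejected
    conn.execute("INSERT INTO purchase_order_detail (line_id, po_id) VALUES (11, 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The constraint, not the application code, is what guarantees the rule; any application that later touches these tables inherits the same protection.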
You will also need to be aware of the different types of index available to you, and
when it would be appropriate (and, just as importantly, when it would not be
appropriate) to use them.
At its most basic level, interaction with a database is easy. To retrieve information,
READ or SELECT. To create, CREATE or INSERT. To change, UPDATE, and to remove,
DELETE. Point these statements at database tables or views, feed them through
generic code generators or application development tools, and the data
manipulation procedures are done. If we’re not sure what tables or fields we want
until we get some input from the end user, it’s very simple to build the query into a
string and then execute that string through a JDBC (or equivalent) call.
Unfortunately, that design approach is fundamentally flawed on a number of levels,
the most important being security, scalability and performance.
database to database, and application to application, but the one constant across
the varying technology is that refusing to use bind variables will not only open your
application up to quite severe security risks, it will also be a prime cause of your
application failing to scale, or to perform under load.
Application developers and Project Managers should make sure that the technology
being used to interact with any RDBMS supports the ability to issue queries with
bind variables. Unbound queries can destroy the ability of your application to
scale and run efficiently. And even though this should be obvious, it bears
repeating: string substitution in an SQL query is not binding variables.
The first query will have to be hard parsed every time you issue it with a
different employee number – and if this query is going to be executed many times
with different parameters, you will flood your database’s memory area with multiple
copies of the same statement. Once this occurs, every new query will have to
be hard parsed, with the possible (depending on the database platform)
unwelcome side effect of causing your database to start paging into virtual
memory, with all the unwelcome I/O operations that entails.
The second query, however, will only have to be checked, optimized and parsed
once, and once only (at worst), for each active connection.
Stmt = "begin p('" & input1 & "', '" & input2 & "' ); end;"
Now you have to explain to your line manager, director and reporting partner why
the salary table has been dropped, and unless you have auditing functions turned
on, this will be very hard to trace.
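The danger can be demonstrated in a few lines. This sketch uses Python and the built-in sqlite3 module rather than the VB-style pseudo-code above, and the table name and data are invented, but the principle is identical: concatenated input becomes part of the SQL text itself, while a bound input can only ever be treated as data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salary (emp TEXT, amount INTEGER)")
conn.executemany("INSERT INTO salary VALUES (?, ?)",
                 [("alice", 50000), ("bob", 60000)])

user_input = "nobody' OR '1'='1"   # hostile "employee name" from the end user

# String substitution: the input is spliced into the statement text
unsafe = "SELECT * FROM salary WHERE emp = '" + user_input + "'"
print(conn.execute(unsafe).fetchall())   # every row comes back - the filter is gone

# Bind variable: the input is only ever a value, never executable SQL
safe = "SELECT * FROM salary WHERE emp = ?"
print(conn.execute(safe, (user_input,)).fetchall())  # no such employee, no rows
```

Here the attacker merely reads the whole salary table; with a database API that accepts multiple statements, the same hole is what allows a table to be dropped outright.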
Stmt = "begin p('" & input1 & "', '" & input2 & "' ); end;"
and its variations are not binding variables. Use your technology’s variation of
the Java PreparedStatement.
PreparedStatement pstat =
    conn.prepareStatement("begin p(?, ?); end;");
pstat.setString(1, input1);
pstat.setString(2, input2);
Bind variables are not just about performance; they are also about security. This is
particularly important if you are using a code generator, or dealing with third party
software.
Unless you are intimately acquainted with the underlying architecture of your
database platform’s optimizer, and you have extensively benchmarked your
changes, and you have discrete, quantifiable and definite performance reasons to
use hints, and you fully understand the internal mechanics of what the hint will do
to your execution paths under load, leave them alone. Under almost all
circumstances, the database optimizer will have come up with the optimal query
execution plan for your given query.
This of course presumes that your database design and indexing strategy is valid.
Remember - a hint is not a quick fix for poor database design.
we are using – and they are different between different databases – and how will
this affect our design?
• How do we design appropriate methods and rules to cater for this?
The most efficient way is to package your transactions into stored procedures in
SQL Server, and packages in Oracle, and to use static SQL as much as possible.
Certainly within an Oracle instance, the first time a package is called in your
session it is pinned into the shared pool, and the static SQL statements are
parsed and their execution plans generated, so when the packages are called
again they just execute – no need for a hard or soft parse. This is one of the most
pressing arguments for the use of properly designed stored procedures and
static SQL – “parse once, execute many”. A good philosophy to adopt when
designing database applications and the associated SQL transactions is:
• Do it in one SQL statement if you possibly can; think in terms of set processing,
not row-by-row - especially for bulk processes.
• If you can’t do it in one SQL statement, then a little database-native procedural
code (i.e. PL/SQL in Oracle, or T-SQL in SQL Server) wrapped around well
designed SQL statements will almost certainly solve most problems – but
remember to think in terms of processing “sets” of data. Keep your procedural
programs small and explicit, and designed to solve one specific problem.
• If, for whatever reason, native procedural code (and this includes native Java
libraries in both Oracle and SQL Server) won’t give you what you need – and this
should RARELY be necessary – then an external call to a Java library or a
C/C++/C#/Assembler DLL could be justified.
• If that won’t do it, you probably shouldn’t be trying it in the first place, or your
design is fundamentally flawed.
• AVOID dynamic SQL unless you have absolutely no other choice; dynamic SQL
is very easy to code, but it can be complicated to code properly, and if you don’t
code it properly you run the risk of flooding the instance memory area with
queries that have to be hard parsed every time – and that can bring a database
to its knees.
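The “think in sets” advice above holds in any procedural dialect; the following is a hypothetical sketch in Python with the built-in sqlite3 module (table, data and the flat raise are invented), contrasting a row-by-row loop with the single set-based statement that should replace it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, "IT", 100), (2, "IT", 200), (3, "HR", 300)])

# Row-by-row (the anti-pattern): one statement execution per employee
for (emp_id,) in conn.execute("SELECT id FROM emp WHERE dept = 'IT'").fetchall():
    conn.execute("UPDATE emp SET sal = sal + 10 WHERE id = ?", (emp_id,))

# Set-based: the same kind of change expressed as ONE statement over the set
# (shown here for contrast; in real code you would write only this form)
conn.execute("UPDATE emp SET sal = sal + 10 WHERE dept = 'IT'")
```

Over a network, against millions of rows, the loop multiplies round trips and parses by the row count; the set-based statement does the whole department in one execution inside the database engine.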
Further, by using database packages and procedures you can make use of the
appropriate methods to lock your records for INSERTs and UPDATEs, depending on
your database, and elegantly handle any waits or exceptions you may encounter.
rollback. Do NOT assume that the database server’s auto-commit knows your
database design, because it is a virtual certainty that it does not.
For example, in Oracle databases, version 8.1.6 and above, there are two methods
for generating dynamic SQL - Native Dynamic SQL and the DBMS_SQL API package
which was introduced in version 7.1. The differences between these two methods
are quite pronounced, even though they both provide a similar service, so it would
Make sure your web services/Java Beans/ORBs/J2EE/.Net/COM apps are binding; this
is particularly important with 3rd party software you didn’t develop, or when using a
design tool that generates the SQL for you. At the very least, you must have the
ability to examine the actual SQL that has been generated. Once again, if the tools
you have been provided with do not support binding, or the ability to call a
database package, flag this as a risk to your Project Manager.
The trade-off for this brilliant functionality and performance is that you, the
developer, and your development team have the responsibility to know and
understand what tools are available to you across all aspects of your project, but
particularly so within your database. Have a serious review, in the design and
architecture phase, of what you want your application database to do, and how
best to interact with it. It is the IT Department Manager’s responsibility to ensure
that the team has a Database Administrator available to assist with the choices of
technology available from the database.
models, with the appropriate referential integrity, primary and unique key
constraints and the proper index strategies in place. It’s important to remember
this – applications and technology come and go. C# / .Net / J2EE / CORBA / RMI /
WebServices / Applet / Servlet / Client-Server applications may not be around in five
years, as new technologies emerge. Your data, however, has a much longer
lifecycle and is far more valuable. Your database may only be supporting one
application initially, but data is an extremely valuable asset; other applications may
want to utilize your data, and the platforms and technologies they use will vary
enormously.
• FIRE_EMPLOYEE,
• HIRE_EMPLOYEE,
• GET_EMPLOYEE_DETAILS,
• GIVE_ME_A_PROMOTION.
The advantage of this approach is that your transactions can be re-used by almost
any other application, and you have a much more consistent ability to secure
access to your data. Your DML statements have been optimized for your data
structures and for your RDBMS product and version; you know that the queries
have been optimized against your particular schema (and be aware that query
design and optimization will probably change during your load-testing phase);
where appropriate, the correct record-locking design has been incorporated; and
you know your application will scale securely.
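One way to picture such a transaction API, following the HIRE_EMPLOYEE / GET_EMPLOYEE_DETAILS names above: this is a hypothetical Python/sqlite3 sketch standing in for a pair of stored procedures, with the application never touching the table directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, sal INTEGER)")

def hire_employee(conn, emp_id, name, salary):
    """The only sanctioned way to create an employee: one tested,
    bound, reusable transaction - callers never see the table."""
    with conn:  # commits on success, rolls back on any exception
        conn.execute("INSERT INTO emp (id, name, sal) VALUES (?, ?, ?)",
                     (emp_id, name, salary))

def get_employee_details(conn, emp_id):
    """Bound lookup, returning only the fields callers actually need."""
    return conn.execute("SELECT name, sal FROM emp WHERE id = ?",
                        (emp_id,)).fetchone()

hire_employee(conn, 1, "alice", 50000)
print(get_employee_details(conn, 1))
```

Because every caller goes through the same two entry points, a fix to the DML is made once, in one place, for every application.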
If, however, the SQL is propagated across a number of Java classes or .Net objects,
the problem is exacerbated by the necessity of having to locate each instance of
the object and apply the fix.
In fact, you should give very serious thought to not allowing your applications
access to the underlying tables at all – the application interacts with the data only
through carefully designed packages, procedures, views and functions, and these
database objects are the only objects the application can see or use.
1.13 In Summary
3. Be aware of the new features in new database releases; these can make
your life much easier.
6. When you’re manipulating data, think in sets, particularly during data loads.
Row-by-row processing is usually inefficient and unnecessary, except in OLTP
transaction systems, and even then set-based updates should be applied if
possible. Looping selects and updates over a network can result in severe
performance degradation. Do it all in a single statement if you can.
when you do need to use a FOR loop and bulk processing on an update, insertion
or deletion, but COMMITs in any kind of FOR loop are dangerous because they
break your transactional integrity. COMMITs in FOR loops can also be very
inefficient, and generate considerably more redo than single transactions. One
senior DBA puts it very succinctly: “Using any form of auto-commit is one of the
safest routes to an un-scalable and error-prone application. First of all, you
shouldn’t split up a logical transaction into many physical transactions. If any of
these physical transactions fail, in what hybrid state is your database? How do
you restart the process?... There are many ways to stay out of hell. Doing things
properly is one. Not committing inside a loop is another. Start doing things
properly and stop hacking.”
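The “hybrid state” the DBA describes can be shown directly. In this Python/sqlite3 sketch (table and data invented, with a deliberately bad third row), committing per row strands the first half of a failed batch, while one logical transaction leaves the database exactly as it was:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (id INTEGER PRIMARY KEY, bal INTEGER)")
rows = [(1, 100), (2, 200), ("oops", 300), (4, 400)]  # the third row will fail

# COMMIT inside the loop: the failure strands rows 1 and 2
for r in rows:
    try:
        conn.execute("INSERT INTO acct VALUES (?, ?)", r)
        conn.commit()
    except sqlite3.Error:
        break
hybrid = conn.execute("SELECT COUNT(*) FROM acct").fetchone()[0]
print(hybrid)  # 2 rows committed - a hybrid state; how do you restart?

conn.execute("DELETE FROM acct")
conn.commit()

# One logical transaction: all rows, or none
try:
    with conn:  # rolls back the whole batch on any exception
        for r in rows:
            conn.execute("INSERT INTO acct VALUES (?, ?)", r)
except sqlite3.Error:
    pass
print(conn.execute("SELECT COUNT(*) FROM acct").fetchone()[0])  # 0 - clean restart
```

In the second form, fixing the bad row and re-running the load is trivial; in the first, you must first work out which rows already made it in.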
8. XML – if there are alternatives to processing XML, try these methods first.
Large XML documents or volumes are memory inefficient, processor intensive,
and can therefore be very slow to process. Smaller documents shouldn’t prove
problematic, though, unless the transaction volumes are high. Remember, XML
was designed for, and is most effective as a means of packaging data sets with
metadata, for transparently copying the data from one system to another – not
as a database data super-type.
9. Design your result sets to return just the information you want over the
network. It’s a waste of resources to SELECT * FROM [table] when all you want is
one or two fields from a few records.
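For instance (a Python/sqlite3 illustration with invented columns, including a wide BLOB to make the waste visible):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, bio TEXT, photo BLOB)")
conn.execute("INSERT INTO emp VALUES (1, 'alice', 'a long biography...', x'00')")

# Wasteful: drags every column (including the BLOB) over the network
wide = conn.execute("SELECT * FROM emp WHERE id = 1").fetchone()

# Better: ask only for the field you will actually use
narrow = conn.execute("SELECT name FROM emp WHERE id = 1").fetchone()
print(narrow)
```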
10. Dynamic SQL can be a trap. Either learn to do it very, very well, or leave it
well alone.
11. As a general rule, BIND EVERY VARIABLE. Yes, there are exceptions, but
your database-specific DBA will advise you of these, and when they are
appropriate.
12. Know your database - How the data is related, what business rules are
enforced, etc.
13. Know your database server – what features it has, how it locks, how it
handles transactions, and how indexes are handled. Each database server will
be different.
14. Try, as far as possible, to encapsulate all known transactions within your
application into database stored procedures. By creating a set of database
transaction APIs, you will simplify access to your system, as well as enhance
performance, scalability and security.
15. Normalize your tables to third normal form; only de-normalize when
necessary for performance, and with full knowledge of the implications, risks,
and rewards of the de-normalization that you are considering. Remember, in the
long run, well-normalized tables are easier to maintain, update, and use for most
purposes, than are de-normalized tables, and un-normalized tables can lead to
many difficult situations and data anomalies.
16. Talk to Technical Data Architects and other experienced developers in other
disciplines; their experiences may be able to help you solve problems in your
own discipline. If they give you advice, seriously entertain the thought of
implementing their feedback, even though it might run counter to what you’ve
been led to believe, or to how you’ve written applications in the past. Above
all, do NOT operate in a vacuum, isolated from the other disciplines, as this is an
almost certain approach to failure in some manner, for the application as a
whole.