How Multithreaded Architecture Works in DB2 9.5
An overview
Summary: New multithreaded capabilities were introduced in DB2® 9.5 for Linux®, UNIX®,
and Windows®, codenamed "Viper 2." Learn how these new capabilities affect you if you
regularly monitor processes or threads, if you need to understand how much memory your
database is using, or if you want to simplify mission-critical tasks such as backup, restore, and
roll forward. You'll learn how these changes affect configuration parameters, and gain
knowledge of the new technology in DB2 9.5.
Introduction
In order to understand the new multithreaded capabilities in DB2 9.5, this article starts with a
look at the DB2 process model. The entire DB2 process model is controlled by Base System
Utilities (BSUs). BSUs allocate memory for the instance and database, intercept and handle
signals, and handle exceptions sent to DB2. Figure 1 shows the old DB2 process model for the
Linux and UNIX platforms.
Figure 1. Old DB2 process model on Linux and UNIX
The communication between database servers, clients, and applications is handled by a
framework: the process model used by all DB2 servers. It ensures that internally used database
files do not interfere with user or database applications.
Engine dispatchable units (EDUs) are responsible for performing various tasks such as
processing database application requests, reading database log files, and flushing log records
from the log buffer to the log files on disk. Typically the DB2 server dedicates a separate
EDU to each such task. Prior to DB2 9.5, most of these EDUs were implemented as processes on
UNIX and Linux and as threads on Windows. In DB2 9.5 the process model is uniform: EDUs are
thread based on Linux, UNIX, and Windows.
• The new memory model is simpler and more easily configured. See the following entries
in the DB2 Information Center:
o Configuring memory and memory heaps
o Memory configuration has been simplified
• This model saves resources:
Significantly fewer system file descriptors are used. The most obvious distinction
between processes and threads is that all threads of a process share the same memory
space and system-defined facilities. Facilities include open file handles (file descriptors),
shared memory, process synchronization primitives, and current directory. All threads in
a process can share the same file descriptors. There is no need to have each agent
maintain its own file descriptor table.
• Performance is enhanced:
Operating systems can generally context switch faster between threads of the same process
than between different processes, because there is no need to switch the address space.
Because global memory is shared and almost no new memory must be allocated, creating
a thread is simpler and faster than creating a process. Process creation is expensive in
terms of processor cycles and memory usage.
• There are more automatic and dynamic configurable parameters, so less is required from
the DBA.
This is covered in the Process model configuration simplification section of this article.
• The process model is the same now across all three platforms: Linux, UNIX, and
Windows.
Prior to DB2 9.5, on UNIX and Linux you could list all active DB2 EDUs with the ps system
command or the db2_local_ps command. In DB2 9.5, however, those commands no longer list the
EDU threads inside the db2sysc process. Therefore, one of the changes DB2 users and DBAs will
notice when they use an OS command to look at the processes running on the system is that they
see only one process as opposed to several. This is an administrative change to be aware of
from a DBA perspective.
Listing 1. Processes visible with ps in DB2 9.5
$ ps -fu db2ins10
UID PID PPID C STIME TTY TIME CMD
db2ins10 1237176 2109662 0 Feb 28 - 0:12 db2acd 0
db2ins10 1921136 2109662 0 Feb 28 - 0:14 db2sysc 0
db2ins10 2101494 1941686 0 14:22:34 pts/1 0:00 -ksh
db2ins10 2420958 2101494 0 15:25:33 pts/1 0:00 ps -fu db2ins10
On AIX:
To view all threads of the db2sysc process (PID = 1921136), you can use the ps thread
options, for example: ps -mo THREAD -p 1921136
On Linux:
To view all threads of the db2sysc process (PID = 1921136): ps -lLfp 1921136
The DBA's life is made easier now: db2pd has been improved to list processes and threads.
You can use the db2pd command with the -edu option to list all EDU threads that are active.
It works on UNIX, Linux, and Windows systems.
Listing 2. Viewing all EDU threads of the db2sysc process on a Linux system
$ db2pd -edu
DB2 9.5 gives you several ways to see how much memory your database and instance are using:
• db2pd -dbptnmem
• db2 get snapshot for applications on sample
• select * from table(admin_get_dbp_mem_usage())
• db2mtrk -a and db2mtrk -p
Using db2pd
The db2pd -dbptnmem command reports memory consumption for the database partition. The first
part of its output looks like this:
Controller Automatic: Y
Memory Limit: 13994636 KB
Current usage: 76608 KB
HWM usage: 332736 KB
Cached memory: 16064 KB
The fields have the following meanings:
Memory Limit is the upper bound on the memory the DB2 server can consume. It is the value
of the INSTANCE_MEMORY configuration parameter.
Current usage is the amount of memory currently consumed.
HWM usage is the high water mark (HWM), or peak, memory usage consumed since the database
partition was activated by the db2start command.
Cached memory is the portion of the current usage that is not currently in use but is kept
cached for future memory requests, for performance reasons.
• All registered "consumers" of memory within the DB2 server are listed with the amount
of the total memory they are consuming.
• Name: A brief, distinguishing name of a "consumer" of memory. Examples include:
o APPL-<dbname> for application memory consumed for a database <dbname>
o DBMS-xxx for global database manager memory requirements
o FMP_RESOURCES for memory required to communicate with db2fmps
o PRIVATE for miscellaneous private memory requirements
o FCM_RESOURCES for Fast Communication Manager resources
o LCL-<pid> for memory segment used to communicate with local applications
o DB-<dbname> for database memory consumed for a database <dbname>
• Mem Used (KB): How much memory is currently allotted to that consumer.
• HWM Used (KB): High-water mark, or peak, memory that the consumer has used.
• Cached (KB): Of the Mem Used (KB), the amount of memory that is not currently being
used but is immediately available for future memory allocations.
Using SQL
You can retrieve the same information with SQL by querying the admin_get_dbp_mem_usage table
function listed above; on a single-partition instance it returns one row per database
partition, and the output ends with "1 record(s) selected."
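A minimal sketch of running that query from the CLP follows. The database name sample is just a
placeholder, and the returned columns (database partition number plus maximum, current, and peak
partition memory) are listed here from memory, so verify them against the Information Center:
$ db2 connect to sample
$ db2 "SELECT * FROM TABLE(ADMIN_GET_DBP_MEM_USAGE()) AS T"
$ db2 connect reset
On a single-partition database this returns exactly one row, matching the "1 record(s)
selected." message above.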
Using db2mtrk
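The memory tracker gives a similar breakdown from the command line. As a hedged sketch (option
behavior recalled from the DB2 9.5 documentation, so double-check on your system): -i reports
instance-level memory, -d database-level memory, -v adds verbose detail, and -a, introduced with
the new application memory model, reports application memory (largely superseding -p, which
reports agent private memory):
$ db2mtrk -i -d -v
$ db2mtrk -a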
You cannot permanently set different INSTANCE_MEMORY values for different database
partitions. For DB2 Express licenses, the upper bound on INSTANCE_MEMORY is further
restricted to at most 4 GB of memory (1,048,576 4 KB pages). DB2 Workgroup licenses are
restricted to at most 16 GB of memory (4,194,304 4 KB pages). Attempts to update the
INSTANCE_MEMORY configuration parameter to values larger than these limits fail with an
SQL5130N return code specifying the restricted range allowed for the license. Other license
types have no additional restrictions. You cannot set INSTANCE_MEMORY to more than the
physical RAM.
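For illustration only (the value is arbitrary and expressed in 4 KB pages), INSTANCE_MEMORY is a
database manager configuration parameter, so it is updated with UPDATE DBM CFG:
$ db2 update dbm cfg using INSTANCE_MEMORY 1048576
$ db2 update dbm cfg using INSTANCE_MEMORY AUTOMATIC
The first form sets an explicit limit of 1,048,576 4 KB pages (4 GB); the second hands control of
the limit back to DB2.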
Prior to DB2 9.5, each database partition of a partitioned database had to be backed up
separately. In DB2 9.5, the BACKUP command is enhanced to take a list of database partitions,
which provides a single system view.
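A hedged sketch of the new syntax (the database name and target path are placeholders; this form
is typically issued from the catalog partition):
$ db2 backup db test on all dbpartitionnums to /backups
$ db2 backup db test on dbpartitionnums (1, 3) to /backups
The first command backs up every partition with a single invocation; the second limits the
backup to partitions 1 and 3.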
How do you determine what log files are required during roll-forward?
If the point in time (PIT) specified in the ROLLFORWARD command is too early, you get error
message SQL1275N, which tells you the correct PIT. You might consider using BACKUP with the
INCLUDE LOGS option. However, in a DPF database, BACKUP with INCLUDE LOGS generates error
message SQL2032N, so you cannot use this option.
In DB2 9.5, by contrast, you can use the "TO END OF BACKUP" clause with the
ROLLFORWARD command to roll forward all partitions in a partitioned database to the
minimum recovery time. The minimum recovery time is the earliest point in time during a roll-
forward when a database is consistent (when the objects listed in the database catalogs match the
objects that physically exist on disk). Manually determining the correct point in time to which to
roll forward a database is difficult, particularly for a partitioned database. The "END OF
BACKUP" option makes it easy.
$ db2 rollforward db test to end of backup and stop
DB20000I The ROLLFORWARD command completed successfully
User limits (ulimits) set or show various restrictions on resource usage for a shell. It's good
practice to set some of these limits, for example to prevent a faulty shell script from starting
unlimited copies of itself or to prevent users on the system from starting processes that run
forever. But what should you set them to? Below are a few considerations for the various
resource restrictions:
On 32-bit Linux:
Default: 1 MB
Minimum: 64 KB
Maximum: 4 MB
• MAXFILOP is the maximum per database per partition. New high defaults of ~32K for
32-bit and ~64K for 64-bit.
• Current ulimit setting (or 8 GB on AIX if ulimit is set to unlimited). DB2 overrides an
unlimited core limit. To get a core larger than 8 GB, you have to explicitly set the core
limit to something larger than 8 GB rather than unlimited (see the sketch after this list).
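Here is a minimal sketch of inspecting and adjusting the limits from the instance owner's shell.
The numeric value is only an example, and the unit used by ulimit -c depends on the shell (ksh
on AIX counts 512-byte blocks), so check your shell's man page:
$ ulimit -a
$ ulimit -c
$ ulimit -c 20971520
ulimit -a shows all limits for the current shell, ulimit -c shows just the core file limit, and
the last line raises the core limit to an explicit value (about 10 GB if the unit is 512-byte
blocks) rather than unlimited, so that cores larger than 8 GB can actually be written.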
Process model configuration simplification
In this section you will see how the configuration parameters behave differently in DB2 9.5.
Take note of the default values and ranges, as they differ from earlier releases (a quick way to
check them on your own system is sketched after Figure 3).
Figure 3. Configuration parameters
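The figure is not reproduced here, but you can compare the values on your own system with the
standard commands (the database name sample is a placeholder; SHOW DETAIL also indicates, as I
recall, which parameters are currently computed automatically):
$ db2 get dbm cfg
$ db2 connect to sample
$ db2 get db cfg for sample show detail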
Before wrapping up, take a quick look at the newly introduced threads and processes:
• db2thcln (thread stack cleanup): Recycles resources when an EDU terminates (UNIX-
only).
• db2aiothr (aio collector thread): Manages asynchronous I/O requests for the database
partition (UNIX-only).
• db2alarm (alarm thread): Notifies EDUs when their requested timer has expired (UNIX-
only).
• db2vend (fenced vendor process): Executes vendor code on behalf of an EDU, for
instance to execute the user-exit program for log archiving (UNIX-only).
• db2extev (external event handler thread): The same as SIGUSR2.
• db2acd: A health monitor process.
Finally, does moving up to DB2 9.5 impact the applications you currently have? The answer is
no, absolutely not. This internal change does not affect applications at all. In fact, it is
largely transparent from both an administration and an application programming perspective.
Acknowledgements
Special thanks to Amar Thakkar and Samir Kapoor for their technical review of this article.
Resources
Learn
• IBM DB2 9.5 Information Center for Linux, UNIX and Windows: Find information
describing how to use the DB2 family of products and features, as well as related
WebSphere® Information Integration products and features.
• IBM DB2 Express-C 9.5: Download DB2 Express-C 9.5, a no-charge version of DB2
Express 9 database server.
• IBM DB2 Training and Certification: Find award winning instructors, industry leading
software, hands-on labs, and more.
• DB2 for Linux, UNIX, and Windows Forum: Share questions, thoughts, and ideas with
other DB2 users and developers.
• developerWorks Information Management zone: Learn more about DB2. Find technical
documentation, how-to articles, education, downloads, product information, and more.
• Build your next development project with IBM trial software, available for download
directly from developerWorks.
• Download IBM product evaluation versions and get your hands on application
development tools and middleware products from DB2, Lotus®, Rational, Tivoli®, and
WebSphere.
Shashank Kharche is a staff software engineer with the IBM Australia Development Lab in
Sydney, Australia. He is an IBM certified DB2 administrator. Shashank currently works as part
of the Down Systems Division, Asia Pacific region, and has extensive experience with DB2
databases and the diagnosis and resolution of critical problems. He has published several
technotes for IBM. He holds a Bachelor's degree in Computer Science and Engineering. You
can reach him at [email protected].
Simultaneous multithreading
From Wikipedia, the free encyclopedia
Details
Because the technique is really an efficiency solution and there is inevitable increased conflict on
shared resources, measuring or agreeing on the effectiveness of the solution can be difficult.
Some researchers have shown that the extra threads can be used to proactively seed a shared
resource like a cache, to improve the performance of another single thread, and claim this shows
that SMT is not just an efficiency solution. Others use SMT to provide redundant computation,
for some level of error detection and recovery.
However, in most current cases, SMT is about hiding memory latency, increasing efficiency, and
increasing throughput of computations per amount of hardware used.
Taxonomy
In processor design, there are two ways to increase on-chip parallelism with fewer resource
requirements: one is the superscalar technique, which tries to increase instruction-level
parallelism (ILP); the other is the multithreading approach, which exploits thread-level
parallelism (TLP). Superscalar means executing multiple instructions at the same time, while
chip-level multithreading (CMT) executes instructions from multiple threads within one processor
chip at the same time. There are several ways to support more than one thread within a chip,
namely interleaved (fine-grained) multithreading, blocked (coarse-grained) multithreading, and
simultaneous multithreading.
The key factor distinguishing them is how many instructions the processor can issue in one
cycle and from how many threads those instructions come. For example, Sun Microsystems'
UltraSPARC T1 (known as "Niagara" until its November 14, 2005 release) is a multicore processor
combined with a fine-grained multithreading technique rather than simultaneous multithreading,
because each core can only issue one instruction at a time.
While multithreading CPUs have been around since the 1950s, simultaneous multithreading was
first researched by IBM in 1968. The first major commercial microprocessor developed with
SMT was the Alpha 21464 (EV8). This microprocessor was developed by DEC in coordination
with Dean Tullsen of the University of California, San Diego, and Susan Eggers and Hank Levy
of the University of Washington. The microprocessor was never released, since the Alpha line of
microprocessors was discontinued shortly before HP acquired Compaq which had in turn
acquired DEC. Dean Tullsen's work was also used to develop the "Hyper-threading" (or "HTT")
versions of the Intel Pentium 4 microprocessors, such as the "Northwood" and "Prescott".
The Intel Pentium 4 was the first modern desktop processor to implement simultaneous
multithreading, starting with the 3.06 GHz model released in 2002, and SMT has since been
introduced into a number of Intel processors. Intel calls the functionality Hyper-Threading
Technology (HTT), and
provides a basic two-thread SMT engine. Intel claims up to a 30% speed improvement compared
against an otherwise identical, non-SMT Pentium 4. The performance improvement seen is very
application dependent, and some programs actually slow down slightly when HTT is turned on
due to increased contention for resources such as bandwidth, caches, TLBs, re-order buffer
entries, etc. This is generally the case for poorly written data access routines that cause high
latency intercache transactions (cache thrashing) on multi-processor systems. Programs written
before multiprocessor and multicore designs were prevalent commonly did not optimize cache
access because on a single CPU system there is only a single cache which is always coherent
with itself. On a multiprocessor system each CPU or core will typically have its own cache,
which is interlinked with the cache of other CPU/cores in the system to maintain cache
coherency. If thread A accesses a memory location [00] and thread B then accesses memory
location [01] it can cause an intercache transaction particularly where the cache line fill exceeds
2 bytes, as is the case for all modern processors.
More recent MIPS architecture designs include an SMT system known as "MIPS MT". MIPS
MT provides for both heavyweight virtual processing elements and lighter-weight hardware
microthreads. RMI, a Cupertino-based startup, is the first MIPS vendor to provide a processor
SOC based on 8 cores, each of which runs 4 threads. The threads can be run in fine-grain mode
where a different thread can be executed each cycle. The threads can also be assigned priorities.
The IBM POWER5, announced in May 2004, comes as either a dual core DCM, or quad-core or
oct-core MCM, with each core including a two-thread SMT engine. IBM's implementation is
more sophisticated than the previous ones, because it can assign a different priority to the various
threads, is more fine-grained, and the SMT engine can be turned on and off dynamically, to
better execute those workloads where an SMT processor would not increase performance. This is
IBM's second implementation of generally available hardware multithreading.
Although many people reported that Sun Microsystems' UltraSPARC T1 (known as "Niagara"
until its 14 November 2005 release) and the upcoming processor codenamed "Rock" (to be
launched ~2009 [1]) are implementations of SPARC focused almost entirely on exploiting SMT
and CMP techniques, Niagara is not actually using SMT. Sun refers to these combined
approaches as "CMT", and the overall concept as "Throughput Computing". The Niagara has 8
cores, but each core has only one pipeline, so actually it uses fine-grained multithreading. Unlike
SMT, where instructions from multiple threads share the issue window each cycle, the processor
uses a round robin policy to issue instructions from the next active thread each cycle. This makes
it more similar to a barrel processor. Sun Microsystems' Rock processor is different: it has
more complex cores that have more than one pipeline.
The Intel Atom, released in 2008, is the first Intel product to feature SMT (marketed as Hyper-
Threading) without supporting instruction reordering, speculative execution, or register
renaming. Intel reintroduced Hyper-Threading with the Nehalem microarchitecture, after its
absence on the Core microarchitecture.
Disadvantages
Simultaneous multithreading cannot improve performance if any of the shared resources are
bottlenecks limiting performance. In fact, some applications run slower when simultaneous
multithreading is enabled. Critics argue that this places a considerable burden on software
developers, who have to test whether simultaneous multithreading is good or bad for their
application in various situations and insert extra logic to turn it off if it decreases
performance. Current operating systems lack convenient API calls for this purpose and for
preventing processes with different priorities from taking resources from each other [2].
There is also a security problem with simultaneous multithreading. It has been proven that it is
possible for one application to steal a cryptographic key from another application running in the
same processor by monitoring its cache use [3].
References
1. http://www.theregister.co.uk/2007/12/14/sun_rock_delays/
2. How good is hyperthreading?
3. Hyper-Threading Considered Harmful