
Matrix Template Library for HPC

This document summarizes an article from the journal Computing in Science & Engineering about the Matrix Template Library (MTL). MTL is a C++ template library for scientific computing that provides generic components for high-performance linear algebra. The summary discusses how MTL applies generic programming principles to numerical linear algebra to provide efficient linear algebra functionality while requiring fewer lines of code than non-generic libraries. MTL's algorithms can be composed arbitrarily and operate generically on any matrix type defined in the library.


SCIENTIFIC PROGRAMMING

Editor: Paul F. Dubois, [email protected]

THE MATRIX TEMPLATE LIBRARY: GENERIC COMPONENTS FOR HIGH-PERFORMANCE SCIENTIFIC COMPUTING
By Jeremy G. Siek and Andrew Lumsdaine

The Standard Template Library was released in 1995 and adopted into the ANSI C++ standard shortly thereafter [1]. When we first discovered the STL, it became apparent to us that generic programming, the fundamental paradigm underlying the STL, was an important and powerful new software development methodology, and this has been borne out by the tremendous success of the STL for general-purpose programming. Not so obvious then, however, was how (or even if) generic programming could apply to other problem domains. To investigate the merit of this approach for scientific computing, we embarked on a research project to apply generic programming to high-performance numerical linear algebra.

That effort brought forth the Matrix Template Library, a generic component library for scientific computing [2]. Although MTL consists of a relatively small number of software components, its power and utility arise from the composability of the components and the generic nature of the algorithms. That is, the components can be composed arbitrarily to produce an extremely wide variety of matrix formats. Similarly, each algorithm can operate on any matrix type defined in this fashion. The resulting linear algebra functionality far exceeds that of libraries that are not based on component technology, while requiring orders of magnitude fewer lines of code.

Composability and genericity are all well and good, but to many scientific computing users the advantages of an elegant programming interface are secondary to issues of performance. Given MTL’s heavy use (and many layers) of abstraction, we might naturally assume that there is a corresponding performance penalty. It turns out that generic programming is a powerful tool with respect to performance as well, in two regards:

• Properly designed abstractions (in conjunction with modern compilers) incur no performance penalty per se. That is, generic components are as efficient as their handwritten counterparts.
• High-level performance-tuning mechanisms (such as cache or register-blocking schemes) can be described in a generic fashion, giving users vendor-optimized performance levels in a portable and easy-to-tune fashion. This has been borne out in our own experiments: MTL can match the performance of vendor-tuned libraries on a number of platforms.

Generic programming
As Alexander Stepanov has described, the generic-programming process applied to a particular problem domain takes the following basic steps:

1. Identify useful and efficient algorithms.
2. Find their generic representation (parameterize each algorithm such that it makes the fewest possible requirements of the data on which it operates).
3. Derive a set of (minimal) requirements that allow these algorithms to run and to run efficiently.
4. Construct a framework based on classifications of requirements.

In applying this process to numerical linear algebra, the first step can be directly motivated by the mathematical definition of linear algebra. That is, we need algorithms that implement the basic axiomatic operations of linear algebras: multiplying a vector by a scalar, adding two vectors, or applying a linear transformation to a vector. In addition to these operations, we can include operations that give our linear space extra structure, such as inner products and norms as well as algorithms for working with dual spaces (that is, a transpose operation). Thus, our set of useful and efficient algorithms consists of just these six (respectively performing the desired operations described above): scale(), sum(), mult(), dot(), norm(), and transpose().
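As a rough illustration of step 2 (asking as little as possible of the data), the following sketch writes two of the basic operations against nothing more than begin()/end() traversal. The names scale_in_place and dot_product are hypothetical and chosen for this example; they are not MTL's functions or signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical helper names for illustration; not part of MTL.

// x <- alpha * x.
// Requirement on Container: forward traversal and assignable elements.
template <class Container, class Scalar>
void scale_in_place(Container& x, const Scalar& alpha) {
    for (typename Container::iterator it = x.begin(); it != x.end(); ++it)
        *it = alpha * (*it);
}

// s <- x^T y.
// Requirement on both containers: read-only forward traversal.
// Traversal stops at the end of the shorter range.
template <class ContainerX, class ContainerY>
typename ContainerX::value_type
dot_product(const ContainerX& x, const ContainerY& y) {
    typedef typename ContainerX::value_type T;
    T s = T();
    typename ContainerX::const_iterator xi = x.begin();
    typename ContainerY::const_iterator yi = y.begin();
    for (; xi != x.end() && yi != y.end(); ++xi, ++yi)
        s += (*xi) * (*yi);
    return s;
}
```

Because neither function asks for random access or contiguous storage, the same code accepts a std::vector, a std::list, or any user-defined container that offers the same traversal interface.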

70 COMPUTING IN SCIENCE & ENGINEERING


Cafe Dubois
Lapack, Version 3.0 is available
Lapack is a library of numerical linear algebra subroutines
designed for high performance on workstations, vector com-
puters, and shared-memory multiprocessors. Jack Dongarra
([email protected]) recently announced the release of
Version 3.0, which introduces new routines, as well as ex-
tending the functionality of existing routines.
According to the announcement, the most significant new
routines and functions are singular value decomposition com-
puted by the divide-and-conquer method; SVD-based divide-
and-conquer least-squares solver; new simple and expert dri-
vers for the generalized nonsymmetric eigenproblem; new
generalized symmetric eigenproblem drivers; symmetric eigen-
problem drivers using fast but relatively robust eigenvector com-
putations; a faster QR decomposition with column pivoting;
a faster solver for the rank-deficient least squares problem; a
blocked version of xTZRQF (xTZRZF) and associated xORMRZ/
xUNMRZ solver for the generalized Sylvester equation; and
computational routines (xTGEXC, xTGSEN, xTGSNA).
For complete information, please refer to the release notes
file in the Lapack directory on netlib (http://www.netlib.org/lapack releasenotes). There are Lapack bindings available for Fortran 90, C, and Java (http://www.netlib.org/lapack90, http://www.netlib.org/clapack, and http://www.netlib.org/java/f2j).
The third edition of the Lapack Users’ Guide is in preparation and will be available soon at http://www.netlib.org/lapack/lug/lapacklug.html.

Little languages
I continually find myself needing to write a parser for a small language. Often this is useful when translating something in one format to another, or as an input language for a small tool. There is a neat little tool to help do this written by John Aycock of the University of Victoria. John calls it “a framework for implementing little languages, in 100% pure Python.” Since John didn’t name it, I’ll call it LLP. A paper about LLP and download information is at http://csr.uvic.ca/aycock/python. I used LLP when I wrote Pyfort (Sept./Oct. issue). The entire LLP package has some facilities that aren’t needed for a lot of simple applications. As you read about it, be aware that you’re not locked into using the whole thing if you don’t want to.
One thing to like about LLP is that it uses an algorithm called the Earley algorithm; it requires less experience to write a grammar for this tool than for LR(1) parsers such as yacc. Beginners at yacc often get as far as their first “shift/reduce error” and give up.
Another fun thing about it is that it uses introspection. Believe it or not, you input your grammar as comments in the functions that will carry out your actions. Likewise, you enter your lexical expressions as comments in the functions that will create tokens after the recognizer spots them. The tool examines the functions you added to it and extracts the grammar. This turns out to be strange but very effective.

Floating-point exceptions and the Borland compiler
Dan Kerner ([email protected]) is a senior programmer for Civilized Software. Civilized Software’s flagship product is MLAB, a program for curve fitting, solving systems of ordinary differential equations, statistics, and graphics. For more information, see the Web site at http://www.civilized.com. Dan uses the Borland compiler for Windows. Dan writes, “I recently finished a paper describing how to handle floating-point exceptions in 32-bit computer programs running on Windows 95/98/NT PCs.” Unfortunately, this same method does not work for Microsoft Visual C++, so the audience for the paper is too small for this column, but Dan will be happy to e-mail you the paper if you are interested.

Conferences past and present
The Open Source Software Conference put on by O’Reilly and Associates (http://www.oreilly.com) in August was a really great affair. They combined conferences on six major open-source software components (Linux, Sendmail, Perl, Python, Apache, and Tcl/Tk) with an open-source business track. This was just a great idea; you could attend sessions in any of the conferences. Every conference always has some dead spots where you are not interested in the subject, so rather than suffer through a boring session, you can go learn something you wouldn’t ordinarily have time for. This is a very cool idea.
Bill Joy from Sun Microsystems described a new “Community Source” license scheme that you might call “Sorta-Open Software.” He said Sun was committed to releasing its sources under this license, although not all at once.
The next great conference is the 8th International Python Conference, 24–27 January, in Alexandria, Virginia. Sign up at http://www.python.org. Be there or be square.
–Paul Dubois, [email protected]

NOVEMBER/DECEMBER 1999 71

More about MTL

As with everything these days, to find out more about MTL you can refer to the official MTL Web site: http://lsc.nd.edu/research/mtl. This site contains complete documentation of the MTL as well as the freely available source code. In addition, links to some of the other generic programming efforts mentioned in this article (such as the Standard Template Library, Iterative Template Library, and Generic Graph Component Library) are available there.

Let’s look at the most interesting of these operations (mult()) and consider how we can make it generic. To be generic, we would like to realize the operation, say, z ← Ax for any concrete representation of vectors x and z as well as for any matrix (linear operator) A. We can implement type independence (at least syntactically) in C++ by using templates. Thus, we could prototype a generic mult() as follows:

    template <class Matrix, class VecX, class VecZ>
    void mult(const Matrix& A, const VecX& x, VecZ z);

The next step is to define the body of mult() such that it can work with arbitrary types (having a suitably defined interface). At first, this might seem impossible. After all, there are myriad matrix types: sparse, dense, rectangular, banded, column oriented, row oriented, and so on.
Interestingly, the mathematical description of a matrix-vector product points us in the right direction. We can write down what we mean by matrix-vector product in an N-dimensional space this way:

$$z_i = \sum_j a_{ij} x_j .$$

One textual interpretation of this mathematical statement is the following. Let the matrix A be such that each element in A has a corresponding row and column index, i and j, respectively. For each element in A, sum the product of that element with the jth element of x into the ith element of z. This leads to the next step of our generic-programming process: deriving a set of minimal requirements for the algorithm.
In this case, our textual description of the algorithm provides those requirements. To realize a generic mult(), we must be able to

• visit each element of A (optionally skipping zeroes),
• access the value of (de-reference) each element of A,
• access the row and column indices of each element of A, and
• randomly access values (that is, given some integer k, access the kth value) of the vectors x and z.

Note what is not required. We need not visit the elements of A in any particular order. Nor must A be of any particular shape, nor must zero values be stored, nor must we visit every element of x and z (or visit them in any particular order).
We can implement the requirements that we do face using an interface similar to STL’s. As with the STL, iterators form the principal interface to data types (for using operator++() and operator*() for traversal and access). We extend the STL notion of iterator with respect to a matrix in two ways. First, MTL has a two-level hierarchy of iterators to traverse matrices (to reflect the objects’ 2D nature). Second, iterators over matrix elements can provide row and column index information as well as be de-referenced for the matrix element value. The body of the mult() algorithm is then:

    {
      typename Matrix::const_iterator i;
      typename Matrix::OneD::const_iterator j;

      for (i = A.begin(); i != A.end(); ++i)
        for (j = (*i).begin(); j != (*i).end(); ++j)
          z[row(j)] += *j * x[column(j)];
    }

For those who might be unfamiliar with C++, or with the particulars of STL-style C++ generic programs, this algorithm simply embodies the textual description of the mult() algorithm. The two nested loops serve to iterate over all the elements of A in the order they are stored in memory, whether that be by row, column, or diagonal. Thus, for row-oriented matrices, the inner loop performs a dot-product operation on rows of A with vector x; for column-oriented matrices, the inner loop takes linear combinations of columns of A scaled by elements of x (in an axpy()-like manner).
With this basic set of requirements in place, we can move to the last step of the generic-programming process: framework construction.

Data types in MTL
Although our tour thus far through MTL’s generic-programming process has described what the interfaces to matrices and vectors should look like, it leaves open the issue of the underlying implementation of those objects, allowing, in some sense, almost infinite flexibility, provided the interface conditions are satisfied. Within MTL, we sought to provide the largest possible variety of concrete matrix types (particularly those that are commonly used elsewhere) while also requiring only a small number of actual components. To accomplish these simultaneous goals, we



developed a compositional model for matrix types consisting of five basically orthogonal categories:

• Element type: The type of the individual matrix element (for example, a real or complex floating-point number).
• 1D storage: Containers of elements, similar to STL containers (for example, dense, compressed).
• 2D storage: Abstractly, a container of 1D storage containers (not necessarily concretely so).
• Orientation: The mapping of indices between the 2D storage structure and the matrix (for example, row-major, column-major).
• Shape: The outer envelope describing the structure of the matrix (for example, rectangular, banded, triangular).

Jeremy G. Siek is a PhD candidate in computer science at the University of Notre Dame. His interests include generic programming, high-performance libraries for C++, and language/compiler support for active libraries. He received a BS in mathematics and an MS in computer science at Notre Dame. Contact him at SGI, 1600 Amphitheatre Pkwy MS 7U-178, Mountain View, CA 94043; [email protected].

Andrew Lumsdaine is an associate professor in the Department of Computer Science and Engineering at the University of Notre Dame and is presently enjoying a sabbatical at Lawrence Berkeley National Laboratory. His research interests include generic programming, high-performance scientific computing, and parallel and distributed processing. He received his undergraduate and graduate education (along with a PhD) from MIT. He is a member of the IEEE, SIAM, and ACM. Through the summer of 2000, he can be contacted at M/S 50B-2239, One Cyclotron Rd., Berkeley, CA 94720; [email protected].
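To make the storage and orientation categories more concrete, here is a small sketch in which a single mult() in the style of the article runs unchanged over a row-oriented and a column-oriented matrix. SliceMatrix and Entry are hypothetical types invented for this example; MTL's real iterators expose row(j) and column(j), whereas this sketch stores the indices in each entry to stay short. The output vector is taken by reference because std::vector has value semantics, unlike MTL's handle-based vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative types only; these are not MTL classes.
struct row_major {};
struct column_major {};

// One stored element together with its row and column index.
template <class T>
struct Entry {
    std::size_t row, col;
    T value;
};

// 2D storage as a container of 1D containers. The Orientation tag decides
// whether a 1D slice holds the stored elements of a row or of a column.
template <class T, class Orientation>
class SliceMatrix {
public:
    typedef std::vector<Entry<T> > OneD;
    typedef typename std::vector<OneD>::const_iterator const_iterator;

    explicit SliceMatrix(std::size_t num_slices) : slices_(num_slices) {}

    void insert(std::size_t r, std::size_t c, const T& v) {
        Entry<T> e = {r, c, v};
        slices_[slice_index(r, c, Orientation())].push_back(e);
    }
    const_iterator begin() const { return slices_.begin(); }
    const_iterator end() const { return slices_.end(); }

private:
    static std::size_t slice_index(std::size_t r, std::size_t, row_major) { return r; }
    static std::size_t slice_index(std::size_t, std::size_t c, column_major) { return c; }
    std::vector<OneD> slices_;
};

// z <- z + A x. Visits every stored element once, in storage order,
// and uses only its value and its (row, col) indices.
template <class Matrix, class VecX, class VecZ>
void mult(const Matrix& A, const VecX& x, VecZ& z) {
    typename Matrix::const_iterator i;
    typename Matrix::OneD::const_iterator j;
    for (i = A.begin(); i != A.end(); ++i)
        for (j = i->begin(); j != i->end(); ++j)
            z[j->row] += j->value * x[j->col];
}
```

With the row-major tag the inner loop accumulates a dot product per row; with the column-major tag it adds scaled columns into z. The algorithm text is identical in both cases.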

MTL matrices. Combining the appropriate components to specify particular matrix types in MTL can be a somewhat complex process, so we have created a matrix type specification interface that simplifies this process for the user. The matrix interface can be described with a simple grammar [3].

    Matrix:       matrix<EltType, Shape, Storage, Orientation>::type
    EltType:      float | double | doubledouble | complex<float> |
                  complex<double> | complex<doubledouble> |
                  (any other field type)
    Shape:        rectangle | diagonal | banded | triangle |
                  symmetric | hermitian |
    TwoD Storage: dense | packed | banded | bandedview | compressed |
                  array of OneD | envelope | (other TwoD)
    OneD:         dense vector | compressed vector | pair vector |
                  fortran-style list | linked list | red-black tree |
    Orientation:  row major | column major

This specification is open-ended. A new component type can be used with any component group simply by meeting the (deliberately minimal) interface specification for that group. For example, we could use an interval class for EltType. A combinatorial number of matrices can be constructed from the basic components already defined in MTL. We can use the current collection of just 21 MTL matrix components with the standard numerical types, for instance, to construct literally thousands of different matrix types!
The following code examples show how to specify MTL matrices corresponding to some of the more commonly used matrix types.

    typedef matrix<double, rectangle<>,
        compressed<>, row_major>::type CompressedRow;

    typedef matrix<double, rectangle<>,
        dense<>, column_major>::type FortranMatrix;

    typedef matrix<complex<double>, banded<>,
        packed<>, column_major>::type BLASComplexPacked;

    typedef matrix<double, banded<>,
        banded<>, column_major>::type BLASBanded;

To construct a matrix, we need only invoke the constructor with the matrix dimensions. We can create a matrix from data in a file (for example, in Matrix Market or Harwell Boeing format) by using the appropriate matrix stream. For interfacing to other libraries, we can also create MTL matrices from existing data by specifying that the matrix has “external” storage and passing the appropriate data pointers into the matrix constructor. Here are several examples.

    // Create an empty 1000 x 1000 matrix
    FortranMatrix A(1000, 1000);

    // Create a matrix from a file
    matrix_market_stream<double> mms(filename);
    CompressedRow B(mms);

    // Create a matrix from existing data
    typedef matrix<double, rectangle<>,
        compressed<int, external>, column_major>::type
        ExtCompRow;
    ExtCompRow C(m, n, nnz, values, indices, row_ptrs);


For the most part, MTL users do not need to access individual elements of a matrix once it is constructed because the MTL algorithms provide most operations. However, users may want to construct their own matrix algorithms. With this in mind, there are several ways to access the elements of an MTL matrix. The main access method is through iterators, as we showed in the previous matrix-vector multiply example. Users can also access an MTL matrix in more traditional ways. For example, a single matrix element can be accessed with the operator(i,j), 1D slices can be accessed with the operator[i], and a submatrix can be obtained with the sub_matrix(i,j,m,n) method. The type of a 1D slice depends on the matrix type (the slice might be a row, a column, or a diagonal). The rows() and columns() helper functions provide an interface for creating different views of the same matrix regardless of the underlying storage matrix layout.

    // Get an element
    FortranMatrix::value_type w = A(4,5);
    // Get a slice, in this case a column
    FortranMatrix::Column c = A[4];
    // Get a row explicitly
    FortranMatrix::Row r = rows(A)[3];

MTL matrices (and vectors) are reference counted, so the user need not be concerned with memory management.

MTL vectors. MTL provides both dense-vector and several sparse-vector types, implemented using standard STL components (the MTL layer adds reference counting and handle-based semantics). The MTL vectors export the same interface as STL containers, including the begin() and end() iterator accessors and the usual operator[i]. In addition, MTL vectors provide a convenient way to access subvector views.

    // subrange vector s refers to elements [10, 30)
    dense1D<double>::subrange_type s = x(10,30);

MTL algorithms and adaptors
The MTL provides basic (abstract) linear algebra functionality and also provides a number of utility functions. MTL thus provides functionality basically equivalent to that available with the BLAS Levels 1, 2, and 3 [4–6]. However, in contrast to the BLAS, MTL algorithms work with a larger number of matrix types (any matrix type that can be constructed within MTL), such as sparse matrices, and also with any element type, not just single, double, and complex.

Algorithms. Table 1 lists the principal algorithms included in MTL. In the table, alpha and s are scalars, x, y, and z are 1D containers, A, B, C, and E are matrices, and T is a triangular matrix. MTL does not define different operations for each permutation of transpose, scaling, and striding as is typically necessary in traditional libraries. Instead, only one algorithm is provided, but it can be combined with the use of strided and scaled vector adaptors, or the trans() modifier, to create the permutations as described later.

Adaptors. One novel aspect of the MTL algorithm interface is the way we use adaptors to provide algorithm flexibility at a small constant implementation cost and with little or no extra runtime cost. Algorithm flexibility improves performance by allowing a single function to carry out entire families of operations. For example, you might wish to scale a vector while adding it to another (y ← α × x + y). This is the operation carried out by the daxpy() BLAS function. By using adaptors, the MTL add() operation can handle any combination of scaling or striding without loss of performance. The adaptor modifies the behavior of the vector inside of the algorithm. In the case of scaling, this causes the elements to be multiplied as they are accessed. The call to scaled() does not perform the multiplications before the call to add(), as this would hurt performance, but instead the multiplications happen during the add(). With a good optimizing C++ compiler, there is no extra overhead induced by the adaptors.

    // y ← αx + y
    add(scaled(x, alpha), y);

    // equivalent operation using BLAS
    daxpy(n, alpha, xptr, 1, yptr, 1);

The transpose of a matrix can be used in an algorithm with the trans() adaptor. This adaptor performs a type conversion (swapping matrix orientations) at compile time, so there is zero runtime cost. The next example shows how the matrix-vector multiply algorithm (generically written to compute z ← A × x + y) can also compute y ← A^T × (αx) + βy.

    // y ← A^T × (αx) + βy
    mult(trans(A), scaled(x, alpha), scaled(y, beta), y);

    // equivalent operation using BLAS
    dgemv('T', M, N, alpha, A_ptr, A_ld, x_ptr, 1,
          beta, y_ptr, 1);



Table 1. MTL linear algebra operations.

    Function name          Operation
    scale(x,alpha)         $x \leftarrow \alpha x$
    scale(A,alpha)         $A \leftarrow \alpha A$
    add(x,y)               $y \leftarrow x + y$
    add(x,y,z)             $z \leftarrow x + y$
    add(x,y,z,w)           $w \leftarrow x + y + z$
    add(A,C)               $C \leftarrow A + C$
    add(A,B,C)             $C \leftarrow A + B$
    mult(A,x,y)            $y \leftarrow A \times x$
    mult(A,x,y,z)          $z \leftarrow A \times x + y$
    mult(A,B,C)            $C \leftarrow A \times B$
    mult(A,B,C,E)          $E \leftarrow A \times B + C$
    tri_solve(T,x,y)       $y \leftarrow T^{-1} \times x$
    tri_solve(T,B,C)       $C \leftarrow T^{-1} \times B$
    rank_one(x,y,A)        $A \leftarrow x \times y^T + A$
    rank_two(x,y,A)        $A \leftarrow x \times y^T + y \times x^T + A$
    s = dot(x,y)           $s \leftarrow x^T \cdot y$
    s = dot_conj(x,y)      $s \leftarrow x^T \cdot \bar{y}$
    transpose(A)           $A \leftarrow A^T$
    transpose(A,B)         $B \leftarrow A^T$
    s = one_norm(x)        $s \leftarrow \sum_i |x_i|$
    s = one_norm(A)        $s \leftarrow \max_j \left( \sum_i |a_{ij}| \right)$
    s = two_norm(x)        $s \leftarrow \left( \sum_i x_i^2 \right)^{1/2}$
    s = inf_norm(x)        $s \leftarrow \max_i |x_i|$
    s = inf_norm(A)        $s \leftarrow \max_i \left( \sum_j |a_{ij}| \right)$
    s = sum(x)             $s \leftarrow \sum_i x_i$
    s = max(x)             $s \leftarrow \max_i (x_i)$
    s = min(x)             $s \leftarrow \min_i (x_i)$
    i = max_index(x)       $i \leftarrow$ index of $\max_i |x_i|$
    ele_mult(x,y,z)        $z \leftarrow y \otimes x$
    ele_div(x,y,z)         $z \leftarrow y \oslash x$
    set(x,alpha)           $x_i \leftarrow \alpha$
    set(A,alpha)           $A \leftarrow \alpha$
    set_diag(A,alpha)      $A_{ii} \leftarrow \alpha$
    copy(x,y)              $y \leftarrow x$
    copy(A,B)              $B \leftarrow A$
    swap(x,y)              $y \leftrightarrow x$
    swap(A,B)              $B \leftrightarrow A$

Example: preconditioned GMRES(m)
One important use for MTL is to rapidly construct high-quality, high-performance numerical libraries. Although not necessary, a generic approach can be used when developing these libraries as well, resulting in reusable scientific software at a higher level. To demonstrate how you might use MTL for a nontrivial high-level library, we show the complete implementation of the preconditioned GMRES(m) algorithm [7] in Figure 1 (taken from our Iterative Template Library collection).
The basic algorithmic steps (corresponding to the GMRES algorithm [7]) are given in the comments, and the calls to MTL in the body of the algorithm should be fairly clear. Some of the other code might seem somewhat impenetrable at first glance, so we’ll quickly walk through the more difficult statements.
The algorithm parameterizes GMRES in some important ways, as the template statement shows on lines 1 and 2. The matrix and vector types are parameterized, so that any matrix type can be used. In particular, you can use matrices having any element type, real or complex. In fact, matrices without explicit elements at all (matrix-free operators [8]) can be used. For MTL matrices, the generic MTL mult() will generally suffice. For non-MTL matrices, or matrix-free operators, a suitably overloaded mult() must be provided.
There are also two other type parameterizations of interest, the Preconditioner and the Iteration. Similar to the matrix parameterization, the preconditioner type is parameterized so that arbitrary preconditioners can be used. The preconditioner simply must be callable with the solve() algorithm. The Iteration type parameter lets users control the stopping criterion for the algorithm. (ITL includes predefined stopping criteria.)
The using namespace mtl statement on line 6 lets us access MTL functions (which are all declared within the mtl namespace) without explicitly using the mtl:: scope operator. The use of a namespace helps prevent name clashes with other libraries and user code.
On line 7, we use the internally defined typedef for the value_type to determine the type of the individual elements of the matrix. All MTL matrix and vector classes have an accessible type member called value_type that specifies the type of the element data. By using this internal type, rather than a fixed type, we can make the algorithm generic with respect to element type.
Although the entire algorithm fits on a single page, it is not a toy implementation: it is both high quality and high performance. This is where the power of reusable software components becomes apparent. Now that a generic GMRES(m) has been implemented, tested, and debugged, programmers wanting to use GMRES are forever spared the work of implementing, testing, and debugging GMRES themselves, saving time and improving reliability.

Performance
To compare MTL with other available libraries (both public domain and vendor-supplied), we performed a set of experiments involving dense matrix-matrix multiplication, dense matrix-vector multiplication, and sparse matrix-vector multiplication.

Dense matrix-matrix multiplication. Figure 2a shows


1 template <cclass Matrix, class Vector, class VectorB, Figure 1. The Iterative Template Library
2 class Preconditioner, class Iteration> (ITL) implementation of the precondi-
3 int gmres (cconst Matrix &A, Vector &x,, const Vector B &b
tioned GMRES(m) algorithm. This algo-
4 const Preconditioner &M, int m, Iteration& outer)
5 { rithm computes an approximate solution
6 using namespace mtl; to Ax = b preconditioned with M. The
7 typedef typename Matrix::value_type T; restart value is specified by the parame-
8 typedef matrix<T, rectangle<>, dense<>,
column_major>::type InternalMatrix;
ter m.
9 InternalMatrix H(m+1, m), V(x.size(), m+1);
10 Vector s(m+1), w(x. size()), r(x.size()), u(x.size());
11 std::vector< givens_rotation<T> > rotations (m+1);
piler. We used all possible compiler-
12
13 mult(A, scaled(x, -1.0), b, w); optimization flags in all cases and
14 solve(M, w, r); // r0 = b – Ax0 cleared the cache between each trial.
15 typename Iteration::real beta = abs(two_norm(r))); To demonstrate portability across dif-
16
ferent architectures and compilers,
17 while (! outer.finished(beta)) { // outer iteration
18 copy (scaled(r, 1./beta), v[0 0]); // v1 = r0/r0 Figure 2b compares the performance
19 set (s, 0.0); of MTL with the Engineering and
20 0] = beta;
s[0 Scientific Subroutine Library (ESSL)
21 int j = 0;
22 Iteration inner (outer.normb(), m, outer.tol());
on an IBM RS/6000 590. In this case,
23 we compiled the MTL executable with
24 do { // Inner iteration the KCC and IBM xlc compilers.
25 mult(A, V[j], u);
26 solve(M, u, w);
27 for (iint i = 0; i <= j; i++) {
Dense and sparse matrix-vector
28 H(i,j) = dot_conj(w,V[I]); // hij =<Avj, vi> multiplication. Figure 3 shows per-
29 add(w, scaled(V[i], -h(i,j)), w); //v^k+1 = Avj — Σi=1 j
hijvi formance results obtained using the
30 } matrix-vector multiplication algorithm
31 H(j + 1, j) = two_norm(w); //hj+1,j = v^j+1
32 copy(scaled(w, 1./H(j + 1)), V[j+1]); //vj+1 = v^ j+1/hj+1,j
for dense and for sparse matrices, and
33 compares the performance to that ob-
34 // QR triangularization of H tained with nongeneric libraries. Fig-
35 for (iint i = 0; i < j; i++
ure 3a compares MTL’s dense matrix-
36 rotations [i].apply(H(i,j), H(i++1,j));
37 vector performance to the Netlib
38 rotations [j] = givens_rotation<T>(H(j,j), H(j+1,j)); BLAS (Fortran) and the Sun Perfor-
39 rotations [j].apply(H(j,j), H(j+1,j)); mance Library. Figure 3b compares
40 rotations [j].apply(s[i], s[i+ +1]);
MTL’s sparse matrix-vector perfor-
41
42 ++inner, ++outer, ++j; mance to Sparskit9 (Fortran), and the
43 } while (! Inner.finished(abs(s[j]))); NIST Sparse BLAS (C).10 We ran the
44 experiments on a Sun Ultra 30, used
45 // Form the approximate solution
46 tri_solve(tri_view<upper>()(H.sub_matrix(0 0, j, 0, j)), s);
sparse matrices from the MatrixMarket
47 mult(V.sub_matrix(0 0, x.size(), 0, j), s, x, x); collection, and did not clear the cache
48 between each matrix-vector timing
49 // Restart trial. This experiment focused on the
50 mult(A, scaled(x, -1.0), b, w);
51 solve(M, w, r);
algorithm’s pipeline behavior. If we had
52 beta = abs(two_norm(r)); cleared the cache, the bottleneck would
53 } have become memory bandwidth, and
54 we could not have seen differences in
55 return outer.error_code();
56 }
pipeline behavior. Blocking for cache is
not as important for matrix-vector
multiplication because there is no reuse
of matrix data.
the performance of dense matrix-matrix product for MTL, Fortran BLAS, and the Sun Performance Library, all obtained on a Sun Ultra 30. The experiment shows that MTL can compete with vendor-tuned libraries (on an algorithm that tends to get extra attention due to benchmarking). We compiled the MTL executables using Kuck and Associates C++, in conjunction with v.5.0 of the Solaris C compiler. We compiled the Fortran BLAS (obtained from Netlib) with v.5.0 of the Solaris Fortran 77 com-

Figure 2. Performance comparison of the MTL dense matrix-matrix product with other libraries on (a) Sun Ultra 30 and (b) IBM RS6000. [Plots: performance (Mflops) versus matrix size; (a) MTL, Fortran BLAS, Sun Perf Lib; (b) MTL, ESSL, Netlib.]

Figure 3. Performance of the MTL matrix-vector product applied to (a) column-oriented dense and (b) row-oriented sparse data structures compared with other libraries on Sun Ultra 30. [Plots: performance (Mflops) versus (a) N and (b) average nonzeroes per row; (a) MTL, Fortran BLAS, Sun Perf Lib; (b) MTL, SPARSKIT, NIST.]

The future of MTL
Although MTL’s core functionality is complete, in many ways our work has only begun. Using MTL as a foundation, we plan to develop several libraries with higher levels of functionality (similar to how Lapack [11] uses the BLAS). As mentioned earlier, we have already developed the first such library, a collection of iterative solvers called the Iterative Template Library. Current work focuses on sparse and dense linear solvers, eigenproblem routines, and SVD computations. In the meantime, we provide wrappers to give users a convenient MTL-style interface to Lapack. The growing MTL user group is actively building on top of MTL and contributing algorithms.

We are often asked about using overloaded operators for MTL. Although an operator-based syntax can be very handy for rapid prototyping, it is in some sense “syntactic sugar” that we felt to be orthogonal to MTL’s original goals. We don’t feel the present MTL syntax to be a significant drawback in terms of its original goal: applying generic programming to the domain of numerical linear algebra. Similarly, in terms of software-engineering practice (for example, for library development) the existing syntax is perfectly suitable. Nevertheless, an operator-based syntax can have value, so we are investigating an operator-based interface built on the expression template technology found in Blitz++ [12] and PETE [13].

Using an interpretive front end offers an alternative approach to rapid prototyping (and an operator-based syntax) with MTL. We have recently developed one such system (having a Matlab-like syntax) and are also investigating the use of other interpreted scripting languages (such as Python).

We continue to refine MTL and work on porting it to new compilers. Our (perhaps immodest) hope is that it will ultimately become suitable as a standard. We are currently working closely with the SGI STL team to better integrate MTL with the STL.

Beyond MTL, we are also investigating the application of

NOVEMBER/DECEMBER 1999 77
generic programming techniques to other problem domains such as graph algorithms, image and video processing, and parallel processing. We have made considerable progress in the first of these efforts, which produced the Generic Graph Component Library [14]. We report on GGCL elsewhere, but remark that it offers the same benefits as MTL: abstraction with high levels of performance. A GGCL implementation of minimum degree sparse matrix ordering matched the performance of the very best Fortran routines [15].
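The operator-based interface discussed earlier relies on expression templates. As a rough illustration of that technique only, the sketch below shows how `w = x + y + z` can be evaluated in a single loop without intermediate vectors. The types `Vec` and `Sum` are hypothetical and are not taken from MTL, Blitz++, or PETE; a real library would cover many more operations and operand types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Expression node: the element-wise sum of two operands, evaluated lazily.
template <class L, class R>
class Sum {
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return l_[i] + r_[i]; }
    std::size_t size() const { return l_.size(); }
private:
    const L& l_;  // references stay valid until the end of the full expression
    const R& r_;
};

// Minimal vector type (illustrative only).
class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    // Assigning from an expression runs one fused loop over the elements.
    template <class L, class R>
    Vec& operator=(const Sum<L, R>& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = e[i];
        return *this;
    }
private:
    std::vector<double> data_;
};

// operator+ builds expression nodes instead of computing results.
inline Sum<Vec, Vec> operator+(const Vec& a, const Vec& b) {
    return Sum<Vec, Vec>(a, b);
}

template <class L, class R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& a, const Vec& b) {
    return Sum<Sum<L, R>, Vec>(a, b);
}
```

With these definitions, `w = x + y + z;` compiles to one pass that computes `x[i] + y[i] + z[i]` per element, which is the property that makes an operator syntax compatible with high performance.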
References
1. A. Stepanov and M. Lee, The Standard Template Library, Tech. Report HPL-95-11, HP Laboratories, Menlo Park, Calif., 1995.
2. J.G. Siek and A. Lumsdaine, “The Matrix Template Library: A Generic Programming Approach to High Performance Numerical Linear Algebra,” Proc. Int’l Symp. Computing in Object-Oriented Parallel Environments, Springer Lecture Notes in Computer Science, pp. 59–70.
3. K. Czarnecki, Generative Programming: Principles and Techniques of Software Engineering Based on Automated Configuration and Fragment-Based Component Models, PhD thesis, Technische Universität, Ilmenau, Germany, 1998.
4. J. Dongarra et al., “Algorithm 656: An Extended Set of Basic Linear Algebra Subprograms,” ACM Trans. Mathematical Software, Vol. 14, No. 1, 1988, pp. 18–32.
5. J. Dongarra et al., “A Set of Level 3 Basic Linear Algebra Subprograms,” ACM Trans. Mathematical Software, Vol. 16, No. 1, 1990, pp. 1–17.
6. C. Lawson et al., “Basic Linear Algebra Subprograms for Fortran Usage,” ACM Trans. Mathematical Software, Vol. 5, No. 3, 1979, pp. 308–323.
7. Y. Saad and M. Schultz, “GMRES: A Generalized Minimum Residual Algorithm for Solving Nonsymmetric Linear Systems,” SIAM J. Scientific Statistical Computing, Vol. 7, No. 3, July 1986, pp. 856–869.
8. P.N. Brown and A.C. Hindmarsh, “Matrix-Free Methods for Stiff Systems of ODEs,” SIAM J. Numerical Analysis, Vol. 23, No. 3, June 1986, pp. 610–638.
9. Y. Saad, SPARSKIT: A Basic Tool Kit for Sparse Matrix Computations, tech. report, NASA Ames Research Center, Moffett Field, Calif., 1990.
10. K.A. Remington and R. Pozo, NIST Sparse BLAS User’s Guide, National Institute of Standards and Technology, Washington DC.
11. E. Anderson et al., “Lapack: A Portable Linear Algebra Package for High-Performance Computers,” Proc. Supercomputing ’90, IEEE Computer Soc. Press, Los Alamitos, Calif., 1990, pp. 1–10.
12. T.L. Veldhuizen, “Arrays in Blitz++,” Proc. Second Int’l Scientific Computing in Object-Oriented Parallel Environments (ISCOPE ’98), Lecture Notes in Computer Science, Springer-Verlag, New York, 1998, pp. 223–230.
13. S. Haney et al., Easy Expression Templates Using PETE, the Portable Expression Template Engine, Tech. Report LA-UR-99-777, Advanced Computing Laboratory, LANL, Los Alamos, N.M., 1999.
14. L.-Q. Lee, J.G. Siek, and A. Lumsdaine, “The Generic Graph Component Library,” OOPSLA ’99, IEEE Computer Soc. Press, Los Alamitos, Calif., 1999, to be published.
15. L.-Q. Lee, J.G. Siek, and A. Lumsdaine, “Generic Graph Component Library,” ISCOPE ’99, 1999, to be published.