Mathematics and Algorithms
Numbers
Types of Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
Numeric Quantities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Digits in Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Exact and Approximate Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Numerical Precision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Arbitrary-Precision Calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Arbitrary-Precision Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Machine-Precision Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Interval Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
Indeterminate and Infinite Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Controlling Numerical Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Algebraic Calculations
Symbolic Computation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Values for Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Transforming Algebraic Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Simplifying Algebraic Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Putting Expressions into Different Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Simplifying with Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Picking Out Pieces of Algebraic Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Controlling the Display of Large Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
Using Symbols to Tag Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Algebraic Manipulation
Structural Operations on Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
Finding the Structure of a Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Structural Operations on Rational Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Algebraic Operations on Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Polynomials Modulo Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Symmetric Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Polynomials over Algebraic Number Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Trigonometric Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Expressions Involving Complex Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Logical and Piecewise Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Simplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Using Assumptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Linear Algebra
Constructing Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Getting and Setting Pieces of Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
Scalars, Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
Operations on Scalars, Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
Multiplying Vectors and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
Vector Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Matrix Inversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Basic Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Solving Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
Advanced Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176
Tensors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
Sparse Arrays: Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
Calculus
Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
Total Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
Derivatives of Unknown Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218
The Representation of Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
Defining Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Indefinite Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Integrals That Can and Cannot Be Done . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
Definite Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
Integrals over Regions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
Manipulating Integrals in Symbolic Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
Differential Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
Integral Transforms and Related Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Generalized Functions and Related Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
Mathematical Functions
Naming Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Generic and Nongeneric Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
Numerical Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Piecewise Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 388
Pseudorandom Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
Integer and Number Theoretic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 394
Combinatorial Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 413
Elementary Transcendental Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 419
Functions That Do Not Have Unique Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Mathematical Constants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424
Orthogonal Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 425
Special Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
Elliptic Integrals and Elliptic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 449
Mathieu and Related Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
Working with Special Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
Numbers
Types of Numbers
Four underlying types of numbers are built into Mathematica.
Rational numbers always consist of a ratio of two integers, reduced to lowest terms.
In[1]:= 12344/2222

Out[1]= 6172/1111
Approximate real numbers are distinguished by the presence of an explicit decimal point.
In[2]:= 5456.
Out[2]= 5456.
You can distinguish different types of numbers in Mathematica by looking at their heads.
(Although numbers in Mathematica have heads like other expressions, they do not have explicit
elements which you can extract.)
The presence of an explicit decimal point makes Mathematica treat 123. as an approximate real number, with head Real.

In[7]:= Head[123.]

Out[7]= Real
If you use complex numbers extensively, there is one subtlety you should be aware of. When you enter a number like 123., Mathematica treats it as an approximate real number, but assumes that its imaginary part is exactly zero. Sometimes you may want to enter approximate complex numbers with imaginary parts that are zero, but only to a certain precision.
When the imaginary part is the exact integer 0, Mathematica simplifies complex numbers to
real ones.
In[10]:= Head[123 + 0 I]
Out[10]= Integer
Here the imaginary part is only zero to a certain precision, so Mathematica retains the complex
number form.
In[11]:= Head[123. + 0. I]
Out[11]= Complex
The distinction between complex numbers whose imaginary parts are exactly zero, or are only
zero to a certain precision, may seem like a pedantic one. However, when we discuss, for
example, the interpretation of powers and roots of complex numbers in "Functions That Do Not
Have Unique Values", the distinction will become significant.
One way to find out the type of a number in Mathematica is just to pick out its head using Head[expr]. For many purposes, however, it is better to use functions like IntegerQ which explicitly test for particular types. Functions like this are set up to return True if their argument is manifestly of the required type, and to return False otherwise. As a result, IntegerQ[x] will give False, unless x has an explicit integer value.
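A brief sketch of how heads and type-testing predicates behave (outputs shown as comments; exact display may vary by version):

```wolfram
Head[7/2]      (* Rational *)
IntegerQ[7]    (* True *)
IntegerQ[7.]   (* False: an approximate real number, even though its value is integral *)
IntegerQ[x]    (* False: x has no explicit integer value *)
```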
Complex Numbers
You can enter complex numbers in Mathematica just by including the constant I, equal to √-1. Make sure that you type a capital I.
If you are using notebooks, you can also enter I as ⅈ by typing Esc ii Esc (see "Mathematical Notation in Notebooks: Numerical Calculations"). The ⅈ form is normally what is used in output.
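A sketch of entering and combining complex numbers (output forms may vary slightly by version):

```wolfram
Sqrt[-4]             (* 2 I *)
(4 + 3 I) (2 - I)    (* 11 + 2 I *)
Conjugate[4 + 3 I]   (* 4 - 3 I *)
```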
Numeric Quantities
Mathematica knows that constants such as Pi are numeric quantities. It also knows that standard mathematical functions such as Log and Sin have numerical values when their arguments are numerical.
In general, Mathematica assumes that any function which has the attribute NumericFunction will yield numerical values when its arguments are numerical. All standard mathematical functions in Mathematica already have this attribute. But when you define your own functions, you can explicitly set the attribute to tell Mathematica to assume that these functions will have numerical values when their arguments are numerical.
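As a sketch, using a hypothetical symbol g with no definition of its own:

```wolfram
(* before the attribute is set, Mathematica cannot tell that g[Pi] is numeric *)
NumericQ[g[Pi]]                    (* False *)

SetAttributes[g, NumericFunction]
NumericQ[g[Pi]]                    (* True: g is now assumed numeric for numeric arguments *)
```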
Digits in Numbers
This gives a list of digits, together with the number of digits that appear to the left of the
decimal point.
In[2]:= RealDigits[123.4567890123456]

Out[2]= {{1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6}, 3}
Here is the binary digit sequence for 56, padded with zeros so that it is of total length 8.
In[3]:= IntegerDigits[56, 2, 8]

Out[3]= {0, 0, 1, 1, 1, 0, 0, 0}
This reconstructs the original number from its binary digit sequence.
In[4]:= FromDigits[%, 2]
Out[4]= 56
When the base is larger than 10, extra digits are represented by letters a–z.
You can do computations with numbers in base 16. Here the result is given in base 10.
In[11]:= 16^^fffaa2 + 16^^ff - 1
Out[11]= 16 776 096
You can give approximate real numbers, as well as integers, in other bases.
In[13]:= 2^^101.100101
Out[13]= 5.57813
"Output Formats for Numbers" describes how to print numbers in various formats. If you want
to create your own formats, you will often need to use MantissaExponent to separate the
pieces of real numbers.
This gives a list in which the mantissa and exponent of the number are separated.
In[17]:= MantissaExponent[3.45*10^125]

Out[17]= {0.345, 126}
Mathematica gives an exact result for 2^100, even though it has 31 decimal digits.
In[1]:= 2 ^ 100
Out[1]= 1 267 650 600 228 229 401 496 703 205 376
You can tell Mathematica to give you an approximate numerical result, just as a calculator would, by ending your input with //N. The N stands for "numerical". It must be a capital letter. "Special Ways to Input Expressions" will explain what the // means.
When you type in an integer like 7, Mathematica assumes that it is exact. If you type in a
number like 4.5, with an explicit decimal point, Mathematica assumes that it is accurate only to
a fixed number of decimal places.
This is taken to be an exact rational number, and reduced to its lowest terms.
In[5]:= 452/62

Out[5]= 226/31
Whenever you give a number with an explicit decimal point, Mathematica produces an approximate numerical result.
In[6]:= 452.3/62
Out[6]= 7.29516
Here again, the presence of the decimal point makes Mathematica give you an approximate
numerical result.
In[7]:= 452./62
Out[7]= 7.29032
When any number in an arithmetic expression is given with an explicit decimal point, you get an
approximate numerical result for the whole expression.
In[8]:= 1. + 452/62
Out[8]= 8.29032
Numerical Precision
As discussed in "Exact and Approximate Results", Mathematica can handle approximate real
numbers with any number of digits. In general, the precision of an approximate real number is
the effective number of decimal digits in it that are treated as significant for computations. The
accuracy is the effective number of these digits that appear to the right of the decimal point.
Note that to achieve full consistency in the treatment of numbers, precision and accuracy often
have values that do not correspond to integer numbers of digits.
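The examples below refer to a variable x holding a 30-digit approximate number. The original setup lines are not in this excerpt, but the digits shown in the outputs match π¹⁰, so a plausible reconstruction (a sketch, not the original input) is:

```wolfram
x = N[Pi^10, 30]
(* 93648.0474760830209737166901849 *)

Precision[x]
(* 30. *)
```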
The accuracy is lower since only some of the digits are to the right of the decimal point.
In[3]:= Accuracy[x]
Out[3]= 25.0285
This number has all its digits to the right of the decimal point.
In[4]:= x/10^6
Out[4]= 0.0936480474760830209737166901849
An approximate real number always has some uncertainty in its value, associated with digits
beyond those known. One can think of precision as providing a measure of the relative size of
this uncertainty. Accuracy gives a measure of the absolute size of the uncertainty.
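For a nonzero number, the two measures are related through the magnitude of the number: Precision[x] is approximately Accuracy[x] + log10|x|. A sketch, reusing the 30-digit x from above (redefined here so the example is self-contained):

```wolfram
x = N[Pi^10, 30];
{Precision[x], Accuracy[x], N[Log[10, Abs[x]]]}
(* {30., 25.0285, 4.97148}: precision = accuracy + log10|x| *)
```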
Mathematica is set up so that if a number x has uncertainty d, then its true value can lie anywhere in an interval of size d, from x - d/2 to x + d/2. An approximate number with accuracy a is defined to have uncertainty 10^-a, while a nonzero approximate number with precision p is defined to have uncertainty |x| 10^-p.
Precision[x]    -log10(d / |x|)
Accuracy[x]     -log10(d)
Adding or subtracting a quantity smaller than the uncertainty has no visible effect.
In[6]:= {x - 10^-26, x, x + 10^-26}

Out[6]= {93648.0474760830209737166901849, 93648.0474760830209737166901849, 93648.0474760830209737166901849}
As discussed in more detail below, machine numbers work by making direct use of the numerical capabilities of your underlying computer system. As a result, computations with them can often be done more quickly. They are however much less flexible than arbitrary-precision numbers, and difficult numerical analysis can be needed to determine whether results obtained with them are correct.
On this computer, machine numbers have slightly less than 16 decimal digits.
In[10]:= $MachinePrecision
Out[10]= 15.9546
When you enter an approximate real number, Mathematica has to decide whether to treat it as
a machine number or an arbitrary-precision number. Unless you specify otherwise, if you give
less than $MachinePrecision digits, Mathematica will treat the number as machine precision,
and if you give more digits, it will treat the number as arbitrary precision.
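A sketch of how the number of digits you type determines the representation (the exact threshold depends on the machine):

```wolfram
Precision[3.14159]                   (* MachinePrecision: fewer digits than $MachinePrecision *)
Precision[3.14159265358979323846]    (* 21.: more digits entered, so an arbitrary-precision number *)
```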
When Mathematica prints out numbers, it usually tries to give them in a form that will be as
easy as possible to read. But if you want to take numbers that are printed out by Mathematica,
and then later use them as input to Mathematica, you need to make sure that no information
gets lost.
In standard output form, Mathematica prints a number like this to six digits.
In[11]:= N[Pi]
Out[11]= 3.14159
In input form, Mathematica explicitly indicates the precision of the number, and gives extra
digits to make sure the number can be reconstructed correctly.
In[14]:= InputForm[%]
Out[14]//InputForm= 3.1415926535897932384626433832795028842`20.
InputForm[expr, NumberMarks -> True]         use ` marks in all approximate numbers
InputForm[expr, NumberMarks -> Automatic]    use ` only in arbitrary-precision numbers
InputForm[expr, NumberMarks -> False]        never use ` marks
The default setting for the NumberMarks option, both in InputForm and in functions such as
ToString and OpenWrite is given by the value of $NumberMarks. By resetting $NumberMarks,
therefore, you can globally change the way that numbers are printed in InputForm.
This makes Mathematica by default always include number marks in input form.
In[16]:= $NumberMarks = True
Out[16]= True
Even with no number marks, InputForm still uses * ^ for scientific notation.
In[18]:= InputForm[N[Exp[600], 20], NumberMarks -> False]
Out[18]//InputForm= 3.7730203009299398234*^260
In doing numerical computations, it is inevitable that you will sometimes end up with results that are less precise than you want. Particularly when you get numerical results that are very close to zero, you may well want to assume that the results should be exactly zero. The function Chop allows you to replace approximate real numbers that are close to zero by the exact integer 0.
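A sketch of Chop in use (the default tolerance is 10^-10; a second argument overrides it):

```wolfram
Chop[{1.0, 1.2*^-11, 3.0}]   (* {1., 0, 3.}: entries below the default tolerance become exact 0 *)
Chop[0.001, 0.01]            (* 0: an explicit tolerance of 0.01 *)
```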
Arbitrary-Precision Calculations
When you use N to get a numerical result, Mathematica does what a standard calculator would do: it gives you a result to a fixed number of significant figures. You can also tell Mathematica exactly how many significant figures to keep in a particular calculation. This allows you to get numerical results in Mathematica to any degree of precision.
This gives the numerical value of π to a fixed number of significant digits. Typing N[Pi] is exactly equivalent to Pi // N.
In[1]:= N[Pi]
Out[1]= 3.14159
Here is √7 to 30 digits.
In[3]:= N[Sqrt[7], 30]
Out[3]= 2.64575131106459059050161575364
Doing any kind of numerical calculation can introduce small roundoff errors into your results.
When you increase the numerical precision, these errors typically become correspondingly
smaller. Making sure that you get the same answer when you increase numerical precision is
often a good way to check your results.
The quantity e^(π√163) turns out to be very close to an integer. To check that the result is not, in fact, an integer, you have to use sufficient numerical precision.
In[4]:= N[Exp[Pi Sqrt[163]], 40]

Out[4]= 2.625374126407687439999999999992500725972*10^17
Arbitrary-Precision Numbers
When you do calculations with arbitrary-precision numbers, Mathematica keeps track of precision at all points. In general, Mathematica tries to give you results which have the highest possible precision, given the precision of the input you provided.
When you do a computation, Mathematica keeps track of which digits in your result could be affected by unknown digits in your input. It sets the precision of your result so that no affected digits are ever included. This procedure ensures that all digits returned by Mathematica are correct, whatever the values of the unknown digits may be.
If you give input only to a few digits of precision, Mathematica cannot give you such high-precision output.
In[5]:= N[Gamma[0.142], 30]
Out[5]= 6.58965
If you want Mathematica to assume that the argument is exactly 142/1000, then you have to say so explicitly.
In[6]:= N[Gamma[142/1000], 30]
Out[6]= 6.58964729492039788328481917496
In many computations, the precision of the results you get progressively degrades as a result of
"roundoff error". A typical case of this occurs if you subtract two numbers that are close
together. The result you get depends on high-order digits in each number, and typically has far
fewer digits of precision than either of the original numbers.
Both input numbers have a precision of around 20 digits, but the result has much lower precision.
In[7]:= 1.11111111111111111111 - 1.11111111111111111000
Out[7]= 1.1*10^-18
Adding extra digits in one number but not the other is not sufficient to allow extra digits to be
found in the result.
In[8]:= 1.11111111111111111111345 - 1.11111111111111111000
Out[8]= 1.1*10^-18
The precision of the output from a function can depend in a complicated way on the precision of the input. Functions that vary rapidly typically give less precise output, since the variation of the output associated with uncertainties in the input is larger. Functions that are close to constants can actually give output that is more precise than their input.
Here is a case where the output is less precise than the input.
In[9]:= Sin[111111111.0000000000000000]
Out[9]= -0.2975351033349432
The result you get by adding the exact integer 1 has a higher precision.
In[11]:= 1+%
Out[11]= 1.0000000000000000042483542552915889953
It is worth realizing that different ways of doing the same calculation can end up giving you
results with very different precisions. Typically, if you once lose precision in a calculation, it is
essentially impossible to regain it; in losing precision, you are effectively losing information
about your result.
The fact that different ways of doing the same calculation can give you different numerical
answers means, among other things, that comparisons between approximate real numbers
must be treated with care. In testing whether two real numbers are "equal", Mathematica
effectively finds their difference, and tests whether the result is "consistent with zero" to the
precision given.
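A sketch of this tolerant notion of equality:

```wolfram
N[Sqrt[2], 30]^2 == 2        (* True: the difference is consistent with zero at this precision *)
3 == 3.000000000000000000    (* True: exact and approximate numbers can compare equal *)
```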
The internal algorithms that Mathematica uses to evaluate mathematical functions are set up to
maintain as much precision as possible. In most cases, built-in Mathematica functions will give
you results that have as much precision as can be justified on the basis of your input. In some
cases, however, it is simply impractical to do this, and Mathematica will give you results that
have lower precision. If you give higher-precision input, Mathematica will use higher precision
in its internal calculations, and you will usually be able to get a higher-precision result.
If you start with an expression that contains only integers and other exact numeric quantities, then N[expr, n] will in almost all cases succeed in giving you a result to n digits of precision. You should realize, however, that to do this Mathematica sometimes has to perform internal intermediate calculations to much higher precision.
The global variable $MaxExtraPrecision specifies how many additional digits should be allowed
in such intermediate calculations.
Mathematica automatically increases the precision that it uses internally in order to get the
correct answer here.
In[18]:= N[Sin[10^40], 30]
Out[18]= -0.569633400953636327308034181574
Using the default setting $MaxExtraPrecision = 50, Mathematica cannot get the correct
answer here.
In[19]:= N[Sin[10^100], 30]
This tells Mathematica that it can use more digits in its internal calculations.
In[20]:= $MaxExtraPrecision = 200
Out[20]= 200
Even when you are doing computations that give exact results, Mathematica still occasionally
uses approximate numbers for some of its internal calculations, so that the value of
$MaxExtraPrecision can thus have an effect.
With the default value of $MaxExtraPrecision , Mathematica cannot work this out.
In[24]:= Sin[Exp[200]] > 0

N::meprec : Internal precision limit $MaxExtraPrecision = 50.` reached while evaluating -Sin[E^200].

Out[24]= Sin[E^200] > 0
In doing calculations that degrade precision, it is possible to end up with numbers that have no significant digits at all. But even in such cases, Mathematica still maintains information on the accuracy of the numbers. Given a number with no significant digits, but accuracy a, Mathematica can then still tell that the actual value of the number must be in the range from -10^-a/2 to +10^-a/2. Mathematica by default prints such numbers in the form 0.*10^e.
Adding the result to an exact 1 gives a number with quite high precision.
In[29]:= 1 + %%
Out[29]= 1.000000000000000000000
One subtlety in characterizing numbers by their precision is that any number that is consistent
with zero must be treated as having zero precision. The reason for this is that such a number
has no digits that can be recognized as significant, since all its known digits are just zero.
But it still has a definite accuracy, that characterizes the uncertainty in it.
In[32]:= Accuracy[d]
Out[32]= 19.5029
If you do computations whose results are likely to be near zero, it can be convenient to specify
the accuracy, rather than the precision, that you want to get.
N::meprec : Internal precision limit $MaxExtraPrecision = 50.` reached while evaluating -ArcCot[3] + ArcTan[1/3].

Out[35]= 0.*10^-71
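In cases like this, where the exact value is zero, you can request an accuracy goal with the form N[expr, {prec, acc}]; a sketch, leaving the precision goal as Infinity so that only the accuracy goal applies:

```wolfram
(* ArcTan[1/3] - ArcCot[3] is exactly zero; ask for 20 digits of accuracy rather than precision *)
N[ArcTan[1/3] - ArcCot[3], {Infinity, 20}]
```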
When Mathematica works out the potential effect of unknown digits in arbitrary-precision numbers, it assumes by default that these digits are completely independent in different numbers. While this assumption will never yield too high a precision in a result, it may lead to unnecessary loss of precision.
In particular, if two numbers are generated in the same way in a computation, some of their
unknown digits may be equal. Then, when these numbers are, for example, subtracted, the
unknown digits may cancel. By assuming that the unknown digits are always independent,
however, Mathematica will miss such cancellations.
This quantity has lower precision, since Mathematica assumes that the unknown digits in each
number d are independent.
In[39]:= Precision[(1 + d) - d]
Out[39]= 34.0126
The precision of the result is now about 44 digits, rather than 34.
In[42]:= Precision[%]
Out[42]= 44.0126
SetPrecision works by adding digits which are zero in base 2. Sometimes, Mathematica stores
slightly more digits in an arbitrary-precision number than it displays, and in such cases,
SetPrecision will use these extra digits before introducing zeros.
This creates a number with a precision of 40 decimal digits. The extra digits come from conversion to base 10.
In[43]:= SetPrecision[0.4, 40]
Out[43]= 0.4000000000000000222044604925031308084726
If you set $MaxPrecision = n as well as $MinPrecision = n, then you can force all arbitrary-precision numbers to have a fixed precision of n digits. In effect, what this does is to make Mathematica treat arbitrary-precision numbers in much the same way as it treats machine numbers, but with more digits of precision.
Fixed-precision computation can make some calculations more efficient, but without careful
analysis you can never be sure how many digits are correct in the results you get.
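A sketch of fixed-precision arithmetic applied locally (Block restores the global settings afterward):

```wolfram
Block[{$MinPrecision = 20, $MaxPrecision = 20},
 Precision[N[1/7, 20]^100]]
(* 20. with fixed precision; about 18 otherwise, since x^100 loses roughly log10(100) digits *)
```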
The first few digits are correct, but the rest are wrong.
In[47]:= Evaluate[1 + k] - 1

Out[47]= 8.7565107626963908935*10^-27
Machine-Precision Numbers
Whenever machine-precision numbers appear in a calculation, the whole calculation is typically
done in machine precision. Mathematica will then give machine-precision numbers as the result.
Whenever the input contains any machine-precision numbers, Mathematica does the computation to machine precision.
In[1]:= 1.4444444444444444444 ^ 5.7
Out[1]= 8.13382
The fact that you can get spurious digits in machine-precision numerical calculations with Mathematica is in many respects quite unsatisfactory. The ultimate reason, however, that Mathematica uses fixed precision for these calculations is a matter of computational efficiency.
Mathematica is usually set up to insulate you as much as possible from the details of the computer system you are using. In dealing with machine-precision numbers, you would lose too much, however, if Mathematica did not make use of some specific features of your computer.
The important point is that almost all computers have special hardware or microcode for doing
floating-point calculations to a particular fixed precision. Mathematica makes use of these
features when doing machine-precision numerical calculations.
The typical arrangement is that all machine-precision numbers in Mathematica are represented
as "double-precision floating-point numbers" in the underlying computer system. On most
current computers, such numbers contain a total of 64 binary bits, typically yielding 16 decimal
digits of mantissa.
The main advantage of using the built-in floating-point capabilities of your computer is speed.
Arbitrary-precision numerical calculations, which do not make such direct use of these capabili-
ties, are usually many times slower than machine-precision calculations.
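The speed difference can be observed directly; a rough comparison (the loop count is arbitrary, and the actual timings depend on the machine):

```mathematica
(* Machine-precision arithmetic uses the hardware floating-point unit *)
Timing[Do[Sqrt[2.], {10^6}]]

(* Arbitrary-precision arithmetic is done in software and is typically
   many times slower *)
Timing[Do[Sqrt[N[2, 30]], {10^6}]]
```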
There are several disadvantages of using built-in floating-point capabilities. One already
mentioned is that it forces all numbers to have a fixed precision, independent of what precision
can be justified for them.
A second disadvantage is that the treatment of machine-precision numbers can vary slightly
from one computer system to another. In working with machine-precision numbers, Mathemat-
ica is at the mercy of the floating-point arithmetic system of each particular computer. If float-
ing-point arithmetic is done differently on two computers, you may get slightly different results
for machine-precision Mathematica calculations on those computers.
This gives the value of $MachineEpsilon for the computer system on which these examples
are run.
In[7]:= $MachineEpsilon
Although this prints as 1., Mathematica knows that the result is larger than 1.
In[8]:= 1. + $MachineEpsilon
Out[8]= 1.
In this case, however, the result is not distinguished from 1. to machine precision.
In[12]:= % // InputForm
Out[12]//InputForm= 1.
Machine numbers have not only limited precision, but also limited magnitude. If you generate a
number which lies outside the range specified by $MinMachineNumber and $MaxMachineNumber,
Mathematica will automatically convert the number to arbitrary-precision form.
This is the maximum machine-precision number which can be handled on the computer system
used for this example.
In[14]:= $MaxMachineNumber
Here is another computation whose result is outside the range of machine-precision numbers.
In[16]:= [email protected]
Interval Arithmetic
The square of any number between -2 and +5 is always between 0 and 25.
In[2]:= Interval[{-2, 5}]^2
Out[2]= Interval[{0, 25}]
IntervalUnion[interval1, interval2, …]   find the union of several intervals
IntervalIntersection[interval1, interval2, …]   find the intersection of several intervals
IntervalMemberQ[interval, x]   test whether the point x lies within an interval
IntervalMemberQ[interval1, interval2]   test whether interval2 lies completely within interval1
Operations on intervals.
You can use Max and Min to find the end points of intervals.
In[8]:= Max[%]
Out[8]= 5
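The operations in the table above can be combined freely; a few illustrative evaluations:

```mathematica
(* The union of two disjoint intervals keeps both components *)
IntervalUnion[Interval[{1, 2}], Interval[{3, 4}]]

(* The intersection of overlapping intervals *)
IntervalIntersection[Interval[{0, 25}], Interval[{10, 30}]]  (* Interval[{10, 25}] *)

(* Membership tests for a point and for a whole interval *)
IntervalMemberQ[Interval[{0, 25}], 10]                       (* True *)
IntervalMemberQ[Interval[{0, 25}], Interval[{1, 2}]]         (* True *)
```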
You can use intervals not only with exact quantities but also with approximate numbers. Even
with machine-precision numbers, Mathematica always tries to do rounding in such a way as to
preserve the validity of results.
This shows explicitly the interval treated by Mathematica as the machine-precision number 0.
In[10]:= Interval[0.]
Out[10]= Interval[{-2.22507×10^-308, 2.22507×10^-308}]
This shows the corresponding interval around 100., shifted back to zero.
In[11]:= [email protected] - 100
-14 -14
Out[11]= IntervalA9-1.42109 10 , 1.42109 10 =E
If you type in an expression like 0/0, Mathematica prints a message, and returns the result
Indeterminate.
In[1]:= 0/0
Power::infy : Infinite expression 1/0 encountered.
Out[1]= Indeterminate
If you ever try to use Indeterminate in an arithmetic computation, you always get the result
Indeterminate. A single indeterminate expression effectively "poisons" any arithmetic
computation. (The symbol Indeterminate plays a role in Mathematica similar to the "not a
number" object in the IEEE Floating Point Standard.)
The usual laws of arithmetic simplification are suspended in the case of Indeterminate.
In[2]:= Indeterminate - Indeterminate
Out[2]= Indeterminate
You can use Check inside a program to test whether warning messages are generated in a
computation.
In[4]:= Check[(7 - 7)/(8 - 8), meaningless]
Power::infy : Infinite expression 1/0 encountered.
Out[4]= meaningless
There are many situations where it is convenient to be able to do calculations with infinite
quantities. The symbol Infinity in Mathematica represents a positive infinite quantity. You can
use it to specify such things as limits of sums and integrals. You can also do some arithmetic
calculations with it.
If you try to find the difference between two infinite quantities, you get an indeterminate result.
In[7]:= Infinity - Infinity
Out[7]= Indeterminate
There are a number of subtle points that arise in handling infinite quantities. One of them
concerns the "direction" of an infinite quantity. When you do an infinite integral, you typically
think of performing the integration along a path in the complex plane that goes to infinity in
some direction. In this case, it is important to distinguish different versions of infinity that
correspond to different directions in the complex plane. +∞ and -∞ are two examples, but for
some purposes one also needs i∞, and so on.
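Directed infinities are represented with the standard function DirectedInfinity; the section does not show it explicitly, so these examples are illustrative:

```mathematica
(* Infinity is the directed infinity along the positive real axis *)
FullForm[Infinity]         (* DirectedInfinity[1] *)

(* An infinity in the direction of the imaginary unit *)
DirectedInfinity[I]        (* displayed as I ∞ *)

(* ComplexInfinity is an infinite quantity with undetermined direction *)
FullForm[ComplexInfinity]  (* DirectedInfinity[] *)
```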
Although the notion of a "directed infinity" is often useful, it is not always available. If you type
in 1/0, you get an infinite result, but there is no way to determine the "direction" of the infinity.
In[9]:= 1/0
Power::infy : Infinite expression 1/0 encountered.
Out[9]= ComplexInfinity
NHoldRest prevent all but the first argument from being affected
Usually N goes inside functions and gets applied to each of their arguments.
In[1]:= N[f[2/3, Pi]]
Out[1]= f[0.666667, 3.14159]
Algebraic Calculations
Symbolic Computation
One of the important features of Mathematica is that it can do symbolic, as well as numerical
calculations. This means that it can handle algebraic formulas as well as numbers.
Numerical computation   62 + 3 - 1 ⟶ 64
Symbolic computation   3 x - x + 2 ⟶ 2 + 2 x
Out[3]= -1 + 2 x + x^3
Mathematica automatically carries out basic algebraic simplifications. Here it combines x^2 and
-4 x^2 to get -3 x^2.
In[4]:= x^2 + x - 4 x^2
Out[4]= x - 3 x^2
You can type in any algebraic expression, using the operators listed in "Arithmetic". You can
use spaces to denote multiplication. Be careful not to forget the space in x y. If you type in xy
with no space, Mathematica will interpret this as a single symbol, with the name xy, not as a
product of the two symbols x and y.
Mathematica rearranges and combines terms using the standard rules of algebra.
In[5]:= x y + 2 x^2 y + y^2 x^2 - 2 y x
Out[5]= -x y + 2 x^2 y + x^2 y^2
Out[7]= 4 - 3 x^2 + x^3 + 8 y - 8 x y + 2 x^2 y
When you type in more complicated expressions, it is important that you put parentheses in the
right places. Thus, for example, you have to give the expression x^(4y) in the form x^(4 y). If you
leave out the parentheses, you get (x^4) y instead. It never hurts to put in too many parentheses,
but to find out exactly when you need to use parentheses, look at "Operator Input Forms".
When you type in an expression, Mathematica automatically applies its large repertoire of rules
for transforming expressions. These rules include the standard rules of algebra, such as x - x = 0,
together with much more sophisticated rules involving higher mathematical functions.
Mathematica uses standard rules of algebra to replace (Sqrt[1 + x])^4 by (1 + x)^2.
In[10]:= Sqrt[1 + x]^4
Out[10]= (1 + x)^2
Mathematica knows no rules for this expression, so it leaves the expression in the original form
you gave.
In[11]:= Log[1 + Cos[x]]
Out[11]= Log[1 + Cos[x]]
The notion of transformation rules is a very general one. In fact, you can think of the whole of
Mathematica as simply a system for applying a collection of transformation rules to many
different kinds of expressions.
The general principle that Mathematica follows is simple to state. It takes any expression you
input, and gets results by applying a succession of transformation rules, stopping when it
knows no more transformation rules that can be applied.
Take any expression, and apply transformation rules until the result no longer changes.
Often, however, you need to replace a symbol like x with a definite "value". Sometimes this
value will be a number; often it will be another expression.
To take an expression such as 1 + 2 x and replace the symbol x that appears in it with a definite
value, you can create a Mathematica transformation rule, and then apply this rule to the expres-
sion. To replace x with the value 3, you would create the transformation rule x -> 3. You must
type -> as a pair of characters, with no space in between. You can think of x -> 3 as being a
rule in which "x goes to 3".
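For example, applying the rule x -> 3 with the replacement operator gives:

```mathematica
(* Apply the transformation rule x -> 3 to the expression 1 + 2 x *)
1 + 2 x /. x -> 3        (* 7 *)

(* The replacement value can itself be an expression *)
1 + 2 x /. x -> y + 2    (* 5 + 2 y *)
```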
To apply a transformation rule to a particular Mathematica expression, you type expr /. rule. The
"replacement operator" /. is typed as a pair of characters, with no space in between.
You can replace x with any expression. Here every occurrence of x is replaced by 2 - y.
In[2]:= 1 + x + x^2 /. x -> 2 - y
Out[2]= 3 + (2 - y)^2 - y
Here is a transformation rule. Mathematica treats it like any other symbolic expression.
In[3]:= x -> 3 + y
Out[3]= x -> 3 + y
This applies the transformation rule on the previous line to the expression x^2 - 9.
In[4]:= x^2 - 9 /. %
Out[4]= -9 + (3 + y)^2
Out[5]= (4 - a) (2 + a)^2
The replacement operator /. allows you to apply transformation rules to a particular expres-
sion. Sometimes, however, you will want to define transformation rules that should always be
applied. For example, you might want to replace x with 3 whenever x occurs.
As discussed in "Defining Variables", you can do this by assigning the value 3 to x using x = 3.
Once you have made the assignment x = 3, x will always be replaced by 3, whenever it
appears.
Now x is replaced by 1 + a.
In[9]:= x^2 - 1
Out[9]= -1 + (1 + a)^2
You can define the value of a symbol to be any expression, not just a number. You should
realize that once you have given such a definition, the definition will continue to be used when-
ever the symbol appears, until you explicitly change or remove the definition. For most people,
forgetting to remove values you have assigned to symbols is the single most common source of
mistakes in using Mathematica.
A symbol such as x can serve many different purposes in Mathematica, and in fact, much of the
flexibility of Mathematica comes from being able to mix these purposes at will. However, you
need to keep some of the different uses of x straight in order to avoid making mistakes. The
most important distinction is between the use of x as a name for another expression, and as a
symbolic variable that stands only for itself.
Traditional programming languages that do not support symbolic computation allow variables to
be used only as names for objects, typically numbers, that have been assigned as values for
them. In Mathematica, however, x can also be treated as a purely formal variable, to which
various transformation rules can be applied. Of course, if you explicitly give a definition, such as
x = 3, then x will always be replaced by 3, and can no longer serve as a formal variable.
You should understand that explicit definitions such as x = 3 have a global effect. On the other
hand, a replacement such as expr /. x -> 3 affects only the specific expression expr. It is usually
much easier to keep things straight if you avoid using explicit definitions except when abso-
lutely necessary.
You can always mix replacements with assignments. With assignments, you can give names to
expressions in which you want to do replacements, or to rules that you want to use to do the
replacements.
Out[13]= 1 + x^2
Out[15]= 1 + 25 a^2
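The inputs that produced the two outputs above were lost from this copy; a plausible reconstruction of the idea, assuming a name t for the expression (the symbol t and the particular rule are illustrative):

```mathematica
(* Give a name to an expression in which replacements will be done *)
t = 1 + x^2      (* 1 + x^2 *)

(* Apply a replacement rule to the named expression *)
t /. x -> 5 a    (* 1 + 25 a^2 *)
```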
This finds the value of t when x is replaced by Pi, and then evaluates the result numerically.
In[16]:= t /. x -> Pi // N
Out[16]= 10.8696
Expand gives the "expanded form", with products and powers multiplied out.
In[1]:= Expand[(1 + x)^2]
Out[1]= 1 + 2 x + x^2
Out[2]= (1 + x)^2
Out[4]= (1 + x + 3 y)^4
There are some cases, though, where Factor can give you more complicated expressions.
In[5]:= Factor[x^10 - 1]
Out[5]= (-1 + x) (1 + x) (1 - x + x^2 - x^3 + x^4) (1 + x + x^2 + x^3 + x^4)
Out[6]= -1 + x^10
the "simplest form", a worthwhile practical procedure is to look at many different forms of an
expression, and pick out the one that involves the smallest number of parts.
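One way to compare the sizes of different forms is the standard function LeafCount, which counts the subexpressions in an expression; the particular polynomial here is illustrative:

```mathematica
(* The factored form of this expression has far fewer parts than the
   expanded form, so it would usually be preferred *)
LeafCount[(1 + x)^10]
LeafCount[Expand[(1 + x)^10]]
```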
Out[1]= (1 + x)^2
Simplify leaves x^10 - 1 in expanded form, since for this expression, the factored form is
larger.
In[2]:= Simplify[x^10 - 1]
Out[2]= -1 + x^10
You can often use Simplify to "clean up" complicated expressions that you get as the results
of computations.
Here is the integral of 1/(x^4 - 1). Integrals are discussed in more detail in "Integration".
In[3]:= Integrate[1/(x^4 - 1), x]
Out[3]= -ArcTan[x]/2 + Log[-1 + x]/4 - Log[1 + x]/4
Differentiating the result from Integrate should give back your original expression. In this
case, as is common, you get a more complicated version of the expression.
In[4]:= D[%, x]
Out[4]= 1/(4 (-1 + x)) - 1/(4 (1 + x)) - 1/(2 (1 + x^2))
Simplify succeeds in getting back the original, simpler, form of the expression.
In[5]:= Simplify[%]
Out[5]= 1/(-1 + x^4)
Simplify is set up to try various standard algebraic transformations on the expressions you
give. Sometimes, however, it can take more sophisticated transformations to make progress in
finding the simplest form of an expression.
FullSimplify tries a much wider range of transformations, involving not only algebraic func-
tions, but also many other kinds of functions.
For fairly small expressions, FullSimplify will often succeed in making some remarkable
simplifications. But for larger expressions, it can become unmanageably slow.
The reason for this is that to do its job, FullSimplify effectively has to try combining every
part of an expression with every other, and for large expressions the number of cases that it
has to consider can be astronomically large.
Simplify also has a difficult task to do, but it is set up to avoid some of the most time-consum-
ing transformations that are tried by FullSimplify. For simple algebraic calculations, there-
fore, you may often find it convenient to apply Simplify quite routinely to your results.
In more complicated calculations, however, even Simplify, let alone FullSimplify, may end
up needing to try a very large number of different forms, and therefore taking a long time. In
such cases, you typically need to do more controlled simplification, and use your knowledge of
the form you want to get to guide the process.
In many applications, the most common of these functions are Expand, Factor and Simplify.
However, particularly when you have rational expressions that contain quotients, you may need
to use other functions.
Expand expands out the numerator, but leaves the denominator in factored form.
In[2]:= Expand[e]
Out[2]= 2/((-3 + x)^2 (1 + x)) - (3 x)/((-3 + x)^2 (1 + x)) + x^3/((-3 + x)^2 (1 + x))
Apart breaks the expression apart into terms with simple denominators.
In[5]:= Apart[%]
Out[5]= 1 + 5/(-3 + x)^2 + 19/(4 (-3 + x)) + 1/(4 (1 + x))
According to Simplify, this is the simplest way to write the original expression.
In[7]:= Simplify[e]
Out[7]= ((-1 + x)^2 (2 + x))/((-3 + x)^2 (1 + x))
Getting expressions into the form you want is something of an art. In most cases, it is best
simply to experiment, trying different transformations until you get what you want. Often you
will be able to use palettes in the front end to do this.
When you have an expression with a single variable, you can choose to write it as a sum of
terms, a product, and so on. If you have an expression with several variables, there is an even
wider selection of possible forms. You can, for example, choose to group terms in the expres-
sion so that one or another of the variables is "dominant".
As we have seen, even when you restrict yourself to polynomials and rational expressions,
there are many different ways to write any particular expression. If you consider more compli-
cated expressions, involving, for example, higher mathematical functions, the variety of possi-
ble forms becomes still greater. As a result, it is totally infeasible to have a specific function
built into Mathematica to produce each possible form. Rather, Mathematica allows you to con-
struct arbitrary sets of transformation rules for converting between different forms. Many Mathe-
matica packages include such rules; the details of how to construct them for yourself are given
in "Transformation Rules and Definitions".
There are nevertheless a few additional built-in Mathematica functions for transforming
expressions.
This expands out the trigonometric expression, writing it so that all functions have argument x.
In[12]:= TrigExpand[Tan[x] Cos[2 x]]
Out[12]= (3/2) Cos[x] Sin[x] - Tan[x]/2 - (1/2) Sin[x]^2 Tan[x]
This expands the sine assuming that x and y are both real.
In[15]:= ComplexExpand[Sin[x + I y]]
Out[15]= Cosh[y] Sin[x] + I Cos[x] Sinh[y]
The transformations on expressions done by functions like Expand and Factor are always
correct, whatever values the symbolic variables in the expressions may have. Sometimes,
however, it is useful to perform transformations that are only correct for some possible values
of symbolic variables. One such transformation is performed by PowerExpand.
Out[17]= Sqrt[x y]
Out[18]= Sqrt[x] Sqrt[y]
Mathematica does not automatically simplify this, since it is only true for some values of x.
In[1]:= Simplify[Sqrt[x^2]]
Out[1]= Sqrt[x^2]
This tells Simplify to make the assumption x > 0, so that simplification can proceed.
In[3]:= Simplify[Sqrt[x^2], x > 0]
Out[3]= x
Out[4]= 2 a + 2 Sqrt[a - Sqrt[-b]] Sqrt[a + Sqrt[-b]]
Out[5]= Sqrt[2] Sqrt[a + Sqrt[a^2 + b]]
Integers integers
This uses the fact that sin(x), but not arcsin(x), is real when x is real.
In[10]:= Simplify[Re[{Sin[x], ArcSin[x]}], Element[x, Reals]]
Out[10]= {Sin[x], Re[ArcSin[x]]}
Out[1]= 1 + 6 x + 9 x^2 + 8 y^2 + 24 x y^2 + 16 y^4
Out[2]= 6 + 24 y^2
Out[4]= 8 y^2
You may notice that the function Part[expr, n] used to pick out the nth term in a sum is the
same as the function described in "Manipulating Elements of Lists" for picking out elements in
lists. This is no coincidence. In fact, as discussed in "Manipulating Expressions like Lists," every
Mathematica expression can be manipulated structurally much like a list. However, as discussed
in "Manipulating Expressions like Lists," you must be careful, because Mathematica often shows
algebraic expressions in a form that is different from the way it treats them internally.
Coefficient works even with polynomials that are not explicitly expanded out.
In[5]:= Coefficient[(1 + 3 x + 4 y^2)^2, x]
Out[5]= 6 + 24 y^2
If you end your input with a semicolon, Mathematica will do the computation you asked for, but
will not display the result. You can nevertheless use % or Out[n] to refer to the result.
By default, the Mathematica front end will display any outputs which are excessively large in a
shortened form inside an interface which allows you to refine the display of the output.
Out[1]= 1 + 100 x + 4950 x^2 + 161 700 x^3 + 3 921 225 x^4 + 75 287 520 x^5 + 1 192 052 400 x^6 + <<5138>> +
   1 568 717 617 782 433 884 352 170 216 652 800 y^98 + 3 137 435 235 564 867 768 704 340 433 305 600 x y^98 +
   1 568 717 617 782 433 884 352 170 216 652 800 x^2 y^98 + 63 382 530 011 411 470 074 835 160 268 800 y^99 +
   63 382 530 011 411 470 074 835 160 268 800 x y^99 + 1 267 650 600 228 229 401 496 703 205 376 y^100
Show Less Show More Show Full Output Set Size Limit...
The Show Less and Show More buttons allow you to decrease or increase the level of detail to which
Mathematica shows the expression. The Show Full Output button removes the interface entirely and
displays the full result, but the result may take considerable time to display. The default threshold size at
which this feature starts working may be set using the Set Size Limit option, which opens the Prefer-
ences dialog to the panel with the appropriate setting.
The large output suppression feature is implemented using the Mathematica function Short.
You can use Short directly for finer control over the display of expressions. You can also use it
for outputs which are not large enough to be suppressed by the default suppression scheme.
Ending your input with ; stops Mathematica from displaying the complicated result of the
computation.
In[2]:= Expand[(x + 5 y + 10)^8];
You can still refer to the result as %. Short displays a one-line outline of the result. The
<<n>> stands for n terms that have been left out.
In[3]:= % // Short
Out[3]//Short= 100 000 000 + 80 000 000 x + 28 000 000 x^2 + <<39>> + 6 250 000 y^7 + 625 000 x y^7 + 390 625 y^8
This shows a three-line version of the expression. More parts are now visible.
In[4]:= Short[%, 3]
Out[4]//Short= 100 000 000 + 80 000 000 x + 28 000 000 x^2 + 5 600 000 x^3 + 700 000 x^4 +
   56 000 x^5 + 2800 x^6 + 80 x^7 + x^8 + <<28>> + 5 250 000 x^2 y^5 + 175 000 x^3 y^5 +
   43 750 000 y^6 + 8 750 000 x y^6 + 437 500 x^2 y^6 + 6 250 000 y^7 + 625 000 x y^7 + 390 625 y^8
Working with physical units gives one simple example. When you specify the length of an
object, you want to give not only a number, but also the units in which the length is measured.
In standard notation, you might write a length as 12 meters.
You can imitate this notation almost directly in Mathematica. You can for example simply use a
symbol meters to indicate the units of your measurement.
The symbol meters here acts as a tag, which indicates the units used.
In[1]:= 12 meters
Out[1]= 12 meters
There is in fact a Mathematica package that allows you to work with units. The package defines
many symbols that represent standard types of units.
Algebraic Manipulation
Expand expands out products and powers, writing the polynomial as a simple sum of terms.
In[2]:= t = Expand[%]
Out[2]= -4 + 12 x - 28 x^2 + 52 x^3 - 64 x^4 + 64 x^5 - 48 x^6 + 16 x^7
Out[4]= 4 (-1 + 3 x - 7 x^2 + 13 x^3 - 16 x^4 + 16 x^5 - 12 x^6 + 4 x^7)
There are several ways to write any polynomial. The functions Expand, FactorTerms and
Factor give three common ways. Expand writes a polynomial as a simple sum of terms, with all
products expanded out. FactorTerms pulls out common factors from each term. Factor does
complete factoring, writing the polynomial as a product of terms, each of as low degree as
possible.
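The three forms can be compared side by side on a small example (the polynomial is illustrative):

```mathematica
poly = 2 x^2 - 2;

Expand[poly]       (* -2 + 2 x^2 : simple sum of terms *)
FactorTerms[poly]  (* 2 (-1 + x^2) : only the common numerical factor pulled out *)
Factor[poly]       (* 2 (-1 + x) (1 + x) : complete factoring *)
```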
When you have a polynomial in more than one variable, you can put the polynomial in different
forms by essentially choosing different variables to be "dominant". Collect[poly, x] takes a
polynomial in several variables and rewrites it as a sum of terms containing different powers of
the "dominant variable" x.
Out[5]= 1 + 6 x + 12 x^2 + 8 x^3 + 3 y + 12 x y + 12 x^2 y + 3 y^2 + 6 x y^2 + y^3
Out[6]= 1 + 8 x^3 + 3 y + 3 y^2 + y^3 + x^2 (12 + 12 y) + x (6 + 12 y + 6 y^2)
If you specify a list of variables, Collect will effectively write the expression as a polynomial
in these variables.
In[7]:= Collect[Expand[(1 + x + 2 y + 3 z)^3], {x, y}]
Out[7]= 1 + x^3 + 8 y^3 + 9 z + 27 z^2 + 27 z^3 + x^2 (3 + 6 y + 9 z) +
   y^2 (12 + 36 z) + y (6 + 36 z + 54 z^2) + x (3 + 12 y^2 + 18 z + 27 z^2 + y (12 + 36 z))
Expand[poly, patt]   expand out poly avoiding those parts which do not contain terms matching patt
This avoids expanding parts which do not contain objects matching b[_].
In[9]:= Expand[(a[1] + a[2] + 1)^2 (1 + b[1])^2, b[_]]
PowerExpand[expr]   expand out (a b)^c and (a^b)^c in expr
Mathematica does not automatically expand out expressions of the form (a b)^c except when c is
an integer. In general it is only correct to do this expansion if a and b are positive reals. Never-
theless, the function PowerExpand does the expansion, effectively assuming that a and b are
indeed positive reals.
PowerExpand does the expansion, effectively assuming that x and y are positive reals.
In[11]:= PowerExpand[%]
Out[11]= x^n y^n
Horner form.
Horner form is a way of arranging a polynomial that allows numerical values to be computed
more efficiently by minimizing the number of multiplications.
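The built-in function HornerForm performs this rearrangement; a small example (the polynomial is illustrative):

```mathematica
(* Nested (Horner) arrangement: evaluating this needs three
   multiplications, instead of computing each power separately *)
HornerForm[x^3 + 3 x^2 + 3 x + 1]   (* 1 + x (3 + x (3 + x)) *)
```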
Out[2]= 1 + x - 2 x^2 - 2 x^3 + x^4 + x^5 - 2 y - 4 x y + 4 x^3 y + 2 x^4 y + y^2 + 3 x y^2 + 3 x^2 y^2 + x^3 y^2
This gives the maximum exponent with which x appears in the polynomial t. For a polynomial
in one variable, Exponent gives the degree of the polynomial.
In[6]:= Exponent[t, x]
Out[6]= 5
Coefficient[poly, expr] gives the total coefficient with which expr appears in poly. In this
case, the result is a sum of two terms.
In[7]:= Coefficient[t, x^2]
Out[7]= -2 + 3 y^2
Out[8]= -2 + 3 y^2
Out[9]= 1 - 2 y + y^2
For multivariate polynomials, CoefficientList gives an array of the coefficients for each
power of each variable.
In[11]:= CoefficientList[t, {x, y}]
Out[11]= {{1, -2, 1}, {1, -4, 3}, {-2, 0, 3}, {-2, 4, 1}, {1, 2, 0}, {1, 0, 0}}
It is important to notice that the functions in this tutorial will often work even on polynomials
that are not explicitly given in expanded form.
Many of the functions also work on expressions that are not strictly polynomials.
Without giving specific integer values to a, b and c, this expression cannot strictly be consid-
ered a polynomial.
In[13]:= x^a + x^b + y^c
Out[13]= x^a + x^b + y^c
Exponent[expr, x] still gives the maximum exponent of x in expr, but here has to write the
result in symbolic form.
In[14]:= Exponent[%, x]
Out[14]= Max[0, a, b]
Expand expands the numerator of each term, and divides all the terms by the appropriate
denominators.
In[3]:= Expand[t]
Out[3]= 4 + 1/(1 - x) - 4 x + (2 x)/(1 - x) + x^2 + x^2/(1 - x) + (3 x^2)/(1 + x)^2
ExpandAll does all possible expansions in the numerator and denominator of each term.
In[5]:= ExpandAll[t]
Out[5]= 4 + 1/(1 - x) - 4 x + (2 x)/(1 - x) + x^2 + x^2/(1 - x) + (3 x^2)/(1 + 2 x + x^2)
ExpandAll[expr, patt], etc.   avoid expanding parts which contain no terms matching patt
Controlling expansion.
You can use Factor to factor the numerator and denominator of the resulting expression.
In[9]:= Factor[%]
Out[9]= (2 (-2 + x) (2 + x))/((-1 + x) (1 + x))
Apart writes the expression as a sum of terms, with each term having as simple a denomina-
tor as possible.
In[10]:= Apart[u]
Out[10]= 2 - 3/(-1 + x) + 3/(1 + x)
Factor first puts all terms over a common denominator, then factors the result.
In[12]:= Factor[%]
Out[12]= (2 (-2 + x) (2 + x))/((-1 + x) (1 + x))
In expressions with several variables, you can use Apart[expr, var] to do partial fraction decom-
positions with respect to different variables.
If you do more advanced algebra with polynomials, however, you will have to use the algebraic
operations discussed in this tutorial.
You should realize that most of the operations discussed in this tutorial work only on ordinary
polynomials, with integer exponents and rational-number coefficients for each term.
PolynomialQuotient[poly1, poly2, x]   find the result of dividing the polynomial poly1 in x by poly2, dropping any remainder term
PolynomialRemainder[poly1, poly2, x]   find the remainder from dividing the polynomial poly1 in x by poly2
Reduction of polynomials.
Given two polynomials p(x) and q(x), one can always uniquely write p(x)/q(x) = a(x) + b(x)/q(x), where the
degree of b(x) is less than the degree of q(x). PolynomialQuotient gives the quotient a(x), and
PolynomialRemainder gives the remainder b(x).
Out[3]= x^2
PolynomialMod is essentially the analog for polynomials of the function Mod for integers. When
the modulus m is an integer, PolynomialMod[poly, m] simply reduces each coefficient in poly
modulo the integer m. If m is a polynomial, then PolynomialMod[poly, m] effectively tries to get
a polynomial with as low a degree as possible by subtracting from poly appropriate multiples q m
of m. The multiplier q can itself be a polynomial, but its degree is always less than the degree of
poly. PolynomialMod yields a final polynomial whose degree and leading coefficient are both as
small as possible.
This reduces x^2 modulo x + 1. The result is simply the remainder from dividing the polynomials.
In[5]:= PolynomialMod[x^2, x + 1]
Out[5]= 1
In this case, PolynomialMod and PolynomialRemainder do not give the same result.
In[6]:= {PolynomialMod[x^2, a x + 1], PolynomialRemainder[x^2, a x + 1, x]}
Out[6]= {x^2, 1/a^2}
The main difference between PolynomialMod and PolynomialRemainder is that while the
former works simply by multiplying and subtracting polynomials, the latter uses division in
getting its results. In addition, PolynomialMod allows reduction by several moduli at the same
time. A typical case is reduction modulo both a polynomial and an integer.
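A sketch of the two kinds of moduli (the particular polynomials are illustrative):

```mathematica
(* Integer modulus: each coefficient is reduced modulo 2 *)
PolynomialMod[2 x^2 + 3 x + 4, 2]          (* x *)

(* Reduction modulo both a polynomial and an integer at the same time *)
PolynomialMod[x^2 + 5 x + 6, {x + 1, 3}]
```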
PolynomialGCD[poly1, poly2] finds the highest degree polynomial that divides the polyi exactly.
PolynomialExtendedGCD gives the extended greatest common divisor of the two polynomials.
In[9]:= {g, {r, s}} = PolynomialExtendedGCD[x^3 + 2 x^2 - x + 1, x^4 + x + 2, x]
Out[9]= {1, {29/215 - (26 x)/215 - (23 x^2)/215 + (21 x^3)/215, 93/215 - (19 x)/215 - (21 x^2)/215}}
The returned polynomials r and s can be used to represent the GCD in terms of the original
polynomials.
In[10]:= r (x^3 + 2 x^2 - x + 1) + s (x^4 + x + 2) // Expand
Out[10]= 1
For any pair of polynomials, the resultant is always a polynomial in their coefficients. By looking at
when the resultant is zero, you can tell for what values of their parameters two polynomials
have a common root. Two polynomials with leading coefficient one have k common roots if
exactly the first k elements in the list Subresultants[poly1, poly2, x] are zero.
Here is the resultant with respect to y of two polynomials in x and y. The original polynomials
have a common root in y only for values of x at which the resultant vanishes.
In[11]:= Resultant[(x - y)^2 - 2, y^2 - 3, y]
Out[11]= 1 - 10 x^2 + x^4
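Solving for where this resultant vanishes recovers the x values at which the two polynomials above share a root in y; this continuation is illustrative, not part of the original example sequence:

```wolfram
(* x values at which (x - y)^2 - 2 and y^2 - 3 have a common root in y *)
Solve[1 - 10 x^2 + x^4 == 0, x]
```

The solutions are x = ±(Sqrt[3] + Sqrt[2]) and x = ±(Sqrt[3] - Sqrt[2]), exactly the values y ± Sqrt[2] with y = ±Sqrt[3].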
The function Discriminant[poly, x] gives the product of the squares of the differences of the roots of poly. It can be used to determine whether the polynomial has any repeated roots. The discriminant is equal to the resultant of the polynomial and its derivative, up to a factor independent of the variable.
Gröbner bases appear in many modern algebraic algorithms and applications. The function GroebnerBasis[{poly1, poly2, ...}, {x1, x2, ...}] takes a set of polynomials, and reduces this set to a canonical form from which many properties can conveniently be deduced. An important feature is that the set of polynomials obtained from GroebnerBasis always has exactly the same collection of common roots as the original set.
The polynomial (x + y)^2 is effectively redundant, and so does not appear in the Gröbner basis.
In[14]:= GroebnerBasis[{(x + y), (x + y)^2}, {x, y}]
Out[14]= {x + y}
The polynomial 1 has no roots, showing that the original polynomials have no common roots.
In[15]:= GroebnerBasis[{x + y, x^2 - 1, y^2 - 2 x}, {x, y}]
Out[15]= {1}
The polynomials are effectively unwound here, and can now be seen to have exactly five common roots.
In[16]:= GroebnerBasis[{x y^2 + 2 x y + x^2 + 1, x y + y^2 + 1}, {x, y}]
Out[16]= {-1 - y^2 + y^3 + y^4 + y^5, x + y^2 + y^3 + y^4}
PolynomialReduce[poly, {p1, p2, ...}, {x1, x2, ...}] yields a list {{a1, a2, ...}, b} of polynomials with the property that b is minimal and a1 p1 + a2 p2 + ... + b is exactly poly.
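A minimal sketch of PolynomialReduce, with an input chosen for illustration:

```wolfram
(* express x^2 + y^2 as a1 (x + y) + b with b minimal *)
PolynomialReduce[x^2 + y^2, {x + y}, {x, y}]
```

This yields {{x - y}, 2 y^2}, and indeed (x - y)(x + y) + 2 y^2 == x^2 + y^2.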
Out[18]= 12 + 34 x + 34 x^2 + 14 x^3 + 2 x^4
FactorTerms pulls out only the factor of 2 that does not depend on x.
In[19]:= FactorTerms[t, x]
Out[19]= 2 (6 + 17 x + 17 x^2 + 7 x^3 + x^4)
FactorSquareFree factors out the 2 and the term (1 + x)^2, but leaves the rest unfactored.
In[20]:= FactorSquareFree[t]
Out[20]= 2 (1 + x)^2 (6 + 5 x + x^2)
Out[21]= 2 (1 + x)^2 (2 + x) (3 + x)
Particularly when you write programs that work with polynomials, you will often find it convenient to pick out pieces of polynomials in a standard form. The function FactorList gives a list of all the factors of a polynomial, together with their exponents. The first element of the list is always the overall numerical factor for the polynomial.
The form that FactorList returns is the analog for polynomials of the form produced by
FactorInteger for integers.
Here is a list of the factors of the polynomial in the previous set of examples. Each element of
the list gives the factor, together with its exponent.
In[22]:= FactorList[t]
Out[22]= {{2, 1}, {1 + x, 2}, {2 + x, 1}, {3 + x, 1}}
Factor[poly, GaussianIntegers -> True]
factor a polynomial, allowing coefficients that are Gaussian integers
Factor and related functions usually handle only polynomials with ordinary integer or rational-number coefficients. If you set the option GaussianIntegers -> True, however, then Factor will allow polynomials with coefficients that are complex numbers with rational real and imaginary parts. This often allows more extensive factorization to be performed.
Out[23]= 1 + x^2
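For instance, 1 + x^2 is irreducible over the rationals but splits once Gaussian integer coefficients are allowed; this example is a sketch consistent with the option described above:

```wolfram
Factor[1 + x^2, GaussianIntegers -> True]
```

which gives (-I + x) (I + x).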
Irreducibility testing.
Cyclotomic polynomials.
Out[29]= 1 - x + x^2
Out[30]= (-1 + x) (1 + x) (1 - x + x^2) (1 + x + x^2)
Decomposing polynomials.
Factorization is one important way of breaking down polynomials into simpler parts. Another, quite different, way is decomposition. When you factor a polynomial P(x), you write it as a product p1(x) p2(x) ... of polynomials pi(x). Decomposing a polynomial Q(x) consists of writing it as a composition of polynomials of the form q1(q2(... (x) ...)).
Out[31]= {1 + x + x^2, x^2}
Out[34]= {1 - 2 x + x^4, 5 x + x^3}
Unlike factoring, the decomposition of polynomials is not completely unique. For example, the two sets of polynomials pi and qi, related by q1(x) = p1(x - a) and q2(x) = p2(x) + a, give the same result on composition, so that p1(p2(x)) = q1(q2(x)). Mathematica follows the convention of absorbing any constant terms into the first polynomial in the list produced by Decompose.
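A small illustrative sketch (this input is an assumption, not one of the numbered examples above):

```wolfram
(* x^4 + x^2 + 1 is the composition of 1 + x + x^2 with x^2 *)
Decompose[x^4 + x^2 + 1, x]
```

which gives {1 + x + x^2, x^2}: substituting x^2 into 1 + x + x^2 reproduces the original polynomial.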
InterpolatingPolynomial[{f1, f2, ...}, x]
give a polynomial in x which is equal to fi when x is the integer i
InterpolatingPolynomial[{{x1, f1}, {x2, f2}, ...}, x]
give a polynomial in x which is equal to fi when x is xi
This yields a quadratic polynomial which goes through the specified three points.
In[35]:= InterpolatingPolynomial[{{-1, 4}, {0, 2}, {1, 6}}, x]
Out[35]= 4 + (1 + x) (-2 + 3 x)
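Expanding the interpolating polynomial and checking it at the three points confirms the result; this check is illustrative:

```wolfram
Expand[4 + (1 + x) (-2 + 3 x)]      (* gives 2 + x + 3 x^2 *)
2 + x + 3 x^2 /. x -> {-1, 0, 1}    (* gives {4, 2, 6} *)
```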
modulo a prime p.
GroebnerBasis[polys, vars, Modulus -> p]
find the Gröbner basis modulo p
Out[1]= 1 + 6 x + 15 x^2 + 20 x^3 + 15 x^4 + 6 x^5 + x^6
Out[2]= 1 + x^2 + x^4 + x^6
Here are the factors of the resulting polynomial over the integers.
In[3]:= Factor[%]
Out[3]= (1 + x^2) (1 + x^4)
Out[4]= (1 + x)^6
Symmetric Polynomials
A symmetric polynomial in variables x1, ..., xn is a polynomial that is invariant under arbitrary permutations of x1, ..., xn. The polynomials
s1 = x1 + x2 + ... + xn
s2 = x1 x2 + x1 x3 + ... + xn-1 xn
...
sn = x1 x2 ... xn
are called the elementary symmetric polynomials in the variables x1, ..., xn.
The fundamental theorem of symmetric polynomials says that every symmetric polynomial in x1, ..., xn can be represented as a polynomial in elementary symmetric polynomials in x1, ..., xn.
When the ordering of variables is fixed, an arbitrary polynomial f can be uniquely represented as a sum of a symmetric polynomial p, called the symmetric part of f, and a remainder q that does not contain descending monomials. A monomial c x1^e1 ... xn^en is called descending iff e1 >= e2 >= ... >= en.
SymmetricReduction[f, {x1, ..., xn}]
give a pair of polynomials {p, q} in x1, ..., xn such that f == p + q, where p is the symmetric part and q is the remainder
SymmetricReduction[f, {x1, ..., xn}, {s1, ..., sn}]
give the pair {p, q} with the elementary symmetric polynomials in p replaced by s1, ..., sn
This writes the polynomial (x + y)^2 + (x + z)^2 + (z + y)^2 in terms of elementary symmetric polynomials. The input polynomial is symmetric, so the remainder is zero.
In[2]:= SymmetricReduction[(x + y)^2 + (x + z)^2 + (z + y)^2, {x, y, z}]
Out[2]= {2 (x + y + z)^2 - 2 (x y + x z + y z), 0}
Here the elementary symmetric polynomials in the symmetric part are replaced with variables s1, s2, s3. The polynomial is not symmetric, so the remainder is not zero.
In[3]:= SymmetricReduction[x^5 + y^5 + z^4, {x, y, z}, {s1, s2, s3}]
Out[3]= {s1^5 - 5 s1^3 s2 + 5 s1 s2^2 + 5 s1^2 s3 - 5 s2 s3, z^4 - z^5}
Out[1]= 1 + x^4
With coefficients that can involve Sqrt[2], the polynomial can now be factored.
In[2]:= Factor[1 + x^4, Extension -> {Sqrt[2]}]
Out[2]= -(-1 + Sqrt[2] x - x^2) (1 + Sqrt[2] x + x^2)
Out[3]= (-I + x^2) (I + x^2)
Out[4]= (-I + x^2) (I + x^2)
If one allows coefficients that involve both Sqrt[2] and Sqrt[-1], the polynomial can be factored completely.
In[5]:= Factor[1 + x^4, Extension -> {Sqrt[2], Sqrt[-1]}]
Out[5]= 1/4 (Sqrt[2] - (1 + I) x) (Sqrt[2] - (1 - I) x) (Sqrt[2] + (1 - I) x) (Sqrt[2] + (1 + I) x)
Out[6]= 1 + x^4
Factor[poly, Extension -> Automatic]
factor poly allowing algebraic numbers in poly to appear in coefficients
Out[7]= 2 + 2 Sqrt[2] x + x^2
Out[8]= 2 + 2 Sqrt[2] x + x^2
But now the field of coefficients is extended by including Sqrt[2], and the polynomial is factored.
In[9]:= Factor[t, Extension -> Automatic]
Out[9]= (Sqrt[2] + x)^2
Other polynomial functions work much like Factor. By default, they treat algebraic number
coefficients just like independent symbolic variables. But with the option
Extension -> Automatic they perform operations on these coefficients.
Out[10]= (2 + 2 Sqrt[2] x + x^2)/(-2 + x^2)
Out[11]= (-Sqrt[2] - x)/(Sqrt[2] - x)
Out[12]= (-2 + x^2) (2 + 2 Sqrt[2] x + x^2)
Out[13]= -2 Sqrt[2] - 2 x + Sqrt[2] x^2 + x^3
Irreducibility testing.
Trigonometric Expressions
And this reduces the expression to a form that is linear in the trigonometric functions.
In[3]:= TrigReduce[%]
Out[3]= 1/2 (Sin[2 x - 2 y] + Sin[2 x + 2 y])
With TrigFactorList, however, you can see the parts of functions like Tan.
In[7]:= TrigFactorList[%]
Out[7]= {{1, 1}, {Sin[x], 2}, {Cos[x], -1}}
ExpToTrig does the reverse, getting rid of explicit complex numbers whenever possible.
In[10]:= ExpToTrig[%]
Out[10]= Tanh[x]
The function ComplexExpand expands out algebraic and trigonometric expressions, making
definite assumptions about the variables that appear.
This expands the expression, assuming that x and y are both real.
In[1]:= ComplexExpand[Tan[x + I y]]
Out[1]= Sin[2 x]/(Cos[2 x] + Cosh[2 y]) + (I Sinh[2 y])/(Cos[2 x] + Cosh[2 y])
In this case, a is assumed to be real, but x is assumed to be complex, and is broken into
explicit real and imaginary parts.
In[2]:= ComplexExpand[a + x^2, {x}]
With several complex variables, you quickly get quite complicated results.
In[3]:= ComplexExpand[Sin[x] Exp[y], {x, y}]
Out[3]= E^Re[y] Cos[Im[y]] Cosh[Im[x]] Sin[Re[x]] - E^Re[y] Cos[Re[x]] Sin[Im[y]] Sinh[Im[x]] +
        I (E^Re[y] Cosh[Im[x]] Sin[Im[y]] Sin[Re[x]] + E^Re[y] Cos[Im[y]] Cos[Re[x]] Sinh[Im[x]])
There are several ways to write a complex variable z in terms of real parameters. As above, for example, z can be written in the "Cartesian form" Re[z] + I Im[z]. But it can equally well be written in the "polar form" Abs[z] Exp[I Arg[z]].
The option TargetFunctions in ComplexExpand allows you to specify how complex variables should be written. TargetFunctions can be set to a list of functions from the set {Re, Im, Abs, Arg, Conjugate, Sign}. ComplexExpand will try to give results in terms of whichever of these functions you request. The default is typically to give results in terms of Re and Im.
LogicalExpand puts logical expressions into a standard disjunctive normal form (DNF), consist-
ing of an OR of ANDs.
LogicalExpand works on all logical functions, always converting them into a standard OR of
ANDs form. Sometimes the results are inevitably quite large.
Any collection of nested conditionals can always in effect be flattened into a piecewise normal
form consisting of a single Piecewise object. You can do this in Mathematica using
PiecewiseExpand.
Functions like Max and Abs, as well as Clip and UnitStep, implicitly involve conditionals, and
combinations of them can again be reduced to a single Piecewise object using
PiecewiseExpand.
Functions like Floor, Mod and FractionalPart can also be expressed in terms of Piecewise
objects, though in principle they can involve an infinite number of cases.
Out[8]= 1   1 <= x < Sqrt[2]
        2   Sqrt[2] <= x < Sqrt[3]
        3   x >= Sqrt[3]
Mathematica by default limits the number of cases that it will explicitly generate in the expansion of any single piecewise function such as Floor at any stage in a computation. You can change this limit by resetting the value of $MaxPiecewiseCases.
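A sketch of raising the limit locally; the particular value 1000 is illustrative:

```wolfram
(* temporarily allow more piecewise cases within this computation *)
Block[{$MaxPiecewiseCases = 1000},
 PiecewiseExpand[Floor[x], -10 <= x <= 10]]
```

Using Block keeps the change local, so the default limit is restored afterwards.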
Simplification
Simplifying expressions.
It does not, however, do more sophisticated transformations that involve, for example, special
functions.
In[4]:= Simplify[Gamma[1 + n]/n]
Out[4]= Gamma[1 + n]/n
FullSimplify[expr, ExcludedForms -> pattern]
try to simplify expr, without touching subexpressions that match pattern
Controlling simplification.
FullSimplify[expr, TimeConstraint -> t]
try to simplify expr, working for at most t seconds on each transformation
FullSimplify[expr, TransformationFunctions -> {f1, f2, ...}]
use only the functions fi in trying to transform parts of expr
FullSimplify[expr, TransformationFunctions -> {Automatic, f1, f2, ...}]
use built-in transformations as well as the fi
Simplify[expr, ComplexityFunction -> c] and FullSimplify[expr, ComplexityFunction -> c]
try to simplify expr, using c to determine what counts as "simple"
In both Simplify and FullSimplify there is always an issue of what counts as the "simplest"
form of an expression. You can use the option ComplexityFunction -> c to provide a function
to determine this. The function will be applied to each candidate form of the expression, and
the one that gives the smallest numerical value will be considered simplest.
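For example, the built-in LeafCount can serve as such a measure; this usage is a sketch, and Simplify's default measure is similar to, but not identical to, LeafCount:

```wolfram
(* rank candidate forms by their number of subexpressions *)
Simplify[(1 + x)^2 - 4 x, ComplexityFunction -> LeafCount]
```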
Using Assumptions
Mathematica normally makes as few assumptions as possible about the objects you ask it to
manipulate. This means that the results it gives are as general as possible. But sometimes
these results are considerably more complicated than they would be if more assumptions were
made.
Out[1]= -Sqrt[1/x] + 1/Sqrt[x]
The reason is that its value is quite different for different choices of x.
In[2]:= % /. x -> {-3, -2, -1, 1, 2, 3}
Out[2]= {-((2 I)/Sqrt[3]), -I Sqrt[2], -2 I, 0, 0, 0}
With the assumption x > 0, Simplify can immediately reduce the expression to 0.
In[3]:= Simplify[1/Sqrt[x] - Sqrt[1/x], x > 0]
Out[3]= 0
Without making assumptions about x the truth or falsity of this equation cannot be determined.
In[6]:= Simplify[Abs[x] == x]
Out[6]= x == Abs[x]
This establishes the standard result that the arithmetic mean is larger than the geometric one.
In[8]:= Simplify[(x + y)/2 >= Sqrt[x y], x >= 0 && y >= 0]
Out[8]= True
This proves that Erf[x] lies in the range (0, 1) for all positive arguments.
In[9]:= FullSimplify[0 < Erf[x] < 1, x > 0]
Out[9]= True
Simplify and FullSimplify always try to find the simplest forms of expressions. Sometimes, however, you may just want Mathematica to follow its ordinary evaluation process, but with certain assumptions made. You can do this using Refine. The way it works is that Refine[expr, assum] performs the same transformations as Mathematica would perform automatically if the variables in expr were replaced by numerical expressions satisfying the assumptions assum.
Refine just evaluates Log[x] as it would for any explicit negative number x.
In[11]:= Refine[Log[x], x < 0]
Out[11]= I Pi + Log[-x]
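Similarly (an illustrative companion example), Refine evaluates Sqrt[x^2] as it would for an explicit positive number:

```wolfram
Refine[Sqrt[x^2], x > 0]    (* gives x *)
```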
An important class of assumptions is those which assert that some object is an element of a particular domain. You can set up such assumptions using x ∈ dom, where the character ∈ can be entered as Esc el Esc or \[Element].
This represents the assertion that the symbol x is an element of the domain of real numbers.
In[16]:= x ∈ Reals
Out[16]= x ∈ Reals
If you say that a variable satisfies an inequality, Mathematica will automatically assume that it
is real.
In[19]:= Simplify[x ∈ Reals, x > 0]
Out[19]= True
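Domain assumptions combine with Simplify in the same way; for instance, assuming n is an integer (an illustrative example):

```wolfram
Simplify[Sin[x + 2 n Pi], Element[n, Integers]]    (* gives Sin[x] *)
```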
By using Simplify, FullSimplify and FunctionExpand with assumptions you can access
many of Mathematica's vast collection of mathematical facts.
Mathematica knows about discrete mathematics and number theory as well as continuous
mathematics.
In something like Simplify[expr, assum] or Refine[expr, assum] you explicitly give the assumptions you want to use. But sometimes you may want to specify one set of assumptions to use in a whole collection of operations. You can do this by using Assuming.
Functions like Simplify and Refine take the option Assumptions , which specifies what default
assumptions they should use. By default, the setting for this option is
Assumptions :> $Assumptions. The way Assuming then works is to assign a local value to
$Assumptions, much as in Block.
In addition to Simplify and Refine, a number of other functions take Assumptions options,
and thus can have assumptions specified for them by Assuming. Examples are
FunctionExpand, Integrate, Limit, Series, LaplaceTransform.
Equations
"Defining Variables" discussed assignments such as x = y which set x equal to y. Here we discuss
equations, which test equality. The equation x == y tests whether x is equal to y.
This tests whether 2 + 2 and 4 are equal. The result is the symbol True .
In[1]:= 2 + 2 == 4
Out[1]= True
It is very important that you do not confuse x = y with x == y. While x = y is an imperative state-
ment that actually causes an assignment to be done, x == y merely tests whether x and y are
equal, and causes no explicit action. If you have used the C programming language, you will
recognize that the notation for assignment and testing in Mathematica is the same as in C.
x is equal to 4, not 6.
In[5]:= x == 6
Out[5]= False
The tests we have used so far involve only numbers, and always give a definite answer, either
True or False. You can also do tests on symbolic expressions.
Mathematica cannot get a definite result for this test unless you give x a specific numerical
value.
In[7]:= x == 5
Out[7]= x == 5
If you replace x by the specific numerical value 4, the test gives False .
In[8]:= % /. x -> 4
Out[8]= False
Even when you do tests on symbolic expressions, there are some cases where you can get
definite results. An important one is when you test the equality of two expressions that are
identical. Whatever the numerical values of the variables in these expressions may be, Mathe-
matica knows that the expressions must always be equal.
The two expressions are identical, so the result is True , whatever the value of x may be.
In[9]:= 2 x + x ^ 2 == 2 x + x ^ 2
Out[9]= True
Mathematica does not try to tell whether these expressions are equal. In this case, using
Expand would make them have the same form.
In[10]:= 2 x + x^2 == x (2 + x)
Out[10]= 2 x + x^2 == x (2 + x)
Expressions like x == 4 represent equations in Mathematica. There are many functions in Mathe-
matica for manipulating and solving equations.
Out[11]= -7 + 2 x + x^2 == 0
Out[12]= -7 + 2 x + x^2 == 0
Out[13]= -7 + 2 x + x^2 == 0
Solving Equations
An expression like x ^ 2 + 2 x - 7 == 0 represents an equation in Mathematica. You will often need
to solve equations like this, to find out for what values of x they are true.
This gives the two solutions to the quadratic equation x2 + 2 x - 7 = 0. The solutions are given as
replacements for x.
In[1]:= Solve[x^2 + 2 x - 7 == 0, x]
You can get a list of the actual solutions for x by applying the rules generated by Solve to x
using the replacement operator.
In[3]:= x /. %
Out[3]= 8-3.82843, 1.82843<
You can equally well apply the rules to any other expression involving x.
In[4]:= x^2 + 3 x /. %%
Out[4]= 83.17157, 8.82843<
Solve always tries to give you explicit formulas for the solutions to equations. However, it is a
basic mathematical result that, for sufficiently complicated equations, explicit algebraic formu-
las in terms of radicals cannot be given. If you have an algebraic equation in one variable, and
the highest power of the variable is at most four, then Mathematica can always give you formu-
las for the solutions. However, if the highest power is five or more, it may be mathematically
impossible to give explicit algebraic formulas for all the solutions.
Mathematica can always solve algebraic equations in one variable when the highest power is
less than five.
In[5]:= Solve[x^4 - 5 x^2 - 3 == 0, x]
Out[5]= {{x -> -Sqrt[5/2 + Sqrt[37]/2]}, {x -> Sqrt[5/2 + Sqrt[37]/2]},
         {x -> -I Sqrt[1/2 (-5 + Sqrt[37])]}, {x -> I Sqrt[1/2 (-5 + Sqrt[37])]}}
There are some equations, however, for which it is mathematically impossible to find explicit
formulas for the solutions. Mathematica uses Root objects to represent the solutions in this
case.
In[7]:= Solve[2 - 4 x + x^5 == 0, x]
Out[7]= {{x -> Root[2 - 4 #1 + #1^5 &, 1]}, {x -> Root[2 - 4 #1 + #1^5 &, 2]},
         {x -> Root[2 - 4 #1 + #1^5 &, 3]}, {x -> Root[2 - 4 #1 + #1^5 &, 4]}, {x -> Root[2 - 4 #1 + #1^5 &, 5]}}
Even though you cannot get explicit formulas, you can still evaluate the solutions numerically.
In[8]:= N[%]
Out[8]= {{x -> -1.51851}, {x -> 0.508499}, {x -> 1.2436}, {x -> -0.116792 - 1.43845 I}, {x -> -0.116792 + 1.43845 I}}
In addition to being able to solve purely algebraic equations, Mathematica can also solve some
equations involving other functions.
It is important to realize that an equation such as sin(x) = a actually has an infinite number of possible solutions, in this case differing by multiples of 2 π. However, Solve by default returns just one solution, but prints a message telling you that other solutions may exist. You can use Reduce to get more information.
There is no explicit "closed form" solution for a transcendental equation like this.
In[10]:= Solve[Cos[x] == x, x]
Solve::tdep :
The equations appear to involve the variables to be solved for in an essentially non-algebraic way.
Out[10]= Solve[Cos[x] == x, x]
You can find an approximate numerical solution using FindRoot , and giving a starting value
for x.
In[11]:= FindRoot[Cos[x] == x, {x, 0}]
Out[11]= {x -> 0.739085}
Solve can also handle equations involving symbolic functions. In such cases, it again prints a
warning, then gives results in terms of formal inverse functions.
InverseFunction::ifun : Inverse functions are being used. Values may be lost for multivalued inverses.
You can also use Mathematica to solve sets of simultaneous equations. You simply give the list
of equations, and specify the list of variables to solve for.
Here is a list of two simultaneous equations, to be solved for the variables x and y.
In[13]:= Solve[{a x + y == 0, 2 x + (1 - a) y == 1}, {x, y}]
Out[13]= {{x -> -(1/(-2 + a - a^2)), y -> -(a/(2 - a + a^2))}}
Here are some more complicated simultaneous equations. The two solutions are given as two
lists of replacements for x and y.
In[14]:= Solve[{x^2 + y^2 == 1, x + 3 y == 0}, {x, y}]
Out[14]= {{x -> -(3/Sqrt[10]), y -> 1/Sqrt[10]}, {x -> 3/Sqrt[10], y -> -(1/Sqrt[10])}}
Out[15]= {-Sqrt[2/5], Sqrt[2/5]}
When you are working with sets of equations in several variables, it is often convenient to
reorganize the equations by eliminating some variables between them.
This eliminates y between the two equations, giving a single equation for x.
In[16]:= Eliminate[{a x + y == 0, 2 x + (1 - a) y == 1}, y]
Out[16]= (2 - a + a^2) x == 1
If you have several equations, there is no guarantee that there exists any consistent solution
for a particular variable.
There is no consistent solution to these equations, so Mathematica returns {}, indicating that the set of solutions is empty.
In[17]:= Solve[{x == 1, x == 2}, x]
Out[17]= {}
There is also no consistent solution to these equations for almost all values of a.
In[18]:= Solve[{x == 1, x == a}, x]
Out[18]= {}
The general question of whether a set of equations has any consistent solution is quite a subtle one. For example, for most values of a, the equations {x == 1, x == a} are inconsistent, so there is no possible solution for x. However, if a is equal to 1, then the equations do have a solution. Solve is set up to give you generic solutions to equations. It discards any solutions that exist only when special constraints between parameters are satisfied.
If you use Reduce instead of Solve, Mathematica will however keep all the possible solutions to
a set of equations, including those that require special conditions on parameters.
This shows that the equations have a solution only when a == 1. The notation a == 1 && x == 1
represents the requirement that both a == 1 and x == 1 should be True .
In[19]:= Reduce[{x == a, x == 1}, x]
Out[19]= a == 1 && x == 1
This gives the complete set of possible solutions to the equation. The answer is stated in terms of a combination of simpler equations. && indicates equations that must simultaneously be true; || indicates alternatives.
In[20]:= Reduce[a x - b == 0, x]
Out[20]= (b == 0 && a == 0) || (a != 0 && x == b/a)
Out[21]= (b == 0 && a == 0) || (a != 0 && (x == -Sqrt[b/a] || x == Sqrt[b/a]))
Reduce also has powerful capabilities for handling equations specifically over real numbers or
integers. "Equations and Inequalities over Domains" discusses this in more detail.
Out[23]= y == -Sqrt[1 - x^2] || y == Sqrt[1 - x^2]
Out[24]= -1 <= x <= 1 && (y == -Sqrt[1 - x^2] || y == Sqrt[1 - x^2])
If you have not assigned any explicit value to x, however, Mathematica cannot work out
whether x ^ 2 + 3 x == 2 is True or False. As a result, it leaves the equation in the symbolic form
x ^ 2 + 3 x == 2.
You can manipulate symbolic equations in Mathematica in many ways. One common goal is to
rearrange the equations so as to "solve" for a particular set of variables.
Out[1]= 3 x + x^2 == 2
You can use the function Reduce to reduce the equation so as to give "solutions" for x. The
result, like the original equation, can be viewed as a logical statement.
In[2]:= Reduce[%, x]
Out[2]= x == 1/2 (-3 - Sqrt[17]) || x == 1/2 (-3 + Sqrt[17])
You can combine and manipulate equations just like other logical statements. You can use logical connectives such as || and && to specify alternative or simultaneous conditions. You can use functions like LogicalExpand, as well as FullSimplify, to simplify collections of equations.
For many purposes, you will find it convenient to manipulate equations simply as logical statements. Sometimes, however, you will actually want to use explicit solutions to equations in other calculations. In such cases, it is convenient to convert equations that are stated in the form lhs == rhs into transformation rules of the form lhs -> rhs. Once you have the solutions to an equation in the form of explicit transformation rules, you can substitute the solutions into expressions by using the /. operator.
Reduce produces a logical statement about the values of x corresponding to the roots of the
quadratic equation.
In[3]:= Reduce[x^2 + 3 x == 2, x]
Out[3]= x == 1/2 (-3 - Sqrt[17]) || x == 1/2 (-3 + Sqrt[17])
ToRules converts the logical statement into an explicit list of transformation rules.
In[4]:= {ToRules[%]}
Out[4]= {{x -> 1/2 (-3 - Sqrt[17])}, {x -> 1/2 (-3 + Sqrt[17])}}
You can now use the transformation rules to substitute the solutions for x into expressions
involving x.
In[5]:= x^2 + a x /. %
Out[5]= {1/4 (-3 - Sqrt[17])^2 + 1/2 (-3 - Sqrt[17]) a, 1/4 (-3 + Sqrt[17])^2 + 1/2 (-3 + Sqrt[17]) a}
One can also solve quadratic equations just by applying a simple formula.
In[2]:= Solve[x^2 + a x + 2 == 0, x]
Out[2]= {{x -> 1/2 (-a - Sqrt[-8 + a^2])}, {x -> 1/2 (-a + Sqrt[-8 + a^2])}}
Mathematica can also find exact solutions to cubic equations. Here is the first solution to a
comparatively simple cubic equation.
In[3]:= Solve[x^3 + 34 x + 1 == 0, x][[1]]
Out[3]= {x -> -((2/3)^(1/3) 34)/(-9 + Sqrt[471729])^(1/3) + (-9 + Sqrt[471729])^(1/3)/(2^(1/3) 3^(2/3))}
For cubic and quartic equations the results are often complicated, but for all equations with
degrees up to four Mathematica is always able to give explicit formulas for the solutions.
An important feature of these formulas is that they involve only radicals: arithmetic combina-
tions of square roots, cube roots and higher roots.
It is a fundamental mathematical fact, however, that for equations of degree five or higher, it is
no longer possible in general to give explicit formulas for solutions in terms of radicals.
There are some specific equations for which this is still possible, but in the vast majority of
cases it is not.
Out[4]= -48 + 44 x^2 - 12 x^4 + x^6
For a polynomial that factors in the way this one does, it is straightforward for Solve to find
the roots.
In[5]:= Solve[% == 0, x]
The polynomial does not factor, but it can be decomposed into nested polynomials, so Solve
can again find explicit formulas for the roots.
In[7]:= Solve[% == 0, x]
No explicit formulas for the solution to this equation can be given in terms of radicals, so
Mathematica uses an implicit symbolic representation.
In[8]:= Solve[x^5 - x + 11 == 0, x]
If what you want in the end is a numerical solution, it is usually much faster to use NSolve
from the outset.
In[10]:= NSolve[x^5 - x + 11 == 0, x]
Out[10]= {{x -> -1.66149}, {x -> -0.46194 - 1.565 I}, {x -> -0.46194 + 1.565 I},
          {x -> 1.29268 - 0.903032 I}, {x -> 1.29268 + 0.903032 I}}
Root objects provide an exact, though implicit, representation for the roots of a polynomial. You can work with them much as you would work with Sqrt[2] or any other expression that represents an exact numerical quantity.
Here is the Root object representing the first root of the polynomial discussed above.
In[11]:= r = Root[#1^5 - #1 + 11 &, 1]
Round does an exact computation to find the closest integer to the root.
In[13]:= Round[r]
Out[13]= -2
If you substitute the root into the original polynomial, and then simplify the result, you get
zero.
In[14]:= FullSimplify[x^5 - x + 11 /. x -> r]
Out[14]= 0
This finds the product of all the roots of the original polynomial.
In[15]:= FullSimplify[Product[Root[11 - #1 + #1^5 &, k], {k, 5}]]
Out[15]= -11
If the only symbolic parameter that exists in an equation is the variable that you are solving
for, then all the solutions to the equation will just be numbers. But if there are other symbolic
parameters in the equation, then the solutions will typically be functions of these parameters.
The solution to this equation can again be represented by Root objects, but now each Root
object involves the parameter a.
In[17]:= Solve[x^5 + x + a == 0, x]
When a is replaced with 1, the Root objects can be simplified, and some are given as explicit
radicals.
In[18]:= Simplify[% /. a -> 1]
Out[18]= {{x -> Root[1 - #1^2 + #1^3 &, 1]}, {x -> -(1/2) I (-I + Sqrt[3])},
          {x -> 1/2 I (I + Sqrt[3])}, {x -> Root[1 - #1^2 + #1^3 &, 2]}, {x -> Root[1 - #1^2 + #1^3 &, 3]}}
Out[19]= (plot of the curve over -2 <= x <= 2, with vertical range roughly -1.0 to 0.5)
If you give Solve any nth-degree polynomial equation, then it will always return exactly n solutions, although some of these may be represented by Root objects. If there are degenerate
solutions, then the number of times that each particular solution appears will be equal to its
multiplicity.
Here are the first four solutions to a tenth-degree equation. The solutions come in pairs.
In[22]:= Take[Solve[(x^5 - x + 11)^2 == 0, x], 4]
Mathematica also knows how to solve equations which are not explicitly in the form of polynomials.
So long as it can reduce an equation to some kind of polynomial form, Mathematica will always
be able to represent its solution in terms of Root objects. However, with more general equa-
tions, involving say transcendental functions, there is no systematic way to use Root objects, or
even necessarily to find numerical approximations.
Solve::tdep :
The equations appear to involve the variables to be solved for in an essentially non-algebraic way.
Out[26]= Solve[Cos[x] == x, x]
Polynomial equations in one variable only ever have a finite number of solutions. But transcen-
dental equations often have an infinite number. Typically the reason for this is that functions
like Sin in effect have infinitely many possible inverses. With the default option setting
InverseFunctions -> True, Solve will nevertheless assume that there is a definite inverse for
any such function. Solve may then be able to return particular solutions in terms of this inverse
function.
Mathematica returns a particular solution in terms of ArcSin, but prints a warning indicating
that other solutions are lost.
In[28]:= Solve[Sin[x] == a, x]

         InverseFunction::ifun : Inverse functions are being used. Values may be lost for multivalued inverses.

Out[28]= {{x -> ArcSin[a]}}
If you ask Solve to solve an equation involving an arbitrary function like f, it will by default try
to construct a formal solution in terms of inverse functions.
InverseFunction::ifun : Inverse functions are being used. Values may be lost for multivalued inverses.
Inverse functions.
While Solve can only give specific solutions to an equation, Reduce can give a representation of
a whole solution set. For transcendental equations, it often ends up introducing new parameters, say with values ranging over all possible integers.
As discussed at more length in "Equations and Inequalities over Domains", Reduce allows you
to restrict the domains of variables. Sometimes this will let you generate definite solutions to
transcendental equations, or show that they do not exist.
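As an illustration (a sketch; the exact form of the output may vary between versions), restricting x to a finite interval turns the infinite solution set of a transcendental equation into definite solutions:

```wolfram
(* General solution: Reduce introduces an integer parameter C[1] *)
Reduce[Sin[x] == 1/2, x]

(* Restricting x to a finite interval gives definite solutions *)
Reduce[Sin[x] == 1/2 && 0 < x < 2 Pi, x]
(* x == Pi/6 || x == 5 Pi/6 *)
```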
CountRoots accepts polynomials with Gaussian rational coefficients. The root count includes
multiplicities.
This gives the number of real roots of (x^2 - 2) (x^2 - 3) (x^2 - 4).

This counts the roots of (x^2 - 2) (x^2 - 3) (x^2 - 4) in the closed interval [1, 2].

The roots of (x^2 + 1) x^3 in the vertical axis segment between 0 and 2 I consist of a triple root at 0 and a single root at I.

In[3]:= CountRoots[(x^2 + 1) x^3, {x, 0, 2 I}]
Out[3]= 4
This counts 17th-degree roots of unity in the closed unit square.

In[4]:= CountRoots[x^17 - 1, {x, 0, 1 + I}]
Out[4]= 5
Isolating Intervals
A set S ⊆ K, where K is ℝ or ℂ, is an isolating set for a root a of a polynomial f if a is the only root of f in S. Isolating roots of a polynomial means finding disjoint isolating sets for all the roots of the polynomial.
RootIntervals[{poly1, poly2, ...}]   give a list of disjoint isolating intervals for the real roots of any of the polyi, together with a list of which polynomials actually have each successive root

RootIntervals[poly]   give disjoint isolating intervals for the real roots of a single polynomial
For a real root r the returned isolating interval is a pair of rational numbers {a, b}, such that either a < r < b or a == b == r. For a nonreal root r the isolating rectangle returned is a pair of Gaussian rational numbers {a, b}, such that Re(a) < Re(r) < Re(b), Im(a) < Im(r) < Im(b), and either Im(a) >= 0 or Im(b) <= 0.
The second list shows which interval contains a root of which polynomial.
In[7]:= RootIntervals[{f + 3, f + 5, f + 7}]

Out[7]= {{{-(5/4), -(9/8)}, {-(9/8), -1}, {-1, 0}, {0, 1}, {1, 9/8}, {9/8, 5/4}}, {{1}, {2}, {3}, {3}, {2}, {1}}}
Here are isolating intervals for the third- and fourth-degree roots of unity. The second interval
contains a root common to both polynomials.
In[9]:= RootIntervals[{x^3 - 1, x^4 - 1}, Complexes]

Out[9]= {{{-1, -1}, {0, 2}, {-(3/4) - (3 I)/2, -(3/16) - (3 I)/4}, {-(3/4) + (3 I)/4, -(3/16) + (3 I)/2},
         {-(3/16) - (3 I)/2, 3/8 - (3 I)/4}, {-(3/16) + (3 I)/4, 3/8 + (3 I)/2}}, {{2}, {1, 2}, {1}, {1}, {2}, {2}}}
All numbers in the interval have the first ten decimal digits in common.
In[12]:= N[%, 10]

Out[12]= {-0.788420396 + 1.295043616 I, -0.788420396 + 1.295043616 I}
Algebraic Numbers
When you enter a Root object, the polynomial that appears in it is automatically reduced to a
minimal form.
In[1]:= Root[24 - 2 # + 4 #^5 &, 1]

Out[1]= Root[12 - #1 + 2 #1^5 &, 1]
This extracts the pure function which represents the polynomial, and applies it to x.
In[2]:= First[%][x]

Out[2]= 12 - x + 2 x^5
Root objects are the way that Mathematica represents algebraic numbers. Algebraic numbers
have the property that when you perform algebraic operations on them, you always get a single
algebraic number as the result.
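For example (a small sketch of this closure property), combining two algebraic numbers and applying RootReduce gives back a single Root object:

```wolfram
(* Sqrt[2] + Sqrt[3] is again algebraic, with minimal polynomial 1 - 10 x^2 + x^4 *)
RootReduce[Sqrt[2] + Sqrt[3]]
(* Root[1 - 10 #1^2 + #1^4 &, 4] *)
```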
Again this can be reduced to a single Root object, albeit a fairly complicated one.
In[6]:= RootReduce[%]
In this simple case the Root object is automatically expressed in terms of radicals.
In[7]:= Root[#^2 - # - 1 &, 1]

Out[7]= 1/2 (1 - Sqrt[5])
When cubic polynomials are involved, Root objects are not automatically expressed in terms of
radicals.
In[8]:= Root[#^3 - 2 &, 1]

Out[8]= Root[-2 + #1^3 &, 1]

ToRadicals nevertheless converts it to an explicit radical.

In[9]:= ToRadicals[%]

Out[9]= 2^(1/3)
If Solve and ToRadicals do not succeed in expressing the solution to a particular polynomial
equation in terms of radicals, then it is a good guess that this fundamentally cannot be done.
However, you should realize that there are some special cases in which a reduction to radicals
is in principle possible, but Mathematica cannot find it. The simplest example is the equation
x^5 + 20 x + 32 == 0, but here the solution in terms of radicals is very complicated.
Even though a simple form in terms of radicals does exist, ToRadicals does not find it.
In[11]:= ToRadicals[%]
Beyond degree four, most polynomials do not have roots that can be expressed at all in terms
of radicals. However, for degree five it turns out that the roots can always be expressed in
terms of elliptic or hypergeometric functions. The results, however, are typically much too
complicated to be useful in practice.
RootSum[f, form]   the sum of form[x] for all x satisfying the polynomial equation f[x] == 0

Normal[expr]   the form of expr with RootSum replaced by explicit sums of Root objects
Sums of roots.
This expands the RootSum into an explicit sum involving Root objects.

In[14]:= Normal[%]
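RootSum can often evaluate symmetric functions of the roots in closed form without ever computing the individual roots. A small sketch:

```wolfram
(* Sum of r^2 over the roots r of x^2 - 5 x + 6, i.e. 2^2 + 3^2 *)
RootSum[#^2 - 5 # + 6 &, #^2 &]
(* 13 *)
```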
Simultaneous Equations
You can give Solve a list of simultaneous equations to solve. Solve can find explicit solutions
for a large class of simultaneous polynomial equations.
Here is a more complicated example. The result is a list of solutions, with each solution consisting of a list of transformation rules for the variables.

In[2]:= Solve[{x^2 + y^2 == 1, x + y == a}, {x, y}]

Out[2]= {{x -> 1/2 (a - Sqrt[2 - a^2]), y -> 1/2 (a + Sqrt[2 - a^2])},
         {x -> a/2 + Sqrt[2 - a^2]/2, y -> 1/2 (a - Sqrt[2 - a^2])}}
Even when Solve cannot find explicit solutions, it often can "unwind" simultaneous equations
to produce a symbolic result in terms of Root objects.
In[4]:= First[Solve[{x^2 + y^3 == x y, x + y + x y == 1}, {x, y}]]

Out[4]= {x -> 1/2 (1 - Root[1 - 3 #1 + #1^2 + 2 #1^3 + 2 #1^4 + #1^5 &, 1]^2 - Root[1 - 3 #1 + #1^2 + 2 #1^3 + 2 #1^4 + #1^5 &, 1]^3 -
           Root[1 - 3 #1 + #1^2 + 2 #1^3 + 2 #1^4 + #1^5 &, 1]^4), y -> Root[1 - 3 #1 + #1^2 + 2 #1^3 + 2 #1^4 + #1^5 &, 1]}
The variables that you use in Solve do not need to be single symbols. Often when you set up
large collections of simultaneous equations, you will want to use expressions like a@iD as
variables.
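A minimal sketch, using hypothetical indexed variables a[1] and a[2]:

```wolfram
(* Indexed expressions a[i] work directly as variables in Solve *)
Solve[{a[1] + a[2] == 5, a[1] - a[2] == 1}, {a[1], a[2]}]
(* {{a[1] -> 3, a[2] -> 2}} *)
```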
Solve[eqns]   try to solve eqns for all the objects that appear in them
If you do not explicitly specify objects to solve for, Solve will try to solve for all the variables.
In[8]:= Solve[{x + y == 1, x - 3 y == 2}]

Out[8]= {{x -> 5/4, y -> -(1/4)}}
If you construct simultaneous equations from matrices, you typically get equations between
lists of expressions.
In[9]:= {{3, 1}, {2, -5}}.{x, y} == {7, 8}

Out[9]= {3 x + y, 2 x - 5 y} == {7, 8}
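Solve accepts such list-valued equations directly, so a linear system built from a matrix can be solved in one step (a sketch):

```wolfram
(* Solve threads the equation over the lists on both sides *)
Solve[{{3, 1}, {2, -5}}.{x, y} == {7, 8}, {x, y}]
(* {{x -> 43/17, y -> -(10/17)}} *)
```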
Solve implicitly assumes that the parameter a does not have the special value 0.
In[1]:= Solve[a x == 0, x]

Out[1]= {{x -> 0}}
Reduce, on the other hand, gives you all the possibilities, without assuming anything about the
value of a.
In[2]:= Reduce[a x == 0, x]

Out[2]= a == 0 || x == 0
A basic difference between Reduce and Solve is that Reduce gives all the possible solutions to a
set of equations, while Solve gives only the generic ones. Solutions are considered "generic" if
they involve conditions only on the variables that you explicitly solve for, and not on other
parameters in the equations. Reduce and Solve also differ in that Reduce always returns combinations of equations, while Solve gives results in the form of transformation rules.
Solving equations.
Reduce gives the full version, which includes the possibility a == b == 0. In reading the output, note that && has higher precedence than ||.

In[4]:= Reduce[a x + b == 0, x]

Out[4]= (b == 0 && a == 0) || (a != 0 && x == -(b/a))
Here is the full solution to a general quadratic equation. There are three alternatives. If a is
nonzero, then there are two solutions for x, given by the standard quadratic formula. If a is
zero, however, the equation reduces to a linear one. Finally, if a, b and c are all zero, there is
no restriction on x.
In[5]:= Reduce[a x^2 + b x + c == 0, x]

Out[5]= (a != 0 && (x == (-b - Sqrt[b^2 - 4 a c])/(2 a) || x == (-b + Sqrt[b^2 - 4 a c])/(2 a))) ||
        (a == 0 && b != 0 && x == -(c/b)) || (c == 0 && b == 0 && a == 0)
When you have several simultaneous equations, Reduce can show you under what conditions
the equations have solutions. Solve shows you whether there are any generic solutions.
There is a solution to these equations, but only when a has the special value 1.
In[7]:= Reduce[{x == 1, x == a}, x]

Out[7]= a == 1 && x == 1
This is the kind of result Solve returns when you give an equation that is always true.
In[11]:= Solve[x == x, x]

Out[11]= {{}}
When you work with systems of linear equations, you can use Solve to get generic solutions,
and Reduce to find out for what values of parameters solutions exist.
Reduce, however, shows that there would be a solution if the parameters satisfied the special
condition a == 2 b - c.
In[16]:= Reduce[eqn, {x, y, z}]

Out[16]= a == 2 b - c && y == -6 b + 5 c - 2 x && z == 5 b - 4 c + x
For nonlinear equations, the conditions for the existence of solutions can be much more complicated.

Out[17]= {x y == a, x^2 y^2 == b}
Eliminating Variables
When you write down a set of simultaneous equations in Mathematica, you are specifying a
collection of constraints between variables. When you use Solve, you are finding values for
some of the variables in terms of others, subject to the constraints represented by the
equations.
Eliminating variables.
If you solve for both x and y, you get results in terms of a and b.
In[2]:= Solve[eqn, {x, y}]

Out[2]= {{x -> -a + b, y -> 7 a + 2 b}}
Similarly, if you solve for x and a, you get results in terms of y and b.
In[3]:= Solve[eqn, {x, a}]

Out[3]= {{x -> 1/7 (9 b - y), a -> 1/7 (-2 b + y)}}
If you only want to solve for x, however, you have to specify whether you want to eliminate y
or a or b. This eliminates y, and so gives the result in terms of a and b.
In[4]:= Solve[eqn, x, y]

Out[4]= {{x -> -a + b}}
In some cases, you may want to explicitly construct equations in which variables have been eliminated. You can do this using Eliminate.
This combines the two equations in the list eqn, by eliminating the variable a.
In[6]:= Eliminate[eqn, a]

Out[6]= 9 b - y == 7 x
To solve the problem, we simply have to write f in terms of a and b, eliminating the original
variables x and y.
In[8]:= Eliminate[{f == x^5 + y^5, a == x + y, b == x y}, {x, y}]

Out[8]= f == a^5 - 5 a^3 b + 5 a b^2
In dealing with sets of equations, it is common to consider some of the objects that appear as
true "variables", and others as "parameters". In some cases, you may need to know for what
values of parameters a particular relation between the variables is always satisfied.
SolveAlways[eqns, vars]   solve for the values of parameters for which the eqns are satisfied for all values of the vars
This finds the values of parameters that make the equation hold for all x.
In[9]:= SolveAlways[a + b x + c x^2 == (1 + x)^2, x]

Out[9]= {{a -> 1, b -> 2, c -> 1}}
Relational operators.
Since both of the quantities involved are numeric, Mathematica can determine that this is true.
In[4]:= Pi ^ E < E ^ Pi
Out[4]= True
Logical operations.
You should remember that the logical operations ==, && and || are all double characters in Mathematica. If you have used a programming language such as C, you will be familiar with this notation.
This specifies that either x + y == 1 or x - y == 2. Solve gives two solutions for x, correspond-
ing to these two possibilities.
In[3]:= Solve[x + y == 1 || x - y == 2, x]

Out[3]= {{x -> 1 - y}, {x -> 2 + y}}
If you explicitly include the assertion that x != 0, one of the previous solutions is suppressed.
In[5]:= Solve[x^3 == x && x != 0, x]

Out[5]= {{x -> -1}, {x -> 1}}
Here is a slightly more complicated example. Note that the precedence of || is lower than the precedence of &&, so the equation is interpreted as (x^3 == x && x != 1) || x^2 == 2, not x^3 == x && (x != 1 || x^2 == 2).

In[6]:= Solve[x^3 == x && x != 1 || x^2 == 2, x]
When you use Solve, the final results you get are in the form of transformation rules. If you
use Reduce or Eliminate, on the other hand, then your results are logical statements, which
you can manipulate further.
This finds values of x which satisfy x ^ 5 == x but do not satisfy the statement representing the
solutions of x ^ 2 == x.
In[8]:= Reduce[x^5 == x && ! %, x]

Out[8]= x == -1 || x == -I || x == I
The logical statements produced by Reduce can be thought of as representations of the solution set for your equations. The logical connectives &&, || and so on then correspond to operations on these sets.
You may often find it convenient to use special notations for logical connectives, as discussed in
"Operators".
Inequalities
Just as the equation x ^ 2 + 3 x == 2 asserts that x ^ 2 + 3 x is equal to 2, so also the inequality
x ^ 2 + 3 x > 2 asserts that x ^ 2 + 3 x is greater than 2. In Mathematica, Reduce works not only on
equations, but also on inequalities.
When applied to an equation, Reduce[eqn, x] tries to get a result consisting of simple equations for x of the form x == r1, …. When applied to an inequality, Reduce[ineq, x] does the exactly analogous thing, and tries to get a result consisting of simple inequalities for x of the form l1 < x < r1, ….
You can think of the result generated by Reduce[ineq, x] as representing a series of intervals, described by inequalities. Since the graph of a polynomial of degree n can go up and down as many as n times, a polynomial inequality of degree n can give rise to as many as n/2 + 1 distinct intervals.
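For instance (a sketch), a degree-4 polynomial inequality can produce up to three intervals:

```wolfram
(* The quartic (x^2 - 1)(x^2 - 4) is positive on three separate intervals *)
Reduce[(x^2 - 1) (x^2 - 4) > 0, x]
(* x < -2 || -1 < x < 1 || x > 2 *)
```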
Transcendental functions like sin(x) have graphs that go up and down infinitely many times, so that infinitely many intervals can be generated.
If you have inequalities that involve <= as well as <, there may be isolated points where the
inequalities can be satisfied. Reduce represents such points by giving equations.
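A small sketch of such an isolated point:

```wolfram
(* x (x - 1) >= 0 holds for x <= 0 or x >= 1; intersecting with x <= 1 leaves the isolated point x == 1 *)
Reduce[x (x - 1) >= 0 && x <= 1, x]
(* x <= 0 || x == 1 *)
```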
Multivariate inequalities.
For inequalities involving several variables, Reduce in effect yields nested collections of interval
specifications, in which later variables have bounds that depend on earlier variables.
In geometrical terms, any linear inequality divides space into two halves. Lists of linear inequali-
ties thus define polyhedra, sometimes bounded, sometimes not. Reduce represents such polyhe-
dra in terms of nested inequalities. The corners of the polyhedra always appear among the
endpoints of these inequalities.
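A sketch for a triangle:

```wolfram
(* The triangle with corners {0, 0}, {1, 0}, {0, 1}, as nested inequalities *)
Reduce[{x > 0, y > 0, x + y < 1}, {x, y}]
(* 0 < x < 1 && 0 < y < 1 - x *)
```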
Lists of inequalities in general represent regions of overlap between geometrical objects. Often
the description of these can be quite complicated.
This represents the part of the unit disk on one side of a line.
In[16]:= Reduce[{x^2 + y^2 < 1, x + 3 y > 2}, {x, y}]

Out[16]= 1/10 (2 - 3 Sqrt[6]) < x < 1/10 (2 + 3 Sqrt[6]) && (2 - x)/3 < y < Sqrt[1 - x^2]
Out[19]= (-(3/4) + 1/12 Sqrt[81 - 4 p^2] < x < -(1/2) Sqrt[9 - 1/3 Sqrt[81 - 4 p^2]] && -((3 - 2 x^2)/2) < y < p/(6 x)) ||
         (1/2 Sqrt[9 - 1/3 Sqrt[81 - 4 p^2]] < x < 3/4 + 1/12 Sqrt[81 - 4 p^2] && p/(6 x) < y < (3 - 2 x^2)/2)
If you have inequalities that involve parameters, Reduce automatically handles the different
cases that can occur, just as it does for equations.
Out[21]= y ∈ Reals && ((a < 0 && ((x <= -1 && (y < -Sqrt[(1 - x^2)/a] || y > Sqrt[(1 - x^2)/a])) ||
         -1 < x < 1 || (x >= 1 && (y < -Sqrt[(1 - x^2)/a] || y > Sqrt[(1 - x^2)/a])))) ||
         (a == 0 && -1 < x < 1) || (a > 0 && -1 < x < 1 && -Sqrt[(1 - x^2)/a] < y < Sqrt[(1 - x^2)/a]))
Reduce tries to give you a complete description of the region defined by a set of inequalities.
Sometimes, however, you may just want to find individual instances of values of variables that
satisfy the inequalities. You can do this using FindInstance.
FindInstance is in some ways an analog for inequalities of Solve for equations. Like Solve, it returns a list of rules giving specific values for variables. But while for equations these values can generically give an accurate representation of all solutions, for inequalities they can only correspond to isolated sample points within the regions described by the inequalities.
Every time you call FindInstance with specific input, it will give the same output. And when there are instances that correspond to special, limiting, points of some kind, it will preferentially return these. But in general, the distribution of instances returned by FindInstance will typically seem somewhat random. Each instance is, however, in effect a constructive proof that the inequalities you have given can in fact be satisfied.
If you ask for one point in the unit disk, FindInstance gives the origin.
In[24]:= FindInstance[x^2 + y^2 <= 1, {x, y}]
Out[24]= 88x 0, y 0<<
Integers   the domain of integers
Reduce by default assumes that x can be complex, and gives all five complex solutions.
In[1]:= Reduce[x^6 - x^4 - 4 x^2 + 4 == 0, x]

Out[1]= x == -1 || x == 1 || x == -Sqrt[2] || x == -I Sqrt[2] || x == I Sqrt[2] || x == Sqrt[2]
But here it assumes that x is real, and gives only the real solutions.
In[2]:= Reduce[x^6 - x^4 - 4 x^2 + 4 == 0, x, Reals]

Out[2]= x == -1 || x == 1 || x == -Sqrt[2] || x == Sqrt[2]
And here it assumes that x is an integer, and gives only the integer solutions.
In[3]:= Reduce[x^6 - x^4 - 4 x^2 + 4 == 0, x, Integers]

Out[3]= x == -1 || x == 1
A single polynomial equation in one variable will always have a finite set of discrete solutions.
And in such a case one can think of Reduce@eqns, vars, domD as just filtering the solutions by
selecting the ones that happen to lie in the domain dom.
But as soon as there are more variables, things can become more complicated, with solutions
to equations corresponding to parametric curves or surfaces in which the values of some vari-
ables can depend on the values of others. Often this dependence can be described by some
collection of equations or inequalities, but the form of these can change significantly when one
goes from one domain to another.
Out[4]= y == -Sqrt[1 - x^2] || y == Sqrt[1 - x^2]

Out[5]= -1 <= x <= 1 && (y == -Sqrt[1 - x^2] || y == Sqrt[1 - x^2])
Over the integers, the solution can be represented as equations for discrete points.
In[6]:= Reduce[x^2 + y^2 == 1, {x, y}, Integers]

Out[6]= (x == -1 && y == 0) || (x == 0 && y == -1) || (x == 0 && y == 1) || (x == 1 && y == 0)
If your input involves only equations, then Reduce will by default assume that all variables are
complex. But if your input involves inequalities, then Reduce will assume that any algebraic
variables appearing in them are real, since inequalities can only compare real quantities.
For systems of polynomials over real and complex domains, the solutions always consist of a finite number of components, within which the values of variables are given by algebraic numbers or functions.

While in principle Reduce can always find the complete solution to any collection of polynomial equations and inequalities with real or complex variables, the results are often very complicated, with the number of components typically growing exponentially as the number of variables increases.
As soon as one introduces functions like Sin or Exp, even equations in single real or complex variables can have solutions with an infinite number of components. Reduce labels these components by introducing additional parameters. By default, the nth parameter in a given solution will be named C[n]. In general you can specify that it should be named f[n] by giving the option setting GeneratedParameters -> f.
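A sketch (the exact output may differ between versions; k is just an arbitrary name chosen here):

```wolfram
(* Name the generated parameter k[n] instead of C[n] *)
Reduce[Exp[x] == 2, x, GeneratedParameters -> k]
(* k[1] \[Element] Integers && x == 2 I Pi k[1] + Log[2] *)
```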
Reduce can handle equations not only over real and complex variables, but also over integers.
Solving such Diophantine equations can often be a very difficult problem.
Reduce can solve any system of linear equations or inequalities over the integers. With m linear
equations in n variables, n - m parameters typically need to be introduced. But with inequalities,
a much larger number of parameters may be needed.
Three parameters are needed here, even though there are only two variables.
In[15]:= Reduce[{3 x - 2 y > 1, x > 0, y > 0}, {x, y}, Integers]

Out[15]= (C[1] | C[2] | C[3]) ∈ Integers && C[1] >= 0 && C[2] >= 0 &&
         C[3] >= 0 && ((x == 2 + 2 C[1] + C[2] + C[3] && y == 2 + 3 C[1] + C[2]) ||
         (x == 2 + 2 C[1] + C[2] + C[3] && y == 1 + 3 C[1] + C[2]))
With two variables, Reduce can solve any quadratic equation over the integers. The result can
be a Fibonacci-like sequence, represented in terms of powers of quadratic irrationals.
The actual values for specific C[1] are integers, as they should be.

In[17]:= FullSimplify[% /. Table[{C[1] -> i}, {i, 4}]]

Out[17]= {x == 649 && y == 180, x == 842401 && y == 233640,
         x == 1093435849 && y == 303264540, x == 1419278889601 && y == 393637139280}
Reduce can handle many specific classes of equations over the integers.
Equations over the integers sometimes have seemingly quite random collections of solutions.
And even small changes in equations can often lead them to have no solutions at all.
For polynomial equations over real and complex numbers, there is a definite decision procedure for determining whether or not any solution exists. But for polynomial equations over the integers, the unsolvability of Hilbert's tenth problem demonstrates that there can never be any such general procedure.
For specific classes of equations, however, procedures can be found, and indeed many are
implemented in Reduce. But handling different classes of equations can often seem to require
whole different branches of number theory, and quite different kinds of computations. And in
fact it is known that there are universal integer polynomial equations, for which filling in some
variables can make solutions for other variables correspond to the output of absolutely any
possible program. This then means that for such equations there can never in general be any
closed-form solution built from fixed elements like algebraic functions.
If one includes functions like Sin, then even for equations involving real and complex numbers
the same issues can arise.
Since there are only ever a finite number of possible solutions for integer equations modulo n,
Reduce can systematically find them.
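For instance (a sketch), the Modulus option restricts an equation to the integers modulo n:

```wolfram
(* All solutions of x^2 == 1 modulo 8 *)
Reduce[x^2 == 1, x, Modulus -> 8]
(* x == 1 || x == 3 || x == 5 || x == 7 *)
```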
Reduce can also handle equations that involve several different moduli.
Reduce normally treats complex variables as single objects. But in dealing with functions that are not analytic or have branch cuts, it sometimes has to break them into pairs of real variables Re[z] and Im[z].
Reduce by default assumes that variables that appear algebraically in inequalities are real. But
you can override this by explicitly specifying Complexes as the default domain. It is often useful
in such cases to be able to specify that certain variables are still real.
Since x does not appear algebraically, Reduce immediately assumes that it can be complex.
In[28]:= Reduce[Abs[x] < 1, x]

Out[28]= -1 < Re[x] < 1 && -Sqrt[1 - Re[x]^2] < Im[x] < Sqrt[1 - Re[x]^2]
Out[29]= (x < 0 && -x^2 < Re[y] < x^2 && -Sqrt[x^4 - Re[y]^2] < Im[y] < Sqrt[x^4 - Re[y]^2]) ||
         (x > 0 && -x^2 < Re[y] < x^2 && -Sqrt[x^4 - Re[y]^2] < Im[y] < Sqrt[x^4 - Re[y]^2])
FindInstance[expr, {x1, x2, ...}, dom]   try to find an instance of the xi in dom satisfying expr

FindInstance[expr, vars, dom, n]   try to find n instances

Complexes   the domain of complex numbers
If FindInstance[expr, vars, dom] returns {} then this means that Mathematica has effectively proved that expr cannot be satisfied for any values of variables in the specified domain. When expr can be satisfied, FindInstance will normally pick quite arbitrarily among values that do this, as discussed for inequalities in "Inequalities: Manipulating Equations and Inequalities".
Particularly for integer equations, FindInstance can often find particular solutions to equations
even when Reduce cannot find a complete solution. In such cases it usually returns one of the
smallest solutions to the equations.
One feature of FindInstance is that it also works with Boolean expressions whose variables
can have values True or False. You can use FindInstance to determine whether a particular
expression is satisfiable, so that there is some choice of truth values for its variables that
makes the expression True.
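A sketch of Boolean satisfiability with FindInstance:

```wolfram
(* Find truth values making "exactly one of p, q" true;
   FindInstance returns some satisfying assignment *)
FindInstance[(p || q) && ! (p && q), {p, q}, Booleans]
```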
An implicit description in terms of equations or inequalities is sufficient if one just wants to test
whether a point specified by values of variables is in the region. But to understand the structure
of the region, or to generate points in it, one typically needs a more explicit description, of the
kind obtained from Reduce.
If we pick a value for x consistent with the first inequality, we then immediately get an explicit
inequality for y.
In[4]:= % /. x -> 1/2

Out[4]= -(Sqrt[3]/2) < y < Sqrt[3]/2
Reduce[expr, {x1, x2, ...}] is set up to describe regions by first giving fixed conditions for x1, then giving conditions for x2 that depend on x1, then conditions for x3 that depend on x1 and x2, and so on. This structure has the feature that it allows one to pick points by successively choosing values for each of the xi in turn, in much the same way as when one uses iterators in functions like Table.
This gives a representation for the region in which one first picks a value for y, then x.
In[5]:= Reduce[semi, {y, x}]
In some simple cases the region defined by a system of equations or inequalities will end up having only one component. In such cases, the output from Reduce will be of the form e1 && e2 && … where each of the ei is an equation or inequality involving variables up to xi.

In most cases, however, there will be several components, represented by output containing forms such as u1 || u2 || …. Reduce typically tries to minimize the number of components used in describing a region. But in some cases multiple parametrizations may be needed to cover a single connected component, and each one of these will appear as a separate component in the output from Reduce.
In representing solution sets, it is common to find that several components can be described together by using forms such as … && (u1 || u2) && …. Reduce by default does this so as to return its results as compactly as possible. You can use LogicalExpand to generate an expanded form in which each component appears separately.
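A sketch of the expansion (the ordering of the resulting components may vary):

```wolfram
(* Distribute && over || so that each component appears separately *)
LogicalExpand[x == 0 && (y < -1 || y > 1)]
(* e.g. (x == 0 && y < -1) || (x == 0 && y > 1) *)
```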
In generating the most compact results, Reduce sometimes ends up making conditions on later
variables xi depend on more of the earlier xi than is strictly necessary. You can force Reduce to
generate results in which a particular xi only has minimal dependence on earlier xi by giving the
option Backsubstitution -> True. Usually this will lead to much larger output, although sometimes it may be easier to interpret.

Out[6]= (x == 2 || x == -3 - Sqrt[3] || x == -3 + Sqrt[3]) && y == 4 - x^2
For polynomial equations or inequalities over the reals, the structure of the result returned by
Reduce is typically a cylindrical algebraic decomposition or CAD. Sometimes Reduce can yield a
simpler form. But in all cases you can get the complete CAD by using
CylindricalDecomposition. For systems containing inequalities only,
GenericCylindricalDecomposition gives you "most" of the solution set and is often faster.
This gives the two-dimensional part of the solution set along with a curve containing the
boundary.
In[9]:= GenericCylindricalDecomposition@x ^ 2 - y ^ 2 >= 1, 8x, y<D
Out[9]= : x < -1 && - -1 + x2 < y < -1 + x2 x > 1 && - -1 + x2 < y < -1 + x2 , 1 - x2 + y2 0>
The results include a few points from each piece of the solution set.
In[11]:= Show[{RegionPlot[x^2 - y^2 >= 1, {x, -3, 3}, {y, -3, 3}], Graphics[Point[{x, y}] /. %]}]

Out[11]= (plot of the region x^2 - y^2 >= 1 over -3 <= x, y <= 3, with the sample points marked)
Quantifiers
In a statement like x ^ 4 + x ^ 2 > 0, Mathematica treats the variable x as having a definite,
though unspecified, value. Sometimes, however, it is useful to be able to make statements
about whole collections of possible values for x. You can do this using quantifiers.
Exists[{x1, ...}, cond, expr]   there exist values of the xi satisfying cond for which expr holds
You can work with quantifiers in Mathematica much as you work with equations, inequalities or logical connectives. In most cases, the quantifiers will not immediately be changed by evaluation. But they can be simplified or reduced by functions like FullSimplify and Reduce.
This asserts that an x exists that makes the inequality true. The output here is just a formatted
version of the input.
In[1]:= Exists[x, x^4 + x^2 > 0]

Out[1]= ∃_x (x^2 + x^4 > 0)
Mathematica supports a version of the standard notation for quantifiers used in predicate logic and pure mathematics. You can input ∀ as \[ForAll] or Esc fa Esc, and you can input ∃ as \[Exists] or Esc ex Esc. To make the notation precise, however, Mathematica makes the quantified variable a subscript. The conditions on the variable can also be given in the subscript, separated by a comma.
Given a statement that involves quantifiers, there are certain important cases where it is possible to resolve it into an equivalent statement in which the quantifiers have been eliminated. Somewhat like solving an equation, such quantifier elimination turns an implicit statement about what is true for all x or for some x into an explicit statement about the conditions under which this holds.
Quantifier elimination.
This shows that the equations can only be satisfied if c obeys a certain condition.
In[5]:= Resolve[Exists[x, x^2 == c && x^3 == c + 1]]

Out[5]= -1 - 2 c - c^2 + c^3 == 0
Resolve can always eliminate quantifiers from any collection of polynomial equations and inequations over complex numbers, and from any collection of polynomial equations and inequalities over real numbers. It can also eliminate quantifiers from Boolean expressions.
This finds the conditions for a quadratic form over the reals to be positive.
In[6]:= Resolve[ForAll[x, a x^2 + b x + c > 0], Reals]

Out[6]= (a > 0 && -a b^2 + 4 a^2 c > 0) || (a == 0 && b == 0 && c > 0) || (a != 0 && b == 0 && c > 0 && -a b^2 + 4 a^2 c > 0)
This shows that there is a way of assigning truth values to p and q that makes the expression
true.
In[7]:= Resolve[Exists[{p, q}, p || q && ! q], Booleans]

Out[7]= True
You can also use quantifiers with Reduce. If you give Reduce a collection of equations or inequali-
ties, then it will try to produce a detailed representation of the complete solution set. But some-
times you may want to address a more global question, such as whether the solution set covers
all values of x, or whether it covers none of these values. Quantifiers provide a convenient way
to specify such questions.
This finds the conditions for a circle to be contained within an arbitrary conic section.
In[10]:= Reduce[ForAll[{x, y}, x^2 + y^2 < 1, a x^2 + b y^2 < c], {a, b, c}, Reals]

Out[10]= (a <= 0 && ((b <= 0 && c > 0) || (b > 0 && c >= b))) || (a > 0 && ((b < a && c >= a) || (b >= a && c >= b)))
This finds the condition for all pairs of roots to the quartic to be equal.
In[13]:= Reduce[ForAll[{x, y}, q[x] == 0 && q[y] == 0, x == y], {b, c, d, e}]

Out[13]= (c == (3 b^2)/8 && d == b^3/16 && e == b^4/256) || (b == 0 && c == 0 && d == 0 && e == 0)
There is an integer solution to 4 x^2 == 9 y^2 with y > 0, though.
In[15]:= Resolve[Exists[{x, y}, 4 x^2 == 9 y^2 && y > 0], Integers]
Out[15]= True
Minimize and Maximize yield lists giving the value attained at the minimum or maximum,
together with rules specifying where the minimum or maximum occurs.
Out[3]= {25/8, {x -> -(5/2), y -> -(5/2)}}
Minimize[expr, x]   minimizes expr, allowing x to range over all possible values from -∞ to +∞.
Minimize[{expr, cons}, x]   minimizes expr subject to the constraints cons being satisfied. The constraints can consist of any combination of equations and inequalities.
This finds the maximum within an ellipse. The result is fairly complicated.
In[6]:= Maximize[{5 x y - x^4 - y^4, x^2 + 2 y^2 <= 1}, {x, y}]
Out[6]= {-Root[-811219 + 320160 #1 + 274624 #1^2 - 170240 #1^3 + 25600 #1^4 &, 1],
  {x -> Root[25 - 102 #1^2 + 122 #1^4 - 70 #1^6 + 50 #1^8 &, 2],
   y -> Root[25 - 264 #1^2 + 848 #1^4 - 1040 #1^6 + 800 #1^8 &, 1]}}
Minimize and Maximize can solve any linear programming problem in which both the objective
function expr and the constraints cons involve the variables xi only linearly.
They can also in principle solve any polynomial programming problem in which the objective
function and the constraints involve arbitrary polynomial functions of the variables. There are
many important geometrical and other problems that can be formulated in this way.
This solves the simple geometrical problem of maximizing the area of a rectangle with fixed
perimeter.
In[9]:= Maximize[{x y, x + y == 1}, {x, y}]
Out[9]= {1/4, {x -> 1/2, y -> 1/2}}
This finds the maximal volume of a cuboid that fits inside the unit sphere.
In[10]:= Maximize[{8 x y z, x^2 + y^2 + z^2 <= 1}, {x, y, z}]
Out[10]= {8/(3 Sqrt[3]), {x -> 1/Sqrt[3], y -> -(1/Sqrt[3]), z -> -(1/Sqrt[3])}}
An important feature of Minimize and Maximize is that they always find global minima and maxima. Often functions will have various local minima and maxima at which derivatives vanish. But Minimize and Maximize use global methods to find absolute minima or maxima, not just local extrema.
Out[11]= (plot of the function, showing several local extrema over the range -10 to 10)
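As an aside, the difference between local and global methods can be sketched outside Mathematica in a few lines of plain Python. Everything here is a hypothetical illustration (the quartic, the step size, and the scan range are all made up, and this is not how Minimize works): gradient descent started in the wrong place finds only a local minimum, while a coarse scan of the whole interval followed by the same descent finds the global one.

```python
# Hypothetical illustration: f has two wells; descent from x = 1.5 falls into
# the shallower right-hand well, a global scan locates the deeper left-hand one.

def f(x):
    return x**4 - 8*x**2 + x

def df(x):
    return 4*x**3 - 16*x + 1

# Local method: gradient descent from x = 1.5.
x = 1.5
for _ in range(2000):
    x -= 0.01 * df(x)
x_local = x

# "Global" method: scan the whole interval first, then polish with descent.
x = min((i / 100 for i in range(-300, 301)), key=f)
for _ in range(2000):
    x -= 0.01 * df(x)
x_global = x

print(x_local, x_global)          # local well near +2, global well near -2
print(f(x_global) < f(x_local))   # True: the global minimum is lower
```

Real global optimizers are far more sophisticated than a grid scan, but the two-phase structure (global search, then local refinement) is a common pattern.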
If you give functions that are unbounded, Minimize and Maximize will return -∞ and +∞ as the minima and maxima. And if you give constraints that can never be satisfied, they will return +∞ and -∞ as the minima and maxima, and Indeterminate as the values of variables.
One subtle issue is that Minimize and Maximize allow both nonstrict inequalities of the form x <= v, and strict ones of the form x < v. With nonstrict inequalities there is no problem with a minimum or maximum lying exactly on the boundary x -> v. But with strict inequalities, a minimum or maximum must in principle be at least infinitesimally inside the boundary.
With a strict inequality, Mathematica prints a warning, then returns the point on the boundary.
In[13]:= Minimize[{x^2 - 3 x + 6, x > 3}, x]
Out[13]= {6, {x -> 3}}
Minimize and Maximize normally assume that all variables you give are real. But by giving a constraint such as x ∈ Integers you can specify that a variable must in fact be an integer.
Minimize and Maximize can compute maxima and minima of linear functions over the integers
in bounded polyhedra. This is known as integer linear programming.
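Because the polyhedron is bounded, it contains only finitely many integer points, so the problem is in principle solvable by enumeration. The following plain-Python sketch (a hypothetical illustration with a made-up objective and constraint, hopelessly inefficient next to real integer-programming algorithms) shows that structure:

```python
# Brute-force integer linear programming over a bounded region: enumerate the
# finitely many integer points and keep the best objective value.
from itertools import product

def integer_maximize(objective, feasible, ranges):
    """ranges bounds each variable; feasible tests the linear constraints."""
    best = None
    for point in product(*ranges):
        if feasible(point) and (best is None or objective(point) > objective(best)):
            best = point
    return best, objective(best)

# maximize 3x + 2y  subject to  x + y <= 4,  0 <= x, y <= 4,  x, y integers
point, value = integer_maximize(
    lambda p: 3*p[0] + 2*p[1],
    lambda p: p[0] + p[1] <= 4,
    [range(0, 5), range(0, 5)],
)
print(point, value)   # (4, 0) 12
```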
Minimize and Maximize can produce exact symbolic results for polynomial optimization problems with parameters.
MinValue[{f, cons}, {x, y, ...}]   give the minimum value of f subject to the constraints cons
MaxValue[{f, cons}, {x, y, ...}]   give the maximum value of f subject to the constraints cons
ArgMin[{f, cons}, {x, y, ...}]   give a position at which f is minimized subject to the constraints cons
ArgMax[{f, cons}, {x, y, ...}]   give a position at which f is maximized subject to the constraints cons
For strict polynomial inequality constraints, computing only the maximum value may be much faster.
In[19]:= TimeConstrained[
  Maximize[{-x^2 + 2 x y - z - 1, x^2 y < z^3 && x - z^2 > y^2 + 2}, {x, y, z}], 300]
Out[19]= $Aborted
Out[20]= {0.312, -1 - Root[-6674484057677824 + 27190416613703680 #1 -
    9845871213297967104 #1^2 + 30310812947042320384 #1^3 - 38968344650849575680 #1^4 +
    27943648095748511616 #1^5 - 13622697129083140957 #1^6 + 5905344357450294480 #1^7 -
    2872859681127251424 #1^8 + 1484592492626145792 #1^9 - 567863224101551360 #1^10 +
    100879538475737088 #1^11 + 303891741605888 #1^12 - 2224545911472128 #1^13 +
    70301735976960 #1^14 + 25686756556800 #1^15 + 1786706395136 #1^16 + 73014444032 #1^17 &, 1]}
Linear Algebra
Constructing Matrices
DiagonalMatrix makes a matrix with zeros everywhere except on the leading diagonal.
In[4]:= DiagonalMatrix[{a, b, c}]
Out[4]= {{a, 0, 0}, {0, b, 0}, {0, 0, c}}
Table evaluates If[i >= j, a++, 0] separately for each element, to give a matrix with sequentially increasing entries in the lower-triangular part.
In[8]:= a = 1; Table[If[i >= j, a++, 0], {i, 3}, {j, 3}]
Out[8]= {{1, 0, 0}, {2, 3, 0}, {4, 5, 6}}
Take[m, {i0, i1}, {j0, j1}]   the submatrix with rows i0 through i1 and columns j0 through j1
m[[i0 ;; i1, j0 ;; j1]]   the submatrix with rows i0 through i1 and columns j0 through j1
m[[{i1, ..., ir}, {j1, ..., js}]]   the r×s submatrix with elements having row indices ik and column indices jk
Matrices in Mathematica are represented as lists of lists. You can use all the standard Mathemat-
ica list-manipulation operations on matrices.
m[[i0 ;; i1, j0 ;; j1]] = {{a11, a12, ...}, {a21, a22, ...}, ...}   reset the submatrix with rows i0 through i1 and columns j0 through j1 to new values
Here is a 3×3 matrix.
In[5]:= m = {{a, b, c}, {d, e, f}, {g, h, i}}
Out[5]= {{a, b, c}, {d, e, f}, {g, h, i}}
This resets elements in the first and third columns of each row.
In[11]:= m[[All, 1 ;; 3 ;; 2]] = {{1, 2}, {3, 4}, {5, 6}}; m
Out[11]= {{1, v, 2}, {3, y, 4}, {5, 1 + k, 6}}
This resets elements in the first and third columns of rows 2 through 3.
In[12]:= m[[2 ;; 3, 1 ;; 3 ;; 2]] = {{a, b}, {c, d}}; m
Out[12]= {{1, v, 2}, {a, y, b}, {c, 1 + k, d}}
A vector in Mathematica consists of a list of scalars. A matrix consists of a list of vectors, representing each of its rows. In order to be a valid matrix, all the rows must be the same length, so that the elements of the matrix effectively form a rectangular array.
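The rectangularity requirement is easy to state as code. Here is a rough plain-Python analog of the VectorQ/MatrixQ tests (an illustration only, not Mathematica's implementation):

```python
# A vector is a list of non-list scalars; a matrix is a nonempty list of
# equal-length vectors, i.e. a full rectangular array.

def vector_q(expr):
    return isinstance(expr, list) and all(not isinstance(e, list) for e in expr)

def matrix_q(expr):
    return (isinstance(expr, list) and expr != []
            and all(vector_q(row) for row in expr)
            and len({len(row) for row in expr}) == 1)

print(vector_q([1, 2, 3]))               # True
print(matrix_q([[1, 2, 3], [4, 5, 6]]))  # True
print(matrix_q([[1, 2, 3], [4, 5]]))     # False: ragged rows
```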
VectorQ[expr]   give True if expr has the form of a vector, and False otherwise
MatrixQ[expr]   give True if expr has the form of a matrix, and False otherwise
Dimensions[expr]   a list of the dimensions of a vector or matrix
Anything that is not manifestly a list is treated as a scalar, so applying VectorQ gives False.
In[2]:= VectorQ[x + y]
Out[2]= False
This is a 2×3 matrix.
In[3]:= Dimensions[{{a, b, c}, {ap, bp, cp}}]
Out[3]= {2, 3}
For a vector, Dimensions gives a list with a single element equal to the result from Length.
In[4]:= Dimensions[{a, b, c}]
Out[4]= {3}
This object does not count as a matrix because its rows are of different lengths.
In[5]:= MatrixQ[{{a, b, c}, {ap, bp}}]
Out[5]= False
A consequence is that most mathematical functions are applied element by element to matrices
and vectors.
The same is true for a matrix, or, for that matter, for any nested list.
In[2]:= Log[{{a, b}, {c, d}}]
Out[2]= {{Log[a], Log[b]}, {Log[c], Log[d]}}
If you try to add two vectors with different lengths, you get an error.
In[5]:= {a, b, c} + {ap, bp}
Thread::tdlen : Objects of unequal length in {a, b, c} + {ap, bp} cannot be combined.
Any object that is not manifestly a list is treated as a scalar. Here c is treated as a scalar, and
added separately to each element in the vector.
In[7]:= {a, b} + c
Out[7]= {a + c, b + c}
The object p is treated as a scalar, and added separately to each element in the vector.
In[9]:= {a, b} + p
Out[9]= {a + p, b + p}
This is what happens if you now replace p by the list {c, d}.
In[10]:= % /. p -> {c, d}
Out[10]= {{a + c, a + d}, {b + c, b + d}}
You would have got a different result if you had replaced p by {c, d} before you did the first operation.
In[11]:= {a, b} + {c, d}
Out[11]= {a + c, b + d}
It is important to realize that you can use "dot" for both left- and right-multiplication of vectors by matrices. Mathematica makes no distinction between "row" and "column" vectors. Dot carries out whatever operation is possible. (In formal terms, a.b contracts the last index of the tensor a with the first index of b.)
This left-multiplies the vector v by m. The object v is effectively treated as a column vector in
this case.
In[6]:= m.v
Out[6]= 8a x + b y, c x + d y<
You can also use dot to right-multiply v by m. Now v is effectively treated as a row vector.
In[7]:= v.m
Out[7]= 8a x + c y, b x + d y<
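The same no-row-versus-column behavior can be sketched in plain Python (illustrative code with a hypothetical numeric matrix, not Mathematica's implementation): the one vector type contracts against either index of the matrix.

```python
# m.v contracts v with the columns of m; v.m contracts v with the rows.

def mat_vec(m, v):                       # m.v : v treated as a column vector
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def vec_mat(v, m):                       # v.m : v treated as a row vector
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

m = [[1, 2], [3, 4]]
v = [5, 6]
print(mat_vec(m, v))   # [17, 39]
print(vec_mat(v, m))   # [23, 34]
```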
For some purposes, you may need to represent vectors and matrices symbolically, without
explicitly giving their elements. You can use dot to represent multiplication of such symbolic
objects.
You can apply the distributive law in this case using the function Distribute, as discussed in "Structural Operations".
In[12]:= Distribute[%]
Out[12]= a.c.d + a.c.e + b.c.d + b.c.e
The "dot" operator gives "inner products" of vectors, matrices, and so on. In more advanced
calculations, you may also need to construct outer or Kronecker products of vectors and matri-
ces. You can use the general function Outer or KroneckerProduct to do this.
Vector Operations
This gives a vector u in the direction opposite to v with twice the magnitude.
In[2]:= u = -2 v
Out[2]= {-2, -6, -4}
Out[5]= Sqrt[14]
Out[6]= {1/Sqrt[14], 3/Sqrt[14], Sqrt[2/7]}
Two vectors are orthogonal if their dot product is zero. A set of vectors is orthonormal if they
are all unit vectors and are pairwise orthogonal.
p is a scalar multiple of v.
In[11]:= p/v
Out[11]= {-12/7, -12/7, -12/7}
u - p is orthogonal to v.
In[12]:= (u - p).v
Out[12]= 0
Starting from the set of vectors {u, v}, this finds an orthonormal set of two vectors.
In[13]:= Orthogonalize[{u, v}]
Out[13]= {{1/Sqrt[14], -(3/Sqrt[14]), -Sqrt[2/7]}, {Sqrt[13/14], 3/Sqrt[182], Sqrt[2/91]}}
When one of the vectors is linearly dependent on the vectors preceding it, the corresponding
position in the result will be a zero vector.
In[14]:= Orthogonalize[{v, p, u}]
Out[14]= {{1/Sqrt[14], 3/Sqrt[14], Sqrt[2/7]}, {0, 0, 0}, {Sqrt[13/14], -(3/Sqrt[182]), -Sqrt[2/91]}}
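What Orthogonalize computes can be sketched as classical Gram-Schmidt in plain Python, including the zero-vector convention for linearly dependent inputs. This is an illustrative sketch with hypothetical input vectors, not Mathematica's actual (numerically more careful) algorithm:

```python
# Gram-Schmidt: subtract from each vector its projections on the unit vectors
# already produced, normalize what is left, and emit a zero vector when
# nothing is left (a linearly dependent input).
from math import sqrt

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(vectors, tol=1e-12):
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)                      # projection coefficient (b is unit)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = sqrt(dot(w, w))
        basis.append([wi / n for wi in w] if n > tol else [0.0] * len(w))
    return basis

v = [1.0, 3.0, 2.0]
result = orthogonalize([v, [2.0, 6.0, 4.0], [2.0, -6.0, -4.0]])
print(result[1])                                 # [0.0, 0.0, 0.0]: 2v depends on v
print(abs(dot(result[0], result[2])) < 1e-9)     # True: the rest are orthogonal
```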
Matrix Inversion
Matrix inversion.
This gives the inverse of m. In producing this formula, Mathematica implicitly assumes that the
determinant a d - b c is nonzero.
In[2]:= Inverse[m]
Out[2]= {{d/(-b c + a d), -(b/(-b c + a d))}, {-(c/(-b c + a d)), a/(-b c + a d)}}
Multiplying the inverse by the original matrix should give the identity matrix.
In[3]:= %.m
Out[3]= {{-((b c)/(-b c + a d)) + (a d)/(-b c + a d), 0}, {0, -((b c)/(-b c + a d)) + (a d)/(-b c + a d)}}
You have to use Together to clear the denominators, and get back a standard identity matrix.
In[4]:= Together[%]
Out[4]= {{1, 0}, {0, 1}}
If you try to invert a singular matrix, Mathematica prints a warning message, and returns the
input unchanged.
In[8]:= Inverse[{{1, 2}, {1, 2}}]
If you give a matrix with exact symbolic or numerical entries, Mathematica gives the exact
inverse. If, on the other hand, some of the entries in your matrix are approximate real num-
bers, then Mathematica finds an approximate numerical result.
Multiplying by the original matrix gives you an identity matrix with small round-off errors.
In[11]:= %.m
Out[11]= {{1., 1.66187×10^-15}, {3.27429×10^-16, 1.}}
When you try to invert a matrix with exact numerical entries, Mathematica can always tell whether or not the matrix is singular. When you invert an approximate numerical matrix, Mathematica can usually not tell for certain whether or not the matrix is singular: all it can tell is, for example, that the determinant is small compared to the entries of the matrix. When Mathematica suspects that you are trying to invert a singular numerical matrix, it prints a warning.
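The kind of heuristic involved can be sketched in plain Python for the 2×2 case: compare the determinant against the scale of the entries. This is a hypothetical illustration of the idea only, not the test Mathematica actually applies:

```python
# A determinant that is tiny relative to the squared entry scale suggests the
# matrix is singular up to rounding error.

def nearly_singular(m, tol=1e-10):
    (a, b), (c, d) = m
    det = a * d - b * c
    scale = max(abs(x) for row in m for x in row) ** 2 or 1.0
    return abs(det) < tol * scale

print(nearly_singular([[1.0, 2.0], [1.0, 2.0]]))   # True
print(nearly_singular([[1.0, 2.0], [3.0, 4.0]]))   # False (det = -2)
```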
Mathematica prints a warning if you invert a numerical matrix that it suspects is singular.
In[13]:= Inverse[{{1., 2.}, {1., 2.}}]
This matrix is singular, but the warning is different, and the result is useless.
In[14]:= Inverse[N[{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}]]
If you work with high-precision approximate numbers, Mathematica will keep track of the
precision of matrix inverses that you generate.
This takes the matrix, multiplies it by its inverse, and shows the first row of the result.
In[16]:= (m.Inverse[m])[[1]]
Out[16]= {1.000000000000000000, 0.×10^-19, 0.×10^-19, 0.×10^-20, 0.×10^-20, 0.×10^-20}
This generates a 20-digit numerical approximation to a 6×6 Hilbert matrix. Hilbert matrices are notoriously hard to invert numerically.
In[17]:= m = N[Table[1/(i + j - 1), {i, 6}, {j, 6}], 20];
The result is still correct, but the zeros now have lower accuracy.
In[18]:= (m.Inverse[m])[[1]]
Out[18]= {1.000000000000000, 0.×10^-15, 0.×10^-14, 0.×10^-14, 0.×10^-14, 0.×10^-14}
Inverse works only on square matrices. "Advanced Matrix Operations" discusses the function
PseudoInverse, which can also be used with nonsquare matrices.
Transpose[m]   transpose m
Transposing a matrix interchanges the rows and columns in the matrix. If you transpose an m×n matrix, you get an n×m matrix as the result.
Det[m] gives the determinant of a square matrix m. Minors[m] is the matrix whose (i, j)-th element gives the determinant of the submatrix obtained by deleting the (n - i + 1)-th row and the (n - j + 1)-th column of m. The (i, j)-th cofactor of m is (-1)^(i + j) times the (n - i + 1, n - j + 1)-th element of the matrix of minors.
Minors[m, k] gives the determinants of the k×k submatrices obtained by picking each possible set of k rows and k columns from m. Note that you can apply Minors to rectangular, as well as square, matrices.
The trace or spur of a matrix Tr[m] is the sum of the terms on the leading diagonal.
Here is a 2×2 matrix.
In[7]:= m = {{0.4, 0.6}, {0.525, 0.475}}
Out[7]= {{0.4, 0.6}, {0.525, 0.475}}
In some cases, however, you may prefer to convert the system of linear equations into a matrix
equation, and then apply matrix manipulation operations to solve it. This approach is often
useful when the system of equations arises as part of a general algorithm, and you do not know
in advance how many variables will be involved.
A system of linear equations can be stated in matrix form as m.x == b, where x is the vector of variables.
Note that if your system of equations is sparse, so that most of the entries in the matrix m are
zero, then it is best to represent the matrix as a SparseArray object. As discussed in "Sparse
Arrays: Linear Algebra", you can convert from symbolic equations to SparseArray objects
using CoefficientArrays. All the functions described here work on SparseArray objects as
well as ordinary matrices.
You can also get the vector of solutions by calling LinearSolve. The result is equivalent to the one you get from Solve.
In[4]:= LinearSolve[m, {a, b}]
Out[4]= {1/9 (-a + 5 b), 1/9 (2 a - b)}
Another way to solve the equations is to invert the matrix m, and then multiply {a, b} by the inverse. This is not as efficient as using LinearSolve.
In[5]:= Inverse[m].{a, b}
Out[5]= {-(a/9) + (5 b)/9, (2 a)/9 - b/9}
RowReduce performs a version of Gaussian elimination and can also be used to solve the
equations.
In[6]:= RowReduce[{{1, 5, a}, {2, 1, b}}]
Out[6]= {{1, 0, 1/9 (-a + 5 b)}, {0, 1, 1/9 (2 a - b)}}
If you have a square matrix m with a nonzero determinant, then you can always find a unique solution to the matrix equation m.x == b for any b. If, however, the matrix m has determinant zero, then there may be either no vector, or an infinite number of vectors x which satisfy m.x == b for a particular b. This occurs when the linear equations embodied in m are not independent.
When m has determinant zero, it is nevertheless always possible to find nonzero vectors x that satisfy m.x == 0. The set of vectors x satisfying this equation form the null space or kernel of the matrix m. Any of these vectors can be expressed as a linear combination of a particular set of basis vectors, which can be obtained using NullSpace[m].
Multiplying the basis vector for the null space by m gives the zero vector.
In[11]:= m.%[[1]]
Out[11]= {0, 0}
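The computation behind NullSpace can be sketched in plain Python with exact rational arithmetic: row-reduce the matrix, then read one basis vector off each non-pivot column. The matrix below is a hypothetical stand-in (the actual m was defined earlier in the session), chosen so that its null space is spanned by a single vector:

```python
# Null space via Gauss-Jordan elimination over exact rationals.
from fractions import Fraction

def null_space(m):
    rows = [[Fraction(x) for x in row] for row in m]
    ncols, pivots, r = len(rows[0]), [], 0
    for j in range(ncols):                      # reduce to RREF
        p = next((i for i in range(r, len(rows)) if rows[i][j] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        rows[r] = [x / rows[r][j] for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][j] != 0:
                rows[i] = [a - rows[i][j] * b for a, b in zip(rows[i], rows[r])]
        pivots.append(j)
        r += 1
    basis = []
    for j in range(ncols):                      # one basis vector per free column
        if j not in pivots:
            v = [Fraction(0)] * ncols
            v[j] = Fraction(1)
            for i, pj in enumerate(pivots):
                v[pj] = -rows[i][j]
            basis.append(v)
    return basis

m = [[2, 1, 3], [1, 3, 4]]
ns = null_space(m)
print(ns)   # [[Fraction(-1, 1), Fraction(-1, 1), Fraction(1, 1)]]
```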
An important feature of functions like LinearSolve and NullSpace is that they work with
rectangular, as well as square, matrices.
When you represent a system of linear equations by a matrix equation of the form m.x == b, the number of columns in m gives the number of variables, and the number of rows gives the number of equations. There are a number of cases.
This asks for the solution to the inconsistent set of equations x == 1 and x == 0.
In[16]:= LinearSolve[{{1}, {1}}, {1, 0}]
LinearSolve gives one of the possible solutions to this underdetermined set of equations.
In[18]:= v = LinearSolve[m, {1, 1}]
Out[18]= {2/5, 1/5, 0}
When a matrix represents an underdetermined system of equations, the matrix has a nontrivial
null space. In this case, the null space is spanned by a single vector.
In[19]:= NullSpace[m]
Out[19]= {{-1, -1, 1}}
If you take the solution you get from LinearSolve, and add any linear combination of the basis vectors for the null space, you still get a solution.
In[20]:= m.(v + 4 %[[1]])
Out[20]= {1, 1}
The number of independent equations is the rank of the matrix MatrixRank[m]. The number of redundant equations is Length[NullSpace[m]]. Note that the sum of these quantities is always equal to the number of columns in m.
LinearSolve[m]   generate a function for solving equations of the form m.x == b
In some applications, you will want to solve equations of the form m.x == b many times with the same m, but different b. You can do this efficiently in Mathematica by using LinearSolve[m] to create a single LinearSolveFunction that you can apply to as many vectors as you want.
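The factor-once, solve-many pattern can be sketched in plain Python: do the expensive LU factorization a single time, and return a closure that performs only the two cheap triangular solves for each new right-hand side. This is an illustrative sketch (no pivoting, exact rationals); LinearSolveFunction is far more sophisticated:

```python
# LU factorization done once; each call to the returned solver reuses it.
from fractions import Fraction

def lu_solver(m):
    n = len(m)
    a = [[Fraction(x) for x in row] for row in m]   # factor in place: L\U
    for k in range(n):                              # no pivoting, for brevity
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]

    def solve(b):
        y = [Fraction(x) for x in b]
        for i in range(n):                          # forward solve L.y = b
            y[i] -= sum(a[i][j] * y[j] for j in range(i))
        for i in reversed(range(n)):                # back solve U.x = y
            y[i] = (y[i] - sum(a[i][j] * y[j] for j in range(i + 1, n))) / a[i][i]
        return y

    return solve

f = lu_solver([[1, 4], [2, 3]])
print(f([5, 7]))    # [Fraction(13, 5), Fraction(3, 5)]
print(f([1, 0]))    # a second right-hand side reuses the same factorization
```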
You get the same result by giving the vector as an explicit second argument to LinearSolve.
In[23]:= LinearSolve[{{1, 4}, {2, 3}}, {5, 7}]
Out[23]= {13/5, 3/5}
CharacteristicPolynomial[m, x]   the characteristic polynomial of m
The eigenvalues of a matrix m are the values λi for which one can find nonzero vectors vi such that m.vi == λi vi. The eigenvectors are the vectors vi.
Finding the eigenvalues of an n×n matrix in general involves solving an nth-degree polynomial equation. For n >= 5, therefore, the results cannot in general be expressed purely in terms of explicit radicals. Root objects can nevertheless always be used, although except for fairly sparse or otherwise simple matrices the expressions obtained are often unmanageably complex.
Even for a matrix as simple as this, the explicit form of the eigenvalues is quite complicated.
In[1]:= Eigenvalues[{{a, b}, {-b, 2 a}}]
Out[1]= {1/2 (3 a - Sqrt[a^2 - 4 b^2]), 1/2 (3 a + Sqrt[a^2 - 4 b^2])}
If you give a matrix of approximate real numbers, Mathematica will find the approximate numerical eigenvalues and eigenvectors.
Eigensystem computes the eigenvalues and eigenvectors at the same time. The assignment
sets vals to the list of eigenvalues, and vecs to the list of eigenvectors.
In[5]:= {vals, vecs} = Eigensystem[m]
Out[5]= {{6.31303, -5.21303}, {{0.746335, 0.66557}, {-0.513839, 0.857886}}}
This verifies that the first eigenvalue and eigenvector satisfy the appropriate condition.
In[6]:= m.vecs[[1]] == vals[[1]] vecs[[1]]
Out[6]= True
This finds the eigenvalues of a random 4×4 matrix. For nonsymmetric matrices, the eigenvalues can have imaginary parts.
In[7]:= Eigenvalues[Table[RandomReal[], {4}, {4}]]
Out[7]= {2.30022, 0.319764 + 0.547199 I, 0.319764 - 0.547199 I, 0.449291}
The function Eigenvalues always gives you a list of n eigenvalues for an n×n matrix. The eigenvalues correspond to the roots of the characteristic polynomial for the matrix, and may not necessarily be distinct. Eigenvectors, on the other hand, gives a list of eigenvectors which are guaranteed to be independent. If the number of such eigenvectors is less than n, then Eigenvectors appends zero vectors to the list it returns, so that the total length of the list is always n.
Here is a 3×3 matrix.
In[8]:= mz = {{0, 1, 0}, {0, 0, 1}, {0, 0, 0}}
Out[8]= {{0, 1, 0}, {0, 0, 1}, {0, 0, 0}}
There is, however, only one independent eigenvector for the matrix. Eigenvectors appends
two zero vectors to give a total of three vectors in this case.
In[10]:= Eigenvectors[mz]
Out[10]= {{1, 0, 0}, {0, 0, 0}, {0, 0, 0}}
Eigenvalues sorts numeric eigenvalues so that the ones with large absolute value come first. In many situations, you may be interested only in the largest or smallest eigenvalues of a matrix. You can get these efficiently using Eigenvalues[m, k] and Eigenvalues[m, -k].
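The simplest method of this kind, power iteration, can be sketched in plain Python. This is an illustration of the idea only, with a hypothetical matrix; Eigenvalues[m, k] uses much more refined algorithms:

```python
# Power iteration: repeatedly apply m and renormalize; the iterate aligns with
# the eigenvector of largest |eigenvalue|, and the Rayleigh quotient estimates it.
from math import sqrt

def largest_eigenvalue(m, steps=200):
    v = [1.0] * len(m)
    for _ in range(steps):
        w = [sum(r[j] * v[j] for j in range(len(v))) for r in m]   # w = m.v
        n = sqrt(sum(x * x for x in w))
        v = [x / n for x in w]
    # Rayleigh quotient v.m.v estimates the dominant eigenvalue
    return sum(v[i] * sum(m[i][j] * v[j] for j in range(len(v)))
               for i in range(len(v)))

print(largest_eigenvalue([[2.0, 1.0], [1.0, 2.0]]))   # close to 3.0
```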
The generalized eigenvalues for a matrix m with respect to a matrix a are defined to be those λi for which m.vi == λi a.vi.
Note that while ordinary matrix eigenvalues always have definite values, some generalized
eigenvalues will always be Indeterminate if the generalized characteristic polynomial vanishes,
which happens if m and a share a null space. Note also that generalized eigenvalues can be
infinite.
These two matrices share a one-dimensional null space, so one generalized eigenvalue is
Indeterminate .
In[15]:= [email protected], 0<, 80, 0<<, 882, 0<, 81, 0<<<D
Out[15]= 80., Indeterminate<
The singular values of a matrix m are the square roots of the eigenvalues of m.m* , where *
denotes Hermitian transpose. The number of such singular values is the smaller dimension of
the matrix. SingularValueList sorts the singular values from largest to smallest. Very small
singular values are usually numerically meaningless. With the option setting Tolerance -> t,
SingularValueList drops singular values that are less than a fraction t of the largest singular
value. For approximate numerical matrices, the tolerance is by default slightly greater than zero.
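That definition can be checked directly in plain Python for a real 2×2 matrix, where the Hermitian transpose is just the transpose and the eigenvalues of the symmetric product come from the quadratic formula (illustrative code with a hypothetical matrix):

```python
# Singular values of a real 2x2 matrix m: square roots of the eigenvalues of
# the symmetric matrix m.Transpose[m].
from math import sqrt

def singular_values_2x2(m):
    (a, b), (c, d) = m
    # m.Transpose[m] is symmetric: [[p, q], [q, r]]
    p, q, r = a*a + b*b, a*c + b*d, c*c + d*d
    mean, disc = (p + r) / 2, sqrt(((p - r) / 2) ** 2 + q * q)
    return [sqrt(mean + disc), sqrt(mean - disc)]   # sorted large to small

sv = singular_values_2x2([[1.0, 1.0], [0.0, 1.0]])
print(sv)   # the golden ratio and its reciprocal, about [1.618, 0.618]
```

Their product equals |Det[m]| = 1 here, which is a handy cross-check.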
If you multiply the vector for each point in a unit sphere in n-dimensional space by an m×n matrix m, then you get an m-dimensional ellipsoid, whose principal axes have lengths given by the singular values of m.
The 2-norm of a matrix Norm[m, 2] is the largest principal axis of the ellipsoid, equal to the largest singular value of the matrix. This is also the maximum 2-norm length of m.v for any possible unit vector v.
The p-norm of a matrix Norm[m, p] is in general the maximum p-norm length of m.v that can be attained. The cases most often considered are p = 1, p = 2 and p = ∞. Also sometimes considered is the Frobenius norm Norm[m, "Frobenius"], which is the square root of the trace of m.m^*.
When you create a LinearSolveFunction using LinearSolve[m], this often works by decomposing the matrix m into triangular forms, and sometimes it is useful to be able to get such forms explicitly.
LU decomposition effectively factors any square matrix into a product of lower- and upper-
triangular matrices. Cholesky decomposition effectively factors any Hermitian positive-definite
matrix into a product of a lower-triangular matrix and its Hermitian conjugate, which can be
viewed as the analog of finding a square root of a matrix.
The standard definition for the inverse of a matrix fails if the matrix is not square or is singular. The pseudoinverse m^(-1) of a matrix m can however still be defined. It is set up to minimize the sum of the squares of all entries in m.m^(-1) - I, where I is the identity matrix. The pseudoinverse is sometimes known as the generalized inverse, or the Moore-Penrose inverse. It is particularly used for problems related to least-squares fitting.
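The least-squares role of the pseudoinverse can be sketched in plain Python for the simplest nonsquare case, a single-column matrix, where it reduces to Transpose[m]/(Transpose[m].m). The example reuses the inconsistent system x == 1, x == 0 from earlier (illustrative code only; PseudoInverse handles the general case via the singular value decomposition):

```python
# For an n x 1 matrix m with full column rank, the pseudoinverse applied to b
# gives the least-squares solution (m^T.b)/(m^T.m).

def least_squares_1col(column, b):
    mtm = sum(x * x for x in column)
    return sum(x * y for x, y in zip(column, b)) / mtm

# m = {{1}, {1}}, b = {1, 0}: no exact solution; least-squares answer is 1/2
print(least_squares_1col([1.0, 1.0], [1.0, 0.0]))   # 0.5
```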
Most square matrices can be reduced to a diagonal matrix of eigenvalues by applying a matrix of their eigenvectors as a similarity transformation. But even when there are not enough eigenvectors to do this, one can still reduce a matrix to a Jordan form in which there are both eigenvalues and Jordan blocks on the diagonal. Jordan decomposition in general writes any square matrix in the form s.j.s^(-1).
Numerically more stable is the Schur decomposition, which writes any square matrix m in the form q.t.q^*, where q is an orthonormal matrix, and t is block upper-triangular. Also related is the Hessenberg decomposition, which writes a square matrix m in the form p.h.p^*, where p is an orthonormal matrix, and h can have nonzero elements down to the diagonal below the leading diagonal.
Tensors
Tensors are mathematical objects that give generalizations of vectors and matrices. In Mathematica, a tensor is represented as a set of lists, nested to a certain number of levels. The nesting level is the rank of the tensor.
rank 0 scalar
rank 1 vector
rank 2 matrix
rank k rank k tensor
The indices that specify a particular element in the tensor correspond to the coordinates in the
cuboid. The dimensions of the tensor correspond to the side lengths of the cuboid.
One simple way that a rank k tensor can arise is in giving a table of values for a function of k
variables. In physics, the tensors that occur typically have indices which run over the possible
directions in space or spacetime. Notice, however, that there is no built-in notion of covariant
and contravariant tensor indices in Mathematica: you have to set these up explicitly using
metric tensors.
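In a nested-list representation like this, rank and dimensions can be read off by walking down first elements, as in this plain-Python sketch (valid for full rectangular arrays only; an illustration, not Mathematica's code):

```python
# The rank of a nested list is the number of levels of nesting; the dimensions
# are the lengths encountered walking down the first element at each level.

def dimensions(t):
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims

def tensor_rank(t):
    return len(dimensions(t))

print(tensor_rank(7))                   # 0: a scalar
print(tensor_rank([1, 2, 3]))           # 1: a vector
print(tensor_rank([[1, 2], [3, 4]]))    # 2: a matrix
print(dimensions([[[0] * 4] * 3] * 2))  # [2, 3, 4]
```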
MatrixForm displays the elements of the tensor in a two-dimensional array. You can think of the array as being a 2×3 matrix of column vectors.
In[3]:= MatrixForm[t]
Out[3]//MatrixForm=
2  3  4
3  5  7

3  4  5
4  6  8
The rank of a tensor is equal to the number of indices needed to specify each element. You can
pick out subtensors by using a smaller number of indices.
Transpose[t, {p1, p2, ...}]   transpose the indices in a tensor so that the kth becomes the pk-th
You can think of a rank k tensor as having k "slots" into which you insert indices. Applying
Transpose is effectively a way of reordering these slots. If you think of the elements of a tensor
as forming a k-dimensional cuboid, you can view Transpose as effectively rotating (and possibly
reflecting) the cuboid.
In the most general case, Transpose allows you to specify an arbitrary reordering to apply to the indices of a tensor. The function Transpose[T, {p1, p2, ..., pk}] gives you a new tensor T' such that the value of T'[[i1, i2, ..., ik]] is given by T[[i_p1, i_p2, ..., i_pk]]. If you originally had an n_p1 × n_p2 × ... × n_pk tensor, then by applying Transpose, you will get an n1 × n2 × ... × nk tensor.
Applying Transpose gives you a 3×2 tensor. Transpose effectively interchanges the two "slots" for tensor indices.
In[8]:= mt = Transpose[m]
Out[8]= {{a, ap}, {b, bp}, {c, cp}}
The element m[[2, 3]] in the original tensor becomes the element m[[3, 2]] in the transposed tensor.
In[9]:= {m[[2, 3]], mt[[3, 2]]}
Out[9]= {cp, cp}
If you have a tensor that contains lists of the same length at different levels, then you can use
Transpose to effectively collapse different levels.
This collapses all three levels, giving a list of the elements on the "main diagonal".
In[13]:= Transpose[Array[a, {3, 3, 3}], {1, 1, 1}]
Out[13]= {a[1, 1, 1], a[2, 2, 2], a[3, 3, 3]}
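The index rule can be turned into a direct plain-Python sketch: the result at position {j1, ..., jk} is the element of t at position {j_p1, ..., j_pk}, and the same rule covers the diagonal-collapsing case where the permutation repeats a level. This is an illustration of the semantics, not Mathematica's implementation:

```python
# transpose(t, perm) for full nested lists, Mathematica-style 1-based perm.

def dimensions(t):
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return dims

def transpose(t, perm):
    d = dimensions(t)
    out_rank = max(perm)
    # result dimension q is constrained by every level of t sent to q
    out_dims = [min(d[k] for k in range(len(perm)) if perm[k] == q + 1)
                for q in range(out_rank)]
    def element(idx):
        e = t
        for p in perm:                 # element at idx comes from t[j_p1, ..., j_pk]
            e = e[idx[p - 1]]
        return e
    def build(idx):
        q = len(idx)
        if q == out_rank:
            return element(idx)
        return [build(idx + (i,)) for i in range(out_dims[q])]
    return build(())

m = [[1, 2, 3], [4, 5, 6]]
print(transpose(m, [2, 1]))        # [[1, 4], [2, 5], [3, 6]]
t = [[[(i, j, k) for k in range(3)] for j in range(3)] for i in range(3)]
print(transpose(t, [1, 1, 1]))     # the diagonal: [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
```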
Outer products, and their generalizations, are a way of building higher-rank tensors from lower-rank ones. Outer products are also sometimes known as direct, tensor or Kronecker products.
From a structural point of view, the tensor you get from Outer[f, t, u] has a copy of the structure of u inserted at the "position" of each element in t. The elements in the resulting structure are obtained by combining elements of t and u using the function f.
This gives the "outer f" of two vectors. The result is a matrix.
In[18]:= Outer[f, {a, b}, {ap, bp}]
Out[18]= {{f[a, ap], f[a, bp]}, {f[b, ap], f[b, bp]}}
If you take the "outer f" of a length 3 vector with a length 2 vector, you get a 3×2 matrix.
In[19]:= Outer[f, {a, b, c}, {ap, bp}]
Out[19]= {{f[a, ap], f[a, bp]}, {f[b, ap], f[b, bp]}, {f[c, ap], f[c, bp]}}
The result of taking the "outer f" of a 2×2 matrix and a length 3 vector is a 2×2×3 tensor.
In[20]:= Outer[f, {{m11, m12}, {m21, m22}}, {a, b, c}]
Out[20]= {{{f[m11, a], f[m11, b], f[m11, c]}, {f[m12, a], f[m12, b], f[m12, c]}},
  {{f[m21, a], f[m21, b], f[m21, c]}, {f[m22, a], f[m22, b], f[m22, c]}}}
In terms of indices, the result of applying Outer to two tensors T_(i1 i2 ... ir) and U_(j1 j2 ... js) is the tensor V_(i1 i2 ... ir j1 j2 ... js) with elements f[T_(i1 i2 ... ir), U_(j1 j2 ... js)].
In doing standard tensor calculations, the most common function f to use in Outer is Times,
corresponding to the standard outer product.
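Structurally, the copy-of-u-at-each-element behavior is a double recursion, as this plain-Python sketch shows (an illustration only; Outer itself is an efficient built-in):

```python
# Descend through the structure of t; at each scalar of t, descend through u,
# combining the scalars with f.

def outer(f, t, u):
    if isinstance(t, list):
        return [outer(f, x, u) for x in t]
    if isinstance(u, list):
        return [outer(f, t, y) for y in u]
    return f(t, u)

# with f = Times, the outer product of two vectors is a matrix
print(outer(lambda a, b: a * b, [1, 2], [10, 20]))   # [[10, 20], [20, 40]]
# a matrix combined with a vector gives a rank-3 tensor
print(outer(lambda a, b: (a, b), [[1, 2]], [3]))     # [[[(1, 3)], [(2, 3)]]]
```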
The simplest examples are with vectors. If you apply Inner to two vectors of equal length, you get a scalar. Inner[f, v1, v2, g] gives a generalization of the usual scalar product, with f playing the role of multiplication, and g playing the role of addition.
You can think of Inner as performing a "contraction" of the last index of one tensor with the
first index of another. If you want to perform contractions across other pairs of indices, you can
do so by first transposing the appropriate indices into the first or last position, then applying
Inner, and then transposing the result back.
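For vectors and matrices, the contraction can be sketched in plain Python with f and g passed in explicitly (a rough illustration; the built-in generalizes to tensors of any rank):

```python
# inner(f, t, u, g): contract the last index of t with the first index of u,
# with f in the role of Times and g in the role of Plus.
import operator
from functools import reduce

def inner(f, t, u, g):
    if not isinstance(t[0], list) and not isinstance(u[0], list):
        return reduce(g, (f(a, b) for a, b in zip(t, u)))   # vector . vector
    if isinstance(t[0], list):
        return [inner(f, row, u, g) for row in t]           # recurse over rows of t
    # t is a vector, u a matrix: contract t with each column of u
    return [inner(f, t, [row[j] for row in u], g) for j in range(len(u[0]))]

print(inner(operator.mul, [1, 2, 3], [4, 5, 6], operator.add))        # 32
print(inner(operator.mul, [[1, 2]], [[0, 1], [1, 0]], operator.add))  # [[2, 1]]
```

With f = Times and g = Plus this reproduces the ordinary dot product and matrix multiplication; substituting other functions gives the generalized inner products described above.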
In many applications of tensors, you need to insert signs to implement antisymmetry. The function Signature[{i1, i2, ...}], which gives the signature of a permutation, is often useful for this purpose.
Outer[f, t1, t2, ...]   form a generalized outer product by combining the lowest-level elements of t1, t2, ...
Outer[f, t1, t2, ..., n]   treat only sublists at level n as separate elements
Outer[f, t1, t2, ..., n1, n2, ...]   treat only sublists at level ni in ti as separate elements
Inner[f, t1, t2, g]   form a generalized inner product using the lowest-level elements of t1
Inner[f, t1, t2, g, n]   contract index n of the first tensor with the first index of the second tensor
Here is a block matrix (a matrix of matrices that can be viewed as blocks that fit edge to edge within a larger matrix).
In[30]:= TableForm[{{{{1, 2}, {4, 5}}, {{3}, {6}}}, {{{7, 8}}, {{9}}}}]
Out[30]//TableForm= 1 2 3
                    4 5 6
                    7 8 9
As discussed in "Sparse Arrays: Manipulating Lists", you can use patterns to specify collections of elements in sparse arrays. You can also have sparse arrays that correspond to tensors of any rank.
This makes a 50×50 sparse numerical matrix, with 148 nonzero elements.
In[1]:= m = SparseArray[{{30, _} -> 11.5, {_, 30} -> 21.5, {i_, i_} -> i}, {50, 50}]
Out[1]= SparseArray[<148>, {50, 50}]
You can apply most standard structural operations directly to SparseArray objects, just as you
would to ordinary lists. When the results are sparse, they typically return SparseArray objects.
This gives the rules for the nonzero elements on the second row.
In[9]:= ArrayRules[m[[2]]]
Out[9]= {{1} -> 3, {2} -> 2, {30} -> 21.5, {_} -> 0}
For machine-precision numerical sparse matrices, Mathematica supports standard file formats such as Matrix Market (.mtx) and Harwell-Boeing. You can import and export matrices in these formats using Import and Export.
This constructs the sum ∑_(i=1)^7 x^i/i.
This makes i increase in steps of 2, so that only odd-numbered values are included.
In[3]:= Sum[x^i/i, {i, 1, 5, 2}]
Out[3]= x + x^3/3 + x^5/5
Sum[f, {i, imin, imax}]   the sum ∑_(i=imin)^imax f
Product[f, {i, imin, imax}]   the product ∏_(i=imin)^imax f
Mathematica can also give an exact result for this infinite sum.
In[6]:= Sum[1/i^4, {i, 1, Infinity}]
Out[6]= π^4/90
Out[7]= (-2 x^(1/4) + EllipticTheta[2, 0, x])/(2 x^(1/4))
As in standard mathematical notation, the range of the outermost variable is given first.
This is the multiple sum ∑_(i=1)^3 ∑_(j=1)^i x^i y^j. Notice that the outermost sum over i is given first, just as in the mathematical notation.
In[10]:= Sum[x^i y^j, {i, 1, 3}, {j, 1, i}]
Out[10]= x y + x^2 y + x^3 y + x^2 y^2 + x^3 y^2 + x^3 y^3
The way the ranges of variables are specified in Sum and Product is an example of the rather
general iterator notation that Mathematica uses. You will see this notation again when we
discuss generating tables and lists using Table ("Making Tables of Values"), and when we
describe Do loops ("Repetitive Operations").
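The same iterator notation works unchanged across those functions; a quick sketch:

```mathematica
(* {i, imin, imax, step} iterators work in Sum, Product, Table and Do alike *)
Sum[i^2, {i, 1, 10}]        (* 385 *)
Table[i^2, {i, 1, 10, 3}]   (* {1, 16, 49, 100}: i steps by 3 *)
Product[i, {i, 1, 5}]       (* 120 *)
```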
Power Series
The mathematical operations we have discussed so far are exact. Given precise input, their
results are exact formulas.
In many situations, however, you do not need an exact result. It may be quite sufficient, for
example, to find an approximate formula that is valid, say, when the quantity x is small.
This gives a power series approximation to (1 + x)^n for x close to 0, up to terms of order x^3.
In[1]:= Series[(1 + x)^n, {x, 0, 3}]
Out[1]= 1 + n x + 1/2 (-1 + n) n x^2 + 1/6 (-2 + n) (-1 + n) n x^3 + O[x]^4
Mathematica knows the power series expansions for many mathematical functions.
In[2]:= Series[Exp[-a t] (1 + Sin[2 t]), {t, 0, 4}]
Out[2]= 1 + (2 - a) t + (-2 a + a^2/2) t^2 + (-(4/3) + a^2 - a^3/6) t^3 + 1/24 (32 a - 8 a^3 + a^4) t^4 + O[t]^5
If you give it a function that it does not know, Series writes out the power series in terms of derivatives.
In[3]:= Series[1 + f[t], {t, 0, 3}]
Out[3]= (1 + f[0]) + f'[0] t + 1/2 f''[0] t^2 + 1/6 f^(3)[0] t^3 + O[t]^4
Power series are approximate formulas that play much the same role with respect to algebraic
expressions as approximate numbers play with respect to numerical expressions. Mathematica
allows you to perform operations on power series, in all cases maintaining the appropriate order
or "degree of precision" for the resulting power series.
When you do operations on a power series, the result is computed only to the appropriate order in x.
In[5]:= %^2 (1 + %)
Out[5]= 2 + 5 x + 13 x^2/2 + 35 x^3/6 + 97 x^4/24 + 55 x^5/24 + O[x]^6
Series[expr, {x, x0, n}]   find the power series expansion of expr about the point x = x0 to order at most (x - x0)^n
Normal[series]   truncate a power series to give an ordinary expression
Series[expr, {x, x0, nx}, {y, y0, ny}]   find series expansions with respect to y, then x
Here is the power series expansion for exp(x) about the point x = 0 to order x^4.
In[1]:= Series[Exp[x], {x, 0, 4}]
Out[1]= 1 + x + x^2/2 + x^3/6 + x^4/24 + O[x]^5
If Mathematica does not know the series expansion of a particular function, it writes the result symbolically in terms of derivatives.
In[3]:= Series[f[x], {x, 0, 3}]
Out[3]= f[0] + f'[0] x + 1/2 f''[0] x^2 + 1/6 f^(3)[0] x^3 + O[x]^4
In mathematical terms, Series can be viewed as a way of constructing Taylor series for functions.
The standard formula for the Taylor series expansion about the point x = x0 of a function g(x) with kth derivative g^(k)(x) is g(x) = ∑_(k=0)^∞ g^(k)(x0) (x - x0)^k / k!. Whenever this formula applies, it gives the same results as Series. (For common functions, Series nevertheless internally uses somewhat more efficient algorithms.)
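As a quick check of this formula against Series (a sketch, using cos(x), whose derivatives at 0 are easy):

```mathematica
(* build the Taylor polynomial for Cos[x] about x0 = 0 from the explicit formula *)
taylor = Sum[Derivative[k][Cos][0] x^k/k!, {k, 0, 4}];
Expand[taylor]                      (* 1 - x^2/2 + x^4/24 *)
Normal[Series[Cos[x], {x, 0, 4}]]   (* the same polynomial *)
```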
Series can also generate some power series that involve fractional and negative powers, not directly covered by the standard Taylor series formula.
Here is a power series with fractional powers, for exp(√x) about x = 0.
In[5]:= Series[Exp[Sqrt[x]], {x, 0, 2}]
Out[5]= 1 + Sqrt[x] + x/2 + x^(3/2)/6 + x^2/24 + O[x]^(5/2)
There are, of course, mathematical functions for which no standard power series exist. Mathematica recognizes many such cases.
Series sees that exp(1/x) has an essential singularity at x = 0, and does not produce a power series.
In[7]:= Series[Exp[1/x], {x, 0, 2}]
Out[7]= E^(1/x)
Series can nevertheless give you the power series for exp(1/x) about the point x = ∞.
In[8]:= Series[Exp[1/x], {x, Infinity, 3}]
Out[8]= 1 + 1/x + 1/2 (1/x)^2 + 1/6 (1/x)^3 + O[1/x]^4
Especially when negative powers occur, there is some subtlety in exactly how many terms of a
particular power series the function Series will generate.
One way to understand what happens is to think of the analogy between power series taken to
a certain order, and real numbers taken to a certain precision. Power series are "approximate
formulas" in much the same sense as finite-precision real numbers are approximate numbers.
The procedure that Series follows in constructing a power series is largely analogous to the
procedure that N follows in constructing a real-number approximation. Both functions effectively
start by replacing the smallest pieces of your expression by finite-order, or finite-precision,
approximations, and then evaluating the resulting expression. If there are, for example, cancellations, this procedure may give a final result whose order or precision is less than the order or
precision that you originally asked for. Like N, however, Series has some ability to retry its
computations so as to get results to the order you ask for. In cases where it does not succeed,
you can usually still get results to a particular order by asking for a higher order than you need.
Series compensates for cancellations in this computation, and succeeds in giving you a result to order x^3.
In[9]:= Series[Sin[x]/x^2, {x, 0, 3}]
Out[9]= 1/x - x/6 + x^3/120 + O[x]^4
When you make a power series expansion in a variable x, Mathematica assumes that all objects
that do not explicitly contain x are in fact independent of x. Series thus does partial derivatives
(effectively using D) to build up Taylor series.
You can use Series to generate power series in a sequence of different variables. Series works
like Integrate, Sum and so on, and expands first with respect to the last variable you specify.
Series performs a series expansion successively with respect to each variable. The result in this case is a series in x, whose coefficients are series in y.
In[12]:= Series[Exp[x y], {x, 0, 3}, {y, 0, 3}]
Out[12]= 1 + (y + O[y]^4) x + (y^2/2 + O[y]^4) x^2 + (y^3/6 + O[y]^4) x^3 + O[x]^4
The power series is printed out as a sum of terms, ending with O[x] raised to a power.
In[1]:= Series[Cos[x], {x, 0, 4}]
Out[1]= 1 - x^2/2 + x^4/24 + O[x]^5
By using SeriesData objects, rather than ordinary expressions, to represent power series,
Mathematica can keep track of the order and expansion point, and do operations on the power
series appropriately. You should not normally need to know the internal structure of
SeriesData objects.
You can recognize a power series that is printed out in standard output form by the presence of an O[x] term. This term mimics the standard mathematical notation O(x), and represents omitted terms of order x. For various reasons of consistency, Mathematica uses the notation O[x]^n for omitted terms of order x^n, corresponding to the mathematical notation O(x)^n, rather than the slightly more familiar, though equivalent, form O(x^n).
Any time that an object like O[x] appears in a sum of terms, Mathematica will in fact convert the whole sum into a power series.
The presence of O[x] makes Mathematica convert the whole sum to a power series.
In[3]:= a x + Exp[x] + O[x]^3
Out[3]= 1 + (1 + a) x + x^2/2 + O[x]^3
The logarithmic factors appear explicitly inside the SeriesData coefficient list.
In[7]:= % // InputForm
Out[7]//InputForm= SeriesData[x, 0, {1, Log[x], Log[x]^2/2, Log[x]^3/6, Log[x]^4/24}, 0, 5, 1]
When you square the power series, you get another power series, also accurate to fourth order.
In[2]:= %^2
Out[2]= 1 + 2 x + 2 x^2 + 4 x^3/3 + 2 x^4/3 + O[x]^5
Taking the logarithm gives you the result 2 x, but only to order x^4.
In[3]:= Log[%]
Out[3]= 2 x + O[x]^5
Mathematica keeps track of the orders of power series in much the same way as it keeps track
of the precision of approximate real numbers. Just as with numerical calculations, there are
operations on power series which can increase, or decrease, the precision (or order) of your
results.
When you perform an operation that involves both a normal expression and a power series,
Mathematica "absorbs" the normal expression into the power series whenever possible.
If you add Sin[x], Mathematica generates the appropriate power series for Sin[x], and combines it with the power series you have.
In[11]:= % + Sin[x]
Out[11]= 2 + 2 x + 3 x^2/2 + x^4/24 + O[x]^5
Mathematica also absorbs expressions that multiply power series. The symbol a is assumed to be independent of x.
In[12]:= (a + x) %^2
Out[12]= 4 a + (4 + 8 a) x + (8 + 10 a) x^2 + (10 + 6 a) x^3 + (6 + 29 a/12) x^4 + O[x]^5
Mathematica knows how to apply a wide variety of functions to power series. However, if you apply an arbitrary function to a power series, it is impossible for Mathematica to give you anything but a symbolic result.
Mathematica does not know how to apply the function f to a power series, so it just leaves the symbolic result.
In[13]:= f[Series[Exp[x], {x, 0, 3}]]
Out[13]= f[1 + x + x^2/2 + x^3/6 + O[x]^4]
This replaces the variable x in the power series for exp(x) by a power series for sin(x).
In[2]:= ComposeSeries[%, Series[Sin[x], {x, 0, 5}]]
Out[2]= 1 + x + x^2/2 - x^4/8 - x^5/15 + O[x]^6
If you have a power series for a function f(y), then it is often possible to get a power series approximation to the solution for y in the equation f(y) = x. This power series effectively gives the inverse function f^-1(x) such that f(f^-1(x)) = x. The operation of finding the power series for an inverse function is carried out by InverseSeries.
Composing the series with its inverse gives the identity function.
In[7]:= ComposeSeries[%, %%%]
Out[7]= y + O[y]^6
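A small sketch of this inversion:

```mathematica
(* invert the series for Sin[y]; the result is the series for ArcSin[x] *)
s = Series[Sin[y], {y, 0, 5}];
inv = InverseSeries[s, x]   (* x + x^3/6 + 3 x^5/40 + O[x]^6 *)
ComposeSeries[s, inv]       (* x + O[x]^6: composition gives the identity *)
```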
Power series in Mathematica are represented in a special internal form, which keeps track of
such attributes as their expansion order.
For some purposes, you may want to convert power series to normal expressions. From a
mathematical point of view, this corresponds to truncating the power series, and assuming that
all higher-order terms are zero.
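For example:

```mathematica
s = Series[Exp[x], {x, 0, 3}];
Normal[s]         (* 1 + x + x^2/2 + x^3/6, an ordinary polynomial *)
Head[s]           (* SeriesData *)
Head[Normal[s]]   (* Plus *)
```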
Squaring the power series gives you another power series, with the appropriate number of terms.
In[2]:= t^2
Out[2]= x^2 - 2 x^4/3 + 23 x^6/45 - 44 x^8/105 + O[x]^10
SeriesCoefficient[series, n]   give the coefficient of the nth-order term in a power series
This gives the coefficient for the term x^n in the Taylor expansion of the function e^(x^2) about zero.
In[6]:= SeriesCoefficient[E^x^2, {x, 0, n}]
Out[6]= KroneckerDelta[Mod[n, 2]]/(n/2)!
This solves the equations for the coefficients a[i]. You can also feed equations involving power series directly to Solve.
In[4]:= Solve[%]
Out[4]= {{a[3] -> -(1/12), a[1] -> 1, a[2] -> 1/2}, {a[3] -> 0, a[1] -> -1, a[2] -> 0}}
Some equations involving power series can also be solved using the InverseSeries function
discussed in "Composition and Inversion of Power Series".
Summation of Series
Sum[expr, {n, nmin, nmax}]   find the sum of expr as n goes from nmin to nmax
Evaluating sums.
This sum comes out in terms of a Bessel function.
In[2]:= Sum[x^n/(n!)^2, {n, 0, Infinity}]
Out[2]= BesselI[0, 2 Sqrt[x]]
Here is another sum that can be done in terms of common special functions.
In[3]:= Sum[n! x^n/(2 n)!, {n, 1, Infinity}]
Out[3]= 1/2 E^(x/4) Sqrt[π] Sqrt[x] Erf[Sqrt[x]/2]
There are many analogies between sums and integrals. And just as it is possible to have indefi-
nite integrals, so indefinite sums can be set up by using symbolic variables as upper limits.
Taking the difference between results for successive values of n gives back the original summand.
In[8]:= FullSimplify[% - (% /. n -> n - 1)]
Out[8]= 1/(1 + n)^4
Mathematica can do essentially all sums that are found in books of tables. Just as with indefinite integrals, indefinite sums of expressions involving simple functions tend to give answers that involve more complicated functions. Definite sums, like definite integrals, often, however, come out in terms of simpler functions.
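A sketch of such an indefinite sum, using a symbolic upper limit m:

```mathematica
s = Sum[1/(1 + n)^4, {n, 1, m}]
(* the answer involves a generalized harmonic number; differencing
   the result recovers the original summand *)
FullSimplify[s - (s /. m -> m - 1)]   (* 1/(1 + m)^4 *)
```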
RSolve takes recurrence equations and solves them to get explicit formulas for a[n].
This takes the solution and makes an explicit table of the first ten a[n].
In[2]:= Table[a[n] /. First[%], {n, 10}]
Out[2]= {1, 2, 4, 8, 16, 32, 64, 128, 256, 512}
RSolve can be thought of as a discrete analog of DSolve. Many of the same functions generated in solving differential equations also appear in finding symbolic solutions to recurrence equations.
RSolve does not require you to specify explicit values for terms such as a[1]. Like DSolve, it automatically introduces undetermined constants C[i] to give a general solution.
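A minimal sketch:

```mathematica
(* without initial conditions, RSolve introduces the constant C[1] *)
RSolve[a[n + 1] == 2 a[n], a[n], n]
(* {{a[n] -> 2^n C[1]}} *)
(* supplying a[1] == 1 pins the constant down *)
RSolve[{a[n + 1] == 2 a[n], a[1] == 1}, a[n], n]
(* {{a[n] -> 2^(-1 + n)}} *)
```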
RSolve can solve equations that do not depend only linearly on a[n]. For nonlinear equations, however, there are sometimes several distinct solutions that must be given. Just as for differential equations, it is a difficult matter to find symbolic solutions to recurrence equations, and standard mathematical functions only cover a limited set of cases.
RSolve can solve not only ordinary difference equations in which the arguments of a differ by integers, but also q-difference equations in which the arguments of a are related by multiplicative factors.
Just as one can set up partial differential equations that involve functions of several variables,
so one can also set up partial recurrence equations that involve multidimensional sequences.
Just as in the differential equations case, general solutions to partial recurrence equations can
involve undetermined functions.
Finding Limits
In doing many kinds of calculations, you need to evaluate expressions when variables take on particular values. In many cases, you can do this simply by applying transformation rules for the variables using the /. operator.
You can get the value of cos(x^2) at 0 just by explicitly replacing x with 0, and then evaluating the result.
In[1]:= Cos[x^2] /. x -> 0
Out[1]= 1
Consider, for example, finding the value of the expression sin(x)/x when x = 0. If you simply replace x by 0 in this expression, you get the indeterminate result 0/0. To find the correct value of sin(x)/x in the limit x -> 0, you need to use Limit.
Limit[expr, x -> x0]   find the limit of expr when x approaches x0
Finding limits.
This gives the correct value for the limit of sin(x)/x as x -> 0.
In[2]:= Limit[Sin[x]/x, x -> 0]
Out[2]= 1
Limit can find this limit, even though you cannot get an ordinary power series for x log(x) at x = 0.
In[4]:= Limit[x Log[x], x -> 0]
Out[4]= 0
Not all functions have definite limits at particular points. For example, the function sin(1/x) oscillates infinitely often near x = 0, so it has no definite limit there. Nevertheless, at least so long as x remains real, the values of the function near x = 0 always lie between -1 and 1. Limit represents values with bounded variation using Interval objects. In general, Interval[{xmin, xmax}] represents an uncertain value which lies somewhere in the interval xmin to xmax.
Limit returns an Interval object, representing the range of possible values of sin(1/x) near its essential singularity at x = 0.
In[8]:= Limit[Sin[1/x], x -> 0]
Out[8]= Interval[{-1, 1}]
Some functions may have different limits at particular points, depending on the direction from
which you approach those points. You can use the Direction option for Limit to specify the
direction you want.
Directional limits.
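A sketch of the two one-sided limits of 1/x:

```mathematica
(* Direction -> 1 approaches x0 from below, Direction -> -1 from above *)
Limit[1/x, x -> 0, Direction -> 1]    (* -Infinity *)
Limit[1/x, x -> 0, Direction -> -1]   (* Infinity *)
```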
The function 1/x has a different limiting value at x = 0, depending on whether you approach from above or below.
In[11]:= Plot[1/x, {x, -1, 1}]
(plot of 1/x, diverging to -∞ just below x = 0 and to +∞ just above)
Limit makes no assumptions about functions like f[x] about which it does not have definite knowledge. As a result, Limit remains unevaluated in most cases involving symbolic functions.
Residues
Limit[expr, x -> x0] tells you what the value of expr is when x tends to x0. When this value is infinite, it is often useful instead to know the residue of expr when x equals x0. The residue is given by the coefficient of (x - x0)^-1 in the power series expansion of expr about the point x0.
Computing residues.
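For example:

```mathematica
(* the residue is the coefficient of (x - x0)^-1 in the Laurent expansion *)
Residue[1/(x - 1), {x, 1}]    (* 1 *)
Residue[Exp[x]/x^2, {x, 0}]   (* 1, since Exp[x]/x^2 = 1/x^2 + 1/x + 1/2 + ... *)
```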
Padé Approximation
The Padé approximation is a rational function that can be thought of as a generalization of a Taylor polynomial. A rational function is the ratio of polynomials. Because these functions only use the elementary arithmetic operations, they are very easy to evaluate numerically. The polynomial in the denominator allows you to approximate functions that have rational singularities.
Padé approximations.
More precisely, a Padé approximation of order (n, m) to an analytic function f(x) at a regular point or pole x0 is the rational function p(x)/q(x), where p(x) is a polynomial of degree n, q(x) is a polynomial of degree m, and the formal power series of f(x) q(x) - p(x) about the point x0 begins with the term (x - x0)^(n+m+1). If m is equal to n, the approximation is called a diagonal Padé approximation of order n.
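A sketch of the defining property, using the diagonal order-2 approximation to exp(x):

```mathematica
pa = PadeApproximant[Exp[x], {x, 0, 2}]
(* (1 + x/2 + x^2/12)/(1 - x/2 + x^2/12) *)
(* the series of f(x) q(x) - p(x) should start at order n + m + 1 = 5 *)
Series[Exp[x] Denominator[pa] - Numerator[pa], {x, 0, 5}]
```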
The initial terms of this series vanish. This is the property that characterizes the Padé approximation.
In[3]:= Series[Sqrt[x] Denominator[pd] - Numerator[pd], {x, 1, 8}]
Out[3]= (x - 1)^7/75600 + (x - 1)^8/120960 + O[x - 1]^9
This plots the difference between the approximation and the true function. Notice that the approximation is very good near the center of expansion, but the error increases rapidly as you move away.
In[4]:= Plot[pd - Sqrt[x], {x, 0, 2}]
(plot of the error, which stays within roughly ±1.×10^-5 across the range)
This gives the diagonal Padé approximation of order 1 to a generalized rational function at x = 0.
In[5]:= PadeApproximant[Sqrt[x]/(1 + Sqrt[x])^3, {x, 0, 1}]
Out[5]= (Sqrt[x] - x/3)/(1 + 8 Sqrt[x]/3 + 2 x)
This gives the diagonal Padé approximation of order 5 to the logarithm of a rational function at the branch point x = 0.
In[6]:= PadeApproximant[Log[x/(1 + x)], {x, 0, 5}]
Out[6]= (-x - 2 x^2 - 47 x^3/36 - 11 x^4/36 - 137 x^5/7560)/(1 + 5 x/2 + 20 x^2/9 + 5 x^3/6 + 5 x^4/42 + x^5/252) + Log[x]
The series expansion of the function agrees with the diagonal Padé approximation up to order 10.
In[7]:= Series[% - Log[x/(1 + x)], {x, 0, 11}]
Out[7]= x^11/698544 + O[x]^12
Calculus
Differentiation
D[f, x]   partial derivative ∂f/∂x
D[f, x, y, …]   multiple derivative ∂/∂x ∂/∂y … f
D[f, {x, n}]   nth derivative ∂^n f/∂x^n
This gives ∂/∂x x^n.
In[1]:= D[x^n, x]
Out[1]= n x^(-1+n)
You can differentiate with respect to any expression that does not involve explicit mathematical operations.
In[3]:= D[x[1]^2 + x[2]^2, x[1]]
Out[3]= 2 x[1]
If y does in fact depend on x, you can use the explicit functional form y[x]. "The Representation of Derivatives" describes how objects like y'[x] work.
In[5]:= D[x^2 + y[x]^2, x]
Out[5]= 2 x + 2 y[x] y'[x]
Instead of giving an explicit function y[x], you can tell D that y implicitly depends on x. D[y, x, NonConstants -> {y}] then represents ∂y/∂x, with y implicitly depending on x.
Vector derivatives.
Total Derivatives
Dt[f, x, y, …]   multiple total derivative d/dx d/dy … f
When you find the derivative of some expression f with respect to x, you are effectively finding
out how fast f changes as you vary x. Often f will depend not only on x, but also on other
variables, say y and z. The results that you get then depend on how you assume that y and z
vary as you change x.
There are two common cases. Either y and z are assumed to stay fixed when x changes, or they are allowed to vary with x. In a standard partial derivative ∂f/∂x, all variables other than x are assumed fixed. On the other hand, in the total derivative df/dx, all variables are allowed to change with x.
In Mathematica, D[f, x] gives a partial derivative, with all other variables assumed independent of x. Dt[f, x] gives a total derivative, in which all variables are assumed to depend on x. In both cases, you can add an argument to give more information on dependencies.
This gives the partial derivative ∂/∂x (x^2 + y^2). y is assumed to be independent of x.
In[1]:= D[x^2 + y^2, x]
Out[1]= 2 x
This gives the total derivative d/dx (x^2 + y^2). Now y is assumed to depend on x.
In[2]:= Dt[x^2 + y^2, x]
Out[2]= 2 x + 2 y Dt[y, x]
You can make a replacement for dy/dx.
You can also make an explicit definition for dy/dx. You need to use y /: to make sure that the definition is associated with y.
In[4]:= y /: Dt[y, x] = 0
Out[4]= 0
Mathematica applies the chain rule for differentiation, and leaves the result in terms of f'.
In[3]:= D[x f[x^2], x]
Out[3]= f[x^2] + 2 x^2 f'[x^2]
When a function has more than one argument, superscripts are used to indicate how many times each argument is being differentiated.
In[5]:= D[g[x^2, y^2], x]
Out[5]= 2 x g^(1,0)[x^2, y^2]
This represents ∂^3/∂x ∂x ∂y g(x, y). Mathematica assumes that the order in which derivatives are taken with respect to different variables is irrelevant.
In[6]:= D[g[x, y], x, x, y]
You can find the value of the derivative when x = 0 by replacing x with 0.
In[7]:= % /. x -> 0
The standard mathematical notation f'(0) is really a shorthand for d/dt f(t) evaluated at t = 0, where t is a "dummy variable". Similarly, f'(x^2) is a shorthand for d/dt f(t) evaluated at t = x^2. As suggested by the notation f', the object d/dt f(t) can in fact be viewed as a "pure function", to be evaluated with a particular choice of its parameter t. You can think of the operation of differentiation as acting on a function f, to give a new function, usually called f'.
With functions of more than one argument, the simple notation based on primes breaks down. You cannot tell for example whether g'(0, 1) stands for d/dt g(t, 1) at t = 0 or d/dt g(0, t) at t = 1, and for almost any g, these will have totally different values. Once again, however, t is just a dummy variable, whose sole purpose is to show with respect to which "slot" g is to be differentiated.
The object f ' in Mathematica is the result of applying the differentiation operator to the func-
tion f. The full form of f ' is in fact Derivative@1D@fD. Derivative@1D is the Mathematica
differentiation operator.
The arguments in the operator Derivative[n1, n2, …] specify how many times to differentiate with respect to each "slot" of the function on which it acts. By using operators to represent differentiation, Mathematica avoids any need to introduce explicit "dummy variables".
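For example:

```mathematica
Derivative[1][Sin]              (* Cos[#1] &, a pure function *)
Derivative[1][Sin][Pi]          (* -1 *)
Derivative[0, 1][Power][x, y]   (* x^y Log[x]: differentiate x^y in its second slot *)
```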
This gives a derivative of the function g with respect to its second "slot".
In[4]:= D[g[x, y], y]
Here is the second derivative with respect to the variable y, which appears in the second slot of g.
In[6]:= D[g[x, y], {y, 2}] // FullForm
Out[6]//FullForm= Derivative[0, 2][g][x, y]
Since Derivative only specifies how many times to differentiate with respect to each slot, the
order of the derivatives is irrelevant.
In[8]:= D[g[x, y], y, y, x] // FullForm
Out[8]//FullForm= Derivative[1, 2][g][x, y]
Here is a more complicated case, in which both arguments of g depend on the differentiation variable.
In[9]:= D[g[x, x], x]
The object f' behaves essentially like any other function in Mathematica. You can evaluate the function with any argument, and you can use standard Mathematica /. operations to change the argument. (This would not be possible if explicit dummy variables had been introduced in the course of the differentiation.)
This is the Mathematica representation of the derivative of a function f, evaluated at the origin.
In[11]:= f'[0] // FullForm
Out[11]//FullForm= Derivative[1][f][0]
The result of this derivative involves f' evaluated with the argument x^2.
In[12]:= D[f[x^2], x]
Out[12]= 2 x f'[x^2]
You can evaluate the result at the point x = 2 by using the standard Mathematica replacement operation.
In[13]:= % /. x -> 2
Out[13]= 4 f'[4]
There is some slight subtlety when you need to deduce the value of f' based on definitions for objects like f[x_].
When you take the derivative of h[x], Mathematica first evaluates h[x], then differentiates the result.
In[15]:= D[h[x], x]
Out[15]= 4 x^3
You can get the same result by applying the function h' to the argument x.
In[16]:= h'[x]
Out[16]= 4 x^3
The object h' on its own evaluates to an explicit pure function.
In[17]:= h'
Out[17]= 4 #1^3 &
The function f' is completely determined by the form of the function f. Definitions for objects like f[x_] do not immediately apply however to expressions like f'[x]. The problem is that f'[x] has the full form Derivative[1][f][x], which nowhere contains anything that explicitly matches the pattern f[x_]. In addition, for many purposes it is convenient to have a representation of the function f' itself, without necessarily applying it to any arguments.
What Mathematica does is to try and find the explicit form of a pure function which represents the object f'. When Mathematica gets an expression like Derivative[1][f], it effectively converts it to the explicit form D[f[#], #] & and then tries to evaluate the derivative. In the explicit form, Mathematica can immediately use values that have been defined for objects like f[x_]. If Mathematica succeeds in doing the derivative, it returns the explicit pure-function result. If it does not succeed, it leaves the derivative in the original f' form.
Here is the result of applying the pure function to the specific argument y.
In[19]:= %[y]
Out[19]= Sec[y]^2
Defining Derivatives
You can define the derivative in Mathematica of a function f of one argument simply by an assignment like f'[x_] = fp[x].
This defines the derivative of f(x) to be fp(x). In this case, you could have used = instead of :=.
In[1]:= f'[x_] := fp[x]
The rule for f' is used to evaluate this derivative.
In[2]:= D[f[x^2], x]
Out[2]= 2 x fp[x^2]
To define derivatives of functions with several arguments, you have to use the general representation of derivatives in Mathematica.
Defining derivatives.
This defines the second derivative of g with respect to its second argument.
In[8]:= Derivative[0, 2][g][x_, y_] := g2p[x, y]
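With a definition like that in place, D picks it up directly (a sketch; g2p is a hypothetical name):

```mathematica
Derivative[0, 2][g][x_, y_] := g2p[x, y]
D[g[x, y], {y, 2}]   (* g2p[x, y] *)
```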
Integration
Mathematica knows how to do almost any integral that can be done in terms of standard mathematical functions. But you should realize that even though an integrand may contain only fairly simple functions, its integral may involve much more complicated functions, or may not be expressible at all in terms of standard mathematical functions.
This integral involves a Fresnel function.
In[6]:= Integrate[Sin[x^2], x]
Out[6]= Sqrt[π/2] FresnelS[Sqrt[2/π] x]
This integral simply cannot be done in terms of standard mathematical functions. As a result, Mathematica just leaves it undone.
In[8]:= Integrate[x^x, x]
Out[8]= ∫ x^x dx
Integration.
Here is the definite integral ∫_a^b sin^2(x) dx.
Out[10]= π/2
This evaluates the multiple integral ∫_0^1 dx ∫_0^x dy (x^2 + y^2). The range of the outermost integration variable appears first.
In[13]:= Integrate[x^2 + y^2, {x, 0, 1}, {y, 0, x}]
Out[13]= 1/3
Indefinite Integrals
The Mathematica function Integrate[f, x] gives you the indefinite integral ∫ f dx. You can think of the operation of indefinite integration as being an inverse of differentiation. If you take the result from Integrate[f, x], and then differentiate it, you always get a result that is mathematically equal to the original expression f.
In general, however, there is a whole family of results which have the property that their derivative is f. Integrate[f, x] gives you an expression whose derivative is f. You can get other expressions by adding an arbitrary constant of integration, or indeed by adding any function that is constant except at discrete points.
If you fill in explicit limits for your integral, any such constants of integration must cancel out.
But even though the indefinite integral can have arbitrary constants added, it is still often very
convenient to manipulate it without filling in the limits.
You can add an arbitrary constant to the indefinite integral, and still get the same derivative. Integrate simply gives you an expression with the required derivative.
In[2]:= D[% + c, x]
Out[2]= x^2
This gives the indefinite integral ∫ dx/(x^2 - 1).
The Integrate function assumes that any object that does not explicitly contain the integration variable is independent of it, and can be treated as a constant. As a result, Integrate is like an inverse of the partial differentiation function D.
The integration variable can be any expression that does not involve explicit mathematical operations.
In[7]:= Integrate[x b[x]^2, b[x]]
Out[7]= 1/3 x b[x]^3
Another assumption that Integrate implicitly makes is that all the symbolic quantities in your integrand have "generic" values. Thus, for example, Mathematica will tell you that the integral of x^n is x^(n+1)/(n+1), even though this is not true in the special case n = -1.
Mathematica gives the standard result for this integral, implicitly assuming that n is not equal to -1.
In[8]:= Integrate[x^n, x]
Out[8]= x^(1+n)/(1+n)
You should realize that the result for any particular integral can often be written in many different forms. Mathematica tries to give you the most convenient form, following principles such as avoiding explicit complex numbers unless your input already contains them.
In[10]:= Integrate[1/(1 + a x^2), x]
Out[10]= ArcTan[Sqrt[a] x]/Sqrt[a]
In[11]:= Integrate[1/(1 - b x^2), x]
Out[11]= ArcTanh[Sqrt[b] x]/Sqrt[b]
This is mathematically equal to the first integral, but is given in a somewhat different form.
In[12]:= % /. b -> -a
Out[12]= ArcTanh[Sqrt[-a] x]/Sqrt[-a]
Even though they look quite different, both ArcTan@xD and - ArcTan@1 xD are indefinite
integrals of 1 I1 + x2 M.
In[14]:= Simplify[D[{ArcTan[x], -ArcTan[1/x]}, x]]
Out[14]= {1/(1 + x^2), 1/(1 + x^2)}
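The two antiderivatives above can only differ by a constant on each branch. A short Python sketch (outside Mathematica) makes that concrete: the difference ArcTan[x] - (-ArcTan[1/x]) is the same at every x > 0, namely π/2:

```python
import math

f = lambda x: math.atan(x)        # ArcTan[x]
g = lambda x: -math.atan(1 / x)   # -ArcTan[1/x]

# Both differentiate to 1/(1 + x^2), so their difference is constant
# on x > 0; that constant is Pi/2.
d1 = f(2.0) - g(2.0)
d2 = f(5.0) - g(5.0)
```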
One of the main problems is that it is difficult to know what kinds of functions will be needed to
evaluate a particular integral. When you work out a derivative, you always end up with func-
tions that are of the same kind or simpler than the ones you started with. But when you work
out integrals, you often end up needing to use functions that are much more complicated than
the ones you started with.
This integral can be evaluated using the same kind of functions that appeared in the input.
In[1]:= Integrate[Log[x]^2, x]
Out[1]= 2 x - 2 x Log[x] + x Log[x]^2

This integral requires the use of one of the Fresnel functions.
Out[3]= Sqrt[Pi/2] FresnelS[Sqrt[2/Pi] x]
This integral involves an incomplete gamma function. Note that the power is carefully set up to
allow any complex value of x.
In[4]:= Integrate[Exp[-x^a], x]
Out[4]= -((x (x^a)^(-1/a) Gamma[1/a, x^a])/a)
Mathematica includes a very wide range of mathematical functions, and by using these func-
tions a great many integrals can be done. But it is still possible to find even fairly simple-look-
ing integrals that just cannot be done in terms of any standard mathematical functions.
Here is a fairly simple-looking integral that cannot be done in terms of any standard mathemati-
cal functions.
In[5]:= Integrate[Sin[x]/Log[x], x]
Out[5]= ∫ Sin[x]/Log[x] dx
The main point of being able to do an integral in terms of standard mathematical functions is
that it lets one use the known properties of these functions to evaluate or manipulate the result
one gets.
In the most convenient cases, integrals can be done purely in terms of elementary functions
such as exponentials, logarithms and trigonometric functions. In fact, if you give an integrand
that involves only such elementary functions, then one of the important capabilities of
Integrate is that if the corresponding integral can be expressed in terms of elementary func-
tions, then Integrate will essentially always succeed in finding it.
Integrals of rational functions are straightforward to evaluate, and always come out in terms of
rational functions, logarithms and inverse trigonometric functions.
In[6]:= Integrate[x/((x - 1) (x + 2)), x]
Out[6]= Log[-1 + x]/3 + (2 Log[2 + x])/3
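For comparison, the partial-fraction result above can be verified numerically in Python (a sketch with our own helper names; valid on x > 1 where both logarithms are real):

```python
import math

def integrand(x):
    return x / ((x - 1) * (x + 2))

def antideriv(x):
    # The result above, valid for x > 1: Log[-1 + x]/3 + (2/3) Log[2 + x]
    return math.log(x - 1) / 3 + 2 * math.log(x + 2) / 3

# Central-difference derivative of the antiderivative at x = 3
# should match the integrand there (which is 3/10).
h = 1e-6
x = 3.0
deriv = (antideriv(x + h) - antideriv(x - h)) / (2 * h)
```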
The integral here is still of the same form, but now involves an implicit sum over the roots of a
polynomial.
In[7]:= Integrate[1/(1 + 2 x + x^3), x]
Out[7]= RootSum[1 + 2 #1 + #1^3 &, Log[x - #1]/(2 + 3 #1^2) &]
Integrals of trigonometric functions usually come out in terms of other trigonometric functions.
In[9]:= Integrate[Sin[x]^3 Cos[x]^2, x]
Out[9]= -(Cos[x]/8) - Cos[3 x]/48 + Cos[5 x]/80
By nesting elementary functions you sometimes get integrals that can be done in terms of
elementary functions.
In[12]:= Integrate[Cos[Log[x]], x]
Out[12]= (1/2) x Cos[Log[x]] + (1/2) x Sin[Log[x]]
But occasionally one can get results in terms of elementary functions alone.
In[15]:= Integrate[Sqrt[Tan[x]], x]
Out[15]= (1/(2 Sqrt[2])) (-2 ArcTan[1 - Sqrt[2] Sqrt[Tan[x]]] + 2 ArcTan[1 + Sqrt[2] Sqrt[Tan[x]]] +
          Log[-1 + Sqrt[2] Sqrt[Tan[x]] - Tan[x]] - Log[1 + Sqrt[2] Sqrt[Tan[x]] + Tan[x]])
Beyond working with elementary functions, Integrate includes a large number of algorithms
for dealing with special functions. Sometimes it uses a direct generalization of the procedure for
elementary functions. But more often its strategy is first to try to write the integrand in a form
that can be integrated in terms of certain sophisticated special functions, and then having done
this to try to find reductions of these sophisticated functions to more familiar functions.
A large book of integral tables will list perhaps a few thousand indefinite integrals. Mathematica
can do essentially all of these integrals. And because it contains general algorithms rather than
just specific cases, Mathematica can actually do a vastly wider range of integrals.
You could expect to find this integral in any large book of integral tables.
In[19]:= Integrate[Log[1 - x]/x, x]
Out[19]= -PolyLog[2, x]
To do this integral, however, requires a more general algorithm, rather than just a direct table
lookup.
In[20]:= Integrate[Log[1 + 3 x + x^2]/x, x]
Out[20]= Log[x] (Log[(3 - Sqrt[5])/2 + x] - Log[1 + (2 x)/(3 - Sqrt[5])]) +
         Log[x] (Log[(3 + Sqrt[5])/2 + x] - Log[1 + (2 x)/(3 + Sqrt[5])]) +
         Log[x] (-Log[(3 - Sqrt[5])/2 + x] - Log[(3 + Sqrt[5])/2 + x] + Log[1 + 3 x + x^2]) -
         PolyLog[2, -((2 x)/(3 - Sqrt[5]))] - PolyLog[2, -((2 x)/(3 + Sqrt[5]))]
Particularly if you introduce new mathematical functions of your own, you may want to teach
Mathematica new kinds of integrals. You can do this by making appropriate definitions for
Integrate.
In the case of differentiation, the chain rule allows one to reduce all derivatives to a standard
form, represented in Mathematica using Derivative. But for integration, no such similar stan-
dard form exists, and as a result you often have to make definitions for several different ver-
sions of the same integral. Changes of variables and other transformations can rarely be done
automatically by Integrate.
This integral cannot be done in terms of any of the standard mathematical functions built into
Mathematica.
In[21]:= Integrate[Sin[Sin[x]], x]
Out[21]= ∫ Sin[Sin[x]] dx
Before you add your own rules for integration, you have to remove write protection.
In[22]:= Unprotect[Integrate]
Out[22]= {Integrate}
You can set up your own rule to define the integral to be, say, a "Jones" function.
In[23]:= Integrate[Sin[Sin[a_. + b_. x_]], x_] := Jones[a, x]/b
As it turns out, the integral ∫ sin(sin(x)) dx can in principle be represented as an infinite sum of 2F1 hypergeometric functions.

Definite Integrals

Integrate[f, x]   the indefinite integral ∫ f dx
Integrate[f, {x, xmin, xmax}]   the definite integral ∫_xmin^xmax f dx
Integrate[f, {x, xmin, xmax}, {y, ymin, ymax}]   the multiple integral ∫_xmin^xmax dx ∫_ymin^ymax dy f

Integration functions.
Here is the integral ∫_a^b x^2 dx.

This gives the multiple integral ∫_0^a dx ∫_0^b dy (x^2 + y^2).
The y integral is done first. Its limits can depend on the value of x. This ordering is the same as is used in functions like Sum and Table.
In[3]:= Integrate[x^2 + y^2, {x, 0, a}, {y, 0, x}]
Out[3]= a^4/3
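The same ordering (inner y integral first, with its upper limit depending on x) can be sketched numerically in Python; the helper below is our own illustration, not a Mathematica routine:

```python
def double_integral(a, n=400):
    # Midpoint-rule estimate of Integrate[x^2 + y^2, {x, 0, a}, {y, 0, x}]:
    # the inner y integral is done first, and its upper limit depends on x.
    hx = a / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        m = max(1, round(n * x / a))
        hy = x / m
        inner = sum((x * x + ((j + 0.5) * hy) ** 2) * hy for j in range(m))
        total += inner * hx
    return total

approx = double_integral(1.0)   # exact answer from the text: a^4/3, so 1/3 here
```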
In simple cases, definite integrals can be done by finding indefinite forms and then computing
appropriate limits. But there is a vast range of integrals for which the indefinite form cannot be
expressed in terms of standard mathematical functions, but the definite form still can be.
Out[4]= ∫ Cos[Sin[x]] dx
Here is an integral where the indefinite form can be found, but it is much more efficient to work
out the definite form directly.
In[6]:= Integrate[Log[x] Exp[-x^2], {x, 0, Infinity}]
Out[6]= -(1/4) Sqrt[Pi] (EulerGamma + Log[4])
Just because an integrand may contain special functions, it does not mean that the definite
integral will necessarily be complicated.
In[7]:= Integrate[BesselK[0, x]^2, {x, 0, Infinity}]
Out[7]= Pi^2/4
Even when you can find the indefinite form of an integral, you will often not get the correct
answer for the definite integral if you just subtract the values of the limits at each end point.
The problem is that within the domain of integration there may be singularities whose effects
are ignored if you follow this procedure.
Integrate::idiv : Integral of 1/x^2 does not converge on {-2, 2}.
Out[12]= ∫_-2^2 1/x^2 dx
Here is a more subtle example, involving branch cuts rather than poles.
In[13]:= Integrate[1/(1 + a Sin[x]), x]
Out[13]= (2 ArcTan[(a + Tan[x/2])/Sqrt[1 - a^2]])/Sqrt[1 - a^2]
The definite integral, however, gives the correct result which depends on a. The assumption
assures convergence.
In[15]:= Integrate[1/(1 + a Sin[x]), {x, 0, 2 Pi}, Assumptions -> -1 < a < 1]
Out[15]= (2 Pi)/Sqrt[1 - a^2]
Integrate::idiv : Integral of 1/x does not converge on {-1, 2}.
Out[18]= ∫_-1^2 1/x dx
When parameters appear in an indefinite integral, it is essentially always possible to get results
that are correct for almost all values of these parameters. But for definite integrals this is no
longer the case. The most common problem is that a definite integral may converge only when
the parameters that appear in it satisfy certain specific conditions.
For the definite integral, however, n must satisfy a condition in order for the integral to be
convergent.
In[21]:= Integrate[x^n, {x, 0, 1}]
Out[21]= If[Re[n] > -1, 1/(1 + n), Integrate[x^n, {x, 0, 1}, Assumptions -> Re[n] <= -1]]
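The shape of that conditional answer can be mirrored in a Python sketch (our own function, purely illustrative): the closed form 1/(1+n) applies only under the convergence condition, and a midpoint-rule sum confirms it for a convergent case:

```python
def power_integral(n):
    # Mirror of the conditional answer for Integrate[x^n, {x, 0, 1}]:
    # the closed form 1/(1 + n) is valid only when Re[n] > -1.
    if complex(n).real > -1:
        return 1 / (n + 1)
    raise ValueError("integral does not converge for Re[n] <= -1")

# Midpoint-rule check for n = 0.5: expect 1/(1 + 0.5) = 2/3.
steps = 10000
h = 1.0 / steps
approx = sum(((i + 0.5) * h) ** 0.5 for i in range(steps)) * h
```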
Even when a definite integral is convergent, the presence of singularities on the integration
path can lead to discontinuous changes when the parameters vary. Sometimes a single formula
containing functions like Sign can be used to summarize the result. In other cases, however,
an explicit If is more convenient.
The result is discontinuous as a function of a. The discontinuity can be traced to the essential singularity of sin(x) at x = ∞.
In[26]:= Plot[%, {a, -5, 5}]
Out[26]= (plot of the result as a function of a, showing jump discontinuities)
There is no convenient way to represent this answer in terms of Sign, so Mathematica generates an explicit If.
In[27]:= Integrate[Sin[x] BesselJ[0, a x]/x, {x, 0, Infinity}, Assumptions -> Im[a] == 0]
Out[27]= If[a < -1 || a > 1, (a ArcSin[1/a])/Abs[a], Pi/2]
Out[28]= (plot of this result as a function of a)
Integrals over Regions
Even though an integral may be straightforward over a simple rectangular region, it can be
significantly more complicated even over a circular region.
Particularly if there are parameters inside the conditions that define regions, the results for
integrals over regions may break into several cases.
With two parameters even this breaks into quite a few cases.
In[5]:= Integrate[Boole[a x < b], {x, 0, 1}]
Out[5]= Piecewise[{{1, (a > 0 && a - b <= 0) || (a <= 0 && b > 0)},
                   {(a - b)/a, a < 0 && a - b < 0 && b <= 0},
                   {b/a, a > 0 && b > 0 && a - b > 0}}]
Out[7]= (7 π^6)/(1 + 2 π^3 + 4 π^6)
Mathematica cannot give an explicit result for this integral, so it leaves the integral in symbolic
form.
In[1]:= Integrate[x^2 f[x], x]
Out[1]= ∫ x^2 f[x] dx

Out[2]= x^2 f[x]
Here is a definite integral with end points that do not explicitly depend on x.
In[5]:= defint = Integrate[f[x], {x, a, b}]
Out[5]= ∫_a^b f[x] dx
Differential Equations
You can use the Mathematica function DSolve to find symbolic solutions to ordinary and partial
differential equations.
Solving a differential equation consists essentially in finding the form of an unknown function. In Mathematica, unknown functions are represented by expressions like y[x]. The derivatives of such functions are represented by y'[x], y''[x] and so on.
The Mathematica function DSolve returns as its result a list of rules for functions. There is a question of how these functions are represented. If you ask DSolve to solve for y[x], then DSolve will indeed return a rule for y[x]. In some cases, this rule may be all you need. But this rule, on its own, does not give values for y'[x] or even y[0]. In many cases, therefore, it is better to ask DSolve to solve not for y[x], but instead for y itself. In this case, what DSolve will return is a rule which gives y as a pure function, in the sense discussed in "Pure Functions".
If you ask DSolve to solve for y[x], it will give a rule specifically for y[x].
In[1]:= DSolve[y'[x] + y[x] == 1, y[x], x]
Out[1]= {{y[x] -> 1 + E^-x C[1]}}
The rule applies only to y[x] itself, and not, for example, to objects like y[0] or y'[x].
In[2]:= y[x] + 2 y'[x] + y[0] /. %
Out[2]= {1 + E^-x C[1] + y[0] + 2 y'[x]}
If you ask DSolve to solve for y, it gives a rule for the object y on its own as a pure function.
In[3]:= DSolve[y'[x] + y[x] == 1, y, x]
Out[3]= {{y -> Function[{x}, 1 + E^-x C[1]]}}
You can add constraints and boundary conditions for differential equations by explicitly giving additional equations such as y[0] == 0.
If you ask Mathematica to solve a set of differential equations and you do not give any con-
straints or boundary conditions, then Mathematica will try to find a general solution to your
equations. This general solution will involve various undetermined constants. One new constant
is introduced for each order of derivative in each equation you give.
The default is that these constants are named C[n], where the index n starts at 1 for each invocation of DSolve. You can override this choice, by explicitly giving a setting for the option GeneratedParameters. Any function you give is applied to each successive index value n to get the constants to use for each invocation of DSolve.
The general solution to this fourth-order equation involves four undetermined constants.
In[9]:= DSolve[y''''[x] == y[x], y[x], x]
Out[9]= {{y[x] -> E^x C[1] + E^-x C[3] + C[2] Cos[x] + C[4] Sin[x]}}
Each independent initial or boundary condition you give reduces the number of undetermined
constants by one.
In[10]:= DSolve[{y''''[x] == y[x], y[0] == y'[0] == 0}, y[x], x]
Out[10]= {{y[x] -> E^-x (C[3] + E^(2 x) C[3] - E^(2 x) C[4] - 2 E^x C[3] Cos[x] + E^x C[4] Cos[x] + E^x C[4] Sin[x])}}
You should realize that finding exact formulas for the solutions to differential equations is a
difficult matter. In fact, there are only fairly few kinds of equations for which such formulas can
be found, at least in terms of standard mathematical functions.
The most widely investigated differential equations are linear ones, in which the functions you
are solving for, as well as their derivatives, appear only linearly.
This is a homogeneous first-order linear differential equation, and its solution is quite simple.
In[11]:= DSolve[y'[x] - x y[x] == 0, y[x], x]
Out[11]= {{y[x] -> E^(x^2/2) C[1]}}

Out[12]= {{y[x] -> E^(x^2/2) C[1] + E^(x^2/2) Sqrt[Pi/2] Erf[x/Sqrt[2]]}}
If you have only a single linear differential equation, and it involves only a first derivative of the
function you are solving for, then it turns out that the solution can always be found just by
doing integrals.
But as soon as you have more than one differential equation, or more than a first-order deriva-
tive, this is no longer true. However, some simple second-order linear differential equations can
nevertheless be solved using various special functions from "Special Functions". Indeed, histori-
cally many of these special functions were first introduced specifically in order to represent the
solutions to such equations.
Out[16]= {{y[x] -> C[1] (-1 + Cos[x]^2)^(1/4) LegendreP[1/2, 5/2, Cos[x]] +
           C[2] (-1 + Cos[x]^2)^(1/4) LegendreQ[1/2, 5/2, Cos[x]]}}
Occasionally second-order linear equations can be solved using only elementary functions.
In[17]:= DSolve[x^2 y''[x] + y[x] == 0, y[x], x]
Out[17]= {{y[x] -> Sqrt[x] C[1] Cos[(1/2) Sqrt[3] Log[x]] + Sqrt[x] C[2] Sin[(1/2) Sqrt[3] Log[x]]}}
Beyond second order, the kinds of functions needed to solve even fairly simple linear differen-
tial equations become extremely complicated. At third order, the generalized Meijer G function
MeijerG can sometimes be used, but at fourth order and beyond absolutely no standard mathe-
matical functions are typically adequate, except in very special cases.
Here is a third-order linear differential equation which can be solved in terms of generalized
hypergeometric functions.
In[18]:= DSolve[y'''[x] + x y[x] == 0, y[x], x]
Out[18]= {{y[x] -> C[1] HypergeometricPFQ[{}, {1/2, 3/4}, -(x^4/64)] +
           (1/2) x C[2] HypergeometricPFQ[{}, {3/4, 5/4}, -(x^4/64)] +
           (1/8) x^2 C[3] HypergeometricPFQ[{}, {5/4, 3/2}, -(x^4/64)]}}
For nonlinear differential equations, only rather special cases can usually ever be solved in
terms of standard mathematical functions. Nevertheless, DSolve includes fairly general proce-
dures which allow it to handle almost all nonlinear differential equations whose solutions are
found in standard reference books.
First-order nonlinear differential equations in which x does not appear on its own are fairly easy
to solve.
In[20]:= DSolve[y'[x] - y[x]^2 == 0, y[x], x]
Out[20]= {{y[x] -> 1/(-x - C[1])}}
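The closed-form solution above can be checked against a direct numerical integration. The Python sketch below (a standard fourth-order Runge-Kutta stepper, our own code) solves y' = y^2 with y(0) = 1, which corresponds to C[1] = -1 and hence y = 1/(1 - x):

```python
def rk4(f, y0, x0, x1, steps=1000):
    # Classical fourth-order Runge-Kutta integration of y' = f(x, y).
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

# y' = y^2 with y(0) = 1 picks out C[1] = -1, i.e. y = 1/(1 - x);
# the exact value at x = 0.5 is 2.
approx = rk4(lambda x, y: y * y, 1.0, 0.0, 0.5)
```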
Out[21]= {{y[x] -> (Sqrt[x] (-BesselJ[-(2/3), (2 x^(3/2))/3] + BesselJ[2/3, (2 x^(3/2))/3] C[1]))/
           (BesselJ[1/3, (2 x^(3/2))/3] + BesselJ[-(1/3), (2 x^(3/2))/3] C[1])}}
Out[23]= {{y[x] -> -(2/(1 + 2 x + 2 E^(2 x) C[1]))}, {y[x] -> 2/(1 + 2 x + 2 E^(2 x) C[1])}}
Solve::tdep : The equations appear to involve the variables to be solved for in an essentially non-algebraic way.
Out[24]= Solve[(1/2) ((2 ArcTanh[(-1 - 2 x y[x])/Sqrt[5]])/Sqrt[5] +
          Log[(-1 - x y[x] (-1 - x y[x]))/(x^2 y[x]^2)]) == C[1] - Log[x], y[x]]
In practical applications, it is quite often convenient to set up differential equations that involve
piecewise functions. You can use DSolve to find symbolic solutions to such equations.
Beyond ordinary differential equations, one can consider differential-algebraic equations that
involve a mixture of differential and algebraic equations.
DSolve is set up to handle not only ordinary differential equations in which just a single indepen-
dent variable appears, but also partial differential equations in which two or more independent
variables appear.
This finds the general solution to a simple partial differential equation with two independent
variables.
In[28]:= DSolve[D[y[x1, x2], x1] + D[y[x1, x2], x2] == 1/(x1 x2), y[x1, x2], {x1, x2}]
Out[28]= {{y[x1, x2] -> (-Log[x1] + Log[x2] + x1 C[1][-x1 + x2] - x2 C[1][-x1 + x2])/(x1 - x2)}}
The basic mathematics of partial differential equations is considerably more complicated than
that of ordinary differential equations. One feature is that whereas the general solution to an
ordinary differential equation involves only arbitrary constants, the general solution to a partial
differential equation, if it can be found at all, must involve arbitrary functions. Indeed, with m
independent variables, arbitrary functions of m - 1 arguments appear. DSolve by default names these functions C[n].
Out[30]= y^(0,0,1)[x1, x2, x3] + y^(0,1,0)[x1, x2, x3] + y^(1,0,0)[x1, x2, x3] == 0
Out[33]= {{y[x, t] -> C[1][t - (Sqrt[c^2] x)/c^2] + C[2][t + (Sqrt[c^2] x)/c^2]}}
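The d'Alembert form above (two arbitrary functions of t ∓ x/c) can be verified numerically: any smooth choices for the two functions satisfy the wave equation y_tt = c^2 y_xx. A Python sketch, with our own (hypothetical) choices sin and a Gaussian standing in for C[1] and C[2]:

```python
import math

c = 2.0
f = lambda u: math.sin(u)          # plays the role of C[1] (our choice)
g = lambda u: math.exp(-u * u)     # plays the role of C[2] (our choice)
y = lambda x, t: f(t - x / c) + g(t + x / c)

# Second differences approximate the partial derivatives; the wave
# equation requires y_tt == c^2 y_xx.
h = 1e-3
x0, t0 = 0.3, 0.7
ytt = (y(x0, t0 + h) - 2 * y(x0, t0) + y(x0, t0 - h)) / h ** 2
yxx = (y(x0 + h, t0) - 2 * y(x0, t0) + y(x0 - h, t0)) / h ** 2
```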
For an ordinary differential equation, it is guaranteed that a general solution must exist, with
the property that adding initial or boundary conditions simply corresponds to forcing specific
choices for arbitrary constants in the solution. But for partial differential equations this is no
longer true. Indeed, it is only for linear partial differential equations and a few other special types that
such general solutions exist.
Other partial differential equations can be solved only when specific initial or boundary values
are given, and in the vast majority of cases no solutions can be found as exact formulas in
terms of standard mathematical functions.
Since y and its derivatives appear only linearly here, a general solution exists.
In[34]:= DSolve[x1 D[y[x1, x2], x1] + x2 D[y[x1, x2], x2] == Exp[x1 x2], y[x1, x2], {x1, x2}]
Out[34]= {{y[x1, x2] -> (1/2) (ExpIntegralEi[x1 x2] + 2 C[1][x2/x1])}}
DSolve::nlpde : Solution requested to nonlinear partial differential equation. Trying to build a complete integral.
Out[36]= {{y[x1, x2] -> C[1] + (a x1)/C[2] + x2 C[2]}}
Laplace Transforms
Out[2]= t^4 Sin[t]
The Laplace transform of this Bessel function just involves elementary functions.
In[5]:= LaplaceTransform[BesselJ[n, t], t, s]
Out[5]= (s + Sqrt[1 + s^2])^-n/Sqrt[1 + s^2]
Laplace transforms have the property that they turn integration and differentiation into essen-
tially algebraic operations. They are therefore commonly used in studying systems governed by
differential equations.
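The defining integral F(s) = ∫_0^∞ f(t) e^(-s t) dt can be checked numerically. The Python sketch below (our own helper, truncating the integral at a point where the integrand is negligible) evaluates the transform of t^4 sin(t) at s = 1 and compares it with the standard table result 24 (5 s^4 - 10 s^2 + 1)/(s^2 + 1)^5:

```python
import math

def laplace_numeric(f, s, T=60.0, steps=200000):
    # Trapezoidal estimate of Integral[f(t) E^(-s t), {t, 0, T}];
    # the tail beyond T is negligible for this integrand when s >= 1.
    h = T / steps
    total = 0.5 * (f(0.0) + f(T) * math.exp(-s * T))
    for i in range(1, steps):
        t = i * h
        total += f(t) * math.exp(-s * t)
    return total * h

# Table result: LaplaceTransform[t^4 Sin[t], t, s] = 24 (5 s^4 - 10 s^2 + 1)/(s^2 + 1)^5
s = 1.0
numeric = laplace_numeric(lambda t: t ** 4 * math.sin(t), s)
closed = 24 * (5 * s ** 4 - 10 * s ** 2 + 1) / (s ** 2 + 1) ** 5
```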
Fourier Transforms
Integral transforms can produce results that involve "generalized functions" such as
HeavisideTheta.
In[1]:= FourierTransform[1/(1 + t^4), t, w]
Out[1]= (a combination of exponentials E^(±((1 + I) w)/Sqrt[2]) weighted by HeavisideTheta[-w] and HeavisideTheta[w]; it is equivalent to (Sqrt[Pi]/2) E^(-Abs[w]/Sqrt[2]) (Cos[w/Sqrt[2]] + Sin[Abs[w]/Sqrt[2]]))
With the default conventions, the inverse Fourier transform is f(t) = (1/Sqrt[2π]) ∫ F(ω) e^(-iωt) dω.
In different scientific and technical fields different conventions are often used for defining
Fourier transforms. The option FourierParameters in Mathematica allows you to choose any of
these conventions you want.
pure mathematics   {1, -1}   F(ω) = ∫ f(t) e^(-iωt) dt,  f(t) = (1/(2π)) ∫ F(ω) e^(iωt) dω
classical physics   {-1, 1}   F(ω) = (1/(2π)) ∫ f(t) e^(iωt) dt,  f(t) = ∫ F(ω) e^(-iωt) dω
modern physics   {0, 1}   F(ω) = (1/Sqrt[2π]) ∫ f(t) e^(iωt) dt,  f(t) = (1/Sqrt[2π]) ∫ F(ω) e^(-iωt) dω
systems engineering   {1, -1}   F(ω) = ∫ f(t) e^(-iωt) dt,  f(t) = (1/(2π)) ∫ F(ω) e^(iωt) dω
signal processing   {0, -2 Pi}   F(ω) = ∫ f(t) e^(-2πiωt) dt,  f(t) = ∫ F(ω) e^(2πiωt) dω
general case   {a, b}   F(ω) = Sqrt[Abs[b]/(2π)^(1-a)] ∫ f(t) e^(ibωt) dt,  f(t) = Sqrt[Abs[b]/(2π)^(1+a)] ∫ F(ω) e^(-ibωt) dω
Out[3]= E^(-(w^2/4))/Sqrt[2]
Here is the same Fourier transform with the choice of parameters typically used in signal
processing.
In[4]:= FourierTransform[Exp[-t^2], t, w, FourierParameters -> {0, -2 Pi}]
Out[4]= E^(-Pi^2 w^2) Sqrt[Pi]
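The signal-processing convention above can be reproduced numerically: a Python sketch (our own helper) computes ∫ f(t) e^(-2πiωt) dt by the trapezoidal rule and compares it with E^(-π^2 w^2) Sqrt[π] at one sample frequency:

```python
import cmath, math

def fourier_sp(f, w, T=8.0, steps=16000):
    # Trapezoidal estimate of the signal-processing convention
    # FourierParameters -> {0, -2 Pi}: F(w) = Integral of f(t) E^(-2 Pi I w t) dt.
    h = 2 * T / steps
    total = 0j
    for i in range(steps + 1):
        t = -T + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * f(t) * cmath.exp(-2j * math.pi * w * t)
    return total * h

w = 0.3
val = fourier_sp(lambda t: math.exp(-t * t), w)
expected = math.sqrt(math.pi) * math.exp(-(math.pi * w) ** 2)  # the Out[4] formula at this w
```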
FourierSinTransform[expr, t, w]   Fourier sine transform
FourierCosTransform[expr, t, w]   Fourier cosine transform
InverseFourierSinTransform[expr, w, t]   inverse Fourier sine transform
InverseFourierCosTransform[expr, w, t]   inverse Fourier cosine transform
Out[5]= {(Sqrt[2/Pi] w)/(1 + w^2), Sqrt[2/Pi]/(1 + w^2)}
Z Transforms

ZTransform[expr, n, z]   Z transform of expr
InverseZTransform[expr, z, n]   inverse Z transform

Z transforms.

The Z transform of a function f(n) is given by Σ_{n=0}^∞ f(n) z^(-n); the inverse Z transform of F(z) is given by the contour integral (1/(2πi)) ∮ F(z) z^(n-1) dz. Z transforms are effectively discrete analogs of Laplace transforms. They are widely used for solving difference equations, especially in digital signal processing and control theory. They can be thought of as producing generating functions, of the kind commonly used in combinatorics and number theory.
The limit of the functions for infinite n is effectively a Dirac delta function, whose integral is
again 1.
In[4]:= Integrate[DiracDelta[x], {x, -Infinity, Infinity}]
Out[4]= 1
Inserting a delta function in an integral effectively causes the integrand to be sampled at dis-
crete points where the argument of the delta function vanishes.
This effectively counts the number of zeros of cosHxL in the region of integration.
In[8]:= Integrate[DiracDelta[Cos[x]], {x, -30, 30}]
Out[8]= 20
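The counting works because δ(g(x)) contributes 1/|g'(x_i)| at each zero x_i of g, and for g = cos that weight is 1/|sin(x_i)| = 1. A Python sketch of the same bookkeeping:

```python
import math

# DiracDelta[Cos[x]] picks out the zeros of Cos[x] at Pi/2 + k Pi; each
# contributes 1/Abs[D[Cos[x], x]] = 1/Abs[Sin[x]] = 1 there, so the
# integral simply counts the zeros inside (-30, 30).
zeros = [math.pi / 2 + k * math.pi for k in range(-20, 20)]
contributions = [1 / abs(math.sin(z)) for z in zeros if -30 < z < 30]
count = sum(contributions)
```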
The Heaviside function HeavisideTheta[x] is the indefinite integral of the delta function. It is variously denoted H(x), θ(x), μ(x), and U(x). As a generalized function, the Heaviside function is defined only inside an integral. This distinguishes it from the unit step function UnitStep[x], which is a piecewise function.
The indefinite integral of the delta function is the Heaviside theta function.
In[9]:= Integrate[DiracDelta[x], x]
Out[9]= HeavisideTheta[x]
The value of this integral depends on whether a lies in the interval H-2, 2L.
In[10]:= Integrate[f[x] DiracDelta[x - a], {x, -2, 2}, Assumptions -> Element[a, Reals]]
Out[10]= f[a] HeavisideTheta[2 - a] HeavisideTheta[2 + a]
Out[11]= Sqrt[2 Pi] DiracDelta[w]
The Fourier transform of cosHtL involves the sum of two delta functions.
In[12]:= FourierTransform[Cos[t], t, w]
Out[12]= Sqrt[Pi/2] DiracDelta[-1 + w] + Sqrt[Pi/2] DiracDelta[1 + w]
Dirac delta functions can be used in DSolve to find the impulse response or Green's function of
systems represented by linear and certain other differential equations.
Out[13]= {{x[t] -> (HeavisideTheta[t] Sin[Sqrt[r] t])/Sqrt[r]}}
Related to the multidimensional Dirac delta function are two integer functions: discrete delta and Kronecker delta. Discrete delta δ(n₁, n₂, …) is 1 if all the nᵢ = 0, and is zero otherwise. Kronecker delta δ_{n₁ n₂ …} is 1 if all the nᵢ are equal, and is zero otherwise.
Arithmetic
You can do arithmetic with Mathematica just as you would on an electronic calculator.
Here the / stands for division, and the ^ stands for power.
In[2]:= 2.4 / 8.9 ^ 2
Out[2]= 0.0302992
Spaces denote multiplication in Mathematica. The front end automatically replaces spaces
between numbers with light gray multiplication signs.
In[3]:= 2 3 4
Out[3]= 24
Spaces are not needed, though they often make your input easier to read.
In[6]:= (3 + 4) ^ 2 - 2 (3 + 1)
Out[6]= 41
x ^ y   power
- x   minus
x / y   divide
x y z or x*y*z   multiply
x + y + z   add
Or like this.
In[9]:= 2.3*^70
Out[9]= 2.3 × 10^70
With Mathematica, you can perform calculations with a particular precision, usually higher than
an ordinary calculator. When given precise numbers, Mathematica does not convert them to an
approximate representation, but gives a precise result.
There is no "closed form" result for ∫₀¹ sin(sin(x)) dx. Mathematica returns the integral in symbolic form.
In[1]:= Integrate[Sin[Sin[x]], {x, 0, 1}]
Out[1]= ∫₀¹ Sin[Sin[x]] dx
You can now take the symbolic form of the integral, and ask for its approximate numerical
value.
In[2]:= N[%]
Out[2]= 0.430606
When Mathematica cannot find an explicit result for something like a definite integral, it returns
a symbolic form. You can take this symbolic form, and try to get an approximate numerical
value by applying N.
By giving a second argument to N, you can specify the numerical precision to use.
In[3]:= N[Integrate[Sin[Sin[x]], {x, 0, 1}], 40]
Out[3]= 0.4306061031206906049123773552484657864336
If you want to evaluate an integral numerically in Mathematica, then using Integrate and
applying N to the result is not the most efficient way to do it. It is better instead to use the
function NIntegrate, which immediately gives a numerical answer, without first trying to get
an exact, symbolic, result. You should realize that even when Integrate does not in the end
manage to give you an exact result, it may spend a lot of time trying to do so.
NIntegrate evaluates numerical integrals directly, without first trying to get a symbolic result.
In[4]:= NIntegrate[Sin[Sin[x]], {x, 0, 1}]
Out[4]= 0.430606
When you do a symbolic integral, Mathematica takes the functional form of the integrand you
have given, and applies a sequence of exact symbolic transformation rules to it, to try and
evaluate the integral.
However, when Mathematica does a numerical integral, after some initial symbolic preprocess-
ing, the only information it has about your integrand is a sequence of numerical values for it. To
get a definite result for the integral, Mathematica then effectively has to make certain assump-
tions about the smoothness and other properties of your integrand. If you give a sufficiently
pathological integrand, these assumptions may not be valid, and as a result, Mathematica may
simply give you the wrong answer for the integral.
This problem may occur, for example, if you try to integrate numerically a function which has a
very thin spike at a particular position. Mathematica samples your function at a number of
points, and then assumes that the function varies smoothly between these points. As a result, if
none of the sample points come close to the spike, then the spike will go undetected, and its
contribution to the numerical integral will not be correctly included.
Out[1]= (plot of Exp[-x^2] on the interval -10 to 10, a sharp peak near x = 0)
NIntegrate gives the correct answer for the numerical integral of this function from -10 to
+10.
In[2]:= NIntegrate[Exp[-x^2], {x, -10, 10}]
Out[2]= 1.77245
If, however, you ask for the integral from -10000 to 10000, with its default settings
NIntegrate will miss the peak near x = 0, and give the wrong answer.
In[3]:= NIntegrate[Exp[-x^2], {x, -10000, 10000}]
NIntegrate tries to make the best possible use of the information that it can get about the
numerical values of the integrand. Thus, for example, by default, if NIntegrate notices that the
estimated error in the integral in a particular region is large, it will take more samples in that
region. In this way, NIntegrate tries to "adapt" its operation to the particular integrand you
have given.
The kind of adaptive procedure that NIntegrate uses is similar, at least in spirit, to what Plot
does in trying to draw smooth curves for functions. In both cases, Mathematica tries to go on
taking more samples in a particular region until it has effectively found a smooth approximation
to the function in that region.
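The adaptive idea can be sketched concretely. The Python code below (a textbook recursive adaptive Simpson rule, our own illustration rather than NIntegrate's actual algorithm) subdivides wherever the local error estimate is large, so sample points automatically cluster around the Gaussian peak:

```python
import math

def simpson(f, a, b):
    c = (a + b) / 2
    return (b - a) / 6 * (f(a) + 4 * f(c) + f(b))

def adaptive(f, a, b, tol=1e-10):
    # Recursive adaptive Simpson: subdivide wherever the local error
    # estimate is large, so samples cluster in difficult regions.
    c = (a + b) / 2
    whole = simpson(f, a, b)
    halves = simpson(f, a, c) + simpson(f, c, b)
    if abs(whole - halves) < 15 * tol:
        return halves + (halves - whole) / 15
    return adaptive(f, a, c, tol / 2) + adaptive(f, c, b, tol / 2)

# The sharply peaked Gaussian is resolved correctly: the integral over
# (-10, 10) is essentially Sqrt[Pi].
val = adaptive(lambda x: math.exp(-x * x), -10.0, 10.0)
```

Like NIntegrate, this sketch can still be fooled: if the initial samples all miss a spike entirely, the error estimate never triggers the subdivision.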
The kinds of problems that can appear in numerical integration can also arise in doing other
numerical operations on functions.
For example, if you ask for a numerical approximation to the sum of an infinite series, Mathematica samples a certain number of terms in the series, and then does an extrapolation to estimate
the contributions of other terms. If you insert large terms far out in the series, they may not be
detected when the extrapolation is done, and the result you get for the sum may be incorrect.
A similar problem arises when you try to find a numerical approximation to the minimum of a
function. Mathematica samples only a finite number of values, then effectively assumes that the
actual function interpolates smoothly between these values. If in fact the function has a sharp
dip in a particular region, then Mathematica may miss this dip, and you may get the wrong
answer for the minimum.
If you work only with numerical values of functions, there is simply no way to avoid the kinds of
problems we have been discussing. Exact symbolic computation, of course, allows you to get
around these problems.
In many calculations, it is therefore worthwhile to go as far as you can symbolically, and then
resort to numerical methods only at the very end. This gives you the best chance of avoiding
the problems that can arise in purely numerical computations.
NIntegrate[f, {x, xmin, xmax}]   numerical approximation to ∫_xmin^xmax f dx

Here is a numerical approximation to Σ_{i=1}^∞ 1/i³.
Here is a double integral over a triangular domain. Note the order in which the variables are
given.
In[4]:= NIntegrate[Sin[x y], {x, 0, 1}, {y, 0, x}]
Out[4]= 0.119906
Numerical Integration

N[Integrate[expr, {x, xmin, xmax}]]   try to perform an integral exactly, then find numerical approximations to the parts that remain
NIntegrate[expr, {x, xmin, xmax}]   find a numerical approximation to an integral
NIntegrate[expr, {x, xmin, xmax}, {y, ymin, ymax}, ...]   multidimensional numerical integral ∫_xmin^xmax dx ∫_ymin^ymax dy expr
This finds a numerical approximation to the integral ∫₀^∞ e^(-x³) dx.

Here is the numerical value of the double integral ∫₋₁¹ dx ∫₋₁¹ dy (x² + y²).
In[2]:= NIntegrate[x^2 + y^2, {x, -1, 1}, {y, -1, 1}]
Out[2]= 2.66667
An important feature of NIntegrate is its ability to deal with functions that "blow up" at known
points. NIntegrate automatically checks for such problems at the endpoints of the integration
region.
The function 1/Sqrt[x] blows up at x = 0, but NIntegrate still succeeds in getting the correct value for the integral.
In[3]:= NIntegrate[1/Sqrt[x], {x, 0, 1}]
Out[3]= 2.
NIntegrate::slwcon :
Numerical integration converging too slowly; suspect one of the following: singularity, value
of the integration is 0, highly oscillatory integrand, or WorkingPrecision too small.
NIntegrate::ncvb :
NIntegrate failed to converge to prescribed accuracy after 9 recursive bisections in x near
{x} = {1.22413×10^-225}. NIntegrate obtained 191612.2902185145`
and 160378.51781028978` for the integral and error estimates.
Out[5]= 191 612.
NIntegrate does not automatically look for singularities except at the endpoints of your integra-
tion region. When other singularities are present, NIntegrate may not give you the right
answer for the integral. Nevertheless, in following its adaptive procedure, NIntegrate will often
detect the presence of potentially singular behavior, and will warn you about it.
NIntegrate warns you of a possible problem due to the singularity in the middle of the integra-
tion region. The final result is numerically quite close to the correct answer.
In[6]:= NIntegrate[x ^ 2 Sin[1/x], {x, - 1, 2}]
NIntegrate::slwcon :
Numerical integration converging too slowly; suspect one of the following: singularity, value
of the integration is 0, highly oscillatory integrand, or WorkingPrecision too small.
Out[6]= 1.38755
If you know that your integrand has singularities at particular points, you can explicitly tell
NIntegrate to deal with them. NIntegrate[expr, {x, xmin, x1, x2, …, xmax}] integrates expr from
xmin to xmax, looking for possible singularities at each of the intermediate points xi.
This gives the same integral, but now explicitly deals with the singularity at x = 0.
In[7]:= NIntegrate[x ^ 2 Sin[1/x], {x, - 1, 0, 2}]
Out[7]= 1.38755
You can also use the list of intermediate points xi in NIntegrate to specify an integration
contour to follow in the complex plane. The contour is taken to consist of a sequence of line
segments, starting at xmin, going through each of the xi, and ending at xmax.
This integrates 1/x around a closed contour in the complex plane, going from -1, through the
points -i, 1 and i, then back to -1.
In[8]:= NIntegrate[1/x, {x, - 1, - I, 1, I, - 1}]
Out[8]= 0. + 6.28319 I
When NIntegrate tries to evaluate a numerical integral, it samples the integrand at a sequence
of points. If it finds that the integrand changes rapidly in a particular region, then it recursively
takes more sample points in that region. The parameters MinRecursion and MaxRecursion
specify the minimum and maximum number of recursions to use. Increasing the value of
MinRecursion guarantees that NIntegrate will use a larger number of sample points.
MaxPoints and MaxRecursion limit the number of sample points which NIntegrate will ever try
to use. Increasing MinRecursion or MaxRecursion will make NIntegrate work more slowly.
With the default settings for all options, NIntegrate misses the peak in exp(-(x - 1)^2) near
x = 1, and gives the wrong answer for the integral.
In[10]:= NIntegrate[Exp[- (x - 1) ^ 2], {x, - 1000, 1000}]
NIntegrate::slwcon :
Numerical integration converging too slowly; suspect one of the following: singularity, value
of the integration is 0, highly oscillatory integrand, or WorkingPrecision too small.
NIntegrate::ncvb :
NIntegrate failed to converge to prescribed accuracy after 9 recursive bisections in x near
{x} = {3.87517}. NIntegrate obtained 1.6330510571683285` and
0.004736564243403896` for the integral and error estimates.
Out[10]= 1.63305
With the option MinRecursion -> 3, NIntegrate samples enough points that it notices the
peak around x = 1. With the default setting of MaxRecursion, however, NIntegrate cannot
use enough sample points to get an accurate answer.
In[11]:= NIntegrate[Exp[- (x - 1) ^ 2], {x, - 50000, 1000}, MinRecursion -> 3]
NIntegrate::slwcon :
Numerical integration converging too slowly; suspect one of the following: singularity, value
of the integration is 0, highly oscillatory integrand, or WorkingPrecision too small.
NIntegrate::ncvb :
NIntegrate failed to converge to prescribed accuracy after 9 recursive bisections in x near
{x} = {-8.44584}. NIntegrate obtained 1.8181913371063452`
and 1.165089629798181` for the integral and error estimates.
Out[11]= 1.81819
With this setting of MaxRecursion , NIntegrate can get an accurate answer for the integral.
In[12]:= NIntegrate[Exp[- (x - 1) ^ 2], {x, - 50000, 1000}, MinRecursion -> 3, MaxRecursion -> 20]
NIntegrate::slwcon :
Numerical integration converging too slowly; suspect one of the following: singularity, value
of the integration is 0, highly oscillatory integrand, or WorkingPrecision too small.
Out[12]= 1.77242
Another way to solve the problem is to make NIntegrate break the integration region into
several pieces, with a small piece that explicitly covers the neighborhood of the peak.
In[13]:= NIntegrate[Exp[- (x - 1) ^ 2], {x, - 1000, - 10, 10, 1000}]
Out[13]= 1.77245
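The same splitting idea is easy to reproduce outside Mathematica. In the Python sketch below, a plain composite Simpson rule stands in for NIntegrate's adaptive algorithm; integrating the three pieces separately resolves the narrow peak near x = 1, and the pieces outside [-10, 10] contribute essentially nothing. The exact value is √π ≈ 1.77245:

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n panels (n even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

f = lambda x: math.exp(-(x - 1) ** 2)

# Splitting the region guarantees sample points land on the peak.
pieces = [(-1000, -10), (-10, 10), (10, 1000)]
total = sum(simpson(f, a, b) for a, b in pieces)
```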
For integrals in many dimensions, it can take a long time for NIntegrate to get a precise
answer. However, by setting the option MaxPoints, you can tell NIntegrate to give you just a
rough estimate, sampling the integrand only a limited number of times.
Here is a way to get a rough estimate for an integral that takes a long time to compute.
In[14]:= NIntegrate[1/Sqrt[x + Log[y + z] ^ 2],
         {x, 0, 1}, {y, 0, 1}, {z, 0, 1}, MaxPoints -> 1000]
NIntegrate::maxp : The integral failed to converge after 1023 integrand evaluations. NIntegrate obtained
1.4548878649546768` and 0.03247010762528413` for the integral and error estimates.
Out[14]= 1.45489
NSum[f,{i,imin,imax}]   find a numerical approximation to the sum ∑ f for i from imin to imax

This gives a numerical approximation to ∑ 1/(i^3 + i!) for i from 1 to ∞.
There is no exact result for this sum, so Mathematica leaves it in a symbolic form.
In[2]:= Sum[1/(i ^ 3 + i !), {i, 1, Infinity}]
Out[2]= ∑ 1/(i^3 + i!) for i from 1 to ∞
The way NSum works is to include a certain number of terms explicitly, and then to try and
estimate the contribution of the remaining ones. There are three approaches to estimating this
contribution. The first uses the Euler-Maclaurin method, and is based on approximating the sum
by an integral. The second method, known as the Wynn epsilon method, samples a number of
additional terms in the sum, and then tries to fit them to a polynomial multiplied by a decaying
exponential. The third approach, useful for alternating series, uses an alternating signs method;
it also samples a number of additional terms and approximates their sum by the ratio of two
polynomials (Padé approximation).
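The Euler-Maclaurin idea can be sketched for the sum ∑ 1/i³ = ζ(3) ≈ 1.2020569: add up the first few terms explicitly, then approximate the tail by an integral plus correction terms. This plain-Python sketch illustrates only the principle, not NSum's implementation:

```python
# Euler-Maclaurin tail estimate for sum_{i >= N} f(i):
#   integral_N^inf f(x) dx + f(N)/2 - f'(N)/12 + ...
# For f(x) = 1/x^3:  integral = 1/(2 N^2)  and  f'(x) = -3/x^4.

def nsum_1_over_i3(N=10):
    head = sum(1.0 / i ** 3 for i in range(1, N))   # terms 1 .. N-1
    tail = 1.0 / (2 * N ** 2) + 0.5 / N ** 3 + 3.0 / (12 * N ** 4)
    return head + tail

# The exact value is zeta(3) = 1.2020569031595942...
approx = nsum_1_over_i3()
```

With only nine explicit terms, the corrected tail already gives the sum to about seven digits, which is the efficiency gain the method is after.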
If you do not explicitly specify the method to use, NSum will try to choose between the
EulerMaclaurin or WynnEpsilon methods. In any case, some implicit assumptions about the
functions you are summing have to be made. If these assumptions are not correct, you may get
inaccurate answers.
The most common place to use NSum is in evaluating sums with infinite limits. You can, how-
ever, also use it for sums with finite limits. By making implicit assumptions about the objects
you are evaluating, NSum can often avoid doing as many function evaluations as an explicit Sum
computation would require.
This finds the numerical value of ∑ e^(-n) for n from 0 to 100 by extrapolation techniques.
In[4]:= NSum[Exp[- n], {n, 0, 100}]
Out[4]= 1.58198
You can also get the result, albeit much less efficiently, by constructing the symbolic form of the
sum, then evaluating it numerically.
In[5]:= Sum[Exp[- n], {n, 0, 100}] // N
Out[5]= 1.58198
NProduct works in essentially the same way as NSum, with analogous options.
NSolve gives you numerical approximations to all the roots of a polynomial equation.
In[1]:= NSolve[x ^ 5 + x + 1 == 0, x]
Out[1]= {{x → -0.754878}, {x → -0.5 - 0.866025 I}, {x → -0.5 + 0.866025 I},
        {x → 0.877439 - 0.744862 I}, {x → 0.877439 + 0.744862 I}}
You can also use NSolve to solve sets of simultaneous equations numerically.
In[2]:= NSolve[{x + y == 2, x - 3 y + z == 3, x - y + z == 0}, {x, y, z}]
Out[2]= {{x → 3.5, y → -1.5, z → -5.}}
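For a linear system like this one, the numerical work reduces to Gaussian elimination. Here is a minimal Python sketch (an illustration only; NSolve's own methods are considerably more refined):

```python
def solve_linear(A, b):
    """Gaussian elimination with partial pivoting: returns x with A x = b."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

# x + y == 2,  x - 3y + z == 3,  x - y + z == 0
sol = solve_linear([[1, 1, 0], [1, -3, 1], [1, -1, 1]], [2, 3, 0])
```

The result matches the NSolve output above: x = 3.5, y = -1.5, z = -5.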
If your equations involve only linear functions or polynomials, then you can use NSolve to get
numerical approximations to all the solutions. However, when your equations involve more
complicated functions, there is in general no systematic procedure for finding all solutions, even
numerically. In such cases, you can use FindRoot to search for solutions. You have to give
FindRoot a place to start its search.
The equation has several solutions. If you start at a different x, FindRoot may return a differ-
ent solution.
In[4]:= FindRoot[3 Cos[x] == Log[x], {x, 10}]
Out[4]= {x → 13.1064}
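The search can be illustrated with the classical Newton iteration, which repeatedly follows the tangent line from the current point. This Python sketch (an illustration; FindRoot's default damped Newton method is more elaborate) finds the same root from the same starting point:

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeatedly follow the tangent line toward a root."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Solve 3 cos(x) == log(x), starting the search at x = 10.
f = lambda x: 3 * math.cos(x) - math.log(x)
fp = lambda x: -3 * math.sin(x) - 1 / x
root = newton(f, fp, 10.0)
```

Starting at x = 10 the iteration converges to the root near 13.1064; a different start would typically pick out a different root, just as with FindRoot.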
You can search for solutions to sets of equations. Here the solution involves complex numbers.
In[5]:= FindRoot[{x == Log[y], y == Log[x]}, {{x, I}, {y, 2}}]
Out[5]= {x → 0.318132 + 1.33724 I, y → 0.318132 + 1.33724 I}
When Solve cannot find solutions in terms of radicals to polynomial equations, it returns a
symbolic form of the result in terms of Root objects.
In[1]:= Solve[x ^ 5 + 7 x + 1 == 0, x]
You can use NSolve to get numerical solutions to polynomial equations directly, without first
trying to find exact results.
In[4]:= NSolve[x ^ 7 + x + 1 == 0, x]
Out[4]= {{x → -0.796544}, {x → -0.705298 - 0.637624 I},
        {x → -0.705298 + 0.637624 I}, {x → 0.123762 - 1.05665 I},
        {x → 0.123762 + 1.05665 I}, {x → 0.979808 - 0.516677 I}, {x → 0.979808 + 0.516677 I}}
NSolve will give you the complete set of numerical solutions to any polynomial equation or
system of polynomial equations.
FindRoot[lhs==rhs,{x,x0}]   search for a numerical solution to the equation lhs == rhs, starting with x = x0
Out[1]= (plot of cos x together with the line y = x, which cross once)
This finds a numerical approximation to the value of x at which the intersection occurs. The 0
tells FindRoot what value of x to try first.
In[2]:= FindRoot[Cos[x] == x, {x, 0}]
Out[2]= {x → 0.739085}
In trying to find a solution to your equation, FindRoot starts at the point you specify, and then
progressively tries to get closer and closer to a solution. Even if your equations have several
solutions, FindRoot always returns the first solution it finds. Which solution this is will depend
on what starting point you chose. If you start sufficiently close to a particular solution,
FindRoot will usually return that solution.
The function sin(x) has an infinite number of roots of the form x = n π. If you start sufficiently
close to a particular root, FindRoot will give you that root.
In[3]:= FindRoot[Sin[x], {x, 3}]
Out[3]= {x → 3.14159}
If you want FindRoot to search for complex solutions, then you have to give a complex
starting value.
In[5]:= FindRoot[Sin[x] == 2, {x, I}]
Out[5]= {x → 1.5708 + 1.31696 I}
This finds a solution to a set of simultaneous equations.
The variables used by FindRoot can have values that are lists. This allows you to find roots of
functions that take vectors as arguments.
This generates a numerical solution to the equation y'(x) = y(x) with 0 < x < 2. The result is given
in terms of an InterpolatingFunction.
In[1]:= NDSolve[{y '[x] == y[x], y[0] == 1}, y, {x, 0, 2}]
Out[1]= {{y → InterpolatingFunction[{{0., 2.}}, <>]}}
With an algebraic equation such as x^2 + 3 x + 1 = 0, each solution for x is simply a single number.
For a differential equation, however, the solution is a function, rather than a single number. For
example, in the equation y'(x) = y(x), you want to get an approximation to the function y(x) as the
independent variable x varies over some range.
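What such a function-valued result packages up can be sketched in a few lines of Python: a fixed-step fourth-order Runge-Kutta solver produces a table of sample values, and an interpolation rule turns the table back into a function of x. This is a toy under stated assumptions (fixed steps, linear interpolation); NDSolve uses adaptive steps and higher-quality interpolation:

```python
import math

def rk4_solve(f, x0, x1, y0, n=200):
    """Fixed-step fourth-order Runge-Kutta; returns sample points (xs, ys)."""
    h = (x1 - x0) / n
    xs, ys = [x0], [y0]
    for _ in range(n):
        x, y = xs[-1], ys[-1]
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        xs.append(x + h)
        ys.append(y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return xs, ys

def interpolate(xs, ys, x):
    """Piecewise-linear stand-in for an InterpolatingFunction."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - t) * ys[i] + t * ys[i + 1]
    raise ValueError("x outside the solution range")

# y'(x) = y(x), y(0) = 1 on 0 <= x <= 2; the exact solution is e^x.
xs, ys = rk4_solve(lambda x, y: y, 0.0, 2.0, 1.0)
```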
y[x]/.solution   use the list of rules for the function y to get values for y[x]
InterpolatingFunction[data][x]   evaluate an interpolated function at the point x
Plot[Evaluate[y[x]/.solution],{x,xmin,xmax}]   plot the solution to a differential equation

Here is a plot of the solution for z[x] found on line 3. Plot is discussed in "Basic Plotting".
In[5]:= Plot[Evaluate[z[x] /. %3], {x, 0, Pi}]
Out[5]= (plot of z[x] for 0 ≤ x ≤ π)
NDSolve finds solutions iteratively. It starts at a particular value of x, then takes a sequence of
steps, trying eventually to cover the whole range xmin to xmax .
In order to get started, NDSolve has to be given appropriate initial or boundary conditions for
the yi and their derivatives. These conditions specify values for yi[x], and perhaps derivatives
yi'[x], at particular points x. In general, at least for ordinary differential equations, the conditions
you give can be at any x: NDSolve will automatically cover the range xmin to xmax.
This finds a solution for y with x in the range 0 to 2, using an initial condition for y[0].
In[1]:= NDSolve[{y '[x] == y[x], y[0] == 1}, y, {x, 0, 2}]
Out[1]= {{y → InterpolatingFunction[{{0., 2.}}, <>]}}
This still finds a solution with x in the range 0 to 2, but now the initial condition is for y[3].
In[2]:= NDSolve[{y '[x] == y[x], y[3] == 1}, y, {x, 0, 2}]
Out[2]= {{y → InterpolatingFunction[{{0., 2.}}, <>]}}
When you use NDSolve, the initial or boundary conditions you give must be sufficient to deter-
mine the solutions for the yi completely. When you use DSolve to find symbolic solutions to
differential equations, you can get away with specifying fewer initial conditions. The reason is
that DSolve automatically inserts arbitrary constants C[i] to represent degrees of freedom
associated with initial conditions that you have not specified explicitly. Since NDSolve must give
a numerical solution, it cannot represent these kinds of additional degrees of freedom. As a
result, you must explicitly give all the initial or boundary conditions that are needed to deter-
mine the solution.
In a typical case, if you have differential equations with up to nth derivatives, then you need to
give initial conditions for up to (n - 1)th derivatives, or give boundary conditions at n points.
With a third-order equation, you need to give initial conditions for up to second derivatives.
In[4]:= NDSolve[{y '''[x] + 8 y ''[x] + 17 y '[x] + 10 y[x] == 0,
        y[0] == 6, y '[0] == - 20, y ''[0] == 84}, y, {x, 0, 1}]
Out[4]= {{y → InterpolatingFunction[{{0., 1.}}, <>]}}
With a third-order equation, you can also give boundary conditions at three points.
In[6]:= NDSolve[{y '''[x] + Sin[x] == 0, y[0] == 4, y[1] == 7, y[2] == 0}, y, {x, 0, 2}]
Out[6]= {{y → InterpolatingFunction[{{0., 2.}}, <>]}}
Mathematica allows you to use any appropriate linear combination of function values and
derivatives as boundary conditions.
In[7]:= NDSolve[{y ''[x] + y[x] == 12 x, 2 y[0] - y '[0] == - 1, 2 y[1] + y '[1] == 9}, y, {x, 0, 1}]
Out[7]= {{y → InterpolatingFunction[{{0., 1.}}, <>]}}
In most cases, all the initial conditions you give must involve the same value of x, say x0. As a
result, you can avoid giving both xmin and xmax explicitly. If you specify your range of x as {x, x1},
then Mathematica will automatically generate a solution over the range x0 to x1.
You can give initial conditions as equations of any kind. In some cases, these equations may
have multiple solutions. In such cases, NDSolve will correspondingly generate multiple solu-
tions.
This constructs a set of five coupled differential equations and initial conditions.
In[14]:= eqns = Join[Table[y[i] '[x] == y[i - 1][x] - y[i][x], {i, 2, 4}],
         {y[1] '[x] == - y[1][x], y[5] '[x] == y[4][x], y[1][0] == 1},
         Table[y[i][0] == 0, {i, 2, 5}]]
Out[14]= {y[2]'[x] == y[1][x] - y[2][x], y[3]'[x] == y[2][x] - y[3][x],
         y[4]'[x] == y[3][x] - y[4][x], y[1]'[x] == -y[1][x], y[5]'[x] == y[4][x],
         y[1][0] == 1, y[2][0] == 0, y[3][0] == 0, y[4][0] == 0, y[5][0] == 0}
NDSolve can handle functions whose values are lists or arrays. If you give initial conditions like
y[0] == {v1, v2, …, vn}, then NDSolve will assume that y is a function whose values are lists of
length n.
NDSolve has many methods for solving equations, but essentially all of them at some level
work by taking a sequence of steps in the independent variable x, and using an adaptive proce-
dure to determine the size of these steps. In general, if the solution appears to be varying
rapidly in a particular region, then NDSolve will reduce the step size or change the method so
as to be able to track the solution better.
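The step-size control can be sketched with a classic step-doubling scheme: take one full Runge-Kutta step and two half steps, compare the results, and shrink the step wherever they disagree. A hedged Python illustration (not NDSolve's actual controller), applied to y' = y on [0, 2]:

```python
import math

def rk4_step(f, x, y, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h * k1 / 2)
    k3 = f(x + h / 2, y + h * k2 / 2)
    k4 = f(x + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def solve_adaptive(f, x0, x1, y0, tol=1e-8):
    x, y = x0, y0
    h = (x1 - x0) / 10
    while x1 - x > 1e-12:
        h = min(h, x1 - x)
        one_step = rk4_step(f, x, y, h)
        half = rk4_step(f, x, y, h / 2)
        two_steps = rk4_step(f, x + h / 2, half, h / 2)
        err = abs(two_steps - one_step)     # local error estimate
        if err < tol:
            x, y = x + h, two_steps         # accept the step
            if err < tol / 32:
                h *= 2                      # solution varying slowly: grow
        else:
            h /= 2                          # varying rapidly: shrink, retry
    return y

result = solve_adaptive(lambda x, y: y, 0.0, 2.0, 1.0)
```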
NDSolve reduced the step size around x = 0 so as to reproduce the kink accurately.
In[20]:= Plot[Evaluate[y[x] /. %], {x, - 5, 5}]
Out[20]= (plot of the solution, with the kink at x = 0 accurately resolved)
Through its adaptive procedure, NDSolve is able to solve "stiff" differential equations in which
there are several components which vary with x at very different rates.
NDSolve follows the general procedure of reducing step size until it tracks solutions accurately.
There is a problem, however, when the true solution has a singularity. In this case, NDSolve
might go on reducing the step size forever, and never terminate. To avoid this problem, the
option MaxSteps specifies the maximum number of steps that NDSolve will ever take in attempt-
ing to find a solution. For ordinary differential equations the default setting is
MaxSteps -> 10 000.
NDSolve::mxst : Maximum number of 10000 steps reached at the point x == -1.00413×10^-172.
Out[23]= {{y[x] → InterpolatingFunction[{{-1., -1.00413×10^-172}}, <>][x]}}
The default setting for MaxSteps should be sufficient for most equations with smooth solutions.
When solutions have a complicated structure, however, you may occasionally have to choose
larger settings for MaxSteps. With the setting MaxSteps -> Infinity there is no upper limit on
the number of steps used.
To take the solution to the Lorenz equations this far, you need to remove the default bound on
MaxSteps .
In[25]:= NDSolve[{x '[t] == - 3 (x[t] - y[t]), y '[t] == - x[t] z[t] + 26.5 x[t] - y[t],
         z '[t] == x[t] y[t] - z[t], x[0] == z[0] == 0, y[0] == 1},
         {x, y, z}, {t, 0, 200}, MaxSteps -> Infinity]
Out[25]= {{x → InterpolatingFunction[{{0., 200.}}, <>],
         y → InterpolatingFunction[{{0., 200.}}, <>], z → InterpolatingFunction[{{0., 200.}}, <>]}}
When NDSolve solves a particular set of differential equations, it always tries to choose a step
size appropriate for those equations. In some cases, the very first step that NDSolve makes
may be too large, and it may miss an important feature in the solution. To avoid this problem,
you can explicitly set the option StartingStepSize to specify the size to use for the first step.
The equations you give to NDSolve do not necessarily all have to involve derivatives; they can
also just be algebraic. You can use NDSolve to solve many such differential-algebraic equations.
This finds a numerical solution to the wave equation. The result is a two-dimensional interpolat-
ing function.
In[29]:= NDSolve[{D[u[t, x], t, t] == D[u[t, x], x, x], u[0, x] == Exp[- x ^ 2],
         Derivative[1, 0][u][0, x] == 0, u[t, - 6] == u[t, 6]}, u, {t, 0, 6}, {x, - 6, 6}]
Out[29]= {{u → InterpolatingFunction[{{0., 6.}, {..., -6., 6., ...}}, <>]}}
Numerical Optimization
FindMinimum[{f,cons},{x,y,…}]   search for a local minimum subject to the constraints cons
FindMaximum[f,x], etc.   search for a local maximum
The last element of the list gives the value at which the minimum is achieved.
In[2]:= Gamma[x] /. Last[%]
Out[2]= 0.885603
Like FindRoot, FindMinimum and FindMaximum work by starting from a point, then progres-
sively searching for a minimum or maximum. But since they return a result as soon as they find
anything, they may give only a local minimum or maximum of your function, not a global one.
This gives the local minimum on the left, which in this case is also the global minimum.
In[5]:= FindMinimum[x ^ 4 - 3 x ^ 2 + x, {x, - 1}]
Out[5]= {-3.51391, {x → -1.30084}}
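The dependence on the starting point is easy to see in a sketch: applying Newton's method to f'(x) = 0 for f(x) = x^4 - 3 x^2 + x converges to whichever critical point is near the start. (A plain-Python illustration only; FindMinimum's methods are more sophisticated.)

```python
def local_min(x, steps=50):
    """Newton's method on f'(x) = 4 x^3 - 6 x + 1, the derivative
    of f(x) = x^4 - 3 x^2 + x."""
    for _ in range(steps):
        fp = 4 * x ** 3 - 6 * x + 1    # f'(x)
        fpp = 12 * x ** 2 - 6          # f''(x)
        x -= fp / fpp
    return x

left = local_min(-1.0)    # converges to the minimum near x = -1.30084
right = local_min(1.0)    # converges to the minimum near x = 1.1309
```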
NMinimize and NMaximize are numerical analogs of Minimize and Maximize. But unlike
Minimize and Maximize they usually cannot guarantee to find absolute global minima and
maxima. Nevertheless, they typically work well when the function f is fairly smooth, and has a
limited number of local minima and maxima.
With the constraint x > 0, NMinimize will give the local minimum on the right.
In[9]:= NMinimize[{x ^ 4 - 3 x ^ 2 + x, x > 0}, x]
Out[9]= {-1.07023, {x → 1.1309}}
If both the objective function f and the constraints cons are linear in all variables, then minimiza-
tion and maximization correspond to a linear programming problem. Sometimes it is convenient
to state such problems not in terms of explicit equations, but instead in terms of matrices and
vectors.
LinearProgramming[c,m,b]   find the vector x which minimizes c.x subject to the constraints m.x ≥ b and x ≥ 0
LinearProgramming[c,m,b,l]   use the constraints m.x ≥ b and x ≥ l

You can specify a mixture of equality and inequality constraints by making the list b be a
sequence of pairs {bi, si}. If si is 1, then the ith constraint is mi.x ≥ bi. If si is 0 then it is mi.x == bi,
and if si is -1 then it is mi.x ≤ bi.
In LinearProgramming[c, m, b, l], you can make l be a list of pairs {{l1, u1}, {l2, u2}, …} represent-
ing lower and upper bounds on the xi.
In doing large linear programming problems, it is often convenient to give the matrix m as a
SparseArray object.
When you give a setting for WorkingPrecision, this typically defines an upper limit on the
precision of the results from a computation. But within this constraint you can tell Mathematica
how much precision and accuracy you want it to try to get. You should realize that for many
kinds of numerical operations, increasing precision and accuracy goals by only a few digits can
greatly increase the computation time required. Nevertheless, there are many cases where it is
important to ensure that high precision and accuracy are obtained.
NIntegrate::eincr :
The global error of the strategy GlobalAdaptive has increased more than 400 times. The global error is
expected to decrease monotonically after a number of integrand evaluations. Suspect one
of the following: the difference between the values of PrecisionGoal and WorkingPrecision
is too small; the integrand is highly oscillatory or it is not a (piecewise) smooth function;
or the true value of the integral is 0. Increasing the value of the GlobalAdaptive option
MaxErrorIncreases might lead to a convergent numerical integration. NIntegrate obtained
0.43060610312069060491237735524846578643219268469700477957788899453862440935
086147`79.99999999999999 and
5.03891680239785224285840796406833800958006097055414813173023183082827274593
35312`79.99999999999999*^-40 for the integral and error estimates.
Out[4]= 0.430606103120690604912377355248
Given a particular setting for WorkingPrecision, each of the functions for numerical operations
in Mathematica uses certain default settings for PrecisionGoal and AccuracyGoal. Typical is
the case of NDSolve, in which these default settings are equal to half the settings given for
WorkingPrecision.
The precision and accuracy goals normally apply both to the final results returned, and to
various norms or error estimates for them. Functions for numerical operations in Mathematica
typically try to refine their results until either the specified precision goal or accuracy goal is
reached. If the setting for either of these goals is Infinity, then only the other goal is consid-
ered.
In doing ordinary numerical evaluation with N@expr, nD, Mathematica automatically adjusts its
internal computations to achieve n-digit precision in the result. But in doing numerical opera-
tions on functions, it is in practice usually necessary to specify WorkingPrecision and
PrecisionGoal more explicitly.
0.750364
0.739113
0.739085
0.739085
Out[1]= {x → 0.739085}
Note the importance of using option :> expr rather than option -> expr. You need a delayed rule :>
to make expr be evaluated each time it is used, rather than just when the rule is given.
Reap and Sow provide a convenient way to make a list of the steps taken.
In[2]:= Reap[FindRoot[Cos[x] == x, {x, 1}, StepMonitor :> Sow[x]]]
Out[2]= {{x → 0.739085}, {{0.750364, 0.739113, 0.739085, 0.739085}}}
To take a successful step toward an answer, iterative numerical algorithms sometimes have to
do several evaluations of the functions they have been given. Sometimes this is because each
step requires, say, estimating a derivative from differences between function values, and some-
times it is because several attempts are needed to achieve a successful step.
0.5
Out[6]=
50 100 150 200 250
-0.5
-1.0
Method options.
There are often several different methods known for doing particular types of numerical compu-
tations. Typically Mathematica supports most generally successful ones that have been dis-
cussed in the literature, as well as many that have not. For any specific problem, it goes to
considerable effort to pick the best method automatically. But if you have sophisticated knowl-
edge of a problem, or are studying numerical methods for their own sake, you may find it
useful to tell Mathematica explicitly what method it should use. Function reference pages list
some of the methods built into Mathematica; others are discussed in "Numerical and Related
Functions" or in advanced documentation.
This solves a differential equation using method m, and returns the number of steps and evalua-
tions needed.
In[7]:= try[m_] := Block[{s = e = 0}, NDSolve[{y ''[x] + Sin[y[x]] == 0, y '[0] == y[0] == 1}, y,
        {x, 0, 100}, StepMonitor :> s ++, EvaluationMonitor :> e ++, Method -> m]; {s, e}]
With the method selected automatically, this is the number of steps and evaluations that are
needed.
In[8]:= try[Automatic]
Out[8]= {1118, 2329}
This shows what happens with several other possible methods. The Adams method that is
selected automatically is the fastest.
In[9]:= try /@ {"Adams", "BDF", "ExplicitRungeKutta", "ImplicitRungeKutta", "Extrapolation"}
Out[9]= {{1118, 2329}, {2415, 2861}, {287, 4595}, {882, 13 092}, {84, 4146}}
This shows what happens with the explicit Runge-Kutta method when the difference order
parameter is changed.
In[10]:= Table[try[{"ExplicitRungeKutta", "DifferenceOrder" -> n}], {n, 4, 9}]
Out[10]= {{3519, 14 078}, {614, 4300}, {849, 6794}, {472, 4722}, {288, 3746}, {287, 4594}}
This shows successive steps in a simple iterative procedure with input 0.1111.
In[1]:= NestList[FractionalPart[2 #] &, 0.1111, 10]
Out[1]= {0.1111, 0.2222, 0.4444, 0.8888, 0.7776, 0.5552, 0.1104, 0.2208, 0.4416, 0.8832, 0.7664}
Here is the result with input 0.1112. Progressive divergence from the result with input 0.1111 is
seen.
In[2]:= NestList[FractionalPart[2 #] &, 0.1112, 10]
Out[2]= {0.1112, 0.2224, 0.4448, 0.8896, 0.7792, 0.5584, 0.1168, 0.2336, 0.4672, 0.9344, 0.8688}
The action of FractionalPart[2 x] is particularly simple in terms of the binary digits of the
number x: it just drops the first one, and shifts the remaining ones to the left. After several
steps, this means that the results one gets are inevitably sensitive to digits that are far to the
right, and have an extremely small effect on the original value of x.
This shows the shifting process achieved by FractionalPart[2 x] in the first 8 binary digits
of x.
In[3]:= RealDigits[Take[%, 5], 2, 8, - 1]
Out[3]= {{{0, 0, 0, 1, 1, 1, 0, 0}, 0}, {{0, 0, 1, 1, 1, 0, 0, 0}, 0},
        {{0, 1, 1, 1, 0, 0, 0, 1}, 0}, {{1, 1, 1, 0, 0, 0, 1, 1}, 0}, {{1, 1, 0, 0, 0, 1, 1, 1}, 0}}
If you give input only to a particular precision, you are effectively specifying only a certain
number of digits. And once all these digits have been "excavated" you can no longer get accu-
rate results, since to do so would require knowing more digits of your original input. So long as
you use arbitrary-precision numbers, Mathematica automatically keeps track of this kind of
degradation in precision, indicating a number with no remaining significant digits by 0.×10^e, as
discussed in "Arbitrary-Precision Numbers".
Successive steps yield numbers of progressively lower precision, and eventually no precision at
all.
In[4]:= NestList[FractionalPart[40 #] &, N[1/9, 20], 20]
Out[4]= {0.11111111111111111111, 0.4444444444444444444, 0.77777777777777778, 0.1111111111111111,
        0.44444444444444, 0.777777777778, 0.11111111111, 0.444444444, 0.77777778, 0.111111,
        0.4444, 0.778, 0.1, 0.×10^-1, 0.×10^1, 0.×10^3, 0.×10^4, 0.×10^6, 0.×10^7, 0.×10^9, 0.×10^11}
This asks for the precision of each number. Zero precision indicates that there are no correct
significant digits.
In[5]:= Map[Precision, %]
Out[5]= {20., 19., 17.641, 15.1938, 14.1938, 12.8348, 10.3876, 9.38764,
        8.02862, 5.58146, 4.58146, 3.22244, 0.77528, 0., 0., 0., 0., 0., 0., 0., 0.}
It is important to realize that if you use approximate numbers of any kind, then in an example
like the one above you will always eventually run out of precision. But so long as you use arbi-
trary-precision numbers, Mathematica will explicitly show you any decrease in precision that is
occurring. However, if you use machine-precision numbers, then Mathematica will not keep
track of precision, and you cannot tell when your results become meaningless.
If you use machine-precision numbers, Mathematica will no longer keep track of any degrada-
tion in precision.
In[7]:= NestList[FractionalPart[40 #] &, N[1/9], 20]
Out[7]= {0.111111, 0.444444, 0.777778, 0.111111, 0.444444, 0.777778, 0.111111, 0.444445, 0.77781,
        0.112405, 0.496185, 0.847383, 0.89534, 0.813599, 0.543945, 0.757813, 0.3125, 0.5, 0., 0., 0.}
By iterating the operation FractionalPart[2 x] you extract successive binary digits in what-
ever number you start with. And if these digits are apparently random, as in a number like π,
then the results will be correspondingly random. But if the digits have a simple pattern, as in
any rational number, then the results you get will be correspondingly simple.
This generates a seemingly random sequence, even starting from simple input.
In[8]:= NestList[FractionalPart[3 #/2] &, 1, 15]
Out[8]= {1, 1/2, 3/4, 1/8, 3/16, 9/32, 27/64, 81/128, 243/256, 217/512, 651/1024,
1953/2048, 1763/4096, 5289/8192, 15867/16384, 14833/32768}
After the values have been computed, one can safely find numerical approximations to them.
In[9]:= N[%]
Out[9]= {1., 0.5, 0.75, 0.125, 0.1875, 0.28125, 0.421875, 0.632813, 0.949219,
0.423828, 0.635742, 0.953613, 0.43042, 0.64563, 0.968445, 0.452667}
Here are the last 5 results after 1000 iterations, computed using exact numbers.
In[10]:= Take[N[NestList[FractionalPart[3 #/2] &, 1, 1000]], -5]
Out[10]= {0.0218439, 0.0327659, 0.0491488, 0.0737233, 0.110585}
Many kinds of iterative procedures yield functions that depend sensitively on their input. Such
functions also arise when one looks at solutions to differential equations. In effect, varying the
independent parameter in the differential equation is a continuous analog of going from one
step to the next in an iterative procedure.
Basic Statistics
The variance Variance[list] is defined to be var(x) = σ²(x) = Σ_i (x_i − μ(x))²/(n − 1), for real data. (For complex data var(x) = σ²(x) = Σ_i (x_i − μ(x)) (x_i − μ(x))*/(n − 1), where * denotes complex conjugation.)
If the elements in list are thought of as being selected at random according to some probability
distribution, then the mean gives an estimate of where the center of the distribution is located,
while the standard deviation gives an estimate of how wide the dispersion in the distribution is.
The median Median@listD effectively gives the value at the halfway point in the sorted version of
list. It is often considered a more robust measure of the center of a distribution than the mean,
since it depends less on outlying values.
The qth quantile Quantile[list, q] effectively gives the value that is a fraction q of the way through the sorted version of list.
There are, however, about ten other definitions of quantile in use, all potentially giving slightly different results. Mathematica covers the common cases by introducing four quantile parameters in the form Quantile[list, q, {{a, b}, {c, d}}]. The parameters a and b in effect define where in the list should be considered a fraction q of the way through. If this corresponds to an integer position, then the element at that position is taken to be the qth quantile. If it is not an integer position, then a linear combination of the elements on either side is used, as specified by c and d.
The position in a sorted list s for the qth quantile is taken to be k = a + (n + b) q. If k is an integer, then the quantile is s_k. Otherwise, it is s_⌊k⌋ + (s_⌈k⌉ − s_⌊k⌋)(c + d (k − ⌊k⌋)), with the indices taken to be 1 or n if they are out of range.
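As a cross-check of this parameterization, here is a direct Python transcription (a hypothetical helper, not Mathematica's implementation); with the parameters {{0, 0}, {1, 0}} it reproduces the default Quantile, and with {{1/2, 0}, {0, 1}} it reproduces Median:

```python
import math

def quantile(data, q, a=0, b=0, c=1, d=0):
    # position k = a + (n + b) q in the sorted list (1-based indexing);
    # the defaults correspond to Quantile's {{0, 0}, {1, 0}}
    s = sorted(data)
    n = len(s)
    k = a + (n + b) * q
    clamp = lambda i: min(max(i, 1), n)   # indices taken to be 1 or n if out of range
    lo, hi = clamp(math.floor(k)), clamp(math.ceil(k))
    if lo == hi:
        return s[lo - 1]
    # linear combination of the elements on either side, set by c and d
    return s[lo - 1] + (s[hi - 1] - s[lo - 1]) * (c + d * (k - math.floor(k)))

quantile([1, 2, 3, 4], 0.5)                        # default parameters
quantile([1, 2, 3, 4], 0.5, a=0.5, b=0, c=0, d=1)  # Median's interpolation
```

With the defaults, a non-integer k gives weight c + d(k − ⌊k⌋) = 1, i.e. the element at position ⌈k⌉, so the result is always an actual list element; the Median parameters instead interpolate linearly.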
Whenever d = 0, the value of the qth quantile is always equal to some actual element in list, so that the result changes discontinuously as q varies. For d = 1, the qth quantile interpolates linearly between successive elements in list. Median is defined to use such an interpolation.
Note that Quantile[list, q] yields quartiles when q = m/4 and percentiles when q = m/100.
Sometimes each item in your data may involve a list of values. The basic statistics functions in
Mathematica automatically apply to all corresponding elements in these lists.
Note that you can extract the elements in the ith "column" of a multidimensional list using list[[All, i]].
Descriptive Statistics
Descriptive statistics refers to properties of distributions, such as location, dispersion, and
shape. The functions described here compute descriptive statistics of lists of data. You can
calculate some of the standard descriptive statistics for various known distributions by using the
functions described in "Continuous Distributions" and "Discrete Distributions".
The statistics are calculated assuming that each value of data x_i has probability equal to 1/n.

Mean[data]   average value (1/n) Σ_i x_i
RootMeanSquare[data]   root mean square √((1/n) Σ_i x_i²)

Location statistics.
Location statistics describe where the data are located. The most common functions include measures of central tendency like the mean, median, and mode. Quantile[data, q] gives the location before which (100 q) percent of the data lie. In other words, Quantile gives a value z such that the probability that (x_i < z) is less than or equal to q and the probability that (x_i ≤ z) is greater than or equal to q.
Here is a dataset.
In[1]:= data = {6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3}
Out[1]= {6.5, 3.8, 6.6, 5.7, 6., 6.4, 5.3}
This is the mean when the smallest entry in the list is excluded. TrimmedMean allows you to
describe the data with removed outliers.
In[3]:= TrimmedMean[data, {1/7, 0}]
Out[3]= 6.08333
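The trimming itself is simple to state: sort, drop the requested fractions of points from each end, and average what remains. A Python sketch (a hypothetical helper mirroring the example above, not Mathematica's implementation):

```python
def trimmed_mean(data, f_low, f_high):
    # drop a fraction f_low of points from the low end and
    # f_high from the high end, then average the rest
    s = sorted(data)
    n = len(s)
    kept = s[round(n * f_low): n - round(n * f_high)]
    return sum(kept) / len(kept)

trimmed_mean([6.5, 3.8, 6.6, 5.7, 6.0, 6.4, 5.3], 1 / 7, 0)
```

With fractions {1/7, 0} on a seven-element list, exactly one point, the smallest, is excluded, matching the TrimmedMean result above.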
Variance[data]   unbiased estimate of variance, (1/(n − 1)) Σ_i (x_i − x̄)²

Dispersion statistics.
Dispersion statistics summarize the scatter or spread of the data. Most of these functions
describe deviation from a particular location. For instance, variance is a measure of deviation
from the mean, and standard deviation is just the square root of the variance.
This gives an unbiased estimate for the variance of the data with n - 1 as the divisor.
In[4]:= Variance[data]
Out[4]= 0.962857
Covariance is the multivariate extension of variance. For two vectors of equal length, the covariance is a number. For a single matrix m, the (i, j)th element of the covariance matrix is the covariance between the ith and jth columns of m. For two matrices m1 and m2, the (i, j)th element of the covariance matrix is the covariance between the ith column of m1 and the jth column of m2.
Scaling the covariance matrix terms by the appropriate standard deviations gives the correlation matrix.
In[10]:= With[{sd = StandardDeviation[m]}, Transpose[Transpose[%/sd]/sd]]
Out[10]= {{1., -0.132314}, {-0.132314, 1.}}
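The scaling step is just cov(x, y)/(σ_x σ_y) applied entrywise. A small self-contained Python sketch (illustrative; not Mathematica's implementation):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    # unbiased sample covariance, with n - 1 as the divisor
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # covariance scaled by both standard deviations
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))
```

A column's correlation with itself is 1, which is why the diagonal of the correlation matrix above consists of 1.'s.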
CentralMoment[data, r]   rth central moment, (1/n) Σ_i (x_i − x̄)^r

Shape statistics.
You can get some information about the shape of a distribution using shape statistics. Skewness describes the amount of asymmetry. Kurtosis measures the concentration of data around the peak and in the tails versus the concentration in the flanks.
Skewness is calculated by dividing the third central moment by the cube of the population standard deviation. Kurtosis is calculated by dividing the fourth central moment by the square of the population variance of the data, equivalent to CentralMoment[data, 2]. (The population variance is the second central moment, and the population standard deviation is its square root.)
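Spelled out for a data list, with population central moments m_r = (1/n) Σ_i (x_i − x̄)^r, skewness is m₃/m₂^(3/2) and kurtosis is m₄/m₂². A Python sketch (hypothetical helpers, not Mathematica's implementation):

```python
def central_moment(data, r):
    # population rth central moment, with n as the divisor
    m = sum(data) / len(data)
    return sum((x - m) ** r for x in data) / len(data)

def skewness(data):
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    return central_moment(data, 4) / central_moment(data, 2) ** 2
```

On the dataset used in this section, skewness comes out negative, in agreement with the Skewness result shown below; for any symmetric data list the third central moment, and hence the skewness, is zero.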
A negative value for skewness indicates that the distribution underlying the data has a long left-sided tail.
In[12]:= Skewness[data]
Out[12]= -1.20108
ExpectedValue[f, list]   expected value of the pure function f with respect to the values in list
ExpectedValue[f[x], list, x]   expected value of the function f of x with respect to the values of list

Expected values.
The expected value of a function f is (1/n) Σ_{i=1}^n f(x_i) for the list of values x_1, x_2, …, x_n. Many descriptive statistics are expected values. For instance, the mean is the expected value of x, and the rth central moment is the expected value of (x − x̄)^r, where x̄ is the mean of the x_i.
Discrete Distributions
The functions described here are among the most commonly used discrete statistical distributions. You can compute their densities, means, variances, and other related properties. The distributions themselves are represented in the symbolic form name[param1, param2, …]. Functions such as Mean, which give properties of statistical distributions, take the symbolic representation of the distribution as an argument. "Continuous Distributions" describes many continuous statistical distributions.
The Poisson distribution PoissonDistribution[μ] describes the number of events that occur in a given time period, where μ is the average number of events per period.
The terms in the series expansion of log(1 − θ) about θ = 0 are proportional to the probabilities of a discrete random variable following the logarithmic series distribution LogSeriesDistribution[θ]. The distribution of the number of items of a product purchased by a buyer in a specified interval is sometimes modeled by this distribution.
The Zipf distribution ZipfDistribution[ρ], sometimes referred to as the zeta distribution, was first used in linguistics, and its use has been extended to model rare events.
Distributions are represented in symbolic form. PDF[dist, x] evaluates the mass function at x if x is a numerical value, and otherwise leaves the function in symbolic form whenever possible. Similarly, CDF[dist, x] gives the cumulative distribution and Mean[dist] gives the mean of the specified distribution. For a more complete description of the various functions of a statistical distribution, see the description of their continuous analogues in "Continuous Distributions".
Here is a symbolic representation of the binomial distribution for 34 trials, each having probability 0.3 of success.
In[1]:= bdist = BinomialDistribution[34, 0.3]
Out[1]= BinomialDistribution[34, 0.3]
You can get the expression for the mean by using symbolic variables as arguments.
In[3]:= Mean[BinomialDistribution[n, p]]
Out[3]= n p
This gives the expected value of x^3 with respect to the binomial distribution.
In[5]:= ExpectedValue[x^3, bdist, x]
Out[5]= 1282.55
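For a discrete distribution the expected value is just the probability-weighted sum over the support, so the result above can be checked directly. A Python sketch (illustrative; binomial_pdf and expected_value are hypothetical helpers, not Mathematica functions):

```python
from math import comb

def binomial_pdf(n, p, k):
    # probability of k successes in n independent trials
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def expected_value(g, n, p):
    # E[g(X)] for X ~ BinomialDistribution[n, p], summed over the whole support
    return sum(g(k) * binomial_pdf(n, p, k) for k in range(n + 1))

expected_value(lambda k: k ** 3, 34, 0.3)   # matches ExpectedValue[x^3, bdist, x]
```

The same sum with g(k) = k recovers the mean n p = 10.2 from the symbolic result above.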
The elements of this matrix are pseudorandom numbers from the binomial distribution.
In[6]:= RandomInteger[bdist, {2, 3}]
Out[6]= {{10, 7, 9}, {12, 10, 11}}
Continuous Distributions
The functions described here are among the most commonly used continuous statistical distributions. You can compute their densities, means, variances, and other related properties. The distributions themselves are represented in the symbolic form name[param1, param2, …]. Functions such as Mean, which give properties of statistical distributions, take the symbolic representation of the distribution as an argument. "Discrete Distributions" describes many discrete statistical distributions.
If X1, …, Xn are independent normal random variables with unit variance and mean zero, then Σ_{i=1}^n Xi² has a χ² distribution with n degrees of freedom. If a normal variable is standardized by subtracting its mean and dividing by its standard deviation, then the sum of squares of such quantities follows this distribution. The χ² distribution is most typically used when describing the variance of normal samples.
A variable that has a Student t distribution can also be written as a function of normal random variables. Let X and Z be independent random variables, where X is a standard normal distribution and Z is a χ² variable with ν degrees of freedom. In this case, X/√(Z/ν) has a t distribution with ν degrees of freedom. The Student t distribution is symmetric about the vertical axis, and characterizes the ratio of a normal variable to its standard deviation. Location and scale parameters can be included as μ and σ in StudentTDistribution[μ, σ, ν]. When ν = 1, the t distribution is the same as the Cauchy distribution.
The F-ratio distribution is the distribution of the ratio of two independent χ² variables divided by their respective degrees of freedom. It is commonly used when comparing the variances of two populations in hypothesis testing.
Distributions that are derived from normal distributions with nonzero means are called noncentral distributions.
The sum of the squares of ν normally distributed random variables with variance σ² = 1 and nonzero means follows a noncentral χ² distribution NoncentralChiSquareDistribution[ν, λ]. The noncentrality parameter λ is the sum of the squares of the means of the random variables in the sum. Note that in various places in the literature, λ/2 or √λ is used as the noncentrality parameter.
The noncentral Student t distribution NoncentralStudentTDistribution[ν, δ] describes the ratio X/√(χ²_ν/ν), where χ²_ν is a central χ² random variable with ν degrees of freedom, and X is an independent normally distributed random variable with variance 1 and mean δ.
The noncentral F-ratio distribution NoncentralFRatioDistribution[n, m, λ] is the distribution of the ratio of χ²_n(λ)/n to χ²_m/m, where χ²_n(λ) is a noncentral χ² random variable with noncentrality parameter λ and n degrees of freedom and χ²_m is a central χ² random variable with m degrees of freedom.
If X is uniformly distributed on [−π, π], then the random variable tan(X) follows a Cauchy distribution CauchyDistribution[a, b], with a = 0 and b = 1.
When X1 and X2 have independent gamma distributions with equal scale parameters, the random variable X1/(X1 + X2) follows the beta distribution BetaDistribution[α, β], where α and β are the shape parameters of the gamma distributions.
For ν = 1, the χ distribution is identical to HalfNormalDistribution[θ] with θ = √(π/2). For ν = 2, it is identical to RayleighDistribution[σ] with σ = 1.
The cumulative distribution function (cdf) at x is given by the integral of the probability density function (pdf) up to x. The pdf can therefore be obtained by differentiating the cdf (perhaps in a generalized sense). The distributions are represented in symbolic form. PDF[dist, x] evaluates the density at x if x is a numerical value, and otherwise leaves the function in symbolic form. Similarly, CDF[dist, x] gives the cumulative distribution.
The inverse cdf InverseCDF[dist, q] gives the value of x at which CDF[dist, x] reaches q. The median is given by InverseCDF[dist, 1/2]. Quartiles, deciles and percentiles are particular values of the inverse cdf. Quartile skewness is equivalent to (q1 − 2 q2 + q3)/(q3 − q1), where q1, q2 and q3 are the first, second, and third quartiles, respectively. Inverse cdfs are used in constructing confidence intervals for statistical parameters. InverseCDF[dist, q] and Quantile[dist, q] are equivalent for continuous distributions.
The mean Mean[dist] is the expectation of the random variable distributed according to dist and is usually denoted by μ. The mean is given by ∫ x f(x) dx, where f(x) is the pdf of the distribution. The variance Variance[dist] is given by ∫ (x − μ)² f(x) dx. The square root of the variance is called the standard deviation, and is usually denoted by σ.
The Skewness[dist] and Kurtosis[dist] functions give shape statistics summarizing the asymmetry and the peakedness of a distribution, respectively. Skewness is given by (1/σ³) ∫ (x − μ)³ f(x) dx and kurtosis is given by (1/σ⁴) ∫ (x − μ)⁴ f(x) dx.
The characteristic function CharacteristicFunction[dist, t] is given by φ(t) = ∫ f(x) exp(i t x) dx. In the discrete case, φ(t) = Σ_x f(x) exp(i t x). Each distribution has a unique characteristic function, which is sometimes used instead of the pdf to define a distribution.
The expected value ExpectedValue[g, dist] of a function g is given by ∫ f(x) g(x) dx. In the discrete case, the expected value of g is given by Σ_x f(x) g(x). ExpectedValue[g[x], dist, x] is equivalent to ExpectedValue[g, dist].
This is the cumulative distribution function. It is given in terms of the built-in function GammaRegularized.
In[3]:= cdfunction = CDF[gdist, x]
Out[3]= GammaRegularized[3, 0, x]
Out[4]= (plot of the cumulative distribution function)
This is a pseudorandom array with elements distributed according to the gamma distribution.
In[5]:= RandomReal[gdist, 5]
Out[5]= {1.46446, 8.56359, 2.70647, 1.97748, 2.97108}
The data argument of FindClusters can be a list of data elements or rules indexing elements
and labels.
{e1 -> v1, e2 -> v2, …}   data specified as a list of rules between data elements e_i and labels v_i
{e1, e2, …} -> {v1, v2, …}   data specified as a rule mapping data elements e_i to labels v_i
The data elements ei can be numeric lists, matrices, tensors, lists of True and False elements,
or lists of strings. All data elements ei must have the same dimensions.
The rule-based data syntax allows for clustering data elements and returning labels for those
elements.
Here two-dimensional points are clustered and labeled with their positions in the data list.
In[3]:= data1 = {{1, 2}, {3, 7}, {0, 3}, {3, 1}};
FindClusters[data1 -> Range[Length[data1]]]
Out[3]= {{1, 3, 4}, {2}}
The rule-based data syntax can also be used to cluster data based on parts of each data entry.
For instance, you might want to cluster data in a data table while ignoring particular columns in
the table.
This clusters the data while ignoring the first two elements in each data entry.
In[5]:= FindClusters[Drop[datarecords, None, {1, 2}] -> datarecords]
Out[5]= {{{Joe, Smith, 158, 64.4}, {Sally, Jones, 168, 62.}},
{{Mary, Davis, 137, 64.4}, {Bob, Lewis, 141, 62.8}}, {{John, Thompson, 235, 71.1},
{Lewis, Black, 225, 71.4}, {Tom, Smith, 243, 70.9}, {Jane, Doe, 225, 71.4}}}
The following commands define a set of 400 two-dimensional data points chosen to group into four somewhat nebulous clusters.
In[6]:= GaussianRandomData[n_Integer, p_, sigma_] := Table[p + {Re[#], Im[#]} &[
RandomReal[NormalDistribution[0, sigma]] Exp[I RandomReal[{0, 2 Pi}]]], {n}];
datapairs = BlockRandom[
SeedRandom[1234];
Join[GaussianRandomData[100, {2, 1}, .3],
GaussianRandomData[100, {1, 1.5}, .2],
GaussianRandomData[100, {1, 1.1}, .4],
GaussianRandomData[100, {1.75, 1.75}, 0.1]]];
Out[8]= (plot of the data points)
With the default settings, FindClusters has found the four clusters of points.
Out[9]= (plot of the clusters found)
Out[10]= (plot)
Randomness is used in clustering in two different ways. Some of the methods use a random assignment of some points to a specific number of clusters as a starting point. Randomness may also be used to help determine what seems to be the best number of clusters to use. Changing the random seed for generating the randomness by using FindClusters[{e1, e2, …}, Method -> {Automatic, "RandomSeed" -> s}] may lead to different results for some cases.
In principle, clustering techniques can be applied to any set of data. All that is needed is a
measure of how far apart each element in the set is from other elements, that is, a function
giving the distance between elements.
f(e_i, e_i) = 0
f(e_i, e_j) ≥ 0
f(e_i, e_j) = f(e_j, e_i)
If the ei are vectors of numbers, FindClusters by default uses a squared Euclidean distance. If
the ei are lists of Boolean True and False (or 0 and 1) elements, FindClusters by default uses
a dissimilarity based on the normalized fraction of elements that disagree. If the ei are strings,
FindClusters by default uses a distance function based on the number of point changes
needed to get from one string to another.
Out[11]= (plot)
Dissimilarities for Boolean vectors are typically calculated by comparing the elements of two Boolean vectors u and v pairwise. It is convenient to summarize each dissimilarity function in terms of n_ij, where n_ij is the number of corresponding pairs of elements in u and v equal to i and j, respectively. That is, n_ij counts the pairs {i, j} in {u1, v1}, {u2, v2}, …, with i and j being either 0 or 1. If the Boolean values are True and False, True is equivalent to 1 and False is equivalent to 0.
YuleDissimilarity[u, v]   the Yule dissimilarity 2 n10 n01 / (n11 n00 + n10 n01)
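The n_ij counts, and a dissimilarity built from them, can be written out directly. A Python sketch (hypothetical helpers following the Yule formula above, not Mathematica's implementation):

```python
def pair_counts(u, v):
    # n[(i, j)] = number of positions where u has value i and v has value j
    n = {(i, j): 0 for i in (0, 1) for j in (0, 1)}
    for a, b in zip(u, v):
        n[(int(a), int(b))] += 1   # True/False coerce to 1/0
    return n

def yule_dissimilarity(u, v):
    n = pair_counts(u, v)
    return (2 * n[(1, 0)] * n[(0, 1)]) / (n[(1, 1)] * n[(0, 0)] + n[(1, 0)] * n[(0, 1)])

yule_dissimilarity([True, True, False, False], [True, False, True, False])
```

Identical vectors give n10 = n01 = 0 and hence dissimilarity 0, as required of a dissimilarity function.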
These are the clusters found using the default dissimilarity for Boolean data.
In[13]:= FindClusters[bdata]
Out[13]= {{{False, False, False, False, False, True, False, False, True, True}},
{{True, False, False, False, False, False, False, False, False, True},
{True, False, False, True, False, False, True, False, True, True},
{True, True, False, False, True, False, False, False, True, True},
{True, True, False, False, True, True, True, True, True, True}}}
The edit distance is determined by counting the number of deletions, insertions, and substitutions required to transform one string into another while preserving the ordering of characters. In contrast, the Damerau-Levenshtein distance counts the number of deletions, insertions, substitutions, and transpositions, while the Hamming distance counts only the number of substitutions.
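The edit (Levenshtein) distance can be computed with the standard dynamic program, where each cell holds the cost of transforming one prefix into the other. A Python sketch (illustrative; not Mathematica's EditDistance implementation):

```python
def edit_distance(a, b):
    # prev[j] = cost of transforming a[:i-1] into b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution (free on match)
        prev = cur
    return prev[len(b)]

edit_distance("kitten", "sitting")   # the classic example: 3 edits
```

Extending the recurrence with a transposition case yields the Damerau-Levenshtein distance; restricting it to substitutions only (equal-length strings) yields the Hamming distance.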
The methods "Agglomerate" and "Optimize" determine how to cluster the data for a particular
number of clusters k. "Agglomerate" uses an agglomerative hierarchical method starting with
each member of the set in a cluster of its own and fusing nearest clusters until there are k
remaining. "Optimize" starts by building a set of k representative objects and clustering
around those, iterating until a (locally) optimal clustering is found. The default "Optimize"
method is based on partitioning around medoids.
Additional Method suboptions are available to allow for more control over the clustering. Available suboptions depend on the Method chosen.
For a given set of data and distance function, the choice of the best number of clusters k may be unclear. With Method -> {methodname, "SignificanceTest" -> "stest"}, "stest" is used to determine statistically significant clusters to help choose an appropriate number. Possible values of "stest" are "Silhouette" and "Gap". The "Silhouette" test uses the silhouette statistic to test how well the data is clustered. The "Gap" test uses the gap statistic to determine how well the data is clustered.
The "Silhouette" test subdivides the data into successively more clusters looking for the first
minimum of the silhouette statistic.
The "Gap" test compares the dispersion of clusters generated from the data to that derived from a sample of null hypothesis sets. The null hypothesis sets are uniformly randomly distributed data in the box defined by the principal components of the input data. The "Gap" method takes two suboptions: "NullSets" and "Tolerance". The suboption "NullSets" sets the number of null hypothesis sets to compare with the input data. The option "Tolerance" sets the sensitivity. Typically larger values of "Tolerance" will favor fewer clusters being chosen. The default settings are "NullSets" -> 5 and "Tolerance" -> 1.
This shows the result of clustering datapairs using the "Silhouette" test.
In[16]:= ListPlot[FindClusters[datapairs, Method -> {Automatic, "SignificanceTest" -> "Silhouette"}]]
Out[16]= (plot of the clusters)
Here are the clusters found using the "Gap" test with the tolerance parameter set to 3. The
larger value leads to fewer clusters being selected.
In[17]:= ListPlot[FindClusters[datapairs, Method -> {Automatic, "SignificanceTest" -> {"Gap", "Tolerance" -> 3}}]]
Out[17]= (plot of the clusters)
Note that the clusters found in these two examples are identical. The only difference is how the
number of clusters is chosen.
With Method -> {"Agglomerate", "Linkage" -> f}, the specified linkage function f is used for agglomerative clustering.
Linkage methods determine this intercluster dissimilarity, or fusion level, given the dissimilarities between member elements.
With Linkage -> f, f is a pure function that defines the linkage algorithm. Distances or dissimilarities between clusters are determined recursively, using information about the distances or dissimilarities between unmerged clusters to determine the distances or dissimilarities for the newly merged cluster. The function f defines a distance from a cluster k to the new cluster formed by fusing clusters i and j. The arguments supplied to f are d_ik, d_jk, d_ij, n_i, n_j, and n_k, where d is the distance between clusters and n is the number of elements in a cluster.
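As an illustration of the agglomerative scheme, here is a naive Python sketch (not Mathematica's implementation) that starts with singleton clusters and repeatedly fuses the nearest pair under complete linkage, where the distance between two clusters is the largest member-to-member distance, until k clusters remain:

```python
from itertools import combinations

def agglomerate(points, k, dist):
    # start with each point in a cluster of its own
    clusters = [[p] for p in points]

    def linkage(a, b):
        # complete linkage: the largest pairwise distance between members
        return max(dist(x, y) for x in a for y in b)

    while len(clusters) > k:
        # fuse the pair of clusters with the smallest linkage
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

agglomerate([0.0, 0.1, 5.0, 5.2], 2, lambda a, b: abs(a - b))
```

A production implementation would instead update the intercluster distances recursively through the linkage function f, rather than rescanning all member pairs at every step.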
These are the clusters found using complete linkage hierarchical clustering.
In[18]:= ListPlot[FindClusters[datapairs, Method -> {"Agglomerate", "Linkage" -> "Complete"}]]
Out[18]= (plot of the clusters)
Here are the clusters determined from a single iteration of the "Optimize" method.
In[19]:= ListPlot[FindClusters[datapairs, Method -> {"Optimize", "Iterations" -> 1}]]
Out[19]= (plot of the clusters)
Using Nearest
Nearest is used to find elements in a list that are closest to a given data point.
The rule-based data syntax lets you use nearest elements to return their labels.
In[7]:= Nearest[{{1, 1}, {2, 2}, {3, 3}} -> {a, b, c}, {1, 2}]
Out[7]= {a, b}
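The rule form amounts to attaching a label to each element and returning the labels of every element at minimal distance. A Python sketch of the same lookup (a hypothetical helper; ties are all returned, as in the example above):

```python
from math import dist

def nearest(elements, labels, point):
    # return the labels of every element at the minimal distance to point
    d = [dist(e, point) for e in elements]
    best = min(d)
    return [label for di, label in zip(d, labels) if di == best]

nearest([(1, 1), (2, 2), (3, 3)], ["a", "b", "c"], (1, 2))
```

Here both (1, 1) and (2, 2) lie at distance 1 from (1, 2), so both labels come back, matching the two-element result above.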
If Nearest is to be applied repeatedly to the same numerical data, you can get significant
performance gains by first generating a NearestFunction.
This finds points in the set that are closest to the 10 target points.
In[10]:= target = RandomReal[1, {10, 2}];
res = Map[nf, target]; // Timing
Out[10]= {4.85723×10^-16, Null}
For numerical data, by default Nearest uses the EuclideanDistance. For strings,
EditDistance is used.
Fit[{y1, y2, …}, {f1, f2, …}, x]   fit the values y_n to a linear combination of functions f_i
Fit[{{x1, y1}, {x2, y2}, …}, {f1, f2, …}, x]   fit the points (x_n, y_n) to a linear combination of the f_i
This generates a table of the numerical values of the exponential function. Table is discussed
in "Making Tables of Values".
In[1]:= data = Table[Exp[x/5.], {x, 7}]
Out[1]= {1.2214, 1.49182, 1.82212, 2.22554, 2.71828, 3.32012, 4.0552}
This finds a least-squares fit to data of the form c1 + c2 x + c3 x^2. The elements of data are assumed to correspond to values 1, 2, … of x.
In[2]:= Fit[data, {1, x, x^2}, x]
Out[2]= 1.09428 + 0.0986337 x + 0.0459482 x^2
This finds a fit to the new data, of the form c1 + c2 sin(x) + c3 sin(2 x).
In[5]:= Fit[%, {1, Sin[x], Sin[2 x]}, x]
Out[5]= 0.989559 + 2.04199 Sin[x] - 0.418176 Sin[2 x]
One common way of picking out "signals" in numerical data is to find the Fourier transform, or
frequency spectrum, of the data.
Fourier transforms.
Note that the Fourier function in Mathematica is defined with the sign convention typically used in the physical sciences, opposite to the one often used in electrical engineering. "Fourier Transforms" gives more details.
Curve Fitting
There are many situations where one wants to find a formula that best fits a given set of data.
One way to do this in Mathematica is to use Fit.
Fit[{f1, f2, …}, {fun1, fun2, …}, x]   find a linear combination of the fun_i that best fits the values f_i
Out[2]= (plot of the list of primes)
This gives a linear fit to the list of primes. The result is the best linear combination of the
functions 1 and x.
In[3]:= Fit[fp, {1, x}, x]
Out[3]= -7.67368 + 3.77368 x
Out[4]= (plot)
Out[5]= (plot)
Out[7]= (plot)
This shows the fit superimposed on the original data. The quadratic fit is better than the linear
one.
In[8]:= Show[%, gp]
Out[8]= (plot of the fit superimposed on the data)
If you give data in the form {f1, f2, …} then Fit will assume that the successive f_i correspond to values of a function at successive integer points {1, 2, …}. But you can also give Fit data that corresponds to the values of a function at arbitrary points, in one or more dimensions.
Multivariate fitting.
This gives a table of the values of x, y and 1 + 5 x - x y. You need to use Flatten to get it in the
right form for Fit.
In[9]:= Flatten[Table[{x, y, 1 + 5 x - x y}, {x, 0, 1, 0.4}, {y, 0, 1, 0.4}], 1]
Out[9]= {{0., 0., 1.}, {0., 0.4, 1.}, {0., 0.8, 1.}, {0.4, 0., 3.}, {0.4, 0.4, 2.84},
{0.4, 0.8, 2.68}, {0.8, 0., 5.}, {0.8, 0.4, 4.68}, {0.8, 0.8, 4.36}}
Fit takes a list of functions, and uses a definite and efficient procedure to find what linear combination of these functions gives the best least-squares fit to your data. Sometimes, however, you may want to find a nonlinear fit that does not just consist of a linear combination of specified functions. You can do this using FindFit, which takes a function of any form, and then searches for values of parameters that yield the best fit to your data.
FindFit[data, form, {par1, par2, …}, x]   search for values of the par_i that make form best fit data
FindFit[data, form, pars, {x, y, …}]   fit multivariate data
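The "definite and efficient procedure" behind linear least squares can be sketched directly: build the design matrix from the basis functions and solve the normal equations. This Python sketch is a naive solver for illustration only (Fit itself is presumably implemented with more numerically careful methods):

```python
def least_squares_fit(xs, ys, basis):
    # design matrix: A[r][i] = basis_i(x_r)
    A = [[f(x) for f in basis] for x in xs]
    m = len(basis)
    # normal equations (A^T A) c = A^T y
    ATA = [[sum(row[i] * row[j] for row in A) for j in range(m)] for i in range(m)]
    ATy = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(ATA[r][col]))
        ATA[col], ATA[piv] = ATA[piv], ATA[col]
        ATy[col], ATy[piv] = ATy[piv], ATy[col]
        for r in range(col + 1, m):
            f = ATA[r][col] / ATA[col][col]
            for c in range(col, m):
                ATA[r][c] -= f * ATA[col][c]
            ATy[r] -= f * ATy[col]
    # back substitution
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        coeffs[i] = (ATy[i] - sum(ATA[i][j] * coeffs[j] for j in range(i + 1, m))) / ATA[i][i]
    return coeffs

# fit y = c1 + c2 x to exactly linear data
least_squares_fit([0, 1, 2, 3], [2, 5, 8, 11], [lambda x: 1, lambda x: x])
```

Because the problem is linear in the coefficients, a single linear solve suffices; FindFit, by contrast, must search iteratively, since the parameters can enter form nonlinearly.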
By default, both Fit and FindFit produce least-squares fits, which are defined to minimize the quantity χ² = Σ_i r_i², where the r_i are residuals giving the difference between each original data point and its fitted value. One can, however, also consider fits based on other norms. If you set the option NormFunction -> u, then FindFit will attempt to find the fit that minimizes the quantity u[r], where r is the list of residuals. The default is NormFunction -> Norm, corresponding to a least-squares fit.
This uses the ∞-norm, which minimizes the maximum distance between the fit and the data. The result is slightly different from least-squares.
In[14]:= FindFit[fp, a x Log[b + c x], {a, b, c}, x, NormFunction -> (Norm[#, Infinity] &)]
Out[14]= {a -> 1.15077, b -> 1.0023, c -> 1.04686}
FindFit works by searching for values of parameters that yield the best fit. Sometimes you may have to tell it where to start in doing this search. You can do this by giving parameters in the form {{a, a0}, {b, b0}, …}. FindFit also has various options that you can set to control how it does its search.
FittedModel objects can be evaluated at a point or queried for results and diagnostic information. Diagnostics vary somewhat across model types. Available model fitting functions fit linear, generalized linear, and nonlinear models.
Here is a shortened list of available results for the linear fitted model.
In[4]:= lm["Properties"] // Short
Out[4]//Short= {AdjustedRSquared, AIC, <<58>>, StudentizedResiduals, VarianceInflationFactors}
The major difference between model fitting functions such as LinearModelFit and functions
such as Fit and FindFit is the ability to easily obtain diagnostic information from the
FittedModel objects. The results are accessible without refitting the model.
Typical data for these model fitting functions takes the same form as data in other fitting func-
tions such as Fit and FindFit.
{y1, y2, …}   data points with a single predictor variable taking values 1, 2, …
{{x11, x12, …, y1}, {x21, x22, …, y2}, …}   data points with explicit coordinates
Data specifications.
Linear Models
Linear models with assumed independent normally distributed errors are among the most
common models for data. Models of this type can be fitted using the LinearModelFit function.
Options for model specification and for model analysis are available.
The Weights option specifies weight values for weighted linear regression. The NominalVariables option specifies which predictor variables should be treated as nominal or categorical. With NominalVariables -> All, the model is an analysis of variance (ANOVA) model. With NominalVariables -> {x1, …, xi-1, xi+1, …, xn} the model is an analysis of covariance (ANCOVA) model with all but the ith predictor treated as nominal. Nominal variables are represented by a collection of binary variables indicating equality and inequality to the observed nominal categorical values for the variable.
Here are the default and mean squared error variance estimates.
In[10]:= {lm["EstimatedVariance"], lm["EstimatedVariance", VarianceEstimatorFunction -> (Mean[#^2] &)]}
Out[10]= {6.71608, 6.04447}
A major feature of the model fitting framework is the ability to obtain results after the fitting.
The full list of available results can be obtained from the "Properties" value.
The properties include basic information about the data, fitted model, and numerous results and
diagnostics.
The "BestFitParameters" property gives the fitted parameter values 8b0 , b1 , <. "BestFit"
is the fitted function b0 + b1 f1 + b2 f2 + and "Function" gives the fitted function as a pure
function. The "DesignMatrix" is the design or model matrix for the data. "Response" gives the
list of the response or y values from the original data.
Types of residuals.
Residuals give a measure of the point-wise difference between the fitted values and the original responses. "FitResiduals" gives the differences between the observed and fitted values {y1 − ŷ1, y2 − ŷ2, …}. "StandardizedResiduals" and "StudentizedResiduals" are scaled forms of the residuals. The ith standardized residual is (yi − ŷi)/√(ŝ² (1 − hii)/wi), where ŝ² is the estimated error variance, hii is the ith diagonal element of the hat matrix, and wi is the weight for the ith data point. The ith studentized residual uses the same formula with ŝ² replaced by ŝ(i)², the variance estimate omitting the ith data point.
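The standardized-residual formula can be sketched numerically. The following is an illustrative Python sketch, not Mathematica: it fits an unweighted simple linear regression, where the hat diagonal has the closed form hii = 1/n + (xi − x̄)²/Sxx, using made-up data.

```python
import math

# Illustrative sketch (not Mathematica): standardized residuals for an
# unweighted simple linear regression y = b0 + b1*x. With unit weights,
# the ith standardized residual is r_i / sqrt(s2 * (1 - h_ii)).

def standardized_residuals(xs, ys):
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    p = 2                                      # parameters b0 and b1
    s2 = sum(r * r for r in resid) / (n - p)   # estimated error variance
    hat = [1 / n + (x - xbar) ** 2 / sxx for x in xs]  # hat diagonal
    return [r / math.sqrt(s2 * (1 - h)) for r, h in zip(resid, hat)]

# Hypothetical data, invented for the example:
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]
print(standardized_residuals(xs, ys))
```

The studentized variant differs only in replacing s2 with the single-deletion variance for each point.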
"CovarianceMatrix" gives the covariance between fitted parameters. The matrix is ŝ² (Xᵀ W X)⁻¹, where ŝ² is the variance estimate, X is the design matrix, and W is the diagonal matrix of weights. "CorrelationMatrix" is the associated correlation matrix for the parameter estimates. "ParameterErrors" is equivalent to the square root of the diagonal elements of the covariance matrix.
These are the formatted parameter and parameter confidence interval tables.
In[16]:= lm2[{"ParameterTable", "ParameterConfidenceIntervalTable"}]
Out[16]= {
       Estimate   Standard Error   t Statistic   P-Value
   1   1.40308    0.595477         2.35622       0.0506221
   x   0.340391   0.0782093        4.35231       0.00334539
   y   2.08429    0.0496681        41.9643       1.13829×10⁻⁹
 ,
       Estimate   Standard Error   Confidence Interval
   1   1.40308    0.595477         {-0.00500488, 2.81116}
   x   0.340391   0.0782093        {0.155456, 0.525327}
   y   2.08429    0.0496681        {1.96684, 2.20174}
 }
The Estimate column of these tables is equivalent to "BestFitParameters". The t statistics are
the estimates divided by the standard errors. Each p-value is the two-sided p-value for the t
statistic and can be used to assess whether the parameter estimate is statistically significantly
different from 0. Each confidence interval gives the upper and lower bounds for the parameter
confidence interval at the level prescribed by the ConfidenceLevel option. The various
ParameterTable and ParameterConfidenceIntervalTable properties can be used to get the
columns or the unformatted array of values from the table.
"VarianceInflationFactors" measure the multicollinearity between basis functions. The ith inflation factor is equal to 1/(1 − Ri²), where Ri² is the coefficient of determination from fitting the ith basis function to a linear function of the other basis functions. With IncludeConstantBasis -> True, the first inflation factor is for the constant term.
"EigenstructureTable" gives the eigenvalues, condition indices, and variance partitions for
the nonconstant basis functions. The Index column gives the square root of the ratios of the
eigenvalues to the largest eigenvalue. The column for each basis function gives the proportion
of variation in that basis function explained by the associated eigenvector.
"EigenstructureTablePartitions" gives the values in the variance partitioning for all basis
functions in the table.
"SingleDeletionVariances" list of variance estimates with the ith data point omitted
Point-wise measures of influence are often employed to assess whether individual data points have a large impact on the fitting. The hat matrix and catcher matrix play important roles in such diagnostics. The hat matrix is the matrix H such that ŷ = H y, where y is the observed response vector and ŷ is the predicted response vector. "HatDiagonal" gives the diagonal elements of the hat matrix. "CatcherMatrix" is the matrix C such that β̂ = C y, where β̂ is the fitted parameter vector.
"FitDifferences" gives the DFFITS values that provide a measure of influence of each data
point on the fitted or predicted values. The ith DFFITS value is given by hii H1 - hii L rti where hii
is the ith hat diagonal and rti is the ith studentized residual.
"BetaDifferences" gives the DFBETAS values that provide measures of influence of each data
point on the parameters in the model. For a model with p parameters, the ith element of
"BetaDifferences" is a list of length p with the jth value giving the measure the of the influ-
ence of data point i on the jth parameter in the model. The ith "BetaDifferences" vector can
rti H1 - hii L Inj=1 k=1 c2jk M where c jk is the j,kth element of the catcher
p
be written as 9ci1 , , cip =
matrix.
"CookDistances" gives the Cook distance measures of leverage given. The ith Cook distance is
given by Hhii H1 - hii L rsi p where rsi is the ith standardized residual.
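The DFFITS and Cook-distance formulas can be computed directly once the hat diagonal and scaled residuals are known. An illustrative Python sketch (not Mathematica; the input values below are hypothetical):

```python
import math

# Illustrative sketch (not Mathematica): point-wise influence measures
# computed from the hat diagonal h_ii, the standardized residual rs_i,
# the studentized residual rt_i, and the parameter count p.
#   DFFITS_i = sqrt(h_ii / (1 - h_ii)) * rt_i
#   CookD_i  = (h_ii / (1 - h_ii)) * rs_i**2 / p

def dffits(h, rt):
    return math.sqrt(h / (1 - h)) * rt

def cook_distance(h, rs, p):
    return (h / (1 - h)) * rs ** 2 / p

# Hypothetical values for a single data point:
h, rs, rt, p = 0.25, 1.8, 2.0, 3
print(dffits(h, rt))            # influence on the fitted values
print(cook_distance(h, rs, p))  # influence on the parameter vector
```

Both measures grow as hii approaches 1, i.e. as a point increasingly determines its own fitted value.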
The ith element of "CovarianceRatios" is given by (n − p)^p/((1 − hii)(rti² + n − p − 1)^p), and the ith "FVarianceRatios" value is equal to ŝ(i)²/(ŝ² (1 − hii)), where ŝ(i)² is the ith single deletion variance.
The Durbin–Watson d statistic "DurbinWatsonD" is used for testing the existence of a first-order autoregressive process. The d statistic is equivalent to Σ (ri+1 − ri)²/Σ ri², with the sum in the numerator running from i = 1 to n − 1 and in the denominator from i = 1 to n, where ri is the ith residual.
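The d statistic is straightforward to compute from a residual list. An illustrative Python sketch (not Mathematica):

```python
# Illustrative sketch (not Mathematica): the Durbin-Watson d statistic,
# the sum of squared successive residual differences divided by the
# residual sum of squares. Values near 2 suggest no first-order
# autocorrelation; values near 0 or 4 suggest positive or negative
# autocorrelation respectively.

def durbin_watson(resid):
    num = sum((resid[i + 1] - resid[i]) ** 2 for i in range(len(resid) - 1))
    den = sum(r * r for r in resid)
    return num / den

# Alternating residuals are strongly negatively autocorrelated,
# so d comes out well above 2:
print(durbin_watson([1, -1, 1, -1, 1, -1]))
```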
Goodness of fit measures are used to assess how well a model fits or to compare models. The coefficient of determination "RSquared" is the ratio of the model sum of squares to the total sum of squares. "AdjustedRSquared" penalizes for the number of parameters in the model and is given by 1 − (n − 1)/(n − p) (1 − R²).
"AIC" and "BIC" are likelihood-based goodness of fit measures. Both are equal to -2 times the
log-likelihood for the model plus k p where p is the number of parameters to be estimated
including the estimated variance. For "AIC" k is 2, and for "BIC" k is logHnL.
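The −2 log-likelihood + k p recipe can be sketched for the Gaussian linear-model case, where with the maximum-likelihood variance estimate the −2 log-likelihood reduces to n (log(2π·rss/n) + 1). An illustrative Python sketch (not Mathematica; the numbers fed in are hypothetical):

```python
import math

# Illustrative sketch (not Mathematica): AIC and BIC for a Gaussian
# linear model fitted by least squares. With rss the residual sum of
# squares over n points, -2*log-likelihood = n*(log(2*pi*rss/n) + 1),
# and p counts the coefficients plus the estimated variance.

def aic_bic(rss, n, n_coeffs):
    p = n_coeffs + 1                         # coefficients + variance
    m2ll = n * (math.log(2 * math.pi * rss / n) + 1)
    return m2ll + 2 * p, m2ll + math.log(n) * p

aic, bic = aic_bic(rss=4.2, n=20, n_coeffs=2)
print(aic, bic)  # BIC applies the larger penalty here since log(20) > 2
```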
The invertible function g is called the link function and the linear combination β0 + β1 f1 + β2 f2 + … is referred to as the linear predictor. Common special cases include the linear regression model
with the identity link function and Gaussian or normal exponential family distribution, logit and
probit models for probabilities, Poisson models for count data, and gamma and inverse Gaus-
sian models.
The error variance is a function of the prediction ŷ and is defined by the distribution up to a constant φ, which is referred to as the dispersion parameter. The error variance for a fitted value ŷ can be written as φ̂ v(ŷ), where φ̂ is an estimate of the dispersion parameter obtained from the observed and predicted response values, and v(ŷ) is the variance function associated with the exponential family evaluated at the value ŷ.
Out[22]= FittedModel[ 1 / (0.742193 - 20 x) ]
Logit and probit models are common binomial models for probabilities. The link function for the logit model is log(y/(1 − y)) and the link for the probit model is the inverse CDF for a standard normal distribution, √2 erf⁻¹(2 y − 1). Models of this type can be fitted via GeneralizedLinearModelFit with ExponentialFamily -> "Binomial" and the appropriate LinkFunction or via LogitModelFit and ProbitModelFit.
LogitModelFit[data, funs, vars]   obtain a logit model with basis functions funs and predictor variables vars
LogitModelFit[{m, v}]   obtain a logit model based on a design matrix m and response vector v
ProbitModelFit[data, funs, vars]   obtain a probit model fit to data
ProbitModelFit[{m, v}]   obtain a probit model fit to a design matrix m and response vector v
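The logit and probit links can be sketched numerically. An illustrative Python sketch (not Mathematica): the standard library provides erf but not its inverse, so for the probit case only the inverse link (the standard normal CDF) is shown.

```python
import math

# Illustrative sketch (not Mathematica): link functions for probabilities.
# logit(y) = log(y / (1 - y)); its inverse is the logistic function.
# The probit *inverse* link is the standard normal CDF,
# Phi(eta) = (1 + erf(eta / sqrt(2))) / 2.

def logit(y):
    return math.log(y / (1 - y))

def inverse_logit(eta):
    return 1 / (1 + math.exp(-eta))

def inverse_probit(eta):                # standard normal CDF
    return 0.5 * (1 + math.erf(eta / math.sqrt(2)))

print(logit(0.5))           # 0.0: probability 1/2 maps to linear predictor 0
print(inverse_logit(0.0))   # 0.5
print(inverse_probit(0.0))  # 0.5
```

Both inverse links map the whole real line onto (0, 1), which is what makes them suitable mean functions for probabilities.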
Parameter estimates are obtained via iteratively reweighted least squares with weights
obtained from the variance function of the assumed distribution. Options for
GeneralizedLinearModelFit include options for iteration fitting such as PrecisionGoal,
options for model specification such as LinkFunction, and options for further analysis such as
ConfidenceLevel.
The options for LogitModelFit and ProbitModelFit are the same as for
GeneralizedLinearModelFit except that ExponentialFamily and LinkFunction are defined
by the logit or probit model and so are not options to LogitModelFit and ProbitModelFit.
This gives 95% and 99% confidence intervals for the parameters in the gamma model.
In[24]:= {glm2["ParameterConfidenceIntervals"], glm2["ParameterConfidenceIntervals", ConfidenceLevel -> .99]}
Out[24]= {{{0.62891, 0.855475}, {-0.0616093, -0.0319729}}, {{0.593314, 0.891071}, {-0.0662656, -0.0273166}}}
"BestFitParameters" gives the parameter estimates for the basis functions. "BestFit" gives
` ` `
the fitted function g-1 H b0 + b1 f1 + b2 f2 + L, and "LinearPredictor" gives the linear combina-
` ` `
tion b0 + b1 f1 + b2 f2 + . "DesignMatrix" is the design or model matrix for the basis functions.
"Deviances" deviances
"DevianceTable" deviance table
"DevianceTableDegreesOfFreedom degrees of freedom differences from the table
"
"DevianceTableDeviances" deviance differences from the table
"DevianceTableEntries" unformatted array of values from the table
"DevianceTableResidualDegrees residual degrees of freedom from the table
OfFreedom"
"DevianceTableResidualDevianc residual deviances from the table
es"
Deviances and deviance tables generalize the model decomposition given by analysis of variance in linear models. The deviance for a single data point is 2 φ (ℓ(y) − ℓ(ŷ)), where ℓ is the log-likelihood function for the fitted model. "Deviances" gives a list of the deviance values for all data points. The sum of all deviances gives the model deviance. The model deviance can be decomposed as sums of squares are in an ANOVA table for linear models.
Out[32]= FittedModel[ 1 / (-0.852313 + 18 x + 18 y) ]
As with sums of squares, deviances are additive. The Deviance column of the table gives the
increase in the model deviance when the given basis function is added. The Residual Deviance
column gives the difference between the model deviance and the deviance for the submodel
containing all previous terms in the table. For large samples, the increase in deviance is approximately χ² distributed with degrees of freedom equal to that for the basis function in the table.
"NullDeviance" is the deviance for the null model, the constant model equal to the mean of all
observed responses for models including a constant or g-1 H0L if a constant term is not included.
Mathematics and Algorithms 349
"NullDeviance" is the deviance for the null model, the constant model equal to the mean of all
observed responses for models including a constant or g-1 H0L if a constant term is not included.
As with "ANOVATable", a number of properties are included to extract the columns or unformat-
ted array of entries from "DevianceTable".
Types of residuals.
"FitResiduals" is the list of residuals, differences between the observed and predicted
responses. Given the distributional assumptions, the magnitude of the residuals is expected to
change as a function of the predicted response value. Various types of scaled residuals are
employed in the analysis of generalized linear models.
If di and ri = yi − ŷi are the deviance and residual for the ith data point, the ith deviance residual is given by rdi = √di sgn(ri). The ith Pearson residual is defined as rpi = ri/√(v(ŷi)), where v is the variance function for the exponential family distribution. Standardized deviance residuals and standardized Pearson residuals include division by √(φ̂ (1 − hii)), where hii is the ith diagonal of the hat matrix. "LikelihoodResiduals" values combine deviance and Pearson residuals. The ith likelihood residual is given by sgn(ri) √((rdi² + hii rpi²/(1 − hii))/φ̂).
"WorkingResiduals" gives the residuals from the last step of the iterative fitting. The ith work-
gHmL `
ing residual can be obtained as ri m evaluated at m = yi .
350 Mathematics and Algorithms
"WorkingResiduals" gives the residuals from the last step of the iterative fitting. The ith work-
gHmL `
ing residual can be obtained as ri m evaluated at m = yi .
This plots the residuals and Anscombe residuals for the inverse Gaussian model.
In[41]:= Map[ListPlot[#, Filling -> 0] &, glm3[{"FitResiduals", "AnscombeResiduals"}]]
"CovarianceMatrix" gives the covariance between fitted parameters and is very similar to the
definition for linear models. With CovarianceEstimatorFunction -> "ExpectedInformation"
the expected information matrix obtained from the iterative fitting is used. The matrix is
` -1
f HX W XN where X is the design matrix, and W is the diagonal matrix of weights from the final
stage of the fitting. The weights include both weights specified via the Weights option
and the weights associated with the distribution's variance function. With
CovarianceEstimatorFunction -> "ObservedInformation" the matrix is given by -f I -1
where
I is the observed Fisher information matrix, which is the Hessian of the log-likelihood function
with respect to parameters of the model.
f HX W XN where X is the design matrix, and W is the diagonal matrix of weights from the final
stage of the fitting. The weights include both weights specified via the Weights option
Mathematics and Algorithms 351
"CookDistances" and "HatDiagonal" extend the leverage measures from linear regression to
generalized linear models. The hat matrix from which the diagonal elements are extracted is
defined using the final weights of the iterative fitting.
The Cook distance measures of leverage are defined as in linear regression with standardized residuals replaced by standardized Pearson residuals. The ith Cook distance is given by (hii/(1 − hii)) rspi²/p, where rspi is the ith standardized Pearson residual.
"LogLikelihood" is the log-likelihood for the fitted model. "AIC" and "BIC" are penalized log-
likelihood measures 2 + k p where is the log-likelihood for the fitted model, p is the number of
parameters estimated including the dispersion parameter, and k is 2 for "AIC" and logHnL for
"BIC" for a model of n data points. "LikelihoodRatioStatistic" is given by 2 H - 0 L where 0
is the log-likelihood for the null model.
A number of the goodness of fit measures generalize R² from linear regression as either a measure of explained variation or as a likelihood-based measure. "CoxSnellPseudoRSquared" is a likelihood-based measure, while "EfronPseudoRSquared" is given as 1 − Σi ri²/Σi (yi − ȳ)², where ri is the ith residual and ȳ is the mean of the responses yi.
Nonlinear Models
A nonlinear least-squares model is an extension of the linear model where the model need not be a linear combination of basis functions. The errors are still assumed to be independent and normally distributed. Models of this type can be fitted using the NonlinearModelFit function.
Nonlinear models have the form ŷ = f(x1, …, xi, β1, …, βj), where ŷ is the fitted or predicted value, the βi are parameters to be fitted, and the xi are predictor variables. As with any nonlinear optimization problem, a good choice of starting values for the parameters may be necessary. Starting values can be given using the same parameter specifications as for FindFit.
Options for model fitting and for model analysis are available.
General numeric options such as AccuracyGoal, Method, and WorkingPrecision are the same
as for FindFit.
The Weights option specifies weight values for weighted nonlinear regression. The optimal fit is
for a weighted sum of squared errors.
All other options can be relevant to computation of results after the initial fitting. They can be
set within NonlinearModelFit for use in the fitting and to specify the default settings for
results obtained from the FittedModel object. These options can also be set within an already
constructed FittedModel object to override the option values originally given to
NonlinearModelFit.
Basic properties of the data and fitted function for nonlinear models behave like the same
properties for linear and generalized linear models with the exception that
"BestFitParameters" returns a rule as is done for the result of FindFit.
This gives the fitted function and rules for the parameter estimates.
In[26]:= nlm[{"BestFit", "BestFitParameters"}]
Out[26]= {ℯ^(-0.748315 + 2.76912 x), {a → -0.748315, b → 2.76912}}
Many diagnostics for nonlinear models extend or generalize concepts from linear regression.
These extensions often rely on linear approximations or large sample approximations.
Types of residuals.
As in linear regression, "FitResiduals" gives the differences between the observed and fitted values {y1 − ŷ1, y2 − ŷ2, …}, and "StandardizedResiduals" and "StudentizedResiduals" are scaled forms of these differences.
The ith standardized residual is (yi − ŷi)/√(ŝ² (1 − hii)/wi), where ŝ² is the estimated error variance, hii is the ith diagonal element of the hat matrix, and wi is the weight for the ith data point, and the ith studentized residual is obtained by replacing ŝ² with the ith single deletion variance ŝ(i)². For nonlinear models a first-order approximation is used for the design matrix, which is needed to compute the hat matrix.
"ANOVATable" provides a decomposition of the variation in the data attributable to the fitted
function and to the errors or residuals.
The uncorrected total sums of squares gives the sum of squared responses, while the corrected
total gives the sum of squared differences between the responses and their mean value.
"CovarianceMatrix" gives the approximate covariance between fitted parameters. The matrix
`2 -1 `2
is s HX W XN where s is the variance estimate, X is the design matrix for the linear approxima-
tion to the model, and W is the diagonal matrix of weights. "CorrelationMatrix" is the associ-
356 Mathematics and Algorithms
"CovarianceMatrix" gives the approximate covariance between fitted parameters. The matrix
`2 -1 `2
is s HX W XN where s is the variance estimate, X is the design matrix for the linear approxima-
tion to the model, and W is the diagonal matrix of weights. "CorrelationMatrix" is the associ-
ated correlation matrix for the parameter estimates. "ParameterErrors" is equivalent to the
square root of the diagonal elements of the covariance matrix.
Curvature diagnostics.
The first-order approximation used for many diagnostics is equivalent to the model being linear
in the parameters. If the parameter space near the parameter estimates is sufficiently flat, the
linear approximations and any results that rely on first-order approximations can be deemed
reasonable. Curvature diagnostics are used to assess whether the approximate linearity is
reasonable. "FitCurvatureTable" is a table of curvature diagnostics.
"SingleDeletionVariances" list of variance estimates with the ith data point omitted
where n is the number of data points, p is the number of parameters, hii is the ith hat diagonal,
`
s is the variance estimate for the full dataset, and ri is the ith residual.
Here the fitted function and mean prediction bands are obtained.
In[29]:= {fit[x_], mp[x_]} = nlm[{"BestFit", "MeanPredictionBands"}]
"AdjustedRSquared", "AIC", "BIC", and "RSquared" are all direct extensions of the measures
as defined for linear models. The coefficient of determination "RSquared" is the ratio of the
model sum of squares to the total sum of squares. "AdjustedRSquared" penalizes for the
n-1
number of parameters in the model and is given by 1 - H n-p L H1 - R2 L.
"AIC" and "BIC" are equal to -2 times the log-likelihood for the model plus k p where p is the
number of parameters to be estimated including the estimated variance. For "AIC" k is 2, and
for "BIC" k is logHnL.
The approximate function reproduces each of the values in the original table.
In[3]:= sin[0.25]
Out[3]= 0.247404
In this case the interpolation is a fairly good approximation to the true sine function.
In[5]:= sin[0.3]
Out[5]= 0.29552
You can work with approximate functions much as you would with any other Mathematica
functions. You can plot approximate functions, or perform numerical operations such as integra-
tion or root finding.
If you give a non-numerical argument, the approximate function is left in symbolic form.
In[6]:= sin[x]
Out[6]= InterpolatingFunction[{{0., 2.}}, <>][x]
Here is the same numerical integral for the true sine function.
In[8]:= NIntegrate[Sin[x]^2, {x, 0, Pi/2}]
Out[8]= 0.785398
A plot of the approximate function is essentially indistinguishable from the true sine function.
In[9]:= Plot[sin[x], {x, 0, 2}]
This finds the derivative of the approximate sine function, and evaluates it at π/6.
In[10]:= sin'[Pi/6]
Out[10]= 0.865372
InterpolatingFunction objects contain all the information Mathematica needs about approxi-
mate functions. In standard Mathematica output format, however, only the part that gives the
domain of the InterpolatingFunction object is printed explicitly. The lists of actual parame-
ters used in the InterpolatingFunction object are shown only in iconic form.
If you ask for a value outside of the domain, Mathematica prints a warning, then uses extrapola-
tion to find a result.
In[13]:= sin[3]
InterpolatingFunction::dmval : Input value {3} lies outside the range of data in the interpolating function. Extrapolation will be used.
Out[13]= 0.0155471
The more information you give about the function you are trying to approximate, the better the
approximation Mathematica constructs can be. You can, for example, specify not only values of
the function at a sequence of points, but also derivatives.
This interpolates through the values of the sine function and its first derivative.
In[14]:= sind = Interpolation[Table[{{x}, Sin[x], Cos[x]}, {x, 0, 2, 0.25}]]
Out[14]= InterpolatingFunction[{{0., 2.}}, <>]
This finds a better approximation to the derivative than the previous interpolation.
In[15]:= sind'[Pi/6]
Out[15]= 0.865974
Interpolation works by fitting polynomial curves between the points you specify. You can use
the option InterpolationOrder to specify the degree of these polynomial curves. The default
setting is InterpolationOrder -> 3, yielding cubic curves.
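In the InterpolationOrder -> 1 case, the fitting reduces to connecting successive points with straight-line segments. An illustrative Python sketch of that idea (not Mathematica's implementation; data invented for the example):

```python
# Illustrative sketch (not Mathematica): piecewise-linear interpolation
# through a list of (x, y) points, analogous to what
# Interpolation[..., InterpolationOrder -> 1] does between data points.

def linear_interpolate(points, x):
    points = sorted(points)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)     # fractional position in the segment
            return y0 + t * (y1 - y0)
    raise ValueError("x outside the interpolation domain")

pts = [(0, 0.0), (1, 1.0), (2, 4.0), (3, 9.0)]
print(linear_interpolate(pts, 1.5))  # halfway between 1.0 and 4.0: 2.5
```

Higher interpolation orders replace the straight line on each segment with a higher-degree polynomial matched to neighboring points.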
This creates an approximate function using linear interpolation between the values in the table.
In[17]:= Interpolation[tab, InterpolationOrder -> 1]
Out[17]= InterpolatingFunction[{{0, 6}}, <>]
With the default setting InterpolationOrder -> 3, cubic curves are used, and the function
looks smooth.
In[19]:= Plot[Evaluate[Interpolation[tab]][x], {x, 0, 6}]
Increasing the setting for InterpolationOrder typically leads to smoother approximate func-
tions. However, if you increase the setting too much, spurious wiggles may develop.
To interpolate this array you explicitly have to tell Mathematica the domain it covers.
In[23]:= ListInterpolation[tab, {{5.5, 7.2}, {2.3, 8.9}}]
Out[23]= InterpolatingFunction[{{5.5, 7.2}, {2.3, 8.9}}, <>]
ListInterpolation works for arrays of any dimension, and in each case it produces an
InterpolatingFunction object which takes the appropriate number of arguments.
Mathematica can handle not only purely numerical approximate functions, but also ones which involve symbolic parameters.
This shows how the interpolated value at 2.2 depends on the parameters.
In[27]:= sinp[2.2] // Simplify
Out[27]= 2.2 - 0.048 a - 0.032 b
With the default setting for InterpolationOrder used, the value at this point no longer
depends on a.
In[28]:= sinp2[3.8] // Simplify
Out[28]= 3.8 + 0.864 b
In working with approximate functions, you can quite often end up with complicated combina-
tions of InterpolatingFunction objects. You can always tell Mathematica to produce a single
InterpolatingFunction object valid over a particular domain by using
FunctionInterpolation.
Here is the discrete Fourier transform of the data. It involves complex numbers.
In[2]:= Fourier[%]
Out[2]= {0. + 0. ⅈ, -0.707107 - 1.70711 ⅈ, 0. + 0. ⅈ, -0.707107 - 0.292893 ⅈ, 0. + 0. ⅈ, -0.707107 + 0.292893 ⅈ, 0. + 0. ⅈ, -0.707107 + 1.70711 ⅈ}
Fourier works whether or not your list of data has a length which is a power of two.
In[4]:= Fourier[{1, -1, 1}]
Out[4]= {0.57735 + 0. ⅈ, 0.57735 - 1. ⅈ, 0.57735 + 1. ⅈ}
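The transform above follows Mathematica's default convention, (1/√n) Σᵣ uᵣ e^(2πi(r−1)(s−1)/n). An illustrative Python sketch of that convention (not Mathematica; a direct O(n²) sum rather than an FFT):

```python
import cmath
import math

# Illustrative sketch (not Mathematica): the discrete Fourier transform
# with Mathematica's default FourierParameters convention,
#   v_s = (1/sqrt(n)) * sum_r u_r * exp(2*pi*i*(r-1)*(s-1)/n),
# written here with zero-based indices r, s.

def dft(u):
    n = len(u)
    return [sum(u[r] * cmath.exp(2j * math.pi * r * s / n) for r in range(n))
            / math.sqrt(n) for s in range(n)]

print(dft([1, -1, 1]))
# first entry is (1 - 1 + 1)/sqrt(3), about 0.57735,
# matching the Fourier[{1, -1, 1}] output above
```

Other conventions differ only in the overall normalization and the sign of the exponent, which is exactly what the FourierParameters option controls.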
This generates a list of 200 elements containing a periodic signal with random noise added.
In[5]:= data = Table[N[Sin[30 2 Pi n/200] + (RandomReal[] - 1/2)], {n, 200}];
The discrete Fourier transform, however, shows a strong peak at 30 + 1, and a symmetric peak at 201 − 30, reflecting the frequency component of the original signal near 30/200.
In[7]:= ListLinePlot[Abs[Fourier[data]], PlotRange -> All]
In different scientific and technical fields different conventions are often used for defining dis-
crete Fourier transforms. The option FourierParameters allows you to choose any of these
conventions you want.
Mathematica can find discrete Fourier transforms for data in any number of dimensions. In n
dimensions, the data is specified by a list nested n levels deep. Two-dimensional discrete
Fourier transforms are often used in image processing.
One issue with the usual discrete Fourier transform for real data is that the result is complex-
valued. There are variants of real discrete Fourier transforms that have real results. Mathemat-
ica has commands for computing the discrete cosine transform and the discrete sine transform.
There are four types each of Fourier discrete sine and cosine transforms typically in use,
denoted by number or sometimes roman numeral as in "DCTII" for the discrete cosine trans-
form of type 2.
Check that the type 3 transform is the inverse of the type 2 transform.
In[11]:= FourierDCT[FourierDCT[pulse, 2], 3]
Out[11]= {-1., -1., -1., -1., -1., 1., 1., 1., 1., 1.}
The discrete real transforms are convenient to use for data or image compression.
The discrete cosine transform has most of the information in the first few modes.
In[13]:= dct = FourierDCT[data];
         ListLinePlot[dct, PlotRange -> All]
Reconstruct the front from only the first 20 modes (1/10 of the original data size). The oscilla-
tions are a consequence of the truncation and are known to show up in image processing
applications as well.
In[14]:= tdata = FourierDCT[PadRight[Take[dct, 20], 200, 0], 3];
         ListLinePlot[{data, tdata}]
In both convolution and correlation the basic idea is to combine a kernel list with successive
sublists of a list of data. The convolution of a kernel Kr with a list us has the general form Σr Kr u(s-r), while the correlation has the general form Σr Kr u(s+r).
This forms the convolution of the kernel 8x, y< with a list of data.
In[1]:= ListConvolve[{x, y}, {a, b, c, d, e}]
Out[1]= {b x + a y, c x + b y, d x + c y, e x + d y}
In this case reversing the kernel gives exactly the same result as ListConvolve .
In[3]:= ListCorrelate[{y, x}, {a, b, c, d, e}]
Out[3]= {b x + a y, c x + b y, d x + c y, e x + d y}
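The no-overhang behavior shown above can be sketched numerically. An illustrative Python sketch (not Mathematica): correlation combines the kernel with each sublist in direct order, convolution in reversed order, so convolving equals correlating with the reversed kernel.

```python
# Illustrative sketch (not Mathematica): the default no-overhang case of
# ListCorrelate and ListConvolve, where the result has
# len(data) - len(kernel) + 1 elements.

def list_correlate(kernel, data):
    k = len(kernel)
    return [sum(kernel[j] * data[i + j] for j in range(k))
            for i in range(len(data) - k + 1)]

def list_convolve(kernel, data):
    # convolution slides the kernel in reversed order
    return list_correlate(kernel[::-1], data)

# With kernel {x, y} = {2, 3} and data {a, ..., e} = {1, ..., 5}, the
# first convolution element is b*x + a*y = 2*2 + 1*3 = 7, as above.
print(list_convolve([2, 3], [1, 2, 3, 4, 5]))   # [7, 12, 17, 22]
print(list_correlate([3, 2], [1, 2, 3, 4, 5]))  # same values
```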
In forming sublists to combine with a kernel, there is always an issue of what to do at the ends
of the list of data. By default, ListConvolve and ListCorrelate never form sublists which
would "overhang" the ends of the list of data. This means that the output you get is normally
shorter than the original list of data.
In practice one often wants to get output that is as long as the original list of data. To do this
requires including sublists that overhang one or both ends of the list of data. The additional
elements needed to form these sublists must be filled in with some kind of "padding". By
default, Mathematica takes copies of the original list to provide the padding, thus effectively
treating the list as being cyclic.
The last term in the last element now comes from the beginning of the list.
In[7]:= ListCorrelate[{x, y}, {a, b, c, d}, 1]
Out[7]= {a x + b y, b x + c y, c x + d y, d x + a y}
Now the first term of the first element and the last term of the last element both involve
wraparound.
In[8]:= ListCorrelate[{x, y}, {a, b, c, d}, {-1, 1}]
Out[8]= {d x + a y, a x + b y, b x + c y, c x + d y, d x + a y}
In the general case ListCorrelate[kernel, list, {kL, kR}] is set up so that in the first element of the result, the first element of list appears multiplied by the element at position kL in kernel, and in the last element of the result, the last element of list appears multiplied by the element at position kR in kernel. The default case in which no overhang is allowed on either side thus corresponds to ListCorrelate[kernel, list, {1, -1}].
With a kernel of length 3, alignments {-1, 2} always make the first and last elements of the result the same.
In[9]:= ListCorrelate[{x, y, z}, {a, b, c, d}, {-1, 2}]
Out[9]= {c x + d y + a z, d x + a y + b z, a x + b y + c z, b x + c y + d z, c x + d y + a z}
For many kinds of data, it is convenient to assume not that the data is cyclic, but rather that it
is padded at either end by some fixed element, often 0, or by some sequence of elements.
When the padding is indicated by {p, q}, the list {a, b, c} overlays {…, p, q, p, q, …} with a p aligned under the a.
In[12]:= ListCorrelate[{x, y, z}, {a, b, c}, {-1, 1}, {p, q}]
Out[12]= {p x + q y + a z, q x + a y + b z, a x + b y + c z, b x + c y + q z, c x + q y + p z}
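To pad with a fixed element such as 0, you can give it as an explicit fourth argument. As an illustrative check (not part of the original session), the wraparound terms in the earlier {-1, 1} example are then simply replaced by zeros:

In:= ListCorrelate[{x, y}, {a, b, c, d}, {-1, 1}, 0]
Out= {a y, a x + b y, b x + c y, c x + d y, d x}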
Different choices of kernel allow ListConvolve and ListCorrelate to be used for different
kinds of computations.
You can use ListConvolve and ListCorrelate to handle symbolic as well as numerical data.
The result corresponds exactly with the coefficients in the expanded form of this product of
polynomials.
In[20]:= Expand[(a + b x + c x^2) (u + v x + w x^2)]
Out[20]= a u + b u x + a v x + c u x^2 + b v x^2 + a w x^2 + c v x^3 + b w x^3 + c w x^4
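The coefficient list of the product can be obtained directly by convolving the coefficient lists of the factors. The following illustrative input (not from the original session) uses zero padding with maximal overhang to form the full convolution:

In:= ListConvolve[{a, b, c}, {u, v, w}, {1, -1}, 0]
Out= {a u, a v + b u, a w + b v + c u, b w + c v, c w}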
Cellular Automata
Cellular automata provide a convenient way to represent many kinds of systems in which the
values of cells in an array are updated in discrete steps according to a local rule.
This starts with the list given, then evolves rule 30 for 4 steps.
In[1]:= CellularAutomaton[30, {0, 0, 0, 1, 0, 0, 0}, 4]
Out[1]= {{0, 0, 0, 1, 0, 0, 0}, {0, 0, 1, 1, 1, 0, 0},
  {0, 1, 1, 0, 0, 1, 0}, {1, 1, 0, 1, 1, 1, 1}, {0, 0, 0, 1, 0, 0, 0}}
This shows 100 steps of rule 30 evolution from random initial conditions.
In[2]:= ArrayPlot[CellularAutomaton[30, RandomInteger[1, 250], 100]]
Out[2]= (array plot of the evolution)
If you give an explicit list of initial values, CellularAutomaton will take the elements in this list
to correspond to all the cells in the system, arranged cyclically.
The right neighbor of the cell at the end is the cell at the beginning.
In[4]:= CellularAutomaton[30, {1, 0, 0, 0, 0}, 1]
Out[4]= {{1, 0, 0, 0, 0}, {1, 1, 0, 0, 1}}
It is often convenient to set up initial conditions in which there is a small "seed" region, superim-
posed on a constant "background". By default, CellularAutomaton automatically fills in enough
background to cover the size of the pattern that can be produced in the number of steps of
evolution you specify.
This shows rule 30 evolving from an initial condition containing a single black cell.
In[5]:= ArrayPlot[CellularAutomaton[30, {{1}, 0}, 100]]
Out[5]= (array plot of the evolution)
This shows rule 30 evolving from an initial condition consisting of a {1, 1} seed on a background of repeated {1, 0, 1, 1} blocks.
In[6]:= ArrayPlot[CellularAutomaton[30, {{1, 1}, {1, 0, 1, 1}}, 100]]
Out[6]= (array plot of the evolution)
Particularly in studying interactions between structures, you may sometimes want to specify
initial conditions for cellular automata in which certain blocks are placed at particular offsets.
n                                  k = 2, r = 1, elementary rule
{n, k}                             general nearest-neighbor rule with k colors
{n, k, r}                          general rule with k colors and range r
{n, {k, 1}}                        k-color nearest-neighbor totalistic rule
{n, {k, 1}, r}                     k-color, range-r totalistic rule
{n, {k, {wt1, wt2, …}}, r}         rule in which neighbor i is assigned weight wti
{lhs1 -> rhs1, lhs2 -> rhs2, …}    explicit replacements for lists of neighbors
{fun, {}, rspec}                   rule obtained by applying the function fun to each neighbor list
In the simplest cases, a cellular automaton allows k possible values or "colors" for each cell, and
has rules that involve up to r neighbors on each side. The digits of the "rule number" n then
specify what the color of a new cell should be for each possible configuration of the
neighborhood.
This shows the new color of the center cell for each of the 8 neighborhoods.
In[10]:= Map[CellularAutomaton[30, #, 1][[2, 2]] &, %]
Out[10]= {0, 0, 0, 1, 1, 1, 1, 0}
For rule 30, this sequence corresponds to the base-2 digits of the number 30.
In[11]:= FromDigits[%, 2]
Out[11]= 30
For a general cellular automaton rule, each digit of the rule number specifies what color a different possible neighborhood of 2r + 1 cells should yield. To find out which digit corresponds to which neighborhood, one effectively treats the cells in a neighborhood as digits in a number. For an r = 1 cellular automaton, the number is obtained from the list of elements neig in the neighborhood by neig.{k^2, k, 1}.
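As an illustrative check (not part of the original session), for k = 2 the neighborhood {1, 1, 0} corresponds to the number {1, 1, 0}.{4, 2, 1} = 6, and the matching base-2 digit of the rule number 30 gives the new color:

In:= IntegerDigits[30, 2, 8][[8 - {1, 1, 0}.{4, 2, 1}]]
Out= 0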
It is sometimes convenient to consider totalistic cellular automata, in which the new value of a cell depends only on the total of the values in its neighborhood. One can specify totalistic cellular automata by rule numbers or "codes" in which each digit refers to neighborhoods with a given total value, obtained for example from neig.{1, 1, 1}.

In general, CellularAutomaton allows one to specify rules using any sequence of weights. Another choice sometimes convenient is {k, 1, k}, which yields outer totalistic rules.
Rules with range r involve all cells with offsets -r through +r. Sometimes it is convenient to
think about rules that involve only cells with specific offsets. You can do this by replacing a
single r with a list of offsets.
This generates the truth table for 2-cell-neighborhood rule number 7, which turns out to be the
Boolean function Nand.
In[14]:= Map[CellularAutomaton[{7, 2, {{0}, {1}}}, #, 1][[2, 2]] &,
  {{1, 1}, {1, 0}, {0, 1}, {0, 0}}]
Out[14]= {0, 1, 1, 1}
Rule numbers provide a highly compact way to specify cellular automaton rules. But sometimes
it is more convenient to specify rules by giving an explicit function that should be applied to
each possible neighborhood.
This runs an additive cellular automaton whose rule adds all values in each neighborhood
modulo 4.
In[15]:= ArrayPlot[CellularAutomaton[{Mod[Apply[Plus, #], 4] &, {}, 1}, {{1}, 0}, 100]]
Out[15]= (array plot of the evolution)
When you specify rules by functions, the values of cells need not be integers.
In[17]:= ArrayPlot[CellularAutomaton[{Mod[1/2 Apply[Plus, #], 1] &, {}, 1}, {{1}, 0}, 100]]
Out[17]= (array plot of the evolution)
This runs rule 30 for 5 steps, keeping only the last step.
In[19]:= CellularAutomaton[30, {{1}, 0}, {{5}}]
Out[19]= {{1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1}}
The step specification spec_t works very much like taking elements from a list with Take. One difference, though, is that the initial condition for the cellular automaton is considered to be step 0. Note that any step specification of the form {…} must be enclosed in an additional list.
u               steps 0 through u
{u}             step u
{u1, u2}        steps u1 through u2
{u1, u2, du}    steps u1, u1 + du, …
This evolves for 100 steps, but keeps only every other step.
In[22]:= ArrayPlot[CellularAutomaton[30, {{1}, 0}, {{0, 100, 2}}]]
Out[22]= (array plot of the evolution)
Much as you can specify which steps to keep in a cellular automaton evolution, so also you can specify which cells to keep. If you give an initial condition such as {{a1, a2, …}, blist}, then a1 is taken to have offset 0 for the purpose of specifying which cells to keep.
All             all cells that can be affected by the specified initial condition
Automatic       all cells in the region that differs from the background (default)
0               cell aligned with beginning of aspec
x               cells at offsets up to x on the right
-x              cells at offsets up to x on the left
{x}             cell at offset x to the right
{-x}            cell at offset x to the left
{x1, x2}        cells at offsets x1 through x2
{x1, x2, dx}    cells x1, x1 + dx, …
This keeps all steps, but drops cells at offsets more than 20 on the left.
In[23]:= ArrayPlot[CellularAutomaton[30, {{1}, 0}, {100, {-20, 100}}]]
Out[23]= (array plot of the evolution)
If you give an initial condition such as {{a1, a2, …}, blist}, then CellularAutomaton will always effectively do the cellular automaton as if there were an infinite number of cells. By using a spec_x such as {x1, x2} you can tell CellularAutomaton to include only cells at specific offsets x1 through x2 in its output. CellularAutomaton by default includes cells out just far enough that their values never simply stay the same as in the background blist.

In general, given a cellular automaton rule with range r, cells out to distance r t on each side could in principle be affected in the evolution of the system. With spec_x being All, all these cells are included; with the default setting of Automatic, cells whose values effectively stay the same as in blist are trimmed off.
By default, only the parts that are not constant black are kept.
In[25]:= ArrayPlot[CellularAutomaton[225, {{1}, 0}, 100]]
Out[25]= (array plot of the evolution)
Using All for spec_x includes all cells that could be affected by a cellular automaton with this range.
In[26]:= ArrayPlot[CellularAutomaton[225, {{1}, 0}, {100, All}]]
Out[26]= (array plot of the evolution)
CellularAutomaton generalizes quite directly to any number of dimensions. Above two dimen-
sions, however, totalistic and other special types of rules tend to be more useful, since the
number of entries in the rule table for a general rule rapidly becomes astronomical.
This is the rule specification for the two-dimensional 9-neighbor totalistic cellular automaton
with code 797.
In[27]:= code797 = {797, {2, 1}, {1, 1}};
Mathematical Functions
Naming Conventions
Mathematical functions in Mathematica are given names according to definite rules. As with
most Mathematica functions, the names are usually complete English words, fully spelled out.
For a few very common functions, Mathematica uses the traditional abbreviations. Thus the
modulo function, for example, is Mod, not Modulo.
Mathematical functions that are usually referred to by a person's name have names in Mathematica of the form PersonSymbol. Thus, for example, the Legendre polynomials P_n(x) are denoted LegendreP[n, x]. Although this convention does lead to longer function names, it avoids any ambiguity or confusion.
When the standard notation for a mathematical function involves both subscripts and superscripts, the subscripts are given before the superscripts in the Mathematica form. Thus, for example, the associated Legendre polynomials P_n^m(x) are denoted LegendreP[n, m, x].
This gives a result for the integral of x^n that is valid for almost all values of n.
In[1]:= Integrate[x^n, x]
Out[1]= x^(1+n)/(1+n)
For the special case of x^-1, however, the correct result is different.
In[2]:= Integrate[x^-1, x]
Out[2]= Log[x]
The overall goal of symbolic computation is typically to get formulas that are valid for many
possible values of the variables that appear in them. It is however often not practical to try to
get formulas that are valid for absolutely every possible value of each variable.
Power::infy : Infinite expression 1/0 encountered.
Out[4]= Indeterminate
This construct treats both cases, but would be quite unwieldy to use.
In[5]:= If[x != 0, 0, Indeterminate]
Out[5]= If[x != 0, 0, Indeterminate]
If Mathematica did not automatically replace 0 x by 0, then few symbolic computations would
get very far. But you should realize that the practical necessity of making such replacements
can cause misleading results to be obtained when exceptional values of parameters are used.
The basic operations of Mathematica are nevertheless carefully set up so that whenever possi-
ble the results obtained will be valid for almost all values of each variable.
This makes the assumption that x is a positive real variable, and does the replacement.
In[8]:= Simplify[Sqrt[x^2], x > 0]
Out[8]= x
Numerical Functions
IntegerPart[x] and FractionalPart[x] can be thought of as extracting digits to the left and right of the decimal point. Round[x] is often used for forcing numbers that are close to integers to be exactly integers. Floor[x] and Ceiling[x] often arise in working out how many elements there will be in sequences of numbers with non-integer spacings.
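As an illustrative comparison (not part of the original session):

In:= {IntegerPart[2.6], FractionalPart[2.6], Round[2.6], Floor[2.6], Ceiling[2.6]}
Out= {2, 0.6, 3, 2, 3}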
Piecewise Functions
Boole[expr] is a basic function that turns True and False into 1 and 0. It is sometimes known as the characteristic function or indicator function.
Piecewise functions.
It is often convenient to have functions with different forms in different regions. You can do this
using Piecewise.
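For example (an illustrative definition, not from the original session), the absolute value function can be written piecewise, and it evaluates as soon as its condition can be decided:

In:= Piecewise[{{-x, x < 0}}, x] /. x -> -3
Out= 3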
Piecewise functions appear in systems where there is discrete switching between different
domains. They are also at the core of many computational methods, including splines and finite
elements. Special cases include such functions as Abs, UnitStep, Clip, Sign, Floor and Max.
Mathematica handles piecewise functions in both symbolic and numerical situations.
Pseudorandom Numbers
Mathematica has three functions for generating pseudorandom numbers that are distributed
uniformly over a range of values.
RandomInteger[]    0 or 1 with probability 1/2
RandomReal and RandomComplex allow you to obtain pseudorandom numbers with any precision.
If you get arrays of pseudorandom numbers repeatedly, you should get a "typical" sequence of
numbers, with no particular pattern. There are many ways to use such numbers.
One common way to use pseudorandom numbers is in making numerical tests of hypotheses.
For example, if you believe that two symbolic expressions are mathematically equal, you can
test this by plugging in "typical" numerical values for symbolic parameters, and then comparing
the numerical results. (If you do this, you should be careful about numerical accuracy problems
and about functions of complex variables that may not have unique values.)
Out[7]= x^2 == Abs[x]^2
Substituting in a random numerical value shows that the equation is not always True .
In[8]:= % /. x -> RandomComplex[]
Out[8]= False
Other common uses of pseudorandom numbers include simulating probabilistic processes, and
sampling large spaces of possibilities. The pseudorandom numbers that Mathematica generates
for a range of numbers are always uniformly distributed over the range you specify.
RandomInteger, RandomReal and RandomComplex are unlike almost any other Mathematica
functions in that every time you call them, you potentially get a different result. If you use
them in a calculation, therefore, you may get different answers on different occasions.
The sequences that you get from RandomInteger, RandomReal and RandomComplex are not in
most senses "truly random", although they should be "random enough" for practical purposes.
The sequences are in fact produced by applying a definite mathematical algorithm, starting
from a particular "seed". If you give the same seed, then you get the same sequence.
When Mathematica starts up, it takes the time of day (measured in small fractions of a second)
as the seed for the pseudorandom number generator. Two different Mathematica sessions will
therefore almost always give different sequences of pseudorandom numbers.
If you want to make sure that you always get the same sequence of pseudorandom numbers,
you can explicitly give a seed for the pseudorandom generator, using SeedRandom.
If you reseed the pseudorandom generator with the same seed, you get the same sequence of
pseudorandom numbers.
In[11]:= SeedRandom[143]; RandomReal[1, {3}]
Out[11]= {0.110762, 0.364563, 0.163681}
Every single time RandomInteger, RandomReal or RandomComplex is called, the internal state of
the pseudorandom generator that it uses is changed. This means that subsequent calls to these
functions made in subsidiary calculations will have an effect on the numbers returned in your
main calculation. To avoid any problems associated with this, you can localize the effect of their use by doing the calculation inside BlockRandom.
BlockRandom[expr]    evaluate expr with the current state of the pseudorandom generators localized
By localizing the calculation inside BlockRandom , the internal state of the pseudorandom
generator is restored after generating the first list.
In[12]:= {BlockRandom[{RandomReal[], RandomReal[]}], {RandomReal[], RandomReal[]}}
Out[12]= {{0.952312, 0.93591}, {0.952312, 0.93591}}
Many applications require random numbers from non-uniform distributions. Mathematica has
many distributions built into the system. You can give a distribution with appropriate parame-
ters instead of a range to RandomInteger or RandomReal.
RandomInteger[dist], RandomReal[dist]
    a pseudorandom number distributed by the random distribution dist
RandomInteger[dist, n], RandomReal[dist, n]
    a list of n pseudorandom numbers distributed by the random distribution dist
RandomInteger[dist, {n1, n2, …}], RandomReal[dist, {n1, n2, …}]
    an n1 × n2 × … array of pseudorandom numbers distributed by the random distribution dist
This generates a 4×4 matrix of real numbers using the standard normal distribution.
In[14]:= RandomReal[NormalDistribution[], {4, 4}]
Out[14]= {{1.17681, -0.774733, -1.74139, 1.3577}, {-1.261, 0.0408214, 0.989022, 2.80942},
  {-1.27146, 1.63037, 1.98221, 0.403135}, {1.00722, -0.927379, 0.747369, -2.28065}}
This generates five high-precision real numbers distributed normally with mean 2 and standard
deviation 4.
In[15]:= RandomReal[NormalDistribution[2, 4], 5, WorkingPrecision -> 32]
Out[15]= {-3.7899344407106290701062146195097, -1.1607986070402381009885236231751,
  12.042079595098604792470496688453, 2.3508651879131153670572237267418,
  5.0287452449413463045300577818173}
An additional use of pseudorandom numbers is for selecting from a list. RandomChoice selects
with replacement and RandomSample samples without replacement.
Selecting at random.
Chances are very high that at least one of the choices was repeated in the output. That is
because when an element is chosen, it is immediately replaced. On the other hand, if you want
to select from an actual set of elements, there should be no replacement.
Sample 10 items at random from the digits 0 through 9 without replacement. The result is a
random permutation of the digits.
In[17]:= RandomSample[Range[0, 9], 10]
Out[17]= {7, 9, 2, 5, 3, 4, 1, 8, 0, 6}
Sample 10 items from a set having different frequencies for each digit.
In[18]:= RandomSample[{0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6,
  7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9}, 10]
Out[18]= {6, 6, 7, 8, 9, 7, 2, 2, 9, 9}
The result from Mod always has the same sign as the second argument.
In[4]:= Mod[-5.6, 1.2]
Out[4]= 0.4
For any integers a and b, it is always true that b * Quotient@a, bD + Mod@a, bD is equal to a.
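As an illustrative check of this identity (not from the original session) with a negative first argument:

In:= {Quotient[-7, 3], Mod[-7, 3]}
Out= {-3, 2}

Indeed 3 (-3) + 2 is equal to -7, and the result of Mod has the same sign as the second argument.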
Particularly when you are using Mod to get indices for parts of objects, you will often find it
convenient to specify an offset.
This effectively extracts the 18th part of the list, with the list treated cyclically.
In[5]:= Part[{a, b, c}, Mod[18, 3, 1]]
Out[5]= c
The greatest common divisor function GCD[n1, n2, …] gives the largest integer that divides all the ni exactly. When you enter a ratio of two integers, Mathematica effectively uses GCD to cancel out common factors and give a rational number in lowest terms.

The least common multiple function LCM[n1, n2, …] gives the smallest integer that contains all the factors of each of the ni.
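For example (an illustrative evaluation, not from the original session):

In:= {GCD[12, 18], LCM[12, 18]}
Out= {6, 36}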
The Kronecker delta function KroneckerDelta[n1, n2, …] is equal to 1 if all the ni are equal, and is 0 otherwise. δ_{n1 n2 …} can be thought of as a totally symmetric tensor.
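As an illustrative check (not from the original session):

In:= {KroneckerDelta[2, 2, 2], KroneckerDelta[1, 2]}
Out= {1, 0}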
PrimeQ[n, GaussianIntegers -> True]    give True if n is a Gaussian prime, and False otherwise
This gives the factors of 24 as 2^3, 3^1. The first element in each list is the factor; the second is its exponent.
In[8]:= FactorInteger[24]
Out[8]= {{2, 3}, {3, 1}}
You should realize that according to current mathematical thinking, integer factoring is a funda-
mentally difficult computational problem. As a result, you can easily type in an integer that
Mathematica will not be able to factor in anything short of an astronomical length of time. But
as long as the integers you give are less than about 50 digits long, FactorInteger should have
no trouble. And in special cases it will be able to deal with much longer integers.
Although Mathematica may not be able to factor a large integer, it can often still test whether
or not the integer is a prime. In addition, Mathematica has a fast way of finding the kth prime
number.
It is often much faster to test whether a number is prime than to factor it.
In[12]:= PrimeQ[234242423]
Out[12]= False
Particularly in number theory, it is often more important to know the distribution of primes than their actual values. The function PrimePi[x] gives the number of primes π(x) that are less than or equal to x.
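As an illustrative check (not from the original session), there are 25 primes up to 100, and Prime inverts the count:

In:= {PrimePi[100], Prime[25]}
Out= {25, 97}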
Liouville's function gives (-1)^k where k is the number of prime factors counting multiplicity.
In[18]:= {LiouvilleLambda[3^5], LiouvilleLambda[2*3^5]}
Out[18]= {-1, 1}
The Mangoldt function returns the log of the base when its argument is a prime power, and zero otherwise.
In[19]:= {MangoldtLambda[3^5], MangoldtLambda[2*3^5]}
Out[19]= {Log[3], 0}
By default, FactorInteger allows only real integers. But with the option setting
GaussianIntegers -> True, it also handles Gaussian integers, which are complex numbers with
integer real and imaginary parts. Just as it is possible to factor uniquely in terms of real primes,
it is also possible to factor uniquely in terms of Gaussian primes. There is nevertheless some
potential ambiguity in the choice of Gaussian primes. In Mathematica, they are always chosen
to have positive real parts, and non-negative imaginary parts, except for a possible initial factor
of -1 or i.
The modular power function PowerMod[a, b, n] gives exactly the same results as Mod[a^b, n] for b > 0. PowerMod is much more efficient, however, because it avoids generating the full form of a^b.
You can use PowerMod not only to find positive modular powers, but also to find modular inverses. For negative b, PowerMod[a, b, n] gives, if possible, an integer k such that k a^(-b) ≡ 1 mod n. (Whenever such an integer exists, it is guaranteed to be unique modulo n.) If no such integer k exists, Mathematica leaves PowerMod unevaluated.
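For example (an illustrative evaluation, not from the original session), this finds the inverse of 2 modulo 9:

In:= PowerMod[2, -1, 9]
Out= 5

Indeed 2 × 5 = 10 ≡ 1 mod 9.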
PowerMod is equivalent to using Power, then Mod, but is much more efficient.
In[22]:= PowerMod[2, 13451, 3]
Out[22]= 2
This finds the smallest non-negative integer x so that x^2 is equal to 3 mod 11.
In[25]:= PowerMod[3, 1/2, 11]
Out[25]= 5
This returns all integers less than 11 which satisfy the relation.
In[27]:= PowerModList[3, 1/2, 11]
Out[27]= {5, 6}
If d does not have a square root modulo n, PowerMod[d, 1/2, n] will remain unevaluated and PowerModList will return an empty list.
In[28]:= PowerMod[3, 1/2, 5]
Out[28]= PowerMod[3, 1/2, 5]
In[29]:= PowerModList[3, 1/2, 5]
Out[29]= {}
Even for a large modulus, the square root can be computed fairly quickly.
There are φ(k) distinct Dirichlet characters for a given modulus k, as labeled by the index j. Different conventions can give different orderings for the possible characters.
The Euler totient function φ(n) gives the number of integers less than n that are relatively prime to n. An important relation (Fermat's little theorem) is that a^φ(n) ≡ 1 mod n for all a relatively prime to n.
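As an illustrative check (not from the original session), φ(10) = 4, and 3^4 = 81 ≡ 1 mod 10:

In:= {EulerPhi[10], PowerMod[3, EulerPhi[10], 10]}
Out= {4, 1}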
The Möbius function μ(n) is defined to be (-1)^k if n is a product of k distinct primes, and 0 if n contains a squared factor (other than 1). An important relation is the Möbius inversion formula, which states that if g(n) = ∑_{d|n} f(d) for all n, then f(n) = ∑_{d|n} μ(d) g(n/d), where the sums are over all divisors d of n.
The divisor function σ_k(n) is the sum of the k-th powers of the divisors of n. The function σ_0(n) gives the total number of divisors of n, and is variously denoted d(n), ν(n) and τ(n). The function σ_1(n), equal to the sum of the divisors of n, is often denoted σ(n).
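For example (an illustrative evaluation, not from the original session), 12 has the six divisors 1, 2, 3, 4, 6 and 12, which sum to 28:

In:= {DivisorSigma[0, 12], DivisorSigma[1, 12]}
Out= {6, 28}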
The function DivisorSum[n, form] represents the sum of form[i] for all i that divide n. DivisorSum[n, form, cond] includes only those divisors for which cond[i] gives True.
This gives a list of sums for the divisors of five positive integers.
In[38]:= Table[DivisorSum[n, # &], {n, 5}]
Out[38]= {1, 3, 4, 7, 6}
This imposes the condition that the value of each divisor i must be less than 6.
In[39]:= Table[DivisorSum[n, # &, # < 6 &], {n, 11, 15}]
Out[39]= {1, 10, 1, 3, 9}
The Jacobi symbol JacobiSymbol[n, m] reduces to the Legendre symbol (n/m) when m is an odd prime. An integer n is said to be a quadratic residue modulo m if there exists an integer k such that k^2 ≡ n mod m. The full Jacobi symbol is a product of the Legendre symbols (n/p_i) for each of the prime factors p_i such that m = ∏_i p_i.
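As an illustrative check (not from the original session), 4 = 2^2 is a quadratic residue modulo 5, while 3 is not:

In:= {JacobiSymbol[4, 5], JacobiSymbol[3, 5]}
Out= {1, -1}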
The extended GCD ExtendedGCD[n1, n2, …] gives a list {g, {r1, r2, …}} where g is the greatest common divisor of the ni, and the ri are integers such that g = r1 n1 + r2 n2 + … . The extended GCD is important in finding integer solutions to linear Diophantine equations.
The first number in the list is the GCD of 105 and 196.
In[40]:= ExtendedGCD[105, 196]
Out[40]= {7, {-13, 7}}
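The remaining numbers satisfy the stated identity; as an illustrative check (not from the original session):

In:= -13*105 + 7*196
Out= 7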
The multiplicative order function MultiplicativeOrder[k, n] gives the smallest integer m such that k^m ≡ 1 mod n. Then m is known as the order of k modulo n. The notation ord_n(k) is occasionally used.
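For example (an illustrative evaluation, not from the original session), 2^3 = 8 ≡ 1 mod 7, so the order of 2 modulo 7 is 3:

In:= MultiplicativeOrder[2, 7]
Out= 3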
The Carmichael function or least universal exponent λ(n) gives the smallest integer m such that k^m ≡ 1 mod n for all integers k relatively prime to n.
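As an illustrative check (not from the original session), every odd k satisfies k^2 ≡ 1 mod 8, so λ(8) = 2:

In:= CarmichaelLambda[8]
Out= 2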
Continued fractions.
This generates the first 10 terms in the continued fraction representation for π.
In[42]:= ContinuedFraction[Pi, 10]
Out[42]= {3, 7, 15, 1, 292, 1, 1, 1, 2, 1}
This reconstructs the number represented by the list of continued fraction terms.
In[43]:= FromContinuedFraction[%]
Out[43]= 1146408/364913
Continued fractions appear in many number theoretic settings. Rational numbers have terminat-
ing continued fraction representations. Quadratic irrational numbers have continued fraction
representations that become repetitive.
The continued fraction representation of Sqrt[79] starts with the term 8, then involves a sequence of terms that repeat forever.
In[46]:= ContinuedFraction[Sqrt[79]]
Out[46]= {8, {1, 7, 1, 16}}
Continued fraction convergents are often used to approximate irrational numbers by rational ones. Those approximations alternate from above and below, and converge exponentially in the number of terms. Furthermore, a convergent p/q of a simple continued fraction is better than any other rational approximation with denominator less than or equal to q.
This gives a list of rational approximations of 101/9801, derived from its continued fraction
expansion.
In[51]:= Convergents[101/9801]
Out[51]= {0, 1/97, 25/2426, 101/9801}
This lists successive rational approximations to π, until the numerical precision is exhausted.
In[53]:= Convergents[N[Pi]]
Out[53]= {3, 22/7, 333/106, 355/113, 103993/33102, 104348/33215, 208341/66317,
  312689/99532, 833719/265381, 1146408/364913, 4272943/1360120,
  5419351/1725033, 80143857/25510582}
With an exact irrational number, you have to explicitly ask for a certain number of terms.
In[54]:= Convergents[Pi, 10]
Out[54]= {3, 22/7, 333/106, 355/113, 103993/33102, 104348/33215, 208341/66317,
  312689/99532, 833719/265381, 1146408/364913}
LatticeReduce[{v1, v2, …}]           a reduced lattice basis for the set of integer vectors vi
HermiteDecomposition[{v1, v2, …}]    the echelon form for the set of integer vectors vi
The lattice reduction function LatticeReduce[{v1, v2, …}] is used in several kinds of modern algorithms. The basic idea is to think of the vectors vk of integers as defining a mathematical lattice. Any vector representing a point in the lattice can be written as a linear combination of the form ∑ ck vk, where the ck are integers. For a particular lattice, there are many possible choices of the "basis vectors" vk. What LatticeReduce does is to find a reduced set of basis vectors vk' for the lattice, with certain special properties.
Three unit vectors along the three coordinate axes already form a reduced basis.
In[55]:= LatticeReduce[{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}]
Out[55]= {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}
This gives the reduced basis for a lattice in four-dimensional space specified by three vectors.
In[56]:= l = LatticeReduce[{{1, 0, 0, 12345}, {0, 1, 0, 12435}, {0, 0, 1, 12354}}]
Out[56]= {{-1, 0, 1, 9}, {9, 1, -10, 0}, {85, -143, 59, 6}}
Notice that in the last example, LatticeReduce replaces vectors that are nearly parallel by
vectors that are more perpendicular. In the process, it finds some quite short basis vectors.
For a matrix m, HermiteDecomposition gives matrices u and r such that u is unimodular, u.m = r,
and r is in reduced row echelon form. In contrast to RowReduce, pivots may be larger than 1
because there are no fractions in the ring of integers. Entries above a pivot are minimized by
subtracting appropriate multiples of the pivot row.
In this case, the original matrix is recovered because it was in row echelon form.
In[57]:= {u, r} = HermiteDecomposition[l]
Out[57]= {{{1371, 143, 1}, {1381, 144, 1}, {1372, 143, 1}},
  {{1, 0, 0, 12345}, {0, 1, 0, 12435}, {0, 0, 1, 12354}}}
Here the second matrix has some pivots larger than 1, and nonzero entries over pivots.
In[59]:= HermiteDecomposition[{{-2, 1, 1}, {5, 9, 4}, {-4, 2, -11}}]
Out[59]= {{{2, 1, 0}, {3, 2, 1}, {2, 0, -1}}, {{1, 11, 6}, {0, 23, 0}, {0, 0, 13}}}
Here are the digits in the base-2 representation of the number 77.
In[60]:= IntegerDigits[77, 2]
Out[60]= {1, 0, 0, 1, 1, 0, 1}
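FromDigits reverses the operation; as an illustrative check (not from the original session):

In:= FromDigits[{1, 0, 0, 1, 1, 0, 1}, 2]
Out= 77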
Bitwise operations.
Bitwise operations act on integers represented as binary bits. BitAnd[n1, n2, …] yields the integer whose binary bit representation has ones at positions where the binary bit representations of all of the ni have ones. BitOr[n1, n2, …] yields the integer with ones at positions where any of the ni have ones. BitXor[n1, n2] yields the integer with ones at positions where n1 or n2 but not both have ones. BitXor[n1, n2, …] has ones where an odd number of the ni have ones.
This finds the bitwise AND of the numbers 23 and 29 entered in base 2.
In[63]:= BaseForm[BitAnd[2^^10111, 2^^11101], 2]
Out[63]//BaseForm= 10101₂
Bitwise operations are used in various combinatorial algorithms. They are also commonly used in manipulating bitfields in low-level computer languages. In such languages, however, integers normally have a limited number of digits, typically a multiple of 8. Bitwise operations in Mathematica in effect allow integers to have an unlimited number of digits. When an integer is negative, it is taken to be represented in two's complement form, with an infinite sequence of ones on the left. This allows BitNot[n] to be equivalent simply to -1 - n.
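Python's integers are also arbitrary precision and follow the same two's complement convention for negative numbers, so the behavior described above can be checked directly. This is an illustrative Python sketch, not Mathematica code:

```python
# Arbitrary-precision bitwise operations, as in Mathematica.
a = 0b10111  # 23
b = 0b11101  # 29

print(bin(a & b))  # AND: ones where both operands have ones -> 0b10101
print(bin(a | b))  # OR: ones where either operand has ones
print(bin(a ^ b))  # XOR: ones where exactly one operand has a one

# A negative integer acts as two's complement with infinitely many
# leading ones, so bitwise NOT is equivalent to -1 - n:
for n in (0, 1, 77, -5):
    assert ~n == -1 - n
```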
SquareFreeQ[n]   gives True if n does not contain a squared factor, False otherwise
SquareFreeQ[n] checks to see if n has a square prime factor. This is done by computing MoebiusMu[n] and seeing if the result is zero; if it is, then n is not squarefree, otherwise it is. Computing MoebiusMu[n] involves finding the smallest prime factor q of n. If n has a small prime factor (less than or equal to 1223), this is very fast. Otherwise, FactorInteger is used to find q.
NextPrime[n] finds the smallest prime p such that p > n. For n less than 20 digits, the algorithm does a direct search using PrimeQ on the odd numbers greater than n. For n with more than 20 digits, the algorithm builds a small sieve and first checks to see whether the candidate prime is divisible by a small prime before using PrimeQ. This seems to be slightly faster than a direct search.
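The direct-search idea can be sketched in a few lines of Python. Here plain trial division stands in for the strong pseudoprime tests that PrimeQ actually uses, so this only illustrates the search strategy, not Mathematica's implementation:

```python
def is_prime(n):
    """Trial division; a simple stand-in for Mathematica's PrimeQ."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def next_prime(n):
    """Smallest prime p with p > n, by direct search on odd candidates."""
    p = n + 1
    if p <= 2:
        return 2
    if p % 2 == 0:
        p += 1
    while not is_prime(p):
        p += 2
    return p

print(next_prime(100))  # 101
```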
Even for large numbers, the next prime can be computed rather quickly.
For RandomPrime[{min, max}] and RandomPrime[max], a random prime p is obtained by randomly selecting from a prime lookup table if max is small and by a random search of integers in the range if max is large. If no prime exists in the specified range, the input is returned unevaluated with an error message.
The algorithm for PrimePowerQ involves first computing the least prime factor p of n and then attempting division of n by p until either 1 is obtained, in which case n is a prime power, or until division is no longer possible, in which case n is not a prime power.
The Chinese remainder theorem states that a certain class of simultaneous congruences always has a solution. ChineseRemainder[list1, list2] finds the smallest non-negative integer r such that Mod[r, list2] is list1. The solution is unique modulo the least common multiple of the elements of list2.
This means that 244 ≡ 0 mod 4, 244 ≡ 1 mod 9, and 244 ≡ 2 mod 121.
In[73]:= ChineseRemainder[{0, 1, 2}, {4, 9, 121}]
Out[73]= 244
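The same result can be reproduced with the standard constructive algorithm (successive substitution with modular inverses). A Python sketch for pairwise coprime moduli; this is an illustration, not Mathematica's implementation:

```python
def chinese_remainder(residues, moduli):
    """Smallest non-negative r with r % m_i == a_i (moduli pairwise coprime)."""
    r, m = 0, 1
    for a, n in zip(residues, moduli):
        # Solve r + m*t == a (mod n) for t using the modular inverse of m.
        t = ((a - r) * pow(m, -1, n)) % n
        r += m * t
        m *= n
    return r

print(chinese_remainder([0, 1, 2], [4, 9, 121]))  # 244
```

Since 4, 9 and 121 are pairwise coprime, the answer is unique modulo 4·9·121 = 4356.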
PrimitiveRoot[n] returns a generator for the group of numbers relatively prime to n under multiplication mod n. This has a generator if and only if n is 2, 4, a power of an odd prime, or twice a power of an odd prime. If n is a prime or prime power, the least positive primitive root will be returned.
In[78]:= PrimitiveRoot[1093^3]
Out[78]= 5
In[79]:= PrimitiveRoot[2 5^5]
Out[79]= 3127
If the argument is composite and not a prime power or twice a prime power, the function does
not evaluate.
In[80]:= PrimitiveRoot[11 13]
Out[80]= PrimitiveRoot[143]
Combinatorial Functions
n!   factorial n(n - 1)(n - 2)⋯1
n!!   double factorial n(n - 2)(n - 4)⋯
Binomial[n, m]   binomial coefficient n!/[m!(n - m)!]
Multinomial[n1, n2, …]   multinomial coefficient (n1 + n2 + ⋯)!/(n1! n2! ⋯)
CatalanNumber[n]   Catalan number c_n
Combinatorial functions.
The factorial function n! gives the number of ways of ordering n objects. For non-integer n, the numerical value of n! is obtained from the gamma function, discussed in "Special Functions".
The binomial coefficient Binomial[n, m] can be written as n!/[m!(n - m)!]. It gives the number of ways of choosing m objects from a collection of n objects, without regard to order.
The Catalan numbers, which appear in various tree enumeration problems, are given in terms of binomial coefficients as c_n = Binomial[2n, n]/(n + 1).
The subfactorial Subfactorial[n] gives the number of permutations of n objects that leave no object fixed. Such a permutation is called a derangement. The subfactorial is given by n! Σ_{k=0}^{n} (-1)^k/k!.
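The subfactorial formula can be checked with a short Python sketch; since n!/k! is always an integer, exact integer arithmetic suffices (illustrative, not Mathematica code):

```python
from math import factorial

def subfactorial(n):
    """Number of derangements of n objects: n! * sum_{k=0}^{n} (-1)^k / k!."""
    return sum((-1) ** k * factorial(n) // factorial(k) for k in range(n + 1))

print([subfactorial(n) for n in range(6)])  # [1, 0, 1, 2, 9, 44]
```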
Mathematica gives the exact integer result for the factorial of an integer.
In[1]:= 30!
Out[1]= 265252859812191058636308480000000
This gives the number of ways of partitioning 6 + 5 = 11 objects into sets containing 6 and 5
objects.
In[4]:= Multinomial[6, 5]
Out[4]= 462
The result is the same as Binomial[11, 6].
In[5]:= Binomial[11, 6]
Out[5]= 462
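For comparison, the same counts are easy to reproduce in Python with math.comb; the multinomial helper below is an illustrative definition, not a library function:

```python
from math import comb, factorial

# Binomial[11, 6]
print(comb(11, 6))  # 462

def multinomial(*ns):
    """(n1 + n2 + ...)! / (n1! n2! ...), e.g. Multinomial[6, 5]."""
    result = factorial(sum(ns))
    for n in ns:
        result //= factorial(n)
    return result

print(multinomial(6, 5))  # 462
```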
The Fibonacci numbers Fibonacci[n] satisfy the recurrence relation F_n = F_{n-1} + F_{n-2} with F_1 = F_2 = 1. They appear in a wide range of discrete mathematical problems. For large n, F_n/F_{n-1} approaches the golden ratio. The Lucas numbers LucasL[n] satisfy the same recurrence relation as the Fibonacci numbers do, but with initial conditions L_1 = 1 and L_2 = 3. The Fibonacci polynomials Fibonacci[n, x] satisfy the generating function relation t/(1 - x t - t²) = Σ_{n=0}^∞ F_n(x) t^n.
The harmonic numbers HarmonicNumber[n] are given by H_n = Σ_{i=1}^{n} 1/i; the harmonic numbers of order r HarmonicNumber[n, r] are given by H_n^(r) = Σ_{i=1}^{n} 1/i^r. Harmonic numbers appear in many combinatorial estimation problems, often playing the role of discrete analogs of logarithms.
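Exact harmonic numbers are easy to reproduce with rational arithmetic; a small Python sketch (illustrative, not Mathematica code):

```python
from fractions import Fraction

def harmonic(n, r=1):
    """H_n^(r) = sum_{i=1}^{n} 1/i^r, as an exact rational number."""
    return sum(Fraction(1, i ** r) for i in range(1, n + 1))

print(harmonic(4))     # 25/12
print(harmonic(3, 2))  # 49/36
```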
The Bernoulli polynomials BernoulliB[n, x] satisfy the generating function relation t e^{x t}/(e^t - 1) = Σ_{n=0}^∞ B_n(x) t^n/n!. The Bernoulli numbers BernoulliB[n] are given by B_n = B_n(0). The B_n appear as the coefficients of the terms in the Euler–Maclaurin summation formula for approximating integrals. The Bernoulli numbers are related to the Genocchi numbers by G_n = 2(1 - 2^n) B_n.
Numerical values for Bernoulli numbers are needed in many numerical algorithms. You can
always get these numerical values by first finding exact rational results using BernoulliB[n], and then applying N.
The Euler numbers EulerE[n] are given by E_n = 2^n E_n(1/2). For positive integer values of a, the Nörlund polynomials give higher-order Bernoulli numbers. The generalized Bernoulli polynomials NorlundB[n, a, x] satisfy the generating function relation t^a e^{x t}/(e^t - 1)^a = Σ_{n=0}^∞ B_n^(a)(x) t^n/n!.
You can also get Bernoulli polynomials by explicitly computing the power series for the generating function.
In[7]:= Series[t Exp[x t]/(Exp[t] - 1), {t, 0, 4}]
Out[7]= 1 + (-(1/2) + x) t + (1/12)(1 - 6 x + 6 x^2) t^2 + (1/12)(x - 3 x^2 + 2 x^3) t^3 + (1/720)(-1 + 30 x^2 - 60 x^3 + 30 x^4) t^4 + O[t]^5
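The Bernoulli numbers themselves can be generated with exact rational arithmetic using the standard recurrence Σ_{k=0}^{n} Binomial[n + 1, k] B_k = 0 for n ≥ 1. This Python sketch is illustrative and is not how BernoulliB is implemented:

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(nmax):
    """B_0 .. B_nmax via sum_{k=0}^{n} C(n+1, k) B_k = 0 for n >= 1."""
    B = [Fraction(1)]
    for n in range(1, nmax + 1):
        s = sum(comb(n + 1, k) * B[k] for k in range(n))
        B.append(-s / (n + 1))
    return B

print(bernoulli_numbers(6))
# B_1 = -1/2, B_2 = 1/6, odd B_n vanish for n >= 3, B_4 = -1/30, B_6 = 1/42
```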
Stirling numbers show up in many combinatorial enumeration problems. For Stirling numbers of the first kind StirlingS1[n, m], (-1)^{n-m} S_n^(m) gives the number of permutations of n elements which contain exactly m cycles. These Stirling numbers satisfy the generating function relation x(x - 1)⋯(x - n + 1) = Σ_{m=0}^{n} S_n^(m) x^m. Note that some definitions of the S_n^(m) differ by a factor (-1)^{n-m}.
Stirling numbers of the second kind StirlingS2[n, m], sometimes denoted 𝒮_n^(m), give the number of ways of partitioning a set of n elements into m non-empty subsets. They satisfy the relation x^n = Σ_{m=0}^{n} 𝒮_n^(m) x(x - 1)⋯(x - m + 1).
The Bell numbers BellB[n] give the total number of ways that a set of n elements can be partitioned into non-empty subsets. The Bell polynomials BellB[n, x] satisfy the generating function relation e^{(e^t - 1) x} = Σ_{n=0}^∞ B_n(x) t^n/n!.
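Stirling numbers of the second kind satisfy the recurrence 𝒮(n, m) = m 𝒮(n−1, m) + 𝒮(n−1, m−1), and the Bell numbers are their row sums. Neither fact is stated above, but both are standard; a Python sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, m):
    """Stirling numbers of the second kind: partitions of n elements into m blocks."""
    if n == m:
        return 1
    if m == 0 or n == 0:
        return 0
    return m * stirling2(n - 1, m) + stirling2(n - 1, m - 1)

# Ways to partition 4 elements into 2 non-empty subsets:
print(stirling2(4, 2))  # 7

# Bell number B_4 = total number of partitions of a 4-element set:
print(sum(stirling2(4, m) for m in range(5)))  # 15
```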
The partition function PartitionsP[n] gives the number of ways of writing the integer n as a sum of positive integers, without regard to order. PartitionsQ[n] gives the number of ways of writing n as a sum of positive integers, with the constraint that all the integers in each sum are distinct.
This gives the number of partitions of 100, with and without the constraint that the terms
should be distinct.
In[13]:= {PartitionsQ[100], PartitionsP[100]}
Out[13]= {444793, 190569292}
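The value of p(100) can be reproduced with Euler's pentagonal number recurrence p(n) = Σ_{k≥1} (−1)^{k+1} [p(n − k(3k−1)/2) + p(n − k(3k+1)/2)]. A Python sketch, illustrative and not necessarily the algorithm PartitionsP uses:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions_p(n):
    """Number of partitions of n, via Euler's pentagonal number recurrence."""
    if n == 0:
        return 1
    if n < 0:
        return 0
    total, k = 0, 1
    while True:
        g1 = k * (3 * k - 1) // 2  # generalized pentagonal numbers
        g2 = k * (3 * k + 1) // 2
        if g1 > n and g2 > n:
            break
        sign = -1 if k % 2 == 0 else 1
        total += sign * (partitions_p(n - g1) + partitions_p(n - g2))
        k += 1
    return total

print(partitions_p(100))  # 190569292
```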
The partition function p(n) increases asymptotically like e^√n. Note that you cannot simply use Plot to generate a plot of a function like PartitionsP because the function can only be evaluated with integer arguments.
In[14]:= ListPlot[Table[N[Log[PartitionsP[n]]], {n, 100}]]
Out[14]= (plot of Log p(n) for n up to 100)
Most of the functions here allow you to count various kinds of combinatorial objects. Functions like IntegerPartitions and Permutations allow you instead to generate lists of various combinations of elements.
The signature function Signature[{i1, i2, …}] gives the signature of a permutation. It is equal to +1 for even permutations (composed of an even number of transpositions), and to -1 for odd permutations. The signature function can be thought of as a totally antisymmetric tensor, Levi-Civita symbol or epsilon symbol.
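The signature can be computed by counting inversions: each inversion corresponds to one transposition of adjacent elements. A Python sketch (returning 0 for repeated elements, as Signature does):

```python
def signature(perm):
    """+1 for even permutations, -1 for odd, 0 if any element repeats."""
    perm = list(perm)
    if len(set(perm)) != len(perm):
        return 0
    sign = 1
    # Count inversions by explicit comparison (O(n^2); fine for small lists).
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

print(signature([1, 2, 3]))  # 1
print(signature([1, 3, 2]))  # -1
print(signature([1, 1, 2]))  # 0
```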
Clebsch–Gordan coefficients and n-j symbols arise in the study of angular momenta in quantum mechanics, and in other applications of the rotation group. The Clebsch–Gordan coefficients ClebschGordan[{j1, m1}, {j2, m2}, {j, m}] give the coefficients in the expansion of the quantum mechanical angular momentum state |j, m⟩ in terms of products of states |j1, m1⟩ |j2, m2⟩.
The 3-j symbols or Wigner coefficients ThreeJSymbol[{j1, m1}, {j2, m2}, {j3, m3}] are a more symmetrical form of Clebsch–Gordan coefficients. In Mathematica, the Clebsch–Gordan coefficients are given in terms of 3-j symbols by C^{j3 m3}_{j1 m1 j2 m2} = (-1)^{m3 + j1 - j2} Sqrt[2 j3 + 1] ThreeJSymbol[{j1, m1}, {j2, m2}, {j3, -m3}].
The 6-j symbols SixJSymbol[{j1, j2, j3}, {j4, j5, j6}] give the couplings of three quantum mechanical angular momentum states. The Racah coefficients are related by a phase to the 6-j symbols.
Mathematica gives exact results for logarithms whenever it can. Here is log₂ 1024.
In[1]:= Log[2, 1024]
Out[1]= 10
You can find the numerical values of mathematical functions to any precision.
In[2]:= N[Log[2], 40]
Out[2]= 0.6931471805599453094172321214581765680755
You can convert from degrees by explicitly multiplying by the constant Degree.
In[6]:= N[Sin[30 Degree]]
Out[6]= 0.5
Here is a plot of the hyperbolic tangent function. It has a characteristic "sigmoidal" form.
In[7]:= Plot[Tanh[x], {x, -8, 8}]
Out[7]= (plot of the sigmoidal Tanh curve)
The haversine function Haversine[z] is defined by sin²(z/2). The inverse haversine function InverseHaversine[z] is defined by 2 sin⁻¹(√z). The Gudermannian function Gudermannian[z] is defined as gd(z) = 2 tan⁻¹(e^z) - π/2. The inverse Gudermannian function InverseGudermannian[z] is defined by gd⁻¹(z) = log[tan(z/2 + π/4)]. The Gudermannian satisfies such relations as sinh(z) = tan[gd(z)]. The sinc function Sinc[z] is the Fourier transform of a square signal.
There are a number of additional trigonometric and hyperbolic functions that are sometimes used. The versine function is sometimes encountered in the literature and simply is vers(z) = 2 hav(z). The coversine function is defined as covers(z) = 1 - sin(z). The complex exponential e^{ix} is sometimes written as cis(x).
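The definitions and identities above are easy to verify numerically. A Python sketch using the stated definitions (the helper names here are hypothetical, chosen only for this illustration):

```python
import math

def haversine(z):
    """hav(z) = sin^2(z/2)."""
    return math.sin(z / 2) ** 2

def gudermannian(z):
    """gd(z) = 2 arctan(e^z) - pi/2."""
    return 2 * math.atan(math.exp(z)) - math.pi / 2

def inverse_gudermannian(z):
    """gd^{-1}(z) = log(tan(z/2 + pi/4))."""
    return math.log(math.tan(z / 2 + math.pi / 4))

z = 0.7
print(abs(math.sinh(z) - math.tan(gudermannian(z))) < 1e-9)    # sinh(z) = tan(gd(z))
print(abs(inverse_gudermannian(gudermannian(z)) - z) < 1e-9)   # gd^{-1}(gd(z)) = z
print(abs(2 * math.asin(math.sqrt(haversine(z))) - z) < 1e-9)  # inverse haversine
```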
The need to make one choice from two solutions means that Sqrt@xD cannot be a true inverse
function for x ^ 2. Taking a number, squaring it, and then taking the square root can give you a
different number than you started with.
Squaring and taking the square root does not necessarily give you the number you started with.
In[2]:= Sqrt[(-2)^2]
Out[2]= 2
When you evaluate √(-2 i), there are again two possible answers: -1 + i and 1 - i. In this case, however, it is less clear which one to choose.
There is in fact no way to choose √z so that it is continuous for all complex values of z. There has to be a "branch cut": a line in the complex plane across which the function √z is discontinuous. Mathematica adopts the usual convention of taking the branch cut for √z to be along the negative real axis.
The branch cut in Sqrt along the negative real axis means that values of Sqrt@zD with z just
above and below the axis are very different.
In[4]:= {Sqrt[-2 + 0.1 I], Sqrt[-2 - 0.1 I]}
Out[4]= {0.0353443 + 1.41466 I, 0.0353443 - 1.41466 I}
The discontinuity along the negative real axis is quite clear in this three-dimensional picture of
the imaginary part of the square root function.
In[6]:= Plot3D[Im[Sqrt[x + I y]], {x, -4, 4}, {y, -4, 4}]
Out[6]= (surface plot showing the discontinuity along the negative real axis)
When you find an nth root using z^(1/n), there are, in principle, n possible results. To get a single value, you have to choose a particular principal root. There is absolutely no guarantee that taking the nth root of an nth power will leave you with the same number.
This takes the tenth power of a complex number. The result is unique.
In[7]:= (2.5 + I)^10
Out[7]= -15781.2 - 12335.8 I
There are 10 possible tenth roots. Mathematica chooses one of them. In this case it is not the
number whose tenth power you took.
In[8]:= %^(1/10)
Out[8]= 2.61033 - 0.660446 I
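Python's complex arithmetic chooses principal values in the same spirit, so the phenomenon can be reproduced there; a sketch:

```python
# Raising to the 1/10 power picks the principal 10th root, which need
# not be the number whose 10th power was taken.
z = 2.5 + 1j
w = z ** 10          # the 10th power is unique
r = w ** (1 / 10)    # one of the 10 possible 10th roots is chosen

print(abs(r - z) > 1)           # True: a different 10th root was chosen
print(abs(r ** 10 - w) < 1e-4)  # True: but it is a genuine 10th root of w
```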
There are many mathematical functions which, like roots, essentially give solutions to equations. The logarithm function and the inverse trigonometric functions are examples. In almost all cases, there are many possible solutions to the equations. Unique "principal" values nevertheless have to be chosen for the functions. The choices cannot be made continuous over the whole complex plane. Instead, lines of discontinuity, or branch cuts, must occur. The positions of these branch cuts are often quite arbitrary. Mathematica makes the most standard mathematical choices for them.
Sqrt[z] and z^s   (-∞, 0) for Re s > 0, (-∞, 0] for Re s ≤ 0 (s not an integer)
Exp[z]   none
Log[z]   (-∞, 0]
trigonometric functions   none
ArcSin[z] and ArcCos[z]   (-∞, -1) and (+1, +∞)
ArcTan[z]   (-i∞, -i] and [+i, +i∞)
ArcCsc[z] and ArcSec[z]   (-1, +1)
ArcCot[z]   [-i, +i]
hyperbolic functions   none
ArcSinh[z]   (-i∞, -i) and (+i, +i∞)
ArcCosh[z]   (-∞, +1)
ArcTanh[z]   (-∞, -1] and [+1, +∞)
ArcCsch[z]   (-i, i)
ArcSech[z]   (-∞, 0] and (+1, +∞)
ArcCoth[z]   [-1, +1]
ArcSin is a multiple-valued function, so there is no guarantee that it always gives the "inverse"
of Sin.
In[9]:= ArcSin[Sin[4.5]]
Out[9]= -1.35841
Values of ArcSin@zD on opposite sides of the branch cut can be very different.
In[10]:= {ArcSin[2 + 0.1 I], ArcSin[2 - 0.1 I]}
Out[10]= {1.51316 + 1.31888 I, 1.51316 - 1.31888 I}
A three-dimensional picture, showing the two branch cuts for the function sin⁻¹(z).
In[11]:= Plot3D[Im[ArcSin[x + I y]], {x, -4, 4}, {y, -4, 4}]
Out[11]= (surface plot)
Mathematical Constants
I   i = √(-1)
Infinity   ∞
Pi   π ≈ 3.14159
Degree   π/180: degrees-to-radians conversion factor
E   e ≈ 2.71828
EulerGamma   Euler's constant γ ≈ 0.577216
Mathematical constants.
Euler's constant EulerGamma is given by the limit γ = lim_{m→∞} (Σ_{k=1}^{m} 1/k - log m). It appears in many integrals and asymptotic formulas.
Catalan's constant Catalan is given by the sum Σ_{k=0}^∞ (-1)^k (2k + 1)^{-2}. It often appears in asymptotic estimates of combinatorial functions.
Khinchin's constant Khinchin is given by Π_{s=1}^∞ [1 + 1/(s(s + 2))]^{log₂ s}. It gives the geometric mean of the terms in the continued fraction representation of a typical real number.
Orthogonal Polynomials
Orthogonal polynomials.
The associated Legendre polynomials LegendreP[n, m, x] are obtained from derivatives of the Legendre polynomials according to P_n^m(x) = (-1)^m (1 - x²)^{m/2} d^m[P_n(x)]/dx^m. Notice that for odd integers m ≤ n, the P_n^m(x) contain powers of √(1 - x²), and are therefore not strictly polynomials.
The integral ∫₋₁¹ P₇(x) P₈(x) dx gives zero by virtue of the orthogonality of the Legendre polynomials.
In[2]:= Integrate[LegendreP[7, x] LegendreP[8, x], {x, -1, 1}]
Out[2]= 0
GegenbauerC[n, 0, x] is always equal to zero. GegenbauerC[n, x] is however given by the limit lim_{m→0} C_n^(m)(x)/m. This form is sometimes denoted C_n^(0)(x).
Series of Chebyshev polynomials are often used in making numerical approximations to functions. The Chebyshev polynomials of the first kind ChebyshevT[n, x] are defined by T_n(cos θ) = cos(n θ). They are normalized so that T_n(1) = 1. They satisfy the orthogonality relation ∫₋₁¹ T_m(x) T_n(x) (1 - x²)^{-1/2} dx = 0 for m ≠ n. The T_n(x) also satisfy an orthogonality relation under summation at discrete points.
The name "Chebyshev" is a transliteration from the Cyrillic alphabet; several other spellings, such as "Tschebyscheff", are sometimes used.
A modified form of the Hermite polynomials sometimes used is He_n(x) = 2^{-n/2} H_n(x/√2) (a different overall normalization of the He_n(x) is also sometimes used).
The Hermite polynomials are related to the parabolic cylinder functions or Weber functions D_n(x) by D_n(x) = 2^{-n/2} e^{-x²/4} H_n(x/√2).
This gives the density for an excited state of a quantum-mechanical harmonic oscillator. The
average of the wiggles is roughly the classical physics result.
In[7]:= Plot[(HermiteH[6, x] Exp[-x^2/2])^2, {x, -6, 6}]
Out[7]= (oscillatory plot)
You can get formulas for generalized Laguerre polynomials with arbitrary values of a.
In[8]:= LaguerreL[2, a, x]
Out[8]= (1/2)(2 + 3 a + a^2 - 4 x - 2 a x + x^2)
Zernike radial polynomials ZernikeR[n, m, x] are used in studies of aberrations in optics. They satisfy the orthogonality relation ∫₀¹ R_n^(m)(x) R_k^(m)(x) x dx = 0 for n ≠ k.
Jacobi polynomials JacobiP[n, a, b, x] satisfy the orthogonality relation ∫₋₁¹ P_m^(a,b)(x) P_n^(a,b)(x) (1 - x)^a (1 + x)^b dx = 0 for m ≠ n. Legendre, Gegenbauer, Chebyshev and Zernike polynomials can all be viewed as special cases of Jacobi polynomials. The Jacobi polynomials are sometimes given in the alternative form G_n(p, q, x) = n! Γ(n + p)/Γ(2n + p) P_n^(p-q, q-1)(2x - 1).
Special Functions
Mathematica includes all the common special functions of mathematical physics found in standard handbooks. We will discuss each of the various classes of functions in turn.
One point you should realize is that in the technical literature there are often several conflicting
definitions of any particular special function. When you use a special function in Mathematica,
therefore, you should be sure to look at the definition given here to confirm that it is exactly
what you want.
Special functions in Mathematica can usually be evaluated for arbitrary complex values of their arguments. Often, however, the defining relations given in this tutorial apply only for some special choices of arguments. In these cases, the full function corresponds to a suitable extension or "analytic continuation" of these defining relations. Thus, for example, integral representations of functions are valid only when the integral exists, but the functions themselves can usually be defined elsewhere by analytic continuation.
As a simple example of how the domain of a function can be extended, consider the function represented by the sum Σ_{n=0}^∞ x^n. This sum converges only when |x| < 1. Nevertheless, it is easy to show analytically that for any x, the complete function is equal to 1/(1 - x). Using this form, you can easily find a value of the function for any x, at least so long as x ≠ 1.
The Euler gamma function Gamma[z] is defined by the integral Γ(z) = ∫₀^∞ t^{z-1} e^{-t} dt. For positive integer n, Γ(n) = (n - 1)!. Γ(z) can be viewed as a generalization of the factorial function, valid for complex arguments z.
There are some computations, particularly in number theory, where the logarithm of the gamma function often appears. For positive real arguments, you can evaluate this simply as Log[Gamma[z]]. For complex arguments, however, this form yields spurious discontinuities. Mathematica therefore includes the separate function LogGamma[z], which yields the logarithm of the gamma function with a single branch cut along the negative real axis.
The Euler beta function Beta[a, b] is B(a, b) = Γ(a) Γ(b)/Γ(a + b) = ∫₀¹ t^{a-1} (1 - t)^{b-1} dt.
The alternative incomplete gamma function γ(a, z) can therefore be obtained in Mathematica as Gamma[a, 0, z].
The incomplete beta function Beta[z, a, b] is given by B_z(a, b) = ∫₀^z t^{a-1} (1 - t)^{b-1} dt. Notice that in the incomplete beta function, the parameter z is an upper limit of integration, and appears as the first argument of the function. In the incomplete gamma function, on the other hand, z is a lower limit of integration, and appears as the second argument of the function.
In certain cases, it is convenient not to compute the incomplete beta and gamma functions on their own, but instead to compute regularized forms in which these functions are divided by complete beta and gamma functions. Mathematica includes the regularized incomplete beta function BetaRegularized[z, a, b] defined for most arguments by I(z, a, b) = B(z, a, b)/B(a, b), but taking into account singular cases. Mathematica also includes the regularized incomplete gamma function GammaRegularized[a, z] defined by Q(a, z) = Γ(a, z)/Γ(a), with singular cases taken into account.
The incomplete beta and gamma functions, and their inverses, are common in statistics. The inverse beta function InverseBetaRegularized[s, a, b] is the solution for z in s = I(z, a, b). The inverse gamma function InverseGammaRegularized[a, s] is similarly the solution for z in s = Q(a, z).
Derivatives of the gamma function often appear in summing rational series. The digamma function PolyGamma[z] is the logarithmic derivative of the gamma function, given by ψ(z) = Γ′(z)/Γ(z). For integer arguments, the digamma function satisfies the relation ψ(n) = -γ + H_{n-1}, where γ is Euler's constant (EulerGamma in Mathematica) and H_n are the harmonic numbers.
The polygamma functions PolyGamma[n, z] are given by ψ^(n)(z) = d^n ψ(z)/dz^n. Notice that the digamma function corresponds to ψ^(0)(z). The general form ψ^(n)(z) is the (n + 1)th, not the nth, logarithmic derivative of the gamma function. The polygamma functions satisfy the relation ψ^(n)(z) = (-1)^{n+1} n! Σ_{k=0}^∞ 1/(z + k)^{n+1}. PolyGamma[n, z] is defined for arbitrary complex n by fractional calculus analytic continuation.
BarnesG[z] is a generalization of the Gamma function and is defined by its functional identity BarnesG[z + 1] = Gamma[z] BarnesG[z], where the third derivative of the logarithm of BarnesG is positive for positive z. BarnesG is an entire function in the complex plane.
LogBarnesG[z] is a holomorphic function with a branch cut along the negative real axis such that Exp[LogBarnesG[z]] = BarnesG[z].
Many exact results for gamma and polygamma functions are built into Mathematica.
In[1]:= PolyGamma[6]
Out[1]= 137/60 - EulerGamma
Zeta functions with integer arguments arise in evaluating various sums and integrals. Mathematica gives exact results when possible for zeta functions with integer arguments.
There is an analytic continuation of ζ(s) for arbitrary complex s ≠ 1. The zeta function for complex arguments is central to number theoretic studies of the distribution of primes. Of particular importance are the values on the critical line Re(s) = 1/2.
In studying ζ(1/2 + i t), it is often convenient to define the two Riemann–Siegel functions RiemannSiegelZ[t] and RiemannSiegelTheta[t] according to Z(t) = e^{i ϑ(t)} ζ(1/2 + i t) and ϑ(t) = Im log Γ(1/4 + i t/2) - t log(π)/2 (for t real). Note that the Riemann–Siegel functions are both real as long as t is real.
The Stieltjes constants StieltjesGamma[n] are generalizations of Euler's constant which appear in the series expansion of ζ(s) around its pole at s = 1; the coefficient of (1 - s)^n is γ_n/n!. Euler's constant is γ₀.
The generalized Riemann zeta function Zeta[s, a] is implemented as ζ(s, a) = Σ_{k=0}^∞ ((k + a)²)^{-s/2}, where any term with k + a = 0 is excluded.
The Ramanujan τ Dirichlet L-function RamanujanTauL[s] is defined by L(s) = Σ_{n=1}^∞ τ(n)/n^s (for Re(s) > 6), with coefficients RamanujanTau[n]. In analogy with the Riemann zeta function, it is again convenient to define the functions RamanujanTauZ[t] and RamanujanTauTheta[t].
Here is a three-dimensional picture of the Riemann zeta function in the complex plane.
In[14]:= Plot3D[Abs[Zeta[x + I y]], {x, -3, 3}, {y, 2, 35}]
Out[14]= (surface plot)
This is a plot of the absolute value of the Riemann zeta function on the critical line Re z = 1/2. You can see the first few zeros of the zeta function.
In[15]:= Plot[Abs[Zeta[1/2 + I y]], {y, 0, 40}]
Out[15]= (plot)
This is a plot of the absolute value of the Ramanujan τ L-function on its critical line Re z = 6.
In[4]:= Plot[Abs[RamanujanTauL[6 + I y]], {y, 0, 20}]
Out[4]= (plot)
The Nielsen generalized polylogarithm functions PolyLog[n, p, z] are given by S_{n,p}(z) = (-1)^{n+p-1}/((n - 1)! p!) ∫₀¹ log^{n-1}(t) log^p(1 - z t)/t dt. Polylogarithm functions appear in Feynman diagram integrals in relativistic quantum field theory.
Sums of reciprocal powers can often be expressed in terms of the Lerch transcendent. For example, the Catalan beta function β(s) = Σ_{k=0}^∞ (-1)^k (2k + 1)^{-s} can be obtained as 2^{-s} Φ(-1, s, 1/2).
The Lerch transcendent can also be used to evaluate Dirichlet L-series which appear in number theory. The basic L-series has the form L(s, χ) = Σ_{k=1}^∞ χ(k) k^{-s}, where the "character" χ(k) is an integer function with period m. L-series of this kind can be written as sums of Lerch functions with z a power of e^{2 π i/m}.
LerchPhi[z, s, a, DoublyInfinite -> True] gives the doubly infinite sum Σ_{k=-∞}^∞ z^k ((a + k)²)^{-s/2}.
ZetaZero[k]   the kth zero of the zeta function ζ(s) on the critical line
This gives the first zero with height greater than 15.
In[3]:= N[ZetaZero[1, 15]]
Out[3]= 0.5 + 21.022 I
The second exponential integral function ExpIntegralEi[z] is defined by Ei(z) = -∫_{-z}^∞ e^{-t}/t dt (for z > 0), where the principal value of the integral is taken. The logarithmic integral function LogIntegral[z] is given by li(z) = ∫₀^z dt/log t (for z > 1), where the principal value of the integral is taken. li(z) is central to the study of the distribution of primes in number theory. The logarithmic integral function is sometimes also denoted by Li(z). In some number theoretic applications, li(z) is defined as ∫₂^z dt/log t, with no principal value taken. This differs from the definition used in Mathematica by the constant li(2).
The sine and cosine integral functions SinIntegral[z] and CosIntegral[z] are defined by Si(z) = ∫₀^z sin(t)/t dt and Ci(z) = -∫_z^∞ cos(t)/t dt. The hyperbolic sine and cosine integral functions SinhIntegral[z] and CoshIntegral[z] are defined by Shi(z) = ∫₀^z sinh(t)/t dt and Chi(z) = γ + log(z) + ∫₀^z (cosh(t) - 1)/t dt.
The error function Erf[z] is the integral of the Gaussian distribution, given by erf(z) = 2/√π ∫₀^z e^{-t²} dt. The complementary error function Erfc[z] is given simply by erfc(z) = 1 - erf(z). The imaginary error function Erfi[z] is given by erfi(z) = erf(i z)/i. The generalized error function Erf[z₀, z₁] is defined by the integral 2/√π ∫_{z₀}^{z₁} e^{-t²} dt. The error function is central to many calculations in statistics.
The inverse error function InverseErf[s] is defined as the solution for z in the equation s = erf(z). The inverse error function appears in computing confidence intervals in statistics as well as in some algorithms for generating Gaussian random numbers.
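Python's standard library provides erf but no inverse; one can be sketched from the normal quantile function, using the relation erf(x) = 2 Φ(x √2) − 1 between the error function and the standard normal CDF Φ. This is an illustration, not Mathematica's InverseErf:

```python
import math
from statistics import NormalDist

def inverse_erf(s):
    """Solve s = erf(z) via the normal quantile: erf(x) = 2*Phi(x*sqrt(2)) - 1."""
    return NormalDist().inv_cdf((1 + s) / 2) / math.sqrt(2)

s = 0.3
x = inverse_erf(s)
print(abs(math.erf(x) - s) < 1e-9)  # True: erf(inverse_erf(s)) == s
```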
Closely related to the error function are the Fresnel integrals FresnelC[z] defined by C(z) = ∫₀^z cos(π t²/2) dt and FresnelS[z] defined by S(z) = ∫₀^z sin(π t²/2) dt. Fresnel integrals occur in diffraction theory.
The Bessel functions BesselJ[n, z] and BesselY[n, z] are linearly independent solutions to the differential equation z² y″ + z y′ + (z² - n²) y = 0. For integer n, the J_n(z) are regular at z = 0, while the Y_n(z) have a logarithmic divergence at z = 0. Bessel functions arise in solving differential equations for systems with cylindrical symmetry. J_n(z) is often called the Bessel function of the first kind, or simply the Bessel function. Y_n(z) is referred to as the Bessel function of the second kind, the Weber function, or the Neumann function (denoted N_n(z)).
The Hankel functions (or Bessel functions of the third kind) HankelH1[n, z] and HankelH2[n, z] give an alternative pair of solutions to the Bessel differential equation, related according to H_n^(1,2)(z) = J_n(z) ± i Y_n(z).
The spherical Bessel functions SphericalBesselJ[n, z] and SphericalBesselY[n, z], as well as the spherical Hankel functions SphericalHankelH1[n, z] and SphericalHankelH2[n, z], are related to the ordinary Bessel functions by f_n(z) = √(π/(2 z)) F_{n+1/2}(z), where f and F can be j and J, y and Y, or h^i and H^i. For integer n, spherical Bessel functions can be expanded in terms of elementary functions by using FunctionExpand.
The modified Bessel functions BesselI[n, z] and BesselK[n, z] are solutions to the differential equation z² y″ + z y′ - (z² + n²) y = 0. For integer n, I_n(z) is regular at z = 0; K_n(z) always has a logarithmic divergence at z = 0. The I_n(z) are sometimes known as hyperbolic Bessel functions.
Particularly in electrical engineering, one often defines the Kelvin functions KelvinBer[n, z], KelvinBei[n, z], KelvinKer[n, z] and KelvinKei[n, z]. These are related to the ordinary Bessel functions by ber_n(z) + i bei_n(z) = e^{n π i} J_n(z e^{-π i/4}) and ker_n(z) + i kei_n(z) = e^{-n π i/2} K_n(z e^{π i/4}).
The Airy functions AiryAi[z] and AiryBi[z] are the two independent solutions Ai(z) and Bi(z) to the differential equation y″ - z y = 0. Ai(z) tends to zero for large positive z, while Bi(z) increases unboundedly. The Airy functions are related to modified Bessel functions with one-third-integer orders. The Airy functions often appear as the solutions to boundary value problems in electromagnetic theory and quantum mechanics. In many cases the derivatives of the Airy functions AiryAiPrime[z] and AiryBiPrime[z] also appear.
The Struve function StruveH[n, z] appears in the solution of the inhomogeneous Bessel equation, which for integer n has the form z² y″ + z y′ + (z² - n²) y = 2 z^{n+1}/(π (2n - 1)!!); the general solution to this equation consists of a linear combination of Bessel functions with the Struve function H_n(z) added. The modified Struve function StruveL[n, z] is given in terms of the ordinary Struve function by L_n(z) = -i e^{-i n π/2} H_n(i z). Struve functions appear particularly in electromagnetic theory.
Here is a plot of J₀(√x). This is a curve that an idealized chain hanging from one end can form when you wiggle it.
In[16]:= Plot[BesselJ[0, Sqrt[x]], {x, 0, 50}]
Out[16]= (plot of BesselJ[0, Sqrt[x]] over 0 ≤ x ≤ 50)
The Airy function plotted here gives the quantum-mechanical amplitude for a particle in a
potential that increases linearly from left to right. The amplitude is exponentially damped in the
classically inaccessible region on the right.
In[17]:= Plot[AiryAi[x], {x, -10, 10}]
Out[17]= (plot of AiryAi[x] over −10 ≤ x ≤ 10)
Spheroidal Functions
Spheroidal functions.
SpheroidalS1 and SpheroidalS2 are effectively spheroidal analogs of the spherical Bessel functions j_n(z) and y_n(z), while SpheroidalPS and SpheroidalQS are effectively spheroidal analogs of the Legendre functions P_n^m(z) and Q_n^m(z). γ² > 0 corresponds to a prolate spheroidal geometry, while γ² < 0 corresponds to an oblate one.
Many different normalizations for spheroidal functions are used in the literature. Mathematica uses the Meixner-Schäfke normalization scheme.
Angular spheroidal functions PS_{n,0}(γ, η) for integers n ≥ 0 are eigenfunctions of a band-limited Fourier transform.
In[3]:= Integrate[SpheroidalPS[3, 0, γ, η] Exp[I ω γ η], {η, -1, 1}]
Out[3]= -2 I SpheroidalPS[3, 0, γ, ω] SpheroidalS1[3, 0, γ, 1]
An angular spheroidal function with m = 1/2 gives Mathieu angular functions.
In[4]:= SpheroidalPS[1/2, 1/2, c, z]
Out[4]= MathieuC[MathieuCharacteristicA[1, c^2/4], c^2/4, ArcCos[z]] / (π (1 - z^2))^(1/4)
The Legendre functions and associated Legendre functions satisfy the differential equation (1 − z²) y′′ − 2 z y′ + [n(n+1) − m²/(1 − z²)] y = 0. The Legendre functions of the first kind, LegendreP[n, z] and LegendreP[n, m, z], reduce to Legendre polynomials when n and m are integers. The Legendre functions of the second kind, LegendreQ[n, z] and LegendreQ[n, m, z], give the second linearly independent solution to the differential equation. For integer m they have logarithmic singularities at z = ±1. The P_n(z) and Q_n(z) solve the differential equation with m = 0.
LegendreP[n,m,z] or LegendreP[n,m,1,z]   type 1 function containing (1 − z²)^(m/2)
Legendre functions of type 1 and Legendre functions of type 2 have different symbolic forms, but the same numerical values. They have branch cuts from −∞ to −1 and from +1 to +∞. Legendre functions of type 3, sometimes denoted 𝒫_n^m(z) and 𝒬_n^m(z), have a single branch cut from −∞ to +1.
Toroidal functions or ring functions, which arise in studying systems with toroidal symmetry, can be expressed in terms of the Legendre functions P^m_{n−1/2}(cosh η) and Q^m_{n−1/2}(cosh η).
Conical functions can be expressed in terms of P^m_{−1/2+ip}(cos θ) and Q^m_{−1/2+ip}(cos θ).
When you use the function LegendreP[n, x] with an integer n, you get a Legendre polynomial.
If you take n to be an arbitrary complex number, you get, in general, a Legendre function.
In the same way, you can use the functions GegenbauerC and so on with arbitrary complex
indices to get Gegenbauer functions, Chebyshev functions, Hermite functions, Jacobi functions
and Laguerre functions. Unlike for associated Legendre functions, however, there is no need to
distinguish different types in such cases.
Many of the special functions that we have discussed so far can be viewed as special cases of the confluent hypergeometric function Hypergeometric1F1[a, b, z].
The confluent hypergeometric function can be obtained from the series expansion ₁F₁(a; b; z) = Σ_{k=0}^∞ ((a)_k/(b)_k) z^k/k!. Some special results are obtained when a and b are both integers. If a < 0, and either b > 0 or b < a, the series yields a polynomial with a finite number of terms.
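The series is straightforward to evaluate term by term using the ratio of successive terms. A minimal Python sketch (an illustration of the mathematics above, not Mathematica's algorithm), which also shows the polynomial termination for negative integer a:

```python
import math

def hyp1f1(a, b, z, terms=60):
    """Sum 1F1(a; b; z) = sum_k (a)_k/(b)_k * z^k/k! using the
    term ratio t_{k+1}/t_k = (a + k) z / ((b + k)(k + 1))."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (a + k) * z / ((b + k) * (k + 1))
    return total

# 1F1(1; 1; z) has (1)_k/(1)_k = 1, so it sums to e^z
assert abs(hyp1f1(1, 1, 0.7) - math.exp(0.7)) < 1e-12

# For a = -2 the series terminates: 1F1(-2; 1; z) = 1 - 2z + z^2/2
z = 1.3
assert abs(hyp1f1(-2, 1, z) - (1 - 2 * z + z * z / 2)) < 1e-12
```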
If b is zero or a negative integer, then ₁F₁(a; b; z) itself is infinite. But the regularized confluent hypergeometric function Hypergeometric1F1Regularized[a, b, z], given by ₁F₁(a; b; z)/Γ(b), has a finite value in all cases.
Among the functions that can be obtained from ₁F₁ are the Bessel functions, error function, incomplete gamma function, and Hermite and Laguerre polynomials.
The function ₁F₁(a; b; z) is sometimes denoted Φ(a; b; z) or M(a, b, z). It is often known as the Kummer function.
U(a, b, z), like ₁F₁(a; b; z), is sometimes known as the Kummer function. The U function is sometimes denoted by Ψ.
The parabolic cylinder functions ParabolicCylinderD[n, z] are related to the Hermite functions.
The Coulomb wave functions are also special cases of the confluent hypergeometric function. Coulomb wave functions give solutions to the radial Schrödinger equation in the Coulomb potential of a point nucleus. The regular Coulomb wave function is given by F_L(η, ρ) = C_L(η) ρ^(L+1) e^(−iρ) ₁F₁(L + 1 − iη; 2L + 2; 2iρ), where C_L(η) = 2^L e^(−πη/2) |Γ(L + 1 + iη)| / Γ(2L + 2).
Other special cases of the confluent hypergeometric function include the Toronto functions T(m, n, r), Poisson-Charlier polynomials ρ_n(ν, x), Cunningham functions ω_{n,m}(x) and Bateman functions k_n(x).
The ₀F₁ function has the series expansion ₀F₁(; a; z) = Σ_{k=0}^∞ (1/(a)_k) z^k/k! and satisfies the differential equation z y′′ + a y′ − y = 0.
Bessel functions of the first kind can be expressed in terms of the ₀F₁ function.
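The ₀F₁ series above can be checked against two of its classical special cases: since (1/2)_k k! = (2k)!/4^k, one has ₀F₁(; 1/2; z²/4) = cosh z, and the Bessel reduction is J₀(z) = ₀F₁(; 1; −z²/4). A Python sketch (illustrative only, not Mathematica code):

```python
import math

def hyp0f1(a, z, terms=50):
    """Sum 0F1(; a; z) = sum_k z^k / ((a)_k k!) via the
    term ratio t_{k+1}/t_k = z / ((a + k)(k + 1))."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= z / ((a + k) * (k + 1))
    return total

z = 1.1
# (1/2)_k k! = (2k)!/4^k, so 0F1(; 1/2; z^2/4) = sum z^(2k)/(2k)! = cosh z
assert abs(hyp0f1(0.5, z * z / 4) - math.cosh(z)) < 1e-12

# Bessel J_0 as a 0F1 special case: J_0(z) = 0F1(; 1; -z^2/4)
j0_series = sum((-1) ** k * (z / 2) ** (2 * k) / math.factorial(k) ** 2
                for k in range(30))
assert abs(hyp0f1(1, -z * z / 4) - j0_series) < 1e-12
```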
The hypergeometric function Hypergeometric2F1[a, b, c, z] satisfies the differential equation z(1 − z) y′′ + [c − (a + b + 1) z] y′ − a b y = 0.
The hypergeometric function can also be written as an integral: ₂F₁(a, b; c; z) = (Γ(c)/[Γ(b) Γ(c − b)]) ∫₀¹ t^(b−1) (1 − t)^(c−b−1) (1 − t z)^(−a) dt.
The hypergeometric function is also sometimes denoted by F, and is known as the Gauss series
or the Kummer series.
The Legendre functions, and the functions which give generalizations of other orthogonal polynomials, can be expressed in terms of the hypergeometric function. Complete elliptic integrals can also be expressed in terms of the ₂F₁ function.
The Riemann P function, which gives solutions to Riemann's differential equation, is also a ₂F₁ function.
The Meijer G function MeijerG[{{a₁, …, a_n}, {a_{n+1}, …, a_p}}, {{b₁, …, b_m}, {b_{m+1}, …, b_q}}, z] is defined by the contour integral (1/(2πi)) ∫ (Γ(1 − a₁ − s) ⋯ Γ(1 − a_n − s) Γ(b₁ + s) ⋯ Γ(b_m + s)) / (Γ(a_{n+1} + s) ⋯ Γ(a_p + s) Γ(1 − b_{m+1} − s) ⋯ Γ(1 − b_q − s)) z^(−s) ds, where the contour of integration is set up to lie between the poles of Γ(1 − a_i − s) and the poles of Γ(b_i + s). MeijerG is a very general function whose special cases cover most of the functions discussed in the past few sections.
The product log function ProductLog[z] gives the solution for w in z = w e^w. The function can be viewed as a generalization of a logarithm. It can be used to represent solutions to a variety of transcendental equations. The tree generating function for counting distinct oriented trees is related to the product log by T(z) = −W(−z).
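The defining equation z = w e^w can be solved for w by Newton iteration, one standard way a product log is evaluated numerically. A hedged Python sketch for real z > 0 (Mathematica's actual algorithm is more general and handles all branches):

```python
import math

def product_log(z, tol=1e-14):
    """Solve z = w * exp(w) for w (principal branch, real z > 0)
    by Newton's method: w <- w - (w e^w - z) / (e^w (w + 1))."""
    w = math.log(1.0 + z)  # a reasonable starting guess
    for _ in range(50):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

w = product_log(1.0)  # the "omega constant" W(1) = 0.567143...
assert abs(w * math.exp(w) - 1.0) < 1e-12
```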
Elliptic Integrals
Elliptic integrals.
Integrals of the form ∫ R(x, y) dx, where R is a rational function, and y² is a cubic or quartic polynomial in x, are known as elliptic integrals. Any elliptic integral can be expressed in terms of the three standard kinds of Legendre-Jacobi elliptic integrals.
The elliptic integral of the first kind EllipticF[φ, m] is given for −π/2 < φ < π/2 by F(φ|m) = ∫₀^φ [1 − m sin²(θ)]^(−1/2) dθ = ∫₀^(sin(φ)) [(1 − t²)(1 − m t²)]^(−1/2) dt. This elliptic integral arises in solving the equations of motion for a simple pendulum. It is sometimes known as an incomplete elliptic integral of the first kind.
Note that the arguments of the elliptic integrals are sometimes given in the opposite order from
what is used in Mathematica.
The complete elliptic integral of the first kind EllipticK[m] is given by K(m) = F(π/2|m). Note that K is used to denote the complete elliptic integral of the first kind, while F is used for its incomplete form. In many applications, the parameter m is not given explicitly, and K(m) is denoted simply by K. The complementary complete elliptic integral of the first kind K′(m) is given by K(1 − m). It is often denoted K′. K and iK′ give the "real" and "imaginary" quarter-periods of the corresponding Jacobi elliptic functions discussed in "Elliptic Functions".
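K(m) can be computed efficiently via the arithmetic-geometric mean, using the classical identity K(m) = π/(2 agm(1, √(1−m))). A Python sketch of this standard method (illustrative; not necessarily what Mathematica uses internally):

```python
import math

def agm(a, b, tol=1e-15):
    """Arithmetic-geometric mean of positive a and b."""
    while abs(a - b) > tol * a:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def elliptic_k(m):
    """Complete elliptic integral of the first kind, parameter m (not modulus k)."""
    return math.pi / (2 * agm(1.0, math.sqrt(1.0 - m)))

assert abs(elliptic_k(0.0) - math.pi / 2) < 1e-12

# cross-check against direct quadrature of the defining integral F(pi/2 | m)
m, n = 0.5, 20000
h = (math.pi / 2) / n
quad = h * sum(1 / math.sqrt(1 - m * math.sin((i + 0.5) * h) ** 2)
               for i in range(n))
assert abs(elliptic_k(m) - quad) < 1e-6
```

The AGM iteration converges quadratically, so a handful of iterations give full double precision.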
The elliptic integral of the second kind EllipticE[φ, m] is given for −π/2 < φ < π/2 by E(φ|m) = ∫₀^φ [1 − m sin²(θ)]^(1/2) dθ = ∫₀^(sin(φ)) (1 − t²)^(−1/2) (1 − m t²)^(1/2) dt.
The complete elliptic integral of the second kind EllipticE[m] is given by E(m) = E(π/2|m). It is often denoted simply by E.
The Jacobi zeta function JacobiZeta[φ, m] is given by Z(φ|m) = E(φ|m) − E(m) F(φ|m)/K(m).
The Heuman lambda function is given by Λ₀(φ|m) = F(φ|1−m)/K(1−m) + (2/π) K(m) Z(φ|1−m).
The complete elliptic integral of the third kind EllipticPi[n, m] is given by Π(n|m) = Π(n; π/2|m).
Here is a plot of the complete elliptic integral of the second kind E(m).
In[1]:= Plot[EllipticE[m], {m, 0, 1}]
Out[1]= (plot of EllipticE[m], decreasing from π/2 at m = 0 to 1 at m = 1)
Elliptic Functions
Rational functions involving square roots of quadratic forms can be integrated in terms of
inverse trigonometric functions. The trigonometric functions can thus be defined as inverses of
the functions obtained from these integrals.
By analogy, elliptic functions are defined as inverses of the functions obtained from elliptic
integrals.
The amplitude for Jacobi elliptic functions JacobiAmplitude[u, m] is the inverse of the elliptic integral of the first kind. If u = F(φ|m), then φ = am(u|m). In working with Jacobi elliptic functions, the argument m is often dropped, so am(u|m) is written as am(u).
The Jacobi elliptic functions JacobiSN[u, m] and JacobiCN[u, m] are given respectively by sn(u) = sin(φ) and cn(u) = cos(φ), where φ = am(u|m). In addition, JacobiDN[u, m] is given by dn(u) = √(1 − m sin²(φ)) = Δ(φ).
There are a total of twelve Jacobi elliptic functions JacobiPQ[u, m], with the letters P and Q chosen from the set S, C, D and N. Each Jacobi elliptic function JacobiPQ[u, m] satisfies the relation pq(u) = pn(u)/qn(u), where for these purposes nn(u) = 1.
There are many relations between the Jacobi elliptic functions, somewhat analogous to those between trigonometric functions. In limiting cases, in fact, the Jacobi elliptic functions reduce to trigonometric functions. So, for example, sn(u|0) = sin(u), sn(u|1) = tanh(u), cn(u|0) = cos(u), cn(u|1) = sech(u), dn(u|0) = 1 and dn(u|1) = sech(u).
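These limiting cases can be checked by constructing am(u|m) directly as the inverse of the elliptic integral F(φ|m), exactly as in its definition above. A Python sketch using simple quadrature plus bisection (illustrative only; real implementations use much faster methods):

```python
import math

def ellip_f(phi, m, n=2000):
    """Incomplete elliptic integral F(phi | m) by the midpoint rule."""
    h = phi / n
    return h * sum(1 / math.sqrt(1 - m * math.sin((i + 0.5) * h) ** 2)
                   for i in range(n))

def am(u, m):
    """Jacobi amplitude: solve u = F(phi | m) for phi by bisection
    (valid for 0 <= u < K(m), so phi lies in [0, pi/2])."""
    lo, hi = 0.0, math.pi / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if ellip_f(mid, m) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

u = 0.8
assert abs(math.sin(am(u, 0.0)) - math.sin(u)) < 1e-6        # sn(u|0) = sin(u)
assert abs(math.sin(am(u, 0.9999999)) - math.tanh(u)) < 1e-3  # sn(u|1) -> tanh(u)
```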
The notation Pq(u) is often used for the integrals ∫₀^u pq²(t) dt. These integrals can be expressed in terms of the Jacobi zeta function defined in "Elliptic Integrals".
One of the most important properties of elliptic functions is that they are doubly periodic in the complex values of their arguments. Ordinary trigonometric functions are singly periodic, in the sense that f(z + sω) = f(z) for any integer s. The elliptic functions are doubly periodic, so that f(z + rω + sω′) = f(z) for any pair of integers r and s.
The Jacobi elliptic functions sn(u|m), etc. are doubly periodic in the complex u plane. Their periods include ω = 4K(m) and ω′ = 4iK(1 − m), where K is the complete elliptic integral of the first kind.
The choice of p and q in the notation pq(u|m) for Jacobi elliptic functions can be understood in terms of the values of the functions at the quarter periods K and iK′.
This shows two complete periods in each direction of the absolute value of the Jacobi elliptic function sn(u|1/3).
Out[3]= (3D plot of |sn(u|1/3)| over two complete periods in each direction of the complex u plane)
Also built into Mathematica are the inverse Jacobi elliptic functions InverseJacobiSN[v, m], InverseJacobiCN[v, m], etc. The inverse function sn⁻¹(v|m), for example, gives the value of u for which v = sn(u|m). The inverse Jacobi elliptic functions are related to elliptic integrals.
θ₃(u, q) = 1 + 2 Σ_{n=1}^∞ q^(n²) cos(2nu) and θ₄(u, q) = 1 + 2 Σ_{n=1}^∞ (−1)^n q^(n²) cos(2nu). The theta functions are often written as θ_a(u) with the parameter q not explicitly given. The theta functions are sometimes written in the form θ(u|m), where m is related to q by q = exp[−π K(1 − m)/K(m)]. In addition, q is sometimes replaced by τ, given by q = e^(iπτ). All the theta functions satisfy a diffusion-like differential equation ∂²θ(u, τ)/∂u² = 4πi ∂θ(u, τ)/∂τ.
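The θ₃ and θ₄ series converge extremely fast for |q| < 1, since the terms decay like q^(n²). A Python sketch of the two series (illustrative only); it also checks the shift relation θ₄(u, q) = θ₃(u + π/2, q), which follows from cos(2n(u + π/2)) = (−1)ⁿ cos(2nu):

```python
import math

def theta3(u, q, terms=30):
    """theta_3(u, q) = 1 + 2 * sum_{n>=1} q^(n^2) cos(2 n u)."""
    return 1 + 2 * sum(q ** (n * n) * math.cos(2 * n * u)
                       for n in range(1, terms))

def theta4(u, q, terms=30):
    """theta_4(u, q) = 1 + 2 * sum_{n>=1} (-1)^n q^(n^2) cos(2 n u)."""
    return 1 + 2 * sum((-1) ** n * q ** (n * n) * math.cos(2 * n * u)
                       for n in range(1, terms))

q, u = 0.3, 0.7
assert abs(theta4(u, q) - theta3(u + math.pi / 2, q)) < 1e-12
assert theta3(u, 0.0) == 1.0  # every q-power vanishes when q = 0
```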
The Siegel theta function SiegelTheta[τ, s] with Riemann square modular matrix τ of dimension p and vector s generalizes the elliptic theta functions to complex dimension p. It is defined by Θ(τ, s) = Σ_n exp(iπ(n·τ·n + 2 n·s)), where n runs over all p-dimensional integer vectors.
The Jacobi elliptic functions can be expressed as ratios of the theta functions.
An alternative notation for theta functions is Θ(u|m) = θ₄(v|m), Θ₁(u|m) = θ₃(v|m), H(u|m) = θ₁(v), H₁(u|m) = θ₂(v), where v = πu/(2K(m)).
The Neville theta functions can be defined in terms of the theta functions as θ_s(u) = 2K(m) θ₁(v|m)/(π θ₁′(0|m)), θ_c(u) = θ₂(v|m)/θ₂(0|m), θ_d(u) = θ₃(v|m)/θ₃(0|m), θ_n(u) = θ₄(v|m)/θ₄(0|m), where v = πu/(2K(m)). The Jacobi elliptic functions can be represented as ratios of the Neville theta functions.
The Weierstrass elliptic function WeierstrassP[u, {g₂, g₃}] can be considered as the inverse of an elliptic integral. The Weierstrass function ℘(u; g₂, g₃) gives the value of x for which u = ∫_x^∞ (4t³ − g₂ t − g₃)^(−1/2) dt. The function WeierstrassPPrime[u, {g₂, g₃}] is given by ℘′(u; g₂, g₃) = ∂℘(u; g₂, g₃)/∂u.
The Weierstrass functions are also sometimes written in terms of their fundamental half-periods ω and ω′, obtained from the invariants g₂ and g₃ using WeierstrassHalfPeriods[{g₂, g₃}].
The function InverseWeierstrassP[p, {g₂, g₃}] finds one of the two values of u for which p = ℘(u; g₂, g₃). This value always lies in the parallelogram defined by the complex-number half-periods ω and ω′.
InverseWeierstrassP[{p, q}, {g₂, g₃}] finds the unique value of u for which p = ℘(u; g₂, g₃) and q = ℘′(u; g₂, g₃). In order for any such value of u to exist, p and q must be related by q² = 4p³ − g₂ p − g₃.
The Weierstrass zeta function WeierstrassZeta[u, {g₂, g₃}] and Weierstrass sigma function WeierstrassSigma[u, {g₂, g₃}] are related to the Weierstrass elliptic functions by ζ′(z; g₂, g₃) = −℘(z; g₂, g₃) and σ′(z; g₂, g₃)/σ(z; g₂, g₃) = ζ(z; g₂, g₃).
The Weierstrass zeta and sigma functions are not strictly elliptic functions since they are not
periodic.
The modular lambda function ModularLambda[τ] relates the ratio of half-periods τ = ω′/ω to the parameter according to m = λ(τ).
The Klein invariant modular function KleinInvariantJ[τ] and the Dedekind eta function DedekindEta[τ] satisfy the relations Δ = g₂³/J(τ) = (2π)¹² η²⁴(τ).
Modular elliptic functions are defined to be invariant under certain fractional linear transformations of their arguments. Thus for example λ(τ) is invariant under any combination of the transformations τ → τ + 2 and τ → τ/(1 − 2τ).
The definitions for elliptic integrals and functions given above are based on traditional usage.
For modern algebraic geometry, it is convenient to use slightly more general definitions.
The function EllipticLog[{x, y}, {a, b}] is defined as the value of the integral (1/2) ∫_∞^x (t³ + a t² + b t)^(−1/2) dt, where the sign of the square root is specified by giving the value of y such that y = √(x³ + a x² + b x). Integrals of the form ∫ (t² + a t)^(−1/2) dt can be expressed in terms of the ordinary logarithm (and inverse trigonometric functions). You can think of EllipticLog as giving a generalization of this, where the polynomial under the square root is now of degree three.
The function EllipticExp[u, {a, b}] is the inverse of EllipticLog. It returns the list {x, y} that appears in EllipticLog. EllipticExp is an elliptic function, doubly periodic in the complex u plane.
The Mathieu functions MathieuC[a, q, z] and MathieuS[a, q, z] are solutions to the equation y′′ + [a − 2q cos(2z)] y = 0. This equation appears in many physical situations that involve elliptical shapes or periodic potentials. The function MathieuC is defined to be even in z, while MathieuS is odd.
When q = 0 the Mathieu functions are simply cos(√a z) and sin(√a z). For nonzero q, the Mathieu functions are only periodic in z for certain values of a. Such Mathieu characteristic values are given by MathieuCharacteristicA[r, q] and MathieuCharacteristicB[r, q] with r an integer or rational number. These values are often denoted by a_r and b_r.
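The q = 0 reduction can be verified by integrating the Mathieu equation directly with the even initial conditions y(0) = 1, y′(0) = 0. A Python sketch with a simple RK4 integrator (illustrative only; Mathematica's MathieuC has its own normalization and algorithms):

```python
import math

def mathieu_even(a, q, z_max, steps=4000):
    """Integrate y'' + (a - 2 q cos(2 z)) y = 0 with y(0)=1, y'(0)=0
    by classical RK4 on the first-order system (y, y'); returns y(z_max)."""
    h = z_max / steps

    def f(z, y, yp):
        return yp, -(a - 2 * q * math.cos(2 * z)) * y

    y, yp, z = 1.0, 0.0, 0.0
    for _ in range(steps):
        k1 = f(z, y, yp)
        k2 = f(z + h / 2, y + h / 2 * k1[0], yp + h / 2 * k1[1])
        k3 = f(z + h / 2, y + h / 2 * k2[0], yp + h / 2 * k2[1])
        k4 = f(z + h, y + h * k3[0], yp + h * k3[1])
        y += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        yp += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        z += h
    return y

a, z = 2.0, 3.0
# with q = 0 the even solution is cos(sqrt(a) z)
assert abs(mathieu_even(a, 0.0, z) - math.cos(math.sqrt(a) * z)) < 1e-8
```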
For integer r, the even and odd Mathieu functions with characteristic values a_r and b_r are often denoted ce_r(z, q) and se_r(z, q), respectively. Note the reversed order of the arguments z and q.
According to Floquet's theorem, any Mathieu function can be written in the form e^(irz) f(z), where f(z) has period 2π and r is the Mathieu characteristic exponent MathieuCharacteristicExponent[a, q]. When the characteristic exponent r is an integer or rational number, the Mathieu function is therefore periodic. In general, however, when r is not a real integer, a_r and b_r turn out to be equal.
Most special functions have simpler forms when given certain specific arguments. Mathematica
will automatically simplify special functions in such cases.
Here again Mathematica reduces a special case of the Airy function to an expression involving
gamma functions.
In[2]:= AiryAi[0]
Out[2]= 1/(3^(2/3) Gamma[2/3])
For most choices of arguments, no exact reductions of special functions are possible. But in such cases, Mathematica allows you to find numerical approximations to any degree of precision. The algorithms that are built into Mathematica cover essentially all values of parameters, real and complex, for which the special functions are defined.
The result here is a huge complex number, but Mathematica can still find it.
In[5]:= N[AiryAi[1000 I]]
Out[5]= -4.780266637767027*10^6472 + 3.674920907226875*10^6472 I
Most special functions have derivatives that can be expressed in terms of elementary functions
or other special functions. But even in cases where this is not so, you can still use N to find
numerical approximations to derivatives.
One feature of working with special functions is that there are a large number of relations
between different functions, and these relations can often be used in simplifying expressions.
In this case the final result does not even involve PolyGamma.
In[17]:= FunctionExpand[Im[PolyGamma[0, 3 I]]]
Out[17]= 1/6 + 1/2 π Coth[3 π]